InfiniteGraph: Memory Model And Graph Partitioning

A few questions have come up about the partitioning / sharding / distribution features of the InfiniteGraph demo code, so I thought I’d take a moment to explain them better.

First, Arvind posted a good description of InfiniteGraph’s underlying plumbing. InfiniteGraph is built on Objectivity, much as other low-latency database architectures are layered on an underlying system, such as HBase on Hadoop or Cassandra on Dynamo.

Underneath an InfiniteGraph database instance is an Objectivity federated database. Objectivity has deep experience building petascale database architectures. Logically, a federated database looks like a single database instance. Its member databases can be distributed across servers or disks, with all of the logical-to-physical mapping done by the database engine. Each database is a peer, so there is no shared query engine or centralized server. Each thread has its own database cache, so database requests stream directly from servers to threads without contention. The federated database model also allows edges to span databases transparently, and subgraphs are delivered in pages to optimize I/O and network traffic.

InfiniteGraph takes this abstraction a step further at the programming level. Application code does not have to think about distribution at all; the placement managers take care of it. The plan is for InfiniteGraph to provide a number of common placement algorithms as plugin classes, chosen at runtime from the configuration properties. The default multidatabase placement manager, which I used in my example, basically streams the graph in sequentially per thread. The database it chooses depends on how much storage capacity remains, or on whether there is an open lock.
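To make the placement idea concrete, here is a minimal sketch in plain Java. It is not InfiniteGraph or Objectivity code: every name in it (PlacementSketch, Database, PlacementManager, FirstFitPlacement, the "placement.manager" property key) is hypothetical, and the rule it applies (take the first database that is not locked and still has room) is only my reading of the capacity-or-lock behavior described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: none of these types are InfiniteGraph or Objectivity APIs.
public class PlacementSketch {

    /** Stand-in for one database in the federation. */
    static final class Database {
        final String name;
        final long capacity; // total bytes this database may hold
        long used;           // bytes already written
        boolean locked;      // true while another thread holds a lock on it

        Database(String name, long capacity) {
            this.name = name;
            this.capacity = capacity;
        }

        long remaining() {
            return capacity - used;
        }
    }

    /** A pluggable strategy that decides where new graph data goes. */
    interface PlacementManager {
        Optional<Database> choose(List<Database> databases, long bytesNeeded);
    }

    /** Assumed rule: walk the databases in order, skip locked or full ones. */
    static final class FirstFitPlacement implements PlacementManager {
        @Override
        public Optional<Database> choose(List<Database> databases, long bytesNeeded) {
            for (Database db : databases) {
                if (!db.locked && db.remaining() >= bytesNeeded) {
                    return Optional.of(db);
                }
            }
            return Optional.empty(); // nothing available right now
        }
    }

    /** Picks a strategy from configuration; the property key is made up. */
    static PlacementManager fromProperties(Properties props) {
        String name = props.getProperty("placement.manager", "first-fit");
        switch (name) {
            case "first-fit":
                return new FirstFitPlacement();
            default:
                throw new IllegalArgumentException("Unknown placement manager: " + name);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("placement.manager", "first-fit");
        PlacementManager manager = fromProperties(props);

        List<Database> federation = Arrays.asList(
                new Database("db1", 1_000),
                new Database("db2", 1_000));
        federation.get(0).locked = true; // pretend another thread is writing to db1

        Optional<Database> target = manager.choose(federation, 200);
        System.out.println(target.map(db -> db.name).orElse("no database available"));
    }
}
```

The point of the sketch is that the application only asks the manager where to write. Swapping in a different placement algorithm means changing a configuration property, not the application code.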

InfiniteGraph’s partitioning is not a magic bullet. Partitioning opens doors scalability-wise for graph persistence and makes a difficult problem more tractable. However, moving from an embedded graph instance to a distributed one involves tradeoffs, for instance around colocation or clustering advantages. If you have any suggestions or questions, you can reach me here.

