Graph Sharding In InfiniteGraph Database
I have been experimenting with InfiniteGraph quite a bit lately. One of my goals was to use InfiniteGraph’s distribution capabilities (aka graph partitioning or graph sharding). To shard a graph in InfiniteGraph, you need to set a placement strategy that enables multidatabase placement (set in the properties file for the graph).
Here are the specific properties that need to be set for multidatabase placement:
Runtime configuration means that graph distribution is also decided at runtime. Eventually you’ll be able to develop your own graph placement classes to shard anyway you want. InfiniteGraph will also be providing a selection for common sharding patterns.
Another feature I wanted to try was InfiniteGraph’s thread model. The thread model is very flexible. Basically, each thread has its own database cache and locking, which means threads can operate on different parts of the graph simultaneously. There is no shared query engine or centralized cache. The architecture was designed for horizontal scaling by adding more database shards or more threads. Caches are pooled for efficient memory use. Threads can also pass their cache to another thread for pipelining.
In the read case, threads can operate on the same parts of the graph without conflict. For updates, other threads are locked out until the update completes. If there is conflict, the thread can be configured to wait for the lock to release or move on. InfiniteGraph also detects deadlocks and race conditions. Eventually, InfiniteGraph will provide a work queue that will take care of all of the conflict resolution for you. The objective is to store the graph as it is ingested in any form and then work out the conflicts for you later.
As I was coding this sample application, I thought I would try to create an edge that holds a collection. I wanted to have one edge between two vertices that stored several similar typed properties (decided at runtime). The example I came up with is if one person pays another with different kinds of transactions, for instance using paypal, giving cash, or a wire transfer. I wanted to have a single edge of type paysTo with a polymorphic collection of TransactionType’s that store instances of the specific payment type.
To experiment, I created a synthetic dataset from US Census information. I used a list of the most popular surnames and combined them with common first names. I ended up with a unique list of names that is 151,671 entries long. This is great for creating synthetic social networks. I connect the people in my fake social network randomly.
My sample application does have a tendency to create deadlocks (which InfiniteGraph detects). I need to think of a better way of ingesting my social network to avoid this issue. It is really an artifact of the simple code that I wrote and not a reflection of InfiniteGraph capabilities or shortcomings. If anyone can think of a better approach or a more realistic use-case, please let me know. I’d be happy to try it a different way. I also want to thank Darren Wood, the lead architect of InfiniteGraph for answering my questions. I am looking forward to future releases and learning how to use InfiniteGraph better.
To summarize what this sample shows:
Creating a synthetic social graph
Edges with polymorphic collections
Graph sharding over multiple databases
Multithreaded graph ingest
Here’s the code if you are interested.
