Social Graph Persistence In A Java Graph Database

Graph databases are well suited to storing and analyzing social network information. However, I could not find graph database sample code that generates more than a dozen or so vertices and edges. In this example, I demonstrate how to construct and store a synthetic social graph of 9,100,260 elements (vertices, edges, and properties combined). The synthetic graph has 151,671 people nodes, a hard limit set by the number of artificial names that I generated; that works out to 60 graph elements per person. However, you could increase the number of edge connections or edge properties to expand the graph further; both are configurable variables.

To generate the artificial names, I downloaded Census name lists and combined the male and female first names with the last names list. You can download the completed list here.
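As a rough illustration of the name-combining step, here is a minimal sketch. It assumes one person per last name, with first names taken in rotation from the merged male and female lists; the actual pairing rule is not spelled out above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the pairing rule (one person per last name,
// first names used in rotation) is an assumption, not the original method.
final class NameListBuilder {

    static List<String> buildFullNames(List<String> maleFirst,
                                       List<String> femaleFirst,
                                       List<String> lastNames) {
        // Merge the two first-name lists into one pool.
        List<String> firstNames = new ArrayList<>(maleFirst);
        firstNames.addAll(femaleFirst);

        List<String> fullNames = new ArrayList<>(lastNames.size());
        if (firstNames.isEmpty()) {
            return fullNames; // nothing to pair with
        }
        for (int i = 0; i < lastNames.size(); i++) {
            // Cycle through the first-name pool so every last name gets a partner.
            String first = firstNames.get(i % firstNames.size());
            fullNames.add(first + " " + lastNames.get(i));
        }
        return fullNames;
    }
}
```

With this rule, the number of generated people equals the number of last names, which is why the name list caps the vertex count.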

The persistent graph is distributed across multiple database instances (also known as graph partitioning or graph sharding) to provide scalability. To build the sharded graph, I downloaded and installed the InfiniteGraph Java graph database. The next step is to set a placement strategy that allows multidatabase placement; this is set in the graph's properties file. Graph placement is configured at runtime, and you can customize it by implementing your own placement classes. I just used the default multidatabase placement.
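To make the idea of placement concrete, here is a generic sketch of how a vertex could be assigned to one of several databases by hashing its key. This is not InfiniteGraph's placement API or its default strategy; the class and method names are hypothetical and only illustrate the general sharding concept.

```java
// Hypothetical, library-independent illustration of shard placement.
// InfiniteGraph's own placement classes are NOT shown here.
final class ShardPlacement {

    /** Returns the index of the database (0..shardCount-1) for a vertex key. */
    static int shardFor(String vertexKey, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertexKey.hashCode(), shardCount);
    }
}
```

The same key always maps to the same database, so a vertex can be located again without a lookup table.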

Here are the exact properties that need to be set for multidatabase placement:

The code loops through the name list file, generating a person vertex for each name. A second loop goes through the people nodes and connects them to one another at random. The social network context is that one person pays another using different kinds of transactions: cash, wire, etc. The relationship edges between people are paysTo edges, each carrying a polymorphic collection of TransactionType. The collection on the edge stores instances of the specific payment types made between the sender and recipient.
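The two loops and the edge model can be sketched in plain, in-memory Java as follows. This is only an outline of the shape of the data under stated assumptions: it does not use the InfiniteGraph API or persist anything, and the class names other than paysTo and TransactionType (CashTransaction, WireTransaction, Person, SocialGraphSketch), the amount field, and the parameters edgesPerPerson and transactionsPerEdge are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Base type for the polymorphic collection stored on each edge.
abstract class TransactionType {
    final double amount; // hypothetical property
    TransactionType(double amount) { this.amount = amount; }
}
final class CashTransaction extends TransactionType {
    CashTransaction(double amount) { super(amount); }
}
final class WireTransaction extends TransactionType {
    WireTransaction(double amount) { super(amount); }
}

final class Person {
    final String name;
    final List<PaysTo> outgoing = new ArrayList<>();
    Person(String name) { this.name = name; }
}

// The paysTo edge: sender -> recipient, with a mixed list of payment types.
final class PaysTo {
    final Person sender;
    final Person recipient;
    final List<TransactionType> transactions = new ArrayList<>();
    PaysTo(Person sender, Person recipient) {
        this.sender = sender;
        this.recipient = recipient;
    }
}

final class SocialGraphSketch {
    static List<Person> build(List<String> names, int edgesPerPerson,
                              int transactionsPerEdge, long seed) {
        Random random = new Random(seed);

        // First loop: one person vertex per name.
        List<Person> people = new ArrayList<>();
        for (String name : names) {
            people.add(new Person(name));
        }
        if (people.size() < 2) {
            return people; // no one to connect to
        }

        // Second loop: connect each person to randomly chosen recipients.
        for (Person sender : people) {
            for (int e = 0; e < edgesPerPerson; e++) {
                Person recipient;
                do {
                    recipient = people.get(random.nextInt(people.size()));
                } while (recipient == sender); // avoid self-payments
                PaysTo edge = new PaysTo(sender, recipient);
                for (int t = 0; t < transactionsPerEdge; t++) {
                    double amount = 1 + random.nextInt(1000);
                    edge.transactions.add(random.nextBoolean()
                            ? new CashTransaction(amount)
                            : new WireTransaction(amount));
                }
                sender.outgoing.add(edge);
            }
        }
        return people;
    }
}
```

Raising edgesPerPerson or transactionsPerEdge grows the graph without needing more names, which mirrors the two configurable variables mentioned earlier.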

To summarize what this sample shows:

  • Creating a large synthetic social graph

  • Edges with polymorphic collections

  • Graph sharding over multiple databases

Code:


Comments (1)

Jul 14, 2010
Todd Stavish said...
For an explanation of the graph partitioning and memory model that happens in this code snippet -> http://post.ly/n9fQ
