Building an InfiniteGraph Application with Maven

Here are the steps to get the graph data store, InfiniteGraph, working with a Maven project management environment. The codebase is a small social network application.

  1. mvn archetype:create -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.infinitegraph.sample -DartifactId=GraphAPISample
  2. Use this POM with a filesystem dependency
  3. Copy GraphAPISample.java to src/main/java/com/infinitegraph/sample
  4. Add GraphAPISample to package com.infinitegraph.sample
  5. Create a resources directory for your properties file with ‘mkdir src/main/resources’
  6. Copy SampleGraphProperties.properties to the resources directory. Note that Maven copies this file into the target directory during the compile step, so it can be picked up at runtime without path qualification. Therefore, take out the path in your code, for example: GraphFactory.open("SampleGraph", "SampleGraphProperties.properties");
  7. mvn compile
  8. mvn exec:exec
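
Step 6 depends on the properties file ending up in the build output. One quick way to confirm that is a classpath lookup at runtime. This is a minimal sketch: ResourceCheck is a made-up class name, and it only tests classpath visibility, which may or may not be how GraphFactory.open itself locates the file.

```java
import java.net.URL;

// Sketch: report whether a file from src/main/resources is visible at runtime.
// ResourceCheck is a hypothetical helper, not part of InfiniteGraph or Maven.
final class ResourceCheck {

    // Returns true when the named resource can be found on the classpath.
    static boolean onClasspath(String name) {
        URL url = ResourceCheck.class.getClassLoader().getResource(name);
        return url != null;
    }

    public static void main(String[] args) {
        // Defaults to the properties file named in step 6.
        String name = args.length > 0 ? args[0] : "SampleGraphProperties.properties";
        String status = onClasspath(name) ? "found" : "NOT found";
        System.out.println(name + " " + status + " on the classpath");
    }
}
```

If the file is reported as not found after mvn compile, check that it really sits in src/main/resources.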

Make sure you have the distributed lockserver running in advance and source the IGConfg script. If you are new to Maven, it might be easier to download my whole setup, or get it from GitHub via “git clone git@github.com:toddstavish/InfiniteGraph-Maven-Example-Setup.git”. If you have any suggestions or questions, you can reach me here.

Filed under  //   graphdb   infinitegraph   nosql  


InfiniteGraph: Memory Model And Graph Partitioning

A few questions have arisen on the partitioning / sharding / distribution features of the InfiniteGraph demo code. I thought I’d take a moment to help explain it better.

First, Arvind posted a good description of InfiniteGraph’s underlying plumbing. InfiniteGraph is based on Objectivity, much as other low-latency databases rest on an underlying architecture: HBase builds on Hadoop, and Cassandra builds on ideas from Dynamo.

Underneath an InfiniteGraph database instance is an Objectivity federated database. Objectivity has deep experience building petascale database architectures. Logically, a federated database looks like a single database instance. Databases can be distributed across servers or disks, with all of the logical-to-physical mapping done by the database engine. Each database is a peer, so there is no shared query engine or centralized server. Threads have their own database cache, so database requests stream directly from servers to threads without contention. The federated database model allows edges to transparently span databases, and subgraphs are delivered in pages to optimize I/O and network traffic.

InfiniteGraph abstracts this idea even further programmatically. There is no thought of distribution in the application code. The placement managers take care of it all. The plan is that InfiniteGraph will provide a number of common placement algorithms. You will have plugin classes that are chosen at runtime from the configuration properties. The default multidatabase placement manager that I used in my example basically streams the graph in sequentially per thread. The database that the default multidatabase placement chooses depends on the storage capacity remaining, or on whether there is an open lock.
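
To make the plugin idea concrete, here is a rough sketch of a placement strategy that is picked by name from configuration and then chooses a database by remaining capacity and lock state. Everything in it is hypothetical: DatabaseSlot, PlacementStrategy, FirstFitPlacement, and the "placement.strategy" key are illustration names, not InfiniteGraph classes or properties, and skipping locked databases is only one reading of the behavior described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical model of one database in the federation; not an InfiniteGraph class.
final class DatabaseSlot {
    final String name;
    final long freeBytes;  // storage capacity remaining
    final boolean locked;  // true while another writer holds the lock

    DatabaseSlot(String name, long freeBytes, boolean locked) {
        this.name = name;
        this.freeBytes = freeBytes;
        this.locked = locked;
    }
}

// Hypothetical plugin contract: pick a database for the next write.
interface PlacementStrategy {
    Optional<DatabaseSlot> choose(List<DatabaseSlot> slots, long bytesNeeded);
}

// Takes the first database that has room and is not locked.
final class FirstFitPlacement implements PlacementStrategy {
    @Override
    public Optional<DatabaseSlot> choose(List<DatabaseSlot> slots, long bytesNeeded) {
        for (DatabaseSlot slot : slots) {
            if (!slot.locked && slot.freeBytes >= bytesNeeded) {
                return Optional.of(slot);
            }
        }
        return Optional.empty();
    }
}

final class PlacementDemo {
    // Registry of available strategies, keyed by the name used in configuration.
    static final Map<String, Supplier<PlacementStrategy>> STRATEGIES = new HashMap<>();
    static {
        STRATEGIES.put("first-fit", FirstFitPlacement::new);
    }

    // Resolves the strategy named by the made-up "placement.strategy" property.
    static PlacementStrategy fromConfig(Properties config) {
        String key = config.getProperty("placement.strategy", "first-fit");
        Supplier<PlacementStrategy> factory = STRATEGIES.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown placement strategy: " + key);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("placement.strategy", "first-fit");
        List<DatabaseSlot> slots = Arrays.asList(
                new DatabaseSlot("db1", 0L, false),     // full
                new DatabaseSlot("db2", 4096L, true),   // locked
                new DatabaseSlot("db3", 4096L, false)); // usable
        Optional<DatabaseSlot> chosen = fromConfig(config).choose(slots, 1024L);
        System.out.println(chosen.map(s -> s.name).orElse("no database available"));
    }
}
```

The application code only ever calls choose; swapping the strategy is a configuration change, which is the point of keeping distribution out of the application.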

InfiniteGraph’s partitioning is not a magic bullet. Partitioning will open doors scalability-wise for graph persistence and optimize a difficult problem. However, there are tradeoffs to be made when moving from an embedded graph instance to a distributed one, for instance giving up colocation or clustering advantages. If you have any suggestions or questions, you can reach me here.

Filed under  //   graphdb   infinitegraph   nosql  


Social Graph Persistence In A Java Graph Database

Graph databases are excellent for storing and analyzing social network information. However, I could not find graphDB sample code that generates more than a dozen or so vertices and edges. In this example, I will demonstrate how to construct and store a synthetic social graph of 9,100,260 elements (vertices, edges, and properties). The synthetic graph has 151,671 people nodes, which is a hard limit based on the number of artificial names that I generated. However, you could increase the edge connections or edge properties to expand the graph further; both are configurable variables.

To generate the artificial names, I downloaded Census name lists and combined the male and female first names with the last names list. You can download the completed list here.

The persistent graph is distributed across multiple database instances (aka graph partitioning or graph sharding) to provide scalability. To build the sharded graph, I downloaded and installed the InfiniteGraph Java Graph Database. The next step is to set placement strategies that allow for multidatabase placement (set in the properties file for the graph). Graph placement is configured at runtime with customization possible by implementing your own placement classes. I just used the default multidatabase placement.

Here are the exact properties that need to be set for multidatabase placement:

The code loops through the name list file, generating people vertices. A second loop goes through the people nodes, connecting them together at random. In this social network, one person pays another using different kinds of transactions: cash, wire, etc. The relationship edges between people are paysTo edges, each with a polymorphic collection of TransactionType. The collection along the edge stores instances of the specific payment type between the sender and recipient.
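
The two loops can be sketched in plain Java with in-memory lists standing in for the database. This is not the original sample code: SyntheticGraph, its Edge class, and the edgesPerPerson and seed parameters are assumptions made for illustration, and the real code would call the InfiniteGraph API where this sketch appends to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// In-memory stand-in for the two-pass build: vertices first, then random edges.
final class SyntheticGraph {

    // A paysTo edge between two people, stored as indexes into the people list.
    static final class Edge {
        final int from;
        final int to;

        Edge(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    final List<String> people = new ArrayList<>();
    final List<Edge> paysTo = new ArrayList<>();

    static SyntheticGraph build(List<String> names, int edgesPerPerson, long seed) {
        SyntheticGraph graph = new SyntheticGraph();

        // First loop: one person vertex per name.
        for (String name : names) {
            graph.people.add(name);
        }

        // Second loop: connect each person to randomly chosen other people.
        int count = graph.people.size();
        if (count < 2) {
            return graph; // nobody else to connect to
        }
        Random random = new Random(seed);
        for (int from = 0; from < count; from++) {
            for (int k = 0; k < edgesPerPerson; k++) {
                int to = random.nextInt(count - 1);
                if (to >= from) {
                    to++; // shift past 'from' so nobody pays themselves
                }
                graph.paysTo.add(new Edge(from, to));
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann Smith", "Bob Jones", "Cy Lee", "Di Wong");
        SyntheticGraph graph = build(names, 3, 42L);
        System.out.println(graph.people.size() + " people, " + graph.paysTo.size() + " paysTo edges");
    }
}
```

Raising edgesPerPerson is the in-memory equivalent of increasing the edge connections to grow the graph.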

To summarize what this sample shows:

  • Creating a large synthetic social graph

  • Edges with polymorphic collections

  • Graph sharding over multiple databases

Code:

Filed under  //   graphdb   nosql  


Graph Sharding In InfiniteGraph Database

I have been experimenting with InfiniteGraph quite a bit lately. One of my goals was to use InfiniteGraph’s distribution capabilities (aka graph partitioning or graph sharding). To shard a graph in InfiniteGraph, you need to set a placement strategy that enables multidatabase placement (set in the properties file for the graph).

Here are the specific properties that need to be set for multidatabase placement:

Runtime configuration means that graph distribution is also decided at runtime. Eventually you’ll be able to develop your own graph placement classes to shard any way you want. InfiniteGraph will also be providing a selection of common sharding patterns.

Another feature I wanted to try was InfiniteGraph’s thread model. The thread model is very flexible. Basically, each thread has its own database cache and locking, which means threads can operate on different parts of the graph simultaneously. There is no shared query engine or centralized cache. The architecture was designed for horizontal scaling by adding more database shards or more threads. Caches are pooled for efficient memory use. Threads can also pass their cache to another thread for pipelining.

In the read case, threads can operate on the same parts of the graph without conflict. For updates, other threads are locked out until the update completes. If there is conflict, the thread can be configured to wait for the lock to release or move on. InfiniteGraph also detects deadlocks and race conditions. Eventually, InfiniteGraph will provide a work queue that will take care of all of the conflict resolution for you. The objective is to store the graph as it is ingested in any form and then work out the conflicts for you later.
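
The “wait for the lock or move on” choice can be illustrated with standard Java locks. This is an analogy built on java.util.concurrent, not InfiniteGraph’s locking API; LockPolicyDemo, OnConflict, and attemptWhileHeld are made-up names for the sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockPolicyDemo {

    // What a thread does when another thread already holds the lock.
    enum OnConflict { WAIT, SKIP }

    // Runs 'work' under the lock. Returns false if the lock could not be taken.
    static boolean update(ReentrantLock lock, OnConflict policy, long waitMillis, Runnable work) {
        boolean acquired;
        try {
            acquired = (policy == OnConflict.WAIT)
                    ? lock.tryLock(waitMillis, TimeUnit.MILLISECONDS) // wait up to the timeout
                    : lock.tryLock();                                 // give up immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            work.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Demo helper: a second thread holds the lock while the caller attempts an update.
    static boolean attemptWhileHeld(OnConflict policy, long waitMillis) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await(); // keep the lock until the attempt is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();
            return update(lock, policy, waitMillis, () -> { });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("uncontended update ran: "
                + update(new ReentrantLock(), OnConflict.SKIP, 0, () -> { }));
        System.out.println("SKIP while held ran: " + attemptWhileHeld(OnConflict.SKIP, 0));
        System.out.println("WAIT 50 ms while held ran: " + attemptWhileHeld(OnConflict.WAIT, 50));
    }
}
```

A caller that gets false back can retry later or queue the update, which is roughly the job the planned work queue is described as taking over.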

As I was coding this sample application, I thought I would try to create an edge that holds a collection. I wanted to have one edge between two vertices that stored several similarly typed properties (decided at runtime). The example I came up with is one person paying another with different kinds of transactions, for instance using PayPal, giving cash, or sending a wire transfer. I wanted to have a single edge of type paysTo with a polymorphic collection of TransactionType objects that store instances of the specific payment type.
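
In plain Java, the shape of that edge looks roughly like the sketch below. Only the names paysTo and TransactionType come from the sample; the Cash, Wire, and PayPal subclasses, their fields, and the methods on PaysTo are assumptions for illustration, and the real classes would extend InfiniteGraph’s persistent base types rather than stand alone.

```java
import java.util.ArrayList;
import java.util.List;

// Base type for anything stored on the edge; amounts are in cents to avoid rounding.
abstract class TransactionType {
    final long amountCents;

    TransactionType(long amountCents) {
        this.amountCents = amountCents;
    }

    abstract String describe();
}

final class Cash extends TransactionType {
    Cash(long amountCents) {
        super(amountCents);
    }

    @Override
    String describe() {
        return "cash " + amountCents;
    }
}

final class Wire extends TransactionType {
    final String reference; // hypothetical field

    Wire(long amountCents, String reference) {
        super(amountCents);
        this.reference = reference;
    }

    @Override
    String describe() {
        return "wire " + amountCents + " ref=" + reference;
    }
}

final class PayPal extends TransactionType {
    final String account; // hypothetical field

    PayPal(long amountCents, String account) {
        super(amountCents);
        this.account = account;
    }

    @Override
    String describe() {
        return "paypal " + amountCents + " account=" + account;
    }
}

// One edge between two people, holding every payment made in that direction.
final class PaysTo {
    final String sender;
    final String recipient;
    private final List<TransactionType> transactions = new ArrayList<>();

    PaysTo(String sender, String recipient) {
        this.sender = sender;
        this.recipient = recipient;
    }

    // The concrete payment type is decided by the caller at runtime.
    void record(TransactionType transaction) {
        transactions.add(transaction);
    }

    int count() {
        return transactions.size();
    }

    long totalCents() {
        long total = 0;
        for (TransactionType t : transactions) {
            total += t.amountCents;
        }
        return total;
    }

    public static void main(String[] args) {
        PaysTo edge = new PaysTo("Ann Smith", "Bob Jones");
        edge.record(new Cash(2_000));
        edge.record(new Wire(150_000, "W-1"));
        edge.record(new PayPal(4_999, "ann@example.com"));
        for (TransactionType t : edge.transactions) {
            System.out.println(t.describe());
        }
        System.out.println("total cents: " + edge.totalCents());
    }
}
```

Keeping one edge per sender and recipient pair means the number of edges stays fixed while the collection on each edge grows with the payments.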

To experiment, I created a synthetic dataset from US Census information. I used a list of the most popular surnames and combined them with common first names. I ended up with a unique list of names that is 151,671 entries long. This is great for creating synthetic social networks. I connect the people in my fake social network randomly.

My sample application does have a tendency to create deadlocks (which InfiniteGraph detects). I need to think of a better way of ingesting my social network to avoid this issue. It is really an artifact of the simple code that I wrote and not a reflection of InfiniteGraph’s capabilities or shortcomings. If anyone can think of a better approach or a more realistic use case, please let me know. I’d be happy to try it a different way. I also want to thank Darren Wood, the lead architect of InfiniteGraph, for answering my questions. I am looking forward to future releases and learning how to use InfiniteGraph better.

To summarize what this sample shows:

  • Creating a synthetic social graph

  • Edges with polymorphic collections

  • Graph sharding over multiple databases

  • Multithreaded graph ingest

Here’s the code if you are interested.

Filed under  //   graphdb   infinitegraph   nosql  


InfiniteGraph: Referential Integrity

Based on my post yesterday, the guys over at InfoGrid brought up some good points on InfiniteGraph’s edge implementation. I wrote some code to see if you could create a dangling reference case. I couldn’t find a way to do it.

It looks like an Edge is transient when you create it. To add it to the database, you must call one of the addEdge variants, which makes the edge persistent and requires that you provide source and destination vertices. There are no dangling references, so referential integrity is kept.
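
The reasoning can be modeled in a few lines of plain Java: if the only way to store an edge is a call that demands both endpoints, a dangling edge cannot be stored. TinyGraph and its Edge class are a toy model written for this note, not InfiniteGraph classes, and the real addEdge variants take different arguments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A freshly constructed edge is only an in-memory object; it names no endpoints.
final class Edge {
    final String label;

    Edge(String label) {
        this.label = label;
    }
}

final class TinyGraph {

    // A stored edge always carries both of its endpoints.
    static final class StoredEdge {
        final Edge edge;
        final String source;
        final String target;

        StoredEdge(Edge edge, String source, String target) {
            this.edge = edge;
            this.source = source;
            this.target = target;
        }
    }

    private final Set<String> vertices = new HashSet<>();
    private final List<StoredEdge> edges = new ArrayList<>();

    void addVertex(String id) {
        vertices.add(id);
    }

    // The only path to persistence: both endpoints must already be in the graph.
    void addEdge(Edge edge, String source, String target) {
        if (!vertices.contains(source) || !vertices.contains(target)) {
            throw new IllegalArgumentException("both endpoints must already be in the graph");
        }
        edges.add(new StoredEdge(edge, source, target));
    }

    int edgeCount() {
        return edges.size();
    }

    public static void main(String[] args) {
        TinyGraph graph = new TinyGraph();
        graph.addVertex("alice");
        graph.addVertex("bob");
        graph.addEdge(new Edge("paysTo"), "alice", "bob");
        try {
            graph.addEdge(new Edge("paysTo"), "alice", "nobody");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("stored edges: " + graph.edgeCount());
    }
}
```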

Filed under  //   graphdb   infinitegraph   nosql  


Get A Taste Of InfiniteGraph: HelloWorld For Graph Databases

I agree with Mr. Popescu. The best way to validate that you’ve got the basics right about a system is to use some basic code. With this idea in mind, I ported his tagging app to InfiniteGraph, a new graph database. Some things I learned in this effort about InfiniteGraph:

  • non-primitive vertex attribute types are supported (check out the timestamp in the Resource vertex)
  • efficient path evaluation with qualifiers and navigation result handlers
  • named vertex indexing is simple but powerful

I am looking forward to trying InfiniteGraph in a distributed deployment for my next post.

The tagging app console output looks like this:

Listing all tags and resources:

Tag: good

 Resource: http://xkcd.com/

 Resource: http://cnn.com/

Tag: funny

 Resource: http://xkcd.com/

 Resource: http://theonion.com/

xkcd is tagged with:

 Tag: funny

 Tag: good

Listing all paths to tagged sites:

Found matching path : goodhttp://xkcd.com/

Found matching path : goodhttp://cnn.com/

Found matching path : goodhttp://xkcd.com/funnyhttp://xkcd.com/

Found matching path : goodhttp://xkcd.com/funnyhttp://theonion.com/

Found matching path : funnyhttp://xkcd.com/

Found matching path : funnyhttp://theonion.com/

Found matching path : funnyhttp://xkcd.com/goodhttp://xkcd.com/

Found matching path : funnyhttp://xkcd.com/goodhttp://cnn.com/

Here’s the code if you are interested.

Filed under  //   graphdb   infinitegraph   nosql  


InfiniteGraph: First Look

Graph databases are great for use in social network analysis applications. I built an enterprise content analytics system on Neo4j that visualized and measured centrality over users and topics. Neo4j has a very clean API, a purist’s view of the graph for sure. Coming from an OODBMS background, I have seen my share of implementations that push too much of the object model into the database. It was nice to see Neo’s strict separation of application code from the persistence layer (objects versus nodes). However, I could still envision requirements that could use graph persistence and also need modeling complexity around the graph: a hybrid graphDB / objectDB. The recently announced InfiniteGraph may be such a datastore. InfiniteGraph is a graph database implementation based on an OODBMS engine, a blend of a graph API with a distributed object persistence store. The sample code that comes with the install is shown below. If you have any suggestions or questions, you can reach me here.

Filed under  //   graphdb   infinitegraph   nosql  
