How To Use A Graph Database to Integrate And Analyze Relational Exports
Graph databases can be used to integrate and analyze data from disparate data sources. In this use case, three relational databases have been exported to CSV. Each relational export is ingested into its own sharded subgraph to increase performance and avoid lock contention when merging the datasets. Unique keys that appear in more than one data source provide the mechanism for linking the subgraphs produced by parsing the CSV files. A REST server then sends the merged graph to a visualization application for analysis.
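The key-based linking described above can be sketched in plain Java. This is a minimal in-memory illustration, not the InfiniteGraph API: the class `SubgraphLinker`, the `Vertex` and `Edge` records, and the assumption that the linking key sits in the first CSV column are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every name here is illustrative, not part of InfiniteGraph.
class SubgraphLinker {

    // A vertex built from one CSV row: the export it came from and its linking key.
    record Vertex(String source, String key) {}

    // A link between two vertices that share a key but come from different exports.
    record Edge(Vertex from, Vertex to) {}

    // Parses one export into a subgraph. Assumes the linking key is the first
    // column and that fields contain no quoted commas (naive split).
    static List<Vertex> parse(String source, List<String> csvRows) {
        List<Vertex> vertices = new ArrayList<>();
        for (String row : csvRows) {
            String key = row.split(",", -1)[0].trim();
            if (!key.isEmpty()) {
                vertices.add(new Vertex(source, key));
            }
        }
        return vertices;
    }

    // Links the subgraphs: a key seen earlier in a different export yields an edge.
    static List<Edge> link(List<List<Vertex>> subgraphs) {
        Map<String, List<Vertex>> index = new HashMap<>();
        List<Edge> edges = new ArrayList<>();
        for (List<Vertex> subgraph : subgraphs) {
            for (Vertex v : subgraph) {
                List<Vertex> seen = index.computeIfAbsent(v.key(), k -> new ArrayList<>());
                for (Vertex other : seen) {
                    if (!other.source().equals(v.source())) {
                        edges.add(new Edge(other, v));
                    }
                }
                seen.add(v);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Vertex> customers = parse("customers", List.of("k1,Alice", "k2,Bob"));
        List<Vertex> orders = parse("orders", List.of("k1,widget", "k3,gadget"));
        for (Edge e : link(List.of(customers, orders))) {
            System.out.println(e.from() + " <-> " + e.to());
        }
    }
}
```

In the sample data only `k1` occurs in both exports, so it is the only key that produces a link between the two subgraphs.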
The necessary components are below:
- InfiniteGraph supplies the distributed graph database
- RESTlet provides web service access to the data
- GSON and custom parsing code produce the JSON representation
- Gephi is used for interactive visualization and data exploration
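To give a feel for the JSON that travels from the REST server to Gephi, here is a hedged sketch of node and edge events. It assumes the event shape used by the Gephi Graph Streaming plugin (`an` for "add node", `ae` for "add edge"), which should be checked against the plugin's documentation. The class `GephiStreamEvents` and its methods are hypothetical, and the strings are built by hand only to keep the example free of dependencies; the sample application itself relies on GSON and custom parsing code.

```java
// Hypothetical sketch: hand-built JSON events in the shape assumed for the
// Gephi Graph Streaming plugin. A real service would use a JSON library.
class GephiStreamEvents {

    // Escapes a string for use inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds an "add node" event carrying a label attribute.
    static String addNode(String id, String label) {
        return "{\"an\":{\"" + escape(id) + "\":{\"label\":\"" + escape(label) + "\"}}}";
    }

    // Builds an undirected "add edge" event between two node ids.
    static String addEdge(String id, String source, String target) {
        return "{\"ae\":{\"" + escape(id) + "\":{\"source\":\"" + escape(source)
                + "\",\"target\":\"" + escape(target) + "\",\"directed\":false}}}";
    }

    public static void main(String[] args) {
        // One event per line, as a streaming client would typically consume them.
        System.out.println(addNode("n1", "Alice"));
        System.out.println(addNode("n2", "Order 17"));
        System.out.println(addEdge("e1", "n1", "n2"));
    }
}
```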
A graph index stores the keys that interlink the relational exports. Because this index is shared, the ingest can be split across multiple client-side threads. On the server side, InfiniteGraph offers parallel data loading, which scales horizontally to keep up with the multithreaded ingest. Once the ingest completes, the Gephi streaming API is used to request the graph data from the REST server. Gephi offers a variety of built-in graph algorithms and customizable visualization settings. The following screenshot shows Gephi visualizing the consolidated graph. The source code for this sample application can be found on GitHub. The code is completely self-contained and includes sample CSV data.
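The idea of a shared index feeding a multithreaded client-side ingest can be illustrated with standard Java concurrency utilities. This is only a sketch under stated assumptions: a `ConcurrentHashMap` stands in for the graph index, one task is submitted per export, and the linking key is assumed to be the first CSV column. The class `ParallelIngest` is hypothetical and does not use InfiniteGraph's own loading API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a concurrent map stands in for the shared graph index.
class ParallelIngest {

    // Ingests every export on its own thread and returns the shared index,
    // which maps each linking key to the names of the exports containing it.
    static Map<String, Set<String>> ingest(Map<String, List<String>> exports) {
        Map<String, Set<String>> index = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, exports.size()));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Map.Entry<String, List<String>> export : exports.entrySet()) {
                tasks.add(() -> {
                    for (String row : export.getValue()) {
                        // Assumption: the key is the first column, no quoted commas.
                        String key = row.split(",", -1)[0].trim();
                        if (key.isEmpty()) {
                            continue;
                        }
                        // computeIfAbsent is atomic, so threads never lose an entry.
                        index.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet())
                             .add(export.getKey());
                    }
                    return null;
                });
            }
            // invokeAll blocks until all tasks finish; get() surfaces any failure.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("ingest interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("ingest task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = ingest(Map.of(
                "customers", List.of("k1,Alice", "k2,Bob"),
                "orders", List.of("k1,widget", "k3,gadget"),
                "shipments", List.of("k3,pallet")));
        // Keys present in more than one export are the ones that link subgraphs.
        index.forEach((key, sources) -> {
            if (sources.size() > 1) {
                System.out.println(key + " links " + sources);
            }
        });
    }
}
```

Because each thread only touches the index through atomic map operations, the exports can be parsed independently and the linking keys resolved afterwards, which mirrors the separation of ingest and merge described above.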

