Real-time Relationship Analytics From Large-scale Graph Processing

Cassandra excels at storing large, active, decentralized datasets. Its rich data model also makes it an efficient fit for many applications beyond simple associative arrays. One interesting application is the processing of large-scale graph structures.

I have devised a graph application layer that extracts social network analysis data from Cassandra and processes it with InfiniteGraph. The technical benefits of this social-graph-extract application layer, and of graph-oriented processing in general, have already been articulated.

Social network analysis is one application of a more general category, relationship analytics, as defined by Curt Monash. The relationship analytics problem domain maps well to the unique features of the Cassandra-InfiniteGraph hybrid system:

  • dedicated vertex/edge API
  • data can be clustered according to vertex/edge proximity
  • disk-based/memory-centric access
  • peer-to-peer communication from InfiniteGraph node to Cassandra node
  • bidirectional updates between raw Cassandra data and InfiniteGraph analytics
  • parallel streaming and caching from InfiniteGraph
  • modeling flexibility to support a variety of sources
  • redundancy and high-availability
  • precision and speed for graph analytics
  • finding extremely long paths, all paths, unknown paths, or paths of nontrivial or indeterminate length
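To make the path-finding item concrete, here is a minimal sketch of enumerating all cycle-free paths between two vertices, bounded by a hop limit. It is plain in-memory Python, not the InfiniteGraph or Cassandra API; the function names, the `max_hops` parameter, and the sample edges are all hypothetical and only illustrate the kind of query a graph layer answers.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build a directed adjacency map from (source, target) pairs."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return adjacency


def all_simple_paths(adjacency, start, goal, max_hops):
    """Yield every cycle-free path from start to goal with at most max_hops edges."""
    stack = [(start, [start])]
    while stack:
        vertex, path = stack.pop()
        if vertex == goal:
            yield path
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in path:  # skip vertices already on this path
                stack.append((neighbor, path + [neighbor]))


# Hypothetical sample data: who follows whom.
sample_edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
    ("bob", "dave"),
]
follows = build_adjacency(sample_edges)
paths_to_dave = sorted(all_simple_paths(follows, "alice", "dave", max_hops=3))
```

The hop limit matters because the number of simple paths can grow exponentially with graph size. A dedicated graph store is attractive for exactly this kind of traversal, where each step follows edges rather than joining tables.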

Current business problems that can utilize these features:

  • analyzing high-frequency trading
  • discovering high degrees of mutual interconnection in social networks
  • data mining subtle retail correlations
  • product recommendation engines
  • determining terrorist or criminal behavior inferred from known relationships
  • finding a pattern of relationships for fraud detection
  • investigating the directed relationships between proteins and genes
  • checking which entity has the shortest average connection to a group of others for cyber security (botnet controller)
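The last item, finding the entity with the shortest average connection to a group, can be sketched with a breadth-first search from each candidate. This is an illustrative in-memory Python version under stated assumptions: the graph is treated as undirected and unweighted, distance is counted in hops, and every name here (`closest_to_group`, the sample hosts) is hypothetical rather than part of the Cassandra / InfiniteGraph codebase.

```python
from collections import defaultdict, deque


def build_undirected(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to every reachable vertex."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


def closest_to_group(adjacency, candidates, group):
    """Return (candidate, average hops) for the candidate nearest the group on average.

    Candidates that cannot reach every group member are skipped.
    """
    if not group:
        raise ValueError("group must not be empty")
    best, best_average = None, float("inf")
    for candidate in candidates:
        distances = hop_distances(adjacency, candidate)
        if not all(member in distances for member in group):
            continue
        average = sum(distances[member] for member in group) / len(group)
        if average < best_average:
            best, best_average = candidate, average
    return best, best_average


# Hypothetical sample data: observed connections between hosts.
connections = build_undirected([
    ("c2", "bot1"),
    ("c2", "bot2"),
    ("c2", "bot3"),
    ("relay", "bot1"),
])
suspect, average_hops = closest_to_group(
    connections, ["c2", "relay"], ["bot1", "bot2", "bot3"]
)
```

In this toy data the host `c2` is one hop from every bot, so it is selected over `relay`. On a real dataset the same question runs over millions of edges, which is where clustering by vertex/edge proximity and memory-centric access pay off.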

The working codebase for this Cassandra / InfiniteGraph integration can be retrieved from GitHub. Forking of the main project is welcome (including downstream updates). If you have any questions or suggestions, please contact @toddstavish.

Filed under  //   cassandra   graphdb   relationship analytics  