Stream Processing Blockchain to Neo4j
The goal of this project was to build an ETL system that processes live Bitcoin transaction data and loads it into a Neo4j graph database on the fly. The data included not only the blocks themselves but also full transaction details and the interactions between addresses.
This project was a continuation of my previous project, in which I built a system for efficient bulk loading of all historical data.
In this project, I used Kafka as the data bus and divided the process into several stages:
- monitoring Bitcoin nodes,
- gathering data via the Bitcoin JSON-RPC API,
- pre-processing and cleaning the data,
- aligning the data with the Neo4j graph model designed in the previous project,
- loading the data into Neo4j and processing the Neo4j transactions.
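The gathering and alignment stages above can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes Bitcoin Core's `getblock` RPC called with verbosity 2, which returns fully decoded transactions. The helper names `rpc_payload` and `block_to_graph_rows` are hypothetical. The labels and relationship types in `LOAD_OUTPUTS` (`Transaction`, `Output`, `Address`, `CREATED`, `LOCKED_TO`) are a hypothetical stand-in for the graph model from the previous project, which is not described here.

```python
import json
from typing import Any

# Hypothetical Cypher for one slice of an ASSUMED graph model (not the original model).
# With the official neo4j Python driver it could be run as session.run(LOAD_OUTPUTS, rows=rows).
LOAD_OUTPUTS = """
UNWIND $rows AS row
MERGE (t:Transaction {txid: row.txid})
MERGE (o:Output {txid: row.txid, n: row.n})
SET o.value = row.value
MERGE (t)-[:CREATED]->(o)
FOREACH (x IN CASE WHEN row.address IS NULL THEN [] ELSE [1] END |
  MERGE (a:Address {address: row.address})
  MERGE (o)-[:LOCKED_TO]->(a))
"""


def rpc_payload(method: str, params: list[Any], request_id: int = 0) -> str:
    """Serialize a Bitcoin Core JSON-RPC request body (to be POSTed to the node)."""
    return json.dumps({"jsonrpc": "1.0", "id": request_id, "method": method, "params": params})


def block_to_graph_rows(block: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Flatten a getblock (verbosity=2) result into row lists for batched loading."""
    rows: dict[str, list[dict[str, Any]]] = {
        "blocks": [], "transactions": [], "spends": [], "outputs": []
    }
    rows["blocks"].append({
        "hash": block["hash"],
        "height": block["height"],
        "time": block["time"],
        "prev": block.get("previousblockhash"),  # absent for the genesis block
    })
    for tx in block["tx"]:
        rows["transactions"].append({"txid": tx["txid"], "block": block["hash"]})
        for vin in tx["vin"]:
            if "coinbase" in vin:
                continue  # coinbase inputs create new coins and spend no previous output
            rows["spends"].append({"txid": tx["txid"], "prev_txid": vin["txid"], "prev_n": vin["vout"]})
        for vout in tx["vout"]:
            spk = vout.get("scriptPubKey", {})
            # Newer Bitcoin Core releases report a single "address"; older ones an "addresses" list.
            address = spk.get("address") or next(iter(spk.get("addresses", [])), None)
            rows["outputs"].append({"txid": tx["txid"], "n": vout["n"], "value": vout["value"], "address": address})
    return rows
```

In a pipeline like the one described, each row list could be published to its own Kafka topic and consumed by the loading stage. Sending whole batches through `UNWIND` keeps the number of Neo4j round trips per block small.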
The system was designed to run continuously, ingesting live increments of the blockchain in a streaming fashion. It was fully encapsulated in Docker containers.
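The long-running, incremental ingestion can be sketched as a polling loop that remembers the last processed block height. This is a minimal sketch under stated assumptions. `get_tip` stands in for a call to Bitcoin Core's `getblockcount` RPC. Waiting for a fixed number of confirmations (six by default here) is an assumed way to avoid ingesting blocks that a chain reorganization might later drop; the text above does not say how reorgs were handled. The names `heights_to_process` and `poll_new_heights` are hypothetical.

```python
import time
from itertools import islice
from typing import Callable, Iterator


def heights_to_process(last_processed: int, tip_height: int, confirmations: int = 6) -> range:
    """Heights after `last_processed` that already have at least `confirmations` confirmations."""
    # A block at height h has (tip_height - h + 1) confirmations.
    safe_tip = tip_height - confirmations + 1
    return range(last_processed + 1, max(last_processed, safe_tip) + 1)


def poll_new_heights(
    get_tip: Callable[[], int],
    start_after: int,
    confirmations: int = 6,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int]:
    """Yield new, sufficiently confirmed block heights in order, polling forever."""
    last = start_after  # assumption: a real deployment would persist this cursor
    while True:
        for height in heights_to_process(last, get_tip(), confirmations):
            yield height  # e.g. fetch the block via JSON-RPC and publish it to Kafka
            last = height
        sleep(interval)  # wait before asking the node for its tip again


if __name__ == "__main__":
    tips = iter([105, 108, 110])  # simulated successive getblockcount results
    stream = poll_new_heights(lambda: next(tips), start_after=100, sleep=lambda s: None)
    print(list(islice(stream, 5)))  # prints [101, 102, 103, 104, 105]
```

Keeping the cursor separate from the fetching logic makes it straightforward to checkpoint, so a restarted container could resume from the last processed height instead of starting over.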
The resulting Neo4j graph was used by a fintech company to analyze Bitcoin transaction flows, network centrality, paths, and communities, among many other analyses.