Data Modeling Bitcoin Blockchain to Neo4j

The goal of this project was to model historical blockchain data as a graph and to build an ETL system that loads the full dataset (blocks, transactions, inputs, outputs, addresses) into a Neo4j graph database. The solution I designed ran about 10x faster than the best reported system.
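
As a rough illustration of what such a graph model could look like, the sketch below turns one decoded transaction into node and relationship rows. The labels (`Transaction`, `Output`, `Address`) and relationship types (`SPENT_IN`, `CREATES`, `LOCKED_TO`) are hypothetical names chosen for this example, not the schema actually used in the project. Modeling each input as an edge from the spent output to the spending transaction is likewise an assumption made here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TxInput:
    prev_txid: str   # transaction that created the output being spent
    prev_index: int  # position of that output in the previous transaction


@dataclass
class TxOutput:
    index: int
    value_sat: int   # amount in satoshis
    address: str     # empty string when no address can be derived


@dataclass
class Transaction:
    txid: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def output_id(txid, index):
    """Outputs have no native id, so combine txid and position."""
    return f"{txid}:{index}"


def to_graph_rows(tx):
    """Return (nodes, relationships) for one transaction.

    nodes are (label, id) tuples; relationships are (start_id, type, end_id).
    """
    nodes = [("Transaction", tx.txid)]
    rels = []
    for i in tx.inputs:
        # Hypothetical choice: an input becomes an edge from the spent
        # output to the transaction that spends it.
        rels.append((output_id(i.prev_txid, i.prev_index), "SPENT_IN", tx.txid))
    for o in tx.outputs:
        oid = output_id(tx.txid, o.index)
        nodes.append(("Output", oid))
        rels.append((tx.txid, "CREATES", oid))
        if o.address:
            nodes.append(("Address", o.address))
            rels.append((oid, "LOCKED_TO", o.address))
    return nodes, rels
```

With this shape, following the flow of funds amounts to walking alternating `CREATES` and `SPENT_IN` edges. Address nodes would need de-duplication before loading, since the same address appears in many outputs.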

State-of-the-art systems performing this conversion run on multiple servers for anywhere from 10 days to several weeks. In this project I came up with a low-level bulk-load solution that loads all blockchain data into Neo4j overnight (~8 hours).

The process involved parsing the raw bitcoin-core database files in parallel and converting them into a Neo4j-compatible format. The data was then imported with Neo4j's low-level bulk import tools. The raw bitcoin source database was about 400GB and the resulting Neo4j database was about 1TB.
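
The parse-and-convert step can be sketched roughly as follows. This assumes the raw input is Bitcoin Core's `blk*.dat` files in their plain (non-obfuscated) mainnet layout, where each block is preceded by the 4-byte network magic and a 4-byte little-endian length. The function names, the chosen columns, and the `Block` label are illustrative, not the project's actual code. The CSV header line follows the `neo4j-admin` import convention (`:ID` with an ID space, typed properties, `:LABEL`).

```python
import csv
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
HEADER_SIZE = 80  # fixed size of a Bitcoin block header


def iter_raw_blocks(stream):
    """Yield raw block payloads from a blk*.dat-style byte stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file
        magic = prefix[:4]
        size = struct.unpack("<I", prefix[4:])[0]
        if magic != MAINNET_MAGIC:
            return  # zero padding or unexpected data: stop reading
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated block at the end of the file
        yield payload


def block_row(raw_block):
    """Build one CSV row from the 80-byte header of a raw block."""
    header = raw_block[:HEADER_SIZE]
    version, = struct.unpack_from("<i", header, 0)
    # Header layout: version(4) prev_hash(32) merkle_root(32) time(4) bits(4) nonce(4)
    timestamp, = struct.unpack_from("<I", header, 68)
    # Block hash is the double SHA-256 of the header, shown byte-reversed.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return [digest[::-1].hex(), version, timestamp, len(raw_block), "Block"]


def write_block_csv(stream, out):
    """Write block nodes as CSV for bulk import; return the block count."""
    writer = csv.writer(out)
    writer.writerow(["hash:ID(Block)", "version:int", "time:long", "size:int", ":LABEL"])
    count = 0
    for raw in iter_raw_blocks(stream):
        writer.writerow(block_row(raw))
        count += 1
    return count
```

Because each block file can be converted without reference to the others, a function like `write_block_csv` could be mapped over the files with a `multiprocessing.Pool`, one output CSV per input file. Transactions, outputs, and relationships would get their own CSV files in the same style, using `:START_ID`, `:END_ID`, and `:TYPE` columns for relationships.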

The system was designed to run once, as an initial bulk load. It was fully encapsulated in Docker containers, including bitcoin-core and its synchronization.

The final Neo4j graph was used by a fintech company to analyze bitcoin transaction flows and to run network centrality, path analysis, community detection, and more.

Data Scientist | Machine Learning Engineer | AI Advisor

20 years of experience in data processing from A to Z