
Any thoughts on loading the Bitcoin blockchain into Grakn?

Might be an interesting use case: a Grakn-based query & browser tool for Bitcoin. Even better if it could be linked to other data sources, e.g. on wallet owners, OpenCorporates, etc. I’m thinking of fraud investigation, due diligence, etc.

It’s quite a large and growing dataset. How would Grakn cope with the volume? It’s been done for Neo4j and is apparently reasonably fast.

The combination of blockchain raw data with an entire environment of related data is precisely what Grakn excels at!
In terms of performance, we’re about to start benchmarking and iterating on performance. It would be an interesting stress test to load the Bitcoin blockchain. (How big is it? What kind of queries would you perform?)
Without testing it out it’s hard to say how the load would be handled.
That being said, it’s possible to distribute Grakn or spin up a larger instance on GCP/AWS to improve performance if that is the bottleneck.

I believe the Bitcoin blockchain is currently over 200 GB and obviously growing. I am thinking about using Grakn to query the provenance of particular bitcoins by tracing the chain of transactions backwards across the wallets (nodes) to a chosen depth.
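
A minimal sketch of the depth-limited backwards trace, written as a plain breadth-first search in Python rather than as a Grakn query. It assumes a simplified, hypothetical wallet-level graph (`RECEIVED_FROM`) that has already been retrieved; real Bitcoin provenance works on transaction outputs, so treat the wallet model as an illustration only.

```python
from collections import deque

# Hypothetical wallet graph: each wallet maps to the wallets it has received
# bitcoin from. In practice this would be retrieved from the loaded blockchain.
RECEIVED_FROM = {
    "w_target": ["w_a", "w_b"],
    "w_a": ["w_c"],
    "w_b": ["w_c", "w_d"],
    "w_c": ["w_e"],
    "w_d": [],
    "w_e": [],
}


def trace_provenance(start, max_depth):
    """Walk backwards from `start` breadth-first, at most `max_depth` hops.

    Returns {wallet: hops}, the hop count at which each wallet was first reached.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        wallet = queue.popleft()
        if hops[wallet] == max_depth:
            continue  # stop expanding at the chosen depth
        for source in RECEIVED_FROM.get(wallet, []):
            if source not in hops:  # visit each wallet once, even if cycles exist
                hops[source] = hops[wallet] + 1
                queue.append(source)
    return hops


print(trace_provenance("w_target", 2))
```

Breadth-first order means each wallet is recorded at its shortest distance from the starting wallet, which keeps the depth limit meaningful even when funds pass through the same wallet by several routes.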
I might also want to do analysis in the vicinity of chosen wallets to see if there are patterns of busy transaction routes or clusters of activity, e.g. representing money laundering across a chain of wallets.
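
One simple way to surface "busy transaction routes" near a wallet is to count transfers per (sender, receiver) pair. The sketch below assumes a hypothetical, already-retrieved list of transfers (`TRANSFERS`); a real analysis would likely also weight by amount and restrict to a time window.

```python
from collections import Counter

# Hypothetical transfers in the neighbourhood of a wallet under investigation:
# (sender_wallet, receiver_wallet, amount_in_satoshis)
TRANSFERS = [
    ("w1", "w2", 50_000_000),
    ("w1", "w2", 70_000_000),
    ("w2", "w3", 110_000_000),
    ("w1", "w2", 20_000_000),
    ("w3", "w4", 100_000_000),
    ("w2", "w3", 40_000_000),
]


def busiest_routes(transfers, top_n):
    """Rank (sender, receiver) pairs by the number of transfers between them."""
    counts = Counter((src, dst) for src, dst, _amount in transfers)
    return counts.most_common(top_n)


print(busiest_routes(TRANSFERS, 2))
```

Here the top routes `w1 -> w2` and `w2 -> w3` also link up into a chain of wallets, which is the kind of pattern the post describes.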
Another, simpler query is to work out the ‘balance’ in a wallet, which is the sum of all the ‘unspent transaction outputs’ (UTXOs) in the wallet, i.e. the total bitcoin received (inflows) that has not yet been spent. The balance needs to be adjusted for unconfirmed transactions that have not yet been validated by miners, like uncleared cheques in your bank account. For an unconfirmed outgoing transaction, any ‘change’ it returns to the wallet must be counted back in. Change arises because each amount of bitcoin you receive into your wallet can only be spent as a whole. If you want to pay less, you send out the full amount and send the change back to yourself.
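
A minimal sketch of that balance calculation, done client-side in plain Python after the data has been retrieved. The data model (`Utxo`, `PendingSpend`, the outpoint strings) is hypothetical and not a Grakn API. It assumes each unconfirmed outgoing transaction lists which wallet UTXOs it consumes, and it includes a miner fee, which the post doesn't mention but which Bitcoin takes as inputs minus outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utxo:
    outpoint: str    # identifies the output, e.g. "<txid>:<index>" (illustrative)
    amount_sat: int  # integer satoshis (1 BTC = 100,000,000 sat) avoid float rounding


@dataclass
class PendingSpend:
    """An unconfirmed outgoing transaction from this wallet (hypothetical model)."""
    spent: List[str]  # outpoints of wallet UTXOs consumed, each spent in full
    payment_sat: int  # amount sent to the recipient
    fee_sat: int      # miner fee; change = inputs - payment - fee


def balances(utxos, pending_spends, pending_incoming_sat):
    """Return (confirmed_balance, balance_after_pending) in satoshis."""
    amounts = {u.outpoint: u.amount_sat for u in utxos}
    confirmed = sum(amounts.values())

    spent, change = set(), 0
    for p in pending_spends:
        inputs = sum(amounts[o] for o in p.spent)
        change += inputs - p.payment_sat - p.fee_sat  # returned to this wallet
        spent.update(p.spent)

    still_unspent = sum(a for o, a in amounts.items() if o not in spent)
    return confirmed, still_unspent + change + sum(pending_incoming_sat)


UTXOS = [Utxo("a:0", 50_000_000), Utxo("b:1", 30_000_000)]
PENDING = [PendingSpend(spent=["a:0"], payment_sat=20_000_000, fee_sat=10_000)]
print(balances(UTXOS, PENDING, [5_000_000]))
```

In this example the whole 0.5 BTC output `a:0` is spent to pay 0.2 BTC, so 0.2999 BTC comes back as change; the pending balance is the untouched `b:1`, plus that change, plus 0.05 BTC of unconfirmed incoming funds.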

It sounds like you may need arithmetic in your queries, which we don’t support yet, though it is planned for this year. A perfectly valid alternative is to do most of the arithmetic on the client, post-retrieval.

The best way to take advantage of Grakn would be to model the larger Bitcoin environment to do exactly what you’re proposing: pattern-match for money laundering, etc. (in other words, explore the previously unknown relationships in the domain). Your best bet is to give it a shot! If you run into performance issues you can scale horizontally or vertically on Google Cloud or AWS, but expect baseline improvements in the next 3-6 months too!