
Is it possible to use an external (distributed) Cassandra as a backend?

Hello,

I am new to GRAKN and it looks promising!

Is it correct that GRAKN (by default) uses a Cassandra database as its backend, with a TinkerPop/JanusGraph software layer on top? I cannot find any description of this in the documentation; it is only mentioned in other threads and blogs.

If so, is it possible to set up your own (possibly distributed) Cassandra cluster and use that instead? I would like to know whether this is possible with GRAKN Core alone, without GRAKN KBMS.

Best regards,
Pieter.

Hi @pietermarsman!

Yes, that is correct: Grakn is built on top of Cassandra and TinkerPop/JanusGraph.

However, they are not separate entities; they more or less come in one box. Grakn is not designed to run with a separate Cassandra cluster.

If you need a cluster, you are encouraged to use KGMS (KBMS is the old name) which is available via Google Cloud or Amazon AWS.

Best regards,

Ganesh

@ganesh

What is the reasoning behind not allowing the use of an external Cassandra cluster?
Cassandra was designed as a clustered data store, and its true power only shows once the cluster is sufficiently large. Running an embedded standalone instance is fine for local development work, but it is not acceptable in production.
I understand KGMS is an option, but it gives the appearance that people are being forced to pay for it.

I figured out how to get Grakn to connect to an external Cassandra cluster.
Before starting the server, modify grakn.properties with the Cassandra host like this:

sed -i -e 's|storage.hostname=.*|storage.hostname=${var.cassandra_host}|' /grakn-core-all-linux/server/conf/grakn.properties

Unfortunately the embedded Cassandra instance will still run and waste a bit of resources.
This proves that it is possible to use an external Cassandra cluster.
I hope this setup will one day be officially supported.
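
For anyone who would rather script the same edit than rely on sed, here is a minimal sketch in Python. It is only an illustration of the property rewrite described above, not an officially supported procedure. The function name set_storage_hostname and the sample property text are made up for this example. Unlike the sed expression, this version only matches storage.hostname at the start of a line, and it raises an error if the key is missing instead of silently doing nothing.

```python
import re

# Matches a whole "storage.hostname=..." line (anchored at line start).
_HOSTNAME_LINE = re.compile(r"^storage\.hostname=.*$", flags=re.MULTILINE)


def set_storage_hostname(properties_text: str, host: str) -> str:
    """Return the properties text with storage.hostname set to `host`.

    Hypothetical helper for illustration; it works on the file contents
    as a string so the caller decides how to read and write the file.
    """
    new_line = f"storage.hostname={host}"
    # A function replacement avoids interpreting backslashes in `host`.
    updated, count = _HOSTNAME_LINE.subn(lambda _match: new_line, properties_text)
    if count == 0:
        raise ValueError("no storage.hostname entry found")
    return updated


if __name__ == "__main__":
    # Made-up sample content, not the real grakn.properties.
    sample = "some.other.key=value\nstorage.hostname=127.0.0.1\n"
    print(set_storage_hostname(sample, "cassandra.example.internal"))
```

To apply it, read grakn.properties into a string, pass it through the function together with your Cassandra host, and write the result back before starting the server.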

Hi Mingfang, while you can indeed theoretically connect one Grakn server to a cluster of Cassandra instances, this will not scale the querying and processing; it will only scale the storage (which in turn requires more processing power at query time as the storage gets bigger). Grakn is not designed to work in this way, as Ganesh points out. Scaling Grakn means running multiple Grakn server instances, each with its own Cassandra instance, and that requires features that are only implemented in KGMS.

Grakn KGMS is indeed our enterprise product, which you may use on a per-hour consumption basis on various cloud marketplaces without making any long-term commitment. We do see that people need a way to evaluate KGMS, which is why we will also offer a free tier for Grakn KGMS in the future.

@tomas Thanks for the clarification.
Are there any plans to open source KGMS?

At the moment we do not have plans to open source Grakn KGMS, as it’s our enterprise product, used by organisations that are building software running in production. To better understand your use case and how we could help you, please feel free to message me directly and we can talk in more depth.

Tomás