Spark Grakn Integration

Hello folks, are there any plans to create a Grakn connector for Spark?

Reason: someone like me needs to do heavy-duty custom analytics on the hypergraph.

If there were a way to connect Grakn data to Spark, one could run the advanced analytics there and import the results back into Grakn.

Thanks
Mirco

Hi Mirco!

We are planning to support that in Grakn KGMS, our scalable enterprise offering. You can check with @tomas (tomas@grakn.ai) if you are interested.

Best regards,

Ganesh

Hi Ganesh, thanks! Of course that interests me.

@tomas, any idea when this release will be?

Regards,
Mirco

Mirco,

My bad, I may have misunderstood your question. If by “Spark connector” you mean the ability to access the graph as a Spark RDD, unfortunately that is not yet on our roadmap for either Core or KGMS.

I can raise it with the team for consideration if you think this is an essential feature.

Best regards,

Ganesh

Ganesh, no problem. Yes, for me it is an essential feature. Why? Because if you want to do really heavy-duty, innovative analytics on a hypergraph, I see only two ways:

  1. build a plug-in architecture that lets users add custom code to be executed by Graql queries (a bit like functions in SQL), or

  2. what I propose: provide a connector to Spark, move your hypergraph into Spark GraphFrames, do the magic there, and go back to Grakn

Best
Mirco
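
To make the second option a bit more concrete, here is a minimal sketch of the round trip. It is not an existing Grakn or Spark API: plain Python lists stand in for Spark DataFrames so that it runs anywhere, and the sample data, field names, and helper functions (to_vertex_edge_tables, in_degree) are all hypothetical. The one real convention it follows is the table shape GraphFrames expects: vertices with an "id" column, edges with "src" and "dst" columns. Because a hyper-relation can have more than two role players, each relation is turned into its own vertex with one edge per role player.

```python
from collections import Counter

# Hypothetical export of a tiny hypergraph. The ids, types, and role names
# are made up for illustration; a real export would come from Grakn.
entities = [
    {"id": "p1", "type": "person"},
    {"id": "p2", "type": "person"},
    {"id": "c1", "type": "company"},
]
relations = [
    {"id": "r1", "type": "employment",
     "players": [("employee", "p1"), ("employer", "c1")]},
    {"id": "r2", "type": "employment",
     "players": [("employee", "p2"), ("employer", "c1")]},
    {"id": "r3", "type": "friendship",
     "players": [("friend", "p1"), ("friend", "p2")]},
]


def to_vertex_edge_tables(entities, relations):
    """Turn each relation into a vertex and emit one edge per role player."""
    vertices = [{"id": e["id"], "type": e["type"]} for e in entities]
    edges = []
    for rel in relations:
        vertices.append({"id": rel["id"], "type": rel["type"]})
        for role, player_id in rel["players"]:
            edges.append({"src": rel["id"], "dst": player_id, "role": role})
    return vertices, edges


def in_degree(edges):
    """Stand-in for the 'magic': count role-player edges pointing at each vertex."""
    return Counter(edge["dst"] for edge in edges)


vertices, edges = to_vertex_edge_tables(entities, relations)
results = in_degree(edges)

# Rows that a write-back step could load into Grakn as a new attribute.
write_back = [
    {"id": vertex_id, "role_player_edges": count}
    for vertex_id, count in sorted(results.items())
]
```

In Spark, the two lists would become DataFrames passed to GraphFrame(vertices, edges), and the in_degree step would be replaced by whatever analytics you actually need. The export from Grakn and the import back are left out because no connector exists for them yet.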

In this blog post on DZone (https://dzone.com/articles/get-started-with-graknai) it looks like GRAKN is built on top of Apache Spark. Is that something different?

As an intermediate solution, you might consider connecting Spark to the Cassandra cluster directly. I am not an expert, but I think this would give you primitive access to the graph through Spark RDDs or DataFrames.

Apache Spark is used internally by Grakn to run compute queries in a distributed fashion. It is not exposed to users.