Search for Reference Architecture to build a Correlation Engine

We are developing a correlation engine that pulls data from multiple sources, each with its own data format. The data is structured, and we need to perform the following tasks to build the data pipeline:
1- Fetch data from data sources.
2- Parse data to convert it into same format.
3- Ingest data into Grakn db.
4- Build correlations between data of different sources.
5- Keep the system scalable so that more data sources can be added in the future.

For this purpose, we need a relevant reference architecture in which Grakn is used as the consumption point of the data pipeline.

Please share your thoughts and relevant articles.

Hi @mkamran, welcome to the community!

It would help to know the domain you are operating in, as well as how you intend to load data into TypeDB.

When you say “parse data to convert it into the same format”, this is done during the loading process, which requires some manual mapping of the data from its current format to the TypeDB schema you’ve built.
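To make the mapping step a bit more concrete, here is a minimal Python sketch of how per-source parsers could normalize records into one common shape and then turn each record into a TypeQL insert statement. Everything specific in it is an assumption for illustration only: the source names (`source_a`, `source_b`), the input columns, the `device` entity, and the `hostname` and `ip` attributes are hypothetical and would need to match your own data and schema. It only builds query strings and does not call any TypeDB client API.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Common record shape shared by all sources (hypothetical field names).
Record = Dict[str, str]


def parse_csv_source(text: str) -> List[Record]:
    """Hypothetical source A: CSV with columns 'host' and 'ip_addr'."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"hostname": r["host"], "ip": r["ip_addr"]} for r in rows]


def parse_json_source(text: str) -> List[Record]:
    """Hypothetical source B: JSON list of objects with 'name' and 'address'."""
    return [{"hostname": o["name"], "ip": o["address"]} for o in json.loads(text)]


# Registry of parsers: supporting a new source means adding one entry here.
PARSERS: Dict[str, Callable[[str], List[Record]]] = {
    "source_a": parse_csv_source,
    "source_b": parse_json_source,
}


def normalize(source_name: str, raw: str) -> List[Record]:
    """Convert raw text from a named source into the common record shape."""
    return PARSERS[source_name](raw)


def to_insert_query(record: Record) -> str:
    """Build a TypeQL insert string for one record.

    Assumes a schema with a 'device' entity owning 'hostname' and 'ip'
    string attributes (hypothetical). json.dumps is used here only as a
    simple way to produce a double-quoted, escaped string literal.
    """
    hostname = json.dumps(record["hostname"], ensure_ascii=False)
    ip = json.dumps(record["ip"], ensure_ascii=False)
    return f"insert $d isa device, has hostname {hostname}, has ip {ip};"


if __name__ == "__main__":
    raw_csv = "host,ip_addr\nweb01,10.0.0.1\n"
    raw_json = '[{"name": "db01", "address": "10.0.0.2"}]'
    for source, raw in (("source_a", raw_csv), ("source_b", raw_json)):
        for rec in normalize(source, raw):
            print(to_insert_query(rec))
```

Keeping the parsers behind a small registry like this is one way to address the scalability concern: a new data source only needs its own parser function, while the ingestion code stays unchanged.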

You can have a look at the BioGrakn Covid repo to see how this was done in the biomedical space. Also, take a look at TypeDB Loader (formerly known as GraMi), which makes this process easier.

The dynamic nature of the TypeDB schema makes it really simple to define new concepts and to redefine what those correlations are based on.

Lastly, I would recommend joining our Discord server while you work through this. There are many community members who are doing similar things and can lend you their insights.



Hi Daniel,
I have joined the Discord server. Can you please guide me on how to interact with the community there? I can see that there are multiple channels, but they are not public, so I need some guidance on this.