Expected performance

Hi,
I am trying out Grakn and I consistently encounter very poor performance. My schema is very simple: 1 entity with 15 attributes and one inferred relationship (if entity e1 and entity e2 both have the same value for attribute ip, create relationship same-ip). I have about 50k entities loaded, and one entity usually has this relationship with 0 - 5 other entities.

When running simple queries I get instant response (e.g. match $x isa application; get; offset 0; limit 30;) but when querying for stuff related to the relationship (e.g. match $si (same-ip-app: $x, same-ip-app: $y) isa same-ip; get $x; offset 0; limit 10;) I have to wait like 15 to 30 minutes to actually get a response. I am using the official docker image and latest Workbase.

Is there something I did wrong in the setup or is it just not designed for this kind of schema and volume?

Hi - this does sound quite slow for the scale of data you have… can you post the schema to start? And if possible, also the data loader?

Hi,
thank you for answering. My schema looks like this: https://pastebin.com/JRVGZ2Cr (I renamed the unused attributes). I loaded it from two files (marked by the comment in the schema on line 36); I don’t know if that makes any difference.
I cannot post the data, but it’s basically 50k times this, with different values: https://pastebin.com/kVhuadWf
I also tried executing the same queries using the Grakn console, with very similar results. What would be the expected response time range for such queries?

I haven’t reproduced anything quite yet, but wanted to point something out that might already help:
In Grakn, every attribute with a specific value exists exactly once. Everything owning that attribute type and value points to the same vertex with a special ownership relation. You can actually rewrite your rule from:

ip-mutuality sub rule,
when {
    $a1 isa application;
    $a2 isa application;
    $a1 has device_ip_address $ip1;
    $a2 has device_ip_address $ip2;
    $ip1 == $ip2;
    $a1 != $a2;
}, then {
    (same-ip-app: $a1, same-ip-app: $a2) isa same-ip;
};

to

ip-mutuality sub rule,
when {
    $a1 isa application;
    $a2 isa application;
    $ip isa device_ip_address;
    $a1 has $ip;
    $a2 has $ip;
    $a1 != $a2;
}, then {
    (same-ip-app: $a1, same-ip-app: $a2) isa same-ip;
};

I’m not sure that will help with your performance issue, but it might be worth trying out first :) Let me know.

OMG that actually made such a difference, now I get results in 1-2 seconds! Thank you so much!

btw I had to change lines
$a1 has $ip;
$a2 has $ip;
to
$a1 has device_ip_address $ip;
$a2 has device_ip_address $ip;
to make it actually work.

Oh yeah forgot about that restriction! Cool!

The reason it was slow before is that you were doing an O(N^2) comparison (due to the ==) between all pairs of IP addresses, which is extremely expensive. Really, all data points that own the same attribute value are already linked through that value in Grakn’s native data model, so your rule is a bit redundant. However, if it makes your queries clearer, feel free to play around!
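
To make the difference concrete, here is a minimal Python sketch of the two strategies. It is only an illustration of the idea, not how Grakn evaluates rules internally; the function names and the sample data are made up for this example. The first function compares every pair of applications, the second groups applications by the shared IP value first and only pairs up owners of the same value. With 50k applications the pairwise version would look at roughly 1.25 billion unordered pairs.

```python
from collections import defaultdict
from itertools import combinations


def same_ip_pairs_naive(apps):
    """Compare the IPs of every pair of applications (quadratic in len(apps))."""
    pairs = set()
    comparisons = 0
    for (id1, ip1), (id2, ip2) in combinations(apps.items(), 2):
        comparisons += 1
        if ip1 == ip2:
            pairs.add(frozenset((id1, id2)))
    return pairs, comparisons


def same_ip_pairs_grouped(apps):
    """Group applications by IP value, then pair only within each group."""
    owners = defaultdict(list)
    for app_id, ip in apps.items():
        owners[ip].append(app_id)
    pairs = set()
    for ids in owners.values():
        for id1, id2 in combinations(ids, 2):
            pairs.add(frozenset((id1, id2)))
    return pairs


# Hypothetical sample data: application id -> device IP address
apps = {
    "app1": "10.0.0.1",
    "app2": "10.0.0.2",
    "app3": "10.0.0.1",
    "app4": "10.0.0.3",
}

naive_pairs, comparisons = same_ip_pairs_naive(apps)
grouped_pairs = same_ip_pairs_grouped(apps)
print(comparisons, naive_pairs == grouped_pairs)
```

Both functions return the same pairs; the grouped version just never looks at two applications that do not share an IP, which is roughly what binding both `has` patterns to the same `$ip` variable allows.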

Yeah, I understand that now. This rule was basically my hello-world introduction :) Awesome!