
Simplifying the Python client queries (Grakn 2.0)

My Python-side queries seem complex and far from optimized, and I would like some direction on how to improve them.

Let’s take a simple case: say I have an entity of type person that looks like this:

person sub entity,
  owns email @key,
  owns first_name,
  owns last_name;

(all the attributes are strings)

I also have a Python person class with the same attributes.
I inserted a person into Grakn (with only an email) and now I want to fetch it (and later convert it back into a Person object). My current code for fetching an entity with all its attributes is:

def get_generic_entity_query(query: str, server: Optional[str] = None, database: Optional[str] = None,
                             fetch_attr: bool = True) -> List[Dict[str, Union[dict, Any]]]:
    server, database = set_default_connection_data(server, database)
    query_results = []
    with GraknClient(address=server) as client:
        with client.session(database,  SessionType.DATA) as session:
            with session.transaction(TransactionType.READ) as read_transaction:
                answer_iterator = read_transaction.query().match(query)
                answers = list(answer_iterator)
                for answer in answers:
                    for answer_k, answer_v in answer.map().items():
                        if answer_v.is_entity():
                            remote_answer_v = answer_v.as_remote(read_transaction)
                            res = {"Type": remote_answer_v.get_type().get_label(), "id": remote_answer_v.get_iid()}
                            if fetch_attr:
                                attrs = remote_answer_v.get_has()
                                attrs_dict = {}
                                for each in attrs:
                                    remote_each = each.as_remote(read_transaction)
                                    if remote_each.get_type().get_label() in attrs_dict:
                                        if type(attrs_dict[remote_each.get_type().get_label()]) is list:
                                            attrs_dict[remote_each.get_type().get_label()].append(remote_each.get_value())
                                        else:
                                            attrs_dict[remote_each.get_type().get_label()] = [attrs_dict[remote_each.get_type().get_label()],
                                                                                              remote_each.get_value()]
                                    else:
                                        attrs_dict[remote_each.get_type().get_label()] = remote_each.get_value()
                                res["attributes"] = attrs_dict
                            query_results.append(res)
    return query_results

For the query:
match $ent isa person, has email "dddd@fff.com"; get $ent; limit 1;

I will get:
[{'Type': 'person', 'id': '966e80028000000000000000', 'attributes': {'email': 'dddd@fff.com'}}]

Now, this looks like a lot of code and a lot of round trips for a simple Grakn query (find the entity and return it with all its attributes).
Is there a significantly simpler way to do this?
If not then:

  1. How can I simplify and optimize the above method?
  2. Will requesting all the attributes explicitly help?
  3. Can we add (in the future, I know you are busy at this time) a high-level client that keeps all these round trips on the server wherever possible? I believe a simple JSON response is feasible for the large majority of queries and could be built efficiently on the server. Then, on the client side, I would only need to drop to the lower levels for more complex tasks.

I hope there is a simpler way, or that we will be able to make things simpler. Thanks for reading this long question :flushed:

I’ll post a general guideline on how to do some of the things you’re trying to do:
If you really want to dump the entire type and contents of each entity, you’re going to have to do more round trips. In general, we avoid writing “generic” queries like the one above, as it’s inefficient to dump pieces of the knowledge graph like this. Instead, it’s better to write custom queries to extract the information you’re looking for.

To address your exact question first:

def get_query_entities_dump_json(query, ...[other required params]):
...session.transaction(READ) as tx:
  answer_iter = tx.query().match(query)

  dump = []

  for answer in answer_iter:
    for var, concept in answer.map().items():
      if concept.is_entity():
        entity_dump = { "type" : concept.as_remote(tx).get_type().label(), "iid" : concept.get_iid()}
        entity_dump["attributes"] = dump_entity_attributes(entity, tx)
        dump.append(entity_dump)

  return dump

def dump_entity_attributes(entity, tx):
  get_everything_query = "match $x iid {}, has $a; $a isa! $attr-type;"
  dump_query =  get_everything_query.format(entity.get_iid())
  attributes = {}
  answers = tx.query().match(dump_query)
  for ans in answers:
    attributes[ans.get("attr-type").get_label()] = ans.get("a").get_value()
  return attributes

The dumping of attributes is at least streamed back from the server, so it’s the minimal number of round trips.
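One thing to watch in dump_entity_attributes above: assigning attributes[label] = value keeps only the last value when an entity owns several attributes of the same type. A minimal sketch of grouping them instead, assuming the loop over answers is reduced to (label, value) pairs first. The helper name group_attributes and the sample values are hypothetical and not part of the client:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_attributes(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Group (attribute label, value) pairs into label -> list of values."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for label, value in pairs:
        grouped[label].append(value)
    # Return a plain dict so lookups of missing labels raise KeyError as usual.
    return dict(grouped)


# Hypothetical pairs, shaped like (attr-type label, attribute value) per answer.
sample_pairs = [
    ("email", "dddd@fff.com"),
    ("first_name", "Dana"),
    ("first_name", "Dee"),
]
attributes = group_attributes(sample_pairs)
```

Every label then maps to a list, so the calling code does not need to check whether a value is a single item or a list.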

The preferred, more “Grakn” way to do this would be to write the query you actually require in the first place, and have a way to convert that kind of query into a JSON object:

For example, if you want to get people, and their emails:
match $x isa $person-type, has $email; $email isa $email-type; $person type person; $email-type type email;…
Then figure out a way to convert this into JSON (perhaps based on a variable naming convention you come up with, to avoid round trips).
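As one illustration of such a naming convention (an assumption for the sake of example, not something the client provides): name variables like $person__email, read each answer into a flat dict of variable name to value, and split the names on a separator to build the nested JSON. The function name nest_by_variable_name and the "__" separator are hypothetical:

```python
from typing import Any, Dict


def nest_by_variable_name(flat: Dict[str, Any], sep: str = "__") -> Dict[str, Any]:
    """Turn {"person__email": v} into {"person": {"email": v}}.

    Assumes no variable name is both a leaf and a prefix of another name
    (e.g. "person" together with "person__email").
    """
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = nested
        for part in parents:
            # Create the intermediate object on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical flat answer: variable name -> attribute value.
flat_answer = {"person__email": "dddd@fff.com", "person__first_name": "Dana"}
person_json = nest_by_variable_name(flat_answer)
```

Because the structure is encoded in the variable names, no further queries are needed to work out which value belongs where.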

In essence, being as specific as you can with your original query means you do the least number of round trips and get the data you’re looking for up front. Of course, if you do want to get all attributes, even ones you didn’t explicitly ask for, you’ll have to use the Concept API as in your code, or a more generic query as in my rewrite of your code.

Hope this helps!

Thanks :smiley:
This seems exactly like what I was looking for.

I will implement the specific object builder as suggested (as well as the generic one for rapid prototyping).

Many thanks, this was very helpful.


Hi @joshua,
I implemented both.
The general query works well: given the key/attribute value and the object type, it returns a list of instantiated objects.

For the more explicit query, I would be happy to understand why you wrote the query the way you did, and how that affects what happens inside Grakn.

Another question is:

My current implementation has Python code that is something like:

person = Person(email=EmailStr("dddd@fff.com"))
....
extended_person = extend_person(person, attr=["first_name", "last_name"])

This generates the following query (based on the data that was given and the data that is being requested):

match $x isa person, has email "abc@aaa.com", has first_name $first_name, has last_name $last_name; get $first_name, $last_name;

Is there a way to build the query so that, if a person has only a first name set (the last_name attribute was never set), the query still returns just the first name?
Currently it returns nothing, because the person has no last_name.

Many thanks

Do you mean this explicit query?

match $x isa $person-type, has $email; 
    $email isa $email-type; $email-type type email;
    $person-type type person; 

I did it this way so you get the type of the attribute as well as the instances of $x. If you don’t include explicitly named variables such as $person-type, you won’t get the instance type back! We only return user-named variables.
E.g. match $x isa person, has email $email; doesn’t return the type of $x or $email; you’ll have to do more queries to find them.

If you want the exact type only, you should use the following:

match $x isa! $person-type, has $email; 
    $email isa! $email-type; $email-type type email;
    $person-type type person; 

If you’re essentially trying to get all the attributes associated with a particular person, your best bet is to collect the set of answers that come back from a query such as:

match $x isa person, has email "abc@aaa.com", has $attr; $attr isa! $attr-type; 
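A minimal sketch of how that could be wired up on the Python side: build the "all attributes" query from the known email, then keep only the requested labels from whatever comes back, so a missing last_name simply yields an empty list instead of no answer. The function names are hypothetical, the (label, value) pairs stand in for values read from each answer, and the email is interpolated without escaping, which is an assumption that it contains no double quotes:

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_person_attributes_query(email: str) -> str:
    """Build a query that returns every attribute owned by the person."""
    return (
        'match $x isa person, has email "{}", has $attr; '
        "$attr isa! $attr-type;".format(email)
    )


def select_requested(
    pairs: Iterable[Tuple[str, Any]], requested: Iterable[str]
) -> Dict[str, List[Any]]:
    """Keep only the requested attribute labels; absent ones map to []."""
    selected: Dict[str, List[Any]] = {label: [] for label in requested}
    for label, value in pairs:
        if label in selected:
            selected[label].append(value)
    return selected


query = build_person_attributes_query("abc@aaa.com")

# Hypothetical (attr-type label, value) pairs for a person with no last_name.
returned_pairs = [("email", "abc@aaa.com"), ("first_name", "Ann")]
extended = select_requested(returned_pairs, ["first_name", "last_name"])
```

The filtering happens client-side after a single streamed query, so the number of round trips does not grow with the number of requested attributes.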

@joshua, this code snippet has been useful to me and may be useful to others, so here are a couple of observations.

I think that this line:

entity_dump["attributes"] = dump_entity_attributes(entity, tx)

should be written as:

entity_dump["attributes"] = dump_entity_attributes(concept, tx)

and maybe this code is out of date, because to run it I also had to replace:

 entity_dump = { "type" : concept.as_remote(tx).get_type().label(), "iid" : concept.get_iid()}

with:

 entity_dump = { "type" : concept.as_remote(tx).get_type().get_label(), "iid" : concept.get_iid()}
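Putting both fixes together, a minimal sketch of the corrected snippet might look like the following. It assumes an already-open read transaction is passed in as tx, uses only the client calls that appear earlier in this thread, and has not been checked against a particular client release:

```python
from typing import Any, Dict, List


def dump_entity_attributes(entity: Any, tx: Any) -> Dict[str, Any]:
    """Fetch every attribute owned by `entity` with one streamed query."""
    query = "match $x iid {}, has $a; $a isa! $attr-type;".format(entity.get_iid())
    attributes: Dict[str, Any] = {}
    for ans in tx.query().match(query):
        # Note: a later value overwrites an earlier one with the same label.
        attributes[ans.get("attr-type").get_label()] = ans.get("a").get_value()
    return attributes


def get_query_entities_dump_json(query: str, tx: Any) -> List[Dict[str, Any]]:
    """Run `query` and dump each entity in the answers as a JSON-like dict."""
    dump: List[Dict[str, Any]] = []
    for answer in tx.query().match(query):
        for var, concept in answer.map().items():
            if concept.is_entity():
                dump.append({
                    "type": concept.as_remote(tx).get_type().get_label(),
                    "iid": concept.get_iid(),
                    "attributes": dump_entity_attributes(concept, tx),
                })
    return dump
```

It would be called inside the usual client/session/transaction context managers, for example get_query_entities_dump_json(query, read_transaction).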