
Handling Rules at Different Levels of Abstraction

Hello,

I am relatively new to Grakn and am trying to figure out how to model my schema appropriately. I will illustrate my question with a specific example.

Let’s say I am trying to model the development of skill in a baseball player. I can think of this at different levels of abstraction, for example going from the most abstract to the most concrete (see image):
[Image: Levels of Abstraction]

Of course, a behavior can develop more than one skill. For example, ‘batting practice’ could develop ‘hand-eye coordination’, ‘upper body strength’, etc.

I would like to understand the best way to model this example in Grakn. The thing I am trying to enable with this example is to be able to define a relationship (or a rule?) at the highest level of abstraction and then allow Grakn to infer the rule at the lower level.

For example, if I were to then add an instance of a Behavior called ‘Catching Fly Balls’, I would want Grakn to do two things:

  1. (Ideally) attempt to infer which skill is being developed by that behavior, or (more probable) at least notify me when I have created a set of behaviors that has no skills.

  2. Model (probably with attributes?) that, as the amount of time spent in the batting cage increases (e.g. 20 hours vs 1000 hours), the person’s skill in ‘home run hitting’ also increases (ideally according to some specified function, e.g. home_run_hitting_skill = f(time in batting cage)).

The ultimate objective in this would be to infer which athletes are skilled in what things based upon data we collect.

Any thoughts are most appreciated.


hi @brendan ,

I really loved your model. In Grakn there are lots of ways to model this system, and most of the major variations depend on your objectives: whether you want something trivial or are looking for a game changer. Assuming the latter, my advice is to gain power by reframing the model before considering the Grakn strategy.

Consider that your original diagram contains an implicit production process: some elemental input metric, like batting practice, can be conceptually aggregated up through layers into an overall performance, which is then converted into some intrinsic skill set.

You may find it advantageous to functionalise this idea by establishing an overall system model, such as system dynamics, where it’s a stocks and flows thing (like a fluid, indivisible): a certain input creates a flow around the network, and one just works out the rates/chokes. System dynamics is super simple and breaks down to differentials in time over the arrows to control the flow.

I personally prefer to completely reframe your model as a discrete event simulation (DEVS) model, similar to a production line. In this we consider the bottom level as observations, which provide a time-stamped metric at intervals, and assume that some mechanism provides these data observations to me. Taking a simple approach, we frame our mechanism as a simple 3-level device (element, component and system) (see image below).

One key concept to consider is the method by which elements contribute: they add into the Component level with a simple function based on volume, yet their rate of change over time feeds into the Supervisory level. At the Supervisory level, the rate of change of an element is a better predictor of reactive behaviour than volume.

The advantage of the DEVS model over the continuous system dynamics approach is that we get explicit time sequencing of inputs and outputs, which enables one to better map asynchronous inputs, and we can apply causal inferencing, assuming we want to go there. Plus you could build a digital twin if desired.

Assuming there was historical data, and the aim was to build a predictive insights engine, I would first conduct the modelling in AnyLogic over the historical data and known outputs in order to get a handle on the transfer functions. Finally, I would model that system in Grakn, knowing that my transfer functions were well validated, so that I could rely on the results of my rulesets. I would probably also work out some kind of testing loop.

When developing any predictive model, particularly a mechanistic one, the trick is not just to get results, but also to prove that a basic set of results is correct, so that the reasoning has some ground truth. The other thing to consider is the development of a proper metrics model to support multi-scale summarising of measures, which I have not discussed here due to space.

I hope these few thoughts create value around your most interesting question.

Hi @modeller,

First, thank you for both your kind words and your incredibly thoughtful response. I will respond in more detail later but love much of your thinking.

As mentioned I will write more in the next day or two as I get time, but let’s say I wanted to proceed with the model as you represented it above: could you use that example and show me how you might translate that into a Grakn schema? I think seeing the translation from your conceptual model (which in many ways I like and agree with) into Graql will help me ‘bridge the gap’ that still exists in my mind.

Once I get some sense of that, I think it will be a lot easier for me to engage at the conceptual level. If you are open to it, I’d love to continue our correspondence here, both about the model so far described, as well as discussions of the metric model.

Thank you again. I look forward to hearing from you further.

Warm Regards,
Brendan

PS. A few more questions I had while thinking about this:
First, you said this:

1A. When you say ‘Supervisory level’, are you referring to the “system level” in your diagram or something else?

1B. Why do you claim that the rate of change is a better predictor of reactive behavior than volume? (And what do you mean by ‘reactive behavior’?)

Second, you said this:

2A. I’m not super familiar with DEVS. Practically, how is it different from a system dynamics approach with stocks and flows, other than using discrete math vs continuous? (Or maybe that’s all you’re saying… 🙂)

2B. I didn’t understand “Build a digital twin” – what did you mean by that?

Finally…

3A. I’ve never used AnyLogic (though it looks very interesting!) - it’s unclear to me how you would precisely model a transfer function (between, say, ‘time in batting cage’ and ‘performance’), since ‘performance’ is just a conceptual construct.

3B. Then, even assuming we have a transfer function, I’m curious to understand how that would be then modeled in Grakn. Some sort of attributes of a relation?

3C. More broadly, it seems like you have pretty clear vision for the roles of something like AnyLogic vs Grakn in a process like this; any further elaboration on this would be quite useful.

Sorry I’m a bit flat strap at present, but to start with.

1. Why Construct a mechanistic model?
We often want to investigate systems that have dynamic responses, to understand their behaviour or the forces driving it. These can be investigated with standard data science techniques, but there is a significant limit to what can be discovered that way (e.g. you may be able to predict the output with some regression, but that doesn’t mean you can explain it mechanistically). For this reason, the use of ‘coarse’ models (https://en.wikipedia.org/wiki/Coarse-grained_modeling) is well established in the engineering sciences and many other fields. Even when the system is only partially observable, a rough dynamic mechanism will often produce trends very similar to those of the actual system, provided it is properly developed (i.e. grounded against real data, possibly with ML-estimated parameters). In short, a mechanistic model can be developed for almost any dynamic system and combined with conventional data science techniques to produce insights beyond what ML provides by itself, so it’s a natural fit for combining with Grakn.

2. On Types of Models, and Multi-Scale Models
Many seemingly complex systems of interest resolve down to flat domains (interactions at a single functional scale). For example, check out InsightMaker and examine the tutorials on infection and predator-prey dynamics (https://insightmaker.com/). Your model is very different: it is actually a multi-scale model, where small mechanisms add together into a component, which adds with other components into an overall behaviour, which through some mechanism creates a perishable capability, which is in turn the sum total of a set of specific capabilities for each capability category. In short, you have a system and are interested in adding up what the system internals are doing at each level and in modelling system changes. Thus your model is more challenging, in that it requires proposing what the appropriate measures are at each scale, and a mechanism for how they combine and transform between scales (i.e. a multi-scale metrics model). Without a metrics model you are stuck; however, with a modicum of experimentation and imagination you can usually arrive at a useful starting position.

3. Data availability and scale immutability are key constraints on the ability to establish a mechanistic model.
The key difficulty with your example is that it will be very challenging to describe skill development through just 3 layers when it’s possible that more are involved. There is no issue in flattening the model; it is simply that model consistency may become a problem as you increase the number of skills and classifications, or that a 3-layer mechanism becomes harder to work out. The second concern is that both input data and some output data are required in order to ensure the validity of the transfer functions. I am not sure this is available for batting in the way you initially suggested. However, this approach could easily be shifted to bio-medical treatment programs (e.g. Parkinson’s exercise monitoring), consumer behaviour, or a host of other complex input-output system dynamics. Since it is a generic technique, I will continue with the example at hand and will use fake data to illustrate the approach.

4. Adding up the contribution of element behaviour to overall sub-system or system behaviour is not simple. At a high level, the overall objective function (the goal) is usually pushed from the top down through layers of components, while the behaviour of lower-level elements is summed up (e.g. integrated over a region); these sums are pushed up one layer at a time and eventually arrive at the top as system performance (i.e. overall metrics rather than specific ones). Frequently, complex systems cannot be described by a single metric or figure of merit, and therefore need a vector of metrics to describe system performance at the various levels.

5. The benefit of Mechanistic Models
There are a couple of key benefits. The first, obviously, lies in establishing the mechanism behind dynamic system behaviour. The second, less obviously, is that once I have a valid model, I can immediately build additional calculations to determine values that cannot be measured, which are often very important. This is the reason behind digital twins, which are dynamic digital representations of a real process/system. In short, there is a lot that can be calculated from a mechanistic model that can never be uncovered through ML alone.

Over the next week or so, I will try to build an InsightMaker model using some fake data to illustrate the point. Once that is completed, I will show the Grakn code and talk about how I would set it up, assuming Grakn V3 (i.e. assuming I had calculations, stored procedures and a particular setup with attributes).

Thanks so much - I look forward to seeing what you come up with!

Warm Regards,
Brendan

Hi @modeller, any update here? I’m very curious to understand how you would implement the general framework you laid out in a Grakn schema. If you don’t think you’ll have time to do it, I’d still appreciate it if you could let me know that, as I’ve been waiting to proceed until I saw how you approached it.

Warm Regards,
Brendan

Replying directly to your original post @brendan.
I suggest that at present you should define the most abstract elements of your model as schema. In this way you should expect that your schema won’t grow according to the amount of data you insert (and only changes when you need to amend or extend the structure of your data).

Therefore I would propose a schema like this:

define

name sub attribute, datatype string;
behaviour-name sub name;
skill-name sub name;
athlete-name sub name;
# activity-name is used as the key of the activity entity below
activity-name sub name;
time-spent-minutes sub attribute, datatype long;

behaviour sub entity,
	key behaviour-name,
	plays developing-behaviour,
	plays measured;
  
skill-development sub relation,
	relates developing-behaviour,
	relates developed-skill;
	
skill sub entity,
	key skill-name,
	plays developed-skill,
	plays superior-skill,
	plays subordinate-skill;
	
skill-hierarchy sub relation,
	relates superior-skill,
	relates subordinate-skill;
	
skill-hierarchy-is-transitive sub rule,
when {
	(superior-skill: $a, subordinate-skill: $b) isa skill-hierarchy;
	(superior-skill: $b, subordinate-skill: $c) isa skill-hierarchy;
}, then {
	(superior-skill: $a, subordinate-skill: $c) isa skill-hierarchy;
};

activity sub entity,
	key activity-name,
	plays participated-in,
	plays measure;

measurement sub relation,
	relates measured,
	relates measure;
	
participation sub relation,
	has time-spent-minutes,
	relates participant,
	relates participated-in;
  
athlete sub entity,
	key athlete-name,
	plays participant;

Note I’ve used some artistic licence using your prose description as inspiration.

We can then insert the data roughly as per your diagram as follows:

insert 
$a isa athlete, has athlete-name "batter";
(participant: $a, participated-in: $batting-cage) isa participation, has time-spent-minutes 60;
$batting-cage isa activity, has activity-name "batting cage";
(measure: $batting-cage, measured: $batting-practice) isa measurement;
$batting-practice isa behaviour, has behaviour-name "batting-practice";
(developing-behaviour: $batting-practice, developed-skill: $hrh) isa skill-development;
$off isa skill, has skill-name "offence";
(superior-skill: $off, subordinate-skill: $hrh) isa skill-hierarchy;
$hrh isa skill, has skill-name "home run hitting";

This example, using a transitive rule, shows you how you can achieve a hierarchy in your data that you can match for in any way you like, including using a rule.
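
As a minimal sketch of what the transitive rule gives you, the two queries below add one more level under home run hitting and then ask for everything above it. The skill "pull hitting" is a hypothetical example that is not in your diagram, and the sketch assumes inference is enabled for the read transaction.

```graql
# Hypothetical extra level, added only to exercise the transitive rule.
match
$hrh isa skill, has skill-name "home run hitting";
insert
$pull isa skill, has skill-name "pull hitting";
(superior-skill: $hrh, subordinate-skill: $pull) isa skill-hierarchy;

# Run as a separate query: every skill above "pull hitting",
# whether the skill-hierarchy relation is stored or inferred.
match
$sub isa skill, has skill-name "pull hitting";
(superior-skill: $sup, subordinate-skill: $sub) isa skill-hierarchy;
$sup has skill-name $sup-name;
get $sup-name;
```

With the rule in place this should return both home run hitting (from the stored relation) and offence (from the inferred one); with inference switched off, only home run hitting should come back.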

Let’s try addressing your question 1 with a rule. I think we want to infer the connection between an activity and a skill in the schema I’ve given, so we extend our schema:

define

# we want to infer this relation
activity-skill-development sub relation,
	relates developing-activity,
	relates developed-skill;
	
# this extends the previous definition
activity sub entity, 
	plays developing-activity;

activity-develops-skill sub rule,
when {
	$activity isa activity;
	(measure: $activity, measured: $behaviour) isa measurement;
	$behaviour isa behaviour;
	(developing-behaviour: $behaviour, developed-skill: $skill) isa skill-development;
	$skill isa skill;
}, then {
	(developing-activity: $activity, developed-skill: $skill) isa activity-skill-development;
};

activity-develops-superior-skills sub rule,
when {
	(developing-activity: $activity, developed-skill: $dev-skill) isa activity-skill-development;
	(superior-skill: $parent-skill, subordinate-skill: $dev-skill) isa skill-hierarchy;
}, then {
	(developing-activity: $activity, developed-skill: $parent-skill) isa activity-skill-development;
};

Now you should be able to query:

match
$activity isa activity, has name "batting cage";
(developing-activity: $activity, developed-skill: $skill) isa activity-skill-development;
$skill isa skill, has name $skill-name; get;

You should get back home run hitting and offence as the skills developed. However, if you have added skills that sit elsewhere in the skill hierarchy, they won’t be returned: the rule only follows the hierarchy upwards to parent skills. This seems desirable, because working on hitting balls hard doesn’t necessarily imply that you’ve improved your home run hitting.
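
For the second half of question 1 (being told about behaviours that have no skills), one option is a negation query rather than a rule. The sketch below assumes a Grakn release whose Graql supports not { ... } blocks; it uses only the types defined in the schema above.

```graql
# List the names of behaviours that play no part in any skill-development relation.
match
$b isa behaviour, has behaviour-name $n;
not { (developing-behaviour: $b) isa skill-development; };
get $n;
```

Running this after each batch of inserts would give you a simple check: any behaviour name it returns, such as a newly added "Catching Fly Balls", still needs a skill-development relation.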

To your second question: that would require arithmetic in rules, which isn’t available yet. What you can do is set simple thresholds in your rules, for example $x has time-spent-minutes $t; $t >= 60; and then coarsely label the different levels of skill this implies, e.g.

when {
	...
	$x has time-spent-minutes $t; $t >= 60;
	...
},
then {
	$y has skill-level "high";
};
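
To make the fragment above concrete, here is one way it could be filled in. This is a sketch under assumptions: the skill-level attribute, the decision to attach it to the athlete, and the rule name are all hypothetical choices, not part of the schema given earlier. The 60-minute threshold applies to each participation separately, because rules cannot add up time across sessions.

```graql
define

# Hypothetical attribute holding a coarse label such as "high".
skill-level sub attribute, datatype string;

# this extends the previous definition
athlete sub entity,
	has skill-level;

# Hypothetical rule: any single participation of 60 minutes or more
# labels the athlete as "high".
long-participation-implies-high-skill sub rule,
when {
	$a isa athlete;
	$p (participant: $a, participated-in: $act) isa participation;
	$p has time-spent-minutes $t;
	$t >= 60;
}, then {
	$a has skill-level "high";
};

# Run as a separate query: total minutes the athlete spent in the batting cage.
# $p is kept in the get so that two sessions of equal length are both counted.
match
$a isa athlete, has athlete-name "batter";
$act isa activity, has activity-name "batting cage";
$p (participant: $a, participated-in: $act) isa participation, has time-spent-minutes $t;
get $p, $t; sum $t;
```

The aggregate query is one way to work around the missing arithmetic: fetch the total with a query, apply your function f in client code, and write the result back as an attribute if you want rules to use it.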

There’s a lot in this example, hope it helps!

I know it’s taken me a long time to respond, but this may be the single greatest answer I’ve ever received on any type of support forum. Incredible. It will take me some time to process, but I get the gist and it’s super helpful. Thanks @james!


@james , I know it’s been a long time since you wrote this response - I just wanted to let you know that I finally got back to it in detail. Thank you again - it’s still one of the most amazing responses ever.

A couple of questions (if you can even recall your thought process anymore on this):

  1. Why did you decide to create the hierarchy of skills via relations (i.e. skill sub entity, plays superior-skill, plays subordinate-skill) instead of just using ‘sub’ (i.e. Home-run-hitting sub offensive-skill abstract)?

  2. Do you know when arithmetic in rules will be available?

Thank you again!

Warm Regards,
Brendan

Hi @brendan, it sounds like you’ve had a busy year! 😂

(1) Great question. The way I see it, if you were to put the skill hierarchy in the schema you would run into a number of problems:
i. You would not be able to connect to those skills using a relation instance directly.
ii. Therefore you would have to create a single instance for every skill type, and this is a code smell. The expectation when designing schema is that all types should have more than one instance.
iii. This supports only a single tree-shaped hierarchy of skills.
The upside of using schema for the skill hierarchy would be that you use the type system to compute transitivity, which is faster than the reasoning engine can ever be because the type system is a single tree and not a graph, whereas the reasoner must explore an arbitrary graph shape to check all plausible transitive paths. I believe this approach can be encoded into rules also, but to query for all parent skill types would look like:

match
$activity isa activity, has name "batting cage";
(developing-activity: $activity, developed-skill: $skill) isa activity-skill-development;
$skill isa $skill-type; get $skill-type;

Then, in the response, you would retrieve the label of each answer for $skill-type. So I’ll leave it to you to decide. We hope to make improvements to the efficiency of transitive rules in the reasoner, so do look out for that.
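
For comparison, a minimal sketch of what the schema-based alternative could look like. The type names offensive-skill and home-run-hitting are hypothetical (taken from the wording of your question), and the abstract keyword is written as I would expect it in later 1.x releases, so adjust it for your version.

```graql
define

# Hypothetical subtypes of the existing skill entity; they inherit its key and roles.
offensive-skill sub skill, abstract;
home-run-hitting sub offensive-skill;

# Run as a separate query: each concrete skill type ends up with exactly one
# instance, which is the code smell described in point ii.
insert
$hrh isa home-run-hitting, has skill-name "home run hitting";
```

With this in place, the query above would bind $skill-type to home-run-hitting and to each of its supertypes, without any transitive rule being involved.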
(2) Sorry, I don’t have an estimate for you yet!