
Add a new Mockpath Dataset

laurenzse edited this page Mar 12, 2018 · 6 revisions

Create a new Mockdataset of MetaPaths

It will contain a list of entries (itemA, itemB, nodes_on_path, edges_on_path, metapathinstances) that describe the meta paths between the items of two sets.
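To make the row structure concrete, here is a minimal Python sketch of one dataset entry. `MetaPathRecord` and `meta_path_string` are hypothetical names and are not part of the framework. The fields follow the columns returned by the extraction query on this page (a_set, b_set, nodes_types, relationship_types, path_count), which appear to correspond to itemA, itemB, nodes_on_path, edges_on_path and metapathinstances.

```python
from typing import List, NamedTuple


class MetaPathRecord(NamedTuple):
    """One row of the mock dataset (hypothetical field names).

    Fields mirror the columns returned by the extraction query:
    a_set, b_set, nodes_types, relationship_types, path_count.
    """
    item_a: str
    item_b: str
    node_types: List[List[str]]  # labels(n) of every node on the path
    edge_types: List[str]        # type(r) of every relationship on the path
    path_count: int              # number of path instances with this meta path


def meta_path_string(record: MetaPathRecord) -> str:
    """Render the meta path as alternating node labels and edge types."""
    # A node may carry several labels; join them with ':' as Cypher does.
    parts = [":".join(record.node_types[0])]
    for edge, labels in zip(record.edge_types, record.node_types[1:]):
        parts.append(f"-[{edge}]-")  # undirected, like the query pattern
        parts.append(":".join(labels))
    return " ".join(parts)
```

A meta path is thus the sequence of node types and edge types along a path, while `path_count` says how many concrete paths between the two items follow that sequence.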

We need to define the two input sets A and B. For this, we require two lists of node IDs that correspond to nodes in the Freebase graph. They can easily be obtained by querying manually, for example for programming languages: `MATCH (n) WHERE n.name = "PYTHON" RETURN ID(n);`
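When a set contains many items, one lookup query for all names saves some typing. The following is a minimal sketch; `build_id_lookup_query` is a hypothetical helper, and it assumes that the double-quoted, backslash-escaped strings produced by `json.dumps` are valid Cypher string literals, which holds for ordinary names.

```python
import json
from typing import Iterable


def build_id_lookup_query(names: Iterable[str]) -> str:
    """Build one Cypher query that returns the node ID for each given name."""
    # json.dumps wraps each name in double quotes and escapes quotes/backslashes.
    quoted = ", ".join(json.dumps(name) for name in names)
    return f"MATCH (n) WHERE n.name IN [{quoted}] RETURN n.name, ID(n);"
```

For example, `build_id_lookup_query(["PYTHON", "JAVA"])` yields `MATCH (n) WHERE n.name IN ["PYTHON", "JAVA"] RETURN n.name, ID(n);`. A name may match more than one node, so check the result before copying the IDs.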

Once these IDs are collected, substitute them into the two ID lists of a Cypher query like the following, and replace /computer/programming_language/ with the prefix of your domain. Note that the 3 in `[*1..3]` is the maximum meta-path length (number of relationships); you might want to change it.

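```cypher
// Paths of 1 to 3 relationships between a node of set A and a node of set B
MATCH p = (a)-[*1..3]-(b)
WHERE
id(a) IN [33260702,23580293,70267249]
and id(b) IN [16521328,31640738,60021106]
// every relationship on the path must belong to the chosen domain
and all(x in relationships(p) WHERE type(x)=~'/computer/programming_language/.*')
// intermediate nodes must not be the start or end node
and all(x in nodes(p)[1..(size(nodes(p))-1)] WHERE (id(x) <> id(a) and id(x) <> id(b)))
// count(*) counts path instances per item pair and meta path; list comprehensions replace extract(), which Neo4j 4.0 removed
RETURN a.name as a_set, b.name as b_set, [n IN nodes(p) | labels(n)] AS nodes_types, [r IN relationships(p) | type(r)] AS relationship_types, count(*) as path_count;
```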
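The substitution of IDs, domain prefix and maximum length can also be scripted. This is a minimal sketch assuming you only want the query text as a string to save as query.cypher; `QUERY_TEMPLATE` and `build_metapath_query` are hypothetical names. The domain prefix is inserted into the regex unescaped, so it should not contain quotes or regex metacharacters.

```python
from typing import Iterable

# Same query as on this page; list comprehensions are used instead of
# extract(), which Neo4j 4.0 removed (both forms work on Neo4j 3.x).
QUERY_TEMPLATE = """MATCH p = (a)-[*1..{max_length}]-(b)
WHERE
id(a) IN {ids_a}
and id(b) IN {ids_b}
and all(x in relationships(p) WHERE type(x)=~'{domain}.*')
and all(x in nodes(p)[1..(size(nodes(p))-1)] WHERE (id(x) <> id(a) and id(x) <> id(b)))
RETURN a.name as a_set, b.name as b_set, [n IN nodes(p) | labels(n)] AS nodes_types, [r IN relationships(p) | type(r)] AS relationship_types, count(*) as path_count;
"""


def _id_list(ids: Iterable[int]) -> str:
    """Format node IDs as a Cypher list literal, e.g. [1,2,3]."""
    return "[" + ",".join(str(int(i)) for i in ids) + "]"


def build_metapath_query(ids_a: Iterable[int], ids_b: Iterable[int],
                         domain: str, max_length: int = 3) -> str:
    """Fill the template with both ID sets, the domain prefix and max path length."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return QUERY_TEMPLATE.format(max_length=max_length, ids_a=_id_list(ids_a),
                                 ids_b=_id_list(ids_b), domain=domain)
```

The result can be written with `pathlib.Path("query.cypher").write_text(query)` and executed as described next.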

Save this query as query.cypher in your own folder under /home/bp/dataset-extraction on watson. You can then execute it (preferably in a screen session, since the query may run for a long time) with:

```shell
cat query.cypher | /usr/bin/cypher-shell -u neo4j -a bolt://localhost:PORT_NUMBER --format plain > output.csv
```

Replace PORT_NUMBER with the Bolt port of your database.
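If you prefer to launch the extraction from a Python script, for example one that also generates the query, a minimal sketch with `subprocess` could look like this. It only reproduces the shell pipeline above with the same binary path and user; `cypher_shell_command` and `run_query_file` are hypothetical helpers, and password handling is left to cypher-shell as in the original command.

```python
import subprocess
from pathlib import Path
from typing import List


def cypher_shell_command(port: int, user: str = "neo4j",
                         binary: str = "/usr/bin/cypher-shell") -> List[str]:
    """Argument list equivalent to the cypher-shell call on this page."""
    return [binary, "-u", user, "-a", f"bolt://localhost:{port}", "--format", "plain"]


def run_query_file(query_file: Path, output_file: Path, port: int) -> None:
    """Feed query_file to cypher-shell on stdin and write its stdout to output_file.

    Same effect as: cat query.cypher | cypher-shell ... > output.csv
    """
    with open(query_file, "rb") as stdin, open(output_file, "wb") as stdout:
        # check=True raises CalledProcessError if cypher-shell exits non-zero.
        subprocess.run(cypher_shell_command(port), stdin=stdin, stdout=stdout, check=True)
```

Passing an argument list instead of a shell string avoids quoting issues; the script itself should still be run inside screen for long extractions.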

Once the query has finished, copy the output file into 32de-python/tests/data so that the Python framework can access it.
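Before copying, it can help to check that the export actually contains the expected columns, since a failed run may leave an empty or truncated file. A minimal sketch, assuming the first line of the plain cypher-shell output is a comma-separated header with the column aliases from the query; `install_output` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Column aliases used in the RETURN clause of the extraction query.
EXPECTED_COLUMNS = ["a_set", "b_set", "nodes_types", "relationship_types", "path_count"]


def install_output(output_file: Path, data_dir: Path) -> Path:
    """Check the header of the query output and copy it into data_dir."""
    with open(output_file, encoding="utf-8") as f:
        header = f.readline()
    # Tolerate ", " or "," separators and optional quotes around column names.
    columns = [c.strip().strip('"') for c in header.split(",")]
    if columns != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header in {output_file}: {columns}")
    data_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(output_file, data_dir))
```

Called as `install_output(Path("output.csv"), Path("32de-python/tests/data"))`, it returns the path of the copied file.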

You still need to tell the framework how to find it: in util/meta_path_loader_dispatcher.py, register your dataset in the class MetaPathLoaderDispatcher by adding it to the variables available_datasets (name and description) and dataset_to_loader (relative path to the output file you just created).
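The registration step can be illustrated with a stand-in class. This is a hypothetical sketch only: the real MetaPathLoaderDispatcher may store available_datasets and dataset_to_loader differently, so check the file for the actual shape. The sketch just shows the idea of adding the same dataset name to both variables.

```python
from typing import Dict, List


class MetaPathLoaderDispatcherSketch:
    """Hypothetical stand-in for MetaPathLoaderDispatcher (not the real class)."""

    def __init__(self) -> None:
        # Assumed shape: one {"name": ..., "description": ...} entry per dataset.
        self.available_datasets: List[Dict[str, str]] = []
        # Assumed shape: dataset name -> relative path of its output file.
        self.dataset_to_loader: Dict[str, str] = {}

    def register(self, name: str, description: str, relative_path: str) -> None:
        """Add a dataset to both variables, refusing duplicate names."""
        if name in self.dataset_to_loader:
            raise ValueError(f"dataset {name!r} is already registered")
        self.available_datasets.append({"name": name, "description": description})
        self.dataset_to_loader[name] = relative_path
```

Keeping the name identical in both variables matters, because the description list and the path mapping are presumably looked up by the same dataset name.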

That's it!
