Concerning the statistics 6.6.2.6 #121

Open
yayamamo opened this issue Mar 10, 2015 · 7 comments

Comments

@yayamamo

Hi,
Section 6.6.2.6 of the spec defines the numbers of distinct subjects and distinct objects with respect to a predicate.
This shows one aspect of the triples connecting two classes, but another aspect cannot be obtained from it:
the number of triples connecting the two classes.
More precisely, I mean the number of unique triples that connect typed subjects and objects, where the subjects and objects belong to particular classes, respectively.

One extreme example: 100 different subjects all have the same value for a property.
The former reports distinctSubjects = 100 and distinctObjects = 1, while the latter reports 100 triples.
Another example: each of 10 different subjects has the same set of 10 values for a property.
The former reports distinctSubjects = 10 and distinctObjects = 10, while the latter reports 100 triples.
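
The two examples can be checked with a few lines of Python. This is only a sketch under stated assumptions: triples are modelled as plain `(subject, predicate, object)` tuples, and the names `summarize`, `example1`, `example2`, and `example3` are hypothetical. The third data set does not appear in the thread; it is added to show that the same distinct counts can go with a different number of triples.

```python
# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
def summarize(triples):
    """Return (distinct subjects, distinct objects, number of triples)."""
    unique = set(triples)  # an RDF graph is a set, so duplicates collapse
    subjects = {s for s, _, _ in unique}
    objects = {o for _, _, o in unique}
    return len(subjects), len(objects), len(unique)

# Example 1: 100 subjects all linked to the same object.
example1 = [(f"s{i}", "p", "o0") for i in range(100)]

# Example 2: each of 10 subjects linked to the same 10 objects.
example2 = [(f"s{i}", "p", f"o{j}") for i in range(10) for j in range(10)]

# Added for contrast (not from the thread): 10 subjects, one object each.
example3 = [(f"s{i}", "p", f"o{i}") for i in range(10)]

print(summarize(example1))  # (100, 1, 100)
print(summarize(example2))  # (10, 10, 100)
print(summarize(example3))  # (10, 10, 10)
```

`example2` and `example3` have identical distinct counts but 100 versus 10 triples, so the triple count cannot be derived from the two distinct counts.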

I think the latter statistic is also useful for understanding the characteristics of the target dataset, and I believe it was in the document before, wasn't it?

@micheldumontier
Member

The intent of 6.6.2.6 is to capture the total number of triples between subjects and objects of a specified type, e.g. 100 distinct subjects may be connected to 10 distinct objects via 100 triples.
One way of dealing with the total number of triples between subjects and objects of a certain type would be to simply declare a property partition on "rdfs:property".

@yayamamo
Author

I do not fully understand what it means to declare a property partition on "rdfs:property".
My previous comment may have been vague. The statistic I would like to know is the number of triples with a given predicate that connect specific classes (i.e., :c1 and :c2 in the example below). If the predicate connects only these classes, this number is identical to the total number of triples for the predicate.

SELECT ?p (COUNT(?p) AS ?rc)
WHERE {
  GRAPH :graph {
    ?s ?p ?o .
    ?s a :c1 .
    ?o a :c2 .
  }
}
GROUP BY ?p
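
To make the intent of this query concrete, here is a rough Python equivalent over in-memory data. It is a sketch, not the spec's method: `count_by_predicate`, the `types` mapping (a stand-in for the `?s a :c1` and `?o a :c2` patterns), and the sample resources are all assumptions made for illustration.

```python
from collections import Counter

def count_by_predicate(triples, types, subject_class, object_class):
    """Count triples per predicate whose subject and object have the given classes.

    `types` maps a resource to the set of classes it is declared to belong to.
    """
    counts = Counter()
    for s, p, o in set(triples):
        if subject_class in types.get(s, set()) and object_class in types.get(o, set()):
            counts[p] += 1
    return counts

# Hypothetical data: three triples, one of which points outside class c2.
triples = [
    ("s1", "p", "o1"),
    ("s2", "p", "o1"),
    ("s3", "p", "x1"),  # x1 is not an instance of c2
]
types = {"s1": {"c1"}, "s2": {"c1"}, "s3": {"c1"}, "o1": {"c2"}, "x1": {"c3"}}

typed = count_by_predicate(triples, types, "c1", "c2")
total = Counter(p for _, p, _ in set(triples))
print(typed["p"], total["p"])  # 2 3
```

The typed count equals the overall count for the predicate only when the predicate connects nothing but instances of the two classes, which is not the case in this sample.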

@micheldumontier
Member

@yayamamo
Author

I don't think so.
The difference is what I described at the top of this issue.
The former is 6.6.2.6, and the latter is the query I wrote just above.

count(distinct ?s) = 100, count(distinct ?o) = 1, count(?p) = 100

One extreme example: 100 different subjects all have the same value for a property.
The former reports distinctSubjects = 100 and distinctObjects = 1, while the latter reports 100 triples.

count(distinct ?s) = 10, count(distinct ?o) = 10, count(?p) = 100

Another example: each of 10 different subjects has the same set of 10 values for a property.
The former reports distinctSubjects = 10 and distinctObjects = 10, while the latter reports 100 triples.

@micheldumontier
Member

So 6.6.2.2 talks about properties and the number of triples. This query is not, however, limited to subjects and objects of some particular type; we imagine that this is necessarily true.

SELECT ?p (COUNT(?p) AS ?triples)
{ ?s ?p ?o }
GROUP BY ?p

@yayamamo
Author

That is to say, would 6.6.2.6 be as follows?

:rdfdataset
    void:propertyPartition [
        void:property <property-uri> ;
        void:triples "###"^^xsd:integer ;
        void:classPartition [
            void:class <subject-class-uri> ;
            void:distinctSubjects "###"^^xsd:integer ;
        ];
        void-ext:objectClassPartition [
            void:class <object-class-uri> ;
            void:distinctObjects "###"^^xsd:integer ;
        ];
    ] .

SELECT (COUNT(DISTINCT ?s) AS ?scount) ?stype ?p (COUNT(?p) AS ?pcount) ?otype (COUNT(DISTINCT ?o) AS ?ocount)
{
  ?s ?p ?o .
  ?s a ?stype .
  ?o a ?otype .
}
GROUP BY ?p ?stype ?otype
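
The grouping performed by this query can be sketched in Python as follows. This is an illustration under assumptions, not an implementation of the spec: `class_pair_stats` and the `types` mapping are hypothetical names, and the sample data reuses the 10 x 10 example from earlier in the thread.

```python
from collections import defaultdict

def class_pair_stats(triples, types):
    """Group triples by (predicate, subject class, object class).

    Returns {(p, stype, otype): (distinct subjects, triples, distinct objects)}.
    `types` maps a resource to the set of classes it belongs to.
    """
    groups = defaultdict(lambda: (set(), set(), set()))
    for s, p, o in set(triples):
        # A resource with several classes contributes to every matching group.
        for stype in types.get(s, ()):
            for otype in types.get(o, ()):
                subs, trips, objs = groups[(p, stype, otype)]
                subs.add(s)
                trips.add((s, p, o))
                objs.add(o)
    return {key: (len(a), len(b), len(c)) for key, (a, b, c) in groups.items()}

# Each of 10 subjects (class c1) is linked to the same 10 objects (class c2).
triples = [(f"s{i}", "p", f"o{j}") for i in range(10) for j in range(10)]
types = {f"s{i}": {"c1"} for i in range(10)}
types.update({f"o{j}": {"c2"} for j in range(10)})

stats = class_pair_stats(triples, types)
print(stats[("p", "c1", "c2")])  # (10, 100, 10)
```

The three numbers in each result correspond to `?scount`, `?pcount`, and `?ocount` in the query, that is, to `void:distinctSubjects`, `void:triples`, and `void:distinctObjects` in the proposed description.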

@micheldumontier
Member

Yes, that's right.
