Commit

Some prettyfication on iswc folder
yum-yab committed May 6, 2022
1 parent d12b84c commit 2c9b7df
Showing 4 changed files with 4 additions and 6 deletions.
4 changes: 2 additions & 2 deletions iswc2022/README.md
@@ -5,7 +5,7 @@
 - [Table of all outages](https://docs.google.com/spreadsheets/d/1bL0cnzxPP2y46Z-byf56oHNwREnid1cG0affctbD9fI/edit#gid=281687190)
 - [Outage per crawl](https://docs.google.com/spreadsheets/d/1bL0cnzxPP2y46Z-byf56oHNwREnid1cG0affctbD9fI/edit#gid=694221323)
 - [Outage per ontology](https://docs.google.com/spreadsheets/d/1bL0cnzxPP2y46Z-byf56oHNwREnid1cG0affctbD9fI/edit#gid=1207680809)
-- [Evaluation functions for generating the data](/archivo/iswc_eval.py)
+- [Evaluation functions for generating the data](/iswc2022/archivo_data/iswc_eval.py)

 ## Archivo Source Evaluation
 - [Ontology by source and addition](https://databus.dbpedia.org/ontologies/archivo-indices/ontologies/2021.11.21-220000/ontologies_type=official.csv)
@@ -45,7 +45,7 @@ https://databus.dbpedia.org/ontologies/collections/archivo-reproducibility-analy

 ## Unknown Terms of Archivo

-The data can be found [here](unknown_terms_crawl) and contains the following
+The data can be found [here](unknown_terms_crawl/term_count_reason_mapping.csv) and contains the following
 * A table with mappings from terms to occurrence count in LOD cloud to the reason for not being able to be added to Archivo
 * A json file with mappings from term to reason for not being added
 * the script used to generate the stats
File renamed without changes.
6 changes: 2 additions & 4 deletions iswc2022/unknown_terms_crawl/crawl_analysis.py
@@ -196,15 +196,13 @@ def generate_term_count_reasoning_mapping(output_filepath: str, write_files: boo

 def main():
     print("Loading data...")
-    covered_by_archivo = load_first_column_csv("/home/denis/Workspace/Job/archivo_iswc_2022/archivo-analysis/new_all_archivo_classes.csv")
-
-    # covered_by_archivo = covered_by_archivo + load_first_column_csv("/home/denis/Workspace/Job/archivo_iswc_2022/archivo-analysis/new_all_archivo_classes.csv")
+    covered_by_archivo = load_first_column_csv("all_archivo_classes.csv")

     covered_by_archivo = set(covered_by_archivo)

     print("Reading the file and filter it...")

-    term_count_reason_list = generate_term_count_reasoning_mapping("class_count_reason_mapping.csv", stopset=covered_by_archivo, write_files=True)
+    term_count_reason_list = generate_term_count_reasoning_mapping("term_count_reason_mapping.csv", stopset=covered_by_archivo, write_files=True)

     num_terms_not_in_archivo = len(term_count_reason_list)
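
Note on the hunk above: the body of `load_first_column_csv` is not part of this diff, so how it actually reads the file is not shown here. The following is only a minimal sketch of what such a helper might look like, assuming a comma-separated UTF-8 file with one term IRI in the first column. The name `load_first_column_csv_sketch` and the `delimiter` parameter are hypothetical and do not come from the repository.

```python
import csv
from typing import List


def load_first_column_csv_sketch(filepath: str, delimiter: str = ",") -> List[str]:
    """Hypothetical stand-in for load_first_column_csv (real body not shown in the diff).

    Returns the first cell of every non-empty row, in file order.
    """
    with open(filepath, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file, delimiter=delimiter)
        # csv.reader yields an empty list for blank lines, so skip those rows.
        return [row[0] for row in reader if row]
```

Converting the returned list to a `set`, as `main()` does, gives constant-time membership checks when the result is passed as `stopset`.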
