Predicting DBpedia types

This project contains the source code of a set of models for inferring new types on DBpedia 3.9 by exploiting its class hierarchy. The main script (main_predicting_DBpedia_types.R) reproduces the experiments carried out. See http://es-ta.linkeddata.es/ for details and for the data needed to run the experiments.

Getting Started

Clone or download the source code to your machine. The next section lists the libraries and software used and how to install them.

Prerequisites

other dependencies...

Installing R packages

We recommend using the latest stable versions, but the commands below cover both the latest stable releases and the specific versions of the machine learning (ML) libraries used in the experiments.

  • Main common libraries and dependencies
#http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/r.html
if (! ("methods" %in% rownames(installed.packages()))) { install.packages("methods") }
if (! ("statmod" %in% rownames(installed.packages()))) { install.packages("statmod") }
if (! ("stats" %in% rownames(installed.packages()))) { install.packages("stats") }
if (! ("graphics" %in% rownames(installed.packages()))) { install.packages("graphics") }
if (! ("RCurl" %in% rownames(installed.packages()))) { install.packages("RCurl") }
if (! ("jsonlite" %in% rownames(installed.packages()))) { install.packages("jsonlite") }
if (! ("tools" %in% rownames(installed.packages()))) { install.packages("tools") }
if (! ("utils" %in% rownames(installed.packages()))) { install.packages("utils") }
if (! ("optparse" %in% rownames(installed.packages()))) { install.packages("optparse") }
  • Latest ML versions
install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R")))

# The CRAN package is named "C50" (it implements the C5.0 algorithm)
if (! ("C50" %in% rownames(installed.packages()))) { install.packages("C50") }
  • ML versions used in the experiments
url_h2o <- "https://cran.r-project.org/src/contrib/Archive/h2o/h2o_3.16.0.1.tar.gz"
install.packages(url_h2o, repos=NULL, type="source")

url_c50 <- "https://cran.r-project.org/src/contrib/Archive/C50/C50_0.1.0-24.tar.gz"
install.packages(url_c50, repos=NULL, type="source")
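
To confirm which versions ended up installed, a check along the following lines may help. It is only a sketch: the helper names pkg_version and report are hypothetical and not part of this project, and the script assumes Rscript is on your PATH (it prints "missing" otherwise). The expected versions are the ones in the archive URLs above.

```shell
#!/bin/sh
# Versions quoted above for the original experiments.
EXPECTED_H2O="3.16.0.1"
EXPECTED_C50="0.1.0-24"

# Print the Version field of an installed R package, or "missing" when
# Rscript or the package is not available. (Hypothetical helper.)
pkg_version() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "missing"
        return 0
    fi
    Rscript -e "v <- suppressWarnings(utils::packageDescription('$1', fields = 'Version')); cat(if (is.na(v)) 'missing' else v)" 2>/dev/null || echo "missing"
}

# Report whether the installed version equals the one used in the experiments.
# $1 = package name, $2 = expected version, $3 = installed version
report() {
    if [ "$3" = "$2" ]; then
        echo "$1: $3 (same version as in the experiments)"
    else
        echo "$1: $3 (the experiments used $2)"
    fi
}

report h2o "$EXPECTED_H2O" "$(pkg_version h2o)"
report C50 "$EXPECTED_C50" "$(pkg_version C50)"
```

A mismatch does not necessarily break anything; it only indicates that results may differ from the ones obtained with the original versions.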

Running experiments

From a command line, navigate to the source folder and run the main script with -h or --help to see the available options. On Windows, run the script through the Rscript command (make sure Rscript is on your PATH).

./main_predicting_DBpedia_types.R --help
Usage: ./main_predicting_DBpedia_types.R <options>
Description:
                           this software reproduces the experiments shown in the paper "Inferring new types on large datasets applying ontology class hierarchy classifiers: the DBpedia case"
                           (currently under review at ISWC2018)

Options:
        -a CHARACTER, --approach=CHARACTER
                approach selected. <global_ap1 | multilevel_ap2 | cascade_ap3>
              global_ap1 - First approach, it learns from the most specific type of each resource
              multilevel_ap2 - Second approach, Local Classifiers per Level with binary decisions per level, aimed at solving the partial depth issue
              cascade_ap3 - Third approach, Local Classifiers per Level with binary decisions per level and a cascade process, aimed at solving the partial depth issue and reducing hierarchy inconsistencies

        -l CHARACTER, --algorithm=CHARACTER
                algorithm used for the approach selected. <NB | C5.0 | DL | RF>.
              NB - Naïve Bayes. Only in global approach
              C5.0 - Only in multilevel approach
              DL - Deep Learning
              RF - Random Forest

        -t CHARACTER, --test=CHARACTER
                dataset selected. <test1 | test10 | test25 | fiveFold>
              test1  - test with retained resources with at least 1 ingoing property
              test10 - test with retained resources with at least 10 ingoing properties
              test25 - test with retained resources with at least 25 ingoing properties
              fiveFold - test with 5-fold cross validation. Only in multilevel approach

        -i CHARACTER, --pathIn=CHARACTER
                path to input files. The directory must already exist. Check out http://es-ta.linkeddata.es/#inputs to download experiments

        -o CHARACTER, --pathOut=CHARACTER
                path to output files. The directory must already exist

        -f CHARACTER, --fileOut=CHARACTER
                files' output identifier or name to track files about same experiment [default= output]

        -s SEED, --seed=SEED
                random number generator seed for algorithms that are dependent on randomization [default= 1234]

        -h, --help
                Show this help message and exit

Examples:
                             -> using multilevel approach (2) with Random Forest algorithm and fiveFold test. The input folder is the path given with the -i flag, and generated files are written to the path given with the -o flag. Look for files containing 'output_ap2_5f_execution1' to find the outputs of this experiment.
                           ./main_predicting_DBpedia_types.R -a multilevel_ap2 -l RF -t fiveFold -i /home/myExperiments/aboutDBpedia_hierarchyClasssifiers/approach2/crossValidation/ -o /home/myExperiments/aboutDBpedia_hierarchyClasssifiers/approach2/crossValidation/output/ -f output_ap2_5f_execution1
                             -> using cascade approach (3) with Deep Learning algorithm and test 25 (resources with at least 25 ingoing properties). Note that this example uses relative paths instead of absolute paths.
                           ./main_predicting_DBpedia_types.R -a cascade_ap3 -l DL -t test25 -i ./data/ap3/t25/ -o ./output/ap3_t25/ -f out_ap3_t25_execution7
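
The help text restricts some combinations: NB is listed only for the global approach, while C5.0 and fiveFold are listed only for the multilevel approach. The sketch below checks a combination against those restrictions and prints one command per allowed test set without running anything. The helpers valid_combo and plan_runs, the ./data and ./output directory layout, and the output identifiers are all assumptions made for illustration, not part of this project.

```shell
#!/bin/sh
# Return 0 when the combination respects the restrictions in the help text.
# $1 = approach, $2 = algorithm, $3 = test set. (Hypothetical helper.)
valid_combo() {
    case "$1" in
        global_ap1|multilevel_ap2|cascade_ap3) ;;
        *) return 1 ;;
    esac
    case "$2" in
        NB)    [ "$1" = "global_ap1" ] || return 1 ;;      # NB: global only
        C5.0)  [ "$1" = "multilevel_ap2" ] || return 1 ;;  # C5.0: multilevel only
        DL|RF) ;;
        *) return 1 ;;
    esac
    case "$3" in
        fiveFold) [ "$1" = "multilevel_ap2" ] || return 1 ;;  # fiveFold: multilevel only
        test1|test10|test25) ;;
        *) return 1 ;;
    esac
    return 0
}

# Print (do not execute) one command line per allowed test set.
# mkdir -p is included because the output directory must exist beforehand.
# $1 = approach, $2 = algorithm. The directory layout is an assumption.
plan_runs() {
    for testset in test1 test10 test25 fiveFold; do
        if valid_combo "$1" "$2" "$testset"; then
            out="./output/$1/$testset/"
            echo "mkdir -p $out && ./main_predicting_DBpedia_types.R -a $1 -l $2 -t $testset -i ./data/$1/$testset/ -o $out -f out_$1_$testset"
        fi
    done
}

plan_runs cascade_ap3 DL
```

Review the printed lines first; once the input paths match your downloaded data, the output can be piped to sh to run the experiments one after another.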


About the authors

Acknowledgments

This work was partially funded by projects RTC-2016-4952-7, TIN2013-46238-C4-2-R and TIN2016-78011-C4-4-R, from the Spanish State Investigation Agency of the MINECO and FEDER Funds.