
Semantic Typing Service

Use this service to predict the semantic types of data after providing training data; see "Using the service" for details.

Software Requirements

  • Python 2.7 - Download

    • Pip - If you're on Ubuntu just install "python-pip"
  • MongoDB - If you're on Ubuntu just install "mongodb-server"

  • Elasticsearch - Download

  • Apache Spark - Download

    • This can be a pain to get working properly; you may find the sample code below helpful to put in your ~/.bashrc or ~/.bash_profile, just don't forget to replace the {{path to spark}} and {{version number}} placeholders:
      export PATH={{path to spark}}/bin:$PATH
      export SPARK_HOME="{{path to spark}}"
      export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
      export PYTHONPATH=$SPARK_HOME/python/lib/py4j-{{version number}}-src.zip:$PYTHONPATH
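
A quick way to confirm the Spark configuration above is picked up is to try importing pyspark with the same Python interpreter that will run the service. A minimal sanity-check sketch (not part of the service itself):

    # quick check: is pyspark importable with the current PYTHONPATH?
    try:
        import pyspark  # provided by the Apache Spark installation
        print("pyspark found at %s" % pyspark.__file__)
    except ImportError:
        print("pyspark not found - check SPARK_HOME and PYTHONPATH in your shell profile")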

Running the service

  1. Start MongoDB as a replica set by running
    mongod --replSet "rs0"
    in the terminal
  2. Connect the Mongo shell to the replica set by running
    mongo
  3. Initiate the replica set with
    rsconf = {
      _id: "rs0",
      members: [
        {
          _id: 0,
          host: "127.0.0.1:27017"
        }
      ]
    }
    rs.initiate(rsconf)
  4. Start Elasticsearch by running "elasticsearch" from the "bin" directory of your Elasticsearch installation
  5. Create a connection between MongoDB and Elasticsearch using
    mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -v
  6. If you are deploying Elasticsearch in staging/production you will probably not want to kill the process (Ctrl+C), so run mongo-connector as a background process to avoid losing the connection between your database and Elasticsearch when you end the SSH session:
    nohup mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager -v > /dev/null 2>&1 &
    For any queries about steps 1-6, refer to this blog: https://madhacker.me/use-mongodb-and-elasticsearch-2/
  7. Run "server.py"
    - If you get a "No module named pyspark" error, your Apache Spark is not configured correctly
    - If you get a "No module named {package name here}" error, just run "pip install {insert name here}" in the terminal; make sure pip is installing to the correct Python installation if you have more than one
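
Before running "server.py" it can help to confirm that MongoDB and Elasticsearch are actually listening on their default ports. A small standard-library sketch (assuming the default localhost ports used in the steps above):

    # check that MongoDB (27017) and Elasticsearch (9200) are reachable
    import socket

    def port_open(host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(2)
        try:
            sock.connect((host, port))
            return True
        except socket.error:
            return False
        finally:
            sock.close()

    for name, port in [("MongoDB", 27017), ("Elasticsearch", 9200)]:
        status = "up" if port_open("localhost", port) else "NOT reachable"
        print("%s on localhost:%d is %s" % (name, port, status))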

Using the service

Getting Started

Before you can predict what kind of data something is, you have to create semantic types and columns with data in them. The following diagram represents the relationship of the semantic types and columns in the service:

[data structure diagram]

I recommend using Swagger when trying the steps below for the first time, since it shows an explanation of each parameter right there. There is a section on using Swagger with this service below.
  1. Create semantic types using POST /semantic_types with the class and property you want for the semantic type. Note that the class must be a valid URL which also has a valid namespace (parent) URL. If you don't have a particular URL you want to use, just make one up; if it isn't valid you'll get a 400 response with a message saying it isn't valid.
  2. Create at least one column for each of the semantic types using the POST /semantic_types/{type_id} endpoint. You will need the semantic type's id for this; the id is returned when you create the type, and you can also get it using the GET /semantic_types endpoint. Although you can create as many columns in a semantic type as you want, prediction only returns the semantic type the service thinks the data belongs to, with no details about the column. You do not have to add data when first creating the column, but you do need to add data before predicting; if you create the column and add the data separately, you can use the POST /semantic_types/{column_id} endpoint to add the data later. When you give data to the service, remember that each line is taken as a value, including blank lines.
  3. Now that you have semantic types and columns with data, use the POST /predict endpoint to predict the semantic type of the data. Give the predict data in the same format as when adding data; the more you provide, the better. The response is a list where each element contains a semantic type id and how confident the semantic labeler is that that semantic type is the correct one for the given data, ranging from 0 to 1. A minimal end-to-end sketch of these three steps follows this list.
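
The following is a rough end-to-end sketch of the flow above using the requests package, with the service running on http://localhost:5000. The parameter and field names ("class", "property", "columnName") and the exact response formats are assumptions; confirm the real parameters in Swagger.

    import requests

    BASE = "http://localhost:5000"

    # 1. Create a semantic type (the class must be a valid URL with a valid namespace URL)
    resp = requests.post(BASE + "/semantic_types",
                         params={"class": "http://example.com/City",
                                 "property": "http://example.com/name"})
    type_id = resp.text.strip()  # creation returns the new type's id (exact response format may differ)

    # 2. Create a column in that semantic type; each line of the body is one value
    requests.post(BASE + "/semantic_types/" + type_id,
                  params={"columnName": "city_names"},
                  data="Los Angeles\nNew York\nChicago")

    # 3. Predict the semantic type of new data (same newline-separated format)
    resp = requests.post(BASE + "/predict", data="Boston\nSeattle")
    print(resp.json())  # list of semantic type ids with confidence scores between 0 and 1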

Example

Only one example of each step below is shown, since each is exactly the same, just with different data.

  1. Create a semantic type: [screenshot]
  2. Create a column: [screenshot]
  3. Predict semantic type: [screenshot]

Quick summary of endpoints

/predict

POST Use this for predicting the semantic type of data. Without this endpoint this whole service is basically useless.

/semantic_types

GET Returns all of the semantic types (and optionally the columns and data in the columns) in the system which match all of the given parameters.

POST Add a semantic type.

PUT Add a semantic type; if it already exists, remove the old one and all of its data, then make the new one.

DELETE Delete all semantic types (and all of their data) which match all of the given parameters.
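
For example, listing all semantic types or deleting a matching subset might look roughly like this; the query parameter names ("returnColumns", "class") are placeholders, so check Swagger for the real ones:

    import requests

    BASE = "http://localhost:5000"

    # list every semantic type, optionally asking for its columns as well
    all_types = requests.get(BASE + "/semantic_types",
                             params={"returnColumns": True}).json()

    # delete only the semantic types matching a given class URL
    requests.delete(BASE + "/semantic_types",
                    params={"class": "http://example.com/City"})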

/semantic_types/{type_id}

GET Return all of the columns (and optionally the data in the columns) in a semantic type which match all of the given parameters.

POST Create a column in a semantic type, optionally with data.

PUT Create a column in a semantic type, optionally with data; if it already exists, remove the old one and all of its data, then make the new one.

DELETE Delete all of the columns in a semantic type which match the given parameters.

/semantic_types/type/{column_id}

GET Returns all of the information and data about the column.

POST Append data to an existing column.

PUT Replace the data in an existing column.

DELETE Remove all of the data in the column.
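
A sketch of working with a single column via these endpoints, using the same newline-separated data format described in "Getting Started" (the column id is a placeholder):

    import requests

    BASE = "http://localhost:5000"
    column_url = BASE + "/semantic_types/type/" + "REPLACE_WITH_A_COLUMN_ID"

    print(requests.get(column_url).json())            # all information and data about the column
    requests.post(column_url, data="Houston\nMiami")  # append two more values
    requests.put(column_url, data="Denver\nAustin")   # replace the column's data entirely
    requests.delete(column_url)                       # remove all of the data in the column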

/bulk_add_models

GET Returns all of the bulk add models in the system which match all of the given parameters. This can also be used to check the status of how well the labeler is working (using the "learnedSemanticTypes" in the model).

POST Add a bulk add model. The semantic types and columns in this model will be created now.

DELETE Remove all of the bulk add models which meet all of the given parameters.

/bulk_add_models/{model_id}

GET Get the bulk add model. This is basically what you send when adding the model but it can have the learned semantic types updated.

POST Add bulk amounts of data to the service. This adds all of the data to the columns for you.
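
To check how well the labeler has learned after a bulk add, a rough sketch (the exact response layout around "learnedSemanticTypes" is an assumption):

    import requests

    BASE = "http://localhost:5000"
    model_id = "REPLACE_WITH_A_MODEL_ID"

    # fetch the bulk add model and inspect what the labeler has learned so far
    model = requests.get(BASE + "/bulk_add_models/" + model_id).json()
    print(model.get("learnedSemanticTypes"))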

Swagger

To view documentation for each of the endpoints and try it out with data, go to http://localhost:5000/api/spec.html#!/spec. For some reason it always starts with all of the endpoints hidden, so don't forget to click on "Show/Hide" or "List Operations". Here is approximately what it should look like after listing all of the endpoints:

[screenshot of the Swagger UI with all endpoints listed]
