Medical Wordlist

Medical wordlists in English, French, and Ukrainian that can be used for spell checking. The project is built from a collection of SPARQL queries run on the Wikidata Query Service.

Getting Started

You can find all medical words by language in the following files:

  • fr/wordlist.fr.txt : 23403 words
  • en/wordlist.en.txt : 171701 words
  • ua/wordlist.ua.txt : 5187 words

For example, the French wordlist fr/anatomical-structure.fr.txt is the result of running sparql/fr/anatomical-structure.fr.sparql on the Wikidata Query Service.

To contribute more medical keywords, open a pull request.

Contributions

Contributions are welcome to advance this project and, in the future, to build a comprehensive taxonomy of medicine (using wikidata-taxonomy). If you would like to contribute, please follow these steps:

  • Fork the repository.
  • Write your SPARQL query.
  • Add the new medical keywords to the appropriate language file(s) using launcher-sparql-query.py.
  • Update the general wordlist using generate-wordlist-txt.sh.
  • Create a pull request.

We appreciate any contributions to this project.

Scripts

The generate-wordlist-txt.sh script merges the keyword files in a given language folder (en, fr, ua) into a single list for spell checking. It sorts the keywords alphabetically and removes duplicates, producing one file of sorted, unique keywords. This is useful for consolidating several keyword sources into one reference list for spelling correction.
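The consolidation step can be sketched in Python as follows. This is not the project's shell script, only an illustration of the same merge, sort, and de-duplicate logic. The helper name merge_wordlists, the assumption that every *.txt file in the folder holds one keyword per line, and the output file name are all hypothetical.

```python
from pathlib import Path


def merge_wordlists(lang_dir: str, output_name: str) -> list[str]:
    """Merge every .txt keyword file in lang_dir into one sorted, de-duplicated list.

    Hypothetical helper: assumes one keyword per line in each *.txt file.
    """
    folder = Path(lang_dir)
    output_path = folder / output_name
    keywords: set[str] = set()  # a set drops duplicate keywords automatically
    for path in folder.glob("*.txt"):
        if path == output_path:
            continue  # skip the previous merged output so it is not re-read
        for line in path.read_text(encoding="utf-8").splitlines():
            word = line.strip()
            if word:  # ignore blank lines
                keywords.add(word)
    merged = sorted(keywords)  # Unicode code-point order, not locale-aware
    output_path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

For example, merge_wordlists("fr", "wordlist.fr.txt") would rebuild the French list. Python's sorted() orders by Unicode code point, so accented words may end up in a different order than with a locale-aware `sort -u` in a shell script.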

The launcher-sparql-query.py script executes a SPARQL query on the Wikidata Query Service and saves the results to a file. The user provides the path to the file containing the SPARQL query and the path to the output file. The script reads the query from the input file, sends it to the Wikidata Query Service, and retrieves the results in JSON format. If the query succeeds, the script extracts the desired data from the JSON results and writes it to the output file; otherwise, it prints an error message along with the response code.
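A minimal sketch of that flow using only the Python standard library is shown below. It is not the contents of launcher-sparql-query.py: the function names, the default `itemLabel` result variable, the lowercasing step, and the User-Agent string are assumptions. The endpoint https://query.wikidata.org/sparql and the SPARQL JSON result layout (`results["results"]["bindings"]`, each binding mapping a variable to a `value`) are those of the public Wikidata Query Service.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query: str) -> dict:
    """Send a SPARQL query to WDQS and return the decoded JSON response."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; this value is a placeholder.
            "User-Agent": "medical-wordlist-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_values(results: dict, variable: str) -> list[str]:
    """Collect one variable's values from SPARQL JSON results.

    Lowercasing here is an assumption, matching the lowercase wordlists.
    """
    values = []
    for binding in results["results"]["bindings"]:
        if variable in binding:  # optional variables may be unbound in a row
            values.append(binding[variable]["value"].lower())
    return values


def save_query_results(query_path: str, output_path: str, variable: str = "itemLabel") -> None:
    """Read a query file, run it, and write one value per line to output_path."""
    with open(query_path, encoding="utf-8") as f:
        query = f.read()
    try:
        results = run_sparql(query)
    except urllib.error.HTTPError as err:
        # Mirror the described behavior: report the failure with its response code.
        print(f"Query failed with response code {err.code}")
        return
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(extract_values(results, variable)) + "\n")
```

Splitting the network call from extract_values keeps the parsing testable offline against a saved JSON response.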

The update-sparql-query.sh script takes a -d command-line option specifying the directory that contains the SPARQL queries to execute. It runs launcher-sparql-query.py on each query and writes the results to files with a .txt extension. For example, to run the script on the SPARQL queries in the sparql/fr directory, use:

bash update-sparql-directory.sh -d sparql/fr

The output files will be generated in the current directory.
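The batch run could look roughly like this in Python. It is an illustrative equivalent of the shell script, not the script itself; it assumes launcher-sparql-query.py accepts the query path and the output path as two positional arguments, and the function names are hypothetical.

```python
import argparse
import subprocess
import sys
from pathlib import Path


def plan_outputs(query_dir: str, output_dir: str = ".") -> list[tuple[Path, Path]]:
    """Pair each .sparql file with a same-named .txt output file in output_dir."""
    pairs = []
    for query in sorted(Path(query_dir).glob("*.sparql")):
        # anatomical-structure.fr.sparql -> anatomical-structure.fr.txt
        pairs.append((query, Path(output_dir) / query.with_suffix(".txt").name))
    return pairs


def main() -> None:
    """Parse -d and run the launcher once per query file (hypothetical entry point)."""
    parser = argparse.ArgumentParser(description="Run every SPARQL query in a directory.")
    parser.add_argument("-d", dest="directory", required=True, help="directory of .sparql files")
    args = parser.parse_args()
    for query, output in plan_outputs(args.directory):
        # Assumes the launcher takes the query path, then the output path.
        subprocess.run(
            [sys.executable, "launcher-sparql-query.py", str(query), str(output)],
            check=False,  # keep going if one query fails; the launcher reports errors
        )
```

Called from a script run as `python run_queries.py -d sparql/fr` (a hypothetical file name), main() would write anatomical-structure.fr.txt and the other outputs to the current directory, matching the behavior described above.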

⚠️ Project in progress

  • The SPARQL queries are not yet in line with the project's goal: most of the words are still in the wordlist.en.txt file, whereas the aim is to classify them by SPARQL query rather than by word type.
  • This project does not originate from the medical field.
  • All keywords are in lowercase.

License

This project is licensed under the Do What The F*ck You Want To Public License.

Acknowledgments