Human Language Technology Notebooks for Lab sessions, Master Students
This GitHub repository contains the Python notebooks for the Introduction into Human Language Technology course at the Vrije Universiteit Amsterdam, Faculty of Humanities. The course targets Text Mining Master students and Research Master students. We assume you have some basic knowledge of Python.
The notebooks are grouped into 3 lab sessions:
- lab1: getting started, installing packages and learning to work with Natural Language Processing packages
- lab2: modeling the meaning of words: wordnets and distributional models
- lab3: machine learning for Natural Language Processing
During the first lab session you will get prepared to work with the large number of tools and datasets that are available for many languages. We want you to work on your own computer. A solid laptop and substantial disk space help. We are going to use the following applications and toolkits:
- The Terminal under Linux or Mac OS, the command line under Windows.
- Anaconda environment: https://anaconda.org, which also installs Python 3.7
- Jupyter notebooks: https://jupyter.org (you may also want to check out this: https://towardsdatascience.com/bringing-the-best-out-of-jupyter-notebooks-for-data-science-f0871519ca29)
To prepare for the course, make sure you have installed Anaconda and have familiarised yourself with the environment and with notebooks. We assume that you have basic programming skills in Python and that you can work on the command line. Please follow the instructions in the document "getting-started.pdf".
It is your own responsibility to have your skills up to date; without proper command line skills, you will not be able to attend the lab sessions.
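As a quick self-check before the first lab, something like the sketch below can confirm that the Python interpreter you are running is recent enough and that the notebook software can be found. It is only an illustration, not part of the course materials: the function name `check_environment` is hypothetical, and the assumption that Jupyter is importable under the module name `notebook` may not hold for every installation (newer setups may use `jupyterlab` instead).

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7), packages=("notebook",)):
    """Report the Python version and whether the given top-level modules can be found.

    `packages` holds module names to look up; "notebook" is an assumption
    about how Jupyter is installed and may need adjusting.
    """
    report = {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_version),
        "packages": {},
    }
    for name in packages:
        # find_spec returns None when a top-level module cannot be located
        report["packages"][name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    result = check_environment()
    print("Python", result["python_version"], "OK" if result["python_ok"] else "too old")
    for name, found in result["packages"].items():
        print(name, "found" if found else "NOT found")
```

Save it as a `.py` file and run it from the terminal with the Anaconda environment active, so that you test the same interpreter the notebooks will use.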
In addition to the above tools, you need a plain text editor to inspect text files. Note that this is not the same as Word or a PDF viewer. Word and PDF files are binary files that contain many other things besides the text. HTML files are plain text, but they too contain a lot of markup besides the actual content. Suitable editors are:
- Windows: Notepad++
- Mac OS/Linux: Atom
Tip: to find out the difference between proper text files and other types of documents, open one of the above text editors and load a Word, a PDF and an HTML file. You will see immediately that they contain a lot of material that is not text.
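The same check can be approximated in Python. The sketch below is a rough heuristic, not a definitive test: it reads the first few kilobytes of a file and treats it as plain text only if the sample has no NUL bytes and decodes as UTF-8. The function name `looks_like_plain_text` and the sample size are illustrative choices, and files in other encodings (for example Latin-1) may be misclassified.

```python
import codecs


def looks_like_plain_text(path, sample_size=4096):
    """Heuristically decide whether the file at `path` is UTF-8 plain text."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # NUL bytes practically never occur in plain text, but are common in binary formats
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end of the sample
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

An HTML file will normally pass this check, because it is plain text that happens to contain markup, whereas a Word document will normally fail it.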