Improving Open Data Quality using Python

This repo contains the complete materials for the tutorial session Improving Open Data Quality using Python, presented at the PyData Global 2023 conference.

Preparing the environment

First, we should create a Python virtual environment and install the required dependencies. To create the environment, run the following command:

python -m venv data-quality

Now, depending on your OS, run one of the following commands to activate the environment:

  • Linux/macOS
source data-quality/bin/activate
  • Windows
data-quality\Scripts\Activate.ps1

Finally, we can install the required dependencies:

pip install -r requirements.txt
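
If the install seems to land in the wrong place, it can help to confirm that the virtual environment is actually active. The following is a minimal sketch, assuming the environment was created with `venv` as above; the helper name `in_virtualenv` is hypothetical and not part of the tutorial materials.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "active" if in_virtualenv() else "not active"
    print(f"Virtual environment is {status} (prefix: {sys.prefix})")
```

Running this with the `python` found on your path after activation should report the environment as active and show a prefix ending in `data-quality`.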

Running the environment in Google Colab

You can also launch the notebook in Google Colab by performing the following steps:

  1. Open the Colab website: https://colab.research.google.com/
  2. File menu -> Open notebook
  3. Click on the GitHub tab
  4. Paste the following URL: https://github.com/elsatch/yData-Global-2023-Improving-Open-Data-Quality-using-Python.git
  5. Select the single_datasets.ipynb notebook
  6. Execute the Colab-specific cell at the beginning of the notebook
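
The contents of the Colab-specific cell are not reproduced here. As a rough illustration of how a notebook can tell whether it is running in Colab before doing any Colab-only setup, here is a minimal sketch. It assumes that the Colab runtime has already loaded the `google.colab` package when the kernel starts; the helper name `running_in_colab` is hypothetical.

```python
import sys


def running_in_colab() -> bool:
    """Best-effort check for a Google Colab runtime."""
    # Assumption: Colab imports the google.colab package at kernel start-up,
    # so it is already present in sys.modules. A local Jupyter kernel
    # normally does not have it loaded.
    return "google.colab" in sys.modules


if __name__ == "__main__":
    if running_in_colab():
        print("Running in Colab: execute the Colab-specific setup cell first.")
    else:
        print("Running locally: use the virtual environment described above.")
```

A check like this lets the same notebook run both locally and in Colab without manual edits, although the actual cell in the notebook may take a different approach.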