Thank you for your interest in contributing to our awesome lists! Please read through this document before you submit any pull requests or issues. It will help us work together more effectively.
To contribute, send us a pull request. Please review our general Guidelines for contributing and Style guide before you start.
This section describes the development environment setup and the workflow to follow when making changes. We follow a pre-defined Style guide for consistent code quality throughout the book and expect the same from our community contributors. You may need to check other chapters from other contributors as well for this step.
The awesome lists are structured into several parts.
- A SQLite database containing the actual data.
- A Directus CMS as a user-friendly interface for data management.
- For each list, there are:
  - a Jupyter Notebook `README.ipynb`, which queries the corresponding data from the SQLite database and generates the output as a table,
  - a Markdown file `README.md`, which renders the parsed records in an easier-to-access way.
Here is the latest ERD of the database.
The built-in Directus CMS is the recommended way to update the data. But if you prefer a more hands-on approach, feel free to use any preferred SQLite editor or access the database through any programming language.
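For example, here is a minimal Python sketch for reading the data directly. The database file name below is an assumption, so adjust it to wherever your local SQLite file lives:

```python
import sqlite3

# Assumed location of the awesome lists SQLite file; adjust to your local copy.
DB_PATH = "machine-learning/awesome/database/awesome.db"

with sqlite3.connect(DB_PATH) as conn:
    conn.row_factory = sqlite3.Row
    # The course table is referenced later in this guide; any other
    # column names you query are assumptions, so check the ERD above.
    for row in conn.execute("SELECT * FROM course LIMIT 5"):
        print(dict(row))
```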
The Directus instance is defined in `awesome/database/docker-compose.yml` following the official self-hosting guidance, including:
- the login credentials,
- SQLite database location,
- port,
- key and secret.
You can simply follow the steps below to launch the Directus instance locally.
- Set up Docker.
- `cd machine-learning/awesome/database`
- `docker compose up`
- Go visit `http://localhost:8055`.
- Update data through the Directus Data Studio App. Go through this official instruction if you want to learn how to use it.
Some notes:
- For the `source` field in the course table, kindly use the official website link of the course, not the link provided by the task.
- The `authorCount` field should be filled in only if the number of course authors exceeds three. In cases where there are three or fewer authors, leave this field unchanged.
- In the `source` field of the organization and user tables, please use the links pertaining to the organization or author, respectively. Do not use the course link in these fields.
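If you edit the database programmatically instead of through Directus, the same conventions apply. Below is a hedged Python sketch; apart from the `source` and `authorCount` fields and the course table named in the notes above, every path, id, and value is an assumption:

```python
import sqlite3

DB_PATH = "machine-learning/awesome/database/awesome.db"  # assumed path
course_id = "some-course-uuid"                            # hypothetical record id
author_count = 5                                          # example: five authors

with sqlite3.connect(DB_PATH) as conn:
    # `source` should point at the official course website, not the task's link.
    conn.execute(
        "UPDATE course SET source = ? WHERE id = ?",
        ("https://example.org/course", course_id),
    )
    # `authorCount` is filled in only when there are more than three authors;
    # with three or fewer, the field is left untouched.
    if author_count > 3:
        conn.execute(
            "UPDATE course SET authorCount = ? WHERE id = ?",
            (author_count, course_id),
        )
    conn.commit()
```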
You need nothing but a Jupyter Notebook environment to start the development for this step. You can either set up the environment locally or use any cloud-based solution like Google Colab. If you are using VSCode, please follow this.
For example, suppose you are adding some new content to the courses list.
- Launch JupyterLab or Jupyter Notebook as your IDE.
- Open `machine-learning/awesome/lists/courses/README.ipynb`.
- Rerun all the cells.
- If you want to update the output rendering logic in the Notebook or in `machine-learning/awesome/lists/lib` (a hedged sketch of such rendering code follows this list),
  - add newly introduced Python libraries if needed,
  - update the rendering code in Python.
- Go back to JupyterLab or Jupyter Notebook, restart the kernel, and rerun all the cells.
- Check the output `README.md`.
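The real rendering helpers live in `machine-learning/awesome/lists/lib`; the following is only a minimal, hypothetical sketch of what such Notebook code does (query the SQLite database and emit a Markdown table), not the actual library code. The database path and column names are assumptions:

```python
import sqlite3
import pandas as pd

DB_PATH = "../../database/awesome.db"  # assumed path relative to the Notebook

with sqlite3.connect(DB_PATH) as conn:
    # Column names here are illustrative; check the ERD for the real schema.
    df = pd.read_sql_query("SELECT name, source FROM course ORDER BY name", conn)

# Render the records as a Markdown table (requires the tabulate package);
# the later nbconvert run with --no-input keeps cell outputs like this one.
print(df.to_markdown(index=False))
```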
Now, you are ready to submit a PR for your changes. Please make sure you have gone through the above STEPs successfully first. Then,
- submit the PR; a SQLite database diff will be generated automatically by the GitHub Action,
- review the GitHub Action build log, and make sure only the intended database change is included,
- review the content of the `README.md`.
TBD
If your PR does not have any conflicts, you can proceed to the next step. However, if conflicts are present, follow the steps below to resolve them effectively.
- Access the GitHub Action page and locate the run for your PR (rerun it if necessary). Copy the output of the `Show substantial tables differences` step and save it as a .sql file.
- To reset your code to the original state, execute the following command in your command-line interface: `git reset --hard upstream/main`.
- Open SQLiteStudio and execute the .sql file you saved in the previous step.
- Once the changes have been successfully applied, push the modified database to the appropriate origin branch using the following command: `git push origin branch-name`.
Note:
- Remember not to include the .sql file in your PR. You can safely delete it after executing the file to resolve the conflicts.
- Please make sure that all your operations occur in the `machine-learning` directory, not the `machine-learning/awesome/database` directory.
In short:
- copy the sqldiff output,
- create the .sql file,
- execute the .sql file in SQLiteStudio.
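If you prefer the command line over SQLiteStudio, an equivalent hedged Python sketch follows; the database path and the .sql file name are assumptions:

```python
import sqlite3

DB_PATH = "machine-learning/awesome/database/awesome.db"  # assumed database path
DIFF_SQL = "diff.sql"  # the .sql file saved from the GitHub Action output

with open(DIFF_SQL, encoding="utf-8") as f:
    script = f.read()

with sqlite3.connect(DB_PATH) as conn:
    # Apply the saved sqldiff statements against the reset database,
    # reproducing the intended changes without the conflict.
    conn.executescript(script)
```

Remember that the .sql file itself must not end up in the PR; delete it once the conflict is resolved.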
The awesome lists SQLite database schema is managed by Knex migrations. The scripts are in `machine-learning/awesome/lists/`, including:
- the definition of all the awesome lists entities,
- the minimal manipulation of the Directus system tables.
To update the awesome lists data schema,
- `cd machine-learning/awesome/`
- `npm install` (you need to set up the Node.js environment first)
- `cd database`
- `knex migrate:make {VERSION_NUMBER}_{DESCRIPTION}`
- Update the generated migration file.
- If a new entity is introduced, please follow [2, 2] to update the Directus `directus_fields` table so that Directus could,
  - automatically create the `id` column as a `uuid`,
  - automatically update the `createdAt` column and the `updatedAt` column.
- `knex migrate:latest`
- Verify your changes by using Directus, a SQLite editor, or any other way you prefer (see the sketch after this list).
- Update the database ERD in the Notes for contributors section if needed, using DBVisualizer.
- Submit PR, a SQLite database diff will be generated automatically by the GitHub action.
- Review the GitHub Action build log, and make sure only intended change is included.
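As one way to verify the migration outside Directus, here is a small Python check. It assumes the default Knex migrations table name and an assumed database path; the inspected table is only an example:

```python
import sqlite3

DB_PATH = "machine-learning/awesome/database/awesome.db"  # assumed path

with sqlite3.connect(DB_PATH) as conn:
    # List applied migrations (Knex's default bookkeeping table is knex_migrations).
    print(conn.execute("SELECT name FROM knex_migrations ORDER BY id").fetchall())
    # Inspect the columns of an example table to confirm the schema change landed.
    print(conn.execute("PRAGMA table_info(course)").fetchall())
```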
TBD
nbconvert is used to generate the final Markdown file from the Jupyter Notebook content. You can set it up by following this, then simply run the command below.
`jupyter nbconvert README.ipynb --no-input --to markdown --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags remove_cell`
There are plenty of tools for editing a CSV file. If you prefer using VSCode, there are plugins like Rainbow CSV and Edit csv (recommended) to help you out.
You can use any programming language or available online tools to generate a UUID. If you prefer using VSCode, you can choose one of the many plugins or use `uuidgen` from the built-in console.
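For instance, in Python:

```python
import uuid

# Generate a random (version 4) UUID, e.g. for a new record's id field.
print(str(uuid.uuid4()))
```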
You can always generate an ISO timestamp programmatically.
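For example, in Python:

```python
from datetime import datetime, timezone

# ISO 8601 timestamp in UTC, e.g. for the createdAt / updatedAt fields.
print(datetime.now(timezone.utc).isoformat())
```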