Skills Extractor

This codebase extracts people's skills from a CSV file and turns them into tags. The tags are then used to build a skill taxonomy, and tags from that taxonomy are assigned to the employees listed in the CSV file.

Explanation of the files

cluster_skills.py - Contains the parallelized clustering algorithm used to broaden the skill taxonomy. A low n_clusters gives a more general taxonomy; a high n_clusters gives a more specific one.
utils.py - Main file; contains the logic that generates skills_taxonomy.txt and individual_skills.csv.
app.py
individual_skills.csv - A dataframe with 2 columns, Name and Skills, for every employee.
skills_taxonomy.txt - List of skills generated from the initial dataset after clustering.
postprocessing.py - Use this when you need a more refined output, i.e., a broader or a more specific skill taxonomy. It generates individual_skills_refined.csv and skills_taxonomy_refined.txt.
individual_skills_refined.csv - Same format as individual_skills.csv; produced by running postprocessing.py.
skills_taxonomy_refined.txt - Same format as skills_taxonomy.txt; produced by running postprocessing.py.
logs.txt - Logs from an example run of utils.py.
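
To make the effect of n_clusters concrete, here is a minimal sketch of how the number of clusters trades generality for specificity. It is not the algorithm in cluster_skills.py: it is a toy, single-process stand-in that merges skill names by plain string similarity, and the function names and sample skills are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for whatever the real code uses."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_similarity(c1: list[str], c2: list[str]) -> float:
    """Average pairwise similarity between two clusters (average linkage)."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_skills(skills: list[str], n_clusters: int) -> list[list[str]]:
    """Greedily merge the two most similar clusters until n_clusters remain."""
    if not 1 <= n_clusters <= len(skills):
        raise ValueError("n_clusters must be between 1 and len(skills)")
    clusters = [[skill] for skill in skills]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with i < j, so deleting j is safe.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical sample data, not taken from the repository.
SAMPLE_SKILLS = [
    "Python programming",
    "Python scripting",
    "Project management",
    "Program management",
    "SQL databases",
    "NoSQL databases",
]
```

Calling cluster_skills(SAMPLE_SKILLS, 2) leaves two large groups (a broad taxonomy), while cluster_skills(SAMPLE_SKILLS, 5) merges only the single closest pair (a specific taxonomy).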

Instructions to run

Create a CSV file with a column named "Skill Sets" that contains each employee's skills described in natural language.
Create a .env file that defines the OPENAI_API_KEY environment variable. (The number of API calls equals the number of rows in your CSV.)
Create a virtual environment: python3 -m venv venv
Activate the environment (e.g. source venv/bin/activate on Linux/macOS) and install the dependencies: pip install -r requirements.txt
Run python3 utils.py > logs.txt 2>&1 (this writes both standard output and errors to logs.txt).
To get a more refined output, run python3 postprocessing.py --n_clusters 100
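
As a sketch of the first two steps, the snippet below writes a small input CSV and counts its rows to estimate how many API calls a run would make. Only the "Skill Sets" column name comes from the instructions above; the "Name" column, the sample rows, and both function names are assumptions made for illustration.

```python
import csv

REQUIRED_COLUMN = "Skill Sets"  # column name required by the instructions


def write_sample_csv(path: str) -> None:
    """Write a tiny example input file (hypothetical rows and 'Name' column)."""
    rows = [
        {"Name": "Alice", REQUIRED_COLUMN: "I build data pipelines in Python and SQL."},
        {"Name": "Bob", REQUIRED_COLUMN: "Frontend work with React; some UX research."},
        {"Name": "Carol", REQUIRED_COLUMN: "Project planning and stakeholder management."},
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", REQUIRED_COLUMN])
        writer.writeheader()
        writer.writerows(rows)


def count_api_calls(path: str) -> int:
    """Return the number of data rows, i.e. the expected number of API calls."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if REQUIRED_COLUMN not in (reader.fieldnames or []):
            raise ValueError(f'CSV is missing the "{REQUIRED_COLUMN}" column')
        return sum(1 for _ in reader)
```

For example, after write_sample_csv("employees.csv"), count_api_calls("employees.csv") returns 3, one call per employee row. Checking the row count this way before a run can help gauge API usage for large files.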

Streamlit application link

The application is deployed at this link: Skill Extractor UI

FormulaMonks/skills_extractor
