Python Vacancies Scraper on DOU and Data Analysis

Project Description

This project scrapes Python job vacancies from the DOU website and analyzes the collected data. For each vacancy, it records:

Job Title
City
Salary
List of technologies mentioned in the job description

The scraper lives in vacancies_scraping/parse.py. Running it creates a technologies.csv file in the data directory, which holds the collected data for further analysis.
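As a rough illustration of the record described above, the following sketch shows one way a vacancy could be matched against a technology list and written out as a CSV row. It is not the code in parse.py: the `Vacancy` class, the `KNOWN_TECHNOLOGIES` list, the column names, the salary format, and the `;` separator are all assumptions made for the example.

```python
import csv
import io
import re
from dataclasses import dataclass, field

# Hypothetical keyword list; the real scraper may track a different set.
KNOWN_TECHNOLOGIES = ["Python", "Django", "Flask", "FastAPI", "PostgreSQL", "Docker", "AWS"]


@dataclass
class Vacancy:
    """One scraped vacancy (field names are assumptions for this sketch)."""
    title: str
    city: str
    salary: str
    technologies: list = field(default_factory=list)


def find_technologies(description):
    """Return the known technologies mentioned in a description, in list order."""
    found = []
    for tech in KNOWN_TECHNOLOGIES:
        # The lookarounds reject matches that are part of a longer word.
        pattern = rf"(?<!\w){re.escape(tech)}(?!\w)"
        if re.search(pattern, description, re.IGNORECASE):
            found.append(tech)
    return found


def write_vacancies(vacancies, out):
    """Write vacancies as CSV rows to an open text file or buffer."""
    writer = csv.writer(out)
    writer.writerow(["title", "city", "salary", "technologies"])
    for vacancy in vacancies:
        writer.writerow(
            [vacancy.title, vacancy.city, vacancy.salary, ";".join(vacancy.technologies)]
        )


# Example with made-up data, written to an in-memory buffer instead of a file.
description = "We need a Python developer with Django, PostgreSQL and Docker experience."
vacancy = Vacancy("Python Developer", "Kyiv", "$2000-3000", find_technologies(description))
buffer = io.StringIO()
write_vacancies([vacancy], buffer)
csv_text = buffer.getvalue()
```

Joining the technologies into a single cell keeps one row per vacancy; the analysis step can split that cell again when counting mentions.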

Project Structure

vacancies_scraping/parse.py: The file for scraping data from the DOU website.
data/: Directory where the collected data (technologies.csv) and generated graphs in PNG format are stored.
data_analysis/main.ipynb: Jupyter Notebook that performs data analysis:
- Finds the top 10 most frequently mentioned technologies.
- Creates a pie chart of all technologies and their frequency of mentions.
- Creates a bar chart of the average minimum salary by city.
- Creates a bar chart of the average maximum salary by city.
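The aggregations behind these charts can be sketched in a few lines. The notebook relies on pandas and matplotlib; the example below deliberately uses only the standard library so that it runs without dependencies, and it skips plotting. The row layout, the key names (`salary_min`, `salary_max`), and the sample values are assumptions, not the actual contents of technologies.csv.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows; real data would be loaded from data/technologies.csv.
rows = [
    {"city": "Kyiv", "salary_min": 2000, "salary_max": 3000,
     "technologies": ["Python", "Django", "Docker"]},
    {"city": "Lviv", "salary_min": 1500, "salary_max": 2500,
     "technologies": ["Python", "Flask"]},
    {"city": "Kyiv", "salary_min": 3000, "salary_max": 4000,
     "technologies": ["Python", "Docker", "AWS"]},
]


def top_technologies(rows, n=10):
    """Count technology mentions across all rows and return the n most common."""
    counts = Counter(tech for row in rows for tech in row["technologies"])
    return counts.most_common(n)


def average_by_city(rows, key):
    """Average the numeric column `key` for each city."""
    values = defaultdict(list)
    for row in rows:
        values[row["city"]].append(row[key])
    return {city: mean(numbers) for city, numbers in values.items()}


top_two = top_technologies(rows, n=2)
avg_min = average_by_city(rows, "salary_min")
avg_max = average_by_city(rows, "salary_max")
```

With pandas, the same steps would typically be expressed with `value_counts` for the mention counts and `groupby` plus `mean` for the per-city averages.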

Technologies Used

aiohttp: For asynchronous requests to web pages.
BeautifulSoup: For parsing the HTML content of the pages.
Selenium: For interacting with the pages and loading all job vacancies on the website.
pandas: For data processing and analysis.
numpy: For numerical computations.
matplotlib: For creating visualizations.

How to Run the Project

Clone the repository:

git clone <repository-url>
cd py-scraping-and-data-analysis

Set up the virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the dependencies:

pip install -r requirements.txt

Scraping Vacancies:

From the repository root, run the parse.py file:

python vacancies_scraping/parse.py

After the script completes, the technologies.csv file will be created in the data/ directory.

Data Analysis:

Open data_analysis/main.ipynb in Jupyter Notebook or any other environment that supports Jupyter Notebooks. Execute all cells in the notebook to perform data analysis. The resulting graphs will be saved in the data/ directory in PNG format.

Results

technologies.csv: A file containing the collected data on Python job vacancies from DOU.
A pie chart of top 10 technologies by their frequency of mentions.
A bar chart of all technologies and their frequency of mentions.
A bar chart of the average minimum salary by city.
A bar chart of the average maximum salary by city.

Notes

Ensure you have a web driver for Selenium installed (e.g., ChromeDriver) that matches the version of your browser.
Depending on your internet connection speed and system performance, some parameters (e.g., page load wait times) in the scripts may need adjustment for successful execution.
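One way to make such wait times easy to adjust is to read them from an environment variable and poll until a condition holds. The sketch below illustrates that idea in plain Python; it is not taken from the project's scripts, and the `PAGE_LOAD_TIMEOUT` variable and both helper names are hypothetical. Selenium ships its own explicit-wait mechanism, which would normally be used in place of a hand-written loop.

```python
import os
import time


def read_timeout(env_var="PAGE_LOAD_TIMEOUT", default=10.0):
    """Read a positive timeout in seconds from the environment, else use the default."""
    raw = os.environ.get(env_var)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def wait_until(condition, timeout, poll_interval=0.5):
    """Call `condition` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Example: a condition that becomes true on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_until(page_ready, timeout=read_timeout(), poll_interval=0.01)
```

On a slow connection, the timeout can then be raised without editing the code, for example by setting `PAGE_LOAD_TIMEOUT=30` before running the script.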

SuskyiVolodymyr/py-scraping-and-data-analysis

Folders and files

Latest commit

History

Repository files navigation

Python Vacancies Scraper on DOU and Data Analysis

Project Description

Project Structure

Technologies Used

How to Run the Project

Scraping Vacancies:

Data Analysis:

Results

Notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages