
Rakeshkraki/web-scraper_theverge.com


Web Scraper for The Verge Articles

This is a Python script that scrapes articles from The Verge website and stores the information in a CSV file and an SQLite database. The script is designed to be run daily to fetch new articles and avoid duplicates.

How to Use

Install the required Python packages:

- requests
- beautifulsoup4

sqlite3 is part of the Python standard library and does not need to be installed separately.

Update the url variable to point to the desired website.

Run the script using python scraper.py.

The script will create a CSV file with today's date in the format ddmmyyyy_verge.csv and store the article information in an SQLite database.
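The file naming can be illustrated with a minimal sketch. The function names csv_filename and write_articles and the column header row are assumptions made for illustration; only the ddmmyyyy_verge.csv pattern and the four fields come from this README.

```python
import csv
from datetime import date


def csv_filename(day: date) -> str:
    """Build a name following the ddmmyyyy_verge.csv pattern."""
    return day.strftime("%d%m%Y") + "_verge.csv"


def write_articles(path: str, rows) -> None:
    """Write (url, headline, author, date) tuples to a CSV file.

    The header row is an assumption; scraper.py may use different column names.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "headline", "author", "date"])
        writer.writerows(rows)
```

Calling csv_filename(date.today()) gives the file name for the current run, which can then be passed to write_articles.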

Files

- scraper.py: the main Python script that scrapes the articles and stores them in a CSV file and an SQLite database.
- theverge.db: the SQLite database file where the article information is stored.
- read.md: the documentation file that explains how to use the script.

Functionality

The script performs the following tasks:

- Defines the website URL and page headers.
- Creates a connection to an SQLite database and creates a table for the articles if it doesn't already exist.
- Sends a GET request to the website and uses Beautiful Soup to parse the HTML content of the page.
- Finds all the article elements on the page and loops through each one, extracting the relevant information (URL, headline, author, and date) and storing it in both a CSV file and the SQLite database.

The CSV file is named using today's date, and the SQLite database is updated with the new articles. The INSERT OR IGNORE statement keeps duplicates out of the database, so the script can be run daily to fetch new articles without storing the same article twice.

Dependencies

- Python 3.x
- requests
- beautifulsoup4
- sqlite3 (included with Python)
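The duplicate handling described under Functionality can be sketched roughly as follows. The table name articles, the column names, the choice of url as the primary key, and the helper save_articles are assumptions for illustration; the README only states that INSERT OR IGNORE is used. Note that INSERT OR IGNORE only skips a row when it would violate a uniqueness constraint, so some unique column is needed for it to have any effect.

```python
import sqlite3


def save_articles(conn: sqlite3.Connection, articles) -> int:
    """Insert (url, headline, author, date) tuples, skipping known URLs.

    Returns the number of rows actually added.
    """
    # Assumed schema: the URL identifies an article, so it is the primary key.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               url TEXT PRIMARY KEY,
               headline TEXT,
               author TEXT,
               date TEXT
           )"""
    )
    before = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    # Rows whose url already exists are silently ignored.
    conn.executemany(
        "INSERT OR IGNORE INTO articles (url, headline, author, date) "
        "VALUES (?, ?, ?, ?)",
        articles,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    return after - before
```

With conn = sqlite3.connect("theverge.db"), running save_articles twice on the same list should add rows only the first time, which is what makes repeated daily runs safe.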
