arjunvenkat7/Web-Scraping

Web-Scraping

Repository for web scraping based projects

Project 1: Scraping news data

The main idea is to scrape headlines from the news websites wire.in and firstpost.com. Wire places no stated restrictions on scraping, since its robots.txt was missing. Firstpost has certain restrictions, but this project does not access the restricted areas. I scrape all the headlines and store them in a notepad (plain text) file. One reason for doing this is to read the news without advertisements and annoying pop-ups. Planned updates include adding news from newsminute.com and quint.in. Once the scraping and data handling are done, I will use NLP to analyse the sentiment of the published news, website by website. I can also visually compare the priority each website gives to news articles. See the file News_articles.py for the code and News data for the output obtained.
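The two steps described above (checking robots.txt rules, then pulling out headlines) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in News_articles.py. The sample robots.txt text, the sample HTML, and the assumption that headlines sit in `<h2>` tags are all hypothetical; real pages will need their own selectors.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url.

    An empty robots_txt (for example, when the file is missing) yields
    no rules, so every URL is treated as allowed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> element (assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_headline = True
            self._parts = []

    def handle_data(self, data):
        if self._in_headline:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            # Collapse runs of whitespace inside the headline text.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html):
    """Return the list of headline strings found in html."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.headlines


# Hypothetical inputs, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_HTML = """
<html><body>
<div class="ad">Buy now!</div>
<h2><a href="/a">First   headline</a></h2>
<h2>Second headline</h2>
</body></html>
"""

if is_allowed(SAMPLE_ROBOTS, "https://example.com/news/"):
    for headline in extract_headlines(SAMPLE_HTML):
        print(headline)
```

Because only the headline elements are collected, advertisement markup such as the `div` in the sample is dropped, which matches the goal of reading the news without ads. The printed lines could equally be written to a text file.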

(Will update when new changes are made)

Project 2: Scraping the menu of a local pizza store to analyse the toppings used

This is a new web scraping project that I am currently working on. I am analysing toppings data from the website of a local pizza store in Tampere, Finland. Although I use Selenium actions to change the language, the output is still in Finnish; I am looking into the issue. After fetching the menu, I process the text to extract only the toppings. Based on my preliminary analysis,

Top three most commonly used toppings:

  • Kinkku - Ham - used in 15 different pizzas
  • Aurajuusto - Blue Cheese - used in 12 different pizzas
  • Katkarapu - Shrimp - used in 11 different pizzas

Least commonly used toppings:

Chilli pepper and banana are each used in only one pizza. Note that there is additional data beyond the list above, which I will add in the next version of the code. For further information, see the file pizza.py.
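The topping counts above come from tallying how many pizzas each topping appears in. A minimal sketch of that kind of tally is shown below, assuming each menu entry is a comma-separated string of toppings. The menu here is a made-up placeholder (not the store's real menu), and `count_toppings` is a hypothetical helper, not a function from pizza.py. The small translation table uses only the three Finnish terms listed above.

```python
from collections import Counter

# Hypothetical menu for illustration: pizza name -> comma-separated toppings.
SAMPLE_MENU = {
    "Pizza A": "kinkku, aurajuusto",
    "Pizza B": "kinkku, katkarapu",
    "Pizza C": "Kinkku, aurajuusto, katkarapu",
    "Pizza D": "kinkku",
}

# Finnish -> English, limited to the terms given in this README.
TRANSLATIONS = {
    "kinkku": "ham",
    "aurajuusto": "blue cheese",
    "katkarapu": "shrimp",
}


def count_toppings(menu):
    """Count how many pizzas each topping appears in."""
    counts = Counter()
    for toppings in menu.values():
        # Normalise case and whitespace; a set ensures a topping is
        # counted at most once per pizza.
        names = {t.strip().lower() for t in toppings.split(",") if t.strip()}
        counts.update(names)
    return counts


counts = count_toppings(SAMPLE_MENU)
for name, n in counts.most_common(3):
    english = TRANSLATIONS.get(name, "untranslated")
    print(f"{name} ({english}): used in {n} pizzas")
```

Translating through a lookup table after scraping is one way to work around output that stays in Finnish, though it only covers terms that have been added to the table by hand.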

(Will update when new changes are made)
