Repository for web scraping projects
The main idea here is to scrape headlines from the news websites wire.in and firstpost.com. Wire places no restrictions on scraping, since its robots.txt was missing. Firstpost has certain restrictions, but this project does not access those areas. I scrape all the headlines and store them in a Notepad text file. One reason for doing this is to read the news without advertisements and annoying pop-ups. Planned updates include adding news from newsminute.com and quint.in. Once I am done with scraping and handling the data, I will use NLP to analyse the sentiment of the published news, website by website. I can also visually compare the priority each website gives to its news articles. See News_articles.py and News data for the output obtained.
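The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the code in News_articles.py: it assumes headlines sit inside `<h2>` tags (the real sites may use different markup), and the names `HeadlineParser`, `extract_headlines`, `is_allowed`, and `save_headlines` are hypothetical. It uses only the Python standard library and works on an HTML string, so fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class HeadlineParser(HTMLParser):
    """Collects the text of every <h2> element (assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self._parts = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_headline = True
            self._parts = []

    def handle_data(self, data):
        if self._in_headline:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html):
    """Return the headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content; empty content allows everything."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def save_headlines(headlines, path):
    """Write one headline per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(headlines) + "\n")


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    sample = "<h2><a href='/a'>First headline</a></h2><p>ad</p><h2>Second headline</h2>"
    for headline in extract_headlines(sample):
        print(headline)
```

Working on a string keeps the parsing logic independent of how the page is downloaded, so the same function could be reused when more websites are added.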
I have added a new web scraping project that I am currently working on: analysing toppings data from the website of a local pizza store in Tampere, Finland. Although I use Selenium actions to change the language, the output is still in Finnish. I am looking into the issue. After fetching the menu, I perform some manipulations to extract only the toppings. Based on my preliminary analysis:
- Kinkku (Ham): used in 15 different pizzas
- Aurajuusto (Blue Cheese): used in 12 different pizzas
- Katkarapu (Shrimp): used in 11 different pizzas
Chilli Pepper and Banana are each used in only one pizza. Note that there is more data beyond the above list, which I will add in the next version of the code. For further information, see the file pizza.py.
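The topping counts above can be produced with a simple tally once the menu has been fetched. The sketch below is an assumption about how such a count might look, not the code in pizza.py: the function name `count_toppings`, the menu structure (pizza name mapped to a list of toppings), and the sample menu are all made up for illustration. Only the three Finnish-to-English translations come from the list above.

```python
from collections import Counter

# Translations taken from the list above; extend as more toppings are identified.
TRANSLATIONS = {
    "kinkku": "Ham",
    "aurajuusto": "Blue Cheese",
    "katkarapu": "Shrimp",
}


def count_toppings(menu):
    """Count in how many pizzas each topping appears.

    `menu` maps a pizza name to its list of toppings (hypothetical structure).
    Names are normalised to lowercase, and a topping listed twice on the same
    pizza is counted once.
    """
    counts = Counter()
    for toppings in menu.values():
        counts.update({topping.strip().lower() for topping in toppings})
    return counts


if __name__ == "__main__":
    # Made-up sample menu, not the store's real data.
    sample_menu = {
        "Pizza A": ["Kinkku", "Aurajuusto"],
        "Pizza B": ["Kinkku", "Katkarapu"],
        "Pizza C": ["Kinkku"],
    }
    for topping, total in count_toppings(sample_menu).most_common():
        english = TRANSLATIONS.get(topping, "?")
        print(f"{topping} ({english}): used in {total} pizzas")
```

Keeping the translations in a dictionary is one way to get English output while the Selenium language switch is still being sorted out.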