Maksym-UA/Scrapy_products

Product finder with Scrapy

This project is an example of crawling a website with the Scrapy framework.

First, create a folder to hold the whole project and change into it. Working inside a virtual environment is recommended:

pip install virtualenv

Install Scrapy with

pip install Scrapy

Note that, depending on your operating system, this may require resolving compilation issues for some Scrapy dependencies, so be sure to check the platform-specific installation notes in the Scrapy documentation.

Now you are ready to start the project:

scrapy startproject yourproject

This creates a directory with the following structure:

yourproject/
    scrapy.cfg            # deploy configuration file

    yourproject/          # project's Python module, you'll import your code from here
        __init__.py

        items.py          # project items definition file

        middlewares.py    # project middlewares file

        pipelines.py      # project pipelines file

        settings.py       # project settings file

        spiders/          # a directory where you'll later put your spiders
            __init__.py

To run the spider, go to the project's top-level directory and run:

scrapy crawl dior_spider -o links.csv

or, depending on your preferences and future tasks:

scrapy crawl dior_spider -o goods.json 

The results are saved to the goods.json file (included in this repository). Further analysis can be done with the Pandas library.
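
As a starting point for that analysis, the exported file can be loaded into a DataFrame. This is a minimal sketch: it assumes goods.json is the JSON array of records that `scrapy crawl ... -o goods.json` writes, and the helper names load_goods and summarize are hypothetical. It makes no assumption about which fields the file contains.

```python
import pandas as pd


def load_goods(path="goods.json"):
    """Load the JSON array of scraped items into a DataFrame."""
    return pd.read_json(path)


def summarize(df):
    """Return the row count, the column names and the missing values per column."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),
    }
```

Calling summarize(load_goods()) from the project's top-level directory gives a quick overview of what was scraped before any deeper analysis.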

The best way to learn how to extract data with Scrapy is to try out selectors in the Scrapy shell:

scrapy shell 'URL'

CONTACT

Please send your feedback to

  [email protected]
