This project shows an example of website crawling with the Scrapy framework.
First of all, create a folder that will hold your whole project, then enter it. It is recommended to work inside a virtual environment; install virtualenv with
pip install virtualenv
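For example, the environment can then be created and activated like this (the name venv is arbitrary):
virtualenv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate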
Install Scrapy with
pip install Scrapy
Note that installation may sometimes require solving compilation issues for some Scrapy dependencies, depending on your operating system, so be sure to check the platform-specific installation notes.
Now you are ready to start the project:
scrapy startproject yourproject
This creates a directory with the following structure:
yourproject/
    scrapy.cfg            # deploy configuration file
    yourproject/          # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py
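This project's spider, dior_spider, lives in the spiders/ directory. As a rough sketch of what such a spider might look like (the start URL and CSS selectors below are illustrative assumptions, not the repository's actual code):

import scrapy


class DiorSpider(scrapy.Spider):
    # The name is what `scrapy crawl` refers to on the command line.
    name = "dior_spider"
    # Hypothetical start page; the real spider targets the actual catalogue URL.
    start_urls = ["https://www.dior.com/"]

    def parse(self, response):
        # Hypothetical selectors: yield one item per product block on the page.
        for product in response.css("div.product"):
            yield {
                "name": product.css("h2::text").get(),
                "link": response.urljoin(product.css("a::attr(href)").get()),
            }
        # Follow pagination links, if any, and parse them the same way.
        for href in response.css("a.next-page::attr(href)").getall():
            yield response.follow(href, callback=self.parse)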
To put this spider (strictly speaking, a crawler) to work, go to the project's top-level directory and run:
scrapy crawl dior_spider -o links.csv -t csv
or, depending on your preferences and future tasks:
scrapy crawl dior_spider -o goods.json
Results will be saved to the goods.json file (added to the repository). Further analysis can be performed with the help of the Pandas library.
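For instance, the exported items can be loaded into a DataFrame like this (the field names name and link are assumptions about the export's schema):

import pandas as pd

# Scrapy's JSON exporter writes a single JSON array, so read_json handles it directly.
df = pd.read_json("goods.json")
print(df.head())
print(df["link"].nunique())  # e.g. how many distinct links were collected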
The best way to learn how to extract data with Scrapy is to try selectors in the Scrapy shell:
scrapy shell 'URL'
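Inside the shell, a response object for the fetched page is available, so selectors can be tested interactively, for example (the selectors here are illustrative):

>>> response.css("title::text").get()
>>> response.xpath("//a/@href").getall()
>>> view(response)  # open the fetched page in your browser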
Please send your feedback to