A high-performance, easy-to-use, multithreaded command-line tool that downloads images from a given webpage.
You can also download using pip:
$ pip install ImageScraper
Note that ImageScraper depends on lxml, requests, setproctitle, and future. It also depends on pyThreadpool, which can temporarily be downloaded and installed from here. If you run into problems compiling lxml through pip, install the libxml2-dev and libxslt-dev packages on your system.
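For example, on a Debian/Ubuntu-based system the development packages mentioned above can be installed with apt-get before retrying the pip install (adjust the package manager for your distribution):
$ sudo apt-get install libxml2-dev libxslt-dev
$ pip install lxml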
$ image-scraper [OPTIONS] URL
You can also use it in your Python scripts.
import image_scraper
image_scraper.scrape_images(URL)
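A slightly fuller sketch, assuming nothing beyond the scrape_images call shown above (the URL is the test page used in the command-line examples later in this README, with an explicit http:// scheme added):
import image_scraper

# Test page used in the command-line examples below.
URL = "http://ananth.co.in/test.html"

# Downloads the images found on the page; by default they are saved to a
# new folder created inside the current working directory (see below).
image_scraper.scrape_images(URL)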
-h, --help Print help
-m, --max-images <number> Maximum number of images to be scraped
-s, --save-dir <path> Name of the folder to save the images
-g, --injected Scrape injected images
--formats [FORMATS [FORMATS ...]] Specify the formats of images to be scraped
--min-filesize <size> Minimum image size in bytes (default: 0)
--max-filesize <size> Maximum image size in bytes (default: 100000000)
--dump-urls Print the URLs of the images
--scrape-reverse Scrape the images in reverse order
--proxy-urls Use the specified HTTP/HTTPS proxy
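For instance, combining a few of the flags above on the test page used in the examples below, to scrape at most 5 JPG or PNG images and also print their URLs:
$ image-scraper -m 5 --dump-urls ananth.co.in/test.html --formats jpg png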
Extract the contents of the tar file.
$ cd ImageScraper/
$ python setup.py install
$ image-scraper --max-images 10 [url to scrape]
Scrape all images
$ image-scraper ananth.co.in/test.html
Scrape at most 2 images
$ image-scraper -m 2 ananth.co.in/test.html
Scrape only GIFs and download them to the folder ./mygifs
$ image-scraper -s mygifs ananth.co.in/test.html --formats gif
By default, a new folder called "images_" will be created in the working directory, containing all the downloaded images.