u3

u3 is a scraper and feeder for the univizor project.

Supported scrapers

Scraper  Homepage                          State
rul      repozitorij.uni-lj.si             Done
dkum     dk.um.si                          Done
bf       digitalna-knjiznica.bf.uni-lj.si  Done
famnit   famnit.upr.si                     Done
ung      sabotin.ung.si                    Done

Running with Docker Compose

docker-compose run u3 bf -a categories=biologija -L INFO

If you need to rebuild the image:

docker build -t univizor/u3:latest .

Some crawling options can be seen in refresh.sh.

Running natively

Please read NATIVE.md.

Scripts and tools

  • refresh.sh - Starts the scrapers in parallel. New items are added to the collection. Run this script periodically via cron.
  • recreate_database.py - Drops all existing tables and creates new tables with the up-to-date structure.
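
To illustrate what running the scrapers in parallel can look like, here is a minimal Python sketch. It is not the contents of refresh.sh. The helper names build_command and run_all are hypothetical, and the sketch assumes every scraper can be started the same way as the bf example above, with the scraper name as the first argument.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Scraper names taken from the "Supported scrapers" table.
SCRAPERS = ["rul", "dkum", "bf", "famnit", "ung"]


def build_command(scraper, log_level="INFO"):
    """Build a docker-compose command line for one scraper (assumed invocation)."""
    return ["docker-compose", "run", "u3", scraper, "-L", log_level]


def run_all(scrapers=SCRAPERS, runner=subprocess.call):
    """Start every scraper at once and return a {scraper: exit code} mapping."""
    with ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        # One worker thread per scraper, so all of them run concurrently.
        codes = pool.map(lambda name: runner(build_command(name)), scrapers)
        return dict(zip(scrapers, codes))
```

Calling run_all() would launch all five containers at once. The runner parameter exists only so that the function can be exercised without Docker.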

Configuration

This is the default configuration. Each value can be overridden by setting the environment variable of the same name.

CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 3
FILES_STORE = ./data/files
HASHING_ALGORITHM = sha256 
DATABASE_URL = ...
PERSIST_STATS_INTERVAL = 10
DOGSTATSD_ADDR = ... 
DOGSTATSD_PORT = ...
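
The override behaviour described above can be sketched in Python as follows. This is an illustration of the general pattern, not u3's actual settings code. The function load_settings is hypothetical, and the settings whose defaults are shown as "..." are left out because their values are not given here.

```python
import os

# Defaults copied from the list above; entries shown as "..." are omitted.
DEFAULTS = {
    "CONCURRENT_REQUESTS": 16,
    "DOWNLOAD_DELAY": 3,
    "FILES_STORE": "./data/files",
    "HASHING_ALGORITHM": "sha256",
    "PERSIST_STATS_INTERVAL": 10,
}


def load_settings(environ=None):
    """Return the defaults, with matching environment variables taking precedence."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        if raw is None:
            settings[name] = default
        else:
            # Cast to the default's type so that "5" becomes 5.
            # Assumption: integer settings are only overridden with integers.
            settings[name] = type(default)(raw)
    return settings
```

For example, load_settings({"DOWNLOAD_DELAY": "5"}) yields a DOWNLOAD_DELAY of 5 while leaving every other value at its default.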

Sentry

u3 supports Sentry integration via the scrapy-sentry library. To use it, set the SENTRY_DSN environment variable:

docker run -ti --rm \
  --name u3 \
  --link pg \
  --env DATABASE_URL="postgresql://postgres:@pg:5432/u3_dev" \
  --env SENTRY_DSN="http://public:[email protected]/12345" \
  univizor/u3:latest bf -a categories=biologija
