Skip to content

MiroBabic/itms2014

Repository files navigation

itms2014

python scraper for itms2014 data

Works with SQLite DB or PostgreSQL

DB Config: Setup connectors and specify in "active" variable which DB you want to use

{	"active": "postgresql",
	"db": {
		"postgresql" :{
			"server" : "localhost",
			"port" :"5432",
			"dbname" :"itms",
			"user" : "postgres",
			"password" : "my_password"
		},
		"sqlite": {
		"filename": "itms.db"
		}
	}
}

itms_urls.json contains base tables to download (so far projekty a vyzvy) and detail structure for objects. If you want to load data only for some table, set rest for active:false

Usage: if PostgreSQL is selected as active DB, create it first. Otherwise skip to next step (create tables)

python create_pg_db.py
python create_base_tables.py
python create_views.py
python get_base_data.py
python get_detail_data_multi.py  (if you want slow download one by one, use get_detail_data.py)

For exporting data into file use export.py - this works only for SQLite!

examples:

show available tables:

python export.py -f itms.db

simple export of one table:

python export.py -f itms.db -t projekt_ukonceny

export of one table with specified columns and delimiter (all data quoted)

python export.py -f itms.db -t projekty_total -c akronym aktivity cisloZmluvy datumKoncaRealizacie id intenzity kod meratelneUkazovatele partneri polozkyRozpoctu popisProjektu uzemneMechanizmy vyzva zameranieProjektu stav_implementacie miestaRealizacie_nuts3_gpsLat miestaRealizacie_nuts3_gpsLon -d ; -q
-f  file name with sqlite database
-t  table/view name to export
-c  column names separated by space (if no column provided, all columns will be exported)
-d  delimiter (if no delimiter specified, | is used as delimiter by default )
-q  quoted data (if not specified, only necessary records are delimited)

About

python scraper for itms2014 data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages