forked from cbick/gps2gtfs
-
Notifications
You must be signed in to change notification settings - Fork 0
Match raw AVL/GPS tracking data of public transit vehicles to their GTFS-defined schedules
License
vtavirtualtransit/gps2gtfs
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
=============================== === gps2gtfs README === =============================== A suite of python and SQL tools for matching bus tracking data to GTFS data. === License === gps2gtfs is released under the MIT license. See LICENSE in this directory. === About === At the moment this is a more-or-less scattered collection of tools which you can use to match GPS tracking data of public transportation vehicles with their GTFS schedule counterparts. The instructions below give a straightforward step-by-step process which will accomplish this, but as of yet there is no coherent single interface by which this is done. === Instructions === 0. Prerequisites You will need: - a SQL database in which you can make tables and such - Python - Python module for accessing your database (recommend psycopg for postgresql) - gtfs_SQL_importer tool available at http://cbick.github.com/gtfs_SQL_importer - GTFS data for your agency imported into the database using the above - This software If you are missing any of the first 3: - I recommend PostgreSQL. It can be a pain to set up but handles very nicely. - Get python from www.python.org . I used version 2.5.x when writing this. - You can find python modules for your database from: http://wiki.python.org/moin/DatabaseInterfaces If you have PostgreSQL I recommend you get psycopg2: http://initd.org/pub/software/psycopg/ 1. Set up python database access If you're using postgres or other socket-communicating db server: In the config folder there is a file called db.config.example. Make a copy of this file called db.config and edit it appropriately. If you're using anything other than postgres: Modify common/src/dbutils.py to handle your DBMS (shouldn't be difficult). 2. Create the schema for bus tracking data. From the core/sql folder, apply the make_tables.sql file to you database. Using psql as an example, you would run the command: psql mydbname myusername -f make_tables.sql 3. Pull the bus tracking data into the DB. It is up to you to figure out how to do this for your particular data source. From the perspective of Python, the goal is to get your data points into the format requested by the update_routes() method inside the core/sql/dbqueries.py file. Otherwise, you will need to fit your data into the db scheme, in particular into the vehicle_track table that was created in step 2. To grab San Francisco Muni tracking data you can just use the code that is already provided in the nextmuni_import directory. This is set up to pull the data directly from the nextmuni xml service. To use it, follow these steps: 3.1. Fix erroneous SFMuni GTFS data: From the nextmuni_import/sql directory, apply the change_sf_gtfs.sql file. 3.2. Insert correct routeid/dirtag matchup data: From the nextmuni_import/sql directory, apple the populate_rid_dirtag.sql file. 3.3. Set up a job to collect the data: Essentially, you want to run the python script found at nextmuni_import/src/route_scraper.py to run every 1 minute (or however often you choose), giving it "ALL" as an argument: python route_scraper.py ALL To get this to run every minute, the easiest thing is to set up a cron job. You're on your own to figure this out, or to figure out any method to have this run on a regular basis. *** Before you choose how to provide data to the db, see *** step 4.2 below. 4. Do some pre-matchup setup. Now that there is tracking data in your database, you probably want to match that to the GTFS schedule, since you are using this package. To do this you first need to do some preprocessing. 4.1. From the core/src directory, run the following: python Preprocessing.py trips This separates the tracking data into distinct routes and trips. 4.2. Then, you have two options: a) Populate the routeid_dirtag table manually. (This was already done in step 3 if you imported data from San Francisco Muni). The idea is to match NextBus' "dirtag" values up with GTFS route ID's. The reason to do this manually is that NextBus and GTFS both are riddled with errors that you may or may not have caught yet. *** If you aren't using a NextBus data source, then you must choose *** what to do depending on what you provided for the dirtag field in *** step 3. b) Try to automatically populate the routeid_dirtag table. This will not always work, since as I said in (a) there are lots of data errors over which I have no control nor which do I have any method to detect. You can do this as follows (from core/src directory): python Preprocessing.py populate_ridtags Before you do this though you might want to check out the results from the following SQL queries: select count(*), route_short_name as cnt from gtf_routes group by route_short_name having count(*) > 1; select count(distinct routetag) from vehicle_track where dirtag != 'null' and routetag != 'null' group by dirtag having count(distinct routetag) > 1; If there are 0 results then you are in good shape. If there are a lot of results then you are not in so good shape, and should give serious consideration to (a). The data I currently have for SF Muni returns 121 rows for the second query. 5. Match GPS data to GTFS schedules (hoorah!) ** NOTE: You might want to selectively apply some of the indexes found in ** the core/sql/make_table_indexes.sql file, before doing the following. From the core/src directory, run: python Preprocessing.py match_trips You have now successfully matched the trips from 4.1 to GTFS scheduled trips. Now, run: python Preprocessing.py gps_schedules This populates the data telling you the actual schedule that each GPS trip followed (i.e., when each bus actually arrived at each stop). Specifically, this data is put into the gps_stop_times table (compare to gtf_stop_times). 6. Make some corrections Some situations can result in erroneous assignments after following step 5. To fix these, run python Preprocessing.py fix_earlybirds This searches for better matches for routes which are abnormally off-schedule. *** You can visually test the gtfs<->gps matchings using the OSMViz package, *** available at http://cbick.github.com/osmviz . *** Once you have OSMViz, go to the core/src/ directory and look at *** the visualizer.py file. By editing the query at the end of the file *** (inside the "if __name__ == '__main__'" clause), you can choose what *** to visualize. Run "python visualizer.py" and enjoy the show. 7. Get ready to analyze From the core/sql folder, you will want to apply the make_table_indexes.sql file to your database. This will make all future analyses much much faster. 8. Analyze There is some postprocessing/analysis code in the postprocessing directory. The postprocessing/sql directory holds code for creating a summary table of bus lateness data, as well as some various queries that were useful to me when exploring the data. The postprocessing/src directory holds the Stats.py file which performs a bunch of statistical analysis and makes plots of the data, but it requires the SciPy and Matplotlib libraries.
About
Match raw AVL/GPS tracking data of public transit vehicles to their GTFS-defined schedules
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Python 62.5%
- HTML 26.1%
- PLpgSQL 10.8%
- Other 0.6%