The Urban Data Lab has access to about 80 GB of geospatial data (in the Avro format) from Rogers; the records contain timestamps and geographic coordinates. Combining the timestamp with the latitude and longitude makes it possible to extract information from the datasets, such as which areas are more popular than others and at what times of day.
Python was chosen as the programming language for working with the data because of its ease of use and its existing libraries for reading Avro files and manipulating spatial objects. Processing speed was a concern given the volume of data; however, since most of the computations were performed only once, it did not make sense to spend more than a day optimizing performance. Choosing appropriate data structures such as R-trees and adding parallelism was enough to bring the compute time down to a few hours at most, which is acceptable since other tasks can be done while the jobs run in the background.
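As a rough illustration of those two choices (this is a sketch, not the project code), the snippet below indexes a set of hypothetical area polygons in an R-tree so that each point only needs an exact containment test against a few candidate areas, and uses a process pool to count points in parallel across batches. The area boundaries, coordinates, and batch contents are made-up placeholders.

```python
from multiprocessing import Pool

from rtree import index
from shapely.geometry import Point, box

# Hypothetical areas of interest, defined here as simple bounding boxes.
AREAS = {
    0: box(-123.20, 49.25, -123.10, 49.30),
    1: box(-123.10, 49.25, -123.00, 49.30),
}

# Build the R-tree once; queries then skip any area whose box cannot contain the point.
IDX = index.Index()
for area_id, geom in AREAS.items():
    IDX.insert(area_id, geom.bounds)

def area_for(lon, lat):
    """Return the id of the area containing the point, or None."""
    for candidate in IDX.intersection((lon, lat, lon, lat)):
        if AREAS[candidate].contains(Point(lon, lat)):
            return candidate
    return None

def count_batch(points):
    """Count how many points fall in each area for one batch (e.g. one file)."""
    counts = {}
    for lon, lat in points:
        a = area_for(lon, lat)
        if a is not None:
            counts[a] = counts.get(a, 0) + 1
    return counts

if __name__ == "__main__":
    # Each batch would normally come from one Avro file; these points are placeholders.
    batches = [[(-123.15, 49.27), (-123.05, 49.28)], [(-123.12, 49.26)]]
    with Pool() as pool:
        print(pool.map(count_batch, batches))
```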
The process starts on the Azure VM in the “rogersdata” directory. The first step is to purge unwanted rows and columns: the schema is extracted from each unique file type, the columns that are not needed are removed, and the resulting schema names are fed back to “dcols.py”, which writes all of the cleaned files to a named output folder.
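The sketch below illustrates the idea behind that cleaning step using the fastavro library; it is not the actual “dcols.py” script. It reads the schema embedded in a source file, keeps only the wanted fields, and writes the filtered records to a new file. The column names, paths, and the row filter are assumptions for the example.

```python
from fastavro import parse_schema, reader, writer

KEEP = ["timestamp", "latitude", "longitude"]  # hypothetical column names

def clean_file(src_path, dst_path):
    """Copy an Avro file, keeping only the KEEP fields in the schema and records."""
    with open(src_path, "rb") as src:
        avro_in = reader(src)
        schema = avro_in.writer_schema  # schema embedded in the source file
        schema["fields"] = [f for f in schema["fields"] if f["name"] in KEEP]
        cleaned = (
            {k: rec.get(k) for k in KEEP}
            for rec in avro_in
            if rec.get("latitude") is not None  # hypothetical row filter
        )
        with open(dst_path, "wb") as dst:
            writer(dst, parse_schema(schema), cleaned)

clean_file("rogersdata/part-0.avro", "cleaned/part-0.avro")  # placeholder paths
```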
Once the data has been cleaned, “search_all.py” searches for the insights. It scans all of the data and outputs a JSON file with the per-day counts, which can then be converted to a GeoJSON file that carries those counts.
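As a rough illustration of that output step (the exact format produced by “search_all.py” may differ), the sketch below counts records per day and per rounded grid cell, then writes the result as a GeoJSON FeatureCollection. The record tuples and the 0.01-degree cell size are made-up assumptions.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical records from the cleaned data: (unix timestamp, lon, lat) tuples.
records = [(1559347200, -123.12, 49.28), (1559350800, -123.12, 49.28)]

# Count observations per day and per rounded grid cell (~1 km at this latitude).
counts = Counter()
for ts, lon, lat in records:
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
    cell = (round(lon, 2), round(lat, 2))
    counts[(day, cell)] += 1

# Convert the per-day counts into a GeoJSON FeatureCollection of point features.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"day": day, "count": n},
    }
    for (day, (lon, lat)), n in counts.items()
]
with open("counts.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)
```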
The GeoJSON file can then be displayed using the ArcGIS API for JavaScript.