
Batch Downloading Wigle.net Data

Brannon Dorsey edited this page Apr 10, 2015 · 1 revision

Scraping the Wigle.net Database

The Wigle.net database can be scraped using the wigle_dump.js script in the node/ directory.

Usage: node wigle_dump.js <required> [options]
Required:
    --north=<max_lat>, latrange1=<max_lat>
    --south=<min_lat>, latrange2=<min_lat>
    --west=<min_long>, longrange1=<min_long>
    --east=<max_long>, longrange2=<max_long>

Options:
    --username=[username],   -u [username]   Wigle.net username.
    --password=[password],   -c [password]   Wigle.net password.
    --chunkSize=[chunkSize], -c [chunkSize]  Set the chunkSize. i.e. 0.005
    --lastupdt=[lastupdt],   -l [lastupdt]   Search only networks found since YYYYMMDDHHMMSS. i.e. 20100101000000
    --dryRun,                -d              Calculate the number of prepared requests only. Does not actually execute requests.

This script batch downloads data from the Wigle.net database without having to worry (too much) about its default limit of 10,000 results per query. You specify a lat-long bounding box and a chunkSize, and wigle_dump.js breaks your query into many smaller requests. Provided your chunkSize is appropriate, this lets you download every network in a query region before Wigle.net's result limit takes effect.
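The splitting described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in wigle_dump.js: the function name `chunkBounds`, the row-by-row ordering, and the clipping of edge cells are all assumptions made for the example.

```javascript
// Hypothetical sketch: split a lat-long bounding box into cells that are
// at most chunkSize degrees on each side. Not taken from wigle_dump.js.
function chunkBounds(north, south, west, east, chunkSize) {
  const EPS = 1e-9; // tolerate floating-point noise in the division
  const rows = Math.max(1, Math.ceil((north - south) / chunkSize - EPS));
  const cols = Math.max(1, Math.ceil((east - west) / chunkSize - EPS));
  const chunks = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const s = south + r * chunkSize;
      const w = west + c * chunkSize;
      chunks.push({
        south: s,
        north: Math.min(s + chunkSize, north), // clip the last row
        west: w,
        east: Math.min(w + chunkSize, east),   // clip the last column
      });
    }
  }
  return chunks;
}

// Each returned cell would become one smaller request.
console.log(chunkBounds(2, 1, -1, 0, 0.5).length);
```

A smaller chunkSize produces more cells, and therefore more requests, but each cell is less likely to contain more than 10,000 networks.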

The above image illustrates how wigle_dump.js can be used to download Wigle.net data for a sizable region. To download all of Wigle.net's Chicago data accurately and in the minimum number of requests (so as not to exhaust a Wigle.net account's daily request limit), four separate wigle_dump.js commands were run, each with its own north, west, south, and east lat-long boundaries and chunkSize setting.

node wigle_dump.js --north=<req_1_north> --west=<req_1_west> --south=<req_1_south> --east=<req_1_east> --chunkSize=0.004
node wigle_dump.js --north=<req_2_north> --west=<req_2_west> --south=<req_2_south> --east=<req_2_east> --chunkSize=0.004
node wigle_dump.js --north=<req_3_north> --west=<req_3_west> --south=<req_3_south> --east=<req_3_east> --chunkSize=0.004
node wigle_dump.js --north=<req_4_north> --west=<req_4_west> --south=<req_4_south> --east=<req_4_east> --chunkSize=0.004

As the Wigle.net density image shows, downtown Chicago's Loop neighborhood has a far greater wifi network density than some of its satellite neighborhoods. For this reason, the first command covers a small lat-long area with a small chunk size. This forces wigle_dump.js to use more requests, so that each one hopefully stays under the 10,000 result-per-query maximum imposed by the Wigle.net API. The next two commands cover large areas where the wifi density appears lower, so they use a chunk size twice that of the first command to minimize the number of requests made to the Wigle.net API. Finally, the fourth command narrows in on a small subsection of the map that was missed by commands one and three. This area is quite dense with wifi networks, so a small chunk size was used.

wigle_dump.js saves downloaded data to the data/wigle_data/ directory in a file named north_south_west_east.json (for example, 41.8986_41.8481_-87.6479_-87.6174.json).
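Because the bounds are encoded in the filename, they can be recovered from a saved file later. The helper below is a hypothetical sketch (the name `boundsFromFilename` is not part of the repository) and assumes the filename is exactly four underscore-separated numbers in north, south, west, east order.

```javascript
const path = require("path");

// Hypothetical helper: recover the bounding box from a filename of the
// form north_south_west_east.json.
function boundsFromFilename(file) {
  const base = path.basename(file, ".json");
  const parts = base.split("_").map(Number);
  if (parts.length !== 4 || parts.some(Number.isNaN)) {
    throw new Error("Unexpected filename: " + file);
  }
  const [north, south, west, east] = parts;
  return { north, south, west, east };
}

console.log(boundsFromFilename("41.8986_41.8481_-87.6479_-87.6174.json"));
```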

By default, Wigle.net accounts have per-account and per-IP limits of 50 requests per day (these limits reset daily at 12:00 AM CST). For this reason it helps to tune your wigle_dump.js commands to grab the most networks in the fewest requests, while making sure that you rarely hit the 10,000 result-per-request limit.
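To get a feel for this trade-off before spending any of the daily quota, a rough request budget can be worked out by hand. The sketch below assumes one request per chunk, which may not match what wigle_dump.js actually sends, and the function names are hypothetical. The --dryRun option remains the authoritative way to count requests.

```javascript
// Hypothetical estimate: one request per chunkSize x chunkSize cell.
function estimateRequests({ north, south, west, east }, chunkSize) {
  const EPS = 1e-9; // tolerate floating-point noise in the division
  const rows = Math.max(1, Math.ceil((north - south) / chunkSize - EPS));
  const cols = Math.max(1, Math.ceil((east - west) / chunkSize - EPS));
  return rows * cols;
}

// Days needed to run totalRequests under a daily limit (50 by default).
function daysNeeded(totalRequests, dailyLimit = 50) {
  return Math.ceil(totalRequests / dailyLimit);
}

const box = { north: 2, south: 1, west: -1, east: 0 };
console.log(estimateRequests(box, 0.25), estimateRequests(box, 0.5));
```

Doubling the chunk size cuts the estimated request count to roughly a quarter, which is why larger chunks suit sparse areas.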

I've found Wigle.net to be quite strict/aggressive in its IP banning. It seems that if multiple accounts make requests from the same outward-facing IP address, one account reaching its daily limit may use up the remaining accounts' limits for the day. I'm not absolutely sure that this is happening, but watch out for strange behavior like this.

Note: If you contact [email protected] and ask nicely, they may up your daily request limit. By doing so I was granted a 250 request-per-day account bump.

Importing scraped data w/ MongoDB

Data downloaded with wigle_dump.js can be imported into a MongoDB database using the wigle_to_mongo.js script.

Usage: node wigle_to_mongo.js -i <input> -c <collection>
node wigle_to_mongo.js -i ../data/wigle_data/41.8986_41.8481_-87.6479_-87.6174.json -c wigle

Make sure that the MongoDB daemon, mongod, is running before you import data. This script expects a unique index on {geo: "2d", ssid: 1} so that duplicates are not inserted (see below).

Wigle.net data imported this way saves each network as a document in the specified collection (-c) with the following schema:

{ 
	"_id" : ObjectId("5526dd6a63abfc875044e89c"), 
	"ssid" : "Palmateer_Time_Capsule", 
	"netid" : "00:1f:f3:c4:47:2b", 
	"geo" : { "lat" : 41.94921875, "lon" : -87.64414978 }, 
	"lastupdt" : 20121224000813 
}
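The lastupdt field is stored as a number in the same YYYYMMDDHHMMSS layout used by the --lastupdt option. If a real date is needed, it can be converted along these lines. The helper name `parseLastupdt` is hypothetical, and treating the timestamp as UTC is an assumption, since the time zone is not stated here.

```javascript
// Hypothetical helper: convert a YYYYMMDDHHMMSS value into a Date.
// Assumes the timestamp is in UTC, which the source does not confirm.
function parseLastupdt(lastupdt) {
  const s = String(lastupdt).padStart(14, "0");
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(s);
  if (!m) {
    throw new Error("Unexpected lastupdt value: " + lastupdt);
  }
  const [, year, month, day, hour, minute, second] = m.map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

console.log(parseLastupdt(20121224000813).toISOString());
```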

Some helpful indexes for this data include:

db.wigle.ensureIndex({ ssid: 1 });
db.wigle.ensureIndex({ geo: "2d", ssid: 1 }, { unique: true });