Scrape Rest API

anthonyblackham edited this page Nov 6, 2019 · 8 revisions

Many online GIS services have public REST endpoints. Their data can be scraped if you need a local copy for more detailed analysis.

*Obligatory: check the legal standing of the data before use.

Basic Overview

A simple manual method for loading data:

  • assumes objects are queryable and return a geometry
  • assumes 1000 feature limit

REST endpoints follow this format:

serviceURL = "http://website.com/rest/services"

serviceMap = "/layers/MapServer/"

serviceLayerID = "0"

serviceMaxRequest = 1000

A real example would be something like:

http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0
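
To make the format concrete, the parts can be joined into a single layer endpoint. This is a minimal Python 3 sketch, not part of the original scripts; the helper name `layer_url` is hypothetical and the values are the placeholders from this page.

```python
def layer_url(service_url, service_map, layer_id):
    """Join the service root, map service path and layer ID into one layer URL."""
    # Strip stray slashes so each part can be written with or without them.
    parts = [service_url.rstrip("/"), service_map.strip("/"), str(layer_id)]
    return "/".join(parts)


# Placeholder values copied from the format description above.
serviceURL = "http://website.com/rest/services"
serviceMap = "/layers/MapServer/"
serviceLayerID = "0"

print(layer_url(serviceURL, serviceMap, serviceLayerID))
# http://website.com/rest/services/layers/MapServer/0
```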

You can get a list of object IDs with:

query?where=1%3D1&returnIdsOnly=true&f=pjson

eg:

http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0/query?where=1%3D1&returnIdsOnly=true&f=pjson
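
The ID query can also be built and parsed in code. The sketch below is Python 3 with the standard library only; `ids_query_url` and `parse_ids` are hypothetical helper names, and the response shape (`objectIdFieldName`, `objectIds`) is assumed from the keys the scripts further down this page read.

```python
import json
from urllib.parse import urlencode


def ids_query_url(layer_url):
    """Build the returnIdsOnly query URL for a layer endpoint."""
    params = {"where": "1=1", "returnIdsOnly": "true", "f": "pjson"}
    return layer_url.rstrip("/") + "/query?" + urlencode(params)


def parse_ids(response_text):
    """Return (id_field_name, sorted_ids) from a returnIdsOnly response body."""
    payload = json.loads(response_text)
    # Assumption: a missing or null objectIds value means no features matched.
    ids = payload.get("objectIds") or []
    return payload["objectIdFieldName"], sorted(ids)


layer = "http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0"
print(ids_query_url(layer))

# A made-up response body, only to show the parsing step.
sample = '{"objectIdFieldName": "OBJECTID", "objectIds": [3, 1, 2]}'
print(parse_ids(sample))
```

To fetch a live response, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_ids`; that step is left out here so the sketch runs offline.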

If the web service you want to scrape has a feature limit, a "simple" method is to generate a list of all object IDs, cut it into chunks of 1000, query each chunk, then mash the results together.
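
The chunk-and-merge idea can be sketched without any GIS library. This is an illustrative Python 3 example, assuming each chunk query returns a GeoJSON FeatureCollection; `chunk_ids` and `merge_feature_collections` are hypothetical names, and 1000 is the limit assumed on this page.

```python
def chunk_ids(object_ids, size=1000):
    """Split object IDs into sorted chunks of at most `size` items."""
    ordered = sorted(object_ids)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def merge_feature_collections(collections):
    """Concatenate the features of several GeoJSON FeatureCollections."""
    merged = {"type": "FeatureCollection", "features": []}
    for collection in collections:
        merged["features"].extend(collection.get("features", []))
    return merged


# 2500 IDs split into two full chunks and one partial chunk.
chunks = chunk_ids(range(1, 2501))
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

This simple merge does not de-duplicate features, so the chunks must not overlap.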

This is an example query for one of those 1000-feature chunks, returning GeoJSON:

http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0/query?where=CID%3E%3D11753000+AND+CID%3C%3D11754000&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=geojson
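
Most parameters in that URL are empty. A shorter request can be built in code, assuming the server treats omitted parameters the same as empty ones. This is a Python 3 sketch; `chunk_query_url` is a hypothetical helper, and `outFields=*` is borrowed from the scripts further down rather than from the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def chunk_query_url(layer_url, id_field, from_id, to_id, out_fields="*"):
    """Build a GeoJSON query URL for one inclusive range of IDs."""
    where = "{0} >= {1} AND {0} <= {2}".format(id_field, from_id, to_id)
    params = {
        "where": where,  # urlencode escapes the spaces and comparison operators
        "outFields": out_fields,
        "returnGeometry": "true",
        "f": "geojson",
    }
    return layer_url.rstrip("/") + "/query?" + urlencode(params)


layer = "http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0"
url = chunk_query_url(layer, "CID", 11753000, 11754000)
print(url)

# Decode the query string again to check that the where clause round-trips.
print(parse_qs(urlsplit(url).query)["where"][0])
```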

Tools/Scripts

There are a myriad of tools/scripts people have come up with.

https://github.com/crackernutter/EsriRESTScraper see also: https://community.esri.com/thread/118781

Rest Scraper

Attribution: Mike from socal GIS user group

Code: run from the command line using C:\Python27\ArcGIS10.2\python.exe \path\to\script.py

import arcpy
import urllib2
import json

# Setup
arcpy.env.overwriteOutput = True
baseURL = "https://domain.com/MapServer/0"
fields = "*"
outdata = r"\path\to\file"  # raw string, so \t and \f are not read as escape sequences

# Get record extract limit
urlstring = baseURL + "?f=json"
j = urllib2.urlopen(urlstring)
js = json.load(j)
maxrc = int(js["maxRecordCount"])
print "Record extract limit: %s" % maxrc

# Get object ids of features
where = "1=1"
urlstring = baseURL + "/query?where={}&returnIdsOnly=true&f=json".format(where)
j = urllib2.urlopen(urlstring)
js = json.load(j)
idfield = js["objectIdFieldName"]
idlist = js["objectIds"]
idlist.sort()
numrec = len(idlist)
print "Number of target records: %s" % numrec

# Gather features
print "Gathering records..."
fs = dict()
for i in range(0, numrec, maxrc):
    # Index of the last record in this chunk, clamped to the end of the list
    torec = i + (maxrc - 1)
    if torec >= numrec:
        torec = numrec - 1
    fromid = idlist[i]
    toid = idlist[torec]
    where = "{} >= {} and {} <= {}".format(idfield, fromid, idfield, toid)
    print "  {}".format(where)
    urlstring = baseURL + "/query?where={}&returnGeometry=true&outFields={}&f=json".format(where,fields)
    # Load this chunk straight from the query URL
    fs[i] = arcpy.FeatureSet()
    fs[i].load(urlstring)

# Save features
print "Saving features..."
fslist = []
for key,value in fs.items():
 fslist.append(value)
arcpy.Merge_management(fslist, outdata)
print "Done!"
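
The range logic in the "Gather features" loop can be tried on its own, without arcpy. The sketch below is Python 3 and only illustrative; `range_clauses` is a hypothetical name. It builds one where clause per chunk from the sorted ID list and clamps the end index with `min()` so the last, shorter chunk stays inside the list.

```python
def range_clauses(id_field, id_list, max_count):
    """Build one inclusive 'field >= a and field <= b' clause per chunk of IDs."""
    ids = sorted(id_list)
    clauses = []
    for start in range(0, len(ids), max_count):
        # Clamp the end index to the last valid position in the list.
        end = min(start + max_count - 1, len(ids) - 1)
        clauses.append(
            "{0} >= {1} and {0} <= {2}".format(id_field, ids[start], ids[end])
        )
    return clauses


# IDs with gaps: each clause still covers at most three existing IDs.
for clause in range_clauses("OBJECTID", [1, 2, 3, 5, 8, 9, 10], 3):
    print(clause)
# OBJECTID >= 1 and OBJECTID <= 3
# OBJECTID >= 5 and OBJECTID <= 9
# OBJECTID >= 10 and OBJECTID <= 10
```

Because the ranges are taken from IDs that actually exist, gaps in the ID sequence do not cause empty or oversized requests.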

If it crashes because the dataset is too large to hold in memory, you can use this script, which copies each chunk to disk before merging:

import arcpy
import urllib2
import json

# Setup
arcpy.env.overwriteOutput = True
baseURL = "https://domain.com/MapServer/0"
fields = "*"
outdata = r"\path\to\file"  # raw string, so \t and \f are not read as escape sequences

# Get record extract limit
urlstring = baseURL + "?f=json"
j = urllib2.urlopen(urlstring)
js = json.load(j)
maxrc = int(js["maxRecordCount"])
print "Record extract limit: %s" % maxrc

# Get object ids of features
where = "1=1"
urlstring = baseURL + "/query?where={}&returnIdsOnly=true&f=json".format(where)
j = urllib2.urlopen(urlstring)
js = json.load(j)
idfield = js["objectIdFieldName"]
idlist = js["objectIds"]
idlist.sort()
numrec = len(idlist)
print "Number of target records: %s" % numrec

# Gather features
print "Gathering records..."
merge_list = []
for i in range(0, numrec, maxrc):
    # Index of the last record in this chunk, clamped to the end of the list
    torec = i + (maxrc - 1)
    if torec >= numrec:
        torec = numrec - 1
    fromid = idlist[i]
    toid = idlist[torec]
    where = "{} >= {} and {} <= {}".format(idfield, fromid, idfield, toid)
    print "  {}".format(where)
    urlstring = baseURL + "/query?where={}&returnGeometry=true&outFields={}&f=json".format(where,fields)
    fs = arcpy.FeatureSet()
    fs.load(urlstring)
    # Write this chunk to its own temporary dataset instead of keeping it in memory
    tempdata = outdata + str(i)
    print "Copying features..."
    arcpy.CopyFeatures_management(fs,tempdata)
    merge_list.append(tempdata)

# Save features
print "Saving features..."
arcpy.Merge_management(merge_list, outdata)
print "Done!"

Install ijson dependency

copy the source files from HERE

and extract here:

C:\Python27\ArcGIS10.2
############################ Incomplete Attempts ##########################
### Install dependencies

Since we are working with Python on a Windows machine through ArcMap (and people often have work machines that are locked down), we'll try to get things running with what we have.

### Setup Tools

You first need a way to install a package manager

Download the [setuptools](https://pypi.python.org/pypi/setuptools) source to where Python is installed, e.g.

C:\Python27\ArcGIS10.2


From a command prompt:

cd c:/path/to/setuptools
python setup.py install


Now that you have the proper tools, you can install pip and subsequently other packages/dependencies.

#### Installing Pip (python package manager)

To avoid SSL issues, use pip==1.2.1:

C:\Python27\ArcGIS10.2\Scripts\easy_install.exe pip==1.2.1


### Install ijson dependency

pip install ijson