Scrape REST API
There are a lot of GIS services online with public REST endpoints. If you need a local copy of the data to do more detailed analysis, it can be scraped.
*Obligatory reminder: check the legal standing of the data before use, etc. etc.
A simple manual method for loading data, which:
- assumes objects are queryable and return a geometry
- assumes a limit of 1000 features per request
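Both assumptions can be checked before scraping by reading the layer's `?f=json` metadata. A minimal Python 3 sketch, assuming the response carries a comma-separated `capabilities` string and a `maxRecordCount` value, as ArcGIS Server layers usually do; `check_layer` is a hypothetical helper and the sample JSON is made up:

```python
import json

def check_layer(metadata_text):
    """Return (is_queryable, max_record_count) from a layer's ?f=json response."""
    js = json.loads(metadata_text)
    # capabilities is assumed to look like "Map,Query,Data"
    caps = [c.strip() for c in js.get("capabilities", "").split(",")]
    # fall back to the 1000-feature limit assumed above if the field is missing
    return "Query" in caps, int(js.get("maxRecordCount", 1000))

# made-up metadata; a real response would come from
# urllib.request.urlopen(layer_url + "?f=json").read()
sample = '{"name": "Taxlots", "capabilities": "Map,Query,Data", "maxRecordCount": 1000}'
print(check_layer(sample))  # (True, 1000)
```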
REST API URLs follow this format:
```python
serviceURL = "http://website.com/rest/services"
serviceMap = "/layers/MapServer/"
serviceLayerID = "0"
serviceMaxRequest = 1000
```
A real example would be something like:
http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0
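As a sketch of how those pieces combine into a layer URL (Python 3; `layer_url` is a hypothetical helper, not part of any library):

```python
def layer_url(service_url, service_map, layer_id):
    """Join the REST pieces into a layer endpoint URL."""
    # strip slashes at the joins so the path never contains "//"
    return "/".join([service_url.rstrip("/"), service_map.strip("/"), str(layer_id)])

serviceURL = "http://website.com/rest/services"
serviceMap = "/layers/MapServer/"
serviceLayerID = "0"

print(layer_url(serviceURL, serviceMap, serviceLayerID))
# http://website.com/rest/services/layers/MapServer/0
```

The same helper reproduces the real example from `".../arcgiswa2/rest/services"`, `"/LUT_ECS/Web_taxlots/MapServer/"` and `"0"`.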
You can get a list of object IDs by appending this to the layer URL:
query?where=1%3D1&returnIdsOnly=true&f=pjson
e.g.:
http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0/query?where=1%3D1&returnIdsOnly=true&f=pjson
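A minimal Python 3 sketch of building that IDs-only query and reading its response. The keys `objectIdFieldName` and `objectIds` are the ones the scripts below read; the helper names are hypothetical and the sample response is made up (no network call is made here):

```python
import json
from urllib.parse import urlencode

def ids_query_url(layer_url, where="1=1"):
    """URL asking the layer for object IDs only (no geometry, no attributes)."""
    params = {"where": where, "returnIdsOnly": "true", "f": "pjson"}
    return layer_url + "/query?" + urlencode(params)

def parse_ids(response_text):
    """Return (id field name, sorted list of ids) from an IDs-only response."""
    js = json.loads(response_text)
    # treat a missing or null id list as empty
    return js["objectIdFieldName"], sorted(js.get("objectIds") or [])

layer = "http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0"
print(ids_query_url(layer))  # ends with /query?where=1%3D1&returnIdsOnly=true&f=pjson

# made-up response body; a real one would come from
# urllib.request.urlopen(ids_query_url(layer)).read()
sample = '{"objectIdFieldName": "OBJECTID", "objectIds": [5, 2, 9]}'
print(parse_ids(sample))  # ('OBJECTID', [2, 5, 9])
```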
If you want to scrape a web service that has a record limit, a "simple" method is to generate a list of all object IDs, cut it into chunks of 1000, query each chunk, then mash the results together.
This is an example query for one of those 1000-record chunks, returning GeoJSON:
http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0/query?where=CID%3E%3D11753000+AND+CID%3C%3D11754000&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=geojson
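The chunking can be sketched in Python 3 with only the standard library. The helper names are hypothetical, the `OBJECTID` field and the 2500 sequential IDs are made up, and it assumes the empty parameters in the long URL can simply be left out so the server uses its defaults:

```python
from urllib.parse import urlencode

def chunk(ids, size=1000):
    """Split a sorted list of object ids into consecutive slices of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def chunk_where(id_field, ids):
    """Where clause spanning one chunk, from its first id to its last id."""
    return "{0} >= {1} AND {0} <= {2}".format(id_field, ids[0], ids[-1])

def chunk_query_url(layer_url, where, out_fields="*", fmt="geojson"):
    """Query URL for one chunk, sending only the non-empty parameters."""
    params = {"where": where, "outFields": out_fields,
              "returnGeometry": "true", "f": fmt}
    return layer_url + "/query?" + urlencode(params)

layer = "http://mtbachelor.co.washington.or.us/arcgiswa2/rest/services/LUT_ECS/Web_taxlots/MapServer/0"
ids = list(range(1, 2501))  # pretend the layer has 2500 sequential object ids
for c in chunk(ids):
    print(len(c), chunk_query_url(layer, chunk_where("OBJECTID", c)))

# the where clause from the example URL above encodes the same way
print(urlencode({"where": "CID>=11753000 AND CID<=11754000"}))
# where=CID%3E%3D11753000+AND+CID%3C%3D11754000
```

Because the ID list is complete and sorted, a range from a chunk's first ID to its last ID returns exactly that chunk, even when the IDs have gaps.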
There are a myriad of tools/scripts people have come up with.
https://github.com/crackernutter/EsriRESTScraper see also: https://community.esri.com/thread/118781
Attribution: Mike from socal GIS user group
Code: run from the command line using C:\Python27\ArcGIS10.2\python.exe \path\to\script.py
```python
import arcpy
import json
import urllib
import urllib2

# Setup
arcpy.env.overwriteOutput = True
baseURL = "https://domain.com/MapServer/0"
fields = "*"
outdata = r"\path\to\file"  # raw string so "\t" and "\f" are not read as escapes

# Get record extract limit
urlstring = baseURL + "?f=json"
j = urllib2.urlopen(urlstring)
js = json.load(j)
maxrc = int(js["maxRecordCount"])
print "Record extract limit: %s" % maxrc

# Get object ids of features
where = "1=1"
urlstring = baseURL + "/query?where={}&returnIdsOnly=true&f=json".format(where)
j = urllib2.urlopen(urlstring)
js = json.load(j)
idfield = js["objectIdFieldName"]
idlist = js["objectIds"]
idlist.sort()
numrec = len(idlist)
print "Number of target records: %s" % numrec

# Gather features in chunks of maxrc object ids
print "Gathering records..."
fs = dict()
for i in range(0, numrec, maxrc):
    # index of the last id in this chunk, clamped to the end of the list
    torec = min(i + maxrc - 1, numrec - 1)
    fromid = idlist[i]
    toid = idlist[torec]
    where = "{} >= {} and {} <= {}".format(idfield, fromid, idfield, toid)
    print "  {}".format(where)
    # URL-encode the where clause so its spaces and operators survive the request
    urlstring = baseURL + "/query?where={}&returnGeometry=true&outFields={}&f=json".format(urllib.quote(where), fields)
    fs[i] = arcpy.FeatureSet()
    fs[i].load(urlstring)

# Save features
print "Saving features..."
fslist = list(fs.values())
arcpy.Merge_management(fslist, outdata)
print "Done!"
```
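Without arcpy, the "mash results together" step for GeoJSON chunks (requested with `f=geojson`, as in the example URL earlier) can be done with the standard library. A minimal Python 3 sketch, assuming every chunk is a GeoJSON FeatureCollection; the function names and sample features are hypothetical:

```python
import json

def merge_feature_collections(collections):
    """Concatenate the features of several GeoJSON FeatureCollections into one."""
    merged = {"type": "FeatureCollection", "features": []}
    for fc in collections:
        merged["features"].extend(fc.get("features", []))
    return merged

def feature(fid):
    # tiny made-up point feature standing in for one record of a chunk
    return {"type": "Feature", "id": fid, "properties": {},
            "geometry": {"type": "Point", "coordinates": [fid, fid]}}

chunk1 = {"type": "FeatureCollection", "features": [feature(1), feature(2)]}
chunk2 = {"type": "FeatureCollection", "features": [feature(3)]}
merged = merge_feature_collections([chunk1, chunk2])
print(len(merged["features"]))  # 3

text = json.dumps(merged)  # ready to write out as a .geojson file
```

This keeps every chunk in memory at once, like the arcpy script that merges all FeatureSets at the end.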
If it crashes because the combined feature set is too large, you can use this variant, which copies each chunk to disk before merging:
```python
import arcpy
import json
import urllib
import urllib2

# Setup
arcpy.env.overwriteOutput = True
baseURL = "https://domain.com/MapServer/0"
fields = "*"
# raw string so "\t" and "\f" are not read as escapes; a geodatabase feature
# class path works best here, since appending str(i) to a ".shp" path breaks the extension
outdata = r"\path\to\file"

# Get record extract limit
urlstring = baseURL + "?f=json"
j = urllib2.urlopen(urlstring)
js = json.load(j)
maxrc = int(js["maxRecordCount"])
print "Record extract limit: %s" % maxrc

# Get object ids of features
where = "1=1"
urlstring = baseURL + "/query?where={}&returnIdsOnly=true&f=json".format(where)
j = urllib2.urlopen(urlstring)
js = json.load(j)
idfield = js["objectIdFieldName"]
idlist = js["objectIds"]
idlist.sort()
numrec = len(idlist)
print "Number of target records: %s" % numrec

# Gather features, writing each chunk to disk instead of keeping it in memory
print "Gathering records..."
merge_list = []
for i in range(0, numrec, maxrc):
    # index of the last id in this chunk, clamped to the end of the list
    torec = min(i + maxrc - 1, numrec - 1)
    fromid = idlist[i]
    toid = idlist[torec]
    where = "{} >= {} and {} <= {}".format(idfield, fromid, idfield, toid)
    print "  {}".format(where)
    # URL-encode the where clause so its spaces and operators survive the request
    urlstring = baseURL + "/query?where={}&returnGeometry=true&outFields={}&f=json".format(urllib.quote(where), fields)
    fs = arcpy.FeatureSet()
    fs.load(urlstring)
    tempdata = outdata + str(i)
    print "Copying features..."
    arcpy.CopyFeatures_management(fs, tempdata)
    merge_list.append(tempdata)

# Save features
print "Saving features..."
arcpy.Merge_management(merge_list, outdata)
print "Done!"
```
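The disk-based approach of the second script carries over to plain Python as well: save each GeoJSON chunk to its own file as it arrives, then stream the features into one output file so only one chunk is loaded at a time. A minimal Python 3 sketch with hypothetical helper names and made-up features:

```python
import json
import os
import tempfile

def save_chunk(fc, folder, i):
    """Write one chunk's FeatureCollection to its own file and return the path."""
    path = os.path.join(folder, "chunk_{}.geojson".format(i))
    with open(path, "w") as f:
        json.dump(fc, f)
    return path

def merge_chunk_files(paths, outpath):
    """Stream the features of each chunk file into a single FeatureCollection file."""
    with open(outpath, "w") as out:
        out.write('{"type": "FeatureCollection", "features": [')
        first = True
        for path in paths:
            with open(path) as f:
                features = json.load(f).get("features", [])  # one chunk in memory
            for feat in features:
                if not first:
                    out.write(", ")
                out.write(json.dumps(feat))
                first = False
        out.write("]}")

def feature(fid):
    # tiny made-up point feature
    return {"type": "Feature", "id": fid, "properties": {},
            "geometry": {"type": "Point", "coordinates": [fid, fid]}}

with tempfile.TemporaryDirectory() as folder:
    chunks = [{"type": "FeatureCollection", "features": [feature(1), feature(2)]},
              {"type": "FeatureCollection", "features": [feature(3)]}]
    paths = [save_chunk(fc, folder, i) for i, fc in enumerate(chunks)]
    outpath = os.path.join(folder, "merged.geojson")
    merge_chunk_files(paths, outpath)
    with open(outpath) as f:
        result = json.load(f)

print(len(result["features"]))  # 3
```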
Copy the source files from HERE and extract them here:
C:\Python27\ArcGIS10.2
## Incomplete Attempts
### Install dependencies
Since we are working with Python on a Windows machine through ArcMap (and work machines are often locked down), we'll try to get things running with what we have.
### Setup Tools
First you need a way to install a package manager.
Download the [setuptools](https://pypi.python.org/pypi/setuptools) source to where Python is installed, e.g.:
C:\Python27\ArcGIS10.2
From a command prompt:
cd c:/path/to/setuptools
python setup.py install
Now that you have the proper tools, you can install pip and subsequently other packages/dependencies.
#### Installing Pip (Python package manager)
To avoid SSL issues, use pip==1.2.1:
C:\Python27\ArcGIS10.2\Scripts\easy_install.exe pip==1.2.1
### Install ijson dependency
pip install ijson