A Python CLI tool for managing and organizing the repetitive tasks involved in keeping remote geodatabases in sync with their sources. In other words, it is a tool to tame your scheduled task nightmare.
The first rule of π is it does not work on any sabbath.
The second rule of π is that it's out of your element, Donny.
The work that forklift does is defined by Pallets. forklift.models.Pallet is a base class that allows the user to define a job for forklift to perform by creating a new class that inherits from Pallet. Each pallet should have Pallet in its file name and be unique among other pallets run by forklift.
A Pallet can have zero or more Crates. forklift.models.Crate is a class that defines data that will be moved from one location to another (reprojecting to Web Mercator by default). Crates are created by calling the add_crates (or add_crate) methods within the build method on the pallet. For example:
from os import path

from forklift.models import Pallet


class MyPallet(Pallet):

    def __init__(self):
        #: this is required to initialize the Pallet base class properties
        super(MyPallet, self).__init__()

    def build(self, configuration):
        #: all operations that can throw an exception should be done in build
        destination_workspace = 'C:\\MapData'
        source_workspace = path.join(self.garage, 'connection.sde')

        self.add_crate('Counties', {'source_workspace': source_workspace,
                                    'destination_workspace': destination_workspace})
For details on all of the members of the Pallet and Crate classes, see models.py.
For examples of pallets, see samples/PalletSamples.py.
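The rule that each pallet has Pallet in its file name makes pallet modules easy to locate on disk. As a rough illustration only (this is not forklift's actual discovery code), scanning a warehouse folder for candidate pallet modules could look like the sketch below; find_pallet_files is a hypothetical helper name.

```python
from pathlib import Path


def find_pallet_files(warehouse):
    """Return the .py files under `warehouse` whose file names contain 'Pallet'.

    Hypothetical helper for illustration only; forklift's own scanning may differ.
    """
    #: rglob walks every subfolder, so pallets in any cloned repository are found
    return sorted(path for path in Path(warehouse).rglob('*.py') if 'Pallet' in path.name)
```

For example, find_pallet_files('C:\\forklift\\warehouse') (a made-up path) would list every module in the cloned repositories that follows the naming rule. Note that the match is case-sensitive, mirroring the capitalized Pallet in the rule.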
Interacting with forklift is done via the command line interface. Run forklift -h for a list of all of the available commands.
config.json is created in the working directory after running forklift config init. It contains the following properties:
- changeDetectionTables - An array of strings that are paths to change detection tables relative to the garage folder (e.g. SGID.sde\\SGID.META.ChangeDetection). A match between the source table name of a crate and a name from this table will cause forklift to skip hashing and use the values in the change detection table to determine whether a crate's data needs to be updated. Each table should have the following fields:
  - table_name - A string field that contains a lower-cased, fully-qualified table name (e.g. sgid.boundaries.counties).
  - hash - A string field that represents a unique hash of the entirety of the data in the table, such that any change to the data in the table results in a new value.
- configuration - A configuration string (Production, Staging, or Dev) that is passed to Pallet:build to allow a pallet to use different settings based on how forklift is being run. Defaults to Production.
- dropoffLocation - The folder location where production-ready files will be placed. This data will be compressed and will not contain any forklift artifacts. Pallets place their data in this location within their copy_data property.
- email - An object containing fromAddress, smtpPort, and smtpServer, or a SendGrid apiKey, for sending report emails.
- hashLocation - The folder location where forklift creates and manages data. This data contains hash digests that are used to check for changes. Referencing this location within a pallet is done by: os.path.join(self.staging_rack, 'the.gdb').
- notify - An array of email addresses that will be sent the summary report each time forklift lift is run.
- repositories - A list of GitHub repositories in the <owner>/<name> format that will be cloned/updated into the warehouse folder. A secure git repository can be added manually to the config in the format below:
  "repositories": [{ "host": "gitlabs.com/", "repo": "name/repo", "token": "personal access token with `read_repository` access only" }]
- sendEmails - A boolean value that determines whether or not to send forklift summary report emails after each lift.
- servers - An object describing one or more production servers that data will be shipped to. See below for more information.
- serverStartWaitSeconds - The number of seconds that forklift will wait after starting ArcGIS Server. Defaults to 300 (5 minutes).
- shipTo - A folder location that forklift will copy data to for each server. This is the data's final location. Everything in the dropoffLocation will be copied to the shipTo location during a forklift ship. The shipTo path is optionally formatted with the servers.host value if present and necessary. Place a {} in your shipTo path if you would like to use this feature, e.g. \\\\{}\\c$\\data.
- warehouse - The folder location where all of the repositories will be cloned into and where forklift will scan for pallets to lift.
- slackWebhookUrl - If you have a Slack channel, you can log in to the admin website and create a webhook URL. If this property is set, forklift will send reports to that channel.
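To make the changeDetectionTables behavior more concrete, here is a minimal, hypothetical sketch of the decision it describes: when a crate's lower-cased, fully-qualified source table name appears in a change detection table, the stored hash value is compared instead of re-hashing the data. The rows are plain dicts standing in for the table_name and hash fields, and last_known_hashes is an assumed record of the hash seen on the previous lift; none of these names come from forklift itself.

```python
def check_change_detection(source_table_name, change_detection_rows, last_known_hashes):
    """Decide how a crate's data should be checked for changes.

    Returns:
        None if the table is not listed (fall back to regular hashing),
        True if the listed hash differs from the last known hash (update needed),
        False if the hash is unchanged (skip the update).
    """
    #: table_name values are stored lower-cased and fully qualified
    key = source_table_name.lower()
    hashes = {row['table_name']: row['hash'] for row in change_detection_rows}

    if key not in hashes:
        return None

    #: a table that has never been seen before has no last known hash, so it counts as changed
    return hashes[key] != last_known_hashes.get(key)
```

The three-valued return keeps "not tracked by a change detection table" distinct from "tracked and unchanged", which is the distinction the property description relies on.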
Any of these properties can be set via the config set command like so:
forklift config set --key sendEmails --value False
If the property is a list, the value is appended to the existing list.
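The append-versus-replace rule of config set can be pictured with a small sketch that operates on an already-loaded config dictionary. set_config_value is a hypothetical name, not forklift's API, and the sketch stores the value as given (no string-to-boolean conversion), which forklift may handle differently.

```python
import json


def set_config_value(config, key, value):
    """Set `key` in `config`, appending when the existing value is a list."""
    current = config.get(key)
    if isinstance(current, list):
        #: list properties (e.g. notify, repositories) grow instead of being replaced
        current.append(value)
    else:
        #: scalar or missing properties are simply overwritten
        config[key] = value
    return config


config = {'sendEmails': True, 'notify': ['someone@example.com']}
set_config_value(config, 'sendEmails', False)
set_config_value(config, 'notify', 'another@example.com')
print(json.dumps(config))
```

One consequence of this rule is that removing an entry from a list property cannot be done with an append, so editing config.json directly is the simpler route for that case.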
Metadata is only copied from source to destination when the destination is first created, not on subsequent data updates. If you want to push metadata updates, delete the destination in the hashing folder; its metadata will be updated when it is recreated on the next lift.
From within the ArcGIS Pro conda environment (c:\Program Files\ArcGIS\Pro\bin\Python\scripts\proenv.bat):
- Install git.
- Install Visual Studio Build Tools with the Desktop development with C++ module.
- Install ArcGIS Pro.
- Add ArcGIS Pro to your path.
  - If installed for all users: c:\Program Files\ArcGIS\Pro\bin\Python\scripts\
  - If installed for a single user: C:\Users\{USER}\AppData\Local\Programs\ArcGIS\Pro\bin\Python\Scripts
- Create a conda environment for forklift: conda create --name forklift python=3.9
- Activate the conda environment: activate forklift
- conda install arcpy -c esri
- Check out the forklift repository: git clone https://github.com/agrc/forklift.git
- pip install .\ from the directory containing setup.py.
- Install the Python dependencies for your pallets.
- forklift config init
- forklift config repos --add agrc/parcels
  - agrc/parcels is the user/repo to scan for Pallets.
- forklift garage open
  - Opens the garage directory. Copy all .sde connection files to the forklift garage.
- forklift git-update
  - Updates pallet repos. Add any secrets or supplementary data your pallets need that is not in source control.
- Edit config.json to add the ArcGIS Server(s) to manage. The options property will be mixed in to all of the other servers.
  - username - ArcGIS admin username.
  - password - ArcGIS admin password.
  - host - ArcGIS host address, e.g. myserver. Validate this property by looking at the machineName property returned by /arcgis/admin/machines?f=json.
  - port - ArcGIS Server instance port, e.g. 6080.

    "servers": {
        "options": {
            "username": "mapserv",
            "password": "test",
            "port": 6080
        },
        "primary": {
            "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines"
        },
        "secondary": {
            "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines"
        },
        "backup": {
            "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines",
            "username": "test",
            "password": "password",
            "port": 6443
        }
    }
- Edit config.json to add the email notification properties (required for sending email reports).
  - smtpServer - The SMTP server that you want to send emails with.
  - smtpPort - The SMTP port number.
  - fromAddress - The from email address for emails sent by forklift.

    "email": {
        "smtpServer": "smtp.server.address",
        "smtpPort": 25,
        "fromAddress": "[email protected]"
    }
- forklift lift
- forklift ship
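The servers block and the shipTo property work together: options are mixed in to each named server, and each server's host can fill the {} placeholder in shipTo. The sketch below illustrates that under the assumption that server-specific keys override options (as the backup entry in the example suggests); resolve_servers and ship_to_paths are hypothetical helpers, not part of forklift.

```python
def resolve_servers(servers):
    """Merge the shared 'options' into every other server entry.

    Server-specific keys win over options (an assumption for this sketch).
    """
    options = servers.get('options', {})
    return {
        name: {**options, **settings}
        for name, settings in servers.items()
        if name != 'options'
    }


def ship_to_paths(ship_to, servers):
    """Format `ship_to` with each server's host when it contains a {} placeholder."""
    resolved = resolve_servers(servers)
    if '{}' not in ship_to:
        #: without a placeholder every server ships to the same folder
        return {name: ship_to for name in resolved}
    return {name: ship_to.format(settings['host']) for name, settings in resolved.items()}
```

With shipTo set to \\{}\c$\data (the JSON-escaped form shown in the property list), a server whose host is machine1 would resolve to \\machine1\c$\data.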
run_forklift.bat is an example of a batch file that could be used to run forklift via the Windows Task Scheduler.
From the root of the forklift source code folder:
- Activate forklift environment:
activate forklift
- Pull any new updates from GitHub:
git pull origin master
- Pip install with the upgrade option:
pip install .\ -U
To upgrade ArcGIS Pro, simply upgrade ArcGIS Pro. No further steps are needed if you originally created a fresh conda environment (not cloned from arcgispro-py3) and installed arcpy via conda install arcpy -c esri.
If you do need to recreate the forklift environment from scratch, follow these steps:
- Copy the forklift-garage folder to a temporary location.
- Activate the forklift environment:
activate forklift
- Export conda packages:
conda env export > env.yaml
- Export pip packages:
pip freeze > requirements.txt
- Remove and make note of any packages in requirements.txt that are not published to PyPI, such as forklift.
- Deactivate the forklift environment:
deactivate
- Remove forklift environment:
conda remove --name forklift --all
- Create new forklift environment:
conda create --clone arcgispro-py3 --name forklift --pinned
- Activate new environment:
activate forklift
- Reinstall conda packages:
conda env update -n forklift -f env.yaml
- Reinstall pip packages:
pip install -r requirements.txt
- Copy the forklift-garage folder to the site-packages folder of the newly created environment.
- Reinstall forklift and any other missing pip packages (from the root of the project):
pip install .\
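The step of removing packages from requirements.txt that are not published to PyPI can be partly automated. The sketch below assumes such packages appear in pip freeze output either as editable installs (lines starting with -e) or as direct references (name @ file:///... or name @ git+...). split_requirements is a hypothetical helper, and its output should still be reviewed by hand.

```python
def split_requirements(lines):
    """Split pip freeze lines into (pypi, local) lists.

    Heuristic: editable installs and direct URL/file references are treated as
    not published to PyPI. Blank lines and comments are dropped.
    """
    pypi, local = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('-e ') or ' @ ' in line:
            local.append(line)
        else:
            pypi.append(line)
    return pypi, local
```

For example, opening requirements.txt in a with block and passing the file object to split_requirements returns the lines to keep for pip install -r and the ones to note down and reinstall from source.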
To set up a development environment:
- Create a new environment:
conda create --name forklift --clone arcgispro-py3
activate forklift
- Install forklift and its dependencies:
pip install -e ".[tests]"
- Run forklift:
forklift -h
Tests should show up in VS Code's Test Explorer.
To run them from the command line:
pytest
Tests that depend on a local SDE database (see tests/data/UPDATE_TESTS.bak) will automatically be skipped if it is not found on your system.
To run a specific test or suite: pytest -k <test/suite name>
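Automatic skipping like this usually relies on a condition evaluated before a test runs. The sketch below shows the general pattern with the standard library's unittest.skipUnless, which pytest also honors. The connection file path and the sde_available check are hypothetical, and forklift's test suite may implement its skip differently (for example with pytest.mark.skipif).

```python
import os
import unittest

#: hypothetical connection file for a locally restored UPDATE_TESTS database
SDE_CONNECTION = os.path.join('tests', 'data', 'UPDATE_TESTS.sde')


def sde_available(path=SDE_CONNECTION):
    """Return True when the local SDE connection file exists."""
    return os.path.exists(path)


class TestUpdateWithSde(unittest.TestCase):

    @unittest.skipUnless(sde_available(), 'local SDE database not found')
    def test_update(self):
        #: a real test would read and write data through the SDE connection here
        self.assertTrue(os.path.exists(SDE_CONNECTION))
```

Because the condition is checked when the module is imported, a skipped test is reported as skipped rather than failed, so the rest of the suite still runs on machines without the database.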