Last updated: April 23, 2014
Authors: Adrian Laurenzi & Louis Fettet
Looking for the latest release? Get it here: https://github.com/socrata/datasync/releases
DataSync is an executable Java application that serves as a general solution for automating data publishing on the Socrata platform. It can be used through an easy-to-use graphical interface or as a command-line tool ('headless mode'). Whether you are a non-technical user, a developer, or an ETL specialist, DataSync makes data publishing simple and reliable. DataSync takes a CSV or TSV file on a local machine or networked hard drive and publishes it to a Socrata dataset, keeping that dataset up to date. DataSync jobs can be integrated into an ETL process, scheduled with a tool such as the Windows Task Scheduler or cron, or used to update or create datasets in batches. DataSync runs on any platform with Java 1.7 or higher (e.g. Windows, Mac, and Linux). This simple yet powerful tool lets you update Socrata datasets programmatically and on a schedule, without writing a single line of code.
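Because DataSync publishes whatever CSV or TSV file a job points at, an ETL step may want a quick sanity check before handing the file over. The class below is a hypothetical pre-flight check, not part of DataSync: it assumes the file has a header row, picks the delimiter from the file extension, and reports any expected column names missing from that header. The naive split does not handle quoted fields that contain the delimiter; a real CSV parser is needed for that case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical pre-publish header check; not a DataSync class. Java 7 compatible. */
public class HeaderCheck {

    /** Returns ',' for .csv and '\t' for .tsv (case-insensitive); rejects other extensions. */
    static char delimiterFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv")) return ',';
        if (lower.endsWith(".tsv")) return '\t';
        throw new IllegalArgumentException("Expected a .csv or .tsv file: " + fileName);
    }

    /** Reads the first line and splits it into trimmed column names (no quote handling). */
    static List<String> headerColumns(BufferedReader reader, char delimiter) throws IOException {
        String first = reader.readLine();
        if (first == null) {
            throw new IOException("File is empty; expected a header row");
        }
        List<String> columns = new ArrayList<>();
        for (String part : first.split(Pattern.quote(String.valueOf(delimiter)), -1)) {
            columns.add(part.trim());
        }
        return columns;
    }

    /** Returns the expected column names that do not appear in the header. */
    static List<String> missingColumns(List<String> header, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!header.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String sample = "id\tname\tupdated_at\n1\tAlpha\t2014-04-23\n";
        char delimiter = delimiterFor("permits.tsv");
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<String> header = headerColumns(reader, delimiter);
            System.out.println(header); // prints [id, name, updated_at]
            System.out.println(missingColumns(header, Arrays.asList("id", "name", "status"))); // prints [status]
        }
    }
}
```

A non-empty result from `missingColumns` is a signal to stop the ETL step before the publish job runs, rather than letting a mismatched file reach the dataset.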
Comprehensive DataSync Documentation
The Socrata University Class: Socrata Introduction to Integration
Standard jobs take a CSV (or TSV) data file from a local machine or networked folder and publish it to a specific dataset. A job can be automated by running it at specified intervals (e.g. once per day) with the Windows Task Scheduler or a similar tool.
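The Windows Task Scheduler or cron is the recommended way to run a job on a schedule. If a job must instead run inside a long-lived Java process, a `ScheduledExecutorService` can play the same role. The sketch below is an assumption-laden alternative, not a DataSync feature: the job is a placeholder `Runnable`, `DailySchedule` is a hypothetical name, and `java.time` requires Java 8 or later.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process daily scheduler; DataSync does not provide this class. */
public class DailySchedule {

    /** Milliseconds from now until the next runAt: later today if still ahead, otherwise tomorrow. */
    static long millisUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next).toMillis();
    }

    /** Starts a single-threaded scheduler that runs job every 24 hours, first at runAt. */
    static ScheduledExecutorService scheduleDaily(Runnable job, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = millisUntilNext(LocalDateTime.now(), runAt);
        // A fixed 24-hour period ignores daylight-saving changes, so the wall-clock
        // start time can shift by an hour; cron and Task Scheduler do not have this issue.
        scheduler.scheduleAtFixedRate(job, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

For example, `DailySchedule.scheduleDaily(() -> System.out.println("publish"), LocalTime.of(2, 0))` runs the placeholder at 02:00 each day. The executor's worker thread is non-daemon, so it keeps the JVM alive until `shutdown()` is called.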
Port jobs move data that is already on the Socrata platform. Users with publisher rights can use them to make copies of datasets. Port jobs can copy both dataset schemas (metadata and columns) and data (rows).
This repository is our development basecamp. If you find a bug or have questions, comments, or suggestions, please report them in our issue tracker.
DataSync uses Maven for building and package management. For more information: What is Maven?
To build the project, run:
mvn clean install
To compile the project into an executable JAR file (including all dependencies), run:
mvn clean compile -Dmaven.test.skip=true assembly:single
This places the JAR file in the "target" directory of the repository. To launch DataSync, run:
cd target
java -jar DataSync-1.5.4-jar-with-dependencies.jar
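When DataSync is one step in a Java-based ETL process, the JAR built above can be launched as a child process. The sketch below is hypothetical glue code, not part of DataSync. It deliberately does not spell out headless-mode options: those are described in the DataSync documentation and are passed through unchanged as `jobArgs`. It also assumes DataSync reports failure through a non-zero exit status, which should be checked against that documentation before an ETL step relies on it.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher for the DataSync JAR; not a DataSync class. Java 7 compatible. */
public class DataSyncLauncher {

    /** Builds the command "java -jar <jarPath> <jobArgs...>", passing jobArgs through unchanged. */
    static List<String> buildCommand(String jarPath, List<String> jobArgs) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(jarPath);
        command.addAll(jobArgs);
        return command;
    }

    /** Runs the command in workingDir, sends DataSync's output to this console, and returns the exit code. */
    static int run(List<String> command, File workingDir) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(workingDir); // a relative JAR path is resolved against this directory
        builder.inheritIO();           // forward the child's stdout/stderr to the parent
        Process process = builder.start();
        return process.waitFor();
    }
}
```

For example, `DataSyncLauncher.run(DataSyncLauncher.buildCommand("DataSync-1.5.4-jar-with-dependencies.jar", headlessArgs), new File("target"))` mirrors the two commands above, where `headlessArgs` is a list built from the documented options. The JAR name must match the version you built.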
DataSync can also be used as a Java SDK. For detailed documentation, refer to:
http://socrata.github.io/datasync/guides/datasync-library-sdk.html