Merge of Tjdett and Microtardis branches #4

Status: Open. Wants to merge 29 commits into base: master.

Commits (29)
- 2e36e9b (Mar 12, 2012): Add some comments
- 0e46f48 (Mar 14, 2012): Customise Atom ingest to support Microtardis workflow.
- 1eb9537 (Mar 16, 2012): Fully implement dataset "updated" handling.
- cb5e09b (Mar 19, 2012): Extra warning.
- 16a5cfb (Mar 22, 2012): Update documentation.
- 92dd20b (Mar 22, 2012): Catch and log exceptions around enclosure processing.
- 422c48d (Mar 22, 2012): Merge branch 'microtardis' of https://github.com/stevage/mytardis-app…
- 5d191d0 (Mar 22, 2012): (Joanna H) Make filters work by calling the Django middleware explici…
- d84a6ce (Mar 22, 2012): Auto-hide old versions of new files that are imported.
- 9ac9076 (Mar 27, 2012): Move options to separate options.py
- 5743105 (Mar 27, 2012): Merge.
- 3aa0c69 (Mar 27, 2012): Merge.
- 35c92d9 (Mar 27, 2012): Add options for filter processing. Work around issue with saving time…
- e5b3877 (Mar 27, 2012): Merge
- ee21d90 (Apr 19, 2012): Add proxy option, fix another timezone bug.
- e0bc178 (Apr 20, 2012): Update documentation re: options.
- 3b4dfec (Apr 24, 2012): Option to allow full/quick scans of the feed. Full scans pick up 'old…
- 463e74e (May 25, 2012): Fix missing import, and add a helpful comment in options.py.
- 00ede1f (Jul 25, 2012): Tentative merge of stevage and tjdett branches. Untested.
- 663c222 (Aug 7, 2012): Refine merge. Still untested.
- 3093bea (Aug 7, 2012): Add workaround for util.py
- ed78fe1 (Aug 24, 2012): More post-merge fixes.
- 84126bb (Aug 24, 2012): Tyop.
- d065bd8 (Aug 31, 2012): More sensible default options.
- e401b18 (Aug 31, 2012): Fix datafile depth error in options.
- 0f9638d (Sep 3, 2012): Fail more gracefully if MicroTardis app is not present.
- 99c5416 (Sep 3, 2012): Remove references to process_media_content()
- 65f11d2 (Sep 3, 2012): Remove local copy of make_local_copy().
- f318e5f (Jan 4, 2013): Local transfers should start with file:/// now.
Files changed: README.md (45 additions, 9 deletions)
```diff
@@ -1,32 +1,36 @@
 MyTardis Atom App
 =================
 
 Authors: Tim Dettrick (University of Queensland), Steve Bennett (VeRSI)
 
 This app can be used to ingest datasets via Atom. Please see `tests/atom_test` for format examples.
 
 New metadata is ingested first, with data files being copied asynchronously afterwards.
 
 Installation
 ------------
 
-Symlink this app into a MyTardis `tardis/apps` directory. The preferred name for the app is `atom`.
+Git clone the app into `tardis/apps`:
+
+    [tardis/apps] git clone https://github.com/stevage/mytardis-app-atom
 
 Configuration
 -------------
 
-Celery is used to schedule periodic file ingestion.
+Celery is used to schedule periodic file ingestion. This version is designed to work with the Atom Dataset Provider (https://github.com/stevage/atom-dataset-provider), which provides a feed based on changes to a directory structure.
 
-The `atom_ingest.walk_feeds` task takes a variable number of feeds and updates them. Here's an example
-for `settings.py` that checks two Picassa feeds every 30 seconds:
+The `atom_ingest.walk_feeds` task takes a variable number of feeds and updates them. Here's an example
+for `settings.py` using the above dataset provider:
 
     CELERYBEAT_SCHEDULE = dict(CELERYBEAT_SCHEDULE.items() + {
         "update-feeds": {
-            "task": "atom_ingest.walk_feeds",
-            "schedule": timedelta(seconds=30),
-            "args": ('http://example.org/feed.atom',
-                     'http://example.test/feed.atom')
+            "task": "atom_ingest.walk_feeds",
+            "schedule": timedelta(seconds=60),
+            "args": ('http://localhost:4000',)
         },
     }.items())
 
 
 You must run [celerybeat][celerybeat] and [celeryd][celeryd] for the scheduled updates to be performed.
 MyTardis provides a `Procfile` for this purpose, but you can run both adhoc with:
```
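For readability, the resulting `settings.py` fragment from this hunk is shown below on its own. This is a sketch only: it assumes `CELERYBEAT_SCHEDULE` is already defined earlier in `settings.py` (as the merge idiom in the README implies), and adds the `timedelta` import the snippet relies on.

```python
from datetime import timedelta

# Merge the feed-walking task into the existing schedule. The
# dict(a.items() + b.items()) idiom is Python 2, matching the
# django-celery era this app targets. Assumes CELERYBEAT_SCHEDULE
# is already defined earlier in settings.py.
CELERYBEAT_SCHEDULE = dict(CELERYBEAT_SCHEDULE.items() + {
    "update-feeds": {
        "task": "atom_ingest.walk_feeds",
        "schedule": timedelta(seconds=60),
        # One or more feed URLs; here, a local Atom Dataset Provider instance.
        "args": ('http://localhost:4000',),
    },
}.items())
```

With django-celery, the scheduler and worker are typically started with `python manage.py celerybeat` and `python manage.py celeryd` (an assumption; the README's own adhoc command falls in the collapsed part of this diff view).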

```diff
@@ -40,6 +44,38 @@ HTTP Basic password protection is available via `settings.py` in MyTardis:
 
 In a production environment, you should combine HTTP Basic password protection with SSL for security.
 
+Settings
+-------------
+Various policy settings are defined in options.py
+
+    ALLOW_EXPERIMENT_CREATION = True  # Should we create new experiments
+    ALLOW_EXPERIMENT_TITLE_MATCHING = True  # If there's no id, is the title enough to match on
+    ALLOW_UNIDENTIFIED_EXPERIMENT = True  # If there's no title/id, should we process it as "uncategorized"?
+    DEFAULT_UNIDENTIFIED_EXPERIMENT_TITLE = "Uncategorized Data"
+    ALLOW_UNNAMED_DATASETS = True  # If a dataset has no title, should we ingest it with a default name
+    DEFAULT_UNNAMED_DATASET_TITLE = '(assorted files)'
+    ALLOW_USER_CREATION = True  # If experiments belong to unknown users, create them?
+    # Can existing datasets be updated? If not, we ignore updates. To cause a new dataset to be
+    # created, the incoming feed must have a unique EntryID for the dataset (eg, hash of its contents).
+    ALLOW_UPDATING_DATASETS = True
+    # If a datafile is modified, do we re-harvest it (creating two copies)? Else, we ignore the
+    # update. False is not recommended.
+    ALLOW_UPDATING_DATAFILES = True
+
+    # If files are served as /user/instrument/experiment/dataset/datafile/moredatafiles
+    # then 'datafile' is at depth 5. This is so we can maintain directory structure that
+    # is significant within a dataset. Set to -1 to assume the deepest directory.
+    DATAFILE_DIRECTORY_DEPTH = 5
+
+    USE_MIDDLEWARE_FILTERS = False  # Initialise metadata extraction filters? Requires settings.py config.
+    HIDE_REPLACED_DATAFILES = True  # Mark old versions of updated datafiles as hidden. Requires the datafile hiding feature in Tardis.
+
+    # If we can transfer files "locally" (ie, via an SMB mount), then replace URL_BASE_TO_REPLACE
+    # with LOCAL_SOURCE_PATH to construct a file path that can be copied from.
+    USE_LOCAL_TRANSFERS = True
+    URL_BASE_TO_REPLACE = "http://dataprovider.example.com/files/"
+    LOCAL_SOURCE_PATH = "/mnt/dataprovider/"
+
+    HTTP_PROXY = "http://proxy.example.com:8080"  # Leave blank for no proxy
+
 [celerybeat]: http://ask.github.com/celery/userguide/periodic-tasks.html#starting-celerybeat
-[celeryd]: http://ask.github.com/celery/userguide/workers.html#starting-the-worker
+[celeryd]: http://ask.github.com/celery/userguide/workers.html#starting-the-workers
```
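To make the transfer-related options concrete, below is a hypothetical sketch (not the app's actual implementation) of how an enclosure URL could be rewritten to a local path via `URL_BASE_TO_REPLACE` and `LOCAL_SOURCE_PATH`, and how `DATAFILE_DIRECTORY_DEPTH` splits a served path into the dataset grouping and the within-dataset file path:

```python
# Hypothetical illustration of the options documented above;
# not the app's actual code.

URL_BASE_TO_REPLACE = "http://dataprovider.example.com/files/"
LOCAL_SOURCE_PATH = "/mnt/dataprovider/"
DATAFILE_DIRECTORY_DEPTH = 5

def to_local_path(url):
    """Rewrite a feed enclosure URL to a path on the local (SMB) mount."""
    if url.startswith(URL_BASE_TO_REPLACE):
        return LOCAL_SOURCE_PATH + url[len(URL_BASE_TO_REPLACE):]
    return None  # no local copy available; fall back to an HTTP transfer

def split_at_depth(path, depth=DATAFILE_DIRECTORY_DEPTH):
    """Split /user/instrument/experiment/dataset/datafile/more... so that
    the component at `depth` and everything below it stay together as the
    datafile's directory structure within the dataset."""
    parts = path.strip('/').split('/')
    if depth == -1:  # -1 means: assume the deepest directory
        depth = len(parts)
    return parts[:depth - 1], '/'.join(parts[depth - 1:])

# Example: with the default depth of 5, 'sub/file.tif' stays together
# as directory structure within the dataset:
#   split_at_depth("/user/instr/exp/dataset/sub/file.tif")
#   -> (['user', 'instr', 'exp', 'dataset'], 'sub/file.tif')
```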