Hi AGDC Team,
While I was testing WOfS ingestion, I found an issue.
I have downloaded some WOfS files from http://dapds00.nci.org.au/thredds/catalog/fk4/wofs/current/extents into a directory on my machine.
Then I run the ingest command for the first time, e.g.:
agdc/ingest/wofs.py --source /home/adminprod/data1/rs0/tiles/wofs/
Ingestion of the data files completes successfully.
Then I want to test re-ingestion of data that already exists in the Data Cube (the source files have been updated and I want to update my Data Cube).
To do this, I change the modification date of the source files (with the Linux 'touch' command; a sketch of the equivalent step is shown below).
The datetime of the source data is now later than the datetime of the dataset recorded in the database.
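For reference, a minimal sketch of that step (the glob pattern simply mirrors my source directory; running 'touch' on the files directly does the same thing):

import glob
import os

# Bump the modification time of every source tile to "now", so the source
# files appear newer than the dataset datetime stored in the database
# (this is what the 'touch' command does).
for path in glob.glob('/home/adminprod/data1/rs0/tiles/wofs/*.tif'):
    os.utime(path, None)  # None means "use the current time"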
I run agdc/ingest/wofs.py --source /home/adminprod/data1/rs0/tiles/wofs/ again and get the following exception:
2015-08-04 11:56:02,123 agdc.ingest.tile_contents INFO Tile already in place: '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER_115_-035_2011-01-10T01-59-19.155557.tif'
2015-08-04 11:56:02,217 agdc.ingest.core INFO Ingestion complete for dataset '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER_115-035_2011-01-10T01-59-19.155557.tif' in 0:00:00.197192.
Traceback (most recent call last):
  File "/home/adminprod/agdc-develop/agdc/ingest/wofs.py", line 97, in <module>
    agdc.ingest.run_ingest(WofsIngester)
  File "/home/adminprod/agdc-develop/agdc/ingest/_core.py", line 586, in run_ingest
    ingester.ingest(ingester.args.source_dir)
  File "/home/adminprod/agdc-develop/agdc/ingest/_core.py", line 186, in ingest
    self.ingest_individual_dataset(dataset_path)
  File "/home/adminprod/agdc-develop/agdc/ingest/core.py", line 207, in ingest_individual_dataset
    self.tile(dataset_record, dataset)
  File "/home/adminprod/agdc-develop/agdc/ingest/pretiled.py", line 312, in tile
    dataset_record.store_tiles([tile_contents])
  File "/home/adminprod/agdc-develop/agdc/ingest/dataset_record.py", line 238, in store_tiles
    return [self.create_tile_record(tile_contents) for tile_contents in tile_list]
  File "/home/adminprod/agdc-develop/agdc/ingest/dataset_record.py", line 320, in create_tile_record
    size_mb=tile_contents.get_output_size_mb(),
  File "/home/adminprod/agdc-develop/agdc/ingest/tile_contents.py", line 174, in get_output_size_mb
    return get_file_size_mb(path)
  File "/home/adminprod/agdc-develop/agdc/cube_util.py", line 109, in get_file_size_mb
    return os.path.getsize(path) // (1024 * 1024)
  File "/usr/lib/python2.7/genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER_115-035_2011-02-27T01-59-34.560472.tif'
2015-08-04 11:56:02,352 agdc.ingest.core ERROR Unexpected error during path '/home/adminprod/data1/rs0/tiles/wofs/LS7_ETM_WATER_115-035_2011-02-27T01-59-34.560472.tif'
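The immediate failure is simple enough: get_file_size_mb calls os.path.getsize on a tile file that has just been removed, so os.stat raises OSError with errno 2. Just to illustrate the failure mode (get_file_size_mb_safe is a hypothetical helper, not the project's code, and catching the error would only hide the deletion), a defensive variant would look like:

import os


def get_file_size_mb_safe(path):
    """Return the file size in whole megabytes, or None if the file has
    disappeared (e.g. removed by a concurrent cleanup)."""
    try:
        return os.path.getsize(path) // (1024 * 1024)
    except OSError:
        return None

The real question, though, is why the file was removed at all.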
After some investigation, I think the issue is caused by the data file being removed in the '__commit' function of the 'collection.py' module, i.e.:
# Remove tile files just after the commit, to avoid removing
# tile files when the deletion of a tile record has been rolled
# back. Again, tile files without records are possible if there
# is an exception or crash just after the commit.
#
# The tile remove list is filtered against the tile create list
# to avoid removing a file that has just been re-created. It is
# a bad idea to overwrite a tile file in this way (in a single
# transaction), because it will be overwritten just before the
# commit (above) and the wrong file will be in place if the
# transaction is rolled back.
tile_create_set = {t.get_output_path()
                   for t in self.tile_create_list}
for tile_pathname in self.tile_remove_list:
    if tile_pathname not in tile_create_set:
        if os.path.isfile(tile_pathname):
            os.remove(tile_pathname)
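In my setup the pre-tiled WOfS source directory and the tile output directory are the same, so the paths that end up in tile_remove_list can be the original source files themselves, and this cleanup then deletes the inputs. As a rough idea only (not a tested patch; source_path_set is a hypothetical set of input file paths that the ingester would have to collect, it does not exist in collection.py today), the remove list could also be filtered against the source paths:

# Sketch only: also protect the run's own source files from the cleanup.
# 'self.source_path_set' is hypothetical and would need to be populated
# elsewhere (e.g. when each dataset is opened for ingestion).
tile_create_set = {t.get_output_path()
                   for t in self.tile_create_list}
protected_paths = tile_create_set | self.source_path_set
for tile_pathname in self.tile_remove_list:
    if tile_pathname not in protected_paths:
        if os.path.isfile(tile_pathname):
            os.remove(tile_pathname)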
To be able to ingest the updated source data files again, I have commented out the 'os.remove' instruction above.
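Concretely, the local workaround in collection.py just looks like this (the surrounding loop is otherwise unchanged):

for tile_pathname in self.tile_remove_list:
    if tile_pathname not in tile_create_set:
        if os.path.isfile(tile_pathname):
            # Disabled locally so that re-ingestion of updated source
            # files does not delete the source files themselves.
            # os.remove(tile_pathname)
            pass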
Note that if the source data has not been updated (i.e. the date of the source file equals the date of the dataset in the database), there is no issue.
Note that if I run the ingestion again, the issue does not always occur on the same file: sometimes it is the first file, sometimes the nth file.
The text was updated successfully, but these errors were encountered:
We hit this bug last week ourselves in the development code – the overlap cleaner identified the second tile as redundant, which for other ingesters implies tile removal, and this was incorrectly running during WOfS ingestion. The WOfS ingester should be runnable with read-only access to its inputs (which is how we're running it), so any file modification is a serious bug.
Try updating to the latest version of the develop branch and retesting.