Couple of changes/enhancements #135

Open: wants to merge 34 commits into base: master

Commits (34):
08dc85c Merge pull request #1 from developmentseed/master (wouellette, Jan 30, 2019)
b0cdbe1 Enables download for multiple countries at once (wouellette, Jan 30, 2019)
a75fcf8 added aoi_name and country as list (wouellette, Jan 30, 2019)
7c78d0b added feat_id and label_area attributes to classification.geojson (wouellette, Jan 30, 2019)
bc1aedc Update label.py (wouellette, Jan 30, 2019)
4dfe770 Update label.py (wouellette, Jan 30, 2019)
d5c600e Update label.py (wouellette, Jan 30, 2019)
9f8e950 Convert label_area unit (wouellette, Jan 31, 2019)
6492904 Removed injection to postgis table (wouellette, Feb 2, 2019)
da1d2b7 improved segmentation ml_type to output shapes to geojson (wouellette, Feb 2, 2019)
1d4be9e allow download and labels generation for multiple countries (Feb 25, 2019)
6b4a4be removed useless loading of geojson (wouellette, Feb 26, 2019)
336b1c8 Re-activate the creation of the labels.npz file (wouellette, Feb 26, 2019)
a881781 revert to country (wouellette, Mar 19, 2019)
107ab62 remove format attribute (wouellette, Mar 19, 2019)
d8efeb8 removed residual libraries (wouellette, Mar 19, 2019)
5b5a601 remove redundant library (wouellette, Mar 19, 2019)
366f99a re-established geojson intended functionality (wouellette, Mar 19, 2019)
5fa230d added aoi config key (wouellette, Mar 19, 2019)
d08c480 added aoi config key (wouellette, Mar 19, 2019)
5ff5573 Cleaner implementation of the classification ml_type (wouellette, Apr 8, 2019)
c5dfb32 minor fixes (wouellette, Apr 8, 2019)
c559fcf updated unit tests (May 17, 2019)
81d7ba6 updated unit tests (May 17, 2019)
7bb1836 reverted to previous requirements (May 17, 2019)
3427220 Update requirements.txt (wouellette, May 17, 2019)
66e3e2a Merge pull request #3 from developmentseed/master (wouellette, May 17, 2019)
7f1a9b6 Merge branch 'master' of https://github.com/wouellette/label-maker (May 17, 2019)
ac3bbca finalized integration tests (Jun 5, 2019)
94c24b0 cosmetic pylint changes (Jun 5, 2019)
a6a70ac additional pylint changes (Jun 5, 2019)
de0e80f removed venv directory (Jun 24, 2019)
b3f4b96 Merge pull request #4 from developmentseed/master (wouellette, Jun 24, 2019)
dafa90e added integration tests with two countries as input for all ml_types (Jun 26, 2019)

label_maker/download.py: 25 changes (13 additions, 12 deletions)
@@ -16,18 +16,19 @@ def download_mbtiles(dest_folder, country, **kwargs):
     ------------
     dest_folder: str
         Folder to save download into
-    country: str
-        Country for which to download the OSM QA tiles
+    country: list[str]
+        Countries for which to download the OSM QA tiles
     **kwargs: dict
         Other properties from CLI config passed as keywords to other utility functions
     """
-    download_file = path.join(dest_folder, '{}.mbtiles'.format(country))
-    print('Saving QA tiles to {}'.format(download_file))
-    url = 'https://s3.amazonaws.com/mapbox/osm-qa-tiles-production/latest.country/{}.mbtiles.gz'.format(country)
-    gz = tempfile.TemporaryDirectory()
-    tmp_path = path.join(gz.name, '{}.mbtiles.gz'.format(country))
-    download(url=url, path=tmp_path)
-    with gzip.open(tmp_path, 'rb') as r:
-        with open(download_file, 'wb') as w:
-            for line in r:
-                w.write(line)
+    for ctr in country:
Review comment (author): Iterating over country list

+        download_file = path.join(dest_folder, '{}.mbtiles'.format(ctr))
+        print('Saving QA tiles to {}'.format(download_file))
+        url = 'https://s3.amazonaws.com/mapbox/osm-qa-tiles-production/latest.country/{}.mbtiles.gz'.format(ctr)
+        gz = tempfile.TemporaryDirectory()
+        tmp_path = path.join(gz.name, '{}.mbtiles.gz'.format(ctr))
+        download(url=url, path=tmp_path)
+        with gzip.open(tmp_path, 'rb') as r:
+            with open(download_file, 'wb') as w:
+                for line in r:
+                    w.write(line)
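`download_mbtiles` now repeats the single-country download for every entry of `country`, writing one `.mbtiles` file per country. A minimal, network-free sketch of the same pattern follows; `mbtiles_targets` and `gunzip_to` are hypothetical helpers, the URL template is copied from the diff, and `shutil.copyfileobj` stands in for the line-by-line copy used in the PR.

```python
import gzip
import shutil
from os import path

# URL template copied from the diff: one gzipped .mbtiles archive per country
QA_TILES_URL = ('https://s3.amazonaws.com/mapbox/osm-qa-tiles-production/'
                'latest.country/{}.mbtiles.gz')


def mbtiles_targets(dest_folder, countries):
    """Hypothetical helper: (download URL, local .mbtiles path) for each country."""
    return [(QA_TILES_URL.format(ctr), path.join(dest_folder, '{}.mbtiles'.format(ctr)))
            for ctr in countries]


def gunzip_to(src_gz, dest):
    """Hypothetical helper: stream-decompress src_gz into dest in chunks."""
    with gzip.open(src_gz, 'rb') as r, open(dest, 'wb') as w:
        shutil.copyfileobj(r, w)


for url, out_path in mbtiles_targets('data', ['portugal', 'spain']):
    # the PR's download(url=..., path=...) step is omitted; this only shows the per-country targets
    print('Saving QA tiles to {} from {}'.format(out_path, url))
```

Keeping URL/path construction separate from the network call makes the per-country loop easy to test without downloading anything.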

label_maker/label.py: 229 changes (124 additions, 105 deletions)
@@ -66,99 +66,119 @@ def make_labels(dest_folder, zoom, country, classes, ml_type, bounding_box, spar
         Other properties from CLI config passed as keywords to other utility functions
     """

-    mbtiles_file = op.join(dest_folder, '{}.mbtiles'.format(country))
-    mbtiles_file_zoomed = op.join(dest_folder, '{}-z{!s}.mbtiles'.format(country, zoom))
-
-    if not op.exists(mbtiles_file_zoomed):
-        filtered_geo = kwargs.get('geojson') or op.join(dest_folder, '{}.geojson'.format(country))
Review comment (author):
Here, I am not sure I understand why kwargs.get('geojson') is used for this variable. This assumes that the GeoJSON file provided in config.json contains the label features from which to generate the labelled output for classification, object detection or segmentation.

However, I cannot see a use case where someone would provide their labelled features as a standalone GeoJSON file; would they then really need label-maker? If we keep the geojson attribute with its current definition, I still see value in an aoi field that provides the AOI as geometries, as an alternative to the country + bounding_box combination.

Reply (contributor):
@wouellette the geojson attribute is provided so that users can supply a standalone GeoJSON file. label-maker still adds value in that case: it chips up the imagery and GeoJSON labels, rasterizes them or creates object labels, etc. We need to keep this attribute because it is used in a number of existing workflows.

-        fast_parse = []
-        if not op.exists(filtered_geo):
-            fast_parse = ['-P']
-            print('Retiling QA Tiles to zoom level {} (takes a bit)'.format(zoom))
-            ps = Popen(['tippecanoe-decode', '-c', '-f', mbtiles_file], stdout=PIPE)
-            stream_filter_fpath = op.join(op.dirname(label_maker.__file__), 'stream_filter.py')
-            run([sys.executable, stream_filter_fpath, json.dumps(bounding_box)],
-                stdin=ps.stdout, stdout=open(filtered_geo, 'w'))
-            ps.wait()
-        run(['tippecanoe', '--no-feature-limit', '--no-tile-size-limit'] + fast_parse +
-            ['-l', 'osm', '-f', '-z', str(zoom), '-Z', str(zoom), '-o',
-             mbtiles_file_zoomed, filtered_geo])
-
-    # Call tilereduce
-    print('Determining labels for each tile')
-    mbtiles_to_reduce = mbtiles_file_zoomed
-    tilereduce(dict(zoom=zoom, source=mbtiles_to_reduce, bbox=bounding_box,
-                    args=dict(ml_type=ml_type, classes=classes)),
-               _mapper, _callback, _done)
-
-    # Add empty labels to any tiles which didn't have data
-    empty_label = _create_empty_label(ml_type, classes)
-    for tile in tiles(*bounding_box, [zoom]):
-        index = '-'.join([str(i) for i in tile])
-        global tile_results
-        if tile_results.get(index) is None:
-            tile_results[index] = empty_label
-
-    # Print a summary of the labels
-    _tile_results_summary(ml_type, classes)
-
-    # If the --sparse flag is provided, limit the total background tiles to write
-    if sparse:
-        pos_examples, neg_examples = [], []
-        for k in tile_results.keys():
-            # if we don't match any class, this is a negative example
-            if not sum([class_match(ml_type, tile_results[k], i + 1) for i, c in enumerate(classes)]):
-                neg_examples.append(k)
-            else:
-                pos_examples.append(k)
-
-        # Choose random subset of negative examples
-        n_neg_ex = int(kwargs['background_ratio'] * len(pos_examples))
-        neg_examples = np.random.choice(neg_examples, n_neg_ex, replace=False).tolist()
-
-        tile_results = {k: tile_results.get(k) for k in pos_examples + neg_examples}
-        print('Using sparse mode; subselected {} background tiles'.format(n_neg_ex))
-
-    # write out labels as numpy arrays
-    labels_file = op.join(dest_folder, 'labels.npz')
-    print('Writing out labels to {}'.format(labels_file))
-    np.savez(labels_file, **tile_results)
-
-    # write out labels as GeoJSON or PNG
-    if ml_type == 'classification':
-        features = []
-        for tile, label in tile_results.items():
-            feat = feature(Tile(*[int(t) for t in tile.split('-')]))
-            features.append(Feature(geometry=feat['geometry'],
-                                    properties=dict(label=label.tolist())))
-        json.dump(fc(features), open(op.join(dest_folder, 'classification.geojson'), 'w'))
-    elif ml_type == 'object-detection':
-        label_folder = op.join(dest_folder, 'labels')
-        if not op.isdir(label_folder):
-            makedirs(label_folder)
-        for tile, label in tile_results.items():
-            # if we have at least one bounding box label
-            if bool(label.shape[0]):
-                label_file = '{}.png'.format(tile)
-                img = Image.new('RGB', (256, 256))
-                draw = ImageDraw.Draw(img)
-                for box in label:
-                    draw.rectangle(((box[0], box[1]), (box[2], box[3])), outline=class_color(box[4]))
-                print('Writing {}'.format(label_file))
-                img.save(op.join(label_folder, label_file))
-    elif ml_type == 'segmentation':
-        label_folder = op.join(dest_folder, 'labels')
-        if not op.isdir(label_folder):
-            makedirs(label_folder)
-        for tile, label in tile_results.items():
-            # if we have any class pixels
-            if np.sum(label):
-                label_file = '{}.png'.format(tile)
-                visible_label = np.array([class_color(l) for l in np.nditer(label)]).reshape(256, 256, 3)
-                img = Image.fromarray(visible_label.astype(np.uint8))
-                print('Writing {}'.format(label_file))
-                img.save(op.join(label_folder, label_file))
-
+    for ctr_idx, ctr in enumerate(country):
+        mbtiles_file = op.join(dest_folder, '{}.mbtiles'.format(ctr))
+        mbtiles_file_zoomed = op.join(dest_folder, '{}-z{!s}.mbtiles'.format(ctr, zoom))
+
+        if not op.exists(mbtiles_file_zoomed):
+            filtered_geo = kwargs.get('geojson') or op.join(dest_folder, '{}.geojson'.format(ctr))
+            fast_parse = []
+            if not op.exists(filtered_geo):
+                fast_parse = ['-P']
+                print('Retiling QA Tiles to zoom level {} (takes a bit)'.format(zoom))
+                ps = Popen(['tippecanoe-decode', '-c', '-f', mbtiles_file], stdout=PIPE)
+                stream_filter_fpath = op.join(op.dirname(label_maker.__file__), 'stream_filter.py')
+                run([sys.executable, stream_filter_fpath, json.dumps(bounding_box)],
+                    stdin=ps.stdout, stdout=open(filtered_geo, 'w'))
+                ps.wait()
+            run(['tippecanoe', '--no-feature-limit', '--no-tile-size-limit'] + fast_parse +
+                ['-l', 'osm', '-f', '-z', str(zoom), '-Z', str(zoom), '-o',
+                 mbtiles_file_zoomed, filtered_geo])
+
+        # Call tilereduce
+        print('Determining labels for each tile')
+        mbtiles_to_reduce = mbtiles_file_zoomed
+        tilereduce(dict(zoom=zoom, source=mbtiles_to_reduce, bbox=bounding_box,
+                        args=dict(ml_type=ml_type, classes=classes)),
+                   _mapper, _callback, _done)
+
+        # Add empty labels to any tiles which didn't have data
+        empty_label = _create_empty_label(ml_type, classes)
+        for tile in tiles(*bounding_box, [zoom]):
+            index = '-'.join([str(i) for i in tile])
+            global tile_results
+            if tile_results.get(index) is None:
+                tile_results[index] = empty_label
+
+        # Print a summary of the labels
+        _tile_results_summary(ml_type, classes)
+
+        # If the --sparse flag is provided, limit the total background tiles to write
+        if sparse:
+            pos_examples, neg_examples = [], []
+            for k in tile_results.keys():
+                # if we don't match any class, this is a negative example
+                if not sum([class_match(ml_type, tile_results[k], i + 1) for i, c in enumerate(classes)]):
+                    neg_examples.append(k)
+                else:
+                    pos_examples.append(k)
+
+            # Choose random subset of negative examples
+            n_neg_ex = int(kwargs['background_ratio'] * len(pos_examples))
+            neg_examples = np.random.choice(neg_examples, n_neg_ex, replace=False).tolist()
+
+            tile_results = {k: tile_results.get(k) for k in pos_examples + neg_examples}
+            print('Using sparse mode; subselected {} background tiles'.format(n_neg_ex))
+
+        # write out labels as numpy arrays
+        labels_file = op.join(dest_folder, 'labels.npz')
+        print('Writing out labels to {}'.format(labels_file))
+        np.savez(labels_file, **tile_results)
+
+        # write out labels as GeoJSON or PNG
+        if ml_type == 'classification':
+            features = []
+            if ctr_idx == 0:
+                label_area = np.zeros((len(list(tile_results.values())[0]), len(tile_results), len(country)), dtype=float)
+                label_bool = np.zeros((len(list(tile_results.values())[0]), len(tile_results), len(country)), dtype=bool)
+            for i, (tile, label) in enumerate(tile_results.items()):
+                label_bool[:, i, ctr_idx] = np.asarray([bool(l) for l in label])
+                label_area[:, i, ctr_idx] = np.asarray([float(l) for l in label])
+                # if there are no classes, activate the background
+                if ctr == country[-1]:
+                    if all(v == 0 for v in label_bool[:, i, ctr_idx]):
+                        label_bool[0, i, ctr_idx] = 1
+                feat = feature(Tile(*[int(t) for t in tile.split('-')]))
+                features.append(Feature(geometry=feat['geometry'],
+                                        properties=dict(feat_id=str(tile),
+                                                        label=np.any(label_bool[:, i, :], axis=1).astype(int).tolist(),
+                                                        label_area=np.sum(label_area[:, i, :], axis=1).tolist())))
+            if ctr == country[-1]:
+                json.dump(fc(features), open(op.join(dest_folder, 'classification.geojson'), 'w'))
+        elif ml_type == 'object-detection':
+            label_folder = op.join(dest_folder, 'labels')
+            if not op.isdir(label_folder):
+                makedirs(label_folder)
+            for tile, label in tile_results.items():
+                # if we have at least one bounding box label
+                if bool(label.shape[0]):
+                    label_file = '{}.png'.format(tile)
+                    img = Image.new('RGB', (256, 256))
+                    draw = ImageDraw.Draw(img)
+                    for box in label:
+                        draw.rectangle(((box[0], box[1]), (box[2], box[3])), outline=class_color(box[4]))
+                    print('Writing {}'.format(label_file))
+                    if op.isfile(op.join(label_folder, label_file)):
+                        old_img = Image.open(op.join(label_folder, label_file))
+                        img.paste(old_img)
+                    else:
+                        img.save(op.join(label_folder, label_file))
+        elif ml_type == 'segmentation':
+            label_folder = op.join(dest_folder, 'labels')
+            if not op.isdir(label_folder):
+                makedirs(label_folder)
+            for tile, label in tile_results.items():
+                # if we have any class pixels
+                if np.sum(label):
+                    label_file = '{}.png'.format(tile)
+                    visible_label = np.array([class_color(l) for l in np.nditer(label)]).reshape(256, 256, 3)
+                    img = Image.fromarray(visible_label.astype(np.uint8))
+                    print('Writing {}'.format(label_file))
+                    if op.isfile(op.join(label_folder, label_file)):
+                        old_img = Image.open(op.join(label_folder, label_file))
+                        img.paste(old_img)
+                    else:
+                        img.save(op.join(label_folder, label_file))

 def _mapper(x, y, z, data, args):
     """Iterate over OSM QA Tiles and return a label for each tile
@@ -197,14 +217,15 @@ def _mapper(x, y, z, data, args):

     if tile['osm']['features']:
         if ml_type == 'classification':
-            class_counts = np.zeros(len(classes) + 1, dtype=np.int)
-            for i, cl in enumerate(classes):
-                ff = create_filter(cl.get('filter'))
-                class_counts[i + 1] = int(bool([f for f in tile['osm']['features'] if ff(f)]))
-            # if there are no classes, activate the background
-            if np.sum(class_counts) == 0:
-                class_counts[0] = 1
-            return ('{!s}-{!s}-{!s}'.format(x, y, z), class_counts)
+            class_areas = np.zeros(len(classes) + 1)
+            for feat in tile['osm']['features']:
+                for i, cl in enumerate(classes):
+                    ff = create_filter(cl.get('filter'))
+                    if ff(feat):
+                        feat['geometry']['coordinates'] = _convert_coordinates(feat['geometry']['coordinates'])
+                        geo = shape(feat['geometry'])
+                        class_areas[i + 1] = geo.area
+            return ('{!s}-{!s}-{!s}'.format(x, y, z), class_areas)
         elif ml_type == 'object-detection':
             bboxes = _create_empty_label(ml_type, classes)
             for feat in tile['osm']['features']:
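In the new `_mapper` classification branch, each label slot holds an area instead of a 0/1 flag. The sketch below shows per-class area accumulation for one tile without Shapely: it swaps in the shoelace formula for Polygon geometries only, replaces `create_filter` with plain predicate functions, and skips the `_convert_coordinates` step, whose definition is not shown in the diff. `tile_class_areas` and `is_building` are hypothetical names. The sketch adds with `+=`, so several matching features in one tile sum up; the diff assigns with `=`, which keeps only the area of the last matching feature.

```python
def ring_area(ring):
    """Planar area of a closed ring (first position repeated at the end), shoelace formula."""
    twice_area = 0.0
    for a, b in zip(ring, ring[1:]):
        twice_area += a[0] * b[1] - b[0] * a[1]
    return abs(twice_area) / 2.0


def polygon_area(coords):
    """GeoJSON Polygon coordinates: exterior ring area minus the area of any holes."""
    exterior, *holes = coords
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


def tile_class_areas(features, class_filters):
    """Hypothetical per-tile label: index 0 is the background slot, index i + 1 is class i."""
    areas = [0.0] * (len(class_filters) + 1)
    for feat in features:
        geom = feat['geometry']
        if geom['type'] != 'Polygon':
            continue  # points/lines have no area; MultiPolygon omitted for brevity
        for i, matches in enumerate(class_filters):
            if matches(feat):
                areas[i + 1] += polygon_area(geom['coordinates'])  # accumulate, don't overwrite
    return areas


def is_building(feat):
    """Hypothetical filter standing in for create_filter(...) on a 'building' class."""
    return 'building' in feat['properties']


features = [
    {'properties': {'building': 'yes'},
     'geometry': {'type': 'Polygon', 'coordinates': [[[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]]}},
    {'properties': {'building': 'yes'},
     'geometry': {'type': 'Polygon', 'coordinates': [[[10, 10], [12, 10], [12, 12], [10, 12], [10, 10]]]}},
    {'properties': {'highway': 'residential'},
     'geometry': {'type': 'LineString', 'coordinates': [[0, 0], [5, 5]]}},
]
print(tile_class_areas(features, [is_building]))  # [0.0, 20.0]
```

Areas come out in the squared units of whatever coordinate system the positions use, so any unit conversion has to happen before or after this step.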
@@ -305,7 +326,7 @@ def _tile_results_summary(ml_type, classes):
             cl_tiles = len([l for l in labels if len(list(filter(_bbox_class(i + 1), l)))]) # pylint: disable=cell-var-from-loop
             print('{}: {} features in {} tiles'.format(cl.get('name'), cl_features, cl_tiles))
     elif ml_type == 'classification':
-        class_tile_counts = list(np.sum(labels, axis=0))
+        class_tile_counts = list(np.count_nonzero(labels, axis=0))
         for i, cl in enumerate(classes):
             print('{}: {} tiles'.format(cl.get('name'), int(class_tile_counts[i + 1])))
     elif ml_type == 'segmentation':
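Because classification labels now carry per-class areas rather than 0/1 flags, summing them along axis 0 would report the total area per class instead of a tile count. `np.count_nonzero(..., axis=0)` counts the tiles in which each class has a nonzero value. A small illustration with hypothetical label arrays:

```python
import numpy as np

# Hypothetical per-tile classification labels: per-class areas, index 0 = background slot
labels = [
    np.array([0.0, 120.5, 0.0]),
    np.array([0.0, 30.0, 8.0]),
    np.array([0.0, 0.0, 0.0]),
]

area_totals = np.sum(labels, axis=0)            # what the old summary line would now report
tile_counts = np.count_nonzero(labels, axis=0)  # tiles in which each class appears

print(area_totals.tolist())  # [0.0, 150.5, 8.0]
print(tile_counts.tolist())  # [0, 2, 1]
```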
@@ -317,9 +338,7 @@ def _tile_results_summary(ml_type, classes):

 def _create_empty_label(ml_type, classes):
     if ml_type == 'classification':
-        label = np.zeros(len(classes) + 1, dtype=np.int)
-        label[0] = 1
-        return label
+        return np.zeros(len(classes) + 1, dtype=np.int)
     elif ml_type == 'object-detection':
         return np.empty((0, 5), dtype=np.int)
     elif ml_type == 'segmentation':
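The classification branch of `make_labels` keeps one slice per country in `label_area` and `label_bool` and combines them per tile: a class is flagged if it is present in any country (`np.any(..., axis=1)`), and its area is the sum over countries (`np.sum(..., axis=1)`). A minimal NumPy sketch of that aggregation follows, assuming every country yields labels for the same set of tile ids; `aggregate_classification` and the sample tiles are hypothetical. Unlike the diff, which tests for an empty tile only in the last country's slice, the sketch sets the background flag (index 0) only where no class appears in any country.

```python
import numpy as np


def aggregate_classification(per_country):
    """Combine per-country classification labels for the same tiles.

    per_country: one dict per country, mapping a tile id such as '10-20-6' to a 1-D
    array of per-class areas (index 0 is the background slot).
    Returns {tile: (label, label_area)}, mirroring the GeoJSON properties in the diff.
    """
    tile_ids = list(per_country[0].keys())
    # shape (n_classes + 1, n_tiles, n_countries), like label_area in the diff
    areas = np.stack(
        [np.stack([np.asarray(d[t], dtype=float) for t in tile_ids], axis=1) for d in per_country],
        axis=2,
    )
    label = (areas != 0).any(axis=2)  # class present in any country
    label_area = areas.sum(axis=2)    # areas add up across countries
    # background flag only where no class (index >= 1) appears in any country
    no_class = ~label[1:].any(axis=0)
    label[0, no_class] = True
    return {t: (label[:, i].astype(int).tolist(), label_area[:, i].tolist())
            for i, t in enumerate(tile_ids)}


portugal = {'10-20-6': np.array([0, 150.0, 0]), '11-20-6': np.zeros(3)}
spain = {'10-20-6': np.array([0, 50.0, 0]), '11-20-6': np.zeros(3)}
result = aggregate_classification([portugal, spain])
print(result['10-20-6'])  # ([0, 1, 0], [0.0, 200.0, 0.0])
print(result['11-20-6'])  # ([1, 0, 0], [0.0, 0.0, 0.0])
```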

label_maker/main.py: 4 changes (4 additions, 0 deletions)
@@ -83,6 +83,10 @@ def cli():
     if not valid:
         raise Exception(v.errors)

+    # for aoi, overwrite bounding_box config key to correct labelling
+    if 'aoi' in config.keys():
+        config['bounding_box'] = get_bounds(json.load(open(config.get('aoi'), 'r')))
+
     # custom validation for top level keys
     # require either: country & bounding_box or geojson
     if 'geojson' not in config.keys() and not ('country' in config.keys() and 'bounding_box' in config.keys()):
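In `cli()`, the `aoi` override runs before the top-level key check, so a config with `country` and `aoi` but no explicit `bounding_box` still passes, and any explicit `bounding_box` is replaced by the AOI's extent. The diff does not show how label-maker's `get_bounds` works, so `geojson_bounds` below is a hypothetical stand-in that takes the min/max over every position in a FeatureCollection; `resolve_config` and its error message are likewise illustrative.

```python
import json


def _load_json(file_path):
    """Read a GeoJSON file from disk (the default AOI loader)."""
    with open(file_path, 'r') as f:
        return json.load(f)


def _iter_positions(coords):
    """Yield every [lon, lat, ...] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for c in coords:
            yield from _iter_positions(c)


def geojson_bounds(feature_collection):
    """Hypothetical stand-in for get_bounds: [west, south, east, north] of all features."""
    positions = [p for feat in feature_collection['features']
                 for p in _iter_positions(feat['geometry']['coordinates'])]
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return [min(lons), min(lats), max(lons), max(lats)]


def resolve_config(config, load_aoi=_load_json):
    """Apply the aoi override first, then the top-level key check, in the same order as cli()."""
    config = dict(config)  # leave the caller's dict untouched
    if 'aoi' in config:
        # the AOI's extent replaces any bounding_box given explicitly
        config['bounding_box'] = geojson_bounds(load_aoi(config['aoi']))
    if 'geojson' not in config and not ('country' in config and 'bounding_box' in config):
        raise ValueError('need "geojson", or "country" together with "bounding_box" (or "aoi")')
    return config


# Example with an in-memory AOI instead of a file on disk
aoi = {'type': 'FeatureCollection', 'features': [{
    'type': 'Feature', 'properties': {},
    'geometry': {'type': 'Polygon',
                 'coordinates': [[[-9.5, 38.6], [-9.0, 38.6], [-9.0, 39.0], [-9.5, 38.6]]]}}]}
cfg = resolve_config({'country': ['portugal'], 'aoi': 'aoi.geojson'}, load_aoi=lambda p: aoi)
print(cfg['bounding_box'])  # [-9.5, 38.6, -9.0, 39.0]
```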

label_maker/validate.py: 3 changes (2 additions, 1 deletion)
@@ -22,7 +22,8 @@

 schema = {
     'geojson': {'type': 'string'},
-    'country': {'type': 'string', 'allowed': countries},
+    'aoi': {'type': 'string'},
+    'country': {'type': 'list', 'allowed': countries},
     'bounding_box': {'type': 'list', 'items': [lon_schema, lat_schema, lon_schema, lat_schema]},
     'zoom': {'type': 'integer', 'required': True},
     'classes': {'type': 'list', 'schema': class_schema, 'required': True},
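Because `country` is now typed as a list, a configuration that still passes a single string (for example `"country": "portugal"`) should no longer validate, assuming Cerberus's `list` type rejects strings and its `allowed` rule is checked against each member of a list value. The pure-Python approximation below illustrates that rule; `COUNTRIES` and `country_errors` are hypothetical names, and the error strings are illustrative rather than Cerberus's own messages.

```python
# Hypothetical, abbreviated stand-in for the `countries` list referenced by the schema
COUNTRIES = ['portugal', 'spain', 'france']


def country_errors(value, allowed=COUNTRIES):
    """Approximate the new rule {'type': 'list', 'allowed': countries}.

    Returns a list of illustrative error strings; an empty list means the value passes.
    """
    if not isinstance(value, list):
        return ['country must be a list']
    unknown = [v for v in value if v not in allowed]
    return ['unknown countries: {}'.format(unknown)] if unknown else []


print(country_errors(['portugal', 'spain']))     # []
print(country_errors('portugal'))                # ['country must be a list']
print(country_errors(['portugal', 'atlantis']))  # ["unknown countries: ['atlantis']"]
```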

requirements.txt: 4 changes (2 additions, 2 deletions)
@@ -10,11 +10,11 @@ numpy==1.13.3
 olefile==0.44
 Pillow==4.3.0
 protobuf==3.5.0.post1
-pyclipper==1.0.6
+pyclipper>=1.0.6
 pycurl==7.43.0.1
 pyproj==1.9.5.1
 rasterio[s3]==1.0a12
 requests>=2.20.0
 Shapely>=1.6.3
 six==1.10.0
-tilepie==0.2.1
+tilepie==0.2.1