Adds conversion of DCML harmony labels to Dezrann format #102

Open
wants to merge 46 commits into
base: main
7b8f70e
end-of-day commit of dezrann.py
johentsch Feb 22, 2023
0851ce2
add 'origin' argument to handle 'layers' attributes in Dezrann labels
pythouille Feb 23, 2023
d395db5
makes the new argument 'origin' a str|Tuple[str] (lists should be avo…
johentsch Feb 23, 2023
62682e1
corrects output annotation
johentsch Feb 23, 2023
14148c1
adds CLI skeleton
johentsch Feb 23, 2023
a2b40a3
adds import and corrects positional argument
johentsch Feb 23, 2023
8775778
streamlines CLI arguments and suggests defaults
johentsch Feb 23, 2023
ee7e560
work in progress: CLI + line layout
pythouille Feb 23, 2023
9095d0e
end-of-day commit after collab session LC & JH
pythouille Feb 23, 2023
6ec8cd2
refines commandline interface with docstrings and better argument tre…
johentsch Feb 24, 2023
58b2cca
this version converts raw labels, independent of the given line argum…
johentsch Feb 24, 2023
d14f9ca
adds to transform_df() the algorithm that copies the preceding label …
johentsch Feb 24, 2023
aefeb9b
integrates calls to transform_df with the current logic within genera…
johentsch Feb 24, 2023
0ffd23a
link DCML labels to final Dezrann labels and layout
pythouille Feb 24, 2023
6a9d2e4
fix label_type naming typo
pythouille Feb 24, 2023
c12c4cf
remove fixed manual post-processing steps (NaN durations and handling…
pythouille Feb 24, 2023
d9c6d0c
add safe conversion of quarterbeats
pythouille Feb 24, 2023
3c0c6d1
more error handling
johentsch May 1, 2023
b8b0578
renames transform_df() => dcml_labels2dicts()
johentsch May 1, 2023
455f106
improves type annotations & docstrings
johentsch May 1, 2023
c567df6
replaces simple tests by new_tests/test_dezrann.py
johentsch May 1, 2023
3fafa71
factors out generate_dez_from_dfs()
johentsch May 1, 2023
683908b
generate_dez() and generate_dez_from_dfs() return boolean and also ac…
johentsch May 1, 2023
10e1ef0
extends test_dcml2dez() so that the number of written .dez labels cor…
johentsch May 1, 2023
487242a
adds additional unittest
johentsch May 1, 2023
f95799e
pre-commit run --all-files
johentsch Sep 11, 2023
7fc7733
factors out test_dezrann.py into folder dedicated to tests on Mozart
johentsch Sep 11, 2023
45cdf71
add measure map file with docstring
pythouille Sep 13, 2023
48ac790
add main measure map generation function
pythouille Sep 13, 2023
80a0b89
add full_info mode, with all keywords
pythouille Sep 13, 2023
5ae3288
improve compressed mode
pythouille Sep 13, 2023
341c610
computes quarterbeats_all_endings only if not present already
johentsch Sep 18, 2023
3ad8bfa
Merge branch 'main' into dezrann_rebased
johentsch Sep 19, 2023
41e48b9
adds review_cmd to debugging.py
johentsch Sep 19, 2023
2ae5d72
resolves #99
johentsch Sep 19, 2023
e31c2fc
enables extracting measure maps using ms3 extract -MM
johentsch Sep 19, 2023
632364d
changes MeasureMap suffix to measuremap.json
johentsch Sep 20, 2023
a179bb2
moves measures2measure_map() to ms3.transformations
johentsch Sep 21, 2023
5c72932
has MeasureMap fields start/end_repeat default to False
johentsch Sep 21, 2023
ead673f
writes measuremap.json with indent=2
johentsch Sep 21, 2023
cf1c16f
corrects time-signature => time_signature
johentsch Sep 21, 2023
b0d50f7
prevents pandas.to_json's strange escaping of characters
johentsch Sep 21, 2023
2e9f53c
fix file formatting and keywords names
pythouille Sep 22, 2023
efc2af9
Merge branch 'dezrann_rebased' of https://github.com/johentsch/ms3 in…
pythouille Sep 22, 2023
9ddaf22
changes suffix .measuremap.json => .mm.json
johentsch Sep 25, 2023
dbec962
makes MeasureMaps fully verbose and corrects the order of fields
johentsch Sep 25, 2023
2 changes: 1 addition & 1 deletion codemeta.json
@@ -63,4 +63,4 @@
}
}
]
-}
+}
58 changes: 37 additions & 21 deletions src/ms3/annotations.py
@@ -75,14 +75,17 @@ def __init__(
cols : :obj:`dict`
If your columns don't have standard names, pass a {NAME -> ACTUAL_NAME} dictionary.
Required columns: label, mc, mc_onset, staff, voice
-Additional columns: harmony_layer, regex_match, absolute_root, rootCase, absolute_base, leftParen, rightParen, offset_x, offset_y, nashville, decoded, color_name,
+Additional columns: harmony_layer, regex_match, absolute_root, rootCase, absolute_base, leftParen,
+rightParen, offset_x, offset_y, nashville, decoded, color_name,
color_html, color_r, color_g, color_b, color_a, placement, minDistance, style, z
index_col
sep
mscx_obj
infer_types : :obj:`dict`, optional
-If you want to check all labels against one or several regular expressions, pass them as a {label_type -> regEx} dictionary.
-The column regex_match will display the label_type of the last matched regEx. If you pass None, the default behaviour
+If you want to check all labels against one or several regular expressions, pass them as a {label_type ->
+regEx} dictionary.
+The column regex_match will display the label_type of the last matched regEx. If you pass None,
+the default behaviour
is detecting labels of the DCML harmony annotation standard's current version.
read_only
logger_cfg : :obj:`dict`, optional
@@ -133,20 +136,23 @@ def __init__(
def add_initial_dots(self):
if self.read_only:
self.logger.warning(
-f"Cannot change labels attached to a score. Detach them first."
+"Cannot change labels attached to a score. Detach them first."
)
return
label_col = self.cols["label"]
notes = {"a", "b", "c", "d", "e", "f", "g", "h"}
-add_dots = lambda s: "." + s if s[0].lower() in notes else s
+
+def add_dots(s):
+return "." + s if s[0].lower() in notes else s
+
self.df[label_col] = self.df[label_col].map(add_dots)

def prepare_for_attaching(
self, staff=None, voice=None, harmony_layer=1, check_for_clashes=True
):
if self.mscx_obj is None:
self.logger.warning(
-f"Annotations object not aware to which MSCX object it is attached."
+"Annotations object not aware to which MSCX object it is attached."
)
return pd.DataFrame()
df = self.df.copy()
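Replacing the lambda in add_initial_dots() with a named function should leave the behaviour unchanged. A minimal sketch of the mapping, assuming non-empty label strings and using the built-in map() on a plain list where the method itself maps over the label column of self.df (the sample labels are made up for illustration):

```python
# Note names checked against the first character of each label (as in the diff).
NOTES = {"a", "b", "c", "d", "e", "f", "g", "h"}


def add_dots(s: str) -> str:
    """Prefix a dot to labels whose first character is a note name (a-h).

    Assumes ``s`` is a non-empty string; ``s[0]`` would fail otherwise.
    """
    return "." + s if s[0].lower() in NOTES else s


# Illustrative labels only, not taken from any corpus.
labels = ["V7", "bII6", "Ger", "I"]
dotted = list(map(add_dots, labels))
print(dotted)
```

Only labels starting with one of the eight letters receive the prefix; everything else passes through untouched.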
@@ -213,25 +219,25 @@ def prepare_for_attaching(
mn_col = self.cols["mn"] if "mn" in self.cols else "mn"
if mn_col not in cols:
self.logger.error(
-f"Annotations need to have at least one column named 'mn' or 'mc'."
+"Annotations need to have at least one column named 'mn' or 'mc'."
)
error = True
else:
inferred_positions = self.infer_mc_from_mn()
if inferred_positions.isna().any().any():
self.logger.error(
-f"Measure counts and corresponding mc_onsets could not be successfully inferred."
+"Measure counts and corresponding mc_onsets could not be successfully inferred."
)
error = True
else:
if "mn_onset" not in self.cols:
self.logger.info(
-f"Measure counts successfully inferred. Since there is no 'mn_onset' column, all "
-f"mc_onsets have been set to 0."
+"Measure counts successfully inferred. Since there is no 'mn_onset' column, all "
+"mc_onsets have been set to 0."
)
else:
self.logger.info(
-f"Measure counts and corresponding mc_onsets successfully inferred."
+"Measure counts and corresponding mc_onsets successfully inferred."
)
df.insert(df.columns.get_loc("mn"), "mc", inferred_positions["mc"])
df.loc[:, "mc_onset"] = inferred_positions["mc_onset"]
@@ -260,7 +266,7 @@ def prepare_for_attaching(
error = True
elif check_for_clashes:
self.logger.error(
-f"Check for clashes could not be performed because there are columns missing."
+"Check for clashes could not be performed because there are columns missing."
)

if error:
@@ -420,7 +426,7 @@ def tuple_or_na(row):
elif has_rgb:
res.color = rgb2format(res, color_format)
else:
-logger.warning(
+self.logger.warning(
f"Color format '{color_format}' could not be computed from columns {present_cols}."
)
res.drop(columns=present_cols, inplace=True)
@@ -441,7 +447,8 @@ def expand_dcml(
all_in_c=False,
**kwargs,
):
-"""Expands all labels where the regex_match has been inferred as 'dcml' and stores the DataFrame in self._expanded.
+"""Expands all labels where the regex_match has been inferred as 'dcml' and stores the DataFrame in
+self._expanded.

Parameters
----------
@@ -485,13 +492,17 @@ def expand_dcml(
df = self.get_labels(**kwargs)
select_dcml = (df.regex_match == "dcml").fillna(False)
if not select_dcml.any():
-self.logger.info(f"Score does not contain any DCML harmonic annotations.")
+self.logger.info("Score does not contain any DCML harmonic annotations.")
return
if not drop_others:
warn_about_others = False
if warn_about_others and (~select_dcml).any():
+show_labels = decode_harmonies(
+df[~select_dcml], keep_layer=True, logger=self.logger
+)[["mc", "mn", "label", "harmony_layer"]].to_string()
self.logger.warning(
-f"Score contains {(~select_dcml).sum()} labels that don't (and {select_dcml.sum()} that do) match the DCML standard:\n{decode_harmonies(df[~select_dcml], keep_layer=True, logger=self.logger)[['mc', 'mn', 'label', 'harmony_layer']].to_string()}",
+f"Score contains {(~select_dcml).sum()} labels that don't (and {select_dcml.sum()} that do) match the "
+f"DCML standard:\n{show_labels}",
extra={"message_id": (15,)},
)
df = df[select_dcml]
@@ -529,7 +540,10 @@ def expand_dcml(
"To retain the old behavior, use either.*"
),
)
-df.loc[select_dcml, exp.columns] = exp
+exp_shared_cols = exp.columns.isin(df.columns.values)
+df_shared_cols = df.columns.isin(exp.columns.values)
+df.loc[select_dcml, df_shared_cols] = exp.loc[:, exp_shared_cols]
+df = pd.concat([df, exp.loc[:, ~exp_shared_cols]], axis=1)
df.loc[:, key_cols] = df[key_cols].ffill()
self._expanded = df
drop_cols = [
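The new assignment in expand_dcml() splits the expanded columns into those that already exist in df (overwritten in place for the selected rows) and those that do not (appended via pd.concat). A small sketch of that pattern on toy data; the frames, the column names chord and numeral, and the mask are illustrative assumptions, not the real output of the expansion:

```python
import pandas as pd

# Toy stand-in for the labels table; "chord" already exists but is empty.
df = pd.DataFrame(
    {"label": ["I", "pedal", "V7"], "chord": [None, None, None]},
    index=[0, 1, 2],
)
# Rows that matched the standard (stand-in for select_dcml).
select = pd.Series([True, False, True], index=df.index)
# Toy stand-in for the expanded features of the selected rows.
exp = pd.DataFrame(
    {"chord": ["I", "V7"], "numeral": ["I", "V"]},
    index=[0, 2],
)

# Boolean masks marking the columns both frames have in common.
exp_shared_cols = exp.columns.isin(df.columns.values)
df_shared_cols = df.columns.isin(exp.columns.values)

# Overwrite shared columns for the selected rows only ...
df.loc[select, df_shared_cols] = exp.loc[:, exp_shared_cols]
# ... and append the columns that df did not have yet.
df = pd.concat([df, exp.loc[:, ~exp_shared_cols]], axis=1)
print(df)
```

Rows outside the selection keep their previous values in the shared columns and end up with missing values in the appended ones.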
@@ -549,7 +563,7 @@ def expand_dcml(
def infer_mc_from_mn(self, mscx_obj=None):
if mscx_obj is None and self.mscx_obj is None:
self.logger.error(
-f"Either pass an MSCX object or load this Annotations object to a score using load_annotations()."
+"Either pass an MSCX object or load this Annotations object to a score using load_annotations()."
)
return False

@@ -594,7 +608,8 @@ def infer_types(self, regex_dict=None):
column_position = self.df.columns.get_loc("harmony_layer") + 1
self.df.insert(column_position, "regex_match", regex_col)
for name, regex in regex_dict.items():
-# TODO: Check if in the loop, previously matched regex names are being overwritten by those matched after
+# TODO: Check if in the loop, previously matched regex names are being overwritten by those matched
+# after
try:
mtch = decoded[sel].str.match(regex)
except AttributeError:
@@ -607,7 +622,7 @@ def infer_types(self, regex_dict=None):
def remove_initial_dots(self):
if self.read_only:
self.logger.warning(
-f"Cannot change labels attached to a score. Detach them first."
+"Cannot change labels attached to a score. Detach them first."
)
return
label_col = self.cols["label"]
@@ -654,6 +669,7 @@ def _treat_harmony_layer_param(self, harmony_layer, warnings=True):
plural = len(not_found) > 1
plural_s = "s" if plural else ""
self.logger.warning(
-f"No labels found with {'these' if plural else 'this'} label{plural_s} harmony_layer{plural_s}: {', '.join(not_found)}"
+f"No labels found with {'these' if plural else 'this'} label{plural_s} harmony_layer{plural_s}: "
+f"{', '.join(not_found)}"
)
return [all_types[t] for t in lt if t in all_types]
12 changes: 12 additions & 0 deletions src/ms3/cli.py
@@ -43,6 +43,7 @@ def gather_extract_params(args) -> List[str]:
for name, arg in zip(
(
"measures",
+"measure_maps",
"notes",
"rests",
"labels",
@@ -53,6 +54,7 @@ def gather_extract_params(args) -> List[str]:
),
(
args.measures,
+args.measure_maps,
args.notes,
args.rests,
args.labels,
@@ -213,6 +215,7 @@ def extract_cmd(args, parse_obj: Optional[Parse] = None):
notes_folder=args.notes,
labels_folder=args.labels,
measures_folder=args.measures,
+measure_maps_folder=args.measure_maps,
rests_folder=args.rests,
events_folder=args.events,
chords_folder=args.chords,
@@ -685,6 +688,15 @@ def get_arg_parser():
const="../measures",
help="Folder where to store TSV files with measure information needed for tasks such as unfolding repetitions.",
)
+extract_args.add_argument(
+"-MM",
+"--measure_maps",
+metavar="folder",
+nargs="?",
+const="../measures",
+help="Folder where to store <name>.mm.json files. They are a variant of the 'normal' --measures with renamed "
+"columns, satisfying the MeasureMap specification.",
+)
extract_args.add_argument(
"-N",
"--notes",
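The new -MM/--measure_maps option combines nargs="?" with const, so the flag can be given with or without a folder. A standalone sketch of how argparse resolves the three cases; the parser below is a reduced stand-in that reuses only the option definition from the hunk above, and the folder name "maps" is made up:

```python
import argparse

# Reduced stand-in parser; the real one in ms3 defines many more options.
parser = argparse.ArgumentParser(prog="extract-sketch")
parser.add_argument(
    "-MM",
    "--measure_maps",
    metavar="folder",
    nargs="?",            # the folder argument is optional
    const="../measures",  # used when the flag is given without a folder
)

omitted = parser.parse_args([])                # flag absent -> default (None)
bare = parser.parse_args(["-MM"])              # flag only -> const
explicit = parser.parse_args(["-MM", "maps"])  # flag with folder -> that folder

print(omitted.measure_maps, bare.measure_maps, explicit.measure_maps)
```

Since no default is set, an absent flag yields None, which is what store_extracted_facets() checks via measure_maps_folder is not None.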
40 changes: 35 additions & 5 deletions src/ms3/corpus.py
@@ -19,7 +19,10 @@
import numpy as np
import pandas as pd
import pathos.multiprocessing as mp
-from ms3.utils.frictionless_helpers import store_dataframe_resource
+from ms3.utils.frictionless_helpers import (
+store_as_json_or_yaml,
+store_dataframe_resource,
+)
from ms3.utils.functions import compute_path_from_file

from ._typing import (
@@ -43,6 +46,7 @@
)
from .piece import Piece
from .score import Score, compare_two_score_objects
+from .transformations import measures2measure_map
from .utils import (
File,
ask_user_to_choose,
@@ -3046,6 +3050,7 @@ def store_extracted_facets(
view_name: Optional[str] = None,
root_dir: Optional[str] = None,
measures_folder: Optional[str] = None,
+measure_maps_folder: Optional[str] = None,
notes_folder: Optional[str] = None,
rests_folder: Optional[str] = None,
notes_and_rests_folder: Optional[str] = None,
@@ -3097,11 +3102,19 @@ def store_extracted_facets(
folder_params = {
t: lcls[p] for t, p in zip(df_types, folder_vars) if lcls[p] is not None
}
-output_metadata = metadata_suffix is not None
-if len(folder_params) == 0 and not output_metadata:
+do_store_metadata = metadata_suffix is not None
+do_store_measure_maps = measure_maps_folder is not None
+if (
+len(folder_params) == 0
+and not do_store_metadata
+and not do_store_measure_maps
+):
self.logger.warning("Pass at least one parameter to store files.")
return []
facets = list(folder_params.keys())
+do_store_measures_tsv = "measures" in facets
+if do_store_measure_maps and not do_store_measures_tsv:
+facets.append("measures")
df_params = {p: True for p in folder_params.keys()}
n_scores = len(self._get_parsed_score_files(view_name=view_name, flat=True))
paths = []
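The facet selection in this hunk can be read as: measure maps are derived from the measures table, so "measures" is extracted whenever measure maps are requested, while a separate flag records whether the measures TSV itself should be written. A sketch of that decision as a standalone function; plan_facets is a hypothetical helper introduced only for illustration and does not exist in ms3:

```python
from typing import Dict, List, Optional, Tuple


def plan_facets(
    folder_params: Dict[str, str],
    measure_maps_folder: Optional[str],
) -> Tuple[List[str], bool]:
    """Hypothetical helper mirroring the facet selection shown in the diff.

    Returns the facets to extract and whether measures are also to be
    stored as TSV.
    """
    facets = list(folder_params.keys())
    do_store_measure_maps = measure_maps_folder is not None
    do_store_measures_tsv = "measures" in facets
    if do_store_measure_maps and not do_store_measures_tsv:
        # The measures table is needed as input for the measure map,
        # even though no measures folder was requested.
        facets.append("measures")
    return facets, do_store_measures_tsv


only_maps = plan_facets({"notes": "../notes"}, "../measures")
both = plan_facets({"measures": "../measures"}, "../measures")
neither = plan_facets({}, None)
print(only_maps, both, neither)
```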
@@ -3124,11 +3137,28 @@ def store_extracted_facets(
for facet, df in facet2dataframe.items():
if df is None:
continue
+piece_name = file.piece
+if facet == "measures" and do_store_measure_maps:
+directory = compute_path_from_file(
+file, root_dir=root_dir, folder=measure_maps_folder
+)
+file_path = os.path.join(directory, f"{piece}.mm.json")
+if simulate:
+self.logger.info(
+f"Would have stored the MeasureMap from {file.rel_path} as {file_path}."
+)
+else:
+measure_map = measures2measure_map(df)
+measure_map_json = measure_map.to_dict(orient="records")
+store_as_json_or_yaml(
+measure_map_json, file_path, logger=self.logger
+)
+if not do_store_measures_tsv:
+continue
folder = folder_params[facet]
directory = compute_path_from_file(
file, root_dir=root_dir, folder=folder
)
-piece_name = file.piece
if unfold:
piece_name += "_unfolded"
facet_param = "harmonies" if facet == "expanded" else facet
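The measure-map branch converts the DataFrame to a list of records and hands it to store_as_json_or_yaml(). A rough sketch of that serialization step, assuming json.dump as a stand-in for the ms3 helper (whose implementation is not shown in this diff) and an invented two-row table in place of the output of measures2measure_map(); the column names and the file name piece.mm.json are illustrative, while indent=2 and the .mm.json suffix follow the commit messages above:

```python
import json
import os
import tempfile

import pandas as pd

# Invented stand-in for the DataFrame returned by measures2measure_map(df).
measure_map = pd.DataFrame(
    {
        "count": [1, 2],
        "time_signature": ["4/4", "4/4"],
        "start_repeat": [False, False],
    }
)

# One dictionary per measure, as in the diff.
records = measure_map.to_dict(orient="records")

# Write to a temporary folder so that the sketch leaves no files behind in the repo.
file_path = os.path.join(tempfile.mkdtemp(), "piece.mm.json")
with open(file_path, "w", encoding="utf-8") as f:
    # json.dump is an assumption standing in for store_as_json_or_yaml().
    json.dump(records, f, indent=2, ensure_ascii=False)

# Read the file back to check the round trip.
with open(file_path, encoding="utf-8") as f:
    reloaded = json.load(f)
print(reloaded)
```

Going through to_dict() and the json module, rather than DataFrame.to_json(), sidesteps the character escaping mentioned in commit b0d50f7.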
@@ -3153,7 +3183,7 @@ def store_extracted_facets(
logger=self.logger,
)
paths.append(descriptor_or_resource_path)
-if output_metadata:
+if do_store_metadata:
if not markdown:
metadata_paths = self.update_metadata_tsv_from_parsed_scores(
root_dir=root_dir, suffix=metadata_suffix, markdown_file=None