MARC Grep
marc_grep is a command-line tool for extracting data from MARC-21 data sets. It can also act as a filtering tool. A statically linked binary built for Linux on AMD64 can be found here.
Usage: marc_grep marc_filename query [output_label_format]
Query syntax:
query = [ leader_condition ] simple_query
leader_condition = "leader[" offset_range "]=" string_constant
offset_range = start_offset [ "-" end_offset ]
start_offset = unsigned_integer
end_offset = unsigned_integer
unsigned_integer = digit { digit }
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
simple_query = simple_field_list | conditional_field_or_subfield_references
simple_field_list = field_or_subfield_reference { ":" field_or_subfield_reference }
field_or_subfield_reference = field_reference | subfield_reference
conditional_field_or_subfield_references = conditional_field_or_subfield_reference { "," conditional_field_or_subfield_reference }
conditional_field_or_subfield_reference = "if" condition "extract" (field_or_subfield_reference | "*")
condition = field_or_subfield_reference comp_op reg_ex
    | field_or_subfield_reference "exists"
    | field_or_subfield_reference "is_missing"
reg_ex = string_constant
comp_op = "==" | "!=" | "===" | "!=="
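The leader_condition rule can be sketched as a small Python recognizer, which may help when checking a query before handing it to marc_grep. This is not marc_grep's own parser: the function name is hypothetical, no whitespace is allowed between tokens, and treating a missing end_offset as equal to start_offset is an assumption the grammar does not state.

```python
import re

# Mirrors: leader_condition = "leader[" offset_range "]=" string_constant
# The string constant allows backslash-escaped characters between the quotes.
LEADER_CONDITION = re.compile(
    r'leader\[(?P<start>[0-9]+)(?:-(?P<end>[0-9]+))?\]='
    r'"(?P<value>(?:[^"\\]|\\.)*)"'
)


def parse_leader_condition(text):
    """Return (start_offset, end_offset, raw_constant), or None if text
    does not begin with a leader condition."""
    match = LEADER_CONDITION.match(text)
    if match is None:
        return None
    start = int(match.group("start"))
    # Assumption: a single offset denotes a one-position range.
    end = int(match.group("end")) if match.group("end") is not None else start
    return start, end, match.group("value")
```

The returned constant is the raw text between the quotes, with any backslash escapes left in place.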
String constants start and end with double quotes. Backslashes and double quotes within a string constant must be escaped with a backslash. The difference between the "==" and "!=" vs. the "===" and "!==" comparison operators is that the latter compare subfields within a given field, while the former compare against any two matching fields or subfields. This becomes relevant when a field occurs multiple times in a record. "*" matches all fields. Field and subfield references are strings and thus need to be quoted.
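The escaping rule for string constants can be illustrated with a short Python helper. The function name is hypothetical and the helper is only a sketch of the rule as described above, useful when a regular expression or reference is assembled programmatically.

```python
def quote_string_constant(text):
    """Wrap text in double quotes, escaping backslashes and double quotes
    with a backslash as the query syntax requires."""
    # Backslashes are escaped first so that the backslashes added for
    # double quotes are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

For instance, the five-character text `say "` becomes the string constant `"say \""`.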
Output label format:
label_format = matched_field_or_subfield | control_number | control_number_and_matched_field_or_subfield
    | no_label | marc_binary
The default output label is the control number, followed by a colon, followed by the matched field or subfield.
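Because queries contain double quotes, calling marc_grep from a script is easier with an argument list than with a shell command line. The following Python sketch builds such a list from the usage line above. The function name is hypothetical, and it assumes that the label format names are passed literally as the third argument.

```python
# Label format names taken from the label_format rule.
LABEL_FORMATS = {
    "matched_field_or_subfield",
    "control_number",
    "control_number_and_matched_field_or_subfield",
    "no_label",
    "marc_binary",
}


def build_command(marc_filename, query, label_format=None):
    """Build the argv list for: marc_grep marc_filename query [output_label_format]."""
    if label_format is not None and label_format not in LABEL_FORMATS:
        raise ValueError(f"unknown output label format: {label_format!r}")
    argv = ["marc_grep", marc_filename, query]
    # The label format is optional; omit it to get the default label.
    if label_format is not None:
        argv.append(label_format)
    return argv
```

The resulting list can be passed to Python's subprocess.run, provided marc_grep is on the PATH. A call such as build_command("records.mrc", 'if "001" exists extract "245"', "no_label") assumes that a field reference is written as a quoted three-character MARC tag, which the grammar above does not spell out.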