MARC Grep
ruschein edited this page Oct 5, 2015
·
19 revisions
marc_grep is a command-line tool for extracting data from MARC-21 data sets. It can also be used as a filter. A statically linked binary built for AMD64 Linux can be found here.
Usage: marc_grep marc_filename query [output_label_format]

Query syntax:
    query                                    = [ leader_condition ] simple_query
    leader_condition                         = "leader[" offset_range "]=" string_constant
    offset_range                             = start_offset [ "-" end_offset ]
    start_offset                             = unsigned_integer
    end_offset                               = unsigned_integer
    unsigned_integer                         = digit { digit }
    digit                                    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    simple_query                             = simple_field_list | conditional_field_or_subfield_references
    simple_field_list                        = field_or_subfield_reference { ":" field_or_subfield_reference }
    field_or_subfield_reference              = field_reference | subfield_reference
    conditional_field_or_subfield_references = conditional_field_or_subfield_reference { "," conditional_field_or_subfield_reference }
    conditional_field_or_subfield_reference  = "if" condition "extract" (field_or_subfield_reference | "*")
    condition                                = field_or_subfield_reference comp_op reg_ex
                                             | field_or_subfield_reference "exists"
                                             | field_or_subfield_reference "is_missing"
    reg_ex                                   = string_constant
    comp_op                                  = "==" | "!=" | "===" | "!=="

String constants start and end with double quotes. Backslashes and double quotes within them need to be escaped with a backslash. The "==" and "!=" comparison operators compare against any two matching fields or subfields, whereas "===" and "!==" compare subfields within a given field. This becomes relevant when there are multiple occurrences of a field in a record. "*" matches all fields. Field and subfield references are strings and thus need to be quoted.

Output label format:
    label_format = matched_field_or_subfield | control_number | control_number_and_matched_field_or_subfield | no_label | marc_binary

The default output label is the control number, followed by a colon, followed by the matched field or subfield.
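Because every string constant in a query must escape backslashes and double quotes, it can be convenient to build query strings programmatically instead of typing them by hand. The following is a minimal Python sketch of that idea, not part of marc_grep itself. All helper names are hypothetical. The spaces placed around keywords and operators, and the reference spellings `001`, `100a` and `245a` (a three-digit tag optionally followed by a subfield code), are assumptions; the grammar above specifies neither whitespace handling nor the exact form of a field or subfield reference.

```python
from typing import Iterable, Optional

# The four comparison operators listed in the grammar.
COMP_OPS = ("==", "!=", "===", "!==")


def string_constant(text: str) -> str:
    """Quote text as a string constant.

    Backslashes are escaped first so that the backslashes introduced
    for double quotes are not escaped a second time.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def leader_condition(start: int, value: str, end: Optional[int] = None) -> str:
    """Build: "leader[" offset_range "]=" string_constant."""
    offset_range = str(start) if end is None else f"{start}-{end}"
    return f"leader[{offset_range}]={string_constant(value)}"


def simple_field_list(references: Iterable[str]) -> str:
    """Build a colon-separated list of quoted field or subfield references."""
    return ":".join(string_constant(ref) for ref in references)


def conditional_reference(reference: str, comp_op: str, reg_ex: str, extract: str) -> str:
    """Build: "if" reference comp_op reg_ex "extract" reference.

    Assumption: tokens are separated by single spaces.
    """
    if comp_op not in COMP_OPS:
        raise ValueError(f"unknown comparison operator: {comp_op!r}")
    return (
        f"if {string_constant(reference)} {comp_op} {string_constant(reg_ex)} "
        f"extract {string_constant(extract)}"
    )


if __name__ == "__main__":
    print(string_constant('say "hi"'))        # "say \"hi\""
    print(leader_condition(6, "a"))           # leader[6]="a"
    print(simple_field_list(["001", "245a"])) # "001":"245a"
    print(conditional_reference("100a", "==", "^Smith", "245a"))
    # if "100a" == "^Smith" extract "245a"
```

The resulting string would be passed as the `query` argument described in the usage line. Since it contains double quotes, it typically needs to be wrapped in single quotes when typed into a POSIX shell.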