MARC Grep

ruschein edited this page Oct 5, 2015 · 19 revisions

Introduction

marc_grep is a command-line tool for extracting data from MARC-21 data sets. It can also be used as a filter. A statically linked binary built for Linux on AMD64 can be found here.

Usage and Syntax

Usage: marc_grep marc_filename query [output_label_format]

  Query syntax:
    query                    = [ leader_condition ] simple_query
    leader_condition         = "leader[" offset_range "]=" string_constant
    offset_range             = start_offset [ "-" end_offset ]
    start_offset             = unsigned_integer
    end_offset               = unsigned_integer
    unsigned_integer         = digit { digit }
    digit                    = "0" | "1" | "2" | "3" | "4" | "5" | "6"
                               | "7" | "8" | "9"
    simple_query             = simple_field_list
                               | conditional_field_or_subfield_references
    simple_field_list        = field_or_subfield_reference
                               { ":" field_or_subfield_reference }
    field_or_subfield_reference
                             = field_reference | subfield_reference
    conditional_field_or_subfield_references
                             = conditional_field_or_subfield_reference
                               { ","
                               conditional_field_or_subfield_reference }
    conditional_field_or_subfield_reference
                             = "if" condition "extract"
                                (field_or_subfield_reference | "*")
    condition                = field_or_subfield_reference comp_op reg_ex
                               | field_or_subfield_reference "exists"
                               | field_or_subfield_reference "is_missing"
    reg_ex                   = string_constant
    comp_op                  = "==" | "!=" | "===" | "!=="
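To make the leader_condition production concrete, the following Python sketch checks a string against it and pulls out the offsets. It is not part of marc_grep; the names LEADER_CONDITION and parse_leader_condition are made up for this illustration. It follows the grammar literally, so it allows no whitespace between tokens, and it treats a single offset as a one-character range. Both choices are assumptions, since the grammar does not say how the tool itself handles these cases.

```python
import re

# Mirrors: leader_condition = "leader[" offset_range "]=" string_constant
# The string-constant part accepts backslash escapes inside the quotes.
LEADER_CONDITION = re.compile(
    r'leader\[(?P<start>[0-9]+)(?:-(?P<end>[0-9]+))?\]='
    r'"(?P<value>(?:[^"\\]|\\.)*)"'
)


def parse_leader_condition(text):
    """Return (start_offset, end_offset, raw_value), or None if text does not match."""
    match = LEADER_CONDITION.fullmatch(text)
    if match is None:
        return None
    start = int(match.group("start"))
    # Assumption: a lone start_offset stands for a range of exactly one position.
    if match.group("end") is not None:
        end = int(match.group("end"))
    else:
        end = start
    # raw_value is returned as written, with any backslash escapes still in place.
    return start, end, match.group("value")
```

For example, parse_leader_condition('leader[6-7]="am"') yields the offsets 6 and 7 together with the string am, while a malformed range such as 'leader[6-]="a"' yields None.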

String constants start and end with double quotes. Backslashes and double quotes within a string constant must be escaped with a backslash. The "==" and "!=" comparison operators differ from "===" and "!==" in that the latter compare subfields within a given field, while the former compare against any two matching fields or subfields. This becomes relevant when a field occurs more than once in a record. "*" matches all fields. Field and subfield references are strings and thus need to be quoted.
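The quoting rules above are easy to get wrong when a query is assembled by a script, so here is a small Python sketch of them. The helpers string_constant and conditional_query are hypothetical and not part of marc_grep. The sketch assumes that tokens are separated by single spaces, that "*" is written unquoted, and that a subfield reference looks like a tag followed by a subfield code (for example 245a); none of these details are spelled out in the grammar.

```python
def string_constant(text):
    """Wrap text in double quotes, escaping backslashes and double quotes."""
    # Backslashes must be doubled first so that the escapes added for
    # double quotes are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def conditional_query(tested_ref, comp_op, pattern, extracted_ref):
    """Build: if <reference> <comp_op> <reg_ex> extract <reference or *>."""
    if comp_op not in ("==", "!=", "===", "!=="):
        raise ValueError("unknown comparison operator: " + comp_op)
    # Assumption: "*" is a bare token, while references are quoted strings.
    if extracted_ref == "*":
        target = "*"
    else:
        target = string_constant(extracted_ref)
    return "if {} {} {} extract {}".format(
        string_constant(tested_ref), comp_op, string_constant(pattern), target
    )
```

Under these assumptions, conditional_query("245a", "==", "^Bible", "*") produces the query text if "245a" == "^Bible" extract *.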

Output label format:

label_format = matched_field_or_subfield | control_number
               | control_number_and_matched_field_or_subfield | no_label | marc_binary

The default output label is the control number followed by a colon followed by the matched field or subfield.
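When marc_grep is called from a script, the three positional arguments from the usage line can be assembled as an argument list. The Python sketch below does only that and checks the optional label against the values listed above. The function name marc_grep_argv and the constant LABEL_FORMATS are made up for this illustration, and the sketch does not run the tool.

```python
# The label formats listed in the usage text.
LABEL_FORMATS = (
    "matched_field_or_subfield",
    "control_number",
    "control_number_and_matched_field_or_subfield",
    "no_label",
    "marc_binary",
)


def marc_grep_argv(marc_filename, query, label_format=None):
    """Return the argument list for: marc_grep marc_filename query [output_label_format]."""
    argv = ["marc_grep", marc_filename, query]
    if label_format is not None:
        if label_format not in LABEL_FORMATS:
            raise ValueError("unknown output label format: " + label_format)
        argv.append(label_format)
    return argv
```

Passing the resulting list to subprocess.run avoids shell quoting problems, because the double quotes inside the query then reach marc_grep unchanged.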