MARC Grep

ruschein edited this page Oct 5, 2015 · 19 revisions

Introduction

marc_grep is a command-line tool for extracting data from MARC-21 data sets. It can also be used as a filtering tool. A statically linked binary built for AMD64 Linux can be found here.

Usage and Syntax

Usage: marc_grep marc_filename query [output_label_format]

  Query syntax:
    query                                    = [ leader_condition ]
                                               simple_query
    leader_condition                         = "leader[" offset_range "]="
                                               string_constant
    offset_range                             = start_offset [ "-"
                                               end_offset ]
    start_offset                             = unsigned_integer
    end_offset                               = unsigned_integer
    unsigned_integer                         = digit { digit }
    digit                                    = "0" | "1" | "2" | "3" | "4"
                                               | "5" | "6" | "7" | "8"
                                               | "9"
    simple_query                             = simple_field_list
                                               | conditional_field_or_subfield_references
    simple_field_list                        = field_or_subfield_reference
                                               { ":" field_or_subfield_reference }
    field_or_subfield_reference              = field_reference | subfield_reference
    conditional_field_or_subfield_references = conditional_field_or_subfield_reference
                                               { "," conditional_field_or_subfield_reference }
    conditional_field_or_subfield_reference  = "if" condition "extract"
                                               (field_or_subfield_reference | "*")
    condition                                = field_or_subfield_reference comp_op reg_ex
                                               | field_or_subfield_reference "exists"
                                               | field_or_subfield_reference "is_missing"
    reg_ex                                   = string_constant
    comp_op                                  = "==" | "!=" | "===" | "!=="

  String constants start and end with double quotes. Backslashes and double quotes within them must be escaped
  with a backslash. The difference between the "==" and "!=" operators and the "===" and "!==" comparison
  operators is that the latter compare subfields within a given field while the former compare against any two
  matching fields or subfields.  This becomes relevant when there are multiple occurrences of a field in a
  record. "*" matches all fields.  Field and subfield references are strings and thus need to be quoted.

  Output label format:
    label_format = matched_field_or_subfield | control_number | control_number_and_matched_field_or_subfield
                   | no_label | marc_binary

  The default output label is the control number followed by a colon followed by matched field or subfield