Fix lex prt line #8

Closed
wants to merge 200 commits into from

mingodad

No description provided.

thutt added 30 commits October 18, 2023 21:28
Details

  This update contains the full state of the lrstar software of an
  ongoing Linux port.

  The direction of this project is as follows

   1. Create a build process
   2. Get project to build
   3. Get project to build with no default-settings errors / warnings.
   4. Remove unused code.
   5. Make library of common code
   6. Increase strictness of compilation options
   7. Use getopt_long() for options.
   8. Create documentation

 As of this commit, the project is at step 4.

 To build the software, first the host environment must be set up by
 loading the 'setup' script and setting the build type and the
 build-output-directory (BOD).

  source ./scripts/setup --build-type debug --bod /tmp/lrstar

 The build type must be (debug | release).

 The BOD is the location where all build artifacts, including the
 resulting executables, will be placed.

 Once the environment has been configured, building can be done with:

   lrstar-build [-j <NUM CPUS>]

 To remove all the build artifacts (aka a 'clean'):

   lrstar-build [-j <NUM CPUS>] clean
Details

  Replace the non-standard MAX_PATH with PATH_MAX.  The motivation is
  to use as many standard C identifiers as possible, to increase
  portability.
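
  As a minimal sketch of the idea (not code from this change), a
  buffer sized with PATH_MAX from <climits> might be used as follows.
  The helper name join_path() is hypothetical.

```cpp
#include <cassert>
#include <climits>   // PATH_MAX on POSIX systems (via <limits.h>)
#include <cstddef>
#include <cstdio>

// Hypothetical helper: joins a directory and a file name into 'out',
// which holds PATH_MAX bytes.  Returns false if the result would not fit.
static bool join_path(char (&out)[PATH_MAX], const char *dir, const char *file)
{
   assert(dir != nullptr && file != nullptr);
   const int n = std::snprintf(out, sizeof(out), "%s/%s", dir, file);
   // snprintf returns the length it wanted to write, excluding the '\0'.
   return n >= 0 && static_cast<std::size_t>(n) < sizeof(out);
}
```

  Checking the snprintf() result keeps an over-long pathname from
  being silently truncated.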
Details

  As an initial cleanup pass, this change removes all symbols declared
  in CM_Global.h that are not needed to compile & link the dfa executable.

  Besides making the module itself smaller, a previous change has done
  the same for lrstar's version of the same file.  Now the two files
  can be compared to find out what symbols are unique and what are
  shared; from the shared symbols a library can be created so that
  only one copy of code must be maintained.
Details

  When printing warning messages, the changes that constified the code
  had to allocate a small buffer on the heap and copy text into it to
  avoid having to write into the input data buffer.  The calculations
  related to the end-of-string were incorrect.  This caused warning
  messages to be improperly truncated.

  This change addresses that error.
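
  The kind of copy described above can be sketched as follows,
  assuming the message text is delimited by a start pointer and a
  one-past-the-end pointer.  copy_span() is a hypothetical name, not a
  function from CM_Global.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copies the text in [start, end) out of a read-only
// input buffer so it can be printed without writing a '\0' into the input.
static std::unique_ptr<char[]> copy_span(const char *start, const char *end)
{
   assert(start != nullptr && end != nullptr && start <= end);
   // 'end' is one past the last character, so the length needs no +1 or -1.
   const std::size_t len = static_cast<std::size_t>(end - start);
   std::unique_ptr<char[]> tmp(new char[len + 1]);   // +1 for the terminator
   std::memcpy(tmp.get(), start, len);
   tmp[len] = '\0';                                  // terminate at len, not len - 1
   return tmp;
}
```

  An off-by-one in either the length or the terminator index is the
  sort of end-of-string mistake that truncates a message.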
Details

  This change reformats the code to convert all tabs to 3 spaces,
  updates the emacs file local variables to turn off use of tabs and
  puts all global scoped symbols at column 0 in the files.
Details

 In preparation for a precommit test harness, the directories present
 in this change have been renamed so the directory name matches the
 grammar and lexical grammar files.
Details

  Fix both shell functions so they return the current name of the
  directory in which the build has taken place.  This function is
  necessary because the build-output-directory (BOD) will change based
  on the build type (debug, release).
Details

  In preparation for a precommit harness, this change updates the
  stored grammars to include all files generated by lrstar and dfa.
  This will facilitate easy comparison of the output files when
  modifications are made to the software, to ensure that nothing has
  changed.
Details

  The header files generated by lrstar and dfa have been modified to
  use a 'typedef' instead of '#define' to create some types.  This is
  an acceptable change to the output files that is being updated here.
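
  One reason a 'typedef' is generally preferable to a '#define' for
  types is shown in the sketch below.  The type names are
  illustrative only; the generated headers define different ones.

```cpp
#include <type_traits>

// A macro is textual substitution, so only the first declarator below
// becomes a pointer.
#define NAME_PTR_MACRO char *
NAME_PTR_MACRO macro_first, macro_second;   // char *macro_first; char macro_second;

// A typedef names a real type, so every declarator gets the pointer type.
typedef char *name_ptr_t;
name_ptr_t typedef_first, typedef_second;   // both are char *

static_assert(std::is_same<decltype(macro_second), char>::value,
              "the macro form declares a plain char here");
static_assert(std::is_same<decltype(typedef_second), char *>::value,
              "the typedef form declares a pointer");
```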
Details

  The new 'precommit-test' will run lrstar on every sample grammar,
  and run dfa on every sample lexical grammar.

  The source files that are generated by the tool should, in general,
  not change.  A change signals the need to look closer at the changes
  that are to be submitted to ascertain if they are correct with
  respect to the parser generator or not.

  The text files that are generated contain execution time and memory
  usage.  If neither of these is excessively different, the result is
  acceptable.  If any other data in the log files change, more
  scrutiny should be done on the change to find out why there is a
  difference.
Details

  As prt_message() is not used outside the CM_Global module, it is now
  declared static to this file, and its prototype is removed from the
  header file.
Details

  The change that reformatted all the source files to be tab-free
  missed this file.  The oversight is now corrected.

  Diffs from git will now look better because tabs will not need to be expanded.
Details

  The 'spaces' variable is declared, and conditionally defined based on
  the value of 'MAIN', in CM_Global.h.  This is an unnecessary
  complication in the source code.

  Now, the definition of the variable is in CM_Global.cpp, and the
  declaration is in CM_Global.h.  No special-case conditional
  compilation is necessary.
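
  The declaration / definition split can be sketched in a single file
  as shown below.  The type and contents of 'spaces', and the indent()
  helper, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// --- header part (sketch of CM_Global.h): declaration only ---
extern const char spaces[];

// --- source part (sketch of CM_Global.cpp): the one definition ---
const char spaces[] = "                ";   // 16 blanks; real contents are an assumption

// Hypothetical user: returns a string of 'n' blanks for indentation.
static const char *indent(std::size_t n)
{
   const std::size_t available = std::strlen(spaces);
   assert(n <= available);
   return spaces + (available - n);
}
```

  Every translation unit sees the same 'extern' declaration, and
  exactly one of them provides the storage.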
Details

 This change untabifies all the files in dfa.  This ensures
 consistency across all source files, regardless of indentation.  This
 most often is an issue when the desired indentation is not a multiple
 of the tab stop, thus requiring spaces to make vertical alignment.

 Additionally, a tabstop of 3 is unusual, so looking at the files with
 any viewer that does not honor the file's tab settings results in a
 very poor viewing experience; this is not a problem when spaces are
 used.
Details

  This change provides a few miscellaneous updates to simplify the
  overall code.

   1. Change 'charcode', 'lower', 'numeric' and 'alpha'  to be
      constant data.

   2. Replace {MAX_DIR, MAX_FILENAME, MAX_FILETYPE} with PATH_MAX

      Unlike Windows, POSIX systems do not distinguish limits for any
      individual part of the pathname.  To simplify, be consistent and
      prevent memory overruns, these identifiers are replaced with the
      single constant describing limits on pathnames: PATH_MAX

Testing

  lrstar-size --lrstar-build-path $(lrstar-build-path) --compare /tmp/lrstar.unmodified/release

                                      .text         .rodata           .data            .bss
  release/src/dfa/dfa           72K     (6)     11K   (544)      3K  (-512)     67K   (11K)
  release/src/lrstar/lrstar     86K   (104)     25K   (865)      6K (-1024)    105K   (11K)

  ./precommit-test
  (only differences were log files; those are acceptable differences)
Details

  Putting functions into a more global scope than needed deprives the
  compiler of optimization opportunities born of inlining, and
  discourages refactoring.  Accordingly, this change moves functions
  that are only used in one translation unit from CM_Global to that
  translation unit and makes them static.  This reduces the number of
  global functions and makes it a little bit easier to reason about
  dfa.

Testing

  lrstar-size --lrstar-build-path $(lrstar-build-path) --compare /tmp/lrstar.unmodified/release

                                    .text         .rodata           .data            .bss
   release/src/dfa/dfa           72K  (-312)     11K   (544)      3K  (-512)     67K   (11K)
   release/src/lrstar/lrstar     86K    (38)     25K    (1K)      6K (-1024)    105K   (11K)

   (The change to lrstar size is due to other unpushed changes.)

  ./precommit-test
  (only differences were log files; those are acceptable differences)
Details

  Putting functions into a more global scope than needed deprives the
  compiler of optimization opportunities born of inlining, and
  discourages refactoring.  Accordingly, this change moves functions
  that are only used in one translation unit from CM_Global to that
  translation unit and makes them static.  This reduces the number of
  global functions and makes it a little bit easier to reason about
  lrstar.

Testing

                                      .text         .rodata           .data            .bss
  release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
  release/src/lrstar/lrstar     85K  (-772)     25K   (-33)      6K     (0)    105K     (0)

  ./precommit-test
  (only differences were log files; those are acceptable differences)
Details

 This change performs the following:

 o Updates the build process to remove -DLINUX from the compilation
    command lines.

 o Updates the build process to set LRSTAR_$(HOSTOS) on the
   compilation command lines.

  The HOSTOS value must be either 'LINUX' or 'WINDOWS'.

 o Turns all HOSTOS-specific compilation to use the now-standard
   LRSTAR_LINUX or LRSTAR_WINDOWS.
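
  Host-specific code selected by these macros might look like the
  sketch below.  It assumes the build passes the macro as a -D option;
  the fallback definition and the path_separator() helper exist only
  to keep the sketch standalone.

```cpp
// For this standalone sketch only: default to LRSTAR_LINUX when the
// build has not defined either macro.
#if !defined(LRSTAR_LINUX) && !defined(LRSTAR_WINDOWS)
#define LRSTAR_LINUX 1
#endif

#if defined(LRSTAR_LINUX) && defined(LRSTAR_WINDOWS)
#error "Exactly one of LRSTAR_LINUX or LRSTAR_WINDOWS must be defined"
#endif

// Hypothetical helper whose body depends on the host operating system.
static constexpr const char *path_separator()
{
#if defined(LRSTAR_LINUX)
   return "/";
#else
   return "\\";
#endif
}
```

  A project-prefixed macro avoids clashing with a 'LINUX' symbol that
  other software might define.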

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                    .text         .rodata           .data            .bss
   release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
   release/src/lrstar/lrstar     85K     (0)     25K     (0)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  (only differences were log files; those are acceptable differences)

  It has now been noted that the output of the precommit-test script
  should have updated all the generated C files, and they should all
  now reference the 'lrstar_'-prefixed source files.  Oddly, they do
  not.  This will be addressed in a future commit.
Details

 This change performs the following:

 o Changes the output file generation so that all files are
   unconditionally created.  The previous version of the code would
   only write the non-table files if they didn't already exist on
   disk.  While this undoubtedly saves a very minor amount of time,
   it's objectionable because changes to the LRSTAR system will not be
   reflected in source code without manually removing the files.

 o While investigating the file writing, it was discovered to have a
   lot of duplicated code.  This was refactored to use a callback
   function; the main file management is handled by a single function
   that invokes the callback to write the data out to the file.

   This change saves over a Kb of code and R/O data combined.

 o As a byproduct, all the generated files for the grammars have
   changed.  The log files now show faster speeds, and the generated
   code will be tab-free and use the LINUX-version of the #include
   sections.

   Compilation of these sample files will come in a future change.
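
   The callback arrangement described above can be sketched as
   follows.  The names write_file() and write_header() are
   hypothetical; the real code in PG_Main differs.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical callback type: writes one file's contents to an open stream.
typedef void (*write_fn)(std::FILE *fp);

// The single place that opens, checks and closes a generated file.
// The file is always (re)created, whether or not it already exists.
static bool write_file(const char *pathname, write_fn writer)
{
   assert(pathname != nullptr && writer != nullptr);
   std::FILE *fp = std::fopen(pathname, "w");   // "w" truncates an existing file
   if (fp == nullptr) {
      return false;
   }
   writer(fp);
   const bool ok = (std::ferror(fp) == 0);
   return (std::fclose(fp) == 0) && ok;
}

// Example callback; the real generators emit parser source code instead.
static void write_header(std::FILE *fp)
{
   std::fputs("// generated header\n", fp);
}
```

   Each kind of output file then needs only its own small callback,
   which is where the duplicated open / close code goes away.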

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                        .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K  (-866)     24K  (-320)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  (only differences were log files; those are acceptable differences)

  It has now been noted that the output of the precommit-test script
  should have updated all the generated C files, and they should all
  now reference the 'lrstar_'-prefixed source files.  Oddly, they do
  not.  This will be addressed in a future commit.
Details

 A mistake was made committing and pushing the previous change because
 the final changes to PG_Main::GenerateOtherFiles() were not actually
 finished.  This change finishes it up by removing dead code and
 no-longer-necessary variables.  This brings the total size reduction
 for cleaning up the file writing to over 1Kb of code.

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                        .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K  (-866)     24K  (-320)      6K     (0)    105K     (0)

                                        .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K  (-196)     24K     (0)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  (only differences were log files; those are acceptable differences)
Details

  To facilitate creating a Makefile for the generated C++ parser files
  without requiring the generated code to reference library source
  files relative to the source directory, the build process has been
  updated to have a top-level 'distribution' target.  The target is
  now the default.

  The 'distribution' target uses the GNU 'install' utility to copy,
  and set permissions on, the generated files into a directory tree in
  the build process.  The destination directory tree mirrors the
  structure of '/usr/local' to facilitate copying the files to a known
  location in a file system, if so desired.

  To leverage the new 'distribution' location, two new exported shell
  functions are now available:

   o lrstar : Runs lrstar
   o dfa    : Runs dfa

  These functions are made available after configuring the
  environment by sourcing 'setup' from the scripts directory.

  Additionally, the following changes have been made:

  o make/config.mk:

    All the variables are now exported so they are available to
    recursive invocations of Make, without re-including the file.

  o src/{dfa,lrstar}/Makefile

    Unnecessary include of 'config.mk' has been removed.

    The main target in these Makefiles is now an actual image, whereas
    previously it was a .PHONY target.  A real target is used now
    because the full pathname of each executable must be known
    globally, and this mechanism ensures that the same value is used
    everywhere.

  o make/distribution.mk

    This file executes GNU 'install' on each file that is to be
    installed into the 'usr/local' directory tree.

  o make/toplevel.mk

    This defines a series of variables for top level build targets.
    Most importantly, the variables specify where the build process
    produces the binary artifact.  The 'distribution.mk' file uses
    this to copy them to the final destination.

  o scripts/functions

    Various updates to accommodate the new 'usr/local' location of the
    delivered build artifacts.

  o scripts/precommit-test

    Updated to use the new final artifact location, and to enable
    'dfa' execution.  This produces a lot of new files that are not
    present in Paul's original Zip file.  They will be committed as a
    separate change.

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                        .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K     (0)     24K     (0)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  (New files produced, and many negligible differences in log files.
  Will be committed as a separate change.)
Details

  This change simply adds some more information to the README.md file
  to help interested people configure a development environment and
  build the tools.

  The tools are not yet easily usable on Linux, but they do build.
Details

  This change checkpoints all the files produced by the
  'precommit-test' that are either new or changed.  This will be a new
  baseline that will be used to measure new commits.
Details

  Updates the entire set of software to make the example grammars'
  generated files trivially compilable on Linux.

 o lrstar_basic_defs.h

   This file is renamed from 'basic_defs.h' to ensure that there are
   no name conflicts with other files from other pieces of software
   that might be used.  It also makes things more uniform to ensure
   that all exported source files have an 'lrstar_' prefix.

 o distribution.mk

   Deliver 'lrstar_basic_defs.h'.  This file is necessary to deliver
   so that the type & macro definitions present can be used by the
   lrstar library code.

 o precommit-test

   The script is updated to change to the root of the project source
   directory before running the tools on all the grammars.  This makes
   it possible to run the script from anywhere.

 o lrstar_main.cpp

   Changed 'basic_defs.h' to 'lrstar_basic_defs.h'.
   Fixed a spelling error in a message.

 o lrstar_parser.{cpp,h}

   Fixed 'const' errors with string literal assignments.

   Also fixed a '>' comparison of a pointer; a comparison 'p > 0' is
   not logically meaningful.  The fix is to simply compare it with 0
   (or NULL).

 o dfa/CM_Global.h

   Changed 'basic_defs.h' to 'lrstar_basic_defs.h'.

 o PG_Generate.cpp

   Addressed known 'const' issues with assignment of string literals.

   Added the generation of a sufficient, but rudimentary, Makefile for
   generated grammars.

   The generated files for the example grammars can now be compiled &
   linked into a same-named executable in the same directory with an
   invocation like:

     make LRSTAR_INSTALL_ROOT=/tmp/lrstar/release/usr/local

   The LRSTAR_INSTALL_ROOT is the root directory where the LRSTAR
   software is installed.  At present, this will generally be in the
   build directory for the current build environment.

   The Makefiles and changes to the generated files will be submitted
   in a different commit.

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                        .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K     (6)     25K   (608)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  (New files produced, and many negligible differences in log files.
  Will be committed as a separate change.)

combine
Details

  This commit contains all the sample grammars' updated source files.
  These new source files include the necessary 'lrstar_basic_defs.h'
  file, and address 'const' issues with assignment of string literals.

  Additionally, a rudimentary Makefile has been added for each sample
  grammar.

  The Makefiles can be used to build the sample grammar into an
  executable with a command like the following:

    make LRSTAR_INSTALL_ROOT=/tmp/lrstar/release/usr/local
Details

  Updating lrstar / dfa to generate a Makefile caused new log files to
  be generated.  This change checkpoints the log files for comparison
  against future changes.
Details

  The file guard for this file was not changed when the file was
  renamed.  This change corrects that oversight.
Details

  This trivial change adds emacs formatting comments to these files,
  converts them from MS-DOS to Unix format, and converts tabs to 3
  spaces.

Testing

  Visual inspection.
Details

  To move forwards, reformat code to use spaces instead of tabs, and
  start the lexical scoping at column 0.

  No semantic changes, only formatting.
Details

  As a testament to the dangers of conditional compilation, a syntax
  error has been found in lrstar_parser.cpp under the CPP symbol
  INSENSITIVE.  These types of errors can periodically creep into C
  source that uses conditional compilation.  Fortunately the fix for
  this is easy, since it's simply a missing ']'.  In many other cases
  (in other software), data structures may have changed and
  determining a proper fix may require a lot of source code
  archeology.

  A future series of changes will convert all these pieces of
  conditional compilation into constant compile-time expressions.  The
  build tools will eliminate the unused code, and any defined, but
  unused data.
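
  The planned direction can be sketched as below.  This is an
  illustration under assumptions, not the future change itself: the
  'insensitive' constant and both helper functions are hypothetical.

```cpp
// Hypothetical compile-time switch standing in for '#ifdef INSENSITIVE'.
constexpr bool insensitive = true;

// Maps 'A'..'Z' to 'a'..'z' and leaves every other character alone.
constexpr char to_lower_ascii(char c)
{
   return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Both arms are always parsed and type-checked, so a syntax error cannot
// hide in the arm that is not selected.  Because 'insensitive' is a
// constant, the compiler is free to discard the unused arm.
constexpr bool same_char(char a, char b)
{
   return insensitive ? (to_lower_ascii(a) == to_lower_ascii(b)) : (a == b);
}
```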

Testing

   lrstar-size \
     --lrstar-build-path $(lrstar-build-path) \
     --compare /tmp/lrstar.unmodified/release

                                         .text         .rodata           .data            .bss
    release/src/dfa/dfa           72K     (0)     11K     (0)      3K     (0)     67K     (0)
    release/src/lrstar/lrstar     84K     (0)     25K     (0)      6K     (0)    105K     (0)

  ./scripts/precommit-test
  No changes to important files.

[lrstar_parser.cpp] Correct relational pointer comparison with 0

Details

  Performing a '> 0' comparison with a pointer is nonsensical.
  This changes the '>' to '!='.
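
  A small sketch of the corrected form of the test follows.  The
  contains() helper is hypothetical and is not taken from
  lrstar_parser.cpp.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports whether 'text' contains the character 'ch'.
static bool contains(const char *text, char ch)
{
   assert(text != nullptr);
   const char *p = std::strchr(text, ch);
   // The question being asked is "is p null?", which is an equality
   // test.  An ordered comparison such as 'p > 0' does not express that.
   return p != nullptr;
}
```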
thutt and others added 29 commits November 30, 2023 21:27
Details

 While this example is not fully complete, it is believed that the
 grammar is now correct enough to read json files and create an AST.
 The ultimate goal for the example is to read the full JSON file into
 an internal representation to show how lrstar can be used to quickly
 make parsers.

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

 The Node allocation system allocates nodes in blocks of 'max_nodes'
 (100,000 by default).  When all allocated nodes are used, another
 block is allocated.

 While this is simplistic, and a little bit faster than allocating
 individual nodes at a time, it adds complexity to the code overall
 and only works if all AST nodes are exactly the same.  It's also
 wasteful of memory when fewer than 1,000,000 nodes are used.  The
 Node structure has three (3) four-byte integer fields, and five (5)
 eight-byte pointers.  That is 52 bytes for each Node, for a total of
 about 49 MB per 1,000,000 Nodes.

 A more flexible system would allow the creation of nodes based on the
 AST node annotation in the grammar.  This will allow the user to
 create AST nodes that have additional fields that may be necessary
 when processing the AST.

 Accordingly, this change removes the preallocation in favor of
 allocation-when-needed.  A future change will update the system to
 use user-defined Node types.
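
 The size arithmetic above can be checked with a small sketch.  The
 field names are placeholders, not the real Node members.  Summing the
 fields gives 52 bytes, but where pointers are 8-byte aligned the
 compiler pads the structure, so sizeof(Node) would be 56.  The text
 mentions both 100,000 and 1,000,000 nodes, so the count is left as a
 parameter here.

```cpp
#include <cstddef>

// Hypothetical layout matching the description: three 4-byte integers and
// five pointers.
struct Node {
   int   a, b, c;
   Node *p0, *p1, *p2, *p3, *p4;
};

// Bytes reserved by one preallocated block of 'count' nodes.
constexpr std::size_t block_bytes(std::size_t count)
{
   return count * sizeof(Node);
}

// The structure can never be smaller than the sum of its fields.
static_assert(sizeof(Node) >= 3 * sizeof(int) + 5 * sizeof(Node *),
              "padding can only add to the field total");
```

 At 52 bytes per node, 1,000,000 nodes come to about 49.6 MiB and
 100,000 nodes to about 5 MiB.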

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

  Although the comment in the original Windows source code for the
  lowercase variable states:

     // lowercase[x] is x.

  This is not true, and a more <careful> inspection of the actual
  array contents will show that the uppercase ASCII letters are mapped
  to their lowercase equivalent.

  This change reinstates the lowercase variable and makes it
  accessible through an object file supplied with the distribution of
  the software.

  Makefiles are updated to link with the new object file to reinstate
  true case insensitivity.
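
  A table with the behavior described above could be built as in the
  sketch below.  The names lowercase_table, lowercase_sketch and
  equal_ignoring_case() are hypothetical; the delivered 'lowercase'
  array may be defined differently.

```cpp
#include <cassert>

// Sketch of a 256-entry table in which uppercase ASCII letters map to their
// lowercase equivalents and every other value maps to itself.
struct lowercase_table {
   unsigned char map[256];
   lowercase_table()
   {
      for (int i = 0; i < 256; ++i) {
         map[i] = static_cast<unsigned char>(i);
      }
      for (int c = 'A'; c <= 'Z'; ++c) {
         map[c] = static_cast<unsigned char>(c - 'A' + 'a');
      }
   }
};

static const lowercase_table lowercase_sketch;

// Compares two strings, treating 'A'..'Z' and 'a'..'z' as equal.
static bool equal_ignoring_case(const char *a, const char *b)
{
   assert(a != nullptr && b != nullptr);
   while (*a != '\0' && *b != '\0') {
      const unsigned char la = lowercase_sketch.map[static_cast<unsigned char>(*a)];
      const unsigned char lb = lowercase_sketch.map[static_cast<unsigned char>(*b)];
      if (la != lb) {
         return false;
      }
      ++a;
      ++b;
   }
   return *a == *b;   // equal only if both strings ended together
}
```

  If the table really were the identity mapping, as the old comment
  claimed, such a comparison would be case sensitive.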

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

  An incorrect reading of the code and use of these two functions led
  to the conclusion that they could return a 'bool' rather than an
  'int'.  This broke ND parsing, and the 'LRK' sample no longer
  functioned.

  The LRK sample would fail with a syntax error, whereas now it
  successfully parses the file.

   /tmp/lrstar/release/examples/LRK/LRK test.input.txt
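
  Why narrowing an 'int' return to 'bool' can break callers is
  sketched below.  The two functions are hypothetical stand-ins; the
  commit does not name the functions involved or say what their
  integer results mean.

```cpp
// Hypothetical stand-in: returns the number of the action to take, 0 for none.
constexpr int select_action_int(int lookahead)
{
   return lookahead % 4;
}

// The same logic with a 'bool' return type: any nonzero result becomes 1.
constexpr bool select_action_bool(int lookahead)
{
   return (lookahead % 4) != 0;
}

static_assert(select_action_int(3) == 3,
              "the int version keeps the value");
static_assert(static_cast<int>(select_action_bool(3)) == 1,
              "the bool version collapses it to 1");
```

  A caller that uses the result as anything more than a yes / no
  answer gets different behavior from the two versions.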

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  Before this change, the following command, run from the LRK
  directory, after running precommit-test, would fail with a syntax
  error.  It now works correctly.

   /tmp/lrstar/release/examples/LRK/LRK test.input.txt
Details

  This change is the result of manually examining every 'make.bat'
  file and transferring their lrstar & dfa options to precommit-test.

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change improves the delivery of the 'lowercase' array by
  putting it into a static library that is linked with each generated
  parser.

  The work to create the static library can be leveraged in the future
  if other functions or data need to be delivered, but are not part of
  the templates used by the system.

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

  Overloaded functions make reading, reasoning about, and finding
  functions in the source tree needlessly difficult.  This change
  alters the names of the overloaded print_ast() and traverse() to
  make the code more maintainable.

Testing

  source ~/lrstar/lrstar/scripts/setup \
    --build-type debug \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test

  source ~/lrstar/lrstar/scripts/setup \
    --build-type release \
    --bod /tmp/lrstar &&  \
  cd ${LRSTAR_DIR}

  lrstar-build -j 20 clean
  lrstar-build -j 20
  ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change rearranges and standardizes the formatting of
  lrstar_main.cpp.  The rearrangement allows the removal of superfluous
  forward declarations, and the reformatting ensures that the code has
  a consistent, standard format.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change updates the 'path' field from 256 characters to
  PATH_MAX.  256 is not enough to store a full Linux pathname (as
  defined by the C library).

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

 One key to adoption is to use standard system methods for doing
 mundane tasks.  Rolling a custom argument parser is nice, but using a
 system that is familiar to Linux programmers is better.

 Although the number of arguments the sample grammars currently take
 is zero, laying a good framework encourages experimentation and
 change.

 Besides updating lrstar_main.cpp, this change also adds the argument
 processor to the library so that it can be more easily reused by
 others making parsers.
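
 A minimal sketch of argument processing with getopt_long() follows.
 The 'options' structure and the --verbose / --output options are
 hypothetical, since the sample grammars are described as taking no
 arguments; the library's actual argument processor may look quite
 different.

```cpp
#include <cassert>
#include <getopt.h>

// Hypothetical option values collected from the command line.
struct options {
   bool        verbose;
   const char *output;
};

// Parses 'argv' and fills in 'opts'.  Returns false on an unknown option
// or a missing option argument.
static bool parse_options(int argc, char *argv[], options &opts)
{
   static const struct option long_options[] = {
      { "verbose", no_argument,       nullptr, 'v' },
      { "output",  required_argument, nullptr, 'o' },
      { nullptr,   0,                 nullptr, 0   }
   };
   assert(argv != nullptr);
   opts.verbose = false;
   opts.output  = nullptr;
   int c;
   while ((c = getopt_long(argc, argv, "vo:", long_options, nullptr)) != -1) {
      switch (c) {
      case 'v': opts.verbose = true;   break;
      case 'o': opts.output  = optarg; break;
      default:  return false;          // getopt_long has already printed a message
      }
   }
   return true;   // argv[optind] onward holds the non-option arguments
}
```

 After a successful call, the remaining command-line arguments (for
 example, input files) start at argv[optind].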

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change reduces the overall size of the lrstar_parser structure
  by moving three array variables that are used for printing the AST
  to function-local because they are only used by one function.

  The fourth variable of this family is used by two functions.  To
  remove this (practically) unneeded field as well will require
  changing how the AST is written; that will come as a future change.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  The lrstar_parser constructor was formatted differently than all
  other functions.  This change provides more consistency.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  Adding a destructor for lrstar_parser is another step towards being
  able to instantiate multiple parsers.  Multiple parsers could be
  used to process multiple files provided on the command line, provide
  a service where multiple threads process individual files, or even
  support multiple languages.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  To further progress towards making it possible to easily process
  multiple files, and use multiple parsers in the same program, this
  change turns the generated 'generated_parser' variable into a
  pointer.  It is automatically allocated at startup, and deleted at
  the end of the run.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  A bit of housekeeping here to move header files from the 'code'
  directory to the 'include' directory.  This structure is more
  analogous to the distribution directory structure.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  The lrstar_main.h file is no longer needed to build the project or
  any of the sample grammars.  Because the file is now dead, and less
  code is easier to maintain, the file is removed.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  The lrstar_main.cpp implementation of main() should be considered
  a sample of how to use the lrstar system.  In that light, work is
  underway to have lrstar generate a main in <grammar>_Main.cpp that
  will invoke the parsing code in lrstar_main.cpp.  This will leave
  the parser project with a 'main' file that can be edited to perform
  desired work.  This is a significant difference from the
  lrstar-generated code as it stands today because there is a shared
  file that is used for main() by each grammar; editing that file will
  affect any other generated grammar.

  The new approach will be more flexible for the user.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  This trivial change removes #include lines that are no longer necessary.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  Update the text output when '--help' is specified to include the
  meaning of '--output'.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  Being able to effectively use a parser-generator is dependent on how
  easy the generated source is to use.  This major change updates the
  system to make it significantly easier to use by doing the following:

   o Update lrstar to generate a 'user main' file.

     This new file will contain a simple use of the generated parser,
     and is to be used as the starting point to making whatever tool
     is desired.  As generated, the file will parse a single file
     supplied on the command line, and then exit.

     Because lrstar will not overwrite this file if it exists, it can
     be modified to suit the developer's needs without fear that the
     work can be lost.

   o Delete lrstar_main.cpp

     Because a whole file is generated, there is no need to keep
     around a C++ file that used to be included by the generated
     parser sources.

   o Delete lrstar_cmdline.{cpp,h}

     The generated 'user main' file also has a skeletal getopt()-based
     command line processor.  Because of this, the library version is
     no longer needed.

   o Each of the sample grammars now comes with the generated 'user
     main' file, and an updated Makefile.
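
  As an illustration of the skeletal command line processor described
  above, the following is a minimal sketch and not the code lrstar
  generates.  It assumes long options are handled with getopt_long();
  the Options structure and parse_command_line() are hypothetical
  names.  Only '--help' and '--output' are taken from the commit
  messages.

```cpp
#include <cassert>
#include <getopt.h>
#include <string>

// Hypothetical option set; only '--help' and '--output' are named in
// the surrounding commit messages.
struct Options {
    bool        help  = false;  // '--help' was given
    std::string output;         // argument of '--output', empty if absent
    std::string input;          // first non-option argument (file to parse)
    bool        valid = true;   // false on unknown option or missing input
};

// Hypothetical helper: scans argv and returns the collected options.
Options parse_command_line(int argc, char *argv[])
{
    assert(argc >= 1 && argv != nullptr);

    static const struct option long_options[] = {
        { "help",   no_argument,       nullptr, 'h' },
        { "output", required_argument, nullptr, 'o' },
        { nullptr,  0,                 nullptr, 0   }
    };

    Options result;
    int     opt;

    optind = 1;   // Begin scanning at argv[1].
    while ((opt = getopt_long(argc, argv, "ho:", long_options, nullptr)) != -1) {
        switch (opt) {
        case 'h': result.help   = true;   break;
        case 'o': result.output = optarg; break;
        default:  result.valid  = false;  break;  // unknown option or missing argument
        }
    }

    if (optind < argc) {
        result.input = argv[optind];   // the single file to parse
    } else if (!result.help) {
        result.valid = false;          // nothing to parse
    }
    return result;
}
```

  A generated main() could call parse_command_line(argc, argv), print
  usage text when 'valid' is false or 'help' is true, and otherwise
  hand 'input' to the parser.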

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  Now that the main parser is dynamically allocated, it makes sense to
  allow an '--iterations' option to the basic command line arguments
  to test for memory leaks.
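
  The idea behind '--iterations' can be sketched as below.  This is
  only an illustration under stated assumptions: DemoParser is a
  hypothetical stand-in for the generated parser class, and the
  live-instance counter is added here purely to make a leak observable.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated parser; it only counts how
// many instances are alive.
class DemoParser {
public:
    DemoParser()  { ++live_; }
    ~DemoParser() { --live_; }

    // Pretend to parse; succeeds for any non-null path.
    bool parse(const char *path) { return path != nullptr; }

    static long live() { return live_; }

private:
    static long live_;
};

long DemoParser::live_ = 0;

// Allocates, uses and deletes a parser 'iterations' times, and returns
// the number of successful parses.
long run_iterations(long iterations, const char *path)
{
    long successes = 0;

    for (long i = 0; i < iterations; ++i) {
        DemoParser *parser = new DemoParser();   // allocated per run
        if (parser->parse(path)) {
            ++successes;
        }
        delete parser;                           // released per run
    }
    assert(DemoParser::live() == 0);             // nothing left behind
    return successes;
}
```

  With a real parser, the same loop exposes leaks as steadily growing
  memory use when the iteration count is large.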

  Unfortunately, the lrstar_parser code generates a SEGV in
  symbol_name(int sti) in the Calc example on the second iteration.
  This failure led to work adding constructors to the types (a still
  unfinished work), and simplifying the use pattern of the parser (for
  example, init_lexer() is removed).

  At this point, the root cause of the SEGV has not been determined,
  but it is known that adding a constructor to the Symbol class causes
  the system to SEGV on the first run.

  This commit is a checkpoint of the work to add constructors and
  remove 'init' functions.

  The lrstar program generates new files with this change, but they
  will be committed in an upcoming change.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

Squash

Squash
Details

  This change updates all the generated files to take '--iterations'
  and changes to the management of the lrstar_parser type in memory
  into account.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  A good way to find use of uninitialized memory is to initialize the
  memory to values that are known to cause failures when used.  The
  effort to create constructors for the parser proper is predicated on
  that idea, but as noted in a previous commit, adding a constructor
  to Symbol caused the Calc example grammar to SEGV when attempting to
  print a symbol.

  Consequently, the Symbol class now has a constructor that sets all
  the fields to illegal values.  Additionally, an assert is used in
  symbol_name() to catch any use of uninitialized Symbol instances.

  The cause of the SEGV was using an uninitialized Symbol structure with
  a 'length' field set to ~0U, and a NULL 'start' field.  Because the
  length field was non-zero, a copy of the symbol name was triggered,
  and then a NULL pointer was dereferenced.  The root cause of the
  SEGV was use of the Symbol information from a Node that actually
  indicated 'no symbol is associated with this Node' (sti = 0, symbol
  = 0, line = 0).  The ultimate solution for this problem is to define:

    A Node whose { symbol, sti, line } fields are all 0 has no
    Symbol associated with it.

  With that established via two new methods in the Node structure, the
  use of uninitialized memory is now mitigated in
  lrstar_parser::tracer().

  With this change it is both possible and required to dynamically
  allocate a parser, and several parsers (of the same or of different
  kinds) can be allocated at the same time.  This facilitates using
  multiple parsers in the same executable, at the same time.
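
  The pattern described above can be sketched as follows.  This is an
  illustration, not the project's code: the types are reduced to the
  fields named in this message, and initialized(), has_symbol(),
  clear_symbol() and trace_label() are hypothetical names (the commit
  does not name the two new Node methods).

```cpp
#include <cassert>
#include <string>

// Reduced stand-in for the parser's Symbol type.
struct Symbol {
    const char *start;
    unsigned    length;

    // Poison values: any use before real initialization is detectable.
    Symbol() : start(nullptr), length(~0U) {}

    bool initialized() const { return start != nullptr && length != ~0U; }
};

// Reduced stand-in for an AST node.
struct Node {
    int symbol = 0;
    int sti    = 0;
    int line   = 0;

    // All three fields zero means "no Symbol is associated with this Node".
    bool has_symbol() const { return symbol != 0 || sti != 0 || line != 0; }
    void clear_symbol()     { symbol = 0; sti = 0; line = 0; }
};

// Returns the symbol text; asserts instead of dereferencing a null 'start'.
std::string symbol_name(const Symbol &s)
{
    assert(s.initialized() && "use of uninitialized Symbol");
    return std::string(s.start, s.length);
}

// Tracing code checks the Node first and never touches the symbol
// table for nodes that have no symbol.
std::string trace_label(const Node &n, const Symbol *table)
{
    if (!n.has_symbol()) {
        return "<no symbol>";
    }
    return symbol_name(table[n.sti]);
}
```

  The poison values turn a silent read of garbage into an immediate
  assertion failure in debug builds, while the Node check removes the
  original path to the uninitialized Symbol.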

Testing

  To test that there were no memory leaks, the following was run:

    cd examples/Calc/
    time -p \
       /tmp/lrstar/release/examples/Calc/Calc \
          --iterations 1000000 \
          --output /dev/null \
          test.input.txt  >/dev/null

  Across a million runs, a memory leak would have been visible as a
  continual increase in memory utilization; none was observed.

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change is heavily inspired by 'mingodad' enabling '-Wall' in a
  pull request.  The decision to go a bit further by turning all
  warnings into errors is also taken to ensure that the tools help the
  developers to write better code.

  In both tools, the following changes have been made:

   o Remove unused code & variables.

   o Remove variables that are assigned but never used.

   o Initialize variables that could be used uninitialized.

   o Cast 'char' array indexes to 'unsigned char' because the
     signedness of the 'char' type is implementation-defined; if
     'char' is signed, then negative array indexes would be used for
     values greater than 127.

   o Removed commented-out code in proximity to other changes.

  Future changes will increase the number of code-quality diagnostics
  turned on when building {dfa, lrstar}.
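
  The 'unsigned char' cast listed above can be illustrated with a small
  sketch.  The table and the function names are hypothetical and do not
  come from dfa or lrstar.

```cpp
#include <cassert>
#include <climits>

// Hypothetical lookup table with one entry per possible byte value.
static int char_class[UCHAR_MAX + 1];

// Store a class for a character.  Without the cast, a signed 'char'
// holding a byte above 127 would produce a negative index.
void set_class(char c, int cls)
{
    char_class[static_cast<unsigned char>(c)] = cls;
}

// Look up the class for a character using the same cast.
int get_class(char c)
{
    const int index = static_cast<unsigned char>(c);
    assert(index >= 0 && index <= UCHAR_MAX);  // holds whether or not 'char' is signed
    return char_class[index];
}
```

  Writing char_class[c] directly compiles on every platform, but where
  'char' is signed it reads or writes before the start of the array for
  bytes such as 0xE9.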

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  Some versions of GCC coupled with default settings of Bash running
  under Ubuntu use non-ASCII characters to print diagnostic messages.
  This yields a less than aesthetic output because the messages will
  contain non-displayable characters.

  A fix for this problem is to set the LANG to 'C'.

  This change does that and as a result diagnostic messages look
  better now.

Testing

  Visual inspection.
Details

  The lrstar program writes the ASCII value of { 0, 1 } to
  'lrstar.txt' to communicate the return code it produced on the last
  run.  If the value written to the file is not (ASCII) '0', dfa will
  silently terminate with a non-zero return code.

  This is only necessary if lrstar and dfa are run separately from
  each other.  While that may be desirable when iterating on a
  grammar, both tools are fast enough with the default settings that a
  wrapper script can manage the execution of both tools and simplify
  the overall workflow.

  For example, PLSQL, which creates ~2Mb of parser tables, takes only
  20 seconds to process both the syntax and lexical grammars on an old
  Intel NUC; most grammars will be much faster than that.  C11 takes
  0.21 seconds.

  To address this concern, and remove 'lrstar.txt', the following
  actions have been taken:

   o A new 'lrgen' script has been created.  This is now the preferred
     way to build the syntax and lexical grammars.  In its current
     form, it is executed like so:

       lrgen --directory <path of directory holding grammars> \
          --grm <grammar name> \
          [--grmopt <lrstar grammar options>] \
          [--lgropt <dfa grammar options>]

     The syntax & lexical grammars must reside in the same directory,
     and they must have the same basename.

     This tool is the preferred way to generate the parser.

   o All lrstar.txt files are removed.

   o Both 'dfa' and 'lrstar' have been modified to not use
     'lrstar.txt'.

   o precommit-test has been updated to use 'lrgen' rather than
     invoking each tool separately.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test
Details

  This change updates the lrgen tool to make the '--grm' argument
  optional.  If the '--grm' argument is not provided, the last
  component of the '--directory' specification will be used as the
  grammar name.
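
  The defaulting rule can be sketched as below.  lrgen itself is a
  script, so this C++ fragment is only an illustration of the rule;
  grammar_name() is a hypothetical helper, and the handling of trailing
  slashes is an assumption.

```cpp
#include <cassert>
#include <string>

// Returns the grammar name: the '--grm' value when one was given,
// otherwise the last component of the '--directory' value.
std::string grammar_name(const std::string &directory, const std::string &grm)
{
    if (!grm.empty()) {
        return grm;                       // explicit '--grm' wins
    }

    // Ignore trailing slashes (assumed behaviour).
    const std::string::size_type end = directory.find_last_not_of('/');
    if (end == std::string::npos) {
        return std::string();             // empty, or nothing but slashes
    }

    const std::string::size_type slash = directory.find_last_of('/', end);
    const std::string::size_type begin =
        (slash == std::string::npos) ? 0 : slash + 1;

    assert(begin <= end);                 // 'end' indexes a non-slash character
    return directory.substr(begin, end - begin + 1);
}
```

  For a directory ending in 'grammars/ANTLR' and no '--grm', the
  function yields 'ANTLR', which matches the ANTLR.grm and ANTLR.lgr
  files named in the output further down.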

  Additionally, the '--help' text is elaborated, and a few invocation
  examples are provided.

  Furthermore, 'lrgen' is now delivered as part of the build process.

  Finally, some additional help text has been added to resolve issues
  with invoking lrstar & dfa, as well as the semantics for finding the
  executables.

Testing

      source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      rm -rf ${LRSTAR_BUILD_DIR}        # A really clean build for testing.

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

      source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar &&  \
      cd ${LRSTAR_DIR}

      lrstar-build -j 20 clean
      lrstar-build -j 20
      ${LRSTAR_DIR}/scripts/precommit-test

   <open a new shell prompt, and attempt to use lrgen>

      /tmp/lrstar/release/usr/local/bin/lrgen \
        --directory <root>/lrstar/lrstar/grammars/ANTLR
      fatal: lrstar not found.  See --help.

      PATH=${PATH}:/tmp/lrstar/release/usr/local/bin

      /tmp/lrstar/release/usr/local/bin/lrgen \
        --directory <root>/lrstar/lrstar/grammars/ANTLR

      LRSTAR 24.0.017 32b Copyright Paul B Mann.
      ANTLR.grm
      Grammar      213 rules, 43 terminals, 126 nonterminals.
      States       305 states in LALR(1) state machine.
                   148 states when using shift-reduce actions.
      Conflicts      4 states, 4 shift-reduce, 2 reduce-reduce.
                  rows   cols          matrix       list      vect     total
      B matrix      51 x   38 x 1 =     1,938 ->     169 +     234 =     403
      T matrix       9 x   38 x 2 =       684 ->     402 +     191 =     593
      N matrix      29 x   76 x 2 =     4,408 ->   1,038 +     509 =   1,547
      R matrix       9 x   14 x 1 =       121 ->     121 +     339 =     460
      Total                                        1,730 +   1,273 =   3,003
      0 min 0.011 sec, 31.416 MB, 0 warnings, 0 errors.
      DFA 24.0.016 32b Copyright Paul B Mann.
      ANTLR.lgr
      Grammar     1603 rules, 256 terminals, 63 nonterminals.
      States      5338 states before converting to a DFA.
                   136 states in final DFA state machine.
      Conflicts      0 states, 0 shift-reduce, 0 reduce-reduce.
                  rows   cols          matrix       list      vect     total
      T matrix     113 x   57 x 1 =     6,441 ->   6,441 +     664 =   7,105
      0 min 0.095 sec, 38.784 MB, 0 warnings, 0 errors.
Details

  This change does the following:

  o Adds information about sample grammars contained in this
    parser-generator repository.

  o Attempts to ensure line breaks in markdown with two spaces at the
    end of lines.  This may (or may not) work because Markdown is not
    fully standardized...

Testing

  Visual inspection.
@mingodad mingodad closed this Dec 18, 2023