Fix lex prt line #8

mingodad · 2023-12-18T10:33:02Z

No description provided.

Details This update contains the full state of the lrstar software as of an ongoing Linux port. The direction of this project is as follows:

1. Create a build process.
2. Get the project to build.
3. Get the project to build with no errors or warnings under the default settings.
4. Remove unused code.
5. Make a library of common code.
6. Increase strictness of compilation options.
7. Use getopt_long() for options.
8. Create documentation.

As of this commit, the project is at step 4.

To build the software, first set up the host environment by sourcing the 'setup' script and setting the build type and the build-output-directory (BOD):

    source ./scripts/setup --build-type debug --bod /tmp/lrstar

The build type must be (debug | release). The BOD is the location where all build artifacts, including the resulting executables, will be placed.

Once the environment has been configured, building can be done with:

    lrstar-build [-j <NUM CPUS>]

To remove all the build artifacts (aka a 'clean'):

    lrstar-build [-j <NUM CPUS>] clean

Details Replace the non-standard MAX_PATH with PATH_MAX. The motivation is to use as many standard C identifiers as possible, to increase portability.
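
As a rough illustration of the idea, the sketch below sizes a pathname buffer with PATH_MAX and checks for truncation. The helper name join_path() is hypothetical and does not come from the lrstar sources; it only shows how a single standard limit might be used on a POSIX host.

```cpp
#include <cassert>
#include <climits>   // PATH_MAX on POSIX systems (via <limits.h>)
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not from lrstar): joins 'dir' and 'file' into a
// PATH_MAX-sized buffer. Returns false if the result would not fit,
// which snprintf reports by returning the untruncated length.
bool join_path(char (&out)[PATH_MAX], const char* dir, const char* file)
{
   int n = std::snprintf(out, sizeof out, "%s/%s", dir, file);
   return n >= 0 && static_cast<std::size_t>(n) < sizeof out;
}
```

A caller would declare `char buf[PATH_MAX];` and treat a false return as "path too long" rather than silently using a truncated name.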

Details As an initial cleanup pass, this change removes all symbols declared in CM_Global.h that are not needed to compile & link the dfa executable. Besides making the module itself smaller, this mirrors a previous change that did the same for lrstar's version of the same file. Now the two files can be compared to find out which symbols are unique and which are shared; from the shared symbols a library can be created so that only one copy of the code must be maintained.

Details When printing warning messages, the changes that constified the code had to allocate a small buffer on the heap and copy text into it, to avoid writing into the input data buffer. The calculations related to the end-of-string were incorrect, which caused warning messages to be improperly truncated. This change addresses that error.
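
The kind of end-of-string calculation involved can be sketched as follows. This is not the code from CM_Global; copy_span() is a hypothetical helper that shows the usual pattern of copying a span out of a read-only buffer and terminating it at exactly the right index.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies the half-open span [begin, end) out of a
// read-only input buffer into a fresh heap buffer, so the input itself
// never has to be written to. The caller releases it with std::free().
char* copy_span(const char* begin, const char* end)
{
   std::size_t len = static_cast<std::size_t>(end - begin);
   char* out = static_cast<char*>(std::malloc(len + 1));   // +1 for the terminator
   if (out == nullptr)
      return nullptr;
   std::memcpy(out, begin, len);
   out[len] = '\0';   // terminate at len; an off-by-one here truncates or overruns
   return out;
}
```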

Details This change reformats the code to convert all tabs to 3 spaces, updates the emacs file local variables to turn off use of tabs and puts all global scoped symbols at column 0 in the files.

Details In preparation for a precommit test harness, the directories present in this change have been renamed so that each directory name matches its grammar and lexical grammar files.

Details Fix both shell functions so they return the current name of the directory in which the build has taken place. This function is necessary because the build-output-directory (BOD) will change based on the build type (debug, release).

Details In preparation for a precommit harness, this change updates the stored grammars to include all files generated by lrstar and dfa. This will facilitate easy comparison of the output files when modifications are made to the software, to ensure that nothing has changed.

Details The header files generated by lrstar and dfa have been modified to use a 'typedef' instead of '#define' to create some types. This is an acceptable change to the output files, which are updated here.
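
One reason a 'typedef' is generally preferable to '#define' for type names can be sketched as below. The names are invented for illustration and are not the types that lrstar or dfa actually generate.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative names only; not the types emitted by lrstar or dfa.
#define UCHAR_PTR_MACRO unsigned char*   // plain textual substitution
typedef unsigned char* uchar_ptr_t;      // a real type name

UCHAR_PTR_MACRO m1, m2;   // expands to: unsigned char* m1, m2;  (m2 is not a pointer)
uchar_ptr_t     t1, t2;   // both t1 and t2 are pointers

static_assert(std::is_same<decltype(m1), unsigned char*>::value, "m1 is a pointer");
static_assert(std::is_same<decltype(m2), unsigned char>::value,  "m2 is a plain unsigned char");
static_assert(std::is_same<decltype(t2), unsigned char*>::value, "t2 is a pointer");
```

The macro form also ignores scope and shows up in diagnostics as its expansion, whereas the typedef behaves like any other declared name.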

Details The new 'precommit-test' will run lrstar on every sample grammar, and run dfa on every sample lexical grammar. The source files that are generated by the tools should, in general, not change. A change signals the need to look closer at the changes that are to be submitted, to ascertain whether they are correct with respect to the parser generator. The text files that are generated contain execution time and memory usage. If neither of these is excessively different, the result is acceptable. If any other data in the log files changes, the change deserves more scrutiny to find out why there is a difference.

Details As prt_message() is not used outside the CM_Global module, it is now declared static to this file, and its prototype is removed from the header file.

Details The change that reformatted all the source files to be tab-free missed this file. The oversight is now corrected. Diffs from git will now look better because tabs will not need to be expanded.

Details The 'spaces' variable is both declared and conditionally defined in CM_Global.h, depending on the value of 'MAIN'. This is an unnecessary complication in the source code. Now, the definition of the variable is in CM_Global.cpp, and the declaration is in CM_Global.h. No special-case conditional compilation is necessary.
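
The resulting layout can be sketched in a single listing, with comments marking which part would live in which file. The type and contents of 'spaces' are assumptions here; the log does not show them.

```cpp
#include <cassert>
#include <cstring>

// --- would live in CM_Global.h: declaration only, no storage allocated ---
extern const char spaces[];

// --- would live in CM_Global.cpp: the one and only definition ---
// (type and contents are assumed for this sketch)
const char spaces[] = "        ";
```

Every translation unit that includes the header sees the same declaration, and the linker resolves it to the single definition, so no 'MAIN'-style macro is needed to pick the defining file.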

Details This change untabifies all the files in dfa. This ensures consistency across all source files, regardless of indentation. This most often is an issue when the desired indentation is not a multiple of the tab stop, thus requiring spaces to make vertical alignment. Additionally, a tabstop of 3 is unusual, so looking at the files with any viewer that does not honor the file's tab settings results in a very poor viewing experience; this is not a problem when spaces are used.

Details This change provides a few miscellaneous updates to simplify the overall code.

1. Change 'charcode', 'lower', 'numeric' and 'alpha' to be constant data.
2. Replace {MAX_DIR, MAX_FILENAME, MAX_FILETYPE} with PATH_MAX.

Unlike Windows, POSIX systems do not distinguish limits for any individual part of the pathname. To simplify, be consistent and prevent memory overruns, these identifiers are replaced with the single constant describing limits on pathnames: PATH_MAX.

Testing

    lrstar-size --lrstar-build-path $(lrstar-build-path) --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (6)      11K (544)    3K (-512)     67K (11K)
    release/src/lrstar/lrstar    86K (104)    25K (865)    6K (-1024)    105K (11K)

    ./precommit-test
    (only differences were log files; those are acceptable differences)

Details Putting functions into a more global scope than needed deprives the compiler of optimization opportunities born of inlining, and discourages refactoring. Accordingly, this change moves functions that are only used in one translation unit from CM_Global to that translation unit and makes them static. This reduces the number of global functions and makes it a little bit easier to reason about dfa.

Testing

    lrstar-size --lrstar-build-path $(lrstar-build-path) --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (-312)   11K (544)    3K (-512)     67K (11K)
    release/src/lrstar/lrstar    86K (38)     25K (1K)     6K (-1024)    105K (11K)

    (The change to lrstar size is due to other unpushed changes.)

    ./precommit-test
    (only differences were log files; those are acceptable differences)

Details Putting functions into a more global scope than needed deprives the compiler of optimization opportunities born of inlining, and discourages refactoring. Accordingly, this change moves functions that are only used in one translation unit from CM_Global to that translation unit and makes them static. This reduces the number of global functions and makes it a little bit easier to reason about lrstar.

Testing

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    85K (-772)   25K (-33)    6K (0)        105K (0)

    ./precommit-test
    (only differences were log files; those are acceptable differences)

Details This change performs the following:

  o Updates the build process to remove -DLINUX from the compilation command lines.
  o Updates the build process to set LRSTAR_$(HOSTOS) on the compilation command lines. The HOSTOS value must be either 'LINUX' or 'WINDOWS'.
  o Turns all HOSTOS-specific compilation to use the now-standard LRSTAR_LINUX or LRSTAR_WINDOWS.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    85K (0)      25K (0)      6K (0)        105K (0)

    ./scripts/precommit-test
    (only differences were log files; those are acceptable differences)

It has now been noted that the output of the precommit-test script should have updated all the generated C files, and they should all now reference the 'lrstar_'-prefixed source files. Oddly, they do not. This will be addressed in a future commit.
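
A minimal sketch of host-specific code keyed on the new macros might look like the following. The helper path_separator() is hypothetical, and the fallback '#define' exists only so the sketch compiles outside the real build, where the Makefiles would pass -DLRSTAR_LINUX or -DLRSTAR_WINDOWS.

```cpp
#include <cassert>

// Stand-in for the build system passing -DLRSTAR_LINUX or -DLRSTAR_WINDOWS.
// Only needed so this sketch compiles on its own.
#if !defined(LRSTAR_LINUX) && !defined(LRSTAR_WINDOWS)
#define LRSTAR_LINUX 1
#endif

// Hypothetical host-specific helper (not from the lrstar sources).
inline char path_separator()
{
#if defined(LRSTAR_LINUX)
   return '/';
#elif defined(LRSTAR_WINDOWS)
   return '\\';
#endif
}
```

Prefixing the macro with the project name reduces the chance of colliding with a 'LINUX' symbol defined by some other header or toolchain.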

Details This change performs the following:

  o Changes the output file generation so that all files are unconditionally created. The previous version of the code would only write the non-table files if they didn't already exist on disk. While this undoubtedly saves a very minor amount of time, it's objectionable because changes to the LRSTAR system will not be reflected in source code without manually removing the files.
  o While investigating the file writing, it was discovered to have a lot of duplicated code. This was refactored to use a callback function; the main file management is handled by a single function that invokes the callback to write the data out to the file. This change saves over 1Kb of code and R/O data combined.
  o As a byproduct, all the generated files for the grammars have changed. The log files now show faster speeds, and the generated code will be tab-free and use the LINUX version of the #include sections. Compilation of these sample files will come in a future change.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (-866)   24K (-320)   6K (0)        105K (0)

    ./scripts/precommit-test
    (only differences were log files; those are acceptable differences)

It has now been noted that the output of the precommit-test script should have updated all the generated C files, and they should all now reference the 'lrstar_'-prefixed source files. Oddly, they do not. This will be addressed in a future commit.
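
The callback arrangement described above could look roughly like this. All names are hypothetical and the real PG_Main code is certainly more involved; the point is only that open, close and error handling live in one function while each file contributes just its writer.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical callback type: writes one file's contents to an open stream.
typedef void (*file_writer_fn)(std::FILE* out);

// Single place for open/close/error handling. Mode "w" truncates, so the
// file is regenerated unconditionally, whether or not it already exists.
bool generate_file(const char* path, file_writer_fn writer)
{
   std::FILE* out = std::fopen(path, "w");
   if (out == nullptr)
      return false;
   writer(out);
   return std::fclose(out) == 0;
}

// Two example callbacks standing in for the per-file generators.
void write_header(std::FILE* out) { std::fputs("/* generated header */\n", out); }
void write_source(std::FILE* out) { std::fputs("/* generated source */\n", out); }
```

Adding another generated file then means adding one small writer function rather than another copy of the file-management code.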

Details A mistake was made committing and pushing the previous change because the final changes to PG_Main::GenerateOtherFiles() were not actually finished. This change finishes it up by removing dead code and no-longer-necessary variables. This brings the total size reduction for cleaning up the file writing to over 1Kb of code.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (-866)   24K (-320)   6K (0)        105K (0)

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (-196)   24K (0)      6K (0)        105K (0)

    ./scripts/precommit-test
    (only differences were log files; those are acceptable differences)

Details To facilitate creating a Makefile for the generated C++ parser files without requiring the generated code to reference library source files relative to the source directory, the build process has been updated to have a top-level 'distribution' target. The target is now the default.

The 'distribution' target uses the GNU 'install' utility to copy, and set permissions on, the generated files into a directory tree under the build-output-directory. The destination directory tree mirrors the structure of '/usr/local' to facilitate copying the files to a known location in a file system, if so desired.

To leverage the new 'distribution' location, two new exported shell functions are now available:

  o lrstar : Runs lrstar
  o dfa    : Runs dfa

These functions are made available after configuring the environment by sourcing 'setup' from the scripts directory.

Additionally, the following changes have been made:

  o make/config.mk
    All the variables are now exported so they are available to recursive invocations of Make, without re-including the file.
  o src/{dfa,lrstar}/Makefile
    Unnecessary include of 'config.mk' has been removed. The main target in these Makefiles is now an actual image, whereas previously it was a .PHONY target. A real target is used now because the full pathname of each executable must be known globally, and this mechanism ensures that the same value is used everywhere.
  o make/distribution.mk
    This file executes GNU 'install' on each file that is to be installed into the 'usr/local' directory tree.
  o make/toplevel.mk
    This defines a series of variables for top-level build targets. Most importantly, the variables specify where the build process produces each binary artifact. The 'distribution.mk' file uses this to copy them to the final destination.
  o scripts/functions
    Various updates to accommodate the new 'usr/local' location of the delivered build artifacts.
  o scripts/precommit-test
    Updated to use the new final artifact location, and to enable 'dfa' execution.

This produces a lot of new files that are not present in Paul's original Zip file. They will be committed as a separate change.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (0)      24K (0)      6K (0)        105K (0)

    ./scripts/precommit-test
    (New files produced, and many negligible differences in log files. Will be committed as a separate change.)

Details This change simply adds some more information to the README.md file to help interested people configure a development environment and build the tools. The tools are not yet easily usable on Linux, but they do build.

Details This change checkpoints all the files produced by the 'precommit-test' that are either new or changed. This will be a new baseline that will be used to measure new commits.

Details Updates the entire set of software to make the example grammars' generated files trivially compilable on Linux.

  o lrstar_basic_defs.h
    This file is renamed from 'basic_defs.h' to ensure that there are no name conflicts with files from other pieces of software that might be used. It also makes things more uniform by ensuring that all exported source files have an 'lrstar_' prefix.
  o distribution.mk
    Deliver 'lrstar_basic_defs.h'. This file must be delivered so that the type & macro definitions it contains can be used by the lrstar library code.
  o precommit-test
    The script is updated to change to the root of the project source directory before running the tools on all the grammars. This makes it possible to run the script from anywhere.
  o lrstar_main.cpp
    Changed 'basic_defs.h' to 'lrstar_basic_defs.h'. Fixed a spelling error in a message.
  o lrstar_parser.{cpp,h}
    Fixed 'const' errors with string literal assignments. Also fixed a '>' comparison of a pointer; a comparison 'p > 0' is not logically meaningful. The fix is to simply compare it with 0 (or NULL).
  o dfa/CM_Global.h
    Changed 'basic_defs.h' to 'lrstar_basic_defs.h'.
  o PG_Generate.cpp
    Addressed known 'const' issues with assignment of string literals. Added the generation of a sufficient, but rudimentary, Makefile for generated grammars.

The generated files for the example grammars can now be compiled & linked into a same-named executable in the same directory with an invocation like:

    make LRSTAR_INSTALL_ROOT=/tmp/lrstar/release/usr/local

The LRSTAR_INSTALL_ROOT is the root directory where the LRSTAR software is installed. At present, this will generally be in the build directory for the current build environment.

The Makefiles and changes to the generated files will be submitted in a different commit.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (6)      25K (608)    6K (0)        105K (0)

    ./scripts/precommit-test
    (New files produced, and many negligible differences in log files. Will be committed as a separate change.)

combine

Details This commit contains all the sample grammars' updated source files. These new source files include the necessary 'lrstar_basic_defs.h' file, and address 'const' issues with assignment of string literals. Additionally, a rudimentary Makefile has been added for each sample grammar. The Makefiles can be used to build the sample grammar into an executable with a command like the following:

    make LRSTAR_INSTALL_ROOT=/tmp/lrstar/release/usr/local

Details Updating lrstar / dfa to generate a Makefile caused new log files to be generated. This change checkpoints the log files for comparison against future changes.

Details The file guard for this file was not changed when the file was renamed. This change corrects that oversight.

Details This trivial change adds emacs formatting comments to these files, converts them from MS-DOS to Unix format, and converts tabs to 3 spaces. Testing Visual inspection.

Details To move forwards, reformat code to use spaces instead of tabs, and start the lexical scoping at column 0. No semantic changes, only formatting.

Details As a testament to the dangers of conditional compilation, a syntax error has been found in lrstar_parser.cpp under the CPP symbol INSENSITIVE. These types of errors can periodically creep into C source that uses conditional compilation. Fortunately the fix for this is easy, since it's simply a missing ']'. In many other cases (in other software), data structures may have changed and determining a proper fix may require a lot of source code archeology.

A future series of changes will convert all these pieces of conditional compilation into constant compile-time expressions. The build tools will eliminate the unused code, and any defined, but unused, data.

Testing

    lrstar-size \
        --lrstar-build-path $(lrstar-build-path) \
        --compare /tmp/lrstar.unmodified/release

                                 .text        .rodata      .data         .bss
    release/src/dfa/dfa          72K (0)      11K (0)      3K (0)        67K (0)
    release/src/lrstar/lrstar    84K (0)      25K (0)      6K (0)        105K (0)

    ./scripts/precommit-test
    No changes to important files.

[lrstar_parser.cpp] Correct relational pointer comparison with 0

Details Performing a '> 0' comparison with a pointer is nonsensical. This changes the '>' to '!='.
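
The pointer comparison fix can be illustrated with a small, hypothetical function (not taken from lrstar_parser.cpp). The intent of 'p > 0' is "p is not null", and stating that directly avoids an ordered comparison that compilers may warn about or reject.

```cpp
#include <cassert>

// Hypothetical example: 'p > 0' asks whether a pointer is ordered after
// the null pointer, which has no useful meaning. The intended test is
// simply "not null", written here as 'p != nullptr'.
bool has_text(const char* p)
{
   return p != nullptr && *p != '\0';
}
```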

Details While this example is not fully complete, it is believed that the grammar is now correct enough to read JSON files and create an AST. The ultimate goal for the example is to read the full JSON file into an internal representation to show how lrstar can be used to quickly make parsers.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details The Node allocation system allocates nodes in blocks of 'max_nodes' (1,000,000 by default). When all allocated nodes are used, another block is allocated. While this is simplistic, and a little bit faster than allocating individual nodes one at a time, it adds complexity to the code overall and only works if all AST nodes are exactly the same. It's also wasteful of memory for cases where far fewer than 1,000,000 nodes are used. The Node structure has three (3) four-byte integer fields, and five (5) eight-byte pointers. That is 52 bytes for each Node, for a total of about 49 MB per Node allocation.

A more flexible system would allow the creation of nodes based on the AST node annotation in the grammar. This will allow the user to create AST nodes that have additional fields that may be necessary when processing the AST.

Accordingly, this change removes the preallocation in favor of allocation-when-needed. A future change will update the system to use user-defined Node types.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
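
The memory arithmetic can be checked with a few compile-time constants. The sketch assumes an LP64 Linux host and a block of 1,000,000 nodes, which is the count that the roughly 49 MB figure implies; the Node layout shown is a stand-in, not the real structure.

```cpp
#include <cassert>
#include <cstddef>

// Field sizes as stated in the commit message (LP64 Linux assumed).
constexpr std::size_t kIntFields = 3, kIntBytes = 4;
constexpr std::size_t kPtrFields = 5, kPtrBytes = 8;
constexpr std::size_t kNodeBytes = kIntFields * kIntBytes + kPtrFields * kPtrBytes;

// Assumption: the block size that matches the ~49 MB total.
constexpr std::size_t kNodesPerBlock = 1000000;
constexpr std::size_t kBlockBytes    = kNodeBytes * kNodesPerBlock;
constexpr std::size_t kBlockMiB      = kBlockBytes / (1024 * 1024);   // integer division

static_assert(kNodeBytes == 52, "3*4 + 5*8");
static_assert(kBlockBytes == 52000000, "52 bytes * 1,000,000 nodes");
static_assert(kBlockMiB == 49, "52,000,000 / 1,048,576 rounds down to 49");

// Hypothetical stand-in for the real Node. With alignment padding, a
// compiler may report a larger sizeof than the raw 52-byte sum.
struct Node
{
   int   a, b, c;
   Node* link[5];
};

// Allocation-when-needed: one node per request instead of a large block.
Node* make_node() { return new Node(); }
```

With per-node allocation, a small input pays only for the nodes it actually creates, at the cost of more calls into the allocator.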

Details Although the comment in the original Windows source code for the lowercase variable states:

    // lowercase[x] is x.

This is not true, and a more careful inspection of the actual array contents will show that the uppercase ASCII letters are mapped to their lowercase equivalents. This change reinstates the lowercase variable and makes it accessible through an object file supplied with the distribution of the software. Makefiles are updated to link with the new object file to reinstate true case insensitivity.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
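
A table with the behavior described here (identity for everything except 'A'..'Z') might be built as in the sketch below. The real array is a static initializer in the lrstar sources; the struct, the table name and equal_ignore_case() are hypothetical.

```cpp
#include <cassert>

// Hypothetical reconstruction: a 256-entry table where every byte maps to
// itself except 'A'..'Z', which map to 'a'..'z' (ASCII assumed).
struct LowercaseTable
{
   unsigned char map[256];

   LowercaseTable()
   {
      for (int c = 0; c < 256; ++c)
         map[c] = static_cast<unsigned char>(c);
      for (int c = 'A'; c <= 'Z'; ++c)
         map[c] = static_cast<unsigned char>(c - 'A' + 'a');
   }
};

static const LowercaseTable lowercase_table;

// Case-insensitive equality built on the table.
bool equal_ignore_case(const char* a, const char* b)
{
   for (; *a != '\0' && *b != '\0'; ++a, ++b)
   {
      if (lowercase_table.map[static_cast<unsigned char>(*a)] !=
          lowercase_table.map[static_cast<unsigned char>(*b)])
         return false;
   }
   return *a == *b;   // equal only if both strings end together
}
```

If the table really were the identity, as the old comment claimed, a comparison like this would silently become case sensitive.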

Details An incorrect reading of the code and use of these two functions led to the conclusion that they could return a 'bool' rather than an 'int'. This broke ND parsing, and the 'LRK' sample no longer functioned. The LRK sample would fail with a syntax error, whereas now it successfully parses the file.

    /tmp/lrstar/release/examples/LRK/LRK test.input.txt

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Before this change, the following command, run from the LRK directory after running precommit-test, would fail with a syntax error. It now works correctly.

    /tmp/lrstar/release/examples/LRK/LRK test.input.txt
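
Why narrowing a return type from 'int' to 'bool' can break callers is easy to show in isolation. The functions below are invented stand-ins; the log does not name the two lrstar functions or say what their integer results mean.

```cpp
#include <cassert>

// Hypothetical stand-ins: a helper whose result is a count or index,
// written once with the original 'int' return type and once with 'bool'.
int  choose_as_int (int candidates) { return candidates; }
bool choose_as_bool(int candidates) { return candidates != 0; }   // collapses to false/true
```

Any caller that compares the result against values other than 0 and 1, or uses it as an index, behaves differently once every non-zero value has been folded into 'true'.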

Details This change is the result of manually examining every 'make.bat' file and transferring their lrstar & dfa options to precommit-test.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details This change improves the delivery of the 'lowercase' array by putting it into a static library that is linked with each generated parser. The work to create the static library can be leveraged in the future if other functions or data need to be delivered, but are not part of the templates used by the system.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details Overloaded functions make reading, reasoning about, and finding functions in the source tree needlessly difficult. This change alters the names of the overloaded print_ast() and traverse() to make the code more maintainable.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details This change rearranges and standardizes the formatting of lrstar_main.cpp. The rearrangement allows the removal of superfluous forward declarations, and the reformatting ensures that the code follows one consistent format.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details This change updates the 'path' field from 256 characters to PATH_MAX. 256 is not enough to store a full Linux pathname (as defined by the C library).

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details One key to adoption is to use standard system methods for doing mundane tasks. Rolling a custom argument parser is nice, but using a system that is familiar to Linux programmers is better. Although the number of arguments the sample grammars currently take is zero, laying a good framework encourages experimentation and change.

Besides updating lrstar_main.cpp, this change also adds the argument processor to the library so that it can be more easily reused by others making parsers.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
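
For readers unfamiliar with the standard mechanism, a getopt_long()-based argument processor could be sketched as below. The Options structure and parse_options() are hypothetical and are not the library code added by this change; the option names are only illustrative.

```cpp
#include <cassert>
#include <cstring>
#include <getopt.h>

// Hypothetical option set for illustration.
struct Options
{
   const char* output     = nullptr;
   bool        help       = false;
   bool        ok         = true;
   int         first_file = 0;   // index of the first non-option argument
};

Options parse_options(int argc, char* argv[])
{
   static const struct option long_opts[] = {
      { "output", required_argument, nullptr, 'o' },
      { "help",   no_argument,       nullptr, 'h' },
      { nullptr,  0,                 nullptr,  0  }
   };

   Options opts;
   optind = 1;   // start scanning at argv[1]
   int c;
   while ((c = getopt_long(argc, argv, "o:h", long_opts, nullptr)) != -1)
   {
      switch (c)
      {
         case 'o': opts.output = optarg; break;
         case 'h': opts.help   = true;   break;
         default:  opts.ok     = false;  break;   // unknown option or missing argument
      }
   }
   opts.first_file = optind;
   return opts;
}
```

A main() would call `parse_options(argc, argv)` and then treat `argv[first_file]` onward as the input files to parse.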

Details This change reduces the overall size of the lrstar_parser structure by moving three array variables that are used for printing the AST to function-local scope, because they are only used by one function. The fourth variable of this family is used by two functions. Removing this (practically) unneeded field as well will require changing how the AST is written; that will come as a future change.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details The lrstar_parser constructor was formatted differently than all other functions. This change provides more consistency.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details Adding a destructor for lrstar_parser is another step towards being able to instantiate multiple parsers. Multiple parsers could be used to process multiple files provided on the command line, provide a service where multiple threads process individual files, or even support multiple languages.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details To further progress towards making it possible to easily process multiple files, and use multiple parsers in the same program, this change turns the generated 'generated_parser' variable into a pointer. It is automatically allocated at startup, and deleted at the end of the run.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
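
The allocate-at-start, delete-at-end pattern, together with a destructor that releases what the constructor acquired, can be sketched with a stand-in class. SketchParser and run_iterations() are hypothetical; the real lrstar_parser owns considerably more state.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated parser class.
class SketchParser
{
public:
   SketchParser() : stack_(new int[kStackSize]) { ++live_; }
   ~SketchParser() { delete[] stack_; --live_; }   // releases what the constructor acquired

   SketchParser(const SketchParser&) = delete;
   SketchParser& operator=(const SketchParser&) = delete;

   static int live() { return live_; }

private:
   static const int kStackSize = 256;
   int*       stack_;
   static int live_;
};

int SketchParser::live_ = 0;

// Create, use, and destroy a parser 'iterations' times. Returns how many
// instances are still alive afterwards (0 when nothing is leaked).
int run_iterations(int iterations)
{
   for (int i = 0; i < iterations; ++i)
   {
      SketchParser* parser = new SketchParser();   // allocated at the start of a run
      // ... parse one input here ...
      delete parser;                               // deleted at the end of the run
   }
   return SketchParser::live();
}
```

Because each instance cleans up after itself, the same loop could just as well create one parser per input file.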

Details A bit of housekeeping here to move header files from the 'code' directory to the 'include' directory. This structure is more analogous to the distribution directory structure.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details The lrstar_main.h file is no longer needed to build the project or any of the sample grammars. Because the file is now dead, and less code is easier to maintain, the file is removed.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details The lrstar_main.cpp implementation of main() should be considered a sample of how to use the lrstar system. In that light, work is underway to have lrstar generate a main in <grammar>_Main.cpp that will invoke the parsing code in lrstar_main.cpp. This will leave the parser project with a 'main' file that can be edited to perform desired work. This is a significant difference from the lrstar-generated code as it stands today, because there is a shared file that is used for main() by each grammar; editing that file will affect any other generated grammar. The new approach will be more flexible for the user.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details This trivial change removes #include lines that are no longer necessary.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details Update the text output when '--help' is specified to include the meaning of '--output'.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR}   # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details Being able to use a parser generator effectively depends on how easy the generated source is to use. This major change makes the system significantly easier to use through the following:

o Update lrstar to generate a 'user main' file.
  This new file contains a simple use of the generated parser, and is meant to be the starting point for building whatever tool is desired. As generated, the file parses a single file supplied on the command line, and then exits. Because lrstar will not overwrite this file if it exists, it can be modified to suit the developer's needs without fear of losing the work.

o Delete lrstar_main.cpp
  Because a whole file is generated, there is no need to keep a C++ file that used to be included by the generated parser sources.

o Delete lrstar_cmdline.{cpp,h}
  The generated 'user main' file also has a skeletal getopt()-based command line processor. Because of this, the library version is no longer needed.

o Each of the sample grammars now comes with the generated 'user main' file, and an updated Makefile.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
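For readers unfamiliar with a skeletal getopt-based command line processor, the following is a minimal sketch of what such code could look like. It is not the code lrstar generates: the `Options` struct and `parse_options()` are hypothetical names, and only the '--output' option (mentioned elsewhere in this log) is handled. `getopt_long()` itself is the standard GNU/POSIX-style interface from `<getopt.h>`.

```cpp
#include <getopt.h>   // getopt_long(), struct option, optarg, optind
#include <string>
#include <vector>

// Hypothetical option set for a 'user main'; the generated file may differ.
struct Options {
    std::string output;               // value of --output, empty if absent
    std::vector<std::string> inputs;  // positional arguments (files to parse)
    bool ok = true;                   // false when an unknown option was seen
};

// Parse argv into an Options value.  Assumed helper, not part of lrstar.
Options parse_options(int argc, char *argv[])
{
    static const struct option long_opts[] = {
        { "output", required_argument, nullptr, 'o' },
        { nullptr,  0,                 nullptr,  0  }
    };

    Options opts;
    optind = 1;   // start scanning at the first argument
    int c;
    while ((c = getopt_long(argc, argv, "o:", long_opts, nullptr)) != -1) {
        switch (c) {
        case 'o': opts.output = optarg; break;
        default:  opts.ok = false;      break;  // getopt_long printed a diagnostic
        }
    }
    // Everything after the options is treated as an input file.
    for (int i = optind; i < argc; ++i) {
        opts.inputs.push_back(argv[i]);
    }
    return opts;
}
```

A 'user main' built on this would call `parse_options(argc, argv)` first and then hand each entry of `inputs` to the generated parser.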

Details Now that the main parser is dynamically allocated, it makes sense to add an '--iterations' option to the basic command line arguments to test for memory leaks.

Unfortunately, the lrstar_parser code generates a SEGV in symbol_name(int sti) in the Calc example on the second iteration. This failure led to work adding constructors to the types (still unfinished), and simplifying the use pattern of the parser (for example, init_lexer() is removed).

At this point, the root cause of the SEGV has not been determined, but it is known that adding a constructor to the Symbol class causes the system to SEGV on the first run.

This commit is a checkpoint of the work to add constructors and remove 'init' functions. The lrstar program generates new files with this change, but they will be committed in an upcoming change.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Squash
Squash

Details This change updates all the generated files to account for '--iterations' and for the changes to how the lrstar_parser type is managed in memory.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details A good way to find use of uninitialized memory is to initialize the memory to values that are known to cause failures when used. The effort to create constructors for the parser proper is predicated on that idea, but as noted in a previous commit, adding a constructor to Symbol caused the Calc example grammar to SEGV when attempting to print a symbol.

Consequently, the Symbol class now has a constructor that sets all the fields to illegal values. Additionally, an assert is used in symbol_name() to catch any use of uninitialized Symbol instances.

The cause of the SEGV was using an uninitialized Symbol structure with a 'length' field set to ~0U, and a NULL 'start' field. Because the length field was non-zero, a copy of the symbol name was triggered, and then a NULL pointer was dereferenced.

The root cause of the SEGV was use of the Symbol information from a Node that actually indicated 'no symbol is associated with this Node' (sti = 0, symbol = 0, line = 0).

The ultimate solution for this problem is to define:

    A Node whose { symbol, sti, line } fields are all 0 has no Symbol associated with it.

With that established via two new methods in the Node structure, the use of uninitialized memory is now mitigated in lrstar_parser::tracer().

With this change it is possible / required to dynamically allocate a parser (and many parsers, of the same or different types, can be allocated at the same time). This facilitates using multiple parsers in the same executable, at the same time.

Testing

To test that there were no memory leaks, the following was run:

    cd examples/Calc/
    time -p \
        /tmp/lrstar/release/examples/Calc/Calc \
        --iterations 1000000 \
        --output /dev/null \
        test.input.txt >/dev/null

Across a million runs, a memory leak would have been visible as a continual increase in memory utilization. But that was not seen.

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
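The poisoning constructor, the assert, and the 'no Symbol' convention described above can be sketched roughly as follows. This is an illustration under assumptions, not the lrstar source: the field names 'start', 'length', 'symbol', 'sti' and 'line' come from the commit message, while `initialized()`, `has_symbol()`, `node_symbol_name()` and the simplified `symbol_name()` signature are hypothetical.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the parser's Symbol type.
struct Symbol {
    const char *start;   // first character of the symbol's text
    unsigned    length;  // number of characters in the text

    // Poison the fields so that use before initialization is detectable.
    Symbol() : start(nullptr), length(~0U) {}

    bool initialized() const { return start != nullptr && length != ~0U; }
};

// Simplified stand-in for the parser's Node type.
struct Node {
    int symbol = 0;
    int sti    = 0;
    int line   = 0;

    // Convention from the commit: all three fields zero means "no Symbol".
    bool has_symbol() const { return symbol != 0 || sti != 0 || line != 0; }
};

// Copy out a symbol's text; the assert traps uninitialized Symbols early.
std::string symbol_name(const Symbol &s)
{
    assert(s.initialized() && "uninitialized Symbol used");
    return std::string(s.start, s.length);
}

// Only consult the symbol table when the Node really refers to a Symbol.
std::string node_symbol_name(const Node &n, const Symbol *table)
{
    if (!n.has_symbol()) {
        return std::string();   // nothing to print for this Node
    }
    return symbol_name(table[n.sti]);
}
```

The design point is that the check happens at the Node, before any Symbol is touched, so a poisoned Symbol is never dereferenced for a Node that has no symbol.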

Details This change is heavily inspired by 'mingodad' enabling '-Wall' in a pull request. The decision to go a bit further by turning all warnings into errors was also taken, to ensure that the tools help the developers write better code.

In both tools, the following changes have been made:

o Remove unused code & variables.

o Remove variables that are assigned but never used.

o Initialize variables that could be used uninitialized.

o Cast 'char' array indexes to 'unsigned char' because the signedness of the 'char' type is implementation-defined; if 'char' is signed, then negative array indexes would be used for values greater than 127.

o Remove commented-out code in proximity to other changes.

Future changes will increase the number of code-quality diagnostics turned on when building {dfa, lrstar}.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test
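The 'unsigned char' cast in the list above can be illustrated with a small sketch. The `CharTable` type and `classify()` function are hypothetical stand-ins for whatever lookup tables dfa and lrstar actually index by character; only the cast itself is the point.

```cpp
// Hypothetical 256-entry classification table indexed by character code.
struct CharTable {
    unsigned char cls[256] = {};
};

// Look up the class of a character.
inline unsigned char classify(const CharTable &t, char c)
{
    // Without the cast, a signed 'char' holding a byte above 127 would
    // convert to a negative index and read outside the table.
    return t.cls[static_cast<unsigned char>(c)];
}
```

On platforms where 'char' is unsigned the cast is a no-op, so the same source behaves identically on both kinds of implementation.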

Details Some versions of GCC coupled with default settings of Bash running under Ubuntu use non-ASCII characters to print diagnostic messages. This yields a less than aesthetic output because the messages will contain non-displayable characters. A fix for this problem is to set LANG to 'C'. This change does that, and as a result diagnostic messages look better now.

Testing

Visual inspection.

Details The lrstar program writes the ASCII value of { 0, 1 } to 'lrstar.txt' to communicate the return code it produced on the last run. If the value written to the file is not (ASCII) '0', dfa will silently terminate with a non-zero return code.

This is only necessary if lrstar and dfa are run separately from each other. While that may be desirable when iterating on a grammar, both tools are fast enough with the default settings to have a wrapper script manage the execution of both tools and simplify the overall workflow. For example, PLSQL, which creates ~2Mb of parser tables, takes only 20 seconds to process both the syntax and lexical grammars on an old Intel NUC; most grammars will be much faster than that. C11 takes 0.21 seconds.

To address this concern, and remove 'lrstar.txt', the following actions have been taken:

o A new 'lrgen' script has been created.
  This is now the preferred way to build the syntax and lexical grammars. In its current form, it is executed like so:

    lrgen --directory <path of directory holding grammars> \
          --grm <grammar name> [--grmopt <lrstar grammar options>] [--lgropt <dfa grammar options>]

  The syntax & lexical grammars must reside in the same directory, and they must have the same basename. This tool is the preferred way to generate the parser.

o All lrstar.txt files are removed.

o Both 'dfa' and 'lrstar' have been modified to not use 'lrstar.txt'.

o precommit-test has been updated to use 'lrgen' rather than invoking each tool separately.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

Details This change updates the lrgen tool to make the '--grm' argument optional. If the '--grm' argument is not provided, the last component of the '--directory' specification will be used as the grammar name.

Additionally, the '--help' text is elaborated, and a few invocation examples are provided.

Furthermore, 'lrgen' is now delivered as part of the build process.

Finally, some additional help text has been added to resolve issues with invoking lrstar & dfa, as well as the semantics for finding the executables.

Testing

    source ~/lrstar/lrstar/scripts/setup \
        --build-type debug \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    rm -rf ${LRSTAR_BUILD_DIR} # A really clean build for testing.
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    source ~/lrstar/lrstar/scripts/setup \
        --build-type release \
        --bod /tmp/lrstar && \
    cd ${LRSTAR_DIR}
    lrstar-build -j 20 clean
    lrstar-build -j 20
    ${LRSTAR_DIR}/scripts/precommit-test

    <open a new shell prompt, and attempt to use lrgen>

    /tmp/lrstar/release/usr/local/bin/lrgen \
        --directory <root>/lrstar/lrstar/grammars/ANTLR
    fatal: lrstar not found. See --help.

    PATH=${PATH}:/tmp/lrstar/release/usr/local/bin
    /tmp/lrstar/release/usr/local/bin/lrgen \
        --directory <root>/lrstar/lrstar/grammars/ANTLR

    LRSTAR 24.0.017 32b Copyright Paul B Mann.
    ANTLR.grm
    Grammar      213 rules, 43 terminals, 126 nonterminals.
    States       305 states in LALR(1) state machine.
                 148 states when using shift-reduce actions.
    Conflicts    4 states, 4 shift-reduce, 2 reduce-reduce.

                 rows   cols   matrix      list     vect    total
    B matrix       51 x   38 x 1 = 1,938 ->   169 +   234 =   403
    T matrix        9 x   38 x 2 =   684 ->   402 +   191 =   593
    N matrix       29 x   76 x 2 = 4,408 -> 1,038 +   509 = 1,547
    R matrix        9 x   14 x 1 =   121 ->   121 +   339 =   460
    Total                                   1,730 + 1,273 = 3,003

    0 min 0.011 sec, 31.416 MB, 0 warnings, 0 errors.

    DFA 24.0.016 32b Copyright Paul B Mann.
    ANTLR.lgr
    Grammar      1603 rules, 256 terminals, 63 nonterminals.
    States       5338 states before converting to a DFA.
                 136 states in final DFA state machine.
    Conflicts    0 states, 0 shift-reduce, 0 reduce-reduce.

                 rows   cols   matrix      list     vect    total
    T matrix      113 x   57 x 1 = 6,441 -> 6,441 +   664 = 7,105

    0 min 0.095 sec, 38.784 MB, 0 warnings, 0 errors.
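The rule for defaulting the grammar name (use the last component of the '--directory' value when '--grm' is absent) can be sketched as below. lrgen itself is a script, so this C++ function only illustrates the rule; `default_grammar_name()` is a hypothetical name, and the handling of trailing slashes is an assumption about reasonable behavior rather than something the commit message states.

```cpp
#include <string>

// Return the last path component of 'directory', ignoring trailing slashes.
// Illustrative only; not taken from the lrgen script.
std::string default_grammar_name(std::string directory)
{
    // Drop trailing '/' characters so "grammars/ANTLR/" behaves like "grammars/ANTLR".
    while (directory.size() > 1 && directory.back() == '/') {
        directory.pop_back();
    }
    const std::string::size_type slash = directory.find_last_of('/');
    return slash == std::string::npos ? directory : directory.substr(slash + 1);
}
```

With the directory from the test session above, the result would be "ANTLR", which matches the ANTLR.grm and ANTLR.lgr file names shown in the tool output.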

Details This change does the following:

o Adds information about sample grammars contained in this parser-generator repository.

o Attempts to ensure line breaks in markdown with two spaces at the end of lines. This may (or may not) work because Markdown is not fully standardized...

Testing

Visual inspection.

thutt added 30 commits October 18, 2023 21:28

[lrstar] Replace MAX_PATH with PATH_MAX

d839e84

Details Replace the non-standard MAX_PATH with PATH_MAX. The motivation is to use as many standard C identifiers as possible, to increase portability.

[src] Reformat code

76946aa

Details This change reformats the code to convert all tabs to 3 spaces, updates the emacs file local variables to turn off use of tabs and puts all global scoped symbols at column 0 in the files.

[grammars] Rename some grammar directories

3e94bbb

Details In preparation for a precommit test harness, the directories present in this change have been renamed so the directory name matches the names of the grammar and lexical grammar files.

[scripts] Fix lrstar-path & dfa-path

d2fd9b6

Details Fix both shell functions so they return the current name of the directory in which the build has taken place. These functions are necessary because the build-output-directory (BOD) changes based on the build type (debug, release).

[grammars] Update generated header files for precommit-harness

c92aae9

Details The header files generated by lrstar and dfa have been modified to use 'typedef' instead of '#define' to create some types. This is an acceptable change to the output files, which are updated here.
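One general reason to prefer 'typedef' over '#define' for type names is shown in the sketch below. The names are hypothetical and are not the types in the generated headers; the example only demonstrates the difference between textual substitution and a real type alias.

```cpp
#include <type_traits>

#define UCHAR_PTR_MACRO unsigned char *   // textual substitution only
typedef unsigned char *uchar_ptr_t;       // a real type name

UCHAR_PTR_MACRO m1, m2;   // expands to: unsigned char *m1, m2;  (m2 is not a pointer)
uchar_ptr_t     t1, t2;   // both are pointers

static_assert(std::is_same<decltype(m1), unsigned char *>::value, "m1 is a pointer");
static_assert(std::is_same<decltype(m2), unsigned char>::value,   "m2 is a plain unsigned char");
static_assert(std::is_same<decltype(t2), unsigned char *>::value, "t2 is a pointer");
```

A typedef also obeys scoping rules and appears by name in compiler diagnostics, which a macro does not.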

[lrstar] Make prt_message private to CM_Global module

1653e7f

Details As prt_message() is not used outside the CM_Global module, it is now declared static to this file, and its prototype is removed from the header file.

[src/lrstar] Untabify CM_Global.cpp

2b48b74

Details The change that reformatted all the source files to be tab-free missed this file. The oversight is now corrected. Diffs from git will now look better because tabs will not need to be expanded.

[README] Add basic setup & build information to README file

80019c9

Details This change simply adds some more information to the README.md file to help interested people configure a development environment and build the tools. The tools are not yet easily usable on Linux, but they do build.

[grammars] Create new baseline files for precommit-test

0b08bda

Details This change checkpoints all the files produced by the 'precommit-test' that are either new or changed. This will be a new baseline that will be used to measure new commits.

[grammars] Update log files

29433f7

Details Updating lrstar / dfa to generate a Makefile caused new log files to be generated. This change checkpoints the log files for comparison against future changes.

[include] Correct lrstar_basic_defs.h guard

8f7e938

Details The file guard for this file was not changed when the file was renamed. This change corrects that oversight.

[examples] Reformat to not use tabs

7fb0af7

Details This trivial change adds emacs formatting comments to these files, converts them from MS-DOS to Unix format, and converts tabs to 3 spaces. Testing Visual inspection.

[code] Reformat library code

74ede02

Details To move forwards, reformat code to use spaces instead of tabs, and start the lexical scoping at column 0. No semantic changes, only formatting.

thutt and others added 29 commits November 30, 2023 21:27

Fix lexer prt_line when debugging the lexer

14e98d4

mingodad closed this Dec 18, 2023
