lutaml/antlr4-native-rb

 
 


antlr4-native

Create a Ruby native extension from (almost) any ANTLR4 grammar.

What is this thing?

This gem generates native Ruby extensions from ANTLR grammars, enabling Ruby developers to generate parsers for numerous programming languages, file formats, etc.

Who needs this?

If you're a Ruby programmer who wants to parse and traverse source code written in a plethora of programming languages, antlr4-native might be able to help you. A number of community-developed ANTLR grammars are available in ANTLR's grammars-v4 repo. Grab one, then use antlr4-native to generate a bunch of Ruby-compatible C++ code from it. The C++ code can be compiled and used as a native extension.

Rather than use antlr4-native directly, consider using its sister project, the antlr-gemerator, which can generate a complete rubygem from an ANTLR grammar.

Code Generation

Here's how to generate a native extension for a given lexer and parser (Python in this case), defined in two .g4 files:

require 'antlr4-native'

generator = Antlr4Native::Generator.new(
  grammar_files:      ['Python3Lexer.g4', 'Python3Parser.g4'],
  output_dir:         'ext',
  parser_root_method: 'file_input'
)

generator.generate

In the example above, the output directory is set to the standard Ruby native extensions directory, 'ext'. antlr4-native will generate code into ext/<name>, where <name> is the name of the parser as defined in the grammar file(s). In this case, Python3Parser.g4 contains:

parser grammar Python3Parser;

so antlr4-native will generate code into the ext/python3-parser directory.
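The naming rule described above can be sketched in plain Ruby. This is only an illustration of how a grammar declaration maps to a directory name; parser_grammar_name and dasherize are hypothetical helpers written for this example, not methods exposed by antlr4-native, and the gem's own implementation may differ.

```ruby
# Hypothetical helpers for illustration; not part of antlr4-native's API.

# Pull the grammar name out of a "parser grammar Foo;" or "grammar Foo;" line.
# Returns nil when no such declaration is found (e.g. for a lexer grammar).
def parser_grammar_name(grammar_source)
  match = grammar_source.match(/^\s*(?:parser\s+)?grammar\s+(\w+)\s*;/)
  match && match[1]
end

# Convert a CamelCase grammar name into a dashed, lower-case directory name.
def dasherize(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1-\2').downcase
end

source = "parser grammar Python3Parser;\n"
name   = parser_grammar_name(source)
puts File.join('ext', dasherize(name))  # prints "ext/python3-parser"
```

Running the generator remains the authoritative way to find the output directory; the sketch only mirrors the example shown here.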

Finally, the parser_root_method option tells antlr4-native which context represents the root of the parse tree. This context functions as the starting point for visitors.

Using extensions in Ruby

Parsers expose class methods for parsing source code. Use Parser.parse to parse a string and Parser.parse_file to parse the contents of a file:

parser = Python3Parser::Parser.parse(File.read('path/to/file.py'))

# equivalent to:
parser = Python3Parser::Parser.parse_file('path/to/file.py')

Use the #visit method on an instance of Parser to make use of a visitor:

visitor = MyVisitor.new
parser.visit(visitor)

See the next section for more info regarding creating and using visitors.

Visitors

A visitor class is automatically created during code generation. Visitors are just classes with a bunch of special methods, each corresponding to a specific part of the source language's syntax. The methods are essentially callbacks that are triggered in-order as the parser walks over the parse tree. For example, here's a visitor with a method that will be called whenever the parser walks over a Python function definition:

class FuncDefVisitor < Python3Parser::Visitor
  def visit_func_def(ctx)
    puts ctx.NAME.text  # print the name of the function
    visit_children(ctx)
  end
end

Make sure to always call #visit_children at some point in your visit_* methods. If you don't, the subtree under the current context won't get visited.

Finally, if you override #initialize in your visitor subclasses, don't forget to call super. If you don't, you'll get a nice big segfault.
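The two rules above (call #visit_children, and call super from #initialize) can be seen in a pure-Ruby stand-in that runs without a compiled extension. Node and SketchVisitor are hypothetical classes written for this sketch; the real base visitor and contexts come from the generated C++ code and will not look exactly like this.

```ruby
# Pure-Ruby stand-ins for illustration only.
Node = Struct.new(:name, :children)

class SketchVisitor
  attr_reader :visited

  def initialize
    @visited = 0            # state the base class relies on
  end

  def visit(node)
    @visited += 1
    visit_node(node)
  end

  # Default behaviour: just descend into the children.
  def visit_node(node)
    visit_children(node)
  end

  def visit_children(node)
    node.children.each { |child| visit(child) }
  end
end

class NameCollector < SketchVisitor
  attr_reader :names

  def initialize
    super()                 # run the base initializer before adding state
    @names = []
  end

  def visit_node(node)
    @names << node.name
    visit_children(node)    # without this call, nested nodes are never reached
  end
end

class ShallowCollector < NameCollector
  def visit_node(node)
    @names << node.name     # no visit_children: traversal stops here
  end
end

tree = Node.new('outer', [Node.new('inner', [])])

deep = NameCollector.new
deep.visit(tree)
p deep.names      # => ["outer", "inner"]

shallow = ShallowCollector.new
shallow.visit(tree)
p shallow.names   # => ["outer"]
```

In this stand-in, skipping super() would leave @visited unset and raise a NoMethodError on the first visit; with the native extension the failure mode is the segfault described above.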

Caveats

  1. Avoid retaining references to contexts, tokens, etc. anywhere in your Ruby code. Contexts (i.e. the ctx variables in the examples above) and other objects created by ANTLR's C++ runtime are cleaned up automatically, without the Ruby interpreter's knowledge. You'll almost surely see a segfault if you retain a reference to one of these objects and try to use it after the call to Parser#visit returns.
  2. Due to an apparent ANTLR limitation, parsers cannot be used safely in a multi-threaded environment, even if each parser instance is confined to a single thread (i.e. parsers are not shared between threads). According to the ANTLR C++ developers, parsers should be thread-safe, but firsthand experience has proven otherwise. Your mileage may vary.
  3. The description of this gem says "(almost) any ANTLR4 grammar" because many grammars contain target-specific code. For example, the Python3 grammar referenced in the examples above contains inline Java code that the C++ compiler won't understand. You'll need to port any such code to C++ before you'll be able to compile and use the native extension.
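One way to follow the first caveat is to copy plain Ruby values out of each context during the visit and keep only those copies. The sketch below assumes, as in the FuncDefVisitor example, that ctx.NAME.text returns a Ruby String. FakeToken and FakeContext are hypothetical stand-ins so the code runs without a compiled extension; a real collector would subclass the generated visitor, call super from #initialize, and call visit_children.

```ruby
# Stand-ins for objects that would normally come from the C++ runtime.
FakeToken = Struct.new(:text)

class FakeContext
  def initialize(name)
    @name = name
  end

  # Mirrors the ctx.NAME accessor used in the visitor example.
  def NAME
    FakeToken.new(@name)
  end
end

class FuncNameCollector
  attr_reader :names

  def initialize
    @names = []
  end

  def visit_func_def(ctx)
    # Store a Ruby-owned copy of the text; never store ctx or its tokens.
    @names << ctx.NAME.text.dup
  end
end

collector = FuncNameCollector.new
collector.visit_func_def(FakeContext.new('main'))
collector.visit_func_def(FakeContext.new('helper'))
p collector.names   # => ["main", "helper"]
```

After the visit finishes, only the array of strings is used, so nothing in Ruby still points at memory owned by the parser.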

System Requirements

  • A Java runtime (version 1.6 or higher) is required to generate parsers, since ANTLR is a Java tool. The ANTLR .jar file is distributed inside the antlr4-native gem, so there's no need to download it separately; only the Java runtime itself has to be installed.
  • Ruby >= 2.3.
  • A C++ compiler (like gcc or clang) that supports C++14. If Ruby is working on your machine then you likely already have this.
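
A rough preflight check for these requirements might look like the following. The helper names are hypothetical and not part of antlr4-native, and the check is deliberately shallow: it only looks for executables on PATH and compares the Ruby version, without verifying the Java version or C++14 support.

```ruby
require 'rubygems'

# Hypothetical preflight helpers; not part of antlr4-native.

# Return the full path of the first matching executable on PATH, or nil.
def find_executable_on_path(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Compare a Ruby version string against the 2.3 minimum listed above.
def ruby_new_enough?(version = RUBY_VERSION)
  Gem::Version.new(version) >= Gem::Version.new('2.3')
end

compiler = %w[g++ clang++ c++].map { |c| find_executable_on_path(c) }.compact.first

puts "ruby >= 2.3:  #{ruby_new_enough?}"
puts "java:         #{find_executable_on_path('java') || 'not found'}"
puts "C++ compiler: #{compiler || 'not found'}"
```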

License

Licensed under the MIT license. See LICENSE.txt for details.
