LPGF: Linearisation-only PGF format #103

Draft
wants to merge 118 commits into
base: master

johnjcamilleri
Member

Introduction

Recently I've been working on resurrecting an old idea: adding support for a PGF file format that supports only linearisation, since this is actually quite a common use of GF. The motivations are:

  1. Faster & less memory-intensive compilation
  2. Smaller binary files
  3. Faster linearisation at runtime
  4. New features that are impossible when parsing must also be supported, e.g. a dynamic lexicon.

The format itself is described in section 2 of the paper:

"PGF: A Portable Run-Time Format for Type-Theoretical Grammars"
Angelov, Bringert, Ranta (2009).
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.640.6330&rep=rep1&type=pdf

(where it is confusingly called "PGF"; what we call "PGF" today is really "PMCFG", section 3 of the same paper).
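To make "linearisation-only" concrete, here is a toy sketch of the general idea: each abstract function maps to a term, and linearising a tree means evaluating that term over the already-linearised children. Everything in it (the `Term` constructors, `Grammar`, `linearise`, the two-function example grammar) is hypothetical and heavily simplified; it is not the data model in LPGF.hs and not a faithful rendering of the paper's format.

```haskell
module Main where

-- Hypothetical, simplified term language; NOT the types used in LPGF.hs.
data Term
  = Str String        -- a literal token
  | Concat Term Term  -- two terms in sequence
  | Tuple [Term]      -- a table or record flattened to an indexed array
  | Ix Int            -- a parameter value flattened to an index
  | Proj Term Term    -- pick one element of a tuple by index
  | Arg Int           -- the linearisation of the n-th child
  deriving (Show, Eq)

-- An abstract syntax tree: a function name applied to subtrees.
data Tree = App String [Tree]

-- One linearisation term per abstract function.
type Grammar = [(String, Term)]

-- Evaluate a term, given the linearised children of the current node.
eval :: [Term] -> Term -> Maybe Term
eval args t = case t of
  Arg i | i >= 0 && i < length args -> Just (args !! i)
        | otherwise                 -> Nothing
  Concat a b -> Concat <$> eval args a <*> eval args b
  Tuple ts   -> Tuple <$> mapM (eval args) ts
  Proj a b   -> do
    a' <- eval args a
    b' <- eval args b
    case (a', b') of
      (Tuple ts, Ix i) | i >= 0 && i < length ts -> Just (ts !! i)
      _                                          -> Nothing
  _ -> Just t  -- Str and Ix are already values

-- Linearise bottom-up: children first, then the function's own term.
linTerm :: Grammar -> Tree -> Maybe Term
linTerm g (App f kids) = do
  rule <- lookup f g
  args <- mapM (linTerm g) kids
  eval args rule

-- Collect the tokens of an evaluated term; a tuple defaults to its first element.
flatten :: Term -> [String]
flatten (Str s)       = [s]
flatten (Concat a b)  = flatten a ++ flatten b
flatten (Tuple (t:_)) = flatten t
flatten _             = []

linearise :: Grammar -> Tree -> Maybe String
linearise g t = unwords . flatten <$> linTerm g t

-- Toy grammar: a noun with singular/plural forms and two determiners.
grammar :: Grammar
grammar =
  [ ("Pizza", Tuple [Str "pizza", Str "pizzas"])
  , ("This",  Concat (Str "this")  (Proj (Arg 0) (Ix 0)))
  , ("These", Concat (Str "these") (Proj (Arg 0) (Ix 1)))
  ]

main :: IO ()
main = print (linearise grammar (App "These" [App "Pizza" []]))
-- prints: Just "these pizzas"
```

Nothing in this sketch builds a parsing structure; it only ever walks a tree top-down and evaluates terms, which is the property the motivations above rely on.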

Progress so far

This draft pull request contains the following:

  1. An implementation of the LPGF format and runtime (src/runtime/haskell/LPGF.hs) which is correct w.r.t. the PGF and PGF2 implementations, with the exception of:
    1. Linearisation of missing functions (low priority)
    2. Variants, which are intentionally not supported
  2. Compilation from GF (canonical) to LPGF (src/compiler/GF/Compiler/GrammarToLPGF.hs), which can be used in the expected way: gf --make --output-format=lpgf ...
  3. Test suite with unit-test, Foods, and Phrasebook grammars for testing correctness.
  4. Benchmark for comparing performance between PGF, PGF2 and LPGF.

Notable omissions

  • The LPGF runtime API needs some cleanup (in particular, one shouldn't need to import PGF to use LPGF).
  • The LPGF runtime should at least support type-checking of trees.
  • The GF shell doesn't support LPGF. Probably nice to have eventually, but not a priority.
  • There are no bindings from the [Haskell] LPGF runtime to other languages (nor actual implementations in other languages).

Performance

Unfortunately, so far I haven't been able to live up to all the performance goals:

  • LPGF files are [often] smaller than PGFs 😄
  • Runtime linearisation in LPGF is faster than both PGF and PGF2 🥳
  • Compiling to LPGF is at least as slow/memory-consuming as PGF, often significantly worse 😢

So my current focus is on trying to improve the performance of the LPGF compiler, with which I am struggling. I have done what I can with improving the data structures and algorithms used, but I am rather inexperienced with tinkering with strictness and other Haskell performance tuning. If anyone has more expertise in this area then please let me know and I can get more specific about where the bottlenecks are and what I've tried already. Until then, this pull request can remain open and be where any major updates to this project are made.
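For readers unfamiliar with what "tinkering with strictness" usually involves, the following is a generic illustration of three common tools: strict constructor fields, `foldl'` from `Data.List`, and bang patterns. It is not taken from the LPGF compiler, and the names `Stats`, `summarise` and `sumTo` are made up for the example.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.List (foldl')

-- Strict fields (!) are evaluated whenever a Stats value is evaluated,
-- so no chain of unevaluated additions builds up inside the record.
data Stats = Stats { count :: !Int, total :: !Int }

step :: Stats -> Int -> Stats
step (Stats c t) x = Stats (c + 1) (t + x)

-- foldl' evaluates the accumulator to weak head normal form at every step;
-- together with the strict fields this keeps memory use constant.
summarise :: [Int] -> Stats
summarise = foldl' step (Stats 0 0)

-- A bang pattern forces the accumulator of a hand-written loop.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = do
  let s = summarise [1 .. 1000000]
  print (count s, total s)
  print (sumTo 100)
```

Whether any of these help in a given compiler pass depends on where the thunks actually accumulate, which is normally established by heap profiling before changing any code.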

Still contains some hardcoded values, missing cases.

I notice now that LPGF and Canonical GF are almost identical, so maybe we don't need a new LPGF format,
just a linearization-only runtime which works on canonical grammars.
The argument for keeping LPGF is that it would be optimized for size and speed.
Also an unhandled Projection case
This avoids a lot of conversion back and forth between Strings and ByteStrings