Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a typed recursive ascent-descent backend #174

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions CHANGES
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
-----------------------------------------------------------------------------
1.21.0

* Add two new backends:
* Continuation-based LALR(1) (recursive ascent)
* Continuation-based RAD(1) (recursive ascent-descent)
* RAD generally produces smaller compiled code using less states
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... than a recursive ascent parser, which is an irrelevant comparison if all you know is the table-based LALR code generation scheme. Which is in fact smaller, so this is a bit misleading.

* Continuation-based form introduces a speed advantage over
table-based form

-----------------------------------------------------------------------------
1.20.0

Expand Down
Empty file added cabal.project.local
Empty file.
11 changes: 9 additions & 2 deletions happy.cabal
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
name: happy
version: 1.20.0
version: 1.21.0
license: BSD2
license-file: LICENSE
copyright: (c) Andy Gill, Simon Marlow
Expand Down Expand Up @@ -161,12 +161,14 @@ executable happy
build-depends: base < 5,
array,
containers >= 0.4.2,
dom-lt >= 0.2.2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB for other reviewers: A new non-boot dependency (not sure if that's a big deal)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is some kind of a deal

Currently happy is buildable with virtually any GHC you can get your hand on. With dom-lt it won't be true anymore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I wonder if we could remove the lower bound. Then we'd be fine, I think. AFAICT, the lower bound is because of AndreasPK/dom-lt#2, but if 0.2.0 and 0.2.1 were marked broken, the solver wouldn't pick these.

Copy link
Contributor

@phadej phadej Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That issue says that dom-lt-0.1.3 is broken as well, so that won't work.

It might be easier to say that happy-next would work only with GHC-8.0+. I think it is fine, not a small issue, but not huge either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly is dom-lt-0.2.2 not compatible with GHC-7? Maybe this can be dealt with in dom-lt?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly is dom-lt-0.2.2 not compatible with GHC-7

Maybe the code would work. I honestly just haven't tried it with GHC 7 as I haven't used that old an GHC in years by now.

If someone put's the effort in to make it work/test it with GHC 7 I have no issue lowering the bounds.

text,
mtl >= 2.2.1
-- mtl-2.2.1 added Control.Monad.Except

default-language: Haskell98
default-extensions: CPP, MagicHash, FlexibleContexts
ghc-options: -Wall
ghc-options: -Wall -Wno-name-shadowing -Wno-missing-signatures -Wno-type-defaults -Wno-incomplete-patterns -Wno-unused-local-binds -Wno-unused-local-binds -Wno-unused-matches
other-modules:
Paths_happy
AbsSyn
Expand All @@ -186,6 +188,11 @@ executable happy
AttrGrammarParser
ParamRules
PrettyGrammar
RADCodeGen
RADCodeGen_LALR
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about it a bit, I'd prefer it if you named the LALR backend "RA" for recursive ascent instead (so perhaps (RACodeGen). Because that's exactly what it is.

I'm also unsure if it's actually worth integrating the vanilla RA backend. I don't see a reason to pick it over the RAD backend.

Relatedly: The other backends seem to be named according to the convention Produce*Code. Stick to that. So ProduceRADCode, etc..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. There seems to be no advantage of the recursive ascent backend over the recursive ascent descent backend.

RADStateGen
RADTools
Follow

test-suite tests
type: exitcode-stdio-1.0
Expand Down
73 changes: 73 additions & 0 deletions src/Follow.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
module Follow where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty uncommon in the Haskell code I've seen so far to indent declaration after the module declaration with two spaces. I'd prefer we stick to the formatting conventions that are already present in happy.

Also I'd like to see an explicit export list, which makes it simpler to understand what the API is.

import Grammar
import RADTools
import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import GHC.Arr
import Data.List (findIndices, tails)
import NameSet (NameSet, empty, fromList, union, unions, delete, member, singleton)

-- Calculate the follow sets for all nonterminals in the grammar.
followArray :: Grammar -> ([Name] -> NameSet) -> Array Name NameSet
followArray g first = runST $ do
let bounds = liftM2 (,) head last (non_terminals g)
arr <- newArray bounds empty
startSymbols arr
updateRepeatedly arr first
elems <- getElems arr
return (listArray bounds elems)
where
startSymbols :: (STArray s Int NameSet) -> ST s ()
startSymbols arr = do
mapM_ (setEOF arr) (starts g)
setEOF :: (STArray s Int NameSet) -> (a, Int, b, c) -> ST s ()
setEOF arr (_, nt, _, _) = writeArray arr nt (singleton (eof_term g))

updateRepeatedly :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> ST s ()
updateRepeatedly arr first = do
old <- getElems arr
updateStep arr first
new <- getElems arr
if old == new then return () else updateRepeatedly arr first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, you would reuse/refactor GenUtils.mkClosure so that it could be used from both First.hs and Follow.hs. I see that this implementation might be more efficient than the one in First. I wonder: Couldn't it entirely be an implementation detail that updateStep modifies an STArray? Probably not, because of the mutability. And the definition of mkClosure is quite tiny, so maybe the code is fine as it is.


updateStep :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> ST s ()
updateStep arr first = mapM_ (updateRule arr first) (productions g)

updateRule :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> Production -> ST s ()
updateRule arr first (Production lhs rhs _ _) = mapM_ (updateNT arr lhs first) (tails rhs)

updateNT :: (STArray s Int NameSet) -> Name -> ([Name] -> NameSet) -> [Name] -> ST s ()
updateNT _ _ _ [] = return ()
updateNT arr lhs first (tok:rhsRest)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It should be sym, not tok, because it's ranging over non-terminals and terminals. In fact, the first guard makes sure it's always a non-terminal, never a token (terminal).

| not (elem tok (non_terminals g)) = return ()
| otherwise = do
let first' = first rhsRest
let first'' = delete epsilonTok first'
follow_lhs <- readArray arr lhs
let new_follow = union first'' (if member epsilonTok first' then follow_lhs else empty)
old_follow <- readArray arr tok
writeArray arr tok (union old_follow new_follow)

-- The lc-follow set of a single nonterminal, given a full "follow" array.
-- We only use rules where NT appears after the recognition point. If this is the case, enter into FOLLOW (not into LCFOLLOW),
-- i.e. recursive rules are processed as normal, irrespective of their recognition points.
lcfollow :: XGrammar -> ([Name] -> NameSet) -> (Array Name NameSet) -> Name -> NameSet
lcfollow x@(XGrammar { g = g }) first follow nt
| member nt startSymbols = union (singleton (eof_term g)) rest
| otherwise = rest
where
startSymbols = fromList $ map (\(_, a, _, _) -> a) (starts g)

rest = unions $ map (uncurry process) rules
rules = filter (rhsContains nt) (zip [0..] (productions g))
rhsContains nt (_, (Production _ rhs _ _)) = elem nt rhs

process :: Int -> Production -> NameSet
process ruleIndex (Production lhs rhs _ _) = unions $ map process' $ (reverse (findIndices (== nt) rhs)) where
process' i
| i < ((recognitionPoints x) !! ruleIndex) = empty
| member epsilonTok first_b = union (delete epsilonTok first_b) (follow ! lhs)
| otherwise = first_b
where
first_b = first (drop (i+1) rhs)
5 changes: 4 additions & 1 deletion src/Grammar.lhs
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,16 @@ Here is our mid-section datatype

> data Production
> = Production Name [Name] (String,[Int]) Priority
> deriving (Eq

#ifdef DEBUG

> deriving Show
> , Show

#endif

> )

> data Grammar
> = Grammar {
> productions :: [Production],
Expand Down
12 changes: 2 additions & 10 deletions src/LALR.lhs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Generation of LALR parsing tables.
> module LALR
> (genActionTable, genGotoTable, genLR0items, precalcClosure0,
> propLookaheads, calcLookaheads, mergeLookaheadInfo, countConflicts,
> Lr0Item(..), Lr1Item)
> Lr0Item(..), Lr1Item(..))
> where

> import GenUtils
Expand All @@ -35,15 +35,7 @@ Generation of LALR parsing tables.
This means rule $a$, with dot at $b$ (all starting at 0)

> data Lr0Item = Lr0 {-#UNPACK#-}!Int {-#UNPACK#-}!Int -- (rule, dot)
> deriving (Eq,Ord

#ifdef DEBUG

> ,Show

#endif

> )
> deriving (Eq,Ord,Show)

> data Lr1Item = Lr1 {-#UNPACK#-}!Int {-#UNPACK#-}!Int NameSet -- (rule, dot, lookahead)

Expand Down
113 changes: 112 additions & 1 deletion src/Main.lhs
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,9 @@ Path settings auto-generated by Cabal:
> import ProduceCode (produceParser)
> import ProduceGLRCode
> import Info (genInfoFile)
> import qualified RADCodeGen as RAD
> import qualified RADCodeGen_LALR as LALR
> import RADStateGen
> import Target (Target(..))
> import System.Console.GetOpt
> import Control.Monad ( liftM )
Expand All @@ -30,6 +33,7 @@ Path settings auto-generated by Cabal:
> import System.IO
> import Data.Array( assocs, elems, (!) )
> import Data.List( nub, isSuffixOf )
> import Data.Maybe (fromMaybe)
> import Data.Version ( showVersion )
#if defined(mingw32_HOST_OS)
> import Foreign.Marshal.Array
Expand Down Expand Up @@ -202,6 +206,84 @@ Add any special options or imports required by the parsing machinery.
> )
> in

-------------------------------------
Branch off to continuation-based LALR parser production:

> getForallMatch cli >>= \forallMatch ->
> let showTypes = OptCB_ShowTypes `elem` cli || rank2Types
> showComments = OptCB_ShowComments `elem` cli
> rank2Types = maybe False (return True) forallMatch
> match = fromMaybe "" forallMatch
> rulesTupleBased = OptCB_RAD_TupleBased `elem` cli
> in

> if OptCB_LALR `elem` cli
> then
> let (isMonad, _, parserType, _, _) = monad g
>
> ptype = case (Grammar.lexer g, isMonad) of
> (Nothing, False) -> LALR.Normal
> (Nothing, True) -> LALR.Monad
> (Just _, False) -> error "%lexer without %monad not supported in RAD"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about whether and how we can support it.

> (Just _, True) -> LALR.MonadLexer
>
> options = LALR.GenOptions {
> LALR.ptype = ptype,
> LALR.wrapperType = if parserType == "Parser" then "HappyP" else "Parser",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this can't be right, can it? The info you seek is encoded in the Grammar's monad field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this wrapperType is a wrapper around the user-supplied monad type (see 4.2.1 in here: the user-defined monad is called P, and we define a wrapper and call it Parser). This wrapper is always called "Parser", except when the Grammar's monad field is also called "Parser"; then we call the wrapper type "HappyP".

> LALR.errorTokenType = "ErrorToken",
> LALR.header = fromMaybe "" hd,
> LALR.footer = fromMaybe "" tl,
> LALR.showTypes = showTypes,
> LALR.comments = showComments,
> LALR.rank2Types = rank2Types,
> LALR.forallMatch = match
> }
>
> lalrStates = generateLALRStates g action goto items2
> in
> LALR.genCode options g lalrStates action goto >>=
> (if outfilename == "-" then putStr else writeFile outfilename)
>
> else

Branch off to continuation-based RAD parser production:

> if OptCB_RAD `elem` cli || OptCB_RAD_TupleBased `elem` cli
> then
>
> let (isMonad, _, parserType, _, _) = monad g
>
> optimize = OptCB_RAD_Optimizations `elem` cli
>
> ptype = case (Grammar.lexer g, isMonad) of
> (Nothing, False) -> RAD.Normal
> (Nothing, True) -> RAD.Monad
> (Just _, False) -> error "%lexer without %monad not supported in RAD"
> (Just _, True) -> RAD.MonadLexer
>
> options = RAD.GenOptions {
> RAD.ptype = ptype,
> RAD.wrapperType = if parserType == "Parser" then "HappyP" else "Parser",
> RAD.errorTokenType = "ErrorToken",
> RAD.header = fromMaybe "" hd,
> RAD.footer = fromMaybe "" tl,
> RAD.showTypes = showTypes,
> RAD.comments = showComments,
> RAD.rank2Types = rank2Types,
> RAD.rulesTupleBased = rulesTupleBased,
> RAD.forallMatch = match,
> RAD.optimize = optimize
> }
>
> lalrStates = generateLALRStates g action goto items2
> in
> createXGrammar g lalrStates >>= \x ->
> generateRADStates x lalrStates unused_rules >>= \radStates ->
> RAD.genCode options x radStates action goto >>=
> (if outfilename == "-" then putStr else writeFile outfilename)
>
> else


%---------------------------------------
Branch off to GLR parser production
Expand Down Expand Up @@ -403,6 +485,14 @@ The command line arguments.
> | OptGLR
> | OptGLR_Decode
> | OptGLR_Filter
>
> | OptCB_RAD
> | OptCB_RAD_TupleBased
> | OptCB_RAD_Optimizations
> | OptCB_LALR
> | OptCB_ShowTypes
> | OptCB_ShowComments
> | OptCB_ForallMatch (Maybe String)
> deriving Eq

> argInfo :: [OptDescr CLIFlags]
Expand Down Expand Up @@ -436,7 +526,21 @@ The command line arguments.
> Option ['?'] ["help"] (NoArg DumpHelp)
> "display this help and exit",
> Option ['V','v'] ["version"] (NoArg DumpVersion) -- ToDo: -v is deprecated
> "output version information and exit"
> "output version information and exit",
> Option [] ["cb-rad"] (NoArg OptCB_RAD)
> "create a continuation-based Recursive Ascent-Descent parser. Not compatible with most other options",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not compatible with most other options

Sigh. The ship about a consistent command-line interface has long sailed, I guess, but for a happy-2.0 we badly need a refactoring of all the possible modes here. Some flags only concern certain backends and should emit a warning or an error if used in unforeseen contexts.

> Option [] ["cb-rad-tuple"] (NoArg OptCB_RAD_TupleBased)
> "same as cb-rad, but uses tuples instead of continuations inside rule functions",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to integrate the RAD-Tuple backend. There's no advantage to it, it was merely an experiment. Kill it.

> Option [] ["optims"] (NoArg OptCB_RAD_Optimizations)
> "add optimizations such as eta-expansions and explicit rule-inlining to a RAD parser",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I don't see a reason why we wouldn't want to do the optimisations. Kill the flag, do it always.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess having ability to disable stuff for debugging purposes might be helpful?

> Option [] ["cb-lalr"] (NoArg OptCB_LALR)
> "create a continuation-based LALR parser. Not compatible with most other options",
> Option [] ["types"] (NoArg OptCB_ShowTypes)
> "show function types for continuation-based parsers",
> Option [] ["comments"] (NoArg OptCB_ShowComments)
> "show comments for continuation-based parsers",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could instead provide this useful info in the accompanying info file, generated with -iparser.info, I believe. I don't really want to introduce a new flag that is there simply for debugging purposes.

Copy link
Contributor Author

@knothed knothed Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. The RAD-specific info file generation is a point on the todo list.

Actually, as the comments do not increase compiled code size, I would simply always include them in the generated code. There is no benefit in not adding comments.

(Comments of state functions just state the core items of the respective state, while comments of rule functions state the rule with its recognition point. Both extremely useful when looking at the generated code.)

> Option [] ["forall"] (OptArg OptCB_ForallMatch "MATCH")
> "a string which is used to detect and handle higher-rank function types"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a flag for this. People that want to use higher-rank types should write a signature with forall, end of discussion. There's no situation where that wouldn't be possible. So kill the flag.

Copy link
Contributor Author

@knothed knothed Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same, but what about a grammar which has a semantic type like ForallStatement? Or different variations of the word forall? What should be the string that is used to correctly match for an actual forall-expression?


#ifdef DEBUG

Expand Down Expand Up @@ -572,6 +676,13 @@ Extract various command-line options.
> [] -> return Nothing
> f:fs -> return (Just (map toLower (last (f:fs))))

> getForallMatch :: [CLIFlags] -> IO (Maybe String)
> getForallMatch cli
> = case [ s | (OptCB_ForallMatch s) <- cli ] of
> [] -> return Nothing
> [f] -> return f
> _many -> dieHappy "multiple --forall options\n"

> getCoerce :: Target -> [CLIFlags] -> IO Bool
> getCoerce _target cli
> = if OptUseCoercions `elem` cli
Expand Down
Loading