-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a typed recursive ascent-descent backend #174
Changes from 8 commits
e50c825
75fa5f2
c940075
92efa8d
e403e9e
286f3bc
a5c67a6
5035c94
911473a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
name: happy | ||
version: 1.20.0 | ||
version: 1.21.0 | ||
license: BSD2 | ||
license-file: LICENSE | ||
copyright: (c) Andy Gill, Simon Marlow | ||
|
@@ -161,12 +161,14 @@ executable happy | |
build-depends: base < 5, | ||
array, | ||
containers >= 0.4.2, | ||
dom-lt >= 0.2.2, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NB for other reviewers: A new non-boot dependency (not sure if that's a big deal) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is some kind of a deal
Currently There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm. I wonder if we could remove the lower bound. Then we'd be fine, I think. AFAICT, the lower bound is because of AndreasPK/dom-lt#2, but if 0.2.0 and 0.2.1 were marked broken, the solver wouldn't pick these. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That issue says that It might be easier to say that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why exactly is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Maybe the code would work. I honestly just haven't tried it with GHC 7 as I haven't used that old an GHC in years by now. If someone put's the effort in to make it work/test it with GHC 7 I have no issue lowering the bounds. |
||
text, | ||
mtl >= 2.2.1 | ||
-- mtl-2.2.1 added Control.Monad.Except | ||
|
||
default-language: Haskell98 | ||
default-extensions: CPP, MagicHash, FlexibleContexts | ||
ghc-options: -Wall | ||
ghc-options: -Wall -Wno-name-shadowing -Wno-missing-signatures -Wno-type-defaults -Wno-incomplete-patterns -Wno-unused-local-binds -Wno-unused-local-binds -Wno-unused-matches | ||
other-modules: | ||
Paths_happy | ||
AbsSyn | ||
|
@@ -186,6 +188,11 @@ executable happy | |
AttrGrammarParser | ||
ParamRules | ||
PrettyGrammar | ||
RADCodeGen | ||
RADCodeGen_LALR | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having thought about it a bit, I'd prefer it if you named the LALR backend "RA" for recursive ascent instead (so perhaps ( I'm also unsure if it's actually worth integrating the vanilla RA backend. I don't see a reason to pick it over the RAD backend. Relatedly: The other backends seem to be named according to the convention There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True. There seems to be no advantage of the recursive ascent backend over the recursive ascent descent backend. |
||
RADStateGen | ||
RADTools | ||
Follow | ||
|
||
test-suite tests | ||
type: exitcode-stdio-1.0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
module Follow where | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's pretty uncommon in the Haskell code I've seen so far to indent declaration after the Also I'd like to see an explicit export list, which makes it simpler to understand what the API is. |
||
import Grammar | ||
import RADTools | ||
import Control.Monad | ||
import Control.Monad.ST | ||
import Data.Array.ST | ||
import GHC.Arr | ||
import Data.List (findIndices, tails) | ||
import NameSet (NameSet, empty, fromList, union, unions, delete, member, singleton) | ||
|
||
-- Calculate the follow sets for all nonterminals in the grammar. | ||
followArray :: Grammar -> ([Name] -> NameSet) -> Array Name NameSet | ||
followArray g first = runST $ do | ||
let bounds = liftM2 (,) head last (non_terminals g) | ||
arr <- newArray bounds empty | ||
startSymbols arr | ||
updateRepeatedly arr first | ||
elems <- getElems arr | ||
return (listArray bounds elems) | ||
where | ||
startSymbols :: (STArray s Int NameSet) -> ST s () | ||
startSymbols arr = do | ||
mapM_ (setEOF arr) (starts g) | ||
setEOF :: (STArray s Int NameSet) -> (a, Int, b, c) -> ST s () | ||
setEOF arr (_, nt, _, _) = writeArray arr nt (singleton (eof_term g)) | ||
|
||
updateRepeatedly :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> ST s () | ||
updateRepeatedly arr first = do | ||
old <- getElems arr | ||
updateStep arr first | ||
new <- getElems arr | ||
if old == new then return () else updateRepeatedly arr first | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ideally, you would reuse/refactor |
||
|
||
updateStep :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> ST s () | ||
updateStep arr first = mapM_ (updateRule arr first) (productions g) | ||
|
||
updateRule :: (STArray s Int NameSet) -> ([Name] -> NameSet) -> Production -> ST s () | ||
updateRule arr first (Production lhs rhs _ _) = mapM_ (updateNT arr lhs first) (tails rhs) | ||
|
||
updateNT :: (STArray s Int NameSet) -> Name -> ([Name] -> NameSet) -> [Name] -> ST s () | ||
updateNT _ _ _ [] = return () | ||
updateNT arr lhs first (tok:rhsRest) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: It should be |
||
| not (elem tok (non_terminals g)) = return () | ||
| otherwise = do | ||
let first' = first rhsRest | ||
let first'' = delete epsilonTok first' | ||
follow_lhs <- readArray arr lhs | ||
let new_follow = union first'' (if member epsilonTok first' then follow_lhs else empty) | ||
old_follow <- readArray arr tok | ||
writeArray arr tok (union old_follow new_follow) | ||
|
||
-- The lc-follow set of a single nonterminal, given a full "follow" array. | ||
-- We only use rules where NT appears after the recognition point. If this is the case, enter into FOLLOW (not into LCFOLLOW), | ||
-- i.e. recursive rules are processed as normal, irrespective of their recognition points. | ||
lcfollow :: XGrammar -> ([Name] -> NameSet) -> (Array Name NameSet) -> Name -> NameSet | ||
lcfollow x@(XGrammar { g = g }) first follow nt | ||
| member nt startSymbols = union (singleton (eof_term g)) rest | ||
| otherwise = rest | ||
where | ||
startSymbols = fromList $ map (\(_, a, _, _) -> a) (starts g) | ||
|
||
rest = unions $ map (uncurry process) rules | ||
rules = filter (rhsContains nt) (zip [0..] (productions g)) | ||
rhsContains nt (_, (Production _ rhs _ _)) = elem nt rhs | ||
|
||
process :: Int -> Production -> NameSet | ||
process ruleIndex (Production lhs rhs _ _) = unions $ map process' $ (reverse (findIndices (== nt) rhs)) where | ||
process' i | ||
| i < ((recognitionPoints x) !! ruleIndex) = empty | ||
| member epsilonTok first_b = union (delete epsilonTok first_b) (follow ! lhs) | ||
| otherwise = first_b | ||
where | ||
first_b = first (drop (i+1) rhs) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,9 @@ Path settings auto-generated by Cabal: | |
> import ProduceCode (produceParser) | ||
> import ProduceGLRCode | ||
> import Info (genInfoFile) | ||
> import qualified RADCodeGen as RAD | ||
> import qualified RADCodeGen_LALR as LALR | ||
> import RADStateGen | ||
> import Target (Target(..)) | ||
> import System.Console.GetOpt | ||
> import Control.Monad ( liftM ) | ||
|
@@ -30,6 +33,7 @@ Path settings auto-generated by Cabal: | |
> import System.IO | ||
> import Data.Array( assocs, elems, (!) ) | ||
> import Data.List( nub, isSuffixOf ) | ||
> import Data.Maybe (fromMaybe) | ||
> import Data.Version ( showVersion ) | ||
#if defined(mingw32_HOST_OS) | ||
> import Foreign.Marshal.Array | ||
|
@@ -202,6 +206,84 @@ Add any special options or imports required by the parsing machinery. | |
> ) | ||
> in | ||
|
||
------------------------------------- | ||
Branch off to continuation-based LALR parser production: | ||
|
||
> getForallMatch cli >>= \forallMatch -> | ||
> let showTypes = OptCB_ShowTypes `elem` cli || rank2Types | ||
> showComments = OptCB_ShowComments `elem` cli | ||
> rank2Types = maybe False (return True) forallMatch | ||
> match = fromMaybe "" forallMatch | ||
> rulesTupleBased = OptCB_RAD_TupleBased `elem` cli | ||
> in | ||
|
||
> if OptCB_LALR `elem` cli | ||
> then | ||
> let (isMonad, _, parserType, _, _) = monad g | ||
> | ||
> ptype = case (Grammar.lexer g, isMonad) of | ||
> (Nothing, False) -> LALR.Normal | ||
> (Nothing, True) -> LALR.Monad | ||
> (Just _, False) -> error "%lexer without %monad not supported in RAD" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should think about whether and how we can support it. |
||
> (Just _, True) -> LALR.MonadLexer | ||
> | ||
> options = LALR.GenOptions { | ||
> LALR.ptype = ptype, | ||
> LALR.wrapperType = if parserType == "Parser" then "HappyP" else "Parser", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, this can't be right, can it? The info you seek is encoded in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this wrapperType is a wrapper around the user-supplied monad type (see 4.2.1 in here: the user-defined monad is called |
||
> LALR.errorTokenType = "ErrorToken", | ||
> LALR.header = fromMaybe "" hd, | ||
> LALR.footer = fromMaybe "" tl, | ||
> LALR.showTypes = showTypes, | ||
> LALR.comments = showComments, | ||
> LALR.rank2Types = rank2Types, | ||
> LALR.forallMatch = match | ||
> } | ||
> | ||
> lalrStates = generateLALRStates g action goto items2 | ||
> in | ||
> LALR.genCode options g lalrStates action goto >>= | ||
> (if outfilename == "-" then putStr else writeFile outfilename) | ||
> | ||
> else | ||
|
||
Branch off to continuation-based RAD parser production: | ||
|
||
> if OptCB_RAD `elem` cli || OptCB_RAD_TupleBased `elem` cli | ||
> then | ||
> | ||
> let (isMonad, _, parserType, _, _) = monad g | ||
> | ||
> optimize = OptCB_RAD_Optimizations `elem` cli | ||
> | ||
> ptype = case (Grammar.lexer g, isMonad) of | ||
> (Nothing, False) -> RAD.Normal | ||
> (Nothing, True) -> RAD.Monad | ||
> (Just _, False) -> error "%lexer without %monad not supported in RAD" | ||
> (Just _, True) -> RAD.MonadLexer | ||
> | ||
> options = RAD.GenOptions { | ||
> RAD.ptype = ptype, | ||
> RAD.wrapperType = if parserType == "Parser" then "HappyP" else "Parser", | ||
> RAD.errorTokenType = "ErrorToken", | ||
> RAD.header = fromMaybe "" hd, | ||
> RAD.footer = fromMaybe "" tl, | ||
> RAD.showTypes = showTypes, | ||
> RAD.comments = showComments, | ||
> RAD.rank2Types = rank2Types, | ||
> RAD.rulesTupleBased = rulesTupleBased, | ||
> RAD.forallMatch = match, | ||
> RAD.optimize = optimize | ||
> } | ||
> | ||
> lalrStates = generateLALRStates g action goto items2 | ||
> in | ||
> createXGrammar g lalrStates >>= \x -> | ||
> generateRADStates x lalrStates unused_rules >>= \radStates -> | ||
> RAD.genCode options x radStates action goto >>= | ||
> (if outfilename == "-" then putStr else writeFile outfilename) | ||
> | ||
> else | ||
|
||
|
||
%--------------------------------------- | ||
Branch off to GLR parser production | ||
|
@@ -403,6 +485,14 @@ The command line arguments. | |
> | OptGLR | ||
> | OptGLR_Decode | ||
> | OptGLR_Filter | ||
> | ||
> | OptCB_RAD | ||
> | OptCB_RAD_TupleBased | ||
> | OptCB_RAD_Optimizations | ||
> | OptCB_LALR | ||
> | OptCB_ShowTypes | ||
> | OptCB_ShowComments | ||
> | OptCB_ForallMatch (Maybe String) | ||
> deriving Eq | ||
|
||
> argInfo :: [OptDescr CLIFlags] | ||
|
@@ -436,7 +526,21 @@ The command line arguments. | |
> Option ['?'] ["help"] (NoArg DumpHelp) | ||
> "display this help and exit", | ||
> Option ['V','v'] ["version"] (NoArg DumpVersion) -- ToDo: -v is deprecated | ||
> "output version information and exit" | ||
> "output version information and exit", | ||
> Option [] ["cb-rad"] (NoArg OptCB_RAD) | ||
> "create a continuation-based Recursive Ascent-Descent parser. Not compatible with most other options", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sigh. The ship about a consistent command-line interface has long sailed, I guess, but for a |
||
> Option [] ["cb-rad-tuple"] (NoArg OptCB_RAD_TupleBased) | ||
> "same as cb-rad, but uses tuples instead of continuations inside rule functions", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we want to integrate the |
||
> Option [] ["optims"] (NoArg OptCB_RAD_Optimizations) | ||
> "add optimizations such as eta-expansions and explicit rule-inlining to a RAD parser", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similarly, I don't see a reason why we wouldn't want to do the optimisations. Kill the flag, do it always. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess having ability to disable stuff for debugging purposes might be helpful? |
||
> Option [] ["cb-lalr"] (NoArg OptCB_LALR) | ||
> "create a continuation-based LALR parser. Not compatible with most other options", | ||
> Option [] ["types"] (NoArg OptCB_ShowTypes) | ||
> "show function types for continuation-based parsers", | ||
> Option [] ["comments"] (NoArg OptCB_ShowComments) | ||
> "show comments for continuation-based parsers", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if we could instead provide this useful info in the accompanying info file, generated with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, makes sense. The RAD-specific info file generation is a point on the todo list. Actually, as the comments do not increase compiled code size, I would simply always include them in the generated code. There is no benefit in not adding comments. (Comments of state functions just state the core items of the respective state, while comments of rule functions state the rule with its recognition point. Both extremely useful when looking at the generated code.) |
||
> Option [] ["forall"] (OptArg OptCB_ForallMatch "MATCH") | ||
> "a string which is used to detect and handle higher-rank function types" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we need a flag for this. People that want to use higher-rank types should write a signature with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought the same, but what about a grammar which has a semantic type like |
||
|
||
#ifdef DEBUG | ||
|
||
|
@@ -572,6 +676,13 @@ Extract various command-line options. | |
> [] -> return Nothing | ||
> f:fs -> return (Just (map toLower (last (f:fs)))) | ||
|
||
> getForallMatch :: [CLIFlags] -> IO (Maybe String) | ||
> getForallMatch cli | ||
> = case [ s | (OptCB_ForallMatch s) <- cli ] of | ||
> [] -> return Nothing | ||
> [f] -> return f | ||
> _many -> dieHappy "multiple --forall options\n" | ||
|
||
> getCoerce :: Target -> [CLIFlags] -> IO Bool | ||
> getCoerce _target cli | ||
> = if OptUseCoercions `elem` cli | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... than a recursive ascent parser, which is an irrelevant comparison if all you know is the table-based LALR code generation scheme. Which is in fact smaller, so this is a bit misleading.