diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..8f71f43 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "{}" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright {yyyy} {name of copyright owner} + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + diff --git a/README.md b/README.md new file mode 100644 index 0000000..00c7dee --- /dev/null +++ b/README.md @@ -0,0 +1,45 @@ +# erlml +erlml - an ML implementation for the Erlang/OTP VM + +ErlML +===== + +ErlML is an implementation of the ML programming language, running +on the Erlang/OTP virtual machine. + +ErlML currently aims for SML'97 compatibility, in order to facilitate +bootstrapping, but will likely later evolve into a unique ML dialect +better suited for the Erlang/OTP VM. + +ErlML is written in a subset of SML'97 (with runtime support written +in Erlang), and compiles SML'97 to Core Erlang and then to BEAM code. + +Status +====== + +ErlML currently requires an SML'97 compiler for compiling itself. +Moscow ML 2.10 is known to work. 
+ +The generated Core Erlang code requires an Erlang/OTP installation for +compiling it to BEAM and then executing it. OTP-20 is known to work. + +Features: +- ErlML performs separate compilation of individual signatures and + structures. It persists derived information about these compilation + units in .basis files, and fetches those automatically when unbound + signatures or structures are referenced. It is required that each + source file contains exactly ONE signature (.sig file) or structure + (.sml file) declaration with the same name as the source file (except + for the .sig or .sml extension). +- Eventually ErlML will support interfacing with Erlang, but that is + not yet implemented. + +Omissions: +- functors are not supported, and may never be +- sub-structures are not supported, and may never be +- sharing constraints are not supported, and may never be +- exceptions are currently not SML-compliant +- support for the SML Basis Library is incomplete, but improving +- the type checker is incomplete, but improving +- no documentation yet +- no Makefile yet diff --git a/src/compiler/ABSYN.sig b/src/compiler/ABSYN.sig new file mode 100644 index 0000000..58f2bd9 --- /dev/null +++ b/src/compiler/ABSYN.sig @@ -0,0 +1,174 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature ABSYN = + sig + + type ident = Basis.ident + + datatype longid + = LONGID of ident list * ident + + datatype label + = IDlab of ident + | INTlab of int + + (* Type Expressions *) + + type tyvar + = ident + + type tycon + = ident + + type longtycon + = longid + + datatype ty + = VARty of tyvar + | RECty of (label * ty) list + | CONSty of ty list * longtycon + | FUNty of ty * ty + + (* Patterns *) + + type vid + = ident + + type longvid + = longid + + datatype scon + = INTsc of IntInf.int + | WORDsc of IntInf.int + | REALsc of real + | STRINGsc of string + | CHARsc of char + + datatype pat + = WILDpat + | SCONpat of scon + | VIDpat of longvid * Basis.idstatus option ref + | RECpat of (label * pat) list * bool + | CONSpat of longvid * pat + | TYPEDpat of pat * ty + | ASpat of vid * pat + + datatype typbind + = TYPBIND of (tyvar list * tycon * ty) list + + datatype conbind + = CONBIND of (vid * ty option) list + + datatype datbind + = DATBIND of (tyvar list * tycon * conbind) list + + datatype exbind + = CONexb of vid + | OFexb of vid * ty + | EQexb of vid * longvid + + datatype dec + = DEC of dec1 list + + and dec1 + = VALdec of tyvar list * (pat * exp) list * (pat * match) list + | TYPEdec of typbind + | DATATYPEdec of datbind * typbind + | DATAREPLdec of tycon * longtycon + | ABSTYPEdec of datbind * typbind * dec + | EXdec of exbind list + | LOCALdec of dec * dec + | OPENdec of longid list + + and exp + = SCONexp of scon + | VIDexp of longvid * Basis.idstatus option ref + | RECexp of (label * exp) list + | LETexp of dec * exp + | APPexp of exp * exp + | TYPEDexp of exp * ty + | HANDLEexp of exp * match + | RAISEexp of exp + | FNexp of match + + and match + = MATCH of (pat * exp) list + + type sigid + = ident + + type strid + = ident + + type longstrid + = longid + + type funid + = ident + + datatype spec + = SPEC of spec1 list + + and spec1 + = VALspec of (ident * ty) list + | TYPEspec of (tyvar list * tycon) list + | EQTYPEspec of (tyvar list * 
tycon) list + | DATATYPEspec of datbind + | DATAREPLspec of tycon * longtycon + | EXspec of conbind + | STRUCTUREspec of (strid * sigexp) list + | INCLUDEspec of sigexp + | SHARINGTYspec of spec * longtycon list + | SHARINGSTRspec of spec * longstrid list + + and sigexp + = SPECsigexp of spec + | SIGIDsigexp of sigid + | WHEREsigexp of sigexp * tyvar list * longtycon * ty + + datatype sigbind + = SIGBIND of (sigid * sigexp) list + + datatype strdec + = STRDEC of strdec1 list + + and strdec1 + = DECstrdec of dec + | STRUCTUREstrdec of strbind + | LOCALstrdec of strdec * strdec + + and strbind + = STRBIND of (strid * strexp) list + + and strexp + = STRUCTstrexp of strdec + | LONGSTRIDstrexp of longstrid + | TRANSPARENTstrexp of strexp * sigexp * Basis.env option ref + | OPAQUEstrexp of strexp * sigexp * Basis.env option ref + | FUNAPPstrexp of funid * strexp + | LETstrexp of strdec * strexp + + datatype funbind + = FUNBIND of (funid * strid * sigexp * strexp) list + + datatype topdec + = STRDECtopdec of strdec + | SIGDECtopdec of sigbind + | FUNDECtopdec of funbind + + val gensym : unit -> ident (* for creating fresh VIDs and STRIDs *) + + end diff --git a/src/compiler/Absyn.sml b/src/compiler/Absyn.sml new file mode 100644 index 0000000..659643e --- /dev/null +++ b/src/compiler/Absyn.sml @@ -0,0 +1,183 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure Absyn : ABSYN = + struct + + type ident = Basis.ident + + datatype longid + = LONGID of ident list * ident + + datatype label + = IDlab of ident + | INTlab of int + + (* Type Expressions *) + + type tyvar + = ident + + type tycon + = ident + + type longtycon + = longid + + datatype ty + = VARty of tyvar + | RECty of (label * ty) list + | CONSty of ty list * longtycon + | FUNty of ty * ty + + (* Patterns *) + + type vid + = ident + + type longvid + = longid + + datatype scon + = INTsc of IntInf.int + | WORDsc of IntInf.int + | REALsc of real + | STRINGsc of string + | CHARsc of char + + datatype pat + = WILDpat + | SCONpat of scon + | VIDpat of longvid * Basis.idstatus option ref + | RECpat of (label * pat) list * bool + | CONSpat of longvid * pat + | TYPEDpat of pat * ty + | ASpat of vid * pat + + datatype typbind + = TYPBIND of (tyvar list * tycon * ty) list + + datatype conbind + = CONBIND of (vid * ty option) list + + datatype datbind + = DATBIND of (tyvar list * tycon * conbind) list + + datatype exbind + = CONexb of vid + | OFexb of vid * ty + | EQexb of vid * longvid + + datatype dec + = DEC of dec1 list + + and dec1 + = VALdec of tyvar list * (pat * exp) list * (pat * match) list + | TYPEdec of typbind + | DATATYPEdec of datbind * typbind + | DATAREPLdec of tycon * longtycon + | ABSTYPEdec of datbind * typbind * dec + | EXdec of exbind list + | LOCALdec of dec * dec + | OPENdec of longid list + + and exp + = SCONexp of scon + | VIDexp of longvid * Basis.idstatus option ref + | RECexp of (label * exp) list + | LETexp of dec * exp + | APPexp of exp * exp + | TYPEDexp of exp * ty + | HANDLEexp of exp * match + | RAISEexp of exp + | FNexp of match + + and match + = MATCH of (pat * exp) list + + type sigid + = ident + + type strid + = ident + + type longstrid + = longid + + type funid + = ident + + datatype spec + = SPEC of spec1 list + + and spec1 + = VALspec of (ident * ty) list + | TYPEspec of (tyvar list * tycon) list + | EQTYPEspec of (tyvar 
list * tycon) list + | DATATYPEspec of datbind + | DATAREPLspec of tycon * longtycon + | EXspec of conbind + | STRUCTUREspec of (strid * sigexp) list + | INCLUDEspec of sigexp + | SHARINGTYspec of spec * longtycon list + | SHARINGSTRspec of spec * longstrid list + + and sigexp + = SPECsigexp of spec + | SIGIDsigexp of sigid + | WHEREsigexp of sigexp * tyvar list * longtycon * ty + + datatype sigbind + = SIGBIND of (sigid * sigexp) list + + datatype strdec + = STRDEC of strdec1 list + + and strdec1 + = DECstrdec of dec + | STRUCTUREstrdec of strbind + | LOCALstrdec of strdec * strdec + + and strbind + = STRBIND of (strid * strexp) list + + and strexp + = STRUCTstrexp of strdec + | LONGSTRIDstrexp of longstrid + | TRANSPARENTstrexp of strexp * sigexp * Basis.env option ref + | OPAQUEstrexp of strexp * sigexp * Basis.env option ref + | FUNAPPstrexp of funid * strexp + | LETstrexp of strdec * strexp + + datatype funbind + = FUNBIND of (funid * strid * sigexp * strexp) list + + datatype topdec + = STRDECtopdec of strdec + | SIGDECtopdec of sigbind + | FUNDECtopdec of funbind + + val gensym = + let val counter = ref 0 + in + fn () => + let val c = 1 + !counter + in + counter := c; + Int.toString c + end + end + + end diff --git a/src/compiler/BASIS.sig b/src/compiler/BASIS.sig new file mode 100644 index 0000000..d7b68ef --- /dev/null +++ b/src/compiler/BASIS.sig @@ -0,0 +1,42 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature BASIS = + sig + + type ident = string (* TODO: for now *) + val identCompare : ident * ident -> order + + datatype idstatus = CON of bool (* hasarg? *) + | EXN of bool (* hasarg? *) + | VAL + datatype valenv = VE of (ident, idstatus) Dict.dict (* TODO: add TypeScheme *) + + datatype env = E of strenv * valenv (* TODO: add TyEnv *) + and strenv = SE of (ident, env) Dict.dict + + datatype sigma = SIG of env (* TODO: add TyNameSet? *) + datatype sigenv = SIGE of (ident, sigma) Dict.dict + + datatype basis = BASIS of sigenv * env (* TODO: add FunEnv *) + + val emptyEnv : env + val emptyBasis : basis + val initialBasis : basis + + val write : string * basis -> unit + val read : string -> basis + + end diff --git a/src/compiler/Basis.sml b/src/compiler/Basis.sml new file mode 100644 index 0000000..dcca0a9 --- /dev/null +++ b/src/compiler/Basis.sml @@ -0,0 +1,271 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Basis : BASIS = + struct + + type ident = string (* TODO: for now *) + val identCompare = String.compare + + datatype idstatus = CON of bool (* hasarg? *) + | EXN of bool (* hasarg? 
*) + | VAL + datatype valenv = VE of (ident, idstatus) Dict.dict (* TODO: add TypeScheme *) + + datatype env = E of strenv * valenv (* TODO: add TyEnv *) + and strenv = SE of (ident, env) Dict.dict + + datatype sigma = SIG of env (* TODO: add TyNameSet? *) + datatype sigenv = SIGE of (ident, sigma) Dict.dict + + datatype basis = BASIS of sigenv * env (* TODO: add FunEnv *) + + val emptyEnv = E(SE(Dict.empty identCompare), + VE(Dict.empty identCompare)) + + val emptyBasis = BASIS(SIGE(Dict.empty identCompare), emptyEnv) + + (* Initial Basis (TODO: incomplete) *) + + val veTextIO = VE(Dict.fromList(identCompare, [("output", VAL), ("stdOut", VAL)])) + + val initialSigEnv = SIGE(Dict.empty identCompare) + val initialValEnv = VE(Dict.empty identCompare) + val initialStrEnv = SE(Dict.fromList(identCompare, [("TextIO", E(SE(Dict.empty identCompare), veTextIO))])) + val initialEnv = E(initialStrEnv, initialValEnv) + val initialBasis = BASIS(initialSigEnv, initialEnv) + + (* File I/O of basis objects *) + + exception Basis + + fun error msg = + (TextIO.output(TextIO.stdErr, "Error reading basis file: " ^ msg ^ "\n"); raise Basis) + + fun prematureEof wanted = + error("expected " ^ wanted ^ ", got premature eof") + + fun expected(wanted, got) = + error("expected " ^ wanted ^ ", got " ^ Char.toString got) + + fun readEof is = + case TextIO.endOfStream is + of true => () + | false => error "trailing garbage" + + fun readChar(is, wanted) = + case TextIO.input1 is + of SOME c => if c = wanted then () else expected(Char.toString wanted, c) + | NONE => prematureEof(Char.toString wanted) + + (* I/O of version marker at start of basis file *) + + val version = #"0" + + fun writeVersion os = + TextIO.output1(os, version) + + fun readVersion is = + readChar(is, version) + + (* I/O of identifiers *) + + fun writeIdent(os, ident) = + (TextIO.output1(os, #"\""); + TextIO.output(os, ident); + TextIO.output1(os, #"\"")) + + fun readIdent is = + let val _ = readChar(is, #"\"") + fun loop cs 
= + case TextIO.input1 is + of SOME #"\"" => String.implode(List.rev cs) + | SOME c => loop(c :: cs) + | NONE => prematureEof "identifier" + in + loop [] + end + + (* I/O of IdStatus *) + + fun writeIdStatus(os, idStatus) = + let val (c, hasargOpt) = + case idStatus + of CON hasarg => (#"c", SOME hasarg) + | EXN hasarg => (#"e", SOME hasarg) + | VAL => (#"v", NONE) + in + TextIO.output1(os, c); + case hasargOpt + of SOME false => TextIO.output1(os, #"0") + | SOME true => TextIO.output1(os, #"1") + | NONE => () + end + + fun readIdStatus is = + let fun readHasArg is = + case TextIO.input1 is + of SOME #"0" => false + | SOME #"1" => true + | SOME c => expected("constatus hasarg (0|1)", c) + | NONE => prematureEof "constatus hasarg (0|1)" + in + case TextIO.input1 is + of SOME #"c" => CON(readHasArg is) + | SOME #"e" => EXN(readHasArg is) + | SOME #"v" => VAL + | SOME c => expected("idstatus (c|e|v)", c) + | NONE => prematureEof "idstatus (c|e|v)" + end + + (* I/O of Dict.dict *) + + fun writeDict(os, dict, writeMapping) = + (TextIO.output1(os, #"["); + Dict.fold(writeMapping, os, dict); + TextIO.output1(os, #"]")) + + fun readDict(is, readMapping) = + let val _ = readChar(is, #"[") + fun loop dict = + case TextIO.lookahead is + of SOME #"]" => (readChar(is, #"]"); dict) + | _ => loop(readMapping(is, dict)) + in + loop(Dict.empty identCompare) + end + + (* I/O of valenv *) + + fun writeValenvMapping(vid, idStatus, os) = + (TextIO.output1(os, #"{"); + writeIdent(os, vid); + TextIO.output1(os, #" "); + writeIdStatus(os, idStatus); + TextIO.output1(os, #"}"); + os) + + fun readValenvMapping(is, dict) = + let val _ = readChar(is, #"{") + val vid = readIdent is + val _ = readChar(is, #" ") + val idStatus = readIdStatus is + val _ = readChar(is, #"}") + in + Dict.insert(dict, vid, idStatus) + end + + fun writeValenv(os, VE dict) = + writeDict(os, dict, writeValenvMapping) + + fun readValenv is = + VE(readDict(is, readValenvMapping)) + + (* I/O of env and strenv *) + + fun 
writeEnv(os, E(strenv, valenv)) = + (TextIO.output1(os, #"("); + writeStrenv(os, strenv); + writeValenv(os, valenv); + TextIO.output1(os, #")")) + + and writeStrenv(os, SE dict) = + writeDict(os, dict, writeStrenvMapping) + + and writeStrenvMapping(strid, env, os) = + (TextIO.output1(os, #"{"); + writeIdent(os, strid); + writeEnv(os, env); + TextIO.output1(os, #"}"); + os) + + fun readEnv is = + let val _ = readChar(is, #"(") + val strenv = readStrenv is + val valenv = readValenv is + val _ = readChar(is, #")") + in + E(strenv, valenv) + end + + and readStrenv is = + SE(readDict(is, readStrenvMapping)) + + and readStrenvMapping(is, dict) = + let val _ = readChar(is, #"{") + val strid = readIdent is + val env = readEnv is + val _ = readChar(is, #"}") + in + Dict.insert(dict, strid, env) + end + + (* I/O of sigenv *) + + fun writeSigenvMapping(sigid, SIG env, os) = + (TextIO.output1(os, #"{"); + writeIdent(os, sigid); + writeEnv(os, env); + TextIO.output1(os, #"}"); + os) + + fun readSigenvMapping(is, dict) = + let val _ = readChar(is, #"{") + val sigid = readIdent is + val env = readEnv is + val _ = readChar(is, #"}") + in + Dict.insert(dict, sigid, SIG env) + end + + fun writeSigenv(os, SIGE dict) = + writeDict(os, dict, writeSigenvMapping) + + fun readSigenv is = + SIGE(readDict(is, readSigenvMapping)) + + (* I/O of basis *) + + fun writeBasis(os, BASIS(sigenv, env)) = + (writeVersion os; + TextIO.output1(os, #"("); + writeSigenv(os, sigenv); + writeEnv(os, env); + TextIO.output1(os, #")")) + + fun readBasis is = + let val _ = readVersion is + val _ = readChar(is, #"(") + val sigenv = readSigenv is + val env = readEnv is + val _ = readChar(is, #")") + val _ = readEof is + in + BASIS(sigenv, env) + end + + fun write(file, basis) = + let val os = TextIO.openOut file + in + Util.after(fn() => writeBasis(os, basis), fn() => TextIO.closeOut os) + end + + fun read file = + let val is = TextIO.openIn file + in + Util.after(fn() => readBasis is, fn() => TextIO.closeIn is) 
+ end + + end diff --git a/src/compiler/CORE_ERLANG.sig b/src/compiler/CORE_ERLANG.sig new file mode 100644 index 0000000..2cba985 --- /dev/null +++ b/src/compiler/CORE_ERLANG.sig @@ -0,0 +1,92 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * Core Erlang representation. + * + * Omissions: + * - literals omit binaries and maps + * - 'case' clauses omit guards + * - all entities omit annotations + * - no support for multiple values, so 'let' binds a single variable, 'of' in + * 'try' binds a single variable, and 'case' clauses have a single pattern + * - 'apply' requires the function position to be a variable rather than any + * expression, for bug compatibility with OTP < 20.1 (see OTP PR #1526) + *) +signature CORE_ERLANG = + sig + + type atom = string + type integer = IntInf.int + type float = real + type var = string + + datatype literal + = L_ATOM of atom + | L_INTEGER of integer + | L_FLOAT of float + | L_STRING of string + | L_NIL + + datatype constant + = C_LITERAL of literal + | C_CONS of constant * constant + | C_TUPLE of constant list + + datatype pat + = P_LITERAL of literal + | P_CONS of pat * pat + | P_TUPLE of pat list + | P_VARIABLE of var + | P_ALIAS of var * pat + + datatype fname + = FNAME of atom * int + + datatype fvar + = FV of var + | FN of fname + + datatype expr + = E_LITERAL of literal + | E_CONS of expr * expr + | E_TUPLE of expr list + | E_VARIABLE of var + | E_FNAME of fname 
+ | E_FUN of funexpr + | E_LET of var * expr * expr + | E_LETREC of (fname * funexpr) list * expr + | E_APPLY of fvar * expr list + | E_CALL of expr * expr * expr list + | E_PRIMOP of atom * expr list + | E_CASE of expr * (pat * expr) list + | E_TRY of expr * var * expr * var * var * var * expr + + and funexpr + = FUN of var list * expr + + datatype module + = MODULE of atom * fname list * (atom * constant) list * (fname * funexpr) list + + val mkVar: string option -> var + + (* convert basic entities to Core Erlang notation *) + + val integerToString: integer -> string + val floatToString: float -> string + val atomToString: atom -> string + val stringToString: string -> string + val varToString: var -> string + + end diff --git a/src/compiler/CORE_ERLANG_PRINT.sig b/src/compiler/CORE_ERLANG_PRINT.sig new file mode 100644 index 0000000..57efcda --- /dev/null +++ b/src/compiler/CORE_ERLANG_PRINT.sig @@ -0,0 +1,21 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature CORE_ERLANG_PRINT = + sig + + val printModule: string * CoreErlang.module -> unit + + end diff --git a/src/compiler/CoreErlang.sml b/src/compiler/CoreErlang.sml new file mode 100644 index 0000000..249ec14 --- /dev/null +++ b/src/compiler/CoreErlang.sml @@ -0,0 +1,173 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * Core Erlang representation. + * + * Omissions: + * - literals omit binaries and maps + * - 'case' clauses omit guards + * - all entities omit annotations + * - no support for multiple values, so 'let' binds a single variable, 'of' in + * 'try' binds a single variable, and 'case' clauses have a single pattern + * - 'apply' requires the function position to be a variable rather than any + * expression, for bug compatibility with OTP < 20.1 (see OTP PR #1526) + *) +structure CoreErlang : CORE_ERLANG = + struct + + type atom = string + type integer = IntInf.int + type float = real + type var = string + + datatype literal + = L_ATOM of atom + | L_INTEGER of integer + | L_FLOAT of float + | L_STRING of string + | L_NIL + + datatype constant + = C_LITERAL of literal + | C_CONS of constant * constant + | C_TUPLE of constant list + + datatype pat + = P_LITERAL of literal + | P_CONS of pat * pat + | P_TUPLE of pat list + | P_VARIABLE of var + | P_ALIAS of var * pat + + datatype fname + = FNAME of atom * int + + datatype fvar + = FV of var + | FN of fname + + datatype expr + = E_LITERAL 
of literal + | E_CONS of expr * expr + | E_TUPLE of expr list + | E_VARIABLE of var + | E_FNAME of fname + | E_FUN of funexpr + | E_LET of var * expr * expr + | E_LETREC of (fname * funexpr) list * expr + | E_APPLY of fvar * expr list + | E_CALL of expr * expr * expr list + | E_PRIMOP of atom * expr list + | E_CASE of expr * (pat * expr) list + | E_TRY of expr * var * expr * var * var * var * expr + + and funexpr + = FUN of var list * expr + + datatype module + = MODULE of atom * fname list * (atom * constant) list * (fname * funexpr) list + + val mkVar = + let val counter = ref 0 + in + fn (SOME name) => "V" ^ name + | NONE => + let val c = 1 + !counter + in + counter := c; + "V" ^ Int.toString c + end + end + + (* convert basic entities to Core Erlang notation *) + + fun tildeToDash #"~" = #"-" + | tildeToDash c = c + + fun integerToString int = + String.map tildeToDash (IntInf.toString int) + + (* TODO: work around Moscow ML lack of Real.isFinite + exception BadFloat of real + + fun floatToString f = + if Real.isFinite f then String.map tildeToDash (Real.toString f) + else raise (BadFloat f) + *) + fun floatToString float = + String.map tildeToDash (Real.toString float) + + fun charToOctal3 c = + let val s = Int.fmt StringCvt.OCT (Char.ord c) + in + case String.size s + of 1 => "00" ^ s + | 2 => "0" ^ s + | _ => s + end + + fun escapeString(s, quote) = + let fun escape c = + case c + of + (* Escapes that are the same in SML and Core Erlang *) + #"\b" => [#"\\", #"b"] + | #"\t" => [#"\\", #"t"] + | #"\n" => [#"\\", #"n"] + | #"\v" => [#"\\", #"v"] + | #"\f" => [#"\\", #"f"] + | #"\r" => [#"\\", #"r"] + | #"\\" => [#"\\", #"\\"] + (* Core Erlang only escapes (except for the single and double quotes) *) + | #"\u001b" => [#"\\", #"e"] + | #" " => [#"\\", #"s"] + | #"\u007f" => [#"\\", #"d"] + (* \a for \u0007 is SML-only *) + | _ => + if c = quote then [#"\\", c] + else if Char.isPrint c then [c] + else #"\\" :: String.explode(charToOctal3 c) + fun loop([], res) = 
String.implode(quote :: List.rev(quote :: res)) + | loop(c :: cs, res) = loop(cs, List.rev(escape c) @ res) + in + loop(String.explode s, []) + end + + fun atomToString atom = escapeString(atom, #"'") + + fun stringToString str = escapeString(str, #"\"") + + fun varToString var = + let fun mangle c = (* _OOO *) + #"_" :: String.explode(charToOctal3 c) + fun toInitial c = + if c = #"_" then [c, c] + else if Char.isUpper c then [c] + else mangle c + fun toSubsequent c = + case c + of #"_" => [c, c] + | #"@" => [c] + | _ => + if Char.isAlphaNum c then [c] + else mangle c + fun loop([], res) = String.implode(List.rev res) + | loop(c :: cs, res) = loop(cs, List.rev(toSubsequent c) @ res) + val cs = String.explode var + in + loop(tl cs, List.rev(toInitial(hd cs))) + end + + end diff --git a/src/compiler/CoreErlangPrint.sml b/src/compiler/CoreErlangPrint.sml new file mode 100644 index 0000000..640ad1b --- /dev/null +++ b/src/compiler/CoreErlangPrint.sml @@ -0,0 +1,219 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *)
+structure CoreErlangPrint : CORE_ERLANG_PRINT =
+  struct
+
+    fun prList(os, lparen, rparen, f, xs) =
+      (TextIO.output1(os, lparen);
+       (case xs
+         of (x::xs') =>
+            (f(os, x); List.app (fn(x) => (TextIO.output1(os, #","); f(os, x))) xs')
+          | [] => ());
+       TextIO.output1(os, rparen))
+
+    fun prAtom(os, a) = TextIO.output(os, CoreErlang.atomToString a)
+
+    fun prLiteral(os, lit) =
+      case lit
+       of CoreErlang.L_ATOM a =>
+          prAtom(os, a)
+        | CoreErlang.L_INTEGER i =>
+          TextIO.output(os, CoreErlang.integerToString i)
+        | CoreErlang.L_FLOAT f =>
+          TextIO.output(os, CoreErlang.floatToString f)
+        | CoreErlang.L_STRING s =>
+          TextIO.output(os, CoreErlang.stringToString s)
+        | CoreErlang.L_NIL =>
+          TextIO.output(os, "[]")
+
+    fun prConstant(os, cst) =
+      case cst
+       of CoreErlang.C_LITERAL lit =>
+          prLiteral(os, lit)
+        | CoreErlang.C_CONS(c1, c2) =>
+          (* output lists as lists not right-recursive trees *)
+          let fun loop(CoreErlang.C_CONS(c1, c2)) =
+                  (TextIO.output1(os, #",");
+                   prConstant(os, c1);
+                   loop c2)
+                | loop(CoreErlang.C_LITERAL CoreErlang.L_NIL) =
+                  TextIO.output1(os, #"]")
+                | loop c =
+                  (TextIO.output1(os, #"|");
+                   prConstant(os, c);
+                   TextIO.output1(os, #"]"))
+          in
+            TextIO.output1(os, #"[");
+            prConstant(os, c1);
+            loop c2
+          end
+        | CoreErlang.C_TUPLE cs =>
+          prList(os, #"{", #"}", prConstant, cs)
+
+    fun prVar(os, v) = TextIO.output(os, CoreErlang.varToString v)
+
+    fun prPat(os, pat) =
+      case pat
+       of CoreErlang.P_LITERAL l =>
+          prLiteral(os, l)
+        | CoreErlang.P_CONS(p1, p2) =>
+          (* output lists as lists not right-recursive trees *)
+          let fun loop(CoreErlang.P_CONS(p1, p2)) =
+                  (TextIO.output1(os, #",");
+                   prPat(os, p1);
+                   loop p2)
+                | loop(CoreErlang.P_LITERAL CoreErlang.L_NIL) =
+                  TextIO.output1(os, #"]")
+                | loop p =
+                  (TextIO.output1(os, #"|");
+                   prPat(os, p);
+                   TextIO.output1(os, #"]"))
+          in
+            TextIO.output1(os, #"[");
+            prPat(os, p1);
+            loop p2
+          end
+        | CoreErlang.P_TUPLE ps =>
+          prList(os, #"{", #"}", prPat, ps)
+        | CoreErlang.P_VARIABLE v =>
+
prVar(os, v) + | CoreErlang.P_ALIAS(v, p) => + (prVar(os, v); + TextIO.output1(os, #"="); + prPat(os, p)) + + fun prFName(os, CoreErlang.FNAME(name,arity)) = + (prAtom(os, name); + TextIO.output1(os, #"/"); + TextIO.output(os, Int.toString arity)) + + fun prFVar(os, CoreErlang.FV v) = prVar(os, v) + | prFVar(os, CoreErlang.FN n) = prFName(os, n) + + fun prExpr(os, expr) = + case expr + of CoreErlang.E_LITERAL l => + prLiteral(os, l) + | CoreErlang.E_CONS(e1, e2) => + (* output lists as lists not right-recursive trees *) + let fun loop(CoreErlang.E_CONS(e1, e2)) = + (TextIO.output1(os, #","); + prExpr(os, e1); + loop e2) + | loop(CoreErlang.E_LITERAL CoreErlang.L_NIL) = + TextIO.output1(os, #"]") + | loop e = + (TextIO.output1(os, #"|"); + prExpr(os, e); + TextIO.output1(os, #"]")) + in + TextIO.output1(os, #"["); + prExpr(os, e1); + loop e2 + end + | CoreErlang.E_TUPLE es => + prList(os, #"{", #"}", prExpr, es) + | CoreErlang.E_VARIABLE v => + prVar(os, v) + | CoreErlang.E_FNAME fname => + prFName(os, fname) + | CoreErlang.E_FUN fexpr => + prFunExpr(os, fexpr) + | CoreErlang.E_LET(v, e1, e2) => + (TextIO.output(os, "let "); + prVar(os, v); + TextIO.output1(os, #"="); + prExpr(os, e1); + TextIO.output(os, " in "); + prExpr(os, e2)) + | CoreErlang.E_LETREC(fundefs, body) => + (TextIO.output(os, "letrec "); + List.app (fn (fundef) => prFunDef(os, fundef)) fundefs; + TextIO.output(os, " in "); + prExpr(os, body)) + | CoreErlang.E_APPLY(fvar, es) => + (TextIO.output(os, "apply "); + prFVar(os, fvar); + prList(os, #"(", #")", prExpr, es)) + | CoreErlang.E_CALL(m, f, es) => + (TextIO.output(os, "call "); + prExpr(os, m); + TextIO.output1(os, #":"); + prExpr(os, f); + prList(os, #"(", #")", prExpr, es)) + | CoreErlang.E_PRIMOP(a, es) => + (TextIO.output(os, "primop "); + prAtom(os, a); + prList(os, #"(", #")", prExpr, es)) + | CoreErlang.E_CASE(e, cs) => + (TextIO.output(os, "case "); + prExpr(os, e); + TextIO.output(os, " of "); + List.app (fn(p, e) => (prPat(os, p); 
TextIO.output(os, " when 'true' -> "); prExpr(os, e))) cs; + TextIO.output(os, " end")) + | CoreErlang.E_TRY(e1, v1, e2, cv1, cv2, cv3, ce) => + (TextIO.output(os, "try "); + prExpr(os, e1); + TextIO.output(os, " of "); + prVar(os, v1); + TextIO.output(os, " -> "); + prExpr(os, e2); + TextIO.output(os, " catch "); + prList(os, #"<", #">", prVar, [cv1, cv2, cv3]); + TextIO.output(os, " -> "); + prExpr(os, ce)) + + and prFunExpr(os, CoreErlang.FUN(vs, e)) = + (TextIO.output(os, "fun "); + prList(os, #"(", #")", prVar, vs); + TextIO.output(os, " -> "); + prExpr(os, e)) + + and prFunDef(os, (fname, fexpr)) = + (prFName(os, fname); + TextIO.output1(os, #"="); + prFunExpr(os, fexpr)) + + fun prExports(os, exports) = + prList(os, #"[", #"]", prFName, exports) + + fun prAttribute(os, (atom, constant)) = + (prAtom(os, atom); + TextIO.output1(os, #"="); + prConstant(os, constant)) + + fun prAttributes(os, attributes) = + (TextIO.output(os, "attributes "); + prList(os, #"[", #"]", prAttribute, attributes)) + + fun prModule(os, CoreErlang.MODULE(name,exports,attributes,fundefs)) = + (TextIO.output(os, "module "); + prAtom(os, name); + TextIO.output1(os, #"\n"); + prExports(os, exports); + TextIO.output1(os, #"\n"); + prAttributes(os, attributes); + TextIO.output1(os, #"\n"); + List.app (fn fundef => (prFunDef(os, fundef); TextIO.output1(os, #"\n"))) fundefs; + TextIO.output(os, "end\n")) + + fun printModule(file, module) = + let val os = TextIO.openOut file + in + Util.after(fn() => prModule(os, module), fn() => TextIO.closeOut os) + end + + end diff --git a/src/compiler/DICT.sig b/src/compiler/DICT.sig new file mode 100644 index 0000000..12cb0f0 --- /dev/null +++ b/src/compiler/DICT.sig @@ -0,0 +1,31 @@ +(* + * Copyright 1996, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature DICT = + sig + + (* structure Key : ORD_KEY *) (* XXX: TODO *) + + type ('key, 'value) dict + + val empty : ('key * 'key -> order) -> ('key, 'value) dict + val insert : ('key, 'value) dict * 'key * 'value -> ('key, 'value) dict + val find' : ('key, 'value) dict * 'key -> ('key * 'value) option + val find : ('key, 'value) dict * 'key -> 'value option + val plus : ('key, 'value) dict * ('key, 'value) dict -> ('key, 'value) dict + val fold : ('key * 'value * 'b -> 'b) * 'b * ('key, 'value) dict -> 'b + val fromList: ('key * 'key -> order) * ('key * 'value) list -> ('key, 'value) dict + + end (* signature DICT *) diff --git a/src/compiler/Dict.sml b/src/compiler/Dict.sml new file mode 100644 index 0000000..443782b --- /dev/null +++ b/src/compiler/Dict.sml @@ -0,0 +1,159 @@ +(* + * Copyright 1996, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * De-functorized version of: + * + * aa-dict-fn.sml + * Written by: Mikael Pettersson, 1996. 
+ * + * An SML implementation of dictionaries based on balanced search + * trees as described in the following article: + * + * @InProceedings{Andersson93, + * author = {Arne Andersson}, + * title = {Balanced Search Trees Made Simple}, + * pages = {60--71}, + * crossref = {WADS93} + * } + * + * @Proceedings{WADS93, + * title = {Proceedings of the Third Workshop on Algorithms and Data Structures, WADS'93}, + * booktitle ={Proceedings of the Third Workshop on Algorithms and Data Structures, WADS'93}, + * editor = {F. Dehne and J-R. Sack and N. Santoro and S. Whitesides}, + * publisher ={Springer-Verlag}, + * series = {Lecture Notes In Computer Science}, + * volume = 709, + * year = 1993 + * } + * + * The original Pascal code represented empty trees by a single shared node + * with level 0, and whose left and right fields pointed back to itself. + * The point of this trick was to eliminate special cases in the skew and + * split procedures. Since it would be expensive to emulate, this SML code + * uses a traditional representation, making all special cases explicit. + * + * This is the vanilla version with no optimizations applied. 
+ *) +structure Dict : DICT = + struct + + local + + datatype ('key, 'value) tree + = E + | T of {key: 'key, + attr: 'value, + level: int, + left: ('key, 'value) tree, + right: ('key, 'value) tree } + + fun split(t as E) = t + | split(t as T{right=E,...}) = t + | split(t as T{right=T{right=E,...},...}) = t + | split(t as T{key=kx,attr=ax,level=lx,left=a, + right=T{key=ky,attr=ay,left=b, + right=(z as T{level=lz,...}),...}}) = + if lx = lz then (* rotate left *) + T{key=ky,attr=ay,level=lx+1,right=z, + left=T{key=kx,attr=ax,level=lx,left=a,right=b}} + else t + + fun skew(t as E) = t + | skew(t as T{left=E,...}) = t + | skew(t as T{key=kx,attr=ax,level=lx,right=c, + left=T{key=ky,attr=ay,level=ly,left=a,right=b}}) = + if lx = ly then (* rotate right *) + T{key=ky,attr=ay,level=ly,left=a, + right=T{key=kx,attr=ax,level=lx,left=b,right=c}} + else t + + fun tfind(compare, t, x) = + let fun look(E) = E + | look(t as T{key,left,right,...}) = + case compare(x, key) + of LESS => look left + | GREATER => look right + | EQUAL => t + in + look t + end + + fun tinsert(compare, t, x, y) = + let fun insert'(E, x, y) = T{key=x, attr=y, level=1, left=E, right=E} + | insert'(T{key,attr,level,left,right}, x, y) = + let val t = case compare(x,key) + of LESS => + T{key=key, attr=attr, level=level, right=right, + left=insert'(left,x,y)} + | GREATER => + T{key=key, attr=attr, level=level, left=left, + right=insert'(right,x,y)} + | EQUAL => + T{key=x, attr=y, level=level, left=left, right=right} + val t = skew t + val t = split t + in + t + end + in + insert'(t, x, y) + end + + in + + datatype ('key, 'value) dict + = DICT of {compare: 'key * 'key -> order, + tree: ('key, 'value) tree} + + fun empty compare = DICT{compare = compare, tree = E} + + fun insert(DICT{compare, tree}, x, y) = + DICT{compare = compare, tree = tinsert(compare, tree, x, y)} + + fun find'(DICT{compare, tree}, x) = + case tfind(compare, tree, x) + of E => NONE + | T{key,attr,...} => SOME(key,attr) + + fun 
find(DICT{compare, tree}, x) = + case tfind(compare, tree, x) + of E => NONE + | T{attr,...} => SOME attr + + fun plus(DICT{compare, tree}, DICT{tree = tree2, ...}) = + let fun plus'(bot, E) = bot + | plus'(bot, T{key,attr,left,right,...}) = + tinsert(compare, plus'(plus'(bot, left), right), key, attr) + in + DICT{compare = compare, tree = plus'(tree, tree2)} + end + + fun fold(f, init, DICT{tree = dict, ...}) = + let fun traverse(E, state) = state + | traverse(T{key,attr,left,right,...}, state) = + traverse(right, traverse(left, f(key,attr,state))) + in + traverse(dict, init) + end + + fun fromList(compare, alist) = + let fun insert((x, y), t) = tinsert(compare, t, x, y) + in + DICT{compare = compare, tree = List.foldl insert E alist} + end + + end (* local *) + + end (* structure Dict *) diff --git a/src/compiler/LEXARG.sig b/src/compiler/LEXARG.sig new file mode 100644 index 0000000..b64e09e --- /dev/null +++ b/src/compiler/LEXARG.sig @@ -0,0 +1,29 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature LEXARG = + sig + + type pos = int + type lexarg + + val new : string * TextIO.instream -> lexarg + val input1 : lexarg -> char option * pos + exception Unget + val unget1 : lexarg * char * pos -> unit + val error : lexarg -> string * pos * pos -> unit + val source : lexarg -> Source.source + + end (* signature LEXARG *) diff --git a/src/compiler/LEXER.sig b/src/compiler/LEXER.sig new file mode 100644 index 0000000..d4e1c8a --- /dev/null +++ b/src/compiler/LEXER.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature LEXER = + sig + val lex : LexArg.lexarg -> Token.token + end diff --git a/src/compiler/LEXUTIL.sig b/src/compiler/LEXUTIL.sig new file mode 100644 index 0000000..1bc7a76 --- /dev/null +++ b/src/compiler/LEXUTIL.sig @@ -0,0 +1,29 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature LEXUTIL = + sig + + val decint : string -> IntInf.int + val hexint : string -> IntInf.int +(* + val icon : string -> IntInf.int option +*) + val rcon : string -> real +(* + val ccon : string -> char + val scon : string -> string +*) + end (* signature LEXUTIL *) diff --git a/src/compiler/LexArg.sml b/src/compiler/LexArg.sml new file mode 100644 index 0000000..c6b8fb7 --- /dev/null +++ b/src/compiler/LexArg.sml @@ -0,0 +1,82 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure LexArg : LEXARG = + struct + + type pos = int + + datatype lexarg + = A of { fileName : string, + instream : TextIO.instream, + newLines : int list ref, + thisLine : pos ref, + readPos : int ref, + unget : (char * pos) option ref} + + fun new(fileName, instream) = + let val pos = Source.startPos + 1 + in + A{fileName = fileName, + instream = instream, + newLines = ref [], + thisLine = ref pos, + readPos = ref pos, + unget = ref NONE} + end + + fun input1(A{instream, newLines, thisLine, readPos, unget, ...}) = + case !unget + of SOME(ch, pos) => (unget := NONE; (SOME ch, pos)) + | NONE => + let val pos = !readPos + val chOpt = + case TextIO.input1 instream + of NONE => NONE + | SOME ch => + let val incr = + case ch + of #"\n" => + (newLines := pos :: !newLines; + thisLine := pos + 1; + 1) + | #"\t" => + let val lpos = pos - !thisLine + in + 8 - Int.rem(lpos, 8) + end + | _ => 1 + in + readPos := pos + incr; + SOME ch + end + in + (chOpt, pos) + end + + exception Unget + + fun unget1(A{unget, ...}, ch, pos) = + case !unget + of NONE => unget := SOME(ch, pos) + | SOME _ => raise Unget + + fun source(A{fileName,newLines,...}) = + Source.SOURCE{fileName = fileName, newLines = !newLines} + + fun error lexarg (msg,left,right) = + Source.sayMsg (source lexarg) ("Error: "^msg, left, right) + + end (* structure LexArg *) diff --git a/src/compiler/LexUtil.sml b/src/compiler/LexUtil.sml new file mode 100644 index 0000000..20d611f --- /dev/null +++ b/src/compiler/LexUtil.sml @@ -0,0 +1,194 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure LexUtil : LEXUTIL = + struct + + fun mkinfint(s, indx) = + let val size = String.size s + val (isneg, indx) = case String.sub(s, indx) + of #"~" => (true, indx + 1) + | _ => (false, indx) + fun scan(i, indx) = + (* This does a "i * 10 - c" loop followed by a negation if the + number is positive, because that gives extra precision with + fixed-precision 'int'. IntInf.int doesn't need that, but + this makes the code more reusable. *) + if indx < size then + let val c = Char.ord(String.sub(s, indx)) - Char.ord #"0" + val i = IntInf.-(IntInf.*(i, IntInf.fromInt 10), IntInf.fromInt c) + in + scan(i, indx + 1) + end + else if isneg then i else IntInf.~ i + in + scan(IntInf.fromInt 0, indx) + end + + fun mkint(s, indx) = IntInf.toInt(mkinfint(s, indx)) + + fun decint s = mkinfint(s, 0) + + fun hexint s = (* (~?0|0w)x{hexdigit}+ *) + let val size = String.size s + val indx = 0 + val (isneg, indx) = case String.sub(s, indx) + of #"~" => (true, indx + 1) + | _ => (false, indx) + val indx = indx + 1 (* skip "0" *) + val indx = case String.sub(s, indx) + of #"w" => indx + 1 + | _ => indx + val indx = indx + 1 (* skip "x" *) + fun digit c = + if c >= #"a" then Char.ord c - (Char.ord #"a" - 10) + else if c >= #"A" then Char.ord c - (Char.ord #"A" - 10) + else Char.ord c - Char.ord #"0" + fun scan(i, indx) = + if indx < size then + let val c = digit(String.sub(s, indx)) + val i = IntInf.-(IntInf.*(i, IntInf.fromInt 16), IntInf.fromInt c) + in + scan(i, indx + 1) + end + else if isneg then i else IntInf.~ i + in + scan(IntInf.fromInt 0, indx) + end 
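+
+  (* Informal examples of the two scanners above (illustrative only, not in
+   * the original source; values follow from the accumulate-negative loops,
+   * which subtract each digit and negate at the end so that fixed-precision
+   * intermediates cannot overflow on the most negative value):
+   *   decint "~42"   ==> ~42 : IntInf.int
+   *   hexint "0x2A"  ==> 42 : IntInf.int
+   *   hexint "0wxFF" ==> 255 : IntInf.int
+   *)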
+(* + fun icon s = + let val size = String.size s + fun hex_dig indx = + if indx >= size then NONE + else + let val c = String.sub(s, indx) + in + if (c >= #"0" andalso c <= #"9") orelse + (c >= #"A" andalso c <= #"F") orelse + (c >= #"a" andalso c <= #"f") + then SOME(hexint s) + else NONE + end + fun hex_pfx indx = + if indx >= size then SOME(IntInf.fromInt 0) + else + let val c = String.sub(s, indx) + in + if c = #"x" then hex_dig(indx + 1) + else if c >= #"0" andalso c <= #"9" then SOME(decint s) + else NONE + end + fun after_sign indx = + if indx >= size then NONE + else + let val c = String.sub(s, indx) + in + if c = #"0" then hex_pfx(indx + 1) + else if c >= #"1" andalso c <= #"9" then SOME(decint s) + else NONE + end + fun sign indx = + if indx >= size then NONE + else + case String.sub(s, indx) + of #"~" => after_sign(indx + 1) + | _ => after_sign indx + in + sign 0 + end +*) + fun mkreal(s, indx) = + let val size = String.size s + val (isneg, indx) = case String.sub(s, indx) + of #"~" => (true, indx + 1) + | _ => (false, indx) + fun apply_sign f = if isneg then ~f else f + fun scale(f, _, 0) = apply_sign f + | scale(f, x, e) = scale(f * x, x, e - 1) + fun exp(f, e, indx) = + let val e = mkint(s, indx) - e + in + scale(f, if e < 0 then 0.10 else 10.0, abs e) + end + fun frac(f, e, indx) = + if indx < size then + let val c = String.sub(s, indx) + in + if c >= #"0" andalso c <= #"9" then + frac(f * 10.0 + real(Char.ord c - Char.ord #"0"), e + 1, indx + 1) + else if c = #"e" orelse c = #"E" then exp(f, e, indx + 1) + else scale(f, 0.10, e) + end + else scale(f, 0.10, e) + fun start(f, indx) = + if indx < size then + let val c = String.sub(s, indx) + in + if c >= #"0" andalso c <= #"9" then + start(f * 10.0 + real(Char.ord c - Char.ord #"0"), indx + 1) + else if c = #"." 
then frac(f, 0, indx + 1) + else if c = #"e" orelse c = #"E" then exp(f, 0, indx + 1) + else apply_sign f + end + else apply_sign f + in + start(0.0, indx) + end + + fun rcon s = mkreal(s, 0) +(* + fun escseq(s, i) = + let fun echar #"n" = #"\n" (* Char.chr 10 *) + | echar #"r" = #"\r" (* Char.chr 13 *) + | echar #"t" = #"\t" (* Char.chr 9 *) + | echar #"f" = #"\f" (* Char.chr 12 *) + | echar #"a" = #"\a" (* Char.chr 7 *) + | echar #"b" = #"\b" (* Char.chr 8 *) + | echar #"v" = #"\v" (* Char.chr 11 *) + | echar c = c (* #"\\" and #"\"" *) + fun cntrl c = Char.chr((Char.ord c - 64) mod 128) + val c = String.sub(s, i) + in + if c >= #"0" andalso c <= #"9" then + let val d1 = Char.ord c - Char.ord #"0" + and d2 = Char.ord(String.sub(s, i + 1)) - Char.ord #"0" + and d3 = Char.ord(String.sub(s, i + 2)) - Char.ord #"0" + in + (Char.chr(d1 * 100 + d2 * 10 + d3), i + 3) + end + else if c = #"^" then (cntrl(String.sub(s, i + 1)), i + 2) + else (echar c, i + 1) + end + + fun ccon s = (* #\"{cdesc}\" -> char *) + case String.sub(s,2) + of #"\\" => let val (c, _) = escseq(s, 3) in c end + | c => c + + fun scon s = (* \"{cdesc}*\" -> string *) + let fun sitem(i, rev_cs) = + case String.sub(s,i) + of #"\"" => String.implode(List.rev rev_cs) + | #"\\" => + let val (c, i) = escseq(s, i + 1) + in + sitem(i, c :: rev_cs) + end + | c => sitem(i + 1, c :: rev_cs) + in + sitem(1, []) + end +*) + end (* structure LexUtil *) diff --git a/src/compiler/Lexer.sml b/src/compiler/Lexer.sml new file mode 100644 index 0000000..8b0a148 --- /dev/null +++ b/src/compiler/Lexer.sml @@ -0,0 +1,556 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *)
+structure Lexer : LEXER =
+  struct
+
+    fun error(msg, pos1, pos2, lexarg) =
+      (LexArg.error lexarg (msg, pos1, pos2);
+       (Token.T(pos1, pos2, Token.ERROR), NONE))
+
+    fun premature_eof(context, pos1, pos2, lexarg) =
+      error("premature EOF in "^context, pos1, pos2, lexarg)
+
+    fun unexpected(context, expected, got, pos1, pos2, lexarg) =
+      error("in " ^ context ^ ", expected " ^ expected ^ ", got: " ^ Char.toString got, pos1, pos2, lexarg)
+
+    (* ctype clone, for scanning sequences of characters of the same type
+     * in identifiers and numerals;
+     * encoding:
+     * - bit 0( 1) is set for alphanumeric characters (A: 64+1)
+     * - bit 1( 2) is set for hexadecimal digits (C: 64+2+1)
+     * - bit 2( 4) is set for decimal digits (G: 64+4+2+1)
+     * - bit 3( 8) is set for symbolic characters (H: 64+8)
+     * - bit 6(64) is always set to make the codes readable (@)
+     *)
+    val ctype =
+      (*  0-7     8-15    16-23   24-31  !"#$%&'()*+,-./0123456789:;<=>?
*)
+      "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@H@HHHHA@@HH@H@HGGGGGGGGGGH@HHHH\
+      \HCCCCCCAAAAAAAAAAAAAAAAAAAA@H@HAHCCCCCCAAAAAAAAAAAAAAAAAAAA@H@H@"
+      (* @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~DEL *)
+
+    val ctype_alnum = Word.fromInt 1
+    val ctype_hexdig = Word.fromInt 2
+    val ctype_digit = Word.fromInt 4
+    val ctype_symbol = Word.fromInt 8
+
+    fun is_ctype(mask, ch) =
+      if Char.ord ch < 128 then
+        let val code = Word.fromInt(Char.ord(String.sub(ctype, Char.ord ch)))
+        in
+          not(Word.andb(mask, code) = Word.fromInt 0)
+        end
+      else false
+
+    fun scan_ctype(mask, acc, pos2, lexarg) =
+      let fun loop(acc, pos2) =
+            case LexArg.input1 lexarg
+             of (NONE, _) => (acc, pos2, NONE)
+              | (SOME ch, pos3) =>
+                if is_ctype(mask, ch) then loop(ch :: acc, pos3)
+                else (acc, pos2, SOME(ch, pos3))
+      in
+        loop(acc, pos2)
+      end
+
+    fun mkintc acc =
+      let val lexeme = List.rev acc
+          fun dointc() = Token.INTC(LexUtil.decint(String.implode lexeme))
+          fun donumc() = Token.NUMC(LexUtil.decint(String.implode lexeme))
+      in
+        case lexeme
+         of [#"0"] => Token.DIGZ
+          | [ch] => Token.DIGNZ(Char.ord ch - Char.ord #"0")
+          | #"~" :: _ => dointc()
+          | #"0" :: _ => dointc()
+          | _ => donumc()
+      end
+
+    fun mkwordc acc =
+      Token.WORDC(LexUtil.decint(String.implode(List.rev acc)))
+
+    fun mkhexc(isWord, acc) =
+      let val i = LexUtil.hexint(String.implode(List.rev acc))
+      in
+        if isWord then Token.WORDC i else Token.INTC i
+      end
+
+    fun mkrealc acc =
+      Token.REALC(LexUtil.rcon(String.implode(List.rev acc)))
+
+    fun mkstringc acc =
+      Token.STRINGC(String.implode(List.rev acc))
+
+    (* seen "~"?+("."+)[eE], want "~"?+ *)
+    fun scan_real_exponent(acc, pos1, pos2, lexarg) =
+      let fun post_sign(ch, acc, pos2) =
+            if Char.isDigit ch then
+              let val (acc, pos2, lookahead) = scan_ctype(ctype_digit, ch :: acc, pos2, lexarg)
+              in
+                (Token.T(pos1, pos2, mkrealc acc), lookahead)
+              end
+            else unexpected("real exponent", "digit", ch, pos1, pos2, lexarg)
+      in
+        case LexArg.input1 lexarg
+         of (NONE, _) =>
premature_eof("real exponent", pos1, pos2, lexarg) + | (SOME ch, pos3) => + if ch = #"~" then + case LexArg.input1 lexarg + of (NONE, _) => premature_eof("real exponent", pos1, pos2, lexarg) + | (SOME ch, pos4) => post_sign(ch, #"~" :: acc, pos4) + else post_sign(ch, acc, pos3) + end + + (* seen "~"?+".", want +? *) + fun scan_real_fraction(acc, pos1, pos2, lexarg) = + case LexArg.input1 lexarg + of (NONE, _) => premature_eof("real fraction", pos1, pos2, lexarg) + | (SOME ch, pos3) => + if Char.isDigit ch then + let val (acc, pos3, lookahead) = scan_ctype(ctype_digit, ch :: acc, pos3, lexarg) + in + case lookahead + of NONE => (Token.T(pos1, pos3, mkrealc acc), NONE) + | SOME(ch, pos4) => + if ch = #"e" orelse ch = #"E" then scan_real_exponent(ch :: acc, pos1, pos4, lexarg) + else (Token.T(pos1, pos3, mkrealc acc), SOME(ch, pos4)) + end + else unexpected("real fraction", "digit", ch, pos1, pos3, lexarg) + + fun scan_decimal_integer_post_ch(ch, acc, pos1, pos2, pos3, lexarg) = + if ch = #"." 
then scan_real_fraction(ch :: acc, pos1, pos3, lexarg) + else if ch = #"e" orelse ch = #"E" then scan_real_exponent(ch :: acc, pos1, pos3, lexarg) + else (Token.T(pos1, pos2, mkintc acc), SOME(ch, pos3)) + + fun scan_decimal_integer(acc, pos1, pos2, lexarg) = + let val (acc, pos2, lookahead) = scan_ctype(ctype_digit, acc, pos2, lexarg) + in + case lookahead + of NONE => (Token.T(pos1, pos2, mkintc acc), NONE) + | SOME(ch, pos3) => scan_decimal_integer_post_ch(ch, acc, pos1, pos2, pos3, lexarg) + end + + (* seen (~?0|0w)x, want + *) + fun scan_hex(acc, pos1, pos2, isWord, lexarg) = + case LexArg.input1 lexarg + of (NONE, _) => premature_eof("hex constant", pos1, pos2, lexarg) + | (SOME ch, pos2) => + if Char.isHexDigit ch then + let val (acc, pos2, lookahead) = scan_ctype(ctype_hexdig, ch :: acc, pos2, lexarg) + in + (Token.T(pos1, pos2, mkhexc(isWord, acc)), lookahead) + end + else unexpected("hex constant", "hex digit", ch, pos1, pos2, lexarg) + + (* seen ~?0x, want + *) + fun scan_hex_integer(acc, pos1, pos2, lexarg) = scan_hex(acc, pos1, pos2, false, lexarg) + + (* seen 0wx, want + *) + fun scan_hex_word(acc, pos1, pos2, lexarg) = scan_hex(acc, pos1, pos2, true, lexarg) + + (* seen 0w, want + *) + fun scan_decimal_word(ch, acc, pos1, pos2, lexarg) = + if Char.isDigit ch then + let val (acc, pos2, lookahead) = scan_ctype(ctype_digit, ch :: acc, pos2, lexarg) + in + (Token.T(pos1, pos2, mkwordc acc), lookahead) + end + else unexpected("word constant", "digit", ch, pos1, pos2, lexarg) + + (* PRE: seen "~"? 
*) + fun scan_number(ch, acc, pos1, pos2, lexarg) = + if ch = #"0" then (* check for "0x", "0wx", "0w" *) + case LexArg.input1 lexarg + of (NONE, _) => (Token.T(pos1, pos2, mkintc(ch :: acc)), NONE) + | (SOME ch, pos3) => + case ch + of #"x" => scan_hex_integer(#"x" :: #"0" :: acc, pos1, pos3, lexarg) + | #"w" => (* 0w+ or 0wx+, must not have ~ prefix *) + (case acc + of _ :: _ => error("invalid word constant", pos1, pos3, lexarg) + | [] => (* no ~ prefix, Ok *) + case LexArg.input1 lexarg + of (NONE, pos4) => premature_eof("word constant", pos1, pos4, lexarg) + | (SOME ch, pos4) => + let val acc = [#"w", #"0"] + in + case ch + of #"x" => scan_hex_word(#"x" :: acc, pos1, pos4, lexarg) + | _ => scan_decimal_word(ch, acc, pos1, pos4, lexarg) + end) + | _ => + if Char.isDigit ch then scan_decimal_integer(ch :: #"0" :: acc, pos1, pos3, lexarg) + else scan_decimal_integer_post_ch(ch, #"0" :: acc, pos1, pos2, pos3, lexarg) + else scan_decimal_integer(ch :: acc, pos1, pos2, lexarg) + + fun scan_symbolic_id(acc, pos1, pos2, lexarg) = + let val (acc, pos2, lookahead) = scan_ctype(ctype_symbol, acc, pos2, lexarg) + val buf = String.implode(List.rev acc) + val tok = + case buf + of ":" => Token.COLON + | "=" => Token.EQ + | "=>" => Token.FATARROW + | "->" => Token.THINARROW + | "#" => Token.HASH + | "*" => Token.STAR + | "|" => Token.BAR + | ":>" => Token.COLONGT + | _ => Token.ID buf + in + (Token.T(pos1, pos2, tok), lookahead) + end + + fun scan_tilde(pos1, lexarg) = + case LexArg.input1 lexarg + of (NONE, _) => (Token.T(pos1, pos1, Token.ID "~"), NONE) + | (SOME ch, pos2) => + if is_ctype(ctype_digit, ch) then scan_number(ch, [#"~"], pos1, pos2, lexarg) + else if is_ctype(ctype_symbol, ch) then scan_symbolic_id([ch, #"~"], pos1, pos2, lexarg) + else (Token.T(pos1, pos1, Token.ID "~"), SOME(ch, pos2)) + + fun scan_one_alpha(ch, pos1, lexarg) = + let val (acc, pos2, lookahead) = scan_ctype(ctype_alnum, [ch], pos1, lexarg) + val buf = String.implode(List.rev acc) + val tok = 
+ case buf + of "abstype" => Token.ABSTYPE + | "and" => Token.AND + | "andalso" => Token.ANDALSO + | "as" => Token.AS + | "case" => Token.CASE + | "datatype" => Token.DATATYPE + | "do" => Token.DO + | "else" => Token.ELSE + | "end" => Token.END + | "exception" => Token.EXCEPTION + | "fn" => Token.FN + | "fun" => Token.FUN + | "handle" => Token.HANDLE + | "if" => Token.IF + | "in" => Token.IN + | "infix" => Token.INFIX + | "infixr" => Token.INFIXR + | "let" => Token.LET + | "local" => Token.LOCAL + | "nonfix" => Token.NONFIX + | "of" => Token.OF + | "op" => Token.OP + | "orelse" => Token.ORELSE + | "raise" => Token.RAISE + | "rec" => Token.REC + | "then" => Token.THEN + | "type" => Token.TYPE + | "val" => Token.VAL + | "with" => Token.WITH + | "withtype" => Token.WITHTYPE + | "while" => Token.WHILE + | "eqtype" => Token.EQTYPE + | "functor" => Token.FUNCTOR + | "include" => Token.INCLUDE + | "sharing" => Token.SHARING + | "sig" => Token.SIG + | "signature" => Token.SIGNATURE + | "struct" => Token.STRUCT + | "structure" => Token.STRUCTURE + | "where" => Token.WHERE + | _ => Token.ID buf + in + (Token.T(pos1, pos2, tok), lookahead) + end + + fun scan_qualid(strids, pos1, pos2, lexarg) = + case LexArg.input1 lexarg + of (NONE, _) => premature_eof("qualified identifier", pos1, pos2, lexarg) + | (SOME ch, pos3) => + if is_ctype(ctype_symbol, ch) then + case scan_symbolic_id([ch], pos3, pos3, lexarg) + of (Token.T(_, pos4, Token.ID s), lookahead) => (Token.T(pos1, pos4, Token.QUALID(List.rev strids, s)), lookahead) + | (Token.T(_, pos4, Token.STAR), lookahead) => (Token.T(pos1, pos4, Token.QUALID(List.rev strids, "*")), lookahead) + | (Token.T(_, pos4, _), lookahead) => error("invalid qualified identifier", pos1, pos4, lexarg) + else if Char.isAlpha ch then + case scan_one_alpha(ch, pos3, lexarg) + of (Token.T(_, _, Token.ID s), SOME(#".", pos4)) => scan_qualid(s :: strids, pos1, pos4, lexarg) + | (Token.T(_, pos4, Token.ID s), lookahead) => (Token.T(pos1, pos4, 
Token.QUALID(List.rev strids, s)), lookahead) + | (Token.T(_, pos4, _), lookahead) => error("invalid qualified identifier", pos1, pos4, lexarg) + else unexpected("qualified identifier", "letter or symbol", ch, pos1, pos3, lexarg) + + fun scan_alpha(ch, pos1, lexarg) = + case scan_one_alpha(ch, pos1, lexarg) + of (Token.T(_, _, Token.ID s), SOME(#".", pos3)) => scan_qualid([s], pos1, pos3, lexarg) + | (tok, lookahead) => (tok, lookahead) + + fun scan_tyvar(pos1, lexarg) = + let val (acc, pos2, lookahead) = scan_ctype(ctype_alnum, [], pos1, lexarg) + in + case acc + of [] => error("invalid type variable", pos1, pos2, lexarg) + | _ => (Token.T(pos1, pos2, Token.TYVAR(String.implode(List.rev acc))), lookahead) + end + + fun scan_string(acc, pos1, lexarg) = + case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string constant", pos1, pos2, lexarg) + | (SOME ch, pos2) => + case ch + of #"\"" => (Token.T(pos1, pos2, mkstringc acc), NONE) + | #"\\" => + (case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string constant", pos1, pos2, lexarg) + | (SOME ch, pos2) => + case ch + of #"a" => scan_string((Char.chr 7) :: acc, pos1, lexarg) + | #"b" => scan_string((Char.chr 8) :: acc, pos1, lexarg) + | #"t" => scan_string((Char.chr 9) :: acc, pos1, lexarg) + | #"n" => scan_string((Char.chr 10) :: acc, pos1, lexarg) + | #"v" => scan_string((Char.chr 11) :: acc, pos1, lexarg) + | #"f" => scan_string((Char.chr 12) :: acc, pos1, lexarg) + | #"r" => scan_string((Char.chr 13) :: acc, pos1, lexarg) + | #"\"" => scan_string(ch :: acc, pos1, lexarg) + | #"\\" => scan_string(ch :: acc, pos1, lexarg) + | #"^" => (* \^C for C in [@, _] *) + (case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string constant", pos1, pos2, lexarg) + | (SOME ch, pos2) => + if Char.ord ch >= 64 andalso Char.ord ch <= 95 + then scan_string((Char.chr(Char.ord ch - 64)) :: acc, pos1, lexarg) + else unexpected("string escape", "character in [@,_]", ch, pos1, pos2, lexarg)) + | #"u" => 
scan_hex4(acc, pos1, lexarg) + | _ => + if Char.isDigit ch then scan_dig3(ch, acc, pos1, lexarg) + else if Char.isSpace ch then scan_gap(acc, pos1, lexarg) + else unexpected("string escape", "whitespace, digit, or [abtnvfru^\\\"]", ch, pos1, pos2, lexarg)) + | _ => + if Char.isPrint ch then scan_string(ch :: acc, pos1, lexarg) + else unexpected("string constant", "\", \\, or printable character", ch, pos1, pos2, lexarg) + + and scan_hex4(acc, pos1, lexarg) = (* \uxxxx, seen \u *) + let fun hexdig_value ch = (* XXX: similar code in LexUtil *) + if ch >= #"a" then Char.ord ch - (Char.ord #"a" - 10) + else if ch >= #"A" then Char.ord ch - (Char.ord #"A" - 10) + else Char.ord ch - Char.ord #"0" + fun loop(0, v) = scan_string((Char.chr v) :: acc, pos1, lexarg) + | loop(n, v) = + case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string escape", pos1, pos2, lexarg) + | (SOME ch, pos2) => + if Char.isHexDigit ch then loop(n - 1, v * 16 + hexdig_value ch) + else unexpected("string escape", "hex digit", ch, pos1, pos2, lexarg) + in + loop(4, 0) + end + + and scan_dig3(ch, acc, pos1, lexarg) = (* \ddd, seen \d *) + let fun digit_value ch = Char.ord ch - Char.ord #"0" + fun loop(0, v) = scan_string((Char.chr v) :: acc, pos1, lexarg) + | loop(n, v) = + case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string escape", pos1, pos2, lexarg) + | (SOME ch, pos2) => + if Char.isDigit ch then loop(n - 1, v * 10 + digit_value ch) + else unexpected("string escape", "digit", ch, pos1, pos2, lexarg) + in + loop(2, digit_value ch) + end + + and scan_gap(acc, pos1, lexarg) = (* skip *\\ *) + let fun loop() = + case LexArg.input1 lexarg + of (NONE, pos2) => premature_eof("string gap", pos1, pos2, lexarg) + | (SOME ch, pos2) => + if ch = #"\\" then scan_string(acc, pos1, lexarg) + else if Char.isSpace ch then loop() + else unexpected("string gap", "whitespace or \\", ch, pos1, pos2, lexarg) + in + loop() + end + + fun scan_char_con(pos1, pos2, lexarg) = + let val 
(Token.T(_, pos3, tok), lookahead) = scan_string([], pos2, lexarg) + in + case tok + of Token.STRINGC str => + (case String.size str + of 1 => (Token.T(pos1, pos3, Token.CHARC(String.sub(str, 0))), lookahead) + | _ => error("invalid character constant", pos1, pos3, lexarg)) + | _ => error("invalid character constant", pos1, pos3, lexarg) + end + + fun scan_hash(pos1, lexarg) = + case LexArg.input1 lexarg + of (NONE, pos2) => (Token.T(pos1, pos1, Token.HASH), NONE) + | (SOME ch, pos2) => + if ch = #"\"" then scan_char_con(pos1, pos2, lexarg) + else if is_ctype(ctype_symbol, ch) then scan_symbolic_id([ch, #"#"], pos1, pos2, lexarg) + else (Token.T(pos1, pos1, Token.HASH), SOME(ch, pos2)) + + fun scan_comment lexarg = (* returns NONE on success or SOME pos on premature EOF *) + let fun outer level = + let fun inner() = + let fun lparen() = + case LexArg.input1 lexarg + of (NONE, pos) => SOME pos + | (SOME ch, _) => + case ch + of #"*" => outer(level + 1) + | #"(" => lparen() + | _ => inner() + fun star() = + case LexArg.input1 lexarg + of (NONE, pos) => SOME pos + | (SOME ch, _) => + case ch + of #")" => + let val level = level - 1 + in + if level = 0 then NONE else outer level + end + | #"*" => star() + | #"(" => lparen() + | _ => inner() + in + case LexArg.input1 lexarg + of (NONE, pos) => SOME pos + | (SOME ch, _) => + case ch + of #"(" => lparen() + | #"*" => star() + | _ => inner() + end + in + inner() + end + in + outer 1 + end + + fun scan_dot(pos1, lexarg) = + let fun loop(0, pos2) = (Token.T(pos1, pos2, Token.DOTDOTDOT), NONE) + | loop(n, _) = + case LexArg.input1 lexarg + of (SOME #".", pos2) => loop(n - 1, pos2) + | (SOME ch, pos2) => unexpected("...", ".", ch, pos1, pos2, lexarg) + | (NONE, pos2) => premature_eof("...", pos1, pos2, lexarg) + in + loop(2, pos1) + end + + datatype chclass + = SPACE + | SYMBOL (* sans TILDE and HASH *) + | TILDE (* special case of SYMBOL *) + | HASH (* special case of SYMBOL *) + | DQUOTE + | SQUOTE + | LPAREN + | RPAREN + | 
COMMA + | DOT + | DIGIT + | SEMICOLON + | ALPHA + | LBRACK + | RBRACK + | UNDERSCORE + | LBRACE + | RBRACE + | ERROR + + fun chclass ch = + case ch + of #"!" => SYMBOL + | #"\"" => DQUOTE + | #"#" => HASH (* should be SYMBOL, but we must check for character constant *) + | #"$" => SYMBOL + | #"%" => SYMBOL + | #"&" => SYMBOL + | #"'" => SQUOTE + | #"(" => LPAREN + | #")" => RPAREN + | #"*" => SYMBOL + | #"+" => SYMBOL + | #"," => COMMA + | #"-" => SYMBOL + | #"." => DOT + | #"/" => SYMBOL + | #":" => SYMBOL + | #";" => SEMICOLON + | #"<" => SYMBOL + | #"=" => SYMBOL + | #">" => SYMBOL + | #"?" => SYMBOL + | #"@" => SYMBOL + | #"[" => LBRACK + | #"\\" => SYMBOL + | #"]" => RBRACK + | #"^" => SYMBOL + | #"_" => UNDERSCORE + | #"`" => SYMBOL + | #"{" => LBRACE + | #"|" => SYMBOL + | #"}" => RBRACE + | #"~" => TILDE (* should be SYMBOL, but we must check for ~ *) + | _ => + if Char.isDigit ch then DIGIT + else if Char.isAlpha ch then ALPHA + else if Char.isSpace ch then SPACE + else ERROR + + fun scan_start lexarg = + case LexArg.input1 lexarg + of (NONE, pos) => (Token.T(pos, pos, Token.EOF), NONE) + | (SOME ch, pos) => + case chclass ch + of SYMBOL => scan_symbolic_id([ch], pos, pos, lexarg) + | DQUOTE => scan_string([], pos, lexarg) + | HASH => scan_hash(pos, lexarg) + | SQUOTE => scan_tyvar(pos, lexarg) + | LPAREN => scan_lparen(pos, lexarg) + | RPAREN => (Token.T(pos, pos, Token.RPAREN), NONE) + | COMMA => (Token.T(pos, pos, Token.COMMA), NONE) + | DOT => scan_dot(pos, lexarg) + | SEMICOLON => (Token.T(pos, pos, Token.SEMICOLON), NONE) + | LBRACK => (Token.T(pos, pos, Token.LBRACK), NONE) + | RBRACK => (Token.T(pos, pos, Token.RBRACK), NONE) + | UNDERSCORE => (Token.T(pos, pos, Token.UNDERSCORE), NONE) + | LBRACE => (Token.T(pos, pos, Token.LBRACE), NONE) + | RBRACE => (Token.T(pos, pos, Token.RBRACE), NONE) + | TILDE => scan_tilde(pos, lexarg) + | DIGIT => scan_number(ch, [], pos, pos, lexarg) + | ALPHA => scan_alpha(ch, pos, lexarg) + | SPACE => scan_start 
lexarg + | ERROR => error("invalid character", pos, pos, lexarg) + + and scan_lparen(pos1, lexarg) = + case LexArg.input1 lexarg + of (NONE, _) => (Token.T(pos1, pos1, Token.LPAREN), NONE) + | (SOME ch, pos) => + case ch + of #"*" => + (case scan_comment lexarg + of NONE => scan_start lexarg + | SOME pos2 => premature_eof("comment", pos1, pos2, lexarg)) + | _ => (Token.T(pos1, pos1, Token.LPAREN), SOME(ch, pos)) + + fun lex lexarg = + let val (token, lookahead) = scan_start lexarg + in + case lookahead + of NONE => () + | SOME(ch, pos) => LexArg.unget1(lexarg, ch, pos); + token + end + + end diff --git a/src/compiler/MAIN.sig b/src/compiler/MAIN.sig new file mode 100644 index 0000000..fa9b6e4 --- /dev/null +++ b/src/compiler/MAIN.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature MAIN = + sig + val main: string list -> OS.Process.status + end diff --git a/src/compiler/Main.sml b/src/compiler/Main.sml new file mode 100644 index 0000000..78f6d5c --- /dev/null +++ b/src/compiler/Main.sml @@ -0,0 +1,75 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Main : MAIN = + struct + + fun sayErr s = TextIO.output(TextIO.stdErr, s) + + exception Main + + fun nyi msg = + (sayErr("Main: nyi " ^ msg ^ "\n"); raise Main) + + exception Usage + + fun usage arg = + let val progname = OS.Path.file(CommandLine.name()) + in + sayErr(progname ^ ": invalid argument '" ^ arg ^ "'\n"); + sayErr("usage: " ^ progname ^ " [options] <file>.{sig,sml} ...\n"); + sayErr "valid options are:\n"; + sayErr "-v\n"; + raise Usage + end + + fun version() = sayErr "ErlML version 0.0\n" + + fun option arg = + case arg + of "-v" => version() + | _ => usage arg + + fun translate file = + let val absynTopdec = Parser.parse_file file + val basis = TypeCheck.check absynTopdec + in + case absynTopdec + of Absyn.STRDECtopdec(Absyn.STRDEC[Absyn.STRUCTUREstrdec(Absyn.STRBIND[(strid, _)])]) => + let val _ = Basis.write(strid ^ ".sml.basis", basis) + val cerlModule = Translate.translate absynTopdec + in + CoreErlangPrint.printModule((OS.Path.base file) ^ ".core", cerlModule) + end + | Absyn.SIGDECtopdec(Absyn.SIGBIND[(sigid, _)]) => + Basis.write(sigid ^ ".sig.basis", basis) + | _ => nyi "non-plain toplevel structure or signature file" + end + + fun main argv = + let fun loop([]) = OS.Process.success + | loop(arg :: argv) = + if String.sub(arg, 0) = #"-" then + (option arg; loop argv) + else + case OS.Path.ext arg + of SOME "sml" => (translate arg; loop argv) + | SOME "sig" => (translate arg; loop argv) + | _ => usage arg + in + loop argv + end + + end diff --git a/src/compiler/PARSER.sig b/src/compiler/PARSER.sig new file mode
100644 index 0000000..13ba7c2 --- /dev/null +++ b/src/compiler/PARSER.sig @@ -0,0 +1,29 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature PARSER = + sig + + type tokens + val tokens_open : LexArg.lexarg -> tokens + + type fixenv + val fe_init : fixenv + + val parse_topdec : tokens * fixenv -> (Absyn.topdec * fixenv) option + + val parse_file : string -> Absyn.topdec + + end diff --git a/src/compiler/Parser.sml b/src/compiler/Parser.sml new file mode 100644 index 0000000..f71711d --- /dev/null +++ b/src/compiler/Parser.sml @@ -0,0 +1,2045 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + * Recursive descent parser for SML'97: + * - uses a token buffer with unbounded pushback capacity + * - uses the shunting yard algorithm for infix expressions and patterns + * - lowers all derived forms to the bare language, except for those that + * require semantic analysis (e.g. structure sharing constraints) + *) +structure Parser : PARSER = + struct + + fun sayErr s = TextIO.output(TextIO.stdErr, s) + fun sayErr1 c = TextIO.output1(TextIO.stdErr, c) + fun sayToken t = (sayErr(Token.toString t); sayErr1 #"\n") + + (* + * Token buffer with unbounded pushback capacity. + * + * Some constructs require two tokens of lookahead (topdecs, tyvarseq). + * While it is possible to handle them by combining a single-token pushback + * buffer with rewriting affected parsing functions to return / pass around + * a pre-read token, doing so greatly complicates the parser. A buffer with + * unbounded capacity is actually simpler than a single-token buffer. + *) + + datatype tokens + = TS of {lexarg : LexArg.lexarg, + lookahead : Token.token' list ref} (* XXX: tokens stripped of positions *) + + fun tokens_open lexarg = TS{lexarg = lexarg, lookahead = ref []} + + fun tokens_get(TS{lexarg, lookahead}) = + case !lookahead + of tok :: rest => (lookahead := rest; tok) + | [] => let val Token.T(_, _, tok) = Lexer.lex lexarg in tok end (* XXX: preserve positions *) + + fun tokens_unget(TS{lookahead, ...}, tok) = lookahead := tok :: !lookahead + + fun lookingAt(msg, tokens) = + let val tok = tokens_get tokens + val _ = tokens_unget(tokens, tok) + in + sayErr msg; + sayErr ", tok "; + sayToken tok + end + + (* + * Helpers for parsing single tokens. + * + * Token.token is not an equality type, due to Token.REALC having a real-typed attribute, + * so we have specialized functions for all relevant cases. A Token.eq predicate would + * be awkward to implement and likely slower. 
+ *) + + val token_is_and = fn Token.AND => true | _ => false + val token_is_bar = fn Token.BAR => true | _ => false + val token_is_comma = fn Token.COMMA => true | _ => false + val token_is_rbrace = fn Token.RBRACE => true | _ => false + val token_is_rbrack = fn Token.RBRACK => true | _ => false + val token_is_rparen = fn Token.RPAREN => true | _ => false + val token_is_semicolon = fn Token.SEMICOLON => true | _ => false + val token_is_thinarrow = fn Token.THINARROW => true | _ => false + + exception SyntaxError + + fun expected(msg, tok) = + (sayErr "Syntax Error: expected "; sayErr msg; sayErr ", got "; sayToken tok; raise SyntaxError) + + fun check_end tok = + case tok + of Token.END => () + | _ => expected("END", tok) + + fun check_eq tok = + case tok + of Token.EQ => () + | _ => raise SyntaxError + + fun check_id tok = + case tok + of Token.ID id => id + | _ => raise SyntaxError + + fun check_longid tok = + case tok + of Token.ID id => Absyn.LONGID([], id) + | Token.QUALID(strids, id) => Absyn.LONGID(strids, id) + | _ => raise SyntaxError + + fun check_rparen tok = + case tok + of Token.RPAREN => () + | _ => raise SyntaxError + + fun parse_label tokens = + case tokens_get tokens + of Token.ID id => Absyn.IDlab id + | Token.STAR => Absyn.IDlab "*" + | Token.DIGNZ n => Absyn.INTlab n + | Token.NUMC n => Absyn.INTlab(IntInf.toInt n) + | _ => raise SyntaxError + + fun parse_colon tokens = + case tokens_get tokens + of Token.COLON => () + | _ => raise SyntaxError + + fun parse_do tokens = + case tokens_get tokens + of Token.DO => () + | _ => raise SyntaxError + + fun parse_else tokens = + case tokens_get tokens + of Token.ELSE => () + | _ => raise SyntaxError + + fun parse_end tokens = check_end(tokens_get tokens) + + fun parse_eq tokens = check_eq(tokens_get tokens) + + fun parse_fatarrow tokens = + case tokens_get tokens + of Token.FATARROW => () + | _ => raise SyntaxError + + fun parse_fn tokens = + case tokens_get tokens + of Token.FN => () + | _ => raise 
SyntaxError + + fun parse_id tokens = check_id(tokens_get tokens) + + fun parse_in tokens = + case tokens_get tokens + of Token.IN => () + | _ => raise SyntaxError + + fun parse_lparen tokens = + case tokens_get tokens + of Token.LPAREN => () + | _ => raise SyntaxError + + fun parse_of tokens = + case tokens_get tokens + of Token.OF => () + | _ => raise SyntaxError + + fun parse_rbrace tokens = + case tokens_get tokens + of Token.RBRACE => () + | _ => raise SyntaxError + + fun parse_rparen tokens = check_rparen(tokens_get tokens) + + fun parse_then tokens = + case tokens_get tokens + of Token.THEN => () + | _ => raise SyntaxError + + fun parse_type tokens = + case tokens_get tokens + of Token.TYPE => () + | _ => raise SyntaxError + + fun parse_tyvar tokens = + case tokens_get tokens + of Token.TYVAR id => id + | tok => expected("TYVAR", tok) + + fun parse_with tokens = + case tokens_get tokens + of Token.WITH => () + | _ => raise SyntaxError + + (* + * Parse-time Fixity Environment + * + * The top-level environment has the following infix identifiers: + * + * infix 7 * / div mod + * infix 6 + - ^ + * infixr 5 :: @ + * infix 4 = <> > >= < <= + * infix 3 := o + * infix 0 before + * + * We represent the precedence and associativity with a single integer + * by incrementing the precedence level by one (range 1-10), and then + * negating that if the identifier is right-associative. + * An identifier that is unbound or bound to zero is considered nonfix. 
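+ * + * For example, "infix 7 *" is stored as 7+1 = 8 for "*", and "infixr 5 ::" + * is stored as ~(5+1) = ~6 for "::"; an identifier rebound to 0 (nonfix) + * is treated by lookups exactly like an unbound one.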
+ *) + + datatype fixenv = FE of (Absyn.ident, int) Dict.dict + + val fe_init = + let val alist = + [("/", 8), + ("div", 8), + ("mod", 8), + ("*", 8), + ("+", 7), + ("-", 7), + ("^", 7), + ("::", ~6), + ("@", ~6), + ("=", 5), + ("<>", 5), + ("<", 5), + (">", 5), + ("<=", 5), + (">=", 5), + (":=", 4), + ("o", 4), + ("before", 1)] + in + FE(Dict.fromList(Basis.identCompare, alist)) + end + + val fe_empty = FE(Dict.empty Basis.identCompare) + + fun fe_plus(FE dict1, FE dict2) = FE(Dict.plus(dict1, dict2)) + + fun fe_insert(FE dict, id, prio) = FE(Dict.insert(dict, id, prio)) + + fun id_is_infix(FE dict, id) = + case Dict.find(dict, id) + of NONE => NONE + | SOME 0 => NONE + | SOME prio => SOME(id, prio) + + fun token_is_infix(fe, eqok, tok) = + case tok + of Token.ID id => id_is_infix(fe, id) + | Token.STAR => id_is_infix(fe, "*") + | Token.EQ => if eqok then id_is_infix(fe, "=") else NONE + | _ => NONE + + (* + * A generic parser for infix patterns or expressions. + * + * This implements the shunting-yard algorithm, modified to keep + * (operand, operator, priority) triplets on the stack and to build + * ASTs rather than emitting RPN code. The result is essentially + * a specialized shift-reduce parser.
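+ * + * For example, with the priorities above "x - y - z" reduces eagerly + * ("-" maps to positive 7, i.e. left-associative) and parses as + * "(x - y) - z", while "x :: y :: nil" keeps shifting ("::" maps to + * ~6, i.e. right-associative) and parses as "x :: (y :: nil)".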
+ *) + fun parse_infix(tokens, fe, eqok, parse_operand, combine, init) = + let datatype 'a stack = S of 'a * Absyn.ident * int + fun right_binds_tighter(oprL, prioL, oprR, prioR) = + if oprL = oprR then prioL < 0 (* negative means right-associative *) + else abs prioR > abs prioL + fun finally([], opndR) = opndR + | finally(S(opndL, opr, _) :: stack, opndR) = finally(stack, combine(opr, opndL, opndR)) + fun operand stack = operator(parse_operand(tokens, fe), stack) + and operator(opnd, stack) = + let val tok = tokens_get tokens + in + case token_is_infix(fe, eqok, tok) + of SOME(id, prio) => resolve(id, prio, opnd, stack) + | NONE => (tokens_unget(tokens, tok); finally(stack, opnd)) + end + and resolve(oprR, prioR, opndL, stack) = + case stack + of [] => operand(S(opndL, oprR, prioR) :: stack) + | S(opndL', oprL, prioL) :: stack' => + if right_binds_tighter(oprL, prioL, oprR, prioR) then operand(S(opndL, oprR, prioR) :: stack) + else resolve(oprR, prioR, combine(oprL, opndL', opndL), stack') + val stack = + case init + of NONE => [] + | SOME(opnd, opr, prio) => [S(opnd, opr, prio)] + in + operand stack + end + + (* + * Helper for parsing right-recursive constructs. + *) + + fun rr_parse(tokens, parse_item, is_recur_token, combine) = + let fun recur() = + let val left = parse_item tokens + val tok = tokens_get tokens + in + if is_recur_token tok then combine(left, recur()) + else (tokens_unget(tokens, tok); left) + end + in + recur() + end + + (* + * Helpers for parsing lists of items with separator tokens. + *) + + (* Non-empty lists with given separator token. *) + fun item_plus_parse(tokens, parse_item, is_sep_token) = + let fun loop acc = + let val item = parse_item tokens + val acc = item :: acc + val tok = tokens_get tokens + in + if is_sep_token tok then loop acc + else (tokens_unget(tokens, tok); List.rev acc) + end + in + loop [] + end + + (* Non-empty lists with comma as separator token. Also consumes end token. 
*) + fun comma_item_plus_parse(tokens, parse_item, is_end_token) = + let val items = item_plus_parse(tokens, parse_item, token_is_comma) + val tok = tokens_get tokens + in + if is_end_token tok then items + else expected("${END}-token", tok) + end + + (* Possibly empty lists with comma as separator token. Also consumes end token. *) + fun comma_item_star_parse(tokens, parse_item, is_end_token) = + let val tok = tokens_get tokens + in + if is_end_token tok then [] + else (tokens_unget(tokens, tok); comma_item_plus_parse(tokens, parse_item, is_end_token)) + end + + fun bar_item_plus_parse(tokens, parse_item) = item_plus_parse(tokens, parse_item, token_is_bar) + + fun and_item_plus_parse(tokens, parse_item) = item_plus_parse(tokens, parse_item, token_is_and) + + (* + * Grammar: Type expressions + * + * ty ::= tyvar (highest precedence) + * | '{' <tyrow> '}' + * | tyseq longtycon + * | ty1 '*' ... '*' tyn (n >= 2) + * | ty '->' ty' (right-associative) + * | '(' ty ')' (lowest precedence) + * tyrow ::= lab ':' ty <',' tyrow> + * tyseq ::= ty + * | (empty) + * | '(' ty1 ',' ...
',' tyn ')' (n >= 1) + *) + + fun parse_ty tokens = parse_funty tokens + + and parse_funty tokens = + rr_parse(tokens, parse_tuplety, token_is_thinarrow, Absyn.FUNty) + + and parse_tuplety tokens = + let fun loop(acc, i) = + let val ty = parse_consty tokens + val acc = (Absyn.INTlab i, ty) :: acc + val i = i + 1 + in + case tokens_get tokens + of Token.STAR => loop(acc, i) + | tok => + (tokens_unget(tokens, tok); + case acc + of [_] => ty + | _ => Absyn.RECty(List.rev acc)) + end + in + loop([], 1) + end + + and parse_consty tokens = + let fun loop tyseq = + case tokens_get tokens + of Token.ID id => loop [Absyn.CONSty(tyseq, Absyn.LONGID([], id))] + (* Note that '*' is not a TyCon, so Token.STAR is excluded here *) + | Token.QUALID(strids, id) => loop [Absyn.CONSty(tyseq, Absyn.LONGID(strids, id))] + | tok => + (tokens_unget(tokens, tok); + case tyseq + of [ty] => ty + | _ => raise SyntaxError) + in + loop(parse_tyseq tokens) + end + + and parse_tyseq tokens = + case tokens_get tokens + of Token.TYVAR id => [Absyn.VARty id] + | Token.LBRACE => + [Absyn.RECty(comma_item_star_parse(tokens, parse_tyrow_item, token_is_rbrace))] + | Token.LPAREN => + comma_item_plus_parse(tokens, parse_ty, token_is_rparen) + | tok => (tokens_unget(tokens, tok); []) + + and parse_tyrow_item tokens = + let val label = parse_label tokens + val _ = parse_colon tokens + val ty = parse_ty tokens + in + (label, ty) + end + + (* + * Helpers to assemble expressions and patterns in the bare language. 
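+ * + * For example, "if e1 then e2 else e3" is assembled as the application + * "(fn true => e2 | false => e3) e1" (build_if_exp via build_case_exp), + * and the list forms "[x, y]" become "x :: y :: nil" (build_listpat and + * build_listexp).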
+ *) + + fun build_infix_pat(vid, pat1, pat2) = + Absyn.CONSpat(Absyn.LONGID([], vid), + Absyn.RECpat([(Absyn.INTlab 1, pat1), (Absyn.INTlab 2, pat2)], + false)) + + fun build_infix_exp(vid, exp1, exp2) = + Absyn.APPexp(Absyn.VIDexp(Absyn.LONGID([], vid), ref NONE), + Absyn.RECexp([(Absyn.INTlab 1, exp1), (Absyn.INTlab 2, exp2)])) + + fun build_list([], _, xnil) = xnil + | build_list(x :: xs, xcons, xnil) = xcons(x, build_list(xs, xcons, xnil)) + + fun build_listpat pats = + let val xcons = fn (pat1, pat2) => build_infix_pat("::", pat1, pat2) + val xnil = Absyn.VIDpat(Absyn.LONGID([], "nil"), ref NONE) + in + build_list(pats, xcons, xnil) + end + + fun build_listexp exps = + let val xcons = fn (exp1, exp2) => build_infix_exp("::", exp1, exp2) + val xnil = Absyn.VIDexp(Absyn.LONGID([], "nil"), ref NONE) + in + build_list(exps, xcons, xnil) + end + + fun build_case_exp(exp, match) = Absyn.APPexp(Absyn.FNexp match, exp) + + fun mk_true_exp() = Absyn.VIDexp(Absyn.LONGID([], "true"), ref NONE) + fun mk_false_exp() = Absyn.VIDexp(Absyn.LONGID([], "false"), ref NONE) + + fun build_if_exp(exp1, exp2, exp3) = + let val match = Absyn.MATCH([(Absyn.VIDpat(Absyn.LONGID([], "true"), ref NONE), exp2), + (Absyn.VIDpat(Absyn.LONGID([], "false"), ref NONE), exp3)]) + in + build_case_exp(exp1, match) + end + + fun build_andalso(exp1, exp2) = build_if_exp(exp1, exp2, mk_false_exp()) + fun build_orelse(exp1, exp2) = build_if_exp(exp1, mk_true_exp(), exp2) + + fun build_seq_exp(exp1, exp2) = build_case_exp(exp1, Absyn.MATCH([(Absyn.WILDpat, exp2)])) + + fun build_while_exp(exp1, exp2) = + let val vid = Absyn.LONGID([], Absyn.gensym()) + val appvid = Absyn.APPexp(Absyn.VIDexp(vid, ref NONE), Absyn.RECexp []) + val exp = build_seq_exp(exp2, appvid) + val exp = build_if_exp(exp1, exp, Absyn.RECexp []) + val match = Absyn.MATCH([(Absyn.RECpat([], false), exp)]) + in + Absyn.LETexp(Absyn.DEC [Absyn.VALdec([], [], [(Absyn.VIDpat(vid, ref NONE), match)])], + appvid) + end + + fun 
build_hashexp lab = + let val longid = Absyn.LONGID([], Absyn.gensym()) + in + Absyn.FNexp(Absyn.MATCH([(Absyn.RECpat([(lab, Absyn.VIDpat(longid, ref NONE))], true), Absyn.VIDexp(longid, ref NONE))])) + end + + (* + * Helpers for identifier classification. + * + * Since "*" and "=" have specific uses in the syntax, they are returned from the + * lexer as keyword tokens rather than as the general identifier token. + * + * In pattern contexts we map the "*" keyword to the corresponding identifier. + * + * In expression contexts we map the "*" and "=" keywords to their corresponding identifiers. + * + * In contexts that allow infix notation, identifiers that have infix status are mapped + * to a synthetic token class representing infix identifiers. + *) + + fun pat_filter_token tok = + case tok + of Token.STAR => Token.ID "*" + (* Token.EQ is not mapped to Token.ID here *) + | _ => tok + + fun parse_vid tokens = check_id(pat_filter_token(tokens_get tokens)) + + fun exp_filter_token tok = + case tok + of Token.STAR => Token.ID "*" + | Token.EQ => Token.ID "=" + | _ => tok + + fun fe_filter_token(fe, tok) = + case tok + of Token.ID id => + (case id_is_infix(fe, id) + of NONE => tok + | SOME(_, prio) => Token.INFID(id, prio)) + | _ => tok + + fun pat_fe_filter_token(fe, tok) = fe_filter_token(fe, pat_filter_token tok) + + fun exp_fe_filter_token(fe, tok) = fe_filter_token(fe, exp_filter_token tok) + + fun parse_longvid_pat tokens = check_longid(pat_filter_token(tokens_get tokens)) (* includes STAR, excludes EQ *) + + fun parse_longvid_exp tokens = check_longid(exp_filter_token(tokens_get tokens)) (* includes STAR and EQ *) + + (* + * Grammar: Patterns + * + * atpat ::= '_' + * | scon + * | <'op'>longvid + * | '{' '}' + * | '(' pat1 ',' ... ',' patn ')' (n >= 0) + * | '[' pat1 ',' ... ',' patn ']' (n >= 0) + * patrow ::= '...' 
+ * | lab '=' pat <',' patrow> + * | vid<':' ty> <'as' pat> <',' patrow> + * apppat ::= atpat + * | <'op'>longvid atpat + * infpat ::= apppat + * | infpat1 vid infpat2 (vid is infix) + * pat ::= infpat (highest precedence) + * | pat ':' ty + * | <'op'>vid<':' ty> 'as' pat (lowest precedence) + * + * Note 1: The apppat and infpat layering is a correction from SuccessorML, + * "Fixing various minor issues with the formal syntax". + * + * Implementation Note: The infix notation in fvalbind places very specific + * requirements on the syntax of the formal parameters. To handle this we + * parse patterns in two steps: The first step parses following the grammar + * and produces a representation of the actual parse tree. The second step + * maps the parse tree to an abstract syntax tree. The intermediate parse + * tree is inspected when parsing function clauses in fvalbinds; in other + * contexts the parse tree is immediately mapped to a syntax tree. + *) + + datatype pat + (* pats: primitive forms, 1-to-1 with Absyn *) + = WILDpat + | SCONpat of Absyn.scon + | RECpat of (Absyn.label * pat) list * bool (* bool: flexible? *) + | CONSpat of Absyn.longid * pat + | TYPEDpat of pat * Absyn.ty + | ASpat of Absyn.ident * pat + (* pats: derived forms to be lowered or additional attributes to be removed *) + | VIDpat of bool * Absyn.longid (* bool: prefixed by op? 
*) + | TUPLEpat of pat list + | LISTpat of pat list + | INFIXpat of Absyn.ident * pat * pat + + fun pat_to_absyn pat = + case pat + of WILDpat => Absyn.WILDpat + | SCONpat sc => Absyn.SCONpat sc + | RECpat(row, flexible) => + let fun convert(lab, pat) = (lab, pat_to_absyn pat) + in + Absyn.RECpat(map convert row, flexible) + end + | CONSpat(longid, pat) => Absyn.CONSpat(longid, pat_to_absyn pat) + | TYPEDpat(pat, ty) => Absyn.TYPEDpat(pat_to_absyn pat, ty) + | ASpat(vid, pat) => Absyn.ASpat(vid, pat_to_absyn pat) + | VIDpat(_, longid) => Absyn.VIDpat(longid, ref NONE) + | TUPLEpat pats => + let fun loop([], [(_, pat)], _) = pat + | loop([], row, _) = Absyn.RECpat(List.rev row, false) + | loop(pat :: pats, row, i) = loop(pats, (Absyn.INTlab i, pat_to_absyn pat) :: row, i + 1) + in + loop(pats, [], 1) + end + | LISTpat pats => build_listpat(map pat_to_absyn pats) + | INFIXpat(opr, opndL, opndR) => build_infix_pat(opr, pat_to_absyn opndL, pat_to_absyn opndR) + + fun parse_atpat'(tokens, fe) = + case parse_atpat_opt'(tokens, fe) + of SOME pat => pat + | NONE => raise SyntaxError + + and parse_atpat_opt'(tokens, fe) = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.UNDERSCORE => SOME WILDpat + | Token.DIGZ => SOME(SCONpat(Absyn.INTsc(IntInf.fromInt 0))) + | Token.DIGNZ n => SOME(SCONpat(Absyn.INTsc(IntInf.fromInt n))) + | Token.NUMC n => SOME(SCONpat(Absyn.INTsc n)) + | Token.INTC n => SOME(SCONpat(Absyn.INTsc n)) + | Token.WORDC n => SOME(SCONpat(Absyn.WORDsc n)) + (* Token.REALC is disallowed here *) + | Token.STRINGC s => SOME(SCONpat(Absyn.STRINGsc s)) + | Token.CHARC c => SOME(SCONpat(Absyn.CHARsc c)) + | Token.OP => SOME(VIDpat(true, parse_longvid_pat tokens)) + | Token.ID id => SOME(VIDpat(false, Absyn.LONGID([], id))) (* includes STAR, excludes EQ, excludes infix IDs *) + | Token.QUALID(strids, id) => SOME(VIDpat(false, Absyn.LONGID(strids, id))) + | Token.LBRACE => SOME(parse_patrow'(tokens, fe)) + | Token.LPAREN => 
SOME(parse_tuplepat'(tokens, fe)) + | Token.LBRACK => SOME(parse_listpat'(tokens, fe)) + | _ => (tokens_unget(tokens, tok); NONE) + end + + and parse_patrow'(tokens, fe) = + let fun loop acc = + case pat_filter_token(tokens_get tokens) + of Token.DOTDOTDOT => (parse_rbrace tokens; RECpat(List.rev acc, true)) + | Token.DIGNZ n => row_eq(n, acc) + | Token.NUMC n => row_eq(IntInf.toInt n, acc) + | Token.ID id => row_id(id, acc) (* includes STAR, excludes EQ *) + | _ => raise SyntaxError + and row_eq(n, acc) = (parse_eq tokens; row_pat(Absyn.INTlab n, acc)) + and row_pat(lab, acc) = + let val pat = parse_pat' fe tokens + in + next((lab, pat) :: acc) + end + and next acc = next_tok(tokens_get tokens, acc) + and next_tok(tok, acc) = + (case tok + of Token.COMMA => loop acc + | Token.RBRACE => RECpat(List.rev acc, false) + | _ => raise SyntaxError) + and row_id(id, acc) = + (case tokens_get tokens + of Token.EQ => row_pat(Absyn.IDlab id, acc) + | Token.COLON => row_ty(id, acc) + | Token.AS => row_as_pat(NONE, id, acc) + | tok => + let val pat = VIDpat(false, Absyn.LONGID([], id)) + val acc = (Absyn.IDlab id, pat) :: acc + in + next_tok(tok, acc) + end) + and row_ty(id, acc) = + let val ty = parse_ty tokens + in + case tokens_get tokens + of Token.AS => row_as_pat(SOME ty, id, acc) + | tok => + let val pat = TYPEDpat(VIDpat(false, Absyn.LONGID([], id)), ty) + val acc = (Absyn.IDlab id, pat) :: acc + in + next_tok(tok, acc) + end + end + and row_as_pat(tyOpt, id, acc) = + let val pat = parse_pat' fe tokens + val pat = ASpat(id, pat) + val pat = case tyOpt + of NONE => pat + | SOME ty => TYPEDpat(pat, ty) + val acc = (Absyn.IDlab id, pat) :: acc + in + next acc + end + in + case tokens_get tokens + of Token.RBRACE => RECpat([], false) + | tok => (tokens_unget(tokens, tok); loop []) + end + + and parse_tuplepat'(tokens, fe) = + TUPLEpat(comma_item_star_parse(tokens, parse_pat' fe, token_is_rparen)) + + and parse_listpat'(tokens, fe) = + LISTpat(comma_item_star_parse(tokens, 
parse_pat' fe, token_is_rbrack)) + + and parse_apppat'(tokens, fe) = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.OP => parse_apppat_atpat_opt'(true, parse_longvid_pat tokens, tokens, fe) + | Token.ID id => (* includes STAR, excludes EQ, excludes infix IDs *) + parse_apppat_atpat_opt'(false, Absyn.LONGID([], id), tokens, fe) + | Token.QUALID(strids, id) => parse_apppat_atpat_opt'(false, Absyn.LONGID(strids, id), tokens, fe) + | _ => (tokens_unget(tokens, tok); parse_atpat'(tokens, fe)) + end + + and parse_apppat_atpat_opt'(have_op, longid, tokens, fe) = + case parse_atpat_opt'(tokens, fe) + of SOME pat => CONSpat(longid, pat) + | NONE => VIDpat(have_op, longid) + + and parse_infpat'(tokens, fe) = parse_infpat3'(tokens, fe, NONE) + + and parse_infpat3'(tokens, fe, init) = + parse_infix(tokens, fe, false, parse_apppat', INFIXpat, init) + + and parse_pat' fe tokens = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.OP => parse_pat_after_vid'(true, parse_vid tokens, tokens, fe) (* includes STAR, excludes EQ *) + | Token.ID id => parse_pat_after_vid'(false, id, tokens, fe) (* includes STAR, excludes EQ, excludes infix IDs *) + | _ => + let val _ = tokens_unget(tokens, tok) + val pat = parse_infpat'(tokens, fe) + in + case tokens_get tokens + of Token.COLON => TYPEDpat(pat, parse_ty tokens) + | tok => (tokens_unget(tokens, tok); pat) + end + end + + and parse_pat_after_vid'(have_op, id, tokens, fe) = + case tokens_get tokens + of Token.COLON => + let val ty = parse_ty tokens + in + case tokens_get tokens + of Token.AS => + let val pat = parse_pat' fe tokens + in + TYPEDpat(ASpat(id, pat), ty) + end + | tok => (tokens_unget(tokens, tok); TYPEDpat(VIDpat(have_op, Absyn.LONGID([], id)), ty)) + end + | Token.AS => + let val pat = parse_pat' fe tokens + in + ASpat(id, pat) + end + | tok => + let val _ = tokens_unget(tokens, tok) + val apppat = parse_apppat_atpat_opt'(have_op, Absyn.LONGID([], id), 
tokens, fe) + val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.INFID(id, prio) => parse_infpat3'(tokens, fe, SOME(apppat, id, prio)) + | _ => (tokens_unget(tokens, tok); apppat) + end + + fun parse_atpat(tokens, fe) = pat_to_absyn(parse_atpat'(tokens, fe)) + fun parse_pat(tokens, fe) = pat_to_absyn(parse_pat' fe tokens) + + (* + * Parse and resolve function clause parameter syntax. + * A function clause can have one of the following three forms: + * + * 1. vid atpat1 ... atpatn <:ty> = exp (n >= 1) + * 2. (atpat1 vid atpat2) atpat3 ... atpatn <:ty> = exp (n >= 2, vid is infix) + * 3. atpat1 vid atpat2 <:ty> = exp (vid is infix) + * + * XXX: merge parse & resolve into a single state machine for improved error reporting? + *) + + datatype farg (* not "funarg" since historically that has a different meaning *) + = INFIDfarg of Absyn.ident + | ATPATfarg of pat + + fun parse_fargs(tokens, fe) = + let fun loop acc = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.INFID(id, _) => loop(INFIDfarg id :: acc) + | Token.COLON => (tokens_unget(tokens, tok); List.rev acc) + | Token.EQ => (tokens_unget(tokens, tok); List.rev acc) + | _ => (tokens_unget(tokens, tok); loop(ATPATfarg(parse_atpat'(tokens, fe)) :: acc)) + end + in + loop [] + end + + fun resolve_fargs fargs = + let fun farg2pat(ATPATfarg pat) = pat + | farg2pat(INFIDfarg _) = raise SyntaxError + fun finish(id, pats) = + let val arity = length pats + val pat = pat_to_absyn(TUPLEpat pats) + in + (id, arity, pat) + end + in + case fargs + of [ATPATfarg pat1, INFIDfarg id, ATPATfarg pat2] => (* form #3 *) + (id, 1, pat_to_absyn(TUPLEpat[pat1, pat2])) + | ATPATfarg(TUPLEpat[INFIXpat(id, pat1, pat2)]) :: rest => (* form #2 *) + finish(id, TUPLEpat[pat1, pat2] :: map farg2pat rest) + | ATPATfarg(VIDpat(_, Absyn.LONGID([], id))) :: rest => (* form #1 *) + finish(id, map farg2pat rest) + | _ => raise SyntaxError + end + + fun build_fdef fclauses = + let fun 
mkvids(0, vids, [(_, exp)]) = (vids, exp) + | mkvids(0, vids, row) = (vids, Absyn.RECexp row) + | mkvids(i, vids, row) = + let val vid = Absyn.gensym() + in + mkvids(i - 1, vid :: vids, (Absyn.INTlab i, Absyn.VIDexp(Absyn.LONGID([], vid), ref NONE)) :: row) + end + fun mkfn([], exp) = exp + | mkfn(vid :: vids, exp) = Absyn.FNexp(Absyn.MATCH([(Absyn.VIDpat(Absyn.LONGID([], vid), ref NONE), mkfn(vids, exp))])) + fun mkmatch(vid :: vids, exp) = Absyn.MATCH([(Absyn.VIDpat(Absyn.LONGID([], vid), ref NONE), mkfn(vids, exp))]) + | mkmatch([], _) = raise SyntaxError + fun mkfdef(name, arity, mrules) = + let val match = Absyn.MATCH mrules + val (vids, exp) = mkvids(arity, [], []) + val exp = Absyn.APPexp(Absyn.FNexp match, exp) + val match = mkmatch(vids, exp) + in + (Absyn.VIDpat(Absyn.LONGID([], name), ref NONE), match) + end + fun check(fclauses, name, arity, mrules) = + case fclauses + of [] => mkfdef(name, arity, List.rev mrules) + | (name', arity', mrule') :: fclauses' => + if name = name' andalso arity = arity' then check(fclauses', name, arity, mrule' :: mrules) + else raise SyntaxError + in + case fclauses + of (name, arity, mrule) :: fclauses' => check(fclauses', name, arity, [mrule]) + | [] => raise SyntaxError + end + + (* + * Grammar: Declarations (Core) + * + * dec ::= 'val' tyvarseq valbind + * | 'fun' tyvarseq fvalbind + * | 'type' typbind + * | 'datatype' datbind <'withtype' typbind> + * | 'datatype' tycon '=' 'datatype' longtycon + * | 'abstype' datbind <'withtype' typbind> 'with' dec 'end' + * | 'exception' exbind + * | 'local' dec1 'in' dec2 'end' + * | 'open' longstrid1 ... longstridn (n >= 1) + * | (empty) + * | dec1 <';'> dec2 + * | infix <d> vid1 ... vidn (n >= 1) + * | infixr <d> vid1 ... vidn (n >= 1) + * | nonfix vid1 ...
vidn (n >= 1) + * valbind ::= pat '=' exp <'and' valbind> + * | 'rec' valbind (Note 1) + * fvalbind ::= fclause <'and' fvalbind> + * fdef ::= fclause <'|' fclause> + * fclause ::= <'op'>vid atpat1...atpatn <':' ty> '=' exp (Note 2, 3) + * typbind ::= tyvarseq tycon '=' ty <'and' typbind> + * datbind ::= tyvarseq tycon '=' conbind <'and' datbind> + * conbind ::= <'op'>vid <'of' ty> <'|' conbind> + * exbind ::= <'op'>vid <'of' ty> <'and' exbind> + * | <'op'>vid '=' <'op'>longvid <'and' exbind> + * + * Note 1: For each value binding "pat = exp" within "rec", "exp" must be of + * the form "fn match". + * + * Note 2: If "vid" has infix status, then either "op" must be present, or + * "vid" must be infixed as "(atpat1 vid atpat2) ..."; the parentheses may + * be dropped if n = 2, i.e. if ": ty" or "= exp" follows immediately. + *) + + fun parse_op_opt_vid(tokens, fe) = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.OP => parse_vid tokens + | Token.ID id => id + | _ => raise SyntaxError + end + + fun parse_op_opt_longvid(tokens, fe) = + let val tok = tokens_get tokens + in + case pat_fe_filter_token(fe, tok) + of Token.OP => parse_longvid_pat tokens + | Token.ID id => Absyn.LONGID([], id) + | Token.QUALID(strids, id) => Absyn.LONGID(strids, id) + | _ => raise SyntaxError + end + + fun parse_exbind_item fe tokens = + let val id = parse_op_opt_vid(tokens, fe) + in + case tokens_get tokens + of Token.OF => Absyn.OFexb(id, parse_ty tokens) + | Token.EQ => Absyn.EQexb(id, parse_op_opt_longvid(tokens, fe)) + | tok => (tokens_unget(tokens, tok); Absyn.CONexb id) + end + + fun parse_exbind(tokens, fe) = + and_item_plus_parse(tokens, parse_exbind_item fe) + + fun parse_exception_dec(tokens, fe) = Absyn.EXdec(parse_exbind(tokens, fe)) + + fun parse_ofty_opt tokens = + case tokens_get tokens + of Token.OF => SOME(parse_ty tokens) + | tok => (tokens_unget(tokens, tok); NONE) + + fun parse_conbind_item fe tokens = + let val id = 
parse_op_opt_vid(tokens, fe) + val tyOpt = parse_ofty_opt tokens + in + (id, tyOpt) + end + + fun parse_conbind(tokens, fe) = + Absyn.CONBIND(bar_item_plus_parse(tokens, parse_conbind_item fe)) + + (* A tyvarseq may be followed by a tycon (in typbinds and datbinds) or by a + * pat (in valbinds), so after seeing "(" we must inspect one more token to + * determine if the "(" is part of the tyvarseq or not. Therefore this + * requires two tokens of lookahead. + *) + fun parse_tyvarseq tokens = + case tokens_get tokens + of Token.TYVAR id => [id] + | (tok as Token.LPAREN) => + (case tokens_get tokens + of Token.TYVAR id => + (case tokens_get tokens + of Token.RPAREN => [id] + | Token.COMMA => id :: comma_item_plus_parse(tokens, parse_tyvar, token_is_rparen) + | _ => raise SyntaxError) + | tok2 => (tokens_unget(tokens, tok2); tokens_unget(tokens, tok); [])) + | tok => (tokens_unget(tokens, tok); []) + + val parse_tycon = parse_id (* excludes STAR *) + + fun parse_tyvarseq_tycon tokens = + let val tyvarseq = parse_tyvarseq tokens + val tycon = parse_tycon tokens + in + (tyvarseq, tycon) + end + + fun parse_tyvarseq_tycon_eq tokens = + let val (tyvarseq, tycon) = parse_tyvarseq_tycon tokens + val _ = parse_eq tokens + in + (tyvarseq, tycon) + end + + fun parse_tyvarseq_tycon_eq_ty tokens = + let val (tyvarseq, tycon) = parse_tyvarseq_tycon_eq tokens + val ty = parse_ty tokens + in + (tyvarseq, tycon, ty) + end + + fun parse_typbind tokens = + Absyn.TYPBIND(and_item_plus_parse(tokens, parse_tyvarseq_tycon_eq_ty)) + + fun parse_type_dec tokens = Absyn.TYPEdec(parse_typbind tokens) + + fun parse_datbind_item fe tokens = + let val (tyvarseq, tycon) = parse_tyvarseq_tycon_eq tokens + val conbind = parse_conbind(tokens, fe) + in + (tyvarseq, tycon, conbind) + end + + fun parse_datbind(datbOpt, tokens, fe) = + let fun parse() = and_item_plus_parse(tokens, parse_datbind_item fe) + val datbinds = + case datbOpt + of NONE => parse() + | SOME datb => + case tokens_get tokens + of Token.AND => datb :: parse() + | tok => (tokens_unget(tokens, tok); [datb]) + in
+ Absyn.DATBIND datbinds + end + + fun parse_withtype_opt tokens = + case tokens_get tokens + of Token.WITHTYPE => parse_typbind tokens + | tok => (tokens_unget(tokens, tok); Absyn.TYPBIND []) + + (* XXX: check that it's not QUALID(strids, "*") *) + fun parse_longtycon tokens = check_longid(tokens_get tokens) + + fun parse_datatype_dec(tokens, fe) = + let fun do_datbind datbOpt = + let val datbind = parse_datbind(datbOpt, tokens, fe) + val typbind = parse_withtype_opt tokens + in + Absyn.DATATYPEdec(datbind, typbind) + end + in + case tokens_get tokens + of Token.ID tycon => + let val _ = parse_eq tokens + in + case tokens_get tokens + of Token.DATATYPE => Absyn.DATAREPLdec(tycon, parse_longtycon tokens) + | tok => + (* We've seen "datatype tycon = X" where X isn't "datatype", + * so we must have a conbind and possibly more datbinds. + * If our token pushback buffer was larger we could just + * push back "X", "=", and "tycon" and retry at datbind. + * Instead we parse the first datbind, and make parse_datbind + * accept an optional pre-parsed datbind. 
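+ * For example, given "datatype t = A | B", by this point the parser has
+ * consumed "t", "=", and peeked at "A"; it ungets "A", parses "A | B" as
+ * a conbind, and passes ([], "t", conbind) to parse_datbind as the
+ * pre-parsed first datbind.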
+ * + * XXX: rewrite to use new unbounded pushback buffer + *) + let val _ = tokens_unget(tokens, tok) + val conbind = parse_conbind(tokens, fe) + val datb = ([], tycon, conbind) + in + do_datbind(SOME datb) + end + end + | tok => (tokens_unget(tokens, tok); do_datbind NONE) + end + + fun is_strid id = Char.isAlpha(String.sub(id, 0)) + + fun check_strid id = if is_strid id then () else raise SyntaxError + + fun parse_longstrid tokens = + let val (longstrid as Absyn.LONGID(_, strid)) = parse_longtycon tokens + val _ = check_strid strid + in + longstrid + end + + fun parse_open_longstrids tokens = + let fun loop acc = + let fun check(strids, id) = (check_strid id; loop(Absyn.LONGID(strids, id) :: acc)) + in + case tokens_get tokens + of Token.ID id => check([], id) + | Token.QUALID(strids, id) => check(strids, id) + | tok => (tokens_unget(tokens, tok); List.rev acc) + end + in + loop [parse_longstrid tokens] + end + + fun parse_open_dec tokens = Absyn.OPENdec(parse_open_longstrids tokens) + + fun parse_fixity_vids tokens = + let fun loop acc = + let val tok = tokens_get tokens + in + case pat_filter_token tok + of Token.ID id => loop(id :: acc) + | _ => (tokens_unget(tokens, tok); List.rev acc) + end + in + loop [parse_vid tokens] + end + + fun parse_infix_directive tokens = + let val d = + case tokens_get tokens + of Token.DIGNZ d => d + | tok => (tokens_unget(tokens, tok); 0) + in + (d, parse_fixity_vids tokens) + end + + fun process_fixity_directive(prio, vids) = + List.foldl (fn(id, fe) => fe_insert(fe, id, prio)) fe_empty vids + + fun parse_infix_dec tokens = + let val (d, vids) = parse_infix_directive tokens + val prio = d + 1 + in + process_fixity_directive(prio, vids) + end + + fun parse_infixr_dec tokens = + let val (d, vids) = parse_infix_directive tokens + val prio = ~(d + 1) + in + process_fixity_directive(prio, vids) + end + + fun parse_nonfix_dec tokens = + let val vids = parse_fixity_vids tokens + val prio = 0 + in + process_fixity_directive(prio, 
vids) + end + + (* Declarations parsing needs to be greedy, except when parsing top-level + * declarations at the REPL, in which case we should stop at the first SEMI. + * + * We pass in an inherited fixity environment to be used for parsing, and + * return a synthesized fixity environment containing only the additions + * from this declaration, if any. Sequencing combines these appropriately. + * For LOCAL dec1 IN dec2 END, the synthesized bindings from dec1 are merged + * with the initial inherited bindings to be the inherited bindings for dec2, + * whose synthesized bindings become the overall synthesized bindings. + *) + fun parse_dec(tokens, feIn, stopAtSemi) = + let fun outer(acc, feIn, feOut) = + let fun done() = (Absyn.DEC(List.rev acc), feOut) + fun onlydec dec = outer(dec :: acc, feIn, feOut) + fun fixity feOut' = outer(acc, fe_plus(feIn, feOut'), fe_plus(feOut, feOut')) + fun both(dec, feOut') = outer(dec :: acc, fe_plus(feIn, feOut'), fe_plus(feOut, feOut')) + in + case tokens_get tokens + of Token.VAL => onlydec(parse_val_dec(tokens, feIn)) + | Token.FUN => onlydec(parse_fun_dec(tokens, feIn)) + | Token.TYPE => onlydec(parse_type_dec tokens) + | Token.DATATYPE => onlydec(parse_datatype_dec(tokens, feIn)) + | Token.ABSTYPE => both(parse_abstype_dec(tokens, feIn)) + | Token.EXCEPTION => onlydec(parse_exception_dec(tokens, feIn)) + | Token.LOCAL => both(parse_local_dec(tokens, feIn)) + | Token.OPEN => onlydec(parse_open_dec tokens) + | Token.INFIX => fixity(parse_infix_dec tokens) + | Token.INFIXR => fixity(parse_infixr_dec tokens) + | Token.NONFIX => fixity(parse_nonfix_dec tokens) + | Token.SEMICOLON => if stopAtSemi then done() else outer(acc, feIn, feOut) + | tok => (tokens_unget(tokens, tok); done()) + end + in + outer([], feIn, fe_empty) + end + + and parse_abstype_dec(tokens, feIn) = + let val datbind = parse_datbind(NONE, tokens, feIn) + val typbind = parse_withtype_opt tokens + val _ = parse_with tokens + val (dec, feOut) = parse_dec(tokens, 
feIn, false) + val _ = parse_end tokens + in + (Absyn.ABSTYPEdec(datbind, typbind, dec), feOut) + end + + and parse_local_dec(tokens, feIn) = + let val (dec1, feOut1) = parse_dec(tokens, feIn, false) + val _ = parse_in tokens + val (dec2, feOut2) = parse_dec(tokens, fe_plus(feIn, feOut1), false) + val _ = parse_end tokens + in + (Absyn.LOCALdec(dec1, dec2), feOut2) + end + + and parse_val_dec(tokens, fe) = + let val tyvarseq = parse_tyvarseq tokens + fun done(nonrecs, recs) = Absyn.VALdec(tyvarseq, nonrecs, recs) + fun dorec(nonrecs, recs) = + done(nonrecs, and_item_plus_parse(tokens, parse_valrec_item fe)) + fun nonrec acc = + case tokens_get tokens + of Token.REC => dorec(List.rev acc, []) + | tok => + let val _ = tokens_unget(tokens, tok) + val pat = parse_pat(tokens, fe) + val _ = parse_eq tokens + val exp = parse_exp fe tokens + val acc = (pat, exp) :: acc + in + case tokens_get tokens + of Token.AND => nonrec acc + | tok => (tokens_unget(tokens, tok); done(acc, [])) + end + in + nonrec [] + end + + and parse_valrec_item fe tokens = + let val pat = parse_pat(tokens, fe) + val _ = parse_eq tokens + val _ = parse_fn tokens + val match = parse_match(tokens, fe) + in + (pat, match) + end + + and parse_fun_dec(tokens, fe) = + let val tyvarseq = parse_tyvarseq tokens + in + Absyn.VALdec(tyvarseq, [], and_item_plus_parse(tokens, parse_fdef fe)) + end + + and parse_fdef fe tokens = + build_fdef(bar_item_plus_parse(tokens, parse_fclause fe)) + + and parse_fclause fe tokens = + let val (name, arity, pat) = resolve_fargs(parse_fargs(tokens, fe)) + val tyOpt = + case tokens_get tokens + of Token.COLON => SOME(parse_ty tokens) + | tok => (tokens_unget(tokens, tok); NONE) + val _ = parse_eq tokens + val exp = parse_exp fe tokens + val exp = + case tyOpt + of SOME ty => Absyn.TYPEDexp(exp, ty) + | NONE => exp + val mrule = (pat, exp) + in + (name, arity, mrule) + end + + (* + * Grammar: Expressions + * + * atexp ::= scon + * | <'op'>longvid + * | '{' '}' + * | '#' lab + * | 
'(' exp1 ',' ... ',' expn ')' (n >= 0) + * | '[' exp1 ',' ... ',' expn ']' (n >= 0) + * | '(' exp1 ';' ... ';' expn ')' (n >= 2) + * | 'let' dec 'in' exp1 ';' ... ';' expn 'end' (n >= 1) + * exprow ::= lab '=' exp <',' exprow> + * appexp ::= atexp + * | appexp atexp + * infexp ::= appexp + * | infexp1 vid infexp2 (vid is infix) + * exp ::= infexp (highest precedence) + * | exp ':' ty + * | exp1 'andalso' exp2 + * | exp1 'orelse' exp2 + * | exp 'handle' match + * | 'raise' exp + * | 'if' exp1 'then' exp2 'else' exp3 + * | 'while' exp1 'do' exp2 + * | 'case' exp 'of' match + * | 'fn' match (lowest precedence) + * match ::= mrule <'|' match> + * mrule ::= pat '=>' exp + *) + + and parse_atexp(tokens, fe) = + case parse_atexp_opt(tokens, fe) + of SOME exp => exp + | NONE => raise SyntaxError + + and parse_atexp_opt(tokens, fe) = + let val tok = tokens_get tokens + in + case exp_fe_filter_token(fe, tok) + of Token.DIGZ => SOME(Absyn.SCONexp(Absyn.INTsc(IntInf.fromInt 0))) + | Token.DIGNZ n => SOME(Absyn.SCONexp(Absyn.INTsc(IntInf.fromInt n))) + | Token.NUMC n => SOME(Absyn.SCONexp(Absyn.INTsc n)) + | Token.INTC n => SOME(Absyn.SCONexp(Absyn.INTsc n)) + | Token.WORDC n => SOME(Absyn.SCONexp(Absyn.WORDsc n)) + | Token.REALC r => SOME(Absyn.SCONexp(Absyn.REALsc r)) + | Token.STRINGC s => SOME(Absyn.SCONexp(Absyn.STRINGsc s)) + | Token.CHARC c => SOME(Absyn.SCONexp(Absyn.CHARsc c)) + | Token.OP => SOME(Absyn.VIDexp(parse_longvid_exp tokens, ref NONE)) + | Token.ID id => SOME(Absyn.VIDexp(Absyn.LONGID([], id), ref NONE)) (* includes STAR and EQ, excludes infix IDs *) + | Token.QUALID(strids, id) => SOME(Absyn.VIDexp(Absyn.LONGID(strids, id), ref NONE)) + | Token.LBRACE => SOME(parse_exprow(tokens, fe)) + | Token.LPAREN => SOME(parse_lparen_exp(tokens, fe)) + | Token.LBRACK => SOME(parse_listexp(tokens, fe)) + | Token.LET => SOME(parse_letexp(tokens, fe)) + | Token.HASH => SOME(parse_hashexp tokens) + | _ => (tokens_unget(tokens, tok); NONE) + end + + and
parse_exprow(tokens, fe) = + Absyn.RECexp(comma_item_star_parse(tokens, parse_exprow_item fe, token_is_rbrace)) + + and parse_exprow_item fe tokens = + let val lab = parse_label tokens + val _ = parse_eq tokens + val exp = parse_exp fe tokens + in + (lab, exp) + end + + and parse_lparen_exp(tokens, fe) = + let fun tuple(acc, i) = + let val exp = parse_exp fe tokens + val acc = (Absyn.INTlab i, exp) :: acc + val i = i + 1 + in + case tokens_get tokens + of Token.COMMA => tuple(acc, i) + | Token.RPAREN => Absyn.RECexp(List.rev acc) + | _ => raise SyntaxError + end + in + case tokens_get tokens + of Token.RPAREN => Absyn.RECexp [] + | tok => + let val _ = tokens_unget(tokens, tok) + val exp = parse_exp fe tokens + in + case tokens_get tokens + of Token.RPAREN => exp + | Token.COMMA => tuple([(Absyn.INTlab 1, exp)], 2) + | Token.SEMICOLON => + let val exp2 = parse_sequence_exp(tokens, fe) + val _ = parse_rparen tokens + in + build_seq_exp(exp, exp2) + end + | _ => raise SyntaxError + end + end + + and parse_sequence_exp(tokens, fe) = (* exp1 ; ... 
; expn *) + rr_parse(tokens, parse_exp fe, token_is_semicolon, build_seq_exp) + + and parse_listexp(tokens, fe) = + build_listexp(comma_item_star_parse(tokens, parse_exp fe, token_is_rbrack)) + + and parse_letexp(tokens, feIn) = + let val (dec, feOut) = parse_dec(tokens, feIn, false) + val _ = parse_in tokens + val fe = fe_plus(feIn, feOut) + val exp = parse_sequence_exp(tokens, fe) + val _ = parse_end tokens + in + Absyn.LETexp(dec, exp) + end + + and parse_hashexp tokens = + build_hashexp(parse_label tokens) + + and parse_appexp(tokens, fe) = + let fun loop f = + case parse_atexp_opt(tokens, fe) + of SOME atexp => loop(Absyn.APPexp(f, atexp)) + | NONE => f + in + loop(parse_atexp(tokens, fe)) + end + + and parse_infexp(tokens, fe) = + parse_infix(tokens, fe, true, parse_appexp, build_infix_exp, NONE) + + and parse_exp fe tokens = + let fun parse_exp_prio(tokens, fe, prioL) = + let fun loop expL = + let val tok = tokens_get tokens + fun done() = (tokens_unget(tokens, tok); expL) + in + case tok + of Token.COLON => (* prio 9, always > prioL *) + loop(Absyn.TYPEDexp(expL, parse_ty tokens)) + | Token.ANDALSO => (* prio 8 *) + if prioL >= 8 then done() + else loop(build_andalso(expL, parse_exp_prio(tokens, fe, 8))) + | Token.ORELSE => (* prio 7 *) + if prioL >= 7 then done() + else loop(build_orelse(expL, parse_exp_prio(tokens, fe, 7))) + | Token.HANDLE => (* prio 6 *) + if prioL >= 6 then done() + else loop(Absyn.HANDLEexp(expL, parse_match(tokens, fe))) + | tok => done() + end + in + (* The relative priorities of the following cases don't matter, + only that they are all less than those of the operators above. 
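+ For example, "exp1 andalso exp2 orelse exp3" parses as
+ "(exp1 andalso exp2) orelse exp3", and in "raise exp handle match" the
+ "handle" binds inside the raised expression.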
*) + case tokens_get tokens + of Token.RAISE => (* prio 5 *) + loop(Absyn.RAISEexp(parse_exp_prio(tokens, fe, 5))) + | Token.IF => (* prio 4 *) + let val exp1 = parse_exp fe tokens + val _ = parse_then tokens + val exp2 = parse_exp fe tokens + val _ = parse_else tokens + val exp3 = parse_exp_prio(tokens, fe, 4) + in + loop(build_if_exp(exp1, exp2, exp3)) + end + | Token.WHILE => (* prio 3 *) + let val exp1 = parse_exp fe tokens + val _ = parse_do tokens + val exp2 = parse_exp_prio(tokens, fe, 3) + in + loop(build_while_exp(exp1, exp2)) + end + | Token.CASE => (* prio 2 *) + let val exp = parse_exp fe tokens + val _ = parse_of tokens + val match = parse_match(tokens, fe) + in + loop(build_case_exp(exp, match)) + end + | Token.FN => (* prio 1 *) + loop(Absyn.FNexp(parse_match(tokens, fe))) + | tok => (tokens_unget(tokens, tok); loop(parse_infexp(tokens, fe))) + end + in + parse_exp_prio(tokens, fe, 0) + end + + and parse_match(tokens, fe) = + Absyn.MATCH(bar_item_plus_parse(tokens, parse_mrule fe)) + + and parse_mrule fe tokens = + let val pat = parse_pat(tokens, fe) + val _ = parse_fatarrow tokens + val exp = parse_exp fe tokens + in + (pat, exp) + end + + (* + * Grammar: Specifications and Signature Expressions + * + * spec ::= 'val' valdesc + * | 'type' typdesc + * | 'type' typdesc' (derived form) + * | 'eqtype' typdesc + * | 'datatype' datdesc + * | 'datatype' tycon '=' 'datatype' longtycon + * | 'exception' exdesc + * | 'structure' strdesc + * | 'include' sigexp + * | 'include' sigid1 ... sigidn (n >= 2, derived form) + * | (empty) + * | spec1 <';'> spec2 (Note 1) + * | spec 'sharing' 'type' longtycon1 '=' ... '=' longtyconn (n >= 2) + * | spec 'sharing' longstrid1 '=' ... 
'=' longstridn (n >= 2, derived form) + * valdesc ::= vid ':' ty <'and' valdesc> + * typdesc ::= tyvarseq tycon <'and' typdesc> + * typdesc' ::= tyvarseq tycon '=' ty <'and' typdesc'> (derived form) + * datdesc ::= tyvarseq tycon '=' condesc <'and' datdesc> + * condesc ::= vid <'of' ty> <'|' condesc> + * exdesc ::= vid <'of' ty> <'and' exdesc> + * strdesc ::= strid ':' sigexp <'and' strdesc> + * + * sigexp ::= 'sig' spec 'end' + * | sigid + * | sigexp 'where' 'type' tyvarseq tycon '=' ty + * | sigexp 'where' typdesc'' (derived form, Note 2) + * typdesc''::= 'type' tyvarseq tycon '=' ty <'and' typdesc''> (derived form, Note 2) + * + * sigdec ::= 'signature' sigbind + * sigbind ::= sigid '=' sigexp <'and' sigbind> + * + * Note 1: Restriction: In a sequential specification, spec2 may not contain a sharing specification. + * (From SuccessorML, "Fixing various minor issues with the formal syntax".) + * + * Note 2: A sigexp can be followed by "'and' strdesc" or "'and' sigbind", but can also + * have "'and' 'type' tyvarseq tycon '=' ty" at its end, necessitating two tokens of + * lookahead. SuccessorML removes the "'and' 'type' ..." derived form from sigexps.
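+ * E.g. after "structure S : sig end" the token "and" may either continue
+ * the strdesc (as in "and T : sig end") or extend the sigexp via the
+ * derived form (as in "and type t = int"); only the token after "and"
+ * disambiguates.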
+ *) + + fun parse_valdesc_item tokens = + let val vid = parse_vid tokens + val _ = parse_colon tokens + val ty = parse_ty tokens + in + (vid, ty) + end + + fun parse_val_spec tokens = + Absyn.VALspec(and_item_plus_parse(tokens, parse_valdesc_item)) + + fun parse_typdesc tokens = and_item_plus_parse(tokens, parse_tyvarseq_tycon) + + fun parse_type_spec tokens = + let val (tyvarseq, tycon) = parse_tyvarseq_tycon tokens + in + case tokens_get tokens + of Token.AND => + [Absyn.TYPEspec((tyvarseq, tycon) :: parse_typdesc tokens)] + | Token.EQ => + let fun assemble(tyvarseq, tycon, ty) = + Absyn.INCLUDEspec(Absyn.WHEREsigexp(Absyn.SPECsigexp(Absyn.SPEC[Absyn.TYPEspec[(tyvarseq, tycon)]]), + tyvarseq, Absyn.LONGID([], tycon), ty)) + fun parse_item tokens = + let val (tyvarseq, tycon, ty) = parse_tyvarseq_tycon_eq_ty tokens + in + assemble(tyvarseq, tycon, ty) + end + val ty = parse_ty tokens + val first = assemble(tyvarseq, tycon, ty) + in + case tokens_get tokens + of Token.AND => List.rev(first :: and_item_plus_parse(tokens, parse_item)) + | tok => (tokens_unget(tokens, tok); [first]) + end + | tok => (tokens_unget(tokens, tok); [Absyn.TYPEspec[(tyvarseq, tycon)]]) + end + + fun parse_eqtype_spec tokens = Absyn.EQTYPEspec(parse_typdesc tokens) + + fun parse_condesc_item tokens = + let val vid = parse_vid tokens + val tyOpt = parse_ofty_opt tokens + in + (vid, tyOpt) + end + + fun parse_condesc tokens = + Absyn.CONBIND(bar_item_plus_parse(tokens, parse_condesc_item)) + + fun parse_datdesc_item tokens = + let val (tyvarseq, tycon) = parse_tyvarseq_tycon_eq tokens + val condesc = parse_condesc tokens + in + (tyvarseq, tycon, condesc) + end + + fun parse_datdesc(datbOpt, tokens) = + let fun parse() = and_item_plus_parse(tokens, parse_datdesc_item) + val datbinds = + case datbOpt + of NONE => parse() + | SOME datb => + case tokens_get tokens + of Token.AND => datb :: parse() + | tok => (tokens_unget(tokens, tok); [datb]) + in + Absyn.DATBIND datbinds + end + + fun 
parse_datatype_spec tokens = + let fun do_datdesc datbOpt = Absyn.DATATYPEspec(parse_datdesc(datbOpt, tokens)) + in + case tokens_get tokens + of Token.ID tycon => + let val _ = parse_eq tokens + in + case tokens_get tokens + of Token.DATATYPE => Absyn.DATAREPLspec(tycon, parse_longtycon tokens) + | tok => + (* We've seen "datatype tycon = X" where X isn't "datatype", + * so we must have a condesc and possibly more datdescs. + * If our token pushback buffer was larger we could just + * push back "X", "=", and "tycon" and retry at datdesc. + * Instead we parse the first datdesc, and make parse_datdesc + * accept an optional pre-parsed datdesc. + * + * XXX: rewrite to use new unbounded buffer + *) + let val _ = tokens_unget(tokens, tok) + val condesc = parse_condesc tokens + val datb = ([], tycon, condesc) + in + do_datdesc(SOME datb) + end + end + | tok => (tokens_unget(tokens, tok); do_datdesc NONE) + end + + val parse_exdesc_item = parse_condesc_item + + fun parse_exception_spec tokens = + Absyn.EXspec(Absyn.CONBIND(and_item_plus_parse(tokens, parse_exdesc_item))) + + fun parse_longtycons tokens = + let fun loop acc = + let val acc = parse_longtycon tokens :: acc + in + case tokens_get tokens + of Token.EQ => loop acc + | tok => (tokens_unget(tokens, tok); List.rev acc) + end + val longtycon = parse_longtycon tokens + val _ = parse_eq tokens + in + loop [longtycon] + end + + fun parse_sharing_longstrids tokens = + let fun loop acc = + let val acc = parse_longstrid tokens :: acc + in + case tokens_get tokens + of Token.EQ => loop acc + | tok => (tokens_unget(tokens, tok); List.rev acc) + end + val longstrid = parse_longstrid tokens + val _ = parse_eq tokens + in + loop [longstrid] + end + + fun parse_sharing_spec(tokens, acc) = + let fun parse spec = + case tokens_get tokens + of Token.TYPE => Absyn.SHARINGTYspec(spec, parse_longtycons tokens) + | tok => + let val _ = tokens_unget(tokens, tok) + in + Absyn.SHARINGSTRspec(spec, parse_sharing_longstrids tokens) + end 
+ fun loop spec = + let val spec = parse spec + in + case tokens_get tokens + of Token.SHARING => loop(Absyn.SPEC[spec]) + | tok => (tokens_unget(tokens, tok); [spec]) + end + in + loop(Absyn.SPEC(List.rev acc)) + end + + (* XXX: use new unbounded pushback buffer, do NOT return tok *) + + fun parse_sigexp_where(sigexp, tokens) = + let fun loop sigexp = + let val _ = parse_type tokens + val tyvarseq = parse_tyvarseq tokens + val longtycon = parse_longtycon tokens + val _ = parse_eq tokens + val ty = parse_ty tokens + val sigexp = Absyn.WHEREsigexp(sigexp, tyvarseq, longtycon, ty) + in + case tokens_get tokens + of Token.AND => loop sigexp + | Token.WHERE => loop sigexp + | tok => (sigexp, tok) (* tok is consumed, as in parse_sigexp_where_opt *) + end + in + loop sigexp + end + + fun parse_sigexp_where_opt(sigexp, tokens) = + case tokens_get tokens + of Token.WHERE => parse_sigexp_where(sigexp, tokens) + | tok => (sigexp, tok) + + fun parse_strid tokens = + let val strid = parse_id tokens + val _ = check_strid strid + in + strid + end + + val parse_sigid = parse_strid + val parse_funid = parse_strid + + fun parse_spec tokens = + let fun next acc = loop(acc, tokens_get tokens) + and loop(acc, tok) = + let fun onespec spec = next(spec :: acc) + fun specs s = next(s @ acc) + fun specntok(spec, tok) = loop(spec :: acc, tok) + fun specsntok(specs, tok) = loop(specs @ acc, tok) + in + case tok + of Token.VAL => onespec(parse_val_spec tokens) + | Token.TYPE => specs(parse_type_spec tokens) + | Token.EQTYPE => onespec(parse_eqtype_spec tokens) + | Token.DATATYPE => onespec(parse_datatype_spec tokens) + | Token.EXCEPTION => onespec(parse_exception_spec tokens) + | Token.STRUCTURE => specntok(parse_structure_spec tokens) + | Token.INCLUDE => specsntok(parse_include_spec tokens) + | Token.SEMICOLON => next acc + | Token.SHARING => next(parse_sharing_spec(tokens, acc)) + | _ => (Absyn.SPEC(List.rev acc), tok) + end + in + next [] + end + + and parse_structure_spec tokens = + let fun loop acc = + let 
val strid = parse_strid tokens + val _ = parse_colon tokens + val (sigexp, tok) = parse_sigexp tokens + val acc = (strid, sigexp) :: acc + in + case tok + of Token.AND => loop acc + | _ => (Absyn.STRUCTUREspec(List.rev acc), tok) + end + in + loop [] + end + + and parse_include_spec tokens = + case tokens_get tokens + of Token.ID id => + let fun id2incl id = Absyn.INCLUDEspec(Absyn.SIGIDsigexp id) + val _ = check_strid id + val tok = tokens_get tokens + in + case tok + of Token.ID id2 => + if is_strid id2 then + let fun loop specs = + let val tok = tokens_get tokens + in + case tok + of Token.ID id3 => + if is_strid id3 then loop(id2incl id3 :: specs) + else (specs, tok) + | _ => (specs, tok) + end + in + loop [id2incl id2, id2incl id] + end + else ([id2incl id], tok) + | Token.WHERE => + let val (sigexp, tok) = parse_sigexp_where(Absyn.SIGIDsigexp id, tokens) + in + ([Absyn.INCLUDEspec sigexp], tok) + end + | _ => ([id2incl id], tok) + end + | tok => + let val _ = tokens_unget(tokens, tok) + val (sigexp, tok) = parse_sigexp tokens + in + ([Absyn.INCLUDEspec sigexp], tok) + end + + and parse_sigexp tokens = + case tokens_get tokens + of Token.SIG => + let val (spec, tok) = parse_spec tokens + val _ = check_end tok + in + parse_sigexp_where_opt(Absyn.SPECsigexp spec, tokens) + end + | Token.ID id => + (check_strid id; + parse_sigexp_where_opt(Absyn.SIGIDsigexp id, tokens)) + | tok => expected("sigexp", tok) + + fun parse_sigbind tokens = + let fun loop acc = + let val sigid = parse_sigid tokens + val _ = parse_eq tokens + val (sigexp, tok) = parse_sigexp tokens + val acc = (sigid, sigexp) :: acc + in + case tok + of Token.AND => loop acc + | _ => (Absyn.SIGBIND(List.rev acc), tok) + end + in + loop [] + end + + (* + * Grammar: Structure Expressions and Declarations + * + * strexp ::= 'struct' strdec 'end' + * | longstrid + * | strexp ':' sigexp + * | strexp ':>' sigexp + * | funid '(' strexp ')' + * | funid '(' strdec ')' (derived form) + * | 'let' strdec 'in' 
strexp 'end' + * strdec ::= dec (Note 1) + * | 'structure' strbind + * | 'local' strdec1 'in' strdec2 'end' + * | (empty) + * | strdec1 <';'> strdec2 + * strbind ::= strid '=' strexp <'and' strbind> + * | strid ':' sigexp '=' strexp <'and' strbind> (derived form) + * | strid ':>' sigexp '=' strexp <'and' strbind> (derived form) + * + * Note 1: Restriction: A declaration dec appearing in a structure declaration may not be + * a sequential or local declaration. (From SuccessorML, "Fixing various minor issues with + * the formal syntax".) + *) + + fun token_starts_dec tok = + case tok (* LOCAL and SEMICOLON deliberately excluded *) + of Token.VAL => true + | Token.FUN => true + | Token.TYPE => true + | Token.DATATYPE => true + | Token.ABSTYPE => true + | Token.EXCEPTION => true + | Token.OPEN => true + | Token.INFIX => true + | Token.INFIXR => true + | Token.NONFIX => true + | _ => false + + fun token_starts_strdec tok = + case tok (* SEMICOLON deliberately excluded *) + of Token.STRUCTURE => true + | Token.LOCAL => true + | _ => token_starts_dec tok + + fun token_starts_strexp tok = + case tok + of Token.STRUCT => true + | Token.QUALID(_, id) => is_strid id + | Token.ID id => is_strid id + | Token.LET => true + | _ => false + + fun apply_strexp_constraint_opt(NONE, strexp) = strexp + | apply_strexp_constraint_opt(SOME f, strexp) = f strexp + + fun parse_strexp_constraint_opt_tok(tokens, tok) = + case tok + of Token.COLON => + let val (sigexp, tok) = parse_sigexp tokens + in + (SOME(fn strexp => Absyn.TRANSPARENTstrexp(strexp, sigexp, ref NONE)), tok) + end + | Token.COLONGT => + let val (sigexp, tok) = parse_sigexp tokens + in + (SOME(fn strexp => Absyn.OPAQUEstrexp(strexp, sigexp, ref NONE)), tok) + end + | tok => (NONE, tok) + + fun parse_strexp_constraint_opt tokens = parse_strexp_constraint_opt_tok(tokens, tokens_get tokens) + + fun parse_and_apply_strexp_constraint_opt(strexp, tokens) = + let fun loop(strexp, tok) = + case parse_strexp_constraint_opt_tok(tokens, 
tok) + of (SOME f, tok) => loop(f strexp, tok) + | (NONE, tok) => (strexp, tok) + in + loop(strexp, tokens_get tokens) + end + + fun parse_strdec(tokens, feIn, isTopDec) = + let fun next(acc, feIn, feOut) = loop(acc, feIn, feOut, tokens_get tokens) + and loop(acc, feIn, feOut, tok) = + let fun done() = (Absyn.STRDEC(List.rev acc), feOut) + fun decnfe(sdec, feOut') = next(sdec :: acc, fe_plus(feIn, feOut'), fe_plus(feOut, feOut')) + fun decntok(sdec, tok) = loop(sdec :: acc, feIn, feOut, tok) + in + case tok + of Token.STRUCTURE => decntok(parse_structure_strdec(tokens, feIn)) + | Token.LOCAL => decnfe(parse_local_strdec(tokens, feIn)) + | Token.SEMICOLON => if isTopDec then done() else next(acc, feIn, feOut) + | tok => + if token_starts_dec tok then decnfe(parse_dec_strdec(tokens, feIn, tok)) + else (tokens_unget(tokens, tok); done()) + end + in + next([], feIn, fe_empty) + end + + and parse_structure_strdec(tokens, fe) = + let fun loop acc = + let val strid = parse_strid tokens + val (constraintOpt, tok) = parse_strexp_constraint_opt tokens + val _ = check_eq tok + val (strexp, tok) = parse_strexp(tokens, fe) + val strexp = apply_strexp_constraint_opt(constraintOpt, strexp) + val acc = (strid, strexp) :: acc + in + case tok + of Token.AND => loop acc + | _ => (Absyn.STRUCTUREstrdec(Absyn.STRBIND(List.rev acc)), tok) + end + in + loop [] + end + + and parse_local_strdec(tokens, feIn) = + let val (sdec1, feOut1) = parse_strdec(tokens, feIn, false) + val _ = parse_in tokens + val (sdec2, feOut2) = parse_strdec(tokens, fe_plus(feIn, feOut1), false) + val _ = parse_end tokens + in + (Absyn.LOCALstrdec(sdec1, sdec2), feOut2) + end + + and parse_dec_strdec(tokens, feIn, tok) = + let val _ = tokens_unget(tokens, tok) + val (dec, feOut) = parse_dec(tokens, feIn, (*stopAtSemi=*)true) + in + (Absyn.DECstrdec dec, feOut) + end + + and parse_strexp(tokens, fe) = + let val strexp = + case tokens_get tokens + of Token.STRUCT => parse_struct_strexp(tokens, fe) + | 
Token.QUALID(strids, id) => + (check_strid id; Absyn.LONGSTRIDstrexp(Absyn.LONGID(strids, id))) + | Token.ID id => parse_strid_strexp(tokens, fe, id) + | Token.LET => parse_let_strexp(tokens, fe) + | _ => raise SyntaxError + in + parse_and_apply_strexp_constraint_opt(strexp, tokens) + end + + and parse_struct_strexp(tokens, fe) = + let val (strdec, _) = parse_strdec(tokens, fe, false) + val _ = parse_end tokens + in + Absyn.STRUCTstrexp strdec + end + + and parse_strid_strexp(tokens, fe, id) = + let val _ = check_strid id + in + case tokens_get tokens + of Token.LPAREN => + let fun parse_strexp_or_strdec() = + let val tok = tokens_get tokens + in + if token_starts_strexp tok then + (tokens_unget(tokens, tok); + parse_strexp(tokens, fe)) + else if token_starts_strdec tok then + let val _ = tokens_unget(tokens, tok) + val (strdec, _) = parse_strdec(tokens, fe, false) + in + (Absyn.STRUCTstrexp strdec, tokens_get tokens) + end + else raise SyntaxError + end + val (strexp, tok) = parse_strexp_or_strdec() + val _ = check_rparen tok + in + Absyn.FUNAPPstrexp(id, strexp) + end + | tok => (tokens_unget(tokens, tok); Absyn.LONGSTRIDstrexp(Absyn.LONGID([], id))) + end + + and parse_let_strexp(tokens, fe) = + let val (strdec, feOut) = parse_strdec(tokens, fe, false) + val _ = parse_in tokens + val (strexp, tok) = parse_strexp(tokens, fe_plus(fe, feOut)) + val _ = check_end tok + in + Absyn.LETstrexp(strdec, strexp) + end + + (* + * Grammar: Functor Declarations, Top-level Declarations, and Programs + * + * fundec ::= 'functor' funbind + * funbind ::= fundef <'and' funbind> + * fundef ::= funid '(' strid ':' sigexp ')' '=' strexp + * | funid '(' strid ':' sigexp ')' ':' sigexp' '=' strexp (derived form) + * | funid '(' strid ':' sigexp ')' ':>' sigexp' '=' strexp (derived form) + * | funid '(' spec ')' <':' sigexp> '=' strexp (derived form) + * | funid '(' spec ')' <':>' sigexp> '=' strexp (derived form) + * + * topdec ::= strdec (Note 1) + * | sigdec + * | fundec + * + * 
program ::= topdec ';' + * | exp ';' (derived form) + * + * Note 1: Restriction: No topdec may contain, as an initial segment, + * a strdec followed by a semicolon. + * Furthermore, the strdec may not be a sequential declaration strdec1<;>strdec2. + * (From SuccessorML, "Fixing various minor issues with the formal syntax".) + *) + + fun parse_fundef_formal_arg tokens = + case tokens_get tokens + of Token.ID strid => + let val _ = check_strid strid + val _ = parse_colon tokens + val (sigexp, tok) = parse_sigexp tokens + in + (strid, sigexp, fn strexp => strexp, tok) + end + | tok => + let val _ = tokens_unget(tokens, tok) + val (spec, tok) = parse_spec tokens + val strid = Absyn.gensym() + val sigexp = Absyn.SPECsigexp spec + fun wrap strexp = + Absyn.LETstrexp(Absyn.STRDEC[Absyn.DECstrdec(Absyn.DEC[Absyn.OPENdec[Absyn.LONGID([], strid)]])], + strexp) + in + (strid, sigexp, wrap, tok) + end + + fun parse_fundef(tokens, fe) = + let val funid = parse_funid tokens + val _ = parse_lparen tokens + val (strid, sigexp, wrapStrExp, tok) = parse_fundef_formal_arg tokens + val _ = check_rparen tok + val (constraintOpt, tok) = parse_strexp_constraint_opt tokens + val _ = check_eq tok + val (strexp, tok) = parse_strexp(tokens, fe) + in + ((funid, strid, sigexp, wrapStrExp(apply_strexp_constraint_opt(constraintOpt, strexp))), tok) + end + + fun parse_funbind(tokens, fe) = + let fun loop acc = + let val (fundef, tok) = parse_fundef(tokens, fe) + val acc = fundef :: acc + in + case tok + of Token.AND => loop acc + | _ => (Absyn.FUNBIND(List.rev acc), tok) + end + in + loop [] + end + + fun parse_topdec(tokens, fe) = + case tokens_get tokens + of Token.FUNCTOR => + let val (funbind, tok) = parse_funbind(tokens, fe) + val _ = tokens_unget(tokens, tok) + in + SOME(Absyn.FUNDECtopdec funbind, fe) + end + | Token.SIGNATURE => + let val (sigbind, tok) = parse_sigbind tokens + val _ = tokens_unget(tokens, tok) + in + SOME(Absyn.SIGDECtopdec sigbind, fe) + end + | Token.SEMICOLON => 
SOME(Absyn.STRDECtopdec(Absyn.STRDEC[]), fe) + | Token.EOF => NONE + | tok => + let val _ = tokens_unget(tokens, tok) + in + if token_starts_strdec tok then + let val (strdec, feOut) = parse_strdec(tokens, fe, true) + in + SOME(Absyn.STRDECtopdec strdec, fe_plus(fe, feOut)) + end + else + let val exp = parse_exp fe tokens + val _ = (case tokens_get tokens + of Token.SEMICOLON => () + | Token.EOF => () + | _ => raise SyntaxError) + val dec = Absyn.DEC[Absyn.VALdec([], [(Absyn.VIDpat(Absyn.LONGID([], "it"), ref NONE), exp)], [])] + val strdec = Absyn.STRDEC[Absyn.DECstrdec dec] + in + SOME(Absyn.STRDECtopdec strdec, fe) + end + end + + fun parse_file file = + let val is = TextIO.openIn file + val lexarg = LexArg.new(file, is) + val tokens = tokens_open lexarg + in + case Util.after(fn() => parse_topdec(tokens, fe_init), fn() => TextIO.closeIn is) + of SOME(ast, _) => ast + | NONE => expected("topdec", Token.EOF) + end + + end diff --git a/src/compiler/SOURCE.sig b/src/compiler/SOURCE.sig new file mode 100644 index 0000000..0cab246 --- /dev/null +++ b/src/compiler/SOURCE.sig @@ -0,0 +1,28 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature SOURCE = + sig + + datatype source + = SOURCE of + { fileName: string, + newLines: int list } (* _descending_ order *) + + val startPos: int + val dummy : source + val sayMsg : source -> string * int * int -> unit + + end (* signature SOURCE *) diff --git a/src/compiler/START.sig b/src/compiler/START.sig new file mode 100644 index 0000000..b60e93e --- /dev/null +++ b/src/compiler/START.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature START = + sig + val start: unit -> 'a + end diff --git a/src/compiler/Source.sml b/src/compiler/Source.sml new file mode 100644 index 0000000..a5ce2d2 --- /dev/null +++ b/src/compiler/Source.sml @@ -0,0 +1,65 @@ +(* + * Copyright 1997, 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure Source : SOURCE = + struct + + datatype source + = SOURCE of + { fileName: string, + newLines: int list } (* _descending_ order *) + + val dummy = SOURCE{fileName="", newLines=[]} + + (* The pos of an imaginary newline before a file's very + * first character. This is necessary to adjust for the + * weird notion of ML-Lex that the first character has + * position 2. Not 0 or 1, but 2. + * THIS WILL BREAK IF ML-LEX IS FIXED + *) + val startPos = 1 + + fun lookup(newLines, pos) = + let fun loop([], _) = {line = 1, column = pos - startPos} + | loop(newLine::newLines, line) = + if pos > newLine then {line = line, column = pos - newLine} + else loop(newLines, line - 1) + in + loop(newLines, 1 + List.length newLines) + end + + fun sayErr s = TextIO.output(TextIO.stdErr, s) + fun sayErr1 c = TextIO.output1(TextIO.stdErr, c) + + fun sayFile file = (sayErr file; sayErr1 #":") + + fun sayPos(newLines, pos) = + let val {line,column} = lookup(newLines, pos) + in + sayErr(Int.toString line); + sayErr1 #"."; + sayErr(Int.toString column) + end + + fun sayMsg (SOURCE{fileName,newLines}) (msg,leftPos,rightPos) = + (sayFile fileName; + sayPos(newLines, leftPos); + sayErr1 #"-"; + sayPos(newLines, rightPos); + sayErr1 #" "; + sayErr msg; + sayErr1 #"\n") + + end (* structure Source *) diff --git a/src/compiler/Start.sml b/src/compiler/Start.sml new file mode 100644 index 0000000..2f25028 --- /dev/null +++ b/src/compiler/Start.sml @@ -0,0 +1,30 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Start : START = + struct + + fun start() = + OS.Process.exit(Main.main(CommandLine.arguments())) + handle exn => + let val progname = OS.Path.file(CommandLine.name()) + in + TextIO.output(TextIO.stdErr, + progname ^ ": unhandled exception " ^ + General.exnMessage exn ^ "\n"); + OS.Process.exit OS.Process.failure + end + + end diff --git a/src/compiler/TOKEN.sig b/src/compiler/TOKEN.sig new file mode 100644 index 0000000..eca8949 --- /dev/null +++ b/src/compiler/TOKEN.sig @@ -0,0 +1,102 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature TOKEN = + sig + datatype token' (* Core: reserved words *) + = ABSTYPE + | AND + | ANDALSO + | AS + | CASE + | DATATYPE + | DO + | ELSE + | END + | EXCEPTION + | FN + | FUN + | HANDLE + | IF + | IN + | INFIX + | INFIXR + | LET + | LOCAL + | NONFIX + | OF + | OP + | OPEN + | ORELSE + | RAISE + | REC + | THEN + | TYPE + | VAL + | WITH + | WITHTYPE + | WHILE + (* Core: special symbols *) + | LPAREN + | RPAREN + | LBRACK + | RBRACK + | LBRACE + | RBRACE + | COMMA + | COLON + | SEMICOLON + | DOTDOTDOT + | UNDERSCORE + | BAR + | EQ (* may be used as ID *) + | FATARROW + | THINARROW + | HASH + | STAR (* may be used as ID, except in TyCon *) + (* Core: special constants *) + | DIGZ (* [0], admissible as fixity level or integer constant *) + | DIGNZ of int (* [1-9], admissible as fixity level, numeric label, or integer constant *) + | NUMC of IntInf.int (* [1-9][0-9]+, admissible as numeric label or integer constant *) + | INTC of IntInf.int (* integers starting with ~ or 0 *) + | WORDC of IntInf.int + | REALC of real + | STRINGC of string + | CHARC of char + (* Core: identifiers *) + | ID of string + | QUALID of string list * string + | TYVAR of string + (* Modules *) + | EQTYPE + | FUNCTOR + | INCLUDE + | SHARING + | SIG + | SIGNATURE + | STRUCT + | STRUCTURE + | WHERE + | COLONGT + (* Synthetic *) + | INFID of string * int + | ERROR + | EOF + + type pos = int + datatype token = T of pos * pos * token' + + val toString : token' -> string + end diff --git a/src/compiler/TRANSLATE.sig b/src/compiler/TRANSLATE.sig new file mode 100644 index 0000000..66a08ae --- /dev/null +++ b/src/compiler/TRANSLATE.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature TRANSLATE = + sig + val translate: Absyn.topdec -> CoreErlang.module + end diff --git a/src/compiler/TYPES.sig b/src/compiler/TYPES.sig new file mode 100644 index 0000000..ef99540 --- /dev/null +++ b/src/compiler/TYPES.sig @@ -0,0 +1,59 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature TYPES = + sig + + type Level = int + type TyName = string + type Label = string + + datatype Type = VAR of TyVar + | REC of Record + | CONS of Type list * TyName + + and TyVar = TYVAR of { + (* generalization scope upper bound *) + level : int, + (* requires equality? *) + eq : bool, + (* optional monomorphic overloading *) + ovld : TyName list option, + (* substitution *) + subst : Type option ref + } + + and Record = RECORD of { + (* known fields *) + fields : (Label * Type) list, + (* flexible? 
*) + is_flexible : bool, + (* substitution *) + subst : Record option ref + } + + val mkTyVar : Level * bool * TyName list option -> TyVar + val mkFreeTyVar : Level -> TyVar + val mkEqTyVar : Level -> TyVar + val mkOvldTyVar : TyName list * Level -> TyVar + + val tyvarOvld : TyVar -> TyName list option + + val derefTy : Type -> Type + + val mkRecord : (Label * Type) list * bool -> Record + val derefRecord : Record -> Record + + end diff --git a/src/compiler/TYPE_CHECK.sig b/src/compiler/TYPE_CHECK.sig new file mode 100644 index 0000000..4328d63 --- /dev/null +++ b/src/compiler/TYPE_CHECK.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature TYPE_CHECK = + sig + val check: Absyn.topdec -> Basis.basis + end diff --git a/src/compiler/TYPE_SCHEME.sig b/src/compiler/TYPE_SCHEME.sig new file mode 100644 index 0000000..2fe45e6 --- /dev/null +++ b/src/compiler/TYPE_SCHEME.sig @@ -0,0 +1,26 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature TYPE_SCHEME = + sig + type Level = Types.Level + type TyVar = Types.TyVar + type Type = Types.Type + type TypeScheme + val inst : TypeScheme * Level -> TyVar list * Type + val gen_limit : Type * Level -> TypeScheme + val gen_all : Type -> TypeScheme + val gen_none : Type -> TypeScheme + end diff --git a/src/compiler/Token.sml b/src/compiler/Token.sml new file mode 100644 index 0000000..17b0e58 --- /dev/null +++ b/src/compiler/Token.sml @@ -0,0 +1,177 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure Token : TOKEN = + struct + datatype token' (* Core: reserved words *) + = ABSTYPE + | AND + | ANDALSO + | AS + | CASE + | DATATYPE + | DO + | ELSE + | END + | EXCEPTION + | FN + | FUN + | HANDLE + | IF + | IN + | INFIX + | INFIXR + | LET + | LOCAL + | NONFIX + | OF + | OP + | OPEN + | ORELSE + | RAISE + | REC + | THEN + | TYPE + | VAL + | WITH + | WITHTYPE + | WHILE + (* Core: special symbols *) + | LPAREN + | RPAREN + | LBRACK + | RBRACK + | LBRACE + | RBRACE + | COMMA + | COLON + | SEMICOLON + | DOTDOTDOT + | UNDERSCORE + | BAR + | EQ (* may be used as ID *) + | FATARROW + | THINARROW + | HASH + | STAR (* may be used as ID, except in TyCon *) + (* Core: special constants *) + | DIGZ (* [0], admissible as fixity level or integer constant *) + | DIGNZ of int (* [1-9], admissible as fixity level, numeric label, or integer constant *) + | NUMC of IntInf.int (* [1-9][0-9]+, admissible as numeric label or integer constant *) + | INTC of IntInf.int (* integers starting with ~ or 0 *) + | WORDC of IntInf.int + | REALC of real + | STRINGC of string + | CHARC of char + (* Core: identifiers *) + | ID of string + | QUALID of string list * string + | TYVAR of string + (* Modules *) + | EQTYPE + | FUNCTOR + | INCLUDE + | SHARING + | SIG + | SIGNATURE + | STRUCT + | STRUCTURE + | WHERE + | COLONGT + (* Synthetic *) + | INFID of string * int + | ERROR + | EOF + + type pos = int + datatype token = T of pos * pos * token' + + fun toString token = + case token + of ABSTYPE => "ABSTYPE" + | AND => "AND" + | ANDALSO => "ANDALSO" + | AS => "AS" + | CASE => "CASE" + | DATATYPE => "DATATYPE" + | DO => "DO" + | ELSE => "ELSE" + | END => "END" + | EXCEPTION => "EXCEPTION" + | FN => "FN" + | FUN => "FUN" + | HANDLE => "HANDLE" + | IF => "IF" + | IN => "IN" + | INFIX => "INFIX" + | INFIXR => "INFIXR" + | LET => "LET" + | LOCAL => "LOCAL" + | NONFIX => "NONFIX" + | OF => "OF" + | OP => "OP" + | OPEN => "OPEN" + | ORELSE => "ORELSE" + | RAISE => "RAISE" + | REC => "REC" + | 
THEN => "THEN" + | TYPE => "TYPE" + | VAL => "VAL" + | WITH => "WITH" + | WITHTYPE => "WITHTYPE" + | WHILE => "WHILE" + | LPAREN => "LPAREN" + | RPAREN => "RPAREN" + | LBRACK => "LBRACK" + | RBRACK => "RBRACK" + | LBRACE => "LBRACE" + | RBRACE => "RBRACE" + | COMMA => "COMMA" + | COLON => "COLON" + | SEMICOLON => "SEMICOLON" + | DOTDOTDOT => "DOTDOTDOT" + | UNDERSCORE => "UNDERSCORE" + | BAR => "BAR" + | EQ => "EQ" + | FATARROW => "FATARROW" + | THINARROW => "THINARROW" + | HASH => "HASH" + | STAR => "STAR" + | DIGZ => "DIGZ" + | DIGNZ _ => "DIGNZ" + | NUMC _ => "NUMC" (* TODO: convert attribute *) + | INTC _ => "INTC" (* TODO: convert attribute *) + | WORDC _ => "WORDC" (* TODO: convert attribute *) + | REALC _ => "REALC" (* TODO: convert attribute *) + | STRINGC _ => "STRINGC" (* TODO: convert attribute *) + | CHARC _ => "CHARC" (* TODO: convert attribute *) + | ID _ => "ID _" (* TODO: convert attribute *) + | QUALID _ => "QUALID(_, _)" (* TODO: convert attribute *) + | TYVAR _ => "TYVAR _" (* TODO: convert attribute *) + | EQTYPE => "EQTYPE" + | FUNCTOR => "FUNCTOR" + | INCLUDE => "INCLUDE" + | SHARING => "SHARING" + | SIG => "SIG" + | SIGNATURE => "SIGNATURE" + | STRUCT => "STRUCT" + | STRUCTURE => "STRUCTURE" + | WHERE => "WHERE" + | COLONGT => "COLONGT" + | INFID _ => "INFID" (* TODO: convert attribute *) + | ERROR => "ERROR" + | EOF => "EOF" + + end diff --git a/src/compiler/Translate.sml b/src/compiler/Translate.sml new file mode 100644 index 0000000..6541267 --- /dev/null +++ b/src/compiler/Translate.sml @@ -0,0 +1,284 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Translate : TRANSLATE = + struct + + exception Translate + + fun sayErr s = TextIO.output(TextIO.stdErr, s) + + fun nyi msg = + (sayErr("translate: nyi " ^ msg ^ "\n"); raise Translate) + + (* + * PATTERNS + *) + + fun transScon scon = + case scon + of Absyn.INTsc i => CoreErlang.L_INTEGER i + | Absyn.WORDsc i => CoreErlang.L_INTEGER i + | Absyn.REALsc r => CoreErlang.L_FLOAT r + | Absyn.STRINGsc s => CoreErlang.L_STRING s + | Absyn.CHARsc c => CoreErlang.L_INTEGER(IntInf.fromInt(Char.ord c)) + + fun recordToTuple row = + let fun labelLt(Absyn.INTlab i1, Absyn.INTlab i2) = i1 < i2 + | labelLt(Absyn.INTlab _, Absyn.IDlab _) = true + | labelLt(Absyn.IDlab i1, Absyn.IDlab i2) = i1 < i2 + | labelLt(Absyn.IDlab _, Absyn.INTlab _) = false + fun insert(item, []) = [item] + | insert(item1 as (lab1,_), row as ((item2 as (lab2,_)) :: row')) = + if labelLt(lab1, lab2) then item1 :: row else item2 :: insert(item1, row') + in + List.map #2 (List.foldl insert [] row) + end + + fun transPat pat = + case pat + of Absyn.WILDpat => CoreErlang.P_VARIABLE(CoreErlang.mkVar NONE) + | Absyn.SCONpat scon => CoreErlang.P_LITERAL(transScon scon) + | Absyn.VIDpat(longid as Absyn.LONGID(_, vid), refOptIdStatus) => + (* TODO: + * - allow for different con rep than atoms (e.g. fixnums or nil) + * - proper SML semantics for exn con (generative, alias)? 
+ *) + (case valOf(!refOptIdStatus) + of Basis.VAL => CoreErlang.P_VARIABLE vid + | _ => CoreErlang.P_LITERAL(CoreErlang.L_ATOM vid)) + | Absyn.RECpat(row, false) => CoreErlang.P_TUPLE(List.map transPat (recordToTuple row)) + | Absyn.RECpat(_, true) => nyi "flexible record patterns" + | Absyn.CONSpat(Absyn.LONGID(_, con), pat) => + CoreErlang.P_TUPLE [CoreErlang.P_LITERAL(CoreErlang.L_ATOM con), transPat pat] + | Absyn.TYPEDpat(pat, _) => transPat pat + | Absyn.ASpat(_, pat) => transPat pat + + (* + * EXPRESSIONS + *) + + fun recbindFname(Absyn.VIDpat(Absyn.LONGID([], id), _)) = CoreErlang.FNAME(id, 1) + | recbindFname(_) = nyi "bad function name pattern in 'let val rec'" + + fun mkFnameAlias((fname as CoreErlang.FNAME(id, _), _), body) = + CoreErlang.E_LET(id, CoreErlang.E_FNAME fname, body) + + fun mkCall(m, f, args) = + CoreErlang.E_CALL(CoreErlang.E_LITERAL(CoreErlang.L_ATOM m), + CoreErlang.E_LITERAL(CoreErlang.L_ATOM f), + args) + + fun callErlang(f, args) = mkCall("erlang", f, args) + + fun callErlmlRuntime(f, args) = mkCall("erlml_runtime", f, args) + + fun transConExp(vid, hasarg) = + let val tagExp = CoreErlang.E_LITERAL(CoreErlang.L_ATOM vid) + in + case hasarg + of false => tagExp + | true => + let val var = CoreErlang.mkVar NONE + val bodyExp = CoreErlang.E_TUPLE[tagExp, CoreErlang.E_VARIABLE var] + in + CoreErlang.E_FUN(CoreErlang.FUN([var], bodyExp)) + end + end + + fun transVarExp(Absyn.LONGID(strids, vid)) = + case strids + of [] => CoreErlang.E_VARIABLE vid + | [strid] => mkCall(strid, vid, []) + | _ => nyi "nested structures in expressions" + + fun transExp exp = + case exp + of Absyn.SCONexp scon => CoreErlang.E_LITERAL(transScon scon) + | Absyn.VIDexp(longid as Absyn.LONGID(_, vid), refOptIdStatus) => + (* TODO: + * - allow for different con rep than atoms (e.g. fixnum or nil) + * - proper SML semantics for exn con (generative, alias)? 
+ *) + (case valOf(!refOptIdStatus) + of Basis.VAL => transVarExp longid + | Basis.CON hasarg => transConExp(vid, hasarg) + | Basis.EXN hasarg => transConExp(vid, hasarg)) + | Absyn.RECexp row => CoreErlang.E_TUPLE(List.map transExp (recordToTuple row)) + | Absyn.LETexp(Absyn.DEC decs, exp) => transLet(decs, transExp exp) + | Absyn.APPexp(f, arg) => + let val var = CoreErlang.mkVar NONE + in + CoreErlang.E_LET(var, transExp f, CoreErlang.E_APPLY(CoreErlang.FV var, [transExp arg])) + end + | Absyn.TYPEDexp(exp, _) => transExp exp + | Absyn.HANDLEexp(exp, match) => + let val e1 = transExp exp + val v1 = CoreErlang.mkVar NONE + val e2 = CoreErlang.E_VARIABLE v1 + val cv1 = CoreErlang.mkVar NONE + val cv2 = CoreErlang.mkVar NONE + val cv3 = CoreErlang.mkVar NONE + val default = CoreErlang.E_PRIMOP("raise", [CoreErlang.E_VARIABLE cv3, CoreErlang.E_VARIABLE cv2]) + val ce = transMatch(cv2, match, default) + in + CoreErlang.E_TRY(e1, v1, e2, cv1, cv2, cv3, ce) + end + | Absyn.RAISEexp exp => + (* TODO: currently all SML exceptions will have class 'throw', + * figure out a way to generate and handle ones with other classes, + * including native Erlang non-throw exceptions + *) + callErlang("throw", [transExp exp]) + | Absyn.FNexp match => + let val var = CoreErlang.mkVar NONE + val default = callErlmlRuntime("raise_match", []) + in + CoreErlang.E_FUN(CoreErlang.FUN([var], transMatch(var, match, default))) + end + + and transMatch(var, Absyn.MATCH clauses, default) = + CoreErlang.E_CASE(CoreErlang.E_VARIABLE var, + (List.map transClause clauses) @ + [(CoreErlang.P_VARIABLE(CoreErlang.mkVar NONE), default)]) + + and transClause(pat, exp) = (transPat pat, transExp exp) + + and transLet([], body) = body + | transLet(dec :: decs, body) = transLetDec(dec, transLet(decs, body)) + + and transLetDec(dec, body) = + case dec + of Absyn.VALdec(_, nonrecs, recs) => transLetVal(nonrecs, transLetRec(recs, body)) + | _ => nyi "type, exception, local, or open form of <dec> in LET" + + and 
transLetVal([(pat,exp)], body) = + CoreErlang.E_CASE(transExp exp, + [(transPat pat, body), + (CoreErlang.P_VARIABLE(CoreErlang.mkVar NONE), + callErlmlRuntime("raise_bind", []))]) + | transLetVal([], body) = body + | transLetVal(_, _) = nyi "multiple bindings in 'let val'" + + and transLetRec(recs, body) = + let val fundefs = List.map transOneRecBind recs + in + CoreErlang.E_LETREC(fundefs, + List.foldl mkFnameAlias body fundefs) + end + + and transOneRecBind(pat, match) = + let val fname = recbindFname pat + val var = CoreErlang.mkVar NONE + val default = callErlmlRuntime("raise_match", []) + in + (fname, CoreErlang.FUN([var], transMatch(var, match, default))) + end + + (* + * STRUCTURES + *) + + fun transDec(dec, body) = + case dec + of Absyn.VALdec(_, nonrecs, recs) => transLetVal(nonrecs, transLetRec(recs, body)) + | Absyn.TYPEdec _ => body + | Absyn.DATATYPEdec _ => body + | Absyn.DATAREPLdec _ => body + | Absyn.EXdec _ => body (* TODO: proper SML semantics for exn con (generative, alias)? 
*) + | _ => nyi "abstype, local, or open form of structure-level <dec>" + + fun transDecs([], body) = body + | transDecs(dec :: decs, body) = transDec(dec, transDecs(decs, body)) + + (* + * MODULES + *) + + fun mkModinfo0 modExp = + (CoreErlang.FNAME("module_info", 0), + CoreErlang.FUN([], callErlang("get_module_info", [modExp]))) + + fun mkModinfo1 modExp = + let val var = CoreErlang.mkVar NONE + in + (CoreErlang.FNAME("module_info", 1), + CoreErlang.FUN([var], callErlang("get_module_info", [modExp, CoreErlang.E_VARIABLE var]))) + end + + fun veToFunDef strid (vid, idstatus, fundefs) = + case idstatus + of Basis.VAL => + let val vidkey = CoreErlang.E_TUPLE[CoreErlang.E_LITERAL(CoreErlang.L_ATOM strid), + CoreErlang.E_LITERAL(CoreErlang.L_ATOM vid)] + val fbody = callErlmlRuntime("get_var", [vidkey]) + in + (CoreErlang.FNAME(vid, 0), CoreErlang.FUN([], fbody)) :: fundefs + end + | _ => fundefs + + fun mkCtor strid (CoreErlang.FNAME(vid, _), ctor) = + let val vidkey = CoreErlang.E_TUPLE[CoreErlang.E_LITERAL(CoreErlang.L_ATOM strid), + CoreErlang.E_LITERAL(CoreErlang.L_ATOM vid)] + in + CoreErlang.E_LET(CoreErlang.mkVar NONE, callErlmlRuntime("set_var", [vidkey, CoreErlang.E_VARIABLE vid]), + ctor) + end + + fun transEnv(strid, Basis.E(_, Basis.VE ve)) = + let val fundefs = Dict.fold(veToFunDef strid, [], ve) + val exports = List.map #1 fundefs + val ctor = List.foldl (mkCtor strid) (CoreErlang.E_LITERAL(CoreErlang.L_ATOM "ok")) exports + in + (exports, fundefs, ctor) + end + + fun transModule(env, strid, Absyn.DEC decs) = + let val (exports, fundefs, ctor) = transEnv(strid, env) + val ctor = transDecs(decs, ctor) + val ctordef = (CoreErlang.FNAME("ctor", 0), CoreErlang.FUN([], ctor)) + val modExp = CoreErlang.E_LITERAL(CoreErlang.L_ATOM strid) + val (modinfo0 as (fname0, _)) = mkModinfo0 modExp + val (modinfo1 as (fname1, _)) = mkModinfo1 modExp + in + CoreErlang.MODULE(strid, + fname0 :: fname1 :: exports, + [("on_load", + 
CoreErlang.C_CONS(CoreErlang.C_TUPLE[CoreErlang.C_LITERAL(CoreErlang.L_ATOM "ctor"), + CoreErlang.C_LITERAL(CoreErlang.L_INTEGER(IntInf.fromInt 0))], + CoreErlang.C_LITERAL CoreErlang.L_NIL))], + modinfo0 :: modinfo1 :: ctordef :: fundefs) + end + + fun transStrExp(strid, strexp) = + case strexp + of Absyn.TRANSPARENTstrexp(Absyn.STRUCTstrexp(Absyn.STRDEC[Absyn.DECstrdec dec]), _, refOptEnv) => + transModule(valOf(!refOptEnv), strid, dec) + | Absyn.OPAQUEstrexp(Absyn.STRUCTstrexp(Absyn.STRDEC[Absyn.DECstrdec dec]), _, refOptEnv) => + transModule(valOf(!refOptEnv), strid, dec) + | _ => nyi "non-plain form of <strexp>" + + fun transStrDecs strdecs = + case strdecs + of [Absyn.STRUCTUREstrdec(Absyn.STRBIND[(strid, strexp)])] => + transStrExp(strid, strexp) + | _ => nyi "non-plain kind of <strdec>" + + fun translate topdec = + case topdec + of Absyn.STRDECtopdec(Absyn.STRDEC strdecs) => transStrDecs strdecs + | _ => nyi "non-<strdec> kind of <topdec>" + + end diff --git a/src/compiler/TypeCheck.sml b/src/compiler/TypeCheck.sml new file mode 100644 index 0000000..83b09d7 --- /dev/null +++ b/src/compiler/TypeCheck.sml @@ -0,0 +1,300 @@ +(* + * Copyright 2017-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure TypeCheck : TYPE_CHECK = + struct + + exception TypeCheck + + fun sayErr s = TextIO.output(TextIO.stdErr, s) + + fun nyi msg = + (sayErr("TypeCheck: nyi " ^ msg ^ "\n"); raise TypeCheck) + + fun error msg = + (sayErr("TypeCheck: " ^ msg ^ "\n"); raise TypeCheck) + + fun readBasisFile(id, ext) = + (* TODO: path for basis files? *) + SOME(Basis.read(id ^ ext ^ ".basis")) handle _ => NONE + + (* + * PATTERNS + *) + + fun lookupLongVid'(Basis.E(_, Basis.VE dict), [], vid) = Dict.find(dict, vid) + | lookupLongVid'(Basis.E(Basis.SE dict, _), strid :: strids, vid) = + lookupLongVid'(valOf(Dict.find(dict, strid)), strids, vid) + + (* For a short VId we look it up first in the current Env, and then in the initial Basis. + For a long VId, we look up the first StrId first in the current Env, then in the + initial Basis, and lastly from a .basis file. The resulting Env is then used to + look up subsequent StrIds and finally the VId. *) + + fun lookupLongVid(Basis.E(_, Basis.VE dict), Absyn.LONGID([], vid)) = + (case Dict.find(dict, vid) + of NONE => + let val Basis.BASIS(_, Basis.E(_, Basis.VE dict)) = Basis.initialBasis + in + Dict.find(dict, vid) + end + | sth => sth) + | lookupLongVid(Basis.E(Basis.SE dict, _), Absyn.LONGID(strid :: strids, vid)) = + case Dict.find(dict, strid) + of SOME env => lookupLongVid'(env, strids, vid) + | NONE => + let val Basis.BASIS(_, Basis.E(Basis.SE dict, _)) = Basis.initialBasis + in + case Dict.find(dict, strid) + of SOME env => lookupLongVid'(env, strids, vid) + | NONE => + case readBasisFile(strid, ".sml") + of SOME(Basis.BASIS(_, env)) => lookupLongVid'(env, strids, vid) + | NONE => NONE + end + + fun bindVid(Basis.E(strenv, Basis.VE dict), vid, idstatus) = + Basis.E(strenv, Basis.VE(Dict.insert(dict, vid, idstatus))) + + fun checkPat(env, pat) = + case pat + of Absyn.WILDpat => env + | Absyn.SCONpat _ => env + | Absyn.VIDpat(longid, refOptIdStatus) => + (case longid + of Absyn.LONGID([], vid) => + (case lookupLongVid(env, 
longid) + of SOME idstatus => (refOptIdStatus := SOME idstatus; env) + | NONE => (refOptIdStatus := SOME Basis.VAL; bindVid(env, vid, Basis.VAL))) + | _ => (refOptIdStatus := SOME(valOf(lookupLongVid(env, longid))); env)) + | Absyn.RECpat(row, false) => List.foldl checkFieldPat env row + | Absyn.RECpat(_, true) => nyi "flexible record patterns" + | Absyn.CONSpat(_, pat) => checkPat(env, pat) + | Absyn.TYPEDpat(pat, _) => checkPat(env, pat) + | Absyn.ASpat(vid, pat) => checkPat(bindVid(env, vid, Basis.VAL), pat) + + and checkFieldPat((_, pat), env) = checkPat(env, pat) + + (* + * EXPRESSIONS + *) + + fun checkExp(env, exp) = + case exp + of Absyn.SCONexp _ => () + | Absyn.VIDexp(longid, refOptIdStatus) => refOptIdStatus := SOME(valOf(lookupLongVid(env, longid))) + | Absyn.RECexp row => List.app (checkFieldExp env) row + | Absyn.LETexp(Absyn.DEC decs, exp) => checkExp(checkLetDecs(decs, env), exp) + | Absyn.APPexp(f, arg) => (checkExp(env, f); checkExp(env, arg)) + | Absyn.TYPEDexp(exp, _) => checkExp(env, exp) + | Absyn.HANDLEexp(exp, match) => (checkExp(env, exp); checkMatch(env, match)) + | Absyn.RAISEexp exp => checkExp(env, exp) + | Absyn.FNexp match => checkMatch(env, match) + + and checkFieldExp env (_, exp) = checkExp(env, exp) + + and checkMatch(env, Absyn.MATCH clauses) = List.app (checkClause env) clauses + + and checkClause env (pat, exp) = checkExp(checkPat(env, pat), exp) + + and checkLetDecs([], env) = env + | checkLetDecs(dec :: decs, env) = checkLetDecs(decs, checkLetDec(dec, env)) + + and checkLetDec(dec, env) = + case dec + of Absyn.VALdec(_, nonrecs, recs) => checkLetRecs(recs, checkLetNonRecs(nonrecs, env)) + | _ => nyi "type, exception, local, or open form of <dec> in LET" + + and checkLetNonRecs([], env) = env + | checkLetNonRecs((pat, exp) :: nonrecs, env) = + (checkExp(env, exp); checkLetNonRecs(nonrecs, checkPat(env, pat))) + + and checkLetRecs(recs, env) = + let val env' = checkLetRecPats(recs, env) + val _ = List.app (checkLetRecMatch env') recs 
+ in + env' + end + + and checkLetRecPats([], env) = env + | checkLetRecPats((pat, _) :: recs, env) = + checkLetRecPats(recs, checkPat(env, pat)) + + and checkLetRecMatch env (_, match) = (checkMatch(env, match); ()) + + (* + * SPECIFICATIONS + *) + + fun checkValSpecs([], env) = env + | checkValSpecs((vid, _) :: valspecs, env) = + (* TODO: + * - check vid may be bound (not bound in env, not forbidden) + * - elaborate ty and record that too + *) + checkValSpecs(valspecs, bindVid(env, vid, Basis.VAL)) + + fun checkConBinds([], _, env) = env + | checkConBinds((vid, tyOpt) :: conbinds, mkis, env) = + (* TODO: + * - check vid may be bound (not bound in env, not forbidden) + * - elaborate tyOpt and record that too + *) + let val hasarg = case tyOpt of SOME _ => true + | NONE => false + in + checkConBinds(conbinds, mkis, bindVid(env, vid, mkis hasarg)) + end + + fun checkDatBinds([], _, env) = env + | checkDatBinds((_, _, Absyn.CONBIND conbinds) :: datbinds, mkis, env) = + (* TODO: + * - check tycon may be bound + * - compute equality attribute + * - record tycon in TE + *) + checkDatBinds(datbinds, mkis, checkConBinds(conbinds, mkis, env)) + + fun checkSpec(spec, env) = + case spec + of Absyn.VALspec valspecs => checkValSpecs(valspecs, env) + | Absyn.TYPEspec _ => env (* TODO *) + | Absyn.EQTYPEspec _ => env (* TODO *) + | Absyn.DATATYPEspec(Absyn.DATBIND datbinds) => checkDatBinds(datbinds, Basis.CON, env) + | Absyn.DATAREPLspec _ => env (* TODO *) + | Absyn.EXspec(Absyn.CONBIND conbinds) => checkConBinds(conbinds, Basis.EXN, env) + | Absyn.STRUCTUREspec _ => nyi "nested structure in <spec>" + | Absyn.INCLUDEspec _ => nyi "include <spec>" (* TODO *) + | Absyn.SHARINGTYspec _ => nyi "sharing type <spec>" + | Absyn.SHARINGSTRspec _ => nyi "sharing <spec>" + + fun checkSpecs([], env) = Basis.SIG env + | checkSpecs(spec :: specs, env) = checkSpecs(specs, checkSpec(spec, env)) + + (* + * SIGNATURE EXPRESSIONS & BINDINGS + *) + + fun lookupSigid'(Basis.BASIS(Basis.SIGE dict, _), sigid) = + 
Dict.find(dict, sigid) + + fun lookupSigid(basis, sigid) = + case lookupSigid'(basis, sigid) + of NONE => lookupSigid'(Basis.initialBasis, sigid) + | sth => sth + + fun bindSigid(Basis.BASIS(Basis.SIGE dict, env), sigid, sigma) = + (* TODO: check that sigid isn't already bound *) + Basis.BASIS(Basis.SIGE(Dict.insert(dict, sigid, sigma)), env) + + fun findSigma(sigid, basis) = + case lookupSigid(basis, sigid) + of SOME sigma => SOME sigma + | NONE => + case readBasisFile(sigid, ".sig") + of NONE => NONE + | SOME basis => lookupSigid'(basis, sigid) + + fun checkSigid(sigid, basis) = + case findSigma(sigid, basis) + of SOME sigma => sigma + | NONE => error("sigid " ^ sigid ^ " is unbound and no valid .basis file found") + + fun checkSigExp(sigexp, basis) = + case sigexp + of Absyn.SPECsigexp(Absyn.SPEC specs) => checkSpecs(specs, Basis.emptyEnv) + | Absyn.SIGIDsigexp sigid => checkSigid(sigid, basis) + | Absyn.WHEREsigexp _ => nyi "where <sigexp>" + + fun checkSigBind((sigid, sigexp), basis) = + bindSigid(basis, sigid, checkSigExp(sigexp, basis)) + + fun checkSigBinds([], basis) = basis + | checkSigBinds(sigbind :: sigbinds, basis) = + checkSigBinds(sigbinds, checkSigBind(sigbind, basis)) + + (* + * DECLARATIONS + *) + + fun checkExBind(exb, env) = + case exb + of Absyn.CONexb vid => bindVid(env, vid, Basis.EXN false) + | Absyn.OFexb(vid, _) => bindVid(env, vid, Basis.EXN true) + | Absyn.EQexb(vid, longvid) => + case lookupLongVid(env, longvid) + of SOME(idstatus as Basis.EXN _) => bindVid(env, vid, idstatus) + | SOME _ => error "exception aliasing non-exception" + | NONE => error "exception aliasing unbound identifier" + + fun checkExBinds([], env) = env + | checkExBinds(exb :: exbinds, env) = + checkExBinds(exbinds, checkExBind(exb, env)) + + fun checkDec(dec, env) = + case dec + of Absyn.VALdec(_, nonrecs, recs) => checkLetRecs(recs, checkLetNonRecs(nonrecs, env)) + | Absyn.TYPEdec _ => env + | Absyn.DATATYPEdec(Absyn.DATBIND datbinds, _) => checkDatBinds(datbinds, Basis.CON, env) + | Absyn.DATAREPLdec _ => env + | Absyn.EXdec exbinds => checkExBinds(exbinds, env) + | _ => nyi "abstype, local, or open form of structure-level <dec>" + + fun checkDecs([], env) = env + | checkDecs(dec :: decs, env) = checkDecs(decs, checkDec(dec, env)) + + fun checkModule(Absyn.DEC decs, sigid, refOptEnv, basis) = + let val _ = 
checkDecs(decs, Basis.emptyEnv) + val Basis.SIG env = checkSigid(sigid, basis) + in + refOptEnv := SOME env; + env + end + + fun checkStrExp(strexp, basis) = + case strexp + of Absyn.TRANSPARENTstrexp(Absyn.STRUCTstrexp(Absyn.STRDEC[Absyn.DECstrdec dec]), Absyn.SIGIDsigexp sigid, refOptEnv) => + checkModule(dec, sigid, refOptEnv, basis) + | Absyn.OPAQUEstrexp(Absyn.STRUCTstrexp(Absyn.STRDEC[Absyn.DECstrdec dec]), Absyn.SIGIDsigexp sigid, refOptEnv) => + checkModule(dec, sigid, refOptEnv, basis) + | _ => nyi "non-plain form of <strexp>" + + fun bindStrid(Basis.BASIS(sigenv, Basis.E(Basis.SE dict, valenv)), strid, env) = + Basis.BASIS(sigenv, Basis.E(Basis.SE(Dict.insert(dict, strid, env)), valenv)) + + fun checkStrBinds([], basis) = basis + | checkStrBinds((strid, strexp) :: strbinds, basis) = + checkStrBinds(strbinds, bindStrid(basis, strid, checkStrExp(strexp, basis))) + + fun checkStrDec(strdec, basis) = + case strdec + of Absyn.DECstrdec _ => nyi "top-level plain <dec>" + | Absyn.STRUCTUREstrdec(Absyn.STRBIND strbinds) => checkStrBinds(strbinds, basis) + | Absyn.LOCALstrdec _ => nyi "top-level 'local'" + + fun checkStrDecs([], basis) = basis + | checkStrDecs(strdec :: strdecs, basis) = + checkStrDecs(strdecs, checkStrDec(strdec, basis)) + + (* + * TOP-LEVEL DECLARATIONS + *) + + fun checkTopDec(topdec, basis) = + case topdec + of Absyn.STRDECtopdec(Absyn.STRDEC strdecs) => checkStrDecs(strdecs, basis) + | Absyn.SIGDECtopdec(Absyn.SIGBIND sigbinds) => checkSigBinds(sigbinds, basis) + | Absyn.FUNDECtopdec _ => nyi "functor declarations" + + fun check topdec = checkTopDec(topdec, Basis.emptyBasis) + + end diff --git a/src/compiler/TypeScheme.sml b/src/compiler/TypeScheme.sml new file mode 100644 index 0000000..b5d5884 --- /dev/null +++ b/src/compiler/TypeScheme.sml @@ -0,0 +1,139 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure TypeScheme : TYPE_SCHEME = + struct + type Level = Types.Level + type TyVar = Types.TyVar + type Type = Types.Type + + datatype Comb = QUOTE of Type + | BOUND of int + | REC of (Types.Label * Comb) list * bool * Types.Record option ref + | CONS of Comb list * Types.TyName + + datatype TyVarScheme = TVS of {eq: bool, ovld: Types.TyName list option} + + datatype TypeScheme = TYS of {bvars: TyVarScheme list, comb: Comb} + + (* INSTANTIATION *) + + fun evalComb bvars = + let fun eval(QUOTE ty) = ty + | eval(BOUND i) = Vector.sub(bvars, i) + | eval(REC(fields, is_flexible, subst)) = + let val fields' = List.map evalField fields + in + Types.REC(Types.RECORD{fields = fields', is_flexible = is_flexible, subst = subst}) + end + | eval(CONS(combs, tyname)) = + Types.CONS(List.map eval combs, tyname) + and evalField(label, comb) = (label, eval comb) + in + eval + end + + fun instBVars(bvars, level) = + let fun inst_bvar(TVS{eq, ovld}) = Types.mkTyVar(level, eq, ovld) + in + List.map inst_bvar bvars + end + + fun inst(TYS{bvars, comb}, level) = + let val bvars' = instBVars(bvars, level) + in + (bvars', evalComb (Vector.fromList(List.map Types.VAR bvars')) comb) + end + + (* GENERALIZATION *) + + fun gen_bvar(Types.TYVAR{eq, ovld, ...}, _) = TVS{eq = eq, ovld = ovld} + + fun cannot_gen(_, NONE) = false + | cannot_gen(Types.TYVAR{ovld = SOME _, ...}, SOME _) = true + | cannot_gen(Types.TYVAR{ovld = NONE, level, ...}, SOME limit) = level <= limit + + fun next_offset([]) = 0 + | next_offset((_, n) :: _) = n + 1 + + fun gen_tyvar(tyvar, bvars_in, 
cond) = + case cannot_gen(tyvar, cond) + of true => + (bvars_in, QUOTE(Types.VAR tyvar)) + | false => + case Util.bound(bvars_in, tyvar) + of SOME offset => + (bvars_in, BOUND offset) + | NONE => + let val offset = next_offset bvars_in + in + ((tyvar, offset) :: bvars_in, BOUND offset) + end + + fun mkcons(combs, tyname) = + let fun try_unquote([], tys) = QUOTE(Types.CONS(List.rev tys, tyname)) + | try_unquote((QUOTE ty) :: combs', tys) = try_unquote(combs', ty :: tys) + | try_unquote(_ :: _, _) = CONS(combs, tyname) + in + try_unquote(combs, []) + end + + fun mkrec(fields, is_flexible, subst) = + let fun try_unquote([], fields'') = QUOTE(Types.REC(Types.RECORD{fields = fields'', is_flexible = is_flexible, subst = subst})) + | try_unquote((label, QUOTE ty) :: fields', fields'') = try_unquote(fields', (label, ty) :: fields'') + | try_unquote(_ :: _, _) = REC(fields, is_flexible, subst) + in + try_unquote(fields, []) + end + + fun gen_ty(ty, bvars_in, cond) = + case Types.derefTy ty + of Types.VAR tyvar => + gen_tyvar(tyvar, bvars_in, cond) + | Types.REC record => + let val Types.RECORD{fields, is_flexible, subst} = Types.derefRecord record + val (bvars_out, fields', _) = List.foldl gen_field (bvars_in, [], cond) fields + in + (bvars_out, mkrec(List.rev fields', is_flexible, subst)) + end + | Types.CONS(tys, tyname) => + let val (bvars_out, combs, _) = List.foldl gen_elt (bvars_in, [], cond) tys + in + (bvars_out, mkcons(List.rev combs, tyname)) + end + + and gen_field((label, ty), (bvars_in, fields, cond)) = + let val (bvars_out, comb) = gen_ty(ty, bvars_in, cond) + in + (bvars_out, (label, comb) :: fields, cond) + end + + and gen_elt(ty, (bvars_in, combs, cond)) = + let val (bvars_out, comb) = gen_ty(ty, bvars_in, cond) + in + (bvars_out, comb::combs, cond) + end + + fun gen_cond(ty, cond) = + let val (bvars, comb) = gen_ty(ty, [], cond) + in + TYS{bvars = map gen_bvar (List.rev bvars), comb = comb} + end + + fun gen_limit(ty, limit) = gen_cond(ty, SOME limit) + 
fun gen_all ty = gen_cond(ty, NONE) + fun gen_none ty = TYS{bvars = [], comb = QUOTE ty} + + end diff --git a/src/compiler/Types.sml b/src/compiler/Types.sml new file mode 100644 index 0000000..de412e2 --- /dev/null +++ b/src/compiler/Types.sml @@ -0,0 +1,79 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Types : TYPES = + struct + + type Level = int + type TyName = string + type Label = string + + datatype Type = VAR of TyVar + | REC of Record + | CONS of Type list * TyName + + and TyVar = TYVAR of { + (* generalization scope upper bound *) + level : int, + (* requires equality? *) + eq : bool, + (* optional monomorphic overloading *) + ovld : TyName list option, + (* substitution *) + subst : Type option ref + } + + and Record = RECORD of { + (* known fields *) + fields : (Label * Type) list, + (* flexible? 
*) + is_flexible : bool, + (* substitution *) + subst : Record option ref + } + + fun mkTyVar(level, eq, ovld) = TYVAR{level = level, eq = eq, ovld = ovld, subst = ref NONE} + fun mkFreeTyVar(level) = mkTyVar(level, false, NONE) + fun mkEqTyVar(level) = mkTyVar(level, true, NONE) + fun mkOvldTyVar(ovld, level) = mkTyVar(level, false, SOME ovld) + + fun tyvarOvld(TYVAR{ovld, ...}) = ovld + + (* tyvar dereference with path compression *) + + fun update(x, []) = x + | update(x, subst :: substs) = (subst := SOME x; update(x, substs)) + + local + fun deref(VAR(TYVAR{subst as ref(SOME ty), ...}), subst', substs) = deref(ty, subst, subst' :: substs) + | deref(ty, _, substs) = update(ty, substs) + in + fun derefTy(VAR(TYVAR{subst as ref(SOME ty), ...})) = deref(ty, subst, []) + | derefTy(ty) = ty + end + + fun mkRecord(fields, is_flexible) = RECORD{fields = fields, is_flexible = is_flexible, subst = ref NONE} + + (* record dereference with path compression *) + + local + fun deref(RECORD{subst as ref(SOME record), ...}, subst', substs) = deref(record, subst, subst' :: substs) + | deref(record as RECORD{subst = ref NONE, ...}, _, substs) = update(record, substs) + in + fun derefRecord(RECORD{subst as ref(SOME record), ...}) = deref(record, subst, []) + | derefRecord(record as RECORD{subst = ref NONE, ...}) = record + end + + end diff --git a/src/compiler/UNIFY.sig b/src/compiler/UNIFY.sig new file mode 100644 index 0000000..ace0468 --- /dev/null +++ b/src/compiler/UNIFY.sig @@ -0,0 +1,21 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature UNIFY = + sig + type Type = Types.Type + exception Unify + val unify : Type * Type -> unit + end diff --git a/src/compiler/UTIL.sig b/src/compiler/UTIL.sig new file mode 100644 index 0000000..a9ba0a2 --- /dev/null +++ b/src/compiler/UTIL.sig @@ -0,0 +1,24 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +signature UTIL = + sig + val bound : (''a * 'b) list * ''a -> 'b option + val lookup : (''a * 'b) list * ''a -> 'b (* XXX: delete *) + val min : int * int -> int + val member : ''a * ''a list -> bool + val intersect : ''a list * ''a list -> ''a list + val after : (unit -> 'a) * (unit -> unit) -> 'a + end diff --git a/src/compiler/Unify.sml b/src/compiler/Unify.sml new file mode 100644 index 0000000..e092ddb --- /dev/null +++ b/src/compiler/Unify.sml @@ -0,0 +1,141 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Unify : UNIFY = + struct + + type Type = Types.Type + + exception Unify + + fun same_tyvars(Types.TYVAR{subst = subst1, ...}, Types.TYVAR{subst = subst2, ...}) = subst1 = subst2 + + fun check_occur tyvar1 = + let fun checkTy ty = + case Types.derefTy ty + of Types.VAR tyvar2 => + if same_tyvars(tyvar1, tyvar2) then raise Unify else () + | Types.REC record => + let val Types.RECORD{fields, ...} = Types.derefRecord record + in + List.app checkField fields + end + | Types.CONS(tys, _) => + List.app checkTy tys + and checkField(_, ty) = checkTy ty + in + checkTy + end + + fun check_level maxlevel = + let fun checkTy ty = + case Types.derefTy ty + of Types.VAR(Types.TYVAR{level, eq, ovld, subst}) => + if level > maxlevel then subst := SOME(Types.VAR(Types.mkTyVar(maxlevel, eq, ovld))) + else () + | Types.REC record => + let val Types.RECORD{fields, ...} = Types.derefRecord record + in + List.app checkField fields + end + | Types.CONS(tys, _) => + List.app checkTy tys + and checkField(_, ty) = checkTy ty + in + checkTy + end + + fun check_equality(eq, ty) = + let fun checkTy ty = + case Types.derefTy ty + of Types.VAR(Types.TYVAR{level, eq, ovld, subst}) => + if eq then () + else subst := SOME(Types.VAR(Types.mkTyVar(level, true, ovld))) + | Types.REC record => + let val Types.RECORD{fields, is_flexible, ...} = Types.derefRecord record + in + if is_flexible then raise Unify + else List.app checkField fields + end + | Types.CONS(tys, tyname) => + if tyname = "->" then raise Unify + else if tyname = "ref" then () + else List.app checkTy tys 
+ and checkField(_, ty) = checkTy ty + in + if eq then checkTy ty else () + end + + fun check_ovld(NONE, _) = () + | check_ovld(SOME tynames, Types.CONS([], tyname)) = + if Util.member(tyname, tynames) then () else raise Unify + | check_ovld(SOME _, _) = raise Unify + + fun bind_tyvar(tyvar1 as Types.TYVAR{level, eq, ovld, subst}, ty2) = + (check_occur tyvar1 ty2; + check_level level ty2; + check_equality(eq, ty2); + check_ovld(ovld, ty2); + subst := SOME ty2) + + fun join_level(level1, level2) = Util.min(level1, level2) + + fun join_eq(eq1, eq2) = eq1 orelse eq2 + + fun join_ovld(NONE, ovld2) = ovld2 + | join_ovld(ovld1 as SOME _, NONE) = ovld1 + | join_ovld(SOME tynames1, SOME tynames2) = SOME(Util.intersect(tynames1, tynames2)) + + fun unify_tyvars(tyvar1, tyvar2) = + if same_tyvars(tyvar1, tyvar2) then () + else + let val Types.TYVAR{level = level1, eq = eq1, ovld = ovld1, subst = subst1} = tyvar1 + val Types.TYVAR{level = level2, eq = eq2, ovld = ovld2, subst = subst2} = tyvar2 + val level3 = join_level(level1, level2) + val eq3 = join_eq(eq1, eq2) + val ovld3 = join_ovld(ovld1, ovld2) + val subst3 = SOME(Types.VAR(Types.mkTyVar(level3, eq3, ovld3))) + in + subst1 := subst3; + subst2 := subst3 + end + + fun unify(ty1, ty2) = unify2(Types.derefTy ty1, Types.derefTy ty2) + + and unify2(Types.VAR tyvar1, Types.VAR tyvar2) = unify_tyvars(tyvar1, tyvar2) + | unify2(Types.VAR tyvar1, ty2) = bind_tyvar(tyvar1, ty2) + | unify2(ty1, Types.VAR tyvar2) = bind_tyvar(tyvar2, ty1) + | unify2(Types.REC record1, Types.REC record2) = unify_records(Types.derefRecord record1, Types.derefRecord record2) + | unify2(Types.CONS(tys1, tyname1), Types.CONS(tys2, tyname2)) = + if tyname1 = tyname2 then unify_tys(tys1, tys2) else raise Unify + | unify2(_, _) = raise Unify + + and unify_tys([], []) = () + | unify_tys(ty1 :: tys1, ty2 :: tys2) = (unify(ty1, ty2); unify_tys(tys1, tys2)) + | unify_tys(_, _) = raise Unify + + and unify_records(record1 as Types.RECORD{fields = fields1, 
is_flexible = is_flexible1, subst = subst1}, + record2 as Types.RECORD{fields = fields2, is_flexible = is_flexible2, subst = subst2}) = + if is_flexible1 then + (if is_flexible2 then + unify_records_unordered(fields1, fields2, subst1, subst2) + else + unify_records_less_or_equal(fields1, subst1, fields2, record2)) + else if is_flexible2 then + unify_records_less_or_equal(fields2, subst2, fields1, record1) + else + unify_records_equal(fields1, fields2) + + end diff --git a/src/compiler/Util.sml b/src/compiler/Util.sml new file mode 100644 index 0000000..6f6a7bf --- /dev/null +++ b/src/compiler/Util.sml @@ -0,0 +1,51 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +structure Util : UTIL = + struct + + fun bound([], _) = NONE + | bound((key, attr) :: env, key') = + if key = key' then SOME attr else bound(env, key') + + exception Lookup + + fun lookup(env, key) = + case bound(env, key) + of SOME attr => attr + | NONE => raise Lookup + + fun min(x, y) : int = if x < y then x else y + + fun member(_, []) = false + | member(x, y :: ys) = (x = y) orelse member(x, ys) + + fun intersect'([], _, zs) = zs + | intersect'(x :: xs, ys, zs) = + intersect'(xs, ys, if member(x, ys) then x :: zs else zs) + + fun intersect(xs, ys) = intersect'(xs, ys, []) + + fun after(compute, cleanup) = + let datatype 'a status = OK of 'a | EXN of exn + val status = OK(compute()) handle exn => EXN exn + val _ = cleanup() + in + case status + of OK value => value + | EXN exn => raise exn + end + + end diff --git a/src/compiler/load.sml b/src/compiler/load.sml new file mode 100644 index 0000000..d6601b4 --- /dev/null +++ b/src/compiler/load.sml @@ -0,0 +1,58 @@ +(* + * Copyright 2015-2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *)
+load "IntInf";
+load "Word";
+load "Real";
+load "OS";
+load "CommandLine";
+use "UTIL.sig";
+use "Util.sml";
+use "DICT.sig";
+use "Dict.sml";
+use "SOURCE.sig";
+use "Source.sml";
+use "LEXARG.sig";
+use "LexArg.sml";
+use "TOKEN.sig";
+use "Token.sml";
+use "LEXUTIL.sig";
+use "LexUtil.sml";
+use "LEXER.sig";
+use "Lexer.sml";
+use "BASIS.sig";
+use "Basis.sml";
+use "ABSYN.sig";
+use "Absyn.sml";
+use "PARSER.sig";
+use "Parser.sml";
+use "TYPES.sig";
+use "Types.sml";
+use "TYPE_SCHEME.sig";
+use "TypeScheme.sml";
+use "UNIFY.sig";
+(*use "Unify.sml";*)(*TODO: unfinished*)
+use "TYPE_CHECK.sig";
+use "TypeCheck.sml";
+use "CORE_ERLANG.sig";
+use "CoreErlang.sml";
+use "CORE_ERLANG_PRINT.sig";
+use "CoreErlangPrint.sml";
+use "TRANSLATE.sig";
+use "Translate.sml";
+use "MAIN.sig";
+use "Main.sml";
+use "START.sig";
+use "Start.sml";
diff --git a/src/runtime/TextIO.erl b/src/runtime/TextIO.erl
new file mode 100644
index 0000000..f9d354f
--- /dev/null
+++ b/src/runtime/TextIO.erl
@@ -0,0 +1,25 @@
+%%% -*- erlang-indent-level: 2 -*-
+%%%
+%%% Copyright 2018 Mikael Pettersson
+%%%
+%%% Licensed under the Apache License, Version 2.0 (the "License");
+%%% you may not use this file except in compliance with the License.
+%%% You may obtain a copy of the License at
+%%%
+%%%     http://www.apache.org/licenses/LICENSE-2.0
+%%%
+%%% Unless required by applicable law or agreed to in writing, software
+%%% distributed under the License is distributed on an "AS IS" BASIS,
+%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%%% See the License for the specific language governing permissions and
+%%% limitations under the License.
+
+-module('TextIO').
+-export([ 'output'/0
+        , 'stdOut'/0
+        ]).
+
+output() -> fun output/1.
+output({'stdOut', String}) -> io:put_chars(String), {}.
+
+'stdOut'() -> 'stdOut'.
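The runtime shim above is small enough that its calling convention can be sketched directly. A minimal illustration of how a caller would drive it, assuming `TextIO.erl` has been compiled and loaded on the code path (the curried call shape mirrors the `output() -> fun output/1` definition; this snippet is illustrative, not part of the patch):

```erlang
%% Illustrative snippet (assumption: TextIO.beam is compiled and loadable).
%% ML's curried  TextIO.output (TextIO.stdOut, "hi\n")  becomes:
F = 'TextIO':output(),                   %% F is fun output/1
{} = F({'TextIO':'stdOut'(), "hi\n"}).   %% writes the string, returns unit {}
```

Note how the ML pair argument maps to an Erlang tuple whose first element is the atom `'stdOut'`, and ML's unit value maps to the empty tuple `{}`.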
diff --git a/src/runtime/erlml_runtime.erl b/src/runtime/erlml_runtime.erl new file mode 100644 index 0000000..f12fd5c --- /dev/null +++ b/src/runtime/erlml_runtime.erl @@ -0,0 +1,30 @@ +%%% -*- erlang-indent-level: 2 -*- +%%% +%%% Copyright 2018 Mikael Pettersson +%%% +%%% Licensed under the Apache License, Version 2.0 (the "License"); +%%% you may not use this file except in compliance with the License. +%%% You may obtain a copy of the License at +%%% +%%% http://www.apache.org/licenses/LICENSE-2.0 +%%% +%%% Unless required by applicable law or agreed to in writing, software +%%% distributed under the License is distributed on an "AS IS" BASIS, +%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +%%% See the License for the specific language governing permissions and +%%% limitations under the License. + +-module(erlml_runtime). +-export([ init/0 + , set_var/2 + , get_var/1 + , raise_match/0 + ]). + +init() -> ets:new(?MODULE, [named_table, public]). + +set_var(Key, Val) -> ets:insert(?MODULE, {Key, Val}), {}. + +get_var(Key) -> ets:lookup_element(?MODULE, Key, 2). + +raise_match() -> throw('Match'). diff --git a/src/test/HELLO.sig b/src/test/HELLO.sig new file mode 100644 index 0000000..d8b1787 --- /dev/null +++ b/src/test/HELLO.sig @@ -0,0 +1,19 @@ +(* + * Copyright 2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ *) +signature HELLO = + sig + val hello: unit -> unit + end diff --git a/src/test/Hello.sml b/src/test/Hello.sml new file mode 100644 index 0000000..6cea2fb --- /dev/null +++ b/src/test/Hello.sml @@ -0,0 +1,19 @@ +(* + * Copyright 2018 Mikael Pettersson + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + *) +structure Hello : HELLO = + struct + fun hello () = TextIO.output(TextIO.stdOut, "ErlML says Hello\n") + end
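For orientation, this is roughly the Erlang shape the test module could take after translation. The following sketch is an assumption about the output of Translate.sml and CoreErlangPrint.sml, not actual generated code; the module name, the quoted-atom convention, and the nullary-wrapper-returning-a-closure pattern are all borrowed from the hand-written runtime modules above:

```erlang
%%% Hypothetical translation of Hello.sml; illustrative only.
%%% The 'hello'/0 wrapper returning a closure mirrors the style of
%%% the hand-written 'TextIO' runtime module; hello_1/1 is an assumed
%%% internal name for the unary worker.
-module('Hello').
-export(['hello'/0]).

'hello'() -> fun hello_1/1.

hello_1({}) ->                           %% ML unit argument arrives as {}
  Output = 'TextIO':output(),
  Output({'TextIO':'stdOut'(), "ErlML says Hello\n"}).
```

Under this scheme, `'Hello':('hello'())({})` would print the greeting and return `{}`, matching the ML type `unit -> unit`.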