Proposed changes to TokenReader #697
Comments
That all sounds great to me. Use all the FP! Your composability goal still comes down to "last write wins", right? Just with a well-defined method for doing that composition. Are you sure you want to expose the current API first? Your proposed changes seem clearly superior, so we could make the changes first and then expose, but admittedly I don't have a clear picture of the details yet.
Yes. It's just
No. Users of the API would just get annoyed when we break it. I'll get started on this after we hash out what constitutes a language. I know we've discussed it before, but I'd like to be solid on that first. My thinking from a while back was that a language would be composed of a readtable and one or more syntactic macros that would be automatically imported (and potentially invoked).

input:

```js
// mylang.js
'lang sweet.js'
...
export default lang;
export { table };

// main.js
'lang mylang';
...
```

output:

```js
import mylang from 'mylang';
mylang
...
```

We pass

As far as language composability is concerned I'm thinking:

```js
'lang yourlang, mylang';
```

to:

```js
import yourlang from 'yourlang';
import mylang from 'mylang';
yourlang
mylang
```

We concat the tables and then pass to the reader. I've been looking for the right structure to represent the readtable for several months now. I think a trie is the way to go, but I think I'll start with POJOs and linear algorithms first to test out the API. This is probably going to take a while. Fortunately, I don't think it's all or nothing. I can implement something like the above API and see what the default table looks like once that's done (caching should be in place at that point). If everything looks good, I'll find/make a better data structure for the table, as I expect the first pass to be dog slow. Laziness is a nice-to-have AFAIC.
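For concreteness, a minimal sketch of the POJO-and-linear-algorithms starting point described above, assuming a readtable is just a plain object mapping a prefix string to a reader function (the names `concatReadtables` and `readWith` are made up for illustration):

```js
// A "readtable" here is a plain object mapping a prefix to a reader function.
// Concatenation is last-write-wins: entries in `b` shadow entries in `a`.
function concatReadtables(a, b) {
  return Object.assign({}, a, b);
}

// Linear scan: at each position, find the longest registered prefix and let
// its reader produce a token and report how many characters it consumed.
function readWith(table, source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let match = null;
    for (const prefix of Object.keys(table)) {
      if (source.startsWith(prefix, pos) && (!match || prefix.length > match.length)) {
        match = prefix;
      }
    }
    if (!match) {
      pos += 1; // unknown character: skip it (a real reader would error or have a default)
      continue;
    }
    const { token, consumed } = table[match](source, pos);
    tokens.push(token);
    pos += Math.max(consumed, 1); // guard against zero-length consumption
  }
  return tokens;
}
```

With tables shaped like this, 'lang yourlang, mylang' boils down to `concatReadtables(yourTable, myTable)`, and the order of concatenation decides which language wins on overlapping prefixes.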
👍
Good call. My initial proposal is to do the most general thing possible and then consider how to layer API conveniences on top of it. Fundamentally a language needs to:

I think this is satisfied by a language definition simply being a function from source to a list of `TokenTree`s:

```js
// L.js
export default function read(source: string): List<TokenTree>
```

```js
// main.js
'lang L.js';
// ...
```

The language's

As far as composability goes, I'm not sure what the difference between

```js
'lang L.js, K.js';
```

and

```js
'lang LK.js';
```

```js
// LK.js
import LReadtable from '...';
import KReadtable from '...';
export default function read(source: string) {
  return L.concat(K).read(source); // or whatever
}
```

is/should be. In other words, do we need something specific in the

Just some initial thoughts, I don't have strong opinions yet :)
This also requires exporting the readtable from each language file, which is fine. But it doesn't make languages composable unless I can turn the
The difference is that languages can now be very small, and as a user of a language I can compose without learning about readtables and

Say
The concern I have about allowing the language consumer to do arbitrary composition is that readtables are not a monoid (they are not associative) and they don't commute. Having order subtly matter will be a real footgun, I worry.
Oh wait, they are associative, aren't they? Regardless, ordering matters, so my point still stands.
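A toy illustration of that point, using the last-write-wins merge from the sketch above: merging plain-object readtables is associative, but it is not commutative whenever two tables claim the same prefix (the tables and prefixes here are invented for the example):

```js
// Both languages claim the '#' prefix with different readers.
const langA = { '#': (source, pos) => ({ token: { type: 'hash-comment' }, consumed: 1 }) };
const langB = { '#': (source, pos) => ({ token: { type: 'hash-set' }, consumed: 1 }) };

// Last-write-wins merging is associative, but swapping the order changes who owns '#'.
console.log(Object.assign({}, langA, langB)['#'] === langB['#']); // true
console.log(Object.assign({}, langB, langA)['#'] === langA['#']); // true
```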
I see your point, but ordering matters any time you do binding. And declaring a language binds tokenizers to string patterns. I think what you're getting at is that a user of composable languages would have to be nearly as sophisticated as a developer of languages and have knowledge of the internals of an implementation.
I'm still worried about extensibility, though. Are you thinking readtables will be exported in addition to readers?
Right, ordering where the consequences of that ordering are obvious (like binding) is perfectly fine. If I bind

We could address this a couple of ways.

One, allow your proposed syntax of

Two, don't allow any composition syntax; users can only specify one language. Languages can still be composed of multiple independent readtables following your proposed API; it just has to be done by language authors. Pro: simplest for the user. Con: much more work to create compositions.

Three, take inspiration from traits (one possible shape is sketched below):

Pro: most general, users have all the power. Con: potentially confusing; unclear if additional operators would be useful in practice.
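One hypothetical reading of the trait-inspired option, purely as a sketch (`composeTraitStyle` and its resolution API are invented names, not anything that exists): overlapping prefixes are a hard error unless the user resolves them explicitly, instead of silently applying last-write-wins.

```js
// Trait-style composition: a conflict on a prefix must be resolved explicitly.
function composeTraitStyle(a, b, resolve = {}) {
  const out = Object.assign({}, a);
  for (const prefix of Object.keys(b)) {
    if (prefix in a && a[prefix] !== b[prefix] && !(prefix in resolve)) {
      throw new Error(`readtable conflict on ${JSON.stringify(prefix)}; resolve it explicitly`);
    }
    out[prefix] = prefix in resolve ? resolve[prefix](a[prefix], b[prefix]) : b[prefix];
  }
  return out;
}
```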
I like this idea a lot. But we don't know how much power people will want/need. Let's start by just exposing one language and seeing what people's pain points are. We can always add operators later. To be clear, a language module must have a read function as its default export. Optionally, it can expose >= 1 readtable and whatever else it wants. Correct?
I think that's sufficient. The expander needs to know how to read a file, the pragma tells the expander where to find the right reader, and readers can use readtables or whatever else they want to do the actual reading. Future composition operators in the pragma might require a different export API, but that will be opt-in.
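Under that contract, a language module could look something like the following; the `table` export and the token shapes are illustrative, and only the default `read` export is required.

```js
// mylang.js -- required: `read` as the default export; optional: named exports
// such as a readtable for other languages to build on.
export const table = {
  '#': (source, pos) => ({ token: { type: 'hash', value: '#' }, consumed: 1 }),
};

// A deliberately tiny reader: split on whitespace and wrap each word as a token.
// A real language would drive this off a readtable (or a trie, per this issue).
export default function read(source) {
  return source.split(/\s+/).filter(Boolean).map(value => ({ type: 'word', value }));
}
```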
I've been thinking about changes to the reader API to make it simpler and add better state management and caching.
It would make sense to postpone these features until reader macros are in the wild to see what pain points there are.
It would also make sense to see if the API would be informed by a new implementation.
Big concerns:
Example:
Using this API would aid language composability, as the user is simply concatenating two readtables together (Readtable would form a Monoid).

```js
'lang sweet.js, decorator'
```

could simply result in a call to `composeLanguages`:

Behind the scenes, the readtable is being "updated" (it's immutable) with the slice of the source string that was consumed mapped to a thunk returning the token(s) produced by the reader. This would allow large portions of the source to avoid analysis and tokenization in the incremental compilation case.
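A rough sketch of that caching idea, assuming the readtable carries an immutable cache from consumed source slices to thunks; `withCachedSlice` and `readSlice` are invented names, not part of any existing API.

```js
// Returns a new readtable whose cache maps the consumed slice of source to a
// thunk that replays the token(s) the reader produced for it.
function withCachedSlice(readtable, slice, tokens) {
  const cache = new Map(readtable.cache);
  cache.set(slice, () => tokens);
  return { ...readtable, cache };
}

// On a later (incremental) compile, an already-seen slice skips analysis
// and tokenization entirely and just forces the cached thunk.
function readSlice(readtable, slice, reallyRead) {
  const hit = readtable.cache.get(slice);
  return hit ? hit() : reallyRead(slice);
}
```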
Tokens would be decorated with a `slice` containing location information by the reader.

I have struggled with state management in the reader for a while now and am starting to suspect that a state monad transformer would be useful here. Currently both the reader and charstream contain state necessary to create a token. I would like to get rid of `CharStream` altogether, as IMO it doesn't provide much beyond holding location state if the readtable can return matched prefixes a la a trie.

I don't know if this caching would actually result in a performance improvement. It's currently just speculation.
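A sketch of the trie idea: if the readtable itself can answer "what is the longest registered prefix starting at this offset?", then the reader only needs to carry an offset (and derived location info), which is what makes dropping `CharStream` plausible. The helpers below are illustrative only.

```js
// A tiny trie keyed on single characters; a node may carry a reader action.
function insert(trie, key, action) {
  let node = trie;
  for (const ch of key) {
    node.children[ch] = node.children[ch] || { children: {}, action: null };
    node = node.children[ch];
  }
  node.action = action;
  return trie;
}

// Longest-prefix match starting at `pos`: returns the matched prefix and its
// action, or null. No separate character stream is needed, just the offset.
function matchPrefix(trie, source, pos) {
  let node = trie;
  let best = null;
  let i = pos;
  while (i < source.length && node.children[source[i]]) {
    node = node.children[source[i]];
    i += 1;
    if (node.action) best = { prefix: source.slice(pos, i), action: node.action };
  }
  return best;
}
```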
As for laziness, I was considering redefining `TokenTree` as:
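Not the definition proposed in this issue, but one plausible shape for a lazy `TokenTree`: leaves stay ordinary tokens, while delimiters hold a thunk that reads their inner tokens only when forced.

```js
// A lazy delimiter node: `readInner` is only invoked the first time the
// children are needed, then the result is cached.
function delimiter(kind, slice, readInner) {
  let forced = false;
  let children = null;
  return {
    type: 'delimiter',
    kind,   // e.g. '()', '[]', '{}'
    slice,  // location info attached by the reader
    force() {
      if (!forced) {
        children = readInner();
        forced = true;
      }
      return children;
    },
  };
}
```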
Then `read` would be:

/cc @disnet