tokenizer gets no-meaning infix ops from JSON #87

Merged
merged 7 commits into from
Nov 25, 2024

Conversation

rocky
Member

@rocky rocky commented Nov 24, 2024

With this I think all infix operators with no meaning will start to work in mathics-core.

@rocky
Member Author

rocky commented Nov 24, 2024

Looks like UndirectedEdge (which is not a no-meaning operator) needs to be handled. Possibly DirectedEdge too.

("Element", r" \u2208 "),
("NotElement", r" \u2209 "),
("Subset", r" \u2282 "),

Contributor

Why are these entries gone?

Member Author

It now gets pulled in from JSON.
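
As a rough illustration of what pulling these entries from JSON could look like, here is a minimal sketch. The key name `no-meaning-infix-operators`, the sample data, and the helper `token_pairs_from_json` are assumptions made for illustration; the actual layout of `operators.json` and the exact pattern format the tokeniser expects are not shown in this conversation.

```python
import json
import re

# Hypothetical stand-in for part of operators.json; the real file's
# keys and layout may differ.
SAMPLE_JSON = """
{
  "no-meaning-infix-operators": {
    "Element": "\\u2208",
    "NotElement": "\\u2209",
    "Subset": "\\u2282"
  }
}
"""


def token_pairs_from_json(text):
    """Build (operator name, regex source) pairs from JSON text."""
    table = json.loads(text)["no-meaning-infix-operators"]
    # Escape each character so it is always treated as a literal in a regex.
    return [(name, re.escape(char)) for name, char in table.items()]


pairs = token_pairs_from_json(SAMPLE_JSON)
for name, source in pairs:
    print(name, repr(source))
```

The point of the design is that the operator list lives in one data file, so the hard-coded tuples no longer need to be kept in sync by hand.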

# named-characters.yml

operators_table_path = osp.join(ROOT_DIR, "data", "operators.json")
assert osp.exists(

Contributor

This produces an error when the operators.json table is built for the first time.
mathics_scanner.generate.build_operator_tables imports mathics_scanner.__version__,
which causes mathics_scanner.__init__ to be loaded. That, in turn, tries to import this module, which finds that the file has not been created yet.

Contributor

A way to avoid this error would be to put all the initialization code inside an initialization function, and instead of raising an exception if the file does not exist, just show a warning.

Member Author

@rocky rocky Nov 25, 2024

This was the situation before 370b8fe, and it no longer happens on Ubuntu and macOS. But I don't know why Windows is still failing here.

I am exhausted from tracking down all the little inconsistencies for today. If you can move this forward, please do.

Contributor

Sure!

Member Author

> A way to avoid this error would be to put all the initialization code inside an initialization function, and instead of raising an exception if the file does not exist, just show a warning.

Moving the code into a function was done, and delaying initialization was attempted, but the code is thorny enough that something in there runs before a tokenizer is first created.

I now have a workaround, so let's not add yet another.

Member Author

@rocky rocky Nov 25, 2024

Sure!

If you have work cycles to spare, here are some things that in my opinion are more important than yet another workaround:

  • Put in the correct precedence for no-meaning operators
  • Split out the following list of operator names by creating sections for left-associative infix, right-associative infix, flat infix, (prefix/postfix not yet done), "misc", and the newly added {Und,D}irectedEdge operators.

Contributor

> If you have work cycles to spare, here are some things that in my opinion are more important than yet another workaround:
>
> * Put in the correct precedence for no-meaning operators

Regarding this, "correct" would mean consistent with WMA's Precedence[...], wouldn't it?

> * Split out the following list of operator names by creating sections for left assoc infix, right assoc infix, flat infix, (not yet done prefix/postfix), "misc" and newly added {Und,D}irectedEdge operators.

Do you mean as submodules of no_meaning?

Change the Function unicode so that it does not conflict with RightTeeArrow.
Function unicode is a long arrow.

Remove CSV to YML stuff. CSV is beyond hope of keeping in sync.

Remove tokeniser import from __init__.py. Workaround for now.
We need this so we can create the operator JSON without needing the
JSON table to be previously around.

Clean up tokeniser.py a little bit.

Split type-changing variables like tokens and literal_tokens into
separate variables for each type they can hold.


# Initialized below in update_tokens_from_JSON
tokens = []

Contributor

Are these variables a part of the module interface? Or are they just internal variables?

Member Author

tokens and literal_tokens start out as tuples of operator name and string value, and are used to create position indexes which are used in parsing.

After this, though, they are converted to some sort of compiled expression, and then I think these type-changed variables are used in parsing. (This style of programming, where variables morph after their old value is no longer needed, was possibly considered clever at one time. Nowadays, it is considered annoying and frowned upon. In fact, in strongly-typed languages you are not allowed to do this.)

I have split out those two uses. We could run del tokens when they are no longer needed, but right now I don't think it is worth the bother.
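
To make the idea of splitting type-changing variables concrete, here is a small sketch. The names `token_pairs`, `compiled_tokens`, and `match_token`, as well as the two sample patterns, are hypothetical and are not taken from tokeniser.py; the sketch only shows one variable per type instead of rebinding a single name.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Raw form: (operator name, regex source) pairs. Sample data only.
token_pairs: List[Tuple[str, str]] = [
    ("Plus", r"\+"),
    ("Times", r"\*"),
]

# Compiled form, kept under its own name instead of overwriting token_pairs,
# so each variable holds exactly one type for its whole lifetime.
compiled_tokens: List[Tuple[str, Pattern[str]]] = [
    (name, re.compile(source)) for name, source in token_pairs
]


def match_token(text: str, pos: int = 0) -> Optional[str]:
    """Return the name of the first token pattern matching at pos, else None."""
    for name, pattern in compiled_tokens:
        if pattern.match(text, pos):
            return name
    return None
```

With the two forms separated, a type checker can verify each use, and `del token_pairs` remains an option once the raw pairs are no longer needed.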

@rocky rocky force-pushed the scanner-uses-operators-JSON branch 2 times, most recently from 3b0e5aa to ec5e5fa Compare November 25, 2024 00:32
@rocky rocky force-pushed the scanner-uses-operators-JSON branch from ec5e5fa to 7be3875 Compare November 25, 2024 00:35
* working out the initialization of the tokenizer
@mmatera
Contributor

mmatera commented Nov 25, 2024

@rocky, sorry. Something went wrong. It just passed all the tests... Let's revert

@rocky rocky merged commit 76d2876 into master Nov 25, 2024
12 checks passed
@rocky rocky deleted the scanner-uses-operators-JSON branch November 25, 2024 02:37