tokenizer gets no-meaning infix ops from JSON #87
Merged
Commits (7):

- 192d530 tokenizer gets no-meaning infix ops from JSON (rocky)
- 173e071 Temporily test from operator-info-from-JSON branch (rocky)
- 370b8fe Toward getting this working... (rocky)
- 7be3875 See if we can get MS Windows working (rocky)
- ebe7baa Scanner uses operators json mm (#88) (mmatera)
- c6d025a Scanner uses operators json mm (#89) (mmatera)
- 5dc0978 allows to reload the tables in tokenizer module (#91) (mmatera)
```diff
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

 import os.path as osp
 import re
 import string
 from typing import Optional

@@ -9,6 +9,22 @@
 from mathics_scanner.errors import ScanError
 from mathics_scanner.prescanner import Prescanner

+ROOT_DIR = osp.dirname(__file__)
+try:
+    import ujson
+except ImportError:
+    import json as ujson  # type: ignore[no-redef]
+
+# Load Mathics3 character information from JSON. The JSON is built from
+# named-characters.yml
+
+operators_table_path = osp.join(ROOT_DIR, "data", "operators.json")
+assert osp.exists(
+    operators_table_path
+), f"Internal error: Mathics3 Operator information are missing; expected to be in {operators_table_path}"
+with open(osp.join(operators_table_path), "r", encoding="utf8") as operator_f:
+    OPERATOR_DATA = ujson.load(operator_f)
+
 # special patterns
 NUMBER_PATTERN = r"""
 ( (?# Two possible forms depending on whether base is specified)

@@ -33,7 +49,6 @@
 )
 full_names_pattern = r"(`?{0}(`{0})*)".format(base_names_pattern)

 # FIXME: Revise to get Character Symbols from data/characters.json
 tokens = [
     ("Definition", r"\? "),
     ("Information", r"\?\? "),

@@ -102,9 +117,7 @@
     ("Equal", r" (\=\=) | \uf431 | \uf7d9 "),
     ("Unequal", r" (\!\= ) | \u2260 "),
     ("LessEqual", r" (\<\=) | \u2264 "),
-    ("LessSlantEqual", r" \u2a7d "),
     ("GreaterEqual", r" (\>\=) | \u2265 "),
-    ("GreaterSlantEqual", r" \u2a7e "),
     ("Greater", r" \> "),
     ("Less", r" \< "),
     # https://reference.wolfram.com/language/ref/character/DirectedEdge.html

@@ -148,7 +161,6 @@
     # ('PartialD', r' \u2202 '),
     # uf4a0 is Wolfram custom, u2a2f is standard unicode
     ("Cross", r" \uf4a0 | \u2a2f"),
-    ("Colon", r" \u2236 "),
     # uf3c7 is Wolfram custom, 1d40 is standard unicode
     ("Transpose", r" \uf3c7 | \u1d40"),
     ("Conjugate", r" \uf3c8 "),

@@ -159,56 +171,32 @@
     ("Del", r" \u2207 "),
     # uf520 is Wolfram custom, 25ab is standard unicode
     ("Square", r" \uf520 | \u25ab"),
-    ("SmallCircle", r" \u2218 "),
-    ("CircleDot", r" \u2299 "),
     # ('Sum', r' \u2211 '),
     # ('Product', r' \u220f '),
-    ("PlusMinus", r" \u00b1 "),
-    ("MinusPlus", r" \u2213 "),
-    ("Nor", r" \u22BD "),
-    ("Nand", r" \u22BC "),
-    ("Xor", r" \u22BB "),
-    ("Xnor", r" \uF4A2 "),
-    ("Diamond", r" \u22c4 "),
-    ("Wedge", r" \u22c0 "),
-    ("Vee", r" \u22c1 "),
-    ("CircleTimes", r" \u2297 "),
-    ("CenterDot", r" \u00b7 "),
-    ("Star", r" \u22c6"),
-    ("VerticalTilde", r" \u2240 "),
-    ("Coproduct", r" \u2210 "),
-    ("Cap", r" \u2322 "),
-    ("Cup", r" \u2323 "),
-    ("CirclePlus", r" \u2295 "),
-    ("CircleMinus", r" \u2296 "),
-    ("Congruent", r" \u2261 "),
-    ("Intersection", r" \u22c2 "),
-    ("Union", r" \u22c3 "),
-    ("VerticalBar", r" \u2223 "),
-    ("NotVerticalBar", r" \u2224 "),
-    ("DoubleVerticalBar", r" \u2225 "),
-    ("NotDoubleVerticalBar", r" \u2226 "),
-    ("Element", r" \u2208 "),
-    ("NotElement", r" \u2209 "),
-    ("Subset", r" \u2282 "),
-    ("Superset", r" \u2283 "),
-    ("ForAll", r" \u2200 "),
-    ("Exists", r" \u2203 "),
-    ("NotExists", r" \u2204 "),
-    ("Not", r" \u00AC "),
-    ("Equivalent", r" \u29E6 "),
-    ("Implies", r" \uF523 "),
-    ("RightTee", r" \u22A2 "),
-    ("DoubleRightTee", r" \u22A8 "),
-    ("LeftTee", r" \u22A3 "),
-    ("DoubleLeftTee", r" \u2AE4 "),
-    ("SuchThat", r" \u220D "),
-    ("VerticalSeparator", r" \uF432 "),
-    ("Therefore", r" \u2234 "),
-    ("Because", r" \u2235 "),
-    ("Backslash", r" \u2216 "),
 ]

+for table in ("no-meaning-infix-operators",):
+    table_info = OPERATOR_DATA[table]
+    for operator_name, unicode in table_info.items():
+        # if any([tup[0] == operator_name for tup in tokens]):
+        #     print(f"Please remove {operator_name}")
+        tokens.append((operator_name, f" {unicode} "))

 literal_tokens = {
     "!": ["Unequal", "Factorial2", "Factorial"],
```

A review question on the block of removed operator entries:

> Why are these entries gone?

> It now gets pulled in from JSON.
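The core technique in this change, extending a hand-written token table with (name, pattern) pairs read from a JSON mapping and compiling them into one alternation regex, can be sketched in isolation. The `OPERATOR_DATA` dict below is a made-up stand-in for the real `operators.json` contents, and `first_token` is a hypothetical helper, not part of mathics-scanner:

```python
# Minimal sketch: extend a hand-written token table with entries taken from
# a JSON-style mapping, then compile one named-group alternation regex.
import re

# Stand-in for the contents of data/operators.json (hypothetical values).
OPERATOR_DATA = {
    "no-meaning-infix-operators": {
        "Tilde": "\u223c",      # the character ∼
        "Backslash": "\u2216",  # the character ∖
    }
}

# A couple of hand-written tokens, as in the tokenizer module.
tokens = [
    ("LessEqual", r"\<\=|\u2264"),
    ("Greater", r"\>"),
]

# Append the JSON-derived operators, mirroring the loop in the diff.
for table in ("no-meaning-infix-operators",):
    for operator_name, unicode_char in OPERATOR_DATA[table].items():
        tokens.append((operator_name, unicode_char))

# One alternation where each branch is a named group, so a match reports
# which token kind it was via Match.lastgroup.
token_re = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in tokens))


def first_token(text):
    """Return the token name matching at the start of text, or None."""
    m = token_re.match(text)
    return m.lastgroup if m else None
```

Ordering matters here: `LessEqual` must precede `Greater` in the alternation so that `<=` is not split; the real scanner has the same ordering concern in its hand-written list.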
Review discussion:

> This produces an error when the `operator.json` table is built for the first time. `mathics_scanner.generate.build_operator_tables` imports `mathics_scanner.__version__`, which causes `mathics_scanner.__init__` to be loaded. But then it tries to import this module, and finds that the file has not been created yet.

> A way to avoid this error would be to put all the initialization code inside an initialization function and, instead of raising an exception when the file does not exist, just show a warning.
> This was the situation before 370b8fe, and it does not happen now on Ubuntu and macOS. But I don't know why Windows is still failing here. I am exhausted from tracking down all the little inconsistencies for today. If you can move this forward, please do.

> Sure!

> Moving the code to a function was done, and delaying initialization was attempted, but the code is thorny enough that something in there runs earlier than when a tokenizer is first created. I now have a workaround, so let's not add yet another.

> If you have work cycles to spare, here are some things that in my opinion are more important than yet another workaround:

> Regarding this, "correct" would be in relation to the WMA `Precedence[...]`, wouldn't it? Do you mean as submodules of `no_meaning`?