Replies: 10 comments 14 replies
-
We (or I) need to do something like this for the expression-elements-properties branch.
-
https://github.com/Mathics3/mathics-core/blob/benchmarking/bench-results.rst has recent results comparing V 4.0.0 performance on the combinatorica 0.9 test versus the current master. I will be drilling down here to see what is up in more detail. I will also do some comparisons for #83 (comment). If you want to verify, the two branches to use are https://github.com/Mathics3/mathics-core/tree/benchmarking for current and https://github.com/Mathics3/mathics-core/tree/4.0.0-benchmark Run:
in each branch.
-
For
vs 4.0.0-benchmarking
While for 4.1.0:
4.0.0:
-
I don't have time right now to investigate this further, but I do have a plan for approaching things that I think is neat. First of all, we could use (Elsewhere I have mentioned that the "1000" could be turned into "50" or even "10"; things would be just as apparent but take less time and be easier to debug.) But here's something else that I may work on because it is neat. What one can do is remove any data where the number of calls didn't change between, say, 10 and 50, or where the number of calls doesn't change between version 4.0.0 and 4.1.0. And clearly remove any line that has a 0 in it. With this, I think we would have a very precise indication of what's up.
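The filtering idea above can be sketched in a few lines. This is a hypothetical helper (the function name and the dict-of-counts input format are illustrative, not part of Mathics); it assumes you have already extracted per-function call counts from two profiling runs:

```python
# Hypothetical sketch: filter profiling rows down to those whose call
# counts actually scale with the workload. Input dicts map a function
# label to its call count in each run.

def interesting_rows(counts_small, counts_large):
    """Keep functions whose call count changes between a small run
    (e.g. a loop of 10) and a larger run (e.g. a loop of 50),
    dropping zero-count entries."""
    result = {}
    for func, n_small in counts_small.items():
        n_large = counts_large.get(func, 0)
        if n_small == 0 and n_large == 0:
            continue  # never called: noise
        if n_small == n_large:
            continue  # fixed overhead; does not scale with the workload
        result[func] = (n_small, n_large)
    return result

# Example: only the evaluation call scales with the loop count.
small = {"Expression.evaluate": 10, "load_module": 1, "unused": 0}
large = {"Expression.evaluate": 50, "load_module": 1, "unused": 0}
print(interesting_rows(small, large))  # → {'Expression.evaluate': (10, 50)}
```

The same filter applies unchanged when the two runs are two versions (4.0.0 vs 4.1.0) instead of two loop sizes.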
-
Here is a proposal for benchmarking code, to compare the performance in the evaluation of several (simple) expressions, against

```
Results:
machine: x86_64
system: Linux 5.13.0-41-generic
test1<< Do[{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a},{10}] >> 2.2.0: 0.581977ms +/-6.1%
test2<< Do[Table[a, {15}],{10}] >> 2.2.0: 2.380450ms +/-4.0%
test3<< Do[Table[i, {i, 15}],{10}] >> 2.2.0: 37.125100ms +/-0.8%
```
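A harness for a proposal like this could look as follows. This is only a sketch: the `evaluate` argument is a stand-in for whatever entry point each benchmarked Mathics version exposes, and the output format just mimics the `testN<< expr >> time +/-err%` lines above:

```python
import timeit
from statistics import mean, stdev

def bench(name, expr, evaluate, repeats=5, number=10):
    """Time `evaluate(expr)` and report mean per-call time in ms
    with the relative spread across repeats."""
    times = timeit.repeat(lambda: evaluate(expr), repeat=repeats, number=number)
    per_call_ms = [t / number * 1000 for t in times]
    m = mean(per_call_ms)
    rel = 100 * stdev(per_call_ms) / m if m else 0.0
    return f"{name}<< {expr} >> {m:.6f}ms +/-{rel:.1f}%"

# Demo with a dummy evaluator; in practice you would pass the real
# evaluation function of the version under test.
print(bench("test1", "Do[{a,a,a},{10}]", lambda e: len(e)))
```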
-
@rocky, here I added it.

```
machine: x86_64
system: Linux 5.13.0-41-generic
test1<< Do[{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a},{10}] >> 2.2.0: 0.588356ms +/-5.8%
test2<< Do[Table[a, {15}],{10}] >> 2.2.0: 2.362820ms +/-4.6%
test3<< Do[Table[i, {i, 15}],{10}] >> 2.2.0: 37.315500ms +/-0.8%
```
-
Thanks for benchmarking. I have a number of thoughts and comments on this, but that will have to wait for when I have time. How is 4.10.dev0 (which I assume is 4.1.0.dev0) different from origin/master?
-
I have been looking at the code and following traces in the execution of This is an area where we are slow because we are calling The fact that The
So here I am thinking of adding a function
There are times when we create an So computing properties of elements in creating an Expression slows things down when an evaluation will never occur. Lastly, there is a pattern that comes up: Here I am thinking about adding a method In sum:
Now a little bit about the parameters for the functions.
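The idea of not paying for element properties when an `Expression` is never evaluated can be sketched with lazy computation. Everything here is illustrative (the class name, the particular properties, and the dict representation are assumptions, not the mathics-core API):

```python
# Hedged sketch: defer computing element properties until an
# evaluation actually needs them, so that Expressions built but never
# evaluated pay no cost at construction time.

class Expr:
    def __init__(self, head, *elements):
        self.head = head
        self.elements = elements
        self._properties = None  # not computed at construction time

    @property
    def properties(self):
        # Computed on first access, then cached.
        if self._properties is None:
            self._properties = {
                "element_count": len(self.elements),
                "all_atoms": all(not isinstance(e, Expr) for e in self.elements),
            }
        return self._properties

e = Expr("F", "x", "y")           # cheap: no property scan happens here
print(e.properties["all_atoms"])  # → True (computed lazily on this access)
```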
-
With changes along the lines above, I am seeing I've been a bit sloppy or pessimistic in what I've done so far - it was hard enough to get this far. I do think the code is a little bit clearer and cleaner. Also interesting would be an extended sequence like the combinatorica runs. But that too will have to wait for some other time.
-
In working on this, there are a couple of conclusions or observations.
-
Continuing with the problem of performance in Mathics, I want to discuss here one of the critical factors that determine it: the evaluation routine. To decouple it from other aspects, like pattern matching, rule application, or parameter replacement, let's review how long it takes to evaluate a trivial expression, i.e. an expression that does not involve previously defined symbols. According to MathicsBenchmark, evaluating something like

`F[x]`

when `F` and `x` are undefined symbols takes around 60 µs. We can also check this inside the interpreter using the function `Timing`: we can see that each call takes, in Mathics, around 60 +/- 10 µs, while in WMA it takes 0.20 +/- 0.02 µs.
In Mathics, since the time does not seem to depend on the number of atomic parameters, we can expect that most of the time is consumed in the `mathics.core.expression.evaluate` method. Inside this method, most of the work is done by the `mathics.core.expression.evaluate_next` method, which is applied iteratively until a fixed point is reached.

Looking inside `mathics.core.expression.evaluate_next` by splitting it into small chunks and timing those chunks, I found (processing `` Global`F[Global`x] ``; the modified code is below):

The first two lines provide a baseline to discount the time required by the timer, which after the first call amounts to just two function calls (~120 ns x 2). Discarding those lines and sorting by the time consumed, we then obtain (processing `` Global`F[Global`x] ``):
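The chunk-splitting approach described above can be sketched generically: bracket each step of the loop body with `time.perf_counter()` samples and accumulate per-chunk totals. The step labels and dummy bodies below are stand-ins, not the real contents of `evaluate_next`:

```python
import time
from collections import defaultdict

chunk_time = defaultdict(float)

def timed(label, fn, *args):
    """Run fn(*args), adding its wall time to the running total for label."""
    t0 = time.perf_counter()
    result = fn(*args)
    chunk_time[label] += time.perf_counter() - t0
    return result

# Dummy stand-ins for steps inside an evaluation loop:
for _ in range(1000):
    attrs = timed("get attributes", lambda: ())
    timed("check attributes", lambda a: len(a) == 0, attrs)
    timed("build new expression", lambda: ("F", "x"))

# Report chunks sorted by total time consumed, most expensive first.
for label, total in sorted(chunk_time.items(), key=lambda kv: -kv[1]):
    print(f"{label:24s} {total * 1e6:8.1f} us")
```

Note the timer calls themselves add overhead to every chunk, which is why the baseline rows mentioned above need to be discounted.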
So, the most costly step is checking attributes. In our case the attributes are empty, so this is a kind of lower bound. We could probably improve this time by using a bitmask instead of a list of strings to store attributes. Building the new expression can also be improved by optimizing `from_python` (as in PR #51) or by avoiding calling it at all (PR #49).

Next on the list is the step of processing rules. In this case there are no rules, so all the time is devoted to finding a definition for the head and the leaves and, again, checking attributes. In the following section I am going to discuss some ideas about how to speed up access to the `Definition` object.

Very close behind, in the next place, we find the "checking range" chunk. Again, part of this subroutine needs to check attributes, but here another routine used many times throughout the code also appears: `Expression.has_form`. `has_form` involves a) a function call, and b) string comparisons instead of symbol (`is`) comparisons. Some ideas to speed up these calls are implemented in PR #65. Also, this chunk involves calls to `evaluate` over the leaves, which in this case we can expect to take half of the time (judging by the time required to evaluate the head of the expression, position 6 in the ranking).

The remaining items in the list each take less than 3 µs, involving a) private function definitions, b) local module loading, and c) copying leaves. While they seem less critical, we should notice that each of them takes 10 times longer than the whole evaluation in WMA.
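The bitmask idea can be sketched with `enum.IntFlag`. The flag names below are illustrative, not the actual mathics-core attribute constants:

```python
from enum import IntFlag

# Attributes as bit flags: membership becomes one integer AND
# instead of a string search through a list.

class Attr(IntFlag):
    NOTHING = 0
    HOLD_ALL = 1
    HOLD_FIRST = 2
    LISTABLE = 4
    ORDERLESS = 8

# Current scheme: list of strings, O(n) string comparisons per check.
attrs_as_list = ["System`Listable", "System`Orderless"]
slow = "System`Listable" in attrs_as_list

# Bitmask scheme: a single AND and a truth test.
attrs_as_bits = Attr.LISTABLE | Attr.ORDERLESS
fast = bool(attrs_as_bits & Attr.LISTABLE)

print(slow, fast)  # → True True
```

Besides the cheaper check, the empty-attributes case (the lower bound above) collapses to comparing one integer against zero.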
Finally, notice that by using Cython, times are roughly halved, except in the case of loading attributes, where the time is increased. In any case, what we need here is to reduce all these times by at least a factor of 10.
Regarding access to definitions

As I mentioned before, at each step of the evaluation process the method `Expression.evaluate` has to access the `Definition` object associated with a certain `lookup_name` in order to recover the different sets of rules and attributes that define the way in which the rules are applied. Any improvement in the algorithm that retrieves the `Definition` of a given `Symbol` should have a measurable positive effect on the evaluation time.

In the current implementation, the `Definition` objects are stored in the collection class `Definitions`. This class stores all the definitions in the session. This allows, for example, having several independent Mathics sessions within the same Python session. I think this is used in Mathics-Django to serve different instances [check]. In other implementations (like WMA or Jupyter's wolfram-kernel) this behaviour is implemented using different "kernels" that run as different processes. With the current implementation, a `Symbol` can have many `Definition`s, depending on the session, and then, during the evaluation, we need to "look up" the corresponding definitions.

`Definitions.get_definition` returns the definition of a given `Symbol`, in terms of its (not necessarily fully qualified) name. `Definitions` stores up to 4 instances of `Definition` objects for each symbol in different dictionaries: `Definitions.definitions_cache`, `Definitions.builtin`, `Definitions.user`, and `Definitions.pymathics`. `Definitions.get_definition` first tries to find the (not necessarily fully qualified) name in the `Definitions.definitions_cache` dictionary. Most of the time the `Definition` is in the `definitions_cache` and is simply returned. If the definition is not in the `definitions_cache`, then `get_definition` looks into the other three dictionaries by the fully qualified name. If a definition appears in just one of the three dictionaries, that definition is stored in the cache and then returned.

If there is no definition, a new empty definition is created and stored as a `user` definition and put into the cache. On the other hand, if there is more than one definition, a merge of the available definitions is built and stored in the cache.

`builtin` definitions are built and stored when the `Definitions` object is created, while `pymathics` definitions are built and stored when a `pymathics` module is loaded. This makes it possible to `unload` modules without erasing the vanilla definition provided by `mathics-core`, and means that `Clear[S]` restores the vanilla definition provided by `builtin` and the `pymathics` definitions already loaded.
definitions already loaded.When the input from the front-end is parsed, usually the expressions consists of non fqns. During this parsing step,
Definitions.get_definition
helps to determine if certain name exists in the current context, or in the context path, to produce expressions made ofSymbols
, that always have fqn's.Afterward, during the evaluation, we can assume that all the calls to
get_definition
are made with fqn's. In this process,Expression.evaluate
andSymbol.evaluate
ask for the lookup name of the symbols in the expression callingget_lookup_name()
. ForSymbol
sget_lookup_name()
returns the (fqn) of theSymbol
. ForExpression
s,get_lookup_name()
(in master) returnsself._head.get_lookup_name()
, in a recursive way. In the branch faster_get_lookup_name (PR #16) an iterative algorithm is proposed, that seems to reduce the time in around a 20% (132ns vs 167ns in my laptop).Another idea to improve the access time would be to avoid function calls to the
get_definition
method by checking first if there is already a definition in thedefinitions_cache
PR #59 explores this direction.Still faster could be also to avoid dealing with
Definitions
collections and store definitions directly insideSymbol
objects. These are directions that we can explore in the future.Appendix:
Modified code for timing
evaluate_next
Trivial expression evaluations from Jupyter (using `timeit`):

| Expression | Time per loop (mean ± std. dev. of 7 runs) | Loops per run |
|---|---|---|
| `1` | 126 ns ± 3.75 ns | 10,000,000 |
| ``Global`a`` | 329 ns ± 33 ns | 1,000,000 |
| ``Global`F[]`` | 2 µs ± 130 ns | 1,000,000 |
| ``Global`F[Global`a]`` | 2.1 µs ± 9.26 ns | 100,000 |
| ``Global`F[System`Pi]`` | 2.2 µs ± 159 ns | 100,000 |
| ``Global`F[Global`a, Global`b]`` | 2.29 µs ± 15.6 ns | 100,000 |
| ``Global`F[Global`a, Global`b, Global`c, Global`d, Global`e, Global`f, Global`g, Global`h, Global`i, Global`j, Global`k]`` | 4.09 µs ± 168 ns | 100,000 |