RDataFrame simplifications - numpy/python syntax #297

Open
miranov25 opened this issue Jan 20, 2023 · 19 comments

Comments

@miranov25
Owner

https://gitter.im/matrix/5ba1f93bd73408ce4fa8a265/@agoose77:matrix.org?at=639f83faa151003b5a7550f4

Possible simplification of the creation of RDataFrame function definitions
As for simplifying the generation of RDataFrame templates, @pl0xz0rz has implemented something similar in RootInteractive for Python -> JavaScript translation using ast: Python functions are replaced with the corresponding JavaScript functions.

E.g.:
In [136]: ast.dump(ast.parse("track.GetP() / mass", mode="eval"),True,False)
Out[136]: "Expression(body=BinOp(left=Call(func=Attribute(value=Name(id='track', ctx=Load()), attr='GetP', ctx=Load()), args=[], keywords=[]), op=Div(), right=Name(id='mass', ctx=Load())))"
In [140]: ast.dump(ast.parse("track[1:10,x:y].GetP() / mass", mode="eval"),True,False)
Out[140]: "Expression(body=BinOp(left=Call(func=Attribute(value=Subscript(value=Name(id='track', ctx=Load()), slice=ExtSlice(dims=[Slice(lower=Constant(value=1, kind=None), upper=Constant(value=10, kind=None), step=None), Slice(lower=Name(id='x', ctx=Load()), upper=Name(id='y', ctx=Load()), step=None)]), ctx=Load()), attr='GetP', ctx=Load()), args=[], keywords=[]), op=Div(), right=Name(id='mass', ctx=Load())))"
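
A minimal sketch of how such an AST could be used to find the columns an expression depends on. The helper name `collect_names` is hypothetical and is not the RootInteractive implementation; it only uses the standard `ast` module.

```python
import ast


def collect_names(expression):
    """Return the sorted variable names used in a Python-syntax expression."""
    tree = ast.parse(expression, mode="eval")
    # Every bare identifier shows up as an ast.Name node; attribute names
    # such as GetP are stored in ast.Attribute.attr and are not collected.
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(names)


print(collect_names("track[1:10,x:y].GetP() / mass"))  # ['mass', 'track', 'x', 'y']
```

Note that a free function call such as `cos(x)` would also contribute `cos` as a name, so a real implementation would still have to separate functions from columns.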
@miranov25
Owner Author

Test code updated. First test passing:

In [10]:     rdf2 = makeDefine("arrayD","array1D0[1:10]-array1D2[:20:2]", rdf,3, True)
====================================
arrayD
 array1D0[1:10]-array1D2[:20:2]
====================================

Implementation:
 
auto arrayD(){
    RVec<double> result(9);
    for(size_t i=0; i<9; i++){
        result[i] = (array1D0[1+i*1]) - (array1D2[0+i*2]);
    }
    return result;
}
            
Dependencies
 ['array1D0', 'array1D2']
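
The loop above can be reproduced from the slice bounds alone. The sketch below is an assumption about how the size and the index expressions could be derived (taking the shorter of the two slices as the result size); the helper names are hypothetical and the thread does not show the actual generator.

```python
def slice_length(start, stop, step=1):
    """Number of elements selected by start:stop:step (positive step)."""
    return len(range(start, stop, step))


def index_expr(name, start, step, loop_var="i"):
    """C++ element access in the style of the generated code, e.g. array1D0[1+i*1]."""
    return f"{name}[{start}+{loop_var}*{step}]"


def elementwise_loop(lhs, rhs, op):
    """lhs and rhs are (name, start, stop, step); returns (size, loop body line)."""
    # Assumption: the result is as long as the shorter of the two slices.
    size = min(slice_length(*lhs[1:]), slice_length(*rhs[1:]))
    left = index_expr(lhs[0], lhs[1], lhs[3])
    right = index_expr(rhs[0], rhs[1], rhs[3])
    return size, f"result[i] = ({left}) {op} ({right});"


size, body = elementwise_loop(("array1D0", 1, 10, 1), ("array1D2", 0, 20, 2), "-")
print(size)  # 9
print(body)  # result[i] = (array1D0[1+i*1]) - (array1D2[0+i*2]);
```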

@miranov25
Owner Author

Failing tests:

    # rdf2 = makeDefine("arrayD","cos(array1D0[1:10])", rdf,3, True)               # TODO Failing - cos is found as an object, not a function
    # rdf2 = makeDefine("arrayD","array1DTrack[1:10].Px()", rdf,3, True)           # TODO Failing - member function not found


@miranov25
Owner Author

Range checks 1D:

  • Static range checks can be done at code-generation time when the ranges are static
  • A dynamic check is better added only on request, as it can slow down processing
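
A rough sketch of what a static check could look like when both the slice and the array size are known at generation time. The function name and the choice to raise `IndexError` are assumptions for illustration, not the behaviour of makeDefine.

```python
def check_static_range(start, stop, step, size):
    """Return the slice length, or raise IndexError if it reads outside [0, size)."""
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    count = len(range(start, stop, step))
    if count == 0:
        return 0
    last = start + (count - 1) * step  # highest index that would be read
    if start < 0 or last >= size:
        raise IndexError(
            f"slice {start}:{stop}:{step} reads indices {start}..{last}, "
            f"array size is {size}"
        )
    return count


print(check_static_range(1, 10, 1, size=10))  # 9
```

A dynamic variant would have to emit the same comparison into the generated C++ loop, which is where the slowdown mentioned above comes from.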

miranov25 added a commit that referenced this issue Jan 30, 2023
miranov25 added a commit that referenced this issue Jan 30, 2023
…al for discussion with the people responsible for RDataFrame)
miranov25 added a commit that referenced this issue Feb 8, 2023
…Define creates new columns, not just generating function
@miranov25
Owner Author

fd829c5

New test 1D and 2D + class methods

parsed= makeDefine("arrayD","array1D0[1:10]-array1D2[:20:2]", rdf,3, True) # working
rdf = makeDefineRDF("arrayD0", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameD0.root")
# test 1 - 1D delta auto range -OK
parsed= makeDefine("arrayDAll","array1D0[:]-array1D2[:]", rdf,3, True) # working
rdf = makeDefineRDF("arrayDAall", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameDAll.root")
# test 2 - - 1D function fix range -OK
parsed = makeDefine("arrayCos","cos(array1D0[1:10])", rdf,3, True);
rdf = makeDefineRDF("arrayCos0", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameCos0.root");
# test 3 - - 1D function full range -OK
parsed = makeDefine("arrayCosAll","cos(array1D0[:])", rdf,3, True);
rdf = makeDefineRDF("arrayCosAll", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameCosAll.root");
# test 4 - 1D member function OK
parsed = makeDefine("arrayPx","array1DTrack[1:10].Px()", rdf,3, True);
rdf = makeDefineRDF("arrayPx", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameArrayPx.root");
# test 5 - 2D delta auto range
parsed = makeDefine("arrayD2D","array2D0[1:10,:]-array2D1[1:10,:]", rdf,3, True);
rdf = makeDefineRDF("arrayD2D", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameArrayD2D.root");
# test 6 - 2D delta against 1D
parsed = makeDefine("arrayD1D2D","array2D0[:,:]-array1D0[:]", rdf,3, True);
rdf = makeDefineRDF("arrayD1D2D", parsed["name"], parsed, rdf, verbose=1)
rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameArrayD1D2D.root");
#
rdf = makeDefine("arrayD1D2DP","array2D0[:,:]+array1D0[:]", rdf,3, False);
rdf.Describe()
# Test of invariances

Test output:

--------                -----
Columns in total           14
Columns from defines       14
Event loops run             8
Processing slots            1

Column          Type                                            Origin
------          ----                                            ------
array1D0        ROOT::VecOps::RVec<float>                       Define
array1D2        ROOT::VecOps::RVec<float>                       Define
array1DTrack    ROOT::VecOps::RVec<TParticle>                   Define
array2D0        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
array2D1        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayCos0       ROOT::VecOps::RVec<double>                      Define
arrayCosAll     ROOT::VecOps::RVec<double>                      Define
arrayD0         ROOT::VecOps::RVec<float>                       Define
arrayD1D2D      ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayD2D        ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> >  Define
arrayDAall      ROOT::VecOps::RVec<float>                       Define
arrayPx         ROOT::VecOps::RVec<double>                      Define
nPoints         int                                             Define
nPoints2        int                                             Define

@miranov25
Owner Author

Todo:

  1. Add C++ interface
  2. Invariance test
  3. Protection against invalid inputs

@miranov25
Owner Author

miranov25 commented Feb 8, 2023

Failing test - diagnostic

  • The test failed at the C++ compilation stage: the expression used the wrong bracket syntax (array2D0[:][:] instead of array2D0[:,:])
  • What to do in case of unsupported syntax? For the moment, the C++ compiler is left to print the error
In [8]:     rdf=makeDefine("arrayD2D","array2D0[:][:]>0", rdf,3, True);
ROOT::VecOps::RVec<float>  ROOT::VecOps::RVec<float> 
float ('f', 32)
====================================
arrayD2D
 array2D0[:][:]>0
====================================

Implementation:
 ROOT::VecOps::RVec<char> arrayD2D(ROOT::VecOps::RVec<ROOT::VecOps::RVec<float> > &array2D0){
    ROOT::VecOps::RVec<char> result(array2D0[0+i*1].size() - 0);
    for(size_t i=0; i<array2D0[0+i*1].size() - 0; i++){
        result[i] = ((array2D0[0+i*1][0+i*1]) > (0));
    }
    
    return result;
} 
Dependencies
 ['array2D0']

@miranov25
Owner Author

Test 7 fails if no dictionary has been generated for the 2D array of booleans:

  • Should we generate the interface automatically?
    #pragma link C++ class ROOT::RVec < ROOT::RVec < char>> + ;
    # test 7   - 2D boolean test
    parsed=makeDefine("arrayD20Bool","array2D0[:,:]>0", rdf,3, True);
    rdf = makeDefineRDF("arrayD20Bool", parsed["name"], parsed,  rdf, verbose=1)
    rdf.Snapshot("makeTestRDataFrame","makeTestRDataFrameArrayD1D2D.root");

ehellbar pushed a commit to ehellbar/RootInteractive that referenced this issue Feb 21, 2023
miranov25 pushed a commit that referenced this issue Aug 1, 2023
…ted class

* bug fix in the retrieving of class method
* make a dictionary for templated class
@miranov25
Owner Author

Problem for templated classes

After the fix for access to class methods in the commit above, some problems are still observed:

In [9]: getClassMethod("o2::tpc::TrackTPC","getAlpha")
Out[9]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')

==>

File ~/github/RootInteractive/RootInteractive/Tools/RDataFrame/RDataFrame_Array.py:74, in scalar_type_str(dtype)
     63 def scalar_type_str(dtype):
     64     dtypes = {
     65         ('f', 32): "float",
     66         ('f', 64): "double",
   (...)
     72         ('i', 8): "char"
     73     }
---> 74     return dtypes[dtype]

KeyError: 'float o2::track::TrackParametrization<float>::getAlpha()'
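
The KeyError comes from passing the full signature string where a dtype key is expected. One way around it, sketched here under the assumption that only the return type is needed, is to cut the return type out of the signature first. `return_type` and `CPP_TO_DTYPE` are hypothetical names; the table only inverts the three entries visible in the traceback.

```python
# Inverse of the three dtype entries visible in the traceback above.
CPP_TO_DTYPE = {"float": ("f", 32), "double": ("f", 64), "char": ("i", 8)}


def return_type(signature):
    """Extract the return type from 'ret scope::name(args)'."""
    head = signature.split("(", 1)[0].strip()  # text before the argument list
    depth = 0
    # Scan from the right for the last space outside any <...> brackets.
    for pos in range(len(head) - 1, -1, -1):
        char = head[pos]
        if char == ">":
            depth += 1
        elif char == "<":
            depth -= 1
        elif char == " " and depth == 0:
            return head[:pos].strip()
    return ""  # no return type found, e.g. for a constructor


sig = "float o2::track::TrackParametrization<float>::getAlpha()"
print(return_type(sig))                # float
print(CPP_TO_DTYPE[return_type(sig)])  # ('f', 32)
```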

@miranov25
Owner Author

Problem finding out whether a method exists

In [24]: getClassMethod("""o2.track.TrackParametrization""","getAlpha")
Non supported o2.track.TrackParametrization.getAlpha
Out[24]: ('', '')

In [25]: getClassMethod("""o2.track.TrackParametrization<float>""","getAlpha")
Non supported o2.track.TrackParametrization<float>.getAlpha
Out[25]: ('', '')
In [28]: getClassMethod("""o2.track.TrackParametrization("float")""","getAlpha")
Out[28]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')

@miranov25
Owner Author

miranov25 commented Aug 1, 2023

Template arguments have to be replaced: <xxx> -> ("xxx")

In [29]: ROOT.o2.track.TrackParametrization("float").getAlpha
Out[29]: <cppyy.CPPOverload at 0x7f935058d880>

after patch:

    className2=className.replace("::",".")
    className2 =className2.replace("<", '("')
    className2 =className2.replace(">", '")')
In [2]: getClassMethod("""o2::track::TrackParametrization<float>""","getAlpha")
Out[2]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')
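
The three replacements from the patch, wrapped in a small function so the behaviour can be checked without ROOT. The function name is hypothetical; the body is the patch as quoted above.

```python
def cpp_to_python_name(class_name):
    """Turn a C++ class name into the Python-side lookup string used in the patch."""
    class_name2 = class_name.replace("::", ".")
    class_name2 = class_name2.replace("<", '("')
    class_name2 = class_name2.replace(">", '")')
    return class_name2


print(cpp_to_python_name("o2::track::TrackParametrization<float>"))
# o2.track.TrackParametrization("float")

print(cpp_to_python_name("ROOT::RVec<ROOT::RVec<float>>"))
# ROOT.RVec("ROOT.RVec("float")")
```

The second output shows a limit of plain string replacement: for nested templates the quotes no longer pair up, so such names would need a real parser or a different lookup.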

miranov25 pushed a commit that referenced this issue Aug 1, 2023
* adding failing test
* still failing in other place in AST most probably
miranov25 pushed a commit that referenced this issue Aug 2, 2023
* adding failing test
* test fails because a tuple is returned when there is more than one function; the proper function with the proper argument list has to be chosen
@miranov25
Owner Author

getClassMethod is not finished yet; arguments are ignored for the moment.

The code crashes if more than one function is returned.
Example:

  • example 1
In [11]: getClassMethod("""o2::track::TrackParametrization<float>""","getAlpha")
Out[11]:
('float o2::track::TrackParametrization<float>::getAlpha()',
 'float o2::track::TrackParametrization<float>::getAlpha()')
  • example 2
In [16]: getClassMethod("""o2::track::TrackParametrization<float>""","getYAt")
Out[16]:
('float o2::track::TrackParametrization<float>::getYAt(float xk, float b)',
 'float o2::track::TrackParametrization<float>::getYAt(float xk, float b)')
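
A possible way to pick one function out of the returned tuple is to match on the number of arguments. This is only a sketch: `select_overload` and `split_arguments` are hypothetical helpers, and splitting on commas would break for argument types that themselves contain commas (e.g. `std::map<int, float>`).

```python
def split_arguments(signature):
    """Return the argument declarations between the outermost parentheses."""
    inner = signature[signature.index("(") + 1 : signature.rindex(")")].strip()
    return [arg.strip() for arg in inner.split(",")] if inner else []


def select_overload(signatures, n_args):
    """Pick the single signature taking n_args arguments; raise otherwise."""
    unique = list(dict.fromkeys(signatures))  # drop repeated entries, keep order
    matches = [sig for sig in unique if len(split_arguments(sig)) == n_args]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates take {n_args} argument(s)")
    return matches[0]


found = (
    "float o2::track::TrackParametrization<float>::getYAt(float xk, float b)",
    "float o2::track::TrackParametrization<float>::getYAt(float xk, float b)",
)
print(split_arguments(found[0]))  # ['float xk', 'float b']
print(select_overload(found, 2))
```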

miranov25 pushed a commit that referenced this issue Aug 2, 2023
* fixing doc string parsing
* argument parsing not yet
@miranov25
Owner Author

Remaining problems in the ROOT dictionary

  • Unsupported properties - a new function is to be added - getClassProperty
  • property value

miranov25 pushed a commit that referenced this issue Aug 2, 2023
* more verbose output
* argument parsing not yet
miranov25 pushed a commit that referenced this issue Aug 2, 2023
* for the moment only for public properties
* Example
In [11]: getClassProperty("TParticle","fPdgCode")
Out[11]: ('int', 40)
miranov25 pushed a commit that referenced this issue Aug 2, 2023
* adding property test which is FAILING
* new function getClassProperty to be used to solve
miranov25 pushed a commit that referenced this issue Aug 2, 2023
miranov25 pushed a commit that referenced this issue Aug 5, 2023
miranov25 pushed a commit that referenced this issue Aug 5, 2023
…Data')

* adding new test which currently fails
miranov25 pushed a commit that referenced this issue Aug 5, 2023
…of the ROOT.gROOT.FindSTLClass

* avoiding seg.fault in the TClass destructor
@miranov25
Owner Author

Automatic template function generation - error handling to be defined

If a function has already been generated, generating it a second time fails because the function is already in scope.

What should the error handling be?
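
One option, sketched below, is to remember what has already been declared: an identical re-definition becomes a no-op, and a conflicting one is reported in Python before the C++ compiler sees it. The class and its policy are assumptions for discussion; `declare` stands for whatever actually compiles the code (in a ROOT session this might be `ROOT.gInterpreter.Declare`).

```python
class GeneratedFunctionRegistry:
    """Keep track of generated C++ functions so each one is declared only once."""

    def __init__(self, declare):
        self._declare = declare  # callable that compiles/declares a code string
        self._bodies = {}        # function name -> code that was declared

    def define(self, name, code):
        """Declare code under name; return True if it was newly declared."""
        previous = self._bodies.get(name)
        if previous is None:
            self._declare(code)
            self._bodies[name] = code
            return True
        if previous == code:
            return False  # same function requested again: nothing to do
        raise ValueError(f"{name} is already defined with a different body")


declared = []  # stand-in for the interpreter: just record what was declared
registry = GeneratedFunctionRegistry(declared.append)
print(registry.define("arrayD", "auto arrayD(){ return 1; }"))  # True
print(registry.define("arrayD", "auto arrayD(){ return 1; }"))  # False
print(len(declared))                                             # 1
```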

miranov25 pushed a commit that referenced this issue Aug 15, 2023
…the RDataFrame documentation

* remaining problem with ROOT::EnableImplicitMT in C++->Python C++
miranov25 pushed a commit that referenced this issue Aug 15, 2023
…the RDataFrame documentation

* remaining problem with ROOT::EnableImplicitMT
* adding test with nCores
* test is failing for the moment
miranov25 pushed a commit that referenced this issue Aug 15, 2023
* problem with  ROOT::EnableImplicitMT  solved
* test with nCores working
miranov25 pushed a commit that referenced this issue Aug 16, 2023
…ous commit

* test with nCores not working stably
* test with 1, 2  cores working
* 4 cores sometimes failing
@miranov25
Owner Author

Problem with ROOT.EnableImplicitMT(nCores) in ROOT + automatic C++ code generation

The problem appears to be random:

  • test with nCores not working stably
  • test with 1, 2 cores working
  • 4 cores sometimes failing

To simplify debugging and to make the code faster, it would be preferable to have an option to save the generated code as C++ and build a shared library from it.

@miranov25
Owner Author

Using a precompiled C++ macro, the problem with EnableImplicitMT(nCores) disappeared

  • However, the libraries have to be loaded in order
  • First load the libraries, and only afterwards load the generated code

@miranov25
Owner Author

Problem in the function generation to be checked

  • looks like function parsing is not safe
  • some functions from namespaces are not found
  • this is working
           R"(dfRI=makeDefine('normChi2TPC', 'sqrt(tracksExtra[:].fTPCChi2NCl)', dfRI,0x3,0x2,rdfLib))""\n"
  • this is crashing with seg.fault
           R"(dfRI=makeDefine('normChi2TPC', 'TMath:sqrt(tracksExtra[:].fTPCChi2NCl)', dfRI,0x3,0x2,rdfLib))""\n"

@miranov25
Owner Author

To parse the function, the Python func_doc can be used

This is already done in a similar way for classes.

For the example above:

In [12]: ROOT.sqrt.func_doc
Out[12]: 'long double ::sqrt(long double __x)\nfloat ::sqrt(float __x)\ndouble ::sqrt(double __x)'

In [13]: ROOT.TMath.Sqrt.func_doc
Out[13]: 'double TMath::Sqrt(double x)'
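
A sketch of turning such a func_doc string into a lookup table from argument types to return type, one entry per overload line. The function name is hypothetical, and the parsing assumes every argument is written as "type name" and that no type contains a comma.

```python
def parse_func_doc(func_doc):
    """Map argument-type tuples to return types, one entry per overload line."""
    overloads = {}
    for line in func_doc.splitlines():
        line = line.strip()
        if not line:
            continue
        head, _, rest = line.partition("(")
        args = rest.rsplit(")", 1)[0]
        # The return type is everything before the last space of the head.
        ret = head.strip().rpartition(" ")[0]
        # Drop the argument name, keep the type in front of it.
        arg_types = tuple(
            arg.strip().rpartition(" ")[0] for arg in args.split(",") if arg.strip()
        )
        overloads[arg_types] = ret
    return overloads


doc = "long double ::sqrt(long double __x)\nfloat ::sqrt(float __x)\ndouble ::sqrt(double __x)"
table = parse_func_doc(doc)
print(table[("float",)])   # float
print(table[("double",)])  # double
print(parse_func_doc("double TMath::Sqrt(double x)"))  # {('double',): 'double'}
```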

@miranov25
Owner Author

miranov25 commented Aug 18, 2023

C++ namespace function support, e.g. TMath::<>

  • adding support for the parsing - getGlobalFunction - DONE
  • but the code does not enter there ... in the last parsing step
    parsed = makeDefine("array2D0_cos0", "cos(array2D0[0:0,:])", rdf, None, 3); # this is working
    parsed = makeDefine("array2D0_cos1", "TMath::Cos(array2D0[0:0,:])", rdf, None, 3); # this is failing
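
One thing worth keeping in mind here: `::` is not valid Python, so Python's `ast.parse` raises a `SyntaxError` on `TMath::Cos(...)` unless the expression is rewritten first. The sketch below shows one possible preprocessing step (replace `::` by `.` before parsing, then rebuild the C++ name from the attribute chain). Whether makeDefine does this is not stated in the thread; both helper names are hypothetical.

```python
import ast


def parse_cpp_style(expression):
    """Parse an expression that may contain C++ scope operators."""
    return ast.parse(expression.replace("::", "."), mode="eval")


def dotted_name(node):
    """Rebuild 'TMath::Cos' from an attribute chain; None if it is not a plain name chain."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None  # e.g. a method call on a subscripted column
    parts.append(node.id)
    return "::".join(reversed(parts))


call = parse_cpp_style("TMath::Cos(array2D0[0:0,:])").body
print(dotted_name(call.func))  # TMath::Cos

call = parse_cpp_style("array1DTrack[1:10].Px()").body
print(dotted_name(call.func))  # None
```

A plain `track.Px()` would come back as `track::Px`, so the caller still has to check whether the first component is a column or a namespace.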

@miranov25
Owner Author

miranov25 commented Aug 18, 2023

AST Support for the slice with dimensionality reduction

  • for the RVec, std::vector
  • for the std::tuple (later) - wishlist
     rdf = makeDefine("array2D0_0", "array2D0[0:0,:]", rdf, None, 3);    # this works as intended
    #rdf = makeDefine("array2D0_0", "array2D0[0]", rdf, None, 3);       # should return 1D RVec at position 0, now it is failing

miranov25 pushed a commit that referenced this issue Aug 18, 2023
* not yet working
* adding test which are failing for a moment
miranov25 pushed a commit that referenced this issue Aug 18, 2023
* getGlobalFunction returns the C++ name of the function, which differs from the Python name in case of the ::
* not yet working
* adding test which are failing for a moment
miranov25 pushed a commit that referenced this issue Aug 21, 2023
miranov25 pushed a commit that referenced this issue Aug 22, 2023