It would be nice to see a more carefully thought out and straightforward way to ask for data to come back from ServiceX. In short, normalize the access patterns for ServiceX. The current interface has grown organically, and there are now so many operations that it is hard to surface them from one place to the other. Time to take a step back, perhaps.
AsAwkwardArray can be replaced by a number of different things:
AsPandasDF, as_pandas - a pandas DataFrame (this does not support nested objects!)
AsROOTTTree, as_ROOT_tree - a list of file(s), each containing a ROOT TTree object
AsParquetFiles, as_parquet - a list of parquet file(s) built with awkward's to_parquet method
AsAwkwardArray, as_awkward - an awkward array of all the data
These methods do not return the actual data - just the request to generate the data. The value() call at the end actually triggers the infrastructure to generate the data. There is another version of the method called value_async() that does the same thing, but allows you to easily queue up many requests at once.
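The deferred pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not the func_adl or ServiceX implementation: `DeferredRequest`, `_fetch`, and `fetch_many` are hypothetical names, and the backend is simulated with `asyncio.sleep`. Only `value()` and `value_async()` are names taken from the text.

```python
import asyncio


class DeferredRequest:
    """Toy stand-in for a query ending in AsAwkwardArray() or similar.

    Hypothetical class: constructing it does no work. Only value() or
    value_async() triggers the (simulated) backend.
    """

    def __init__(self, name: str):
        self.name = name
        self.triggered = False

    async def _fetch(self) -> str:
        # Simulated backend call that "generates" the data.
        self.triggered = True
        await asyncio.sleep(0.01)
        return f"data for {self.name}"

    async def value_async(self) -> str:
        return await self._fetch()

    def value(self) -> str:
        # Blocking variant: runs the async variant to completion.
        # asyncio.run() cannot be used from inside a running event loop.
        return asyncio.run(self.value_async())


async def fetch_many(names):
    # Queue up all requests at once and wait for them together.
    requests = [DeferredRequest(n) for n in names]
    return await asyncio.gather(*(r.value_async() for r in requests))


single = DeferredRequest("jets")
was_triggered_before = single.triggered  # False: nothing has run yet
result = single.value()
many = asyncio.run(fetch_many(["jets", "electrons", "muons"]))
```

The point of the sketch is only that building the request and triggering it are separate steps, and that the async variant lets several requests be in flight at the same time.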
There are at least two axes here:
What data format should come back from the ServiceX query?
Should the programming interface be synchronous or asynchronous?
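One way to picture these two axes as independent choices is sketched below. Everything here is hypothetical and for illustration only: `Format`, `get_data`, and `get_data_async` are not part of func_adl or the servicex library, and the request itself is replaced by a placeholder.

```python
import asyncio
from enum import Enum


class Format(Enum):
    # Labels follow the formats listed earlier; the enum is hypothetical.
    AWKWARD = "awkward"
    PANDAS = "pandas"
    ROOT = "root"
    PARQUET = "parquet"


async def get_data_async(query: str, fmt: Format) -> dict:
    """Hypothetical entry point: the format is an argument, not a method name."""
    await asyncio.sleep(0)  # placeholder for the real request
    return {"query": query, "format": fmt.value}


def get_data(query: str, fmt: Format) -> dict:
    """Synchronous twin of get_data_async, taking the same arguments."""
    return asyncio.run(get_data_async(query, fmt))


sync_result = get_data("jet_pt", Format.PANDAS)
async_result = asyncio.run(get_data_async("jet_pt", Format.PARQUET))
```

In this sketch the format axis is carried by one parameter and the sync/async axis by a pair of functions with identical signatures, so adding a format does not multiply the number of methods.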
What we have now
There is yet another axis for the root and parquet queries: do you want the files downloaded locally into a cache, or just a URI to access them over the web? This is only accessible via direct calls to the servicex library (e.g. see get_root_files_async, get_root_files_stream, get_data_rootfiles_uri_stream, and get_data_rootfiles_uri_async).
What do users of func_adl want?
Let's look at each one and reason about why different choices are made.
awkward
distributed processing?
root and parquet) or local copied files?
Starting from Scratch