Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateTime64(ns) KeyError, when ns is set in table definition #28

Open
Lytmbot opened this issue May 8, 2020 · 4 comments
Open

DateTime64(ns) KeyError, when ns is set in table definition #28

Lytmbot opened this issue May 8, 2020 · 4 comments

Comments

@Lytmbot
Copy link

Lytmbot commented May 8, 2020

Hey, thanks for your great work, this module has been really helpful!

using clickhouse-server version: 20.3.8.53

I have a small problem with DateTime64(ns) where the size of ns has been explicitly set in the table definition.

With a clickhouse table as follows:

CREATE TABLE db_name.tbl_name (
   Timestamp DateTime64(6) CODEC(Delta(8), LZ4),
   SomeData Float32 CODEC(LZ4)
) 
ENGINE = MergeTree() PARTITION BY toYYYYMMDD(Timestamp)
ORDER BY intHash64(toInt64(Timestamp)) 
SAMPLE BY intHash64(toInt64(Timestamp))

A query:

query = "SELECT DISTINCT * FROM db_name.tbl_name"
connection = {
    'host': server_name,
    'database': db_name,
    'user': 'default'
}    
pandahouse.read_clickhouse(query=query, index=False, connection=connection)

Results in:

pandahouse/core.py in read_clickhouse(query, tables, index, connection, **kwargs)
     56     lines = execute(query, external=external, stream=True,
     57                     connection=connection)
---> 58     return to_dataframe(lines, **kwargs)
     59 
     60 

pandahouse/convert.py in to_dataframe(lines, **kwargs)
     65     dtypes, parse_dates, converters = {}, [], {}
     66     for name, chtype in zip(names, types):
---> 67         dtype = CH2PD[chtype]
     68         if dtype == 'object':
     69             converters[name] = decode_escapes

KeyError: 'DateTime64(6)

If I understand correctly, the mapping defined earlier in the file convert.py

MAPPING = {'object': 'String',
           'uint64': 'UInt64',
           'uint32': 'UInt32',
           'uint16': 'UInt16',
           'uint8': 'UInt8',
           'float64': 'Float64',
           'float32': 'Float32',
           'int64': 'Int64',
           'int32': 'Int32',
           'int16': 'Int16',
           'int8': 'Int8',
           'datetime64[D]': 'Date',
           'datetime64[ns]': 'DateTime'}

does not cover the DateTime64(6) case. Or by extension any other DateTime(ns) case?

I would be happy to contribute a solution with a little guidance.

Thanks

@Lytmbot
Copy link
Author

Lytmbot commented May 8, 2020

not sure if this is acceptable, but a solution could be as simple as:

def to_dataframe(lines, **kwargs):
    names = lines.readline().decode('utf-8').strip().split('\t')
    types = lines.readline().decode('utf-8').strip().split('\t')

    dtypes, parse_dates, converters = {}, [], {}
    for name, chtype in zip(names, types):

        if chtype.startswith('DateTime64'):
            precs = int(chtype.replace('DateTime64(', '').replace(')', ''))
            chtype = 'DateTime'

        dtype = CH2PD[chtype]
        if dtype == 'object':
            converters[name] = decode_escapes
        elif dtype.startswith('datetime'):
            parse_dates.append(name)
        else:
            dtypes[name] = dtype

    return pd.read_table(lines, header=None, names=names, dtype=dtypes,
                         parse_dates=parse_dates, converters=converters,
                         na_values=set(), keep_default_na=False, **kwargs)

@kszucs
Copy link
Owner

kszucs commented May 12, 2020

Thanks for the bug report and the patch.

I'm trying to allocate some time to create a new release in the upcoming week.

@kdkavanagh
Copy link

Hi @kszucs have you had a chance to evaluate this? This issue is preventing us from meaningfully using pandahouse unfortunately. If @Lytmbot's proposed soln is agreeable, I'd be happy to submit a PR provided you can cut another release.

@kszucs
Copy link
Owner

kszucs commented Aug 18, 2020

Could you please submit a PR including unittests?

My mid-term plan is to use the newly added arrow and parquet clickhouse output formats, but their type support is incomplete so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants