How can I create a pandas dataframe without using Airflow? #12914

Closed
kebab-mai-haddi opened this issue Jun 7, 2019 · 8 comments
Comments

@kebab-mai-haddi

No description provided.

@mbasmanova
Contributor

@avisrivastava254084 Aviral, could you provide some context for this question? What are you trying to do?

@kebab-mai-haddi
Author

@mbasmanova, hi!
I want to create a pandas dataframe from a Hive table using Presto.
Currently, I am doing it this way:

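def get_pandas_dataframe(self, hql, parameters=None):
    import numpy as np
    import pandas
    if not self.airflow_conn_reqd:
        # Plain Python client path: run the query and build the frame by hand.
        cursor = self.presto_client.cursor()
        try:
            cursor.execute(self._strip_sql(hql), parameters)
            data = cursor.fetchall()
        except DatabaseError as e:
            raise PrestoException(self._get_pretty_exception_message(e))
        column_descriptions = cursor.description
        if data:
            df = pandas.DataFrame(data)
            # The first item of each description entry is the column name.
            df.columns = [c[0] for c in column_descriptions]
        else:
            df = pandas.DataFrame()
    else:
        # Airflow path: delegate to the hook's own get_pandas_df
        # (the original passed an undefined name, `query`, here).
        df = self.get_pandas_df(hql, parameters)
    # Treat empty strings as missing values.
    return df.replace('', np.nan)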

Are you able to understand it?

Note: Creating a pandas dataframe is provided in the Airflow hooks for Presto but not in the normal Python client.

Can I create a PR and get this merged?

@mbasmanova
Contributor

@avisrivastava254084 Aviral, are you suggesting that we add an Airflow hook for Presto (https://airflow.apache.org/concepts.html#hooks)? Sure, let's try that. Go ahead and create a PR.

@mbasmanova
Contributor

@avisrivastava254084

Note: Creating a pandas dataframe is provided in Airflow hooks of Presto

You are referring to this piece of code, right?

https://github.com/apache/airflow/blob/7d904467d6523f80b8a441093bb71f6e77dcdd68/airflow/hooks/presto_hook.py#L99

Why can't you simply copy-paste the relevant parts into your Python program?
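
For reference, a minimal sketch of what that copy-paste approach could look like outside Airflow. It assumes only a DB-API style cursor (with `execute`, `fetchall`, and `description`), which is the interface the code earlier in this thread relies on. The function name `cursor_to_dataframe` is hypothetical, and `sqlite3` stands in for a Presto connection so the snippet runs without a cluster; it is not taken from the Airflow hook or the Presto Python client.

```python
import sqlite3

import numpy as np
import pandas as pd


def cursor_to_dataframe(cursor, sql, parameters=None):
    """Run `sql` on a DB-API cursor and return the rows as a DataFrame.

    Hypothetical helper for illustration; not part of any Presto client.
    """
    if parameters is None:
        cursor.execute(sql)
    else:
        cursor.execute(sql, parameters)
    rows = cursor.fetchall()
    # Each entry of cursor.description describes one column; item 0 is its name.
    columns = [col[0] for col in cursor.description]
    # Passing the column names up front keeps the header even for empty results.
    df = pd.DataFrame(rows, columns=columns)
    # Treat empty strings as missing values, as in the snippet above.
    return df.replace("", np.nan)


# Demo: sqlite3 is only a stand-in for a real Presto connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "")])
df = cursor_to_dataframe(conn.cursor(), "SELECT id, name FROM t ORDER BY id")
print(df)
```

With a real Presto client, the same helper should work as long as its cursor follows the DB-API conventions assumed here; the parameter placeholder style would depend on the driver.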

@mbasmanova
Contributor

@avisrivastava254084 Is this related to prestodb/presto-python-client#83 ?

@mbasmanova
Contributor

CC: @ggreg

@ggreg

ggreg commented Jun 25, 2019

Following up on prestodb/presto-python-client#83

@mbasmanova
Contributor

@ggreg Thank you, Greg.

@avisrivastava254084 Aviral, I'm closing this issue in favor of prestodb/presto-python-client#83
