No JAVA_HOME at run time makes Pydoop very slow #344

Open
simleo opened this issue Feb 22, 2019 · 0 comments


simleo commented Feb 22, 2019

#338 added JAVA_HOME auto-detection. That's convenient, especially at compile time, since it makes installation easier. It also allows Pydoop to work with no JAVA_HOME set at run time, which is convenient too, but it turns out that things can be much slower in that case. Running the entire unit test suite (minus the avro tests) with no JAVA_HOME is almost 5 times slower. HADOOP_HOME also has an effect, though not nearly as big: a quick comparison on my laptop gave 344s with both unset, 75s with JAVA_HOME set and 70s with both set.
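For reference, here is a small sketch that turns the three laptop timings quoted above into slowdown factors relative to the fully configured run. The dictionary labels are made up for illustration, and the numbers come from a single informal comparison, so treat the ratios as rough.

```python
# Timings (in seconds) quoted in this issue; labels are illustrative only.
timings_s = {
    "both unset": 344,
    "JAVA_HOME set": 75,
    "both set": 70,
}

# Use the run with both variables set as the baseline.
baseline = timings_s["both set"]

# Slowdown factor of each configuration relative to the baseline.
slowdowns = {label: secs / baseline for label, secs in timings_s.items()}

for label, factor in slowdowns.items():
    print(f"{label}: {factor:.2f}x")
```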

Reviewing our caching of these variables (or lack thereof) might help, although not when one is running several Python processes that use Pydoop, since auto-detection needs to be performed at least once in each of them. We do need to document this properly though, so that users can make sure they have the most efficient run time setup.
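To illustrate the caching idea, here is a minimal sketch of per-process memoization. It is not Pydoop's actual code: `slow_detect_java_home` and `get_java_home` are hypothetical names, and the returned path is a placeholder standing in for whatever the real auto-detection would find.

```python
import functools
import os


def slow_detect_java_home():
    """Hypothetical stand-in for an expensive auto-detection step."""
    # Placeholder value, not the result of a real lookup.
    return "/usr/lib/jvm/default-java"


@functools.lru_cache(maxsize=None)
def get_java_home():
    """Prefer the environment; otherwise detect once and memoize."""
    # The cache lives in this process only, so every new Python
    # process still pays for detection on its first call.
    return os.environ.get("JAVA_HOME") or slow_detect_java_home()
```

Repeated calls within one process hit the cache, but a memoized value also goes stale if JAVA_HOME changes after the first call. Exporting JAVA_HOME (and HADOOP_HOME) before starting Python avoids the detection path altogether.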
