
make scipy and matplotlib required dependencies (#1159) #1404

Merged
merged 4 commits into develop from issue-1159-analysis-deps on Jun 22, 2017

Conversation

orbeckst
Member

Fixes #1159 and #1361

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • n/a Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@orbeckst orbeckst added this to the 0.16.x milestone Jun 16, 2017
@orbeckst orbeckst changed the title Issue 1159 analysis deps make scipy and matplotlib required dependencies (#1159) Jun 16, 2017
.travis.yml Outdated
- PIP_DEPENDENCIES='griddataformats'
- CONDA_DEPENDENCIES="mmtf-python nose=1.3.7 mock six biopython networkx cython joblib nose-timer matplotlib scipy griddataformats"
- CONDA_ALL_DEPENDENCIES="mmtf-python nose=1.3.7 mock six biopython networkx cython joblib nose-timer matplotlib netcdf4 scikit-learn scipy griddataformats seaborn coveralls clustalw=2.1"
- PIP_DEPENDENCIES=""
Member

you can remove this line completely.

Member

@kain88-de kain88-de left a comment

Looks good besides some minor comments

@@ -42,6 +42,7 @@
'contact_matrix', 'dist', 'between']

import numpy as np
import scipy.sparse
Member

Why not `from scipy import sparse`?

Member Author

I like the full imports much better if we only use a few things. Much easier to grep, and really not much more to write. Why not be explicit about where you get functions from? It also keeps the namespace tidier.

(Sidenote: I particularly dislike function imports – I think they are really bad because you have no idea where they come from when reading code.)
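For illustration, a minimal sketch of the two import styles under discussion (the sparse matrix class used here is just an arbitrary example):

```python
# Fully qualified module import, the style argued for in this thread:
# the call site shows where the name comes from, and `git grep scipy.sparse`
# finds every use.
import scipy.sparse

adjacency = scipy.sparse.lil_matrix((5, 5))
adjacency[0, 1] = 1.0

# Function import: shorter at the call site, but the origin of lil_matrix
# is only visible at the top of the file.
from scipy.sparse import lil_matrix

adjacency = lil_matrix((5, 5))
adjacency[0, 1] = 1.0
```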

del msg

import numpy as np
import scipy.stats
Member

Why not `from scipy.stats import gaussian_kde`?

Member Author

Because that is evil.

Honestly, function imports at the module level are bad. See comments above. See PEP something or other IIRC...
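For illustration, a small self-contained sketch of the fully qualified form being argued for here (the random data is only a placeholder):

```python
import numpy as np
import scipy.stats

# The module prefix makes it obvious at the call site that the KDE comes
# from scipy.stats rather than from a local helper.
samples = np.random.normal(size=1000)
kde = scipy.stats.gaussian_kde(samples)
density = kde.evaluate(np.linspace(-3, 3, 50))
```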

Member Author

PS: Please don't.

@@ -155,14 +155,16 @@
from __future__ import division, absolute_import
from six.moves import zip
import numpy as np
import scipy.optimize
Member

Again, why not import the one function we need?

Member Author

See above. Don't.

@@ -258,6 +257,10 @@
import logging
from itertools import cycle

import numpy as np
import matplotlib
Member

A common idiom is `import matplotlib as mpl`.

Member Author

Yes, but we don't use it a lot, and it's clearer to read and to grep for: if I am looking for matplotlib I use `git grep matplotlib`.

:func:`scipy.spatial.distance.directed_hausdorff` is an optimized
implementation of the early break algorithm of [Taha2015]_; note that one
still has to calculate the *symmetric* Hausdorff distance as
`max(directed_hausdorff(P, Q)[0], directed_hausdorff(Q, P)[0])`.
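For reference, a minimal sketch of the symmetric Hausdorff computation described in this note (assumes scipy >= 0.19.0; the random coordinate arrays are placeholders):

```python
import numpy as np
import scipy.spatial.distance

P = np.random.rand(100, 3)  # one point set, e.g. a path of coordinates
Q = np.random.rand(120, 3)  # another point set

# directed_hausdorff returns (distance, index_in_first, index_in_second);
# the symmetric Hausdorff distance is the maximum over both directions.
d_PQ = scipy.spatial.distance.directed_hausdorff(P, Q)[0]
d_QP = scipy.spatial.distance.directed_hausdorff(Q, P)[0]
d_symmetric = max(d_PQ, d_QP)
```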
Member

Are both still rendered under Notes in the Sphinx output?

# all standard requirements are available through PyPi and
# typically can be installed without difficulties through setuptools
setup_requires=[
'numpy>=1.9.3',
'numpy>=1.10.4',
Member

Did you add this bump to the CHANGELOG? I know we did it a while ago and forgot to change setup.py.

Member

Yes, we did add it to the CHANGELOG already.

@@ -128,7 +128,7 @@ def test_triangular_matrix():

multiplied_triangular_matrix_2 = triangular_matrix_2 * scalar
assert_equal(multiplied_triangular_matrix_2[0,1], expected_value * scalar,
err_msg="Error in TriangularMatrix: multiplication by scalar gave\
err_msg="Error in TriangularMatrix: multiplication by scalar gave\
Member

Please remove the trailing \ and align with the opening bracket in the previous line.
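A self-contained sketch of the suggested layout (values and message text here are placeholders, not the ones from the actual test):

```python
from numpy.testing import assert_equal

expected_value = 4
scalar = 2

# Adjacent string literals are concatenated by Python, so no trailing
# backslash is needed; the continuation is aligned with the opening bracket.
assert_equal(expected_value * scalar, 8,
             err_msg="Error in TriangularMatrix: multiplication by scalar "
                     "gave an unexpected result")
```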

@kain88-de kain88-de dismissed their stale review June 16, 2017 20:32

I made the changes myself.

@kain88-de
Member

I pushed more commits to the branch. I don't know why they don't show up in the PR.

Member Author

@orbeckst orbeckst left a comment

I really dislike importing functions from external modules because it makes it hard to understand where they come from (and they pollute the name space, but that's minor... until you define similar sounding things yourself).

@@ -42,7 +42,7 @@
'contact_matrix', 'dist', 'between']

import numpy as np
import scipy.sparse
from scipy import sparse
Member Author

That's ok, I can live with it.

If I take only a few things I like fully qualified imports, but this one is fine if you like it better.

@@ -176,7 +176,7 @@
import logging

import numpy as np
import scipy.stats
from scipy.stats import gaussian_kde
Member Author

Please don't. This loses all the context, and when browsing code it is not at all clear where this function comes from. Is it defined locally? Is it imported from elsewhere?

I really hate it when used for things not in the standard library.

Please revert.

@@ -155,7 +155,7 @@
from __future__ import division, absolute_import
from six.moves import zip
import numpy as np
import scipy.optimize
from scipy.optimize import leastsq
Member Author

Please don't do this. See comment above.

@@ -106,7 +106,7 @@
import warnings

import numpy as np
import scipy.integrate
from scipy.integrate import simps
Member Author

Please don't. See above.

@@ -36,7 +36,7 @@
from six.moves import range

import numpy as np
import scipy.optimize
from scipy.optimize import curve_fit
Member Author

Please don't. See above.

@kain88-de kain88-de force-pushed the issue-1159-analysis-deps branch 2 times, most recently from a40dbf8 to 259cdbd Compare June 17, 2017 18:38
@orbeckst
Member Author

@kain88-de I do appreciate that you humoured my personal pet peeve with the imports. Apologies if my comments were not quite as civil as they could have been.

Thanks for working with me on these boring last minute PRs!

@orbeckst
Member Author

The Travis failure is in the full build because it exceeded 50 minutes (#1394 ...).

@orbeckst
Member Author

I will clean up the history.

@orbeckst
Member Author

Compacted history and force-pushed.

Should be ready for review and merge.

@kain88-de
Member

Can we retarget this to 0.17.0? This consistently takes longer than 50 minutes to run our full test suite. I don't understand how this change suddenly added more than 10 minutes to our overall runtime.

@orbeckst
Member Author

It looks to me like the overall +10 minutes comes from the fact that "minimal" is not so minimal anymore – having scipy enables many more tests. Minimal ran for 41 min and full timed out at 50 min, whereas in another build full ran for 48:44 min and minimal took 25:33 min.

Running the setup (astropy helpers) took 441.98s for the timed-out full, 328.76s for the minimal with scipy, and 294.30s for the other full that passed, but only 137.51s for the other minimal without scipy.

I think we still have a general problem with the run time of full. I didn't quite understand the data in #1409, but if a different installation of netcdf4 or Travis caching (#1405) could help, then we should investigate further.

I am not convinced that this PR in particular extends the installation time of full because these installation times have been fairly random as of late; the numbers that I quote above differ by almost 150s for full without any changes in the travis config that would affect the full build.

I would advocate for adding it to 0.16.2 because it is ready and it will make things a bit easier for downstream packaging #1361.

@kain88-de
Member

The runtime of full is too long even with the change in installation times. We used to be at ~30-35 minutes for a full test run. Our installation doesn't suddenly take more than 10 minutes. Also, if full regularly runs into the time limit, we won't know about our code coverage.

@orbeckst
Member Author

The runtime of full is too long even with the change in installation times. We used to be at ~30-35 minutes for a full test run.

Yes, true. But if I understand you correctly, you're hypothesizing that it is due to this PR. The numbers above show that it is not: the full build https://travis-ci.org/MDAnalysis/mdanalysis/builds/244102132 for PR #1403 also takes almost 50 minutes.

Our installation doesn't suddenly take more than 10 minutes.

Fair enough, ~150 s is only 2:30 min. Did you look at what takes more time in the tests?

From the 48 min full:

MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_data.test_Writer_atoms: 161.1134s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_closeContactGNMAnalysis: 145.9985s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_closeContactGNMAnalysis_weights_None: 138.4231s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_conect_bonds_all: 37.0044s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_conect_bonds_conect: 34.5182s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_writer_large: 32.3000s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_custom_weights: 30.8317s
MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_cnt.test_Writer_atoms: 23.0828s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cov: 22.8741s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_slice_iteration: 20.3056s
MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_data.test_Writer_dimensions: 19.9035s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_n_atoms_frame: 19.4444s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_n_frames: 19.3878s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_numconnections: 19.3644s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform_universe: 18.7797s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_rewind: 18.7055s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_iteration: 18.2078s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform: 18.1015s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cosine_content: 17.9944s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes: 17.5463s
MDAnalysisTests.analysis.test_pca.TestPCA.test_different_steps: 17.3986s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_generate_kirchoff: 17.1699s
MDAnalysisTests.analysis.test_encore.TestEncore.test_ces_error_estimation_ensemble_bootstrap: 16.8289s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_write_trajectory_universe: 16.7904s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_write_trajectory_atomgroup: 16.7405s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cum_var: 16.4954s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform_mismatch: 15.7796s
MDAnalysisTests.analysis.test_pca.TestPCA.test_pcs: 15.6358s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_align: 14.8079s
MDAnalysisTests.analysis.test_hbonds.TestHydrogenBondAnalysis.test_true_traj: 14.3645s
MDAnalysisTests.analysis.test_hbonds.TestHydrogenBondAnalysisHeuristic.test_true_traj: 13.9034s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_writer_large_residue_count: 13.4650s
MDAnalysisTests.analysis.test_hbonds.TestHydrogenBondAnalysisHeavy.test_true_traj: 13.0224s
MDAnalysisTests.analysis.test_hbonds.TestHydrogenBondAnalysisHeavyFail.test_true_traj: 12.7882s
MDAnalysisTests.core.test_atomselections.TestSelectionsXTC.test_same_fragment: 12.6707s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_gnm: 12.6454s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_error_estimation: 12.2850s

If you compare this to the minimal build with scipy:

MDAnalysisTests.analysis.test_gnm.TestGNM.test_closeContactGNMAnalysis: 153.0787s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_closeContactGNMAnalysis_weights_None: 102.6995s
MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_data.test_Writer_atoms: 95.6346s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_custom_weights: 60.4777s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cov: 29.9189s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cosine_content: 26.2424s
MDAnalysisTests.analysis.test_pca.TestPCA.test_pcs: 25.6768s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform: 24.8249s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_writer_large: 24.0293s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_align: 23.4301s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform_universe: 23.1988s
MDAnalysisTests.analysis.test_pca.TestPCA.test_different_steps: 23.1815s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes: 22.9368s
MDAnalysisTests.analysis.test_encore.TestEncore.test_hes_to_self: 22.7151s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_conect_bonds_all: 20.8662s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_conect_bonds_conect: 20.3900s
MDAnalysisTests.analysis.test_pca.TestPCA.test_transform_mismatch: 18.8019s
MDAnalysisTests.analysis.test_pca.TestPCA.test_cum_var: 18.0902s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_generate_kirchoff: 14.8774s
MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_cnt.test_Writer_atoms: 13.6212s
MDAnalysisTests.analysis.test_gnm.TestGNM.test_gnm: 13.1033s
MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_data.test_Writer_dimensions: 12.9685s
MDAnalysisTests.analysis.test_encore.TestEncoreDimensionalityReduction.test_dimensionality_reduction_three_ensembles_two_identical: 12.6368s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_write_trajectory_atomgroup: 12.1999s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_write_trajectory_universe: 12.0097s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_n_frames: 11.8778s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_numconnections: 11.1740s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_iteration: 11.1318s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_n_atoms_frame: 11.1067s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_slice_iteration: 11.1007s
MDAnalysisTests.coordinates.test_pdb.TestMultiPDBReader.test_rewind: 10.9571s
MDAnalysisTests.analysis.test_hbonds.TestHydrogenBondAnalysisHeavyFail.test_true_traj: 10.2214s
MDAnalysisTests.coordinates.test_gro.TestGROLargeWriter.test_writer_large_residue_count: 10.0398s

There is a lot of variation: for instance, MDAnalysisTests.coordinates.test_lammps.TestLAMMPSDATAWriter_data.test_Writer_atoms took either 161.1134s or 95.6346s, and one GNM test differs by 30s. This is admittedly not an exhaustive comparison, but it hints at variation in the run time on Travis.

Also if full regularly runs into the time limit we won't know about our code coverage.

Yes, and that is definitely bad. Perhaps we need to split the build matrix into core and analysis and report coverage separately (or find a way to merge across builds).

@orbeckst orbeckst requested a review from jbarnoud June 20, 2017 23:29
@orbeckst
Member Author

@MDAnalysis/coredevs this needs a review, thanks.

@orbeckst orbeckst mentioned this pull request Jun 20, 2017
Contributor

@jbarnoud jbarnoud left a comment

Except for the scipy version, everything looks OK.

@@ -494,11 +494,12 @@ def dynamic_author_list():
classifiers=CLASSIFIERS,
cmdclass=cmdclass,
requires=['numpy (>=1.10.4)', 'biopython', 'mmtf (>=1.0.0)',
'networkx (>=1.0)', 'GridDataFormats (>=0.3.2)', 'joblib'],
'networkx (>=1.0)', 'GridDataFormats (>=0.3.2)', 'joblib',
'scipy', 'matplotlib (>=1.5.1)'],
Contributor

The Hausdorff distance was introduced in scipy 0.19.0; maybe we should specify the minimum version.

Member

We still have our own hausdorff distance I think. You can open an issue for switching to the scipy implementation with a fallback to our own.

Contributor

I read a comment too fast; I thought we had switched to the scipy version. My bad.

@mnmelo
Member

mnmelo commented Jun 21, 2017

@orbeckst, you didn't add yourself to the version's author list in the CHANGELOG.

@kain88-de
Member

@orbeckst, you didn't add yourself to the version's author list in the CHANGELOG.

I added Oliver

@kain88-de kain88-de force-pushed the issue-1159-analysis-deps branch 2 times, most recently from d378937 to 85c0c56 Compare June 21, 2017 09:49
@richardjgowers richardjgowers self-assigned this Jun 21, 2017
def setUp(self):
sys.modules.pop('MDAnalysis.analysis.distances', None)

@block_import('scipy')
Member

I was so proud of this decorator, and now we hardly need it 😆

Member

Ah, worry not, I'm opening a PR in 5 minutes where it is very useful!
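For context, a minimal Python 3 sketch of how a decorator like block_import could work; this is an illustrative reimplementation, not necessarily identical to the one in MDAnalysisTests:

```python
import builtins
import functools


def block_import(package):
    """Make ``import <package>`` raise ImportError inside the decorated test,
    simulating an environment in which the package is missing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            real_import = builtins.__import__

            def blocked(name, *imp_args, **imp_kwargs):
                if name == package or name.startswith(package + "."):
                    raise ImportError("{} is blocked for this test".format(name))
                return real_import(name, *imp_args, **imp_kwargs)

            builtins.__import__ = blocked
            try:
                return func(*args, **kwargs)
            finally:
                builtins.__import__ = real_import
        return wrapper
    return decorator
```

Combined with the sys.modules.pop call in the diff above, such a decorator lets a test re-import MDAnalysis.analysis.distances while scipy appears to be absent.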

@richardjgowers
Member

Is adding dependencies allowed in a 16.x release or does it have to be 0.17? Otherwise looks good

@kain88-de
Member

Is adding dependencies allowed in a 16.x release or does it have to be 0.17? Otherwise looks good

Well, strictly following SemVer it is not. Even deprecating public API would have to be in 0.17.0 and not 0.16.2. http://semver.org/

I personally think this is less of an issue than the deprecations.

@kain88-de
Member

Tests are passing now; the full build just times out as usual.

@orbeckst
Member Author

orbeckst commented Jun 21, 2017 via email

@kain88-de kain88-de mentioned this pull request Jun 21, 2017
@kain88-de
Member

If there are no more issues with this PR, I will merge it in the afternoon.

@kain88-de kain88-de merged commit 853447c into develop Jun 22, 2017
@tylerjereddy
Member

I note that my normal development workflow has been disrupted by alterations to the installation process, with `python setup.py install --user` on the development branch resulting in (truncated):

                        * The following required packages can not be built:
                        * freetype, png
error: Setup script exited with 1

The issue is resolved by doing this first: `conda install matplotlib`.

Perhaps another dev can try a fresh conda environment without matplotlib to see if they can reproduce this. Granted, it is on WSL, but that's basically just Ubuntu. If you can reproduce it, maybe open an issue if it seems annoying enough, but I guess the conda install is a simple enough fix.

@orbeckst
Member Author

orbeckst commented Jul 1, 2017

When I pip install anything nowadays it comes as a wheel, so I don't notice the pain of compiling matplotlib. I think I never managed to build matplotlib from source with pip, and that had been one of the main reasons why I used to advocate for light dependencies. But in my experience (on OSX and Linux), installation of standard packages has become easy.

Have you tried `pip install --user matplotlib` in a fresh virtualenv?

@kain88-de kain88-de deleted the issue-1159-analysis-deps branch January 24, 2018 19:19