Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap histogram #1072

Merged
merged 37 commits into from
Apr 26, 2021
Merged
Show file tree
Hide file tree
Changes from 32 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
c042b0a
Add histogram.py
willschlitzer Mar 17, 2021
4409760
remove outfile name
willschlitzer Mar 17, 2021
fe086b3
Update pygmt/src/histogram.py
willschlitzer Mar 17, 2021
da2539f
wrap additional arguments
willschlitzer Mar 18, 2021
f1cbad2
run make format
willschlitzer Mar 18, 2021
c2ec478
Merge branch 'master' into wrap-histogram
willschlitzer Apr 5, 2021
ac99e08
Merge branch 'wrap-histogram' of github.com:GenericMappingTools/pygmt…
willschlitzer Apr 5, 2021
1e0e9aa
add into of docstring for histogram.py
willschlitzer Apr 5, 2021
6ec0a66
add histogram to index.rst
willschlitzer Apr 5, 2021
fb2b4ce
add histogram aliases to docstring
willschlitzer Apr 6, 2021
8899a48
remove uncommon parameters
willschlitzer Apr 8, 2021
d66de75
Merge branch 'master' into wrap-histogram
willschlitzer Apr 8, 2021
d30037c
update docstring for histogram
willschlitzer Apr 8, 2021
b8040ab
add interval
willschlitzer Apr 8, 2021
2439d6a
Add test_histogram.py
willschlitzer Apr 8, 2021
05c005f
add test_histogram.png.dvc
willschlitzer Apr 8, 2021
34af05f
run make format
willschlitzer Apr 8, 2021
1cfac35
add line breaks
willschlitzer Apr 9, 2021
72c022c
add docstrings to test_histogram.py
willschlitzer Apr 9, 2021
14aa960
Merge branch 'master' into wrap-histogram
willschlitzer Apr 9, 2021
c84b008
Merge branch 'master' into wrap-histogram
willschlitzer Apr 10, 2021
b5cf339
Apply suggestions from code review
willschlitzer Apr 12, 2021
780b820
Merge branch 'master' into wrap-histogram
willschlitzer Apr 12, 2021
d9b8e39
remove GMTTempFile
willschlitzer Apr 12, 2021
151d90b
Merge branch 'master' into wrap-histogram
willschlitzer Apr 13, 2021
5ab0e5d
Merge branch 'master' into wrap-histogram
willschlitzer Apr 19, 2021
1bc9300
Merge branch 'master' into wrap-histogram
willschlitzer Apr 20, 2021
6bc7fa2
Merge branch 'master' into wrap-histogram
willschlitzer Apr 24, 2021
69f5d4d
Update pygmt/src/histogram.py
willschlitzer Apr 24, 2021
55c3fbc
update interval parameter to series in test_histogram() in test_histo…
willschlitzer Apr 24, 2021
1d86dc7
update test_histogram.png.dvc
willschlitzer Apr 24, 2021
c914954
run make format
willschlitzer Apr 24, 2021
cb9e25e
Update pygmt/src/histogram.py
willschlitzer Apr 24, 2021
30e2004
Merge branch 'master' into wrap-histogram
willschlitzer Apr 24, 2021
85201e5
add region argument in test_histogram() in test_histogram.py
willschlitzer Apr 26, 2021
9980315
update test_histogram.png.dvc
willschlitzer Apr 26, 2021
f1575f9
Merge remote-tracking branch 'origin/wrap-histogram' into wrap-histogram
willschlitzer Apr 26, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/api/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ Plotting data and laying out the map:
Figure.grdcontour
Figure.grdimage
Figure.grdview
Figure.histogram
Figure.image
Figure.inset
Figure.legend
Expand Down
1 change: 1 addition & 0 deletions pygmt/figure.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,6 +411,7 @@ def _repr_html_(self):
grdcontour,
grdimage,
grdview,
histogram,
image,
inset,
legend,
Expand Down
1 change: 1 addition & 0 deletions pygmt/src/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from pygmt.src.grdinfo import grdinfo
from pygmt.src.grdtrack import grdtrack
from pygmt.src.grdview import grdview
from pygmt.src.histogram import histogram
from pygmt.src.image import image
from pygmt.src.info import info
from pygmt.src.inset import inset
Expand Down
59 changes: 59 additions & 0 deletions pygmt/src/histogram.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
"""
Histogram - Create a histogram
"""
from pygmt.clib import Session
from pygmt.helpers import build_arg_string, fmt_docstring, kwargs_to_strings, use_alias


@fmt_docstring
@use_alias(
A="horizontal",
B="frame",
C="cmap",
G="fill",
J="projection",
R="region",
T="series",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is series a good name? Other Python histogram functions (e.g., numpy.histogram, plt.histogram use bins instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on #1072 (comment). Probably not a good idea to use bins as the input for -T is structured quite differently to the bins argument in np.histogram and plt.histogram.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it depends on how we want to implement the -T option in PyGMT.

histogram -T have following usages:

  1. -T6
  2. -T0/200/5
  3. -Tfile
  4. -T0,3,4,6,10

The current implementation doesn't work well for usage 4, since users have to pass a string "0,3,4,6,10".

Instead, if we adopt the syntax of np.histogram, we can use more Pythonic syntax

bins=6
bins=np.arange(0, 200, 5)
bins="file"
bins=[0, 3, 4, 6, 10)

Copy link
Member

@weiji14 weiji14 Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we're using @kwargs_to_strings(T="sequence") which implies Option 2 (-T0/200/5) and I think this works well for makecpt/grd2cpt where regular sized bins are the norm, and also histogram to some extent. Doing Option 4 would mean using @kwargs_to_strings(T="sequence_comma") and I guess this would be fine for people wanting irregular sized bins for histogram.

I don't often plot histograms, so I'll leave it up to you all to decide which one is better. Ideally we would standardize the -T option across all modules, but we could also use series (with T="sequence") for makecpt/grd2cpt, and bins (with T="sequence_comma") for histogram if that makes more sense for new users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note here: we don't have to use @use_alias and @kwargs_to_strings for -T.

Instead, we can check if the bins (or any other parameter name) argument is a string, one value, three values, or a list of values. I think it provides the most Pythonic syntax and also the greatest flexibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. For a single histogram plot, it would be be much easier to use something like the built-in pandas hist function (e.g. just do df.hist()). But for say, mixing a grdview plot with a histogram as an 'inset', then using PyGMT would be better 😉

Let's get another opinion. Ping @core-man since he seemed interested in histogram at #563 (comment). What syntax do you think PyGMT's histogram should follow, GMT's or Matplotlib's?

Copy link
Member

@core-man core-man Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During my 10-year seismological studies, I only use GMT to plot figures and sometimes use AI/3D max to plot cartoons. I began to use Python and PyGMT at the same time last year, so I have never learned matplotlib. So for my study, GMT/PyGMT is enough right now. For the PyGMT scope I understand now, we'd like to provide a good Python interface for GMT. So as we go to v1.0.0, I prefer more Pythonic syntax instead of strict GMT's (e.g., #1117). At that time, I think (also hope) most GMT users will learn GMT by using PyGMT.

For this issue, I think the current implementation is okay and most users can use it well (I never use option 4). However, it's better we could refractor all the similar parameters to be more Pythonic syntax as we go to v1.0.0, e.g.,

bins=6
bins=np.arange(0, 200, 5)
bins="file"
bins=[0, 3, 4, 6, 10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should move to a more Pythonic syntax overall. Assuming that is the eventual plan, it's creating more work for our future selves by wrapping new features with a GMT syntax. That being said, I think there are more commonly-used features (setting map projections, updating pen/fill settings, adding a frame) that should be prioritized if the plan is to begin incorporating Pythonic syntax. This is certainly an issue outside of the scope of wrapping histogram, but I think if we are changing new features to be in-line with a more Pythonic syntax we should be also updating our old features ASAP.

Copy link
Contributor Author

@willschlitzer willschlitzer Apr 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GenericMappingTools/pygmt-contributors Maybe I am missing it, but is there a consensus on how this function should be modified?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of time and consistency with other PyGMT modules like makecpt/grd2cpt, I say we stick with using series (-T) instead of bins. For better or worse, we are still heavily skewed towards the style of GMT core rather than NumPy, and decisions of what is 'Pythonic' should go in a separate issue. It is better to be consistently like GMT, rather than confuse users with a mix of NumPy-like and GMT command-line-like syntax. So @willschlitzer, let's keep things as it is for now :)

W="pen",
c="panel",
l="label",
p="perspective",
willschlitzer marked this conversation as resolved.
Show resolved Hide resolved
)
@kwargs_to_strings(R="sequence", T="sequence")
def histogram(self, table, **kwargs):
weiji14 marked this conversation as resolved.
Show resolved Hide resolved
r"""
Plots a histogram, and can read data from a file or
list, array, or dataframe.

Full option list at :gmt-docs:`histogram.html`

{aliases}

Parameters
----------
table : str, list, or 1d array
A data file name, list, or 1d numpy array. This is a required argument.
{J}
{R}
{B}
{CPT}
{G}
{W}
{c}
label : str
Add a legend entry for the symbol or line being plotted.
{p}
willschlitzer marked this conversation as resolved.
Show resolved Hide resolved
horizontal : bool
Plot the histogram using horizonal bars instead of the
default vertical bars.
interval : int or str
willschlitzer marked this conversation as resolved.
Show resolved Hide resolved
[*min*\ /*max*\ /]\ *inc*\ [**+n**\ ]
Set the interval for the width of each bar in the histogram.

"""
willschlitzer marked this conversation as resolved.
Show resolved Hide resolved
kwargs = self._preprocess(**kwargs) # pylint: disable=protected-access
with Session() as lib:
file_context = lib.virtualfile_from_data(check_kind="vector", data=table)
with file_context as infile:
arg_str = " ".join([infile, build_arg_string(kwargs)])
lib.call_module("histogram", arg_str)
4 changes: 4 additions & 0 deletions pygmt/tests/baseline/test_histogram.png.dvc
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
outs:
- md5: 314634815787d56a557f9646dcba7526
size: 21674
path: test_histogram.png
24 changes: 24 additions & 0 deletions pygmt/tests/test_histogram.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# pylint: disable=redefined-outer-name
"""
Tests histogram.
"""
import pytest
from pygmt import Figure


@pytest.fixture(scope="module")
def table():
"""
Returns a list of integers to be used in the histogram.
"""
return [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 8]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Is the baseline image correct? I expect to see y=6 for x=8.

BTW, please change projection to a smaller dimension like X10c/5c.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seisman I just tried it on both GMT 6.1.1 and 6.2.0rc1, with the figure coming out the same way. Any idea what might be causing the miscount for 8?

Also, changing the projection to X3c/10c to give it a larger vertical scale; does that work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an upstream bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and recreate it in GMT and open an upstream issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GMT automatically determines the x range from input data. In this example, xmin=1, xmax=8, but the vertical bar for x=8 is outside of the [1, 8] range, so it's not plotted.

You should be able to see the x=8 bar, if you manually set region=[0, 9, 0, 6]. I think we can do it temporarily so that this PR can be merged, and report it to upstream and wait for feedback and potential fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, changing the projection to X3c/10c to give it a larger vertical scale; does that work?

BTW, the large vertical scale makes the image difficult to read, why not just use "X10c/10c" for a square basemap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seisman I made your suggested changes



@pytest.mark.mpl_image_compare
def test_histogram(table):
"""
Tests plotting a histogram using a list of integers.
"""
fig = Figure()
fig.histogram(table=table, projection="X10c/25c", series=1, frame="a", fill="green")
return fig