-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unique value classifier for categorical maps with distinct colors for large number of categories. #173
base: main
Are you sure you want to change the base?
Conversation
sjsrey
commented
Feb 9, 2023
for more information, see https://pre-commit.ci
Does |
|
Codecov Report
@@ Coverage Diff @@
## main #173 +/- ##
=======================================
- Coverage 88.5% 86.2% -2.3%
=======================================
Files 8 8
Lines 1070 1108 +38
=======================================
+ Hits 947 955 +8
- Misses 123 153 +30
|
To be fair, I don't really see a need for it, especially in mapclassify. It doesn't do anything on the classification front and the only possible benefit over calling gdf.plot('STATE_NAME', cmap=distinctipy.get_colormap(distinctipy.get_colors(gdf.STATE_NAME.nunique()))) If the main functionality you are interested in here is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be super useful for a lot of applications (e.g. like our neighborhood delineation over in geosnap where you can a few hundred neighborhoods in a single metro, but the mpl colormaps dont give enough variation)
I see your point Martin, but i'd just add that IMO the utility of geopandas using mapclassify under the hood is that i dont need to know or care about distinctpy as a user (and i definitely dont want to have to remember cmap=distinctipy.get_colormap(distinctipy.get_colors(gdf.STATE_NAME.nunique())))
) even if that's sufficient. Since mapclassify has applications beyond geopandas, and this is a super useful classification method, it feels like an obvious enhancement to me
But you will not use this under the hood from geopandas and if the main point is exposure of distintcipy, then it should be in splot. I just don't think this belongs to mapclassify and is not consistent with the rest of the package. |
(well, i definitely would use this under the hood in geopandas lots of times, cause i dont wanna type that long string. That's like saying the so i think this is the rub: imo, the purpose of mapclassify is to create binning schemes that are appropriate for (a wide variety) of cartographic display. In the case of unique values, it's true that you dont need to classify those values, but the purpose of splot is for spatial statistical visualizations which is why the esda plotting methods live there (and why that stuff isnt in mapclassify in the first place) so i guess what im saying is its far more natural for a 'unique binner' to live in the binning package, rather than our version of seaborn |
Alright, let me elaborate a bit as I think that my comments may have come across as a bit too harsh. I think that this is super useful feature to have when I need to plot categorical variables with more than 20 classes supported natively by geopandas. And if exposed in geopandas, I would also use it myself. However, as implemented here it is not compatible with being exposed in geopandas. That is what I meant. We use If we want to use this from geopandas, then the reasonable thing would be to open an issue there resulting in a PR ensuring that you can pass something like Now onto second point. The point of mapclassify is to discretise continuous variable into a set of classes. I am fine expanding that logic to categorical variables if we think it is useful in some way. But the output is always an array (bins, labels...). And that is consistent across the package. If there is any functionality in PySAL that is remotely close to this type of choropleth plotting it is We've been discussing the mess we have with plotting weights (one method in splot, other in libpysal) and that it should be consistently implemented in splot, so I don't want to create yet another place where we have some plotting code. As a conclusion - if we, as a community, think that it would be useful to have a direct access to N-colored cmap when plotting a categorical variable from a GeoDataFrame, let's open an issue in geopandas and implement it there, where it would belong most naturally. If we also think that having the counts in the legend is important, it may also be included there. The same code can then be shared with the If you all think that it is okay to implement it as is in mapclassify and that it is the best place for this functionality, I'll accept that. But at the moment I am just not convinced of that. |
one of my fav parts of the dev process is having these discussions to make decisions by committee :D |
This discussion is what I hoped the WIP label would stimulate, so I think this is very productive. The original motivation for this came from a user of mapclassify who asked for this ability. My pr is intended to show how this might be done - I am uncertain myself where this actually should live - I can see merits in all the the options that have been suggested thus far. A couple of thoughts: Geopandas consumption@martinfleis is correct that the current implementation cannot be exposed in geopandas as Unique_Value is not a subclass of MapClassifier. This was done because the classes for UV do not have bounds/intervals, just labels. We could refactor this to extend the legend handling in mapclassify to deal with the continuous and catgorical variables in a more elegant fashion. If so, then exposing this in geopandas should be possible with the existing api. Alternatively, we could do a PR into geopandas to add this functionality directly (i.e., it wouldn't be a classifier in mapclassify). api inconsistencyYes, since Unique_Value jettisons the inheritance in mapclassify it is not consistent with the other classifiers. The addition of the plot method also marks a departure. The latter was intended to flesh out the plotting issues/design more so than to suggest we add a plot method to all the classifiers in mapclassify (although this gets asked for from time to time). For plotting code in pysal, I agree it is best to centralize that logic in splot. other packages can consume that api but the consistency should come through splot. I'm not against giving the different packages their own plot methods where it makes sense, as long as the are composed through splot to the extent possible to ensure consistsency. |
😁 @martinfleis I dont read you as harsh, we're both just direct writers with opinions
my view is this is a philosophical distinction. Packages evolve over time and this function is designed to help make good looking maps by putting data into bins--which is precisely the purpose of mapclassify I guess the question is how you view the categorization of functionality across the packages. I'd argue the conceptual difference between the package is more important. The "precedent" here is providing simple tools for creating good looking maps. It doesnt matter how the code works. Implicitly, mapclassify is about creating choropleths, regardless of whether we've done the actual plotting in the past. In this case, it makes sense to go ahead and implement the plotting because that's the best way to surface the functionality we're actually after with the package--making attractive maps easily although the value-by-alpha stuff exists in splot, the package itself is not about choropleths. It's about wrapping tailored visualizations around spatial analyses. So, personally, I see no precedent for this function over there if this function is a crayon, then it makes more sense to me inside the box of pencils (mapclassify) than the box of protractors (splot) |
That is only partially true. We would be able to pass labels to geopandas but not colours. And passing labels is a bit pointless given geopandas can do that itself and will call
Would any of you object this option? It is imho the best one. I would add |
I'm leaning this way, as I think it is the cleanest solution. I can think of a couple of additional options that might be useful, but I could add them to the PR in GP if this the way we go. |
I would start with an issue outlining the idea in the geopandas repo to gather feedback from folks there. We may hit resistance (I don't think we will) and circle back here. |