Different results with plotly ternary vs python-ternary #140
Comments
So I can't speak to what Plotly is doing, but I think one difference is this: since some of the data rows sum to more than 1, I presume that Plotly is doing some kind of truncation or normalization to keep the points inside the simplex, or that a ternary scatter plot means something different in their implementation.
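To see why rows that don't sum to the scale can land outside the triangle, here is a minimal sketch of the usual ternary-to-Cartesian projection for an equilateral triangle. The names `project` and `inside_triangle` are hypothetical. This illustrates the general technique and is not claimed to be python-ternary's actual code.

```python
import math

SQRT3_OVER_2 = math.sqrt(3) / 2.0


def project(point):
    """Map a ternary point (a, b, c) to Cartesian (x, y).

    Triangle vertices: c-corner (0, 0), a-corner (scale, 0),
    b-corner (scale / 2, scale * sqrt(3) / 2). Only a and b are used;
    c is implicitly assumed to equal scale - a - b.
    """
    a, b, _c = point
    return a + b / 2.0, SQRT3_OVER_2 * b


def inside_triangle(xy, scale=1.0, tol=1e-9):
    """True if (x, y) lies inside the triangle of the given scale."""
    x, y = xy
    # Invert the projection to recover the implied barycentric coordinates.
    b = y / SQRT3_OVER_2
    a = x - b / 2.0
    c = scale - a - b
    return min(a, b, c) >= -tol


good = (0.2, 0.3, 0.5)  # sums to 1
bad = (1.0, 1.0, 0.5)   # sums to 2.5, like the rows discussed above
print(inside_triangle(project(good)))  # True
print(inside_triangle(project(bad)))   # False
```

In this sketch c never enters the projection, so a row whose sum differs from the scale is silently drawn as if c were `scale - a - b`, which can put it outside the triangle.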
So is the position being normalized per row rather than per column? I was confused because it still didn't work even after normalizing the data so that the maximum of each column was 1. I renormalized the data so the rows sum to 1 and, yay, it's working now. Thank you so much!! I guess I assumed that normalization would happen within the program, or that there would at least be a check on the data to make sure the rows sum to 1. What do you think? I'd be happy to add it.
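The difference between the two normalizations shows up on a small made-up array (the values are illustrative, not the `medians.csv` data): scaling each column by its maximum leaves the row sums unconstrained, while dividing each row by its own sum puts every row on the simplex.

```python
import numpy as np

# Toy data: rows are observations (e.g. genes), columns are the three components.
data = np.array([
    [4.0, 2.0, 2.0],
    [1.0, 1.0, 1.0],
    [0.0, 3.0, 1.0],
])

# Per-column scaling: every column's maximum becomes 1,
# but nothing constrains the row sums (they can range from 0 to 3).
col_scaled = data / data.max(axis=0)

# Per-row normalization: divide each row by its own sum,
# so every row sums to 1 (a row of all zeros would give NaN).
row_normed = data / data.sum(axis=1, keepdims=True)

print(col_scaled.sum(axis=1))  # row sums are not all 1
print(row_normed.sum(axis=1))  # every row sums to 1, up to rounding
```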
It could be useful to check that the sum isn't equal to
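A check along these lines could look like the sketch below. `check_row_sums` is a hypothetical helper, not part of python-ternary. It warns rather than raises, so existing plots keep working, and `scale` stands in for whatever constant the plot uses (1, 100, ...).

```python
import math
import warnings


def check_row_sums(points, scale=1.0, rel_tol=1e-6):
    """Hypothetical helper: warn about points whose coordinates don't sum to scale.

    Returns the indices of the offending points so the caller can inspect them.
    """
    bad = [i for i, p in enumerate(points)
           if not math.isclose(sum(p), scale, rel_tol=rel_tol)]
    if bad:
        warnings.warn(
            f"{len(bad)} point(s) do not sum to scale={scale}; they may be "
            f"plotted outside the triangle (first offending index: {bad[0]})."
        )
    return bad


points = [(0.2, 0.3, 0.5), (1.0, 1.0, 0.5)]
check_row_sums(points)  # warns and returns [1]
```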
I think even having this stated in the README.md and in the introductory Jupyter notebook would be helpful. I liked the idea of a ternary notebook, but I spent hours scratching my head before I found the "sum to a constant" constraint mentioned on Wikipedia.
Hi @cmacdonald, do you want to open a PR with a change to the README where you would have liked a warning or statement? You should be able to do it easily through the GitHub interface.
Hi @marcharper and @cmacdonald, thanks for the detailed discussion on the need for each row to sum to the scale. I think it would be beneficial to include this information on the main GitHub page and in the documentation, along with a nice example of how to perform the normalization. In my case, I have a data set with the following characteristics: […] So I proceed in two steps: (i) min-max scaling of each column, and (ii) row normalization (as done by @olgabot) to produce a ternary plot.
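The two-step procedure described above can be sketched in NumPy as follows, assuming a numeric array with one row per point and one column per component. The function name is hypothetical, and this is not necessarily the commenter's exact code.

```python
import numpy as np


def minmax_then_row_normalize(data, scale=1.0):
    """Hypothetical helper: (i) min-max scale each column, (ii) normalize rows to `scale`."""
    data = np.asarray(data, dtype=float)

    # Step (i): map each column onto [0, 1].
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Guard against constant columns (range 0) to avoid dividing by zero.
    col_range[col_range == 0] = 1.0
    scaled = (data - col_min) / col_range

    # Step (ii): divide each row by its sum so its coordinates add up to `scale`.
    row_sums = scaled.sum(axis=1, keepdims=True)
    # A row that is all zeros after step (i) has no defined position: it becomes NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return scale * scaled / row_sums


out = minmax_then_row_normalize([[10, 5, 1], [20, 15, 3], [30, 10, 2]])
print(out)
```

One caveat of step (i): a row holding the minimum of every column collapses to all zeros and has no position in the simplex, which is why the first row of the example comes out as NaN.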
Hi, thanks for the suggestions. The Wikipedia page on ternary plots explains that the coordinates have to sum to a constant, and there's a link to it at the top of the documentation. This library just plots. There are many ways to normalize or otherwise transform data, and the library doesn't know which method the user wants. For almost any scenario, there is example code on Stack Overflow and other sites explaining how to, for example, normalize a Pandas dataframe by row or by column.
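For example, normalizing a pandas DataFrame by row to an arbitrary constant takes one `div` call. This is a sketch with made-up column names and values; the resulting tuples are the kind of (a, b, c) point list one would pass to a ternary scatter call.

```python
import pandas as pd

# Made-up frame: one row per point, one column per ternary component.
df = pd.DataFrame({
    "species_a": [5.0, 2.0, 0.5],
    "species_b": [3.0, 2.0, 1.5],
    "species_c": [2.0, 2.0, 3.0],
})
scale = 100  # the constant every row should sum to

# Divide each row by its own sum (axis=0 aligns the sums with the row index),
# then stretch the result to `scale`.
by_row = df.div(df.sum(axis=1), axis=0) * scale

# Plain (a, b, c) tuples, one per row.
points = list(by_row.itertuples(index=False, name=None))
print(points)
```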
Indeed, there are many ways to normalize data sets, and it is up to users to decide which is best. However, I do think it would be very helpful to let new users of this nice package know that the coordinates must sum to a constant (1, 100, or something else). Including the link to Wikipedia would be an extra benefit! And again, many thanks for such a great package!
Hello,
Thank you so much for making this package! I'd like to overlay a heatmap and a scatter plot, as mentioned in #129 and addressed in #121. However, I'm having trouble using the library.
I'm plotting median gene-expression values for one cell type across three species. When I use Plotly, the result makes sense to me: there are many dots in the middle, indicating many shared genes.
However, when I use the same data with python-ternary, the result doesn't make any sense to me. A bunch of points fall outside the plot, and it's not clear what is happening to the rest. Here is the code:
I thought this was a simple rescaling issue and divided each column by its maximum so that no values were greater than 1. However, this didn't replicate the results I saw in Plotly, and it still doesn't make sense to me: there are still dots outside the plot, and the pattern doesn't match what I see in Plotly.
Do you know what may be happening?
Here is the data for reference: medians.csv.txt