expected data format #3

Hi @JohMast,

I have a question regarding the data format that {flowmapper} expects. I have completed a draft of a new vignette (it can be found here, or, if the branch is already merged and deleted, I guess the vignette will be here) for {spanishoddata} that shows how to use {flowmapper} to visualise the data acquired with {spanishoddata}. At some point I ran into a problem with the format that {flowmapper} expects. To quote the vignette:

So the problem is that, as far as I know (and I have worked with origin-destination data for quite some time, @Robinlovelace may correct me if I am wrong), the format you are expecting as input for {flowmapper} is not standard in the "industry" at all. I would also refer you to the {od} package (https://itsleeds.github.io/od/articles/od.html).

Would it be possible for {flowmapper} to support (and auto-detect) the more standard long format of OD data? Or some other more standardised form of origin-destination data, like OD matrices?

Comments
Hi @e-kotov,

A) A helper function that accepts a long table of flows and a table with the nodes, and checks for matches (like od_to_sf does) before combining them into the format required by add_flowmap.

B) Accepting the long table of flows and the nodes table directly as input.

I think I prefer (B) because internally, the inputs already get converted to long format for plotting. So in future versions, the unnecessary conversion long -> messy -> long could be bypassed completely.

Would that be helpful?
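(For illustration, a minimal sketch of what option (A) could look like. The helper name flows_to_od() and the wide column names xa/ya/xb/yb are placeholders, not {flowmapper}'s actual internals.)

```r
# sketch of option (A): join a long flows table (o, d, value) with a nodes
# table (name, x, y), checking that every id has coordinates first
flows_to_od <- function(flows, nodes) {
  ids <- unique(c(flows$o, flows$d))
  missing_ids <- setdiff(ids, nodes$name)
  if (length(missing_ids) > 0) {
    stop("No coordinates in `nodes` for: ", paste(missing_ids, collapse = ", "))
  }
  od <- merge(flows, nodes, by.x = "o", by.y = "name")  # origin coordinates
  names(od)[match(c("x", "y"), names(od))] <- c("xa", "ya")
  od <- merge(od, nodes, by.x = "d", by.y = "name")     # destination coordinates
  names(od)[match(c("x", "y"), names(od))] <- c("xb", "yb")
  od
}
```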
@JohMast to me, option B sounds sensible, and it is also in line with how the original inspiration library and the package https://github.com/FlowmapBlue/flowmapblue.R expect the input. I think it is nice to mimic the functionality of other packages, as that is what allows an ecosystem of packages to emerge that is ultimately very easy for the end user to use, and it also allows your package to be an easy drop-in replacement.

Note: sadly, there seems to be no industry standard for this type of data that is universally accepted across packages. E.g. the format expected by flowmapblue differs from the one used in @Robinlovelace's {od} package.

@SymbolixAU, @crazycapivara, @Robinlovelace, @JohMast, perhaps it is time to discuss and try to implement a standard?
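(For comparison, a sketch of the inputs flowmapblue.R documents in its README, reproduced from memory; the exact column names are an assumption to be checked against that package's docs.)

```r
# flowmapblue.R-style inputs: a locations table and a flows table
locations <- data.frame(
  id   = c("a", "b", "c"),
  name = c("A", "B", "C"),
  lat  = c(40.42, 41.39, 39.47),
  lon  = c(-3.70, 2.17, -0.38)
)
flows <- data.frame(
  origin = c("a", "a", "b"),
  dest   = c("b", "c", "c"),
  count  = c(10, 20, 30)
)
```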
Okay, I implemented the change. You should now be able to pass the long table of flows and the nodes table directly to add_flowmap.

Regarding formats: I agree that it would be neat to have a standard way of doing things. To my understanding, all these formats contain the same information. Some are more efficient regarding storage/memory and some are more readable, though that is probably a matter of taste. However, I suspect that the differences exist because they focus on different fields and applications: analyzing the nodes, modeling flows, exploration... What might be possible is some standardization of the terminology. I am thinking of things like x/y versus lon/lat. Harmonizing these might make it easier for users to find comparable packages and use them jointly. If a source for that exists already, I would be happy about a pointer!
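(A minimal sketch of the long-format call, assuming the new od and nodes arguments work as discussed; compare the reproducible example below.)

```r
library(ggplot2)
library(flowmapper)

# long format: one row per origin-destination pair
od <- data.frame(
  o     = c("a", "a", "b"),
  d     = c("b", "c", "c"),
  value = c(10, 20, 30)
)

# one row per node, with planar coordinates and ids matching od$o/od$d
nodes <- data.frame(
  name = c("a", "b", "c"),
  x    = c(0, 1, 2),
  y    = c(0, 1, 0)
)

ggplot() |> add_flowmap(od = od, nodes = nodes)
```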
@JohMast Great, thank you! I will test today and get back to you!
Trying with this small data sample:

```r
library(ggplot2)
library(flowmapper)

# note: `od` is a grouped tibble ("grouped_df"), as returned by dplyr::group_by()
od <- structure(list(o = c("01001_AM", "01001_AM", "01001_AM", "01001_AM",
"01001_AM", "01001_AM", "01001_AM", "01001_AM", "01001_AM", "01001_AM",
"01001_AM", "01001_AM", "01001_AM", "01001_AM", "01001_AM", "01001_AM",
"01001_AM", "01001_AM", "01001_AM", "01001_AM"), d = c("01002",
"0105906", "01063_AM", "19058_AM", "1913005", "20036", "2006903",
"20075", "20902", "22084_AM", "24212_AM", "28045", "3120102",
"3120106", "3120107", "31208_AM", "33036", "37073_AM", "39020_AM",
"39059"), value = c(20.178, 2078.967, 134.44, 11.225, 7.309,
5.928, 17.053, 46.277, 12.177, 11.906, 6.352, 11.225, 21.94,
10.537, 17.846, 9.394, 29, 4.768, 138.539, 4.113)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    o = "01001_AM", .rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of",
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

nodes <- structure(list(x = c(478391.311766996, 484914.014734671, 578774.440509757,
611908.530688029, 608573.323628031, 611017.377114496, 580132.802557328,
436937.31903137, 407370.337626221, 517286.944855483, 595144.389218619,
787373.931469436, 477333.501735812, 586892.188214235, 546087.082616995,
351638.077170337, 525487.254205587, 284534.1296852, 502306.423213228,
285509.147488101, 618662.856436448), y = c(4797745.16494694,
4502754.1831806, 4781295.45569597, 4742681.29936187, 4742222.82183755,
4740866.63365897, 4790187.05267574, 4501721.43675297, 4761496.29470226,
4758305.73549361, 4801630.20307999, 4700483.60434869, 4498035.54417975,
4791106.8478728, 4747657.38543724, 4806647.49130073, 4744247.07625825,
4695004.91047174, 4763643.29102358, 4555167.70779241, 4647074.93331487
), name = c("39020_AM", "1913005", "20075", "3120106", "3120107",
"3120102", "20902", "28045", "39059", "01063_AM", "20036", "22084_AM",
"19058_AM", "2006903", "01001_AM", "33036", "0105906", "24212_AM",
"01002", "37073_AM", "31208_AM")), class = "data.frame", row.names = c(NA,
-21L))

ggplot() |> add_flowmap(od = od, nodes = nodes)
```

Update: corrected the sample above. When debugging, just before failing, the `flows` object is:
Fails at line 287 in 2052296, with:
Happy to provide input here. FYI, we have had a similar conversation before: itsleeds/od#20. Would be happy to implement a class system in {od} with checks. Could be an S3 class system or even an S7 class system, which looks ideal for this, forcing all objects to conform to a required structure. For my purposes, keeping it super-simple, simply using plain data frames works. I do think {od} can be a handy translation layer, so happy to add more conversion functions there.
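(A minimal S3 sketch of the idea, with illustrative names that are not an actual {od} API.)

```r
# illustrative constructor that enforces the required columns and id types
new_od <- function(x) {
  stopifnot(
    is.data.frame(x),
    all(c("o", "d") %in% names(x))   # origin and destination id columns
  )
  x$o <- as.character(x$o)           # ids handled as character throughout
  x$d <- as.character(x$d)
  class(x) <- c("od", class(x))
  x
}

od <- new_od(data.frame(o = "a", d = "b", value = 1))
inherits(od, "od")  # TRUE
```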
cc @mtennekes, of tmap fame, who instigated the very similar conversation in itsleeds/od#20; would welcome any updated thoughts from you on this, especially given your experience. My thinking: it could be good to extend, while being able to fall back to plain data.frame/sf objects.
Thanks for that example 👍 I added a check for grouping variables; that should prevent this and other grouping-related issues.
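(A sketch of such a check, assuming it simply drops any dplyr grouping on the input; the actual fix in {flowmapper} may differ.)

```r
# grouped tibbles carry a "groups" attribute that can break joins and
# reshaping, so drop the grouping before processing
if (dplyr::is_grouped_df(od)) {
  od <- dplyr::ungroup(od)
}
```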
@JohMast Thank you! It now passes the tests with larger datasets in the vignette. One more minor thing: if you could force-convert any id columns in the inputs to character, that would be great.

Otherwise, I have rewritten the vignette I referred to using the new arguments you created. It is currently in my local branch. I will update the original vignette that currently preps the data in the old format.
Good point - I see no reason not to force-convert the entire id column in the inputs to character. That's how they are treated internally anyway, and as the user only gets the plot and not the data, it should not matter to them.

On the broader discussion, I agree with @Robinlovelace; both options sound good to me. I personally appreciate the flexibility of working with plain data frames.
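(A sketch of the force-conversion on input; the actual handling inside {flowmapper} may differ.)

```r
# treat all ids as character, regardless of how the user stored them
od$o       <- as.character(od$o)
od$d       <- as.character(od$d)
nodes$name <- as.character(nodes$name)
```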
I have not developed anything with classes, so I cannot comment on that. Sounds like it would be useful, but also a bit niche, as this data type is not as common as, for example, sf objects.

I think that is absolutely fine, as long as the packages that work with that kind of data expect the same input.

That was my thinking too. If different package devs cannot agree on the same format, at least {od} can translate between them.
Great, thank you!
Makes me think of a new title for {od}: A Universal Translator Between Different 'Origin-Destination' Data Formats. See: itsleeds/od#52
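(As an illustration of that kind of translation: od_to_odmatrix() and odmatrix_to_od() exist in {stplanr}; whether {od} provides the same helpers is an assumption to verify against its docs.)

```r
library(stplanr)

od_long <- data.frame(o = c("a", "a", "b"),
                      d = c("b", "c", "c"),
                      value = c(1, 2, 3))

m <- od_to_odmatrix(od_long)  # long pairs -> origin x destination matrix
odmatrix_to_od(m)             # and back to long pairs
```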
Also: now that you've shown how easy it is to create hexes, od should gain a hex...
For the time being I think this issue may be 'fixed'. Great job on the package, John; glad to have given it a spin in rOpenSpain/spanishoddata#65
Great! Thanks for the positive feedback and the discussion!
Great to hear, good outcome, Johannes (apologies for the typo in your name previously; just checked the DESCRIPTION)!
Thanks! You got it right; Johannes is just the German version of John 😊