This repository has been archived by the owner on Nov 26, 2018. It is now read-only.

determine how to de-dupe BG readings #109

Open
jebeck opened this issue Jun 6, 2014 · 10 comments

Comments

@jebeck

jebeck commented Jun 6, 2014

Once we expand the types of self-monitored blood glucose readings we are pulling into our data platform (as discussed in tidepool-org/tidepool-org.github.io#17), we will need a mechanism for preventing the user from being visually bombarded with duplicate BGs (for example, because we get a BG event both directly from the user's meter and from the bolus wizard, where the user entered that same reading).

There are many ways to address this issue, among them (non-exhaustively):

  • only visualize certain types (i.e., don't visualize manually entered wizard BGs at all)
  • server-side logic to "link" suspected duplicate events combined with client-side logic to display only one of a set of linked events
  • client-side logic to determine which events could be duplicate and a priority system for displaying only one

jebeck changed the title from "determine how to de-dupe wizard and smbg readings" to "determine how to de-dupe BG readings" on Jun 6, 2014

@cheddar
Contributor

cheddar commented Jun 6, 2014

My vote is on "only visualize certain types" for now.

It seems like that is the shortest time-to-value path for this. Then, as we run into problems with visual artifacts from that, we can decide what should be done next.
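
A minimal sketch of the "only visualize certain types" option, assuming each event is a plain dict with hypothetical `type`, `value`, and `time` fields (the field names and the `visible_bg_events` helper are illustrative, not taken from the platform's data model):

```python
# Assumption: only meter readings ("smbg") are drawn; wizard-entered BGs are hidden.
VISUALIZED_BG_TYPES = {"smbg"}


def visible_bg_events(events, allowed_types=VISUALIZED_BG_TYPES):
    """Return only the events whose type has been chosen for display."""
    return [event for event in events if event["type"] in allowed_types]


# Hypothetical sample data: the same reading seen via the meter and the wizard.
events = [
    {"type": "smbg", "value": 134, "time": "2014-06-06T12:00:00"},
    {"type": "wizard", "value": 134, "time": "2014-06-06T12:01:00"},
]
shown = visible_bg_events(events)
```

The server still delivers every event here; the filter only decides what gets drawn, so changing the set of visualized types later would not require touching stored data.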

@jebeck
Author

jebeck commented Jun 6, 2014

SGTM, @cheddar

@kentquirk
Contributor

Here's an idea:

If there are multiple events within a couple of minutes that have similar
readings, then we aggregate them into a single dot but show all the values
if you hover. Basically, if things would cause multiple dots to overlap a
lot then we collapse them (and maybe indicate it with a 2 in the dot or
some other indicator of specialness).
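
One way this aggregation might look, as a rough sketch only: the two-minute window, the 10 mg/dL tolerance, the dict fields, and the `aggregate_nearby` name are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def aggregate_nearby(events, window=timedelta(minutes=2), tolerance=10):
    """Collapse events that are close in time and value into one dot each.

    Greedy simplification: each event is compared only with the first event
    of the most recent group, so interleaved readings are not merged.
    """
    groups = []
    for event in sorted(events, key=lambda e: e["time"]):
        anchor = groups[-1][0] if groups else None
        if (anchor is not None
                and event["time"] - anchor["time"] <= window
                and abs(event["value"] - anchor["value"]) <= tolerance):
            groups[-1].append(event)
        else:
            groups.append([event])
    # One dot per group; "count" is the number to show inside the dot and
    # "values" is what a hover tooltip would list.
    return [
        {"time": g[0]["time"], "count": len(g), "values": [e["value"] for e in g]}
        for g in groups
    ]


# Hypothetical readings: two close together, one hours later.
readings = [
    {"time": datetime(2014, 6, 6, 12, 0), "value": 134},
    {"time": datetime(2014, 6, 6, 12, 1), "value": 140},
    {"time": datetime(2014, 6, 6, 15, 0), "value": 90},
]
dots = aggregate_nearby(readings)
```

With these sample readings the first two collapse into a single dot carrying a count of 2, and the later reading stays on its own.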


Kent Quirk
VP of Engineering, Tidepool

Tidepool is an open source, not-for-profit effort to build an open data
platform and better applications to reduce the burden of Type 1 Diabetes.

@jebeck
Author

jebeck commented Jun 9, 2014

For clarification, @kentquirk, is your suggestion specifically for a client-side or a server-side solution, or are you still agnostic on that point?

Otherwise I like the idea - I also think eventually it'll be useful to expose readings that you entered into your CGM for calibration, which are almost guaranteed to be duplicates (assuming we're also getting them from the meter). This will be helpful for knowing what to trust when the CGM trace and the fingerstick diverge (if you entered a fingerstick as a calibration, it's probably the CGM that's off).

@cheddar
Contributor

cheddar commented Jun 10, 2014

I agree on the calibration thing, Jana; that's why deviceMeta has a calibration event that is defined as CGM calibrations ;).

On finding overlaps: when we need to, it should be the visualization that makes the choice not to show data, not the server making the choice not to deliver data.

@jebeck
Author

jebeck commented Jun 10, 2014

Ah, @cheddar I think you misunderstand what I believe the server-side task would be. One way to do it, I believe, would be to link all suspected duplicates with a duplicateGroupId property or something, so the client side doesn't have to do any computation, only referencing that property and having some system of priority for determining which of the duplicate group gets visualized (with info on all of them potentially appearing in tooltip, as @kentquirk suggested).
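
A sketch of what the client side of this could look like, assuming the server has already attached a `duplicateGroupId` to suspected duplicates. The priority table, the dict fields, and the `pick_displayed` helper are hypothetical:

```python
# Assumption: a lower number wins, so a direct meter reading ("smbg") is
# displayed in preference to a manually entered wizard BG.
SOURCE_PRIORITY = {"smbg": 0, "wizard": 1}


def pick_displayed(events):
    """Choose one event to draw per duplicate group.

    Returns a list of {"shown": event, "linked": [other events]} entries;
    the linked events are available for a tooltip. Events that carry no
    duplicateGroupId are always shown on their own.
    """
    groups = {}
    result = []
    for event in events:
        group_id = event.get("duplicateGroupId")
        if group_id is None:
            result.append({"shown": event, "linked": []})
        else:
            groups.setdefault(group_id, []).append(event)
    for members in groups.values():
        # Unknown types sort after every known type.
        ranked = sorted(
            members,
            key=lambda e: SOURCE_PRIORITY.get(e["type"], len(SOURCE_PRIORITY)),
        )
        result.append({"shown": ranked[0], "linked": ranked[1:]})
    return result


# Hypothetical payload: one linked pair and one standalone reading.
payload = [
    {"type": "wizard", "value": 134, "duplicateGroupId": "g1"},
    {"type": "smbg", "value": 134, "duplicateGroupId": "g1"},
    {"type": "smbg", "value": 140},
]
displayed = pick_displayed(payload)
```

The client does no duplicate detection of its own in this sketch; it only reads the property and applies the priority order, and every event is still delivered.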

@kentquirk
Contributor

I was presuming something like this would be de-duped on the client side; the client is the place where you can best tell if the dots would overlap.

Jana, I believe you are proposing that some duplicate-detection code be run server-side to decorate the data points with an indicator that they're possibly duplicated. I wasn't proposing that we be that sophisticated -- my algorithm was just "if the points would overlap on the display by 'too much', collapse them into one, and have both the tooltip and the graphic indicate that there was an overlap." There's no judgement required, but what is required is some knowledge of how the information is displayed. In other words, if the dots get bigger, the algorithm gets adjusted.
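
A rough sketch of that display-driven rule, assuming each point already has pixel coordinates and every dot shares one radius. The `collapse_overlapping` name, the `max_overlap` parameter, and the linear definition of "too much" are illustrative assumptions:

```python
import math


def collapse_overlapping(points, radius, max_overlap=0.5):
    """Merge dots whose centers are closer than the allowed distance.

    Two dots of the given radius touch when their centers are 2 * radius
    apart. Here "too much" overlap means the centers are closer than
    2 * radius * (1 - max_overlap), so bigger dots collapse sooner.
    """
    threshold = 2 * radius * (1 - max_overlap)
    clusters = []
    for point in sorted(points, key=lambda p: p["x"]):
        for cluster in clusters:
            distance = math.hypot(point["x"] - cluster["x"],
                                  point["y"] - cluster["y"])
            if distance < threshold:
                cluster["values"].append(point["value"])
                break
        else:
            # No existing dot is close enough, so this point starts a new one.
            clusters.append(
                {"x": point["x"], "y": point["y"], "values": [point["value"]]}
            )
    return clusters


# Hypothetical pixel positions for three readings.
pixels = [
    {"x": 100, "y": 50, "value": 134},
    {"x": 103, "y": 48, "value": 140},
    {"x": 200, "y": 80, "value": 90},
]
big_dots = collapse_overlapping(pixels, radius=7)
small_dots = collapse_overlapping(pixels, radius=2)
```

The same three points give two dots at radius 7 and three dots at radius 2, which is the sense in which the result depends on how the data is drawn rather than on the data alone.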

@jebeck
Author

jebeck commented Jun 10, 2014

I'm not sure the logic you're describing, @kentquirk, is something I'd agree with. It sounds like the logic you're proposing is sensitive to the visual overlap itself. That is not the type of de-duplication I think we should be doing; I think we should be doing semantic de-duplication based on (a) identical readings and (b) near-identical timestamps.

I don't believe we want to prevent displaying overlapping SMBG circles if you test your BG twice in a row within a short span of time (to calibrate your Dexcom, perhaps) and get a slightly different reading each time - 134 mg/dL and 140 mg/dL, for example. Those are separate, independently valid SMBG events, and having both might even be useful ("Oh, I checked my BG twice in a row here, that's probably when I calibrated a new Dexcom sensor.") unlike when you enter an SMBG into the bolus wizard or we pull an SMBG from both your meter directly and from the pump because it was a linked meter.
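
A minimal sketch of semantic de-duplication under criteria (a) and (b), with an assumed five-minute tolerance for "near-identical" timestamps. The dict fields and the `semantic_duplicates` name are hypothetical:

```python
from datetime import datetime, timedelta


def semantic_duplicates(events, max_gap=timedelta(minutes=5)):
    """Group events that have identical values and near-identical times.

    Readings with different values are never grouped, no matter how close
    together they are, so two back-to-back fingersticks stay separate.
    """
    groups = []
    for event in sorted(events, key=lambda e: e["time"]):
        for group in groups:
            first = group[0]
            if (first["value"] == event["value"]
                    and event["time"] - first["time"] <= max_gap):
                group.append(event)
                break
        else:
            groups.append([event])
    return groups


# Hypothetical data: one reading seen twice, plus a genuine second test.
bg_events = [
    {"type": "smbg", "value": 134, "time": datetime(2014, 6, 6, 12, 0)},
    {"type": "wizard", "value": 134, "time": datetime(2014, 6, 6, 12, 2)},
    {"type": "smbg", "value": 140, "time": datetime(2014, 6, 6, 12, 3)},
]
grouped = semantic_duplicates(bg_events)
```

In this sample the meter and wizard copies of 134 mg/dL land in one group, while the 140 mg/dL reading taken a minute later remains an independent event.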

@kentquirk
Contributor

The problem as stated was "visual clutter". What I'm proposing reduces that without making any judgement about the data. It basically converts multiple overlapping dots to a single dot with an indicator of overlap (perhaps a number inside). So if you saw a dot that looked like (2) in place of ( ), you'd know that there were 2 readings there. And if you hover over that (2) you'll see the 134 and 140.

Which problem are we really trying to solve?

@jebeck
Author

jebeck commented Jun 13, 2014

@kentquirk I agree that the question of which problem(s) we're trying to solve is the question at issue here. I think the semantic de-duplication is far more important than the display de-duplication. The display de-duplication also presents significant challenges since I believe it would have to be code that runs as part of rendering (to query actual pixel distance between points, assuming our eventual goal of a responsive visualization). Semantic de-duplication could be part of the data pre-processing step that only happens once per dataset and doesn't affect anything but initial rendering time.

3 participants