Discussion for tag list groups #115
Comments
Okay first I'd like to clarify my understanding. This section:
Says that for each named tagvaluelist, we can ultimately expect a list of strings. The commands relevant are:
But how is that different from:
Other than being included in this list first? I think what would be most intuitive is to add this set of key values (the name of the tag list and the list itself) to the item lookup, and then treat it as a variable. I'm taking your example and modifying it because we need an identifier for a taglist. What is still missing is a way to say "This is going to remove based on checking values, and not field names" - do you have a suggestion?
I was going to add VALUE after REMOVE, but I think there could be a case of a field called VALUE, and then the recipe format would break because it indicates two different things. The above is also assuming it's called a "tagvaluelist" - maybe we can think of something better. Mixing Python code in with the header doesn't directly tell the user where that function is from, which is why I didn't choose your second example. But for the example above, if we used patient_info, it would hit a regular expression.
Also taking into account what a more full recipe looks like (https://github.com/pydicom/deid/blob/master/examples/deid/deid.dicom) with filters:
We are basically proposing to add a section that is general and lets the user define a list of things based on some parsing / criteria. This means that reading in the recipe would store the actions with the recipe object, but the actual derivation of the list would be run at the onset of reading each dicom. I'm wondering if we would want to make this ability to define groups of tags as something more general that can be potentially used in other functions for deid (e.g., filter groups). For example, what if instead of
If we allow for expansion of field names, we should allow them here too.
The use of group implies a list, even if it's just one item. And then in the recipe, it's referenced as a group, and used in a way that is allowed.
And then VALUE would be a special case, but again, it could be a field so that might not be the perfect solution. But since it's general, if/when it's requested, it could be used as a filter parameter as well:
This isn't something we'd develop now, but the idea is that we should be able to extend the usage if requested, and not have some variable name that is hard coded just for the header section. Technically, another section is not nested in header, and shouldn't just be associated with it. Let me know your thoughts @wetzelj.
Your understanding of SPLIT and FIELD is spot on. I would think that for both of these we should trim leading/trailing whitespace and eliminate empty strings. As far as the default parameters go, I think that using a default minlength=1 and a default split value of something would be acceptable. In my mind, there are really two values that could reasonably be the default split value - either empty space or ^. I'm making an assumption that the ^ character is used in the header fields because this is how the field would be received in an HL7v2 message. But honestly, I'd probably flip a coin on this decision. Regarding REGEX... Functionally, specifying:
is exactly the same as:
The only benefit/reason to potentially allow the REGEX type in the tagvaluelist is for logical grouping. If we wanted to create a group to handle other patient information, it creates a nice grouping to encapsulate the REGEX pattern for the master patient identifier in with the other patient info rather than using a separate
If I understand this correctly, it's mainly about ensuring that we're visually clear in the action entries
It doesn't follow any other pattern in deid, but the similarity to an Excel-type function may be widely understood. I think that expanding this functionality to a full recipe would be awesome. The only thing that I was a little unclear on was the use of the VALUE keyword. Is my understanding below correct? With the
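As a rough illustration of the SPLIT behaviour agreed on above (trim leading/trailing whitespace, drop empty strings, default minlength=1), here is a minimal Python sketch. The function name `split_field_value` and its signature are hypothetical and are not part of deid; the default delimiter of `^` is just one of the two candidates mentioned, picked arbitrarily for the example.

```python
def split_field_value(value, by="^", minlength=1):
    """Split a header value into a list of cleaned strings.

    Hypothetical helper for illustration only; not deid's API.
    `by` defaults to "^" here purely as an assumption.
    """
    # Split on the delimiter and trim whitespace around each piece.
    parts = (part.strip() for part in str(value).split(by))
    # Drop empty strings and anything shorter than minlength.
    return [part for part in parts if part and len(part) >= minlength]


print(split_field_value("DOE^JOHN^^ A "))                    # ['DOE', 'JOHN', 'A']
print(split_field_value("John Doe", by=" ", minlength=4))    # ['John']
```

Raising minlength is one way to keep very short fragments (initials, single letters) out of the resulting list, which matters if the list is later used for substring matching.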
The typical usage for a REMOVE is to target a FIELD as the second variable, e.g.,
However in the way you suggested it, you are providing a list of values, but they aren't for fields - they are for a list of values that could be found in any field. But if we wrote this:
intuitively that reads as "Remove the group of fields represented in patient_info" and not "remove the group of fields that include any values from patient_info". Thus, we would need a way to make it clear that we are searching over values and not fields. Even saying this:
Would be better, to say "Remove all fields that are in the group patient_info". But that's probably not clear enough, because group still could be a list of values, no? Actually, now that I think of it, even saying:
Is confusing because it's not clear if we want the field name itself, or the value inside. We might actually want:
to explicitly say the value. And then for the action, deid doesn't know the difference between a group of values vs. fields (or even potentially both if the user does something weird). So I had suggested adding VALUES to indicate this:
But that's not very good because VALUES could be a field name, potentially. But what if we just simplified it and made the group type explicit?
and then saying
"Remove all values that include anything in the list patient_info" and would be different than
"Remove all fields that are in the list patient_info". In summary:
Ah, and I hate Microsoft products, so I am definitely not a fan of the Excel function look, haha. :) I'm closing up shop soon, so likely I won't respond again until tomorrow. Let me know your thoughts on the above!
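To make the two readings distinguished above concrete, here is a small Python sketch that contrasts removing by a group of field names with removing by a group of values. It is only an illustration under stated assumptions: a plain dict stands in for a DICOM header, the function names `remove_by_fields` and `remove_by_values` are hypothetical, and case-sensitive substring matching is an assumption rather than anything decided in this thread.

```python
def remove_by_fields(header, field_group):
    """Drop every field whose NAME is in field_group (hypothetical helper)."""
    names = set(field_group)
    return {name: value for name, value in header.items() if name not in names}


def remove_by_values(header, value_group):
    """Drop every field whose VALUE contains any string in value_group.

    Hypothetical helper; substring matching is an assumption.
    """
    needles = [needle for needle in value_group if needle]
    return {
        name: value
        for name, value in header.items()
        if not any(needle in str(value) for needle in needles)
    }


# A plain dict standing in for a DICOM header (illustrative data only).
header = {
    "PatientName": "DOE^JOHN",
    "StudyDescription": "Follow-up for JOHN",
    "Modality": "MR",
}

print(remove_by_fields(header, ["PatientName"]))
# {'StudyDescription': 'Follow-up for JOHN', 'Modality': 'MR'}
print(remove_by_values(header, ["DOE", "JOHN"]))
# {'Modality': 'MR'}
```

The same group name could plausibly mean either thing, which is why making the group type explicit in the recipe removes the ambiguity.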
This approach is great! It keeps the recipe clear and explicit while providing the flexibility needed. Let me know how you would like to proceed from here. In one of your prior responses, you mentioned that you would "likely want to take charge of the work"; however, I'd like to assist in any way you think would be beneficial.
I should be able to make time this weekend to get this underway! I will need huge help to test, possibly write a few new tests, and two issues I’m interested in about updating the version requirement for pydicom and lifting for matplotlib. I’m hugely busy this morning as I have meetings and a podcast recording, but hopefully might even be able to get started this evening. How would you like to proceed with #113 - I’d possibly like to include this feature before this next refactor and it would be good to get it tested and reviewed.
I've done some preliminary testing on #113 and will be performing some additional testing this afternoon. In general, I think it is good functionality to have at our disposal. We'll definitely be able to help with testing. In general, are you okay with me opening other issues? I'd like to get an issue out there for the Private Tag inclusion.
Yes 100%! I think here is how I see next steps:
In the last point, figuring out if we can update pydicom is important as well.
Just a quick note - I'm thinking instead of
Is there any reason to not allow any number of parameters after
But hmm I'm rethinking this now - it would get weird with a space within the comma. The
Another question - would a
Good thought. I like the way that reads. As for the # as a delimiter - personally, I think it’s an okay limitation to say it can’t be used as the delimiter.
Groups were added in #120.
This is discussion continued from #112 (comment).
I agree... when I came back into the office this morning and re-read, I didn't like it either. Really, I think I merged two purposes (scanning the tag values and identifying the targeted tag identifiers). For the sake of clarity, I'm going to try again with an example. Which I hope will also illustrate where the additional section could be beneficial. At the moment, I'm also specifically ignoring a potential wholesale refactor of the replacement. I need to do more studying of the current process to make informed comments... I'll continue to think on that point.
Sample Header
For the sample project, the goal would be to remove patient identifiers, but there's also a need to replace any references to the scanner's vendor, make, and model. In the prior suggestion, I discussed a new section called %scandefinition. I think a more appropriate name for this section would be %tagvaluelist.
Proposed Recipe (new sections)
The goal of these new sections would be to create lists of strings (processed tag values) that could be referenced by name in the header section. At this point, the tagvaluelist sections would simply be creating two lists of strings, but at runtime, these would be evaluated to image-specific lists of tags:
The existing header section pattern would remain unchanged ([ACTION] [FIELD] [VALUE]), except that, where the FIELD currently must be either a named header field or the keyword ALL, tagvaluelist names could now also be specified in actions.
Proposed Recipe (%header)
I'm listing two options here... these are referenced below in the implementation/processing replacement section.
Option A:
Option B:
Implementation/Processing Replacement
(again, pushing off a refactor for performance for the time being)
Within replace identifiers, before processing actions, we would need to convert the tagvaluelists into specific tags to be acted on. This would need to occur before actions were applied to the dicom file, and given our sample would create the following tag lists using the tagvaluelists. The recipe options A and B that I listed above would just drive what we think is most clear in the recipe - would we want an implicit or explicit conversion from tagvaluelists to tag values? Regardless of the option chosen, the tagvaluelists could be converted to actual tags:
At this point, the standard recipe actions could be performed in pretty much exactly the same way as a rule like REMOVE ALL or REPLACE ALL var:myvar works today. The only difference is that the ALL keyword would be replaced by a different value tied to the tagvaluelist, which evaluates to a subset of the tags.
The Results
Given these adjustments, the final deidentified header would look like this: