Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error in Fast*Filter #1834

Merged
merged 2 commits into from
Jan 10, 2025
Merged

Conversation

stevenyoungs
Copy link
Contributor

get_raw_person_data now returns a DataDict.
Update FastMaleFilter.match() and FastFemaleFilter.match() accordingly to avoid a runtime error

Steps to reproduce the problem:

  1. start a new tree
  2. add a family
  3. in the EditFamily dialog, click the select button to choose the Father

The following error is displayed:

 File "C:\msys64\home\steve\gramps\gramps\gui\editors\editfamily.py", line 421, in match
    return value[2] == Person.MALE
           ~~~~~^^^
KeyError: 2

@stevenyoungs
Copy link
Contributor Author

@dsblank do you want to review this one?

I'm not sure how we can identify similar problems. A brute force search perhaps "value[[0-9]+]"?
Would the type hint changes @DavidMStraub is proposing help find this class of problem? Presumably not since value["gender"] is perfectly valid.

Copy link
Member

@dsblank dsblank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@DavidMStraub
Copy link
Member

That's exactly what I meant here #1828 (comment)

I suspect we'll be seeing a lot of AttributeErrors or alternatively have to sprinkle the code with assert hasattr. (And it will be impossible to type check.)

Yes, type hints would help, since if DataDict is an allowed argument, it would tell you that a DataDict does not have keys.

But the problem is that type hints cannot tell which primary object type the DataDict belongs to, which is why one would have to add an additional assert here to make sure it is a person DataDict and not an event or something else. Which is what I meant with the second part above.

In fact, in my opinion adding type hints throughout the code (which I have volunteered to do but is blocked by #1717 not being merged) would have the benefit of identifying such cases (and I'm sure there are more lurking in other functions).

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

I'm not sure how we can identify similar problems. A brute force search perhaps "value[[0-9]+]"?

I think I missed this one because it used value instead of data.

Would the type hint changes @DavidMStraub is proposing help find this class of problem? Presumably not since value["gender"] is perfectly valid.

I am a big fan of type hints, and used them throughout the filter refactor for the database and arguments to apply_to_one. This set of changes is harder because:

  1. we need to add type hints to the raw function return values, and harder:
  2. we would need to add type hints to the arguments (or return value) of the function that the raw function is used (in order to force the hints to be checked)

I think for now, just visual inspection and testing by running the code will help find these. I think it is mostly in the editors that we need to triple check.

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

it would tell you that a DataDict does not have keys.

Actually, like the name implies, DataDict is a dict and does have keys. We could of course change the API, and even make different versions for each primary object, but I don't think it will be worth it because we aren't going to use the raw data any more than what it is already used.

@stevenyoungs
Copy link
Contributor Author

But the problem is that type hints cannot tell which primary object type the DataDict belongs to, which is why one would have to add an additional assert here to make sure it is a person DataDict and not an event or something else. Which is what I meant with the second part above.

Great. My background is primarily statically typed languages so I personally like the safety net that static type checking provides..

Conceptually at least a PersonDataDict, EventDataDict etc. would take care of that

@stevenyoungs
Copy link
Contributor Author

I'm not sure how we can identify similar problems. A brute force search perhaps "value[[0-9]+]"?

I think I missed this one because it used value instead of data.

I've updated the PR to change the variable name to data as well.

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

$ git grep "value\[" | grep editors
gramps/gui/editors/displaytabs/repoembedlist.py:        value[0].ref = value[1].handle
gramps/gui/editors/displaytabs/repoembedlist.py:        data.append(value[0])
gramps/gui/editors/displaytabs/repoembedlist.py:        self.callman.register_handles({"repository": [value[1].handle]})
gramps/gui/editors/editfamily.py:        return value[2] == Person.MALE
gramps/gui/editors/editfamily.py:        return value[2] == Person.FEMALE

suggests that the are uses in repoembedlist.py ... I'll check those.

@DavidMStraub
Copy link
Member

Actually, like the name implies, DataDict is a dict and does have keys.

Yes, but if it's typed as dict[str, Any], some_dict[0] would be caught because the keys cannot be integers.

  • we need to add type hints to the raw function return values, and harder:
  • we would need to add type hints to the arguments (or return value) of the function that the raw function is used (in order to force the hints to be checked)

Yes.

@stevenyoungs
Copy link
Contributor Author

stevenyoungs commented Dec 31, 2024

Actually, like the name implies, DataDict is a dict and does have keys. We could of course change the API, and even make different versions for each primary object, but I don't think it will be worth it because we aren't going to use the raw data any more than what it is already used.

On the flipside, any external code that uses get_raw_*_data needs updating as well. Type checking would help find these. Other than a little typing to create the classes, is there a downside to having PersonDataDict, EventDataDict etc. They can all rely on the super class DataDict for implementation
There are alsoiseveral places in the addons-source repo that need updating. For example
change = -self.dbstate.db.get_raw_person_data(handle)[17]

and then code external to gramps-project

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

There are at several places in the addons-source repo that need updating. For example
change = -self.dbstate.db.get_raw_person_data(handle)[17]

As soon as we make the Gramps 6.0 branch for addons, I'll go through those.

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

repoembedlist.py looks good, just GUI code I believe. It would be great to have type hints in this GUI code, or maybe even to rewrite this. I had a professor in college who said that if you have any numbers in your code, something is wrong.

@dsblank dsblank added the bug-fix A PR that fixes a bug label Dec 31, 2024
@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

@Nick-Hall, what's the policy on merging small bug-fixes like this one? Is there a minimum amount of time that needs to pass? BTW, I still have merge rights. I'd be glad to help with these small PR fixes by reviewing and merging.

@emyoulation
Copy link
Contributor

Since an "Other" gender was recently added, may I ask if a FastOtherFilter class should exist?

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

Since an "Other" gender was recently added, may I ask if a FastOtherFilter class should exist?

That's a good question. We have a bunch of related issues:

  1. But how should OTHER be used?
  2. In the mother editor, only those that are FEMALE are selected
  3. Is a "mother" FEMALE or OTHER? If so, we can just change this code. Same for father.
  4. We probably shouldn't even use the term "mother" as "Spouse Minor updates to readme #2" in a family

On another level, I'm note sure we should even use these Fast*Filters. They drop down to the raw level, by bypassing any proxy. If a proxy hid the gender, this code breaks that barrier.

The use of raw data should be thoroughly examined in the codebase.

@stevenyoungs
Copy link
Contributor Author

On another level, I'm note sure we should even use these Fast*Filters. They drop down to the raw level, by bypassing any proxy. If a proxy hid the gender, this code breaks that barrier.

Won't the Fast*Filters respect whatever is returned by the db they are passed?
From quickly looking at the code, ProxyBase.get_raw_person_data appears to do the right thing.

@dsblank
Copy link
Member

dsblank commented Dec 31, 2024

From quickly looking at the code, ProxyBase.get_raw_person_data appears to do the right thing.

Oh, wow! I completely missed that. Of course it is slower than molasses, but it would work.

It does bother me that it isn't easy to see all of these costs building up. This function will get the raw data, turn it into an object, sanitize it, and then turn it back into raw data.

[I wonder how fast it is having to do all of that in a proxy]

@Nick-Hall
Copy link
Member

what's the policy on merging small bug-fixes like this one? Is there a minimum amount of time that needs to pass?

The current policy is that bug fixes should remain open for at least 1 week before merging. Enhancements should remain open for at least three weeks to give time for all developers to comment. Sometimes people can be travelling or have limited internet access for other reasons.

BTW, I still have merge rights. I'd be glad to help with these small PR fixes by reviewing and merging.

Thanks for volunteering.

The workflow that I have used in the past is to add the "Approved" label to a PR when it is ready for further review and testing. It indicates that I have no fundamental objections to the PR, but haven't had time to look at it in detail yet. I'll start going through the open PRs and adding these labels tomorrow.

@Nick-Hall
Copy link
Member

Is a "mother" FEMALE or OTHER? If so, we can just change this code. Same for father.

In the family editor, father is male and mother is female. These will list the most likely results. There is always the "All" checkbox to select from everyone. This is useful for same sex relationships and also to select a person who is not recorded as male or female.

We probably shouldn't even use the term "mother"

Generally "spouse" is going to be preferable to "husband" or "wife" and "parent" preferable to "father" or "mother". However, the family editor is used for families that may or may not have children and the partners may or may not be of the same sex. Do we need these headings in the family editor?

@jralls
Copy link
Member

jralls commented Dec 31, 2024

Generally "spouse" is going to be preferable to "husband" or "wife"

Maybe. Partner gets used a lot, especially but not exclusively with LGBTQ+ families. I know quite a few marriages with two husbands or two wives.

and "parent" preferable to "father" or "mother".

That gets a bit tricky when dealing with Y and Mt DNA. For the foreseeable future everybody gets one biological father and one biological mother, and that's the family that most—even LGBTQ+—genealogists want to trace.

However, the family editor is used for families that may or may not have children and the partners may or may not be of the same sex. Do we need these headings in the family editor?

The current headings in the Family Editor are Father/Partner1 and Mother/Partner2. That's a bit clumsy. As you say there are families with no children and therefore no Father or Mother at all.

How about letting the user choose what the family members are titled?

@emyoulation
Copy link
Contributor

I like when the user can choose.

The Family object editor looks a bit "unfinished" with the current "Father/partner1" and "Mother/partner2" labels.

The use of "Spouse" would be a misnomer when the Family's Type is "Unknown" or "Unmarried"

@stevenyoungs
Copy link
Contributor Author

a couple of points:

  1. given that this fixes a regression in master, personally I'd prefer to see it merged as is and the discussion on further changes to the Family \ Fast*Filter classes moved to a separate PR
  2. whilst adding options sounds appealing, every option comes with a long term maintenance cost (keeping it working, documenting etc.). This option would probably have to be per family as well.
    Can we devise a heuristic that does the right thing? Is the label signifying the relationship between the two adults or the relationship between the adult(s) and children?
    The table below is incomplete but hopefully gives an idea. gramps would use the first 4 columns to determine which label to display. Also need to consider translation - does this scheme support all other languages?
Person 1 Gender Person 2 Gender Children? Relationship Type Label 1 Label 2
Unknown Unknown No * Partner 1 Partner 2
Unknown Unknown Yes * Parent 1 Parent 2
Male Female No Married Husband Wife
Male Female No * Partner 1 Partner 2
Male Female Yes * Mother Father
Male Male No Married Husband 1 Husband 2
Male Male No * Partner 1 Partner 2
Male Male Yes Father 1 Father 2
Female Female No * Partner 1 Partner 2
Female Female No Married Wife 1 Wife 2
Female Female Yes Mother 1 Mother 2

@dsblank
Copy link
Member

dsblank commented Jan 1, 2025

I'd prefer to see it merged as is and the discussion on further changes to the Family \ Fast*Filter classes moved to a separate PR

Agreed. (This is an instance where it would be nice to enable github "Issue Tracker" so we can easily reference these conversations and side issues)

@Nick-Hall
Copy link
Member

Nick-Hall commented Jan 1, 2025

given that this fixes a regression in master, personally I'd prefer to see it merged as is and the discussion on further changes to the Family \ Fast*Filter classes moved to a separate PR

Yes. Please create a separate PR for the enhancement.

This is an instance where it would be nice to enable github "Issue Tracker"

I'll create a post on the gramps-devel mailing list to discuss this. See Proposal to use GitHub Issues.

@emyoulation
Copy link
Contributor

emyoulation commented Jan 1, 2025

The heuristic looks interesting. Although I'd worry about someone deciding (retroactively) that the gender-guessing and surname-guessing had to "go away" when adding a person to a family. Citing that the gender of the Partner position was no longer a 'dependable' (as opposed to 'reasonable') assumption.

I find those "guessing" features valuable in reducing data entry burden. (In fact, I added a request that "Patrilineal" surname "origin" guessing be added to the "Father's Surname" option in Preferences-Data-Input options-Surname guessing automation.)

Discussed further in a discussion about SuperTool script with an Origin calculation example: https://gramps.discourse.group/t/tree-vivisection-experiments-with-the-isotammi-supertool/1621

@stevenyoungs
Copy link
Contributor Author

The heuristic looks interesting. Although I'd worry about someone deciding (retroactively) that the gender-guessing and surname-guessing had to "go away" when adding a person to a family. Citing that the gender of the Partner position was no longer a 'dependable' (as opposed to 'reasonable') assumption.

I could not follow the jump to potential gender \ surname-guessing "going away". My idea was to use a heuristic to determine the correct label to show within the dialog. If the label is wrong in a corner case, it does not impact the users research data in an way. By "label" I mean the two highlighted phrases in this image:
image

I find those "guessing" features valuable in reducing data entry burden.

I fully agree, these are a massive time saver.

@giotodibondone
Copy link

@stevenyoungs

I could not follow the jump to potential gender \ surname-guessing "going away". My idea was to use a heuristic to determine the correct label to show within the dialog. If the label is wrong in a corner case, it does not impact the users research data in an way. By "label" I mean the two highlighted phrases in this image:

Resulting discussion and feature request.

https://gramps.discourse.group/t/labels-in-edit-family-dialog/6700

#1843

Adjust code now that get_raw_person_data returns a DataDict
@Nick-Hall Nick-Hall merged commit 0c517db into gramps-project:master Jan 10, 2025
2 checks passed
@stevenyoungs stevenyoungs deleted the fast_filter branch January 19, 2025 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix A PR that fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants