Find Duplicate Individuals with no family name

Posted: 13 Feb 2023 14:39
by JoopvB
Hi Mike,
Your Find Duplicate Individuals plugin is one of my favorites, but as I get further back in time most people have no family name. Here in Holland, at least, a patronymic is then used to identify people. I store the patronymic as the last part of the first name (as it should be, in my opinion).

This works well but I can't get the plugin to work effectively with patronymic names and no last name.

My question is: is there a (hidden) trick/setting in the plugin to make it compare patronymic names effectively (i.e. not finding that nearly all people without a last name are duplicates)?

And if not, might it be added to your to-do list for the foreseeable future?

Thanks, Joop

Re: Find Duplicate Individuals with no family name

Posted: 13 Feb 2023 17:33
by tatewise
Joop, I've had a quick look at this and think I need more details before investigating further.

I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.
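
As a rough illustration of that change, here is a minimal Python sketch. It is not the plugin's own script: the function name, the `treat_both_missing_as_match` flag, and the plain case-insensitive comparison are assumptions made for this example, and only the default score of 7 comes from the description above.

```python
# Hypothetical sketch of the surname scoring tweak; not the plugin's code.
SURNAME_SCORE = 7  # default surname score quoted in this post


def surname_score(surname_a, surname_b, treat_both_missing_as_match=False):
    """Score the surname part of a name comparison.

    Assumption: surnames are compared exactly, ignoring case and
    surrounding spaces. The real plugin may compare them differently.
    """
    a = surname_a.strip().lower()
    b = surname_b.strip().lower()
    if not a and not b:
        # The experimental change: two missing surnames count as a match.
        return SURNAME_SCORE if treat_both_missing_as_match else 0
    if not a or not b:
        # Only one person has a surname, so there is nothing to match.
        return 0
    return SURNAME_SCORE if a == b else 0


# Two people without a surname, before and after the change.
print(surname_score("", ""))                                     # 0
print(surname_score("", "", treat_both_missing_as_match=True))   # 7
```

With the flag off, two people without surnames gain nothing from the surname part; with it on, they gain the same 7 points as any other matching pair of surnames.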

So in my tests, there is little difference between the published plugin and the modified plugin.
They both take into account Forenames that have a matching patronymic part together with all the other Event and close Relative matching details.
Therefore, if two Individual records have the same patronymic Forename, and similar Events, and similar Relatives, then they will get a fairly high score and be listed as duplicates, but should not match well with others who also have no Surname.

So, I don't really understand why you say the plugin is finding that nearly all people without a last name are duplicates.
A screenshot of the Result Set scores might make things clearer.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 11:03
by JoopvB
In essence my point is that a patronymic is like a temporary (for 1 generation) last name. If the plugin could somehow attach a score to that then, I guess, the resulting output would be extra valuable.

I've attached a screenshot of the output with the plugin settings at their defaults.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 12:52
by tatewise
Sorry, but you need to explain what problems you have with that Result Set, which looks OK to me.
I don't see lots of people without a Surname being matched as duplicates.
Which of the names displayed are patronymics?

Currently, each matching Forename is given a score (6 by default I think).
So if the patronymic of one person is the same as the patronymic of another then they gain that score.

In that Result Set they are all very low scores and mostly based on some name partial match and partial event match.
So I'm not sure what higher scores you are expecting.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 14:19
by JoopvB
All names shown only in small print are individuals without a last name.

My "problem" isn't really a problem, as the plugin is very helpful. It's just that patronymics are important when dealing with older names, and since FH has no separate field for them I store them as the last part of the first name. This works fine, and my question really is:
a. is there a best setting for the plugin that would give extra weight to the last part of the first name when there is no last name, resulting in a high score?
b. if not a, would it be an option to extend the plugin to handle people without a last name in such a way that, if the first name has more than 1 part, the last part can be given a separate weight, since that last part may be a patronymic?
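
To make option b concrete, here is a minimal Python sketch of the idea. Everything in it is hypothetical: the function names, the separate patronymic weight of 7, and the exact-match comparison are assumptions for illustration, and the forename score of 6 is only the default mentioned earlier in this thread. It is not how the published plugin works.

```python
# Hypothetical sketch of option b; not part of the published plugin.
FORENAME_SCORE = 6    # per matching forename, the default mentioned earlier
PATRONYMIC_SCORE = 7  # assumed separate weight for a matching patronymic


def split_patronymic(forenames, surname):
    """Return (given_parts, patronymic_or_None).

    The last forename part is treated as a possible patronymic only when
    there is no surname and the forename has more than one part.
    """
    parts = forenames.split()
    if surname.strip() or len(parts) < 2:
        return parts, None
    return parts[:-1], parts[-1]


def name_score(person_a, person_b):
    """Score two (forenames, surname) pairs, weighting the patronymic separately."""
    given_a, pat_a = split_patronymic(*person_a)
    given_b, pat_b = split_patronymic(*person_b)
    shared = {g.lower() for g in given_a} & {g.lower() for g in given_b}
    score = FORENAME_SCORE * len(shared)
    if pat_a and pat_b and pat_a.lower() == pat_b.lower():
        score += PATRONYMIC_SCORE
    return score


print(split_patronymic("Cornelis Andriesz", ""))                        # (['Cornelis'], 'Andriesz')
print(name_score(("Cornelis Andriesz", ""), ("Cornelis Andriesz", "")))  # 13
print(name_score(("Jan Pietersz", ""), ("Klaas Pietersz", "")))          # 7
```

The point of the separate weight is that a shared patronymic would be scored as a one-generation "last name" rather than as just another forename.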

In Holland records go far back (some of my family lines go to 1550) and last names only got formally introduced in 1811. Before that patronymics were heavily used.

So, a matching forename (whatever the score) can't take into account the completely different meaning of the parts of the forename in the case of patronymics. Hence the question.

Cheers, Joop

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 16:32
by tatewise
JoopvB wrote:
14 Feb 2023 14:19
a. is there a best setting for the plugin that would give extra weight to the last part of the first name if there is no last name resulting in a high score?
b. if not a, would it be an option to extend the plugin to handle people without a last name in such a way that if the first name has more than 1 part, the last part can be given a separate weight since that last part may be a patronymic?
Remember that the scores only apply where some form of match applies between two people.
As I said earlier:
tatewise wrote:
13 Feb 2023 17:33
I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.
I think that is very similar to your requests and for the people in your screenshot would add 7 to the score for the pairs with no Surname.
i.e.
2801 Annitje Luijyten 2508 Annetgen Leulofsdr = 39 ( mostly from matching events & children and not names )
2816 Cornelis Louwen 2814 Cornelis Andriesz Louwen = 38 ( ~ ditto ~ )

Presumably, a potential duplicate pair needs more than just having no Surname and matching patronymic, but also other Forenames must match and some events and relatives must match too.

What I am saying is that even boosting the score by 7 points where both Surnames are missing does not have a major effect on the total score, because other data must also match to identify potential duplicates.

As I said, your screenshot has very low scores, so none of the people listed are likely to be duplicates, even if those where both have no Surname had 7 extra points.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 17:24
by JoopvB
tatewise wrote:
13 Feb 2023 17:33
I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.
What preference did you change in Names Matching to get that effect?

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 20:19
by tatewise
I had to edit the plugin script to change the way it works just as an experiment. It is not an option in the published plugin.

As I explained, I don't see it as having any significant benefit.
It won't make people with matching patronymics and no Surname have a dramatically better score.
Typically, it will add 7 to their score which has no great effect.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 21:23
by JoopvB
I see.

Could I test it on my database to compare with the published one?

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 22:03
by tatewise
Yes, try the attached Find Duplicate Individuals plugin Version 3.8.1 Date 13 Feb 2023.

Re: Find Duplicate Individuals with no family name

Posted: 14 Feb 2023 22:47
by JoopvB
Thanks Mike, that's tomorrow's to-do.

Re: Find Duplicate Individuals with no family name

Posted: 15 Feb 2023 13:57
by JoopvB
Hi Mike,

I've run a number of tests with version 3.8.1 and compared the results with 3.8. The result is indeed disappointing: it increases the score somewhat for the "no last namers", but that's it.

So I'll keep using 3.8. Thanks for the time and effort!

Cheers, Joop

Re: Find Duplicate Individuals with no family name

Posted: 15 Feb 2023 16:21
by tatewise
I think it simply confirms that your current database does not have any duplicates.

If two people match on their patronymic forename (6 points by default, I think) and also get 7 extra points for both having no Surname, that totals 13 points.
They need many more matching details to gain a score that suggests they are candidate duplicates.
So they need other forenames to match or several events to match or some relatives to match to get a suitable score.
As I said before, an extra 7 points (or even the maximum of 20 points for a Name match) makes little difference.
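
The arithmetic can be laid out as a short Python sketch. The constant names are made up for this example; the numbers are the ones quoted in this thread (6 for a matching forename, 7 for the no-Surname change, 20 as the maximum for a Name match, and the totals of 39 and 38 for the two pairs picked out from the screenshot).

```python
# Illustrative arithmetic only; constant names are hypothetical.
FORENAME_SCORE = 6     # matching patronymic forename (default quoted earlier)
NO_SURNAME_BONUS = 7   # extra points in the experimental version 3.8.1
NAME_MATCH_MAX = 20    # maximum Name score quoted above

# A pair that shares only a patronymic and has no Surnames.
patronymic_only = FORENAME_SCORE + NO_SURNAME_BONUS
print(patronymic_only)  # 13

# Totals quoted earlier for two pairs, which were described as very low scores.
quoted_totals = {"2801/2508": 39, "2816/2814": 38}
boosted = {pair: total + NO_SURNAME_BONUS for pair, total in quoted_totals.items()}
print(boosted)  # {'2801/2508': 46, '2816/2814': 45}

# Even the full Name maximum is well below those totals.
print(NAME_MATCH_MAX < min(quoted_totals.values()))  # True
```

So a shared patronymic with no Surname contributes 13 points at most under these defaults, and the rest of a convincing score still has to come from other forenames, events, and relatives.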