* Find Duplicate Individuals with no family name

JoopvB
Superstar
Posts: 328
Joined: 02 May 2015 14:32
Family Historian: V7

Find Duplicate Individuals with no family name

Post by JoopvB » 13 Feb 2023 14:39

Hi Mike,
Your plugin Find Duplicate Individuals is one of my favorites, but as I get further back in time most people's names have no family name. A patronymic is then used to identify people, at least here in Holland. I store the patronymic as the last part of the first name (as it should be, in my opinion).

This works well but I can't get the plugin to work effectively with patronymic names and no last name.

My question is: is there a (hidden) trick/setting in the plugin to make it compare patronymic names effectively (i.e. not finding that nearly all people without a last name are duplicates)?

And if not, might it be added to your to-do list for the foreseeable future?

Thanks, Joop

tatewise
Megastar
Posts: 27076
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Find Duplicate Individuals with no family name

Post by tatewise » 13 Feb 2023 17:33

Joop, I've had a quick look at this and think I need more details before investigating further.

I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.

So in my tests, there is little difference between the published plugin and the modified plugin.
They both take into account Forenames that have a matching patronymic part together with all the other Event and close Relative matching details.
Therefore, if two Individual records have the same patronymic Forename, and similar Events, and similar Relatives, then they will get a fairly high score and be listed as duplicates, but should not match well with others who also have no Surname.
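As a rough illustration of the rule described above (not the plugin's actual code, which is a Lua script), the name part of the score could be sketched as follows. The surname score of 7 comes from this post. The per-forename score of 6, the function name `name_score`, the `blank_surnames_match` switch, and the exact, case-insensitive comparison are all assumptions made for the sketch.

```python
# Hypothetical sketch of the name scoring discussed in this thread.
# Not taken from the plugin; the names and exact-match logic are assumptions.

FORENAME_SCORE = 6  # assumed default score per matching forename
SURNAME_SCORE = 7   # default surname score quoted in the post


def name_score(forenames_a, surname_a, forenames_b, surname_b,
               blank_surnames_match=False):
    """Score the name similarity of two people.

    forenames_* are lists of given-name parts; surname_* may be "".
    blank_surnames_match=True mimics the experimental change: two
    missing surnames are treated as if they were matching surnames.
    """
    # Each forename part shared by both people earns the forename score.
    shared = {f.lower() for f in forenames_a} & {f.lower() for f in forenames_b}
    score = FORENAME_SCORE * len(shared)

    if surname_a and surname_b:
        # Both surnames present: score only if they match.
        if surname_a.lower() == surname_b.lower():
            score += SURNAME_SCORE
    elif not surname_a and not surname_b and blank_surnames_match:
        # Both surnames missing: the experimental rule adds the surname score.
        score += SURNAME_SCORE
    return score


# Two people sharing only a patronymic and having no surname:
published = name_score(["Jan", "Pietersz"], "", ["Klaas", "Pietersz"], "")
modified = name_score(["Jan", "Pietersz"], "", ["Klaas", "Pietersz"], "",
                      blank_surnames_match=True)
print(published, modified)  # 6 13
```

Under these assumptions the experimental rule changes the name score by a fixed 7 points, so any larger difference between candidate pairs still has to come from events and relatives.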

So, I don't really understand why you say the plugin is finding that nearly all people without a last name are duplicates.
A screenshot of the Result Set scores might make things clearer.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 14 Feb 2023 11:03

In essence my point is that a patronymic is like a temporary (for 1 generation) last name. If the plugin could somehow attach a score to that then, I guess, the resulting output would be extra valuable.

I've attached a screenshot of the output with the plugin settings at their defaults.
Attachment: SNAG-0001.png (Possible duplicates)

Re: Find Duplicate Individuals with no family name

Post by tatewise » 14 Feb 2023 12:52

Sorry, but you need to explain what problems you have with that Result Set, which looks OK to me.
I don't see lots of people without a Surname being matched as duplicates.
Which of the names displayed are patronymics?

Currently, each matching Forename is given a score (6 by default I think).
So if the patronymic of one person is the same as the patronymic of another then they gain that score.

In that Result Set they are all very low scores and mostly based on some name partial match and partial event match.
So I'm not sure what higher scores you are expecting.

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 14 Feb 2023 14:19

All names shown only in small print belong to individuals without a last name.

My "problem" isn't a problem as the plugin is very helpful. It's just that in dealing with older names patronymics are important, and since there is no separate field for them in FH I store them as the last part of the first name. This works fine and my question really is:
a. is there a best setting for the plugin that would give extra weight to the last part of the first name if there is no last name, resulting in a high score?
b. if not a, would it be an option to extend the plugin to handle people without a last name in such a way that if the first name has more than 1 part, the last part can be given a separate weight, since that last part may be a patronymic?
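To make option b concrete, here is a minimal sketch of what such a rule might look like. It is not part of the plugin: the function names, the `patronymic_weight` parameter and its value of 7, and the forename weight of 6 are all hypothetical placeholders, and names are compared exactly (case-insensitively) for simplicity. Surname scoring is deliberately left out.

```python
# Hypothetical sketch of option b: weight the last given-name part separately
# when a person has no last name. Not the plugin's code; all names and
# weights below are illustrative assumptions.

def split_patronymic(given_names, surname):
    """Return (forename_parts, patronymic_or_None).

    Only when there is no surname and the given name has more than one
    part is the last part treated as a possible patronymic.
    """
    parts = given_names.split()
    if not surname and len(parts) > 1:
        return parts[:-1], parts[-1]
    return parts, None


def name_score_with_patronymic(given_a, surname_a, given_b, surname_b,
                               forename_weight=6, patronymic_weight=7):
    """Score forenames and a possible patronymic with separate weights."""
    fore_a, pat_a = split_patronymic(given_a, surname_a)
    fore_b, pat_b = split_patronymic(given_b, surname_b)

    # Ordinary forename parts: one forename weight per shared part.
    shared = {f.lower() for f in fore_a} & {f.lower() for f in fore_b}
    score = forename_weight * len(shared)

    # Possible patronymics: scored separately, only if both people have one.
    if pat_a and pat_b and pat_a.lower() == pat_b.lower():
        score += patronymic_weight
    return score


# Same forename and same patronymic, no surnames:
print(name_score_with_patronymic("Cornelis Andriesz", "", "Cornelis Andriesz", ""))  # 13
# Same patronymic only:
print(name_score_with_patronymic("Jan Pietersz", "", "Klaas Pietersz", ""))  # 7
```

The point of the separate weight is that a shared patronymic could be tuned independently of ordinary forename matches, for example raised for files with many pre-1811 records.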

In Holland records go far back (some of my family lines go to 1550) and last names only got formally introduced in 1811. Before that patronymics were heavily used.

So, a matching forename (whatever the score) can't take into account the completely different meaning of the parts of the forename in the case of patronymics. Hence the question.

Cheers, Joop

Re: Find Duplicate Individuals with no family name

Post by tatewise » 14 Feb 2023 16:32

JoopvB wrote:
14 Feb 2023 14:19
a. is there a best setting for the plugin that would give extra weight to the last part of the first name if there is no last name resulting in a high score?
b. if not a, would it be an option to extend the plugin to handle people without a last name in such a way that if the first name has more than 1 part, the last part can be given a separate weight, since that last part may be a patronymic?
Remember that the scores only apply where some form of match applies between two people.
As I said earlier:
tatewise wrote:
13 Feb 2023 17:33
I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.
I think that is very similar to your requests and for the people in your screenshot would add 7 to the score for the pairs with no Surname.
i.e.
2801 Annitje Luijyten 2508 Annetgen Leulofsdr = 39 ( mostly from matching events & children and not names )
2816 Cornelis Louwen 2814 Cornelis Andriesz Louwen = 38 ( ~ ditto ~ )

Presumably, a potential duplicate pair needs more than just having no Surname and matching patronymic, but also other Forenames must match and some events and relatives must match too.

What I am saying is that even boosting the score by 7 points where both Surnames are missing does not have a major effect on the total score, because other data must also match to identify potential duplicates.

As I said, your screenshot has very low scores, so none of the people listed are likely to be duplicates, even if those where both have no Surname had 7 extra points.

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 14 Feb 2023 17:24

tatewise wrote:
13 Feb 2023 17:33
I made a small change to the Name scoring such that if two people have no Surnames then that is treated as if they had matching Surnames. That simply added the default score of 7 to their Individual column score.
What preference did you change in Names Matching to get that effect?

Re: Find Duplicate Individuals with no family name

Post by tatewise » 14 Feb 2023 20:19

I had to edit the plugin script to change the way it works just as an experiment. It is not an option in the published plugin.

As I explained, I don't see it as having any significant benefit.
It won't make people with matching patronymics and no Surname have a dramatically better score.
Typically, it will add 7 to their score which has no great effect.

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 14 Feb 2023 21:23

I see.

Could I test it on my database to compare with the published one?

Re: Find Duplicate Individuals with no family name

Post by tatewise » 14 Feb 2023 22:03

Yes, try the attached Find Duplicate Individuals plugin Version 3.8.1 Date 13 Feb 2023.
Attachment: Find Duplicate Individuals.fh_lua (Version 3.8.1 Date 13 Feb 2023)

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 14 Feb 2023 22:47

Thanks Mike, that's tomorrow's to-do.

Re: Find Duplicate Individuals with no family name

Post by JoopvB » 15 Feb 2023 13:57

Hi Mike,

I've run a number of tests with version 3.8.1 and compared the results with 3.8. The result is indeed disappointing: it increases the score somewhat for the "no last namers", but that's it.

So I'll keep using 3.8. Thanks for the time and effort!

Cheers, Joop

Re: Find Duplicate Individuals with no family name

Post by tatewise » 15 Feb 2023 16:21

I think it simply confirms that your current database does not have any duplicates.

If people match their patronymic forename and even have 7 extra points for having no Surname, that totals 13 points.
They need many more matching details to gain a score that suggests they are candidate duplicates.
So they need other forenames to match or several events to match or some relatives to match to get a suitable score.
As I said before, an extra 7 points (or even the maximum of 20 points for a Name match) makes little difference.