* Finding Duplicate Individuals

andrewbraid
Famous
Posts: 124
Joined: 30 Jul 2005 09:18
Family Historian: V7
Location: Leeds, Yorkshire

Finding Duplicate Individuals

Post by andrewbraid » 22 Mar 2014 08:49

Mike

I am beta testing Family Tree Analyzer and Alexander has added a duplicate detection tab.

When I run it, it finds some duplicates that the plugin is not showing. Some examples are:

Jean Barber Fletcher and Joan Barber Fletcher, born on the same day in the same place. They might be twins; I have no idea.

I have four cases of a slight difference in surname: Grimshaw and Grimshay. Two have the same first name and the same place and year of birth; one has a slightly different first name (Jesias and Josias) but the same place and year of birth; and one has the same first name and year of birth but a different place.
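Each of these name variants differs by a single letter. The thread does not say how either tool compares names; one common measure for such near-misses is Levenshtein edit distance. The Python sketch below is an illustration of that general technique only, not the method used by the Plugin or by Family Tree Analyzer.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion or substitution costs 1."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Name pairs taken from the report above
for x, y in [("Grimshaw", "Grimshay"), ("Jesias", "Josias"), ("Jean", "Joan")]:
    print(f"{x} / {y}: {levenshtein(x.lower(), y.lower())}")
```

Each pair comes out at a distance of 1, so any matcher that tolerates one edit would treat them as candidate duplicates.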

I would have thought that all of these should have appeared in the top 100.
Andrew Braid

tatewise
Megastar
Posts: 27087
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Finding Duplicate Individuals

Post by tatewise » 22 Mar 2014 10:40

Thank you for the feedback Andrew.
Yes, I would expect them to figure in the Result Set, providing that:
  • You don't have 100 higher scoring entries (I get scores from 4% to 8% and 13 to 24 points).
  • You are using Plugin default settings.
  • The pairs all have the correct Sex recorded, otherwise points are deducted.
  • The pairs are NOT in the Omit Non-Duplicates list, otherwise they are omitted.
  • The pairs are NOT closely related in your tree, otherwise they are omitted or points are deducted:
    e.g. if Jean & Joan Barber Fletcher are Spouses, Siblings, Parent/Child, or Grand-Parent/Grand-Child of each other, they will be excluded, or if slightly more generations apart, then points will be deducted.
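The conditions in this list amount to a filter-then-rank step. The Python sketch below is only a model of that description, not the Plugin's code: the class, function and setting names and all point values are hypothetical, and the relationship distance is assumed to be precomputed as a count of parent/child or spouse links. It shows how a genuine duplicate can be missing either because it is excluded outright or because enough higher-scoring pairs push it out of the Result Set.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

# Hypothetical settings, chosen only for illustration (not the Plugin's real values)
SEX_MISMATCH_DEDUCTION = 5
FAMILY_GENERATIONS = 2       # relatives within this many links are excluded
RELATIVES_GENERATIONS = 2    # the next few links out only lose points
RELATIVES_DEDUCTION = 5
RESULT_LIMIT = 100           # size of the Result Set


@dataclass
class CandidatePair:
    id_a: int
    id_b: int
    points: int                        # similarity score before adjustments
    same_sex: bool                     # False if the recorded Sex differs
    relationship_steps: Optional[int]  # None if not related in the tree


def result_set(pairs, omit_non_duplicates):
    """Apply the exclusions and deductions, then keep the top-scoring pairs."""
    scored = []
    for p in pairs:
        if frozenset((p.id_a, p.id_b)) in omit_non_duplicates:
            continue  # user has already marked this pair as not duplicates
        steps = p.relationship_steps
        if steps is not None and steps <= FAMILY_GENERATIONS:
            continue  # close relations are excluded outright
        score = p.points
        if not p.same_sex:
            score -= SEX_MISMATCH_DEDUCTION
        if steps is not None and steps <= FAMILY_GENERATIONS + RELATIVES_GENERATIONS:
            score -= RELATIVES_DEDUCTION
        scored.append((score, p))
    # Highest scores first; anything beyond RESULT_LIMIT is never shown
    return heapq.nlargest(RESULT_LIMIT, scored, key=lambda item: item[0])
```

Passing `key` to `heapq.nlargest` means only the scores are compared, so the pair objects themselves never need an ordering.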
I have entered the following four pairs of unrelated Individuals and run the Plugin V3.3 with results shown below.
Does anything presented above or below explain their omission from your Result Set?
BraidFindDupsRecords.png
BraidFindDupsResults.png
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

andrewbraid

Re: Finding Duplicate Individuals

Post by andrewbraid » 22 Mar 2014 13:23

Mike

Thank you

I have checked that I am on default settings and the Omit Non-Duplicates list is blank. I have over 100 entries, ranging from 14% (42 points) to 6% (20 points).

Jean and Joan Barber Fletcher are siblings, in fact twins. They were born in the same place in the same year, and I know of no free method of checking their birth, so I do not know whether there really are two of them or whether one is a typo in the paper list that is my source. So I do regard them as possible duplicates, but I understand why the plug-in gives them a low score.

I think it also explains the Grimshaw(y)s. They are all siblings.

The question then arises: how do I check for duplicate siblings, sometimes with different spellings of the surname?

tatewise

Re: Finding Duplicate Individuals

Post by tatewise » 22 Mar 2014 14:11

OK, that explains why they are omitted since they are siblings.
During development, users requested that siblings and other very close relations be automatically excluded, because they did not want twins listed, nor siblings with the same name, same birth place and similar dates where the first sibling died very young, nor father and son, etc., with the same name and birth place but unknown dates.
It was also presumed that such close relations would be very obvious when updating families, and so did not need inclusion.

However, the solution is designed into the Plugin.
On the Set Preferences tab, select the Family & Gender tab, and reduce Family Generations to 0.
A value of 1 should still include siblings and grandparent/grandchild, but exclude spouses and parent/child relations.

This is where you can also adjust the points deducted for near relatives.
Adjust the Relatives Generations value &/or the Relatives Deduction value.
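One way to read the Family Generations setting is as a distance threshold in which each parent/child or spouse link counts as one step: spouses and parent/child are then 1 step apart, while siblings (via a shared parent) and grandparent/grandchild are 2 steps apart, which matches the description of the values 0 and 1 above. The Python sketch below computes that distance with a breadth-first search. The counting model, the function names and the small example family are assumptions for illustration, not the Plugin's documented algorithm.

```python
from collections import deque


def relationship_steps(start, target, parents, spouses):
    """Fewest parent/child or spouse links between two people, or None if unrelated."""
    # Undirected graph: every parent-child or spouse link counts as one step
    links = {}
    for child, child_parents in parents.items():
        for parent in child_parents:
            links.setdefault(child, set()).add(parent)
            links.setdefault(parent, set()).add(child)
    for a, b in spouses:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    # Breadth-first search finds the shortest chain of links
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == target:
            return distance[person]
        for other in links.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return None


def excluded(steps, family_generations):
    """Hypothetical rule: exclude pairs related within family_generations steps."""
    return steps is not None and steps <= family_generations


# Hypothetical family: George -> William (+ spouse Mary) -> twins Jean and Joan
parents = {"William": ["George"], "Jean": ["William", "Mary"], "Joan": ["William", "Mary"]}
spouses = [("William", "Mary")]

for a, b in [("William", "Mary"), ("Jean", "William"), ("Jean", "Joan"), ("Jean", "George")]:
    steps = relationship_steps(a, b, parents, spouses)
    print(a, b, steps, [excluded(steps, g) for g in (0, 1, 2)])
```

Under this model, a threshold of 1 excludes the spouse and parent/child pairs but keeps the twins and the grandparent pair, and a threshold of 0 excludes nothing.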

See the Help and Advice page on Family & Gender for more details.

tatewise

Re: Finding Duplicate Individuals

Post by tatewise » 02 Apr 2014 09:58

Further investigation of this family-members feature of the Plugin revealed a minor mistake that stopped some close relations from being included, regardless of the Preferences tab settings.

That is now fixed in V3.4, available in the Plugin Store.

johnmorrisoniom
Megastar
Posts: 882
Joined: 18 Dec 2008 07:40
Family Historian: V7
Location: Isle of Man

Re: Finding Duplicate Individuals

Post by johnmorrisoniom » 02 Apr 2014 14:31

Hi Mike,
I have just run the new version.
At the moment I have 42322 Individuals and 12416 Families in my data.

I have now retired my Win XP machine and replaced it with a 3.6 GHz i5 with 8 GB RAM, a 500 GB SSD and a 2 x 2 TB RAID 0 array as the data drive, running W7 x64.

Run time was 1362 seconds, with only a 90-second delay before the result set was shown.

This is about half of what it took on my home W7 machine, and 3 times faster than the old XP machine.

tatewise

Re: Finding Duplicate Individuals

Post by tatewise » 02 Apr 2014 16:48

If my maths is right, that is about 23 mins, compared to 45 mins on W7, and 1 hour on Win XP.
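The conversion is easy to check. In the Python snippet below, the 1362 seconds is the measured figure reported above; the W7 and XP minutes are the approximate values quoted in this post, not measurements.

```python
# Run times from this thread (the W7 and XP minutes are approximate)
desktop_new_s = 1362   # new i5 desktop, V3.4, measured
home_w7_min = 45       # earlier home W7 machine, approximate
old_xp_min = 60        # old Win XP machine, approximate

desktop_new_min = desktop_new_s / 60
print(f"New desktop: {desktop_new_min:.1f} min")
print(f"Speed-up vs home W7: {home_w7_min / desktop_new_min:.1f}x")
print(f"Speed-up vs Win XP: {old_xp_min / desktop_new_min:.1f}x")
```

This gives about 22.7 minutes, roughly 2.0x the W7 speed and 2.6x the XP speed on these approximate figures.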

Is it possible to get a direct comparative measurement between V3.3 and V3.4 on the same PC?

johnmorrisoniom

Re: Finding Duplicate Individuals

Post by johnmorrisoniom » 02 Apr 2014 17:24

I did run V3.3 on the new computer a week ago, and the run time was slightly longer (from memory about 28 mins, but I could be wrong). The Win XP machine died just after I had (luckily) copied most of my data across.
Maybe the plugin needs a tracking log to keep track of run times.
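Such a log could simply append one row per run, so that versions can be compared on the same PC. The Python sketch below is purely hypothetical: the function name, the CSV columns and the file name in the usage line are invented for illustration and do not describe anything the Plugin currently does.

```python
import csv
import time
from datetime import datetime
from pathlib import Path


def run_with_timing_log(task, log_path, version, individuals, families):
    """Run task(), then append one CSV row recording how long it took."""
    start = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - start
    path = Path(log_path)
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            # Header is written only when the log is first created
            writer.writerow(["timestamp", "version", "individuals", "families", "seconds"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"), version,
                         individuals, families, f"{elapsed:.1f}"])
    return result


# Hypothetical usage: time a stand-in task and log it against the data size
run_with_timing_log(lambda: sum(range(10**6)), "find_dups_timings.csv", "3.4", 42322, 12416)
```

Recording the Individual and Family counts alongside each time matters, because run time grows with the size of the data, as the figures in this thread show.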

johnmorrisoniom

Re: Finding Duplicate Individuals

Post by johnmorrisoniom » 04 Apr 2014 19:27

Hi Mike,
I have just run V3.4 on my laptop (2.53 GHz i3, 4 GB RAM, W7 Pro x64):
2780 seconds with 42326 individuals and 12416 families.
