* Finding Duplicate Individuals
-
andrewbraid
- Famous
- Posts: 124
- Joined: 30 Jul 2005 09:18
- Family Historian: V7
- Location: Leeds, Yorkshire
Finding Duplicate Individuals
Mike
I am beta testing Family Tree Analyzer and Alexander has added a duplicate detection tab.
When I run it I get some duplicates which the plugin is not showing. Some examples are:
Jean Barber Fletcher and Joan Barber Fletcher, born on the same day in the same place. They might be twins; I have no idea.
I have four cases of a slight difference in surname, Grimshaw and Grimshay. Two have the same first name and the same place and year of birth; one has a slightly different first name (Jesias and Josias) but the same place and year of birth; and one has the same first name and year of birth but a different place.
I would have thought that all of these should have appeared in the top 100.
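To make the examples concrete, here is a minimal sketch of how near-identical names like these can be flagged using edit distance. It is only an illustration: neither the plugin nor Family Tree Analyzer is known to work this way, and the `is_near_match` helper and its `max_edits` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def is_near_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Hypothetical helper: treat names within max_edits edits as near matches."""
    return edit_distance(a.lower(), b.lower()) <= max_edits


# The name pairs quoted in this post; each differs by a single letter.
for left, right in [("Grimshaw", "Grimshay"), ("Jesias", "Josias"), ("Jean", "Joan")]:
    print(left, right, edit_distance(left, right), is_near_match(left, right))
```

All three pairs are one substitution apart, so a name comparison alone would rate them as close; whether they appear in a result list then depends on the other rules applied.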
Andrew Braid
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Re: Finding Duplicate Individuals
Thank you for the feedback Andrew.
Yes, I would expect them to figure in the Result Set, providing that:
- You don't have 100 higher scoring entries (I get scores from 4% to 8% and 13 to 24 points).
- You are using Plugin default settings.
- The pairs all have the correct Sex recorded, otherwise points are deducted.
- The pairs are NOT in the Omit Non-Duplicates list, otherwise they are omitted.
- The pairs are NOT closely related in your tree, otherwise they are omitted or points are deducted:
e.g. If Jean & Joan Barber Fletcher are Spouses, Siblings, Parent/Child, or Grand-Parent/Grand-Child of each other they will be excluded, or if slightly more generations apart then points will be deducted.
Does anything presented above or below explain their omission from your Result Set?
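The checklist above can be sketched roughly as follows. This is a hedged illustration, not the Plugin's actual code: the `Candidate` type, the `top_results` function, and the `sex_deduction` value of 5 points are all assumptions made for the example, and the close-relation rule is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one possible duplicate pair."""
    first_id: int
    second_id: int
    points: int
    sex_mismatch: bool = False


def top_results(candidates, omit_pairs, sex_deduction=5, limit=100):
    """Drop omitted pairs, apply an assumed sex deduction, keep the best `limit`."""
    kept = []
    for candidate in candidates:
        pair = frozenset((candidate.first_id, candidate.second_id))
        if pair in omit_pairs:          # listed as known non-duplicates
            continue
        points = candidate.points
        if candidate.sex_mismatch:      # assumed fixed deduction, illustration only
            points -= sex_deduction
        kept.append((points, candidate))
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:limit]


candidates = [
    Candidate(1, 2, points=20),
    Candidate(3, 4, points=24, sex_mismatch=True),
    Candidate(5, 6, points=13),
]
omitted = {frozenset((5, 6))}
for points, candidate in top_results(candidates, omitted):
    print(points, candidate.first_id, candidate.second_id)
```

With a limit of 100, a pair scoring below the hundredth entry never reaches the result list, which is why the first item on the checklist matters for large trees.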
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
-
andrewbraid
- Famous
- Posts: 124
- Joined: 30 Jul 2005 09:18
- Family Historian: V7
- Location: Leeds, Yorkshire
Re: Finding Duplicate Individuals
Mike
Thank you
I have checked that I am on default settings and the Omit Duplicate table is blank. I have over 100 entries, ranging from 14% - 42 points to 6% - 20 points.
Jean and Joan Barber Fletcher are siblings, in fact twins. They were born in the same place in the same year, and I know of no free method of checking their births, so I do not know whether there are two of them or one is a typo in the paper list that is my source. I therefore regard them as possible duplicates, but I understand why the plug-in has them with a low score.
I think it also explains the Grimshaw(y)s. They are all siblings.
The question then arises: how do I check for duplicate siblings, sometimes with different spellings of the surname?
Andrew Braid
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Re: Finding Duplicate Individuals
OK, that explains why they are omitted since they are siblings.
During development, users requested that siblings and other very close relations be automatically excluded, because they did not want twins listed, nor siblings with same name & same birth place & similar dates where the 1st sibling died very young, nor father & son, etc with same name & same birth place but unknown dates.
It was also presumed that such close relations would be very obvious when updating families, and so did not need inclusion.
However, the solution is designed into the Plugin.
On the Set Preferences tab, select the Family & Gender tab, and reduce Family Generations to 0.
A value of 1 should still include siblings and grandparent/grandchild, but exclude spouses and parent/child relations.
This is where you can also adjust the points deducted for near relatives.
Adjust the Relatives Generations value &/or the Relatives Deduction value.
See the Help and Advice page on Family & Gender for more details.
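One way to picture the settings described above is as thresholds on how many relationship steps separate two people. The sketch below is an assumption-laden model, not the Plugin's implementation: the step counts, the default values, and the `great_grandparent_great_grandchild` entry are all hypothetical, chosen only so that the behaviour matches the description (0 excludes nobody, 1 excludes spouses and parent/child, and an assumed default of 2 also excludes siblings and grandparent/grandchild).

```python
# Assumed number of relationship steps for each kind of close relation.
RELATION_STEPS = {
    "spouse": 1,
    "parent_child": 1,
    "sibling": 2,
    "grandparent_grandchild": 2,
    "great_grandparent_great_grandchild": 3,  # hypothetical extra entry
}


def assess(relation, family_generations=2, relatives_generations=4,
           relatives_deduction=5):
    """Return (excluded, points_deducted) for a pair with the given relation.

    All default values are illustrative assumptions.
    """
    steps = RELATION_STEPS.get(relation)
    if steps is None:                    # not closely related
        return (False, 0)
    if steps <= family_generations:      # close family: omit the pair
        return (True, 0)
    if steps <= relatives_generations:   # near relative: keep but deduct points
        return (False, relatives_deduction)
    return (False, 0)


for setting in (2, 1, 0):
    print(setting, assess("sibling", family_generations=setting),
          assess("spouse", family_generations=setting))
```

Under this model, lowering Family Generations from the assumed default of 2 to 1 or 0 is what brings sibling pairs such as twins back into consideration.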
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Re: Finding Duplicate Individuals
Further investigation of the Plugin's family members feature revealed a minor mistake that stopped some close relations from being included regardless of the Set Preferences tab settings.
That is now fixed in V3.4, available in the Plugin Store.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Re: Finding Duplicate Individuals
Hi Mike,
I have just run the new version.
At the moment I have 42322 Individuals and 12416 Families in my data.
I have now retired my Win XP machine and replaced it with a 3.6 GHz i5 with 8 GB RAM, a 500 GB SSD and a 2 x 2 TB RAID 0 array as data drive, running W7 x64.
Run time was 1362 seconds, with only a 90 second delay before the Result Set was shown.
This is about half of what it took on my Home W7 machine and 3 times faster than the old XP machine.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Re: Finding Duplicate Individuals
If my maths is right, that is about 23 mins, compared to 45 mins on W7, and 1 hour on Win XP.
Is it possible to get a direct comparative measurement between V3.3 and V3.4 on the same PC?
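For reference, the arithmetic behind those figures can be checked with a few lines. The 1362 second run time and the 45 minute and 1 hour comparisons come from this thread; the function names are just illustrative.

```python
def to_minutes(seconds: float) -> float:
    """Convert a run time in seconds to minutes."""
    return seconds / 60


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


new_run = 1362          # seconds, reported for the new i5 machine
home_w7 = 45 * 60       # about 45 minutes on the Home W7 machine
old_xp = 60 * 60        # about 1 hour on the old Win XP machine

print(f"{to_minutes(new_run):.1f} min")      # 22.7 min
print(f"{speedup(home_w7, new_run):.1f}x")   # 2.0x
print(f"{speedup(old_xp, new_run):.1f}x")    # 2.6x
```

So the new machine is about twice as fast as the Home W7 machine and a little over two and a half times as fast as the XP machine, on the rounded figures quoted.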
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Re: Finding Duplicate Individuals
I did run V3.3 on the new computer a week ago; run time was slightly longer (from memory about 28 mins, but I could be wrong). The Win XP machine died just after I had, luckily, copied most of my data across.
Maybe the plugin needs a tracking log to keep track of times.
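As a rough idea of what such a tracking log could look like, here is a minimal sketch that appends one row per run to a CSV file. It is written in Python purely for illustration and is not part of the plugin; the `log_run` and `timed` functions, the column names, and the file name are all hypothetical.

```python
import csv
import time
from datetime import datetime
from pathlib import Path


def timed(work):
    """Run a callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = work()
    return result, time.perf_counter() - start


def log_run(log_path, version, individuals, families, run_seconds):
    """Append one timing record to a CSV log, writing a header for a new file."""
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["when", "version", "individuals", "families", "seconds"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"),
                         version, individuals, families, f"{run_seconds:.0f}"])


if __name__ == "__main__":
    # Example using the figures reported earlier in this thread.
    log_run("run_times.csv", "V3.4", 42322, 12416, 1362)
```

A log like this would allow the direct V3.3 versus V3.4 comparison on the same PC without relying on memory.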
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Re: Finding Duplicate Individuals
Hi Mike,
I have just run Vers 3.4 on my Laptop (2.53 GHz i3, 4 GB RAM, W7 Pro x64).
2780 seconds with 42326 individuals and 12416 families.