* Find Duplicate Individuals Version 1.5+
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
The IntXxxxLastWrong is working great. Thank you.
I'm seeing something I hadn't noticed before.
I have two individuals named Isaac /Van Meter/ and William /Van Meter/. They are ending up with 14 points under the Individual column. I have the plugin values set as follows:
IntIndiDeduction = -5
IntIndiMinimum = 1
IntIndiMaximum = 20
IntIndiThreshold = 10
IntIndiSoundex = 2
IntIndiForeWrong = 0
IntIndiForeRight = 3
IntIndiLastRight = 7
IntIndiLastWrong = -0
Shouldn't these two only get 7 points for IntIndiLastRight and no other points? They have no alternate names and they have no event matches (in fact one doesn't have any events at all).
Actually, this happens to a lot of the folks with the last name of Van Meter. Is it somehow thinking the 'Van' is part of the forenames instead of being part of the surname? Or, alternatively, is it giving 7 points for Van matching and another 7 points for Meter matching?
Thanks,
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
mikegscoles ~ As explained under Other in the Help page, the Plugin excludes siblings and other very close family members for two reasons. (1) The similar names and relations of close family members would result in many false positives. (2) Such close family member duplicates are easily spotted by inspection of a family tree.
Who counts as a 'close family member' can be adjusted by editing the User Preference Settings at the head of the Plugin script. These will probably become available on the Set Preferences tab in a later version.
If you were to duplicate your father as well as yourself in a separate small tree, then I am certain the Plugin would report the duplication.
Bill ~ Yes, multiple Surnames score 7 points each. Currently, any part of a Name that is only 1 or 2 characters long is ignored, but I have wondered if Names & Titles such as Van and Sir might cause a problem. So probably 3 character Names should also be ignored, or another parameter is needed to set minimum Name length.
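The per-part surname scoring described in the reply to Bill can be illustrated with a minimal sketch. This is not the plugin's code: the function names are hypothetical, the 7 points come from Bill's IntIndiLastRight setting, and splitting on both spaces and hyphens is an assumption.

```python
import re

# Hypothetical sketch of per-part surname scoring; not the plugin's actual code.
INT_INDI_LAST_RIGHT = 7   # points per matching surname part (Bill's setting)
MIN_PART_LENGTH = 3       # parts of only 1 or 2 characters are ignored

def surname_parts(surname):
    """Split on spaces and hyphens (an assumption) and drop very short parts."""
    return [p for p in re.split(r"[\s-]+", surname) if len(p) >= MIN_PART_LENGTH]

def surname_score(surname_a, surname_b):
    """Award INT_INDI_LAST_RIGHT for each part of surname_a also found in surname_b."""
    parts_b = surname_parts(surname_b)
    return sum(INT_INDI_LAST_RIGHT for p in surname_parts(surname_a) if p in parts_b)

print(surname_score("Van Meter", "Van Meter"))  # 14: 'Van' and 'Meter' each score 7
print(surname_score("De Witt", "De Witt"))      # 7: 'De' is too short to count
```

Under these assumptions Isaac /Van Meter/ and William /Van Meter/ reach 14 points on surname alone, which matches what Bill observed.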
BTW ~ Have you used the Omit Non-Duplicates tab feature yet?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
Hmmm... not sure that will work. I have some actual surnames that are three characters. Could it be changed so the entire surname (all parts) have to match in order to get the 7 points?
Yes... I did try the non-duplicates processing. Seemed to work great. Thanks much!
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
I am not keen on the whole Surname match idea, because it is quite common for double-barrel Surnames such as SMITH-JONES to be recorded as just SMITH or just JONES in earlier or later stages of a person's life. Also these part Surnames often appear as Forenames, and result in a Soundex match. If the whole Surname was treated as one Name, then the above scenarios would not match and score zero.
- mikegscoles
- Diamond
- Posts: 66
- Joined: 01 Sep 2006 21:27
- Family Historian: V6.1
Mike
Thanks for your reply.
In my tree of 1600 individuals I did as you suggested and added my father (as a duplicate) to my grandfather, and myself as a son. So we both appear twice in my tree. The plugin does not pick it up. I also added my father as an unrelated individual and me as a son to him. We both appear twice in the tree but the plugin still does not pick us up.
Changing the file root does not make a difference.
Mike
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Because you added your father to his father, he is seen as a sibling and ignored by the plugin.
If you enter your 3 generations totally separate from their originals, they should be picked up.
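The exclusion at work here can be sketched as follows. The relationship list (Spouses, Siblings, Parent/Child, Grand-Parent/Grand-Child) is taken from the plugin Help as quoted in this thread. The data model (dictionaries of parents and spouses) and the function name are hypothetical, and the plugin's real implementation may differ.

```python
def is_immediate_family(a, b, parents, spouses):
    """Return True if a and b are spouses, siblings, parent/child or grandparent/grandchild."""
    def parents_of(person):
        return parents.get(person, set())

    def grandparents_of(person):
        return {gp for p in parents_of(person) for gp in parents_of(p)}

    if b in spouses.get(a, set()) or a in spouses.get(b, set()):
        return True                                   # spouses
    if a in parents_of(b) or b in parents_of(a):
        return True                                   # parent/child
    if parents_of(a) & parents_of(b):
        return True                                   # siblings share a parent
    return a in grandparents_of(b) or b in grandparents_of(a)

# Hypothetical test data: the duplicate father was attached to the same grandfather.
parents = {
    "Father": {"Grandfather"},
    "Father (copy)": {"Grandfather"},
    "Son": {"Father"},
}
spouses = {}

print(is_immediate_family("Father", "Father (copy)", parents, spouses))           # True
print(is_immediate_family("Father", "Father (separate tree)", parents, spouses))  # False
```

The copy attached to the grandfather shares a parent with the original and is therefore treated as a sibling, whereas a copy in a completely separate tree has no such link.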
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
I can understand that for hyphenated names, but not so much for names with spaces in them. In my case, the surname is not Van or Meter it is 'Van Meter'. I don't think anyone with that name would go by just Van or just Meter.
That is OK, I can just add all my Van Meter individuals to the non-duplicates list.
Thanks,
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Mike ~ As the Help I referred to states:
"Immediate family are excluded from the results. These family members are Spouses, Siblings, Parent/Child, and Grand-Parent/Grand-Child."
So, your Father and Son entries added to your Grandfather will be excluded.
However, the unrelated duplicate entries for Father and Son should be listed, unless there are over 100 higher scoring duplicates.
The duplicated Son entries should score 7 points for matching Surname, and 6 points for matching Forename, plus 6 points for matching Birth Date. His matching Father should add 7 points for matching Surname, and 6 points for matching Forename. So the total for the duplicated Son entries should be 32 points.
The duplicated Father entries should score exactly the same total points.
Please double check you have correctly set the Sex=Male of both the new Father and Son entries.
[EDIT] Also ensure both these new Father and Son entries are Unrelated to any other Individuals, i.e. a COMPLETELY SEPARATE family tree.
Could you give some more details about what does appear in the Result Set, and confirm you are using Plugin Version 2.0 with default settings, and you are not selecting any subset of the Individual Records.
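The 32-point total quoted in this post can be checked with a few lines of arithmetic. The point values are those stated in the post; the variable names are hypothetical labels, not plugin settings.

```python
# Point values as quoted in the post; the names are hypothetical labels.
SURNAME_MATCH = 7
FORENAME_MATCH = 6
BIRTH_DATE_MATCH = 6

own_score = SURNAME_MATCH + FORENAME_MATCH + BIRTH_DATE_MATCH  # the Son's own details
father_score = SURNAME_MATCH + FORENAME_MATCH                  # added by his matching Father
total = own_score + father_score

print(own_score, father_score, total)  # 19 13 32
```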
- mikegscoles
- Diamond
- Posts: 66
- Joined: 01 Sep 2006 21:27
- Family Historian: V6.1
Mike
Thanks once again for your reply.
I think I have found the answer.
So that I could easily spot the unrelated duplicate additions I used lowercase for the christian names. When I changed this to uppercase they appeared in the duplicate list.
For the test I added unrelated individuals grandfather, father and myself with DOB's.
I am using v 2.0 default settings.
Thanks for all the help you give to members of this group.
Mike
- mikegscoles
- Diamond
- Posts: 66
- Joined: 01 Sep 2006 21:27
- Family Historian: V6.1
Sorry Mike, I wasn't clear. I used lowercase for the first letter of the christian name when the duplicates were not found.
Uppercase for the first letter of the forename when the duplicates were found.
Mike
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Mike ~ Thanks for that feedback. I will modify the next version to disregard the case of characters in names, so that John will match john, etc.
Bill ~ I have had a rethink about space separated Surname parts, and you are right, Surnames such as Van Dyke and De la Roche should be treated differently from hyphen separated Surnames such as Smith-Jones.
So the next version will strip space characters from Surnames, and only split Surnames into parts using punctuation characters such as hyphen.
This has the probably beneficial side-effect that Van Dyke will match Vandyke, etc.
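The name handling proposed for the next version can be sketched roughly as below. This is an illustration only: the function names are hypothetical, and only the hyphen is treated as a separator here, whereas the plugin may treat other punctuation characters the same way.

```python
import re

def surname_parts(surname):
    """Strip spaces, disregard case, then split on hyphens (the only separator assumed here)."""
    squeezed = re.sub(r"\s+", "", surname).casefold()
    return [part for part in squeezed.split("-") if part]

def surnames_share_part(surname_a, surname_b):
    """True if the two surnames have at least one normalised part in common."""
    return bool(set(surname_parts(surname_a)) & set(surname_parts(surname_b)))

def forenames_equal(forename_a, forename_b):
    """Compare forenames without regard to upper/lower case."""
    return forename_a.casefold() == forename_b.casefold()

print(surname_parts("Van Dyke"))                      # ['vandyke']
print(surnames_share_part("Van Dyke", "Vandyke"))     # True
print(surnames_share_part("Smith-Jones", "SMITH"))    # True
print(surnames_share_part("Van Meter", "Van Dyke"))   # False
print(forenames_equal("John", "john"))                # True
```

Because spaces are removed before any splitting, 'Van' no longer counts as a separate surname part, while double-barrel names still match on either half.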
- Valkrider
- Megastar
- Posts: 1534
- Joined: 04 Jun 2012 19:03
- Family Historian: V7
- Location: Lincolnshire
Mike
That will certainly help me too as mine used Le Fever and Lefever interchangeably throughout their lives.
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
That sounds great. I know my Van Meter family sometimes went by Vanmeter as well. The same thing happened for some of my other 'Van' families as well.
Thanks again,
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
The Find Duplicate Individuals Version 2.1 is now available for download.
This version offers the Name handling improvements discussed above.
It also starts to add more sticky settings to the Set Preferences tab, which should be completed in the next version.
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
Should version 2.1 be remembering the non-duplicate list from 2.0? Mine is empty.
Thanks,
Bill
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Mike,
How are place names handled?
I think I may be suffering from part of the place name swamping the other parts. A lot of my individuals are from different parts of the 'Isle of Man'. Is this being treated as 3 parts, or 1?
I also did a test with version 1.9 with 3 individuals selected.
Father with no dates at all, and two daughters with calculated DOB of 1840 and 1850 respectively.
The father was matched with several candidates who were born only after 1840. Should this have been possible?
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
The Find Duplicate Individuals Version 2.2 is now available for download, with a bug fix for the Non-Duplicates list. Sorry, Bill!
John ~ Only comma separated Place parts are matched, so Isle of Man should score at most 3 points, but other Place parts and the Date could add to that.
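The comma-separated Place matching described for John can be sketched as follows. The 3 points per matching part is an assumption inferred from the "at most 3 points" remark, and the function names are hypothetical.

```python
POINTS_PER_PLACE_PART = 3  # assumption inferred from "at most 3 points" for one part

def place_parts(place):
    """Split a Place on commas only, so 'Isle of Man' stays a single part."""
    return {part.strip() for part in place.split(",") if part.strip()}

def place_score(place_a, place_b):
    """Score each comma-separated part that the two Places have in common."""
    return POINTS_PER_PLACE_PART * len(place_parts(place_a) & place_parts(place_b))

print(place_parts("Isle of Man"))                                   # {'Isle of Man'}
print(place_score("Douglas, Isle of Man", "Ramsey, Isle of Man"))   # 3
print(place_score("Douglas, Isle of Man", "Douglas, Isle of Man"))  # 6
```

Under these assumptions 'Isle of Man' counts once, not three times, so it cannot swamp the other Place parts on its own.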
Regarding the Date Chronology case of Father and Daughters, you are right, and I have added some extra checks into the next version. They might not eliminate the candidate duplicate pairs, but should reduce their score somewhat.
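One way such a Date Chronology check could work is sketched below, using John's Father and Daughters case. The function name, the rule itself and the deduction of 5 points are all hypothetical. The source only says that extra checks should reduce the score somewhat.

```python
def chronology_penalty(candidate_birth_year, child_birth_years, deduction=-5):
    """Return a negative score adjustment when a candidate duplicate of a parent
    was born in or after the year the earliest known child was born.
    The deduction value is an arbitrary illustrative figure."""
    if candidate_birth_year is None or not child_birth_years:
        return 0  # nothing to compare, so no adjustment
    if candidate_birth_year >= min(child_birth_years):
        return deduction
    return 0

daughters = [1840, 1850]  # calculated birth years from John's test

print(chronology_penalty(1845, daughters))  # -5: born after the first daughter
print(chronology_penalty(1810, daughters))  # 0: plausible father
print(chronology_penalty(None, daughters))  # 0: candidate has no birth date
```

A penalty of this kind lowers the score of an implausible pair without removing it from the results.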
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
With 2.2 my non-duplicates list is still empty. Should it be? Do I have to rebuild it because of the error in 2.1?
Thanks,
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Yes, Bill, I am afraid you will have to rebuild the list, unless you happen to have a Project backup, including the ....fh_data\Plugin Data\Find Duplicate Individuals.nondups file.
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
No problem... just wanted to make sure I shouldn't see my old list before populating a new one.
Bill
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
One other possibility, if you are using Windows 7, is to right click the ....fh_data\Plugin Data\Find Duplicate Individuals.nondups file and choose Properties, then on the Previous Versions tab an earlier Version may be available.
- tatewise
- Megastar
- Posts: 27078
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
The Find Duplicate Individuals Version 2.3 is now available for download.
Sorry for the delay but vacations, visiting relations, school holidays, and the Olympics got in the way.
The main change is that all settings are now 'sticky' via the Set Preferences tab.
If you have edited the V2.2 User Preference Settings at the start of the Lua script, then preserve them before installing V2.3, and enter them into the new V2.3 Set Preferences tabs, where they will be saved in the Project ...\Plugin Data\Find Duplicate Individuals.dat file, which can be copied from Project to Project to transfer the settings.
A few extra Date Chronology checks have been added as suggested by John for Father and Daughters.
When comparing Place Name parts, any spaces and upper/lower case differences are disregarded, as they are for Individual Names.
If this version passes muster, then I will create structured Help & Advice pages and publish V3.0 in the Plugin Store.