* Find Duplicate Individuals Version 1.5+

BillH
Megastar
Posts: 2179
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Post by BillH » 23 Jul 2012 21:18

Mike,

The IntXxxxLastWrong is working great.  Thank you.

I'm seeing something I hadn't noticed before.

I have two individuals named Isaac /Van Meter/ and William /Van Meter/.  They are ending up with 14 points under the Individual column.  I have the plugin values set as follows:

IntIndiDeduction  =  -5
IntIndiMinimum    =   1
IntIndiMaximum    =  20
IntIndiThreshold  =  10
IntIndiSoundex    =   2
IntIndiForeWrong  =   0
IntIndiForeRight  =   3
IntIndiLastRight  =   7
IntIndiLastWrong  =  -0

Shouldn't these two only get 7 points for IntIndiLastRight and no other points?  They have no alternate names and they have no event matches (in fact one doesn't have any events at all).

Actually, this happens to a lot of the folks with the last name of Van Meter.  Is it somehow thinking the 'Van' is part of the forenames instead of being part of the surname?  Or, alternatively, is it giving 7 points for Van matching and another 7 points for Meter matching?

Thanks,

Bill

tatewise
Megastar
Posts: 27075
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise » 27 Jul 2012 18:47

mikegscoles ~ As explained under Other in the Help page, the Plugin excludes siblings and other very close family members for two reasons. (1) The similar names and relations of close family members would result in many false positives. (2) Such close family member duplicates are easily spotted by inspection of a family tree.
Who counts as a 'close family member' can be adjusted by editing the User Preference Settings at the head of the Plugin script. These will probably become available on the Set Preferences tab in a later version.
If you were to duplicate your father as well as yourself in a separate small tree, then I am certain the Plugin would report the duplication.

Bill ~ Yes, each matching part of a multi-part Surname scores 7 points. Currently, any part of a Name that is only 1 or 2 characters long is ignored, but I have wondered if Names & Titles such as Van and Sir might cause a problem. So probably 3-character Names should also be ignored, or another parameter is needed to set the minimum Name length.
BTW ~ Have you used the Omit Non-Duplicates tab feature yet?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
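As a rough illustration of the part-by-part Surname scoring described in this reply, here is a minimal Python sketch. It is not the plugin's Lua code: the function names, the split on spaces and hyphens, and the constant names are assumptions made for the example. Only the 7 points per matching part and the rule that 1 or 2 character parts are ignored come from the posts above.

```python
import re

# Assumed constants for illustration; the values come from the thread.
LAST_RIGHT = 7     # points per matching surname part (IntIndiLastRight)
MIN_PART_LEN = 3   # parts of 1 or 2 characters are ignored


def surname_parts(surname):
    """Split a surname on spaces and hyphens, dropping very short parts."""
    parts = re.split(r"[\s-]+", surname.strip())
    return [p for p in parts if len(p) >= MIN_PART_LEN]


def surname_score(a, b):
    """Award LAST_RIGHT for every part of surname a that also occurs in b."""
    other = set(surname_parts(b))
    return sum(LAST_RIGHT for p in surname_parts(a) if p in other)


print(surname_score("Van Meter", "Van Meter"))  # 14
```

Under these assumptions 'Van' and 'Meter' each count as a separate 3+ character part, which would explain the 14 points Bill reports for two unrelated Van Meter individuals.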

Post by BillH » 27 Jul 2012 22:33

Mike,

Hmmm... not sure that will work. I have some actual surnames that are three characters. Could it be changed so the entire surname (all parts) has to match in order to get the 7 points?

Yes... I did try the non-duplicates processing. Seemed to work great. Thanks much!

Bill

Post by tatewise » 28 Jul 2012 09:53

I am not keen on the whole Surname match idea, because it is quite common for double-barrelled Surnames such as SMITH-JONES to be recorded as just SMITH or just JONES in earlier or later stages of a person's life. Also, these part Surnames often appear as Forenames, and result in a Soundex match. If the whole Surname were treated as one Name, then the above scenarios would not match and would score zero.

mikegscoles
Diamond
Posts: 66
Joined: 01 Sep 2006 21:27
Family Historian: V6.1

Post by mikegscoles » 28 Jul 2012 13:08

Mike

Thanks for your reply.

In my tree of 1600 individuals I did as you suggested and added my father (as a duplicate) to my grandfather, and myself as a son. So we both appear twice in my tree. The plugin does not pick it up. I also added my father as an unrelated individual and me as a son to him. We both appear twice in the tree but the plugin still does not pick us up.

Changing the file root does not make a difference.

Mike

johnmorrisoniom
Megastar
Posts: 882
Joined: 18 Dec 2008 07:40
Family Historian: V7
Location: Isle of Man

Post by johnmorrisoniom » 28 Jul 2012 17:03

Because you added your father to his father, he is seen as a sibling and ignored by the plugin.
If you enter your 3 generations totally separate from their originals, they should be picked up.

Post by BillH » 28 Jul 2012 17:21

Mike,

I can understand that for hyphenated names, but not so much for names with spaces in them. In my case, the surname is not Van or Meter; it is 'Van Meter'. I don't think anyone with that name would go by just Van or just Meter.

That is OK, I can just add all my Van Meter individuals to the non-duplicates list.

Thanks,

Bill

Post by tatewise » 28 Jul 2012 22:07

Mike ~ As the Help I referred to states:
Immediate family are excluded from the results. These family members are Spouses, Siblings, Parent/Child, and Grand-Parent/Grand-Child.
So, your Father and Son entries added to your Grandfather will be excluded.

However, the unrelated duplicate entries for Father and Son should be listed, unless there are over 100 higher-scoring duplicates.
The duplicated Son entries should score 7 points for matching Surname, and 6 points for matching Forename, plus 6 points for matching Birth Date. His matching Father should add 7 points for matching Surname, and 6 points for matching Forename. So the total for the duplicated Son entries should be 32 points.
The duplicated Father entries should score exactly the same total points.
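The 32-point total can be checked by adding up the components listed above. This is only the arithmetic from the post written out in Python; the dictionary labels are descriptive text for the example, not plugin setting names.

```python
# Point values as quoted in the post; labels are illustrative only.
son_points = {
    "own surname match": 7,
    "own forename match": 6,
    "own birth date match": 6,
    "father's surname match": 7,
    "father's forename match": 6,
}

total = sum(son_points.values())
print(total)  # 32
```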

Please double check you have correctly set the Sex=Male of both the new Father and Son entries.
[EDIT] Also ensure both these new Father and Son entries are Unrelated to any other Individuals, i.e. a COMPLETELY SEPARATE family tree.

Could you give some more details about what does appear in the Result Set, and confirm you are using Plugin Version 2.0 with default settings, and you are not selecting any subset of the Individual Records.

Post by mikegscoles » 29 Jul 2012 11:31

Mike

Thanks once again for your reply.

I think I have found the answer.

So that I could easily spot the unrelated duplicate additions I used lowercase for the christian names. When I changed this to uppercase they appeared in the duplicate list.

For the test I added my grandfather, my father and myself as unrelated individuals, with DOBs.

I am using v 2.0 default settings.

Thanks for all the help you give to members of this group.

Mike

Post by mikegscoles » 29 Jul 2012 12:12

Sorry Mike, I wasn't clear. I used lowercase for the first letter of the christian name when the duplicates were not found.

Uppercase for the first letter of the forename when the duplicates were found.

Mike

Post by tatewise » 29 Jul 2012 13:49

Mike ~ Thanks for that feedback. I will modify the next version to disregard the case of characters in names, so that John will match john, etc.

Bill ~ I have had a rethink about space separated Surname parts, and you are right, Surnames such as Van Dyke and De la Roche should be treated differently from hyphen separated Surnames such as Smith-Jones.
So the next version will strip space characters from Surnames, and only split Surnames into parts using punctuation characters such as hyphen.
This has the probably beneficial side-effect that Van Dyke will match Vandyke, etc.
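The planned Name handling could look something like the following Python sketch. It is an illustration of the idea only, not the plugin's Lua code: the function names and the exact set of splitting characters (hyphen, slash, comma, full stop) are assumptions.

```python
import re


def normalise_surname(surname):
    """Strip spaces, ignore case, then split only on punctuation."""
    compact = re.sub(r"\s+", "", surname).lower()
    # Assumed punctuation set; the post only says "such as hyphen".
    return [part for part in re.split(r"[-/.,]+", compact) if part]


def forenames_match(a, b):
    """Compare forenames without regard to upper/lower case."""
    return a.strip().lower() == b.strip().lower()


print(normalise_surname("Van Dyke"))     # ['vandyke']
print(normalise_surname("Vandyke"))      # ['vandyke']
print(normalise_surname("Smith-Jones"))  # ['smith', 'jones']
print(forenames_match("John", "john"))   # True
```

With this approach a space-separated Surname becomes a single part, so it can score only once, while a hyphenated Surname still matches on either half.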

Valkrider
Megastar
Posts: 1533
Joined: 04 Jun 2012 19:03
Family Historian: V7
Location: Lincolnshire

Post by Valkrider » 29 Jul 2012 14:55

Mike

That will certainly help me too as mine used Le Fever and Lefever interchangeably throughout their lives.

Post by BillH » 29 Jul 2012 18:31

Mike,

That sounds great. I know my Van Meter family sometimes went by Vanmeter as well. The same thing happened for some of my other 'Van' families as well.

Thanks again,

Bill

Post by tatewise » 29 Jul 2012 23:48

The Find Duplicate Individuals Version 2.1 is now available for download.

This version offers the Name handling improvements discussed above.

It also starts to add more sticky settings to the Set Preferences tab, which should be completed in the next version.

Post by BillH » 30 Jul 2012 01:27

Mike,

Should version 2.1 be remembering the non-duplicate list from 2.0? Mine is empty.

Thanks,

Bill

Post by johnmorrisoniom » 30 Jul 2012 11:56

Mike,
How are place names handled?
I think I may be suffering from part of the place name swamping the other parts. A lot of my individuals are from different parts of the 'Isle of Man'. Is this being treated as 3 parts, or 1?
I also did a test with version 1.9 with 3 individuals selected.
Father with no dates at all, and two daughters with calculated DOB of 1840 and 1850 respectively.
The father was matched with several candidates who were only born after 1840, should this have been possible?

Post by tatewise » 30 Jul 2012 12:50

The Find Duplicate Individuals Version 2.2 is now available for download with bug fix for Non-Duplicates list - Sorry Bill!

John ~ Only comma separated Place parts are matched, so Isle of Man should score at most 3 points, but other Place parts and the Date could add to that.
Regarding the Date Chronology case of Father and Daughters, you are right, and I have added some extra checks into the next version. They might not eliminate the candidate duplicate pairs, but should reduce their score somewhat.
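To illustrate the comma-separated Place matching described above, here is a minimal Python sketch. The function names are invented for the example and the real plugin is a Lua script, so treat this as an assumption about the general approach rather than its actual code. No point values are shown because the thread does not give the per-part score.

```python
def place_parts(place):
    """Split a place name on commas only; spaces inside a part are kept."""
    return [part.strip() for part in place.split(",") if part.strip()]


def shared_place_parts(a, b):
    """Return the parts of place a that also appear in place b."""
    other = set(place_parts(b))
    return [part for part in place_parts(a) if part in other]


print(place_parts("Douglas, Isle of Man"))
# ['Douglas', 'Isle of Man']
print(shared_place_parts("Douglas, Isle of Man", "Peel, Isle of Man"))
# ['Isle of Man']
```

So 'Isle of Man' counts as one part, not three, and two individuals from different towns on the island share only that single part.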

Post by BillH » 30 Jul 2012 19:37

Mike,

With 2.2 my non-duplicates list is still empty.  Should it be?  Do I have to rebuild it because of the error in 2.1?

Thanks,

Bill

Post by tatewise » 30 Jul 2012 21:29

Yes, Bill, I am afraid you will have to rebuild the list, unless you happen to have a Project backup, including the ....fh_data\Plugin Data\Find Duplicate Individuals.nondups file.

Post by BillH » 30 Jul 2012 22:42

Mike,

No problem... just wanted to make sure I shouldn't see my old list before populating a new one.

Bill

Post by tatewise » 30 Jul 2012 22:46

One other possibility, if you are using Windows 7, is to right click the ....fh_data\Plugin Data\Find Duplicate Individuals.nondups file and choose Properties, then on the Previous Versions tab an earlier Version may be available.

Post by tatewise » 07 Oct 2012 16:50

The Find Duplicate Individuals Version 2.3 is now available for download.

Sorry for the delay but vacations, visiting relations, school holidays, and the Olympics got in the way.

The main change is that all settings are now 'sticky' via the Set Preferences tab.

If you have edited the V2.2 User Preference Settings at the start of the Lua script, then preserve them before installing V2.3, and enter them into the new V2.3 Set Preferences tabs. There they will be saved in the Project ...\Plugin Data\Find Duplicate Individuals.dat file, which can be copied from Project to Project to transfer the settings.

A few extra Date Chronology checks have been added as suggested by John for Father and Daughters.
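A chronology check of the kind John asked for might be sketched as below. This is a guess at the general idea in Python, not the plugin's Lua logic: the function name and the minimum parent age of 12 are assumptions, and the plugin is described earlier as reducing the score of such pairs rather than rejecting them outright.

```python
def could_be_same_parent(candidate_birth_year, child_birth_years, min_parent_age=12):
    """Return False if the candidate was born too late to be the parent.

    min_parent_age is an assumed threshold chosen for this example.
    """
    if candidate_birth_year is None or not child_birth_years:
        return True  # no dates available, so nothing can be ruled out
    return candidate_birth_year + min_parent_age <= min(child_birth_years)


# Father has no dates; his daughters were born in 1840 and 1850.
print(could_be_same_parent(1845, [1840, 1850]))  # False
print(could_be_same_parent(1800, [1840, 1850]))  # True
```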

When comparing Place Name parts, any spaces and upper/lower case differences are disregarded, as they are for Individual Names.

If this version passes muster, then I will create structured Help & Advice pages and publish V3.0 in the Plugin Store.