* Find Duplicate Individuals (All Relations)
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
All previous Find Duplicate Individuals versions only considered each Individual's first Mother, Father, Spouse, and Child.
This could lead to duplicates being missed if, for example, the 1st Child of one Individual is entered as the 2nd Child of the other Individual.
I have developed a version that considers ALL Mothers, Fathers, Spouses, and Children, and uses the score from the best matching pairs.
However, I am concerned that it may extend the Plugin run-time.
This version is available from WiP find_duplicate_individuals.
It will install alongside the Plugin Store Version 3.0, but uses the same Preferences, and Omit Non-Duplicates list, etc.
Both versions will overwrite the same saved Result Set data file.
Please let me know how run-time is affected, and if there are any problems, or improvements that you notice.
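As a rough illustration of the "best matching pairs" idea described above, here is a minimal Python sketch. It is not the Plugin's code (the Plugin is written in Lua); the record layout, the `name_score` rule, and the function names are all hypothetical and only show why comparing every pairing can catch a match that first-versus-first misses.

```python
from itertools import product

def best_pair_score(relations_a, relations_b, score):
    """Return the highest score over every pairing of one relation from each list."""
    return max(
        (score(a, b) for a, b in product(relations_a, relations_b)),
        default=0,  # no relations on one side means nothing to score
    )

def name_score(a, b):
    # Hypothetical scoring: 1 point per matching field; not the Plugin's real rules.
    fields = ("given", "surname", "birth_year")
    return sum(1 for key in fields if a.get(key) and a.get(key) == b.get(key))

children_a = [
    {"given": "Ann", "surname": "Tate", "birth_year": 1850},
    {"given": "John", "surname": "Tate", "birth_year": 1852},
]
children_b = [
    {"given": "John", "surname": "Tate", "birth_year": 1852},
    {"given": "Mary", "surname": "Tate", "birth_year": 1855},
]

# Old approach: compare only the first Child on each side.
first_only = name_score(children_a[0], children_b[0])
# New approach: compare every Child with every Child and keep the best score.
all_pairs = best_pair_score(children_a, children_b, name_score)
```

The number of comparisons grows with the product of the two list lengths, which is why run-time is the obvious thing to watch.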
ID:6590
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike,
Just tried it on my main tree of 32500+ individuals. Run time was 1949 secs (32.5 mins), so no major impact on run time.
Highest score 55.
Highest Ind score 37.
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
On my system, when I downloaded the new version, it replaced the old version since it had the same name. This was just as fast as the old version.
On my tree with 10,327 individuals, it ran in 1 min 25 sec. The highest score was 42 and the highest individual score was 29.
I did find an individual where he was shown as the first child in one family and the second child in the other.
Bill
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Bill, sorry, I guess you never downloaded the Plugin Store Version named Find Duplicate Individuals, and still had the old WiP version named find_duplicate_individuals.
You can still obtain Find Duplicate Individuals Version 3.0 from the FH Plugin Store.
With positive feedback from you both, I plan to put this latest version into the Plugin Store next week, unless anyone else has problems.
The change for multiple instances of relations was fairly straightforward.
However, the Marriage Event Dates/Places have a small residual problem.
The Plugin data currently only supports one Date/Place entry for a Marriage per Individual.
With only the 1st Spouse being considered, this was not a problem.
But multiple Spouses can result in multiple Marriage Dates/Places.
At present the Plugin utilises the Marriage Date/Place of the earliest Spouse for which a Marriage Date/Place exists.
When synthesising missing Dates, the Plugin uses a Marriage Date to synthesise other Dates only if there is just one Spouse.
I suspect none of the above limitations have a significant impact on detecting duplicates.
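The two Marriage rules above could be sketched roughly as follows. This is an illustrative Python sketch, not the Plugin's Lua code: the dictionary layout and function names are hypothetical, and it assumes "earliest Spouse" means the first Spouse in recorded order.

```python
def marriage_for_comparison(spouses):
    """Return (date, place) from the earliest-listed spouse that has marriage data, else None."""
    for spouse in spouses:  # assumed to be in recorded order
        marriage = spouse.get("marriage")
        if marriage and (marriage.get("date") or marriage.get("place")):
            return marriage.get("date"), marriage.get("place")
    return None

def marriage_date_for_synthesis(spouses):
    """Only offer a marriage date for estimating other dates when there is exactly one spouse."""
    if len(spouses) != 1:
        return None
    marriage = spouses[0].get("marriage") or {}
    return marriage.get("date")

two_spouses = [
    {"name": "First Spouse"},  # no marriage details recorded
    {"name": "Second Spouse", "marriage": {"date": "1870", "place": "Torbay"}},
]
one_spouse = [{"name": "Only Spouse", "marriage": {"date": "1865", "place": None}}]
```

With `two_spouses`, the comparison uses the second Spouse's Marriage because the first has none, but no Marriage Date is offered for synthesising other Dates.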
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
No problem at all. I just had it override the old version. I just now cleaned out the old test version and downloaded and installed version 3.0 from the plugin store. I look forward to getting 3.1 from the store once it is out there.
Thanks,
Bill
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
There is a revised Find Duplicate Individuals V3.1a in the WiP page.
This download will overwrite the previous V3.1 WiP download of find_duplicate_individuals.
This version not only considers all instances of Relations but also all instances of Events.
So if there are multiple Birth or Baptism/Christening or Marriage or Death/Burial Events then the best matches will be used in the Results.
Can you please assess its impact on run-time on your large databases, and report any other side effects?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
On my database of 10,340, version 3.0 ran in 83 seconds and version 3.1a ran in 81 seconds. I don't have 3.1 still installed to compare it to.
Bill
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Plugin Store Find Duplicate Individuals V3.1 is now available.
It is a slight upgrade from the V3.1a in WiP, with some small changes to improve run-time, and function prototypes to improve the Lua code structure.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Mike,
At one of the version 2 stages it was possible to keep the plugin window open whilst checking, to make life easier when adding pairs to the omit data. This seems to have disappeared, but I can't place in which version it went away. Any chance of this returning, or did it interfere with something else?
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
Plugin Store Find Duplicate Individuals V3.1 is now available.
Mike,
Just downloaded the latest version. Although it runs OK, the Progress Bar, percentage, and Time displays do not update. The Individual ID display does update.
I'm not really bothered myself, but perhaps an option to turn off the Progress Bar, percentage, and Time displays might speed things up even more?
Gerry
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
John ~ I don't believe that would have ever been possible, because to display the Result Set any Plugin must close.
If you open any Plugin, then FH is locked, and any Result Set on display cannot be manipulated.
I wonder if you are recalling the ability to add pairs of Individuals from the Result Set to a Named List named Non-Duplicates.
That technique was removed from V3 because it didn't just exclude the pair, but excluded the two Individuals separately, which users did not like.
Have you seen FAQ How do I Avoid Long Run-Times?
Gerry ~ I have once or twice today experienced the same Progress Bar problem, but not sure why.
I will investigate.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
This download will overwrite the previous V3.1a WiP download of find_duplicate_individuals.
It uses a revised method of loading its database prior to running comparisons, which seems faster.
This method also avoids a bug in the ~FATH> & ~MOTH> shortcuts, which do not find both members of same-sex partnerships.
It also fixes the Progress Bar bug reported by Gerry.
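To illustrate why sex-based parent lookups can miss a partner, here is a small Python sketch. It is purely an assumed model: how the real ~FATH> and ~MOTH> shortcuts resolve parents internally is not stated in this thread, and the record layout and function names are hypothetical.

```python
def parents_via_shortcuts(family):
    """Assumed model of sex-based lookups: at most one male 'father' and one female 'mother'."""
    father = next((p for p in family["partners"] if p["sex"] == "M"), None)
    mother = next((p for p in family["partners"] if p["sex"] == "F"), None)
    return [p["name"] for p in (father, mother) if p]

def parents_via_family_links(family):
    """Walk every partner link in the family, regardless of sex."""
    return [p["name"] for p in family["partners"]]

same_sex_family = {
    "partners": [
        {"name": "Jane", "sex": "F"},
        {"name": "Mary", "sex": "F"},
    ]
}

found_by_shortcuts = parents_via_shortcuts(same_sex_family)
found_by_links = parents_via_family_links(same_sex_family)
```

In this model the shortcut-style lookup returns only one of the two partners, whereas walking the family links returns both.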
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
....
It also fixes the Progress Bar bug reported by Gerry.
Thanks for the quick response, Mike.
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
Gerry Newnham said:
tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
....
It also fixes the Progress Bar bug reported by Gerry.
Thanks for the quick response, Mike.
Sorry, Mike, I think I spoke too soon. The plugin hangs at loading database.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?
Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.
These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH Lua interpreter.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
Mike,
tatewise said:
Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?
Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.
These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH LUA interpreter.
The database has 7314 individuals and 2077 families. Version 3.1 (listed as version 8 on the plugin popup menu) runs in 2m 25 secs.
v3.1c (listed as version 10) stalls at 'Loading Record ID 1'.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Gerry ~ thank you for the feedback.
I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?
Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?
I might need to post a version of the Plugin with more diagnostic progress messages in order to discover the problem.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
Gerry ~ thank you for the feedback.
I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?
Mike - screenshot attached of Plugin Popup Menu:
tatewise said:
Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?
Id1 is 4 generations down from the root (me!), and has 2 parents, one spouse, and 10 children. He has only one of each of the events you mention. He has never given problems before and I haven't changed his details for some time.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Gerry - Those Plugin filename suffixes are caused by the download cache in your browser, and can be ignored.
It is quite convenient, as you can run any of the recent Plugin versions.
Eventually they can all be Deleted by right-clicking the Plugin name.
Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
Gerry -
....
Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.
Mike, that version seems to work OK. It took just under 2 mins to load the records and then a further minute to check them.
I notice that the progress bar wasn't updated until the checking phase. Is this intended?
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Please can you all try Find Duplicate Individuals V3.1e via the WiP page, which has revised the Progress Bar messages.
Now that I have a better understanding of how various size databases interact with the Plugin revised design, I have altered the Progress Bar methodology to both maintain useful progress messages, and minimise their effect on run-time.
The net effect should be reduced run-time compared with earlier versions ~ I hope!
I have also made the Record Id digits fixed width with leading zeros. Is this preferable?
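One common way to keep progress messages useful while limiting their cost is to refresh the display only after a minimum time interval has passed. The Python sketch below shows that general technique with fixed-width, zero-padded Record Ids; it is not the Plugin's actual Lua implementation, and the class name, parameters, and message layout are hypothetical.

```python
import time

class ThrottledProgress:
    """Show a progress message at most once per min_interval seconds."""

    def __init__(self, total, min_interval=0.25, id_width=5,
                 clock=time.monotonic, show=print):
        self.total = total
        self.min_interval = min_interval
        self.id_width = id_width      # fixed width stops the message jiggling
        self.clock = clock            # injectable so the timing can be tested
        self.show = show
        self._last = None             # time of the most recent refresh

    def update(self, done, record_id):
        """Refresh the display if enough time has passed; return True if shown."""
        now = self.clock()
        is_last = done >= self.total  # always show the final state
        too_soon = self._last is not None and now - self._last < self.min_interval
        if too_soon and not is_last:
            return False
        self._last = now
        percent = 100 * done // self.total
        self.show(f"Record Id {record_id:0{self.id_width}d}  {percent:3d}%")
        return True
```

Called once per record inside the main loop, `update` does almost no work on most iterations, so the display cost no longer scales with the number of records.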
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
On my database of 10,344 individuals, version 3.1e takes 88 seconds. Version 3.1 takes 80 seconds. Just a bit slower in 3.1e, but still very fast.
It doesn't really matter, but I like it without the leading zeroes.
The progress window keeps taking back focus so that I can't do anything else on my computer while the plugin runs. Has it always been that way?
Bill
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Can all of you please give me a rough idea of how long the Loading Records phase takes versus the Scoring Records phase?
Please also provide the number of Individuals in your database.
The leading zeros were an experiment to stop the Progress Bar message jiggling as the number of Record Id digits changes.
I will use a different technique in the next version that has no leading zeros.
I think the Progress Bar has been taking focus this way since Version 2.4.
However, I can see it is annoying, so I will change it back in the next Version.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Please can you all try Find Duplicate Individuals V3.1f via the WiP page, which has revised the Progress Bar messages again, plus a few other minor changes, as mentioned above.
When convenient, can you please supply the information requested in my previous posting above.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
On my database of 10,345, version 3.1f took 25 seconds for the loading phase and 63 seconds for the scoring phase for a total of 88 seconds.
I found what might be a bug. I clicked on the Stop Finding Duplicates button when it was at 2 seconds and it kept going until it got to about 55 seconds before it stopped. It used to stop immediately.
It doesn't really matter, but I think I preferred the way the record id used to count up sequentially from 1 to the way 3.1f displays the record ids in no particular order.
Bill