* Find Duplicate Individuals (All Relations)
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Yes, sorry Bill, that is a bug.
Somehow the line of code detecting the Stop in the Loading phase went AWOL.
The shorter Loading phase, using the new recursive algorithm, processes Record Ids in what appears to be a random order.
An alternative would be to simply display a count of how many records have been Loaded, which will always increase sequentially.
The longer Scoring phase still ascends sequentially through Record Ids from 1, but to reduce Progress Bar overhead the display is only updated about once per second or once per percent, so some Record Ids may be skipped in the display.
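The two points above (checking for Stop on every record, and only redrawing the Progress Bar about once per second or percent) can be sketched in Python roughly as follows. This is not the Plugin's LUA code and uses no FH API: `ThrottledProgress` and `scan_records` are hypothetical names, and "whichever comes first" is an assumed reading of "once per second or percent".

```python
import time


class ThrottledProgress:
    """Hypothetical progress reporter that redraws only when the whole-percent
    value advances or when `interval` seconds have passed, whichever comes first."""

    def __init__(self, total, interval=1.0, clock=time.monotonic):
        self.total = total
        self.interval = interval
        self.clock = clock
        self.last_time = clock()
        self.last_percent = -1
        self.stopped = False
        self.updates = []  # (records done, percent) for each redraw

    def request_stop(self):
        """Called when the user presses Stop."""
        self.stopped = True

    def step(self, done):
        """Return False once Stop has been requested so the caller can exit its loop."""
        if self.stopped:
            return False
        percent = done * 100 // self.total
        now = self.clock()
        if percent > self.last_percent or now - self.last_time >= self.interval:
            self.last_percent = percent
            self.last_time = now
            self.updates.append((done, percent))  # stand-in for redrawing the bar
        return True


def scan_records(record_ids, progress, stop_after=None):
    """Hypothetical loading loop that checks for Stop on every record.
    `stop_after` simulates the user pressing Stop after that many records."""
    processed = []
    for n, record_id in enumerate(record_ids, start=1):
        if stop_after is not None and n > stop_after:
            progress.request_stop()
        if not progress.step(n):
            break  # honour Stop immediately rather than finishing the phase
        processed.append(record_id)
    return processed


if __name__ == "__main__":
    bar = ThrottledProgress(total=10_000)
    done = scan_records(range(1, 10_001), bar)
    print(len(done), "records processed,", len(bar.updates), "redraws")
```

On a run that finishes in well under a second, this redraws at most about 101 times regardless of project size, which is why individual Record Ids can appear to be skipped in the display.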
I realise that on your system the run-time has increased by about 10% relative to Version 3.1, whereas on mine it has fallen by about 10%.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
tatewise said:
Please can you all try Find Duplicate Individuals V3.1f via the WiP page, which has revised the Progress Bar messages again, plus a few other minor changes, as mentioned above.
When convenient, can you please supply the information requested in my previous posting above.
Loading: approx. 2 min 10 sec
Scoring: 1 min 10 sec
Individuals: 7314
Families: 2077
Is scoring the same as checking? I didn't see any checking message unless it flashed up very quickly.
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Thanks Gerry, yes. I have changed the earlier request to refer to Scoring Records.
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike.
Version 3.1f
Individuals: 32864
Families: 8991
Loading time: 13 min 15 sec. Record Id and time update at random intervals; Record Ids are non-sequential.
Scoring time: 23 min 05 sec. Record Id and time update at fairly regular intervals; Record Ids are sequential.
Total run time: 36 min 20 sec
Time to display Result Set: less than 1 min
Edit:
I tried pre-sorting records by Id, but that had no effect on the random order of the loaded Record Ids (I would guess they are loaded sequentially, in the order they are accessed in the GEDCOM file).
However, pressing 'Stop finding duplicates' had no effect at all. Even clicking the X at the top right of the window does not stop the Plugin; it just closes the progress window, which then cannot be reopened.
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
As for the Record Id display, it is completely up to you. I can live with it no matter how it is done. The Plugin is so fast on my system with my database that it really doesn't matter. Whatever speeds things up for those with longer processing times is the way to go.
Thanks!
Bill
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Please can you all try Find Duplicate Individuals V3.1g via the WiP page.
This reverts to the original sequential processing of Record Ids that many prefer, especially as the separate Loading and Scoring phases did not deliver the run-time reductions I had expected.
Could you please provide your usual feedback on project size and run-time?
Gerry ~ I am fascinated that your Project is the only one that took longer in the Loading phase than the Scoring phase, and is one of the smaller Projects.
The Loading phase placed demands on the FH-to-Plugin interface to extract data from the Project, whereas the Scoring phase was almost exclusively internal Plugin processing with little FH interaction until the Result Set display.
Is there anything unusual about your Project or PC set-up that might account for this anomaly?
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
On my database of 10,353 individuals:
Version 3.1: 98 seconds
Version 3.1g: 102 seconds
Thanks for the change so that the progress window no longer takes focus. I was able to type this post while the Plugin was running.
Thanks!
Bill
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Bill ~ In your posting "Re: Find Duplicate Individuals (All Relations)" of 26/11/12, you indicated that V3.1 took 80 s.
Do the timings tend to vary depending on what else you are doing?
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
I was surprised by this increase in duration as well. Usually, I find that the plugin is pretty consistent on its duration. I don't find that the timing varies depending on what I'm running.
Just now I ran them again and version 3.1 took 85 seconds and version 3.1g took 87 seconds.
Not sure what was going on this morning. I wasn't running any additional programs I was aware of. Maybe my antivirus was doing a background scan or something?
Bill
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
tatewise said:
Gerry ~ I am fascinated that your Project is the only one that took longer in the Loading phase than the Scoring phase, and is one of the smaller Projects.
The Loading phase placed demands on the FH to Plugin interface to extract data from the Project.
Whereas, the Scoring phase was almost exclusively internal Plugin processing with little FH interaction until the Result Set display.
Is there anything unusual about your Project or PC set-up that might account for this anomaly?
Mike, I am running FH in Windows XP under Parallels on my iMac. Not sure why this should make any difference to the relative speeds for each section. Would the number of Sources attached to an individual make any difference? There are 2970 sources and nearly 3200 multimedia items.
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
It is not so much the number of Source Records as the number of Citations that might be the cause.
What might be useful is to run the Show Project Statistics Plugin and look at the Records tab that shows the number of Source Records and the number of (Citation) Links to them.
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
I'm not sure if it would help for comparison reasons, but I ran the Show Project Statistics plugin and my database has 1,952 sources with 49,661 links.
Bill
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
tatewise said:
It is not so much the number of Source Records as the number of Citations that might be the cause.
What might be useful is to run the Show Project Statistics Plugin and look at the Records tab that shows the number of Source Records and the number of (Citation) Links to them.
Here are the results:


- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Gerry ~ Nothing in those statistics looks particularly unusual.
There are rather more Flags than I tend to use, but that should not matter much.
Would it be interesting if I created a temporary Plugin designed to report the run-time of Plugin calls to the FH API interface versus purely internal Plugin LUA code?
This could be run by various users and we could compare times.
Quite unrelated, I notice from your statistics that you have someone Born aged 18 and someone in a Census aged -2.
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
tatewise said:
Gerry ~ Nothing in those statistics looks particularly unusual.
There are rather more Flags than I tend to use, but that should not matter much.
Would it be interesting if I created a temporary Plugin designed to report the run-time of the Plugin FH API interface versus purely Plugin LUA Code?
This could be run by various users and we could compare times.
Sounds like a good idea!
tatewise said:
Quite unrelated, I notice from your statistics that you have someone Born aged 18 and someone in a Census aged -2.
Yes, I had spotted these, just a question of finding them! Any suggestions?
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
I used a Query to find the age at birth and excluded those where the age was null (I had someone born aged 102).
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
Unfortunately, a simple Query isn't showing up the culprits, as they must be buried somewhere in the Facts/Events.
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
The age statistics are all under Age@, so, as the Help & Advice mentions, they are derived from the AgeAt() function and do NOT exist as Fact Date fields.
So make a Custom copy of the Standard 'All Facts' Query, and add a Column for Age At using =AgeAt(FactOwner(%FACT%,1,MALES_FIRST),%FACT.DATE%).
Then in the Result Set click on the Age At Column to sort into ascending order, or Alt key click to sort into descending order.
Click on the Fact Column to sort the Facts by Name, i.e. bring all the Birth Facts together.
Alternatively, use the Columns tab Sort option at the bottom.
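For anyone who prefers to check exported data outside FH, the same idea (compute an age for every fact, then sort so the odd values surface) can be sketched in Python. This is not an FH feature: it assumes a hypothetical list of (name, fact type, birth date, fact date) tuples, and FH's own AgeAt() may round differently, so treat the ages as approximate.

```python
from datetime import date


def age_at(birth, when):
    """Completed years from `birth` to `when`, rounded down, so a fact dated
    before the birth gives a negative age."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years


def suspicious_ages(facts):
    """`facts` holds hypothetical (name, fact_type, birth_date, fact_date) tuples.
    Keeps any negative age and any Birth fact whose age is not 0, sorted by age
    (roughly the equivalent of clicking the Age At column)."""
    rows = []
    for name, fact_type, birth, when in facts:
        if birth is None or when is None:
            continue  # no age can be computed, like excluding null ages
        age = age_at(birth, when)
        if age < 0 or (fact_type == "Birth" and age != 0):
            rows.append((age, fact_type, name))
    return sorted(rows)


if __name__ == "__main__":
    # Made-up sample records, purely for illustration.
    sample = [
        ("A", "Birth", date(1850, 3, 1), date(1868, 5, 1)),
        ("B", "Census", date(1853, 1, 1), date(1851, 3, 30)),
        ("C", "Census", date(1800, 1, 1), date(1841, 6, 6)),
    ]
    for age, fact_type, name in suspicious_ages(sample):
        print(name, fact_type, age)
```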
- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Would any of you like to try the Assess Plugins V1.0 Plugin via the WiP page?
This Plugin has two phases.
The first phase loops through each Individual Record and performs a variety of FH API interface functions (all read only) but does very little else.
The second phase loops through a variety of LUA local script statements with no FH API at all.
The run-time of the two phases is reported at the end.
The phases take about 2 secs each for 2,000 Individuals and 1,000,000 loops on my Windows 7 Home Premium SP1 64-bit PC with Pentium Dual-Core 2.6 GHz CPU and 3 GB RAM.
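The two-phase assessment described above can be sketched roughly as follows. This is Python rather than the Plugin's LUA, and the "API" phase calls a hypothetical stand-in callable (`api`, defaulting to the trivially cheap `str`) instead of real FH API functions, so the absolute times mean nothing here; the sketch only shows the structure: time each phase separately, normalise the API time per 1,000 records, and report the ratio between the phases.

```python
import time


def time_api_phase(n_records, api):
    """Phase 1: call a (hypothetical) read-only API once per record, doing little else."""
    start = time.perf_counter()
    for record_id in range(1, n_records + 1):
        api(record_id)
    return time.perf_counter() - start


def time_local_phase(n_loops):
    """Phase 2: pure local computation with no API calls at all."""
    start = time.perf_counter()
    checksum = 0
    for i in range(n_loops):
        checksum += i % 7
    return time.perf_counter() - start, checksum


def assess(n_records, n_loops, api=str):
    """Report both phase times, API time per 1,000 records, and the API:local ratio.
    `api=str` is only a placeholder for a real interface call."""
    api_secs = time_api_phase(n_records, api)
    local_secs, _ = time_local_phase(n_loops)
    return {
        "api_secs": api_secs,
        "local_secs": local_secs,
        "api_secs_per_1000": api_secs * 1000 / n_records,
        "ratio": api_secs / local_secs if local_secs > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Sizes mirror those quoted above: 2,000 records and 1,000,000 loops.
    for key, value in assess(2000, 1_000_000).items():
        print(f"{key}: {value:.3f}")
```

Comparing the ratio between phases, rather than raw times, helps separate interface overhead from general CPU speed when results come from different PCs.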
- BillH
- Megastar
- Posts: 2179
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
Here are my results for the Assess Plugins plugin:

Bill
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
tatewise said:
The age statistics are all under Age@ so as the Help & Advice mentions they are derived from the AgeAt() function and do NOT exist as Fact Date fields.
.....
Thanks, Mike. That showed up the problems nicely.
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
Find Duplicate Individuals (All Relations)
Mike,
Here are the results for my setup (Win XP running under Parallels on an iMac):

- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike.
An odd result on my XP SP3 2.4 GHz Quad Core.

The data file has 33018 Individuals and 9040 Families.
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
You can't post an image in an edit, so this post serves as an edit to my previous one.
I have found the problem.
Because the plugin window keeps grabbing the focus, I had managed to cancel the run unintentionally.
The correct result should have been:

- tatewise
- Megastar
- Posts: 27075
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Find Duplicate Individuals (All Relations)
Gerry & John ~ Your assessments are remarkably similar, given that the results are rounded down to the nearest second.
The ratio between FH API and LUA Script is approaching 10 to 1 on both PCs.
Whereas Bill's and my results have a ratio much nearer 1 to 1.
John's 2.4 GHz Quad-Core CPU is very similar to my 2.6 GHz Dual-Core CPU, yet the FH API time per 1,000 Individuals is about 10 times mine (& Bill's).
How much RAM do you each have on your PC?
Or maybe it is a Windows XP versus Windows 7 characteristic?