Find Duplicate Individuals (All Relations)
Posted: 15 Nov 2012 14:01
by tatewise
All previous Find Duplicate Individuals versions only considered each Individual's first Mother, Father, Spouse, and Child.
This could lead to duplicates being missed if, for example, the 1st Child of one Individual is entered as the 2nd Child of the other Individual.
I have developed a version that considers ALL Mothers, Fathers, Spouses, and Children, and uses the score from the best matching pairs.
However, I am concerned that it may extend the Plugin run-time.
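To illustrate the idea for readers, here is a minimal Python sketch (not the Plugin's Lua code) of scoring every pairing of relations and keeping the best score, compared with looking only at the first relation. The `name_score` function and the sample names are hypothetical stand-ins; the Plugin's real scoring rules are not shown in this thread.

```python
from difflib import SequenceMatcher
from itertools import product


def name_score(a: str, b: str) -> float:
    """Hypothetical similarity in the range 0..1 (assumption, not the Plugin's scoring)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_only_score(rel_a: list, rel_b: list) -> float:
    """Old behaviour: compare only the first relation on each side."""
    if not rel_a or not rel_b:
        return 0.0
    return name_score(rel_a[0], rel_b[0])


def best_pair_score(rel_a: list, rel_b: list) -> float:
    """New behaviour: compare every pairing and keep the best score."""
    return max((name_score(a, b) for a, b in product(rel_a, rel_b)), default=0.0)


# The same child is 2nd in one record and 1st in the other.
children_a = ["Mary Smith", "John Smith"]
children_b = ["John Smith", "Ann Smith"]
```

The best-pair version makes len(rel_a) x len(rel_b) comparisons per relation type instead of one, which is where the run-time concern comes from.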
This version is available from WiP find_duplicate_individuals.
It will install alongside the Plugin Store Version 3.0, but uses the same Preferences, and Omit Non-Duplicates list, etc.
Both versions will overwrite the same saved Result Set data file.
Please let me know how run-time is affected, and if there are any problems, or improvements that you notice.
ID:6590
Find Duplicate Individuals (All Relations)
Posted: 15 Nov 2012 19:10
by johnmorrisoniom
Hi Mike,
Just tried it on my main tree of 32500+ individuals. Run time was 1949 secs (32.5 mins), so no major impact on run time.
Highest score 55.
Highest Ind score 37.
Find Duplicate Individuals (All Relations)
Posted: 15 Nov 2012 20:34
by BillH
Mike,
On my system, when I downloaded the new version, it replaced the old version since it had the same name. This was just as fast as the old version.
On my tree with 10,327 individuals, it ran in 1 min 25 sec. The highest score was 42 and the highest individual score was 29.
I did find an individual where he was shown as the first child in one family and the second child in the other.
Bill
Find Duplicate Individuals (All Relations)
Posted: 15 Nov 2012 21:24
by tatewise
Bill, sorry, I guess you never downloaded the Plugin Store Version named Find Duplicate Individuals, and still had the old WiP version named find_duplicate_individuals.
You can still obtain Find Duplicate Individuals Version 3.0 from the FH Plugin Store.
With positive feedback from you both, I plan to put this latest version into the Plugin Store next week, unless anyone else has problems.
The change for multiple instances of relations was fairly straight-forward.
However, the Marriage Event Dates/Places have a small residual problem.
The Plugin data currently only supports one Date/Place entry for a Marriage per Individual.
With only the 1st Spouse being considered, this was not a problem.
But multiple Spouses can result in multiple Marriage Dates/Places.
At present the Plugin utilises the Marriage Date/Place of the earliest Spouse for which a Marriage Date/Place exists.
When synthesising missing Dates, the Plugin only uses a Marriage Date to synthesise other Dates if there is only one Spouse.
I suspect none of the above limitations have a significant impact on detecting duplicates.
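For anyone following the Marriage rules above, they can be sketched in Python roughly as follows. This is only an illustration under assumptions: the list-of-tuples layout and both function names are hypothetical, and "earliest Spouse" is taken to mean the first Spouse in record order.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: one (date, place) tuple per Spouse, in record order.
Marriage = Tuple[Optional[str], Optional[str]]


def pick_marriage(marriages: List[Marriage]) -> Optional[Marriage]:
    """Return the Marriage of the earliest Spouse that has a Date or Place."""
    for date, place in marriages:
        if date is not None or place is not None:
            return (date, place)
    return None


def marriage_date_for_synthesis(marriages: List[Marriage]) -> Optional[str]:
    """Only trust the Marriage Date for synthesising other Dates with exactly one Spouse."""
    if len(marriages) != 1:
        return None
    return marriages[0][0]
```

With two Spouses where only the second has a Marriage Date, `pick_marriage` returns the second Marriage, while `marriage_date_for_synthesis` returns nothing because there is more than one Spouse.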
Find Duplicate Individuals (All Relations)
Posted: 15 Nov 2012 21:29
by BillH
Mike,
No problem at all. I just had it override the old version. I just now cleaned out the old test version and downloaded and installed version 3.0 from the plugin store. I look forward to getting 3.1 from the store once it is out there.
Thanks,
Bill
Find Duplicate Individuals (All Relations)
Posted: 21 Nov 2012 15:18
by tatewise
There is a revised Find Duplicate Individuals V3.1a in the WiP page.
This download will overwrite the previous V3.1 WiP download of find_duplicate_individuals.
This version not only considers all instances of Relations but also all instances of Events.
So if there are multiple Birth or Baptism/Christening or Marriage or Death/Burial Events then the best matches will be used in the Results.
Can you please assess its impact on run-time on your large databases, and report any other side effects.
Find Duplicate Individuals (All Relations)
Posted: 21 Nov 2012 22:47
by BillH
Mike,
On my database of 10,340, version 3.0 ran in 83 seconds and version 3.1a ran in 81 seconds. I don't have 3.1 still installed to compare it to.
Bill
Find Duplicate Individuals (All Relations)
Posted: 23 Nov 2012 11:42
by tatewise
Plugin Store Find Duplicate Individuals V3.1 is now available.
It is a slight upgrade from the V3.1a in WiP, with some small changes to improve run-time, and function prototypes to improve the Lua code structure.
Find Duplicate Individuals (All Relations)
Posted: 23 Nov 2012 17:39
by johnmorrisoniom
Mike,
At one of the version two stages it was possible to keep the plugin window open whilst checking, to make life easier for adding pairs to the omit data. This seems to have disappeared, but I can't place in which version it went away. Any chance of this returning, or did it interfere with something else?
Find Duplicate Individuals (All Relations)
Posted: 24 Nov 2012 12:12
by gerrynuk
tatewise said:
Plugin Store Find Duplicate Individuals V3.1 is now available.
Mike,
Just downloaded the latest version. Whilst it runs OK the Progress Bar/%age/Time displays do not update. The Individual ID display does update.
Not really bothered myself and perhaps an option to turn off the progress bar/%age/Time might speed things up even more?
Gerry
Find Duplicate Individuals (All Relations)
Posted: 24 Nov 2012 12:30
by tatewise
John ~ I don't believe that would have ever been possible, because to display the Result Set any Plugin must close.
If you open any Plugin, then FH is locked, and any Result Set on display cannot be manipulated.
I wonder if you are recalling the ability to add pairs of Individuals from the Result Set to a Named List named Non-Duplicates.
That technique was removed from V3 because it didn't just exclude the pair, but excluded the two Individuals separately, which users did not like.
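The difference between the two behaviours can be shown with a small Python sketch. The record IDs and function names here are made up for illustration, and this is not how the Plugin stores its Omit Non-Duplicates list.

```python
def omit_by_individual(candidates, non_duplicates):
    """Named List style: drop every pair that mentions a listed Individual."""
    return [p for p in candidates
            if p[0] not in non_duplicates and p[1] not in non_duplicates]


def omit_by_pair(candidates, omitted_pairs):
    """Pair style: drop only the exact pairs that were marked, in either order."""
    omitted = {frozenset(p) for p in omitted_pairs}
    return [p for p in candidates if frozenset(p) not in omitted]


# Hypothetical candidate duplicates; the user decides I1 and I2 are NOT duplicates.
candidates = [("I1", "I2"), ("I1", "I3"), ("I4", "I5")]
```

Listing I1 and I2 as Individuals also hides the I1/I3 candidate, whereas omitting the pair leaves I1/I3 in the results.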
Have you seen FAQ How do I Avoid Long Run-Times?
Gerry ~ I have once or twice today experienced the same Progress Bar problem, but I am not sure why.
I will investigate.
Find Duplicate Individuals (All Relations)
Posted: 25 Nov 2012 00:08
by tatewise
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
This download will overwrite the previous V3.1a WiP download of find_duplicate_individuals.
It uses a revised method of loading its database prior to running comparisons, which seems faster.
This method also avoids a bug in ~FATH> & ~MOTH> shortcuts that don't find both members of same-sex partnerships.
It also fixes the Progress Bar bug reported by Gerry.
Find Duplicate Individuals (All Relations)
Posted: 25 Nov 2012 17:45
by gerrynuk
tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
....
It also fixes the Progress Bar bug reported by Gerry.
Thanks for the quick response, Mike.
Find Duplicate Individuals (All Relations)
Posted: 25 Nov 2012 19:46
by gerrynuk
Gerry Newnham said:
tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.
....
It also fixes the Progress Bar bug reported by Gerry.
Thanks for the quick response, Mike.
Sorry, Mike, I think I spoke too soon. The plugin hangs at loading database.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 11:27
by tatewise
Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?
Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.
These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH Lua interpreter.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 11:54
by gerrynuk
tatewise said:
Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?
Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.
These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH LUA interpreter.
Mike,
The database has 7314 individuals and 2077 families. Version 3.1 (listed as version 8 on the plugin popup menu) runs in 2m 25 secs.
v3.1c (listed as version 10) stalls at 'Loading Record ID 1'.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 12:46
by tatewise
Gerry ~ thank you for the feedback.
I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?
Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?
I might need to post a version of the Plugin with more diagnostic progress messages in order to discover the problem.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 13:25
by gerrynuk
tatewise said:
Gerry ~ thank you for the feedback.
I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?
Mike - screen shot attached of Plugin Popup Menu:
Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?
Id1 is 4 generations down from the root (me!), had 2 parents and one spouse and 10 children. He has only one of each of the events you mention. He has never given problems before and I haven't changed his details for some time.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 15:11
by tatewise
Gerry - Those Plugin filename suffixes are caused by the download cache in your browser, and can be ignored.
It is quite convenient, as you can run any of the recent Plugin versions.
Eventually they can all be Deleted by right-clicking the Plugin name.
Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 19:42
by gerrynuk
tatewise said:
Gerry -
....
Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.
Mike, that version seems to work OK. It took just under 2 mins to load the records and then a further minute to check them.
I notice that the progress bar wasn't updated until the checking phase. Is this intended?
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 22:48
by tatewise
Please can you all try Find Duplicate Individuals V3.1e via the WiP page, which has revised the Progress Bar messages.
Now that I have a better understanding of how various size databases interact with the Plugin revised design, I have altered the Progress Bar methodology to both maintain useful progress messages, and minimise their effect on run-time.
The net effect should be reduced run-time compared with earlier versions ~ I hope!
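One common way to keep progress messages useful without slowing the main loop is to redraw only when a minimum time has passed. The Python sketch below shows that general technique; it is an assumption for illustration, not a description of what V3.1e actually does, and the `ThrottledProgress` class and its parameters are hypothetical.

```python
import time


class ThrottledProgress:
    """Redraw at most once per min_interval seconds, plus always at completion."""

    def __init__(self, total, min_interval=0.25, clock=time.monotonic, sink=print):
        if total <= 0:
            raise ValueError("total must be positive")
        self.total = total
        self.min_interval = min_interval
        self.clock = clock          # injectable so the timing can be tested
        self.sink = sink            # where the message goes (print by default)
        self._last = float("-inf")  # forces the very first update to be shown

    def update(self, done, label=""):
        """Return True if a message was emitted, False if it was skipped."""
        now = self.clock()
        if done < self.total and now - self._last < self.min_interval:
            return False
        self._last = now
        percent = 100 * done // self.total
        self.sink(f"{percent:3d}% {label}".rstrip())
        return True
```

Calling `update` once per record is then cheap, because most calls return without touching the display.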
I have also made the Record Id digits fixed width with leading zeros. Is this preferable?
Find Duplicate Individuals (All Relations)
Posted: 26 Nov 2012 23:08
by BillH
Mike,
On my database of 10,344 individuals, version 3.1e takes 88 seconds. Version 3.1 takes 80 seconds. Just a bit slower in 3.1e, but still very fast.
It doesn't really matter, but I like it without the leading zeroes.
The progress window keeps taking back focus so that I can't do anything else on my computer while the plugin runs. Has it always been that way?
Bill
Find Duplicate Individuals (All Relations)
Posted: 27 Nov 2012 11:50
by tatewise
Can all of you please give me a rough idea of how long the Loading Records phase takes versus the Scoring Records phase?
Please also provide the number of Individuals in your database.
The leading zeros were an experiment to stop the Progress Bar message jiggling as the number of Record Id digits changes.
I will use a different technique in the next version that has no leading zeros.
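One possible way to keep the width fixed without leading zeros is to pad with spaces up to the width of the largest Record Id, as in this Python sketch. This is only a guess at a suitable technique, not necessarily the one used in the next version, and `format_record_id` is a hypothetical helper. It only prevents jiggling if the message is shown in a monospaced font.

```python
def format_record_id(record_id: int, max_id: int, zero_pad: bool = False) -> str:
    """Right-align record_id to the number of digits in max_id."""
    width = len(str(max_id))
    fill = "0" if zero_pad else " "
    return f"{record_id:{fill}>{width}d}"
```

For a database whose largest Record Id is 10345, Record Id 42 becomes "   42" with space padding, or "00042" with zero padding.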
I think the Progress Bar taking focus method has been present since Version 2.4.
However, I can see it is annoying, so will change it back in next Version.
Find Duplicate Individuals (All Relations)
Posted: 27 Nov 2012 12:42
by tatewise
Please can you all try Find Duplicate Individuals V3.1f via the WiP page, which has revised the Progress Bar messages again, plus a few other minor changes, as mentioned above.
When convenient, can you please supply the information requested in my previous posting above.
Find Duplicate Individuals (All Relations)
Posted: 27 Nov 2012 17:31
by BillH
Mike,
On my database of 10,345, version 3.1f took 25 seconds for the loading phase and 63 seconds for the scoring phase for a total of 88 seconds.
I found what might be a bug. I clicked on the Stop Finding Duplicates button when it was at 2 seconds and it kept going until it got to about 55 seconds before it stopped. It used to stop immediately.
It doesn't really matter, but I think I preferred the way the record ID used to count up from 1 sequentially to the way 3.1f displays the record IDs in no particular order.
Bill