* Find Duplicate Individuals Version 2.3+
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Mike,
A few thoughts on version 2.4.
1. In the Set Preferences tab, I'm a little confused. I read the help, but still can't quite figure out the difference between the Individual Threshold on the User Interface tab and the one on the Names Matching tab. Could you explain the difference?
2. What happened to the Individual Minimum and the Individual Deduction values on the Names Matching tab? Can we no longer set these?
3. I liked the order of the values on the Names Matching tab better in version 2.3. It had the Last Right and Last Wrong together and the Fore Right and Fore Wrong together, and all four of these were grouped together.
4. What happened to Fore Wrong and what is Fore Other?
5. Not a big deal, but 2.4 takes about 1 minute 55 seconds for my file of 10,005 individuals whereas 2.3 takes about 1 min 37 seconds. So it is just a tad bit slower. Still very acceptable though.
6. The incrementing of the time with the percentage works great for me.
Thanks again for a great plugin.
Bill
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Bill, thanks for the feedback.
1.
The Threshold on the Names Matching tab is applied after the Names Assessment on each pair of family members (Individual, Father, Mother, etc), to decide if Event Assessment should proceed for that pair.
The Individual Threshold on the User Interface tab is only applied to the pair of Individuals after completing Names Assessment and Event Assessment, to decide if their Relations (Father, Mother, etc) should be assessed.
2.
If after assessing a pair of Individuals, the score has not reached the Names Assessment Threshold, then not only is Event Assessment abandoned, but the pair are eliminated from the Results.
Making their score negative will not affect that decision, unless the Names Assessment Threshold for Individuals is set to 0, which is unlikely.
3.
I suspect it is simply that you have become accustomed to the old order.
The V2.4 order is consistent with how they are used in the assessments, and how the Help & Advice explains them.
It now groups them in order of precedence, firstly for how the Name fields are assessed, and then how the resulting Score is assessed.
So the assessment progresses through the values in the order presented on the tab.
Last Wrong is an overriding (but optional) assessment applied independently of the other scoring.
It seemed inappropriate to put it at the top of the values, since it is optional.
This same logic is applied to all the Set Preferences tabs.
4.
Fore Wrong and Place Part Wrong were really misnomers, and have just been renamed using ...Other.
They mean the name matches, but other than in the correct position.
Whereas Last Wrong does mean the Lastname is wrong and mismatches.
5.
Please check that Names Matching values all match the Defaults except where you have deliberately changed them.
In particular the Threshold under Individual should be 6 not 9.
6.
Since your run times are about 100 secs or so, the 1% steps and 1 sec steps almost coincide.
The problem arises with run times of many minutes or more, but will be resolved in the next release.
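To illustrate points 1 and 2, the two thresholds can be pictured as successive gates. This is only a minimal sketch in Python, not the Plugin's Lua code: the function name, the callable arguments and the User Interface threshold of 20 are hypothetical, and only the order of the checks follows the description above. The value 6 is the default Individual Threshold from point 5.

```python
def assess_pair(names_score, event_score, names_threshold, individual_threshold):
    """Return (kept, total, assess_relations) for one pair of Individuals.

    names_score and event_score are zero-argument callables, so the
    Event Assessment only runs when the Names Assessment passes.
    """
    total = names_score()
    if total < names_threshold:
        # Names Matching tab Threshold not reached: Event Assessment is
        # abandoned and the pair is eliminated from the Results.
        return False, total, False
    total += event_score()
    # User Interface tab Individual Threshold: applied after both
    # assessments to decide whether Father, Mother, etc. are assessed.
    return True, total, total >= individual_threshold


# Hypothetical scores; 6 and 20 stand in for the two thresholds.
print(assess_pair(lambda: 3, lambda: 10, 6, 20))   # (False, 3, False)
print(assess_pair(lambda: -5, lambda: 10, 6, 20))  # (False, -5, False)
print(assess_pair(lambda: 8, lambda: 15, 6, 20))   # (True, 23, True)
```

The first two calls show why a negative score changes nothing in point 2: any score below a positive Names threshold is eliminated, whatever its sign.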
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- Jane
- Site Admin
- Posts: 8442
- Joined: 01 Nov 2002 15:00
- Family Historian: V7
- Location: Somerset, England
Mike, I was wondering whether you had tried making all the 'child' functions for FindDuplicateRecords() local to it. As they are all called many times, making them local might help performance.
Jane
My Family History : My Photography "Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad."
- BillH
Mike,
Thanks for the explanations. I think everything is fine the way you have it.
As for the timing issue, #5, I was using the same values for version 2.4 as I used for 2.3. I had changed some values from their defaults.
Version 2.3:

Version 2.4:

Thanks,
Bill
- tatewise
Jane, strangely making those 'child' functions local makes very little difference.
If anything the local function version is marginally slower.
Bill, the slightly longer run time of V2.4 is possibly down to some minor changes in the Name assessments.
One ensures LastNameRight/Wrong checks are infallible now that users have greater freedom over their preferred points.
Another makes Place Name assessment ignore punctuation as well as spaces.
Although quite tiny changes, they are executed thousands of times, and it all adds up.
- BillH
Mike,
Sounds right. As I mentioned, this wasn't really a problem. It is still very fast for me.
Thanks,
Bill
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Hi Mike,
Just tried v2.4 on my file of 32,014 individuals and 8,691 families. I returned the settings to default, apart from Last Wrong, which I set to -1 instead of 0.
Time taken was just over 35 minutes, then a gap of 2 minutes 35 seconds (I managed to time it this time) from when the progress bar went to the result set appearing.
Highest total score 43 (17I, 13F, 13M) (not a match).
Definitely much faster on a large file.
- tatewise
I would be interested to discover why it takes so long to get from Progress Bar closure to Result Set.
Could you edit the Plugin and comment out two lines near the end?
On lines 2484 and 2486, insert -- (double hyphen) comment markers at the start of the dateTimespan:Set... lines.
These add the Date Timespans to the Result Set but only appear if Enable Diagnostics Mode and Including Date Timespans are both ticked.
Then run the Plugin as before and note the time from Progress Bar closure to Result Set.
[PS EDIT]
Alternatively, run V2.5 and note the time between Progress Bar Messages at the end.
e.g.
Sorting Result Set Candidates
Adding Result Set Timespans
Composing Result Set Entries
- tatewise
The Find Duplicate Individuals Version 2.5 is available for download.
I hope this version has reduced run time slightly ~ please let me know.
It adds some progress messages to the Progress Bar indicating what Record Id it has reached, and finally what operations it performs on the Result Set.
When I get time, the Help & Advice pages for the Set Preferences tab will get added.
- BillH
Mike,
2.5 is a little faster for me. My 10,005 person file ran in 1m 33s, whereas 2.4 took about 1m 55s.
Not sure if the progress bar is working as planned or not. For me it said it was working on ID 1 for the first minute or so, and then changed and said it was on ID 7931 until the window disappeared. Those were the only two ID numbers that displayed.
Bill
- tatewise
Run time sounds correct, as V2.5 should be about the same or better than V2.3.
To avoid impacting run time, the Record Id is currently only updated when the Memory Conservation limit (Set Preferences tab, User Interface tab) is reached.
So if few candidates are being discovered, the progress will update infrequently.
I might rework that, but in any case it will eventually be described in the Help & Advice.
- johnmorrisoniom
Hi Mike,
Version 2.5 does seem slightly faster (Just over 30mins for 32000+ records).
The three messages at the finish of the run were so fast that I couldn't read them. The progress bar went, and 1 min 45 secs later the result set appeared.
2.5 seems to have installed alongside 2.4, so I have both at the moment.
Could this occur if one was installed via a Windows XP machine and the other on a Windows 7 machine?
The record ID seemed to update only about 5 or 6 times during the run, and I could not work out a pattern as to when it was being updated: more quickly at the beginning, with the time span an ID was shown getting longer each time (I would have expected it to be the other way round [I used to know the formula for this, but can't remember it]).
- Jane
John,
How large is the result set? Lua works in a 'Virtual Machine', so at the end all the table values are passed back to the result set window. It could be that the delay comes after Mike's code finishes, while FH picks up the data and displays it.
Jane
- tatewise
Yes, I too am interested in why it takes so long for John's Result Set to appear.
The only significant thing the Plugin does after outputting the Result Set is to save the 'sticky' data file, Results Set file, Non-Duplicates list file, and Soundex cache file.
John, how large are these files?
As I explained before, use Windows Explorer and navigate to the Plugin Data folder at:
C:\Users\{user}\Documents\Family Historian Projects\{project}\{project}.fh_data\Plugin Data
What are the sizes of:
Find Duplicate Individuals.dat
Find Duplicate Individuals.nondups
Find Duplicate Individuals.results
Find Duplicate Individuals.soundex
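If it is easier than reading the sizes off Windows Explorer, something like the following Python sketch could list them. It is only an illustration and not part of the Plugin: the function name is hypothetical, and the folder passed in must be your own Plugin Data folder as described above.

```python
from pathlib import Path

# The four data files named above.
DATA_FILES = [
    "Find Duplicate Individuals.dat",
    "Find Duplicate Individuals.nondups",
    "Find Duplicate Individuals.results",
    "Find Duplicate Individuals.soundex",
]


def report_sizes(plugin_data_dir):
    """Return {filename: size in bytes, or None if the file is missing}."""
    folder = Path(plugin_data_dir)
    sizes = {}
    for name in DATA_FILES:
        path = folder / name
        sizes[name] = path.stat().st_size if path.is_file() else None
    return sizes
```

Calling report_sizes() with the Plugin Data folder path returns a dictionary of sizes in bytes, with None for any file that does not exist yet.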
John, what are the names of the two Plugins?
They must be different, otherwise one would have overwritten the other.
The expected name from the WiP download would be find_duplicate_individuals.
When you run the Plugin, how long does it take for the user interface to appear?
Apart from the Result Set file, the Plugin loads the other three files at startup, before displaying the GUI.
Default Result Set size is 100 entries.
Earlier John has said:
10/22/12 - 09:09:22 Result set is exactly 100 pairs.
10/26/12 - 20:58:54 I returned settings to default
- johnmorrisoniom
Hi Mike
.dat file is 2 KB
.nondups file is 22 KB
.results file is 121 KB
.soundex file is 172 KB
result set is 100 pairs
highest score is 37
Lowest is 33
The file name of version 2.5 has a space before the dot
[find_duplicate_individuals .fh_lua]
version 2.4 does not
[find_duplicate_individuals.fh_lua]
When I run the plugin, the interface appears straight away.
For consistency, I have run my tests on my laptop, which is an i3 W7 64-bit.
I can also run it on a quad core W7 64-bit and a quad core Win XP to see if there are any major changes, but it will be Sunday before I can do that.
- tatewise
The file sizes are not exceptional.
I have Results Files double that size, although I have not seen a Soundex File that large.
The Plugin saves the Soundex File more often than necessary, and I will improve that.
I cannot explain where the space before the dot came from in the find_duplicate_individuals .fh_lua filename.
Did you download differently for V2.5 than for V2.4?
- johnmorrisoniom
Hi Mike,
I use Google Chrome on all my computers, but sometimes download on an XP machine, other times on W7.
This has happened before, from version 2.0 to 2.1, and also with one of Jane's plugins.
- johnmorrisoniom
Hi Mike,
I have managed to replicate the file name occurrence.
I deleted version 2.4, then renamed version 2.5 to remove the space.
I then downloaded a new copy, and double clicked to install it. The new copy was installed alongside the original.
Looking at my downloads folder, this is what I found.

When the file has been run, the number and brackets have been correctly removed, but not the preceding space.
I seem to remember something similar happening before on a previous thread, but at that time the part in brackets was also retained.
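The symptom can be reproduced in a few lines of Python. This is only a sketch under assumptions: the screenshot is not reproduced here, so the " (1)" style suffix on the second download is a guess at what the browser added, the function name is hypothetical, and none of this is how FH actually installs plugins.

```python
import re

# Matches optional spaces followed by a bracketed copy number, e.g. " (1)",
# immediately before the file extension.
COPY_SUFFIX = re.compile(r"\s*\(\d+\)(?=\.[^.]+$)")


def strip_copy_suffix(filename):
    """Remove a browser-added copy number together with the preceding space."""
    return COPY_SUFFIX.sub("", filename)


name = "find_duplicate_individuals (1).fh_lua"

# Removing only the number and brackets leaves the stray space behind.
print(re.sub(r"\(\d+\)", "", name))  # find_duplicate_individuals .fh_lua

# Removing the preceding space as well gives the expected name.
print(strip_copy_suffix(name))       # find_duplicate_individuals.fh_lua
```

The first print matches the file name with the space before the dot reported earlier; the second shows the name that would overwrite the existing plugin instead of installing alongside it.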
- tatewise
You are right that those symptoms have arisen before, but had supposedly been fixed.
The Find Duplicate Individuals Version 2.6 is available for download.
This may reduce run time slightly, but mainly updates the Progress Bar presentation, and adjusts the way Soundex cache files are loaded & saved.
If Saving Soundex Cache file takes a long time then it will be apparent in the Progress Bar messages.
- johnmorrisoniom
Hi Mike,
Version 2.6 took 32 Minutes with 32061 individuals. About 1min 30 secs to produce result set.
Having the record Id advancing is a definite improvement, as it shows a progression. Ids changed very fast to start with, gradually slowing to about 2 per second in the last few percent.
- tatewise
Yes, the Individual Id progression will slow down as explained below.
The 1st Id is compared with nobody, so is very quick.
The 2nd Id is compared with 1st Id, so is still quick.
The 3rd Id is compared with 1st & 2nd Id, but still quick.
The 4th Id is compared with 1st & 2nd & 3rd Id.
You get the picture...
The 1,000th Id is compared with 1st through 999th Id, so slowing down.
The 32,000th Id is compared with 1st through 31,999th Id, so quite slow.
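The arithmetic behind that slowdown can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes every pair costs the same and none is skipped, which the Plugin may well not do.

```python
def comparisons_for(n):
    """Comparisons needed when the n-th Id is checked against all earlier Ids."""
    return n - 1


def total_comparisons(count):
    """Total pairwise comparisons for `count` Individuals: count * (count - 1) / 2."""
    return count * (count - 1) // 2


def fraction_done(n, count):
    """Share of all comparisons finished once the first n Ids are processed."""
    return total_comparisons(n) / total_comparisons(count)


print(comparisons_for(32000))    # 31999
print(total_comparisons(10005))  # 50045010
print(total_comparisons(32000))  # 511984000
print(fraction_done(16000, 32000) < 0.25)  # True
```

Under those assumptions a file about three times larger needs roughly ten times as many comparisons, and when the Record Id is halfway through a 32,000 person file only about a quarter of the comparisons are done, so the Id counter is expected to crawl near the end.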
If the 1 min 30 secs to produce Result Set is after Progress Bar closes, then it can only be FH that is slow.
Maybe the large size of the database (32,000+) is the problem, even with a small Result Set of 100 pairs of Individuals.
However, using the Plugin's 'Show previous Result Set of Duplicates in Family Historian' for the same Result Set displays it quickly.
Could it be the Plugin's Lua code garbage collecting a complex table with 32,000+ entries, one per Individual, when it closes?
John, can you run the Plugin on a smaller database, of say 3,000 Individuals, just to see what happens?
Perhaps Jane or Simon have some ideas.
- johnmorrisoniom
Hi Mike,
I ran the plugin on a very small data set (288)
Everything happened so fast I didn't even get a progress bar.
Then I tried a subset (4575) of my large file (basically everyone not in pool 1). The plugin ran in 34 seconds with no discernible wait to get the result set.
I can therefore only think that the wait time I am getting with the full data set is just FH number crunching after the plugin has finished.
I have also found that although a pair may not match, it leads to more investigation that quite often does produce a match.
- tatewise
The odd thing is that earlier you said using the Plugin Show previous Result Set of Duplicates in Family Historian for the same Result Set displays quickly, despite needing the same FH number crunching.
- johnmorrisoniom
Hi Mike,
Separate problem now.
On my reduced dataset project, when I try to load the previous result set I get the following error.
Code:
... Historian\Plugins\find_duplicate_individuals.fh_lua:1607: bad argument #2 to 'MoveToRecordById' (number expected, got nil)
stack traceback:
[C]: in function 'MoveToRecordById'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1607: in function 'strFormatResult'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1632: in function 'doDisplayTables'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1660: in function 'doLoadLists'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1715: in function <... Historian\Plugins\find_duplicate_individuals.fh_lua:1710>
(tail call): ?
[C]: in function 'MainLoop'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1981: in function 'GUI_MainDialogue'
... Historian\Plugins\find_duplicate_individuals.fh_lua:2671: in main chunk.
The whole plugin screen then 'greys out' and the plugin has to be closed with the right-hand X.
This project has only ever had version 2.6 run on it.
When I try to look at the Omit Non-Duplicates list I also get an error:
Code:
... Historian\Plugins\find_duplicate_individuals.fh_lua:1607: bad argument #2 to 'MoveToRecordById' (number expected, got nil)
stack traceback:
[C]: in function 'MoveToRecordById'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1607: in function 'strFormatResult'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1632: in function 'doDisplayTables'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1660: in function 'doLoadLists'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1955: in function <... Historian\Plugins\find_duplicate_individuals.fh_lua:1953>
(tail call): ?
[C]: in function 'MainLoop'
... Historian\Plugins\find_duplicate_individuals.fh_lua:1981: in function 'GUI_MainDialogue'
... Historian\Plugins\find_duplicate_individuals.fh_lua:2671: in main chunk.
This time the plugin does not 'grey out', navigation back to the main tab is possible, and all buttons are active.
- tatewise
That is very odd, and appears to be caused by missing Record Id data in the Non-Duplicates file.
I can only reproduce that error if I manually edit the Find Duplicate Individuals.nondups file.
What is the history of that file in the reduced dataset project?