"Find Duplicates Individual" error in date
Posted: 03 Feb 2014 13:02
by tonyz
Hi,
When I run the above plugin and try to change the date from 1 Jan 1900 (by deleting the date), I get an "unrecognized date" error.
Actually, my problem is that the "Include any Selected Subset of the Individuals" option appears to list only those records which have been edited since being imported from a GEDCOM file.
What I want is to have the plugin run on and include ALL records. How do I do this?
Re: "Find Duplicates Individual" error in date
Posted: 03 Feb 2014 13:52
by tatewise
Hi Tony.
The default setting of the Plugin should include ALL records.
I presume all your records were created/updated after 1 Jan 1900 so they are all included.
To the right of the Include any Selected Subset of the Individuals button it lists the number of Records.
How does this number differ from the number of Individuals in your data that is listed in the status bar bottom right?
Re: "Find Duplicates Individual" error in date
Posted: 04 Feb 2014 08:26
by tonyz
Hi,
Thank you so much for your reply.
The number to the right of the 'Include any Selected Subset of the Individuals button' is very small (49) in relation to the actual number of records (over 120,000 individuals/70,000 families) in the project/file. (the project is validated by FH so there should be no corrupt structure etc)
Re: "Find Duplicates Individual" error in date
Posted: 04 Feb 2014 12:49
by tatewise
That is somewhat odd.
The Plugin Find Duplicates tab should look like the screenshot below.
The Updated from this Date box ringed in red below should say 1 Jan 1900.
The number of Records should match the number of Individuals both ringed in blue below.
(You may need to scroll the screenshot down to see the Individuals value.)
You are saying that 49 Records and Individuals: 120,000 are displayed simultaneously.

- FindDuplicatesTab.png (85.28 KiB)
Try selecting the Set Preferences tab and clicking the Restore GUI Defaults button ringed in red below.
Then return to the Find Duplicates tab to see if that fixes the problem.

- SetPreferencesTab.png (77.26 KiB)
If not then please Close Plugin and select the FH Records Window as shown below.
Click on the Updated column heading highlighted in yellow below.
This column may be further to the right than shown below, but what is the earliest date shown?
Is it earlier than 1 Jan 1900?

- RecordsUpdated.png (14.94 KiB)
Re: "Find Duplicates Individual" error in date
Posted: 04 Feb 2014 14:29
by tonyz
Hi Mike,
Thank you again for such a quick and exhaustive reply.
(a) I did the Restore GUI defaults etc and the problem still remained.
(b) This may or may not be significant: whilst in your example the "set the Updated from Date to this last run date" is shown as June 2013, in my program it is set to 1 Jan 1900... and I cannot change it!
(c) The dates in the Records Window for all records (except 49 records) are null (i.e. no entry), and this is probably why only 49 records are being listed instead of all records.
So what is the best way to solve this?
Re: "Find Duplicates Individual" error in date
Posted: 04 Feb 2014 15:28
by tatewise
That is the problem, and I would be interested in any explanation you may have for how that came about.
I suspect the GEDCOM you originally imported had no Updated dates (CHAN Change tags) and those Records have never been amended.
The following method will reset Individual Records to today's date.
Select all the Individual Records with no Updated date.
Use Edit > Record Flags > New and enter a new Flag name such as Temp.
Then tick that Flag name (Temp) and click OK.
While all those records are still selected, use Edit > Record Flags again and untick the Flag name (Temp).
This should set the Updated dates.
To remove the Flag name use Tools > Work with Named Lists and Flags.
Select the Record Flag (Temp) and click the Flag Status button.
Untick Always include this record flag in flag lists.
You should look at all the other Record types (Family, Note, Source, Repository, Multimedia, Submitter, Submission) and review their Updated dates.
Use the Records Window and click each tab in turn.
If necessary use Tools > Preferences > Records Window tab and set all Record Type Display Options to Always show to reveal all the tabs.
However, I know of no easy way to reset the Updated date other than to perform some updates.
This could be done by writing a Plugin if you felt computer literate enough to give it a try.
The date you cannot change is the last run date, which is the date the Plugin was last run, and is only set by the Plugin.
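For anyone wanting to check a GEDCOM file for missing Updated dates before importing it, here is a minimal Python sketch. It is only an illustration and not part of the Plugin or of FH: it assumes the usual GEDCOM layout in which each record starts on a level 0 line such as `0 @I1@ INDI` and a change date, if present, sits under a level 1 `CHAN` tag. The function name `count_records_without_chan` is made up for this example.

```python
def count_records_without_chan(gedcom_lines, record_tag="INDI"):
    """Return (total, missing): how many records of record_tag exist,
    and how many of them have no level 1 CHAN tag."""
    total = 0
    missing = 0
    in_record = False   # currently inside a record of the wanted type?
    has_chan = False    # has the current record shown a CHAN tag yet?

    for raw in gedcom_lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # blank or malformed line, ignore it
        level = int(parts[0])

        if level == 0:
            # A new level 0 line closes the previous record.
            if in_record:
                total += 1
                if not has_chan:
                    missing += 1
            # Record lines look like "0 @I1@ INDI".
            in_record = (
                len(parts) == 3
                and parts[1].startswith("@")
                and parts[2].strip() == record_tag
            )
            has_chan = False
        elif level == 1 and in_record and parts[1] == "CHAN":
            has_chan = True

    # Close the last record if the file ended without a trailer line.
    if in_record:
        total += 1
        if not has_chan:
            missing += 1
    return total, missing
```

To use it on a real file, something like `count_records_without_chan(open("tree.ged", encoding="utf-8"))` should work, where the file name and encoding are placeholders for your own export. Passing `record_tag="FAM"` would check Family records instead.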
Re: "Find Duplicates Individual" error in date
Posted: 05 Feb 2014 20:59
by tatewise
Above I said:
I know of no easy way to reset the Updated date other than to perform some updates.
This could be done by writing a Plugin if you felt computer literate enough to give it a try.
I have written a small Plugin to perform this task for ALL record types if you are interested.
Re: "Find Duplicates Individual" error in date
Posted: 06 Feb 2014 11:10
by tonyz
Hi,
Thank you so much for your reply. Some observations:
(a) The gedcom file I was importing did not have CHAN tag included. I ticked off the box to export this tag when setting up the gedcom file. I now re-exported the file (with the CHAN tag also exported) and the "Find Duplicates Individual" plugin is showing the correct number of individuals. Thank you so much for pointing me in the right direction.
(b) Even after importing the above new gedcom file (with the CHAN tag included), the value of "Set the Updated from Date to this last run date" is 1 Jan 1900 (it should be today's date?). Anyway, on a smaller project (say 2000 individuals) the date is today's date (which is correct). So my question is this: Is there a records/individual limit for this plugin? (See also C below)
(c) When I run the new large number of individuals project (as in B above) which has say 120,000 records, after some minutes I get a "stack overflow" error. (NB A different project with say 2000 individuals works ok.)
This error also occurs if I include a subset (say only 2472 records) of the large file. So, if there is a limit, do you have an indication of what this is?
Re: "Find Duplicates Individual" error in date
Posted: 06 Feb 2014 12:24
by tatewise
(a) OK.
(b) I suspect that to import the new GEDCOM you created a New Project. Therefore, for this new Project there is no Plugin history, and thus no successful last run Date, so it defaults to 1 Jan 1900. On the smaller Project you presumably ran the Plugin successfully today, so it records the last run Date as today.
Nevertheless, you can type any date into the lower last Updated from this Date box.
If the predecessor of your large Project still exists, and has Plugin history, then this can be copied to the new large Project.
The Plugin history is stored in files such as:
...\Family Historian Projects\{project}\{project}.fh_data\Plugin Data\Find Duplicate Individuals.*
(c) I am sorry that is happening. To my knowledge the Plugin has never been used on Projects larger than about 40,000 Individuals, so we may be venturing into the unknown. Regardless of any Selected Subset of the Individuals, the Plugin initially compiles a profile of every Individual in the database, but then assesses each one against each of those in the selected subset, excluding any nominated non-duplicates.
In compiling the profile of one Individual, the Plugin also profiles each Relative in turn (Parents, Spouses, Children).
But in so doing, it profiles the Relatives of each of those Relatives in a nested fashion.
This has been found to be efficient for large databases, but with large families results in a deeply nested stack of data, which I suspect is the cause of the stack overflow.
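To illustrate why nested profiling can exhaust the stack, here is a rough Python sketch. It is not the Plugin's code (which is Lua), and the names `profile_recursive`, `profile_iterative` and `build_chain` are invented for the example. It assumes relatives are held as a simple dictionary from an Individual's id to a list of relative ids. The first function profiles relatives in nested calls, so the call depth grows with the length of any chain of relatives; the second keeps a list of pending Individuals instead, so the depth stays constant.

```python
def profile_recursive(person, relatives, pool):
    """Profile a person, then each not yet profiled relative in a nested call."""
    if person in pool:
        return
    pool[person] = {"relatives": list(relatives.get(person, []))}
    for rel in relatives.get(person, []):
        # Each nested call adds a stack frame; a long chain of
        # relatives therefore produces a very deep call stack.
        profile_recursive(rel, relatives, pool)


def profile_iterative(start, relatives, pool):
    """Build the same pool using an explicit work list instead of nesting."""
    pending = [start]
    while pending:
        person = pending.pop()
        if person in pool:
            continue
        pool[person] = {"relatives": list(relatives.get(person, []))}
        pending.extend(r for r in relatives.get(person, []) if r not in pool)


def build_chain(n):
    """Hypothetical test data: n people, each linked to their neighbours."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
```

For example, `profile_iterative(0, build_chain(50_000), pool)` with an empty `pool` dictionary fills in all 50,000 entries, whereas the recursive version on the same chain would exceed Python's default recursion limit. Capping the nesting depth is another option, at the cost of possibly leaving distant relatives unprofiled until they are reached from elsewhere.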
Could you give some more details please:
(1) What are your PC characteristics? - Windows version, CPU speed, RAM size?
(2) What progress bar messages are displayed, especially the last one leading up to the error message?
(3) Exactly what is the stack overflow error message? - Supply either a screenshot, or a transcript of the first few lines paying particular attention to any numbers.
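Regarding the Plugin history files mentioned under (b), the copy could be done by hand in Windows Explorer, or scripted along the lines of this Python sketch. It assumes both Projects follow the folder pattern quoted above, with a `{project}.fh_data\Plugin Data` folder inside each Project folder. The function name `copy_plugin_history` is made up, and whether the copied files are fully valid in another Project has not been checked here, so keep a backup.

```python
import shutil
from pathlib import Path


def copy_plugin_history(old_project_dir, new_project_dir,
                        plugin_name="Find Duplicate Individuals"):
    """Copy '<plugin_name>.*' files from the old Project's Plugin Data
    folder to the new Project's one. Returns the copied file names."""
    old = Path(old_project_dir)
    new = Path(new_project_dir)
    # Assumed layout: <project>\<project>.fh_data\Plugin Data
    src = old / f"{old.name}.fh_data" / "Plugin Data"
    dst = new / f"{new.name}.fh_data" / "Plugin Data"
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(src.glob(plugin_name + ".*")):
        if item.is_file():
            shutil.copy2(item, dst / item.name)  # keeps file timestamps
            copied.append(item.name)
    return copied
```

Both arguments are the full paths of the Project folders themselves, and the Plugin should be closed while the files are copied.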
Re: "Find Duplicates Individual" error in date
Posted: 06 Feb 2014 13:03
by tonyz
Hi,
Details requested follow:
(1) Notebook computer with 6gb free ram, 50gb free disk, Win7 home premium 64 bit, Intel i5-3210 at 2.5ghz,
(2) The screen displays "Loading Relation Pool" then after 3 minutes 40 seconds (with 1% done) a Stack Overflow error occurs as follows "stack overflow" (details below)
(3) stack traceback:
[C]: in function 'gsub'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:476: in function 'split'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'TblNamesData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
...
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'FindDuplicateRecords'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function <... Historian\Plugins\Find Duplicate Individuals.fh_lua:1693>
(tail call): ?
[C]: in function '?'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'GUI_MainDialogue'
... Historian\Plugins\Find Duplicate Individuals.fh_lua in main chunk"
-------------------------
PS The estimated time given by the program to finish is between 383 and 1532 minutes (which may seem long but is acceptable to me if I can then (painlessly!) merge the duplicate records!)
Re: "Find Duplicates Individual" error in date
Posted: 06 Feb 2014 14:58
by tatewise
Thank you for the feedback.
There seems to be no problem with your PC resources, so it looks like the large number of nested generations is causing the stack overflow.
Roughly how many repeats are there of the traceback line:
... Historian\Plugins\Find Duplicate Individuals.fh_lua in function 'IntPoolData'
With this info I could then suggest a patch that should restrict the depth of nesting.
Re: "Find Duplicates Individual" error in date
Posted: 07 Feb 2014 07:15
by tonyz
Hi,
Afraid I do not know how to get the info you requested. Where do I find this info? In my previous post I listed (copy & paste) all the info the error message gave.
Re: "Find Duplicates Individual" error in date
Posted: 07 Feb 2014 19:20
by tatewise
Hi Tony, don't worry about that info.
I have revised this Plugin quite often to try and accommodate large databases while minimising run time.
It is a complex process as I do not have a representative large database, so rely on others for testing out ideas.
I think I have found a workable solution that avoids the stack overflow problem and may reduce run time.
Please download this V3.3 dated 7 Feb 2014 from my SkyDrive Find Duplicate Individuals.
It should overwrite your existing Plugin but use the same options & settings.
You can always revert to the original V3.3 from the Plugin Store.
If any other users reading this who have a large database would like to give it a try then please do.
Re: "Find Duplicates Individual" error in date
Posted: 07 Feb 2014 19:38
by BillH
Mike,
I've only got about 15,000 individuals and 4,600 families in my tree, but thought I'd run this new "version" of the plugin to see what happened to run time. The 3.3 version in the plugin store runs consistently about 3:25 - 3:27 on my system. This "version" runs consistently about 3:46 - 3:49. So this "version" is a bit slower on my system.
Bill
Re: "Find Duplicates Individual" error in date
Posted: 07 Feb 2014 20:28
by tatewise
Thanks for that feedback Bill. I assume the Result Set was the same for both variants.
On my 2,000 Individuals both variants run in the same time of 12 secs, but in debug mode the new variant is a bit faster.
If for some it is a bit slower, that may be the unavoidable penalty for preventing the stack overflow problem.
Benefits of this new variant are:
The progress messages are simply Individual Record Id without any Loading Relation Pool messages.
The % progress is more regular (i.e. the 50% elapsed time is closer to half the 100% time than before).
The stack overflow error reported by Tony cannot now arise.
Re: "Find Duplicates Individual" error in date
Posted: 10 Feb 2014 07:11
by tonyz
Hi Mike,
Update on the 'new' plugin version:
a) Downloaded from SkyDrive and installed ok. Ran the application on a database of 150,000 records (all 'linked' in one way or another but known to have multiple duplicates)
b) Initial Running time was as follows:
10 minutes - 15,500 individuals
23 minutes - 25,000 individuals
3hrs 53 minutes - 76,000 individuals
7hrs 20minutes - 113,000 individuals
10hrs 14minutes - 142,000 individuals
c) Ran the application again on the same file etc but with a subset of only 1311 records. The following may be interesting:
5minutes - 17,500 individuals
14 minutes - 25,000 individuals
1hrs 57minutes - 70,000 individuals
3hrs 47minutes - 98,000 individuals
5hrs 23minutes - 113,000 individuals
Initial checking of duplicates indicates plugin is working ok!
Thank you.
Re: "Find Duplicates Individual" error in date
Posted: 10 Feb 2014 12:46
by tatewise
That is great feedback Tony.
The long run time is inevitable when you consider the comparisons needed for 150,000 Individuals.
The formula for every Individual to be compared against every other Individual is:
X * (X-1) / 2 = 150,000 * 149,999 / 2 = 11,249,925,000 comparisons!
That reduces significantly with a subset of 1311 Individuals to be compared against all 150,000.
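As a quick check of the arithmetic, here is a small Python sketch. It assumes every pair is compared exactly once and ignores any nominated non-duplicates, so the figures are upper bounds rather than what the Plugin actually performs. The function names are invented for the example.

```python
from math import comb


def full_comparisons(n):
    """Pairs when every Individual is compared against every other one."""
    return comb(n, 2)  # same as n * (n - 1) // 2


def subset_comparisons(n, m):
    """Pairs with at least one member in a subset of m out of n Individuals:
    all pairs minus the pairs lying entirely outside the subset."""
    return comb(n, 2) - comb(n - m, 2)


print(f"{full_comparisons(150_000):,}")          # 11,249,925,000
print(f"{subset_comparisons(150_000, 1311):,}")  # 195,789,984
```

So on these assumptions a subset of 1311 needs roughly 57 times fewer comparisons than the full run.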
I will incorporate the changes with the next release of the Plugin to the Plugin Store.
Re: "Find Duplicates Individual" error in date
Posted: 02 Apr 2014 10:06
by tatewise
The above changes are now incorporated into V3.4 in the Plugin Store.
I would be interested in feedback about run-times for large databases, which I might have improved??
Re: "Find Duplicates Individual" error in date
Posted: 02 Apr 2014 15:37
by BillH
Mike,
I ran Check Installed Plugins Against the Store. It found that there was a new version for Find Duplicate Individuals and then gave me the following error:

- image1.jpg (13.44 KiB)
Bill
Re: "Find Duplicates Individual" error in date
Posted: 02 Apr 2014 16:36
by Jane
That looks like a timeout error on the download. I would try it again a bit later.
Re: "Find Duplicates Individual" error in date
Posted: 02 Apr 2014 16:51
by BillH
Thanks Jane. I tried again just now and it worked fine.
Bill