"Find Duplicates Individual" error in date

tonyz
Gold
Posts: 13
Joined: 23 Jul 2012 10:28
Family Historian: V5

"Find Duplicates Individual" error in date

Post by tonyz » 03 Feb 2014 13:02

Hi,

When I run the above plugin and try to change the date from 1 Jan 1900 (by deleting the date), I get an "unrecognized date" error.

Actually, my problem is that the "Include any Selected Subset of the Individuals" button appears to list only those records which have been edited since being imported from a GEDCOM file.
What I want is for the plugin to run on, and include, ALL records. How do I do this?

tatewise
Megastar
Posts: 27087
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: "Find Duplicates Individual" error in date

Post by tatewise » 03 Feb 2014 13:52

Hi Tony.
The default setting of the Plugin should include ALL records.
I presume all your records were created or updated after 1 Jan 1900, so they are all included.

To the right of the Include any Selected Subset of the Individuals button, the number of Records is listed.
How does this number compare with the number of Individuals shown at the bottom right of the status bar?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Re: "Find Duplicates Individual" error in date

Post by tonyz » 04 Feb 2014 08:26

Hi,

Thank you so much for your reply.

The number to the right of the 'Include any Selected Subset of the Individuals' button is very small (49) compared with the actual number of records in the project/file (over 120,000 individuals and 70,000 families). (The project has been validated by FH, so there should be no corrupt structure etc.)

Re: "Find Duplicates Individual" error in date

Post by tatewise » 04 Feb 2014 12:49

That is somewhat odd.
The Plugin Find Duplicates tab should look like the screenshot below.
The Updated from this Date box ringed in red below should say 1 Jan 1900.
The number of Records should match the number of Individuals both ringed in blue below.
(You may need to scroll the screenshot down to see the Individuals value.)
So you are saying that 49 Records and Individuals: 120,000 are displayed at the same time?
FindDuplicatesTab.png
Try selecting the Set Preferences tab and clicking the Restore GUI Defaults button ringed in red below.
Then return to the Find Duplicates tab to see if that fixes the problem.
SetPreferencesTab.png
If not, then please click Close Plugin and open the FH Records Window as shown below.
Click on the Updated column heading highlighted in yellow below.
This column may be further to the right than shown below, but what is the earliest date shown?
Is it earlier than 1 Jan 1900?
RecordsUpdated.png

Re: "Find Duplicates Individual" error in date

Post by tonyz » 04 Feb 2014 14:29

Hi Mike,
Thank you again for such a quick and exhaustive reply.

(a) I tried Restore GUI Defaults etc. and the problem still remained.

(b) This may or may not be significant: whilst in your example the "set the Updated from Date to this last run date" is shown as June 2013, in my program it is set to 1 Jan 1900... and I cannot change it!

(c) The Updated dates in the Records Window for all records (except 49) are null (i.e. no entry), which is probably why only 49 records are listed instead of all of them.

So what is the best way to solve this?

Re: "Find Duplicates Individual" error in date

Post by tatewise » 04 Feb 2014 15:28

That is the problem, and I would be interested in any explanation you may have for how that came about.
I suspect the GEDCOM you originally imported had no Updated dates (CHAN Change tags) and those Records have never been amended.
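To illustrate why records with no Updated date drop out even with the date set to 1 Jan 1900, here is a minimal sketch in Python (not the Plugin's own Lua code). It assumes the filter keeps a record only when its Updated date is on or after the chosen date; the record layout and the name `records_updated_since` are hypothetical. A record with a missing (None) Updated date can never pass such a comparison, so it is excluded however early the date is set.

```python
from datetime import date
from typing import Optional

# Hypothetical stand-in for a record: (Record Id, Updated date).
# Updated is None when the imported GEDCOM carried no CHAN tag
# and the record has not been edited since.
Record = tuple[str, Optional[date]]

DEFAULT_FROM_DATE = date(1900, 1, 1)


def records_updated_since(records: list[Record],
                          from_date: date = DEFAULT_FROM_DATE) -> list[str]:
    """Return the Ids of records updated on or after from_date.

    A record with no Updated date cannot pass the comparison, so it is
    left out however early from_date is set.
    """
    return [rid for rid, updated in records
            if updated is not None and updated >= from_date]


records = [
    ("I1", date(2013, 6, 14)),  # edited after import, so it has an Updated date
    ("I2", None),               # imported without a CHAN tag, never edited
    ("I3", None),
]
print(records_updated_since(records))  # ['I1']
```

This mirrors the symptom in this thread: only the handful of records edited after import are counted.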

The following method will reset the Updated date of Individual Records to today's date.

Select all the Individual Records with no Updated date.
Use Edit > Record Flags > New and enter a new Flag name such as Temp.
Then tick that Flag name (Temp) and click OK.

While all those records are still selected, use Edit > Record Flags again and untick the Flag name (Temp).
This should set the Updated dates.

To remove the Flag name use Tools > Work with Named Lists and Flags.
Select the Record Flag (Temp) and click the Flag Status button.
Untick Always include this record flag in flag lists.

You should look at all the other Record types (Family, Note, Source, Repository, Multimedia, Submitter, Submission) and review their Updated dates.
Use the Records Window and click each tab in turn.
If necessary use Tools > Preferences > Records Window tab and set all Record Type Display Options to Always show to reveal all the tabs.
However, I know of no easy way to reset the Updated date other than to perform some updates.
This could be done by writing a Plugin if you felt computer literate enough to give it a try.

The date you cannot change is the last run date, which is the date the Plugin was last run, and is only set by the Plugin.

Re: "Find Duplicates Individual" error in date

Post by tatewise » 05 Feb 2014 20:59

Above I said:
I know of no easy way to reset the Updated date other than to perform some updates.
This could be done by writing a Plugin if you felt computer literate enough to give it a try.
I have written a small Plugin to perform this task for ALL record types if you are interested.

Re: "Find Duplicates Individual" error in date

Post by tonyz » 06 Feb 2014 11:10

Hi,
Thank you so much for your reply. Some observations:

(a) The GEDCOM file I was importing did not include the CHAN tag, because I had unticked the box to export this tag when setting up the GEDCOM file. I have now re-exported the file (with the CHAN tag included) and the "Find Duplicates Individual" plugin shows the correct number of individuals. Thank you so much for pointing me in the right direction.

(b) Even after importing the new GEDCOM file above (with the CHAN tag included), the value of "Set the Updated from Date to this last run date" is 1 Jan 1900 (shouldn't it be today's date?). On a smaller project (about 2000 individuals), however, it is today's date, which is correct. So my question is: is there a record/individual limit for this plugin? (See also (c) below.)

(c) When I run the new large project (as in (b) above), which has about 120,000 records, I get a "stack overflow" error after some minutes. (NB: a different project with about 2000 individuals works OK.)
This error also occurs if I include only a subset (say 2472 records) of the large file. So, if there is a limit, do you have an indication of what it is?

Re: "Find Duplicates Individual" error in date

Post by tatewise » 06 Feb 2014 12:24

(a) OK.

(b) I suspect that to import the new GEDCOM you created a New Project. For this new Project there is therefore no Plugin history, and thus no successful last run Date, so it defaults to 1 Jan 1900. On the smaller Project you presumably ran the Plugin successfully today, so it records the last run Date as today.

Nevertheless, you can type any date into the lower Updated from this Date box.

If the predecessor of your large Project still exists, and has Plugin history, then this can be copied to the new large Project.
The Plugin history is stored in files such as:
...\Family Historian Projects\{project}\{project}.fh_data\Plugin Data\Find Duplicate Individuals.*
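A minimal sketch of the default described in (b), in Python rather than the Plugin's Lua: when a Project has no history, the last run date falls back to 1 Jan 1900. The function `load_last_run_date`, the file name and the one-line ISO date format are all hypothetical; the real Plugin Data file format is not described here.

```python
import tempfile
from datetime import date
from pathlib import Path

# Date used when a Project has no Plugin history yet.
DEFAULT_LAST_RUN = date(1900, 1, 1)


def load_last_run_date(history_file: Path) -> date:
    """Return the last successful run date stored in history_file.

    Assumption: the file holds one ISO date such as 2014-02-06. A missing
    file (e.g. a brand-new Project) or malformed contents fall back to
    DEFAULT_LAST_RUN.
    """
    try:
        return date.fromisoformat(history_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return DEFAULT_LAST_RUN


with tempfile.TemporaryDirectory() as folder:
    history = Path(folder) / "last_run.txt"  # hypothetical file name
    before = load_last_run_date(history)     # no history yet
    history.write_text("2014-02-06\n")       # as if the Plugin had run successfully
    after = load_last_run_date(history)

print(before, after)  # 1900-01-01 2014-02-06
```

Copying the history files from an older Project, as suggested above, corresponds to the second call: a stored date replaces the default.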

(c) I am sorry that is happening. To my knowledge the Plugin has never been used on Projects larger than about 40,000 Individuals, so we may be venturing into the unknown. Regardless of any Selected Subset of the Individuals, the Plugin first compiles a profile of every Individual in the database, and then assesses each one against each of those in the selected subset, excluding any nominated non-duplicates.

In compiling the profile of one Individual, the Plugin also profiles each Relative in turn (Parents, Spouses, Children).
But in so doing, it profiles the Relatives of each of those Relatives in a nested fashion.
This has been found to be efficient for large databases, but with large families it results in a deeply nested stack, which I suspect is the cause of the stack overflow.
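The nesting problem can be illustrated with a small, hypothetical Python sketch (not the Plugin's code). `profile_recursive` profiles each new relative in a nested call, so a long line of connected relatives needs one stack frame per step; Python raises RecursionError where Lua reports "stack overflow". `profile_iterative` reaches the same people using an explicit queue, which keeps the stack depth constant. The graph layout and all function names are assumptions.

```python
from collections import deque

# Hypothetical family graph: Individual id -> ids of parents, spouses and children.
Graph = dict[int, list[int]]


def profile_recursive(person: int, graph: Graph, profiled: set[int]) -> None:
    """Nested style: profiling one person immediately profiles each new relative,
    so every step along a line of relatives adds another stack frame."""
    profiled.add(person)
    for relative in graph.get(person, []):
        if relative not in profiled:
            profile_recursive(relative, graph, profiled)


def profile_iterative(start: int, graph: Graph, profiled: set[int]) -> None:
    """Same people reached, but pending relatives wait in a queue,
    so the stack depth stays constant however long the line is."""
    profiled.add(start)
    pending = deque([start])
    while pending:
        person = pending.popleft()
        for relative in graph.get(person, []):
            if relative not in profiled:
                profiled.add(relative)
                pending.append(relative)


def chain(n: int) -> Graph:
    """n people in one line, each linked to the previous and next (like parent and child)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


family = chain(100_000)
reached: set[int] = set()
profile_iterative(0, family, reached)
print(len(reached))  # 100000

try:
    profile_recursive(0, family, set())
    overflowed = False
except RecursionError:  # Python's counterpart of Lua's "stack overflow"
    overflowed = True
print(overflowed)  # True
```

Both functions visit the same Individuals; only the bookkeeping differs, which is why the queue version cannot overflow the stack.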

Could you give some more details please:
(1) What are your PC characteristics? - Windows version, CPU speed, RAM size?
(2) What progress bar messages are displayed, especially the last one leading up to the error message?
(3) Exactly what is the stack overflow error message? - Supply either a screenshot, or a transcript of the first few lines paying particular attention to any numbers.

Re: "Find Duplicates Individual" error in date

Post by tonyz » 06 Feb 2014 13:03

Hi,
Details requested follow:

(1) Notebook computer with 6 GB free RAM, 50 GB free disk, Windows 7 Home Premium 64-bit, Intel i5-3210 at 2.5 GHz.

(2) The screen displays "Loading Relation Pool", then after 3 minutes 40 seconds (at 1% done) a "stack overflow" error occurs (details below).

(3) stack traceback:
[C]: in function 'gsub'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:476: in function 'split'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2064: in function 'TblNamesData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2158: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
...
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2185: in function 'IntPoolData'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2527: in function 'FindDuplicateRecords'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:1698: in function <... Historian\Plugins\Find Duplicate Individuals.fh_lua:1693>
(tail call): ?
[C]: in function '?'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2000: in function 'GUI_MainDialogue'
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2802: in main chunk"


-------------------------

PS: The estimated time to finish given by the program is between 383 and 1532 minutes (which may seem long, but is acceptable to me if I can then merge the duplicate records painlessly!)

Re: "Find Duplicates Individual" error in date

Post by tatewise » 06 Feb 2014 14:58

Thank you for the feedback.
There seems to be no problem with your PC resources, so it looks like it is the large number of nested generations that is causing the stack overflow.

Roughly how many repeats are there of the traceback line:
... Historian\Plugins\Find Duplicate Individuals.fh_lua:2158: in function 'IntPoolData'
With this info I could then suggest a patch that should restrict the depth of nesting.
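One way such a patch could restrict the depth of nesting is sketched here in Python, as an assumption-laden illustration rather than the actual patch. Nesting stops at a hypothetical `MAX_DEPTH`; relatives reached at that limit are parked in a `deferred` list and profiled later, starting again at depth 0, so every connected Individual is still profiled. The graph layout and all names are hypothetical.

```python
# Hypothetical family graph: Individual id -> ids of parents, spouses and children.
Graph = dict[int, list[int]]

MAX_DEPTH = 200  # assumed limit, kept well below the interpreter's stack limit


def profile_limited(person: int, graph: Graph, profiled: set[int],
                    deferred: list[int], depth: int = 0) -> None:
    """Nested profiling that stops descending once depth reaches MAX_DEPTH.

    Relatives met at the limit are parked in `deferred` rather than
    profiled in an even deeper nested call.
    """
    profiled.add(person)
    for relative in graph.get(person, []):
        if relative in profiled:
            continue
        if depth < MAX_DEPTH:
            profile_limited(relative, graph, profiled, deferred, depth + 1)
        else:
            deferred.append(relative)


def profile_all(start: int, graph: Graph) -> set[int]:
    """Restart nesting at depth 0 from each deferred relative until none remain."""
    profiled: set[int] = set()
    deferred = [start]
    while deferred:
        person = deferred.pop()
        if person not in profiled:
            profile_limited(person, graph, profiled, deferred)
    return profiled


def chain(n: int) -> Graph:
    """n people in one line, each linked to the previous and next (like parent and child)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


everyone = profile_all(0, chain(100_000))
print(len(everyone))  # 100000
```

Capping the depth keeps most of the nested style while guaranteeing that stack use never exceeds about `MAX_DEPTH` frames.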

Re: "Find Duplicates Individual" error in date

Post by tonyz » 07 Feb 2014 07:15

Hi,

I'm afraid I do not know how to get the info you requested. Where do I find it? In my previous post I copied and pasted all the info the error message gave.

Re: "Find Duplicates Individual" error in date

Post by tatewise » 07 Feb 2014 19:20

Hi Tony, don't worry about that info.

I have revised this Plugin quite often to try and accommodate large databases while minimising run time.
It is a complex process, as I do not have a representative large database and so rely on others for testing out ideas.

I think I have found a workable solution that avoids the stack overflow problem and may reduce run time.
Please download this V3.3 dated 7 Feb 2014 from my SkyDrive Find Duplicate Individuals.

It should overwrite your existing Plugin but use the same options & settings.
You can always revert to the original V3.3 from the Plugin Store.

If any other users reading this have a large database and would like to give it a try, then please do.

BillH
Megastar
Posts: 2184
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Re: "Find Duplicates Individual" error in date

Post by BillH » 07 Feb 2014 19:38

Mike,

I've only got about 15,000 individuals and 4,600 families in my tree, but thought I'd run this new "version" of the plugin to see what happened to run time. The 3.3 version in the plugin store runs consistently about 3:25 - 3:27 on my system. This "version" runs consistently about 3:46 - 3:49. So this "version" is a bit slower on my system.

Bill

Re: "Find Duplicates Individual" error in date

Post by tatewise » 07 Feb 2014 20:28

Thanks for that feedback Bill. I assume the Result Set was the same for both variants.
On my 2,000 Individuals both variants run in the same time of 12 secs, but in debug mode the new variant is a bit faster.
If for some it is a bit slower, that may be the unavoidable penalty for preventing the stack overflow problem.
Benefits of this new variant are:
The progress messages simply show the Individual Record Id, without any Loading Relation Pool messages.
The % progress is more regular (i.e. the 50% elapsed time is closer to half the 100% time than before).
The stack overflow error reported by Tony cannot now arise.

Re: "Find Duplicates Individual" error in date

Post by tonyz » 10 Feb 2014 07:11

Hi Mike,

Update on the 'new' plugin version:
a) Downloaded from SkyDrive and installed OK. Ran the plugin on a database of 150,000 records (all 'linked' in one way or another, but known to have multiple duplicates).
b) The initial run times were as follows (elapsed time - progress):
10 mins - 15,500 individuals
23 mins - 25,000 individuals
3 hrs 53 mins - 76,000 individuals
7 hrs 20 mins - 113,000 individuals
10 hrs 14 mins - 142,000 individuals

c) Ran the plugin again on the same file, but with a subset of only 1311 records. The following may be of interest:
5 mins - 17,500 individuals
14 mins - 25,000 individuals
1 hr 57 mins - 70,000 individuals
3 hrs 47 mins - 98,000 individuals
5 hrs 23 mins - 113,000 individuals

Initial checking of the duplicates indicates the plugin is working OK!

Thank you.

Re: "Find Duplicates Individual" error in date

Post by tatewise » 10 Feb 2014 12:46

That is great feedback Tony.

The long run time is inevitable when you consider the comparisons needed for 150,000 Individuals.
The formula for every Individual to be compared against every other Individual is:
X * (X - 1) / 2 = 150,000 * 149,999 / 2 = 11,249,925,000 comparisons!

That reduces significantly with a subset of 1311 Individuals to be compared against all 150,000.
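The arithmetic above can be checked with a short Python sketch. The subset figure is an estimate that assumes each unordered pair is compared once and only pairs involving at least one subset member are checked (all pairs minus those lying wholly outside the subset); the Plugin's actual loop, which also skips nominated non-duplicates, may differ. The function names are hypothetical.

```python
def all_pairs(n: int) -> int:
    """Comparisons when every Individual is compared with every other once: n(n-1)/2."""
    return n * (n - 1) // 2


def subset_pairs(n: int, s: int) -> int:
    """Comparisons when only pairs involving at least one of s subset members are checked.

    Assumes each unordered pair is compared once: all pairs minus the
    pairs lying entirely outside the subset.
    """
    return all_pairs(n) - all_pairs(n - s)


print(f"{all_pairs(150_000):,}")           # 11,249,925,000
print(f"{subset_pairs(150_000, 1311):,}")  # 195,789,984
```

So, under these assumptions, a 1311-Individual subset needs roughly 196 million comparisons rather than about 11.2 billion.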

I will incorporate the changes with the next release of the Plugin to the Plugin Store.

Re: "Find Duplicates Individual" error in date

Post by tatewise » 02 Apr 2014 10:06

The above changes are now incorporated into V3.4 in the Plugin Store.

I would be interested in feedback about run times for large databases, which I might have improved.

Re: "Find Duplicates Individual" error in date

Post by BillH » 02 Apr 2014 15:37

Mike,

I ran Check Installed Plugins Against the Store. It found that there was a new version for Find Duplicate Individuals and then gave me the following error:
image1.jpg
Bill

Jane
Site Admin
Posts: 8442
Joined: 01 Nov 2002 15:00
Family Historian: V7
Location: Somerset, England

Re: "Find Duplicates Individual" error in date

Post by Jane » 02 Apr 2014 16:36

That looks like a timeout error on the download. I would try it again a bit later.
Jane
My Family History : My Photography "Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad."

Re: "Find Duplicates Individual" error in date

Post by BillH » 02 Apr 2014 16:51

Thanks Jane. I tried again just now and it worked fine.

Bill
