* Find Duplicate Individuals (All Relations)

tatewise
Megastar
Posts: 27080
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise » 15 Nov 2012 14:01

All previous Find Duplicate Individuals versions only considered each Individual's first Mother, Father, Spouse, and Child.
This could lead to duplicates being missed if, for example, the 1st Child of one Individual is entered as the 2nd Child of the other Individual.

I have developed a version that considers ALL Mothers, Fathers, Spouses, and Children, and uses the score from the best matching pairs.
However, I am concerned that it may extend the Plugin run-time.
This version is available from WiP find_duplicate_individuals.
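
A minimal Python sketch of the difference, offered only as an illustration and not as the Plugin's actual Lua code. The `similarity` function and the sample names are hypothetical stand-ins for whatever scoring the Plugin really uses.

```python
from itertools import product


def similarity(a: str, b: str) -> int:
    """Hypothetical stand-in score: 1 if the names match ignoring case, else 0."""
    return 1 if a.strip().lower() == b.strip().lower() else 0


def first_only_score(relations_a, relations_b):
    """Earlier behaviour as described above: compare only the first relation of each."""
    if not relations_a or not relations_b:
        return 0
    return similarity(relations_a[0], relations_b[0])


def best_pair_score(relations_a, relations_b):
    """Revised behaviour: score every pairing and keep the best one."""
    return max(
        (similarity(a, b) for a, b in product(relations_a, relations_b)),
        default=0,
    )


children_a = ["Ann", "Ben"]  # Ben entered as the 2nd Child
children_b = ["Ben", "Ann"]  # Ben entered as the 1st Child
print(first_only_score(children_a, children_b))  # first children differ
print(best_pair_score(children_a, children_b))   # a matching pair exists
```

Comparing every pairing costs one comparison per pair of relations rather than one per relation type, which is the likely source of any extra run-time.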

It will install alongside the Plugin Store Version 3.0, but uses the same Preferences, and Omit Non-Duplicates list, etc.
Both versions will overwrite the same saved Result Set data file.

Please let me know how run-time is affected, and if there are any problems, or improvements that you notice.

ID:6590
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

johnmorrisoniom
Megastar
Posts: 882
Joined: 18 Dec 2008 07:40
Family Historian: V7
Location: Isle of Man

Post by johnmorrisoniom » 15 Nov 2012 19:10

Hi Mike,
Just tried it on my main tree of 32,500+ individuals. Run time was 1949 secs (32.5 mins), so no major impact on run time.
Highest score 55.
Highest Ind score 37.

BillH
Megastar
Posts: 2179
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Post by BillH » 15 Nov 2012 20:34

Mike,

On my system, when I downloaded the new version, it replaced the old version since it had the same name. This was just as fast as the old version.

On my tree with 10,327 individuals, it ran in 1 min 25 sec. The highest score was 42 and the highest individual score was 29.

I did find an individual where he was shown as the first child in one family and the second child in the other.

Bill

Post by tatewise » 15 Nov 2012 21:24

Bill, sorry, I guess you never downloaded the Plugin Store Version named Find Duplicate Individuals, and still had the old WiP version named find_duplicate_individuals.

You can still obtain Find Duplicate Individuals Version 3.0 from the FH Plugin Store.

With positive feedback from you both, I plan to put this latest version into the Plugin Store next week, unless anyone else has problems.

The change for multiple instances of relations was fairly straight-forward.
However, the Marriage Event Dates/Places have a small residual problem.
The Plugin data currently only supports one Date/Place entry for a Marriage per Individual.
With only the 1st Spouse being considered, this was not a problem.
But multiple Spouses can result in multiple Marriage Dates/Places.
At present the Plugin utilises the Marriage Date/Place of the earliest Spouse for which a Marriage Date/Place exists.
When synthesising missing Dates, the Plugin only uses a Marriage Date to synthesise other Dates if there is only one Spouse.
I suspect none of the above limitations have a significant impact on detecting duplicates.
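
The two Marriage rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Plugin's Lua code: the `Marriage` record, its field names, and the assumption that the list is already in Spouse order are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Marriage(NamedTuple):
    spouse: str
    date: Optional[str]   # None when no Marriage Date is recorded
    place: Optional[str]  # None when no Marriage Place is recorded


def marriage_for_comparison(marriages: Sequence[Marriage]) -> Optional[Marriage]:
    """Return the earliest Spouse's marriage that has a Date or a Place."""
    for marriage in marriages:  # assumed to be in Spouse order
        if marriage.date or marriage.place:
            return marriage
    return None


def marriage_date_for_synthesis(marriages: Sequence[Marriage]) -> Optional[str]:
    """Offer a Marriage Date for estimating other Dates only with exactly one Spouse."""
    if len(marriages) == 1:
        return marriages[0].date
    return None


marriages = [
    Marriage("Spouse 1", None, None),
    Marriage("Spouse 2", "1850", "Torbay"),
]
print(marriage_for_comparison(marriages))
print(marriage_date_for_synthesis(marriages))
```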

Post by BillH » 15 Nov 2012 21:29

Mike,

No problem at all. I just had it overwrite the old version. I just now cleaned out the old test version and downloaded and installed version 3.0 from the plugin store. I look forward to getting 3.1 from the store once it is out there.

Thanks,

Bill

Post by tatewise » 21 Nov 2012 15:18

There is a revised Find Duplicate Individuals V3.1a in the WiP page.

This download will overwrite the previous V3.1 WiP download of find_duplicate_individuals.

This version not only considers all instances of Relations but also all instances of Events.

So if there are multiple Birth or Baptism/Christening or Marriage or Death/Burial Events then the best matches will be used in the Results.

Can you please assess its impact on run-time on your large databases, and report any other side effects?

Post by BillH » 21 Nov 2012 22:47

Mike,

On my database of 10,340, version 3.0 ran in 83 seconds and version 3.1a ran in 81 seconds. I don't have 3.1 still installed to compare it to.

Bill

Post by tatewise » 23 Nov 2012 11:42

Plugin Store Find Duplicate Individuals V3.1 is now available.

It is a slight upgrade from the V3.1a in WiP, with some small changes to improve run-time, and function prototypes to improve the LUA code structure.

Post by johnmorrisoniom » 23 Nov 2012 17:39

Mike,
At one of the version two stages it was possible to keep the plugin window open whilst checking, to make life easier for adding pairs to the omit data. This seems to have disappeared, but I can't place in which version it went away. Any chance of this returning, or did it interfere with something else?

gerrynuk
Megastar
Posts: 565
Joined: 25 Apr 2007 09:21
Family Historian: V6
Location: Welwyn Garden City

Post by gerrynuk » 24 Nov 2012 12:12

tatewise said:
Plugin Store Find Duplicate Individuals V3.1 is now available.

Mike,

Just downloaded the latest version. Whilst it runs OK the Progress Bar/%age/Time displays do not update. The Individual ID display does update.

Not really bothered myself and perhaps an option to turn off the progress bar/%age/Time might speed things up even more?

Gerry

Post by tatewise » 24 Nov 2012 12:30

John ~ I don't believe that would have ever been possible, because to display the Result Set any Plugin must close.
If you open any Plugin, then FH is locked, and any Result Set on display cannot be manipulated.
I wonder if you are recalling the ability to add pairs of Individuals from the Result Set to a Named List named Non-Duplicates.
That technique was removed from V3 because it didn't just exclude the pair, but excluded the two Individuals separately, which users did not like.
Have you seen FAQ How do I Avoid Long Run-Times?
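
To illustrate the difference between excluding a pair and excluding two Individuals separately, here is a minimal Python sketch. It is an assumption-laden illustration only: the `OmitList` class and its methods are hypothetical, and the thread does not say how the Plugin stores its Omit Non-Duplicates list.

```python
class OmitList:
    """Remember pairs of Record Ids that the user has marked as non-duplicates."""

    def __init__(self):
        self._pairs = set()

    def omit(self, id_a: int, id_b: int) -> None:
        # A frozenset ignores order, so (12, 345) and (345, 12) are the same pair.
        self._pairs.add(frozenset((id_a, id_b)))

    def is_omitted(self, id_a: int, id_b: int) -> bool:
        return frozenset((id_a, id_b)) in self._pairs


omits = OmitList()
omits.omit(12, 345)
print(omits.is_omitted(345, 12))  # the same pair in either order
print(omits.is_omitted(12, 99))   # Individual 12 is still compared with others
```

With a list of Individuals instead of pairs, both 12 and 345 would drop out of every comparison, which is the behaviour users did not like.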

Gerry ~ I have once or twice today experienced the same Progress Bar problem, but not sure why.
I will investigate.

Post by tatewise » 25 Nov 2012 00:08

There is a revised Find Duplicate Individuals V3.1b in the WiP page.

This download will overwrite the previous V3.1a WiP download of find_duplicate_individuals.

It uses a revised method of loading its database prior to running comparisons, which seems faster.

This method also avoids a bug in ~FATH> & ~MOTH> shortcuts that don't find both members of same sex partnerships.

It also fixes the Progress Bar bug reported by Gerry.

Post by gerrynuk » 25 Nov 2012 17:45

tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.

....

It also fixes the Progress Bar bug reported by Gerry.

Thanks for the quick response, Mike.

Post by gerrynuk » 25 Nov 2012 19:46

Gerry Newnham said:

tatewise said:
There is a revised Find Duplicate Individuals V3.1b in the WiP page.

....

It also fixes the Progress Bar bug reported by Gerry.

Thanks for the quick response, Mike.

Sorry, Mike, I think I spoke too soon. The plugin hangs at loading database.

Post by tatewise » 26 Nov 2012 11:27

Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?

Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.

These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH LUA interpreter.

Post by gerrynuk » 26 Nov 2012 11:54

tatewise said:
Gerry ~ Can you give me some details of your database:
- What is its size in terms of Individuals & Families ?
- How long did previous versions of Plugin take to complete ?
- Does Plugin Store Version 3.1 run OK ?

Please try revised Find Duplicate Individuals V3.1c in the WiP page, which adds progress messages to the Loading Records phase.

These last two versions of the Plugin use a new recursive technique while Loading Records Database, which may be upsetting the FH LUA interpreter.

Mike,

The database has 7314 individuals and 2077 families. Version 3.1 (listed as version 8 on the plugin popup menu) runs in 2m 25 secs.

v3.1c (listed as version 10) stalls at 'Loading Record ID 1'.

Post by tatewise » 26 Nov 2012 12:46

Gerry ~ thank you for the feedback.

I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?

Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?

I might need to post a version of the Plugin with more diagnostic progress messages in order to discover the problem.

Post by gerrynuk » 26 Nov 2012 13:25

tatewise said:
Gerry ~ thank you for the feedback.

I don't understand, what is 'the plugin popup menu' and where does it list Version numbers?

Mike - screen shot attached of Plugin Popup Menu:

[Image: screen shot of Plugin Popup Menu]

tatewise said:
Could you give some details of Individual Record Id 1:
Does it have any unusual or multiple Birth, Baptism, Christening, Marriage, Death, or Burial Events?
Does it have any unusual or multiple Father, Mother, Spouse, or Child Relations?
Or do any of these Relations lead to any unusual or multiple Relations?
Roughly how many generations up or down do these Relations extend?

Id1 is 4 generations down from the root (me!), had 2 parents and one spouse and 10 children. He has only one of each of the events you mention. He has never given problems before and I haven't changed his details for some time.

Post by tatewise » 26 Nov 2012 15:11

Gerry - Those Plugin filename suffixes are caused by the download cache in your browser, and can be ignored.

It is quite convenient, as you can run any of the recent Plugin versions.
Eventually they can all be Deleted by right-clicking the Plugin name.

Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.

Post by gerrynuk » 26 Nov 2012 19:42

tatewise said:
Gerry -
....
Please try Find Duplicate Individuals V3.1d in the WiP page, which has revised progress messages for the Loading Records phase.
I think it may have simply been that I put the progress messages in the wrong place.

Mike, that version seems to work OK. It took just under 2 mins to load the records and then a further minute to check them.

I notice that the progress bar wasn't updated until the checking phase. Is this intended?

Post by tatewise » 26 Nov 2012 22:48

Please can you all try Find Duplicate Individuals V3.1e via the WiP page, which has revised the Progress Bar messages.

Now that I have a better understanding of how various size databases interact with the Plugin revised design, I have altered the Progress Bar methodology to both maintain useful progress messages, and minimise their effect on run-time.
The net effect should be reduced run-time compared with earlier versions ~ I hope!
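
One common way to keep progress messages useful while limiting their cost is to update the display only after a minimum time interval. The Python sketch below illustrates that general technique; it is not the Plugin's Lua code, the thread does not say which method V3.1e uses, and the `ThrottledProgress` class with its parameters is hypothetical.

```python
import time


class ThrottledProgress:
    """Report progress at most once per `min_interval` seconds."""

    def __init__(self, total, min_interval=0.5, clock=time.monotonic, show=print):
        self.total = total
        self.min_interval = min_interval
        self.clock = clock
        self.show = show
        self.last_shown = float("-inf")  # forces the first call to report

    def step(self, done):
        """Call once per record; only some calls reach the display."""
        now = self.clock()
        finished = done >= self.total
        if finished or now - self.last_shown >= self.min_interval:
            self.last_shown = now
            self.show(f"{100 * done // self.total}%")


progress = ThrottledProgress(total=10_000)
for record in range(1, 10_001):
    progress.step(record)  # cheap time check; the display is touched rarely
```

The final record always reports, so the display ends at 100% however few updates were shown along the way.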

I have also made the Record Id digits fixed width with leading zeros. Is this preferable?

Post by BillH » 26 Nov 2012 23:08

Mike,

On my database of 10,344 individuals, version 3.1e takes 88 seconds. Version 3.1 takes 80 seconds. Just a bit slower in 3.1e, but still very fast.

It doesn't really matter, but I like it without the leading zeroes.

The progress window keeps taking back focus so that I can't do anything else on my computer while the plugin runs. Has it always been that way?

Bill

Post by tatewise » 27 Nov 2012 11:50

Can all of you please give me a rough idea of how long the Loading Records phase takes versus the Scoring Records phase?
Please also provide the number of Individuals in your database.

The leading zeros were an experiment to stop the Progress Bar message jiggling as the number of Record Id digits changes.
I will use a different technique in the next version that has no leading zeros.
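
One possible technique of that kind is to pad the Record Id with spaces instead of zeros. The Python sketch below is only a guess at such an approach, not the method actually used in the next version. The `format_record_id` helper is hypothetical, and the padding only stops the jiggle if the message is shown in a fixed-width font.

```python
def format_record_id(record_id: int, max_id: int) -> str:
    """Right-align the Record Id in a field as wide as the largest Id."""
    width = len(str(max_id))
    return str(record_id).rjust(width)


for record_id in (7, 345, 10345):
    print(f"Record Id [{format_record_id(record_id, 10345)}]")
```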

I think the Progress Bar has taken focus in this way since Version 2.4.
However, I can see it is annoying, so I will change it back in the next Version.

Post by tatewise » 27 Nov 2012 12:42

Please can you all try Find Duplicate Individuals V3.1f via the WiP page, which has revised the Progress Bar messages again, plus a few other minor changes, as mentioned above.

When convenient, can you please supply the information requested in my previous posting above.

Post by BillH » 27 Nov 2012 17:31

Mike,

On my database of 10,345, version 3.1f took 25 seconds for the loading phase and 63 seconds for the scoring phase for a total of 88 seconds.

I found what might be a bug. I clicked on the Stop Finding Duplicates button when it was at 2 seconds and it kept going until it got to about 55 seconds before it stopped. It used to stop immediately.

It doesn't really matter, but I think I preferred the way the record id used to count up sequentially from 1 to the way 3.1f displays the record ids in no particular order.

Bill
