Find Duplicate Individuals; why so long time?
Posted: 16 Sep 2021 11:54
by torleiffh
My project has 127,000 individuals.
I selected a subset of only 23 records.
The plugin said the estimated run time was only a few seconds, but it took over 50 minutes.
Why did it take so long?
Why doesn't it search only the 23 records?
Re: Find Duplicate Individuals; why so long time?
Posted: 16 Sep 2021 12:07
by Ron Melby
Nearly 3 million comparisons.
That's a little over 53000 comparisons per minute.
Re: Find Duplicate Individuals; why so long time?
Posted: 16 Sep 2021 14:23
by tatewise
127,000 x 23 is actually just over 2.9 million and dividing by 50 gives over 58 thousand per minute.
But the plugin is doing more than that as it also compares close relatives to improve the hit rate and reduce false positives, so it is making much more than 2.9 million comparisons.
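The arithmetic above can be checked with a short sketch. It assumes exactly one comparison per pairing of a selected record with a project record, which is a simplification: as noted, the plugin also compares close relatives, so the true count is higher. The variable names are illustrative only and are not taken from the plugin.

```python
# Figures quoted in this thread.
total_individuals = 127_000   # individuals in the project
selected_records = 23         # records in the selected subset
run_minutes = 50              # observed run time (reported as "over 50 minutes")

# Assumption: each selected record is compared once against every individual.
comparisons = total_individuals * selected_records
per_minute = comparisons / run_minutes

print(f"{comparisons:,} comparisons")           # 2,921,000 comparisons
print(f"{per_minute:,.0f} comparisons/minute")  # 58,420 comparisons/minute
```

Because the run took somewhat more than 50 minutes and involved extra relative comparisons, the per-minute figure is only a rough lower-level estimate of the work done.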
Have you checked the plugin Help & Advice button, which explains how it works?
There are also tips in the FAQ to avoid the long run-time on large GEDCOM databases.
After it has been run once the subsequent runs will be much faster as only changed records are considered.
I'm not sure why its estimated run time is so much less than the actual run time, but it is only a guess and depends on many factors, including your PC performance and the complexity of your Individual records.
Were the results useful?
If you only wanted the 23 Individual records to be compared amongst themselves then you would have to export a GEDCOM containing just those 23 and their close family records. Afterwards, the duplicate records would need to be fed back into your master project.
Re: Find Duplicate Individuals; why so long time?
Posted: 16 Sep 2021 14:50
by torleiffh
Screenshots from the run:
- Skjermbilde 2021-09-16 133317.png
- Skjermbilde 2021-09-16 124557.png
The results were OK, but I still don't understand how the subset of 23 was used.
If I had not selected the 23, would there be 127000 x 127000 comparisons?
Re: Find Duplicate Individuals; why so long time?
Posted: 16 Sep 2021 16:20
by tatewise
Yes, if you did not select the 23, there would be 127000x127000 comparisons.
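For a sense of scale, the sketch below puts the two counts mentioned in this thread side by side, under the simplifying assumption of one comparison per pair of records. It says nothing about how the plugin actually orders or skips comparisons, so it should not be read as a run-time prediction.

```python
total_individuals = 127_000   # individuals in the project
selected_records = 23         # records in the selected subset

# Assumption: one comparison per pair, with no pairs skipped.
subset_comparisons = total_individuals * selected_records   # subset vs. whole project
full_comparisons = total_individuals * total_individuals    # everyone vs. everyone

ratio = full_comparisons / subset_comparisons

print(f"{full_comparisons:,} comparisons")         # 16,129,000,000 comparisons
print(f"about {ratio:,.0f} times the subset run")  # about 5,522 times the subset run
```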
If you tick the top option when you run the plugin again it should be much faster.
If you do NOT tick that option and do NOT select that subset of 23 records, what is the estimated run time?
I can only apologise for it getting its estimate too low.
BTW: I am fascinated how you got those screenshots saying the Plugin Not Run Yet when you have clearly already run it for 50 minutes and got some results. Did you use Edit > Undo Plugin Updates?
Re: Find Duplicate Individuals; why so long time?
Posted: 17 Sep 2021 16:34
by torleiffh
Just started again, now saying 764 minutes ...
I just took a screenshot with Windows' Clip-and-draw.

- Skjermbilde 2021-09-17 183214.png
Re: Find Duplicate Individuals; why so long time?
Posted: 17 Sep 2021 17:36
by tatewise
It is actually saying between 191 and 764 minutes.
Large numbers of comparisons inevitably take a long time.