Find Duplicate Individuals (All Relations)
Posted: 03 Dec 2012 12:10
by tatewise
Gerry & John ~ Your assessments are remarkably similar, given that the results are rounded down to the nearest second.
The ratio between FH API and LUA Script run times is approaching 10 to 1 on both PCs.
By contrast, Bill's and my results have a ratio much nearer 1 to 1.
John's 2.4 GHz Quad-Core CPU is very similar to my 2.6 GHz Dual-Core CPU, yet the FH API time per 1,000 Individuals is about 10 times mine (& Bill's).
How much RAM do you each have on your PC?
Or maybe it is a Windows XP versus Windows 7 characteristic?
Posted: 03 Dec 2012 17:33
by BillH
Mike,
My system:
Intel Core i7-2600 CPU @ 3.40 GHz (quad core)
8 GB memory
Windows 7 Home Premium 64 bit
Bill
Posted: 03 Dec 2012 17:42
by johnmorrisoniom
I will try on my home machine later to see if it makes a difference.
The Win XP machine has 4 GB RAM, but is only 32-bit, so can only access 3 GB.
My desktop at home is a similar machine but is running W7 Pro 64-bit.
My laptop is an i3 with 4 GB RAM, again running W7 Pro 64-bit.
Posted: 03 Dec 2012 19:12
by johnmorrisoniom
The results from my laptop:
an i3 (2.53 GHz), 4 GB RAM, W7 64-bit

Posted: 04 Dec 2012 07:36
by johnmorrisoniom
This time with my 2.4 GHz quad-core running W7 Pro 64-bit with 4 GB RAM.

Posted: 04 Dec 2012 10:52
by tatewise
John ~ Those assessments are both about 3 to 1, which suggests Windows 7 is significantly better than Windows XP, given that CPU and RAM are similar on all the PCs.
I recall you had a 4575-Individual subset database for experiments at the end of October, which is roughly the same order of size as Bill's & Gerry's databases.
Is it possible to assess that smaller database on your PC to see whether size is a significant factor?
I have just run the assessment on an old Win XP laptop, 1.6 GHz CPU, 1 GB RAM, 2000 Individuals, FH API = 6 secs, LUA script 4 secs.
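Because the databases differ so much in size, comparing raw times is misleading; normalising each run to seconds per 1000 Individuals and taking the FH API to LUA Script ratio puts them on a common footing. The sketch below (in Python rather than the plugin's own Lua, with illustrative function names that are not part of FH) applies that arithmetic to the old-laptop figures just quoted. Bear in mind the reported times are whole seconds, so ratios from small databases are coarse.

```python
def per_thousand(seconds: float, individuals: int) -> float:
    """Run time normalised to seconds per 1000 Individuals."""
    return seconds * 1000 / individuals


def api_to_script_ratio(api_seconds: float, script_seconds: float) -> float:
    """How many times slower the FH API pass is than the pure LUA loop."""
    return api_seconds / script_seconds


# Figures from the old Win XP laptop above: 2000 Individuals, FH API 6 s, LUA 4 s
individuals = 2000
api_time = 6.0      # FH API run time in seconds
script_time = 4.0   # LUA Script run time in seconds

print(f"FH API:     {per_thousand(api_time, individuals):.1f} s per 1000 Individuals")
print(f"LUA Script: {per_thousand(script_time, individuals):.1f} s per 1000 Individuals")
print(f"Ratio:      {api_to_script_ratio(api_time, script_time):.1f} to 1")
```

This gives 3.0 s and 2.0 s per 1000 Individuals, a ratio of 1.5 to 1, i.e. nearer the 1 to 1 end than the 10 to 1 seen on the Win XP desktops.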
Posted: 04 Dec 2012 15:31
by johnmorrisoniom
I had deleted my reduced set, but have recreated it.
Test results from my XP SP3 32-bit machine:

Posted: 04 Dec 2012 15:49
by johnmorrisoniom
The other part of the data, all of whom are in pool 1, produced this on the XP machine.
I will test the W7 machines later tonight.
Posted: 04 Dec 2012 20:10
by gerrynuk
tatewise said:
Gerry & John ~ ...
How much RAM do you each have on your PC?
...
1 GB
Posted: 05 Dec 2012 18:24
by johnmorrisoniom
Hi Mike,
Results for the reduced set on my i3 Laptop as follows:

Posted: 05 Dec 2012 20:42
by johnmorrisoniom
Hi Mike,
Results from my desktop (2.4 GHz quad-core, W7 64-bit) on the reduced set.
I am going to re-run the Win XP test tomorrow to check an idea I have had.
Posted: 06 Dec 2012 10:45
by johnmorrisoniom
Hi Mike,
2nd Run on my XP machine on the reduced set.
The only difference in the file between run 1 and run 2 is that I realised I had left all the media in the folder from splitting the tree, so these were deleted. Run 2 has 2700+ fewer media files (PDF and Word, but mostly JPEG) connected to the file.
Posted: 06 Dec 2012 12:04
by tatewise
An analysis of the assessments to date is summarised below.
The first column gives the database size in Individuals.
The next two columns give the LUA Script run times and their normalised values per 1000 Individuals.
Remember that the LUA Script loops in proportion to the number of Individuals, but without any FH API access.
The times are remarkably consistent, regardless of OS and RAM.
The subsequent columns give the FH API run times and their normalised values per 1000 Individuals for Windows 7 and XP.
Windows 7 is on a par with the LUA Script, unless the RAM is too small for the database size.
Windows XP is significantly slower, especially if the RAM is too small for the database size.
However, these timings are only significant when running intensive global database Plugins such as Find Duplicate Individuals and Map Life Facts, and perhaps Show Project Statistics and the various Search ... Plugins.
Beware that when FH opens a Project it starts loading Multimedia image thumbnails, which can take many seconds.
Running a Plugin during this phase can have detrimental effects.
I have found that it can often cause FH to crash with a "Family Historian has stopped working" message, with details involving BEX and MSVCR100.dll.
Posted: 06 Dec 2012 14:27
by gerrynuk
Mike,
Thanks for all your hard work on this study, as well as, of course, for writing the programs and plugins in the first place.
Although I run FH on an iMac with an emulator (Parallels), the main impact seems to be the amount of memory rather than the processor speed. Would that be a fair conclusion? Is it possible to estimate the overall impact if I doubled the RAM from 1 to 2 GB?
Posted: 06 Dec 2012 15:29
by tatewise
It is difficult to be certain what the gain might be, especially since your iMac/Parallels set-up is so different from the others.
If you are going to go through the hassle of adding RAM why not go for more?
A quick Google search suggests it does not have to be Apple RAM, which looks very expensive, as there are compatibles available.
It also looks as if you might get more gain by upgrading from Win XP to Win 7 or Win 8, which I have seen at about £43.
As a guess, it looks like you might expect the FH API run-time to at least halve, but don't hold me to that.
Posted: 07 Dec 2012 09:25
by gerrynuk
Mike,
The iMac has 4 GB, but Parallels allocates a default 1 GB to Win XP in order not to cripple Mac OS. However, I can increase the Win XP memory to a maximum of 1.5 GB, which should make a difference. I will run your test plugin later and post the results.
Posted: 07 Dec 2012 12:22
by gerrynuk
Mike,
Here are the results for my setup (Win XP running under Parallels on an iMac):

2nd run with 1.5 GB (1 GB on first run). Slightly faster on the first test and the same on the second. It seems to be sensitive to background tasks, as the first test was much slower if I ran it immediately after starting FH, while it was loading the thumbnails.

Posted: 08 Dec 2012 10:44
by tatewise
Yes, any run-time assessments will be sensitive to high CPU usage background tasks, so always wait for thumbnails to load, or temporarily disable that feature.
Increasing the RAM has reduced FH API run-time by over 13%.
Judging by the other assessments I would guess that the FH API run-time could be reduced to about 20 secs if enough RAM can be added, but I appreciate that would need RAM adding to the base iMac PC.
Alternatively, if you could upgrade from XP to Win 7 or Win 8 then that should give roughly a 3 times improvement with the same RAM.
Posted: 18 Feb 2013 10:32
by gessler
Many thanks. This is an excellent piece of work which has helped me find and resolve a few duplicates with the minimum of fuss and effort.
There is, though, something I don't understand about the scoring, and I'm quite content to be told I've overlooked the obvious, but here goes with my puzzle:
I have two 'Joseph WILEMAN' records: no alternative names, no spelling errors, all formatted correctly. My reading of the scoring is that for the 'Individual score' I should get 7 points for the matching last names, 6 points for first names, zero points for other names and zero points for Soundex (as they are a perfect match). So, a total of 13 by my reckoning. However, I get 27 and don't understand why. (I am using the preference standard scores.)
To add to this, it is also my understanding that each element of the scoring is limited to a maximum of 20...yet 27 is shown under the Individual column (and is added into the total).
Can anyone explain this? Any help would be much appreciated.
Posted: 18 Feb 2013 10:39
by gessler
I should have added in my post a few minutes ago that there are several other matches where similar 'higher scoring' Individual discrepancies appear to occur, and where they are not capped at a maximum Individual score of 20.
Posted: 18 Feb 2013 12:36
by tatewise
Check out the Help & Advice pages for details.
Use Enable Diagnostic Mode to see a breakdown of scores.
Name scores are limited to 20 points per person.
After the Individual is assessed, their relations' Names are also assessed and their scores added.
If more than 6 points are scored, then Event assessment is performed and more points are added, and so on.
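The order of assessment described above explains how an Individual total can exceed 20: only the Name points are capped, and Event points are added on top once the Name score passes the threshold. The Python sketch below models that flow under stated assumptions. `PersonScore`, `person_total` and `pair_total` are hypothetical names, not the plugin's Lua code, and the exact threshold comparison is an assumption, since this post says "more than 6" while the Help text quoted later in the thread says "reaches a threshold of 6". The 7 + 6 Name points come from the Joseph WILEMAN example; all other point values are invented for illustration.

```python
from dataclasses import dataclass, field

NAME_CAP = 20        # Name points are limited to 20 per person
EVENT_THRESHOLD = 6  # assumption: Events assessed once the Name score reaches 6


@dataclass
class PersonScore:
    """Hypothetical per-person breakdown, like the Diagnostic Mode sub-points."""
    name_points: list[int]
    event_points: list[int] = field(default_factory=list)


def person_total(p: PersonScore) -> int:
    """Capped Name score, plus Event points only if the Name score is high enough."""
    name = min(sum(p.name_points), NAME_CAP)
    if name < EVENT_THRESHOLD:
        return name  # Names too dissimilar, so Events are never assessed
    return name + sum(p.event_points)


def pair_total(individual: PersonScore, relations: list[PersonScore]) -> int:
    """Individual's score plus the scores of each assessed relation."""
    return person_total(individual) + sum(person_total(r) for r in relations)


joseph = PersonScore(name_points=[7, 6], event_points=[8, 4])  # 13 Name + 12 Event
father = PersonScore(name_points=[7, 3])                       # 10 Name, no Events
mother = PersonScore(name_points=[2, 1], event_points=[5])     # below threshold

print(person_total(joseph))                      # 25, above the 20 Name cap
print(pair_total(joseph, [father, mother]))      # 25 + 10 + 3 = 38
```

The mother's Event points are ignored because her Name score (3) never reaches the threshold, which is why Diagnostic Mode is the quickest way to see which assessments actually contributed.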
Posted: 18 Feb 2013 13:26
by gessler
Thanks Mike. Your speed of response and its content were very impressive.
I had read the Help & Advice page but had not fully grasped what was being said. On your advice, I used Enable Diagnostic Mode and all became clear; my 27 points came from my anticipated 13 points for the Individual Name plus 10 from Individual Birth and 5 from Individual Death. Easy when you know how, isn't it!
Once again, many thanks and keep up your excellent work.
Posted: 18 Feb 2013 15:01
by tatewise
Could you suggest how the Help & Advice could be improved to make it clearer?
It is very difficult, on the inside looking out, to appreciate how newcomers to the Plugin see things.
Maybe more emphasis that Event assessment can follow Name assessment?
Should it mention more often that Enable Diagnostic Mode can be used to see a breakdown of scores?
Posted: 18 Feb 2013 17:06
by gessler
Hi Mike
On re-reading, your instructions are pretty clear, but my sin was not paying enough attention to the paragraph in the Names Assessment page: 'If the score reaches a threshold of 6 points, then the person's key Events are also assessed'. I'm afraid I concentrated on 'Name' and overlooked the remaining assessments!
I agree with both your suggestions; I think they are the key to helping (any?) others like me who fail to understand how things work. With this in mind:
1) Perhaps add to the paragraph mentioned above something along the lines of: 'Any resultant scores from the Events will be added to the name score and shown in the final Results Set.'
2) While the 'Enable Diagnostic Mode' page says that 'more sub-points categories are shown' (if used), it might be helpful to include a paragraph in the Results Set page stating that the component parts of the scores shown in the Results Set can be seen by running with Diagnostic Mode enabled (tick box on Find Duplicates at start-up).
I hope this helps. Make of the above what you will; use, modify or discard it, I'm not precious. This is your very elegant baby to nurture and grow and, once again, many thanks for providing it.