* Find Duplicate Individuals (All Relations)
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Gerry & John ~ Your assessments are remarkably similar, given that the results are rounded down to the nearest second.
The ratio between FH API and LUA Script is approaching 10 to 1 on both PCs.
Whereas Bill's & my results have a ratio much nearer 1 to 1.
John's 2.4 GHz Quad-Core CPU is very similar to my 2.6 GHz Dual-Core CPU, yet the FH API time per 1,000 Individuals is about 10 times mine (& Bill's).
How much RAM do you each have on your PC?
Or maybe it is a Windows XP versus Windows 7 characteristic?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- BillH
- Megastar
- Posts: 2184
- Joined: 31 May 2010 03:40
- Family Historian: V7
- Location: Washington State, USA
Find Duplicate Individuals (All Relations)
Mike,
My system:
Intel Core i7-2600 CPU @ 3.40 GHz (quad core)
8 GB memory
Windows 7 Home Premium 64 bit
Bill
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
I will try on my home machine later to see if it makes a difference.
The Win XP machine has 4 GB RAM, but is only 32 bit so can only access 3 GB.
My desktop at home is a similar machine but is running W7 64 bit Pro.
My laptop is an i3 with 4 GB RAM, again running W7 64 bit Pro.
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
The results off my laptop:
an i3 (2.53ghz) 4Gb Ram W7 64 bit

- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
This time with my 2.4ghz Quad core running W7 pro 64bit: 4GB Ram.


- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
John ~ Those assessments are both about 3 to 1, which suggests Windows 7 is significantly better than Windows XP, given that CPU and RAM are similar on all PCs.
I recall you had a sub-set 4575 Individuals database for experiments at the end of October, which is roughly the same order of size as Bill's & Gerry's database.
Is it possible to assess that smaller database on your PC to see if the size is a significant factor?
I have just run the assessment on an old Win XP laptop, 1.6 GHz CPU, 1 GB RAM, 2000 Individuals, FH API = 6 secs, LUA script 4 secs.
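For anyone wanting to compare their own figures, the arithmetic behind these comparisons can be sketched as below. This is only an illustration in Python, not part of the test Plugin; the function names are made up for the example, and the input figures are the ones quoted above for the old Win XP laptop.

```python
def per_thousand(seconds: float, individuals: int) -> float:
    """Normalise a run time to seconds per 1,000 Individuals."""
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    return seconds * 1000.0 / individuals


def api_to_script_ratio(api_seconds: float, script_seconds: float) -> float:
    """Ratio of FH API run time to plain script run time on the same database."""
    if script_seconds <= 0:
        raise ValueError("script_seconds must be positive")
    return api_seconds / script_seconds


# Figures quoted above: 2000 Individuals, FH API = 6 secs, LUA script = 4 secs.
individuals = 2000
fh_api_secs = 6.0
lua_script_secs = 4.0

api_norm = per_thousand(fh_api_secs, individuals)
script_norm = per_thousand(lua_script_secs, individuals)
ratio = api_to_script_ratio(fh_api_secs, lua_script_secs)

print(f"FH API: {api_norm:.1f} s per 1,000 Individuals")
print(f"LUA script: {script_norm:.1f} s per 1,000 Individuals")
print(f"Ratio: {ratio:.1f} to 1")
```

Because the reported times are whole seconds, small databases give coarse normalised values, so ratios from short runs should be treated as rough.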
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
I had deleted my reduced set, but have recreated again.
Test results from my XP SP3 32 bit machine.

- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
The other part of the data, all of whom are in pool 1, produced this on the XP machine.

I will test the W7 machines later tonight

- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
Gerry & John ~ ...
How much RAM do you each have on your PC?
...
1 GB
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike,
Results for the reduced set on my i3 Laptop as follows:

- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike,
Results from my desktop 2.4 GHz quad core W7 64 bit on the reduced set.

I am going to re-run the win Xp test again tomorrow, to check an idea I have had.
- johnmorrisoniom
- Megastar
- Posts: 882
- Joined: 18 Dec 2008 07:40
- Family Historian: V7
- Location: Isle of Man
Find Duplicate Individuals (All Relations)
Hi Mike,
2nd Run on my XP machine on the reduced set.

The only difference in the file between run 1 and run 2 is that I realised I had left all the media in the folder from splitting the tree, so these were then deleted. Run 2 has 2700+ fewer media files (PDF and Word, but mostly JPEG) connected to the file.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
An analysis of the assessments to date is summarised below.

The first column gives the database size in Individuals.
The next two columns give the LUA Script run times, and their normalised values per 1000 Individuals.
Remember the LUA Script loops in proportion to the number of Individuals, but without any FH API access.
The times are remarkably consistent, regardless of OS & RAM.
The subsequent columns give the FH API run times, and normalised values per 1000 Individuals for Windows 7 & XP.
Windows 7 is on a par with the LUA Script, unless the RAM is too small for the database size.
Windows XP is significantly slower, especially if the RAM is too small for the database size.
However, these timings are only significant when running intensive global database Plugins such as Find Duplicate Individuals and Map Life Facts and perhaps Show Project Statistics and the various Search ... Plugins.
Beware that when FH opens a Project it starts loading Multimedia image thumbnails, which can take many seconds.
Running a Plugin during this phase can have detrimental effects.
I have found that it can often cause FH to crash with a "Family Historian has stopped working" message with details involving BEX and MSVCR100.dll.
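The shape of the two measurements compared in this analysis (a loop that does no data access versus a loop that fetches something for every Individual) can be sketched as follows. This is a rough illustration in Python rather than the actual test Plugin: the dictionary lookup is only a stand-in for an FH API call, and `COUNT`, `records`, `script_only` and `with_lookup` are hypothetical names invented for the example.

```python
import time
from typing import Callable


def time_loop(count: int, per_item: Callable[[int], object]) -> float:
    """Return the elapsed seconds for calling per_item once per Individual."""
    start = time.perf_counter()
    for index in range(count):
        per_item(index)
    return time.perf_counter() - start


COUNT = 100_000  # hypothetical database size in Individuals

# Stand-in for the database; a real Plugin would query FH instead.
records = {index: f"Individual {index}" for index in range(COUNT)}


def script_only(index: int) -> int:
    """Pure computation with no data access (the script-only style of test)."""
    return index * 2


def with_lookup(index: int) -> str:
    """One data access per Individual (stand-in for an FH API call)."""
    return records[index]


script_secs = time_loop(COUNT, script_only)
lookup_secs = time_loop(COUNT, with_lookup)

for label, secs in (("script only", script_secs), ("with lookup", lookup_secs)):
    per_1000 = secs * 1000 / COUNT
    print(f"{label}: {secs:.3f} s total, {per_1000:.5f} s per 1,000 Individuals")
```

Both loops run the same number of times, so any difference between the two timings comes from the per-Individual access, which is the part that varied with OS and RAM in the results above.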

Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
Mike,
Thanks for all your hard work on this study, as well, of course, in writing the programs and plugins in the first place.
Although I run FH on an iMac with an emulator (Parallels), the main impact seems to be the amount of memory rather than the processor speed. Would that be a fair conclusion? Is it possible to conclude what the overall impact would be if I doubled the ram from 1 to 2 GB?
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
It is difficult to be certain what the gain might be, especially since your iMac/Parallels set-up is so different from the others.
If you are going to go through the hassle of adding RAM why not go for more?
A quick Google search suggests it does not have to be Apple RAM, which looks very expensive, as there are compatibles available.
It also looks as if you might get more gain by upgrading from Win XP to Win 7 or Win 8, which I have seen at about £43.
As a guess, it looks like you might expect the FH API run-time to at least halve, but don't hold me to that.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
tatewise said:
It is difficult to be certain what the gain might be, especially since your iMac/Parallels set-up is so different from the others.
If you are going to go through the hassle of adding RAM why not go for more?
A quick Google search suggests it does not have to be Apple RAM, which looks very expensive, as there are compatibles available.
It also looks as if you might get more gain by upgrading from Win XP to Win 7 or Win 8, which I have seen at about £43.
As a guess, it looks like you might expect the FH API run-time to at least halve, but don't hold me to that.
Mike,
The iMac has 4 GB but Parallels allocates a default 1 GB to Win XP in order not to cripple the Mac OS. However, I can increase the Win XP memory to a max of 1.5 GB, which should make a difference. I will run your test plugin later and post the results.
- gerrynuk
- Megastar
- Posts: 565
- Joined: 25 Apr 2007 09:21
- Family Historian: V6
- Location: Welwyn Garden City
- Contact:
Find Duplicate Individuals (All Relations)
Mike,
Here are the results for my setup (Win XP running under Parallels on an iMac):
2nd run with 1.5 GB (1 GB on first run). Slightly faster on the first test and the same on the second. It seems to be sensitive to background tasks, as the first test was much slower if I ran it immediately on starting FH while it was loading the thumbnails.

- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Yes, any run-time assessments will be sensitive to high CPU usage background tasks, so always wait for thumbnails to load, or temporarily disable that feature.
Increasing the RAM has reduced FH API run-time by over 13%.
Judging by the other assessments I would guess that the FH API run-time could be reduced to about 20 secs if enough RAM can be added, but I appreciate that would need RAM adding to the base iMac PC.
Alternatively, if you could upgrade from XP to Win 7 or Win 8 then that should give roughly a 3 times improvement with the same RAM.
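The percentage and "times improvement" figures here are simple ratios, which can be sketched as below for anyone wanting to apply them to their own timings. This is an illustration in Python only; the function names are invented, and the timings used are hypothetical placeholders, not Gerry's actual results.

```python
def percent_reduction(before_secs: float, after_secs: float) -> float:
    """Percentage by which a run time fell between two runs."""
    if before_secs <= 0:
        raise ValueError("before_secs must be positive")
    return (before_secs - after_secs) / before_secs * 100.0


def projected_time(current_secs: float, speedup: float) -> float:
    """Estimated run time if the run becomes 'speedup' times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return current_secs / speedup


# Hypothetical timings, chosen only to illustrate the arithmetic.
first_run = 60.0
second_run = 52.0

reduction = percent_reduction(first_run, second_run)
estimate = projected_time(first_run, 3.0)

print(f"Reduction between runs: {reduction:.1f}%")
print(f"Estimate after a 3 times improvement: {estimate:.1f} secs")
```

Any such projection is only as good as the assumption that the other machines' results carry over, which is why the estimates above are offered as guesses.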
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Find Duplicate Individuals (All Relations)
Many thanks. This is an excellent piece of work which has helped me find and resolve a few duplicates with the minimum of fuss and effort.
There is though something I don't understand about the scoring....and I'm quite content to be told I've overlooked the obvious, but here goes with my puzzle:
I have two 'Joseph WILEMAN' records; no alternative names, no spelling errors all formatted correctly. Now my reading of the scoring is that for the 'Individual score' I should get 7 points for the matching last names, 6 points for first names, zero points for other names and zero points for soundex (as they are a perfect match). So, a total of 13 by my reckoning. However I get 27 and don't understand why. (I am using the preference standard scores.)
To add to this, it is also my understanding that each element of the scoring is limited to a maximum of 20...yet 27 is shown under the Individual column (and is added into the total).
Can anyone explain this? Any help would be much appreciated.
Find Duplicate Individuals (All Relations)
I should have added in my post a few minutes ago that there are several other matches where similar 'higher scoring' Individual discrepancies appear to occur, and where they are not capped at a maximum Individual score of 20.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Check out the Help & Advice pages for details.
Use Enable Diagnostic Mode to see a breakdown of scores.
Name scores are limited to 20 points per person.
After the Individual is assessed, their relations' Names are also assessed and their scores added.
If more than 6 points are scored, then Event assessment is performed and more points added, etc, etc...
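The order of assessment described above can be sketched roughly as follows. This is a simplified illustration in Python, not the Plugin's actual code: the function and constant names are invented, the relations' Name scores are left out, the event points are hypothetical, and treating the threshold as "more than 6" rather than "6 or more" is an assumption taken from the wording above.

```python
from typing import Iterable

NAME_CAP = 20        # Name scores are limited to 20 points per person
EVENT_THRESHOLD = 6  # Events are only assessed once this many points are exceeded (assumed ">")


def name_score(last: int, first: int, other: int = 0, soundex: int = 0) -> int:
    """Sum the Name components and apply the per-person cap."""
    return min(last + first + other + soundex, NAME_CAP)


def individual_score(name_points: int, event_points: Iterable[int]) -> int:
    """Add Event points to the Name score, but only past the threshold."""
    total = min(name_points, NAME_CAP)
    if total > EVENT_THRESHOLD:
        total += sum(event_points)
    return total


# 7 points for last name and 6 for first names, as in the question above.
names = name_score(last=7, first=6)

# Hypothetical Event points, only to show that the total can pass 20.
with_events = individual_score(names, [8, 4])

# A weak Name match never reaches Event assessment.
weak_match = individual_score(5, [8, 4])

print(names, with_events, weak_match)
```

The point of the sketch is that the 20 point limit applies to the Name part alone, so the figure shown in the Individual column can be higher once Event points are added.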
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Find Duplicate Individuals (All Relations)
Thanks Mike. Your speed of response and its content were very impressive.
I had read the Help & Advice page but had not fully grasped what was being said. On your advice, I used the Enable Diagnostic Mode and all became clear; my 27 points came from my anticipated 13 points Individual Name plus 10 from Individual Birth and 5 from Individual Death. Easy when you know how, isn't it! [wink]
Once again, many thanks and keep up your excellent work.
- tatewise
- Megastar
- Posts: 27087
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Find Duplicate Individuals (All Relations)
Could you suggest how the Help & Advice could be improved to make it clearer.
It is very difficult, on the inside looking out, to appreciate how newcomers to the Plugin see things.
Maybe more emphasis that the Event assessment can follow Name assessment?
Should it mention more often that Enable Diagnostic Mode can be used to see a breakdown of scores?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Find Duplicate Individuals (All Relations)
Hi Mike
On re-reading, your instructions are pretty clear but my sin was not paying enough attention to the paragraph in the Names Assessment page, 'If the score reaches a threshold of 6 points, then the persons key Events are also assessed'. I'm afraid I concentrated on 'Name' and overlooked the remaining assessments!
I agree with both your suggestions; I think they are the key to helping (any?) others like me who fail to understand how things work. With this in mind:
1) Perhaps the addition to the paragraph mentioned above of something along the lines of, 'Any resultant scores from the Events will be added to the name score and shown in the final Results Set.'
2) While the 'Enable Diagnostic Mode' page refers to, 'more sub-points categories are shown' (if used), it might be helpful to include a paragraph in the Results Set page that states that details of the component parts of the scores shown in the Results Set can be seen by running with the Diagnostics Mode enabled (tick box on Find Duplicates at start-up).
I hope this helps. Make of the above what you will; use, modify or discard it, I'm not precious. This is your very elegant baby to nurture and grow and, once again, many thanks for providing it.
