Find Duplicate Individuals ~ Names Assessment

If either person's Name does not exist, then the resulting score is 0 points.

The Plugin matches all Primary Name and Alternate Name fields by comparing name parts both exactly and by Soundex code. For example, Joseph Tom Henry SMITH is a good match with Henry tom JOSEPH-SMITH. All white space and punctuation characters are treated as name part separators, but are otherwise ignored. Any name parts of 1 or 2 characters are ignored (except as explained under Surnames below).
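As a rough illustration of the splitting rules described on this page, here is a minimal Python sketch. It is not the Plugin's own code. The function name split_name, the choice to pass given names and surname as two separate strings, and the choice to apply the short-part filter to given names only are all assumptions made for the example.

```python
import re

# Runs of letters only; every other character acts as a separator.
LETTERS = re.compile(r"[^\W\d_]+")

def split_name(given: str, surname: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split a name into comparable parts."""
    # Given names: split on white space and punctuation, drop parts of
    # 1 or 2 characters, and fold to lowercase.
    given_parts = [p.lower() for p in LETTERS.findall(given) if len(p) > 2]

    # Surname: remove spaces first so VAN DYKE becomes VANDYKE, then split
    # on punctuation so each hyphenated part stands alone, in upper case.
    joined = re.sub(r"\s+", "", surname)
    surname_parts = [p.upper() for p in LETTERS.findall(joined)]
    return given_parts, surname_parts

print(split_name("Joseph Tom Henry", "SMITH"))
print(split_name("Henry tom", "JOSEPH-SMITH"))
print(split_name("Jo A. Mary", "VAN DYKE"))
```

Keeping given names in lowercase and surnames in upper case means a plain string comparison can never match a given name against a surname, which is the behaviour the page describes.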

Surnames are always converted to upper case, so they only match other Surnames. Each part of a punctuation-separated (e.g. hyphenated) Surname is matched separately, e.g. SMITH matches SMITH above and gains 7 points. Surnames with parts separated by spaces are treated as one name, e.g. VAN DYKE matches VANDYKE and gains only 7 points.

Given Names etc. are set to lowercase, so they only match other Given Names etc. For example, Tom matches tom above, gaining 6 points, because both are the 2nd forename; Henry matches Henry above, gaining only 3 points, as their positions are different; but Joseph does NOT match Surname JOSEPH.

Each name part is also converted to a Soundex code such as J210, and these codes are matched too, e.g. Joseph = J210 = JOSEPH, which gains 2 points. A perfect name match does not gain extra points for its Soundex match.
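For readers unfamiliar with Soundex, the sketch below shows how a code such as J210 is derived. It implements the standard American Soundex rules in Python; the page does not say which Soundex variant the Plugin uses, so treat this as an assumption rather than the Plugin's actual routine.

```python
# Digit assigned to each consonant group in standard American Soundex.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}

def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its digit still suppresses a repeat
    for c in letters[1:]:
        if c in "HW":
            continue                   # H and W do not separate equal digits
        digit = CODES.get(c, "")       # vowels and other letters have no digit
        if digit and digit != prev:
            result += digit
        prev = digit
    return (result + "000")[:4]        # pad with zeros, truncate to 4 characters

print(soundex("Joseph"))   # J210
print(soundex("JOSEPH"))   # J210
```

Because the input is upper-cased before coding, Joseph and JOSEPH produce the same code regardless of case.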

Thus the above two names score 7 + 6 + 3 + 2 = 18 points.
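The arithmetic can be reproduced with a small Python sketch. It is only one plausible reading of the rules on this page, not the Plugin's implementation: the function score_names, its parameters, and the order in which parts are paired are assumptions, and the optional deductions are left out. The point values and the 20-point limit are the defaults quoted on this page. Soundex codes are supplied through a lookup so the example stays short.

```python
from typing import Callable

# Default point values quoted on this page.
SURNAME_POINTS = 7
SAME_POSITION_POINTS = 6
OTHER_POSITION_POINTS = 3
SOUNDEX_POINTS = 2
MAX_SCORE = 20

def score_names(given_a: list[str], surnames_a: list[str],
                given_b: list[str], surnames_b: list[str],
                code: Callable[[str], str]) -> int:
    """Hypothetical scorer: exact matches first, Soundex on the leftovers."""
    score = 0
    rest_a: list[str] = []           # parts of A with no exact match
    rest_b = list(surnames_b)        # parts of B with no exact match

    # Exact Surname part matches.
    for part in surnames_a:
        if part in rest_b:
            rest_b.remove(part)
            score += SURNAME_POINTS
        else:
            rest_a.append(part)

    # Exact Given Name matches, worth more when the positions agree.
    free_b = dict(enumerate(given_b))   # position -> part not yet matched
    for pos_a, part in enumerate(given_a):
        hits = [pos_b for pos_b, other in free_b.items() if other == part]
        if not hits:
            rest_a.append(part)
            continue
        pos_b = pos_a if pos_a in hits else hits[0]
        del free_b[pos_b]
        score += SAME_POSITION_POINTS if pos_b == pos_a else OTHER_POSITION_POINTS
    rest_b.extend(free_b.values())

    # Soundex matches among parts that had no exact match.
    codes_b = [code(p) for p in rest_b]
    for part in rest_a:
        c = code(part)
        if c in codes_b:
            codes_b.remove(c)
            score += SOUNDEX_POINTS

    return min(score, MAX_SCORE)        # the score is limited to 20 points

# Precomputed Soundex codes for the example names.
codes = {"joseph": "J210", "tom": "T500", "henry": "H560", "smith": "S530"}
total = score_names(["joseph", "tom", "henry"], ["SMITH"],
                    ["henry", "tom"], ["JOSEPH", "SMITH"],
                    lambda p: codes[p.lower()])
print(total)   # 18
```

Running exact matching before Soundex is what keeps a perfect match from also collecting the 2 Soundex points.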

If the Surnames do not match, then points may be deducted, but this is disabled by default.

If the score fails to reach a minimum value, then points may be deducted, but this is disabled by default.

To avoid overwhelming the results when there are many matches, the score is limited to 20 points.

If the score reaches a threshold of 6 points, the person’s key Events are also assessed, and any extra points are added to their score in the Result Set.

Name assessment is first performed on each pair of Individuals, and if sufficient points are scored, their Father, Mother, Spouse, and Child relatives are also assessed. When there are multiple instances of a relative, they are assessed against each other, and the best score is used. For example, if both Individuals have several Children, then each Child of one Individual is assessed against each Child of the other Individual.

Thus the maximum score for five good Name matches is 5 x 20 = 100 points.
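The "best score among several relatives" rule and the 100-point ceiling can be sketched in Python as follows. The function names, the dictionary layout keyed by role, and the toy_score stand-in for the real name assessment are all hypothetical. Summing the five scores is inferred from the 5 x 20 = 100 figure above.

```python
from typing import Callable, Sequence

ROLES = ("father", "mother", "spouse", "child")

def best_relative_score(relatives_a: Sequence[str], relatives_b: Sequence[str],
                        assess: Callable[[str, str], int]) -> int:
    """Assess every relative of A against every relative of B; keep the best."""
    return max((assess(a, b) for a in relatives_a for b in relatives_b),
               default=0)   # no relatives on either side scores nothing

def total_score(own_score: int,
                relatives_a: dict[str, list[str]],
                relatives_b: dict[str, list[str]],
                assess: Callable[[str, str], int]) -> int:
    """Individual's own name score plus the best score for each relative role."""
    return own_score + sum(
        best_relative_score(relatives_a.get(role, []),
                            relatives_b.get(role, []), assess)
        for role in ROLES
    )

def toy_score(a: str, b: str) -> int:
    """Stand-in for the real name assessment, for demonstration only."""
    if a == b:
        return 20
    return 7 if a.split()[-1] == b.split()[-1] else 0

children_a = ["Ann SMITH", "Bob SMITH"]
children_b = ["Bob SMITH", "Cat JONES"]
print(best_relative_score(children_a, children_b, toy_score))   # 20
```

With a 20-point limit on each of the five assessments (the Individual plus four relative roles), total_score can never exceed 100.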

If a pair of Individuals is found to share an identical relative, then those two Individuals are probably not duplicates, and the pair is excluded. For example, if they share the same parent, then they must be siblings, or at least half-siblings. If they were accidental duplicates, they would soon be spotted in Diagrams.

All the points and other parameters described above are default values that can be adjusted on the Set Preferences Tab ~ Names Matching Tab.



CC Attribution-Noncommercial-Share Alike 4.0 International