* Duplicate Individuals

goodwin2
Famous
Posts: 199
Joined: 24 Aug 2007 21:06
Family Historian: V6.2
Location: Southeastern Pennsylvania, USA

Duplicate Individuals

Post by goodwin2 » 09 Jan 2016 05:59

Hi Mike,

I voted for an automatic check for duplicate names in 2012 and have commented on it since then, obviously suggesting that it would be MOST useful.

I know nothing of the complexities of constructing a program such as FH. However, the feature that flags dates being added as possibly incorrect would seem to be similar to checking for duplicate names. I DO have some folks who lived to be over 100, and when I add the death date, I get a pop-up questioning this. One also pops up if a woman seems to have been past the normal age for giving birth. So why is it so difficult to check for a duplicate name? Always hopeful that this will be added.

Meantime, I'm going to keep checking here to see if and when all the glitches with v.6.1 have been resolved before I bite that one off!

Thanks again Mike.
GSB

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 09 Jan 2016 11:41

To mitigate those Date warnings, try adjusting the values in Tools > Preferences > Estimates.

For example set the Maximum Lifespan to greater than 100.

In defence of FH, checking Dates is very different from checking duplicate Individuals.

To check Dates only a handful of Date fields from closely linked Individual records need to be compared with each other or the Preferences > Estimates.

To check Names requires the whole database of Individuals to be reviewed against the new Individual.
How close must the Name be to count as a duplicate?
How close must the Date of Birth/Death be?
Must any immediate relations also match up, and how closely?
There are a lot of fuzzy logic decisions needed to avoid false positives and yet include all reasonable possibilities.
I know - I wrote the Find Duplicate Individuals Plugin - it is NOT easy.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

goodwin2
Famous
Posts: 199
Joined: 24 Aug 2007 21:06
Family Historian: V6.2
Location: Southeastern Pennsylvania, USA

Re: Duplicate Individuals

Post by goodwin2 » 10 Jan 2016 05:46

Hi Mike,

Thanks for your input on the duplicate names issue. I have your plug-in; in fact, it is running now as a test of current listings. With the size of my database, it will take over two hours to run. I also ran it some time ago, and it does a thorough job.

However, what I really need is this: upon input, check surname, given name and date of birth in whatever order is most useful or easiest. If those match an entry already in the database, I can then check the alphabetical list to confirm whether or not I have a duplicate. This would have been a big help the three times (that I can remember) when I added a generation of folks, many of whom were already listed.

Does this request have any possibility of being fulfilled - perhaps with a simpler plug-in? Excuse my limited understanding here.

Many thanks for all your help.

GSB

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 10 Jan 2016 11:06

We never know if Calico Pie have any plans to add any requested feature, because that is commercially sensitive.

When I started developing the Plugin its matching algorithm was much simpler (and quicker) but users complained that it found too many false positives that they had to wade through, especially those with One-Name Projects.

Plugins cannot be 'plugged-in' to interactive operations, but must be run from the Tools menu.
So if there were a quick match Plugin, you would have to run it after entering every Individual before entering their family.

Your key requirement is: "upon input, check surname, given name and date of birth" but you don't specify the match algorithm rules in any detail, nor when the check should be invoked.
e.g.
Does the match only get invoked when Surname, Given name, and Date of Birth have all been entered (in any order)?
What happens if as part of input you change any one of them at any time in the future?
What if there is no Given name available to enter (there may be a matching Surname and DoB but you won't be told about it).

For a match to be true between Individual A and Individual B:
Surname A = Surname B
1st Given Name A = 1st Given Name B
Date of Birth A = Date of Birth B

Taken literally, any minor spelling variation in the names would fail, and any slight difference in the DoB would fail.
SMITH would NOT match SMYTHE.
Harry would NOT match Harold.
June 1900 or Q2 1900 would NOT match 7 June 1900.

You could write your own Query to perform a simple match such as that, which checks any selected Individual against all others in your Project. It is not too difficult.
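To make the literal rule above concrete, here is a rough sketch in Python of checking one selected Individual against all others. It is not FH's Query syntax and not the attached Query; the `Person` record and its fields are hypothetical stand-ins for an Individual record, with the Date of Birth compared as the exact text entered.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical stand-in for an FH Individual record
    rec_id: int
    surname: str
    given: str  # all given names, space separated
    dob: str    # Date of Birth exactly as entered, e.g. "7 June 1900"


def first_given(p: Person) -> str:
    """Return the 1st Given Name, or "" if none was entered."""
    parts = p.given.split()
    return parts[0] if parts else ""


def literal_match(a: Person, b: Person) -> bool:
    """Literal rule: Surname, 1st Given Name and Date of Birth must be identical."""
    return (a.surname == b.surname
            and first_given(a) == first_given(b)
            and a.dob == b.dob)


def find_literal_duplicates(selected: Person, everyone: list[Person]) -> list[Person]:
    """Compare one selected Individual against every other Individual."""
    return [p for p in everyone
            if p.rec_id != selected.rec_id and literal_match(selected, p)]
```

With two identical "SHERWOOD, Clarissa, 1843" records, each finds the other; but "SMITH, Harry, 7 June 1900" and "SMYTHE, Harold, June 1900" never match, which is exactly the weakness of taking the rule literally.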

goodwin2
Famous
Posts: 199
Joined: 24 Aug 2007 21:06
Family Historian: V6.2
Location: Southeastern Pennsylvania, USA

Re: Duplicate Individuals

Post by goodwin2 » 10 Jan 2016 23:59

Hi Mike,

To clarify - I just did an experiment with FTM v. 2011, which I use to check out the more "iffy" lines before I do any FH input. Here is what I did:

I have a Clarissa Sherwood b. in New York State in 1843; daughter of Samuel, b. 1796.

I added another Clarissa Sherwood as a daughter of Samuel. It accepted her without complaint, which is reasonable since, as we know, many families lost a child and gave the same name to a later child of the same sex.

I then entered 1843 in the birth date field. At that point a message popped up asking "Is this Clarissa Sherwood the same as Clarissa Sherwood, child of Samuel Sherwood and Lucy Thomas?"

That is exactly what I am hoping for. I also added Clarissa Sherwood, b. 1843 in New York State, to another family and got the same "is this the same Clarissa Sherwood" message. HOORAY!

Granted, I do not have nearly as many folks in that line in FTM as I do in my FH database, but it is obviously possible to do this.

I don't know how helpful such a feature would be to other FH users but for me it could be a real time saver. Maybe most FH users have their own methods of avoiding adding duplicates - if so it would be interesting to learn of them.

In the meantime, I'm STILL hoping that Calico Pie adds this feature SOON!

Regards,
GSB

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 11 Jan 2016 11:16

I have split this off as a new thread, because it is straying from the original theme.

Yes, I understand exactly what you want, but cannot say if FH might ever offer that.

However, you did not answer any of my questions about the criteria for a match.

What happens in FTM if you enter the name as Clarisa Sherwood with only one s or any other minor variant of the name?
What happens if you enter the Date of Birth as June 1843 or any other minor variant of the date?

I have attached a very basic Duplicate Individual Query that compares one selected Person against all others by full Name, Sex, and Date of Birth, but it requires a perfect match.
At the prompt, it is easy to choose the most recently entered Person by holding the Alt key while clicking the Record Id or Updated column heading to bring them to top of list.

It could easily be extended to compare Names by SOUNDEX, so that Names that sound alike match, and to let Dates of Birth match if they differ by no more than a defined number of days.
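As an illustration of that extension (a sketch only, not how the attached Query works), the Python below implements the standard American Soundex code and combines it with a day tolerance on full dates of birth. The function names and the 30-day default for `tolerance_days` are assumptions chosen for the example; it also assumes the given name passed in is just the first given name.

```python
from datetime import date

# American Soundex digit groups; vowels, Y, H and W carry no digit
_CODES = {letter: digit
          for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                 "4": "L", "5": "MN", "6": "R"}.items()
          for letter in letters}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "HW":
            # A vowel resets prev, so a repeated digit after a vowel is coded again;
            # H and W are transparent, so a repeat across them is coded once.
            prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def sounds_like_match(surname_a: str, given_a: str, dob_a: date,
                      surname_b: str, given_b: str, dob_b: date,
                      tolerance_days: int = 30) -> bool:
    """Surname and first Given Name sound alike and DoBs are within tolerance."""
    return (soundex(surname_a) == soundex(surname_b)
            and soundex(given_a) == soundex(given_b)
            and abs((dob_a - dob_b).days) <= tolerance_days)
```

Here SMITH and SMYTHE both code to S530, and Clarisa and Clarissa both code to C462, so they now match; Harry (H600) and Harold (H643) still do not, because Soundex knows nothing about nicknames.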

There are other similarly useful Individual Queries in Downloads and Links ~ Query Type: Individual.
Attachments
Duplicate Individual.fhq (680 Bytes)

AdrianBruce
Megastar
Posts: 1962
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Duplicate Individuals

Post by AdrianBruce » 11 Jan 2016 13:01

Without wishing to deny the usefulness of a potential-duplicate detection system, I have to say that, when I compare the suggested "check on input" to my personal way of working - I am underwhelmed by the usefulness and overwhelmed by the complexity of the potential programming.

Let me emphasise that this is in comparison with my normal ways of working. Suppose that I were adding Clarissa Sherwood with a father of Samuel as the result of finding a census with the two of them in. Under all normal circumstances, I would only be doing that if I had already manually determined that Samuel or Clarissa on the new census source were the same as Samuel or Clarissa already in my database. I'd have done this and written up the "proof of identification" in the notes for the source-record before updating or inserting any people in the database. (Yes, I am a Method 1 person. Normally).

Now, there are circumstances where prior identification might not have taken place - I can see a One Name Study, or a One Place Study would work differently. And in my case, trying to batter the Pickstock family of mid-Cheshire into submission is being done by entering all source-records, all the people in them, and then merging where I realise that they are duplicates. So this is a point of potential use, but the only one I can see for me. And I'd probably write a Note-record justifying the Merge first.

But that last example leads me to the issue: What should FH do if you answer "yes" to "Is it a duplicate person?" I don't know at what point in the proceedings FTM decides to update its database. I feel pretty certain that FH updates its internal, in-memory database as soon as you've finished entering a single item for that person. Thus as soon as I tab out of the name of "Clarissa Sherwood", FH updates its database. Until that point FH doesn't have a name to work with. It still doesn't have (say) a date-of-birth. Only when it has both a name and a date-of-birth is it worth doing a duplicate-check. But by this time FH has already updated its database, and if you say, "Yes - it is a duplicate", how do you roll back the changes out of the database and apply them to the original Clarissa? One option is a Merge, but Merges are problematic at the best of times (witness the number of queries about them) and doing one in the middle of data entry doesn't fill me with joy. It may be that FTM has not committed the data to its database at the point of asking the question, so it is easier for FTM to (somehow) change the ongoing process from an insert to an update. But I think FH is going to have a very complex task.

And I wonder also about the disruption to the flow of control when entering from Ancestral Sources if FH starts asking about duplicates part way through data entry. (As I've said, I'm not an AS user, so am unsure if this disruption is real or not).

And all this is besides the point that Clarissa 1 probably has "btw 1843 and 1844" in the database and Clarissa 2 probably has "btw 1844 and 1845" - I'd want those to be regarded as potentially equal but again, the programming complexity is making me shudder.
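One common way to treat such dates as "potentially equal" is to reduce each one to an earliest/latest pair and test whether the two ranges overlap. The Python sketch below assumes the dates have already been converted to ranges by hypothetical helpers (`between_years`, `quarter`, `month`, `exact`); parsing FH's date phrases themselves is not shown, and the optional `slack_days` widening is also an assumption.

```python
import calendar
from datetime import date, timedelta
from typing import NamedTuple


class DateRange(NamedTuple):
    earliest: date
    latest: date


def exact(y: int, m: int, d: int) -> DateRange:
    return DateRange(date(y, m, d), date(y, m, d))


def month(y: int, m: int) -> DateRange:
    # monthrange returns (weekday of the 1st, number of days in the month)
    return DateRange(date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1]))


def quarter(y: int, q: int) -> DateRange:
    first_month = 3 * (q - 1) + 1
    last_month = first_month + 2
    return DateRange(date(y, first_month, 1),
                     date(y, last_month, calendar.monthrange(y, last_month)[1]))


def between_years(y1: int, y2: int) -> DateRange:
    return DateRange(date(y1, 1, 1), date(y2, 12, 31))


def could_be_same(a: DateRange, b: DateRange, slack_days: int = 0) -> bool:
    """True if the ranges overlap, optionally allowing slack_days of leeway."""
    slack = timedelta(days=slack_days)
    return a.earliest <= b.latest + slack and b.earliest <= a.latest + slack
```

On this basis "btw 1843 and 1844" and "btw 1844 and 1845" overlap throughout 1844, and June 1900 or Q2 1900 both contain 7 June 1900, so each pair is flagged for a human to look at. Deciding how much slack to allow is where the fuzzy decisions come back in.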

So for me an occasional post-entry duplicate-check query generally suffices - a quite simplistic check, because I find I'm better at seeing matches than software. (Ever seen the Soundex matches for "Pickstock"???!!!!) Personally, I'd rather Calico Pie expend their effort in other directions, such as a bibliography (which didn't get anywhere in the voting on Wish List Requests. Sigh).
Adrian

PeterR
Megastar
Posts: 1129
Joined: 10 Jul 2006 16:55
Family Historian: V7
Location: Northumberland, UK

Re: Duplicate Individuals

Post by PeterR » 11 Jan 2016 13:26

I think I may have said this before, some years ago, on a similar thread.

I always add a new individual to an existing family by dragging from an existing box on a diagram; that way I can see at a glance that I'm not trying to create a duplicate. As Adrian said, above, possibly less useful for one-name studies.
Peter Richmond (researching Richmond, Bulman, Martin, Driscoll, Baxter, Hall, Dales, Tyrer)

LornaCraig
Megastar
Posts: 2996
Joined: 11 Jan 2005 17:36
Family Historian: V7
Location: Oxfordshire, UK

Re: Duplicate Individuals

Post by LornaCraig » 11 Jan 2016 14:31

Like Peter, I always use diagram-based editing. Having used FH since before the Focus Window was introduced I hardly ever use the Focus Window, finding diagrams much more intuitive. Perhaps this explains why FH does not have a feature to detect possible duplicates when new data is entered: in its original form FH placed the emphasis on diagrams and only introduced the Focus Window later, to provide an input method more familiar to users of other programs.
Lorna

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 11 Jan 2016 16:23

@GSB: What happens after FTM asks "is this the same Clarissa Sherwood?" and you answer "Yes"?

@Adrian: AS operates directly on the GEDCOM file (so is not exclusively tied to FH).
Afterwards FH reloads the entire GEDCOM file, so would not perform any duplicate checks.
Thus AS would have to perform the duplicate checks if so desired.

@Adrian: Merging a pair of records (c.f. Edit > Merge/Compare Records), especially when one has little if any new data, is nowhere near as problematic as File > Merge/Compare File, which confuses many users.

I hope this discussion is making it clearer why duplicate Individual check & correction is in a completely different ballpark from raising Date warnings, which is where this thread started.

AdrianBruce
Megastar
Posts: 1962
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Duplicate Individuals

Post by AdrianBruce » 11 Jan 2016 20:14

Mike - re AS's mode of operation - thanks for clarifying that. The idea of AS duplicating any major validation, such as is suggested, fills me with horror, breaking as it does all my half-remembered "rules" about functionality only being in one place. (All such rules are broken, of course, but efforts to duplicate major functionality seldom succeed).

Re merging difficulties - what I had in mind was the case where people successfully merge two individuals into one and their two fathers into one, but don't merge the two families. Again, it may not be a usual case at the suggested stage, but I'll bet it'll happen.

I agree with you - based on the way FH works now and GEDCOM is structured, doing this check at the input stage is hugely non-trivial (which is not to say that some people wouldn't find it useful). I've got no feel for how easy it was for FTM to implement.
Adrian

goodwin2
Famous
Posts: 199
Joined: 24 Aug 2007 21:06
Family Historian: V6.2
Location: Southeastern Pennsylvania, USA

Re: Duplicate Individuals

Post by goodwin2 » 11 Jan 2016 23:30

Hi all,

Thanks for all the comments re the value of an instant check in FH for a duplicate name.

To answer Mike's questions:

After FTM asks "is this the same Clarissa Sherwood", FTM then pops up a merge screen with options for the merge. Therefore, no duplicate ends up in the file. Of course, one could also go into the file and delete this duplicate. No problem there as you have not added anything other than the name and date of birth.

A different spelling or a more complete date does NOT bring up a question re it being the same Clarissa. So, yes, the matching has limitations.

I do not use AS therefore did not realize that it might be affected by having an instant duplicate check. Sorry there.

For my records, I enter individuals and their data directly into the floating property box, [sources open on the right], with the individual records window behind; don't use the focus window very often and never have diagrams open [50,000+ in the file]. So my method is probably not one that a lot of FH folks use. Having this arrangement allows me to easily see all the sources listed and add text from the source.

I find the merge/compare fairly easy to use though I always go back to the "merged" item especially if it is an individual rather than a source. The individual merges often need tweaking and/or further merging. I do find the choices of specific items to merge [as the merge screen is up] or save somewhat limiting. I often have to switch the two being merged to save a specific item.

Mike, your Duplicate Individual Query works just great and I will remember to use that as I come across a new line of folks or something triggers my slowing memory.

Looks like my original wish for an instant check for duplicate individuals has been fully explored with alternate avenues available that take just a wee bit more time. As you see from the size of my database, I do NOT need duplicates in it!
GSB

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 12 Jan 2016 10:23

@GSB: Thanks for the clarification about how FTM works, but I do not understand why it offers to Merge. The new Name, Gender & DoB are identical to existing entry, so what is there to merge? Surely the new entry just needs deleting?

@GSB: Opening a Diagram for the immediate family being updated (limited by Generations up & down) is unaffected by the size of your database, uses a separate Window, and allows a floating Property Box + Sources. But I do not see how the Sources pane is relevant when entering a brand new Individual. The point is that you can see the family, so anyone with a similar Name & DoB as your proposed entry is staring you in the face.

@Adrian: If the check and merge is performed at this early stage, when only Name, Gender & DoB have been entered, then what can go wrong?

@Adrian: I guess AS would only perform the check when an Individual is being added, which is comparatively rare, and can only check the Name & Gender as that is all that is entered. The simple options are Keep or Delete, because at that stage the GEDCOM data has not yet been Saved.

I suspect that if FH were to mimic FTM and only check exact match for Name, Gender & DoB, then it would be fairly easy to implement, and offer to Keep or Delete the new entry. There would of course need to be a Preference to switch the feature on and off. But would it give a false sense of security, as the slightest variance in the new data would go unnoticed? A better but more complex check would allow for some variance in the data, and that is where the slippery slope begins!
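Part of why an exact-match check is cheap is that it need not rescan the whole database on each entry: an index keyed on Name, Gender & DoB answers "is there already one of these?" in a single lookup. The Python below is a hypothetical sketch of that idea, not how FH or FTM actually work; normalising case and spacing in the key is an assumption, and the `keep_new` callback stands in for a Keep or Delete prompt.

```python
from collections import defaultdict
from collections.abc import Callable

Key = tuple[str, str, str, str]


def _norm(s: str) -> str:
    # Only case and spacing are normalised; spelling must still be identical
    return " ".join(s.split()).casefold()


def make_key(surname: str, given: str, sex: str, dob: str) -> Key:
    return (_norm(surname), _norm(given), sex.strip().upper(), _norm(dob))


class ExactDuplicateIndex:
    def __init__(self) -> None:
        self._by_key: defaultdict[Key, list[int]] = defaultdict(list)

    def matches(self, key: Key) -> list[int]:
        # .get() does not create an empty entry in a defaultdict
        return list(self._by_key.get(key, []))

    def add(self, rec_id: int, key: Key) -> None:
        self._by_key[key].append(rec_id)

    def remove(self, rec_id: int, key: Key) -> None:
        ids = self._by_key.get(key)
        if ids and rec_id in ids:
            ids.remove(rec_id)
            if not ids:
                del self._by_key[key]


def enter_individual(index: ExactDuplicateIndex, rec_id: int, key: Key,
                     keep_new: Callable[[list[int]], bool]) -> bool:
    """Check a new entry; return False if the user chose Delete."""
    existing = index.matches(key)
    if existing and not keep_new(existing):
        return False  # Delete: the new entry is discarded, nothing is indexed
    index.add(rec_id, key)
    return True
```

The same sketch also shows the false sense of security: "Clarisa" or a DoB of "June 1843" produces a different key, so such an entry passes silently.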

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Duplicate Individuals

Post by NickWalker » 12 Jan 2016 10:39

Just to confirm that none of these proposed changes to FH would cause any issues with Ancestral Sources. AS doesn't currently do any duplicate checking and is unlikely to do so.
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

AdrianBruce
Megastar
Posts: 1962
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Duplicate Individuals

Post by AdrianBruce » 12 Jan 2016 11:38

tatewise wrote:... @Adrian: If the check and merge is performed at this early stage, when only Name, Gender & DoB have been entered, then what can go wrong? ...
As we used to say at work, "What can go wrong, go wrong, go wrong....?" - usually terminated by someone shouting "Go 28!" (which will only mean anything to those of you who used the George 2 operating system).

Yes, I think I was working off the contention that this was Clarissa Sherwood born in year Y, daughter of Samuel Sherwood and that all that data had been entered, which would mean that merging might include Clarissa, Samuel and the Sherwood family. It presupposes a specific order of entry, whereas I would tend to finish off one person completely before adding in a parent (or child) - indeed, if it's a census, I'd actually put the census event in before the birth details.

Yes, the flexibility of FH in its current state doesn't lend itself to easy duplicate checking on entry. Which is partly why I look first...
Adrian

goodwin2
Famous
Posts: 199
Joined: 24 Aug 2007 21:06
Family Historian: V6.2
Location: Southeastern Pennsylvania, USA

Re: Duplicate Individuals

Post by goodwin2 » 13 Jan 2016 06:01

I agree that if the answer to "is this the same Clarissa" is yes, then it would make sense just not to allow the entry. I have no idea how that would work. FTM uses the "let's merge your new/duplicate entry" instead, which does leave you with just one entry. FTM would not merge any other individual than the one it had questioned as being a duplicate as far as any experience I have had with the program.

I use the property box for entering new individuals because I can then enter facts, notes and attach media, add family members, have the born, died, buried, married, notes slots available. The source window is open on the right so I see what is already there for that individual, add the text to the source, etc. That's going into the details of the source feature as all FH folks know it - but isn't that the information we are trying to research and record?

We all have our own methods of research and recording I'm sure. I would guess that records available in different countries will differ as well as the information in those records. I tend to work through a whole family if I am fortunate enough to have parents and their children available. Initial input [parents and kids] and then to follow each of the children through their lives or as much of it as I can discover. My research is obviously the US and is down from an ancestor that came here in 1634 and had 14 children. Thankfully New Englanders tended to keep good records!
GSB

tatewise
Megastar
Posts: 27085
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Duplicate Individuals

Post by tatewise » 13 Jan 2016 09:44

Just to reiterate that you can do all that via the Property Box + Sources from a Diagram. It is one of the unique features of FH that the Diagrams are dynamic. Just drag in different directions from any existing box and you can add parents, spouses, and children. Then click on them to open their Property Box and add the details as you have just described. The beauty is that it is almost impossible to add duplicates because everyone is there in the Diagram.