* Using XML vs Gedcom to store data

Requests that have been moved to the Wish List, or deemed to need no further action
rcpettit
Diamond
Posts: 68
Joined: 30 Apr 2015 00:01
Family Historian: V7

Using XML vs Gedcom to store data

Post by rcpettit » 08 Dec 2022 17:28

FH is a very powerful program, but using a GEDCOM format to store data places lots of restrictions on it that other genealogy programs using a database don't have. It would seem a better option to use an XML format for data storage. With this we could have things like shared citations, so you only need to edit one instead of editing every use of that citation. You could select a source, have a tab to see the citations being used, and be given the option to merge them and share that citation. If a citation is used multiple times, you only have to edit one and change them all at the same time. You would still be able to have duplicate citations if there are different notes attached to them. I may be wrong, but it seems it would be an easy thing to change over to XML. FH already has the option to save data to a GEDCOM file for export.

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 08 Dec 2022 17:55

Family Historian already uses a "virtual table" for some aspects - for instance Places. GEDCOM does not have place records; presumably, when FH opens a project, it reads all the place fields into some sort of table and presents them to us in a "one place - many fact fields" format. Then, when writing back to GEDCOM, it "denormalises" the virtual place table and ensures that all identical places have identical place fields.

Switching to XML (or a relational model) would not be necessary to implement citations as a "one source - many citations" (citation-header table) and "one citation - many 'facts'" (citation-line table) model. The "virtual tables" could be created and denormalised in the same way as Places are handled.
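
To make the idea concrete, here is a rough Python sketch of all I mean by a "virtual table" (field names invented for illustration; I am not claiming this is how FH is coded):

    # Illustrative only - invented structures, not a description of FH internals.
    from collections import defaultdict

    def build_virtual_place_table(facts):
        """Group facts by their free-text place field: one place -> many fact fields."""
        table = defaultdict(list)
        for fact in facts:
            table[fact["place"]].append(fact)
        return table

    def rename_place(table, old_text, new_text):
        """Edit the 'virtual record' once; denormalising pushes the new text to every fact."""
        for fact in table.pop(old_text, []):
            fact["place"] = new_text
            table[new_text].append(fact)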

I guess the question is "is it worth Calico-Pie's time?"

There are two areas where I think there is mileage in creating virtual records:
  1. Citations
  2. Addresses, which need to be related to places. Having an address of "St Mary's", which could be a church, a sports stadium, or a locality in numerous different places, seems to me a nonsense and inhibits geocoding at address level - which makes more sense anyway, because an address is closer to a "map pin" point than the area implied by a place.
The question is: where does one stop, and how close to GEDCOM expectations does Calico Pie want to stay so that migrating users feel some level of familiarity?
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

rcpettit
Diamond
Posts: 68
Joined: 30 Apr 2015 00:01
Family Historian: V7

Re: Using XML vs Gedcom to store data

Post by rcpettit » 08 Dec 2022 18:06

I switched to FH because they seem to be the only software developer who makes regular updates and enhancements. Many of the others go years between updates. The only fault I have with FH, as a lumper, is its citation handling. Even with this, it is still one of the best programs out there. Their programmers rock.

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 08 Dec 2022 18:08

David, FH does not have any virtual table of Place details. They are real GEDCOM Place records and saved in the GEDCOM file as GEDCOM records with the tag _PLAC. It sounds like you have not inspected an FH GEDCOM file closely enough.
The same argument goes for Source Template records and Research Note records. They are all real GEDCOM records.
However, none of them exists in the GEDCOM standard specification, so migrating them to other products poses a challenge.

IMO FH would not need an XML database to support shared editing of multiple identical Citations. It would just be a new GUI feature.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4850
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 08 Dec 2022 18:17

Mike,

David is talking about the internal data architecture used by FH while it is running, not the data storage architecture (which is a GEDCOM file). I think it highly likely that a temporary table of Place details exists (just as it does in your plugins).

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 08 Dec 2022 18:30

Helen, please read how David describes the process of reading/writing Place data, which clearly suggests that he believes shared Place records are not saved in the GEDCOM file. He does talk about reading/writing the external storage architecture,
e.g. "all identical places have identical place fields", i.e. the shared Place data is replicated against each Place field.

The FH table of Place details is a one-to-one mapping of GEDCOM records.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 08 Dec 2022 18:38

tatewise wrote:
08 Dec 2022 18:08
David, FH does not have any virtual table of Place details. They are real GEDCOM Place records and saved in the GEDCOM file as GEDCOM records with the tag _PLAC. It sounds like you have not inspected an FH GEDCOM file closely enough.
...
However, none of them exists in the GEDCOM standard specification, so migrating them to other products poses a challenge.
So there are! It was just that in browsing GEDCOMs I had never seen any links to @Pn@ type records (I expected 2 PLAC @Pn@ but found none, just text against PLAC). But searching for "0 @P" finds them!

So presumably at some stage (every file opening, in case another program has "been at the file"?) FH does read all the PLAC fields and compare them to the "Custom" records, then writes them to _PLAC records (which makes holding geocoding etc. easier). Indexing is done strictly on the text string in the PLAC field?

Migrating them presumably means denormalising to enable the elimination of the Custom records - Map fields becoming sub-fields of each applicable PLAC field, and presumably everything else going into PLAC.NOTEs?
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

Mark1834
Megastar
Posts: 2145
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 08 Dec 2022 18:40

rcpettit wrote:
08 Dec 2022 18:06
I switched to FH because they seem to be the only software developer who makes regular updates and enhancements.
RM is updated more frequently than FH, but probably has a higher percentage of bug fixes over true enhancements.
Mark Draper

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 08 Dec 2022 18:58

David, the linkage between Place (PLAC) fields and Place (_PLAC) records is different to normal GEDCOM XREF links.
The association is purely by the Place name text, which is why all Place record names must be unique (the unique key).
Also, since Place fields retain the name text, they export quite well to other products, but Place records get discarded, unless the Export Gedcom File plugin is used to make the necessary adjustments where feasible.
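
As a rough illustration only (made-up Python structures, not FH internals), the association behaves like a lookup keyed on the Place name text:

    # Sketch only: Place (_PLAC) records keyed by their unique Place name text.
    place_records = {
        "Torquay, Devon, England": {"geocode": (50.46, -3.53), "media": [], "note": ""},
    }

    def place_record_for(plac_field_text):
        """A Place (PLAC) field finds its Place record purely by exact name-text match."""
        return place_records.get(plac_field_text)   # None if no record with that name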
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

AdrianBruce
Megastar
Posts: 1961
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Using XML vs Gedcom to store data

Post by AdrianBruce » 08 Dec 2022 21:54

The reality is that GEDCOM and XML are just methods of plumbing. Meh.

Calico Pie couldn't cut loose from GEDCOM and exploit XML as a new, green field, plumbing mechanism because they'd still need to be able to read GEDCOM files and store the data in the new XML format, and vice versa. Hence one couldn't generate new entity types to be stored in the XML format without also creating the means of storing the same entities in the GEDCOM. Two-way compatibility requirements therefore constrain what could be done in any new format.

I might also add that I lost a lot of faith in XML and / or XML advocates when I realised that many people were advocating using JSON as a storage mechanism, using basically the same arguments for JSON that had been used for XML. It seemed that, if nothing else, XML's time in the sun was slipping away in favour of JSON.
Adrian

rcpettit
Diamond
Posts: 68
Joined: 30 Apr 2015 00:01
Family Historian: V7

Re: Using XML vs Gedcom to store data

Post by rcpettit » 08 Dec 2022 23:37

Yes Mark, RM is making a lot of bug fixes. I have RM8 to read some old databases. RM8 right now seems more interested in getting it to work with Ancestry and FamilySearch than in the actual program itself. At least FH's programmers listen to their users; RM seems to have a "leave us alone, we'll let you know when we fix it and not before" mentality. There have been many feature requests for RM over the years that have never been added even though they are wanted by their users. Right now the RM8 workflow sucks - lots of jumping in and out of dialogs to enter data that most genealogy programs allow in one screen. It seems RM8 has a long way to go to catch up with FH.

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 10:11

Anyway, enough of this banter. The Subject line is 'Using XML vs GEDCOM to store data' but most of the OP talks about managing shared/duplicate Citations. There is no need to replace GEDCOM with XML to allow FH to manage multiple instances of identical data as if they were one entity.

To prove my point, FH already has those features. For example, Address fields can be dotted all over the database in Individual facts, Family facts, Repository records, etc., but identical replicated Address values can be managed as if they were one single entity.
Where is this magic I hear you cry? It is in Tools > Work with Data > Addresses... and similar capabilities apply to Occupations, Generic Source Types, Fact/Witness Sentences, etc.

So it does not take a genius to see that Tools > Work with Data > Citations... would be perfectly feasible while retaining a GEDCOM database. Maybe Tools > Work with Data is not ideal and an option in the floating Citation Window would be better. That window already shows how many Citations are associated with the Source Record, and offers the Show Source Record's Citations menu option, so adding a Work with Citations option would be useful.

IMO drop the XML aspects of this request and focus on a Work with Citations feature.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4850
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 10:15

I agree -- it's tilting at the wrong target, plus attempting to specify solutions rather than requirements.

Mark1834
Megastar
Posts: 2145
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 09 Dec 2022 11:06

IMO, the requirement is exactly the missing feature that I've been banging on about ever since I migrated from FTM to FH over five years ago. Manage duplicated citations as one entity, so change one and you change them all. FTM has had it forever, RM8 has now joined the party, and FH is conspicuous among the "big three" desktop apps for not implementing it.

FHUG has moved on as more new users who are used to a "lumping" model sign up from other apps, so the discussion is now more about "what do we think that should look like?" rather than the "why do we need that?" that was common from FH diehards in the past. I agree with Mike that Tools > Work With Data is better than nothing, but it is very much an imperfect sticking plaster.

Agree that it is probably harder to implement in GEDCOM than in a relational database (where the shared citation is a single entity), but that's not our issue to solve. Should the discussion focus on how to frame a suitable entry for the Wish List?
Mark Draper

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 11:21

The management of shared/duplicate Citations is discussed in Viewing and editing citations (19829), which was initiated by the same OP.

IMO this thread should be closed and the other thread brought to a conclusion on the wording of a Wish List item.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4850
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 11:24

Mark1834 wrote:
09 Dec 2022 11:06
Should the discussion focus on how to frame a suitable entry for the Wish List?
Yes, but possibly under a different topic heading?

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 11:42

A big issue with citations is that they don't have names or an identifying ID, so recognising which citations belong together is very difficult - possibly a greater problem than the duplication issue when it comes to 'lumping'. If you have one '1881 England Census' source with hundreds of thousands of citations, how do you recognise which 30 of those are for the same household? So any mechanism that tries to intelligently lump those citations together would need to introduce a citation ID of some kind, and ideally a citation title too.
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

tatewise
Megastar
Posts: 27074
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 12:38

But Nick it is nothing to do with where the Citations are used any more than the Tools > Work with Data options.
It is simply a matter of 'grouping' identical Citations together as if they were one entity.
That can be done by simply comparing all the Citation subfields including the link to the Source record.
If the 'lumped' Citations have been created consistently then they will get 'grouped' together consistently.
Those subfields could be shown in Columns rather like Tools > Work with Data > Places has Columns for fields.
Similarly, a Where Used... button could create a Result Set list of where a 'group' of Citations is used.
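
In rough pseudo-Python terms (subfield names invented purely for illustration, not an FH design), the 'grouping' is no more than building a key from those subfields:

    # Sketch only: 'group' Citations whose subfields (including the Source link) all match.
    from collections import defaultdict

    def group_identical_citations(citations):
        """Each citation is a dict of its subfields plus a 'used_by' reference to its fact."""
        groups = defaultdict(list)
        for cit in citations:
            key = (cit["source_id"], cit["where_within"],
                   cit["text_from_source"], cit["note"], cit["assessment"])
            groups[key].append(cit)
        return groups                  # each value could then be edited as one entity

    def where_used(group):
        """The Result Set behind a 'Where Used...' button."""
        return [cit["used_by"] for cit in group]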
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 09 Dec 2022 13:23

tatewise wrote:
09 Dec 2022 12:38
That can be done by simply comparing all the Citation subfields including the link to the Source record.
If the 'lumped' Citations have been created consistently then they will get 'grouped' together consistently.
I think it is more complex than that - certainly if the "Where Used" field and the Source are the same you can argue that the "Citation" is the same, but the notes, which may include argument about why the citation has been applied in a particular way to a particular person, may differ. Arguably the specific text from source might also differ.

If doing a census citation as a lumper (I think this issue is barely noticeable for splitters), my source will be, say, "1881 Census for England and Wales"; my "where used" might be the census reference down to page level (but not line level), and I would have to enter this consistently to get a "common citation" - the copy and paste icons in the yellow "source" pane (V6) do this for me. I would have common text from source - the household transcription - held either in the citation "Text from Source" or in a shared note (currently attached via the All tab, which is messy), but my citation notes I might like to vary.

For lumpers, if we try to think "relationally" (in database terms, not genealogically), we have a source (the 1881 England and Wales Census) which we are citing to support a whole lot of facts (CENS and OCCU directly, and BIRT, FAMx, etc. indirectly) about a number of people.

So we have a Citation-Header record with a Foreign Key link to the Source and a "Where Used" attribute - which, in the absence of a Citation-Header key field, becomes (together with the Source ID) the "Citation Key". You might hold a transcription of the part of the source being cited here ("Text from Source") - I would for a census record - the citation in effect being a snapshot "household record". Think of the Citation-Header as a "micro-source"?

We then have Citation-Line records which in effect hold the Citation Key and (ideally) a Foreign Key which points to a specific Fact (one Citation Line to a fact). Each citation line might have attributes highlighting the relevance of the citation to the fact. The Assessment attribute probably lives here rather than in the Citation-Header. (How might I record doubt that a "son" of the Head of Household might actually be a "step-son"?). It is also possible that in some complex citations (military history?) you might want to hold "Text from Citation" (i.e. Fact-specific text from source). This "Citation-Line table" is also the source of a "where used" for a Citation.

In practice (leaving aside thoughts of RDBMSs etc.), I very much doubt that CP will introduce facts with their own unique keys (although that would open a useful, nutritious can of worms!). But I think, as Nick suggests, we would need a "Citation ID". The way I work, that could be the Source ID plus the "Where Used" - which is conveniently found on the Fact branch and would form the key to the Citation-Header. In physical practice the Citation-Line detail would be held as a branch of the fact (much as now). If we had to distinguish between "Text from Source" and "Text from Citation", the former might (physically) have to be shunted into a shared note hanging off the citation.
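
For what it is worth, here is the shape of what I mean as a rough Python sketch (all names invented; no claim that this is how CP would implement it):

    # Rough sketch of the Citation-Header / Citation-Line split described above.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class CitationHeader:
        source_id: str                 # foreign key to the Source record
        where_within: str              # e.g. census reference down to page level
        text_from_source: str = ""     # e.g. the household transcription ("micro-source")
        # (source_id, where_within) stands in for the "Citation Key", lacking a real ID

    @dataclass
    class CitationLine:
        header: CitationHeader            # the shared part
        fact_ref: str                     # which fact this line supports
        assessment: Optional[str] = None  # per fact, so it lives here, not on the header
        note: str = ""                    # fact-specific argument / "Text from Citation"

Edit the header once and every line pointing at it picks up the change; anything fact-specific stays on the line.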
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

AdrianBruce
Megastar
Posts: 1961
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Using XML vs Gedcom to store data

Post by AdrianBruce » 09 Dec 2022 14:33

I have to say that I'm worried about what a Citation is - i.e. how to identify it and therefore how to detect the "duplicates".

I have this suspicion that Method 1 Splitters have no real interest in detecting duplicates. Take this typical Source / Citation from me:
[attachment: Screenshot 2022-12-09 134546.jpg]
There are 6 citations against this Source Record (according to the current software). But (without checking) I suspect that there will only be two sets of data in the Citation Specific Details - one assessed as Primary info (this one) and another assessed as Secondary. On the other hand, there could be a third where I've got something in the Text From Source (which I always restrict to text relating to the specific item being supported by this citation), a 4th with something else in the Text From Source, etc., etc.

Why would I want to update all 6? Or a sub-group of more than 1, given that potentially they are all different? Well, I might want to alter the line number but arguably that should have been in the Source Record anyway. But why would I alter the line number, or the date?

So this is me suspecting that Method 1 Splitters have no real interest in detecting duplicate citations. As David suggests above.

When it comes to Method 2 Lumpers, it's probably a different kettle of fish... Like many Splitters, I actually lump indexes, Wikipedia, Directories, etc. This is a citation for my great-aunt's birth index entry from the GRO site:
[attachment: Screenshot 2022-12-09 134718.jpg]
That single index entry could support (wild guess) 5 different "facts". Would I ever want to update all 5 citations? Potentially - I might realise "Wrong Constance..." and want to delete all 5. Or I might want to reformat the "Text from Source" in the Citation to put the text into a table. I'd like to do that once please, if I did. But what about the Note in the Citation? It's possible that a couple of the 5 citations have different Notes, calling attention to different parts of the Text from Source maybe... So is that 1 citation quintuplicated? (Text from Source argument says yes please) Or 3 different citations, each with different Notes? And how do I - or, more to the point, the software - gather those 5 (or 3 or 1) together, because I'm really not sure what the key(s) is to do that?

That last challenge is surely the key - pun not intended. How do I identify those 5, that are on (say) 3 different individuals? I suspect through the Text from Source... So hard luck if you wrote different Text from Source against each of the 5 "events" being supported.

I'm not saying this is a bad idea - far from it - I'm trying to concentrate my own thoughts first because if Calico Pie did something with this, there's every danger that the first Method 2 Lumper to use the software will complain "That's not a duplicate citation.." Or "This is a Duplicate Citation and you haven't picked it up - OK it has a different Text from Source but look at the Where Within...!"
Adrian

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 09 Dec 2022 15:09

AdrianBruce wrote:
09 Dec 2022 14:33
I'm not saying this is a bad idea - far from it - I'm trying to concentrate my own thoughts first because if Calico Pie did something with this, there's every danger that the first Method 2 Lumper to use the software will complain "That's not a duplicate citation.." Or "This is a Duplicate Citation and you haven't picked it up - OK it has a different Text from Source but look at the Where Within...!"
That is one reason why I tried to distinguish between:
  • "Text from Source" (which in a census might be a household transcription - in effect creating a specific highly relevant micro-source for that household at that census) - held at the shared citation level and
  • "Text from citation" (for instance if that census records a widow's "occupation" as "widow of Infantry Officer" - I might want to use the whole citation to support that man's occupation - but the "text" I would want to actually cite would be the fact that his widow described herself as "widow of Infantry Officer") so I would want to hold that on the "Citation-line".
In both cases the source would be, say, the 1901 Census and "Where within" would be the same reference to the same page.

We may think up a new way to input using shared citations, but we probably also need to get our minds around how best to fit that into the GEDCOM straitjacket.

We can normalise data in numerous ways, but in terms of "shared citations" we have to identify which bits are shared (and so in effect go into the header record of the "normalised" entity) and which bits aren't (and so go into line or item records for the citation concerned).

Doing that creates "stuff" that is then very difficult to de-normalise to fit the way GEDCOM holds citations (as data on sub-branches of facts with links to source records). The problematic stuff is the stuff that could go in the header but could also go into the line/item. "Text from Source" is one such example, but so are the assessment and the citation note (often used to hold the "evidence justification" of the specific fact).

De-normalising to "have both" is difficult; I have tried to hold "Text from Source" as opposed to "Text from Citation" in a shared note, so that the Text from Source field hanging from a fact holds the "Text from Citation" supporting the specific fact - that is messy, but that is primarily due to (i) the current interface, and (ii) concerns that shared notes may get commandeered and edited by mistake.

Do we have to have custom tags? If so, that makes exporting problematic - are we prepared to live with that? Or, in exporting, do we gang up the shared stuff - for example concatenating the "Text from Source" held in a shared note onto the "Text from Citation" already in the Text from Source field for each fact - and similarly with other attributes? It could be messy.
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

Mark1834
Megastar
Posts: 2145
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 09 Dec 2022 15:12

tatewise wrote:
09 Dec 2022 12:38
It is simply a matter of 'grouping' identical Citations together as if they were one entity.
That can be done by simply comparing all the Citation subfields including the link to the Source record.
I agree with David that that is too simplistic (and incorrect). I have pointed out before that you have to distinguish between the content of the citation and its interpretation. For example, the same GRO Death Index entry could support both death (Primary Information) and birth (Secondary Information) facts. The actual reference is the duplicated citation (year, volume, page, entry text, etc.), but how it is used is distinct for each fact.

I also agree with both David and Adrian that this is really just a concept for lumpers. Splitters include all these common details in the actual source, so there is only one copy anyway.
Mark Draper

LornaCraig
Megastar
Posts: 2989
Joined: 11 Jan 2005 17:36
Family Historian: V7
Location: Oxfordshire, UK

Re: Using XML vs Gedcom to store data

Post by LornaCraig » 09 Dec 2022 15:38

AdrianBruce wrote:
09 Dec 2022 14:33
I have to say that I'm worried about what a Citation is - i.e. how to identify it and therefore how to detect the "duplicates".

At present FH only regards two citations as identical if absolutely all the fields in the citation are identical. For example, in Report Options > Sources there is an option to Combine Identical Citations for Same Source. They are only combined if all fields are identical (unless the non-identical fields are not to be included in the footnote). I use only generic sources but assume this is also true of templated sources.

To me (as a splitter) that makes sense. But how do other programs decide what to treat as identical citations?
Lorna

Mark1834
Megastar
Posts: 2145
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 09 Dec 2022 15:58

LornaCraig wrote:
09 Dec 2022 15:38
But how do other programs decide what to treat as identical citations?
AFAIK, it is at the point of creation, either by copy/paste or by reusing a shared citation. For example, RM8 keeps the citation content in the shared citation but allows a separate assessment for each application. At the moment the FH copy/paste citation copies all the fields, so you have to remember to change the assessment if necessary. That doesn't matter today, because the original and copied citations are always completely independent of each other, but a shared-citation feature would need to separate content from application (as RM and FTM do).
Mark Draper

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:04

tatewise wrote:
09 Dec 2022 12:38
But Nick it is nothing to do with where the Citations are used any more than the Tools > Work with Data options.
It is simply a matter of 'grouping' identical Citations together as if they were one entity.
That can be done by simply comparing all the Citation subfields including the link to the Source record.
If the 'lumped' Citations have been created consistently then they will get 'grouped' together consistently.
Those subfields could be shown in Columns rather like Tools > Work with Data > Places has Columns for fields.
Similarly, a Where Used... button could create a Result Set list of where a 'group' of Citations is used.
Others have explained why this is too simplistic, but you're missing the point I was making, which was a different one, not related to finding duplicates. Once you've 'grouped identical citations together', how do you find them? There isn't a simple ID we can use to find a 'grouped citation', and there isn't a title we can use either. It may be that the 'Where used' reference is filled in, but I have no idea what the reference is for my ancestor William Walker living in Latchford in 1841. With splitting I can easily find this because it has a title, and I can use the Sources tab to sort by title and/or use filtering. The best we could do currently is perhaps have a list of all the reference IDs, but that would not be helpful!

You were suggesting we might need to formulate something for the wish list; the point I was making is that if citations were going to be dealt with by 'grouping', we would need some kind of ID and possibly a title.

However, I do really wonder why we need this. We'd just be turning citations into something very similar to splitter sources. Splitter sources work really well, so why not use them :)
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/
