* Using XML vs Gedcom to store data

Requests that have been moved to the Wish List, or deemed to need no further action
ADC65
Superstar
Posts: 376
Joined: 09 Jul 2007 10:27
Family Historian: V7

Re: Using XML vs Gedcom to store data

Post by ADC65 » 09 Dec 2022 16:11

NickWalker wrote:
09 Dec 2022 16:04
However I do really wonder why we need this. We'd just be turning citations into something very similar to splitter sources.
I agree entirely. This seems like too much navel-gazing for something that already has a solution.

It's a bit of a faff if I realise I have made a spelling mistake in (say) a lumped probate index entry and have to go and change the four or five Where Within fields I have used to cite the Source, but it happens rarely and doesn't, in my view, need any changes to the software.
Adrian Cook
Researching Cook, Summers, Phipps and Bradford, mainly in Wales and the South West of England

Mark1834
Megastar
Posts: 2147
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 09 Dec 2022 16:26

Personally, I find that point of view rather arrogant - "FH works best with a splitter model, so forget all you learned with your old package and do it the way we always do". If you don't think FH should at least try to embrace other ways of working, just don't vote for any Wish List item that comes out of this, rather than trying to strangle it at birth.

The Wish List identifies issues that users want, and CP decide whether they are worth doing. I suggest we let that process take its course...
Mark Draper

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 16:27

satyricon wrote:
09 Dec 2022 16:11
NickWalker wrote:
09 Dec 2022 16:04
However I do really wonder why we need this. We'd just be turning citations into something very similar to splitter sources.
I agree entirely. This seems like too much navel-gazing for something that already has a solution.
The OP seemed to think there was a valid use case -- I'd be interested to hear how their needs aren't currently being met.

If somebody already has a large number of 'lumper' sources, saying 'splitter' sources makes this unnecessary isn't very helpful.

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:35

ColeValleyGirl wrote:
09 Dec 2022 16:27
satyricon wrote:
09 Dec 2022 16:11
NickWalker wrote:
09 Dec 2022 16:04
However I do really wonder why we need this. We'd just be turning citations into something very similar to splitter sources.
I agree entirely. This seems like too much navel-gazing for something that already has a solution.
The OP seemed to think there was a valid use case -- I'd be interested to hear how their needs aren't currently being met.

If somebody already has a large number of 'lumper' sources, saying 'splitter' sources makes this unnecessary isn't very helpful.
When I said "I do wonder why we need this", I was talking about the proposed change of grouping all the citations together, and my point was that this would only work successfully with an id and then a title. This proposal effectively turns citations into splitter sources.

If FH has this suggested facility for grouping together citations then why not just turn them into sources? If someone has a large number of lumper sources couldn't they be converted into splitter sources by FH?
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

tatewise
Megastar
Posts: 27088
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 16:36

Why is this discussion continuing in a topic about using XML to store data?
It should be in the existing posting Viewing and editing citations (19829) I mentioned earlier.

Yes, it only applies to lumpers and not splitters (but lots of FH features likewise apply only to a subset of users).

Having found all the Citations and grouped them according to whatever 'duplication' criteria are appropriate, FH knows where they are all used because FH just found them in order to group them. IMO it is no different than knowing where multiple identical Address values or Occupation values are used.

If some groups of Citations differ by a field or two, then sorting them (in much the same way as Places and Addresses can be sorted on Parts in different ways) will put similar groups adjacent, or at least close, to each other, so the same changes can be made to such groups much more easily than by editing all the Citations one by one.
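
The grouping and sorting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Citation` tuple, its field names, and the sample data are all made up for the example and say nothing about how FH stores or finds citations internally.

```python
from collections import defaultdict
from typing import NamedTuple


class Citation(NamedTuple):
    """Hypothetical flattened view of one citation (not an FH data structure)."""
    used_on: str        # the fact the citation is attached to
    source: str         # title of the (lumped) Source record
    where_within: str   # the 'Where within Source' field
    assessment: str = ""


def group_citations(citations, key_fields=("source", "where_within", "assessment")):
    """Group citations whose chosen fields match, remembering where each is used."""
    groups = defaultdict(list)
    for c in citations:
        key = tuple(getattr(c, f) for f in key_fields)
        groups[key].append(c.used_on)
    # Sorting by key puts groups that differ only in a later field side by side.
    return sorted(groups.items())


# Invented sample data.
citations = [
    Citation("Person A: Death", "Probate Index", "1901, p. 12"),
    Citation("Person A: Probate", "Probate Index", "1901, p. 12"),
    Citation("Person B: Death", "Probate Index", "1901, p. 21"),
    Citation("Person C: Birth", "GRO Birth Index", "1880 Q1"),
]

groups = group_citations(citations)
for key, used_on in groups:
    print(len(used_on), key, used_on)
```

Because the places where each group is used are collected while the groups are being built, no separate citation Id is needed for the "where used" lookup in this sketch; the choice of `key_fields` is the 'duplication' criterion.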

I agree with the point that if a user already has a great many lumped Citations, perhaps imported from a lumping product, then saying that splitting is better is not much help.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:40

tatewise wrote:
09 Dec 2022 16:36
Having found all the Citations and grouped them according to whatever 'duplication' criteria are appropriate, FH knows where they are all used because FH just found them in order to group them. IMO it is no different than knowing where multiple identical Address values or Occupation values are used.
An address has an Address that can be listed and this is helpful because you are looking for an address. An occupation has an occupation which can be listed, etc. A citation doesn't have the equivalent 'title' or 'attribute' other than 'where in source' which is not particularly helpful unlike address and occupation.
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

Mark1834
Megastar
Posts: 2147
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using XML vs Gedcom to store data

Post by Mark1834 » 09 Dec 2022 16:41

NickWalker wrote:
09 Dec 2022 16:35
If someone has a large number of lumper sources couldn't they be converted into splitter sources by FH?
That's exactly what the Lumped Source Splitter plugin does, but it's another plugin sticking plaster to cover a limitation in FH design. It doesn't mean that limitation should not be addressed (the settings backup plugins are probably similar examples).
Mark Draper

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 16:42

NickWalker wrote:
09 Dec 2022 16:35
If FH has this suggested facility for grouping together citations then why not just turn them into sources? If someone has a large number of lumper sources couldn't they be converted into splitter sources by FH?
If somebody wants to use lumper sources, why should they have to convert them?

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:47

I have no problem at all with people using lumper sources, I use them myself for some source types. I didn't say anyone had to convert to them?
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

tatewise
Megastar
Posts: 27088
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 16:49

NickWalker wrote:
09 Dec 2022 16:40
tatewise wrote:
09 Dec 2022 16:36
Having found all the Citations and grouped them according to whatever 'duplication' criteria are appropriate, FH knows where they are all used because FH just found them in order to group them. IMO it is no different than knowing where multiple identical Address values or Occupation values are used.
An address has an Address that can be listed and this is helpful because you are looking for an address. An occupation has an occupation which can be listed, etc. A citation doesn't have the equivalent 'title' or 'attribute' other than 'where in source' which is not particularly helpful unlike address and occupation.
You were saying Citations need an Id in order to know where they are used. My point is they don't. FH knows where the Citations are used, otherwise the View > Citations to Source Record... and Show Source Record's Citations in Results Window would not work. How does FH know where all Address or Occupation values are used so they can be shown in a Result Set when they don't have an Id?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 16:50

NickWalker wrote:
09 Dec 2022 16:47
I have no problem at all with people using lumper sources, I use them myself for some source types. I didn't say anyone had to convert to them?
I must have misread this:
NickWalker wrote:
09 Dec 2022 16:35
If FH has this suggested facility for grouping together citations then why not just turn them into sources? If someone has a large number of lumper sources couldn't they be converted into splitter sources by FH?

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:50

tatewise wrote:
09 Dec 2022 16:49
NickWalker wrote:
09 Dec 2022 16:40
tatewise wrote:
09 Dec 2022 16:36
Having found all the Citations and grouped them according to whatever 'duplication' criteria are appropriate, FH knows where they are all used because FH just found them in order to group them. IMO it is no different than knowing where multiple identical Address values or Occupation values are used.
An address has an Address that can be listed and this is helpful because you are looking for an address. An occupation has an occupation which can be listed, etc. A citation doesn't have the equivalent 'title' or 'attribute' other than 'where in source' which is not particularly helpful unlike address and occupation.
You were saying Citations need an Id in order to know where they are used. My point is they don't. FH knows where the Citations are used, otherwise the View > Citations to Source Record... and Show Source Record's Citations in Results Window would not work. How does FH know where all Address or Occupation values are used so they can be shown in a Result Set when they don't have an Id?
I'm not talking about FH. OBVIOUSLY FH can identify them. I'm talking about how they would be presented to the user on the screen in a list.
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:52

ColeValleyGirl wrote:
09 Dec 2022 16:50
NickWalker wrote:
09 Dec 2022 16:47
I have no problem at all with people using lumper sources, I use them myself for some source types. I didn't say anyone had to convert to them?
I must have misread this:
NickWalker wrote:
09 Dec 2022 16:35
If FH has this suggested facility for grouping together citations then why not just turn them into sources? If someone has a large number of lumper sources couldn't they be converted into splitter sources by FH?
Yes, you did misread it. I am suggesting the facility that Mike is suggesting could be used to convert citations into sources. I'm not suggesting anyone would have to use this unless they wanted to! :?
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 16:53

Nick,

so that would still leave a potential unmet requirement for people who didn't want to convert their lumper sources?

H

NickWalker
Megastar
Posts: 2401
Joined: 02 Jan 2004 17:39
Family Historian: V7
Location: Lancashire, UK

Re: Using XML vs Gedcom to store data

Post by NickWalker » 09 Dec 2022 16:56

I was just trying to make some helpful observations but I think I will leave this now
Nick Walker
Ancestral Sources Developer

https://fhug.org.uk/kb/kb-article/ancestral-sources/

tatewise
Megastar
Posts: 27088
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 16:57

NickWalker wrote:
09 Dec 2022 16:50
I'm not talking about FH. OBVIOUSLY FH can identify them. I'm talking about how they would be presented to the user on the screen in a list.
They would be presented in much the same way as Tools > Work with Data > Places... and Addresses... presents the multiple fields of data with a column for Source Record, Assessment, Entry Date, Where Used, Text From Source, Note, Media, etc, etc, and a Used count column. Then the Where Used... button would list all the instances of one group.
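
A listing of that kind, one row per group of identical citations with a Used count at the end, might look like the sketch below. The column choice, the sample rows, and the `citation_table` helper are assumptions for illustration only; nothing here reflects the actual Work with Data dialogs.

```python
from collections import Counter

# Hypothetical citation rows: (source record, where within source, assessment).
rows = [
    ("Probate Index", "1901, p. 12", "Primary"),
    ("Probate Index", "1901, p. 12", "Primary"),
    ("Probate Index", "1901, p. 21", "Primary"),
    ("GRO Birth Index", "1880 Q1", "Secondary"),
]


def citation_table(rows):
    """Return text lines: a header, then one line per distinct citation with its count."""
    counts = Counter(rows)
    header = ("Source Record", "Where within Source", "Assessment", "Used")
    body = [(*key, str(n)) for key, n in sorted(counts.items())]
    table = [header, *body]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    return ["  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in table]


lines = citation_table(rows)
for line in lines:
    print(line)
```

A "Where Used" action for one row would then only need the list of facts collected for that group, as in any grouped result set.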
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 17:02

NickWalker wrote:
09 Dec 2022 16:56
I was just trying to make some helpful observations but I think I will leave this now
As was I and as will I.

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 09 Dec 2022 17:28

NickWalker wrote:
09 Dec 2022 16:04
However I do really wonder why we need this. We'd just be turning citations into something very similar to splitter sources. Splitter sources work really well so why not use them
Not resisting the temptation to repeat the reasons why some like to "lump".
  1. Habit. Let's be up front about it.
    • If that is what you are used to from previous software, why should you radically change your previous work? OK, there is probably a plug-in to work item by item through a database: take a source, concatenate it with the where used, and use that to create a new source, inheriting the repository from the existing source. You then go through finding all your redundant sources and deciding what to do with any source notes which may be "source source" specific but can no longer go in the source note because the previous citation notes have now taken that slot (put them in a shared note? Or leave the citation notes where they are and duplicate the real source notes across all the new sources?). It can be done - but as a lumper that just seems messy.
    • If from previous practice you have learnt, and possibly taught, that a "Source" is "what" you go to, then take, say, the novel "Jane Eyre" - in a Bibliography that source would be listed as
      "Brontë, C (as Currer Bell), Jane Eyre, 1847, Smith, Elder & Co., London".
      If you then wanted to cite details of the fire at Thornfield Hall, your citation "where you found the specific detail" would be (inline text or footnote)
      (Brontë, C, 1847 pXX).
      That is an ingrained habit.
  2. Preference
    • I like to be able to pull up an Individual record and highlight the birth fact and see in my "Sources" pane a series of sources, briefly and succinctly described:
      • GRO birth index
      • 1851 Census
      • 1871 index
      • 1881 index
      • GRO death index
      It is a clean list and I can quickly see that the 1861 census is missing from the list.
    • If I am writing family history and following the sort of citing/referencing listed above, being able to pick up bibliographic information from the Source (once) for the bibliography and the specific citation details from the "Where within Source" fits my workflow (which is not unique and quirky).
    • I like to be able to pull up a specific source, eg. Beaumont Parish Register (Repository: Carlisle Record Office) and be able to easily look at Source Specific notes. For instance (and simplifying):
      • "You will be asked initially to look at the microfilm, ref XYZ, but where it is hard to read, ask nicely and they will retrieve the actual register from the strongroom",
      • "Browsing History: April 2013: Browsed 1871 - 1881 looking for X and noting occurrences of Y",
      • "Break in records: The period 17xx-18yy is missing, some entries may be found on the IGI"
  3. Tolerance of draw-backs
    • For shared/duplicated citations, I put up with having to either put stuff in attached shared notes (via the All tab - messy) or duplicate it (for instance Census transcriptions), having developed a workflow where I can for a census fact "copy as except (e.g. ages, relationship to Head)", and I put up with the need to chase error correction through multiple places (but I could use a search and replace plug-in if it was a heavyweight job).
But that last point should not mean that I should not ponder alternative ways of handling this data - particularly if others are saying that other applications seem to be able to do this.
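
To make the conversion described under point 1 concrete, here is a minimal Python sketch of splitting one lumped source into several. The `Source` and `Citation` classes and the `split_lumped` function are hypothetical, the sample note and register entries are invented, and this is not a description of how any existing plug-in works. It simply takes one of the options mentioned above (duplicate the source note across the new sources and leave citation notes where they are).

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    repository: str
    note: str = ""


@dataclass
class Citation:
    source: Source
    where_within: str
    note: str = ""


def split_lumped(citations):
    """Give each distinct (source title, where within) pair its own Source."""
    new_sources = {}
    result = []
    for c in citations:
        key = (c.source.title, c.where_within)
        if key not in new_sources:
            new_sources[key] = Source(
                title=f"{c.source.title}: {c.where_within}",
                repository=c.source.repository,  # inherited from the lumped source
                note=c.source.note,              # source note duplicated onto each new source
            )
        # The citation note stays on the citation; 'where within' is now in the title.
        result.append(Citation(new_sources[key], where_within="", note=c.note))
    return result


register = Source("Beaumont Parish Register", "Carlisle Record Office",
                  "Ask for the original register if the microfilm is hard to read")
lumped = [
    Citation(register, "Baptisms 1871, entry 4"),
    Citation(register, "Baptisms 1871, entry 4", "Godparents named"),
    Citation(register, "Burials 1880, entry 9"),
]
split = split_lumped(lumped)
print(sorted({c.source.title for c in split}))
```

The duplicated `note` on every new source is exactly the messiness complained about above: a correction to the register-level note would now have to be made in each split source.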

I don't however think that XML is the way to go (even having experimented with XSLT many years ago). How FH works depends on its internal structure - which we don't know, we can only guess at. We do know that CP makes much of being GEDCOM compatible and they have chosen (almost as a structural decision) to achieve this by using GEDCOM as the storage medium, rather than have an export to GEDCOM routine.
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

davidf
Megastar
Posts: 951
Joined: 17 Jan 2009 19:14
Family Historian: V6.2
Location: UK

Re: Using XML vs Gedcom to store data

Post by davidf » 09 Dec 2022 17:31

Do we retitle this topic to something like Shared Citations in a GEDCOM compliant application?
David
Running FH 6.2.7. Under Wine on Linux (Ubuntu 22.04 LTS + LXDE 11)

tatewise
Megastar
Posts: 27088
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Using XML vs Gedcom to store data

Post by tatewise » 09 Dec 2022 18:57

David, yes, we retire this topic and promote the existing posting Viewing and editing citations (19829), which I mentioned twice earlier and which is on exactly the appropriate topic.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 09 Dec 2022 19:00

Mike, that looks like a different request to this one, albeit related. Why not change the title on this one and keep the discussion to date easily findable?

I did suggest renaming it some time back.

AdrianBruce
Megastar
Posts: 1962
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: Using XML vs Gedcom to store data

Post by AdrianBruce » 09 Dec 2022 20:37

AdrianBruce wrote:
09 Dec 2022 14:33
... This is a citation for my great-aunt's birth index from the GRO site:
Screenshot 2022-12-09 134718.jpg
That single index entry could support (wild guess) 5 different "facts". Would I ever want to update all 5 citations? Potentially - I might realise "Wrong Constance..." and want to delete all 5. Or I might want to reformat the "Text from Source" in the Citation to put the text into a table. I'd like to do that once please, if I did. But what about the Note in the Citation? It's possible that a couple of the 5 citations have different Notes, calling attention to different parts of the Text from Source maybe... So is that 1 citation quintuplicated? (Text from Source argument says yes please) Or 3 different citations, each with different Notes? And how do I - or, more to the point, the software - gather those 5 (or 3 or 1) together, because I'm really not sure what the key(s) is to do that?

That last challenge is surely the key - pun not intended. How do I identify those 5, that are on (say) 3 different individuals? I suspect through the Text from Source... So hard luck if you wrote different Text from Source against each of the 5 "events" being supported.
...
Just to follow up on the above - could I identify the "other" / duplicated(?) citations?

I went to the Source Record for the GRO Index Site (rather than FreeBMD as per the screenshot) and pulled up "Show Source Record's Citations in Result Window". I could find the line in the Results Window for my great-aunt because I know her name. But there is no immediate clue in the Results Window about the other events that the same index entry supports. To work out what those events are, I had to go to my great-aunt's birth event, look at the Text from Source in the Citation Details, match that text against her family, guess what events the text might support - and examine each manually. Why I would need to do this, I'm not sure, but I found it "interesting" that I could only do it by reverse engineering the conclusions.

Having said that, my inability to find the other entries appeared to come from what's (not) in the printed Footnotes. When I then set the Footnote to show the Text from Source in the Citation Details and sorted on that column, I eventually found Aunty Constance alongside her mother - apparently the Text from Source in the Citation Details shows her mother's maiden name, which looks like the only other "fact" supported. So it seems that I can locate the other citations derived from the same real world data - but the "key" is the content of the Text from Source in the Citation Details. Which I'm instinctively perturbed about! :o This means that Mike's suggestion about using "Show Source Record's Citations in Result Window" is possible if that facility is hacked about but it's still dependent on the Text from Source in the Citation Details (in my case) to identify the sort of similar citations.
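
The reliance on Text from Source as the "key" can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the tuple layout, the `normalise` rule, and the invented index entries. It shows both halves of the worry: citations only group together if their text matches (here after tidying whitespace and case), and one group can still hide citations with different Notes.

```python
from collections import defaultdict


def normalise(text):
    """Collapse whitespace and ignore case so trivially different copies still match."""
    return " ".join(text.split()).casefold()


def group_by_text(citations):
    """citations: iterable of (used_on, text_from_source, note) tuples."""
    groups = defaultdict(list)
    for used_on, text, note in citations:
        groups[normalise(text)].append((used_on, note))
    return groups


def distinct_notes(group):
    """The set of different Notes found within one group of citations."""
    return {note for _, note in group}


# Invented index entries, not real records.
entry = "DOE, Jane  Mother's maiden name: ROE  District X  1900 Q1"
citations = [
    ("Jane Doe: Birth", entry, ""),
    ("Jane Doe: Name", "doe, jane mother's maiden name: roe district x 1900 q1", ""),
    ("Mary Roe: Name", entry, "Maiden name comes from the index"),
    ("John Doe: Birth", "DOE, John  Mother's maiden name: ROE  District X  1902 Q3", ""),
]

groups = group_by_text(citations)
for text, group in groups.items():
    print(len(group), "citation(s),", len(distinct_notes(group)), "distinct note(s):", text)
```

If the text had been transcribed differently against each fact, no normalisation of this simple kind would bring the citations together, which is why keying on free text feels fragile.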
Adrian

ADC65
Superstar
Posts: 376
Joined: 09 Jul 2007 10:27
Family Historian: V7

Re: Using XML vs Gedcom to store data

Post by ADC65 » 10 Dec 2022 10:01

Mark1834 wrote:
09 Dec 2022 16:26
Personally, I find that point of view rather arrogant - "FH works best with a splitter model, so forget all you learned with your old package and do it the way we always do". If you don't think FH should at least try to embrace other ways of working, just don't vote for any Wish List item that comes out of this, rather than trying to strangle it at birth.

The Wish List identifies issues that users want, and CP decide whether they are worth doing. I suggest we let that process take its course...
I think you need to calm down a bit.

I believed this sub-forum was about discussing whether a request was worth creating a Wish List item for.

Like a few others, I have made my point and I'll now step back.
Adrian Cook
Researching Cook, Summers, Phipps and Bradford, mainly in Wales and the South West of England

ColeValleyGirl
Megastar
Posts: 4854
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: Using XML vs Gedcom to store data

Post by ColeValleyGirl » 03 Jun 2023 12:30

I'm closing this as per Mark's categorisation at Unblocking the Wish List process (21942), as the discussion hasn't produced an actionable proposal.
