* GEDCOM 7.0

ColeValleyGirl
Megastar
Posts: 3022
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: GEDCOM 7.0

Post by ColeValleyGirl » 09 Jun 2021 11:12

Mark1834 wrote:
09 Jun 2021 11:04
Probably - I think the takeaway from this is that GEDCOM works tolerably well for recording Western-style family relationships up until about the end of the last century (coincidentally, exactly the markets where genealogy software is currently sold), but something fundamentally different will probably be needed in the future when we are all dead and buried...
Fair summary -- except the something different is needed now, for people from cultures with different family structures. I suspect it's a chicken-and-egg situation -- the market may not be there because the demand isn't there, but the demand isn't there because there's no software to support it. PAF kickstarted the genealogy software market in the West, and IIRC led to the development of GEDCOM; if there's something similar happening in other parts of the world I haven't seen it (but I haven't gone looking). China and India would be the big markets, I guess.

AdrianBruce
Megastar
Posts: 1296
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: GEDCOM 7.0

Post by AdrianBruce » 09 Jun 2021 13:48

If we had had DNA testing before PAF etc. and GEDCOM, would the GEDCOM data model bear any relation to what it is now? (Rhetorical question)
Adrian

LornaCraig
Megastar
Posts: 2215
Joined: 11 Jan 2005 17:36
Family Historian: V7
Location: Oxfordshire, UK

Re: GEDCOM 7.0

Post by LornaCraig » 09 Jun 2021 15:33

....something fundamentally different will probably be needed in the future when we are all dead and buried...
Yes, I do wonder what will happen if our descendants start cloning themselves!
Is a clone a 'child' (with a single biological parent)? Or a twin (born/created many years after its identical sibling?) :?
Lorna

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 01:42

I have not had a chance to review the GEDCOM 7.0 document as well as I wanted so any comments I make are probably incomplete. I saw this discussion in a general search of the web.

Personally, I’m not a fan of the use of ANY HTML in the GEDCOM 7.0 text stream. The problem is that bad actors can introduce links, scripts, and other content, which creates security issues for online sites that build their pages from GEDCOM.

If you need markup to bold words, create headers, develop tables, control paragraphs, and include photos inside documents, there is a set of commands that is NOT HTML and is portable to Mac, Windows, Linux, Android, …: it is called “Markdown”.

For example: [image of a Markdown app, not preserved]

This is one example of a Markdown app; other apps exist. The example is more about using regular characters from a standard character set to tell the rendering program what the display or report should look like.

Another advantage to markdown is that it is much more human readable than html.

tatewise
Megastar
Posts: 22131
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: GEDCOM 7.0

Post by tatewise » 20 Jun 2021 10:19

I'm not sure I understand the advantage of Markdown over HTML bearing in mind that GEDCOM is a communication format for transferring genealogy data from application to application.

https://en.wikipedia.org/wiki/Markdown suggests there is limited standardization across implementations, and the examples include URL links that you claim are a bad idea. Markdown does not support all of the features you mention, such as tables, nor others such as footnotes and abbreviations. Those are only in Markdown Extra, which is not so widely supported. So HTML has better standardization and supports more features.

Anyway, users would not usually expect to edit in either Markdown or HTML codes, but via a word-processing style editor.
In the same way as users would not usually directly edit the GEDCOM file but use a genealogy application.
So the underlying GEDCOM coding is largely irrelevant to users and mostly impacts developers.

In my experience, there are already many genealogy products that to some extent support HTML in GEDCOM text fields.
Some only support bold, italic, underscore and super/subscripts, but others support much more.
I am not aware of any that support Markdown in imported or exported GEDCOM files.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 12:49

The problem with supporting HTML is not with users who use the facilities within a desktop program to add bolding (et al.), but with transferring a GEDCOM as input to an online program that is expected to use the HTML, when some bad actor has created their own GEDCOM with embedded HTML. If the HTML contained scripts (added by a bad actor) or other things that could cause security issues, the online program would first have to remove everything it did not want in its payload (including CSS), then either convert it to Markdown or ensure that the editor used for additional HTML input did not support all of the things a bad actor could add.

Online programs generally don’t let people input html into their text fields for security reasons, therefore they would have some issue with receiving html in a GEDCOM load!

The Markdown I’ve used does support tables. I’ve never needed to use a footnote in a GEDCOM NOTE in almost 40 years of GEDCOM use! Generally a NOTE tag is the footnote to the information it is associated with!

ColeValleyGirl
Megastar
Posts: 3022
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: GEDCOM 7.0

Post by ColeValleyGirl » 20 Jun 2021 13:07

Stack Exchange (Stack Overflow) uses Markdown based on CommonMark (as do Discourse, GitHub, GitLab, Reddit, Qt, and Swift), which makes for quite a big user base; and SE supports tables, in a WYSIWYG editor using open source code.

I personally prefer it to HTML -- it *is* more easily readable/editable in raw form, more portable and more secure -- but it does lack superscripts/subscripts which would rule out the current approach to FH embedded source citations.

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 13:23

ColeValleyGirl said:
it *is* more easily readable/editable in raw form, more portable and more secure -- but it does lack superscripts/subscripts which would rule out the current approach to FH embedded source citations.
The readability of raw GEDCOM is very valuable to me. As to superscripts/subscripts and NOTEs with embedded source citations, I’ve never had the need to do that in a NOTE tag since I generally use the NOTE tag as an explanation of the super tag it is associated with, but that is how I use GEDCOM v5.5.1 and earlier. We will see how v7.0 changes my use (if it does) once I read the document for the 20th time! :lol:

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 15:41

Mark said:
Probably - I think the takeaway from this is that GEDCOM works tolerably well for recording Western-style family relationships up until about the end of the last century (coincidentally, exactly the markets where genealogy software is currently sold), but something fundamentally different will probably be needed in the future when we are all dead and buried...
While the following is not about family relationships per se, one of the major issues with the design and use of “Western Style” thinking surrounding GEDCOM is that an increasing number of people in Western countries, including Slavic, Spanish, Portuguese and others, have differing views of what GEDCOM needs. As a Norwegian who has spoken to many others of Scandinavian origin, I lump myself in this as well (maybe the Scots and Irish too).

GEDCOM v7 does not fix problems with surnames in any way.

Slavic tradition adds gender-specific endings to a surname root: “-ski” is male, “-ska” is female. GEDCOM does not support a surname root for family name indexing, so it cannot record that, in some cases, Kowal is the root of both Kowalski and Kowalska and that they belong to the same surname family.
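
To make the indexing point concrete, here is a minimal Python sketch of grouping by surname root. It is only an illustration: the function name, the list of endings, and the sample surname "Nowak" are assumptions made for this example, and nothing like it is defined by GEDCOM. Real Slavic naming has many more endings and exceptions than the two mentioned above.

```python
# Hypothetical helper: not part of GEDCOM or of any genealogy product.
# Only the two endings mentioned in the post are handled.
GENDERED_ENDINGS = ("ski", "ska")


def surname_index_key(surname: str) -> str:
    """Return a gender-neutral key for grouping surnames in an index."""
    lowered = surname.strip().lower()
    for ending in GENDERED_ENDINGS:
        # Require at least one character in front of the ending.
        if lowered.endswith(ending) and len(lowered) > len(ending):
            return lowered[: -len(ending)]
    return lowered


# Group a few sample surnames under their index key.
index = {}
for name in ["Kowalski", "Kowalska", "Nowak"]:
    index.setdefault(surname_index_key(name), []).append(name)
```

With this key, Kowalski and Kowalska land in the same index entry, which is the behaviour a stored surname root would give without any guessing from the spelling.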

Surname problems also occur in many traditions where GEDCOM and software treat individuals with multiple surnames as having one single surname, almost like a hyphenated surname. In Portuguese it is common for the surname to include a maternal surname preceding the paternal one. An individual can have up to 4 distinct surnames, all of which should be viewed as independent and indexed.

Many traditions don’t have inherited surnames but have location, clan and/or patronymic based surnames. These naming traditions may include non-surname values like “da, das, do, dos and de” which precede location names, as in “Leonardo da Vinci”, who was from the town of Vinci.

I point this out only as a matter of reference that GEDCOM has a way to go to even support Western genealogy!

AdrianBruce
Megastar
Posts: 1296
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: GEDCOM 7.0

Post by AdrianBruce » 20 Jun 2021 16:23

While concern for security is crucial, I confess to being somewhat bemused by the idea that GEDCOM 7 is supposed to support HTML in all its glory - page 71 of the 7.0.1 spec'n says:
As of version 7.0, only 2 media types are supported by this structure:
- text/plain shall be presented to the user as-is, preserving all spacing, line breaks, and so forth.
- text/html uses HTML tags to provide presentation information. Applications should support at least the following:
  • p and br elements for paragraphing and line breaks.
  • b , i , u , and s elements for bold, italic, underlined, and strike-through text (or corresponding display in other locales; see HTML §4.5 for more).
  • sup and sub elements for super- and sub-script.
  • The 3 XML entities that appear in text: &amp; , &lt; , &gt; . Note that &quote; and &apos; are only needed in attributes. Other entities should be represented as their respective Unicode characters instead.
Note that the purpose here, as it says above, is "to provide presentation information". Also
Supporting more of HTML is encouraged. Unsupported elements should be ignored during display.
While "more of HTML is encouraged" might be held to suggest supporting anything and everything, if this is interpreted to be outside the bounds of presentation, then on their own head be it.

KFN appears to acknowledge this by saying that the issue is:
transferring a GEDCOM as input to an online program where that program is expected to use the html and some bad actor has created their own GEDCOM with embedded html.
If someone writes an online program expecting to use HTML in its full glory, then yes, of course they need to validate what the HTML does. But surely validation is what online software is supposed to do? Browsers validate what happens. Software with more limited intended usage should be coded to carry out just that limited usage. There is no requirement in GEDCOM 7 for the use of anything other than a limited number of tags for presentation purposes, so no excuse for implementing tags of interest to black-hats. You might say, "Well what if they do implement such support?" But that question applies to any and every bit of online software - what if they expose unfortunate facilities?

So to me, yes, there is a risk that someone would slap a full browser interpretation of HTML into the wrong place, but I don't see this as any more of a risk than allowing online programming in the first place.
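
As a rough illustration of "coded to carry out just that limited usage", the following Python sketch passes through only the presentation elements listed in the quoted spec text and drops everything else, including all attributes. The class and function names are made up for this example, and treating `script` and `style` content as something to discard is an assumption of the sketch. It uses only the standard library and is not a vetted sanitizer; an online service would more likely rely on a maintained sanitizing library.

```python
from html import escape
from html.parser import HTMLParser

# Elements the quoted spec text says applications should support at least.
ALLOWED_TAGS = {"p", "br", "b", "i", "u", "s", "sup", "sub"}
# Elements whose text content is discarded too (an assumption of this sketch).
DROP_CONTENT = {"script", "style"}


class PresentationOnlyFilter(HTMLParser):
    """Hypothetical filter: keeps allow-listed tags, drops every attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and self.skip_depth == 0:
            # Attributes (onclick, style, href, ...) are never copied.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so &, < and > stay as entities in the output.
            self.out.append(escape(data, quote=False))


def filter_note_html(text: str) -> str:
    """Return text with only the allow-listed presentation tags kept."""
    parser = PresentationOnlyFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Fed a note such as `<p onclick="x()">Born <b>1850</b><script>alert(1)</script></p>`, the filter keeps the paragraph and bold markup and discards the attribute and the script, which is the "unsupported elements should be ignored" behaviour applied strictly.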

Now having said all that, I must also add that KFN hits the nail very much on the head by mentioning the issues with surnames - indeed, highlighting the implications for indexing (or not) of family names (whatever they are!) is a valuable point, not least because it's a very specific, practical issue. I know that librarians can provide examples of similar issues, such as whether "van Gogh" should be collated under v or G and I think that the answer varies depending on where you are! To repeat myself for the umpteenth time, if FS can't even date surnames, it doesn't say much for the sophistication of their view.

At the same time, I think some of these issues are not even settled in the real world - for instance, do mat/patronymics count as family names? (whatever they are...)
Adrian

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 17:08

If the supported html was restricted to only italics, bold, underline, strike, paragraph, line break, h1, h2 then maybe I would say that this is nothing more than Markdown and while I like markdown better, this would be ok.

But when GEDCOM leaves the definition of the allowed HTML open to interpretation, does presentation also include fonts retrieved from websites, fonts that require licenses on platforms like Linux, or fonts that are only supported on Windows? And if a website removes unwanted HTML for presentation, does the individual just shrug their shoulders when they want the exact text, as entered, back on their desktop? GEDCOM was meant as a way to transfer genealogical data in text form. I have seen GEDCOM files where text was cut and pasted with a lot more than just simple formatting, which the originating program did not remove, and which then got written out to the GEDCOM for use in the target software.

So I guess my real quibble is the use of the term HTML rather than saying “markup” and then defining the markup they say is supported, full stop, rather than leaving it open at the end.
Last edited by KFN on 20 Jun 2021 17:37, edited 2 times in total.

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 17:18

Adrian asked:
At the same time, I think some of these issues are not even settled in the real world - for instance, do mat/patronymics count as family names? (whatever they are...)
In my Scandi world they do not! Unless they date from after the legal requirement for an inheritable surname.

Having gone to library school and used that Masters degree to build software and design databases for libraries, warehouses, factories and financial institutions, I have a slightly different approach to names than the current set of genealogy software programs which GEDCOM influences.

EDIT: We index “van Gogh” under “G”. Some libraries create a finder index for “Van Gogh” with a “see also”.

AdrianBruce
Megastar
Posts: 1296
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: GEDCOM 7.0

Post by AdrianBruce » 20 Jun 2021 17:43

KFN wrote:
20 Jun 2021 17:08
... So I guess my real quibble is the use of the term HTML rather than saying “markup” and the defining the markup they say is supported, full stop. Rather than leaving it open at the end.
Actually, that is a reasonable point - the phrase "more of HTML is encouraged" is rather vague - perhaps, one could say that no serious specification should be so vague.
Adrian

AdrianBruce
Megastar
Posts: 1296
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: GEDCOM 7.0

Post by AdrianBruce » 20 Jun 2021 17:53

KFN wrote:
20 Jun 2021 17:18
... EDIT: We index “van Gogh” under “G”. Some libraries create a finder index for “Van Gogh” with a “see also”.
I found my reference that I was thinking of: https://academia.stackexchange.com/ques ... bliography has an answer that starts:
The authoritative reference for this type of question (for librarians, at least) would be the publication "Names of Persons" by the International Federation of Library Associations and Institutions ...
and goes on to say:
Here's what it says about 'van':
  • If the person is Dutch, "van Beukering" should be sorted under B
  • If he or she is Belgian, sort it under V (but note the small print that says Belgian libraries aren't consistent across the country)
  • If they're from the US, sort it under V
I imagine that the way forward there would be to mark up the components of the name and then leave it as a function of the software to decide how to display those components.
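
One way to picture that approach is the minimal Python sketch below. The `Name` class, its field names, and the country codes are hypothetical, invented for this example; the rule table encodes only the three bullets quoted above and ignores the small print about Belgian libraries. Defaulting unknown countries to the main surname is an arbitrary choice made here.

```python
from dataclasses import dataclass

# True means the prefix is part of the sort key. Only the three quoted
# rules are encoded; every other country is outside this sketch.
PREFIX_IS_SORTED = {"NL": False, "BE": True, "US": True}


@dataclass
class Name:
    given: str
    prefix: str    # e.g. "van"
    surname: str   # e.g. "Gogh"
    country: str   # hypothetical country-of-origin code


def sort_key(name: Name) -> str:
    """Build a collation key from the marked-up name components."""
    # Unknown countries fall back to the main surname (arbitrary default).
    if name.prefix and PREFIX_IS_SORTED.get(name.country, False):
        return f"{name.prefix} {name.surname}".casefold()
    return name.surname.casefold()


people = [
    Name("Vincent", "van", "Gogh", "NL"),
    Name("Example", "van", "Beukering", "BE"),
]
ordered = sorted(people, key=sort_key)
```

Because the components are stored separately, the same data can be displayed as "van Gogh" everywhere while still being filed under G or V as the applicable rule requires.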
Adrian

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 18:07

Adrian,

It is that vagueness that has plagued GEDCOM from the get go!

They introduced a vague problem with the xref-ID by changing the max length from 22 characters to unlimited. While this does not have a major effect on most uses of GEDCOM, an unlimited XREF can never be used as an indexed key column in an SQL database. They would have been better off expanding it to be big enough for a UUID of, say, 38 characters. In the documents they acknowledge this issue but can’t change it now.
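
For reference, the arithmetic behind the 38-character figure can be checked with Python's standard `uuid` module: the canonical text form of a UUID is 36 characters, 38 with surrounding braces, and 32 as plain hex digits. The helper names and the `KEY_WIDTH` constant below are assumptions for illustration only and do not come from the GEDCOM specification.

```python
import uuid

KEY_WIDTH = 38  # column width suggested in the post; not a GEDCOM rule


def new_identifier() -> str:
    """Hypothetical identifier: 32 upper-case hex digits from a random UUID."""
    return uuid.uuid4().hex.upper()


def fits_key_column(identifier: str, width: int = KEY_WIDTH) -> bool:
    """True when the identifier can be stored in a fixed-width key column."""
    return len(identifier) <= width


canonical = str(uuid.uuid4())    # 36 characters including hyphens
braced = "{" + canonical + "}"   # 38 characters
```

An importer could run a check like `fits_key_column` on every incoming identifier and fall back to generating its own internal key when one is too long, which is roughly the workaround an unlimited length forces.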

Family Search rather than being a clearing house for custom tags, their name and usage, left it up to the vendors to document the tags and hope that tag collisions and context duplications never occur! This opens up a “Standard” to unstandard outcomes!

KFN
Gold
Posts: 12
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: GEDCOM 7.0

Post by KFN » 20 Jun 2021 18:34

Adrian,

Your comment about the indexing carries an important “note”: it is all about the country of origin of the individual being indexed, not the country doing the indexing. So van Gogh, because he’s Dutch, is indexed under “G”, and according to the reference document this should be used in all cases internationally. But if he had been French (with the same name) he would be indexed under “V”, and that would be used in all indexes. This allows international users who know the country of origin to find the person regardless of their own location. The rules are laid out in the hope that every student and library follows them, which is also why most libraries don’t do their own cataloging anymore! Because the rules are very hard to follow, this was one class I did not do well in at school, and I never became a school or municipal librarian but did corporate library work instead!

AdrianBruce
Megastar
Posts: 1296
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire

Re: GEDCOM 7.0

Post by AdrianBruce » 20 Jun 2021 18:36

KFN wrote:
20 Jun 2021 18:07
... Family Search rather than being a clearing house for custom tags, their name and usage, left it up to the vendors to document the tags and hope that tag collisions and context duplications never occur! This opens up a “Standard” to unstandard outcomes!
Personal opinions: I think that we have to be practical. FS could never and should never act as a clearing house for custom tags:
  • They couldn't react fast enough;
  • There is every danger that the parent Church of Latter Day Saints stop FS proceeding with GEDCOM based work - that's how we've had the hiatus from the last agreed spec'n (which was GEDCOM 5.0) to now;
  • Since many software vendors can't follow the GEDCOM Standard anyway, the chances of the vendors (a) creating a sensible request for FS and (b) following the response that they get back from FS, are somewhere on the range from slim to none;
And, as implied above, most of the "non-standard" outcomes relating to the GEDCOM standard are because of the inability of many vendors to follow the specification even in the non-custom tags. Rant over.
Adrian

tatewise
Megastar
Posts: 22131
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: GEDCOM 7.0

Post by tatewise » 30 Jul 2021 19:05

I have just noticed that there is a response by https://fhiso.org/ to the new announcement of GEDCOM version 7.
The FHISO Chairman, Luther Tychonievich, served as drafting editor of this new standard, which it seems incorporates some FHISO developments. The FHISO seem largely in favour of GEDCOM 7.0 but go read the response to form your own opinion.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Valkrider
Megastar
Posts: 1394
Joined: 04 Jun 2012 19:03
Family Historian: V7
Location: Spain

Re: GEDCOM 7.0

Post by Valkrider » 31 Jul 2021 06:32

Luther released v7.0.4 of the spec yesterday on GitHub.

tatewise
Megastar
Posts: 22131
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: GEDCOM 7.0

Post by tatewise » 31 Jul 2021 11:13

Colin, I cannot find that on GitHub, so could you post a link, please.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Valkrider
Megastar
Posts: 1394
Joined: 04 Jun 2012 19:03
Family Historian: V7
Location: Spain

Re: GEDCOM 7.0

Post by Valkrider » 31 Jul 2021 14:34

Mike

Here you go.

https://github.com/FamilySearch/GEDCOM/ ... tag/v7.0.4

It is worth subscribing to the repository on GitHub to be notified of the updates. I am getting a few email notifications about minor changes every week.
