AdrianBruce wrote: ↑26 Nov 2022 21:24
Helicopter view... Do we need to step back and say, "Stop trying to come up with one sort order"? (Or whatever else we are considering apart from sort orders...)
Do we need to come up with a Flemish format for the name and a French format for the same name? And maybe this needs a separate item (or items) for the indexed version of the name to cope with the librarians who index von, van, de, etc, differently for the same name, depending on who their employer is?
It may be that that is the direction GEDCOM eventually goes. But I am trying to get my mind around how we might use the current FH versions 6 & 7 and what we might wish to see in version 8 or possibly 9.
Multiple versions of Names of People/Places
The issue of multiple formats/versions for the same name (be it the name of an individual or a place) has been kicking around for some time:
- Kyung Wha Chung (Chung Kyung Wha) - originally Latinised and put in "westernised order" (KWC, to meet US marketing expectations?)
- Egon Ronay (Rónay Egon) - again "Anglicised"
- Sir Georg Solti (Solti György) - again "Anglicised" in both spelling and word order
- Arnold Paltrow (né Paltrowitz) - again "Anglicised"
- Rónán Ó Gadhra > Ronan O'Gara - Anglicised for us poor "Anglos" who cannot pronounce/type/remember the Irish form - the quiet toleration of our incompetence and insensitivity is often not acknowledged.
- Peking > Beijing (a better/local transliteration into "Western Text" of Mandarin 北京)
- Calcutta > Kolkata (a better/local transliteration into "Western Text" of Bengali কলকাতা)
- Cologne > Köln (a local name triumphing over an Anglicised form?)
- Biel / Bienne (options in a bi-lingual community - in one hotel it depended if morning or evening staff were on duty!)
- Londonderry / Derry (politics)
GEDCOM and variations of individual names
Many of these individual name variations can be "handled" in GEDCOM 5.5.1 by NAME_TYPE (e.g. "Name Assumed on Immigration" - Paltrow from Paltrowitz?) or ROMANIZED_TYPE (e.g. Korean "정", also often "spelled" Chung, Jung, Joung or Jong - the Revised Romanization transcription Jeong being the most frequently used). Likewise PHONETIC_TYPE.
But "handling" seems to be limited to different versions being held under the "Name Structure":
PERSONAL_NAME_STRUCTURE:=
n NAME <NAME_PERSONAL> {1:1} p.54
+1 TYPE <NAME_TYPE> {0:1} p.56
+1 <<PERSONAL_NAME_PIECES>> {0:1} p.37
+1 FONE <NAME_PHONETIC_VARIATION> {0:M} p.55
+2 TYPE <PHONETIC_TYPE> {1:1} p.57
+2 <<PERSONAL_NAME_PIECES>> {0:1} p.37
+1 ROMN <NAME_ROMANIZED_VARIATION> {0:M} p.56
+2 TYPE <ROMANIZED_TYPE> {1:1} p.61
+2 <<PERSONAL_NAME_PIECES>> {0:1} p.37
Something still has to go in the Primary "NAME" and that is what FH uses as the key name. I am trying to get my mind around what goes into that NAME field ("A/B/C"), where "B" is NAME:SURNAME and is key to so much functionality within FH - yet many naming systems do not comfortably fit this structure.
Name Order
AdrianBruce wrote: ↑26 Nov 2022 21:24
(Have we talked about Hungarian names where the same person may have both the "proper" Hungarian order of family-name then given-name, and the "I can't be bothered explaining" order of given-name family-name....?)
FH can already handle "/Rónay/ Egon" if that is how we wish to refer to him; the labelling of "family name" and "given names" is handled. How we put it in is a user choice - we put in what we want to put out - which usually depends on our audience. Unfortunately, even with language packs in V7, I don't think it is possible to "switch name order" if we are outputting for a Hungarian or Chinese or Korean etc. audience. The recommendation is to "Enter all names, places and addresses in the project language" and when outputting "Names of people, ... in any context ... are not translated" (which presumably means they will not switch word order).
[Likewise "/Mao/ Zedong" or "/Mao/ Tse-tung" (Official Hanyu Pinyin and Wade-Giles romanisations). We can hold multiple romanisations, but we choose which romanised version we put into the NAME field for FH to process. (We probably would not put "毛泽东" in the Name field because that is "too hard" for western users to handle.)]
Non standard name structures
So with names, "you put in what you want to get out" - which means that (outside heavy customisation of sentences, diagrams and reports) what is put in the NAME field is key to both output and processing, even if the name in question does not follow any family/given format. Surname prefixes are possibly just the most notable "European" example of this issue - but we have also got use of patronymics (with or without toponymics) and mixed patronymics/matronymics (in some Iberian names), etc.
Yet I get the impression that losing sight of name parts - surname prefixes, patronymics, farm_names, etc. does represent a loss of potential functionality.
We don't want to output a sentence like "van Beethoven lived in Bonn", but rather "Beethoven lived in Bonn". I get the impression that Germans and Austrians would want (using German as the output language) to say something approximating to "Beethoven lebte in Bonn". In this usage the "van" does not form part of the "surname". I think most usages also want Beethoven to sort under "B" rather than under "v".
(Ludwig's ancestors - per the inevitable Wikipedia - came from Mechelen in the Austrian Duchy of Brabant (in what is now the Flemish region of Belgium) - so it is "van" not "von" - previously I have been inconsistent.)
GEDCOM and GIVN SPFX and SURN and "Substance over Form"
Within GEDCOM we can already recognise surname prefixes and in the first post of this topic I pondered about only putting the prefix in the SPFX field if it was a non-indexing prefix, and then when trying to align GIVN, SPFX and SURN with NAME, putting the SPFX field contents immediately before the first "surname slash" - which is a potential change from earlier discussion brought about by considering the substance of what the name parts do in the context of an individual name rather than the strict form of what the name parts "are". (Contrast this with "von der Leyen", where I am told that is the surname so the prefix is part of the surname for indexing, display and sentences and gets put between the slashes. So we have "Ludwig van /Beethoven/" but "Ursula /von der Leyen/".)
As I think Mike was then hinting, if we don't get hung up on the "strict" form of prefixed surnames, can we allow the substance of how the name is used to determine what goes "between the slashes" and then do we even need to worry about GIVN, SPFX and SURN? And by extension does that apply to other name forms? Do we then enter "Given /Patronymic/ Toponymic" if the user wants the primary sort to be on the Patronymic but "Given Patronymic /Toponymic/" if they want the primary sort to be on the Toponymic - and therefore not have to worry about "which bit is 'the surname'"?
That may meet a lot of user requirements without software enhancements or plugins. But it is a usage that is at variance with what GEDCOM in its Anglo-centric way was anticipating:
GEDCOM 5.5.1. p38 wrote: The name value is formed in the manner the name is normally spoken, with the given name and family name (surname) separated by slashes (/). (See <NAME_PERSONAL>, page 54.) Based on the dynamic nature or unknown compositions of naming conventions, it is difficult to provide more detailed name piece structure to handle every case. The NPFX, GIVN, NICK, SPFX, SURN, and NSFX tags are provided optionally for systems that cannot operate effectively with less structured information. For current future compatibility, all systems must construct their names based on the <NAME_PERSONAL> structure. Those using the optional name pieces should assume that few systems will process them, and most will not provide the name pieces.
Which is essentially saying that all names must follow the format "aaa/bbb/ccc", where any element is optional (provided at least one is included) and in FH the default sort happens on bbb,aaa,ccc (NAME:SURNAME_FIRST in terms of FH Name Formats)
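To make the "aaa/bbb/ccc" point concrete, here is a minimal Python sketch of splitting a NAME value at its slashes and sorting on bbb, aaa, ccc. It is only an illustration of the idea: the function names are my own invention, it is not how FH is actually coded, and it compares plain lower-cased text rather than using any language-aware collation.

```python
def split_name(name_personal):
    """Split a GEDCOM NAME value 'aaa/bbb/ccc' into (aaa, bbb, ccc)."""
    # Text before the first slash, between the two slashes, and after the second.
    before, _, rest = name_personal.partition("/")
    between, _, after = rest.partition("/")
    return before.strip(), between.strip(), after.strip()


def surname_first_key(name_personal):
    """Sort key in the order bbb, aaa, ccc (assumed equivalent of a surname-first sort)."""
    before, between, after = split_name(name_personal)
    return (between.casefold(), before.casefold(), after.casefold())


names = [
    "Ursula /von der Leyen/",
    "/Rónay/ Egon",
    "Eddy /D'hondt/",
    "Ludwig van /Beethoven/",
]
sorted_names = sorted(names, key=surname_first_key)
for name in sorted_names:
    print(name)
```

With the slashes placed as discussed above, Beethoven lands under "B" and von der Leyen under "v" without any reference to GIVN, SPFX or SURN.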
Wider View
AdrianBruce wrote: ↑26 Nov 2022 21:24
Helicopter view... Do we need to step back and say, "Stop trying to come up with one sort order"? (Or whatever else we are considering apart from sort orders...)
I think with the current coding of FH (and probably many other "GEDCOM Processors") that is exactly what we have - one (default) sort order. If we want to do anything else we have to be able to identify the "name pieces" by which we wish to sort and currently we are limited to those provided by the FH name formats (NAME:SURNAME, NAME:GIVEN_ALL etc.) and the GEDCOM pieces accessed through the All Tab (GIVN, SPFX and SURN).
AdrianBruce wrote: ↑26 Nov 2022 21:24
...
Do we need to come up with a Flemish format for the name and a French format for the same name? And maybe this needs a separate item (or items) for the indexed version of the name to cope with the librarians who index von, van, de, etc, differently for the same name, depending on who their employer is?
I know what you mean (and weren't you the person who questioned me labelling a style "Dutch Style" in answering an original OP about Dutch surnames?)
The Search for a "Parsing Rule"
I started pondering this thinking it was relatively easy - all you had to do was "discern the rule"! When Mike was trying to develop a plug-in for bulk loading of GIVN SPFX and SURN from NAME it quickly became obvious that finding and coding "the rule" was far from easy.
I have been reading (parts of):
- The Indexer: Centrepieces - "The International Journal of Indexing" - First published by the Society of Indexers (UK) in 1958 on a twice-yearly basis, it moved in 2008 to a quarterly publication. From 2019 the journal is published by Liverpool University Press "on behalf of indexing societies worldwide".
- Names of Persons published by the International Federation of Library Associations and Institutions
I give as one example from "Centrepieces":

[Attached screenshot: Centrepieces on Indexing names of Dutch/German origin]
I am thinking that trying to enter a name in the NAME field (in "strict form") as "Given names /Surname/" and then having a routine to sort out and load the SPFX and SURN fields such that lists sort properly is a fool's errand, even if you have a series of icons to load these fields according to [Dutch|Flemish|Afrikaans|German] rules.
I think it is easier (if initially non-intuitive) if you enter names in the NAME field following "substance" as "secondary sort elements /Primary sort elements/", but that only handles "Prefixed Surnames". (The "rule" would be something like "all continuous name elements immediately preceding the first slash which are uncapitalised are to be loaded into SPFX, slashed elements into SURN and the rest into GIVN". That would give a usage of GIVN, SPFX and SURN that gives primacy to substance over form. But that rule might then choke on a name where the final unslashed ("given") name is something like "d'Aragon" which may be a Toponymic!)
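As a rough sketch of that "rule" in Python (my own wording of it, not Mike's plugin and not anything FH does): walk backwards from the first slash collecting uncapitalised words into SPFX, put the slashed text into SURN and everything else into GIVN. Treating any text after the second slash as part of GIVN is an assumption on my part, and the function name is hypothetical.

```python
def name_pieces(name_personal):
    """Derive GIVN, SPFX and SURN from a NAME entered 'substance over form'."""
    before, _, rest = name_personal.partition("/")
    between, _, after = rest.partition("/")

    words = before.split()
    prefix = []
    # Collect the run of uncapitalised words immediately preceding the first slash.
    while words and words[-1][0].islower():
        prefix.insert(0, words.pop())

    # Assumption: whatever is left, before or after the slashes, goes into GIVN.
    given = " ".join(words + after.split())
    return {"GIVN": given, "SPFX": " ".join(prefix), "SURN": between.strip()}


print(name_pieces("Ludwig van /Beethoven/"))
print(name_pieces("Ursula /von der Leyen/"))
print(name_pieces("Eddy d'/Hondt/"))
# A made-up name to show where the rule chokes: "d'Aragon" is swept into SPFX.
print(name_pieces("Maria d'Aragon /Example/"))
```

The last call shows the weakness: the rule only looks at capitalisation, so a lower-case toponymic sitting next to the slash is indistinguishable from a surname prefix.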
Do Language Variants Help?
Following "Substance over Form" you would in the above examples put the element before the comma "between the slashes" and the rest in the name field before the first slash. But you have to decide on your "project language" to determine which column in the above example you are using as your guide. But having decided that if you then want to output in a different language FH will not reconfigure the slashes (it does not translate names). So if the project language is "Flemish", you enter "Eddy /D'hondt/", but if you use Dutch as your output language it will not sort Eddy as "Eddy d'/Hondt/"; he will be listed between the C's and the E's!
The only way I can see round this is, as Adrian suggests, to hold multiple variants of the NAME with NAME_TYPEs specifying which "language rule" is being followed:
1 NAME Eddy /D'hondt/
2 TYPE Flemish Format
1 NAME Eddy d'/Hondt/
2 TYPE Dutch Format
1 SEX M
Extract from a GEDCOM showing use of Name Types to hold different language formats
There is an inevitable overhead in maintaining this data and without some form of "macro" to swap the names around (possibly driven by Output Language choice, but defaulting to Project Language choice) the opportunities to get tangled are immense.
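For what it is worth, the sort of "macro" I have in mind could be as simple as the Python sketch below. It is purely hypothetical: FH has no such facility that I know of, the function name is invented, and it assumes the NAME_TYPE values follow the "<Language> Format" wording used in the extract above.

```python
def pick_name(names, output_language, project_language):
    """Choose which NAME variant to use for a given output language.

    names is a list of (name_value, name_type) pairs in GEDCOM order,
    so the first entry is the Primary Name.
    """
    # Try the output language first, then fall back to the project language.
    for wanted in (output_language, project_language):
        for value, name_type in names:
            if name_type == f"{wanted} Format":
                return value
    # Nothing matched: use the Primary Name as entered.
    return names[0][0]


eddy = [
    ("Eddy /D'hondt/", "Flemish Format"),
    ("Eddy d'/Hondt/", "Dutch Format"),
]
print(pick_name(eddy, "Dutch", "Flemish"))   # Dutch variant exists, so it is used
print(pick_name(eddy, "German", "Flemish"))  # no German variant, project language wins
```

Even with something like this, the variants themselves still have to be entered and kept in step by hand, which is where the tangles would come from.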
Identifying specific name parts for Secondary Analysis
If you enter names as "secondary sort elements /Primary sort elements/" do you then need to identify name parts for any form of analysis, filtering or reporting? Is that a secondary issue - other than the sort of issue in the section above?
The question then would be how to hold those name parts that need to be specifically identified: patronymic, toponymic (farm name) etc. You could, as has previously been suggested, use secondary names (and NAME_TYPE) in addition to the Primary NAME field to hold specific elements. You could thus have in your tree multiple Individuals with a secondary name of "Bruflot" with NAME_TYPE: "farm name". Currently this is done (V7 only) through the All tab. NAME_TYPE is not currently accessible through the Names & Titles Dialog. Adding support for NAME_TYPE to the Names & Titles Dialog would be a minor enhancement (Wish List 520).
It might then be possible to do a query, for instance, of all Individuals where a secondary NAME of NAME_TYPE = "farm name" contains "Bruflot".
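Outside FH, the logic of such a query can be sketched in a few lines of Python working directly on GEDCOM text. This is only an illustration, not an FH query or plugin: the function names are mine, the three individuals are made up for the example, and the reader only understands level 0 INDI records, level 1 NAME lines and the level 2 TYPE lines beneath them.

```python
def read_names(gedcom_text):
    """Return {individual_id: [[name_value, name_type_or_None], ...]}."""
    people, current, in_name = {}, None, False
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == "0":
            # Record lines look like "0 @I1@ INDI": the id sits in the tag position.
            current = tag if value == "INDI" else None
            in_name = False
            if current is not None:
                people[current] = []
        elif level == "1":
            in_name = current is not None and tag == "NAME"
            if in_name:
                people[current].append([value, None])
        elif level == "2" and in_name and tag == "TYPE":
            # Attach the TYPE to the NAME it sits under.
            people[current][-1][1] = value
    return people


def find_by_typed_name(people, name_type, fragment):
    """Ids of individuals having a NAME of the given TYPE that contains fragment."""
    return sorted(
        pid for pid, names in people.items()
        if any(t == name_type and fragment.casefold() in v.casefold() for v, t in names)
    )


# Hypothetical sample data, invented purely for this example.
sample = """
0 @I1@ INDI
1 NAME Ola /Olsen/
1 NAME /Bruflot/
2 TYPE farm name
0 @I2@ INDI
1 NAME Kari /Hansen/
1 NAME /Haugen/
2 TYPE farm name
0 @I3@ INDI
1 NAME Per /Bruflot/
"""
people = read_names(sample)
matches = find_by_typed_name(people, "farm name", "Bruflot")
print(matches)
```

Note that @I3@ is not returned: "Bruflot" is there as the Primary Name with no NAME_TYPE, so only the individual with the typed secondary name matches.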
Someone like "Leonardo di ser Piero da Vinci" could have a name Structure like:
0 @I78@ INDI
1 NAME Leonardo di ser Piero /da Vinci/
1 NAME da /Vinci/
2 TYPE Toponymic
1 NAME di ser /Piero/
2 TYPE Patronymic
1 SEX M
Extract from a GEDCOM showing use of Name Types to hold Toponymics and Patronymics
Notes:
- The Primary Name is entered as you want it output; in this instance he sorts under "da Vinci" rather than "Vinci"
- I am not sure about the slashing of those secondary names; it may or may not be a useful way of drawing out the stem of the Toponymics and Patronymics.
- Care needs to be taken to ensure that these sorts of secondary names are not accidentally promoted to the Primary Name
- It would also be useful for the user to be able to standardise these NAME_TYPEs - with possibly some (e.g. Patronymic, Toponymic, etc.) supplied "out of the box".
- With Russian (Slavic?) Patronymics which have gendered endings (e.g. Pyotr Ilyich Tchaikovsky and his sister, Aleksandra Ilinichna Tchaikovskaya), you could enter "Ilya" - the actual name of the father, Ilya Petrovich Tchaikovsky - as the Patronymic, so that you could get all the siblings to sort/report together. Having the patronymic identified as a secondary name allows you to avoid issues arising from gendering because the actual patronymic is incorporated in the Primary Name.
- With Welsh Patronymics which tend to have gendered prefixes, using "contains" in a filter enables filtering of siblings together.
Where multiple names need to be input a separate tab might be a more streamlined way of entering them than the current Names & Titles dialog.

[Attached screenshot: Multiple Names and Titles Dialog]
Note this is a V6 dialog so I have a note field where the NAME_TYPE field would logically go.
Does that "long spiel" talk many of the Non-standard Name entry/storage issues out of existence - given implementation of easier access to the NAME_TYPE - possibly with some protection of the Primary Name and standardisation control over NAME_TYPEs? Or are non-Anglo-sphere users (and potential users) likely to pull their hair out because once again their requirements have not been understood?