tatewise
Megastar
Posts: 27087
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Default Code Page

Post by tatewise » 29 Oct 2013 17:17

Whilst investigating topic Croatian characters not accepted in FH5 (10867) I have been experimenting with Windows Control Panel > Region and Language settings for System Locale and Default Code Page.
See http://blogs.msdn.com/b/michkap/archive ... 64707.aspx.

My interest is that Plugins such as Export Gedcom to TNG and Map Life Facts assume FH uses Code Page 1252.
But Region and Language can change the Code Page used by FH, and so the Plugins will make mistakes in encoding non-ASCII characters.

The Region and Language > Formats tab sets the Default User Locale, which has no effect on FH, but a Plugin can detect it with Lua os.setlocale(), which reports the Code Page number.

The Region and Language > Administrative > Change system locale option sets the Default System Locale, which determines the FH Code Page, but this is NOT reported by Lua os.setlocale().
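As a rough illustration of the os.setlocale() approach, the sketch below pulls the Code Page number out of a Windows-style locale name such as "English_United Kingdom.1252". The function names are hypothetical, and the name format is an assumption about what the C runtime returns on Windows. As noted above, this only reflects the Default User Locale, not the Code Page FH is actually using.

```lua
-- Extract the trailing code page number from a Windows-style locale name,
-- e.g. "English_United Kingdom.1252" -> 1252. Returns nil if none is present.
local function codePageFromLocale(name)
  if type(name) ~= "string" then return nil end
  local digits = name:match("%.(%d+)$")
  return digits and tonumber(digits) or nil
end

-- Query the user's default locale, then restore the previous setting
-- so the rest of the Plugin is not affected by the change.
local function userLocaleCodePage()
  local previous = os.setlocale(nil, "ctype")  -- query only, no change
  local native = os.setlocale("", "ctype")     -- switch to the user default
  if previous then os.setlocale(previous, "ctype") end
  return codePageFromLocale(native)
end
```

The "C" locale carries no Code Page suffix, so codePageFromLocale("C") returns nil rather than guessing.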

So how can a Plugin determine the Code Page FH is using, to correctly translate GEDCOM characters into UTF-8, etc?

A solution would be something like fhGetContextInfo("CI_CODE_PAGE").

Another side effect is that GEDCOM files cannot be exchanged between PCs using different Code Page settings without risking character corruption.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Jane
Site Admin
Posts: 8442
Joined: 01 Nov 2002 15:00
Family Historian: V7
Location: Somerset, England

Re: Default Code Page

Post by Jane » 29 Oct 2013 18:57

I think GEDCOM exchange will be OK if you export using the UTF-8 format in the export options.

I suspect you could get the current code page from the Windows registry, but I have not looked for the key.
Jane
My Family History : My Photography "Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad."

Re: Default Code Page

Post by tatewise » 29 Oct 2013 19:11

Yes, but the user must remember to Export in UTF-8 rather than just copy the entire project with the original GEDCOM.
They would also need to export to the same folder as the original GEDCOM and retain Relative media links.
Similarly, if the user needs to receive a GEDCOM from elsewhere, they must ask the other user to export in UTF-8.

The Windows Registry key is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP but to retrieve it means that a Command Prompt will flash on & off to run REG EXPORT whenever a Plugin is run that needs the setting.
Rather like my Backup and Restore Family Historian Settings Plugin.

Re: Default Code Page

Post by Jane » 29 Oct 2013 19:46

You can use the registry key read code snippet (plugins:code_snippets:registry_key_read) to get registry keys invisibly.
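For illustration only, and not necessarily how the snippet referred to above does it, a registry value can be read without a Command Prompt window through the Windows Script Host COM object. This sketch assumes the luacom library is available to the Plugin; the function names are hypothetical. The reader is passed in as an optional argument so the lookup can be tried out without touching a real registry.

```lua
-- Registry value holding the system ANSI code page (a string such as "1252").
local ACP_KEY =
  "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Nls\\CodePage\\ACP"

-- Read a registry value via WScript.Shell. Assumes luacom is installed.
-- Any failure (no luacom, missing key) is caught and reported as nil.
local function readRegistryValue(key)
  local ok, value = pcall(function()
    local luacom = require("luacom")
    local shell = luacom.CreateObject("WScript.Shell")
    return shell:RegRead(key)
  end)
  if ok then return value end
  return nil
end

-- Return the system code page as a number, or nil if it cannot be read.
-- `reader` is optional and defaults to the luacom-based reader above.
local function getSystemCodePage(reader)
  reader = reader or readRegistryValue
  local value = reader(ACP_KEY)
  return value and tonumber(value) or nil
end
```

A Plugin could then fall back to 1252 when getSystemCodePage() returns nil, which matches what existing Plugins already assume.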

Re: Default Code Page

Post by tatewise » 29 Oct 2013 23:54

Thanks Jane, I had overlooked that method, and it works well.
I have now incorporated it into the text encoder library module to support alternative Code Page settings.

When the module is loaded:
It now raises an error message if the current FH Code Page is not supported by the encoder.
If the Code Page is not 1252 then it substitutes encodings in the lookup tables for the current Code Page.
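The following is a minimal sketch of that idea, not the actual library module: a per-Code-Page lookup table maps each non-ASCII byte to a Unicode code point, and loading fails for a Code Page with no table. The names are hypothetical and the tables are deliberately abbreviated to a few characters. Note how byte 0xE8 becomes è under Code Page 1252 but č under 1250.

```lua
-- Abbreviated byte -> Unicode code point tables (real ones cover 0x80-0xFF).
local CODE_PAGES = {
  [1252] = { [0x80] = 0x20AC, [0x8A] = 0x0160, [0xC8] = 0x00C8, [0xE8] = 0x00E8 },
  [1250] = { [0x80] = 0x20AC, [0x8A] = 0x0160, [0xC8] = 0x010C, [0xE8] = 0x010D },
}

-- Encode one code point below U+10000 as UTF-8, without bitwise operators
-- so that it runs on older Lua versions as well.
local function utf8Char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  end
  return string.char(0xE0 + math.floor(cp / 0x1000),
                     0x80 + math.floor(cp / 0x40) % 0x40,
                     0x80 + cp % 0x40)
end

-- Build a converter for the given code page; error if it is unsupported.
local function newDecoder(codePage)
  local map = CODE_PAGES[codePage]
  if not map then
    error("Code Page " .. tostring(codePage) .. " is not supported")
  end
  return function(text)
    -- ASCII passes through; bytes missing from the short table become "?".
    return (text:gsub("[\128-\255]", function(ch)
      local cp = map[ch:byte()]
      return cp and utf8Char(cp) or "?"
    end))
  end
end
```

For example, newDecoder(1250)("\232") yields the two UTF-8 bytes for č, while newDecoder(1252)("\232") yields those for è.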