* Plugin Warning - Unicode Ahead!

tatewise
Megastar
Posts: 28414
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise »

This is a warning about Plugin support for Unicode (non-ASCII) characters, i.e. foreign language characters and symbols.

Recent versions of Windows fully support Unicode in both text and filenames.
FH V6 introduced support of Unicode characters but they don't appear to be widely used.
FH V7 is introducing support for output in languages other than English.

This will no doubt attract many foreign users and increase the use of Unicode characters.
That will lead to Unicode characters appearing in Individual, Place, Source, Media, Project & File names, etc.
Unfortunately, Plugin tools such as Lua and lfs and Penlight do NOT support Unicode.
This means that many Plugins will fail to operate in the presence of Unicode characters.

As an experiment, I have created a Project with Unicode characters in the names listed above.
I then proceeded to run most of the 92 Plugins from the Plugin Store.
Of the 87 Plugins that I ran, here are the results:
  • 32 failed to run successfully at all.
  • 19 ran but reported they "may have failed to handle accent characters correctly", i.e. Unicode.
  • 36 seemed to run perfectly OK.
So you see that the majority (59%) either totally failed (37%) or did not fully support Unicode characters (22%).
Some may be a simple fix, but many will need major surgery.

I have a simple Plugin that searches for Media records with broken File links, which I have tentatively fixed.
Working with Unicode characters anywhere in the file path name needs luacom CMD prompt commands, and careful scripting to avoid upsetting filenames with Lua functions such as upper(), lower() & gsub(). As a result, the Plugin now runs slower.
Note that Unicode characters don't necessarily exist just in the Media filename, but may be in the Project name, or in C:\Users\<sign-in-name>\, etc. See Unicode String Functions (code snippet).
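
To illustrate the kind of damage that byte-oriented functions such as lower() can do, here is a minimal sketch. It is written in Python rather than Lua purely so the byte-level effect is easy to reproduce outside FH, and it is not taken from any Plugin. It assumes the function treats each byte as a single-byte ANSI character; Latin-1 stands in for Code Page 1252 here because it maps all 256 byte values. The name ansi_style_lower and the sample filename are hypothetical.

```python
def ansi_style_lower(utf8_bytes: bytes) -> bytes:
    """Lower-case UTF-8 bytes as if every byte were one ANSI character.

    Hypothetical helper: it mimics a byte-oriented lower() running under a
    single-byte locale, which is an assumption made for illustration only.
    """
    # Latin-1 maps each byte to exactly one character, so this step is lossless.
    as_single_bytes = utf8_bytes.decode("latin-1")
    # Case conversion now also touches bytes that are part of a UTF-8 sequence.
    lowered = as_single_bytes.lower()
    return lowered.encode("latin-1")


original = "Émile.JPG".encode("utf-8")   # made-up sample filename
damaged = ansi_style_lower(original)

print(original.hex())
print(damaged.hex())

try:
    damaged.decode("utf-8")
except UnicodeDecodeError:
    print("result is no longer valid UTF-8")
```

The ASCII letters are lower-cased as intended, but the lead byte of the accented letter is altered as well, so the result no longer decodes as UTF-8 and would not match the file on disk.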

I have reported the problem to Calico Pie who have logged it for investigation.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
BillH
Megastar
Posts: 2257
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Post by BillH »

Mike,

I have for many years used accented characters in German and Norwegian Individual names, Place names, addresses, etc. I also use them in file names and folder names. I have not had any problems with any of the plugins that I use (at least to my knowledge).

What are the symptoms of the plugins not working correctly? Can you give us the names of some commonly used plugins that aren't working with Unicode characters?

Thanks,
Bill
Bill Henshaw

Post by tatewise »

The worst consequences stem from Unicode characters in the Project file-path.
Try a Project name incorporating a Unicode character, or anywhere in the folder path above, e.g. the username.
Such a scenario is highly likely for a user who has a native language name involving accented characters.
Then almost all the Plugins on page 1 of the Most Downloaded Plugins will fail abruptly as shown below.

PluginErrorMessage.png

Unicode characters in names inside a Project may get converted to question marks (?) rather than crash the Plugin.
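
The question-mark effect can be reproduced outside FH. This is a minimal sketch, in Python rather than Lua, assuming the conversion is a Unicode to ANSI (Code Page 1252) step that substitutes ? for anything it cannot represent; where such a conversion happens inside any given Plugin is not something the sketch claims to know. The function name and the sample names are made up for illustration.

```python
def to_ansi_with_substitution(text: str) -> str:
    """Round-trip text through Code Page 1252, replacing unmappable characters with '?'."""
    # errors="replace" makes the encoder emit '?' instead of raising an error.
    return text.encode("cp1252", errors="replace").decode("cp1252")


print(to_ansi_with_substitution("Zoë"))      # ë exists in Code Page 1252
print(to_ansi_with_substitution("Dvořák"))   # ř does not, á does
```

The first name survives unchanged, while the second comes back as Dvo?ák: only the character outside Code Page 1252 is lost.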

Plugins such as Check for Unlinked Media that manage Media files may not 'fail' at all, but they don't list files that they should if their names or folder paths contain Unicode characters. So even Plugins that don't crash may produce faulty results.

Post by BillH »

Mike,

Thanks for the information. Good to know. I'll keep my eye out for any problems.

I've never had any names, places, or addresses where characters were changed to ?.

While I do have folders and files with Unicode characters, I don't have any in my project folder path. I can see where this would be a real problem for folks with Unicode characters in their name.

I do have Unicode characters in the file names of some multimedia files and their associated media records, but have never run into any problems. I guess I just don't use the plugins that have the problems.

Thanks again,
Bill

Post by tatewise »

The point is that the Media management Plugins don't give you problems, but may hide them.
As an experiment, slightly change a Media filename that includes Unicode characters, so it becomes unlinked.
Do the same with a file that has no Unicode characters, so it also becomes unlinked.
Tools > External File Links will confirm those with two X broken link marks in the list.
Run the Check for Unlinked Media Plugin and it will only list one file!

Post by BillH »

Mike,

This is one of those plugins that I rarely if ever use. However, I did run a test and renamed Åkra, Kari Larsdatter - birth record.jpg to Åkra, Kari Larsdatter - birth recordxx.jpg.

Tools > External File Links does show Åkra, Kari Larsdatter - birth record.jpg with an X broken link mark in the list.

image1.jpg

However, the Check For Unlinked Media Plugin lists Åkra, Kari Larsdatter - birth recordxx.jpg just fine.

image2.jpg

Not sure what is happening. Is Å not considered a Unicode character?

Thanks,
Bill

Post by tatewise »

Unfortunately, the picture is complex and confusing.
In your example, Å is NOT a problematic Unicode character because it belongs to the set of ANSI (Code Page 1252) characters that was supported by FH V5 and earlier, and the Plugin is encoded in ANSI.
If you change the Media filename to use Ā instead, then it will get omitted by the Plugin.
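
A quick way to tell the two cases apart is to test each character against Code Page 1252. The following is a minimal sketch in Python rather than Lua, intended only as a checking aid outside FH; the function name outside_cp1252 is hypothetical, and the filenames are the one from the test above plus the Ā variant.

```python
def outside_cp1252(text: str) -> list:
    """Return the characters in text that have no Code Page 1252 encoding."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            # No single-byte ANSI equivalent exists for this character.
            missing.append(ch)
    return missing


# Å is U+00C5 (inside Code Page 1252); Ā is U+0100 (outside it).
print(outside_cp1252("Åkra, Kari Larsdatter - birth record.jpg"))
print(outside_cp1252("Ākra, Kari Larsdatter - birth record.jpg"))
```

An empty list means every character has an ANSI equivalent, as with Å; a non-empty list names the characters, such as Ā, that an ANSI-encoded Plugin cannot represent.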

The Plugin failure mode symptoms are governed by the following factors:
  • Is the Plugin encoded in UTF-8 or in ANSI?
  • Does the Plugin change to UTF-8 encoding if run in FH V6 or later?
  • Does the Plugin substitute Lua standard string functions with safe UTF-8 compatible versions?
  • Does the Plugin use the iup library and adjust UTF8MODE?
  • Does the Plugin read/write files and do their file path names include Unicode characters? (This is most difficult to fix)
  • Which names within the Project include Unicode characters?
  • Do those Unicode characters belong to the set of ANSI (Code Page 1252) characters supported by FH V5?
The Plugin failure mode symptoms include:
  • Display of an explicit error message (as posted earlier).
  • Corruption of Unicode characters such as replacement with question marks (?).
  • Features misbehave but there is no error message, e.g.
    Add Source from Templates does not save the templates.
    Create Individual Shortcut creates a shortcut but it does not work.
  • Data produced by the Plugin is erroneous or incomplete, e.g.
    Check for Unlinked Media omits unlinked files.

Post by BillH »

Mike,

Sounds complex. I guess all the characters I use for Norwegian and German names must be in code page 1252 so I don't ever run into any problems. I'll watch out in the future for any problems I may encounter.

Thanks,
Bill

Post by tatewise »

The Code Page 1252 characters are shown below:

CodePage1252.png

Post by BillH »

Thanks for the chart. Yes, all the characters I need are in that chart, so that is why I haven't run into any problems. If I find I have to use a new character in the future, I'll check to see that it is also in the chart.

Thanks,
Bill

Post by tatewise »

The following references appear to offer UTF-8 and thus Unicode solutions?
https://github.com/keplerproject/luafilesystem/pull/57 Fix Windows code to work with non-native code pages.
That posting cross-refers to other discussions and also the following implementations:
http://www.lua.org/manual/5.3/manual.html Lua 5.3 offers some UTF-8 string support.
https://github.com/cloudwu/luawinfile luawinfile has UTF-8 filename alternatives for lfs and Lua 5.3 file functions.

Post by tatewise »

I've resurrected this old thread regarding library support for UTF-8 characters in filenames rather than continue the discussion in the FH-RM-Ancestry Sync to exploit hints (update) (19870) thread. It was an issue I had contended with for a long time before posting this thread in 2020, but the increasing support for Unicode and foreign languages prompted me to start a discussion.

Helen, I thought I had added an old Wiki article on this topic but I might be mistaken. If it exists it would have been in the Plugins section probably to do with using library modules. However, I may just be remembering this thread.

I reported the issue to CP in June/July 2020 as "Unicode in Folder/File Paths in Plugins" ticket #138290 which said much the same as in this thread. CP responded with: "We have logged this for further consideration" but nothing further. I did not expect them to rewrite 3rd party libraries but did expect a considered reply.

As mentioned in that ticket and earlier in this thread, there are library developments such as luawinfile that seem to offer a fix for the problem, which perhaps CP could have adopted not just for FH v7 but also earlier FH versions.

The new FH v7 fhFileUtils library may well offer a solution but that doesn't help Plugins that support all FH versions.

IMO it is a topic that needs to be mentioned in both Writing and Maintaining Plugins Compatible with Versions 5, 6 & 7 and Lua References and Library Modules, as it is a 'trip hazard' for any Plugin author when handling files. It impacts the Lua io file functions as well as libraries such as lfs.
ColeValleyGirl
Megastar
Posts: 5502
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Post by ColeValleyGirl »

As we decided here, luawinfile is not an option.

Suggest some wording for the articles you propose changing (in the Maintaining the KB forum please).

Lua References and Library Modules already says:
fhFileUtils: An ƒh specific library of modules to handle files with Unicode file names that are not supported by the standard Lua LFS and IO methods.
and it isn't really an issue about maintaining compatibility between 5, 6 and 7.

Perhaps it should be in Getting Started Writing Plugins, as part of a general exhortation to consider the use of libraries and/or under the Unicode heading there?

fhFileUtils does at least offer code that can be included in plugins directly if backwards compatibility is an issue, as it's MIT licensed. However, I doubt that any new plugin authors are going to be concerned with backward compatibility.

Post by tatewise »

Ok, so there appear to be no library solutions apart from fhFileUtils.
I find that odd as it must be a widespread problem for Lua users.
However, it is a niche problem and just some warnings to plugin authors should suffice.

Since Mark fell into the trap, perhaps he is best placed to suggest where the warnings should appear in the KB.

BTW: The link in Lua References and Library Modules to the fhFileUtils Online Documentation actually links to fhUtils, and the URL should be http://pluginstore.family-historian.co. ... tils.html

Post by ColeValleyGirl »

Link fixed.
Mark1834
Megastar
Posts: 2511
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

IMO, the best place would be in the plugin help documentation. It is a fairly good introduction to plugins in general, basic Lua and common libraries, but says relatively little about what modifications are necessary for extended character support (reflecting its UK origins perhaps?).

Pragmatically, that probably won't happen anytime soon, so next best would be the KB. Again IMO, what is there now reads far more as a crib-sheet for the handful of authors who need cross-version compatibility, rather than an introduction for new authors. For example, the FH V6 Unicode section on the getting started page refers only to text fields and doesn't mention file issues at all. Follow the links for more information, and you are confronted with reams of technobabble.

My feeling at the moment is that it needs a new dedicated page on the implications of handling UTF characters that summarises what the issues are and recommended solutions. Personally, I would forget about cross-version compatibility here. I agree with Helen that it is of no interest to new authors, so concentrate on the basics of how to write robust V7 plugins rather than more esoteric solutions such as compat53 and how earlier versions of FH load libraries.

I've fallen down enough holes over the past few months to understand the traps for the unwary, so I'm happy to do a first draft over the next couple of weeks or so that can be reviewed before being finalised to ensure I have not over-simplified or omitted any vital points.
Mark Draper

Post by ColeValleyGirl »

"says relatively little about what modifications are necessary for extended character support (reflecting its UK origins perhaps?)."
I suspect it's more of a reflection of the fact that (a) utf8 support within Lua didn't arrive until post-version 5.1 (so FH7 in FH terms); and (b) before the introduction of language packs in FH7, the majority of users were English-speaking.

Thanks for the offer to draft something.

Post by tatewise »

FYI: Prompted by me, the FH How to Write Plugins > Introduction to Lua > Additional Lua Libraries section in FH v7.0.9 has been made comprehensive (similar to Lua References and Library Modules) but needs some corrections.
Those corrections could also introduce warnings about Unicode v ANSI file paths and which libraries support them.
e.g. LFS and PENLIGHT don't, whereas fhFileUtils does.
Before I submit my corrections, would such warnings be worth mentioning?

Post by ColeValleyGirl »

You might want to tell them they've got the wrong link to Penlight documentation -- they're still referring to the Steve Donovan documentation. Current documentation is at https://lunarmodules.github.io/Penlight/ (or rather, documentation for an even later version compatible with Lua 5.4, but there's a changelog there as well).

I'm not sure 'random' (or selective) warnings ought to be included, unless you're proposing a comprehensive set of warnings....