* Plugin Warning - Unicode Ahead!

Writing and using plugins for Version 5 and above.
tatewise
Megastar
Posts: 18632
Joined: 25 May 2010 11:00
Family Historian: V6.2
Location: Torbay, Devon, UK

Plugin Warning - Unicode Ahead!

Post by tatewise » 07 Jul 2020 17:07

This is a warning about Plugin support for Unicode (non-ASCII) characters, such as foreign-language letters and symbols.

Recent versions of Windows fully support Unicode in both text and filenames.
FH V6 introduced support for Unicode characters, but they don't appear to be widely used.
FH V7 is introducing support for output in languages other than English.

This will no doubt attract many foreign users and increase the use of Unicode characters.
That will lead to Unicode characters appearing in Individual, Place, Source, Media, Project & File names, etc.
Unfortunately, Plugin tools such as Lua, lfs, and Penlight do NOT support Unicode.
This means that many Plugins will fail to operate in the presence of Unicode characters.

As an experiment, I have created a Project with Unicode characters in the names listed above.
I then proceeded to run most of the 92 Plugins from the Plugin Store.
Of the 87 Plugins that I ran, here are the results:
  • 32 failed to run successfully at all.
  • 19 ran but reported they "may have failed to handle accent characters correctly", i.e. Unicode.
  • 36 seemed to run perfectly OK.
So you see that the majority (59%) either totally failed (37%) or did not fully support Unicode characters (22%).
Some may be a simple fix, but many will need major surgery.

I have a simple Plugin that searches for Media records with broken File links, which I have tentatively fixed.
Working with Unicode characters anywhere in the file path requires luacom CMD prompt commands, and careful scripting to avoid corrupting filenames with Lua functions such as upper(), lower() & gsub(). As a result, the Plugin now runs slower.
Note that Unicode characters don't necessarily exist just in the Media filename, but may be in the Project name, or in the C:\Users\<sign-in-name>\ folder, etc. See Knowledge Base > Unicode String Functions (code snippet).
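
The risk with the standard string functions is that they work on bytes rather than characters. Here is a minimal sketch of the problem, assuming the text is well-formed UTF-8. The helper names utf8_chars and utf8_len are hypothetical and are not taken from any Plugin or from the Knowledge Base snippet:

```lua
-- Ā (U+0100) written as its two UTF-8 bytes, so the example does not
-- depend on how this source file itself is encoded.
local name = "\196\128kra.jpg"

-- Hypothetical helper: iterate over whole UTF-8 characters.
-- Assumes well-formed UTF-8 without embedded NUL bytes.
local function utf8_chars(s)
  return s:gmatch("[\1-\127\194-\244][\128-\191]*")
end

-- Hypothetical helper: count characters rather than bytes.
local function utf8_len(s)
  local n = 0
  for _ in utf8_chars(s) do n = n + 1 end
  return n
end

print(#name)           -- byte count: 9
print(utf8_len(name))  -- character count: 8

-- string.sub cuts by byte, so this splits Ā in half and leaves an
-- invalid one-byte fragment.
local broken = name:sub(1, 1)
print(#broken)         -- 1

-- Taking the first whole character keeps both bytes together.
local first = utf8_chars(name)()
print(#first)          -- 2
```

Any byte-based edit of a path (cutting, case-changing or pattern-replacing with ".") can damage a multi-byte character in the same way, which is why a Plugin would need character-aware replacements for these functions.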

I have reported the problem to Calico Pie who have logged it for investigation.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

BillH
Megastar
Posts: 1475
Joined: 31 May 2010 03:40
Family Historian: V6.2
Location: Washington State, USA

Re: Plugin Warning - Unicode Ahead!

Post by BillH » 07 Jul 2020 18:05

Mike,

I have for many years used accented characters in German and Norwegian Individual names, Place names, addresses etc. I also use them in file names and folder names. I have not had any problems with any of the plugins that I use (at least to my knowledge).

What are the symptoms of the plugins not working correctly? Can you give us the names of some commonly used plugins that aren't working with Unicode characters?

Thanks,
Bill

Re: Plugin Warning - Unicode Ahead!

Post by tatewise » 07 Jul 2020 19:16

The worst consequences stem from Unicode characters in the Project file-path.
Try a Project name incorporating a Unicode character, or anywhere in the folder path above, e.g. the username.
Such a scenario is highly likely for a user who has a native language name involving accented characters.
Then almost all the Plugins on page 1 of the Most Downloaded Plugins will fail abruptly as shown below.

[Image: PluginErrorMessage.png]

Unicode characters in names inside a Project may get converted to question marks (?) rather than crash the Plugin.

Plugins such as Check for Unlinked Media that manage Media files may not 'fail' at all, but they don't list files that they should if their names or folder paths contain Unicode characters. So even Plugins that don't crash may produce faulty results.

Re: Plugin Warning - Unicode Ahead!

Post by BillH » 07 Jul 2020 21:51

Mike,

Thanks for the information. Good to know. I'll keep my eye out for any problems.

I've never had any names, places, or addresses where characters were changed to ?.

While I do have folders and files with Unicode characters, I don't have any in my project folder path. I can see where this would be a real problem for folks with Unicode characters in their name.

I do have Unicode characters in the file names of some multimedia files and their associated media records, but have never run into any problems. I guess I just don't use the plugins that have the problems.

Thanks again,
Bill

Re: Plugin Warning - Unicode Ahead!

Post by tatewise » 07 Jul 2020 22:34

The point is that the Media management Plugins don't give you problems, but may hide them.
As an experiment, slightly change a Media filename that includes Unicode characters, so it becomes unlinked.
Do the same with a file that has no Unicode characters, so it also becomes unlinked.
Tools > External File Links will confirm both, with two X broken link marks in the list.
Run the Check for Unlinked Media Plugin and it will only list one file!

Re: Plugin Warning - Unicode Ahead!

Post by BillH » 08 Jul 2020 03:26

Mike,

This is one of those plugins that I rarely if ever use. However, I did run a test and renamed Åkra, Kari Larsdatter - birth record.jpg to Åkra, Kari Larsdatter - birth recordxx.jpg.

Tools > External File Links does show Åkra, Kari Larsdatter - birth record.jpg with an X broken link mark in the list.

[Image: image1.jpg]

However, the Check For Unlinked Media Plugin lists Åkra, Kari Larsdatter - birth recordxx.jpg just fine.

[Image: image2.jpg]

Not sure what is happening. Is Å not considered a Unicode character?

Thanks,
Bill

Re: Plugin Warning - Unicode Ahead!

Post by tatewise » 08 Jul 2020 09:43

Unfortunately, the picture is complex and confusing.
In your example, Å is NOT a problematic Unicode character because it belongs to the set of ANSI (Code Page 1252) characters that was supported by FH V5 and earlier, and the Plugin is encoded in ANSI.
If you change the Media filename to use Ā instead, then it will get omitted by the Plugin.
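
The difference between Å and Ā can be checked mechanically. The following is only a sketch, assuming the name is available as well-formed UTF-8; the helper names codepoint and fits_cp1252 are hypothetical and not part of FH or any Plugin. It decodes each character and tests whether Code Page 1252 has an equivalent:

```lua
-- Code points that Code Page 1252 places in its 0x80-0x9F range.
local CP1252_EXTRA = {}
for _, cp in ipairs({
  0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
  0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
  0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178,
}) do
  CP1252_EXTRA[cp] = true
end

-- Hypothetical helper: decode one well-formed UTF-8 character to its code point.
local function codepoint(ch)
  local b1, b2, b3, b4 = ch:byte(1, -1)
  if b1 < 0x80 then return b1 end
  if b1 < 0xE0 then return (b1 - 0xC0) * 64 + (b2 - 0x80) end
  if b1 < 0xF0 then
    return (b1 - 0xE0) * 4096 + (b2 - 0x80) * 64 + (b3 - 0x80)
  end
  return (b1 - 0xF0) * 262144 + (b2 - 0x80) * 4096 + (b3 - 0x80) * 64 + (b4 - 0x80)
end

-- Hypothetical helper: true if every character has a CP1252 equivalent.
local function fits_cp1252(s)
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
    local cp = codepoint(ch)
    local ok = cp < 0x80 or (cp >= 0xA0 and cp <= 0xFF) or CP1252_EXTRA[cp]
    if not ok then return false end
  end
  return true
end

print(fits_cp1252("\195\133kra"))  -- Å (U+00C5): true
print(fits_cp1252("\196\128kra"))  -- Ā (U+0100): false
```

A name that passes this test can survive a round trip through an ANSI-encoded Plugin, whereas one that fails is a candidate for the omissions and question marks described in this thread.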

The Plugin failure mode symptoms are governed by the following factors:
  • Is the Plugin encoded in UTF-8 or in ANSI?
  • Does the Plugin change to UTF-8 encoding if run in FH V6 or later?
  • Does the Plugin substitute Lua standard string functions with safe UTF-8 compatible versions?
  • Does the Plugin use the iup library and adjust UTF8MODE?
  • Does the Plugin read/write files and do their file path names include Unicode characters? (This is most difficult to fix)
  • Which names within the Project include Unicode characters?
  • Do those Unicode characters belong to the set of ANSI (Code Page 1252) characters supported by FH V5?
The Plugin failure mode symptoms include:
  • Display of an explicit error message (as posted earlier).
  • Corruption of Unicode characters such as replacement with question marks (?).
  • Features misbehave but there is no error message, e.g.
    Add Source from Templates does not save the templates.
    Create Individual Shortcut creates a shortcut but it does not work.
  • Data produced by the Plugin is erroneous or incomplete, e.g.
    Check for Unlinked Media omits unlinked files.

Re: Plugin Warning - Unicode Ahead!

Post by BillH » 08 Jul 2020 15:41

Mike,

Sounds complex. I guess all the characters I use for Norwegian and German names must be in code page 1252 so I don't ever run into any problems. I'll watch out in the future for any problems I may encounter.

Thanks,
Bill

Re: Plugin Warning - Unicode Ahead!

Post by tatewise » 08 Jul 2020 16:16

The Code Page 1252 characters are shown below:

[Image: CodePage1252.png]

Re: Plugin Warning - Unicode Ahead!

Post by BillH » 08 Jul 2020 19:57

Thanks for the chart. Yes, all the characters I need are in that chart, so that is why I haven't run into any problems. If I find I have to use a new character in the future, I'll check to see that it is also in the chart.

Thanks,
Bill

Re: Plugin Warning - Unicode Ahead!

Post by tatewise » 15 Jul 2020 14:05

The following references appear to offer UTF-8, and thus Unicode, solutions:
https://github.com/keplerproject/luafilesystem/pull/57 Fix Windows code to work with non-native code pages.
That posting cross-refers to other discussions and also the following implementations:
http://www.lua.org/manual/5.3/manual.html Lua 5.3 offers some UTF-8 string support.
https://github.com/cloudwu/luawinfile luawinfile has UTF-8 filename alternatives for lfs and Lua 5.3 file functions.
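
As a rough illustration of what the Lua 5.3 utf8 library provides, here is a short sketch. It assumes a Lua 5.3 or later interpreter, and the guard lets it run harmlessly on older versions where the library is absent:

```lua
local s = "\196\128kra"  -- "Ākra" written as UTF-8 bytes

if utf8 then
  print(#s)                    -- 5 bytes
  print(utf8.len(s))           -- 4 characters
  print(utf8.codepoint(s, 1))  -- 256, i.e. U+0100
  -- utf8.codes yields the byte position and code point of each character.
  for pos, cp in utf8.codes(s) do
    print(pos, cp)
  end
  -- utf8.char builds the UTF-8 encoding of a code point (here Å, U+00C5).
  print(utf8.char(0xC5) == "\195\133")  -- true
else
  print("utf8 library not available in this Lua version")
end
```

Note that this library only handles the encoding itself. Functions such as upper() and lower() remain byte-based, and file access with Unicode path names would still need something like the filename alternatives listed above.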
