
For users to report plugin bugs and request plugin enhancements, for authors to test new plugins or new versions of plugins, and to discuss plugin development (in the Programming Technicalities sub-forum). If you want advice on choosing or using a plugin, please ask in General Usage or an appropriate sub-forum.
Mark1834
Megastar
Posts: 2146
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 13 Dec 2021 15:29

The fhSQL library introduced with FH 7.0.8 appears to require a filename in UTF-8 format. If the debugger is set to ANSI format, or the filename string is formatted as ANSI for compatibility with basic Lua file I/O, calling the library with a filename containing extended (non-ASCII) characters will crash the plugin unless the filename is first converted to UTF-8. A non-existent filename does not crash the library; only a wrongly encoded one does.

I've raised a ticket, but I'll mention it here as well in case it's not fixed quickly. IMO, this behaviour should at least be documented, or even better, trapped. For comparison, the fhLoadTextFile(...) function requires UTF format, but simply returns nil if an ANSI-formatted string is presented.
Mark Draper
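Since only a wrongly encoded name crashes fhSQL, one defensive option is to check that a filename is well-formed UTF-8 before passing it on. The sketch below is plain Lua (no FH API calls and no bitwise operators, so it should run under Lua 5.1 or later); `isValidUTF8` is a hypothetical helper, not part of fhSQL or fhFileUtils. It can only confirm that the bytes are well-formed UTF-8, not that they were meant as UTF-8, and later posts in this thread report that even valid UTF-8 non-Latin names fail in fhSQL, so treat it as a guard against the ANSI/UTF-8 mix-up only.

```lua
-- Hypothetical helper (not part of the FH API): returns true if the byte
-- string s is well-formed UTF-8, false otherwise.
local function isValidUTF8(s)
  local i, n = 1, #s
  while i <= n do
    local c = s:byte(i)
    local len, cp, min
    if c < 0x80 then
      len, cp, min = 1, c, 0                 -- plain ASCII byte
    elseif c >= 0xC2 and c <= 0xDF then
      len, cp, min = 2, c - 0xC0, 0x80       -- lead byte of a 2-byte sequence
    elseif c >= 0xE0 and c <= 0xEF then
      len, cp, min = 3, c - 0xE0, 0x800      -- lead byte of a 3-byte sequence
    elseif c >= 0xF0 and c <= 0xF4 then
      len, cp, min = 4, c - 0xF0, 0x10000    -- lead byte of a 4-byte sequence
    else
      return false                           -- stray continuation or invalid lead byte
    end
    if i + len - 1 > n then
      return false                           -- sequence cut off by end of string
    end
    for k = i + 1, i + len - 1 do
      local b = s:byte(k)
      if b < 0x80 or b > 0xBF then
        return false                         -- expected a 10xxxxxx continuation byte
      end
      cp = cp * 64 + (b - 0x80)
    end
    if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
      return false                           -- overlong, out of range, or surrogate
    end
    i = i + len
  end
  return true
end

local ansiName = 'Zo\235.db'       -- "Zoë.db" as CP1252 ("ANSI") bytes
local utf8Name = 'Zo\195\171.db'   -- "Zoë.db" as UTF-8 bytes
print(isValidUTF8(ansiName), isValidUTF8(utf8Name))
```

In CP1252 'ë' is the single byte 235 (0xEB), which reads as the lead byte of a 3-byte UTF-8 sequence with nothing valid after it, so the ANSI form is rejected while the UTF-8 form (bytes 195 171) passes. A pure-ASCII name passes either way.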

tatewise
Megastar
Posts: 27076
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: fhSQL library not compatible with ANSI encoding

Post by tatewise » 13 Dec 2021 16:29

This is another instance of the issues discussed in Plugin Warning - Unicode Ahead! (17926).
Depending on which libraries are used, my plugins often invoke fhConvertANSItoUTF8(...) and fhConvertUTF8toANSI(...) on the same filepath as necessary.
I agree fhSQL should not crash.
Perhaps the Lua References and Library Modules should identify which libraries support UTF8 and which only support ANSI, plus any 'tricks' needed, such as these IUP settings:
iup.SetGlobal("UTF8MODE","YES")
iup.SetGlobal("UTF8MODE_FILE","YES")
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
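Keeping both forms of a filepath relies on ANSI-to-UTF-8 conversion being lossless: every defined CP1252 byte has a Unicode equivalent. The following is a minimal pure-Lua sketch of that direction, assuming the Western European code page 1252; `cp1252ToUTF8` is a hypothetical name, and the sketch makes no claim about how fhConvertANSItoUTF8(...) works internally. Inside FH, the built-in function is the one to use.

```lua
-- Unicode code points for CP1252 bytes 0x80-0x9F, the range where CP1252
-- differs from ISO-8859-1. Bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined.
local CP1252_HIGH = {
  [0x80] = 0x20AC, [0x82] = 0x201A, [0x83] = 0x0192, [0x84] = 0x201E,
  [0x85] = 0x2026, [0x86] = 0x2020, [0x87] = 0x2021, [0x88] = 0x02C6,
  [0x89] = 0x2030, [0x8A] = 0x0160, [0x8B] = 0x2039, [0x8C] = 0x0152,
  [0x8E] = 0x017D, [0x91] = 0x2018, [0x92] = 0x2019, [0x93] = 0x201C,
  [0x94] = 0x201D, [0x95] = 0x2022, [0x96] = 0x2013, [0x97] = 0x2014,
  [0x98] = 0x02DC, [0x99] = 0x2122, [0x9A] = 0x0161, [0x9B] = 0x203A,
  [0x9C] = 0x0153, [0x9E] = 0x017E, [0x9F] = 0x0178,
}

-- Hypothetical helper: convert a CP1252 ("ANSI") byte string to UTF-8.
local function cp1252ToUTF8(s)
  local out = {}
  for i = 1, #s do
    local b = s:byte(i)
    -- Bytes 0x00-0x7F and 0xA0-0xFF equal their Unicode code point; this
    -- sketch passes the five undefined bytes through as U+0081 etc.
    local cp = CP1252_HIGH[b] or b
    if cp < 0x80 then
      out[i] = string.char(cp)
    elseif cp < 0x800 then
      out[i] = string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
    else
      -- every CP1252 character lies below U+10000, so 3 bytes suffice
      out[i] = string.char(0xE0 + math.floor(cp / 4096),
                           0x80 + math.floor(cp / 64) % 64,
                           0x80 + cp % 64)
    end
  end
  return table.concat(out)
end

local ansiPath = 'C:\\Temp\\Zo\235.db'   -- CP1252 form, e.g. for io or lfs
local utf8Path = cp1252ToUTF8(ansiPath)  -- UTF-8 form, e.g. for fhSQL
```

Following the library behaviour reported in this thread, the ANSI original stays the form for io or lfs, and the UTF-8 copy is the form for fhSQL.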

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 13 Dec 2021 16:43

Yes - improving the KB documentation of Unicode handling is the WIP page alluded to in the new Plugins Introduction.

The CP response to my ticket was simply "Thank you for your feedback", with not even a "which will be passed to the developers for review", so it doesn't sound like it will be fixed anytime soon... :(

ColeValleyGirl
Megastar
Posts: 4853
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 13 Dec 2021 16:55

tatewise wrote:
13 Dec 2021 16:29
Perhaps the Lua References and Library Modules should identify which libraries support UTF8 and which only support ANSI, plus any 'tricks' needed such as IUP needs:
I have this on my to-do list, Mike (as Mark already knows).

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 16 Dec 2021 17:08

“Bottom line” on this is that it is unfixable, primarily due to Microsoft gradually deprecating and withdrawing support for what is colloquially (but inaccurately) known as ANSI encoding. Any filename containing characters other than the basic numbers, punctuation marks and plain Latin letters of ASCII has to be presented to fhSQL in UTF8 format.

The help documentation will be updated to reflect this in due course (with thanks to Helen for her insights into the mysterious worlds of encoding, file systems, etc...).

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 16 Dec 2021 17:55

Even better in Unicode (Microsoft-speak for UTF-16).

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 16 Dec 2021 18:16

How practical is that? AFAIK, FH does not provide tools for interconverting between "ANSI", which many authors will still need for general Lua compatibility (io, string, os, lfs etc), and UTF16. IUP refers specifically to UTF8 mode, not UTF16/Unicode.

I doubt many plugin authors would have the necessary background to fully embrace an all-Unicode model. Unlike Windows, FH is not a global product, so a mix-and-match “ANSI”/UTF8 is probably the more pragmatic solution for its overwhelmingly Western market.

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 16 Dec 2021 19:23

Tell that to Microsoft.

IUP is not optimised for Windows; Windows is almost an afterthought.

If you pass a filename untransformed from IUP to the FH libraries, it will work. UTF-8 may also work... but it may not.

Problems come when you need to manipulate (slice and dice) a UTF-16 filename and/or pass it to one of the built-in Lua libraries that only handle ANSI. Mike T has some code for handling UTF-16 translations... but that won't help with filenames that are UTF-16 encoded if you want the file to be recognised.

As you know, I still have a lot of testing to do with UTF-8.

tatewise

Re: fhSQL library not compatible with ANSI encoding

Post by tatewise » 17 Dec 2021 12:12

May I summarise my experience of handling Unicode filepaths with various libraries.
This is what prompted my original Plugin Warning - Unicode Ahead! (17926) posting.

1) ASCII v ANSI v UTF8 v UTF16

ASCII defines the international standard 7-bit character codes 00 - 7F.
See https://en.wikipedia.org/wiki/ASCII

ANSI is a misnomer; it is meant to define the 8-bit character codes 00 - FF.
The codes 00 - 7F are ASCII, but the codes 80 - FF depend on the locale.
The most popular is Windows Code Page 1252, used on most 'English' PCs.
See https://en.wikipedia.org/wiki/Windows_code_page

UTF8 defines ALL Unicode code points using from 1 to 4 bytes each.
The 1-byte codes 00 - 7F are identical to ASCII.
Windows definitely supports UTF8 encoded filepaths.
See https://en.wikipedia.org/wiki/UTF-8

UTF16 defines ALL Unicode code points using 1 or 2 16-bit words each.
It is incompatible with ASCII, as even the first 128 code points use 16 bits each.
See https://en.wikipedia.org/wiki/UTF-16

I did not think that Windows filepaths used UTF16 encoding but it is not clear.
There are many references to 'Unicode' but that can mean UTF8 or UTF16.
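The byte counts above can be checked with a short pure-Lua sketch that encodes a single code point both ways. The helper names are hypothetical, and the code uses only arithmetic, so it does not depend on Lua 5.3's `utf8` library. For comparison, in CP1252 'ë' (U+00EB) is the single byte EB, while 'Ǯ' (U+01EE) and Cyrillic 'В' (U+0412) have no CP1252 byte at all; U+1F600 is included as a character outside the 16-bit range, which UTF-16 must store as a pair of 16-bit words.

```lua
-- Encode one Unicode code point as a list of UTF-8 bytes (1 to 4 bytes).
local function utf8Bytes(cp)
  if cp < 0x80 then
    return { cp }
  elseif cp < 0x800 then
    return { 0xC0 + math.floor(cp / 64), 0x80 + cp % 64 }
  elseif cp < 0x10000 then
    return { 0xE0 + math.floor(cp / 4096),
             0x80 + math.floor(cp / 64) % 64,
             0x80 + cp % 64 }
  else
    return { 0xF0 + math.floor(cp / 262144),
             0x80 + math.floor(cp / 4096) % 64,
             0x80 + math.floor(cp / 64) % 64,
             0x80 + cp % 64 }
  end
end

-- Encode one code point as UTF-16: a single 16-bit word below U+10000,
-- otherwise a surrogate pair of two 16-bit words.
local function utf16Units(cp)
  if cp < 0x10000 then
    return { cp }
  end
  local v = cp - 0x10000
  return { 0xD800 + math.floor(v / 1024), 0xDC00 + v % 1024 }
end

-- Format a list of numbers as space-separated hex.
local function hex(list, fmt)
  local out = {}
  for i, v in ipairs(list) do
    out[i] = string.format(fmt, v)
  end
  return table.concat(out, ' ')
end

for _, cp in ipairs({ 0x41, 0xEB, 0x1EE, 0x412, 0x1F600 }) do
  print(string.format('U+%04X  UTF-8: %-12s  UTF-16: %s',
    cp, hex(utf8Bytes(cp), '%02X'), hex(utf16Units(cp), '%04X')))
end
```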

2) Library Modules

Lua io and lfs and penlight only support ANSI/CP-1252 encoded filepaths.
I am surprised that with such widespread use no Unicode filepath support exists yet.

I think IUP supports UTF8 encoded filepaths via these global settings:
iup.SetGlobal("UTF8MODE","YES")
iup.SetGlobal("UTF8MODE_FILE","YES")
If they are set to "NO" then IUP uses ANSI/CP-1252.

File System Object (FSO) methods mostly support UTF8 filepaths, but strangely some do not.
e.g. FileExists, FolderExists do, but CopyFile does not.
See https://docs.microsoft.com/en-us/office ... ect-object

fhFileUtils functions mostly support UTF8 filepaths, but some do not.
e.g. fileGetContents & filePutContents do not.
See http://pluginstore.family-historian.co. ... Utils.html

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 17 Dec 2021 12:44

When Microsoft say Unicode, they ALWAYS mean UTF-16.

Character Sets Used in File Names (dated 2021)

Contrast 'As of May 2019, Microsoft reversed its course of only supporting UTF-16 for the Windows API, providing the ability to set UTF-8 as the "code page" for the multi-byte API ' from Wikipedia.

Do we believe Wikipedia or Microsoft? Goodness knows... However, the phrase 'multi-byte API' suggests that it will still only work for _wfopen calls, not fopen calls (which io, IUP and others use for portability -- _wfopen is Windows only).

I'll add that IUP says (about UTF-8 file names):

"Another option is to call:

setlocale(LC_ALL, ".UTF8");
But it will work for fopen only in Visual Studio 2017 or newer Microsoft compilers (setlocale will return NULL on other compilers). fopen will successfully open the file if the filename is an UTF-8 string with special characters. "

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 17 Dec 2021 14:35

Think I need a translation here ;). I get it with ANSI - it’s an obsolescent Western-centric system that’s probably had its day. However, from my very imperfect, elementary understanding, UTF-8 and UTF-16 are essentially alternative coding systems that both support all the million-plus characters of Unicode. Microsoft prefers UTF-16, while most non-MS stuff is UTF-8.

Is it your fear that MS will do something to Windows that means it will no longer support interoperability with UTF-8? I would have thought that very unlikely, or have I misunderstood?

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 17 Dec 2021 15:00

I'm saying that UTF-8 is (for Microsoft) an afterthought (forced by the uptake of UTF-8 everywhere else) and support may not be all there yet.

ANSI isn't western-centric as such, it's just that the code pages most Western developers are familiar with are Western-centric -- but Microsoft want to kill it...

However, the key thing is that ANSI uses a single byte only (although which character the top half of the character set stands for depends on the code page in use), so fopen works, whereas UTF-8 uses 1-4 bytes and UTF-16 uses 2-4 bytes, so both need (I believe) _wfopen (where the w stands for 'wide character'). fopen is a C library function, so it is portable across Unix and Windows (and other places); _wfopen is a Microsoft extension to C and is not portable.

There is a 'UTF8' code page in Windows (now), but it's compiler-specific and there are confusing messages about whether it works with fopen or _wfopen. (I suspect Msoft have kludged fopen.) It can't be a solution for FH plugins because we shouldn't be messing around with users' code pages.

KFN
Famous
Posts: 177
Joined: 20 Jun 2021 01:00
Family Historian: V7

Re: fhSQL library not compatible with ANSI encoding

Post by KFN » 17 Dec 2021 15:41

Windows (as opposed to *nix operating systems) operates natively in UTF-16, so converting between a UTF-16 encoding and the Unicode character set is always part of the Windows API, while converting between a UTF-8 encoding and Unicode may not be available in a specific Windows API.

Thus, for some APIs a conversion (UTF-8 data to UTF-16) may need to occur.

This is my understanding!

tatewise

Re: fhSQL library not compatible with ANSI encoding

Post by tatewise » 17 Dec 2021 15:49

I don't have that fear. If you follow the trail from Helen's link to Character Sets Used in File Names, via Unicode in the Windows API, to Unicode it says several times that Windows supports UTF-16 and UTF-8. However, at the lower levels, Helen raises some concerns.

Nevertheless, I believe a library that supports UTF8 encoded filepaths &/or text content will perform whatever translation is needed between UTF8 and the Windows underlying encoding. Our challenge is to find a set of library functions that offer all the file management features required. If we restrict our focus to FH v7.0 then that challenge is somewhat simpler.

For some time, I've run all my Plugins against a filepath including ë which is valid in both ANSI/CP 1252 and UTF8.
That allows fhConvertANSItoUTF8(...) and fhConvertUTF8toANSI(...) to be applied as necessary in FH v6 and FH v7.

The difficulty arises when a user employs Unicode filepath characters that are invalid in ANSI/CP 1252.
As an experiment, I have just tried using Ǯ in filepaths, which is just such a Unicode character.
File System Object (FSO) methods go a long way to fully supporting such filepaths.
Even the FH API fhSaveTextFile() and fhLoadTextFile() handled such filepaths!

The Code Snippet Directory Tree won't work as it relies on lfs.

But fhFileUtils getFolderContents() uses the File System Object (FSO) method to obtain files from a folder OK.

And then there are the fhSQL problems...

@KFN: The problem that is the focus of this thread is filepath names. Converting file content is not such a problem.
Last edited by tatewise on 17 Dec 2021 18:35, edited 1 time in total.
Reason: Corrected description of getFolderContent() which does work OK.

KFN

Re: fhSQL library not compatible with ANSI encoding

Post by KFN » 17 Dec 2021 16:12

@tatewise, I would suspect that a conversion is required when reading a file path from the index as well!

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 17 Dec 2021 16:12

tatewise wrote:
17 Dec 2021 15:49
If you follow the trail from Helen's link to Character Sets Used in File Names, via Unicode in the Windows API, to Unicode it says several times that Windows supports UTF-16 and UTF-8.
The issue is not whether Windows supports UTF-8/UTF-16; the issue is whether the libraries can and will take advantage of it. If they're coded using fopen to open files, and fopen on Windows doesn't support UTF-8/UTF-16, the libraries have a problem. Windows offers _wfopen as Unicode support, but if the libraries won't use that function for portability reasons, there's still an issue.

I'm going to drop out of this now -- I have an article for the KB partly written which will not delve into the depths of technospeak, but hopefully offer some specific 'recipes' depending on what a plugin author needs to achieve (and also explain the limitations of each recipe).

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 17 Dec 2021 19:59

I've been playing with fhSQL a little more....

If it is presented with a filename that is valid in "ANSI" and UTF but not ASCII (e.g. an accented European character), it must be presented in UTF format.

If it is a pure UTF filename (e.g. Cyrillic), fhSQL won't accept it at all, irrespective of whether it's UTF-8 or UTF-16.... :?

My pragmatic solution for this will be to try to convert the filename to ANSI, and if fhIsConversionLossFlagSet() is set, reject the file and tell the user to change the name.
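The convert-and-reject-on-loss approach can be sketched in plain Lua for testing outside FH. This assumes CP1252 as the "ANSI" code page; `utf8ToCP1252` and `checkFilename` are hypothetical helpers standing in for fhConvertUTF8toANSI(...) and fhIsConversionLossFlagSet(), whose exact behaviour (for example, which substitute character they use) is not reproduced here. The sketch substitutes '?' for unmappable characters, which is an arbitrary choice. It hands back the original UTF-8 name for fhSQL, since accented names must still be presented to fhSQL in UTF-8; the CP1252 copy is what io or lfs would need.

```lua
-- Decode UTF-8 into a list of code points; returns nil if the bytes are
-- malformed (lenient sketch: overlong forms are not rejected).
local function decodeUTF8(s)
  local cps, i, n = {}, 1, #s
  while i <= n do
    local c = s:byte(i)
    local len, cp
    if c < 0x80 then
      len, cp = 1, c
    elseif c >= 0xC2 and c <= 0xDF then
      len, cp = 2, c - 0xC0
    elseif c >= 0xE0 and c <= 0xEF then
      len, cp = 3, c - 0xE0
    elseif c >= 0xF0 and c <= 0xF4 then
      len, cp = 4, c - 0xF0
    else
      return nil
    end
    for k = i + 1, i + len - 1 do
      local b = s:byte(k)                    -- nil if the string ends early
      if not b or b < 0x80 or b > 0xBF then return nil end
      cp = cp * 64 + (b - 0x80)
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Code points that CP1252 stores in bytes 0x80-0x9F.
local CP1252_EXTRAS = {
  [0x20AC] = 0x80, [0x201A] = 0x82, [0x0192] = 0x83, [0x201E] = 0x84,
  [0x2026] = 0x85, [0x2020] = 0x86, [0x2021] = 0x87, [0x02C6] = 0x88,
  [0x2030] = 0x89, [0x0160] = 0x8A, [0x2039] = 0x8B, [0x0152] = 0x8C,
  [0x017D] = 0x8E, [0x2018] = 0x91, [0x2019] = 0x92, [0x201C] = 0x93,
  [0x201D] = 0x94, [0x2022] = 0x95, [0x2013] = 0x96, [0x2014] = 0x97,
  [0x02DC] = 0x98, [0x2122] = 0x99, [0x0161] = 0x9A, [0x203A] = 0x9B,
  [0x0153] = 0x9C, [0x017E] = 0x9E, [0x0178] = 0x9F,
}

-- Returns the CP1252 string and a flag that is true if anything was lost.
local function utf8ToCP1252(s)
  local cps = decodeUTF8(s)
  if not cps then return nil, true end
  local out, lost = {}, false
  for i, cp in ipairs(cps) do
    local byte
    if cp < 0x80 or (cp >= 0xA0 and cp <= 0xFF) then
      byte = cp                              -- same value in CP1252
    else
      byte = CP1252_EXTRAS[cp]
    end
    if not byte then
      byte, lost = 0x3F, true                -- no CP1252 equivalent: use '?'
    end
    out[i] = string.char(byte)
  end
  return table.concat(out), lost
end

-- Accept the name only if it converts without loss. Returns the UTF-8 name
-- (for fhSQL) and its CP1252 copy (for io/lfs), or nil plus a message.
local function checkFilename(utf8Name)
  local ansiName, lost = utf8ToCP1252(utf8Name)
  if lost then
    return nil, 'Please rename "' .. utf8Name .. '" using Western European characters only'
  end
  return utf8Name, ansiName
end

print(checkFilename('Zo\195\171.db'))        -- "Zoë.db": accepted
print(checkFilename('\208\146\208\187.db'))  -- Cyrillic "Вл.db": rejected
```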

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 17 Dec 2021 20:43

Mark, which variety of UTF?

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 17 Dec 2021 21:13

Whichever flavour is produced by this script - both 8 and 16 upset fhSQL

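```lua
require('fhSQL')                         -- loads the fhSQL library used below
local fhfu = require('fhFileUtils')

-- Test filename containing Cyrillic characters that have no "ANSI" equivalent
local FileName = 'C:\\Users\\Mark Draper\\Desktop\\Владимир.rmtree'

-- Create a small text file containing the current date, then read it back.
-- Trailing 16: encoding flavour (both 8 and 16 are reported to upset fhSQL)
local OK = fhfu.createTextFile(FileName, true, true, os.date(), 16)
local S = fhfu.readTextFile(FileName, true, 16)

fhMessageBox(S .. ' retrieved from ' .. FileName)

if not fhfu.fileExists(FileName) then return end

-- Fails for this non-Latin filename; a Latin-script name runs to completion
local database = fhSQL.connectSQLite(FileName)
```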

ColeValleyGirl

Re: fhSQL library not compatible with ANSI encoding

Post by ColeValleyGirl » 17 Dec 2021 21:25

But that file isn't an SQL database? So it shouldn't work with fhSQL anyway.

Mark1834

Re: fhSQL library not compatible with ANSI encoding

Post by Mark1834 » 17 Dec 2021 22:00

Doesn’t matter. fhSQL doesn’t seem to read the contents on connecting, only that it understands the filename. The file doesn’t even have to exist! A non-database file will generate an un-trapped error on the next step and stop plugin execution, so its error handling probably needs sharpening up as well.

Change the filename to Latin script as a control experiment and the plugin runs to completion.