* fhSQL library not compatible with ANSI encoding
- Mark1834
- Megastar
- Posts: 2147
- Joined: 27 Oct 2017 19:33
- Family Historian: V7
- Location: South Cheshire, UK
fhSQL library not compatible with ANSI encoding
The fhSQL library introduced with FH 7.0.8 appears to require a filename in UTF format. If the debugger is set to ANSI format, or the filename string is formatted as ANSI to be compatible with basic Lua file I/O, a call to the library with a filename containing extended characters will crash the plugin unless the filename is converted to UTF first. A non-existent filename does not crash the library, only a wrongly encoded one.
I've raised a ticket, but I'll mention it here as well in case it's not fixed quickly. IMO, this behaviour should at least be documented, or even better, trapped. For comparison, the fhLoadTextFile(...) function requires UTF format, but simply returns nil if an ANSI-formatted string is presented.
Mark Draper
- tatewise
- Megastar
- Posts: 27080
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Re: fhSQL library not compatible with ANSI encoding
This is another instance of the issues discussed in Plugin Warning - Unicode Ahead! (17926).
Depending on which libraries are used, my plugins often invoke fhConvertANSItoUTF8(...) and fhConvertUTF8toANSI(...) on the same filepath as necessary.
I agree fhSQL should not crash.
Perhaps the Lua References and Library Modules should identify which libraries support UTF8 and which only support ANSI, plus any 'tricks' needed such as IUP needs:
iup.SetGlobal("UTF8MODE","YES")
iup.SetGlobal("UTF8MODE_FILE","YES")
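A minimal sketch of that conversion step, assuming the plugin holds the filepath in ANSI form for io/lfs and that require('fhSQL') has already been called. The helper names isPlainAscii and connectFromAnsiPath are hypothetical; only fhConvertANSItoUTF8 and fhSQL.connectSQLite come from this thread.

```lua
-- True when every byte is 7-bit ASCII, so the string is byte-identical
-- in ANSI and UTF-8 and needs no conversion.
function isPlainAscii(s)
  return not s:find("[\128-\255]")
end

-- Hypothetical helper: open an SQLite file whose path is held in ANSI form.
-- Assumes the global fhSQL table exists (require('fhSQL') was called earlier).
function connectFromAnsiPath(ansiPath)
  local utf8Path = ansiPath
  if not isPlainAscii(ansiPath) then
    -- Only extended characters need converting before fhSQL sees the path
    utf8Path = fhConvertANSItoUTF8(ansiPath)
  end
  return fhSQL.connectSQLite(utf8Path)
end
```

The ANSI copy of the path stays available for io and lfs, while fhSQL only ever sees the UTF-8 version.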
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
Yes - improving the KB documentation of Unicode handling is the WIP page alluded to in the new Plugins Introduction.
The CP response to my ticket was simply "Thank you for your feedback", with not even a "which will be passed to the developers for review", so it doesn't sound like it will be fixed anytime soon...
Mark Draper
- ColeValleyGirl
- Megastar
- Posts: 4853
- Joined: 28 Dec 2005 22:02
- Family Historian: V7
- Location: Cirencester, Gloucestershire
- Contact:
Re: fhSQL library not compatible with ANSI encoding
tatewise wrote: ↑13 Dec 2021 16:29
Perhaps the Lua References and Library Modules should identify which libraries support UTF8 and which only support ANSI, plus any 'tricks' needed such as IUP needs:
I have this on my to-do list, Mike (as Mark already knows).
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
“Bottom line” on this is that it is unfixable, primarily due to Microsoft gradually deprecating and withdrawing support for what is colloquially (but inaccurately) known as ANSI encoding. Any filename containing characters other than the basic numbers, punctuation marks and plain Latin letters of ASCII has to be presented to fhSQL in UTF8 format.
The help documentation will be updated to reflect this in due course (with thanks to Helen for her insights into the mysterious worlds of encoding, file systems, etc...).
Mark Draper
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
Even better in Unicode (Microsoft speak for utf16).
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
How practical is that? AFAIK, FH does not provide tools for interconverting between "ANSI", which many authors will still need for general Lua compatibility (io, string, os, lfs etc), and UTF16. IUP refers specifically to UTF8 mode, not UTF16/Unicode.
I doubt many plugin authors would have the necessary background to fully embrace an all-Unicode model. Unlike Windows, FH is not a global product, so a mix-and-match “ANSI”/UTF8 is probably the more pragmatic solution for its overwhelmingly Western market.
Mark Draper
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
Tell that to Microsoft.
Iup is not optimised for Windows; Windows is almost an afterthought.
If you pass a filename untransformed from iup to the FH libraries it will work. Utf8 may also work... But it may not.
Problems come when you need to manipulate/slice and dice a utf16 file name and/or pass it to one of the built-in Lua libraries that only handle ANSI. Mike T has some code for handling utf16 translations... But that won't help with filenames that are utf16 encoded if you want the file to be recognised.
As you know I still have a lot of testing to do with utf8.
Helen Wright
ColeValleyGirl's family history
- tatewise
Re: fhSQL library not compatible with ANSI encoding
May I summarise my experience of handling Unicode filepaths with various libraries.
This is what prompted my original Plugin Warning - Unicode Ahead! (17926) posting.
1) ASCII v ANSI v UTF8 v UTF16
ASCII defines the international standard 7-bit character codes 00 - 7F.
See https://en.wikipedia.org/wiki/ASCII
ANSI is a misnomer and meant to define the 8-bit character codes 00 - FF.
The codes 00 - 7F are ASCII but the codes 80 - FF depend on the locale.
The most popular is Windows Code Page 1252, used on most 'English' PCs.
See https://en.wikipedia.org/wiki/Windows_code_page
UTF8 defines ALL Unicode code points using from 1 to 4 bytes each.
The 1-byte codes 00 - 7F are identical to ASCII.
Windows definitely supports UTF8 encoded filepaths.
See https://en.wikipedia.org/wiki/UTF-8
UTF16 defines ALL Unicode code points using 1 or 2 16-bit words each.
It is incompatible with ASCII as even the first 128 code points use 16-bits each.
See https://en.wikipedia.org/wiki/UTF-16
I did not think that Windows filepaths used UTF16 encoding but it is not clear.
There are many references to 'Unicode' but that can mean UTF8 or UTF16.
2) Library Modules
Lua io and lfs and penlight only support ANSI/CP-1252 encoded filepaths.
I am surprised that with such widespread use no Unicode filepath support exists yet.
I think IUP supports UTF8 encoded filepaths via these global settings:
iup.SetGlobal("UTF8MODE","YES")
iup.SetGlobal("UTF8MODE_FILE","YES")
If they are set to "NO" then IUP uses ANSI/CP-1252.
File System Object (FSO) methods mostly support UTF8 filepaths, but strangely some do not.
e.g. FileExists, FolderExists do, but CopyFile does not.
See https://docs.microsoft.com/en-us/office ... ect-object
fhFileUtils functions mostly support UTF8 filepaths, but some do not.
e.g. fileGetContents & filePutContents do not.
See http://pluginstore.family-historian.co. ... Utils.html
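To make the byte-level differences in 1) concrete, here is a small plain-Lua sketch that needs no FH functions. It writes ë and Ǯ as byte strings and includes a simplified UTF-8 check. isValidUtf8 is a hypothetical helper: it does not reject every malformed sequence (overlong forms, surrogates), and a CP-1252 string can occasionally pass by coincidence, so treat the result as a hint rather than proof.

```lua
-- The same characters in different encodings, as decimal byte escapes.
e_ansi   = "\235"      -- ë in CP-1252: one byte (0xEB)
e_utf8   = "\195\171"  -- ë in UTF-8: two bytes (0xC3 0xAB)
ezh_utf8 = "\199\174"  -- Ǯ (U+01EE) in UTF-8; it has no CP-1252 equivalent

-- Simplified check that s is structurally valid UTF-8:
-- each lead byte must be followed by the right number of 0x80-0xBF bytes.
function isValidUtf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = s:byte(i)
    local extra
    if b < 0x80 then extra = 0                       -- ASCII
    elseif b >= 0xC2 and b <= 0xDF then extra = 1    -- 2-byte sequence
    elseif b >= 0xE0 and b <= 0xEF then extra = 2    -- 3-byte sequence
    elseif b >= 0xF0 and b <= 0xF4 then extra = 3    -- 4-byte sequence
    else return false end                            -- stray continuation or invalid lead
    for j = i + 1, i + extra do
      local c = s:byte(j)
      if not c or c < 0x80 or c > 0xBF then return false end
    end
    i = i + extra + 1
  end
  return true
end
```

A lone CP-1252 ë fails the check because 0xEB looks like the start of a 3-byte UTF-8 sequence with nothing after it, which is the kind of mismatch a UTF-8 library trips over.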
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
When Microsoft say Unicode, they ALWAYS mean UTF-16.
Character Sets Used in File Names (dated 2021)
Contrast 'As of May 2019, Microsoft reversed its course of only supporting UTF-16 for the Windows API, providing the ability to set UTF-8 as the "code page" for the multi-byte API ' from Wikipedia.
Do we believe Wikipedia or Microsoft? Goodness knows... However, the phrase 'multi-byte API' suggests that it will still only work for _wfopen calls, not fopen calls (which io, IUP and others use for portability -- _wfopen is Windows only).
I'll add that IUP says (about UTF-8 file names):
"Another option is to call:
setlocale(LC_ALL, ".UTF8");
But it will work for fopen only in Visual Studio 2017 or newer Microsoft compilers (setlocale will return NULL on other compilers). fopen will successfully open the file if the filename is an UTF-8 string with special characters. "
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
Think I need a translation here. I get it with ANSI - it’s an obsolescent Western-centric system that’s probably had its day. However, from my very imperfect elementary understanding, UTF-8 and UTF-16 are essentially alternate coding systems that both support all the million plus characters of Unicode. Microsoft prefers UTF-16, while most non-MS stuff is UTF-8.
Is it your fear that MS will do something to Windows that means it will no longer support interoperability with UTF-8? I would have thought that very unlikely, or have I misunderstood?
Mark Draper
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
I'm saying that UTF-8 is (for Microsoft) an afterthought (forced by the uptake of UTF-8 everywhere else) and support may not be all there yet.
ANSI isn't western-centric as such, it's just that the code pages most Western developers are familiar with are Western-centric -- but Microsoft want to kill it...
However, the key thing is that ANSI uses a single byte only (although which character the top half of the character set stands for depends on the code page in use), so fopen works; however, UTF-8 uses 1-4 bytes and UTF-16 uses 2-4 bytes, so both need (I believe) _wfopen (where the w stands for 'wide character'). fopen is a C library function, so is portable across Unix and Windows (and other places); _wfopen is a Microsoft extension to C and is not portable.
There is a 'UTF8' code page in Windows (now) but it's compiler specific and there are confusing messages about whether it works with fopen or _wfopen. (I suspect Msoft have kludged fopen). It can't be a solution for FH plugins because we shouldn't be messing around with users' code pages.
Helen Wright
ColeValleyGirl's family history
Re: fhSQL library not compatible with ANSI encoding
Windows (as opposed to *nix operating systems) operates natively in UTF-16, so converting a UTF-16 encoding to the Unicode character set (and vice versa) is always part of the Windows API, while converting a UTF-8 encoding to Unicode (and vice versa) may not be available in a specific Windows API.
Thus, for some APIs a conversion (UTF-8 data to UTF-16) may need to occur.
This is my understanding!
- tatewise
Re: fhSQL library not compatible with ANSI encoding
I don't have that fear. If you follow the trail from Helen's link to Character Sets Used in File Names, via Unicode in the Windows API, to Unicode it says several times that Windows supports UTF-16 and UTF-8. However, at the lower levels, Helen raises some concerns.
Nevertheless, I believe a library that supports UTF8 encoded filepaths &/or text content will perform whatever translation is needed between UTF8 and the Windows underlying encoding. Our challenge is to find a set of library functions that offer all the file management features required. If we restrict our focus to FH v7.0 then that challenge is somewhat simpler.
For some time, I've run all my Plugins against a filepath including ë which is valid in both ANSI/CP 1252 and UTF8.
That allows fhConvertANSItoUTF8(...) and fhConvertUTF8toANSI(...) to be applied as necessary in FH v6 and FH v7.
The difficulty arises when a user employs Unicode filepath characters that are invalid in ANSI/CP 1252.
As an experiment, I have just tried using Ǯ in filepaths, which is just such a Unicode character.
File System Object (FSO) methods go a long way to fully supporting such filepaths.
Even the FH API fhSaveTextFile() and fhLoadTextFile() handled such filepaths!
The Code Snippet Directory Tree won't work as it relies on lfs.
But fhFileUtils getFolderContents() uses the File System Object (FSO) method to obtain files from a folder OK.
And then there are the fhSQL problems...
@KFN: The problem that is the focus of this thread is filepath names. Converting file content is not such a problem.
Last edited by tatewise on 17 Dec 2021 18:35, edited 1 time in total.
Reason: Corrected description of getFolderContent() which does work OK.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: fhSQL library not compatible with ANSI encoding
@tatewise, I would suspect that a conversion is required when reading a file path from the index as well!
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
The issue is not whether Windows supports UTF-8/UTF-16; the issue is whether the libraries can and will take advantage of it. If they're coded using fopen to open files, and fopen on Windows doesn't support UTF-8/UTF-16, the libraries have a problem. Windows offers _wfopen as Unicode support, but if the libraries won't use that function for portability reasons, there's still an issue.
I'm going to drop out of this now -- I have an article for the KB partly written which will not delve into the depths of technospeak, but hopefully offer some specific 'recipes' depending on what a plugin author needs to achieve (and also explain the limitations of each recipe).
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
I've been playing with fhSQL a little more....
If it is presented with a filename that is valid in "ANSI" and UTF but not ASCII (e.g. an accented European character), it must be presented in UTF format.
If it is a pure UTF filename (e.g. Cyrillic), fhSQL won't accept it at all, irrespective of whether it's UTF-8 or UTF-16....
My pragmatic solution for this will be to try to convert the filename to ANSI, and if fhIsConversionLossFlagSet() is set, reject the file and tell the user to change the name.
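A rough sketch of that check, assuming the filename arrives in UTF-8 (for example from IUP in UTF8MODE). checkPathForFhSQL is a hypothetical name, and I am assuming fhSetConversionLossFlag(false) is the way to clear the flag before converting, so check the plugin API help before relying on it.

```lua
-- Hypothetical helper: decide whether a UTF-8 path is safe to hand to fhSQL.
-- Returns utf8Path, ansiPath when every character survives conversion to ANSI,
-- or nil plus a message for the user when it does not.
function checkPathForFhSQL(utf8Path)
  fhSetConversionLossFlag(false)              -- assumption: clears any earlier loss
  local ansiPath = fhConvertUTF8toANSI(utf8Path)
  if fhIsConversionLossFlagSet() then
    return nil, "Filename contains characters that cannot be used here; " ..
                "please rename the file: " .. utf8Path
  end
  return utf8Path, ansiPath
end
```

On success the UTF-8 form goes to fhSQL and the ANSI form remains available for io or lfs; on failure the message can be shown with fhMessageBox.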
Mark Draper
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
Mark, which variety of utf?
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
Whichever flavour is produced by this script - both 8 and 16 upset fhSQL
Code:
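-- Test: can fhSQL open a file whose name contains Cyrillic characters?
require('fhSQL')
fhfu = require('fhFileUtils')
FileName = 'C:\\Users\\Mark Draper\\Desktop\\Владимир.rmtree'
-- Create the file with the current date as its content, then read it back
OK = fhfu.createTextFile(FileName, true, true, os.date(), 16)
S = fhfu.readTextFile(FileName, true, 16)
fhMessageBox(S .. ' retrieved from ' .. FileName)
-- Stop here if the file was not created
if not fhfu.fileExists(FileName) then return end
-- This is the call that upsets fhSQL with the Cyrillic filename
database = fhSQL.connectSQLite(FileName)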
Mark Draper
- ColeValleyGirl
Re: fhSQL library not compatible with ANSI encoding
But that file isn't a sql database? So it shouldn't work with fhsql anyway.
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: fhSQL library not compatible with ANSI encoding
Doesn’t matter. fhSQL doesn’t seem to read the contents on connecting, only that it understands the filename. The file doesn’t even have to exist! A non-database file will generate an un-trapped error on the next step and stop plugin execution, so its error handling probably needs sharpening up as well.
Change the filename to Latin script as a control experiment and the plugin runs to completion.
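Until the library's own error handling improves, a plugin could guard its fhSQL calls with pcall. This is only a sketch: try is a hypothetical helper, and it assumes the failure surfaces as an ordinary Lua error. If the library crashes at a lower level, pcall will not catch it.

```lua
-- Hypothetical wrapper: call fn with the given arguments and turn a raised
-- Lua error into a (nil, message) result instead of stopping the plugin.
-- Only the first return value of fn is passed back.
function try(fn, ...)
  local ok, result = pcall(fn, ...)
  if ok then
    return result
  end
  return nil, tostring(result)
end

-- Intended use inside a plugin (not run here):
--   local database, err = try(fhSQL.connectSQLite, FileName)
--   if not database then fhMessageBox('Cannot open database: ' .. err) end
```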
Mark Draper