
Error when renaming file with accented characters

Posted: 25 Feb 2021 16:21
by JoopvB
I have a plugin that changes the names of the media linked to a selection of sources to the auto title of the source.

Works very well but not with accented characters. As soon as ë or é etc. are in the title os.rename throws an error.

Correcting it manually is no problem (the filename, media name and auto title all support accented characters).

Any suggestions on how to fix this are very welcome.

Re: Error when renaming file with accented characters

Posted: 25 Feb 2021 16:31
by ColeValleyGirl
See https://stackoverflow.com/questions/541 ... unicode-ch for some ideas.

The winapi library would be the best solution, but it's not been made available by Calico Pie (and when I asked some years ago, they were unwilling to do so). If, however, you're writing for your own personal use you could consider it: https://stackoverflow.com/a/36719757/1943174

Re: Error when renaming file with accented characters

Posted: 25 Feb 2021 16:45
by tatewise
Yes, that is a tricky issue and is only likely to get worse as more accented characters get used in filenames.

I assume that your Plugin File > Encoding is UTF-8
The problem is that Lua file handling tools like os.rename and the lfs library only understand ANSI.

So, use the fhConvertUTF8toANSI(...) function on the filename before supplying to the os.rename function.

If the names include UTF-8 characters outside the ANSI set then Helen's solution is an option and another is:
https://github.com/cloudwu/luawinfile luawinfile has UTF-8 filename alternatives for lfs and Lua functions.
I raised that with CP last June/July under Unicode in Folder/File Paths in Plugins log #138290.
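
A minimal sketch of the conversion step suggested above, assuming the plugin holds both paths as UTF-8 strings. The names renameUtf8 and toAnsi are hypothetical and only for illustration; inside a Family Historian plugin the converter passed in would be fhConvertUTF8toANSI. The converter is a parameter so that the sketch also runs in plain Lua outside FH.

```lua
-- Hypothetical helper: rename a file whose paths are held as UTF-8 strings.
-- toAnsi is whatever UTF-8 to ANSI converter is available; in an FH plugin
-- that would be fhConvertUTF8toANSI (assumption: both paths are UTF-8).
local function renameUtf8(oldPath, newPath, toAnsi)
  -- os.rename returns true on success, or nil plus an error message.
  local ok, err = os.rename(toAnsi(oldPath), toAnsi(newPath))
  if not ok then
    return false, err
  end
  return true
end

-- Inside a plugin this might be called as:
-- local ok, err = renameUtf8(oldPath, newPath, fhConvertUTF8toANSI)
```

Only the copies handed to os.rename are converted; the original UTF-8 strings are left untouched for anything written back to FH records.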

Re: Error when renaming file with accented characters

Posted: 26 Feb 2021 17:22
by JoopvB
Indeed tricky.

I now have os.rename working with Helen's solution (https://stackoverflow.com/a/36719757/1943174) as well as Mike's (fhConvertUTF8toANSI). As far as I can see both conversion results are the same.

The tricky thing is that I need to store the filename in the media record without the ANSI conversion, and when using it to test whether the file exists (io.open) I need to convert it before the test. Saving the ANSI filename in the media record doesn't work and will also break the link. I don't really understand why. But it works and that's something.
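
A rough sketch of that pattern: the record keeps the UTF-8 name, and the conversion happens only at the moment a Lua file function is called. fileExists and toAnsi are hypothetical names; in a plugin the converter would be fhConvertUTF8toANSI.

```lua
-- Hypothetical helper: test for a file using a UTF-8 path taken from a record.
-- The path is converted only for the io.open call; the caller keeps the
-- UTF-8 original to store in the media record.
local function fileExists(utf8Path, toAnsi)
  local handle = io.open(toAnsi(utf8Path), "rb")
  if handle then
    handle:close()
    return true
  end
  return false
end

-- Inside a plugin this might be called as:
-- if fileExists(mediaPath, fhConvertUTF8toANSI) then ... end
```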

Thanks Helen and Mike.

Re: Error when renaming file with accented characters

Posted: 26 Feb 2021 18:20
by tatewise
The explanation is entirely down to the character encoding.
The binary code used to represent characters depends on the character encoding.

ASCII encoding defines 7-bit numbers for the popular characters 0-9, A-Z, a-z, and most symbols on a keyboard.
Most other encodings include these initial 128 characters.

With ANSI encoding, the character encoding increases to an 8-bit number, i.e. one byte.
The extra 128 characters provide some accented letters and extra symbols.
However, different countries have variants that substitute different characters.

Unicode with its UTF-8 and UTF-16 encodings is a worldwide standard for characters from every language and a vast array of symbols but requires more bits to represent them.

So the binary codes for ë and é in ANSI are different from their codes in UTF-8.

Internally in all its records including Media FH uses UTF-8.
Lua functions such as os.rename require ANSI codes.
So in Plugins, you must convert from one to the other.

But even that will fail if the Media record has filename UTF-8 characters that are not supported by ANSI because no conversion is possible and a ? will get substituted.
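
To make the byte differences concrete, here is a small plain-Lua sketch. In UTF-8 ë is the two bytes 195 171, while in code page 1252 it is the single byte 235. The function utf8ToCp1252Subset is a hypothetical illustration, not how fhConvertUTF8toANSI works: it assumes valid UTF-8 input, handles only U+0000 to U+007F and U+00A0 to U+00FF (where code page 1252 matches the Unicode code point), and substitutes ? for everything else.

```lua
-- Hypothetical illustration of UTF-8 to "ANSI" conversion for a subset only.
-- Assumes valid UTF-8 input. Characters outside the handled ranges become "?".
local function utf8ToCp1252Subset(s)
  local out = {}
  local i = 1
  while i <= #s do
    local b = s:byte(i)
    if b < 0x80 then
      -- ASCII: same single byte in both encodings.
      out[#out + 1] = string.char(b)
      i = i + 1
    elseif (b == 0xC2 or b == 0xC3) and i < #s then
      -- Two-byte sequence for U+0080..U+00FF: rebuild the code point.
      local code = (b - 0xC0) * 64 + (s:byte(i + 1) - 0x80)
      out[#out + 1] = code >= 0xA0 and string.char(code) or "?"
      i = i + 2
    else
      -- Anything else: emit "?" and skip the rest of the sequence.
      out[#out + 1] = "?"
      i = i + 1
      while i <= #s and s:byte(i) >= 0x80 and s:byte(i) < 0xC0 do
        i = i + 1
      end
    end
  end
  return table.concat(out)
end

print(("\195\171"):byte(1, -1))                      -- UTF-8 bytes of ë
print(utf8ToCp1252Subset("\195\171"):byte(1, -1))    -- single byte after conversion
```

A name such as Łódź shows the failure case: Ł and ź have no place in this subset, so only ó survives and the other two letters come out as ?.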

Re: Error when renaming file with accented characters

Posted: 26 Feb 2021 18:35
by JoopvB
Clear explanation.

Thanks Mike!

Re: Error when renaming file with accented characters

Posted: 26 Feb 2021 19:28
by tatewise
I have been looking at http://stevedonovan.github.io/winapi/api.html.

Exactly how did you implement that winapi solution and what functions did you utilise?

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 11:05
by JoopvB
Mike,

I didn't use the api, but used the conversion table in the stackoverflow example starting with: utf8_to_cp1252 = (
function(cp1252_description)
....

That gives me the same results as using fhConvertUTF8toANSI.

Yesterday I published the updated version (1.2) of Rename Selected Source Media in the plugin store. Both approaches are in the plugin code, but it uses fhConvertUTF8toANSI; the other one is left in there for reference only.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 13:17
by tatewise
Understood, but your Plugin needs to warn users that where any Unicode (UTF-8) character in the Source Title does not translate to ANSI Code Page 1252 then a question mark (?) will be substituted in the filename.

If you are interested in eliminating that exception then request CP to support either the luawinfile or winapi library.
My preference is for the luawinfile library that provides substitutes for the Lua os library and lfs module.
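
One possible way to detect when that warning is needed, sketched in plain Lua: compare the number of question marks before and after conversion. The function names are hypothetical, and the check assumes the converter marks every untranslatable character with exactly one ?.

```lua
-- Hypothetical helpers for spotting substituted characters after conversion.
local function countQuestionMarks(s)
  -- "%?" matches a literal question mark; gsub's second result is the count.
  local _, n = s:gsub("%?", "")
  return n
end

-- Returns how many "?" were introduced by the conversion.
local function substitutedCount(utf8Name, ansiName)
  return countQuestionMarks(ansiName) - countQuestionMarks(utf8Name)
end

-- Inside a plugin this might be used as:
-- local ansiName = fhConvertUTF8toANSI(title)
-- if substitutedCount(title, ansiName) > 0 then (warn the user) end
```

Counting the difference rather than just searching for ? avoids a false alarm when the Source Title already contains a question mark.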

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 13:45
by ColeValleyGirl
Mike, luawinfile isn't available in Luarocks, doesn't document its dependencies, and there only appears to be a C module available. Have you found a dll anywhere, and documented dependencies? Without those, it's a non-starter I believe. If you've used it, do you have a dll and where did you get it from?

winapi is in Luarocks (so can be compiled) and the package documents its dependencies. It has a wider set of functions, including UTF-16 support, as well as registry access functionality, process and window handling, and some others.

I'd be willing to put in the work to get winapi supported, but there's no starting point for luawinfile... And just asking for a library without doing the prep work is not going to get any priority attention.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 16:01
by tatewise
It seems luawinfile has sunk without trace. The links I had no longer work. I never had any dll. So winapi is the solution.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 17:21
by JoopvB
@Mike I'll put the warning in the plugin and also check for when it happens.

@Helen/@Mike Can I help to get winapi supported by CP and if so how?

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 17:35
by ColeValleyGirl
JoopvB, I'll get the library compiled and post it here for testing, if you (and maybe Mike) would be willing to test it. Once we know it works, I can raise a request with Calico Pie and mention you (and Mike) as supporting it, and we'll see what they say.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 17:42
by ColeValleyGirl
I can't attach the file because it's a dll, but here's a zip:
winapi.zip
It needs to be dropped into C:\Program Files (x86)\Family Historian\Program\Lua and required by


require('winapi')
Documentation at http://stevedonovan.github.io/winapi/api.html

UTF8 filenames: refer to the short_path function; there's an example piece of code using it.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 17:58
by JoopvB
Can't get it from Dropbox either; zip will do.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 18:08
by ColeValleyGirl
It hasn't finished synching. Try the zip attached.
winapi.zip

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 19:17
by tatewise
I have extracted and copied winapi.dll to
C:\Program Files (x86)\Family Historian\Program\Lua\winapi.dll Created: 27 Feb 2021, 17:37:44 Size: 41.0 KB (41,984 bytes)

It has the same Security Permissions as other similar dll.

But require('winapi') says:
An error has occurred - plugin failed to complete
error loading module 'winapi' from file 'C:\Program Files (x86)\Family Historian\Program\Lua\winapi.dll':
The specified module could not be found.

Does it need something else like winapi.lua or a winapi folder?

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 19:54
by Ron Melby
what size is your winapi.dll?

it needs static linking with some other dlls not dynamic linking

some I remember are:
msvcrt.dll (must be the same version)
libgcc_s_dw2.dll
lua5.1-32.dll

it has been so long since I read about it that it might need some googling.

Re: Error when renaming file with accented characters

Posted: 27 Feb 2021 20:12
by ColeValleyGirl
I've had it working in the past. I'll have a look at the problem tomorrow

Re: Error when renaming file with accented characters

Posted: 28 Feb 2021 15:08
by ColeValleyGirl
OK, or rather, not OK.

It doesn't look as if the author of winapi (Steve Donovan) has updated it for Lua 5.3.

Steve is also the original author of the Penlight libraries, whose maintenance seems to have been taken over by Thijs Schreijer/Tieske, so I'm not confident that winapi will be updated (last updates were in 2014, before 5.3 was released). Steve's attention seems to have transferred to Rust (not the metallic kind).

There are a load of forks for winapi, but none by names I recognise or trust. I can also find a binary (dll) which purports to be Lua 5.3 compatible, but that doesn't work either.

We may be at a dead end -- unless we have a C programmer amongst us who is willing to fork the original code and update it for 5.3...

Which is a pity, because it's an incredibly useful library.

Re: Error when renaming file with accented characters

Posted: 28 Feb 2021 17:06
by JoopvB
Helen, thanks for looking into it. Would have been nice, but fhConvertUTF8toANSI will do for most accented characters.

I think I'll change the plugin to replace "invalid" characters with '~' and report it in the result set.

Re: Error when renaming file with accented characters

Posted: 28 Feb 2021 17:11
by tatewise
It seems that being able to support UTF-8 filenames in Lua is fated to fail. That is a shame and surprising.

Am I correct in thinking the winapi.dll should work with FH V6.2?

Replacing ? with ~ is necessary because ? is not an allowed character in Windows filenames.
Presumably, you are doing the same for all the other disallowed characters \ / : * " < > |

Re: Error when renaming file with accented characters

Posted: 28 Feb 2021 17:43
by ColeValleyGirl
Winapi does work with FH 6.2, or at least it did some time ago. I used it to access UTF-16 files and also to do window handling when FH hid popup windows behind the main program window. Only for private use, of course.

Re: Error when renaming file with accented characters

Posted: 01 Mar 2021 10:33
by JoopvB
@Mike
When using fhConvertUTF8toANSI it converts the characters in the range (0-255) as expected and the plugin reports the conversion. For characters outside this range fhConvertUTF8toANSI will convert to something as close as possible (according to the help; I tested it and it's doing a nice job). The plugin reports the conversion and I've put in a warning as well.

Re: Error when renaming file with accented characters

Posted: 01 Mar 2021 12:09
by tatewise
I presume by "characters in the range (0-255)" you mean the UTF-8 characters derived from ANSI codes 0 - 255.
Only the codes 0 - 127 represent the same characters in both UTF-8 and ANSI.
The ANSI characters with codes 128 - 255 have completely different codes in UTF-8.

Yes, the fhConvertUTF8toANSI function does its best to convert UTF-8 characters to similar ANSI characters.
Except for exact ANSI equivalents, all forms of accented A become plain A, all forms of accented b become plain b, and so on.
But there are plenty of UTF-8 characters where there is no such obvious mapping and they do become ?.
It is necessary to convert all disallowed filename characters \ / : * ? " < > | to a valid character such as ~.
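
A minimal plain-Lua sketch of that last step. sanitizeFilename is a hypothetical name; the set of characters is the one listed above, and ~ is the replacement proposed in this thread.

```lua
-- Hypothetical helper: replace characters Windows does not allow in file names.
-- Apply it to the file name only, not to a full path, because \ and : are
-- replaced as well.
local function sanitizeFilename(name)
  -- Inside a Lua pattern set [...] these characters are matched literally.
  local cleaned = name:gsub('[\\/:*?"<>|]', "~")
  return cleaned
end

print(sanitizeFilename('Smith: "Who?" <1900>'))  -- Smith~ ~Who~~ ~1900~
```

Running this after the ANSI conversion also catches any ? that the conversion itself introduced.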