* Problem with utf8 library

tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Problem with utf8 library

Post by tatewise »

The KB wording is fine.
Neither the Lua 5.3 nor the Lua 5.4 Compatibility section mentions %z or \0, so presumably both %z and \0 are supported, which kicks the can a long way down the road. Anyway, users can always redefine utf8.charpattern as they wish.
( I was not entirely happy with Stepan's 'pragmatic' solution, but it will work for a long time. )

What do you think about getting the utf8lua.lua initialisation file to map all utf8 functions to the string library?
Then users do not need to worry about having to use utf8 functions such as utf8.upper(text) or utf8.len(text) because the string functions text:upper() or text:len() will work with any text strings and any existing scripts need not change.
e.g.

Code:

-- utf8lua.lua initialisation file
utf8 = require(".utf8")
require("utf8data")  -- As per utf8 library README.md Configuration: for lower and upper functions
utf8.config = { conversion = { uc_lc = utf8_uc_lc; lc_uc = utf8_lc_uc; } }
utf8:init()
for k,v in pairs(utf8) do string[k] = v end -- Map utf8 onto string for lower, upper, len, gsub, etc.
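
A minimal sketch of what that last mapping line does to method calls such as text:len(). The fake_utf8 table is a hypothetical stand-in written for this illustration, not Stepan's library, and it only implements len. The point is that text:len() looks up string.len at call time, so overwriting the entry changes every caller.

```lua
-- Hypothetical stand-in for the utf8 library (assumption: only len is sketched).
local fake_utf8 = {}

-- Count characters by counting bytes that are not UTF-8 continuation bytes (10xxxxxx).
function fake_utf8.len(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then count = count + 1 end
  end
  return count
end

local text = "caf\195\169"        -- "café" in UTF-8: 5 bytes, 4 characters
local byte_len = text:len()       -- native string.len counts bytes

local native_len = string.len     -- remember the native function
for k, v in pairs(fake_utf8) do string[k] = v end  -- same loop as in utf8lua.lua

local char_len = text:len()       -- now dispatches to fake_utf8.len

string.len = native_len           -- put the native function back
```

With this stand-in, byte_len is 5 and char_len is 4 for the same string, which is the behaviour change the mapping is meant to provide.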
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
ColeValleyGirl
Megastar
Posts: 5465
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire
Contact:

Re: Problem with utf8 library

Post by ColeValleyGirl »

Interesting -- let me think.

Post by ColeValleyGirl »

I'm not convinced by mapping all utf8 functions to the string library -- I'd rather make it an explicit option by documenting what authors need to do if they want it. I'm a firm believer in making it obvious what is going on in a plugin.

Post by tatewise »

I'm happy with whatever you choose to do, but users are not forced to use the require("utf8lua") that I have just introduced.
They can pick whichever parts of utf8lua.lua they want and use those in place of require("utf8lua").
i.e.
They could use different upper/lower case translation tables instead of utf8data.lua.
They could map any or none of the utf8 functions onto string functions.

I just thought that the majority of authors would want the 900+ upper/lower case translations and would want all string functions to support UTF-8 character strings. Why would they not?

The KB Lua References and Library Modules would explain exactly what require("utf8lua") does.

Maybe we could get feedback from plugin authors here?
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Problem with utf8 library

Post by Mark1834 »

I can't really comment on the "how" it is done, as I'm afraid that I have little enthusiasm for that level of detail, but as a user my plea would be to keep it simple please. For example, I would not have written my recent Change Specific Fact plugin if fhFileUtils had not been available for easy reading of UTF-16 fact definition files. If it was something I needed myself and that library was not available, I would have copied the files to simple ANSI versions and used those. My own origins are boringly UK-based, so I have little need for extended character sets for my own use.

Personally, I think maintaining FH5/6 compatibility is a relatively low priority. If there is a significantly better (i.e. easier to use and understand) way to do things in FH7, I would always take that route. FH is not expensive, and has relatively infrequent major upgrades. If that means users who haven't or don't want to upgrade miss out on new plugin features, so be it. Their old plugins continue to work in FH5/6, so they haven't lost anything.
Mark Draper

Post by tatewise »

This utf8 library is written by Stepan and is claimed to offer upper & lower case conversion with utf8data.lua for Lua 5.1 to 5.3.
So most of this thread has been about the changes we have persuaded Stepan to make in order to satisfy that claim.
The KB explains how to achieve Lua 5.1 and Lua 5.3 compatibility using compat53 and utf8, but that is not relevant to the latest point, for which we would like author feedback.

The utf8 README file has a few snippets that are relevant to our probable use of utf8.
One approach is to repeat those snippets in the KB to allow the utf8 library to be initialised by authors.
The alternative is to add our own utf8lua.lua script to the library such that require("utf8lua") optionally does it all.
The snippets in README are:
Example: To invoke utf8 and map all its functions onto the string library functions:

Code:

local utf8 = require('.utf8'):init()
for k,v in pairs(utf8) do
  string[k] = v
end
Configuration: For lower and upper functions to work with utf8data.lua:

Code:

local utf8 = require('.utf8')
utf8.config = {
  conversion = {
    uc_lc = utf8_uc_lc,
    lc_uc = utf8_lc_uc
  },
}
utf8:init()
utf8lua.lua can combine those into one small script as shown yesterday.
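
To illustrate what a lower-to-upper conversion table adds over the native string.upper, here is a minimal sketch. The one-entry lc_uc table and the upper_utf8 helper are assumptions made for this example only; the real utf8data.lua tables are far larger, and Stepan's library applies them internally in its own way.

```lua
-- Hypothetical one-entry conversion table: UTF-8 bytes of "é" -> UTF-8 bytes of "É".
local lc_uc = { ["\195\169"] = "\195\137" }

-- Walk the string one UTF-8 character at a time: a lead byte (anything that is
-- not a continuation byte) followed by zero or more continuation bytes.
local function upper_utf8(s)
  return (s:gsub("[^\128-\191][\128-\191]*", function(ch)
    if lc_uc[ch] then return lc_uc[ch] end   -- table-driven conversion
    if #ch == 1 then return ch:upper() end   -- plain ASCII: native upper is fine
    return ch                                -- unknown multi-byte character: leave as is
  end))
end

local result = upper_utf8("caf\195\169")     -- "café"
```

Here result holds the bytes of "CAFÉ". Without a table entry the accented character would pass through unchanged, which is why the size of the table matters.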

Helen would prefer that the mapping of all utf8 functions onto the string library functions is not included.
The author would add that separately to make it more obvious.

My argument is that if you want UTF-8 characters supported then you probably want all the usual string functions to work with those characters by default which is what the README suggests.
Remember, the require("utf8lua") is optional and authors can implement whatever README snippets they prefer.

Which of those two options would you prefer?

Post by ColeValleyGirl »

Mark1834 wrote: 17 Jul 2021 09:06 I would not have written my recent Change Specific Fact plugin if fhFileUtils had not been available for easy reading of UTF-16 fact definition files. If it was something I needed myself and that library was not available, I would have copied the files to simple ANSI versions and used those. My own origins are boringly UK-based, so I have little need for extended character sets for my own use.
I'm pleased the utf16 support in fhFileUtils is useful. The Fact Set files were the use case I had in mind for it.
Mark1834 wrote: Personally, I think maintaining FH5/6 compatibility is a relatively low priority. If there is a significantly better (i.e. easier to use and understand) way to do things in FH7, I would always take that route. FH is not expensive, and has relatively infrequent major upgrades. If that means users who haven't or don't want to upgrade miss out on new plugin features, so be it. Their old plugins continue to work in FH5/6, so they haven't lost anything.
I have some sympathy for this view for new plugins, but -- like Mike -- I have some plugins that were written for FH5/6 and need to be updated for FH7. I'll be adding extra functionality in all environments to one of them (Add Source from Template), and different functionality in FH7 for another (Research Planner), so need to keep them backward compatible. I'm unlikely to make any new plugins that I write backwards compatible though.
tatewise wrote: Helen would prefer that the mapping of all utf8 functions onto the string library functions is not included.
My concern is that, if we drop Stepan's utf8 library onto the Lua string library as well as the Lua utf8 library, the native string functions become unavailable. (Stepan's library leaves the native Lua utf8 string library accessible.) If somebody wants to use the native string library (perhaps to handle an unconverted UTF16 string or when they know a string will be ANSI), they can't. Also, having two routes to the same function (utf8.somefunction, string.somefunction) seems to me to run the risk of writing code that others can't understand.

As Mark said, I think we should keep it simple -- which for me involves keeping the string library as it is alongside a beefed up utf8 library.
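
A minimal sketch of that side-by-side arrangement, in which the string library is left alone and UTF-8 aware calls are made explicitly. The u8 table is a hypothetical stand-in for this illustration, not Stepan's library or the native Lua 5.3 utf8 library, and it only implements len.

```lua
-- Hypothetical stand-in for a UTF-8 aware library kept in its own table.
local u8 = {}

-- Count characters: every byte that is not a continuation byte starts one.
function u8.len(s)
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

local text = "na\195\175ve"   -- "naïve" in UTF-8: 6 bytes, 5 characters

local chars = u8.len(text)    -- explicit call: the reader can see it is UTF-8 aware
local bytes = text:len()      -- native string.len is untouched and still counts bytes
```

Both answers stay available (chars is 5, bytes is 6), and each call site shows which one the author meant.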

Post by Mark1834 »

As a user, I want the string library to behave exactly as it says on the can, so all the web advice in Stack Exchange and various excellent tutorials is still fully valid. Life’s too short to have to worry about whether a particular function has been overridden by a custom version...

Post by ColeValleyGirl »

Mike
  1. Can you point me to (or attach) a utf8data.lua that you're happy with?
  2. Having thought some more about your proposed packaging, I'll include the contents of your 'utf8lua.lua' in the instructions in Lua References and Library Modules and maybe duplicate it in Unicode String Functions. That way everything is explicit and users can choose whether or not to use utf8data (which will be a hefty table), and/or to overload the string library.
  3. I'll probably put together a package for CP using the branch code that Stepan hasn't merged yet, for testing, and then update it when he merges the branch into the master.

Post by tatewise »

  1. The https://github.com/Stepets/utf8.lua/blo ... /README.md Configuration: section has a (data example) link to https://github.com/artemshein/luv/blob/ ... f8data.lua which is what I downloaded.
  2. Yes, that is reasonable, especially as I've found that some utf8 functions run slower than their string counterparts.
    Could you include utf8lua.lua (without the utf8 to string mapping) in the utf8 package for users to optionally require?
  3. That seems a rational way forward.
  4. When this has all succeeded I will update the Install Library Modules plugin.

Post by ColeValleyGirl »

tatewise wrote: 20 Jul 2021 13:01 Could you include utf8lua.lua (without the utf8 to string mapping) in the utf8 package for users to optionally require?
It would have to be packaged as a module, and to be honest, I can't see any benefits of doing so.

Post by tatewise »

Just so I understand, can you explain why utf8lua.lua needs to be packaged as a module?
I have it sitting alongside utf8.lua and utf8data.lua in the C:\Program Files (x86)\Family Historian\Program\Lua folder, and the require("utf8lua") command works just fine.

I see the benefit as users just need a simple require("utf8lua") command similar to all the other library modules.
The need to use utf8 = require(".utf8"):init() has always seemed a bit eccentric to me.
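
On whether a plain script can be loaded with require: a minimal sketch, assuming standard Lua require semantics, showing that a chunk which returns nothing is still run once and then cached. The module name utf8lua_demo is hypothetical and is registered through package.preload so the example needs no file on disk; it says nothing about how FH packages its library modules.

```lua
-- Count how many times the chunk body actually runs.
local load_count = 0

-- Hypothetical module name, registered in package.preload instead of a file on package.path.
package.preload["utf8lua_demo"] = function()
  load_count = load_count + 1
  -- A script that only has side effects returns nothing here;
  -- require then records true in package.loaded for this name.
end

local first = require("utf8lua_demo")    -- runs the chunk
local second = require("utf8lua_demo")   -- served from package.loaded, chunk not re-run
```

Both calls yield true and load_count stays at 1, so an initialisation script does not have to return a module table for require to accept it.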

Post by ColeValleyGirl »

tatewise wrote: 20 Jul 2021 13:35 I see the benefit as users just need a simple require("utf8lua") command similar to all the other library modules.
Mike, I believe that we should invoke a module in the way that the module's author documents, to avoid confusion between what we document and what authors might find elsewhere online. In other words, echoing Mark's request that "all the web advice in Stack Exchange and various excellent tutorials is still fully valid."

Post by tatewise »

Yes, that is exactly what utf8lua.lua does.
It implements the advice in the https://github.com/Stepets/utf8.lua/blo ... /README.md file, except that there is a mistake in that README file that I've pointed out to Stepan and hopefully will get fixed.

Authors don't have to require utf8lua.lua and can construct whatever script is appropriate for their use.
In any case, Lua References and Library Modules must advise on at least three modes of invoking the utf8 library module.
  1. Plain vanilla with string versions of upper & lower

    utf8 = require(".utf8"):init()

  2. As per README Configuration: for lower and upper functions with utf8data.lua

    utf8 = require(".utf8")
    require("utf8data")
    utf8.config = { conversion = { uc_lc = utf8_uc_lc; lc_uc = utf8_lc_uc; } }
    utf8:init()

  3. Either of the above together with mapping onto string library

    for k,v in pairs(utf8) do string[k] = v end

Post by ColeValleyGirl »

tatewise wrote: 20 Jul 2021 15:48 Yes, that is exactly what utf8lua.lua does.
Confused... how does introducing an FH-specific piece of code square with "all the web advice in Stack Exchange and various excellent tutorials is still fully valid"?
I'll be packaging the branch code imminently, with the intention of updating it to the master code if that's updated before the next release of FH7.

Post by tatewise »

Ok. You're the boss. :D

Post by tatewise »

Stepan has now merged the https://github.com/Stepets/utf8.lua/tre ... r_utf8data lower/upper fork back into the https://github.com/Stepets/utf8.lua master.

Post by ColeValleyGirl »

Thx. Doesn't look as if there were any significant later commits so I'll stick with what I've already packaged.