Multi-line XML definitions

Post by Mark1834 »

I've discovered a wrinkle in my plugin that writes Templated Source Definitions. It's unusual, but FH supports a multi-line entry for either the template description or an individual field description. In the template definition file, FH writes this as a single XML line, such as

<Description>Template description text¶Over multiple lines</Description>

I can't work out which character is used to denote the carriage return. If I iterate through the string with char:byte(), it reports two values, 194 and 182, so it appears to be a two-byte character. I need help from those more versed in the black art of character sets, please. How do I emulate that behaviour in a plugin? fhGetValueAsText(ptr) retrieves what is in the actual Source Template Record, complete with conventional carriage returns, but how do I output that as a single line in the definition file?
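The byte check described above can be reproduced in plain Lua. This is only a minimal sketch: the pilcrow is written with decimal escapes so the result does not depend on how the plugin source file is encoded, and the variable names are illustrative.

```lua
-- The pilcrow as raw UTF-8 bytes, written with decimal escapes so the
-- result does not depend on the encoding of the plugin source file.
local pilcrow = "\194\182"
local text = "Template description text" .. pilcrow .. "Over multiple lines"

-- Collect the values of all bytes above 127 (i.e. outside plain ASCII).
local highBytes = {}
for i = 1, #text do
    local b = text:byte(i)
    if b > 127 then
        highBytes[#highBytes + 1] = b
    end
end

print(table.concat(highBytes, " "))   -- 194 182
```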
Mark Draper

Post by ColeValleyGirl »

194 182 is the UTF-8 encoding of the pilcrow (¶).

Are you writing your content as a UTF-8 string or ANSI? You need to be using UTF-8. You've probably got away with it so far because ANSI and UTF-8 are identical for the first 128 characters (codes 0 to 127, the ASCII range), which includes the English alphabet. See https://utf8-chartable.de/unicode-utf8 ... l?utf8=dec
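To illustrate the difference, here is a rough sketch of how a single-byte code maps to UTF-8. The helper name latin1ToUtf8 is made up for this example, and it assumes the code is either plain ASCII or in the range 160 to 255, where the Windows ANSI code page (1252) agrees with Unicode; the pilcrow, code 182, falls in that range.

```lua
-- Hypothetical helper (not part of any FH library): UTF-8 encoding of a
-- single-byte code. Assumes code < 128 (ASCII) or 160..255, where
-- Windows-1252 and Unicode code points coincide.
local function latin1ToUtf8(code)
    if code < 128 then
        -- ASCII range: one byte, identical in ANSI and UTF-8.
        return string.char(code)
    end
    -- Two-byte UTF-8 form: 110xxxxx 10xxxxxx.
    return string.char(192 + math.floor(code / 64), 128 + code % 64)
end

local ansiPilcrow = string.char(182)          -- one byte in ANSI
local utf8Pilcrow = latin1ToUtf8(182)         -- two bytes in UTF-8
print(#ansiPilcrow, #utf8Pilcrow)             -- 1   2
print(utf8Pilcrow:byte(1, -1))                -- 194 182
```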

Post by tatewise »

Yes, ¶ in UTF-8 is 194 182 = C2 B6 in hex. See https://www.fileformat.info/info/unicod ... /index.htm.
The pilcrow ¶ character can be represented in ANSI, UTF-8 and UTF-16.
My plugins that are fully compatible with FH v5, v6 & v7 are encoded in ANSI.
When run under FH v6 or v7, they switch to UTF-8 encoding.
An example of script that deals with that is:

Code:

if fhGetAppVersion() > 5 then fhSetStringEncoding("UTF-8") end

StrPilcrow = "¶"			-- Newline symbol
if fhGetAppVersion() > 5 then
	StrPilcrow = fhConvertANSItoUTF8(StrPilcrow)
end
I suspect that in a UTF-8 encoded plugin you can simply use the "¶" pilcrow symbol.
Fact Type (.fhf) files are encoded in UTF-16 LE encoding.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Post by Mark1834 »

Thanks Helen, that’s told me what it is. I’m not doing anything special in my plugin, so I assume it is working in default ANSI. Follow-up question is therefore how do I convert a CR to a pilcrow (which I’d never heard of until 5 mins ago) in a plugin, and output the result to a text file?

Added in edit - thanks Mike, that should be part 2. Oddly, template definition files are UTF8, so hopefully easier to write and only need to be FH7 compatible. I’ll experiment in the morning.
Mark Draper

Post by ColeValleyGirl »

I suspect CP went with Microsoft's definition of utf initially and can't wind the clock back for fact sets, but didn't make the same mistake for later file types.

Post by Mark1834 »

Nearly there! The following code converts multi-line text to reproduce the example I quoted in the first post:

Code:

fhfu = require('fhFileUtils')

S1 = '<Description>Template description text\nOver multiple lines</Description>'
S2 = S1:gsub('\n', '¶')

File1 = 'V:\\junk\\Codes1.txt'
File2 = 'V:\\junk\\Codes2.txt'
File3 = 'V:\\junk\\Codes3.txt'

fhfu.createTextFile(File1, true, true, S2, 8)	-- save via fhFileUtils
fhSaveTextFile(File2, S2)			-- save via fh function

F = io.open(File3, 'w')				-- save via Lua io library
F:write(S2)
F:close()
In all three cases, the visible file contents are identical, but as expected, the Lua io file is a different format (reported by Notepad++ as UTF-8, while the other two are UTF-8 BOM, the same as FH uses).

Presumably fhFileUtils and fhSaveTextFile are both wrapping the same underlying code, so I might as well just use the fh version if I don't need fhFileUtils for anything else. Is there a simple way of appending content to a UTF-8 BOM file (similar to io.write()), or is it best to construct the file contents as one big string and write in a single operation?
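One detail worth hedging against: it is not stated in this thread whether text returned by FH uses LF, CR or CRLF line breaks. The sketch below, using a made-up helper name toSingleLine, normalises all three before substituting the pilcrow, so a CRLF pair becomes one pilcrow rather than a stray CR followed by a pilcrow. It assumes the output is to be UTF-8.

```lua
-- UTF-8 pilcrow as decimal escapes (194 182).
local PILCROW = "\194\182"

-- Hypothetical helper: collapse multi-line text onto a single line.
local function toSingleLine(text)
    -- Normalise CRLF and lone CR to LF first.
    local normalised = text:gsub("\r\n", "\n"):gsub("\r", "\n")
    -- Then swap each LF for a pilcrow; the parentheses drop gsub's
    -- second return value (the substitution count).
    return (normalised:gsub("\n", PILCROW))
end

print(toSingleLine("Template description text\r\nOver multiple lines"))
```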
Mark Draper

Post by tatewise »

You can force the UTF-8 BOM FH format by prefixing the first line of text with the 3-byte UTF8 BOM:

local bomUtf8 = string.char(0xEF,0xBB,0xBF) -- UTF-8 byte order mark (EF BB BF)

Then you can use the Lua io write line by line method.
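A minimal sketch of that approach, assuming the standard Lua io library is enough for the job; the function name writeLinesWithBom is illustrative only.

```lua
-- The 3-byte UTF-8 byte order mark.
local BOM_UTF8 = string.char(0xEF, 0xBB, 0xBF)

-- Hypothetical helper: write a list of lines to a new file, BOM first.
local function writeLinesWithBom(path, lines)
    local file, err = io.open(path, "w")
    if not file then
        return nil, err
    end
    file:write(BOM_UTF8)            -- must come before any other content
    for _, line in ipairs(lines) do
        file:write(line, "\n")      -- one line at a time
    end
    file:close()
    return true
end
```

The BOM is written only once, when the file is created; anything written afterwards follows it as ordinary UTF-8 text.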

Otherwise, it has to be one large text string.

Presumably, you are aware of the table.insert(...) line by line method ending with table.concat(...) to create the string.
That is much more time & space efficient than using the .. string concatenation operator.
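For anyone who has not used it, the pattern looks roughly like this. The element names are invented for illustration and are not taken from a real template definition file.

```lua
-- Collect each line in a table instead of growing a string with "..".
local parts = {}
table.insert(parts, "<Template>")
for i = 1, 3 do
    table.insert(parts, "  <Field>Field " .. i .. "</Field>")
end
table.insert(parts, "</Template>")

-- Join everything once at the end, with a newline between lines.
local contents = table.concat(parts, "\n")
print(contents)
```

The resulting string can then be passed to whichever save function is preferred in a single call.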
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Post by ColeValleyGirl »

fhFileUtils.createTextFile gives you the option NOT to overwrite an existing file, whereas fhSaveTextFile always overwrites existing files.

fhFileUtils.createTextFile gives you the option to create an empty file, whereas fhSaveTextFile doesn't (except by specifying an empty string as strContents).

fhFileUtils.createTextFile gives you the option to create an ANSI file, whereas fhSaveTextFile doesn't.

So which to use depends on what options you need, and as you say, whether you also need anything else from fhFileUtils.

To append to a file, use io.open with mode "a" (append) and then the file's write method to append successive lines/chunks. However, my instincts say that constructing the file's contents as a string and then doing a single createTextFile or fhSaveTextFile will be faster (I'm willing to be corrected) and will allow you to exit gracefully without writing a partial update to a file in case of error.
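A small sketch of the append route using only the standard Lua io library; appendText is a made-up helper name. Note that append mode adds no BOM of its own, so this assumes the file already exists with whatever BOM it needs at the start.

```lua
-- Hypothetical helper: append a chunk of text to the end of a file.
local function appendText(path, text)
    local file, err = io.open(path, "a")   -- "a" creates the file if missing
    if not file then
        return nil, err
    end
    file:write(text)
    file:close()
    return true
end
```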

Post by Mark1834 »

Coming back to this after doing something else for a couple of days, I have sorted out Pilcrows and how to handle them (thanks, folks), but it illustrates an obscure bug in FH that I have just reported to CP. The Description field in a Source Template Definition is limited to a single line of text. However, in the actual Source Template itself, it can be expanded to permit multi-line text. If a multi-line description is added, it becomes impossible to sync Template with Definition.

FH reports a successful sync if you modify one to match the other, but they are out of sync again next time FH is loaded. Sometimes it seems to write the XML definition tag over multiple lines, rather than using the Pilcrow, and it appears that FH can't read this back in correctly.

Reading around the topic online, it seems that multi-line tags are valid XML, but most XML readers cannot interpret them correctly.
Mark Draper

Post by Mark1834 »

CP have acknowledged the bug, and it will be fixed in a future release.
Mark Draper