* gmatch and quotes which aren't quotes


Post by shoshk » 28 Jan 2021 07:28

This is a bit long-winded, but I thought it might be of interest to somebody else.

First of all, I've discovered another great use for ZeroBrane. If you have a section of code which doesn't make any calls to FH built-in functions, you may be able to debug it in the ZeroBrane debugger.

In my case, I had a long string, the result of rt:GetText(), from which I wanted to extract the tables. The rich text comes from autotext for plugins.

The string looks like this:

local sText = [[
<table="2670|15690">
<row> Source |  </row>
<row> Type | Split </row>
<row> Template | Civil Registration </row>
<row> GenericType | Birth Index (gro) </row>
<row> TextFromSource | true </row>
<row> MediaFilenameFormat | StandardMediaFormat( {INDI.Name}, {EN-Fact_Name}, {EN-Suffix}, {INDI.BirthPlace:COUNTRY}, {INDI.BirthPlace:COUNTY}, {INDI.BirthDate} ) </row>
</table>

<table="2685|800|13020">
<row> Fields |  |  </row>
<row> Field | <align="c">DPU | Value </row>
<row> RP-DataProvider | <align="c">D | FindMyPast </row>
<row> TX-Fact_Name | <align="c">D | Birth </row>
<row> TX-Prefix | <align="c">D | BIRT </row>
<row> TX-Suffix | <align="c">D | index (gro) </row>
<row> TX-Database | <align="c">D | England and Wales Births 1837-2006 </row>
<row> EN-DB_Type | <align="c">D | database and images </row>
<row> TX-Citing | <align="c">D | citing General Register Office, “England and Wales Civil Registration Indexes,” London, England </row>
<row> NM-Name_Recorded_1 | <align="c">U | {Principal.Name} </row>
<row> DT-Fact_Date | <align="c">U | {Principal.BirthDate} </row>
</table>

<fs="+3">GRO Birth Index</fs>

<b><fs="+1">Principal</b></fs>
<table="2625|2415|800|12510">
<row> Tabular |  |  |  </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Principal.LinkedTo} | <align="c">L | Principal </row>
<row> Name | {Principal.Name} | <align="c">T | {First name(s)} {Last name} </row>
<row> Birth Date | {Principal.BirthDate} | <align="c">T | Q{Birth quarter} {Birth year} </row>
<row> Birth Place | {Principal.BirthPlace} | <align="c">T | {District} Registration District, {County}, {Country} </row>
</table>

<b><fs="+1">Father</b></fs>
<table="2625|2415|800|12510">
<row> Tabular |  |  |  </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Father.LinkedTo} | <align="c">L | Father </row>
<row> Name | {Father.Name} | <align="c">T | _____ {Last name} </row>
</table>

<b><fs="+1">Mother</b></fs>
<table="2625|2415|800|12510">
<row> Tabular |  |  |  </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Mother.LinkedTo} | <align="c">L | Mother </row>
<row> Name | {Mother.Name} | <align="c">T | {Mother's maiden name} </row>
</table>

<b><fs="+1">Reference</b></fs>
<table="2625|2415|800|12510">
<row> Tabular |  |  |  </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Volume | {Reference.Volume} | <align="c">T | {Volume} </row>
<row> Page | {Reference.Page} | <align="c">T | {Page} </row>
</table>
]]
The code to extract the tables is:

function rtUtils.ExtractTables(rt, bStripTags)
	local result = {}
	local sText = rt:GetText()
	local tIdx = 0
	local rIdx
	for t in string.gmatch(sText, '<table[%g%s]-</table>') do
		tIdx = tIdx + 1
		result[tIdx] = {}
		rIdx = 0
		for r in string.gmatch(t, '<row>[%g%s]-</row>') do
			rIdx = rIdx + 1
			result[tIdx][rIdx] = rtUtils.ExtractColumns(r, bStripTags)
		end
	end
	return result
end
My problem: the second table was not being picked up by gmatch. I fooled around quite a bit trying to see what was different about the second table until I finally noticed that the row for TX-Citing includes quotes which are not quotes. I don't remember what they're called, but they are slanted or curved (depending on the font and your editor) instead of straight up-and-down like regular quotes.
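The failure can be reproduced outside FH with a cut-down string. This is only a sketch: the sample text, the LDQUO/RDQUO names and the countMatches helper are made up for illustration, it assumes Lua 5.3 running in the default C locale, and the curly quotes are written as byte escapes so the result does not depend on how the file is saved.

```lua
-- Assumed UTF-8 byte sequences for the two curly quotes (U+201C and U+201D)
local LDQUO = "\xE2\x80\x9C"
local RDQUO = "\xE2\x80\x9D"

-- Two small tables; only the second one contains curly quotes
local sample = table.concat({
	'<table="1|2">',
	'<row> Type | Split </row>',
	'</table>',
	'<table="1|2">',
	'<row> TX-Citing | ' .. LDQUO .. 'Indexes,' .. RDQUO .. ' London </row>',
	'</table>',
}, '\n')

-- Hypothetical helper: count how many times a pattern matches
local function countMatches(text, pattern)
	local n = 0
	for _ in string.gmatch(text, pattern) do
		n = n + 1
	end
	return n
end

-- [%g%s] cannot step over the quote bytes, so the second table is never matched
local nFound = countMatches(sample, '<table[%g%s]-</table>')
print(nFound)  -- 1 in the default C locale
```

The first table is found as usual. In the second one the lazy [%g%s]- stops at the first byte of the opening curly quote, so the closing </table> is never reached and gmatch silently skips the whole table.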

Adding the following line before starting the loop solved my problem:

	sText = sText:gsub('[“”]', '')
It may be possible to handle this with an addition to the pattern in gmatch, but I'm not the greatest at constructing patterns, so I'll stick with my solution for the time being. I'd be happy to substitute a better solution if somebody knows of one.
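One possible caveat with the character class, sketched below under the assumption that the text is UTF-8: Lua patterns work on bytes, so a class containing the two curly quotes is really a set of their individual bytes, and those bytes can also occur inside other multi-byte characters. The en dash in the sample is hypothetical (it is not in my data), and stripSmartQuotes is just an illustrative name.

```lua
-- Assumed UTF-8 byte sequences (U+201C, U+201D, and U+2013 en dash)
local LDQUO = "\xE2\x80\x9C"
local RDQUO = "\xE2\x80\x9D"
local ENDASH = "\xE2\x80\x93"

-- Hypothetical text: curly quotes plus one other multi-byte character
local sText = LDQUO .. 'Births' .. RDQUO .. ' 1837' .. ENDASH .. '2006'

-- Character class: removes every byte E2, 80, 9C or 9D wherever it appears,
-- so the en dash loses two of its three bytes and a stray \x93 is left behind
local classResult = sText:gsub('[' .. LDQUO .. RDQUO .. ']', '')

-- Alternative: remove each quote as a whole byte sequence
local function stripSmartQuotes(s)
	s = s:gsub(LDQUO, '')  -- keep only the first return value (the string)
	s = s:gsub(RDQUO, '')
	return s
end

local seqResult = stripSmartQuotes(sText)
print(classResult == 'Births 1837\x932006')            -- true
print(seqResult == 'Births 1837' .. ENDASH .. '2006')  -- true
```

If the text only ever contains those two multi-byte characters, both approaches give the same result; the difference only shows up when other non-ASCII characters are present.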
Shosh Kalson


Post by tatewise » 28 Jan 2021 10:37

Since with [%g%s] you want to match every character, why not use dot . which is specifically intended to match all characters?
See the 2nd bullet under Patterns at http://www.lua.org/manual/5.3/manual.html#6.4.1 which says ".: (a dot) represents all characters."
e.g.
string.gmatch(sText, '<table.-</table>')
and
string.gmatch(t, '<row>.-</row>')

BTW: The “ and ” are known as smart quotes. In UTF-8 each one is encoded as a multi-byte sequence, which is probably why they don't match %g or %s, whereas dot . matches them because it allows each byte to have any value.
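A small sketch of the above, assuming Lua 5.3 and UTF-8 text; the sample string, the LDQUO/RDQUO names and the countMatches helper are invented for illustration only.

```lua
-- Assumed UTF-8 byte sequences for the smart quotes (U+201C and U+201D)
local LDQUO = "\xE2\x80\x9C"
local RDQUO = "\xE2\x80\x9D"

-- Each smart quote is three bytes long
local b1, b2, b3 = LDQUO:byte(1, -1)
print(#LDQUO, b1, b2, b3)  -- 3  226  128  156

-- Two small tables; the second one contains smart quotes
local sample = table.concat({
	'<table="1|2">',
	'<row> Type | Split </row>',
	'</table>',
	'<table="1|2">',
	'<row> TX-Citing | ' .. LDQUO .. 'Indexes,' .. RDQUO .. ' London </row>',
	'</table>',
}, '\n')

-- Hypothetical helper: count how many times a pattern matches
local function countMatches(text, pattern)
	local n = 0
	for _ in string.gmatch(text, pattern) do
		n = n + 1
	end
	return n
end

-- Dot matches any byte, including newlines and the smart quote bytes
local nTables = countMatches(sample, '<table.-</table>')
local nRows = countMatches(sample, '<row>.-</row>')
print(nTables, nRows)  -- 2  2
```

Because .- is the lazy form, each match still stops at the first closing </table> or </row>, so adjacent tables are not merged into one match.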
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry


Post by shoshk » 28 Jan 2021 10:40

I tried . at one point but it didn't work. The pattern may not have been exactly the same. I'll give your suggestion a try.
Shosh Kalson
