gmatch and quotes which aren't quotes
Posted: 28 Jan 2021 07:28
This is a bit long-winded, but I thought it might be of interest to somebody else.
First of all, I've discovered another great use for ZeroBrane. If you have a section of code which doesn't make any calls to FH built-in functions, you may be able to debug it in the ZeroBrane debugger.
In my case, I had a long string, the result of rt:GetText(), from which I wanted to extract the tables. The rich text comes from autotext for plugins.
The string (sText) and the code that extracts the tables (rtUtils.ExtractTables) are both listed further down.
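My problem -- the second table was not being picked up by gmatch. I fooled around quite a bit trying to see what was different about the second table until I finally noticed that the row for TX-Citing includes quotes which are not quotes. They are curly quotes (also called smart or typographic quotes): slanted or curved, depending on your font and editor, instead of straight up-and-down like regular quotes. As far as I can tell, neither %g nor %s matches the bytes those characters are stored as, so the [%g%s]- part of the pattern can never get past that row to reach </table>.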
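A quick way to see this outside FH is a two-string comparison like the sketch below. The sample strings and variable names are made up for illustration, the curly quotes are written as UTF-8 byte escapes (assuming the text is UTF-8), and it assumes Lua 5.2 or later in the default C locale, since %g does not exist in Lua 5.1.

```lua
-- Curly quotes as UTF-8 byte escapes, so the example does not depend on the editor's encoding
local LEFT_QUOTE  = "\226\128\156"  -- U+201C
local RIGHT_QUOTE = "\226\128\157"  -- U+201D

-- Two made-up one-row tables: one with straight quotes, one with curly quotes
local plain = '<table="1">\n<row> TX-Citing | "Indexes," London </row>\n</table>'
local curly = '<table="1">\n<row> TX-Citing | ' .. LEFT_QUOTE .. 'Indexes,'
    .. RIGHT_QUOTE .. ' London </row>\n</table>'

-- Same table pattern as in the extraction code
local pattern = '<table[%g%s]-</table>'

local plainMatches = string.match(plain, pattern) ~= nil
local curlyMatches = string.match(curly, pattern) ~= nil

print(plainMatches)  -- true
print(curlyMatches)  -- false in the default C locale: [%g%s] stops at the first quote byte
```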
Adding one line before starting the loop (the gsub call listed further down) solved my problem.
It may be possible to handle this with an addition to the pattern in gmatch, but I'm not the greatest at constructing patterns, so I'll stick with my solution for the time being. I'd be happy to substitute a better solution if somebody knows of one.
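One possible pattern-only alternative, offered as a sketch rather than a tested replacement: in Lua patterns, . matches any character, newlines included, so '<table.-</table>' does not care which bytes sit between the tags. The sample data and the countMatches helper below are made up for illustration, the curly quotes are written as UTF-8 byte escapes, and Lua 5.2 or later is assumed for %g.

```lua
local LEFT_QUOTE, RIGHT_QUOTE = "\226\128\156", "\226\128\157"  -- U+201C, U+201D

-- Three made-up tables; only the second one contains curly quotes
local sample = table.concat({
    '<table="1">', '<row> A | B </row>', '</table>',
    '<table="2">',
    '<row> TX-Citing | ' .. LEFT_QUOTE .. 'Indexes,' .. RIGHT_QUOTE .. ' London </row>',
    '</table>',
    '<table="3">', '<row> C | D </row>', '</table>',
}, '\n')

-- Hypothetical helper: counts how many times a pattern matches
local function countMatches(s, pattern)
    local n = 0
    for _ in string.gmatch(s, pattern) do
        n = n + 1
    end
    return n
end

print(countMatches(sample, '<table[%g%s]-</table>'))  -- 2: the table with curly quotes is skipped
print(countMatches(sample, '<table.-</table>'))       -- 3: the lazy .- stops at the first </table>
```

The same change would apply to the row pattern ('<row>.-</row>'). This has not been checked against real FH rich text.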
The string looks like this:
Code:
local sText = [[
<table="2670|15690">
<row> Source | </row>
<row> Type | Split </row>
<row> Template | Civil Registration </row>
<row> GenericType | Birth Index (gro) </row>
<row> TextFromSource | true </row>
<row> MediaFilenameFormat | StandardMediaFormat( {INDI.Name}, {EN-Fact_Name}, {EN-Suffix}, {INDI.BirthPlace:COUNTRY}, {INDI.BirthPlace:COUNTY}, {INDI.BirthDate} ) </row>
</table>
<table="2685|800|13020">
<row> Fields | | </row>
<row> Field | <align="c">DPU | Value </row>
<row> RP-DataProvider | <align="c">D | FindMyPast </row>
<row> TX-Fact_Name | <align="c">D | Birth </row>
<row> TX-Prefix | <align="c">D | BIRT </row>
<row> TX-Suffix | <align="c">D | index (gro) </row>
<row> TX-Database | <align="c">D | England and Wales Births 1837-2006 </row>
<row> EN-DB_Type | <align="c">D | database and images </row>
<row> TX-Citing | <align="c">D | citing General Register Office, “England and Wales Civil Registration Indexes,” London, England </row>
<row> NM-Name_Recorded_1 | <align="c">U | {Principal.Name} </row>
<row> DT-Fact_Date | <align="c">U | {Principal.BirthDate} </row>
</table>
<fs="+3">GRO Birth Index</fs>
<b><fs="+1">Principal</b></fs>
<table="2625|2415|800|12510">
<row> Tabular | | | </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Principal.LinkedTo} | <align="c">L | Principal </row>
<row> Name | {Principal.Name} | <align="c">T | {First name(s)} {Last name} </row>
<row> Birth Date | {Principal.BirthDate} | <align="c">T | Q{Birth quarter} {Birth year} </row>
<row> Birth Place | {Principal.BirthPlace} | <align="c">T | {District} Registration District, {County}, {Country} </row>
</table>
<b><fs="+1">Father</b></fs>
<table="2625|2415|800|12510">
<row> Tabular | | | </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Father.LinkedTo} | <align="c">L | Father </row>
<row> Name | {Father.Name} | <align="c">T | _____ {Last name} </row>
</table>
<b><fs="+1">Mother</b></fs>
<table="2625|2415|800|12510">
<row> Tabular | | | </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Linked To | {Mother.LinkedTo} | <align="c">L | Mother </row>
<row> Name | {Mother.Name} | <align="c">T | {Mother's maiden name} </row>
</table>
<b><fs="+1">Reference</b></fs>
<table="2625|2415|800|12510">
<row> Tabular | | | </row>
<row> Label | Token | <align="c">TLV | Expression </row>
<row> Volume | {Reference.Volume} | <align="c">T | {Volume} </row>
<row> Page | {Reference.Page} | <align="c">T | {Page} </row>
</table>
]]
The code to extract the tables is:
Code:
function rtUtils.ExtractTables(rt, bStripTags)
    local result = {}
    local sText = rt:GetText()
    local tIdx = 0
    local rIdx
    -- Each match is one complete <table ...>...</table> block
    for t in string.gmatch(sText, '<table[%g%s]-</table>') do
        tIdx = tIdx + 1
        result[tIdx] = {}
        rIdx = 0
        -- Each match is one <row>...</row> inside the current table
        for r in string.gmatch(t, '<row>[%g%s]-</row>') do
            rIdx = rIdx + 1
            result[tIdx][rIdx] = rtUtils.ExtractColumns(r, bStripTags)
        end
    end
    return result
end
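Since the table loop only needs rt:GetText(), it can be tried outside FH (for example in the ZeroBrane debugger) by passing in a stand-in object. A minimal sketch: makeFakeRichText and countTables are hypothetical helpers, not part of FH or of rtUtils, and countTables is a cut-down version of the loop above so that the example runs on its own. Lua 5.2 or later is assumed for %g.

```lua
-- Hypothetical stand-in for the FH rich text object: only GetText() is mimicked
local function makeFakeRichText(sText)
    return {
        GetText = function(self)
            return sText
        end,
    }
end

-- Cut-down version of the table loop: counts tables instead of extracting columns
local function countTables(rt)
    local n = 0
    for _ in string.gmatch(rt:GetText(), '<table[%g%s]-</table>') do
        n = n + 1
    end
    return n
end

-- Made-up sample text with two small tables
local rt = makeFakeRichText(
    '<table="1|2">\n<row> A | B </row>\n</table>\n' ..
    '<table="3">\n<row> C </row>\n</table>')

print(countTables(rt))  -- 2
```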
Adding the following line before starting the loop solved my problem:
Code:
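-- Remove each curly quote separately (a [...] set works byte by byte, not per character)
sText = sText:gsub('“', ''):gsub('”', '')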
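Deleting the curly quotes also changes the citation text. If that matters, a variation is to swap them for straight quotes instead. The sketch below assumes the text is UTF-8; straightenQuotes and the sample string are made up for illustration. Each quote is handled as its full three-byte sequence, because a [...] set in a Lua pattern matches single bytes and would also strip those bytes out of other multi-byte characters, such as an en dash.

```lua
local LEFT_QUOTE  = "\226\128\156"  -- U+201C as UTF-8 bytes
local RIGHT_QUOTE = "\226\128\157"  -- U+201D as UTF-8 bytes
local EN_DASH     = "\226\128\147"  -- U+2013, shares its first two bytes with the quotes

-- Hypothetical helper: replaces curly double quotes with straight ones
local function straightenQuotes(s)
    s = s:gsub(LEFT_QUOTE, '"')
    s = s:gsub(RIGHT_QUOTE, '"')
    return s
end

local citing = 'citing ' .. LEFT_QUOTE .. 'Indexes,' .. RIGHT_QUOTE
    .. ' 1837' .. EN_DASH .. '2006'

-- Whole-sequence replacement leaves the en dash intact
local fixed = straightenQuotes(citing)
print(fixed == 'citing "Indexes," 1837' .. EN_DASH .. '2006')  -- true

-- A byte set removes bytes 226 and 128 everywhere, leaving a stray byte 147 from the en dash
local damaged = citing:gsub('[' .. LEFT_QUOTE .. RIGHT_QUOTE .. ']', '')
print(damaged == 'citing Indexes, 1837\1472006')  -- true
```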