I am writing a plugin that needs to read gedcom files directly. I am trying to fix the problem mentioned by Bundle in the thread "Problems with note fields truncating in FTM imports".
I am very new to Lua, though I have written a lot of VB and C. I have no trouble reading the gedcoms from Ancestry and FTM, which are UTF-8.
My problem is that the FH gedcom is in full-blown Unicode (UTF-16) rather than UTF-8, and Lua does not seem to be able to read it. I am using:
for sLine in io.lines(sFile) do
  ...
end
Can anybody point me to any syntax I haven't found that will read the Unicode file?
I can make it work by opening the file in Notepad and doing a Save As to UTF-8. Is there any reason not to do this? FH still seems to like it, at least on the surface.
Reading Gedcom file directly
Ian Johnson - researching Bain, Batley, Elsden, Ewen and Johnson families and the village of Easton Royal
- tatewise
- Megastar
- Posts: 27082
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
Re: Reading Gedcom file directly
Reading Unicode files (either UTF-8 or UTF-16) in Lua is very tricky, because Lua knows nothing about the byte encoding and treats every byte as an ANSI character. In addition, the two-byte UTF-16 newline is not recognised correctly by for sLine in io.lines(sFile) do, which only looks for a single newline byte.
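As a rough illustration of what reading the file directly involves, here is a minimal pure-Lua sketch (it is not one of the library modules referred to in this thread). It assumes the FH file is UTF-16 little-endian with an optional byte order mark, which is the usual Windows form of "Unicode" but is an assumption here, and it assumes Lua 5.1 or later. The hypothetical helper readUtf16Lines() opens the file in binary mode, converts each UTF-16 code unit (combining surrogate pairs) to UTF-8, and only then splits on CR/LF, so the two-byte newline never reaches a byte-oriented line splitter.

```lua
-- Minimal sketch, assuming a UTF-16 little-endian file with an optional
-- BOM (FF FE). Pure Lua 5.1+; no bitwise operators or utf8 library needed.

-- Encode one Unicode code point as a UTF-8 byte string.
local function utf8Char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Convert a UTF-16LE byte string to UTF-8, combining surrogate pairs.
-- An unpaired surrogate is passed through unchanged.
local function utf16leToUtf8(data)
  local out = {}
  local i = 1
  if data:sub(1, 2) == "\255\254" then
    i = 3  -- skip the little-endian byte order mark
  end
  while i + 1 <= #data do
    local lo, hi = data:byte(i, i + 1)
    local unit = lo + hi * 256
    i = i + 2
    if unit >= 0xD800 and unit <= 0xDBFF and i + 1 <= #data then
      -- High surrogate: combine it with the following low surrogate.
      local lo2, hi2 = data:byte(i, i + 1)
      local unit2 = lo2 + hi2 * 256
      if unit2 >= 0xDC00 and unit2 <= 0xDFFF then
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (unit2 - 0xDC00)
        i = i + 2
      end
    end
    out[#out + 1] = utf8Char(unit)
  end
  return table.concat(out)
end

-- Hypothetical helper: read a whole UTF-16LE file in binary mode and
-- return its lines as UTF-8 strings, accepting CRLF, LF or CR endings.
local function readUtf16Lines(sFile)
  local f = assert(io.open(sFile, "rb"))
  local data = f:read("*a")
  f:close()
  local text = utf16leToUtf8(data)
  text = text:gsub("\r\n", "\n")
  text = text:gsub("\r", "\n")
  local lines = {}
  if text == "" then
    return lines
  end
  if text:sub(-1) ~= "\n" then
    text = text .. "\n"
  end
  for sLine in text:gmatch("(.-)\n") do
    lines[#lines + 1] = sLine
  end
  return lines
end
```

For example, for _, sLine in ipairs(readUtf16Lines(sFile)) do ... end would take the place of the io.lines() loop in the question. The sketch reads the whole file into memory, which keeps it simple but may be slow for a very large Gedcom.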
I have written some library modules that will read Unicode files, but you probably don't need them.
If your objective is to correct the NOTE field CONCatenation level error from FTM Gedcom, then that can be done much more simply.
The technique is to search all records and fields for the NOTE2 tag; then, if any subsequent CONC UDF tags are found, concatenate their text onto the Note text and delete the CONC item.
e.g.
Code:
function Main()
  for intRec, strRec in ipairs({"INDI","FAM","SOUR"}) do -- Loop through desired record types
    local ptrNote, ptrConc, isOK
    local ptrRef = fhNewItemPtr()
    ptrRef:MoveToFirstRecord(strRec)
    while ptrRef:IsNotNull() do -- Loop through all data fields
      local ptrTag = ptrRef:Clone()
      local strTag = fhGetTag(ptrTag)
      ptrRef:MoveNextSpecial()
      if strTag == "NOTE2" then
        ptrNote = ptrTag:Clone() -- Remember latest Note field
      elseif strTag == "CONC" then
        ptrConc = ptrTag:Clone() -- Found CONCatenation field
        local strNote = fhGetValueAsText(ptrNote)
        local strConc = fhGetValueAsText(ptrConc)
        strNote = strNote..strConc -- Append CONC text to NOTE text
        isOK = fhSetValueAsText(ptrNote,strNote)
        isOK = fhDeleteItem(ptrConc) -- Delete CONC field
      end
    end
  end
end
Main()
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Reading Gedcom file directly
Thanks for the prompt reply.
In my simple way I thought if the file is wrong, you open it up and fix it!
It works (very impressive), but I think there is a serious learning curve ahead of me!
I think I understand broadly how your code works, but could you explain why fhGetTag returns "NOTE2" for the note line but simply "CONC" for the following lines?
Thanks again,
Ian
Ian Johnson - researching Bain, Batley, Elsden, Ewen and Johnson families and the village of Easton Royal
Re: Reading Gedcom file directly
Yes, you could work directly on a Gedcom file, but it would be easier to work on the UTF8 (or ANSI) file exported from FTM rather than the FH active UTF16 imported file.
However, in most cases, and this is one of them, it is even easier to work through the FH API and correct the internal FH database.
The FH internal tag names are not specific to Lua Plugins; they reflect the way Data References are formed generally. If you use any Data Reference Assistant in FH, you will see that a local Note for, say, an INDIvidual record uses %INDI.NOTE2%, whereas a link to a shared Note record uses %INDI.NOTE>%, even though the Gedcom tag in both cases is NOTE. This convention is used throughout to distinguish local tags from link tags without having to interrogate their parameter. So SOUR2 is a local Source Note whereas SOUR is a link to a Source Record, and OBJE2 is a local Media object whereas OBJE is a link to a Media record. With that convention it is easy to know whether to use fhGetValueAsText() or fhGetValueAsLink().
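To make the naming convention concrete, here is a small pure-Lua sketch. The helper valueKind() is hypothetical (it is not part of the FH API) and only covers the three tag pairs named above: given a tag string as returned by fhGetTag(), it reports whether the field is local or a link, which is the choice between reading a NOTE2 with fhGetValueAsText() and reading a link with fhGetValueAsLink().

```lua
-- Hypothetical helper, not part of the FH API. It only knows about the three
-- tag pairs described above; every other tag returns nil.
local DUAL_TAGS = { NOTE = true, SOUR = true, OBJE = true }

-- Classify a tag name as returned by fhGetTag():
--   "local" for NOTE2 / SOUR2 / OBJE2 (a field held within the record)
--   "link"  for NOTE / SOUR / OBJE (a link to a shared record)
local function valueKind(strTag)
  local base, suffix = strTag:match("^(%u+)(2?)$")
  if base == nil or not DUAL_TAGS[base] then
    return nil
  elseif suffix == "2" then
    return "local"  -- e.g. a NOTE2 is read with fhGetValueAsText()
  else
    return "link"   -- a link is read with fhGetValueAsLink()
  end
end

-- Example: report how each tag would be treated.
for _, strTag in ipairs({ "NOTE2", "NOTE", "SOUR2", "SOUR", "OBJE2", "CONC" }) do
  print(strTag, valueKind(strTag) or "other")
end
```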
There is a great deal of advice on the Family Historian Plugins pages (plugins:index) and plenty of Plugins to plagiarise.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Reading Gedcom file directly
I found a problem with your example, Mike.
If CONC fields that are not associated with a NOTE are encountered, I think it will append them to the previous Note too. "Fortunately" I ran into one before the first NOTE, so the Plugin failed because ptrNote was nil.
So I changed it:
...
if strTag == "NOTE2" then
  ptrNote = ptrTag:Clone() -- Remember latest Note field
elseif strTag == "CONC" then
  if (ptrNote ~= nil) then
    ptrConc = ptrTag:Clone() -- Found CONCatenation field
    local strNote = fhGetValueAsText(ptrNote)
    local strConc = fhGetValueAsText(ptrConc)
    strNote = strNote..strConc -- Append CONC text to NOTE text
    isOK = fhSetValueAsText(ptrNote,strNote)
    isOK = fhDeleteItem(ptrConc) -- Delete CONC field
  end
else
  ptrNote = nil -- forget the NOTE field
end
...
BTW, I could not find how to make that nice scrolling code box appear.
Ian Johnson - researching Bain, Batley, Elsden, Ewen and Johnson families and the village of Easton Royal
Re: Reading Gedcom file directly
Note that any Plugin changes can be reversed using Edit > Undo Plugin Updates before closing FH.
My Plugin was a quick proof-of-concept prototype.
It needs refining to accommodate other fields that may have CONC UDF tags after importing a Gedcom.
These can include TEXT and PAGE tags as well as NOTE tags.
If any other tag is detected, the pointer should be cleared with SetNull() to prevent CONC text being appended to the wrong field.
A test similar to yours is good defensive coding, but it should use ptrNote:IsNotNull() or ptrNote:IsNull() as required.
Also, record types other than the three I included should be searched.
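The refinements listed above can be sketched without Family Historian by modelling the fields as a plain Lua list, in the order that MoveNextSpecial() visits them. This is a hypothetical simulation, not FH API code: mergeStrayConc() and the { tag, value } tables are illustrative names, and setting target to nil stands in for ptrNote:SetNull(). In a real Plugin the append would go through fhSetValueAsText() and the CONC removal through fhDeleteItem(), as in the Plugin code earlier in the thread.

```lua
-- Hypothetical simulation, not FH API code. Each field is a table
-- { tag = ..., value = ... }, listed in the order MoveNextSpecial() visits them.

-- Fields that may be followed by stray CONC UDF fields after a Gedcom import.
local TARGET_TAGS = { NOTE2 = true, TEXT = true, PAGE = true }

-- Return a new field list in which each CONC value is appended to the most
-- recent target field and that CONC is dropped. Any other tag clears the
-- target (the equivalent of ptrNote:SetNull()), so a CONC that does not
-- follow a target field is left untouched.
local function mergeStrayConc(fields)
  local result = {}
  local target = nil
  for _, field in ipairs(fields) do
    if TARGET_TAGS[field.tag] then
      target = { tag = field.tag, value = field.value }
      result[#result + 1] = target
    elseif field.tag == "CONC" and target ~= nil then
      target.value = target.value .. field.value  -- CONC adds no space
    else
      target = nil
      result[#result + 1] = { tag = field.tag, value = field.value }
    end
  end
  return result
end

-- Example: two stray CONC fields after a NOTE2, plus one orphan CONC.
local fields = {
  { tag = "CONC",  value = "orphan" },
  { tag = "NOTE2", value = "Note text that was " },
  { tag = "CONC",  value = "split across " },
  { tag = "CONC",  value = "several lines" },
  { tag = "BIRT",  value = "" },
}
for _, field in ipairs(mergeStrayConc(fields)) do
  print(field.tag, field.value)
end
```

Keeping the orphan CONC rather than deleting it mirrors the defensive test discussed above: text is only merged when it is known which field it belongs to.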
To format postings use the BBCode icons above this edit box (similar to word processor formatting).
To create scrolling code use the </> icon to insert 'code' tags.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry