Reading Gedcom file directly
Posted: 29 Jun 2015 17:43
by Barnowl
I am writing a plugin that needs to read gedcom files directly. (The problem was mentioned by Bundle in the thread "Problems with note fields truncating in FTM imports", and I am trying to fix it.)
I am very new to Lua though I have written a lot of VB and C. I have no trouble reading the gedcoms from Ancestry and FTM which are UTF-8.
My problem is that the FH gedcom is in full blown Unicode not UTF-8, and Lua does not seem to be able to read it: I am using
for sLine in io.lines(sFile) do
......
Can anybody point me to any syntax I haven't found that will read the unicode file?
I can make it work by opening the file in Notepad and doing a Save As to UTF-8. Is there any reason not to do this? FH still seems to like it, at least on the surface.
Re: Reading Gedcom file directly
Posted: 29 Jun 2015 18:30
by tatewise
Reading Unicode files (either UTF8 or UTF16) in LUA is very tricky, because LUA knows nothing about the byte encoding and treats every byte as an ANSI character, plus the newline character in UTF16 is not recognised by for sLine in io.lines(sFile) do.
I have written some library modules that will read Unicode files but you probably don't need them.
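For anyone who does want to see why the bytes need decoding, here is a minimal sketch in plain Lua. It is not the library modules mentioned above. It assumes the file is little-endian UTF16 with an optional FF FE byte order mark, and the function names utf8Char, utf16leToUtf8 and readUtf16Lines are made up for illustration.

```lua
-- Encode one Unicode code point as a UTF-8 byte string.
local function utf8Char(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Convert a UTF-16 little-endian byte string to UTF-8 (assumed byte order).
-- Unpaired surrogates are passed through unchecked.
local function utf16leToUtf8(data)
  local out = {}
  local i = 1
  if data:sub(1, 2) == "\255\254" then i = 3 end   -- skip the FF FE BOM
  while i + 1 <= #data do
    local lo, hi = data:byte(i, i + 1)
    local unit = lo + hi * 256
    i = i + 2
    -- A high surrogate followed by a low surrogate forms one code point.
    if unit >= 0xD800 and unit <= 0xDBFF and i + 1 <= #data then
      local lo2, hi2 = data:byte(i, i + 1)
      local unit2 = lo2 + hi2 * 256
      if unit2 >= 0xDC00 and unit2 <= 0xDFFF then
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (unit2 - 0xDC00)
        i = i + 2
      end
    end
    out[#out + 1] = utf8Char(unit)
  end
  return table.concat(out)
end

-- Read a whole file in binary mode and split the decoded text into lines.
-- Empty lines are skipped, which should not matter for Gedcom.
local function readUtf16Lines(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")
  f:close()
  local lines = {}
  for line in utf16leToUtf8(data):gmatch("[^\r\n]+") do
    lines[#lines + 1] = line
  end
  return lines
end
```

The file is read in binary mode in one go because io.lines splits on single newline bytes, which does not line up with two-byte UTF16 characters.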
If your objective is to correct the NOTE field CONCatenation level error from FTM Gedcom then that can be done much more simply.
The technique is to search all records and fields for the NOTE2 tag, then if any subsequent CONC UDF tags are found then concatenate their text onto the Note text and delete the CONC item.
e.g.
function Main()
    for intRec, strRec in ipairs({"INDI","FAM","SOUR"}) do -- Loop through desired record types
        local ptrNote, ptrConc, isOK
        local ptrRef = fhNewItemPtr()
        ptrRef:MoveToFirstRecord(strRec)
        while ptrRef:IsNotNull() do -- Loop through all data fields
            local ptrTag = ptrRef:Clone()
            local strTag = fhGetTag(ptrTag)
            ptrRef:MoveNextSpecial()
            if strTag == "NOTE2" then
                ptrNote = ptrTag:Clone() -- Remember latest Note field
            elseif strTag == "CONC" then
                ptrConc = ptrTag:Clone() -- Found CONCatenation field
                local strNote = fhGetValueAsText(ptrNote)
                local strConc = fhGetValueAsText(ptrConc)
                strNote = strNote..strConc -- Append CONC text to NOTE text
                isOK = fhSetValueAsText(ptrNote,strNote)
                isOK = fhDeleteItem(ptrConc) -- Delete CONC field
            end
        end
    end
end
Main()
Re: Reading Gedcom file directly
Posted: 29 Jun 2015 21:23
by Barnowl
Thanks for prompt reply.
In my simple way I thought if the file is wrong you open it up and fix it!
It works - very impressive - but I think there is some serious learning curve ahead of me!
I think I understand broadly how your code works - but could you explain why fhGetTag returns "NOTE2" for the note line but simply "CONC" for the following lines?
But again thanks
Ian
Re: Reading Gedcom file directly
Posted: 29 Jun 2015 21:46
by tatewise
Yes, you could work directly on a Gedcom file, but it would be easier to work on the UTF8 (or ANSI) file exported from FTM rather than the FH active UTF16 imported file.
However, in most cases, and this is one of them, it is even easier to work through the FH API and correct the internal FH database.
The FH internal tag names are not specifically to do with LUA Plugins, but generally the way Data References are formed. If you use any Data Reference Assistant in FH you will see that a local Note for say an INDIvidual record uses %INDI.NOTE2%, whereas a shared Note record uses %INDI.NOTE>%, despite the Gedcom tag in both cases being NOTE. This convention is used throughout to differentiate local tags from link tags without having to interrogate their parameter. So SOUR2 is a local Source Note whereas SOUR is a link to a Source Record, and OBJE2 is a local Media object whereas OBJE is a link to a Media record. With that convention it is easy to know whether to use fhGetValueAsText() or fhGetValueAsLink().
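As a small illustration of that convention, here is a sketch of a lookup covering only the three tag pairs named above. The helper name tagKind and the two tables are invented for the example, and other tags are deliberately reported as unknown rather than guessed at.

```lua
-- Only the tag pairs mentioned in this post; this is not a complete list.
local LOCAL_TAGS = { NOTE2 = true, SOUR2 = true, OBJE2 = true }  -- hold text
local LINK_TAGS  = { NOTE = true, SOUR = true, OBJE = true }     -- link to a record

-- Classify a field tag as "text", "link" or "unknown".
local function tagKind(strTag)
  if LOCAL_TAGS[strTag] then return "text" end
  if LINK_TAGS[strTag] then return "link" end
  return "unknown"
end

-- Inside a plugin the result might be used along these lines (not run here):
--   local strKind = tagKind(fhGetTag(ptrTag))
--   if strKind == "text" then
--     local strText = fhGetValueAsText(ptrTag)
--   elseif strKind == "link" then
--     local ptrLink = fhGetValueAsLink(ptrTag)
--   end
```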
There is a great deal of advice in Family Historian Plugins and plenty of Plugins to plagiarise.
Re: Reading Gedcom file directly
Posted: 01 Jul 2015 13:48
by Barnowl
I found a problem with your example, Mike.
If CONC records not associated with a NOTE are encountered, it will append them too, I think. "Fortunately" I ran into one before the first NOTE, so it died due to ptrNote being nil.
So I changed it:-
...
if strTag == "NOTE2" then
    ptrNote = ptrTag:Clone() -- Remember latest Note field
elseif strTag == "CONC" then
    if (ptrNote ~= nil) then
        ptrConc = ptrTag:Clone() -- Found CONCatenation field
        local strNote = fhGetValueAsText(ptrNote)
        local strConc = fhGetValueAsText(ptrConc)
        strNote = strNote..strConc -- Append CONC text to NOTE text
        isOK = fhSetValueAsText(ptrNote,strNote)
        isOK = fhDeleteItem(ptrConc) -- Delete CONC field
    end
else
    ptrNote = nil -- forget the NOTE field
end
...
btw I could not find how to make that nice scrolling code box come up

Re: Reading Gedcom file directly
Posted: 01 Jul 2015 15:47
by tatewise
Note any Plugin changes can be reversed using Edit > Undo Plugin Updates before closing FH.
My Plugin was a quick proof of concept prototype.
It needs refining to accommodate other fields that may have CONC UDF tags after importing a Gedcom.
These can include TEXT and PAGE tags as well as NOTE tags.
If any other tag is detected then the pointer should be set to null with SetNull() to prevent CONC text being appended incorrectly.
A test similar to yours is good defensive coding, but should use ptrNote:IsNotNull() or ptrNote:IsNull() as required.
Also, other record types than the three I included should be searched.
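The decision logic of those refinements can be modelled in plain Lua, away from FH. This is only a sketch: it works on an ordinary list of tag/text pairs rather than item pointers, and the names MERGEABLE and mergeConc are invented for the example. In a real Plugin the step marked "forget" would be ptrNote:SetNull() and the test would be ptrNote:IsNotNull().

```lua
-- Tags whose following CONC items should be merged (from the list above).
local MERGEABLE = { NOTE2 = true, TEXT = true, PAGE = true }

-- items: array of { tag = "...", text = "..." } in file order.
-- Returns a new array with CONC text appended to the preceding mergeable
-- field and the merged CONC items dropped.
local function mergeConc(items)
  local result = {}
  local target = nil   -- stands in for ptrNote; nil models a null pointer
  for _, item in ipairs(items) do
    if MERGEABLE[item.tag] then
      target = { tag = item.tag, text = item.text }  -- remember latest field
      result[#result + 1] = target
    elseif item.tag == "CONC" and target then
      target.text = target.text .. item.text         -- append, drop the CONC
    else
      target = nil                                    -- forget: any other tag ends the run
      result[#result + 1] = { tag = item.tag, text = item.text }
    end
  end
  return result
end

-- Example: a stray CONC before any NOTE is left alone instead of crashing.
local merged = mergeConc({
  { tag = "CONC",  text = "stray" },
  { tag = "NOTE2", text = "Hello " },
  { tag = "CONC",  text = "world" },
  { tag = "NAME",  text = "X" },
  { tag = "CONC",  text = "orphan" },
})
```

A CONC item with nothing to attach to is kept unchanged, which matches the defensive test discussed above.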
To format postings use the BBCode icons above this edit box (similar to word processor formatting).
To create scrolling code use the </> icon to insert 'code' tags.