* Reading Gedcom file directly


Post by Barnowl » 29 Jun 2015 17:43

I am writing a plugin that needs to read Gedcom files directly. (This was mentioned by Bundle in the thread "Problems with note fields truncating in FTM imports"; I am trying to fix that problem.)
I am very new to Lua, though I have written a lot of VB and C. I have no trouble reading the Gedcom files from Ancestry and FTM, which are UTF-8.
My problem is that the FH Gedcom file is in full-blown Unicode, not UTF-8, and Lua does not seem to be able to read it. I am using:

for sLine in io.lines(sFile) do
......

Can anybody point me to any syntax I haven't found that will read the Unicode file?

I can make it work by opening the file in Notepad and doing a Save As to UTF-8. Is there any reason not to do this? FH still seems to like it, at least on the surface.
Ian Johnson - researching Bain, Batley, Elsden, Ewen and Johnson families and the village of Easton Royal

Post by tatewise » 29 Jun 2015 18:30

Reading Unicode files (either UTF-8 or UTF-16) in Lua is tricky, because Lua knows nothing about the encoding and treats every byte as a single-byte (ANSI) character. In addition, UTF-16 encodes each newline as two bytes, so for sLine in io.lines(sFile) do does not split the lines correctly.
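
To illustrate what is involved, here is a minimal sketch in plain Lua. It is not the library code referred to in this thread. It assumes the file is UTF-16 little-endian with an optional byte order mark, reads it in binary mode, converts it to UTF-8, and only then splits it into lines. The function names utf8Char, utf16leToUtf8 and readUtf16Lines are hypothetical.

```lua
-- Hypothetical helper: encode one Unicode code point as UTF-8 bytes
local function utf8Char(cp)
	if cp < 0x80 then
		return string.char(cp)
	elseif cp < 0x800 then
		return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
	elseif cp < 0x10000 then
		return string.char(0xE0 + math.floor(cp / 0x1000),
			0x80 + math.floor(cp / 0x40) % 0x40,
			0x80 + cp % 0x40)
	end
	return string.char(0xF0 + math.floor(cp / 0x40000),
		0x80 + math.floor(cp / 0x1000) % 0x40,
		0x80 + math.floor(cp / 0x40) % 0x40,
		0x80 + cp % 0x40)
end

-- Hypothetical helper: convert a UTF-16LE byte string to a UTF-8 string
function utf16leToUtf8(data)
	local out = {}
	local i = 1
	if data:sub(1, 2) == "\255\254" then i = 3 end	-- Skip byte order mark FF FE
	while i + 1 <= #data do
		local lo, hi = data:byte(i, i + 1)
		local cp = lo + hi * 256	-- Low byte comes first in little-endian
		i = i + 2
		if cp >= 0xD800 and cp <= 0xDBFF and i + 1 <= #data then
			local lo2, hi2 = data:byte(i, i + 1)
			local cp2 = lo2 + hi2 * 256
			if cp2 >= 0xDC00 and cp2 <= 0xDFFF then	-- Combine a surrogate pair
				cp = 0x10000 + (cp - 0xD800) * 0x400 + (cp2 - 0xDC00)
				i = i + 2
			end
		end
		out[#out + 1] = utf8Char(cp)
	end
	return table.concat(out)
end

-- Hypothetical helper: read a UTF-16LE file and return its lines as UTF-8 strings
function readUtf16Lines(sFile)
	local file = assert(io.open(sFile, "rb"))	-- Binary mode, so no bytes are translated
	local data = file:read("*a")
	file:close()
	local text = utf16leToUtf8(data)
	if text ~= "" and text:sub(-1) ~= "\n" then text = text.."\n" end
	local lines = {}
	for sLine in text:gmatch("(.-)\r?\n") do	-- Split on LF or CR LF after conversion
		lines[#lines + 1] = sLine
	end
	return lines
end
```

With that in place, for _, sLine in ipairs(readUtf16Lines(sFile)) do would take the place of the io.lines loop. Big-endian UTF-16 files are not handled by this sketch.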

I have written some library modules that will read Unicode files, but you probably don't need them.

If your objective is to correct the NOTE field CONCatenation level error from FTM Gedcom then that can be done much more simply.

The technique is to search all records and fields for the NOTE2 tag; whenever CONC UDF tags follow it, append their text to the Note text and delete the CONC item.
For example:


function Main()
	for intRec, strRec in ipairs({"INDI","FAM","SOUR"}) do	-- Loop through desired record types
		local ptrNote, ptrConc, isOK
		local ptrRef = fhNewItemPtr()
		ptrRef:MoveToFirstRecord(strRec)
		while ptrRef:IsNotNull() do	-- Loop through all data fields
			local ptrTag = ptrRef:Clone()
			local strTag = fhGetTag(ptrTag)
			ptrRef:MoveNextSpecial()
			if strTag == "NOTE2" then
				ptrNote = ptrTag:Clone()		-- Remember latest Note field
			elseif strTag == "CONC" then
				ptrConc = ptrTag:Clone()		-- Found CONCatenation field
				local strNote = fhGetValueAsText(ptrNote)
				local strConc = fhGetValueAsText(ptrConc)
				strNote = strNote..strConc	-- Append CONC text to NOTE text
				isOK = fhSetValueAsText(ptrNote,strNote)
				isOK = fhDeleteItem(ptrConc)	-- Delete CONC field
			end
		end
	end
end

Main()

Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry


Post by Barnowl » 29 Jun 2015 21:23

Thanks for prompt reply.
In my simple way I thought if the file is wrong you open it up and fix it!
It works - very impressive - but I think there is some serious learning curve ahead of me!
I think I understand broadly how your code works, but could you explain why fhGetTag returns "NOTE2" for the note line but simply "CONC" for the following lines?
But again, thanks.
Ian

Post by tatewise » 29 Jun 2015 21:46

Yes, you could work directly on a Gedcom file, but it would be easier to work on the UTF8 (or ANSI) file exported from FTM rather than the FH active UTF16 imported file.

However, in most cases, and this is one of them, it is even easier to work through the FH API and correct the internal FH database.

The FH internal tag names are not specific to Lua plugins; they follow the general way Data References are formed. If you use any Data Reference Assistant in FH you will see that a local Note for, say, an INDIvidual record uses %INDI.NOTE2%, whereas a link to a shared Note record uses %INDI.NOTE>%, even though the Gedcom tag in both cases is NOTE. This convention is used throughout to differentiate local tags from link tags without having to interrogate their parameter. So SOUR2 is a local Source Note whereas SOUR is a link to a Source record, and OBJE2 is a local Media object whereas OBJE is a link to a Media record. With that convention it is easy to know whether to use fhGetValueAsText() or fhGetValueAsLink().
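
As a small illustration of that convention, the sketch below classifies a tag name of the kind fhGetTag() returns. It covers only the three tag pairs named above, and the table tblTagKind and the function TagKind are hypothetical names, not part of the FH API.

```lua
-- Tag pairs taken from the explanation above: the "2" suffix marks the local form
local tblTagKind = {
	NOTE2 = "local", NOTE = "link",
	SOUR2 = "local", SOUR = "link",
	OBJE2 = "local", OBJE = "link",
}

-- Hypothetical helper: classify a tag name such as fhGetTag() returns
function TagKind(strTag)
	return tblTagKind[strTag] or "other"	-- Tags outside the table are not classified
end

-- Inside a plugin the result could choose the accessor, for example:
--   if TagKind(fhGetTag(ptrItem)) == "link" then
--       local ptrRecord = fhGetValueAsLink(ptrItem)	-- Follow the link to the record
--   end
```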

There is a great deal of advice in Family Historian Plugins and plenty of plugins to plagiarise.

Post by Barnowl » 01 Jul 2015 13:48

I found a problem with your example, Mike.
If CONC records not associated with a NOTE are encountered, it will append them too, I think. "Fortunately" I ran into one before the first NOTE, so it died because ptrNote was nil.
So I changed it:
...
			if strTag == "NOTE2" then
				ptrNote = ptrTag:Clone()		-- Remember latest Note field
			elseif strTag == "CONC" then
				if (ptrNote ~= nil) then
					ptrConc = ptrTag:Clone()		-- Found CONCatenation field
					local strNote = fhGetValueAsText(ptrNote)
					local strConc = fhGetValueAsText(ptrConc)
					strNote = strNote..strConc	-- Append CONC text to NOTE text
					isOK = fhSetValueAsText(ptrNote,strNote)
					isOK = fhDeleteItem(ptrConc)	-- Delete CONC field
				end
			else
				ptrNote = nil		-- Forget the NOTE field
			end
...

btw I could not find how to make that nice scrolling code box come up :(

Post by tatewise » 01 Jul 2015 15:47

Note that any plugin changes can be reversed using Edit > Undo Plugin Updates before closing FH.

My Plugin was a quick proof of concept prototype.
It needs refining to accommodate other fields that may have CONC UDF tags after importing a Gedcom.
These can include TEXT and PAGE tags as well as NOTE tags.

If any other tag is detected then the pointer should be reset with SetNull() to prevent CONC text being appended incorrectly.

A test similar to yours is good defensive coding, but should use ptrNote:IsNotNull() or ptrNote:IsNull() as required.

Also, record types other than the three I included should be searched.
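
Putting those refinements together, a revised sketch might look like the following. It has not been tested against a real FTM import. The function name MergeConcTags and the table tblTextTags are hypothetical, and it is an assumption that fhGetTag() reports the TEXT and PAGE fields under exactly those names. The record type list is still only the original three and would need extending.

```lua
-- Fields that may be followed by stray CONC UDF tags (assumed list; extend as needed)
local tblTextTags = { NOTE2 = true, TEXT = true, PAGE = true }

-- Hypothetical helper: append CONC text to the preceding text field in each record type
function MergeConcTags(tblRecTypes)
	local intMerged = 0
	for _, strRec in ipairs(tblRecTypes) do
		local ptrText = fhNewItemPtr()	-- Stays null until a text field is found
		local ptrRef = fhNewItemPtr()
		ptrRef:MoveToFirstRecord(strRec)
		while ptrRef:IsNotNull() do	-- Loop through all data fields
			local ptrTag = ptrRef:Clone()
			local strTag = fhGetTag(ptrTag)
			ptrRef:MoveNextSpecial()	-- Step on before ptrTag is possibly deleted
			if tblTextTags[strTag] then
				ptrText = ptrTag	-- Remember latest text field
			elseif strTag == "CONC" and ptrText:IsNotNull() then
				local strText = fhGetValueAsText(ptrText)..fhGetValueAsText(ptrTag)
				if fhSetValueAsText(ptrText, strText) then
					fhDeleteItem(ptrTag)	-- Delete CONC field only once its text is saved
					intMerged = intMerged + 1
				end
			else
				ptrText:SetNull()	-- Any other tag ends the run of CONC fields
			end
		end
	end
	return intMerged	-- Number of CONC fields merged
end

function Main()
	MergeConcTags({"INDI","FAM","SOUR"})	-- Add further record types as required
end

-- Run only where the FH API is available, so the file can also be loaded elsewhere
if fhNewItemPtr then Main() end
```

Because the pointer is reset on every tag that is neither a text field nor a CONC, a CONC that appears before any text field, or after an unrelated field, is left untouched.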

To format postings use the BBCode icons above this edit box (similar to word processor formatting).
To create scrolling code use the </> icon to insert 'code' tags.