* fhFileUtils library problems

Writing and using plugins for Version 5 and above.

fhFileUtils library problems

Post by tatewise » 11 Oct 2021 11:51

For the Family History Multi-Project Person Index (19877) posting I've developed an FH 7.0 only plugin using the "fhFileUtils" library module to read multiple GEDCOM files. It works well in many respects but I've run into a number of problems.

fhFileUtils.readTextFile() does not read large files:
This mostly works well but leaves FH 'Not Responding' for files above a certain size.
It is OK for files up to about 1,500 KB but files above 40,000 KB fail to be read. I'm not sure where the breakpoint is.
It does handle file paths containing non-ASCII characters such as ë Ā Ē Ī Ō Ū unlike the functions below.

fhFileUtils.fileGetContents() does not handle non-ASCII paths:
This reads very large files OK, but does not handle file paths containing non-ASCII characters such as ë Ā Ē Ī Ō Ū
fhFileUtils.fileExists() says the file exists, but fhFileUtils.fileGetContents() says it does not:

An error has occurred - plugin failed to complete
...es (x86)\Family Historian\Program\Lua\fhFileUtils.fh_lua:311:
Unable to open file E:\Mike\OneDrive\Documents\Family Historian Projects\Test Unicode ĀĒĪŌŪ\Test Unicode ĀĒĪŌŪ.txt for binary read.
E:\Mike\OneDrive\Documents\Family Historian Projects\Test Unicode ĀĒĪŌŪ\Test Unicode ĀĒĪŌŪ.txt: No such file or directory

fhFileUtils.getShortPath() does not handle non-ASCII paths:
The documentation says: "Gets the short path for a file or folder, suitable for use with lua io library"
It does not seem to alter the file path in any way and fails with non-ASCII characters such as ë Ā Ē Ī Ō Ū
Short = fhFileUtils.getShortPath(File)
Handle = io.open(Short)
That sets Handle to nil even though fhFileUtils.fileExists() says the file exists.
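
For anyone wanting to reproduce this, io.open returns nil plus an error message when it fails, so the reason can be captured rather than just seeing a nil handle. A minimal sketch in plain Lua follows; tryOpen is a hypothetical helper name, not part of fhFileUtils, and the path is a deliberately missing one.

```lua
-- Try to open a file for binary reading and report why it failed.
-- tryOpen is a hypothetical helper name, not an fhFileUtils function.
function tryOpen(sPath)
  local fHandle, sError = io.open(sPath, "rb")
  if not fHandle then
    return nil, sError          -- io.open supplies the reason as a string
  end
  return fHandle
end

local fHandle, sError = tryOpen("no-such-folder/no-such-file.txt")
if fHandle then
  fHandle:close()
else
  print("Open failed: " .. sError)
end
```

With a non-ASCII path the message should be of the same "No such file or directory" kind as the one quoted above for fileGetContents().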

BTW: The following functions handle file paths containing non-ASCII characters such as ë Ā Ē Ī Ō Ū correctly:
fhFileUtils.fileExists(), fhFileUtils.getParent(), fhFileUtils.getFolderContents()
I have not yet checked all the other file handling functions. Should I do so?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Post by tatewise » 11 Oct 2021 15:12

After some searching around in Google, I think I know why fhFileUtils.getShortPath() does not provide short paths.

In recent Windows versions the following Registry key is set to 1 by default:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\NtfsDisable8dot3NameCreation
* 0: Enables 8dot3 name creation for all volumes on the system.
* 1: Disables 8dot3 name creation for all volumes on the system.
* 2: Sets 8dot3 name creation on a per volume basis.
* 3: Disables 8dot3 name creation for all volumes except the system volume.
That value can be changed, which needs a reboot, but doing so still does not create short names for existing folders & files.
Short names can be added by Administrators using the following command at a command prompt, but that does not help much:
fsutil file setshortname "filename" "shortname"

Therefore, short paths are generally unavailable in Windows 10 and so Lua io functions cannot be relied on for non-ASCII file paths!

The corollary is that fhFileUtils functions that rely on Lua io can never support non-ASCII file path names.
i.e. fhFileUtils.fileGetContents() and fhFileUtils.filePutContents()

Post by ColeValleyGirl » 11 Oct 2021 15:58

Good to know why getShortPath worked for me when I tested it -- I just checked that value on my system and it's set to 0 (no idea how) -- the function will be dropped from the next version of the library.

The problem with fhFileUtils.readTextFile() and large files is a limitation of the ADODB stream control used, and there's nothing we can do short-term other than document that it's unreliable for use with very large files. The size of files you're talking about isn't a use case we'd considered -- we were thinking fact sets, config files and the like. The file limit is about 20 MB if I'm correct (although different webpages suggest different values). There might be ways around it, reading/writing line by line but with a corresponding performance hit, and I'm not convinced the effort is worthwhile for the edge case of large files. I'll put it as low priority on my to-do list to investigate.

Give me time to look into the fileGetContents/filePutContents issue -- I know what the problem is but need to think how to address it. The other functions use FileSystemObjects, so will handle Unicode file names correctly.

Post by tatewise » 11 Oct 2021 16:26

Thank you. We are on a similar path.

I am investigating ways of using ADODB for the fhFileUtils.fileGetContents() and fhFileUtils.filePutContents() functions.
So far, it seems that the stream must be told its Type is Binary to handle all types of file, i.e. so.Type = 1 (adTypeBinary; 2 is adTypeText, the default).

I am also looking at fhFileUtils.readTextFile() to see if there is a workaround for the Stream size problem.
The so:LoadFromFile(sPath) command seems to load the enormous file and so.Size correctly gives its size.
It is the so:ReadText() command that hangs FH and I'm experimenting with so:ReadText(1000000) to read chunks at a time with some success. Watch this space.
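
The chunked-read idea can be sketched independently of ADODB. The function below only assumes an object with a ReadText(iChars) method and an EOS flag, which ADODB.Stream provides; readInChunks and newFakeStream are hypothetical names, and the fake stream exists purely so the loop can be run outside FH.

```lua
-- Read a whole text stream in fixed-size pieces.
-- oStream: any object offering ReadText(iChars) and an EOS flag (as ADODB.Stream does).
-- iChunk : number of characters to request per call.
function readInChunks(oStream, iChunk)
  local tText = {}
  while not oStream.EOS do
    tText[#tText + 1] = oStream:ReadText(iChunk)
  end
  return table.concat(tText)    -- one concat at the end is cheaper than repeated ..
end

-- In-memory stand-in for a stream, for illustration only.
function newFakeStream(sData)
  local oStream = { iPos = 1, EOS = (#sData == 0) }
  function oStream:ReadText(iChars)
    local sPart = sData:sub(self.iPos, self.iPos + iChars - 1)
    self.iPos = self.iPos + #sPart
    self.EOS = self.iPos > #sData
    return sPart
  end
  return oStream
end

local sAll = readInChunks(newFakeStream(string.rep("0 @I1@ INDI\n", 1000)), 4096)
print(#sAll)
```

Looping on EOS instead of counting down from so.Size avoids mixing bytes (Size) with characters (ReadText), but whether it behaves any better with the '(Not Responding)' problem would need testing in FH.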

Post by tatewise » 11 Oct 2021 18:44

I have both fhFileUtils.fileGetContents() and fhFileUtils.readTextFile() working with a 50 MB file and non-ASCII characters in the file path name. I need to perform a few more tests and refine the performance.
I'll also look at fhFileUtils.filePutContents() and fhFileUtils.createTextFile() and post my findings tomorrow.
It is looking OK. :D

Post by tatewise » 12 Oct 2021 13:50

This is what I have achieved with readTextFile(), createTextFile(), fileGetContents() & filePutContents().

The biggest breakthrough is reading the ADO stream in 100,000 byte chunks, which handles enormous files quickly.
Reading in larger chunks, or the whole stream, invokes the '(Not Responding)' problem and FH gets very slow.
This is the typical chunk reading script which has very little impact on small files:

Code:

      so:Open()
      so:LoadFromFile(sPath)                      -- load the whole file into the stream
      local iSize = so.Size                       -- stream size in bytes
      local tText = {}
      repeat                                      -- read in chunks to avoid the hang
        table.insert(tText,so:ReadText(100000))
        iSize = iSize - 100000
      until iSize <= 0
      so:Close()
      return table.concat(tText)                  -- join all the chunks into one string

Another new feature is auto-detection of file encoding such as Unicode UTF-8 or UTF-16 or defaulting to ASCII.
This is useful when reading/writing GEDCOM or other files where the encoding cannot be predicted.
( BTW: The ASCII CharSet strips down to 7-bit bytes so cannot handle ANSI files with accented characters! )
This is the text encoding detection script that is only invoked if bUnicode or iBits are not supplied:

Code:

        bUnicode = bUnicode or false
        if sText:match("^\xEF\xBB\xBF")                   -- EF BB BF ("ï»¿") = UTF-8 BOM
        or sText:match("[\xC2-\xF4][\x80-\xBF]+") then    -- UTF-8 multi-byte encoding pattern
          bUnicode = true
          iBits = 8
        elseif sText:match("^\xFF\xFE")                   -- FF FE ("ÿþ") = UTF-16 BOM
            or sText:match("^.\0") then                   -- UTF-16 2-byte encoding, 2nd byte 0
          bUnicode = true
          iBits = 16
        end

The full revised and tested readTextFile() and createTextFile() function script is attached below.
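
For experimenting outside the library, the detection rules above can be wrapped in a standalone function. This is only a sketch and differs from the script above in ways that are my own assumptions: it tests the byte order marks before looking at content, it also recognises big-endian UTF-16, and the function name and the returned labels ("ANSI" covers plain ASCII too) are invented for illustration.

```lua
-- Guess a text encoding from raw bytes: byte order marks first, then content.
-- Returns "UTF-8", "UTF-16LE", "UTF-16BE" or "ANSI" (labels are illustrative).
function detectEncoding(sText)
  local b1, b2, b3 = sText:byte(1, 3)
  if b1 == 0xEF and b2 == 0xBB and b3 == 0xBF then return "UTF-8" end
  if b1 == 0xFF and b2 == 0xFE then return "UTF-16LE" end
  if b1 == 0xFE and b2 == 0xFF then return "UTF-16BE" end
  -- No BOM: an ASCII first character in UTF-16 has exactly one zero byte
  if b1 and b2 then
    if b2 == 0 and b1 ~= 0 then return "UTF-16LE" end
    if b1 == 0 and b2 ~= 0 then return "UTF-16BE" end
  end
  -- A UTF-8 lead byte (C2-F4) followed by a continuation byte (80-BF)
  if sText:find("[\194-\244][\128-\191]") then return "UTF-8" end
  return "ANSI"
end

print(detectEncoding("\239\187\191" .. "0 HEAD"))   -- text starting with a UTF-8 BOM
print(detectEncoding("0\0 \0H\0E\0A\0D\0"))         -- BOM-less little-endian UTF-16
print(detectEncoding("0 HEAD"))                     -- plain 7-bit text
```

Like the original, this is a heuristic: BOM-less UTF-16 that starts with a non-ASCII character is not recognised, and ANSI text that happens to contain a byte pair resembling UTF-8 will be misreported.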

I've experimented with ADO stream versions of fileGetContents() and filePutContents() with varying success.

The fileGetContents() ADO stream version works perfectly well when reading files in binary mode.
It returns exactly the same file contents as the Lua version using io.open with read("*all").
The ADO version does handle non-ASCII file paths whereas the Lua version did not.
That is probably all that is required. I see no point in trying to read in text mode.
The fileGetContents() ADO version script is attached below.

The filePutContents() ADO stream version is unsuccessful in binary mode.
The so:Write(sContents) command always fails with the message:
An error has occurred - plugin failed to complete
[string "C:\Users\Mike\AppData\Local\Temp\~fhF69.tmp"]:392: COM exception:(d:\my\lua\luacom-master\src\library\tluacom.cpp,382):Arguments are of the wrong type, are out of acceptable range, or are in conflict with one another.
No changes have been made to data records.
The problematic filePutContents() ADO version script is attached below.

However, I wonder if this function serves any purpose. When would a Plugin need to compose & write a binary file?
There are functions to copy, move & rename files. Isn't that enough?
Attachments
fhFileUtils Script.fh_lua
ADO Stream based functions
(6.19 KiB)

Post by ColeValleyGirl » 12 Oct 2021 14:15

Mike, I'll have to look at this later this week -- but thanks for what you've done so far.

Re filePutContents, I suspect somebody will come up with a reason to need it! I seem to remember coming across the same problem in the past -- I will dig back through my notes.

Post by tatewise » 12 Oct 2021 14:57

I have incorporated the readTextFile() function into my next version of the Family History Multi-Project Person Index (19877) associated plugin to handle non-ASCII file paths, and enormous GEDCOM files with unknown character encoding.

If we cannot get the filePutContents() ADO stream version to work, then stick with the Lua io.open write version, but warn users that non-ASCII file paths are not supported, although fhConvertUTF8toANSI() will fix some.

Post by tatewise » 13 Oct 2021 10:41

A Google search reveals that many other users get that error with ADO Stream Write buffer.
I have tried the various solutions, but although they sidestep the error, usually by using text instead of binary mode, none of them produces the desired output file, which is typically corrupt for whatever format is involved.

So, I have created a workaround filePutContents() function that uses the Lua io.open write method, but if the file path has UTF-8 characters it goes via a temporary file in the FH ProgramData folder and then uses moveFile().
That revealed a bug in the moveFile() function checking the parent folder, so that function has never been tested!
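
The shape of that workaround is roughly as follows. This is a sketch rather than the attached script: hasNonAscii and putContentsVia are hypothetical names, the temporary path is passed in by the caller (the real version uses the FH ProgramData folder), and the move step is an injected function standing in for the library's moveFile().

```lua
-- True if the path contains any byte outside 7-bit ASCII (e.g. UTF-8 multi-byte characters).
function hasNonAscii(sPath)
  return sPath:find("[\128-\255]") ~= nil
end

-- Write sContents to sPath, going via sTempPath when sPath is not plain ASCII.
-- fnMove(sFrom, sTo) must cope with Unicode paths and return true on success.
function putContentsVia(sPath, sContents, sTempPath, fnMove)
  local bDetour = hasNonAscii(sPath)
  local sTarget = bDetour and sTempPath or sPath
  local fHandle, sError = io.open(sTarget, "wb")
  if not fHandle then
    return false, sError
  end
  fHandle:write(sContents)
  fHandle:close()
  if bDetour then
    return fnMove(sTempPath, sPath)   -- hand the Unicode path to a function that supports it
  end
  return true
end

print(hasNonAscii("Test Unicode \196\128\196\146.txt"))   -- path containing UTF-8 bytes
print(hasNonAscii("C:/Plain/Path.txt"))
```

The temporary path must itself be plain ASCII for io.open to succeed, which is why a fixed, known folder is a reasonable choice for it.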

Should I test all the other fhFileUtils library functions for similar minor bugs?

The attached fhFileUtils Script.fh_lua has the workaround script for filePutContents() and corrected moveFile() function.

[ P.S. A bit more testing reveals that moveFile() crashes if the destination file exists so that needs a bit more correction! ]
Attachments
fhFileUtils Script.fh_lua
ADO Stream based functions, etc
(6.73 KiB)

Post by ColeValleyGirl » 13 Oct 2021 10:59

Mike, do you have a diff file? -- otherwise I'm going to have to go through line for line eyeballing the changes to merge them into the main branch in the github repository (and doing manual merges like that introduces errors like the one you've spotted).

Re testing, no thanks. I'm going to have to run through testing everything again once the changes are made.

Post by tatewise » 13 Oct 2021 11:50

I did not think a diff file would be very useful, because the changed functions all involve an almost total rewrite, apart from the 2nd half of createTextFile() and now moveFile()!

Thinking about moveFile() a bit more, does it need a bOverwrite parameter like createTextFile()?

Many functions return false but provide no reason. Should they provide a 2nd returned error message similar to many Lua functions?
e.g.
return false, "Parent folder missing or file exists with no overwrite."
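
To make the suggestion concrete, here is a sketch of the two-value convention combined with a bOverwrite check. Nothing here is existing library code: checkTarget is a hypothetical name, and the existence test is passed in as a function (fhFileUtils.fileExists() would be the obvious candidate) so the example runs in plain Lua.

```lua
-- Decide whether a file may be written, reporting the reason when it may not.
-- fnExists(sPath) is an injected existence check returning true or false.
function checkTarget(sFile, sParent, bOverwrite, fnExists)
  if not fnExists(sParent) then
    return false, "Parent folder missing: " .. sParent
  end
  if fnExists(sFile) and not bOverwrite then
    return false, "File exists and overwrite not allowed: " .. sFile
  end
  return true
end

-- Caller pattern, using a stand-in existence check
local tPresent = { ["C:/Data"] = true, ["C:/Data/old.txt"] = true }
local function fnExists(sPath) return tPresent[sPath] == true end

local bOK, sError = checkTarget("C:/Data/old.txt", "C:/Data", false, fnExists)
if not bOK then
  print(sError)
end
```

Callers that only test the first result keep working unchanged, which is the main attraction of adding the message as a second return value.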

With that amount of change would an entire replacement fhFileUtils.fh_lua file be satisfactory?

Post by ColeValleyGirl » 13 Oct 2021 12:01

Mike, hold off a while -- I need to talk about this elsewhere, especially about changing the interface.

Post by tatewise » 13 Oct 2021 19:00

OK, I will hold off for a while.

However, I could not resist investigating all the functions a little more.

IMO many of them do not check the file path parameters thoroughly enough.
That can lead to many reasons why the File System Object functions might crash with little explanation.
e.g. the wrong type of file path supplied (file versus folder), existing or missing file or folder, etc.

Several functions would seem to benefit from a bOverwrite parameter just like createTextFile().
e.g. copyFile(), copyFolder(), moveFile(), moveFolder(), filePutContents()

The ADODB based functions can raise the error "This library requires ADODB (part of Microsoft .NET)".
That is followed by a 'return false' statement that will never be executed even if the user employs pcall().
So the return statement is redundant.
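
The point about the unreachable return can be checked in plain Lua. In this sketch both function names are hypothetical and the bAvailable flag simply stands in for whatever test the library makes for ADODB; the first function mirrors the pattern described above, the second shows the alternative of returning false plus a message.

```lua
-- Pattern as described: error() never returns, so 'return false' is dead code.
function needAdoRaising(bAvailable)
  if not bAvailable then
    error("This library requires ADODB (part of Microsoft .NET)")
    return false                -- never reached
  end
  return true
end

-- Alternative: report the problem through return values instead of raising.
function needAdoReturning(bAvailable)
  if not bAvailable then
    return false, "This library requires ADODB (part of Microsoft .NET)"
  end
  return true
end

-- Under pcall the 'false' comes from pcall itself, followed by the error message.
local bOK, vResult = pcall(needAdoRaising, false)
print(bOK, vResult)
print(needAdoReturning(false))
```

Either style is defensible, but mixing them in one function only leaves code that can never run.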

What do you think?

Post by E Wilcock » 16 Oct 2021 09:23

I have downloaded it but it doesn't run.
Nothing happens.

The Place Project I happen to be working on at the moment is not vast: 2730 people. My largest similar project has 5855.

Post by ColeValleyGirl » 16 Oct 2021 09:54

Evelyn, it won't do anything on its own -- it's a library for use by plugins/plugin authors, not a plugin for users.

Post by E Wilcock » 16 Oct 2021 10:03

Thank you. Apologies.
