Run Time and Memory Use Advice

Introduction

Some Plugins perform intensive repetitive operations which, on a large database of more than 10,000 Individuals, may take a long time or need large amounts of memory.

This article suggests how these resources can be minimised by using a few simple techniques. If the Plugin run time is measured in minutes rather than seconds, then even a 10% saving becomes significant.

Global v Local

As a general rule local variables are more efficient than global variables, but there are exceptions.

When a function requires a lookup table of constants, such as the one below, then it is faster if the table is global.

TblLookup = { A=1,E=2,I=3,O=4,U=5 }

This is because the global table is only created once, whereas a local table is created every time the function is called.
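
A minimal sketch of the difference in plain Lua (the function names CodeLocal and CodeGlobal are illustrative, not part of any snippet):

local function CodeLocal(strChar)
	-- Slower: this lookup table of constants is rebuilt on every call
	local tblLookup = { A=1,E=2,I=3,O=4,U=5 }
	return tblLookup[strChar]
end

-- Faster: the global table is created just once, when the script loads
TblLookup = { A=1,E=2,I=3,O=4,U=5 }
function CodeGlobal(strChar)
	return TblLookup[strChar]
end

Both functions return identical results; only the cost per call differs.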

It can also help to define local variables that reference global variables, especially indexed table entries. e.g.

local tblCode = TblLookup
local intNumb = IntNumber
local tblMode = TblMode[intNumb]

Where the same table lookup or other complex operation is required multiple times, assign the result to a local variable once and use that instead. e.g.

if tblCode[strA] > 1 and tblCode[strA] < intLast then
	intSum = intSum + tblCode[strA]
end

becomes

local intCode = tblCode[strA]		-- Look up the value once 
if intCode > 1 and intCode < intLast then
	intSum = intSum + intCode	-- Use the local variable three times
end

Some of the above techniques are illustrated in the Soundex (code snippet) examples, where the Global Variable Version runs about 5 times faster than the Local Variable Version, and the Function Prototype Version takes things one step further.

Although a local function within another function clarifies its scope of use, a global function is faster. This is probably because the global function is defined only once, whereas a local function is re-created every time its containing function is called.
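
A sketch of the two styles (TrimGlobal, TrimLocal and ProcessLines are illustrative names, not part of any snippet):

-- Faster: TrimGlobal is defined once, when the script loads
function TrimGlobal(strText)
	return (strText:gsub("^%s*(.-)%s*$","%1"))
end

function ProcessLines(tblLines)
	-- Slower: TrimLocal is re-created as a new closure every time ProcessLines is called
	local function TrimLocal(strText)
		return (strText:gsub("^%s*(.-)%s*$","%1"))
	end
	local tblOut = {}
	for i, strLine in ipairs(tblLines) do
		tblOut[i] = TrimGlobal(strLine)		-- TrimLocal(strLine) would give identical results
	end
	return tblOut
end
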

Progress Bar

The Progress Bar (code snippet) provides useful feedback and a cancel option for long-running Plugins. However, if the Global Variable Version's functions are called too frequently, their own code can significantly extend the run time.

Therefore, it may be better to avoid calling the ProgressDisplay.Step function on every loop step. e.g.

ProgressDisplay.Start("Loop Progress",9000)
intSteps = 0
for i = 1, 9000 do
	intSteps = intSteps + 1
	if intSteps == 100 then		-- Note that "if intSteps % 100 == 0 then" is slower
		intSteps = 0
		ProgressDisplay.Step(100)	-- Only update the Progress Bar every 100 steps
	end
	if ProgressDisplay.Stop() then break end
end

This problem has been mitigated in the Function Prototype Version by updating the display only when necessary, instead of on every Step.
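
One way of updating only when necessary is to throttle redraws by elapsed time rather than by step count. This is purely an illustrative sketch, not the actual snippet code; TimeToUpdate is a hypothetical helper:

local numLastUpdate = 0

-- Hypothetical helper: returns true at most once per numInterval seconds
function TimeToUpdate(numInterval)
	local numNow = os.clock()
	if numNow - numLastUpdate >= numInterval then
		numLastUpdate = numNow
		return true
	end
	return false
end

for i = 1, 9000 do
	if TimeToUpdate(0.1) then
		-- Redraw the progress display here, at most ten times per second
	end
end
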

Do not make the ProgressDisplay.Stop() call conditional on the intSteps count, otherwise it may make interrupting the loop and other interactions less responsive.

Large Files

Sometimes it is necessary to process the contents of large files line by line. For this it is much faster to use table.insert and table.concat than repeated string concatenation such as strText = strText..strLine, because each concatenation copies the whole string built so far. e.g.

local tblText = {}
for strLine in io.lines(strFile) do		-- Read through the file line by line
	strLine = strLine:gsub("abc","xyz")
	table.insert(tblText,strLine)		-- Insert the line of text as the next table entry
end
local strText = table.concat(tblText,"\n")	-- Concatenate the lines of text separated by newline
SaveStringToFile(strText,strFile)		-- See the Save String To File (code snippet)

Large Tables

Very large tables of data can arise, say when keeping the results of comparing each Individual Record in the database with every other Individual Record. For databases of up to 10,000 Individuals this amounts to fewer than 50,000,000 entries, but the total quickly escalates for larger databases and can exhaust available memory.

To avoid this problem, the table of results should be sorted periodically and the lowest-scoring entries pruned off. e.g.

if intScore >= intMinimum then				-- Continue if score is above lowest retained Results entry
	table.insert(tblResults,{ Score=intScore, ... })
	if #tblResults >= 2000 then			-- Prune low scores from Results to avoid exhausting memory
		table.sort( tblResults, function(tblA,tblB) return tblA["Score"] > tblB["Score"] end )
		for i = 1 , #tblResults / 2 do
			table.remove(tblResults)	-- Remove the lower 50% of the sorted Results
		end
		intMinimum = tblResults[#tblResults]["Score"]
	end
end
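
The fragment above can be made self-contained as follows (the cap of 10 entries and the AddResult helper are just for illustration; a real Plugin would use a much larger cap, such as the 2000 above):

local tblResults = {}
local intMinimum = 0				-- Lowest score worth keeping; rises as low entries are pruned

local function AddResult(intScore)
	if intScore >= intMinimum then		-- Continue if score reaches the lowest retained entry
		table.insert(tblResults, { Score = intScore })
		if #tblResults >= 10 then	-- Prune low scores once the cap is reached
			table.sort(tblResults, function(tblA,tblB) return tblA.Score > tblB.Score end)
			for i = 1, #tblResults / 2 do
				table.remove(tblResults)	-- Remove the lower 50% of the sorted results
			end
			intMinimum = tblResults[#tblResults].Score
		end
	end
end

for intScore = 1, 100 do
	AddResult(intScore)
end

After the loop, only the five highest scores (100 down to 96) remain in tblResults.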

Data Tables

When testing for alternative data values it is tempting to use if … then … elseif … structures, but when there are more than a few values this becomes inefficient. Consider the following, where each data reference tag may be tested several times:

ptrIndi = fhNewItemPtr()
ptrIndi:MoveToFirstRecord("INDI")
while ptrIndi:IsNotNull() do
	local ptrData = fhNewItemPtr()
	ptrData:MoveToFirstChildItem(ptrIndi) 
	while ptrData:IsNotNull() do
		local strTag = fhGetTag(ptrData)
		if strTag == "NAME" then
			-- Handle names
		elseif strTag == "FAMS" then
			-- Handle spouse
		elseif strTag == "SOUR" then
			-- Handle source
		end
		ptrData:MoveNext()
	end
	ptrIndi:MoveNext()
end

The following data table method is more efficient, as each data reference tag is looked up only once, and it becomes even more efficient as the number of alternatives increases. The only condition is that all the functions must accept the same parameters.

function HandleNames(ptrData)
	-- Handle names here
end

function HandleSpouse(ptrData)
	-- Handle spouse here
end

function HandleSource(ptrData)
	-- Handle source here
end

function Null()
	-- Handle anything else
end

tblWhat = {		-- Translate data tag to function
	NAME = HandleNames;
	FAMS = HandleSpouse;
	SOUR = HandleSource;
}

ptrIndi = fhNewItemPtr()
ptrIndi:MoveToFirstRecord("INDI")
while ptrIndi:IsNotNull() do
	local ptrData = fhNewItemPtr()
	ptrData:MoveToFirstChildItem(ptrIndi) 
	while ptrData:IsNotNull() do
		local strTag = fhGetTag(ptrData)
		local action = tblWhat[strTag] or Null
		action(ptrData)		-- Call one of the functions above
		ptrData:MoveNext()
	end
	ptrIndi:MoveNext()
end