* Function Factories and Iterators (advanced Lua)

Post by Jane » 19 Oct 2012 17:05

I have been experimenting with writing my own iterators and function factories.

I have added a couple of iterators to the Code Snippets.
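Those iterators are not reproduced here. Purely as an illustration of the idea, a hypothetical sortedPairs iterator (my own name, not one of the Code Snippets) might look like this. The function handed to the generic for keeps its sorted key list and its position as locals of the enclosing function, so it needs no globals.

```lua
-- A closure-based iterator: visits a table's entries in key order.
-- sortedPairs is a hypothetical helper, not one of the forum's Code Snippets.
local function sortedPairs(tbl)
    local keys = {}
    for k in pairs(tbl) do
        keys[#keys + 1] = k
    end
    table.sort(keys)            -- assumes the keys are mutually comparable (e.g. all strings)
    local i = 0                 -- private position, kept alive as an upvalue
    return function ()
        i = i + 1
        local k = keys[i]
        if k ~= nil then
            return k, tbl[k]
        end
        -- returning nothing (nil) ends the generic for loop
    end
end

-- prints Jones J520, then Mullins M452, then Smith S530
for name, code in sortedPairs({ Smith = 'S530', Jones = 'J520', Mullins = 'M452' }) do
    print(name, code)
end
```

Sorting the keys up front costs a little time, but it gives a stable, readable order when listing something like a Soundex cache, which plain pairs() does not guarantee.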

This afternoon I took a look at Mike's Soundex routine and tried a little experiment: using a function factory to return the function while setting up its lookup tables as local variables of the factory.

I suspect it is slightly quicker than the other version, and it also has the advantage of being self-contained, with no global variables other than the factory function itself.
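A minimal sketch of the general function-factory pattern, using a hypothetical newCounter (not from the plugin): each call to the factory creates fresh local variables, and the returned function keeps using them between calls without any global being created.

```lua
-- A function factory: each call to newCounter creates a fresh local 'count'
-- and returns a closure that keeps using it between calls.
-- newCounter is a hypothetical example, not part of any plugin API.
local function newCounter(start)
    local count = start or 0          -- private to the returned function (an upvalue)
    return function ()
        count = count + 1
        return count
    end
end

local c1 = newCounter()               -- count starts at 0
local c2 = newCounter(10)             -- a separate count, starting at 10

local a = c1()                        -- 1
local b = c1()                        -- 2: c1 remembered its count
local z = c2()                        -- 11: c2 has its own count
print(a, b, z)
print(rawget(_G, 'count'))            -- nil: no global variable was created
```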

Any thoughts?

--[[
@Title:      Soundex Calculator
@Author:   Jane Taubman
@Version:   1.0
@LastUpdated:   October 2012
@Description:   Function to Convert any Name to Soundex as per http://en.wikipedia.org/wiki/Soundex
      and http://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#SoundExAndCensus
]]


function newSoundEx()
    local TblSoundex = {} -- Soundex dictionary of previously coded Names & Places
    
    local TblCodeNum = { -- Soundex code number table
        A=0,E=0,I=0,O=0,U=0,Y=0, -- H=0,W=0, -- H & W are ignored
        B=1,F=1,P=1,V=1,
        C=2,G=2,J=2,K=2,Q=2,S=2,X=2,Z=2,
        D=3,T=3,
        L=4,
        M=5,N=5,
        R=6
    }
    return function (strAnyName)
        strAnyName = (strAnyName:gsub('[^%a]', '')):upper() -- Make name upper case letters only (brackets keep gsub's first result)
        if strAnyName == '' then return 'Z000' end
        local strSoundex = TblSoundex[strAnyName] -- If already coded then return previous Soundex code
        if strSoundex then return strSoundex end
        strSoundex = string.sub(strAnyName,1,1) -- Soundex starts with initial letter (reuses the local above)
        local tblCodeNum = TblCodeNum -- Local alias; TblCodeNum is already a local of newSoundEx, not a global
        local strLastNum = tblCodeNum[strSoundex] -- Set initial Soundex code number
        for i = 2, string.len(strAnyName) do
            local strCodeNum = tblCodeNum[string.sub(strAnyName,i,i)] -- Step through Soundex code of each subsequent letter
            if strCodeNum then
                if strCodeNum > 0 and strCodeNum ~= strLastNum then -- Neither a vowel nor the same as the preceding Soundex code
                    strSoundex = strSoundex..strCodeNum -- So append Soundex code until 4 chars long
                    if string.len(strSoundex) == 4 then
                        TblSoundex[strAnyName] = strSoundex -- Save code for future quick lookup
                        return strSoundex
                    end
                end
                strLastNum = strCodeNum -- Save as preceding Soundex code (H and W have no entry, so are skipped)
            end
        end
        strSoundex = string.sub(strSoundex..'0000',1,4) -- Pad code with zeroes to 4 chars long
        TblSoundex[strAnyName] = strSoundex -- Save code for future quick lookup
        return strSoundex
    end -- anonymous Soundex function
end -- function newSoundEx



local strSoundex = newSoundEx()  -- Create strSoundex Function.
print(os.clock())
for i = 1,10000 do
    for _,v in ipairs({'Mullins','Smith','Jones','Smith'}) do
        local s = (strSoundex(v))
    end
end
print(os.clock())
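For a tidier measurement than two raw os.clock() readings, a sketch along these lines (timeCalls is a hypothetical helper) reports the elapsed CPU time directly and builds the name list once, so the cost of constructing a new table on every pass is not included in the figure. string.upper is used only as a stand-in; pass the Soundex function instead.

```lua
-- Time a function over a fixed list of names, reporting elapsed CPU seconds.
-- The name list is built once, outside the loop, so table construction
-- is not part of the measurement. timeCalls is a hypothetical helper.
local function timeCalls(fn, names, reps)
    local start = os.clock()
    for _ = 1, reps do
        for _, name in ipairs(names) do
            fn(name)
        end
    end
    return os.clock() - start
end

local names = { 'Mullins', 'Smith', 'Jones', 'Smith' }
local seconds = timeCalls(string.upper, names, 10000)   -- string.upper stands in for strSoundex
print(string.format('%.3f seconds', seconds))
```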
Jane
My Family History : My Photography "Knowledge is knowing that a tomato is a fruit. Wisdom is not putting it in a fruit salad."

Post by tatewise » 19 Oct 2012 20:04

I am not sure I understand quite how it works, but it works very well, and here are my initial comments.

(1) Passing the input parameter using (strSoundex(v)) looks slightly unusual.

(2) local TblSoundex = { } used to be global, because its contents must be preserved between function calls, and this new technique achieves that somehow with a local!

(3) local TblSoundex = { } used to be global to allow it to be saved in & loaded from a Plugin data file between Plugin runs, but this benefit was marginal, and with this new technique it is even smaller.

(4) This new technique is so much quicker that local TblSoundex = { } can be dispensed with altogether, with further run time savings.

(5) local TblCodeNum = { } used to be global, otherwise it got rebuilt every time the function was called, but this new technique somehow does it better.

(6) This new function does run about 10% to 20% faster. Excellent, Jane.
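Points (2) and (5) can be seen in a small experiment. This sketch (buildTable, slowLookup and newLookup are hypothetical names) counts how often a lookup table is constructed: once per call when the table is built inside the function, but only once in total when it is built by a factory and captured by the returned function, which also keeps it alive between calls.

```lua
-- Count how often a lookup table is constructed.
-- buildTable, slowLookup and newLookup are hypothetical names for illustration.
local buildCount = 0

local function buildTable()
    buildCount = buildCount + 1
    return { B = 1, F = 1, P = 1, V = 1 }
end

local function slowLookup(letter)
    local codes = buildTable()        -- a new table on every call
    return codes[letter]
end

local function newLookup()
    local codes = buildTable()        -- built once, when the factory runs
    return function (letter)
        return codes[letter]          -- 'codes' persists as an upvalue
    end
end

for _ = 1, 3 do slowLookup('B') end
print(buildCount)                     -- 3

buildCount = 0
local fastLookup = newLookup()
for _ = 1, 3 do fastLookup('B') end
print(buildCount)                     -- 1
```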
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Post by Jane » 20 Oct 2012 08:35

tatewise wrote: (1) Passing the input parameter using (strSoundex(v)) looks slightly unusual.
My fault: that line previously had a print statement, and I forgot to remove the brackets.

So it can simply be:
local soundex = strSoundex(v)
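The extra brackets are harmless. In Lua, putting a call in parentheses keeps only its first result, and assigning to a single local discards any extra results anyway, so both forms give the same value. The difference only shows when a function returns more than one value, as string.gsub does (the new string and the number of substitutions).

```lua
-- string.gsub returns two values: the new string and the number of replacements.
local s, n = string.gsub('Mc Donald', '%A', '')      -- '%A' matches any non-letter
print(s, n)            -- McDonald   1

-- Parentheses around the call keep only the first result.
local t, m = (string.gsub('Mc Donald', '%A', ''))
print(t, m)            -- McDonald   nil
```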
tatewise wrote: (4) This new technique is so much quicker that local TblSoundex = { } can be dispensed with altogether, with further run time savings.
So do you think it's worth simplifying the routine and removing the caching? I just did a simple test and I think the cache is worth retaining, but probably not worth saving to storage between runs.
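Caching of this kind is often written as a general "memoize" factory. This is a generic illustration, not the Soundex code (memoize and slowUpper are hypothetical names): the cache is a local of the factory, so it survives between calls without being visible anywhere else.

```lua
-- A general memoising factory: 'cache' is a local of memoize, so it persists
-- between calls but cannot be reached by the rest of the program.
-- memoize and slowUpper are hypothetical examples.
local function memoize(fn)
    local cache = {}
    return function (key)
        local value = cache[key]
        if value == nil then          -- not cached yet (nil results are never cached)
            value = fn(key)
            cache[key] = value
        end
        return value
    end
end

local calls = 0
local function slowUpper(s)
    calls = calls + 1                 -- count the real (uncached) calls
    return string.upper(s)
end

local fastUpper = memoize(slowUpper)
fastUpper('smith'); fastUpper('jones'); fastUpper('smith')
print(calls)                          -- 2: the second 'smith' came from the cache
```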
tatewise wrote: (5) local TblCodeNum = { } used to be global, otherwise it got rebuilt every time the function is called, but this new technique somehow does it better.
The technique (closures) is described in Chapter 6.1, and once you get your head around it, it can be really useful. It is especially handy where a function needs a table (Lua's nearest equivalent of a pointer), as the function can have its own 'private' table, which exists from the moment the factory is called and cannot be altered by the main program (just remember never to return it, in case someone changes it).
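A small sketch of the "private" table point, with hypothetical names: the first factory only exposes functions that use its table, while the second returns the table itself, after which the main program can change its contents.

```lua
-- The factory's table can only be reached through the functions it returns.
-- newRegistry and newLeakyRegistry are hypothetical examples.
local function newRegistry()
    local names = {}                      -- private table shared by both closures
    local function add(name)
        names[#names + 1] = name
    end
    local function count()
        return #names
    end
    return add, count                     -- the table itself is deliberately not returned
end

local add, count = newRegistry()
add('Smith')
add('Jones')
print(count())                            -- 2

-- Returning the table hands the caller a reference to it, so the
-- "private" contents can then be changed from outside.
local function newLeakyRegistry()
    local names = { 'Smith' }
    return function () return names end
end

local getNames = newLeakyRegistry()
getNames()[1] = 'changed from outside'
print(getNames()[1])                      -- changed from outside
```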

Post by tatewise » 20 Oct 2012 16:56

I have read Chapter 6.1 and now understand what is happening.

I have run some more performance tests using Find Duplicate Individuals.
Contrary to my first thoughts, the cache is definitely worth retaining.
It is also probably worth saving to storage between runs, especially for very large databases.

So below is my suggested revised prototype function.
By default it only uses an internal cache.
But it allows an external cache to be specified as a parameter.
The two examples show the two modes of usage.
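The optional-parameter idiom can be shown in isolation. In this sketch (newRecorder is a hypothetical name), "tbl or {}" creates a private table when nothing is passed, while a table supplied by the caller is shared by reference, so the caller sees everything the returned function stores in it.

```lua
-- 'tbl or {}' supplies a fresh table when the caller passes nothing (nil).
-- Tables are passed by reference, so a caller-supplied table is shared with
-- the closure and the caller sees every entry the closure adds.
-- newRecorder is a hypothetical example.
local function newRecorder(tbl)
    tbl = tbl or {}                       -- reuse the parameter; no second 'local' needed
    return function (key, value)
        tbl[key] = value
    end
end

local internal = newRecorder()            -- private table, unreachable from here
internal('SMITH', 'S530')

local shared = {}
local external = newRecorder(shared)      -- the caller keeps a reference
external('JONES', 'J520')
print(shared.JONES)                       -- J520
```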

function NewSoundex(tblSoundex) -- Prototype Soundex Calculator Function

      tblSoundex = tblSoundex or {} -- Soundex dictionary cache: the caller's table if given, else a private one

      local tblCodeNum = { -- Soundex code number table
            A=0,E=0,I=0,O=0,U=0,Y=0, -- H=0,W=0, -- H & W are ignored
            B=1,F=1,P=1,V=1,
            C=2,G=2,J=2,K=2,Q=2,S=2,X=2,Z=2,
            D=3,T=3,
            L=4,
            M=5,N=5,
            R=6
      }

      return function (strAnyName)
            strAnyName = (strAnyName:gsub('[^%a]', '')):upper() -- Make name upper case letters only (brackets keep gsub's first result)
            if strAnyName == '' then return 'Z000' end
            local strSoundex = tblSoundex[strAnyName] -- If already coded then return previous Soundex code
            if strSoundex then return strSoundex end
            strSoundex = string.sub(strAnyName,1,1) -- Soundex starts with initial letter (reuses the local above)
            local strLastNum = tblCodeNum[strSoundex] -- Set initial Soundex code number
            for i = 2, string.len(strAnyName) do
                  local strCodeNum = tblCodeNum[string.sub(strAnyName,i,i)] -- Step through Soundex code of each subsequent letter
                  if strCodeNum then
                        if strCodeNum > 0 and strCodeNum ~= strLastNum then -- Neither a vowel nor the same as the preceding Soundex code
                              strSoundex = strSoundex..strCodeNum -- So append Soundex code until 4 chars long
                              if string.len(strSoundex) == 4 then
                                    tblSoundex[strAnyName] = strSoundex -- Save code for future quick lookup
                                    return strSoundex
                              end
                        end
                        strLastNum = strCodeNum -- Save as preceding Soundex code (H and W have no entry, so are skipped)
                  end
            end
            strSoundex = string.sub(strSoundex..'0000',1,4) -- Pad code with zeroes to 4 chars long
            tblSoundex[strAnyName] = strSoundex -- Save code for future quick lookup
            return strSoundex
      end -- anonymous function
end -- function NewSoundex

-- Default Internal Cache Example --

StrSoundex = NewSoundex() -- Soundex Calculator with internal cache

print(os.clock())
for i = 1,10000 do
      for _,v in ipairs({'Mullins','Smith','Jones','Smith'}) do
            local s = StrSoundex(v)
      end
end
print(os.clock())

-- Advanced External Cache Example --

TblSoundex = {} -- Soundex dictionary cache of previously coded names

StrSoundex = NewSoundex(TblSoundex) -- Soundex Calculator with external cache saved between Plugin runs

print(os.clock())
for i = 1,10000 do
      for _,v in ipairs({'Mullins','Smith','Jones','Smith'}) do
            local s = StrSoundex(v)
      end
end
print(os.clock())

for m,n in pairs (TblSoundex) do
      print(m,n)
end
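The external cache example does not itself save TblSoundex between runs. One possible way, sketched with the standard Lua io library only, is to write one NAME=CODE line per entry. saveCache and loadCache are hypothetical helpers, and the temporary file is a stand-in: a real plugin would use whatever data-file location it already uses. The line format assumes keys are letters only and codes are letters and digits, which holds for the names and codes produced by the Soundex function above.

```lua
-- Save and reload a Soundex cache with the standard io library.
-- Assumes keys are letters only and values are letters/digits (e.g. 'S530'),
-- so a simple "KEY=VALUE" line format is safe. saveCache/loadCache are hypothetical.
local function saveCache(tbl, path)
    local f = assert(io.open(path, 'w'))
    for key, value in pairs(tbl) do
        f:write(key, '=', value, '\n')
    end
    f:close()
end

local function loadCache(path)
    local tbl = {}
    local f = io.open(path, 'r')
    if not f then return tbl end          -- first run: no file yet, start empty
    for line in f:lines() do
        local key, value = line:match('^(%a+)=(%w+)$')
        if key then tbl[key] = value end  -- silently skip malformed lines
    end
    f:close()
    return tbl
end

local path = os.tmpname()                 -- stand-in for a real plugin data file
saveCache({ SMITH = 'S530', JONES = 'J520' }, path)
local reloaded = loadCache(path)
print(reloaded.SMITH, reloaded.JONES)     -- S530   J520
os.remove(path)
```

In a plugin, the idea would be to call loadCache at start-up, pass the result to NewSoundex, and call saveCache on the same table before the plugin ends.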

Post by Jane » 21 Oct 2012 09:19

That looks good, Mike. Do you want to update the Code Snippets?

Post by tatewise » 21 Oct 2012 11:36

Yes, I will. Perhaps adding the prototype function as Code Version 3 would be best, to show the progression from simple functions to the prototype method.

Post by Jane » 21 Oct 2012 11:53

Good idea.