Occupations
Posted: 18 Jun 2014 09:19
by DavidNewton
I am thinking of using the Occupation Descriptor as well as the Occupation description, and my thought is to split my occupation data into 'Occupation' and 'Descriptor'. I would welcome opinions as to how this should be done, or even whether it should be done.
At this moment I am thinking along these lines: 'Coal miner, pony driver underground' becomes Coal Miner (Descriptor: Pony Driver Underground). The advantage of this is that in Working with Data > Occupations I would get overall numbers easily. The disadvantage, of course, is that the fine detail becomes almost invisible, but it does show in the custom query 'Occupations List'.
http://www.fhug.org.uk/wiki/doku.php?id ... ccupations
I have searched the forum for Occupation Descriptor and came up with just one hit
http://www.fhug.org.uk/forum/viewtopic. ... tor#p18846
in which the use of the descriptor is discussed.
David
Re: Occupations
Posted: 18 Jun 2014 12:12
by tatewise
As you say there are problems with the visibility of the Descriptor sub-field.
An alternative is to use an unusual character such as tilde (~) between the two parts of the Occupation.
e.g. 'Coal miner ~ pony driver underground'
Then a Plugin similar to my Occupations Per Census Year and Gender could produce a Result Set of counts for just the part of the Occupation preceding the tilde (~).
Re: Occupations
Posted: 18 Jun 2014 13:11
by DavidNewton
Mike
I was unaware of your plugin. I have downloaded it and will now spend some time trying to follow how it works and then, I hope, modify it to use your idea of the ~ separator, as it solves both my issues simultaneously.
David
Re: Occupations
Posted: 18 Jun 2014 16:27
by DavidNewton
Excellent. Your idea of the separator works really well. Following your plugin, and drastically reducing the amount of data collected, I was able to write a simple plugin that just counts the number of individuals who recorded a particular occupation and discards duplicate mentions of the same occupation. I twigged while writing the plugin that in Working with Data each mention of an occupation is counted, whereas in the Occupations List query only the first occupation that an individual lists is mentioned. Your ~ idea means that as they move tasks within the industry they are still counted as coal miners.
All that is left is to go through all the occupations data and edit it into an appropriate format. Many thanks.
David
Re: Occupations
Posted: 18 Jun 2014 21:00
by jimlad68
David, any chance of publishing (as work in progress) or inserting a copy of your amended code in this post? It will save reinventing the wheel and/or help others with future coding.
Re: Occupations
Posted: 18 Jun 2014 22:33
by DavidNewton
Jim
I am not intending to publish this, so work in progress is not accurate. However, I am attempting to attach the plugin file. Please excuse my non-standard choices of variable names - I am not a programmer.
Re: Occupations
Posted: 19 Jun 2014 07:13
by Jane
Hi David, just a thought: you could use match rather than find, and that way you can extract the string in one line. It's worth getting to grips with match and gsub as they are really useful.
For example, the pattern '(.-) ~' will grab all the characters up until the first space ~ combination, and if match returns nil the 'or' returns the original string.
So your function would reduce to
Code:
function ExtOccu(str)
  return str:match('(.-) ~') or str
end
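A quick way to see what that one-liner does is to run it over a few sample strings. This is only a sketch: the sample strings are made up for illustration, and the third one assumes someone has typed the tilde without a leading space.

```lua
-- Jane's one-line extractor, as posted above.
function ExtOccu(str)
  -- Lazy capture of everything before the first " ~";
  -- if there is no " ~" then match returns nil and we fall back to str.
  return str:match('(.-) ~') or str
end

-- Hypothetical sample data, not taken from any real project.
local samples = {
  'Coal miner ~ pony driver underground',
  'Coal miner',
  'Coal miner~hewer',  -- no space before the tilde, so the pattern does not match
}

for _, s in ipairs(samples) do
  print(s .. '  =>  ' .. ExtOccu(s))
end
```

Note that the third sample comes back unchanged, so the separator has to be typed consistently (space then tilde) for this pattern to work.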
Re: Occupations
Posted: 19 Jun 2014 08:03
by DavidNewton
Thanks Jane
In the book I have been using, string.find comes first and it did the job; also, I generally didn't have to worry too much about patterns.
I'm trying to work out the pattern you suggested. Is this correct?
The () means grab the string within; '.-' means 0 or more characters, minimally expanded until it reaches the two characters ' ~'.
So if I wanted to grab the space as well, ensuring that it is a space, would the pattern be '(.- )~', and if I just wanted to grab everything up to the first '~' would it be '(.-)~'?
David
Re: Occupations
Posted: 19 Jun 2014 08:20
by tatewise
Yes, exactly correct David.
The match function has a further trick.
If you also want to capture the string after the ~ then use:
Code:
before, after = str:match("(.-)~(.*)")
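As a sketch of how that two-capture form might be wrapped up, the function below also trims the spaces around each part and copes with strings that have no tilde at all. The name SplitOccu and the trimming step are my own assumptions for illustration, not something from the plugin.

```lua
-- Hypothetical helper: split an occupation at the first tilde.
-- Returns the part before and the part after, both trimmed of spaces.
-- If there is no tilde, returns the whole string and nil.
function SplitOccu(str)
  local before, after = str:match('(.-)~(.*)')
  if not before then
    return str, nil                        -- no tilde found
  end
  before = before:match('^%s*(.-)%s*$')    -- trim leading/trailing whitespace
  after  = after:match('^%s*(.-)%s*$')
  return before, after
end

local occ, detail = SplitOccu('Coal miner ~ pony driver underground')
print(occ)      -- Coal miner
print(detail)   -- pony driver underground
```

Without the nil check, a string with no tilde would leave both before and after as nil, which is easy to trip over later.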
While editing your Plugin, click its Help > Lua Online Reference Manual to display the reference manual in your browser.
For details on this scenario scroll down the index and click on one of the following:
5 – Standard Libraries
5.4 – String Manipulation
5.4.1 – Patterns
There is a neater method for updating your counter for Occupations.
The logic is that the Individual Occupation entry is nil (which Lua treats as false) until the Occupation sets it to true.
If the Individual Occupation entry is false (i.e. not true), then the count is initialised to 0 if it does not yet exist, and incremented.
So your two if statements become:
Code:
if not IndiOcc[strOcc] then                     -- Occupation not found yet for this Individual
  IndiOcc[strOcc] = true                        -- Record that Occupation is found
  Occup[strOcc] = ( Occup[strOcc] or 0 ) + 1    -- If count is nil then use 0 and finally add 1
end
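To show the idea end to end, here is a minimal sketch of the whole count using plain Lua tables in place of the Family Historian records. The function name CountOccupations and the sample data are assumptions for illustration only; a real plugin would loop over Individuals and their Occupation facts instead.

```lua
-- Main occupation = text before the first " ~", else the whole string.
function ExtOccu(str)
  return str:match('(.-) ~') or str
end

-- Hypothetical stand-in for the plugin's main loop.
-- 'people' is a list; each entry is the list of occupation strings
-- recorded for one individual.
function CountOccupations(people)
  local Occup = {}                       -- occupation -> number of individuals
  for _, occupations in ipairs(people) do
    local IndiOcc = {}                   -- reset for each individual
    for _, text in ipairs(occupations) do
      local strOcc = ExtOccu(text)
      if not IndiOcc[strOcc] then        -- not yet seen for this individual
        IndiOcc[strOcc] = true
        Occup[strOcc] = (Occup[strOcc] or 0) + 1
      end
    end
  end
  return Occup
end

-- Made-up sample data.
local people = {
  { 'Coal miner ~ pony driver underground', 'Coal miner ~ hewer' },
  { 'Coal miner', 'Farmer' },
  { 'Farmer' },
}

local counts = CountOccupations(people)
print('Coal miner', counts['Coal miner'])   -- 2
print('Farmer', counts['Farmer'])           -- 2
```

The first individual has two coal mining entries but is counted once, which is the behaviour David describes. The comparison is case sensitive, so 'Coal miner' and 'Coal Miner' would be counted separately unless the data is edited into a consistent form first.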
For more advice see the V5 Plugins Developer Guide section (plugins:index).
Re: Occupations
Posted: 19 Jun 2014 11:16
by DavidNewton
That is much neater and the logic is still transparent - a very important factor. I have a tendency to include too many comments but I do not want to fall into the trap described below. The remainder of this post is not relevant to Occupations but is relevant to Documentation.
Many years ago I read a book, "The Mathematical Experience" by Philip Davis & Reuben Hersh. In one of the chapters they describe the 'Ideal Mathematician'. Let me quote two short passages about the ideal mathematician; I have left out some of the text.
"To his fellow experts he communicates his results in a casual shorthand. If you apply a tangential mollifier to the left quasi-martingale you get an estimate better than quadratic..."
However,
"His writing follows an unbreakable convention: to conceal any sign that the author, or the intended reader, is a human being. ... The intended readers (all twelve of them) can decode the formal presentation....and see what the author is doing and why he does it. But for the noninitiate this is a cipher that will never yield its secret