* Date Qualifier/Function/Operator Anomalies

Homeless Posts from the old forum system
Locked
User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 25 Aug 2010 18:39

I have been experimenting with Dates and the new FH4 Power User Features in Queries.
Qualifiers
The old Date Qualifiers such as :YEAR and :MONTH_NUMBER and :DAY are documented under Help - Date Formats.
Unfortunately many date formats with a qualifier added give a blank result.
For example %...DATE:DAY% only yields a number if a Simple date includes the day of month (15 Dec 1789 yields 15), but if the date is a Period or Range (e.g. after 15 Dec 1789) then it yields nothing, which cannot be detected using =IsEmpty(%...DATE:DAY%) or =NullDate(%...DATE:DAY%)!!
Functions
The new Power User Functions such as =Year() and =Month() and =Day() are better as they usually use the DATE:XDATE1 option.
=Year() yields a number not only for Simple dates but also Period, Range & Quarter dates.
=Month() similarly yields a number for Simple, Period, Range & Quarter dates if a month/quarter appears in the (1st) date.
=Day() yields a number for Simple, Period & Range dates if a day of the month appears in the (1st) date.
=DayNumber() is not so good and only yields a number for Simple dates with a day, a month & a year.
In all other cases these functions yield nothing, which cannot be detected using =IsEmpty(Month(%...DATE%)) or =NullDate(Day(%...DATE%))!!
Operators
The Operators such as can compare Simple, Period, Range & Quarter dates even with missing month or day.
A date that only defines the Year is deemed to come after the end of that year, so 1950 > 31 Dec 1950.
A date that only defines the Month & Year is deemed to come after the end of that month, so June 1950 > 30 June 1950.
This differs from the Query Columns Sort order that places such dates before the start of the year or start of the month, so 1950 < 1 Jan 1950 and June 1950 < 1 June 1950.
Suggestions
Preferably date Qualifiers should be improved in line with date Functions.
All date Qualifiers & Functions should yield a number, preferably 0 (zero), instead of nothing. This would be consistent with =DateAt(y,m,d) which accepts 0 (zero) for an undefined month or day.
Date comparison Operators and Query Sort order should be consistent, and in line with the 0 values above. Thus 1950 < 1 Jan 1950 and June 1950 < 1 June 1950.
Comments please!!!

ID:4745

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 25 Aug 2010 22:16

P.S.
Year Index
This takes the form %INDI.CENS[year=1881]% and selects any data item with an associated date matching the year.
Usually it is used to locate Census Events. See Help - Understanding Data References.
It works with Simple dates (even without a day or month) and any Quarter dates, but not Period or Range dates.
Presumably it could be modified to use the first of a Period or Range date similar to the =Year() function.

User avatar
SimonOrde
Program Designer
Posts: 352
Joined: 18 Nov 2002 10:20
Family Historian: V7
Location: Calico Pie

Date Qualifier/Function/Operator Anomalies

Post by SimonOrde » 26 Aug 2010 09:50

It is true that you cannot test for an empty string using either IsEmpty() or NullDate() - but neither of them are designed for that.  IsEmpty() is used to test whether a data item exists or has no value.  NullDate() is used to test whether a date is Null.

To test whether a string is empty, you should use the = operator and compare it to ''.  For example,

=IsTrue(Text(%INDI.BIRT.DATE:DAY%) = '')

will return True if the value returned is blank.  Using this approach you can output 0 instead of nothing if you prefer that.  e.g.

=NumberIf(Text(%INDI.BIRT.DATE:DAY%) = '',0,%INDI.BIRT.DATE:DAY%)

will output the day number, but substituting 0 for a blank string.

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 26 Aug 2010 15:22

Thank you Simon for that guidance - I had got to that solution in the end, but you have to admit it is rather cumbersome.
Furthermore, the same solution has to be used for the functions Day(), Month(), Year(), and DayNumber() even though the Help in Functions (all) clearly states that the Return Type is Number and not Text, and does not describe the conditions where the return value is blank.

Do you have any comments on my other observations and suggestions?

User avatar
SimonOrde
Program Designer
Posts: 352
Joined: 18 Nov 2002 10:20
Family Historian: V7
Location: Calico Pie

Date Qualifier/Function/Operator Anomalies

Post by SimonOrde » 03 Sep 2010 13:22

All functions, whatever their return type, can effectively return a null value if there is no value to return.  It's not obvious to me that it is more desirable to return 0 rather than blank, in such cases.  Blank cells in queries are surely easier to filter out visually.  Also, in some contexts a value of 0 might be quite misleading - e.g. to suggest that a person was aged 0 years when an event occurred, if you had no idea what their age was.  It is common in spreadsheets to leave blanks where a numeric value is not known.  What would you do with functions that returned dates, if there was no known date?  

Where there are inconsistencies in our practice, I do agree that we should look at that.  Thank you for your feedback in any case.

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 03 Sep 2010 21:46

Thank you for your comments Simon.
Perhaps, what is needed is an extra optional parameter to these functions to determine whether blank or 0 (zero) is returned when no date value exists.

We have focused a lot on these return values, but I would like to change the focus to the types of Date (Simple, Period, Range & Quarter) supported in different ways by the various power features.

My original posting was triggered by a Query design problem.
One Column sorted a Date field in Descending order.
Another Column chose a value according to the same Date being > (greater than) a fixed DateAt(value).
e.g.
Heading Expression Sort
Birth %INDI.BIRT[1].DATE% D
Alive? =TextIf((%INDI.BIRT.DATE% > DateAt(1841,6,6)),'No','Yes')

Strangely, in the Result Set, there were several Rows where the Alive? value switched back and forth between Yes and No.
e.g.
Birth Alive?
1843 No
Q1 1842 No
3 Jun 1841 Yes
Jun 1841 No
Q2 1841 Yes
Q1 1841 Yes
1841 No
Q3 1840 Yes
Q1 1840 Yes

That is when it became apparent that Query Sort and Comparison Operators use slightly different criteria.
So I tried to create a Column that would Sort in agreement with the Comparison.
I tried =DayNumber(%INDI.BIRT[1].DATE%) but that returns no value for most Dates illustrated above, and many others.
So I tried =Number(Year(%INDI.BIRT.DATE%) * 10000 + Month(%INDI.BIRT.DATE%) * 100 + Day(%INDI.BIRT.DATE%)) but that fails for the reasons discussed earlier.
I similarly tried the :YEAR and :MONTH_NUMBER and :DAY qualifiers, but these were even worse!
Finally =Number((NumberIf(Text(Year(%INDI.BIRT.DATE%)) = '',1000,Year(%INDI.BIRT.DATE%)) * 10000) + (NumberIf(Text(Month(%INDI.BIRT.DATE%)) = '',99,Month(%INDI.BIRT.DATE%)) * 100) + NumberIf(Text(Day(%INDI.BIRT.DATE%)) = '',99,Day(%INDI.BIRT.DATE%))) worked!?!?

That was when I posted my original comments......

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 04 Sep 2010 09:29

P.S.
Just as clarification, the crux of my original posting is outlined below.

The Query Sort algorithm must have to assign a numerical value to the Year, Month, and Day component of every conceivable Date value, even where that component is missing. Otherwise, a Sort Date Query Column would not work!

The Date Comparison Operators ( >= ) and the =DayNumber() Function should use these same Date component values in a consistent way.

The Date Functions (=Year(), =Month(), =Day(), etc.) and Qualifiers (:YEAR, :MONTH_NUMBER, :DAY, etc.) should also use these Date component values, with the possible variant that if their associated Date component is missing, then they yield a blank result.

While talking about dates, there is a small error in the FH4 Help on The Gregorian Calendar. It says the Julian Calendar Year had an average of 365 days whereas it should be 365.25 days according to Wikipedia (and many other online references), because it had a leap day in February every fourth year. Otherwise, in 400 years it would have been about 97 days behind, not 3 days ahead, based on modern calculations of year length at about 365.2422 days.

User avatar
SimonOrde
Program Designer
Posts: 352
Joined: 18 Nov 2002 10:20
Family Historian: V7
Location: Calico Pie

Date Qualifier/Function/Operator Anomalies

Post by SimonOrde » 07 Sep 2010 12:38

Tatewise - Family Historian's date sort algorithm does not simply use the date comparison operators because the requirements of the two are different.  Date sorting should present dates in an order that is presentationally most convenient for most users, most of the time. The job of the comparison operator however - like any other operator - is to give accurate, quantifiable results.  Consequently, FH won't say that X < Y if X MAY be earlier than Y.  It will only say that X < Y if, on the information available, X is definitely earlier than Y.  This is the policy adopted by database programs.  A value of NULL in a database never returns true if compared to a number. It is neither less than 5 nor greater than 5.  It is simply NULL.  But you still have to decide whether it appears before or after 5 when sorting values in lists (different programs sort them differently).  Yes this may sometimes lead to apparently anomalous results. But that's always going to happen whichever way you do it.

I suspect that the best solution would be to make available a new function 'DateSort' (or similar) which returns -1, 1 or 0, for any 2 dates, depending on whether FH sorts the first earlier than the second, later than the second, or neither.

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 08 Sep 2010 14:59

Firstly, in the context of the statement from Simon that FH 'will only say that X 6 June 1841 = True[/b] when it only MAY be True, but equally MAY be False?
Similarly it says 1841 (calc) < 6 June 1841 = False when it MAY be True?
Query Columns:
Image

Query Result Set:
Image

Secondly, I do not think a DateSort function, such as Simon suggests, would solve the Query problem that I originally posed. Could Simon perhaps explain how it would be used.

Thirdly, could Simon comment on my other suggestions regarding Date Functions & Qualifiers.

User avatar
PeterR
Megastar
Posts: 1129
Joined: 10 Jul 2006 16:55
Family Historian: V7
Location: Northumberland, UK

Date Qualifier/Function/Operator Anomalies

Post by PeterR » 08 Sep 2010 17:32

Comparisons involving partial dates appear to work on a balance of probabilities.  Since there are 24 days in June which are after 6 June 1841 and only 5 days in June before that date, it is certainly more likely that June 1841 > 6 June 1841 than otherwise.  This may be a more useful result than NULL, but perhaps there could be an option for NULL instead in such circumstances.  I'm not sure there is any way of satisfying all possible requirements, but greater consistency, where practicable, would be desirable.

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 08 Sep 2010 20:27

PeterR, I am sorry but your cunning assumption based on probabilities is incorrect.
If the date is changed such that the comparison is June 1841 > 30 June 1841 then the result is still True.
Furthermore, if the date is changed such that the comparison is 1841 > 31 Dec 1841 then that result is also True!
As stated in my original posting on: 25/08/10 at 18:39:51:
'A date that only defines the Year is deemed to come after the end of that year' but before the next, so 31 Dec 1841 < 1841 < 1 Jan 1842.
'A date that only defines the Month & Year is deemed to come after the end of that month' but before the next, so 30 June 1841 < June 1841 < 1 Jul 1841.
This is contrary to the Query Sort order, and does not support Simon Orde's explanation of how comparisons work.

User avatar
SimonOrde
Program Designer
Posts: 352
Joined: 18 Nov 2002 10:20
Family Historian: V7
Location: Calico Pie

Date Qualifier/Function/Operator Anomalies

Post by SimonOrde » 09 Sep 2010 10:02

Tatewise - I have had a quick look at your example, and agree that it doesn't match what I said - thank you for pointing that out.  It's a somewhat complicated area because as well as, effectively, 2 kinds of date comparison - the sort kind and the calculation kind - you also have 2 kinds of dates: a 'big' date that stores ranges, periods, quarters, etc; and a 'small' date, which marks the starts and ends of periods, ranges etc.  For the purposes of the compare operators, 'big' dates are mapped to 'small' dates.  We don't have a compare operator that works with big dates.  But even allowing for that, looking at the examples you gave, it does appear that the policy I stated on comparisons is not being applied consistently, at the level of comparisons at the month/day level when months or days are missing.  We have made a note of this and will look at it and any other inconsistencies in this area.

>> I do not think a DateSort function, such as Simon suggests, would solve the Query problem that I originally posed <<

I'm not sure what problem you are referring to.  Is the problem that you wanted the values in the 'Alive' column of your query result set to be grouped together?  If so you can achieve this by sorting on that column first, and then on the date next.  You can do this (in the Columns tab) either by moving the alive column before the birth column so that its sort has higher priority.  Or, if you don't want to do that, create a second hidden column for Birth, that comes after the 'Alive' column, and apply the date sort to that column.

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 09 Sep 2010 11:12

I know it has been a bit of a journey, but thank you for making a note to look into these inconsistencies.

I agree that to get the Alive? column values, in my example, to group together, I could sort as Simon suggests.
However, that does not need the DateSort function, so I still do not understand its purpose.

Furthermore, my Query Problem is much more complex than presented so far. I have simplified the examples in order to make the inconsistencies clearer.
The problem arose when designing my Kinsfolk Census Queries which are now posted in the FHUG Query Store.
The Kinsfolk Census To Do Table uses the Alive? comparison as only part of the Expression in each of 8 Columns (one Column per Census from 1841 to 1911).
If you change the Columns tab by inserting a new first Column with Expression: %INDI.BIRT[1].DATE% and sort Descending on this first Column instead of the Hidden Y+M+D Column, then the problem arises with the n/a (not alive) values in the Result Set not always being grouped together.
(Read the General tab Description for more details).

User avatar
Jane
Site Admin
Posts: 8440
Joined: 01 Nov 2002 15:00
Family Historian: V7
Location: Somerset, England
Contact:

Date Qualifier/Function/Operator Anomalies

Post by Jane » 09 Sep 2010 12:07

then the problem arises with the n/a (not alive) values in the Result Set not always being grouped together.
Could you use a hidden column which factors the alive into the equation for YMD? Perhaps setting the number -ive or very large, depending where you want them?

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 09 Sep 2010 12:30

The Y+M+D expression evaluates the Birth Date using the Comparison criteria instead of the Sort criteria, and that resolves the problem, but in a very inelegant way.
Factoring in the Alive decision is complicated by the fact that their are 8 such decisions, one per Census column!

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 13 Sep 2010 14:30

I have given these Date issues some consideration and may have hit upon an elegant solution.
This proposal hinges on the realisation that dates such as the month June 1841 and the year 1842 represent a duration, which for the sake of clarity will be called a date Span.
The June 1841 date Span = 1 Jun 1841 - 30 Jun 1841 and the 1842 date Span = 1 Jan 1842 - 31 Dec 1842.
If FH did not support such dates (with missing day/month components) users would have to enter them as a date Range or Period.
In the case of Range and Period dates the Qualifiers :XDATE1 and :XDATE2 provide the values of the 1st and 2nd dates.
This proposal suggests that new Span date Qualifiers :SPANMIN and :SPANMAX could provide similar but more comprehensive 1st and 2nd date values.
There could also be complementary Span date Functions =SpanMin() and =SpanMax() that return the same date values.
See the later table for illustrations of all the valid date formats and values.

In fact there is almost a precedent for this proposal in the case of Quarter dates, which have :XDATE1 and :XDATE2 1st and 2nd date values.
For instance, Q2 1841 has :XDATE1 = Apr 1841 and :XDATE2 = Jun 1841 in the current FH.
However, they ought to be :XDATE1 = 1 Apr 1841 and :XDATE2 = 30 Jun 1841 to fully define the quarter!!

With this proposal the June 1841 date Span would have :SPANMIN = SpanMin() = 1 Jun 1841 and :SPANMAX = SpanMax() = 30 June 1841.
If a birth date %INDI.BIRT.DATE% = June 1841, then currently %INDI.BIRT.DATE% > DateAt(1841,6,6) = True, but with this proposal %INDI.BIRT.DATE:SPANMIN% > DateAt(1841,6,6) = False and =SpanMax(%INDI.BIRT.DATE%) > DateAt(1841,6,6) = True.
This resolves the problem of grouping Date Comparison values together in Query Results when sorting by birth date %INDI.BIRT.DATE% as explained in the earlier postings.

The date Functions (=Day(), =Month(), =Year(), =DayNumber(), =Timespan(), etc, etc.) would support dates with the :SPANMIN and :SPANMAX Qualifiers.
This would resolve a problem with Dates exported to Spreadsheets, that do not recognise dates earlier than 1 Jan 1900, and thus cannot Sort such dates.
However, sorting will work on an exported column containing =DayNumber(SpanMin(%date%)) or =Timespan(DateAt(1,1,1), SpanMin(%date%), DAYS).

Table of Proposed Span Date Values:
Image
Where 1 Jan 9999 B.C. and 31 Dec 9999 represent the ealiest date and latest date supported by FH in the Gregorian Calendar.
Of course similar sets of dates must be applied for the other supported calendars.

Alternatively, instead of introducing :SPANMIN, :SPANMAX & :SPANTYPE, the existing :XDATE1, :XDATE2 & :XDATETYPE could be updated, but backward compatibility would be compromised.

avatar
PeterD
Gold
Posts: 14
Joined: 14 Oct 2009 14:25
Family Historian: None

Date Qualifier/Function/Operator Anomalies

Post by PeterD » 15 Sep 2010 11:15

Souds very very good, Wish I understood it !!!

Peter Driver[rolleyes]

User avatar
tatewise
Megastar
Posts: 27078
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Date Qualifier/Function/Operator Anomalies

Post by tatewise » 15 Sep 2010 11:55

It is trying to solve the problems discussed in the earlier postings.

A simple example may explain.
If you sort dates into order in a FH Query then 1841 is before June 1841 which is before 15 June 1841.
But if you use the Comparison operators then 1841 > June 1841 > 15 June 1841.

This is because 1841 and June 1841 are really a range of dates and some tests use the beginning of the range and others use the end.
The Span values allow the user to choose which dates they want to use.

Locked