Tuesday, February 27, 2018

Logic in the Presence of Unknowns Just Isn't

... logical?  ... executing? ... or as in my case, even vaguely working to my expectations.

CQL today has to work in the presence of unknown values.  We call these nulls.  Null has this weird property of taking over everything in tri-value oriented languages (those where null is expected), and blowing up everything in bi-value oriented languages (those where null is not so much expected).

How can you tell if your language is oriented towards tri-valuedlogic, or bi-valued logic?  Well, the simple answer is what happens when you compare null OR true.  If the answer is that an exception is thrown, you are definitely dealing with bi-valued, and if you get true, then you are dealing with tri-valued, and if you get null, someone screwed up.

So what happens when you try to build a language interpreter for a tri-valued logic system (like say CQL) in a language that is generally bi-valued (say Java).  Some problems around null values.  In the real world, null is a thing.  It happens.  People don't fill out all the fields in a form, some values are simply unknown, or dependent on workflow that hasn't happened yet.  But we still have to reason with it.

Here are some interesting things you need to think about:
When you sort a list of objects based on a field, where do the objects go where the field is null?  XSLT got this right by making a decision, even if you don't like it.  So the behavior is defined.
"The set of sort key values (after any conversion) is first divided into two categories: empty values, and ordinary values. The empty sort key values represent those items where the sort key value is an empty sequence. These values are considered for sorting purposes to be equal to each other, but less than any other value. The remaining values are classified as ordinary values."
CQL doesn't actually cover this case.  Here's what it has to say about sorting:
"After the iterative clauses are executed for each element of the query source, the sort clause, if present, specifies a sort order for the final output. This step simply involves sorting the output of the iterative steps by the conditions defined in the sort clause. This may involve sorting by a particular element of the result tuples, or it may simply involve sorting the resulting list by the defined comparison for the data type (for example, if the result of the query is simply a list of integers)."
So NOW what?  Well, I think a minor adjustment much like what XSL had to say is in order here.

Type conversion is another issue. If you have a defined process for converting from one type to another, then you should also have a defined process for converting null things that might have been a basic data type into other null things that could be a different data type.  For example, the null string should convert to the null date.

Taking type conversion a step further, the string "asdf192832340asdfa8" when converted to a date might in fact return a null value to indicate their is no conversion.  Or it could raise an error.  That's a decision that needs deciding.

What happens when you union or intersect two lists where the list itself is null?  At the very least the behavior needs to be defined.  To see where the problem lies, consider the following:

List<String> l = null;

Is l an empty list, or simply null?  People who build collections are in the habit of returning an empty collection rather than null, but sometimes the collection builder itself returns null because it perhaps doesn't even understand the type of null at execution time.  That's actually OK, just return Collections.EMPTY_LIST (which happens to be pretty much identical to Collections.EMPTY_SET).

Life gets dicey around nulls.  There are no easy rules, you have to think about it.

BTW: This isn't a dissing CQL. I quite like the language.  But then again, I've been known to write tremendous volumes of code in XSL as well, so that isn't necessarily great praise from someone sane ;-).  I'm simply reporting on some of the challenges I'm having in the hopes that they can be fixed, and that others trying to use it can watch out for the hidden (you might even say unknown) pitfalls that are still being worked out.

   Keith

0 comments:

Post a Comment