Saturday, May 4, 2013

The problem with NULLs and T-SQL

I was reading Itzik Ben Gan's T-SQL Fundamentals book as part of the reading assignment for the SQL Pass Book Readers Group.

He made a great point about dealing with nulls and the IN clause and he had a great example.  Rather than use his database, I wrote my own script to demonstrate the problem, and thought I'd post the issue here, succinctly.

First run the setup script.  This creates two tables and populates it with data.

create table customers
(custID int
, custName varchar(100));
create table orders
(orderID int
, custID int
, OrderDate datetime);

insert into customers
(custID, CustName)
(1, 'Ike Ellis')
, (2, 'John Ackerman')
, (3, 'Scott Reed')
, (4, 'Brad Cunningham')
, (5, 'Llewellyn Falco');
insert into orders
(orderID, custID, OrderDate)
(1, 1, '20130101')
, (2, 1, '20130102')
, (3, 1, '20130103')
, (4, 3, '20130104')
, (5, 3, '20130105')
, (6, 4, '20130106')
, (7, 4, '20130107')

Next, we want to find out how many customers have not placed orders:

select * from customers
where custid not in (select custid from orders)

We get two results, John Ackerman and Llewellyn Falco.

Now let's add a NULL into the Orders table:

insert into orders
(orderid, custid, orderdate)
(8, null, '20130108')

And now, let's run the same query:

select * from customers
where custid not in (select custid from orders)

And we get no results.  This is because the column allows NULLs and NULL comparisons always return false.  Essentially the NULL customerID could be John Ackerman or Llewellyn Faloc.  We don't really know if it's them or not, so instead of returning them, we just won't return anything.  This is by design.  So how do we get the results we want.  We can do three things:

1)  We can make sure that columns that don't need null values are created with NOT NULL, thus disallowing them to be inserted. 

2)  We can strip out the NULLs in the subquery:
select * from customers
where custid not in (select custid from orders where custid is not null)

3)  We can use exists, which only uses two-value predicate logic, not three-value logic, thus giving us the desired result:
select * from customers c
where not exists (select * from orders o where o.custid = c.custid)

Hope that little script helps!

No comments:

Post a Comment