Andy Lester

Technology, careers, life and being happy

SELECT * is a bug waiting to happen

| 0 comments

A SQL SELECT statement that use * instead of an explicit column list is a bug waiting to happen.  Beyond the quick-and-dirty prototyping stage, every SQL query in an application should explicitly specify the columns it needs to protect against future changes.

Say you’ve got a table and code like this:

USERS table:
id integer NOT NULL
name varchar(100) NOT NULL
mail varchar(100)

my $query = perform_select( 'select * from users' );
while ( my $row = $query->fetch_next ) {
    if ( defined($row{mail}) ) {
        # do something to send user mail
    }
}

Later on, someone goes and renames the users.mail column to users.email. Your program will never know it. The email branch will just never execute.

Here’s another example. Say you’ve got that users table joining to departments, like so

users table:
id integer NOT NULL
name varchar(100) NOT NULL
email varchar(100)
deptid integer

dept table:
id integer NOT NULL
deptname varchar(100) NOT NULL

SELECT *
FROM users u JOIN dept d ON (u.deptid = d.id)

So your selects come back with id, name, email, deptid, id, deptname. You’ve got “id” in there twice. How does your DB layer handle that situation? Which “id” column takes precedence? That’s not something I want to have to spend brain cycles thinking about.

You should even specify which table each columns come from. For example, say you don’t want the IDs and you just specify the columns you want. So you write something like this:

SELECT name, email, deptname
FROM users u JOIN dept d ON (u.deptid = d.id)

Later on, someone adds an email column to the dept table. Now, your “SELECT name, email, deptname” is making an ambiguous column reference to “email”. If you specify everything fully:

SELECT u.name, u.email, d.deptname
FROM users u JOIN dept d ON (u.deptid = d.id)

then you’re future-proof.

Of course, this rule doesn’t apply to code that is dealing with columns in aggregate. If you’re writing a utility that deals with all columns in a row and transforms them somehow as a group, then no, you don’t need to specify columns.

Aside from the potential bugs, I also think it’s important to be clear to the human reader of your code what exactly you’re pulling from the database. SELECT * makes it a guessing game. Which of these makes it more obvious to the reader what I’m doing?

SELECT * FROM users;

or

SELECT first_name, last_name, email_addr FROM users;

There are also all sorts of speed reasons to specify columns. You reduce the amount of work fetching data from the disk, and your DBMS may not even have to fetch rows from disk if the data is covered in an index. For discussion of the performance issues, see this StackOverflow thread. One thing to remember: Your code will never be slower if you specify columns. It can ONLY be faster.

The speedups are secondary, however. I want to write my queries to be resistant to future change. I don’t mind making a few extra keystrokes to make that happen. That’s why I always specify columns in my SELECTs.

Leave a Reply

Required fields are marked *.