Andy Lester

Technology, careers, life and being happy

The world’s two worst variable names


As programmers, assigning names makes up a big part of our jobs. Phil Karlton said “There are only two hard things in Computer Science: cache invalidation and naming things.” It’s a hard problem, and it’s something we deal with every time we write a line of code. Whether it’s a variable or a table or a column in that table or a file on the filesystem, or what we call our projects and products, naming is a big deal.

Bad variable naming is everywhere. Maybe you’ll find variables that are too short to be adequately descriptive. The programmer might as well have been working in TRS-80 BASIC, where only the first two characters of variable names were significant, and we had to keep a handwritten lookup chart of names in a spiral notebook next to the keyboard.

Sometimes you’ll find variables where all vowels have been removed as a shortening technique, instead of simple truncation, so you have $cstmr instead of $cust. I sure hope you don’t have to distinguish the customers from costumers! Worse, $cstmr is harder to type because of the lack of vowels, and is no longer pronounceable in conversation.

There are also intentionally bad variable names, where the writer was more interested in being funny than clear. I’ve seen $crap as a loop variable, and a colleague tells of overhauling old code with a function called THE_LONE_RANGER_RIDES_AGAIN(). That’s not the type of bad variable name I mean.

While I’m well aware that variable naming conventions can often turn into a religious war, I’m entirely confident when I declare The World’s Worst Variable Name is $data.

Of course it’s data! That’s what variables contain! That’s all they ever contain. It’s like if you were packing up your belongings in moving boxes, and on the side you labeled the box “matter.”

Variable names should say what type of data they hold. Asking the question “what kind” is an easy way to enhance your variable naming. I once saw $data used when reading a record from a database table. The code was something like:

$data = read_record();
print "ID = ", $data->{CUSTOMER_ID};

Asking the question “what kind of $data?” turns up immediate ideas for renaming. $record would be a good start. $customer_record would be better still.

Vague names are the worst, but right behind them are naming related objects with nearly identical names that do not distinguish them. Therefore the World’s Second Worst Variable Name is: $data2.

More generally, any variable that relies on a numeral to distinguish it from a similar variable needs to be refactored, immediately. Usually, you’ll see it like this:

$total = $price * $qty;
$total2 = $total - $discount;
$total2 += $total2 * $taxrate;

$total3 = $purchase_order_value + $available_credit;
if ( $total2 < $total3 ) {
    print "You can't afford this order.";
}

You can see this as an archaeological dig through the code. At one point, the code only figured out the total cost of the order, $total. If that’s all the code does, then $total is a fine name. Unfortunately, someone came along later, added code for handling discounts and tax rate, and took the lazy way out by putting it in $total2. Finally, someone added some checking against the total that the user can pay and named it $total3.

The real killer in this chunk of code is that if statement:

if ( $total2 < $total3 )

You can’t read that without going back to figure out how each value was calculated. You have to scroll back up to keep track of what’s what.

If you’re faced with naming something $total2, change the existing name to something more specific. Spend the five minutes to name the variables appropriately. Renaming is one of the easiest, cheapest and safest forms of refactoring you can do, especially if the names are confined to a single subroutine.

Let’s do a simple search-and-replace on the coding horror above:

$order_total = $price * $qty;
$payable_total = $order_total - $discount;
$payable_total += $payable_total * $taxrate;

$available_funds = $purchase_order_value + $available_credit;
if ( $payable_total < $available_funds ) {
    print "You can't afford this order.";
}

The only thing that changed was the variable names, and already it's much easier to read. Now there’s no ambiguity as to what each of the _total variables means. And look what we found: The comparison in the if statement was reversed. Effective naming makes it obvious.
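With the names fixed, correcting the comparison is a small change. Here is a minimal sketch in Python rather than Perl, so it can be run on its own. The function name `can_afford` and the sample figures are assumptions made for illustration, and it assumes, as the snippet above does, that tax applies to the discounted total:

```python
def can_afford(price, qty, discount, tax_rate,
               purchase_order_value, available_credit):
    """Return True when the buyer's funds cover the order, tax included."""
    order_total = price * qty
    payable_total = order_total - discount
    payable_total += payable_total * tax_rate

    available_funds = purchase_order_value + available_credit
    # The order is affordable only if the funds cover what is payable.
    return available_funds >= payable_total


# Hypothetical figures: 67.5 payable against 60 in available funds.
if not can_afford(price=10, qty=5, discount=5, tax_rate=0.5,
                  purchase_order_value=40, available_credit=20):
    print("You can't afford this order.")
```

Read aloud, the condition now says what it means: the order goes through when the available funds are at least the payable total.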

There is one exception to the rule that all variables ending with numerals are bad. If the entity itself is named with a number, then keep that as part of the name. It's fine to use $sha1 for a variable that holds a SHA-1 hash. It helps no one to rename it to $sha_one.

After I wrote the first version of this article, I created two policies for Perl::Critic to check for these naming problems: ProhibitVagueNames and ProhibitNumberedNames. Both are part of my add-on module Perl::Critic::Bangs.
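As a rough illustration of what an automated check for these two problems involves, here is a minimal sketch in Python. It is not how Perl::Critic::Bangs implements its policies. The word list, the exception list, and the function name `find_naming_problems` are all assumptions made for this example, and the regular expression only looks for simple `$name` scalars.

```python
import re

# Assumed word list for illustration; a real policy would make this configurable.
VAGUE_NAMES = {"data", "info", "var"}
# Assumed exceptions: names where the numeral belongs to the thing itself.
NUMBERED_EXCEPTIONS = {"sha1", "md5", "utf8"}

# Matches simple scalar variables such as $total2 (no sigil variants, no braces).
SCALAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def find_naming_problems(source):
    """Return sorted (name, problem) pairs for scalar variables in source."""
    problems = set()
    for name in SCALAR.findall(source):
        lowered = name.lower()
        if lowered in VAGUE_NAMES:
            problems.add((name, "vague"))
        elif lowered[-1].isdigit() and lowered not in NUMBERED_EXCEPTIONS:
            problems.add((name, "numbered"))
    return sorted(problems)


sample = """
$data = read_record();
$total2 = $total - $discount;
$sha1 = digest($payload);
"""
for name, problem in find_naming_problems(sample):
    print(f"${name}: {problem} name")
```

On the sample, this flags `$data` as vague and `$total2` as numbered, while `$sha1` passes because of the exception list. A text-level scan like this will also match variable-looking text inside strings and comments, which is one reason a real tool works on parsed code instead.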

What other naming sins drive you crazy? Have you created automated ways to detect them?

71 Comments

  1. I’ve been told that naming or appending ‘temp’ or ‘tmp’ to a variable name is bad practice because all variables are temporary by nature.

    • I don’t have an issue with Temp, because it’s relative… If you think about it, really everything we work with is temporary in the scheme of things, so that logic doesn’t do much for me.

      As long as you are consistent and don’t use the temp variable outside the scope of your function, it is clear to most that it’s a dump variable. Is it lazy… probably… but many times I think it actually helps the code make more sense for intermediate steps and whatnot.

      However, using temp for significant variables is a sign of idiocy, except maybe temperature?

      • I don’t have an issue with Data, because it’s relative… If you think about it, really everything we work with is data in the scheme of things, so that logic doesn’t do much for me.

        Everything is relative and depends on context, same for data and temp.

        • Temp is relative in the sense that some variables are more temporary than others.

          Data is relative in the sense that some variables are more data than others?

          • Well, more or less.

            If you are developing something that will get raw bytes as input, data may be seen as an appropriate name (at least, for me).

    • I don’t mind variables in PHP named ‘tmp’ because you can’t do this..

      $firstElement = function_that_returns_array()[0];

      You have to phrase it:
      $tmp = function_that_returns_array();
      $firstElement = $tmp[0];

      (Yea, yea, yea, I know.. PHP 5.4 allows it, but it’s not available to me yet.)

  2. I call variables “data” when I do not trust what they contain. It may be the record I expect, but since I just queried a random web service on a remote server, it is just data. After validating it, it is a customer_record.

    • What about “remote_data”, “query_response”, “raw_customer_record”?

      I hold firm that there is no excuse for naming a variable “data”.

      • Statistical analysts use this variable often. Here is the data we’re analyzing. Let’s call it data. Sounds good. Moving right along.

        It all depends on the context and scope. Variable names can be far worse than ‘data’. How about ‘variable’, now that’s a bad name.

    • IMO `data’ can be the worst, but it depends on context. In the context of most languages, data is a poor choice and perhaps the poorest one can make. In other languages, it is the only choice that makes sense.

      We are building a rather significant system using a language called Eiffel. In Eiffel, we have a single root class called ANY. Descending from ANY is EV_ANY, which is the root class of all of Eiffel’s “Vision” classes, which are used for any and all GUI controls or control structures.

      Each GUI control or control structure can have some form of data attached to it (e.g. some languages call this `data binding’). Thus, the Windows controls of COMCTL32 form the basis for the Windows version of Eiffel Vision, thus, the EV_ANY class has a `data’ feature. This feature is declared as:

      data: detachable ANY

      What this means is: As a programmer, I now have an attachment point for control data (if I need it) later on in the inheritance model. Moreover, EV_ANY is referred to as a `deferred’ class, which means it cannot be directly instanced (e.g. created) in memory. It must be inherited from in a descendant class where the `data: detachable ANY’ can be redefined to something far more meaningful, if it will be used.

      Interestingly enough, if the `data’ feature does not get used, the compiler will strip the dead code from the resulting EXE, so the compiled program will not have this unused feature hanging out in memory in the run-time application. Only those instances where `data’ has been defined and used will be present in the run-time EXE.

      So, again — the usefulness and legitimate use of the term `data’ for a feature (e.g. property) in a class is far from settled. Depending on your language system and context, you might find that data makes perfect sense!

      Cheers!
      Larry

  3. It’s not the same sort of thing, but I absolutely hate when people put the word “function” in their function names. Of course calculator_function() is a function, damnit!

    The same with variables like $var_foobar – I mean, I can understand a little coming from a language like Javascript, but you don’t declare your variables there as var var_foobar do you?

  4. I get all my variable names from newguid.com

  5. I agree on both counts, but is “record” really any better than “data”? It’s equally generic.

    (And, if that’s what you’re trying to communicate, I like these names. So, if I were writing a DB layer, or maybe a graphing library that truly had some generic “data”, I would use that name, and I’d be doing it to be intentionally vague. It’s being vague by mistake that’s the problem.)

    • Precisely. It comes down to whether the `name` is too vague for the data it holds, and sometimes you need a name as vague as it gets.

  6. I sometimes use tmp as a variable name. If it’s not important to the logic of the computation, I don’t want anyone to look at that variable. For instance, parsing a date in C#:
    DateTime tmp;
    DateTime? result = new DateTime?();
    if (DateTime.TryParse(" jhuhb ", out tmp))
        result = tmp;

    • I love the name “temp”, I use it all the time. It’s usually a temporary container or object in a function, that will be returned.

  7. not to pick on perl, but:

    $_

    $|

    $%

    $&

    $+

    $.

    and the list goes on, there are at least 16 of these dollar+punctuation variables in the global scope, and several of them are essential to using the language.

  8. I once encountered a language where variables of different types could have the exact same name in the same scope… The various functions in the language would know which variable to use based on the type it was expecting. Of course the code I was working with made extensive use of this ‘feature’… int A, long A, short A, float A and string A$ (at least you knew that one was a string by the $.) The language? HP Basic on the HP 9845.

    • It seems to me that various editions of BASIC are generally the worst offenders for things like this. Take PBASIC from Parallax for use on their embedded microcontrollers, TI-BASIC from Texas Instruments in their programmable calculators, and HP Basic. All of them terrible languages… I’d rather just write decent assembly and save myself their “features” and useless abstractions.

  9. It depends on scope, I think. For identifiers that are only in use over 10-20 lines of code, names like x, y, z or i, j, k are fine, I think. For names of wider scope, though, it is important to use names that clearly characterize the purpose of the thing named.

    • x, y, and z are fine as variable names if they are referring to values in euclidean geometry. If you are doing just about anything else, not so much. If you can’t think of a reasonable name for something, you don’t understand the domain enough to be writing the code. And the next guy who has to deal with your crappy code suffers.

      i, j, and k are not unreasonable for idiomatic use as indexes in loops. I prefer naming such things iFoo or idxFoo, particularly in nested loops over indexes, where you might confuse i and j.

      • X, Y, Z (or A, B) are perfectly reasonable for comparers, too. The names get their clarity from their function’s name:

        sub diffMerge($x, $y, $z) {
        # do interesting three-way comparison between $x, $y, and $z
        }

        • x, y and z are terrible names in any context, even when you’re doing euclidean geometry. It should be something like xcoord, ycoord and zcoord. At least you know it’s a coordinate. As for comparators, you can usually come up with names like source or standard. No need for single-letter variable names.

          I’m usually the guy who has to come in later and fix this garbage. It ends up taking months to figure out what’s going on. And by then my hair starts to thin because I’ve been pulling it out for so long. Shame on any programmer for inflicting this crap on people.

      • Rather insane. iFoo and idxFoo just defeat the purpose of using a short loop index counter. Especially if that loop does anything math intensive or ever swaps variables (i.e. sorting algos).

        Here’s a huge tip: use x, y, z, w for loop counters. It’s far less likely you’ll ever have the “2 hour debug session because I used an i instead of a j and my font sucks” problem, and everyone will know immediately what you’re doing. If you need to use loop counters and geometry, use j, k, m, n, though I personally have a preference for skipping n and using o because n can occasionally be confused for an m.

        Never use i, j, or l for loop counters. You’re just inviting bugs into your code.

      • >i, j, and k are not unreasonable for idiomatic use as indexes in loops.

        I still always use self documenting names here too. For example:

        foreach my $customer (@customers) {
        # do something w/ $customer
        }

        • In a foreach loop, sure, i/j/k are stupid and should be replaced with something more explicit. But when it’s an integer index loop, they’re just fine.

  10. I hate the numbering of variable names. I used to work with someone who did it all the time, with files, methods and variables. It’s so hard to distinguish between them. We have these files:
    AdvertiserKeywordsStatRep22.pl
    AdvertiserKeywordsStatRep2.pl
    AdvertiserKeywordsStatsRep2.pl
    AdvertiserKeywordsStatsRep.pl

    Even with the file name in front of me, I often open the wrong one. And in those files, you have functions like getData2() and getData22(). Very hard to follow.

  11. calculator_function … what about dynamically typed languages with first class functions …
    I really have to disagree on this one!

  12. ‘this’ has got to be the worst variable name of all-time.

  13. I prefer $chinstrap

  14. I’m pretty sure the worst variable name is my_var.

    • Any time I see a variable prefaced with m- or my- I want to scream at the person who coded it. If your language supports instance or global variables, they should be named consistently different than local variables, and you can do that without tacking letters to the front of your variable names. The best way is to make it explicit that you are working with your instance variables (i.e. this.whatever, super.whateverElse). Yeah, “it takes longer to type ‘this.’ than ‘m’”, but in the end you’re saving yourself an enormous headache when you start dealing with derived types (Hey, which class has mWhatchamacallit defined? I need to change its type…)

      One of the worst things I hate about programming on the Windows platform is all of its incomprehensible Hungarian notation jumanji added to the front of type names. If you don’t know what the type of the variable you’re working with is without these letters in the front of it, your code is screwed anyways, and by nature it’s really hard to know what type of whatever you’re dealing with on Windows with its dozens of typed pointers.

  15. It all depends on the context, I think. We have some functions in the codebase which aggregate a lot of things into a nested structure named ‘data’. All other local variables have more meaningful names, of course, but I like to have this short-named variable ‘data’ stand out all over the function.

    My second favorite generic name is ‘it’, used only in short functional calls like this:

    _(special_rules).detect(function(it) { return it.type == 'working_time' })

  16. Whenever I code, I like to read my code like a book, making full use of REM text to explain things.

    Usually there are some counters like i and j, the only type of counter I use in for/next constructions, but since I stick to using them only there, the code is easy to read.

    I can produce raw code without explanation, but that removes the elegance of how you got to it, and that part of coding is beautiful to me. It’s too sad that a lot of coders only copy and paste from websites and forget to create decent content.

    Maybe in a few years people will pay not only per line of code but also for the story written inside the code about how it works. Maybe someday people will be interested in actually reading code like a book.

  17. I would also add world’s two worst method names: process() and handle()…

  18. ‘Var’ is pretty bad as well

  19. $info is as meaningless as $data, unless ‘info’ has a meaning in the domain.

    I’d say $cust wasn’t nice either. I refuse to allow abbreviations unless they are well-known ones. Just making up one’s own abbrevs is confusing :-)

    For me, there are technical problems and domain ones. In both categories, names are fairly well-defined. All you need do is understand what is out there, and use it properly. On that note, most data names should be nouns, and most code names should be verbs (there are exceptions, but you’d have to be coding something very interesting to really get to one). For the most part the names should match the way you would describe the data/code to another programmer or a stakeholder. If you really know what you are writing, naming isn’t a hard problem.

    Paul.

  20. I would disagree. There are perfectly acceptable uses for the variable name data, such as when designing a generic/template type that holds a piece of data and you don’t know what it is. For instance, when you’re writing any sort of data structure that uses nodes (like a linked list or a tree). In a linked list you’re likely to have Nodes that hold a next pointer, a previous pointer, and a T data object, where T is the type of object the linked list is holding.

  21. I submit $var is a worse variable name than $data….

  22. $data is worse than say $froggle, $wibble or $fnurd? IMO, one shouldn’t pass judgement on the name of a variable unless one knows the context it’s used in. $customer_age may sound like a very descriptive variable name, but if it’s sometimes used to store colour names, and sometimes holds a pointer to a global data structure, I’d call it a very bad variable name. In fact, that makes $customer_age worse than $data; $data may not be very descriptive, but at least it doesn’t suggest it contains something other than what it does.

  23. ‘flag’, anyone?

  24. I think you’d enjoy Steve McConnell’s book “Code Complete”.

    One of the concepts he covers there is the metric “variable span” which just measures the number of lines of code between the first use/declaration of a variable and its last appearance. Generally, the length and specificity of your variable name should be proportional to the variable’s span. Variable only used inside an inner loop? One letter generic name ($i, $j, $x) is probably fine. For a global variable that appears all over the code, you’ll want $A_VERY_LONG_VARIABLE_NAME_THAT_IS_COMPLETELY_UNAMBIGUOUS. Most are somewhere in between.

    Judging a variable name without considering its span seems misleading to me.

  25. All very true – but at least they are words. I also hate when I come across code using a single letter for variables (e.g. $t = $p * $q).

    BTW – Not that it matters, but in the first example I think there’s a small typo:
    $total2 += $total * $taxrate; should likely be $total2 += $total2 * $taxrate;

  26. How about $value for #3? :-)

  27. I’m not the biggest fan of “foo” and “bar” even as metavariables in example code, but when I see them in actual production code (or, as is more common for me, code handed in by my students), it gives me the absolute screaming heebie-jeebies.

  28. Sometimes I agonize for hours or days over what to name a variable in the world’s first true open-source artificial intelligence. For instance, “midway” is the time-point roughly halfway-back in the artificial lifetime of the AI organism. Other times I start out with an ill-chosen variable name and I hate having to change it, but I just bite the bullet. Sometimes I consider selling people the right to have a variable named after them in the world’s first True AI, but such profiteering would be too crass. Please accept thanks for this cool article about “The world’s worst two variable names” from the simultaneous First and Second-place winner of the august title of The World’s Most Annoying Nerd — http://videnskab.dk/bagsiden/verdens-mest-irriterende-nord — in Danish.

  29. I once worked on a system built on a relational database: it had a table called Entity, and one of the fields was called Attribute.

    • I too, worked on a database with an Entity table. Worse yet, it was a catch-all table for everything that was not a person or a place.

  30. flag and flags

  31. Pingback: » Around the Web: Wednesday April 18, 2012 mikemct

  32. Actually, I believe the saying is “There are only two hard things in Computer Science: cache invalidation, naming things and off by one errors.”

  33. I used data as a variable name the same way I use i.

    In a loop there is no need for anything more specific than i, and the same for an object. If you are using the object for a very very short term purpose (inside a small loop for example) then using data as the variable name works well.

  34. You’ve clearly never used Drupal, which uses $variables. Very clever of it.

  35. I think there’s another exception to the numbered variables argument, and that is mathematical formulas. For example, to find the distance between two points, writing:

    distance = sqrt( square( x2-x1 ) + square( y2-y1 ))

    is totally legitimate, as everyone who works with that code should be familiar with the mathematical nomenclature.

  36. If you have trouble knowing what your variables are holding you should probably use more functions. Variable names are rarely an issue if you keep functions under 100 lines.
    I used to give this to my students.
    http://www.kernel.org/doc/Documentation/CodingStyle

  37. The worst I’ve ever seen was this: we had hired a co-op student from a local university (this was many years ago). He was a huge fan of the band Genesis. So he named all his variables after songs, band members, lyrics, etc. The variables had absolutely nothing to do with their actual use. We didn’t realize this until he had returned to school and we had to debug some of his code. I thought we were all going to lose it.

  38. There are two bad ones I know: 1) Booleans with negatives in the name. If you have one called “not_ready” or “not_empty”, I find that your mind has to re-comprehend what it means every time you use it somewhere. 2) Variables that have their names shortened. The memory is there. I don’t expect a variable name length of 30 chars, but if you’re shortening clean_up[] to c_up[], c’mon.

  39. ‘data’ makes sense if you are writing a generic function that processes some data and generates output.

    To me the worst variable name is l (lower case L) – in languages like JS where you don’t have to put a dollar in front of a var name, that is

    • “‘data’ makes sense if you are writing a generic function that processes some data”

      So, basically every function everywhere? :-)

  40. Can anyone suggest a good abbreviation for $temperature?

    • When I’m writing physics code, I’m happy using T for temperature as that draws on the expectations/domain knowledge of the reader, similarly v for velocity, dTdz for a temperature gradient, and so on. Then again, in that case I’m essentially writing down maths which has a good description elsewhere.

      This is where it’s very helpful to use numerals, if the original equation you’re implementing uses them: T = c1 * S + c2 * S^2, this way it’s very easy to check against the original source (usually a paper) that you’ve got it right.

  41. I have a separate project for writing ad-hoc programs where I vent my need to name things in a stupid manner. I do insist on sensible variables in release code, but at the same time, if the variable name alone is enough to stop you from understanding its function, then it might not be just the variable name that is at fault.

  42. Years ago I helped a fledgling programmer that had 0 typing skills so to make it easy to type his program his variable names were A, AA, AAA, AAAA, AAAAA, AAAAAA, B, BB, BBB, BBBB, BBBBB, BBBBBB, C, CC, etc.

  43. What about buf?

    buf[255] lol

  44. How do people feel about using Hungarian notation with variable names? Sometimes I’ve found that to be really helpful, but for simple data types it can seem a little superfluous.

  45. Just discovered I’ve used $data lots of times in recent code – oh no!
    The only saving grace is each usage is no more than 2-4 lines after variable creation, and almost always is constructing a temporary array to be sent out to a function straight afterwards.

    Even so, I agree that they should be called things like $viewParams or $formData or other more relevant names.

    Right, add that to the list for the next refactor!
