Andy Lester

Technology, careers, life and being happy

My bash prompt with git/svn branch+status display


After spending a few hours last night switching between three different branches in the ack2 project, and typing “git br” over and over, I decided I needed to put branch status in my bash prompt. The only question was: Which one would I steal? Fortunately, Rob Hoelz was online, and when I mentioned it to him he handed me his. I stole it and added Subversion support.

Note: Edited to fix a problem with detecting SVN branches.

#!/bin/bash

# Adapted from https://github.com/hoelzro/bashrc/blob/master/colors.sh

function __prompt
{
    # List of color variables that bash can use
    local BLACK="\[\033[0;30m\]"   # Black
    local DGREY="\[\033[1;30m\]"   # Dark Gray
    local RED="\[\033[0;31m\]"     # Red
    local LRED="\[\033[1;31m\]"    # Light Red
    local GREEN="\[\033[0;32m\]"   # Green
    local LGREEN="\[\033[1;32m\]"  # Light Green
    local BROWN="\[\033[0;33m\]"   # Brown
    local YELLOW="\[\033[1;33m\]"  # Yellow
    local BLUE="\[\033[0;34m\]"    # Blue
    local LBLUE="\[\033[1;34m\]"   # Light Blue
    local PURPLE="\[\033[0;35m\]"  # Purple
    local LPURPLE="\[\033[1;35m\]" # Light Purple
    local CYAN="\[\033[0;36m\]"    # Cyan
    local LCYAN="\[\033[1;36m\]"   # Light Cyan
    local LGREY="\[\033[0;37m\]"   # Light Gray
    local WHITE="\[\033[1;37m\]"   # White

    local RESET="\[\033[0m\]"      # Color reset
    local BOLD="\[\033[;1m\]"      # Bold

    # Base prompt
    PS1="$LCYAN\h:$YELLOW\w$LCYAN \\\$$RESET "

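    local dirty
    local branch
    # Keep the Subversion helper variables out of the interactive shell's namespace
    local svn_info branch_pattern trunk_pattern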

    # Look for Git status
    if git status &>/dev/null; then
        if git status -uno -s | grep -q . ; then
            dirty=1
        fi
        branch=$(git branch --color=never | sed -ne 's/* //p')

    # Look for Subversion status
    else
        svn_info=$( (svn info | grep ^URL) 2>/dev/null )
        if [[ ! -z "$svn_info" ]] ; then
            branch_pattern="^URL: .*/(branch(es)?|tags)/([^/]+)"
            trunk_pattern="^URL: .*/trunk(/.*)?$"
            if [[ $svn_info =~ $branch_pattern ]]; then
                branch=${BASH_REMATCH[3]}
            elif [[ $svn_info =~ $trunk_pattern ]]; then
                branch='trunk'
            else
                branch='SVN'
            fi
            dirty=$(svn status -q)
        fi
    fi

    if [[ ! -z "$branch" ]]; then
        local status_color
        if [[ -z "$dirty" ]] ; then
            status_color=$LGREEN
        else
            status_color=$LRED
        fi
        PS1="$LCYAN($BOLD$status_color$branch$LCYAN)$RESET $PS1"
    fi
}

if [[ -z "$PROMPT_COMMAND" ]]; then
    PROMPT_COMMAND=__prompt
else
    PROMPT_COMMAND="$PROMPT_COMMAND ; __prompt"
fi
__prompt

Just drop that into your ~/.bash directory as prompt.sh, and then add

source ~/.bash/prompt.sh

to your .bashrc. Now you have color-coded branch names: red for dirty, green for clean.

ack 2.0 has been released


ack 2.0 has been released. ack is a grep-like search tool that has been optimized for searching large heterogeneous trees of source code.

ack has been around since 2005. Since then it has become very popular and is packaged by all the major Linux distributions. It is cross-platform and pure Perl, so it runs on Windows easily. See the “Why ack?” page for the top ten reasons, and dozens of testimonials.

ack 2.0 has many changes from 1.x, but here are four big differences and features that long-time ack 1.x users should be aware of.

  • By default all text files are searched, not just files with types that ack recognizes. If you prefer the old ack 1.x behavior of only searching files that ack recognizes, you can use the -k/--known-types option.
  • There is a much more flexible type identification system available. You can specify a file type based on extension (.rb for Ruby), filename (Rakefile is a Ruby file), a regex match on the first line (a first line matching /#!.+ruby/ marks a Ruby file), or a regex match on the filename itself.
  • Greater support for ackrc files. You can have a system-wide ackrc at /etc/ackrc, a user-specific ackrc in ~/.ackrc, and ackrc files local to your projects.
  • The -x argument tells ack to read the list of files to search from stdin, much like xargs. This lets you do things like git ls-files | ack -x foo, and ack will search only the files that are tracked in the git repository.

On the horizon, we see creating a framework that will let authors create ack plugins in Perl to allow flexibility. You might create a plugin that allows searching through zip files, or reading text from an Excel spreadsheet, or a web page.

ack has always thrived on numerous contributions from the ack community, but I especially want to single out Rob Hoelz for his work over the past year or two. If it were not for Rob, ack 2.0 might never have seen the light of day, and for that I am grateful.

A final note: In the past, ack’s home page was betterthangrep.com. With the release of ack 2.0, I’ve changed to beyondgrep.com. “Beyond” feels less adversarial than “better than”, and implies moving forward as well as upward. beyondgrep.com also includes a page of other tools that go beyond the capabilities of grep when searching source code.

For longtime ack users, I hope you enjoy ack 2.0 and that it makes your programming life easier and more enjoyable. If you’ve never used ack, give it a try.

When it comes to job hunting advice, question everything you’re told


Punk pioneers Stiff Little Fingers’ signature tune “Suspect Device” admonished “Don’t believe them / Question everything you’re told.” It’s sound advice for anyone looking for guidance in the job world.

The other day on /r/GetEmployed, a user asked how he should write his resume objective for a job as a sales clerk at Bass Pro Shops. He said that the prof for his Communications in the Business Environment class told him to have an objective on his resume.

I’m guessing the prof might also have advised to put “References available upon request” at the bottom of the resume, too, which is also bad advice. I’m also guessing that the prof hasn’t created a resume in the non-educational world ever.

The key here is that the original poster of the question (the OP) didn’t ask why an objective is important. He just accepted it as true without understanding it. This is a mistake. Whenever someone gives you advice, about anything, not just jobs, ask why. Ask specifically, “Why do you say I should put an objective on the resume?” or “Why do I have to wear a suit to the interview?” You need to understand why you are doing anything, and not just follow it blindly, so that you can decide whether you want to follow it or not. You will get conflicting opinions on everything in life, so understand the logic behind each one.

I’m guessing that if the OP had gone back to his prof and asked why to have an objective, the prof’s answer would have been not much more substantive than “because that’s just what you do”. If he were to ask me why you should not have an objective, I’d explain “because it is a waste of space that says nothing except that you want the job that you’re applying for, instead of telling good information about you and why you’re good for the job”. Based on those two reasons, the OP can make his own decision.

Note: There is a time when objectives may make sense: when you’re handing out resumes blindly, like at a job fair or something, where it’s not clear what sort of job you’re looking for. Then it makes sense. But if you’re sending in a resume for a specific job, and your objective is “to get a job that is exactly like the one I’m applying for right now”, then leave it off.

Ask questions. Understand why you’re doing what you’re doing. Don’t follow anyone’s advice blindly, including mine.

A musical postscript: A while back, HR blogger Laurie Ruettimann had a blog called Punk Rock HR. I like to think of Jake Burns and the boys as the real Punk Rock HR in this case.

Fans of early 70s hard rock may notice that SLF’s riff for “Suspect Device” is remarkably similar to the main riff to Montrose’s “Space Station #5”, released five years earlier in 1973.

How to prepare for a job interview: The 4-point summary


The core of your preparation for the job interview:

  1. Learn what they do.
  2. Learn how they do what they do.
  3. Figure out exactly what skills, experience and background you have that will help them do what they do faster and cheaper.
  4. Plan how you’re going to explain #3 to them.

Everything else is implementation details.

You should have the first three figured out before you even send a resume. If you don’t have what it takes to help them do it cheaper and faster, then don’t waste your time applying for the job.

Tell me about weird, frustrating or bad job interview questions you’ve been asked.


I’m working on an article for SmartBear on handling bad or weird job interview questions, and I’d like to get input from you.  Have you been asked weird, insulting, inexplicable or just plain bad questions in a job interview?  Please let me know about them, and where you were interviewing, or at least what type of company and job was involved if you don’t want to name names.  I want to include real examples in my article and then include suggestions on how to effectively answer them.  I’ll also be discussing alternatives to these bad questions that get at what (I suspect) the interviewer is getting at.  I’m looking for first-hand accounts rather than questions you might have heard a friend talking about.

I’m sure many of you have had estimation questions like “How many light bulbs are there in the city of Chicago?”.  I don’t see those as weird if you’re interviewing at Google, where estimation and scaling are core competencies, but they may be in other contexts.  Have you been asked these sorts of questions elsewhere?  I get a sense from reading things online that these are asked by managers who think they’re cool questions, but without a business need for asking.

Please let me know in the comments, or email me at andy@petdance.com.

Solr’s DataImportHandler can’t handle line-based SQL comments


At least twice now I’ve run into this problem where I try to comment my SQL code, but doing so makes my Solr data importer blow up.  I post it here for posterity.

Part of your DIH configuration will be at least one entity, probably with SQL code like this:

<entity name="nodes" dataSource="jdbc"
    query="
        SELECT
            foo,
            bar
        FROM blah_blah
    ">

And maybe part of the SQL query isn’t obvious, so you want to add a comment like

<entity name="nodes" dataSource="jdbc"
    query="
        SELECT
            foo, -- We need the foo so we can fribble the wibbitz
            bar
        FROM blah_blah
    ">

But that blows up because the DIH strips linefeeds from your SQL code before passing it to the server.  This means that the SQL code you’re passing looks like this:

SELECT foo, -- We need the foo so we can fribble the wibbitz bar FROM blah_blah

Your line-based comment has wiped out the rest of your SQL query.  So what you have to do is use C-style comments

<entity name="nodes" dataSource="jdbc"
    query="
        SELECT
            foo, /* We need the foo so we can fribble the wibbitz */
            bar
        FROM blah_blah
    ">
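The failure is easy to reproduce outside Solr. Here is a small Ruby sketch that mimics the effect. It is a simulation only: flatten and strip_comments are hypothetical helpers, not DIH code, and the comment handling is deliberately naive (it ignores string literals).

```ruby
# Collapse a multi-line query onto one line, which is the net effect
# of DIH stripping the linefeeds.
def flatten(sql)
  sql.split("\n").map(&:strip).reject(&:empty?).join(' ')
end

# Naive model of what the database executes once comments are discarded:
# "--" runs to the end of the line, "/* ... */" ends at its closing marker.
def strip_comments(sql)
  sql.gsub(%r{/\*.*?\*/}m, '').gsub(/--[^\n]*/, '').squeeze(' ').strip
end

line_comment = <<~SQL
  SELECT
      foo, -- We need the foo
      bar
  FROM blah_blah
SQL

block_comment = <<~SQL
  SELECT
      foo, /* We need the foo */
      bar
  FROM blah_blah
SQL

puts strip_comments(flatten(line_comment))   # the query is cut off after "foo,"
puts strip_comments(flatten(block_comment))  # the full query survives
```

Once the query is on a single line, there is no end-of-line left to terminate the "--" comment, so everything after it goes with it. The block comment carries its own terminator and is unaffected.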

Chances are your database supports C-style comments, according to this post on StackOverflow:

C style comments are standard in SQL 2003 and SQL 2008 (but not in SQL 1999 or before). The following DBMS all support C style comments:

  • Informix
  • PostgreSQL
  • MySQL
  • Oracle
  • DB2
  • Sybase
  • Ingres
  • Microsoft SQL Server
  • SQLite (3.7.2 and later)

That is not every possible DBMS, but it is more or less every major SQL DBMS.

Slides from today’s resumes & interviews talk


This morning I gave a presentation titled Resumes & Interviews From the Hiring Manager’s Perspective at the Career TOOLS Conference in Milwaukee, WI.

Big lesson learned: Even when the conference says they’re providing the laptops already set up, bring your own slide clicker, in case you’re on a big stage in an auditorium, and the laptop is in the orchestra pit, and they don’t have a clicker for you.

That technical problem aside, solved by having a human slide clicker and hand signals, it was a good conference and I hope some people got some ideas to help them in their job searches.

A little Ruby program to monitor Solr DIH imports


Solr is a text indexing package. All interaction with it is through GET and POST requests to the service, which answers with XML responses.

After you do the GET to start an import with Solr’s DataImportHandler, you have to check a status URL, and Solr gives a response like this:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="initArgs">
        <lst name="defaults">
            <str name="config">jdbc.xml</str>
        </lst>
    </lst>
    <str name="command">status</str>
    <str name="status">busy</str>
    <str name="importResponse">A command is still running...</str>
    <lst name="statusMessages">
        <str name="Time Elapsed">0:0:4.545</str>
        <str name="Total Requests made to DataSource">1</str>
        <str name="Total Rows Fetched">36262</str>
        <str name="Total Documents Processed">36261</str>
        <str name="Total Documents Skipped">0</str>
        <str name="Full Dump Started">2012-07-11 09:31:03</str>
    </lst>
    <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>

And then after a while when you check the status URL, the response looks like this:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="initArgs">
        <lst name="defaults">
            <str name="config">jdbc.xml</str>
        </lst>
    </lst>
    <str name="command">status</str>
    <str name="status">idle</str>
    <str name="importResponse"/>
    <lst name="statusMessages">
        <str name="Total Requests made to DataSource">1</str>
        <str name="Total Rows Fetched">1000000</str>
        <str name="Total Documents Skipped">0</str>
        <str name="Full Dump Started">2012-07-11 09:23:30</str>
        <str name="">Indexing completed. Added/Updated: 1000000 documents. Deleted 0 documents.</str>
        <str name="Committed">2012-07-11 09:26:01</str>
        <str name="Total Documents Processed">1000000</str>
        <str name="Time taken">0:2:31.95</str>
    </lst>
    <str name="WARNING">This response format is experimental.  It is likely to change in the future.</str>
</response>

But when does it finish? There’s no way to tell other than hitting that status URL and watching for it to change. I needed a tool to tell me when importing had finished, so I could use it in my makefile. It just has to check the status until it’s completed, and then exit.

So, I wrote a little program to do the monitoring, using Ruby and the Nokogiri library. Nokogiri is an HTML and XML parser with built-in XPath and CSS selector capabilities. Paired with open-uri for fetching, it covers some of the same ground as Perl’s WWW::Mechanize.

#!/usr/bin/ruby

require 'rubygems'
require 'nokogiri'
require 'open-uri'

while true
    # URI.open comes from open-uri (Ruby 2.5 and later; older versions used plain open)
    doc = Nokogiri::XML( URI.open( 'http://hostname:8080/solr/db/dih?command=status' ) )

    # If it's still running, this status will say something like "A command is still running..."
    # The status turns blank when the process has stopped.
    status = doc.xpath( '//response/str[@name="importResponse"]' ).inner_text
    if ( status == '' )
        break
    end

    # Get the import process's elapsed time and record count and display them
    time_elapsed   = doc.xpath( '//response/lst[@name = "statusMessages"]/str[@name = "Time Elapsed"]' ).inner_text
    docs_processed = doc.xpath( '//response/lst[@name = "statusMessages"]/str[@name = "Total Documents Processed"]' ).inner_text
    # Time Elapsed is already formatted as h:m:s, so print it as-is
    puts docs_processed + ' documents in ' + time_elapsed

    sleep(2)
end

I’m not much of a Ruby guy, but this was pretty simple to write. Most of my time was looking at Nokogiri’s method listings and reacquainting myself with XPath syntax. The one Ruby gotcha I found was that before Ruby 1.9, if your program uses any Ruby gems, you have to put require 'rubygems' before you require any other gems.
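If you would rather not install a gem, the same check can be sketched with REXML, which ships with standard Ruby installs. This version is an assumption on my part rather than what the script above does: it reads the status element ("busy" or "idle" in the sample responses) instead of importResponse, and the helper names import_state, status_message and import_finished? are made up for illustration. It works on an XML string, so fetching the status URL is left to the caller.

```ruby
require 'rexml/document'

# Return the DIH state ("busy" or "idle" in the sample responses),
# or nil if the element is missing.
def import_state(xml)
  doc  = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "/response/str[@name='status']")
  node && node.text
end

# Read one entry out of the statusMessages list, or nil if it is absent.
def status_message(xml, name)
  doc  = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "/response/lst[@name='statusMessages']/str[@name='#{name}']")
  node && node.text
end

def import_finished?(xml)
  import_state(xml) == 'idle'
end

# A cut-down copy of the "busy" response shown earlier.
sample = <<~XML
  <response>
    <str name="status">busy</str>
    <lst name="statusMessages">
      <str name="Total Documents Processed">36261</str>
    </lst>
  </response>
XML

puts import_finished?(sample)
puts status_message(sample, 'Total Documents Processed')
```

One caveat with this approach: the handler may also report "idle" before any import has been started, so it only makes sense to poll after you have kicked one off.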

SELECT * is a bug waiting to happen


A SQL SELECT statement that uses * instead of an explicit column list is a bug waiting to happen.  Beyond the quick-and-dirty prototyping stage, every SQL query in an application should explicitly specify the columns it needs to protect against future changes.

Say you’ve got a table and code like this:

USERS table:
id integer NOT NULL
name varchar(100) NOT NULL
mail varchar(100)

my $query = perform_select( 'select * from users' );
while ( my $row = $query->fetch_next ) {
    if ( defined($row->{mail}) ) {
        # do something to send user mail
    }
}

Later on, someone goes and renames the users.mail column to users.email. Your program will never know it. The email branch will just never execute.
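The difference between failing silently and failing loudly can be sketched without a database. The Ruby below is a toy model, not a real DB layer: USERS, select_all and select_columns are hypothetical stand-ins, with an array of hashes playing the table after the rename.

```ruby
# A stand-in for the users table after the rename: "mail" is now "email".
USERS = [
  { id: 1, name: 'Alice', email: 'alice@example.com' },
  { id: 2, name: 'Bob',   email: nil }
].freeze

# Like SELECT *: hand back whatever columns happen to exist.
def select_all(rows)
  rows.map(&:dup)
end

# Like SELECT id, mail: asking for a column that does not exist is an
# error, as it would be in a real database.
def select_columns(rows, *columns)
  rows.map do |row|
    columns.map { |col| [col, row.fetch(col)] }.to_h
  end
end

# The SELECT * version silently finds nobody to mail.
with_mail = select_all(USERS).select { |row| row[:mail] }
puts with_mail.length

# The explicit version fails at the query, where the problem actually is.
begin
  select_columns(USERS, :id, :mail)
rescue KeyError => e
  puts "query failed: #{e.message}"
end
```

With the explicit column list, the rename breaks the query the first time it runs, instead of quietly disabling a feature.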

Here’s another example. Say you’ve got that users table joining to departments, like so

users table:
id integer NOT NULL
name varchar(100) NOT NULL
email varchar(100)
deptid integer

dept table:
id integer NOT NULL
deptname varchar(100) NOT NULL

SELECT *
FROM users u JOIN dept d ON (u.deptid = d.id)

So your selects come back with id, name, email, deptid, id, deptname. You’ve got “id” in there twice. How does your DB layer handle that situation? Which “id” column takes precedence? That’s not something I want to have to spend brain cycles thinking about.

You should even specify which table each column comes from. For example, say you don’t want the IDs and you just specify the columns you want. So you write something like this:

SELECT name, email, deptname
FROM users u JOIN dept d ON (u.deptid = d.id)

Later on, someone adds an email column to the dept table. Now, your “SELECT name, email, deptname” is making an ambiguous column reference to “email”. If you specify everything fully:

SELECT u.name, u.email, d.deptname
FROM users u JOIN dept d ON (u.deptid = d.id)

then you’re future-proof.

Of course, this rule doesn’t apply to code that is dealing with columns in aggregate. If you’re writing a utility that deals with all columns in a row and transforms them somehow as a group, then no, you don’t need to specify columns.

Aside from the potential bugs, I also think it’s important to be clear to the human reader of your code what exactly you’re pulling from the database. SELECT * makes it a guessing game. Which of these makes it more obvious to the reader what I’m doing?

SELECT * FROM users;

or

SELECT first_name, last_name, email_addr FROM users;

There are also all sorts of speed reasons to specify columns. You reduce the amount of work fetching data from the disk, and your DBMS may not even have to fetch rows from disk if the data is covered in an index. For discussion of the performance issues, see this StackOverflow thread. One thing to remember: Your code will never be slower if you specify columns. It can ONLY be faster.

The speedups are secondary, however. I want to write my queries to be resistant to future change. I don’t mind making a few extra keystrokes to make that happen. That’s why I always specify columns in my SELECTs.

My YAPC::NA 2012 recap


Random notes and comments about YAPC::NA in Madison, WI

ack 2.0

I uploaded ack 2.00alpha01 to the CPAN.

All that week, Rob Hoelz did a ton of work, and Jerry Gay was invaluable in helping us work through some configuration issues. Then, out of nowhere, Ryan Olson swoops in to close some sticky issues in the GitHub queue. I love conferences for bringing people together to get things done.

Finally, on Thursday night at the Bad Movie BOF I hacked away on the final few tickets while watching “Computer Beach Party (1987)”. Halfway through MST3K’s take on “Catalina Caper (1967)”, I made the alpha release. If that’s not heaven, I don’t know what is.

Mojolicious

Glen Hinkle

Mojolicious looks really cool. Glen called it a “full web framework, not partial,” although I’m not sure what would count as a partial framework.

It has no outside dependencies, and works to have a lot of bleeding edge features like websockets, non-blocking events, IPv6 and concurrent requests.

Mojo::UserAgent is the client that is part of Mojolicious, and it’s got all sorts of cool features:

  • DOM parsing
  • text selection via CSS selectors
    • For example, “give me all the text that is #introduction ul li.”
    • Command line: mojo get mojolicio.us '#introduction ul li'
  • JSON parsing
  • JSON pointers
    • JSON pointers look like XPath, as a way of specifying data in a JSON string

Mojolicious is based on “routes”, which look like:

get '/'
get '/:placeholder'
get '/#relaxed'
get '/*wildcard'

The latter three are (apparently) ways of making flexible URL specifications that then return information to your app about the URL.

Sample app with Mojolicious::Lite:

use Mojolicious::Lite;
get '/' => sub {
    my $self = shift;
    $self->render( template => 'mytemplate' );
};
app->start;

__DATA__
@@ mytemplate.html.ep
Hello!

Mojolicious also has its own templating language that looks a lot like Mason, but Glen said you can use Template Toolkit as well (and presumably others, but TT was the only one I was interested in).

Full Mojolicious includes a dev server called Morbo and you can run your apps through the Hypnotoad “hot-code-reloading production server” if you don’t want to run under Apache/etc.

Another selling point for Mojolicious: They value making things “beautiful” and “fun”. Glen specifically said “Join our IRC channel. We will not be mean to you.”

Perl-as-a-Service shootout

Mark Allen

Slides

This was disappointing because I was hoping for recommendations to use or not use a given vendor’s offerings. I was hoping at least for “This vendor does this, and that one does that differently,” but all I came away with was “they’re pretty much the same.”

It’s a good sign that, as Mark put it, “getting PSGI-compliant apps into PaaS is generally pain free.”

His criteria were as follows:

  • Ease of deployment
  • Performance (ignored)
  • Cost (ignored)
  • How “magical” the Perl support is (first class or hacked together)

Why ignore performance and cost? I don’t know.

Big data and PDL

There were three sessions back-to-back about PDL, the Perl Data Language. It’s in the same space as Mathematica and R. I was disappointed because I was hoping for big data analysis outside of just number crunching. The analysis of galaxy luminosity was pretty and looked very easy to do, but it didn’t have any application I was interested in. I bailed after the 2nd talk.

My big takeaway from the talk was that I need to take a statistics class.

Web security 101

Michael Peters gave a good intro talk on security, handwaving the tech details with examples of “This is how bad guys can get your info.”

Emphasis on not trusting your client data, but I was surprised and disappointed that he seemed to steer people away from Perl’s taint mode. He made vague reference to there being bugs with regexes and taint mode, but I don’t know what he’s referring to.

Taint mode is one of my favorite things about Perl 5, and there are (last I checked) no plans for implementing it in Perl 6. :-(

One of Michael’s SQL injection examples used sleep() to let the attacker find out information about the database based on timings. I asked him to write that up for bobby-tables.com.

On being a polyglot

Miyagawa gave a great overview of how he spends time in Perl, Python and Ruby, and what he learns from each, and what each language learns from the others.

Key point: Ruby is not the enemy. They are neighbors.

Things he likes about Ruby:

  • Everything is an object
  • More Perlish than Python
  • Diversity matters = TIMTOWTDI
  • Meta programming built in and encouraged
  • Convention of ! and ? in method names
    • str.upcase! to upcase str in place
    • str.islower? for methods that answer a yes/no question
  • Ability to omit self
  • Everything is an expression.
  • No need to type ; (unlike Python)
  • Implicit better than explicit
  • block, iterators and yield
  • No semicolons, 2-space indent.
    • (This last one gives me the creeps. 2-space indent!??!)
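The ! and ? convention is easy to see in a few lines of Ruby. The string methods below are core Ruby; Account is a hypothetical class added only to show the convention applied to your own code. (islower? from the talk notes is not a core String method, so the sketch uses empty? and start_with? instead.)

```ruby
str = 'hello'

shouted = str.upcase     # returns a new string; str is still "hello"
str.upcase!              # the bang version changes str in place

puts shouted
puts str

# Predicate methods end in ? and answer a yes/no question.
puts str.empty?
puts str.start_with?('HE')

# The same convention in your own code (Account is made up for illustration).
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  # Predicate: true when the balance is negative.
  def overdrawn?
    balance < 0
  end

  # Bang method: modifies the receiver.
  def reset!
    @balance = 0
    self
  end
end
```

One wrinkle worth knowing: the core bang methods such as upcase! return nil when they made no change, so they are not drop-in replacements for the non-bang versions in a method chain.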

Naming differences between the three:

  • Perl naming: Descriptive, boring, clones become ::Simple
  • Python naming: Descriptive, confusing, everything is py* or *py
  • Ruby naming: Fancy, creative, chaotic (Sinatra, Rails, etc)
  • With frameworks, all the languages get creative: Django, Bottle, Catalyst, Dancer, Mojolicious

When you’re going to borrow something from another language, don’t just borrow it, but copy it wholesale. Example: Perl’s WWW::Mechanize getting cloned as Ruby’s WWW::Mechanize.

Doing Things Wrong, chromatic

chromatic talked about the value of doing things “wrong” and embracing your constraints. Sometimes you can’t do The Perfect Job, and that’s OK, and sometimes the result comes out even better.

Example: chromatic wanted to do some parallel web fetching. He could have dug into LWP::Parallel, but instead he went with what he knew: waitpid() and shelling out to curl.
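The shape of that approach, sketched in Ruby rather than the Perl chromatic would have used: start one child process per job, then reap them all with waitpid. This is my own illustration, not code from the talk; run_parallel and curl_commands are hypothetical names, and the output filenames are placeholders.

```ruby
# Start every command as its own child process, all at once, then
# wait for each one and collect its exit status.
def run_parallel(commands)
  pids = commands.map { |argv| Process.spawn(*argv) }
  pids.map do |pid|
    Process.waitpid(pid)   # blocks until this child exits and sets $?
    $?.exitstatus
  end
end

# Build one curl invocation per URL, each writing to its own file.
def curl_commands(urls)
  urls.each_with_index.map do |url, i|
    ['curl', '-s', '-o', "page#{i}.html", url]
  end
end

# Usage, left commented out so that loading this file fetches nothing:
#   statuses = run_parallel(curl_commands(['http://example.com/', 'http://example.org/']))
#   puts statuses.inspect
```

There is no throttling or retry logic here, which is rather the point: for a handful of URLs, the operating system's process handling is all the concurrency framework you need.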

Screen scraping example:

Parsing HTML with regex may be the “wrong” way to do it, but sometimes, it’s the best solution.

Perl 6 lists

Patrick Michaud talked about all kinds of awesome stuff you can do with lists and arrays in Perl 6. After a bit I stopped trying to take notes and follow what he was saying and instead just let it wash over me so I could absorb the coolness.

I would really like Perl 6 to be easy enough to install for serious play. I need to get my feet back into the Perl 6 pool and see how I can help.

Tweakers Anonymous

John Anderson (genehack)

Quick overview of cool things that he has in his configs.

  • “The F keys are not just to skip tracks in your music player.”
  • Keep your configs in git. You will screw them up. This will save you.
  • Make your editor chmod +x when you create a .pl file since you know you will want to run it.

The coolest thing was this plugin called flymake. Apparently it runs continuously, submitting your code to a compiler (or perl -c) as you type. As soon as John made a typo on a line and moved to the next line, the error line was highlighted. He then demonstrated doing this with Perl::Critic, which must be dog slow, but flymake lets you adjust the frequency of checks.

Exceptional Exceptions

Mark Fowler, now at OmniTI. Great discussion of exceptions in Perl.

Returning false on failure sucks because you have to follow your failures all the way up the call tree. It’s tedious and error-prone because all it takes is one link in the chain to not propagate the error and you’re out of luck.
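The broken-chain problem is language-independent, so here it is sketched in Ruby rather than Perl; treat it as an analogy, not as code from Mark's talk. All of the names (read_config_or_false, ConfigMissing and so on) are hypothetical.

```ruby
# Style 1: signal failure with a false return. Every caller must check.
def read_config_or_false(path)
  return false unless File.exist?(path)
  File.read(path)
end

def load_settings_unchecked(path)
  read_config_or_false(path)   # this link in the chain checks nothing...
  true                         # ...and reports success regardless
end

# Style 2: raise. Intermediate callers need no error-handling code at all.
class ConfigMissing < StandardError; end

def read_config!(path)
  raise ConfigMissing, "no such file: #{path}" unless File.exist?(path)
  File.read(path)
end

def load_settings(path)
  read_config!(path).length
end

# The false-returning chain claims success for a file that does not exist.
puts load_settings_unchecked('/nonexistent/app.conf')

# The raising chain cannot be ignored by accident.
begin
  load_settings('/nonexistent/app.conf')
rescue ConfigMissing => e
  warn "could not start: #{e.message}"
end
```

With return values, correctness depends on every function in the call tree remembering to check. With exceptions, the failure travels up by default and only the code that can do something about it has to mention it.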

The alternative is exceptions, in the style of Java’s try/catch.

There are three non-deprecated ways of doing exceptions in Perl.

eval

The block form, eval { ... }, is often confused with eval $string, which compiles and runs a string of code. The block form is an expression rather than a block statement, so it requires a semicolon after the closing brace. It works but it’s a pain.

Try::Tiny

  • Simple extension to the syntax
  • Uses $_ not $@

TryCatch

  • Has named exception variables
  • Fully functional syntax
  • Very fast and featureful
  • Large dependency base

TryCatch is a little faster than Try::Tiny, but eval is much much faster than either of them.

TryCatch has much more clever syntax, but looks (to me) to be more dangerous.

Mark recommends that whatever you use, you make exceptions out of Exception::Class objects.