Python Rocks! and other rants
Weblog of Kent S Johnson

#46 2004-05-28 20:55:28

Why I love Python 3

Sometimes it's the little things that make my day. For example, string handling and tuple unpacking.

It's so easy to work with strings in Python! Here is the code to drop the last character from a string:

s = s[:-1]

In Java that becomes

s = s.substring(0, s.length()-1);

How about splitting a string on the last instance of '/', where at least one / is assumed to be present?

i = s.rindex('/')
prefix = s[:i]
lastElement = s[i+1:]

In Java it is about the same, though much more verbose:

int i = s.lastIndexOf('/');
String prefix = s.substring(0, i);
String lastElement = s.substring(i+1);

In Python, I can easily make this into a function that returns both prefix and lastElement. With tuple unpacking, the client code can assign the values to two variables:

def splitPath(path):
    i = path.rindex('/')
    return path[:i], path[i+1:]

# client code
prefix, last = splitPath(aPath)

There is no reasonable equivalent in Java.

I am working on a small database application. It has a utility class that queries the database, returning the result as a list of lists, one for each row. Client code can easily unpack the row lists into individual variables. This is what it looks like:

data = self.dba.queryAsList('select topicid, parentid, groupid from topic')
for topicid, parentid, groupid in data:
  # process the data

Sweet!
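
For a self-contained taste of the same pattern, here is a minimal sketch that uses the standard sqlite3 module with an in-memory database. The query_as_list() function is only a hypothetical stand-in for the queryAsList() utility above, not its actual implementation, and the table contents are made up for illustration. It is written in Python 3 syntax.

```python
import sqlite3


def query_as_list(conn, sql):
    """Hypothetical stand-in for queryAsList(): one list per result row."""
    return [list(row) for row in conn.execute(sql)]


# Build a throwaway database with made-up rows.
conn = sqlite3.connect(':memory:')
conn.execute('create table topic (topicid integer, parentid integer, groupid integer)')
conn.executemany('insert into topic values (?, ?, ?)', [(1, 0, 10), (2, 1, 10)])

seen = []
data = query_as_list(conn, 'select topicid, parentid, groupid from topic order by topicid')
for topicid, parentid, groupid in data:
    # Each row list is unpacked straight into three names.
    seen.append((topicid, parentid, groupid))

conn.close()
```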

Update: A couple of comments point out that the best way to split a file path in Python is to use os.path.split(). I agree! But the point of the post is to show how easy it is to work with strings; maybe I'll talk about the standard libraries another time.
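
For comparison, here is a small sketch of both approaches side by side. It uses posixpath, the POSIX flavor of os.path, so the results do not depend on the platform; split_path() is just the splitPath() function above, renamed for the sketch. The two agree on ordinary paths and differ for a file directly under the root.

```python
import posixpath


def split_path(path):
    # Same logic as splitPath() above: split on the last '/'.
    i = path.rindex('/')
    return path[:i], path[i + 1:]


by_hand = split_path('usr/local/lib')
by_library = posixpath.split('usr/local/lib')

# For a file directly under the root, the library keeps the '/' in the prefix.
root_by_hand = split_path('/lib')
root_by_library = posixpath.split('/lib')
```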

Categories: Python

#45 2004-05-27 18:28:16

Uncle Bob says, "Go well, not fast"

Robert Martin argues eloquently for taking the time to make your code right the first time. If you focus only on speed then you end up dragging so much weight around that speed is impossible. If you do it right you can keep moving. Recommended reading. Highly recommended practice!

Categories: Agile

#44 2004-05-25 08:43:44

Unit testing a complex procedure

I am working on a unit test for a complicated, multi-step procedure. Conceptually it is something like this:

def complicatedStuff(self):
  self.step1()
  self.step2()
  self.step3()
  # etc...

Ideally I would like to write tests for each step:

def test_step1(self):
  # self.obj is the object under test
  # set up to test step 1...

  self.obj.step1()

  # check that step 1 was successful...

def test_step2(self):
  # etc...

The problem is that the setup for each step is complex. The best way to set up to test step3() is to do step1() and step2(). So I have settled for a single test method that has the same structure as complicatedStuff():

def test_complicatedStuff(self):
  # set up for step1...

  self.obj.step1()
  # check that step 1 was successful...

  self.obj.step2()
  # check that step 2 was successful...

  self.obj.step3()
  # etc...

This smells. It is a clear violation of Don't Repeat Yourself - the structure of complicatedStuff() is duplicated. As a result it is fragile. If the structure of complicatedStuff() changes, test_complicatedStuff() has to change the same way. On the other hand, it works, which is worth a lot!
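
One possible middle ground, sketched here with a toy Procedure class that is purely hypothetical: keep one test per step, and let a small helper run the earlier steps as setup. The step order still appears twice, once in complicatedStuff() and once in the helper, so this narrows the duplication rather than removing it.

```python
import unittest


class Procedure:
    """Toy stand-in for the real object under test (hypothetical)."""

    def __init__(self):
        self.log = []

    def step1(self):
        self.log.append('one')

    def step2(self):
        self.log.append('two')

    def step3(self):
        self.log.append('three')

    def complicatedStuff(self):
        self.step1()
        self.step2()
        self.step3()


class ProcedureTest(unittest.TestCase):
    def setUp(self):
        self.obj = Procedure()

    def run_steps(self, count):
        # Set up for a later step by running the first `count` steps in order.
        steps = [self.obj.step1, self.obj.step2, self.obj.step3]
        for step in steps[:count]:
            step()

    def test_step1(self):
        self.obj.step1()
        self.assertEqual(self.obj.log, ['one'])

    def test_step2(self):
        self.run_steps(1)
        self.obj.step2()
        self.assertEqual(self.obj.log, ['one', 'two'])

    def test_step3(self):
        self.run_steps(2)
        self.obj.step3()
        self.assertEqual(self.obj.log, ['one', 'two', 'three'])


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProcedureTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```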

Categories: Agile

#43 2004-05-24 19:10:56

Database unit testing is HARD

I am working on a project that makes some brain-twisting changes to a database. It is the first database work I have done in a while, the changes are a bit tricky, and the consequences of failure grim (breaking large production databases in use by many thousands of customers daily), so I am writing unit tests for everything using DbUnit.

DbUnit has one feature that I really like - the setUp() method of a test case can initialize the database to a known state. This feature alone is enough to adopt DbUnit.

What really stands out is how slowly I am making progress. There are many reasons for this - I am learning the problem domain as I go along, and the problem has a number of wrinkles to it. But part of it is that writing the unit tests is just plain hard.

Often unit testing is pretty simple - pass a few parameters to a function, check the result. Repeat for a few different sets of parameters. Sometimes there is a structure to be set up or checked. It's easy to work in small bites.

For this project, each test case requires the database to be set up. With DbUnit, this means creating an XML file that reflects the desired state of the database tables. These files are hard to read and hard to create when the table has foreign keys to another table. In my case, one of the tables represents a tree structure so it is essentially a list of parent-child relationships.

So first I have to figure out what will make a good test case. Then I create the XML file, either by hand editing or by somehow getting the database into the desired state and dumping it to XML. Finally I can write the actual test. This usually involves writing some queries to figure out if the database is in the correct state.

Then I can actually write the code to make the test pass. No wonder it is going slowly!

By the way, I am writing the project in Jython. DbUnit works well with Jython because you don't have to subclass a DbUnit test case class - you can use its independent objects and static assertions instead. I am writing the test cases with Python's unittest module and calling DbUnit as a library.
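
To make the shape of such a test concrete, here is a rough sketch in plain Python 3. It does not use DbUnit at all: an in-memory sqlite3 database and a list of rows stand in for the real database and the XML dataset, and the tree table, KNOWN_STATE and reparent() are hypothetical names invented for the example.

```python
import sqlite3
import unittest

# Hypothetical known state: (nodeid, parentid) rows; None marks the root.
KNOWN_STATE = [(1, None), (2, 1), (3, 1), (4, 2)]


def reparent(conn, nodeid, new_parent):
    """Hypothetical operation under test: move a node under a new parent."""
    conn.execute('update tree set parentid = ? where nodeid = ?',
                 (new_parent, nodeid))


class TreeTest(unittest.TestCase):
    def setUp(self):
        # Put the database into a known state before every test.
        self.conn = sqlite3.connect(':memory:')
        self.conn.execute(
            'create table tree (nodeid integer primary key, parentid integer)')
        self.conn.executemany('insert into tree values (?, ?)', KNOWN_STATE)

    def tearDown(self):
        self.conn.close()

    def children(self, parentid):
        # Query helper used to check the state of the tree.
        rows = self.conn.execute(
            'select nodeid from tree where parentid = ? order by nodeid',
            (parentid,))
        return [nodeid for (nodeid,) in rows]

    def test_known_state(self):
        self.assertEqual(self.children(1), [2, 3])

    def test_reparent(self):
        reparent(self.conn, 4, 3)
        self.assertEqual(self.children(2), [])
        self.assertEqual(self.children(3), [4])


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TreeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```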

Categories: Agile

#42 2004-05-19 09:43:28

First-class functions are my friends

Python's first-class functions make life easier in so many ways. A Java programmer will say, "I can do the same thing with interfaces and anonymous inner classes," but it's just not the same.

The most obvious use is for callbacks of any kind. The simplest example is a function that runs a command. In Python, the command can be passed directly into the runner as a function. For example:

# The run function just calls the callback
def run(cmd):
    cmd()

# A simple command
def myCommand():
    print 'myCommand'

# Run the command
run(myCommand)  # prints 'myCommand'

You can pass an instance method to run() as a bound method:

class RunnerClient:
    def useRunner(self):
        run(self.command)

    def command(self):
        print 'RunnerClient.command'

RunnerClient().useRunner()  # prints 'RunnerClient.command'

This can easily be extended to pass any number of arguments to the command given to run(), and to return any number of results:

# The run function
def run(cmd, *args):
    return cmd(*args)

# A simple command
def myCommand(x, y):
    print 'x = %d, y = %d' % (x, y)
    return x+10, y+10

# Run the command
z, w = run(myCommand, 10, 20)   # prints 'x = 10, y = 20'
print z, w  # prints '20 30'

Alternatively, the command arguments can be bound to a function of no arguments using lambda:

# The run function
def run(cmd):
    return cmd()

# A simple command
def myCommand(x, y):
    print 'x = %d, y = %d' % (x, y)
    return x+10, y+10

# Run the command
z, w = run(lambda : myCommand(10, 20))
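
Another way to bind the arguments is functools.partial, which has been in the standard library since Python 2.5. This is a minimal sketch in Python 3 syntax, with the print dropped from the command to keep it short; my_command is the same command as above under a renamed identifier.

```python
from functools import partial


# The run function takes a callable of no arguments.
def run(cmd):
    return cmd()


# A simple command that needs two arguments.
def my_command(x, y):
    return x + 10, y + 10


# partial() binds the arguments now; run() calls the result later.
z, w = run(partial(my_command, 10, 20))
```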

That's easy! What does it look like in Java? Let's start with the simplest case above. We need an interface for the command, a Runner class and an instantiation of the command. I made Command and Runner inner classes to keep everything in one module; normally they would be top-level classes. Here is the code:

public class RunnerClient {
    private interface Command {
        public void execute();
    }

    private static class Runner {
        public static void run(Command cmd) {
            cmd.execute();
        }
    }

    public void useRunner() {
        Command cmd = new Command() {
            public void execute() {
                System.out.println("myCommand");
            }
        };

        Runner.run(cmd);
    }

    public static void main(String[] args) {
        new RunnerClient().useRunner();
    }
}

Yikes! That's nasty. So much noise! There is the visual noise of the extra punctuation and type declarations, and the conceptual noise of class declarations.

If you want the callback to be a member function, it's not too much different. The anonymous inner class can forward to the method of the main class:

public class RunnerClient {
    private interface Command {
        public void execute();
    }

    private static class Runner {
        public static void run(Command cmd) {
            cmd.execute();
        }
    }

    public void useRunner() {
        Command cmd = new Command() {
            public void execute() {
                command();
            }
        };

        Runner.run(cmd);
    }

    public void command() {
        System.out.println("RunnerClient.command()");
    }

    public static void main(String[] args) {
        new RunnerClient().useRunner();
    }
}

To pass arguments to the command...I'm not sure I want to think about that. Probably the simplest solution is to make a specific Command implementation to hold the arguments to the command. You need a different helper class for each command signature. Maybe there is a way to do it with introspection. Any way you cut it, it's going to be painful.

#41 2004-05-18 07:52:32

Velocity and Velocity Tools releases

The Jakarta Velocity project has recently released new versions of both the main Velocity package and the Velocity Tools sub-package. Velocity is a flexible, powerful and easy-to-use template engine that can be used for any kind of text generation including web pages, XML files and code generation. Velocity Tools includes modules that integrate Velocity with Struts and allow Velocity to be used as an alternative to JSP for Struts views. Velocity's simple, uncluttered syntax makes it a very readable alternative to JSP.

Velocity has a built-in introspection engine that makes it suitable for generating text output from any kind of domain data. In particular it is a great way to generate XML from a Java object model. All the XML markup goes in the template file instead of cluttering up your code. [*]

I have used Velocity on many projects for report generation, web page generation, and output file formatting. I highly recommend it.

With the new releases, this is a great time to take a look at these packages.

[*]	Of all the ways to generate XML from a Java object model, I think Velocity is one of the best. Just like with web page generation, it separates content and presentation and keeps the markup out of your code. Here are some alternatives:

If your data is in a DOM model already, you can serialize it with DOM tools. This works as a way to persist your data. For custom formatting you need to transform the data, for example using XSL.
You can build a DOM model and serialize it. That is a lot of work just to generate a file.
To serialize a custom model you can use one of the many Java-to-XML mapping products such as Castor XML Mapping or JAXB. Again, this works for persistence but it is hard to generate a custom format from it.
You can write code to generate the XML directly or using a helper library such as XMLWriter. This is painful in several ways. It puts the markup in the code, which is not such a good idea. You have to manually match up begin and end tags, which is difficult and error-prone. There is no good place for the code - it doesn't feel right either in the data classes or as a separate module.
Or (drum-roll, please) you can write a Velocity template to generate the XML. In code, you just need a little boilerplate to set up the template engine and run the template. All the XML structure is in one place, in the template file. You have full access to the data using Velocity introspection. You can add more output formats just by creating new templates and hooking them in to the code. Life is good!

Categories: Java

#40 2004-05-17 09:00:48

Going beyond the hammer

"When all you have is a hammer, everything looks like a nail." But if the hammer gets the job done, how do you convince someone to learn to use another tool? Most people will continue to use the tools they know and are comfortable with. Presented with a new problem that is solvable with known tools, they will use those tools.

I like to hunt for new tools. I read blogs and follow up on interesting links. I keep tabs on what is going on in the Jakarta project. I don't swallow the conventional wisdom about what is best without poking into it first. I'm not exactly an early adopter, but I actively look for tools with a track record and a promise to get the job done easier, faster, better. When I find something promising I look for an opportunity to try it out.

Today's news flash is - most of the developers I work with aren't like that. They walk the well-worn, familiar paths they know. If they need to go somewhere new, they will look for the accepted way to get there.

A colleague is agitating to use Jython in a new project and has a certain amount of buy-in from the project leader. There are two other developers on the project. One of them started writing a major piece of the project in Java. My colleague is frustrated. But it's a stretch to expect someone to adopt a new tool on his own.

I don't know what the answer is. Successful examples don't seem to be very convincing. Major handholding might work. A group decision or top-down leadership can work.

Categories: Python

#39 2004-05-12 22:08:00

dom4j models are strongly typed, too

An object model built with dom4j and accessed using XPath has a strong, dynamic type system. An object model built in dom4j is homogeneous - everything is an Element. But when you try to access an attribute or child, the type system kicks in. If the Element is not the correct "type" for the operation you are attempting, you will get an error.

In other words, nodes are strongly typed - they support a certain set of attributes and operations - and the typing is determined dynamically, at runtime, when you try to access an attribute.

There are a couple of ways this typing is weaker than Python's native type system. The most glaring difference is that it doesn't extend to the values of attributes. Values are always represented as text, whether they represent integers, dates, or any other actual data type. Another difference is that the error notification is much weaker with XPath. In Python you will raise an AttributeError or other exception if you try to access something that is not there. With dom4j and XPath you will just get a None value or an empty string, which is much easier to miss.

This is one reason I like the combination of Jython and dom4j so much - their styles of data access are similar, so it is easy to switch between them.

For example, in Python you might have a simple object hierarchy like this:

class Parent:
  def __init__(self, name):
      self.name = name
      self.children = []

class Child:
  def __init__(self, name, age):
      self.name = name
      self.age = age

Now, given a Parent, you can access attributes and children:

parent.name
parent.children[1].age
for child in parent.children:
  print child.age

If you try e.g. parent.age you will get an AttributeError.

Now suppose the same object hierarchy is represented as dom4j nodes. The hierarchy might be generated from XML that looks like this:

<Parent name="Kent">
  <Child name="Cathy" age="14" />
  <Child name="Roger" age="8" />
</Parent>

After parsing, you will have a dom4j Element object that supports operations like this:

parent.valueOf('@name')
parent.valueOf('Child[2]/@age')  # XPath positions start at 1, so Child[2] matches children[1]
for child in parent.selectNodes('Child'):
  print child.valueOf('@age')

If you try parent.valueOf('@age') you will just get an empty string.

This is a little more verbose than the pure Python version but conceptually it is not that different. It has several bonuses as well:

you don't have to define the classes for the object model
serialization and deserialization are nearly free
you can do quite complicated queries directly on the data model
you can write generic tools to traverse the model
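
The same style of access can be tried without Jython or dom4j. This sketch uses xml.etree.ElementTree from the Python standard library as a stand-in; it is not dom4j, its XPath support is only a subset, and a missing attribute comes back as None rather than an empty string. It does show the two weaknesses noted above: values are always text, and a missing attribute does not raise an error.

```python
import xml.etree.ElementTree as ET

XML = '''<Parent name="Kent">
  <Child name="Cathy" age="14" />
  <Child name="Roger" age="8" />
</Parent>'''

parent = ET.fromstring(XML)

name = parent.get('name')
# Positions in the path are 1-based, so Child[2] is the second child.
second_age = parent.find('Child[2]').get('age')
ages = [child.get('age') for child in parent.findall('Child')]

# Asking for an attribute the element does not have is silent.
missing = parent.get('age')
```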

Update: Here is a little more background.

Meccano1 was written in Java with a custom Java domain model. At the time I hadn't written any large Python programs and I liked the security of a statically-typed model.

Meccano2 is written in Jython using much of the code from Meccano1 as a library. At first we continued to use the Java domain model. We quickly realized that there would be many benefits to using some kind of DOM model, including easy serialization and querying.

At the time, I was very nervous about giving up the safety of the Java model. But the forces for a DOM model were compelling. With some trepidation I made the shift. This was a large task - it was a major change to a fundamental part of the application. For example I re-wrote all the (Velocity) output templates to use the new model, and I wrote a semi-automatic, reflection-based converter so I could continue to use a parser which was a major component of Meccano1.

In retrospect, this was an excellent decision and a major design win for the application. I've written elsewhere [1] about why I like dom4j as a domain model so I won't repeat that here.

The point of this post is that with the move from Java to Jython, I was already trading static typing for dynamic typing. Even with the Java data model I would have lost static typing because I was accessing the data from Jython. With the shift to the dom4j model I didn't give up much more than I had already given up with the language change.

[1]	See these essays: Jython + dom4j = High octane development (/stories/00002.html), XPath and dom4j

Categories: Python

#38 2004-05-05 21:40:16

The opposite of strong, static typing is...

If you come from a background in C programming, the strong, static type systems of C++ [1] and Java are a welcome relief from the chaos of weak typing in C. It's such a relief to get some help figuring out what a variable is!

I think that may be why programmers become so attached to static (compile-time) type checking. It is a solution to a very painful problem. It is a very comforting security blanket.

Python is different and it makes these people uneasy. What they have to realize is that Python is not weakly typed. Python has a strong, dynamic type system. Types are associated with values instead of variables, and types are checked dynamically (at run-time) instead of statically.

This is very different from C. In C, if you pass a pointer to Foo to a function that expects a pointer to Bar, the called function will very happily interpret the Foo as a Bar with bad results.

In Python, if you pass a reference to Foo to a function that expects a reference to Bar, as soon as the called function tries to use any Bar attribute that is not also a Foo attribute, an exception will be raised and the function will abort.
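
A minimal sketch of that behavior in Python 3 syntax, with Foo, Bar and use_bar as made-up names:

```python
class Foo:
    def __init__(self):
        self.shared = 'foo'


class Bar:
    def __init__(self):
        self.shared = 'bar'
        self.bar_only = 42


def use_bar(bar):
    # Works for anything that has a 'shared' attribute...
    first = bar.shared
    # ...but this line needs an attribute that only Bar has.
    return first, bar.bar_only


good = use_bar(Bar())

try:
    use_bar(Foo())
    outcome = 'no error'
except AttributeError:
    # The mismatch is caught at run time, at the point of use.
    outcome = 'AttributeError'
```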

The point is, the chaos of C is not the only alternative to static typing. Strong typing provides significant type-safety whether the types are checked at compile time or at run time.

Weak, static typing -> C
Strong, static typing -> C++, Java
Strong, dynamic typing -> Python
Weak, dynamic typing -> Perl [2]

Me, I go for the strong, dynamic type!

[1]	Arguably C++ is weakly typed because there are many escapes from the type system. But C++ allows significantly better type-safety than C and for purposes of this article I consider it to be strongly typed.

[2]	This seems to be a matter of some dispute. But both Wikipedia and PhpWiki list Perl as a weakly-typed language.

Categories: Python

#37 2004-05-03 08:58:40

When to design, when to code?

Thinking about when to design and when to write code leads to an illuminating distinction between different development styles.

A (sadly) popular style of development is code-and-fix programming. In this style, the goal is to do the minimum required to get something that appears to work. At its worst this is coding without design, at any rate without thinking seriously about design or architecture or long-term viability. It leads to unreadable, unmaintainable code, cascading defects and many other problems.

Big Design Up Front (BDUF) is a response to this style. BDUF attempts to figure out the solution in the abstract before starting to implement it. This is design without coding. This approach has many problems as well. It is hard to figure out a good design without the hands-on knowledge you get from writing code. It is brittle and unresponsive to changing requirements. It takes considerable effort to create and maintain design documents.

The agile approach is a middle way. It disdains both sloppy hacking and excessive design. Agile programmers think about what might work and try it. They refactor when they have a better idea or when the requirements change. They keep growing the design.

Agile development is sometimes feared because without up-front design it must be "just hacking". If you aren't designing up front, with formal process and documents, you must not be designing at all. This fear is born from the reaction to code-and-fix programming and thinking that code-and-fix is the only alternative to BDUF. This is a false dichotomy and a false perception of agile development.

Design and coding work best when taken together. If you try to think your way through the design without writing code you lack the on-the-ground knowledge you get from coding. If you just start writing code with the sole goal of getting something that seems to work, you are hacking in the worst sense of the word. But if you consider each addition to the code carefully and keep the code clean at all times you will end up with a thing of beauty - well-designed code that is superbly suited to the task at hand.

Note: Ned Batchelder's blog about the similarity between diamond cutting and the decisions that must be made while designing a software system was the spark that led me to this idea.

Categories: Agile