Software has held more promise and yet met with more failure than any other technology. Meanwhile the hardware guys have been merrily creating more and more CPU cycles for us to swamp.
A lot has been written about why software sucks. Fred Brooks famously predicted there would be no silver bullet delivering an order-of-magnitude improvement in software productivity over the next ten years. That was twenty years ago, but there are signs that the dark age of software might be slowly and quietly coming to an end.
Brooks was right about the lack of a silver bullet, of course. But since then we've learned that we've been trying to kill the wrong monster. We compared building a program to building a bridge or skyscraper, with its reliable timeline and quality, and we were mystified about how to get there. This is the wrong monster; instead we need to view code as design and adapt our practices to match. Coding is a creative process with much more in common with design than with any industrial process -- Jack Reeves got it right.
Notice that successful development methodologies jibe with viewing code as design. We unit test because a design isn't complete until validated with a simulation. We work in iterations because a change in one part of the design affects others. We document in the source code itself because that is the definitive source of design.
The code-is-design perspective also explains many supposed mysteries: huge productivity variance, a high number of defects, and failures of rote process are all to be expected of unvalidated, untested design.
Mysteries and Problems
Fortunately we've taken an important step: software development was once a mystery in search of a silver bullet; now it is a problem on which we are making real progress. Noam Chomsky pointed out that the first step to understanding the natural world is to turn mysteries into problems. We're seeing the same with software engineering.
Given the premise of code being design, here are some steps to take us in the right direction:
1. Boil out the remaining accidents. We can still have race conditions or leak resources in many programming languages. Newer languages should make these mistakes impossible. We can also learn from existing languages. For instance, the messaging model in Erlang prevents many types of timing problems. Functional languages may eliminate the many types of bugs that arise from side effects. The essence of a design should not be concerned with timing or unexpected side effects, so whenever possible these problems should be solved in the language itself.
2. Make validation of designs easy. How about the following requirement for a new programming language:
It must be straightforward to write a unit test to reproduce any possible bug.
Since designs must be validated with computer simulation, we must make those simulations easy. Huge strides have been made with various unit testing frameworks, but there are still shortcomings. Proofs of correctness are impractical for most systems, but the ability to easily assert that a system doesn't have a particular bug is the next best thing. Furthermore, any bug discovered at any time should be added to the test suite to make sure it never occurs again. This pushes the defect rate of a system towards zero over time.
3. Design and code should merge into a single artifact. We still need design at a higher level than current programming languages, but this design needs to be an asset rather than a burden that gets out of sync with the code.
We are already seeing this: documentation has been merged with source code, and tools convert code to UML diagrams and vice versa. But there is a lot of room for improvement. Code in object-oriented languages often includes many inessential objects that are just noise from a high-level view. These should be eliminated from the language, or at least filtered from the high-level design.
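Returning to step 2 above, here is a minimal sketch of what "reproduce the bug as a unit test, then keep the test forever" can look like. It uses Python's standard unittest module; the average function and its empty-input bug are hypothetical, invented only for this illustration.

```python
import unittest


def average(values):
    """Return the arithmetic mean of values, or 0.0 for an empty sequence."""
    # Hypothetical bug history: an earlier version divided by len(values)
    # unconditionally and raised ZeroDivisionError on empty input.
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_does_not_crash(self):
        # Written to reproduce the reported bug; it stays in the suite
        # so the same defect cannot quietly return.
        self.assertEqual(average([]), 0.0)

    def test_typical_input(self):
        self.assertEqual(average([1, 2, 3]), 2.0)
```

The test for the empty list is the "simulation" of the defect. Once it passes, it costs almost nothing to keep running, which is what lets the suite accumulate every bug found so far.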
There are surely many other examples that I'm overlooking, and functional languages do hold promise. What's important is that we now have an idea of why software sucks, and an inkling of what the solution might be.
Sunday, November 26, 2006
Tuesday, November 14, 2006
The real reason we need strongly-typed languages
There aren't many hard-and-fast facts about software, but here's one:
The more code you have, the more bugs you have.
More formally, lines of code is a strong predictor of defects. So, any language that allows you to do the same work with less code must be a Good Thing. This is why Python is better than Java is better than C++ is better than assembly, right?
Well, almost. The standard counterargument is that strongly-typed languages (statically typed, strictly speaking) catch mistakes at compile time that slip through to run time in dynamic languages. I don't buy this argument, because even though the compiler will catch some mistakes, there are so many other errors to be made that you still need to unit test the code. And good unit tests will catch the same errors the compiler would. Okay, so a lot of people don't write good unit tests, but until they do they're beyond our help anyway. As Bruce Eckel said, "if it's not tested, it's broken".
So why am I a proponent of strongly-typed languages for many (but not all) problem sets? Because they give us one thing dynamic languages by definition never will: unambiguous, guaranteed documentation. Building large software systems means interfacing with subsystems written by others, and they must be documented. Strongly-typed languages have an important part of that documentation built in: they precisely define the input and output types of each procedure call. An API in a dynamic language needs the same documentation anyway, so why provide the chance for error? Even for well-documented libraries, the reduced amount of code in the dynamic language is balanced by increased documentation.
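To make "the signature is the documentation" concrete, here is a small sketch. It is written in Python with type annotations purely for brevity; Python does not enforce annotations at run time, so the guarantee described above only appears when a static checker (mypy, for example) or a compiled language such as Java verifies them. The Invoice type and total_owed function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Invoice:
    customer_id: int
    amount_cents: int


def total_owed(invoices: Sequence[Invoice], customer_id: int) -> int:
    """Sum amount_cents over the invoices that belong to customer_id."""
    # The signature alone tells a caller what to pass and what comes back:
    # a sequence of Invoice objects and an int in, an int out.
    return sum(i.amount_cents for i in invoices if i.customer_id == customer_id)


def total_owed_untyped(invoices, customer_id):
    # Same behavior, but a caller must read prose docs (or the body) to learn
    # whether invoices are objects, dicts, or tuples, and what is returned.
    return sum(i.amount_cents for i in invoices if i.customer_id == customer_id)
```

Both functions do the same work. The difference is where the contract lives: in the first it is part of the code and can be checked by a tool, while in the second it has to be written down separately and kept accurate by hand.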
In fact, I'd like to see languages with even stronger guarantees. A Nice addition to Java would be to define references that can never be null. Many times I've used an API and asked, "do I need to check for null?" The answer to my question should be part of the API itself.
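One way to picture nullability as part of the API is sketched below, again in Python annotations and under the same caveat: a static checker has to enforce them. The function names are hypothetical. The idea is simply that a return type of Optional[str] announces "this may be None", while plain str announces that no check is needed.

```python
from typing import Dict, Optional


def find_nickname(nicknames: Dict[int, str], user_id: int) -> Optional[str]:
    """May return None: the Optional return type tells callers to check."""
    return nicknames.get(user_id)


def display_name(nicknames: Dict[int, str], user_id: int) -> str:
    """Never returns None: the plain str return type says no check is needed."""
    nickname = find_nickname(nicknames, user_id)
    # The None case is handled here, once, so callers of display_name
    # do not have to ask "do I need to check for null?"
    return nickname if nickname is not None else f"user-{user_id}"
```

With this convention, the "do I need to check for null?" question is answered by reading the signature rather than the documentation or the source.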
Of course, specific dynamic languages may have other advantages over their strongly-typed counterparts. Also, some programs may not need the detailed level of documentation offered by strong typing. Even so, type definitions are a key part of the documentation needed for a large system.