Tuesday, November 14, 2006

The real reason we need strongly-typed languages

There aren't many hard-and-fast facts about software, but here's one:

The more code you have, the more bugs you have.

More formally, lines of code is a strong predictor of defects. So, any language that allows you to do the same work with less code must be a Good Thing. This is why Python is better than Java is better than C++ is better than assembly, right?

Well, almost. The standard counterargument is that strongly-typed languages catch mistakes at compile time that slip through to run time in dynamic languages. I don't buy this argument: even though the compiler will catch some mistakes, there are so many other errors to be made that you still need to unit test the code. And good unit tests will catch the same errors a strongly-typed language's compiler would. Okay, so a lot of people don't write good unit tests, but until they do they're beyond our help anyway. As Bruce Eckel said, "if it's not tested, it's broken".
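
To make the trade-off concrete, here is a minimal Python sketch of a type mistake that only shows up at run time, along with a tiny test that exposes it. The function names are hypothetical and the bug is deliberate; a compiler for a strongly-typed language would reject the equivalent code before it ever ran.

```python
def parse_port(text):
    """Convert a port given as text, e.g. "8080", to an int."""
    return int(text)


def describe_port(port):
    """Deliberately buggy: tries to concatenate a str and an int."""
    return "port " + port


def run_test():
    """A tiny unit test; returns the error it exposes, or None if it passes."""
    try:
        assert describe_port(parse_port("8080")) == "port 8080"
    except TypeError as exc:
        # The type mistake surfaces here, when the test exercises the code.
        return exc
    return None
```

The mistake is caught either way; the difference is whether it is caught by the compiler or by a test that someone had to write.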

So why am I a proponent of strongly-typed languages for many (but not all) problem sets? Because they give us one thing dynamic languages by definition never will: unambiguous, guaranteed documentation. Building large software systems means interfacing with subsystems written by others, and they must be documented. Strongly-typed languages have an important part of that documentation built in: they precisely define the input and output types of each procedure call. An API in a dynamic language needs the same documentation anyway, so why provide the chance for error? Even for well-documented libraries, the reduced amount of code in the dynamic language is balanced by increased documentation.
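
As a rough sketch of the difference, here is the same hypothetical lookup written twice in Python: once with the contract living only in a docstring, and once with optional type annotations in the signature. The names (`User`, `find_user_untyped`, `find_user`) are invented for illustration, and Python does not enforce annotations at run time; an external checker such as mypy would be needed for that.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class User:
    user_id: int
    name: str


# Hypothetical in-memory "database" for the sketch.
_USERS = {1: User(1, "Ada"), 2: User(2, "Grace")}


def find_user_untyped(user_id):
    """Look up a user.

    The docstring has to say that user_id is an int and that a User is
    returned; nothing checks that this stays true as the code changes.
    """
    return _USERS[user_id]


def find_user(user_id: int) -> User:
    """Look up a user; the signature itself states input and output types."""
    return _USERS[user_id]


# The annotated signature can be read back mechanically, e.g. by a doc tool.
signature_doc = get_type_hints(find_user)
```

Here `signature_doc` maps `user_id` to `int` and `return` to `User`, while the untyped version yields nothing a tool could read.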

In fact, I'd like to see languages with even stronger guarantees. A Nice addition to Java would be to define references that can never be null. Many times I've used an API and asked, "do I need to check for null?" The answer to my question should be part of the API itself.
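
A sketch of what such a guarantee could look like, written with Python's optional type annotations rather than Java: the return type says whether "nothing" is a possible answer, so the caller does not have to guess. The function names are hypothetical, and the annotations are only verified by an external checker such as mypy, not by the Python runtime.

```python
from typing import Optional

# Hypothetical data for the sketch.
_NICKNAMES = {"ada": "Countess", "grace": "Amazing Grace"}


def lookup_nickname(name: str) -> Optional[str]:
    """May return None: the signature tells callers they must check."""
    return _NICKNAMES.get(name)


def display_name(name: str) -> str:
    """Declared to never return None, so callers need no null check."""
    nickname = lookup_nickname(name)
    if nickname is None:
        return name.title()
    return f"{name.title()} ({nickname})"
```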

Of course, specific dynamic languages may have other advantages over their strongly-typed counterparts. Also, some programs may not need the detailed level of documentation offered by strong typing. Even so, type definitions are a key part of the documentation needed for a large system.

5 comments:

Chris Hedgate said...
This post has been removed by the author.
Chris Hedgate said...

(I just realized my previous comment could be interpreted very differently from how I intended it. Since comments cannot be edited, I deleted the old one and am reposting it here.)

I am currently thinking a lot about the (possible) issue of lack of built-in documentation of dynamic languages (Python has interfaces for this reason). I am not certain about my thoughts yet so unfortunately I cannot give any good answer. However, I did think about one thing you wrote.

You said you do not buy the argument that static languages catch mistakes at compile time (I agree). However, the same argument can be extended to the "unambiguous, guaranteed documentation" case. Even if we can be sure that a class that implements interface X will have a method Y (since it is defined in the interface), there is no guarantee that the implementation does what we expect it to.

I see the Abstract Test Case testing pattern as a solution for this, but I wonder if it could not also be applied to systems written in a dynamic language.
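
One way the pattern might look in a dynamic language, sketched with Python's standard `unittest` module. The contract tests live in a mixin, and each implementation gets a small concrete test class that says how to build it. `StackContract`, `ListStack`, and `run_contract_tests` are hypothetical names for this sketch.

```python
import unittest


class StackContract:
    """Abstract test case: behavior every stack implementation must satisfy.

    It is a plain mixin, not a TestCase, so unittest does not try to run it
    on its own. Concrete test classes supply make_stack().
    """

    def make_stack(self):
        raise NotImplementedError

    def test_pop_returns_last_pushed(self):
        stack = self.make_stack()
        stack.push(1)
        stack.push(2)
        self.assertEqual(stack.pop(), 2)

    def test_new_stack_is_empty(self):
        self.assertTrue(self.make_stack().is_empty())


class ListStack:
    """One hypothetical implementation; no declared interface is needed."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class ListStackTest(StackContract, unittest.TestCase):
    def make_stack(self):
        return ListStack()


def run_contract_tests():
    """Run the contract against ListStack and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ListStackTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Any other implementation can be held to the same contract by adding another two-line test class, which is the same role the abstract test case plays alongside an interface in a static language.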

jomofo7 said...

The more behavioral semantics you can express in the definition of the interface, the more easily you can derive unit tests. In fact, the marriage of API design and unit testing is the ultimate goal of TDD proponents.

Yet, from the perspective of an API designer following TDD methodologies, I find myself becoming increasingly frustrated that I have to separate the design of my APIs from the design of my test cases.

For example, if I can unambiguously express in my API design that null is an unacceptable input, then why do *I* have to write code to automatically assert that I've correctly implemented the guarantees made by my interface? I've already supplied enough information at design-time that the machine could do this testing for me.

Chris makes a good point that even with an unambiguous specification--in a general purpose language--we still have to manually develop unit tests to assert a lot of the underlying behavioral guarantees.

With general purpose languages like C++ and Java, the degree of semantics we can express in the interface definition is limited by the lowest common denominator--the properties of the language that also make them Turing-complete?

Alas, could this be the calling of domain-specific languages?

Ryan said...

I agree with jomofo's remarks, although I do understand the distaste some hold for certain strongly-typed languages.

However, I think there is a middle ground that allows us to have our cake (with strongly typed languages) and eat it too (eliminate the verbose syntax). I've been hacking with Haskell lately, and have come to love its type inference mechanism.

For those not familiar with Haskell, it is a strongly typed language that catches nearly all typing errors at compile time. But it does this via inference -- I don't necessarily have to declare a function explicitly to accept a String, but if I use a parameter as a String, the type is inferred. Consumers that don't pass a String get a compile-time error.

Haskell does have a facility for defining type information if it cannot be inferred from the code, but in many circumstances this isn't necessary.

Regarding the topic of the post, a documentation engine (HaskellDoc?) could infer the type information from the code and generate appropriate information for consumers, without requiring the programmer to redundantly define it in the code.

I'm not saying that Haskell is the ultimate solution to this, but it does have some innovative ideas.

Mathias Gaunard said...

"This is why Python is better than Java is better than C++ is better than assembly, right?"

C++, even though it is the worst programming language in your list, is the one that will provide the best correctness and robustness out of those at compile time.

It is also the only one that supports static typing, dynamic typing, no typing and static duck typing at the same time.