Friday, May 13

Some Reasons Software Sucks

Pharaohmagnetic and I were talking today about Paul De Palma's pseudomemoir The Software Wars, published in the Winter 2005 issue of American Scholar. (I'd give a link, but it's subscription only; Pharaoh receives the actual bound journal in the mail every season. Quaint, isn't it.) De Palma spends most of the article war-storying a few of his experiences in professional software development:
My company's sin went beyond working with complex, poorly understood tools. Neither the tools nor our system existed. The database manufacturer had a delivery date and no product. Their consultants were selling us a nonexistent system. To make their deadline, I am confident they hired more programmers and experimented with unproven software from still other companies with delivery dates but no products. And what of those companies? You get the idea.

His argument, paraphrased masterfully by Pharaoh, is this: software projects can be ruined by foolish business decisions in ways that tangible-goods projects cannot.

That's a good start. There's a lot of meat on that idea, a lot of tasty morsels to chew on. But I'd argue that it subtly misses the point: foolish business decisions can ruin any project. The difference is that the newness of software as a whole means that western corporate culture doesn't yet have an instinctive understanding of what constitutes a foolish business decision for a software project. Admittedly, we're doing better now than we were in the bubble days, when The New Economy (it doesn't exist) meant that the old rules didn't apply (they do) and that mindshare was the most important thing (it isn't).

Foolish business decisions on software projects have publicly ruined many a company. But market forces haven't finished culling the truly fat-headed, idiotic, soul-crushingly incompetent software project managers from the talent pool. Add to this the fact that if you're really lucky, the idiotic business decisions you made will only manifest after your product goes to market and its manager gets promoted, and you've got a recipe for a continuing culture of Really Bad Products in the software world.

So one of the reasons software sucks is that the people who make it suck.

Another reason is that software, since it's just a collection of bits rather than a tangible good, is pretty hard to test destructively. So its bounds of correct operation are really hard to determine without actually, y'know, releasing it into the market and waiting for it to break. Engineers can build scale models of their bridges and, within certain well-understood limits, be confident that the full-scale bridge will behave the same way the small one did. De Palma touches on this, but he fails to draw this distinction:

A few years ago, an IBM consulting group determined that of twenty-four companies surveyed, 55 percent built systems that were over budget; 68 percent built systems that were behind schedule; and 88 percent of the completed systems had to be redesigned. Try to imagine the same kind of gloomy numbers for civil engineering: three-quarters of all bridges carrying loads below specification; almost nine of ten sewage treatment plants, once completed, in need of redesign; one-third of highway projects canceled because technical problems have grown beyond the capacity of engineers to solve them. Silly? Yes. Programming has miles to go before it earns the title "software engineering."

Since a software engineering department's output is itself the model of the production system (we deliver the code you need to execute to generate the running process, not the running process itself), it doesn't have the luxury of modelling as a mechanism for testing assumptions about the systemic qualities of its product.
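The closest analogue software has to the engineer's scale model is firing inputs at a piece of the system and watching what breaks. Here's a minimal sketch of that idea in Python (mine, not De Palma's; parse_price is a made-up stand-in for some production routine):

    # A rough sketch of random probing ("fuzzing"), the nearest thing we have
    # to destructive testing. It finds crashing inputs easily enough, but it
    # only samples the input space -- it can't bound the system's correct
    # operation the way a scaled physical model can.
    import random

    def parse_price(text: str) -> int:
        """Hypothetical production function: price string -> cents."""
        dollars, _, cents = text.partition(".")
        return int(dollars) * 100 + int(cents or 0)

    def probe(trials: int = 10_000) -> list:
        """Throw random inputs at the function and collect the ones that break it."""
        failures = []
        for _ in range(trials):
            text = "".join(random.choice("0123456789.-$ ") for _ in range(6))
            try:
                parse_price(text)
            except ValueError:
                failures.append(text)
        return failures

    if __name__ == "__main__":
        bad = probe()
        print(f"{len(bad)} crashing inputs found, e.g. {bad[:3]}")
        # Finding crashes is the easy part; proving there are no more is the hard part.

Ten thousand probes will turn up plenty of inputs that crash parse_price, but no number of probes tells you the bridge won't fall down under the load you didn't think to generate.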

Additionally, OSes and libraries suffer the same problems of obscurity and untestability. So another reason software sucks is that the tools used to make it suck, which I mentioned before. To summarize that post: software development tools don't give you intuitive ways of determining correctness; they require neocortical intelligence as opposed to reptilian-brain intelligence.

A third reason: perhaps consumer demand is not strong enough. The consuming public is already quite used to the general crappiness of software - when a system fails, sometimes they get mad, but most of the time they utter a sigh of resignation, pick up the pieces, and accept the fallibility of the product in question as par for the course. But there's a finer point here: this overall trend of Software Consumer Resignation contributes to an atmosphere within the developer community where there is strangely little pressure for true innovation. Particular trendbucking examples are the iPod and Google - no one really knew just how good a portable music player or a search engine could be before they hit the market. People were satisfied with Walkmen and AltaVista; they didn't know that vastly superior products could exist, let alone expect or demand them.

A fourth reason, which De Palma illustrates quite well, is the Military Procurement Analogy. So well, in fact, that I'll block-quote this big excerpt and let it speak for itself:
The characteristics of software often cited as leading to failure - its complexity, its utter plasticity, its free-floating nature, unhampered by tethers to the physical world - make it oddly, even paradoxically, similar to the practice of military procurement.

Late in 1986 James Fallows wrote an article analyzing the Challenger explosion for the New York Review of Books. Instead of concentrating on the well-known O-ring problem, he situated the failure of the Challenger in the context of military procurement, specifically in the military's inordinate fondness for complex systems. This fondness leads to stunning cost overruns, unanticipated complexity, and regular failures. It leads to Osprey aircraft that fall from the sky, to anti-missile missiles for which decoys are easy to construct, to FA-22 fighters that are fabulously over budget. The litany goes on. What these failures have in common with the Challenger is, Fallows argues, "military procurement disease," namely, "over-ambitious schedules, problems born of too-complex design, shortages of spare parts, a 'can-do' attitude that stifles embarrassing truths ('No problem, Mr. President, we can lick those Viet Cong'), and total collapse when one component unexpectedly fails." Explanations for this phenomenon include competition among the services; a monopoly hold by defense contractors who are building, say, aircraft or submarines; lavish defense budgets that isolate military purchases from normal market mechanisms; the nature of capital-intensive, laptop warfare where hypothetical justifications need not - usually cannot - be verified in practice; and a little-boy fascination with things that fly and explode. Much of this describes the software industry too.

Fallows breaks down military procurement into five stages:
  1. The Vegematic Promise, wherein we are offered hybrid aircraft, part helicopter, part airplane, or software that has more features than could be learned in a lifetime of diligent study. Think Microsoft Office here.
  2. The Rosy Prospect, wherein we are assured that all is going well. I call this the 90 percent syndrome. I don't think I have ever supervised a project, either as a software manager overseeing professionals or as a professor overseeing students, that was not 90 percent complete whenever I asked.
  3. The Big Technical Leap, wherein we learn that our system will take us to regions not yet visited, and we will build it using tools not yet developed. So the shuttle's solid-fuel boosters were more powerful than any previously developed boosters, and bringing it all back home, my system was to use a database we had never used before, running on a computer for which a version of that software did not yet exist.
  4. The Unpleasant Surprise, wherein we learn something unforeseen and, if we are unlucky, calamitous. Thus, the shuttle's heat-resistant tiles, all 31,000 of them, had to be installed at the unexpected rate of 2.8 days per tile, and my system gobbled so much disk space that there was scarcely any room for data.
  5. The House of Cards, wherein an unpleasant surprise, or two, or three, causes the entire system to collapse. The Germans flanked the Maginot Line, and in my case, once we learned that our reliance on a promised database package outstripped operating-system limits, the choices were: one, wait for advances in operating systems; two, admit a mistake, beg for forgiveness, and resolve to be more prudent in the future; or, three, push on until management pulls the plug.
We see the effects of big-ticket government procurement all the time; here's a boingboing link to a relevant example.

Now, with the problems identified, how do we proceed? The field of software engineering is still quite young. Bridge-building, too, went through an industrial-scale learning period - Pharaohmagnetic points out the Tay Bridge, the Quebec Bridge, and Tacoma Narrows as oft-cited failures that all civil engineers study in their freshman courses. Imagine a future where standards for software projects mirror those for bridges; a future where software and the tools to make it are so intelligent that it codes itself.

On second thought, let's not just imagine it. Let's make it happen.
