Early error detection is paramount

Is early error detection, while developing software, really all that important?

This question has probably been asked and answered a thousand times over in the last 50 years.  Let me try to rehash the argument in my own words.

Catch the error while the cement is wet

Construction, of the brick and mortar variety, sometimes provides an apt analogy for software development, so I am going to give that a whirl.

See if you can correlate the story I lay out below, to the various stages in a software development task – DEV, QA, Bug Fixing, and the Aftermath.

Say you decide to build a house. You create some kind of design, and completely construct your house.

After the construction is all finished, and only after this, you call in an inspector to see if the construction is up to code. The inspector finds that your electrical wiring is the wrong gauge.  You must change it.

The wiring is inside the walls.  You have to tear into the walls to get to the wiring.

You have already spent a lot of money and time on the construction.  You want to move in already.  The extra expense is a burden on the pocket book, and the mind.  You are not at your patient, nor enthusiastic best.

The builder had other projects scheduled.  He wants to be done with your house yesterday. He can no longer giving his best attention to your problems.

Some of the folks that worked on this house are needed on the county commissioner’s lake side cottage. The builder brings in some temporary help to make the fixes. These are snot-nosed college kids, who don’t know many of the small, but consequential technical decisions that went into your house’s construction.  They are going to trip over these and make mistakes. Worse, these kids know they will never see you and your house after this summer.

After the wiring is changed, you notice that a couple of windows do not close well. The lights in the stairwell flicker randomly, but noticeably.  The re-painting of the walls in your guest room is not the right shade.  By this time you have no other place to live, so you suck it up, and move in.

After you move in you notice that your water heater does not work well with the new wiring.  You have to replace the water heater.   More aggravation; more time wasted; more expense; send in the plumber.

There is worse.  While monkeying with the walls and the wiring, a construction worker accidentally rammed a 100 pound sander into a load bearing beam.  It now has a crack it in.  No one notices.

Let’s see how this correlates to software development.

DEV

Say you decide to build a house. You create some kind of design, and completely construct your house.

Listen pilgrim, I just write code. I don’t test.

QA

After the construction is all finished, and only after this, you call in an inspector to see if the construction is up to code. The inspector finds that your electrical wiring is the wrong gauge. You must change it.

This happens all the time in enterprise software development. Developers do not test their code effectively. Infrastructure, which enables developers to adequately test the code they write, often does not exist. Everyone, including management, is happy to leave serious testing till after all of the code is turned in.

The bug fixing

The wiring is inside the walls.  You have to tear into the walls to get to the wiring.

The bug is buried somewhere deep in a few thousand lines of code that you blithely turned in. You go digging, make a change some place, with little knowledge of everything else that might now be affected by your change. It is hard to know, there is too much code, code that you don’t even remember exists. The bug fix is a risk.

You have already spent a lot of money and time on the construction.  You want to move in already.  The extra expense is a burden on the pocket book, and the mind.  You are not at your patient nor enthusiastic best.

The builder had other projects scheduled.  He wants to be done with your house yesterday. He is no longer giving his best attention to your problems.

I’ve seen this in just about every large project I have been part of. Developers use all of a sprint to write code and turn it in. Testing of this code happens in the next sprint, when both the stakeholders and the developers are also assigned to the development tasks scheduled for this second sprint. Nobody is able to bring their best selves to the bug fixing.

Some of the folks that worked on this house are needed on the county commissioner’s lake side cottage. The builder brings in some temporary help to make the fixes. These are snot-nosed college kids, who don’t know many of the small, but consequential technical decisions that went into your house’s construction.  This is going to trip them up, and they will make mistakes.   Worse, these kids know they will never see you nor your house after this summer.

You designed and wrote the original code. However bugs are assigned to someone else, who knows little of the business requirements, the design decisions that went into the solution, and the contours of the code base you created.

Sometimes this someone else is a consultant. And we know consultants can be a mixed blessing, don’t we?

The new and not so improved aftermath

After the wiring is changed, you notice that a couple of windows do not close well. The lights in the stairwell flicker randomly, but noticeably.  The re-painting of the walls in your guest room is not the right shade.  By this time you have no other place to live, so you suck it up, and move in.

A bug fix can fundamentally improve the solution you constructed. Or it may just be a jerry-rigged whatchamacallit that Rube Goldberg would look down his nose at. Often it is the latter, which gets you past today’s problem, and sows the seeds for several others.

But you have no choice. The show must go on.

After you move in you notice that your water heater does not work well with the new wiring.  You have to replace the water heater.   More aggravation; more time wasted; more expense; send in the plumber.

There is worse.  While monkeying with the walls and the wiring, a construction worker accidentally rammed a 100 pound sander into a load bearing beam.  It now has a crack it in.  No one notices.

Like I mentioned earlier, you often have no idea what damage you did while making your bug fix.

Lesson Learned

You want to catch the wiring issue in DEV:

  • Before a whole bunch of stuff was built around it
  • Before you build a whole bunch of stuff that depends on the error
  • While the construction crew was focused exclusively on this task
  • When the folks who made the error are available to rectify the error
  • When you have the least risk if you make another mistake

Knowing a tool vs knowing software development

Give an enterprise developer a top notch table saw, and a 500 dollar power drill.

Give that developer all the training he wants on those tools.

Ask the developer to build a chair.

His chair will come out with 4 legs of different lengths, cracks in the seat, and a couple of nails sticking out.

The developer knows how to use his fancy tools, but he does not know how to build a chair.

That is the difference between knowing how to use a tool, and knowing how to do software development.

They are two different bodies of knowledge, two different sets of skills.

By the way, that chair has business value.  If you must sit, and that chair is all you have, you will sit on that chair, carefully, and with a few choice curses.  I am willing to bet that this is how business folks view most enterprise software that they are saddled with.

Bloody-minded software development

I’ve been pre-occupied lately, with the familiar notion that ‘action‘ has its moment.  How much planning did the chicken do, before crossing the road?

I have seen two kinds of software development outfits.

One was all chaotic action, with apparently little method, yet who always managed to deliver something. Some thing went into production.   Some thing of some value was up and running.  The plan, if one can be said to exist, was often brutal but effective, like clearing a minefield by having your platoon walk across it.  Quality was an alien concept, little more than a pretty thought.

The other development shop is all talk, with little to show for it.  Good people, talented, even competent individuals, who collectively can’t seem to program their way out of a paper bag.  They have worked two, three years on something, nothing of which is in production.  The waste is heart-breaking. Some thing, some essential bloody-mindedness, is missing.

If I were a business person, I would have to pick the former team every time.

Delivery is an essential, like the food you put in front of a starving man. Quality provides long term value, with long term benefits, and requires constant application – it is putting healthy food in front of a starving man, and continuing to feed him healthy food, even after he stops starving.

 

 

Who verifies the blueprint?

You have a business problem (sometimes called a ‘business requirement’).   Someone devises a solution.  You create a blueprint (sometimes called a ‘specification‘) of the solution.   Then you construct the solution that the blueprint specifies.  This workflow suggests that there are at least two things to verify.

  • Does my construction adhere to the blueprint?
  • Is the blueprint correct in the first place?

Here is an example.

Business Requirement

To determine the monies that we may have to refund a customer (say an auto-insurance holder), perform the following calculation.

The monies that the customer owes at the moment
minus
the monies that the customer has already paid us

If the customer has paid us more than she owes at the moment, the customer is due a refund.

However, we cannot count all of the payments that we have received from the customer.  There are rules that tell us which payments must be ignored while calculating refunds.  Here is one of them.

The ‘My Dog Died’ rule

Customers that are dis-enrolled because they failed to pay their premiums, can ask for and receive a sort of ‘grace period’, of usually 2 to 3 months, in which they can catch up (or perhaps even pay ahead).  If they hit all the payment targets within this period, the customer is re-instated.  Let’s call this the ‘My Dog Died, So Have Pity On Me‘ rule.

Specification of the solution – The Blueprint

So how will we satisfy the business requirement?  What are we going to build?

The brain trust (the business analyst, the architect, the DBA, the resident loudmouth), go off into their huddle, and produce these instructions for the construction crew (a developer, and a tester).

  • Using already known methods, determine if the customer is dis-enrolled because of failure to pay premiums.
  • Determine if the customer was granted a ‘My Dog Died‘ grace period, and if so how long that period is for.  In particular, look for the following attributes.
    • An attribute called ‘GriefStricken‘.  Its ‘Effective Date’ is the start of the ‘My Dog Died’ period.
    • An attribute called ‘GriefStrickenExpiration‘,  Its ‘Effective Date’ is the end of the ‘My Dog Died’ period.
  • When calculating refunds, ignore all payments received during the ‘My Dog Died’ period.

Verification

Does construction match blueprint

As I mentioned earlier, in my environment the construction crew consists of a developer, and a tester.

The developer writes computer code that implements the blueprint.  The developer and the tester verify that the computer code does indeed do what the blueprint lays out.

They discover bugs – mismatches between the construction and the blueprint.  The developer fixes the bugs – removes the mismatches.  At some reasonable point, the construction crew turns in the solution.

But.

Who verifies the blueprint

Yea, you guessed it – the blueprint was wrong.

It turns out that the ‘My Dog Died’ period is stipulated to start on the day that the customer is dis-enrolled.

What we thought was the start date, the ‘EffectiveDate’ of the ‘GriefStricken’ attribute, is only the day on which the customer was approved for the ‘My Dog Died’ grace process, which is often several days after the dis-enrollment.

There was no way for the developer to know this.  The tester did not know this either.  The construction crew only knows what is in the specification of the solution.

Yet, this is a significant bug.