Early error detection is paramount

Is early error detection, while developing software, really all that important?

This question has probably been asked and answered a thousand times over in the last 50 years.  Let me try to rehash the argument in my own words.

Catch the error while the cement is wet

Construction, of the brick and mortar variety, sometimes provides an apt analogy for software development, so I am going to give that a whirl.

See if you can correlate the story I lay out below, to the various stages in a software development task – DEV, QA, Bug Fixing, and the Aftermath.

Say you decide to build a house. You create some kind of design, and completely construct your house.

After the construction is all finished, and only after this, you call in an inspector to see if the construction is up to code. The inspector finds that your electrical wiring is the wrong gauge.  You must change it.

The wiring is inside the walls.  You have to tear into the walls to get to the wiring.

You have already spent a lot of money and time on the construction.  You want to move in already.  The extra expense is a burden on the pocket book, and the mind.  You are not at your patient, nor enthusiastic best.

The builder had other projects scheduled.  He wants to be done with your house yesterday. He is no longer giving his best attention to your problems.

Some of the folks that worked on this house are needed on the county commissioner’s lake side cottage. The builder brings in some temporary help to make the fixes. These are snot-nosed college kids, who don’t know many of the small, but consequential technical decisions that went into your house’s construction.  They are going to trip over these and make mistakes. Worse, these kids know they will never see you and your house after this summer.

After the wiring is changed, you notice that a couple of windows do not close well. The lights in the stairwell flicker randomly, but noticeably.  The re-painting of the walls in your guest room is not the right shade.  By this time you have no other place to live, so you suck it up, and move in.

After you move in you notice that your water heater does not work well with the new wiring.  You have to replace the water heater.   More aggravation; more time wasted; more expense; send in the plumber.

There is worse.  While monkeying with the walls and the wiring, a construction worker accidentally rammed a 100 pound sander into a load bearing beam.  It now has a crack in it.  No one notices.

Let’s see how this correlates to software development.

DEV

Say you decide to build a house. You create some kind of design, and completely construct your house.

Listen pilgrim, I just write code. I don’t test.

QA

After the construction is all finished, and only after this, you call in an inspector to see if the construction is up to code. The inspector finds that your electrical wiring is the wrong gauge. You must change it.

This happens all the time in enterprise software development. Developers do not test their code effectively. Infrastructure, which enables developers to adequately test the code they write, often does not exist. Everyone, including management, is happy to leave serious testing till after all of the code is turned in.

The bug fixing

The wiring is inside the walls.  You have to tear into the walls to get to the wiring.

The bug is buried somewhere deep in a few thousand lines of code that you blithely turned in. You go digging, make a change some place, with little knowledge of everything else that might now be affected by your change. It is hard to know, there is too much code, code that you don’t even remember exists. The bug fix is a risk.

You have already spent a lot of money and time on the construction.  You want to move in already.  The extra expense is a burden on the pocket book, and the mind.  You are not at your patient nor enthusiastic best.

The builder had other projects scheduled.  He wants to be done with your house yesterday. He is no longer giving his best attention to your problems.

I’ve seen this in just about every large project I have been part of. Developers use all of a sprint to write code and turn it in. Testing of this code happens in the next sprint, when both the stakeholders and the developers are also assigned to the development tasks scheduled for this second sprint. Nobody is able to bring their best selves to the bug fixing.

Some of the folks that worked on this house are needed on the county commissioner’s lake side cottage. The builder brings in some temporary help to make the fixes. These are snot-nosed college kids, who don’t know many of the small, but consequential technical decisions that went into your house’s construction.  This is going to trip them up, and they will make mistakes.   Worse, these kids know they will never see you nor your house after this summer.

You designed and wrote the original code. However, the bugs are assigned to someone else, who knows little of the business requirements, the design decisions that went into the solution, and the contours of the code base you created.

Sometimes this someone else is a consultant. And we know consultants can be a mixed blessing, don’t we?

The new and not so improved aftermath

After the wiring is changed, you notice that a couple of windows do not close well. The lights in the stairwell flicker randomly, but noticeably.  The re-painting of the walls in your guest room is not the right shade.  By this time you have no other place to live, so you suck it up, and move in.

A bug fix can fundamentally improve the solution you constructed. Or it may just be a jerry-rigged whatchamacallit that Rube Goldberg would look down his nose at. Often it is the latter, which gets you past today’s problem, and sows the seeds for several others.

But you have no choice. The show must go on.

After you move in you notice that your water heater does not work well with the new wiring.  You have to replace the water heater.   More aggravation; more time wasted; more expense; send in the plumber.

There is worse.  While monkeying with the walls and the wiring, a construction worker accidentally rammed a 100 pound sander into a load bearing beam.  It now has a crack in it.  No one notices.

Like I mentioned earlier, you often have no idea what damage you did while making your bug fix.

Lesson Learned

You want to catch the wiring issue in DEV:

  • Before a whole bunch of stuff was built around it
  • Before you build a whole bunch of stuff that depends on the error
  • While the construction crew was focused exclusively on this task
  • When the folks who made the error are available to rectify the error
  • When you have the least risk if you make another mistake

Knowing a tool vs knowing software development

Give an enterprise developer a top notch table saw, and a 500 dollar power drill.

Give that developer all the training he wants on those tools.

Ask the developer to build a chair.

His chair will come out with 4 legs of different lengths, cracks in the seat, and a couple of nails sticking out.

The developer knows how to use his fancy tools, but he does not know how to build a chair.

That is the difference between knowing how to use a tool, and knowing how to do software development.

They are two different bodies of knowledge, two different sets of skills.

By the way, that chair has business value.  If you must sit, and that chair is all you have, you will sit on that chair, carefully, and with a few choice curses.  I am willing to bet that this is how business folks view most enterprise software that they are saddled with.

Bloody-minded software development

I’ve been pre-occupied lately, with the familiar notion that ‘action‘ has its moment.  How much planning did the chicken do, before crossing the road?

I have seen two kinds of software development outfits.

One was all chaotic action, with apparently little method, yet who always managed to deliver something. Some thing went into production.   Some thing of some value was up and running.  The plan, if one can be said to exist, was often brutal but effective, like clearing a minefield by having your platoon walk across it.  Quality was an alien concept, little more than a pretty thought.

The other development shop is all talk, with little to show for it.  Good people, talented, even competent individuals, who collectively can’t seem to program their way out of a paper bag.  They have worked two, three years on something, nothing of which is in production.  The waste is heart-breaking. Some thing, some essential bloody-mindedness, is missing.

If I were a business person, I would have to pick the former team every time.

Delivery is an essential, like the food you put in front of a starving man. Quality provides long term value, with long term benefits, and requires constant application – it is putting healthy food in front of a starving man, and continuing to feed him healthy food, even after he stops starving.

 

 

Who verifies the blueprint?

You have a business problem (sometimes called a ‘business requirement’).   Someone devises a solution.  You create a blueprint (sometimes called a ‘specification‘) of the solution.   Then you construct the solution that the blueprint specifies.  This workflow suggests that there are at least two things to verify.

  • Does my construction adhere to the blueprint?
  • Is the blueprint correct in the first place?

Here is an example.

Business Requirement

To determine the monies that we may have to refund a customer (say an auto-insurance holder), perform the following calculation.

The monies that the customer owes at the moment
minus
the monies that the customer has already paid us

If the customer has paid us more than she owes at the moment, the customer is due a refund.

However, we cannot count all of the payments that we have received from the customer.  There are rules that tell us which payments must be ignored while calculating refunds.  Here is one of them.

The ‘My Dog Died’ rule

Customers that are dis-enrolled because they failed to pay their premiums, can ask for and receive a sort of ‘grace period’, of usually 2 to 3 months, in which they can catch up (or perhaps even pay ahead).  If they hit all the payment targets within this period, the customer is re-instated.  Let’s call this the ‘My Dog Died, So Have Pity On Me‘ rule.

Specification of the solution – The Blueprint

So how will we satisfy the business requirement?  What are we going to build?

The brain trust (the business analyst, the architect, the DBA, the resident loudmouth), go off into their huddle, and produce these instructions for the construction crew (a developer, and a tester).

  • Using already known methods, determine if the customer is dis-enrolled because of failure to pay premiums.
  • Determine if the customer was granted a ‘My Dog Died‘ grace period, and if so how long that period is for.  In particular, look for the following attributes.
    • An attribute called ‘GriefStricken‘.  Its ‘Effective Date’ is the start of the ‘My Dog Died’ period.
    • An attribute called ‘GriefStrickenExpiration‘.  Its ‘Effective Date’ is the end of the ‘My Dog Died’ period.
  • When calculating refunds, ignore all payments received during the ‘My Dog Died’ period.
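
To make the blueprint concrete, here is a minimal sketch of the refund calculation as the construction crew might have coded it. The names (Payment, RefundCalculator, refundDueCents) are hypothetical, amounts are held in cents, and the grace period is assumed to include both its start and end dates; the blueprint does not say either way.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical type for illustration; not taken from any real system.
final class Payment {
    final LocalDate receivedOn;
    final long amountCents;

    Payment(LocalDate receivedOn, long amountCents) {
        this.receivedOn = receivedOn;
        this.amountCents = amountCents;
    }
}

final class RefundCalculator {
    private RefundCalculator() {}

    // True when the payment falls inside the 'My Dog Died' period.
    // Assumption: both the start and the end date belong to the period.
    static boolean inGracePeriod(Payment p, LocalDate graceStart, LocalDate graceEnd) {
        return !p.receivedOn.isBefore(graceStart) && !p.receivedOn.isAfter(graceEnd);
    }

    // Refund due = payments that count, minus what is owed; never negative.
    static long refundDueCents(long owedCents, List<Payment> payments,
                               LocalDate graceStart, LocalDate graceEnd) {
        long counted = payments.stream()
                .filter(p -> !inGracePeriod(p, graceStart, graceEnd))
                .mapToLong(p -> p.amountCents)
                .sum();
        return Math.max(0L, counted - owedCents);
    }
}
```

With 400.00 owed, one payment of 500.00 before the grace period and one of 200.00 inside it, only the first payment counts, so the refund comes to 100.00.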

Verification

Does construction match blueprint

As I mentioned earlier, in my environment the construction crew consists of a developer, and a tester.

The developer writes computer code that implements the blueprint.  The developer and the tester verify that the computer code does indeed do what the blueprint lays out.

They discover bugs – mismatches between the construction and the blueprint.  The developer fixes the bugs – removes the mismatches.  At some reasonable point, the construction crew turns in the solution.

But.

Who verifies the blueprint

Yea, you guessed it – the blueprint was wrong.

It turns out that the ‘My Dog Died’ period is stipulated to start on the day that the customer is dis-enrolled.

What we thought was the start date, the ‘EffectiveDate’ of the ‘GriefStricken’ attribute, is only the day on which the customer was approved for the ‘My Dog Died’ grace process, which is often several days after the dis-enrollment.

There was no way for the developer to know this.  The tester did not know this either.  The construction crew only knows what is in the specification of the solution.

Yet, this is a significant bug.
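
A small sketch shows why the wrong start date matters. The GracePeriod class and the dates are invented for illustration, and the period is again assumed to include both end points.

```java
import java.time.LocalDate;

// Hypothetical helper: a date range that is inclusive on both ends (an assumption).
final class GracePeriod {
    private final LocalDate start;
    private final LocalDate end;

    GracePeriod(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    // True when a payment received on this date must be ignored for refunds.
    boolean covers(LocalDate date) {
        return !date.isBefore(start) && !date.isAfter(end);
    }
}
```

Say the customer is dis-enrolled on March 1, approved for the grace process on March 8, and makes a payment on March 4. A period built from the approval date, as the blueprint instructs, does not cover March 4, so the payment is counted towards a refund. A period built from the dis-enrollment date, as the business rule stipulates, does cover it, so the payment is ignored. The code matches the blueprint in both cases; only the second matches the business.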

Lose the debugger


My ideal development team would not use step-through debuggers.  If I am responsible for mentoring newbie programmers, my first rule would be – no step-through debuggers.  My ideal IDE would include all the editing, analysis, navigation, and refactoring features, that modern incarnations like Eclipse and IntelliJ have, but without the step-through debugger.

Does this make any sense at all?

What is a debugger used for?  It is used to understand what a piece of code is doing.  The code might be doing something wrong, aka a bug.   The debugger can help us understand how a bug came to be.  If there were no debugger, what can you do?

Well, read the code.

 

Less Code

If reading code was the only way to understand the code, wouldn’t you write less code?  You will.  This will act as a disincentive to proliferation by copy/paste.    This will be an incentive to learn to ‘not repeat yourself‘ (the DRY principle).

 

Intelligible Code

If reading code was the only way to understand the code, wouldn’t you write code that is easy to understand?   You will learn the difference between the code, and the intent of the code.  You look at code, and ask yourself, WTF, why was this code written?  That.

Is there any need for code to be a puzzle?  Here is an example.

protected <I, O> O mapIO(Class<O> clo, I in) {
   try {
       Class<?> cli = in.getClass();
       O out = clo.newInstance();

       for (Method mo : clo.getMethods()) {
           if ((mo.getName().startsWith("set")) && 
               (mo.getParameterTypes().length == 1)) {
               Class<?> pr = mo.getParameterTypes()[0];
               Method mi = null;
               Object ob = null;
               if ((BaseList.class.isAssignableFrom(pr)) || 
                   (Collection.class.isAssignableFrom(pr))) {
                   mi = null;
               } else {
                   try {
                       mi = cli.getMethod("get" + mo.getName().substring(3));
                   } catch (NoSuchMethodException e) {
                       try {
                           mi = cli.getMethod("is" + mo.getName().substring(3));
                       } catch (NoSuchMethodException e2) {
                       }
                   }
               }
               if (mi != null && pr.isAssignableFrom(mi.getReturnType())) {
                   ob = mi.invoke(in);  
               } else {
                   try {
                       ob = pr.getConstructor().newInstance();
                   } catch (NoSuchMethodException e)  {
                       ob = null;
                   }
               }
               mo.invoke(out, ob); 
           }
       }
       return out;
   }
   catch (Exception e) {
       ......
   }
}

Any idea what the above code does?   Right.  I looked at it and my eyes started to swim.   It is in fact, a decent method.  It is short, and once you decipher it, you see that it does one simple thing.  The rub, of course, is having to decipher it.  I had to spend some time digging into it, doing a little archaeology as it were, to discover the intent of the code.  So what does this code do anyway?

Given an input object, and the class of the output, instantiate, and 
initialize an output object, in the following manner.  

Scalar properties, which exist in the input object, are copied over. 

All other properties - properties that do not exist in the input object, 
and vector properties (collections), are initialized with the 
default no-arg constructor.  

That's it.  

This is the 'intent' of the code.   This is what the code is supposed to 
accomplish.  This is why the code exists.

Now, why can’t the code just say what it means?  Something like this.

protected <I, O> O prepareOutput(Class<O> outputClass, I input) {
   try {
       List<Field> propertiesToBeCopied = 
                     getMatchingScalarProperties(outputClass, input);

       List<Field> propertiesToBeInitialized = 
                     getRestOfTheProperties(outputClass, propertiesToBeCopied);

       O output = outputClass.newInstance();
       copyProperties(output, input, propertiesToBeCopied);
       initializeProperties(output, propertiesToBeInitialized);

       return output;
   }
   catch (Exception e) {
       ......
   }
}

This alternative code is definitely less efficient than the original version.  On the other hand, what this code is about, is fairly obvious.  Even if it turns out that I must optimize this code, I will have more confidence in that attempt, because I start with a better view of what the code is supposed to accomplish.

Wait, I see how we can make this more efficient, without sacrificing the clarity we are heading towards.

protected <I, O> O prepareOutput(Class<O> outputClass, I input) {
   try {

       O output = outputClass.newInstance();
       for (Field field : outputClass.getDeclaredFields()) {
           if (isMatchingScalarProperty(output, input, field)) {
               copyProperty(output, input, field);
           } else {
               initializeProperty(output, field);
           }
       }

       return output;
   } catch (Exception e) { 
      ...... 
   }

}

As a friend of mine says, whaddyathink?  Notice, this version looks a lot like the English description of the intent of the code.  I just translated English to Java.   Give me code like this, and I don’t need a step-through debugger.

This is also a good example of the oldest truth in design (or writing, for that matter) – you almost never get it right the first time.

Also, see the point Martin Fowler makes about code that requires comments.  It is at the end of the ‘Bad Smells in Code‘ chapter, of his refactoring book.
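
The last version of prepareOutput leans on three helpers that are never spelled out. Here is one way they might look; treat it as a sketch, not the original implementation. It works on fields rather than getters and setters, drops the unused output argument from isMatchingScalarProperty, and leaves a property at its default value when its type has no no-arg constructor. PropertyMapper, PersonForm, and PersonView are made-up names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collection;
import java.util.List;

final class PropertyMapper {
    private PropertyMapper() {}

    static <I, O> O prepareOutput(Class<O> outputClass, I input) {
        try {
            O output = outputClass.getDeclaredConstructor().newInstance();
            for (Field field : outputClass.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue; // only instance properties take part
                }
                field.setAccessible(true);
                if (isMatchingScalarProperty(input, field)) {
                    copyProperty(output, input, field);
                } else {
                    initializeProperty(output, field);
                }
            }
            return output;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot prepare " + outputClass.getName(), e);
        }
    }

    // A non-collection field that the input also declares, with a compatible type.
    private static boolean isMatchingScalarProperty(Object input, Field field) {
        if (Collection.class.isAssignableFrom(field.getType())) {
            return false;
        }
        try {
            Field source = input.getClass().getDeclaredField(field.getName());
            return field.getType().isAssignableFrom(source.getType());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    private static void copyProperty(Object output, Object input, Field field)
            throws ReflectiveOperationException {
        Field source = input.getClass().getDeclaredField(field.getName());
        source.setAccessible(true);
        field.set(output, source.get(input));
    }

    // Uses the no-arg constructor when there is one; otherwise keeps the default value.
    private static void initializeProperty(Object output, Field field)
            throws ReflectiveOperationException {
        try {
            field.set(output, field.getType().getDeclaredConstructor().newInstance());
        } catch (NoSuchMethodException e) {
            // Primitives and interfaces have no no-arg constructor.
        }
    }
}

// Hypothetical sample types, only here to exercise the mapper.
final class PersonForm { String name = "Ada"; int age = 36; }
final class PersonView { String name; int age; StringBuilder notes; List<String> tags; }
```

Calling PropertyMapper.prepareOutput(PersonView.class, new PersonForm()) copies name and age, gives notes a fresh StringBuilder, and leaves tags null because List is an interface.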

 

What you see is what you get

Bob Martin, in his book, “Clean Code: A handbook of Agile Software Craftsmanship“, quotes Ward Cunningham’s notion of clean code – “You know you are working on clean code when each routine you read turns out to be pretty much what you expected“.

You lose the step-through debugger, you get this.

Haven’t you had to deal with a method that had some simple name like getDriversLicense, but went on to do everything from the groceries to changing your baby’s diaper, and in some obscure corner, almost as an after thought, it retrieved your driver’s license.  If the method, getDriversLicense, did just that and nothing else, you could skip reading the content of that method.

The more you are forced to read code, the more you will write methods that do one small thing, just the thing that the method’s signature suggests.

 

Developer tested

Of course that cleanly written, getDriversLicense method, could have bugs.   How do you increase your confidence in the getDriversLicense method?   As you read the code, you read a call to getDriversLicense, and say, okay, great, I know that works, and move on.  You don’t want to have to also read that method’s definition.

You know the answer.  How do you produce code that folks can implicitly trust?  You test the daylights out of the code that you deliver.   Automated developer tests.
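
As an illustration, here is what a small developer test for such a method could look like. LicenseRegistry and its test are hypothetical, and the checks are plain Java so the sketch stands alone; a real project would more likely use a test framework such as JUnit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical class under test: the method does what its name says, and nothing else.
final class LicenseRegistry {
    private final Map<String, String> licenseByDriver = new HashMap<>();

    void register(String driverId, String licenseNumber) {
        licenseByDriver.put(driverId, licenseNumber);
    }

    Optional<String> getDriversLicense(String driverId) {
        return Optional.ofNullable(licenseByDriver.get(driverId));
    }
}

// Plain-Java developer tests, one small scenario per method.
final class LicenseRegistryTest {
    private LicenseRegistryTest() {}

    static void returnsRegisteredLicense() {
        LicenseRegistry registry = new LicenseRegistry();
        registry.register("driver-1", "PA-12345");
        check(registry.getDriversLicense("driver-1").equals(Optional.of("PA-12345")),
                "registered license is returned");
    }

    static void unknownDriverHasNoLicense() {
        check(!new LicenseRegistry().getDriversLicense("nobody").isPresent(),
                "unknown driver has no license");
    }

    static void runAll() {
        returnsRegisteredLicense();
        unknownDriverHasNoLicense();
    }

    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError("failed: " + what);
        }
    }
}
```

Run LicenseRegistryTest.runAll() on every build, and the next reader of a call to getDriversLicense has a reason to trust it without opening it.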

Lose the debugger, and you will learn to hate the lack of developer testing.

 

Log your way out of trouble

Regardless of how clearly your code is written, there will be times when you will want hard evidence of what the code is doing to the data.   In the absence of a step-through debugger, you will necessarily have to rely on logging. You can understand what your code is doing by logging inputs, outputs, and execution paths, which trace the code’s work.

Any enterprise system worth its salt must have good tactical logging anyway.  Clear and configurable logs are useful for system maintenance, and business monitoring.
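
A minimal sketch of logging inputs, outputs, and the execution path, using the JDK's own java.util.logging. The PremiumCalculator class and its 5% discount rule are made up, purely to give the log something to trace.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PremiumCalculator {
    private static final Logger LOG = Logger.getLogger(PremiumCalculator.class.getName());

    private PremiumCalculator() {}

    // Hypothetical business rule: a monthly premium with an optional 5% paperless discount.
    static long monthlyPremiumCents(long annualPremiumCents, boolean paperlessDiscount) {
        // Input
        LOG.log(Level.FINE, "monthlyPremiumCents in: annual={0}, paperless={1}",
                new Object[] {annualPremiumCents, paperlessDiscount});

        long monthly = annualPremiumCents / 12;
        if (paperlessDiscount) {
            // Execution path
            LOG.fine("path: applying paperless discount");
            monthly = monthly * 95 / 100;
        } else {
            LOG.fine("path: no discount");
        }

        // Output
        LOG.log(Level.FINE, "monthlyPremiumCents out: {0}", monthly);
        return monthly;
    }
}
```

At the default INFO level none of these lines are written. Turn the logger and its handler up to FINE, and the log shows what went in, which branch ran, and what came out.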

Lose the debugger, and you will be forced to nail down your application’s logging.

 

And so,

Do any of these alternatives to step-through debugging sound like a bad thing?  No.  Taken at face value, each of these alternatives, and in fact all of them together, add a lot of value, which the step-through debugger does not.

Think about it another way.

Why do you need the step-through debugger?  9 times out of 10, you need it to negotiate bad code.   If you are starting from scratch, if you do not have to deal with legacy code, stay away from the debugger.  This will force you to learn to write cleaner code.

Reading code makes you feel the pain caused by poor code.  Using a step-through debugger helps you turn a blind eye to poor code.  At its worst, the step-through debugger enables poor code.

 

A benchmark?

Say I am building a new software team, my own outfit even.   I almost think that the missing debugger can separate folks that I want to rely on, from folks that I am sort of forced to rely on.  At the minimum, I want developers that can learn to be productive without the step-through debugger.  If you cannot live without that crutch,  hmm, well, ….. I don’t know.

I haven’t met a physician yet, who likes EMR software

I only have anecdotal evidence, but every physician I talk to, hates the EMR software he has to deal with.

Ye old Enterprise IT

One particularly tech savvy young resident in an area hospital, says, “… too many clicks, too many clicks“. These are folks that live in their iPhones and iPads every spare moment they have.

As he did book-keeping on his laptop, I peeked over his shoulder at the EMR program he was using.  Also, a couple of months ago, I had occasion to spend a few days at Johns Hopkins, helping take care of a family member, and I took every chance to watch the staff at work at their monitors, which now seem to be stashed into every available corner (each patient’s room had a desktop, monitor, keyboard, and mouse).  I was a little taken aback to see that they were all using Windows desktop applications.  Seen through eyes drunk on modern touch-screen mobile devices, these apps look old-fashioned – poor, tired, windows, boxes, lists and buttons, huddled together in an unappealing jumble.

My doctor friend said his EMR was cumbersome to use, and it took too long to enter all the data that he is prompted for.  He seemed to unconsciously separate information that is vital to patient care, from information that is just “for billing“. Often, he enters only the patient care information that he thinks is necessary, and ignores what he called, “fluff“.

The word “fluff” struck a chord.  Design for mobile first.  That forces you to identify the “fluff“, and drop it.

Essentially, what I saw was classic Enterprise IT interfaces.  They serve some bare business purpose, with little thought to ease of use.  Users, doctors, and nurses, in the midst of their high-stress workdays, just deal with it, because, well, they have to.

There were more tell-tale signs of Enterprise IT.

"Yea, they improve things, but everybody hates the changes. You 
manage to learn one thing, and then they make you learn something 
new all over again."
"They never ask the doctors.  They try to keep us away from what 
they are doing.   They build something and show it to us, and it 
is not great, but then it is too late to change anything."

Impenetrable domain knowledge

The doctor is looking at lab results.  He sees that haemoglobin is low.   In that situation he is taught to then look at past iron levels, and vitamin levels, which are results of other tests.   In the interface he showed me, the iron levels, and vitamin levels were hard to find.  The lab results were a long Excel like table, and he had to scroll far and wide to find them.  They were not close to the haemoglobin levels.  The interface was not smart enough to offer the iron levels and vitamin levels when it detects that the haemoglobin is low. The doctor said he sometimes surrenders to fatigue and irritation and simply orders the tests for iron levels and vitamin levels again.   Of course that is duplicated effort for someone, not to mention a wasted expense.

The business analyst who modeled the diagnostic processes obviously did not know how medical personnel are expected to react to low haemoglobin.    The UI designer did not know that a relationship exists between low haemoglobin, and iron, and vitamin levels.   The interface they built does not reflect that knowledge.  The EMR software did not make the physician’s job easier.  In fact, it made the whole process more inefficient, and wasteful.

There must be so many other little use-cases like this.  I imagine the patient care domain is vast, varied, and complex.   A doctor spends years acquiring all that training, and knowledge.   How can you expect a business analyst, or UI designer to absorb all that information?   Even in the best of circumstances, there are so many ways for domain knowledge to get lost in the translation, from business user through business analyst, to system designer, and finally to the developer.  I imagine that this exercise is even more error-prone in a complex and information-heavy field like patient care.

Are we barking up entirely the wrong tree?  Is it a fool’s errand to try to model the patient care domain in order to produce a structured interface that makes the doctor’s job easier?

 

A simple-minded EMR

I wonder if something like this would be a viable EMR system?

A universal key

You must be able to uniquely identify a patient, a human being.   In other words, you need a universal key for their records, which you can apply at any health-care organization they are visiting.  Say, fingerprints.  Would that work?

A simple collection of documents

Medical records themselves are just a collection of documents.  They can be anything at all – plain text, HTML, WORD, EXCEL, PDF, audio, photos, video, etc.  Each provider simply creates whatever records make them happy, in any format at all.

Each document is characterized by very simple, non-medical meta-data.

  • Who created them?
  • When were they created?
  • etc.

The documents are all stored together against that universal key.   You have the patient’s fingerprint, you have her records.

Searchable

You need the ability to search through a patient’s medical records – the collection of heterogeneous documents. You must be able to return results ranked by relevance.

Transferable

You must have the ability to simply transfer the records of a particular person between providers.  And even to the patient herself.   This is a simple transfer, because there is little structure to speak of.   It could be as simple as an email with attachments.
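
To show how little structure this needs, here is a rough sketch of such a store. Everything in it is an assumption: the names, the in-memory map, and especially the ranking, which merely counts how often the search term occurs. It also assumes every document has searchable text; audio, photos, and video would need that text extracted first.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// A document plus simple, non-medical metadata.
final class MedicalDocument {
    final String createdBy;
    final Instant createdAt;
    final String mediaType;
    final String text;

    MedicalDocument(String createdBy, Instant createdAt, String mediaType, String text) {
        this.createdBy = createdBy;
        this.createdAt = createdAt;
        this.mediaType = mediaType;
        this.text = text;
    }
}

final class SimpleEmr {
    // Universal patient key (say, a fingerprint hash) -> that patient's documents.
    private final Map<String, List<MedicalDocument>> byPatient = new HashMap<>();

    void add(String patientKey, MedicalDocument doc) {
        byPatient.computeIfAbsent(patientKey, k -> new ArrayList<>()).add(doc);
    }

    // Transfer is just handing over a copy of the whole collection.
    List<MedicalDocument> export(String patientKey) {
        return List.copyOf(byPatient.getOrDefault(patientKey, List.of()));
    }

    // Naive relevance: documents mentioning the term more often come first.
    List<MedicalDocument> search(String patientKey, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return export(patientKey).stream()
                .filter(d -> occurrences(d.text, needle) > 0)
                .sorted(Comparator.comparingInt(
                        (MedicalDocument d) -> occurrences(d.text, needle)).reversed())
                .collect(Collectors.toList());
    }

    private static int occurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = haystack.indexOf(needle);
        while (from >= 0) {
            count++;
            from = haystack.indexOf(needle, from + needle.length());
        }
        return count;
    }
}
```

A real system would put a proper search engine and durable storage behind the same three operations: add, search, export.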

Start Minimal

That’s it.   There is your minimal, but possibly complete, EMR.  In capability, it probably matches what folks are able to do with paper records, with added sugar due to the fact that the records are native citizens of the digital world.

The system is simple enough that it can be quickly adopted by organizations.  This is what every institution must be able to do, off the blocks.  No complex analysis effort.   No errors that are introduced simply by the act of creating the new system.

And build on it

Once the system is on-line, slowly build on it.

  • Improve identity tracking, if necessary.
  • Improve entry, generation, and visualization of data.   As we saw above, this means applying knowledge that only physicians have.   Business analysis must come directly from the physicians.   Work on one specialty at a time.  Work on one disease at a time.  Or something like that.
  • Improve search.  Which is really ‘data analysis‘ aka ‘analytics‘ aka ‘big data‘.
  • Improve data storage, and data transfer.

And so on.

 

Many Whys

So why are EMR systems not as simple as the one described above?

Are there considerations that I still have not learned about?

Why is there so much structure, which is hard to get right, and much of it un-intuitive to physicians?   Are these related to ‘accountability‘, and ‘billing‘?

I mentioned HL7 standards, and SNOMED taxonomies to a couple of young doctors – a resident, and a fellow.  They had never heard of them.  This is medical knowledge that software engineers are basing EMR software on, and doctors have little knowledge of them?  I was about to sign up for an expensive week-long seminar on HL7.   Is that a waste?   What is going on?

I went to a meet-up of the local Health 2.0 chapter.  Folks spoke with enthusiasm about many things, but EMR systems did not come up at all.

There seem to be a lot of startups doing healthcare related work in Pittsburgh.   However, everybody I met is working on devices, and solutions for use by individuals to get control of their personal health.  No one said anything about making a doctor’s day to day work easier, and more effective.

There is something interesting going on here.   Are enterprise concerns, like EMR, simply un-cool?  Or are they considered a done deal?   Is it too late, and well-nigh impossible, to enter the field now, and improve on matters?

User experience for the techies

What do I think of when considering ‘user experience‘ for the technical folk?   What makes it easier to do their jobs?  What helps them do their jobs better?

Techies include the following sorts of resources.

  • Programmers
  • Testers
  • Dev Ops
  • Program Operators

As always, common factors that I listed in the post, User experience in an enterprise system, apply to these resources as well.

Programmers can see everything

Interestingly, it appears that programmers occupy a special place in the enterprise.   Everything, systems and data, that any corner of the enterprise sees and uses, must be accessible to some programmer or the other.  Who builds these systems?  Programmers.  Who do you call when anything goes wrong?  Programmers.   Nothing can be hidden from the programmers.

This means that the user experience considerations that apply for everybody else in the enterprise, which I have written about in other posts, apply to programmers as well.

One thought occurs to me.   Doesn’t this kind of all-pervasive access raise privacy issues?  Not to mention security concerns?  I wonder how enterprises deal with this.

 

A Domain Specific Language (DSL)

Often, for some reason or the other, a programmer is asked to perform a business task.  The enterprise system typically supplies business users a graphical interface for this work.  Programmers can use that interface too.  However, programmers have technical skills that typical business users may not have. Further, programmers have responsibilities other than performing business tasks, which means they are looking to save as much time as possible.

Hence, if it is possible to provide programmers an alternate interface, which is more powerful, and more performant, even if more technically complex, it would be a good thing.

In some of my previous work, programmers did considerable customer support work. Often they would have to perform some activity that meant wading through pages and pages of UI, in order to make one small change, or press one button.  I heard them lamenting the lack of a more ‘expert’ kind of interface. After all, they arguably had more technical expertise than the rank and file business users.  We could have used a scripting solution. A couple of lines of a well designed DSL (domain specific language), would have put them all in a good mood, not to mention saved a lot of time, and energy.

Say that you have to change the type of roof, on the 3rd barn of the farm that is insured by farm policy, FU-237EKS. After the change, the policy must be assigned to an underwriter for review.   In the UI, the change happens on the 8th screen of the farm policy. The assignment to the underwriter is a further 3 screens down.   Rather than schlep through all that UI, some folks I knew would have liked to execute something like this script.

FU-237EKS.barns[3].roof = shingle
underwriters['nate silver'].reviews += FU-237EKS
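One way to get a script like this without writing a parser is to host it inside a general-purpose language, as an internal DSL. Here is a minimal sketch in Python. The classes, the default roof type, and the in-memory `policies` and `underwriters` lookups are all hypothetical stand-ins for whatever the enterprise system really exposes.

```python
from dataclasses import dataclass, field


@dataclass
class Barn:
    roof: str = "metal"  # assumed default, for illustration only


@dataclass
class FarmPolicy:
    number: str
    barns: dict = field(default_factory=dict)  # barn number -> Barn


@dataclass
class Underwriter:
    name: str
    reviews: list = field(default_factory=list)  # policies awaiting review


# Hypothetical lookups; a real system would load these from its own store.
policies = {"FU-237EKS": FarmPolicy("FU-237EKS", {n: Barn() for n in (1, 2, 3)})}
underwriters = {"nate silver": Underwriter("nate silver")}


def run_support_script():
    """The two-line support task, written against the objects above."""
    policy = policies["FU-237EKS"]
    policy.barns[3].roof = "shingle"
    underwriters["nate silver"].reviews += [policy]
    return policy
```

The body of `run_support_script` is nearly the same two lines as the wished-for script, which is the appeal of an internal DSL: the host language supplies the syntax, and the domain objects supply the vocabulary.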

However, keep in mind that it ought to be possible to design the graphical user interface so that it provides the same kind of power, and efficiency. The recent, industry-wide emphasis on usability is all about this kind of improvement.  The point here is that alternatives to GUIs (graphical user interfaces) exist, which might more naturally fit programmers’ sensibilities.

I also wonder if there might be certain kinds of business functions that are hard to represent in a GUI.  Some complex, but one-off, workflow, which has to be created on the fly, for instance.  This is a question to explore.

 

Some items related to support

  • You must be able to change the logging level that is in effect without stopping the application.
  • You must be able to add muscle to the system, without stopping anything.  Bring more servers on line, add more threads to a running server, and so on.
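As a small illustration of the first item, here is a sketch using Python's standard `logging` module. The logger name `enterprise.app` and the function name are made up for this example. How the function gets called in a live system (an admin endpoint, a signal handler, a watched config file) is left open.

```python
import logging

logger = logging.getLogger("enterprise.app")  # hypothetical logger name
logger.setLevel(logging.INFO)


def set_log_level(level_name: str) -> None:
    """Change the level of the running logger.

    Logger.setLevel accepts a level name and raises ValueError
    if the name is not a known level.
    """
    logger.setLevel(level_name.upper())
```

Calling `set_log_level("debug")` takes effect on the very next log call, with no restart.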

 

Software Configuration Management

Essentially, all of configuration management must be automated.

Check out, and check in of code; build, test, assembly, and release of an application, must be completely scriptable.  You should be able to do all of this at a click of a single button, or the issue of a single command at the command line.
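To make the "single command" idea concrete, here is a minimal sketch of a pipeline runner in Python. The step names follow the list above. The commands themselves are placeholders that only print a message, since the real check out, build, and release tools are not something I can assume here.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, argv) step in order and stop at the first failure."""
    completed = []
    for name, argv in steps:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"step '{name}' failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


# Placeholder commands: each one just prints a message. A real pipeline would
# call the source control, build, test, and release tools here instead.
STEPS = [
    (name, [sys.executable, "-c", f"print('{name}: ok')"])
    for name in ("checkout", "build", "test", "assemble", "release")
]

if __name__ == "__main__":
    run_pipeline(STEPS)
```

The point of the design is that a failed step stops everything after it, so a broken build can never reach the release step.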

In some of my previous work, even though the code base was all in one source control system, we used to have to explicitly issue about 60 check out commands. We never automated the check out. So to create a new workspace, we would do a lot of manual work – 60 clicks of the mouse, 60 commands at the command line.

Releasing a new version of the application was a multi-step, manual process, which would require some Dev Ops person to be up at ungodly hours.

Some releases, especially emergency patches, were horrendously complex. Some poor schnook, sitting in India, used to have to painstakingly undo changes to several parts of the code base, make the release, and then restore all those changes.

Needless to say, this was error-prone. Disasters, big and small, begging to happen.  This should never be the case.

Software configuration management (sometimes referred to as build management, or release management) must never be a burden to the developer.   Folks that specialize in this work (Dev Ops) must hide the complexity by automating it all away.   If you are using tools that do not lend themselves to this kind of automation, well, you are using tools that developers cannot love.   This is infrastructure, which is meant to remove some of the drudgery from a developer’s working life.   So let us have useful, and reliable, infrastructure.

 

Testing

One of the most critical lessons of the Agile philosophy is the recognition that developers must also test.   The Agile world asks, how does a developer know he is done with a task?  The Agile world answers, he proves it with tests.  When all of his tests run successfully, the developer is done with his work.

Each developer must be able to test her work independently of other programmers.    This means that each developer must have separate sandboxes for code, and data.

Developers have their own sandboxes for code by virtue of using a source control system.  However, in my previous work, I often encountered resistance to setting up independent sandboxes for the data.  I never really understood this.  Why couldn't each developer have their own copy of the system database for instance?  Isn't this sort of thing quite inexpensive these days?

We must automate the generation, and load, of test data.

In one of my previous jobs, many tests required testers to create an insurance policy.  These were manually entered, a laborious process, which took significant time.

As you can imagine, testing was not as rigorous as it could have been, because it was just too hard to set up the data required for the test.
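Here is a minimal sketch of what automated generation and load of test data could look like, and of how cheap a private data sandbox can be. It uses Python's built-in `sqlite3` with an in-memory database. The `policy` table, its columns, and the list of states are invented for illustration and are not the schema of any real system.

```python
import random
import sqlite3


def make_sandbox(policy_count, seed=0):
    """Create a private in-memory database and load generated test policies."""
    rng = random.Random(seed)  # seeded, so every run produces the same data
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policy ("
        "number TEXT PRIMARY KEY, state TEXT, premium_cents INTEGER)"
    )
    rows = [
        (f"TEST-{i:05d}", rng.choice(["MO", "KS", "IA"]), rng.randrange(50_000, 500_000))
        for i in range(policy_count)
    ]
    with conn:  # commits the whole load, or rolls it back on error
        conn.executemany("INSERT INTO policy VALUES (?, ?, ?)", rows)
    return conn
```

Each call returns a fresh, independent database, so two developers (or two tests) never step on each other's data. The fixed seed makes a failing test reproducible.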

The system’s user interfaces must support automation.   This is necessary not only for functional testing, but more importantly, for load testing.   You have to be able to simulate many users banging on the UI.   For this to be done with any kind of rigor, you have to be able to drive the UI with scripts.  Keep that in mind when you construct the UI.

Tests must run continuously.  If you have ‘continuous integration’ going, you have this in place.   Continuous integration is a feature of software configuration management.   Every time a change is checked into the source control system, you must automatically kick off a build of the whole system, which, by definition, includes tests.   This allows you to find errors sooner rather than later.   Continuous integration is possible only if all of your software configuration management is automated.

Finally, a replica of the production environment must be available to the developers.   Often, you run into errors that only seem to happen in production.   Give developers an environment that is identical to production, where they can test, and debug problems, without messing with the production environment itself.  Without this, you are asking developers to be brilliant, which seems like a high risk strategy.

 

Techies other than developers

So how about testers, dev ops personnel, and program operators?   These folks perform functions that have been covered above, and in earlier posts.

The section on testing applies to folks that are exclusively black box testers.   The section on configuration management applies to dev ops.   Earlier posts on graceful processes apply to program operators.

 

User experience for managers

There are at least two types of managers in an enterprise, right?

I think of them as ‘business managers’ and ‘systems managers’.

How are they different, in so far as user experience is concerned?

Business Architecture vs. System Architecture

We can approach this question using the same yardstick that I used in an earlier post, ‘User experience for business employees’.  Business managers are expected to know the business, and not necessarily any one enterprise computer system that helps run the business.  A business manager should be able to move between companies that are in the same business, but may use different enterprise systems.   Within the same company, the enterprise systems may change as technologies evolve, but the business might stay largely the same.  Changing enterprise systems ought not to be the business manager’s concern, or her bread and butter.  So who is responsible for the enterprise system, which helps run the business?  Enter the systems manager.

The business manager knows the goals of the business.  She is familiar with the various functions, capabilities and resources that collaborate to achieve the goals of the business.  Does that definition sound vaguely familiar?  It should, because that is the general definition of an architecture.   A business manager is cognizant of the business architecture.   A business architecture is separate from, and independent of what we could call the system architecture – individual computer based systems each with its own capabilities and responsibilities, interacting in well defined ways to implement the business architecture (aka the goals of the business).

For instance, a simplistic insurance company may be organized around these components – sales, underwriting, billing, and claims.   A business manager knows the responsibilities of each of these business components, and how these components collaborate with each other to produce outcomes that the insurance company wants.

However, the billing department might run its business on the backs of four different computer systems – a billing app that manages transactional billing data, a document management system that manages documents coming out of the billing app (bills, delinquency notices, etc.), a high volume print manager, and a messaging system that helps the billing department collaborate with the other business components – claims, and underwriting, and sales.  This is the system architecture that implements the billing component.   The billing manager’s focus stays with the business component as a whole, while some IT manager must know and keep control over the computer systems that help run billing.

 

User experience for a business manager

Truth be told, what a business manager requires will change from business to business.  I have little knowledge of any business, so there is specificity that I am not going to be able to provide.

However, thinking about this at a general architectural level, and applying anecdotal experience gained from working in a few enterprises, I believe we can come up with a list of what a business manager might find useful.

 

See the work flowing through the business architecture

The business manager will need to be able to witness, and evaluate the business that is flowing through the business architecture.   Two sorts of  views will be useful.

  • A snapshot of the state of things at a certain moment.  Now, two hours ago, closing time yesterday, etc. This should allow creation of a real time tracker of the business.  Any hotspots, bottlenecks?
  • Aggregate data.  The business that was done in some duration – all day today, the week so far, in the last 6 months, etc.

Similar questions will need to be answered for parts of the architecture.  Say just one component, like claims in an insurance company.

  • Snapshot.  How many claims does each claims adjuster have outstanding at the moment?  What is each claims rep doing – in the office, out in the field, etc.  
  • Aggregate data.  How many claims were paid, and how many rejected in the last 15 days?   How much money was paid out and by whom yesterday?  etc.
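The two kinds of questions map onto two kinds of queries. Here is a sketch for the claims example, using Python's built-in `sqlite3`. The `claim` table, its columns, the adjuster names, and the sample rows are all invented for illustration.

```python
import sqlite3
from datetime import date, timedelta


def build_claims_db(today):
    """Create a tiny in-memory claims table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claim ("
        "id INTEGER PRIMARY KEY, adjuster TEXT, status TEXT, closed_on TEXT)"
    )
    rows = [
        ("ann", "open", None),
        ("ann", "open", None),
        ("bob", "open", None),
        ("bob", "paid", (today - timedelta(days=3)).isoformat()),
        ("ann", "rejected", (today - timedelta(days=10)).isoformat()),
        ("bob", "paid", (today - timedelta(days=40)).isoformat()),
    ]
    conn.executemany(
        "INSERT INTO claim (adjuster, status, closed_on) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def snapshot_outstanding(conn):
    """Snapshot: how many claims each adjuster has open right now."""
    query = "SELECT adjuster, COUNT(*) FROM claim WHERE status = 'open' GROUP BY adjuster"
    return dict(conn.execute(query))


def aggregate_closed(conn, today, days=15):
    """Aggregate: claims paid and rejected within the last `days` days."""
    since = (today - timedelta(days=days)).isoformat()
    query = (
        "SELECT status, COUNT(*) FROM claim "
        "WHERE status IN ('paid', 'rejected') AND closed_on >= ? GROUP BY status"
    )
    return dict(conn.execute(query, (since,)))
```

The snapshot reads current state only, while the aggregate needs a time window. That difference is worth keeping in mind when deciding what each system has to record.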

 

Presentation

The information described above must be available in two forms.

  • Old fashioned kind – reports, tables, graphs and charts.

 

Alerts

The manager must be able to set up alerts, and notifications, on arbitrary events of interest.   These alerts should be available on devices, and social platforms of the manager’s choice.
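One simple shape for this is a rule that pairs a condition with a delivery channel. The sketch below is in Python. The class names, the event dictionaries, and the idea of a `notify` callback per channel are assumptions made for this example, not a description of any existing product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    matches: Callable[[dict], bool]  # the manager's condition on an event
    notify: Callable[[str], None]    # delivery channel: email, SMS, chat, ...


class AlertCenter:
    def __init__(self):
        self.rules = []

    def subscribe(self, rule):
        self.rules.append(rule)

    def publish(self, event):
        """Send the event to every matching rule; return the names that fired."""
        fired = []
        for rule in self.rules:
            if rule.matches(event):
                rule.notify(f"{rule.name}: {event}")
                fired.append(rule.name)
        return fired
```

Because the condition is just a function, ‘arbitrary events of interest’ stays arbitrary. Because the channel is just a function, a new device or platform means adding one more callback.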

 

User experience for an IT manager

The user experience requirements for the business manager apply to IT managers as well, with one difference.  The IT manager wants information on how the system architecture is performing.

Consider the example of the billing component described earlier.   While the business manager is interested in how the billing component as a whole is performing, the IT manager will want to keep track of how things are going with the four computer systems that run billing – the billing app, the document management system, the print manager, and the messaging system.

  • A realtime visual representation of the work running through the billing workflow.  This must include the number of and the type of various billing transactions, the documents going to the document management system, what the print queues are doing, etc.  This will show me hot spots.
  • How many of the Missouri auto policy billings were finished today?
  • How much did we pay out as agents’ commissions last month?
  • Map delinquencies by region, but I don’t want a spreadsheet.  I want it represented in shades of color on a physical map of the country.
  • And so on.

 

How to get there

As with any user experience problem, you have to start with accurate knowledge of the business.   In any environment, we have to know the business architecture well, in order to satisfy business managers’ requirements.

Reporting requirements for system architecture demand that each computer system in the architecture must expose snapshot, and aggregate descriptions of the work that is going through the system.
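As a sketch of what ‘expose snapshot, and aggregate descriptions’ might mean in code, here is a small common interface in Python, with the print manager from the billing example as one implementation. The interface name, the method names, and the fields are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import Counter
from datetime import datetime


class WorkReporter(ABC):
    """What every system in the architecture would agree to expose."""

    @abstractmethod
    def snapshot(self) -> dict:
        """State of the work at this moment."""

    @abstractmethod
    def aggregate(self, since: datetime) -> dict:
        """Work completed from `since` until now."""


class PrintManager(WorkReporter):
    def __init__(self):
        self.queue = []     # jobs waiting to print
        self.finished = []  # (finished_at, document_type) pairs

    def snapshot(self):
        return {"queued_jobs": len(self.queue)}

    def aggregate(self, since):
        counts = Counter(
            doc_type for finished_at, doc_type in self.finished if finished_at >= since
        )
        return dict(counts)
```

If the billing app, the document management system, and the messaging system offered the same two methods, a dashboard could walk the whole system architecture without knowing the internals of any one system.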

Finally, we are going to have to pick up expertise in data visualization beyond filling spreadsheets.   There seem to be many tools out there for the client, which can be supported in the backend by either the JVM, or node.js.

 

 

 

User experience for business employees

Business employees

Who am I referring to exactly?   After all even the IT developer who builds and maintains the enterprise system is an employee of the business.  In fact, I mean folks that are not IT employees.   I am referring to people whose primary knowledge is the business of the enterprise, and not computer related skills.

For instance, she is a certified underwriter of farm policies.  She is not expected to know SQL.  She is not expected to be able to set up cron jobs, or put together a Lucene query, or even an advanced Google query.  She does not tune the Oracle database where her policies reside.

What does it mean for such employees to have good user experience?

 

User experience

Common characteristics

At the outset, the common characteristics, as described in these posts, apply to business folks too.

Besides these, there is a perspective that applies to business folks specifically, I believe.

No training

Business resources must really only need training, experience, and expertise in the business, and not in whatever enterprise-wide computer system is in place.

For instance, an underwriter should be able to come into a new insurance company and, knowing no more than how to use a keyboard, mouse, and perhaps a touch screen interface, and without formal training, very quickly learn and be productive using the existing enterprise system.

Anything the business resource has to learn, she must be able to learn painlessly, by just using the system.

The interfaces that the business user encounters must be a clear, natural, and seamless representation of her knowledge of the business.   The system should guide the user down paths that are instantly familiar, and obviously correct.

The enterprise system must leave little room for mistakes.   Even when the mistakes happen, they must be caught early, and there must be little or no cost.  This is essential to facilitate experimentation, and self-learning.

Transparent

You know that the user experience is good when the enterprise system recedes into the background.

The computer system must not register in the user’s mind as an obstacle, as a challenge, or as anything at all that is above and beyond her knowledge of the business.

Granted, regardless of how convoluted a system is, once a user learns it, the system will recede into the background.  That is almost unfortunate, because that is how a lot of clunky systems come to be.

However, say you have to make a change to the system.  Or say you want to replace the system.   How much resistance do you encounter?  If the business complains about having to learn a whole new system all over again, your user experience is suspect.

To put it another way, your system’s interfaces must not add any cognitive burden, beyond that which the business expertise itself requires.

How to get there

The fundamental business

Good solution design begins with effective business analysis.  Before diving into solutions, business analysis must first describe the essential business problem.  Volere, a Requirements and Business Analysis consultancy, has a good definition of such analysis, which it calls ‘systemic thinking’.  To paraphrase Volere, you want an understanding of the essence of the business, without being prejudiced by any solutions, whether digital, or the old-fashioned kinds.

In a lot of my past work, the product of business analysis included someone's notion of a solution too, typically a user interface designed by folks with knowledge of the business, and good intentions, but not much else.

Folks with the expertise (in user experience design) to translate the essential business rules, processes, and outcomes, into a transparent computer solution, never got a chance to understand the business at all.  Result, more often than not, avoidable errors, unnecessary iterations, and ultimately, an interface, that was not useless, but that was less optimal than it had to be.

These were common refrains - "They will get used to it", "This is a documentation issue", "This is a training issue", etc.  All telltale signs of user experience that has room for improvement.

Human interaction design and construction

There are folks with the expertise to take the description of the essence of the business that systemic business analysis produces, and design a system with the characteristics described above.

Much of this expertise has been codified, as guidelines, patterns, and frameworks, which a competent generalist can learn as necessary.

Here is a list of resources that must serve as our guides.

Further, design is an iterative process, which will include the following sort of cycle.

  • The human interaction designer comes up with a design.
  • The design is implemented as some kind of prototype.
  • Users use and evaluate the prototype.
  • Tweak, enhance, start over, until everyone arrives at a satisfactory destination.

As this suggests, besides the design expertise, you have to be able to repeatedly construct, deploy, review, and change these solutions quickly.

Construction skills include the following.

  • Create plain wireframes with a tool like Balsamiq.
  • Create color- and image-laden HTML and CSS mockups with tools like Dreamweaver, and Photoshop, etc.
  • Create live prototypes with RAD frameworks like Ruby on Rails, or Django (Python), or Grails, or Play with Scala etc.  In particular, my personal interest is in the Java eco-system (Grails, Play), and a pure JavaScript solution (for instance, Bootstrap, and Backbone.js at the client, and node.js in the backend).

You have to have the infrastructure and the skills for continuous integration, and continuous release.

Part of the review capabilities must include the ability to run usability tests.

Finally, there will be times when interfaces will have to change deep into the construction of the system.  Your engineering must be such that the interface can change quickly, without adversely affecting the backend.  You never want to say to the client, “It is too late to make that UI change.  You should have told us this earlier.”

As a generalist, what must I know regarding ‘transactions’

I believe a generalist, or a team of generalists, must offer these skills, related to the implementation of ‘business transactions’.

The failure of a business transaction must still leave the system in a safe, and valid state. As a ‘generalist’ I must either know how to achieve that goal, or know where to quickly find a solution.

Platforms

As always, I want to be able to solve this problem in two platforms – the Java eco-system, and Node.js.

 

Short-lived business transactions

Most of us are familiar with ACID transaction support in relational databases.  Relying on this support is only recommended for short-lived business transactions.

Database transactions are commonly implemented by locking the rows or tables that one transaction touches, which forces other transactions that need the same data to wait till the locks are released. This necessarily slows the system down, among other complexities. Hence the recommendation that ACID transactions be very short-lived.

Here are some examples of short-lived business transactions.

  • Change the address of a customer. Typically you have already gathered the new address, and you simply have to update a few tables with the new data.
  • Apply a payment against a policy. Again, the whole transaction is typically an update to a few tables.

In the Java eco-system, this sort of short-lived transaction, when applied against a single database, is implemented with the JDBC API. We must be able to do that, using Java, Scala, and Groovy.

I must be able to talk to relational databases, and manage ACID transactions against a single database, using node.js.
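To show the shape of a short-lived transaction, here is a sketch of the ‘apply a payment’ example. It is written in Python with the built-in `sqlite3` module, purely as a compact stand-in. In JDBC the same commit-or-roll-back shape is built from `setAutoCommit(false)`, `commit()`, and `rollback()` on the connection. The tables, columns, and starting balance are invented for illustration.

```python
import sqlite3


def open_billing_db():
    """Create a tiny in-memory billing database with one made-up policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policy ("
        "number TEXT PRIMARY KEY, "
        "balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
    )
    conn.execute(
        "CREATE TABLE payment ("
        "id INTEGER PRIMARY KEY, policy_number TEXT, amount_cents INTEGER)"
    )
    conn.execute("INSERT INTO policy VALUES ('FU-237EKS', 10000)")
    conn.commit()
    return conn


def apply_payment(conn, policy_number, amount_cents):
    """Record the payment and reduce the balance as one atomic unit."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO payment (policy_number, amount_cents) VALUES (?, ?)",
            (policy_number, amount_cents),
        )
        conn.execute(
            "UPDATE policy SET balance_cents = balance_cents - ? WHERE number = ?",
            (amount_cents, policy_number),
        )
```

If the payment exceeds the balance, the `CHECK` constraint fails on the second statement, and the payment row inserted by the first statement is rolled back with it. Either both tables change, or neither does.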

 

Long-lived business transactions

However, often, there are business transactions that are long-lived, and must behave gracefully.

Here is one example of a long-lived transaction – migrating old insurance policies from one system to another.

This is historical data, often several years’ worth, and can be voluminous. It contains many different parts, like contacts, coverages, changes made to the policy over the years, documents, etc. Accepting all of the policy into a new system can take a significant time. We used to run into policies that took 20 seconds to process completely. Things will go wrong, and when they do, you would like to cleanly roll back all of the incoming data. However, you cannot keep an ACID database transaction open for 20 seconds, or 10, or even 5. That locks up database tables, which in turn will severely diminish your ability to handle load.

Here is another example – business workflows that extend over several days.

They are initiated, passed around to several folks, and then eventually completed. If for some reason this business process ends in some kind of rejection, or failure, you may want to discard, or perhaps archive data that this workflow created.

Consider that new accident information is received for an automobile policy. Maybe some documents are uploaded. Some premium adjustments are made. Some bills are generated. Underwriting, and billing managers review, and sign off. This process may take several days to finish. Say after doing a lot of work, you discover all this work was done on the wrong policy – perhaps an ex-husband’s. How do you roll back this work, and data that has been accumulating over several days? Surely not with a database transaction.

So how do we ensure that such operations are well behaved? If necessary, how do we ensure that these long-lived business transactions exhibit ACID properties?

As a ‘generalist’ I must know standard approaches, and solutions to this problem. Further, I must be able to implement these solutions in the Java eco-system, and in node.js.
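One approach I am aware of for the migration example is to tag everything a long-running job writes with a batch id, keep it in a ‘pending’ state, and write each part in its own short database transaction. The sketch below shows this in Python with `sqlite3`. The `policy_part` table, the status values, and the function names are assumptions made for illustration, and this is only one of several possible answers to the question above.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE policy_part (batch_id TEXT, kind TEXT, payload TEXT, status TEXT)"
    )
    return conn


def migrate_policy(conn, batch_id, parts):
    """Load parts in short transactions; activate all of them or discard all."""
    try:
        for kind, payload in parts:
            if payload is None:  # stand-in for any validation or load failure
                raise ValueError(f"missing payload for {kind}")
            with conn:  # one short transaction per part
                conn.execute(
                    "INSERT INTO policy_part VALUES (?, ?, ?, 'pending')",
                    (batch_id, kind, payload),
                )
    except Exception:
        with conn:  # undo: remove everything this batch wrote
            conn.execute("DELETE FROM policy_part WHERE batch_id = ?", (batch_id,))
        return False
    with conn:  # final short transaction makes the whole batch visible
        conn.execute(
            "UPDATE policy_part SET status = 'active' WHERE batch_id = ?", (batch_id,)
        )
    return True


def visible_parts(conn):
    """What the rest of the system is supposed to read: active rows only."""
    query = "SELECT batch_id, kind FROM policy_part WHERE status = 'active' ORDER BY rowid"
    return conn.execute(query).fetchall()
```

No database transaction stays open longer than a single insert, yet readers that only look at active rows never see half a policy. The cost is that every reader has to honor the status column, so this gives all-or-nothing visibility rather than the full isolation of a database transaction.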

 

Single database vs. multiple

In a large, heterogeneous environment, you are often working with several databases. Perhaps all documents are in some legacy SQL Server DB, and day to day transactional data are in a fast MySQL DB.

How do you implement ACID when a business transaction, even a short-lived one, works with data that is distributed across more than one database?

In the Java world, there is the JTA API, which supports so-called ‘distributed transactions’. But very few people seem to use this. As a ‘generalist’ I must know what the alternative is.

Similarly, I must know how this problem can be handled in node.js.

 

Polyglot persistence

This is just a special case of the multiple database scenario. The data repository can be anything at all – relational DB, NoSQL DB, text based index, messaging end point, flat disk file, etc.

How do you implement ACID when the business transaction works with many different kinds of data repositories?

For instance, say you are recording an auto accident. Photos of the car might go into a NoSQL database, like MongoDB. A description of the accident is saved to Oracle, and this information is also parsed and pushed into Lucene. Finally, a notification is dropped into a queue that a claims adjuster is watching. If this transaction dies for some reason, you want to roll back the changes you made to each of these very disparate data repositories.
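When no single transaction manager spans all the repositories, one commonly described technique is compensating actions, often called the saga pattern. Each step is paired with an undo, and on failure the completed steps are undone in reverse order. Here is a minimal sketch in Python. The dictionaries and the list are in-memory stand-ins for MongoDB, Oracle, Lucene, and the queue, and the `queue_is_up` flag exists only to simulate a failure.

```python
def run_with_compensation(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse."""
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        return False
    return True


# In-memory stand-ins for the four repositories in the example.
photo_store, description_table, search_index, adjuster_queue = {}, {}, {}, []


def record_accident(claim_id, photo, description, queue_is_up=True):
    def notify():
        if not queue_is_up:  # simulated outage
            raise ConnectionError("queue unavailable")
        adjuster_queue.append(claim_id)

    steps = [
        (lambda: photo_store.update({claim_id: photo}),
         lambda: photo_store.pop(claim_id, None)),
        (lambda: description_table.update({claim_id: description}),
         lambda: description_table.pop(claim_id, None)),
        (lambda: search_index.update({claim_id: description.lower().split()}),
         lambda: search_index.pop(claim_id, None)),
        (notify, lambda: adjuster_queue.remove(claim_id)),
    ]
    return run_with_compensation(steps)
```

This is weaker than a true ACID transaction. Other readers can see the intermediate state before it is undone, and an undo can itself fail, so real implementations usually also keep a durable log of which steps have completed.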