performance | S E on the Mon

This interview question makes me laugh: sort the common sorting algorithms based on Big O analysis. Big O notation for sorting algorithms, indeed!

Short answer: I have been a generalist application programmer for going on 20 years. In all those years, I have never had to look up a sorting algorithm’s Big O score.

If I were a systems programmer working on a compiler or a DBMS, I presume I would use that knowledge. Are you producing consumer software that millions of users access at the same time? Are you processing tons of data, as in social media? Then you may need knowledge of sorting algorithms. Do you build or maintain in-house business software that 300 CSRs use on a good day, at an insurance company that is only marginally more agile than the DMV in Zootopia? Then Big O notation is a faint echo.

My long answer: I can tell you how I approach performance and load-handling issues, if you wish. There is much I can do before I am forced to dredge my college memories for Big O balderdash.

At Business Analysis, and Solution Design

Business says, build this. Or business says, we have a problem, do something. Parse the business’s lament, and you’ll find two categories of information: the essential business requirement, and a solution that satisfies the requirement. Further, we can view the solution from two perspectives: technology-agnostic and technology-centric.

Nailing down the work in the above manner will reveal performance problems. By the same token, skipping this analysis can make you miss performance problems.

There was once a web app. The user would push a button, which initiated an action on a bucket of 100K documents. More often than not, the action took too long to complete, and the web page would time out.
Much unhappiness ensued. Smart people tweaked the application code, and the database. It was not enough.
Finally, someone asked the right question. Shouldn’t long-lived actions be asynchronous? The user should not have to wait at that web page till we complete the work. The system should tell the user, thanks for the work, you can go now, we’ll let you know when we finish.

Big O analysis did not raise that question.

Do you know what did? Knowing what business purpose we were trying to serve, so we could find an alternative way to get there. Courage to point out the error in system design to the senior folk who allowed the error in the first place. You could also call it caring enough about the work to speak up. Earning the trust of the business folks and your peers in tech, so they are willing to lend you their ear. Finally, some technical skills in the User Experience and Asynchronous Processing areas.
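The "thanks, you can go now" idea from the story above can be sketched in a few lines. This is a minimal illustration and not the system we actually built: `JobService`, `archive_all` and the thread pool are all hypothetical stand-ins. The point is only that submitting the work returns at once with a job id, and the caller checks back later.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobService:
    """Hypothetical service: accepts long-running work and returns immediately."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, action, documents):
        # Hand the work to a background thread; the caller does not wait.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(action, documents)
        return job_id

    def status(self, job_id):
        future = self._jobs[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def result(self, job_id, timeout=None):
        # Blocks until the job finishes, or raises after `timeout` seconds.
        return self._jobs[job_id].result(timeout=timeout)


def archive_all(documents):
    # Stand-in for the slow action on the bucket of documents.
    return sum(1 for _ in documents)
```

In a web app, the button handler would call `submit` and return the job id right away. The page would then poll `status`, or the system would notify the user when the job is done.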

Loose Coupling or Bust

Next, insofar as there is room for it, ensure that the parts that comprise your solution are loosely coupled. In plain English, I should be able to tinker with one part without affecting any other part. Each part must know as little as possible about the other parts. Make this happen at both large-grained and fine-grained levels.

Dave Thomas, Pragmatic Programmer, on loose coupling.

At the fine-grained level, this typically puts me in Bob Martin’s Clean Code territory.

At the large-grained level (aka ‘architecture’), I want to make architecture disappear. I am referring to Grady Booch’s definition of architecture.

Grady Booch's definition of Software Architecture

To paraphrase, architecture is those choices that are expensive to change. All our choices must be inexpensive to change.

Construct, Measure, Improve

Next, construct the solution.

Now measure performance. Throw load at the system. Locate the bottlenecks. Remove the bottlenecks.

If we got the design, and construction right, it should be easy to locate the bottlenecks. Furthermore, we can remove the inefficiency with minimal impact to the rest of the system. Remember, the parts of our solution are loosely coupled.

Improving Performance, Without the Big O

Look at I/O

About performance itself, I know one broad truth. The delays caused by disk-based I/O and network latency overwhelm anything you might do in memory.

Here are three examples from my recent experience.

Several activities in our QA environment were crawling. We noticed that the log files were huge. We were logging at a very fine level in QA. We raised the logging threshold to a coarser level, which wrote far fewer log lines to disk. QA sped up to acceptable levels.

One activity was slower than acceptable in production. We discovered that we were making close to 1500 trips to the DB. We consolidated the queries into 3 trips to the DB. We were good to go.
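To make the round-trip point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `policy` table, its columns and the batch size of 500 are my assumptions for illustration; the real fix was against a different DB and schema. The first function makes one trip per id. The second fetches the same rows in a handful of `IN (...)` queries.

```python
import sqlite3


def make_db(n):
    # Hypothetical table, filled with n rows for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policy (id INTEGER PRIMARY KEY, premium INTEGER)")
    conn.executemany(
        "INSERT INTO policy VALUES (?, ?)", [(i, i * 10) for i in range(n)]
    )
    return conn


def premiums_one_by_one(conn, ids):
    # One query, and so one trip to the DB, per id.
    out = {}
    for pid in ids:
        row = conn.execute(
            "SELECT premium FROM policy WHERE id = ?", (pid,)
        ).fetchone()
        out[pid] = row[0]
    return out


def premiums_batched(conn, ids, batch_size=500):
    # A few trips: each query fetches up to batch_size rows with IN (...).
    out = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        marks = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT id, premium FROM policy WHERE id IN ({marks})", batch
        )
        out.update(rows.fetchall())
    return out
```

With 1500 ids and a batch size of 500, `premiums_batched` issues 3 queries where `premiums_one_by_one` issues 1500, and both return the same mapping.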

Last, we have a batch framework. It breaks a large task into small chunks, and puts the chunks on a queue. Listeners pull the chunks off the queue, and work on them in their own time. Some tasks took forever to complete, or even seized up. We found that the queue choked when it received a lot of messages very fast. Once, 750K records broken into 7500 chunks of 100 records each ran for 15 hours. We increased the chunk size to 20,000 records, which put about 38 chunks on the queue. This worked like a charm.
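The chunk arithmetic in that last example is easy to check. The helpers below are hypothetical and are not our batch framework's API; they only show how the chunk size drives the number of messages that land on the queue.

```python
import math


def chunk_count(total_records, chunk_size):
    # Number of messages the queue receives for one task.
    return math.ceil(total_records / chunk_size)


def chunks(total_records, chunk_size):
    # Yield (start, end) record offsets for each chunk, end exclusive.
    for start in range(0, total_records, chunk_size):
        yield start, min(start + chunk_size, total_records)


print(chunk_count(750_000, 100))     # 7500
print(chunk_count(750_000, 20_000))  # 38
```

The last of the 38 chunks is a half chunk of 10,000 records, which is why I said "about 38".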

In each of the above cases, I did not need Big O analysis. Someone built the Logging Framework that we used. And the DB. And the Messaging Platform. That person needed the Big O.

Learn the machine

At some point, we will need to fine-tune a machine: a DB, a JVM, a CLR, a Messaging Server. For this you need in-depth, specialized knowledge of the particular machine. I don’t have that. Remember, I am a generalist. I know what questions to ask. I can dig up answers. And then forget, because I don’t use that knowledge every day.

More to this than we can see

Last, at this level, there are often a lot of variables, which interact in ways that are not always easy to predict. For instance, there are tons of JVM parameters and DB switches. You do what you need to do to get past the current problem. Then you watch how things go, till you have to intervene again.

This is what I know about performance.

See, I don’t remember enough Big O analysis to save my life. Yet, life goes on.

I believe a generalist, or a team of generalists, must offer the following skills related to the performance of software systems.

We are either going to be working with new software, which we write, or we will be working on a preexisting, complex and heterogeneous system.

A new system

On a new system, I am going to need the following skills.

Create a flexible profiling infrastructure.
- Be able to log the turnaround time on relevant functionality, and at relevant layers of the system.
- Be able to turn these logs on and off, increase and decrease the granularity of these logs, etc.
- Create log output that is easy to analyze. Bottlenecks must be easy to find.
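As a rough sketch of such a profiling hook, here is a timing helper built on Python's standard `logging` module, standing in for whatever logging framework the system really uses. The names `timed` and the `profile` logger are mine. Log levels act as the granularity switch, and the `elapsed_ms=` format is one guess at output that is easy to grep and sort.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; its level is the on/off and granularity switch.
profile_log = logging.getLogger("profile")


@contextmanager
def timed(label, level=logging.INFO):
    # Skip the clock entirely when this granularity is switched off.
    if not profile_log.isEnabledFor(level):
        yield
        return
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        profile_log.log(level, "%s elapsed_ms=%.1f", label, elapsed_ms)
```

Wrap a service call in `with timed("service.quote"):` and a data access call in `with timed("dao.load", logging.DEBUG):`. Setting the `profile` logger to INFO then reports only the coarse timings, and setting it to DEBUG at run-time adds the fine-grained ones.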

Architect the system such that bottlenecks can be replaced without affecting any other part of the system.
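Here is one way that might look at the fine-grained level. Everything in this sketch is hypothetical: `RateSource`, the two implementations, and `quote`. The caller depends only on a narrow interface, so a slow part can be swapped for a faster one without touching the caller.

```python
from typing import Protocol


class RateSource(Protocol):
    def rate_for(self, region: str) -> float: ...


class SlowRateSource:
    """Imagine each lookup here is a slow remote call."""

    def __init__(self):
        self.calls = 0

    def rate_for(self, region):
        self.calls += 1
        return {"north": 1.2, "south": 0.9}[region]


class CachedRateSource:
    """Drop-in replacement: remembers rates it has already fetched."""

    def __init__(self, inner: RateSource):
        self._inner = inner
        self._cache = {}

    def rate_for(self, region):
        if region not in self._cache:
            self._cache[region] = self._inner.rate_for(region)
        return self._cache[region]


def quote(base_premium, region, rates: RateSource):
    # The caller knows nothing about how rates are obtained.
    return round(base_premium * rates.rate_for(region), 2)
```

If measurement shows the rate lookup is the bottleneck, passing a `CachedRateSource` to `quote` removes it, and `quote` itself does not change.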

Load test the system. Throw large amounts of data and users at the system, in order to expose its fault lines.

An existing system

More often than not, you will have to deal with preexisting systems which you cannot instrument. Hence performance measurement will essentially have to be a black-box affair. In such situations, these skills are needed.

Load test the system from whatever interface is exposed to you.
- Throw users, and data at the available GUI.
- Throw users and data at available APIs.
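A black-box load test against an API can start very small. The sketch below is not a real load testing tool and the names are mine. `call` is any zero-argument function that hits the exposed interface once, and each simulated user is a thread that calls it repeatedly while we record turnaround times.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def load_test(call, users, requests_per_user):
    """Run `call` from `users` concurrent threads; return sorted latencies in seconds."""

    def one_user(user_index):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            call()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return sorted(t for user in per_user for t in user)


def percentile(sorted_values, pct):
    # Nearest-rank percentile on an already sorted list.
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]
```

Raise `users` step by step and watch where something like `percentile(latencies, 95)` starts to climb. That is usually where the fault lines are.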

Specialized knowledge on profiling the various parts of the existing system will help. For instance, say you have a PHP app running in Apache. Knowing how to manage Apache profiling will help. But you cannot anticipate what tools a client will use. So this expertise is something we have to be able to acquire quickly, as needed, and on the job.

What platforms and tools exactly?

My interest is in two platforms – the Java eco-system, and node.js. Much of my programming experience has been in the Java world, and I am a newbie to node.js. In any event, I believe the above requirements translate into the following concrete skills.

Implement profiling (logging by another name) which can be configured at run-time.
- In a system that runs on the JVM, which includes these languages – Java, Scala, Groovy
  - Through common logging frameworks – log4j, JDK logging, SLF4J
  - Through aspects – AspectJ
  - Through meta-programming – Groovy, Scala
- In node.js

Break the system into modules whose boundaries are milestones that are relevant to performance. Implement these modules such that we can look into and change the performance on any one of them, without affecting anything else.
- For instance, presentation, service, data access layers.
- Or web service access to an external site, a computation performed in a rule engine, a search query against a text-based index, a MapReduce query against a NoSQL db, etc.

Get turnaround times out of at least the commonly used tools listed below. Clients may use other tools, so this is often going to be something you figure out on the job. Also, getting profiling information is only one problem. Actually administering these tools to improve performance is often a complex and specialized job. It is almost impossible for a generalist to master all these tools. At most, when the need arises, we need to know how and where to look for a solution.

Web / App Servers
- Apache HTTP server
- JBoss Application Server

At the client
- Rendering engines, and JavaScript engines on web browsers (Chrome, Safari, Firefox, IE, Opera)
- A mobile app – Android, and iOS

Data repositories
- Oracle
- MySQL
- Lucene

Others
- jBPM (Business process management)
- JBoss Rules (Rule engine)
- CXF, Axis (Web service frameworks)

Load test, with both data and users. There are plain old test automation tools, and there are specialized load and performance testing tools. In fact, I do not know exactly what is adequate for this purpose. I have used simple JUnit-based tools for this in the past.
- By driving a web UI that is in some browser.
- Through a mobile app – Android, and iOS
- By driving an API (JVM language, or node.js) directly, without going through a UI.

Now what?

On second glance, this is no trivial list of skills. If nothing else, there should be little room for boredom.

Interview Question – Sort the common sorting algorithms based on Big O analysis