The number we had asked for was right there on the screen, slightly left of center, in a 32-point font. Clearly labeled, impossible to miss. In the dark of the observation room, we watched the eye tracker’s gaze indicator flicker over it. The test participant even clicked on it. The number, which would have been the solution to the test task, disappeared from the screen, and another screen came up. A full 30% of test participants failed the task, which was simply to report the number to the test moderator.

No one had expected that.

Whenever I hear someone say, “We don’t need to test that,” I think of this episode. I’ve been in the usability business for over 25 years now, and there has not been a single usability test in which I didn’t experience a major surprise. Surprises are the very reason we test in the first place. If we knew the results, we wouldn’t have to test.

But wait, that’s what experts are for, right? Knowing solutions to problems, and predicting what to expect. How can you claim to be an expert and at the same time declare yourself unable to predict usability testing results?

Experts Aren’t Perfect. Neither Are Tests.

Well, this is less of a contradiction than you might think. In his now-famous CUE studies, Rolf Molich conducted a number of experiments on how “good” usability test labs and experts actually are. In one study, he compared the effectiveness of expert reviews with that of lab usability tests. Quite unexpectedly, usability tests and thorough expert inspections yielded about the same number and quality of results. However, both experts and lab tests failed to capture many of the actual usability issues in the test applications: 60% of the issues reported overall were unique, each showing up in one team’s report only. The other teams missed them. Nevertheless, they were all valid issues.

Can we ever actually know all the usability problems there are in an application? I doubt we can. There are statistical methods, some quite sophisticated, to estimate the overall number of problems in a UI from those observed in a test or a number of inspections. With those methods, you can also estimate the number of test participants you need to invite in order to achieve a certain test coverage. The resulting numbers are surprisingly high, and they miss the point.
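To make that estimation idea concrete, here is a minimal sketch of the commonly cited problem-discovery model (in the spirit of Nielsen and Landauer). The per-participant detection probability p, the often-quoted average of 0.31, and the helper functions are illustrative assumptions, not a description of the statistical methods mentioned above.

```python
# Minimal sketch of a problem-discovery model: each participant is assumed
# to reveal each existing problem with independent probability p.
# p = 0.31 is a commonly quoted average, but it varies widely by study.
import math

def expected_coverage(n_participants: int, p: float = 0.31) -> float:
    """Expected share of existing problems found by n participants."""
    return 1 - (1 - p) ** n_participants

def participants_needed(target_coverage: float, p: float = 0.31) -> int:
    """Smallest n whose expected coverage reaches the target."""
    return math.ceil(math.log(1 - target_coverage) / math.log(1 - p))

if __name__ == "__main__":
    # With p = 0.31, five participants find roughly 84% of problems on average...
    print(f"5 participants: {expected_coverage(5):.0%} expected coverage")
    # ...but pushing the target toward 95% or 99% drives the numbers up quickly,
    # which is the "surprisingly high" effect mentioned above.
    for target in (0.85, 0.95, 0.99):
        print(f"{target:.0%} coverage needs ~{participants_needed(target)} participants")
```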

Two points, actually.

Great User Experiences Come From Multiple Perspectives

The absence of usability issues doesn’t guarantee a successful product. A good product idea can bear quite a number of minor usability issues – we’ve all used four-letter words about products that nonetheless somehow made it into the market and into our possession, and that we would never give away. A good user experience is more than the mere absence of usability issues. A usability test, however, is a perfect opportunity to shed light on far more than the narrow scope of your test tasks. Good user experience comes from experiencing users – as directly as possible. This is why a good formative test protocol includes open-ended questions and probes into unexpected events and things participants say. This is why a good testing campaign should be part of a design iteration process rather than an attempt to get big numbers out of one big test. Every perspective added – another expert, another test participant, another design variant – adds to the overall understanding of what your product actually should be. The biggest gain in information always comes from fresh perspectives. Experts who test aren’t ignorant, au contraire: they know what they don’t know.

Great Teams Learn From Users, Not “Experts”

Point two is that usability tests don’t happen in a vacuum. They are part of a development project, a procurement and implementation project, or a broader corporate strategy to improve a product’s user experience. The ultimate goal of any testing activity is to close the feedback loop between development outcomes and development activities. Take away the feedback, and product development becomes a flight in the dark without instruments. You may have someone on board who knows the way, but how do you know you can trust them? How can you tell an expert from someone merely claiming expertise?

You may have heard about the Dunning-Kruger effect. David Dunning and Justin Kruger of Cornell University ran a number of psychological experiments demonstrating that a person’s belief in their own expertise doesn’t necessarily match their actual expertise. In fact, the two may even be inversely related. Bluntly speaking: those most fervently claiming expertise are quite likely not to be actual experts.

But what if what they’re saying is so logical, and makes so much sense? Well, logic can be extremely misleading in predicting usability. One of the most stunning experiences with usability test participants is how logical their actions are even when they totally screw up. Users rarely act illogically; they merely follow a logic that is quite likely different from a developer’s. Want an example? Just watch in your next couple of meetings how many speakers point the video projector’s remote control at the screen instead of at the projector (I’ve seen PhD-level physicists do this). The logic they follow is clear: they point the control where they want it to take effect – and that’s the screen, after all.

Experiencing user logic first-hand has a tremendous effect on development teams. Often, hours of design discussions become obsolete within seconds, and totally unexpected questions pop up. Team dynamics can change dramatically: all of a sudden, the most boisterous “usability experts” become very quiet. So far, virtually every team I’ve worked with has been enlightened and inspired by what they saw through the one-way mirror. In terms of UX evangelizing, one hour of observing actual users is worth weeks of classroom usability training. The important learning goal is not “usability expertise.” It’s the simple fact that you are not the user.

The Greek philosopher Socrates once said, “I know that I know nothing.” Over 2,000 years later, he’s still a recognized expert in his field. Beat that.
