Why Are User Journeys So Difficult to Test?

December 12, 2022

Part Two of the Humanizing Software Quality Series from Intellyx, for Apica

In our previous post, we established user journeys as the essential goal of software testing efforts.

If you were stranded on a desert island and could bring only one service level objective for a successful application, you would keep tracking user journeys and leave the rest of the metrics behind.

If that is true, what is stopping organizations from truly testing user journeys?

Variability and complexity

User journeys encompass every sequence of actions users take within an application. Any application with a high degree of interactivity and utility – meaning it’s not a simple single-path or low-value app like a multiple-choice survey – can have an astronomical number of complex and variable user journeys available to test.

But let’s not just concede failure to complexity again. Let’s move past it.

Throughout history, whenever a particularly difficult task has been put in front of humans, we have looked to achieve a state of mastery and flow to get around it. Ancient warriors could even practice and learn something as difficult as horseback archery.

Hitting a moving target from a moving platform requires a Zen-like state of detachment from the archer’s thought and emotion, where the state of the horse, hands, bow, arrows and target are all combined into a flowing action that can only be mastered in the moment.

This state of mastery and flow is definitely relevant to how we build and test modern applications, which involve interconnected movements of front-end design processes, distributed services, and back-end data repositories, all orchestrated to meet customer requirements at scale.

Putting the user at the center instead of the software

Most software delivery teams are already adopting automated tools and DevOps methodologies for shifting unit, functional, regression and performance testing left as far as they can in the software pipeline.

High-performing DevOps teams can expect to accelerate software delivery from quarterly or monthly releases to several daily deployments on elastic cloud architectures in many cases, with far lower deployment failure rates, according to industry surveys.

Continuously deploying new releases at 100X or 1000X speed is a DevOps state of flow all organizations strive for. However, even with automated quality checks, agility entails a new set of risks due to variability in real user scenarios.

To stay ahead of the rate of change in our extended application estate, and aim toward where the quality target will be, we must master monitoring, testing, and understanding human-centric, dynamic user journeys.

Getting a grip on user intentions

Users are unpredictable, and you never know what they might try to do with an application. They show up in droves at odd times, they don’t read the instructions or labels, they skip steps, they enter the wrong information in the wrong place, they abandon carts, they complain constantly… what the hell do they want?

An oft-overlooked aspect of the DevOps movement is empathy – but it’s not just empathy among team members, it’s about having empathy for users as well. You never know exactly what people are going through in their lives when they are using your application, so there’s real value in understanding their motives.

For marketers, the one-question survey “Would you recommend this to a friend?” produces the net promoter score. NPS has long been the golden metric of customer satisfaction, and while it’s still useful from a 10,000-meter view of overall success, NPS gives you very little idea of user intentions or of the scenarios played out by live customers.

Understanding customers starts with active user monitoring of on-screen actions and session data. You can almost think of it as software documentation in reverse. Rather than providing written instructions for users, the users instruct and inform design, testing, and refinement of the entire software offering by contributing their live usage data, which is refined into tests.

Causing the cause is difficult

There’s a mature category of software tools for real user monitoring (or RUM) that captures production data from users, but this only scratches the surface of recognizing the deeper patterns that emerge in that data over the course of thousands of sessions.

Applied intelligence (a specialized form of AI) is required to correlate so many user actions and sessions into codified functional and load testing scenarios that account for variability, since there is usually more than one way each user can act on their intentions.
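To make the idea of codifying sessions into scenarios a little more concrete, here is a minimal sketch in Python. It is not how Apica or any RUM product does it: it simply counts identical action sequences and ranks them by share of traffic, where a real system would apply far richer correlation. The function name `derive_scenarios`, the action names, and the sample sessions are all hypothetical.

```python
from collections import Counter

def derive_scenarios(sessions, min_share=0.0):
    """Group identical action sequences into (path, share) pairs, most common first.

    `sessions` is a list of action-name lists (a hypothetical RUM export format).
    Paths whose share of all sessions falls below `min_share` are dropped.
    """
    counts = Counter(tuple(actions) for actions in sessions)
    total = sum(counts.values())
    return [
        (path, n / total)
        for path, n in counts.most_common()
        if n / total >= min_share
    ]

# Hypothetical captured sessions: three checkouts and one abandoned cart.
sessions = [
    ["home", "search", "product", "add_to_cart", "checkout"],
    ["home", "search", "product", "add_to_cart", "checkout"],
    ["home", "product", "add_to_cart", "abandon"],
    ["home", "search", "product", "add_to_cart", "checkout"],
]

scenarios = derive_scenarios(sessions)
for path, share in scenarios:
    print(f"{share:.0%}  {' > '.join(path)}")
```

Even this crude count shows why variability matters: the abandoned-cart path is a legitimate journey that a single happy-path script would never exercise.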

Still, we need to go deeper than replaying sessions. The hardest problem isn’t just repeating user actions and making them variable, it’s creating the conditions in which the riskiest user journeys would exist in the first place.

You must be able to cause the cause – synthesizing monitoring feedback to simulate an application under stress. By running the test user through both tedious everyday scenarios and rainy-day edge cases that appear when a totally unexpected crisis arises, teams can start to generate feedback that is robust enough to make the system under test more resilient.
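One way to picture mixing everyday scenarios with rainy-day edge cases is a test plan that deliberately over-represents rare journeys. The sketch below is an assumption-laden illustration, not a description of any vendor's method: `build_test_plan`, the `edge_floor` parameter, and the journey names are hypothetical, and the "stress" here is only a count of virtual users per journey.

```python
import random
from collections import Counter

def build_test_plan(scenarios, virtual_users, edge_floor=0.1, seed=0):
    """Assign virtual users to journeys, giving rare journeys a minimum weight.

    `scenarios` is a list of (journey_name, observed_share) pairs.
    `edge_floor` raises the sampling weight of any journey observed less
    often than that share, so edge cases still get exercised under load.
    """
    rng = random.Random(seed)  # seeded so a test run can be repeated
    names = [name for name, _ in scenarios]
    weights = [max(share, edge_floor) for _, share in scenarios]
    picks = rng.choices(names, weights=weights, k=virtual_users)
    return Counter(picks)

# Hypothetical observed shares: the crisis-day journey is rare in production.
observed = [
    ("browse_and_checkout", 0.95),
    ("abandon_cart", 0.04),
    ("retry_payment_after_timeout", 0.01),
]

plan = build_test_plan(observed, virtual_users=1000)
for name, users in plan.most_common():
    print(f"{users:5d} virtual users  {name}")
```

The design choice worth noting is the floor: sampling strictly by observed share would leave the riskiest journeys almost untested, which is the opposite of causing the cause.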

Use cases: From slow advertising to nonstop carnage

One major multi-network media property did full synthetic monitoring and web performance testing with Apica at the beginning and end of every release cycle. They discovered that each captured page in their model loaded sluggishly, as the captured browser scenarios included as many as 300 different hidden pixel lookups and calls to different advertising providers.

While an extra two or three seconds of wait time is hardly worth reporting as a customer support issue, over time such a performance lag causes users to permanently leave a site. The network’s web team reduced it all to just one API call per page to a unified ad handling function, thereby reducing page load times by 90 percent or more.

Hotter still is the HBO Max use case. The company sought to get ahead of customer service concerns with its streaming service, which plays largely on smart TVs and console platforms. Expecting a rush of dragon-loving fans for the final season of its hit show Game of Thrones (GoT), the team was able to simulate millions of simultaneous subscriber sign-ins and multi-gigabit streaming requests across different platforms.

The result? More than 19.3 million subscribers watched the GoT series finale, with as many as 13.6 million viewers tuned in to the high-definition stream within 3 minutes of its exact release time to see the legendary land of Westeros set on fire.

The Intellyx Take

If ancient Mongolian archers could hit a moving target from horseback at more than 100 meters, maybe modern devtest teams can reach the state of flow and mastery necessary to test real user journeys and hit quality and performance targets.

Disciplined capture and correlation of every user action and screen element into a robust performance and functional test suite is just the start. The best-performing teams will use AI and synthetic monitoring to reproduce as many test scenarios as needed to reduce the risk that user journeys will be interrupted.

Next up in this series – How to bring API-driven applications and interconnected services into the human experience equation.

©2022 Intellyx LLC. Intellyx retains control over the content of this document. At the time of writing, Apica is an Intellyx customer. Image source: Lian Xiaoxiao, flickr CC2.0 license.