This is blog post 3 in a 5-post series.

Blog post 1: Rising to the Occasion: Using Load Testing to Optimize Cloud Server Scaling

Blog post 2: Developing a Server Asset Scaling Strategy

Application teams need to establish how much power each part of the system requires so they can expand capacity to handle traffic increases without wasting resources. While a single load test tells the development team whether a web application and its hardware infrastructure will perform reliably at a given traffic rate, it does not establish growth guidelines.

Looking at the test results incrementally tells the team when they need to add more server power, but not how much additional server power is necessary to handle a specific uptick in end-user traffic. Moreover, different parts of the service infrastructure scale at different rates, so a hypothetical increase of 1,000 requests per second may require an additional application server but not another database.
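To make the point about uneven scaling concrete, here is a minimal sketch. The per-instance capacities and the function name are placeholders chosen for illustration, not figures from the load test; the only idea taken from the text is that the same traffic increase can push one component over its threshold while leaving another untouched.

```python
import math

# Hypothetical per-instance capacities in requests per second.
# These are illustrative placeholders, not measured values.
CAPACITY_RPS = {"application_server": 1_500, "database": 10_000}


def instances_needed(target_rps: int) -> dict[str, int]:
    """Return the minimum instance count per component for a target traffic rate."""
    return {
        name: math.ceil(target_rps / capacity)
        for name, capacity in CAPACITY_RPS.items()
    }


before = instances_needed(3_000)  # traffic before the increase
after = instances_needed(4_000)   # traffic after adding 1,000 requests per second

for name in CAPACITY_RPS:
    print(f"{name}: {before[name]} -> {after[name]}")
```

With these assumed capacities, the extra 1,000 requests per second adds one application server and leaves the database count unchanged. Real capacities would have to come from the team's own incremental test results.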

Scaling Beyond the Testing Threshold: Capacity Planning

The AWS test peaked at 19,032 requests per second, which may be much more than the application ends up needing down the line. Fortunately, comparing load testing results with the requests-per-second rate as the variable provides insight into how the performance of each part of the implementation relates to traffic. The business world uses the term “capacity planning” to describe the practice of meeting increasing production demands while minimizing waste.

Within the context of web applications, capacity planning refers to working out the maximum production ratio between specific components within a larger system, so the team can tell when any given part of the system needs to be upgraded to meet demand. Capacity planning addresses both when an implementation needs additional power to function and where the team can trim unnecessary components.

Single Implementation, Many Scaling Parts

Capacity planning matters for avoiding wasted assets because different parts of the larger system have different thresholds. The system includes parts like web servers, application servers, cache layers, and databases, each of which must be scaled at a different rate to handle increased traffic. Capacity planning establishes the ratios between these components.

In the AWS case, the test found that each application server could support the same amount of traffic as four web servers. Thus, running two application servers for three web servers would make the second application server an unnecessary expense. However, if traffic demands moved up from needing four web servers to five, the second application server would be necessary to avoid a bottleneck. As traffic continues to grow in the AWS use case, the implementation would not require a third application server until traffic demands at least nine web servers to operate smoothly.
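The arithmetic behind that ratio can be sketched in a few lines. The 4:1 ratio comes from the AWS test described above; the constant and function names are made up for this example, and the rounding rule (always round up) is an assumption that matches the thresholds given in the text.

```python
import math

# Ratio reported for the AWS test: one application server per four web servers.
WEB_SERVERS_PER_APP_SERVER = 4


def app_servers_needed(web_servers: int) -> int:
    """Return the fewest application servers that keep pace with the web tier."""
    if web_servers < 0:
        raise ValueError("web_servers must be non-negative")
    # Round up: a partially used application server is still a whole server.
    return math.ceil(web_servers / WEB_SERVERS_PER_APP_SERVER)


for web in (3, 4, 5, 8, 9):
    print(f"{web} web servers -> {app_servers_needed(web)} application servers")
# 3 -> 1, 4 -> 1, 5 -> 2, 8 -> 2, 9 -> 3
```

The jumps land at five and nine web servers, which are the points where the text says the second and third application servers become necessary.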

Scaling the Application Layers

Scaling goes deeper than servers and databases: the hardware running the application also needs adjusting. Application traffic goes through several system layers to complete an interaction, all of which require enough bandwidth to meet traffic demands.

In the case of the Apica AWS load test project, the first traffic layer consisted of a front-end API running on Node.js within the AWS Elastic Beanstalk Service. After the API handles data processing, the information moves through the AWS Kinesis system, while the DynamoDB system handles the backend database for the front-end API. Many of these components interact with each other in multiple directions, as illustrated by the following diagram:

If any component in this system is overwhelmed by the flow of information, it creates a bottleneck for the service. Data on how the components of the application interact as they scale is essential for establishing the ratios at which the layers scale relative to each other. The team used the test results for the optimal API application instance, the DynamoDB setup, and Kinesis performance to establish scaling ratios.
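One way to picture the bottleneck idea is to treat the layers as a simple chain in which the slowest layer caps the whole service. That is a simplification, since the components here interact in multiple directions, and the throughput numbers below are placeholders rather than results from the Apica test. The function name is also just an illustration.

```python
def find_bottleneck(layer_capacity_rps: dict[str, float]) -> tuple[str, float]:
    """Return the layer with the lowest sustainable throughput and that throughput."""
    if not layer_capacity_rps:
        raise ValueError("at least one layer is required")
    name = min(layer_capacity_rps, key=layer_capacity_rps.get)
    return name, layer_capacity_rps[name]


# Placeholder capacities in requests per second, for illustration only.
layers = {"api": 12_000, "kinesis": 9_000, "dynamodb": 15_000}

bottleneck, limit = find_bottleneck(layers)
print(f"Bottleneck: {bottleneck} at {limit} requests per second")
```

In this made-up configuration, adding API instances or DynamoDB capacity would not raise overall throughput until the Kinesis layer is scaled. This is the kind of relationship the scaling ratios are meant to capture.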

The Increasing Significance of Capacity Planning

Capacity planning is essential for a proactive approach to hardware scaling. Modern server capacity concerns are compounded by the rise of handheld devices like smartphones and tablets, combined with laptop access via mobile broadband Internet, which change not only how end users connect to a service but also how often they can (and often do). This changing technological climate means web applications are pushed to higher usage limits than ever before, and it’s up to you to be ready.