This is blog post #4 in a 5-post series

Blog post 1: Rising to the Occasion: Using Load Testing to Optimize Cloud Server Scaling

Blog post 2: Developing a Server Asset Scaling Strategy

Blog post 3: Planning Ahead: Load Testing-Assisted Capacity Planning

In addition to establishing component scaling guidelines, the Apica AWS load test project set out to address potential complications in the testing process itself, on top of returning useful data. By ramping up users to the maximum load in just a few milliseconds, the testing pushed the system hard right from the start in order to identify its breaking points. The Apica LoadTest Portal allows quick ramp-up, variable duration, and globally distributed source traffic to simulate real end-user scenarios. The testing team used the project to answer the following three questions about the process:

Where are our limits/bottlenecks?

Limits and bottlenecks prevent the testing infrastructure from reaching the desired number of requests per second and from returning results that reflect real-world use. The AWS test featured two user scenarios recorded with Apica ZebraTester, which could create the desired dynamics in the scenarios and scale the request load to the required high-frequency rates.

The load tests need to run between 5 and 30 minutes to identify breaking points, so the testing infrastructure needs to last at least that long before experiencing performance drops. The test team used AWS CloudWatch to gauge metrics throughout the load tests, together with the Apica Performance Monitoring Agent, which collects metrics every five seconds and provides much finer granularity.
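CloudWatch exposes the instance-level metrics the team watched. As a rough illustration (not the project's actual monitoring code), pulling CPU utilization for a single instance with the AWS SDK for JavaScript might look like the sketch below; the region, instance ID, and one-minute period are placeholder values, and the five-second granularity mentioned above comes from the Apica agent, not from CloudWatch itself.

var AWS = require('aws-sdk');
var cloudwatch = new AWS.CloudWatch({ region: 'us-east-1' }); // placeholder region

// Fetch average and maximum CPU utilization for the last 30 minutes of a test run
cloudwatch.getMetricStatistics({
  Namespace: 'AWS/EC2',
  MetricName: 'CPUUtilization',
  Dimensions: [{ Name: 'InstanceId', Value: 'i-0123456789abcdef0' }], // placeholder instance
  StartTime: new Date(Date.now() - 30 * 60 * 1000),
  EndTime: new Date(),
  Period: 60,                     // one-minute datapoints
  Statistics: ['Average', 'Maximum']
}, function (err, data) {
  if (err) { return console.error(err); }
  data.Datapoints.forEach(function (point) {
    console.log(point.Timestamp, 'avg:', point.Average, 'max:', point.Maximum);
  });
});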

The Apica AWS load test project identified bottleneck points that the testing team may encounter when running the test, including:

Limit #1: The test computer's operating system.

The testing team identified pre-existing limitations within Linux that prevented the test application from using enough of the system's resources to produce the number of requests per second necessary for the load test. In short, the application could not use enough CPU power and RAM to generate the number of virtual users the team needed. This sort of situation is easy to identify by checking the system logs for messages about limits being reached or maxed out. The testing team used sysctl to reconfigure the operating system so it could allocate enough resources to the application to accommodate the testing conditions. The sysctl configuration at the end of this post shows how the team adjusted the operating system limits to meet the needs of the load test.

Limit #2: The Nginx web server configuration had to be tweaked in order to maximize its utilization.

After confirming the Nginx utilization, the team continued pushing load against the instance. They identified cases where the CPU was touching 100 percent usage during high-load situations, despite the server having plenty of RAM free. Since memory usage remained at an acceptable level while the CPU did not, the CPU was identified as the limiting factor in the test. The team settled on AWS C-instances, which are optimized for computation-heavy operations.
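CloudWatch surfaces these numbers per instance, but the same CPU-versus-RAM comparison can also be made on the host itself. The following is a minimal, illustrative Node.js sketch (an assumption for this post, not the team's actual tooling) that samples load average and free memory every five seconds; a load consistently above the core count while plenty of RAM remains free is the pattern that points toward a compute-optimized instance type.

var os = require('os');

// Log CPU load (relative to the number of cores) and free memory every five seconds
setInterval(function () {
  var load = os.loadavg()[0];                           // 1-minute load average
  var cores = os.cpus().length;
  var freeMb = Math.round(os.freemem() / 1024 / 1024);
  var totalMb = Math.round(os.totalmem() / 1024 / 1024);
  console.log('load per core: ' + (load / cores).toFixed(2) +
              ', free RAM: ' + freeMb + ' / ' + totalMb + ' MB');
}, 5000);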

Limit #3: The initial Kinesis implementation pushed records one by one, which created substantial overhead time in sending requests. Each API request had to go through the Kinesis layer on its own, which limited the system's ability to send the required number of requests per second.

To work around the limitation, the team adjusted the API to batch records before sending them to Kinesis as a group. The adjusted method used slightly more RAM, but it led to a large performance improvement. The following chart shows an example of traffic into Kinesis, as seen through AWS CloudWatch:

The Kinesis code at the end of this post showcases the batching configuration for the project. In particular, it shows the buffer configuration, which flushes records either every five seconds or after every 500 records, whichever comes first.

Limit #4: Once the system was able to process data requests fast enough to perform the test, the team identified additional bottlenecks in the database request layer. DynamoDB's configuration made it a limiting factor on performance, and it also increased cost, since reads are a billed parameter for DynamoDB.

Since the load test rarely changed the accessed data, the team adjusted the layer to use a longer caching period, together with a new method that purged the cache when the data changed (as opposed to on each request). This change made the database layer very efficient. The caching code at the end of this post demonstrates how the team configured the database access layer for the test.
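That code shows the cached read path; the purge-on-change side is not listed there, but a minimal sketch of it could look like the following. The database.updateApplication wrapper and the cacheKey value are assumptions that mirror the names used in the read-path example, not code from the project itself.

// Invalidate the cached entry whenever the application data changes,
// so the next read repopulates the cache instead of serving stale data
var cache = require('memory-cache');

var updateApplication = function (applicationId, changes, callback) {
  database.updateApplication(applicationId, changes, function (err, application) {
    if (err) {
      return callback(err);
    }
    cache.del(cacheKey); // hypothetical: same cacheKey as in the cached read path
    return callback(null, application);
  });
};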

Once the team isolated and eliminated testing bottleneck points, the load test provided accurate data for devising a server scalability strategy.

How does the system, and each layer, scale? What instances should be used when scaling?

In addition to testing whether the hosting infrastructure can withstand high traffic rates, the load testing process can be used to establish capacity planning guidelines. The hosting infrastructure includes many components, and those different parts all need upgrades at different intervals.

The test determines upgrade ratios, such as the need to add one application server for every four web servers. The team used A/B load testing, which involves running another load test each time the Apica AWS load test project scaled up a component in order to measure the effect. The test team made use of the Apica LoadTest Portal, where it could group and tag tests for a clear overview of all the tests as well as efficient comparison of the results.

After scale-testing each layer a few times, the team established a pattern of how each application behaves as demand increases. This practice provided the guidelines for when and where the infrastructure needed to be scaled, which determined the AWS auto scaling rules. The following charts demonstrate how the load test transaction rate relates to page response times for establishing a scaling pattern:

How large does the system need to be to handle the initial load?

The load tests are only as good as the infrastructure hosting them; the team also has to establish how much power the testing infrastructure needs in order to run reliable tests. In the AWS test case, the initial test encountered scaling issues early, in the 2,000 requests-per-second range, which the team quickly traced to a bottleneck. As with establishing scaling ratios for capacity planning, the team used the previously mentioned A/B testing method to push the testing infrastructure to the 19,032 requests-per-second rate.

The final test results demonstrate a process any team can use to reliably predict the maximum number of simultaneous users a server implementation can support before experiencing service delays and overloads. Getting the most out of the load testing process requires addressing bottlenecks and limitations while understanding how the system layers scale and how large the system needs to be to handle the initial load.

After the team addressed the bottleneck issues and established component scaling relationships, it was time to put that information to the test. The team performed a final load test over a six-hour period with fluctuating traffic levels to verify that the AWS auto-scaling settings worked as desired and to observe how each layer's components handled the load. The final test confirmed that the settings worked and were ready to be pushed into production.
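The exact auto-scaling rules come out of the measured scaling patterns, but as a rough illustration of what such a rule can look like, the sketch below wires a CPU-based scale-out policy to a CloudWatch alarm using the AWS SDK for JavaScript. The group name, alarm name, and 70 percent threshold are placeholder values, not the project's actual settings.

var AWS = require('aws-sdk');
var autoscaling = new AWS.AutoScaling({ region: 'us-east-1' }); // placeholder region
var cloudwatch = new AWS.CloudWatch({ region: 'us-east-1' });

// Add one instance to the group each time the policy fires
autoscaling.putScalingPolicy({
  AutoScalingGroupName: 'web-tier-asg',        // placeholder group name
  PolicyName: 'scale-out-on-cpu',
  AdjustmentType: 'ChangeInCapacity',
  ScalingAdjustment: 1,
  Cooldown: 300
}, function (err, policy) {
  if (err) { return console.error(err); }

  // Fire the policy when average CPU stays above 70% for two consecutive 5-minute periods
  cloudwatch.putMetricAlarm({
    AlarmName: 'web-tier-high-cpu',
    Namespace: 'AWS/EC2',
    MetricName: 'CPUUtilization',
    Dimensions: [{ Name: 'AutoScalingGroupName', Value: 'web-tier-asg' }],
    Statistic: 'Average',
    Period: 300,
    EvaluationPeriods: 2,
    Threshold: 70,
    ComparisonOperator: 'GreaterThanThreshold',
    AlarmActions: [policy.PolicyARN]
  }, function (alarmErr) {
    if (alarmErr) { console.error(alarmErr); }
  });
});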

Configuration and code samples

OS limitations

The following configuration sets new OS limits by increasing the required values using sysctl on Linux.

Before this configuration was applied, the system used the default limit values, which were not sufficient to maximize resource usage on the system.

This can be identified when the application does not use all available system resources; the system usually also writes messages to the system log about the limits being reached or maxed out.

Final sysctl example configuration (in Elastic Beanstalk configuration file format):

files:
  "/etc/sysctl.d/01-apica.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      net.ipv4.ip_local_port_range = 1024 65000
      net.ipv4.tcp_tw_reuse = 1
      net.ipv4.tcp_fin_timeout = 15
      net.core.netdev_max_backlog = 4096
      net.core.rmem_max = 16777216
      net.core.somaxconn = 4096
      net.core.wmem_max = 16777216
      net.ipv4.tcp_max_syn_backlog = 20480
      net.ipv4.tcp_max_tw_buckets = 400000
      net.ipv4.tcp_no_metrics_save = 1
      net.ipv4.tcp_rmem = 4096 87380 16777216
      net.ipv4.tcp_syn_retries = 2
      net.ipv4.tcp_synack_retries = 2
      net.ipv4.tcp_wmem = 4096 65536 16777216
      vm.min_free_kbytes = 65536

commands:
  sysctl:
    command: sysctl -p /etc/sysctl.d/01-apica.conf
    ignoreErrors: true
  nginx_restart:
    command: /etc/init.d/nginx restart
    ignoreErrors: true


Nginx conf:

files:
  "/etc/nginx/nginx.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      # Elastic Beanstalk managed configuration file
      # Some configuration of nginx can be done by placing files in /etc/nginx/conf.d
      # using Configuration Files.
      # http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers.html
      #
      # Modifications of nginx.conf can be performed using container_commands to modify the staged version
      # located in /tmp/deployment/config/etc#nginx#nginx.conf
      #
      # For more information on configuration, see:
      #   * Official English Documentation: http://nginx.org/en/docs/
      #   * Official Russian Documentation: http://nginx.org/ru/docs/
      user  nginx;
      worker_processes  auto;
      error_log  /var/log/nginx/error.log;
      pid        /var/run/nginx.pid;
      events {
          worker_connections  16000;
      }
      http {
          # Elastic Beanstalk Modification(EB_INCLUDE)
          include /etc/nginx/conf.d/*.conf;
          # End Modification
          port_in_redirect off;
          include       /etc/nginx/mime.types;
          default_type  application/octet-stream;
          log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for"';
          access_log  /var/log/nginx/access.log  main;
          sendfile        on;
          keepalive_timeout  65;
      }
      # apica custom
      worker_rlimit_nofile 65535;

Kinesis:

### Before

### Sending each record per request

var handleRequest = function (request) {
  // validation
  // ..
  // authentication
  // ..
  kinesis.send(request.record);
};

### After

### Batching records using the reactive library 'rx'

var Rx = require('rx');

var source = Rx.Observable
  .fromEvent(process, 'kinesisRecords')
  .bufferWithTimeOrCount(5000, 500); /* every 5th second or every 500th record */

var subscription = source.subscribe(function (records) {
  if (records.length > 0) {
    kinesis.send(records);
  }
});

var handleRequest = function (request) {
  // validation
  // ..
  // authentication
  // ..
  process.emit('kinesisRecords', request.record);
};

DynamoDB:

### Before

// ...
// ...

database.getApplication(applicationId, function (err, application) {
  return callback(err, application);
});

### After

### Basic caching using the 'memory-cache' library

var cache = require('memory-cache');

// ...
// ...

var application = cache.get(cacheKey);
if (application) {
  return callback(null, application);
}

database.getApplication(applicationId, function (err, application) {
  if (application) {
    cache.put(cacheKey, application, cacheDuration);
  }
  return callback(err, application);
});