Loadtesting best practices – Part 1

Before I start discussing best practices for loadtesting, let me first explain my definition of a loadtest.

“Testing a system with representative load, to determine if the nominal load can be handled”.

This means that before you start loadtesting, you need to know the nominal load and have an expectation of the outcome. You will find that most of the best practices have to do with preparation, not with fancy techniques. A different type of test that is frequently called a loadtest is a stresstest. A stresstest has a different purpose, as my definition of a stresstest shows:

“Testing a system above its nominal load, with the purpose of determining whether the system can handle a bigger load in the future, and of finding a bottleneck.”

Loadtests are commonly used to scale an environment. These environments can be an SBC (Server Based Computing) environment like Citrix XenApp or Microsoft RDS, or a VDI environment. But other environments, like file, print or web servers, can be loadtested as well. In fact, if you can create the load by simulating user actions, you can probably perform some sort of loadtest.

Now let’s start with the best practices. I’ve written these down over the past years while performing loadtests with the DeNamiK LoadGen, but they apply to (almost) all loadtesting applications. You could say I learned them the hard way. There’s nothing wrong with that, but you can learn from my mistakes.

This is the first part, focussing on the “basics”. Part two will focus on more advanced topics.

1 – Determine the purpose of the test

Before doing anything, you should determine why you’re performing a test. What type of test are you planning, a loadtest or a stresstest? Do you want to validate a certain result with a pre-defined load, or do you want to find the point where things stop functioning properly?

For each test you should know how many sessions (or users) you’re going to use to simulate the load: 10, 25, 100 or more? The more users, the bigger the impact on the environment and on the preparations.

The moment a session is launched, load is generated. Determine when, and how many, sessions you want to launch simultaneously. The number of simultaneous users and the launch interval determine the load on the target environment; if your purpose isn’t to stress the environment, you should take it slow. If you’re testing a real-life scenario, determine a launch pattern that corresponds with real life.
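To make the launch parameters concrete, here is a minimal Python sketch that turns a total number of sessions, a batch size (sessions launched simultaneously) and an interval into a launch schedule. The `LaunchPlan` and `launch_schedule` names are hypothetical; they are not part of the DeNamiK LoadGen or any other loadtesting tool, and the numbers are only an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchPlan:
    """Hypothetical ramp-up parameters, not settings of a specific tool."""
    total_sessions: int      # sessions to launch in total
    batch_size: int          # sessions launched simultaneously per batch
    interval_seconds: float  # pause between the start of two batches


def launch_schedule(plan: LaunchPlan) -> list[tuple[float, int]]:
    """Return (start offset in seconds, sessions launched) for every batch."""
    if plan.batch_size <= 0 or plan.total_sessions < 0:
        raise ValueError("batch_size must be positive and total_sessions non-negative")
    schedule = []
    remaining = plan.total_sessions
    batch = 0
    while remaining > 0:
        count = min(plan.batch_size, remaining)  # the last batch may be smaller
        schedule.append((batch * plan.interval_seconds, count))
        remaining -= count
        batch += 1
    return schedule


if __name__ == "__main__":
    # 100 sessions, 5 at a time, one batch every 30 seconds
    schedule = launch_schedule(LaunchPlan(total_sessions=100, batch_size=5, interval_seconds=30))
    print(f"{len(schedule)} batches, last batch starts at {schedule[-1][0]:.0f}s")
```

With 100 sessions in batches of 5 every 30 seconds, the last batch starts after 570 seconds, so the full load is reached after just under ten minutes. A smaller batch size or a longer interval spreads the logon load further.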

Create different load profiles. Users can generally be placed in different profiles like ‘light users’, ‘medium users’ and ‘heavy users’. Each profile has a different set of applications, work speed and distribution.
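A load profile can be captured as data. The sketch below assumes a hypothetical mix of 50% light, 30% medium and 20% heavy users (ask the customer for the real mix) and splits a session count over the profiles, rounding so the total stays exact. Profile names, applications and think times are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    name: str
    share: float                   # fraction of all sessions, e.g. 0.5 for 50%
    applications: tuple[str, ...]  # applications used by this type of user
    think_time_seconds: float      # pause between user actions ("work speed")


def distribute_sessions(total: int, profiles: list[LoadProfile]) -> dict[str, int]:
    """Split `total` sessions over the profiles by share (largest remainder rounding)."""
    if abs(sum(p.share for p in profiles) - 1.0) > 1e-9:
        raise ValueError("profile shares must add up to 1.0")
    exact = {p.name: total * p.share for p in profiles}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = total - sum(counts.values())
    # hand the remaining sessions to the profiles with the largest fractional parts
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# hypothetical profile mix
PROFILES = [
    LoadProfile("light", 0.5, ("Microsoft Word",), 20.0),
    LoadProfile("medium", 0.3, ("Microsoft Word", "Microsoft Outlook"), 10.0),
    LoadProfile("heavy", 0.2, ("Microsoft Word", "Microsoft Outlook", "Microsoft Excel"), 5.0),
]

if __name__ == "__main__":
    print(distribute_sessions(100, PROFILES))  # {'light': 50, 'medium': 30, 'heavy': 20}
```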

For all of the above the same rule applies: consult the customer and adopt real-life parameters as much as possible.

2 – Create a scenario

After a session is launched, load needs to be generated by simulating user actions. The user actions to simulate should be determined before creating the script that simulates them.

The simulated user actions should be written down in a scenario, which serves as a roadmap. The scenario describes which applications are used and which actions are executed in each application. For each action, the expected outcome (a result on the screen) is written down as well. A good objective is that a regular user with no experience with the application can execute the same actions without asking what to do next.
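A scenario is easier to review and to hand over to the script writer when every step carries its expected result. Below is a minimal sketch with a hypothetical `Step` structure and made-up Word steps that flags any step missing an application, an action or an expected result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    application: str
    action: str           # what the user does
    expected_result: str  # what should be visible on screen afterwards


def validate_scenario(steps: list[Step]) -> list[str]:
    """List incompletely described steps; an empty list means the scenario is complete."""
    problems = []
    for number, step in enumerate(steps, start=1):
        for field_name in ("application", "action", "expected_result"):
            if not getattr(step, field_name).strip():
                problems.append(f"step {number}: missing {field_name}")
    return problems


# hypothetical example steps
scenario = [
    Step("Microsoft Word", "Start Word from the Start menu", "An empty document is shown"),
    Step("Microsoft Word", "Type a paragraph of text", "The text appears in the document"),
    Step("Microsoft Word", "Save the document as test.docx", ""),
]

if __name__ == "__main__":
    print(validate_scenario(scenario))  # ['step 3: missing expected_result']
```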

This way the guy who writes the script that simulates the user actions doesn’t need to know the application.

The best scenario is one that matches the workload of the customer, so ask the customer which applications are used and how they are used. Not every application needs to be simulated, but the scenario should correspond with the regular workload.

3 – Test users

Sessions require users to authenticate. It is recommended to use dedicated test users placed in a group. This way the users can be managed easily.

Make sure the users exist, are configured and work before you start. If the users do not exist and you’re in charge of creating them, make sure you have an administrative account. And know how and where to create the users, which properties need to be set, and where the profile and home directory are stored.

Test users should all be equal. Since you’re creating an automated test with simulated user actions, you want them to be as identical as possible.
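Generating all accounts from one template is an easy way to keep test users identical. The sketch below only builds the account definitions; actually creating them in Active Directory is left to your own tooling. The naming prefix, group name and UNC paths are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestUser:
    username: str
    group: str
    profile_path: str
    home_directory: str


def make_test_users(count: int, prefix: str, group: str,
                    profile_root: str, home_root: str) -> list[TestUser]:
    """Build `count` test users that differ only in their sequence number."""
    width = len(str(count))  # zero-pad so names sort correctly, e.g. lgtest01 .. lgtest25
    users = []
    for i in range(1, count + 1):
        name = f"{prefix}{i:0{width}d}"
        users.append(TestUser(
            username=name,
            group=group,  # every user is placed in the same group
            profile_path=f"{profile_root}\\{name}",
            home_directory=f"{home_root}\\{name}",
        ))
    return users


if __name__ == "__main__":
    # hypothetical prefix, group and UNC paths
    for user in make_test_users(3, "lgtest", "LoadTest Users",
                                r"\\fileserver\profiles$", r"\\fileserver\home$"):
        print(user)
```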

4 – SUT

Usually a test is conducted in an environment which has more than one system involved. Determine all systems that are under test (SUT) and therefore can influence the result or should be monitored.

Each system that has a connection and influences the end result is a system under test, for instance an Active Directory Domain Controller, file server, SQL server, SAN or router. Although your main purpose can be very simple, like testing a Citrix XenApp farm, each component in the chain can influence the test results.

Windows machines can be monitored using performance counters. The following performance counters give a good overview of the system; when a bottleneck is reached, add more counters to gain more insight.

Memory
     Available MBytes
     Committed Bytes
     Free System Page Table Entries
     Page Faults/sec
     Pool Nonpaged Bytes
     Pool Paged Bytes

Network Interface
     Bytes Received/sec
     Bytes Sent/sec
     Bytes Total/sec

Physical Disk
     _Total\% Disk Time
     _Total\Current Disk Queue Length

Paging File
     _Total\% Usage

Process
     _Total\Page File Bytes

Processor
     _Total\% Interrupt Time
     _Total\% Processor Time
     _Total\Interrupts/sec

Server
     Server Sessions

System
     Context Switches/sec
     Processor Queue Length

Terminal Services
     Active Sessions
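To collect the counters listed above, it helps to have them as full counter paths. A minimal sketch that prints one path per line in the `\Object(Instance)\Counter` notation; you could redirect the output to a file and, for example, pass it to `typeperf -cf`. The instance choices (`_Total`, and `*` for all network adapters) are assumptions, and the English counter names may not match on a localized Windows installation.

```python
# Counters from the list above; object names use the English PerfMon spelling (assumption).
COUNTERS = {
    "Memory": ["Available MBytes", "Committed Bytes", "Free System Page Table Entries",
               "Page Faults/sec", "Pool Nonpaged Bytes", "Pool Paged Bytes"],
    "Network Interface(*)": ["Bytes Received/sec", "Bytes Sent/sec", "Bytes Total/sec"],
    "PhysicalDisk(_Total)": ["% Disk Time", "Current Disk Queue Length"],
    "Paging File(_Total)": ["% Usage"],
    "Process(_Total)": ["Page File Bytes"],
    "Processor(_Total)": ["% Interrupt Time", "% Processor Time", "Interrupts/sec"],
    "Server": ["Server Sessions"],
    "System": ["Context Switches/sec", "Processor Queue Length"],
    "Terminal Services": ["Active Sessions"],
}


def counter_paths(counters: dict[str, list[str]]) -> list[str]:
    """Return counter paths in the \\Object(Instance)\\Counter notation used by PerfMon."""
    return [f"\\{obj}\\{name}" for obj, names in counters.items() for name in names]


if __name__ == "__main__":
    # one counter path per line, e.g.: python counters.py > counters.txt
    print("\n".join(counter_paths(COUNTERS)))
```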

5 – Influences

During a test, other processes might be active that can influence your test results. That’s fine if this is part of your plan, but not if you didn’t know about it. For instance, a back-up process that starts during a test might influence the speed of the environment.

Map the activities in the environment and communicate with the customer. If people don’t know a test is running, they might unknowingly do something that influences it.

Besides back-up processes, consider processes that influence the file server, database server, SAN, WAN connection, etc.
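Once the activities are mapped, checking them against the planned test window is simple interval arithmetic. A sketch with a hypothetical list of scheduled activities (names and times are made up); intervals are treated as half-open, so an activity that starts exactly when the test ends does not count as a conflict.

```python
from datetime import datetime


def overlaps(start_a: datetime, end_a: datetime, start_b: datetime, end_b: datetime) -> bool:
    """True if the half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def conflicting_activities(test_start: datetime, test_end: datetime,
                           activities: list[tuple[str, datetime, datetime]]) -> list[str]:
    """Return the names of scheduled activities that run during the test window."""
    return [name for name, start, end in activities
            if overlaps(test_start, test_end, start, end)]


if __name__ == "__main__":
    # hypothetical test window and activities
    test_start = datetime(2012, 3, 1, 20, 0)
    test_end = datetime(2012, 3, 1, 23, 0)
    activities = [
        ("nightly back-up", datetime(2012, 3, 1, 22, 0), datetime(2012, 3, 2, 4, 0)),
        ("SAN snapshot", datetime(2012, 3, 2, 1, 0), datetime(2012, 3, 2, 1, 30)),
        ("antivirus full scan", datetime(2012, 3, 1, 12, 0), datetime(2012, 3, 1, 14, 0)),
    ]
    print(conflicting_activities(test_start, test_end, activities))  # ['nightly back-up']
```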

6 – Computational vs perceived performance

Looking at the performance of a machine can be done in different ways. Generally these can be divided into two categories: computational performance and perceived performance.

Computational performance is based on CPU cycles, available memory, IOPS, network throughput, etc. Although these are all valid performance indicators, they are nothing more than indicators: they describe the performance the system should theoretically provide. For example, if the processor load is 30%, is that good? Or is it inefficient?

Tim Mangan described computational performance in his white paper “Perceived Performance” in September 2003:

“Computational capacity is the total amount of useful work that can be accomplished on a system in a given fixed period of time”

Perceived performance looks at it from a different angle: the way users perceive the performance. This is the end result of the complete chain and should IMHO be at the top of your list. Perceived performance can be determined by measuring the response times of user actions.
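Because perceived performance is measured as response times, it pays to summarize them in a way that does not hide slow actions. The sketch below uses Python’s `statistics` module and a nearest-rank percentile; the sample values are hypothetical response times of a single user action, in milliseconds.

```python
import math
import statistics


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(response_times_ms: list[float]) -> dict[str, float]:
    """Summarize the response times of one user action."""
    return {
        "mean": statistics.mean(response_times_ms),
        "median": statistics.median(response_times_ms),
        "p90": percentile(response_times_ms, 90),
        "max": max(response_times_ms),
    }


if __name__ == "__main__":
    # hypothetical response times (ms) of one action, e.g. opening a document
    samples = [420, 380, 510, 2950, 450, 470, 390, 430, 600, 415]
    print(summarize(samples))
```

In this sample one slow action (2950 ms) pulls the mean up to 701.5 ms, while the median stays at 440 ms and the 90th percentile is 600 ms. Looking at the mean alone would suggest every user waits about 0.7 seconds, which is not what any of them actually experiences.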

Tim Mangan described perceived performance in his white paper “Perceived Performance – Welcome to VDI” in May 2011:

“A methodology where one analyzes the system with a goal of improving user productivity by focusing on issues that affect the performance as perceived by the users”

7 – Creating bottlenecks

Make sure you’re not creating a bottleneck with your test environment, since this influences the outcome. Take for instance the machines that host the sessions, the LoadBots / Loaders: make sure that the number of sessions hosted per machine does not exceed the capacity of the machine. Although it might sound logical, I’ve seen many cases where one machine hosted far more sessions than it could handle. This resulted in higher measured response times, even when the target environment had enough capacity.
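Scaling the LoadBots comes down to simple arithmetic once you have measured how many sessions one LoadBot can host. A sketch, assuming a hypothetical 20% safety margin so that no LoadBot runs at its limit; the capacity figure in the example is made up.

```python
def loadbots_needed(total_sessions: int, sessions_per_loadbot: int,
                    headroom_percent: int = 20) -> int:
    """LoadBots needed so none hosts more than (100 - headroom_percent)% of its measured capacity."""
    if sessions_per_loadbot <= 0 or not 0 <= headroom_percent < 100:
        raise ValueError("invalid capacity or headroom")
    usable = sessions_per_loadbot * (100 - headroom_percent) // 100
    if usable < 1:
        raise ValueError("headroom leaves no usable capacity per LoadBot")
    return (total_sessions + usable - 1) // usable  # integer ceiling division


if __name__ == "__main__":
    # a capacity of 30 sessions per LoadBot is an example; measure it on your own hardware
    print(loadbots_needed(100, 30))
```

For 100 sessions on LoadBots measured to handle 30 sessions each, the margin leaves 24 usable sessions per machine, so 5 LoadBots are needed instead of 4.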

The same applies to collecting performance metrics. Each performance metric that is collected requires bandwidth; if the monitored system is on a slow WAN connection, this might influence the test. I’ve seen a test where collecting performance metrics took the complete WAN connection, leaving no bandwidth for the sessions.

Before you start a test, scale the LoadBots and determine the bandwidth required for collecting performance metrics.
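A rough estimate of the monitoring traffic tells you whether collecting metrics will compete with the sessions on a slow link. A minimal sketch; the bytes-per-sample figure and the 256 kbit/s link are assumptions, so measure the real traffic in your own environment, since it depends on the collection method and protocol overhead.

```python
def metrics_bandwidth_kbps(machines: int, counters_per_machine: int,
                           bytes_per_sample: int, interval_seconds: float) -> float:
    """Estimated bandwidth in kilobits per second for collecting all counters."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    bytes_per_second = machines * counters_per_machine * bytes_per_sample / interval_seconds
    return bytes_per_second * 8 / 1000


def share_of_link(required_kbps: float, link_kbps: float) -> float:
    """Fraction of the link capacity consumed by metric collection."""
    return required_kbps / link_kbps


if __name__ == "__main__":
    # 10 monitored machines, 20 counters each, an assumed 200 bytes per sample, every 5 seconds
    needed = metrics_bandwidth_kbps(10, 20, 200, 5)
    print(f"{needed:.0f} kbit/s, {share_of_link(needed, 256):.0%} of a 256 kbit/s link")
```

With these assumed figures the collection needs 64 kbit/s, a quarter of a 256 kbit/s WAN link. A longer sample interval or fewer counters on remote systems reduces this proportionally.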

Ingmar Verheij