Experimental Evaluation

Get introduced to the experimental evaluation technique.

Experimental factors

Before diving into the details of how this method works, let’s first examine some factors that can affect an experiment’s results and overall reliability.

Participants

  • Whom should the participant set include?
    Ideally, participants should be drawn from the actual intended users. If that is not possible, select participants who closely match the actual users in age, education, domain expertise, and similar characteristics. For example, if a device is intended for the general public but is tested only with computer science undergraduates, the results will not be reliable, because that group does not represent the general public.

  • How large should the participant set be?
    The chosen set size often depends on two pragmatic considerations:

    • The resources available.
    • The number of participants available.

    Regardless of these, the sample size should be large enough to represent each major category of the intended audience.

Variables

Variables are used to test a hypothesis under controlled conditions. There are two types of variables:

  • Independent variables:
    These are the variables that are manipulated to compare different conditions; each value of an independent variable produces a different condition. The different values of a variable are called its levels. For example, consider an experiment that tests whether the number of items in a menu affects how quickly the user selects an item from it. Here, the number of menu items is the independent variable, and its values may be three, five, seven, or nine. Because there are four different values, this variable has four levels. Other examples of independent variables include:

    • Interaction style
    • Level of help provided
    • Icon design
    • Colors

    Complex systems may require more than one independent variable. For example, if the experiment tests a user’s response speed based on both the number of menu items and the type of positioning device used, there are two independent variables. If three positioning devices are tested, say mouse, touchpad, and touchscreen, the total number of conditions is 12: the number of levels of menu items (4) multiplied by the number of levels of positioning device (3).

    An experiment should include at least two kinds of condition:

    • Experimental condition: a condition in which the independent variables are manipulated. For example, in the above scenario, the 12 combinations of menu size and positioning device each define a condition with different values of the independent variables.
    • Control condition: a condition identical to the experimental ones except that the independent variables are not manipulated. For example, the control condition for the above scenario would keep the independent variables at a constant, baseline setting against which the experimental conditions are compared.
  • Dependent variables:
    Dependent variables are the ones measured in an experiment; their values are expected to change with the values of the independent variables. Dependent variables must be measurable, affected by the independent variables, and, if possible, unaffected by other factors. In the above example, the user’s response speed is the dependent variable. Other dependent variables may include:

    • User performance
    • Time taken to complete a task
    • User preference
    • Errors made
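The condition count in the independent-variables discussion is simply the product of each variable’s number of levels. As a minimal Python sketch (the names `menu_items`, `devices`, and `conditions` are illustrative, not from the text), `itertools.product` enumerates every combination of levels:

```python
from itertools import product

# Levels of each independent variable, taken from the menu example.
menu_items = [3, 5, 7, 9]                        # 4 levels
devices = ["mouse", "touchpad", "touchscreen"]   # 3 levels

# Every combination of levels is one condition of the experiment.
conditions = list(product(menu_items, devices))

print(len(conditions))  # 12
print(conditions[0])    # (3, 'mouse')
```

Adding a third independent variable would multiply the count again, which is why the number of conditions grows quickly in complex experiments.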

Hypothesis

A hypothesis is a prediction of the experiment’s outcome, stated in terms of its variables. For example, “Altering the number of items on the menu will change the user’s speed in responding to the menu or selecting an item from it.” A null hypothesis states that altering the independent variables does not affect the dependent variables. For the above example, the null hypothesis would be, “Altering the number of items on the menu does not affect the user’s speed in responding to the menu or selecting an item from it.” The experiment aims to reject the null hypothesis by showing, through statistical analysis of the measured values, that it is unlikely to be true. We will learn about these statistical measures in just a few moments.
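To make “rejecting the null hypothesis” concrete, here is a minimal Python sketch of one non-parametric technique, an exact permutation test, chosen purely for illustration. The selection times are made-up, hypothetical numbers, and `permutation_p_value` is a hypothetical helper. The reasoning: if the null hypothesis is true, the menu-size labels are arbitrary, so we count how often a relabeling of the pooled data gives a difference in means at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean

# Hypothetical selection times (seconds) for two menu sizes.
three_items = [1.1, 1.3, 1.2, 1.0, 1.2]
nine_items = [1.8, 2.0, 1.7, 1.9, 2.1]

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Under the null hypothesis every split of the pooled data into groups
    of the original sizes is equally likely, so the p-value is the share
    of splits whose mean difference is at least as extreme as observed.
    """
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # A tiny tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total

p = permutation_p_value(three_items, nine_items)
print(round(p, 4))  # 0.0079
```

A p-value below a significance level fixed in advance (0.05 is a common convention) leads to rejecting the null hypothesis; with these hypothetical numbers, only 2 of the 252 possible splits are as extreme as the observed one.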

Experimental design

Let’s find out how an experiment is performed.

  1. Define the right hypothesis, variables, and participants
    The first step is to define the hypothesis clearly, which also requires defining the variables. All the necessary details should be known, such as the number of dependent and independent variables, their values, and the expected changes in the values of the dependent variables. This step should also settle the number of participants and check whether they represent the intended user group.

  2. Select the right experimental method
    The second step is to select the appropriate method from these two:

    • Between-subjects: each participant is assigned to only one condition. This requires a large number of participants.
    • Within-subjects: each participant is assigned to every condition. This requires relatively fewer participants.
  3. Select the right statistical test
    The third and final step is to select the appropriate test to analyze the data. Choosing an inappropriate test can invalidate the results, because each test makes different assumptions about the data. The choice of test is therefore critical to an experiment’s success.
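The participant trade-off between the two experimental methods can be sketched in Python. The helpers `participants_needed`, `assign_between`, and `assign_within` are hypothetical, as is the target of 10 observations per condition; rotating the starting condition for each within-subjects participant is an assumed, simple ordering scheme that the text does not describe.

```python
from itertools import product

# The 12 conditions from the menu-size x positioning-device example.
conditions = list(product([3, 5, 7, 9], ["mouse", "touchpad", "touchscreen"]))

def participants_needed(design, n_conditions, per_condition):
    """Participants needed to get `per_condition` observations in every condition."""
    if design == "between":
        return n_conditions * per_condition  # each person supplies one condition
    if design == "within":
        return per_condition                 # each person supplies every condition
    raise ValueError(f"unknown design: {design!r}")

def assign_between(participants, conditions):
    """Between-subjects: participant i is given exactly one condition (round-robin)."""
    return {p: [conditions[i % len(conditions)]] for i, p in enumerate(participants)}

def assign_within(participants, conditions):
    """Within-subjects: every participant is given every condition.

    The starting condition is rotated per participant so that no condition
    is always presented first (an assumed ordering scheme).
    """
    k = len(conditions)
    return {p: conditions[i % k:] + conditions[:i % k]
            for i, p in enumerate(participants)}

print(participants_needed("between", len(conditions), 10))  # 120
print(participants_needed("within", len(conditions), 10))   # 10
```

The 12-fold difference is why between-subjects designs need many more participants. The cost of within-subjects designs is that participants may learn or tire as they move through conditions, which is one reason the presentation order is varied.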

Statistical measures

Let’s look at some factors that help determine which test to use; this course does not cover the details of the tests themselves. Before analyzing the data, there are two main tasks. The first is simply to look at the data and eliminate any outliers. The second is to save the data, as it may be needed for other methods of analysis. The type of data is what mainly determines the appropriate statistical method. Data variables are of two types:

  • Discrete variables: a variable that can take only a finite number of values. For example, the variable “font size” may have four values or levels: 12, 13, 14, and 15.
  • Continuous variables: a variable that can take any value within a range. For example, the variable “font size” can be treated as continuous, with any value from 11 to 25.

A continuous variable can be transformed into a discrete variable by dividing the range into groups. For example, “font size” can be classified as small (11-15), medium (16-20), and large (21-25).
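The font-size grouping above can be sketched as a small Python function. The name `font_size_category` is hypothetical, and the handling of values between the stated groups (such as 15.5) is an assumption: each group is taken to run up to, but not including, the start of the next group.

```python
def font_size_category(size):
    """Map a continuous font size (11 to 25) to a discrete category.

    Assumed boundaries: small is [11, 16), medium is [16, 21), and
    large is [21, 25], so a size such as 15.5 counts as small.
    """
    if not 11 <= size <= 25:
        raise ValueError(f"font size {size} is outside the range 11 to 25")
    if size < 16:
        return "small"
    if size < 21:
        return "medium"
    return "large"

print([font_size_category(s) for s in (11, 15.5, 16, 20.9, 25)])
# ['small', 'small', 'medium', 'medium', 'large']
```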

Note: The independent variable is usually discrete when one design is being tested against another.
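The two preparatory tasks described under Statistical measures, eliminating outliers and saving the data, might look like this in Python. Flagging outliers with Tukey’s fences (1.5 times the interquartile range beyond the quartiles) is an assumed rule chosen for the sketch; the text does not specify one. The data, the file name `raw_times.csv`, and the helper names are hypothetical. The raw data is saved before anything is removed, so other analyses can still use it.

```python
import csv
import os
import tempfile
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def save_raw_data(path, values):
    """Write an untouched copy of the raw measurements to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trial", "time_s"])
        for i, v in enumerate(values, start=1):
            writer.writerow([i, v])

# Hypothetical task times in seconds; 6.0 stands out from the rest.
times = [1.2, 1.4, 1.3, 1.5, 1.1, 1.3, 6.0]

path = os.path.join(tempfile.gettempdir(), "raw_times.csv")
save_raw_data(path, times)   # save first, before anything is removed
outliers = flag_outliers(times)
print(outliers)              # [6.0]
cleaned = [v for v in times if v not in outliers]
```

Flagging rather than silently deleting keeps the decision visible: an outlier may be a recording error, or a genuine observation worth investigating.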

Types of tests

Tests fall into three types, depending on the kind of data collected for the dependent variable during an experiment.

  1. Parametric tests: We use a parametric test when the data follows a normal distribution. These tests are fairly robust, producing acceptable results even when the data is only approximately normal.
  2. Non-parametric tests: When it is not clear whether the data follows a normal distribution, we use non-parametric tests. These tests do not assume that the data is normally distributed; instead, they are based on the ranks of the data values. For example, ranking the data 55, 70, 34, 67, 88 from lowest to highest gives the ranks 2, 4, 1, 3, 5.
  3. Contingency tests: Contingency tests are based on classifying the data into discrete categories, known as attributes. The number of data items in each attribute (its frequency) is counted, and these frequencies are the input to the test.
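The ranking step that non-parametric tests rely on can be sketched in Python as follows. The helper `rank` is hypothetical, and giving tied values the average of the positions they occupy is a common convention that the text does not cover.

```python
def rank(values):
    """Rank values from lowest (1) to highest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(rank([55, 70, 34, 67, 88]))  # [2.0, 4.0, 1.0, 3.0, 5.0]
print(rank([10, 20, 20, 30]))      # [1.0, 2.5, 2.5, 4.0]
```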
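A contingency test’s input, the frequency of items in each attribute, can be tallied with `collections.Counter`. In this sketch the observations (interface design crossed with task outcome) and all variable names are hypothetical. The resulting table of counts is the kind of input a chi-square test of independence takes; SciPy, for example, provides `scipy.stats.chi2_contingency` for this.

```python
from collections import Counter

# Hypothetical observations: (interface design, task outcome) per participant.
observations = [
    ("A", "completed"), ("A", "completed"), ("A", "failed"),
    ("B", "completed"), ("B", "failed"), ("B", "failed"),
    ("A", "completed"), ("B", "failed"),
]

counts = Counter(observations)
designs = sorted({d for d, _ in observations})
outcomes = sorted({o for _, o in observations})

# Contingency table: one row per design, one column per outcome.
table = [[counts[(d, o)] for o in outcomes] for d in designs]

print(outcomes)  # ['completed', 'failed']
for d, row in zip(designs, table):
    print(d, row)
# A [3, 1]
# B [1, 3]
```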