Human Performance Regression Testing

Abstract

As software systems evolve, new interface features such as keyboard shortcuts and toolbars are introduced. While it is common to regression test the new features for functional correctness, there has been less focus on systematic regression testing for usability, due to the effort and time involved in human studies. Cognitive modeling tools such as CogTool provide some help by computing predictions of user performance, but they still require manual effort to describe the user interface and tasks, limiting regression testing efforts. In recent work, we developed CogTool-Helper to reduce the effort required to generate human performance models of existing systems. We build on this work by providing task-specific test case generation and present our vision for human performance regression testing (HPRT), which generates large numbers of test cases and evaluates a range of human performance predictions for the same task. We examine the feasibility of HPRT on four tasks in LibreOffice, find several regressions, and then discuss how a project team could use this information. We also illustrate that we can increase efficiency with sampling by leveraging an inference algorithm. Samples that take approximately 50% of the runtime lose at most 10% of the performance predictions.

Experiment Settings

CPU: AMD Opteron 6128, 2.2 GHz, 32-core, 64-bit processors
Memory: 128GB/node, 4GB/core
Operating System: Linux 2.6.18
Java Runtime: Java 1.6.0_10
GUI Environment: Xvfb
CogTool Version: 1.2.2.0 (3770) 10/24/2012-01:21

Setup

Below, we provide the GUI and EFG files extracted for each version of each task (Menus Only, Menus + Keyboards, Menus + Keyboards + Toolbars). We also provide the sets of rules (in our XML format) that we used to prune the generated test cases, along with a zip file of each set of generated test cases.

The tool we use to perform the test case generation, GUITAR, can be found here: http://guitar.sourceforge.net/. The applications used for our tasks are from LibreOffice (www.libreoffice.org).

Task	Module	Version	Rules, GUI, EFG	# Test Cases
Format Text	Writer	Menus Only	Rules, GUI, EFG	3
Format Text	Writer	Menus + Keyboards	Rules, GUI, EFG	24
Format Text	Writer	Menus + Keyboards + Toolbars	Rules, GUI, EFG	81
Insert Hyperlink	Writer	Menus Only	Rules, GUI, EFG	2
Insert Hyperlink	Writer	Menus + Keyboards	Rules, GUI, EFG	8
Insert Hyperlink	Writer	Menus + Keyboards + Toolbars	Rules, GUI, EFG	18
Absolute Value	Calc	Menus Only	Rules, GUI, EFG	4
Absolute Value	Calc	Menus + Keyboards	Rules, GUI, EFG	32
Absolute Value	Calc	Menus + Keyboards + Toolbars	Rules, GUI, EFG	72
Insert Table	Impress	Menus Only	Rules, GUI, EFG	3
Insert Table	Impress	Menus + Keyboards	Rules, GUI, EFG	12
Insert Table	Impress	Menus + Keyboards + Toolbars	Rules, GUI, EFG	36

Results for RQ1

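Our results for Research Question 1 are below. Versions are abbreviated as M (Menus Only), MK (Menus + Keyboards), and MKT (Menus + Keyboards + Toolbars). Each CogTool (.cgt) project file has been provided. To view the CogTool files, download CogTool (http://cogtool.hcii.cs.cmu.edu/) and open them as a project.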

Task	Version	# Test Cases	Mean Time (s)	Min Time (s)	Max Time (s)	SD (s)	Project
Format Text	M	3	13.8	13.7	13.8	0.1	.cgt
Format Text	MK	24	13.2	12.3	14.1	0.6	.cgt
Format Text	MKT	81	11.8	8.6	14.1	1.7	.cgt
Insert Hyperlink	M	2	20.5	19.5	21.6	1.5	.cgt
Insert Hyperlink	MK	8	20.1	18.3	21.6	1.4	.cgt
Insert Hyperlink	MKT	18	19.8	17.6	21.6	1.3	.cgt
Absolute Value	M	4	18.1	17.9	18.3	0.1	.cgt
Absolute Value	MK	32	18.3	17.7	18.8	0.2	.cgt
Absolute Value	MKT	72	17.8	14.1	18.9	1.6	.cgt
Insert Table	M	3	12.8	12.7	12.9	0.1	.cgt
Insert Table	MK	12	12.7	12.3	13.3	0.3	.cgt
Insert Table	MKT	36	12.3	11.3	13.3	0.4	.cgt
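
The summary columns in the table above can be recomputed from the individual predictions stored in each project file. The following is a minimal sketch, assuming the predicted times for one version have already been exported to a plain list of seconds; the function name `summarize` is hypothetical and is not part of CogTool or CogTool-Helper. It uses the sample standard deviation (n - 1), an assumption that is consistent with the two-case Insert Hyperlink (M) row, where the rounded minimum and maximum of 19.5 and 21.6 give an SD of about 1.5.

```python
from statistics import mean, stdev


def summarize(times):
    """Return (count, mean, min, max, sample SD) for predicted task times in seconds."""
    # A single prediction has no spread; stdev() would raise for fewer than 2 values.
    sd = stdev(times) if len(times) > 1 else 0.0
    return len(times), mean(times), min(times), max(times), sd


# Insert Hyperlink, Menus Only: the two predictions as rounded in the table.
count, avg, lo, hi, sd = summarize([19.5, 21.6])
print(f"n={count} mean={avg:.2f} min={lo} max={hi} sd={sd:.2f}")
```

Because the inputs here are the rounded table values rather than the raw predictions, the recomputed mean may differ from the published one in the last digit.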

Results for RQ2

The results shown below are across 5 randomly chosen runs for each sample size. The samples are taken from the full generated set of test cases for that task. The set of test cases is provided as well as the resulting project file for each run. To view the CogTool project files (.cgt), download CogTool (http://cogtool.hcii.cs.cmu.edu/) and open them as a project. A zip file is also given with all of the generated test cases for that run.
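
The sampling step can be sketched as follows. This is an illustration under stated assumptions, not the script used for the experiments: the names `sample_size` and `draw_samples` are hypothetical, and each run is assumed to be an independent random sample drawn without replacement. Rounding the percentage to the nearest integer with halves rounded up reproduces the sample sizes listed in the table below (for example, 50% of 81 gives 41 and 25% of 18 gives 5).

```python
import random


def sample_size(total, percent):
    """Round percent% of total to the nearest integer, with halves rounded up."""
    return (percent * total + 50) // 100


def draw_samples(test_cases, percent, runs=5, seed=None):
    """Draw `runs` independent random samples (without replacement) of the test cases."""
    rng = random.Random(seed)
    k = sample_size(len(test_cases), percent)
    return [rng.sample(test_cases, k) for _ in range(runs)]


# Hypothetical identifiers standing in for the 81 Format Text test cases.
all_cases = [f"t{i}" for i in range(81)]
for run in draw_samples(all_cases, 50, seed=0):
    print(len(run), run[:3])
```

Fixing the seed makes a set of runs reproducible, which is convenient when the same samples need to be re-evaluated later.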

Test cases for the 'All' version of each task can be found in the Setup table above (Menus + Keyboards + Toolbars version), and the corresponding results can be found in the RQ1 results table (MKT version). The Event-Flow Graphs (EFGs) for each set of test cases can also be found in the Setup table (Menus + Keyboards + Toolbars version).

	Design Construction				CogTool Analysis
Task (Sample %/Size)	Run Time (s)	% Reduction	No. Methods	No. Inferred	Mean (s)	Min (s)	Max (s)	Test Cases	Project
Format Text (5%/4)	445.7	93.8	12.8	8.8	12.2	10.2	13.7	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Format Text (10%/8)	800.2	88.9	41.4	33.4	11.9	8.8	14.0	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Format Text (25%/20)	1869.6	74.1	76.2	56.2	11.8	8.6	14.1	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Format Text (50%/41)	3668.3	49.2	81.0	40.0	11.8	8.6	14.1	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Format Text (All)	7215.9	-	81	-	11.8	8.6	14.1	--	--
Insert Hyperlink (5%/1)	187.5	89.8	1.0	0.0	19.6	19.6	19.6	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Hyperlink (10%/2)	293.7	84.0	3.6	1.6	20.3	19.5	21.1	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Hyperlink (25%/5)	579.1	68.5	15.6	10.6	19.8	17.6	21.6	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Hyperlink (50%/9)	967.5	47.3	18.0	9.0	19.8	17.6	21.6	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Hyperlink (All)	1836.8	-	18	-	19.8	17.6	21.6	--	--
Absolute Value (5%/4)	877.3	93.7	14.8	10.8	17.6	15.2	18.8	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Absolute Value (10%/7)	1423.1	89.7	25.8	18.8	16.9	14.1	18.7	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Absolute Value (25%/18)	3561.4	74.3	56.4	38.4	17.0	14.1	18.9	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Absolute Value (50%/36)	6974.6	49.7	69.6	33.6	17.1	14.1	18.9	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Absolute Value (All)	13864.9	-	72	-	17.1	14.1	18.9	--	--
Insert Table (5%/2)	300.9	92.2	3.6	1.6	12.3	11.8	12.7	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Table (10%/4)	519.7	86.6	6.4	2.4	12.3	11.8	12.8	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Table (25%/9)	1036.3	73.2	19.4	10.4	12.3	11.4	13.1	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Table (50%/18)	2069.9	46.5	32.8	14.8	12.4	11.4	13.3	Run 1 Run 2 Run 3 Run 4 Run 5	R1 .cgt R2 .cgt R3 .cgt R4 .cgt R5 .cgt
Insert Table (All)	3867.2	-	36	-	12.3	11.3	13.3	--	--
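
Two of the derived columns above can be checked with simple arithmetic. The sketch below assumes that "% Red" is the percentage reduction in run time relative to the 'All' row of the same task, and that "No. Inferred" is the number of methods in the resulting design minus the sample size. Both readings are assumptions, but they agree with the figures in the table; the function names are hypothetical.

```python
def percent_reduction(sample_runtime, full_runtime):
    """Percentage of run time saved by a sample relative to the full test set."""
    return 100.0 * (1.0 - sample_runtime / full_runtime)


def inferred_methods(total_methods, sample_size):
    """Methods present in the design that were not directly executed as test cases."""
    return total_methods - sample_size


# Format Text, 5% sample (4 test cases) against the 'All' run time.
print(round(percent_reduction(445.7, 7215.9), 1))
print(round(inferred_methods(12.8, 4), 1))
```

For the Format Text 5% row, these calls give values matching the 93.8 and 8.8 reported in the table.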

Acknowledgments

We thank Peter Santhanam (IBM Research) for pointing out the connection between usability and functional GUI testing, and Atif Memon (University of Maryland) for providing us with the newest releases of GUITAR and technical support. This work is supported in part by IBM, by the National Science Foundation through awards CCF-0747009, CNS-0855139, and CNS-1205472, and by the Air Force Office of Scientific Research through award FA9550-10-1-0406. The views and conclusions in this paper are those of the authors and do not necessarily reflect the position or policy of IBM, NSF, or AFOSR.