Rigorous Evaluation R.I.T. Analysis and Reporting. Structure is from A Practical Guide to Usability Testing by J. Dumas, J. Redish

Similar documents
Unit 2. Describing Data: Numerical

Chapter 3. Measuring data

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Describing distributions with numbers

Describing distributions with numbers

Statistics for Managers using Microsoft Excel 6 th Edition

Descriptive Data Summarization

Business Statistics. Lecture 10: Course Review

Chapter 1 - Lecture 3 Measures of Location

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

MALLOY PSYCH 3000 MEAN & VARIANCE PAGE 1 STATISTICS MEASURES OF CENTRAL TENDENCY. In an experiment, these are applied to the dependent variable (DV)

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

BNG 495 Capstone Design. Descriptive Statistics

Descriptive Univariate Statistics and Bivariate Correlation

After completing this chapter, you should be able to:

Summarizing Measured Data

CIVL 7012/8012. Collection and Analysis of Information

AP Statistics Cumulative AP Exam Study Guide

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Review of Statistics 101

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Frequency Distribution Cross-Tabulation

Stat 101 Exam 1 Important Formulas and Concepts 1

THE PEARSON CORRELATION COEFFICIENT

SUMMARIZING MEASURED DATA. Gaia Maselli

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

1. Exploratory Data Analysis

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Descriptive Statistics-I. Dr Mahmoud Alhussami

Chapter 2: Tools for Exploring Univariate Data

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Nicole Dalzell. July 2, 2014

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

Midrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Introduction to Statistical Data Analysis III

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Practice Problems Section Problems

Instrumentation (cont.) Statistics vs. Parameters. Descriptive Statistics. Types of Numerical Data

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Tests about a population mean

Chapter 1 Statistical Inference

Statistics I Chapter 2: Univariate data analysis

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

2011 Pearson Education, Inc

Chapter 3. Data Description

Lecture 22/Chapter 19 Part 4. Statistical Inference Ch. 19 Diversity of Sample Proportions

Statistics I Chapter 2: Univariate data analysis

MgtOp 215 Chapter 3 Dr. Ahn

Basic Concepts of Inference

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Glossary for the Triola Statistics Series

Interactietechnologie

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

Stat 101 L: Laboratory 5

Biostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015

Chapter 3 Statistics for Describing, Exploring, and Comparing Data. Section 3-1: Overview. 3-2 Measures of Center. Definition. Key Concept.

Statistics and parameters

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

Week 1: Intro to R and EDA

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

1.3: Describing Quantitative Data with Numbers

Sampling Distributions

Chapter 2. Mean and Standard Deviation

Describing Distributions With Numbers Chapter 12

Statistical Inference. Why Use Statistical Inference. Point Estimates. Point Estimates. Greg C Elvers

Chapter 1. Looking at Data

Wilcoxon Test and Calculating Sample Sizes

Chapter 8. Linear Regression. Copyright 2010 Pearson Education, Inc.

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

DROP WEIGHT TEST WORK

Chapter 4.notebook. August 30, 2017

Chapter 1: Introduction. Material from Devore s book (Ed 8), and Cengagebrain.com

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Units. Exploratory Data Analysis. Variables. Student Data

Inferences About the Difference Between Two Means

Introduction to Statistics

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Lecture 2 and Lecture 3

1 Introduction to Minitab

- measures the center of our distribution. In the case of a sample, it s given by: y i. y = where n = sample size.

16.400/453J Human Factors Engineering. Design of Experiments II

Stat 427/527: Advanced Data Analysis I

Chapter 1: Exploring Data

Notation Measures of Location Measures of Dispersion Standardization Proportions for Categorical Variables Measures of Association Outliers

is your scource from Sartorius at discount prices. Manual of Weighing Applications Part 2 Counting

Transcription:

Rigorous Evaluation Analysis and Reporting Structure is from A Practical Guide to Usability Testing by J. Dumas, J. Redish S. Ludi/R. Kuehl p. 1

Summarize and Analyze Test Data Qualitative data - comments, observations, test logs, surveys, Group into meaningful categories (+ or for a particular task/screen) Quantitative data - times, error rates, Tabulate survey multiple choice questions Use statistical analysis when appropriate User Role Usability Measure Measuring Instrument Metric Baseline Target Measured S. Ludi/R. Kuehl p. 2

Look for Data Trends/ Surprises Examine the quantitative data Trends or patterns in task completion, error rates, etc. Identify extremes, outliers Outliers - what can they tell us, ignore at your peril Non-usability anomaly such as technical problem? Difficulties unique to one participant? Unexpected usage patterns? Correlate with qualitative data such as written comments why? If appropriate compare across program versions (A/B testing), different user groups Identify critical instances (notable UX impact) S. Ludi/R. Kuehl p. 3

Examining the Data for Problems Have you achieved the usability goals learnable, memorable, efficient, understandable, satisfying? Unanticipated usability problems? Usability concerns that are not addressed in the design Have the quantitative criteria that you have set been met or exceeded? Was the expected emotional impact observed? S. Ludi/R. Kuehl p. 4

Task and Error Analysis What tasks did users have the most problems with (usability goals not met)? Conduct error analysis Categorize errors/task - requirement or design defect (or bug) % of participants performing successfully within the benchmark time % of participants performing successfully regardless of time (with or without assistance) If low then BIG problems S. Ludi/R. Kuehl p. 5

Prioritize Problems Criticality = Severity + Probability Severity 4: Unusable not able/want to use that part of product due to design/implementation 3: Severe severely limited in ability to use product (hard to workaround) 2: Moderate can use product in most cases, with moderate workaround 1: Irritant intermittent issue with easy workaround; cosmetic Factor in scope local to a task (e.g., on screen) versus global to the application (e.g., main menu) S. Ludi/R. Kuehl p. 6

Prioritize Problems (cont) Probability = frequency * scale Frequency (% of time used) 4: 90%+, 3: 51-89%, 2: 11-50%, 1: 10% or less Between 0 and 1 Scale (% of users) % of target population Between 0 and 1 When done sort by severity (priority) S. Ludi/R. Kuehl p. 7

Errors in Testing Sample is not big enough The sample is biased You have failed to notice and compensate for factors that can bias the results Sloppy measurement of data. Outliers were left in when they should have been removed Is an outlier a fluke or a sign of something more serious in the context of a larger data set? S. Ludi/R. Kuehl p. 8

Statistical Analysis Summarize quantitative data to help discover patterns of performance and preference, and detect usability problems Descriptive and inferential techniques S. Ludi/R. Kuehl p. 9

Describe the properties of a specific data set Measures of central tendency (single variable) Frequency distribution (e.g., of errors) Mean (average), median (middle value), mode (most frequent value in a set) Measures of spread (single variable) Amount of variance from the mean, standard deviation Relationships between pairs of variables Scatterplot Correlation Descriptive Statistics Sufficient to make meaningful recommendations for most tests S. Ludi/R. Kuehl p. 10

Using Descriptive Statistics to Summarize Performance Data E.g., Task Completion Times Mean time to complete rough estimate of group as a whole Compare with original benchmark: is it skewed above/below? TRIMMEAN - trims the top and bottom 10% before mean calculation to exclude outliers (Excel function) Median time to complete use if data very skewed Range (largest value smallest value) spread of data If small spread then mean is representative of the group A good measure S. Ludi/R. Kuehl p. 11

Summarizing Performance Data (Cont d) Interquartile range (IQR) another measure of statistical spread Find the three data points (quartiles) that divide the data set into four equal parts, where each part has one quarter of the data Difference between the upper (Q 3 ) and lower (Q 1 ) quartile points is the IQR IQR = Q 3 - Q 1 ( middle fifty ) Test for normal distribution actual and calculated values of Q1 and Q3 are equal if normal Find outliers - below Q 1-1.5(IQR) or above Q 3 + 1.5(IQR) S. Ludi/R. Kuehl p. 12

Summarizing Performance Data (Cont d) Standard Deviation (SD) is the square root of the variance How much variation or "dispersion" is there from the average (mean or expected value) in a normal distribution Sample SD "Bessel's Correction" E.g., Standard deviation of completion times If small, then performance is similar If large, then more analysis is needed A better measure S. Ludi/R. Kuehl p. 13

Standard Deviation (SD) The smaller the value of SD, the sharper the curve (narrow peak and steep sides) Results grouped around the mean The larger the value of SD, the broader the curve And the larger the difference that values have from the mean Influence by outliers possible, so rerun without them as well S. Ludi/R. Kuehl p. 14

Normal Curve and Standard Deviation 1 SD= 68% 2 SD = 95% 3 SD= 99.7% S. Ludi/R. Kuehl p. 15

Sample Data Tasks % of Participants Performing within Benchmark Mean Time SD Set Temp and Pressure 83 3.21 0.67 Set flows 33 12.08 10.15 Load the sample tray Set oven temperature program 100.46.17 66 6.54 2.56 S. Ludi/R. Kuehl p. 16

Correlation Allows exploration of the strength of the linear relationship between two continuous variables You get two pieces of information; direction and strength of the relationship Direction +, as one variable increases so does the other -, as one variable increases, the other variable decreases Strength Small: ±.01 to.29 Medium: ±.3 to.49 Large: ±.5 to 1 S. Ludi/R. Kuehl p. 17

Correlation Cov XY Pearson s correlation coefficient (r) is most often used to measure correlation Sensitive to a linear relationship between two variables Pearson s correlation coefficient (r) supported in Excel ( X r N X 1 S. Ludi/R. Kuehl p. 18 )( Y covxy SD SD X Y Y ) X = X axis data point Y = Y axis data point X= mean of the X points Y = mean of the Y points 11.13 (2.33)(6.69) 11.13 15.59.71 N = number of data points SD = Standard Dev.

Scatterplots Limitations of the Pearson correlation coefficient: Its value generally does not completely characterize the relationship between variables Non-linear and non-normal distributions, outliers Need to visually examine the data points Scatterplot plot (X,Y) data point coordinates on a Cartesian diagram S. Ludi/R. Kuehl p. 19

Scatterplot Samples 1.2 1 0.8 0.6 r =.00 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 S. Ludi/R. Kuehl p. 20

Scatterplot Samples 1.2 1 0.8 0.6 r =.40 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 1.2 S. Ludi/R. Kuehl p. 21

Scatterplot Samples 1.2 1 0.8 0.6 r =.99 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 1.2 S. Ludi/R. Kuehl p. 22

Data Analysis Activity See the Excel spreadsheet Sample Usability Data File under Assignments and In-Class Activities in mycourses Follow the directions Submit to the Activity dropbox Data Analysis S. Ludi/R. Kuehl p. 23

Supplemental Information Inferential Statistics S. Ludi/R. Kuehl p. 24

Infer some property or general pattern about a larger data set by studying a statistically significant sample (large enough to obtain repeatable results) In expectation the results will generalize to the larger group Analyze data subject to random variation as a sample from a larger data set Techniques: Inferential Statistics Estimation of descriptive parameters Testing of statistical hypotheses Can be complex to use, controversial Keep Inferential Statistics Simple (KISS 2.0) S. Ludi/R. Kuehl p. 25

Statistical Hypothesis Testing A method for making decisions about statistical validity of observable results as applied to the broader population Based on data samples from experiments or observations Statistical hypothesis (1) a statement about the value of a population parameter (e.g., mean) or (2) a statement about the kind of probability distribution that a certain variable obeys S. Ludi/R. Kuehl p. 26

Establish a Null Hypothesis (H 0 ) The null hypothesis H 0 is a simple hypothesis in contradiction to what you would like to prove about a data population The alternative hypothesis H 1 is the opposite what you would like to prove For example: I believe the mean age of this class is greater than or equal to 20.7 H 0 - the mean age is < 20.7 H 1 the mean age is 20.7 S. Ludi/R. Kuehl p. 27

Does the Statistical Hypothesis Match Reality? Two types of errors in deciding whether a hypothesis is true or false Note: a decision about what you believe to be true or false about the hypothesis, not a proof Type I error is considered more serious S. Ludi/R. Kuehl p. 28

Null hypothesis (H 0 ) hypothesis stated in such a way that a Type I error occurs if you believe the hypothesis is false and it is true In any test of H 0 based on sample observations open to random variation, there is a probability of a Type I error P(Type I Error) = Null Hypothesis Called the significance level Essential idea - limit, to the small value of, the likelihood of incorrectly reaching the decision to reject H 0 when it is true As a result of experimental error or randomness S. Ludi/R. Kuehl p. 29

How It Works Establish H 0 (and H 1 ) Establish a relevant test statistic and distribution for the sample (e.g., mean, normal distribution) Establish the maximum acceptable probability of a Type I error - the significance level (0.05) Describe an experiment in terms of Set of possible values for the test statistic Distribute the test statistic into values for which H 0 is rejected (critical region) or not Threshold probability of the critical region is Run the experiment to collect data and compute the test statistic p If p > reject H 0 S. Ludi/R. Kuehl p. 30

I believe the mean age of this class is 20.7 Establish H 0 The mean age in this class is less than 20.7 years Establish a relevant test statistic and distribution for the sample Mean, assume normal distribution from 17 to 26 of all undergraduate SE students Establish the significance level 0.05 by convention Simple Example Distribute the test statistic into values for which H 0 is rejected (critical region) Let s say 19 and above Run the test with a sample size of 10, compute the mean and the probability p of that mean value occurring from a sample size of 10 in the general population If p>, reject H 0 S. Ludi/R. Kuehl p. 31