Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience

Similar documents

HodgeRank on Random Graphs

Active Sampling for Subjective Image Quality Assessment

Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems

Two-Sample Inferential Statistics

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Linear regression methods

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

CSE 546 Final Exam, Autumn 2013

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

Linear classifiers: Overfitting and regularization

Recommender systems, matrix factorization, variable selection and social graph data

Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services

Graph Helmholtzian and Rank Learning

Binary Principal Component Analysis in the Netflix Collaborative Filtering Task

CS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine

Complex Social System, Elections. Introduction to Network Analysis 1

CSC 411: Lecture 03: Linear Classification

CPSC 340: Machine Learning and Data Mining. Stochastic Gradient Fall 2017

When Dictionary Learning Meets Classification

Effects of Outliers and Multicollinearity on Some Estimators of Linear Regression Model

Large-Margin Thresholded Ensembles for Ordinal Regression

SAT, NP, NP-Completeness

FINAL: CS 6375 (Machine Learning) Fall 2014

Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons

Ad Placement Strategies

Large-scale Collaborative Ranking in Near-Linear Time

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

Andriy Mnih and Ruslan Salakhutdinov

8.1 Concentration inequality for Gaussian random matrix (cont'd)

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Introduction to Statistics for Traffic Crash Reconstruction

Infinite Ensemble Learning with Support Vector Machinery

10-701/ Machine Learning - Midterm Exam, Fall 2010

arxiv: v1 [math.co] 13 Dec 2014

On Markov chain Monte Carlo methods for tall data

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

Appendix: Modeling Approach

A Wisdom of the Crowd Approach to Forecasting

ECE521 week 3: 23/26 January 2017

Statistical Ranking Problem

Robust Principal Component Analysis

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

CS246 Final Exam, Winter 2011

Maximum Margin Matrix Factorization for Collaborative Ranking

The Perceptron Algorithm

1 [15 points] Frequent Itemsets Generation With Map-Reduce

CS570 Data Mining. Anomaly Detection. Li Xiong. Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber.

arxiv: v1 [stat.ml] 16 Nov 2017

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology

CS 188: Artificial Intelligence. Outline

Recommendation Systems

Structural Learning and Integrative Decomposition of Multi-View Data

CS246 Final Exam. March 16, :30AM - 11:30AM

CS Homework 3. October 15, 2009

Some graph optimization problems in data mining. P. Van Dooren, CESAME, Univ. catholique Louvain based on work in collaboration with the group on

arxiv: v1 [stat.me] 30 Dec 2017

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Lecture 9: September 28

Introduction to Logistic Regression

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Spatial Decision Tree: A Novel Approach to Land-Cover Classification

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Numerical Methods I Solving Nonlinear Equations

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as:

Deep Learning & Artificial Intelligence WS 2018/2019

Logistic Regression with the Nonnegative Garrote

Voting (Ensemble Methods)

Proximity-Based Anomaly Detection using Sparse Structure Learning

Bias-free Sparse Regression with Guaranteed Consistency

Stat 705: Completely randomized and complete block designs

Method 1: Geometric Error Optimization

Generic Text Summarization

CSE 546 Midterm Exam, Fall 2014

Gaussian and Linear Discriminant Analysis; Multiclass Classification

Extending a two-variable mean to a multi-variable mean

Dimensionality Reduction

Machine Learning - MT Clustering

Shape Outlier Detection Using Pose Preserving Dynamic Shape Models

Machine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Information Retrieval

Learning to Query, Reason, and Answer Questions On Ambiguous Texts

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Data Mining. CS57300 Purdue University. Jan 11, Bruno Ribeiro

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Lecturer: Dr. Adote Anum, Dept. of Psychology Contact Information:

Warm up: Fix any a, b, c > 0. What is the x ∈ R that minimizes ax² + bx + c?

Motivation Subgradient Method Stochastic Subgradient Method. Convex Optimization. Lecture 15 - Gradient Descent in Machine Learning

Lesson 15: Solution Sets of Two or More Equations (or Inequalities) Joined by And or Or

Lecture 11. Linear Soft Margin Support Vector Machines

Interpreting Deep Classifiers

CS5314 Randomized Algorithms. Lecture 15: Balls, Bins, Random Graphs (Hashing)

Outline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual

A Probabilistic Model for Canonicalizing Named Entity Mentions. Dani Yogatama Yanchuan Sim Noah A. Smith

Statistics 202: Data Mining. © Jonathan Taylor. Week 2. Based in part on slides from the textbook and slides of Susan Holmes. October 3.


Fast Adaptive Algorithm for Robust Evaluation of Quality of Experience
Qianqian Xu, Ming Yan, Yuan Yao
October 2014

Outline
1. Motivation: Mean Opinion Score vs. Paired Comparisons; Crowdsourcing Ranking on the Internet
2. Outlier Detection: HodgeRank on Graphs; LASSO for Outlier Detection; Adaptive Least Trimmed Squares
3. Numerical Experiments: Simulated Study; Real-world Data
4. Conclusions

Subjective Image Quality Assessment
Figure: a reference image shown alongside its fast-fading and white-noise distorted versions.

Mean Opinion Score
Mean opinion score (MOS) is widely used for the evaluation of images and videos, as well as books, movies, etc., but:
- the concept of scale cannot be concretely defined;
- the scale is interpreted ambiguously across users;
- it is difficult to verify whether a participant gives false ratings, either intentionally or carelessly.

Paired Comparisons
- Simpler design with a binary choice;
- Robust decisions (invariant up to a monotone transform on personal scaling functions).
Which one looks better to you (fast-fading vs. white noise)?

Crowdsourcing Ranking on the Internet
Start from a movie: The Social Network.

Crowdsourcing
Definition. The term crowdsourcing is a portmanteau of "crowd" and "outsourcing". It is the act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a "crowd") through an open call.
- random participants from the Internet;
- random item pairs in comparison;
so the data is incomplete, imbalanced (heterogeneously distributed), dynamic, and contains outliers.

Automatic Outlier Detection in QoE Evaluation
iterative Least Trimmed Squares (ilts) is proposed for outlier detection in QoE evaluation.
- ilts is fast: up to 190 times faster than LASSO;
- ilts is adaptive: it purifies the data automatically, without prior knowledge of the number of outliers.

Least Squares
Assume
$$Y_{ij}^\alpha = \mathrm{sign}(s_i - s_j + Z_{ij}^\alpha),$$
where $\mathrm{sign}(\cdot) = \pm 1$ measures the sign of the value, $s = \{s_1, \dots, s_n\} \in \mathbb{R}^n$ is the true scaling score on the $n$ items, and $Z_{ij}^\alpha$ is noise. For independent and identically distributed noise $Z_{ij}^\alpha$ with zero mean, the Gauss-Markov theorem states that the Least Squares (LS) rank is the linear unbiased estimator with minimal variance:
$$\min_{s \in \mathbb{R}^n} \ \frac{1}{2} \sum_{i,j,\alpha} (s_i - s_j - Y_{ij}^\alpha)^2.$$
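To make the LS step concrete, here is a minimal numpy sketch (our code, not the authors'), assuming each comparison is stored as a triple (i, j, y) with y = +1 if item i was preferred over item j in that comparison and y = -1 otherwise:

```python
import numpy as np

def ls_scores(n, comps):
    """Solve min_s 0.5 * sum (s_i - s_j - y)^2 over all comparisons.

    comps: iterable of (i, j, y) triples, y in {+1, -1}.
    The normal equations are L s = b with L the Laplacian of the
    comparison graph; L is singular (scores are defined only up to a
    common shift), so we take a least-squares solution and center it.
    """
    L = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, y in comps:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
        b[i] += y; b[j] -= y
    s = np.linalg.lstsq(L, b, rcond=None)[0]
    return s - s.mean()
```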

Sparse Outliers
If $Z_{ij}^\alpha = E_{ij}^\alpha + N_{ij}^\alpha$, where the perturbation $E_{ij}^\alpha$ consists of sparse outliers, LS becomes unstable and may give bad estimates. Outliers may be due to:
- different test conditions;
- human errors;
- abnormal variations in context.
How can we detect and remove them to achieve an estimate that is robust against sparse outliers?

LASSO
$$\min_{s \in \mathbb{R}^n,\, E} \ \frac{1}{2} \sum_{i,j,\alpha} (s_i - s_j - Y_{ij}^\alpha + E_{ij}^\alpha)^2 + \lambda \|E\|_1.$$
Algorithm 1: LASSO for Outlier Detection and Global Rating Estimation
(1) Find the solution path of the LASSO problem;
(2) Tune the parameter: determine an optimal $\lambda$ by cross-validation with random projections, or by inspecting the path directly;
(3) Rule out the outliers and perform least squares to get an unbiased score estimate.
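For intuition, here is a minimal block-coordinate sketch of this objective (our code; the authors compute the full solution path instead): given $E$, the $s$-step is least squares on the corrected labels $Y - E$; given $s$, the $E$-step is a closed-form soft threshold.

```python
import numpy as np

def lasso_outliers(n, comps, lam, iters=50):
    """Block-coordinate descent on
        min_{s,E} 0.5 * sum (s_i - s_j - y + E)^2 + lam * ||E||_1.
    comps: (i, j, y) triples with y in {+1, -1}.
    """
    pairs = [(i, j) for i, j, _ in comps]
    y = np.array([c[2] for c in comps], dtype=float)
    E = np.zeros(len(comps))
    s = np.zeros(n)
    for _ in range(iters):
        # s-step: least squares on labels corrected by the outlier estimate
        L = np.zeros((n, n)); b = np.zeros(n)
        for (i, j), yk in zip(pairs, y - E):
            L[i, i] += 1.0; L[j, j] += 1.0
            L[i, j] -= 1.0; L[j, i] -= 1.0
            b[i] += yk; b[j] -= yk
        s = np.linalg.lstsq(L, b, rcond=None)[0]
        # E-step: per comparison, min_E 0.5*(r + E)^2 + lam*|E| is a soft threshold
        r = np.array([s[i] - s[j] for i, j in pairs]) - y
        E = np.sign(-r) * np.maximum(np.abs(r) - lam, 0.0)
    return s - s.mean(), E  # comparisons with nonzero E are the suspected outliers
```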

Parameter Tuning
Remark: cross-validation may fail when the outliers become dense and small in magnitude. In that case we can inspect the solution path directly. In an example solution path on simulated data, the paths corresponding to outliers (plotted in red) mostly lie outside the majority of the paths.

Drawbacks of LASSO
- LASSO is expensive;
- LASSO needs prior knowledge (i.e., the number of outliers in the dataset).
This calls for an efficient and automatic method for outlier detection!

Least Trimmed Squares
Use $\ell_0$ in a constraint instead of $\ell_1$ in the objective:
$$\min_{s \in \mathbb{R}^n,\, E} \ \frac{1}{2} \sum_{i,j,\alpha} (s_i - s_j - Y_{ij}^\alpha + E_{ij}^\alpha)^2 \quad \text{subject to} \quad \|E\|_0 \le K.$$
It is equivalent to
$$\min_{s \in \mathbb{R}^n,\, \Lambda} \ \sum_{i,j,\alpha} \Lambda_{ij}^\alpha (s_i - s_j - Y_{ij}^\alpha)^2 \quad \text{subject to} \quad \sum_{i,j,\alpha} (1 - \Lambda_{ij}^\alpha) \le K,\ \Lambda_{ij}^\alpha \in \{0, 1\}. \qquad (1)$$
Here $\Lambda_{ij}^\alpha$ indicates the outliers:
$$\Lambda_{ij}^\alpha = \begin{cases} 0, & \text{if } Y_{ij}^\alpha \text{ is an outlier},\\ 1, & \text{otherwise}. \end{cases} \qquad (2)$$

Alternating Minimization
1) Fix $\Lambda$ and update $s$: solve a least squares problem using only the comparisons with $\Lambda_{ij}^\alpha = 1$.
2) Fix $s$ and update $\Lambda$: solve
$$\min_{\Lambda} \ \sum_{i,j,\alpha} \Lambda_{ij}^\alpha (s_i - s_j - Y_{ij}^\alpha)^2 \quad \text{subject to} \quad \sum_{i,j,\alpha} (1 - \Lambda_{ij}^\alpha) \le K,\ \Lambda_{ij}^\alpha \in \{0, 1\}. \qquad (3)$$
This amounts to discarding the $K$ elements with the largest values in the set $\{(s_i - s_j - Y_{ij}^\alpha)^2\}$: we can choose any $\Lambda$ such that $\sum_{i,j,\alpha} (1 - \Lambda_{ij}^\alpha) \le K$, $\Lambda_{ij}^\alpha \in \{0, 1\}$, and
$$\min_{i,j,\alpha:\ \Lambda_{ij}^\alpha = 0} (s_i - s_j - Y_{ij}^\alpha)^2 \ \ge\ \max_{i,j,\alpha:\ \Lambda_{ij}^\alpha = 1} (s_i - s_j - Y_{ij}^\alpha)^2. \qquad (4)$$
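Under the (i, j, y) representation used above, the $\Lambda$-step reduces to one argsort over squared residuals; a minimal sketch (our code, not the authors'):

```python
import numpy as np

def lambda_step(s, comps, K):
    """Solve (3): keep a comparison iff its squared residual is not among
    the K largest, which satisfies condition (4) by construction."""
    r2 = np.array([(s[i] - s[j] - y) ** 2 for i, j, y in comps])
    mask = np.ones(len(comps), dtype=bool)   # mask[k] plays the role of Lambda
    if K > 0:
        mask[np.argsort(r2)[-K:]] = False    # discard the K worst-fitting comparisons
    return mask
```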

iterative Least Trimmed Squares
Algorithm 2: iterative Least Trimmed Squares with known $K$
Input: $\{Y_{ij}^\alpha\}$, $K \ge 0$.
Initialization: $k = 0$, $\Lambda_{ij}^\alpha = 1$.
for $k = 1, 2, \dots$ do
  Update $s^k$ by solving the least squares problem using only the comparisons with $\Lambda_{ij}^\alpha = 1$.
  Update $\Lambda^k$ from (4), choosing one different from all previous ones.
end for
return $s$.
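Putting the two steps together gives a compact sketch of Algorithm 2 (again our code, with the trimming logic repeated inline to stay self-contained); the "one different from previous ones" rule is implemented by stopping when a trimming mask recurs:

```python
import numpy as np

def ilts_fixed_k(n, comps, K, max_iter=100):
    """iterative Least Trimmed Squares with a known trimming budget K.
    comps: (i, j, y) triples with y in {+1, -1}."""
    pairs = [(i, j) for i, j, _ in comps]
    y = np.array([c[2] for c in comps], dtype=float)
    mask = np.ones(len(comps), dtype=bool)
    seen = {mask.tobytes()}
    s = np.zeros(n)
    for _ in range(max_iter):
        # s-step: least squares on the comparisons currently kept
        L = np.zeros((n, n)); b = np.zeros(n)
        for keep, (i, j), yk in zip(mask, pairs, y):
            if keep:
                L[i, i] += 1.0; L[j, j] += 1.0
                L[i, j] -= 1.0; L[j, i] -= 1.0
                b[i] += yk; b[j] -= yk
        s = np.linalg.lstsq(L, b, rcond=None)[0]
        # Lambda-step: trim the K largest squared residuals, as in (4)
        r2 = (np.array([s[i] - s[j] for i, j in pairs]) - y) ** 2
        mask = np.ones(len(comps), dtype=bool)
        if K > 0:
            mask[np.argsort(r2)[-K:]] = False
        if mask.tobytes() in seen:   # same trimming set as before: converged
            break
        seen.add(mask.tobytes())
    return s - s.mean(), mask
```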

Adaptive Least Trimmed Squares
Algorithm 3: Adaptive Least Trimmed Squares
Input: $\{Y_{ij}^\alpha\}$, $M_{iter} > 0$, $\beta_1 < 1$, $\beta_2 > 1$.
Initialization: $k = 0$, $\Lambda_{ij}^\alpha(0) = 1$, $K_0 = 0$.
for $k = 1, \dots, M_{iter}$ do
  Update $s^k$ with least squares using only the comparisons with $\Lambda_{ij}^\alpha(k-1) = 1$.
  Let $\tilde{K}_k$ be the total number of comparisons with wrong directions, i.e., where $Y_{ij}^\alpha$ has a different sign from $s_i^k - s_j^k$.
  $$K_k = \begin{cases} \beta_1 \tilde{K}_k, & \text{if } k = 1;\\ \min(\beta_2 K_{k-1}, \tilde{K}_k), & \text{otherwise}. \end{cases} \qquad (5)$$
  If $K_k = \tilde{K}_k$, break.
  Update $\Lambda(k)$ using (4) with $K = K_k$.
end for
Find $\hat{s}$ with least squares using only the samples with $\Lambda_{ij}^\alpha(k) = 1$.
return $\hat{s}$, $\hat{K} = K_k$.
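A sketch of Algorithm 3 in the same representation (our code). The integer rounding of the budget updates, floor for the $\beta_1$ step and ceiling plus a floor-of-one guard for the $\beta_2$ step so that small budgets can still grow, is our assumption, not spelled out on the slide:

```python
import math
import numpy as np

def adaptive_lts(n, comps, beta1=0.75, beta2=1.03, max_iter=100):
    """Adaptive LTS: grow the trimming budget K until it matches K_tilde,
    the count of comparisons whose direction disagrees with the current
    score estimate; then refit on the purified data."""
    pairs = [(i, j) for i, j, _ in comps]
    y = np.array([c[2] for c in comps], dtype=float)
    m = len(comps)

    def ls(mask):  # least squares on the kept comparisons (Laplacian system)
        L = np.zeros((n, n)); b = np.zeros(n)
        for keep, (i, j), yk in zip(mask, pairs, y):
            if keep:
                L[i, i] += 1.0; L[j, j] += 1.0
                L[i, j] -= 1.0; L[j, i] -= 1.0
                b[i] += yk; b[j] -= yk
        return np.linalg.lstsq(L, b, rcond=None)[0]

    mask = np.ones(m, dtype=bool)
    K = 0
    for k in range(1, max_iter + 1):
        s = ls(mask)
        d = np.array([s[i] - s[j] for i, j in pairs])
        K_tilde = int(np.sum(d * y < 0))      # comparisons pointing the wrong way
        if k == 1:
            K_new = int(beta1 * K_tilde)      # deliberate underestimate
        else:
            # slow geometric growth, capped at K_tilde; max(..., 1) is our guard
            K_new = min(max(math.ceil(beta2 * K), 1), K_tilde)
        if K_new == K_tilde:                  # budget caught up with the evidence
            K = K_new
            break
        K = K_new
        mask = np.ones(m, dtype=bool)
        if K > 0:
            mask[np.argsort((d - y) ** 2)[-K:]] = False  # trim K worst residuals
    s_hat = ls(mask)                          # final LS on the purified data
    return s_hat - s_hat.mean(), mask, K
```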

Remarks
- $\beta_1 < 1$ ensures that the first estimate is an underestimate, e.g. $\beta_1 = 0.75$.
- $\beta_2 > 1$ is kept small so that the estimate does not overshoot too much, e.g. $\beta_2 = 1.03$.
- One extra step compares every pair of successively ranked items and corrects the detection: if item $i$ is ranked above item $j$ but the number of people choosing $i$ over $j$ is less than the number choosing $j$ over $i$, we can remove the comparisons choosing $j$ over $i$ and keep those choosing $i$ over $j$.

Data Description
- Create a random total order on $n$ candidates as the ground-truth order.
- Add paired comparison edges $(i, j)$ randomly, with preference directions following the ground-truth order.
- Randomly choose a portion of the comparison edges and reverse their preference directions.
Notation:
- SN (Sample Number): total number of paired comparisons;
- ON (Outlier Number): number of outliers;
- OP (Outlier Percentage): ON/SN.
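A minimal generator following this recipe (our code; n, SN, and OP as defined above):

```python
import numpy as np

def simulate(n=16, SN=1000, OP=0.10, seed=0):
    """Random ground-truth total order, SN random comparison edges that
    follow it, with ON = round(OP * SN) of them reversed as outliers.
    Returns (i, j, y) triples and a boolean outlier flag per edge."""
    rng = np.random.default_rng(seed)
    score = rng.permutation(n)                    # ground-truth total order
    ON = round(OP * SN)
    flipped = np.zeros(SN, dtype=bool)
    flipped[rng.choice(SN, size=ON, replace=False)] = True
    comps = []
    for k in range(SN):
        i, j = rng.choice(n, size=2, replace=False)
        y = 1.0 if score[i] > score[j] else -1.0  # direction from ground truth
        comps.append((int(i), int(j), -y if flipped[k] else y))
    return comps, flipped
```

Precision and recall in the tables below can then be computed by comparing a detector's flagged set against `flipped`.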

ilts vs. LASSO (Precision)

Precision for simulated data via ilts, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   0.997(0.022)  0.993(0.023)  0.993(0.015)  0.942(0.037)  0.670(0.078)  0.505(0.097)
SN=2000   1.000(0.000)  1.000(0.000)  0.998(0.009)  0.976(0.023)  0.751(0.067)  0.503(0.089)
SN=3000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.991(0.013)  0.811(0.060)  0.502(0.090)
SN=4000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.995(0.010)  0.829(0.059)  0.498(0.098)
SN=5000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.998(0.006)  0.847(0.052)  0.499(0.101)

Precision for simulated data via LASSO, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   0.972(0.033)  0.962(0.030)  0.958(0.025)  0.905(0.032)  0.698(0.067)  0.513(0.085)
SN=2000   0.996(0.011)  0.990(0.014)  0.984(0.016)  0.942(0.022)  0.750(0.056)  0.516(0.084)
SN=3000   0.999(0.005)  0.997(0.008)  0.992(0.012)  0.957(0.020)  0.796(0.050)  0.523(0.083)
SN=4000   0.999(0.001)  0.999(0.005)  0.996(0.009)  0.970(0.016)  0.818(0.048)  0.518(0.093)
SN=5000   0.999(0.002)  1.000(0.000)  0.998(0.006)  0.972(0.016)  0.837(0.038)  0.525(0.088)

ilts vs. LASSO (Recall)

Recall for simulated data via ilts, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   1.000(0.000)  0.994(0.015)  0.994(0.010)  0.943(0.036)  0.653(0.080)  0.438(0.093)
SN=2000   1.000(0.000)  1.000(0.000)  0.999(0.006)  0.978(0.019)  0.727(0.071)  0.456(0.087)
SN=3000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.991(0.012)  0.797(0.062)  0.464(0.089)
SN=4000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.996(0.007)  0.821(0.060)  0.466(0.098)
SN=5000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.998(0.006)  0.842(0.052)  0.470(0.100)

Recall for simulated data via LASSO, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   0.972(0.033)  0.962(0.030)  0.958(0.025)  0.905(0.032)  0.698(0.067)  0.513(0.085)
SN=2000   0.996(0.011)  0.990(0.014)  0.984(0.016)  0.942(0.022)  0.750(0.056)  0.518(0.084)
SN=3000   0.999(0.005)  0.997(0.008)  0.992(0.012)  0.957(0.020)  0.796(0.050)  0.523(0.083)
SN=4000   0.999(0.001)  0.999(0.005)  0.996(0.009)  0.970(0.016)  0.818(0.048)  0.518(0.093)
SN=5000   0.999(0.002)  1.000(0.000)  0.998(0.006)  0.972(0.016)  0.837(0.038)  0.525(0.088)

ilts vs. LASSO (F1 scores)

F1 scores for simulated data via ilts, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   0.998(0.012)  0.994(0.019)  0.994(0.012)  0.943(0.036)  0.675(0.079)  0.469(0.095)
SN=2000   1.000(0.000)  1.000(0.000)  0.999(0.007)  0.977(0.021)  0.739(0.069)  0.478(0.088)
SN=3000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.991(0.012)  0.804(0.061)  0.482(0.089)
SN=4000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.996(0.009)  0.825(0.059)  0.482(0.098)
SN=5000   1.000(0.000)  1.000(0.000)  1.000(0.000)  0.998(0.006)  0.845(0.052)  0.484(0.101)

F1 scores for simulated data via LASSO, 100 repeats, avg (sd):

          OP=5%         OP=10%        OP=15%        OP=30%        OP=45%        OP=50%
SN=1000   0.972(0.033)  0.962(0.030)  0.958(0.025)  0.905(0.032)  0.698(0.067)  0.513(0.085)
SN=2000   0.996(0.011)  0.990(0.014)  0.984(0.016)  0.942(0.022)  0.750(0.056)  0.516(0.084)
SN=3000   0.999(0.005)  0.997(0.008)  0.992(0.012)  0.957(0.020)  0.796(0.050)  0.523(0.083)
SN=4000   0.999(0.001)  0.999(0.005)  0.996(0.009)  0.970(0.016)  0.818(0.048)  0.518(0.093)
SN=5000   0.999(0.002)  1.000(0.000)  0.998(0.006)  0.972(0.016)  0.837(0.038)  0.525(0.088)

ilts is Fast!

Computing time (seconds) for 100 runs in total via ilts:

          OP=5%    OP=10%   OP=15%   OP=30%   OP=45%   OP=50%
SN=1000   4.38     3.92     3.65     3.61     3.66     3.73
SN=2000   6.54     5.93     5.62     5.33     5.13     5.15
SN=3000   8.86     8.14     7.62     6.98     7.15     7.02
SN=4000   11.01    10.32    9.67     8.87     8.78     8.81
SN=5000   13.23    12.36    12.14    11.59    10.79    10.49

Computing time (seconds) for 100 runs in total via LASSO:

          OP=5%    OP=10%   OP=15%   OP=30%   OP=45%   OP=50%
SN=1000   625.14   673.75   690.31   636.35   638.65   560.71
SN=2000   905.04   973.64   938.37   887.72   818.99   806.25
SN=3000   1116.23  1167.45  1184.89  1032.88  822.35   929.75
SN=4000   1158.67  1256.82  1305.28  1087.81  948.75   1011.45
SN=5000   1288.02  1375.14  1368.75  1104.32  1034.12  1077.93

ilts is up to about 190 times faster than LASSO!

Data Description
Figure: Left: PC-VQA dataset; right: PC-IQA dataset.
- PC-VQA (complete and balanced dataset): 38,400 paired comparisons for 10 reference videos, collected from 209 random observers.
- PC-IQA (incomplete and imbalanced dataset): 23,097 paired comparisons for 15 reference images, collected from 187 random observers.

Experimental Results

ID |  1  9 10 13  7  8 11 14 15  3 12  4 16  5  6  2
 1 |  0 22 29 30 30 29 29 29 30 28 29 32 32 31 32 31
 9 | 10  0 22 20 14 23 23 25 29 29 32 30 29 30 29 31
10 |  3 10  0 22 11 21 29 23 31 27 31 30 32 30 32 31
13 |  2 12 10  0 18 22 23 27 31 28 29 29 29 25 27 28
 7 |  2 18 21 14  0 21 14 16 28 23 31 25 19 27 26 28
 8 |  3  9 11 10 11  0 25 14 28 25 29 27 24 25 28 32
11 |  3  9  3  9 18  7  0 22 27 26 26 30 30 27 27 31
14 |  3  7  9  5 16 18 10  0 28 27 18 29 29 26 28 29
15 |  2  3  1  1  4  4  5  4  0 25 20 22 26 25 29 24
 3 |  4  3  5  4  9  7  6  5  7  0 11 15 26 24 29 28
12 |  3  0  1  3  1  3  6 14 12 21  0 16 20 24 26 26
 4 |  0  2  2  3  7  5  2  3 10 17 16  0 15 26 27 30
16 |  0  3  0  3 13  8  2  3  6  6 12 17  0 22 24 28
 5 |  1  2  2  7  5  7  5  6  7  8  8  6 10  0 26 27
 6 |  0  3  0  5  6  4  5  4  3  3  6  5  8  6  0 21
 2 |  1  1  1  4  4  0  1  3  8  4  6  2  4  5 11  0

Paired comparison matrix of reference (a) in the PC-VQA dataset. (In the original slide the outliers are color-coded: red pairs are outliers detected by both ilts and LASSO; open blue circles, by LASSO but not ilts; filled blue circles, by ilts but not LASSO.)

Different rankings after outlier removal

Different rankings for reference (a) in the PC-VQA dataset. The integer is the ranking position; the number in parentheses is the global ranking score.

Video ID   L2              LASSO           IRLS
 1          1 ( 0.7930)     1 ( 0.9123)     1 ( 0.9129)
 9          2 ( 0.5312)     2 ( 0.7537)     2 ( 0.7539)
10          3 ( 0.4805)     3 ( 0.6317)     3 ( 0.6322)
13          4 ( 0.3906)     4 ( 0.5522)     4 ( 0.5524)
 7          5 ( 0.2852)     5 ( 0.4533)     5 ( 0.4537)
 8          6 ( 0.2383)     6 ( 0.3159)     6 ( 0.3163)
11          7 ( 0.2148)     7 ( 0.2113)     7 ( 0.2120)
14          8 ( 0.1641)     8 ( 0.1099)     8 ( 0.1103)
15          9 (-0.1758)     9 (-0.1024)     9 (-0.1029)
 3         10 (-0.2227)    11 (-0.3195)    12 (-0.3999)
12         11 (-0.2500)    10 (-0.2149)    10 (-0.2158)
 4         12 (-0.2930)    12 (-0.4054)    11 (-0.3252)
16         13 (-0.3633)    13 (-0.5311)    13 (-0.5332)
 5         14 (-0.4414)    14 (-0.6573)    14 (-0.6568)
 6         15 (-0.6289)    15 (-0.8054)    15 (-0.8057)
 2         16 (-0.7227)    16 (-0.9046)    16 (-0.9042)

Experimental Results

ID |  1  8 16  2  3 11  6 12  9 14  5 13  7 10 15  4
 1 |  0 13  9 16 19 12 15 13 14 14 14 17 16 17 16 16
 8 |  6  0  8  7  8  5 13  7  7  8 19  8 15  9 12 15
16 |  4  0  0  9 11  9  8 15  3 18 16 17 12  7 21 18
 2 |  5  5  6  0  8  9 10 11  7 14 13 14 14 13 14 15
 3 |  3  4  6  7  0  6 11  9 10 16 12 15 14 14 18 13
11 |  4  6  3  5  6  0  5  3  5  6 21  5 11  7 12 18
 6 |  0  2  7  4  2  7  0 12 12  7 22 15 17 13 13 17
12 |  3  4  1  4  4  3  1  0  8 15 18 12  9  8 13 17
 9 |  1  3  3  5  1  3  1  0  0  5 18 10 14  9  7 16
14 |  0  0  1  0  0  3  7  2  1  0 14 15 10  8 17 19
 5 |  0  0  0  0  0  0  0  0  0  1  0 14 19 19 15 17
13 |  0  0  0  0  0  0  0  0  0  0  6  0  5  7 17 16
 7 |  0  0  0  0  0  0  0  0  0  0  0  5  0  8  9 18
10 |  0  0  0  0  0  0  0  0  0  0  0  2  2  0  3 11
15 |  0  0  0  0  0  0  0  0  0  0  0  0  0  5  0 11
 4 |  0  0  0  0  0  0  0  0  0  0  0  1  0  6  6  0

Paired comparison matrix of reference (c) in the PC-IQA dataset. (In the original slide the outliers are color-coded: red pairs are outliers detected by both ilts and LASSO; open blue circles, by LASSO but not ilts; filled blue circles, by ilts but not LASSO.)

Different rankings after outlier removal for reference (c) in the PC-IQA dataset

Image ID   L2              LASSO           IRLS
 1          1 ( 0.7575)     1 ( 0.9015)     1 ( 0.9022)
 8          2 ( 0.5670)     2 ( 0.7088)     2 ( 0.7129)
16          3 ( 0.5124)     3 ( 0.6472)     3 ( 0.6504)
 2          4 ( 0.4642)     4 ( 0.5242)     4 ( 0.5248)
 3          5 ( 0.4423)     5 ( 0.4119)     5 ( 0.4148)
11          6 ( 0.3277)     6 ( 0.2592)     7 ( 0.1763)
 6          7 ( 0.3128)     7 ( 0.2515)     6 ( 0.3124)
12          8 ( 0.2423)     8 ( 0.1209)     8 ( 0.1261)
 9          9 ( 0.1453)     9 ( 0.0043)     9 ( 0.0069)
14         10 (-0.0455)    10 (-0.1274)    10 (-0.1243)
 5         11 (-0.3376)    11 (-0.3205)    11 (-0.3214)
13         12 (-0.4785)    12 (-0.4621)    12 (-0.4560)
 7         13 (-0.5396)    13 (-0.5515)    13 (-0.5494)
10         14 (-0.7486)    14 (-0.7005)    15 (-0.7485)
15         15 (-0.7658)    15 (-0.7511)    14 (-0.7106)
 4         16 (-0.8559)    16 (-0.9163)    16 (-0.9166)

Remark: ilts tends to flag minority votes as outliers, while LASSO selects comparisons with large deviations from the gradient of the global ranking score, even when they belong to the majority. Such small differences only lead to local order changes between nearby ranked items, so both ranking algorithms are stable.

Summary
- ilts is a surprisingly simple, efficient, and automatic algorithm for outlier detection in QoE evaluation.
- ilts is up to 190 times faster than LASSO.
- ilts can automatically estimate the number of outliers and detect them without any prior information about how many outliers the dataset contains.
- Open resources: report, slides, and code are available at https://code.google.com/p/irls/.