Large-Scale Nearest Neighbor Classification with Statistical Guarantees


Large-Scale Nearest Neighbor Classification with Statistical Guarantees
Guang Cheng, Big Data Theory Lab, Department of Statistics, Purdue University
Joint work with Xingye Qiao and Jiexin Duan
July 3, 2018

SUSY Data Set (UCI Machine Learning Repository)
Size: 5,000,000 particles from the accelerator
Predictor variables: 18 (properties of the particles)
Goal (classification): distinguish between a signal process and a background process

Nearest Neighbor Classification
The kNN classifier predicts the class of a query point x ∈ R^d to be the most frequent class among its k nearest neighbors (in Euclidean distance).

Computational/Space Challenges for kNN Classifiers
Time complexity: O(dN + N log N)
dN: computing distances from the query point to all N observations in R^d
N log N: sorting the N distances and selecting the k nearest
Space complexity: O(dN)
Here, N is the size of the entire dataset.
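To make these costs concrete, here is a minimal NumPy sketch of a single brute-force kNN query (my illustration, not code from the talk; function and variable names are hypothetical). The distance pass touches all N points in R^d, and neighbor selection dominates the remainder.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k):
    """Brute-force kNN vote for one query point.

    X_train: (N, d) array, y_train: (N,) labels in {0, 1}, x_query: (d,) array.
    """
    # O(dN): squared Euclidean distances from the query to every training point
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    # Selecting the k smallest distances; argpartition is O(N) on average
    # (a full sort, as on the slide, would cost O(N log N))
    nn_idx = np.argpartition(dists, k)[:k]
    # Majority vote among the k nearest neighbors
    return Counter(y_train[nn_idx]).most_common(1)[0][0]

# Tiny usage example with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 18))
y = (X[:, 0] > 0).astype(int)
print(knn_predict(X, y, rng.normal(size=18), k=15))
```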

How to Conquer Big Data?

If We Have A Supercomputer...
Train on the entire data at once: Oracle-kNN
Figure: I'm Pac-Superman!

Construction of Pac-Superman
To build a Pac-Superman for Oracle-kNN, we need:
An expensive supercomputer
An operating system, e.g. Linux
Statistical software designed for big data, e.g. Spark, Hadoop
Complicated algorithms (e.g. MapReduce) with limited extendibility to other classification methods

Machine Learning Literature
In the machine learning literature, there are already some nearest neighbor algorithms for big data.
Muja & Lowe (2014) designed an algorithm to search for nearest neighbors in big data.
Comments: complicated; no statistical guarantee; lacks extendibility.
We hope to design an algorithmic framework that is:
Easy to implement
Backed by strong statistical guarantees
Generic and expandable (e.g., easily applied to other classifiers such as decision trees, SVM, etc.)

If We Don't Have A Supercomputer... What can we do?

If We Don't Have A Supercomputer... What about splitting the data?

Divide & Conquer (D&C) Framework Goal: deliver different insights from regression problems. Guang Cheng Big Data Theory Lab@Purdue 11 / 48

Big-kNN for SUSY Data
Oracle-kNN: kNN trained on the entire data
Number of subsets: s = N^γ, γ = 0.1, 0.2, ..., 0.8
Figure: Risk and Speedup for Big-kNN (risk/error rate and speedup plotted against γ; Oracle-kNN shown for reference in the risk panel).
Risk of classifier φ: R(φ) = P(φ(X) ≠ Y)
Speedup: running-time ratio between Oracle-kNN and Big-kNN

Construction of Pac-Batman (Big Data)

We need a statistical understanding of Big-kNN...

Weighted Nearest Neighbor Classifiers (WNN)
Definition (WNN): For a dataset of size n, the WNN classifier assigns weight w_{ni} to the i-th nearest neighbor of x:
\[
\phi_n^{w_n}(x) = \mathbf{1}\Big\{ \sum_{i=1}^{n} w_{ni}\,\mathbf{1}\{Y_{(i)} = 1\} \ge \tfrac{1}{2} \Big\}, \quad \text{s.t. } \sum_{i=1}^{n} w_{ni} = 1.
\]
When w_{ni} = k^{-1} 1{1 ≤ i ≤ k}, WNN reduces to kNN.
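As a concrete illustration (my sketch, not code from the talk; names are hypothetical), a WNN classifier with user-supplied weights, which collapses to kNN when the weights are uniform over the first k neighbors:

```python
import numpy as np

def wnn_predict(X_train, y_train, x_query, weights):
    """Weighted nearest neighbor vote for one query point.

    weights: length-n vector (w_n1, ..., w_nn), non-negative and summing to 1,
             applied to neighbors ordered from nearest to farthest.
    y_train: labels in {0, 1}.
    """
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    order = np.argsort(dists)                         # neighbors, nearest to farthest
    weighted_vote = np.dot(weights, y_train[order])   # sum_i w_ni * 1{Y_(i) = 1}
    return int(weighted_vote >= 0.5)

def knn_weights(n, k):
    """kNN as a special case: w_ni = 1/k for i <= k, else 0."""
    w = np.zeros(n)
    w[:k] = 1.0 / k
    return w
```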

D&C Construction of WNN Construction of Big-WNN In a big data set with size N = sn, the Big-WNN is constructed as: s φ Big n,s (x) = 1 s 1 φ (j) n (x) 1/2, j=1 where φ (j) n (x) is the local WNN in j-th subset. Guang Cheng Big Data Theory Lab@Purdue 16 / 48

Statistical Guarantees: Classification Accuracy & Stability
Primary criterion: accuracy
Regret = Expected Risk − Bayes Risk = E_D[R(\hat{\phi}_n)] − R(\phi_{Bayes})
A small value of regret is preferred.
Secondary criterion: stability (Sun et al., 2016, JASA)
Definition (Classification Instability, CIS): The classification instability of a classification procedure Ψ is
\[
CIS(\Psi) = E_{D_1, D_2}\big[ P_X\big( \hat{\phi}_{n1}(X) \ne \hat{\phi}_{n2}(X) \big) \big],
\]
where \hat{\phi}_{n1} and \hat{\phi}_{n2} are classifiers trained by the same procedure Ψ on datasets D_1 and D_2 drawn from the same distribution.
A small value of CIS is preferred.
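CIS can be approximated by Monte Carlo: repeatedly draw two independent training sets from the same distribution, fit the same procedure on each, and record how often the two fitted classifiers disagree on fresh query points. A rough sketch under those assumptions (the sampler `draw_dataset` and trainer `fit` are placeholders, not part of the talk):

```python
import numpy as np

def estimate_cis(fit, draw_dataset, n, n_rep=50, n_query=2000, seed=0):
    """Monte Carlo estimate of classification instability (CIS).

    fit(X, y) -> predict(Xq); draw_dataset(n, rng) -> (X, y) from the target distribution.
    """
    rng = np.random.default_rng(seed)
    disagreements = []
    for _ in range(n_rep):
        X1, y1 = draw_dataset(n, rng)          # D_1
        X2, y2 = draw_dataset(n, rng)          # D_2, same distribution
        Xq, _ = draw_dataset(n_query, rng)     # query points for P_X
        pred1, pred2 = fit(X1, y1)(Xq), fit(X2, y2)(Xq)
        disagreements.append(np.mean(pred1 != pred2))
    return float(np.mean(disagreements))
```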

Asymptotic Regret of Big-WNN
Theorem (Regret of Big-WNN): Under regularity assumptions, with s upper bounded by the subset size n, we have, as n, s → ∞,
\[
\text{Regret(Big-WNN)} \;\sim\; B_1 \frac{1}{s} \sum_{i=1}^{n} w_{ni}^2 \;+\; B_2 \Big( \sum_{i=1}^{n} \frac{\alpha_i w_{ni}}{n^{2/d}} \Big)^2,
\]
where α_i = i^{1+2/d} − (i−1)^{1+2/d}, the w_{ni} are the local weights, and the constants B_1 and B_2 are distribution-dependent quantities.
The first term is the variance part; the second term is the bias part.
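For intuition, the two terms of this expansion can be evaluated for the kNN weights w_{ni} = 1/k, 1 ≤ i ≤ k: splitting into s subsets shrinks the variance part by 1/s, while the bias part depends only on the local k and n. This is a sketch with placeholder constants B1 = B2 = 1 (the true constants are unknown and distribution-dependent).

```python
import numpy as np

def regret_terms_knn(n, k, s, d, B1=1.0, B2=1.0):
    """Variance and bias parts of the Big-kNN regret expansion (B1, B2 are placeholders)."""
    i = np.arange(1, k + 1)
    alpha = i ** (1 + 2 / d) - (i - 1) ** (1 + 2 / d)    # alpha_i
    w = np.full(k, 1.0 / k)                              # kNN weights on the first k neighbors
    variance = B1 * (1 / s) * np.sum(w ** 2)             # B1 * s^{-1} * sum_i w_ni^2
    bias = B2 * (np.sum(alpha * w) / n ** (2 / d)) ** 2  # B2 * (sum_i alpha_i w_ni / n^{2/d})^2
    return variance, bias

print(regret_terms_knn(n=10_000, k=100, s=8, d=6))
```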

Asymptotic Regret Comparison: Round 1

Asymptotic Regret Comparison
Theorem (Regret Ratio of Big-WNN and Oracle-WNN): Given an Oracle-WNN classifier, we can always design a Big-WNN by adjusting its weights according to those in the oracle version such that
\[
\frac{\text{Regret(Big-WNN)}}{\text{Regret(Oracle-WNN)}} \to Q, \quad \text{where } Q = \Big(\frac{\pi}{2}\Big)^{\frac{4}{d+4}} \text{ and } 1 < Q < 2.
\]
E.g., in the Big-kNN case, given the oracle k_O we set k = (π/2)^{d/(d+4)} k_O / s.
We need a multiplicative constant (π/2)^{d/(d+4)}!
We name Q the Majority Voting (MV) constant.

Majority Voting Constant
The MV constant Q decreases monotonically to one as d increases.
For example: d=1, Q=1.44; d=2, Q=1.35; d=5, Q=1.22; d=10, Q=1.14; d=20, Q=1.08; d=50, Q=1.03; d=100, Q=1.02.
Figure: Q plotted against d, for d from 0 to 100.
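These values follow directly from the closed form Q = (π/2)^{4/(d+4)}; a quick numerical check (illustrative code, not from the talk):

```python
import numpy as np

def mv_constant(d):
    """Majority Voting constant Q = (pi/2)^{4/(d+4)}."""
    return (np.pi / 2) ** (4 / (d + 4))

for d in (1, 2, 5, 10, 20, 50, 100):
    print(d, round(mv_constant(d), 2))   # 1.44, 1.35, 1.22, 1.14, 1.08, 1.03, 1.02
```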

How to Compensate for the Statistical Accuracy Loss (i.e., Q > 1)?
Why is there a statistical accuracy loss?

Statistical Accuracy Loss due to Majority Voting
The accuracy loss comes from the transformation of a (continuous) percentage into a (discrete) 0-1 label.

Majority Voting/Accuracy Loss Occurs Once in the Oracle Classifier
There is only one majority vote in the oracle classifier.
Figure: I'm Pac-Superman!

Majority Voting/Accuracy Loss Occurs Twice in the D&C Framework

Next Step... Is it possible to apply majority voting only once in D&C?

A Continuous Version of the D&C Framework

Loss Occurs Only Once in the Continuous Version

Construction of Pac-Batman (Continuous Version)

Continuous Version of Big-WNN (C-Big-WNN)
Definition (C-Big-WNN): For a dataset of size N = sn split into s subsets, the C-Big-WNN classifier is
\[
\phi_{n,s}^{CBig}(x) = \mathbf{1}\Big\{ \frac{1}{s} \sum_{j=1}^{s} \hat{S}_n^{(j)}(x) \ge \tfrac{1}{2} \Big\},
\]
where \hat{S}_n^{(j)}(x) = \sum_{i=1}^{n} w_{ni,j} Y_{(i),j}(x) is the weighted average over the nearest neighbors of the query point x in the j-th subset.
Step 1: Percentage calculation (within each subset)
Step 2: Majority voting (once, across subsets)
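A sketch of the continuous version, reusing the hypothetical helpers above: each subset returns its class-1 percentage rather than a 0-1 vote, and the single threshold is applied to the average of those percentages.

```python
import numpy as np

def local_percentage(X_sub, y_sub, x_query, k):
    """Step 1: class-1 percentage among the k nearest neighbors in one subset."""
    dists = np.sum((X_sub - x_query) ** 2, axis=1)
    nn_idx = np.argpartition(dists, k)[:k]
    return np.mean(y_sub[nn_idx])             # S_hat^{(j)}(x) with kNN weights

def c_big_knn_predict(X, y, x_query, s, k_local, rng=None):
    """C-Big-kNN: average the s local percentages, then vote once."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))
    percentages = [local_percentage(X[sub], y[sub], x_query, k_local)
                   for sub in np.array_split(idx, s)]
    return int(np.mean(percentages) >= 0.5)   # Step 2: single majority vote
```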

C-Big-kNN for SUSY Data
Figure: Risk and Speedup for Big-kNN and C-Big-kNN (risk/error rate and speedup plotted against γ; Oracle-kNN shown for reference in the risk panel).

Asymptotic Regret Comparison: Round 2

Asymptotic Regret Comparison
Theorem (Regret Ratio between C-Big-WNN and Oracle-WNN): Given an Oracle-WNN, we can always design a C-Big-WNN by adjusting its weights according to those in the oracle version such that
\[
\frac{\text{Regret(C-Big-WNN)}}{\text{Regret(Oracle-WNN)}} \to 1.
\]
E.g., in the C-Big-kNN case, given the oracle k_O we can set k = k_O / s. There is no multiplicative constant!
Note that this theorem applies directly to the optimally weighted nearest neighbor classifier (OWNN; Samworth, 2012, AoS), whose weights are optimized to achieve the minimal regret.

Practical Consideration: Speed vs Accuracy
Applying OWNN in each subset is time consuming, although it leads to the minimal regret.
Rather, it is easier and faster to implement kNN in each subset (with some accuracy loss, though).
What is the statistical price (in terms of accuracy and stability) of trading accuracy for speed?

Statistical Price
Theorem (Statistical Price): We can design an optimal C-Big-kNN by setting its number of neighbors to k = k_{O,opt} / s such that
\[
\frac{\text{Regret(Optimal C-Big-kNN)}}{\text{Regret(Oracle-OWNN)}} \to Q', \qquad \frac{\text{CIS(Optimal C-Big-kNN)}}{\text{CIS(Oracle-OWNN)}} \to Q',
\]
where Q' = 2^{-4/(d+4)} \big(\tfrac{d+4}{d+2}\big)^{(2d+4)/(d+4)} and 1 < Q' < 2. Here, k_{O,opt} minimizes the regret of Oracle-kNN.
Interestingly, both Q and Q' depend only on the data dimension!

Statistical Price
Q' converges to one as d grows, in a unimodal way.
For example: d=1, Q'=1.06; d=2, Q'=1.08; d=5, Q'=1.08; d=10, Q'=1.07; d=20, Q'=1.04; d=50, Q'=1.02; d=100, Q'=1.01.
The worst dimension is d = 4, where Q' = 1.089.
Figure: Q' plotted against d, for d from 0 to 100.
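A quick numerical check of Q' as written in the theorem above (illustrative code, not from the talk), confirming the unimodal shape and the worst dimension:

```python
import numpy as np

def statistical_price(d):
    """Q' = 2^{-4/(d+4)} * ((d+4)/(d+2))^{(2d+4)/(d+4)}."""
    return 2 ** (-4 / (d + 4)) * ((d + 4) / (d + 2)) ** ((2 * d + 4) / (d + 4))

ds = np.arange(1, 101)
qp = statistical_price(ds)
print(round(statistical_price(4), 3))   # about 1.089, the largest value
print(int(ds[np.argmax(qp)]))           # d = 4 maximizes Q'
```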

Simulation Analysis: kNN
Consider the following classification problem for Big-kNN and C-Big-kNN:
Sample size: N = 27,000
Dimensions: d = 6, 8
P_0 ~ N(0_d, I_d) and P_1 ~ N((2/\sqrt{d}) 1_d, I_d)
Prior class probability: π_1 = Pr(Y = 1) = 1/3
Number of neighbors in Oracle-kNN: k_O = N^{0.7}
Number of subsets in D&C: s = N^γ, γ = 0.1, 0.2, ..., 0.8
Number of neighbors in Big-kNN: k_d = (π/2)^{d/(d+4)} k_O / s
Number of neighbors in C-Big-kNN: k_c = k_O / s
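A self-contained sketch of this simulation setup (my code, not the authors'; it reuses the hypothetical `big_knn_predict` and `c_big_knn_predict` helpers above and evaluates on a small test set for speed):

```python
import numpy as np

def draw_data(N, d, pi1=1/3, rng=None):
    """Mixture model: P0 = N(0, I), P1 = N((2/sqrt(d)) * 1, I), Pr(Y=1) = pi1."""
    rng = rng or np.random.default_rng(0)
    y = rng.binomial(1, pi1, size=N)
    X = rng.normal(size=(N, d)) + y[:, None] * (2 / np.sqrt(d))
    return X, y

d, N, gamma = 6, 27_000, 0.3
rng = np.random.default_rng(1)
X, y = draw_data(N, d, rng=rng)
Xte, yte = draw_data(2_000, d, rng=rng)

k_O = int(N ** 0.7)                                        # oracle number of neighbors
s = int(N ** gamma)                                        # number of subsets
k_d = max(1, int((np.pi / 2) ** (d / (d + 4)) * k_O / s))  # Big-kNN local k
k_c = max(1, int(k_O / s))                                 # C-Big-kNN local k

err_big = np.mean([big_knn_predict(X, y, x, s, k_d) != t for x, t in zip(Xte, yte)])
err_cbig = np.mean([c_big_knn_predict(X, y, x, s, k_c) != t for x, t in zip(Xte, yte)])
print(err_big, err_cbig)
```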

Simulation Analysis: Empirical Risk
Figure: Empirical Risk (Testing Error) against γ for Big-kNN, C-Big-kNN, Oracle-kNN and the Bayes risk. Left/Right: d = 6/8.

Simulation Analysis: Running Time
Figure: Running Time against γ for Big-kNN, C-Big-kNN and Oracle-kNN. Left/Right: d = 6/8.

Simulation Analysis: Empirical Regret Ratio
Figure: Empirical Ratio of Regret against γ. Left/Right: d = 6/8.
Q: Regret(Big-kNN)/Regret(Oracle-kNN) or Regret(C-Big-kNN)/Regret(Oracle-kNN)

Simulation Analysis: Empirical CIS
Figure: Empirical CIS against γ for Big-kNN, C-Big-kNN and Oracle-kNN. Left/Right: d = 6/8.

Real Data Analysis: UCI Machine Learning Repository

Data      Size        Dim    Big-kNN   C-Big-kNN   Oracle-kNN   Speedup
htru2     17,898      8      3.72      3.36        3.34*        21.27
gisette   6,000       5000   12.86     10.77*      10.78        12.83
musk1     476         166    36.43     33.87*      35.71        3.60
musk2     6,598       166    10.43     9.98*       10.14        14.68
occup     20,560      6      5.09      4.87*       5.19         20.31
credit    30,000      24     21.85     21.37*      21.50        22.44
SUSY      5,000,000   18     30.66     30.08       30.01*       68.45

Table: Test error (risk, in %) of Big-kNN and C-Big-kNN compared to Oracle-kNN on real datasets. The best performance in each row is marked with *. The speedup factor is defined as the computing time of Oracle-kNN divided by the time of the slower Big-kNN method. Oracle k = N^{0.7}, number of subsets s = N^{0.3}.
