XPCA: Copula-based Decompositions for Ordinal Data
Clifford Anderson-Bergman, Kina Kincher-Winoto and Tamara Kolda
Presented by: Clifford Anderson-Bergman
DATAWorks 2018

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. SAND NO. 2017-XXXXP

Introduction: PCA
Principal Components Analysis is a fundamental tool for data analysis:
o Exploratory data analysis
o Dimension reduction
o Data compression
o Missing data imputation

Introduction: Low-rank PCA
The PCA procedure can be described as a low-rank representation of the relation between columns: the original data (n x p) is approximated by a low-rank representation of the rows (n x k) times a low-rank representation of the columns (k x p). This can be shown to be the maximum likelihood estimate of a probabilistic model. We will work with this interpretation of PCA, but there are others.
Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis.
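As a concrete illustration (ours, not from the slides), a minimal sketch of the rank-k factorization via truncated SVD:

```python
import numpy as np

def pca_low_rank(X, k):
    """Rank-k PCA factorization of centered data via truncated SVD.
    Returns row scores (n x k) and column loadings (k x p) whose
    product is the best rank-k approximation of the centered data."""
    Xc = X - X.mean(axis=0)                    # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k, :]

X = np.random.randn(100, 8) @ np.random.randn(8, 8)   # toy data, n=100, p=8
scores, loadings = pca_low_rank(X, k=3)
approx = scores @ loadings + X.mean(axis=0)           # low-rank reconstruction
```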

Introduction: Copulas
Copulas are methods for generically modeling the joint distribution between differing variable types. A copula is a function that joins the CDFs of the different variables, each marginally uniform(0,1). This separates modeling into two steps:
o Modeling the marginal distributions
o Modeling the relation between the marginally uniform(0,1) RVs (the copula function)

Introduction: Gaussian Copula
The Gaussian copula is a popular model: the copula is the CDF of a multivariate normal with correlation matrix R. The relation between the ordered values of the variables in each column is the same as between the ordered values of a multivariate normal. It is relatively simple to work with (see the sketch below):
o Push the original values through the marginal CDF, then through the inverse CDF of the standard normal
o Treat the result as multivariate normal
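A minimal sketch of that two-step transform (our helper names; the empirical CDF stands in for the marginal CDFs):

```python
import numpy as np
from scipy import stats

def to_gaussian_copula_space(X):
    """Map each column through its empirical CDF, then through the
    inverse CDF of the standard normal, so columns are ~N(0,1)."""
    n, p = X.shape
    Z = np.empty((n, p))
    for j in range(p):
        u = stats.rankdata(X[:, j]) / (n + 1)   # empirical CDF in (0,1)
        Z[:, j] = stats.norm.ppf(u)             # inverse standard normal CDF
    return Z

X = np.column_stack([np.random.exponential(size=500),
                     np.random.poisson(3, size=500)])
Z = to_gaussian_copula_space(X)            # now treat as multivariate normal
R = np.corrcoef(Z, rowvar=False)           # estimated copula correlation
```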

XPCA: PCA for non-normal ordinal data
Recall that PCA is the maximum likelihood estimate under the assumption of multivariate normal data. We want to relax the assumption of multivariate normality by using copula models to model the joint distribution across columns. Copula Component Analysis (COCA) took a similar approach, but made the implicit assumption that all variables are continuous. We extend the results to discrete ordinal variables.
Ma, J., & Sun, Z. (2007). Copula component analysis.

Why use XPCA?
The decomposition is more robust to outliers in individual variables than PCA
o COCA has the same feature
If all data is continuous, the resulting factors will be nearly identical between COCA and XPCA. XPCA makes better use of discrete variables (binary being the extreme case)
o COCA does not, and has shown no improvement over PCA with discrete variables
We also present a method for deriving the full conditional distribution of missing data

XPCA: Basic Model
Begin with an (unobserved) low-rank PCA model in copula space: a latent PCA variable Z_ij ~ N(θ_ij, σ²), where θ = U V^T is low rank. The latent variables are pushed through the standard normal CDF Φ so they are marginally uniform(0,1), and then through a different inverse CDF F_j^{-1} for each variable, transforming the latent value into the final observed value X_ij = F_j^{-1}(Φ(Z_ij)).
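To make the generative story concrete, a small simulation sketch (the parameter choices are ours, not from the slides), with one exponential and one binary margin:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, sigma = 500, 2, 0.5

# Latent low-rank PCA model in copula space: Z = U V^T + noise
U, V = rng.normal(size=(n, k)), rng.normal(size=(2, k))
Z = U @ V.T + sigma * rng.normal(size=(n, 2))
Z /= Z.std(axis=0)                    # standardize so each column is ~N(0,1)

u = stats.norm.cdf(Z)                 # standard normal CDF -> uniform(0,1)

# Different inverse CDF per column -> observed values
x_cont = stats.expon.ppf(u[:, 0])          # continuous (exponential) column
x_bin = (u[:, 1] > 0.7).astype(int)        # binary column: F^{-1}(u), P(1)=0.3
```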

Likelihood Function: Continuous Variables
If all variables are continuous, the likelihood can be written by evaluating the latent normal density at z_ij = Φ^{-1}(F_j(x_ij)):
L(θ) ∝ ∏_ij φ((Φ^{-1}(F_j(x_ij)) - θ_ij) / σ)
This is the likelihood function used by COCA. Problem: if column j is discrete, then a range of values of U leads to the same value of X. Hoff (2007) showed that ignoring this can lead to heavy bias in the estimated correlations.
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation.

Likelihood Function: Discrete Variables
Define the latent interval endpoints z_ij^- = Φ^{-1}(F_j(x_ij^-)) and z_ij^+ = Φ^{-1}(F_j(x_ij)), where F_j(x^-) is the left limit of the CDF at x. The likelihood function is then
L(θ) = ∏_ij [Φ((z_ij^+ - θ_ij) / σ) - Φ((z_ij^- - θ_ij) / σ)]
integrating over the range of latent variables that leads to the observed outcome.
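A sketch of one term of that interval likelihood (helper names are ours; the empirical CDF, shrunk by n/(n+1) so the inverse normal CDF stays finite, stands in for F_j):

```python
import numpy as np
from scipy import stats

def interval_loglik(column, x, theta, sigma):
    """Log-likelihood of observing value x: integrate the latent
    N(theta, sigma^2) density over the interval of latent values
    that maps to x, i.e. [Phi^{-1}(F(x^-)), Phi^{-1}(F(x))]."""
    n = len(column)
    z_hi = stats.norm.ppf(np.mean(column <= x) * n / (n + 1))
    z_lo = stats.norm.ppf(np.mean(column < x) * n / (n + 1))
    return np.log(stats.norm.cdf((z_hi - theta) / sigma)
                  - stats.norm.cdf((z_lo - theta) / sigma))

col = np.array([0, 0, 1, 1, 1])               # a binary column
print(interval_loglik(col, 1, theta=0.4, sigma=1.0))
```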

Fitting the Model
Fit the marginal CDFs with the empirical distribution function
o Results in all variables being treated as discrete, even if originally continuous
o A consistent estimator of any CDF (including continuous variables)
o If all columns are continuous, the XPCA solution approaches the COCA solution
Fit the PCA parameters using maximum likelihood estimation. Two algorithms implemented (the L-BFGS route is sketched below):
o Generic L-BFGS implementation
o Custom alternating Newton's method
On average, L-BFGS is slightly faster (about 2x); alternating Newton's is more robust.
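A compact sketch of the L-BFGS route under our reading of the model (scipy's generic optimizer with finite-difference gradients, so it is slow at scale; the custom alternating Newton's method is not shown):

```python
import numpy as np
from scipy import optimize, stats

def fit_xpca_lbfgs(X, k, sigma=1.0):
    """Fit rank-k factors U (n x k), V (p x k) by maximizing the
    interval likelihood, with empirical-CDF margins."""
    n, p = X.shape
    Zlo, Zhi = np.empty((n, p)), np.empty((n, p))
    for j in range(p):
        col = X[:, j]
        Zlo[:, j] = stats.norm.ppf([np.mean(col < x) * n / (n + 1) for x in col])
        Zhi[:, j] = stats.norm.ppf([np.mean(col <= x) * n / (n + 1) for x in col])

    def negloglik(params):
        U = params[:n * k].reshape(n, k)
        V = params[n * k:].reshape(p, k)
        theta = U @ V.T
        probs = (stats.norm.cdf((Zhi - theta) / sigma)
                 - stats.norm.cdf((Zlo - theta) / sigma))
        return -np.sum(np.log(np.clip(probs, 1e-12, None)))

    x0 = 0.1 * np.random.default_rng(0).normal(size=(n + p) * k)
    res = optimize.minimize(negloglik, x0, method="L-BFGS-B")
    return res.x[:n * k].reshape(n, k), res.x[n * k:].reshape(p, k)
```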

Imputing Data: Median Estimate
According to the XPCA model, the estimated median of the latent Z_ij is θ_ij = [U V^T]_ij. We want an estimate of X_ij. Because Φ and F_j^{-1} are monotonic functions, the estimated median of X_ij is F_j^{-1}(Φ(θ_ij)). This is also the imputation method used by COCA.
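A small sketch of that median rule (helper names are ours; the empirical quantile function of the observed column stands in for F_j^{-1}):

```python
import numpy as np
from scipy import stats

def impute_median(theta_ij, observed_col):
    """Median estimate of a missing entry: push the fitted latent
    value through Phi, then through the empirical quantile function."""
    u = stats.norm.cdf(theta_ij)            # Phi(theta): uniform(0,1) scale
    return np.quantile(observed_col, u)     # empirical F^{-1}

col = np.array([0, 2, 2, 3, 7, 9])          # observed values in column j
print(impute_median(0.3, col))              # latent 0.3 -> above-median value
```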

Imputing Data: Mean Estimates
Once the model is fit, we can compute the conditional probability P(X_ij ≤ x) = Φ((Φ^{-1}(F_j(x)) - θ_ij) / σ) for any x; this characterizes the full conditional distribution! The estimated expected value is then the mean of this distribution. The estimated means are a kernel-smoothed weighted average of all observed values:
o Automatically range respecting
o For properly coded binary variables, the mean estimate is the probability that the value = 1
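A sketch of the conditional distribution and its mean under these assumptions (support restricted to observed values, as implied by the empirical-CDF fit; helper names are ours):

```python
import numpy as np
from scipy import stats

def conditional_pmf(theta_ij, observed_col, sigma=1.0):
    """Estimated conditional distribution of a missing entry over the
    observed support values, given the fitted latent mean theta_ij."""
    n = len(observed_col)
    support = np.unique(observed_col)
    hi = stats.norm.ppf([np.mean(observed_col <= x) * n / (n + 1) for x in support])
    lo = stats.norm.ppf([np.mean(observed_col < x) * n / (n + 1) for x in support])
    pmf = (stats.norm.cdf((hi - theta_ij) / sigma)
           - stats.norm.cdf((lo - theta_ij) / sigma))
    return support, pmf / pmf.sum()

support, pmf = conditional_pmf(0.5, np.array([0, 0, 0, 1, 1]))
mean_estimate = np.sum(support * pmf)   # for a 0/1 column: P(value = 1)
```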

Simulated Comparisons
Simulated low-rank data and examined MSE in recovering the mean structure:
o N rows/cols = 50, 100, 200, 400
o Rank = 5
o Error terms: Scenario 1, all normal; Scenario 2, 75% probit model (i.e. binary), 25% normal
Note that the MSE error metric favors PCA (which directly attempts to minimize it). With all-Gaussian data, PCA does best, but its advantage decreases with data size. XPCA has much less error when binary variables are included, even as the data size gets larger.
[Figure: MSE vs. log N rows/cols for PCA, COCA and XPCA; panels "All Normal Errors" and "75% Binary Variables, 25% Normal"]
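A sketch of a scenario-2 style data generator (parameter choices beyond those on the slide are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, rank = 200, 5

# Rank-5 mean structure plus normal error
theta = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n)) / np.sqrt(rank)
Z = theta + rng.normal(size=(n, n))

# 75% of columns observed through a probit model (binary), 25% left normal
n_bin = int(0.75 * n)
X = Z.copy()
X[:, :n_bin] = (Z[:, :n_bin] > 0).astype(float)
```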

Basketball Data
Season statistics for 2015-2016 NBA basketball: 476 players, 39 variables
o Season summary statistics: games played, # wins for the player's team, etc.
o Game summary statistics: shots made, assists, rebounds, etc.
o Draft round/number: draft numbers 1-30 -> round 1, 31-60 -> round 2, plus undrafted
o Others: a mix of count and continuous data

Imputed Data: Point Estimates
PCA provides linear estimates of imputed data
o Not constrained to be realistic values
COCA provides a median estimate
o A realistic value, but not very informative for binary data!
XPCA can extract the full distribution
o Easily summarized by the median, mean, or whatever statistic is desired
Estimate for whether Kevin Garnett was drafted (true value masked):

Method               Estimate
PCA                  1.113       (not a valid probability)
COCA                 1           (correct value, but no indication of certainty)
XPCA (mean method)   0.9999997   (fully informative estimate for binary data)

Imputed Data: Conditional Distribution
We can use XPCA to extract the estimated full conditional distribution of missing values.
[Figure: marginal distribution of shots made across all players vs. estimated distribution of shots made for an individual with the same latent features as Kevin Garnett]

Basketball Data: Cross-Validated Rescaled MSE
Evaluating methods by 4x cross-validated rescaled mean squared error
o MSE of each column rescaled by the column variance
XPCA has the lowest overall rescaled MSE. PCA error explodes with overfitting. XPCA and COCA are also prone to overfitting, but because they are range restricted, the damage to MSE is more limited.
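The rescaled metric, as we read it from the slide (per-column MSE divided by column variance, then averaged):

```python
import numpy as np

def rescaled_mse(X_true, X_pred):
    """Mean over columns of (column MSE / column variance), so every
    variable contributes on a comparable scale."""
    col_mse = np.mean((X_true - X_pred) ** 2, axis=0)
    return np.mean(col_mse / np.var(X_true, axis=0))
```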

Future Directions
Accelerating the algorithm
o Randomized methods?
Adding penalization
o Improving out-of-sample error
o It can be simpler to pick a penalty than to pick the rank of the decomposition
Allowing for non-constant σ
o Setting σ as fixed across columns (a la the PCA model) assumes each column is equally predictable
o May not be a reasonable assumption for heterogeneous data; instead allow σ to change by column
Adding standard errors