Research Overview. Kristjan Greenewald. University of Michigan - Ann Arbor. February 2, 2016
Transcription
1 Research Overview Kristjan Greenewald University of Michigan - Ann Arbor February 2, 2016
2/17 Background and Motivation
- Goal: efficient statistical modeling of high-dimensional spatio-temporal data with complex correlation structures.
- Training samples are often scarce relative to the number of variables:
  - dataset limitations;
  - data distribution changing slowly over time in a non-stationary way.
- Use a mean-covariance model.
- Will consider the combination of several high-dimensional covariance regularization methods, including KronPCA.
3/17 Kronecker Covariance Estimation
- Data vector dimensionality $p_t p_s$: on the order of $p_s^2 p_t^2$ covariance parameters.
- No assumed structure: the sample covariance matrix (SCM) is known to be a poor estimator in sample-starved situations.
- The natural array arrangement of the variables (e.g. spatio-temporal) should imply exploitable structure.
- Kronecker product covariance [Werner et al 2008, Tsiligkaridis et al 2013, etc.]:
$$\Sigma = T \otimes S = \begin{bmatrix} t_{11} S & \cdots & t_{1T} S \\ \vdots & \ddots & \vdots \\ t_{T1} S & \cdots & t_{TT} S \end{bmatrix}$$
- Only $p_s^2 + p_t^2$ parameters, hence much lower estimation variance.
- Issue: the restrictive model gives significant bias in most space-time applications.
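As a rough illustration of the Kronecker product covariance model, the following NumPy sketch builds $\Sigma = T \otimes S$ from two small positive definite factors and compares the parameter counts quoted on the slide. The dimensions, the random factors, and the helper names `kron_covariance` and `parameter_counts` are assumptions chosen for illustration, not part of the original talk.

```python
import numpy as np

def kron_covariance(T, S):
    """Return the Kronecker product covariance T (x) S."""
    return np.kron(T, S)

def parameter_counts(p_t, p_s):
    """Entry counts as quoted on the slide (symmetry is not exploited)."""
    unstructured = (p_t * p_s) ** 2   # full covariance: p_t^2 * p_s^2
    kronecker = p_t ** 2 + p_s ** 2   # one temporal plus one spatial factor
    return unstructured, kronecker

# Hypothetical small example: 4 time steps, 3 spatial locations.
rng = np.random.default_rng(0)
p_t, p_s = 4, 3
A = rng.standard_normal((p_t, p_t))
B = rng.standard_normal((p_s, p_s))
T = A @ A.T + np.eye(p_t)   # temporal factor, positive definite
S = B @ B.T + np.eye(p_s)   # spatial factor, positive definite

Sigma = kron_covariance(T, S)
print(Sigma.shape, parameter_counts(p_t, p_s))
```

Block $(i, j)$ of `Sigma` is `T[i, j] * S`, which is exactly the block layout of the matrix above.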
4/17 Robust KronPCA
- A sum of Kronecker products allows lower bias, but often has conditioning issues.
- Solution: include a sparse correction,
$$\Sigma = \sum_{i=1}^{r} T_i \otimes S_i + \Gamma = \Theta + \Gamma,$$
  where $\Gamma$ is a sparse matrix.
- Robust KronPCA with sparse noise is analogous to PCA with sparse noise, e.g. [Yang et al 2013].
- Motivation:
  - Avoid degradation of the Kronecker basis estimate by a few high-magnitude outlier variables and/or correlations.
  - Robustness to sensor failure.
  - Noise processes often have sparse correlations, which would be unlikely to share the Kronecker basis of the signal.
- Identifiability is given via an incoherence assumption on $\Theta$ and $\Gamma$, e.g. [Chandrasekaran et al 2011].
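5/17 Nuclear Norm-based Objective Function
- Approach based on nuclear norm and L1 norm penalization.
- Encourage sparsity of the singular values of the rearranged matrix $R(\Theta)$ (which implies low $r$) and of the elements of $\Gamma$.
- Objective function:
$$\min_{\Theta, \Gamma}\ \|\Sigma_{SCM} - \Theta - \Gamma\|_F^2 + \beta \|R(\Theta)\|_* + \lambda \|\Gamma\|_1$$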
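The role of the rearrangement $R(\cdot)$ can be sketched in NumPy: it stacks the vectorized $p_s \times p_s$ blocks of a matrix as rows, so a sum of $r$ Kronecker products becomes a matrix of rank at most $r$, which is why a nuclear norm penalty on $R(\Theta)$ encourages low $r$. The helper names `rearrange` and `objective`, the row-major vectorization, and the random (non-symmetric) factors are assumptions made for this illustration only.

```python
import numpy as np

def rearrange(Sigma, p_t, p_s):
    """Stack each p_s x p_s block of Sigma as one row (p_t^2 x p_s^2 output)."""
    blocks = Sigma.reshape(p_t, p_s, p_t, p_s)          # indices [i, a, j, b]
    return blocks.transpose(0, 2, 1, 3).reshape(p_t * p_t, p_s * p_s)

def objective(Sigma_scm, Theta, Gamma, beta, lam, p_t, p_s):
    """Squared Frobenius fit + nuclear norm of R(Theta) + L1 norm of Gamma."""
    fit = np.linalg.norm(Sigma_scm - Theta - Gamma, "fro") ** 2
    nuclear = np.linalg.norm(rearrange(Theta, p_t, p_s), "nuc")
    l1 = np.abs(Gamma).sum()
    return fit + beta * nuclear + lam * l1

# Hypothetical example: a sum of two Kronecker products (factors are random
# and not symmetric; only the rank structure is being illustrated).
rng = np.random.default_rng(0)
p_t, p_s = 3, 2
T1, T2 = rng.standard_normal((2, p_t, p_t))
S1, S2 = rng.standard_normal((2, p_s, p_s))
Theta = np.kron(T1, S1) + np.kron(T2, S2)
Gamma = np.zeros_like(Theta)

print(np.linalg.matrix_rank(rearrange(Theta, p_t, p_s)))
print(objective(Theta, Theta, Gamma, beta=0.5, lam=0.1, p_t=p_t, p_s=p_s))
```

With a single Kronecker term, `rearrange(np.kron(T1, S1), p_t, p_s)` reduces to the outer product of the flattened factors, i.e. a rank-one matrix.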
6/17 Theoretical Rates
- Able to derive a random matrix concentration bound and leverage general Robust PCA theorems.
- The rate derived for Robust KronPCA:
$$\|\hat{\Theta} - \Theta\|_F = O\left(\max\left\{ \sqrt{r}\,\frac{p_t^2 + p_s^2 + \log M}{n},\ \sqrt{\frac{r\,(p_t^2 + p_s^2 + \log M)}{n}},\ \sqrt{\frac{s \log p_t p_s}{n}} \right\}\right)$$
- The rate for unstructured (SCM) covariance estimation:
$$\|\hat{\Sigma}_{SCM} - \Sigma\|_F = O\left(\sqrt{\frac{p_s^2 p_t^2}{n}}\right)$$
- Note the significant gains when $r$ and $s$ are both small.
7/17 Simulation Results: Estimation MSE
[Figure: estimation MSE for corrupted Toeplitz (left) and non-Toeplitz (right) covariances, shown as a function of $n$ for values of $\lambda_\Theta$, $\lambda_\Gamma$ selected via cross-validation.]
8/17 Dynamic Metric Tracking
- Metric learning: learning a metric ("inverse covariance") that best separates the data classes.
- Big, complex data:
  - Unsupervised learning is not good enough, because there are many possible learning tasks.
  - Fully supervised learning requires too much analyst time.
- Traditional approach: hand-design a custom feature set or learning approach.
- Our goal: use a relatively small amount of targeted analyst feedback to allow unsupervised techniques to home in on the problem of interest.
9/17 Metric Learning
- Unsupervised techniques require a notion of closeness: learn the analyst's problem-specific internal metric, so that similar points cluster together.
- Analyst metric: non-Euclidean, potentially a complex Riemannian metric. E.g. in imagery, small changes in appearance can cause large L2 changes, and vice versa.
- Feature relevance: metrics that project the data onto a submanifold prevent irrelevant features from confusing the learner.
- Example: grouping objects by shape vs. grouping by color.
[Figure: different ways to cluster.]
10/17 Metric Drift
- The real world is often dynamic:
  - Social media, news: changing discussion, changing events, changing behavior, etc.
  - Security: changing attacks, changing technology, changing human behavior.
  - And more...
- What causes metric drift? Changes in:
  - the problem of interest / feature relevance;
  - the analyst's internal metric;
  - the underlying distribution or classes (which changes the optimal metric).
- Changes are potentially rapid: need to exploit previous information without being enslaved to it.
- For simplicity, approximate the metric with the Mahalanobis distance, analogous to the inverse covariance:
$$d_M(x, z) = (x - z)^T M (x - z)$$
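A minimal NumPy sketch of the distance $d_M$ as written on the slide (the quadratic form, without a square root). The function name `mahalanobis_sq` and the example matrix are illustrative assumptions.

```python
import numpy as np

def mahalanobis_sq(x, z, M):
    """Quadratic-form distance d_M(x, z) = (x - z)^T M (x - z)."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(d @ M @ d)

x = np.array([1.0, 2.0])
z = np.array([0.0, 0.0])

# With M = I the distance reduces to the squared Euclidean distance.
print(mahalanobis_sq(x, z, np.eye(2)))

# A hypothetical metric that ignores the second feature entirely,
# illustrating how M can encode feature relevance.
M_relevant = np.diag([1.0, 0.0])
print(mahalanobis_sq(x, z, M_relevant))
```

Any positive semidefinite $M$ can be factored as $M = L^T L$, in which case $d_M(x, z) = \|L(x - z)\|_2^2$, i.e. a squared Euclidean distance after a linear embedding.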
11/17 Learning Applications
Goals:
- Track the relevant metric in the presence of label noise.
- Find an embedding in which the data clusters are most separated, enabling better interpretation and/or feedback.
Applications:
- Improving k-NN classification performance.
- Partially supervised clustering: estimates a notion of similarity, which is fundamental to clustering.
- Anomaly detection: distance to the non-anomalous distribution.
[Figure panels: k-NN, Clustering, Anomaly Detection. Top two images: wikipedia.org.]
12/17 Objective Function: Constraints
- The analyst provides a sequence of triplets $(x_t, z_t, y_t)$:
  - $(x_t, z_t)$: pair of instances in $\mathbb{R}^n$;
  - $y_t$: label, similar $= +1$, dissimilar $= -1$.
- Drift occurs as the sequence is provided by the analyst.
- Objective:
$$Q(\{M_t\}) = \sum_{t=1}^{T} \left( \ell_t(M_t, \mu) + \rho\, r(M_t) \right)$$
$$\ell_t(M, \mu) = \ell(m_t), \qquad m_t = y_t \left( \mu - (x_t - z_t)^T M (x_t - z_t) \right)$$
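The margin $m_t$ and per-triplet loss can be sketched as follows. The slide leaves the scalar loss $\ell$ generic; the hinge loss $\ell(m) = \max(0, 1 - m)$ used here is an assumption for illustration, as are the function names.

```python
import numpy as np

def margin(M, mu, x, z, y):
    """m_t = y_t * (mu - (x_t - z_t)^T M (x_t - z_t))."""
    d = x - z
    return y * (mu - d @ M @ d)

def hinge_loss(m):
    """Assumed scalar loss: zero once the margin is at least 1."""
    return max(0.0, 1.0 - m)

def triplet_loss(M, mu, x, z, y):
    """Per-constraint loss l_t(M, mu) = l(m_t)."""
    return hinge_loss(margin(M, mu, x, z, y))

M = np.eye(2)
mu = 2.0
near = (np.array([1.0, 0.0]), np.array([0.0, 0.0]))   # distance 1
far = (np.array([3.0, 0.0]), np.array([0.0, 0.0]))    # distance 9

# A nearby pair labeled similar (+1) and a distant pair labeled
# dissimilar (-1) both incur zero loss; the opposite labels are penalized.
for (x, z) in (near, far):
    for y in (+1, -1):
        print(y, triplet_loss(M, mu, x, z, y))
```

The threshold $\mu$ acts as the boundary between "similar" and "dissimilar" distances: similar pairs are pushed below it and dissimilar pairs above it.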
13/17 Online DML
- Efficient implementation of Composite Objective Mirror Descent (COMID) for online static distance metric learning (DML) [Kunapuli 2012].
- Online updates ($B_\psi$: Bregman divergence, $\eta_t$: learning rate):
$$M_{t+1} = \arg\min_{M \succeq 0}\ B_\psi(M, M_t) + \eta_t \left\langle \nabla_M \ell_t(M_t, \mu_t),\ M - M_t \right\rangle + \eta_t \rho\, r(M)$$
$$\mu_{t+1} = \arg\min_{\mu \ge 1}\ B_\psi(\mu, \mu_t) + \eta_t \nabla_\mu \ell_t(M_t, \mu_t)^T (\mu - \mu_t)$$
- Provably sublinear ($O(\sqrt{T})$) regret for learning rate $\eta_t = \eta_0 / \sqrt{T}$.
- Problem: in a true online learning scenario, the drift rate may change without warning, so $\eta_0$ cannot be optimized a priori.
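One concrete instance of these updates can be sketched under simplifying assumptions that are not stated on the slide: the Bregman divergence is taken to be the squared Frobenius (Euclidean) distance, the loss is the hinge loss $\max(0, 1 - m_t)$, and the regularizer is $r(M) = \mathrm{tr}(M)$. Under those assumptions the $M$ update reduces to a gradient step followed by projection onto the positive semidefinite cone, and the $\mu$ update to a gradient step clipped at 1. The names `project_psd` and `comid_step` are hypothetical.

```python
import numpy as np

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues."""
    A = (A + A.T) / 2.0
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

def comid_step(M, mu, x, z, y, eta, rho):
    """One online update for a triplet (x, z, y) under the stated assumptions."""
    d = x - z
    m = y * (mu - d @ M @ d)            # margin m_t
    if m < 1.0:                         # hinge loss is active
        grad_M = y * np.outer(d, d)
        grad_mu = -float(y)
    else:
        grad_M = np.zeros_like(M)
        grad_mu = 0.0
    # Gradient step plus trace penalty, then projection onto M >= 0.
    M_new = project_psd(M - eta * grad_M - eta * rho * np.eye(len(d)))
    # Gradient step on the threshold, constrained to mu >= 1.
    mu_new = max(1.0, mu - eta * grad_mu)
    return M_new, mu_new

# A pair labeled similar but currently far apart: the update shrinks
# the metric along the direction x - z and raises the threshold.
M0, mu0 = np.eye(2), 1.0
x, z = np.array([2.0, 0.0]), np.array([0.0, 0.0])
M1, mu1 = comid_step(M0, mu0, x, z, y=+1, eta=0.1, rho=0.0)
print(M1, mu1)
```

Other choices of $\psi$ (for example the von Neumann entropy) lead to different update rules; the Euclidean choice is used here only because its closed form is short.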
14/17 Strongly Adaptive Learning
- Combine COMID learners with low static regret on intervals of different scales, and pick the best performers.
- Each learner is a mirror descent learner with fixed $\eta \propto 1/\sqrt{T_i}$.
- Each learner (besides Scale 1) is initialized to the current estimate and weight of the next shortest scale.
15/17 Results - Shift to Confuser Classes
[Figure: panels for the "No drift" and "Drift" cases, showing mean k-NN error rate, k-means NMI, and probability versus time (constraints). Methods compared: Nonadaptive, Adaptive, Batch, Windowed Batch, Online ITML.]
16/17 Conclusions
- Kronecker methods significantly reduce the number of training samples required to perform high-dimensional covariance estimation for matrix-valued data.
- Other work:
  - Block Toeplitz KronPCA.
  - Application to detection of moving targets in synthetic aperture radar.
  - Time-varying Kronecker sum model incorporating sparsity in the inverse.
- Introduced dynamic metric tracking: a strongly adaptive online method to find useful low-dimensional embeddings of changing and/or ambiguous datasets.
17/17 Publications
[1] K. Greenewald, T. Tsiligkaridis, and A. Hero, "Kronecker sum decompositions of space-time data," in Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 IEEE 5th International Workshop on, Dec. 2013.
[2] K. Greenewald and A. Hero, "Robust Kronecker product PCA for spatio-temporal covariance estimation," Signal Processing, IEEE Transactions on, vol. 63, no. 23, Dec.
[3] K. Greenewald and A. O. Hero III, "Kronecker PCA based robust SAR STAP," arXiv preprint.
[4] K. Greenewald and A. Hero, "Kronecker PCA based spatio-temporal modeling of video for dismount classification," in Proceedings of SPIE.
[5] "Regularized block Toeplitz covariance matrix estimation via Kronecker product expansions," in Proceedings of IEEE SSP.
Accepted papers:
- Greenewald, E. Zelnio, and A. Hero, SPIE Defense + Security.
Under revision:
- Greenewald, E. Zelnio, and A. Hero, "Kronecker PCA Based Robust SAR STAP," IEEE Transactions on Aerospace and Electronic Systems.
In preparation:
- Greenewald, S. Kelley, A. Hero, "Dynamic Metric Learning."
- Greenewald, S. Park, S. Zhou, A. Giessing, "Time Varying Matrix Variate Graphical Models."
- Greenewald, S. Zhou, A. Hero, "Multigraphical Lasso."
Metric Embedding of Task-Specific Similarity Greg Shakhnarovich Brown University joint work with Trevor Darrell (MIT) August 9, 2006 Task-specific similarity A toy example: Task-specific similarity A toy
More informationSpectral Methods for Subgraph Detection
Spectral Methods for Subgraph Detection Nadya T. Bliss & Benjamin A. Miller Embedded and High Performance Computing Patrick J. Wolfe Statistics and Information Laboratory Harvard University 12 July 2010
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationCOMPLEX INPUT CONVOLUTIONAL NEURAL NETWORKS FOR WIDE ANGLE SAR ATR
COMPLEX INPUT CONVOLUTIONAL NEURAL NETWORKS FOR WIDE ANGLE SAR ATR Michael Wilmanski #*1, Chris Kreucher *2, & Alfred Hero #3 # University of Michigan 500 S State St, Ann Arbor, MI 48109 1 wilmansk@umich.edu,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationImproved Kalman Filter Initialisation using Neurofuzzy Estimation
Improved Kalman Filter Initialisation using Neurofuzzy Estimation J. M. Roberts, D. J. Mills, D. Charnley and C. J. Harris Introduction It is traditional to initialise Kalman filters and extended Kalman
More informationData Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining
Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 10 What is Data? Collection of data objects
More informationRobust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.
More informationVisual meta-learning for planning and control
Visual meta-learning for planning and control Seminar on Current Works in Computer Vision @ Chair of Pattern Recognition and Image Processing. Samuel Roth Winter Semester 2018/19 Albert-Ludwigs-Universität
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More information