Alignment and Analysis of Proteomics Data using Square Root Slope Function Framework


1 Alignment and Analysis of Proteomics Data using Square Root Slope Function Framework
J. Derek Tucker, Department of Statistics, Florida State University, Tallahassee, FL
CTW: Statistics of Warpings and Phase Variations, 2012
Tucker SRSF FDA - Proteomics 1/19

2 Problem Introduction
Given: a collection of observed Total Ion Count (TIC) chromatograms.
Goals: we would like to align the data, study their variability (fPCA), develop probability models that capture this variability, and generate random samples.
Requirement: a proper metric structure on the space of these functions.
Our method: phase and amplitude separation using the elastic metric, as presented by A. Srivastava.

3 Problem Introduction: Results on Proteomics Data
TIC (Total Ion Count) chromatograms of blood samples. Used in protein profiling, assuming that proteins with different abundances are functionally related to disease processes.
[Figure panels: Original Data; Aligned Data]

4 Problem Introduction: Zoom In on Alignment
TIC (Total Ion Count) chromatograms of blood samples. Used in protein profiling, assuming that proteins with different abundances are functionally related to disease processes.
[Figure panels, zoomed: Original Data; Aligned Data]

5 Problem Introduction: Results on Proteomics Data
A partial answer key is available in which several peaks have been manually identified. The alignment is good, and the answer key was not used in the alignment.
[Figure panels: Original Data; Aligned Data]

6 Problem Introduction: Zoom In on Alignment
A partial answer key is available in which several peaks have been manually identified. The alignment is good, and the answer key was not used in the alignment.
[Figure panels, zoomed: Original Data; Aligned Data]

7 Problem Introduction: Comparison with Other Methods
Comparison with the MBM (James 2007) and MSE (Ramsay and Silverman 2005) methods.
[Figure panels: MBM; MSE]

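8 Problem Introduction: Alignment Performance
We can also quantify the alignment performance using the decrease in the cumulative cross-sectional variance of the aligned functions:
$$\mathrm{Var}(\{g_i\}) = \frac{1}{n-1} \int_0^1 \sum_{i=1}^{n} \left( g_i(t) - \frac{1}{n} \sum_{i=1}^{n} g_i(t) \right)^2 dt$$
Define: Original Variance $= \mathrm{Var}(\{f_i\})$, Amplitude Variance $= \mathrm{Var}(\{\tilde{f}_i\})$, Phase Variance $= \mathrm{Var}(\{\mu_f \circ \gamma_i\})$, where $\tilde{f}_i$ denotes the aligned functions.
[Table: amplitude-variance and phase-variance for the Original Variance, Elastic Method, MBM, and MSE columns; the numerical values did not survive transcription]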
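The variance measure above can be evaluated numerically once the functions are sampled on a common grid. The following is a minimal sketch, assuming each function is stored as a row of a NumPy array and the integral is approximated with the trapezoid rule; the names `cumulative_variance` and `_trapezoid` are illustrative and do not come from the slides.

```python
import numpy as np

def _trapezoid(y, t):
    """Trapezoid-rule approximation of the integral of y over the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def cumulative_variance(G, t):
    """Cumulative cross-sectional variance of a set of functions.

    G : (n, T) array, row i holds g_i sampled on the grid t
    t : (T,) increasing grid on [0, 1]
    """
    n = G.shape[0]
    cross_mean = G.mean(axis=0)                    # (1/n) sum_i g_i(t)
    sq_dev = ((G - cross_mean) ** 2).sum(axis=0)   # sum_i (g_i(t) - mean(t))^2
    return _trapezoid(sq_dev, t) / (n - 1)

# Toy check: two constant functions, 0 and 2, on [0, 1].
t = np.linspace(0.0, 1.0, 101)
G = np.vstack([np.zeros_like(t), 2.0 * np.ones_like(t)])
toy_variance = cumulative_variance(G, t)
```

Applying the same function to the original, aligned, and warped-mean collections would give the three quantities defined on the slide.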

9 Phase Variability: Analysis of Warping Functions using Horizontal fPCA
We have a collection of warping functions in the space $\Gamma$ and we want to model their variability.
$\Gamma$ is a nonlinear manifold, so we cannot perform fPCA on it directly.
We choose to represent warping functions by their SRSFs, as presented by A. Srivastava:
$$\psi(t) = \sqrt{\dot{\gamma}(t)}$$
The squared $L^2$ norm of this SRSF is:
$$\|\psi\|^2 = \int_0^1 \psi(t)^2 \, dt = \int_0^1 \dot{\gamma}(t) \, dt = \gamma(1) - \gamma(0) = 1$$
Hence, the space of such SRSFs is a unit Hilbert sphere in $L^2$; call it $\Psi$.
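The unit-norm property can be checked numerically. This is a small sketch under stated assumptions: the warping function below is a hypothetical example (not taken from the data), the derivative is approximated with `numpy.gradient`, and the helper names are illustrative.

```python
import numpy as np

def warp_to_psi(gamma, t):
    """SRSF of a warping function: psi(t) = sqrt(d gamma / dt)."""
    gamma_dot = np.gradient(gamma, t)
    # Clip tiny negative values that finite differences could produce.
    return np.sqrt(np.clip(gamma_dot, 0.0, None))

def squared_l2_norm(psi, t):
    """Trapezoid-rule approximation of the integral of psi(t)^2 over t."""
    y = psi ** 2
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 1001)
# Hypothetical warping: increasing, with gamma(0) = 0 and gamma(1) = 1.
gamma = (np.exp(t) - 1.0) / (np.e - 1.0)
psi = warp_to_psi(gamma, t)
norm_sq = squared_l2_norm(psi, t)   # close to gamma(1) - gamma(0) = 1
```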

10 Phase Variability: Results on Proteomics Data
[Figure] From left to right: the observed warping functions, their Karcher mean, and the first three principal directions of the observed data.

11 Amplitude Variability: Analysis of Aligned Functions using Vertical fPCA
The aligned functions can be statistically analyzed in a standard way (in $L^2$) using cross-sectional computations in the SRSF space.
To calculate this properly we need a joint fPCA that includes the vertical variability of $\mathcal{F}$: a functional variable $f_i$ is analyzed using the pair $h_i = (q_i, f_i(0))$ rather than just $q_i$.
Define the covariance operator
$$K_h(s, t) = \frac{1}{n-1} \sum_{i=1}^{n} E\left[ (h_i(s) - \mu_h(s)) (h_i(t) - \mu_h(t)) \right]$$
where $\mu_h = [\mu_q \;\; \bar{f}(0)]$.
Taking the SVD, $K_h = U_h \Sigma_h V_h^T$.
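In discretized form, the joint analysis amounts to appending $f_i(0)$ to each sampled SRSF, forming the sample covariance, and taking its SVD. The sketch below assumes the aligned SRSFs are already available as rows of an array and ignores grid-spacing weights; the function name `vertical_fpca` and its arguments are illustrative, not the author's code.

```python
import numpy as np

def vertical_fpca(q_aligned, f0, k):
    """Joint (vertical) fPCA on the pairs h_i = (q_i, f_i(0)).

    q_aligned : (n, T) array of aligned SRSFs sampled on a common grid
    f0        : (n,) array of initial values f_i(0)
    k         : number of principal directions to keep
    """
    H = np.column_stack([q_aligned, f0])       # row i is h_i
    mu_h = H.mean(axis=0)                      # [mu_q, mean of f(0)]
    Hc = H - mu_h
    K_h = Hc.T @ Hc / (H.shape[0] - 1)         # sample covariance
    U, S, _ = np.linalg.svd(K_h)               # K_h = U S V^T
    coeffs = Hc @ U[:, :k]                     # principal coefficients
    return mu_h, U[:, :k], S[:k], coeffs

# Synthetic stand-in data, only to show the call.
rng = np.random.default_rng(0)
q_demo = rng.normal(size=(15, 50))
f0_demo = rng.normal(size=15)
mu_h, U_h, sing_vals, c = vertical_fpca(q_demo, f0_demo, k=3)
```

Because the covariance is symmetric and positive semi-definite, the singular values coincide with its eigenvalues, so each one equals the sample variance of the corresponding coefficient.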

12 Amplitude Variability: Results on Proteomics Data
[Figure panels: first 2 vertical principal-geodesic paths; first 5 eigenvalues]
Most of the information is captured in the first few directions.

13 Modeling of Phase and Amplitude Components
Let $c = (c_1, \ldots, c_{k_1})$ and $z = (z_1, \ldots, z_{k_2})$ be the dominant principal coefficients of the amplitude and phase components, respectively.
Recall that $c_j = \langle h, U_{h,j} \rangle$ and $z_j = \langle v, U_{\psi,j} \rangle$.
We can reconstruct the amplitude component using
$$q = \mu_q + \sum_{j=1}^{k_1} c_j U_{h,j}$$
Similarly, for the phase component we use $v = \sum_{j=1}^{k_2} z_j U_{\psi,j}$, then
$$\psi = \cos(\|v\|) \mu_\psi + \sin(\|v\|) \frac{v}{\|v\|}$$
and then
$$\gamma_s(t) = \int_0^t \psi(s)^2 \, ds$$
Combining the two random quantities, we obtain a random function $f_s \circ \gamma_s$.
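The reconstruction formulas above translate directly into a few array operations. The following is a sketch under assumptions: functions are sampled on a uniform grid, inner products use the trapezoid rule, and the mean $\mu_\psi$ and basis $U_\psi$ used in the example are hypothetical (the identity warp and two cosine directions), chosen only so the code runs.

```python
import numpy as np

def inner(a, b, t):
    """Trapezoid-rule L2 inner product of two sampled functions."""
    y = a * b
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def reconstruct_srsf(mu_q, U_h, c):
    """q = mu_q + sum_j c_j U_{h,j}."""
    return mu_q + U_h @ c

def sphere_exp(mu_psi, v, t):
    """psi = cos(|v|) mu_psi + sin(|v|) v / |v| on the unit Hilbert sphere."""
    norm_v = np.sqrt(inner(v, v, t))
    if norm_v < 1e-12:            # zero shooting vector: stay at the mean
        return mu_psi.copy()
    return np.cos(norm_v) * mu_psi + np.sin(norm_v) * v / norm_v

def psi_to_gamma(psi, t):
    """gamma(t) = integral of psi^2 from 0 to t, rescaled so gamma(1) = 1."""
    incr = 0.5 * (psi[1:] ** 2 + psi[:-1] ** 2) * np.diff(t)
    gamma = np.concatenate(([0.0], np.cumsum(incr)))
    return gamma / gamma[-1]      # guards against discretization error

t = np.linspace(0.0, 1.0, 501)
mu_psi = np.ones_like(t)                          # SRSF of the identity warp
U_psi = np.column_stack([np.sqrt(2) * np.cos(np.pi * t),
                         np.sqrt(2) * np.cos(2 * np.pi * t)])
z = np.array([0.3, -0.1])                         # hypothetical coefficients
v = U_psi @ z
psi_s = sphere_exp(mu_psi, v, t)
gamma_s = psi_to_gamma(psi_s, t)
```

Both example directions are orthogonal to the constant function, so `v` is a tangent vector at `mu_psi` and the resulting `psi_s` stays on the unit sphere.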

14 Modeling of Phase and Amplitude Components: Modeling Types
Gaussian models on fPCA coefficients:
Model $f_s(0)$, $c$, and $z$ as multivariate normal random variables.
The mean of $f_s(0)$ is $\bar{f}(0)$, while the means of $c$ and $z$ are zero vectors.
Their joint covariance matrix is of the type
$$\begin{bmatrix} \sigma_0^2 & L_1 & L_2 \\ L_1^T & \Sigma_h & S \\ L_2^T & S^T & \Sigma_\psi \end{bmatrix} \in \mathbb{R}^{(k_1+k_2+1) \times (k_1+k_2+1)}$$
Here, $L_1 \in \mathbb{R}^{1 \times k_1}$ captures the covariance between $f(0)$ and $c$, $L_2 \in \mathbb{R}^{1 \times k_2}$ between $f(0)$ and $z$, and $S \in \mathbb{R}^{k_1 \times k_2}$ between $c$ and $z$.
Non-parametric models on fPCA coefficients:
Use kernel density estimation, where the density of $f_s(0)$, of each of the $k_1$ components of $c$, and of each of the $k_2$ components of $z$ can be estimated using
$$p_{ker}(x) = \frac{1}{nb} \sum_{i=1}^{n} K\left( \frac{x - x_i}{b} \right)$$
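Both modeling types can be sketched in a few lines. The code below is illustrative only: it assumes the coefficients are stored as arrays with one row per observation, estimates the joint covariance empirically with `numpy.cov`, and uses a Gaussian kernel $K$ with a user-chosen bandwidth $b$ (the slides do not specify the kernel or the bandwidth). All function names are hypothetical.

```python
import numpy as np

def fit_joint_gaussian(f0, c, z):
    """Mean and covariance of the stacked vector (f(0), c, z).

    With k1 = c.shape[1], the covariance has the block layout from the slide:
    cov[0, 0] = sigma_0^2, cov[0, 1:1+k1] = L_1, cov[0, 1+k1:] = L_2,
    cov[1:1+k1, 1:1+k1] = Sigma_h, cov[1:1+k1, 1+k1:] = S, and so on.
    """
    X = np.column_stack([f0, c, z])
    return X.mean(axis=0), np.cov(X, rowvar=False)

def sample_joint_gaussian(mean, cov, m, rng):
    """Draw m random vectors (f_s(0), c, z) from the fitted Gaussian."""
    return rng.multivariate_normal(mean, cov, size=m)

def kde_1d(x, samples, b):
    """p_ker(x) = 1/(n b) * sum_i K((x - x_i) / b) with a Gaussian kernel K."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - samples[None, :]) / b
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.sum(axis=1) / (samples.size * b)

# Synthetic stand-in coefficients, only to show the calls.
rng = np.random.default_rng(0)
f0_demo = rng.normal(size=40)
c_demo = rng.normal(size=(40, 2))
z_demo = rng.normal(size=(40, 3))
mean, cov = fit_joint_gaussian(f0_demo, c_demo, z_demo)
draws = sample_joint_gaussian(mean, cov, 10, rng)
density = kde_1d(np.linspace(-10.0, 10.0, 4001), f0_demo, b=0.3)
```

Each sampled row can then be split back into $f_s(0)$, $c$, and $z$ and passed through the reconstruction formulas of the previous slide.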

15 Modeling of Phase and Amplitude Components: Modeling Results
[Figure panels: Amplitude Random Samples; Random Warping Functions; Random Samples]
Comparing them with the original data set, we conclude that the random samples are similar to the original data.

16 Modeling of Phase and Amplitude Components: Classification using Pair-Wise Distances
[Figure panels: pairwise-distance matrices for $d_a$, $d_p$, and $L^2$]
The pairwise-distance matrices for $d_a$ and $d_p$ show more structure than the one for the standard $L^2$ distance.
Classification rates: $d_a$ = 87% (13/15), $d_p$ = 33% (5/15), $L^2$ = 27% (4/15).

17 Modeling of Phase and Amplitude Components: Cumulative Match Characteristic Curve
A CMC curve plots the probability of classification against the returned candidate list size.
We also compare with a naive distance, $d_{Naive} = \min_{\gamma \in \Gamma} \| f_i - f_j \circ \gamma \|$.
[Figure: classification rate versus list size for $L^2$, $d_{Naive}$, $d_p$, and $d_a$]
Classification performance of $d_{Naive}$: 80% (12/15).
Our method rapidly approaches a classification rate of over 90%, in contrast to the $d_{Naive}$ and standard $L^2$ distances.
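A CMC curve can be computed from any pairwise-distance matrix and a vector of class labels. The slides do not state the exact classification protocol, so the sketch below assumes a leave-one-out nearest-neighbor ranking: for each sample, the other samples are sorted by distance, and a list of size $k$ counts as a hit if it contains at least one sample of the same class. The function name `cmc_curve` and the toy data are illustrative.

```python
import numpy as np

def cmc_curve(D, labels):
    """Leave-one-out CMC curve from a pairwise-distance matrix.

    D      : (n, n) symmetric distance matrix
    labels : (n,) class labels
    Returns list sizes 1..n-1 and the fraction of samples whose first
    same-class neighbor appears within a candidate list of that size.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n = D.shape[0]
    first_hit = np.empty(n, dtype=int)
    for i in range(n):
        order = np.argsort(D[i], kind="stable")
        order = order[order != i]                   # leave sample i out
        hits = np.nonzero(labels[order] == labels[i])[0]
        first_hit[i] = hits[0] + 1 if hits.size else n  # n means never matched
    sizes = np.arange(1, n)
    rates = np.array([(first_hit <= k).mean() for k in sizes])
    return sizes, rates

# Toy example: four points on a line with interleaved classes.
x = np.array([0.0, 10.0, 1.0, 11.0])
toy_labels = np.array([0, 0, 1, 1])
D_toy = np.abs(x[:, None] - x[None, :])
toy_sizes, toy_rates = cmc_curve(D_toy, toy_labels)
```

Under this assumed protocol, the rate at list size 1 is the leave-one-out nearest-neighbor classification rate.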

18 Summary and Conclusions: Summary and Future Work
Conclusions:
Excellent alignment was achieved using our square-root slope function framework.
We used this framework to separate the amplitude and phase of the given data.
We performed fPCA on amplitude and phase and imposed models on the components.
We verified the model using random sampling.
The theory behind this work has been submitted to Computational Statistics and Data Analysis (2012).
Future work:
Expand the analysis of classification to probabilistic models, given more samples.
Analyze and understand how additive noise impacts SRSFs and the Karcher mean calculation.

19 Questions?


Intrinsic Bayesian Active Contours for Extraction of Object Boundaries in Images Noname manuscript No. (will be inserted by the editor) Intrinsic Bayesian Active Contours for Extraction of Object Boundaries in Images Shantanu H. Joshi 1 and Anuj Srivastava 2 1 Department of Electrical

More information

CSE 554 Lecture 7: Alignment

CSE 554 Lecture 7: Alignment CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex

More information

L5: Quadratic classifiers

L5: Quadratic classifiers L5: Quadratic classifiers Bayes classifiers for Normally distributed classes Case 1: Σ i = σ 2 I Case 2: Σ i = Σ (Σ diagonal) Case 3: Σ i = Σ (Σ non-diagonal) Case 4: Σ i = σ 2 i I Case 5: Σ i Σ j (general

More information

Announcements (repeat) Principal Components Analysis

Announcements (repeat) Principal Components Analysis 4/7/7 Announcements repeat Principal Components Analysis CS 5 Lecture #9 April 4 th, 7 PA4 is due Monday, April 7 th Test # will be Wednesday, April 9 th Test #3 is Monday, May 8 th at 8AM Just hour long

More information

Introduction to the Mathematics of Medical Imaging

Introduction to the Mathematics of Medical Imaging Introduction to the Mathematics of Medical Imaging Second Edition Charles L. Epstein University of Pennsylvania Philadelphia, Pennsylvania EiaJTL Society for Industrial and Applied Mathematics Philadelphia

More information

The registration of functional data

The registration of functional data The registration of functional data Page 1 of 38 1. Page 2 of 38 Ten female growth acceleration curves Page 3 of 38 These curves show two types of variation: The usual amplitude variation, seen in the

More information

Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition

Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition Jingyong Su Department of Mathematics & Statistics Texas Tech University, Lubbock, TX jingyong.su@ttu.edu

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

S A T T A I T ST S I T CA C L A L DAT A A T

S A T T A I T ST S I T CA C L A L DAT A A T Microarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 5 Linear Regression dr. Petr Nazarov 31-10-2011 petr.nazarov@crp-sante.lu Statistical data analysis in Excel. 5. Linear regression OUTLINE Lecture

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Chapter 4. Chapter 4 sections

Chapter 4. Chapter 4 sections Chapter 4 sections 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP: 4.8 Utility Expectation

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

Covariance to PCA. CS 510 Lecture #8 February 17, 2014

Covariance to PCA. CS 510 Lecture #8 February 17, 2014 Covariance to PCA CS 510 Lecture 8 February 17, 2014 Status Update Programming Assignment 2 is due March 7 th Expect questions about your progress at the start of class I still owe you Assignment 1 back

More information

Nonparametric Density Estimation

Nonparametric Density Estimation Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

More information

FUNCTIONAL DATA ANALYSIS FOR DENSITY FUNCTIONS BY TRANSFORMATION TO A HILBERT SPACE

FUNCTIONAL DATA ANALYSIS FOR DENSITY FUNCTIONS BY TRANSFORMATION TO A HILBERT SPACE Submitted to the Annals of Statistics FUNCTIONAL DATA ANALYSIS FOR DENSITY FUNCTIONS BY TRANSFORMATION TO A HILBERT SPACE By Alexander Petersen, and Hans-Georg Müller, Department of Statistics, University

More information

Nonparametric Bayes tensor factorizations for big data

Nonparametric Bayes tensor factorizations for big data Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional

More information

Preprocessing & dimensionality reduction

Preprocessing & dimensionality reduction Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that

More information

Welcome to Copenhagen!

Welcome to Copenhagen! Welcome to Copenhagen! Schedule: Monday Tuesday Wednesday Thursday Friday 8 Registration and welcome 9 Crash course on Crash course on Introduction to Differential and Differential and Information Geometry

More information

Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network

Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network Supplementary Material SPNet: Shape Prediction using a Fully Convolutional Neural Network S M Masudur Rahman Al Arif 1, Karen Knapp 2 and Greg Slabaugh 1 1 City, University of London 2 University of Exeter

More information