Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA


1 Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
Aapo Hyvärinen, with Hiroshi Morioka
Dept of Computer Science, University of Helsinki, Finland
Facebook AI Summit, 13th June 2016

2 Abstract
How to extract features from multi-dimensional data when there are no labels (unsupervised)?
We consider data with temporal structure.
We learn features that enable discriminating data from different time segments (taking segment labels as class labels).
We use ordinary neural networks with multinomial logistic regression: the last hidden layer gives the features.
Surprising theoretical result: this learns to estimate a nonlinear ICA model x(t) = f(s(t)) with a general nonlinear mixing f and nonstationary components s_i(t).

3 Background: Need for generative models like ICA
Unsupervised deep learning is a largely unsolved problem.
Important since labels are often difficult (costly) to obtain.
Most approaches are heuristic; it is not very clear what they are doing.
Best would be to define a generative model and estimate it.
Cf. linear unsupervised learning: independent component analysis (ICA) / sparse coding are generative models which are well-defined, i.e. identifiable (Darmois-Skitovich around 1950; Comon, 1994).
If we define and estimate generative models:
  we know better what we are doing
  we can use all the theory of probabilistic methods
  ... but admittedly, it is theoretically more challenging.

4 Background: Nonlinear ICA may not be well-defined
For a random vector x, it is easy to assume a nonlinear generative model
  x = f(s)    (1)
with mutually independent hidden/latent components s_i.
However, the model is not identifiable, i.e. many different nonlinear transforms of x give independent components: there is no guarantee we can recover the original s_i, if we assume data with no temporal structure and general smooth invertible nonlinearities f (Darmois, 1952; Hyvärinen and Pajunen, 1999).
Nevertheless, estimation has been attempted by many authors, e.g. Tan-Zurada (2001), Almeida (2003), and recent deep learning work (Dinh et al., 2015).

5 Background: Temporal correlations can help
Harmeling et al. (2003) suggested using temporal structure: find features that change as slowly as possible (Földiák, 1991); they used kernel-based models of nonlinearities.
A well-known idea in the linear ICA (source separation) literature (Tong et al., 1991; Belouchrani, 1997).
In the linear case, identifiable if the autocorrelations are distinct for different sources (a rather strict condition!).
In the nonlinear case, identifiability is unknown, but certainly not better than in the linear case!

6 Background: Temporal structure as nonstationarity
A less-known principle in linear source separation: sources are nonstationary (Matsuoka et al., 1995).
Usually, we assume the variances of the sources change in time:
  s_i(t) ~ N(0, σ_i(t)^2)    (2)
The linear model x(t) = As(t) is identifiable under weak assumptions (Pham and Cardoso, 2001).
So far, not used in the nonlinear case...

7 Time-contrastive learning: Intuitive motivation
Assume we are given an n-dimensional time series x(t), with t the time index.
Divide the time series (arbitrarily) into k segments (e.g. bins of equal size, with a fixed number of points in each segment).
Train a multi-layer perceptron to discriminate between segments: the number of classes is k, and the index of the segment is the class label.
Use multinomial logistic regression, i.e. well-known algorithms/software.
The classifier should find a good representation in the hidden layers, in particular regarding nonstationarity.
Turns unsupervised learning into supervised learning, cf. noise-contrastive estimation or generative adversarial nets.
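A minimal sketch of this procedure in Python (my own illustration, not the authors' implementation), assuming numpy and scikit-learn are available: segment a multivariate time series, train an ordinary MLP classifier on the segment labels, and read out the last hidden layer as the learned features. The input here is placeholder noise, and the architecture and hyperparameters are arbitrary choices.

```python
# Minimal time-contrastive learning (TCL) sketch; NOT the authors' code.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy observed time series: n dimensions, T time points
# (here just noise; in practice x(t) is your nonstationary data).
n, T, seg_len = 5, 2**14, 512
x = rng.standard_normal((T, n))

# 1. Divide the series into k segments; the segment index is the class label.
k = T // seg_len
labels = np.repeat(np.arange(k), seg_len)

# 2. Train an ordinary MLP with a multinomial logistic regression output
#    to discriminate between the segments.
mlp = MLPClassifier(hidden_layer_sizes=(32, 8), activation="relu",
                    max_iter=300, random_state=0)
mlp.fit(x, labels)

# 3. The last hidden layer gives the features h(x(t)):
#    run the forward pass up to (and including) the last hidden layer.
def hidden_features(model, data):
    h = data
    for W, b in zip(model.coefs_[:-1], model.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)   # ReLU hidden layers
    return h

h = hidden_features(mlp, x)
print("feature matrix h(x):", h.shape)   # (T, 8)
```

On real data, x would be the observed nonstationary time series; the point is only that standard multinomial classification machinery is all that is needed.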

8-12 Theorem: TCL estimates nonlinear nonstationary ICA
(Stated incrementally over slides 8-12; the full statement is as follows.)
Assume the data follows the nonlinear ICA model x(t) = f(s(t)) with
  independent sources s_i(t) with nonstationary variances, i.e. s_i(t) ~ N(0, σ_i(τ)^2) in segment τ,
  a smooth, invertible nonlinear mixing f: R^n → R^n,
  (+ technical assumptions on the non-degeneracy of σ_i(τ)).
Assume we apply time-contrastive learning on x(t), i.e. logistic regression to discriminate between time segments, using an MLP with last-hidden-layer outputs collected in the vector h(x(t)).
Then
  s(t)^2 = A h(x(t))
for some linear mixing matrix A (squaring is element-wise).
I.e.: TCL demixes the nonlinear ICA model up to a linear mixing (which can be estimated by linear ICA) and up to squaring.
This is a constructive proof of identifiability (up to squaring).
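As a toy illustration of the theorem's conclusion (my own construction, not from the paper): if the feature vector h(x(t)) is a linear mixture of the squared sources, then linear ICA applied to h recovers the s_i(t)^2 up to permutation, sign and scaling. scikit-learn's FastICA is used here as the linear ICA step, which is an assumed choice.

```python
# Toy check of "s(t)^2 = A h(x(t))": if h is a linear mixture of the squared
# sources, linear ICA on h recovers the s_i^2. Not the authors' code.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)

# Nonstationary Gaussian sources: variance changes from segment to segment.
n, k, seg_len = 3, 64, 512
sigmas = rng.uniform(0.2, 2.0, size=(k, n))           # sigma_i(tau)
s = np.concatenate([rng.standard_normal((seg_len, n)) * sig for sig in sigmas])

s2 = s**2                                              # element-wise squares
A = rng.standard_normal((n, n))                        # unknown linear mixing
h = s2 @ np.linalg.inv(A).T                            # features, so that s2 = h @ A.T

# Linear ICA on the "features" h recovers the squared sources
# up to permutation, sign and scaling.
est = FastICA(n_components=n, random_state=0).fit_transform(h)

# Best absolute correlation of each estimated component with some s_i^2.
C = np.corrcoef(est.T, s2.T)[:n, n:]
print("max |corr| per estimated component:", np.abs(C).max(axis=1).round(3))
```

In the actual method, h would be the TCL features extracted by the classifier sketched after slide 7 rather than a synthetic mixture.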

13 Illustration and comments
[Figure: (A) the generative model, from the source signals through the nonlinear mixture to the observed signals, divided into time segments 1..T; (B) the TCL feature extractor followed by multinomial logistic regression predicting the segment labels from the feature values.]
Nonstationarity enables identifiability, since independence of the sources must hold for all time points: enough constraints.
Many data sets are well known to be nonstationary: video, EEG/MEG, financial time series.
We can generalize nonstationarity to the exponential family.
We can combine with dimension reduction: find only the nonstationary manifold.

14 Sketch of proof of Theorem
Denote by h the hidden unit outputs; x the data; w_τ the logistic regression coefficients in segment τ; p_τ the pdf in segment τ.
By the theory of logistic regression, we learn differences of the log-pdfs of the classes:
  w_τ^T h(x_t) + b_τ = log p_τ(x_t) − log p_1(x_t) + const.    (3)
By the nonlinear ICA model, we have
  log p_τ(x) = Σ_{i=1}^n λ_{τ,i} s_i^2 + log |det Jg(x)| − log Z(λ_τ),    (4)
where Jg is the Jacobian of g, the inverse of the nonlinear mixing f.
So the s_i^2 and the h_i(x_t) span the same subspace: the s_i^2 are linear transformations of the hidden units.
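A small numerical check of the key step (my own toy, not from the paper): for Gaussian sources with segment-dependent variances, the difference of log-pdfs between two segments is an affine function of the squared sources, with coefficients λ_{τ,i} = 1/(2σ_{1,i}^2) − 1/(2σ_{τ,i}^2); the Jacobian term in (4) does not depend on the segment, so it cancels in the difference. numpy and scipy are assumed dependencies.

```python
# Verify that the log-pdf difference between two segments is affine in s_i^2,
# as used in Eqs. (3)-(4). A toy check, not from the paper.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 4
sigma1 = rng.uniform(0.5, 2.0, n)        # source std devs in segment 1
sigma_tau = rng.uniform(0.5, 2.0, n)     # source std devs in segment tau

s = rng.standard_normal((1000, n)) * sigma_tau   # sample points from segment tau

# Left-hand side: difference of the log densities of the two segments.
lhs = (norm.logpdf(s, scale=sigma_tau).sum(axis=1)
       - norm.logpdf(s, scale=sigma1).sum(axis=1))

# Right-hand side: affine function of s_i^2 with coefficients lambda_{tau,i}.
lam = 0.5 / sigma1**2 - 0.5 / sigma_tau**2
const = np.sum(np.log(sigma1) - np.log(sigma_tau))
rhs = s**2 @ lam + const

print("max abs difference:", np.abs(lhs - rhs).max())   # numerically ~0
```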

15 Simulations with artificial data
Create data according to the model, try to recover the sources. The nonlinear mixing is by another MLP; segment length 512 points.
[Figure: left panel, recovery of sources (mean correlation between true and estimated sources) as a function of the number of segments; right panel, classification accuracy (%) as a function of the number of segments, with chance levels shown. Methods compared with L = 1,...,5 mixing layers: TCL; NSVICA (linear nonstationarity-based method); kTDSEP (Harmeling et al., 2003); DAE (denoising autoencoder).]
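A sketch of how such artificial data could be generated (my reconstruction of the setup described above, not the authors' script): nonstationary Gaussian sources with segment-wise variances are passed through a randomly initialized MLP playing the role of the smooth invertible mixing f; the orthogonal weight matrices and leaky-ReLU units are my assumptions, chosen to keep the mapping invertible.

```python
# Generate artificial data for a TCL experiment: nonstationary Gaussian sources
# mixed by a random MLP. A sketch of the described setup, not the original script.
import numpy as np

rng = np.random.default_rng(3)

n, k, seg_len, n_mix_layers = 5, 128, 512, 3   # dims, segments, segment length, mixing depth

# Nonstationary sources: variance sigma_i(tau)^2 redrawn for every segment tau.
sigmas = rng.uniform(0.2, 2.0, size=(k, n))
s = np.concatenate([rng.standard_normal((seg_len, n)) * sig for sig in sigmas])
labels = np.repeat(np.arange(k), seg_len)

# Smooth invertible nonlinear mixing f: a random MLP with orthogonal (hence
# invertible) weight matrices and leaky-ReLU units, applied layer by layer.
def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

x = s
for _ in range(n_mix_layers):
    W = np.linalg.qr(rng.standard_normal((n, n)))[0]
    x = leaky_relu(x @ W)

print("observed data x:", x.shape, " segment labels:", labels.shape)
# x and labels can now be fed to the TCL classifier sketched after slide 7.
```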

16 Experiments with brain imaging data
MEG data (like EEG but better).
Sources estimated from resting data (no stimulation).
a) Validation by classifying another data set with four stimulation modalities: visual, auditory, tactile, rest. Trained a linear SVM on the estimated sources. Number of layers in the MLP ranging from 1 to 4.
b) Attempt to visualize the nonlinear processing.
[Figure 3: Real MEG data. a) Classification accuracies (%) of linear SVMs newly trained with task-session data to predict stimulation labels in task sessions, with feature extractors (TCL, DAE, kTDSEP, NSVICA; L = 1 to 4 layers) trained in advance on resting-session data. Error bars give standard errors of the mean across ten repetitions. b) Visualization of features at layers L1-L3.]

17 Conclusion
We proposed the intuitive idea of time-contrastive learning:
  Divide a multivariate time series into segments and learn to discriminate them, e.g. by ordinary MLP (deep) learning.
  Unsupervised learning via supervised learning; no new algorithms or software needed.
TCL can be shown to estimate a nonlinear ICA model:
  with general (smooth, invertible) nonlinear mixing functions,
  assuming the sources are nonstationary.
(Note: the likelihood or mutual information of the nonlinear ICA model would be much more difficult to compute.)
First case of nonlinear ICA (or source separation) with general identifiability results!! (?)
Future work: application to image/video data etc.; combining nonstationarity with autocorrelations.
