Machine Learning: A Bayesian and Optimization Perspective

Sergios Theodoridis

Amsterdam, Boston, Heidelberg, London, New York, Oxford, Paris, San Diego, San Francisco, Singapore, Sydney, Tokyo

Academic Press is an imprint of Elsevier

Preface; Acknowledgments; Notation

CHAPTER 1 Introduction
1.1 What Machine Learning is About; Classification; Regression; Structure and a Road Map of the Book; References

CHAPTER 2 Probability and Stochastic Processes
Introduction; Probability and Random Variables; Probability; Discrete Random Variables; Continuous Random Variables; Mean and Variance; Transformation of Random Variables; Examples of Distributions; Discrete Variables; Continuous Variables; Stochastic Processes; First and Second Order Statistics; Stationarity and Ergodicity; Power Spectral Density; Autoregressive Models; Information Theory; Discrete Random Variables; Continuous Random Variables; Stochastic Convergence; Problems; References

CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions
Introduction; Parameter Estimation: The Deterministic Point of View; 3.3 Linear Regression; Classification; Biased Versus Unbiased Estimation; Biased or Unbiased Estimation?; The Cramér-Rao Lower Bound; Sufficient Statistic; Regularization; The Bias-Variance Dilemma; Mean-Square Error Estimation; Bias-Variance Tradeoff; Maximum Likelihood Method; Linear Regression: The Nonwhite Gaussian Noise Case; Bayesian Inference; The Maximum a Posteriori Probability Estimation Method; Curse of Dimensionality; Validation; Expected and Empirical Loss Functions; Nonparametric Modeling and Estimation; Problems; References

CHAPTER 4 Mean-Square Error Linear Estimation
Introduction; Mean-Square Error Linear Estimation: The Normal Equations; The Cost Function Surface; A Geometric Viewpoint: Orthogonality Condition; Extension to Complex-Valued Variables; Widely Linear Complex-Valued Estimation; Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus; Linear Filtering; MSE Linear Filtering: A Frequency Domain Point of View; Some Typical Applications; Interference Cancellation; System Identification; Deconvolution: Channel Equalization; Algorithmic Aspects: The Levinson and the Lattice-Ladder Algorithms; The Lattice-Ladder Scheme; Mean-Square Error Estimation of Linear Models; The Gauss-Markov Theorem; Constrained Linear Estimation: The Beamforming Case; 4.10 Time-Varying Statistics: Kalman Filtering; Problems; References

CHAPTER 5 Stochastic Gradient Descent: The LMS Algorithm and its Family
5.1 Introduction; The Steepest Descent Method; Application to the Mean-Square Error Cost Function; The Complex-Valued Case; Stochastic Approximation; The Least-Mean-Squares Adaptive Algorithm; Convergence and Steady-State Performance of the LMS in Stationary Environments; Cumulative Loss Bounds; The Affine Projection Algorithm; The Normalized LMS; The Complex-Valued Case; Relatives of the LMS; Simulation Examples; Adaptive Decision Feedback Equalization; The Linearly Constrained LMS; Tracking Performance of the LMS in Nonstationary Environments; Distributed Learning: The Distributed LMS; Cooperation Strategies; The Diffusion LMS; Convergence and Steady-State Performance: Some Highlights; Consensus-Based Distributed Schemes; A Case Study: Target Localization; Some Concluding Remarks: Consensus Matrix; Problems; References

CHAPTER 6 The Least-Squares Family
Introduction; Least-Squares Linear Regression: A Geometric Perspective; Statistical Properties of the LS Estimator; Orthogonalizing the Column Space of X: The SVD Method; Ridge Regression; The Recursive Least-Squares Algorithm; 6.7 Newton's Iterative Minimization Method; RLS and Newton's Method; Steady-State Performance of the RLS; Complex-Valued Data: The Widely Linear RLS; Computational Aspects of the LS Solution; The Coordinate and Cyclic Coordinate Descent Methods; Simulation Examples; Total-Least-Squares; Problems; References

CHAPTER 7 Classification: A Tour of the Classics
Introduction; Bayesian Classification; Average Risk; Decision (Hyper)Surfaces; The Gaussian Distribution Case; The Naive Bayes Classifier; The Nearest Neighbor Rule; Logistic Regression; Fisher's Linear Discriminant; Classification Trees; Combining Classifiers; The Boosting Approach; Boosting Trees; A Case Study: Protein Folding Prediction; Problems; References

CHAPTER 8 Parameter Learning: A Convex Analytic Path
Introduction; Convex Sets and Functions; Convex Sets; Convex Functions; Projections onto Convex Sets; Properties of Projections; Fundamental Theorem of Projections onto Convex Sets; A Parallel Version of POCS; From Convex Sets to Parameter Estimation and Machine Learning; Regression; Classification; 8.7 Infinite Many Closed Convex Sets: The Online Learning Case; Convergence of APSM; Constrained Learning; The Distributed APSM; Optimizing Nonsmooth Convex Cost Functions; Subgradients and Subdifferentials; Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case; Online Learning for Convex Optimization; Regret Analysis; Online Learning and Big Data Applications: A Discussion; Proximal Operators; Properties of the Proximal Operator; Proximal Minimization; Proximal Splitting Methods for Optimization; Problems; Appendix to Chapter; References

CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations
Introduction; Searching for a Norm; The Least Absolute Shrinkage and Selection Operator (LASSO); Sparse Signal Representation; In Search of the Sparsest Solution; Uniqueness of the ℓ0 Minimizer; Mutual Coherence; Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions; Condition Implied by the Mutual Coherence Number; The Restricted Isometry Property (RIP); Robust Sparse Signal Recovery from Noisy Measurements; Compressed Sensing: The Glory of Randomness; Dimensionality Reduction and Stable Embeddings; Sub-Nyquist Sampling: Analog-to-Information Conversion; A Case Study: Image De-Noising; Problems; References

CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications
Introduction; Sparsity-Promoting Algorithms; Greedy Algorithms; Iterative Shrinkage/Thresholding (IST) Algorithms; Which Algorithm?: Some Practical Hints; Variations on the Sparsity-Aware Theme; Online Sparsity-Promoting Algorithms; LASSO: Asymptotic Performance; The Adaptive Norm-Weighted LASSO; Adaptive CoSaMP (AdCoSaMP) Algorithm; Sparse Adaptive Projection Subgradient Method (SpAPSM); Learning Sparse Analysis Models; Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries; Cosparsity; A Case Study: Time-Frequency Analysis; Appendix to Chapter 10: Some Hints from the Theory of Frames; Problems; References

CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces
Introduction; Generalized Linear Models; Volterra, Wiener, and Hammerstein Models; Cover's Theorem: Capacity of a Space in Linear Dichotomies; Reproducing Kernel Hilbert Spaces; Some Properties and Theoretical Highlights; Examples of Kernel Functions; Representer Theorem; Semiparametric Representer Theorem; Nonparametric Modeling: A Discussion; Kernel Ridge Regression; Support Vector Regression; The Linear ε-insensitive Optimal Regression; Kernel Ridge Regression Revisited; Optimal Margin Classification: Support Vector Machines; Linearly Separable Classes: Maximum Margin Classifiers; Nonseparable Classes; Performance of SVMs and Applications; Choice of Hyperparameters; Computational Considerations; Multiclass Generalizations; Online Learning in RKHS; The Kernel LMS (KLMS); The Naive Online R_reg Minimization Algorithm (NORMA); The Kernel APSM Algorithm; Multiple Kernel Learning; Nonparametric Sparsity-Aware Learning: Additive Models; A Case Study: Authorship Identification; Problems; References

CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm
Introduction; Regression: A Bayesian Perspective; The Maximum Likelihood Estimator; The MAP Estimator; The Bayesian Approach; The Evidence Function and Occam's Razor Rule; Exponential Family of Probability Distributions; The Exponential Family and the Maximum Entropy Method; Latent Variables and the EM Algorithm; The Expectation-Maximization Algorithm; The EM Algorithm: A Lower Bound Maximization View; Linear Regression and the EM Algorithm; Gaussian Mixture Models; Gaussian Mixture Modeling and Clustering; Combining Learning Models: A Probabilistic Point of View; Mixing Linear Regression Models; Mixing Logistic Regression Models; Problems; Appendix to Chapter; PDFs with Exponent of Quadratic Form; The Conditional from the Joint Gaussian Pdf; The Marginal from the Joint Gaussian Pdf; The Posterior from Gaussian Prior and Conditional Pdfs; References

CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models
Introduction; Variational Approximation in Bayesian Learning; The Case of the Exponential Family of Probability Distributions; 13.3 A Variational Bayesian Approach to Linear Regression; A Variational Bayesian Approach to Gaussian Mixture Modeling; When Bayesian Inference Meets Sparsity; Sparse Bayesian Learning (SBL); The Spike and Slab Method; The Relevance Vector Machine Framework; Adopting the Logistic Regression Model for Classification; Convex Duality and Variational Bounds; Sparsity-Aware Regression: A Variational Bound Bayesian Path; Sparsity-Aware Learning: Some Concluding Remarks; Expectation Propagation; Nonparametric Bayesian Modeling; The Chinese Restaurant Process; Inference; Dirichlet Processes; The Stick-Breaking Construction of a DP; Gaussian Processes; Covariance Functions and Kernels; Regression; Classification; A Case Study: Hyperspectral Image Unmixing; Hierarchical Bayesian Modeling; Experimental Results; Problems; References

CHAPTER 14 Monte Carlo Methods
Introduction; Monte Carlo Methods: The Main Concept; Random Number Generation; Random Sampling Based on Function Transformation; Rejection Sampling; Importance Sampling; Monte Carlo Methods and the EM Algorithm; Markov Chain Monte Carlo Methods; Ergodic Markov Chains; The Metropolis Method; Convergence Issues; Gibbs Sampling; In Search of More Efficient Methods: A Discussion; A Case Study: Change-Point Detection; Problems; References

CHAPTER 15 Probabilistic Graphical Models: Part I
Introduction; The Need for Graphical Models; Bayesian Networks and the Markov Condition; Graphs: Basic Definitions; Some Hints on Causality; D-Separation; Sigmoidal Bayesian Networks; Linear Gaussian Models; Multiple-Cause Networks; I-Maps, Soundness, Faithfulness, and Completeness; Undirected Graphical Models; Independencies and I-Maps in Markov Random Fields; The Ising Model and Its Variants; Conditional Random Fields (CRFs); Factor Graphs; Graphical Models for Error-Correcting Codes; Moralization of Directed Graphs; Exact Inference Methods: Message-Passing Algorithms; Exact Inference in Chains; Exact Inference in Trees; The Sum-Product Algorithm; The Max-Product and Max-Sum Algorithms; Problems; References

CHAPTER 16 Probabilistic Graphical Models: Part II
Introduction; Triangulated Graphs and Junction Trees; Constructing a Join Tree; Message-Passing in Junction Trees; Approximate Inference Methods; Variational Methods: Local Approximation; Block Methods for Variational Approximation; Loopy Belief Propagation; Dynamic Graphical Models; 16.5 Hidden Markov Models; Inference; Learning the Parameters in an HMM; Discriminative Learning; Beyond HMMs: A Discussion; Factorial Hidden Markov Models; Time-Varying Dynamic Bayesian Networks; Learning Graphical Models; Parameter Estimation; Learning the Structure; Problems; References

CHAPTER 17 Particle Filtering
Introduction; Sequential Importance Sampling; Importance Sampling Revisited; Resampling; Sequential Sampling; Kalman and Particle Filtering; Kalman Filtering: A Bayesian Point of View; Particle Filtering; Degeneracy; Generic Particle Filtering; Auxiliary Particle Filtering; Problems; References

CHAPTER 18 Neural Networks and Deep Learning
Introduction; The Perceptron; The Kernel Perceptron Algorithm; Feed-Forward Multilayer Neural Networks; The Backpropagation Algorithm; The Gradient Descent Scheme; Beyond the Gradient Descent Rationale; Selecting a Cost Function; Pruning the Network; Universal Approximation Property of Feed-Forward Neural Networks; Neural Networks: A Bayesian Flavor; 18.8 Learning Deep Networks; The Need for Deep Architectures; Training Deep Networks; Training Restricted Boltzmann Machines; Training Deep Feed-Forward Networks; Deep Belief Networks; Variations on the Deep Learning Theme; Gaussian Units; Stacked Autoencoders; The Conditional RBM; Case Study: A Deep Network for Optical Character Recognition; Case Study: A Deep Autoencoder; Example: Generating Data via a DBN; Problems; References

CHAPTER 19 Dimensionality Reduction and Latent Variables Modeling
Introduction; Intrinsic Dimensionality; Principal Component Analysis; Canonical Correlation Analysis; Relatives of CCA; Independent Component Analysis; ICA and Gaussianity; ICA and Higher Order Cumulants; Non-Gaussianity and Independent Components; ICA Based on Mutual Information; Alternative Paths to ICA; Dictionary Learning: The k-SVD Algorithm; Nonnegative Matrix Factorization; Learning Low-Dimensional Models: A Probabilistic Perspective; Factor Analysis; Probabilistic PCA; Mixture of Factors Analyzers: A Bayesian View to Compressed Sensing; Nonlinear Dimensionality Reduction; Kernel PCA; Graph-Based Methods; Low-Rank Matrix Factorization: A Sparse Modeling Path; Matrix Completion; Robust PCA; Applications of Matrix Completion and Robust PCA; A Case Study: fMRI Data Analysis; Problems; References

APPENDIX A Linear Algebra
A.1 Properties of Matrices; A.2 Positive Definite and Symmetric Matrices; A.3 Wirtinger Calculus; References

APPENDIX B Probability Theory and Statistics
B.1 Cramér-Rao Bound; B.2 Characteristic Functions; B.3 Moments and Cumulants; B.4 Edgeworth Expansion of a Pdf; Reference

APPENDIX C Hints on Constrained Optimization
C.1 Equality Constraints; C.2 Inequality Constraints; References

Index

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fourth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada Front ice Hall PRENTICE HALL Upper Saddle River, New Jersey 07458 Preface

More information

Independent Component Analysis. Contents

Independent Component Analysis. Contents Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle

More information

ADAPTIVE FILTER THEORY

ADAPTIVE FILTER THEORY ADAPTIVE FILTER THEORY Fifth Edition Simon Haykin Communications Research Laboratory McMaster University Hamilton, Ontario, Canada International Edition contributions by Telagarapu Prabhakar Department

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

Lessons in Estimation Theory for Signal Processing, Communications, and Control

Lessons in Estimation Theory for Signal Processing, Communications, and Control Lessons in Estimation Theory for Signal Processing, Communications, and Control Jerry M. Mendel Department of Electrical Engineering University of Southern California Los Angeles, California PRENTICE HALL

More information

Statistical and Adaptive Signal Processing

Statistical and Adaptive Signal Processing r Statistical and Adaptive Signal Processing Spectral Estimation, Signal Modeling, Adaptive Filtering and Array Processing Dimitris G. Manolakis Massachusetts Institute of Technology Lincoln Laboratory

More information

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Fundamentals of Applied Probability and Random Processes

Fundamentals of Applied Probability and Random Processes Fundamentals of Applied Probability and Random Processes,nd 2 na Edition Oliver C. Ibe University of Massachusetts, LoweLL, Massachusetts ip^ W >!^ AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS

More information

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL.

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL. Adaptive Filtering Fundamentals of Least Mean Squares with MATLABR Alexander D. Poularikas University of Alabama, Huntsville, AL CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Spall John Wiley and Sons, Inc., 2003 Preface... xiii 1. Stochastic Search

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Statistical Signal Processing Detection, Estimation, and Time Series Analysis

Statistical Signal Processing Detection, Estimation, and Time Series Analysis Statistical Signal Processing Detection, Estimation, and Time Series Analysis Louis L. Scharf University of Colorado at Boulder with Cedric Demeure collaborating on Chapters 10 and 11 A TT ADDISON-WESLEY

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Linear Algebra and Probability

Linear Algebra and Probability Linear Algebra and Probability for Computer Science Applications Ernest Davis CRC Press Taylor!* Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor Sc Francis Croup, an informa

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Numerical Analysis for Statisticians

Numerical Analysis for Statisticians Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES

OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES OBJECT DETECTION AND RECOGNITION IN DIGITAL IMAGES THEORY AND PRACTICE Bogustaw Cyganek AGH University of Science and Technology, Poland WILEY A John Wiley &. Sons, Ltd., Publication Contents Preface Acknowledgements

More information

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS

GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Methods in Geochemistry and Geophysics, 36 GEOPHYSICAL INVERSE THEORY AND REGULARIZATION PROBLEMS Michael S. ZHDANOV University of Utah Salt Lake City UTAH, U.S.A. 2OO2 ELSEVIER Amsterdam - Boston - London

More information

STOCHASTIC PROCESSES IN PHYSICS AND CHEMISTRY

STOCHASTIC PROCESSES IN PHYSICS AND CHEMISTRY STOCHASTIC PROCESSES IN PHYSICS AND CHEMISTRY Third edition N.G. VAN KAMPEN Institute for Theoretical Physics of the University at Utrecht ELSEVIER Amsterdam Boston Heidelberg London New York Oxford Paris

More information

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS

OPTIMAL ESTIMATION of DYNAMIC SYSTEMS CHAPMAN & HALL/CRC APPLIED MATHEMATICS -. AND NONLINEAR SCIENCE SERIES OPTIMAL ESTIMATION of DYNAMIC SYSTEMS John L Crassidis and John L. Junkins CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London

More information

CSC411: Final Review. James Lucas & David Madras. December 3, 2018

CSC411: Final Review. James Lucas & David Madras. December 3, 2018 CSC411: Final Review James Lucas & David Madras December 3, 2018 Agenda 1. A brief overview 2. Some sample questions Basic ML Terminology The final exam will be on the entire course; however, it will be

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

Exploring Monte Carlo Methods

Exploring Monte Carlo Methods Exploring Monte Carlo Methods William L Dunn J. Kenneth Shultis AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO ELSEVIER Academic Press Is an imprint

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Deep unsupervised learning

Deep unsupervised learning Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Handbook of Monte Carlo Methods. Dirk P. Kroese, Thomas Taimre, Zdravko I. Botev

Dirk P. Kroese, University of Queensland. Thomas Taimre, University of Queensland. Zdravko I. Botev, Université de Montréal. A John Wiley & Sons, Inc., Publication. Preface, Acknowledgments

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Supervised Learning. Issues in supervised learning. What makes learning hard. Point Estimation: MLE vs Bayesian

Geophysical Data Analysis: Discrete Inverse Theory

MATLAB Edition. William Menke, Lamont-Doherty Earth Observatory and Department of Earth and Environmental Sciences, Columbia University, Palisades, New

TIME SERIES ANALYSIS: Forecasting and Control. Fifth Edition. GEORGE E. P. BOX, GWILYM M. JENKINS, GREGORY C. REINSEL, GRETA M. LJUNG. Wiley

CONTENTS: PREFACE TO THE FIFTH EDITION, PREFACE TO THE FOURTH EDITION

Lecture 3. Linear Regression II Bastian Leibe RWTH Aachen

Advanced Machine Learning, Lecture 3: Linear Regression II. 02.11.2015. Bastian Leibe, RWTH Aachen. http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning, Regression

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

1 Introduction. So far we were mostly concerned with supervised learning: we predicted one or several

Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Frederick James, CERN, Switzerland. World Scientific: New Jersey, London, Singapore, Beijing, Shanghai, Hong Kong, Taipei, Chennai. Contents

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Discriminative vs. Generative Approaches. Generative approach: we derived the classifier from some generative hypothesis about the way

Recent Advances in Bayesian Inference Techniques

Christopher M. Bishop, Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004. Abstract: Bayesian

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

1 Introduction. So far we were mostly concerned with supervised learning: we predicted one or several

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Ian J. Goodfellow, Aaron Courville, Yoshua Bengio. ICML 2012. Presented by Xin Yuan, January 17, 2013. Outline: Contributions, Spike-and-Slab

Machine Learning Lecture 5

Linear Discriminant Functions. 26.10.2017. Bastian Leibe, RWTH Aachen. http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline: Fundamentals, Bayes Decision Theory

Undirected Graphical Models

Hong Chang, Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012). Outline: 1 Introduction, 2 Properties, 3 Generative vs. Conditional

Linear & nonlinear classifiers

Machine Learning. Hamid Beigy, Sharif University of Technology, Fall 1394. Table

Approximate Inference Part 1 of 2

Tom Minka, Microsoft Research, Cambridge, UK. Machine Learning Summer School 2009. http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm: Consistent use of probability theory

Bayesian Networks Inference with Probabilistic Graphical Models

4190.408, 2016-Spring. Byoung-Tak Zhang, intelligence Lab, Seoul National University. 4190.408 Artificial (2016-Spring). Machine Learning? Learning

STA 4273H: Statistical Machine Learning

Russ Salakhutdinov, Department of Computer Science, Department of Statistical Sciences. rsalakhu@cs.toronto.edu http://www.cs.utoronto.ca/~rsalakhu/ Lecture 7: Approximate

Neural Networks and Deep Learning

Professor Ameet Talwalkar, November 12, 2015. Outline: 1 Review of last lecture, AdaBoost

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Erik Sudderth, Brown University. Joint work with Michael Jordan, University of California, and Soumya Ghosh, Brown University. Parsing Visual Scenes

Statistical Learning Reading Assignments

S. Gong et al. Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (Chapt. 3, hard copy). T. Evgeniou, M. Pontil, and T. Poggio, "Statistical

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Outline: Approaches of Neural Networks in Speaker/Speech Recognition; Feed-Forward Neural Networks; Training with Back-propagation

Probabilistic Graphical Models

Brown University CSCI 2950-P, Spring 2013. Prof. Erik Sudderth. Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods. Some figures

Probabilistic Graphical Models

2016, Robert Nowak. 1 Introduction. We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is an n × k matrix and θ ∈ R^k is a vector

Approximate Inference Part 1 of 2

Tom Minka, Microsoft Research, Cambridge, UK. Machine Learning Summer School 2009. http://mlg.eng.cam.ac.uk/mlss09/ Bayesian paradigm: Consistent use of probability theory

Support Vector Machines

Le Song, Machine Learning I, CSE 6740, Fall 2013. Naïve Bayes classifier: Still use Bayes decision rule for classification, P(y|x) = P(x|y) P(y) / P(x), but assume p(x|y = 1) is fully factorized

Department of Mathematics and Computer Science, GRAVIS 2016, Basel. Logistic Regression. Pattern Recognition 2016, Sandro Schönborn, University of Basel

Two Worlds: Probabilistic & Algorithmic. We have seen two conceptual approaches to classification: data class density estimation

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Preface. 1 Difference Equations: 1.1 First-Order Difference Equations, 1.2 pth-Order Difference Equations

Reading Group on Deep Learning Session 1

Stephane Lathuiliere & Pablo Mesejo, 2 June 2016. Contents: Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

CUNY Graduate Center, Spring 2013. Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models). Professor Liang Huang, huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Research Group and College of Engineering and Computer Science. Outlines: Overview, Introduction, Linear Algebra, Probability, Linear Regression

Introduction to Graphical Models

The 15th Winter School of Statistical Physics, POSCO International Center & POSTECH, Pohang, 2018. 1. 9 (Tue.). Yung-Kyun Noh. GENERALIZATION FOR PREDICTION. Probabilistic

STA 414/2104: Machine Learning

Russ Salakhutdinov, Department of Computer Science, Department of Statistics. rsalakhu@cs.toronto.edu http://www.cs.toronto.edu/~rsalakhu/ Lecture 9: Sequential Data. So far

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Supervised: we know input and targets. Goal is to learn a model that, given input data, accurately predicts target data. Unsupervised: we know the input only and want to make

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Nonlinear Classification and Regression: Outline. Multi-Layer Perceptrons, The Back-Propagation Learning Algorithm, Generalized Linear Models, Radial Basis Function

Monte-Carlo Methods and Stochastic Processes

From Linear to Non-Linear. EMMANUEL GOBET, ECOLE POLYTECHNIQUE - UNIVERSITY PARIS-SACLAY, CMAP, PALAISEAU CEDEX, FRANCE. CRC Press, Taylor & Francis Group, 6000 Broken

Discriminant Analysis and Statistical Pattern Recognition

GEOFFREY J. McLACHLAN, Department of Mathematics, The University of Queensland, St. Lucia, Queensland, Australia. A Wiley-Interscience Publication

Identification of Nonlinear Systems Using Neural Networks and Polynomial Models

A. Janczak. A Block-Oriented Approach. With 79 Figures and 22 Tables. Springer. Contents: Symbols and notation, 1 Introduction

Linear Dynamical Systems

Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph: Latent variables, Observations

Information Theory in Computer Vision and Pattern Recognition

Francisco Escolano, Pablo Suau, Boyan Bonev. Foreword by Alan Yuille. Springer. Contents: 1 Introduction

Generalized, Linear, and Mixed Models

CHARLES E. McCULLOCH, SHAYLE R. SEARLE, Departments of Statistical Science and Biometrics, Cornell University. A WILEY-INTERSCIENCE PUBLICATION, JOHN WILEY & SONS, INC. New

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

October 23, 2013. The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

Discriminative Models

No.5 Discriminative Models. Hui Jiang, Department of Electrical Engineering and Computer Science, Lassonde School of Engineering, York University, Toronto, Canada. Outline: Generative vs. Discriminative models

Stat 5101 Lecture Notes

Charles J. Geyer. Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer. May 7, 2001. Contents: 1 Random Variables and Change of Variables, 1.1 Random

STA 4273H: Statistical Machine Learning

Russ Salakhutdinov, Department of Statistics. rsalakhu@utstat.toronto.edu http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002. Lecture 7: Approximate

Stochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS

Contents: Preface, Acknowledgements, 1 Introduction and review

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

TABLE OF CONTENTS: 1. Independent Component Analysis. 2. Case Study I: Speech Recognition. Independent voices. Nonparametric likelihood

Non-Parametric Bayes

Mark Schmidt, UBC Machine Learning Reading Group, January 2016. Current Hot Topics in Machine Learning. Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

Neural Networks. Single-layer neural network. CSE 446: Machine Learning, Emily Fox, University of Washington, March 10, 2017

Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer). Single-layer neural network: Perceptron as a neural