Statistical Learning Theory


1 Statistical Learning Theory
Part I: Mathematical Learning Theory (1-8), by Sumio Watanabe. Evaluation: Report.
Part II: Information Statistical Mechanics (9-15), by Yoshiyuki Kabashima. Evaluation: Report.

2 Prerequisite Knowledge (1) To follow this lecture, you need: (1) Vector spaces, linear transforms, and matrix computation. (2) Partial differentiation and multiple integration, e.g. $\partial f(x,y)/\partial x$ and $\iint f(x,y)\,dx\,dy$. (3) Basic probability theory, e.g. a probability function $p(1), p(2), p(3)$ on a set $S$.

3 Prerequisite Knowledge (2) Statistical learning theory needs mathematics. If you have not learned any one of the following points, you will not be able to follow this lecture.
Check 1. Let $f(x)$ be a $C^2$-class function of $x=(x_1,x_2,\dots,x_N)$ in $\mathbb{R}^N$. Then, writing $\nabla f(x)=(\partial f/\partial x_i)$ and $\nabla^2 f(x)=(\partial^2 f/\partial x_i \partial x_j)$, there exists $a^*$ in $\mathbb{R}^N$ such that $f(x) = f(a) + ((x-a), \nabla f(a)) + \frac{1}{2}((x-a), \nabla^2 f(a^*)(x-a))$.
Check 2. Let $X_1, X_2, \dots, X_n$ be independently and identically distributed random variables which have the finite expectation value $M$. Then $(X_1+X_2+\cdots+X_n)/n$ converges to $M$ almost surely as $n$ tends to infinity.
Remark. If you do not know these check points, you should learn them in an undergraduate program before participating in this lecture.
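Check 2 can be illustrated by a small simulation. The following is only a sketch under assumptions that are not in the slides: the random variables are taken to be uniform on [0, 1] (so M = 0.5), and the function name `sample_mean` and the seed are hypothetical choices.

```python
import random

def sample_mean(n, seed=0):
    """Average of n i.i.d. draws from the uniform distribution on [0, 1].

    The expectation value of each draw is M = 0.5 (an illustrative choice).
    """
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# The sample mean should move closer to M = 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

A single simulated run cannot prove almost-sure convergence; it only shows the typical behaviour that the statement describes.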

4 Part I Mathematical Learning Theory Sumio Watanabe

5 Part I - 1. Basic Concepts in Statistical Learning Theory Sumio Watanabe

6 Part I Probability Distribution and Random Variable

7 Probability Density Function on R Definition. Let $\mathbb{R}$ be the set of all real numbers. A function $p$ from $\mathbb{R}$ to $\mathbb{R}$ is called a probability density function if (1) $p(x) \ge 0$ for an arbitrary $x$ in $\mathbb{R}$, and (2) $\int_{\mathbb{R}} p(x)\,dx = 1$.

8 Example 1. Standard normal distribution: $p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$. Formula: for $a>0$, $\int_{-\infty}^{\infty} \exp(-a x^2)\,dx = (\pi/a)^{1/2}$.
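Both the normalization of the standard normal density and the Gaussian integral formula can be checked numerically. This is a minimal sketch, assuming a midpoint rule on the truncated interval [-10, 10]; the helper `integrate` and the value a = 3 are illustrative choices, not part of the lecture.

```python
import math

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def standard_normal(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Condition (2) of the definition: the density integrates to 1.
total = integrate(standard_normal, -10.0, 10.0)

# Formula: integral of exp(-a x^2) dx = (pi / a)^(1/2) for a > 0.
a = 3.0
gauss = integrate(lambda x: math.exp(-a * x * x), -10.0, 10.0)

print(total)
print(gauss, math.sqrt(math.pi / a))
```

The truncation is harmless here because both integrands are negligibly small outside [-10, 10].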

9 Example 2. Uniform distribution on $[a,b]$: $p(x) = 1/(b-a)$ for $a \le x \le b$, and $p(x) = 0$ otherwise.

10 Probability Distribution on R Definition. Let $p(x)$ be a probability density function on $\mathbb{R}$. For a subset $A$ contained in $\mathbb{R}$, $P(A)$ is defined by $P(A) = \int_A p(x)\,dx$. Then $P$ is called a probability distribution on $\mathbb{R}$.

11 Example 3. Probability density function: $p(x) = 2x$ for $0 \le x \le 1$, and $p(x) = 0$ otherwise. Then $P([0.5, 0.7]) = \int_{0.5}^{0.7} 2x\,dx = 0.7^2 - 0.5^2 = 0.24$.
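The probability in Example 3 can also be obtained by numerical integration of the density over the interval. The sketch below assumes a midpoint rule; the function names `density` and `probability` are hypothetical.

```python
def density(x):
    """Density of Example 3: p(x) = 2x on [0, 1], and 0 otherwise."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def probability(lo, hi, steps=10_000):
    """Approximate P([lo, hi]), the integral of the density over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(density(lo + (i + 0.5) * h) for i in range(steps))

prob = probability(0.5, 0.7)
total = probability(0.0, 1.0)
print(prob)   # should be close to 0.7**2 - 0.5**2
print(total)  # should be close to 1
```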

12 Remark. Probability and the Axiom of Choice This page explains a mathematically advanced point. A student of introductory probability theory may skip this page. From the mathematical point of view, the axiom of choice is inconsistent with the assumption that every subset of $\mathbb{R}$ is measurable. Hence mathematical probability theory that employs the axiom of choice needs to determine which subsets are measurable. The family of such subsets is called a completely additive class (a σ-algebra). A student who wants to understand these points should learn measure theory and mathematical probability theory.

13 Probability Density Function on R^N Definition. Let $N$ be a positive integer and $\mathbb{R}^N$ be the $N$-dimensional real Euclidean space. A function $p$ from $\mathbb{R}^N$ to $\mathbb{R}$ is called a probability density function if (1) $p(x) \ge 0$ for an arbitrary $x$ in $\mathbb{R}^N$, and (2) $\int p(x)\,dx = 1$. A probability distribution $P(A)$ is defined for a subset $A$ in $\mathbb{R}^N$ by $P(A) = \int_A p(x)\,dx$.

14 Random Variable Definition. Let $P$ be a probability distribution on $\mathbb{R}^N$. If a variable $X$ in $\mathbb{R}^N$ satisfies $P(\{X \in A\}) = \int_A p(x)\,dx$, then $X$ is called an $\mathbb{R}^N$-valued random variable, and $P$ and $p$ are called the probability distribution and the density function of $X$ respectively. It is also said that $X$ has $P$ and that $X$ has $p$. Note. In probability theory, a random variable is defined as a measurable function on a probability space.

15 Expectation Value and Variance Definition. Assume that $X$ is an $\mathbb{R}^N$-valued random variable which has a probability density function $p$. Then the expectation value $E[X]$ and the covariance matrix $V[X]$ are respectively defined by $E[X] = \int x\,p(x)\,dx$ and $V[X] = \int (x-E[X])(x-E[X])^T p(x)\,dx = E[(X-E[X])(X-E[X])^T] = E[XX^T]-E[X]E[X]^T$, where $T$ denotes the transpose of a vector. If $N=1$, then $V[X]$ is called the variance.
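The two expressions for the covariance matrix, $E[(X-E[X])(X-E[X])^T]$ and $E[XX^T]-E[X]E[X]^T$, can be compared on simulated data. This is a sketch under assumptions that do not come from the slides: the 2-dimensional variable X = (U, U + V) with independent standard normal U and V is a hypothetical example, and expectations are replaced by sample averages.

```python
import random

rng = random.Random(0)

# Hypothetical sample: X = (U, U + V) with U, V independent standard normals,
# so the true covariance matrix is [[1, 1], [1, 2]].
samples = []
for _ in range(50_000):
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append((u, u + v))

n = len(samples)
mean = [sum(x[i] for x in samples) / n for i in range(2)]

# Sample version of E[(X - E[X])(X - E[X])^T].
cov_centered = [
    [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
     for j in range(2)]
    for i in range(2)
]

# Sample version of E[X X^T] - E[X] E[X]^T.
cov_moment = [
    [sum(x[i] * x[j] for x in samples) / n - mean[i] * mean[j]
     for j in range(2)]
    for i in range(2)
]

print(cov_centered)
print(cov_moment)
```

The two matrices agree up to floating-point rounding, since the identity holds for any distribution, including the empirical one.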

16 Part I Conditional Probability

17 Simultaneous Probability Density Function Definition. Let $(X,Y)$ be an $\mathbb{R}^M \times \mathbb{R}^N$-valued random variable which has a probability density function $p(x,y)$, where $x=(x_1,x_2,\dots,x_M)$ and $y=(y_1,y_2,\dots,y_N)$. Then $p(x,y)$ is called a simultaneous probability density function of $(X,Y)$ (also known as a joint probability density function). The simultaneous PDF is the PDF of the pair $(x,y)$. [Figure: surface plot of $p(x,y)$ over the $(x,y)$ plane.]

18 Marginal Probability Density Function Definition. Let $(X,Y)$ be an $\mathbb{R}^M \times \mathbb{R}^N$-valued random variable which has a simultaneous probability density function $p(x,y)$. The marginal probability density functions $p(x)$ and $p(y)$ of $X$ and $Y$ are respectively defined by $p(x) = \int p(x,y)\,dy$ and $p(y) = \int p(x,y)\,dx$. The marginal PDF is the PDF of $x$ alone or of $y$ alone. [Figure: $p(x,y)$ with its marginals $p(x)$ and $p(y)$ drawn along the axes.]

19 Example 4. A simultaneous probability density function on $\mathbb{R}^1 \times \mathbb{R}^1$: $p(x,y) = (1/C) \exp(-2x^2 + 2xy - y^2)$, where $C = \iint \exp(-2x^2 + 2xy - y^2)\,dx\,dy = \pi$. The marginal density functions are $p(x) = (1/C) \int \exp(-2x^2 + 2xy - y^2)\,dy = \pi^{-1/2} \exp(-x^2)$ and $p(y) = (1/C) \int \exp(-2x^2 + 2xy - y^2)\,dx = (2\pi)^{-1/2} \exp(-y^2/2)$. Formula: for $a>0$, $\int_{-\infty}^{\infty} \exp(-a x^2)\,dx = (\pi/a)^{1/2}$.
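The constant C = π and the two marginal densities of Example 4 can be verified numerically. The following is a rough sketch, assuming a midpoint rule on the truncated square [-8, 8] × [-8, 8]; the grid size and the helper names are illustrative choices.

```python
import math

def joint(x, y, c=math.pi):
    """Example 4 density p(x,y) = (1/C) exp(-2x^2 + 2xy - y^2), with C = pi."""
    return math.exp(-2 * x * x + 2 * x * y - y * y) / c

def midpoints(lo, hi, steps):
    """Return the cell width and the midpoints of a uniform grid on [lo, hi]."""
    h = (hi - lo) / steps
    return h, [lo + (i + 0.5) * h for i in range(steps)]

h, grid = midpoints(-8.0, 8.0, 400)

# Normalizing constant: integrate the unnormalized density (c = 1).
C = h * h * sum(joint(x, y, c=1.0) for x in grid for y in grid)

def marginal_x(x):
    """p(x): integrate p(x, y) over y."""
    return h * sum(joint(x, y) for y in grid)

def marginal_y(y):
    """p(y): integrate p(x, y) over x."""
    return h * sum(joint(x, y) for x in grid)

print(C, math.pi)
print(marginal_x(0.5), math.exp(-0.25) / math.sqrt(math.pi))
print(marginal_y(0.5), math.exp(-0.125) / math.sqrt(2 * math.pi))
```

Each printed pair compares a numerical integral with the closed form stated in the example.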

20 Example 5. A simultaneous probability density function on $(x,y)$ in $\mathbb{R}^1 \times \{0,1\}$: $p(x,0) = a\,p_1(x)$ and $p(x,1) = b\,p_2(x)$, where $p_1(x)$ and $p_2(x)$ are probability density functions and $a+b=1$. The marginal probability density function of $x$ is $p(x) = a\,p_1(x) + b\,p_2(x)$. The marginal probability function of $y$ is $p(0) = a$, $p(1) = b$.

21 Conditional Probability Density Function Definition. Let $(X,Y)$ be an $\mathbb{R}^M \times \mathbb{R}^N$-valued random variable which has a simultaneous probability density function $p(x,y)$. The conditional probability density functions $p(y|x)$ and $p(x|y)$ are respectively defined by $p(y|x) = p(x,y)/p(x)$ and $p(x|y) = p(x,y)/p(y)$. Remark 1. For $x$ such that $p(x)=0$, $p(y|x)$ is not defined. Remark 2. (Mathematically advanced point) In a general probability space, the definition of conditional probability requires a division of measures, for example the Radon-Nikodym derivative.

22 Meaning of Conditional PDF $p(y|x) = p(x,y)/p(x) = p(x,y) / \int p(x,y')\,dy'$. The conditional PDF is the PDF of $y$ for a fixed $x$. [Figure: a slice of $p(x,y)$ at a fixed $x$, plotted against $y$.]

23 Example 6. A simultaneous probability density function on $\mathbb{R}^1 \times \mathbb{R}^1$: $p(x,y) = (1/\pi) \exp(-2x^2 + 2xy - y^2)$. The marginal density functions are $p(x) = \pi^{-1/2} \exp(-x^2)$ and $p(y) = (2\pi)^{-1/2} \exp(-y^2/2)$. The conditional probability density functions are $p(x|y) = p(x,y)/p(y) = (\pi/2)^{-1/2} \exp(-2(x-y/2)^2)$ and $p(y|x) = p(x,y)/p(x) = \pi^{-1/2} \exp(-(y-x)^2)$. Formula: for $a>0$, $\int_{-\infty}^{\infty} \exp(-a x^2)\,dx = (\pi/a)^{1/2}$.
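The closed forms of Example 6 can be checked against the definition of the conditional density by evaluating both at a few points. This is a small sketch; the test points are arbitrary and the function names are hypothetical.

```python
import math

def joint(x, y):
    """p(x,y) = (1/pi) exp(-2x^2 + 2xy - y^2)."""
    return math.exp(-2 * x * x + 2 * x * y - y * y) / math.pi

def marginal_x(x):
    """p(x) = pi^(-1/2) exp(-x^2)."""
    return math.exp(-x * x) / math.sqrt(math.pi)

def marginal_y(y):
    """p(y) = (2 pi)^(-1/2) exp(-y^2 / 2)."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def cond_x_given_y(x, y):
    """Closed form p(x|y) = (pi/2)^(-1/2) exp(-2 (x - y/2)^2)."""
    return math.exp(-2 * (x - y / 2) ** 2) / math.sqrt(math.pi / 2)

def cond_y_given_x(y, x):
    """Closed form p(y|x) = pi^(-1/2) exp(-(y - x)^2)."""
    return math.exp(-((y - x) ** 2)) / math.sqrt(math.pi)

# Compare the definition p(x,y)/p(y) and p(x,y)/p(x) with the closed forms.
points = [(-1.0, 0.3), (0.0, 0.0), (0.7, -1.2), (2.0, 1.5)]
max_gap = max(
    max(abs(joint(x, y) / marginal_y(y) - cond_x_given_y(x, y)),
        abs(joint(x, y) / marginal_x(x) - cond_y_given_x(y, x)))
    for x, y in points
)
print(max_gap)  # expected to be at the level of floating-point rounding
```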

24 Example 7. A simultaneous probability density function on $(x,y)$ in $\mathbb{R}^1 \times \{0,1\}$: $p(x,0) = a\,p_1(x)$ and $p(x,1) = b\,p_2(x)$. The marginal probability density functions are $p(x) = a\,p_1(x) + b\,p_2(x)$, $p(0) = a$, $p(1) = b$. The conditional probability density functions are $p(x|0) = p(x,0)/p(0) = p_1(x)$, $p(x|1) = p(x,1)/p(1) = p_2(x)$, $p(0|x) = p(x,0)/p(x) = a\,p_1(x) / (a\,p_1(x) + b\,p_2(x))$, and $p(1|x) = p(x,1)/p(x) = b\,p_2(x) / (a\,p_1(x) + b\,p_2(x))$.
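The conditional probabilities p(0|x) and p(1|x) of Example 7 can be computed directly once p_1, p_2, a, and b are fixed. The sketch below assumes hypothetical choices that are not in the slides: p_1 and p_2 are normal densities with means -1 and 1 and unit variance, and a = 0.3, b = 0.7.

```python
import math

def normal_pdf(x, mean, std=1.0):
    """Density of the normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-z * z / 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical mixture weights and component densities.
a, b = 0.3, 0.7

def p1(x):
    return normal_pdf(x, -1.0)

def p2(x):
    return normal_pdf(x, 1.0)

def posterior(x):
    """Return (p(0|x), p(1|x)) as defined in Example 7."""
    px = a * p1(x) + b * p2(x)  # marginal density p(x)
    return a * p1(x) / px, b * p2(x) / px

for x in (-2.0, 0.0, 2.0):
    print(x, posterior(x))
```

At x = 0 the two components have equal density under these choices, so the conditional probabilities reduce to the weights a and b.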

25 Bayes Theorem Theorem (Bayes theorem). $p(x,y) = p(y|x)\,p(x) = p(x|y)\,p(y)$. Note. If $p(x)=0$, then $p(y|x)$ is not defined, but in that case we adopt the convention $0 \cdot p(y|x) = 0$. This theorem follows immediately from the definition of conditional probability, but it has many applications in real-world information processing.

26 Part I Supervised Learning Sumio Watanabe

27 Supervised Learning [Figure: a teacher gives answers (8, 6, 2) to examples, and a student learns to read characters from the examples and answers.]

28 Mathematical Description [Figure: an information source $q(x)$ produces examples $X_1, X_2, \dots, X_n$; the teacher $q(y|x)$ produces answers $Y_1, Y_2, \dots, Y_n$; the student is a learning machine $p(y|x,w)$.] Student: "I optimize the parameter $w$ so that $p(y|x,w)$ matches $q(y|x)$."

29 True and Estimation [Figure: the true distribution $q(x)\,q(y|x) = q(x,y)$ generates $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ as well as a new pair $(X, Y)$; $p(y|x,w)$ is the estimate.]

30 Supervised Learning [Figure: an unknown information source $q(x,y)$ produces the training data $X_1, X_2, \dots, X_n$, $Y_1, Y_2, \dots, Y_n$ and the test data $(X, Y)$; the learning machine is $p(y|x,w)$.]

31 Definition of Supervised Learning Definition. In supervised learning, an information source and a teacher are represented by $q(x)$ and $q(y|x)$, whereas a learning machine is represented by $p(y|x,w)$ with a parameter $w$. A set of training data consists of $\{(x_i,y_i) ; i=1,2,\dots,n\}$, which are independently drawn from $q(x)q(y|x)$. The number $n$ is called the number of training data. In statistics, a set of training data is called a sample and $n$ is referred to as the sample size. A learning machine optimizes the parameter $w$ so that $p(y|x,w)$ approximates $q(y|x)$.
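The definition can be made concrete with a toy experiment. Everything specific below is an assumption for illustration and not the lecture's method: the information source q(x) is a standard normal, the teacher q(y|x) is a logistic function with hypothetical parameters `TRUE_W`, the learning machine p(y|x,w) is a logistic model with w = (w0, w1), and the parameter is optimized by plain gradient ascent on the average log-likelihood.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

rng = random.Random(0)
TRUE_W = (0.5, 2.0)  # hypothetical teacher parameters (assumption)

def teacher_prob(x):
    """q(y=1|x) of the assumed teacher."""
    return sigmoid(TRUE_W[0] + TRUE_W[1] * x)

# Training data (x_i, y_i), i = 1..n, drawn independently from q(x) q(y|x).
n = 2000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if rng.random() < teacher_prob(x) else 0 for x in xs]

def avg_log_likelihood(w):
    """Average of log p(y_i | x_i, w) over the training data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w[0] + w[1] * x)
        total += math.log(p if y == 1 else 1.0 - p)
    return total / n

def fit(steps=500, lr=1.0):
    """Gradient ascent on the average log-likelihood, starting from w = 0."""
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(w0 + w1 * x)  # gradient factor for the logistic model
            g0 += err
            g1 += err * x
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1

w_hat = fit()
print(w_hat, TRUE_W)
print(avg_log_likelihood(w_hat), avg_log_likelihood((0.0, 0.0)))
```

Because the teacher lies inside the model family in this sketch, the fitted w should land near `TRUE_W`; in general p(y|x,w) can only approximate q(y|x).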

32 Supervised Learning Supervised learning is mathematically understood as an approximation of $q(y|x)$ by $p(y|x,w)$. [Figure: $q(y|x)$ and $p(y|x,w)$ plotted over the $(x,y)$ plane.]

33 Example of $q(x)q(y|x)$ Training data are taken from $q(x)$ and $q(y|x)$. [Figure: example training images with their labels.] (Slide dated 2018/6/5.)

34 Neural Network Example of $p(y|x,w)$ [Figure: the learning machine is a neural network with 25 input units (the image), 6 hidden units, and 2 output units; the labels 0 and 6 appear in the figure.]

35 Classification [Figure: training data, $n=100$; a network with input layer, hidden layer, and output layer maps an input to an output, shown next to the desired output.]

36 Learning in a Neural Network [Figure: panels labeled "Data", "True", and "Trained Neural Network".]

37 Contents of Part I
1. Basic Concepts in Statistical Learning
2. Neural Network
3. Learning in Neural Network, Report Writing (1)
4. Boltzmann Machine
5. Deep Learning
6. Information and Entropy, Report Writing (2)
7. Prediction Accuracy
8. Knowledge Discovery, Report Writing (3)

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 89 Part II

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Introduction to Bayesian Statistics

Introduction to Bayesian Statistics School of Computing & Communication, UTS January, 207 Random variables Pre-university: A number is just a fixed value. When we talk about probabilities: When X is a continuous random variable, it has a

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Internal Covariate Shift Batch Normalization Implementation Experiments. Batch Normalization. Devin Willmott. University of Kentucky.

Internal Covariate Shift Batch Normalization Implementation Experiments. Batch Normalization. Devin Willmott. University of Kentucky. Batch Normalization Devin Willmott University of Kentucky October 23, 2017 Overview 1 Internal Covariate Shift 2 Batch Normalization 3 Implementation 4 Experiments Covariate Shift Suppose we have two distributions,

More information

Statistical Learning Theory. Part I 5. Deep Learning

Statistical Learning Theory. Part I 5. Deep Learning Statistical Learning Theory Part I 5. Deep Learning Sumio Watanabe Tokyo Institute of Technology Review : Supervised Learning Training Data X 1, X 2,, X n q(x,y) =q(x)q(y x) Information Source Y 1, Y 2,,

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

Introduction to Probability and Stocastic Processes - Part I

Introduction to Probability and Stocastic Processes - Part I Introduction to Probability and Stocastic Processes - Part I Lecture 2 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark

More information

Joint Probability Distributions, Correlations

Joint Probability Distributions, Correlations Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more

More information

Statistics for Economists Lectures 6 & 7. Asrat Temesgen Stockholm University

Statistics for Economists Lectures 6 & 7. Asrat Temesgen Stockholm University Statistics for Economists Lectures 6 & 7 Asrat Temesgen Stockholm University 1 Chapter 4- Bivariate Distributions 41 Distributions of two random variables Definition 41-1: Let X and Y be two random variables

More information

There are two basic kinds of random variables continuous and discrete.

There are two basic kinds of random variables continuous and discrete. Summary of Lectures 5 and 6 Random Variables The random variable is usually represented by an upper case letter, say X. A measured value of the random variable is denoted by the corresponding lower case

More information

Let X and Y denote two random variables. The joint distribution of these random

Let X and Y denote two random variables. The joint distribution of these random EE385 Class Notes 9/7/0 John Stensby Chapter 3: Multiple Random Variables Let X and Y denote two random variables. The joint distribution of these random variables is defined as F XY(x,y) = [X x,y y] P.

More information

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions

More information

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of

More information

Basics on Probability. Jingrui He 09/11/2007

Basics on Probability. Jingrui He 09/11/2007 Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Chapter 2. Probability

Chapter 2. Probability 2-1 Chapter 2 Probability 2-2 Section 2.1: Basic Ideas Definition: An experiment is a process that results in an outcome that cannot be predicted in advance with certainty. Examples: rolling a die tossing

More information

Continuous Random Variables

Continuous Random Variables 1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler + Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table

More information

Multivariate Distributions CIVL 7012/8012

Multivariate Distributions CIVL 7012/8012 Multivariate Distributions CIVL 7012/8012 Multivariate Distributions Engineers often are interested in more than one measurement from a single item. Multivariate distributions describe the probability

More information

EE4601 Communication Systems

EE4601 Communication Systems EE4601 Communication Systems Week 2 Review of Probability, Important Distributions 0 c 2011, Georgia Institute of Technology (lect2 1) Conditional Probability Consider a sample space that consists of two

More information

EXAM # 3 PLEASE SHOW ALL WORK!

EXAM # 3 PLEASE SHOW ALL WORK! Stat 311, Summer 2018 Name EXAM # 3 PLEASE SHOW ALL WORK! Problem Points Grade 1 30 2 20 3 20 4 30 Total 100 1. A socioeconomic study analyzes two discrete random variables in a certain population of households

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued Chapter 3 sections 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions 3.6 Conditional

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

PROBABILITY THEORY REVIEW

PROBABILITY THEORY REVIEW PROBABILITY THEORY REVIEW CMPUT 466/551 Martha White Fall, 2017 REMINDERS Assignment 1 is due on September 28 Thought questions 1 are due on September 21 Chapters 1-4, about 40 pages If you are printing,

More information

First-Order ODE: Separable Equations, Exact Equations and Integrating Factor

First-Order ODE: Separable Equations, Exact Equations and Integrating Factor First-Order ODE: Separable Equations, Exact Equations and Integrating Factor Department of Mathematics IIT Guwahati REMARK: In the last theorem of the previous lecture, you can change the open interval

More information

Lecture 11. Probability Theory: an Overveiw

Lecture 11. Probability Theory: an Overveiw Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the

More information

Probability Review. Chao Lan

Probability Review. Chao Lan Probability Review Chao Lan Let s start with a single random variable Random Experiment A random experiment has three elements 1. sample space Ω: set of all possible outcomes e.g.,ω={1,2,3,4,5,6} 2. event

More information

a b = a T b = a i b i (1) i=1 (Geometric definition) The dot product of two Euclidean vectors a and b is defined by a b = a b cos(θ a,b ) (2)

a b = a T b = a i b i (1) i=1 (Geometric definition) The dot product of two Euclidean vectors a and b is defined by a b = a b cos(θ a,b ) (2) This is my preperation notes for teaching in sections during the winter 2018 quarter for course CSE 446. Useful for myself to review the concepts as well. More Linear Algebra Definition 1.1 (Dot Product).

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Bivariate Distributions

Bivariate Distributions Bivariate Distributions EGR 260 R. Van Til Industrial & Systems Engineering Dept. Copyright 2013. Robert P. Van Til. All rights reserved. 1 What s It All About? Many random processes produce Examples.»

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Introduction to Probability and Statistics Lecture 17: Continuous random variables: conditional PDF Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Two Posts to Fill On School Board

Two Posts to Fill On School Board Y Y 9 86 4 4 qz 86 x : ( ) z 7 854 Y x 4 z z x x 4 87 88 Y 5 x q x 8 Y 8 x x : 6 ; : 5 x ; 4 ( z ; ( ) ) x ; z 94 ; x 3 3 3 5 94 ; ; ; ; 3 x : 5 89 q ; ; x ; x ; ; x : ; ; ; ; ; ; 87 47% : () : / : 83

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Joint Probability Distributions, Correlations

Joint Probability Distributions, Correlations Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more

More information

Math 416 Lecture 2 DEFINITION. Here are the multivariate versions: X, Y, Z iff P(X = x, Y = y, Z =z) = p(x, y, z) of X, Y, Z iff for all sets A, B, C,

Math 416 Lecture 2 DEFINITION. Here are the multivariate versions: X, Y, Z iff P(X = x, Y = y, Z =z) = p(x, y, z) of X, Y, Z iff for all sets A, B, C, Math 416 Lecture 2 DEFINITION. Here are the multivariate versions: PMF case: p(x, y, z) is the joint Probability Mass Function of X, Y, Z iff P(X = x, Y = y, Z =z) = p(x, y, z) PDF case: f(x, y, z) is

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future

More information

Lecture 1: Bayesian Framework Basics

Lecture 1: Bayesian Framework Basics Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of

More information

Chapter 5 Joint Probability Distributions

Chapter 5 Joint Probability Distributions Applied Statistics and Probability for Engineers Sixth Edition Douglas C. Montgomery George C. Runger Chapter 5 Joint Probability Distributions 5 Joint Probability Distributions CHAPTER OUTLINE 5-1 Two

More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 1

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 1 CS434a/541a: Pattern Recognition Prof. Olga Veksler Lecture 1 1 Outline of the lecture Syllabus Introduction to Pattern Recognition Review of Probability/Statistics 2 Syllabus Prerequisite Analysis of

More information

Random Signals and Systems. Chapter 3. Jitendra K Tugnait. Department of Electrical & Computer Engineering. Auburn University.

Random Signals and Systems. Chapter 3. Jitendra K Tugnait. Department of Electrical & Computer Engineering. Auburn University. Random Signals and Systems Chapter 3 Jitendra K Tugnait Professor Department of Electrical & Computer Engineering Auburn University Two Random Variables Previously, we only dealt with one random variable

More information

Lecture 1a: Basic Concepts and Recaps

Lecture 1a: Basic Concepts and Recaps Lecture 1a: Basic Concepts and Recaps Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

Statistical Machine Learning Lectures 4: Variational Bayes

Statistical Machine Learning Lectures 4: Variational Bayes 1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

Lecture Notes 3 Multiple Random Variables. Joint, Marginal, and Conditional pmfs. Bayes Rule and Independence for pmfs

Lecture Notes 3 Multiple Random Variables. Joint, Marginal, and Conditional pmfs. Bayes Rule and Independence for pmfs Lecture Notes 3 Multiple Random Variables Joint, Marginal, and Conditional pmfs Bayes Rule and Independence for pmfs Joint, Marginal, and Conditional pdfs Bayes Rule and Independence for pdfs Functions

More information

2. Second-order Linear Ordinary Differential Equations
