Probability Theory (revisited)

Summary

- Probability vs. plausibility
- Random variables
- Simulation of random experiments

Challenge

The alarm of a shop rang. Soon afterwards, a man was seen running in the street, pursued by a policeman. Is the man a thief? Can a machine reason like us?

Jacob Bernoulli (1654-1705)

Bernoulli Definition

Definition of probability attributed to Jacob Bernoulli (1689):

$$P = \frac{m}{n}$$

where $m$ is the number of favorable cases and $n$ is the total number of cases. This definition establishes a link between probabilities and the output of experiments. Question: how to manipulate probabilities in a consistent way?

Kolmogorov (1903-1987)

Kolmogorov Axioms

Kolmogorov defined a set of axioms for Probability Calculus based on set theory and measure theory. He defines:

- a sigma-algebra of sets, closed with respect to complement and union of a countable number of sets;
- a probability measure for the sets belonging to the sigma-algebra; these sets are denoted as events.

Strong point: all the operations with probabilities can be defined from the axioms in a consistent way. Question: what is the relationship between probabilities and experimental data?

Probability Space

A probability space consists of: a sample space $E$, an event space $F$ and a probability measure $P$. $E$ is the set of results of the random experiment. $F$ is a family of subsets of $E$ such that:

- $E \in F$;
- $A \in F \Rightarrow \bar{A} \in F$;
- $A_i \in F$, $i$ in a countable set $\Rightarrow \bigcup_i A_i \in F$.

$P$ is a function from $F$ into $[0,1]$ such that:

- $P(A) \ge 0$ for all $A \in F$;
- $P(E) = 1$;
- $P(A \cup B) = P(A) + P(B)$ if $A, B \in F$ are disjoint.

E. Jaynes (1922-1998)

Thinking Robot

How to build a thinking robot? Based on the works of Pólya and Cox, Jaynes assigns a plausibility to each proposition and demonstrates that a consistent inductive logic must obey the rules of Probability Calculus. In this context, probabilities are assigned not to sets of a sigma-algebra but to propositions.

Random Variables

A random variable assigns a numeric value to the outcome of each experiment. Random variables can be discrete or continuous.

Discrete Random Variables

How to characterize a discrete random variable? By its probability function $P(k) = \Pr\{x = k\}$.

Properties: $P(k) \ge 0$ for all $k$, and $\sum_{k=1}^{N} P(k) = 1$.

Example: $S = \{1, 2, 3, 4\}$ with $P(1) = .1$, $P(2) = .3$, $P(3) = .4$, $P(4) = .2$. Realizations: 2 3 2 3 3 3 3 4 2 2 4 1 3 2 3 2 1 3 4 3. [Bar plot of the probability function.]

Note: the same symbol will be used to denote the random variable and a realization. Different symbols (e.g., capital and lower-case letters) could be used to make this difference clearer.
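A minimal Python sketch (not part of the original slides): the relative frequency of each value among the 20 realizations listed above approaches $P(k)$ as the number of realizations grows.

```python
# Compare empirical frequencies of the listed realizations with P(k).
from collections import Counter

P = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
realizations = [2, 3, 2, 3, 3, 3, 3, 4, 2, 2, 4, 1, 3, 2, 3, 2, 1, 3, 4, 3]

counts = Counter(realizations)
for k in sorted(P):
    # relative frequency -> P(k) as the number of realizations grows
    print(k, P[k], counts[k] / len(realizations))
```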

Binomial Distribution

It answers the following problem: what is the probability of an event $A$ being observed $k$ times in $n$ random experiments?

$$P(k) = \binom{n}{k} \pi^k (1-\pi)^{n-k}, \qquad \pi = P(A)$$

[Plots of the binomial probability function for $n = 20$ and $n = 100$.]
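A short sketch evaluating this probability function; $\pi = 0.5$ is an assumed value, since the slide does not state which $\pi$ was used for the plots.

```python
# Binomial probability function P(k) = C(n, k) * pi^k * (1 - pi)^(n - k).
from math import comb

def binomial_pmf(k, n, pi):
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

for n in (20, 100):
    pmf = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
    assert abs(sum(pmf) - 1.0) < 1e-9   # the probabilities sum to one
    print(n, max(range(n + 1), key=pmf.__getitem__))  # most probable k
```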

Continuous Random Variables

How to characterize a continuous random variable $x$? By its probability density function $p$:

$$P\{x \le x_0\} = \int_{-\infty}^{x_0} p(x)\,dx$$

Properties: $p(x) \ge 0$ for all $x$, and $\int_{\mathbb{R}^n} p(x)\,dx = 1$.

Example: $S = \mathbb{R}$, $p(x) = 1$ for $x \in [0,1[$ and $p(x) = 0$ otherwise. Realizations: 0.8381 0.0196 0.6813 0.3795 0.8318 0.5028 0.7095 0.4289 0.3046 0.1897 0.1934 0.6822 0.3028 0.5417 0.1509 0.6979 0.3784 0.8600 0.8537 0.5936.

Normal Distribution

Probability density function:

$$N(x; \mu, R) = (2\pi)^{-n/2}\, |R|^{-1/2}\, e^{-\frac{1}{2}(x-\mu)' R^{-1} (x-\mu)}$$

where $\mu$ is the mean vector and $R$ the covariance matrix. The level surfaces in $\mathbb{R}^n$ are ellipsoids centered at $\mu$ and with axes $\sqrt{\lambda_i}\, v_i$, where $\lambda_i, v_i$ are the eigenvalues and eigenvectors of $R$ ($\|v_i\| = 1$).
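A sketch of evaluating this density and recovering the ellipsoid axes from the eigendecomposition of $R$; $\mu$, $R$ and the evaluation point are assumed values.

```python
# Evaluate N(x; mu, R) and compute the level-ellipsoid axes sqrt(lam_i) v_i.
import numpy as np

def normal_pdf(x, mu, R):
    n = len(mu)
    d = x - mu
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(R) ** (-0.5)
    return norm * np.exp(-0.5 * d @ np.linalg.inv(R) @ d)

mu = np.zeros(2)
R = np.array([[8.3, 1.6], [1.6, 1.1]])
lam, V = np.linalg.eigh(R)     # eigenvalues lam_i and unit eigenvectors v_i
print(normal_pdf(np.array([1.0, 0.5]), mu, R))
print(np.sqrt(lam) * V)        # columns are the ellipsoid axes sqrt(lam_i) v_i
```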

Joint Distribution

The joint distribution of $x_1, \ldots, x_N$ is defined on the set of values of the sequence, being characterized by:

- a probability function $P(x_1, \ldots, x_N)$ (discrete variables);
- a probability density function $p(x_1, \ldots, x_N)$ (continuous variables).

Marginalization (discrete case):

$$P(x_1) = \sum_{x_2} P(x_1, x_2), \qquad P(x_2) = \sum_{x_1} P(x_1, x_2)$$
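A minimal illustration of marginalization on an assumed $2 \times 3$ joint table:

```python
# Marginalize a joint probability table P(x1, x2) by summing over one index.
import numpy as np

P = np.array([[0.1, 0.2, 0.1],    # rows:    x1 = 1, 2
              [0.3, 0.1, 0.2]])   # columns: x2 = 1, 2, 3

print(P.sum(axis=1))   # P(x1): sum over x2 -> [0.4, 0.6]
print(P.sum(axis=0))   # P(x2): sum over x1 -> [0.4, 0.3, 0.3]
```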

Independence

Def.: the r.v. $x_1, \ldots, x_N$ are independent if and only if

$$p(x_1, \ldots, x_N) = \prod_i p(x_i)$$

Example covariances: independent variables, $R = \begin{pmatrix} 1 & 0 \\ 0 & 0.37 \end{pmatrix}$; dependent variables, $R = \begin{pmatrix} 8.3 & 1.6 \\ 1.6 & 1.1 \end{pmatrix}$.

Note: independent r.v. are converted into dependent ones by applying a non-diagonal linear transformation.

Correlation

Def.: $x_1, \ldots, x_N$ are correlated r.v. if their covariance matrix is non-diagonal.

Notes:

- Independent r.v. are always uncorrelated. The converse is not true.
- Given $n$ r.v. it is possible to decorrelate them by applying a suitable linear transformation (e.g., KLT or PCA), as sketched below.
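A sketch of such a decorrelating transformation, projecting onto the eigenvectors of the sample covariance (the covariance values are assumed):

```python
# Decorrelate samples by projecting onto the eigenvectors of their covariance.
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[8.3, 1.6], [1.6, 1.1]])            # non-diagonal covariance
x = rng.multivariate_normal([0, 0], R, size=10000)

lam, V = np.linalg.eigh(np.cov(x.T))              # eigenvectors of sample cov
y = x @ V                                         # KLT/PCA: y = V' x
print(np.round(np.cov(y.T), 2))                   # approximately diagonal
```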

Conditional Probabilities

Definition (conditional probability): $P(x|y) = P(x,y)/P(y)$ if $P(y) \ne 0$. $P(x|y)$ is interpreted as the probability of $x$ occurring knowing that $y$ occurred.

Note: if $x, y$ are continuous random variables, the conditional probability density function $p(x|y)$ is defined in an analogous way.
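A tiny illustration of the definition on an assumed joint table, where each column of the result holds $P(x|y)$ for one value of $y$:

```python
# Conditional probabilities P(x | y) = P(x, y) / P(y) from a joint table.
import numpy as np

P = np.array([[0.1, 0.3],
              [0.2, 0.4]])        # P[x, y] for x, y in {0, 1}
P_y = P.sum(axis=0)               # marginal P(y)
print(P / P_y)                    # column j is P(x | y = j); columns sum to 1
```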

Expectation

Definition (expectation): let $f: S \to \mathbb{R}^n$. Then

$$E\{f(x)\} = \sum_x f(x) P(x) \quad (x \text{ a discrete r.v.}), \qquad E\{f(x)\} = \int f(x)\, p(x)\,dx \quad (x \text{ a continuous r.v.})$$

Relationship with the arithmetic mean:

$$E\{f(x)\} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

where $x_1, x_2, \ldots$ are realizations of $x$.

Definition (mean and covariance matrix): let $x$ be a random variable. Mean: $\mu = E\{x\}$. Covariance matrix: $R = E\{(x-\mu)(x-\mu)'\}$.
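A sketch of this relationship: Monte Carlo estimates of $E\{x^2\}$ and $E\{\cos(x)\}$ for $x \sim N(0,1)$, whose exact values are $1$ and $e^{-1/2} \approx 0.6065$.

```python
# Approximate expectations by arithmetic means over N realizations.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100000)   # realizations x_1, ..., x_N of x ~ N(0, 1)
print(np.mean(x**2))              # -> E{x^2} = 1
print(np.mean(np.cos(x)))         # -> E{cos(x)} = exp(-1/2)
```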

Properties of the Covariance Matrix

$A$ is a covariance matrix if and only if it is a square matrix, symmetric and positive semi-definite.

Other properties: the eigenvalues of a covariance matrix are non-negative,

$$R v_i = \lambda_i v_i, \quad \lambda_i \ge 0, \quad v_i' v_i = 1, \quad i = 1, \ldots, m$$

where $\lambda_i, v_i$ are the eigenvalues and eigenvectors of $R$.

Properties of the Normal Distribution

If $x_1, \ldots, x_n$ are r.v. with normal distribution, any subset of variables $x_{p_1}, \ldots, x_{p_m}$ are also r.v. with normal distribution.

Given a r.v. $x \sim N(\mu_x, R)$, the distribution of $y = Ax + b$ is $N(\mu_y, R_{yy})$ with

$$\mu_y = A\mu_x + b, \qquad R_{yy} = A R A'$$

Given two variables $x \sim N(\mu_x, R_{xx})$ and $y \sim N(\mu_y, R_{yy})$, then $x + y \sim N(\mu_x + \mu_y, P)$ with $P = R_{xx} + R_{yy}$ if $x, y$ are independent.
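A quick empirical check of the affine-transformation property; $A$, $b$ and $R$ are assumed values.

```python
# Verify that y = Ax + b has mean A mu_x + b and covariance A R A'.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])
R = np.array([[2.0, 0.5], [0.5, 1.0]])

x = rng.multivariate_normal([0, 0], R, size=200000)
y = x @ A.T + b
print(np.round(y.mean(axis=0), 2))   # mu_y = A mu_x + b = b here (mu_x = 0)
print(np.round(np.cov(y.T), 2))      # R_yy = A R A' = [[8, 2.5], [2.5, 1]]
```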

Generation of Random Values

Discrete variables:

- split the $[0,1[$ interval into subintervals of length $P(i)$;
- generate a random value with uniform distribution in $[0,1[$;
- the value of $x$ is the index of the subinterval which was selected.

[Diagram: $[0,1[$ split into subintervals of lengths $P(1), P(2), \ldots, P(N)$; a draw falling in the second subinterval gives $x = 2$.] A code sketch of this method is given below.

Continuous variables:

- specific algorithms for some distributions
- Metropolis algorithm
- importance sampling
- Gibbs sampler
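The sketch of the subinterval method, using the probability function from the earlier discrete example:

```python
# Sample a discrete r.v. by locating a uniform draw among subintervals of
# [0, 1[ whose lengths are P(1), ..., P(N).
import random

def sample_discrete(P):
    u = random.random()            # uniform value in [0, 1[
    acc = 0.0
    for k, pk in enumerate(P, start=1):
        acc += pk                  # right end of the k-th subinterval
        if u < acc:
            return k
    return len(P)                  # guard against floating-point rounding

samples = [sample_discrete([0.1, 0.3, 0.4, 0.2]) for _ in range(10000)]
print([samples.count(k) / len(samples) for k in (1, 2, 3, 4)])
```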

Metropolis Algorithm

How to generate random values with a given distribution? Metropolis algorithm:

- move $x$ randomly to a candidate $x'$;
- accept the new value $x'$ with probability $P = \min(1, p(x')/p(x))$; otherwise keep $x' = x$.

Example: $p(x) = (2x^3 - 4x^2 + 6)/13.5$ on $[-1, 2[$. [Plot: histogram of generated values and $p(x)$.]
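A sketch of the algorithm for the example density above; the proposal step size, starting point and chain length are assumed choices.

```python
# Metropolis sampling of p(x) = (2x^3 - 4x^2 + 6)/13.5 on [-1, 2[.
import random

def p(x):
    return (2 * x**3 - 4 * x**2 + 6) / 13.5 if -1 <= x < 2 else 0.0

def metropolis(n, step=0.5):
    x, chain = 0.5, []               # start where p(x) > 0
    for _ in range(n):
        x_new = x + random.uniform(-step, step)       # move x randomly
        # accept x' with probability min(1, p(x')/p(x)); otherwise keep x
        if random.random() < min(1.0, p(x_new) / p(x)):
            x = x_new
        chain.append(x)
    return chain

chain = metropolis(50000)
print(sum(chain) / len(chain))       # sample mean under p(x)
```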

Importance Sampling

It is used to compute expected values when it is difficult to generate random values with the true distribution $p(x)$ but it is possible to generate samples with an auxiliary distribution $q(x)$.

Algorithm:

- generate $n$ independent realizations $x_i \sim q(x)$;
- assign a weight to each realization (the importance): $w_i = p(x_i)/q(x_i)$.

Expectation: $E\{f(x)\} \approx \frac{1}{n} \sum_{i=1}^{n} w_i f(x_i)$.

Notes: $q(x) \ne 0$ must hold for all $x$ with $p(x) \ne 0$; the method shows poor performance in high-dimensional spaces.

Example

We wish to estimate two moments of a distribution $N(0,1)$ using importance sampling, $m_1 = \frac{1}{n}\sum_{i=1}^{n} w_i x_i$ and $m_2 = \frac{1}{n}\sum_{i=1}^{n} w_i x_i^2$. We considered $n = 100$; 100 estimation experiments were performed.

q = N(0, .25):  mean(m1) = -0.0194, std(m1) = 0.2801, mean(m2) = 0.6549, std(m2) = 0.4575  (tail mismatch)
q = N(0, 1):    mean(m1) =  0.0034, std(m1) = 0.0951, mean(m2) = 0.9977, std(m2) = 0.1540
q = N(0, 4):    mean(m1) =  0.0170, std(m1) = 0.0927, mean(m2) = 0.9959, std(m2) = 0.0611
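A sketch reproducing one such estimation experiment with $q = N(0, 4)$; the seed and implementation details are assumptions, so the numbers will not match the table exactly.

```python
# Importance sampling of m1 = E{x} and m2 = E{x^2} for p = N(0,1), q = N(0,4).
import numpy as np

def normal_pdf(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
n, sigma_q = 100, 2.0
x = rng.normal(0.0, sigma_q, n)                  # realizations x_i ~ q
w = normal_pdf(x, 1.0) / normal_pdf(x, sigma_q)  # weights w_i = p(x_i)/q(x_i)
print(np.mean(w * x), np.mean(w * x**2))         # m1 ~ 0, m2 ~ 1
```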

Gibbs Sampler

Problem: generate random values with a known distribution $P(x_1, \ldots, x_N)$.

Algorithm: repeat

- generate $x_1$ with distribution $P(x_1 \mid x_2, x_3, \ldots, x_N)$
- generate $x_2$ with distribution $P(x_2 \mid x_1, x_3, \ldots, x_N)$
- ...
- generate $x_N$ with distribution $P(x_N \mid x_1, x_2, \ldots, x_{N-1})$

This algorithm generates a Markov chain with asymptotic distribution $P(x_1, \ldots, x_N)$.
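A minimal sketch for an assumed bivariate normal target with correlation $\rho$, where both conditionals have closed form: $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$, and symmetrically for $x_2 \mid x_1$.

```python
# Gibbs sampling of a bivariate normal by alternating the two conditionals.
import random

def gibbs(n, rho=0.8):
    x1, x2, chain = 0.0, 0.0, []
    s = (1 - rho**2) ** 0.5              # conditional standard deviation
    for _ in range(n):
        x1 = random.gauss(rho * x2, s)   # draw x1 ~ P(x1 | x2)
        x2 = random.gauss(rho * x1, s)   # draw x2 ~ P(x2 | x1)
        chain.append((x1, x2))
    return chain

chain = gibbs(50000)
# the empirical correlation approaches rho as the chain converges
print(sum(a * b for a, b in chain) / len(chain))
```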

Optimization with the Gibbs Sampler

Generate realizations of a r.v. $x$ with distribution proportional to $p(x)^a$. Change $a$ until a dominant mode is observed. In the limit the algorithm will only generate values which maximize $p$. [Plots: $p$, $p^{10}$, $p^{20}$, $p^{30}$.]

Difficulty: there are no optimal criteria for the evolution of $a$.
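A sketch of this annealing idea using the Metropolis example density; the schedule for $a$ is an arbitrary assumed choice, in line with the note that no optimal criterion exists.

```python
# Metropolis on p(x)^a with increasing a: the chain concentrates on a mode.
import random

def p(x):
    return (2 * x**3 - 4 * x**2 + 6) / 13.5 if -1 <= x < 2 else 0.0

x = 0.5
for a in (1, 10, 20, 30):                  # increasing exponent a
    for _ in range(20000):
        x_new = x + random.uniform(-0.5, 0.5)
        if random.random() < min(1.0, (p(x_new) / p(x)) ** a):
            x = x_new
    print(a, round(x, 2))                  # settles near a maximizer of p(x)
```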

Problems

1. Given a distribution $P(x,y)$ defined by:

   x: 1  1  1  2  2  2
   y: 1  2  3  1  2  3
   P: .1 .2 .1 .3 .1 .2

   compute: i) $P(x)$; ii) $P(y)$; iii) $P(x|y)$; iv) $E\{x\}$; v) $E\{y\}$; vi) $E\{x+y\}$; vii) $E\{xy\}$.

2. The meaning of the variables $x, y, z$ is the following: $x$ - there is gas in the tank; $y$ - the battery is OK; $z$ - the motor starts at the first attempt. Define a probability distribution for these variables.

3. A random variable $x \sim N(0, R)$ has an uncertainty ellipsoid with semi-axes $[3\ 1]'$ and $[-.2\ .6]'$. Compute the covariance matrix $R$ knowing that $E\{x_1^2\} = 1$.

4. We know that a bridge falls with probability .8 if the main structural elements break, and this happens with probability .001. What is the break probability knowing that the bridge has fallen? Discuss whether this problem can be solved.

5. Three prisoners A, B, C are in separate cells. One is going to be released and the other two will be condemned to die. Prisoner A asks the jailer to deliver a farewell letter to one of the other prisoners who will be condemned. The next day the jailer tells him that he delivered the letter to prisoner B. What is the probability of A being set free, before and after the jailer's answer?

Work

Let $x$ be a random variable with distribution $N(0,1)$. Determine, in an exact or approximate way: $E\{x^2\}$, $E\{x^4\}$, $E\{\cos(x)\}$, $E\{\tan(x)\}$, $E\{\tan^{-1}(x)\}$.

Bibliography

E. T. Jaynes, Probability Theory: The Logic of Science, 1995.

J. Marques, Reconhecimento de Padrões. Métodos Estatísticos e Neuronais, IST Press, 1999.

S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.