Partition models and cluster processes


1  Partition models and cluster processes
With applications to classification
Jie Yang, Department of Statistics, University of Chicago
ICM, Madrid, August 2006

2  Outline
1. Partition models and cluster processes
2. Cox process models: permanent partition process; superposition of permanent processes; permanent partition models; conditional relative intensity; chequerboard illustration

3  Fisher discriminant model: description as a process (i.i.d.)
k classes with frequencies π_1, ..., π_k.
Features X for class r are i.i.d. N(µ_r, Σ) on X = R^q.
Observations (Y_1, X_1), (Y_2, X_2), ... are i.i.d. with pr(Y_i = r) = π_r.
Given Y, the features are independent, X_j ~ N(µ_{y_j}, Σ).
Density of (Y_i, X_i) at (y, x):
    π_y (2π)^{-q/2} |Σ|^{-1/2} exp(-(x - µ_y)' Σ^{-1} (x - µ_y)/2)

4  Fisher discriminant model: application to classification
Estimation step: π̂_r = n_r/n, µ̂_r = x̄_r, Σ̂ = S.
Classification step: for a new unit u* such that X(u*) = x*,
    pr(Y(u*) = r | data) ∝ π_r φ(x* - µ_r; Σ)
(independent of the data other than x*).
Substitution step:
    p̂r(Y(u*) = r | data) ∝ n_r φ(x* - x̄_r; S)
(dependent on the data through the estimates).
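As a concrete illustration, a minimal Python sketch of the estimation and substitution steps; the two-class Gaussian data, function names, and pooled-covariance estimator are our own illustrative choices, not from the talk.

```python
# Sketch of the Fisher discriminant steps: estimate (pi, mu, Sigma),
# then classify a new point by substituting the estimates.
import numpy as np

def fit_fisher(X, y):
    """Estimation step: class counts n_r, class means xbar_r, pooled S."""
    classes = np.unique(y)
    n = len(y)
    counts = {r: int(np.sum(y == r)) for r in classes}
    means = {r: X[y == r].mean(axis=0) for r in classes}
    S = sum((X[y == r] - means[r]).T @ (X[y == r] - means[r]) for r in classes)
    S /= (n - len(classes))
    return counts, means, S

def classify_fisher(x_new, counts, means, S):
    """Substitution step: pr(Y = r | data) proportional to n_r * phi(x - xbar_r; S)."""
    Sinv = np.linalg.inv(S)
    scores = {}
    for r, mu in means.items():
        d = x_new - mu
        scores[r] = counts[r] * np.exp(-0.5 * d @ Sinv @ d)
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.repeat([0, 1], 50)
counts, means, S = fit_fisher(X, y)
print(classify_fisher(np.array([1.5, 1.5]), counts, means, S))
```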

5  Description as a process (independent but not identically distributed)
Feature x(u) regarded as a covariate for unit u.
Class Y(u) regarded as the response.
Linear logistic model: components Y(u_1), Y(u_2), ... independent with
    log pr(Y(u) = r) = α_r + β_r' x(u) + const,
consistent with the Fisher model when β_r = Σ^{-1} µ_r.

6  Application to classification
Estimation step: maximum likelihood gives α̂, β̂.
Classification step:
    pr(Y(u*) = r | data) = exp(α_r + β_r' x(u*)) / Σ_c exp(α_c + β_c' x(u*))
(independent of the data other than x(u*)).
Substitution step: β → β̂ (fitted = predicted).
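The same classification problem via the linear logistic route, in a hedged sketch using scikit-learn's maximum-likelihood fit; the data and the library choice are ours, not the talk's.

```python
# Estimation by maximum likelihood, then classification with the fitted
# coefficients substituted for the true ones (fitted = predicted).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.repeat([0, 1], 50)

model = LogisticRegression().fit(X, y)      # gives alpha-hat, beta-hat
print(model.predict_proba([[1.5, 1.5]]))    # class probabilities at x(u*)
```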

7  Random partition of [n] = {1, ..., n}
A set of disjoint non-empty subsets (blocks) whose union is [n].
n = 2: B = {{1, 2}} = 12 or B = {{1}, {2}} = 1|2.
n = 3: B_3 = {123, 1|23, 2|13, 3|12, 1|2|3}.
n = 4: B_4 = {1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4}.
Bell numbers: #B_3 = 5; #B_4 = 15; #B_5 = 52; #B_6 = 203.
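The Bell numbers above can be checked by brute force. A short Python sketch (our own recursive enumeration, not from the talk) that lists the set partitions of [n]:

```python
def partitions(elements):
    """Yield every set partition of `elements` (a list), as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for p in partitions(rest):
        for i in range(len(p)):                        # put `first` into block i
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p                            # or into a new block

for n in range(3, 7):
    print(n, sum(1 for _ in partitions(list(range(1, n + 1)))))
# prints the Bell numbers: 5, 15, 52, 203
```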

8  Permutation of elements
B^σ(i, j) = B(σ(i), σ(j)) for the partition matrix B (a 5×5 example matrix B is displayed on the slide).
Example 1: σ = (1, 3): B^σ shown as a matrix.
Example 2: σ = (2, 4, 5): B^σ(1, 5) = B(1, 2) = 1, B^σ(1, 2) = B(1, 4) = 0.
Block sizes are unaffected.
Exchangeable product partition distribution (Hartigan 1990):
    p_n(B) = D_n Π_{b∈B} C(#b)

9  Deletion of elements
Deletion (of 5) from 12345 gives 1234 in B_4.
Deletion (of 5) from 1234|5 gives 1234 in B_4.
Deletion is a map D: B_n → B_{n-1}. How does this affect distributions?
Example: p_5 is uniform on the 52 elements of B_5.
The marginal distribution on B_4 after deletion is
    p_4(1234) = p_5(12345) + p_5(1234|5) = 2/52
    p_4(123|4) = p_5(1235|4) + p_5(123|45) + p_5(123|4|5) = 3/52
    p_4(12|3|4) = p_5(125|3|4) + p_5(12|35|4) + p_5(12|3|45) + p_5(12|3|4|5) = 4/52
The marginal distribution is not uniform.
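A sketch verifying this deletion example numerically, assuming the uniform distribution on B_5; the restricted-growth-string encoding is our own device, not from the talk.

```python
# Encode a partition of [n] by a restricted growth string (RGS): s[i] is the
# block label of element i+1, with s[i] <= max(prefix) + 1. Deleting the
# last element just drops the last label, so counting prefixes counts the
# preimages of each partition of [4].
from collections import Counter
from itertools import product

def rgs(n):
    for s in product(range(n), repeat=n):
        if all(s[i] <= max(s[:i], default=-1) + 1 for i in range(n)):
            yield s

marginal = Counter(s[:4] for s in rgs(5))   # counts out of the 52 elements of B_5
print(marginal[(0, 0, 0, 0)])   # 2 -> p4(1234)   = 2/52
print(marginal[(0, 0, 0, 1)])   # 3 -> p4(123|4)  = 3/52
print(marginal[(0, 0, 1, 2)])   # 4 -> p4(12|3|4) = 4/52
```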

10  Ewens partition process
    p_n(B) = λ^{#B} (Γ(λ)/Γ(n + λ)) Π_{b∈B} Γ(#b)
Exponential family of distributions on B_n:
    canonical parameter θ = log λ
    canonical statistic #B, with E(#B) ≈ λ log(n)
    cumulant function log Γ(n + e^θ) - log Γ(e^θ)
Product partition model: C(#b) = λ (#b - 1)!.
Finitely exchangeable for each n: p_n is the marginal distribution of p_{n+1}.

11  Ewens distribution for n = 4
    p_4(1234) = 6/((λ+1)(λ+2)(λ+3))
    p_4(123|4) = 2λ/((λ+1)(λ+2)(λ+3))
    p_4(12|34) = λ/((λ+1)(λ+2)(λ+3))
    p_4(12|3|4) = λ²/((λ+1)(λ+2)(λ+3))
    p_4(1|2|3|4) = λ³/((λ+1)(λ+2)(λ+3))
Marginals:
    p_3(123) = p_4(1234) + p_4(123|4) = 2/((λ+1)(λ+2))
    p_3(12|3) = p_4(124|3) + p_4(12|34) + p_4(12|3|4)
    p_3(1|2|3) = p_4(14|2|3) + p_4(1|24|3) + p_4(1|2|34) + p_4(1|2|3|4)
p_n is the marginal distribution of p_{n+1}: an infinitely exchangeable partition process.
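A short numerical check of the Ewens formula and of the marginalization identity p_3(123) = p_4(1234) + p_4(123|4), at an arbitrary λ = 2; a sketch of ours, not the authors' code.

```python
# Ewens probability of a partition given as a list of blocks, and a check
# that both sides equal 2/((lam+1)(lam+2)) = 1/6 at lam = 2.
from math import gamma, prod, isclose

def ewens(blocks, lam):
    n = sum(len(b) for b in blocks)
    return (lam ** len(blocks) * gamma(lam) / gamma(n + lam)
            * prod(gamma(len(b)) for b in blocks))

lam = 2.0
lhs = ewens([[1, 2, 3]], lam)
rhs = ewens([[1, 2, 3, 4]], lam) + ewens([[1, 2, 3], [4]], lam)
print(lhs, rhs, isclose(lhs, rhs))
```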

12  Chinese restaurant process
Ewens process: B is a random partition of the natural numbers.
Regard it as a sequence of matrices B_1, B_2, ... such that B_n(i, j) = B(i, j) is the leading sub-matrix of order n, with B_n ~ Ewens(n, λ).
Conditional distribution of B_{n+1} given B_n (transition distribution B_n → B_{n+1}):
    pr(u_{n+1} ∈ b | B_n) = #b/(n + λ) for b ∈ B_n,  λ/(n + λ) for b = ∅ (a new block).
CRP interpretation due to Dubins and Pitman.
The number of occupied tables is approximately Poisson(λ log(n)).
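A minimal sampler implementing the CRP transition rule above; the parameter names and seeding convention are illustrative choices of ours.

```python
# Sample B_n ~ Ewens(n, lam) by sequential seating: customer i joins an
# occupied table b with probability #b/(i - 1 + lam), or opens a new table
# with probability lam/(i - 1 + lam).
import random

def crp(n, lam, seed=0):
    rng = random.Random(seed)
    blocks = []
    for i in range(1, n + 1):
        weights = [len(b) for b in blocks] + [lam]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(blocks):
            blocks.append([i])      # open a new table
        else:
            blocks[j].append(i)     # join occupied table j
    return blocks

print(crp(10, 1.5))   # number of occupied tables grows like lam * log(n)
```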

13  Gauss–Ewens cluster processes
Pair (B, X) such that X = (X_1, X_2, ...) is a random sequence and B = {B_ij} is a random partition of the integers.
Finite-dimensional distributions: p_n(B, x) is a distribution on B_n × X^n.
Invariant under permutation: p_n(B^σ, x^σ) = p_n(B, x); p_n is the marginal of p_{n+1}.
B ~ Ewens_n(λ); given B, X ~ N(0, Σ = I_n + θB):
    p_n(B, x) = λ^{#B} (Γ(λ)/Γ(n + λ)) Π_{b∈B} Γ(#b) · |2πΣ|^{-1/2} exp(-x' Σ^{-1} x/2)
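A sketch of simulation from the pair (B, X), reusing the `crp` sampler above and assuming one-dimensional features; the construction of the partition matrix B is spelled out explicitly.

```python
# Draw B from the CRP, build the partition matrix B_ij = 1 iff i ~ j,
# then draw X ~ N(0, I_n + theta * B).
import numpy as np

def gauss_ewens(n, lam, theta, seed=0):
    blocks = crp(n, lam, seed)              # CRP sampler from the sketch above
    B = np.zeros((n, n))
    for b in blocks:
        idx = np.asarray(b) - 1
        B[np.ix_(idx, idx)] = 1.0
    cov = np.eye(n) + theta * B             # Sigma = I_n + theta * B
    X = np.random.default_rng(seed).multivariate_normal(np.zeros(n), cov)
    return blocks, X

blocks, X = gauss_ewens(20, lam=1.5, theta=5.0)
print(blocks)
print(np.round(X, 2))
```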

14–16  Simulated realizations (figures)
Scatter plots of simulated Gauss–Ewens cluster processes, plotted as rpts$points[,1] against rpts$points[,2]; the plotting symbols indicate cluster membership.

17–20  Potential applications of cluster processes
Prediction: given (B, X)^(n), predict the next value. Easily computed, but not interesting or useful.
Density estimation: given X_1, ..., X_n (no B), predict the next value: p_{n+1}(X_{n+1} | X_1, ..., X_n).
Cluster analysis: p_n(B | X) is a distribution on clusterings of [n]:
    p_n(B_ij = 1 | X) = E(B_ij | X) (triples too)
    p_n(#B = r | X) (conditional distribution of #B)
    preliminary conclusions ...
Classification: given (B, X)^(n) and X(u_{n+1}) = x*, compute p_{n+1}(u_{n+1} ∈ b | data). Easily computed and fairly useful.


21  Classification and cluster processes
Given (B, x) on a sample of n units and a new unit u* with x(u*) = x*, classify the new unit (conditional distribution).
Gauss–Ewens conditional distribution:
    pr(u* ∈ b | ...) ∝ (#b) φ_{n+1}(x̃; I_{n+1} + θB_b) for b ∈ B,
                       λ φ_{n+1}(x̃; I_{n+1} + θB_∅) for b = ∅,
where x̃ = (x, x*) and B_b is the partition B with u* adjoined to block b (b = ∅: a new singleton block).

22  Application to classification (contd)
Simplification of the conditional distribution:
    pr(u* ∈ b | ...) ∝ (#b) φ_1(x(u*) - µ_b; σ_b²) for b ∈ B,
                       λ φ_1(x(u*); 1 + θ) for b = ∅,
where µ_b = x̄_b n_b θ/(1 + n_b θ) and σ_b² = 1 + θ/(1 + n_b θ).
Typical values θ ≈ 5 and n_b ≈ 5, so µ_b/x̄_b = n_b θ/(1 + n_b θ) ≈ 0.96
(similar to x̄_b, but with shrinkage).
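The simplified rule translates directly into code. A sketch assuming one-dimensional features; the block means, block sizes, and parameter values are made up for illustration.

```python
# Classification weights: (#b) * phi(x* - mu_b; sigma_b^2) for each existing
# block, and lam * phi(x*; 1 + theta) for a new singleton block.
import numpy as np

def phi(x, var):
    return np.exp(-x * x / (2 * var)) / np.sqrt(2 * np.pi * var)

def class_probs(x_star, block_means, block_sizes, lam, theta):
    w = []
    for xbar, nb in zip(block_means, block_sizes):
        mu = xbar * nb * theta / (1 + nb * theta)   # shrunk block mean
        var = 1 + theta / (1 + nb * theta)
        w.append(nb * phi(x_star - mu, var))
    w.append(lam * phi(x_star, 1 + theta))          # new block
    w = np.array(w)
    return w / w.sum()

# two existing blocks with means -2 and 3, five points each
print(class_probs(2.5, [-2.0, 3.0], [5, 5], lam=1.0, theta=5.0))
```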

23  Cox process models
Construction: λ(x) is a random intensity function on X = R; X is a Poisson process with intensity λ.
Product densities:
    m(x) = E(λ(x)), the first-order p.d.
    m(x_1, x_2) = E(λ(x_1) λ(x_2)), the second-order p.d.
    m(x) = E(λ(x_1) ··· λ(x_n)), the nth-order p.d.
X(A) = #(X ∩ A) = number of events in A.
m(x) dx: expected number of ordered n-tuples of events in dx.

24  Marked Cox process
λ_1(x), ..., λ_k(x): independent random intensity functions.
X^(r) is a Cox process with intensity λ_r(x); X = ∪_r X^(r) is the superposition process.
The marked process is a labelled partition (X^(1), ..., X^(k)) of X:
    pr(X^(r) contains an event in dx^(r)) = m_r(dx^(r)) + o(dx^(r))
    pr(X contains an event in dx) = m(dx) + o(dx)
    pr(y | x ⊂ X) = Π_r m_r(x^(r)) / m(x)
for a label function y: x → C with x^(r) = y^{-1}(r).

25  Boson and related processes
Z(x) is a Gaussian process N(0, K) on X (complex-valued for simplicity!).
λ(x) = |Z(x)|² is the intensity at x for the Cox process.
Product density: m(x) = E(λ(x_1) ··· λ(x_n)) = per[K](x), for the matrix [K](x) = {K(x_i, x_j)}.
Permanent polynomial of a matrix A:
    per_α(A) = Σ_σ α^{#σ} A_{1,σ(1)} ··· A_{n,σ(n)},
summing over permutations σ, with #σ the number of cycles of σ.
Convolution or superposition:
    Σ_{w ⊆ x} per_α[K](w) per_{α'}[K](x∖w) = per_{α+α'}[K](x)
Macchi 1975; Shirai and Takahashi 2003; McCullagh and Møller 2006.
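A brute-force sketch of the α-permanent, feasible only for small n since it sums over all n! permutations; the cycle-count convention follows the definition above, and the Gaussian kernel is an illustrative choice of ours.

```python
# per_alpha(A) = sum over permutations sigma of
#   alpha^(#cycles of sigma) * prod_i A[i, sigma(i)].
import numpy as np
from itertools import permutations

def cycle_count(sigma):
    seen, c = set(), 0
    for i in range(len(sigma)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = sigma[j]
    return c

def alpha_permanent(A, alpha):
    n = A.shape[0]
    return sum(alpha ** cycle_count(s) * np.prod([A[i, s[i]] for i in range(n)])
               for s in permutations(range(n)))

K = np.exp(-np.subtract.outer(np.arange(3.0), np.arange(3.0)) ** 2)
print(alpha_permanent(K, 1.0))   # alpha = 1: the ordinary permanent per[K]
```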

26–27  Intensity functions and Cox processes (figures)

28  Superposition processes (figure)

29  Permanent partition models
Given that x ⊂ X occurs in the superposition, the conditional distribution of the labels y: x → C is
    p_n(y | x) = per_{α_1}[K](x^(1)) ··· per_{α_k}[K](x^(k)) / per_{α.}[K](x),
a probability distribution on labelled partitions of x.
Reduces to the Dirichlet-multinomial if K(x, x') ≡ 1.
Classification distribution for a new event at x*:
    p_{n+1}(y(x*) = r | data) ∝ per_{α_r}[K](x^(r), x*) / per_{α_r}[K](x^(r))
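A sketch of these classification weights, reusing `alpha_permanent` from the sketch above; the kernel, α values, and the tiny training sets are invented for illustration.

```python
# Weight of class r for a new event x*:
#   per_{alpha_r}[K](x^(r), x*) / per_{alpha_r}[K](x^(r)).
import numpy as np

def kmat(pts):
    pts = np.asarray(pts, dtype=float)
    return np.exp(-np.subtract.outer(pts, pts) ** 2)

def class_weight(x_class, x_star, alpha):
    num = alpha_permanent(kmat(list(x_class) + [x_star]), alpha)
    den = alpha_permanent(kmat(list(x_class)), alpha)
    return num / den

x1, x2 = [0.1, 0.4, 0.5], [2.0, 2.3]      # events labelled 1 and 2
w = np.array([class_weight(x1, 1.0, 1.0), class_weight(x2, 1.0, 1.0)])
print(w / w.sum())                        # classification distribution at x* = 1.0
```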

30  Conditional relative intensity (figure)
per_α[K](x, x*) / per_α[K](x) as a function of x*, with K(x, x') = exp(-(x - x')²) on (0, 2π).
Solid curve: α = 1; dashed curve: α = 0.25; dotted curve: α = 0.

31  Illustration 2: relative intensity functions (figure)
per[K](x, x*) / per[K](x) on a one-dimensional feature space with 2 labelled classes.

32  Chequerboard pattern
A two-class example is generated from a 3×3 chequerboard pattern. The permanent model with α_1 = α_2 = α and K(x, x') = exp(-|x - x'|²/τ²) is used.
(Figures: the data set, and cross-validation sum of log(pr) curves for α = 0.5, α = 1, and a third value of α.)
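A hedged sketch of a data-generating mechanism consistent with this description (uniform points on [0, 3]² labelled by cell parity), together with the Gaussian kernel matrix; the exact construction and the value of τ used in the talk may differ.

```python
# Two-class 3x3 chequerboard data and the kernel K(x, x') = exp(-|x-x'|^2 / tau^2).
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0, 3, size=(200, 2))
labels = (np.floor(pts[:, 0]) + np.floor(pts[:, 1])).astype(int) % 2
print(np.bincount(labels))                 # class sizes

tau = 1.0                                  # illustrative bandwidth
K = np.exp(-((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1) / tau ** 2)
print(K.shape)                             # kernel matrix used by the permanent model
```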

33  Chequerboard pattern: classification (figure)
