Partition models and cluster processes
1 Partition models and cluster processes. With applications to classification. Jie Yang, Department of Statistics, University of Chicago. ICM, Madrid, August 2006
2 Outline. 1. Partition models and cluster processes. 2. Cox process models: permanent partition process; superposition of permanent processes; permanent partition models; conditional relative intensity; chequerboard illustration
3 Fisher discriminant model. Description as a process (i.i.d.): k classes with frequencies π_1, ..., π_k; features X for class r are iid N(µ_r, Σ) on X = R^q. Observations (Y_1, X_1), (Y_2, X_2), ... are i.i.d. with pr(Y_i = r) = π_r; given Y, the features are independent, X_j ~ N(µ_{Y_j}, Σ). Density of (Y_i, X_i) at (y, x): π_y (2π)^{-q/2} |Σ|^{-1/2} exp(-(x - µ_y)' Σ^{-1} (x - µ_y)/2)
4 Fisher discriminant model. Application to classification. Estimation step: π̂_r = n_r/n, µ̂_r = x̄_r, Σ̂ = S. Classification step: for a new unit u' such that X(u') = x', pr(Y(u') = r | data) ∝ π_r φ(x' - µ_r; Σ) (independent of the data other than x'). Substitution step: p̂r(Y(u') = r | data) ∝ n_r φ(x' - x̄_r; S) (dependent on the data through the estimates)
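The estimation and classification steps above can be sketched in code. This is an illustrative Python/numpy sketch (the function names `fit_fisher` and `classify` are my own, not from the talk): estimate class frequencies, class means, and the pooled covariance S, then score a new point by n_r φ(x' - x̄_r; S).

```python
import numpy as np

def fit_fisher(X, y):
    """Estimation step: class frequencies, class means, pooled covariance S."""
    classes = np.unique(y)
    n, q = X.shape
    pi_hat = {r: np.mean(y == r) for r in classes}
    mu_hat = {r: X[y == r].mean(axis=0) for r in classes}
    # pooled within-class covariance estimate
    S = sum((X[y == r] - mu_hat[r]).T @ (X[y == r] - mu_hat[r])
            for r in classes) / (n - len(classes))
    return pi_hat, mu_hat, S

def classify(x_new, pi_hat, mu_hat, S):
    """Classification step: pr(Y(u') = r | data) proportional to
    pi_hat_r * phi(x' - mu_hat_r; S); constants cancel in the ratio."""
    Sinv = np.linalg.inv(S)
    scores = {}
    for r in pi_hat:
        d = x_new - mu_hat[r]
        scores[r] = pi_hat[r] * np.exp(-0.5 * d @ Sinv @ d)
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}
```

Because the Gaussian normalizing constant is the same for every class, it cancels and only the quadratic forms matter.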
5 Description as a process (independent but not identically distributed). Feature x(u) regarded as a covariate for unit u; class Y(u) regarded as the response. Linear logistic model: components Y(u_1), Y(u_2), ... independent with log pr(Y(u) = r) = α_r + β_r' x(u) + const. Consistent with the Fisher model when β_r = Σ^{-1} µ_r
6 Application to classification. Estimation step: maximum likelihood gives α̂, β̂. Classification step: pr(Y(u') = r | data) = exp(α_r + β_r' x(u')) / Σ_c exp(α_c + β_c' x(u')) (independent of the data other than x(u')). Substitution step: β → β̂ (fitted = predicted)
7 Random partition of [n] = {1, ..., n}: a set of disjoint non-empty subsets (blocks) whose union is [n]. n = 2: B = {{1, 2}} = 12 or B = {{1}, {2}} = 1|2. n = 3: B_3 = {123, 1|23, 2|13, 3|12, 1|2|3}. n = 4: B_4 = {1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4}, where [m] counts the partitions of that shape. Bell numbers: #B_3 = 5, #B_4 = 15, #B_5 = 52, #B_6 = 203
8 Permutation of elements: B^σ(i, j) = B(σ(i), σ(j)), for a partition B of [5] viewed as a 0-1 matrix. Example 1: σ = (1, 3). Example 2: σ = (2, 4, 5): B^σ(1, 5) = B(1, 2) = 1, B^σ(1, 2) = B(1, 4) = 0. Block sizes are unaffected. Exchangeable product partition distribution (Hartigan 1990): p_n(B) = D_n ∏_{b∈B} C(#b)
9 Deletion of elements. Deletion (of 5) from 12345 gives 1234 in B_4; deletion (of 5) from 1234|5 also gives 1234 in B_4. Deletion is a map D: B_n → B_{n-1}. How does this affect distributions? Example: p_5 is uniform on the 52 elements of B_5. Marginal distribution on B_4 after deletion: p_4(1234) = p_5(12345) + p_5(1234|5) = 2/52; p_4(123|4) = p_5(1235|4) + p_5(123|45) + p_5(123|4|5) = 3/52; p_4(12|3|4) = p_5(125|3|4) + ... + p_5(12|3|4|5) = 4/52. Marginal distribution not uniform.
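The deletion example can be checked by brute force. A Python sketch (my own, not from the talk): enumerate the 52 set partitions of [5], put the uniform distribution on them, delete element 5 from each, and count how many partitions land on each element of B_4.

```python
from collections import Counter

def partitions(s):
    """Generate all set partitions of the list s (52 of them for 5 elements)."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for p in partitions(rest):
        yield [[first]] + p                          # first in its own block
        for i in range(len(p)):                      # or join an existing block
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

B5 = list(partitions([1, 2, 3, 4, 5]))

# uniform p_5 on B_5; induced marginal on B_4 after deleting element 5
marginal = Counter()
for p in B5:
    image = frozenset(frozenset(b) - {5} for b in p) - {frozenset()}
    marginal[image] += 1
```

The counts confirm that the induced distribution on B_4 is not uniform: 1234 receives mass 2/52, each partition of the shape 12|3|4 receives 4/52.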
10 Ewens partition process: p_n(B) = λ^{#B} (Γ(λ)/Γ(n + λ)) ∏_{b∈B} Γ(#b). Exponential family of distributions on B_n: canonical parameter θ = log λ; canonical statistic #B, with E(#B) ≈ λ log(n); cumulant function log Γ(n + e^θ) - log Γ(e^θ). Product partition model with C(#b) = λ (#b - 1)!. Finitely exchangeable for each n; p_n is the marginal distribution of p_{n+1}
11 Ewens distribution for n = 4: p_4(1234) = 6/((λ+1)(λ+2)(λ+3)); p_4(123|4) = 2λ/((λ+1)(λ+2)(λ+3)); p_4(12|34) = λ/((λ+1)(λ+2)(λ+3)); p_4(12|3|4) = λ²/((λ+1)(λ+2)(λ+3)); p_4(1|2|3|4) = λ³/((λ+1)(λ+2)(λ+3)). Marginals after deleting element 4: p_3(123) = p_4(1234) + p_4(123|4) = 2/((λ+1)(λ+2)); p_3(12|3) = p_4(124|3) + p_4(12|34) + p_4(12|3|4); p_3(1|2|3) = p_4(14|2|3) + p_4(1|24|3) + p_4(1|2|34) + p_4(1|2|3|4). p_n is the marginal distribution of p_{n+1}: an infinitely exchangeable partition process
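The n = 4 probabilities can be verified numerically. A stdlib-only Python sketch (the helper name `ewens_prob` is illustrative) evaluating p_n(B) = λ^{#B} ∏_b (#b - 1)! / (λ)_n, where (λ)_n = λ(λ+1)···(λ+n-1) is the rising factorial:

```python
from math import factorial, prod

def ewens_prob(block_sizes, lam):
    """Ewens probability of a partition specified by its block sizes:
    lam^{#B} * prod_b (#b - 1)! divided by the rising factorial (lam)_n."""
    n = sum(block_sizes)
    rising = prod(lam + i for i in range(n))    # lam (lam+1) ... (lam+n-1)
    return lam ** len(block_sizes) * prod(factorial(b - 1) for b in block_sizes) / rising

lam = 1.5
# the five block-size shapes on B_4 and the number of partitions of each shape
shapes = {(4,): 1, (3, 1): 4, (2, 2): 3, (2, 1, 1): 6, (1, 1, 1, 1): 1}
total = sum(m * ewens_prob(list(s), lam) for s, m in shapes.items())
```

Summing over the 1 + 4 + 3 + 6 + 1 = 15 partitions of [4] gives total probability 1, and each shape matches the displayed formula after cancelling one factor of λ.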
12 Chinese restaurant process. Ewens process: B is a random partition of the natural numbers. Regard it as a sequence of matrices B_1, B_2, ... such that B_n(i, j) = B(i, j) is the leading sub-matrix of order n; B_n ~ Ewens(n, λ). Conditional (transition) distribution of B_{n+1} given B_n: pr(u_{n+1} ∈ b | B_n) = #b/(n + λ) for b ∈ B_n, and λ/(n + λ) for a new block. CRP interpretation due to Dubins and Pitman. The number of occupied tables is approximately Poisson(λ log(n))
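The transition rule above yields a direct sequential simulator. A minimal Python sketch (names are mine): unit i joins an existing block with probability proportional to its size, or opens a new block with probability proportional to λ.

```python
import random

def crp(n, lam, seed=0):
    """Grow an Ewens(n, lam) partition sequentially: unit i joins an
    existing block b with probability #b/(i - 1 + lam), or starts a
    new block with probability lam/(i - 1 + lam)."""
    rng = random.Random(seed)
    blocks = []
    for i in range(1, n + 1):
        weights = [len(b) for b in blocks] + [lam]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(blocks):
            blocks.append([i])         # open a new table
        else:
            blocks[j].append(i)        # join table j
    return blocks
```

Averaging the number of blocks over many runs illustrates the λ log(n) growth of the number of occupied tables.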
13 Gauss-Ewens cluster process. Pair (B, X) such that X = (X_1, X_2, ...) is a random sequence and B = {B_ij} is a random partition of the integers. Finite-dimensional distributions: p_n(B, x) is a distribution on B_n × X^n; invariant under permutation, p_n(B^σ, x^σ) = p_n(B, x); p_n is the marginal of p_{n+1}. B ~ Ewens_n(λ); given B, X ~ N(0, Σ) with Σ = I_n + θB: p_n(B, x) = λ^{#B} (Γ(λ)/Γ(n + λ)) ∏_{b∈B} Γ(#b) · |2πΣ|^{-1/2} exp(-x'Σ^{-1}x/2)
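A sample from the pair (B, X) can be drawn in two stages, exactly as the model is specified: first the partition via the sequential (CRP) construction, then a Gaussian vector with covariance I_n + θB, where B is treated as the n × n block-membership indicator matrix. A Python/numpy sketch (the talk's own figures are in R; this is an assumed translation):

```python
import numpy as np

def sample_gauss_ewens(n, lam, theta, rng):
    """Draw (B, X): B from Ewens(n, lam) via the CRP, then
    X ~ N(0, Sigma) with Sigma = I_n + theta * B."""
    labels = []
    for i in range(n):
        k = max(labels, default=-1) + 1            # number of blocks so far
        counts = [labels.count(c) for c in range(k)]
        weights = np.array(counts + [lam])
        labels.append(rng.choice(k + 1, p=weights / (i + lam)))
    labels = np.array(labels)
    # block indicator matrix: B[i, j] = 1 iff i and j share a block
    B = (labels[:, None] == labels[None, :]).astype(float)
    X = rng.multivariate_normal(np.zeros(n), np.eye(n) + theta * B)
    return B, X
```

Units in the same block share the positive covariance θ, so features cluster around common block means, as in the scatterplots on the following slides.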
14 [Figure: simulated Gauss-Ewens cluster configurations, four panels, rpts$points[,1] vs rpts$points[,2]]
15 [Figure: simulated Gauss-Ewens cluster configurations, four panels, rpts$points[,1] vs rpts$points[,2]]
16 [Figure: simulated cluster configurations with cluster labels, rpts$points[,1] vs rpts$points[,2]]
17 Potential applications of the cluster process. Prediction: given (B, X)^(n), predict the next value; easily computed but not interesting or useful. Density estimation: given X_1, ..., X_n (no B), predict the next value via p_{n+1}(X_{n+1} | X_1, ..., X_n). Cluster analysis: p_n(B | X) is a distribution on clusterings of [n]; p_n(B_ij = 1 | X) = E(B_ij | X) (triples too); p_n(#B = r | X) (conditional distribution of #B); preliminary conclusions... Classification: given (B, X)^(n) and X(u_{n+1}) = x', compute p_{n+1}(u_{n+1} ∈ b | data); easily computed and fairly useful
21 Classification and cluster processes. Given (B, x) on a sample of n units and a new unit u' with x(u') = x', classify the new unit (conditional distribution). Gauss-Ewens conditional distribution: pr(u' ∈ b | ...) ∝ (#b) φ_{n+1}(x*; I_{n+1} + θB^b) for b ∈ B, and ∝ λ φ_{n+1}(x*; I_{n+1} + θB^∅) for a new block, where x* = (x, x') and B^b = {B, u' ∈ b}
22 Application to classification (contd). Simplification of the conditional distribution (one-dimensional feature): pr(u' ∈ b | ...) ∝ (#b) φ_1(x(u') - µ_b; σ_b²) for b ∈ B, and ∝ λ φ_1(x(u'); 1 + θ) for a new block, where µ_b = x̄_b n_b θ/(1 + n_b θ) and σ_b² = 1 + θ/(1 + n_b θ). Typical values θ ≈ 5 and n_b ≈ 5, so µ_b / x̄_b = n_b θ/(1 + n_b θ) ≈ 0.96 (similar to the Fisher rule, but with shrinkage)
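The simplified one-dimensional rule is easy to evaluate directly. A stdlib-only Python sketch (function and argument names are illustrative): each existing block contributes n_b φ_1(x' - µ_b; σ_b²) with the shrunk mean and inflated variance given above, and a new block contributes λ φ_1(x'; 1 + θ).

```python
from math import exp, pi, sqrt

def phi(z, var):
    """Univariate normal density at z with mean 0 and variance var."""
    return exp(-z * z / (2 * var)) / sqrt(2 * pi * var)

def class_probs(x_new, blocks_x, lam, theta):
    """pr(u' joins block b | data) proportional to n_b * phi(x' - mu_b; sigma_b^2),
    plus lam * phi(x'; 1 + theta) for a new block.
    `blocks_x` is a list of feature-value lists, one per block.
    Returns probabilities for each existing block, then the new block."""
    weights = []
    for xb in blocks_x:
        nb = len(xb)
        shrink = nb * theta / (1 + nb * theta)   # shrinkage factor for the mean
        mu_b = (sum(xb) / nb) * shrink
        var_b = 1 + theta / (1 + nb * theta)
        weights.append(nb * phi(x_new - mu_b, var_b))
    weights.append(lam * phi(x_new, 1 + theta))
    total = sum(weights)
    return [w / total for w in weights]
```

With θ = 5 and n_b = 5 the shrinkage factor is 25/26 ≈ 0.96, matching the slide.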
23 Cox process models. Construction: λ(x) is a random intensity function on X; given λ, X is a Poisson process with intensity λ. Product densities: m(x) = E(λ(x)) is the first-order product density; m(x_1, x_2) = E(λ(x_1)λ(x_2)) the second-order; m(x) = E(λ(x_1) ··· λ(x_n)) the nth-order product density. X(A) = #(X ∩ A) = number of events in A; m(x) dx is the expected number of ordered n-tuples of events in dx
24 Marked Cox process. λ_1(x), ..., λ_k(x) independent random intensity functions; X^(r) is a Cox process with intensity λ_r(x); X = ∪ X^(r) is the superposition process. The marked process is a labelled partition (X^(1), ..., X^(k)) of X: pr(X^(r) contains an event in dx^(r)) = m_r(dx^(r)) + o(dx^(r)); pr(X contains an event in dx) = m(dx) + o(dx); pr(y | x ⊂ X) = ∏_r m_r(x^(r)) / m(x) for the labelling y: x → C with x^(r) = y^{-1}(r)
25 Boson and related processes. Z(x) is a Gaussian process N(0, K) on X (complex-valued for simplicity!); λ(x) = |Z(x)|² is the intensity at x for a Cox process. Product density: m(x) = E(λ(x_1) ··· λ(x_n)) = per[K](x), with matrix [K](x) = {K(x_i, x_j)}. Permanent polynomial of a matrix A: per_α(A) = Σ_σ α^{#σ} A_{1,σ(1)} ··· A_{n,σ(n)}, where #σ is the number of cycles of σ. Convolution or superposition: Σ_{w⊆x} per_α[K](w) per_{α'}[K](x∖w) = per_{α+α'}[K](x). Macchi 1975; Shirai and Takahashi 2003; McCullagh and Møller 2006
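The α-permanent in the display can be computed directly from its definition for small matrices. A Python sketch (naive O(n! · n), for illustration only; fast approximations are the subject of the cyclic-approximation work cited later):

```python
from itertools import permutations

def cycle_count(sigma):
    """Number of cycles of a permutation given in one-line notation."""
    seen, cycles = set(), 0
    for start in range(len(sigma)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = sigma[j]
    return cycles

def alpha_permanent(A, alpha):
    """per_alpha(A) = sum over permutations sigma of
    alpha^{#cycles(sigma)} * prod_i A[i][sigma(i)]."""
    n = len(A)
    total = 0.0
    for sigma in permutations(range(n)):
        term = alpha ** cycle_count(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total
```

Sanity checks: α = 1 recovers the ordinary permanent, and per_{-1}(A) = (-1)^n det(A), since sgn(σ) = (-1)^{n - #σ}.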
26 Intensity functions and Cox processes [figure]
27 Intensity functions and Cox processes [figure]
28 Superposition processes [figure: superposed point patterns, process2[,1] vs process2[,2]]
29 Permanent partition models. Given that x ⊂ X occurs in the superposition, the conditional distribution of the labels y: x → C is p_n(y | x) = per_{α_1}[K](x^(1)) ··· per_{α_k}[K](x^(k)) / per_{α.}[K](x), a probability distribution on labelled partitions of x; it reduces to the Dirichlet-multinomial if K(x, x') ≡ 1. Classification distribution for a new event at x': p_{n+1}(y(x') = r | data) ∝ per_{α_r}[K](x^(r), x') / per_{α_r}[K](x^(r))
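The classification rule above is a ratio of two α-permanents per class: augment the class configuration with the new point and divide by the unaugmented permanent. A self-contained Python/numpy sketch for tiny configurations (names are mine; the naive permanent limits this to a handful of points per class):

```python
import numpy as np
from itertools import permutations

def per_alpha(K, alpha):
    """Naive alpha-permanent of the kernel matrix K."""
    n = K.shape[0]
    total = 0.0
    for s in permutations(range(n)):
        seen, cycles = set(), 0
        for i in range(n):              # count the cycles of s
            if i not in seen:
                cycles += 1
                j = i
                while j not in seen:
                    seen.add(j)
                    j = s[j]
        total += alpha ** cycles * np.prod([K[i, s[i]] for i in range(n)])
    return total

def classify_permanental(x_classes, x_new, alpha, kern):
    """p(y(x') = r | data) proportional to
    per_alpha[K](x^(r), x') / per_alpha[K](x^(r))."""
    ratios = []
    for xr in x_classes:
        base = np.array(xr, dtype=float)
        K_r = kern(base[:, None], base[None, :])
        aug = np.append(base, x_new)                 # x^(r) augmented with x'
        K_aug = kern(aug[:, None], aug[None, :])
        ratios.append(per_alpha(K_aug, alpha) / per_alpha(K_r, alpha))
    total = sum(ratios)
    return [r / total for r in ratios]
```

With the squared-exponential kernel of the following slides, a new point near one class's configuration inflates that class's permanent ratio and so receives most of the probability.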
30 Conditional relative intensity: per_α[K](x, x')/per_α[K](x), for K(x, x') = exp(-(x - x')²) on (0, 2π). [Figure: solid curve for α = 1, dashed curve for α = 0.25, dotted curve for α = 0]
31 Illustration 2: relative intensity functions per[K](x, x')/per[K](x). One-dimensional feature space; two labelled classes. [Figure]
32 Chequerboard pattern. A two-class example is generated by a 3×3 chequerboard pattern. The permanent model with α_1 = α_2 = α and K(x, x') = exp(-|x - x'|²/τ²) is used. [Figure: the data set, and cross-validated sum of log(pr) against x for α = 0.5, α = 1, ...]
33 Chequerboard pattern: classification [figure]
More informationSTATISTICS SYLLABUS UNIT I
STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution
More informationDEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE
Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More information2017 Financial Mathematics Orientation - Statistics
2017 Financial Mathematics Orientation - Statistics Written by Long Wang Edited by Joshua Agterberg August 21, 2018 Contents 1 Preliminaries 5 1.1 Samples and Population............................. 5
More informationMAS3301 Bayesian Statistics
MAS3301 Bayesian Statistics M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2008-9 1 11 Conjugate Priors IV: The Dirichlet distribution and multinomial observations 11.1
More informationRandom Partition Distribution Indexed by Pairwise Information
Random Partition Distribution Indexed by Pairwise Information David B. Dahl 1, Ryan Day 2, and Jerry W. Tsai 3 1 Department of Statistics, Brigham Young University, Provo, UT 84602, U.S.A. 2 Livermore
More informationStatistical distributions: Synopsis
Statistical distributions: Synopsis Basics of Distributions Special Distributions: Binomial, Exponential, Poisson, Gamma, Chi-Square, F, Extreme-value etc Uniform Distribution Empirical Distributions Quantile
More informationCourse Notes. Part IV. Probabilistic Combinatorics. Algorithms
Course Notes Part IV Probabilistic Combinatorics and Algorithms J. A. Verstraete Department of Mathematics University of California San Diego 9500 Gilman Drive La Jolla California 92037-0112 jacques@ucsd.edu
More informationChapter 5 continued. Chapter 5 sections
Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationSummary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016
8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying
More informationCMPE 58K Bayesian Statistics and Machine Learning Lecture 5
CMPE 58K Bayesian Statistics and Machine Learning Lecture 5 Multivariate distributions: Gaussian, Bernoulli, Probability tables Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationQuick Tour of Basic Probability Theory and Linear Algebra
Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions
More information2.6.3 Generalized likelihood ratio tests
26 HYPOTHESIS TESTING 113 263 Generalized likelihood ratio tests When a UMP test does not exist, we usually use a generalized likelihood ratio test to verify H 0 : θ Θ against H 1 : θ Θ\Θ It can be used
More informationGeneralized linear models I Linear models
Generalized linear models I Linear models Peter McCullagh Department of Statistics University of Chicago Polokwane, South Africa November 2013 Outline Exchangeability, covariates, relationships, Linear
More informationSpring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =
Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically
More informationExchangeable random hypergraphs
Exchangeable random hypergraphs By Danna Zhang and Peter McCullagh Department of Statistics, University of Chicago Abstract: A hypergraph is a generalization of a graph in which an edge may contain more
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationUniversity of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians
Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision
More informationLocating the Source of Diffusion in Large-Scale Networks
Locating the Source of Diffusion in Large-Scale Networks Supplemental Material Pedro C. Pinto, Patrick Thiran, Martin Vetterli Contents S1. Detailed Proof of Proposition 1..................................
More information