5 : Exponential Family and Generalized Linear Models
|
|
- Sherilyn Copeland
- 5 years ago
- Views:
Transcription
1 0-708: Probabilistic Graphical Models 0-708, Sprig : Expoetial Family ad Geeralized Liear Models Lecturer: Matthew Gormley Scribes: Yua Li, Yichog Xu, Silu Wag Expoetial Family Probability desity fuctios that are i expoetial family ca be expressed i the followig form. p(x η) h(x)exp{η T T (x) A(η)} A(η) log h(x)exp{η T T (x)}dx Oe example of expoetial family is multiomial distributio. For give data x (x, x 2,..., x k ), x i Multi(, π i ) ad Σ π i, we ca write the probability of the data as follows. p(x π) π x πx2 2...πx k k e log(πx πx πx k k ) e Σk i xilogπi Thus, i the correspodig expoetial form, η [log π, logπ 2,..., logπ k ], x [x, x 2,..., x k ], T(x) x, A(η) 0 ad h(x). Aother example is Dirichlet distributio. Let α, α 2,..., α k > 0. The probability fuctio of such distributio ca be represeted as a expoetial fuctio. p(π α) B(α) ΠK iπ αi i e ΣK i (αi )logπi logb(α) where A(η) logb(α), η [α, α 2,..., α k ], T (x) [logπ, logπ 2,..., π k ] ad h((x)). 2 Cumulat Geeratig Property Notice that oe appealig feature of the expoetial family is that we ca easily compute momets of the distributio by takig derivatives of the log ormalizer A(η).
2 2 5 : Expoetial Family ad Geeralized Liear Models 2. First cumulat a.k.a Mea The first derivative of A(η) is equal to the mea of sufficiet statistics T (X). da dη d dη logz(η) d Z(η) dη Z(η) { } d h(x)exp{η T T (x)}dx Z(η) dη T (x) h(x)exp{ηt T (x)} dx Z(η) T (x)p(x η)dx E[T (X)] 2.2 Secod cumulat a.k.a Variace The secod derivative of A(η) is equal to the variace or first cetral momet of sufficiet statistics T (X). d 2 A dη 2 T (x)exp{η T T (x) A(η)}(T (x) da dη )h(x)dx T 2 (x) h(x)exp{ηt T (x)} dx da T (x) h(x)exp{ηt T (x)} dx Z(η) dη Z(η) T 2 (x)p(x η)dx da T (x)p(x η)dx dη E[T 2 (X)] (E[T (X)]) 2 V ar[t (X)] 2.3 Momet estimatio Accordigly, the q th derivative gives the q th cetered momet. Whe the sufficiet statistic is a stacked vector, partial derivatives eed to be cosidered. 2.4 Momet vs caoical parameters Sice the momet parameter µ ca be derived from the atural (caoical) parameter η by: Also otice that A(η) is covex sice: da(η) η E[T (x)] def µ d 2 A(η) dη 2 V ar[t (x)] > 0
3 5 : Expoetial Family ad Geeralized Liear Models 3 Hece we ca ivert the relatioship ad ifer the caoical parameter from the momet parameter (-to-) by: η ψ(µ) which meas a distributio i the expoetial family ca be parameterized ot oly by η (the caoical parameterizatio), but also by µ (the momet parameterizatio). 3 Sufficiecy For p(x θ), T(x) is sufficiet for θ if there is o iformatio i X regardig θ beyod that i T(x). θ X T (X) However, it is defied i differet ways i the Bayesia ad frequetist frameworks. Bayesia view θ as a radom variable. To estimate θ, T (X) cotais all the essetial iformatio i X. p(θ T (x), x) p(θ T (x)) Frequetist view θ as a label rather tha a radom variable. distributio of X give T (X) is ot a fuctio of θ. T (X) is sufficiet for θ if the coditioal p(x T (x), θ) p(x T (x)) For udirected models, we have p(x, T (x), θ) ψ (T (x), θ)ψ 2 (x, T (x)) Sice T (x) is fuctio of x, we ca drop T (x) o the left side, ad the divide it by p(θ). p(x θ) g(t (x), θ)h(x, T (x)) Aother importat feature of the expoetial family is that oe ca obtai the sufficiet statistics T (X) simply by ispectio. Oce the distributio fuctio is expressed i the stadard form, we ca directly see T (X) is sufficiet for η. p(x η) h(x)exp{η T T (x) A(η)}
4 4 5 : Expoetial Family ad Geeralized Liear Models 4 MLE for Expoetial Family The reductio obtaied by usig a sufficiet statistic T (X) is particularly otable i the case of IID samplig. Suppose the dataset D is composed of N idepedet radom variables, characterized by the same expoetial family desity. For these i.i.d data, the log-likelihood is l(η; D) log N h(x )exp{η T T (x ) A(η)} log(h(x )) + (η T Take derivative ad set it to zero, we ca get l η A(η) η N ˆµ MLE N N T (x ) N A(η) η T (x ) T (x ) ˆη MLE ψ(ˆµ MLE ) T (x )) NA(η) Our formula ivolves the data oly via the sufficiet statistic N T (X ). This meas that to estimate MLE of η, we oly eed to maitai fixed dimesios of data. For Berouilli, Poisso ad multiomial distributios, it suffices to maitai a sigle value, the sum of the observatios. Idividual data poits ca be throw away. While for the uivariate Gaussia distributio, we eed to maitai the sum N x ad the sum of squares N x Examples. Gaussia distributio: We have [ η Σ µ; ] 2 vec(σ ) T (x) [ x; vec(xx T ) ] A(η) 2 µt Σ µ + log Σ 2 h(x) (2π) k/2. So µ MLE T (x ) N N x.
5 5 : Expoetial Family ad Geeralized Liear Models 5 2. Multiomial distributio: We have η [ l π ] k ; 0 π K T (x) [x] A(η) l h(x). ( K k π k ) So µ MLE N x. 3. Poisso distributio: We have η log λ T (x) x A(η) λ e η h(x) x!. So µ MLE N x. 5 Bayesia Estimatio 5. Cojugacy Prior Likelihood suppose η T (η), posterior p(η φ) exp{φ T T (η) A(η)} p(x η) exp{η T T (x) A(η)} p(η x, φ) p(x η)p(η φ) exp{η T T (x) + φ T T (η) A(η) A(φ)} exp{ T (η) T ( T (x) + φ ) (A(η) + A(φ) )} }{{}}{{}}{{} sufficiet fuc atural parameter A(η,φ)
6 6 5 : Expoetial Family ad Geeralized Liear Models 6 Geeralized Liear Model GLIM is a geeralized form of traditioal liear regressio. As i liear regressio, the observed iput x is assumed to eter the model via a liear combiatio of its elemets ξ θ T x. The output of the model, o the other had, is assumed to have a expoetial family distributio with coditioal mea µ f(ξ), where f is kow as the respose fuctio. Note that for liear regressio f is simply the idetity fuctio. Figure is a graphical represetatio of GLIM. Ad Table lists some correspodece betwee usual regressio types ad choice of f ad Y. Figure : Graphical model of GLIM.. Regressio Type f distributio of Y Liear Regressio idetity N (µ, σ 2 ) Logistic Regressio logistic Beroulli Probit regressio cumulative Gaussia Beroulli Multivariate Regressio logistic Multivariate Table : Examples of regressio types ad choice of f ad Y. 6. Why GLIMs? As a geeralizatio of liear regressio, logistic regressio, probit regressio, etc., GLIM provides a framework for creatig ew coditioal distributios that comes with some coveiet properties. Also GLIMs with the caoical respose fuctios are easy to trai with MLE. However, Bayesia estimatio of GLIMs does t have a closed form of posterior, so we have to tur to approximatio techiques. 6.2 Properties of GLIM Formally, we assume the output of GLIM has the followig form: [ ] p(y η, φ) h(y, φ) exp φ (ηt (x)y A(η)).
7 5 : Expoetial Family ad Geeralized Liear Models 7 This is slightly differet from the traditioal defiitio of EF, where we iclude a ew scale parameter φ; most distributios are aturally expressed i this form. Note that η ψ(µ) ad µ f(ξ) f(θ T x), so we have η ψ(f(θ T x)). So the coditioal distributio of y give x, θ ad φ is [ ] p(y x, θφ) h(y, φ) exp φ (yt ψ(f(θ T x)) A(ψ(f(θ T x)))). There re mostly 2 desig choices of GLIM: the choice of expoetial family ad the choice of f. The choice of the expoetial family is largely costraied by the ature of the data y. E.g., for cotiuous y we use multivariate Gaussia, where for discrete class labels we use Beroulli or multiomial. Respose fuctio is usually chose with some mild costraits, e.g., betwee [0, ] ad beig positive. There s a so-called caoical respose fuctio where we use f ψ ; i this case the coditioal probability is simplified to [ ] p(y x, θφ) h(y, φ) exp φ (θt x y A(θ T x)). Figure 2 lists caoical respose fuctio for several distributios. Figure 3 ad table 2 lists the relatioship betwee variables ad caoical fuctios. Figure 2: Caoical respose fuctio for several distributios.. Figure 3: Relatioship betwee variables ad fuctios.. Regressio Type Caoical Respose µ f(ξ) η f (µ) distributio of Y Liear Regressio Y µ ξ η µ N (µ, σ 2 ) Logistic Regressio Y µ +exp( ξ) η log µ µ Beroulli(µ) Probit regressio N µ φ(ξ) η φ (µ) Beroulli(µ) Table 2: Some regressio types ad their respose/lik fuctios.
8 8 5 : Expoetial Family ad Geeralized Liear Models 6.3 MLE estimatio for GLIMs with caoical respose Now we ca compute the MLE estimatio for caoical respose fuctios: the log likelihood fuctio is l log h(y ) + (θ T x y A(η )). Take derivative with respect to θ(ote that θ is the oly parameter for GLIMs with caoical respose): dl dθ ( x y da(η ) ) dη (y µ )x X T (y µ). dη dθ So we ca do stochastic gradiet ascet with update rule where ρ is the step size. θ (t+) θ (t) + ρ(y (θ (t) ) T x )x Aother method is to use Newto-Raphso methods to obtai a batch-learig algorithm: The update rule is θ (t+) θ (t) H θ J where J is the cost fuctio ad H is the Hessia matrix (secod derivative). We have θ J X T (y µ), ad H 2 l θ θ T θ T (y µ )x x µ η η θ T x µ θ T ( dµ where W diag, dµ 2,, dµ N dη dη 2 dη N X T W X, x µ η x T ). So the update rule is θ (t+) θ (t) H θ J (X T W (t) X) X T W (t) z (t) where the adjusted respose is z (t) Xθ (t) + (W (t) ) (y µ (t) ).
Chapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationClassification with linear models
Lecture 8 Classificatio with liear models Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square Geerative approach to classificatio Idea:. Represet ad lear the distributio, ). Use it to defie probabilistic
More informationMathematical Statistics - MS
Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios
More informationEE 4TM4: Digital Communications II Probability Theory
1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair
More informationLecture 12: September 27
36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More information1.010 Uncertainty in Engineering Fall 2008
MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval
More information6. Sufficient, Complete, and Ancillary Statistics
Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary
More informationLecture 11 and 12: Basic estimation theory
Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis
More informationThis exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.
Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationLecture 23: Minimal sufficiency
Lecture 23: Miimal sufficiecy Maximal reductio without loss of iformatio There are may sufficiet statistics for a give problem. I fact, X (the whole data set) is sufficiet. If T is a sufficiet statistic
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationRegression and generalization
Regressio ad geeralizatio CE-717: Machie Learig Sharif Uiversity of Techology M. Soleymai Fall 2016 Curve fittig: probabilistic perspective Describig ucertaity over value of target variable as a probability
More informationMultinomial likelihood. Multinomial MLE. NIST data and genetic fingerprints. θ = (p 1,..., p m ) with j p j = 1 and p j 0. Point probabilities
Multiomial distributio Let Y,..., Y be iid, uiformly sampled from a fiite populatio ad X i deotes a property of the idividual i. Label the properties,..., m. p j = PX i = j) = umber of idividuals with
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationMATHEMATICAL SCIENCES PAPER-II
MATHEMATICAL SCIENCES PAPER-II. Let {x } ad {y } be two sequeces of real umbers. Prove or disprove each of the statemets :. If {x y } coverges, ad if {y } is coverget, the {x } is coverget.. {x + y } coverges
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationThis section is optional.
4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore
More informationStatistics for Applications. Chapter 3: Maximum Likelihood Estimation 1/23
18.650 Statistics for Applicatios Chapter 3: Maximum Likelihood Estimatio 1/23 Total variatio distace (1) ( ) Let E,(IPθ ) θ Θ be a statistical model associated with a sample of i.i.d. r.v. X 1,...,X.
More informationExpectation and Variance of a random variable
Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio
More informationProbability and statistics: basic terms
Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample
More informationStatistical Pattern Recognition
Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig
More informationLecture 33: Bootstrap
Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece
More informationProbability and MLE.
10-701 Probability ad MLE http://www.cs.cmu.edu/~pradeepr/701 (brief) itro to probability Basic otatios Radom variable - referrig to a elemet / evet whose status is ukow: A = it will rai tomorrow Domai
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationLecture 3: MLE and Regression
STAT/Q SCI 403: Itroductio to Resamplig Methods Sprig 207 Istructor: Ye-Chi Che Lecture 3: MLE ad Regressio 3. Parameters ad Distributios Some distributios are idexed by their uderlyig parameters. Thus,
More information4. Partial Sums and the Central Limit Theorem
1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems
More informationLogit regression Logit regression
Logit regressio Logit regressio models the probability of Y= as the cumulative stadard logistic distributio fuctio, evaluated at z = β 0 + β X: Pr(Y = X) = F(β 0 + β X) F is the cumulative logistic distributio
More informationLecture 9: September 19
36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationMatrix Representation of Data in Experiment
Matrix Represetatio of Data i Experimet Cosider a very simple model for resposes y ij : y ij i ij, i 1,; j 1,,..., (ote that for simplicity we are assumig the two () groups are of equal sample size ) Y
More informationStatistical Inference Based on Extremum Estimators
T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationExponential Families and Friends: Learning the Parameters of the a Fully Observed BN. Kayhan Batmanghelich
Epoetia Famiies ad Frieds: Learig te Parameters of te a Fuy Observed BN Kaya Batmageic Macie Learig Te data ispires te structures we wat to predict Iferece fids {best structure, margias, partitio fuctio}
More informationQuestions and Answers on Maximum Likelihood
Questios ad Aswers o Maximum Likelihood L. Magee Fall, 2008 1. Give: a observatio-specific log likelihood fuctio l i (θ) = l f(y i x i, θ) the log likelihood fuctio l(θ y, X) = l i(θ) a data set (x i,
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationLECTURE 8: ASYMPTOTICS I
LECTURE 8: ASYMPTOTICS I We are iterested i the properties of estimators as. Cosider a sequece of radom variables {, X 1}. N. M. Kiefer, Corell Uiversity, Ecoomics 60 1 Defiitio: (Weak covergece) A sequece
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationApproximations and more PMFs and PDFs
Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.
More informationIE 230 Probability & Statistics in Engineering I. Closed book and notes. No calculators. 120 minutes.
Closed book ad otes. No calculators. 120 miutes. Cover page, five pages of exam, ad tables for discrete ad cotiuous distributios. Score X i =1 X i / S X 2 i =1 (X i X ) 2 / ( 1) = [i =1 X i 2 X 2 ] / (
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques
More informationStatistical Theory MT 2009 Problems 1: Solution sketches
Statistical Theory MT 009 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. (a) Let 0 < θ < ad put f(x, θ) = ( θ)θ x ; x = 0,,,... (b) (c) where
More informationIE 230 Seat # Name < KEY > Please read these directions. Closed book and notes. 60 minutes.
IE 230 Seat # Name < KEY > Please read these directios. Closed book ad otes. 60 miutes. Covers through the ormal distributio, Sectio 4.7 of Motgomery ad Ruger, fourth editio. Cover page ad four pages of
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationThe Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model
Back to Maximum Likelihood Give a geerative model f (x, y = k) =π k f k (x) Usig a geerative modellig approach, we assume a parametric form for f k (x) =f (x; k ) ad compute the MLE θ of θ =(π k, k ) k=
More informationNYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)
NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we
More informationStatistical Theory MT 2008 Problems 1: Solution sketches
Statistical Theory MT 008 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. a) Let 0 < θ < ad put fx, θ) = θ)θ x ; x = 0,,,... b) c) where α
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationMaximum Likelihood Estimation
ECE 645: Estimatio Theory Sprig 2015 Istructor: Prof. Staley H. Cha Maximum Likelihood Estimatio (LaTeX prepared by Shaobo Fag) April 14, 2015 This lecture ote is based o ECE 645(Sprig 2015) by Prof. Staley
More informationAxis Aligned Ellipsoid
Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple
More information1. Parameter estimation point estimation and interval estimation. 2. Hypothesis testing methods to help decision making.
Chapter 7 Parameter Estimatio 7.1 Itroductio Statistical Iferece Statistical iferece helps us i estimatig the characteristics of the etire populatio based upo the data collected from (or the evidece 0produced
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More informationDirection: This test is worth 250 points. You are required to complete this test within 50 minutes.
Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationAMS 216 Stochastic Differential Equations Lecture 02 Copyright by Hongyun Wang, UCSC ( ( )) 2 = E X 2 ( ( )) 2
AMS 216 Stochastic Differetial Equatios Lecture 02 Copyright by Hogyu Wag, UCSC Review of probability theory (Cotiued) Variace: var X We obtai: = E X E( X ) 2 = E( X 2 ) 2E ( X )E X var( X ) = E X 2 Stadard
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationAMS570 Lecture Notes #2
AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationTAMS24: Notations and Formulas
TAMS4: Notatios ad Formulas Basic otatios ad defiitios X: radom variable stokastiska variabel Mea Vätevärde: µ = X = by Xiagfeg Yag kpx k, if X is discrete, xf Xxdx, if X is cotiuous Variace Varias: =
More informationSTAT Homework 1 - Solutions
STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better
More informationLecture 20: Multivariate convergence and the Central Limit Theorem
Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationDimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector
Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et
More informationSummary. Recap ... Last Lecture. Summary. Theorem
Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca
More informationIntroduction to Probability I: Expectations, Bayes Theorem, Gaussians, and the Poisson Distribution. 1
Itroductio to Probability I: Expectatios, Bayes Theorem, Gaussias, ad the Poisso Distributio. 1 Pakaj Mehta February 25, 2019 1 Read: This will itroduce some elemetary ideas i probability theory that we
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationBayesian Methods: Introduction to Multi-parameter Models
Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationStat410 Probability and Statistics II (F16)
Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems
More informationECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors
ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationLecture Notes 15 Hypothesis Testing (Chapter 10)
1 Itroductio Lecture Notes 15 Hypothesis Testig Chapter 10) Let X 1,..., X p θ x). Suppose we we wat to kow if θ = θ 0 or ot, where θ 0 is a specific value of θ. For example, if we are flippig a coi, we
More information2.2. Central limit theorem.
36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard
More informationPRACTICE PROBLEMS FOR THE FINAL
PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More informationLast time: Moments of the Poisson distribution from its generating function. Example: Using telescope to measure intensity of an object
6.3 Stochastic Estimatio ad Cotrol, Fall 004 Lecture 7 Last time: Momets of the Poisso distributio from its geeratig fuctio. Gs () e dg µ e ds dg µ ( s) µ ( s) µ ( s) µ e ds dg X µ ds X s dg dg + ds ds
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationLecture 18: Sampling distributions
Lecture 18: Samplig distributios I may applicatios, the populatio is oe or several ormal distributios (or approximately). We ow study properties of some importat statistics based o a radom sample from
More informationEE / EEE SAMPLE STUDY MATERIAL. GATE, IES & PSUs Signal System. Electrical Engineering. Postal Correspondence Course
Sigal-EE Postal Correspodece Course 1 SAMPLE STUDY MATERIAL Electrical Egieerig EE / EEE Postal Correspodece Course GATE, IES & PSUs Sigal System Sigal-EE Postal Correspodece Course CONTENTS 1. SIGNAL
More informationPRACTICE PROBLEMS FOR THE FINAL
PRACTICE PROBLEMS FOR THE FINAL Math 36Q Sprig 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2
More informationAn Introduction to Asymptotic Theory
A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu
More informationBinomial Distribution
0.0 0.5 1.0 1.5 2.0 2.5 3.0 0 1 2 3 4 5 6 7 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Overview Example: coi tossed three times Defiitio Formula Recall that a r.v. is discrete if there are either a fiite umber of possible
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More information