Algebraic Geometry and Model Selection
1 Algebraic Geometry and Model Selection
American Institute of Mathematics, 2011/Dec/12-16
I would like to thank Prof. Russell Steele, Prof. Bernd Sturmfels, and all participants. Thank you very much.
Sumio Watanabe, Tokyo Institute of Technology
2 Contents
1. AIC, BIC, and DIC
2. Birational Invariants
3. Singular Fluctuation
4. Open Questions
3 1. AIC, BIC, and DIC
4 Statistical Model and True Distribution
$x \in \mathbb{R}^N$, $w \in W$ (compact) $\subset \mathbb{R}^d$.
(1) True distribution $q(x)$, from which $X_1, X_2, \ldots, X_n$ are i.i.d.
(2) Statistical model $p(x|w)$.
(3) Prior distribution $\varphi(w)$.
(Remark) $E[\,\cdot\,]$ denotes the expectation over $X_1, X_2, \ldots, X_n$; $E_x[\,\cdot\,]$ denotes the expectation over $X$, whose probability distribution is $q(x)$.
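To make the later formulas concrete, here is a minimal toy setup in Python. It is a hypothetical running example of my own, not from the slides: true distribution $q(x) = N(0,1)$, model $p(x|w) = N(w,1)$ with scalar $w$, prior $\varphi(w) = N(0,1)$. The conjugate pair gives exact posterior draws, so the Bayesian quantities below can be checked numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical running example (not from the slides):
#   true distribution q(x) = N(0, 1)
#   statistical model p(x | w) = N(w, 1), scalar parameter w
#   prior             phi(w) = N(0, 1)
n = 200
X = rng.normal(loc=0.0, scale=1.0, size=n)   # X_1, ..., X_n i.i.d. from q

def log_p(x, w):
    """Log model density log p(x | w) for the N(w, 1) model."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - w) ** 2

# Normal-normal conjugacy gives the posterior in closed form:
#   w | X_1..X_n ~ N( sum(X) / (n + 1), 1 / (n + 1) )
W = rng.normal(X.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=5000)
```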
5 Statistical Estimation
Posterior distribution:
$$E_w[\,\cdot\,] = \frac{\int (\,\cdot\,)\,\prod_{i=1}^n p(X_i|w)\,\varphi(w)\,dw}{\int \prod_{i=1}^n p(X_i|w)\,\varphi(w)\,dw}$$
Predictive distribution: $p^*(x) = E_w[\,p(x|w)\,]$.
(Remark) In Bayesian estimation, the true distribution $q(x)$ is estimated by $p^*(x)$.
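A minimal sketch of the predictive distribution as a Monte Carlo average over posterior draws, continuing the toy normal setup above (the conjugate posterior is regenerated inline so the snippet runs on its own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)                                  # toy data, q = N(0, 1)
W = rng.normal(X.sum() / (n + 1),
               np.sqrt(1.0 / (n + 1)), size=5000)       # conjugate posterior draws

def p_star(x):
    """Predictive density p*(x) = E_w[p(x | w)], approximated by a
    Monte Carlo average of N(w, 1) densities over the posterior draws W."""
    dens = np.exp(-0.5 * (np.asarray(x) - W[:, None]) ** 2) / np.sqrt(2 * np.pi)
    return dens.mean(axis=0)

print(p_star(np.linspace(-3.0, 3.0, 7)))                # p*(x) on a small grid
```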
6 Stochastic Complexity and Generalization Loss
(1) Stochastic complexity = $-\log$(Bayes marginal likelihood):
$$F = -\log \int \prod_{i=1}^n p(X_i|w)\,\varphi(w)\,dw$$
(2) Generalization loss:
$$G = -E_x[\log p^*(x)] = S + \mathrm{KL}(\,q(x)\,\|\,p^*(x)\,)$$
(Remark) $S = -E_x[\log q(x)]$ is the entropy of $q(x)$; the identity follows by writing $-\log p^* = -\log q + \log(q/p^*)$ and taking $E_x$.
7 BIC (Schwarz, 1978)
(1) $\mathrm{BIC} = -\sum_{i=1}^n \log p(X_i|w_{\mathrm{MLE}}) + (d/2)\log n$.
If the posterior is approximately a normal distribution, $F = \mathrm{BIC} + O_p(1)$.
In general, as we learned yesterday,
$$F = -\sum_{i=1}^n \log p(X_i|w_{\mathrm{MLE}}) + \lambda \log n - (m-1)\log\log n + O_p(1),$$
where $\lambda$ is the real log canonical threshold (RLCT).
In our workshop: Dr. Lin, relation to asymptotic integrals; Dr. Drton, how to use this in statistics; Dr. Leykin, D-modules and the b-function.
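A sketch of BIC on the slide's scale (minus log-likelihood plus $(d/2)\log n$, i.e., half the common $-2\times$ convention), for the toy normal-mean model, where the MLE is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)                         # toy data, q = N(0, 1)

# For the N(w, 1) model, the MLE of w is the sample mean and d = 1.
w_mle = X.mean()
loglik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (X - w_mle) ** 2)
d = 1
BIC = -loglik + (d / 2) * np.log(n)            # slide's scale: -loglik + (d/2) log n
print(BIC)
```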
8 AIC (Akaike, 1974) & DIC (Spiegelhalter et al., 2002)
(2) $\mathrm{AIC} = -\sum_{i=1}^n \log p(X_i|w_{\mathrm{MLE}}) + d$.
(3) $\mathrm{DIC} = -\sum_{i=1}^n \log E_w[p(X_i|w)] + 2\sum_{i=1}^n \big\{ -E_w[\log p(X_i|w)] + \log p(X_i\,|\,E_w[w]) \big\}$.
If the posterior is approximately a normal distribution, $E[\mathrm{AIC}] = nE[G] + o(1)$ and $E[\mathrm{DIC}] = nE[G] + o(1)$. Otherwise, such relations do not hold.
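A sketch of AIC and of DIC as transcribed on the slide (first term: Bayes training loss; braces: the effective-parameter correction), again for the toy conjugate setup. The matrix `L` of pointwise log densities over posterior draws is the same building block reused in the WAIC snippets later:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
W = rng.normal(X.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=5000)

def log_p(x, w):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - w) ** 2

# AIC on the slide's scale: minus maximized log-likelihood plus d.
AIC = -log_p(X, X.mean()).sum() + 1

# DIC as transcribed: Bayes training loss plus twice the term in braces.
L = log_p(X[None, :], W[:, None])                    # (draws, data) matrix
bayes_train = -np.log(np.exp(L).mean(axis=0)).sum()  # -sum_i log E_w[p(X_i|w)]
braces = -L.mean(axis=0) + log_p(X, W.mean())        # -E_w[log p] + log p at E_w[w]
DIC = bayes_train + 2 * braces.sum()
print(AIC, DIC)
```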
9 Estimation of G and F in regular cases

| Criterion | Main term estimated | Consistency in model selection | Unbiased estimator of G |
|-----------|---------------------|--------------------------------|-------------------------|
| AIC, DIC  | Random (order 1)    | No                             | Yes                     |
| BIC       | Constant (log n)    | Yes                            | No                      |
10 2. Birational Statistics
11 Birational Invariant
If a value that is defined using a resolution of singularities does not depend on the choice of resolution, then it is called a birational invariant.
(Diagram: resolutions $g_1 : U_1 \to W$, $g_2 : U_2 \to W$, $g_3 : U_3 \to W$ of the same space $W$.)
12 Nature in Statistics
It is natural that statistical theory should be made invariant under birational transforms. Fisher's asymptotic theory does not satisfy such invariance; for example, it is not invariant under blow-up. Consider $Y = aX + b + \text{noise}$ with the blow-up charts $(a, b) = (c, cd)$ and $(a, b) = (cd, d)$. Asymptotic normality of $(a, b)$ holds, whereas that of $(c, d)$ in projective space does not.
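To see concretely why regular asymptotics is not blow-up invariant, consider the chart $(a,b) = (c,cd)$ (a worked step under the chart reconstruction above): Fisher information transforms by the Jacobian, whose determinant vanishes on the exceptional set.

```latex
\[
J = \frac{\partial(a,b)}{\partial(c,d)} = \begin{pmatrix} 1 & 0 \\ d & c \end{pmatrix},
\qquad \det J = c,
\qquad I(c,d) = J^{\top} I(a,b)\, J,
\qquad \det I(c,d) = c^{2} \det I(a,b).
\]
```

So the Fisher information in $(c,d)$ degenerates at $c = 0$, and asymptotic normality fails there even though the model is perfectly regular in $(a,b)$.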
13 Differential and Birational
Algebraic geometry studies mathematical properties that are invariant under birational transforms, just as differential geometry studies properties invariant under diffeomorphisms. To construct a birational statistics might be one of the purposes of algebraic statistics.
14 3. Singular Fluctuation and Model Selection
15 Loss for Estimation
Predictive distribution: $p^*(x) = E_w[\,p(x|w)\,]$.
Generalization loss: $G_n = -E_x[\,\log E_w[p(x|w)]\,]$.
Training loss: $T_n = -(1/n)\sum_{i=1}^n \log E_w[p(X_i|w)]$.
16 Functional Cumulant
Def. Two cumulant generating functions:
$$g(a) = -E_x[\,\log E_w[p(x|w)^a]\,], \qquad t(a) = -(1/n)\sum_{i=1}^n \log E_w[p(X_i|w)^a].$$
Then $g(0) = t(0) = 0$, $G_n = g(1)$, and $T_n = t(1)$.
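A sketch of the empirical cumulant function $t(a)$ estimated from posterior draws: $t(0) = 0$ and $t(1) = T_n$ hold by construction, and central differences recover $t'(0)$ and $t''(0)$. (In fact $t''(0) = -V_2/n$ holds exactly, since shifting $\log p(X_i|w)$ by the constant $\log q(X_i)$ does not change a posterior variance.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
W = rng.normal(X.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=5000)
L = -0.5 * np.log(2 * np.pi) - 0.5 * (X[None, :] - W[:, None]) ** 2

def t(a):
    """Empirical cumulant function t(a) = -(1/n) sum_i log E_w[p(X_i|w)^a]."""
    return -np.mean(np.log(np.mean(np.exp(a * L), axis=0)))

h = 1e-3
print(t(0.0))                                   # 0 by construction
print(t(1.0))                                   # training loss T_n
print((t(h) - t(-h)) / (2 * h))                 # ~ t'(0)
print((t(h) - 2 * t(0.0) + t(-h)) / h ** 2)     # ~ t''(0) = -V_2 / n
```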
17 Invariance
The two functions $g(a)$ and $t(a)$ are invariant under a change of variables $w = g(u)$, since $p(x|w) = p(x|g(u))$ and $\varphi(w)\,dw = \varphi(g(u))\,|g'(u)|\,du$. The cumulant generating function is therefore a generating function of birational invariants.
Example: $\lambda = \lim_{n\to\infty} n\,\{\,E[g(1)] - S\,\}$.
18 Notation
Def. Log density ratio function: $f(x,w) = \log(\,q(x)/p(x|w)\,)$. Then $\log p(x|w) = \log q(x) - f(x,w)$.
19 Expansion of g(a)
$g(a)$ is rewritten as
$$g(a) = aS - E_x[\,\log E_w[\exp(-a f(x,w))]\,].$$
Therefore
$$g'(0) = S + E_x[\,E_w[f(x,w)]\,], \qquad g''(0) = -E_x\big[\,E_w[f(x,w)^2] - E_w[f(x,w)]^2\,\big].$$
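The rewriting follows in one step from $\log p(x|w) = \log q(x) - f(x,w)$:

```latex
\[
g(a) = -E_x\big[\log E_w\big[e^{a \log p(x|w)}\big]\big]
     = -E_x\big[\log\big(e^{a \log q(x)}\, E_w\big[e^{-a f(x,w)}\big]\big)\big]
     = aS - E_x\big[\log E_w\big[e^{-a f(x,w)}\big]\big],
\]
```

and differentiating at $a = 0$ yields the stated formulas for $g'(0)$ and $g''(0)$ as the posterior mean and posterior variance of $f$.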
20 Expansion of t(a)
In the same way,
$$t(a) = aS_n - (1/n)\sum_{i=1}^n \log E_w[\exp(-a f(X_i,w))],$$
where $S_n = -(1/n)\sum_{i=1}^n \log q(X_i)$ is the empirical entropy. Therefore
$$t'(0) = S_n + (1/n)\sum_{i=1}^n E_w[f(X_i,w)], \qquad t''(0) = -(1/n)\sum_{i=1}^n \big\{\,E_w[f(X_i,w)^2] - E_w[f(X_i,w)]^2\,\big\}.$$
21 Functional Variance
Def. Two random variables:
$$V_1 = 2\big\{\, n\, E_x[\,E_w[f(x,w)]\,] - \lambda \,\big\}, \qquad V_2 = \sum_{i=1}^n \big\{\, E_w[(\log p(X_i|w))^2] - E_w[\log p(X_i|w)]^2 \,\big\}.$$
Remark. $V_2$ can be calculated from the samples and the model alone, without any information about the true distribution. To calculate $V_1$, we need information about the true distribution.
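$V_2$ is the quantity that matters in practice. A minimal sketch computing it as the summed posterior variance of the pointwise log density, using only samples and the model (toy setup regenerated inline):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
W = rng.normal(X.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=5000)
L = -0.5 * np.log(2 * np.pi) - 0.5 * (X[None, :] - W[:, None]) ** 2

# V_2: posterior variance of log p(X_i | w), summed over the data points.
# Needs only posterior draws and the model, never the true q(x).
V2 = np.var(L, axis=0).sum()
print(V2)          # in this regular toy model, E[V_2] -> 2*nu = d = 1
```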
22 Singular Fluctuation
Theorem. The convergences $V_1 \to V_1^*$, $V_2 \to V_2^*$ hold, with $E[V_1] \to E[V_1^*]$ and $E[V_2] \to E[V_2^*]$.
Theorem and Def. (Singular fluctuation) $E[V_1^*] = E[V_2^*] = 2\nu$.
Remark. To prove the above theorems, we need the resolution theorem and empirical process theory. In regular cases, $\lambda = \nu = d/2$.
23 Outline of Proofs
The posterior distribution measure is $\exp(-nK_n(w))\,\varphi(w)\,dw$, where $K_n(w) = (1/n)\sum_{i=1}^n f(X_i,w)$. After resolution of singularities $w = g(u)$,
$$\exp(-nK_n(w))\,\varphi(w)\,dw = \exp\!\big(-nu^{2k} + n^{1/2} u^k \xi_n(u)\big)\,\varphi(g(u))\,|g'(u)|\,du$$
$$= \int_0^\infty dt\,\delta(t - nu^{2k})\, u^h b(u)\, \exp\!\big(-t + t^{1/2}\xi_n(u)\big)\, du$$
$$\approx \frac{(\log n)^{m-1}}{n^\lambda} \int dt\, t^{\lambda-1} \exp\!\big(-t + t^{1/2}\xi_n(u)\big)\, D(u)\, du.$$
24 Outline of Proofs 2
Def. Expectation over the limit posterior distribution:
$$\langle\,\cdot\,\rangle = \frac{\int dt \int du\, D(u)\,(\,\cdot\,)\, t^{\lambda-1} \exp(-t + t^{1/2}\xi(u))}{\int dt \int du\, D(u)\, t^{\lambda-1} \exp(-t + t^{1/2}\xi(u))}.$$
Lemma. For $s \geq 0$,
$$n^{s/2}\, E_w[f(x,w)^s] \to \langle\, t^{s/2} a(x,u)^s \,\rangle.$$
25 Cumulants --- Random Variables
$$g'(0) = S + (\lambda + V_1/2)/n + o_p(1/n), \qquad g''(0) = -V_2/n + o_p(1/n),$$
$$t'(0) = S_n + (\lambda - V_1/2)/n + o_p(1/n), \qquad t''(0) = -V_2/n + o_p(1/n).$$
26 Cumulants --- Birational Invariants
$$E[g'(0)] = S + (\lambda + \nu)/n + o(1/n), \qquad E[g''(0)] = -2\nu/n + o(1/n),$$
$$E[t'(0)] = S + (\lambda - \nu)/n + o(1/n), \qquad E[t''(0)] = -2\nu/n + o(1/n).$$
27 Theorem
Generalization loss: $G_n = S + [\,\lambda + V_1/2 - V_2/2\,]/n + o_p(1/n)$.
Training loss: $T_n = S_n + [\,\lambda - V_1/2 - V_2/2\,]/n + o_p(1/n)$.
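Heuristically, this theorem is the cumulant expansion evaluated at $a = 1$, with the higher cumulants absorbed into the $o_p(1/n)$ term:

```latex
\[
G_n = g(1) = g'(0) + \tfrac{1}{2} g''(0) + o_p(1/n)
    = S + \frac{\lambda + V_1/2}{n} - \frac{V_2}{2n} + o_p(1/n),
\]
\[
T_n = t(1) = t'(0) + \tfrac{1}{2} t''(0) + o_p(1/n)
    = S_n + \frac{\lambda - V_1/2}{n} - \frac{V_2}{2n} + o_p(1/n).
\]
```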
28 Theorem
As $n$ tends to infinity,
$$E[G_n] = S + \lambda/n + o(1/n), \qquad E[T_n] = S + (\lambda - 2\nu)/n + o(1/n), \qquad E[V_2] = 2\nu + o(1).$$
29 Def. WAIC
WAIC is defined by $W_n = T_n + V_2/n$.
Theorem. For an arbitrary triple $(q(x), p(x|w), \varphi(w))$,
$$E[G_n] = E[W_n] + o(1/n^2), \qquad (G_n - S) + (W_n - S_n) = 2\lambda/n + o_p(1/n).$$
Remark. WAIC is asymptotically equivalent to Bayes cross validation (Watanabe, JMLR, Vol. 11, 2010).
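Putting the pieces together, WAIC needs only the matrix of pointwise log densities over posterior draws. A minimal sketch for the toy setup; this mirrors the definition $W_n = T_n + V_2/n$ directly, not any particular library's WAIC routine:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
W = rng.normal(X.sum() / (n + 1), np.sqrt(1.0 / (n + 1)), size=5000)
L = -0.5 * np.log(2 * np.pi) - 0.5 * (X[None, :] - W[:, None]) ** 2

T_n = -np.mean(np.log(np.mean(np.exp(L), axis=0)))   # Bayes training loss
V_2 = np.var(L, axis=0).sum()                        # functional variance
W_n = T_n + V_2 / n                                  # WAIC
print(T_n, V_2 / n, W_n)
```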
30 (Figure: numerical check with a reduced rank regression model, 100 independent trials, $n = 200$; red = $W_n - S_n$, blue = $G_n - S$; theory $= \lambda/n$ with $\lambda = 12$; posterior computed by the Metropolis algorithm.) Thank you.
31 (Figure: scatter plot of $G_n - S$ versus $W_n - S_n$ over the trials, illustrating $(G_n - S) + (W_n - S_n) = 2\lambda/n$.)
32 WAIC
(1) $E[W_n] = E[G_n] + o(1/n^2)$ holds even if $q(x)$ is not realizable by $p(x|w)$.
(2) The essential main term fluctuates, which causes inconsistency in model selection.
(3) If the posterior can be approximated by a normal distribution, WAIC is equivalent to AIC and DIC as a random variable.
(4) Otherwise, WAIC is an unbiased estimator of the generalization error, whereas AIC and DIC are not.
33 4. Open Problems
34 Two Birational Invariants
If the regularity condition is satisfied, $\lambda = \nu = d/2$; in general, they are different.
$\lambda$ = a dimension that shows how fast the posterior shrinks.
$\nu$ = a dimension that shows how strongly the posterior fluctuates.
Q. Mathematically, what is $\nu$?
35 Generalized RLCT
$$g(a) = -E_x[\,\log E_w[p(x|w)^a]\,], \qquad t(a) = -(1/n)\sum_{i=1}^n \log E_w[p(X_i|w)^a].$$
Q. $E[(d/da)^k g(0)]$ and $E[(d/da)^k t(0)]$ ($k = 1, 2, \ldots$) are all birational invariants. They are statistical generalizations of the real log canonical threshold. What are they?
36 Variance of Information Criteria
$$E[G_n] = E[W_n] + o(1/n^2), \qquad (G_n - S) + (W_n - S_n) = 2\lambda/n + o_p(1/n).$$
Q. These equations suggest that WAIC has the smallest variance among the information criteria whose expectations are equal to $E[G_n]$. Can we prove this?
37 Bayes Hypothesis Testing
For a given prior $\varphi(w)$, we define
$$F(\varphi) = -\log \int \prod_{i=1}^n p(X_i|w)\,\varphi(w)\,dw.$$
If the null hypothesis is $\varphi_0(w)$ and the alternative is $\varphi_1(w)$, then the most powerful test is given by $\Delta F = F(\varphi_1) - F(\varphi_0)$.
Q. In order to construct the most powerful hypothesis test, the probability distribution of $\Delta F$ is necessary.
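A sketch of the test statistic $\Delta F$ for a one-dimensional parameter, computing each marginal likelihood by quadrature with a max-log-likelihood shift for numerical stability. The two priors `phi0` and `phi1` are hypothetical choices of mine, not from the slides:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)                            # toy data, q = N(0, 1)

def F(prior_pdf):
    """F(phi) = -log integral of prod_i p(X_i|w) phi(w) dw, by 1-d quadrature,
    shifting the log-likelihood by its maximum for numerical stability."""
    loglik = lambda w: norm.logpdf(X, loc=w).sum()
    c = loglik(X.mean())                          # maximum at the MLE
    val, _ = quad(lambda w: np.exp(loglik(w) - c) * prior_pdf(w), -10, 10)
    return -(c + np.log(val))

phi0 = lambda w: norm.pdf(w, 0.0, 0.1)            # hypothetical null prior
phi1 = lambda w: norm.pdf(w, 0.0, 2.0)            # hypothetical alternative prior
print(F(phi1) - F(phi0))                          # test statistic Delta F
```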