Outline. Prior Information and Subjective Probability. Subjective Probability. The Histogram Approach. Subjective Determination of the Prior Density


Outline
Prior Information and Subjective Probability
  Subjective Probability
  Subjective Determination of the Prior Density
  Noninformative Priors
  Maximum Entropy Priors
  Using the Marginal Distribution to Determine the Prior
  Hierarchical Priors
  Criticisms

Subjective Probability

Prior information. The classical concept of probability is the frequency viewpoint. Subjective probability deals with randomness to which the frequency viewpoint does not apply. Ex: coin tossing vs. the unemployment rate for next year.

Subjective Determination of the Prior Density
  The Histogram Approach
  The Relative Likelihood Approach
  Matching a Given Functional Form
  CDF Determination

The Histogram Approach

When Θ is an interval of the real line, the most obvious approach is the histogram: divide Θ into intervals, determine the subjective probability of each interval, and plot a probability histogram. Difficulties: how many intervals, and of what size?

The Relative Likelihood Approach

When Θ is a subset of the real line, compare the intuitive likelihoods of various points in Θ and sketch a prior density. Ex: Θ = [0, 1]. Determine the most likely parameter point, say θ = 3/4, judged three times as likely as θ = 0, the least likely point. Then determine several other points by comparison with θ = 0 and sketch the result.
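The histogram approach can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: the interval endpoints and elicited probabilities below are hypothetical, and the helper name `histogram_prior` is ours.

```python
# A sketch of the histogram approach: elicit subjective probabilities
# for intervals of Theta = [0, 1] (hypothetical values here), then turn
# them into a piecewise-constant prior density by dividing each
# normalized mass by its interval width.

def histogram_prior(breaks, masses):
    """Return a piecewise-constant density from interval masses."""
    total = sum(masses)
    probs = [m / total for m in masses]            # normalize to sum to 1
    widths = [b - a for a, b in zip(breaks, breaks[1:])]
    heights = [p / w for p, w in zip(probs, widths)]

    def density(theta):
        for (a, b), h in zip(zip(breaks, breaks[1:]), heights):
            if a <= theta < b or (theta == b == breaks[-1]):
                return h
        return 0.0                                 # outside Theta
    return density

# Hypothetical elicited probabilities for five equal intervals of [0, 1]
breaks = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
masses = [0.05, 0.15, 0.30, 0.35, 0.15]
pi = histogram_prior(breaks, masses)
```

The two difficulties named above show up directly: the answer changes with the choice of `breaks`, and nothing forces smoothness at the interval boundaries.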

Matching a Given Functional Form

Assume that π(θ) is of a given functional form, and choose the density of that form which most closely matches prior beliefs. After determining the functional form, choose its parameters either from estimated prior moments, or by subjectively estimating several fractiles of the prior distribution and matching those fractiles. Drawback: only useful when certain specific functional forms of the prior can be assumed.

Example: Θ = (-∞, ∞), and the prior is thought to be from the normal family. The median is determined to be 0 and the quartiles to be -1 and 1. Since the mean equals the median, μ = 0. Since P(Z < -1/(2.19)^(1/2)) = 1/4 when Z is N(0, 1), the prior density is N(0, 2.19).

CDF Determination

This approach subjectively determines several α-fractiles z(α), plots the points (z(α), α), and sketches a smooth curve joining them.

Discussion

Determining a multivariate prior density can be considerably harder. The easiest route is to use a given functional form, so that only a few parameters need be determined subjectively. Also easier is the case in which the coordinates θᵢ of θ are thought to be independent; the prior is then the product of the univariate prior densities of the θᵢ. Ex: π(θ) = π₁(θ₁)π₂(θ₂). If not, the best way is to determine conditional and marginal prior densities. Ex: π(θ₁, θ₂) = π(θ₂ | θ₁)π₁(θ₁).

Noninformative Priors

Because of the compelling reasons to perform a conditional analysis, and the attractiveness of using Bayesian machinery to do so, there have been attempts to use the Bayesian approach even when no prior information is available. Ex: suppose the parameter of interest is a normal mean θ, so Θ = (-∞, ∞). The noninformative prior is chosen to be π(θ) = 1 (not π(θ) = c > 0), called the uniform density on R¹, introduced by Laplace (1812).

Sometimes noninformative priors cannot maintain consistency: the lack of invariance of the constant prior has led to a search for noninformative priors which are appropriately invariant under transformations.
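The variance 2.19 in the normal-family example can be checked numerically. The quartile condition says σ·z₀.₇₅ = 1, where z₀.₇₅ is the 0.75 quantile of N(0, 1); the standard library's `statistics.NormalDist` gives that quantile.

```python
# Verifying the matching example: a normal prior with median 0 and
# quartiles -1 and 1 must have sigma * z_{0.75} = 1.

from statistics import NormalDist

z75 = NormalDist().inv_cdf(0.75)   # ~0.6745, 0.75 quantile of N(0,1)
sigma = 1.0 / z75                  # sd that puts the quartile at 1
variance = sigma ** 2              # ~2.198, quoted as 2.19 in the text

# Sanity check: a quarter of the mass lies below -1
assert abs(NormalDist(0, sigma).cdf(-1.0) - 0.25) < 1e-9
```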

Noninformative Priors for Location and Scale Problems

Efforts to derive noninformative priors through consideration of transformations of a problem had their beginnings with Jeffreys (cf. Jeffreys (1961)). The idea has been extensively used in Hartigan (1964), Jaynes (1968, 1983), Villegas (1977, 1981, 1984), and elsewhere.

Example: Location Parameters. Suppose 𝒳 and Θ are subsets of Rᵖ, and the density of X is of the form f(x - θ); this is called a location density, and θ is called a location parameter. The N(θ, σ²) (σ² fixed), T(α, θ, σ²) (α and σ² fixed), C(θ, β) (β fixed), and N_p(θ, Σ) (Σ fixed) densities are all examples of location densities. Also, a sample of i.i.d. random variables is said to be from a location density if their common density is a location density.

To derive a noninformative prior for this situation, observe the random variable Y = X + c (c ∈ Rᵖ). Defining η = θ + c, it is clear that Y has density f(y - η). If now 𝒳 = Θ = Rᵖ, then the sample space and parameter space for the (Y, η) problem are also Rᵖ; the (X, θ) and (Y, η) problems are thus identical in structure. Let π and π* denote the noninformative priors for the (X, θ) and (Y, η) problems respectively. Identical structure implies

  P^π(θ ∈ A) = P^{π*}(η ∈ A)  for any set A in Rᵖ.

Since η = θ + c, it should also be true that

  P^{π*}(η ∈ A) = P^π(θ + c ∈ A) = P^π(θ ∈ A - c).

Hence P^π(θ ∈ A) = P^π(θ ∈ A - c).

Assuming that the prior has a density, we can write

  ∫_A π(θ) dθ = ∫_{A-c} π(θ) dθ = ∫_A π(θ - c) dθ.

If this holds for all sets A, it must be true that π(θ) = π(θ - c) for all θ. Setting θ = c gives π(c) = π(0), and this must hold for all c ∈ Rᵖ. The conclusion is that π must be a constant function. It is convenient to choose the constant to be 1, so the noninformative prior density for a location parameter is π(θ) = 1.

Noninformative Priors in General Settings

For a general problem, various suggestions have been advanced for determining a noninformative prior. The most widely used method is that of Jeffreys (1961), which is to choose

  π(θ) = [I(θ)]^(1/2),

where I(θ) is the expected Fisher information,

  I(θ) = -E_θ[∂² log f(X|θ) / ∂θ²].

If θ = (θ₁, …, θ_p)ᵗ is a vector, Jeffreys (1961) suggests the use of

  π(θ) = [det I(θ)]^(1/2),  where  I_ij(θ) = -E_θ[∂² log f(X|θ) / ∂θᵢ ∂θⱼ].

Discussion

A number of criticisms have been raised concerning the use of noninformative priors: violating the Likelihood Principle (see Geisser (1984a)); the marginalization paradox of Dawid, Stone, and Zidek (1973).
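As a concrete instance of the Jeffreys recipe, consider a single Bernoulli(θ) observation (our illustration, not an example from the slides). The expected Fisher information can be computed directly by averaging the squared score over the two outcomes, and it matches the closed form 1/(θ(1-θ)), so the Jeffreys prior is proportional to θ^(-1/2)(1-θ)^(-1/2), i.e. Beta(1/2, 1/2).

```python
# Jeffreys prior for one Bernoulli(theta) observation.
# I(theta) = E[(d/dtheta log f(X|theta))^2]: the score is 1/theta when
# x = 1 and -1/(1-theta) when x = 0, so average the squared score.

import math

def fisher_info_bernoulli(theta):
    return theta * (1 / theta) ** 2 + (1 - theta) * (1 / (1 - theta)) ** 2

def jeffreys_unnormalized(theta):
    return math.sqrt(fisher_info_bernoulli(theta))

theta = 0.3
closed_form = 1 / (theta * (1 - theta))          # known result 1/(theta(1-theta))
assert abs(fisher_info_bernoulli(theta) - closed_form) < 1e-12
assert abs(jeffreys_unnormalized(theta)
           - theta ** -0.5 * (1 - theta) ** -0.5) < 1e-12
```

This is exactly the binomial reference prior P_B(θ) ∝ θ^(-1/2)(1-θ)^(-1/2) that reappears in the likelihood-principle discussion at the end of these notes.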

Discussion

There are two common responses to these criticisms of noninformative prior Bayesian analysis. The first, attempted by some noninformative prior Bayesians, is to argue for the correctness of their favorite noninformative prior approach, together with attempts to rebut the paradoxes and counterexamples. The second is to argue that, operationally, it is rare for the choice of a noninformative prior to markedly affect the answer, so that any reasonable noninformative prior can be used.

Maximum Entropy Priors

Frequently partial prior information is available, outside of which it is desired to use a prior that is as noninformative as possible.

Definition 1: Assume Θ is discrete, and let π be a probability density on Θ. The entropy of π, denoted ℰ(π), is

  ℰ(π) = -Σᵢ π(θᵢ) log π(θᵢ).

Entropy has a direct relationship to information theory, and in a sense measures the amount of uncertainty inherent in the probability distribution.

Assume that partial prior information concerning θ is available in the form

  E^π[g_k(θ)] = Σᵢ π(θᵢ) g_k(θᵢ) = μ_k,  k = 1, …, m.  (*)

It seems reasonable to seek the prior distribution which maximizes entropy among all those distributions satisfying the given set of restrictions. The solution (## proof below) is

  π̄(θᵢ) = exp(Σ_{k=1}^m λ_k g_k(θᵢ)) / Σⱼ exp(Σ_{k=1}^m λ_k g_k(θⱼ)),

where the λ_k are constants determined by the constraints in (*).

If Θ is continuous, the use of maximum entropy becomes more complicated. Jaynes (1968) makes a strong case for defining entropy as

  ℰ(π) = -E^π[log(π(θ)/π₀(θ))] = -∫ π(θ) log(π(θ)/π₀(θ)) dθ,

where π₀ is the natural invariant noninformative prior for the problem. In the presence of partial prior information of the form

  E^π[g_k(θ)] = ∫ g_k(θ) π(θ) dθ = μ_k,  k = 1, …, m,  (**)

the prior density which maximizes ℰ(π) is

  π̄(θ) = π₀(θ) exp(Σ_{k=1}^m λ_k g_k(θ)) / ∫ π₀(θ) exp(Σ_{k=1}^m λ_k g_k(θ)) dθ,

where the λ_k are constants determined by the constraints in (**).

Example: Assume Θ = R¹ and θ is a location parameter, so the natural noninformative prior is π₀(θ) = 1. It is believed that the true prior mean is μ and the true prior variance is σ². These restrictions are of the form (**) with g₁(θ) = θ, μ₁ = μ, and g₂(θ) = (θ - μ)², μ₂ = σ². The maximum entropy prior, subject to these restrictions, is

  π̄(θ) = exp[λ₁θ + λ₂(θ - μ)²] / ∫ exp[λ₁θ + λ₂(θ - μ)²] dθ,

where λ₁ and λ₂ are to be chosen to satisfy (**). Completing the square,

  λ₁θ + λ₂(θ - μ)² = λ₂[θ - (μ - λ₁/(2λ₂))]² + [λ₁μ - λ₁²/(4λ₂)].

Example (continued): Hence

  π̄(θ) ∝ exp{λ₂[θ - (μ - λ₁/(2λ₂))]²}.

The denominator is a constant, so π̄(θ) is a normal density with mean μ - λ₁/(2λ₂) and variance -1/(2λ₂). Choosing λ₁ = 0 and λ₂ = -1/(2σ²) satisfies (**). Thus π̄ is a N(μ, σ²) density.

Difficulties with this approach: although the need to use a noninformative prior π₀ in the derivation of π̄ is not too serious, a more serious problem is that π̄ often won't exist.
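The discrete maximum-entropy solution can be computed numerically for a single mean constraint. The support {0, 1, 2, 3} and target mean below are hypothetical; the solution has the exponential form π(θᵢ) ∝ exp(λθᵢ) from (*), and the one multiplier λ is found by bisection so the constraint E^π[θ] = μ holds.

```python
# Maximum-entropy prior on a hypothetical support {0,1,2,3} with one
# constraint E[theta] = mu.  pi(theta_i) ∝ exp(lambda * theta_i);
# mean_for(lambda) is increasing in lambda, so bisect for the root.

import math

support = [0.0, 1.0, 2.0, 3.0]     # hypothetical Theta
mu = 1.0                            # required prior mean (< 1.5, so lambda < 0)

def mean_for(lam):
    w = [math.exp(lam * t) for t in support]
    return sum(t * wi for t, wi in zip(support, w)) / sum(w)

lo, hi = -10.0, 10.0
for _ in range(100):                # bisection on the multiplier
    mid = (lo + hi) / 2
    if mean_for(mid) < mu:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

w = [math.exp(lam * t) for t in support]
pi = [wi / sum(w) for wi in w]
assert abs(sum(t * p for t, p in zip(support, pi)) - mu) < 1e-8
```

With m constraints the same idea needs an m-dimensional root find for (λ₁, …, λ_m), which is where the continuous case (**) also gets harder in practice.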

Using the Marginal Distribution to Determine the Prior

If X has probability density f(x|θ), and θ has probability density π(θ), then the joint density of X and θ is h(x, θ) = f(x|θ)π(θ).

Definition 2: The marginal density of X is

  m(x|π) = ∫ f(x|θ) dF^π(θ) = ∫ f(x|θ)π(θ) dθ  (continuous case)
                             = Σᵢ f(x|θᵢ)π(θᵢ)  (discrete case).

Bayesians have long used m to check assumptions. If m(x|π) (for the actual observed data x) turns out to be small, then the assumptions (the model f and the prior π) have not predicted what actually occurred, and are suspect.

Information about m: subjective knowledge; the data itself.

The ML-II Approach to Prior Selection

In Definition 2 it was pointed out that m(x|π) reflects the plausibility of f and π in the light of the data. If we treat f as definitely known, it follows that m(x|π) reflects the plausibility of π, so it is reasonable to consider m(x|π) as a likelihood function for π. Faced with a likelihood function for π, a natural method of choosing π is maximum likelihood.

Definition 3: Suppose Γ is a class of priors under consideration, and that π̂ ∈ Γ satisfies (for the observed data x)

  m(x|π̂) = sup_{π∈Γ} m(x|π).

Then π̂ is called a type II maximum likelihood prior, or ML-II prior for short.

When Γ is the class Γ = {π: π(θ) = g(θ|λ), λ ∈ Λ}, then sup_{π∈Γ} m(x|π) = sup_{λ∈Λ} m(x|g(·|λ)), so one simply maximizes over the hyperparameter λ.

Hierarchical Priors

A hierarchical prior is also called a multistage prior. The idea is that one may have structural and subjective prior information at the same time, and it is often convenient to model this in stages. For instance, structural knowledge that the θᵢ are i.i.d. leads to the first-stage prior description p₁(θ) = Πᵢ π₀(θᵢ). The hierarchical approach would then place a second-stage subjective prior on π₀. The hierarchical approach is most commonly used when the first stage, Γ, consists of priors of a certain functional form.

Criticisms

Objectivity: classical statistics is objective and hence suitable for the needs of science, while Bayesian analysis is subjective and only useful for making personal decisions.
Misuse of prior distributions.
Robustness (in Section 4.7).
Data- or model-dependent priors: the idealized Bayesian view is that θ is a quantity about which separate information exists, and that this information is to be combined with that in the data; the approach presumes the prior does not depend in any way on the data.
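The ML-II approach described above is easy to illustrate in a conjugate setting. As an assumed toy model (not from the slides): X|θ ~ N(θ, 1) with prior θ ~ N(0, τ²), so the marginal is m(x|τ²) = N(0, 1 + τ²), and the ML-II choice of the hyperparameter τ² maximizes this marginal at the observed x. For this model the maximizer is known in closed form, max(0, x² - 1), which the grid search recovers.

```python
# ML-II hyperparameter selection in a hypothetical conjugate model:
# X | theta ~ N(theta, 1), theta ~ N(0, tau2)  =>  X ~ N(0, 1 + tau2).
# Pick tau2 maximizing the marginal density of the observed x.

import math

def log_marginal(x, tau2):
    v = 1.0 + tau2                              # marginal variance of X
    return -0.5 * math.log(2 * math.pi * v) - x * x / (2 * v)

x = 2.0                                          # hypothetical observed datum
grid = [i / 1000 for i in range(0, 10001)]       # tau2 in [0, 10]
tau2_hat = max(grid, key=lambda t2: log_marginal(x, t2))

# Closed-form ML-II answer for this model: max(0, x^2 - 1)
assert abs(tau2_hat - max(0.0, x * x - 1.0)) < 1e-2
```

This is the "maximize over the hyperparameter λ" step of Definition 3, with λ = τ².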

## proof:

Entropy: ℰ(π) = -Σᵢ π(θᵢ) log π(θᵢ).
Constraints: Σᵢ π(θᵢ) g_k(θᵢ) = μ_k for k = 1, …, m, and Σᵢ π(θᵢ) = 1.

By Lagrange's multiplier method, form

  G(π(θ₁), …, π(θ_n)) = -Σᵢ π(θᵢ) log π(θᵢ) + Σ_k λ_k (Σᵢ π(θᵢ) g_k(θᵢ) - μ_k) + λ₀ (Σᵢ π(θᵢ) - 1).

Setting the partial derivatives to zero,

  0 = ∂G/∂π(θᵢ) = -log π(θᵢ) - 1 + Σ_k λ_k g_k(θᵢ) + λ₀,

so π(θᵢ) = exp[-1 + λ₀ + Σ_k λ_k g_k(θᵢ)]. Since Σᵢ π(θᵢ) = 1,

  exp[-1 + λ₀] = 1 / Σⱼ exp[Σ_k λ_k g_k(θⱼ)].

Therefore

  π(θᵢ) = exp[Σ_k λ_k g_k(θᵢ)] / Σⱼ exp[Σ_k λ_k g_k(θⱼ)].

From Geisser (1984a)

It was pointed out by Barnard, Jenkins, and Winsten (1962) that if a coin whose probability of heads is θ came up heads t times and tails n - t times in a series of independent tosses, then irrespective of the stopping rule the likelihood is L(θ) ∝ θᵗ(1-θ)ⁿ⁻ᵗ, and the likelihood principle would then dictate that any inference about θ should not depend on which stopping rule was actually used. Two common stopping rules are:
(a) fix the total number of tosses and observe the number of heads;
(b) observe the total number of tosses required to attain a fixed number of heads.

Two cases

In case (a), the sampling distribution of T, the number of heads, is

  Pr[T = t | n, θ] = C(n, t) θᵗ(1-θ)ⁿ⁻ᵗ,  t = 0, 1, …, n.

In case (b), the sampling distribution of N, the number of tosses required to obtain t heads, is

  Pr[N = n | t, θ] = C(n-1, t-1) θᵗ(1-θ)ⁿ⁻ᵗ,  n = t, t+1, ….

Now there are Bayesians who have developed rules for obtaining reference prior distributions that purport to express little or no information regarding the parameter. All of these methods, except Geisser's and Zellner's, yield the same reference priors: P_B(θ) ∝ θ^(-1/2)(1-θ)^(-1/2) for the binomial case and P_N(θ) ∝ θ^(-1)(1-θ)^(-1/2) for the negative binomial case. Hence the posterior densities for the two cases are

  P_B(θ | t, n) ∝ θ^(t-1/2)(1-θ)^(n-t-1/2)  and  P_N(θ | t, n) ∝ θ^(t-1)(1-θ)^(n-t-1/2),

respectively.

Conclusion

In fact, for all of these methods the prior distribution depends on the sampling rule, and consequently so does the posterior distribution. The likelihood principle says that inference about the same parameter should not depend on which sampling rule was used, so one may violate the likelihood principle in using noninformative priors.

Some Bayesians

Jeffreys (1961) invoked invariance. Box and Tiao (1973) recommended priors such that likelihoods are data translated in some sense. Akaike (1978) and Geisser (1979) formulated procedures involving the predictive distribution and Kullback-Leibler divergence measures. Bernardo (1979) used the notion of maximizing entropy in the limit. Zellner (1977) maximized the Shannon information of the data relative to that of the prior.
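The Barnard-Jenkins-Winsten point can be verified directly: the binomial and negative binomial sampling distributions for the same (t, n) differ only by a θ-free combinatorial factor, so they define the same likelihood for θ. The values t = 3, n = 10 below are arbitrary illustrations.

```python
# Same data (t heads in n tosses), two stopping rules: the ratio of the
# two sampling probabilities, C(n,t)/C(n-1,t-1), does not involve theta,
# so both rules give the likelihood L(theta) ∝ theta^t (1-theta)^(n-t).

from math import comb

def binom_pmf(t, n, theta):              # stopping rule (a)
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def negbinom_pmf(n, t, theta):           # stopping rule (b)
    return comb(n - 1, t - 1) * theta**t * (1 - theta)**(n - t)

t, n = 3, 10                              # illustrative values
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
ratios = [binom_pmf(t, n, th) / negbinom_pmf(n, t, th) for th in thetas]

assert all(abs(r - ratios[0]) < 1e-12 for r in ratios)       # constant in theta
assert abs(ratios[0] - comb(n, t) / comb(n - 1, t - 1)) < 1e-12
```

The reference priors P_B and P_N above nevertheless differ, which is exactly Geisser's complaint: the noninformative prior, and hence the posterior, depends on the stopping rule even though the likelihood does not.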