Parameter Estimation

Transcription:

Parameter Estimation

Probabilities: Notational Convention
Mass (discrete function): capital letters
Density (continuous function): small letters
Vector vs. scalar
  Scalar: plain
  Vector: bold
  1D: small
  Higher dimension: capital
Notes are in a continuous state of fluctuation until a topic is finished (many updates)
PR ANN & ML

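Parameter Estimation
The optimal classifier maximizes the posterior
\[ P(\omega_i \mid x) \propto P(\omega_i)\, p(x \mid \omega_i), \]
where $P(\omega_i)$ is the a priori probability and $p(x \mid \omega_i)$ is the class-conditional density.
Assumptions: no correlation; time-independent statistics.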

Popular Approaches
Parametric: assume a certain parametric form for $p(x \mid \omega_i)$ and estimate the parameters.
Nonparametric: do not assume a parametric form for $p(x \mid \omega_i)$; estimate the density profile directly.
Boundary: estimate the separating hyperplane (hypersurface) between $p(x \mid \omega_i)$ and $p(x \mid \omega_j)$.

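A Priori Probability
Given the numbers of occurrences $n_1, \dots, n_M$ of the $M$ classes:
\[ P(\omega_k) = \frac{n_k}{\sum_{j=1}^{M} n_j} \]
This holds if the number of samples is large enough and the selection process is not biased.
Caveat: sampling may be biased.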
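The count-based prior estimate can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code; the function name `estimate_priors` and the class counts are assumptions made up for the example.

```python
def estimate_priors(counts):
    """Estimate P(class k) as n_k divided by the total number of samples."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sample")
    return {label: n / total for label, n in counts.items()}


# Hypothetical occurrence counts for three classes (illustrative values only).
priors = estimate_priors({"w1": 50, "w2": 30, "w3": 20})
```

The estimate is only as good as the sampling: if one class is over-represented in the collected data, its prior is inflated by the same factor.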

Class-Conditional Density
More complicated: not a single number but a distribution.
  Assume a certain form.
  Estimate the parameters.
What form should we assume? There are many, but in this course we use the Gaussian almost exclusively.

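Gaussian (or Normal) Distribution
Scalar case:
\[ p(x \mid \omega_i) = N(\mu_i, \sigma_i^2) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right] \]
Vector case:
\[ p(\mathbf{x} \mid \omega_i) = N(\boldsymbol{\mu}_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right] \]
Unknowns: class mean and variance.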
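To make the two densities concrete, here is a small standard-library sketch. The function names are hypothetical, and the vector case is restricted to two dimensions (an assumption made so the 2x2 inverse can be written in closed form without a linear-algebra library).

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Scalar normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_pdf_2d(x, mu, cov):
    """Bivariate normal density; cov is a 2x2 nested list [[a, b], [c, d]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
```

With a diagonal covariance the 2-D density factors into the product of two scalar densities, which is a quick sanity check on the normalization constant.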

[Figure: population density plotted against the feature value $x$, with the mean $\mu$ marked.]

Why Gaussian (Normal)?
The central limit theorem predicts a normal distribution from IID experiments in reality.
There are only two numbers to estimate in the scalar case (mean and variance), or $d + d(d+1)/2$ in $d$ dimensions.
Nice mathematical properties, e.g.
  The Fourier transform of a Gaussian is a Gaussian.
  Products of Gaussian densities remain Gaussian in shape, and sums of independent Gaussian variables remain Gaussian.
  Any linear transform of a Gaussian is a Gaussian.

Transformation (projection)
In particular, a whitening transform can diagonalize the covariance matrix.
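The slide's formulas did not survive, so the following is a standard worked derivation offered as a sketch. The symbols $A$, $\Phi$, $\Lambda$ and $A_w$ are assumptions introduced here: $\Phi$ holds the orthonormal eigenvectors of $\Sigma$ and $\Lambda$ is the diagonal matrix of its eigenvalues.

```latex
% A linear transform of a Gaussian is a Gaussian:
\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma), \quad \mathbf{y} = A^{T}\mathbf{x}
\;\Longrightarrow\;
\mathbf{y} \sim N(A^{T}\boldsymbol{\mu},\; A^{T}\Sigma A)

% Eigendecomposition of the covariance matrix:
\Sigma = \Phi \Lambda \Phi^{T}, \qquad \Phi^{T}\Phi = I

% Projecting onto the eigenvectors diagonalizes the covariance:
\Phi^{T}\Sigma\,\Phi = \Lambda

% The whitening transform also rescales each axis to unit variance:
A_w = \Phi \Lambda^{-1/2}, \qquad
A_w^{T}\Sigma A_w
= \Lambda^{-1/2}\Phi^{T}\,\Phi\Lambda\Phi^{T}\,\Phi\Lambda^{-1/2} = I
```

After whitening, the features are uncorrelated with unit variance, so Euclidean distance in the new space equals Mahalanobis distance in the original one.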

Parameter Estimation
Maximum likelihood estimator: parameters have fixed but unknown values.
Bayesian estimator: parameters are random variables with known a priori distributions.
The Bayesian estimator allows us to change the a priori distribution by incorporating measurements to sharpen the profile.

Graphically
[Figure: likelihood plotted against the parameters, contrasting the MLE and Bayesian views.]

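Maximum Likelihood Estimator
Given:
  $n$ labeled samples (observations) $X = \{x_1, \dots, x_n\}$
  an assumed distribution of $e$ parameters $\theta = \{\theta_1, \dots, \theta_e\}$
  samples are drawn independently from $p(x \mid \omega_j, \theta)$
Find: the parameter $\theta$ that best explains the observations.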

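MLE Formulation
Maximize
\[ p(X \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta) \]
or, equivalently, the log likelihood
\[ l(\theta) = \log p(X \mid \theta) = \sum_{k=1}^{n} \log p(x_k \mid \theta) \]
by solving
\[ \nabla_\theta\, l(\theta) = 0 \quad\Longleftrightarrow\quad \sum_{k=1}^{n} \nabla_\theta \log p(x_k \mid \theta) = 0 \]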

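An Example
Scalar Gaussian with unknown mean and variance, $\theta = (\mu, \sigma^2)$:
\[ p(x_k \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_k-\mu)^2}{2\sigma^2}\right] \]
\[ \log p(x_k \mid \theta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x_k-\mu)^2}{2\sigma^2} \]
\[ \frac{\partial}{\partial\mu}\log p(x_k \mid \theta) = \frac{x_k-\mu}{\sigma^2}, \qquad \frac{\partial}{\partial\sigma^2}\log p(x_k \mid \theta) = -\frac{1}{2\sigma^2} + \frac{(x_k-\mu)^2}{2\sigma^4} \]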

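An Example (cont.)
Setting the summed derivatives to zero:
\[ \sum_{k=1}^{n} \frac{x_k-\hat\mu}{\hat\sigma^2} = 0, \qquad \sum_{k=1}^{n} \left[-\frac{1}{2\hat\sigma^2} + \frac{(x_k-\hat\mu)^2}{2\hat\sigma^4}\right] = 0 \]
\[ \hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k-\hat\mu)^2 \]
Class mean as sample mean, class variance as sample variance: $p(x \mid \omega_i) \sim N(\hat\mu, \hat\sigma^2)$.
The MLE of the variance is biased: it divides by $n$ rather than $n-1$, so $E[\hat\sigma^2] = \frac{n-1}{n}\sigma^2$.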
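The closed-form estimates are easy to check numerically. The sketch below is illustrative only: the function names and the eight sample values are assumptions, and the second function is the usual $n-1$ correction shown for comparison.

```python
def gaussian_mle(samples):
    """Return the maximum likelihood estimates (mean, variance) of a scalar Gaussian."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mu_hat = sum(samples) / n
    # The MLE divides by n, which makes it a biased estimate of the variance.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat


def unbiased_variance(samples):
    """Sample variance with the n - 1 divisor (unbiased)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mu_hat = sum(samples) / n
    return sum((x - mu_hat) ** 2 for x in samples) / (n - 1)


# Hypothetical measurements (illustrative values only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_mle = gaussian_mle(data)
var_unbiased = unbiased_variance(data)
```

For these eight values the two variance estimates differ by the factor $n/(n-1) = 8/7$; the gap shrinks as $n$ grows.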

[Figure: population plotted against coin weight, showing two Gaussian classes with means $\mu_1$ and $\mu_2$.]

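[Figure: a fitted Gaussian with estimated mean $\hat\mu$ and width $\hat\sigma$.]
If $\hat\sigma$ is too narrow, many sampling points will be outside the width, with low likelihood of occurrence.
If $\hat\sigma$ is too wide, $1/\hat\sigma$ becomes too small and reduces the likelihood of occurrence.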

A Quick Word on MAP
MAP: maximum a posteriori estimator.
Similar to MLE with one additional twist: maximize $l(\theta) + \log p(\theta)$, where
  $l(\cdot)$ is the log likelihood, and
  $p(\cdot)$ is the prior probability of the parameter values (if you know it), e.g., the mean is more likely to be $\mu_0$, with a normal distribution.
MLE has a uniform prior; MAP not necessarily.
The added term is a case of regularization.
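As a worked instance of the added term, consider the mean of a scalar Gaussian. This is a sketch under stated assumptions: the variance $\sigma^2$ is taken as known and the prior on the mean is $N(\mu_0, \sigma_0^2)$, where $\sigma_0$ is a symbol introduced here for the prior width.

```latex
% Log likelihood plus log prior, dropping terms that do not depend on mu:
\hat{\mu}_{MAP} = \arg\max_{\mu}\left[
  -\sum_{k=1}^{n}\frac{(x_k-\mu)^2}{2\sigma^2}
  \;-\; \frac{(\mu-\mu_0)^2}{2\sigma_0^2}
\right]

% Setting the derivative with respect to mu to zero:
\sum_{k=1}^{n}\frac{x_k-\hat{\mu}_{MAP}}{\sigma^2}
  - \frac{\hat{\mu}_{MAP}-\mu_0}{\sigma_0^2} = 0

% Solving for the estimate:
\hat{\mu}_{MAP} = \frac{\sigma_0^2\sum_{k=1}^{n}x_k + \sigma^2\mu_0}{n\sigma_0^2+\sigma^2}
```

The second term penalizes estimates far from $\mu_0$, which is why it acts as a regularizer. As $\sigma_0 \to \infty$ the prior becomes flat and the estimate reduces to the sample mean, the MLE.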

Bayesian Estimator
Note that MLE is a batch estimator:
  All data have to be kept.
  Difficult to update the estimate.
  Difficult to incorporate other evidence.
  Insists on a single measurement.
Bayesian estimator:
  Allows the freedom that the parameters themselves can be random variables.
  Allows multiple evidence.
  Allows iterative updates.

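Bayesian Estimator
Based on Bayes' rule:
\[ P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{\sum_j p(x \mid \omega_j)\,P(\omega_j)} \]
With the training samples $X$ at our disposal:
\[ P(\omega_i \mid x, X) = \frac{p(x \mid \omega_i, X)\,P(\omega_i \mid X)}{\sum_j p(x \mid \omega_j, X)\,P(\omega_j \mid X)} \]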

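Bayes Rule Formulation
Assume each sample in $X$ comes from only one class, so that $X$ splits into per-class sets $X_i$, and that the prior $P(\omega_i)$ is independent of $X$:
\[ P(\omega_i \mid x, X) = \frac{p(x \mid \omega_i, X_i)\,P(\omega_i)}{\sum_j p(x \mid \omega_j, X_j)\,P(\omega_j)} \]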

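How can $X$ be used?
The distribution is known (e.g., normal); the parameters $\theta$ are unknown.
$X$ is used for estimating the class parameters: $p(\theta \mid X)$.
The class parameters are the constraint on the density: $p(x \mid \theta)$.
Put it all together:
\[ p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta \]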

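Bayes Rule Formulation (cont.)
\[ p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta \]
Ideally, $p(\theta \mid X)$ is concentrated at some $\hat\theta$ and is 0 otherwise, so that
\[ p(x \mid X) = \int p(x \mid \theta)\, \delta(\theta - \hat\theta)\, d\theta = p(x \mid \hat\theta) \]
This is MLE!
Otherwise, all possible $\theta$'s are used.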

Graphic Interpretation
[Figure: candidate parameter sets $\theta$, $\theta'$, $\theta''$ and the densities they generate.]

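An Example
Estimating the mean of a normal distribution:
  The variance $\sigma^2$ is known.
  Using $n$ samples $X = \{x_1, \dots, x_n\}$.
First step:
\[ p(x \mid \mu) \sim N(\mu, \sigma^2), \qquad p(\mu) \sim N(\mu_0, \sigma_0^2) \]
\[ p(\mu \mid X) = \frac{p(X \mid \mu)\,p(\mu)}{\int p(X \mid \mu)\,p(\mu)\,d\mu} = \alpha \prod_{k=1}^{n} p(x_k \mid \mu)\; p(\mu) \]
Here $\prod_k p(x_k \mid \mu)$ is the current evidence and $p(\mu)$ is the previous and other evidence.
Key to Bayesian: both current and prior evidence can be used.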

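Then
\[ p(\mu \mid X) = \alpha \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x_k-\mu}{\sigma}\right)^2\right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right] \]
\[ = \alpha' \exp\left\{-\frac{1}{2}\left[\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right]\right\} \]
so $p(\mu \mid X) \sim N(\mu_n, \sigma_n^2)$ with
\[ \mu_n = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\, m_n + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\, \mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2+\sigma^2}, \]
where $m_n = \frac{1}{n}\sum_{k=1}^{n} x_k$ is the sample mean.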

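$X$ helps:
  Defining the mean.
  Reducing the uncertainty in the mean.
Trust the new data if:
  The class variance $\sigma^2$ is small.
  The number of samples $n$ is large.
  The prior is uncertain ($\sigma_0^2$ is large).
\[ \mu_n = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\, m_n + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\, \mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2+\sigma^2} \]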
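A short sketch of the posterior update for the mean, assuming the standard conjugate-normal result (known class variance, normal prior on the mean). The function and argument names are hypothetical, as are the sample values.

```python
def posterior_of_mean(samples, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) of a Gaussian mean.

    sigma2   : known class variance
    mu0      : prior mean
    sigma0_2 : prior variance (large value = uncertain prior)
    """
    n = len(samples)
    if n == 0:
        return mu0, sigma0_2  # no data: the posterior is just the prior
    m_n = sum(samples) / n  # sample mean
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 / denom) * m_n + (sigma2 / denom) * mu0
    sigma_n2 = sigma0_2 * sigma2 / denom
    return mu_n, sigma_n2


# Hypothetical data: three samples, unit class variance, prior N(0, 1).
mu_n, sigma_n2 = posterior_of_mean([1.0, 2.0, 3.0], 1.0, 0.0, 1.0)

# A very uncertain prior makes the posterior mean follow the sample mean.
mu_flat, _ = posterior_of_mean([1.0, 2.0, 3.0], 1.0, 0.0, 1e9)
```

The weight on the sample mean is $n\sigma_0^2 / (n\sigma_0^2 + \sigma^2)$, so each of the three conditions listed above pushes that weight toward 1.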

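An Example (cont.)
Second step:
\[ p(x \mid X) = \int p(x \mid \mu)\, p(\mu \mid X)\, d\mu = \frac{1}{2\pi\sigma\sigma_n} \exp\left[-\frac{1}{2}\frac{(x-\mu_n)^2}{\sigma^2+\sigma_n^2}\right] f(\sigma, \sigma_n) \]
where
\[ f(\sigma, \sigma_n) = \int \exp\left[-\frac{1}{2}\frac{\sigma^2+\sigma_n^2}{\sigma^2\sigma_n^2}\left(\mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2+\sigma_n^2}\right)^2\right] d\mu \]
Hence $p(x \mid X) \sim N(\mu_n, \sigma^2 + \sigma_n^2)$.
Third step: use $p(x \mid \omega_i, X_i)\,P(\omega_i)$ as the discriminant $g_i(x)$ for each class and pick the largest.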
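The second step can be illustrated with a small sketch of the predictive density, assuming the standard result that it is normal with mean $\mu_n$ and variance $\sigma^2 + \sigma_n^2$. The function name and the numeric values are hypothetical.

```python
import math


def predictive_pdf(x, mu_n, sigma_n2, sigma2):
    """Density of a new sample x after integrating out the unknown mean.

    mu_n, sigma_n2 : posterior mean and variance of the class mean
    sigma2         : known class variance
    """
    var = sigma2 + sigma_n2  # the uncertainty in the mean widens the density
    return math.exp(-0.5 * (x - mu_n) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Illustrative values: posterior N(1.5, 0.25) for the mean, unit class variance.
peak_bayes = predictive_pdf(1.5, 1.5, 0.25, 1.0)
# Plugging in a single point estimate corresponds to sigma_n2 = 0.
peak_plugin = predictive_pdf(1.5, 1.5, 0.0, 1.0)
```

The Bayesian density is flatter than the plug-in density because it accounts for not knowing the mean exactly; the two coincide as $\sigma_n^2 \to 0$.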

Graphical Interpretation: MLE
[Figure: the samples $x_k$ and the single estimate $\hat\theta$ chosen by MLE.]

Graphical Interpretation: Bayesian
[Figure: the samples $x_k$ and the distribution over the mean $\mu$ kept by the Bayesian estimator.]

Results of Iterative Process
Start with a prior distribution.
Incorporate the current batch of data.
Generate a new prior.
Goodness of new prior = goodness of old prior × goodness of interpretation.
Usually:
  The prior distribution sharpens (Bayesian learning).
  Uncertainty drops.
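The iterative process can be sketched for the Gaussian-mean case by feeding samples one at a time and reusing each posterior as the next prior. This is an illustration under the assumption of a known class variance; the function name `update` and the sample values are hypothetical.

```python
def update(mu_prior, var_prior, x, sigma2):
    """One Bayesian update of a normal prior on the mean after observing x."""
    # Precisions (inverse variances) add; the mean is a precision-weighted average.
    var_post = 1.0 / (1.0 / var_prior + 1.0 / sigma2)
    mu_post = var_post * (mu_prior / var_prior + x / sigma2)
    return mu_post, var_post


mu, var = 0.0, 1.0  # starting prior N(0, 1), an illustrative choice
history = []
for x in [1.0, 2.0, 3.0]:
    mu, var = update(mu, var, x, 1.0)  # the posterior becomes the new prior
    history.append((mu, var))
```

The variance in `history` shrinks at every step (0.5, then 1/3, then 0.25), and the final posterior matches what a single batch update over all three samples would give, so no old data has to be kept.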

MLE vs. Bayes
MLE                          Bayes
Faster (differentiation)     Slow (integration)
Single model                 Multiple weighted models
Known model                  Unknown model is fine
Less information             More information (nonuniform prior)

Does it really make a difference?
Yes: the Bayesian classifier and MAP will in general give different results when used to classify new samples,
because MAP (MLE) keeps only one hypothesis while Bayesian keeps multiple, weighted hypotheses.

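Example
MLE: $\hat\theta = \arg\max_\theta p(\theta \mid X)$, then $\omega = \arg\max_{\omega_i} P(\omega_i \mid x, \hat\theta)$. Only one hypothesis is kept.
Bayesian: $\omega = \arg\max_{\omega_i} P(\omega_i \mid x, X)$, where $P(\omega_i \mid x, X) = \int P(\omega_i \mid x, \theta)\, p(\theta \mid X)\, d\theta$.
Three hypotheses with posteriors $P(\theta_1 \mid X) = 0.4$, $P(\theta_2 \mid X) = 0.3$, $P(\theta_3 \mid X) = 0.3$:
  $P(\omega_1 \mid x, X) = 0.4 \times 1 + 0.3 \times 0 + 0.3 \times 0 = 0.4$
  $P(\omega_2 \mid x, X) = 0.4 \times 0 + 0.3 \times 1 + 0.3 \times 1 = 0.6$
MLE keeps only $\theta_1$ and picks $\omega_1$; the Bayesian classifier picks $\omega_2$.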
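The three-hypothesis example can be reproduced with a few lines of Python. The hypothesis names `h1` to `h3`, the class labels `w1` and `w2`, and the helper names are assumptions chosen for the sketch; the numbers are the ones on the slide.

```python
# Posterior over hypotheses, and each hypothesis's class probabilities for x.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
class_given_h = {
    "h1": {"w1": 1.0, "w2": 0.0},
    "h2": {"w1": 0.0, "w2": 1.0},
    "h3": {"w1": 0.0, "w2": 1.0},
}


def single_hypothesis_decision(posterior, class_given_h):
    """Keep only the most probable hypothesis and use its answer."""
    best_h = max(posterior, key=posterior.get)
    probs = class_given_h[best_h]
    return max(probs, key=probs.get)


def bayes_decision(posterior, class_given_h):
    """Weight every hypothesis by its posterior and pick the best class."""
    totals = {}
    for h, weight in posterior.items():
        for label, prob in class_given_h[h].items():
            totals[label] = totals.get(label, 0.0) + weight * prob
    return max(totals, key=totals.get), totals


single = single_hypothesis_decision(posterior, class_given_h)
bayes, totals = bayes_decision(posterior, class_given_h)
```

Here `single` is `w1` while `bayes` is `w2`: the two less probable hypotheses together outweigh the single most probable one.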

Gibbs Sampler
The Bayesian classifier is optimal but can be very expensive, especially when a large number of hypotheses are kept and evaluated.
Gibbs: randomly pick one hypothesis according to the current posterior distribution.
It can be shown (later) to be related to the k-NN classifier, and the expected error is at most twice as bad as Bayesian.
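A minimal sketch of the Gibbs idea: draw one hypothesis from the posterior and let it classify the sample. The hypotheses, their posterior weights, and the function name are hypothetical; `random.choices` is the standard-library weighted sampler.

```python
import random


def gibbs_classify(x, posterior, predictors, rng):
    """Classify x with ONE hypothesis drawn at random from the posterior."""
    names = list(posterior)
    weights = [posterior[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return predictors[chosen](x)


# Hypothetical hypotheses: h1 always answers "w1"; h2 and h3 answer "w2".
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictors = {
    "h1": lambda x: "w1",
    "h2": lambda x: "w2",
    "h3": lambda x: "w2",
}

rng = random.Random(0)  # seeded only to make the sketch repeatable
labels = [gibbs_classify(None, posterior, predictors, rng) for _ in range(10000)]
fraction_w2 = labels.count("w2") / len(labels)
```

Each call evaluates a single hypothesis instead of all of them, which is the cost saving; over many calls the answers follow the posterior-weighted vote, so `fraction_w2` comes out close to 0.6.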

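An Example: Naïve Bayesian
Features are a conjunction of attributes $a_1, a_2, \dots, a_n$.
Bayes theorem states that the a posteriori probability should be maximized:
\[ c_{MAP} = \arg\max_{c} P(c \mid a_1, \dots, a_n) = \arg\max_{c} \frac{P(a_1, \dots, a_n \mid c)\,P(c)}{P(a_1, \dots, a_n)} = \arg\max_{c} P(a_1, \dots, a_n \mid c)\,P(c) \]
The naïve Bayesian classifier assumes independence of the attributes given the class:
\[ c_{NB} = \arg\max_{c} P(c) \prod_{i=1}^{n} P(a_i \mid c) \]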

Example
Day   Outlook    Temperature  Humidity  Wind    PlayTennis
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No

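Example (cont.)
Query: <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>. PlayTennis = yes or no?
\[ c_{NB} = \arg\max_{c \in \{yes,\, no\}} P(c)\, P(\text{Outlook}=\text{sunny} \mid c)\, P(\text{Temperature}=\text{cool} \mid c)\, P(\text{Humidity}=\text{high} \mid c)\, P(\text{Wind}=\text{strong} \mid c) \]
P(PlayTennis=yes) = 9/14 = .64
P(PlayTennis=no) = 5/14 = .36
P(Wind=strong | yes) = 3/9 = .33
P(Wind=strong | no) = 3/5 = .60
P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes) = 0.0053
P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no) = 0.0206
Since 0.0206 > 0.0053, the naïve Bayesian classifier answers no.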
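The two products can be recomputed directly from the 14-day table. The sketch below is illustrative: the function name is hypothetical, probabilities are plain relative frequencies with no smoothing, and day D6 is entered with Temperature = Cool, which is the reading consistent with P(cool | no) = 1/5.

```python
from collections import Counter

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for days D1..D14.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def naive_bayes_scores(data, query):
    """Return P(c) * prod_i P(a_i | c) for every class c, using raw frequencies."""
    class_counts = Counter(row[-1] for row in data)
    n = len(data)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n  # prior P(c)
        for i, value in enumerate(query):
            matches = sum(1 for row in data if row[-1] == c and row[i] == value)
            score *= matches / n_c  # conditional P(a_i | c)
        scores[c] = score
    return scores


scores = naive_bayes_scores(DATA, ("Sunny", "Cool", "High", "Strong"))
decision = max(scores, key=scores.get)
```

The scores are unnormalized; dividing each by their sum would give the posterior probabilities, but the arg max is unchanged.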

Caveat
Guard against a zero probability $P(a_i \mid c)$, especially for small sample sizes and large sets of attribute values.
Use the m-estimate instead:
\[ P(a_i \mid c) = \frac{n_c + m\,p}{n + m} \]
  $n$: number of samples in class $c$
  $n_c$: number of samples in class $c$ with attribute value $a_i$
  $m$: equivalent sample size (add $m$ more samples)
  $p$: prior estimate; if attribute $a$ can take $k$ values, then $p = 1/k$
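The m-estimate is a one-line formula; the sketch below shows how it removes a zero count. The function name and the numbers in the example are assumptions chosen for illustration.

```python
def m_estimate(n_c, n, m, p):
    """Smoothed estimate of P(a | c).

    n_c : samples in class c that have attribute value a
    n   : samples in class c
    m   : equivalent sample size (number of virtual samples added)
    p   : prior estimate of P(a | c), e.g. 1/k for an attribute with k values
    """
    if n + m <= 0:
        raise ValueError("n + m must be positive")
    return (n_c + m * p) / (n + m)


# Hypothetical case: the value never occurs among 5 samples of the class,
# the attribute has k = 3 possible values, and m = 3 virtual samples are added.
smoothed = m_estimate(0, 5, 3, 1.0 / 3.0)
raw = m_estimate(0, 5, 0, 1.0 / 3.0)  # m = 0 falls back to the raw frequency
```

With `m = 0` the estimate is the raw frequency (here 0, which would zero out the whole naïve Bayes product); with `m = 3` it becomes 1/8.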

More Examples
Web page classification / newsgroup classification:
  Like/dislike for web pages.
  Science/sports/entertainment categories for web pages and newsgroups.

More Examples (cont.)
Select commonly occurring words as features:
  Words occurring at least $k$ times in the documents.
  Eliminate stop words (the, it, etc.) and punctuation.
  Word stemming (like, liked, etc.).
$P(\text{word}_k \mid \text{class})$ is independent of the word's position in the document.
Achieves 89% accuracy for classifying documents from 20 newsgroups.
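The feature-selection step can be sketched as follows. Everything here is an assumption made for illustration: the stop-word list is a tiny made-up sample, "at least k times" is read as total occurrences across all documents, and stemming is left out because the slide does not say which stemmer was used.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list, not the one used in the experiment.
STOP_WORDS = {"the", "it", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def select_vocabulary(documents, k):
    """Keep non-stop words that occur at least k times across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(w for w in tokenize(doc) if w not in STOP_WORDS)
    return {word for word, count in counts.items() if count >= k}


docs = ["The game, the score!", "It is a game of skill.", "Score the goal"]
vocabulary = select_vocabulary(docs, 2)
```

Because word position is ignored, each document then reduces to a bag of counts over `vocabulary`, and the naïve Bayes product runs over those word occurrences.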