Bayes Decision Theory - II
- Julian Tate
1 Bayes Decision Theory - II
Ken Kreutz-Delgado (Nuno Vasconcelos), ECE 175, Winter, UCSD
2 Nearest Neighbor Classifier
- We are considering supervised classification.
- Nearest Neighbor (NN) Classifier:
  - a training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$
  - $x_i$ is a vector of observations, $y_i$ is the corresponding class label
  - a vector $x$ to classify
- The NN Decision Rule is: set $y = y_{i^*}$, where
\[ i^* = \arg\min_{i \in \{1, \ldots, n\}} d(x, x_i) \]
- argmin means: the $i$ that minimizes the distance.
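The NN decision rule above can be sketched in a few lines of Python. This is only an illustration: the Euclidean metric, the function names, and the toy training set `D` are assumptions, not part of the slides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nn_classify(x, training_set, dist=euclidean):
    """Return the label y_{i*} of the training point x_{i*} closest to x.

    training_set is a list of (x_i, y_i) pairs; ties go to the first minimizer.
    """
    _, label = min(training_set, key=lambda pair: dist(x, pair[0]))
    return label

# Hypothetical toy training set: two points per class.
D = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((1.0, 1.0), "b"), ((0.9, 1.2), "b")]

print(nn_classify((0.1, 0.3), D))
print(nn_classify((0.8, 0.9), D))
```

Passing a different `dist` function changes the metric, which is the dependence on the metric discussed on the next slide.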
3 Optimal Classifiers
- We have seen that performance depends on the metric.
- Some metrics are "better" than others.
- The meaning of "better" is connected to how well adapted the metric is to the properties of the data.
- But can we be more rigorous? What do we mean by optimal?
- To talk about optimality we define a cost or loss: a predictor $f(\cdot)$ maps $x$ to $\hat{y} = f(x)$, and the loss is $L(y, \hat{y})$.
- Loss is the function that we want to minimize.
- Loss depends on the true $y$ and the prediction $\hat{y}$.
- Loss tells us how good our predictor is.
4 Loss Functions & Classification Errors
- Loss is a function of classification errors.
- What errors can we have? Two types: false positives and false negatives.
- Consider a face detection problem (decide "face" or "non-face"):
  - if you see a non-face and say "face", you have a false positive (false alarm)
  - if you see a face and say "non-face", you have a false negative (miss, failure to detect)
- Obviously, we have corresponding sub-classes for non-errors: true positives and true negatives.
- The positive/negative part reflects what we say or decide; the true/false part reflects the true class label (the "true state of the world").
5 (Conditional) Risk
- To weigh different errors differently, we introduce a loss function.
- Denote the cost of classifying $X$ from class $i$ as $j$ by $L[i \to j]$.
- One way to measure how good the classifier is, is to use the (data-conditional) expected value of the loss, aka the (conditional) Risk:
\[ R(x, i) = E\{ L[Y \to i] \mid x \} = \sum_j L[j \to i]\, P_{Y|X}(j|x) \]
- Note that the (data-conditional) risk is a function of both the decision "decide class $i$" and the conditioning data (measured feature vector), $x$.
6 Loss Functions
- Example: two snakes and eating poisonous dart frogs.
  - The regular snake will die.
  - Frogs are a good snack for the predator dart-snake.
- This leads to the losses:

Regular snake
prediction    dart frog    regular frog
regular       ∞            0
dart          0            10

Predator snake
prediction    dart frog    regular frog
regular       10           0
dart          0            10

- What is the optimal decision when the snakes find a frog like these?
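7 Minimum Risk Classification
- We have seen that if both snakes have
\[ P_{Y|X}(j|x) = \begin{cases} 0, & j = \text{dart} \\ 1, & j = \text{regular} \end{cases} \]
then both say "regular".
- However, if
\[ P_{Y|X}(j|x) = \begin{cases} 0.1, & j = \text{dart} \\ 0.9, & j = \text{regular} \end{cases} \]
then the vulnerable snake says "dart" while the predator says "regular".
- Its infinite loss for saying "regular" when the frog is dart makes the vulnerable snake much more cautious!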
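As a quick check of the snake example, the sketch below evaluates the conditional risk $R(x, i) = \sum_j L[j \to i] P_{Y|X}(j|x)$ for both snakes. The dictionary layout and function names are illustrative assumptions; so is the convention of skipping zero-probability classes, which avoids the undefined product $0 \cdot \infty$.

```python
import math

def conditional_risk(loss, posterior, decision):
    """R(x, i) = sum_j L[j -> i] * P(j | x).

    Classes with zero posterior are skipped (assumed convention 0 * inf = 0).
    """
    return sum(loss[(true, decision)] * p for true, p in posterior.items() if p > 0)

def bayes_decision(loss, posterior):
    """Return the decision with the smallest conditional risk."""
    decisions = sorted({d for (_, d) in loss})
    return min(decisions, key=lambda d: conditional_risk(loss, posterior, d))

# Keys are (true frog class, snake's decision); values follow the two loss tables.
regular_snake = {("dart", "regular"): math.inf, ("regular", "regular"): 0,
                 ("dart", "dart"): 0, ("regular", "dart"): 10}
predator_snake = {("dart", "regular"): 10, ("regular", "regular"): 0,
                  ("dart", "dart"): 0, ("regular", "dart"): 10}

certain = {"dart": 0.0, "regular": 1.0}    # first posterior on the slide
uncertain = {"dart": 0.1, "regular": 0.9}  # second posterior on the slide

for name, loss in [("regular snake", regular_snake), ("predator snake", predator_snake)]:
    print(name, bayes_decision(loss, certain), bayes_decision(loss, uncertain))
```

For the second posterior the predator's risks are 1 (say "regular") and 9 (say "dart"), while the vulnerable snake's risk for saying "regular" is infinite, which reproduces the decisions stated on the slide.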
8 BDR = Minimizing Conditional Risk
- Note that the definition of risk immediately defines the optimal classifier as the one that minimizes the conditional risk for a given observation $x$.
- The Optimal Decision is the Bayes Decision Rule (BDR):
\[ i^*(x) = \arg\min_i R(x, i) = \arg\min_i \sum_j L[j \to i]\, P_{Y|X}(j|x) \]
- The BDR yields the optimal (minimal) risk:
\[ R^*(x) = R(x, i^*) = \min_i \sum_j L[j \to i]\, P_{Y|X}(j|x) \]
9 What is a Decision Rule?
- Consider the c-ary classification problem with class labels $C = \{1, \ldots, c\}$.
- Given an observation (feature), $x$, to be classified, a decision rule is a function $d = d(\cdot)$ of the observation that takes its values in the set of class labels, $d(x) \in \{1, \ldots, c\}$.
- Note that $d^*(x) = i^*(x)$, defined on the previous slide, is an optimal decision rule in the sense that for a specific value of $x$ it minimizes the conditional risk $R(x, i)$ over all possible decisions $i \in C$.
10 (d-dependent) Total Average Risk
- Given a decision rule $d$ and the conditional risk $R(x, i)$, we can consider the (d-dependent) conditional risk $R(x, d(x))$.
- We can now define the total (d-dependent) Expected or Average Risk (aka d-risk):
\[ R(d) = E\{ R(x, d(x)) \} \]
- Note that we have averaged over all possible measurements (features) $x$ that we might encounter in the world.
- Note that $R(d)$ is a function of a function! (A function of $d$.)
- The d-risk $R(d)$ is a measure of how we expect to perform on the average when we use the fixed decision rule $d$ over and over again on a large set of real-world data.
- It is natural to ask if there is an optimal decision rule which minimizes the average risk $R(d)$ over the class of all possible decision rules.
11 Minimizing the Average Risk R(d)
- Optimizing the total risk $R(d)$ seems hard because we are trying to minimize it over a family of functions (decision rules), $d$.
- However, since
\[ R(d) = E\{ R(x, d(x)) \} = \int R(x, d(x))\, p_X(x)\, dx, \qquad p_X(x) \ge 0, \]
one can equivalently minimize the data-conditional risk $R(x, d(x))$ point-wise in $x$. I.e. solve for the value of the optimal decision rule at each $x$:
\[ d^*(x) = \arg\min_{d(x)} R(x, d(x)) = \arg\min_i R(x, i) \]
- Thus $d^*(x) = i^*(x)$! I.e. the BDR, which we already know optimizes the data-conditional risk, ALSO optimizes the average risk $R(d)$ over ALL possible decision rules $d$.
- This makes sense: if the BDR is optimal for every single situation, $x$, it must be optimal on the average over all $x$.
12 The 0/1 Loss Function
- An important special case of interest: zero loss for no error and equal loss for the two error types.
- This is equivalent to the "zero/one" loss:
\[ L[i \to j] = \begin{cases} 0, & i = j \\ 1, & i \ne j \end{cases} \]

snake prediction    dart frog    regular frog
regular             1            0
dart                0            1

- Under this loss the optimal Bayes decision rule (BDR) is
\[ d^*(x) = i^*(x) = \arg\min_i \sum_j L[j \to i]\, P_{Y|X}(j|x) = \arg\min_i \sum_{j \ne i} P_{Y|X}(j|x) \]
13 0/1 Loss yields MAP Decision Rule
- Note that:
\[ i^*(x) = \arg\min_i \sum_{j \ne i} P_{Y|X}(j|x) = \arg\min_i \left[ 1 - P_{Y|X}(i|x) \right] = \arg\max_i P_{Y|X}(i|x) \]
- Thus the Optimal Decision for the 0/1 loss is: pick the class that is most probable given the observation $x$.
- $i^*(x)$ is known as the Maximum a Posteriori Probability (MAP) solution.
- This is also known as the Bayes Decision Rule (BDR) for the 0/1 loss.
- We will often simplify our discussion by assuming this loss.
- But you should always be aware that other losses may be used.
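The equivalence between minimizing the 0/1 risk and maximizing the posterior is easy to check numerically. The sketch below does so for one made-up posterior; the function names and the numbers are assumptions chosen only for illustration.

```python
def zero_one_risk(posterior, decision):
    """Risk under 0/1 loss: total posterior mass of all classes other than `decision`."""
    return sum(p for label, p in posterior.items() if label != decision)

def min_risk_decision(posterior):
    """argmin_i sum_{j != i} P(j | x)."""
    return min(posterior, key=lambda i: zero_one_risk(posterior, i))

def map_decision(posterior):
    """argmax_i P(i | x)."""
    return max(posterior, key=posterior.get)

# Hypothetical posterior P(i | x) over three classes.
posterior = {"a": 0.2, "b": 0.5, "c": 0.3}

print(min_risk_decision(posterior), map_decision(posterior))
```

Both functions return the same class because the risk of deciding $i$ is exactly $1 - P_{Y|X}(i|x)$.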
14 BDR for the 0/1 Loss
- Consider the evaluation of the BDR for the 0/1 loss:
\[ i^*(x) = \arg\max_i P_{Y|X}(i|x) \]
- This is also called the Maximum a Posteriori Probability (MAP) rule.
- It is usually not trivial to evaluate the posterior probabilities $P_{Y|X}(i|x)$.
- This is due to the fact that we are trying to infer the cause (class $i$) from the consequence (observation $x$), i.e. we are trying to solve a nontrivial inverse problem.
- E.g. imagine that I want to evaluate $P_{Y|X}(\text{person} \mid \text{"has two eyes"})$.
- This strongly depends on what the other classes are.
15 Posterior Probabilities and Detection
- If the two classes are "people" and "cars", then $P_{Y|X}(\text{person} \mid \text{"has two eyes"}) = 1$.
- But if the classes are "people" and "cats", then $P_{Y|X}(\text{person} \mid \text{"has two eyes"}) = 1/2$ if there are equal numbers of cats and people to uniformly choose from [this is additional info!].
- How do we deal with this problem?
- We note that it is much easier to infer consequence from cause.
- E.g., it is easy to infer that $P_{X|Y}(\text{"has two eyes"} \mid \text{person}) = 1$.
- This does not depend on any other classes.
- We do not need any additional information.
- Given a class, just count the frequency of the observation.
16 Bayes Rule
- How do we go from $P_{X|Y}(x|j)$ to $P_{Y|X}(j|x)$?
- We use Bayes rule:
\[ P_{Y|X}(i|x) = \frac{P_{X|Y}(x|i)\, P_Y(i)}{P_X(x)} \]
- Consider the two-class problem, i.e. $Y = 0$ or $Y = 1$. The BDR under 0/1 loss is
\[ i^*(x) = \arg\max_i P_{Y|X}(i|x) = \begin{cases} 0, & \text{if } P_{Y|X}(0|x) \ge P_{Y|X}(1|x) \\ 1, & \text{if } P_{Y|X}(0|x) < P_{Y|X}(1|x) \end{cases} \]
17 BDR for 0/1 Loss Binary Classification
- Pick "0" when $P_{Y|X}(0|x) \ge P_{Y|X}(1|x)$ and "1" otherwise.
- Using Bayes rule on both sides of this inequality yields
\[ P_{Y|X}(0|x) \ge P_{Y|X}(1|x) \iff \frac{P_{X|Y}(x|0)\, P_Y(0)}{P_X(x)} \ge \frac{P_{X|Y}(x|1)\, P_Y(1)}{P_X(x)} \]
- Noting that $P_X(x)$ is a non-negative quantity, this is the same as the rule: pick "0" when
\[ P_{X|Y}(x|0)\, P_Y(0) \ge P_{X|Y}(x|1)\, P_Y(1), \]
i.e.
\[ i^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i) \]
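18 The Log Trick
- Sometimes it is not convenient to work directly with pdfs.
- One helpful trick is to take logs.
- Note that the log is a monotonically increasing function:
\[ a > b \iff \log a > \log b \]
[Figure: plot of the log function, marking $b < a$ on the horizontal axis and $\log b < \log a$ on the vertical axis]
- From this we have
\[ i^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i) = \arg\max_i \log\left[ P_{X|Y}(x|i)\, P_Y(i) \right] = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right] = \arg\min_i \left[ -\log P_{X|Y}(x|i) - \log P_Y(i) \right] \]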
19 Standard (0/1) BDR
- In summary, for the zero/one loss the following three decision rules are optimal and equivalent:
  1) $i^*(x) = \arg\max_i P_{Y|X}(i|x)$
  2) $i^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i)$
  3) $i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]$
- Form 1) is usually the hardest to use; 3) is frequently easier than 2).
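The three equivalent forms of the 0/1-loss BDR can be compared directly once likelihood values and priors are available. The sketch below assumes they are given as two lists indexed by class and that all values are strictly positive (so the logs exist); the function names and the numbers are hypothetical.

```python
import math

def bdr_posterior(likelihoods, priors):
    """Form 1: argmax_i P(i | x), with the posterior computed via Bayes rule."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
    return max(range(len(priors)), key=lambda i: posteriors[i])

def bdr_joint(likelihoods, priors):
    """Form 2: argmax_i P(x | i) P(i)."""
    return max(range(len(priors)), key=lambda i: likelihoods[i] * priors[i])

def bdr_log(likelihoods, priors):
    """Form 3: argmax_i [log P(x | i) + log P(i)]."""
    return max(range(len(priors)),
               key=lambda i: math.log(likelihoods[i]) + math.log(priors[i]))

# Hypothetical values of P(x | i) at one observation x, and priors P(i).
likelihoods = [0.02, 0.05]
priors = [0.8, 0.2]

print(bdr_posterior(likelihoods, priors),
      bdr_joint(likelihoods, priors),
      bdr_log(likelihoods, priors))
```

Here class 1 has the larger likelihood, but the prior tips the decision to class 0 in all three forms, since $0.02 \times 0.8 > 0.05 \times 0.2$.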
20 (Standard 0/1-Loss) BDR - Example
- So far the BDR is an abstract rule. How does one implement the optimal decision in practice?
- In addition to having a loss function, you need to know, model, or estimate the probabilities!
- Example:
  - Suppose that you run a gas station.
  - On Mondays you have a promotion to sell more gas.
  - Q: is the promotion working? I.e., is $Y = 0$ (no) or $Y = 1$ (yes)?
  - A good observation to answer this question is the interarrival time ($t$) between cars:
    - high $t$: not working ($Y = 0$)
    - low $t$: working well ($Y = 1$)
21 BDR - Example
- What are the class-conditional and prior probabilities?
- Model the probability of arrival of a car by an Exponential density (a standard pdf to use).
- Continuous-valued interarrival times are assumed to be exponentially distributed. Hence
\[ P_{X|Y}(t|i) = \lambda_i e^{-\lambda_i t}, \]
where $\lambda_i$ is the arrival rate (cars/s).
- The expected value of the interarrival time is
\[ E_{X|Y}[x \mid y = i] = \frac{1}{\lambda_i} \]
- Consecutive times are assumed to be independent:
\[ P_{X_1, \ldots, X_n | Y}(t_1, \ldots, t_n | i) = \prod_{k=1}^{n} P_{X|Y}(t_k|i) = \prod_{k=1}^{n} \lambda_i e^{-\lambda_i t_k} \]
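22 BDR - Example
- Let's assume that we know $\lambda_i$ and the (prior) class probabilities $P_Y(i) = \pi_i$, $i = 0, 1$.
- We have measured a collection of times during the day, $D = \{t_1, \ldots, t_n\}$.
- The probabilities are of exponential form. Therefore it is easier to use the log-based BDR:
\[ i^*(D) = \arg\max_i \left[ \log P_{X|Y}(D|i) + \log P_Y(i) \right] \]
\[ = \arg\max_i \left[ \sum_{k=1}^{n} \log\left( \lambda_i e^{-\lambda_i t_k} \right) + \log \pi_i \right] \]
\[ = \arg\max_i \left[ \sum_{k=1}^{n} \left( -\lambda_i t_k + \log \lambda_i \right) + \log \pi_i \right] \]
\[ = \arg\max_i \left[ -\lambda_i \sum_{k=1}^{n} t_k + n \log \lambda_i + \log \pi_i \right] \]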
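23 BDR - Example
- This means we pick "0" when
\[ -\lambda_0 \sum_{k=1}^{n} t_k + n \log \lambda_0 + \log \pi_0 > -\lambda_1 \sum_{k=1}^{n} t_k + n \log \lambda_1 + \log \pi_1, \]
or
\[ (\lambda_1 - \lambda_0) \sum_{k=1}^{n} t_k > n \log \frac{\lambda_1}{\lambda_0} + \log \frac{\pi_1}{\pi_0}, \]
or (reasonably taking $\lambda_1 > \lambda_0$)
\[ \frac{1}{n} \sum_{k=1}^{n} t_k > \frac{1}{\lambda_1 - \lambda_0} \left[ \log \frac{\lambda_1}{\lambda_0} + \frac{1}{n} \log \frac{\pi_1}{\pi_0} \right], \]
and "1" otherwise.
- Does this decision rule make sense?
- Let's assume, for simplicity, that $\pi_0 = \pi_1 = 1/2$.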
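24 BDR - Example
- For $\pi_0 = \pi_1 = 1/2$, we pick "promotion did not work" ($Y = 0$) if
\[ \bar{t} = \frac{1}{n} \sum_{k=1}^{n} t_k > \frac{1}{\lambda_1 - \lambda_0} \log \frac{\lambda_1}{\lambda_0} \]
- The left-hand side is the (sample) average interarrival time for the day.
- This means that there is an optimal choice of a threshold
\[ T = \frac{1}{\lambda_1 - \lambda_0} \log \frac{\lambda_1}{\lambda_0} \]
above which we say "promotion did not work". This makes sense!
- What is the shape of this threshold? Assuming $\lambda_0 = 1$, it looks like this.
[Figure: plot of the threshold $T$ versus $\lambda_1$]
- The higher the $\lambda_1$, the more likely we are to say "promotion did not work".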
25 BDR - Example
- When $\pi_0 = \pi_1 = 1/2$, we pick "did not work" ($Y = 0$) when
\[ \bar{t} = \frac{1}{n} \sum_{k=1}^{n} t_k > T, \qquad T = \frac{1}{\lambda_1 - \lambda_0} \log \frac{\lambda_1}{\lambda_0} \]
[Figure: plot of the threshold $T$ versus $\lambda_1$]
- Assuming $\lambda_0 = 1$, $T$ decreases with $\lambda_1$.
- I.e. for a given daily average, a larger $\lambda_1$ makes it easier to say "did not work".
- This means that as the expected rate of arrival for good days increases, we are going to impose a tougher standard on the average measured interarrival times.
- The average has to be smaller for us to accept the day as a good one.
- Once again, this makes sense!
- A sensible answer is usually the case with the BDR (a good way to check your math).
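The gas-station rule can be sketched as code in two ways: by thresholding the sample mean, and by comparing the two log-scores $-\lambda_i \sum_k t_k + n \log \lambda_i + \log \pi_i$ directly. The function names, the rates, and the interarrival times below are all hypothetical; the threshold form assumes $\lambda_1 > \lambda_0 > 0$.

```python
import math

def threshold(lam0, lam1, pi0=0.5, pi1=0.5, n=1):
    """Threshold on the mean interarrival time, assuming lam1 > lam0 > 0.

    With equal priors the prior term vanishes and T = log(lam1 / lam0) / (lam1 - lam0).
    """
    return (math.log(lam1 / lam0) + math.log(pi1 / pi0) / n) / (lam1 - lam0)

def promotion_worked(times, lam0, lam1, pi0=0.5, pi1=0.5):
    """Return 0 (did not work) if the sample mean exceeds the threshold, else 1."""
    n = len(times)
    mean_t = sum(times) / n
    return 0 if mean_t > threshold(lam0, lam1, pi0, pi1, n) else 1

def promotion_worked_direct(times, lam0, lam1, pi0=0.5, pi1=0.5):
    """Same decision, obtained by comparing the two log-scores directly."""
    n, total = len(times), sum(times)
    score0 = -lam0 * total + n * math.log(lam0) + math.log(pi0)
    score1 = -lam1 * total + n * math.log(lam1) + math.log(pi1)
    return 0 if score0 > score1 else 1

# Hypothetical rates (cars per unit time) and two days of interarrival times.
lam0, lam1 = 1.0, 2.0
busy_day = [0.3, 0.5, 0.4]
slow_day = [0.9, 1.1, 0.8]

print(threshold(lam0, lam1))
print(promotion_worked(busy_day, lam0, lam1), promotion_worked(slow_day, lam0, lam1))
```

With $\lambda_0 = 1$ and $\lambda_1 = 2$ the threshold is $\log 2$, so a day with mean interarrival time 0.4 counts as a good day and a day with mean above 0.9 does not.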
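26 The Gaussian Classifier
- One important case is that of Multivariate Gaussian Classes.
- The pdf of class $i$ is a Gaussian of mean $\mu_i$ and covariance $\Sigma_i$:
\[ P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left\{ -\frac{1}{2} (x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i) \right\} \]
- The BDR is
\[ i^*(x) = \arg\max_i \left[ -\frac{1}{2} (x - \mu_i)^\top \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \log\left( (2\pi)^d |\Sigma_i| \right) + \log P_Y(i) \right] \]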
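27 Implementation of a Gaussian Classifier
- To design a Gaussian classifier (e.g. homework):
- Start from a collection of datasets, where the $i$-th class dataset $D^{(i)} = \{x_1^{(i)}, \ldots, x_{n^{(i)}}^{(i)}\}$ is a set of $n^{(i)}$ examples from class $i$.
- For each class estimate the Gaussian parameters:
\[ \hat{\mu}_i = \frac{1}{n^{(i)}} \sum_j x_j^{(i)}, \qquad \hat{\Sigma}_i = \frac{1}{n^{(i)}} \sum_j \left( x_j^{(i)} - \hat{\mu}_i \right) \left( x_j^{(i)} - \hat{\mu}_i \right)^\top, \qquad \hat{P}_Y(i) = \frac{n^{(i)}}{T}, \]
where $T = \sum_{k=1}^{c} n^{(k)}$ is the total number of examples over all $c$ classes.
- Via the "plug-in" rule, the BDR is approximated as
\[ i^*(x) = \arg\max_i \left[ -\frac{1}{2} (x - \hat{\mu}_i)^\top \hat{\Sigma}_i^{-1} (x - \hat{\mu}_i) - \frac{1}{2} \log\left( (2\pi)^d |\hat{\Sigma}_i| \right) + \log \hat{P}_Y(i) \right] \]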
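The plug-in recipe above can be sketched without any libraries if we restrict ourselves to $d = 2$, where the covariance inverse and determinant have closed forms. That restriction, the function names, and the toy data are assumptions made for this illustration; the estimates use the $1/n^{(i)}$ normalization from the slide and require non-singular covariances.

```python
import math

def fit_gaussian_2d(samples):
    """Estimate (mean, covariance) from a list of 2-D points, normalizing by n."""
    n = len(samples)
    mx = sum(p[0] for p in samples) / n
    my = sum(p[1] for p in samples) / n
    sxx = sum((p[0] - mx) ** 2 for p in samples) / n
    syy = sum((p[1] - my) ** 2 for p in samples) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in samples) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def log_score(x, mean, cov, prior):
    """-1/2 Mahalanobis^2 - 1/2 log((2 pi)^2 |cov|) + log prior, for d = 2."""
    (a, b), (_, c) = cov
    det = a * c - b * b  # must be > 0 (non-singular covariance)
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a symmetric 2x2 matrix.
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * maha - 0.5 * math.log((2 * math.pi) ** 2 * det) + math.log(prior)

def train(datasets):
    """Map each class label to (mean, covariance, prior) estimated from its points."""
    total = sum(len(pts) for pts in datasets.values())
    return {label: fit_gaussian_2d(pts) + (len(pts) / total,)
            for label, pts in datasets.items()}

def classify(x, params):
    """Plug-in BDR: the class with the largest log-score."""
    return max(params, key=lambda label: log_score(x, *params[label]))

# Hypothetical toy data: four points per class.
datasets = {
    0: [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    1: [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
params = train(datasets)
print(classify((0.2, 0.4), params), classify((4.8, 4.1), params))
```

For higher dimensions the same structure applies, with a linear-algebra routine replacing the closed-form 2x2 inverse and determinant.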
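28 Gaussian Classifier
- The Gaussian Classifier can be written as
\[ i^*(x) = \arg\min_i \left[ d_i^2(x, \mu_i) + \alpha_i \right] \]
with
\[ d_i^2(x, y) = (x - y)^\top \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log\left( (2\pi)^d |\Sigma_i| \right) - 2 \log P_Y(i), \]
and can be seen as a nearest "class-neighbor" classifier with a "funny metric".
[Figure: two Gaussian classes with the decision boundary, where the class posterior equals 0.5]
- Each class has its own distance measure: take the squared Mahalanobis distance for that class, then add the constant $\alpha_i$.
- We effectively have different metrics in the data (feature) space that are class dependent.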
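29 Gaussian Classifier
- A special case of interest is when all classes have the same covariance, $\Sigma_i = \Sigma$:
\[ i^*(x) = \arg\min_i \left[ d^2(x, \mu_i) + \alpha_i \right] \]
with
\[ d^2(x, y) = (x - y)^\top \Sigma^{-1} (x - y), \qquad \alpha_i = -2 \log P_Y(i) \]
[Figure: two Gaussian classes with the decision boundary, where the class posterior equals 0.5]
- Note that:
  - $\alpha_i$ can be dropped when all classes have equal prior probability.
  - This is reminiscent of the NN classifier with Mahalanobis distance.
  - Instead of finding the nearest data point neighbor of $x$, it looks for the nearest class "prototype" (or "archetype", or "exemplar", or "template", or "representative", or "ideal", or "form"), defined as the class mean $\mu_i$.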
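30 Binary Classifier Special Case
- Consider $\Sigma_i = \Sigma$ with two classes.
- One important property of this case is that the decision boundary is a hyperplane (Homework).
- This can be shown by computing the set of points $x$ such that
\[ d^2(x, \mu_0) + \alpha_0 = d^2(x, \mu_1) + \alpha_1 \]
and showing that they satisfy
\[ w^\top (x - x_0) = 0 \]
- This is the equation of a hyperplane with normal $w$. $x_0$ can be any fixed point on the hyperplane, but it is standard to choose it to have minimum norm, in which case $w$ and $x_0$ are then parallel.
[Figure: hyperplane through $x_0$ with normal $w$, drawn in a space with axes $x_1$, $x_2$, $x_3$; the boundary is where the class posterior equals 0.5]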
31 Gaussian M-ary Classifier Special Case
- If all the class covariances are the identity, $\Sigma_i = I$, then
\[ i^*(x) = \arg\min_i \left[ d^2(x, \mu_i) + \alpha_i \right] \]
with
\[ d^2(x, y) = \| x - y \|^2, \qquad \alpha_i = -2 \log P_Y(i) \]
- This is called (simple, Cartesian) template matching, with class means as templates.
- E.g. for digit classification. [Figure omitted]
- Compare the complexity of this classifier to NN Classifiers!
32 The Sigmoid Function
- We have derived much of the above from the log-based BDR
\[ i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right] \]
- When there are only two classes, $i = 0, 1$, it is also interesting to manipulate the original definition as follows:
\[ i^*(x) = \arg\max_i g_i(x), \]
where
\[ g_i(x) = P_{Y|X}(i|x) = \frac{P_{X|Y}(x|i)\, P_Y(i)}{P_X(x)} = \frac{P_{X|Y}(x|i)\, P_Y(i)}{P_{X|Y}(x|0)\, P_Y(0) + P_{X|Y}(x|1)\, P_Y(1)} \]
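33 The Sigmoid Function
- Note that this can be written as
\[ i^*(x) = \arg\max_i g_i(x), \qquad g_1(x) = 1 - g_0(x), \qquad g_0(x) = \frac{1}{1 + \dfrac{P_{X|Y}(x|1)\, P_Y(1)}{P_{X|Y}(x|0)\, P_Y(0)}} \]
- For Gaussian classes, the posterior probabilities are
\[ g_0(x) = \frac{1}{1 + \exp\left\{ \frac{1}{2} \left[ d_0^2(x, \mu_0) - d_1^2(x, \mu_1) + \alpha_0 - \alpha_1 \right] \right\}} \]
where, as before,
\[ d_i^2(x, y) = (x - y)^\top \Sigma_i^{-1} (x - y), \qquad \alpha_i = \log\left( (2\pi)^d |\Sigma_i| \right) - 2 \log P_Y(i) \]
- The factor $\frac{1}{2}$ follows from $-2 \log\left[ P_{X|Y}(x|i)\, P_Y(i) \right] = d_i^2(x, \mu_i) + \alpha_i$.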
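34 The Sigmoid ("S-shaped") Function
- The posterior probability for class $i = 0$,
\[ g_0(x) = \frac{1}{1 + \exp\left\{ \frac{1}{2} \left[ d_0^2(x, \mu_0) - d_1^2(x, \mu_1) + \alpha_0 - \alpha_1 \right] \right\}}, \]
is a sigmoid and looks like this.
[Figure: plot of the sigmoid]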
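As a sanity check on the sigmoid form, the sketch below compares it with a direct Bayes-rule computation for two univariate Gaussian classes. The one-dimensional setting, the function names, and the parameter values are assumptions chosen for illustration; the exponent uses the factor $\frac{1}{2}$ implied by $-2 \log\left[ P_{X|Y}(x|i) P_Y(i) \right] = d_i^2 + \alpha_i$.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posterior0_bayes(x, mu0, mu1, var0, var1, p0):
    """g_0(x) = P(0 | x), computed directly from Bayes rule."""
    a = gaussian_pdf(x, mu0, var0) * p0
    b = gaussian_pdf(x, mu1, var1) * (1 - p0)
    return a / (a + b)

def posterior0_sigmoid(x, mu0, mu1, var0, var1, p0):
    """g_0(x) in sigmoid form: 1 / (1 + exp{(d0^2 - d1^2 + alpha0 - alpha1) / 2})."""
    d0 = (x - mu0) ** 2 / var0  # squared Mahalanobis distance, d = 1
    d1 = (x - mu1) ** 2 / var1
    alpha0 = math.log(2 * math.pi * var0) - 2 * math.log(p0)
    alpha1 = math.log(2 * math.pi * var1) - 2 * math.log(1 - p0)
    return 1.0 / (1.0 + math.exp(0.5 * (d0 - d1 + alpha0 - alpha1)))

# Hypothetical parameters: equal variances and equal priors.
mu0, mu1, var, p0 = 0.0, 2.0, 1.0, 0.5
for x in (-1.0, 0.5, 1.0, 3.0):
    print(x, posterior0_bayes(x, mu0, mu1, var, var, p0),
          posterior0_sigmoid(x, mu0, mu1, var, var, p0))
```

With equal variances and equal priors the exponent is linear in $x$ and the posterior crosses 0.5 at the midpoint between the two means.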
35 The Sigmoid Function in Neural Nets
- The sigmoid appears in neural networks, where it can be interpreted as a posterior probability for a Gaussian binary classification problem when the covariances are the same.
36 The Sigmoid Function in Neural Nets
- But not necessarily when the covariances are different.
37 END
7. Jot Dstrbutos Chrs Pech ad Mehra Saham Ma 2017 Ofte ou wll work o problems where there are several radom varables (ofte teractg wth oe aother. We are gog to start to formall look at how those teractos
More information= 2. Statistic - function that doesn't depend on any of the known parameters; examples:
of Samplg Theory amples - uemploymet househol cosumpto survey Raom sample - set of rv's... ; 's have ot strbuto [ ] f f s vector of parameters e.g. Statstc - fucto that oes't epe o ay of the ow parameters;
More information9.1 Introduction to the probit and logit models
EC3000 Ecoometrcs Lecture 9 Probt & Logt Aalss 9. Itroducto to the probt ad logt models 9. The logt model 9.3 The probt model Appedx 9. Itroducto to the probt ad logt models These models are used regressos
More informationIdeal multigrades with trigonometric coefficients
Ideal multgrades wth trgoometrc coeffcets Zarathustra Brady December 13, 010 1 The problem A (, k) multgrade s defed as a par of dstct sets of tegers such that (a 1,..., a ; b 1,..., b ) a j = =1 for all
More informationThe number of observed cases The number of parameters. ith case of the dichotomous dependent variable. the ith case of the jth parameter
LOGISTIC REGRESSION Notato Model Logstc regresso regresses a dchotomous depedet varable o a set of depedet varables. Several methods are mplemeted for selectg the depedet varables. The followg otato s
More informationMidterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes
coometrcs, CON Sa Fracsco State Uversty Mchael Bar Sprg 5 Mdterm am, secto Soluto Thursday, February 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes eam.. No calculators of ay kd are allowed..
More informationIdea is to sample from a different distribution that picks points in important regions of the sample space. Want ( ) ( ) ( ) E f X = f x g x dx
Importace Samplg Used for a umber of purposes: Varace reducto Allows for dffcult dstrbutos to be sampled from. Sestvty aalyss Reusg samples to reduce computatoal burde. Idea s to sample from a dfferet
More informationMultiple Choice Test. Chapter Adequacy of Models for Regression
Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to
More informationAssignment 5/MATH 247/Winter Due: Friday, February 19 in class (!) (answers will be posted right after class)
Assgmet 5/MATH 7/Wter 00 Due: Frday, February 9 class (!) (aswers wll be posted rght after class) As usual, there are peces of text, before the questos [], [], themselves. Recall: For the quadratc form
More informationContinuous Distributions
7//3 Cotuous Dstrbutos Radom Varables of the Cotuous Type Desty Curve Percet Desty fucto, f (x) A smooth curve that ft the dstrbuto 3 4 5 6 7 8 9 Test scores Desty Curve Percet Probablty Desty Fucto, f
More informationPTAS for Bin-Packing
CS 663: Patter Matchg Algorthms Scrbe: Che Jag /9/00. Itroducto PTAS for B-Packg The B-Packg problem s NP-hard. If we use approxmato algorthms, the B-Packg problem could be solved polyomal tme. For example,
More informationPoint Estimation: definition of estimators
Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.
More informationCHAPTER 2. = y ˆ β x (.1022) So we can write
CHAPTER SOLUTIONS TO PROBLEMS. () Let y = GPA, x = ACT, ad = 8. The x = 5.875, y = 3.5, (x x )(y y ) = 5.85, ad (x x ) = 56.875. From equato (.9), we obta the slope as ˆβ = = 5.85/56.875., rouded to four
More informationTHE ROYAL STATISTICAL SOCIETY 2010 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA MODULE 2 STATISTICAL INFERENCE
THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA MODULE STATISTICAL INFERENCE The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the
More informationThe equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.
INTRODUCTORY NOTE ON LINEAR REGREION We have data of the form (x y ) (x y ) (x y ) These wll most ofte be preseted to us as two colum of a spreadsheet As the topc develops we wll see both upper case ad
More informationBounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy
Bouds o the expected etropy ad KL-dvergece of sampled multomal dstrbutos Brado C. Roy bcroy@meda.mt.edu Orgal: May 18, 2011 Revsed: Jue 6, 2011 Abstract Iformato theoretc quattes calculated from a sampled
More information2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen.
.5 x 54.5 a. x 7. 786 7 b. The raked observatos are: 7.4, 7.5, 7.7, 7.8, 7.9, 8.0, 8.. Sce the sample sze 7 s odd, the meda s the (+)/ 4 th raked observato, or meda 7.8 c. The cosumer would more lkely
More informationOrdinary Least Squares Regression. Simple Regression. Algebra and Assumptions.
Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos
More informationKernel-based Methods and Support Vector Machines
Kerel-based Methods ad Support Vector Maches Larr Holder CptS 570 Mache Learg School of Electrcal Egeerg ad Computer Scece Washgto State Uverst Refereces Muller et al. A Itroducto to Kerel-Based Learg
More informationNaïve Bayes MIT Course Notes Cynthia Rudin
Thaks to Şeyda Ertek Credt: Ng, Mtchell Naïve Bayes MIT 5.097 Course Notes Cytha Rud The Naïve Bayes algorthm comes from a geeratve model. There s a mportat dstcto betwee geeratve ad dscrmatve models.
More information( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model
Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch
More informationLaboratory I.10 It All Adds Up
Laboratory I. It All Adds Up Goals The studet wll work wth Rema sums ad evaluate them usg Derve. The studet wll see applcatos of tegrals as accumulatos of chages. The studet wll revew curve fttg sklls.
More informationCHAPTER 3 POSTERIOR DISTRIBUTIONS
CHAPTER 3 POSTERIOR DISTRIBUTIONS If scece caot measure the degree of probablt volved, so much the worse for scece. The practcal ma wll stck to hs apprecatve methods utl t does, or wll accept the results
More information1 Onto functions and bijections Applications to Counting
1 Oto fuctos ad bectos Applcatos to Coutg Now we move o to a ew topc. Defto 1.1 (Surecto. A fucto f : A B s sad to be surectve or oto f for each b B there s some a A so that f(a B. What are examples of
More information8.1 Hashing Algorithms
CS787: Advaced Algorthms Scrbe: Mayak Maheshwar, Chrs Hrchs Lecturer: Shuch Chawla Topc: Hashg ad NP-Completeess Date: September 21 2007 Prevously we looked at applcatos of radomzed algorthms, ad bega
More information