Lecture 19 of 42. MAP and MLE continued, Minimum Description Length (MDL)
Lecture 19 of 42: MAP and MLE continued, Minimum Description Length (MDL)
Wednesday, 28 February 2007
William H. Hsu, KSU
Readings for next class: Chapter 5, Mitchell

Lecture Outline
- Read Sections, Mitchell
- Overview of Bayesian Learning
  - Framework: using probabilistic criteria to generate hypotheses of all kinds
  - Probability: foundations
- Bayes's Theorem
  - Definition of conditional (posterior) probability
  - Ramifications of Bayes's Theorem
  - Answering probabilistic queries; MAP hypotheses
- Generating Maximum A Posteriori (MAP) Hypotheses
- Generating Maximum Likelihood (ML) Hypotheses
- Next Week: Sections, Mitchell; Roth; Pearl and Verma
  - More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
  - Learning over text
Bayes's Theorem: MAP Hypothesis

Choosing Hypotheses
- Generally want the most probable hypothesis given the training data
- Define: arg max over x in the sample space Ω of f(x), the value of x with the highest f(x)
- Maximum a posteriori hypothesis, h_MAP:
  h_MAP = arg max_{h in H} P(h | D)
        = arg max_{h in H} P(D | h) P(h) / P(D)
        = arg max_{h in H} P(D | h) P(h)      (P(D) is constant with respect to h)

ML Hypothesis
- Assume P(h_i) = P(h_j) for all pairs i, j (uniform priors, i.e., H ~ Uniform)
- Can further simplify and choose the maximum likelihood hypothesis, h_ML:
  h_ML = arg max_{h_i in H} P(D | h_i)

Bayes's Theorem: Query Answering (QA)
- Answering User Queries
  - Suppose we want to perform intelligent inferences over a database DB
  - Scenario 1: DB contains records (instances), some labeled with answers
  - Scenario 2: DB contains probabilities (annotations) over propositions
  - QA: an application of probabilistic inference
- QA Using Prior and Conditional Probabilities: Example
  - Query: does the patient have cancer or not?
  - Suppose: the patient takes a lab test and the result comes back positive
    - Correct + result in only 98% of the cases in which the disease is actually present
    - Correct - result in only 97% of the cases in which the disease is not present
    - Only 0.008 of the entire population has this cancer
  - α (false negative for H_0 ≡ Cancer) = 0.02 (NB: for 1-point sample)
  - β (false positive for H_0 ≡ Cancer) = 0.03 (NB: for 1-point sample)
  - P(Cancer) = 0.008      P(+ | Cancer) = 0.98      P(- | Cancer) = 0.02
  - P(¬Cancer) = 0.992     P(+ | ¬Cancer) = 0.03     P(- | ¬Cancer) = 0.97
  - P(+ | H_0) P(H_0) = 0.0078, P(+ | H_A) P(H_A) = 0.0298, so h_MAP = H_A ≡ ¬Cancer
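The lab-test arithmetic above can be checked directly. A minimal sketch in Python (the probability values are the ones from the slide; the variable names are mine):

```python
# Mitchell's lab-test example: compare P(+ | h) P(h) for the two hypotheses.
p_cancer = 0.008                      # prior P(Cancer)
p_pos_given_cancer = 0.98             # test sensitivity
p_pos_given_no_cancer = 0.03          # false-positive rate

# Unnormalized posteriors for a single positive test result
score_cancer = p_pos_given_cancer * p_cancer              # P(+ | H0) P(H0)
score_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)  # P(+ | HA) P(HA)

h_map = "Cancer" if score_cancer > score_no_cancer else "NoCancer"

# Normalizing by P(+) gives the actual posterior probability of cancer
p_cancer_given_pos = score_cancer / (score_cancer + score_no_cancer)
```

Note that even though the unnormalized scores decide the MAP hypothesis, dividing by their sum recovers the true posterior, which is only about 0.21 despite the positive test.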
Basic Formulas for Probabilities

- Product Rule (alternative statement of Bayes's Theorem):
  P(A ∧ B) = P(A | B) P(B)
  - Proof: requires axiomatic set theory, as does Bayes's Theorem
- Sum Rule:
  P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
  - Sketch of proof (immediate from axiomatic set theory): draw a Venn diagram of two sets denoting events A and B; let A ∨ B denote the event corresponding to A ∪ B
- Theorem of Total Probability
  - Suppose events A_1, A_2, ..., A_n are mutually exclusive and exhaustive
    - Mutually exclusive: i ≠ j implies A_i ∩ A_j = ∅
    - Exhaustive: Σ_i P(A_i) = 1
  - Then P(B) = Σ_{i=1..n} P(B | A_i) P(A_i)
  - Proof: follows from the product rule and the 3rd Kolmogorov axiom

MAP and ML Hypotheses: A Pattern Recognition Framework
- Pattern Recognition Framework
  - Automated speech recognition (ASR), automated image recognition
  - Diagnosis
- Forward Problem: One Step in ML Estimation
  - Given: model h, observations (data) D
  - Estimate: P(D | h), the probability that the model generated the data
- Backward Problem: Pattern Recognition / Prediction Step
  - Given: model h, observations D
  - Maximize: P(h(X) = x | h, D) for a new X (i.e., find the best x)
- Forward-Backward (Learning) Problem
  - Given: model space H, data D
  - Find: h in H such that P(h | D) is maximized (i.e., the MAP hypothesis)
- More Info: HiddenMarkovModels.html
  - Emphasis on a particular H (the space of hidden Markov models)
Bayesian Learning Example: Unbiased Coin [1]
- Coin Flip
  - Sample space: Ω = {Head, Tail}
  - Scenario: the given coin is either fair or has a 60% bias in favor of Head
    - h_1 ≡ fair coin: P(Head) = 0.5
    - h_2 ≡ 60% bias towards Head: P(Head) = 0.6
  - Objective: to decide between the default (null) and alternative hypotheses
- A Priori (aka Prior) Distribution on H
  - P(h_1) = 0.75, P(h_2) = 0.25
  - Reflects the learning agent's prior beliefs regarding H
  - Learning is revision of the agent's beliefs
- Collection of Evidence
  - First piece of evidence: d ≡ a single coin toss, comes up Head
  - Q: What does the agent believe now?
  - A: Compute P(d) = P(d | h_1) P(h_1) + P(d | h_2) P(h_2)

Bayesian Learning Example: Unbiased Coin [2]
- Bayesian Inference: compute P(d) = P(d | h_1) P(h_1) + P(d | h_2) P(h_2)
  - P(Head) = 0.5 · 0.75 + 0.6 · 0.25 = 0.375 + 0.15 = 0.525
  - This is the probability of the observation d = Head
- Bayesian Learning: now apply Bayes's Theorem
  - P(h_1 | d) = P(d | h_1) P(h_1) / P(d) = 0.375 / 0.525 ≈ 0.714
  - P(h_2 | d) = P(d | h_2) P(h_2) / P(d) = 0.15 / 0.525 ≈ 0.286
  - Belief has been revised downwards for h_1, upwards for h_2
  - The agent still thinks that the fair coin is the more likely hypothesis
- Suppose we were to use the ML approach (i.e., assume equal priors)
  - Beliefs are revised from 0.5; the data then supports the biased coin better
- More Evidence: a sequence D of 100 coin flips with 70 heads and 30 tails
  - P(D | h_1) = (0.5)^70 (0.5)^30, P(D | h_2) = (0.6)^70 (0.4)^30
  - Now P(h_1 | D) << P(h_2 | D)
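The belief-revision steps above can be reproduced in a few lines. A sketch in Python, using logs for the 100-toss sequence to avoid underflow (variable names are mine):

```python
import math

# Priors over the two hypotheses: fair coin vs. 60%-heads coin
priors = {"h1": 0.75, "h2": 0.25}
p_head = {"h1": 0.5, "h2": 0.6}

# Single observation d = Head: total probability, then Bayes's Theorem
p_d = sum(p_head[h] * priors[h] for h in priors)            # P(d) = 0.525
posterior = {h: p_head[h] * priors[h] / p_d for h in priors}

# Longer sequence: 70 heads and 30 tails; compare log-likelihoods
def log_lik(h, heads=70, tails=30):
    return heads * math.log(p_head[h]) + tails * math.log(1 - p_head[h])
```

With one Head, the posterior on the fair coin drops from 0.75 to about 0.714; after 70 heads in 100 tosses the biased coin dominates, since log_lik("h2") exceeds log_lik("h1") by several nats.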
Brute Force MAP Hypothesis Learner
- Intuitive Idea: produce the most likely h given observed D
- Algorithm Find-MAP-Hypothesis (D)
  1. FOR each hypothesis h in H, calculate the conditional (i.e., posterior) probability:
     P(h | D) = P(D | h) P(h) / P(D)
  2. RETURN the hypothesis h_MAP with the highest conditional probability:
     h_MAP = arg max_{h in H} P(h | D)

Relation to Concept Learning
- Usual Concept Learning Task
  - Instance space X
  - Hypothesis space H
  - Training examples D
- Consider the Find-S Algorithm
  - Given: D
  - Return: the most specific h in the version space VS_{H,D}
- MAP and Concept Learning
  - Bayes's Rule: an application of Bayes's Theorem
  - What would Bayes's Rule produce as the MAP hypothesis?
  - Does Find-S output a MAP hypothesis?
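The Find-MAP-Hypothesis loop above is easy to sketch when H is small enough to enumerate. A minimal, hypothetical implementation (the function and argument names are mine; the coin hypotheses from the previous slide serve as the usage example):

```python
def find_map_hypothesis(hypotheses, likelihood, data):
    """Brute-force MAP over a finite hypothesis space.

    hypotheses: dict mapping each h to its prior P(h)
    likelihood: function (data, h) -> P(data | h)
    Returns (h_MAP, dict of normalized posteriors P(h | data)).
    """
    scores = {h: likelihood(data, h) * p_h for h, p_h in hypotheses.items()}
    p_d = sum(scores.values())        # P(D), by the theorem of total probability
    posterior = {h: s / p_d for h, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

# Usage: the fair-vs-biased coin with priors 0.75 / 0.25, one Head observed
h_map, post = find_map_hypothesis(
    {"fair": 0.75, "biased": 0.25},
    lambda d, h: {"fair": 0.5, "biased": 0.6}[h],   # P(Head | h)
    "Head")
```

Note the algorithm is exponential in practice: it touches every h in H, which is why the later slides look for structure (version spaces, coding-length arguments) instead.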
Bayesian Concept Learning and Version Spaces
- Assumptions
  - Fixed set of instances <x_1, x_2, ..., x_m>
  - Let D denote the set of classifications: D = <c(x_1), c(x_2), ..., c(x_m)>
- Choose P(D | h)
  - P(D | h) = 1 if h is consistent with D (i.e., for all i, h(x_i) = c(x_i))
  - P(D | h) = 0 otherwise
- Choose P(h) ~ Uniform
  - Uniform distribution: P(h) = 1 / |H|
  - Uniform priors correspond to no background knowledge about h
  - Recall: maximum entropy
- MAP Hypothesis
  P(h | D) = 1 / |VS_{H,D}| if h is consistent with D, 0 otherwise

Evolution of Posterior Probabilities
- Start with Uniform Priors
  - Equal probabilities assigned to each hypothesis
  - Maximum uncertainty (entropy), minimum prior information
  - [Figure: P(h), P(h | D_1), P(h | D_1, D_2) plotted over the hypotheses]
- Evidential Inference
  - Introduce data (evidence) D_1: belief revision occurs
  - The learning agent revises the conditional probability of inconsistent hypotheses to 0
  - Posterior probabilities for the remaining h in VS_{H,D} are revised upward
  - Add more data (evidence) D_2: further belief revision
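The P(h | D) = 1/|VS_{H,D}| result can be made concrete on a toy hypothesis space. A sketch, assuming (my choice, purely illustrative) that H is the set of all Boolean functions of a single bit, represented as truth tables:

```python
from itertools import product

# H: all 4 Boolean functions of one bit, each a truth table {input: output}
H = [dict(zip([0, 1], outs)) for outs in product([0, 1], repeat=2)]

# One labeled training example: c(0) = 0
D = [(0, 0)]

# P(D | h) is 1 for consistent h, 0 otherwise, so with uniform priors the
# posterior mass spreads evenly over the version space.
consistent = [h for h in H if all(h[x] == c for x, c in D)]
posterior = [1 / len(consistent) if h in consistent else 0.0 for h in H]
```

Here two of the four hypotheses survive the single example, so each gets posterior 1/2 and the rest get 0, exactly the piecewise formula on the slide.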
Characterizing Learning Algorithms by Equivalent MAP Learners
- Inductive System
  - Training examples D and hypothesis space H feed the Candidate Elimination Algorithm, which outputs hypotheses
- Equivalent Bayesian Inference System
  - The same D and H feed a Brute Force MAP Learner with P(h) ~ Uniform and P(D | h) = δ(h(x), c(x)), which outputs the same hypotheses
  - Prior knowledge is made explicit

Maximum Likelihood: Learning a Real-Valued Function [1]
- [Figure: noisy training points around the target f(x), with errors e_i and the fitted h_ML]
- Problem Definition
  - Target function: any real-valued function f
  - Training examples <x_i, d_i> where d_i is a noisy training value: d_i = f(x_i) + e_i
  - e_i is a random variable (noise), i.i.d. ~ Normal(0, σ²), aka Gaussian noise
  - Objective: approximate f as closely as possible
- Solution
  - The maximum likelihood hypothesis h_ML minimizes the sum of squared errors (SSE):
    h_ML = arg min_{h in H} Σ_{i=1..m} (d_i - h(x_i))²
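The SSE claim can be exercised on synthetic data: generate d_i = f(x_i) + e_i with Gaussian noise and fit the SSE minimizer. A sketch for a linear hypothesis class h(x) = a·x + b, where the closed-form ordinary least squares solution is the SSE minimizer (the target 2x + 1 and the noise level are my choices):

```python
import random

# Synthetic data: d_i = f(x_i) + e_i with f(x) = 2x + 1 and Gaussian noise
random.seed(0)
xs = [i / 10 for i in range(50)]
ds = [2.0 * x + 1.0 + random.gauss(0, 0.3) for x in xs]

# Ordinary least squares for h(x) = a*x + b minimizes the SSE directly
n = len(xs)
mx, md = sum(xs) / n, sum(ds) / n
a = sum((x - mx) * (d - md) for x, d in zip(xs, ds)) \
    / sum((x - mx) ** 2 for x in xs)
b = md - a * mx

sse = sum((d - (a * x + b)) ** 2 for x, d in zip(xs, ds))
```

With 50 points and σ = 0.3, the recovered slope and intercept land close to the true (2, 1), and the residual SSE is on the order of m·σ².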
Maximum Likelihood: Learning a Real-Valued Function [2]
- Derivation of the Least Squares Solution
  - Assume the noise is Gaussian (prior knowledge)
  - Maximum likelihood solution:
    h_ML = arg max_{h in H} p(D | h)
         = arg max_{h in H} Π_{i=1..m} (1 / sqrt(2πσ²)) exp(-(d_i - h(x_i))² / (2σ²))
- Problem: computing exponents and comparing reals is expensive!
- Solution: maximize the log probability
    h_ML = arg max_{h in H} Σ_{i=1..m} [ ln (1 / sqrt(2πσ²)) - (d_i - h(x_i))² / (2σ²) ]
         = arg max_{h in H} Σ_{i=1..m} -(d_i - h(x_i))²
         = arg min_{h in H} Σ_{i=1..m} (d_i - h(x_i))²

Learning to Predict Probabilities
- Application: predicting survival probability from patient data
- Problem Definition
  - Given training examples <x_i, d_i>, where d_i ∈ {0, 1}
  - Want to train a neural network to output a probability given x_i (not a 0 or 1)
- Maximum Likelihood Estimator (MLE)
  - In this case one can show:
    h_ML = arg max_{h in H} Σ_{i=1..m} [ d_i ln h(x_i) + (1 - d_i) ln(1 - h(x_i)) ]
  - [Figure: sigmoid unit with inputs x_0 = 1, x_1, ..., x_n and weights w_0, ..., w_n; net = Σ_{i=0..n} w_i x_i; output o = σ(net)]
  - Weight update rule for a sigmoid unit:
    w ← w + Δw, where Δw = r Σ_{i=1..m} (d_i - h(x_i)) x_i
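The update rule above is gradient ascent on the cross-entropy likelihood. A minimal sketch for a single sigmoid unit on a tiny hypothetical one-feature dataset (learning rate, epoch count, and data are my choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, n_features, r=0.1, epochs=2000):
    """Batch gradient ascent: delta_w = r * sum_i (d_i - h(x_i)) * x_i."""
    w = [0.0] * (n_features + 1)                  # w[0] is the bias (x_0 = 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, d in examples:
            h = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            for j, xj in enumerate([1.0] + list(x)):
                grad[j] += (d - h) * xj           # the (d_i - h(x_i)) x_i term
        w = [wi + r * g for wi, g in zip(w, grad)]
    return w

# Toy data: d = 1 exactly when x > 0
data = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
w = train(data, n_features=1)
```

After training, the unit outputs a probability near 1 for clearly positive inputs and near 0 for clearly negative ones, which is precisely the "output a probability, not a 0 or 1" behavior the slide asks for.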
Most Probable Classification of New Instances
- MAP and MLE: Limitations
  - The problem so far: find the most likely hypothesis given the data
  - Sometimes we just want the best classification of a new instance x, given D
- A Solution Method
  - Find the best (MAP) h, use it to classify
  - This may not be optimal, though!
  - Analogy: estimating a distribution using the mode versus the integral
    - One finds the maximum, the other the area
- Refined Objective
  - Want to determine the most probable classification
  - Need to combine the predictions of all hypotheses
  - Predictions must be weighted by their conditional probabilities
  - Result: Bayes Optimal Classifier (next time...)

Minimum Description Length (MDL) Principle: Occam's Razor
- Occam's Razor
  - Recall: prefer the shortest hypothesis - an inductive bias
- Questions
  - Why short hypotheses, as opposed to an arbitrary class of rare hypotheses?
  - What is special about minimum description length?
- Answers
  - MDL approximates an optimal coding strategy for hypotheses
  - In certain cases, this coding strategy maximizes conditional probability
- Issues
  - How exactly is minimum length being achieved (length of what)?
  - When and why can we use MDL learning for MAP hypothesis learning?
  - What does MDL learning really entail (what does the principle buy us)?
- MDL Principle
  - Prefer the h that minimizes the coding length of the model plus the coding length of the exceptions
  - Model: encode h using a coding scheme C_1
  - Exceptions: encode the conditioned data D | h using a coding scheme C_2
MDL and Optimal Coding: Bayesian Information Criterion (BIC)

MDL Hypothesis
  h_MDL = arg min_{h in H} [ L_C1(h) + L_C2(D | h) ]
- e.g., H ≡ decision trees, D ≡ labeled training data
- L_C1(h): number of bits required to describe tree h under encoding C_1
- L_C2(D | h): number of bits required to describe D given h under encoding C_2
- NB: L_C2(D | h) = 0 if all x are classified perfectly by h (need only describe exceptions)
- Hence h_MDL trades off tree size against training errors

Bayesian Information Criterion
  BIC(h) = lg P(D | h) + lg P(h)
  h_MAP = arg max_{h in H} [ P(D | h) P(h) ]
        = arg max_{h in H} [ lg P(D | h) + lg P(h) ]
        = arg max_{h in H} BIC(h)
        = arg min_{h in H} [ -lg P(D | h) - lg P(h) ]
- Interesting fact from information theory: the optimal (shortest expected code length) code for an event with probability p is -lg(p) bits
- Interpret h_MAP as the total length of h and of D given h under the optimal code
- BIC = -MDL (i.e., the arg max of BIC is the arg min of the MDL criterion)
- Prefer the hypothesis that minimizes length(h) + length(misclassifications)

Concluding Remarks on MDL
- What Can We Conclude?
  - Q: Does this prove once and for all that short hypotheses are best?
  - A: Not necessarily
    - It only shows: if we find log-optimal representations for P(h) and P(D | h), then h_MAP = h_MDL
    - There is no reason to believe that h_MDL is preferable for arbitrary codings C_1, C_2
  - Case in point: practical probabilistic knowledge bases
    - Elicitation of a full description of P(h) and P(D | h) is hard
    - A human implementor might prefer to specify relative probabilities
- Information Theoretic Learning: Ideas
  - Learning as compression
    - Abu-Mostafa: complexity of learning problems (in terms of minimal codings)
    - Wolff: computing (especially search) as compression
  - (Bayesian) model selection: searching H using probabilistic criteria
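The size-versus-errors trade-off in h_MDL can be illustrated numerically. A sketch, under my own simplifying assumptions: each candidate "tree" is summarized only by its model bits and error count, and each exception costs lg(N) bits to index the misclassified example plus lg(|V|) bits for its correct label:

```python
import math

def mdl_score(model_bits, errors, n_examples=1000, n_labels=2):
    """L_C1(h) + L_C2(D | h) under a toy exception-coding scheme."""
    exception_bits = errors * (math.log2(n_examples) + math.log2(n_labels))
    return model_bits + exception_bits

# Three hypothetical candidates: tiny/inaccurate, medium, huge/perfect
candidates = {
    "stump":      mdl_score(model_bits=8,   errors=40),
    "small_tree": mdl_score(model_bits=60,  errors=5),
    "huge_tree":  mdl_score(model_bits=900, errors=0),
}
h_mdl = min(candidates, key=candidates.get)
```

The stump pays heavily for its 40 exceptions and the huge tree pays for its own description, so the medium tree wins: exactly the "tree size versus training errors" balance the slide describes.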
Bayesian Classification
- Framework
  - Find the most probable classification (as opposed to the MAP hypothesis)
  - f: X → V (domain ≡ instance space, range ≡ finite set of values)
  - Instances x in X can be described as a collection of features x ≡ (x_1, x_2, ..., x_n)
  - Performance element: Bayesian classifier
    - Given: an example (e.g., Boolean-valued instances)
    - Output: the most probable value v_j in V (NB: priors over x are constant with respect to v_MAP)
  v_MAP = arg max_{v_j in V} P(v_j | x)
        = arg max_{v_j in V} P(v_j | x_1, x_2, ..., x_n)
        = arg max_{v_j in V} P(x_1, x_2, ..., x_n | v_j) P(v_j)
- Parameter Estimation Issues
  - Estimating P(v_j) is easy: for each value v_j, count its frequency in D = {<x, f(x)>}
  - However, it is infeasible to estimate P(x_1, x_2, ..., x_n | v_j): too many combinations of values, so most counts are 0
  - In practice, we need to make assumptions that allow us to estimate P(x | v)

Bayes Optimal Classifier (BOC)
- Intuitive Idea: h_MAP(x) is not necessarily the most probable classification!
- Example
  - Three possible hypotheses: P(h_1 | D) = 0.4, P(h_2 | D) = 0.3, P(h_3 | D) = 0.3
  - Suppose that for a new instance x, h_1(x) = +, h_2(x) = -, h_3(x) = -
  - What is the most probable classification of x?
- Bayes Optimal Classification (BOC)
  v* = v_BOC = arg max_{v_j in V} Σ_{h in H} P(v_j | h) P(h | D)
- Example
  - P(h_1 | D) = 0.4, P(- | h_1) = 0, P(+ | h_1) = 1
  - P(h_2 | D) = 0.3, P(- | h_2) = 1, P(+ | h_2) = 0
  - P(h_3 | D) = 0.3, P(- | h_3) = 1, P(+ | h_3) = 0
  - Σ_{h in H} P(+ | h) P(h | D) = 0.4
  - Σ_{h in H} P(- | h) P(h | D) = 0.6
  - Result: v* = v_BOC = -
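The three-hypothesis example above takes only a few lines to verify: even though h_1 is the MAP hypothesis and predicts +, the posterior-weighted vote goes to -. A sketch in Python (names are mine; P(v | h) is taken as 1 when h predicts v, 0 otherwise, as in the slide's deterministic hypotheses):

```python
# Posteriors and per-hypothesis predictions from the slide's example
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}   # h_i(x) for the new x

def bayes_optimal(posteriors, predictions, values=("+", "-")):
    """v* = arg max_v sum_h P(v | h) P(h | D), with deterministic P(v | h)."""
    score = {v: sum(p for h, p in posteriors.items() if predictions[h] == v)
             for v in values}
    return max(score, key=score.get), score

v_star, score = bayes_optimal(posteriors, predictions)
```

Here score["+"] = 0.4 comes entirely from the MAP hypothesis h_1, yet score["-"] = 0.6 wins, which is the whole point of the BOC slide.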
Terminology
- Introduction to Bayesian Learning
  - Probability foundations
  - Definitions: subjectivist, frequentist, logicist
  - Kolmogorov axioms (3)
- Bayes's Theorem
  - Prior probability of an event
  - Joint probability of an event
  - Conditional (posterior) probability of an event
- Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
  - MAP hypothesis: highest conditional probability given the observations (data)
  - ML: highest likelihood of generating the observed data
  - ML estimation (MLE): estimating parameters to find the ML hypothesis
- Bayesian Inference: computing conditional probabilities (CPs) in a model
- Bayesian Learning: searching the model (hypothesis) space using CPs

Summary Points
- Introduction to Bayesian Learning
  - Framework: using probabilistic criteria to search H
  - Probability foundations
    - Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
    - Kolmogorov axioms
- Bayes's Theorem
  - Definition of conditional (posterior) probability
  - Product rule
- Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
  - Bayes's Rule and MAP
  - Uniform priors: allow use of MLE to generate MAP hypotheses
  - Relation to version spaces, candidate elimination
- Next Week: Mitchell; Chapters 14-15, Russell and Norvig; Roth
  - More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
  - Learning over text
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationxp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ
CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and
More informationGadjah Mada University, Indonesia. Yogyakarta State University, Indonesia Karangmalang Yogyakarta 55281
Reducng Fuzzy Relatons of Fuzzy Te Seres odel Usng QR Factorzaton ethod and Its Applcaton to Forecastng Interest Rate of Bank Indonesa Certfcate Agus aan Abad Subanar Wdodo 3 Sasubar Saleh 4 Ph.D Student
More informationSingular Value Decomposition: Theory and Applications
Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationWhy Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)
Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationMDL-Based Unsupervised Attribute Ranking
MDL-Based Unsupervsed Attrbute Rankng Zdravko Markov Computer Scence Department Central Connectcut State Unversty New Brtan, CT 06050, USA http://www.cs.ccsu.edu/~markov/ markovz@ccsu.edu MDL-Based Unsupervsed
More informationSpeech and Language Processing
Speech and Language rocessng Lecture 3 ayesan network and ayesan nference Informaton and ommuncatons Engneerng ourse Takahro Shnozak 08//5 Lecture lan (Shnozak s part) I gves the frst 6 lectures about
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationApplied Mathematics Letters
Appled Matheatcs Letters 2 (2) 46 5 Contents lsts avalable at ScenceDrect Appled Matheatcs Letters journal hoepage: wwwelseverco/locate/al Calculaton of coeffcents of a cardnal B-splne Gradr V Mlovanovć
More informationInstance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification
Instance-Based earnng (a.k.a. memory-based learnng) Part I: Nearest Neghbor Classfcaton Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n
More informationPHYS 1443 Section 002 Lecture #20
PHYS 1443 Secton 002 Lecture #20 Dr. Jae Condtons for Equlbru & Mechancal Equlbru How to Solve Equlbru Probles? A ew Exaples of Mechancal Equlbru Elastc Propertes of Solds Densty and Specfc Gravty lud
More information2 Complement Representation PIC. John J. Sudano Lockheed Martin Moorestown, NJ, 08057, USA
The yste Probablty nforaton ontent P Relatonshp to ontrbutng oponents obnng ndependent Mult-ource elefs Hybrd and Pedgree Pgnstc Probabltes ohn. udano Lockheed Martn Moorestown 08057 U john.j.sudano@lco.co
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationHere is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)
Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,
More informationMachine Learning. What is a good Decision Boundary? Support Vector Machines
Machne Learnng 0-70/5 70/5-78 78 Sprng 200 Support Vector Machnes Erc Xng Lecture 7 March 5 200 Readng: Chap. 6&7 C.B book and lsted papers Erc Xng @ CMU 2006-200 What s a good Decson Boundar? Consder
More informationDepartment of Computer Science Artificial Intelligence Research Laboratory. Iowa State University MACHINE LEARNING
MACHINE LEARNING Vasant Honavar Bonformatcs and Computatonal Bology Program Center for Computatonal Intellgence, Learnng, & Dscovery Iowa State Unversty honavar@cs.astate.edu www.cs.astate.edu/~honavar/
More informationITERATIVE ESTIMATION PROCEDURE FOR GEOSTATISTICAL REGRESSION AND GEOSTATISTICAL KRIGING
ESE 5 ITERATIVE ESTIMATION PROCEDURE FOR GEOSTATISTICAL REGRESSION AND GEOSTATISTICAL KRIGING Gven a geostatstcal regresson odel: k Y () s x () s () s x () s () s, s R wth () unknown () E[ ( s)], s R ()
More informationSolving Fuzzy Linear Programming Problem With Fuzzy Relational Equation Constraint
Intern. J. Fuzz Maeatcal Archve Vol., 0, -0 ISSN: 0 (P, 0 0 (onlne Publshed on 0 Septeber 0 www.researchasc.org Internatonal Journal of Solvng Fuzz Lnear Prograng Proble W Fuzz Relatonal Equaton Constrant
More informationOn the Construction of Polar Codes
On the Constructon of Polar Codes Ratn Pedarsan School of Coputer and Councaton Systes, Lausanne, Swtzerland. ratn.pedarsan@epfl.ch S. Haed Hassan School of Coputer and Councaton Systes, Lausanne, Swtzerland.
More informationBayesian predictive Configural Frequency Analysis
Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse
More informationCOMP th April, 2007 Clement Pang
COMP 540 12 th Aprl, 2007 Cleent Pang Boostng Cobnng weak classers Fts an Addtve Model Is essentally Forward Stagewse Addtve Modelng wth Exponental Loss Loss Functons Classcaton: Msclasscaton, Exponental,
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More informationSmall-Sample Equating With Prior Information
Research Report Sall-Saple Equatng Wth Pror Inforaton Sauel A Lvngston Charles Lews June 009 ETS RR-09-5 Lstenng Learnng Leadng Sall-Saple Equatng Wth Pror Inforaton Sauel A Lvngston and Charles Lews ETS,
More information