Probabilistic Ranking Principle. Hongning Wang
1 Probabilistic Ranking Principle

Hongning Wang
2 Notion of relevance

How relevance has been modeled in IR:
- Relevance as (Rep(q), Rep(d)) similarity; different representations & similarity measures:
  - Vector space model (Salton et al. 75)
  - Regression model (Fox 83)
  - Prob. distr. model (Wong & Yao 89)
- Relevance as P(r=1|q,d), r in {0,1}: probability of relevance (today's lecture)
  - Document generation: generative model, classical prob. model (Robertson & Sparck Jones 76)
  - Query generation: LM approach (Ponte & Croft 98; Lafferty & Zhai 01a)
- Relevance via P(d->q) or P(q->d): probabilistic inference, different inference systems:
  - Prob. concept space model (Wong & Yao 95)
  - Inference network model (Turtle & Croft 91)

CS@UVa, CS 6501: Information Retrieval
3 Basic concepts in probability

- Random experiment: an experiment with uncertain outcome, e.g. tossing a coin, picking a word from text
- Sample space S: all possible outcomes of an experiment, e.g. tossing 2 fair coins, S = {HH, HT, TH, TT}
- Event E: E is a subset of S; E happens iff the outcome is in E, e.g. E = {HH} (all heads), E = {HH, TT} (same face)
  - Impossible event ({}), certain event (S)
- Probability of an event: 0 <= P(E) <= 1
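The two-fair-coins example above can be checked by enumerating the sample space; a minimal sketch (the `prob` helper is an illustration, not part of the lecture):

```python
import itertools
from fractions import Fraction

# Sample space for tossing 2 fair coins: S = {HH, HT, TH, TT}
S = [''.join(o) for o in itertools.product('HT', repeat=2)]

def prob(event):
    """P(E) = |E| / |S| under a uniform distribution over outcomes."""
    return Fraction(len([o for o in S if o in event]), len(S))

print(prob({'HH'}))        # 1/4  (all heads)
print(prob({'HH', 'TT'}))  # 1/2  (same face)
print(prob(set()))         # 0    (impossible event)
print(prob(set(S)))        # 1    (certain event)
```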
4 Essential probability concepts

- Probability of events
  - Mutually exclusive events: P(A or B) = P(A) + P(B)
  - General events: P(A or B) = P(A) + P(B) - P(A and B)
  - Independent events: P(A and B) = P(A)P(B)
    - Joint probability, often written simply as P(AB)
5 Recap: TF normalization

- Two views of document length
  - A doc is long because it is verbose
  - A doc is long because it has more content
- Raw TF is inaccurate
  - Document length variation
  - Repeated occurrences are less informative than the first occurrence
  - Relevance does not increase proportionally with the number of term occurrences
- Generally penalize long docs, but avoid over-penalizing: pivoted length normalization
6 Recap: document frequency

Idea: a term is more discriminative if it occurs in fewer documents.
7 Recap: cosine similarity

- Angle between two TF-IDF vectors:
  cosine(V_q, V_d) = (V_q . V_d) / (|V_q|_2 |V_d|_2) = (V_q / |V_q|_2) . (V_d / |V_d|_2)
- Document length normalized (unit vectors)

[Figure: query and documents D1, D2 as unit vectors in a TF-IDF space with axes "Sports" and "Finance"]
8 Recap: fast computation of cosine in retrieval

- Maintain a score accumulator for each doc when scanning the postings
- Example: query = "info security", S(d,q) = g(t1) + ... + g(tn) [sum of TF of matched terms]
  - Postings for "info": (d1, 3) (d2, 4) (d3, 1) (d4, 5)
  - Postings for "security": (d2, 3) (d4, 1) (d5, 3)
  - Accumulators after "info": d1=3, d2=4, d3=1, d4=5; after "security": d2=7, d4=6, d5=3
- Can be easily applied to TF-IDF weighting!
- Keep only the most promising accumulators for top-K retrieval
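The accumulator scan above can be sketched in a few lines; this uses the postings from the slide and raw TF as the term weight g(t) (swapping in TF-IDF is a one-line change):

```python
from collections import defaultdict

# Postings from the slide: term -> list of (doc_id, term_frequency)
postings = {
    'info':     [('d1', 3), ('d2', 4), ('d3', 1), ('d4', 5)],
    'security': [('d2', 3), ('d4', 1), ('d5', 3)],
}

def score(query_terms, postings, top_k=None):
    """One accumulator per doc; add g(t) (here raw TF) while scanning postings."""
    acc = defaultdict(float)
    for t in query_terms:
        for doc, tf in postings.get(t, []):
            acc[doc] += tf          # replace with a TF-IDF weight if desired
    ranked = sorted(acc.items(), key=lambda x: -x[1])
    return ranked[:top_k] if top_k else ranked

print(score(['info', 'security'], postings))
# d2 (4+3=7) ranks first, then d4 (5+1=6)
```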
9 Recap: advantages of VS model

- Empirically effective! (top TREC performance)
- Intuitive
- Easy to implement
- Well studied / mostly evaluated
- The SMART system: developed at Cornell, still widely used
- Warning: many variants of TF-IDF!
10 Recap: disadvantages of VS model

- Assumes term independence
- Assumes query and document are represented the same way
- Lack of predictive adequacy
- Arbitrary term weighting
- Arbitrary similarity measure
- Lots of parameter tuning!
11 Essential probability concepts

- Conditional probability: P(B|A) = P(AB) / P(A)
  - Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
  - For independent events: P(B|A) = P(B)
- Total probability: if A_1, ..., A_n form a non-overlapping partition of S,
  - P(B) = P(B and A_1) + ... + P(B and A_n)
  - P(A_i|B) = P(B|A_i)P(A_i) / P(B)
  - This allows us to compute P(A_i|B) based on P(B|A_i)
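A tiny numeric check of total probability and Bayes' rule; the partition {A1, A2} and its probabilities are hypothetical numbers chosen for illustration:

```python
from fractions import Fraction as F

# Hypothetical two-way partition of the sample space with priors and likelihoods
P_A = {'A1': F(1, 2), 'A2': F(1, 2)}          # priors P(A_i)
P_B_given_A = {'A1': F(3, 4), 'A2': F(1, 4)}  # likelihoods P(B|A_i)

# Total probability: P(B) = sum_i P(B|A_i) P(A_i)
P_B = sum(P_B_given_A[a] * P_A[a] for a in P_A)

# Bayes' rule: P(A_i|B) = P(B|A_i) P(A_i) / P(B)
posterior = {a: P_B_given_A[a] * P_A[a] / P_B for a in P_A}
print(P_B)        # 1/2
print(posterior)  # A1 -> 3/4, A2 -> 1/4; posteriors sum to 1
```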
12 Interpretation of Bayes' rule

- Hypothesis space H = {H_1, ..., H_n}; evidence E
- P(H_i|E) = P(E|H_i)P(H_i) / P(E)
  - P(H_i|E): posterior probability of H_i
  - P(H_i): prior probability of H_i
  - P(E|H_i): likelihood of the data/evidence given H_i
- If we want to pick the most likely hypothesis H*, we can drop P(E): P(H_i|E) is proportional to P(E|H_i)P(H_i)
13 Theoretical justification of ranking

As stated by William Cooper: "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data is made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data."

Ranking by probability of relevance leads to optimal retrieval effectiveness.
14 Justification from decision theory

- Two types of loss:
  - Loss(retrieved | non-relevant) = a1
  - Loss(not retrieved | relevant) = a2
- phi(d,q): probability of d being relevant to q
- Expected loss for the decision of including d in the final results:
  - Retrieve: (1 - phi(d,q)) * a1
  - Not retrieve: phi(d,q) * a2
- Your decision criterion?
15 Justification from decision theory

- We make the decision: if (1 - phi(d,q)) * a1 < phi(d,q) * a2, retrieve d; otherwise do not
  - i.e., check whether phi(d,q) > a1 / (a1 + a2)
- Ranking documents in descending order of phi(d,q) minimizes the loss
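The decision rule above reduces to a fixed threshold on phi; a minimal sketch, with hypothetical loss values a1 = 1 (false alarm) and a2 = 3 (miss), giving threshold 1/4:

```python
def should_retrieve(phi, a1, a2):
    """Retrieve d iff the expected loss of retrieving is lower than skipping:
       (1 - phi) * a1 < phi * a2, i.e. phi > a1 / (a1 + a2)."""
    return phi > a1 / (a1 + a2)

a1, a2 = 1.0, 3.0   # hypothetical costs; threshold = 1 / (1 + 3) = 0.25
for phi in (0.1, 0.25, 0.4):
    print(phi, should_retrieve(phi, a1, a2))
# 0.1 -> False, 0.25 -> False (strict inequality), 0.4 -> True
```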
16 According to the PRP, all we need is

- A relevance measure function F(q,d) such that for all q, d1, d2: F(q,d1) > F(q,d2) iff P(Rel|q,d1) > P(Rel|q,d2)
- Assumptions:
  - Independent relevance
  - Sequential browsing
- Most existing research on IR models so far has fallen into this line of thinking
- Limitations?
17 Probability of relevance

- Three random variables: query Q, document D, relevance R in {0,1}
- Goal: rank D based on P(R=1|Q,D)
  - Actually, one only needs to compare P(R=1|Q,D1) with P(R=1|Q,D2), i.e., rank the documents
- Several different ways to define P(R=1|Q,D)
18 Conditional models for P(R=1|Q,D)

- Basic idea: relevance depends on how well a query matches a document
  - P(R=1|Q,D) = g(Rep(Q,D); theta)
  - Rep(Q,D): feature representation of the query-doc pair, e.g. #matched terms, highest IDF of a matched term, doclen
- Use training data with known relevance judgments to estimate the parameters theta
- Apply the model to rank new documents
- Special case: logistic regression (a functional form)
19 Regression for ranking?

- Linear regression: y = w^T X
  - Relationship between a scalar dependent variable y and one or more explanatory variables
- In a ranking problem:
  - X: features about a query-document pair
  - y: relevance label of the document for the given query
20 Features/attributes for ranking

Typical features considered in ranking problems (over the M terms t_j shared by query and document):

- X1 = (1/M) Sum_j log f(t_j): average absolute query frequency
- X2: query length (QL)
- X3 = (1/M) Sum_j log DF(t_j): average absolute document frequency
- X4: document length (DL)
- X5 = (1/M) Sum_j log (N / n_{t_j}): average inverse document frequency
- X6 = log M: number of terms in common between query and document
21 Regression for ranking

- Linear regression: y = w^T X
  - Relationship between a scalar dependent variable y and one or more explanatory variables
- But y is discrete in a ranking problem!
  - e.g., predict y = 1 iff w^T X > 0.5
- What if we have an outlier?

[Figure: 0/1 relevance labels fit by a regression line; a single outlier shifts the optimal regression model and the 0.5 decision boundary]
22 Regression for ranking

- Logistic regression: P(R=1|Q,D) = sigma(w^T X) = 1 / (1 + exp(-w^T X))
  - Directly models the posterior of document relevance
  - Sigmoid function
- What if we have an outlier?

[Figure: sigmoid curve y(x); far less sensitive to outliers than the linear fit]
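A minimal sketch of ranking by the logistic posterior; the three features and the weight vector here are hypothetical (in practice the weights are fit on judged query-doc pairs). Note that since the sigmoid is monotone, ranking by sigma(w^T x) is the same as ranking by w^T x:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_relevant(x, w):
    """P(R=1|Q,D) = sigma(w^T x) for a feature vector x of the query-doc pair."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical features per query-doc pair: (#matched terms, max IDF, doclen)
w = [0.8, 0.5, -0.01]                        # assumed weights, for illustration
docs = {'d1': [2, 3.2, 120], 'd2': [1, 1.1, 40]}
ranked = sorted(docs, key=lambda d: p_relevant(docs[d], w), reverse=True)
print(ranked)  # d1 scores higher than d2 under these weights
```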
23 Conditional models for P(R=1|Q,D): pros & cons

- Advantages:
  - Absolute probability of relevance available
  - May re-use all the past relevance judgments
- Problems:
  - Performance heavily depends on the selection of features
  - Little guidance on feature selection
- Will be covered in more detail in later learning-to-rank discussions
24 Generative models for P(R=1|Q,D)

- Basic idea: compute the odds O(R=1|Q,D) using Bayes' rule:
  O(R=1|Q,D) = P(R=1|Q,D) / P(R=0|Q,D) = [P(Q,D|R=1) / P(Q,D|R=0)] * [P(R=1) / P(R=0)]
  - The factor P(R=1)/P(R=0) is ignored for ranking
- Assumption: relevance is a binary variable
- Variants:
  - Document generation: P(Q,D|R) = P(D|Q,R)P(Q|R)
  - Query generation: P(Q,D|R) = P(Q|D,R)P(D|R)
28 Document generation model

O(R=1|Q,D) is proportional to P(D|Q,R=1) / P(D|Q,R=0)
- P(D|Q,R=1): model of relevant docs for Q; P(D|Q,R=0): model of non-relevant docs for Q
- Assume independent attributes A_1 ... A_k (why?)
- Let D = d_1 ... d_k, where d_i in {0,1} is the value of attribute A_i; similarly Q = q_1 ... q_k
- Then:
  O(R=1|Q,D) ~ Prod_{i: d_i=1} [P(A_i=1|Q,R=1) / P(A_i=1|Q,R=0)] * Prod_{i: d_i=0} [P(A_i=0|Q,R=1) / P(A_i=0|Q,R=0)]
  (terms occurring in the doc, then terms not occurring in the doc)
- Notation, per term A_i:

  document                relevant (R=1)   nonrelevant (R=0)
  term present (d_i=1)         p_i              u_i
  term absent  (d_i=0)       1 - p_i          1 - u_i
29 Document generation model

O(R=1|Q,D) ~ Prod_{i: d_i=q_i=1} (p_i / u_i) * Prod_{i: d_i=0, q_i=1} [(1 - p_i) / (1 - u_i)]
           = Prod_{i: d_i=q_i=1} [p_i (1 - u_i) / (u_i (1 - p_i))] * Prod_{i: q_i=1} [(1 - p_i) / (1 - u_i)]

- Important tricks:
  - Assumption: terms not occurring in the query are equally likely to occur in relevant and non-relevant documents, i.e., p_t = u_t, so only query terms matter
  - The second product ranges over all query terms and does not depend on the document, so it can be ignored for ranking

  document                relevant (R=1)   nonrelevant (R=0)
  term present (d_i=1)         p_i              u_i
  term absent  (d_i=0)       1 - p_i          1 - u_i
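The rank-equivalent log-odds above can be checked numerically; the binary vectors and the values of p_i and u_i below are hypothetical, and the third term deliberately has p_3 = u_3 (a non-query term) so it contributes nothing:

```python
import math

# Binary term vectors; p_i = P(A_i=1|Q,R=1), u_i = P(A_i=1|Q,R=0) (hypothetical)
query = [1, 1, 0]        # q_i
doc   = [1, 0, 0]        # d_i
p     = [0.8, 0.6, 0.3]
u     = [0.3, 0.3, 0.3]  # p_3 = u_3: term 3 is not in the query

def log_odds(doc, query, p, u):
    """Rank-equivalent log O(R=1|Q,D): sum over query terms present in the doc
       of log[p_i(1-u_i) / (u_i(1-p_i))]."""
    s = 0.0
    for d_i, q_i, p_i, u_i in zip(doc, query, p, u):
        if d_i == 1 and q_i == 1:
            s += math.log(p_i * (1 - u_i) / (u_i * (1 - p_i)))
    return s

print(round(log_odds(doc, query, p, u), 3))  # log(0.56/0.06) ~ 2.234
```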
30 obertson-sparck Jones Model obertson & Sparck Jones 76 logo 1 D ank k 1 d q 1 p 1 u log u 1 p k 1 d q 1 p log 1 p + 1 u log u SJ model Two parameters for each term : p 11: prob. that term occurs n a relevant doc u 10: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth # rel. doc + 1 uˆ # nonrel. doc wth # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors er-query estmaton! CS@UVa CS 6501: Informaton etreval 30
31 Parameter estimation

- General setting:
  - Given a hypothesized probabilistic model that governs the random experiment
  - The model gives a probability p(D|theta) of any data, which depends on the parameter theta
  - Now, given actual sample data X = {x_1, ..., x_n}, what can we say about the value of theta?
- Intuitively, take our best guess of theta; "best" means best explaining/fitting the data
- Generally an optimization problem
32 Maximum likelihood vs. Bayesian

- Maximum likelihood estimation
  - "Best" means the data likelihood reaches its maximum: theta = argmax_theta P(X|theta)
  - Issue: small sample size
- Bayesian estimation
  - "Best" means being consistent with our prior knowledge and explaining the data well:
    theta = argmax_theta P(theta|X) = argmax_theta P(X|theta) P(theta)
  - A.k.a. maximum a posteriori (MAP) estimation
  - Issue: how to define the prior?
- ML: frequentist's point of view; MAP: Bayesian's point of view
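The small-sample issue can be seen with a coin-flip example (not from the slides): with only 4 tosses, the MLE is pulled to an extreme, while a MAP estimate under a hypothetical Beta(5,5) prior (which favors a fair coin) stays closer to 0.5:

```python
# Coin flips: 3 heads in 4 tosses; theta = P(heads)
heads, tosses = 3, 4

# MLE: argmax_theta P(X|theta) = heads / tosses
theta_mle = heads / tosses

# MAP with a Beta(a, b) prior: posterior is Beta(heads + a, tosses - heads + b),
# whose mode is (heads + a - 1) / (tosses + a + b - 2)
a, b = 5, 5   # hypothetical prior favoring a fair coin
theta_map = (heads + a - 1) / (tosses + a + b - 2)

print(theta_mle)  # 0.75
print(theta_map)  # 7/12, pulled toward the prior mode 0.5
```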
33 Illustration of Bayesian estimation

- Posterior: p(theta|X) is proportional to p(X|theta) p(theta)
- Likelihood: p(X|theta), X = (x_1, ..., x_N)
- Prior: p(theta)

[Figure: prior, likelihood and posterior curves over theta, marking the prior mode, the posterior mode (the MAP estimate), and the ML estimate theta_ml]
34 Maximum likelihood estimation

- Data: a document d with counts c(w_1), ..., c(w_N)
- Model: multinomial distribution p(W|theta) with parameters theta_i = p(w_i) (e.g. probabilities of "computer", "information", "retrieval", "science", "literature", ...)
- Maximum likelihood estimator: theta = argmax_theta p(W|theta)

  p(W|theta) = [N! / (c(w_1)! ... c(w_N)!)] Prod_i theta_i^{c(w_i)}, proportional to Prod_i theta_i^{c(w_i)}
  log p(W|theta) = Sum_i c(w_i) log theta_i

- Using the Lagrange multiplier approach (with the requirement from probability that Sum_i theta_i = 1):
  L(W, theta) = Sum_i c(w_i) log theta_i + lambda (Sum_i theta_i - 1)
  Set partial derivatives to zero: dL/dtheta_i = c(w_i)/theta_i + lambda = 0, so theta_i = -c(w_i)/lambda
  Since Sum_i theta_i = 1, lambda = -Sum_i c(w_i), giving the ML estimate theta_i = c(w_i) / Sum_j c(w_j)
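The closed-form result of the derivation above (relative frequencies) in code; the example sentence is made up for illustration:

```python
from collections import Counter

def mle_unigram(text):
    """ML estimate for a multinomial over words: theta_i = c(w_i) / sum_j c(w_j),
       the closed form from the Lagrange-multiplier derivation."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

theta = mle_unigram("information retrieval retrieval is about information about text")
print(theta['retrieval'])   # 2/8 = 0.25
print(sum(theta.values()))  # 1.0, a proper distribution
```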
35 obertson-sparck Jones Model obertson & Sparck Jones 76 logo 1 D ank k 1 d q 1 p 1 u log u 1 p k 1 d q 1 p log 1 p + 1 u log u SJ model Two parameters for each term : p 11: prob. that term occurs n a relevant doc u 10: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth # rel. doc + 1 uˆ # nonrel. doc wth # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors er-query estmaton! CS@UVa CS 6501: Informaton etreval 35
36 RSJ model without relevance info (Croft & Harper 79)

- Suppose we do not have relevance judgments:
  - Assume p_i is a constant
  - Estimate u_i by assuming all documents to be non-relevant
- Then:
  log O(R=1|Q,D) = (rank-equivalent) Sum_{i: d_i=q_i=1} [c + log((N - n_i) / n_i)]
  - N: # documents in the collection; n_i: # documents in which term A_i occurs
- Reminder: IDF = 1 + log(N / n_i)
- An IDF-weighted Boolean model?
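The IDF-like weight above is easy to tabulate; the collection size and document frequencies below are hypothetical:

```python
import math

def croft_harper_weight(N, n_i):
    """Without judgments: assume all docs non-relevant, so the RSJ term weight
       reduces (up to an additive constant c) to log((N - n_i) / n_i)."""
    return math.log((N - n_i) / n_i)

N = 1000  # hypothetical collection size
for n_i in (1, 10, 100, 500):
    print(n_i, round(croft_harper_weight(N, n_i), 2))
# rarer terms get larger weights; at n_i = N/2 the weight drops to 0
```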
37 RSJ model: summary

- The most important classic probabilistic IR model
- Uses only term presence/absence, thus also referred to as the Binary Independence Model
  - Essentially Naive Bayes for doc ranking
  - Designed for short catalog records
- Without relevance judgments, the model parameters must be estimated in an ad hoc way
- Performance isn't as good as tuned VS models
38 Improving RSJ: adding TF

- Let D = d_1 ... d_k, where d_i is the frequency count of term A_i
- 2-Poisson mixture model for TF, with eliteness E (whether the term is about the concept asked in the query):
  p(d_i = f | R) = P(E|R) * (mu_E^f e^{-mu_E}) / f! + P(not E|R) * (mu_{not E}^f e^{-mu_{not E}}) / f!
- Many more parameters to estimate! (How many, exactly?)
- Compounded with document length!
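The 2-Poisson mixture for a term's TF can be evaluated directly; the eliteness probability and the two Poisson rates below are hypothetical parameters for illustration:

```python
import math

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

def two_poisson(tf, pi_elite, mu_elite, mu_nonelite):
    """2-Poisson mixture for a term's TF: with prob. pi_elite the doc is 'elite'
       for the term's concept (high rate mu_elite), else non-elite (low rate)."""
    return (pi_elite * poisson_pmf(tf, mu_elite)
            + (1 - pi_elite) * poisson_pmf(tf, mu_nonelite))

# Hypothetical parameters: elite docs average 5 occurrences, non-elite 0.2
for tf in range(4):
    print(tf, round(two_poisson(tf, 0.1, 5.0, 0.2), 4))
```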
39 BM25/Okapi approximation (Robertson et al. 94)

- Idea: model p(D|Q,R) with a simpler function that approximates the 2-Poisson mixture model
- Observations: log O(R=1|Q,D) is a sum of term weights W_i over terms occurring in both query and document, where
  - W_i = 0 if TF_i = 0
  - W_i increases monotonically with TF_i
  - W_i has an asymptotic limit
- The simple function is:
  W_i = [TF_i / (k_1 + TF_i)] * log [p_i (1 - u_i) / (u_i (1 - p_i))]
40 Adding doc length

- Motivation: the 2-Poisson model assumes equal document length
- Implementation: penalize long docs
  W_i = [TF_i / (K + TF_i)] * log [p_i (1 - u_i) / (u_i (1 - p_i))]
  where K = k_1 * (1 - b + b * |d| / avg|d|)
- Pivoted document length normalization
41 Adding query TF

- Motivation: natural symmetry between document and query
- Implementation: a similar TF transformation as for document TF
  W_i = [(k_3 + 1) QTF_i / (k_3 + QTF_i)] * [(k_1 + 1) TF_i / (K + TF_i)] * log [p_i (1 - u_i) / (u_i (1 - p_i))]
- The final formula is called BM25, achieving top TREC performance (BM: best match)
42 The BM25 formula

[Figure: the full Okapi TF/BM25 formula; the relevance-based log term becomes an IDF when no relevance info is available]
43 The BM25 formula: a closer look

rel(q, D) = Sum_{i in q and D} IDF(q_i) * [tf_i (k_1 + 1) / (tf_i + k_1 (1 - b + b |D| / avgdl))] * [(k_2 + 1) qtf_i / (k_2 + qtf_i)]

- The first factor after the IDF is the TF-IDF component for the document; the last is the TF component for the query
- Typical settings: b around 0.75, k_1 in [1.2, 2.0], k_2 in (0, 1000]
- A vector space model with a TF-IDF schema!
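A sketch of the scoring function above; the parameter values are the commonly cited defaults, and the counts in the example (document frequencies, lengths) are hypothetical:

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avgdl, N, df,
               k1=1.2, b=0.75, k2=100.0):
    """Sum over terms shared by query and document of
       IDF * doc-TF saturation * query-TF saturation."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t not in doc_tf:
            continue
        tf = doc_tf[t]
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))  # RSJ-style IDF
        tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))
        qtf_part = qtf * (k2 + 1) / (qtf + k2)
        score += idf * tf_part * qtf_part
    return score

df = {'info': 30, 'security': 10}   # hypothetical document frequencies
s = bm25_score({'info': 1, 'security': 1},
               {'info': 3, 'security': 1}, doc_len=120, avgdl=100, N=1000, df=df)
print(round(s, 3))
```

Note how the doc-TF factor saturates: raising tf increases the score but with diminishing returns, exactly the "asymptotic limit" observation that motivated BM25.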
44 Extensions of doc generation models

- Capture term dependence [van Rijsbergen & Harper 78]
- Alternative ways to incorporate TF [Croft 83; Kalt 96]
- Feature/term selection for feedback [Okapi's TREC reports]
- Estimate the relevance model based on pseudo feedback [Lavrenko & Croft 01], to be covered later
45 Query generation models

O(R=1|Q,D) = [P(Q,D|R=1) / P(Q,D|R=0)] * [P(R=1) / P(R=0)]
           = [P(Q|D,R=1) P(D|R=1)] / [P(Q|D,R=0) P(D|R=0)] * [P(R=1) / P(R=0)]
           ~ P(Q|D,R=1) * [P(D|R=1) / P(D|R=0)]   (assuming P(Q|D,R=0) ~ P(Q|R=0), and dropping the constant prior ratio)

- P(Q|D,R=1): query likelihood p(q|theta_d); P(D|R=1)/P(D|R=0): document prior
- Assuming a uniform document prior, O(R=1|Q,D) is proportional to P(Q|D,R=1)
- Now the question is how to compute P(Q|D,R=1). Generally two steps:
  1. Estimate a language model based on D
  2. Compute the query likelihood according to the estimated model
- Language models: we will cover them in the next lecture!
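The two steps above can be sketched with a plain MLE unigram model (no smoothing, which the next lecture addresses); the documents and query are made up for illustration:

```python
from collections import Counter

def query_likelihood(query, doc):
    """(1) Estimate a unigram LM from D by MLE; (2) multiply P(q_i|theta_d)
       over the query words. Unsmoothed, so any unseen query word zeroes the
       score, which is exactly why smoothing is needed."""
    counts = Counter(doc.split())
    total = sum(counts.values())
    p = 1.0
    for w in query.split():
        p *= counts.get(w, 0) / total
    return p

d1 = "information retrieval is the science of search"
d2 = "probability theory for information science"
q = "information retrieval"
print(query_likelihood(q, d1), query_likelihood(q, d2))
# d1 matches both query words; d2 misses "retrieval" and gets probability 0
```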
46 What you should know

- Essential concepts in probability
- Justification of ranking by relevance
- Derivation of the RSJ model
- Maximum likelihood estimation
- The BM25 formula
47 Today's reading

Chapter 11, Probabilistic information retrieval:
- 11.2 The Probability Ranking Principle
- 11.3 The Binary Independence Model
- Okapi BM25: a non-binary model
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationMotion Perception Under Uncertainty. Hongjing Lu Department of Psychology University of Hong Kong
Moton Percepton Under Uncertanty Hongjng Lu Department of Psychology Unversty of Hong Kong Outlne Uncertanty n moton stmulus Correspondence problem Qualtatve fttng usng deal observer models Based on sgnal
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationCSC 411 / CSC D11 / CSC C11
18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t
More informationSupporting Information
Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationMidterm Review. Hongning Wang
Mdterm Revew Hongnng Wang CS@UVa Core concepts Search Engne Archtecture Key components n a modern search engne Crawlng & Text processng Dfferent strateges for crawlng Challenges n crawlng Text processng
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng robablstc Classfer Gven an nstance, hat does a probablstc classfer do dfferentl compared to, sa, perceptron? It does not drectl predct Instead,
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationThe Geometry of Logit and Probit
The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 22, Reference Priors
Stat60: Bayesan Modelng and Inference Lecture Date: February, 00 Reference Prors Lecturer: Mchael I. Jordan Scrbe: Steven Troxler and Wayne Lee In ths lecture, we assume that θ R; n hgher-dmensons, reference
More information/ n ) are compared. The logic is: if the two
STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence
More informationGaussian process classification: a message-passing viewpoint
Gaussan process classfcaton: a message-passng vewpont Flpe Rodrgues fmpr@de.uc.pt November 014 Abstract The goal of ths short paper s to provde a message-passng vewpont of the Expectaton Propagaton EP
More informationPolynomial Regression Models
LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance
More informationCHAPTER 3: BAYESIAN DECISION THEORY
HATER 3: BAYESIAN DEISION THEORY Decson mang under uncertanty 3 Data comes from a process that s completely not nown The lac of nowledge can be compensated by modelng t as a random process May be the underlyng
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationCommunication with AWGN Interference
Communcaton wth AWG Interference m {m } {p(m } Modulator s {s } r=s+n Recever ˆm AWG n m s a dscrete random varable(rv whch takes m wth probablty p(m. Modulator maps each m nto a waveform sgnal s m=m
More informationConditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
Condtonal Random Felds: Probablstc Models for Segmentng and Labelng Sequence Data Paper by John Lafferty, Andrew McCallum, and Fernando Perera ICML 2001 Presentaton by Joe Drsh May 9, 2002 Man Goals Present
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More informationHere is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)
Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,
More informationChecking Pairwise Relationships. Lecture 19 Biostatistics 666
Checkng Parwse Relatonshps Lecture 19 Bostatstcs 666 Last Lecture: Markov Model for Multpont Analyss X X X 1 3 X M P X 1 I P X I P X 3 I P X M I 1 3 M I 1 I I 3 I M P I I P I 3 I P... 1 IBD states along
More informationLECTURE 9 CANONICAL CORRELATION ANALYSIS
LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationLOGIT ANALYSIS. A.K. VASISHT Indian Agricultural Statistics Research Institute, Library Avenue, New Delhi
LOGIT ANALYSIS A.K. VASISHT Indan Agrcultural Statstcs Research Insttute, Lbrary Avenue, New Delh-0 02 amtvassht@asr.res.n. Introducton In dummy regresson varable models, t s assumed mplctly that the dependent
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationExcess Error, Approximation Error, and Estimation Error
E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple
More informationBasically, if you have a dummy dependent variable you will be estimating a probability.
ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014
COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationProbabilistic Classification: Bayes Classifiers. Lecture 6:
Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.
More informationECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics
ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More information