Probabilistic Ranking Principle. Hongning Wang

1 Probabilistic Ranking Principle Hongning Wang

2 Recap: latent semantic analysis. Solution: low-rank matrix approximation. Imagine this is the *true* concept-document matrix; imagine this is our observed term-document matrix, with random noise over the word selection in each document. CS@UVa CS4501: Information Retrieval 2

3 Recap: latent semantic analysis. Solve LSA by SVD:
$$Z^* = \arg\min_{Z:\,\operatorname{rank}(Z)=k} \|C - Z\|_F = \arg\min_{Z:\,\operatorname{rank}(Z)=k} \sqrt{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(C_{ij} - Z_{ij}\right)^2}$$
Procedure of LSA (map to a lower-dimensional space): 1. Perform SVD on the document-term adjacency matrix $C_{M\times N}$. 2. Construct $\hat{C}_{M\times N}$ by keeping only the $k$ largest singular values in $\Sigma$ non-zero. CS@UVa CS4501: Information Retrieval 3
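
To make the low-rank approximation concrete, here is a minimal numpy sketch (not from the slides; the toy matrix and the rank k are made up) of LSA via truncated SVD:

```python
import numpy as np

# Toy document-term matrix A (N documents x M terms), raw counts.
A = np.array([
    [2, 1, 0, 0],
    [0, 1, 2, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

k = 2  # number of latent concepts to keep

# Step 1: SVD of the document-term adjacency matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 2: keep only the k largest singular values in Sigma non-zero.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k is the best rank-k approximation of A in Frobenius norm.
print(np.linalg.norm(A - A_k, "fro"))
```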

4 Recap: latent semantic analysis. Visualization (figure omitted). CS4501: Information Retrieval 4

5 Recap: LSA for retrieval. Project queries into the new document space: $\hat{q} = q^\top V_k \Sigma_k^{-1}$, i.e., treat the query as a pseudo-document given by its term vector. Rank by cosine similarity between query and documents in this lower-dimensional space. CS@UVa CS4501: Information Retrieval 5
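
Continuing the numpy sketch above, query folding and cosine ranking might look as follows; the query vector is hypothetical:

```python
# Document i's latent representation is row i of U_k @ Sigma_k.
doc_vecs = U[:, :k] * s[:k]                      # N docs x k

# Fold a query (term vector over the same M terms) into the same space:
# q_hat = q^T V_k Sigma_k^{-1}
q = np.array([1.0, 0.0, 0.0, 1.0])
q_hat = q @ Vt[:k, :].T @ np.diag(1.0 / s[:k])

# Rank documents by cosine similarity in the lower-dimensional space.
sims = doc_vecs @ q_hat / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(-sims))                         # document ranking
```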

6 Recap: LSA for retrieval. Example (figure omitted): documents about HCI vs. graph theory, with query q: "human computer interaction". CS4501: Information Retrieval 6

7 Notion of relevance: Relevance ≈ P(REL | Rep(q), Rep(d)). Three branches: (1) Similarity, with different representations & similarity measures: vector space model (Salton et al. 75), regression model (Fox 83), prob. distr. model (Wong & Yao 89). (2) Probability of relevance, P(r=1|q,d), r ∈ {0,1}: via doc generation, the generative / classical prob. model (Robertson & Sparck Jones 76) — today's lecture; via query generation, the LM approach (Ponte & Croft 98; Lafferty & Zhai 01a). (3) Probabilistic inference, P(d→q) or P(q→d), with different inference systems: prob. concept space model (Wong & Yao 95), inference network model (Turtle & Croft 91). CS@UVa CS 4501: Information Retrieval 7

8 Basic concepts in probability. Random experiment: an experiment with an uncertain outcome, e.g., tossing a coin, picking a word from text. Sample space S: all possible outcomes of an experiment, e.g., tossing 2 fair coins, S = {HH, HT, TH, TT}. Event E ⊆ S: E happens iff the outcome is in E, e.g., E = {HH} (all heads), E = {HH, TT} (same face); impossible event {}, certain event S. Probability of an event: 0 ≤ P(E) ≤ 1. CS@UVa CS 4501: Information Retrieval 8

9 Essential probability concepts. Probability of events. Mutually exclusive events: $P(A \cup B) = P(A) + P(B)$. General events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Independent events: $P(A \cap B) = P(A)P(B)$; the joint probability $P(A \cap B)$ is often written simply as $P(A, B)$. CS@UVa CS 4501: Information Retrieval 9

10 Essential probability concepts. Conditional probability: $P(B|A) = P(A \cap B)/P(A)$. Bayes' rule: $P(B|A) = P(A|B)P(B)/P(A)$. For independent events: $P(B|A) = P(B)$. Total probability: if $A_1, \ldots, A_n$ form a non-overlapping partition of S, then $P(B \cap S) = P(B \cap A_1) + \cdots + P(B \cap A_n)$ and
$$P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n)}$$
This allows us to compute $P(A_i|B)$ based on $P(B|A_i)$. CS@UVa CS 4501: Information Retrieval 10
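
A small numeric illustration (mine, with made-up numbers) of Bayes' rule with total probability in the denominator:

```python
# Hypothetical partition A1, A2 of the sample space, with P(B|Ai) given.
prior = {"A1": 0.3, "A2": 0.7}          # P(Ai); must sum to 1
likelihood = {"A1": 0.9, "A2": 0.2}     # P(B|Ai)

# Total probability: P(B) = sum_i P(B|Ai) * P(Ai)
p_b = sum(likelihood[a] * prior[a] for a in prior)

# Bayes' rule: P(Ai|B) = P(B|Ai) * P(Ai) / P(B)
posterior = {a: likelihood[a] * prior[a] / p_b for a in prior}
print(posterior)   # e.g. P(A1|B) ~ 0.66
```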

11 Interpretation of Bayes' rule. Hypothesis space: $H = \{H_1, \ldots, H_n\}$; evidence: $E$. $$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{P(E)}$$ If we want to pick the most likely hypothesis $H^*$, we can drop $P(E)$: $P(H_i|E) \propto P(E|H_i)P(H_i)$, where $P(H_i|E)$ is the posterior probability of $H_i$, $P(H_i)$ the prior probability of $H_i$, and $P(E|H_i)$ the likelihood of the data/evidence given $H_i$. CS@UVa CS 4501: Information Retrieval 11

12 Theoretical justification of ranking, as stated by William Cooper: "If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data." Ranking by probability of relevance leads to optimal retrieval performance. CS@UVa CS 4501: Information Retrieval 12

13 Justification from decision theory. Two types of loss: Loss(retrieved | non-relevant) = $a_1$; Loss(not retrieved | relevant) = $a_2$. Let $\varphi(d,q)$ be the probability of $d$ being relevant to $q$. Expected loss for the decision of including $d$ in the final results: retrieve: $(1 - \varphi(d,q))\,a_1$; not retrieve: $\varphi(d,q)\,a_2$. Your decision criterion? CS@UVa CS 4501: Information Retrieval 13

14 Justification from decision theory. We make the decision: if $(1 - \varphi(d,q))\,a_1 < \varphi(d,q)\,a_2$, retrieve $d$; otherwise, do not retrieve $d$. Equivalently, check whether $\varphi(d,q) > \frac{a_1}{a_1 + a_2}$. Ranking documents in descending order of $\varphi(d,q)$ minimizes the loss. CS@UVa CS 4501: Information Retrieval 14
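
The decision rule reduces to a single threshold check; a sketch with illustrative loss values a1 and a2:

```python
def retrieve(phi, a1=1.0, a2=10.0):
    """Retrieve d iff the expected loss of retrieving is smaller:
    (1 - phi) * a1 < phi * a2, i.e. phi > a1 / (a1 + a2)."""
    return phi > a1 / (a1 + a2)

# With a1=1 (cost of a false alarm) and a2=10 (cost of a miss),
# any document with P(relevance) above 1/11 is worth returning.
print(retrieve(0.05), retrieve(0.2))   # False True
```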

15 According to the PRP, all we need is a relevance measure function $F(q,d)$ such that for all $q, d_1, d_2$: $F(q,d_1) > F(q,d_2)$ iff $p(Rel|q,d_1) > p(Rel|q,d_2)$. Assumptions: independent relevance; sequential browsing. Most existing research on IR models so far has fallen into this line of thinking. Limitations? CS@UVa CS 4501: Information Retrieval 15

16 Probability of relevance. Three random variables: query $Q$, document $D$, relevance $R \in \{0,1\}$. Goal: rank $D$ based on $P(R=1|Q,D)$. Compute $P(R=1|Q,D)$; actually, one only needs to compare $P(R=1|Q,D_1)$ with $P(R=1|Q,D_2)$, i.e., rank documents. There are several different ways to define $P(R=1|Q,D)$. CS@UVa CS 4501: Information Retrieval 16

17 Conditional models for $P(R=1|Q,D)$. Basic idea: relevance depends on how well a query matches a document: $P(R=1|Q,D) = g(Rep(Q,D), \theta)$, where $Rep(Q,D)$ is a feature representation of the query-doc pair, e.g., #matched terms, highest IDF of a matched term, doclen. Use training data with known relevance judgments to estimate the parameters $\theta$; apply the model to rank new documents. Special case: logistic regression (a functional form). CS@UVa CS 4501: Information Retrieval 17

18 Regression for ranking? Linear regression: $y = w^\top X$, the relationship between a scalar dependent variable $y$ and one or more explanatory variables. In a ranking problem: $X$ — features about the query-document pair; $y$ — relevance label of the document for the given query. CS@UVa CS 4501: Information Retrieval 18

19 Features/attributes for ranking. Typical features considered in ranking problems, where $M$ is the number of terms in common between query and document, $N$ the collection size, and $n_t$ the number of documents containing term $t$: $X_1 = \frac{1}{M}\sum_j \log QAF(t_j)$ — average absolute query frequency; $X_2 = QL$ — query length; $X_3 = \frac{1}{M}\sum_j \log DAF(t_j)$ — average absolute document frequency; $X_4 = DL$ — document length; $X_5 = \frac{1}{M}\sum_j \log \frac{N}{n_{t_j}}$ — average inverse document frequency; $X_6 = \log M$ — number of terms in common between query and document. CS@UVa CS 4501: Information Retrieval 19
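
As an illustration, a sketch of computing a few of these features for one query-document pair; the whitespace tokenization, helper name, and toy collection statistics are my assumptions, not the slides':

```python
import math
from collections import Counter

def ranking_features(query, doc, df, n_docs):
    """X2 = query length, X4 = doc length, X6 = log(#common terms),
    X5 = average inverse document frequency over common terms.
    df: document frequency n_t per term; n_docs: collection size N."""
    q_tokens, d_tokens = query.split(), doc.split()
    common = set(q_tokens) & set(d_tokens)
    M = len(common)
    log_m = math.log(M) if M else 0.0
    avg_idf = sum(math.log(n_docs / df[t]) for t in common) / M if M else 0.0
    return {"QL": len(q_tokens), "DL": len(d_tokens),
            "logM": log_m, "avgIDF": avg_idf}

df = Counter({"information": 50, "retrieval": 20, "models": 40})
print(ranking_features("information retrieval",
                       "probabilistic models for information retrieval",
                       df, n_docs=100))
```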

20 Regression for ranking. Linear regression: $y = w^\top X$, the relationship between a scalar dependent variable $y$ and one or more explanatory variables. But $y$ is discrete in a ranking problem! Predict $y = 1$ when $w^\top X > 0.5$. (Figure omitted: optimal regression model fit to binary relevance labels.) What if we have an outlier? CS@UVa CS 4501: Information Retrieval 20

21 Regression for ranking. Logistic regression: $$P(R=1|Q,D) = \sigma(w^\top X) = \frac{1}{1 + \exp(-w^\top X)}$$ Directly modeling the posterior of document relevance. (Figure omitted: the sigmoid function.) What if we have an outlier? CS@UVa CS 4501: Information Retrieval 21
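
A minimal scoring sketch of this model; the feature vector and weights below are made up (in practice $w$ is fit on relevance judgments):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_relevant(x, w, b):
    """P(R=1|Q,D) = sigma(w^T x + b) for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical features [#matched terms, highest matched IDF, doclen]
# and hand-picked weights, for illustration only.
x = [3.0, 4.2, 120.0]
w = [0.8, 0.5, -0.01]
print(p_relevant(x, w, b=-2.0))   # a probability in (0, 1)
```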

22 Conditional models for $P(R=1|Q,D)$: pros & cons. Advantages: absolute probability of relevance available; may re-use all the past relevance judgments. Problems: performance heavily depends on the selection of features; little guidance on feature selection. Will be covered in more detail in later learning-to-rank discussions. CS@UVa CS 4501: Information Retrieval 22

23 Generative models for $P(R=1|Q,D)$. Basic idea: compute $Odds(R=1|Q,D)$ using Bayes' rule:
$$Odds(R=1|Q,D) = \frac{P(R=1|Q,D)}{P(R=0|Q,D)} = \frac{P(Q,D|R=1)\,P(R=1)}{P(Q,D|R=0)\,P(R=0)}$$
The ratio $P(R=1)/P(R=0)$ is ignored for ranking. Assumption: relevance is a binary variable. Variants: document generation, $P(Q,D|R) = P(D|Q,R)\,P(Q|R)$; query generation, $P(Q,D|R) = P(Q|D,R)\,P(D|R)$. CS@UVa CS 4501: Information Retrieval 23

24 Document generation model. $$Odds(R=1|Q,D) \propto \frac{P(D|Q,R=1)}{P(D|Q,R=0)}$$ — a model of relevant docs for $Q$ over a model of non-relevant docs for $Q$. Assume independent attributes $A_1, \ldots, A_k$ of $D$ (why?). Let $D = d_1 \ldots d_k$, where $d_i \in \{0,1\}$ is the value of attribute $A_i$; similarly $Q = q_1 \ldots q_k$. The odds factor over terms that occur in the doc and terms that do not. Notation: for a relevant document ($R=1$), a term is present ($A_i=1$) with probability $p_i$ and absent ($A_i=0$) with probability $1-p_i$; for a non-relevant document ($R=0$), present with probability $u_i$ and absent with probability $1-u_i$. CS@UVa CS 4501: Information Retrieval 24

25 Document generation model (derivation):
$$Odds(R=1|Q,D) = \prod_{i=1}^{k} \frac{P(d_i|Q,R=1)}{P(d_i|Q,R=0)} = \prod_{i:\,d_i=1} \frac{p_i}{u_i} \prod_{i:\,d_i=0} \frac{1-p_i}{1-u_i} = \prod_{i:\,d_i=q_i=1} \frac{p_i(1-u_i)}{u_i(1-p_i)} \prod_{i:\,q_i=1} \frac{1-p_i}{1-u_i}$$
The first product runs over query terms occurring in the doc; the second, over all query terms, is document-independent. Important trick/assumption: terms not occurring in the query are equally likely to occur in relevant and non-relevant documents, i.e., $p_t = u_t$. (Same table as above: $p_i, u_i$ for term present; $1-p_i, 1-u_i$ for term absent.) CS 4501: Information Retrieval 25

26 obertson-sparc Jones Model obertson & Sparc Jones 76 logo 1 D an 1 d q 1 p 1 u log u 1 p 1 d q 1 p log 1 p 1 u log u SJ model Two parameters for each term : p = =1 =1: prob. that term occurs n a relevant doc u = =1 =0: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth 0.5 # rel. doc 1 uˆ # nonrel. doc wth 0.5 # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors CS@UVa Per-query estmaton! CS 4501: Informaton etreval 26

27 Parameter estimation. General setting: given a hypothesized probabilistic model that governs the random experiment, the model gives a probability $p(D|\theta)$ of any data, depending on the parameter $\theta$. Now, given actual sample data $X = \{x_1, \ldots, x_n\}$, what can we say about the value of $\theta$? Intuitively, take our best guess of $\theta$, where "best" means best explaining/fitting the data. Generally an optimization problem. CS@UVa CS 4501: Information Retrieval 27

28 Maximum likelihood vs. Bayesian. Maximum likelihood estimation: "best" means the data likelihood reaches its maximum: $\hat\theta = \arg\max_\theta P(X|\theta)$. Issue: small sample size. (ML: the frequentist's point of view.) Bayesian estimation: "best" means being consistent with our prior knowledge and explaining the data well: $\hat\theta = \arg\max_\theta P(\theta|X) = \arg\max_\theta P(X|\theta)\,P(\theta)$, a.k.a. maximum a posteriori estimation. Issue: how to define the prior? (MAP: the Bayesian's point of view.) CS@UVa CS 4501: Information Retrieval 28

29 Illustration of Bayesian estimation. Posterior: $p(\theta|X) \propto p(X|\theta)\,p(\theta)$; likelihood: $p(X|\theta)$ with $X = (x_1, \ldots, x_N)$; prior: $p(\theta)$. (Figure omitted: $\theta_0$, the prior mode; $\theta_{MAP}$, the posterior mode; $\theta_{ML}$, the ML estimate.) CS@UVa CS 4501: Information Retrieval 29

30 Maximum likelihood estimation. Data: a document $d$ with counts $c(w_1), \ldots, c(w_N)$. Model: multinomial distribution $p(W|\theta)$ with parameters $\theta_i = p(w_i)$. (Figure omitted: example word distribution — computer 30%, science 31%, information 20%, retrieval 10%, literature 8%, relevant 1%.) Maximum likelihood estimator: $\hat\theta = \arg\max_\theta p(W|\theta)$.
$$p(W|\theta) \propto \prod_{i=1}^{N} \theta_i^{c(w_i)} \quad\Rightarrow\quad \log p(W|\theta) = \sum_{i=1}^{N} c(w_i)\log\theta_i$$
Using the Lagrange multiplier approach with the probability constraint $\sum_{i=1}^{N}\theta_i = 1$, we maximize
$$L(W,\theta) = \sum_{i=1}^{N} c(w_i)\log\theta_i + \lambda\Big(\sum_{i=1}^{N}\theta_i - 1\Big)$$
Setting partial derivatives to zero: $\frac{\partial L}{\partial \theta_i} = \frac{c(w_i)}{\theta_i} + \lambda = 0 \Rightarrow \theta_i = -\frac{c(w_i)}{\lambda}$. Since $\sum_i \theta_i = 1$, we have $\lambda = -\sum_{i=1}^{N} c(w_i)$, giving the ML estimate $$\hat\theta_i = \frac{c(w_i)}{\sum_{j=1}^{N} c(w_j)}$$ CS@UVa CS 4501: Information Retrieval 30
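
The closed-form estimate is a one-liner in code; the counts below mirror the example distribution on the slide:

```python
from collections import Counter

# Data: a document with word counts c(w_1) ... c(w_N).
counts = Counter({"computer": 30, "science": 31, "information": 20,
                  "retrieval": 10, "literature": 8, "relevant": 1})

# ML estimate for a multinomial: theta_i = c(w_i) / sum_j c(w_j)
total = sum(counts.values())
theta = {w: c / total for w, c in counts.items()}
print(theta["retrieval"])   # 0.1
```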

31 obertson-sparc Jones Model obertson & Sparc Jones 76 logo 1 D an 1 d q 1 p 1 u log u 1 p 1 d q 1 p log 1 p 1 u log u SJ model Two parameters for each term : p = =1 =1: prob. that term occurs n a relevant doc u = =1 =0: prob. that term occurs n a non-relevant doc How to estmate these parameters? Suppose we have relevance judgments pˆ # rel. doc wth 0.5 # rel. doc 1 uˆ # nonrel. doc wth 0.5 # nonrel. doc and +1 can be justfed by Bayesan estmaton as prors CS@UVa Per-query estmaton! CS 4501: Informaton etreval 31

32 RSJ model without relevance info (Croft & Harper 79). Suppose we do not have relevance judgments. We assume $p_i$ to be a constant and estimate $u_i$ by assuming all documents to be non-relevant. Then
$$\log Odds(R=1|Q,D) \stackrel{rank}{=} \sum_{i:\,d_i=q_i=1} \log \frac{N - n_i + 0.5}{n_i + 0.5}$$
Reminder: $IDF = 1 + \log\frac{N}{n_i}$, where $N$ is the number of documents in the collection and $n_i$ the number of documents in which term $A_i$ occurs. An IDF-weighted Boolean model? CS@UVa CS 4501: Information Retrieval 32
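
A sketch of this relevance-free term weight, showing that rarer terms get larger, IDF-like weights:

```python
import math

def croft_harper_weight(n_t, N):
    """Term weight with no relevance info: log((N - n_t + 0.5)/(n_t + 0.5));
    n_t = # docs containing the term, N = collection size."""
    return math.log((N - n_t + 0.5) / (n_t + 0.5))

# Rarer terms get larger weights, mirroring IDF.
for n_t in (1, 10, 100):
    print(n_t, round(croft_harper_weight(n_t, N=1000), 2))
```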

33 RSJ model: summary. The most important classic probabilistic IR model. Uses only term presence/absence, and is thus also referred to as the Binary Independence Model; essentially Naïve Bayes for doc ranking. Designed for short catalog records. Without relevance judgments, the model parameters must be estimated in an ad-hoc way. Performance isn't as good as tuned VS models. CS@UVa CS 4501: Information Retrieval 33

34 Improving RSJ: adding TF. Let $D = d_1 \ldots d_k$, where $d_i$ is the frequency count of term $A_i$. Then
$$Odds(R=1|Q,D) = \prod_{i=1}^{k} \frac{P(d_i|Q,R=1)}{P(d_i|Q,R=0)}$$
Model each term frequency with a 2-Poisson mixture over term eliteness $E_i$ (whether the term is about the concept asked in the query):
$$P(A_i = f\,|\,Q,R) = p(E_i|Q,R)\,\frac{\lambda^f e^{-\lambda}}{f!} + p(\bar{E}_i|Q,R)\,\frac{\mu^f e^{-\mu}}{f!}$$
Many more parameters to estimate (how many exactly?), compounded with document length! CS 4501: Information Retrieval 34

35 BM25/Okapi approximation (Robertson et al. 94). Idea: approximate $P(R=1|Q,D)$ with a simpler function that approximates the 2-Poisson mixture model. Observations: $\log Odds(R=1|Q,D)$ is a sum of term weights $W_i$ over terms occurring in both query and document; $W_i = 0$ if $TF_i = 0$; $W_i$ increases monotonically with $TF_i$; $W_i$ has an asymptotic limit. The simple function is
$$W_i = \frac{TF_i\,(k_1+1)}{K + TF_i}\,\log \frac{p_i(1-u_i)}{u_i(1-p_i)}$$ CS@UVa CS 4501: Information Retrieval 35
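
A quick numeric look (my sketch, with illustrative k1 and K) at how the Okapi TF transform saturates:

```python
def okapi_tf(tf, k1=1.2, K=1.5):
    """(k1 + 1) * tf / (K + tf): zero at tf=0, monotone, asymptote k1 + 1."""
    return (k1 + 1) * tf / (K + tf)

for tf in (0, 1, 2, 5, 20, 100):
    print(tf, round(okapi_tf(tf), 3))   # approaches 2.2 but never exceeds it
```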

36 Adding doc length. Motivation: the 2-Poisson model assumes equal document length. Implementation: penalize long docs with
$$W_i = \frac{TF_i\,(k_1+1)}{K + TF_i}\,\log \frac{p_i(1-u_i)}{u_i(1-p_i)}, \qquad K = k_1\Big((1-b) + b\,\frac{|d|}{avgdl}\Big)$$
i.e., pivoted document length normalization. CS@UVa CS 4501: Information Retrieval 36

37 Adding query TF. Motivation: a natural symmetry between document and query. Implementation: a similar TF transformation as for document TF:
$$W_i = \frac{QTF_i\,(k_3+1)}{k_3 + QTF_i} \cdot \frac{TF_i\,(k_1+1)}{K + TF_i} \cdot \log \frac{p_i(1-u_i)}{u_i(1-p_i)}$$
The final formula is called BM25, achieving top TREC performance (BM: best match). CS@UVa CS 4501: Information Retrieval 37

38 The BM25 formula (Okapi TF/BM25 TF). (Figure omitted: the annotated scoring formula.) The RSJ term weight becomes IDF when no relevance info is available. CS@UVa CS 4501: Information Retrieval 38

39 The BM25 formula, a closer look:
$$rel(q,D) = \sum_{i=1}^{n} IDF(q_i) \cdot \frac{tf(q_i,D)\,(k_1+1)}{tf(q_i,D) + k_1\big(1 - b + b\,\frac{|D|}{avgdl}\big)} \cdot \frac{qtf(q_i)\,(k_2+1)}{qtf(q_i) + k_2}$$
The first fraction is the TF-IDF component for the document; the second is the TF component for the query. $k_1$ is usually set in [1.2, 2.0], $b$ is usually set to 0.75, and $k_2$ is usually set in (0, 1000]. A vector space model with a TF-IDF schema! CS@UVa CS 4501: Information Retrieval 39
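
Putting the pieces together, a compact BM25 scorer under the no-relevance-info IDF; the naive whitespace tokenization and the toy collection statistics are assumptions for illustration:

```python
import math
from collections import Counter

def bm25(query, doc, df, n_docs, avgdl, k1=1.2, b=0.75, k2=100.0):
    """rel(q,D) = sum over query terms of
    IDF * (document TF component) * (query TF component)."""
    tf_d = Counter(doc.split())
    tf_q = Counter(query.split())
    dl = sum(tf_d.values())
    score = 0.0
    for term, qtf in tf_q.items():
        if term not in tf_d or term not in df:
            continue
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        tf = tf_d[term]
        doc_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        query_part = qtf * (k2 + 1) / (qtf + k2)
        score += idf * doc_part * query_part
    return score

df = {"probabilistic": 5, "ranking": 12, "principle": 8}
print(bm25("probabilistic ranking principle",
           "the probabilistic ranking principle in ir",
           df, n_docs=1000, avgdl=6.0))
```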

40 Extensions of doc generation models. Capture term dependence [Rijsbergen & Harper 78]; alternative ways to incorporate TF [Croft 83, Kalt 96]; feature/term selection for feedback [Okapi's TREC reports]; estimate the relevance model based on pseudo feedback [Lavrenko & Croft 01] — to be covered later. CS@UVa CS 4501: Information Retrieval 40

41 Query generation models.
$$Odds(R=1|Q,D) = \frac{P(Q,D|R=1)}{P(Q,D|R=0)} = \frac{P(Q|D,R=1)\,P(D|R=1)}{P(Q|D,R=0)\,P(D|R=0)} \propto P(Q|D,R=1)\,\frac{P(D|R=1)}{P(D|R=0)}$$
assuming $P(Q|D,R=0) \approx P(Q|R=0)$, i.e., the query is independent of the document given non-relevance. $P(Q|D,R=1)$ is the query likelihood $p(q|d)$, and $P(D|R=1)/P(D|R=0)$ acts as a document prior. Assuming a uniform document prior, we have $Odds(R=1|Q,D) \propto P(Q|D,R=1)$. Now the question is how to compute $P(Q|D,R=1)$. Generally this involves two steps: (1) estimate a language model based on $D$; (2) compute the query likelihood according to the estimated model. Language models — we will cover them in the next lecture! CS@UVa CS 4501: Information Retrieval 41

42 What you should know. Essential concepts in probability; justification of ranking by relevance; derivation of the RSJ model; maximum likelihood estimation; the BM25 formula. CS@UVa CS 4501: Information Retrieval 42

43 Today's reading. Chapter 11: Probabilistic information retrieval. 11.2 The Probability Ranking Principle; 11.3 The Binary Independence Model; Okapi BM25: a non-binary model. CS@UVa CS 4501: Information Retrieval 43
