Information Retrieval. Lecture 10: Text Classification; The Naïve Bayes algorithm
1 Introduction to Information Retrieval
Lecture 10: Text Classification; The Naïve Bayes algorithm
2 Relevance feedback revisited
In relevance feedback, the user marks a number of documents as relevant/nonrelevant. We then try to use this information to return better search results.
Suppose we just tried to learn a filter for nonrelevant documents.
This is an instance of a text classification problem:
Two classes: relevant, nonrelevant
For each document, decide whether it is relevant or nonrelevant
The notion of classification is very general and has many applications within and beyond information retrieval.
3 Standing queries
The path from information retrieval to text classification:
You have an information need, say: Unrest in the Niger delta region
You want to rerun an appropriate query periodically to find new news items on this topic
You will be sent new documents that are found
I.e., it's classification, not ranking
Such queries are called standing queries
Long used by information professionals
A modern mass instantiation is Google Alerts
13.0
4 Spam filtering: Another text classification task
From: ""
Subject: real estate is the only way... gem oalvgkay
Anyone can buy real estate with no money down
Stop paying rent TODAY!
There is no need to send hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.
Change your life NOW!
Click Below to order: http://
13.0
5 Text classification: Naïve Bayes
Today:
Introduction to text classification (also widely known as text categorization — same thing)
Probabilistic language models
Naïve Bayes text classification
6 Categorization/Classification
Given:
A description of an instance, x ∈ X, where X is the instance language or instance space. (Issue: how to represent text documents.)
A fixed set of classes: C = {c1, c2, ..., cJ}
Determine:
The category of x: c(x) ∈ C, where c(x) is a classification function whose domain is X and whose range is C.
We want to know how to build classification functions (classifiers).
13.1
7 Document Classification
Test data: "planning language proof intelligence"
Classes: ML, AI, Programming, HCI, Planning, Semantics, Garb.Coll., Multimedia, GUI
Training data:
ML: learning intelligence algorithm reinforcement network...
Planning: planning temporal reasoning plan language...
Semantics: programming semantics language proof...
Garb.Coll.: garbage collection memory optimization region...
Note: in real life there is often a hierarchy, not present in the above problem statement; and also, you get papers on ML approaches to Garb. Coll.
13.1
8 More Text Classification Examples
Many search engine functionalities use classification. Assign labels to each document or web page:
Labels are most often topics, such as Yahoo-categories, e.g., "finance", "sports", "news>world>asia>business"
Labels may be genres, e.g., "editorials", "movie-reviews", "news"
Labels may be opinion on a person/product, e.g., like, hate, neutral
Labels may be domain-specific, e.g., "interesting-to-me" : "not-interesting-to-me"; contains adult language : doesn't; language identification: English, French, Chinese, ...; search vertical: about Linux versus not; link spam : not link spam
9 Classification Methods (1)
Manual classification
Used by Yahoo! originally (now present but downplayed), Looksmart, about.com, ODP, PubMed
Very accurate when the job is done by experts
Consistent when the problem size and team is small
Difficult and expensive to scale
Means we need automatic classification methods for big problems
13.0
10 Classification Methods (2)
Automatic document classification: hand-coded rule-based systems
One technique used by CS dept's spam filter, Reuters, CIA, etc.
Companies (e.g., Verity) provide IDEs for writing such rules
E.g., assign category if document contains a given boolean combination of words
Standing queries: commercial systems have complex query languages (everything in IR query languages + accumulators)
Accuracy is often very high if a rule has been carefully refined over time by a subject expert
Building and maintaining these rules is expensive
13.0
11 A Verity topic (a complex classification rule)
Note: maintenance issues (author, etc.); hand-weighting of terms
12 Classification Methods (3)
Supervised learning of a document-label assignment function
Many systems partly rely on machine learning (Autonomy, MSN, Verity, Enkata, Yahoo!, ...)
k-Nearest Neighbors (simple, powerful)
Naive Bayes (simple, common method)
Support-vector machines (new, more powerful)
plus many other methods
No free lunch: requires hand-classified training data
But data can be built up and refined by amateurs
Note that many commercial systems use a mixture of methods
13 Probabilistic relevance feedback
Recall this idea: rather than reweighting in a vector space, if the user has told us some relevant and some irrelevant documents, then we can proceed to build a probabilistic classifier, such as the Naive Bayes model we will look at today:
P(tk|R) = |Drk| / |Dr|
P(tk|NR) = |Dnrk| / |Dnr|
tk is a term; Dr is the set of known relevant documents; Drk is the subset that contain tk; Dnr is the set of known irrelevant documents; Dnrk is the subset that contain tk
14 Recall a few probability basics
For events a and b:
p(a, b) = p(a ∩ b) = p(a|b) p(b) = p(b|a) p(a)
Bayes' Rule: p(a|b) = p(b|a) p(a) / p(b), where p(a) is the prior and p(a|b) is the posterior
Odds: O(a) = p(a) / p(ā) = p(a) / (1 − p(a))
13.2
15 Bayesian Methods
Our focus this lecture: learning and classification methods based on probability theory.
Bayes' theorem plays a critical role in probabilistic learning and classification.
Build a generative model that approximates how data is produced.
Uses the prior probability of each category given no information about an item.
Categorization produces a posterior probability distribution over the possible categories given a description of an item.
13.2
16 Bayes' Rule
P(C, D) = P(C|D) P(D) = P(D|C) P(C)
⟹ P(C|D) = P(D|C) P(C) / P(D)
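The rule is easy to check numerically; the two-class priors and likelihoods below are illustrative, not from the slides.

```python
# Bayes' rule P(C|D) = P(D|C) P(C) / P(D), with P(D) obtained by the law of
# total probability over the classes. The numbers are made-up examples.
def posteriors(priors, likelihoods):
    """Return P(c|d) for each class c, given P(c) and P(d|c)."""
    p_d = sum(p * l for p, l in zip(priors, likelihoods))  # P(d) = sum_c P(d|c) P(c)
    return [p * l / p_d for p, l in zip(priors, likelihoods)]

post = posteriors(priors=[0.9, 0.1], likelihoods=[0.2, 0.6])
# post == [0.75, 0.25]: the frequent class wins despite its lower likelihood
```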
17 Naive Bayes Classifiers
Task: classify a new instance D, described by a tuple of attribute values D = ⟨x1, x2, ..., xn⟩, into one of the classes cj ∈ C:
c_MAP = argmax_{cj ∈ C} P(cj | x1, x2, ..., xn)
      = argmax_{cj ∈ C} P(x1, x2, ..., xn | cj) P(cj) / P(x1, x2, ..., xn)
      = argmax_{cj ∈ C} P(x1, x2, ..., xn | cj) P(cj)
18 Naïve Bayes Classifier: Naïve Bayes Assumption
P(cj) can be estimated from the frequency of classes in the training examples.
P(x1, x2, ..., xn | cj) has O(|X|^n · |C|) parameters and could only be estimated if a very, very large number of training examples was available.
Naïve Bayes Conditional Independence Assumption: assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj).
19 The Naïve Bayes Classifier
Flu example, with binary features X1..X5: runnynose, sinus, cough, fever, muscle-ache
Conditional Independence Assumption: features detect term presence and are independent of each other given the class:
P(X1, ..., X5 | C) = P(X1|C) · P(X2|C) · ... · P(X5|C)
This model is appropriate for binary variables (multivariate Bernoulli model)
13.3
20 Learning the Model
First attempt: maximum likelihood estimates — simply use the frequencies in the data:
P̂(cj) = N(C = cj) / N
P̂(xi | cj) = N(Xi = xi, C = cj) / N(C = cj)
13.3
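These estimates are just relative frequencies; a minimal sketch on a toy labelled sample (data and names are ours):

```python
# MLE class priors: P-hat(c_j) = N(C = c_j) / N, i.e. relative frequencies.
from collections import Counter

labels = ["flu", "flu", "flu", "no-flu"]
n = len(labels)
prior = {c: count / n for c, count in Counter(labels).items()}
# prior == {"flu": 0.75, "no-flu": 0.25}; likewise, a feature value never
# observed with a class gets estimate 0 -- the problem treated next.
```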
21 Problem with Max Likelihood
P(X1, ..., X5 | C) = P(X1|C) · P(X2|C) · ... · P(X5|C)
What if we have seen no training cases where the patient had no flu and muscle aches?
P̂(X5 = t | C = nf) = N(X5 = t, C = nf) / N(C = nf) = 0
Zero probabilities cannot be conditioned away, no matter the other evidence!
ℓ = argmax_c P̂(c) Π_i P̂(xi | c)
22 Smoothing to Avoid Overfitting
Laplace (add-one) smoothing:
P̂(xi | cj) = (N(Xi = xi, C = cj) + 1) / (N(C = cj) + k), where k is the number of values of Xi
Somewhat more subtle version, with prior pi,k = overall fraction in data where Xi = xi,k, and m = extent of smoothing:
P̂(xi,k | cj) = (N(Xi = xi,k, C = cj) + m·pi,k) / (N(C = cj) + m)
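The add-one estimate can be sketched directly; the counts below are illustrative.

```python
# Laplace smoothing: P-hat(x_i|c_j) = (N(X_i = x_i, C = c_j) + 1) / (N(C = c_j) + k),
# where k is the number of values X_i can take.
def laplace(count_joint, count_class, k):
    return (count_joint + 1) / (count_class + k)

p_unseen = laplace(count_joint=0, count_class=10, k=2)  # 1/12: no longer zero
p_seen = laplace(count_joint=7, count_class=10, k=2)    # 8/12
```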
23 Stochastic Language Models
Model the probability of generating strings (each word in turn) in the language (commonly all strings over the vocabulary). E.g., a unigram model:
Model M: 0.2 the | 0.1 a | 0.01 man | 0.01 woman | 0.03 said | 0.02 likes
P(the man likes the woman | M): multiply the per-word probabilities.
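Model M above can be written down directly and the string probability computed by multiplying per-word probabilities:

```python
# Unigram model from the slide: P(s|M) = product of P(w|M) over the words of s.
model_m = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01,
           "said": 0.03, "likes": 0.02}

def string_prob(sentence, model):
    p = 1.0
    for w in sentence.split():
        p *= model[w]  # each word is generated independently of its context
    return p

p = string_prob("the man likes the woman", model_m)
# 0.2 * 0.01 * 0.02 * 0.2 * 0.01 = 8e-08
```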
24 Stochastic Language Models
Model the probability of generating any string. Two models over the same vocabulary:
Model M1: 0.2 the, 0.01 class, 0.01 woman; low probabilities for sayst, pleaseth, yon, maiden
Model M2: 0.2 the, 0.03 sayst, 0.1 yon, 0.01 maiden; low probabilities for class, woman
For the string "the class pleaseth yon maiden": P(s | M2) > P(s | M1)
25 Unigram and higher-order models
Unigram language models: P(t1) P(t2) P(t3) P(t4)
Bigram (generally, n-gram) language models: P(t1) P(t2|t1) P(t3|t2) P(t4|t3)
Other language models: grammar-based models (PCFGs, etc.) — probably not the first thing to try in IR
Unigrams: easy. Effective!
26 Naïve Bayes via a class conditional language model (multinomial NB)
Cat → w1 w2 w3 w4 w5 w6
Effectively, the probability of each class is computed as a class-specific unigram language model
27 Using Multinomial Naive Bayes Classifiers to Classify Text: Basic method
Attributes are text positions, values are words:
c_NB = argmax_{cj ∈ C} P(cj) Π_i P(xi | cj)
     = argmax_{cj ∈ C} P(cj) P(x1 = "our" | cj) · ... · P(xn = "text" | cj)
Still too many possibilities.
Assume that classification is independent of the positions of the words: use the same parameters for each position. The result is a bag-of-words model (over tokens, not types).
28 Naïve Bayes: Learning
From the training corpus, extract Vocabulary.
Calculate the required P(cj) and P(xk | cj) terms:
For each cj in C do
  docsj ← subset of documents for which the target class is cj
  P(cj) ← |docsj| / total # documents
  Textj ← single document containing all docsj
  for each word xk in Vocabulary
    nk ← number of occurrences of xk in Textj
    P(xk | cj) ← (nk + α) / (n + α·|Vocabulary|)
29 Naïve Bayes: Classifying
positions ← all word positions in the current document which contain tokens found in Vocabulary
Return c_NB, where
c_NB = argmax_{cj ∈ C} P(cj) Π_{i ∈ positions} P(xi | cj)
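The learning and classifying steps of the two slides above can be sketched end-to-end. The toy corpus is the China/Japan worked example from IIR section 13.2; the function names are ours, and α = 1.

```python
# Multinomial NB: train (priors plus add-one-smoothed conditionals over a
# per-class "mega-document"), then classify by summing log probabilities.
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, class). Returns (vocabulary, priors, conditionals)."""
    vocab = {w for toks, _ in docs for w in toks}
    prior, cond = {}, {}
    for c in {label for _, label in docs}:
        class_docs = [toks for toks, label in docs if label == c]
        prior[c] = len(class_docs) / len(docs)
        counts = Counter(w for toks in class_docs for w in toks)  # mega-document
        n = sum(counts.values())
        cond[c] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}
    return vocab, prior, cond

def classify(tokens, vocab, prior, cond):
    def score(c):  # log P(c) + sum over in-vocabulary positions of log P(w|c)
        return math.log(prior[c]) + sum(
            math.log(cond[c][w]) for w in tokens if w in vocab)
    return max(prior, key=score)

docs = [(["chinese", "beijing", "chinese"], "china"),
        (["chinese", "chinese", "shanghai"], "china"),
        (["chinese", "macao"], "china"),
        (["tokyo", "japan", "chinese"], "japan")]
vocab, prior, cond = train(docs)
label = classify(["chinese", "chinese", "chinese", "tokyo", "japan"],
                 vocab, prior, cond)
# label == "china", as in the IIR 13.2 worked example
```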
30 Naive Bayes: Time Complexity
Training time: O(|D| L_d + |C||V|), where L_d is the average length of a document in D.
Assumes V and all D_i, n_i, and n_ij are pre-computed in O(|D| L_d) time during one pass through all of the data.
Generally just O(|D| L_d), since usually |C||V| < |D| L_d.
Test time: O(|C| L_t), where L_t is the average length of a test document.
Very efficient overall: linearly proportional to the time needed to just read in all the data. Why?
31 Underflow Prevention: log space
Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow.
Since log(xy) = log x + log y, it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
The class with the highest final un-normalized log probability score is still the most probable:
c_NB = argmax_{cj ∈ C} [ log P(cj) + Σ_{i ∈ positions} log P(xi | cj) ]
Note that the model is now just a max of a sum of weights.
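A short demonstration of why the log transform matters in practice:

```python
# Multiplying many small probabilities underflows to exactly 0.0 in IEEE-754
# doubles; the equivalent sum of logs stays finite and comparable.
import math

probs = [1e-5] * 80                 # e.g. 80 tokens, each with probability 1e-5
product = 1.0
for p in probs:
    product *= p                    # 1e-400 is below the double range -> 0.0
log_score = sum(math.log(p) for p in probs)   # about -921.03, no underflow
```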
32 Note: Two Models
Model 1: Multivariate Bernoulli
One feature Xw for each word in the dictionary
Xw = true in document d if w appears in d
Naive Bayes assumption: given the document's topic, appearance of one word in the document tells us nothing about chances that another word appears
This is the model used in the binary independence model in classic probabilistic relevance feedback on hand-classified data (Maron in IR was a very early user of NB)
33 Two Models
Model 2: Multinomial = class conditional unigram
One feature Xi for each word position in the document; the feature's values are all words in the dictionary
Value of Xi is the word in position i
Naïve Bayes assumption: given the document's topic, the word in one position in the document tells us nothing about words in other positions
Second assumption: word appearance does not depend on position: P(Xi = w | c) = P(Xj = w | c) for all positions i, j, word w, and class c
Just have one multinomial feature predicting all words
34 Parameter estimation
Multivariate Bernoulli model:
P̂(Xw = t | cj) = fraction of documents of topic cj in which word w appears
Multinomial model:
P̂(Xi = w | cj) = fraction of times in which word w appears across all documents of topic cj
Can create a mega-document for topic j by concatenating all documents in this topic; use the frequency of w in the mega-document
35 Classification
Multinomial vs. Multivariate Bernoulli?
The multinomial model is almost always more effective in text applications! (See results figures later.)
See IIR sections 13.2 and 13.3 for worked examples with each model
36 Feature Selection: Why?
Text collections have a large number of features: 10,000 – 1,000,000 unique words and more
May make using a particular classifier feasible: some classifiers can't deal with 100,000s of features
Reduces training time: training time for some methods is quadratic or worse in the number of features
Can improve generalization performance: eliminates noise features; avoids overfitting
13.5
37 Feature selection: how?
Two ideas:
Hypothesis testing statistics: are we confident that the value of one categorical variable is associated with the value of another? (Chi-square test)
Information theory: how much information does the value of one categorical variable give you about the value of another? (Mutual information)
They're similar, but χ² measures confidence in association, based on available statistics, while MI measures the extent of association assuming perfect knowledge of probabilities
13.5
38 χ² statistic (CHI)
χ² is interested in (f_o − f_e)²/f_e summed over all table entries: is the observed number what you'd expect given the marginals?
The 2×2 table counts documents by Term = jaguar vs. Term ≠ jaguar and Class = auto vs. Class ≠ auto (observed counts f_o, expected counts f_e).
χ²(j, a) = Σ (O − E)²/E ≈ 12.9
The null hypothesis is rejected with confidence .999, since 12.9 > the critical value for .999 confidence (p < .001).
39 χ² statistic (CHI)
There is a simpler formula for 2×2 χ²:
χ² = N (AD − BC)² / ((A+B)(A+C)(B+D)(C+D))
where A = #(t, c), B = #(t, c̄), C = #(t̄, c), D = #(t̄, c̄), and N = A + B + C + D
Value for complete independence of term and category?
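The shortcut is easy to check in code, including the answer to the closing question; the counts below are illustrative.

```python
# Chi-square for a 2x2 term/category table: N (AD - BC)^2 / ((A+B)(A+C)(B+D)(C+D)),
# with A = #(t,c), B = #(t, not c), C = #(not t, c), D = #(not t, not c).
def chi_square(a, b, c, d):
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

# Complete independence means AD = BC, so chi-square is exactly 0:
x_indep = chi_square(10, 20, 30, 60)   # 0.0
# A strongly associated term scores high:
x_assoc = chi_square(40, 10, 10, 40)   # 36.0, far above the .999 critical value
```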
40 Feature selection via Mutual Information
In the training set, choose the k words which best discriminate (give most information on) the categories.
The mutual information between a word and a class is:
I(w, c) = Σ_{e_w ∈ {0,1}} Σ_{e_c ∈ {0,1}} p(e_w, e_c) log [ p(e_w, e_c) / (p(e_w) p(e_c)) ]
for each word w and each category c
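The same double sum can be computed from a 2×2 table of document counts; the counts and function name are ours, and cells with a zero count contribute 0 by the usual convention.

```python
# I(w,c) = sum over e_w, e_c in {0,1} of p(e_w,e_c) log [ p(e_w,e_c) / (p(e_w) p(e_c)) ]
import math

def mutual_information(n11, n10, n01, n00):
    """n11 = docs with word & class, n10 = word only, n01 = class only, n00 = neither."""
    n = n11 + n10 + n01 + n00
    cells = [(n11, n11 + n10, n11 + n01),   # e_w=1, e_c=1
             (n10, n11 + n10, n10 + n00),   # e_w=1, e_c=0
             (n01, n01 + n00, n11 + n01),   # e_w=0, e_c=1
             (n00, n01 + n00, n10 + n00)]   # e_w=0, e_c=0
    return sum((njk / n) * math.log((n * njk) / (nw * nc))
               for njk, nw, nc in cells if njk)

mi_zero = mutual_information(10, 20, 30, 60)   # independent term -> 0.0
mi_pos = mutual_information(40, 10, 10, 40)    # associated term -> positive
```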
41 Feature selection via MI (contd.)
For each category we build a list of the k most discriminating terms.
For example, on 20 Newsgroups:
sci.electronics: circuit, voltage, amp, ground, copy, battery, electronics, cooling, ...
rec.autos: car, cars, engine, ford, dealer, mustang, oil, collision, autos, tires, toyota, ...
Greedy: does not account for correlations between terms. Why?
42 Feature Selection
Mutual information: clear information-theoretic interpretation; may select rare uninformative terms
Chi-square: statistical foundation; may select very slightly informative frequent terms that are not very useful for classification
Just use the commonest terms? No particular foundation, but in practice this is often 90% as good
43 Feature selection for NB
In general, feature selection is necessary for multivariate Bernoulli NB; otherwise you suffer from noise, multi-counting.
Feature selection really means something different for multinomial NB: it means dictionary truncation.
The multinomial NB model only has 1 feature.
This feature selection normally isn't needed for multinomial NB, but may help a fraction with quantities that are badly estimated.
44 Evaluating Categorization
Evaluation must be done on test data that are independent of the training data (usually a disjoint set of instances).
Classification accuracy: c/n, where n is the total number of test instances and c is the number of test instances correctly classified by the system.
Adequate if one class per document; otherwise, F measure for each class.
Results can vary based on sampling error due to different training and test sets. Average results over multiple training and test set splits of the overall data for the best results.
See IIR 13.6 for evaluation on Reuters
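Accuracy and per-class F1 as described above, on a tiny hand-made example (labels are ours):

```python
# accuracy = c / n; F1 combines per-class precision and recall.
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, cls):
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

gold = ["spam", "spam", "ham", "ham", "ham"]
pred = ["spam", "ham", "ham", "ham", "spam"]
acc = accuracy(gold, pred)        # 3/5 = 0.6
f1_spam = f1(gold, pred, "spam")  # precision 1/2, recall 1/2 -> 0.5
```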
45 WebKB Experiment (1998)
Classify webpages from CS departments into: student, faculty, course, project
Train on ~5,000 hand-labeled web pages (Cornell, Washington, U. Texas, Wisconsin)
Crawl and classify a new site (CMU)
Results (accuracy by class, over extracted/correct counts):
Student 72%, Faculty 42%, Person 79%, Project 73%, Course 89%, Department 100%
46 NB Model Comparison: WebKB
48 Naïve Bayes on spam
13.6
49 SpamAssassin
Naïve Bayes has found a home in spam filtering
Paul Graham's A Plan for Spam: a mutant with more mutant offspring... a Naive Bayes-like classifier with weird parameter estimation
Widely used in spam filters; classic Naive Bayes is superior when appropriately used (according to David D. Lewis)
But also many other things: black hole lists, etc.
Many topic filters also use NB classifiers
50 Violation of NB Assumptions
Conditional independence
Positional independence
Examples?
51 Example: Sensors
Reality (joint distribution over the two sensors and the weather):
P(+,+,r) = 3/8    P(−,−,r) = 1/8
P(+,+,s) = 1/8    P(−,−,s) = 3/8
NB model: Raining? → M1, M2
NB factors: P(s) = 1/2, P(+|s) = 1/4, P(+|r) = 3/4
Predictions: P(r,+,+) = ½ · ¾ · ¾; P(s,+,+) = ½ · ¼ · ¼; P(r|+,+) = 9/10; P(s|+,+) = 1/10
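The slide's numbers can be reproduced exactly with rational arithmetic, showing how the (violated) independence assumption inflates the posterior from the true 3/4 to 9/10:

```python
# True posterior from the joint table vs. the NB posterior from the factors.
from fractions import Fraction as F

# reality: joint probabilities of (sensor1, sensor2, weather)
p_ppr, p_pps = F(3, 8), F(1, 8)
true_rain = p_ppr / (p_ppr + p_pps)    # P(r|+,+) = 3/4

# NB factors from the slide: P(r) = P(s) = 1/2, P(+|r) = 3/4, P(+|s) = 1/4
nb_r = F(1, 2) * F(3, 4) * F(3, 4)     # P(r,+,+) under NB
nb_s = F(1, 2) * F(1, 4) * F(1, 4)     # P(s,+,+) under NB
nb_rain = nb_r / (nb_r + nb_s)         # 9/10: overconfident
```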
52 Naïve Bayes Posterior Probabilities
Classification results of naïve Bayes (the class with maximum posterior probability) are usually fairly accurate.
However, due to the inadequacy of the conditional independence assumption, the actual posterior-probability numerical estimates are not: output probabilities are commonly very close to 0 or 1.
Correct estimation ⟹ accurate prediction, but correct probability estimation is NOT necessary for accurate prediction; you just need the right ordering of probabilities.
53 Naive Bayes is Not So Naive
Naïve Bayes took first and second place in the KDD-CUP 97 competition, among 16 then state-of-the-art algorithms. Goal: a financial services industry direct mail response prediction model — predict whether the recipient of mail will actually respond to the advertisement; 750,000 records.
Robust to irrelevant features: irrelevant features cancel each other without affecting results (decision trees, in contrast, can suffer heavily from this).
Very good in domains with many equally important features (decision trees suffer from fragmentation in such cases, especially with little data).
A good dependable baseline for text classification (but not the best!).
Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes-optimal classifier for the problem.
Very fast: learning with one pass of counting over the data; testing linear in the number of attributes and document collection size.
Low storage requirements.
54 Resources
IIR 13
Fabrizio Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1-47, 2002.
Yiming Yang & Xin Liu. A re-examination of text categorization methods. Proceedings of SIGIR, 1999.
Andrew McCallum and Kamal Nigam. A Comparison of Event Models for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, 1998.
Tom Mitchell. Machine Learning. McGraw-Hill, 1997. (A clear, simple explanation of Naïve Bayes.)
Open Calais: automatic semantic tagging. Free (but they can keep your data), provided by Thomson/Reuters.
Weka: a data mining software package that includes an implementation of Naive Bayes.
Reuters-21578: the most famous text classification evaluation set, still widely used by lazy people — but now it's too small for realistic experiments; you should use Reuters RCV1.
Information Retrieval and Organisation
Information Retrieval and Organisation Chapter 13 Text Classification and Naïve Bayes Dell Zhang Birkbeck, University of London Motivation Relevance Feedback revisited The user marks a number of documents
More informationPart 9: Text Classification; The Naïve Bayes algorithm Francesco Ricci
Part 9: Text Classification; The Naïve Bayes algorithm Francesco Ricci Most of these slides comes from the course: Information Retrieval and Web Search, Christopher Manning and Prabhakar Raghavan 1 Content
More informationBayes Theorem & Naïve Bayes. (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning)
Bayes Theorem & Naïve Bayes (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning) Review: Bayes Theorem & Diagnosis P( a b) Posterior Likelihood Prior P( b a) P( a)
More informationAutomatic Clasification: Naïve Bayes
Automatic Clasification: Naïve Bayes Wm&R a.a. 2012/13 R. Basili (slides borrowed by: H. Schutze Dipartimento di Informatica Sistemi e produzione Università di Roma Tor Vergata Email: basili@info.uniroma2.it
More informationAI*IA 2003 Fusion of Multiple Pattern Classifiers PART III
AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers
More informationnaive bayes document classification
naive bayes document classification October 31, 2018 naive bayes document classification 1 / 50 Overview 1 Text classification 2 Naive Bayes 3 NB theory 4 Evaluation of TC naive bayes document classification
More information10/15/2015 A FAST REVIEW OF DISCRETE PROBABILITY (PART 2) Probability, Conditional Probability & Bayes Rule. Discrete random variables
Probability, Conditional Probability & Bayes Rule A FAST REVIEW OF DISCRETE PROBABILITY (PART 2) 2 Discrete random variables A random variable can take on one of a set of different values, each with an
More informationIntroduction to Probability for Graphical Models
Introduction to Probability for Grahical Models CSC 4 Kaustav Kundu Thursday January 4, 06 *Most slides based on Kevin Swersky s slides, Inmar Givoni s slides, Danny Tarlow s slides, Jaser Snoek s slides,
More informationNamed Entity Recognition using Maximum Entropy Model SEEM5680
Named Entity Recognition using Maximum Entroy Model SEEM5680 Named Entity Recognition System Named Entity Recognition (NER): Identifying certain hrases/word sequences in a free text. Generally it involves
More informationRanked Retrieval (2)
Text Technologies for Data Science INFR11145 Ranked Retrieval (2) Instructor: Walid Magdy 31-Oct-2017 Lecture Objectives Learn about Probabilistic models BM25 Learn about LM for IR 2 1 Recall: VSM & TFIDF
More informationCombining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)
Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment
More information4. Score normalization technical details We now discuss the technical details of the score normalization method.
SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules
More informationBERNOULLI TRIALS and RELATED PROBABILITY DISTRIBUTIONS
BERNOULLI TRIALS and RELATED PROBABILITY DISTRIBUTIONS A BERNOULLI TRIALS Consider tossing a coin several times It is generally agreed that the following aly here ) Each time the coin is tossed there are
More informationTests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)
Chater 225 Tests for Two Proortions in a Stratified Design (Cochran/Mantel-Haenszel Test) Introduction In a stratified design, the subects are selected from two or more strata which are formed from imortant
More informationText Categorization CSE 454. (Based on slides by Dan Weld, Tom Mitchell, and others)
Text Categorization CSE 454 (Based on slides by Dan Weld, Tom Mitchell, and others) 1 Given: Categorization A description of an instance, x X, where X is the instance language or instance space. A fixed
More informationHotelling s Two- Sample T 2
Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test
More informationMATH 2710: NOTES FOR ANALYSIS
MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite
More informationOn split sample and randomized confidence intervals for binomial proportions
On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have
More informationBehavioral Data Mining. Lecture 2
Behavioral Data Mining Lecture 2 Autonomy Corp Bayes Theorem Bayes Theorem P(A B) = probability of A given that B is true. P(A B) = P(B A)P(A) P(B) In practice we are most interested in dealing with events
More informationNaïve Bayes for Text Classification
Naïve Bayes for Tet Classifiation adapted by Lyle Ungar from slides by Mith Marus, whih were adapted from slides by Massimo Poesio, whih were adapted from slides by Chris Manning : Eample: Is this spam?
More informationINFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from
INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,
More informationDistributed Rule-Based Inference in the Presence of Redundant Information
istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced
More informationFeedback-error control
Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller
More informationOutline. EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Simple Error Detection Coding
Outline EECS150 - Digital Design Lecture 26 Error Correction Codes, Linear Feedback Shift Registers (LFSRs) Error detection using arity Hamming code for error detection/correction Linear Feedback Shift
More informationEvaluation for sets of classes
Evaluaton for Tet Categorzaton Classfcaton accuracy: usual n ML, the proporton of correct decsons, Not approprate f the populaton rate of the class s low Precson, Recall and F 1 Better measures 21 Evaluaton
More informationNaïve Bayes, Maxent and Neural Models
Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words
More informationSolved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.
Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the
More informationGeneral Linear Model Introduction, Classes of Linear models and Estimation
Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)
More informationRadial Basis Function Networks: Algorithms
Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.
More informationAn AI-ish view of Probability, Conditional Probability & Bayes Theorem
An AI-ish view of Probability, Conditional Probability & Bayes Theorem Review: Uncertainty and Truth Values: a mismatch Let action A t = leave for airport t minutes before flight. Will A 15 get me there
More information10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty.
An AI-ish view of Probability, Conditional Probability & Bayes Theorem Review: Uncertainty and Truth Values: a mismatch Let action A t = leave for airport t minutes before flight. Will A 15 get me there
More informationAn Analysis of Reliable Classifiers through ROC Isometrics
An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit
More informationGenerative Models. CS4780/5780 Machine Learning Fall Thorsten Joachims Cornell University
Generative Models CS4780/5780 Machine Learning Fall 2012 Thorsten Joachims Cornell University Reading: Mitchell, Chapter 6.9-6.10 Duda, Hart & Stork, Pages 20-39 Bayes decision rule Bayes theorem Generative
More informationEconomics 101. Lecture 7 - Monopoly and Oligopoly
Economics 0 Lecture 7 - Monooly and Oligooly Production Equilibrium After having exlored Walrasian equilibria with roduction in the Robinson Crusoe economy, we will now ste in to a more general setting.
More informationNotes on Instrumental Variables Methods
Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of
More informationGraphical Models (Lecture 1 - Introduction)
Grahical Models Lecture - Introduction Tibério Caetano tiberiocaetano.com Statistical Machine Learning Grou NICTA Canberra LLSS Canberra 009 Tibério Caetano: Grahical Models Lecture - Introduction / 7
More informationChapter 7 Sampling and Sampling Distributions. Introduction. Selecting a Sample. Introduction. Sampling from a Finite Population
Chater 7 and s Selecting a Samle Point Estimation Introduction to s of Proerties of Point Estimators Other Methods Introduction An element is the entity on which data are collected. A oulation is a collection
More informationLearning Sequence Motif Models Using Gibbs Sampling
Learning Sequence Motif Models Using Gibbs Samling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Sring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides excluding third-arty material are licensed under
More informationSTA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2