Clustering: K-Means. Machine Learning 10-601, Fall 2014. Bhavana Dalvi Mishra, PhD student, LTI, CMU


1 Clustering: K-Means. Machine Learning 10-601, Fall 2014. Bhavana Dalvi Mishra, PhD student, LTI, CMU. Slides are based on materials from Prof. Eric Xing, Prof. William Cohen and Prof. Andrew Ng.

2 Outline. What is clustering? How are similarity measures defined? Different clustering algorithms: K-Means, Gaussian Mixture Models, Expectation Maximization. Advanced topics: How to seed clustering? How to choose #clusters? Application: Gloss finding for a Knowledge Base.

3 Clustering

4 Classification vs. Clustering: supervision available vs. unsupervised. Learning from supervised data: example classifications are given. Unsupervised learning: learning from raw unlabeled data.

5 Clustering: the process of grouping a set of objects into clusters with high intra-cluster similarity and low inter-cluster similarity. How many clusters? How to identify them?

6 Applications of Clustering. Google News: clusters news stories from different sources about the same event. Computational biology: group genes that perform the same functions. Social media analysis: group individuals that have similar political views. Computer graphics: identify similar objects from pictures.

7 Examples: People, Images, Species

8 What is a natural grouping among these objects?

9 Similarity Measures

10 What is Similarity? Hard to define! But we know it when we see it. The real meaning of similarity is a philosophical question; it depends on representation and algorithm. For many representations/algorithms, it is easier to think in terms of a distance rather than a similarity between vectors.

11 Intuitions behind desirable distance measure properties. D(A,B) = D(B,A): Symmetry. Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex". D(A,A) = 0: Constancy of Self-Similarity. Otherwise you could claim "Alex looks more like Bob than Bob does". D(A,B) = 0 iff A = B: Identity of indiscernibles. Otherwise there are objects in your world that are different, but you cannot tell apart. D(A,B) ≤ D(A,C) + D(B,C): Triangular Inequality. Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl".

12 Intuitions behind desirable distance measure properties. D(A,B) = D(B,A): Symmetry. Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex". D(A,A) = 0: Constancy of Self-Similarity. Otherwise you could claim "Alex looks more like Bob than Bob does". D(A,B) = 0 iff A = B: Identity of indiscernibles. Otherwise there are objects in your world that are different, but you cannot tell apart. D(A,B) ≤ D(A,C) + D(B,C): Triangular Inequality. Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl".

13 Distance Measures: Minkowski Metric. Suppose two objects $x$ and $y$ both have $p$ features: $x = (x_1, x_2, \ldots, x_p)$ and $y = (y_1, y_2, \ldots, y_p)$. The Minkowski metric is defined by $d(x, y) = \left( \sum_{i=1}^{p} |x_i - y_i|^r \right)^{1/r}$. Most common Minkowski metrics: $r = 2$, Euclidean distance $d(x,y) = \sqrt{\sum_{i=1}^{p} |x_i - y_i|^2}$; $r = 1$, Manhattan distance $d(x,y) = \sum_{i=1}^{p} |x_i - y_i|$; $r = \infty$, "sup" distance $d(x,y) = \max_i |x_i - y_i|$.

14 An Example. Two points x and y whose coordinates differ by 4 and 3 in the two dimensions. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$. Manhattan distance: $4 + 3 = 7$. "sup" distance: $\max\{4, 3\} = 4$.
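To make the metrics above concrete, here is a minimal NumPy sketch (mine, not from the slides) of the Minkowski family; on two hypothetical points whose coordinates differ by 4 and 3 it reproduces the Euclidean, Manhattan and "sup" values 5, 7 and 4 from the example.

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance d(x, y) = (sum_i |x_i - y_i|^r)^(1/r)."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if np.isinf(r):                      # r -> infinity gives the "sup" (max) distance
        return diff.max()
    return (diff ** r).sum() ** (1.0 / r)

x, y = np.array([0.0, 0.0]), np.array([4.0, 3.0])   # coordinates differ by 4 and 3
print(minkowski(x, y, 2))        # Euclidean: sqrt(16 + 9) = 5.0
print(minkowski(x, y, 1))        # Manhattan: 4 + 3 = 7.0
print(minkowski(x, y, np.inf))   # "sup": max{4, 3} = 4.0
```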

15 Hamming distance. Manhattan distance is called Hamming distance when all features are binary. Example: gene expression levels under 17 conditions (1 = High, 0 = Low) for Gene A and Gene B. Hamming distance = #(0,1 mismatches) + #(1,0 mismatches), i.e., the number of positions at which the two binary vectors differ.

16 Similarity Measures: Correlation Coefficient. [Figure: expression levels of Gene A and Gene B over time, in three panels showing negatively correlated, uncorrelated, and positively correlated genes.]

17 Similarity Measures: Correlation Coefficient. Pearson correlation coefficient: $s(x, y) = \frac{\sum_{i=1}^{p} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2 \sum_{i=1}^{p} (y_i - \bar{y})^2}}$, where $\bar{x} = \frac{1}{p}\sum_{i=1}^{p} x_i$ and $\bar{y} = \frac{1}{p}\sum_{i=1}^{p} y_i$. Special case: cosine distance, $s(x, y) = \frac{x^\top y}{\|x\| \cdot \|y\|}$.
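The correlation-based similarity on this slide is easy to compute directly; below is a small NumPy sketch (my own, with hypothetical gene vectors) showing that the Pearson coefficient is just the cosine similarity of the mean-centered vectors.

```python
import numpy as np

def cosine_similarity(x, y):
    """s(x, y) = x.y / (||x|| ||y||)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

def pearson(x, y):
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return cosine_similarity(x - x.mean(), y - y.mean())

gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]      # roughly 2 * gene_a: positively correlated
print(pearson(gene_a, gene_b))          # close to +1
```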

18 Clustering Algorithm: K-Means

19 K-means Clustering: Step 1

20 K-means Clustering: Step 2

21 K-means Clustering: Step 3

22 K-means Clustering: Step 4

23 K-means Clustering: Step 5

24 K-Means: Algorithm. 1. Decide on a value for K. 2. Initialize the K cluster centers (randomly, if necessary). 3. Repeat until no object changes its cluster assignment: decide the cluster memberships of the N objects by assigning them to the nearest cluster centroid, $\text{cluster}(x_i) = \arg\min_j d(x_i, \mu_j)$; then re-estimate the K cluster centers, assuming the memberships found above are correct.
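The three steps of the algorithm map directly onto a few lines of NumPy. The sketch below is my own minimal implementation (function and variable names are hypothetical, and empty clusters are simply re-seeded from a random point), not the code used in the lecture.

```python
import numpy as np

def kmeans(X, k, max_iters=100, rng=None):
    """Lloyd's K-Means: returns (centroids, labels)."""
    rng = np.random.default_rng(0) if rng is None else rng
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step 2
    labels = np.full(len(X), -1)
    for _ in range(max_iters):                                              # step 3
        # assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # stop when nothing changes
            break
        labels = new_labels
        # re-estimate each centroid as the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            centroids[j] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
    return centroids, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centers, assignment = kmeans(X, k=2)
```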

25 K-Means is widely used in practice. Extremely fast and scalable: used in a variety of applications. Can be easily parallelized; easy Map-Reduce implementation. Mapper: assigns each datapoint to the nearest cluster. Reducer: takes all points assigned to a cluster, and re-computes the centroids. Sensitive to starting points or random seed initialization (similar to Neural Networks). There are extensions like K-Means++ that try to solve this problem.
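To illustrate the Map-Reduce formulation mentioned above, here is a toy, single-process sketch of one K-Means iteration written in map/reduce style; it is my own illustration under the stated mapper/reducer split, not an actual Hadoop or Spark job.

```python
import numpy as np
from collections import defaultdict

def mapper(point, centroids):
    """Emit (index of the nearest centroid, point)."""
    j = int(np.argmin(np.linalg.norm(centroids - point, axis=1)))
    return j, point

def reducer(pairs, k):
    """Group points by cluster id and recompute each centroid as the mean."""
    groups = defaultdict(list)
    for j, p in pairs:
        groups[j].append(p)
    return np.array([np.mean(groups[j], axis=0) for j in range(k)])

# one iteration over a toy dataset
X = np.array([[0.0, 0.0], [0.5, 0.2], [5.0, 5.0], [5.2, 4.8]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
centroids = reducer((mapper(x, centroids) for x in X), k=2)
```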

26 Outliers

27 Clustering Algorithm: Gaussian Mixture Model

28 Density estimation. Estimate the density function P(X) given unlabeled datapoints X_1 to X_N. Example: an aircraft testing facility measures Heat and Vibration parameters for every newly built aircraft.

29 Mixture of Gaussians

30 Mixture Models. A density model p(x) may be multi-modal. We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians). Each mode may correspond to a different sub-population (e.g., male and female).

31 Gaussian Mixture Models (GMMs). Consider a mixture of K Gaussian components: $p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)$, where $\pi_k$ is the mixture proportion and $N(x \mid \mu_k, \Sigma_k)$ is the mixture component. This model can be used for unsupervised clustering. This model (fit by AutoClass) has been used to discover new kinds of stars in astronomical data, etc.
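Evaluating the mixture density $p(x) = \sum_k \pi_k N(x \mid \mu_k, \Sigma_k)$ is straightforward; the sketch below (assuming SciPy is available, with hand-picked illustrative parameters rather than fitted ones) shows a two-component example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative two-component mixture in 2-D
pis    = np.array([0.4, 0.6])                        # mixture proportions, sum to 1
mus    = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]

def gmm_density(x):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=sig)
               for pi, mu, sig in zip(pis, mus, sigmas))

print(gmm_density([0.0, 0.0]))   # dominated by the first component
print(gmm_density([4.0, 4.0]))   # dominated by the second component
```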

32 Learning mixture models. In fully observed i.i.d. settings, the log likelihood decomposes into a sum of local terms: $\ell_c(\theta; D) = \sum_i \log p(x_i, z_i \mid \theta) = \sum_i \left( \log p(z_i \mid \theta_z) + \log p(x_i \mid z_i, \theta_x) \right)$. With latent variables, all the parameters become coupled together via marginalization: $\ell(\theta; D) = \sum_i \log \sum_{z_i} p(x_i, z_i \mid \theta) = \sum_i \log \sum_{z_i} p(z_i \mid \theta_z)\, p(x_i \mid z_i, \theta_x)$.

33 MLE for GMM. If we are doing MLE for completely observed data (Gaussian Naïve Bayes), the data log-likelihood is $\ell(\theta; D) = \log \prod_n p(z_n, x_n) = \sum_n \log p(z_n \mid \pi) + \sum_n \log N(x_n \mid \mu_{z_n}, \sigma^2)$. The MLE estimates are $\hat{\pi}_{k,\mathrm{MLE}} = \arg\max_{\pi} \ell(\theta; D)$, $\hat{\mu}_{k,\mathrm{MLE}} = \arg\max_{\mu} \ell(\theta; D)$, $\hat{\sigma}_{k,\mathrm{MLE}} = \arg\max_{\sigma} \ell(\theta; D)$; in closed form, $\hat{\pi}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k}{\text{number of datapoints}}$ and $\hat{\mu}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k x_n}{\sum_n z_n^k}$.

34 Learning GMM (z's are unknown)

35 Expectation Maximization (EM)

36 Expectation-Maximization (EM). Start: "Guess" the mean and covariance of each of the K Gaussians. Loop.

37

38 Expectation-Maximization (EM). Start: "Guess" the centroid and covariance of each of the K clusters. Loop.

39 The Expectation-Maximization (EM) Algorithm. E Step: guess values of the Z's: $w_j^{(t)}(x_i) = P(Z_i = j \mid x_i, \theta^{(t)}) = \frac{p(x_i \mid Z_i = j, \theta^{(t)})\, P(Z_i = j \mid \theta^{(t)})}{\sum_l p(x_i \mid Z_i = l, \theta^{(t)})\, P(Z_i = l \mid \theta^{(t)})}$, where $p(x_i \mid Z_i = j, \theta^{(t)}) = N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})$.

40 The Expectation-Maximization (EM) Algorithm. M Step: update the parameter estimates: $\pi_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)}{\#\,\text{datapoints}}$, $\mu_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)\, x_i}{\sum_i w_j^{(t)}(x_i)}$, $\Sigma_j^{(t+1)} = \frac{\sum_i w_j^{(t)}(x_i)\,(x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^\top}{\sum_i w_j^{(t)}(x_i)}$.

41 EM Algorithm for GMM. E Step: guess values of the Z's: $w_j^{(t)}(x_i) = P(Z_i = j \mid x_i, \theta^{(t)}) = \frac{\pi_j^{(t)}\, N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}{\sum_l \pi_l^{(t)}\, N(x_i \mid \mu_l^{(t)}, \Sigma_l^{(t)})}$. M Step: update the parameter estimates $\pi_j^{(t+1)}$, $\mu_j^{(t+1)}$, $\Sigma_j^{(t+1)}$ as on the previous slide.
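The E and M updates on the last three slides translate almost line-for-line into NumPy. The sketch below is my own minimal implementation (full covariances, a fixed number of iterations, a small ridge on the covariances, no convergence check), assuming SciPy for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, iters=50, rng=None):
    """Fit a k-component GMM with EM; returns (pis, mus, sigmas, responsibilities)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = X.shape
    pis = np.full(k, 1.0 / k)
    mus = X[rng.choice(n, size=k, replace=False)].astype(float)
    sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(iters):
        # E step: responsibilities w_j(x_i) proportional to pi_j * N(x_i | mu_j, Sigma_j)
        w = np.column_stack([pis[j] * multivariate_normal.pdf(X, mus[j], sigmas[j])
                             for j in range(k)])
        w /= w.sum(axis=1, keepdims=True)
        # M step: update pi, mu, Sigma from the weighted data
        nk = w.sum(axis=0)
        pis = nk / n
        mus = (w.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mus[j]
            sigmas[j] = (w[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return pis, mus, sigmas, w

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])
pis, mus, sigmas, resp = em_gmm(X, k=2)
```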

42 K-means is a hard version of EM. In the K-means E-step we do hard assignment: $z_i^{(t)} = \arg\min_k \,(x_i - \mu_k^{(t)})^\top \left(\Sigma_k^{(t)}\right)^{-1} (x_i - \mu_k^{(t)})$. In the K-means M-step we update the means as the weighted sum of the data, but now the weights are 0 or 1: $\mu_k^{(t+1)} = \frac{\sum_i \delta(z_i^{(t)}, k)\, x_i}{\sum_i \delta(z_i^{(t)}, k)}$.

43 Soft vs. Hard EM assignments: GMM vs. K-Means

44 Theory underlying EM. What are we doing? Recall that according to MLE, we intend to learn the model parameters that would maximize the likelihood of the data. But we do not observe z, so computing $\ell(\theta; D) = \sum_i \log \sum_{z_i} p(x_i, z_i \mid \theta) = \sum_i \log \sum_{z_i} p(z_i \mid \theta_z)\, p(x_i \mid z_i, \theta_x)$ is difficult! What shall we do?

45 Intuition behind the EM algorithm

46 Jensen's Inequality. For a convex function f: $f(E[x]) \le E[f(x)]$. Similarly, for a concave function f: $f(E[x]) \ge E[f(x)]$.

47 Jensen's Inequality: concave f: $f(E[x]) \ge E[f(x)]$.

48 EM and Jensen's Inequality: $f(E[x]) \ge E[f(x)]$, applied with f = log, which is concave.
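The role Jensen's inequality plays in EM can be written out in a couple of lines: for any distribution $q(z)$ over the latent variables, concavity of the log gives a lower bound on the intractable log-likelihood,
$$
\ell(\theta; D) = \sum_i \log \sum_{z_i} q(z_i)\,\frac{p(x_i, z_i \mid \theta)}{q(z_i)} \;\ge\; \sum_i \sum_{z_i} q(z_i)\,\log \frac{p(x_i, z_i \mid \theta)}{q(z_i)},
$$
with equality when $q(z_i) = p(z_i \mid x_i, \theta)$. The E step tightens the bound by setting $q$ to this posterior, and the M step maximizes the bound over $\theta$.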

49 Advanced Topics

50 How Many Clusters? Either the number of clusters K is given (e.g., partition documents into a predetermined number of topics), or solve an optimization problem that penalizes the number of clusters; information-theoretic approaches use the AIC or BIC criteria for model selection. Tradeoff between having clearly separable clusters and having too many clusters.
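One concrete version of the information-theoretic option is to fit a GMM for each candidate K and keep the K with the lowest BIC. A hedged sketch, assuming scikit-learn is available (its GaussianMixture estimator exposes a bic method):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4.0])

bics = {}
for k in range(1, 7):                        # candidate numbers of clusters
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gmm.bic(X)                     # lower BIC = better fit/complexity tradeoff

best_k = min(bics, key=bics.get)
print(bics, "-> chosen K =", best_k)
```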

51 Seed Choice: K-Means++. K-Means results can vary based on random seed selection. K-Means++: 1. Choose one center uniformly at random among the given datapoints. 2. For each data point x, compute D(x) = distance(x, nearest chosen center). 3. Choose one new data point at random as a new center, with probability proportional to D(x)^2. 4. Repeat steps 2 and 3 until K centers have been chosen. 5. Run standard K-Means with this centroid initialization.
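The seeding procedure above is a few lines of NumPy; the sketch below is my own transcription of it, and the resulting centers would then be passed to standard K-Means (e.g., the kmeans sketch given earlier).

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """K-Means++ seeding: sample new centers with probability proportional to D(x)^2."""
    rng = np.random.default_rng(0) if rng is None else rng
    centers = [X[rng.integers(len(X))]]              # step 1: first center uniformly
    while len(centers) < k:                          # repeat steps 2 and 3
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                        # P(x) proportional to D(x)^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
seeds = kmeans_pp_init(X, k=2)
```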

52 Semi-supervised K-Means

53 Supervised Learning vs. Unsupervised Learning vs. Semi-supervised Learning

54 Automatic Gloss Finding for a Knowledge Base. Glosses: natural-language definitions of named entities, e.g., "Microsoft is an American multinational corporation headquartered in Redmond that develops, manufactures, licenses, supports and sells computer software, consumer electronics and personal computers and services..." Input: a Knowledge Base, i.e., a set of concepts (e.g., Company) and entities belonging to those concepts (e.g., Microsoft), and a set of potential glosses. Output: candidate glosses matched to the relevant entities in the KB, e.g., "Microsoft is an American multinational corporation headquartered in Redmond" is mapped to the entity Microsoft of type Company. [Automatic Gloss Finding for a Knowledge Base using Ontological Constraints, Bhavana Dalvi Mishra, Einat Minkov, Partha Pratim Talukdar, and William W. Cohen, 2014, under submission]

55 Example: Gloss finding

56 Example: Gloss finding

57 Example: Gloss finding

58 Example: Gloss finding

59 Training a clustering model (clusters: Fruit, Company). Train: unambiguous glosses. Test: ambiguous glosses.

60 GLOFIN: Clustering glosses

61 GLOFIN: Clustering glosses

62 GLOFIN: Clustering glosses

63 GLOFIN: Clustering glosses

64 GLOFIN: Clustering glosses

65 GLOFIN: Clustering glosses

66 GLOFIN on NELL Dataset. [Bar chart: Precision, Recall and F1 for SVM, Label Propagation and GLOFIN.] Dataset: 247K candidate glosses across multiple categories, #train = 20K, #test = 227K.

67 GLOFIN on Freebase Dataset. [Bar chart: Precision, Recall and F1 for SVM, Label Propagation and GLOFIN.] Dataset: 285K candidate glosses across multiple categories, #train = 25K, #test = 260K.

68 Summary. What is clustering? What are similarity measures? K-Means clustering algorithm. Mixture of Gaussians (GMM). Expectation Maximization. Advanced topics: how to seed clustering, how to decide #clusters. Application: Gloss finding for a Knowledge Base.

69 Thank You. Questions?
