Machine Learning: knowledge acquisition, skill refinement. Relation between machine learning and data mining. P. Berka, 2018


Machine Learning

"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience." (Mitchell, 1997)

"Things learn when they change their behavior in a way that makes them perform better in the future." (Witten, Frank, 1999)

Types of learning: knowledge acquisition, skill refinement.

Relation between machine learning and data mining.

[Figure: General scheme of a learning system — object descriptions pass through selection and representation into the learning component, which produces knowledge used by the decision-making component to assign decisions to objects.]

Learning methods:
1. rote learning,
2. learning from instruction (learning by being told),
3. learning by analogy, instance-based learning, lazy learning,
4. explanation-based learning,
5. learning from examples,
6. learning from observation and discovery.

Learning methods:
- statistical methods: regression methods, discriminant analysis, cluster analysis,
- symbolic machine learning methods: decision trees and rules, case-based reasoning (CBR),
- sub-symbolic machine learning methods: neural networks, Bayesian networks, or genetic algorithms.

Feedback during learning:
- pre-classified examples (supervised learning),
- a small number of pre-classified examples and a large number of examples without a known class (semi-supervised learning),
- the algorithm can query the teacher for class membership of unclassified examples (active learning),
- hints derived from the teacher's behavior (apprenticeship learning),
- no feedback (unsupervised learning).

Representation of examples:
1. attributes: categorial (binary, nominal, ordinal) and numeric
   [hair=black & height=180 & beard=yes & education=univ]
2. relations
   father(jan_lucembursky, karel_IV)

Algorithms:
- batch: all examples are processed at once,
- incremental: examples are processed one after another; the system can be re-trained.

Learning methods:
- empirical: uses a large set of (training) examples and limited (or no) background knowledge,
- analytic: uses large background knowledge and several (one or even no) illustrative examples.

Principles of empirical concept learning

1. Examples of the same class have similar characteristics (similarity-based learning), i.e. examples of the same class create clusters in the attribute space. The goal of learning is to find and represent these clusters.

"Garbage in, garbage out" problem: hence the importance of data understanding and preprocessing.

2. General knowledge is inferred from a finite set of examples (inductive learning).

Examples are divided into 2 (or 3) sets:
- training set, to build a model,
- (validation set, to tune the parameters,)
- testing set, to test the model.

General definition of (supervised) machine learning

Analyzed data:

$$D: \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$

Rows of the table represent objects (examples, instances); columns of the table correspond to attributes (variables). When we add a target attribute to the data table, we obtain data suitable for supervised learning methods (so-called training data):

$$D_{TR}: \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} & y_1 \\ x_{21} & x_{22} & \cdots & x_{2m} & y_2 \\ \vdots & & & \vdots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} & y_n \end{pmatrix}$$

Classification task: find knowledge (represented by a decision function $f$) that assigns a value of the target attribute $y$ to an object described by the values of the input attributes $\mathbf{x}$: $f: \mathbf{x} \to y$.

During classification we infer, from the values of the input attributes of an object, the value of the target attribute: $\hat{y} = f(\mathbf{x})$. The derived value $\hat{y}$ can differ from the real value $y$. We can thus compute for every object $o_i \in D_{TR}$ the classification error $Q_f(o_i, \hat{y}_i)$:

for a numeric attribute C, e.g. as
$$Q_f(o_i, \hat{y}_i) = (y_i - \hat{y}_i)^2,$$

for a categorial attribute C, e.g. as
$$Q_f(o_i, \hat{y}_i) = \begin{cases} 1 & \text{iff } y_i \neq \hat{y}_i \\ 0 & \text{iff } y_i = \hat{y}_i \end{cases}$$

We can compute the overall error $Err(f, D_{TR})$ for the whole training set $D_{TR}$, e.g. as the mean error:
$$Err(f, D_{TR}) = \frac{1}{n} \sum_{i=1}^{n} Q_f(o_i, \hat{y}_i)$$

The goal of learning is to find such knowledge $f^*$ that minimizes this error:
$$Err(f^*, D_{TR}) = \min_f Err(f, D_{TR}).$$
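To make these definitions concrete, here is a minimal Python sketch of the per-object loss $Q_f$ and the mean training error; the function names and the toy constant classifier are illustrative assumptions, not part of the lecture.

```python
def q_numeric(y, y_hat):
    """Squared loss Q_f for a numeric target attribute."""
    return (y - y_hat) ** 2

def q_categorial(y, y_hat):
    """0/1 loss Q_f for a categorial target attribute."""
    return 0 if y == y_hat else 1

def err(f, d_tr, q):
    """Mean classification error of f over the training set D_TR."""
    return sum(q(y, f(x)) for x, y in d_tr) / len(d_tr)

# Toy usage: a classifier that always answers "yes" errs on 1 of 3 objects.
d_tr = [([1, 0], "yes"), ([0, 1], "no"), ([1, 1], "yes")]
print(err(lambda x: "yes", d_tr, q_categorial))  # 0.333...
```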

1. Learning as search

Looking for both the structure and the parameters of the model.

Models as cluster descriptions:
- MGM: most general model (one cluster for all examples),
- MSM: most specific model(s) (each example creates a cluster),
- M1 is more general than M2; M2 is more specific than M1.

The number of possible clusterings of n examples is given by the Bell numbers:
$$B(n+1) = \sum_{k=0}^{n} \binom{n}{k} B(n-k), \qquad B(0) = 1$$

n:    1  2  3  4   5   10
B(n): 1  2  5  15  52  115975
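As a quick check of the recurrence, a small Python sketch (the function name is mine) that reproduces the table above:

```python
from math import comb

def bell(n):
    """Bell number B(n) via B(m+1) = sum_{k=0}^{m} C(m, k) * B(k),
    with B(0) = 1 (equivalent to the B(m-k) form by symmetry of C)."""
    b = [1]                               # b[0] = B(0)
    for m in range(n):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[n]

print([bell(n) for n in (1, 2, 3, 4, 5, 10)])  # [1, 2, 5, 15, 52, 115975]
```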

Search methods:

Direction:
- top-down (from general to specific models),
- bottom-up (from specific to general models).

Strategy:
- blind (we consider every possibility of how to specialize/generalize the given model),
- heuristic (we use some criterion to select only the best possibilities of how to specialize/generalize the given model),
- random.

Bandwidth:
- single (we consider only one transformation of the actual model),
- parallel (we consider more transformations).

Example: Let us assume that both the input attributes and the target attribute are categorial; let us call a value of an attribute a category:

1. an atomic formula that expresses a property of object $o_i$:
$$A_j(v_k)(o_i) = \begin{cases} 1 & \text{for } x_{ij} = v_k \\ 0 & \text{for } x_{ij} \neq v_k \end{cases}$$

2. the set of objects that fulfill the given property:
$$A_j(v_k) = \{ o_i : x_{ij} = v_k \}$$

Combinations are created from categories using logical AND:
$$Comb = [A_{j_1}(v_{k_1}), A_{j_2}(v_{k_2}), \ldots, A_{j_l}(v_{k_l})] = A_{j_1}(v_{k_1}) \wedge A_{j_2}(v_{k_2}) \wedge \ldots \wedge A_{j_l}(v_{k_l})$$

1. $Comb(o_i) = 1$ if $x_{ij_1} = v_{k_1} \wedge x_{ij_2} = v_{k_2} \wedge \ldots \wedge x_{ij_l} = v_{k_l}$, else 0;
2. $Comb = \{ o_i : x_{ij_1} = v_{k_1} \wedge \ldots \wedge x_{ij_l} = v_{k_l} \}$.

Comb covers object $o_i$ iff $Comb(o_i) = 1$.

We can create supercombinations by adding categories to a combination and subcombinations by removing categories from a combination.
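A minimal Python sketch of a combination as a conjunction of attribute-value tests; the dictionary representation and the English attribute names are assumptions made for illustration.

```python
def covers(comb, example):
    """Comb covers object o_i iff Comb(o_i) = 1, i.e. every category
    A_j(v_k) in the combination matches the example's value."""
    return all(example[attr] == value for attr, value in comb.items())

example = {"income": "high", "account": "high", "sex": "female",
           "unemployed": "no", "car": "yes", "housing": "own"}

comb = {"income": "high", "account": "high"}  # [A_income(high), A_account(high)]
sub  = {"income": "high"}                     # a subcombination: more general
print(covers(comb, example), covers(sub, example))  # True True
```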

Partial ordering between combinations:

If combination Comb1 is a subcombination of combination Comb2, then Comb1 is more general than Comb2 and Comb2 is more specific than Comb1.

If combination Comb1 is more general than combination Comb2, then Comb1 covers at least all objects that are covered by Comb2 (downward-closure property).

The resulting knowledge will be represented by combinations that cover only examples of the given class. Combination Comb is consistent for class $C(v_t)$ iff it covers only examples of this single class:
$$Comb(o_i) = 1 \;\Rightarrow\; y_i = v_t$$

Example data $D_{TR}$:

income | account | sex    | unemployed | car | housing | loan
high   | high    | female | no         | yes | own     | yes
high   | high    | male   | no         | yes | own     | yes
low    | low     | male   | no         | yes | rented  | no
high   | high    | male   | no         | no  | rented  | yes

A combination Comb (a hypothesis representing the concept "loan") can contain the following values for an attribute:
- "?" to indicate that the value of this attribute is irrelevant,
- a value of the attribute,
- "∅" to indicate that no value of this attribute is applicable.
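Continuing the sketch, here is the consistency test over the loan table above; rows as dictionaries are again an assumed representation.

```python
def covers(comb, row):
    """Comb covers a row iff all its attribute-value tests match."""
    return all(row[a] == v for a, v in comb.items())

def consistent(comb, table, target="loan"):
    """Comb is consistent iff all rows it covers share one target class."""
    classes = {row[target] for row in table if covers(comb, row)}
    return len(classes) <= 1

table = [
    {"income": "high", "account": "high", "sex": "female",
     "unemployed": "no", "car": "yes", "housing": "own", "loan": "yes"},
    {"income": "high", "account": "high", "sex": "male",
     "unemployed": "no", "car": "yes", "housing": "own", "loan": "yes"},
    {"income": "low", "account": "low", "sex": "male",
     "unemployed": "no", "car": "yes", "housing": "rented", "loan": "no"},
    {"income": "high", "account": "high", "sex": "male",
     "unemployed": "no", "car": "no", "housing": "rented", "loan": "yes"},
]
print(consistent({"income": "high"}, table))    # True: covers only "yes" rows
print(consistent({"unemployed": "no"}, table))  # False: covers both classes
```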

[Figure: Hypothesis space — the lattice of combinations for the loan data, ordered from the most general hypothesis [?, ?, ?, ?, ?, ?] at the top, through partially specialized combinations such as [high, ?, ?, ?, ?, ?], [?, high, ?, ?, ?, ?] and [high, high, ?, no, ?, ?], down to the most specific hypothesis [∅, ∅, ∅, ∅, ∅, ∅] at the bottom.]

We can traverse the hypothesis space using two methods:
- from general to specific (top-down, specialization),
- from specific to general (bottom-up, generalization).

Find-S algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training example x:
   2.1. For each attribute a_i in hypothesis h:
        if the value of attribute a_i does not correspond to x, replace the value of a_i by the next more general value that does correspond to x.
3. Output h.

On the loan data, Find-S ends with
S: [high, high, ?, no, ?, ?]

[Figure: the trace of Find-S through the hypothesis lattice, generalizing step by step from the first positive example up to [high, high, ?, no, ?, ?].]
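A minimal Python sketch of Find-S for the ?/value/∅ representation used above; the attribute order, the English-translated values, the "yes"/"no" labels, and the use of "0" for the symbol ∅ are assumptions made for illustration.

```python
def find_s(examples):
    """examples: list of (attribute_vector, class_label) pairs."""
    n = len(examples[0][0])
    h = ["0"] * n                       # most specific hypothesis in H
    for x, label in examples:
        if label != "yes":              # Find-S ignores negative examples
            continue
        for i, a in enumerate(x):
            if h[i] == "0":
                h[i] = a                # first positive example: copy values
            elif h[i] != a:
                h[i] = "?"              # next more general value covering x
    return h

# The loan data from the example table; attribute order: income, account,
# sex, unemployed, car, housing.
data = [(["high", "high", "female", "no", "yes", "own"], "yes"),
        (["high", "high", "male", "no", "yes", "own"], "yes"),
        (["low", "low", "male", "no", "yes", "rented"], "no"),
        (["high", "high", "male", "no", "no", "rented"], "yes")]
print(find_s(data))  # ['high', 'high', '?', 'no', '?', '?'] -- the S above
```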

Candidate-Elimination algorithm:
1. Initialize G to the set of maximally general hypotheses in H.
2. Initialize S to the set of maximally specific hypotheses in H.
3. For each example x:
   3.1. If x is a positive example:
        - remove from G any hypothesis inconsistent with x,
        - for each hypothesis s in S that is not consistent with x:
          - remove s from S,
          - add to S all minimal generalizations h of s such that h is consistent with x and some member of G is more general than h,
          - remove from S any hypothesis that is more general than another hypothesis in S.
   3.2. If x is a negative example:
        - remove from S any hypothesis inconsistent with x,
        - for each hypothesis g in G that is not consistent with x:
          - remove g from G,
          - add to G all minimal specializations h of g such that h is consistent with x and some member of S is more specific than h,
          - remove from G any hypothesis that is more specific than another hypothesis in G.

On the loan data:
G: [high, ?, ?, ?, ?, ?], [?, high, ?, ?, ?, ?], [high, ?, ?, no, ?, ?], [high, high, ?, ?, ?, ?], [?, high, ?, no, ?, ?]
S: [high, high, ?, no, ?, ?]
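A compact Python sketch of Candidate-Elimination for this representation, following the steps above. The attribute domains read off the data, the English value names, and "0" standing for ∅ are assumptions; note that the sketch prunes non-maximal members of G, so it returns a subset of the G hypotheses listed above, while S matches exactly.

```python
def covers(h, x):
    """h covers x iff every position is '?' or equals the example's value."""
    return all(a == "?" or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    """h1 is more general than (or equal to) h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    G = [tuple("?" for _ in range(n))]    # maximally general boundary
    S = [tuple("0" for _ in range(n))]    # maximally specific boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            gen = []
            for s in S:
                if covers(s, x):
                    gen.append(s)
                else:                     # minimal generalization of s
                    h = tuple(b if a == "0" else (a if a == b else "?")
                              for a, b in zip(s, x))
                    if any(more_general(g, h) for g in G):
                        gen.append(h)
            S = [s for s in gen           # drop non-minimal members
                 if not any(t != s and more_general(s, t) for t in gen)]
        else:
            S = [s for s in S if not covers(s, x)]
            spec = []
            for g in G:
                if not covers(g, x):
                    spec.append(g)
                    continue
                for i in range(n):        # minimal specializations of g
                    if g[i] != "?":
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if any(more_general(h, s) for s in S):
                                spec.append(h)
            G = [g for g in spec          # drop non-maximal members
                 if not any(t != g and more_general(t, g) for t in spec)]
    return G, S

# The loan data; attribute order: income, account, sex, unemployed,
# car, housing.
domains = [("high", "low"), ("high", "low"), ("female", "male"),
           ("no",), ("yes", "no"), ("own", "rented")]
data = [(("high", "high", "female", "no", "yes", "own"), True),
        (("high", "high", "male", "no", "yes", "own"), True),
        (("low", "low", "male", "no", "yes", "rented"), False),
        (("high", "high", "male", "no", "no", "rented"), True)]
G, S = candidate_elimination(data, domains)
print(S)  # [('high', 'high', '?', 'no', '?', '?')] -- the S boundary above
print(G)  # the maximal members of the G boundary
```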

2. Learning as approximation

Looking only for the parameters of the model.

Example: using a finite number of data points, find the parameters of a (general) function to best fit the data $y = f(x)$:
$$f(x) = q_1 x + q_0$$

Least squares method: the problem of finding the minimum of the overall error
$$\min \sum_i (y_i - f(x_i))^2$$
is transformed into solving the equation
$$\frac{d}{dq} \sum_i (y_i - f(x_i))^2 = 0.$$

Solution:

1) analytic (we know the type of the function): solving the equations for the parameters of the function
$$q_0 = \frac{(\sum_k y_k)(\sum_k x_k^2) - (\sum_k x_k y_k)(\sum_k x_k)}{n \sum_k x_k^2 - (\sum_k x_k)^2}$$
$$q_1 = \frac{n \sum_k x_k y_k - (\sum_k x_k)(\sum_k y_k)}{n \sum_k x_k^2 - (\sum_k x_k)^2}$$

2) numeric (we do not know the type of the function): gradient methods
$$\nabla Err(\mathbf{q}) = \left[ \frac{\partial Err}{\partial q_0}, \frac{\partial Err}{\partial q_1}, \ldots, \frac{\partial Err}{\partial q_Q} \right]$$

Modification of the knowledge $\mathbf{q} = [q_0, q_1, \ldots, q_Q]$ according to the algorithm
$$q_j \leftarrow q_j + \Delta q_j, \qquad \Delta q_j = -\eta \frac{\partial Err}{\partial q_j},$$
where $\eta$ is a parameter expressing the step used to approach the minimum of the function Err.
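The analytic formulas translate directly into Python; a minimal sketch (in practice a library routine such as numpy.polyfit computes the same fit):

```python
def fit_line(xs, ys):
    """Return (q0, q1) minimizing sum_k (y_k - (q1*x_k + q0))^2,
    using the closed-form least-squares solution above."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    q0 = (sy * sxx - sxy * sx) / denom
    q1 = (n * sxy - sx * sy) / denom
    return q0, q1

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): y = 2x + 1
```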

E.g. for the error function
$$Err(f, D_{TR}) = \frac{1}{2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{2} \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))^2$$
and the expected function f as a linear combination of the inputs
$$f(\mathbf{x}) = \sum_j q_j x_j,$$
we can derive the gradient of the function Err as
$$\frac{\partial Err}{\partial q_j} = \frac{\partial}{\partial q_j} \frac{1}{2} \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \hat{y}_i) \frac{\partial}{\partial q_j} (y_i - \hat{y}_i) = \sum_i (y_i - \hat{y}_i) \frac{\partial}{\partial q_j} \Big( y_i - \sum_l q_l x_{il} \Big) = -\sum_i (y_i - \hat{y}_i)\, x_{ij}$$

So
$$\Delta q_j = \eta \sum_i (y_i - \hat{y}_i)\, x_{ij}.$$

Problem: possible convergence to a local minimum.
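A minimal sketch of the resulting batch gradient-descent (delta rule) update $q_j \leftarrow q_j + \eta \sum_i (y_i - \hat{y}_i) x_{ij}$; the toy data, learning rate, and epoch count are illustrative assumptions.

```python
def gradient_descent(data, eta=0.05, epochs=500):
    """Fit f(x) = sum_j q_j * x_j by repeated delta-rule updates."""
    n_params = len(data[0][0])
    q = [0.0] * n_params
    for _ in range(epochs):
        delta = [0.0] * n_params
        for x, y in data:
            y_hat = sum(qj * xj for qj, xj in zip(q, x))
            for j in range(n_params):
                delta[j] += eta * (y - y_hat) * x[j]   # eta * (y - y^) * x_ij
        q = [qj + dj for qj, dj in zip(q, delta)]
    return q

# Fit y = 2x + 1; the constant input x_0 = 1 encodes the intercept q_0.
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0),
        ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]
print(gradient_descent(data))  # approximately [1.0, 2.0]
```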