Data Informatics. Seon Ho Kim, Ph.D.


1 Data Informatics Seon Ho Kim, Ph.D.

2 What is Machine Learning? Overview slides by ETHEM ALPAYDIN

3 Why Learn? Learning: programming computers to optimize a performance criterion using example data or past experience. There is no need to learn what is already well understood, such as calculating payroll. Learning is used when: human expertise does not exist (navigating on Mars); humans are unable to explain their expertise (speech recognition); the solution changes over time (routing on a computer network); the solution needs to be adapted to particular cases (user biometrics).

4 What We Talk About When We Talk About Learning Learning a general model from data of particular examples. Data is cheap and abundant; knowledge is expensive and scarce. Example in retail: from customer transactions to consumer behavior, e.g., people who bought X also bought Y. Build a model that is a good and useful approximation to the data. Model: a system used as an example to follow or imitate.

5 What is Machine Learning? The study and construction of algorithms that can learn from and make predictions on data. Optimize a performance criterion using example data or past experience. Role of statistics: build mathematical models; inference from samples. Role of computer science: efficient algorithms to solve the optimization problem; representing and evaluating the model for inference.

6 Applications Association: auto-association, hetero-association. Supervised learning: classification (recognition), regression. Unsupervised learning: clustering (grouping). Reinforcement learning.

7 Association Basket analysis (example): find associations between products bought by customers. Learning a conditional probability $P(Y \mid X)$: the probability that somebody who buys X also buys Y, where X and Y are products or services. Example: $P(\text{chips} \mid \text{beer})$ = the fraction of customers who buy beer who also buy chips.
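
As a minimal sketch of the basket-analysis idea, the conditional probability can be estimated by counting: of the baskets that contain X, what fraction also contain Y? The transactions below are hypothetical toy data, and `conditional_probability` is an illustrative helper name, not part of the slides.

```python
# Hypothetical toy baskets; the product names are illustrative only.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"chips"},
    {"beer", "diapers"},
]


def conditional_probability(baskets, x, y):
    """Estimate P(y | x) as count(baskets with x and y) / count(baskets with x)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        raise ValueError(f"no basket contains {x!r}")
    with_both = [b for b in with_x if y in b]
    return len(with_both) / len(with_x)


p_chips_given_beer = conditional_probability(transactions, "beer", "chips")
print(p_chips_given_beer)  # 2 of the 4 beer baskets also contain chips -> 0.5
```

Note that the estimate is not symmetric: P(beer | chips) is computed over the baskets containing chips and generally differs from P(chips | beer).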

8 Classification Example: credit scoring. Differentiating between low-risk and high-risk customers from their income and savings. Discriminant: IF income > $\theta_1$ AND savings > $\theta_2$ THEN low-risk ELSE high-risk.
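
The discriminant above translates directly into a two-threshold rule. The sketch below assumes hypothetical threshold values; in practice the two thresholds would be learned from labeled customer data rather than fixed by hand.

```python
# Hypothetical thresholds (theta_1 for income, theta_2 for savings);
# the numbers are assumptions for illustration, not learned values.
THETA_INCOME = 40_000
THETA_SAVINGS = 10_000


def credit_risk(income, savings, theta1=THETA_INCOME, theta2=THETA_SAVINGS):
    """Apply the rule: IF income > theta1 AND savings > theta2 THEN low-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


print(credit_risk(55_000, 20_000))  # both thresholds exceeded
print(credit_risk(55_000, 5_000))   # savings too low
```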

9 Classification: Applications Pattern recognition. Character recognition: different handwriting styles. Face recognition: pose, lighting, occlusion, make-up, hair style. Speech recognition: temporal dependency; use of a dictionary or the syntax of the language. Sensor fusion: combine multiple modalities, e.g., visual (lip image) and acoustic for speech. Medical diagnosis: from symptoms to illnesses...

10 Regression A statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Example: price of a used car, $y = wx + w_0$.
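
A minimal sketch of fitting the line y = wx + w_0, assuming ordinary least squares as the fitting criterion and a single input attribute. The car ages and prices are hypothetical toy values chosen to lie exactly on a line, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Return (w, w0) minimizing the sum of squared errors of y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


# Hypothetical data: car age in years vs. price in thousands.
ages = [1, 2, 3, 4, 5]
prices = [18, 16, 14, 12, 10]

w, w0 = fit_line(ages, prices)
print(w, w0)        # slope and intercept of the fitted line
print(w * 6 + w0)   # predicted price for a 6-year-old car
```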

11 Supervised Learning Supervised learning: learn a mapping from input to output, where the correct output values are provided by a supervisor (e.g., regression). Prediction of future cases: use the rule to predict the output for future inputs. Knowledge extraction: the rule is easy to understand. Compression: the rule is simpler than the data it explains. Outlier detection: exceptions that are not covered by the rule.

12 Unsupervised Learning Unsupervised learning: learning what normally happens. The aim is to find the regularities in the input. Density estimation: we want to see what generally happens and what does not. Clustering: grouping similar instances. Example applications: customer segmentation in CRM (customer relationship management); image compression (color quantization); bioinformatics (learning motifs).

13 Reinforcement Learning Learning a policy: a sequence of outputs. The output of the system is a sequence of actions. An action is good if it is part of a good policy. There is no supervised output, only a delayed reward. Examples: game playing; robot in a maze. Partial observability...

14 Supervised Learning Classification

15 Learning a Class from Examples Class C of a family car. Prediction: is car x a family car? Knowledge extraction: what do people expect from a family car? Output: positive (+) and negative (−) examples. Input representation example: $x_1$: price, $x_2$: engine power, the inputs to the class recognizer.

16 Training set X: $\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$, where $r = 1$ if $x$ is positive and $r = 0$ if $x$ is negative, and $x = [x_1, x_2]^T$.

17 Class C: $(p_1 \le \text{price} \le p_2)$ AND $(e_1 \le \text{engine power} \le e_2)$. The class of family car defined by the expert.

18 Hypothesis class H: the class of family car defined by the learning system. $h(x) = 1$ if $h$ classifies $x$ as positive, and $h(x) = 0$ if $h$ classifies $x$ as negative. Empirical error of $h$ on the training set $\mathcal{X}$: $E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\big(h(x^t) \ne r^t\big)$, where $1(a \ne b)$ is 1 if $a \ne b$ and 0 if $a = b$.
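
A minimal sketch of a hypothesis and its empirical error, assuming the hypothesis class is the set of axis-aligned rectangles in the (price, engine power) plane. The `Rectangle` class, the bounds, and the training examples are hypothetical illustrations, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    """Hypothesis h: positive iff p1 <= price <= p2 and e1 <= power <= e2."""
    p1: float
    p2: float
    e1: float
    e2: float

    def predict(self, x):
        price, power = x
        inside = self.p1 <= price <= self.p2 and self.e1 <= power <= self.e2
        return 1 if inside else 0


def empirical_error(h, training_set):
    """Count the training examples (x, r) on which h(x) differs from r."""
    return sum(1 for x, r in training_set if h.predict(x) != r)


# Hypothetical training set: ((price, engine power), label).
training_set = [
    ((15, 120), 1), ((18, 150), 1), ((22, 140), 1),
    ((8, 60), 0), ((40, 300), 0), ((20, 250), 0),
]

h_tight = Rectangle(p1=10, p2=25, e1=100, e2=200)
h_loose = Rectangle(p1=5, p2=45, e1=50, e2=320)

print(empirical_error(h_tight, training_set))  # classifies every example correctly
print(empirical_error(h_loose, training_set))  # accepts all three negatives
```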

19 S, G, and the Version Space S: the most specific hypothesis. G: the most general hypothesis. Any $h \in H$ between S and G is a valid hypothesis with no error; such hypotheses are said to be consistent with the training set, and together they make up the version space (Mitchell, 1997).
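
For rectangle hypotheses, the most specific hypothesis S can be sketched as the tightest axis-aligned rectangle that contains every positive example. The code below assumes that hypothesis class and uses hypothetical data; computing G (the largest rectangle excluding all negatives) is not shown, since it is generally not unique.

```python
def most_specific_hypothesis(training_set):
    """Return S as (p1, p2, e1, e2): the bounding box of the positive examples."""
    positives = [x for x, r in training_set if r == 1]
    if not positives:
        raise ValueError("need at least one positive example")
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return (min(prices), max(prices), min(powers), max(powers))


def classify(rect, x):
    """Return 1 if point x = (price, power) lies inside rect, else 0."""
    p1, p2, e1, e2 = rect
    price, power = x
    return 1 if p1 <= price <= p2 and e1 <= power <= e2 else 0


def is_consistent(rect, training_set):
    """A hypothesis is consistent if it makes no error on the training set."""
    return all(classify(rect, x) == r for x, r in training_set)


# Hypothetical training set: ((price, engine power), label).
examples = [
    ((15, 120), 1), ((18, 150), 1), ((22, 140), 1),
    ((8, 60), 0), ((40, 300), 0), ((20, 250), 0),
]

S = most_specific_hypothesis(examples)
print(S)                           # tightest rectangle around the positives
print(is_consistent(S, examples))  # S belongs to the version space if True
```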

20 Choose h with largest margin We choose the hypothesis with the largest margin, for best separation. The shaded instances are those that define the margin; other instances can be removed without affecting h.

21 Noise and Model Complexity Noise: an unwanted anomaly in the data. Interpretations of noise: imprecision in recording the input attributes; errors in labeling the data points; additional attributes that have not been taken into account.

22 Noise and Model Complexity Use the simpler model because it is: simpler to use (lower computational complexity); easier to train (lower space complexity); easier to explain (more interpretable); and it generalizes better (lower variance, less affected by single instances).

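23 Learning Multiple Classes, $C_i$, $i = 1, \dots, K$. Training set: $\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$, with $r_i^t = 1$ if $x^t \in C_i$ and $r_i^t = 0$ if $x^t \in C_j$, $j \ne i$. Train hypotheses $h_i(x)$, $i = 1, \dots, K$: $h_i(x^t) = 1$ if $x^t \in C_i$ and $h_i(x^t) = 0$ if $x^t \in C_j$, $j \ne i$.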
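
The K-class setup can be sketched as K two-class problems: for each class, the label is 1 for instances of that class and 0 for instances of any other class. The class names and the helper `one_vs_rest_labels` below are hypothetical.

```python
def one_vs_rest_labels(labels, classes):
    """For each class c, build the binary targets: 1 if label == c, else 0."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical class labels for four training instances.
classes = ["family", "sports", "luxury"]
labels = ["family", "sports", "luxury", "family"]

targets = one_vs_rest_labels(labels, classes)
for c in classes:
    print(c, targets[c])
```

Each binary target list can then be used to train a separate hypothesis for its class with any two-class learner.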

24 Model Selection & Generalization Learning is an ill-posed problem: the data alone is not sufficient to find a unique solution. Hence the need for an inductive bias, that is, assumptions about H. The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered (Mitchell, 1980). A classical example of an inductive bias is Occam's Razor, which assumes that the simplest hypothesis consistent with the data is the best one. For example, assuming the shape of a rectangle is one inductive bias. In linear regression, assuming a linear function is an inductive bias, and choosing the function that minimizes squared error is another.

25 Model Selection & Generalization Generalization: how well a model trained on the training set predicts the right output for new instances. Underfitting: H is less complex than C or f, i.e., the hypothesis is less complex than the underlying function. Overfitting: H is more complex than C or f, i.e., the hypothesis is more complex than the underlying function.

26 Cross-Validation Used to measure generalization ability. Cross-validation is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.

27 Cross-Validation To estimate generalization error, we need data unseen during training. We split the data as follows. Training set (50%). Validation set (25%): gives the validation error, used to test generalization ability. Test (publication) set (25%): gives the expected error; contains examples not used in training or validation. Use resampling when there is little data.
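
A minimal sketch of the 50% / 25% / 25% split, assuming the examples are shuffled once with a fixed seed before partitioning. The function name and the seed value are illustrative choices, not part of the slides.

```python
import random


def train_validation_test_split(data, seed=0):
    """Shuffle a copy of data and split it 50% / 25% / 25%."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n = len(items)
    n_train = n // 2
    n_val = n // 4
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder, untouched during training
    return train, validation, test


# Hypothetical dataset of 20 example ids.
train, validation, test = train_validation_test_split(range(20))
print(len(train), len(validation), len(test))
```

The test portion should be used only once, after model selection on the validation portion is finished, so that it remains an unbiased estimate of the expected error.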
