From Mating Pool Distributions to Model Overfitting

Size: px
Start display at page:

Download "From Mating Pool Distributions to Model Overfitting"

Transcription

1 From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA

2 Probabilistic Modeling in EDAs Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup The core EDA is defined by its model complexity (expressiveness). The spectrum Simpler EDAs only learn the parameters of the model. More powerful EDAs learn both structure and parameters. Bayesian EDAs Solve a broad class of problems efficiently. But oftentimes models do not exactly reflect the underlying problem structure.

3 Motivation Introduction Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Importance of model accuracy Model-based efficiency enhancement techniques. Substructural local search. Fitness estimation (surrogate fitness function). Offline interpretation of probabilistic models. Bias structural learning in EDAs (Mark Hauschild s talk). Design structure-aware variation operators. Empirical findings with BOA: Tournament vs. truncation Population size requirements. Model quality difference.

4 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Bayesian Optimization Algorithm (BOA) BOA EDA which uses Bayesian networks (BNs). Model selected solutions to generate new ones. Similar algorithms: Estimation of Bayesian networks algorithm (EBNA). Learning factorized distribution algorithm (LFDA). Learning BNs Decision trees express conditional probabilities. Greedy algorithm for learning. Scoring metrics considered: K2 and BIC.

5 Experimental Setup Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Test problem To investigate MSA we use a problem of known structure. Clear identification of: Necessary dependencies successful tractability. Unnecessary dependencies poor model interpretability. m k trap problem Consists of m concatenated k-bit trap functions. For k 3, the model should be able to maintain k order statistics, otherwise exponential scalability. k = 5 used in experiments.

6 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Ideal Model for the m k trap problem Joint distribution in BNs p(x a, X b, X c, X d ) = p(x a )p(x b X a )p(x c X a, X b )p(x d X a, X b, X c ) Clique between variables of the same trap function. While between different traps there are no dependencies.

7 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Measuring Model Structural Accuracy Definition 1 The model structural accuracy (MSA) is defined as the ratio of correct edges over the total number of edges in the BN. Definition 2 An edge is correct if it connects two variables that are linked according to the objective function definition. Definition 3 Model overfitting is defined as the inclusion of incorrect (or unnecessary) edges to the BN.

8 Selection Methods Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Selection methods considered Tournament selection. Randomly pick s individuals and choose the best. Repeat n times. Truncation selection. Choose the best τ proportion of the population. Equivalent selection intensity Selection intensity, I Tournament size, s Truncation threshold, τ(%)

9 Selection-Metric Comparison Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Model Structural Accuracy, MSA Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Number of function evaluations, n fe x Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Selection intensity, I Selection intensity, I Observations 1 Truncation performs better than tournament selection. 2 K2 metric performs better than BIC metric.

10 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Tournament Selection Probability of an individual with rank i to win a tournament p i = s 1 j=1 Number of copies in the mating pool i j, for s 2. (1) n j c i = s p i. (2) Which can be approximated by a power distribution with p.d.f. f (x) = α x α 1, 0 < x 1, α = s, x = i/n. (3)

11 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Truncation Selection Number of copies in the mating pool { 0, if i < n (1 (τ/100)) c i = 1, otherwise. (4)

12 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Distribution of the Expected Number of Copies Expected number of copies, c i s = 2 s = 3 s = 4 s = 5 s = 7 s = 10 s = 20 Expected number of copies, c i 1 τ = 66% τ = 47% τ = 36% τ = 30% τ = 22% τ = 15% τ = 8% Rank, i (in percentile) Rank, i (in percentile) Tournament and truncation selection differ in: 1 Window size. 2 Distribution shape.

13 Mating Pool Distribution: An Example Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Function f mk trap = f 1 (X 1 X 2 X 3 ) + f 2 (X 4 X 5 X 6 ) + f 3 (X 7 X 8 X 9 ) Example Rank Individual Copies Mating pool distributions: Uniform Linear Exponential X 6 X 7 freq. X 6 X 7 freq. X 6 X 7 freq

14 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Mating Pool w/ Non-Uniform Distribution What s good about it? Detection of existent patterns is achieved with less data. Tournament selection requires smaller population sizes than truncation. What s bad about it? Irrelevant features can also be detected as patterns. Excessive complexity model overfitting. So what? Recognize that we are learning from imbalanced data sets. Change scoring metric accordingly!

15 Modeling Metric Gain When Overfitting Adaptive Scoring Metric for Tournament Agenda 1 Compute how the power-law-based frequencies differ from uniform ones in the mating pool. 2 Use computed frequency deviation to estimate scoring metric gain due to overfitting. 3 Change the complexity penalty to counterbalance the misleading metric gain.

16 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Deviation of power-law-based frequencies Deviation is linear w.r.t. tournament size Expected market share of top-ranked individuals after selection is given by the (right-side area of the) c.d.f. of the power distribution. F(x) = x s, 0 < x < 1, s 2. (5) In the worst case, market share grows linearly with tournament size. Thus, misleading frequencies deviate linearly from the true ones (uniform distribution). More details in the paper.

17 Computing Metric Gain Modeling Metric Gain When Overfitting Adaptive Scoring Metric Consider the decision of adding an edge from X 2 to X 1 Metric gain must be calculated: G metric = ScoreAfter ScoreBefore ComplexityPenalty Uniform mating pool reveals p 00 = p 01 = p 10 = p 11 = 0.25 G metric < 0. Simulate tournament mating pool by assigning p (s 1), p (s 1), p (s 1), p (s 1). Where is related to the proportion of top-individuals which lead to overfitting (more info in the paper).

18 Computing Metric Gain (2) Modeling Metric Gain When Overfitting Adaptive Scoring Metric Approximated metric gain G 2 (0.25 (s 1)) log 2 (0.25 (s 1)) + 1. (6) Metric gain, G BIC Approximation O(s) O(s 2 ) Tournament size, s

19 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Comparison of Different Penalty Corrections Complexity penalty for the K2 metric p(b) = 2 0.5cs log 2 (n) P l i=1 L i c s = {1, s, s, and s log 2 (s)} Model Structural Accuracy, MSA c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Number of function evaluations, n fe 5 x c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Tournament size, s Tournament size, s

20 Results for the s penalty (c s = s) Modeling Metric Gain When Overfitting Adaptive Scoring Metric 14 x 105 Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe l = 40 l = 60 l = 80 l = Tournament size, s Tournament size, s Remark Model structural accuracy is superior to 99%!

21 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Results for Truncation with the Standard Penalty Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe 2 x l = 40 l = 60 l = 80 l = Truncation threshold, τ Truncation threshold, τ Remark Good results, but not as good as for tournament selection with the s penalty.

22 Summary & Conclusions Modeling Metric Gain When Overfitting Adaptive Scoring Metric Selection addressed as a source of overfitting. Influence of selection method in BN learning: Truncation selection uniform mating pool is more appropriate for learning (standard scoring metrics). Tournament selection non-uniform mating pool leads to model overfitting. Scoring metric adapted to the power distribution of the mating pool generated by tournament selection. Model accuracy has been considerably improved! Improved model interpretability to assist efficiency enhancement techiniques and human researchers.

23 Modeling Metric Gain When Overfitting Adaptive Scoring Metric From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA

24 Scoring Metrics Introduction Modeling Metric Gain When Overfitting Adaptive Scoring Metric (Bayesian-Dirichlet) K2 metric K 2(B) = p(b) l 1 Γ(m i (x i, l) + 1), (7) Γ(m i (l) + 2) l L i x i i=1 where p(b) = log 2 (n) P l i=1 L i. Bayesian information criteria (BIC) metric l BIC(B) = i=1 l Li x i ( ) m i (x i, l) m i (x i, l) log 2 m i (l) L i log 2 (n). 2 (8)

From Mating Pool Distributions to Model Overfitting

From Mating Pool Distributions to Model Overfitting From Mating Pool Distributions to Model Overfitting Claudio F. Lima DEEI-FCT University of Algarve Campus de Gambelas 85-39, Faro, Portugal clima.research@gmail.com Fernando G. Lobo DEEI-FCT University

More information

Estimation-of-Distribution Algorithms. Discrete Domain.

Estimation-of-Distribution Algorithms. Discrete Domain. Estimation-of-Distribution Algorithms. Discrete Domain. Petr Pošík Introduction to EDAs 2 Genetic Algorithms and Epistasis.....................................................................................

More information

Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses

Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Mark Hauschild, Martin Pelikan, Claudio F. Lima, and Kumara Sastry IlliGAL Report No. 2007008 January 2007 Illinois Genetic

More information

Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses

Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Mark Hauschild Missouri Estimation of Distribution Algorithms Laboratory (MEDAL) Dept. of Math and Computer Science, 320 CCB

More information

Spurious Dependencies and EDA Scalability

Spurious Dependencies and EDA Scalability Spurious Dependencies and EDA Scalability Elizabeth Radetic, Martin Pelikan MEDAL Report No. 2010002 January 2010 Abstract Numerous studies have shown that advanced estimation of distribution algorithms

More information

Probabilistic Model-Building Genetic Algorithms

Probabilistic Model-Building Genetic Algorithms Probabilistic Model-Building Genetic Algorithms Martin Pelikan Dept. of Math. and Computer Science University of Missouri at St. Louis St. Louis, Missouri pelikan@cs.umsl.edu Foreword! Motivation Genetic

More information

Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap

Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap Martin Pelikan, Kumara Sastry, David E. Goldberg, Martin V. Butz, and Mark Hauschild Missouri

More information

CS 6375: Machine Learning Computational Learning Theory

CS 6375: Machine Learning Computational Learning Theory CS 6375: Machine Learning Computational Learning Theory Vibhav Gogate The University of Texas at Dallas Many slides borrowed from Ray Mooney 1 Learning Theory Theoretical characterizations of Difficulty

More information

Probabilistic Model-Building Genetic Algorithms

Probabilistic Model-Building Genetic Algorithms Probabilistic Model-Building Genetic Algorithms a.k.a. Estimation of Distribution Algorithms a.k.a. Iterated Density Estimation Algorithms Martin Pelikan Missouri Estimation of Distribution Algorithms

More information

Probabilistic Model-Building Genetic Algorithms

Probabilistic Model-Building Genetic Algorithms Probabilistic Model-Building Genetic Algorithms a.k.a. Estimation of Distribution Algorithms a.k.a. Iterated Density Estimation Algorithms Martin Pelikan Foreword Motivation Genetic and evolutionary computation

More information

Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchical BOA and Genetic Algorithms

Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchical BOA and Genetic Algorithms Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchical BOA and Genetic Algorithms Martin Pelikan, Helmut G. Katzgraber and Sigismund Kobe MEDAL Report No. 284 January 28 Abstract

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Interaction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm

Interaction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm Interaction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm Kai-Chun Fan Jui-Ting Lee Tian-Li Yu Tsung-Yu Ho TEIL Technical Report No. 2010005 April,

More information

Logistic Regression. Machine Learning Fall 2018

Logistic Regression. Machine Learning Fall 2018 Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes

More information

Overfitting, Bias / Variance Analysis

Overfitting, Bias / Variance Analysis Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic

More information

Tufts COMP 135: Introduction to Machine Learning

Tufts COMP 135: Introduction to Machine Learning Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)

More information

Sample Complexity of Learning Mahalanobis Distance Metrics. Nakul Verma Janelia, HHMI

Sample Complexity of Learning Mahalanobis Distance Metrics. Nakul Verma Janelia, HHMI Sample Complexity of Learning Mahalanobis Distance Metrics Nakul Verma Janelia, HHMI feature 2 Mahalanobis Metric Learning Comparing observations in feature space: x 1 [sq. Euclidean dist] x 2 (all features

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Learning With Bayesian Networks. Markus Kalisch ETH Zürich

Learning With Bayesian Networks. Markus Kalisch ETH Zürich Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal

More information

Convergence Time for Linkage Model Building in Estimation of Distribution Algorithms

Convergence Time for Linkage Model Building in Estimation of Distribution Algorithms Convergence Time for Linkage Model Building in Estimation of Distribution Algorithms Hau-Jiun Yang Tian-Li Yu TEIL Technical Report No. 2009003 January, 2009 Taiwan Evolutionary Intelligence Laboratory

More information

WITH. P. Larran aga. Computational Intelligence Group Artificial Intelligence Department Technical University of Madrid

WITH. P. Larran aga. Computational Intelligence Group Artificial Intelligence Department Technical University of Madrid Introduction EDAs Our Proposal Results Conclusions M ULTI - OBJECTIVE O PTIMIZATION WITH E STIMATION OF D ISTRIBUTION A LGORITHMS Pedro Larran aga Computational Intelligence Group Artificial Intelligence

More information

Analysis of Epistasis Correlation on NK Landscapes. Landscapes with Nearest Neighbor Interactions

Analysis of Epistasis Correlation on NK Landscapes. Landscapes with Nearest Neighbor Interactions Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions Missouri Estimation of Distribution Algorithms Laboratory (MEDAL University of Missouri, St. Louis, MO http://medal.cs.umsl.edu/

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

Decision Trees: Overfitting

Decision Trees: Overfitting Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

More information

Expanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA

Expanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA Expanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA Peter A.N. Bosman and Dirk Thierens Department of Computer Science, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht,

More information

how should the GA proceed?

how should the GA proceed? how should the GA proceed? string fitness 10111 10 01000 5 11010 3 00011 20 which new string would be better than any of the above? (the GA does not know the mapping between strings and fitness values!)

More information

Final Examination CS 540-2: Introduction to Artificial Intelligence

Final Examination CS 540-2: Introduction to Artificial Intelligence Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information

Analysis of Epistasis Correlation on NK Landscapes with Nearest-Neighbor Interactions

Analysis of Epistasis Correlation on NK Landscapes with Nearest-Neighbor Interactions Analysis of Epistasis Correlation on NK Landscapes with Nearest-Neighbor Interactions Martin Pelikan MEDAL Report No. 20102 February 2011 Abstract Epistasis correlation is a measure that estimates the

More information

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab

More information

State of the art in genetic algorithms' research

State of the art in genetic algorithms' research State of the art in genetic algorithms' Genetic Algorithms research Prabhas Chongstitvatana Department of computer engineering Chulalongkorn university A class of probabilistic search, inspired by natural

More information

Sparse Approximation and Variable Selection

Sparse Approximation and Variable Selection Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation

More information

Behaviour of the UMDA c algorithm with truncation selection on monotone functions

Behaviour of the UMDA c algorithm with truncation selection on monotone functions Mannheim Business School Dept. of Logistics Technical Report 01/2005 Behaviour of the UMDA c algorithm with truncation selection on monotone functions Jörn Grahl, Stefan Minner, Franz Rothlauf Technical

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Large-Margin Thresholded Ensembles for Ordinal Regression

Large-Margin Thresholded Ensembles for Ordinal Regression Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin (accepted by ALT 06, joint work with Ling Li) Learning Systems Group, Caltech Workshop Talk in MLSS 2006, Taipei, Taiwan, 07/25/2006

More information

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008 Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

Decision Support. Dr. Johan Hagelbäck.

Decision Support. Dr. Johan Hagelbäck. Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Dynamic Poisson Factorization

Dynamic Poisson Factorization Dynamic Poisson Factorization Laurent Charlin Joint work with: Rajesh Ranganath, James McInerney, David M. Blei McGill & Columbia University Presented at RecSys 2015 2 Click Data for a paper 3.5 Item 4663:

More information

Introduction to Logistic Regression and Support Vector Machine

Introduction to Logistic Regression and Support Vector Machine Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel

More information

about the problem can be used in order to enhance the estimation and subsequently improve convergence. New strings are generated according to the join

about the problem can be used in order to enhance the estimation and subsequently improve convergence. New strings are generated according to the join Bayesian Optimization Algorithm, Population Sizing, and Time to Convergence Martin Pelikan Dept. of General Engineering and Dept. of Computer Science University of Illinois, Urbana, IL 61801 pelikan@illigal.ge.uiuc.edu

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2

More information

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs

CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture II: Chapter 2 R. Paul Wiegand George Mason University, Department of Computer Science February 1, 2006 Outline 1 Analysis Framework 2 Asymptotic

More information

COMP538: Introduction to Bayesian Networks

COMP538: Introduction to Bayesian Networks COMP538: Introduction to Bayesian Networks Lecture 9: Optimal Structure Learning Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology

More information

Reducing The Computational Cost of Bayesian Indoor Positioning Systems

Reducing The Computational Cost of Bayesian Indoor Positioning Systems Reducing The Computational Cost of Bayesian Indoor Positioning Systems Konstantinos Kleisouris, Richard P. Martin Computer Science Department Rutgers University WINLAB Research Review May 15 th, 2006 Motivation

More information

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company UNSUPERVISED LEARNING OF BAYESIAN NETWORKS VIA ESTIMATION OF DISTRIBUTION ALGORITHMS: AN

More information

the tree till a class assignment is reached

the tree till a class assignment is reached Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell

CS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate

More information

Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference

Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference David Poole University of British Columbia 1 Overview Belief Networks Variable Elimination Algorithm Parent Contexts

More information

CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning

CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018 From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Decision Trees Instructor: Yang Liu 1 Supervised Classifier X 1 X 2. X M Ref class label 2 1 Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short}

More information

Classical and Bayesian inference

Classical and Bayesian inference Classical and Bayesian inference AMS 132 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 8 1 / 8 Probability and Statistical Models Motivating ideas AMS 131: Suppose that the random variable

More information

Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory. Ensemble methods. Boosting. Boosting: history Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

More information

Efficient Bayesian Multivariate Surface Regression

Efficient Bayesian Multivariate Surface Regression Efficient Bayesian Multivariate Surface Regression Feng Li (joint with Mattias Villani) Department of Statistics, Stockholm University October, 211 Outline of the talk 1 Flexible regression models 2 The

More information

FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback 9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Parameter learning in CRF s

Parameter learning in CRF s Parameter learning in CRF s June 01, 2009 Structured output learning We ish to learn a discriminant (or compatability) function: F : X Y R (1) here X is the space of inputs and Y is the space of outputs.

More information

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

Lecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016

Lecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 15: MCMC Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Course progress Learning from examples Definition + fundamental theorem of statistical learning,

More information

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area

Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area under the curve. The curve is called the probability

More information

Probability Distributions: Continuous

Probability Distributions: Continuous Probability Distributions: Continuous INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber FEBRUARY 28, 2017 INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Probability Distributions:

More information

Decision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014

Decision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014 Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists

More information

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

More information

CS Homework 3. October 15, 2009

CS Homework 3. October 15, 2009 CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

11. Learning graphical models

11. Learning graphical models Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Bayesian networks approximation

Bayesian networks approximation Bayesian networks approximation Eric Fabre ANR StochMC, Feb. 13, 2014 Outline 1 Motivation 2 Formalization 3 Triangulated graphs & I-projections 4 Successive approximations 5 Best graph selection 6 Conclusion

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Bayesian Analysis for Natural Language Processing Lecture 2

Bayesian Analysis for Natural Language Processing Lecture 2 Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

PDEEC Machine Learning 2016/17

PDEEC Machine Learning 2016/17 PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /

More information

Hierarchical BOA, Cluster Exact Approximation, and Ising Spin Glasses

Hierarchical BOA, Cluster Exact Approximation, and Ising Spin Glasses Hierarchical BOA, Cluster Exact Approximation, and Ising Spin Glasses Martin Pelikan 1, Alexander K. Hartmann 2, and Kumara Sastry 1 Dept. of Math and Computer Science, 320 CCB University of Missouri at

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

An empirical distribution-based approximation of PROMETHEE II's net flow scores

An empirical distribution-based approximation of PROMETHEE II's net flow scores imw215 23 January, 215 Brussels, Belgium An empirical distribution-based approximation of PROMETHEE II's net flow scores Stefan Eppe, Yves De Smet Computer & Decision Engineering Department Université

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information