From Mating Pool Distributions to Model Overfitting
|
|
- Bernadette Newman
- 6 years ago
- Views:
Transcription
1 From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA
2 Probabilistic Modeling in EDAs Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup The core EDA is defined by its model complexity (expressiveness). The spectrum Simpler EDAs only learn the parameters of the model. More powerful EDAs learn both structure and parameters. Bayesian EDAs Solve a broad class of problems efficiently. But oftentimes models do not exactly reflect the underlying problem structure.
3 Motivation Introduction Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Importance of model accuracy Model-based efficiency enhancement techniques. Substructural local search. Fitness estimation (surrogate fitness function). Offline interpretation of probabilistic models. Bias structural learning in EDAs (Mark Hauschild s talk). Design structure-aware variation operators. Empirical findings with BOA: Tournament vs. truncation Population size requirements. Model quality difference.
4 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Bayesian Optimization Algorithm (BOA) BOA EDA which uses Bayesian networks (BNs). Model selected solutions to generate new ones. Similar algorithms: Estimation of Bayesian networks algorithm (EBNA). Learning factorized distribution algorithm (LFDA). Learning BNs Decision trees express conditional probabilities. Greedy algorithm for learning. Scoring metrics considered: K2 and BIC.
5 Experimental Setup Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Test problem To investigate MSA we use a problem of known structure. Clear identification of: Necessary dependencies successful tractability. Unnecessary dependencies poor model interpretability. m k trap problem Consists of m concatenated k-bit trap functions. For k 3, the model should be able to maintain k order statistics, otherwise exponential scalability. k = 5 used in experiments.
6 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Ideal Model for the m k trap problem Joint distribution in BNs p(x a, X b, X c, X d ) = p(x a )p(x b X a )p(x c X a, X b )p(x d X a, X b, X c ) Clique between variables of the same trap function. While between different traps there are no dependencies.
7 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Measuring Model Structural Accuracy Definition 1 The model structural accuracy (MSA) is defined as the ratio of correct edges over the total number of edges in the BN. Definition 2 An edge is correct if it connects two variables that are linked according to the objective function definition. Definition 3 Model overfitting is defined as the inclusion of incorrect (or unnecessary) edges to the BN.
8 Selection Methods Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Selection methods considered Tournament selection. Randomly pick s individuals and choose the best. Repeat n times. Truncation selection. Choose the best τ proportion of the population. Equivalent selection intensity Selection intensity, I Tournament size, s Truncation threshold, τ(%)
9 Selection-Metric Comparison Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Model Structural Accuracy, MSA Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Number of function evaluations, n fe x Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Selection intensity, I Selection intensity, I Observations 1 Truncation performs better than tournament selection. 2 K2 metric performs better than BIC metric.
10 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Tournament Selection Probability of an individual with rank i to win a tournament p i = s 1 j=1 Number of copies in the mating pool i j, for s 2. (1) n j c i = s p i. (2) Which can be approximated by a power distribution with p.d.f. f (x) = α x α 1, 0 < x 1, α = s, x = i/n. (3)
11 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Truncation Selection Number of copies in the mating pool { 0, if i < n (1 (τ/100)) c i = 1, otherwise. (4)
12 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Distribution of the Expected Number of Copies Expected number of copies, c i s = 2 s = 3 s = 4 s = 5 s = 7 s = 10 s = 20 Expected number of copies, c i 1 τ = 66% τ = 47% τ = 36% τ = 30% τ = 22% τ = 15% τ = 8% Rank, i (in percentile) Rank, i (in percentile) Tournament and truncation selection differ in: 1 Window size. 2 Distribution shape.
13 Mating Pool Distribution: An Example Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Function f mk trap = f 1 (X 1 X 2 X 3 ) + f 2 (X 4 X 5 X 6 ) + f 3 (X 7 X 8 X 9 ) Example Rank Individual Copies Mating pool distributions: Uniform Linear Exponential X 6 X 7 freq. X 6 X 7 freq. X 6 X 7 freq
14 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Mating Pool w/ Non-Uniform Distribution What s good about it? Detection of existent patterns is achieved with less data. Tournament selection requires smaller population sizes than truncation. What s bad about it? Irrelevant features can also be detected as patterns. Excessive complexity model overfitting. So what? Recognize that we are learning from imbalanced data sets. Change scoring metric accordingly!
15 Modeling Metric Gain When Overfitting Adaptive Scoring Metric for Tournament Agenda 1 Compute how the power-law-based frequencies differ from uniform ones in the mating pool. 2 Use computed frequency deviation to estimate scoring metric gain due to overfitting. 3 Change the complexity penalty to counterbalance the misleading metric gain.
16 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Deviation of power-law-based frequencies Deviation is linear w.r.t. tournament size Expected market share of top-ranked individuals after selection is given by the (right-side area of the) c.d.f. of the power distribution. F(x) = x s, 0 < x < 1, s 2. (5) In the worst case, market share grows linearly with tournament size. Thus, misleading frequencies deviate linearly from the true ones (uniform distribution). More details in the paper.
17 Computing Metric Gain Modeling Metric Gain When Overfitting Adaptive Scoring Metric Consider the decision of adding an edge from X 2 to X 1 Metric gain must be calculated: G metric = ScoreAfter ScoreBefore ComplexityPenalty Uniform mating pool reveals p 00 = p 01 = p 10 = p 11 = 0.25 G metric < 0. Simulate tournament mating pool by assigning p (s 1), p (s 1), p (s 1), p (s 1). Where is related to the proportion of top-individuals which lead to overfitting (more info in the paper).
18 Computing Metric Gain (2) Modeling Metric Gain When Overfitting Adaptive Scoring Metric Approximated metric gain G 2 (0.25 (s 1)) log 2 (0.25 (s 1)) + 1. (6) Metric gain, G BIC Approximation O(s) O(s 2 ) Tournament size, s
19 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Comparison of Different Penalty Corrections Complexity penalty for the K2 metric p(b) = 2 0.5cs log 2 (n) P l i=1 L i c s = {1, s, s, and s log 2 (s)} Model Structural Accuracy, MSA c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Number of function evaluations, n fe 5 x c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Tournament size, s Tournament size, s
20 Results for the s penalty (c s = s) Modeling Metric Gain When Overfitting Adaptive Scoring Metric 14 x 105 Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe l = 40 l = 60 l = 80 l = Tournament size, s Tournament size, s Remark Model structural accuracy is superior to 99%!
21 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Results for Truncation with the Standard Penalty Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe 2 x l = 40 l = 60 l = 80 l = Truncation threshold, τ Truncation threshold, τ Remark Good results, but not as good as for tournament selection with the s penalty.
22 Summary & Conclusions Modeling Metric Gain When Overfitting Adaptive Scoring Metric Selection addressed as a source of overfitting. Influence of selection method in BN learning: Truncation selection uniform mating pool is more appropriate for learning (standard scoring metrics). Tournament selection non-uniform mating pool leads to model overfitting. Scoring metric adapted to the power distribution of the mating pool generated by tournament selection. Model accuracy has been considerably improved! Improved model interpretability to assist efficiency enhancement techiniques and human researchers.
23 Modeling Metric Gain When Overfitting Adaptive Scoring Metric From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA
24 Scoring Metrics Introduction Modeling Metric Gain When Overfitting Adaptive Scoring Metric (Bayesian-Dirichlet) K2 metric K 2(B) = p(b) l 1 Γ(m i (x i, l) + 1), (7) Γ(m i (l) + 2) l L i x i i=1 where p(b) = log 2 (n) P l i=1 L i. Bayesian information criteria (BIC) metric l BIC(B) = i=1 l Li x i ( ) m i (x i, l) m i (x i, l) log 2 m i (l) L i log 2 (n). 2 (8)
From Mating Pool Distributions to Model Overfitting
From Mating Pool Distributions to Model Overfitting Claudio F. Lima DEEI-FCT University of Algarve Campus de Gambelas 85-39, Faro, Portugal clima.research@gmail.com Fernando G. Lobo DEEI-FCT University
More informationEstimation-of-Distribution Algorithms. Discrete Domain.
Estimation-of-Distribution Algorithms. Discrete Domain. Petr Pošík Introduction to EDAs 2 Genetic Algorithms and Epistasis.....................................................................................
More informationAnalyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Mark Hauschild, Martin Pelikan, Claudio F. Lima, and Kumara Sastry IlliGAL Report No. 2007008 January 2007 Illinois Genetic
More informationAnalyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses
Analyzing Probabilistic Models in Hierarchical BOA on Traps and Spin Glasses Mark Hauschild Missouri Estimation of Distribution Algorithms Laboratory (MEDAL) Dept. of Math and Computer Science, 320 CCB
More informationSpurious Dependencies and EDA Scalability
Spurious Dependencies and EDA Scalability Elizabeth Radetic, Martin Pelikan MEDAL Report No. 2010002 January 2010 Abstract Numerous studies have shown that advanced estimation of distribution algorithms
More informationProbabilistic Model-Building Genetic Algorithms
Probabilistic Model-Building Genetic Algorithms Martin Pelikan Dept. of Math. and Computer Science University of Missouri at St. Louis St. Louis, Missouri pelikan@cs.umsl.edu Foreword! Motivation Genetic
More informationPerformance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap
Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap Martin Pelikan, Kumara Sastry, David E. Goldberg, Martin V. Butz, and Mark Hauschild Missouri
More informationCS 6375: Machine Learning Computational Learning Theory
CS 6375: Machine Learning Computational Learning Theory Vibhav Gogate The University of Texas at Dallas Many slides borrowed from Ray Mooney 1 Learning Theory Theoretical characterizations of Difficulty
More informationProbabilistic Model-Building Genetic Algorithms
Probabilistic Model-Building Genetic Algorithms a.k.a. Estimation of Distribution Algorithms a.k.a. Iterated Density Estimation Algorithms Martin Pelikan Missouri Estimation of Distribution Algorithms
More informationProbabilistic Model-Building Genetic Algorithms
Probabilistic Model-Building Genetic Algorithms a.k.a. Estimation of Distribution Algorithms a.k.a. Iterated Density Estimation Algorithms Martin Pelikan Foreword Motivation Genetic and evolutionary computation
More informationFinding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchical BOA and Genetic Algorithms
Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with Hierarchical BOA and Genetic Algorithms Martin Pelikan, Helmut G. Katzgraber and Sigismund Kobe MEDAL Report No. 284 January 28 Abstract
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationInteraction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm
Interaction-Detection Metric with Differential Mutual Complement for Dependency Structure Matrix Genetic Algorithm Kai-Chun Fan Jui-Ting Lee Tian-Li Yu Tsung-Yu Ho TEIL Technical Report No. 2010005 April,
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationSample Complexity of Learning Mahalanobis Distance Metrics. Nakul Verma Janelia, HHMI
Sample Complexity of Learning Mahalanobis Distance Metrics Nakul Verma Janelia, HHMI feature 2 Mahalanobis Metric Learning Comparing observations in feature space: x 1 [sq. Euclidean dist] x 2 (all features
More informationModel Selection. Frank Wood. December 10, 2009
Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationLearning With Bayesian Networks. Markus Kalisch ETH Zürich
Learning With Bayesian Networks Markus Kalisch ETH Zürich Inference in BNs - Review P(Burglary JohnCalls=TRUE, MaryCalls=TRUE) Exact Inference: P(b j,m) = c Sum e Sum a P(b)P(e)P(a b,e)p(j a)p(m a) Deal
More informationConvergence Time for Linkage Model Building in Estimation of Distribution Algorithms
Convergence Time for Linkage Model Building in Estimation of Distribution Algorithms Hau-Jiun Yang Tian-Li Yu TEIL Technical Report No. 2009003 January, 2009 Taiwan Evolutionary Intelligence Laboratory
More informationWITH. P. Larran aga. Computational Intelligence Group Artificial Intelligence Department Technical University of Madrid
Introduction EDAs Our Proposal Results Conclusions M ULTI - OBJECTIVE O PTIMIZATION WITH E STIMATION OF D ISTRIBUTION A LGORITHMS Pedro Larran aga Computational Intelligence Group Artificial Intelligence
More informationAnalysis of Epistasis Correlation on NK Landscapes. Landscapes with Nearest Neighbor Interactions
Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions Missouri Estimation of Distribution Algorithms Laboratory (MEDAL University of Missouri, St. Louis, MO http://medal.cs.umsl.edu/
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationDecision Trees: Overfitting
Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9
More informationExpanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA
Expanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA Peter A.N. Bosman and Dirk Thierens Department of Computer Science, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht,
More informationhow should the GA proceed?
how should the GA proceed? string fitness 10111 10 01000 5 11010 3 00011 20 which new string would be better than any of the above? (the GA does not know the mapping between strings and fitness values!)
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationDirected and Undirected Graphical Models
Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed
More informationAnalysis of Epistasis Correlation on NK Landscapes with Nearest-Neighbor Interactions
Analysis of Epistasis Correlation on NK Landscapes with Nearest-Neighbor Interactions Martin Pelikan MEDAL Report No. 20102 February 2011 Abstract Epistasis correlation is a measure that estimates the
More informationA short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie
A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie Computational Biology Program Memorial Sloan-Kettering Cancer Center http://cbio.mskcc.org/leslielab
More informationState of the art in genetic algorithms' research
State of the art in genetic algorithms' Genetic Algorithms research Prabhas Chongstitvatana Department of computer engineering Chulalongkorn university A class of probabilistic search, inspired by natural
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationBehaviour of the UMDA c algorithm with truncation selection on monotone functions
Mannheim Business School Dept. of Logistics Technical Report 01/2005 Behaviour of the UMDA c algorithm with truncation selection on monotone functions Jörn Grahl, Stefan Minner, Franz Rothlauf Technical
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin (accepted by ALT 06, joint work with Ling Li) Learning Systems Group, Caltech Workshop Talk in MLSS 2006, Taipei, Taiwan, 07/25/2006
More informationReadings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008
Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationDecision Support. Dr. Johan Hagelbäck.
Decision Support Dr. Johan Hagelbäck johan.hagelback@lnu.se http://aiguy.org Decision Support One of the earliest AI problems was decision support The first solution to this problem was expert systems
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationDynamic Poisson Factorization
Dynamic Poisson Factorization Laurent Charlin Joint work with: Rajesh Ranganath, James McInerney, David M. Blei McGill & Columbia University Presented at RecSys 2015 2 Click Data for a paper 3.5 Item 4663:
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationabout the problem can be used in order to enhance the estimation and subsequently improve convergence. New strings are generated according to the join
Bayesian Optimization Algorithm, Population Sizing, and Time to Convergence Martin Pelikan Dept. of General Engineering and Dept. of Computer Science University of Illinois, Urbana, IL 61801 pelikan@illigal.ge.uiuc.edu
More informationLearning Decision Trees
Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2
More informationCS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs
CS242: Probabilistic Graphical Models Lecture 4B: Learning Tree-Structured and Directed Graphs Professor Erik Sudderth Brown University Computer Science October 6, 2016 Some figures and materials courtesy
More informationCS Data Structures and Algorithm Analysis
CS 483 - Data Structures and Algorithm Analysis Lecture II: Chapter 2 R. Paul Wiegand George Mason University, Department of Computer Science February 1, 2006 Outline 1 Analysis Framework 2 Asymptotic
More informationCOMP538: Introduction to Bayesian Networks
COMP538: Introduction to Bayesian Networks Lecture 9: Optimal Structure Learning Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering Hong Kong University of Science and Technology
More informationReducing The Computational Cost of Bayesian Indoor Positioning Systems
Reducing The Computational Cost of Bayesian Indoor Positioning Systems Konstantinos Kleisouris, Richard P. Martin Computer Science Department Rutgers University WINLAB Research Review May 15 th, 2006 Motivation
More informationInternational Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems c World Scientific Publishing Company UNSUPERVISED LEARNING OF BAYESIAN NETWORKS VIA ESTIMATION OF DISTRIBUTION ALGORITHMS: AN
More informationthe tree till a class assignment is reached
Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationCS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell
CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate
More informationProbabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference
Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference David Poole University of British Columbia 1 Overview Belief Networks Variable Elimination Algorithm Parent Contexts
More informationCS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning
CS242: Probabilistic Graphical Models Lecture 4A: MAP Estimation & Graph Structure Learning Professor Erik Sudderth Brown University Computer Science October 4, 2016 Some figures and materials courtesy
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationFrom Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018
From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Decision Trees Instructor: Yang Liu 1 Supervised Classifier X 1 X 2. X M Ref class label 2 1 Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short}
More informationClassical and Bayesian inference
Classical and Bayesian inference AMS 132 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 8 1 / 8 Probability and Statistical Models Motivating ideas AMS 131: Suppose that the random variable
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More informationEfficient Bayesian Multivariate Surface Regression
Efficient Bayesian Multivariate Surface Regression Feng Li (joint with Mattias Villani) Department of Statistics, Stockholm University October, 211 Outline of the talk 1 Flexible regression models 2 The
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationIMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback
9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationParameter learning in CRF s
Parameter learning in CRF s June 01, 2009 Structured output learning We ish to learn a discriminant (or compatability) function: F : X Y R (1) here X is the space of inputs and Y is the space of outputs.
More informationDecision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag
Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Supervised Learning Input: labelled training data i.e., data plus desired output Assumption:
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationArtificial Neural Networks Examination, June 2005
Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either
More informationLecture 15: MCMC Sanjeev Arora Elad Hazan. COS 402 Machine Learning and Artificial Intelligence Fall 2016
Lecture 15: MCMC Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Course progress Learning from examples Definition + fundamental theorem of statistical learning,
More informationProperties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area
Properties of Continuous Probability Distributions The graph of a continuous probability distribution is a curve. Probability is represented by area under the curve. The curve is called the probability
More informationProbability Distributions: Continuous
Probability Distributions: Continuous INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber FEBRUARY 28, 2017 INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Probability Distributions:
More informationDecision Trees. Machine Learning CSEP546 Carlos Guestrin University of Washington. February 3, 2014
Decision Trees Machine Learning CSEP546 Carlos Guestrin University of Washington February 3, 2014 17 Linear separability n A dataset is linearly separable iff there exists a separating hyperplane: Exists
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationCS Homework 3. October 15, 2009
CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More information11. Learning graphical models
Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical
More informationUnsupervised Learning
Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College
More informationCS168: The Modern Algorithmic Toolbox Lecture #6: Regularization
CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models
More informationBayesian networks approximation
Bayesian networks approximation Eric Fabre ANR StochMC, Feb. 13, 2014 Outline 1 Motivation 2 Formalization 3 Triangulated graphs & I-projections 4 Successive approximations 5 Best graph selection 6 Conclusion
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationBayesian Analysis for Natural Language Processing Lecture 2
Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationPDEEC Machine Learning 2016/17
PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /
More informationHierarchical BOA, Cluster Exact Approximation, and Ising Spin Glasses
Hierarchical BOA, Cluster Exact Approximation, and Ising Spin Glasses Martin Pelikan 1, Alexander K. Hartmann 2, and Kumara Sastry 1 Dept. of Math and Computer Science, 320 CCB University of Missouri at
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationLecture 3: Introduction to Complexity Regularization
ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationBayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem
Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:
More informationAn empirical distribution-based approximation of PROMETHEE II's net flow scores
imw215 23 January, 215 Brussels, Belgium An empirical distribution-based approximation of PROMETHEE II's net flow scores Stefan Eppe, Yves De Smet Computer & Decision Engineering Department Université
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More information