Learning from Data. Amos Storkey, School of Informatics. Semester 1. http://www.anc.ed.ac.uk/~amos/lfd/



Introduction Welcome. Administration. Online notes. Books: see website. Assignments. Tutorials. Exams. Acknowledgement: I would like to thank David Barber and Chris Williams for permission to use course material from previous years.

Course details 18 lectures, 5.10 to 6.00pm Mon and Thurs. 7 tutorials (compulsory), starting Thurs of week 3. 2 assignments (20%) (week 4 and week 8). 1 exam (80%). Course notes.

Relationships between courses LfD: basic introductory course on supervised and unsupervised learning. RL: Reinforcement Learning. MLSC: Machine Learning and Sensorimotor Control. PMR: Probabilistic Modelling and Reasoning; focus on probabilistic modelling, and on learning and inference for probabilistic models, e.g. probabilistic expert systems, latent variable models, hidden Markov models, Kalman filters, Boltzmann machines. DME: Data Mining and Exploration; using methods from PMR to deal with practical issues in learning from large datasets.

Maths This course will involve a significant number of mathematical ideas and a significant amount of mathematical manipulation. For those wanting to pursue research in any of the areas covered, you had better understand all (or almost all) of the maths. Others should understand the ideas behind the maths. It is obviously preferable to understand the detail too, but understanding in a procedural way (i.e. how to program an algorithm) will usually be sufficient.

Course Outline (not necessarily in order) Introduction. Thinking about data. Preliminaries: supplementary maths, MATLAB. Understanding data, and models of data: generative versus discriminative, supervised or unsupervised. Gaussian density estimation, dimensionality reduction. Visualisation. Naive Bayes. Regression and classification, linear and logistic regression, generalised linear models. Mixture models, class conditional classification, visualisation 2. Generalisation, perceptron, layered neural networks, radial basis functions, nearest neighbour classifiers.

Why Learn from Data? Growing flood of online data. People already learn from data. Automated methods increase the scope. Recent progress in algorithms and theory. Computational power is available. Budding industry. Because we can.

Examples Science (astronomy, neuroscience, medical imaging, bio-informatics). Retail (intelligent stock control, demographic store placement). Manufacturing (intelligent control, automated monitoring, detection methods). Security (intelligent smoke alarms, fraud detection). Marketing. Management (scheduling, timetabling, competitor analysis warning systems). Finance (risk analysis, micro-elasticity analysis). Over to you...

Thinking about Data This course is not computer science as you know it. Computer science and algorithms: computer science as algorithm generation. If the algorithm works it is good. If it doesn't it is bad. Machine Learning: the algorithm and the model. Model encodes understanding about the data. Algorithm comes from the model (and a bit of maths). Algorithms give different approximations.

Thinking about Data Learning from data is not magic. Prior beliefs/assumptions + data → posterior beliefs. The model encodes beliefs about the generative process of the data. Or... the model encodes beliefs about the features/characteristics in the data. Can do nothing without some prior input - no connection between data and question.

Illusions Logvinenko illusion Meteor - internet ray tracing entry.

Logvinenko Illusion

Example 3 Boolean variables. Data set:

1 0 1
1 1 1
1 1 0
0 1 0
0 0 x

What is x? We cannot say. We have no information at all about how any of these data items is connected. No free lunch.

No Free Lunch Try to predict C ∈ {0, 1} from A, B ∈ {0, 1}. No noise - given A, B then C is always the same. Possible hypotheses: C = 1 if and only if the values for (A, B) are in a particular set. One example hypothesis is {(1, 1), (0, 1), (1, 0)} (i.e. C = A OR B). Here (0, 1) means A = 0, B = 1. If no bias, then there are C(4,0) + C(4,1) + C(4,2) + C(4,3) + C(4,4) = 1 + 4 + 6 + 4 + 1 = 16 equally possible hypotheses - the hypothesis space. Each data point reduces the size of the hypothesis space, but when we attempt to predict C given an unseen set of values of A, B, the number of hypotheses predicting C = 1 is the same as the number predicting C = 0.

No Free Lunch contd. E.g. suppose we have data (a = 1, b = 1, c = 0), (a = 0, b = 1, c = 1); then the remaining hypothesis space (sets of (A, B) values for which C = 1) is {(0, 1)}, {(0, 1), (1, 0)}, {(0, 1), (0, 0)}, {(0, 1), (1, 0), (0, 0)}. Suppose we now query (a = 1, b = 0). Two of the possible hypotheses predict C = 1, and two predict C = 0. Suppose we now see data (a = 0, b = 0, c = 0). The hypothesis space is {(0, 1)}, {(0, 1), (1, 0)}. One of the remaining hypotheses predicts C = 1, and the other predicts C = 0. No matter what data you receive, the number of hypotheses predicting one value for unseen data will equal the number predicting the other value.
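
To make this counting concrete, the following short Python sketch (not part of the course material; the names are purely illustrative) enumerates all 16 hypotheses, keeps those consistent with the observed data, and counts how many predict C = 1 versus C = 0 at an unseen input:

# A hypothesis is the set of (A, B) pairs for which C = 1. With no bias,
# every one of the 2^4 = 16 subsets of {(0,0), (0,1), (1,0), (1,1)} is allowed.
from itertools import combinations

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
hypotheses = [set(s) for r in range(len(inputs) + 1)
              for s in combinations(inputs, r)]   # all 16 subsets

def consistent(h, data):
    # h is consistent if it reproduces every observed c exactly
    return all(((a, b) in h) == (c == 1) for a, b, c in data)

def prediction_counts(data, query):
    # count consistent hypotheses predicting C = 1 and C = 0 at the query point
    remaining = [h for h in hypotheses if consistent(h, data)]
    ones = sum(query in h for h in remaining)
    return ones, len(remaining) - ones

data = [(1, 1, 0), (0, 1, 1)]            # (a, b, c) observations from the slide
print(prediction_counts(data, (1, 0)))   # (2, 2): an even split
data.append((0, 0, 0))
print(prediction_counts(data, (1, 0)))   # (1, 1): still an even split

Whatever consistent data you supply, the two counts stay equal, which is exactly the point of the slide.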

Prior assumption Say something about the ways C is allowed to relate to A, B, e.g. (C = A OR B) OR (C = A AND B) OR (C = A XOR B). Given data (a = 1, b = 0, c = 1), (a = 0, b = 1, c = 1): AND is ruled out, so it is either OR or XOR. Can now predict (a = 0, b = 0): both remaining hypotheses give c = 0.
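
As a companion sketch (again illustrative, not from the course notes), restricting the hypothesis space to just OR, AND and XOR lets the same filtering step produce a definite prediction for the unseen input:

# Prior assumption: C must equal A OR B, A AND B, or A XOR B.
allowed = {
    "OR":  lambda a, b: int(a or b),
    "AND": lambda a, b: int(a and b),
    "XOR": lambda a, b: a ^ b,
}

data = [(1, 0, 1), (0, 1, 1)]            # (a, b, c) observations from the slide
survivors = {name: f for name, f in allowed.items()
             if all(f(a, b) == c for a, b, c in data)}

print(sorted(survivors))                                 # ['OR', 'XOR']
print({name: f(0, 0) for name, f in survivors.items()})  # both give c = 0

The prior assumption is what does the work here: the same kind of data that previously left the predictions evenly split now pins down c = 0 for (a = 0, b = 0).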

Suspect Terms Model Free. Bias Free. Unbiased. No prior information. Generally applicable.

Machine Learning and Probability Probability theory is key: probabilistic understanding of uncertainty. Bayesian methods: machine learning is really just statistics? Bayesian methods are non-trivial: Hard to really understand the full implications of a probability distribution. Hard to accurately represent your prior beliefs, and represent them in a way that is amenable to computation.

Summary Fairly mathematical course. Try to keep on top of it. No free lunch. Plethora of practical needs. Models not algorithms. Probability theory is key.