An Introduction to Statistics and Machine Learning for Quantitative Biology. Anirvan Sengupta, Dept. of Physics and Astronomy, Rutgers University


Why Do We Care? Necessity in today's labs. Principled approach: defending conclusions, comparing methods, .. Insights for biological information processing?

What Can We Expect to Do Today? Central recurring themes. Orientation and connections. Resources.

Data Deluge in Biology http://biomedicalcomputationreview.org http://ugene.net/learn.html https://sites.stanford.edu/baccuslab/ http://mmtg.fel.cvut.cz/mapsim/

Then and Now. Then: data = 10-100 numbers; questions = >, <, !=. Now: data = 1-10 TB; questions = 10^4-10^5 tests. Ronald Fisher (on the right) during tea time at Rothamsted Experimental Station (1920s) http://dissertationreviews.org/archives/724 New York Genome Center http://blogs.scientificamerican.com/

How Much to Learn from Data? https://shapeofdata.wordpress.com/2013/03/26/general-regression-and-over-fitting/

Model Selection: Bias-Variance Tradeoff. Underfitting vs. overfitting. http://scott.fortmann-roe.com/docs/BiasVariance.html

Probability Distributions. Discrete distributions: binomial, Poisson, .. Binomial: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$; Poisson limit: $n$ large, $p$ small with $np = \lambda$ fixed. Continuous distributions: normal, chi-squared, Student's t, F, .. $P(a \le X \le b) = \int_a^b p(x \mid \mu, \sigma^2)\, dx$, with $p(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. http://statistics.wikidot.com

Distributions Derived from the Standard Normal Distribution ($\mu = 0$, $\sigma^2 = 1$). Chi-squared: $z_1, \ldots, z_k \overset{iid}{\sim} N(0,1)$, $Q = \sum_{i=1}^k z_i^2 \sim \chi^2(k)$. Student's t: $z \sim N(0,1)$, $Q \sim \chi^2(k)$, $T = z/(Q/k)^{1/2} \sim t(k)$. F: $Q_1 \sim \chi^2(k_1)$, $Q_2 \sim \chi^2(k_2)$, $\frac{Q_1/k_1}{Q_2/k_2} \sim F(k_1, k_2)$.
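These relationships are easy to check by simulation. A minimal sketch, assuming NumPy and SciPy are available, that builds chi-squared, t, and F variates from standard normal draws and compares sample moments with the theoretical ones:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k1, k2, n = 5, 8, 200_000

z = rng.standard_normal((n, k1))
Q1 = (z ** 2).sum(axis=1)                                 # chi-squared with k1 d.o.f.
Q2 = (rng.standard_normal((n, k2)) ** 2).sum(axis=1)      # chi-squared with k2 d.o.f.

T = rng.standard_normal(n) / np.sqrt(Q1 / k1)             # Student's t with k1 d.o.f.
F = (Q1 / k1) / (Q2 / k2)                                 # F(k1, k2)

print(Q1.mean(), stats.chi2(k1).mean())    # both close to k1
print(T.var(), stats.t(k1).var())          # both close to k1/(k1-2)
print(F.mean(), stats.f(k1, k2).mean())    # both close to k2/(k2-2)
```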

Estimation. $p(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. How do we find $\mu$, $\sigma^2$ from observations?

Candidate Estimators. $\hat{\mu}(x) = \frac{\sum_i x_i}{n} = \bar{x}$, $\hat{\sigma}^2(x) = \frac{\sum_i (x_i - \bar{x})^2}{n-1} = s^2$.
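A minimal NumPy sketch of these two estimators; the data here are simulated for illustration, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=6.0, scale=2.0, size=50)    # hypothetical observations

mu_hat = x.sum() / len(x)                            # sample mean, x-bar
s2_hat = ((x - mu_hat) ** 2).sum() / (len(x) - 1)    # unbiased sample variance, s^2

print(mu_hat, s2_hat)
print(x.mean(), x.var(ddof=1))                 # the same results via NumPy built-ins
```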

Hypothesis Testing. $p(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$. How do we decide from observations whether the hypothesis $H_1: \mu > 6$ is true, as opposed to $H_0: \mu = 6$?

Test Statistic. $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ under $H_0$; $T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t(n-1)$ under $H_0$.
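A sketch of the corresponding one-sample t-test of H0: mu = 6 against H1: mu > 6, assuming a reasonably recent SciPy; the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=6.5, scale=2.0, size=30)    # hypothetical sample

mu0 = 6.0
T = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))   # t statistic
p = 1 - stats.t(df=len(x) - 1).cdf(T)                      # one-sided p-value for H1: mu > 6
print(T, p)

print(stats.ttest_1samp(x, mu0, alternative='greater'))    # cross-check with SciPy's built-in test
```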

Type I and Type II Error.
           | Accept H0                  | Reject H0
H0 true    | OK                         | Type I error (prob. α)
H0 false   | Type II error (prob. β)    | OK
http://www.healthknowledge.org.uk/e-learning/statistical-methods/

Regression and Goodness of Fit http://randomanalyses.blogspot.com/2011/12/basics-of-regression-and-model-fitting.html

F-test for Linear Regression. $\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2$, i.e. SST = SSM + SSE: the (corrected) sum of squares total equals the (corrected) sum of squares for the model plus the sum of squares for error. $F = \frac{SSM/(p-1)}{SSE/(n-p)}$. Related measure: the coefficient of determination $R^2 = \frac{SSM}{SST}$. http://www.jamesstacks.com/stat/
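A sketch computing SST, SSM, SSE, the F statistic, and R^2 for a simple least-squares line fit (NumPy and SciPy assumed; the data are simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 40, 2                                          # p parameters: intercept and slope
x = np.linspace(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(scale=3.0, size=n)     # hypothetical data

X = np.column_stack([np.ones(n), x])                  # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

SST = ((y - y.mean()) ** 2).sum()
SSM = ((y_hat - y.mean()) ** 2).sum()
SSE = ((y - y_hat) ** 2).sum()

F = (SSM / (p - 1)) / (SSE / (n - p))
p_value = 1 - stats.f(p - 1, n - p).cdf(F)
R2 = SSM / SST
print(F, p_value, R2)
```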

Likelihood. $L(\theta \mid x) = P(x \mid \theta)$, e.g. $L(\mu, \sigma^2 \mid x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$.

Likelihood Surface. The peak gets sharper and sharper with more data. http://reliawiki.org/index.php/Appendix:_Maximum_Likelihood_Estimation_Example

Example: Normal Distribution. $L(\mu, \sigma^2 \mid x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$. Maximizing the likelihood gives $\hat{\mu}(x) = \frac{\sum_i x_i}{n} = \bar{x}$ and $\hat{\sigma}^2(x) = \frac{\sum_i (x_i - \bar{x})^2}{n}$.

Limited Noisy Data with Small Signal. Imagine we had to estimate only the mean $\mu$: $L(\mu \mid x_1, \ldots, x_n) = \prod_{i=1}^n \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma_0^2}\right]$. Should we use the MLE $\hat{\mu}(x) = \frac{\sum_i x_i}{n} = \bar{x}$? What if we choose, instead, $\hat{\mu}_0(x) = 0$? $E[(\hat{\mu}(x) - \mu)^2] = \frac{\sigma_0^2}{n}$ (error from noise: variance), while $E[(\hat{\mu}_0(x) - \mu)^2] = \mu^2$ (error from bias). The trivial estimator is better when $|\mu| < \frac{\sigma_0}{\sqrt{n}}$!
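A small simulation of the two mean-squared errors above; NumPy assumed, and the true mu, sigma_0, and n are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma0, n, trials = 0.2, 1.0, 10, 100_000    # small signal: |mu| < sigma0 / sqrt(n)

x = rng.normal(mu, sigma0, size=(trials, n))
mu_mle = x.mean(axis=1)            # MLE: the sample mean
mu_trivial = np.zeros(trials)      # trivial estimator: always 0

print(((mu_mle - mu) ** 2).mean())       # ~ sigma0^2 / n = 0.1  (variance)
print(((mu_trivial - mu) ** 2).mean())   # = mu^2 = 0.04         (bias squared)
```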

Regularize to Shrink the Estimate. For estimating only the mean $\mu$, maximize $\exp\left[-\frac{\mu^2}{2\lambda^2}\right] \prod_{i=1}^n \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma_0^2}\right]$. We use the estimator $\hat{\mu}(x) = \frac{\sum_i x_i}{n + \sigma_0^2/\lambda^2}$. When $\lambda \ll \frac{\sigma_0}{\sqrt{n}}$ we get $\hat{\mu}(x) \approx 0$!
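A sketch of this shrinkage estimator next to the plain sample mean; NumPy assumed, and sigma_0 and lambda are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true, sigma0, lam, n = 0.2, 1.0, 0.1, 10
x = rng.normal(mu_true, sigma0, size=n)

mu_mle = x.sum() / n                                # maximum-likelihood estimate
mu_shrunk = x.sum() / (n + sigma0**2 / lam**2)      # regularized (MAP) estimate

print(mu_mle, mu_shrunk)    # the regularized estimate is pulled strongly toward 0
```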

Regularization via the Bayesian Framework. $P(\theta \mid x) = \frac{P(x, \theta)}{P(x)} = \frac{P(x \mid \theta) P_0(\theta)}{P(x)}$, where $P(x) = \int d\theta'\, P(x \mid \theta') P_0(\theta')$. The prior $P_0(\theta)$ incorporates additional constraints on the parameters.

Maximum A Posteriori Estimation. $\hat{\theta} = \arg\max_\theta P(\theta \mid x) = \arg\max_\theta \frac{P(x \mid \theta) P_0(\theta)}{P(x)}$. The maximum likelihood estimate corresponds to a uniform prior: $P_0(\theta) = \text{const}$. One could also sample $\theta$ from the posterior distribution $P(\theta \mid x)$ to get a sense of the range of possibilities. Sometimes averaging over $\theta$ is a better idea than maximization.

Likelihood Improvement with Higher Complexity [figure comparing the maximized likelihoods $L_1$ and $L_2$ of a simpler and a more complex model]

Evidence. $P(x) = \int d\theta'\, P(x, \theta') = \int d\theta'\, P(x \mid \theta') P_0(\theta')$: the evidence is the product of prior and likelihood, integrated over $\theta$. Since $\int dx\, P(x) = 1$, models that are less constrained in $\theta$ have lower evidence.

An Example: Tied Means. Compare the evidence of a tied-means model, $\int d\mu\, \frac{1}{\lambda\sqrt{2\pi}} \exp\left[-\frac{\mu^2}{2\lambda^2}\right] \prod_{i=1}^n \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma_0^2}\right]$, with that of an independent-means model, $\prod_{i=1}^n \int d\mu_i\, \frac{1}{\sigma_0\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_0^2}\right] \frac{1}{\lambda\sqrt{2\pi}} \exp\left[-\frac{\mu_i^2}{2\lambda^2}\right]$.
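A sketch that puts numbers on this comparison. The independent-means evidence has the closed form prod_i N(x_i | 0, sigma_0^2 + lambda^2); the tied-means evidence is computed by one-dimensional quadrature (SciPy assumed, data simulated):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(6)
sigma0, lam, n = 1.0, 2.0, 8
x = rng.normal(1.5, sigma0, size=n)       # data generated with one shared mean

# Tied means: integrate prior(mu) * likelihood(x | mu) over the single mu
def integrand(mu):
    return stats.norm(0, lam).pdf(mu) * np.prod(stats.norm(mu, sigma0).pdf(x))

evidence_tied, _ = quad(integrand, x.mean() - 5, x.mean() + 5)

# Independent means: each x_i integrates to N(x_i | 0, sigma0^2 + lambda^2)
evidence_indep = np.prod(stats.norm(0, np.sqrt(sigma0**2 + lam**2)).pdf(x))

print(evidence_tied, evidence_indep)      # the tied-means evidence is typically the larger one here
```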

Bayesian Hypothesis Testing: Going Beyond p-values. $\frac{\Pr(\text{Disease} \mid \text{test positive})}{\Pr(\text{No Disease} \mid \text{test positive})} = \frac{\Pr(\text{test positive} \mid \text{Disease})}{\Pr(\text{test positive} \mid \text{No Disease})} \times \frac{P(\text{Disease})}{P(\text{No Disease})}$

Bayesian Hypothesis Testing. $P(H \mid x) = \frac{P(H) \int d\theta'\, P(x \mid \theta') P_0(\theta' \mid H)}{P(x)}$, where $P(x) = \sum_{H'} P(H') \int d\theta'\, P(x \mid \theta') P_0(\theta' \mid H')$. Hence $\frac{P(H_0 \mid x)}{P(H_1 \mid x)} = \frac{P(H_0)}{P(H_1)} \times \frac{\int d\theta'\, P(x \mid \theta') P_0(\theta' \mid H_0)}{\int d\theta'\, P(x \mid \theta') P_0(\theta' \mid H_1)}$. Posterior odds = prior odds × Bayes factor.
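A sketch of the posterior odds in the simplest case of two point hypotheses about a normal mean, so each integral over theta' collapses to a single likelihood (SciPy assumed; hypotheses, prior odds, and data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sigma = 2.0
x = rng.normal(6.8, sigma, size=20)              # hypothetical data

# H0: mu = 6 versus H1: mu = 7, with prior odds 1:1
like_H0 = np.prod(stats.norm(6.0, sigma).pdf(x))
like_H1 = np.prod(stats.norm(7.0, sigma).pdf(x))

prior_odds = 1.0
bayes_factor = like_H0 / like_H1
posterior_odds = prior_odds * bayes_factor       # P(H0 | x) / P(H1 | x)
print(bayes_factor, posterior_odds)
```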

More Complex Tasks in Data Analysis. Supervised: classification, regression, .. Unsupervised: clustering, discovering latent factors, network structures, ..

Clustering Gene Expression http://www.pha.jhu.edu/~ghzheng/ http://www.computational-genomics.net/case_studies/cellcycle_demo.html

Latent Variable Models: Clustering and Mixture Models. $p(x \mid \lambda) = \sum_i w_i\, g(x \mid \mu_i, \Sigma_i)$, with $g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu_i)' \Sigma_i^{-1} (x - \mu_i)\right\}$. Challenge: learning the parameters $w_i$, $\mu_i$, $\Sigma_i$. Expectation Maximization, sampling, .. https://en.wikipedia.org/wiki/Cluster_analysis
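In practice the EM iterations are usually left to a library. A sketch with scikit-learn's GaussianMixture on simulated two-cluster data (not the gene-expression data of the previous slide):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
# two hypothetical clusters in 2D
x = np.vstack([rng.normal([0, 0], 1.0, size=(200, 2)),
               rng.normal([4, 4], 1.0, size=(200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(x)

print(gmm.weights_)        # estimated w_i
print(gmm.means_)          # estimated mu_i
print(gmm.covariances_)    # estimated Sigma_i
labels = gmm.predict(x)    # hard cluster assignments
```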

Latent Variable Models: Hidden Markov Models. $P(o, s) = P(s) P(o \mid s) = \pi_{s_0} b_{s_0}(o_0)\, a_{s_0 s_1} b_{s_1}(o_1)\, a_{s_1 s_2} b_{s_2}(o_2)\, a_{s_2 s_3} \cdots$ http://artint.info/html/ArtInt_161.html http://compbio.pbworks.com/
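A sketch that evaluates this joint probability for one concrete (state, observation) path, using hypothetical two-state initial (pi), transition (a), and emission (b) matrices:

```python
import numpy as np

pi = np.array([0.6, 0.4])          # initial state probabilities
a = np.array([[0.7, 0.3],          # a[s, s']: P(next state s' | current state s)
              [0.2, 0.8]])
b = np.array([[0.9, 0.1],          # b[s, o]: P(observation o | state s)
              [0.3, 0.7]])

states = [0, 0, 1, 1]              # a hypothetical hidden path s_0 .. s_3
obs    = [0, 0, 1, 1]              # the corresponding observations o_0 .. o_3

p = pi[states[0]] * b[states[0], obs[0]]
for t in range(1, len(states)):
    p *= a[states[t - 1], states[t]] * b[states[t], obs[t]]

print(p)   # P(o, s) = pi_{s0} b_{s0}(o_0) a_{s0,s1} b_{s1}(o_1) ...
```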

Diffusive HMM for Single Molecule Phenomena Beausang et al, Biophys J, 2007

Generation vs Prediction. So far we have dealt with explicit probability models of the data. What if we have no clue about a precise probability model?

Rosenblatt's Perceptron http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html

Learning the Weights. $w_{t+1} = w_t + \eta\, (y_i - \hat{y}_i)\, x_i$. Good learning: the decision surface should not depend too much on the particular subset of training data used. Test by cross-validation.
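A minimal sketch of this update rule on linearly separable toy data, with labels in {0, 1} and the bias folded into the weight vector (all choices here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(9)
# toy, linearly separable data: the class is 1 when x1 + x2 > 1
X = rng.uniform(-1, 2, size=(100, 2))
y = (X.sum(axis=1) > 1).astype(float)
X = np.hstack([X, np.ones((100, 1))])       # constant column for the bias term

w = np.zeros(3)
eta = 0.1
for epoch in range(50):
    for xi, yi in zip(X, y):
        y_pred = float(xi @ w > 0)           # threshold activation
        w += eta * (yi - y_pred) * xi        # w_{t+1} = w_t + eta (y_i - y_hat_i) x_i

print(w, np.mean((X @ w > 0) == y))          # weights and training accuracy (near 1.0 once converged)
```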

Linearly Non-separable Cases http://www.saedsayad.com/artificial_neural_network_bkp.htm

Parametrizing Complex Decision Boundaries. Shallow methods: SVM, boosting, .. Deep methods: layered neural nets.

Classification: Kernel Methods / SVM. $\phi(x) = \sum_i \alpha_i y_i K(x, x_i)$ http://www.mdpi.com/1424-8220/14/11/20713/htm

SVM Kernels. Linear: $K(x, y) = x \cdot y$. Polynomial: $K(x, y) = (x \cdot y + 1)^p$. Radial basis function: $K(x, y) = \exp\left(-\frac{(x - y)^2}{\sigma^2}\right)$.
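A sketch of these three kernel functions, plus an RBF-kernel SVM fit with scikit-learn on a toy nonlinear problem (assumed available; not the regulatory-sequence application on the next slide):

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(x, y):
    return x @ y

def poly_kernel(x, y, p=3):
    return (x @ y + 1) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / sigma ** 2)

# toy nonlinear problem: the label depends on the distance from the origin
rng = np.random.default_rng(10)
X = rng.normal(size=(300, 2))
y = (np.linalg.norm(X, axis=1) > 1.2).astype(int)

clf = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)
print(clf.score(X, y))     # training accuracy of the RBF-kernel SVM
```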

Classifying Regulatory Sequences Djordjevic, Sengupta and Shraiman, Genome Res, 2003

Back to Probability Models: Restricted Boltzmann Machine (RBM). $P(x, h) = \exp(-E(x, h))/Z$, with $E(x, h) = -\sum_{ia} x_i W_{ia} h_a$ and $Z = \sum_{\{x, h\}} \exp(-E(x, h))$. Smolensky, 1986; Hinton and Salakhutdinov, Science, 2006
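For a tiny RBM the partition function can be summed exactly. A sketch with a hypothetical 3-visible by 2-hidden weight matrix, binary units, and no bias terms, matching the energy above:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(11)
W = rng.normal(scale=0.5, size=(3, 2))       # W[i, a]: visible unit i to hidden unit a

def energy(x, h):
    return -x @ W @ h                         # E(x, h) = -sum_{ia} x_i W_ia h_a

# exact partition function: sum over all binary configurations of x and h
configs_x = list(product([0, 1], repeat=3))
configs_h = list(product([0, 1], repeat=2))
Z = sum(np.exp(-energy(np.array(xc), np.array(hc)))
        for xc in configs_x for hc in configs_h)

x, h = np.array([1, 0, 1]), np.array([1, 1])
print(np.exp(-energy(x, h)) / Z)              # P(x, h) for one configuration
```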

MNIST Data and RBM https://corpocrat.com/2014/10/17/machine-learning-using-restricted-boltzmann-machines/

Stacking RBMs into Deep Boltzmann Machines (DBM) [figure: individual RBMs over the layers x, h^1, h^2 stacked into a DBM]

Multilayer Artificial Neural Nets / Deep Learning (ANN 2.0) http://www.rsipvision.com/exploring-deep-learning/

Forget Probabilities: Feedforward Deep Nets. $h^{(1)}_{M_1 \times 1} = \sigma(W^{(1)}_{M_1 \times N}\, x_{N \times 1} + b_1)$, $h^{(2)}_{M_2 \times 1} = \sigma(W^{(2)}_{M_2 \times M_1}\, h^{(1)}_{M_1 \times 1} + b_2)$, ..., $h^{(d)}_{M_d \times 1} = \sigma(W^{(d)}_{M_d \times M_{d-1}}\, h^{(d-1)}_{M_{d-1} \times 1} + b_d)$. From http://www.andrewsnoke.com
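A sketch of this forward pass in NumPy, with a logistic sigma and arbitrary layer widths; the weights are random and purely illustrative:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))            # logistic nonlinearity

rng = np.random.default_rng(12)
layer_sizes = [10, 8, 5, 3]                    # N, M1, M2, M3 (d = 3 layers)
weights = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=(m, 1)) for m in layer_sizes[1:]]

x = rng.normal(size=(10, 1))                   # input column vector, N x 1
h = x
for W, b in zip(weights, biases):
    h = sigma(W @ h + b)                       # h^(k) = sigma(W^(k) h^(k-1) + b_k)

print(h.shape)                                 # (3, 1): output of the last layer
```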

Applications to Biology. Angermueller et al., Mol Syst Biol, 2016, adapted from Alipanahi et al., Nat Biotechnol, 2015

Summary. Complexity of models and overfitting. Significance tests. Cross-validation.

Tools

Books