Computational Genomics

Computational Genomics http://www.cs.cmu.edu/~02710 Introduction to probability, statistics and algorithms

(brief) intro to probability

Basic notations
Random variable - refers to an element/event whose status is unknown, e.g. A = "gene g is increased 2-fold"
Domain - the set of values a random variable can take:
- A = Cancer? : binary
- A = Protein family : discrete
- A = Log ratio change in expression : continuous

Priors
Degree of belief in an event in the absence of any other information.
Example: P(cancer) = 0.2, P(no cancer) = 0.8

Conditional probability
P(A = 1 | B = 1): the fraction of cases where A is true given that B is true.
Example: P(A) = 0.2, P(A | B) = 0.5

Conditional probability
In some cases, given knowledge of one or more random variables, we can improve upon our prior belief of another random variable.
For example: P(cancer) = 0.5, P(cancer | non-smoker) = 1/4, P(cancer | smoker) = 3/4
(Table of paired observations; Cancer values: 1 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0, paired with a Smoker column.)
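
A minimal sketch (not from the slides) of how these conditional probabilities can be estimated from co-occurrence counts. The Cancer column is the one shown on the slide; the Smoker column is made up here so the estimates come out to the slide's 1/4 and 3/4.

```python
# Estimate P(cancer), P(cancer | smoker), P(cancer | non-smoker)
# from paired binary observations (smoker column is illustrative).
cancer = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
smoker = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

n = len(cancer)
p_cancer = sum(cancer) / n                                   # 0.5

# P(cancer | smoker) = #(cancer and smoker) / #(smoker)
both = sum(c and s for c, s in zip(cancer, smoker))
p_cancer_given_smoker = both / sum(smoker)                   # 0.75

# P(cancer | non-smoker) = #(cancer and not smoker) / #(non-smoker)
cancer_nonsmoker = sum(c and not s for c, s in zip(cancer, smoker))
p_cancer_given_nonsmoker = cancer_nonsmoker / (n - sum(smoker))   # 0.25

print(p_cancer, p_cancer_given_smoker, p_cancer_given_nonsmoker)
```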

Joint distributions
The probability that a set of random variables will take a specific value is their joint distribution. Notation: P(A ∧ B) or P(A, B). Example: P(cancer, smoking).
If we assume independence then P(A, B) = P(A)P(B). However, in many cases such an assumption may be too strong (more later in the class).

Chain rule
The joint distribution can be specified in terms of conditional probabilities: P(A, B) = P(A | B) * P(B).
Together with Bayes rule (which is actually derived from it), this is one of the most powerful rules in probabilistic reasoning.

Bayes rule
One of the most important rules for this class. Derived from the chain rule: P(A, B) = P(A | B)P(B) = P(B | A)P(A). Thus,
P(A | B) = P(B | A) P(A) / P(B)
Thomas Bayes was an English clergyman who set out his theory of probability in 1764.

Bayes rule (cont)
Often it is useful to expand the rule a bit further:
P(A | B) = P(B | A) P(A) / Σ_A' P(B | A') P(A')
This results from the law of total probability: P(B) = Σ_A P(B, A), which for a binary A is P(B, A=1) + P(B, A=0).
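
A hedged worked example of the two slides above, using numbers consistent with the earlier cancer/smoker example; the two likelihood values P(smoker | cancer) and P(smoker | healthy) are assumptions made for the sketch.

```python
# Bayes rule: P(cancer | smoker) = P(smoker | cancer) P(cancer) / P(smoker),
# with the denominator expanded via total probability.
p_cancer = 0.5
p_smoker_given_cancer = 0.75      # assumed for the sketch
p_smoker_given_healthy = 0.25     # assumed for the sketch

# P(smoker) = sum over the two states of "cancer"
p_smoker = (p_smoker_given_cancer * p_cancer +
            p_smoker_given_healthy * (1 - p_cancer))

p_cancer_given_smoker = p_smoker_given_cancer * p_cancer / p_smoker
print(p_cancer_given_smoker)      # 0.75 with these numbers
```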

Probability Density Function
Discrete distributions (e.g., the outcomes 1-6 of a die).
Continuous distributions: Cumulative Density Function (CDF) F(a) and density f(x).

Total probability
Properties of Cumulative Density Functions F(x) and the Probability Density Function (PDF).

Expectations
Mean/Expected Value: E[X] = Σ_x x P(x) (or ∫ x f(x) dx for continuous X)
Variance: Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2
In general: E[g(X)] = Σ_x g(x) P(x)
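
A minimal sketch of these definitions for a discrete variable; the fair die is chosen only for illustration.

```python
# Mean, variance and E[g(X)] computed directly from the definitions.
xs = [1, 2, 3, 4, 5, 6]          # outcomes of a fair die
ps = [1 / 6] * 6                  # probabilities

mean = sum(x * p for x, p in zip(xs, ps))                  # E[X]
var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))     # Var(X)
e_g = sum((x ** 2) * p for x, p in zip(xs, ps))            # E[g(X)] with g(x) = x^2

print(mean, var, e_g)   # 3.5, 2.9166..., 15.1666...
```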

Multivariate
Joint for (x, y): p(x, y)
Marginal: p(x) = Σ_y p(x, y)
Conditional: p(x | y) = p(x, y) / p(y)
Chain rule: p(x, y) = p(x | y) p(y)
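
A small sketch of marginalization, conditioning and the chain rule on a made-up 2x2 joint table.

```python
# Joint table p(x, y) for two binary variables (values made up).
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

# Marginal: p(x) = sum_y p(x, y)
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

# Conditional: p(y | x=1) = p(1, y) / p(x=1)
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in (0, 1)}

# Chain rule check: p(x, y) == p(y | x) * p(x)
assert abs(joint[(1, 1)] - p_y_given_x1[1] * p_x[1]) < 1e-12
print(p_x, p_y_given_x1)
```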

Bayes Rule
Standard form: p(x | y) = p(y | x) p(x) / p(y)
Replacing the bottom: p(x | y) = p(y | x) p(x) / Σ_x' p(y | x') p(x')

Binomial
Distribution: P(X = k) = C(n, k) p^k (1 - p)^(n - k)
Mean/Var: E[X] = np, Var(X) = np(1 - p)

Uniform
Anything is equally likely in the region [a, b].
Distribution: f(x) = 1 / (b - a) for x in [a, b]
Mean/Var: E[X] = (a + b) / 2, Var(X) = (b - a)^2 / 12

Poisson
Discrete distribution, widely used in sequence analysis (read counts are discrete).
Distribution: P(x = k) = λ^k e^{-λ} / k!
λ is the expected value of x (the number of observations) and is also the variance: E(x) = Var(x) = λ
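
A minimal sketch of the Poisson pmf, e.g. as a model for read counts; the rate value is made up, and the mean and variance are checked numerically against λ.

```python
import math

# Poisson pmf P(x = k) = lam**k * exp(-lam) / k!
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0                                   # illustrative rate
pmf = [poisson_pmf(k, lam) for k in range(50)]

# Mean and variance computed from the pmf both come out ~= lam.
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
print(mean, var)                            # both close to 4.0
```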

Gaussian (Normal)
If I look at the height of women in country xx, it will look approximately Gaussian. Small random noise errors also look Gaussian/Normal.
Distribution: f(x) = (1 / (σ√(2π))) exp(-(x - μ)^2 / (2σ^2))
Mean/Var: E[X] = μ, Var(X) = σ^2

Why Do People Use Gaussians?
Central Limit Theorem (loosely): the sum of a large number of IID random variables is approximately Gaussian.
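
A quick simulation illustrating the loose statement above: averages of many IID uniform draws concentrate around the true mean and look roughly Gaussian. The sample sizes are arbitrary choices for the sketch.

```python
import random

# Average n_vars uniform(0, 1) draws, repeated n_trials times.
random.seed(0)
n_vars, n_trials = 100, 10000
averages = [sum(random.random() for _ in range(n_vars)) / n_vars
            for _ in range(n_trials)]

mean = sum(averages) / n_trials
var = sum((a - mean) ** 2 for a in averages) / n_trials
print(mean, var)   # mean ~ 0.5, variance ~ (1/12) / 100
```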

Multivariate Gaussians
Distribution for a vector x.
PDF: f(x) = (2π)^{-d/2} |Σ|^{-1/2} exp(-(1/2)(x - μ)^T Σ^{-1} (x - μ))

Multivariate Gaussians
cov(x_1, x_2) = (1/n) Σ_{i=1}^{n} (x_{1,i} - μ_1)(x_{2,i} - μ_2)

Covariance examples
Anti-correlated: covariance -9.2. Correlated: covariance 18.33. Independent (almost): covariance 0.6.
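
A minimal implementation of the sample covariance formula above, on made-up correlated and anti-correlated toy vectors.

```python
# Sample covariance: cov(x1, x2) = (1/n) * sum_i (x1_i - mu1)(x2_i - mu2)
def covariance(x1, x2):
    n = len(x1)
    mu1 = sum(x1) / n
    mu2 = sum(x2) / n
    return sum((a - mu1) * (b - mu2) for a, b in zip(x1, x2)) / n

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(covariance(x, [2.1, 3.9, 6.2, 8.0, 9.8]))    # positive: correlated
print(covariance(x, [9.7, 8.1, 5.9, 4.2, 2.0]))    # negative: anti-correlated
```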

(A few) key computational methods

Regression
Given an input x we would like to compute an output y. In linear regression we assume that y and x are related by the equation
y = wx + ε
where w is a parameter and ε represents measurement or other noise.
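
A minimal sketch of estimating the single parameter w by least squares (no intercept term, made-up data points); this is one standard estimator, not necessarily the one developed later in the course.

```python
# Least-squares fit of y = w * x from paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# For a single input with no intercept, the least-squares estimate is
# w = sum(x_i * y_i) / sum(x_i ** 2).
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
predictions = [w * x for x in xs]
print(w, predictions)   # w close to 2
```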

Supervised learning
Classification is one of the key components of supervised learning. In supervised learning the teacher (us) provides the algorithm with the solutions (labels Y) to some of the instances X, and the goal is to generalize so that a model/method can be used to determine the labels of the unobserved samples.
(Diagram: teacher supplies labeled pairs X, Y; the classifier learns parameters w1, w2 and maps a new X to Y.)

Types of classifiers
We can divide the large variety of classification approaches into roughly three main types:
1. Instance based classifiers - use observations directly (no models), e.g. k nearest neighbors
2. Generative - build a generative statistical model, e.g. Naïve Bayes
3. Discriminative - directly estimate a decision rule/boundary, e.g. decision tree, SVM
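
To make the instance-based type concrete, here is a minimal 1-nearest-neighbour sketch on made-up 2-D points; the labels and coordinates are purely illustrative.

```python
# Label a query point with the label of its closest training example.
def nearest_neighbor(train, query):
    # train is a list of ((x, y), label) pairs
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(train, key=lambda item: dist2(item[0], query))
    return label

train = [((1.0, 1.0), "healthy"), ((1.2, 0.8), "healthy"),
         ((4.0, 4.2), "cancer"), ((3.8, 4.5), "cancer")]
print(nearest_neighbor(train, (3.5, 4.0)))   # "cancer"
```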

Unsupervised learning
We do not have a teacher that provides examples with their labels.
Goal: organize data into clusters such that there is high intra-cluster similarity and low inter-cluster similarity. Informally, finding natural groupings among objects.
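
A minimal k-means sketch (one common clustering algorithm, used here only as an illustration of the clustering goal) on made-up 1-D points.

```python
import random

# Alternate between assigning points to the nearest centroid and
# recomputing centroids as cluster means.
def kmeans_1d(points, k=2, n_iter=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[j].append(p)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print(kmeans_1d(points))   # one centroid near 1, one near 5
```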

Graphical models
Sparse methods for representing joint distributions.
- Nodes represent random variables
- Edges represent conditional dependence
- Can be either directed (Bayesian networks, HMMs) or undirected (Markov Random Fields, Gaussian Random Fields)

Bayesian networks
Bayesian networks are directed acyclic graphs: nodes are random variables, edges represent conditional dependencies, and each node carries a conditional probability table (CPT).
Example CPTs: P(Lo) = 0.5; P(Li | Lo) = 0.4, P(Li | ¬Lo) = 0.7; P(S | Lo) = 0.6, P(S | ¬Lo) = 0.2
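
A sketch of how the joint distribution factorizes over this example network, assuming (from the CPTs above) that Lo is the parent of both Li and S; the node names follow the slide.

```python
# Joint probability from the CPTs, assuming the structure
# P(Lo, Li, S) = P(Lo) * P(Li | Lo) * P(S | Lo).
p_lo = {True: 0.5, False: 0.5}
p_li_given_lo = {True: 0.4, False: 0.7}   # P(Li=1 | Lo), P(Li=1 | not Lo)
p_s_given_lo = {True: 0.6, False: 0.2}    # P(S=1 | Lo),  P(S=1 | not Lo)

def joint(lo, li, s):
    p_li = p_li_given_lo[lo] if li else 1 - p_li_given_lo[lo]
    p_s = p_s_given_lo[lo] if s else 1 - p_s_given_lo[lo]
    return p_lo[lo] * p_li * p_s

# The joint factorizes over the graph, so the eight entries sum to 1.
total = sum(joint(lo, li, s) for lo in (True, False)
            for li in (True, False) for s in (True, False))
print(joint(True, True, True), total)     # 0.12, 1.0
```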