Bioinformatics: Network Analysis

Similar documents
Resampling techniques for statistical modeling

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Topic 4 Unit Roots. Gerald P. Dwyer. February Clemson University

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.

Probability and Statistics

Lecture 30. DATA 8 Summer Regression Inference

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Regression Analysis V... More Model Building: Including Qualitative Predictors, Model Searching, Model "Checking"/Diagnostics

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Numerical Optimization: Basic Concepts and Algorithms

Chapter 1 Statistical Inference

Introduction to Statistical modeling: handout for Math 489/583

Confidence Estimation Methods for Neural Networks: A Practical Comparison

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

Logistic Regression. Machine Learning Fall 2018

Fundamentals of Machine Learning (Part I)

Outline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

A.I.: Beyond Classical Search

Poisson regression: Further topics

Machine Learning and Data Mining. Linear regression. Kalev Kask

Lecture 3: Introduction to Complexity Regularization

Multivariate Time Series: Part 4

Data Warehousing & Data Mining

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Local Search & Optimization

ECE521 week 3: 23/26 January 2017

Tutorial on Machine Learning for Advanced Electronics

Computational statistics

Generalized Linear Models

Model Selection. Frank Wood. December 10, 2009

Linear model selection and regularization

Machine Learning. Ensemble Methods. Manfred Huber

Introduction to Optimization

Chapter 12 - Part I: Correlation Analysis

Machine Learning Linear Regression. Prof. Matteo Matteucci

Business Statistics. Lecture 9: Simple Regression

Optimization and Complexity

Ch 6. Model Specification. Time Series Analysis

Computer Vision Group Prof. Daniel Cremers. 3. Regression

Local Search & Optimization

Lecture 5: Logistic Regression. Neural Networks

Ch 4. Linear Models for Classification

Fundamentals of Machine Learning

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

Sparse Approximation and Variable Selection

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Regression, Ridge Regression, Lasso

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Nonparametric Bayesian Methods (Gaussian Processes)

CPSC 540: Machine Learning

Testing Statistical Hypotheses

VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms

Introduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,

Data Mining Part 5. Prediction

Motivation, Basic Concepts, Basic Methods, Travelling Salesperson Problem (TSP), Algorithms

Linear Models for Regression CS534

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Some general observations.

Open Problems in Mixed Models

Sections 18.6 and 18.7 Artificial Neural Networks

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Outline lecture 2 2(30)

Linear Regression In God we trust, all others bring data. William Edwards Deming

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

Do not copy, post, or distribute

Data Mining und Maschinelles Lernen

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

Linear Model Selection and Regularization

Reading Group on Deep Learning Session 1

Machine learning - HT Basis Expansion, Regularization, Validation

New Methods of Mixture Model Cluster Analysis

Machine Learning and Data Mining. Linear regression. Prof. Alexander Ihler

Scaling Up. So far, we have considered methods that systematically explore the full search space, possibly using principled pruning (A* etc.).

Empirical Risk Minimization

Cross-Validation with Confidence

Implementation and performance of selected evolutionary algorithms

Stochastic Gradient Descent

CPSC 340: Machine Learning and Data Mining. Regularization Fall 2017

Sections 18.6 and 18.7 Artificial Neural Networks

How do we compare the relative performance among competing models?

Bayesian Information Criterion as a Practical Alternative to Null-Hypothesis Testing Michael E. J. Masson University of Victoria

Cross-Validation with Confidence

Incremental Stochastic Gradient Descent

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017

BSCB 6520: Statistical Computing Homework 4 Due: Friday, December 5

The Genetic Algorithm is Useful to Fitting Input Probability Distributions for Simulation Models

Day 3 Lecture 3. Optimizing deep networks

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Ways to make neural networks generalize better

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan

Machine Learning

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan

Week 5: Logistic Regression & Neural Networks

Chapter 3: Regression Methods for Trends

Classification Logistic Regression

Advanced Optimization

Transcription:

Bioinformatics: Network Analysis Model Fitting COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1

Outline Parameter estimation Model selection 2

Parameter Estimation 3

Generally speaking, parameter estimation is an optimization task, where the goal is to determine a set of numerical values for the parameters of the model that minimize the difference between experimental data and the model. 4

In order to avoid cancelation between positive and negative differences, the objective function to be minimized is usually the sum of the squared differences between data and model. This function is commonly called SSE (sum of squared errors), and methods searching for models under SSE are called least-squares methods. 5

Parameter estimation for linear systems is relatively easy, whereas parameter estimation of nonlinear systems (that can t be transformed into approximately linear ones) is very hard. 6

Linear Systems: Linear Regression Involving a Single Variable A single variable y depending on only one other variable x A scatter plot gives an immediate impression of whether the relationship might be linear. If so, the method of linear regression quickly produces the straight line that optimally describes this relationship. 7

Linear Systems: Linear Regression Involving a Single Variable 8

Linear Systems: Linear Regression Involving a Single Variable y=1.899248x+1.628571 8

Linear Systems: Linear Regression Involving a Single Variable Two issues: Extrapolations or predictions of y-values beyond the observed range of x are not reliable. The algorithm yields linear regression lines whether or not the relationship between x and y is really linear. 9

Linear Systems: Linear Regression Involving a Single Variable 10

Linear Systems: Linear Regression Involving a Single Variable Often, simple inspection as in the previous figure is sufficient. However, one should consider assessing a linear regression result with some mathematical rigor (e.g., analyze residual errors,..) 11

Linear Systems: Linear Regression Involving Several Variables Linear regression can also be executed quite easily in cases of more variables. (the function regress is Matlab does the job) 12

Linear Systems: Linear Regression Involving Several Variables 13

Linear Systems: Linear Regression Involving Several Variables z=-0.0423+0.4344x+1.13y 13

Linear Systems: Linear Regression Involving Several Variables Sometimes, a nonlinear function can be turned into a linear one... Recall: linearization of Michaelis-Menten rate law. 14

Linear Systems: Linear Regression Involving Several Variables 15

Linear Systems: Linear Regression Involving Several Variables Sometimes, a nonlinear function can be turned into a linear one... E.g., taking the log of an exponential growth function V = X g Y h ln V =ln + g ln X + h ln Y 16

Nonlinear Systems Parameter estimation for nonlinear systems is incomparably more complicated than linear regression. The main reason is that there are infinitely many different nonlinear functions (there is only one form of a linear function). 17

Nonlinear Systems Even if a specific nonlinear function has been selected, there are no simple methods for computing the optimal parameter values as they exist for linear systems. The solution of a nonlinear estimation task may not be unique. It is possible that two different parameterizations yield exactly the same residual error or that many solutions are found, but none of them is truly good, let alone optimal. Several classes of search algorithms exist for this task (each of which works well in some cases and poorly in others). 18

Nonlinear Systems Class I: exhaustive search (grid search) One has to know admissible ranges for all parameters One has to iterate the search many times with smaller and smaller intervals Clearly infeasible but for very small systems 19

Nonlinear Systems Branch-and-bound methods are significant improvements over grid searches, because they ideally discard large numbers of inferior solution candidates in each step. These method make use of two tools: branching: divide the set of candidate solutions into nonoverlapping subsets bounding: estimate upper and lower bounds for SSE 20

Nonlinear Systems Class II: hill-climbing (steepest-descent) methods head in the direction of improvement 21

Nonlinear Systems 22

Nonlinear Systems Class II: hill-climbing (steepest-descent) methods Clearly, can get stuck in local optima 23

Nonlinear Systems 24

Nonlinear Systems Class III: evolutionary algorithms simulates evolution with fitness-based selection the best-known method in this class is the genetic algorithm (GA) each individual is a parameter vector, and its fitness is computed based on the residual error; mating within the parent population lead to a new generation of individuals (parameter vectors); mutation and recombination can be added as well... 25

Nonlinear Systems A typical individual in the application of a GA to parameter estimation: 26

Nonlinear Systems Generation of offspring in a generic GA: mating (or recombination) mutation 27

Nonlinear Systems Genetic algorithms work quite well in many cases and are extremely flexible (e.g., in terms of fitness criteria) However, in some cases they do not converge to a stable population In general, these algorithms are not particularly fast 28

Nonlinear Systems Other classes of stochastic algorithms: ant colony optimization (ACO) particle swarm optimization (PSO) simulated annealing (SA)... 29

Typical Challenges Noise in the data: this is almost always present (due to inaccuracies of measurements, etc.) Noise is very challenging for parameter estimation 30

Typical Challenges moderate noise more noise much more noise 31

Bootstrapping A noisy data set!=x(")+! will not allow us to determine the true model parameters ", but only an estimate " (!). Each time we repeat the estimation with different data sets, we deal with a different realization of the random error! and obtain a different estimator ". Ideally, the mean value " of these estimates should be identical to the true parameter value, and their variance should be small. In practice, however, only a single data set is available, so we obtain a single data point estimate " without knowing its distribution. 32

Bootstrapping Bootstrapping provides a way to determine, at least approximately, the statistical properties of the estimator ". First, hypothetical data sets (of the same size as the original data set) are generated from the original data by resampling with replacement, and the estimate " is calculated for each of them. The empirical distribution of these estimates is then taken as an approximation for the true distribution of ". 33

Bootstrapping 34

Bootstrapping Bootstrapping is asymptotically consistent, that is, the approximation becomes exact as the size of the original data set goes to infinity. However, for finite data sets, it does not provide any guarantees. 35

Typical Challenges Non-identifiability: Many solutions might have the same SSE. In particular, dependence between variables may lead to redundant parameter estimates. 36

Typical Challenges The task of parameter estimation from given data is often called an inverse problem. If the solution on an inverse problem is not unique, the problem is ill-posed and addition assumptions are required to pinpoint a unique solution. 37

Structure Identification Related to the task of estimating parameter values in that of structure identification. In this case, it is not known what the structure of the model looks like. Two approaches may be pursued: explore a number of candidate models (Michaelis-Menten, sigmoidal Hill functions, etc.) compose a model from a set of canonical models (think: assembling the model from basic building blocks) 38

Cross-validation There exists a fundamental difference between model fitting and prediction. If a model has been fitted to a given data set, it will probably show a better agreement with these training data than with new test data that have not been used for model fitting. The reason is that in model fitting, we enforce an agreement with the data. 39

Cross-validation In fact, in model fitting, it is often the case that a fitted model will fit the data better than the true model itself! This phenomenon is called overfitting. Strong overfitting should be avoided! 40

Cross-validation How can we check how a model performs in prediction? In cross-validation, a given data set (size N) is split into two parts: a training data set of size n, and a test data set consisting of all remaining data. The model is fitted to the training data and the prediction error is evaluated for the test data. By repeating this procedure for many choices of test sets, we can judge how well the model, after being fitted to n data points, will predict new data. 41

Cross-validation 42

Model Selection 43

Essentially, all models are wrong, but some are useful. Useful for what? 44

45

As a rule of thumb, a model with many free parameters may fit given data easily. But, as the fit becomes better and better, the average amount of experimental information per parameter decreases, so the parameter estimates and predictions from the model become poorly determined. 46

With four parameters, I can fit an elephant, and with five, I can make him wiggle his trunk. J. von Neumann 47

We can choose between competing models by statistical tests and model selection criteria. 48

In statistical tests, we compare a more complex model to a simpler background model. According to the null hypothesis, both models perform well. In the test, we favor the background model unless it statistically contradicts the observed data. 49

Selection criteria are mathematical scoring functions that balance agreement with experimental data against model complexity. 50

Model Selection Methods include maximum likelihood and "2 -test likelihood ratio test information criteria (AIC, AICc, BIC) Bayesian model selection... 51

52

Acknowledgments Systems Biology: A Textbook, by E. Klipp et al. A First Course in Systems Biology, by E.O. Voit. 53