Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

1 Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

2
- I believe it is helpful to number your slides for easy reference.
- It's been a while since I took linear algebra and this came at me a little fast.
- A lot of the math presented today went over my head (I have not taken linear algebra).
- More background on how to go about implementing equations like this would be helpful.
- Would appreciate a more detailed look at actually implementing neural nets with some pseudocode.
- It was a bit unclear at the end where exactly you get the W's from.
- Why are logistic functions (or rectified linear functions) preferred for neural networks? Does any non-linear function work?
- Could you point me to the paper that reports the structures of transcription factors binding chromatin? doi: /j.tibs

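3 Linear algebra review
Vector (dimension 3): $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$
Scalar * vector multiplication: $a x = \begin{bmatrix} a x_1 \\ a x_2 \\ a x_3 \end{bmatrix}$
Vector * vector multiplication (dot product): $x^T y = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3$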
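The two vector operations above can be checked with a few lines of plain Python. This is a minimal sketch that uses lists as vectors (no numerical library assumed); the function names `scale` and `dot` are illustrative.

```python
def scale(a, x):
    """Scalar * vector: multiply every component of x by a."""
    return [a * xi for xi in x]

def dot(x, y):
    """Dot product x^T y = x_1*y_1 + x_2*y_2 + ... (equal lengths required)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(scale(2.0, x))  # [2.0, 4.0, 6.0]
print(dot(x, y))      # 1*4 + 2*5 + 3*6 = 32.0
```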

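4 Linear algebra review
Matrix (3x2).
Matrix * vector multiplication: $Ax = \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{1,1} x_1 + a_{1,2} x_2 \\ a_{2,1} x_1 + a_{2,2} x_2 \end{bmatrix}$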
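Each entry of $Ax$ is the dot product of one row of $A$ with $x$. A minimal sketch in plain Python, assuming a matrix is stored as a list of rows; `matvec` is a hypothetical helper name. The example uses a 3x2 matrix, so `x` needs 2 entries and the result has 3.

```python
def matvec(A, x):
    """Matrix * vector: entry i of Ax is the dot product of row i of A with x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal len(x)")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]   # a 3x2 matrix
x = [1.0, -1.0]
print(matvec(A, x))  # [-1.0, -1.0, -1.0]
```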

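5 Neural networks (figure: the input sequence $x_0$ = ACCGT feeds two first-layer units, $f_{1,1}(x_0) = W_{1,1} x_0$ and $f_{1,2}(x_0) = W_{1,2} x_0$, which form $x_1$; then $f_{2,1}(x_1) = W_{2,1} x_1$ gives $x_2$; the output is $x_3 = f_{3,1}(x_2) = \mathrm{sign}(w_{3,1} x_2)$)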
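Several responses asked for pseudocode. The forward pass of the network in the figure can be sketched as below, assuming the DNA input is one-hot encoded (4 indicators per base) and that each first- and second-layer unit outputs a single number. The weights `W11`, `W12`, `W21`, `w31` are hypothetical and hand-picked for illustration, not trained.

```python
# Toy network mirroring the figure: x0 (one-hot DNA) -> two linear units
# -> one linear unit -> sign output. Weights are hand-picked, not learned.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat list of 4 indicators per position."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(z):
    """Return 1, 0 or -1 according to the sign of z."""
    return (z > 0) - (z < 0)

def forward(seq, W11, W12, W21, w31):
    x0 = one_hot(seq)
    x1 = [dot(W11, x0), dot(W12, x0)]   # layer 1: f_{1,1}, f_{1,2}
    x2 = dot(W21, x1)                   # layer 2: f_{2,1}
    return sign(w31 * x2)               # layer 3: sign(w_{3,1} * x2)

# For length-5 sequences: W11 counts C's, W12 counts G's.
W11 = [0.0, 1.0, 0.0, 0.0] * 5
W12 = [0.0, 0.0, 1.0, 0.0] * 5
W21 = [1.0, -1.0]   # x2 = (#C) - (#G)
w31 = 1.0

print(forward("ACCGT", W11, W12, W21, w31))  # 1: two C's outweigh one G
print(forward("AGGGT", W11, W12, W21, w31))  # -1
```

Because every layer before the sign is linear, $x_2$ is itself just a linear function of $x_0$; this is why networks usually insert nonlinearities such as logistic or rectified-linear functions between layers. In practice the W's are learned by minimizing a loss function (typically with gradient descent) rather than set by hand.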

6 This lecture
- MEME optimization
- ChIP-seq peak calling
- Motivating problem: accounting for GC and mappability bias in peak calling
- Method: convex functions
- More ChIP-seq peak calling considerations
- Other functional genomics assays

7 Chromatin immunoprecipitation followed by sequencing (ChIP-seq): sequence and map to reference genome

8 Problem: Given a ChIP-seq experiment for factor X, where does X bind?

9 Problem: Given a ChIP-seq experiment for factor X, where does X bind?
Short answer: Stack up the reads in the genome; choose the tall stacks.
Issues to consider:
- Sequencing fragment lengths
- Sequencing read lengths
- Experimental biases
- Mappability
- GC bias
- How to pick a threshold and assign statistical confidence?
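The short answer can be sketched in a few lines. This is a deliberately naive illustration, assuming reads are given as single coordinates on one chromosome; `call_peaks`, the 50-bp bin size and the count threshold are hypothetical choices, and none of the issues listed above (fragment length, mappability, GC bias, statistical confidence) are handled.

```python
from collections import Counter

def call_peaks(read_positions, bin_size, threshold):
    """Count reads per fixed-width bin ("stack up the reads") and report the
    bins whose count reaches the threshold ("choose the tall stacks")."""
    counts = Counter(pos // bin_size for pos in read_positions)
    return [(b * bin_size, (b + 1) * bin_size, n)
            for b, n in sorted(counts.items()) if n >= threshold]

reads = [105, 110, 120, 130, 140, 800, 1510, 1520, 1530, 1540, 1550, 1560]
print(call_peaks(reads, bin_size=50, threshold=4))
# [(100, 150, 5), (1500, 1550, 4)]
```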

10 This lecture
- MEME optimization
- ChIP-seq peak calling
- Motivating problem: accounting for GC and mappability bias in peak calling
- Method: convex functions
- More ChIP-seq peak calling considerations
- Other functional genomics assays

11 ChIP-seq read counts are biased by GC content and mappability

12 MOSAiCS enrichment model (figure: graphical model with nodes Bound?, Mappability, GC content, Tag counts)

13 How can we model the background distribution of ChIP-seq reads?

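14 Every predictive model is composed of a mean model and an error model
Mean model: $E[\text{counts}_j \mid \text{not bound}] = \exp(\beta_0 + \beta_M \mathrm{MAP}_j + \beta_{GC} \mathrm{GC}_j)$
Error model: $\Pr(\text{counts}_j \mid \text{not bound}) = \ ?$
(figure: graphical model with nodes Bound?, Mappability, GC content, Tag counts)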

15 The Poisson distribution models sequencing read counts: $P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$. Mean: $\lambda$. Variance: $\lambda$.
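A quick numerical check that the Poisson mean and variance are both $\lambda$. This is a minimal sketch in plain Python, assuming a hypothetical $\lambda = 3.5$ and truncating the infinite sum at $k = 100$, where the remaining tail is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P(k) = lam^k * exp(-lam) / k!, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 3.5
ks = range(0, 100)   # truncation point; the tail beyond k = 100 is negligible here
probs = [poisson_pmf(k, lam) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 3.5 3.5
```

The fact that one parameter fixes both moments is what makes the Poisson too rigid for overdispersed sequencing counts.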

16 The negative binomial distribution models sequencing read counts more flexibly than Poisson: $P(k \mid r, p) = \binom{k + r - 1}{k} (1 - p)^r p^k$. Mean: $\frac{pr}{1 - p}$. Variance: $\frac{pr}{(1 - p)^2}$.
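The mean and variance formulas can be checked numerically against the pmf. A minimal sketch, assuming hypothetical parameters $r = 3$, $p = 0.4$; the binomial coefficient is written with `math.lgamma` so that non-integer $r$ also works.

```python
import math

def negbin_pmf(k, r, p):
    """P(k | r, p) = C(k + r - 1, k) * (1 - p)^r * p^k, with the binomial
    coefficient expressed through log-gamma functions."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(1 - p) + k * math.log(p))

r, p = 3.0, 0.4
ks = range(0, 400)   # truncation; p**k is negligible long before k = 400
probs = [negbin_pmf(k, r, p) for k in ks]
mean = sum(k * q for k, q in zip(ks, probs))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))
print(mean, p * r / (1 - p))        # both approximately 2.0
print(var, p * r / (1 - p) ** 2)    # both approximately 3.333
```

Note that the variance equals the mean times $1/(1-p) > 1$, so the negative binomial is always overdispersed relative to a Poisson with the same mean.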

17 The negative binomial distribution models the mean and variance separately.
Principle: Make the weakest assumptions you can afford to in your modeling choices.
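"Separately" can be made concrete: any mean and any larger variance determine a unique $(r, p)$. Dividing the variance formula by the mean formula gives $\mathrm{var}/\mathrm{mean} = 1/(1-p)$, so $p = 1 - \mathrm{mean}/\mathrm{var}$ and $r = \mathrm{mean}^2/(\mathrm{var} - \mathrm{mean})$. A minimal sketch; the function name is hypothetical.

```python
def negbin_params_from_moments(mean, var):
    """Solve mean = p*r/(1-p) and var = p*r/(1-p)^2 for (r, p).
    Requires var > mean (the negative binomial cannot be underdispersed)."""
    if var <= mean:
        raise ValueError("negative binomial needs var > mean")
    p = 1.0 - mean / var
    r = mean ** 2 / (var - mean)
    return r, p

r, p = negbin_params_from_moments(mean=2.0, var=10.0 / 3.0)
print(r, p)   # approximately 3.0 and 0.4
```

As the variance approaches the mean, $r$ grows without bound and the distribution approaches the Poisson.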

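18 MOSAiCS background model
Number of counts at 50-bp bin $j$: $N_j \sim \mathrm{NegBin}(a, a/\mu_j)$, with $\mu_j = \exp(\beta_0 + \beta_M M_j + \beta_{GC} \mathrm{GC}_j)$, where $M_j$ is the mappability at bin $j$ and $\mathrm{GC}_j$ is the GC content at bin $j$.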
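A sketch of the background model for a single bin, assuming $\mathrm{NegBin}(a, b)$ denotes the gamma-Poisson mixture with gamma shape $a$ and rate $b$, so that $\mathrm{NegBin}(a, a/\mu_j)$ has mean $\mu_j$ and variance $\mu_j + \mu_j^2/a$. The coefficients, covariates and $a$ below are hypothetical values, and the numerical check only confirms those two moments.

```python
import math

def background_mean(beta0, beta_m, beta_gc, mappability, gc):
    """Mean model: mu_j = exp(beta0 + beta_M * M_j + beta_GC * GC_j)."""
    return math.exp(beta0 + beta_m * mappability + beta_gc * gc)

def negbin_pmf_shape_rate(n, a, b):
    """Gamma-Poisson NegBin(a, b) with gamma shape a and rate b:
    P(n) = Gamma(n + a) / (Gamma(a) n!) * (b/(1+b))^a * (1/(1+b))^n.
    Its mean is a/b, so NegBin(a, a/mu) has mean mu."""
    log_p = (math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
             + a * math.log(b / (1 + b)) - n * math.log(1 + b))
    return math.exp(log_p)

# Hypothetical coefficients and covariates for one 50-bp bin.
mu = background_mean(beta0=0.5, beta_m=1.0, beta_gc=-2.0, mappability=0.9, gc=0.45)
a = 2.0
probs = [negbin_pmf_shape_rate(n, a, a / mu) for n in range(500)]
mean = sum(n * q for n, q in enumerate(probs))
var = sum((n - mean) ** 2 * q for n, q in enumerate(probs))
print(mu, mean)               # equal up to truncation error
print(mu + mu ** 2 / a, var)  # variance = mu + mu^2 / a
```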

19 How do we know if the MOSAiCS model is even optimizable?

20 This lecture
- MEME optimization
- ChIP-seq peak calling
- Motivating problem: accounting for GC and mappability bias in peak calling
- Method: convex functions
- More ChIP-seq peak calling considerations
- Other functional genomics assays

21 Answer: The negative log likelihood is convex. The class of convex functions (defined on the next slide) roughly corresponds to the set of efficiently optimizable functions. When presented with an objective function, the first step is usually to check whether it is convex.

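22 Convex functions: A function $f(x)$ is convex if it satisfies the property $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ for all $x, y$ and $0 \le \theta \le 1$. Convex functions have no local minima other than global minima: every local minimum of a convex function is a global minimum.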
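The definition suggests a simple numerical test: search for a triple $(x, y, \theta)$ that violates the inequality. A minimal sketch; the grid of points and $\theta$ values is an arbitrary assumption. Finding a violation disproves convexity, while finding none proves nothing, because only finitely many points are tried.

```python
import math

def find_convexity_violation(f, points, thetas):
    """Return (x, y, theta) with f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y),
    or None if no such triple exists on the grid."""
    for x in points:
        for y in points:
            for t in thetas:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + 1e-9 * max(1.0, abs(rhs)):   # tolerance for rounding
                    return x, y, t
    return None

points = [i * 0.5 for i in range(-8, 9)]   # -4.0, -3.5, ..., 4.0
thetas = [i / 10 for i in range(11)]        # 0.0, 0.1, ..., 1.0
print(find_convexity_violation(math.exp, points, thetas))  # None
print(find_convexity_violation(math.sin, points, thetas))  # some violating (x, y, theta)
```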

23 Concave functions: A function f(x) is concave if -f(x) is convex. Convex functions are usually efficiently minimizable. Concave functions are usually efficiently maximizable. (A function can be neither convex nor concave.)

24 Examples on one variable (figure: Stanford EE364a)

25 Examples on one variable (figure panels: Convex; Non-convex. Source: Duke stat376)

26 Second derivative criterion for convexity: If $f(x)$ is a twice-differentiable function of one variable, $f(x)$ is convex if and only if $\frac{d^2 f}{dx^2} \ge 0$ for all $x$. (Stanford EE364a)
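The criterion can be probed numerically with a central finite difference, $f''(x) \approx (f(x+h) - 2f(x) + f(x-h))/h^2$. A minimal sketch, assuming a step $h = 10^{-4}$ and a grid on $[-3, 3]$; like any grid check, a negative value disproves convexity but a nonnegative minimum only suggests it.

```python
import math

def min_second_derivative(f, grid, h=1e-4):
    """Smallest central finite-difference estimate of f''(x) over the grid."""
    return min((f(x + h) - 2 * f(x) + f(x - h)) / h ** 2 for x in grid)

grid = [i * 0.25 for i in range(-12, 13)]   # -3.0, -2.75, ..., 3.0
print(min_second_derivative(lambda x: x ** 2, grid))  # about 2: consistent with convexity
print(min_second_derivative(lambda x: x ** 3, grid))  # about -18: x^3 is not convex
print(min_second_derivative(math.exp, grid))          # about 0.05 > 0
```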

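27 Is exp(x) convex?
Proof by picture: [figure]
Proof from second derivative: $\frac{d}{dx}\exp(x) = \exp(x)$, $\frac{d^2}{dx^2}\exp(x) = \exp(x) \ge 0$.
Proof from definition: $\exp(\theta x + (1 - \theta) x') = \exp(\theta x)\exp((1 - \theta) x') = \exp(x)^{\theta} \exp(x')^{1 - \theta} \le \theta \exp(x) + (1 - \theta)\exp(x')$ (arithmetic mean / geometric mean inequality)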

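28 Is sin(x) convex?
Disproof by picture: [figure]
Counterexample by second derivative: $\frac{d}{dx}\sin(x) = \cos(x)$, $\frac{d^2}{dx^2}\sin(x) = -\sin(x)$, and $-\sin(\pi/2) = -1 < 0$.
Counterexample from definition: $\frac{\sin(\pi)}{2} + \frac{\sin(0)}{2} = \frac{0}{2} + \frac{0}{2} < \sin\left(\frac{0}{2} + \frac{\pi}{2}\right) = \sin(\pi/2) = 1$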

29 Is $x^2$ convex?

30 Is $x^3$ convex?

31 $(ax - b)^2$

32 Convexity of functions on multiple variables: $f(x) = f(x_1, x_2, x_3, \ldots)$
Convexity criterion: $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ for all $x, y \in \mathbb{R}^n$ and $0 \le \theta \le 1$.

33 Examples of multi-variable convex functions on $x \in \mathbb{R}^n$
- Affine function (convex and concave): $a^T x + b$, with $a \in \mathbb{R}^n$, $b \in \mathbb{R}$
- Euclidean norm (convex): $\sqrt{\sum_i x_i^2}$

34 Examples of multi-variable convex functions: convex if P is positive semi-definite. (Stanford EE364a)

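36 Second derivative criterion for convexity on multiple variables
Derivative of a function on multiple variables (gradient): $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right]^T$
Second derivative (Hessian): $\nabla^2 f(x)$, the $n \times n$ matrix with entries $\frac{\partial^2 f}{\partial x_i \partial x_j}$
A twice-differentiable $f(x)$ is convex if and only if $v^T \nabla^2 f(x) v \ge 0$ for all $v \in \mathbb{R}^n$ (at every $x$), i.e. if $\nabla^2 f(x)$ is positive semi-definite. (Stanford EE364a)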

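40 A non-convex function: $f(x_1, x_2) = x_1 x_2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 0 & 1/2 \\ 1/2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. Not positive semi-definite.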
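The Hessian of $f(x_1, x_2) = x_1 x_2$ is the constant matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. A minimal sketch that checks positive semi-definiteness two ways: the eigenvalues of a symmetric 2x2 matrix (closed form), and a direction $v$ with $v^T H v < 0$. The helper names are hypothetical.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]], smallest first."""
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mid - rad, mid + rad

def quad_form(H, v):
    """v^T H v for a 2x2 matrix H and a length-2 vector v."""
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

# Hessian of f(x1, x2) = x1 * x2: d2f/dx1^2 = 0, d2f/dx1dx2 = 1, d2f/dx2^2 = 0.
H = [[0.0, 1.0], [1.0, 0.0]]
print(eigenvalues_sym2(0.0, 1.0, 0.0))  # (-1.0, 1.0): a negative eigenvalue
print(quad_form(H, [1.0, -1.0]))        # -2.0 < 0, so H is not PSD
```

For comparison, $x_1^2 + x_2^2$ has Hessian $2I$, whose eigenvalues are both 2, so it passes the test.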

41 Practical ways to establish convexity of a function
- Verify the definition.
- Verify that the second derivative is always positive semi-definite.
- Show that the function can be obtained from simple convex functions by operations that maintain convexity.

42 Some operations that maintain convexity (Stanford EE364a)

43 Regularization. L2 regularization: $f'(x) = f(x) + \|x\|^2$

44 Is this convex? $(Ax - b)^2 + x^2$
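For scalars $a$ and $b$, $f(x) = (ax - b)^2 + x^2$ is a convex function of an affine function plus a convex function, so it is convex, and $f''(x) = 2a^2 + 2 > 0$. A minimal sketch of gradient descent on this scalar version, compared with the closed-form minimizer $x^* = ab/(a^2 + 1)$; the values $a = 2$, $b = 3$ and the step size are hypothetical choices.

```python
def ridge_gradient(x, a, b):
    """Derivative of f(x) = (a*x - b)^2 + x^2, i.e. 2a(ax - b) + 2x."""
    return 2 * a * (a * x - b) + 2 * x

def minimize_ridge(a, b, x0=0.0, steps=200):
    # f''(x) = 2a^2 + 2 everywhere, so with step 0.5 / f'' each update
    # halves the distance to the unique minimizer.
    step = 0.5 / (2 * a ** 2 + 2)
    x = x0
    for _ in range(steps):
        x -= step * ridge_gradient(x, a, b)
    return x

a, b = 2.0, 3.0
print(minimize_ridge(a, b), a * b / (a ** 2 + 1))  # both approximately 1.2
```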

45 What if your function is not convex?
Is there a monotonic transform that makes it convex? Example (for $x_i > 0$): $\prod_i x_i$ is neither convex nor concave, but $\log \prod_i x_i = \sum_i \log x_i$ is concave.
Next best thing: split the function into convex parts. Example: $f(x_1, x_2) = x_1 x_2$ is convex in either $x_1$ or $x_2$, but not in both at once. Optimize each in turn. Example: EM.
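"Optimize each in turn" can be illustrated with alternating minimization. The objective below, $f(x, y) = (xy - c)^2 + \lambda(x^2 + y^2)$, is a hypothetical example chosen because, like $x_1 x_2$, it is not jointly convex but is a convex quadratic in $x$ for fixed $y$ (and vice versa), so each step has a closed form. This is a sketch of the general idea, not of EM itself.

```python
def objective(x, y, c, lam):
    """Hypothetical biconvex objective f(x, y) = (x*y - c)^2 + lam*(x^2 + y^2)."""
    return (x * y - c) ** 2 + lam * (x ** 2 + y ** 2)

def alternating_minimization(c, lam, x=1.0, y=2.0, sweeps=200):
    """Minimize over x with y fixed, then over y with x fixed, and repeat.
    With y fixed, setting df/dx = 2(xy - c)y + 2*lam*x = 0 gives
    x = c*y / (y^2 + lam); the update for y is symmetric."""
    for _ in range(sweeps):
        x = c * y / (y ** 2 + lam)
        y = c * x / (x ** 2 + lam)
    return x, y

x, y = alternating_minimization(c=4.0, lam=0.5)
print(x * y)                                                     # approximately 3.5
print(objective(1.0, 2.0, 4.0, 0.5), objective(x, y, 4.0, 0.5))  # 6.5, then about 3.75
```

Each update can only lower the objective, but for non-convex problems in general this procedure is only guaranteed to reach a stationary point, not necessarily the global minimum.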

47 Optimizing convex objectives: There are general convex optimization software packages. Even when these are too slow, convex functions usually admit fast optimization specific to your problem. More on convex optimization next class.

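48 The MOSAiCS objective is concave in $\beta$
Let $\beta = [\beta_0, \beta_M, \beta_{GC}]^T$ and $F_j = [1, M_j, \mathrm{GC}_j]^T$, so that $\mu(F_j, \beta) = \exp(\beta^T F_j) = \exp(\beta_0 + \beta_M M_j + \beta_{GC} \mathrm{GC}_j)$.
$\log P(N) = \log \prod_j P(N_j) = \sum_j \left[ \log \binom{N_j + a - 1}{N_j} + a \log \frac{a}{a + \mu(F_j, \beta)} + N_j \log \frac{\mu(F_j, \beta)}{a + \mu(F_j, \beta)} \right]$
$= \sum_j \left[ \log \binom{N_j + a - 1}{N_j} + a \log a + N_j \log \mu(F_j, \beta) - (N_j + a) \log\left(a + \mu(F_j, \beta)\right) \right]$
$N_j \log \mu(F_j, \beta) = N_j \beta^T F_j$ is affine in $\beta$, and $\log(a + \exp(z))$ is convex in $z$, so $-(N_j + a) \log(a + \exp(\beta^T F_j))$ is concave in $\beta$. A sum of concave terms is concave, so the log likelihood is concave in $\beta$ and the negative log likelihood is convex.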
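The concavity argument can be sanity-checked numerically: the negative log likelihood should satisfy the midpoint form of the convexity inequality for every pair of $\beta$ values. A minimal sketch, assuming $a = 2$ is held fixed and using hypothetical bins of (count, mappability, GC content); it uses the same $\mathrm{NegBin}(a, a/\mu_j)$ pmf as the expansion above.

```python
import math
import random

def mosaics_background_nll(beta, bins, a):
    """Negative log likelihood of bin counts under N_j ~ NegBin(a, a/mu_j),
    mu_j = exp(beta^T F_j), F_j = [1, M_j, GC_j]; bins holds (N_j, M_j, GC_j)."""
    total = 0.0
    for n, m, gc in bins:
        eta = beta[0] + beta[1] * m + beta[2] * gc          # beta^T F_j
        log_a_plus_mu = math.log(a + math.exp(eta))
        log_binom = math.lgamma(n + a) - math.lgamma(a) - math.lgamma(n + 1)
        # log P(N_j) = log C(N_j+a-1, N_j) + a log(a/(a+mu)) + N_j log(mu/(a+mu))
        log_p = (log_binom + a * (math.log(a) - log_a_plus_mu)
                 + n * (eta - log_a_plus_mu))
        total -= log_p
    return total

def count_midpoint_violations(f, dim, trials=500, seed=0):
    """Count random pairs (b1, b2) with f(midpoint) > mean of f(b1), f(b2)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        b1 = [rng.uniform(-2, 2) for _ in range(dim)]
        b2 = [rng.uniform(-2, 2) for _ in range(dim)]
        mid = [(u + v) / 2 for u, v in zip(b1, b2)]
        avg = (f(b1) + f(b2)) / 2
        if f(mid) > avg + 1e-9 * (1 + abs(avg)):
            violations += 1
    return violations

# Hypothetical 50-bp bins: (read count, mappability, GC content).
bins = [(0, 0.9, 0.40), (3, 1.0, 0.55), (1, 0.5, 0.35), (7, 1.0, 0.60), (2, 0.8, 0.45)]

def nll(beta):
    return mosaics_background_nll(beta, bins, a=2.0)

print(count_midpoint_violations(nll, dim=3))  # 0, as convexity predicts
```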

49 This lecture
- MEME optimization
- ChIP-seq peak calling
- Motivating problem: accounting for GC and mappability bias in peak calling
- Method: convex functions
- More ChIP-seq peak calling considerations
- Other functional genomics assays

50 The ChIP-seq protocol enriches for a particular fragment size (figure: chromatin; DNA fragments; ChIP, sonication; size selection; sequence; axis: fragment length)

51 Translate reads to the inferred center of the sequencing fragment (figure: correlation between strands as a function of strand shift)
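A sketch of the two steps on this slide: estimate the fragment length as the strand shift with the highest agreement between strands, then move each read's 5' end half a fragment length toward the fragment center. It assumes reads are represented by their 5' end coordinates on one chromosome and ignores off-by-one conventions; the overlap count in `best_strand_shift` is a crude stand-in for a true cross-correlation, and all names and data are hypothetical.

```python
def best_strand_shift(plus_starts, minus_starts, max_shift):
    """Return the shift d in [0, max_shift] maximizing the number of + strand
    5' ends p that have a - strand 5' end at p + d."""
    minus = set(minus_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(1 for p in plus_starts if p + d in minus)
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift

def fragment_center(five_prime, strand, fragment_length):
    """Shift a 5' end half a fragment length toward the fragment center:
    downstream for + strand reads, upstream for - strand reads."""
    half = fragment_length // 2
    return five_prime + half if strand == "+" else five_prime - half

plus = [1000, 2000, 3000, 5000]
minus = [1100, 2100, 3100, 5100, 7000]
frag = best_strand_shift(plus, minus, max_shift=300)
print(frag)                              # 100
print(fragment_center(1000, "+", frag))  # 1050
print(fragment_center(1100, "-", frag))  # 1050
```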

52 The phantom peak results from mappability islands (figure: unmappable and mappable regions; peak marked at the read length)
ChIP-seq measure of quality: relative strand correlation (RSC)
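A sketch of RSC computed from a strand cross-correlation profile, following the definition used in the ENCODE ChIP-seq guidelines as we understand it: $\mathrm{RSC} = (\mathrm{CC}_{\text{fragment}} - \min \mathrm{CC}) / (\mathrm{CC}_{\text{read length}} - \min \mathrm{CC})$. Values above 1 mean the true fragment-length peak is stronger than the phantom peak at the read length. The profile below is hypothetical.

```python
def relative_strand_correlation(cc, fragment_shift, read_length):
    """RSC = (CC(fragment length) - min CC) / (CC(read length) - min CC).
    `cc` maps strand shift (bp) to the cross-correlation between strands."""
    cc_min = min(cc.values())
    return (cc[fragment_shift] - cc_min) / (cc[read_length] - cc_min)

# Hypothetical profile: phantom peak at a 36-bp read length,
# true peak at a 100-bp fragment length.
profile = {0: 0.20, 36: 0.30, 100: 0.40, 150: 0.30, 300: 0.10}
print(relative_strand_correlation(profile, fragment_shift=100, read_length=36))  # about 1.5
```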

53 ChIP-seq controls
- Input: skip IP
- IgG: use irrelevant antibody
Reasons for controls:
- sonication bias
- CNVs
- sequence composition bias

54 Problem: Different background models result in wildly different false discovery rate estimates

55 Idea: Control reproducibility of peaks between biological replicates. Irreproducible discovery rate (IDR): expected fraction of peaks that are not reproducible between biological replicates.

56 IDR can handle varying quality levels

57 This lecture
- MEME optimization
- ChIP-seq peak calling
- Motivating problem: accounting for GC and mappability bias in peak calling
- Method: convex functions
- More ChIP-seq peak calling considerations
- Other functional genomics assays

58 ChIP-exo has better spatial resolution than ChIP-seq

59 DamID measures TF binding through a fusion protein
Dam+TF fusion protein; measure methylation at GATCs.
DamID vs. ChIP-seq:
- DamID can be easier: ChIP requires a (specific) antibody; DamID requires a fusion protein.
- DamID can't query post-translational modifications (histone mods).
- ChIP has better spatial resolution.
- ChIP is limited by cross-linking bias; DamID is limited by GATC content and Dam reactivity.
- ChIP has better temporal resolution: Dam acts over ~24 hours.
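Because Dam methylates adenines in GATC sites, DamID can only report on positions where a GATC exists, which is why its resolution depends on GATC content. A minimal sketch that locates GATC sites and the gaps between them; the sequence and helper names are hypothetical. GATC is its own reverse complement, so scanning one strand finds every site.

```python
def gatc_sites(seq):
    """0-based start positions of every GATC site (the Dam target motif)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]

def gaps_between_sites(sites):
    """Distances between consecutive GATC sites; long gaps are regions
    where DamID has no methylation target to measure."""
    return [b - a for a, b in zip(sites, sites[1:])]

seq = "AAGATCTTTTGATCAAAAAAAAGATC"
sites = gatc_sites(seq)
print(sites)                      # [2, 10, 22]
print(gaps_between_sites(sites))  # [8, 12]
```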

60 DNase-seq and ATAC-seq measure DNA accessibility

61 High-depth DNase-seq (DNase-DGF) measures TF binding

62 Paired-end DNase and ATAC-seq measure nucleosome architecture

63 Administrivia
- Homework 1 will be up later today. Due Thursday.
- Next week: broad (non-"peaky") functional genomics assays; chromatin architecture.
- Please write 1-minute responses.


More information

Lecture 5: Processes and Timescales: Rates for the fundamental processes 5.1

Lecture 5: Processes and Timescales: Rates for the fundamental processes 5.1 Lecture 5: Processes and Timescales: Rates for the fundamental processes 5.1 Reading Assignment for Lectures 5-6: Phillips, Kondev, Theriot (PKT), Chapter 3 Life is not static. Organisms as a whole are

More information

Differential expression analysis for sequencing count data. Simon Anders

Differential expression analysis for sequencing count data. Simon Anders Differential expression analysis for sequencing count data Simon Anders RNA-Seq Count data in HTS RNA-Seq Tag-Seq Gene 13CDNA73 A2BP1 A2M A4GALT AAAS AACS AADACL1 [...] ChIP-Seq Bar-Seq... GliNS1 4 19

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014 Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more

More information

g A n(a, g) n(a, ḡ) = n(a) n(a, g) n(a) B n(b, g) n(a, ḡ) = n(b) n(b, g) n(b) g A,B A, B 2 RNA-seq (D) RNA mrna [3] RNA 2. 2 NGS 2 A, B NGS n(

g A n(a, g) n(a, ḡ) = n(a) n(a, g) n(a) B n(b, g) n(a, ḡ) = n(b) n(b, g) n(b) g A,B A, B 2 RNA-seq (D) RNA mrna [3] RNA 2. 2 NGS 2 A, B NGS n( ,a) RNA-seq RNA-seq Cuffdiff, edger, DESeq Sese Jun,a) Abstract: Frequently used biological experiment technique for observing comprehensive gene expression has been changed from microarray using cdna

More information

FALL 2018 MATH 4211/6211 Optimization Homework 1

FALL 2018 MATH 4211/6211 Optimization Homework 1 FALL 2018 MATH 4211/6211 Optimization Homework 1 This homework assignment is open to textbook, reference books, slides, and online resources, excluding any direct solution to the problem (such as solution

More information

Statistical tests for differential expression in count data (1)

Statistical tests for differential expression in count data (1) Statistical tests for differential expression in count data (1) NBIC Advanced RNA-seq course 25-26 August 2011 Academic Medical Center, Amsterdam The analysis of a microarray experiment Pre-process image

More information

M155 Exam 2 Concept Review

M155 Exam 2 Concept Review M155 Exam 2 Concept Review Mark Blumstein DERIVATIVES Product Rule Used to take the derivative of a product of two functions u and v. u v + uv Quotient Rule Used to take a derivative of the quotient of

More information

Multivariate point process models

Multivariate point process models Faculty of Science Multivariate point process models Niels Richard Hansen Department of Mathematical Sciences January 8, 200 Slide /20 Ideas and outline General aim: To build and implement a flexible (non-parametric),

More information

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725 Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

4.9 APPROXIMATING DEFINITE INTEGRALS

4.9 APPROXIMATING DEFINITE INTEGRALS 4.9 Approximating Definite Integrals Contemporary Calculus 4.9 APPROXIMATING DEFINITE INTEGRALS The Fundamental Theorem of Calculus tells how to calculate the exact value of a definite integral IF the

More information

Calculus I Announcements

Calculus I Announcements Slide 1 Calculus I Announcements Read sections 3.9-3.10 Do all the homework for section 3.9 and problems 1,3,5,7 from section 3.10. The exam is in Thursday, October 22nd. The exam will cover sections 3.2-3.10,

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

Matrix-based pattern discovery algorithms

Matrix-based pattern discovery algorithms Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

More information

5/8/2012: Practice final A

5/8/2012: Practice final A Math 1A: introduction to functions and calculus Oliver Knill, Spring 2012 Problem 1) TF questions (20 points) No justifications are needed. 5/8/2012: Practice final A 1) T F The quantum exponential function

More information

MAT137 - Term 2, Week 4

MAT137 - Term 2, Week 4 MAT137 - Term 2, Week 4 Reminders: Your Problem Set 6 is due tomorrow at 3pm. Test 3 is next Friday, February 3, at 4pm. See the course website for details. Today we will: Talk more about substitution.

More information

10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers

10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How

More information

EECS 70 Discrete Mathematics and Probability Theory Fall 2015 Walrand/Rao Final

EECS 70 Discrete Mathematics and Probability Theory Fall 2015 Walrand/Rao Final EECS 70 Discrete Mathematics and Probability Theory Fall 2015 Walrand/Rao Final PRINT Your Name:, (last) SIGN Your Name: (first) PRINT Your Student ID: CIRCLE your exam room: 220 Hearst 230 Hearst 237

More information

Review of Optimization Methods

Review of Optimization Methods Review of Optimization Methods Prof. Manuela Pedio 20550 Quantitative Methods for Finance August 2018 Outline of the Course Lectures 1 and 2 (3 hours, in class): Linear and non-linear functions on Limits,

More information

Support for UCL Mathematics offer holders with the Sixth Term Examination Paper

Support for UCL Mathematics offer holders with the Sixth Term Examination Paper 1 Support for UCL Mathematics offer holders with the Sixth Term Examination Paper The Sixth Term Examination Paper (STEP) examination tests advanced mathematical thinking and problem solving. The examination

More information

Static Problem Set 2 Solutions

Static Problem Set 2 Solutions Static Problem Set Solutions Jonathan Kreamer July, 0 Question (i) Let g, h be two concave functions. Is f = g + h a concave function? Prove it. Yes. Proof: Consider any two points x, x and α [0, ]. Let

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

Clustering and Network

Clustering and Network Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in

More information

The Research Plan. Functional Genomics Research Stream. Transcription Factors. Tuning In Is A Good Idea

The Research Plan. Functional Genomics Research Stream. Transcription Factors. Tuning In Is A Good Idea Functional Genomics Research Stream The Research Plan Tuning In Is A Good Idea Research Meeting: March 23, 2010 The Road to Publication Transcription Factors Protein that binds specific DNA sequences controlling

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Polynomial Regression and Regularization

Polynomial Regression and Regularization Polynomial Regression and Regularization Administrivia o If you still haven t enrolled in Moodle yet Enrollment key on Piazza If joined course recently, email me to get added to Piazza o Homework 1 posted

More information

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

, Seventh Grade Math, Quarter 1

, Seventh Grade Math, Quarter 1 2017.18, Seventh Grade Math, Quarter 1 The following Practice Standards and Literacy Skills will be used throughout the course: Standards for Mathematical Practice Literacy Skills for Mathematical Proficiency

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information