Genome 541, Unit 4, lecture 2: Transcription factor binding using functional genomics
- Leon Tyler
Transcription
1 Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics
2 Slides vs chalk talk: I'm not sure why you chose a chalk talk over ppt. I prefer the latter: no issues with readability or time wasted writing things. Having slides can be useful for note-taking and going back afterward. Some of the equations were hard to read, so difficult to copy in notes. It would be really helpful to have more slides. It's difficult to read the writing on the board, and clearer diagrams would be quite helpful. Slides would have been helpful for equations and charts. Very clear. But better with slides.
Content, organization, pace: The lecture story might be easier if there was an outline left on the board. More details on MEME (what the exact likelihood calculation is), perhaps instead of FWER, which most of us have seen. Question: where did you learn about FWER? Clear and interesting, good pace. The material was too densely presented to digest if I hadn't seen this material before.
3 False discovery rate vs. false positive rate. TP: true positives. TN: true negatives. FP: false positives. False discovery rate: FP / (FP + TP). False positive rate: FP / (FP + TN).
4 Statistical power Statistical power: The probability you reject the null hypothesis if the alternative hypothesis is true (at a given p-value threshold). We want to maximize power at a given significance level.
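To make the definition of power concrete, here is a minimal sketch for a one-sided test on a Poisson read count. The rates, the significance level, and the helper names are illustrative assumptions, not values from the lecture.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

def critical_count(null_rate, alpha):
    """Smallest count k whose upper-tail p-value under the null is <= alpha."""
    k = 0
    while poisson_upper_tail(k, null_rate) > alpha:
        k += 1
    return k

def power(null_rate, alt_rate, alpha):
    """Probability of rejecting the null when the alternative rate is true."""
    return poisson_upper_tail(critical_count(null_rate, alpha), alt_rate)

# Hypothetical rates: background of 5 reads per bin, bound sites at 8 or 15.
weak_power = power(null_rate=5.0, alt_rate=8.0, alpha=0.01)
strong_power = power(null_rate=5.0, alt_rate=15.0, alpha=0.01)
```

At a fixed significance level, the stronger the enrichment over background, the higher the power.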
5 Accuracy-related terms cheat-sheet (formula: names)
TP: True positives
TN: True negatives
FP: False positives; Type I errors
FN: False negatives; Type II errors
TP / (TP + FN): Recall; Sensitivity; True positive rate; Power; 1 - False negative rate
TP / (TP + FP): Precision; Positive predictive value; 1 - False discovery rate
TN / (TN + FP): Specificity; True negative rate; 1 - False positive rate; 1 - Statistical significance level
(TP + TN) / (TP + TN + FP + FN): Accuracy
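The formulas on the cheat-sheet can be checked with a short sketch. The function name and the counts below are made up for illustration.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy-related rates computed from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "false_discovery_rate": fp / (fp + tp),
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical peak-calling outcome: 100 called peaks, 20 of them wrong.
metrics = confusion_metrics(tp=80, tn=900, fp=20, fn=20)
```

With these counts the false discovery rate is 0.2 while the false positive rate is only about 0.02, because the true negatives dominate the denominator. This is why the two rates should not be confused when most of the genome is unbound.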
6 Chromatin immunoprecipitation followed by sequencing (ChIP-seq) Sequence and map to reference genome
7 Problem: Given a ChIP-seq experiment for factor X, where does X bind?
8 Problem: Given a ChIP-seq experiment for factor X, where does X bind? Short answer: Stack up the reads in the genome; choose the tall stacks. Issues to consider:
- Sequencing fragment lengths
- Sequencing read lengths
- Mappability
- GC bias
- How to pick a threshold and assign statistical confidence?
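The "stack up the reads; choose the tall stacks" idea can be sketched as a naive pileup followed by a fixed depth threshold. This is only a toy illustration that ignores every issue listed above; the function names, read positions, and threshold are assumptions.

```python
def coverage(read_starts, read_length, genome_length):
    """Per-base read depth from 0-based read start positions."""
    diff = [0] * (genome_length + 1)
    for start in read_starts:
        end = min(start + read_length, genome_length)
        diff[start] += 1   # read begins covering here
        diff[end] -= 1     # read stops covering here
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth

def call_peaks(depth, threshold):
    """Half-open (start, end) intervals where depth >= threshold."""
    peaks, start = [], None
    for pos, d in enumerate(depth):
        if d >= threshold and start is None:
            start = pos
        elif d < threshold and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(depth)))
    return peaks

# Hypothetical reads: a cluster near position 12 and one stray read at 40.
reads = [10, 11, 12, 12, 13, 40]
depth = coverage(reads, read_length=5, genome_length=60)
peaks = call_peaks(depth, threshold=3)
```

The hard-coded threshold is the weak point: the rest of the lecture is about replacing it with a background model and a statistical confidence.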
9 Today's class: Accounting for GC and mappability bias in peak calling. Convex functions. More ChIP-seq peak calling considerations: handling sequencing fragment sizes; choosing a p-value threshold; control experiments. Other functional genomics assays.
10 ChIP-seq read counts are biased by GC content and mappability
11 How can we model the background distribution of ChIP-seq reads?
12 The Poisson distribution models sequencing read counts: $P(k \mid \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$. Mean: $\lambda$. Variance: $\lambda$.
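A minimal sketch of the Poisson model, checking numerically that the mean and variance both equal the rate. The rate of 4 reads per bin is an arbitrary assumption.

```python
import math

def poisson_pmf(k, lam):
    """P(k | lam) = lam^k e^(-lam) / k!, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 4.0                      # hypothetical background rate per bin
support = range(200)           # the tail beyond 200 is negligible for lam = 4
total = sum(poisson_pmf(k, lam) for k in support)
mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)
```

Because one parameter fixes both moments, a Poisson background cannot represent counts whose variance exceeds their mean.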
13 The negative binomial distribution models sequencing read counts more flexibly than Poisson: $P(k \mid r, p) = \binom{k + r - 1}{k} (1 - p)^r p^k$. Mean: $\frac{pr}{1 - p}$. Variance: $\frac{pr}{(1 - p)^2}$.
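The mean and variance formulas can be verified numerically from the probability mass function in the slide's (r, p) parametrization. The parameter values and the truncation point are assumptions chosen for illustration.

```python
import math

def negbin_pmf(k, r, p):
    """P(k | r, p) = C(k + r - 1, k) (1 - p)^r p^k, computed in log space."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(1 - p) + k * math.log(p))

def negbin_moments(r, p, k_max=2000):
    """Mean and variance by direct summation over k = 0..k_max-1."""
    probs = [negbin_pmf(k, r, p) for k in range(k_max)]
    mean = sum(k * q for k, q in enumerate(probs))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, variance

mean, variance = negbin_moments(r=3.0, p=0.5)
```

Here the variance is the mean divided by (1 - p), so it always exceeds the mean. That extra freedom is what the next slide refers to as modeling the mean and variance separately.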
14 The negative binomial distribution models the mean and variance separately Principle: Make the weakest assumptions you can afford to in your modeling choices
15 MOSAiCS background model. Number of counts at 50-bp bin j: $N_j \sim \mathrm{NegBin}(a, a/\mu_j)$, with $\mu_j = \exp(\beta_0 + \beta_M M_j + \beta_{GC} GC_j)$, where $M_j$ is the mappability at bin j and $GC_j$ is the GC content at bin j.
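As a small illustration of the log-linear mean on this slide, the sketch below evaluates $\mu_j$ for a few bins. The coefficient values and bin features are invented, and the function is not part of the MOSAiCS software.

```python
import math

def background_mean(beta0, beta_m, beta_gc, mappability, gc):
    """mu_j = exp(beta0 + beta_M * M_j + beta_GC * GC_j), as on the slide."""
    return math.exp(beta0 + beta_m * mappability + beta_gc * gc)

# Hypothetical coefficients and (mappability, GC content) per 50-bp bin.
beta = (0.5, 1.2, 2.0)
bins = [(0.2, 0.35), (0.9, 0.45), (1.0, 0.60)]
expected_counts = [background_mean(*beta, m, g) for m, g in bins]
```

With positive coefficients, bins that are more mappable and more GC-rich get a higher expected background count, so a given read count there is less surprising.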
16 How do we know if the MOSAiCS model is even optimizable?
17 Answer: The negative log likelihood is convex The class of convex functions (defined on the next slide) roughly corresponds to the set of efficiently optimizable functions. When presented with an objective function, the first step is usually to check if it is convex.
18 Convex functions. A function f(x) is convex if it satisfies the property: $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ for all x, y, and $0 \le \theta \le 1$. Every local minimum of a convex function is a global minimum.
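The definition can be probed numerically by sampling random points and mixing weights. This is a sketch with assumed names and tolerances; a sampled check can refute convexity but cannot prove it.

```python
import math
import random

def violates_convexity(f, lo, hi, trials=10000, tol=1e-9, seed=0):
    """True if some sampled (x, y, theta) breaks the convexity inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        mixed = f(theta * x + (1 - theta) * y)
        chord = theta * f(x) + (1 - theta) * f(y)
        if mixed > chord + tol:
            return True
    return False

square_fails = violates_convexity(lambda x: x * x, -5.0, 5.0)   # convex
sine_fails = violates_convexity(math.sin, 0.0, 6.0)             # not convex
```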
19 Concave functions A function f(x) is concave if -f(x) is convex. Convex functions are usually efficiently minimizable. Concave functions are usually efficiently maximizable. (A function can be neither convex nor concave.)
20 Examples on one variable Stanford EE364a
21 Examples on one variable Convex: Non-convex: Duke stat376
22 Examples on vectors $x \in \mathbb{R}^n$. Affine function (convex and concave): $a^T x + b$, with $a \in \mathbb{R}^n$, $b \in \mathbb{R}$. Euclidean norm (convex): $\sqrt{\sum_i x_i^2}$.
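23 First derivative criterion for convexity. Derivative of a function on vectors: $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right]^T$. A differentiable f(x) is convex if and only if $f(y) \ge f(x) + \nabla f(x)^T (y - x)$ for all x, y. (Stanford EE364a)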
24 Second derivative criterion for convexity. If f(x) is on one variable, f(x) is convex if and only if $\frac{d^2 f}{dx^2} \ge 0$ everywhere. (Stanford EE364a)
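25 Second derivative criterion for convexity. Derivative of a function on multiple variables (the Hessian): $\left[\nabla^2 f(x)\right]_{ik} = \frac{\partial^2 f}{\partial x_i \partial x_k}$. f(x) is convex if and only if $\nabla^2 f(x) \succeq 0$, i.e. if $\nabla^2 f(x)$ is positive semi-definite: $v^T \nabla^2 f(x) v \ge 0$ for all $v \in \mathbb{R}^n$. (Stanford EE364a)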
26 Examples Convex if P is positive semi-definite. Stanford EE364a
27 A non-convex function: $f(x_1, x_2) = x_1 x_2$. Its Hessian is $\nabla^2 f = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, which has eigenvalues $+1$ and $-1$. Not positive semi-definite.
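The second-derivative criterion for this example can be checked by computing the Hessian's eigenvalues. The sketch below uses the closed form for a symmetric 2x2 matrix; the helper names are assumptions.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    center = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return center - radius, center + radius

def is_positive_semidefinite(a, b, d, tol=1e-12):
    """A symmetric matrix is PSD when its smallest eigenvalue is >= 0."""
    return symmetric_2x2_eigenvalues(a, b, d)[0] >= -tol

# Hessian of f(x1, x2) = x1 * x2 is [[0, 1], [1, 0]].
product_eigs = symmetric_2x2_eigenvalues(0.0, 1.0, 0.0)
product_is_psd = is_positive_semidefinite(0.0, 1.0, 0.0)
# Hessian of g(x1, x2) = x1^2 + x2^2 is [[2, 0], [0, 2]].
bowl_is_psd = is_positive_semidefinite(2.0, 0.0, 2.0)
```

The negative eigenvalue corresponds to the direction v = (1, -1), along which the product curves downward.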
28 Practical ways to establish convexity of a function Verify the definition. Verify that the second derivative is always positive semidefinite. Show that the function can be obtained from simple convex functions by operations that maintain convexity.
29 Some operations that maintain convexity Stanford EE364a
30 The MOSAiCS objective is concave in β. Define $\beta = [\beta_0\ \beta_M\ \beta_{GC}]^T$ and $F_j = [1\ M_j\ GC_j]^T$, so that $\mu(F_j, \beta) = \exp(\beta^T F_j) = \exp(\beta_0 + \beta_M M_j + \beta_{GC} GC_j)$. The log likelihood is $\log P(N) = \log \prod_j P(N_j) = \sum_j \log P(N_j)$. Expanding each negative binomial term gives a sum over bins of a log binomial coefficient that does not depend on β, terms of the form $\log(1 - 1/\exp(x))$, and terms $N_j \log \mu(F_j, \beta) = N_j \beta^T F_j$, which are affine in β.
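Concavity in β can be probed numerically with a midpoint test. The sketch below is an assumption-heavy illustration: it reads NegBin(a, a/µ_j) as the gamma-Poisson form with size a and mean µ_j, which may differ from the exact expression on the slide, and the bin data and function names are invented.

```python
import math
import random

def log_likelihood(beta, bins, a):
    """Sum over bins of log NB(N_j; size a, mean mu_j), mu_j = exp(beta . F_j).

    Assumed pmf: C(N + a - 1, N) * (a / (a + mu))^a * (mu / (a + mu))^N.
    """
    total = 0.0
    for count, mappability, gc in bins:
        x = beta[0] + beta[1] * mappability + beta[2] * gc   # log mu_j, affine in beta
        log_coef = math.lgamma(count + a) - math.lgamma(a) - math.lgamma(count + 1)
        total += (log_coef + a * math.log(a) + count * x
                  - (a + count) * math.log(a + math.exp(x)))
    return total

def midpoint_gap(beta1, beta2, bins, a):
    """f(midpoint) minus the average of f at the endpoints; >= 0 if concave."""
    mid = [(u + v) / 2 for u, v in zip(beta1, beta2)]
    ends = 0.5 * (log_likelihood(beta1, bins, a) + log_likelihood(beta2, bins, a))
    return log_likelihood(mid, bins, a) - ends

# Hypothetical bins: (read count, mappability, GC content).
bins = [(3, 0.9, 0.40), (0, 0.2, 0.35), (7, 1.0, 0.55), (2, 0.6, 0.50)]
rng = random.Random(1)
gaps = []
for _ in range(500):
    b1 = [rng.uniform(-2, 2) for _ in range(3)]
    b2 = [rng.uniform(-2, 2) for _ in range(3)]
    gaps.append(midpoint_gap(b1, b2, bins, a=2.0))
```

Under this parametrization each bin contributes an affine term plus a negated log-sum-exp term, so every sampled gap should be non-negative.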
33 What if your function is not convex? Is there a monotonic transform that makes it convex? Example: $\prod_i x_i$ is neither convex nor concave, but $\log \prod_i x_i = \sum_i \log x_i$ is concave. Next best thing: split the function into convex parts. Example: $f(x_1, x_2) = x_1 x_2$ is convex in either $x_1$ or $x_2$ but not both at once. Optimize each in turn. Example: EM.
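The "optimize each in turn" idea can be sketched on a toy objective that is convex in each variable separately but not jointly. The objective, its constants, and the function names are invented for illustration; this shows the alternating pattern only and is not an implementation of EM.

```python
def objective(x1, x2, target=2.0, ridge=0.1):
    """(x1*x2 - target)^2 + ridge*(x1^2 + x2^2): a convex quadratic in each variable alone."""
    return (x1 * x2 - target) ** 2 + ridge * (x1 ** 2 + x2 ** 2)

def best_coordinate(other, target=2.0, ridge=0.1):
    """Exact minimizer over one variable with the other held fixed.

    Setting the derivative 2*(x*other - target)*other + 2*ridge*x to zero
    gives x = target*other / (other^2 + ridge).
    """
    return target * other / (other ** 2 + ridge)

def alternate(x1, x2, steps=50):
    """Alternate exact coordinate updates, recording the objective each time."""
    history = [objective(x1, x2)]
    for _ in range(steps):
        x1 = best_coordinate(x2)
        history.append(objective(x1, x2))
        x2 = best_coordinate(x1)
        history.append(objective(x1, x2))
    return x1, x2, history

x1, x2, history = alternate(1.0, 0.5)
```

Each update solves a convex one-dimensional problem exactly, so the objective never increases, although the procedure is only guaranteed to reach a stationary point rather than the global optimum.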
35 Optimizing convex objectives There are general convex optimization software packages. Even when these are too slow, convex functions usually admit fast optimization specific to your problem. More on convex optimization next class.
36 How well does it work?
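37 MOSAiCS enrichment model. Non-bound positions: $N_j \sim \mathrm{NegBin}(a, a/\mu_j)$. Bound positions: $N_j \sim \mathrm{NegBin}(a, a/\mu_j) + \mathrm{NegBin}(b, c)$. [Figure: graphical model with nodes "Bound?", "Mappability", "GC content", "Tag counts"]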
38 MOSAiCS peak calls are more enriched for motifs than those of a competing method
39 MOSAiCS calls many more peaks in low mappability and GC regions than other methods
40 Today's class: Accounting for GC and mappability bias in peak calling. Convex functions. More ChIP-seq peak calling considerations: handling sequencing fragment sizes; choosing a p-value threshold; control experiments. Other functional genomics assays.
41 The ChIP-seq protocol enriches for a particular fragment size. [Figure labels: chromatin; sonication, ChIP; DNA fragments; size selection; sequence; fragment length]
42 Translate reads to the inferred center of the sequencing fragment. [Figure: correlation between strands plotted against strand shift]
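One way to infer the fragment length is to find the strand shift at which plus-strand and minus-strand read positions line up best. The sketch below uses a raw match count rather than a normalized correlation, and the read positions and function names are made up for illustration.

```python
def strand_match_counts(plus_starts, minus_ends, max_shift):
    """For each shift, count plus-strand 5' ends that land on a minus-strand 5' end."""
    minus = {}
    for pos in minus_ends:
        minus[pos] = minus.get(pos, 0) + 1
    return {
        shift: sum(minus.get(p + shift, 0) for p in plus_starts)
        for shift in range(max_shift + 1)
    }

def estimate_fragment_length(plus_starts, minus_ends, max_shift):
    """Shift with the most matches, taken as the fragment length estimate."""
    scores = strand_match_counts(plus_starts, minus_ends, max_shift)
    return max(scores, key=scores.get)

# Hypothetical reads from fragments that are all 150 bp long.
plus_starts = [100, 105, 110, 500]
minus_ends = [250, 255, 260, 650]
fragment_length = estimate_fragment_length(plus_starts, minus_ends, max_shift=300)

# Translate each read to the inferred fragment center.
half = fragment_length // 2
centers = sorted([p + half for p in plus_starts] + [m - half for m in minus_ends])
```

After the translation, reads from both strands of the same fragment pile up at the same position.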
43 The phantom peak results from mappability islands. ChIP-seq measure of quality: relative strand correlation (RSC). [Figure labels: unmappable; mappable; read length]
44 Today's class: Accounting for GC and mappability bias in peak calling. Convex functions. More ChIP-seq peak calling considerations: handling sequencing fragment sizes; control experiments; choosing a p-value threshold. Other functional genomics assays.
45 ChIP-seq controls. Input: skip the IP. IgG: use an irrelevant antibody. Reasons for controls:
- sonication bias
- CNVs
- sequence composition bias
46 Today's class: Accounting for GC and mappability bias in peak calling. Convex functions. More ChIP-seq peak calling considerations: handling sequencing fragment sizes; control experiments; choosing a p-value threshold. Other functional genomics assays.
47 Problem: Different background models result in wildly different false discovery-rate estimates
48 Idea: Control reproducibility of peaks between biological replicates Irreproducible discovery rate (IDR): Expected fraction of peaks that are not reproducible between biological replicates.
49 IDR can handle varying quality levels
50 Today's class: Accounting for GC and mappability bias in peak calling. Convex functions. More ChIP-seq peak calling considerations: handling sequencing fragment sizes; control experiments; choosing a p-value threshold. Other functional genomics assays.
51 ChIP-exo has better spatial resolution than ChIP-seq
52 DamID measures TF binding through a fusion protein. Dam+TF fusion protein; measure methylation at GATCs. DamID vs. ChIP-seq:
- DamID can be easier: ChIP requires a (specific) antibody; DamID requires a fusion protein.
- DamID can't query post-translational modifications (histone mods).
- ChIP has better spatial resolution.
- ChIP is limited by cross-linking bias; DamID is limited by GATC content and Dam reactivity.
- ChIP has better temporal resolution: Dam acts over ~24 hours.
53 DNase-seq and ATAC-seq measure DNA accessibility
54 High-depth DNase-seq (DNase-DGF) measures TF binding
55 Paired-end DNase and ATAC-seq measure nucleosome architecture
56 Administrivia. Homework 6 due tomorrow. Homework 7 is up online, due Friday. Next week: Broad (non-"peak-y") functional genomics assays; chromatin architecture. Please write 1-minute responses.
More informationf (1 0.5)/n Z =
Math 466/566 - Homework 4. We want to test a hypothesis involving a population proportion. The unknown population proportion is p. The null hypothesis is p = / and the alternative hypothesis is p > /.
More informationLecture 13 Fundamentals of Bayesian Inference
Lecture 13 Fundamentals of Bayesian Inference Dennis Sun Stats 253 August 11, 2014 Outline of Lecture 1 Bayesian Models 2 Modeling Correlations Using Bayes 3 The Universal Algorithm 4 BUGS 5 Wrapping Up
More informationMean Vector Inferences
Mean Vector Inferences Lecture 5 September 21, 2005 Multivariate Analysis Lecture #5-9/21/2005 Slide 1 of 34 Today s Lecture Inferences about a Mean Vector (Chapter 5). Univariate versions of mean vector
More informationLecture 10: Generalized likelihood ratio test
Stat 200: Introduction to Statistical Inference Autumn 2018/19 Lecture 10: Generalized likelihood ratio test Lecturer: Art B. Owen October 25 Disclaimer: These notes have not been subjected to the usual
More informationProbability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!
Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive
More informationAnnouncements Monday, September 18
Announcements Monday, September 18 WeBWorK 1.4, 1.5 are due on Wednesday at 11:59pm. The first midterm is on this Friday, September 22. Midterms happen during recitation. The exam covers through 1.5. About
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationLecture 3. Big-O notation, more recurrences!!
Lecture 3 Big-O notation, more recurrences!! Announcements! HW1 is posted! (Due Friday) See Piazza for a list of HW clarifications First recitation section was this morning, there s another tomorrow (same
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationConstraint-based Subspace Clustering
Constraint-based Subspace Clustering Elisa Fromont 1, Adriana Prado 2 and Céline Robardet 1 1 Université de Lyon, France 2 Universiteit Antwerpen, Belgium Thursday, April 30 Traditional Clustering Partitions
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationMAT01B1: Maximum and Minimum Values
MAT01B1: Maximum and Minimum Values Dr Craig 14 August 2018 My details: acraig@uj.ac.za Consulting hours: Monday 14h40 15h25 Thursday 11h20 12h55 Friday 11h20 12h55 Office C-Ring 508 https://andrewcraigmaths.wordpress.com/
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationEconomics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,
Economics 520 Lecture Note 9: Hypothesis Testing via the Neyman-Pearson Lemma CB 8., 8.3.-8.3.3 Uniformly Most Powerful Tests and the Neyman-Pearson Lemma Let s return to the hypothesis testing problem
More informationTechnologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA
Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Expression analysis for RNA-seq data Ewa Szczurek Instytut Informatyki Uniwersytet Warszawski 1/35 The problem
More informationNaïve Bayes, Maxent and Neural Models
Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words
More informationAlignment. Peak Detection
ChIP seq ChIP Seq Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008 ChIP Seq Analysis Alignment Peak Detection Annotation Visualization Sequence Analysis Motif Analysis Alignment ELAND Bowtie
More information