Optimal design of microarray experiments
|
|
- Gilbert Douglas
- 6 years ago
- Views:
Transcription
1 University of Groningen ernst 7 June 2011
2 What is a cdna Microarray Experiment? GREEN (Cy3) Cancer Tissue mrna Mix tissues in equal amounts G G G R R R R R G R G G Hybridisation RED (Cy5) Normal Tissue mrna cdna microarray measures the relative abundance of RNA in two organic tissues. First, cdna labelled with G & R dyes is obtained from 2 conditions and mixed in equal proportions Second, the mixture is hybridized with the arrayed DNA (probes) Gene 1 Gene 2 Gene 3 Gene 4 Gene 5 Enlargement Third, competitive hybridization! Probe catches the matched cdna R g3 R R R g3 R R Gene 3 matches with normal tissue mrna -> high red signal Finally, G & R read-out for each spot (gene) by scanning
3 Why DOING a microarray experiment? Obvious, isn t it? To discover genes that are differentially expressed between tissues. a typical genomic profile of a particular disease. differences between phenotypic similar pathologies i.e., to find some differences.
4 Why DESIGNING a microarray experiment? Yeah, why? We want to be sure that the the differences we find are due to biological differences (e.g. different tissue, different diseases); not due to other experimental conditions (e.g. microarray batch, print-pin anomalies, etc.); not due to chance variation. Our exp t should maximize difference between biological conditions.
5 4 known design principles 1. Vary the conditions of interest, with replicates that are as many as possible; biological rather than technical replicates; as evenly distributed across the conditions of interest. 2. Restriction: Keep conditions not of interest constant. 3. Blocking: Keep track of uninteresting conditions that do vary. 4. Matching: Channels are similar units for effective comparisons.
6 Sources of Variation Microarrays are known as noisy experiments. Many artifacts are the result of the experimental conditions: microarray print batches lab experimenter hybridization mixture preparation hybridization conditions We follow an experiment in detail to determine sources of variation.
7 1. Start of microarray experiment Organic material is typically stored frozen to prevent decay. Preparing hybridization mixture takes 1-2 hours. Preparation of mixture is intensive, hands-on process. Scientist decides to prepare 2 of 20 arrays in parallel. 2 arrays = 4 channels. 4 separate tubes are prepared for the 4 channels. The following description applies to each of the 4 tubes.
8 2. Obtaining RNA Several plants of a particular condition are selected. Scientist uses pestle and mortar to grind it down to fine mush. Organic material is full of impurities (proteins): A membrane is used to fish out the RNA. Spectrophotometry to detect: RNA > 2 Clean Proteins Remaining mixture is concentrated using a centrifuge twice. At the end, 100 µg RNA is obtained.
9 3A. Obtaining stable cdna RNA is very unstable. RNA polymerase enzyme copies RNA into cdna, after sequences of Ts (poly-t primer) is added. quantity of 25µl of each nucleotide (A,T,G,C) is added magnesium and salt buffer is added.
10 3B. Preventing RNA folding RNA is heated up to 65 C to prevent it from folding onto itself.
11 3C. Obtaining stable, labelled cdna RNA can be made visible by copying a dye into the cdna. One adds C nucleotides with a dye molecule (Cy3 or Cy5) attached. The reverse transcriptase enzyme is added for an hour.
12 4. Preparing cdna mixture for hybridization Loose A, C, T, G bases are filtered out of cdna mixture. cdna mixture is dried down in centrifuge Original liquid replaced by special liquid ( hybridization buffer. ) Scientist combines 2 tubes together in equal quantities. (This results in two hybridization mixtures for two arrays)
13 5. Slide hybridization The microarray is washed with SSC and SDS. cdna mixture put on the slide & coverslip put on top in order to spread cdna mixture evenly get rid of air bubbles Slide is left in hybridization chamber (dark) at 42 C for hours.
14 6. Post-processing slide All non-hybridized (labelled) cdna is washed from the slide. Finally, the microarray is dried and is ready to be scanned.
15 Sources of variation in a microarray expt... Sources of variation in the hybridization process Different dye sensitivities. Inequalities in the application of mrna to the slide. Variations in the washing efficiencies of non-hybridized mrna off the slide. Other differences in hybridization parameters, such as: temperature, experimenter, time of the day,... Sources of variation in the mrna Differences in conditions. Differences between experimental subjects within the same covariate level. Differences between samples from the same subject. Variation in mrna extraction methods from original sample. Variations in reverse transcription. Differences in PCR amplification. Different labelling efficiencies. Sources of variation in the microarray production Print-pin anomalies Variations in printed probe quantities even with the same pin Chip batch variation (due to many sources of unknown variations) Differences in sequence length of the immobilized DNA Variations in chemical probe attachment levels to the slide Sources of variation in the scanning Different scanners; Different photo-multipliers or gain. Different spot-finding software; Different grid alignments.
16 A statistical toy model for gene expression log y gcbaxdpkr = θ gc +B b +A ba +L a (x)+d ad (θ gc )+P p +S ag +ɛ gck +η gckr. where θ gc = true expression for gene g under condition c B b = RNA Batch effect (experimenter, time of day, temperature) A ba = array effect (scanning level, pre/post-washing) L a (x) = location effect (chip, coverslip, washing) D ad (θ gc ) = dye effect (dye, unequal mixing of mixtures, labelling, intensity) P p = print pin effect S ag = spot effect (amount of DNA in the spot printed on slide) ɛ gck = biological variation for individual k η gckr = within-replicate-k variation of a technical replicate r
17 Normalization The proposed model: log y gcbaxdpkr = θ gc +B b +A ba +L a (x)+d ad (θ gc )+P p +S ag +ɛ gck +η gckr. It is hoped that the structural artifacts, B b, A ba, L a (x), D ad (θ gc ), P p can be eliminated by means of normalization. This leaves us a model for gene g at condition c on array a for the r th technical replicate of individual k, log y gcakr = θ gc + S ag + ɛ gck + η gcakr, where η gcakr effects. includes non-structural artifacts from nuisance
18 Five questions (one we answer four you will) From the model: Cy3: log x ikr = θ i + S + ɛ ik + η ikr Cy5: log x jkr = θ j + S + ɛ jk + η jkr, We define 5 microarray design problems: 1. How to estimate differential expression δ ij = θ i θ j? 2. How to deal with biological variation ɛ ik if we have too few replicates?... if we have too many replicates? 3. What if the θ possess some factorial structure? 4. What if we are interested in some functional form θ = θ(t, ϕ)?
19 Example: two different designs with 5 microarrays Question: Which design is more efficient in estimating the contrasts? Reference Design Loop Design R Note: each arrow stands for 1 microarray (green dye red dye).
20 Developing our intuition... The principal quantity of interest in microarray experiments is: the level of differential expression across the conditions δ ij Example: Let δ 13 = true log-difference in expression between 1 and 3. How good can we estimate δ 13 in both designs, assuming that each observation y has variance σ 2? NOTE: Variance σ 2 = σ 2 b + σ2 t consist both of biological and technical variation.
21 An estimate ˆδ 13 of δ 13 in a Reference Design Let y i = observed log-difference on microarray i. Reference Design 1 y 1 y 3 = (log C 1 log Ref) (log C 3 log Ref) = ˆδ 13, 5 4 R 3 2 so ) V (ˆδ13 = V (y 1 ) + V (y 2 ) = 2σ 2, where σ 2 is variance of single log-ratio V (y i ).
22 So, the loop design has a more precise Optimal estimate design of microarray of theexperiments log-ratio δ. An estimate ˆδ 13 of δ 13 in a Loop Design Let y i = observed log-difference on microarray i 5 Loop Design 1 2 y 1 y 2 = log C 1 C 2 log C 2 C 3 = ˆδ (1) 13 y 4 + y 3 y 5 = log C 1 C 5 log C 5 C 4 log C 4 C 3 = ˆδ (2) ˆδ 13 = 0.6 ˆδ (1) ˆδ (2) 13, so ) V (ˆδ13 = σ σ 2 = 1.2σ 2, where σ 2 is the variance of a single log-ratio
23 Two major questions Are there better designs than loop designs? = Sometimes Are loop designs always more efficient than reference designs? = Not always So how can we find optimal designs?
24 Some results from the literature... These are optimal even designs for all the contrasts when the number of arrays is two more than the number of conditions. (Kerr and Churchill 2001; exhaustive search) The problem with these and similar results not of practical interest. no indication how this result can be extended.
25 Schematic representation of a microarray exp t Loop Design y 1 y 2 y 3 y 4 y 5 = δ 21 δ 31 δ 41 δ 51 + ɛ Reference Design R 3 2 y 1 y 2 y 3 y 4 y 5 = δ 1r δ 2r δ 3r δ 4r δ 5r + ɛ
26 Outline of Near Optimal Design Algorithm Wit, Nobile and Khanin (2005) proposed the following algorithm: 1. Formalize design notation X, y = X δ + ɛ 2. Estimate the parameters of interest for design X, ˆδ ij X = (X t X ) 1 X t y, i > j 3. Summarize efficiency of parameter estimates, e.g. average V (ˆδ ij X ) 4. Search design X that attains highest efficiency, X = arg min X average V (ˆδ X ij ).
27 Design Optimality (1) Two ways to translate covariance matrix into univariate summary: A-optimality = minimizing Trace [ (X t X ) 1]. Minimizes the sum of the variances of the parameter estimates. D-optimality = minimizing Det [ (X t X ) 1]. Minimizes the generalized variance of the parameter estimates. PROBLEM: A-optimality depends on parametrization.
28 Design Optimality (2) The parameters of interest are ALL CONTRASTS δ = (δ ij ) i<j. Note: δ ij = δ i1 δ j1, so therefore: There is a matrix C, such that δ = C δ 21 δ 31 δ 41 δ 51 L-optimality: minimize Trace[C(X t X ) 1 C t ]. minimize average variance of all contrasts. This is equivalent to A-optimality with j δ j1 = 0 constraint.
29 Simulated Annealing Kirkpatrick et al. (1983) developed simple algorithm to minimize f : 1. Let X be the current state 2. Propose a new state X 3. Accept the new proposal with probability: { ( ) } f (X ) 1/T min 1, f (X, ) where T is the current temperature. 4. Slightly lower the temperature and return to 1. This procedure is valid if the proposals are symmetric; This procedure reaches the optimum if the chain is irreducible.
30 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.
31 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.
32 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.
33 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.
34 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.
35 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.
36 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.
37 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.
38 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.
39 Loop designs are often A-optimal Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Acceptance rate = 36.4 %. Score := Tr[ Inv(X X) ] = 28. Nr of iterations = Acceptance rate = 37.1 %. Score := Tr[ Inv(X X) ] = 42. Nr of iterations = Nr of arrays = 7. Nr of conditions = 7 Nr of arrays = 8. Nr of conditions = 8 A-optimal designs for 7/8 conditions with 7/8 slides, respectively.
40 ...but not always Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Acceptance rate = 36.3 %. Score := Tr[ Inv(X X) ] = Nr of iterations = Nr of arrays = 9. Nr of conditions = 9 A-optimal design with 9 conditions and 9 microarrays.
41 In practice: cyclic designs An interwoven loop design is defined by: number of conditions (HERE: 15) number of loops (HERE: 3) jump size for each loop (HERE: 1, 4, 6) However, cyclic designs are typically not A-optimal for p 17.
42 Advantages of an Interwoven Loop design A Optimality Score for Contrasts: Tr[ Inv(X X) ] = No of arrays = 90. No of conditions = 30 Easy to implement Compare left with right! Highly efficient Left is only 0.6% more efficient than Right Automatic dye balance conditions measured equally in Cy3/Cy5
43 Use od: optimal (interwoven loop) designs in R The function od(nt, ns, optimality =..., method = "loop") finds L-optimal/D-optimal interwoven loop/all designs for any number of conditions and any number of slides Can be obtained from smida library at: wite/book Some conclusions...
44 Dye Swap or Dye Balance? The simple model needs to be expanded for a possible dye effect: log y 1kr = θ 1 + S + +ɛ 1k + η 1kr log y 2kr = θ 2 + S + D + ɛ 2k + η 2kr, Dye swap designs have been proposed as way to control dye effect. HOWEVER, interwoven loop designs are more efficient! D A 5 cond. 10 arrays 89% 80% 6 cond. 12 arrays 87% 75% 7 cond. 14 arrays 85% 69% 8 cond. 16 arrays 82% 62% Relative D- and A-efficiency of dye-swap versus interwoven loop.
45 What other 4 design questions can be raised? 1. Too little biological material: technical replicates. 2. Too many biological material: pooling. 3. Factorial designs 4. Kinetic models We briefly describe these problems in what follows. You can work on them during the afternoon.
46 Case 1: not enough biological material... Typically, Wit et al. (2005) and others assumed: number of biological replicates exceeds the number of channels. Only then the above-mentioned optimal design -algorithm is applicable. What if number of channels exceeds number of biological replicates? In that case, we have to use technical replicates. New Question: How to assign the technical replicates to the arrays?
47 A simple practical example Consider an experiment with 3 conditions and 6 arrays. Each condition has 2 biological and 2 technical replicates. Dye swap an alternative Same design matrix, but different efficiency of parameter estimates.
48 Mixed effect models The mixed effect models can be written as: y = X θ + Z bio ɛ + η, where Z: the random effect design matrices ɛ random replication effect (of no interest) So, Var(y) = σ 2 b Z t b Z b + σ 2 t I.
49 Covariance-variance matrix Σ Σ = σb σt 2 I Σ = σb If only independent biological replicates, then Σ = (σ 2 b + σ2 t )I. That is the scenario in Wit et al. (2005). + σt 2 I
50 Generalized least squares The linear model is modelled by y = X δ + ɛ, where Var(ɛ) ρσ B + I is the variance structure of the residual. The generalized least squares estimates of δ: and so ˆδ = (X t Σ 1 X ) 1 X t Σ 1 y, V (ˆδ) (X t Σ 1 X ) 1. Under A-optimality we aim to minimize the f (X, Σ) = Tr(X t Σ 1 X ) 1
51 Case 2: Too many replicates Assume the following model for single gene & single condition: log y 1kr = θ 1 + S + ɛ k + η kr, where V (ɛ k ) = σɛ 2 : biological variation: associated with individual tissue V (η kr ) = ση: 2 technical variation: associated with microarray If p biological replicates are mixed before hybridization to array. Let x i be the expression of the ith pool, then V ( x i ) = σ2 ɛ p + σ2 η. Pool has less (biological) variation than each single tissue.
52 Pooling is not free: optimal pooling We want to minimize the estimation error: σ 2 ɛ np + σ2 η n, subject to not overrunning one s budget B = n p C s + n C a, C s = costs individual sample extraction C a = cost microarray (including dyes) Euler-Lagrange: Find the stationary points of h(n, p, λ) = σ2 ɛ np + σ2 η n λ (n p C s + n C a B).
53 Case 3: multi-factorial experiments (Simplified) example: Caetano et al. Genetics 2004, 168: Objective: Identify differentially expressed genes in pigs between 2 factors: Ovulation rate (2 levels: high, normal) Time after inducing luteal regression (4 levels: 2, 3, 4, 6 days) 2 4 = 8 factor combinations Available samples: 2 pigs per factor combination. Available microarrays: 12
54 Model Uncertainty One way to design the experiment: consider all factor combinations. This corresponds to a full factorial model. BUT possibly not all interactions are equally important: given large number of genes, some form of multi-factorial optimal design is natural; down-weight for higher order interactions.
55 Multi-factorial design optimality criterion Let Ey = X β, where β = (β 0,..., β p ) be the design and parametrization of a particular factorial model. Tsai, Gilmour and Mead (2000) suggested a Q-criterion, Q(X ) = 1 n 0 p p 1 a ii i=1 j=0 a 2 ij a ii a jj w ij, where {a ij } = X t X the information matrix, and w ij = no. of factorial submodels of X, which both include β i and β j. n 0 = total number of factorial submodels of X.
56 Q-criterion in microarray studies No intercept β 0 in the microarray log-expression ratio model. Slight modification of the Q-criterion: Q(X ) = 1 n 0 p p 1 a ii i=1 j=1 a 2 ij a ii a jj w ij. The old Q-criterion corresponds to including an explicit dye-effect.
57 Example: two-factor microarray experiment Let s focus on two lines of pigs whose ovary material is studied 2,3,4 and 6 days after inducing luteal regression. Because we study differential expressions, there is no intercept in the model, y ijk = α i + β j + (αβ) ij + ɛ ijk i = 1, 2, j = 1,..., 4 Not to favour any factor level, we choose the parametrization: α i = β j = (αβ) ij = 0. i j ij
58 Factorial submodels of two factor model α α + β α β β The total number of models n 0 = 4. The weight matrix is given by: w
59 Case 4: D-optimal designs for enzyme kinetic models Conditions = time Reaction A θ,λ B ODE representation: d[a] dt = θ[a] λ. with solution [A] = η(t, ψ) = { {1 (1 λ)θt} 1/(1 λ) λ 1 e θt λ = 1 Question: how to choose observations locations t 1,..., t k to optimize info about ψ = (θ, λ)?
60 Continuous designs and D-optimality The aim is to find an D-optimal design, { x1 x ξ = 2... x n w 1 w 2... w n }, whereby the x are the support points and the w the design weights. Solution independent of number of observations: w [0, 1]. Silvey (1980): optimal design is often concentrated on p design points, with weight 1/p (where p = # parameters). ξ is D-optimal, if it maximizes where F t = ( dη(t1,ψ) dθ dη(t 1,ψ) dλ det F t WF, dη(t 2,ψ) dθ dη(t 2,ψ) dλ So, optimal design depends on (guess) ψ 0. ) W = diag(1/2, 1/2)
61 Conclusions We have met several microarray design issues and concluded: 1. There are several design principles to obey when considering sources of variation of a microarray experiment. 2. carefully assigning samples to conditions can improve estimation: optimal designs are important in order to reduce cost/time interwoven loop designs are a good compromise 3. Further questions to consider: Technical replication Pooling Factorial designs kinetic modelling.
Topics on statistical design and analysis. of cdna microarray experiment
Topics on statistical design and analysis of cdna microarray experiment Ximin Zhu A Dissertation Submitted to the University of Glasgow for the degree of Doctor of Philosophy Department of Statistics May
More informationDesign of Microarray Experiments. Xiangqin Cui
Design of Microarray Experiments Xiangqin Cui Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationBiochip informatics-(i)
Biochip informatics-(i) : biochip normalization & differential expression Ju Han Kim, M.D., Ph.D. SNUBI: SNUBiomedical Informatics http://www.snubi snubi.org/ Biochip Informatics - (I) Biochip basics Preprocessing
More informationcdna Microarray Analysis
cdna Microarray Analysis with BioConductor packages Nolwenn Le Meur Copyright 2007 Outline Data acquisition Pre-processing Quality assessment Pre-processing background correction normalization summarization
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationExample 1: Two-Treatment CRD
Introduction to Mixed Linear Models in Microarray Experiments //0 Copyright 0 Dan Nettleton Statistical Models A statistical model describes a formal mathematical data generation mechanism from which an
More informationClustering & microarray technology
Clustering & microarray technology A large scale way to measure gene expression levels. Thanks to Kevin Wayne, Matt Hibbs, & SMD for a few of the slides 1 Why is expression important? Proteins Gene Expression
More informationSeminar Microarray-Datenanalyse
Seminar Microarray- Normalization Hans-Ulrich Klein Christian Ruckert Institut für Medizinische Informatik WWU Münster SS 2011 Organisation 1 09.05.11 Normalisierung 2 10.05.11 Bestimmen diff. expr. Gene,
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationExpression arrays, normalization, and error models
1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in
More informationSPOTTED cdna MICROARRAYS
SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.
More informationChapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).
Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:
More informationOptimal normalization of DNA-microarray data
Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La
More informationNormalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationBME 5742 Biosystems Modeling and Control
BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various
More informationNOVABEADS FOOD 1 DNA KIT
NOVABEADS FOOD 1 DNA KIT NOVABEADS FOOD DNA KIT is the new generation tool in molecular biology techniques and allows DNA isolations from highly processed food products. The method is based on the use
More informationComplete all warm up questions Focus on operon functioning we will be creating operon models on Monday
Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationDesign of microarray experiments
Design of microarray experiments Ulrich Mansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis March 23 Heidelberg Heidelberg, March 23 Experiments Scientists deal mostly with experiments
More informationUse of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance
Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Anthea Dokidis Glenda Delenstarr Abstract The performance of the Agilent microarray system can now be evaluated
More informationWavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis
Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Jeffrey S. Morris University of Texas, MD Anderson Cancer Center Joint wor with Marina Vannucci, Philip J. Brown,
More informationGene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji
Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC
More informationLinear Regression. Volker Tresp 2018
Linear Regression Volker Tresp 2018 1 Learning Machine: The Linear Model / ADALINE As with the Perceptron we start with an activation functions that is a linearly weighted sum of the inputs h = M j=0 w
More informationLesson 11. Functional Genomics I: Microarray Analysis
Lesson 11 Functional Genomics I: Microarray Analysis Transcription of DNA and translation of RNA vary with biological conditions 3 kinds of microarray platforms Spotted Array - 2 color - Pat Brown (Stanford)
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationCONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry
CONJOINT 541 Translating a Transcriptome at Specific Times and Places David Morris Department of Biochemistry http://faculty.washington.edu/dmorris/ Lecture 1 The Biology and Experimental Analysis of mrna
More informationOn construction of constrained optimum designs
On construction of constrained optimum designs Institute of Control and Computation Engineering University of Zielona Góra, Poland DEMA2008, Cambridge, 15 August 2008 Numerical algorithms to construct
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationIntroduction to clustering methods for gene expression data analysis
Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationExperimental Design. Experimental design. Outline. Choice of platform Array design. Target samples
Experimental Design Credit for some of today s materials: Jean Yang, Terry Speed, and Christina Kendziorski Experimental design Choice of platform rray design Creation of probes Location on the array Controls
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org
More informationDesign of microarray experiments
Design of microarray experiments Ulrich ansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis September Heidelberg Heidelberg, September otivation The lab biologist and theoretician need
More informationDesign and Analysis of Gene Expression Experiments
Design and Analysis of Gene Expression Experiments Guilherme J. M. Rosa Department of Animal Sciences Department of Biostatistics & Medical Informatics University of Wisconsin - Madison OUTLINE Æ Linear
More informationIntroduction to clustering methods for gene expression data analysis
Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationCybergenetics: Control theory for living cells
Department of Biosystems Science and Engineering, ETH-Zürich Cybergenetics: Control theory for living cells Corentin Briat Joint work with Ankit Gupta and Mustafa Khammash Introduction Overview Cybergenetics:
More informationFuzzy Clustering of Gene Expression Data
Fuzzy Clustering of Gene Data Matthias E. Futschik and Nikola K. Kasabov Department of Information Science, University of Otago P.O. Box 56, Dunedin, New Zealand email: mfutschik@infoscience.otago.ac.nz,
More information9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes
Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin
More informationBi 1x Spring 2014: LacI Titration
Bi 1x Spring 2014: LacI Titration 1 Overview In this experiment, you will measure the effect of various mutated LacI repressor ribosome binding sites in an E. coli cell by measuring the expression of a
More informationMixture models for analysing transcriptome and ChIP-chip data
Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,
More informationDIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS
Sankhyā : The Indian Journal of Statistics Special issue in memory of D. Basu 2002, Volume 64, Series A, Pt. 3, pp 706-720 DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS By TERENCE P. SPEED
More informationLecture 7: Simple genetic circuits I
Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.
More information9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes
Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin
More informationPrincipal component analysis (PCA) for clustering gene expression data
Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763-774 1 Outline of talk Background and motivation Design of our empirical
More informationTruSight Cancer Workflow on the MiniSeq System
TruSight Cancer Workflow on the MiniSeq System Prepare Library Sequence Analyze Data TruSight Cancer 1.5 days ~ 24 hours < 2 hours TruSight Cancer Library Prep MiniSeq System Local Run Manager Enrichment
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationBioinformatics 2. Yeast two hybrid. Proteomics. Proteomics
GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More information13.4 Gene Regulation and Expression
13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.
More informationExam: high-dimensional data analysis January 20, 2014
Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationA Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data
A Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data Anusha Murali Mentor: Dr. Mahmoud Ghandi MIT PRIMES Computer Science Conference 2018 October 13, 2018 Anusha Murali
More informationInferring Transcriptional Regulatory Networks from High-throughput Data
Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20
More informationrobustness: revisting the significance of mirna-mediated regulation
: revisting the significance of mirna-mediated regulation Hervé Seitz IGH (CNRS), Montpellier, France October 13, 2012 microrna target identification .. microrna target identification mir: target: 2 7
More informationExpression Data Exploration: Association, Patterns, Factors & Regression Modelling
Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation
More informationOlympic B3 Summer Science Camp 2015 Lab 0
Using Lab Stock Solutions interpretation and calculations Introduction: In molecular biology you generally start with a specific set of general instructions, called a Protocol. You can think of it as a
More informationComputational statistics
Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f
More informationChapter 8: Introduction to Evolutionary Computation
Computational Intelligence: Second Edition Contents Some Theories about Evolution Evolution is an optimization process: the aim is to improve the ability of an organism to survive in dynamically changing
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More information1. In most cases, genes code for and it is that
Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod
More informationmrna Isolation Kit for Blood/Bone Marrow For isolation mrna from blood or bone marrow lysates Cat. No
For isolation mrna from blood or bone marrow lysates Cat. No. 1 934 333 Principle Starting material Application Time required Results Key advantages The purification of mrna requires two steps: 1. Cells
More information2015 FALL FINAL REVIEW
2015 FALL FINAL REVIEW Biomolecules & Enzymes Illustrate table and fill in parts missing 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how
More informationGLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data
GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between
More informationTwo-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation
Simple Examples of Analysis for a Single Gene wo-olor Microarray Experimental Design Notation /3/0 opyright 0 Dan Nettleton Microarray Experimental Design Notation Microarray Experimental Design Notation
More informationProteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?
Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains
More informationCyproheptadine ELISA Test Kit
Cyproheptadine ELISA Test Kit Catalog No. LSY-10041 1. Principal This test kit is based on the competitive enzyme immunoassay for the detection of Cyproheptadine in urine, tissue sample. The coupling antigens
More informationData Sheet. Azide Cy5 RNA T7 Transcription Kit
Cat. No. Size 1. Description PP-501-Cy5 10 reactions à 40 µl For in vitro use only Quality guaranteed for 12 months Store all components at -20 C. Avoid freeze and thaw cycles. DBCO-Sulfo-Cy5 must be stored
More informationData Preprocessing. Data Preprocessing
Data Preprocessing 1 Data Preprocessing Normalization: the process of removing sampleto-sample variations in the measurements not due to differential gene expression. Bringing measurements from the different
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationRNA Transport. R preps R preps
RNA Transport R0527-00 5 preps R0527-01 50 preps July 2014 RNA Transport Table of Contents Introduction...2 Kit Contents/Storage and Stability...3 Protocol...4 Storage Procedure...4 Recovery Procedure...5
More informationRelated Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.
CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio Review Autumn 2004 Larry Ruzzo Related Courses He who asks is a fool for five minutes, but he who does not ask remains
More informationThe Blessing and the Curse
The Blessing and the Curse of the Multiplicative Updates Manfred K. Warmuth University of California, Santa Cruz CMPS 272, Feb 31, 2012 Thanks to David Ilstrup and Anindya Sen for helping with the slides
More informationExam: high-dimensional data analysis February 28, 2014
Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.
More informationNetwork Biology-part II
Network Biology-part II Jun Zhu, Ph. D. Professor of Genomics and Genetic Sciences Icahn Institute of Genomics and Multi-scale Biology The Tisch Cancer Institute Icahn Medical School at Mount Sinai New
More informationMag-Bind Soil DNA Kit. M preps M preps M preps
Mag-Bind Soil DNA Kit M5635-00 5 preps M5635-01 50 preps M5635-02 200 preps January 2013 Mag-Bind Soil DNA Kit Table of Contents Introduction and Overview...2 Kit Contents/Storage and Stability...3 Preparing
More informationLinear Models and Empirical Bayes Methods for Microarrays
Methods for Microarrays by Gordon Smyth Alex Sánchez and Carme Ruíz de Villa Department d Estadística Universitat de Barcelona 16-12-2004 Outline 1 An introductory example Paper overview 2 3 Lönnsted and
More informationCSE 554 Lecture 7: Alignment
CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex
More informationBioinformatics. Transcriptome
Bioinformatics Transcriptome Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/ Bioinformatics
More informationINFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld
INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical
More informationAlgorithms in Computational Biology (236522) spring 2008 Lecture #1
Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??
More informationCORE MOLIT ACTIVITIES at a glance
CORE MOLIT ACTIVITIES at a glance 1. Amplification of Biochemical Signals: The ELISA Test http://molit.concord.org/database/activities/248.html The shape of molecules affects the way they function. A test
More informationSingle gene analysis of differential expression. Giorgio Valentini
Single gene analysis of differential expression Giorgio Valentini valenti@disi.unige.it Comparing two conditions Each condition may be represented by one or more RNA samples. Using cdna microarrays, samples
More informationDiscovering Correlation in Data. Vinh Nguyen Research Fellow in Data Science Computing and Information Systems DMD 7.
Discovering Correlation in Data Vinh Nguyen (vinh.nguyen@unimelb.edu.au) Research Fellow in Data Science Computing and Information Systems DMD 7.14 Discovering Correlation Why is correlation important?
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationIdentifying Bio-markers for EcoArray
Identifying Bio-markers for EcoArray Ashish Bhan, Keck Graduate Institute Mustafa Kesir and Mikhail B. Malioutov, Northeastern University February 18, 2010 1 Introduction This problem was presented by
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationCSEP 590A Summer Lecture 4 MLE, EM, RE, Expression
CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More informationCSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators
CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More informationDEGseq: an R package for identifying differentially expressed genes from RNA-seq data
DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics
More information14 Singular Value Decomposition
14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationChapters 12&13 Notes: DNA, RNA & Protein Synthesis
Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,
More informationDiscovering molecular pathways from protein interaction and ge
Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why
More information