Optimal design of microarray experiments

Size: px
Start display at page:

Download "Optimal design of microarray experiments"

Transcription

1 University of Groningen ernst 7 June 2011

2 What is a cdna Microarray Experiment? GREEN (Cy3) Cancer Tissue mrna Mix tissues in equal amounts G G G R R R R R G R G G Hybridisation RED (Cy5) Normal Tissue mrna cdna microarray measures the relative abundance of RNA in two organic tissues. First, cdna labelled with G & R dyes is obtained from 2 conditions and mixed in equal proportions Second, the mixture is hybridized with the arrayed DNA (probes) Gene 1 Gene 2 Gene 3 Gene 4 Gene 5 Enlargement Third, competitive hybridization! Probe catches the matched cdna R g3 R R R g3 R R Gene 3 matches with normal tissue mrna -> high red signal Finally, G & R read-out for each spot (gene) by scanning

3 Why DOING a microarray experiment? Obvious, isn t it? To discover genes that are differentially expressed between tissues. a typical genomic profile of a particular disease. differences between phenotypic similar pathologies i.e., to find some differences.

4 Why DESIGNING a microarray experiment? Yeah, why? We want to be sure that the the differences we find are due to biological differences (e.g. different tissue, different diseases); not due to other experimental conditions (e.g. microarray batch, print-pin anomalies, etc.); not due to chance variation. Our exp t should maximize difference between biological conditions.

5 4 known design principles 1. Vary the conditions of interest, with replicates that are as many as possible; biological rather than technical replicates; as evenly distributed across the conditions of interest. 2. Restriction: Keep conditions not of interest constant. 3. Blocking: Keep track of uninteresting conditions that do vary. 4. Matching: Channels are similar units for effective comparisons.

6 Sources of Variation Microarrays are known as noisy experiments. Many artifacts are the result of the experimental conditions: microarray print batches lab experimenter hybridization mixture preparation hybridization conditions We follow an experiment in detail to determine sources of variation.

7 1. Start of microarray experiment Organic material is typically stored frozen to prevent decay. Preparing hybridization mixture takes 1-2 hours. Preparation of mixture is intensive, hands-on process. Scientist decides to prepare 2 of 20 arrays in parallel. 2 arrays = 4 channels. 4 separate tubes are prepared for the 4 channels. The following description applies to each of the 4 tubes.

8 2. Obtaining RNA Several plants of a particular condition are selected. Scientist uses pestle and mortar to grind it down to fine mush. Organic material is full of impurities (proteins): A membrane is used to fish out the RNA. Spectrophotometry to detect: RNA > 2 Clean Proteins Remaining mixture is concentrated using a centrifuge twice. At the end, 100 µg RNA is obtained.

9 3A. Obtaining stable cdna RNA is very unstable. RNA polymerase enzyme copies RNA into cdna, after sequences of Ts (poly-t primer) is added. quantity of 25µl of each nucleotide (A,T,G,C) is added magnesium and salt buffer is added.

10 3B. Preventing RNA folding RNA is heated up to 65 C to prevent it from folding onto itself.

11 3C. Obtaining stable, labelled cdna RNA can be made visible by copying a dye into the cdna. One adds C nucleotides with a dye molecule (Cy3 or Cy5) attached. The reverse transcriptase enzyme is added for an hour.

12 4. Preparing cdna mixture for hybridization Loose A, C, T, G bases are filtered out of cdna mixture. cdna mixture is dried down in centrifuge Original liquid replaced by special liquid ( hybridization buffer. ) Scientist combines 2 tubes together in equal quantities. (This results in two hybridization mixtures for two arrays)

13 5. Slide hybridization The microarray is washed with SSC and SDS. cdna mixture put on the slide & coverslip put on top in order to spread cdna mixture evenly get rid of air bubbles Slide is left in hybridization chamber (dark) at 42 C for hours.

14 6. Post-processing slide All non-hybridized (labelled) cdna is washed from the slide. Finally, the microarray is dried and is ready to be scanned.

15 Sources of variation in a microarray expt... Sources of variation in the hybridization process Different dye sensitivities. Inequalities in the application of mrna to the slide. Variations in the washing efficiencies of non-hybridized mrna off the slide. Other differences in hybridization parameters, such as: temperature, experimenter, time of the day,... Sources of variation in the mrna Differences in conditions. Differences between experimental subjects within the same covariate level. Differences between samples from the same subject. Variation in mrna extraction methods from original sample. Variations in reverse transcription. Differences in PCR amplification. Different labelling efficiencies. Sources of variation in the microarray production Print-pin anomalies Variations in printed probe quantities even with the same pin Chip batch variation (due to many sources of unknown variations) Differences in sequence length of the immobilized DNA Variations in chemical probe attachment levels to the slide Sources of variation in the scanning Different scanners; Different photo-multipliers or gain. Different spot-finding software; Different grid alignments.

16 A statistical toy model for gene expression log y gcbaxdpkr = θ gc +B b +A ba +L a (x)+d ad (θ gc )+P p +S ag +ɛ gck +η gckr. where θ gc = true expression for gene g under condition c B b = RNA Batch effect (experimenter, time of day, temperature) A ba = array effect (scanning level, pre/post-washing) L a (x) = location effect (chip, coverslip, washing) D ad (θ gc ) = dye effect (dye, unequal mixing of mixtures, labelling, intensity) P p = print pin effect S ag = spot effect (amount of DNA in the spot printed on slide) ɛ gck = biological variation for individual k η gckr = within-replicate-k variation of a technical replicate r

17 Normalization The proposed model: log y gcbaxdpkr = θ gc +B b +A ba +L a (x)+d ad (θ gc )+P p +S ag +ɛ gck +η gckr. It is hoped that the structural artifacts, B b, A ba, L a (x), D ad (θ gc ), P p can be eliminated by means of normalization. This leaves us a model for gene g at condition c on array a for the r th technical replicate of individual k, log y gcakr = θ gc + S ag + ɛ gck + η gcakr, where η gcakr effects. includes non-structural artifacts from nuisance

18 Five questions (one we answer four you will) From the model: Cy3: log x ikr = θ i + S + ɛ ik + η ikr Cy5: log x jkr = θ j + S + ɛ jk + η jkr, We define 5 microarray design problems: 1. How to estimate differential expression δ ij = θ i θ j? 2. How to deal with biological variation ɛ ik if we have too few replicates?... if we have too many replicates? 3. What if the θ possess some factorial structure? 4. What if we are interested in some functional form θ = θ(t, ϕ)?

19 Example: two different designs with 5 microarrays Question: Which design is more efficient in estimating the contrasts? Reference Design Loop Design R Note: each arrow stands for 1 microarray (green dye red dye).

20 Developing our intuition... The principal quantity of interest in microarray experiments is: the level of differential expression across the conditions δ ij Example: Let δ 13 = true log-difference in expression between 1 and 3. How good can we estimate δ 13 in both designs, assuming that each observation y has variance σ 2? NOTE: Variance σ 2 = σ 2 b + σ2 t consist both of biological and technical variation.

21 An estimate ˆδ 13 of δ 13 in a Reference Design Let y i = observed log-difference on microarray i. Reference Design 1 y 1 y 3 = (log C 1 log Ref) (log C 3 log Ref) = ˆδ 13, 5 4 R 3 2 so ) V (ˆδ13 = V (y 1 ) + V (y 2 ) = 2σ 2, where σ 2 is variance of single log-ratio V (y i ).

22 So, the loop design has a more precise Optimal estimate design of microarray of theexperiments log-ratio δ. An estimate ˆδ 13 of δ 13 in a Loop Design Let y i = observed log-difference on microarray i 5 Loop Design 1 2 y 1 y 2 = log C 1 C 2 log C 2 C 3 = ˆδ (1) 13 y 4 + y 3 y 5 = log C 1 C 5 log C 5 C 4 log C 4 C 3 = ˆδ (2) ˆδ 13 = 0.6 ˆδ (1) ˆδ (2) 13, so ) V (ˆδ13 = σ σ 2 = 1.2σ 2, where σ 2 is the variance of a single log-ratio

23 Two major questions Are there better designs than loop designs? = Sometimes Are loop designs always more efficient than reference designs? = Not always So how can we find optimal designs?

24 Some results from the literature... These are optimal even designs for all the contrasts when the number of arrays is two more than the number of conditions. (Kerr and Churchill 2001; exhaustive search) The problem with these and similar results not of practical interest. no indication how this result can be extended.

25 Schematic representation of a microarray exp t Loop Design y 1 y 2 y 3 y 4 y 5 = δ 21 δ 31 δ 41 δ 51 + ɛ Reference Design R 3 2 y 1 y 2 y 3 y 4 y 5 = δ 1r δ 2r δ 3r δ 4r δ 5r + ɛ

26 Outline of Near Optimal Design Algorithm Wit, Nobile and Khanin (2005) proposed the following algorithm: 1. Formalize design notation X, y = X δ + ɛ 2. Estimate the parameters of interest for design X, ˆδ ij X = (X t X ) 1 X t y, i > j 3. Summarize efficiency of parameter estimates, e.g. average V (ˆδ ij X ) 4. Search design X that attains highest efficiency, X = arg min X average V (ˆδ X ij ).

27 Design Optimality (1) Two ways to translate covariance matrix into univariate summary: A-optimality = minimizing Trace [ (X t X ) 1]. Minimizes the sum of the variances of the parameter estimates. D-optimality = minimizing Det [ (X t X ) 1]. Minimizes the generalized variance of the parameter estimates. PROBLEM: A-optimality depends on parametrization.

28 Design Optimality (2) The parameters of interest are ALL CONTRASTS δ = (δ ij ) i<j. Note: δ ij = δ i1 δ j1, so therefore: There is a matrix C, such that δ = C δ 21 δ 31 δ 41 δ 51 L-optimality: minimize Trace[C(X t X ) 1 C t ]. minimize average variance of all contrasts. This is equivalent to A-optimality with j δ j1 = 0 constraint.

29 Simulated Annealing Kirkpatrick et al. (1983) developed simple algorithm to minimize f : 1. Let X be the current state 2. Propose a new state X 3. Accept the new proposal with probability: { ( ) } f (X ) 1/T min 1, f (X, ) where T is the current temperature. 4. Slightly lower the temperature and return to 1. This procedure is valid if the proposals are symmetric; This procedure reaches the optimum if the chain is irreducible.

30 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.

31 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.

32 Simulated Annealing: Design Move 1 Single vertex move. Pick at random one comparison (between i and j). Pick at random a condition k. Replace comparison between i and j by comparison between i and k.

33 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.

34 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.

35 Simulated Annealing: Design Move 2 Single edge move. Pick at random one comparison (i and j). Pick at random two conditions, l and k. Replace comparison between i and j by comparison between l and k.

36 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.

37 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.

38 Simulated Annealing: Design Move 3 Balanced two-edge move. Pick at random two non-overlapping comparisons (i & j and k & l). Replace comparison i and j by comparison i and l. Replace comparison k and l by comparison k and j.

39 Loop designs are often A-optimal Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Acceptance rate = 36.4 %. Score := Tr[ Inv(X X) ] = 28. Nr of iterations = Acceptance rate = 37.1 %. Score := Tr[ Inv(X X) ] = 42. Nr of iterations = Nr of arrays = 7. Nr of conditions = 7 Nr of arrays = 8. Nr of conditions = 8 A-optimal designs for 7/8 conditions with 7/8 slides, respectively.

40 ...but not always Stochastic Search of A Optimal Design for Contrasts with Simulated Annealing. Acceptance rate = 36.3 %. Score := Tr[ Inv(X X) ] = Nr of iterations = Nr of arrays = 9. Nr of conditions = 9 A-optimal design with 9 conditions and 9 microarrays.

41 In practice: cyclic designs An interwoven loop design is defined by: number of conditions (HERE: 15) number of loops (HERE: 3) jump size for each loop (HERE: 1, 4, 6) However, cyclic designs are typically not A-optimal for p 17.

42 Advantages of an Interwoven Loop design A Optimality Score for Contrasts: Tr[ Inv(X X) ] = No of arrays = 90. No of conditions = 30 Easy to implement Compare left with right! Highly efficient Left is only 0.6% more efficient than Right Automatic dye balance conditions measured equally in Cy3/Cy5

43 Use od: optimal (interwoven loop) designs in R The function od(nt, ns, optimality =..., method = "loop") finds L-optimal/D-optimal interwoven loop/all designs for any number of conditions and any number of slides Can be obtained from smida library at: wite/book Some conclusions...

44 Dye Swap or Dye Balance? The simple model needs to be expanded for a possible dye effect: log y 1kr = θ 1 + S + +ɛ 1k + η 1kr log y 2kr = θ 2 + S + D + ɛ 2k + η 2kr, Dye swap designs have been proposed as way to control dye effect. HOWEVER, interwoven loop designs are more efficient! D A 5 cond. 10 arrays 89% 80% 6 cond. 12 arrays 87% 75% 7 cond. 14 arrays 85% 69% 8 cond. 16 arrays 82% 62% Relative D- and A-efficiency of dye-swap versus interwoven loop.

45 What other 4 design questions can be raised? 1. Too little biological material: technical replicates. 2. Too many biological material: pooling. 3. Factorial designs 4. Kinetic models We briefly describe these problems in what follows. You can work on them during the afternoon.

46 Case 1: not enough biological material... Typically, Wit et al. (2005) and others assumed: number of biological replicates exceeds the number of channels. Only then the above-mentioned optimal design -algorithm is applicable. What if number of channels exceeds number of biological replicates? In that case, we have to use technical replicates. New Question: How to assign the technical replicates to the arrays?

47 A simple practical example Consider an experiment with 3 conditions and 6 arrays. Each condition has 2 biological and 2 technical replicates. Dye swap an alternative Same design matrix, but different efficiency of parameter estimates.

48 Mixed effect models The mixed effect models can be written as: y = X θ + Z bio ɛ + η, where Z: the random effect design matrices ɛ random replication effect (of no interest) So, Var(y) = σ 2 b Z t b Z b + σ 2 t I.

49 Covariance-variance matrix Σ Σ = σb σt 2 I Σ = σb If only independent biological replicates, then Σ = (σ 2 b + σ2 t )I. That is the scenario in Wit et al. (2005). + σt 2 I

50 Generalized least squares The linear model is modelled by y = X δ + ɛ, where Var(ɛ) ρσ B + I is the variance structure of the residual. The generalized least squares estimates of δ: and so ˆδ = (X t Σ 1 X ) 1 X t Σ 1 y, V (ˆδ) (X t Σ 1 X ) 1. Under A-optimality we aim to minimize the f (X, Σ) = Tr(X t Σ 1 X ) 1

51 Case 2: Too many replicates Assume the following model for single gene & single condition: log y 1kr = θ 1 + S + ɛ k + η kr, where V (ɛ k ) = σɛ 2 : biological variation: associated with individual tissue V (η kr ) = ση: 2 technical variation: associated with microarray If p biological replicates are mixed before hybridization to array. Let x i be the expression of the ith pool, then V ( x i ) = σ2 ɛ p + σ2 η. Pool has less (biological) variation than each single tissue.

52 Pooling is not free: optimal pooling We want to minimize the estimation error: σ 2 ɛ np + σ2 η n, subject to not overrunning one s budget B = n p C s + n C a, C s = costs individual sample extraction C a = cost microarray (including dyes) Euler-Lagrange: Find the stationary points of h(n, p, λ) = σ2 ɛ np + σ2 η n λ (n p C s + n C a B).

53 Case 3: multi-factorial experiments (Simplified) example: Caetano et al. Genetics 2004, 168: Objective: Identify differentially expressed genes in pigs between 2 factors: Ovulation rate (2 levels: high, normal) Time after inducing luteal regression (4 levels: 2, 3, 4, 6 days) 2 4 = 8 factor combinations Available samples: 2 pigs per factor combination. Available microarrays: 12

54 Model Uncertainty One way to design the experiment: consider all factor combinations. This corresponds to a full factorial model. BUT possibly not all interactions are equally important: given large number of genes, some form of multi-factorial optimal design is natural; down-weight for higher order interactions.

55 Multi-factorial design optimality criterion Let Ey = X β, where β = (β 0,..., β p ) be the design and parametrization of a particular factorial model. Tsai, Gilmour and Mead (2000) suggested a Q-criterion, Q(X ) = 1 n 0 p p 1 a ii i=1 j=0 a 2 ij a ii a jj w ij, where {a ij } = X t X the information matrix, and w ij = no. of factorial submodels of X, which both include β i and β j. n 0 = total number of factorial submodels of X.

56 Q-criterion in microarray studies No intercept β 0 in the microarray log-expression ratio model. Slight modification of the Q-criterion: Q(X ) = 1 n 0 p p 1 a ii i=1 j=1 a 2 ij a ii a jj w ij. The old Q-criterion corresponds to including an explicit dye-effect.

57 Example: two-factor microarray experiment Let s focus on two lines of pigs whose ovary material is studied 2,3,4 and 6 days after inducing luteal regression. Because we study differential expressions, there is no intercept in the model, y ijk = α i + β j + (αβ) ij + ɛ ijk i = 1, 2, j = 1,..., 4 Not to favour any factor level, we choose the parametrization: α i = β j = (αβ) ij = 0. i j ij

58 Factorial submodels of two factor model α α + β α β β The total number of models n 0 = 4. The weight matrix is given by: w

59 Case 4: D-optimal designs for enzyme kinetic models Conditions = time Reaction A θ,λ B ODE representation: d[a] dt = θ[a] λ. with solution [A] = η(t, ψ) = { {1 (1 λ)θt} 1/(1 λ) λ 1 e θt λ = 1 Question: how to choose observations locations t 1,..., t k to optimize info about ψ = (θ, λ)?

60 Continuous designs and D-optimality The aim is to find an D-optimal design, { x1 x ξ = 2... x n w 1 w 2... w n }, whereby the x are the support points and the w the design weights. Solution independent of number of observations: w [0, 1]. Silvey (1980): optimal design is often concentrated on p design points, with weight 1/p (where p = # parameters). ξ is D-optimal, if it maximizes where F t = ( dη(t1,ψ) dθ dη(t 1,ψ) dλ det F t WF, dη(t 2,ψ) dθ dη(t 2,ψ) dλ So, optimal design depends on (guess) ψ 0. ) W = diag(1/2, 1/2)

61 Conclusions We have met several microarray design issues and concluded: 1. There are several design principles to obey when considering sources of variation of a microarray experiment. 2. carefully assigning samples to conditions can improve estimation: optimal designs are important in order to reduce cost/time interwoven loop designs are a good compromise 3. Further questions to consider: Technical replication Pooling Factorial designs kinetic modelling.

Topics on statistical design and analysis. of cdna microarray experiment

Topics on statistical design and analysis. of cdna microarray experiment Topics on statistical design and analysis of cdna microarray experiment Ximin Zhu A Dissertation Submitted to the University of Glasgow for the degree of Doctor of Philosophy Department of Statistics May

More information

Design of Microarray Experiments. Xiangqin Cui

Design of Microarray Experiments. Xiangqin Cui Design of Microarray Experiments Xiangqin Cui Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Biochip informatics-(i)

Biochip informatics-(i) Biochip informatics-(i) : biochip normalization & differential expression Ju Han Kim, M.D., Ph.D. SNUBI: SNUBiomedical Informatics http://www.snubi snubi.org/ Biochip Informatics - (I) Biochip basics Preprocessing

More information

cdna Microarray Analysis

cdna Microarray Analysis cdna Microarray Analysis with BioConductor packages Nolwenn Le Meur Copyright 2007 Outline Data acquisition Pre-processing Quality assessment Pre-processing background correction normalization summarization

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Example 1: Two-Treatment CRD

Example 1: Two-Treatment CRD Introduction to Mixed Linear Models in Microarray Experiments //0 Copyright 0 Dan Nettleton Statistical Models A statistical model describes a formal mathematical data generation mechanism from which an

More information

Clustering & microarray technology

Clustering & microarray technology Clustering & microarray technology A large scale way to measure gene expression levels. Thanks to Kevin Wayne, Matt Hibbs, & SMD for a few of the slides 1 Why is expression important? Proteins Gene Expression

More information

Seminar Microarray-Datenanalyse

Seminar Microarray-Datenanalyse Seminar Microarray- Normalization Hans-Ulrich Klein Christian Ruckert Institut für Medizinische Informatik WWU Münster SS 2011 Organisation 1 09.05.11 Normalisierung 2 10.05.11 Bestimmen diff. expr. Gene,

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

SPOTTED cdna MICROARRAYS

SPOTTED cdna MICROARRAYS SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.

More information

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:

More information

Optimal normalization of DNA-microarray data

Optimal normalization of DNA-microarray data Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La

More information

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry

Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

BME 5742 Biosystems Modeling and Control

BME 5742 Biosystems Modeling and Control BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

More information

NOVABEADS FOOD 1 DNA KIT

NOVABEADS FOOD 1 DNA KIT NOVABEADS FOOD 1 DNA KIT NOVABEADS FOOD DNA KIT is the new generation tool in molecular biology techniques and allows DNA isolations from highly processed food products. The method is based on the use

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Design of microarray experiments

Design of microarray experiments Design of microarray experiments Ulrich Mansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis March 23 Heidelberg Heidelberg, March 23 Experiments Scientists deal mostly with experiments

More information

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Anthea Dokidis Glenda Delenstarr Abstract The performance of the Agilent microarray system can now be evaluated

More information

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Jeffrey S. Morris University of Texas, MD Anderson Cancer Center Joint wor with Marina Vannucci, Philip J. Brown,

More information

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC

More information

Linear Regression. Volker Tresp 2018

Linear Regression. Volker Tresp 2018 Linear Regression Volker Tresp 2018 1 Learning Machine: The Linear Model / ADALINE As with the Perceptron we start with an activation functions that is a linearly weighted sum of the inputs h = M j=0 w

More information

Lesson 11. Functional Genomics I: Microarray Analysis

Lesson 11. Functional Genomics I: Microarray Analysis Lesson 11 Functional Genomics I: Microarray Analysis Transcription of DNA and translation of RNA vary with biological conditions 3 kinds of microarray platforms Spotted Array - 2 color - Pat Brown (Stanford)

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

CONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry

CONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry CONJOINT 541 Translating a Transcriptome at Specific Times and Places David Morris Department of Biochemistry http://faculty.washington.edu/dmorris/ Lecture 1 The Biology and Experimental Analysis of mrna

More information

On construction of constrained optimum designs

On construction of constrained optimum designs On construction of constrained optimum designs Institute of Control and Computation Engineering University of Zielona Góra, Poland DEMA2008, Cambridge, 15 August 2008 Numerical algorithms to construct

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Introduction to clustering methods for gene expression data analysis

Introduction to clustering methods for gene expression data analysis Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples

Experimental Design. Experimental design. Outline. Choice of platform Array design. Target samples Experimental Design Credit for some of today s materials: Jean Yang, Terry Speed, and Christina Kendziorski Experimental design Choice of platform rray design Creation of probes Location on the array Controls

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org

More information

Design of microarray experiments

Design of microarray experiments Design of microarray experiments Ulrich ansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis September Heidelberg Heidelberg, September otivation The lab biologist and theoretician need

More information

Design and Analysis of Gene Expression Experiments

Design and Analysis of Gene Expression Experiments Design and Analysis of Gene Expression Experiments Guilherme J. M. Rosa Department of Animal Sciences Department of Biostatistics & Medical Informatics University of Wisconsin - Madison OUTLINE Æ Linear

More information

Introduction to clustering methods for gene expression data analysis

Introduction to clustering methods for gene expression data analysis Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

GS Analysis of Microarray Data

GS Analysis of Microarray Data GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org

More information

Cybergenetics: Control theory for living cells

Cybergenetics: Control theory for living cells Department of Biosystems Science and Engineering, ETH-Zürich Cybergenetics: Control theory for living cells Corentin Briat Joint work with Ankit Gupta and Mustafa Khammash Introduction Overview Cybergenetics:

More information

Fuzzy Clustering of Gene Expression Data

Fuzzy Clustering of Gene Expression Data Fuzzy Clustering of Gene Data Matthias E. Futschik and Nikola K. Kasabov Department of Information Science, University of Otago P.O. Box 56, Dunedin, New Zealand email: mfutschik@infoscience.otago.ac.nz,

More information

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Bi 1x Spring 2014: LacI Titration

Bi 1x Spring 2014: LacI Titration Bi 1x Spring 2014: LacI Titration 1 Overview In this experiment, you will measure the effect of various mutated LacI repressor ribosome binding sites in an E. coli cell by measuring the expression of a

More information

Mixture models for analysing transcriptome and ChIP-chip data

Mixture models for analysing transcriptome and ChIP-chip data Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,

More information

DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS

DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS Sankhyā : The Indian Journal of Statistics Special issue in memory of D. Basu 2002, Volume 64, Series A, Pt. 3, pp 706-720 DIRECT VERSUS INDIRECT DESIGNS FOR edna MICROARRAY EXPERIMENTS By TERENCE P. SPEED

More information

Lecture 7: Simple genetic circuits I

Lecture 7: Simple genetic circuits I Lecture 7: Simple genetic circuits I Paul C Bressloff (Fall 2018) 7.1 Transcription and translation In Fig. 20 we show the two main stages in the expression of a single gene according to the central dogma.

More information

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Principal component analysis (PCA) for clustering gene expression data

Principal component analysis (PCA) for clustering gene expression data Principal component analysis (PCA) for clustering gene expression data Ka Yee Yeung Walter L. Ruzzo Bioinformatics, v17 #9 (2001) pp 763-774 1 Outline of talk Background and motivation Design of our empirical

More information

TruSight Cancer Workflow on the MiniSeq System

TruSight Cancer Workflow on the MiniSeq System TruSight Cancer Workflow on the MiniSeq System Prepare Library Sequence Analyze Data TruSight Cancer 1.5 days ~ 24 hours < 2 hours TruSight Cancer Library Prep MiniSeq System Local Run Manager Enrichment

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

13.4 Gene Regulation and Expression

13.4 Gene Regulation and Expression 13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

A Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data

A Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data A Semi-Supervised Dimensionality Reduction Method to Reduce Batch Effects in Genomic Data Anusha Murali Mentor: Dr. Mahmoud Ghandi MIT PRIMES Computer Science Conference 2018 October 13, 2018 Anusha Murali

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information

robustness: revisting the significance of mirna-mediated regulation

robustness: revisting the significance of mirna-mediated regulation : revisting the significance of mirna-mediated regulation Hervé Seitz IGH (CNRS), Montpellier, France October 13, 2012 microrna target identification .. microrna target identification mir: target: 2 7

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

Olympic B3 Summer Science Camp 2015 Lab 0

Olympic B3 Summer Science Camp 2015 Lab 0 Using Lab Stock Solutions interpretation and calculations Introduction: In molecular biology you generally start with a specific set of general instructions, called a Protocol. You can think of it as a

More information

Computational statistics

Computational statistics Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f

More information

Chapter 8: Introduction to Evolutionary Computation

Chapter 8: Introduction to Evolutionary Computation Computational Intelligence: Second Edition Contents Some Theories about Evolution Evolution is an optimization process: the aim is to improve the ability of an organism to survive in dynamically changing

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

mrna Isolation Kit for Blood/Bone Marrow For isolation mrna from blood or bone marrow lysates Cat. No

mrna Isolation Kit for Blood/Bone Marrow For isolation mrna from blood or bone marrow lysates Cat. No For isolation mrna from blood or bone marrow lysates Cat. No. 1 934 333 Principle Starting material Application Time required Results Key advantages The purification of mrna requires two steps: 1. Cells

More information

2015 FALL FINAL REVIEW

2015 FALL FINAL REVIEW 2015 FALL FINAL REVIEW Biomolecules & Enzymes Illustrate table and fill in parts missing 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between

More information

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation

Two-Color Microarray Experimental Design Notation. Simple Examples of Analysis for a Single Gene. Microarray Experimental Design Notation Simple Examples of Analysis for a Single Gene wo-olor Microarray Experimental Design Notation /3/0 opyright 0 Dan Nettleton Microarray Experimental Design Notation Microarray Experimental Design Notation

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Cyproheptadine ELISA Test Kit

Cyproheptadine ELISA Test Kit Cyproheptadine ELISA Test Kit Catalog No. LSY-10041 1. Principal This test kit is based on the competitive enzyme immunoassay for the detection of Cyproheptadine in urine, tissue sample. The coupling antigens

More information

Data Sheet. Azide Cy5 RNA T7 Transcription Kit

Data Sheet. Azide Cy5 RNA T7 Transcription Kit Cat. No. Size 1. Description PP-501-Cy5 10 reactions à 40 µl For in vitro use only Quality guaranteed for 12 months Store all components at -20 C. Avoid freeze and thaw cycles. DBCO-Sulfo-Cy5 must be stored

More information

Data Preprocessing. Data Preprocessing

Data Preprocessing. Data Preprocessing Data Preprocessing 1 Data Preprocessing Normalization: the process of removing sampleto-sample variations in the measurements not due to differential gene expression. Bringing measurements from the different

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

RNA Transport. R preps R preps

RNA Transport. R preps R preps RNA Transport R0527-00 5 preps R0527-01 50 preps July 2014 RNA Transport Table of Contents Introduction...2 Kit Contents/Storage and Stability...3 Protocol...4 Storage Procedure...4 Recovery Procedure...5

More information

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever. CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio Review Autumn 2004 Larry Ruzzo Related Courses He who asks is a fool for five minutes, but he who does not ask remains

More information

The Blessing and the Curse

The Blessing and the Curse The Blessing and the Curse of the Multiplicative Updates Manfred K. Warmuth University of California, Santa Cruz CMPS 272, Feb 31, 2012 Thanks to David Ilstrup and Anindya Sen for helping with the slides

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information

Network Biology-part II

Network Biology-part II Network Biology-part II Jun Zhu, Ph. D. Professor of Genomics and Genetic Sciences Icahn Institute of Genomics and Multi-scale Biology The Tisch Cancer Institute Icahn Medical School at Mount Sinai New

More information

Mag-Bind Soil DNA Kit. M preps M preps M preps

Mag-Bind Soil DNA Kit. M preps M preps M preps Mag-Bind Soil DNA Kit M5635-00 5 preps M5635-01 50 preps M5635-02 200 preps January 2013 Mag-Bind Soil DNA Kit Table of Contents Introduction and Overview...2 Kit Contents/Storage and Stability...3 Preparing

More information

Linear Models and Empirical Bayes Methods for Microarrays

Linear Models and Empirical Bayes Methods for Microarrays Methods for Microarrays by Gordon Smyth Alex Sánchez and Carme Ruíz de Villa Department d Estadística Universitat de Barcelona 16-12-2004 Outline 1 An introductory example Paper overview 2 3 Lönnsted and

More information

CSE 554 Lecture 7: Alignment

CSE 554 Lecture 7: Alignment CSE 554 Lecture 7: Alignment Fall 2012 CSE554 Alignment Slide 1 Review Fairing (smoothing) Relocating vertices to achieve a smoother appearance Method: centroid averaging Simplification Reducing vertex

More information

Bioinformatics. Transcriptome

Bioinformatics. Transcriptome Bioinformatics Transcriptome Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/ Bioinformatics

More information

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

CORE MOLIT ACTIVITIES at a glance

CORE MOLIT ACTIVITIES at a glance CORE MOLIT ACTIVITIES at a glance 1. Amplification of Biochemical Signals: The ELISA Test http://molit.concord.org/database/activities/248.html The shape of molecules affects the way they function. A test

More information

Single gene analysis of differential expression. Giorgio Valentini

Single gene analysis of differential expression. Giorgio Valentini Single gene analysis of differential expression Giorgio Valentini valenti@disi.unige.it Comparing two conditions Each condition may be represented by one or more RNA samples. Using cdna microarrays, samples

More information

Discovering Correlation in Data. Vinh Nguyen Research Fellow in Data Science Computing and Information Systems DMD 7.

Discovering Correlation in Data. Vinh Nguyen Research Fellow in Data Science Computing and Information Systems DMD 7. Discovering Correlation in Data Vinh Nguyen (vinh.nguyen@unimelb.edu.au) Research Fellow in Data Science Computing and Information Systems DMD 7.14 Discovering Correlation Why is correlation important?

More information

Midterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.

Midterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so. CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic

More information

Identifying Bio-markers for EcoArray

Identifying Bio-markers for EcoArray Identifying Bio-markers for EcoArray Ashish Bhan, Keck Graduate Institute Mustafa Kesir and Mikhail B. Malioutov, Northeastern University February 18, 2010 1 Introduction This problem was presented by

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information