(134/ !"#!$%&%' ()*)$+,-.)//01*$2*345/0/ M<39$N1$;<)5$")3/B.)O ;).C0*141=5 " /7$83.9$:
|
|
- Alyson Parrish
- 5 years ago
- Views:
Transcription
1 (134/!"#!$%&%' ()*)$+,-)//01*$2*345/0/ "061335/7$839$: %&'()*&%&+,($'*&$%'-& D1C340E3?1*$F91$/93?/?634$34=109<C/H "*%'/01&$23$'(()%45"+$"+$6&+,&*$"7$-0(,*02)5"+ 8"6'/$4"/3+"%0'/$*&9*&((0"+ "*%'/01&$23$'(()%45"+$"+$&+5*&$-0(,*02)5"+ :)'+5/&$+"*%'/01'5"+ I ;)C0*141=5 M<39$N1$;<)5$")3/B)O " $3357$1$6<0- J5K00E3?1* 81K)/ L)39B)/ ;3=)9 #63**) :*9)*/0?)/ "061335/ ()*)$+,-)//01*$235/ +,1*$235/
2 ;3*/60-?1* P)Q)/)$93*/60-?1*!41*)$6ND2$/93*/7$61C-4)C)*935$91$9<)$CPD2 ND2 CPD2 T T C C T C C T T PD2$ -145C)3/) U U C C L1C$ND2$91$CPD2 CPD2 6ND2 U U C C U C P)Q)/)$ 93*/60-93/) T T C T T C T T C C T T C C T T C C T T T C T T DB64)06$260 N)*39B3?1* 2$;$($!$!$($;$;$($!$2 2$;$($!$!$($;$;$($!$2 ;$2$!$($($!$2$2$!$($; ;$2$!$($($!$2$2$!$($;
3 J5K00E3?1* 2$;$($!$!$($;$;$($!$2 J5K00E3?1* 2$;$($!$!$($;$;$($!$2 ;$2$!$($($!$2$2$!$($;!$!$!$;$2$;$!$($!$2$;$ 2$!$!$;$;$2$!$($!$;$2$ ;$2$!$($($!$2$2$!$($;!$!$!$;$2$;$!$($!$2$;$ 2$!$!$;$;$2$!$($!$;$2$!"#$%&%'()*+!"#$%&%'()*+ ;3=)9 CTT CT R3K)4)$;3=)9 CTT CT L)39B)/$1$81K)/ TCT CT
4 843S1C/$9<39$1C0*39)$C3T)9 2U5C)90,$F<0=<$)*/0957$1*)$6141H 2=04)*9$F6064)/$1*$=07$1*)$1$91$6141H :44BC0*3$F<0=<$)*/0957$1*)$1$91$6141H D0CK4)=)*$F<0=<$)*/0957$1*)$1$91$6141H J0=<$)*/0957$1*)$6141 2U5C)90, :44BC0*3$B/)/$K)3/$0*/9)3$1>$0*V/09B$/)WB)*60*= #)WB)*6)$FJ0=<$)*/095H J5K00E3?1*$1>$CPD2$91$-1K)/ XY
5 2U5C)90,$()*)!<0- [ $235/ * * * * * ')>1)$R3K)40*= Sample 1 Sample 2 XZ rray 1 rray 2 ')>1)$J5K00E3?1*@$\*)$!<3**)4 Sample 1 Sample 2 2])$J5K00E3?1* rray 1 rray 2 rray 1 rray 2
6 #63**)$:C3=) ^B3*?_63?1* rray 1 rray 2 rray 1 rray 2 "061335$:C3=) ;1$61417$=0$1*$6064)
7 Sample 1 Sample 2 ')>1)$J5K00E3?1* Sample 1 Sample 2 rray 1 rray 1 2])$J5K00E3?1* #63**)$:C3=) rray 1 rray 1
8 ^B3*?_63?1* "061335$:C3=) 4,0 2,4 0,0 3,3 rray 1 Measurements Samples (individuals) Probes (genes) N article MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia Scott rmstrong 1 4, Jane E Staunton 5, Lewis B Silverman 1,3,4, Rob Pieters 6, Monique L den Boer 6, Mark D Minden 7, Stephen E Sallan 1,3,4, Eric S Lander 5, Todd R olub 1,3,4,5 * & Stanley J Korsmeyer 2,4,8 * *These authors contributed equally to this work Published online: 3 December 2001, DOI: /ng765 DT MTRIX
9 Common Types of Data nalysis Microarray Data Class Discovery (Clustering, Unsupervised learning) Class Prediction (Classification, Supervised Learning) Class Comparison (Differential Expression) Why log? Why logs? For better of worse, fold changes are the preferred quantification of differential expression Fold changes are basically ratios Ratios are not symmetric around 1 This makes it problematic to perform statistical operations with ratios We prefer logs
10 Why logs Scatter Plot The intensity distribution has a fat right tail Log of ratios are symmetric around 0: verage of 1/10 and 10 is about 5 verage of log(1/10) and log(10) is 0 veraging ratios is almost always a bad idea! Facts you must remember: log(1) = 0 log(xy) = log(x) + log(y) log(y/x) = log(y) - log(x) log( X) = 1/2 log(x) 45 rotation highlights a problem Experiments with replicates If we are interested in genes with over-all large fold changes why not look at average (log) fold changes? Experience has shown that one usually wants to stratify by over-all expression We can make averaged M plots: M = difference in average log intensities and = average of log intensities This is referred to as Mplot
11 M plot of average log ratios Scatter Smooth This particular problem addressed by RM (see last slide for ref) Normalization Normalization is needed to ensure that differences in intensities are indeed due to differential expression, and not some printing, hybridization, or scanning artifact D1C340E0*=$6)*9)$1>$0/90KB?1* F41634$-145*1C034$)=)//01*H Normalization is necessary before any analysis which involves within or between slides comparisons of intensities, eg, clustering, classification, testing Somewhat different approaches are used in twocolor and one-color technologies ``
12 Most Common Problem Scatter Plot Intensity dependent effect: Different background level most likely culprit Demonstrates importance of M plot Example: Intensity Effect Loess The most common problem is intensity dependent effects Probably due to different background Loess is used to estimate and remove this effect
13
14 Loess LOcal regression Sometimes lowess for LOcally Weighted regression For each interval i Fit linear model using Ni points in interval using least squares min β 0i, β 1i N i j=1 (y ij β 0i β 1i x ij ) 2 Loess Why stop at linear fits? Can use polynomials For each interval i Fit quadratic model using Ni points in interval using least squares min β 0i, β 1i, β 2i N i j=1 (y ij β 0i β 1i x ij β 2i x 2 ij) 2 Loess What if data is irregularly spaced then some intervals will have more data points to estimate the fit One solution: for each value use α% of the data points (adaptive) Similar to k-nearest neighbors There are other more sophisticated solutions
15 Example of Replicate Data Quantile normalization Loess estimates mean of intensity effect and corrects based on that Quantile normalization assumes distributions must match So-called distribution normalization It is fast and conceptually simple Basic idea: order value in each array take average across probes Substitute probe intensity with average Put in original order Example of quantile normalization Original Ordered veraged Re-ordered Preprocessing and Normalization There is a lot more we havenʼt discussed But, this is essential to successful use of microarray data There is a lot of interest in analyzing across experiments Say, multiple tumor/normal comparisons Proper normalization is hard but essential Further reading: frozen RM and the microarray barcode (PMIDS: and ) PUBMED:
Normalization. Example of Replicate Data. Biostatistics Rafael A. Irizarry
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationBiochip informatics-(i)
Biochip informatics-(i) : biochip normalization & differential expression Ju Han Kim, M.D., Ph.D. SNUBI: SNUBiomedical Informatics http://www.snubi snubi.org/ Biochip Informatics - (I) Biochip basics Preprocessing
More informationcdna Microarray Analysis
cdna Microarray Analysis with BioConductor packages Nolwenn Le Meur Copyright 2007 Outline Data acquisition Pre-processing Quality assessment Pre-processing background correction normalization summarization
More informationSeminar Microarray-Datenanalyse
Seminar Microarray- Normalization Hans-Ulrich Klein Christian Ruckert Institut für Medizinische Informatik WWU Münster SS 2011 Organisation 1 09.05.11 Normalisierung 2 10.05.11 Bestimmen diff. expr. Gene,
More informationMicroarray Preprocessing
Microarray Preprocessing Normaliza$on Normaliza$on is needed to ensure that differences in intensi$es are indeed due to differen$al expression, and not some prin$ng, hybridiza$on, or scanning ar$fact.
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationMicroarray Data Analysis: Discovery
Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover
More informationA Case Study -- Chu et al. The Transcriptional Program of Sporulation in Budding Yeast. What is Sporulation? CSE 527, W.L. Ruzzo 1
A Case Study -- Chu et al. An interesting early microarray paper My goals Show arrays used in a real experiment Show where computation is important Start looking at analysis techniques The Transcriptional
More informationLow-Level Analysis of High- Density Oligonucleotide Microarray Data
Low-Level Analysis of High- Density Oligonucleotide Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley UC Berkeley Feb 23, 2004 Outline
More informationChemometrics: Classification of spectra
Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationSPOTTED cdna MICROARRAYS
SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.
More informationChapter 5: Microarray Techniques
Chapter 5: Microarray Techniques 5.2 Analysis of Microarray Data Prof. Yechiam Yemini (YY) Computer Science Department Columbia University Normalization Clustering Overview 2 1 Processing Microarray Data
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationExperimental Design. Experimental design. Outline. Choice of platform Array design. Target samples
Experimental Design Credit for some of today s materials: Jean Yang, Terry Speed, and Christina Kendziorski Experimental design Choice of platform rray design Creation of probes Location on the array Controls
More informationPrediction of double gene knockout measurements
Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationSession 06 (A): Microarray Basic Data Analysis
1 SJTU-Bioinformatics Summer School 2017 Session 06 (A): Microarray Basic Data Analysis Maoying,Wu ricket.woo@gmail.com Dept. of Bioinformatics & Biostatistics Shanghai Jiao Tong University Summer, 2017
More informationData Preprocessing. Data Preprocessing
Data Preprocessing 1 Data Preprocessing Normalization: the process of removing sampleto-sample variations in the measurements not due to differential gene expression. Bringing measurements from the different
More informationCLASSIFIER FUSION FOR POORLY-DIFFERENTIATED TUMOR CLASSIFICATION USING BOTH MESSENGER RNA AND MICRORNA EXPRESSION PROFILES
1 CLASSIFIER FUSION FOR POORLY-DIFFERENTIATED TUMOR CLASSIFICATION USING BOTH MESSENGER RNA AND MICRORNA EXPRESSION PROFILES Yuhang Wang and Margaret H. Dunham Department of Computer Science and Engineering,
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationSimple Linear Regression
Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationSATELLITE REMOTE SENSING
SATELLITE REMOTE SENSING of NATURAL RESOURCES David L. Verbyla LEWIS PUBLISHERS Boca Raton New York London Tokyo Contents CHAPTER 1. SATELLITE IMAGES 1 Raster Image Data 2 Remote Sensing Detectors 2 Analog
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More information22S39: Class Notes / November 14, 2000 back to start 1
Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics
More informationEach new feature uses a pair of the original features. Problem: Mapping usually leads to the number of features blow up!
Feature Mapping Consider the following mapping φ for an example x = {x 1,...,x D } φ : x {x1,x 2 2,...,x 2 D,,x 2 1 x 2,x 1 x 2,...,x 1 x D,...,x D 1 x D } It s an example of a quadratic mapping Each new
More informationUNC Charlotte Super Competition Level 3 Test March 4, 2019 Test with Solutions for Sponsors
. Find the minimum value of the function f (x) x 2 + (A) 6 (B) 3 6 (C) 4 Solution. We have f (x) x 2 + + x 2 + (D) 3 4, which is equivalent to x 0. x 2 + (E) x 2 +, x R. x 2 + 2 (x 2 + ) 2. How many solutions
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 21: Decision Tree / Random Forest / Ensemble Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sections of this course
More informationProbe-Level Analysis of Affymetrix GeneChip Microarray Data
Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley University of Minnesota Mar 30, 2004 Outline
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationSimultaneous variable selection and class fusion for high-dimensional linear discriminant analysis
Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationGaussian with mean ( µ ) and standard deviation ( σ)
Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (
More informationGeneral Least Squares Fitting
Chapter 1 General Least Squares Fitting 1.1 Introduction Previously you have done curve fitting in two dimensions. Now you will learn how to extend that to multiple dimensions. 1.1.1 Non-linear Linearizable
More informationOptimal normalization of DNA-microarray data
Optimal normalization of DNA-microarray data Daniel Faller 1, HD Dr. J. Timmer 1, Dr. H. U. Voss 1, Prof. Dr. Honerkamp 1 and Dr. U. Hobohm 2 1 Freiburg Center for Data Analysis and Modeling 1 F. Hoffman-La
More informationComputational Genomics
Computational Genomics http://www.cs.cmu.edu/~02710 Introduction to probability, statistics and algorithms (brief) intro to probability Basic notations Random variable - referring to an element / event
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationUse of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance
Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Anthea Dokidis Glenda Delenstarr Abstract The performance of the Agilent microarray system can now be evaluated
More informationSupplementary Figure 1: Chemical compound space. Errors depending on the size of the training set for models with T = 1, 2, 3 interaction passes
9 8 7 6 5 4 3 2 1 0 10 3 10 4 10 5 6 5 4 3 2 1 0 10000 25000 50000 100000 Supplementary Figure 1: Chemical compound space. Errors depending on the size of the training set for models with T = 1, 2, 3 interaction
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 3 We are given a set of training examples, consisting of input-output pairs (x,y), where: 1. x is an item of the type we want to evaluate. 2. y is the value of some
More informationModule Based Neural Networks for Modeling Gene Regulatory Networks
Module Based Neural Networks for Modeling Gene Regulatory Networks Paresh Chandra Barman, Std 1 ID: 20044523 Term Project: BiS732 Bio-Network Department of BioSystems, Korea Advanced Institute of Science
More informationGraph-Based Semi-Supervised Learning
Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005 Graph-Based Semi-Supervised Learning Yoshua Bengio, Olivier
More informationIntroduction to clustering methods for gene expression data analysis
Introduction to clustering methods for gene expression data analysis Giorgio Valentini e-mail: valentini@dsi.unimi.it Outline Levels of analysis of DNA microarray data Clustering methods for functional
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org
More informationComparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees
Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2010 Lecture 22: Nearest Neighbors, Kernels 4/18/2011 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!)
More informationIntroduction to Machine Learning
Introduction to Machine Learning CS4731 Dr. Mihail Fall 2017 Slide content based on books by Bishop and Barber. https://www.microsoft.com/en-us/research/people/cmbishop/ http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.homepage
More informationCLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationFace Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi
Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold
More informationActivity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression
Activity #12: More regression topics: LOWESS; polynomial, nonlinear, robust, quantile; ANOVA as regression Scenario: 31 counts (over a 30-second period) were recorded from a Geiger counter at a nuclear
More informationTraining the linear classifier
215, Training the linear classifier A natural way to train the classifier is to minimize the number of classification errors on the training data, i.e. choosing w so that the training error is minimized.
More informationUsing a Hopfield Network: A Nuts and Bolts Approach
Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationLocal regression, intrinsic dimension, and nonparametric sparsity
Local regression, intrinsic dimension, and nonparametric sparsity Samory Kpotufe Toyota Technological Institute - Chicago and Max Planck Institute for Intelligent Systems I. Local regression and (local)
More informationBayes Decision Theory - I
Bayes Decision Theory - I Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statistical Learning from Data Goal: Given a relationship between a feature vector and a vector y, and iid data samples ( i,y i ), find
More informationGENOMIC SIGNAL PROCESSING. Lecture 2. Classification of disease subtype based on microarray data
GENOMIC SIGNAL PROCESSING Lecture 2 Classification of disease subtype based on microarray data 1. Analysis of microarray data (see last 15 slides of Lecture 1) 2. Classification methods for microarray
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationUnsupervised Learning: K-Means, Gaussian Mixture Models
Unsupervised Learning: K-Means, Gaussian Mixture Models These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online.
More informationMicroarray Data Analysis - II. FIOCRUZ Bioinformatics Workshop 6 June, 2001
Microarray Data Analysis - II FIOCRUZ Bioinformatics Workshop 6 June, 2001 Challenges in Microarray Data Analysis Spot Identification and Quantitation. Normalization of data from each experiment. Identification
More informationData Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture 21 K - Nearest Neighbor V In this lecture we discuss; how do we evaluate the
More informationDensity-Based Clustering
Density-Based Clustering idea: Clusters are dense regions in feature space F. density: objects volume ε here: volume: ε-neighborhood for object o w.r.t. distance measure dist(x,y) dense region: ε-neighborhood
More informationExpectation-Maximization for Sparse and Non-Negative PCA
Christian D. Sigg Joachim M. Buhmann Institute of Computational Science, ETH Zurich, 8092 Zurich, Switzerland chrsigg@inf.ethz.ch jbuhmann@inf.ethz.ch Abstract We study the problem of finding the dominant
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Bradley Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org
More informationEconometrics I. Lecture 10: Nonparametric Estimation with Kernels. Paul T. Scott NYU Stern. Fall 2018
Econometrics I Lecture 10: Nonparametric Estimation with Kernels Paul T. Scott NYU Stern Fall 2018 Paul T. Scott NYU Stern Econometrics I Fall 2018 1 / 12 Nonparametric Regression: Intuition Let s get
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/22/2010 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements W7 due tonight [this is your last written for
More informationComparison Figures from the New 22-Year Daily Eddy Dataset (January April 2015)
Comparison Figures from the New 22-Year Daily Eddy Dataset (January 1993 - April 2015) The figures on the following pages were constructed from the new version of the eddy dataset that is available online
More informationPolynomials and Polynomial Equations
Polynomials and Polynomial Equations A Polynomial is any expression that has constants, variables and exponents, and can be combined using addition, subtraction, multiplication and division, but: no division
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning
CS 188: Artificial Intelligence Spring 21 Lecture 22: Nearest Neighbors, Kernels 4/18/211 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!) Remaining
More informationProbe-Level Analysis of Affymetrix GeneChip Microarray Data
Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley Memorial Sloan-Kettering Cancer Center July
More informationMetric-based classifiers. Nuno Vasconcelos UCSD
Metric-based classifiers Nuno Vasconcelos UCSD Statistical learning goal: given a function f. y f and a collection of eample data-points, learn what the function f. is. this is called training. two major
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Section of Bioinformatics Department of Biostatistics and Applied Mathematics UT M. D. Anderson Cancer Center kabagg@mdanderson.org
More informationStability-Based Model Selection
Stability-Based Model Selection Tilman Lange, Mikio L. Braun, Volker Roth, Joachim M. Buhmann (lange,braunm,roth,jb)@cs.uni-bonn.de Institute of Computer Science, Dept. III, University of Bonn Römerstraße
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationStat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,
1 Stat 406: Algorithms for classification and prediction Lecture 1: Introduction Kevin Murphy Mon 7 January, 2008 1 1 Slides last updated on January 7, 2008 Outline 2 Administrivia Some basic definitions.
More informationCopy anything you do not already have memorized into your notes.
Copy anything you do not already have memorized into your notes. A monomial is the product of non-negative integer powers of variables. ONLY ONE TERM. ( mono- means one) No negative exponents which means
More informationDesign of microarray experiments
Design of microarray experiments Ulrich ansmann mansmann@imbi.uni-heidelberg.de Practical microarray analysis September Heidelberg Heidelberg, September otivation The lab biologist and theoretician need
More informationOptimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties
Optimizing Model Development and Validation Procedures of Partial Least Squares for Spectral Based Prediction of Soil Properties Soil Spectroscopy Extracting chemical and physical attributes from spectral
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More information5. Discriminant analysis
5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density
More informationClassification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees
Classification for High Dimensional Problems Using Bayesian Neural Networks and Dirichlet Diffusion Trees Rafdord M. Neal and Jianguo Zhang Presented by Jiwen Li Feb 2, 2006 Outline Bayesian view of feature
More informationClassification using stochastic ensembles
July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics
More information9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients
What our model needs to do regression Usually, we are not just trying to explain observed data We want to uncover meaningful trends And predict future observations Our questions then are Is β" a good estimate
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationInformation Retrieval and Organisation
Information Retrieval and Organisation Chapter 13 Text Classification and Naïve Bayes Dell Zhang Birkbeck, University of London Motivation Relevance Feedback revisited The user marks a number of documents
More informationDesign of Microarray Experiments. Xiangqin Cui
Design of Microarray Experiments Xiangqin Cui Experimental design Experimental design: is a term used about efficient methods for planning the collection of data, in order to obtain the maximum amount
More informationClassical Monte Carlo Simulations
Classical Monte Carlo Simulations Hyejin Ju April 17, 2012 1 Introduction Why do we need numerics? One of the main goals of condensed matter is to compute expectation values O = 1 Z Tr{O e βĥ} (1) and
More informationClassification Algorithms
Classification Algorithms UCSB 290N, 2015. T. Yang Slides based on R. Mooney UT Austin 1 Table of Content roblem Definition Rocchio K-nearest neighbor case based Bayesian algorithm Decision trees 2 Given:
More informationTechnologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA
Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Expression analysis for RNA-seq data Ewa Szczurek Instytut Informatyki Uniwersytet Warszawski 1/35 The problem
More information