CS284A: Representations and Algorithms in Molecular Biology
Scribe Notes on Lectures 3 & 4: Motif Discovery via Enumeration & Motif Representation Using Position Weight Matrix

Joshua Gervin

Based on presentations by Professor Xiaohui Xie on January 14 & 16, 2008

I. Motif Discovery via Enumeration

A. A Model for Motif Discovery (Review from Lecture 2)

We want to identify biologically significant motifs in a set $S$ of $n$ sequences, $s_1, s_2, \dots, s_n$. Each potentially significant motif $m_i$ of length $w$ is associated with a summation variable $k_i$, which is the total number of sequences from $S$ in which the motif appears.

To systematically measure this significance, we must first find the underlying probability $p$ that any sequence of length $l$ contains a given theoretical motif of length $w$. With the overriding assumption that the four bases are uniformly distributed, or

$$\bigl(P(A), P(C), P(G), P(T)\bigr) = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right),$$

we have calculated a value for $p$ of

$$p = \frac{l - w + 1}{4^{w}}.$$

(Strictly, this is the expected number of occurrences of the motif in the sequence; it approximates the probability well when it is much smaller than 1.) We use $p$ as the probability of success for finding this theoretical motif each time we sample a sequence from set $S$. For $k$ out of $n$ trials, the probability of success is binomial,

$$P(k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad \text{where } \binom{n}{k} = \frac{n!}{k!\,(n - k)!},$$

as a motif either is in a sequence or is not.

To test the significance of our specific motif $m_i$, we evaluate a p-value, or the probability, based on our distribution, that $m_i$ would appear in at least $k_i$ sequences:
$$\sum_{k = k_i}^{n} P(k) = \sum_{k = k_i}^{n} \binom{n}{k} p^{k} (1 - p)^{n - k}$$

If the p-value is smaller than a chosen significance level,[1] we can say with some confidence that our motif $m_i$ is biologically significant. For large $n$ the binomial distribution is approximated by a normal distribution, and we can map $k_i$ to a new distribution and compute the z-score to determine the significance of our motif $m_i$.

B. Problems with this Model

1. The assumption that the four bases are uniformly distributed in the sequences is not necessarily correct. To be more accurate, we would need to model the first-order statistics (i.e., $P(A)$, $P(C)$, $P(G)$, and $P(T)$) of the nucleotide distribution.
2. The model ignores second-order statistics. Two bases might be more likely paired together than distributed at random (e.g., $P(GA) \neq P(G)P(A)$). The same could also be said for higher-order statistics.

C. Control Sequences

In order not to rely on the assumption of uniform distribution of bases to measure significance, we can generate a set of $N$ control sequences, $s^o_1, s^o_2, \dots, s^o_N$. The assumption is that our motif of interest $m_i$ is not significant in the control sequences.

Now we have two sets of sequences. Each $m_i$ is associated with two values $k_i$ and $k^o_i$, which correspond to the number of different sequences in which this motif appears in the sets $S$ and $S^o$, respectively. To find out if our motif $m_i$ is biologically significant, we choose the appropriate probability distribution for successfully finding a motif in $k$ out of $n$ trials. There are two types to choose from:

1. The binomial distribution

If the set $S$ is independent of $S^o$, we can still model the probability of success $P(k)$ on finding a motif in $k$ out of $n$ trials using the binomial distribution.

If $S \subset S^o$ (i.e., the set $S$ is a subset of $S^o$), choosing the appropriate distribution now depends on the size of both sets and the distribution of our motif $m_i$ in them. If the number $N$ of $S^o$ sequences and the number $k^o_i$ of sequences containing our motif are large compared to the number $n$ of $S$ sequences, then the probability $p$ of randomly picking a sequence with our motif remains
essentially unchanged over the $n$ trials, and we could still model the probability $P(k)$
using the binomial distribution.[2]

For these scenarios the only change we need to make from the model in Part A is to adopt a different underlying probability $p$ of success for finding a motif every time we sample a sequence. For $p$ we will use the relative frequency $k^o_i / N$ with which our motif $m_i$ is found in the set $S^o$. This way, when we run $n$ trials, we can compare the distributions from both $S$ and $S^o$ to see if our motif indeed stands out in $S$. The probability of success on $k$ out of $n$ trials may be written as

$$P(k) = \binom{n}{k} \left(\frac{k^o_i}{N}\right)^{k} \left(1 - \frac{k^o_i}{N}\right)^{n - k}.$$

To test the significance of our motif, we calculate the p-value in the same fashion as we did before:

$$\sum_{k = k_i}^{n} P(k).$$

For large $n$ we can again map $k_i$ to a normal distribution with mean $np$ and variance $np(1 - p)$ and compute the z-score.

2. The hypergeometric distribution

If $S \subset S^o$ and if either $N$ or $k^o_i$ is not large compared to $n$ for a given $m_i$, the sequence of $n$ trials is analogous to sampling without replacement. The probability $p$ of randomly picking a sequence with our motif changes significantly over the $n$ trials. Hence, we cannot use the binomial distribution, which assumes the same $p$ for all $n$ trials. The appropriate distribution is hypergeometric, where the probability of success on finding a motif in $k$ out of $n$ trials is

$$P(k) = \frac{\dbinom{K}{k} \dbinom{N - K}{n - k}}{\dbinom{N}{n}},$$

where $\binom{K}{k}$ is the number of ways of choosing $k$ sequences with a motif from the total number $K$ of sequences with that motif, $\binom{N - K}{n - k}$ is the number of ways of choosing $n - k$ sequences without the motif from the total number $N - K$ of sequences without the motif, and $\binom{N}{n}$ is the
number of ways of choosing $n$ sequences from the total number $N$ of sequences. While using this distribution to test the significance of our particular motif $m_i$, we assign $k^o_i$ to the value $K$. As before, we calculate the p-value using the summation $\sum_{k = k_i}^{n} P(k)$. We do not compute a z-score here, since the normal approximation to the binomial used above does not carry over to the hypergeometric case.

II. Representation of a Motif Using a Position Weight Matrix

A. What is a Position Weight Matrix?

Motifs are hardly ever represented accurately by a unique consecutive sequence of A's, C's, G's and T's. Instead, we create a position weight matrix (PWM) to represent the frequencies of each base at each position in the motif:

[Table: example position weight matrix with rows G, A, T, C; the values were lost in extraction]

Sometimes a position weight matrix is represented by a sequence logo, where the height of the letters representing the nucleotides correlates with the frequency with which that base is found in different sequences containing the motif:

[Figure: sequence logo for the example motif]

In the example above, position 1 is said to be degenerate; there is no single nucleotide that represents the motif here. On the other hand, position 3 is said to be stringent because the motif is well represented by adenine.

B. Mathematical Representation of a Position Weight Matrix

The position weight matrix for a motif of width $w$ can be expressed as
$$\theta = \begin{pmatrix} \theta_{11} & \theta_{21} & \cdots & \theta_{w1} \\ \theta_{12} & \theta_{22} & \cdots & \theta_{w2} \\ \theta_{13} & \theta_{23} & \cdots & \theta_{w3} \\ \theta_{14} & \theta_{24} & \cdots & \theta_{w4} \end{pmatrix},$$

where each row $j$ represents A, C, G, or T, and each column $i$ represents one position of the motif and is normalized:

$$\sum_{j = 1}^{4} \theta_{ij} = 1 \quad \text{for all } i = 1, 2, \dots, w.$$

For example, $\theta_{23}$ is the relative frequency with which guanine is found in position 2 of the motif.

C. Likelihood of a Sequence

If all the relative frequencies $\theta_{ij}$ are given for the position weight matrix $\theta$, we can measure the probability of generating a sequence $S = (s_1, s_2, \dots, s_w)$. This is also known as the likelihood $L(\theta)$ of the sequence. For example, we can use a position weight matrix of width $w = 3$ to calculate the likelihood of the sequence GGG. It is simply the product of three relative frequencies $\theta_{13}$, $\theta_{23}$, and $\theta_{33}$. Generalizing this, we find the likelihood of a sequence $S = (s_1, s_2, \dots, s_w)$ given $\theta$ is

$$L(\theta) = P(S \mid \theta) = \prod_{i = 1}^{w} \prod_{j = 1}^{4} \theta_{ij}^{\,I(s_i = j)}, \qquad \text{where } I(s_i = j) = \begin{cases} 1 & \text{if } s_i = j \\ 0 & \text{if not.} \end{cases}$$

Let us briefly go over a few syntax elements. First of all, the expression $P(S \mid \theta)$ represents a conditional probability: we are asking, "What is the likelihood of sequence $S$ given the condition that the position weight matrix is $\theta$?" Secondly, the $\prod$ (i.e., capital pi) notation means we take the product of the associated terms. Finally, for convenience we converted the alphabetical string (A, C, G, T) into a numerical one (1, 2, 3, 4). These numbers are represented by the variable $j$ in the above expression.

Other ways of expressing the likelihood $L(\theta)$ are
$$L(\theta) = P(S \mid \theta) = \prod_{i = 1}^{w} P(s_i \mid \theta_i) = \prod_{i = 1}^{w} \theta_{i, s_i}.$$

The conditional probability $P(s_i \mid \theta_i)$ is the probability of generating a nucleotide element $s_i$ given its relative frequency $\theta_i$.

We can expand this idea further and measure the likelihood for a set of $n$ sequences $S_1, S_2, \dots, S_n$ given $\theta$. Since we are assuming each sequence $S_k$ is generated independently from $\theta$, this probability is simply the product of the relative frequencies $\theta_{i, s_{ki}}$ representing each nucleotide element $s_{ki}$:

$$L(\theta) = P(S_1, \dots, S_n \mid \theta) = \prod_{k = 1}^{n} P(S_k \mid \theta) = \prod_{k = 1}^{n} \prod_{i = 1}^{w} \theta_{i, s_{ki}}.$$

Note that the syntax $P(S_1, S_2, \dots, S_n \mid \theta)$ represents a joint probability (the probability of generating sequences $S_1, S_2, \dots, S_n$) as well as a conditional probability (the probability given $\theta$).

D. Using Maximum Likelihood to Estimate the Position Weight Matrix θ

Often we want to construct a position weight matrix $\theta$ of length $w$ from observed sequence data. For a set of $n$ sequences $S_1, S_2, \dots, S_n$ represented by the same $\theta$, our strategy is to maximize the likelihood $L(\theta)$ over all possible values of $\theta_{ij}$. This could be done by setting the partial derivative $\partial L(\theta) / \partial \theta_{ij}$ equal to zero and solving for $\theta_{ij}$; however, it is much easier to take the partial derivative of the log-likelihood function (i.e., the logarithm of the likelihood) and set it to zero,

$$\frac{\partial \log L(\theta)}{\partial \theta_{ij}} = 0,$$

because the product associated with the likelihood $L(\theta)$ turns into a sum. Note that there are only $3w$ and not $4w$ parameters for which we need to solve, since if we figure out $\theta_{i1}$, $\theta_{i2}$, and $\theta_{i3}$, we can use the relation $\sum_{j = 1}^{4} \theta_{ij} = 1$ to give us $\theta_{i4}$.
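The maximization described above can be carried through in a few lines. The following is a sketch rather than the lecture's own derivation: the count symbol $n_{ij}$ (the number of sequences with base $j$ at position $i$) and the constant $c_i$ are introduced here for illustration, and every count is assumed to be positive so that the stationary point lies in the interior.

```latex
\begin{align*}
\log L(\theta)
  &= \sum_{k=1}^{n} \sum_{i=1}^{w} \log \theta_{i,s_{ki}}
   = \sum_{i=1}^{w} \sum_{j=1}^{4} n_{ij} \log \theta_{ij},
  \qquad n_{ij} = \sum_{k=1}^{n} I(s_{ki} = j) \\
\intertext{Substituting $\theta_{i4} = 1 - \theta_{i1} - \theta_{i2} - \theta_{i3}$ and differentiating for $j = 1, 2, 3$:}
\frac{\partial \log L(\theta)}{\partial \theta_{ij}}
  &= \frac{n_{ij}}{\theta_{ij}} - \frac{n_{i4}}{\theta_{i4}} = 0
  \quad\Longrightarrow\quad
  \frac{n_{ij}}{\theta_{ij}} = \frac{n_{i4}}{\theta_{i4}} = c_i
  \quad \text{for } j = 1, \dots, 4 \\
1 = \sum_{j=1}^{4} \theta_{ij}
  &= \frac{1}{c_i} \sum_{j=1}^{4} n_{ij} = \frac{n}{c_i}
  \quad\Longrightarrow\quad
  c_i = n, \qquad \theta_{ij} = \frac{n_{ij}}{n}
\end{align*}
```

Because the log-likelihood separates into one sum per position $i$, each column of the matrix can be solved on its own, which is why the result depends only on the counts in that column.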
Using this method on a set of $n$ sequences $S_1, S_2, \dots, S_n$, all with the same $\theta$, we can derive an expression for the relative frequency

$$\theta_{ij} = \frac{n_{ij}}{n},$$

which is simply the absolute frequency $n_{ij}$ of each nucleotide $j$ for every column $i$, divided by the total number of sequences.

Often it is much harder to solve for the position weight matrix $\theta$. It is quite likely within a set of given sequences $S_1, S_2, \dots, S_n$ that only some sequences contain the motif, and thus only this subset can generate the weight matrix $\theta$. The problem is that we do not know which sequences form this subset. Let us assume the rest of the non-motif (also called background) sequences form a subset generated from a single distribution (i.e., from a second position weight matrix $\theta^o$ made up of identical columns of $p^o = (p^o_A, p^o_C, p^o_G, p^o_T) = (p^o_1, p^o_2, p^o_3, p^o_4)$). The likelihood $L(\theta, \theta^o)$ for this set of sequences $S_1, S_2, \dots, S_n$ is now

$$L(\theta, \theta^o) = P(S_1, \dots, S_n \mid z, \theta, \theta^o) = \prod_{k = 1}^{n} \bigl[ z_k P(S_k \mid \theta) + (1 - z_k) P(S_k \mid \theta^o) \bigr],$$

$$\text{where } z_k = \begin{cases} 1 & \text{if } S_k \text{ is generated by } \theta \\ 0 & \text{if } S_k \text{ is generated by } \theta^o. \end{cases}$$

The problem of not knowing if a sequence $S_k$ belongs to the motif ($\theta$) or the background model ($\theta^o$) can now be expressed mathematically as not knowing which value, 0 or 1, to use for the binary variable $z_k$ associated with each $S_k$. Fortunately, we can remove $z$ from the equation by integrating (summing) the likelihood $L(\theta, \theta^o)$ over all possible events $z$:[3]

$$P(S_1, \dots, S_n \mid \theta, \theta^o) = \sum_{z} P(S_1, \dots, S_n \mid z, \theta, \theta^o)\, P(z).$$

After integration, we are left with

$$L(\theta, \theta^o) = P(S_1, \dots, S_n \mid \theta, \theta^o) = \prod_{k = 1}^{n} \bigl[ P(z_k = 1) P(S_k \mid \theta) + \bigl(1 - P(z_k = 1)\bigr) P(S_k \mid \theta^o) \bigr].$$

We may be fortunate to know the probability $P(z_k = 1)$ for the set of sequences $S_1, S_2, \dots, S_n$. Representing this probability as the constant $\alpha$, the likelihood of the set may now be written as
$$L(\theta, \theta^o) = P(S_1, \dots, S_n \mid \theta, \theta^o) = \prod_{k = 1}^{n} \bigl[ \alpha P(S_k \mid \theta) + (1 - \alpha) P(S_k \mid \theta^o) \bigr].$$

Having successfully expressed the likelihood as a function of $3w$ independent variables $\theta_{i, s_{ki}}$ and 3 independent variables $\theta^o_{i, s_{ki}}$, we can now use our strategy of solving for $\theta_{i, s_{ki}}$ and $\theta^o_{i, s_{ki}}$ when the likelihood is at a maximum. However, setting the partial derivatives of the log-likelihood function equal to zero is too difficult a task, because the likelihood $L(\theta, \theta^o)$ in this case is no longer simply a product of the independent variables. We will implement the EM Algorithm next lecture to solve this maximum likelihood estimation problem.

[1] Wikipedia, "P-value".
[2] The relative frequency $k^o_i / N$ with which the motif is found in the set $S^o$ must also not be close to 0 or 1.
[3] In general we can calculate a marginal probability from a conditional or joint probability by removing one of the variables using integration: $P(X) = \sum_{Y} P(X, Y) = \sum_{Y} P(X \mid Y) P(Y)$, where we take the sum over all possible events $Y$. From R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis, Cambridge University Press, 2006, p. 6.
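The mixture likelihood from Part D, together with the single-sequence likelihood from Part C, can be evaluated numerically. The following is a minimal Python sketch, not code from the lecture: the function names, the example matrix `theta`, the uniform background `p0`, the sequences, and the value of `alpha` are all made up for illustration. The matrix is stored as `theta[i][j]` (position `i`, base `j`), which follows the subscript order $\theta_{ij}$ rather than the row layout of the displayed matrix.

```python
import math

BASES = "ACGT"  # bases j = 1..4 in the notes map to indices 0..3 here


def pwm_likelihood(theta, seq):
    """P(S | theta): product over positions i of theta[i][s_i]."""
    prob = 1.0
    for i, base in enumerate(seq):
        prob *= theta[i][BASES.index(base)]
    return prob


def background_likelihood(p0, seq):
    """P(S | theta^o): every position shares the same base distribution p0."""
    prob = 1.0
    for base in seq:
        prob *= p0[BASES.index(base)]
    return prob


def mixture_log_likelihood(seqs, theta, p0, alpha):
    """Sum over k of log[alpha * P(S_k | theta) + (1 - alpha) * P(S_k | theta^o)]."""
    total = 0.0
    for seq in seqs:
        motif_term = alpha * pwm_likelihood(theta, seq)
        background_term = (1.0 - alpha) * background_likelihood(p0, seq)
        total += math.log(motif_term + background_term)
    return total


# Hypothetical width-3 PWM; each row is a distribution over A, C, G, T.
theta = [
    [0.25, 0.25, 0.25, 0.25],  # position 1: degenerate
    [0.10, 0.10, 0.70, 0.10],  # position 2: mostly G
    [0.70, 0.10, 0.10, 0.10],  # position 3: mostly A
]
p0 = [0.25, 0.25, 0.25, 0.25]  # uniform background (an assumption)
seqs = ["CGA", "TGA", "ACT"]   # made-up sequences of width 3
alpha = 0.5                    # assumed known value of P(z_k = 1)

print(pwm_likelihood(theta, "GGG"))  # theta_13 * theta_23 * theta_33
print(mixture_log_likelihood(seqs, theta, p0, alpha))
```

Working with the log of the product keeps the value in a representable range for larger sets of sequences. The sketch only evaluates the likelihood for fixed parameters; finding the maximizing $\theta$ and $\theta^o$ is the estimation problem the notes defer to the EM Algorithm.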
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationProblems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:
Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio
More informationDiscrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19
CS 70 Discrete Mathematics ad Probability Theory Sprig 2016 Rao ad Walrad Note 19 Some Importat Distributios Recall our basic probabilistic experimet of tossig a biased coi times. This is a very simple
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationKurskod: TAMS11 Provkod: TENB 21 March 2015, 14:00-18:00. English Version (no Swedish Version)
Kurskod: TAMS Provkod: TENB 2 March 205, 4:00-8:00 Examier: Xiagfeg Yag (Tel: 070 2234765). Please aswer i ENGLISH if you ca. a. You are allowed to use: a calculator; formel -och tabellsamlig i matematisk
More informationConfidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.
MATH1005 Statistics Lecture 24 M. Stewart School of Mathematics ad Statistics Uiversity of Sydey Outlie Cofidece itervals summary Coservative ad approximate cofidece itervals for a biomial p The aïve iterval
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationUC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170
UC Berkeley CS 170: Efficiet Algorithms ad Itractable Problems Hadout 17 Lecturer: David Wager April 3, 2003 Notes 17 for CS 170 1 The Lempel-Ziv algorithm There is a sese i which the Huffma codig was
More informationThis is an introductory course in Analysis of Variance and Design of Experiments.
1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationEconomics Spring 2015
1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures
More informationBig Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.
5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece
More informationProbability theory and mathematical statistics:
N.I. Lobachevsky State Uiversity of Nizhi Novgorod Probability theory ad mathematical statistics: Law of Total Probability. Associate Professor A.V. Zorie Law of Total Probability. 1 / 14 Theorem Let H
More information1 Generating functions for balls in boxes
Math 566 Fall 05 Some otes o geeratig fuctios Give a sequece a 0, a, a,..., a,..., a geeratig fuctio some way of represetig the sequece as a fuctio. There are may ways to do this, with the most commo ways
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/
More informationConfidence Intervals for the Population Proportion p
Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More information(7 One- and Two-Sample Estimation Problem )
34 Stat Lecture Notes (7 Oe- ad Two-Sample Estimatio Problem ) ( Book*: Chapter 8,pg65) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye Estimatio 1 ) ( ˆ S P i i Poit estimate:
More informationApril 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE
April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece
More information