Pairwise sequence alignment
- Alison Harmon
- 5 years ago
Lecturer: Marina Alexandersson, 29 October 2003

Distance and similarity

When faced with two related biological sequences, we would like to estimate their evolutionary distance, that is, the distance to their common ancestor.

There is no unique and universal definition of similarity. When comparing regular words, for instance, their sound, spelling and meaning can be combined in different ways:
- hear and here: similar sound and spelling, but totally different meanings
- hear and bear: even more similar spelling, but different sounds and meanings
- hear and listen: different sound and spelling, but very similar meaning

In a similar way, the sequence, structure and function of proteins can be combined in different ways. Fortunately, proteins are a little more regular than the situation described above. The general rule is that sequence determines structure, and structure determines function. So when studying sequence similarity we hope to discover or validate similarity in structure and function. This is often successful, but there are counter-examples to each rule: proteins with very different sequences that fold similarly and perform similar functions, proteins with very similar sequences that fold differently, and proteins with very similar functions but still very different structures.

Here we only study sequence similarity, and a distance measure indicates the degree of similarity between the sequences compared.

Definition. Mathematically, a distance $D$ is a function on a set $S$ such that for all objects $u, v, w \in S$ the following holds:
- symmetric: $D(v, w) = D(w, v)$
- non-negative: $D(v, w) \ge 0$, with $D(v, w) = 0$ only when $v = w$
- triangle inequality: $D(u, w) \le D(u, v) + D(v, w)$
Pairwise alignment

The game in pairwise alignment is to place the two sequences on top of each other and insert spaces (gaps) in various numbers and places in both sequences, so as to obtain the highest number of columns of identical residue pairs.

Alignment algorithms strive to model the mutational process giving rise to the two sequences, starting from a common ancestor. The basic mutational processes are
- substitutions: replace residues
- insertions: add residues
- deletions: remove residues

Insertions and deletions are the reverse operations of one another and are usually called indels for short.

Distance and similarity

A distance measure assigns a weight to each mutation, and the distance between two sequences is the minimal sum of weights for a set of mutations transforming one into the other.

A similarity measure assigns weights according to the resemblance of the two sequences. The similarity between two sequences is the maximal sum of such weights.

In pairwise alignment we try to combine similarity and distance into one single alignment score.

Simplest model: Edit distance

The edit distance between two sequences is the minimal number of edit operations (indels and substitutions) needed to transform one sequence into the other.

Example. Transformation of acctga to agcta:

acctga -> agctga (substitution) -> agcta (deletion)

Edit distance = 2.

Alignment scores

When calculating the total score of an alignment we assume independence between residues, such that the probability of the alignment

x: $x_1 x_2 \ldots x_n$
y: $y_1 y_2 \ldots y_n$

is

$$\Pr(\text{alignment}) = \Pr(x_1, y_1)\,\Pr(x_2, y_2) \cdots \Pr(x_n, y_n).$$
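The edit distance described above can be computed with a small dynamic program. The following is a minimal sketch rather than part of the original notes; the function name `edit_distance` is an assumption, and every substitution, insertion and deletion is given unit cost.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimal number of substitutions, insertions and deletions turning a into b."""
    # prev[j] is the edit distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # substitution (free if residues match)
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
            ))
        prev = curr
    return prev[-1]


print(edit_distance("acctga", "agcta"))
```

Only two rows of the table are kept at a time, since each cell depends on the previous row and the current row alone.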
Taking logarithms of the alignment probability turns the product over columns into a sum:

$$\ln \Pr(\text{alignment}) = \ln \Pr(x_1, y_1) + \ln \Pr(x_2, y_2) + \cdots + \ln \Pr(x_n, y_n).$$

Substitution matrices

We want to separate alignments of homologous sequences from alignments of nonhomologous sequences. That is, we want to know whether a certain alignment implies homology or has occurred by pure chance (in which case the sequences are independent). One way is to use the relative likelihood

$$\frac{\Pr(x, y \mid M)}{\Pr(x, y \mid R)}$$

where $M$ is the model for homology (i.e. under the assumption that $x$ and $y$ are related), and $R$ is the random alignment model ($x$ and $y$ are not related and the similarity occurred by chance).

Random model (R): (no homology)
The two sequences, as well as the positions within each sequence, are assumed independent. The residues $x_i \in x$ and $y_j \in y$ (DNA bases or amino acids) occur with probabilities $q_{x_i}$ and $q_{y_j}$ respectively.

Match model (M): (homology)
The two sequences are assumed to be dependent (or related); positions within each sequence are still assumed independent. The residues $x_i \in x$ and $y_i \in y$ occur together with probability $p_{x_i y_i}$.

Example. The alignment

ADE
CDF

has the probabilities

Random model: $\Pr(x, y \mid R) = q_A q_D q_E \, q_C q_D q_F$
Match model: $\Pr(x, y \mid M) = p_{AC}\, p_{DD}\, p_{EF}$

In general,

$$\Pr(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i} \quad \text{and} \quad \Pr(x, y \mid M) = \prod_i p_{x_i y_i}.$$
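To make the comparison of the two models concrete, here is a hedged sketch that evaluates both likelihoods for an ungapped alignment. The function names and the two-letter toy alphabet with its frequencies `q` and pair probabilities `p` are illustrative assumptions, not values from the notes.

```python
import math


def random_model_probability(x: str, y: str, q: dict) -> float:
    """Pr(x, y | R): all residues are independent with background frequencies q."""
    return math.prod(q[a] for a in x) * math.prod(q[b] for b in y)


def match_model_probability(x: str, y: str, p: dict) -> float:
    """Pr(x, y | M): aligned residue pairs occur jointly with probability p[(a, b)]."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return math.prod(p[(a, b)] for a, b in zip(x, y))


# Hypothetical toy parameters over the alphabet {A, C}; the p values sum to 1.
q = {"A": 0.5, "C": 0.5}
p = {("A", "A"): 0.4, ("C", "C"): 0.4, ("A", "C"): 0.1, ("C", "A"): 0.1}

ratio = match_model_probability("AC", "AA", p) / random_model_probability("AC", "AA", q)
print(ratio, math.log(ratio))
```

A ratio above 1 (a positive logarithm) favours the match model; a ratio below 1 favours the random model.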
If $x$ and $y$ are really homologous, then

$$\Pr(x, y \mid M) > \Pr(x, y \mid R) \;\Rightarrow\; \frac{\Pr(x, y \mid M)}{\Pr(x, y \mid R)} > 1 \;\Rightarrow\; S = \ln \frac{\Pr(x, y \mid M)}{\Pr(x, y \mid R)} > \ln(1) = 0.$$

If $S$ is large enough we reject the random model and assume $x$ and $y$ to be homologous. Expanding both models,

$$S = \ln \frac{\Pr(x, y \mid M)}{\Pr(x, y \mid R)} = \ln \left( \frac{p_{x_1 y_1}}{q_{x_1} q_{y_1}} \cdot \frac{p_{x_2 y_2}}{q_{x_2} q_{y_2}} \cdots \right) = \sum_i \ln \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}} = \sum_i s(x_i, y_i)$$

where $s(x_i, y_i)$ is the score for the pair $(x_i, y_i)$.

The scores should be such that
- identities and conservative substitutions get a positive score: $p_{x_i y_i} > q_{x_i} q_{y_i}$
- neutral substitutions get score 0: $p_{x_i y_i} = q_{x_i} q_{y_i}$
- non-conservative substitutions get negative scores: $p_{x_i y_i} < q_{x_i} q_{y_i}$

A substitution matrix for amino acids is a matrix of the scores for each residue combination. If for all pairs $(x_i, y_j)$ we knew the probabilities $p_{x_i y_j}$, $q_{x_i}$ and $q_{y_j}$, we could calculate all 210 distinct entries in the matrix (210 since $s(x_i, y_j) = s(y_j, x_i)$), and then use it to check whether $S$ is large enough.

But we don't! Another way is to estimate them from confirmed alignments. Problems with this:
- It is hard to find a random sample. Proteins come in families, so the known proteins are highly biased.
- Different pairs of proteins have different distances to their common ancestor, and thus their scoring matrices would be different.

We don't want to estimate a scoring matrix for each possible evolutionary distance, but rather just one matrix altogether.

If the common ancestor is very recent we would expect $p_{ab}$ to be small for $a \ne b$, and $s(a, b)$ should be strongly negative. If a long time has passed since the sequences diverged we expect $p_{ab}$ to tend to $q_a q_b$, and so $s(a, b)$ should be close to zero for all $a, b$.

PAM matrices (Dayhoff)

PAM = percent accepted point mutation

The PAM1 matrix was constructed from 71 protein families. Sequences of at least 85% identity were aligned. This requirement was imposed because it results in unambiguous alignments and because the chance of two substitutions in the same position is then small.
The PAM1 matrix consists of transition probabilities $\Pr(b \mid a)$ of a substitution of amino acid $a$ by amino acid $b$ in protein sequences at an evolutionary distance such that the average number of substitutions is 1% of the number of positions. That is, exposing a protein to evolutionary change over such a time interval results on average in 1 substitution per 100 amino acids.

To extrapolate to longer evolutionary times we assume that the substitutions occur as a time-reversible Markov process. With PAM1 consisting of the entries $\Pr(b \mid a, t = 1)$, PAM2 is constructed by

$$\Pr(b \mid a, t = 2) = \sum_c \Pr(c \mid a, t = 1)\,\Pr(b \mid c, t = 1),$$

that is, substitutions from $a$ to $b$ via some arbitrary amino acid $c$ (summing over all possible amino acids). A PAMx matrix is constructed by iteration:

$$\Pr(b \mid a, t = x) = \sum_c \Pr(c \mid a, t = x - 1)\,\Pr(b \mid c, t = 1).$$

The score matrix is then obtained by

$$s(a, b \mid t) = \ln \frac{\Pr(b \mid a, t)}{q_b}.$$

One disadvantage of the PAM matrices is that while there is only a small estimation error in PAM1, it grows with distance and is rather large in PAM250. To get around this problem the BLOSUM (BLOcks SUbstitution Matrix) matrix family was constructed.

Gap penalties

When aligning two sequences we place them on top of each other and insert spaces in various numbers and places to obtain the maximal number of aligned identities. These spaces form gaps, and the length of a gap is simply the number of spaces, or indel operations, in it.

Standard gap penalties for a gap of length $g$ are
- Linear penalties: $\gamma(g) = -gd$, i.e. the gap residues are independent, each with score $-d$ (or penalty $d$).
- Affine penalties: $\gamma(g) = -d - e(g - 1)$, i.e. opening a gap gets score $-d$ and then, when extending it, every succeeding gap residue gets score $-e$. Usually $e < d$.

Using linear gap penalties results in a simpler model, but it is less relevant biologically than the affine model. Using linear gaps is to say that having one insertion of, say, 10 residues is as difficult as having 10 insertions of 1 residue each. It makes more sense that 10 consecutive gap residues should be penalized less than 10 separate ones.
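The difference between the two gap models can be checked numerically. This is a small sketch; the function names and the values d = 8 and e = 2 are assumptions chosen only for illustration.

```python
def linear_gap(g: int, d: float) -> float:
    """Score of a gap of length g under the linear model: gamma(g) = -g*d."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g: int, d: float, e: float) -> float:
    """Score of a gap of length g under the affine model: gamma(g) = -d - e*(g - 1)."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - e * (g - 1)


d, e = 8, 2  # hypothetical gap-open and gap-extension penalties, with e < d

# One gap of 10 residues versus ten separate gaps of 1 residue each.
print(linear_gap(10, d), 10 * linear_gap(1, d))        # identical under the linear model
print(affine_gap(10, d, e), 10 * affine_gap(1, d, e))  # the single long gap is cheaper
```

Under the linear model the two scenarios are indistinguishable, whereas the affine model penalizes the ten separate gaps more heavily.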
Alignment algorithms

Global alignment: Needleman-Wunsch

We want to align two entire sequences, this time not necessarily of the same length since we are allowed to insert gaps. We let

x: $x_1 x_2 \ldots x_n$
y: $y_1 y_2 \ldots y_m$

$F(i, j)$ = score of the best alignment of the subsequences $x_1 x_2 \ldots x_i$ and $y_1 y_2 \ldots y_j$.

We arrange these scores in an $(n+1) \times (m+1)$ matrix $F$, with indices starting at 0. $F$ is filled using the dynamic programming method. If we use a linear gap penalty, we start with $F(0, 0) = 0$, then fill the first row with $F(i, 0) = -id$ and the first column with $F(0, j) = -jd$. The rest of the matrix is filled from top-left to bottom-right using

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) & \text{(match/mismatch)} \\ F(i-1, j) - d & \text{(gap in } y\text{)} \\ F(i, j-1) - d & \text{(gap in } x\text{)} \end{cases}$$

That is, in each position $(i, j)$ the score is built upon one of the three possible previous positions $(i-1, j-1)$, $(i-1, j)$ and $(i, j-1)$.

This way the last cell in the matrix, $F(n, m)$, contains the score of the optimal alignment. To get not just the score but the actual alignment itself, we keep pointers in each position in the matrix to the best previous position. That is, for position $(i, j)$ we remember which of the positions $(i-1, j-1)$, $(i-1, j)$ or $(i, j-1)$ gave rise to the score $F(i, j)$. Then we backtrack through $F$, starting in $(n, m)$, to obtain the best global alignment.
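The fill and traceback steps above can be sketched as follows. This is an illustrative implementation under stated assumptions: the names `needleman_wunsch` and `simple_score` are hypothetical, the toy scoring (+1 match, -1 mismatch) stands in for a real substitution matrix, and instead of storing pointers the traceback re-derives them by checking which predecessor produced each cell (which assumes integer scores so that equality tests are exact).

```python
def simple_score(a: str, b: str) -> int:
    """Hypothetical toy scoring: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def needleman_wunsch(x: str, y: str, score=simple_score, d: int = 1):
    """Global alignment with linear gap penalty d. Returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d  # x[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = -j * d  # y[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # match/mismatch
                F[i - 1][j] - d,                              # gap in y
                F[i][j - 1] - d,                              # gap in x
            )
    # Traceback from (n, m) to (0, 0), re-deriving the pointer at each cell.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

When several predecessors tie, this sketch prefers the diagonal move, so it reports only one of the possibly many optimal alignments.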
Example. The dynamic programming matrix for the two sequences

ANVDR
VCNDR

using linear gap penalties with d = 8 and the BLOSUM80 scoring matrix (see below) has the residues A N V D R of the first sequence along one axis and V C N D R of the second along the other (the cell values and the resulting optimal score are not preserved in this transcription). The optimal alignment itself is

A-NVDR
VCN-DR

or

-ANVDR
VCN-DR

i.e. there are two alignments with exactly the same score.

Local alignment: Smith-Waterman

Sometimes we want to find a conserved region, e.g. a conserved protein domain, and not align the entire sequences.

CTCCCCCCTTCAGGCTCGCCAC
-T-----CTTCAGGC-----A-

The local alignment algorithm is very similar to the global one; it only needs a slight modification. We initialize $F(0, 0) = 0$ as before, but now the first row and the first column also become 0, that is, $F(i, 0) = F(0, j) = 0$.
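A minimal sketch of the local variant follows, assuming the usual Smith-Waterman formulation: the borders are zero as stated above, each cell is additionally floored at 0, and the traceback starts at the highest-scoring cell and stops at the first cell with score 0. The function names and the toy scoring (+2 match, -1 mismatch, d = 2) are assumptions for illustration, and integer scores are assumed so that the equality tests in the traceback are exact.

```python
def toy_score(a: str, b: str) -> int:
    """Hypothetical toy scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def smith_waterman(x: str, y: str, score=toy_score, d: int = 2):
    """Local alignment with linear gap penalty d. Returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                            # start a new local alignment
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # match/mismatch
                F[i - 1][j] - d,                              # gap in y
                F[i][j - 1] - d,                              # gap in x
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback from the best cell until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTACGTT", "GGACGGG"))
```

With the toy scoring the expected score of a random pair is negative while matches are positive, which is what lets the floor at 0 cut off poorly matching flanks.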
The rest of the matrix is filled using

$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) - d \\ F(i, j-1) - d \end{cases}$$

The traceback starts in the cell $(i, j)$ with the highest score, which is not necessarily (but of course still could be) cell $(n, m)$. Then we follow the pointers back until a cell with score 0 is reached or until we reach cell $(0, 0)$.

For this to work the expected score of a random match must be negative. Otherwise long random matches will get a high score just based on their length. Also, we must have $s(a, b) > 0$ for at least some pairs $(a, b)$, otherwise the algorithm will not find any alignments at all.

Heuristic alignment algorithms

Using dynamic programming algorithms such as Needleman-Wunsch and Smith-Waterman we are guaranteed to find the optimal alignment: in effect, all possible alignments are considered and the one with the highest score is selected. Unfortunately, these algorithms are computationally expensive, with time and memory usage proportional to the product of the sequence lengths, which is prohibitive when searching large databases.

Heuristic searches are less sensitive, but still pretty good and much faster. The idea is that most good alignments contain short stretches of ungapped, very high scoring matches.

BLAST (Basic Local Alignment Search Tool)

BLAST is one of the most widely used tools in bioinformatics, and perhaps in biology altogether. BLAST takes a query sequence (DNA or protein) and searches for local alignment matches in a database. The procedure is as follows:
- Find all substrings of length w (w-length words) in the database that align with words in the query sequence, where the (ungapped) alignment has score higher than some threshold t. These words are called hits.
- Extend each hit to see if it is contained in an (ungapped) alignment segment of score higher than some other threshold S.

Usually w is about 3-5 for amino acids and ~12 for DNA.

Example. Let w = 2, t = 8 and use a PAM120 scoring matrix (see below). Assume that we want to compare the following query sequence to a database of known proteins.

Query: QLNFSAGW

The possible w-length words in the query are then QL, LN, NF, FS, SA, AG, GW.

First we want to extract all proteins in the database that contain at least one w-length word that aligns to one of the w-length words in the query such that the score (using PAM120) is at least t = 8. These w-length words are then called hits. For instance, for QL the possible hits are QL, QM and HL, with scores

QL/QL = 11, QL/QM = 9, QL/HL = 8

In the same manner, hits for all the other w-length words in the query are determined, and the proteins containing them are extracted from the database:
- QL: hits QL, QM, HL
- LN: hit LN (LN/LN = 9)
- NF: hits NF, AF, NY (NF/NF = 12, NF/AF = 8, NF/NY = 8)
- FS: hits FS, FA, FN, FG, FQ
- SA: no hits
- AG: hit AG
- GW: hits GW, AW, SW, NW, DW, EW, ET, RW, VW, QW, KW, RW, CW, HW, IW, MW

For the proteins extracted from the database, try to extend around each hit to see if it is part of a larger alignment of score $S \ge 20$. For instance, the database entry NLNYTRW contains hits NL, LN and RW:

Q LN FSAGW
N LN YTRW

BLAST reports the score and the E-value (see below).

FASTA

For nucleotide searches, FASTA may be more sensitive than BLAST. The procedure is as follows:
- A lookup table is created for all identical matching words of length ktup (1-2 for proteins, 4-6 for DNA) between the query sequence and the database.
- The comparison of the query sequence and the database can be viewed as a set of dotplots, with the query as the vertical sequence and the database sequences as the horizontal. Diagonals with the largest number of words are registered.
- The best regions are rescored using a scoring matrix, trying to extend the match for as long as possible while still keeping the score above a given threshold.
- Ungapped regions are joined if the total score is $\ge S$.
- The highest scoring candidates are realigned using a dynamic programming algorithm, restricted to a band around the candidate match.
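Returning to the first step of the BLAST procedure, the word-hit search can be sketched as below. This is a simplified illustration, not BLAST's actual implementation: the function names are hypothetical, a toy DNA scoring (+5 match, -4 mismatch) replaces PAM120, a hit is taken to be a word pair scoring at least t as in the worked example, and the neighbourhood is enumerated by brute force over all words of length w.

```python
from itertools import product


def dna_score(a: str, b: str) -> int:
    """Hypothetical toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if a == b else -4


def query_words(seq: str, w: int) -> list:
    """All overlapping w-length words of seq, in order of position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def neighborhood(word: str, alphabet: str, score, t: int) -> list:
    """All words of the same length whose ungapped score against word is at least t."""
    return [
        "".join(cand)
        for cand in product(alphabet, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, cand)) >= t
    ]


def find_hits(query: str, subject: str, w: int, alphabet: str, score, t: int) -> list:
    """(query_pos, subject_pos) pairs whose w-length words score at least t."""
    index = {}  # neighbourhood word -> query positions it was generated from
    for i, word in enumerate(query_words(query, w)):
        for nb in neighborhood(word, alphabet, score, t):
            index.setdefault(nb, []).append(i)
    return [
        (i, j)
        for j, word in enumerate(query_words(subject, w))
        for i in index.get(word, [])
    ]


print(query_words("QLNFSAGW", 2))
print(find_hits("ACGT", "TTGT", 2, "ACGT", dna_score, t=1))
```

Indexing the neighbourhood words of the query once and then scanning each database sequence is what makes the seeding step fast; the extension step would then start from each reported position pair.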
Significance of scores

When searching protein databases using search tools such as BLAST or FASTA, the best hit is reported along with an E-value and a P-value. These are used to indicate how convincing the similarity between the query sequence and the top hit in the database is.

The statistical theory for optimal alignment scores is very complex, and a rigorous theory has been developed only for local alignments without gaps. It is this theory that is used when reporting the E- and P-values in database searches.

The classical approach uses the extreme value distribution to calculate the probability that the best match from a search of N unrelated sequences has score $\ge S$. If this probability is very small we do not believe that the sequences are unrelated, and so they are likely to be homologous.

HSP = high scoring pair

If two random sequences of lengths $n$ and $m$ are aligned, the probability of finding at least one segment pair with score $\ge S$ is

$$\Pr(\text{at least one HSP with score} \ge S) = 1 - \exp\{-Kmn\,e^{-\lambda S}\}$$

where $K$ and $\lambda$ depend on the scoring scheme used. We call this the P-value.

The expected number of segment pairs having score $\ge S$ in the random model is

$$E[\#\text{HSPs with score} \ge S] = Kmn\,e^{-\lambda S}.$$

We call this the E-value.

Scores are often normalized to get rid of the dependence on the scoring system:

$$S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad \text{E-value} = \frac{mn}{2^{S'}}.$$

BLAST reports the normalized score and the E-value, where $m$ is the length of the query sequence and $n$ the length of the entire database (the sum of all sequence lengths in the database).
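The three quantities above are straightforward to evaluate. The sketch below uses hypothetical function names, and the values K = 0.1 and lambda = 0.3 are placeholders only; real values depend on the scoring scheme.

```python
import math


def e_value(S: float, m: int, n: int, K: float, lam: float) -> float:
    """Expected number of HSPs with score >= S: K*m*n*exp(-lambda*S)."""
    return K * m * n * math.exp(-lam * S)


def p_value(S: float, m: int, n: int, K: float, lam: float) -> float:
    """Probability of at least one HSP with score >= S: 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is tiny.
    return -math.expm1(-e_value(S, m, n, K, lam))


def normalized_score(S: float, K: float, lam: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2."""
    return (lam * S - math.log(K)) / math.log(2)


K, lam = 0.1, 0.3      # placeholder parameters, not taken from any real matrix
m, n, S = 100, 1000, 50

E = e_value(S, m, n, K, lam)
P = p_value(S, m, n, K, lam)
S_norm = normalized_score(S, K, lam)
print(E, P, S_norm, m * n / 2 ** S_norm)
```

The last printed value reproduces E from the normalized score alone, which illustrates why the normalized score removes the dependence on K and lambda. For small E the P-value and the E-value are nearly equal.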
BLOSUM80 and PAM120 scoring matrices, with rows and columns in the order A R N D C Q E G H I L K M F P S T W Y V (matrix entries not preserved in this transcription).
Researcher: Taylor Dupuy Advsor: Aaro Wootto Semester: Fall 4 Ivestgatg Cellular Automata A Overvew of Cellular Automata: Cellular Automata are smple computer programs that geerate rows of black ad whte
More informationClass 13,14 June 17, 19, 2015
Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral
More informationMaps on Triangular Matrix Algebras
Maps o ragular Matrx lgebras HMED RMZI SOUROUR Departmet of Mathematcs ad Statstcs Uversty of Vctora Vctora, BC V8W 3P4 CND sourour@mathuvcca bstract We surveys results about somorphsms, Jorda somorphsms,
More informationBlock-Based Compact Thermal Modeling of Semiconductor Integrated Circuits
Block-Based Compact hermal Modelg of Semcoductor Itegrated Crcuts Master s hess Defese Caddate: Jg Ba Commttee Members: Dr. Mg-Cheg Cheg Dr. Daqg Hou Dr. Robert Schllg July 27, 2009 Outle Itroducto Backgroud
More informationSpecial Instructions / Useful Data
JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth
More informationLecture 3. Sampling, sampling distributions, and parameter estimation
Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called
More informationENGI 3423 Simple Linear Regression Page 12-01
ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable
More informationTHE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA
THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for
More informationStatistics: Unlocking the Power of Data Lock 5
STAT 0 Dr. Kar Lock Morga Exam 2 Grades: I- Class Multple Regresso SECTIONS 9.2, 0., 0.2 Multple explaatory varables (0.) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (0.2) Exam 2 Re- grades Re-
More informationhp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations
HP 30S Statstcs Averages ad Stadard Devatos Average ad Stadard Devato Practce Fdg Averages ad Stadard Devatos HP 30S Statstcs Averages ad Stadard Devatos Average ad stadard devato The HP 30S provdes several
More informationChapter 8. Inferences about More Than Two Population Central Values
Chapter 8. Ifereces about More Tha Two Populato Cetral Values Case tudy: Effect of Tmg of the Treatmet of Port-We tas wth Lasers ) To vestgate whether treatmet at a youg age would yeld better results tha
More informationInterpolated Markov Models for Gene Finding
Iterpolated Markov Models for Gee Fdg BMI/CS 776 www.bostat.wsc.edu/bm776/ Sprg 2009 Mark Crave crave@bostat.wsc.edu The Gee Fdg Task Gve: a ucharacterzed DNA sequece Do: locate the gees the sequece, cludg
More informationMultivariate Transformation of Variables and Maximum Likelihood Estimation
Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty
More informationChapter 14 Logistic Regression Models
Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as
More informationTransforms that are commonly used are separable
Trasforms s Trasforms that are commoly used are separable Eamples: Two-dmesoal DFT DCT DST adamard We ca the use -D trasforms computg the D separable trasforms: Take -D trasform of the rows > rows ( )
More informationTHE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the
More informationb. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y.
.46. a. The frst varable (X) s the frst umber the par ad s plotted o the horzotal axs, whle the secod varable (Y) s the secod umber the par ad s plotted o the vertcal axs. The scatterplot s show the fgure
More information12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model
1. Estmatg Model parameters Assumptos: ox ad y are related accordg to the smple lear regresso model (The lear regresso model s the model that says that x ad y are related a lear fasho, but the observed
More informationLecture 2 - What are component and system reliability and how it can be improved?
Lecture 2 - What are compoet ad system relablty ad how t ca be mproved? Relablty s a measure of the qualty of the product over the log ru. The cocept of relablty s a exteded tme perod over whch the expected
More informationA Markov Chain Competition Model
Academc Forum 3 5-6 A Marov Cha Competto Model Mchael Lloyd, Ph.D. Mathematcs ad Computer Scece Abstract A brth ad death cha for two or more speces s examed aalytcally ad umercally. Descrpto of the Model
More informationMu Sequences/Series Solutions National Convention 2014
Mu Sequeces/Seres Solutos Natoal Coveto 04 C 6 E A 6C A 6 B B 7 A D 7 D C 7 A B 8 A B 8 A C 8 E 4 B 9 B 4 E 9 B 4 C 9 E C 0 A A 0 D B 0 C C Usg basc propertes of arthmetc sequeces, we fd a ad bm m We eed
More informationSolving Constrained Flow-Shop Scheduling. Problems with Three Machines
It J Cotemp Math Sceces, Vol 5, 2010, o 19, 921-929 Solvg Costraed Flow-Shop Schedulg Problems wth Three Maches P Pada ad P Rajedra Departmet of Mathematcs, School of Advaced Sceces, VIT Uversty, Vellore-632
More informationChapter 8: Statistical Analysis of Simulated Data
Marquette Uversty MSCS600 Chapter 8: Statstcal Aalyss of Smulated Data Dael B. Rowe, Ph.D. Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 08 by Marquette Uversty MSCS600 Ageda 8. The Sample
More informationSTA 105-M BASIC STATISTICS (This is a multiple choice paper.)
DCDM BUSINESS SCHOOL September Mock Eamatos STA 0-M BASIC STATISTICS (Ths s a multple choce paper.) Tme: hours 0 mutes INSTRUCTIONS TO CANDIDATES Do ot ope ths questo paper utl you have bee told to do
More informationThird handout: On the Gini Index
Thrd hadout: O the dex Corrado, a tala statstca, proposed (, 9, 96) to measure absolute equalt va the mea dfferece whch s defed as ( / ) where refers to the total umber of dvduals socet. Assume that. The
More informationECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity
ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data
More informationLebesgue Measure of Generalized Cantor Set
Aals of Pure ad Appled Mathematcs Vol., No.,, -8 ISSN: -8X P), -888ole) Publshed o 8 May www.researchmathsc.org Aals of Lebesgue Measure of Geeralzed ator Set Md. Jahurul Islam ad Md. Shahdul Islam Departmet
More informationModule 7: Probability and Statistics
Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to
More informationStatistics of Random DNA
Statstcs of Radom DNA Aruma Ray Aaro Youg SUNY Geeseo Bomathematcs Group The Am To obta the epectato ad varaces for the ethalpy chage ΔH, etropy chage ΔS ad the free eergy chage ΔG for a radom -mer of
More information