Bayesian embedding of co-occurrence data for query-based visualization

Mohammad Khoshneshin, Department of Management Sciences, The University of Iowa, Iowa City, IA 52242 USA
W. Nick Street, Department of Management Sciences, The University of Iowa, Iowa City, IA 52242 USA
Padmini Srinivasan, Computer Science Department, The University of Iowa, Iowa City, IA 52242 USA

Abstract: We propose a generative probabilistic model for visualizing co-occurrence data. In co-occurrence data, there are a number of entities, and the data record the frequency with which two entities co-occur. We propose a Bayesian approach to infer the latent variables. Given the intractability of inference for the posterior distribution, we use approximate inference via variational approaches. The proposed Bayesian approach enables accurate embedding in a high-dimensional space, which is not directly useful for visualization. Therefore, we propose a method to embed a filtered set of entities for a given query (query-based visualization). Our experiments show that our proposed models outperform co-occurrence data embedding, the state-of-the-art model for visualizing co-occurrence data.

I. INTRODUCTION

Visualization is one of the most important exploratory tools for data analysis and mining. In this paper we propose a fully generative probabilistic model, Bayesian co-occurrence data embedding, to embed co-occurrence data in a Euclidean space as an unsupervised learning approach that can be used for visualization. This model is a Bayesian extension of co-occurrence data embedding (CODE) [7].

Bayesian co-occurrence data embedding (Bayes-CODE) is a generative probabilistic model for embedding co-occurrence data in a Euclidean space. To explain the different types of co-occurrence data, we use graph notation. Let G = (U, E) represent an unweighted graph in which multiple edges are allowed between any two nodes. Each edge depicts a token in the co-occurrence dataset. A token is a single observed occurrence. So if the edge or token (i, k), where i and k index entities or nodes, occurred 3 times, then v_ik (the i-th, k-th element of the co-occurrence matrix V) is 3, and there are 3 edges between nodes i and k in the equivalent graph. The reason we use graph notation is to capture some specific relationships that cannot be represented by the matrix notation.
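To make the token and matrix views concrete, the following minimal sketch (our own illustration; the entity names and counts are made up) builds the co-occurrence matrix V from a list of observed tokens:

```python
import numpy as np

# Hypothetical hetero-directed tokens: (document, word) index pairs.
# Each pair is one observed token; a repeated pair corresponds to
# parallel edges between the same two nodes in the multigraph G.
tokens = [(0, 1), (0, 1), (0, 1), (0, 2), (1, 2)]

n_docs, n_words = 2, 3
V = np.zeros((n_docs, n_words), dtype=int)
for d, w in tokens:
    V[d, w] += 1        # v_dw = number of edges between nodes d and w

print(V)
# [[0 3 1]
#  [0 0 1]]  e.g. v_01 = 3: token (0, 1) occurred 3 times
```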
Based on the structure of the graph G, there are two structural attributes that describe co-occurrence data. The first is whether the graph is directed or undirected. The second is whether the nodes U are homogeneous or heterogeneous. When the co-occurrence graph is directed, one type of entity is responsible for generating another type, so it is logical to define the generative model as a conditional probability. In the case of an undirected graph, the joint probability is more appropriate. In the case of heterogeneous nodes, the nodes in U are divided into two groups, U_1 and U_2, and edges can only be defined between nodes of different types; this heterogeneous graph is a bipartite graph. In the case of homogeneous nodes, there is only one type of node.

Based on these categories, four different classes of co-occurrence data can be defined. In the hetero-directed class, each token consists of two different entities, and one type of entity is responsible for generating the other. The most popular example of this type is text data, where a document is responsible for generating words; the link direction is therefore from document nodes to word nodes. In the hetero-undirected class, each token consists of two different entities, and both entities are generated simultaneously from a joint distribution. An example of this type is the co-occurrence of image features and keywords. In the homo-directed class, each token consists of two entities of the same type, and one entity is responsible for generating the other. An example is co-citation data: entities are scholars who cite other scholars in their research papers. In the homo-undirected class, each token consists of two entities of the same type, and both entities are generated simultaneously from a joint distribution. An example is the co-occurrence between items in market basket data, where each token consists of two items that have been purchased together.

In this paper, we focus on the hetero-directed case. Deriving the model and inference algorithm for the other cases is straightforward given the method proposed in this paper. Since inference in the proposed model is intractable, we use approximate inference via variational methods.

It is hard to capture the essence of real-world data in two dimensions due to its high complexity. On the other hand, visualizing a large number of data points is confusing rather than informative. Therefore, a way to present a filtered version of the data is useful. To this end, we propose a query-based visualization method. Although we present this algorithm in the context of information retrieval, it can be applied to any query-answering problem over co-occurrence data.

The paper is organized as follows. In Section II, we review the related literature on visualizing co-occurrence data. In Section III, Bayesian co-occurrence data embedding is presented. In Section IV, the approximate inference derivations are presented, where we use a variational Bayes approach to learn the posterior parameters.

In Section V, we present query-based visualization. In Section VI, the experimental results are presented; we examine Bayes-CODE in the context of visualizing text data, where it gives very competitive results. In the last section, we conclude with some comments and future directions.

II. BACKGROUND

Visualizing co-occurrence data with heterogeneous nodes (heterogeneous data in general) has rarely been studied. Most of the literature concentrates on embedding only one type of data (e.g., multi-dimensional scaling [5]). In the context of text data, which is co-occurrence data with heterogeneous nodes, most visualization approaches embed only documents or only words via embedding algorithms such as multidimensional scaling [10]. The state-of-the-art algorithm for visualizing co-occurrence data with heterogeneous nodes is co-occurrence data embedding (CODE) [7]. In CODE, all entities are embedded in a unified Euclidean space in such a way that closer entities are more correlated. Let X_i (a 1 × D vector) represent the latent variable of entity i and Y_k (a 1 × D vector) represent the latent variable of entity k in a D-dimensional space. In CODE, two approaches are used for fitting the model to the data. In the first approach, the joint distribution is modeled via:

P(i, k) = (1/Z) P̄(i) P̄(k) exp(−δ_ik),   (1)

where δ_ik = (X_i − Y_k)(X_i − Y_k)^T is the squared Euclidean distance between the latent variables of i and k, and the normalizing factor is Z = Σ_ik P̄(i) P̄(k) exp(−δ_ik). P̄(i) and P̄(k) are the empirical marginal probabilities, which model the bias of entities (some entities occur more frequently than others). This model is appropriate for the hetero-undirected case. The second approach in CODE models the conditional probability instead of the joint probability:

P(i | k) = (1/Z_k) P̄(i) exp(−δ_ik),   (2)

where Z_k = Σ_i P̄(i) exp(−δ_ik), which is useful in the hetero-directed case. The latent variables can be found by minimizing the Kullback-Leibler divergence between the empirical and model probabilities: P* = arg min D_KL[P̄ ‖ P].

The main intuition behind CODE is that if two points are related, then they should be very close in the latent space. Therefore, the result of the embedding can be presented as a visualization of the entities. Although the embedding is based on the relationships between entities of different types, we expect entities of the same type to be close as well, due to the transitivity of distance.

Our proposed model, Bayes-CODE, shares its basic probability model with CODE. The main difference, which is our main contribution, is extending CODE to a fully generative probabilistic model for learning co-occurrence data. Instead of the maximum likelihood approach, which is very vulnerable to overfitting, we use a Bayesian approach, which has classic advantages such as robustness. Although our generative model outperforms CODE even in a 2-dimensional space, the difference intensifies in higher dimensions. Nevertheless, it is not possible to interpret more than 3 dimensions visually. Our other contribution is a query-based visualization that embeds a filtered set of entities. A further contribution of our work concerns how the bias of entities is learned: our algorithm learns the biases in a Bayesian manner, whereas in CODE the bias parameters were estimated directly from the marginal empirical probabilities.
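To make the conditional model concrete, here is a small numeric sketch (our own, with random toy values) of eq. (2): it computes P(i | k) from latent positions and empirical marginals, and evaluates the KL-divergence objective that CODE minimizes:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_i, n_k = 2, 4, 3                   # latent dimension and entity counts
X = rng.normal(size=(n_i, D))           # latent positions X_i (first type)
Y = rng.normal(size=(n_k, D))           # latent positions Y_k (second type)
p_i = np.full(n_i, 1.0 / n_i)           # empirical marginals P̄(i) (toy: uniform)

# delta[i, k] = squared Euclidean distance between X_i and Y_k
delta = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

# Eq. (2): P(i | k) proportional to P̄(i) exp(-delta_ik), normalized over i
unnorm = p_i[:, None] * np.exp(-delta)
P_model = unnorm / unnorm.sum(axis=0, keepdims=True)

# CODE fits X and Y by minimizing KL(P̄ || P); with a toy empirical
# conditional P̄(i | k), one evaluation of that objective looks like:
P_emp = rng.dirichlet(np.ones(n_i), size=n_k).T
kl = np.sum(P_emp * np.log(P_emp / P_model))
print(P_model.sum(axis=0), kl)          # each column of P_model sums to 1
```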
III. BAYESIAN EMBEDDING OF CO-OCCURRENCE DATA

In Bayesian co-occurrence data embedding (Bayes-CODE), the relationship between entities is captured by embedding them in a latent variable space. Here, we present Bayes-CODE for the hetero-directed case; deriving the model for the other cases is straightforward. Following the notation of the previous sections, i indexes the entities of the first type and k indexes the entities of the second type.

Similar to other latent space models, the dimension D of the latent space is an algorithmic input and can be chosen in a Bayesian manner. Here, we treat D as a known parameter. The learned positions of entities in the latent space are denoted by X_i for the entities of the first type and Y_k for the entities of the second type. Furthermore, b_i and b_k represent the biases of the entities. By bias we refer to the situation in which some entities tend to occur more often, such as words that have higher frequency than others. In the current model, we assume Gaussian priors on all latent variables. This is an arbitrary choice of distribution, and one may assume any other distribution. The relationship between entities is computed via the squared Euclidean distance. The probability model for the data is defined as a conditional probability:

P(i | k) = (1/Z_k) exp(−δ(X_i, Y_k) + b_i),   (3)

where Z_k = Σ_i exp(−δ(X_i, Y_k) + b_i). Note that we do not include a bias parameter for entity k: since we are conditioning on k, the bias of k has no effect. Even if b_k is inserted into (3), it cancels out.
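A quick numeric check of that remark (our own sketch with arbitrary values): adding a bias b_k for the conditioned entity shifts the numerator and the normalizer Z_k of (3) by the same factor, leaving the conditional probabilities unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
delta_k = rng.uniform(0, 4, size=5)   # distances delta(X_i, Y_k), fixed k, all i
b_i = rng.normal(size=5)              # biases of the generated entities i
b_k = 2.7                             # a hypothetical bias for the conditioned k

def conditional(extra=0.0):
    logits = -delta_k + b_i + extra   # eq. (3) logits, optionally shifted by b_k
    w = np.exp(logits)
    return w / w.sum()                # dividing by Z_k

# The same shift appears in every term of Z_k, so it cancels:
print(np.allclose(conditional(), conditional(extra=b_k)))   # True
```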

The graphical model of the generative process is shown in Figure 1. First, latent variables are generated for the N_I entities of the first type from the prior parameters θ_I, and for the N_K entities of the second type from the prior parameters θ_K. Then, for each of the N_k tokens in entity k, an entity of the first type is chosen from the N_I possible entities via a multinomial distribution with probabilities from (3). The generative process can be summarized as follows:

1) For each entity i:
   a) Choose the entity latent variable X_i ~ N(μ_0I, σ_0I² I)
   b) Choose the entity bias variable b_i ~ N(β_0, ξ_0²)
2) For each entity k:
   a) Choose the entity latent variable Y_k ~ N(μ_0K, σ_0K² I)
   b) For each token j:
      i) Choose i_kj ~ Multinomial(P(· | k))

where I denotes the identity matrix and P is computed from (3).

Fig. 1. The graphical model of Bayesian embedding of co-occurrence data.

IV. APPROXIMATE INFERENCE

Under the assumption that each token is independent of the other tokens given the hidden variables, the likelihood of the whole dataset is:

P(U | X, Y, b) = Π_u P(i_u | k_u, X, Y, b) = Π_ik P(i | k, X, Y, b)^{v_ik},   (4)

where U is the set of all tokens (i.e., the whole dataset) and v_ik is the number of times the token (i, k) has occurred. Given (3) and (4), the log-likelihood is:

log P(U | X, Y, b) = −Σ_ik v_ik δ(X_i, Y_k) + Σ_i v_i· b_i − Σ_k v_·k log Σ_i exp(−δ(X_i, Y_k) + b_i),   (5)

where v_i· = Σ_k v_ik and v_·k = Σ_i v_ik. To estimate the hidden variables X, Y and b, we could maximize the log-likelihood (5). However, the maximum likelihood approach has problems such as overfitting. A Bayesian approach results in a more robust solution by giving the posterior distribution of the hidden parameters. By Bayes' rule, the posterior distribution of the latent variables given the data is:

P(X, Y, b | U) = P(U | X, Y, b) P(X) P(Y) P(b) / ∫ P(U | X', Y', b') P(X') P(Y') P(b') dX' dY' db',   (6)

which is not computable analytically due to the intractability of the integral in the denominator. Therefore, we chose to use variational inference [8], a popular algorithm for approximate inference in graphical models. In variational approximation, instead of finding the true posterior, we estimate a variational distribution for each latent variable. The main idea is to minimize the difference between the true posterior and the surrogate variational distribution, so that the variational distribution can be used for making inferences about the latent variables. If Q(X, Y, b) denotes the variational distribution over the latent variables, we are interested in minimizing the Kullback-Leibler divergence between the true posterior and its approximation: KL(Q(X, Y, b) ‖ P(X, Y, b | U)). This is equivalent to maximizing a lower bound on the marginal probability of the data [1]:

log P(U) ≥ E_Q[log P(U | X, Y, b)] − KL(Q(X, Y, b) ‖ P(X, Y, b)),   (7)

where E_Q[·] denotes expectation with respect to the variational distributions. Here we assume the variational distributions are independent Gaussians with the following parameters: X_i ~ N(μ_i, σ_i² I), Y_k ~ N(μ_k, σ_k² I), and b_i ~ N(β_i, ξ_i²), where I is the identity matrix with dimension D. Note that it is possible to use a full covariance matrix instead of σ² I; the only reason we chose independent coordinates is simplicity in optimization. Substituting P(U | X, Y, b) with its value from (5), the lower bound on the probability of the data in (7) can be written as:

L(μ, β, σ, ξ) = Σ_ik v_ik E_Q[−δ(X_i, Y_k) + b_i] − Σ_k v_·k E_Q[log Σ_i exp(−δ(X_i, Y_k) + b_i)] − KL(Q(X) ‖ P(X)) − KL(Q(Y) ‖ P(Y)) − KL(Q(b) ‖ P(b)).   (8)

Given the Gaussian distributions for the priors and the variational distributions, all integrals for computing the expectations in (8) are analytically solvable except for the part Σ_k v_·k E_Q[log Σ_i exp(−δ(X_i, Y_k) + b_i)], which is intractable because of its log-sum-exp form.
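For reference, a short sketch (our own) that evaluates the log-likelihood (5) directly from a count matrix; the final log-sum-exp term is exactly the piece whose expectation under Q has no closed form:

```python
import numpy as np

def log_likelihood(V, X, Y, b):
    """Eq. (5): V[i, k] = v_ik, X[i] and Y[k] are latent positions,
    b[i] are the biases of the generated (first-type) entities."""
    delta = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # delta(X_i, Y_k)
    logits = -delta + b[:, None]                            # -delta_ik + b_i
    log_Z = np.logaddexp.reduce(logits, axis=0)             # log sum_i exp(...)
    return (V * logits).sum() - (V.sum(axis=0) * log_Z).sum()

rng = np.random.default_rng(2)
V = rng.poisson(1.0, size=(4, 3))           # toy co-occurrence counts
X, Y = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
print(log_likelihood(V, X, Y, b=rng.normal(size=4)))
```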
It is possible to use the concavity of the log function to define an upper bound on the log-sum-exp function:

log Σ_i a_i ≤ φ Σ_i a_i − log φ − 1,   (9)

where equality holds iff φ = 1 / Σ_i a_i. Such an approach has been used in several works in the variational inference context [4], [2], [3]. Therefore, the new bound is as follows:

L(μ, β, σ, ξ) = constant + Σ_ik v_ik E_Q[−δ(X_i, Y_k) + b_i] − Σ_k v_·k φ_k Σ_i E_Q[exp(−δ(X_i, Y_k) + b_i)] − KL(Q(X) ‖ P(X)) − KL(Q(Y) ‖ P(Y)) − KL(Q(b) ‖ P(b)),   (10)

where the constant does not depend on the decision variables and we set φ_k = [Σ_i E_Q(exp(−δ(X_i, Y_k) + b_i))]^{−1} to tighten the lower bound with respect to the log-sum-exp part. The only tricky part in (10) is deriving the integral E_Q[exp(−δ(X_i, Y_k) + b_i)]. Let x_1 ~ N(μ_1, σ_1²) and x_2 ~ N(μ_2, σ_2²); then it can be shown that the following equation holds:

E[exp(−(x_1 − x_2)²)] = exp(−(μ_1 − μ_2)² / (1 + 2(σ_1² + σ_2²))) / sqrt(1 + 2(σ_1² + σ_2²)).   (11)
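Identity (11) is easy to sanity-check by Monte Carlo; a throwaway sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, s1, mu2, s2 = 0.4, 0.8, -0.3, 0.5
x1 = rng.normal(mu1, s1, size=1_000_000)
x2 = rng.normal(mu2, s2, size=1_000_000)

mc = np.exp(-(x1 - x2) ** 2).mean()                    # Monte Carlo estimate
c = 1.0 + 2.0 * (s1 ** 2 + s2 ** 2)
closed = np.exp(-(mu1 - mu2) ** 2 / c) / np.sqrt(c)    # right-hand side of (11)
print(mc, closed)                                      # the two agree closely
```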

Therefore, we have:

E_Q[exp(−δ(X_i, Y_k) + b_i)] = η_ik^D exp(−η_ik² δ(μ_i, μ_k) + β_i + ξ_i²/2),

where η_ik = [1 + 2(σ_i² + σ_k²)]^{−1/2}. As a result, the lower bound in (10) can be written as follows:

L(μ, β, σ, ξ) = constant − Σ_ik v_ik δ(μ_i, μ_k) − (Σ_i v_i· σ_i² + Σ_k v_·k σ_k²) D + Σ_i v_i· β_i − Σ_k v_·k φ_k Σ_i η_ik^D exp(−η_ik² δ(μ_i, μ_k) + β_i + ξ_i²/2) − (1/2) Σ_i [−D log σ_i² + D σ_i²/σ_0I² + (μ_i − μ_0I)(μ_i − μ_0I)^T / σ_0I²] − (1/2) Σ_k [−D log σ_k² + D σ_k²/σ_0K² + (μ_k − μ_0K)(μ_k − μ_0K)^T / σ_0K²] − (1/2) Σ_i [−log ξ_i² + ξ_i²/ξ_0² + (β_i − β_0)² / ξ_0²].   (12)

To find the variational values, we need to optimize (12). Since σ² must be nonnegative, to obtain an unconstrained optimization problem we use an auxiliary variable χ and the exponential function in our experiments: σ² = exp(χ). Any unconstrained optimization algorithm can be used to solve (12). In our experiments, we used gradient ascent with multiple random starts.

V. QUERY-BASED VISUALIZATION

Here, we present query-based visualization for information retrieval; however, it can be applied to other dyadic data (see [9] for a similar approach in collaborative filtering). In query-based visualization (QBV), documents, query words, and the query are embedded in a Euclidean space to help the user identify documents of interest. Unfortunately, 2 dimensions are barely enough to capture the complexity of the data, while higher dimensions cannot be interpreted visually. Additionally, presenting all the data to a user is beyond a person's processing ability. Therefore, visualizing only the top-N documents (where N can be specified by the user) is of interest. These top-N documents can be chosen by an arbitrary retrieval method. A two-phase visualization can therefore be used. First, the data are embedded in a high-dimensional space; then a group of entities chosen via some filtering approach is re-embedded in a 2-dimensional space by classic algorithms such as multidimensional scaling (MDS) [5]. The second embedding phase is straightforward: we already have the distances from the first phase, which satisfy all requirements of a Euclidean distance, so MDS can be applied directly. In an information retrieval context, our proposed approach embeds words, documents, and the query in a high-dimensional space and then, using the distances in that space, embeds the selected objects in a 2-dimensional space via multidimensional scaling.

Another approach would be to embed the filtered data directly in a 2-dimensional space. However, such an approach is undesirable for two reasons. First, since there are few entities in the filtered data, generalization is expected to be poor, which deteriorates the result. Second, embedding entities separately for each query is very time consuming and inefficient, especially given the high number of queries in retrieval systems.
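The two-phase procedure is easy to prototype; a minimal sketch (our own, using scikit-learn's MDS and random stand-in coordinates; in the paper the 100-dimensional positions come from the Bayes-CODE fit and the filter is a retrieval method):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
# Phase 1 output (stand-in): high-dimensional positions that the
# Bayes-CODE fit would supply for documents, query words, and the query.
docs_hd = rng.normal(size=(500, 100))
words_hd = rng.normal(size=(50, 100))
query_hd = rng.normal(size=(1, 100))    # the query, mapped like a new document

# Filter: keep the top-N documents closest to the query (any retrieval
# method could supply this ranking).
N = 20
top = np.argsort(cdist(query_hd, docs_hd)[0])[:N]

# Phase 2: re-embed query + query words + top-N documents in 2-D via MDS,
# feeding it the pairwise Euclidean distances from the 100-D space.
pts = np.vstack([query_hd, words_hd, docs_hd[top]])
xy = MDS(n_components=2, dissimilarity="precomputed",
         random_state=0).fit_transform(cdist(pts, pts))
print(xy.shape)                         # (1 + 50 + N, 2) points to plot
```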
VI. EXPERIMENTS

In our experiments, we compare CODE and Bayes-CODE in the context of information retrieval. Text data is considered hetero-directed, so we use a conditional probability model. We study the goodness of the embedding using two evaluation metrics. The first is the proximity of an embedded query to the relevant documents. This can be measured with average precision (AP). The definition of AP that we use averages the precision at each relevant document at the point it is retrieved. Let AP(S, R) be the average precision, where R is the set of relevant documents and S is the score a method assigns to all documents for retrieval. Then, given an embedded query q, the average precision is AP(−δ_q·, R), where δ_q· denotes the distances of the query to all documents. The second metric concerns the proximity of relevant documents to each other: whether all relevant documents are close to each other compared to the other documents. To measure this, average relevant-document proximity (ARDP) is proposed:

ARDP = [ Σ_c (1/|R_c|) Σ_{i ∈ R_c} AP(−δ_i·, R_c \ {i}) ] / N_C,   (13)

where c indexes categories or queries (any group of relevant documents), i indexes documents, R_c is the set of relevant documents in c, and N_C is the total number of categories or queries. ARDP is partially similar to the doc-doc measure used in [7]. More precisely, we consider each relevant document as an embedded query and then compute the AP of retrieving the other relevant documents. This metric measures how well we embed the data: relevant documents are expected to be closer, and the visualization is better as a result.
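Both metrics reduce to average precision over distance-based rankings; a sketch under our reading of (13) (the function names and toy data are ours):

```python
import numpy as np

def average_precision(scores, relevant):
    """AP of ranking all items by descending score; `relevant` is a boolean
    mask. Precision is averaged at the rank of each relevant item."""
    order = np.argsort(-scores)
    rel = relevant[order]
    hits = np.cumsum(rel)
    return (hits[rel] / (np.flatnonzero(rel) + 1)).mean()

def ardp(dist, categories):
    """Eq. (13): dist[i, j] is the document-document distance in the
    embedding; `categories` maps each category c to its relevant docs R_c."""
    per_cat = []
    for rel_idx in categories.values():
        aps = []
        for i in rel_idx:
            mask = np.zeros(dist.shape[0], dtype=bool)
            mask[rel_idx] = True
            mask[i] = False                 # rank against R_c \ {i}
            scores = -dist[i]
            scores[i] = -np.inf             # the probe document ranks last
            aps.append(average_precision(scores, mask))
        per_cat.append(np.mean(aps))        # inner average over |R_c|
    return np.mean(per_cat)                 # outer average over N_C categories

rng = np.random.default_rng(5)
dist = rng.uniform(size=(30, 30))
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0.0)
print(ardp(dist, {0: [0, 1, 2, 3], 1: [10, 11, 12]}))
```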

In Bayes-CODE, we set the prior mean to 0 for both words and documents, and the prior variance to 1 for words and 2 for documents. No fine tuning was done in choosing the priors. We chose a smaller prior variance for words because the support for words is often lower than the support for documents: some words occur only 2 or 3 times, while the number of occurrences of a document (the number of words in it) is usually much higher.

Four datasets were used in our experiments. We used subsets of the TDT-2 and Reuters-21578 datasets. In these datasets there are a number of categories, and each document belongs to one category; the documents in the same category are considered related. In each dataset, we selected 5 categories such that the number of documents per category is almost equal. Words that occurred fewer than 3 times were excluded. Our subset of TDT-2 includes 8,676 words and 1,584 documents, and our subset of Reuters-21578 includes 4,711 words and 103 documents. These two datasets were used only for evaluation based on ARDP. For the information retrieval task, we used the CRAN and MED datasets. The CRAN dataset includes 3,763 words, 1,398 documents, and 225 queries, and MEDLINE includes 7,014 words, 1,033 documents, and 30 queries. In both CODE and Bayes-CODE, queries were treated as new documents and mapped into the latent space; the Euclidean distance between queries and documents was then used to compute a score for retrieval. These data were used for evaluation based on both AP and ARDP.

Table I presents the ARDP results for all datasets. We implemented CODE and Bayes-CODE in a 2-dimensional space and then computed the ARDP score. Bayes-CODE outperforms CODE on all 4 datasets (on CRAN the performance is close).

TABLE I. ARDP in a 2-dimensional space on TDT-2, Reuters-21578, CRAN, and MEDLINE for CODE and Bayes-CODE (best results in bold).

To evaluate CODE versus Bayes-CODE in higher dimensions, we used CRAN and MEDLINE. Figures 2 and 3 show the results for average precision, and Figures 4 and 5 show the results for ARDP, on the CRAN and MEDLINE datasets respectively. As expected, the performance of CODE decreases as we increase the number of dimensions. Note that procedures such as tuning are not possible, since CODE is an unsupervised algorithm.

TABLE II. Query-based visualization with Bayes-CODE versus CODE with 2 dimensions: AP and ARDP on the CRAN and MEDLINE datasets (best results in bold).

Finally, we explored query-based visualization using Bayes-CODE. First, we selected the top-100 documents for each query using latent semantic indexing [6], a successful method in information retrieval. Note that it is possible to use Bayes-CODE for filtering documents directly, but here we need a method common to both CODE and Bayes-CODE for the sake of comparison. Then, we re-embedded all filtered documents, the query words, and the query in a 2-dimensional space via MDS, with distances taken from running Bayes-CODE in a 100-dimensional space. We compare the result to CODE's result in a 2-dimensional space. Table II presents the results. The performance of query-based visualization is dramatically better than that of CODE. Figures 6 and 7 show a typical snapshot of visualizing a query using CODE and using query-based visualization with Bayes-CODE, respectively, for a specific query from the MEDLINE dataset. Note the distinction between relevant and irrelevant documents in query-based visualization, while they are highly mixed in CODE. Additionally, in CODE the query is far away from the other entities, which makes interpretation difficult. Query words might help the user identify which area of the space is more relevant.

VII. CONCLUSION

In this paper, we developed a Bayesian model based on the state-of-the-art visualization model for co-occurrence data. Our experimental studies show the superiority of the Bayesian approach. However, better embedding in higher dimensions is not by itself useful for visualization. Therefore, we proposed a method to embed filtered data from a high-dimensional embedding for a given query (query-based visualization), which was successful in our experiments.

Query-based visualization can be the basis for an interactive user interface, in which a user receives her recommendations in a visual manner while getting a general picture of the relationships between her query, her keywords, and the top-N relevant documents. She can then explore the visual space and mark documents accordingly. She might also have the option of asking for more documents close to a document, or more words close to a word, and after re-embedding she obtains a more accurate picture.
Also, using relevance feedback techniques, known relevant documents can be used to formulate a more accurate query, find more relevant documents, and reconstruct the picture.

REFERENCES

[1] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Secaucus, NJ, USA, 2006.
[2] D. Blei and J. Lafferty. Correlated topic models. Advances in Neural Information Processing Systems, 18:147, 2006.
[3] D. M. Blei and J. D. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, New York, NY, USA, 2006. ACM.
[4] G. Bouchard. Efficient bounds for the softmax function, applications to inference in hybrid models. Advances in Neural Information Processing Systems, 2007.
[5] M. Cox and T. Cox. Multidimensional scaling. Handbook of Data Visualization, 2008.
[6] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[7] A. Globerson, G. Chechik, F. Pereira, and N. Tishby. Euclidean embedding of co-occurrence data. Journal of Machine Learning Research, 8:2265-2295, 2007.
[8] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183-233, 1999.
[9] M. Khoshneshin and W. N. Street. Collaborative filtering via Euclidean embedding. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 87-94, New York, NY, USA, 2010. ACM.
[10] J. Zhang. Visualization for Information Retrieval. Springer-Verlag, 2008.

Fig. 2. The average precision results for the CRAN dataset.
Fig. 3. The average precision results for the MEDLINE dataset.
Fig. 4. The ARDP results for the CRAN dataset.
Fig. 5. The ARDP results for the MEDLINE dataset.
Fig. 6. Visualization of a typical query from the MEDLINE dataset using CODE with 2 dimensions.
Fig. 7. Visualization of a typical query from the MEDLINE dataset using query-based visualization + Bayes-CODE with 100 dimensions.
