Kernels for Structured Natural Language Data

Jun Suzuki, Yutaka Sasaki, and Eisaku Maeda
NTT Communication Science Laboratories, NTT Corp.
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
{jun, sasaki, ...}

Abstract

This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels.

1 Introduction

Recent developments in kernel technology enable us to handle discrete structures, such as sequences, trees, and graphs. Kernel functions suitable for Natural Language Processing (NLP) have recently been proposed. Convolution Kernels [4, 12] demonstrate how to build kernels over discrete structures. Since texts can be analyzed as discrete structures, these discrete kernels have been applied to NLP tasks, such as sequence kernels [8, 9] for text categorization and tree kernels [1, 2] for (shallow) parsing. In this paper, we focus on tasks in the application areas of NLP, such as Machine Translation, Text Summarization, Text Categorization and Question Answering. These tasks require richer types of information within texts, such as syntactic and semantic information, for higher performance. However, syntactic and semantic information take the form of very complex structures that cannot be expressed by simple structures such as sequences and trees. The motivation of this paper is to propose kernels specifically suited to structured natural language data. The proposed kernels can handle several of the structures found within texts and calculate kernels over these structures at a practical cost and in practical time. Accordingly, these kernels can be efficiently applied to learning and clustering problems in NLP applications.

[Figure 1: Examples of structures within texts as determined by basic NLP tools, for the text "Koizumi Junichiro is prime minister of Japan": (1) result of a part-of-speech tagger, (2) result of a noun phrase chunker, (3) result of a named entity tagger, (4) result of a dependency structure analyzer, (5) semantic information from a dictionary (e.g., WordNet).]

2 Structured Natural Language Data for Application Tasks in NLP

In general, natural language data contain many kinds of syntactic and semantic structures. For example, texts have several levels of syntactic and semantic chunks, such as part-of-speech (POS) chunks, named entities (NEs), noun phrase (NP) chunks, sentences, and discourse segments, and these are bound by relation structures, such as dependency structures, anaphora, discourse relations and coreference. These syntactic and semantic structures can provide important information for understanding natural language and, moreover, for tackling real tasks in application areas of NLP. The accuracies of basic NLP tools such as POS taggers, NP chunkers, NE taggers, and dependency structure analyzers have improved to the point that they can help to develop real applications.

This paper proposes a method to handle these syntactic and semantic structures in a single framework: we combine the results of basic NLP tools to make one hierarchically structured data set. Figure 1 shows an example of the structures within texts analyzed by basic NLP tools that are currently available and that offer easy use and high performance.

As shown in Figure 1, structures in texts can be hierarchical or recursive "graphs in graph": a certain node can be constructed or characterized by other graphs. Nodes usually have several kinds of attributes, such as words, POS tags, semantic information such as WordNet [3], and classes of named entities. Moreover, the relations between nodes are usually directed. Therefore, we should employ a (1) directed, (2) multi-labeled, and (3) hierarchically structured graph to model structured natural language data.

Let V be a set of vertices (or nodes) and E be a set of edges (or links). Then, a graph G = (V, E) is called a directed graph if E is a set of directed links E ⊆ V × V.

Definition 1 (Multi-Labeled Graph) Let Γ be a set of labels (or attributes) and M ⊆ V × Γ be label allocations. Then, G = (V, E, M) is called a multi-labeled graph.

Definition 2 (Hierarchically Structured Graph) Let G_i = (V_i, E_i) be a subgraph of G = (V, E), where V_i ⊆ V and E_i ⊆ E, and let 𝒢 = {G_1, ..., G_n} be a set of subgraphs of G. F ⊆ V × 𝒢 represents a set of vertical links from a node v ∈ V to a subgraph G_i. Then, G = (V, E, 𝒢, F) is called a hierarchically structured graph if each node has at most one vertical edge. Intuitively, a vertical link f_{i,j} ∈ F from node v_i to graph G_j indicates that node v_i contains graph G_j.

Finally, in this paper, we represent structured natural language data by using a multi-labeled hierarchical directed graph.

Definition 3 (Multi-Labeled Hierarchical Directed Graph) G = (V, E, M, 𝒢, F) is a multi-labeled hierarchical directed graph.
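As a concrete illustration of Definitions 1-3, the following is a minimal sketch (not from the paper) of how a multi-labeled hierarchical directed graph could be represented in Python; the class and field names are hypothetical and simply mirror the notation G = (V, E, M, 𝒢, F).

from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass
class Node:
    """A vertex v in V together with its label allocation from M (a set of attributes)."""
    ident: int
    attributes: Set[str] = field(default_factory=set)

@dataclass
class HDAG:
    """A multi-labeled hierarchical directed graph G = (V, E, M, G, F)."""
    nodes: Dict[int, Node] = field(default_factory=dict)        # V (M folded into each Node)
    edges: Set[Tuple[int, int]] = field(default_factory=set)    # E: directed links (v_i, v_j)
    subgraphs: Dict[str, "HDAG"] = field(default_factory=dict)  # {G_1, ..., G_n}
    vertical: Dict[int, str] = field(default_factory=dict)      # F: node id -> subgraph name (at most one)

    def upsilon(self, v: int) -> Optional["HDAG"]:
        """Upsilon(v): the subgraph contained in node v, or None if v is a terminal node."""
        name = self.vertical.get(v)
        return self.subgraphs.get(name) if name is not None else None

# Example: an NP chunk node that contains a two-word subgraph "prime minister".
inner = HDAG(nodes={1: Node(1, {"prime", "JJ"}), 2: Node(2, {"minister", "NN"})},
             edges={(1, 2)})
outer = HDAG(nodes={0: Node(0, {"NP"})}, subgraphs={"np1": inner}, vertical={0: "np1"})

Only the containment relation used later, Υ(v), is made explicit here; everything else stays as plain dictionaries and sets.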

[Figure 2: Examples of Hierarchical Directed Graph structures G¹ and G² (these are also HDAGs); each letter represents an attribute. The left panel is a graphical model of structures within a text (chunks and the relations between chunks); the right panels show the corresponding Multi-Labeled Hierarchical Directed Graphs, with nodes q_i and r_i, subgraphs G_j, directed links e_{i,j}, and vertical links f_{i,j}.]

Figure 2 shows examples of multi-labeled hierarchical directed graphs. In this paper, we call a multi-labeled hierarchical directed graph simply a hierarchical directed graph.

3 Kernels on Hierarchical Directed Acyclic Graph

In order to calculate kernels efficiently, we add one constraint: the hierarchical directed graph must contain no cyclic paths. First, we define a path on a Hierarchical Directed Graph. If a node has no vertical link, it is called a terminal node, denoted as T ⊆ V; otherwise it is a non-terminal node, denoted as T̄ ⊆ V.

Definition 4 (Hierarchical Path (HiP)) Let p = ⟨v_i, e_{i,j}, v_j, ..., v_k, e_{k,l}, v_l⟩ be a path. Let Υ(v) be a function that returns the subgraph G_i linked with v by a vertical link if v ∈ T̄. Let P(G) be a function that returns the set of all HiPs in G, where links between v ∈ G and v′ ∉ G are ignored. Then, p_h = ⟨h(v_i), e_{i,j}, h(v_j), ..., h(v_k), e_{k,l}, h(v_l)⟩ is defined as a HiP, where h(v) returns v⟨p_h^x⟩, p_h^x ∈ P(G_x) s.t. G_x = Υ(v), if v ∈ T̄, and otherwise returns v. Intuitively, a HiP is a path whose nodes may themselves contain paths, e.g., p_h = ⟨v_i, e_{i,j}, v_j⟨v_m, e_{m,n}, v_n⟩, ..., v_k, e_{k,l}, v_l⟩.

Definition 5 (Hierarchical Directed Acyclic Graph (HDAG)) A hierarchical directed graph G = (V, E, M, 𝒢, F) is an HDAG if there is no HiP from any node v to the same node v.

A primitive feature for defining kernels on HDAGs is the hierarchical attribute subsequence.

Definition 6 (Hierarchical Attribute Subsequence (HiAS)) A HiAS is a list of attributes with hierarchical information, extracted from the nodes on a HiP.

For example, let p_h = ⟨v_i, e_{i,j}, v_j⟨v_m, e_{m,n}, v_n⟩, ..., v_k, e_{k,l}, v_l⟩ be a HiP; then the HiASs in p_h are written as τ(p_h) = {⟨a_i, a_j⟨a_m, a_n⟩, ..., a_k, a_l⟩}, that is, all combinations over the attributes a_i ∈ τ(v_i), where τ(v) of node v is a function that returns the set of attributes allocated to node v, and τ(p_h) of HiP p_h is a function that returns all possible HiASs extracted from HiP p_h. Γ* denotes all possible HiASs constructed from the attributes in Γ, and γ_i ∈ Γ* denotes the i-th HiAS.

An explicit representation of the feature vector of an HDAG kernel is defined as φ(G) = (φ_1(G), ..., φ_{|Γ*|}(G)), where φ represents the explicit feature mapping from an HDAG to the numerical feature space. The value of φ_i(G) is the weighted number of occurrences of γ_i in G. According to this approach, the HDAG kernel, K(G¹, G²) = Σ_{i=1}^{|Γ*|} φ_i(G¹) φ_i(G²), calculates the inner product of the weighted common HiASs in two HDAGs, G¹ and G².

[Figure 3: An example of a Hierarchical Directed Graph with weight factors: the weight of each label L_i, the weight of each node v_i, the weight e_{i,j} of the directed link from v_i to v_j, and the weight f_{i,j} of the vertical link from v_i to G_j.]

In this paper, we write "s.t." for "such that", since it is simple.

K_{HDAG}(G^1, G^2) = \sum_{\gamma_i \in \Gamma^*} \; \sum_{p_h^1 \in P(G^1) \,\text{s.t.}\, \gamma_i \in \tau(p_h^1)} \; \sum_{p_h^2 \in P(G^2) \,\text{s.t.}\, \gamma_i \in \tau(p_h^2)} W_{\gamma_i}(p_h^1)\, W_{\gamma_i}(p_h^2),   (1)

where W_{γ_i}(p_h) represents the weight value of HiAS γ_i in HiP p_h. This weight is determined by

W_{\gamma_i}(p_h) = \prod_{v \in V(p_h)} W_V(v) \prod_{e_{i,j} \in E(p_h)} W_E(v_i, v_j) \prod_{f_{i,j} \in F(p_h)} W_F(v_i, G_j) \prod_{a \in \tau(\gamma_i)} W_\Gamma(a),   (2)

where W_V(v), W_E(v_i, v_j), W_F(v_i, G_j), and W_Γ(a) represent the weight of node v, of the link from v_i to v_j, of the vertical link from v_i to subgraph G_j, and of attribute a, respectively. An example of how each weight factor is given is shown in Figure 3. In the case of NLP data, for example, W_Γ(a) might be given by a tf·idf score computed from a large document collection, W_V(v) by the type of chunk (word, phrase or named entity), W_E(v_i, v_j) by the type of relation between v_i and v_j, and W_F(v_i, G_j) by the number of nodes in G_j.

Soft Structural Matching Frameworks

Since HDAG kernels permit not only exact matching of substructures but also approximate matching, we add the frameworks of the node skip and the relaxation of hierarchical information.

First, we discuss the framework of the node skip. We introduce a decay function Λ_V(v) (0 < Λ_V(v) ≤ 1), which represents the cost of skipping node v when extracting HiASs from HiPs; this is almost the same architecture as [8]. For example, under node skips the HiAS ⟨a_1⟨*, a_3, *⟩, a_5⟩ is extracted from the HiP ⟨v_1⟨v_2, v_3, v_4⟩, v_5⟩, where * is the explicit representation of a skipped node.

Next, in the case of the relaxation of hierarchical information, we perform two processes: (1) we form one hierarchy if there is multiple hierarchy information at the same point, for example, ⟨a_i⟨a_j⟩⟨a_k⟩⟩ becomes ⟨a_i⟨a_j, a_k⟩⟩; and (2) we delete hierarchical information if there exists only one node, for example, ⟨⟨a_i⟩, a_j, a_k⟩ becomes ⟨a_i, a_j, a_k⟩. These two frameworks achieve approximate substructure matching automatically.

Table 1 shows an explicit representation of the common HiASs (features) of G¹ and G² in Figure 2. For the sake of simplicity, all the weights W_V(v), W_E(v_i, v_j), W_F(v_i, G_j), and W_Γ(a) are taken to be 1, and for all v, Λ_V(v) = λ if v has at least one attribute and Λ_V(v) = 1 otherwise.

[Table 1: Common HiASs of G¹ and G² in Figure 2 under the node skip (NS) and under the node skip plus the relaxation of hierarchical information (NS+HR): for each setting, the HiASs found in G¹ and in G², the common HiASs, and their values (powers of the node-skip decay λ).]
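To make equation (2) and the node-skip decay concrete, here is a small sketch (not from the paper) that accumulates the weight of a single HiAS as a product over the nodes, directed links, vertical links, and attributes it uses; the weight dictionaries and argument names are hypothetical stand-ins for W_V, W_E, W_F, W_Γ, and Λ_V.

from typing import Dict, Iterable, Tuple

def hias_weight(nodes: Iterable[int],
                edges: Iterable[Tuple[int, int]],
                vertical_links: Iterable[Tuple[int, str]],
                attributes: Iterable[str],
                skipped_nodes: Iterable[int],
                w_v: Dict[int, float],
                w_e: Dict[Tuple[int, int], float],
                w_f: Dict[Tuple[int, str], float],
                w_gamma: Dict[str, float],
                lam: float = 0.5) -> float:
    """Weight of one HiAS in one HiP, following the product form of equation (2),
    multiplied by the decay lam (a single constant here, standing in for Lambda_V(v))
    once for every skipped node."""
    w = 1.0
    for v in nodes:                 # product over W_V(v)
        w *= w_v[v]
    for e in edges:                 # product over W_E(v_i, v_j)
        w *= w_e[e]
    for f in vertical_links:        # product over W_F(v_i, G_j)
        w *= w_f[f]
    for a in attributes:            # product over W_Gamma(a)
        w *= w_gamma[a]
    for _ in skipped_nodes:         # node-skip framework: one factor of lam per skip
        w *= lam
    return w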

Efficient Recursive Computation

In general, when the dimension of the feature space Γ* becomes very high, it is computationally infeasible to generate the feature vector φ(G) explicitly. We therefore define an efficient calculation formula between two HDAGs G¹ and G², which is written as

K_{HDAG}(G^1, G^2) = \sum_{q \in Q} \sum_{r \in R} K(q, r),   (3)

where Q = {q_1, ..., q_{|Q|}} and R = {r_1, ..., r_{|R|}} represent the nodes of G¹ and G², respectively. K(q, r) represents the sum of the weighted common HiASs that are extracted from the HiPs whose sink nodes are q and r:

K(q, r) = J_{G^1, G^2}(q, r)\, H(q, r) + \hat{H}(q, r)\, I(q, r) + I(q, r).   (4)

Function I(q, r) returns the weighted number of common attributes of nodes q and r,

I(q, r) = W_V(q)\, W_V(r) \sum_{a_1 \in \tau(q)} \sum_{a_2 \in \tau(r)} W_\Gamma(a_1)\, W_\Gamma(a_2)\, \delta(a_1, a_2),   (5)

where δ(a_1, a_2) = 1 if a_1 = a_2, and 0 otherwise. Let H(q, r) be a function that returns the sum of the weighted common HiASs between q and r, including those inside Υ(q) and Υ(r):

H(q, r) = \begin{cases} I(q, r) + (I(q, r) + \Lambda_V(q)\Lambda_V(r))\,\hat{H}(q, r), & \text{if } q, r \in \bar{T} \\ I(q, r), & \text{otherwise}, \end{cases}   (6)

\hat{H}(q, r) = \sum_{s \in G_i^1 \,\text{s.t.}\, G_i^1 = \Upsilon(q)} \; \sum_{t \in G_j^2 \,\text{s.t.}\, G_j^2 = \Upsilon(r)} W_F(q, G_i^1)\, W_F(r, G_j^2)\, J_{G_i^1, G_j^2}(s, t).   (7)
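Of the pieces above, equation (5) translates most directly into code: δ restricts the double sum to attributes shared by the two nodes. The sketch below is only an illustration under the same hypothetical weight-dictionary convention as before, not the paper's implementation.

from typing import Dict, Set

def node_similarity(q_attrs: Set[str], r_attrs: Set[str],
                    w_v_q: float, w_v_r: float,
                    w_gamma: Dict[str, float]) -> float:
    """I(q, r) of equation (5): the weighted number of attributes common to q and r.
    Since delta(a1, a2) is 1 only when a1 == a2, the double sum collapses to a sum
    of W_Gamma(a)^2 over the intersection of the two attribute sets."""
    return w_v_q * w_v_r * sum(w_gamma[a] ** 2 for a in q_attrs & r_attrs)

# Example: two nodes sharing the single attribute "Japan".
print(node_similarity({"Japan", "NNP"}, {"Japan", "Country"},
                      1.0, 1.0, {"Japan": 1.0, "NNP": 1.0, "Country": 1.0}))  # -> 1.0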

Let J_{x,y}(q, r), J'_{x,y}(q, r), and J''_{x,y}(q, r), where x and y are (sub)graphs, be recursive functions used to calculate H(q, r) and K(q, r):

J_{x,y}(q, r) = J'_{x,y}(q, r)\, H(q, r) + H(q, r),   (8)

J'_{x,y}(q, r) = \begin{cases} \sum_{t \in \{\psi(r) \cap V(y)\}} W_E(q, t)\,\bigl(\Lambda'_V(t)\, J'_{x,y}(q, t) + J_{x,y}(q, t)\bigr), & \text{if } \psi(r) \neq \emptyset \\ 0, & \text{otherwise}, \end{cases}   (9)

J''_{x,y}(q, r) = \begin{cases} \sum_{s \in \{\psi(q) \cap V(x)\}} W_E(s, r)\,\bigl(\Lambda'_V(s)\, J''_{x,y}(s, r) + J'_{x,y}(s, r)\bigr), & \text{if } \psi(q) \neq \emptyset \\ 0, & \text{otherwise}, \end{cases}   (10)

where Λ'_V(v) = Λ_V(v) · Π_{t ∈ G_i s.t. G_i = Υ(v)} Λ'_V(t) if v ∈ T̄, and Λ'_V(v) = Λ_V(v) otherwise. Function ψ(q) returns the set of nodes that have direct links to node q; ψ(q) = ∅ means that no node has a direct link to q.

Next, we show the formulas used with the framework of the relaxation of hierarchical information. The functions have the same meanings as in the previous formulas, and H_1(q, r) and H_2(q, r) below are the counterparts of H(q, r) that descend into Υ(r) and Υ(q), respectively:

K(q, r) = \tilde{J}_{G^1, G^2}(q, r)\, \tilde{H}(q, r) + \bigl(H_1(q, r) + H_2(q, r)\bigr)\, I(q, r) + I(q, r),   (11)

\tilde{H}(q, r) = \bigl(H_1(q, r) + H_2(q, r)\bigr)\, I(q, r) + H_2(q, r) + I(q, r),   (12)

H_1(q, r) = \begin{cases} \sum_{t \in G_j^2 \,\text{s.t.}\, G_j^2 = \Upsilon(r)} W_F(r, G_j^2)\, \tilde{H}(q, t), & \text{if } r \in \bar{T} \\ 0, & \text{otherwise}, \end{cases}   (13)

H_2(q, r) = \begin{cases} \sum_{s \in G_i^1 \,\text{s.t.}\, G_i^1 = \Upsilon(q)} W_F(q, G_i^1)\, \tilde{H}(s, r) + \hat{H}(q, r), & \text{if } q, r \in \bar{T} \\ \sum_{s \in G_i^1 \,\text{s.t.}\, G_i^1 = \Upsilon(q)} W_F(q, G_i^1)\, \tilde{H}(s, r), & \text{if } q \in \bar{T} \\ 0, & \text{otherwise}, \end{cases}   (14)

\tilde{J}_{x,y}(q, r) = \tilde{J}'_{x,y}(q, r)\, \tilde{H}(q, r),   (15)

\tilde{J}'_{x,y}(q, r) = \begin{cases} \sum_{t \in \{\psi(r) \cap V(y)\}} W_E(q, t)\,\bigl(\Lambda'_V(t)\, \tilde{J}'_{x,y}(q, t) + \tilde{J}_{x,y}(q, t) + \tilde{H}(q, t)\bigr), & \text{if } \psi(r) \neq \emptyset \\ 0, & \text{otherwise}. \end{cases}   (16)

Functions I(q, r), J''_{x,y}(q, r), and Ĥ(q, r) are the same as those shown above.

According to equation (3), given the recursive definitions above, the kernel value between two HDAGs can be calculated in time O(|Q||R|). In actual use, we may want to evaluate only the subset of all HiASs whose sizes are under n when determining the kernel value, because of the problem discussed in [1]. This can be realized simply by not calculating those HiASs whose size exceeds n when calculating K(q, r); the calculation cost then becomes O(n|Q||R|).

Finally, we normalize the values of the HDAG kernels to remove any bias introduced by the number of nodes in the graphs. This normalization corresponds to the standard unit-norm normalization of examples in the feature space corresponding to the kernel space, K̂(x, y) = K(x, y) · (K(x, x) K(y, y))^{-1/2} [4].

We now describe an efficient processing algorithm. First, as a pre-process, the nodes are sorted under two conditions, V(Υ(v)) ≺ v and Ψ(v) ≺ v, where Ψ(v) represents all nodes that have a path to v. The dynamic programming technique can then be used to compute HDAG kernels very efficiently: by following the sorted order, the values needed to calculate K(q, r) have already been obtained in earlier steps of the calculation.
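The unit-norm normalization at the end of this section is easy to apply once raw kernel values are available. The following sketch assumes a function hdag_kernel(G1, G2) implementing the recursion above (a hypothetical name, not defined in the paper) and builds the normalized Gram matrix K̂(x, y) = K(x, y)(K(x, x)K(y, y))^{-1/2}.

import math
from typing import Callable, List, Sequence

def normalized_gram_matrix(graphs: Sequence, kernel: Callable) -> List[List[float]]:
    """Gram matrix of the normalized kernel K_hat over a list of HDAGs.

    kernel(x, y) is assumed to return the raw (unnormalized) HDAG kernel value;
    the diagonal entries K(x, x) are computed once and reused for normalization."""
    n = len(graphs)
    raw = [[kernel(graphs[i], graphs[j]) for j in range(n)] for i in range(n)]
    return [[raw[i][j] / math.sqrt(raw[i][i] * raw[j][j]) for j in range(n)]
            for i in range(n)]

# Usage (hypothetical): K = normalized_gram_matrix(questions, hdag_kernel)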

4 Experiments

Our aim was to test the effect of using the richer syntactic and semantic structures available within texts, which can be treated for the first time by our proposed method. We evaluated the performance of the proposed method on the real NLP task of Question Classification, which is similar to Text Classification except that it requires many more semantic features within texts [7, 10].

[Figure 4: Examples of the input data of the comparison methods for the question "Who is prime minister of Japan?": the word order of attributes (Seq-K), hierarchical chunks and their relations (HDAG-K), and the dependency structures of attributes (DS-K, DAG-K).]

[Table 2: Results of question classification by SVM with the comparison kernel functions, evaluated by F-measure for the question types TIME TOP, LOCATION, ORGANIZATION, and NUMEX and for n = 1, 2, 3, 4, comparing HDAG-K, DAG-K, DS-K, Seq-K, and BOW-K.]

We used three different QA data sets written in Japanese [10]. We compared the performance of the proposed kernel, the HDAG Kernel (HDAG-K), with DAG kernels (DAG-K), Dependency Structure kernels (DS-K) [2], and sequence kernels (Seq-K) [9]. Moreover, we evaluated the bag-of-words kernel (BOW-K) [6], that is, bag-of-words with polynomial kernels, as the baseline method.

The main difference between the methods is their ability to treat syntactic and semantic information within texts. Figure 4 shows the differences in the input objects of the methods; for better understanding, these examples are shown in English. We used words, named entity tags, and semantic information [5] as attributes. Seq-K only treats word order, DS-K and DAG-K treat dependency structures, and HDAG-K treats the NP and NE chunks together with their dependency structures. We used the same formula as our proposed method for DAG-K; comparing HDAG-K to DAG-K therefore shows the difference in performance between handling the hierarchical structures and not handling them. We extended Seq-K and DS-K to improve their overall performance and to establish a more equal evaluation, under the same conditions, against our proposed method. Note that although DAG-K and DS-K handle input objects of the same form, their kernel calculation methods differ, as do their return values. We used the node-skip parameter Λ_V(v) = 0.5 for all nodes v in each comparison, and we used SVM [11] as the kernel-based machine learning algorithm.

We evaluated the performance of the comparison methods on the question types TIME TOP, ORGANIZATION, LOCATION, and NUMEX, which are defined in the CRL QA-data¹. Table 2 shows the average F-measure as evaluated by 5-fold cross validation; n in Table 2 indicates the threshold on the number of attributes, that is, we evaluated only those HiASs that contain fewer than n attributes in each kernel calculation. As shown in the table, HDAG-K gave the best performance in the experiments.

The experiments in this paper were designed to investigate how performance can be improved by using the richer syntactic and semantic structures within texts. In the task of Question Classification, a given question is classified into a Question Type, which reflects the intention of the question. These results indicate that our approach, incorporating richer structural features within texts, is well suited to tasks in NLP applications.

¹ …sekine/Project/crlqa/
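For readers who want to see how such a precomputed kernel is typically plugged into an SVM for question classification, here is a small sketch using scikit-learn's precomputed-kernel interface. This is only an illustration: the paper does not name its SVM implementation, and the tiny Gram matrix and labels below are made up rather than taken from the CRL QA-data.

import numpy as np
from sklearn.svm import SVC

# Hypothetical normalized Gram matrix over 4 training questions (e.g. produced by
# the normalized_gram_matrix sketch above with an hdag_kernel function).
K_train = np.array([[1.0, 0.3, 0.1, 0.2],
                    [0.3, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.4],
                    [0.2, 0.1, 0.4, 1.0]])
y_train = ["LOCATION", "LOCATION", "NUMEX", "NUMEX"]   # question-type labels

clf = SVC(kernel="precomputed")   # SVM over a user-supplied kernel matrix
clf.fit(K_train, y_train)

# At test time, each row holds the kernel values between one test question and
# every training question, i.e. K_hat(test_i, train_j).
K_test = np.array([[0.25, 0.35, 0.05, 0.10]])
print(clf.predict(K_test))        # -> array with one predicted question type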

The original DS-K requires exact matching of the tree structure, even when it is extended for more flexible matching; this is why DS-K showed the worst performance in our experiments. The sequence, DAG, and HDAG kernels offer approximate matching through the framework of the node skip, which produces better performance in tasks that evaluate the intention of texts. The structure of an HDAG approaches that of a DAG if we do not consider the hierarchical structure. In addition, the structures of sequences and trees are entirely included in that of the DAG. Thus, the HDAG kernel subsumes several of the discrete kernels, such as sequence, tree, and graph kernels.

5 Conclusions

This paper proposed HDAG kernels, which can handle more of the rich syntactic and semantic information present within texts. Our proposed method is a very general framework for handling structured natural language data. We evaluated the performance of HDAG kernels on the real NLP task of question classification. Our experiments showed that HDAG kernels offer better performance than sequence kernels, tree kernels, and the baseline bag-of-words kernels when the target task requires the use of the richer information within texts.

References

[1] M. Collins and N. Duffy. Convolution Kernels for Natural Language. In Proc. of Neural Information Processing Systems (NIPS 2001), 2001.
[2] M. Collins and N. Duffy. Parsing with a Single Neuron: Convolution Kernels for Natural Language Problems. Technical Report UCSC-CRL-01-01, UC Santa Cruz, 2001.
[3] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.
[4] D. Haussler. Convolution Kernels on Discrete Structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz, 1999.
[5] S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, H. Nakaiwa, K. Ogura, Y. Oyama, and Y. Hayashi, editors. The Semantic Attribute System, Goi-Taikei: A Japanese Lexicon, volume 1. Iwanami Publishing, 1997. (in Japanese).
[6] T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proc. of the European Conference on Machine Learning (ECML '98), pages 137-142, 1998.
[7] X. Li and D. Roth. Learning Question Classifiers. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), 2002.
[8] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text Classification Using String Kernels. Journal of Machine Learning Research, 2:419-444, 2002.
[9] N. Cancedda, E. Gaussier, C. Goutte, and J.-M. Renders. Word-Sequence Kernels. Journal of Machine Learning Research, 3:1059-1082, 2003.
[10] J. Suzuki, H. Taira, Y. Sasaki, and E. Maeda. Question Classification using HDAG Kernel. In Workshop on Multilingual Summarization and Question Answering (2003), pages 61-68, 2003.
[11] V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
[12] C. Watkins. Dynamic Alignment Kernels. Technical Report CSD-TR-98-11, Royal Holloway, University of London, Computer Science Department, 1999.
