MACHINE LEARNING. Kernel Methods. Alessandro Moschitti
1 MACHINE LEARNING Kernel Methods Alessandro Moschitti Department of Information and Communication Technology, University of Trento
2 Communications Next Lectures will be on February 11, 13, 16, 18 We start at 12:00 (sharp)
3 Outline (1) PART I: Theory Motivations Kernel-Based Machines Kernel Methods Kernel Trick Mercer's conditions Kernel operators Basic Kernels Linear Kernel Polynomial Kernel Lexical kernel String Kernel
4 Outline (2) Tree kernels: subtree, subset tree and partial tree kernels PART II: Applications to Question Answering Question Classification Kernel Combinations Answer Classification Experiments and Results Conclusions
5 Motivation (1) Feature design is the most difficult aspect in designing a learning system: it is a complex and difficult phase, e.g. for structural feature representations deep knowledge and intuition are required, and design problems arise when the phenomenon is described by many features
6 Motivation (2) Kernel methods alleviate such problems: structures are represented in terms of their substructures; high-dimensional, implicit and abstract feature spaces; they generate a high number of features and Support Vector Machines select the relevant ones; automatic feature engineering as a side effect
7 Theory Kernel Trick Kernel Based Machines Basic Kernel Properties Kernel Types
8 The main idea of Kernel Functions Mapping vectors into a space where they are linearly separable: x ↦ φ(x) [figure: x and o points, not separable in the input space, become separable after the mapping φ]
9 A mapping example Given two masses m1 and m2, one is constrained; apply a force fa to the mass m1. Experiment features: m1, m2 and fa. We want to learn a classifier that tells when the mass m1 will get far away from m2. If we consider the Gravitational Newton Law f(m1, m2, r) = C m1 m2 / r², we need to find when f(m1, m2, r) < fa
10 Mapping Example x = (x1, ..., xn), φ(x) = (φ1(x), ..., φn(x)). The gravitational law is not linear, so we need to change space: (fa, m1, m2, r) → (k, x, y, z) = (ln fa, ln m1, ln m2, ln r). As ln f(m1, m2, r) = ln C + ln m1 + ln m2 − 2 ln r = c + x + y − 2z, we need the hyperplane ln fa − ln m1 − ln m2 + 2 ln r − ln C = 0; with it we can decide without error if the mass will get far away or not
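The log-space decision above can be checked numerically. A minimal sketch, assuming an arbitrary constant C = 1 (any positive C works the same way):

```python
import math

def gravitational_force(m1, m2, r, C=1.0):
    # Newton's law: f(m1, m2, r) = C * m1 * m2 / r^2
    return C * m1 * m2 / r ** 2

def phi(fa, m1, m2, r):
    # feature map into log space: (k, x, y, z) = (ln fa, ln m1, ln m2, ln r)
    return (math.log(fa), math.log(m1), math.log(m2), math.log(r))

def gets_away(fa, m1, m2, r, C=1.0):
    # linear decision in the mapped space:
    # ln fa - ln m1 - ln m2 + 2 ln r - ln C > 0  <=>  fa > f(m1, m2, r)
    k, x, y, z = phi(fa, m1, m2, r)
    return k - x - y + 2 * z - math.log(C) > 0
```

In the original space the decision surface fa = C m1 m2 / r² is non-linear; after the mapping it is the hyperplane k − x − y + 2z − ln C = 0.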
11 A kernel-based Machine Perceptron training: w0 ← 0; b0 ← 0; k ← 0; R ← max_{1≤i≤ℓ} ‖xi‖ repeat for i = 1 to ℓ: if yi(wk · xi + bk) ≤ 0 then wk+1 = wk + η yi xi; bk+1 = bk + η yi R²; k = k + 1 endif endfor until no error is found return k, (wk, bk)
15 Changing Gradient Representation At each step of the perceptron, training data is only added with a certain weight: w = Σ_{j=1..ℓ} αj yj xj. So the classification function is sgn(w · x + b) = sgn(Σ_{j=1..ℓ} αj yj (xj · x) + b). Note that the data only appears in the scalar product
16 Dual Perceptron algorithm and Kernel functions We can rewrite the classification function as h(x) = sgn(w · φ(x) + b) = sgn(Σ_{j=1..ℓ} αj yj φ(xj) · φ(x) + b) = sgn(Σ_{j=1..ℓ} αj yj k(xj, x) + b), as well as the updating rule: if yi(Σ_{j=1..ℓ} αj yj k(xj, xi) + b) ≤ 0 then αi ← αi + η. The learning rate η does not affect the algorithm, so we set η = 1
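A minimal sketch of the dual perceptron with a pluggable kernel, following the update above with η = 1 (the names `kernel_perceptron` and `predict` are illustrative):

```python
def kernel_perceptron(X, y, k, epochs=1000):
    """Dual perceptron: alpha_i counts the mistakes made on example i."""
    n = len(X)
    alpha = [0.0] * n
    b = 0.0
    R2 = max(k(x, x) for x in X)  # R^2, used for the bias update
    for _ in range(epochs):
        errors = 0
        for i in range(n):
            s = sum(alpha[j] * y[j] * k(X[j], X[i]) for j in range(n)) + b
            if y[i] * s <= 0:        # mistake: strengthen example i
                alpha[i] += 1.0
                b += y[i] * R2
                errors += 1
        if errors == 0:              # a full clean pass: converged
            break
    return alpha, b

def predict(X, y, alpha, b, k, x):
    s = sum(alpha[j] * y[j] * k(X[j], x) for j in range(len(X))) + b
    return 1 if s > 0 else -1
```

With k(x, z) = (x · z + 1)², this learns XOR, which no perceptron can represent in the input space.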
17 Kernels in Support Vector Machines In Soft Margin SVMs we maximize the dual objective Σ_{i=1..ℓ} αi − ½ Σ_{i,j=1..ℓ} αi αj yi yj (xi · xj), subject to 0 ≤ αi ≤ C and Σ_i αi yi = 0. By using kernel functions we rewrite the problem with k(xi, xj) in place of xi · xj
18 Kernel Function Definition Kernels are scalar products of mapping functions: k(x, z) = φ(x) · φ(z), with x ∈ R^n and φ(x) = (φ1(x), φ2(x), ..., φm(x)) ∈ R^m
19 The Kernel Gram Matrix With KM-based learning, the sole information used from the training data set is the Kernel Gram Matrix K, Kij = k(xi, xj). If the kernel is valid, K is symmetric and positive semi-definite.
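A sketch of the Gram matrix construction and of the validity check (symmetry plus non-negative eigenvalues, up to numerical tolerance); the helper names are illustrative:

```python
import numpy as np

def gram(kernel, X):
    # Gram matrix: K[i, j] = k(x_i, x_j) on the sample X
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def is_valid_gram(K, tol=1e-9):
    # valid kernel => symmetric, positive semi-definite Gram matrix
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

X = np.random.RandomState(0).randn(10, 3)
K_lin = gram(np.dot, X)                                      # linear kernel
K_rbf = gram(lambda x, z: np.exp(-np.sum((x - z) ** 2)), X)  # RBF kernel
```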
20 Valid Kernels
21 Valid Kernels cont d If the matrix is positive semi-definite then we can find a mapping φ implementing the kernel function
22 Valid Kernel operations k(x,z) = k1(x,z) + k2(x,z); k(x,z) = k1(x,z) · k2(x,z); k(x,z) = α k1(x,z), α ≥ 0; k(x,z) = f(x) f(z); k(x,z) = k1(φ(x), φ(z)); k(x,z) = x'Bz, with B positive semi-definite
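These closure properties can be illustrated numerically on Gram matrices (note that the product of two kernels corresponds to the elementwise, i.e. Hadamard, product of their Gram matrices):

```python
import numpy as np

def psd(K, tol=1e-9):
    # valid Gram matrix: symmetric with eigenvalues >= 0 (up to tolerance)
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

rng = np.random.RandomState(1)
X = rng.randn(8, 2)

lin = X @ X.T                                                # linear kernel
rbf = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel
f = np.cos(X[:, 0])
fk = np.outer(f, f)                                          # k(x,z) = f(x) f(z)

assert psd(lin + rbf)   # k1 + k2
assert psd(lin * rbf)   # k1 * k2 (Schur product theorem)
assert psd(0.5 * rbf)   # alpha * k1, alpha >= 0
assert psd(fk)          # f(x) f(z)
```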
23 Basic Kernels for unstructured data Linear Kernel Polynomial Kernel Lexical kernel String Kernel
24 A simple classification problem: Text Categorization [figure: words such as Berlusconi, Bush, declares, acquires, Totti, Inzaghi, Wonderful, Yesterday, before, war, match, elections associated with the categories Politics (C1), Economics (C2), ..., Sport (Cn)]
25 Text Classification Problem Given a set of target categories C = {C1, .., Cn} and the set T of documents, define f : T → 2^C. VSM (Salton, 1989): features are dimensions of a Vector Space; documents and categories are vectors of feature weights; d is assigned to Ci if d · Ci > thi
26 The Vector Space Model d1 (Politics): "Bush declares war. Berlusconi gives support." d2 (Sport): "Wonderful Totti in the yesterday match against Berlusconi's Milan." d3 (Economics): "Berlusconi acquires Inzaghi before elections." [figure: d1, d2, d3 as vectors in the space of the terms Berlusconi, Totti, Bush, with the category vectors C1 (Politics) and C2 (Sport)]
27 Linear Kernel In Text Categorization documents are word vectors: Φ(dx) = x = (0, .., 1, .., 0, ...) with non-zero entries for buy, acquisition, stocks, sell, market; Φ(dz) = z = (0, .., 1, .., 0, ...) with non-zero entries for buy, company, stocks, sell. The dot product x · z counts the number of features in common; this provides a sort of similarity
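With binary word features the linear kernel is just set intersection; a minimal sketch:

```python
def bow(doc):
    # binary bag-of-words representation
    return set(doc.lower().split())

def linear_kernel(d1, d2):
    # dot product of binary word vectors = number of shared words
    return len(bow(d1) & bow(d2))

k = linear_kernel("buy acquisition stocks sell market",
                  "buy company stocks sell")
# shared features: buy, stocks, sell
```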
28 Feature Conjunction (Polynomial Kernel) The initial vectors are mapped into a higher space, more expressive as it encodes feature conjunctions such as Stock+Market vs. Downtown+Market. We can smartly compute the scalar product: Φ(x1, x2) = (x1², x2², √2 x1x2, √2 x1, √2 x2, 1); Φ(x) · Φ(z) = x1²z1² + x2²z2² + 2 x1x2z1z2 + 2 x1z1 + 2 x2z2 + 1 = (x1z1 + x2z2 + 1)² = (x · z + 1)² = K_Poly(x, z)
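The equivalence between the explicit degree-2 feature map φ(x) = (x1², x2², √2 x1x2, √2 x1, √2 x2, 1) and the direct kernel computation (x · z + 1)² can be verified numerically:

```python
import math

def phi(x):
    # explicit degree-2 feature map
    x1, x2 = x
    s = math.sqrt(2)
    return (x1 * x1, x2 * x2, s * x1 * x2, s * x1, s * x2, 1.0)

def k_poly(x, z):
    # the same value computed without building the feature space
    return (x[0] * z[0] + x[1] * z[1] + 1) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
assert abs(explicit - k_poly(x, z)) < 1e-9
```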
29 Using character sequences φ("bank") = x = (0, .., 1, .., 0, ...) with non-zero entries for the substrings bank, ank, bnk, bk, b, ...; φ("rank") = z = (1, .., 0, ...) with non-zero entries for rank, ank, rnk, rk, r, ... The product x · z counts the number of common substrings: x · z = φ("bank") · φ("rank") = k("bank", "rank")
30 String Kernel Given two strings, the number of matches between their substrings is evaluated E.g. Bank and Rank B, a, n, k, Ba, Ban, Bank, Bk, an, ank, nk,.. R, a, n, k, Ra, Ran, Rank, Rk, an, ank, nk,.. String kernel over sentences and texts Huge space but there are efficient algorithms
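A naive (exponential-time) sketch of this gap-weighted matching count, useful to see exactly what is computed before the efficient DP evaluation; the function name `ssk` is illustrative:

```python
from itertools import combinations

def ssk(s, t, p, lam):
    """Sum over all common subsequences u of length p of
    lam ** (span in s) * lam ** (span in t), where the span of an
    occurrence is last index - first index + 1. With lam = 1 this is
    the plain number of matching subsequence pairs."""
    def weights(u):
        out = {}
        for idx in combinations(range(len(u)), p):
            sub = ''.join(u[i] for i in idx)
            span = idx[-1] - idx[0] + 1
            out[sub] = out.get(sub, 0.0) + lam ** span
        return out
    ws, wt = weights(s), weights(t)
    return sum(w * wt[sub] for sub, w in ws.items() if sub in wt)

# "bank" and "rank" share the size-2 subsequences an, ak and nk
```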
31 Formal Definition K(s, t) = Σ_u Σ_{i: u=s[i]} Σ_{j: u=t[j]} λ^{l(i)+l(j)}, where u ranges over the subsequences, i and j are the index vectors of the occurrences of u in s and t, and l(i) = i_{|u|} − i_1 + 1 is the length of the window spanned by i
32 Kernel between Bank and Rank
33 An example of string kernel computation
34 Efficient Evaluation Dynamic Programming technique Evaluate the spectrum string kernels Substrings of size p Sum the contribution of the different spectra
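The p-spectrum itself (contiguous substrings of size p) can be sketched directly; the DP technique then sums such contributions efficiently across the different spectra:

```python
from collections import Counter

def spectrum(s, p):
    # multiset of the contiguous substrings of length p
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p):
    # dot product of the two spectra = weighted substring matches
    cs, ct = spectrum(s, p), spectrum(t, p)
    return sum(c * ct[u] for u, c in cs.items() if u in ct)
```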
35 Efficient Evaluation
36 An example: SK( Gatta, Cata ) First, evaluate the SK with size p=1, i.e. a, a, t, t, a, a Store in the table SK p=1
37 Evaluating DP2 Predicted weight of the string of size p multiplied by the number of substrings of size p-1
38 Evaluating the Predictive DP on strings of size 2 (second row) Let's consider substrings of size 2 and suppose that: we have matched the first a; we will match the next character added to the two strings. We compute the weights of future matches at different string positions with some not-yet-known character '?'. If the match occurs immediately after a, the weight will be λ^(1+1) × λ^(1+1) = λ^4, and we store just λ^2 in the DP entry [a, a]
39 Evaluating the DP wrt different positions (second row) If the match for gatta occurs after t, the weight will be λ^(1+2) (× λ^2 = λ^5), since the substring for it will be a? (λ^3). We write such prediction in the entry [a, t]. Same rationale for a match after the second t: we have the substring a? (matching with a? from cata) for a weight of λ^(3+1) (× λ^2)
40 Evaluating the DP wrt different positions (third row) If the match of the second character occurs after the t of cata, the weight will be λ^(2+1) (× λ^2 = λ^5), since it will be with the string a?, with a weight of λ^3. If the match occurs after the t of both gatta and cata, there are two ways to compose substrings of size two: a? with weight λ^4 or t? with weight λ^2; the total is λ^2 + λ^4
41 Evaluating the DP wrt different positions (third row) Finally, match after the last t of both cata and gatta. There are three possible substrings of gatta: a?, t?, t?, with weights λ^3, λ^2 and λ, respectively. There are two possible substrings of cata: a?, t?, with weights λ^2 and λ. With weights of λ^5, λ^3 and λ^2 for the matched pairs, the total weight is λ^5 + λ^3 + λ^2
42 Evaluating SK of size 2 using DP2 (SK_p=2) The number/weight of substrings of size 2 between gat and cat is λ^4, i.e. λ^2 (the [a, a] entry of DP) times λ^2 (the cost of one character in each string). Between gatta and cata it is λ^7 + λ^5 + λ^4, i.e. the matches of the subsequences aa, at and ta
43 Document Similarity [figure: word-to-word similarity links between Doc 1 and Doc 2, e.g. industry with company, telephone with product, product with market]
44 Lexical Semantic Kernel [CoNLL 2005] The document similarity is the SK function: SK(d1, d2) = Σ_{w1∈d1, w2∈d2} s(w1, w2), where s is any similarity function between words, e.g. WordNet similarity [Basili et al., 2005] or LSA [Cristianini et al., 2002]. Good results when training data is poor
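A sketch of this kernel over tokenized documents; the similarity table below is a toy stand-in for a WordNet- or LSA-based s(w1, w2), not real similarity values:

```python
# toy, hand-made word similarities (assumption for illustration only)
SIM = {
    ("industry", "company"): 0.8,
    ("telephone", "product"): 0.3,
    ("product", "market"): 0.5,
}

def s(w1, w2):
    # symmetric lookup; identical words have similarity 1
    if w1 == w2:
        return 1.0
    return SIM.get((w1, w2), SIM.get((w2, w1), 0.0))

def semantic_kernel(d1, d2):
    # SK(d1, d2) = sum of s(w1, w2) over all word pairs
    return sum(s(w1, w2) for w1 in d1 for w2 in d2)

k = semantic_kernel(["industry", "telephone"], ["company", "product"])
# industry-company 0.8 + telephone-product 0.3 = 1.1
```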
45 Mercer's Theorem (finite space) Let us consider K = (k(xi, xj))_{i,j=1..n}. K symmetric ⇒ K = VΛV', the eigendecomposition of a symmetric matrix, where: Λ is the diagonal matrix of the eigenvalues λt of K; the eigenvectors vt = (vti)_{i=1..n} are the columns of V. Let us assume the eigenvalues are non-negative and define φ : xi ↦ (√λt vti)_{t=1..n} ∈ R^n, i = 1, .., n
46 Mercer's Theorem (sufficient conditions) Therefore Φ(xi) · Φ(xj) = Σ_{t=1..n} λt vti vtj = (VΛV')_{ij} = K_{ij} = k(xi, xj), which implies that k is a valid kernel function
47 Mercer's Theorem (necessary conditions) Suppose we have a negative eigenvalue λs with eigenvector vs. The point z = Σ_{i=1..n} vsi Φ(xi) = √Λ V' vs has the following norm: ‖z‖² = z · z = vs' V √Λ √Λ V' vs = vs' K vs = vs' λs vs = λs ‖vs‖² < 0, which contradicts the geometry of the space
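The construction of the theorem can be reproduced numerically: eigendecompose a valid Gram matrix and check that the induced feature map recovers it exactly.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 2)
K = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF Gram matrix

lam, V = np.linalg.eigh(K)      # K symmetric: K = V diag(lam) V'
lam = np.clip(lam, 0.0, None)   # guard against tiny negative round-off
Phi = V * np.sqrt(lam)          # row i is phi(x_i) = (sqrt(lam_t) v_ti)_t

# phi(x_i) . phi(x_j) = K_ij, as in the sufficient-condition slide
assert np.allclose(Phi @ Phi.T, K)
```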
48 Is it a valid kernel? It may not be a kernel, so we can use M'M
49 Tree kernels Subtree, Subset Tree, Partial Tree kernels Efficient computation
50 Example of a parse tree John delivers a talk in Rome [figure: parse tree with S → N VP, VP → V NP PP, NP → D N, PP → IN N, covering "John delivers a talk in Rome"]
51 The Syntactic Tree Kernel (STK) [Collins and Duffy, 2002] [figure: the subtree VP → V NP with "delivers a talk"]
52 The overall fragment set [figure: all fragments of the VP subtree, e.g. VP, NP → D N, D → a] Children are not divided: a node is expanded either with all of its children or with none
53 Explicit kernel space φ(t x ) = x = (0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0) φ(t z ) = z = (1,..,0,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,0,..,1,..,0,..,0) x z counts the number of common substructures
54 Efficient evaluation of the scalar product x · z = φ(Tx) · φ(Tz) = K(Tx, Tz) = Σ_{nx∈Tx} Σ_{nz∈Tz} Δ(nx, nz)
55 Efficient evaluation of the scalar product x · z = φ(Tx) · φ(Tz) = K(Tx, Tz) = Σ_{nx∈Tx} Σ_{nz∈Tz} Δ(nx, nz) [Collins and Duffy, ACL 2002] evaluate Δ in O(n²): Δ(nx, nz) = 0, if the productions are different; else Δ(nx, nz) = 1, if pre-terminals; else Δ(nx, nz) = Π_{j=1..nc(nx)} (1 + Δ(ch(nx, j), ch(nz, j)))
56 Other Adjustments Decay factor: Δ(nx, nz) = λ, if pre-terminals; else Δ(nx, nz) = λ Π_{j=1..nc(nx)} (1 + Δ(ch(nx, j), ch(nz, j))). Normalization: K'(Tx, Tz) = K(Tx, Tz) / √(K(Tx, Tx) · K(Tz, Tz))
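A compact sketch of the Δ recursion with the decay factor and the normalization above; trees are nested tuples (label, [children]) with words as plain strings, and the helper names are illustrative:

```python
def production(n):
    # production rule at node n: its label plus the labels of its children
    return (n[0], tuple(c if isinstance(c, str) else c[0] for c in n[1]))

def delta(nx, nz, lam):
    if production(nx) != production(nz):
        return 0.0
    if all(isinstance(c, str) for c in nx[1]):   # pre-terminal node
        return lam
    p = lam
    for cx, cz in zip(nx[1], nz[1]):
        p *= 1.0 + delta(cx, cz, lam)
    return p

def nodes(t):
    # enumerate the internal nodes of a tree
    yield t
    for c in t[1]:
        if not isinstance(c, str):
            yield from nodes(c)

def stk(tx, tz, lam=1.0):
    return sum(delta(a, b, lam) for a in nodes(tx) for b in nodes(tz))

def stk_norm(tx, tz, lam=0.4):
    return stk(tx, tz, lam) / (stk(tx, tx, lam) * stk(tz, tz, lam)) ** 0.5

np_tree = ("NP", [("D", ["a"]), ("N", ["talk"])])
# with lam = 1, stk(np_tree, np_tree) counts the 6 fragments of the NP subtree
```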
57 SubTree (ST) Kernel [Vishwanathan and Smola, 2002] [figure: the complete subtrees of "delivers a talk": (NP (D a)(N talk)), (D a), (N talk), (V delivers) and the whole VP]
58 Evaluation Given the equation for the SST kernel: Δ(nx, nz) = 0, if the productions are different; else Δ(nx, nz) = 1, if pre-terminals; else Δ(nx, nz) = Π_{j=1..nc(nx)} (1 + Δ(ch(nx, j), ch(nz, j)))
59 Evaluation For the ST kernel the "1+" term is dropped, so that only complete subtrees contribute: Δ(nx, nz) = 0, if the productions are different; else Δ(nx, nz) = 1, if pre-terminals; else Δ(nx, nz) = Π_{j=1..nc(nx)} Δ(ch(nx, j), ch(nz, j))
60 Fast Evaluation of STK [EACL 2006] K(Tx, Tz) = Σ_{⟨nx,nz⟩∈NP} Δ(nx, nz), where NP = {⟨nx, nz⟩ ∈ Tx × Tz : Δ(nx, nz) ≠ 0} = {⟨nx, nz⟩ ∈ Tx × Tz : P(nx) = P(nz)}, and P(nx), P(nz) are the production rules used at nodes nx and nz
61 Algorithm
62 Observations We order the production rules used in Tx and Tz at loading time. At learning time we may evaluate NP in O(|Tx| + |Tz|) running time. If Tx and Tz are generated by only one production rule, the cost is O(|Tx| × |Tz|)
63 Observations We order the production rules used in Tx and Tz at loading time. At learning time we may evaluate NP in O(|Tx| + |Tz|) running time. If Tx and Tz are generated by only one production rule, the cost is O(|Tx| × |Tz|). Very unlikely!
64 Labeled Ordered Tree Kernel SST satisfies the constraint "remove 0 or all children at a time". If we relax such constraint we get more general substructures [Kashima and Koyanagi, 2002] [figure: fragments of the VP tree for "gives a talk" in which individual children may be dropped]
65 Weighting Problems Both matched pairs give the same contribution, so gap-based weighting is needed; a novel efficient evaluation has to be defined. [figure: the VP trees for "gives a talk" matched against "gives a good talk" and "gives a bad talk"]
66 Partial Trees [ECML 2006] SST + String Kernel with weighted gaps on node children [figure: partial-tree fragments of the VP tree for "brought a cat"]
67 Partial Tree Kernel By adding two decay factors we obtain:
68 Efficient Evaluation (2) The complexity of finding the subsequences is Therefore the overall complexity is where ρ is the maximum branching factor (p = ρ)
69 Running Time of Tree Kernel Functions
70 Kernel Combinations: an example K = K_Tree + K_p, K = K_Tree × K_p, K = γ K_Tree + K_p, ..., where K_Tree provides the tree kernel features and K_p is a polynomial kernel (degree 3) over the flat features
71 Question Classification Definition: What does HTML stand for? Description: What's the final line in the Edgar Allan Poe poem "The Raven"? Entity: What foods can cause allergic reaction in people? Human: Who won the Nobel Peace Prize in 1992? Location: Where is the Statue of Liberty? Manner: How did Bob Marley die? Numeric: When was Martin Luther King Jr. born? Organization: What company makes Bentley cars?
72 Question Classifier based on Tree Kernels Question dataset [Li and Roth, 2005], distributed on 6 categories: Abbreviations, Descriptions, Entity, Human, Location, and Numeric. Fixed split: 5500 training and 500 test questions; cross-validation (10 folds). Using the whole question parse trees (constituent parsing). Example: What is an offer of direct stock purchase plan?
74 Kernels BOW and POS are obtained with a simple tree, e.g. a root node dominating the words "What is an offer ...", each with a * child; PT (parse tree); PAS (predicate argument structure)
75 Question classification
76 TASK: Automatic Classification The classifier detects if a pair (question and answer) is correct or not A representation for the pair is needed The classifier can be used to re-rank the output of a basic QA system
77 Dataset 2: TREC data 138 TREC 2001 test questions labeled as description 2,256 sentences, extracted from the best ranked paragraphs (using a basic QA system based on Lucene search engine on TREC dataset) 216 of which labeled as correct by one annotator
78 Dataset 2: TREC data 138 TREC 2001 test questions labeled as description. 2,256 sentences, extracted from the best ranked paragraphs (using a basic QA system based on the Lucene search engine on the TREC dataset), 216 of which labeled as correct by one annotator. A question is linked to many answers: all its derived pairs cannot be shared by training and test sets
79 Computational Representations and Feature Spaces
80 Bags of words (BOW) and POS-tags (POS) To save time, apply STK to these trees: BOW: (BOW (What *)(is *)(an *)(offer *)(of *)); POS: (BOP (WHNP *)(VBZ *)(DT *)(NN *)(IN *))
81 Word and POS Sequences What is an offer of? (word sequence, WSK) What_is_offer What_is WHNP VBZ DT NN IN (POS sequence, POSSK) WHNP_VBZ_NN WHNP_NN_IN
82 Syntactic Parse Trees (PT)
83 Predicate Argument Classification In an event: target words describe relation among different entities the participants are often seen as predicate's arguments. Example: Paul gives a lecture in Rome
84 Predicate Argument Classification In an event: target words describe relation among different entities the participants are often seen as predicate's arguments. Example: [ Arg0 Paul] [ predicate gives ] [ Arg1 a lecture] [ ArgM in Rome]
85 Predicate Argument Structure for the Partial Tree Kernel (PAS-PTK) [ARG1 Antigens] were [ARGM-TMP originally] [rel defined] [ARG2 as nonself molecules]. [ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body]
86 Kernels and Combinations Exploiting the property k(x,z) = k1(x,z) + k2(x,z): BOW, POS, WSK, POSSK, PT, PAS-PTK; BOW+POS, BOW+PT, PT+POS, ...
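A sketch of additive combination at the Gram-matrix level: each kernel is normalized first, then the matrices are summed, and the result is still a valid kernel. The two kernels below are toy stand-ins, not the actual BOW or sequence kernels:

```python
import numpy as np

def normalize(K):
    # K'(x, z) = K(x, z) / sqrt(K(x, x) K(z, z)); unit diagonal afterwards
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine(*grams):
    # sum of valid (normalized) kernels is a valid kernel
    return sum(normalize(K) for K in grams)

rng = np.random.RandomState(2)
X = rng.randn(5, 4)
K_bow = X @ X.T + 5 * np.eye(5)   # toy "BOW" kernel (PSD, nonzero diagonal)
K_seq = np.exp(-np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # toy "sequence" kernel

K = combine(K_bow, K_seq)
assert np.allclose(np.diag(K), 2.0)           # each summand has unit diagonal
assert np.linalg.eigvalsh(K).min() >= -1e-9   # the sum is still PSD
```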
87 Results on TREC Data (5 folds cross validation) F1-measure Kernel Type
93 Results on TREC Data (5 folds cross validation) [chart: % of improvement in F1-measure over the BOW baseline (24) for each kernel type; best combination: POSSK+STK+PAS-PTK]
94 Conclusions Dealing with the noise and errors of NLP modules requires robust approaches. SVMs are robust to noise, and Kernel methods allow for: syntactic information via STK; shallow semantic information via PTK; word/POS sequences via String Kernels. When the IR task is complex, syntax and semantics are essential: great improvement in Q/A classification. SVM-Light-TK: an efficient tool to use them
95 SVM-light-TK Software Encodes ST, SST and combination kernels in SVM-light [Joachims, 1999] Available at Tree forests, vector sets New extensions: the PT kernel will be released asap
96 Data Format What does Html stand for? 1 BT (SBARQ (WHNP (WP What))(SQ (AUX does)(np (NNP S.O.S.))(VP (VB stand)(pp (IN for))))(.?)) BT (BOW (What *)(does *)(S.O.S. *)(stand *)(for *)(? *)) BT (BOP (WP *)(AUX *)(NNP *)(VB *)(IN *)(. *)) BT (PAS (ARG0 (R-A1 (What *)))(ARG1 (A1 (S.O.S. NNP)))(ARG2 (rel stand))) ET 1:1 21: E-4 23:1 30:1 36:1 39:1 41:1 46:1 49:1 66:1 152:1 274:1 333:1 BV 2:1 21: E-4 23:1 31:1 36:1 39:1 41:1 46:1 49:1 52:1 66:1 152:1 246:1 333:1 392:1 EV
97 Basic Commands Training and classification./svm_learn -t 5 -C T train.dat model./svm_classify test.dat model Learning with a vector sequence./svm_learn -t 5 -C V train.dat model Learning with the sum of vector and kernel sequences./svm_learn -t 5 -C + train.dat model
98 Custom Kernel Kernel.h: double custom_kernel(KERNEL_PARM *kernel_parm, DOC *a, DOC *b); if(a->num_of_trees && b->num_of_trees && a->forest_vec[i]!=NULL && b->forest_vec[i]!=NULL){ // test if the i-th trees of instances a and b are non-empty
99 Custom Kernel: tree kernel k1 = // summation of tree kernels tree_kernel(kernel_parm, a, b, i, i) / // evaluate the tree kernel between the two i-th trees sqrt(tree_kernel(kernel_parm, a, a, i, i) * tree_kernel(kernel_parm, b, b, i, i)); // normalize with respect to both i-th trees
100 Custom Kernel: polynomial kernel if(a->num_of_vectors && b->num_of_vectors && a->vectors[i]!=NULL && b->vectors[i]!=NULL){ // check if the i-th vectors are empty k2 = // summation of vectors basic_kernel(kernel_parm, a, b, i, i) / // compute the standard kernel (selected according to the "second_kernel" parameter)
101 Custom Kernel: polynomial kernel sqrt(basic_kernel(kernel_parm, a, a, i, i) * basic_kernel(kernel_parm, b, b, i, i)); // normalize vectors return k1 + k2;
102 Conclusions Kernel methods and SVMs are useful tools to design language applications. Kernel design still requires some level of expertise. Engineering approaches to tree kernels: basic combinations; canonical mappings, e.g. node marking; merging of kernels into more complex kernels. State-of-the-art in SRL and QC. An efficient tool to use them
103 Thank you
104 References Alessandro Moschitti, Silvia Quarteroni, Roberto Basili and Suresh Manandhar, Exploiting Syntactic and Shallow Semantic Kernels for Question/Answer Classification, Proceedings of the 45th Conference of the Association for Computational Linguistics (ACL), Prague, June 2007. Alessandro Moschitti and Fabio Massimo Zanzotto, Fast and Effective Kernels for Relational Learning from Texts, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), Corvallis, OR, USA. Daniele Pighin, Alessandro Moschitti and Roberto Basili, RTV: Tree Kernels for Thematic Role Classification, Proceedings of the 4th International Workshop on Semantic Evaluation (SemEval-4), English Semantic Labeling, Prague, June 2007. Stephan Bloehdorn and Alessandro Moschitti, Combined Syntactic and Semantic Kernels for Text Classification, to appear in the 29th European Conference on Information Retrieval (ECIR), April 2007, Rome, Italy. Fabio Aiolli, Giovanni Da San Martino, Alessandro Sperduti and Alessandro Moschitti, Efficient Kernel-based Learning for Trees, to appear in the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Honolulu, Hawaii, 2007
105 An introductory book on SVMs, Kernel methods and Text Categorization
106 References Roberto Basili and Alessandro Moschitti, Automatic Text Categorization: from Information Retrieval to Support Vector Learning, Aracne editrice, Rome, Italy. Alessandro Moschitti, Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 2006. Alessandro Moschitti, Daniele Pighin and Roberto Basili, Tree Kernel Engineering for Proposition Re-ranking. In Proceedings of Mining and Learning with Graphs (MLG 2006), Workshop held with ECML/PKDD 2006, Berlin, Germany, 2006. Elisa Cilia, Alessandro Moschitti, Sergio Ammendola and Roberto Basili, Structured Kernels for Automatic Detection of Protein Active Sites. In Proceedings of Mining and Learning with Graphs (MLG 2006), Workshop held with ECML/PKDD 2006, Berlin, Germany, 2006.
107 References Fabio Massimo Zanzotto and Alessandro Moschitti, Automatic learning of textual entailments with cross-pair similarities. In Proceedings of COLING-ACL, Sydney, Australia, 2006. Alessandro Moschitti, Making tree kernels practical for natural language learning. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, 2006. Alessandro Moschitti, Daniele Pighin and Roberto Basili, Semantic Role Labeling via Tree Kernel joint inference. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, USA, 2006. Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin and Roberto Basili, Semantic Tree Kernels to classify Predicate Argument Structures. In Proceedings of the 17th European Conference on Artificial Intelligence, Riva del Garda, Italy, 2006.
108 References Alessandro Moschitti and Roberto Basili, A Tree Kernel approach to Question and Answer Classification in Question Answering Systems. In Proceedings of the Conference on Language Resources and Evaluation, Genova, Italy, 2006. Ana-Maria Giuglea and Alessandro Moschitti, Semantic Role Labeling via FrameNet, VerbNet and PropBank. In Proceedings of the Joint 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, 2006. Roberto Basili, Marco Cammisa and Alessandro Moschitti, Effective use of WordNet semantics via kernel-based learning. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005), Ann Arbor (MI), USA, 2005.
109 References Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura Coppola and Roberto Basili, Hierarchical Semantic Role Labeling. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005 shared task), Ann Arbor (MI), USA, 2005. Roberto Basili, Marco Cammisa and Alessandro Moschitti, A Semantic Kernel to classify texts with very few training examples. In Proceedings of the Workshop on Learning in Web Search, at the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, 2005. Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin and Roberto Basili, Engineering of Syntactic Features for Shallow Semantic Parsing. In Proceedings of the ACL05 Workshop on Feature Engineering for Machine Learning in Natural Language Processing, Ann Arbor (MI), USA, 2005.
110 References Alessandro Moschitti, A study on Convolution Kernels for Shallow Semantic Parsing. In Proceedings of ACL 2004, Barcelona, Spain, 2004. Alessandro Moschitti and Cosmin Adrian Bejan, A Semantic Kernel for Predicate Argument Classification. In Proceedings of CoNLL 2004, Boston, MA, USA, 2004. M. Collins and N. Duffy, New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of ACL 2002. S.V.N. Vishwanathan and A.J. Smola, Fast kernels on strings and trees. In Proceedings of Neural Information Processing Systems, 2002.
111 References N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines (and other kernel-based learning methods), Cambridge University Press, 2000. Xavier Carreras and Lluís Màrquez, Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL 2005. Sameer Pradhan, Kadri Hacioglu, Valeri Krugler, Wayne Ward, James H. Martin and Daniel Jurafsky, Support vector learning for semantic argument classification. To appear in Machine Learning Journal.
112 The Impact of SSTK in Answer Classification [chart: F1-measure for the combinations Q(BOW)+A(BOW), Q(BOW)+A(PT,BOW), Q(PT)+A(PT,BOW), Q(BOW)+A(BOW,PT,PAS), Q(BOW)+A(BOW,PT,PAS_N), Q(PT)+A(PT,BOW,PAS), Q(BOW)+A(BOW,PAS), Q(BOW)+A(BOW,PAS_N)]
113 Mercer's conditions (1)
114 Mercer's conditions (2) If the Gram matrix G = (k(xi, xj))_{i,j} is positive semi-definite, there is a mapping φ that produces the target kernel function
115 The lexical semantic kernel is not always a kernel It may not be a kernel, so we can use M'M, where M is the initial similarity matrix
116 Efficient Evaluation (1) In [Shawe-Taylor and Cristianini, 2004 book], sequence kernels with weighted gaps are factorized with respect to the different subsequence sizes. We treat children as sequences and apply the same theory
More informationLearning Methods for Linear Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2
More informationStatistical Methods for NLP
Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Classification: Maximum Entropy Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 24 Introduction Classification = supervised
More informationProcessing/Speech, NLP and the Web
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationKernels and the Kernel Trick. Machine Learning Fall 2017
Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels
More information(Kernels +) Support Vector Machines
(Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationRepresenting structured relational data in Euclidean vector spaces. Tony Plate
Representing structured relational data in Euclidean vector spaces Tony Plate tplate@acm.org http://www.d-reps.org October 2004 AAAI Symposium 2004 1 Overview A general method for representing structured
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationMachine Learning : Support Vector Machines
Machine Learning Support Vector Machines 05/01/2014 Machine Learning : Support Vector Machines Linear Classifiers (recap) A building block for almost all a mapping, a partitioning of the input space into
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars Berlin Chen 2005 References: 1. Natural Language Understanding, chapter 3 (3.1~3.4, 3.6) 2. Speech and Language Processing, chapters 9, 10 NLP-Berlin Chen 1 Grammars
More informationLearning SVM Classifiers with Indefinite Kernels
Learning SVM Classifiers with Indefinite Kernels Suicheng Gu and Yuhong Guo Dept. of Computer and Information Sciences Temple University Support Vector Machines (SVMs) (Kernel) SVMs are widely used in
More informationSVM merch available in the CS department after lecture
SVM merch available in the CS department after lecture SVM SVM Introduction Previously we considered Perceptrons based on the McCulloch and Pitts neuron model. We identified a method for updating weights,
More informationNatural Language Processing : Probabilistic Context Free Grammars. Updated 5/09
Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require
More informationLearning with kernels and SVM
Learning with kernels and SVM Šámalova chata, 23. května, 2006 Petra Kudová Outline Introduction Binary classification Learning with Kernels Support Vector Machines Demo Conclusion Learning from data find
More informationKernel Methods. Charles Elkan October 17, 2007
Kernel Methods Charles Elkan elkan@cs.ucsd.edu October 17, 2007 Remember the xor example of a classification problem that is not linearly separable. If we map every example into a new representation, then
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationKernel Methods in Machine Learning
Kernel Methods in Machine Learning Autumn 2015 Lecture 1: Introduction Juho Rousu ICS-E4030 Kernel Methods in Machine Learning 9. September, 2015 uho Rousu (ICS-E4030 Kernel Methods in Machine Learning)
More informationSupport Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar
Data Mining Support Vector Machines Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 Support Vector Machines Find a linear hyperplane
More informationCS 188: Artificial Intelligence. Outline
CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationApplied inductive learning - Lecture 7
Applied inductive learning - Lecture 7 Louis Wehenkel & Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège Montefiore - Liège - November 5, 2012 Find slides: http://montefiore.ulg.ac.be/
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationOutline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space
to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the
More informationLinear Classifiers (Kernels)
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer Exam Dates & Course Conclusion There are 2 Exam dates: Feb 20 th March
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Linear Classification Given labeled data: (xi, feature vector yi) label i=1,..,n where y is 1 or 1, find a hyperplane to separate from Linear Classification
More informationKernel methods CSE 250B
Kernel methods CSE 250B Deviations from linear separability Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Deviations from
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationDeviations from linear separability. Kernel methods. Basis expansion for quadratic boundaries. Adding new features Systematic deviation
Deviations from linear separability Kernel methods CSE 250B Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Systematic deviation
More informationSVMs: nonlinearity through kernels
Non-separable data e-8. Support Vector Machines 8.. The Optimal Hyperplane Consider the following two datasets: SVMs: nonlinearity through kernels ER Chapter 3.4, e-8 (a) Few noisy data. (b) Nonlinearly
More informationMachine Learning and Data Mining. Support Vector Machines. Kalev Kask
Machine Learning and Data Mining Support Vector Machines Kalev Kask Linear classifiers Which decision boundary is better? Both have zero training error (perfect training accuracy) But, one of them seems
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationNatural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley
Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:
More informationNearest Neighbors Methods for Support Vector Machines
Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad
More informationKaggle.
Administrivia Mini-project 2 due April 7, in class implement multi-class reductions, naive bayes, kernel perceptron, multi-class logistic regression and two layer neural networks training set: Project
More informationMachine Learning for Structured Prediction
Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationText Mining. March 3, March 3, / 49
Text Mining March 3, 2017 March 3, 2017 1 / 49 Outline Language Identification Tokenisation Part-Of-Speech (POS) tagging Hidden Markov Models - Sequential Taggers Viterbi Algorithm March 3, 2017 2 / 49
More informationSupport Vector Machines
Support Vector Machines Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Overview Motivation
More informationFoundation of Intelligent Systems, Part I. SVM s & Kernel Methods
Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationThe Noisy Channel Model and Markov Models
1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle
More informationMulti-Label Informed Latent Semantic Indexing
Multi-Label Informed Latent Semantic Indexing Shipeng Yu 12 Joint work with Kai Yu 1 and Volker Tresp 1 August 2005 1 Siemens Corporate Technology Department of Neural Computation 2 University of Munich
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationTALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing
TALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing Daniel Ferrés and Horacio Rodríguez TALP Research Center Software Department Universitat Politècnica de Catalunya {dferres,horacio}@lsi.upc.edu
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationThe SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania
The SUBTLE NL Parsing Pipeline: A Complete Parser for English Mitch Marcus University of Pennsylvania 1 PICTURE OF ANALYSIS PIPELINE Tokenize Maximum Entropy POS tagger MXPOST Ratnaparkhi Core Parser Collins
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationProposition Knowledge Graphs. Gabriel Stanovsky Omer Levy Ido Dagan Bar-Ilan University Israel
Proposition Knowledge Graphs Gabriel Stanovsky Omer Levy Ido Dagan Bar-Ilan University Israel 1 Problem End User 2 Case Study: Curiosity (Mars Rover) Curiosity is a fully equipped lab. Curiosity is a rover.
More information