MACHINE LEARNING. Kernel Methods. Alessandro Moschitti

1 MACHINE LEARNING Kernel Methods Alessandro Moschitti Department of Information and Communication Technology, University of Trento

2 Outline (1) PART I: Theory Motivations Kernel Based Machines Kernel Methods Kernel Trick Mercer's conditions Kernel operators Basic Kernels Linear Kernel Polynomial Kernel Lexical kernel String Kernel

3 Outline (2) Tree kernels subtree, subset tree and partial tree kernels PART II: Kernel engineering for Natural Language Applications Object Transformation and Kernel Combinations: The Semantic Role Labeling case Node marking Shallow Semantic Tree Kernels Experiments

4 Outline (3) Merging of Kernels and Kernel Combinations: Question/Answer Classification The Syntactic/Semantic Tree Kernel Advanced Kernel combinations Experiments Example of Custom Kernel in SVM-LIGHT-TK Conclusions

5 Kernels for Natural Language Processing (1) Typical tasks: from text categorization to syntactic and semantic parsing. Feature design for NLP applications is a complex and difficult phase, e.g., for structural feature representations: deep knowledge and intuitions are required, and design problems arise when the phenomenon is described by many features.

6 Kernels for Natural Language Processing (2) Kernel methods alleviate such problems: structures are represented in terms of their substructures; high-dimensional, implicit and abstract feature spaces; a high number of features is generated, and Support Vector Machines select the relevant ones; automatic feature engineering as a side-effect.

7 Kernel Engineering Kernel design still requires noticeable creativity and expertise. An engineering approach is needed: starting from basic and well-known kernel functions; basic combinations; canonical mappings, e.g. object transformations; merging of kernels.

8 Theory Kernel Trick Kernel Based Machines Basic Kernel Properties Kernel Types

9 The main idea of Kernel Functions Mapping vectors into a space where they are linearly separable: x → φ(x). [figure: positive (x) and negative (o) points, not linearly separable in the input space, become separable after the mapping φ]

10 A mapping example Given two masses m_1 and m_2, one of which is constrained, apply a force f_a to the mass m_1. Experiment features: m_1, m_2 and f_a. We want to learn a classifier that tells when the mass m_1 will get far away from m_2. If we consider the gravitational Newton law f(m_1, m_2, r) = C m_1 m_2 / r², we need to find when f(m_1, m_2, r) < f_a.

11 Mapping Example x = (x_1, ..., x_n) → φ(x) = (φ_1(x), ..., φ_n(x)). The gravitational law is not linear, so we change space: (f_a, m_1, m_2, r) → (k, x, y, z) = (ln f_a, ln m_1, ln m_2, ln r). Since ln f(m_1, m_2, r) = ln C + ln m_1 + ln m_2 − 2 ln r = c + x + y − 2z, we need the hyperplane ln m_1 + ln m_2 − 2 ln r − ln f_a + ln C = 0, i.e. x + y − 2z − k + ln C = 0: on one side of it the mass gets far away, on the other it does not, so we can decide without error.

12 Classification function in the dual space sgn(w·x + b) = sgn( Σ_{j=1..l} α_j y_j x_j·x + b ). Note that the data only appears in the scalar product. The matrix G = (x_i · x_j)_{i,j=1..l} is called the Gram matrix.

13 Dual Perceptron algorithm and Kernel functions We can rewrite the classification function as h(x) = sgn(w·φ(x) + b) = sgn( Σ_j α_j y_j φ(x_j)·φ(x) + b ) = sgn( Σ_j α_j y_j k(x_j, x) + b ), as well as the updating rule: if y_i ( Σ_j α_j y_j k(x_j, x_i) + b ) ≤ 0 then α_i = α_i + η. The learning rate η does not affect the algorithm, so we set η = 1.
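As a concrete illustration, here is a minimal sketch of the dual (kernel) perceptron above, with a toy dataset and a degree-2 polynomial kernel chosen only for the example (the data, epochs and names are illustrative, not from the slides):

```python
# Minimal dual (kernel) perceptron sketch; dataset and kernel choice are illustrative.
import numpy as np

def kernel_perceptron(X, y, k, epochs=100):
    """X: list of inputs, y: labels in {-1, +1}, k: kernel function k(x, z)."""
    n = len(X)
    alpha = np.zeros(n)          # one dual coefficient per training example
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            # the prediction uses only kernel evaluations (the "kernel trick")
            s = sum(alpha[j] * y[j] * k(X[j], X[i]) for j in range(n)) + b
            if y[i] * s <= 0:    # mistake: update alpha_i (learning rate fixed to 1)
                alpha[i] += 1.0
                b += y[i]
    return alpha, b

def predict(x, X, y, alpha, b, k):
    s = sum(alpha[j] * y[j] * k(X[j], x) for j in range(len(X))) + b
    return 1 if s > 0 else -1

if __name__ == "__main__":
    # XOR-like toy data, separable with a degree-2 polynomial kernel
    X = [np.array(p) for p in [(0, 0), (1, 1), (0, 1), (1, 0)]]
    y = [-1, -1, 1, 1]
    poly = lambda a, c: (np.dot(a, c) + 1) ** 2
    alpha, b = kernel_perceptron(X, y, poly)
    print([predict(x, X, y, alpha, b, poly) for x in X])   # [-1, -1, 1, 1]
```

Only the kernel values k(x_j, x) are ever needed; the mapping φ is never computed explicitly.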

14 Kernels in Support Vector Machines In Soft Margin SVMs we maximize the dual objective; by using kernel functions we rewrite the problem with k(x_i, x_j) in place of x_i·x_j (see below).
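For reference, a textbook form of the two objectives (a standard reconstruction; the slide's own notation may differ):

$$\max_{\alpha}\ \sum_{i=1}^{l}\alpha_i-\frac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j
\qquad \text{s.t. } 0\le\alpha_i\le C,\ \ \sum_{i=1}^{l}\alpha_i y_i=0$$

and, replacing the scalar product with a kernel,

$$\max_{\alpha}\ \sum_{i=1}^{l}\alpha_i-\frac{1}{2}\sum_{i,j=1}^{l}\alpha_i\alpha_j y_i y_j\,k(\mathbf{x}_i,\mathbf{x}_j)
\qquad \text{s.t. } 0\le\alpha_i\le C,\ \ \sum_{i=1}^{l}\alpha_i y_i=0$$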

15 Kernel Function Definition Kernels are scalar products of mapping functions: for x ∈ R^n, φ(x) = (φ_1(x), φ_2(x), ..., φ_m(x)) ∈ R^m, and k(x, z) = φ(x)·φ(z).

16 Valid Kernels

17 Valid Kernels cont'd If the Gram matrix is positive semi-definite, then we can find a mapping φ implementing the kernel function.

18 Valid Kernel operations
k(x,z) = k_1(x,z) + k_2(x,z)
k(x,z) = k_1(x,z) · k_2(x,z)
k(x,z) = α k_1(x,z)
k(x,z) = f(x) f(z)
k(x,z) = k_1(φ(x), φ(z))
k(x,z) = x′Bz, with B a symmetric positive semi-definite matrix
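A small sketch of these closure properties written as kernel combinators (all names are illustrative; any valid base kernels k1, k2 can be plugged in):

```python
# Illustrative kernel combinators mirroring the closure properties above.
import numpy as np

def k_sum(k1, k2):            # k(x,z) = k1(x,z) + k2(x,z)
    return lambda x, z: k1(x, z) + k2(x, z)

def k_prod(k1, k2):           # k(x,z) = k1(x,z) * k2(x,z)
    return lambda x, z: k1(x, z) * k2(x, z)

def k_scale(alpha, k1):       # k(x,z) = alpha * k1(x,z), alpha >= 0
    return lambda x, z: alpha * k1(x, z)

def k_from_f(f):              # k(x,z) = f(x) * f(z)
    return lambda x, z: f(x) * f(z)

def k_compose(k1, phi):       # k(x,z) = k1(phi(x), phi(z))
    return lambda x, z: k1(phi(x), phi(z))

linear = lambda x, z: float(np.dot(x, z))
poly2  = lambda x, z: (float(np.dot(x, z)) + 1) ** 2

combined = k_sum(k_scale(0.5, linear), poly2)
x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(combined(x, z))   # 0.5 * (x.z) + (x.z + 1)^2
```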

19 Basic Kernels for unstructured data Linear Kernel Polynomial Kernel Lexical kernel String Kernel

20 Linear Kernel In Text Categorization documents are word vectors:
Φ(d_x) = x = (0,..,1,..,0,..,0,..,1,..,0,..,0,..,1,..,0,..,0,..,1,..,0,..,1)   with active features buy, acquisition, stocks, sell, market
Φ(d_z) = z = (0,..,1,..,0,..,1,..,0,..,0,..,0,..,1,..,0,..,0,..,1,..,0,..,0)   with active features buy, company, stocks, sell
The dot product x·z counts the number of features in common; this provides a sort of similarity.
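A minimal bag-of-words sketch of this dot product, with toy documents standing in for the vectors above:

```python
# Toy bag-of-words linear kernel: the dot product counts shared features.
from collections import Counter

def bow(doc):
    return Counter(doc.lower().split())

def linear_kernel(d1, d2):
    x, z = bow(d1), bow(d2)
    return sum(x[w] * z[w] for w in x if w in z)

d_x = "buy acquisition stocks sell market"
d_z = "buy company stocks sell"
print(linear_kernel(d_x, d_z))   # 3 shared words: buy, stocks, sell
```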

21 Feature Conjunction (polynomial kernel) The initial vectors are mapped into a higher-dimensional space: Φ(⟨x_1, x_2⟩) = ⟨x_1², x_2², √2 x_1 x_2, √2 x_1, √2 x_2, 1⟩. More expressive, as it encodes conjunctions such as Stock+Market vs. Downtown+Market. We can smartly compute the scalar product:
Φ(x)·Φ(z) = ⟨x_1², x_2², √2 x_1 x_2, √2 x_1, √2 x_2, 1⟩ · ⟨z_1², z_2², √2 z_1 z_2, √2 z_1, √2 z_2, 1⟩
          = x_1² z_1² + x_2² z_2² + 2 x_1 x_2 z_1 z_2 + 2 x_1 z_1 + 2 x_2 z_2 + 1
          = (x_1 z_1 + x_2 z_2 + 1)² = (x·z + 1)² = K_Poly(x, z)
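A quick numerical check of this identity for two-dimensional inputs, comparing the explicit degree-2 mapping with the implicit computation (x·z + 1)² (the vectors are arbitrary examples):

```python
import numpy as np

def phi2(v):
    """Explicit degree-2 feature map for a 2-dimensional input."""
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

x = np.array([3.0, -1.0])
z = np.array([2.0,  4.0])
explicit = float(np.dot(phi2(x), phi2(z)))
implicit = (float(np.dot(x, z)) + 1) ** 2
print(explicit, implicit)   # both equal (x.z + 1)^2 = (3*2 + (-1)*4 + 1)^2 = 9
```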

22 Using character sequences φ("bank") = x = (0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0) bank ank bnk bk b φ("rank") = z = (1,..,0,..,0,..,1,..,0,..,0,..,1,..,0,..,1,..,0,..,1) rank ank rnk rk r x·z counts the number of common substrings x·z = φ("bank")·φ("rank") = k("bank","rank")

23 String Kernel Given two strings, the number of matches between their substrings is evaluated E.g. Bank and Rank B, a, n, k, Ba, Ban, Bank, Bk, an, ank, nk,.. R, a, n, k, Ra, Ran, Rank, Rk, an, ank, nk,.. String kernel over sentences and texts Huge space but there are efficient algorithms
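The intuition can be checked by brute force on very short strings: enumerate every (possibly gapped) subsequence of both strings and count the matches. This naive sketch is exponential and only meant to make the feature space concrete; the efficient algorithms mentioned above avoid the enumeration:

```python
from collections import Counter
from itertools import combinations

def subsequences(s):
    """All non-empty (possibly gapped) subsequences of s, one per occurrence."""
    subs = []
    for r in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), r):
            subs.append("".join(s[i] for i in idx))
    return subs

def naive_string_kernel(s, t):
    """Number of matching pairs of subsequence occurrences between s and t."""
    cs, ct = Counter(subsequences(s)), Counter(subsequences(t))
    return sum(cs[u] * ct[u] for u in cs if u in ct)

print(naive_string_kernel("Bank", "Rank"))   # common: a, n, k, an, ak, nk, ank -> 7
```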

24 Formal Definition φ_u(s) = Σ_{i: u = s[i]} λ^{l(i)}, where the sum ranges over all index sequences i = (i_1, ..., i_|u|) such that s[i] = u, and l(i) = i_|u| − i_1 + 1 is the span of the subsequence in s (λ ≤ 1 penalizes gaps; λ = 1 gives pure counting). The kernel is K(s, t) = Σ_u φ_u(s) φ_u(t).

25 Kernel between Bank and Rank

26 An example of string kernel computation

27 Intuition behind efficient evaluation Dynamic programming: given two strings of size n and m, NS(n,m,ρ) = NS(n,m,ρ−1) + NS(n−1,m,ρ) + NS(n,m−1,ρ) − NS(n−1,m−1,ρ)
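A minimal dynamic-programming sketch that counts matching subsequence pairs between two strings; it uses a standard recurrence rather than the exact length-bounded NS formulation on the slide, but illustrates the same inclusion-exclusion idea:

```python
def count_common_subsequences(s, t):
    """C[i][j] = number of matching pairs of subsequence occurrences
    in s[:i] and t[:j], including the empty pair."""
    n, m = len(s), len(t)
    C = [[1] * (m + 1) for _ in range(n + 1)]   # only the empty pair at the borders
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # inclusion-exclusion over whether position i (resp. j) is used
            C[i][j] = C[i-1][j] + C[i][j-1] - C[i-1][j-1]
            if s[i-1] == t[j-1]:
                C[i][j] += C[i-1][j-1]          # extend every shorter match with this pair
    return C[n][m] - 1                          # drop the empty pair

print(count_common_subsequences("Bank", "Rank"))   # 7, as in the brute-force check above
```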

28 Document Similarity [figure: the words of Doc 1 and Doc 2 (industry, company, telephone, product, market) linked by pairwise similarity edges]

29 Lexical Semantic Kernel [CoNLL 2005] The document similarity is the SK function: SK(d_1, d_2) = Σ_{w_1∈d_1, w_2∈d_2} s(w_1, w_2), where s is any similarity function between words, e.g. WordNet similarity [Basili et al., 2005] or LSA [Cristianini et al., 2002]. Good results when training data is scarce.
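A minimal sketch of SK with a hand-made word-similarity table standing in for a WordNet- or LSA-based s(w1, w2) (the words and similarity values are purely illustrative):

```python
# Toy lexical semantic kernel: sum of word-pair similarities between two documents.
# The table below is a made-up stand-in for a WordNet/LSA-based similarity s(w1, w2).
SIM = {
    ("industry", "company"): 0.8,
    ("market", "company"):   0.4,
    ("telephone", "product"): 0.3,
}

def s(w1, w2):
    if w1 == w2:
        return 1.0
    return SIM.get((w1, w2)) or SIM.get((w2, w1)) or 0.0

def SK(d1, d2):
    return sum(s(w1, w2) for w1 in d1 for w2 in d2)

doc1 = ["industry", "telephone", "market"]
doc2 = ["company", "product"]
print(SK(doc1, doc2))   # 0.8 + 0.3 + 0.4 = 1.5
```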

30 Is it a valid kernel? It may not be a kernel, so we can use M·Mᵀ (which is always positive semi-definite).

31 Tree kernels Subtree, Subset Tree, Partial Tree kernels Efficient computation

32 Example of a parse tree "John delivers a talk in Rome": (S (N John) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (N Rome))))

33 The Syntactic Tree Kernel (STK) [Collins and Duffy, 2002] [figure: the subtree (VP (V delivers) (NP (D a) (N talk)))]

34 The overall fragment set [figure: all the fragments of (VP (V delivers) (NP (D a) (N talk))), e.g. (NP (D a) N), (D a), ...]. Children are not divided: a node is expanded either with all of its children or with none of them.

35 Explicit kernel space φ(T_x) = x = (0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0) φ(T_z) = z = (1,..,0,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,0,..,1,..,0,..,0) x·z counts the number of common substructures

36 Efficient evaluation of the scalar product x·z = φ(T_x)·φ(T_z) = K(T_x, T_z) = Σ_{n_x∈T_x} Σ_{n_z∈T_z} Δ(n_x, n_z)

37 Efficient evaluation of the scalar product x·z = φ(T_x)·φ(T_z) = K(T_x, T_z) = Σ_{n_x∈T_x} Σ_{n_z∈T_z} Δ(n_x, n_z). [Collins and Duffy, ACL 2002] evaluate Δ in O(n²):
Δ(n_x, n_z) = 0, if the productions are different;
Δ(n_x, n_z) = 1, if the nodes are pre-terminals (with the same production);
otherwise Δ(n_x, n_z) = Π_{j=1..nc(n_x)} (1 + Δ(ch(n_x, j), ch(n_z, j)))
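A compact sketch of this recursion on toy trees encoded as nested tuples (label, child1, child2, ...); it follows the Δ definition above, without the decay factor introduced later. For the two VPs below, which differ only in the noun, the kernel counts the 10 fragments they share.

```python
# Toy SST kernel following the Collins & Duffy Delta recursion.
# Trees are nested tuples: (label, child1, child2, ...); leaves are plain strings.

def children(n):
    return n[1:] if isinstance(n, tuple) else ()

def production(n):
    """The production at node n: its label plus the labels of its children."""
    labels = tuple(c[0] if isinstance(c, tuple) else c for c in children(n))
    return (n[0], labels)

def is_preterminal(n):
    return all(not isinstance(c, tuple) for c in children(n))

def nodes(t):
    yield t
    for c in children(t):
        if isinstance(c, tuple):
            yield from nodes(c)

def delta(nx, nz):
    if production(nx) != production(nz):
        return 0
    if is_preterminal(nx):
        return 1
    prod = 1
    for cx, cz in zip(children(nx), children(nz)):
        prod *= 1 + delta(cx, cz)
    return prod

def tree_kernel(tx, tz):
    return sum(delta(nx, nz) for nx in nodes(tx) for nz in nodes(tz))

vp1 = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")))
vp2 = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "paper")))
print(tree_kernel(vp1, vp2))   # 10 shared fragments
```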

38 Explicit kernel space x = (0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0,..,1,..,0), where each dimension corresponds to a tree fragment such as (VP V NP), (VP (V delivers) NP), (NP (D a) (N talk)), (NP D N), (D a), (N talk), ... The dot product x·z counts the number of common substructures.

39 Implicit Representation x·z = φ(T_x)·φ(T_z) = K(T_x, T_z) = Σ_{n_x∈T_x} Σ_{n_z∈T_z} Δ(n_x, n_z)

40 Implicit Representation x·z = φ(T_x)·φ(T_z) = K(T_x, T_z) = Σ_{n_x∈T_x} Σ_{n_z∈T_z} Δ(n_x, n_z). [Collins and Duffy, ACL 2002] evaluate Δ in O(n²):
Δ(n_x, n_z) = 0, if the productions are different;
Δ(n_x, n_z) = 1, if the nodes are pre-terminals (with the same production);
otherwise Δ(n_x, n_z) = Π_{j=1..nc(n_x)} (1 + Δ(ch(n_x, j), ch(n_z, j)))

41 Other Adjustments Decay factor: Δ(n_x, n_z) = λ, if pre-terminals; otherwise Δ(n_x, n_z) = λ Π_{j=1..nc(n_x)} (1 + Δ(ch(n_x, j), ch(n_z, j))). Normalization: K′(T_x, T_z) = K(T_x, T_z) / √( K(T_x, T_x) · K(T_z, T_z) )

42 Fast SST Evaluation [EACL 2006] K(T_x, T_z) = Σ_{⟨n_x, n_z⟩ ∈ NP} Δ(n_x, n_z), where NP = {⟨n_x, n_z⟩ ∈ T_x × T_z : Δ(n_x, n_z) ≠ 0} = {⟨n_x, n_z⟩ ∈ T_x × T_z : P(n_x) = P(n_z)}, and P(n_x), P(n_z) are the production rules used at nodes n_x and n_z.

43 Observations We order the production rules used in T_x and T_z at loading time. At learning time we may evaluate NP in O(|T_x| + |T_z|) running time. If T_x and T_z are generated by only one production rule, the cost is O(|T_x| · |T_z|).

44 Observations We order the production rules used in T_x and T_z at loading time. At learning time we may evaluate NP in O(|T_x| + |T_z|) running time. If T_x and T_z are generated by only one production rule, the cost is O(|T_x| · |T_z|), which is very unlikely!
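A sketch of the underlying idea: index the nodes of one tree by production rule and scan the (sorted) productions of the other, so that Δ is computed only on pairs that can be non-zero. This illustrates the strategy rather than the exact EACL 2006 algorithm; productions are plain strings here.

```python
from collections import defaultdict

def node_pairs(prods_x, prods_z):
    """prods_x / prods_z: list of (production, node_id) per tree, e.g. ("NP->D N", 2).
    Returns the pairs with identical productions, i.e. the set NP above."""
    by_prod = defaultdict(list)
    for p, nz in prods_z:
        by_prod[p].append(nz)
    pairs = []
    for p, nx in sorted(prods_x):          # productions ordered once, at loading time
        for nz in by_prod.get(p, ()):
            pairs.append((nx, nz))
    return pairs

tx = [("S->N VP", 0), ("VP->V NP", 1), ("NP->D N", 2)]
tz = [("S->N VP", 0), ("VP->V PP", 1), ("NP->D N", 2)]
print(node_pairs(tx, tz))   # [(2, 2), (0, 0)]: only these pairs need Delta
```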

45 SubTree (ST) Kernel [Vishwanathan and Smola, 2002] [figure: the subtrees of (VP (V delivers) (NP (D a) (N talk))), e.g. (NP (D a) (N talk)), (D a), (N talk), (V delivers): each subtree is complete down to the leaves]

46 Evaluation Given the equation for the SST kernel:
Δ(n_x, n_z) = 0, if the productions are different;
Δ(n_x, n_z) = 1, if the nodes are pre-terminals;
otherwise Δ(n_x, n_z) = Π_{j=1..nc(n_x)} (1 + Δ(ch(n_x, j), ch(n_z, j)))

47 Evaluation Given the equation for the SST kernel, the ST kernel is obtained by dropping the "1 +" term in the recursive step:
Δ(n_x, n_z) = 0, if the productions are different;
Δ(n_x, n_z) = 1, if the nodes are pre-terminals;
otherwise Δ(n_x, n_z) = Π_{j=1..nc(n_x)} Δ(ch(n_x, j), ch(n_z, j))

48 Labeled Ordered Tree Kernel SST satisfies the constraint "remove 0 or all children at a time". If we relax such constraint we get more general substructures [Kashima and Koyanagi, 2002]. [figure: fragments of (VP (V gives) (NP (D a) (N talk))) in which individual children may be removed, e.g. (VP (NP (D a) (N talk))), (VP (NP D N)), (NP D), ...]

49 Weighting Problems Both matched pairs give the same contribution; gap-based weighting is needed, and a novel efficient evaluation has to be defined. [figure: (VP (V gives) (D a) (NP (N talk))) matched in both trees, and (VP (V gives) (NP (D a) (JJ good) (N talk))) vs. (VP (V gives) (NP (D a) (JJ bad) (N talk)))]

50 Partial Trees, [ECML 2006] SST + String Kernel with weighted gaps on node children sequences. [figure: partial-tree fragments of (VP (V brought) (NP (D a) (N cat)))]

51 Partial Tree Kernel By adding two decay factors we obtain:
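For reference, the partial tree kernel Δ is usually written along the following lines (after Moschitti, ECML 2006; this is a reconstruction and may differ in details from the original slide):

$$\Delta(n_x,n_z)=\mu\Big(\lambda^2+\sum_{\vec{J}_x,\vec{J}_z,\ |\vec{J}_x|=|\vec{J}_z|}\lambda^{d(\vec{J}_x)+d(\vec{J}_z)}\prod_{i=1}^{|\vec{J}_x|}\Delta\big(c_{n_x}[\vec{J}_{x i}],\,c_{n_z}[\vec{J}_{z i}]\big)\Big)$$

where J⃗_x and J⃗_z range over index sequences of the children of n_x and n_z, d(J⃗) is the span covered by a sequence (so gaps are penalized), and μ and λ are the two decay factors, for depth and for gaps respectively.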

52 Efficient Evaluation (1) In [Shawe-Taylor and Cristianini, 2004], sequence kernels with weighted gaps are factorized with respect to different subsequence sizes. We treat children as sequences and apply the same theory.

53 Efficient Evaluation (2) The complexity of finding the subsequences is Therefore the overall complexity is where ρ is the maximum branching factor (p = ρ)

54 Running Time of Tree Kernel Functions [plot: running time in microseconds vs. number of tree nodes for FTK-SST, QTK-SST and FTK-PT]

55 Gold Standard Tree Experiments PropBank and Penn Treebank, about 53,700 sentences; sections 2-21 for training, 23 for testing, 1 and 22 for development. Arguments from Arg0 to Arg5, ArgA and ArgM, for a total of 122,774 and 7,359.

56 Argument Classification Accuracy [plot: accuracy vs. % of training data for the ST, Linear, SST and PT kernels]

57 PART II: Kernel Engineering for Language Applications Basic Combinations Canonical Mappings, e.g. object transformations Merging of Kernels

58 Kernel Combinations, an example. Typical combinations of a tree kernel K_Tree and a polynomial kernel K_p (degree 3, over flat features) are the sum K_Tree + K_p, the product K_Tree · K_p, and weighted sums such as K_Tree + γ K_p.
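A small sketch of such a combination, normalizing each kernel before summing (as in the normalization slide above); the tree kernel used here is a trivial placeholder, and gamma, the data and all names are illustrative:

```python
import math
import numpy as np

def normalize(k):
    """k'(a, b) = k(a, b) / sqrt(k(a, a) * k(b, b)), as in the normalization formula above."""
    return lambda a, b: k(a, b) / math.sqrt(k(a, a) * k(b, b))

# Stand-ins: any structural kernel (e.g. the Delta-based sketch above) and a degree-3 polynomial.
poly3 = lambda x, z: (float(np.dot(x, z)) + 1) ** 3
def toy_tree_kernel(t1, t2):          # placeholder "structural" kernel: shared node labels
    return len(set(t1) & set(t2))

def combined(gamma):
    kt, kp = normalize(toy_tree_kernel), normalize(poly3)
    # each instance is a pair (tree_representation, flat_feature_vector)
    return lambda a, b: kt(a[0], b[0]) + gamma * kp(a[1], b[1])

a = ({"S", "VP", "NP", "V"}, np.array([1.0, 0.0, 2.0]))
b = ({"S", "VP", "PP", "V"}, np.array([0.5, 1.0, 1.0]))
print(combined(gamma=0.4)(a, b))
```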

59 Object Transformation [CLJ 2008] Canonical Mapping φ_M(): object transformation, e.g. of a syntactic parse tree into a verb subcategorization frame tree. Feature Extraction φ_E(): maps the canonical structure into all its fragments, in different fragment spaces, e.g. ST, SST and PT.
K(O_1, O_2) = φ(O_1)·φ(O_2) = φ_E(φ_M(O_1))·φ_E(φ_M(O_2)) = φ_E(S_1)·φ_E(S_2) = K_S(S_1, S_2)

60 Object Transformation and Kernel Combinations: Semantic Role Labeling Kernel Engineering via node marking Kernels for the whole predicate argument structures Kernels for re-ranking

61 Predicate Argument Classification In an event: target words describe relation among different entities the participants are often seen as predicate's arguments. Example: Paul gives a lecture in Rome

62 Predicate Argument Classification In an event: target words describe relation among different entities the participants are often seen as predicate's arguments. Example: [ Arg0 Paul] [ predicate gives ] [ Arg1 a lecture] [ ArgM in Rome]

63 Predicate Argument Structures and syntax Semantics is connected to syntax via parse trees. [figure: (S (N Paul) (VP (V gives) (NP (D a) (N lecture)) (PP (IN in) (N Rome)))), with Paul = Arg.0, gives = Predicate, a lecture = Arg.1, in Rome = Arg.M]

64 Object Transformation: tree tailoring [ACL 2004] [figure: the same parse tree for "Paul gives a lecture in Rome", tailored around the predicate and its arguments (Arg.0, Predicate, Arg.1, Arg.M)]

65 Object Transformation: Node Marking

66 Node Marking Effect

67 Different tailoring and marking MMST CMST

68 Experiments PropBank and Penn Treebank, about 53,700 sentences; Charniak trees from CoNLL 2005. Boundary detection: Section 2 for training, Section 24 for testing. PAF and MPAF.

69 Number of examples/nodes of Section 2

70 Predicate Argument Feature (PAF) vs. Marked PAF (MPAF) [ACL-ws-2005]

71 More general mappings: Semantic structures for re-ranking [CoNLL 2006]

72 Other Shallow Semantic structures [ARG1 Antigens] were [AM-TMP originally] [rel defined] [ARG2 as nonself molecules]. [ARG0 Researchers] [rel describe] [ARG1 antigens] [ARG2 as foreign molecules] [ARGM-LOC in the body]

73 Shallow Semantic Trees for SST kernel [ACL 2007]

74 Merging of Kernels [ECIR 2007]: Question/Answer Classification Syntactic/Semantic Tree Kernel Kernel Combinations Experiments

75 Merging of Kernels

76 Merging of Kernels [figure: (VP (V gives) (NP (D a) (N good) (N talk))) and (VP (V gives) (NP (D a) (N solid) (N talk))): two trees that differ only in one, semantically similar, word]

77 Delta Evaluation is very simple

78 Question Classification Definition: What does HTML stand for? Description: What's the final line in the Edgar Allan Poe poem "The Raven"? Entity: What foods can cause allergic reaction in people? Human: Who won the Nobel Peace Prize in 1992? Location: Where is the Statue of Liberty? Manner: How did Bob Marley die? Numeric: When was Martin Luther King Jr. born? Organization: What company makes Bentley cars?

79 Question Classifier based on Tree Kernels Question dataset [Li and Roth, 2005], distributed over 6 categories: Abbreviations, Descriptions, Entity, Human, Location, and Numeric. Fixed split: 5500 training and 500 test questions; cross-validation (10 folds). Using the whole question parse trees (constituent parsing). Example: What is an offer of direct stock purchase plan?

80

81 Kernels BOW and POS are obtained with a simple flat tree, e.g. (BOW (What *)(is *)(an *)(offer *) ...); PT (parse tree); PAS (predicate argument structure).

82 Question classification

83 Similarity based on WordNet

84 Question Classification with S/STK

85 Answer Classification data-set 138 TREC 2001 test questions labeled as "description"; 1309 sentences, extracted from the best ranked paragraphs (using a basic QA system based on the Google search engine); 416 of which labeled as correct by two annotators.

86 Impact of BOW and PT in Answer classification [plot: F1-measure vs. the j parameter for Q(BOW)+A(BOW), Q(BOW)+A(PT,BOW), Q(PT)+A(PT), Q(PT)+A(BOW), Q(BOW)+A(PT), Q(PT,BOW)+A(PT,BOW), Q(PT)+A(PT,BOW)]

87 The Impact of SSTK in Answer Classification [plot: F1-measure vs. the j parameter for Q(BOW)+A(BOW), Q(BOW)+A(PT,BOW), Q(BOW)+A(BOW,PT,PAS), Q(BOW)+A(BOW,PAS)]

88 SVM-light-TK Software Encodes ST, SST and combination kernels in SVM-light [Joachims, 1999] Available at Tree forests, vector sets New extensions: the PT kernel will be released asap

89 Data Format What does Html stand for? 1 BT (SBARQ (WHNP (WP What))(SQ (AUX does)(NP (NNP S.O.S.))(VP (VB stand)(PP (IN for))))(. ?)) BT (BOW (What *)(does *)(S.O.S. *)(stand *)(for *)(? *)) BT (BOP (WP *)(AUX *)(NNP *)(VB *)(IN *)(. *)) BT (PAS (ARG0 (R-A1 (What *)))(ARG1 (A1 (S.O.S. NNP)))(ARG2 (rel stand))) ET 1:1 21: E-4 23:1 30:1 36:1 39:1 41:1 46:1 49:1 66:1 152:1 274:1 333:1 BV 2:1 21: E-4 23:1 31:1 36:1 39:1 41:1 46:1 49:1 52:1 66:1 152:1 246:1 333:1 392:1 EV

90 Basic Commands
Training and classification:
./svm_learn -t 5 -C T train.dat model
./svm_classify test.dat model
Learning with a vector sequence:
./svm_learn -t 5 -C V train.dat model
Learning with the sum of vector and kernel sequences:
./svm_learn -t 5 -C + train.dat model

91 Custom Kernel (kernel.h)
double custom_kernel(KERNEL_PARM *kernel_parm, DOC *a, DOC *b);
if(a->num_of_trees && b->num_of_trees && a->forest_vec[i]!=NULL && b->forest_vec[i]!=NULL){ // test whether the i-th trees of instances a and b are both non-empty

92 Custom Kernel: tree kernel
k1 = // summation of tree kernels
     tree_kernel(kernel_parm, a, b, i, i) /   // evaluate the tree kernel between the two i-th trees
     sqrt(tree_kernel(kernel_parm, a, a, i, i) * tree_kernel(kernel_parm, b, b, i, i));   // normalize with respect to both i-th trees

93 Custom Kernel: polynomial kernel
if(a->num_of_vectors && b->num_of_vectors && a->vectors[i]!=NULL && b->vectors[i]!=NULL){   // check that the i-th vectors are both non-empty
    k2 = // summation of vectors
         basic_kernel(kernel_parm, a, b, i, i) /   // compute the standard kernel (selected according to the "second_kernel" parameter)

94 Custom Kernel: polynomial kernel (cont'd)
         sqrt(basic_kernel(kernel_parm, a, a, i, i) * basic_kernel(kernel_parm, b, b, i, i));   // normalize with respect to both i-th vectors
return k1 + k2;

95 Conclusions Kernel methods and SVMs are useful tools to design language applications. Kernel design still requires some level of expertise. Engineering approaches to tree kernels: basic combinations; canonical mappings, e.g. node marking; merging of kernels into more complex kernels. State of the art in SRL and QC. An efficient tool to use them.

96 Thank you

97 References
Alessandro Moschitti, Silvia Quarteroni, Roberto Basili and Suresh Manandhar, Exploiting Syntactic and Shallow Semantic Kernels for Question/Answer Classification, Proceedings of the 45th Conference of the Association for Computational Linguistics (ACL), Prague, June 2007.
Alessandro Moschitti and Fabio Massimo Zanzotto, Fast and Effective Kernels for Relational Learning from Texts, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), Corvallis, OR, USA.
Daniele Pighin, Alessandro Moschitti and Roberto Basili, RTV: Tree Kernels for Thematic Role Classification, Proceedings of the 4th International Workshop on Semantic Evaluation (SemEval-4), English Semantic Labeling, Prague, June 2007.
Stephan Bloehdorn and Alessandro Moschitti, Combined Syntactic and Semantic Kernels for Text Classification, to appear in the 29th European Conference on Information Retrieval (ECIR), April 2007, Rome, Italy.
Fabio Aiolli, Giovanni Da San Martino, Alessandro Sperduti, and Alessandro Moschitti, Efficient Kernel-based Learning for Trees, to appear in the IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Honolulu, Hawaii, 2007.

98 An introductory book on SVMs, Kernel methods and Text Categorization

99 References
Roberto Basili and Alessandro Moschitti, Automatic Text Categorization: from Information Retrieval to Support Vector Learning, Aracne editrice, Rome, Italy.
Alessandro Moschitti, Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 2006.
Alessandro Moschitti, Daniele Pighin, and Roberto Basili, Tree Kernel Engineering for Proposition Re-ranking. In Proceedings of Mining and Learning with Graphs (MLG 2006), Workshop held with ECML/PKDD 2006, Berlin, Germany, 2006.
Elisa Cilia, Alessandro Moschitti, Sergio Ammendola, and Roberto Basili, Structured Kernels for Automatic Detection of Protein Active Sites. In Proceedings of Mining and Learning with Graphs (MLG 2006), Workshop held with ECML/PKDD 2006, Berlin, Germany, 2006.

100 References
Fabio Massimo Zanzotto and Alessandro Moschitti, Automatic learning of textual entailments with cross-pair similarities. In Proceedings of COLING-ACL, Sydney, Australia, 2006.
Alessandro Moschitti, Making tree kernels practical for natural language learning. In Proceedings of the Eleventh Conference of the European Association for Computational Linguistics (EACL), Trento, Italy, 2006.
Alessandro Moschitti, Daniele Pighin and Roberto Basili, Semantic Role Labeling via Tree Kernel joint inference. In Proceedings of the 10th Conference on Computational Natural Language Learning, New York, USA, 2006.
Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin and Roberto Basili, Semantic Tree Kernels to classify Predicate Argument Structures. In Proceedings of the 17th European Conference on Artificial Intelligence, Riva del Garda, Italy, 2006.

101 References
Alessandro Moschitti and Roberto Basili, A Tree Kernel approach to Question and Answer Classification in Question Answering Systems. In Proceedings of the Conference on Language Resources and Evaluation (LREC), Genova, Italy, 2006.
Ana-Maria Giuglea and Alessandro Moschitti, Semantic Role Labeling via FrameNet, VerbNet and PropBank. In Proceedings of the Joint 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, 2006.
Roberto Basili, Marco Cammisa and Alessandro Moschitti, Effective use of WordNet semantics via kernel-based learning. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005), Ann Arbor (MI), USA, 2005.

102 References
Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura Coppola and Roberto Basili, Hierarchical Semantic Role Labeling. In Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL 2005 shared task), Ann Arbor (MI), USA, 2005.
Roberto Basili, Marco Cammisa and Alessandro Moschitti, A Semantic Kernel to classify texts with very few training examples. In Proceedings of the Workshop on Learning in Web Search, at the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, 2005.
Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin and Roberto Basili, Engineering of Syntactic Features for Shallow Semantic Parsing. In Proceedings of the ACL05 Workshop on Feature Engineering for Machine Learning in Natural Language Processing, Ann Arbor (MI), USA, 2005.

103 References
Alessandro Moschitti, A study on Convolution Kernels for Shallow Semantic Parsing. In Proceedings of ACL-2004, Spain, 2004.
Alessandro Moschitti and Cosmin Adrian Bejan, A Semantic Kernel for Predicate Argument Classification. In Proceedings of CoNLL-2004, Boston, MA, USA, 2004.
M. Collins and N. Duffy, New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In ACL 2002.
S.V.N. Vishwanathan and A.J. Smola, Fast kernels on strings and trees. In Proceedings of Neural Information Processing Systems, 2002.

104 References
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines (and other kernel-based learning methods), Cambridge University Press.
Xavier Carreras and Lluís Màrquez, Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL 2005.
Sameer Pradhan, Kadri Hacioglu, Valeri Krugler, Wayne Ward, James H. Martin, and Daniel Jurafsky, Support vector learning for semantic argument classification. To appear in the Machine Learning Journal.

105 Algorithm

106 The Impact of SSTK in Answer Classification [plot: F1-measure vs. the j parameter for Q(BOW)+A(BOW), Q(BOW)+A(PT,BOW), Q(PT)+A(PT,BOW), Q(BOW)+A(BOW,PT,PAS), Q(BOW)+A(BOW,PT,PAS_N), Q(PT)+A(PT,BOW,PAS), Q(BOW)+A(BOW,PAS), Q(BOW)+A(BOW,PAS_N)]

107 Mercer's conditions (1)

108 Mercer's conditions (2) If the Gram matrix G = ( k(x_i, x_j) )_{i,j} is positive semi-definite, there is a mapping φ that produces the target kernel function.

109 The lexical semantic kernel is not always a kernel It may not be a kernel, so we can use M·Mᵀ, where M is the initial similarity matrix.
