Linear Classifiers (Kernels)
1 Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer
2 Exam Dates & Course Conclusion There are 2 exam dates: Feb 20th and March 25th. Next week Dr. Landwehr will give you info for registering; please think about which date would be best for you. Remaining lectures: Jan. 21 Hypothesis Evaluation; Jan. 28 Summary of Topics; Feb. 4 <Study Time, No Lecture>.
3 Contents Kernels for Structured Data Spaces: String Kernels, Graph Kernels. Main idea: kernel learning separates data & learning. The learning algorithm is developed to achieve a reasonable separation of classes in a feature space. The kernel function is developed to express a pairwise notion of similarity that corresponds to an inner product in some feature space --- domain-specific! The kernel abstraction allows us to learn on data that is non-numeric / structured.
4 Recall: Kernel Functions The kernel function k(x, x′) = φ(x)ᵀφ(x′) computes the inner product of the feature mappings of two instances. The kernel function can often be computed without an explicit representation of φ(x). E.g., the polynomial kernel: k_poly(x_i, x_j) = (x_iᵀx_j + 1)^p. Infinite-dimensional feature mappings are possible, e.g., the RBF kernel: k_RBF(x_i, x_j) = exp(−γ‖x_i − x_j‖²). For every positive definite kernel there is a feature mapping φ such that k(x, x′) = φ(x)ᵀφ(x′). For a given kernel matrix, the Mercer map provides a feature mapping.
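These two kernels can be sketched in a few lines of Python (a minimal illustration; the function names `k_poly` and `k_rbf` and the default parameters are ours, not from the lecture):

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def k_poly(x, y, p=2):
    # polynomial kernel: (x^T y + 1)^p
    return (dot(x, y) + 1) ** p

def k_rbf(x, y, gamma=1.0):
    # RBF kernel: exp(-gamma * ||x - y||^2);
    # corresponds to an infinite-dimensional feature mapping
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)
```

For example, `k_poly([1, 2], [3, 4])` with p = 2 gives (11 + 1)² = 144, and `k_rbf(x, x)` is always 1.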
5 Recall: Polynomial Kernels Kernel: k_poly(x_i, x_j) = (x_iᵀx_j + 1)^p, with 2D input and p = 2:
k_poly(x_i, x_j) = (x_iᵀx_j + 1)² = (x_{i1}x_{j1} + x_{i2}x_{j2} + 1)²
= x_{i1}²x_{j1}² + x_{i2}²x_{j2}² + 2x_{i1}x_{j1}x_{i2}x_{j2} + 2x_{i1}x_{j1} + 2x_{i2}x_{j2} + 1
= (x_{i1}², x_{i2}², √2 x_{i1}x_{i2}, √2 x_{i1}, √2 x_{i2}, 1) (x_{j1}², x_{j2}², √2 x_{j1}x_{j2}, √2 x_{j1}, √2 x_{j2}, 1)ᵀ
= φ(x_i)ᵀφ(x_j)
The feature map φ contains all monomials of degree ≤ 2 over the input attributes.
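The identity between the kernel and the explicit feature map is easy to check numerically; a small sketch (the feature ordering follows the expansion above, and the example vectors are ours):

```python
import math

def phi(x):
    # explicit feature map for the 2D polynomial kernel with p = 2:
    # all monomials of degree <= 2, with sqrt(2) weights on the mixed terms
    x1, x2 = x
    r2 = math.sqrt(2)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

def k_poly(x, y):
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2

xi, xj = [1.0, 2.0], [3.0, -1.0]
implicit = k_poly(xi, xj)                                   # kernel trick
explicit = sum(a * b for a, b in zip(phi(xi), phi(xj)))     # explicit map
# implicit and explicit agree up to floating-point error
```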
6 STRING KERNELS
7 Strings: Motivation Strings are a common non-numeric type of data. Documents & e-mails are strings. DNA & protein sequences are strings.
8 String Kernels String: a sequence of characters from an alphabet Σ, written s = s_1 s_2 … s_n with |s| = n. The set of all strings is Σ* = ∪_{n∈ℕ} Σ^n. Substring: s_{i:j} = s_i s_{i+1} … s_j. Subsequence: for any i ∈ {0,1}^n, s_i is the elements of s corresponding to the entries of i that are 1. E.g., if s = abcd, then s_{(1,0,0,1)} = ad. A string kernel is a real-valued function on Σ* × Σ*. We need positive definite kernels. We will design kernels by looking at a feature space of substrings / subsequences.
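The index-vector notation for subsequences can be made concrete with a one-line helper (illustrative only, not part of the lecture):

```python
def subsequence(s, idx):
    # s_i: keep the characters of s at positions where the 0/1 vector idx is 1
    return "".join(c for c, b in zip(s, idx) if b)

# slide example: s = "abcd", i = (1, 0, 0, 1) selects "ad"
```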
9 Bag-of-Words Kernel For textual data, a simple feature representation is indexed by the words contained in the string. Example instance x: "Dear Beneficiary, your address has been picked online in this years MICROSOFT CONSUMER AWARD as a Winner of One Hundred and Fifty Five Thousand Pounds Sterling". Features: "word #1 occurs?", …, "word #m occurs?" for a vocabulary of m words (e.g. Aardvark, Beneficiary, Friend, Sterling, Science, …). The bag-of-words kernel computes the number of common words between 2 texts; can it be computed efficiently?
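A sketch of the bag-of-words kernel, assuming whitespace tokenization and binary word-occurrence features. It runs in time roughly linear in the text lengths by hashing words instead of building an explicit m-dimensional vector, which answers the efficiency question:

```python
from collections import Counter

def bow_kernel(s, t, binary=True):
    # inner product of word-occurrence vectors; with binary=True this is
    # simply the number of distinct words common to both texts
    ws, wt = Counter(s.split()), Counter(t.split())
    if binary:
        return len(set(ws) & set(wt))
    # count-based variant: inner product of word-count vectors
    return sum(ws[w] * wt[w] for w in ws)
```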
10 Spectrum Kernel Consider a feature space with features corresponding to every length-p substring over the alphabet Σ. φ_s(u) is the number of times u ∈ Σ^p is contained in string s. The p-spectrum kernel is
κ_p(s, t) = Σ_{u ∈ Σ^p} φ_s(u) φ_t(u)
Example (p = 2, Σ = {a, b}):
φ        aa  ab  ba  bb        K       aaab  bbab  aaaa  baab
aaab      2   1   0   0        aaab      5     1     6     3
bbab      0   1   1   1        bbab      1     3     0     2
aaaa      3   0   0   0        aaaa      6     0     9     3
baab      1   1   1   0        baab      3     2     3     3
11 Spectrum Kernel Computation Without explicitly computing this feature map, the p-spectrum kernel can be computed as
κ_p(s, t) = Σ_{i=1}^{|s|−p+1} Σ_{j=1}^{|t|−p+1} I(s_{i:i+p−1} = t_{j:j+p−1})
This computation is O(p · |s| · |t|). Using trie data structures, it can be reduced to O(p · max(|s|, |t|)). Naturally, we can also compute (weighted) sums over different substring lengths.
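A direct implementation of the p-spectrum kernel using hashed substring counts (a sketch; this avoids the explicit |Σ|^p-dimensional feature vector but is not the trie-based optimization):

```python
from collections import Counter

def spectrum_kernel(s, t, p):
    # phi_s(u) = number of occurrences of the length-p substring u in s;
    # the kernel is the inner product of these count vectors
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[j:j + p] for j in range(len(t) - p + 1))
    return sum(cs[u] * ct[u] for u in cs)
```

For the example above, `spectrum_kernel("aaab", "aaaa", 2)` = 2 · 3 = 6, since aa occurs twice in aaab and three times in aaaa.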
12 All-Subsequences Kernel A subsequence is an ordered subset of a string. Every subsequence of a string s of length n is uniquely indexed by some i ∈ {0,1}^n; the subsequence corresponding to i is s_i. Consider a feature space with features corresponding to every string over the alphabet Σ. φ_s(u) is the number of times u ∈ Σ* occurs as a subsequence of s. The all-subsequences kernel is
κ(s, t) = Σ_{u ∈ Σ*} φ_s(u) φ_t(u)
13 All-Subsequences Kernel The all-subsequences kernel is κ(s, t) = Σ_{u ∈ Σ*} φ_s(u) φ_t(u), where φ_s(u) is the number of times u occurs as a subsequence of s. Example feature counts for the strings aab, bab, bba (degree-3 features and the empty subsequence omitted):
φ       a   b   aa  ab  ba  bb
aab     2   1   1   2   0   0
bab     1   2   0   1   1   1
bba     1   2   0   0   2   1
Problem: there are min(C(|s|, k), |Σ|^k) subsequences of length k in s — exponentially many features.
14 All-Subsequences Kernel How can we avoid the exponential size of the explicit feature space? Rewrite the all-subsequences kernel as κ(s, t) = Σ_{(i,j)} I(s_i = t_j), a sum over all pairs of index vectors that select matching subsequences. These matching subsequences can be split into 2 possibilities: either the last character of s is not used in the match, or the last character of s is used in the match. For a string ending in character σ:
κ(sσ, t) = κ(s, t) + Σ_{u,v : t = uσv} κ(s, u)
15 All-Subsequences Kernel κ(s, t) = number of matching subsequences of s and t. Ignore the last character of s: the number of matching subsequences of s_{1:n−1} and t. Match the last character of s to the k-th character of t: the number of matching subsequences of s_{1:n−1} and t_{1:k−1}. Together:
κ(s, t) = κ(s_{1:n−1}, t) + Σ_{k : t_k = s_n} κ(s_{1:n−1}, t_{1:k−1})
16 All-Subsequences Kernel Based on this decomposition, we get a recursion with base cases κ(s, ε) = 1 and κ(ε, t) = 1 for all s, t (only the empty subsequence matches), and recursions
κ(s, t) = κ(s_{1:n−1}, t) + Σ_{k : t_k = s_n} κ(s_{1:n−1}, t_{1:k−1})
κ(s, t) = κ(s, t_{1:m−1}) + Σ_{k : s_k = t_m} κ(s_{1:k−1}, t_{1:m−1})
The 1st term corresponds to ignoring the last character of s (resp. t); the 2nd term corresponds to the possible matches of that last character within the other string. The naïve recursion is still exponential → dynamic programming.
17 Dynamic Programming Solution Example: count the matching subsequences of s = learning and t = machine in a DP table (rows: characters of learning; columns: characters of machine; the tables are omitted in this transcription). Initial state: matches only 1 subsequence (the empty one).
18 Dynamic Programming Solution l does not match any character in machine.
19 Dynamic Programming Solution e matches the last character in machine → e added.
20 Dynamic Programming Solution a matches the 2nd character in machine → a added.
21 Dynamic Programming Solution r does not match any character in machine.
22 Dynamic Programming Solution n matches the 6th character in machine → n and an added.
23 Dynamic Programming Solution i matches the 5th character in machine → i and ai added.
24 Dynamic Programming Solution n matches the 6th character in machine → n, in, an, ain added.
25 Dynamic Programming Solution g does not match any character in machine.
26 Dynamic Programming Solution Total matching subsequences: 11.
27 All-Subsequences Kernel Using caching of sub-results, this dynamic programming solution runs in O(|s| · |t|).
AllSubseqKernel(s, t)
  FOR j = 0 TO |t|: DP[0,j] = 1
  FOR i = 1 TO |s|:
    last = 0; cache[0] = 0
    FOR k = 1 TO |t|:
      cache[k] = cache[last]
      IF t_k = s_i THEN cache[k] += DP[i-1,k-1]; last = k
    FOR k = 0 TO |t|:
      DP[i,k] = DP[i-1,k] + cache[k]
  RETURN DP[|s|, |t|]
Note: strings are 1-indexed, but DP & cache have a 0-index for the empty prefix.
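The pseudocode translates almost line for line into Python (a sketch; 0-based Python indices replace the slide's 1-based ones, and the function name is ours):

```python
def all_subseq_kernel(s, t):
    # DP[i][k] = number of matching subsequence pairs of s[:i] and t[:k],
    # counting the empty subsequence once (the base cases kappa(s, eps) = 1)
    n, m = len(s), len(t)
    DP = [[1] * (m + 1)] + [[0] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        cache = [0] * (m + 1)
        for k in range(1, m + 1):
            # cache[k] = sum of DP[i-1][j-1] over positions j <= k
            # where t[j-1] matches the new character s[i-1]
            cache[k] = cache[k - 1]
            if t[k - 1] == s[i - 1]:
                cache[k] += DP[i - 1][k - 1]
        for k in range(m + 1):
            DP[i][k] = DP[i - 1][k] + cache[k]
    return DP[n][m]
```

On the slides' walkthrough this reproduces the final count: `all_subseq_kernel("learning", "machine")` returns 11, including the empty subsequence.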
28 String Kernels Here we have seen a number of string kernels that can be efficiently computed (using dynamic programming, tries, etc.): the bag-of-words kernel, the p-spectrum kernel, and the all-subsequences kernel. Many other variants exist (fixed-length subsequence, gap-weighted subsequence, mismatch, etc.). The choice of kernel depends on the notion of similarity appropriate for the application domain. Kernel normalization / centering are common.
29 GRAPH KERNELS
30 Graphs: Motivation Graphs are often used to model objects and their relationships to one another: bioinformatics (molecule relationships), the internet, social networks. Central questions: How similar are two graphs? How similar are two nodes within a graph?
31 Graph Kernel: Example Consider a dataset of websites with links constituting the edges in the graph. A kernel on the nodes of the graph would be useful for learning w.r.t. the web-pages. A kernel on graphs would be useful for comparing different components of the internet (e.g. domains).
32 Graph Kernel: Example Consider a set of chemical pathways (sequences of interactions among molecules), i.e. graphs. A node kernel would be a useful way to measure the similarity of different molecules' roles within these pathways. A graph kernel would be a useful measure of similarity for different pathways.
33 Graphs: Definition A graph G = (V, E) is specified by a set of nodes V = {v_1, …, v_n} and a set of edges E ⊆ V × V. Data structures for representing graphs: adjacency matrix A = (a_ij) with a_ij = I[(v_i, v_j) ∈ E]; adjacency list; incidence matrix. Example: G_1 = (V_1, E_1) with V_1 = {v_1, …, v_4} and E_1 = {(v_1, v_1), (v_1, v_2), (v_2, v_3), (v_4, v_…)} (last edge and adjacency matrix garbled in transcription).
34 Similarity between Graphs Central Question: How similar are two graphs? 1st possibility: the number of isomorphisms between all (sub-)graphs. (Example graphs: G_1 = (V_1, E_1) with nodes v_1, …, v_5 and G_2 = (V_2, E_2) with nodes v_a, …, v_e.)
35 Isomorphisms of Graphs Isomorphism: two graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2) are isomorphic if there exists a bijective mapping f : V_1 → V_2 such that (v_i, v_j) ∈ E_1 ⇔ (f(v_i), f(v_j)) ∈ E_2.
36 Isomorphisms of Graphs Computing the number of isomorphisms between all (sub-)graphs is NP-hard!
37 Similarity between Graphs Central Question: How similar are two graphs? 2nd possibility: counting the number of common paths in the graphs.
38 Common Paths in Graphs The number of paths of length 0 is just the number of nodes in the graph.
39 Common Paths in Graphs The number of paths of length 1 from one node to another is given by the adjacency matrix: (A_1)_ij = 1 iff there is an edge from v_i to v_j.
40 Common Paths in Graphs The number of paths of length k from one node to another is given by the k-th power of the adjacency matrix: (A_1^k)_ij is the number of length-k paths from v_i to v_j.
41 Common Paths in Graphs Proof sketch: by induction on k. (A^{k+1})_ij = Σ_l (A^k)_il (A)_lj sums, over all intermediate nodes v_l, the number of length-k paths from v_i to v_l that can be extended by an edge (v_l, v_j).
42 Common Paths in Graphs The total number of paths of length k is Σ_{i,j=1}^n (A^k)_ij = 1ᵀ A^k 1.
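The claim is easy to check with a few lines of Python (a sketch using plain nested lists; note that these counts are walks, i.e. nodes may repeat along the way):

```python
def mat_mul(A, B):
    # product of two square matrices given as nested lists
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def total_paths(A, k):
    # 1^T A^k 1: the sum of all entries of A^k,
    # i.e. the total number of length-k walks in the graph
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(k):
        P = mat_mul(P, A)
    return sum(map(sum, P))
```

For a single undirected edge, A = [[0,1],[1,0]], `total_paths(A, 0)` = 2 (the two nodes) and `total_paths(A, k)` = 2 for every k.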
43 Common Paths in Graphs Common paths of two graphs are given by their product graph G_× = (V_×, E_×):
V_× = V_1 × V_2
E_× = { ((v, w), (v′, w′)) : (v, v′) ∈ E_1 ∧ (w, w′) ∈ E_2 }
Example: G_1 with nodes a, b, c and G_2 with nodes 1, 2 yield the product graph G_× with nodes a1, a2, b1, b2, c1, c2.
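In matrix form, the adjacency matrix of the product graph is the Kronecker product A_× = A_1 ⊗ A_2 (a standard fact, not spelled out on the slide); a plain-Python sketch:

```python
def kron(A, B):
    # Kronecker product: ((v, w), (v', w')) is an edge of the product graph
    # iff (v, v') is an edge of G1 and (w, w') is an edge of G2
    m = len(B)
    return [[A[i // m][j // m] * B[i % m][j % m]
             for j in range(len(A) * m)]
            for i in range(len(A) * m)]
```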
44 Similarity between Graphs Similarity between graphs: the number of common paths in their product graph. Length 0: CP_0 = Σ_{i,j=1}^n (A_×^0)_ij = 6 (the 6 nodes of the product graph).
45 Similarity between Graphs Length ≤ 1: CP_1 = CP_0 + Σ_{i,j} (A_×)_ij = 6 + 6 = 12.
46 Similarity between Graphs Length ≤ 2: CP_2 = CP_1 + Σ_{i,j} (A_×²)_ij = 12 + 4 = 16.
47 Similarity between Graphs Length ≤ 3: CP_3 = CP_2 + Σ_{i,j} (A_×³)_ij = 16 + 0 = 16.
48 Similarity between Graphs In total, CP = Σ_{k=0}^∞ Σ_{i,j=1}^n (A_×^k)_ij; here A_×^k = 0 for k > 2, so CP = 16. (The product-graph adjacency matrices are omitted in this transcription.)
49 Similarity between Graphs With cycles, there can be an infinite number of paths! A truncated count sums only up to length L: CP_L = Σ_{k=0}^L Σ_{i,j=1}^n (A_×^k)_ij. (The cyclic example graphs and matrices are omitted in this transcription.)
50 Similarity between Graphs With cycles, there can be an infinite number of paths! We must downweight the influence of long paths. Random walk kernels:
k(G_1, G_2) = (1 / (|V_1||V_2|)) Σ_{k=0}^∞ λ^k Σ_{i,j} (A_×^k)_ij = 1ᵀ(I − λA_×)^{−1} 1 / (|V_1||V_2|)
k(G_1, G_2) = (1 / (|V_1||V_2|)) Σ_{k=0}^∞ (λ^k / k!) Σ_{i,j} (A_×^k)_ij = 1ᵀ exp(λA_×) 1 / (|V_1||V_2|)
These kernels can be calculated by means of the Sylvester equation in O(n³).
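A sketch of the geometric random-walk kernel that approximates the closed form by truncating the series after K terms (pure Python, no linear-algebra library; λ must be small enough for the series to converge, and the function names are ours):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def kron(A, B):
    m = len(B)
    return [[A[i // m][j // m] * B[i % m][j % m]
             for j in range(len(A) * m)] for i in range(len(A) * m)]

def random_walk_kernel(A1, A2, lam=0.1, K=30):
    # truncated series sum_{k<=K} lam^k 1^T Ax^k 1 / (|V1||V2|),
    # where Ax = A1 (x) A2 is the product-graph adjacency matrix
    Ax = kron(A1, A2)
    n = len(Ax)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # Ax^0 = I
    total, coeff = 0.0, 1.0
    for _ in range(K + 1):
        total += coeff * sum(map(sum, P))
        P = mat_mul(P, Ax)
        coeff *= lam
    return total / (len(A1) * len(A2))
```

The exact closed form 1ᵀ(I − λA_×)^{−1}1 / (|V_1||V_2|) requires a matrix inverse (or the Sylvester-equation trick from the slide); the truncation is just the simplest way to see the numbers.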
51 Similarity between Nodes Assumption: nodes are similar if they are connected by many paths. Random walk kernels between nodes:
k(v_i, v_j) = Σ_{k=0}^∞ λ^k (A^k)_ij = ((I − λA)^{−1})_ij
k(v_i, v_j) = Σ_{k=0}^∞ (λ^k / k!) (A^k)_ij = (exp(λA))_ij
52 Additional Graph-Kernels Shortest-path kernel: all shortest paths between pairs of nodes are computed by the Floyd-Warshall algorithm with run time O(|V|³); then all pairs of shortest paths between the 2 graphs are compared, with O(|V_1|² |V_2|²) comparisons. Subtree kernel: idea: use tree structures as indexes in the feature space; can be recursively computed for a fixed-height tree; subtrees are downweighted according to their height.
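A sketch of the shortest-path kernel using an indicator base kernel on path lengths (i.e. two shortest paths count as similar iff their lengths are equal; this is a simplification of the general formulation, and all names are ours):

```python
from collections import Counter

def floyd_warshall(A):
    # all-pairs shortest path lengths from an adjacency matrix, O(|V|^3)
    n = len(A)
    INF = float("inf")
    D = [[0 if i == j else (1 if A[i][j] else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

def shortest_path_kernel(A1, A2):
    # sum over all pairs of node pairs of I(path lengths equal);
    # the O(|V1|^2 |V2|^2) comparisons are collapsed via length counting
    def length_counts(A):
        D = floyd_warshall(A)
        return Counter(d for row in D for d in row if 0 < d < float("inf"))
    c1, c2 = length_counts(A1), length_counts(A2)
    return sum(c1[d] * c2[d] for d in c1)
```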
53 Summary Kernel functions provide a measure of similarity that allows us to compare non-numeric data. String kernels: based on the space of all (sub)strings, they count the number of common occurrences within 2 strings. Graph kernels: they use common structures within graphs as the basis for their feature space: paths (all-paths kernel, random-walk kernel, shortest-path kernel) and subtrees (subtree kernel). Kernels are also defined on other structures (e.g. trees, images, …). The kernel is selected for a particular domain.
CS3133 - A Term 2009: Foundations of Computer Science Prof. Carolina Ruiz Homework 2 WPI By Li Feng, Shweta Srivastava, and Carolina Ruiz Chapter 4 Problem 1: (10 Points) Exercise 4.3 Solution 1: S is
More informationValence automata over E-unitary inverse semigroups
Valence automata over E-unitary inverse semigroups Erzsi Dombi 30 May 2018 Outline Motivation Notation and introduction Valence automata Bicyclic and polycyclic monoids Motivation Chomsky-Schützenberger
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More informationCombinatorial Optimization
Combinatorial Optimization Problem set 8: solutions 1. Fix constants a R and b > 1. For n N, let f(n) = n a and g(n) = b n. Prove that f(n) = o ( g(n) ). Solution. First we observe that g(n) 0 for all
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Intelligent Data Analysis. Decision Trees
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Intelligent Data Analysis Decision Trees Paul Prasse, Niels Landwehr, Tobias Scheffer Decision Trees One of many applications:
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation
More informationarxiv: v1 [cs.ds] 9 Apr 2018
From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract
More informationCircuits. Lecture 11 Uniform Circuit Complexity
Circuits Lecture 11 Uniform Circuit Complexity 1 Recall 2 Recall Non-uniform complexity 2 Recall Non-uniform complexity P/1 Decidable 2 Recall Non-uniform complexity P/1 Decidable NP P/log NP = P 2 Recall
More informationLecture 12 Simplification of Context-Free Grammars and Normal Forms
Lecture 12 Simplification of Context-Free Grammars and Normal Forms COT 4420 Theory of Computation Chapter 6 Normal Forms for CFGs 1. Chomsky Normal Form CNF Productions of form A BC A, B, C V A a a T
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationHierarchical Overlap Graph
Hierarchical Overlap Graph B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arxiv:1802.04632 2018 B. Cazaux & E. Rivals 1 / 29 Overlap Graph for a set of words Consider the set P := {abaa,
More informationAutomata Theory CS F-08 Context-Free Grammars
Automata Theory CS411-2015F-08 Context-Free Grammars David Galles Department of Computer Science University of San Francisco 08-0: Context-Free Grammars Set of Terminals (Σ) Set of Non-Terminals Set of
More informationSection 1.3 Ordered Structures
Section 1.3 Ordered Structures Tuples Have order and can have repetitions. (6,7,6) is a 3-tuple () is the empty tuple A 2-tuple is called a pair and a 3-tuple is called a triple. We write (x 1,, x n )
More informationWeek Some Warm-up Questions
1 Some Warm-up Questions Week 1-2 Abstraction: The process going from specific cases to general problem. Proof: A sequence of arguments to show certain conclusion to be true. If... then... : The part after
More informationCS-C Data Science Chapter 8: Discrete methods for analyzing large binary datasets
CS-C3160 - Data Science Chapter 8: Discrete methods for analyzing large binary datasets Jaakko Hollmén, Department of Computer Science 30.10.2017-18.12.2017 1 Rest of the course In the first part of the
More informationSolutions to Problem Set 3
V22.0453-001 Theory of Computation October 8, 2003 TA: Nelly Fazio Solutions to Problem Set 3 Problem 1 We have seen that a grammar where all productions are of the form: A ab, A c (where A, B non-terminals,
More informationFiniteness conditions and index in semigroup theory
Finiteness conditions and index in semigroup theory Robert Gray University of Leeds Leeds, January 2007 Robert Gray (University of Leeds) 1 / 39 Outline 1 Motivation and background Finiteness conditions
More informationCS375: Logic and Theory of Computing
CS375: Logic and Theory of Computing Fuhua (Frank) Cheng Department of Computer Science University of Kentucky 1 Table of Contents: Week 1: Preliminaries (set algebra, relations, functions) (read Chapters
More informationarxiv: v1 [math.ra] 15 Jul 2013
Additive Property of Drazin Invertibility of Elements Long Wang, Huihui Zhu, Xia Zhu, Jianlong Chen arxiv:1307.3816v1 [math.ra] 15 Jul 2013 Department of Mathematics, Southeast University, Nanjing 210096,
More information1 More finite deterministic automata
CS 125 Section #6 Finite automata October 18, 2016 1 More finite deterministic automata Exercise. Consider the following game with two players: Repeatedly flip a coin. On heads, player 1 gets a point.
More informationComplexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler
Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard
More informationOutline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.
Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationFall, 2017 CIS 262. Automata, Computability and Complexity Jean Gallier Solutions of the Practice Final Exam
Fall, 2017 CIS 262 Automata, Computability and Complexity Jean Gallier Solutions of the Practice Final Exam December 6, 2017 Problem 1 (10 pts). Let Σ be an alphabet. (1) What is an ambiguous context-free
More informationUndecibability. Hilbert's 10th Problem: Give an algorithm that given a polynomial decides if the polynomial has integer roots or not.
Undecibability Hilbert's 10th Problem: Give an algorithm that given a polynomial decides if the polynomial has integer roots or not. The problem was posed in 1900. In 1970 it was proved that there can
More information