A Brief Survey on Semi-supervised Learning with Graph Regularization
Hongyuan You
Department of Computer Science
University of California, Santa Barbara
hyou@cs.ucsb.edu

Abstract

In this survey, we go over a few historical works on semi-supervised learning that apply graph regularization to both labeled and unlabeled data to improve classification performance. These semi-supervised methods usually construct a nearest-neighbour graph on the instance space under a certain measure function, and then work under the smoothness assumption that class labels change quite slowly between instances connected on the graph. The graph Laplacian is used in the graph smoothness regularizer since it is a discrete analogue of the Laplace-Beltrami smoothness operator in a continuous space. A relationship between kernel methods and the graph Laplacian has also been discovered; it suggests an approach to designing specific kernels for classification that utilize the spectrum of the Laplacian matrix.

1 Smoothness Regularization via the Laplace Operator and the Graph Laplacian

1.1 Semi-supervised Learning Problem

Nowadays there are many practical applications of machine learning. Though in academic papers we often assume these problems come with nice conditions and a huge number of samples for inference and learning, such datasets are rarely available in reality. One of the biggest problems is that not all sample instances have class labels, since people can hardly label millions of instances manually. We therefore need learning methods that can deal with labeled and unlabeled data at the same time, utilizing the labels of the labeled instances and also the information from the unlabeled instances. Let X be the instance space, {x_1, x_2, ..., x_l} be the labeled training samples with labels {y_1, y_2, ..., y_l}, y_i ∈ {-1, +1}, and {x_{l+1}, x_{l+2}, ..., x_n} be the n - l unlabeled samples. Usually we have l ≪ n, which means only very few samples are labeled.
Our purpose is to learn a classification function f: X → {-1, +1} that predicts the class of testing samples. We restrict ourselves to the transductive setting, where the unlabelled data also serve as the test data in our performance analysis.

1.2 Manifold Assumption and Laplace Operator

The information from unlabeled instances is hard to incorporate into the classification model unless we can raise a reasonable assumption about the unlabeled data. The first assumption is that the data lies on a low-dimensional manifold within a high-dimensional representation space. We can easily find many examples to justify this assumption. For example, a handwritten digit can be represented as a matrix of image pixels with very high dimension; however, it can also be represented by only a few parameters in a low-dimensional space, and in this space similar handwritten digits lie close to each other and form a low-dimensional manifold. Similar examples can be found in document categorization problems, where people discovered that documents represented by word-identifier vectors (remark: the counts or frequencies) should lie on a manifold no matter how high the dimension of the original representation is. Realizing the existence of this intrinsic structure of the instance space is significant for developing classification methods for such kinds of problems. On such a low-dimensional manifold we make the second assumption, already suggested above: instances that have a small distance on the manifold should have the same labels. Under this assumption, we are able to use unlabeled instances as long as they are close to some labeled training samples.

Under the manifold setting, we can use the Laplace operator for regularization. Let the Laplace operator be Δ = -Σ_i ∂²/∂x_i², and let f be an L² function defined on an n-dimensional compact Riemannian manifold M. It is known that the spectrum of Δ is discrete (λ_0 = 0) and that its eigenfunctions e_i(·) form a basis of L²(M), which means any f can be represented as a linear combination of the e_i(·):

    f(x) = Σ_{i=0}^∞ α_i e_i(x)    (1)

where Δe_i(·) = λ_i e_i(·). Using the Laplace operator we obtain a natural measure of smoothness for any function f on M, which computes the variation of the function values:

    S(f) := ∫_M |∇f|² dµ = ∫_M f Δf dµ = ⟨f, Δf⟩_{L²(M)}    (2)

The smoothness of the eigenfunctions is

    S(e_i(·)) = ⟨e_i(·), Δe_i(·)⟩_{L²(M)} = λ_i    (3)

Then we can easily compute the smoothness from the eigenvalues λ_i and the decomposition factors α_i:

    S(f(·)) = ⟨Δf, f⟩_{L²(M)} = ⟨Σ_i λ_i α_i e_i(·), Σ_i α_i e_i(·)⟩_{L²(M)} = Σ_{i=0}^∞ λ_i α_i²    (4)

The last equation tells us that an eigenfunction with a large eigenvalue leads to a large smoothness value: it differs a lot between nearby instances and constitutes an oscillating component of the original function. An eigenfunction with a small eigenvalue, in contrast, has a smooth pattern under this measure. For the learning problem, we can make f the classifier we want. If we want close instances on the low-dimensional manifold to have similar classification results, then the smoothness measure S(f) should be small enough.
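The discrete analogy behind this smoothness measure can be checked numerically. Below is a minimal sketch in which the path-graph Laplacian stands in for the 1-D operator -d²/dx² (the grid size n is an illustrative choice); it verifies S(e_i) = λ_i and S(f) = Σ_i λ_i α_i²:

```python
import numpy as np

# Discretize the 1-D Laplace operator -d^2/dx^2 on a path of n points
# (a standard finite-difference sketch; n is illustrative).
n = 100
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1  # path-graph boundary: the constant vector gets eigenvalue 0

lam, E = np.linalg.eigh(L)  # eigenvalues ascending, eigenvectors orthonormal

# Smoothness of each eigenvector equals its eigenvalue: S(e_i) = <e_i, L e_i> = lambda_i
S = np.array([E[:, i] @ L @ E[:, i] for i in range(n)])
assert np.allclose(S, lam)

# A function f = sum_i alpha_i e_i has smoothness sum_i lambda_i alpha_i^2 (Eq. 4)
rng = np.random.default_rng(0)
alpha = rng.normal(size=n)
f = E @ alpha
assert np.isclose(f @ L @ f, np.sum(lam * alpha**2))
```

The low-index eigenvectors are the slowly varying (smooth) components; truncating or penalizing the high-index ones is exactly the regularization idea used below.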
So S(f) becomes an ideal regularizer.

1.3 Graph Approximation of the Manifold and the Graph Laplacian

However, how to recover the intrinsic structure, i.e. the low-dimensional manifold of the input data, is still an open problem. What we need here for semi-supervised learning is the second assumption, on the relationship between the similarity distance and the label. We can construct a nearest-neighbour graph G that connects similar instances (though distance on the graph differs from distance on the manifold), and make the graph a good approximation of the low-dimensional manifold of the input instance space. Each instance x_i adds its t nearest neighbours to the graph G. Let A be the adjacency matrix of G, with A_ij = ω_ij = 1 if instances x_i and x_j are close under some similarity function, such as the angle distance (one can also change the edge values into meaningful weights). People have developed many methods for computing the similarity distance, which we omit here. By now our assumption should be clear: two instances connected by an edge should have similar classification results.

Under the idea of the graph approximation of the manifold, we can use the graph Laplacian as a natural analogue of the Laplace operator. Let L = D - A be the Laplacian matrix of the adjacency matrix A of graph G, where D = diag{d_1, d_2, ..., d_n} and d_i is the degree of instance x_i in the nearest-neighbour graph G. We know that L has an eigensystem {(λ_i, v_i)}_{i=1}^n, and that {v_i}_{i=1}^n form an orthonormal basis of R^n. Any vector f = (f_1, f_2, ..., f_n)^T can be represented in this basis:

    f = Σ_{i=1}^n α_i v_i,    L v_i = λ_i v_i    (5)
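The construction just described is short in code. A sketch (the random data, t = 4, and the 0/1 edge weights are all illustrative choices) that builds a t-nearest-neighbour graph, forms L = D - A, and checks the graph smoothness identity f^T L f = ½ Σ_ij ω_ij (f_i - f_j)²:

```python
import numpy as np

# Sketch: t-nearest-neighbour graph and its (unnormalized) Laplacian L = D - A.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))          # 30 instances in R^5 (illustrative)
t = 4                                  # neighbours per instance

dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
A = np.zeros((len(X), len(X)))
for i in range(len(X)):
    nbrs = np.argsort(dist[i])[1:t + 1]   # skip self at position 0
    A[i, nbrs] = 1
A = np.maximum(A, A.T)                    # symmetrize: keep edge if either end chose it

D = np.diag(A.sum(axis=1))
L = D - A

# Graph smoothness functional: f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2
f = rng.normal(size=len(X))
lhs = f @ L @ f
rhs = 0.5 * np.sum(A * (f[:, None] - f[None, :])**2)
assert np.isclose(lhs, rhs)
```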
More importantly, the natural measure of smoothness keeps the same form when we use L in place of Δ:

    S_L(f) := (1/2) Σ_{i,j} ω_ij (f_i - f_j)² = f^T L f = ⟨f, Lf⟩_G    (6)

From this smoothness definition we can see that if ω_ij is nonzero and large, a difference between the two predictions increases the smoothness value, while identical predictions make the term zero, which is exactly what we want in a regularizer. The smoothness of the eigenvectors again equals their eigenvalues:

    S_L(v_i) = v_i^T L v_i = λ_i v_i^T v_i = λ_i    (7)

Then we can easily compute the smoothness from the eigenvalues λ_i and decomposition factors α_i:

    S_L(f) = ⟨Lf, f⟩_G = ⟨Σ_i α_i L v_i, Σ_i α_i v_i⟩_G = Σ_i λ_i α_i²    (8)

As we can see, these properties are all the same as in the continuous version.

2 Learning Optimal Labels for Unlabeled Data

2.1 Classification as an Interpolation Problem

If we treat the entries of the vector f as the values of a classifier f(·) on the instances, and the model of the instance space satisfies the smoothness assumption, then f can be computed as a combination of the eigenvectors v_i of the Laplacian matrix L. The only things we need to know are the combination factors α_i. This early work on the semi-supervised learning problem was first done by Mikhail Belkin and Partha Niyogi [1]. The continuous formulation would be

    α = arg min_α ∫_M ( f(x) - Σ_{j=1}^p α_j e_j(x) )² dx    (9)

However, it is hard to get those eigenfunctions on the manifold M, and we usually have access to only a limited number of labeled instances, so they instead solve the discrete version of the formulation:

    α = arg min_α Σ_{i=1}^l ( y_i - Σ_{j=1}^p α_j v_j(i) )²    (10)

where {v_j}_{j=1}^p are the first p eigenvectors of L, corresponding to the smallest p eigenvalues, and f = Σ_j α_j v_j. Although they mentioned the smoothness measure of the graph Laplacian, they did not use it explicitly in this method, but instead chose the smallest p eigenvectors. Looking back at the formula for S_L(f), we find that such a component selection keeps the p smoothest eigenvectors and ensures the smoothness of the learned predictions.
One can solve the least-squares problem as follows:

    α = (E^T E)^{-1} E^T y    (11)

where α = (α_1, α_2, ..., α_p)^T, y = (y_1, y_2, ..., y_l)^T, and

    E = [ v_1(1)  v_2(1)  ...  v_p(1)
          v_1(2)  v_2(2)  ...  v_p(2)
          ...
          v_1(l)  v_2(l)  ...  v_p(l) ]    (12)

The final classifier for unlabeled data in the dataset is

    f(x_i) = sgn( Σ_{j=1}^p α_j v_j(i) )    (13)

One concern is the difficulty of approximating the Laplace operator decomposition with the graph Laplacian decomposition, because the decomposition into eigenfunctions of Δ should be much more meaningful and generalizable than the decomposition into eigenvectors of L.
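The whole procedure of Eqs. (10)-(13) fits in a few lines. A sketch on a hypothetical two-cluster toy graph (the graph, p = 2, and the choice of labeled nodes are all illustrative):

```python
import numpy as np

# Sketch of the eigenvector-interpolation classifier (Eqs. 10-13): fit the
# labeled entries of the p smoothest Laplacian eigenvectors by least squares.
# Toy graph: two cliques of 10 nodes joined by a single edge (illustrative).
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1
A[10:, 10:] = 1
np.fill_diagonal(A, 0)
A[9, 10] = A[10, 9] = 1
L = np.diag(A.sum(axis=1)) - A

lam, V = np.linalg.eigh(L)
p = 2
E_p = V[:, :p]                       # p smoothest eigenvectors

labeled = [0, 19]                    # one labeled node per cluster
y = np.array([-1.0, +1.0])
E = E_p[labeled, :]                  # Eq. 12: eigenvector entries at labeled nodes
alpha, *_ = np.linalg.lstsq(E, y, rcond=None)   # Eq. 11

pred = np.sign(E_p @ alpha)          # Eq. 13: classify every node
assert (pred[:10] == -1).all() and (pred[10:] == +1).all()
```

With p = 2 the fit uses the constant vector plus the Fiedler vector, whose sign pattern separates the two cliques, so the two labels propagate to the whole graph.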
2.2 Tikhonov Regularization

The optimization problem of this method changes only a little from the one above: it explicitly adds the smoothness measure to the objective function. But the theoretical result of interest shows that the error bound is controlled by the Fiedler number of the graph, which means the generalization ability of graph-regularized learning is related to geometric invariants of the graph [2]. The setting here is a little different: labeled instances are randomly chosen with replacement, so each labeled instance may appear more than once, with the same or different labels. And we preprocess the labels like

    ỹ = (y_1 - ȳ, y_2 - ȳ, ..., y_l - ȳ),    ȳ = (1/l) Σ_{i=1}^l y_i    (14)

They learn the labels by solving the following optimization problem:

    f = arg min_{1^T f = 0} (1/l) Σ_{i=1}^l (ỹ_i - f_i)² + γ f^T S f    (15)

where f^T S f is the regularization term, and S = L can be replaced by S = L^p for p ∈ N or S = exp(tL) for t > 0. Let n_i be the multiplicity of x_i, and let y_ij be the j-th label of x_i, where j = 1, 2, ..., n_i. Then we can rewrite the objective as a quadratic program (dropping the constant Σ_j y_ij² and, for notational simplicity, absorbing the multiplicities so that each instance contributes once):

    arg min_{1^T f = 0} Σ_i ( n_i f_i² - 2 f_i Σ_{j=1}^{n_i} y_ij + Σ_{j=1}^{n_i} y_ij² ) + γ f^T S f
    = arg min_{1^T f = 0} ( f^T I f - 2 ỹ^T f ) + γ f^T S f
    = arg min_{1^T f = 0} f^T (I + γS) f - 2 ỹ^T f

Setting the gradient of the Lagrangian L(f, µ) to zero gives (I + γS) f = ỹ + (µ/2) 1; redefining µ/2 → µ, the optimal label vector is

    f = (I + γS)^{-1} (ỹ + µ 1)    (16)

where µ is selected so that 1^T f = 0:

    µ = - (1^T (I + γS)^{-1} ỹ) / (1^T (I + γS)^{-1} 1)    (17)

Defining the empirical error on the labeled training samples as

    R_l(f) = (1/l) Σ_{i=1}^l (f_i - ỹ_i)²    (18)

and the generalization error on all labeled and unlabeled samples as

    R(f) = E_{x~D, y} (f(x) - ỹ(x))²    (19)

we have the following performance bound [2], based on a stability analysis [3]: let γ be the regularization parameter, and T be a set of l > 4 vertices x_1, ..., x_l, where each vertex occurs no more than t times, together with values y_1, ..., y_l, |y_i| ≤ M. Let f be the regularized solution using the smoothness functional L, whose second smallest eigenvalue is λ_2. Assuming |f_i| ≤ K, the following bound holds with probability at least 1 - δ:

    |R_l(f) - R(f)| ≤ β + (β + (K + M)) √( 2 log(2/δ) / l )    (20)

where

    β = 3M√t / (γλ_2 l - √t)² + 4M / (γλ_2 l - √t)    (21)
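The estimator of Eqs. (15)-(17) with S = L has a direct closed form. A sketch on a hypothetical two-clique toy graph (the graph, γ, and the labels are illustrative):

```python
import numpy as np

# Sketch of the Tikhonov-regularized solution with S = L:
# f = (I + gamma*S)^(-1) (y_tilde + mu*1), mu chosen so that 1^T f = 0.
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1
A[10:, 10:] = 1
np.fill_diagonal(A, 0)
A[9, 10] = A[10, 9] = 1                # bridge between the two cliques
S = np.diag(A.sum(axis=1)) - A         # smoothness matrix S = L

gamma = 0.1
y_tilde = np.zeros(n)
y_tilde[[0, 19]] = [-1.0, +1.0]        # centered labels (mean already zero here)
ones = np.ones(n)

Minv = np.linalg.inv(np.eye(n) + gamma * S)
mu = -(ones @ Minv @ y_tilde) / (ones @ Minv @ ones)   # enforces 1^T f = 0 (Eq. 17)
f = Minv @ (y_tilde + mu * ones)                        # Eq. 16

assert np.isclose(ones @ f, 0.0)       # the constraint holds
assert f[0] < 0 < f[19]                # the two labeled nodes keep their signs
```

Unlike the interpolation method, no eigendecomposition is needed here; a single linear solve produces the smoothed label vector.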
Here the Fiedler number λ_2 shows up in the generalization error bound. λ_2 gives an estimate of the connectivity of the graph: if λ_2 is small, a small graph cut is possible, and if λ_2 is large, the graph is more connected. We can see from the bound that the larger λ_2 is, the better the smoothness regularization performs, because an overly sparse graph can hardly support the smoothness assumption; the prediction for a single instance is then influenced by only a very few neighbours, which may lead to overfitting and worse generalization performance.

One drawback of these two methods is that they can only predict the labels of one fixed set of unlabeled instances at a time. For a new set of unlabeled data, they have to recompute the minimization problem; that is, they learn only the best predictions f(x_i), not an explicit classifier f(x) for instances that do not appear in the dataset. To overcome this shortcoming, we now look into kernel methods and the relationship between the graph Laplacian and the kernel matrix.

3 Kernel Methods and Graph Regularization

3.1 Kernel function and kernel matrix

Kernel methods are a class of algorithms that includes the Support Vector Machine (SVM). The basic idea is to predict testing samples from training samples through pairwise similarity. The pairwise similarity matrix is the kernel matrix K, whose entry K_ij = K(x_i, x_j) gives the similarity between x_i and x_j; K(·,·) should be a kernel function over X × X, where X is the instance space. Formally, a function K: X × X → R is a kernel function if: (i) K is symmetric, i.e. K(x, y) = K(y, x); (ii) K is positive semi-definite, i.e. for any set {x_1, ..., x_n}, the kernel matrix K (with K_ij = K(x_i, x_j)) is positive semi-definite. For every valid kernel function K, Mercer's theorem guarantees that there is a function φ: X → V satisfying K(x, y) = ⟨φ(x), φ(y)⟩_V.
This means the pairwise similarity computed by the kernel function actually comes from an inner product in some (usually high-dimensional) feature space V, and φ is a kind of feature transformation. This feature-space formulation will help later.

3.2 Reproducing kernel Hilbert spaces

Let us look deeper into kernel methods and reproducing kernel Hilbert spaces, which are helpful for solving the semi-supervised learning problem with graph regularization. Let X be the instance space and K: X × X → R be a kernel function as described above. Consider functions f: X → R, from which the classifier should come, and let H_K be a Hilbert space of such functions with inner product ⟨·,·⟩_K. We say H_K is a reproducing kernel Hilbert space (RKHS) if H_K and ⟨·,·⟩_K satisfy the following two conditions: (i) under the mapping Φ: x → K(x, ·), for any x ∈ X we have K(x, ·) ∈ H_K; (ii) f(x) = ⟨f(·), K(x, ·)⟩_K for any f ∈ H_K and x ∈ X. The first condition maps each x to a function K(x, ·). The second condition is the reproducing property, which implies K(x, y) = ⟨K(x, ·), K(y, ·)⟩_K. Φ maps a space of instances into a space of functions, and makes the value of f at a point x the inner product of f and K(x, ·).

Regularization in an RKHS is usually formulated as the following minimization problem, which learns a function f ∈ H_K:

    f = arg min_{f ∈ H_K} E_γ(f) = arg min_{f ∈ H_K} (1/l) Σ_{i=1}^l L(f(x_i), y_i) + γ ⟨f, f⟩_K    (22)
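For the squared loss, the minimization of Eq. (22) reduces to kernel ridge regression, a convenient concrete case. A sketch (the RBF kernel, toy data, and γ are illustrative choices, not from the survey):

```python
import numpy as np

# Sketch: RKHS regularization (Eq. 22) under squared loss, i.e. kernel ridge
# regression. The minimizer is f(x) = sum_i c_i K(x_i, x) with
# c = (K + gamma*l*I)^(-1) y.
rng = np.random.default_rng(5)
l = 15
X = rng.normal(size=(l, 2))
y = np.sign(X[:, 0])                       # toy labels from the first coordinate

def k_rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

K = np.array([[k_rbf(xi, xj) for xj in X] for xi in X])
gamma = 1e-3
c = np.linalg.solve(K + gamma * l * np.eye(l), y)   # combination factors
assert np.allclose(K @ c + gamma * l * c, y)        # the linear system is satisfied

def f(x):
    # The learned classifier is defined for ANY x, not only dataset points.
    return sum(ci * k_rbf(xi, x) for ci, xi in zip(c, X))

score = f(np.array([0.5, -0.2]))                    # a previously unseen instance
```

Note that `f` scores instances outside the training set; this is exactly the inductive ability the earlier transductive methods lacked.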
where L is the loss function and γ > 0 is the regularization parameter. It is known that the minimizer has the following form [10]:

    f(x) = Σ_i c_i K(x_i, x),    x ∈ X    (23)

which is a linear combination of the reproducing mappings K(x_i, ·) of the training sample instances x_i. This form helps a lot in our learning problem because it reduces learning over all possible functions to learning a set of combination factors. Thus it becomes possible to generalize the previous methods by learning a classifier f(x) for all instances x in the space X, rather than just the labels of the unlabeled data in the training dataset.

3.3 Constructing an RKHS and Regularization

A specific setting is to let H_K consist of linear combinations of the following kind of function:

    f(·) = Σ_{i=1}^l α_i K(x_i, ·)    (24)

where l ∈ N, α_i ∈ R, K is a valid kernel function, and the x_i are labeled or unlabeled training samples. Let the inner product of H_K be defined as

    ⟨f(·), g(·)⟩_K = ⟨Σ_i α_i K(x_i, ·), Σ_j α'_j K(x_j, ·)⟩ = Σ_{i,j} α_i α'_j K(x_i, x_j)    (25)

Under this setting one can easily check that H_K is an RKHS with kernel K and inner product ⟨·,·⟩_K. People then proved the following equivalence [7][8]: given a valid kernel K on all the labeled and unlabeled instances, the problem of learning the labels of the unlabeled instances (focusing on labels),

    f = arg min_f Σ_{i=1}^l L(ỹ_i, f_i) + γ f^T K^{-1} f    (26)

is equivalent to the problem

    f = arg min_{f ∈ H_K} Σ_{i=1}^l L(ỹ_i, f(x_i)) + γ ⟨f, f⟩_K    (27)

Note that (i) the second problem is the same as an SVM in the feature-space formulation mentioned earlier, and can be solved efficiently in the dual space; (ii) the first problem optimizes a label vector while the second optimizes a function; (iii) the equivalence means f(x_i) = f_i for the unlabeled instances in the dataset. This equivalence means we can learn a classifier that predicts new instances without extra computational cost.

3.4 From the Graph Laplacian to a Kernel

Now the only question is how to obtain a valid kernel from the graph Laplacian matrix L that we know. Assume we have the graph Laplacian matrix L over n instances, of which l are labeled training samples. Let us define a semi-norm on R^n with L:

    ⟨f, g⟩ := f^T L g,    f, g ∈ R^n    (28)

It is only a semi-norm because ⟨f, f⟩ = 0 whenever f is a constant vector. However, if we define H_0 to be the space of real-valued functions on the instances of the graph, and let H be the subspace of H_0 orthogonal to the constant vector (so no nonzero function in H takes the same value on all instances), then ⟨f, g⟩ is a well-defined inner product on H [6]. We can claim that the pseudoinverse L^+ is a valid kernel matrix, because we know the following equation:

    f^T L L^+ = f^T,    for all f ∈ H    (29)
which can be interpreted as follows: for f ∈ H,

    f_i = ⟨f, L^+_i⟩ = f^T L L^+_i    (30)

where L^+_i is the i-th column of L^+. This shows a reproducing property of the inner product defined with L on H; thus L^+ is a valid reproducing kernel on H. If the graph Laplacian matrix is invertible, we simply use L^{-1} rather than the pseudoinverse L^+. We can then use K = L^+ in the minimization problem of the last subsection. This is a simple justification for designing kernels from the graph Laplacian matrix. Furthermore, the kernel matrix has the following spectral form:

    K = Σ_{i: λ_i > 0} (1/λ_i) v_i v_i^T    (31)

where the v_i are eigenvectors of L.

4 Spectral Transforms for Graph Kernels

4.1 Parametric Spectral Transforms

We can further improve the kernel for better performance; the basic idea is to encourage the spectral components with small eigenvalues and penalize those with large eigenvalues. Here we add a function r(·) defined on the eigenvalues and perform a parametric spectral transform, obtaining the following form of kernel K:

    K = Σ_i (1/r(λ_i)) v_i v_i^T    (32)

As shown in [4], many kernels can be designed under this spectral graph kernel framework by transforming the Laplacian spectrum:

    Regularized Laplacian kernel:  K = (I + σ²L)^{-1},         r(λ) = 1 + σ²λ    (33)
    Diffusion process kernel:      K = exp(-σ²L/2),            r(λ) = exp(σ²λ/2)    (34)
    p-step random walk kernel:     K = (αI - L)^p, α ≥ 2,      r(λ) = (α - λ)^{-p}    (35)
    Inverse cosine kernel:         K = cos(Lπ/4),              r(λ) = (cos(λπ/4))^{-1}    (36)

4.2 Nonparametric Spectral Transforms

Many methods use the framework above to design kernels from the graph Laplacian matrix L. But the spectral transformation itself does not address the question of which parametric family to use for r. The number of degrees of freedom of r (usually one or two) might be too few for modelling the learning problem, leaving the kernel overly constrained. People have therefore tried to optimize a nonparametric spectral transformation that imposes only an ordering constraint (see the last constraint below) [5].
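Both constructions above admit a compact numerical check: that L^+ reproduces centered functions (Eqs. 29-31), and that the parametric transforms of Eq. (32) recover the closed matrix forms of Eqs. (33) and (35). The small cycle graph and the hyperparameters σ², α, p below are illustrative; α is taken larger than the top eigenvalue so that (αI - L) is positive definite on this particular graph:

```python
import numpy as np

# Toy graph: an 8-node cycle (illustrative).
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)            # eigenvalues ascending, lam[0] ~ 0

# --- L^+ as a reproducing kernel on centered functions (Eqs. 29-31) ---
K = np.linalg.pinv(L)
rng = np.random.default_rng(6)
f = rng.normal(size=n)
f -= f.mean()                          # project onto H: no constant component
assert np.allclose(f @ L @ K, f)       # reproducing property, Eq. 30

K_spec = sum(np.outer(V[:, i], V[:, i]) / lam[i] for i in range(n) if lam[i] > 1e-10)
assert np.allclose(K, K_spec)          # spectral form, Eq. 31

# --- parametric spectral transforms (Eqs. 32-35) ---
def kernel_from_r(r):
    # Eq. 32: K = sum_i (1 / r(lambda_i)) v_i v_i^T
    return (V * (1.0 / r(lam))) @ V.T

sigma2, alpha, p = 0.5, 5.0, 2         # alpha above the top eigenvalue (here 4)

K_reg = kernel_from_r(lambda lm: 1 + sigma2 * lm)           # regularized Laplacian
K_dif = kernel_from_r(lambda lm: np.exp(sigma2 / 2 * lm))   # diffusion kernel
K_rw = kernel_from_r(lambda lm: (alpha - lm) ** -p)         # p-step random walk

# Closed matrix forms agree with the spectral construction:
assert np.allclose(K_reg, np.linalg.inv(np.eye(n) + sigma2 * L))
assert np.allclose(K_rw, np.linalg.matrix_power(alpha * np.eye(n) - L, p))
```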
    max_µ  Â(K_tr, T) = ⟨K_tr, T⟩_F / √( ⟨K_tr, K_tr⟩_F ⟨T, T⟩_F )
    subject to  K = Σ_{i=1}^n µ_i K_i,    K_i = v_i v_i^T
                µ_i ≥ 0
                trace(K) = 1
                µ_i ≥ µ_{i+1},  i = 2, ..., n - 1    (37)
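For a fixed coefficient vector µ, the alignment objective above can be evaluated directly. The sketch below does only that, for one hand-picked feasible µ; it is not the full QCQP solver, and the toy graph, labels, and µ are illustrative:

```python
import numpy as np

# Sketch: evaluating the empirical alignment objective of Eq. 37 for a fixed
# spectral-coefficient vector mu on a toy 8-node cycle graph.
n, l = 8, 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

K_i = [np.outer(V[:, i], V[:, i]) for i in range(n)]    # base spectral kernels
mu = np.array([0.0, 4, 3, 2, 1, 0.5, 0.25, 0.1])         # nonincreasing from i = 2 on
K = sum(m * Ki for m, Ki in zip(mu, K_i))
K /= np.trace(K)                                         # trace(K) = 1

y = np.array([+1.0, +1, -1, -1])                         # labels of the l labeled nodes
T = np.outer(y, y)
K_tr = K[:l, :l]                                         # kernel restricted to labeled part

def frob(M, N):
    return np.sum(M * N)                                 # Frobenius product <M, N>_F

alignment = frob(K_tr, T) / np.sqrt(frob(K_tr, K_tr) * frob(T, T))
assert -1.0 <= alignment <= 1.0                          # Cauchy-Schwarz bound
```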
The objective function is called the empirical kernel alignment, which evaluates the fitness of a kernel K to the labels of the labeled training samples; its good properties are shown in [9]. ⟨·,·⟩_F is the Frobenius product, defined as ⟨M, N⟩_F = Σ_{i,j} M_ij N_ij. K_tr in the objective is the kernel matrix K restricted to the first l labeled training samples, and T is the outer product of y, that is, T = y y^T. Finally, µ = (µ_1, µ_2, ..., µ_n)^T is the vector of combination factors of the kernel's spectrum.

Such an ordering constraint is reasonable. As mentioned before, considering the smoothness measure S_G, the smaller the eigenvalue λ_i, the smoother its corresponding eigenvector v_i over the graph G. The ordering constraint maintains a decreasing order of significance in the transformed spectrum of the kernel, which means it encourages smooth functions in the end. Note that we do not constrain the constant eigenvector v_1 with zero eigenvalue λ_1 (assuming a connected graph G), which makes sense since it is simply a bias term. The constraint trace(K) = 1 fixes the scale invariance of the kernel alignment.

The optimization problem above has an equivalent form:

    max_µ  ⟨K_tr, T⟩_F
    subject to  ⟨K_tr, K_tr⟩_F ≤ 1
                µ_i ≥ 0
                µ_i ≥ µ_{i+1},  i = 2, ..., n - 1    (38)

Under this subtle configuration, the problem can be further transformed into an equivalent quadratically constrained quadratic program (QCQP), for which feasible and fast optimization methods exist:

    max_µ  vec(T)^T M µ
    subject to  ‖M µ‖ ≤ 1
                µ_i ≥ 0
                µ_i ≥ µ_{i+1},  i = 2, ..., n - 1    (39)

References

[1] Belkin, M., & Niyogi, P. (2002). Semi-supervised learning on manifolds. Advances in Neural Information Processing Systems (NIPS).
[2] Belkin, M., Matveeva, I., & Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Learning Theory. Springer Berlin Heidelberg.
[3] Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2.
[4] Smola, A. J., & Kondor, R. (2003). Kernels and regularization on graphs. In Learning Theory and Kernel Machines. Springer Berlin Heidelberg.
[5] Zhu, X., Kandola, J. S., Ghahramani, Z., & Lafferty, J. D. (2004). Nonparametric transforms of graph kernels for semi-supervised learning. Advances in Neural Information Processing Systems (NIPS), Vol. 17.
[6] Argyriou, A., Herbster, M., & Pontil, M. (2005). Combining graph Laplacians for semi-supervised learning. Advances in Neural Information Processing Systems (NIPS).
[7] Johnson, R., & Zhang, T. (2008). Graph-based semi-supervised learning and spectral kernel design. IEEE Transactions on Information Theory, 54(1).
[8] Zhang, T., & Ando, R. (2006). Analysis of spectral kernel design based semi-supervised learning. Advances in Neural Information Processing Systems, 18.
[9] Cristianini, N., Shawe-Taylor, J., Elisseeff, A., & Kandola, J. (2001). On kernel-target alignment. Advances in Neural Information Processing Systems, 14.
[10] Herbster, M., Pontil, M., & Wainer, L. (2005). Online learning over graphs. Proceedings of the 22nd International Conference on Machine Learning (ICML).
More informationLocality Preserving Projections
Locality Preserving Projections Xiaofei He Department of Computer Science The University of Chicago Chicago, IL 60637 xiaofei@cs.uchicago.edu Partha Niyogi Department of Computer Science The University
More informationA GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES. Wei Chu, S. Sathiya Keerthi, Chong Jin Ong
A GENERAL FORMULATION FOR SUPPORT VECTOR MACHINES Wei Chu, S. Sathiya Keerthi, Chong Jin Ong Control Division, Department of Mechanical Engineering, National University of Singapore 0 Kent Ridge Crescent,
More informationLeast-Squares Spectral Analysis Theory Summary
Least-Squares Spectral Analysis Theory Summary Reerence: Mtamakaya, J. D. (2012). Assessment o Atmospheric Pressure Loading on the International GNSS REPRO1 Solutions Periodic Signatures. Ph.D. dissertation,
More information2.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics. References - geostatistics. References geostatistics (cntd.
.6 Two-dimensional continuous interpolation 3: Kriging - introduction to geostatistics Spline interpolation was originally developed or image processing. In GIS, it is mainly used in visualization o spatial
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationFinding Consensus in Multi-Agent Networks Using Heat Kernel Pagerank
Finding Consensus in Multi-Agent Networs Using Heat Kernel Pageran Olivia Simpson and Fan Chung 2 Abstract We present a new and eicient algorithm or determining a consensus value or a networ o agents.
More informationImprovement of Sparse Computation Application in Power System Short Circuit Study
Volume 44, Number 1, 2003 3 Improvement o Sparse Computation Application in Power System Short Circuit Study A. MEGA *, M. BELKACEMI * and J.M. KAUFFMANN ** * Research Laboratory LEB, L2ES Department o
More informationLearning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from
More informationSemi-Supervised Learning
Semi-Supervised Learning getting more for less in natural language processing and beyond Xiaojin (Jerry) Zhu School of Computer Science Carnegie Mellon University 1 Semi-supervised Learning many human
More informationOutline. Motivation. Mapping the input space to the feature space Calculating the dot product in the feature space
to The The A s s in to Fabio A. González Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogotá April 2, 2009 to The The A s s in 1 Motivation Outline 2 The Mapping the
More informationImproving Semi-Supervised Target Alignment via Label-Aware Base Kernels
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Improving Semi-Supervised Target Alignment via Label-Aware Base Kernels Qiaojun Wang, Kai Zhang 2, Guofei Jiang 2, and Ivan Marsic
More informationDifferentiation. The main problem of differential calculus deals with finding the slope of the tangent line at a point on a curve.
Dierentiation The main problem o dierential calculus deals with inding the slope o the tangent line at a point on a curve. deinition() : The slope o a curve at a point p is the slope, i it eists, o the
More informationNumerical Methods - Lecture 2. Numerical Methods. Lecture 2. Analysis of errors in numerical methods
Numerical Methods - Lecture 1 Numerical Methods Lecture. Analysis o errors in numerical methods Numerical Methods - Lecture Why represent numbers in loating point ormat? Eample 1. How a number 56.78 can
More informationLecture : Feedback Linearization
ecture : Feedbac inearization Niola Misovic, dipl ing and Pro Zoran Vuic June 29 Summary: This document ollows the lectures on eedbac linearization tought at the University o Zagreb, Faculty o Electrical
More informationGraph-Based Semi-Supervised Learning
Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université de Montréal CIAR Workshop - April 26th, 2005 Graph-Based Semi-Supervised Learning Yoshua Bengio, Olivier
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationCurve Sketching. The process of curve sketching can be performed in the following steps:
Curve Sketching So ar you have learned how to ind st and nd derivatives o unctions and use these derivatives to determine where a unction is:. Increasing/decreasing. Relative extrema 3. Concavity 4. Points
More informationFinite Dimensional Hilbert Spaces are Complete for Dagger Compact Closed Categories (Extended Abstract)
Electronic Notes in Theoretical Computer Science 270 (1) (2011) 113 119 www.elsevier.com/locate/entcs Finite Dimensional Hilbert Spaces are Complete or Dagger Compact Closed Categories (Extended bstract)
More informationData dependent operators for the spatial-spectral fusion problem
Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.
More informationOn a Closed Formula for the Derivatives of e f(x) and Related Financial Applications
International Mathematical Forum, 4, 9, no. 9, 41-47 On a Closed Formula or the Derivatives o e x) and Related Financial Applications Konstantinos Draais 1 UCD CASL, University College Dublin, Ireland
More informationSupport Vector Machines
Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 11, 2009 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationSpectral Feature Selection for Supervised and Unsupervised Learning
Spectral Feature Selection for Supervised and Unsupervised Learning Zheng Zhao Huan Liu Department of Computer Science and Engineering, Arizona State University zhaozheng@asu.edu huan.liu@asu.edu Abstract
More informationThe intrinsic structure of FFT and a synopsis of SFT
Global Journal o Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 2 (2016), pp. 1671-1676 Research India Publications http://www.ripublication.com/gjpam.htm The intrinsic structure o FFT
More informationSemi-Supervised Learning in Gigantic Image Collections. Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT)
Semi-Supervised Learning in Gigantic Image Collections Rob Fergus (New York University) Yair Weiss (Hebrew University) Antonio Torralba (MIT) Gigantic Image Collections What does the world look like? High
More informationSIO 211B, Rudnick. We start with a definition of the Fourier transform! ĝ f of a time series! ( )
SIO B, Rudnick! XVIII.Wavelets The goal o a wavelet transorm is a description o a time series that is both requency and time selective. The wavelet transorm can be contrasted with the well-known and very
More informationRegularization on Discrete Spaces
Regularization on Discrete Spaces Dengyong Zhou and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076 Tuebingen, Germany {dengyong.zhou, bernhard.schoelkopf}@tuebingen.mpg.de
More informationRandom Walks, Random Fields, and Graph Kernels
Random Walks, Random Fields, and Graph Kernels John Lafferty School of Computer Science Carnegie Mellon University Based on work with Avrim Blum, Zoubin Ghahramani, Risi Kondor Mugizi Rwebangira, Jerry
More information8.4 Inverse Functions
Section 8. Inverse Functions 803 8. Inverse Functions As we saw in the last section, in order to solve application problems involving eponential unctions, we will need to be able to solve eponential equations
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationThe concept of limit
Roberto s Notes on Dierential Calculus Chapter 1: Limits and continuity Section 1 The concept o limit What you need to know already: All basic concepts about unctions. What you can learn here: What limits
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationDETC A GENERALIZED MAX-MIN SAMPLE FOR RELIABILITY ASSESSMENT WITH DEPENDENT VARIABLES
Proceedings o the ASME International Design Engineering Technical Conerences & Computers and Inormation in Engineering Conerence IDETC/CIE August 7-,, Bualo, USA DETC- A GENERALIZED MAX-MIN SAMPLE FOR
More informationCombining Classifiers and Learning Mixture-of-Experts
38 Combining Classiiers and Learning Mixture-o-Experts Lei Xu Chinese University o Hong Kong Hong Kong & Peing University China Shun-ichi Amari Brain Science Institute Japan INTRODUCTION Expert combination
More informationITERATED SRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION
ITERATED SRINKAGE ALGORITHM FOR BASIS PURSUIT MINIMIZATION Michael Elad The Computer Science Department The Technion Israel Institute o technology Haia 3000, Israel * SIAM Conerence on Imaging Science
More informationThe Space of Negative Scalar Curvature Metrics
The Space o Negative Scalar Curvature Metrics Kyler Siegel November 12, 2012 1 Introduction Review: So ar we ve been considering the question o which maniolds admit positive scalar curvature (p.s.c.) metrics.
More informationSTAT 801: Mathematical Statistics. Hypothesis Testing
STAT 801: Mathematical Statistics Hypothesis Testing Hypothesis testing: a statistical problem where you must choose, on the basis o data X, between two alternatives. We ormalize this as the problem o
More informationProbabilistic Model of Error in Fixed-Point Arithmetic Gaussian Pyramid
Probabilistic Model o Error in Fixed-Point Arithmetic Gaussian Pyramid Antoine Méler John A. Ruiz-Hernandez James L. Crowley INRIA Grenoble - Rhône-Alpes 655 avenue de l Europe 38 334 Saint Ismier Cedex
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationRobust Laplacian Eigenmaps Using Global Information
Manifold Learning and its Applications: Papers from the AAAI Fall Symposium (FS-9-) Robust Laplacian Eigenmaps Using Global Information Shounak Roychowdhury ECE University of Texas at Austin, Austin, TX
More information1 Relative degree and local normal forms
THE ZERO DYNAMICS OF A NONLINEAR SYSTEM 1 Relative degree and local normal orms The purpose o this Section is to show how single-input single-output nonlinear systems can be locally given, by means o a
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationDiffeomorphic Warping. Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel)
Diffeomorphic Warping Ben Recht August 17, 2006 Joint work with Ali Rahimi (Intel) What Manifold Learning Isn t Common features of Manifold Learning Algorithms: 1-1 charting Dense sampling Geometric Assumptions
More informationBasis Expansion and Nonlinear SVM. Kai Yu
Basis Expansion and Nonlinear SVM Kai Yu Linear Classifiers f(x) =w > x + b z(x) = sign(f(x)) Help to learn more general cases, e.g., nonlinear models 8/7/12 2 Nonlinear Classifiers via Basis Expansion
More informationEvent-triggered Model Predictive Control with Machine Learning for Compensation of Model Uncertainties
Event-triggered Model Predictive Control with Machine Learning or Compensation o Model Uncertainties Jaehyun Yoo, Adam Molin, Matin Jaarian, Hasan Esen, Dimos V. Dimarogonas, and Karl H. Johansson Abstract
More informationA Rough Sets Based Classifier for Induction Motors Fault Diagnosis
A ough Sets Based Classiier or Induction Motors Fault Diagnosis E. L. BONALDI, L. E. BOGES DA SILVA, G. LAMBET-TOES AND L. E. L. OLIVEIA Electronic Engineering Department Universidade Federal de Itajubá
More informationProduct Matrix MSR Codes with Bandwidth Adaptive Exact Repair
1 Product Matrix MSR Codes with Bandwidth Adaptive Exact Repair Kaveh Mahdaviani, Soheil Mohajer, and Ashish Khisti ECE Dept, University o Toronto, Toronto, ON M5S3G4, Canada ECE Dept, University o Minnesota,
More informationA Comparative Study of Non-separable Wavelet and Tensor-product. Wavelet; Image Compression
Copyright c 007 Tech Science Press CMES, vol., no., pp.91-96, 007 A Comparative Study o Non-separable Wavelet and Tensor-product Wavelet in Image Compression Jun Zhang 1 Abstract: The most commonly used
More informationKernel methods for comparing distributions, measuring dependence
Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations
More informationGraphs in Machine Learning
Graphs in Machine Learning Michal Valko Inria Lille - Nord Europe, France TA: Pierre Perrault Partially based on material by: Mikhail Belkin, Jerry Zhu, Olivier Chapelle, Branislav Kveton October 30, 2017
More informationThe construction of two dimensional Hilbert Huang transform and its
The construction o two dimensional Hilbert Huang transorm and its application in image analysis 1 Lihong Qiao, Sisi Chen *1, Henan univ. o technology,iaolihong@16.com, XingTai University, chssxt@163.com
More informationSpectral Clustering. by HU Pili. June 16, 2013
Spectral Clustering by HU Pili June 16, 2013 Outline Clustering Problem Spectral Clustering Demo Preliminaries Clustering: K-means Algorithm Dimensionality Reduction: PCA, KPCA. Spectral Clustering Framework
More informationIn many diverse fields physical data is collected or analysed as Fourier components.
1. Fourier Methods In many diverse ields physical data is collected or analysed as Fourier components. In this section we briely discuss the mathematics o Fourier series and Fourier transorms. 1. Fourier
More informationTLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall TLT-5200/5206 COMMUNICATION THEORY, Exercise 3, Fall Problem 1.
TLT-5/56 COMMUNICATION THEORY, Exercise 3, Fall Problem. The "random walk" was modelled as a random sequence [ n] where W[i] are binary i.i.d. random variables with P[W[i] = s] = p (orward step with probability
More information