Least-Squares Independence Test


IEICE Transactions on Information and Systems, vol.E94-D, no.6, pp.–.

Least-Squares Independence Test

Masashi Sugiyama
Tokyo Institute of Technology, Japan
PRESTO, Japan Science and Technology Agency, Tokyo, Japan

Taiji Suzuki
The University of Tokyo, Tokyo, Japan

Abstract

Identifying the statistical independence of random variables is one of the important tasks in statistical data analysis. In this paper, we propose a novel non-parametric independence test based on a least-squares density-ratio estimator. Our method, called the least-squares independence test (LSIT), is distribution-free, and thus it is more flexible than parametric approaches. Furthermore, it is equipped with a model selection procedure based on cross-validation. This is a significant advantage over existing non-parametric approaches, which often require manual parameter tuning. The usefulness of the proposed method is shown through numerical experiments.

Keywords: independence test, density ratio estimation, unconstrained least-squares importance fitting, squared-loss mutual information.

1 Introduction

Identifying the statistical independence of random variables is one of the fundamental tasks in statistical data analysis. Independence tests can be used for various purposes such as feature selection [9] and causal inference [10]. A traditional independence measure is the Pearson correlation coefficient, which can detect linear dependency. It is therefore useful for Gaussian data, although the Gaussian assumption is rarely fulfilled in practice.

Recently, kernel-based independence measures have been studied in order to overcome the weakness of the Pearson correlation coefficient. The Hilbert-Schmidt independence criterion (HSIC) [4] utilizes cross-covariance operators on universal reproducing kernel Hilbert spaces (RKHSs) [7], which are an infinite-dimensional generalization of covariance matrices. HSIC allows efficient detection of non-linear dependency by making use of the reproducing property of RKHSs [1]. However, HSIC has a critical weakness: its performance depends on the choice of the RKHS, and there is no theoretically justified way to determine the RKHS properly. In practice, using the Gaussian RKHS with width set to the median distance between samples is a popular heuristic [4].

Another popular independence criterion is mutual information [2]. In this paper, we consider a squared-loss variant of mutual information and use its estimator, least-squares mutual information (LSMI) [9], for independence testing. LSMI is distribution-free like HSIC, but it is equipped with a natural model selection procedure based on cross-validation, which is an advantage over HSIC. Through experiments, we show the usefulness of the LSMI-based independence test, which we call the least-squares independence test (LSIT).

2 Least-Squares Independence Test

In this section, we propose a novel non-parametric independence test.

2.1 Formulation

Let x (∈ X ⊂ R^{d_x}) be an input feature and y (∈ Y ⊂ R^{d_y}) be an output feature, which follow a joint probability distribution with density p(x, y). Suppose we are given a set of i.i.d. paired samples {(x_i, y_i)}_{i=1}^n. Our goal is to test whether x and y are statistically independent or not, based on the samples {(x_i, y_i)}_{i=1}^n.

For the independence test, we employ the squared-loss mutual information (SMI) defined as follows:

  SMI := (1/2) ∫∫ ( p(x, y) / (p(x) p(y)) − 1 )² p(x) p(y) dx dy,   (1)

where p(x) and p(y) are the marginal densities of x and y, respectively. SMI is the Pearson divergence [6] from the joint density p(x, y) to the product of marginals p(x)p(y), and SMI is zero if and only if x and y are statistically independent. Hence, SMI can be used for detecting the statistical independence of random variables.

SMI includes the unknown probability densities p(x, y), p(x), and p(y), and thus it cannot be computed directly. A naive approach is to estimate the densities p(x, y), p(x), and p(y) and plug the estimated densities into Eq.(1). However, since density estimation is known to be a hard task and division by estimated densities can magnify the estimation error, we consider directly estimating the following density-ratio function:

  r(x, y) := p(x, y) / (p(x) p(y)).   (2)
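As an illustration of Eq.(1) (this small NumPy sketch is ours, not part of the paper), SMI can be evaluated in closed form for a discrete joint distribution, where the integral reduces to a sum over a probability table; the value is zero exactly when the joint equals the product of its marginals.

```python
import numpy as np

def smi_discrete(p_xy):
    """Squared-loss mutual information, Eq.(1), for a discrete joint table p_xy."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    ratio = p_xy / (p_x * p_y)              # density ratio r(x, y), Eq.(2)
    return 0.5 * np.sum(p_x * p_y * (ratio - 1.0) ** 2)

# Independent joint (outer product of marginals): SMI = 0.
print(smi_discrete(np.outer([0.3, 0.7], [0.4, 0.6])))
# Dependent joint: SMI > 0.
print(smi_discrete(np.array([[0.4, 0.1],
                             [0.1, 0.4]])))
```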

Once an estimator r̂(x, y) of the density ratio r(x, y) is obtained, SMI can be approximated using the samples as

  ŜMI := (1/n) Σ_{i=1}^n r̂(x_i, y_i) − (1/(2n²)) Σ_{i,j=1}^n r̂(x_i, y_j)² − 1/2.   (3)

2.2 Least-Squares Mutual Information

Here we explain the least-squares mutual information (LSMI) method [9], which directly learns r(x, y) from data samples without going through density estimation of p(x, y), p(x), and p(y). Let us approximate the density ratio (2) using the following model:

  r̂(x, y) = Σ_{l=1}^b α_l ψ_l(x, y) = α^T ψ(x, y),

where ψ(x, y) : R^{d_x} × R^{d_y} → R^b is a non-negative basis function vector, α (∈ R^b) is a parameter vector, and ^T denotes the transpose. We determine the parameter α so that the following squared error J_0 is minimized:

  J_0(α) := (1/2) ∫∫ ( r̂(x, y) − r(x, y) )² p(x) p(y) dx dy
          = (1/2) ∫∫ r̂(x, y)² p(x) p(y) dx dy − ∫∫ r̂(x, y) p(x, y) dx dy + Const.

Let us denote the first two terms by J. Since J contains expectations over the unknown densities p(x, y), p(x), and p(y), we approximate the expectations by empirical averages. By including an ℓ₂-regularizer, the LSMI optimization problem is formulated as

  α̂ := argmin_{α ∈ R^b} [ (1/2) α^T Ĥ α − α^T ĥ + (λ/2) α^T α ],   (4)

where λ (≥ 0) is the regularization parameter that controls the strength of regularization, and

  Ĥ := (1/n²) Σ_{i,j=1}^n ψ(x_i, y_j) ψ(x_i, y_j)^T,   ĥ := (1/n) Σ_{i=1}^n ψ(x_i, y_i).

The solution α̂ can be obtained analytically as

  α̂ = (Ĥ + λ I_b)^{-1} ĥ,   (5)

where I_b is the b-dimensional identity matrix.
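For concreteness, here is a minimal NumPy sketch of the estimator described by Eqs.(3)-(5); it is an illustration rather than the released MATLAB code, and the choice of kernel centres, the default σ, λ, and the regression-type Gaussian basis are assumptions, not the paper's exact setup.

```python
import numpy as np

def gauss_kernel(a, centers, sigma):
    """Gaussian kernel matrix K[i, l] = exp(-||a_i - c_l||^2 / (2 sigma^2)); inputs are (n, d) arrays."""
    d2 = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lsmi(x, y, sigma=1.0, lam=1e-3, b=100):
    """SMI estimate via least-squares density-ratio fitting, following Eqs.(3)-(5)."""
    n = x.shape[0]
    b = min(b, n)
    # Basis psi_l(x, y) = K(x, x_l) * K(y, y_l), with centres at the first b paired samples.
    Kx = gauss_kernel(x, x[:b], sigma)           # (n, b)
    Ky = gauss_kernel(y, y[:b], sigma)           # (n, b)
    h = (Kx * Ky).mean(axis=0)                   # h_l = (1/n) sum_i psi_l(x_i, y_i)
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2       # H_{l,l'} = (1/n^2) sum_{i,j} psi_l psi_l' at (x_i, y_j)
    alpha = np.linalg.solve(H + lam * np.eye(b), h)   # analytic solution, Eq.(5)
    # Plug the fitted ratio r_hat = alpha' psi into the SMI approximation, Eq.(3).
    r_paired = (Kx * Ky) @ alpha                 # r_hat(x_i, y_i)
    return r_paired.mean() - 0.5 * alpha @ H @ alpha - 0.5
```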

Finally, the density-ratio estimator r̂(x, y) is given by

  r̂(x, y) := α̂^T ψ(x, y).

Once the density-ratio estimator r̂(x, y) is obtained, SMI can be approximated by Eq.(3). Thanks to the analytic-form expression, LSMI is computationally very efficient. Furthermore, the above least-squares density-ratio estimator was shown to possess the optimal non-parametric convergence rate and optimal numerical stability [8, 5].

2.3 Basis Function Choice and Model Selection

Given that both {x_i}_{i=1}^n and {y_i}_{i=1}^n are normalized so that every element of x and y has unit variance, we use the following basis functions:

  ψ_l(x, y) = exp( −‖x − x_l‖² / (2σ²) ) exp( −‖y − y_l‖² / (2σ²) )   for regression,
  ψ_l(x, y) = exp( −‖x − x_l‖² / (2σ²) ) δ(y = y_l)                    for classification,

where δ(c) = 1 if the condition c is true and δ(c) = 0 otherwise. We may also include a constant basis function ϕ_0(x, y) = 1 in addition to the above kernel basis functions.

The practical performance of LSMI depends on the choice of the kernel width σ and the regularization parameter λ. Model selection of LSMI is possible based on cross-validation with respect to the criterion J. More specifically, the sample set Z = {(x_i, y_i)}_{i=1}^n is divided into M disjoint subsets {Z_m}_{m=1}^M. Then an LSMI solution r̂_m(x, y) is obtained using Z \ Z_m (i.e., all samples except Z_m), and its J-score for the hold-out samples Z_m is computed as

  Ĵ_m^CV := (1 / (2|Z_m|²)) Σ_{x ∈ Z_m} Σ_{y ∈ Z_m} r̂_m(x, y)² − (1 / |Z_m|) Σ_{(x, y) ∈ Z_m} r̂_m(x, y),

where |Z| denotes the number of elements in the set Z. This procedure is repeated for m = 1, ..., M, and the average score

  Ĵ^CV := (1/M) Σ_{m=1}^M Ĵ_m^CV

is computed. Finally, the model (the kernel width σ and the regularization parameter λ in the current setup) that minimizes Ĵ^CV is chosen as the most suitable one.
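The cross-validation procedure above can be sketched as follows (our illustration, not the released implementation; it reuses gauss_kernel from the previous sketch, and the candidate grids for σ and λ as well as the choice of kernel centres are left to the user):

```python
import numpy as np

def fit_alpha(x_tr, y_tr, cx, cy, sigma, lam):
    """Fit LSMI coefficients on training data, for fixed Gaussian basis centres (cx, cy)."""
    n, b = x_tr.shape[0], cx.shape[0]
    Kx, Ky = gauss_kernel(x_tr, cx, sigma), gauss_kernel(y_tr, cy, sigma)
    h = (Kx * Ky).mean(axis=0)
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2
    return np.linalg.solve(H + lam * np.eye(b), h)

def j_score(alpha, x_te, y_te, cx, cy, sigma):
    """Hold-out J-score: (1/(2 m^2)) sum_{x,y in Z_m} r_hat^2 - (1/m) sum_{(x,y) in Z_m} r_hat."""
    m = x_te.shape[0]
    Kx, Ky = gauss_kernel(x_te, cx, sigma), gauss_kernel(y_te, cy, sigma)
    H_te = (Kx.T @ Kx) * (Ky.T @ Ky) / m ** 2   # all (x, y) pairs within the hold-out fold
    r_diag = (Kx * Ky) @ alpha                  # paired hold-out samples only
    return 0.5 * alpha @ H_te @ alpha - r_diag.mean()

def select_model(x, y, sigmas, lams, M=5, b=100):
    """Choose (sigma, lambda) by M-fold cross-validation on the J-score."""
    n = x.shape[0]
    cx, cy = x[:min(b, n)], y[:min(b, n)]       # kernel centres fixed in advance
    folds = np.array_split(np.random.permutation(n), M)
    best, best_score = None, np.inf
    for sigma in sigmas:
        for lam in lams:
            score = 0.0
            for te in folds:
                tr = np.setdiff1d(np.arange(n), te)
                alpha = fit_alpha(x[tr], y[tr], cx, cy, sigma, lam)
                score += j_score(alpha, x[te], y[te], cx, cy, sigma) / M
            if score < best_score:
                best, best_score = (sigma, lam), score
    return best
```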

2.4 Permutation Test

Our independence test procedure is based on the permutation test [3]. We first run LSMI using the original dataset Z = {(x_i, y_i)}_{i=1}^n and obtain an SMI estimate ŜMI(Z). Next, we randomly permute {y_i}_{i=1}^n and form a shuffled dataset Z̃ = {(x_i, y_{τ(i)})}_{i=1}^n, where τ(·) is a randomly chosen permutation function. Then we run LSMI again using the randomly shuffled dataset Z̃ and obtain an SMI estimate ŜMI(Z̃). Note that the random permutation eliminates the dependency between x and y (if it exists), so ŜMI(Z̃) would take a value close to zero. This random permutation procedure is repeated many times, and the distribution of ŜMI(Z̃) under the null hypothesis (i.e., x and y are independent) is constructed. Finally, the p-value is approximated by evaluating the relative ranking of ŜMI(Z) in the distribution of ŜMI(Z̃). We refer to this procedure as the least-squares independence test (LSIT). A MATLAB® implementation of LSIT is available from sugi/software/LSIT/.
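A minimal sketch of this permutation test, assuming an SMI estimator such as the lsmi function sketched earlier (with σ and λ already chosen by cross-validation); the +1 correction in the p-value is a common convention rather than something stated in the paper:

```python
import numpy as np

def lsit_p_value(x, y, smi_estimator, n_perm=1000, rng=None):
    """Permutation-test p-value for the null hypothesis that x and y are independent.

    smi_estimator(x, y) should return an SMI estimate, e.g.
    lambda x, y: lsmi(x, y, sigma=best_sigma, lam=best_lam)."""
    rng = np.random.default_rng() if rng is None else rng
    smi_obs = smi_estimator(x, y)                    # SMI_hat(Z) on the original pairing
    null_stats = np.empty(n_perm)
    for k in range(n_perm):
        y_perm = y[rng.permutation(len(y))]          # break the dependence, keep the marginals
        null_stats[k] = smi_estimator(x, y_perm)     # SMI_hat(Z_tilde)
    # p-value: relative ranking of SMI_hat(Z) within the null distribution (+1 correction).
    return (1 + np.sum(null_stats >= smi_obs)) / (1 + n_perm)
```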

3 Experiments

In this section, we report experimental results.

3.1 Numerical Illustration

First, we illustrate how the proposed LSIT method behaves using the following toy datasets with one-dimensional input x and one-dimensional output y:

(A) Regression: For x ∼ U(·, ·), where U(a, b) denotes the uniform distribution on (a, b),
    y ∼ U(·, ·) (Independent),
    y ∼ N(sin(x/π), ·) (Dependent),
where N(μ, σ²) denotes the normal distribution with mean μ and variance σ².

(B) Classification: For y ∼ B(0.5), where B(p) denotes the binomial distribution on {−1, +1} with the probability of obtaining +1 being p,
    x ∼ 0.5 N(·, ·) + 0.5 N(·, ·) (Independent),
    x ∼ N(y, ·) (Dependent).

Examples of realized samples are plotted in Figure 1, where {x_i}_{i=1}^n and {y_i}_{i=1}^n are normalized to have unit variance. Figure 2 depicts the distributions of ŜMI(Z̃) and the value of ŜMI(Z). The graphs show that reasonable p-values were obtained for all four cases. Figure 3 depicts the p-values and the frequency of accepting the null hypothesis (i.e., x and y are independent) as functions of the sample size n. The graphs show that LSIT works reasonably well.

Figure 1: Toy datasets. (a) Regression/Independent, (b) Regression/Dependent, (c) Classification/Independent, (d) Classification/Dependent.

3.2 Performance Comparison

Here we compare the performance of LSMI and HSIC under the common permutation-test framework. HSIC is the state-of-the-art measure of statistical independence utilizing Gaussian kernels [4]. The performance of HSIC depends on the choice of the Gaussian width, and, to the best of our knowledge, there is no theoretically justified method for determining the kernel width. Here we use the standard heuristic of setting the Gaussian width to the median distance between samples, which was also adopted in the original paper [4].

We generate data samples by

  (x', y')^T = [cos θ, −sin θ; sin θ, cos θ] (x, y)^T,

where x ∼ 0.5 N(·, ·) + 0.5 N(·, ·) and y ∼ 0.5 N(·, ·) + 0.5 N(·, ·).
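A sketch of such a rotation-dataset generator follows; the Gaussian-mixture means and variances below are placeholder values, not the ones used in the paper:

```python
import numpy as np

def rotation_dataset(n, theta, rng=None):
    """Rotate an independent pair (x, y) by angle theta; theta = 0 keeps them independent.

    The mixture components below are placeholders, not the paper's parameters."""
    rng = np.random.default_rng() if rng is None else rng
    def mixture(size):
        comp = rng.integers(0, 2, size)                  # pick a component with probability 0.5
        return rng.normal(np.where(comp == 0, -1.0, 1.0), 1.0, size)
    x, y = mixture(n), mixture(n)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    xr, yr = R @ np.vstack([x, y])
    # Normalize each variable to unit variance, as in the experiments.
    return (xr / xr.std())[:, None], (yr / yr.std())[:, None]
```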

Figure 2: Distributions of ŜMI(Z̃) (randomly shuffled samples) for the toy datasets: (a) Regression/Independent, (b) Regression/Dependent, (c) Classification/Independent, (d) Classification/Dependent. The value of ŜMI(Z) (original samples) is also indicated.

Figure 3: Experimental results for the toy datasets (Reg/ind, Reg/dep, Cla/ind, Cla/dep). Left: mean and standard deviation of p-values over repeated runs, as functions of the sample size n. Right: frequency of accepting the null hypothesis over repeated runs under the significance level 0.05.

Thus, (x', y') is a rotation of (x, y) by angle θ. We conduct experiments for θ = 0 (i.e., x and y are independent) and θ = π/8 (i.e., x and y are dependent). Data samples {x_i}_{i=1}^n and {y_i}_{i=1}^n are normalized to have unit variance (see Figure 4). The results plotted in Figure 5 show that the proposed LSIT has a type-I error (rejecting correct null hypotheses) comparable to HSIC, with a far smaller type-II error (accepting incorrect null hypotheses).

Next, we vary the Gaussian width of HSIC and evaluate how the performance changes. In Figure 5, HSIC(c) denotes HSIC with the Gaussian width multiplied by c. The results show that, although the type-I error does not change much with respect to c, the type-II error is heavily affected by the choice of the Gaussian kernel width. For this dataset, the median heuristic does not work well, and c = 1/2 works the best. However, the optimally tuned HSIC is still outperformed by the proposed LSIT, which is automatically tuned based on cross-validation and does not involve manual parameter tuning.

Finally, we use the MNIST handwritten digit dataset for further performance evaluation. Each digit image (representing an integer in {0, 1, 2, ..., 9}) consists of 784 (= 28 × 28) pixels, each of which takes an integer value between 0 and 255 representing its intensity level in gray-scale. Here, we label the data as small for digits 0, 1, 2, 3, and 4, and large for digits 5, 6, 7, 8, and 9. We randomly choose a set of samples and randomly shuffle the labels of a (1 − η) fraction of them. Thus, increasing η from 0 to 1 corresponds to increasing the dependence between digit patterns and labels. The results are plotted in Figure 6, showing that the proposed LSIT has a slightly larger type-I error than HSIC (i.e., a lower acceptance rate when η = 0), but the type-II error of LSIT is slightly smaller than that of HSIC (i.e., a lower acceptance rate when η > 0). For this dataset, the optimally tuned HSIC (c = 1/2) slightly outperforms the automatically tuned LSIT.
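The label-shuffling mechanism used to control the dependence strength η can be sketched as follows (our illustration; the sample size and the exact selection procedure are assumptions, since the paper's specific counts are not given here):

```python
import numpy as np

def shuffle_labels(labels, eta, rng=None):
    """Weaken the image/label dependence: shuffle the labels of a (1 - eta) fraction of samples.

    eta = 0 shuffles every label (independence); eta = 1 keeps the original labels."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()
    n = len(labels)
    idx = rng.choice(n, size=int(round((1.0 - eta) * n)), replace=False)
    labels[idx] = labels[rng.permutation(idx)]   # permute labels among the selected samples
    return labels
```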

4 Discussions and Conclusions

We proposed a novel non-parametric method for independence testing based on an estimator of a squared-loss variant of mutual information called least-squares mutual information [9]. The proposed method, which we call the least-squares independence test (LSIT), can overcome a limitation of the state-of-the-art method, the Hilbert-Schmidt independence criterion (HSIC) [4], which is not equipped with a model selection procedure for the Gaussian kernel width. Through experiments, we confirmed that the proposed LSIT compares favorably with the HSIC-based independence test.

Acknowledgment

MS was supported by SCAT, AOARD, and the JST PRESTO program. TS was supported by a MEXT Grant-in-Aid for Young Scientists (B) 789.

Figure 4: Rotation datasets. x and y are independent if θ = 0, and they are dependent when θ = π/8.

Figure 5: Experimental results for the rotation datasets (θ = 0 and θ = π/8), comparing LSIT, HSIC, HSIC(1/5), HSIC(1/2), and HSIC(2). The frequency of accepting the null hypothesis over repeated runs under the significance level 0.05 is depicted as a function of the sample size n.

Figure 6: Experimental results for the MNIST dataset, comparing LSIT, HSIC, HSIC(1/5), HSIC(1/2), and HSIC(2). The frequency of accepting the null hypothesis over repeated runs under the significance level 0.05 is depicted as a function of η.

References

[1] N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society, vol.68, pp.337–404, 1950.

[2] T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd ed., John Wiley & Sons, Inc., 2006.

[3] B. Efron and R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.

[4] A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf, and A. Smola, A kernel statistical test of independence, in Advances in Neural Information Processing Systems, 2008.

[5] T. Kanamori, T. Suzuki, and M. Sugiyama, Condition number analysis of kernel-based density ratio estimation, technical report, arXiv, 2009.

[6] K. Pearson, On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine, vol.50, pp.157–175, 1900.

[7] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, Journal of Machine Learning Research, vol.2, pp.67–93, 2001.

[8] T. Suzuki and M. Sugiyama, Sufficient dimension reduction via squared-loss mutual information estimation, in International Conference on Artificial Intelligence and Statistics, 2010.

[9] T. Suzuki, M. Sugiyama, T. Kanamori, and J. Sese, Mutual information estimation reveals global associations between stimuli and biological processes, BMC Bioinformatics, vol.10, no.1, p.S52, 2009.

[10] M. Yamada and M. Sugiyama, Dependence minimizing regression with model selection for non-linear causal inference under non-Gaussian noise, in AAAI Conference on Artificial Intelligence, 2010.
