Bayesian Games for Adversarial Regression Problems (Online Appendix)


Michael Großhans, Christoph Sawade, Michael Brückner, Tobias Scheffer
University of Potsdam, Department of Computer Science, August-Bebel-Straße 89, 14482 Potsdam, Germany
SoundCloud Ltd., Rosenthaler Straße 13, 10119 Berlin, Germany

A. Proofs

Proof of Lemma 1. The partial derivative of Equation 4 for an instance $i$ is given by

$$\nabla_{\bar{x}_i} \hat{\theta}_d(w, \bar{X}, c_d) = (\bar{x}_i^T w - z_i)\, w + \frac{1}{c_{d,i}} (\bar{x}_i - x_i).$$

Due to the convexity of $\hat{\theta}_d$, we can compute the minimum by equating this gradient to zero and solving for $\bar{x}_i$ using the Sherman-Morrison formula. This results in

$$\bar{x}_i = x_i - \frac{c_{d,i}}{1 + c_{d,i}\, w^T w}\, (x_i^T w - z_i)\, w.$$

The claim follows, since the optimal transformations of the instances are independent of each other.

Proof of Lemma 2. The partial derivative of the learner's objective is given by

$$\nabla_w \int \hat{\theta}_l(w, \phi(c_d), c_l)\, dq(c_d) = \nabla_w \left( \frac{1}{2} \int (\phi(c_d)\, w - y)^T \operatorname{diag}(c_l)\, (\phi(c_d)\, w - y)\, dq(c_d) + \frac{1}{2}\, w^T w \right)$$
$$= \left( \int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d) \right) w - \left( \int \phi(c_d)\, dq(c_d) \right)^T \operatorname{diag}(c_l)\, y + w.$$

Setting this gradient equal to zero and solving for $w$ yields

$$w^*[\phi] = \left( I_m + \int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d) \right)^{-1} \left( \int \phi(c_d)\, dq(c_d) \right)^T \operatorname{diag}(c_l)\, y.$$

The matrix $\phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)$, and hence the expectation $\int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d)$, is positive semidefinite for any strategy $\phi(c_d)$ of the adversary. Thus, all its eigenvalues $\lambda_i$ are non-negative. The corresponding eigenvectors are additionally eigenvectors of $I_m + \int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d)$ with eigenvalues $1 + \lambda_i > 0$. Thus, the inverse matrix exists independently of the choice of $\phi$. This proves the claim.
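The closed form in Lemma 1 is straightforward to verify numerically. The following sketch is a minimal check, assuming the per-instance objective $\hat{\theta}_d$ reconstructed from the gradient stated above and illustrative values for $w$, $x_i$, $z_i$, and $c_{d,i}$; a general-purpose minimizer serves as the reference.

```python
# A minimal numerical check of Lemma 1's closed-form response.
# theta_d is reconstructed from the stated gradient; all values are
# illustrative, not taken from the paper.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = 5                        # number of attributes
w = rng.normal(size=m)       # learner's model parameters
x = rng.normal(size=m)       # original instance x_i
z = 0.7                      # adversary's target value z_i
c = 2.5                      # cost factor c_{d,i} > 0

def theta_d(x_bar):
    # squared deviation from the target plus transformation costs
    return 0.5 * (x_bar @ w - z) ** 2 + 0.5 / c * np.sum((x_bar - x) ** 2)

# closed form from Lemma 1 (obtained via Sherman-Morrison)
x_closed = x - c / (1.0 + c * (w @ w)) * (x @ w - z) * w

# numerical minimizer as reference
x_num = minimize(theta_d, x0=x, method="BFGS").x

print(np.allclose(x_closed, x_num, atol=1e-5))   # expected: True
```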

Proof of Lemma 3. We define the action spaces of $\bar{G}$ as the closures of the convex hulls of optimal responses,

$$\bar{\Phi} = \operatorname{cl}\left(\left\{ \phi \in \Phi \mid \phi = \tau\, \phi^*[w_1] + (1 - \tau)\, \phi^*[w_2],\ \tau \in [0, 1],\ w_1, w_2 \in W \right\}\right)$$
$$\bar{W} = \operatorname{cl}\left(\left\{ w \in W \mid w = \tau\, w^*[\phi_1] + (1 - \tau)\, w^*[\phi_2],\ \tau \in [0, 1],\ \phi_1, \phi_2 \in \Phi \right\}\right),$$

where $\operatorname{cl}(M)$ denotes the closure of the set $M$. Since a Bayesian equilibrium is a pair of optimal responses, each equilibrium of the original game $G$ is a member of the restricted space $\bar{W} \times \bar{\Phi}$. Since both games have identical loss functions and cost distributions, each equilibrium point of $G$ is an equilibrium point of $\bar{G}$ and vice versa. It remains to be shown that the spaces $\bar{\Phi}$ and $\bar{W}$ of optimal responses are bounded.

We start by showing that any optimal response of the data generator is bounded. Using the triangle inequality, the spectral norm of an optimal response to a model parameter $w$ (see Lemma 1) can be upper-bounded by

$$\|\phi^*[w](c_d)\| \le \|X\| + \left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} z w^T \right\| + \left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} X w w^T \right\|. \quad (14)$$

The third summand on the right-hand side of Inequality 14 can be upper-bounded as follows:

$$\left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} X w w^T \right\| \le \left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} \right\| \, \|X\| \, \|w w^T\| \quad (15)$$
$$= \frac{\|w\|^2}{\|w\|^2 + \min_i c_{d,i}^{-1}}\, \|X\| \quad (16)$$
$$< \|X\|. \quad (17)$$

Inequality 15 follows from the sub-multiplicativity of the spectral norm. The spectral norm of a symmetric positive definite matrix is given by its largest eigenvalue, that is, here, the maximal diagonal entry of the positive diagonal matrix (see Equation 16). Inequality 17 holds since $c_d > 0$. Analogously, the second summand (see Inequality 14) can be bounded using the sub-multiplicativity of the spectral norm (Inequality 18) and evaluating the largest eigenvalue of the diagonal matrix, which we split into two factors (see Equation 19). In Inequality 20 we bound each of the two factors of the denominator by one of its positive summands:

$$\left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} z w^T \right\| \le \left\| \left(\operatorname{diag}(c_d)^{-1} + \|w\|^2 I_n\right)^{-1} \right\| \, \|z\| \, \|w\| \quad (18)$$
$$= \|z\| \, \frac{\|w\|}{\sqrt{\|w\|^2 + \min_i c_{d,i}^{-1}}} \cdot \frac{1}{\sqrt{\|w\|^2 + \min_i c_{d,i}^{-1}}} \quad (19)$$
$$\le \|z\| \, \frac{\|w\|}{\sqrt{\|w\|^2}} \cdot \frac{1}{\sqrt{\min_i c_{d,i}^{-1}}} \quad (20)$$
$$= \|z\| \max_i \sqrt{c_{d,i}}.$$

Finally, using Bounds 17 and 20, the function space $\bar{\Phi}$ is bounded, because

$$\int \|\phi^*[w](c_d)\|\, dq(c_d) < 2\|X\| + \|z\| \int \max_i \sqrt{c_{d,i}}\, dq(c_d) \quad (21)$$

and the remaining integral exists, since

$$\int \max_i \sqrt{c_{d,i}}\, dq(c_d) \le \int \sum_{i=1}^{n} (1 + c_{d,i})\, dq(c_d) = n + \sum_{i=1}^{n} \int c_{d,i}\, dq(c_d) < \infty.$$

We now show that if the data generator's action space is convex and compact, then the space $\bar{W}$ of optimal responses of the learner is compact and convex as well. Let $\phi(c_d)$ be a data generator's strategy given costs $c_d$. Then the learner's optimal response given by Lemma 2 can be bounded using the sub-multiplicativity of the $\ell_2$-norm:

$$\|w^*[\phi]\| \le \left\| \left( I_m + \int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d) \right)^{-1} \right\| \left\| \int \phi(c_d)\, dq(c_d) \right\| \left\| \operatorname{diag}(c_l)\, y \right\|$$
$$\le \sup_{\phi' \in \bar{\Phi}} \left\{ \int \|\phi'(c_d)\|\, dq(c_d) \right\} \left\| \operatorname{diag}(c_l)\, y \right\|. \quad (22)$$

For any symmetric, positive definite matrix the spectral norm equals its largest eigenvalue. Since $\int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d)$ is positive semidefinite, all its eigenvalues $\lambda_i$ are non-negative. The corresponding eigenvectors are additionally eigenvectors of $I_m + \int \phi(c_d)^T \operatorname{diag}(c_l)\, \phi(c_d)\, dq(c_d)$ with eigenvalues $1 + \lambda_i \ge 1$. Thus, the inverse matrix exists and its norm can be upper-bounded by one (see Inequality 22). The claim then follows since the data generator's action space is bounded.
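Lemma 2's response $w^*[\phi]$ and the norm argument above can be illustrated numerically as well. In the following sketch the expectations over $q(c_d)$ are replaced by Monte Carlo averages; the gamma prior and the choice of $\phi$ as the Lemma 1 response to a fixed $w_0$ are illustrative assumptions, not prescribed by the proofs.

```python
# A Monte Carlo sketch of the learner's optimal response (Lemma 2).
# Prior, problem sizes, and the adversary's strategy are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 5
X = rng.normal(size=(n, m))       # original instances (rows)
y = rng.normal(size=n)            # learner's targets
z = np.zeros(n)                   # adversary's targets
c_l = np.full(n, 1.0)             # learner's instance costs
w0 = rng.normal(size=m)           # model the adversary responds to

def phi(c_d):
    # data generator's optimal transformation (Lemma 1), row-wise
    scale = c_d / (1.0 + c_d * (w0 @ w0))
    return X - np.outer(scale * (X @ w0 - z), w0)

samples = [phi(rng.gamma(shape=2.0, scale=0.5, size=n)) for _ in range(1000)]
M = np.mean([P.T @ (c_l[:, None] * P) for P in samples], axis=0)
E_phi = np.mean(samples, axis=0)

A = np.eye(m) + M                              # I_m + E[phi^T diag(c_l) phi]
w_star = np.linalg.solve(A, E_phi.T @ (c_l * y))

# all eigenvalues of A are >= 1, so its inverse has spectral norm <= 1,
# which is the key step in bounding w*[phi]
print(np.linalg.norm(np.linalg.inv(A), 2) <= 1.0 + 1e-12)
```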

Proof of the Theorem (Taylor approximation). Let $a = (a_1, \dots, a_n)^T$ be an arbitrary vector. Then, following Taylor's theorem, the data generator's optimal response can be written as

$$\phi^*[w](c_d) = \phi_{t;a}[w](c_d) + R_{t;a}(c_d) = \sum_{r=0}^{t} \operatorname{diag}(c_d - a)^r\, C_r(a) + R_{t;a}(c_d),$$

with Cauchy remainder

$$R_{t;a}(c_d) = (t + 1)\, \operatorname{diag}(c_d - \xi)^t\, \operatorname{diag}(c_d - a)\, C_{t+1}(\xi)$$

for some $\xi = (\xi_1, \dots, \xi_n)^T$ with $\xi_i \in [c_{d,i}, a_i]$ or $\xi_i \in [a_i, c_{d,i}]$, respectively. It remains to show that $a$ can be chosen such that $R_{t;a}(c_d)$ converges to the null matrix, or equivalently, that $\lim_{t \to \infty} \|R_{t;a}(c_d)\| = 0$ holds.

The limit of the Cauchy remainder can be expressed as

$$\lim_{t \to \infty} \|R_{t;a}(c_d)\| = \lim_{t \to \infty} (t + 1) \left\| \operatorname{diag}(c_d - \xi)^t\, \operatorname{diag}(c_d - a)\, C_{t+1}(\xi) \right\|$$
$$\le \lim_{t \to \infty} (t + 1) \left\| \operatorname{diag}(c_d - \xi)^t\, \operatorname{diag}(c_d - a)\, \|w\|^{2t} \left( I_n + \|w\|^2 \operatorname{diag}(\xi) \right)^{-(t+2)} \right\| \quad (23)$$
$$= \lim_{t \to \infty} \max_i\ (t + 1)\, \frac{|c_{d,i} - \xi_i|^t\, |c_{d,i} - a_i|\, \|w\|^{2t}}{(1 + \xi_i \|w\|^2)^{t+2}}. \quad (24)$$

In Inequality 23 we dismiss the two factors $(-1)^{t+1}$ and $(X w - z) w^T$ of $C_{t+1}(\xi)$; the first has unit absolute value and the second contributes only a constant factor to the norm. This leaves a diagonal matrix whose spectral norm is given by its maximal absolute diagonal entry (see Equation 24). If $w = 0$ the claim holds trivially, so let $\|w\| > 0$. Then Equation 24 can be factorized as

$$\lim_{t \to \infty} \|R_{t;a}(c_d)\| \le \lim_{t \to \infty} \max_i\ (t + 1) \left( \frac{|c_{d,i} - \xi_i|\, \|w\|^2}{1 + \xi_i \|w\|^2} \right)^{t} \frac{|c_{d,i} - a_i|}{(1 + \xi_i \|w\|^2)^2}. \quad (25)$$

A sufficient condition for the geometric sequence, and thus for Equation 25, to tend to zero is that

$$\frac{|c_{d,i} - \xi_i|\, \|w\|^2}{1 + \xi_i \|w\|^2} \le k < 1 \quad (26)$$

holds for all $i$ and a fixed $k$. In the following we show by case differentiation that Condition 26 is fulfilled for

$$a_i = \frac{1}{2} \sup\left\{ \|c_d\|_\infty \mid q(c_d) > 0 \right\},$$

where $\|c\|_\infty = \max_i |c_i|$ is the maximum norm.

First, let $c_{d,i} \le a_i$. Since $\xi_i \in [c_{d,i}, a_i]$ and $c_{d,i} \ge 0$, it follows that $|c_{d,i} - \xi_i| \le \xi_i \le a_i$, and thus the quotient can be upper-bounded by

$$\frac{|c_{d,i} - \xi_i|\, \|w\|^2}{1 + \xi_i \|w\|^2} \le \frac{\xi_i \|w\|^2}{1 + \xi_i \|w\|^2} \le \frac{a_i \|w\|^2}{1 + a_i \|w\|^2} = k.$$

Since $\|w\| > 0$, it holds that $k < 1$. Now assume that $c_{d,i} > a_i$. Since the data generator's costs are bounded from above by $2 a_i$ (by the choice of $a$) and $\xi_i \in [a_i, c_{d,i}]$, it follows that $|c_{d,i} - \xi_i| \le 2 a_i - \xi_i \le a_i$. Hence, the quotient can again be upper-bounded by

$$\frac{|c_{d,i} - \xi_i|\, \|w\|^2}{1 + \xi_i \|w\|^2} \le \frac{a_i \|w\|^2}{1 + a_i \|w\|^2} = k.$$

This proves the claim.
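The geometric convergence established in the proof can be observed directly on the scalar factor $g(c) = c / (1 + c \|w\|^2)$ that the Lemma 1 response applies row-wise. The sketch below is illustrative: the support bound $s$ is an assumption, and the derivative formula for $g$ is derived by hand rather than taken from the paper. It expands $g$ around $a = s/2$ and reports the maximal remainder on $[0, s]$, which should shrink geometrically in $t$.

```python
# A sketch of the Taylor approximation behind the theorem, applied to
# the scalar factor g(c) = c / (1 + c * w_sq) from Lemma 1.
import numpy as np

w_sq = 1.5           # ||w||^2 > 0 (illustrative)
s = 4.0              # assumed supremum of the support of q(c_d)
a = s / 2.0          # expansion point suggested by the proof

def g(c):
    return c / (1.0 + c * w_sq)

def g_taylor(c, t):
    # degree-t partial sum; the r-th derivative of g at a is
    # (-1)**(r+1) * r! * w_sq**(r-1) / (1 + a*w_sq)**(r+1) for r >= 1
    total = g(a)
    for r in range(1, t + 1):
        term = (-1) ** (r + 1) * w_sq ** (r - 1) / (1 + a * w_sq) ** (r + 1)
        total = total + term * (c - a) ** r
    return total

c_grid = np.linspace(0.0, s, 9)
for t in (1, 3, 5, 10):
    err = np.max(np.abs(g(c_grid) - g_taylor(c_grid, t)))
    print(f"t = {t:2d}: max remainder on [0, s] = {err:.2e}")
```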

B. Comprehensive Empirical Results

In Section 6 we studied the behavior of the Bayesian regression model in the context of spam filtering. We compared the Bayesian game regression model (denoted Bayes) to the Nash game regression model (denoted Nash), a robust ridge regression (denoted Minimax), and a regular ridge regression (denoted Ridge).

Depiction of the uniqueness theorem. In the theorem on uniqueness we derived sufficient conditions for unique equilibrium points; uniqueness depends on the costs $c_d$ and $c_l$ of both players. Figure 6 (left) depicts the optimal responses of the learner (horizontal axis) and of the data generator (vertical axis) for different costs $c_l = c_d$ in a one-dimensional regression game with fixed scalar values of $X$, $y$, and $z$ and without uncertainty. Their intersections constitute the equilibria of the corresponding games. If the costs become too large, such that the convexity condition is violated for any $(w, \phi) \in [0, 1] \times [0, 1]$, the game $G$ is no longer locally convex (indicated by the dashed black line).

Playing against a Bayesian adversary. In a first experiment, we evaluated how the methods perform against an adversary that chooses a strategy according to a Bayesian equilibrium. We chose $q(\cdot)$ to be a gamma distribution and varied its mean $\mu$ and variance $\sigma^2$. We set the Nash model's conjecture of the costs to the mean $\mu$ for all values of $\sigma^2$; this is optimal if the data generator's costs are drawn from a single-point distribution $q(\cdot) = \delta(\cdot = \mu)$.

Figure 3. Evaluation against an adversary that follows a Bayesian equilibrium strategy for varying cost parameters: RMSE over the expected costs $\mu$ and the variance of costs $\sigma^2$. Error bars show standard errors.

Figure 3 shows the root mean squared error (RMSE) of Bayes, Nash, and Ridge as a function of $\mu$ and $\sigma^2$ (left), together with sectional views along $\mu$ (right, top) and along $\sigma^2$ (right, bottom). We observe that all methods coincide if the data generator's costs vanish. The advantage of Bayes over Nash grows with the variance of $q$.

Playing against actual adversaries. In a second experiment, we evaluate all methods over time into the future; here, the models play against actual spammers. Additionally, in order to artificially create a mismatch with our modeling assumptions, we also evaluate the models on test data from the past. The training sample is drawn from month $k$. To evaluate into the future, the regularization parameters of all learners (for Bayes, we use a single cost parameter for all instances) are tuned on instances from month $k+1$; for evaluation into the past, tuning data are drawn from month $k-1$. Test data are then drawn from months $k+2$ to $k+6$ and from months $k-2$ to $k-6$, respectively. This process is repeated, and measurements are averaged over ten resampling iterations of the training set and, in an outer loop, over four training months $k$ (March to June 2007). The data generator's cost parameters are set to a fixed mean $\mu$ and variance $\sigma^2$ for Bayes and to the same mean $\mu$ for Nash. Figure 4 (left) shows the RMSE over time for these fixed cost parameters. Figure 4 (center) depicts the RMSE for a fixed point in the past over a range of different values of $\mu$. Figure 4 (right) shows the training times of the Bayesian equilibrium model and of the reference models for a varying number of attributes.
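For concreteness, the temporal evaluation protocol can be summarized in a small skeleton. Everything below is a synthetic stand-in (ridge regression on random data in place of the game models and the spam corpus); it only mirrors the train/tune/test structure described above.

```python
# A schematic of the evaluation protocol: train on month k, tune the
# regularization one month ahead (or behind), test further out.
# Data and model are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)

def load_month(k, n=100, m=20):
    # hypothetical placeholder for one month's labeled sample
    X = rng.normal(size=(n, m))
    y = X @ np.ones(m) + rng.normal(scale=0.5, size=n)
    return X, y

def fit_ridge(train, lam):
    X, y = train
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def rmse(w, data):
    X, y = data
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

def evaluate(k, direction, horizon=6):
    # direction = +1 evaluates into the future, -1 into the past
    train, tune = load_month(k), load_month(k + direction)
    lam = min((0.01, 0.1, 1.0, 10.0),
              key=lambda l: rmse(fit_ridge(train, l), tune))
    w = fit_ridge(train, lam)
    return [rmse(w, load_month(k + direction * off))
            for off in range(2, horizon + 1)]

print("into the future:", np.round(evaluate(k=0, direction=+1), 3))
print("into the past:  ", np.round(evaluate(k=0, direction=-1), 3))
```

In the actual experiments, these measurements are additionally averaged over the resampling iterations of the training set and over the four training months.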
Additionally, we study the actual shift of spam emails over time and compare it with the computed equilibrium point. For the depiction we choose the two most discriminative principal components with respect to spam and non-spam. We train separate Nash models on March 2007 (see Section 6), April 2007, May 2007, and June 2007. Again, the data generator's costs are set to the fixed mean $\mu$, and the learner's costs are tuned on the subsequent month. Given the Nash model, we can extract the optimally transformed data of the training month according to Lemma 1. Figure 5 depicts the training sample (blue), test samples from the six subsequent months (green/yellow), and the computed equilibrium data (red) for three different training months (April to June 2007); the fourth (March 2007) is shown in Section 6. If the changes in the spam emails are continuous, they can be predicted well (see, e.g., the upper right corner). If instead the changes are more volatile (see, e.g., the lower right corner), the Nash model is not able to predict the appearance of new spam emails.

Approximation of the adversary's responses. Finally, we study the impact of the degree $t$ of the Taylor approximation on the accuracy and the execution time of Bayes. Figure 6 (center) shows the RMSE over time for different Taylor degrees $t$ (see the expansion in the proof of the theorem). Figure 6 (right, top) shows the execution time depending on the number of training emails for a fixed number of attributes $m$; Figure 6 (right, bottom) shows the execution time depending on the number of attributes for a fixed number of training emails $n$.
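A plausible driver of the execution-time curves in Figure 6 (right) is the cost of forming and solving the $m \times m$ linear system from Lemma 2, which grows polynomially in the number of attributes $m$. The micro-benchmark below sketches this single step with illustrative sizes; it is not a reconstruction of the full Bayes model.

```python
# Timing the m x m system solve that dominates the learner's response
# for large numbers of attributes m; sizes are illustrative.
import time
import numpy as np

rng = np.random.default_rng(4)
n = 500                                   # number of training emails
for m in (100, 200, 400, 800):            # number of attributes
    phi_s = rng.normal(size=(n, m))       # stand-in transformed sample
    b = phi_s.T @ rng.normal(size=n)
    t0 = time.perf_counter()
    A = np.eye(m) + phi_s.T @ phi_s       # I_m + phi^T diag(c_l) phi, c_l = 1
    w = np.linalg.solve(A, b)
    print(f"m = {m:4d}: {time.perf_counter() - t0:.3f} s")
```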

Figure 4. Evaluation of the regression models with fixed expected costs into the past and future (left) and with varying expected costs into the past (center); execution time for a varying number of attributes (right). Error bars show standard errors.

Figure 5. Shift of spam emails over time for three different training months (panels show the first two principal components). Training data are shown in blue; the imaginary (computed equilibrium) data are shown in red; test data of the subsequent months are shown in green/yellow.

Figure 6. Optimal responses for one-dimensional regression games (left). Evaluation of the Bayesian model with different Taylor degrees $t$ over time (center). Execution time (right).
