Bayesian Games for Adversarial Regression Problems (Online Appendix)
Michael Großhans, Christoph Sawade, Michael Brückner, Tobias Scheffer

University of Potsdam, Department of Computer Science, August-Bebel-Straße 89, 14482 Potsdam, Germany
SoundCloud Ltd., Rosenthaler Straße, Berlin, Germany

A. Proofs

Proof of Lemma 1. The partial derivative of Equation 4 with respect to a transformed instance $\bar{x}_i$ is given by

$$\frac{\partial}{\partial \bar{x}_i}\hat{\theta}_d(w, \bar{X}, c_d) = 2(\bar{x}_i^{\mathsf T} w - z_i)\,w + 2 c_{d,i}(\bar{x}_i - x_i).$$

Due to the convexity of $\hat{\theta}_d$, we can compute the minimum by setting this derivative to zero and solving for $\bar{x}_i$ using the Sherman–Morrison formula. This results in

$$\bar{x}_i = x_i - \frac{x_i^{\mathsf T} w - z_i}{c_{d,i} + w^{\mathsf T} w}\, w.$$

The claim follows, since the optimal transformations of the individual instances are independent of one another.

Proof of Lemma 2. The partial derivative of Equation 5 is given by

$$\begin{aligned}
\frac{\partial}{\partial w}\int \hat{\theta}_l(w, \phi(c_d), c_l)\,\mathrm{d}q(c_d)
&= \frac{\partial}{\partial w}\left[\int (\phi(c_d)w - y)^{\mathsf T}\operatorname{diag}(c_l)\,(\phi(c_d)w - y)\,\mathrm{d}q(c_d) + w^{\mathsf T} w\right]\\
&= 2\left(\int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)\right)w - 2\left(\int \phi(c_d)\,\mathrm{d}q(c_d)\right)^{\mathsf T}\operatorname{diag}(c_l)\,y + 2w.
\end{aligned}$$

Setting this gradient equal to zero and solving for $w$ yields

$$w^*[\phi] = \left(I_m + \int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)\right)^{-1}\left(\int \phi(c_d)\,\mathrm{d}q(c_d)\right)^{\mathsf T}\operatorname{diag}(c_l)\,y.$$

The matrix $\phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)$, and hence the expectation $\int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)$, is positive semidefinite for any adversary strategy $\phi(c_d)$; thus, all its eigenvalues $\lambda_i$ are non-negative. The corresponding eigenvectors are additionally eigenvectors of $I_m + \int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)$ with eigenvalues $1 + \lambda_i > 0$. Thus, the inverse matrix exists independently of the choice of $\phi$. This proves the claim.

Proof of Lemma 3. We define the action spaces of the restricted game $G'$ as the closure of the convex hull of optimal responses,

$$\Phi' = \operatorname{cl}\big(\{\phi \in \Phi \mid \phi = \tau\,\phi^*[w_1] + (1-\tau)\,\phi^*[w_2],\ \tau \in [0,1],\ w_1, w_2 \in W\}\big)$$
$$W' = \operatorname{cl}\big(\{w \in W \mid w = \tau\,w^*[\phi_1] + (1-\tau)\,w^*[\phi_2],\ \tau \in [0,1],\ \phi_1, \phi_2 \in \Phi\}\big),$$

where $\operatorname{cl}(M)$ denotes the closure of set $M$. Since a Bayesian equilibrium is a pair of optimal responses, each equilibrium of the original game $G$ is a member of the restricted space $W' \times \Phi'$. Since both games have identical loss functions and cost distributions, each equilibrium point of $G$ is an equilibrium point of $G'$ and vice versa.
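The data generator's closed-form response derived above lends itself to a quick numerical sanity check. The sketch below is an assumption-laden reconstruction: the per-instance objective $(\bar{x}_i^{\mathsf T} w - z_i)^2 + c_{d,i}\|\bar{x}_i - x_i\|^2$ and all symbol names are taken from the derivative shown in the proof, not from code published with the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 4                       # instances, attributes
X = rng.normal(size=(n, m))       # original data
w = rng.normal(size=m)            # fixed model parameters
z = rng.normal(size=n)            # target values preferred by the generator
c_d = rng.uniform(0.5, 2.0, n)    # data generator's per-instance costs

# Closed-form optimal transformation (Sherman-Morrison):
#   xbar_i = x_i - (x_i^T w - z_i) / (c_{d,i} + w^T w) * w
Xbar = X - ((X @ w - z) / (c_d + w @ w))[:, None] * w

# The per-instance gradient 2(xbar_i^T w - z_i) w + 2 c_{d,i}(xbar_i - x_i)
# must vanish at the optimum.
grad = 2 * (Xbar @ w - z)[:, None] * w + 2 * c_d[:, None] * (Xbar - X)
print(np.allclose(grad, 0))       # True

# Cross-check: no random perturbation improves the (strictly convex) objective.
def obj(Xb):
    return np.sum((Xb @ w - z) ** 2) + np.sum(c_d[:, None] * (Xb - X) ** 2)

best = obj(Xbar)
trials = [obj(Xbar + 0.1 * rng.normal(size=(n, m))) for _ in range(100)]
print(best <= min(trials))        # True
```

Because the objective is strictly convex in each $\bar{x}_i$, the gradient check and the perturbation check together pin down the closed form as the unique minimizer.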
It remains to be shown that the spaces $\Phi'$ and $W'$ of optimal responses are bounded. We start by showing that any optimal response of the data generator is bounded. Using the triangle inequality, the spectral norm of an optimal response to a model parameter $w$ (see Lemma 1) can be upper-bounded by

$$\|\phi^*[w](c_d)\|_2 \le \|X\|_2 + \big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1} z w^{\mathsf T}\big\|_2 + \big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1} X w w^{\mathsf T}\big\|_2. \quad (4)$$

The third summand of the right-hand side of Inequality 4 can be upper-bounded as follows:

$$\big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1} X w w^{\mathsf T}\big\|_2 \le \big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1}\big\|_2\, \|X\|_2\, \|w w^{\mathsf T}\|_2 \quad (5)$$
$$= \frac{w^{\mathsf T} w}{w^{\mathsf T} w + \min_i c_{d,i}}\, \|X\|_2 \quad (6)$$
$$< \|X\|_2. \quad (7)$$

Inequality 5 follows from the sub-multiplicativity of the spectral norm. The spectral norm of a symmetric positive definite matrix is given by its largest eigenvalue, that is, for the positive diagonal matrix, by its maximal diagonal entry (see Equation 6). Inequality 7 holds since $c_d > 0$.

Analogously, the second summand (see Inequality 4) can be bounded using the sub-multiplicativity of the spectral norm (Inequality 8) and evaluating the largest eigenvalue of the diagonal matrix, which we split up into two factors (see Equation 9). In Inequality 10 we bound each denominator by one of its positive summands:

$$\big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1} z w^{\mathsf T}\big\|_2 \le \big\|(\operatorname{diag}(c_d) + w^{\mathsf T} w\, I_n)^{-1}\big\|_2\, \|z\|_2\, \|w\|_2 \quad (8)$$
$$= \|z\|_2\, \frac{\|w\|_2}{\sqrt{w^{\mathsf T} w + \min_i c_{d,i}}}\, \frac{1}{\sqrt{w^{\mathsf T} w + \min_i c_{d,i}}} \quad (9)$$
$$< \|z\|_2\, \frac{\|w\|_2}{\sqrt{w^{\mathsf T} w}}\, \frac{1}{\sqrt{\min_i c_{d,i}}} = \|z\|_2\, \max_i c_{d,i}^{-1/2}. \quad (10)$$

Finally, using Bounds 7 and 10, the function space $\Phi'$ is bounded, because

$$\int \|\phi^*[w](c_d)\|_2\,\mathrm{d}q(c_d) < 2\|X\|_2 + \|z\|_2 \int \max_i c_{d,i}^{-1/2}\,\mathrm{d}q(c_d)$$

and

$$\int \max_i c_{d,i}^{-1/2}\,\mathrm{d}q(c_d) \le \sum_i \int c_{d,i}^{-1/2}\,\mathrm{d}q(c_d) \le \sum_i \int \big(1 + c_{d,i}^{-1}\big)\,\mathrm{d}q(c_d) = n + \sum_i \int c_{d,i}^{-1}\,\mathrm{d}q(c_d) < \infty,$$

since the moments $\int c_{d,i}^{-1}\,\mathrm{d}q(c_d)$ exist by assumption.

We now show that if the data generator's action space is convex and compact, the space of optimal responses $W'$ of the learner is compact and convex as well. Let $\phi(c_d)$ be a data generator's strategy given costs $c_d$. Then the learner's optimal response given by Lemma 2 can be bounded using the sub-multiplicativity of the $\ell_2$-norm:

$$\|w^*[\phi]\|_2 \le \Big\|\Big(I_m + \int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)\Big)^{-1}\Big\|_2\, \Big\|\int \phi(c_d)\,\mathrm{d}q(c_d)\Big\|_2\, \|\operatorname{diag}(c_l)\, y\|_2 \le \sup_{\phi' \in \Phi}\Big\{\int \|\phi'(c_d)\|_2\,\mathrm{d}q(c_d)\Big\}\, \|\operatorname{diag}(c_l)\, y\|_2. \quad (11)$$

For any symmetric positive definite matrix, the spectral norm equals its largest eigenvalue. Since $\int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)$ is positive semidefinite, all its eigenvalues $\lambda_i$ are non-negative.
The corresponding eigenvectors are additionally eigenvectors of $I_m + \int \phi(c_d)^{\mathsf T}\operatorname{diag}(c_l)\,\phi(c_d)\,\mathrm{d}q(c_d)$ with eigenvalues $1 + \lambda_i \ge 1$. Thus, the inverse matrix exists and its spectral norm can be upper-bounded by one (see Inequality 11). The claim then follows, since the data generator's action space is bounded.

Proof of Theorem 1. Let $a = (a_1, \ldots, a_n)^{\mathsf T}$ be an arbitrary vector. Then, following Taylor's Theorem, the data generator's optimal response can be written as

$$\phi^*[w](c_d) = \phi^*_{t;a}[w](c_d) + R_{t;a}(c_d) = \sum_{r=0}^{t} \operatorname{diag}(c_d - a)^r\, C_r(a) + R_{t;a}(c_d),$$

with Cauchy remainder

$$R_{t;a}(c_d) = (t+1)\operatorname{diag}(c_d - \xi)^t \operatorname{diag}(c_d - a)\, C_{t+1}(\xi),$$

for some $\xi = (\xi_1, \ldots, \xi_n)^{\mathsf T}$ with $\xi_i \in [c_{d,i}, a_i]$ or $\xi_i \in [a_i, c_{d,i}]$, respectively. It remains to show that $a$ can be chosen such that $R_{t;a}(c_d)$ converges to the null matrix, or, equivalently, that $\lim_{t \to \infty} \|R_{t;a}(c_d)\|_2 = 0$ holds.
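The eigenvalue-shift argument used above (for positive semidefinite $M$, the matrix $I_m + M$ has eigenvalues $1 + \lambda_i \ge 1$, hence exists an inverse with spectral norm at most one) can be illustrated numerically; the shapes and cost values below are arbitrary choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 6
phi = rng.normal(size=(n, m))         # one realization of phi(c_d)
c_l = rng.uniform(0.1, 3.0, size=n)   # learner's per-instance costs

# M = phi^T diag(c_l) phi is symmetric positive semidefinite:
M = phi.T @ (c_l[:, None] * phi)
lam = np.linalg.eigvalsh(M)
print(lam.min() >= -1e-10)            # all eigenvalues non-negative: True

# Hence I_m + M has eigenvalues 1 + lam >= 1; it is invertible, and the
# spectral norm of its inverse is 1 / (1 + lam_min) <= 1.
inv_norm = np.linalg.norm(np.linalg.inv(np.eye(m) + M), ord=2)
print(inv_norm <= 1 + 1e-10)          # True
```

The same two facts are exactly what make the learner's closed-form response well defined for every adversary strategy and what yields the norm bound in Inequality 11.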
Figure 3. Evaluation against an adversary that follows a Bayesian equilibrium strategy for varying cost parameters (expected costs $\mu$ and variance of costs $\sigma^2$). Error bars show standard errors.

The limit of the Cauchy remainder can be expressed as

$$\lim_{t\to\infty} \|R_{t;a}(c_d)\|_2 = \lim_{t\to\infty} \big\|(t+1)\operatorname{diag}(c_d - \xi)^t \operatorname{diag}(c_d - a)\, C_{t+1}(\xi)\big\|_2 \quad (12)$$
$$\le \lim_{t\to\infty} \big\|(t+1)\operatorname{diag}(c_d - \xi)^t \operatorname{diag}(c_d - a)\,\big(\operatorname{diag}(\xi) + w^{\mathsf T} w\, I_n\big)^{-(t+2)}\big\|_2\, \big\|(Xw - z)w^{\mathsf T}\big\|_2 \quad (13)$$
$$= \lim_{t\to\infty} \max_i\, (t+1)\, \frac{|c_{d,i} - \xi_i|^t\, |c_{d,i} - a_i|}{(\xi_i + w^{\mathsf T} w)^{t+2}}\, \big\|(Xw - z)w^{\mathsf T}\big\|_2. \quad (14)$$

In Inequality 13 we dismiss the two terms $(-1)^{t+2}$ and $(Xw - z)w^{\mathsf T}$ from $C_{t+1}(\xi)$. This leaves a diagonal matrix whose spectral norm is given by its maximal absolute diagonal entry (see Equation 14). If $\|w\|_2 = 0$ the claim holds, so let $\|w\|_2 > 0$. Then, Equation 14 can be factorized as

$$\lim_{t\to\infty} \|R_{t;a}(c_d)\|_2 \le \big\|(Xw - z)w^{\mathsf T}\big\|_2\, \lim_{t\to\infty} \max_i\, (t+1) \left(\frac{|c_{d,i} - \xi_i|}{\xi_i + w^{\mathsf T} w}\right)^{t} \frac{|c_{d,i} - a_i|}{(\xi_i + w^{\mathsf T} w)^{2}}. \quad (15)$$

A sufficient condition for the geometric sequence, and thus for Equation 15, to tend to zero is that

$$\frac{|c_{d,i} - \xi_i|}{\xi_i + w^{\mathsf T} w} \le k < 1 \quad (16)$$

holds for all $i$ and a fixed $k$. In the following we show by case differentiation that Condition 16 is fulfilled for $a_i = \|\sup\{c_d \mid q(c_d) > 0\}\|_\infty$, where $\|c\|_\infty = \max_i |c_i|$ is the maximum norm.

Let $c_{d,i} \le a_i$. Since $\xi_i \in [c_{d,i}, a_i]$ and $c_{d,i} \ge 0$, the quotient can be upper-bounded by

$$\frac{|c_{d,i} - \xi_i|}{\xi_i + w^{\mathsf T} w} \le \frac{\xi_i}{\xi_i + w^{\mathsf T} w} \le \frac{a_i}{a_i + w^{\mathsf T} w} = k.$$

Since $\|w\|_2 > 0$, it holds that $k < 1$. Now assume that $c_{d,i} > a_i$. Since the data generator's costs are bounded from below by zero, it follows that $\xi_i \ge a_i$. Hence, the quotient can be upper-bounded by

$$\frac{|c_{d,i} - \xi_i|}{\xi_i + w^{\mathsf T} w} \le \frac{c_{d,i} - a_i}{a_i + w^{\mathsf T} w} = k.$$

This proves the claim.

B. Comprehensive Empirical Results

In Section 6 we studied the behavior of the Bayesian regression model in the context of spam filtering. We compared the Bayesian game regression
model (denoted Bayes) to the Nash game regression model (denoted Nash), the robust ridge regression (denoted Minimax), and a regular ridge regression (denoted Ridge).

Depiction of Theorem 1. In Theorem 1 we derived sufficient conditions for unique equilibrium points. The uniqueness depends on both players' costs $c_d$ and $c_l$. Figure 6 (left) depicts the optimal responses of the learner (horizontal axis) and the data generator (vertical axis) for different costs $c_l = c_d$ in a one-dimensional regression game with fixed scalar values of $X$, $y$, and $z$, without uncertainty. Their intersections constitute the equilibria of the corresponding games. If the costs become too large, such that the uniqueness condition is violated for any $(w, \phi) \in [0,1] \times [0,1]$, the game $G$ is no longer locally convex (indicated by the dashed black line).

Playing against a Bayesian adversary. In a first experiment, we evaluated how the methods perform against an adversary that chooses its strategy according to a Bayesian equilibrium. We chose $q(\cdot)$ to be a gamma distribution and varied its mean $\mu$ and variance $\sigma^2$. We set the Nash model's conjecture, for all values of $\sigma^2$, to the mean $\mu$; this is optimal if the data generator's costs are drawn from a single-point distribution $q(\cdot) = \delta(\cdot = \mu)$. Figure 3 (left) shows the root mean squared error (RMSE) for Bayes, Nash, and Ridge as a function of $\mu$ and $\sigma^2$. Figure 3 (right) shows a sectional view along $\mu$ (top) and along $\sigma^2$ (bottom). We observe that all methods coincide if the data generator's costs vanish. The advantage of Bayes over Nash grows with the variance of $q$.

Playing against actual adversaries. In a second experiment, we evaluate all methods over time into the future. Here, the models play against actual spammers. Additionally, in order to artificially create a mismatch to our modeling assumptions, we also evaluate the models on test data from the past. The training sample of instances is drawn from month $k$.
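The one-dimensional regression game described above can be solved by alternating the two players' best responses until their curves intersect. The closed forms below are specialized from the lemmas in Appendix A under illustrative values $X = 1$, $y = 1$, $z = 0$ (the paper's exact scalars did not survive extraction, so these are assumptions):

```python
# One-dimensional game without cost uncertainty, with equal costs
# c = c_l = c_d and illustrative values X = 1, y = 1, z = 0 (assumptions).
# Best responses specialized from the closed forms in Appendix A:
#   learner:        w*[phi]  = c * phi / (1 + c * phi**2)
#   data generator: phi*[w]  = c / (c + w**2)
def find_equilibrium(c, iters=200):
    phi, w = 1.0, 0.0
    for _ in range(iters):              # alternate best responses
        w = c * phi / (1.0 + c * phi ** 2)
        phi = c / (c + w ** 2)
    return w, phi

w, phi = find_equilibrium(c=0.5)
# At a fixed point, the two responses are mutually optimal (an equilibrium):
print(abs(w - 0.5 * phi / (1 + 0.5 * phi ** 2)) < 1e-9)   # True
print(abs(phi - 0.5 / (0.5 + w ** 2)) < 1e-9)             # True
```

The fixed point found this way corresponds to a crossing of the two best-response curves plotted in Figure 6 (left).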
To evaluate into the future, the regularization parameters of all learners (for Bayes, we use a single cost parameter for all instances) are tuned on instances from month $k+1$; for evaluation into the past, tuning data are drawn from month $k-1$. Test data are then drawn from months $k+1$ to $k+6$ and from months $k-1$ to $k-6$, respectively. This process is repeated, and measurements are averaged over ten resampling iterations of the training set and, in an outer loop, over four training months $k$ (March to June 2007). The data generator's cost parameters are set to fixed values of $\mu$ and $\sigma^2$ for Bayes and of $\mu$ for Nash. Figure 4 (left) shows the RMSE over time for this fixed mean $\mu$ and variance $\sigma^2$. Figure 4 (center) depicts the RMSE for a fixed point in the past over a range of different values for $\mu$. Figure 4 (right) shows the training times of the Bayesian equilibrium model and the reference models for a varying number of attributes.

Additionally, we study the actual shift of spam mails over time and compare it with the computed equilibrium point. For depiction, we choose the two most discriminative principal components with respect to spam and non-spam. We train separate Nash models on March 2007 (see Section 6), April 2007, May 2007, and June 2007. Again we use training instances from the respective month, and the data generator's costs are set to a fixed $\mu$. The learner's costs are tuned on the subsequent month. Given the Nash model, we can extract the optimally transformed data for the training month according to Lemma 1. Figure 5 depicts training samples (blue), test samples from the six subsequent months (green/yellow), and the computed equilibrium data (red) for three different training months (April to June 2007); the fourth (for the March model) is shown in the main paper. If the changes in spam mails are gradual, they can be predicted well (see, e.g., the upper right corner). If instead the changes are more volatile (see, e.g., the lower right corner), the Nash model is not able to predict the appearance of new spam mails.
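The temporal evaluation protocol described above (train on one month, tune on the adjacent month, test on the six months after or before) can be sketched generically. All function and argument names below are placeholders for illustration, not the paper's implementation:

```python
from typing import Callable, Dict, List, Sequence, Tuple

def rolling_evaluation(
    data_by_month: Dict[int, Tuple],   # month index -> (X, y)
    train_month: int,
    fit: Callable,                     # fit(X, y, param) -> model
    score: Callable,                   # score(model, X, y) -> error, e.g. RMSE
    params: Sequence[float],           # candidate regularization parameters
    horizon: int = 6,
    into_future: bool = True,
) -> List[float]:
    """Train on month k, tune on month k+1 (or k-1), and test on the
    `horizon` months after (or before) the training month."""
    step = 1 if into_future else -1
    X_tr, y_tr = data_by_month[train_month]
    X_tu, y_tu = data_by_month[train_month + step]
    # Pick the parameter that scores best on the adjacent tuning month.
    best = min(params, key=lambda p: score(fit(X_tr, y_tr, p), X_tu, y_tu))
    model = fit(X_tr, y_tr, best)
    return [score(model, *data_by_month[train_month + step * h])
            for h in range(1, horizon + 1)]
```

Averaging the returned error curves over resampled training sets and over several training months $k$ reproduces the structure of the curves in Figure 4 (left).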
Approximation of the adversary's responses. Finally, we study the impact of the degree $t$ of the Taylor approximation on the accuracy and execution time of Bayes. Figure 6 (center) shows the RMSE evaluated over time for three different values of the Taylor degree $t$. Figure 6 (right, top) shows the execution time depending on the number of training mails (for a fixed number of attributes $m$). Figure 6 (right, bottom) shows the execution time depending on the number of attributes (for a fixed number of training mails $n$).
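The accuracy–runtime trade-off in $t$ traces back to the truncated expansion of the theorem in Appendix A. A scalar sketch (with arbitrary illustrative values for the cost $c$, the expansion point $a$, and $s$ standing in for $w^{\mathsf T} w$) shows the error of the degree-$t$ expansion of $1/(c+s)$ shrinking geometrically:

```python
# Degree-t Taylor expansion of f(c) = 1/(c + s) around a; the remainder
# decays like ((c - a) / (a + s))^t, mirroring the geometric bound in the
# theorem above. The values of c, a, s are arbitrary illustrations.
def taylor_inverse(c, a, s, t):
    # 1/(c+s) = sum_{r=0}^{t} (-1)^r (c-a)^r / (a+s)^(r+1) + remainder
    return sum((-1) ** r * (c - a) ** r / (a + s) ** (r + 1)
               for r in range(t + 1))

c, a, s = 0.4, 1.0, 1.0
exact = 1.0 / (c + s)
errors = [abs(taylor_inverse(c, a, s, t) - exact) for t in (0, 2, 5, 10)]
print(all(e2 < e1 for e1, e2 in zip(errors, errors[1:])))   # True
```

With the ratio $|c - a| / (a + s) = 0.3$ here, each additional degree multiplies the error by roughly 0.3, which is why small values of $t$ already suffice in the experiments while keeping the runtime low.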
Figure 4. Evaluation of regression models with fixed expected costs into the past and future (left) and with varying expected costs into the past (center). Execution time (right). Error bars show standard errors.

Figure 5. Shift in spam mails over time for three different training months. Training data are shown in blue; the imaginary data are shown in red.

Figure 6. Optimal responses for one-dimensional regression games (left). Evaluation of the Bayesian model with different Taylor degrees t over time (center). Execution time (right).
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 12, DECEMBER
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 12, DECEMBER 2010 6433 Bayesian Minimax Estimation of the Normal Model With Incomplete Prior Covariance Matrix Specification Duc-Son Pham, Member,
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationMajorization Minimization - the Technique of Surrogate
Majorization Minimization - the Technique of Surrogate Andersen Ang Mathématique et de Recherche opérationnelle Faculté polytechnique de Mons UMONS Mons, Belgium email: manshun.ang@umons.ac.be homepage:
More informationLINEAR AND NONLINEAR PROGRAMMING
LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico
More informationDecision trees COMS 4771
Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationA priori bounds on the condition numbers in interior-point methods
A priori bounds on the condition numbers in interior-point methods Florian Jarre, Mathematisches Institut, Heinrich-Heine Universität Düsseldorf, Germany. Abstract Interior-point methods are known to be
More informationNon Convex-Concave Saddle Point Optimization
Non Convex-Concave Saddle Point Optimization Master Thesis Leonard Adolphs April 09, 2018 Advisors: Prof. Dr. Thomas Hofmann, M.Sc. Hadi Daneshmand Department of Computer Science, ETH Zürich Abstract
More informationLINEAR ALGEBRA - CHAPTER 1: VECTORS
LINEAR ALGEBRA - CHAPTER 1: VECTORS A game to introduce Linear Algebra In measurement, there are many quantities whose description entirely rely on magnitude, i.e., length, area, volume, mass and temperature.
More informationFunctional Analysis F3/F4/NVP (2005) Homework assignment 3
Functional Analysis F3/F4/NVP (005 Homework assignment 3 All students should solve the following problems: 1. Section 4.8: Problem 8.. Section 4.9: Problem 4. 3. Let T : l l be the operator defined by
More informationA Strict Stability Limit for Adaptive Gradient Type Algorithms
c 009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional A Strict Stability Limit for Adaptive Gradient Type Algorithms
More informationInverse Power Method for Non-linear Eigenproblems
Inverse Power Method for Non-linear Eigenproblems Matthias Hein and Thomas Bühler Anubhav Dwivedi Department of Aerospace Engineering & Mechanics 7th March, 2017 1 / 30 OUTLINE Motivation Non-Linear Eigenproblems
More information1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016
AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the
More informationBasic Properties of Metric and Normed Spaces
Basic Properties of Metric and Normed Spaces Computational and Metric Geometry Instructor: Yury Makarychev The second part of this course is about metric geometry. We will study metric spaces, low distortion
More informationGTR # VLTs GTR/VLT/Day %Δ:
MARYLAND CASINOS: MONTHLY REVENUES TOTAL REVENUE, GROSS TERMINAL REVENUE, WIN/UNIT/DAY, TABLE DATA, AND MARKET SHARE CENTER FOR GAMING RESEARCH, DECEMBER 2017 Executive Summary Since its 2010 casino debut,
More informationDot Products. K. Behrend. April 3, Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem.
Dot Products K. Behrend April 3, 008 Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem. Contents The dot product 3. Length of a vector........................
More informationAssignment 1: From the Definition of Convexity to Helley Theorem
Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x
More informationConvex optimization COMS 4771
Convex optimization COMS 4771 1. Recap: learning via optimization Soft-margin SVMs Soft-margin SVM optimization problem defined by training data: w R d λ 2 w 2 2 + 1 n n [ ] 1 y ix T i w. + 1 / 15 Soft-margin
More informationIndex. book 2009/5/27 page 121. (Page numbers set in bold type indicate the definition of an entry.)
page 121 Index (Page numbers set in bold type indicate the definition of an entry.) A absolute error...26 componentwise...31 in subtraction...27 normwise...31 angle in least squares problem...98,99 approximation
More informationLecture 8 : Eigenvalues and Eigenvectors
CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with
More informationMaximising the number of induced cycles in a graph
Maximising the number of induced cycles in a graph Natasha Morrison Alex Scott April 12, 2017 Abstract We determine the maximum number of induced cycles that can be contained in a graph on n n 0 vertices,
More informationMulti-Agent Learning with Policy Prediction
Multi-Agent Learning with Policy Prediction Chongjie Zhang Computer Science Department University of Massachusetts Amherst, MA 3 USA chongjie@cs.umass.edu Victor Lesser Computer Science Department University
More information