COPS: Cluster Optimized Proximity Scaling
Slide 1: COPS - Cluster Optimized Proximity Scaling
Psychoco 2015
Slide 2: Outline
1. Objectives of Multidimensional Scaling
2. COPS: Cluster Optimized Proximity Scaling
   - C-clusteredness and an index
   - The COPS procedure
   - Optimization
   - Package
3. Conclusion and Outlook

This is joint work with Patrick Mair and Kurt Hornik.
Slide 3: Multidimensional Scaling (MDS) - I

A popular method for representing multivariate, high-dimensional proximities in some lower-dimensional space. MDS utilizes a loss function, e.g., a least squares one,

  σ_MDS(X) = Σ_{i<j} w_{ij} [f(δ_{ij}) - g(d_{ij}(X))]^2,

and minimizes it to find the configuration

  arg min_X σ_MDS(X)

where
  d_{ij}(X) ... fitted distances
  δ_{ij} ... proximities
  w_{ij} ... finite weights
  g(·), f(·) ... transformation functions, usually the identity function I(·)
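To make the loss concrete, here is a minimal R sketch of this least-squares stress with f and g the identity and unit weights; the helper name mds_stress is ours, not from any package.

# Raw least-squares MDS stress for an n x M configuration X and a
# dissimilarity object delta, with f = g = identity and w_ij = 1.
mds_stress <- function(X, delta) {
  D <- as.matrix(dist(X))     # fitted Euclidean distances d_ij(X)
  Delta <- as.matrix(delta)   # observed proximities delta_ij
  ut <- upper.tri(Delta)      # the sum runs over i < j only
  sum((Delta[ut] - D[ut])^2)
}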
Slide 4: Multidimensional Scaling (MDS) - II

MDS provides an optimal map into continuous space R^M and looks for directions of spread in the low-dimensional space (objective 1). But often one is also interested in discrete structures of similarity between objects ("clusters"; objective 2).

MDS solves objective 1 but not objective 2; the latter is often inferred from the former by how the configuration looks. It can happen that what is optimal for objective 1 is not very useful for objective 2.
Slide 5: Illustration - "I'm a Republican, because ..." (from Mair et al., 2014)

Supporters of the Republican Party were asked why they are Republican (254 statements). This natural-language data was scraped and processed into a sparse data matrix (a document-term matrix). The objects are the words (we use only words that appeared at least 10 times). We look for themes in the statements: mantras (words that often occur together). We use a cosine distance on the word co-occurrences and apply standard least squares MDS (SMACOF) for the representation.
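A minimal sketch of this pipeline in R, assuming the smacof package; the document-term matrix dtm below is a toy stand-in for the scraped data, not the actual corpus.

library(smacof)

# Cosine dissimilarity between words, computed from their co-occurrence
# profiles (the columns of a document-term matrix).
cosine_dist <- function(M) {
  S <- crossprod(M)                  # word-by-word inner products
  norms <- sqrt(diag(S))
  as.dist(1 - S / tcrossprod(norms)) # 1 - cosine similarity
}

# Toy stand-in: 254 "statements" x 10 "words" of random counts.
set.seed(1)
dtm <- matrix(rpois(254 * 10, lambda = 1), nrow = 254,
              dimnames = list(NULL, paste0("word", 1:10)))

dt.dist <- cosine_dist(dtm)
fit <- smacofSym(dt.dist, ndim = 2)  # standard least squares MDS (SMACOF)
plot(fit)                            # configuration plot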
Slide 6: Illustration

[Figure: "Republican Mantras?" - 2-D SMACOF configuration of the 37 words (axes Configuration D1 vs. Configuration D2), with labels such as government, freedom, liberty, taxes, family, values, god, country.]
Slide 7: Illustration (continued)

The optimal configuration does not have an all too obvious clustering structure. One way out: fit metric MDS with a power transformation, e.g., by setting f(δ_{ij}) = δ_{ij}^20. The clustering is clearer, but the fit is now worse (0.946 versus 0.947).

[Figure: "Republican Mantras?!" - the power-transformed configuration (axes Configuration D1 vs. Configuration D2), with the words now falling into visibly tighter groups.]
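With g the identity, this transformation can be mimicked by powering the dissimilarities before the fit; a sketch continuing the hypothetical dt.dist from above.

# Metric MDS on power-transformed proximities: f(delta_ij) = delta_ij^20.
fit_pow <- smacofSym(dt.dist^20, ndim = 2)
plot(fit_pow)  # clusters should now separate more clearly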
Slide 8: COPS to the Rescue

We propose a general solution to this problem that consists of the following steps:
- Use an MDS loss with θ-parametrized, strictly monotonic nonlinear transformations of either the proximities or the fitted distances or both, e.g., power transformations (powerstress: g(d_{ij}(X)) = d_{ij}(X)^κ and f(δ_{ij}) = δ_{ij}^λ, so θ = (κ, λ); a sketch follows below).
- Use an index of the obtained degree of clusteredness in the configuration (c-clusteredness) to quantify how clustered the result is.
- Combine the stress function, the transformations, and the clusteredness index into a single target function and optimize over the parameters.

We call this COPS (Cluster Optimized Proximity Scaling; Rusch et al., 2015a).
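For concreteness, a minimal R sketch of the powerstress loss with unit weights; power_stress is our own helper, not the package's internal implementation.

# Powerstress: least-squares loss with power transformations
# g(d) = d^kappa on the fitted distances and f(delta) = delta^lambda
# on the proximities; theta = c(kappa, lambda), w_ij = 1.
power_stress <- function(X, delta, kappa = 1, lambda = 1) {
  D <- as.matrix(dist(X))
  Delta <- as.matrix(delta)
  ut <- upper.tri(Delta)
  sum((Delta[ut]^lambda - D[ut]^kappa)^2)
}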
Slide 9: C-Clusteredness

C-clusteredness: the amount of clusteredness of a configuration.

[Figure: six example configurations ranging from unclustered to maximally clustered, with c-clusteredness values 0, 0.03, 0.23, 0.36, 0.61, and 1.]
Slide 10: OPTICS Cordillera - I

Our index for clusteredness is the OPTICS cordillera. It employs OPTICS (Ankerst et al., 1999) with metaparameters k, ε on the configuration distances. For the row vectors x_j of X it returns an ordering R of these points, R = {x_(i)}_{i=1,...,N}; so x_(1) is the x_j that is at position 1 in the ordering. OPTICS also returns a reachability plot (a dendrogram-like display of the minimum reachabilities r_(i) of the points x_(i)).

Ordering and reachabilities represent the clustering structure. We aggregate them into an index OC(X) by defining (for metaparameter q > 0)

  OC(X) = ( Σ_{i=2}^N |r_(i) - r_(i-1)|^q )^{1/q} / C

with C an (optional) normalizing constant.
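Given the reachabilities in OPTICS order, the aggregation is a one-liner; a sketch with the normalizer left at 1. The commented usage assumes the dbscan package for the OPTICS run, and capping the undefined reachabilities is our simplification.

# Aggregate a reachability vector r (in OPTICS order) into the
# OPTICS cordillera OC(X), for metaparameter q > 0 and normalizer C.
optics_cordillera <- function(r, q = 1, C = 1) {
  (sum(abs(diff(r))^q))^(1 / q) / C
}

# Example usage on an MDS configuration:
# res <- dbscan::optics(as.matrix(fit$conf), minPts = 6)
# r <- res$reachdist[res$order]              # reachabilities in OPTICS order
# r[!is.finite(r)] <- max(r[is.finite(r)])   # cap undefined reachabilities
# optics_cordillera(r, q = 2)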
Slide 11: OPTICS Cordillera - II

[Figure: four example configurations with their cordillera values, c-clusteredness 0, 0.03, 0.36, and 1.]
Slide 12: Properties of the OPTICS Cordillera

For given metaparameters ε, k, q the following applies (Rusch et al., 2015a):
- Upper bound for OC(X) in the maximal c-clusteredness case:
    C(X, d_max, ε, k, q) = d_max^q ( ⌊(N-1)/k⌋ + ⌈(N-1)/k⌉ )
- No cluster assignment and no a priori defined number or shape of clusters is needed.
- OC(X) typically increases when
  - distances between clusters increase (Emphasis Property),
  - points are more densely clustered (Density Property),
  - the number of clusters increases (Tally Property).
- It does not pick up unbalancedness in the number of points per cluster as a sign of c-clusteredness (Balance Property).
Slide 13: The Full COPS Procedure

Combine the θ-parametrized MDS loss, σ_MDS(X(θ), θ), and the OPTICS cordillera OC(X) into the cluster optimized loss (coploss):

  coploss(θ) = v_1 σ_MDS(X(θ), θ) - v_2 OC(X(θ))    (1)

with X(θ) := arg min_X σ_MDS(X, θ) and v_1, v_2 ∈ R controlling how much weight is given to the individual parts of coploss, e.g.,

  v_1 = 1,  v_2 = σ_MDS(X(θ_0), θ_0) / OC(X(θ_0)),

with θ_0 some reference solution, e.g., θ_0 = (1, 1) for powerstress.
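Putting the pieces together, a sketch of coploss built from the hypothetical helpers above (power_stress, optics_cordillera); the inner minimization over X is only approximated here by running SMACOF on the lambda-transformed proximities.

# coploss(theta) = v1 * sigma_MDS(X(theta), theta) - v2 * OC(X(theta)),
# where X(theta) minimizes the MDS loss for fixed theta.
coploss <- function(theta, delta, v1 = 1, v2 = 1, minPts = 6, q = 2) {
  kappa <- theta[1]; lambda <- theta[2]
  fit <- smacof::smacofSym(delta^lambda, ndim = 2)  # inner step (approximate)
  X <- fit$conf
  res <- dbscan::optics(X, minPts = minPts)
  r <- res$reachdist[res$order]
  r[!is.finite(r)] <- max(r[is.finite(r)])
  v1 * power_stress(X, delta, kappa, lambda) - v2 * optics_cordillera(r, q)
}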
Slide 14: Optimization - I

We need to solve (θ is t-dimensional)

  coploss(θ) → min_θ!

We use a nested algorithm that first solves for X(θ) and then minimizes (1) over θ. For the inner part, i.e., finding X(θ), standard MDS optimization is used (e.g., majorization). The outer part of this optimization problem is complicated, so we employ metaheuristics.

The inner minimization is costly, so a useful metaheuristic makes few evaluations of the outer function (which is okay if t is small). Simulated annealing or population-based algorithms are not that well suited. We have had good experiences with a customized Luus-Jaakola algorithm (it usually converges to a good solution in fewer than 200 iterations for a small minimal search-space width, accd).
Slide 15: Optimization - II

Adaptive Luus-Jaakola algorithm (ALJ): an adaptation of Luus-Jaakola search (Luus & Jaakola, 1973). An implementation sketch follows below.
- Sample θ^(0) from within the t-orthotope [l, u]^t, where l, u are the lower and upper boundaries.
- Set d to the length of the search space.
- Repeat until termination (accd, maxiter, acc):
  - Pick a^(i) ~ U^t(-d, d).
  - Set θ^(i+1) ← θ^(i) + a^(i).
  - If coploss(θ^(i+1)) < coploss(θ^(i)), set θ^(opt) = θ^(i+1); else set d = d · s.
- Here (this is the customized part):
  s = o^((m+1-i)/m), with m = min( (log(accd) - log(max(u - l))) / log(o), maxiter ) and 0 < o < 1.
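A self-contained R sketch of this adaptive search as we read it off the slide (the shrink schedule s and the cap m follow the formulas above; the function name alj and the box-clipping of proposals are our additions). The coploss function from the earlier sketch could be plugged in as f.

# Adaptive Luus-Jaakola search: minimize f over the box [l, u]^t.
# The search width d shrinks by s = o^((m + 1 - i) / m) whenever a
# proposal fails to improve the objective.
alj <- function(f, l, u, o = 0.9, accd = 1e-4, maxiter = 200) {
  m <- min((log(accd) - log(max(u - l))) / log(o), maxiter)
  t_dim <- length(l)
  theta <- runif(t_dim, l, u)      # initial candidate in the orthotope
  best <- f(theta)
  d <- max(u - l)                  # current search-space width
  it <- 0
  for (i in seq_len(maxiter)) {
    it <- i
    a <- runif(t_dim, -d, d)       # uniform step in [-d, d]^t
    prop <- pmin(pmax(theta + a, l), u)  # clip proposal to the box
    val <- f(prop)
    if (val < best) {              # accept improvements
      theta <- prop; best <- val
    } else {                       # otherwise shrink the search width
      d <- d * o^((m + 1 - i) / m)
    }
    if (d < accd) break            # width below resolution: stop
  }
  list(par = theta, value = best, iterations = it)
}

# Example: minimize a simple quadratic over [0, 3] x [0, 20].
alj(function(th) sum((th - c(1.5, 10))^2), l = c(0, 0), u = c(3, 20))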
Slide 16: R Package stops

All of this is implemented in the R package stops.
- High-level function: cops(proximitymatrix, loss, ...)
- Prespecified MDS models: strain, symmetric SMACOF (smacofsym), Sammon mapping, elastic scaling, SMACOF on a sphere (smacofsphere), sstress, rstress, powerstress, and Sammon mapping and elastic scaling with powers (powersammon, powerelastic)
- Optimization with ALJ, simulated annealing (SANN), or a particle swarm algorithm (pso)
- Features the cordillera and an interface to OPTICS in ELKI (optics)
- S3 methods: plot, summary, print, coef, residuals, plot3d, plot3dstatic
Slide 17: Example: Republicans

We now use COPS with powerstress on the "I'm a Republican, because ..." data set:

R> resc <- cops(dt.dist, loss="powerstress",
+              lower=c(1,1), minpts=6, upper=c(3,20))
R> resc

Call: cops(dis = dt.dist, loss = "powerstress", theta = c(1, 1),
    minpts = 6, lower = c(1, 1), upper = c(3, 20))

Model: COPS with powerstress loss function and parameters kappa= lambda=
Number of objects: 37
MDS loss value:
OPTICS cordillera: Raw= Normed=
Cluster optimized loss (coploss):
MDS loss weight: 1, OPTICS cordillera weight:
Number of iterations of ALJ optimization: 117
Slide 18: Example: Republicans

R> plot(resc)

[Figure: "Republican Mantras!" - COPS configuration (axes Configuration D1 vs. Configuration D2) with labeled clusters: Paleocon+Populist Right, Neocon+Liberalism, Traditionalist+Compassionate, Fiscalcon+Libertarian, and Unclustered (cut at eps=0.6).]
Slide 19: Summary

COPS:
- works well when the objective is to obtain both a scaling and a clustering,
- is easily adaptable to many other loss functions,
- is particularly useful when there is little variability in the proximities.

C-clusteredness and the OPTICS cordillera:
- a concept and a measure of goodness-of-clustering for dimension reduction results that has appealing properties,
- interesting beyond COPS.
Slide 20: Outlook

Beyond COPS:
- C-clusteredness is an aspect of a more general idea that we have coined c-structuredness (Rusch et al., 2015b).
- The idea of COPS can be generalized to augmented nonlinear dimension reduction and STOPS (Structure Optimized Proximity Scaling; Rusch et al., 2015b). Nearly there, only a few kinks to even out.

Future research:
- Issues with finding the global optimum.
- Speeding up the optimization (the inner minimization).
- Inference is still unsolved (but we're working on that too).
Slide 21: References

Ankerst, M., Breunig, M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28.
Luus, R., & Jaakola, T. (1973). Optimization by direct search and systematic reduction of the size of search region. AIChE Journal, 19.
Mair, P., Rusch, T., & Hornik, K. (2014). The grand old party - a party of values? SpringerPlus, 3:697.
Rusch, T., Mair, P., & Hornik, K. (2015a). COPS: Cluster optimized proximity scaling. Report 2015/1, Discussion Paper Series, Center for Empirical Research Methods, WU Vienna University of Economics and Business.
Rusch, T., Mair, P., & Hornik, K. (2015b). Structuredness indices and augmented nonlinear dimension reduction. Report 2015/X, Discussion Paper Series, Center for Empirical Research Methods, WU Vienna University of Economics and Business. Forthcoming.
Slide 22: Thank You for Your Attention

Thomas Rusch
Competence Center for Empirical Research Methods
WU Vienna University of Economics and Business
Welthandelsplatz 1, 1020 Vienna, Austria