Sensor Deployment Recommendation for 3D Fine-Grained Air Quality Monitoring using Semi-Supervised Learning


Yuzhe Yang, Zijie Zheng, Kaigui Bian, Lingyang Song, and Zhu Han
School of Electrical Engineering and Computer Science, Peking University, Beijing, China
Electrical and Computer Engineering Department, University of Houston, Houston, TX, USA

Abstract—Driven by the increasingly serious air pollution problem, the monitoring of the fine-grained air quality index (AQI) in urban areas has drawn considerable attention. In this paper, we design a novel algorithm to recommend the placement of sensors for energy-efficient AQI monitoring in urban three-dimensional (3D) space. Specifically, we first propose an entropy-based semi-supervised learning (ESSL) model to estimate the AQI distribution at unobserved locations, using sparse historical spatial-temporal data and other features, including 3D coordinates, wind speed, and weather conditions. Based on ESSL, we then design an entropy minimization ranking (EMR) algorithm to recommend the best sensor locations for AQI monitoring. Through emulation on a fine-grained AQI dataset, the results demonstrate that our scheme provides energy-efficient solutions, using the fewest sensors while achieving higher accuracy than existing approaches.

I. INTRODUCTION

Air pollution has been proven to have significantly negative effects on human health and sustainable development, and has attracted considerable attention around the world [1]. Government agencies have defined the air quality index (AQI) to evaluate the pollution degree. AQI is calculated based on the concentration of a number of air pollutants, e.g., particulate matter (PM) such as PM2.5 and PM10 particles. A higher AQI indicates that air pollution is more severe and people are more likely to experience harmful health effects [2]. Thus, AQI monitoring is a critical issue: the more accurate the AQI distribution we can obtain in a region, the more effective the methods we can find to deal with the air pollution.

Air quality monitoring is usually done by setting up a few monitoring stations on dedicated sites in a city [2]. However, these fixed stations can only provide coarse-grained monitoring, i.e., two measurements are separated by a few hundred meters in the three-dimensional (3D) space. Existing studies have shown that AQI exhibits intrinsic changes over a scale of meters, and it is preferable to perform AQI monitoring in the 3D space surrounding an office building or throughout a university campus, rather than city-wide [3]. A fine-grained AQI distribution over meter-sliced areas would be desirable for people, particularly those living in urban areas [4].

For monitoring fine-grained AQI in a 3D area, placing a number of low-cost sensors with laser-based AQI detectors is a desirable method. However, since we want to utilize a limited number of sensors for accurate AQI monitoring and prediction, recommending suitable sensor locations in a given 3D space is in demand. In [5]–[7], the authors have investigated in depth how to select monitoring stations over the city-wide range. However, they focus only on 2D coarse-grained scenarios; the 3D fine-grained scenario has not yet been fully addressed. Moreover, considering the communication overhead and data transmission of the sensors, battery life also acts as a key factor for monitoring [8]. In order to reduce the total energy consumption of such sensor networks, the number of sensors should be as small as possible [9].
Thus, the goal is to use as few sensors as possible while maintaining high accuracy in AQI estimation, which we call an energy-efficient scheme.

In this paper, we design a novel scheme for energy-efficient sensor deployment recommendation in urban 3D space, e.g., around an office building. We first propose an entropy-based semi-supervised learning (ESSL) model to estimate the AQI distribution at unobserved locations, based on sparse historical spatial-temporal data. The proposed ESSL utilizes key features of the fine-grained AQI distribution and is robust to very sparse historical air quality data. Further, we treat the entropy of the AQI distribution at unobserved locations as the uncertainty of our model; the objective is to select locations that minimize the model's entropy. We then propose an entropy minimization ranking (EMR) algorithm that recommends such a set of locations for sensor deployment, to obtain the best estimation accuracy. The main contributions are summarized as follows.

- The proposed ESSL model provides higher AQI estimation accuracy on a fine-grained AQI monitoring dataset [10], and performs better than other learning methods;
- The EMR algorithm recommends the most suitable sensor locations, which approximate the optimal deployment;
- The proposed scheme realizes energy-efficient deployment by using far fewer sensors while maintaining higher estimation accuracy than existing approaches.

The rest of this paper is organized as follows. The preliminaries about the dataset and feature selection are introduced in Section II. In Section III, we present our system model and formulate the problem.

Section IV introduces the ESSL model for AQI estimation. In Section V, we propose the algorithms for sensor deployment recommendation. Experimental results are provided in Section VI, and conclusions are drawn in Section VII.

II. DATASET DESCRIPTION

In this section, we introduce the dataset that we use to design and test our scheme throughout this paper. The dataset includes more than 100 days of data collected in a typical 3D scenario, i.e., the courtyard of an office building inside Peking University [10]. In the dataset, each .txt file contains one complete measurement over a day. In each .txt file, every sample has four parameters: the 3D coordinates (x, y, z) and an AQI value. Each value represents the measured AQI, while its coordinates in the matrix reflect the measuring position: every row represents a fixed position in the x-y plane, while every column represents the height at an interval of 5 m in the z direction. The courtyard is of size 40 m × 40 m × 50 m, which can be divided into 640 cubes of 5 m × 5 m × 5 m.

Based on the dataset [10] and previous experimental results of AQI monitoring in fine-grained scenarios [8], wind, location, and weather conditions are highly related to the fine-grained AQI distribution. Thus, we consider these highly correlated parameters as key features as well.

III. SYSTEM MODEL

We consider AQI monitoring in a common scenario, i.e., the 3D space surrounding a campus building within a hundred-meter scale. The main objective is to recommend s suitable locations to deploy sensors for AQI monitoring. Based on the graph model, the AQI values at unmeasured locations can then be estimated from the data at the recommended locations.

A. Multi-Layer 3D Spatial-Temporal Correlation Graph

As shown in Fig. 1, the target 3D space is divided into a set of cubes, each represented by a node in the graph. Motivated by the correlation of AQI values in both the spatial and temporal perspectives, these nodes are connected in both the spatial and temporal dimensions to form a multi-layer 3D graph G = (V, E). The spatial dimension is represented by the 3D coordinates in the target area, while the temporal dimension is described by a series of time points {T_1, T_2, ..., T_d} with equal intervals (e.g., several hours in the dataset [10]), and each layer represents the 3D spatial graph at one specific time point T_k. The nodes with monitoring data are called labeled nodes, while nodes without monitoring data are called unlabeled nodes. Each labeled node l has the true AQI value, while the AQI of each unlabeled node u is estimated through a probability function p_u. For convenience, the node set can be written as V = {V_L ∪ V_U}, where V_L denotes the set of labeled nodes and V_U the unlabeled ones.

Fig. 1. An example of the proposed multi-layer 3D spatial-temporal correlation graph model (labeled and unlabeled cube nodes, with edges between labeled and unlabeled nodes, between spatial neighbors, and between temporal neighbors).
The edges in E are constructed as follows:

1) Connected with labeled nodes. Each unlabeled node is connected with all labeled nodes in V_L at the same time point T_k. Since labeled nodes are sparse, these connections do not increase the complexity of the whole model, while they increase the accuracy and the speed of convergence.

2) Connected with spatial neighbors. Each unlabeled node is also connected with neighboring nodes within a given spatial radius r, since the AQI value of one node is highly correlated with the AQI of its spatial neighbors.

3) Connected with temporal neighbors. Each unlabeled node is connected to the nodes at the same location but at neighboring time points, e.g., node u_0 at (x_k, y_k, z_k) at time point T_k is connected with u_1 and u_2, both at (x_k, y_k, z_k) but at time points T_{k-1} and T_{k+1}, respectively. This captures the temporal correlation of the AQI value at the same location (see the sketch below).
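To make the construction concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) that enumerates the cube nodes of the multi-layer graph and adds the three kinds of edges. The grid size matches the 40 m × 40 m × 50 m courtyard at 5 m resolution; the number of time layers and the neighbor radius R are assumed values.

```python
# Hypothetical sketch (not the authors' code): build the multi-layer
# spatial-temporal graph over 5 m cubes, assuming an 8 x 8 x 10 grid
# (40 m x 40 m x 50 m) and d time layers.
import itertools

GRID = (8, 8, 10)          # cubes along x, y, z (5 m resolution)
D_LAYERS = 4               # number of time points T_1..T_d (assumed)
R = 1                      # spatial neighbor radius, in cubes (assumed)

def build_graph(labeled_coords):
    """labeled_coords: set of (x, y, z) cube indices that have measurements."""
    nodes = [(t, x, y, z)
             for t in range(D_LAYERS)
             for x, y, z in itertools.product(*map(range, GRID))]
    labeled = {n for n in nodes if n[1:] in labeled_coords}
    edges = set()
    for t, x, y, z in nodes:
        u = (t, x, y, z)
        if u in labeled:
            continue
        # 1) connect each unlabeled node to all labeled nodes in the same layer
        edges.update((u, l) for l in labeled if l[0] == t)
        # 2) connect to spatial neighbors within radius R (same layer)
        for dx, dy, dz in itertools.product(range(-R, R + 1), repeat=3):
            v = (t, x + dx, y + dy, z + dz)
            if v != u and all(0 <= c < g for c, g in zip(v[1:], GRID)):
                edges.add((u, v))
        # 3) connect to the same cube in the previous / next time layer
        for dt in (-1, 1):
            if 0 <= t + dt < D_LAYERS:
                edges.add((u, (t + dt, x, y, z)))
    return nodes, labeled, edges
```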

For every edge {(v_1, v_2) ∈ E}, there is a corresponding weight. The weight of an edge represents how strongly the features, e.g., wind speed, of the two nodes v_1 and v_2 are correlated. The following correlation function is defined to describe this correlation:

Definition 1. (Correlation Function) Given a set of features e = {e^{(1)}, e^{(2)}, ..., e^{(M)}}, the correlation function of each feature between nodes v_1 and v_2 is defined to have the form of a Taylor series expansion, expressed as

Q_{e^{(m)}}(v_1, v_2) = \varepsilon_m + \alpha_m + \beta_m \big(e^{(m)}(v_1) - e^{(m)}(v_2)\big) + \gamma_m \big(e^{(m)}(v_1) - e^{(m)}(v_2)\big)^2 + o\big((e^{(m)}(v_1) - e^{(m)}(v_2))^2\big),  m = 1, 2, ..., M.  (1)

In (1), \varepsilon_m is a random variable assumed to follow \varepsilon_m \sim \mathcal{N}(0, \sigma^2). For the fine-grained scenario, the correlation function is adopted as the second-order approximation of the Taylor expansion of the feature distance \delta(e^{(m)}) = e^{(m)}(v_1) - e^{(m)}(v_2). Here \alpha_m, \beta_m and \gamma_m are parameters that need to be estimated. Let N be the number of samples, let \tau^{(m)} = [\alpha_m\ \beta_m\ \gamma_m] denote the parameter vector, and let e^{(m)}_j = [1\ \delta(e^{(m)}_j)\ (\delta(e^{(m)}_j))^2] denote the feature-distance vector of the j-th sample. The log-likelihood function l can then be computed as

l^{(m)}(\tau^{(m)}) = \ln \prod_{j=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T)^2}{2\sigma^2}\Big) = -\frac{1}{2\sigma^2} \sum_{j=1}^{N} \big(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T\big)^2 - N \ln \sqrt{2\pi}\,\sigma.  (2)

The parameters \tau^{(m)} of the correlation function can be estimated through least-square estimation, given by

\arg\max_{\tau^{(m)}} l^{(m)} = \arg\min_{\tau^{(m)}} \sum_{j=1}^{N} \big(Q_{e^{(m)},j} - e^{(m)}_j (\tau^{(m)})^T\big)^2 = \arg\min_{\tau^{(m)}} \big\| Q_{e^{(m)}} - e^{(m)} (\tau^{(m)})^T \big\|^2.  (3)

Based on the correlation function in (1), we define the weight matrix W = {w_{ij}}, where the weight on edge {(v_1, v_2) ∈ E} is expressed as

w_{v_1, v_2} = \exp\Big(-\sum_{m=1}^{M} \theta_m\, Q_{e^{(m)}}(v_1, v_2)\Big),  (4)

where \theta_m is the weight of feature e^{(m)}, and needs to be further learned to determine the AQI distribution of the unlabeled nodes.

B. Problem Formulation for AQI Estimation

The main objective for the model's convergence is to minimize the model's uncertainty when estimating unlabeled nodes. Since the knowledge we hold about the labeled nodes is sparse, it is infeasible to minimize the error probability, because the true values of most nodes are unknown. Instead, we first use the weighted average AQI of neighboring nodes to express the probability function p_u at unlabeled nodes [11]. The objective then becomes to minimize the entropy of the whole model, i.e., H(p_u) = -\sum_u p_u \log p_u, to achieve an accurate estimation [11]. This idea comes from the fact that an unlabeled node should possess an AQI value similar to those of the adjacent labeled nodes connected to it. Therefore, based on the edge weight function in (4), we define the loss function of the correlation graph to enable propagation between highly correlated nodes with higher edge weights, expressed as

L(p) = \sum_{(v_1, v_2) \in E} w_{v_1, v_2} \| p_{v_1} - p_{v_2} \|,  (5)

where p_{v_1} and p_{v_2} are the AQI distributions at nodes v_1 and v_2.

Definition 2. (AQI Distribution Distance) The degree of similarity of the AQI distributions of two nodes is defined by their symmetrical Kullback-Leibler (KL) divergence [12], which is written as

\| p_{v_1} - p_{v_2} \| = D_{KL}(p_{v_1} \| p_{v_2}) + D_{KL}(p_{v_2} \| p_{v_1}).  (6)

We assume the AQI value of each node takes discrete quantized values on the positive integers. The AQI distribution distance can then be elaborated as

\| p_{v_1} - p_{v_2} \| = \sum_{x \in \mathcal{X}} \Big[ p_{v_1}(x) \log \frac{p_{v_1}(x)}{p_{v_2}(x)} + p_{v_2}(x) \log \frac{p_{v_2}(x)}{p_{v_1}(x)} \Big],  \mathcal{X} = \{0, 1, 2, ..., X\},  (7)

where X denotes the maximum possible AQI value. Thus, our goal is to determine the AQI distribution that minimizes L(p), which is given by the objective function

p^* = \arg\min L(p).  (8)

By minimizing the loss function L(p), nodes with higher edge weights possess more similar AQI values, while nodes with lower edge weights remain more independent. Thus, the objective function enables AQI value propagation between highly correlated nodes, thereby improving the estimation accuracy.
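As an illustration of (1)–(7), the following Python sketch (ours, with assumed helper names) fits the correlation-function parameters by least squares, computes an edge weight from per-feature correlation values, and evaluates the symmetrical KL distance between two discrete AQI distributions.

```python
# Hypothetical illustration of eqs. (1)-(7); function names are ours.
import numpy as np

def fit_correlation(deltas, q_obs):
    """Eqs. (1)-(3): least-squares fit of tau = [alpha, beta, gamma] from
    observed feature distances (deltas) and correlation samples (q_obs)."""
    A = np.column_stack([np.ones_like(deltas), deltas, deltas ** 2])
    tau, *_ = np.linalg.lstsq(A, q_obs, rcond=None)
    return tau

def edge_weight(q_values, theta):
    """Eq. (4): q_values[m] = Q_{e^(m)}(v1, v2), theta[m] = feature weight."""
    return np.exp(-np.dot(theta, q_values))

def symmetric_kl(p1, p2, eps=1e-12):
    """Eqs. (6)-(7): symmetrical KL divergence between two discrete PMFs."""
    p1 = np.clip(p1, eps, None)
    p2 = np.clip(p2, eps, None)
    return np.sum(p1 * np.log(p1 / p2) + p2 * np.log(p2 / p1))

def graph_loss(edges, weights, dists):
    """Eq. (5): weighted sum of distribution distances over all edges;
    dists[v] is the PMF over AQI bins {0, ..., X} at node v."""
    return sum(w * symmetric_kl(dists[v1], dists[v2])
               for (v1, v2), w in zip(edges, weights))
```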
IV. AQI ESTIMATION USING SEMI-SUPERVISED LEARNING

In this section, we investigate the entropy-based semi-supervised learning (ESSL) solution for problem (8). We first derive the AQI estimation on unlabeled nodes, and then introduce the entropy-based learning method.

A. AQI Estimation on Unlabeled Nodes

According to the literature, the minimizing function of (8) is harmonic [11], which means it satisfies \Delta p_u = 0 on the unlabeled nodes U, while p_l = P(v_l) on the labeled nodes L. Here \Delta is the combinatorial Laplacian, defined by \Delta = D - W, where D = diag(d_i) denotes the diagonal matrix with d_i = \sum_j u(w_{i,j}), u(\cdot) is the unit step function, and W = {w_{i,j}} is the weight matrix defined in (4).
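The role of \Delta is easiest to see in matrix form. The following minimal sketch (our own illustration, not the authors' implementation) builds the degree matrix and the combinatorial Laplacian from a weight matrix and splits them into labeled/unlabeled blocks, anticipating the closed-form solution derived next.

```python
# Hypothetical sketch of the combinatorial Laplacian used by ESSL.
import numpy as np

def laplacian_blocks(W, labeled_idx, unlabeled_idx):
    """W: symmetric edge-weight matrix from (4); returns the blocks of
    Delta = D - W, with d_i counting the neighbors of node i (unit step)."""
    D = np.diag((W > 0).sum(axis=1))     # d_i = sum_j u(w_ij)
    Delta = D - W
    ll = Delta[np.ix_(labeled_idx, labeled_idx)]
    lu = Delta[np.ix_(labeled_idx, unlabeled_idx)]
    ul = Delta[np.ix_(unlabeled_idx, labeled_idx)]
    uu = Delta[np.ix_(unlabeled_idx, unlabeled_idx)]
    return ll, lu, ul, uu
```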

The harmonic property of p_u provides the solution for the distribution of an unlabeled node, expressed as the weighted average of its neighboring nodes [11]:

p_u(x) = \frac{1}{d_u} \sum_{(u,l) \in E} w_{u,l}\, p_l(x),  x \in \mathcal{X}.  (9)

The solution again reflects the influence of the highly correlated nodes connected by higher-weight edges. To normalize the solution, we redefine it as

p_u(x) = \frac{\frac{1}{d_u} \sum_{(u,l) \in E} w_{u,l}\, p_l(x)}{\sum_{x' \in \mathcal{X}} p_u(x')} = \frac{\sum_{(u,l) \in E} w_{u,l}\, p_l(x)}{\sum_{x' \in \mathcal{X}} \sum_{(u,l) \in E} w_{u,l}\, p_l(x')}.  (10)

Proposition 1. p_u in (10) is a probability mass function (PMF) on x.

Proof: To be a PMF on x, p_u must satisfy the following three properties [12]: the domain of p_u is the set of all possible states of x; \forall x \in \mathcal{X}, 0 \le p_u(x) \le 1; and \sum_{x \in \mathcal{X}} p_u(x) = 1. Considering the expression in (10), the conclusion is immediate: p_u is a PMF on x.

The solution of the harmonic function can be written explicitly in matrix form. We split W, \Delta and P into labeled and unlabeled parts as

W = \begin{bmatrix} W_{LL} & W_{LU} \\ W_{UL} & W_{UU} \end{bmatrix}, \quad \Delta = \begin{bmatrix} \Delta_{LL} & \Delta_{LU} \\ \Delta_{UL} & \Delta_{UU} \end{bmatrix}, \quad P = \begin{bmatrix} P_L \\ P_U \end{bmatrix},

where P_L and P_U are the AQI distributions at the labeled and unlabeled nodes, respectively. Applying \Delta p_u = 0 on the unlabeled part, we have

P_U = -\Delta_{UU}^{-1} \Delta_{UL} P_L.  (11)

The result of (11) determines the AQI probability distribution of every unlabeled node. To provide an exact estimated label, since Proposition 1 shows that p_u is a PMF on x, we quantize it using the expectation of p_u:

\hat{P}_u = \mathbb{E}_{x \sim p_u}[x] = \sum_{x=1}^{X} x\, p_u(x),  x \in \mathcal{X}.  (12)

B. Entropy-based Learning

Now that the expression of p_u is determined, the next step is to learn the weight functions given by (4). We learn \theta_m from both labeled and unlabeled data, thereby forming a semi-supervised mechanism. As \theta_m influences the correlation between nodes, and thus the distribution at unlabeled locations, learning a suitable set of {\theta_m} is of vital importance.

The common criterion for learning \theta_m is to maximize the likelihood of the labeled data. However, this method is infeasible in our case since the labeled nodes are sparse, and thus it would not improve the estimation accuracy on the unlabeled ones. We instead adopt the model's entropy as the criterion, which represents the confidence of the estimation. This is intuitive since high entropy corresponds to unpredictable values, resulting in poor estimation capability and low accuracy. Thus, the objective of ESSL is to minimize the overall entropy of the unlabeled nodes. The average entropy H(p_u) of the unlabeled node set U is defined as

H(p_u) = \frac{1}{|U|} \sum_{j=1}^{|U|} H_j(p_j) = -\frac{1}{|U|} \sum_{j=1}^{|U|} \sum_{x=1}^{X} p_j(x) \log p_j(x),  x \in \mathcal{X},  (13)

where |U| denotes the number of unlabeled nodes. For simplicity, denoting \sum_x p_j(x) \log p_j(x) as p_j \log p_j, the gradient can be derived as

\frac{\partial H}{\partial \theta_m} = -\frac{1}{|U|} \sum_{j=1}^{|U|} \Big( \log p_j + \frac{1}{\ln 2} \Big) \frac{\partial p_j}{\partial \theta_m}.  (14)

For every unlabeled p_j, we evaluate \partial p_j / \partial \theta_m based on (10) and (4). Applying the chain rule of differentiation, the final gradient follows from

\frac{\partial w_{u,l}}{\partial \theta_m} = -w_{u,l}\, Q_{e^{(m)}}(u, l).  (15)

Thus, by iteratively learning and updating \theta_m using (15), the edge weights w_{u,l} are refined and generate the final AQI distribution P_U when the iteration converges.
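To summarize the estimation step in code, the following is a minimal sketch (ours, not the authors' implementation) of the closed-form propagation (11) and the expectation quantization (12); the Laplacian blocks are assumed to have been formed as sketched earlier.

```python
# Hypothetical sketch of the ESSL propagation step, eqs. (11)-(12).
import numpy as np

def propagate_labels(Delta_uu, Delta_ul, P_L):
    """Eq. (11): P_U = -Delta_uu^{-1} Delta_ul P_L.
    P_L has shape (n_labeled, X+1): one PMF over AQI bins per labeled node."""
    P_U = -np.linalg.solve(Delta_uu, Delta_ul @ P_L)
    # Clip tiny negatives from numerical error and renormalize each row
    # into a valid PMF, mirroring eq. (10).
    P_U = np.clip(P_U, 0.0, None)
    P_U /= P_U.sum(axis=1, keepdims=True)
    return P_U

def quantize(P_U):
    """Eq. (12): estimated AQI = expectation of the PMF at each node."""
    x = np.arange(P_U.shape[1])
    return P_U @ x
```

In practice this step would sit inside the \theta-learning loop of Section IV-B, with W, \Delta and the gradient (14)–(15) recomputed until the entropy converges.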
V. SENSOR DEPLOYMENT RECOMMENDATION ALGORITHM

In this section, based on the proposed ESSL, we design an entropy minimization ranking (EMR) algorithm for sensor deployment recommendation. More specifically, we first introduce a reverse entropy minimization (REM) algorithm that suggests how to choose a single location. Based on REM, we then introduce EMR for multiple sensors.

A. Reverse Entropy Minimization

We first introduce the REM algorithm based on [5]. Given a set of unlabeled nodes {U_R}, one may intuitively choose the target location with the highest entropy. However, once the chosen node is labeled, the correlation model changes correspondingly, which can leave other unlabeled nodes with even higher entropy. Hence, REM instead chooses the lowest-entropy node in U_R at each step and ranks it reversely, from rank |U_R| down to rank 1, thus generating a reverse ranking of U_R iteratively. Finally, we choose the top-ranked node as the target location, since its correlation with the other nodes is the lowest; it is therefore the most uncertain node and the most difficult one to estimate. Let |U_R| denote the size of U_R; since we traverse the set U_R once, the complexity of REM is O(|U_R|). The process of REM is described in Algorithm 1.
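For concreteness, here is a minimal Python sketch of the reverse ranking (our own illustration, preceding the formal pseudocode in Algorithm 1); entropy_of(u, labeled) is an assumed helper that evaluates the ESSL entropy of node u given the current labeled set.

```python
# Hypothetical REM sketch; entropy_of() is an assumed helper that returns
# the ESSL entropy H(p_u) of node u given the currently labeled set.
def rem(labeled, unlabeled, entropy_of):
    """Return the unlabeled node ranked 1st by reverse entropy minimization."""
    labeled = set(labeled)
    remaining = set(unlabeled)
    rank = {}
    for r in range(len(remaining), 0, -1):       # ranks |U_R| down to 1
        u = min(remaining, key=lambda n: entropy_of(n, labeled))
        rank[u] = r                              # lowest entropy -> worst rank
        labeled.add(u)                           # treat it as labeled from now on
        remaining.remove(u)
    return min(rank, key=rank.get)               # the node that ranks 1st
```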

Algorithm 1: Reverse Entropy Minimization (REM)
Input: labeled set {L_R}, unlabeled set {U_R}
Output: target location l
for i = 0 to (|U_R| - 1) do
    (a) Choose the node u with the lowest entropy in {U_R}, and rank u as (|U_R| - i);
    (b) Add u to {L_R} and remove u from {U_R};
end
Return the node l that ranks 1st.

B. Entropy Minimization Ranking

Based on REM, we now propose EMR to recommend s prospective sensor locations. Since the historical data is sparse, we assume the size of the labeled set {L} is less than s. This assumption ensures the universal applicability of EMR in the fine-grained AQI monitoring scenario.

Intuitively, the labeled nodes that already have historical data could be chosen as target locations. However, this cannot guarantee the minimization of the model entropy. That is to say, we need to balance the choice between the labeled set {L} and the unlabeled set {U}. Inspired by this idea, we first take all nodes in L as potential locations, and use REM to choose and label s - |L| (|L| denotes the size of L) locations from U, generating the initial recommendation set M. Then we iteratively find the one node in L whose removal increases the entropy the least, and replace it with the node in U that best reduces the entropy. When the replacement can no longer reduce the total entropy, the iteration stops. This balanced selection of target locations yields a minimized entropy; thus, EMR provides a sub-optimal solution to the location selection problem.

Note that at every specific time point T_k, we perform the above steps to obtain a ranked list M^(k). Since the entropy distribution can vary over time, we average the results of {M^(1), M^(2), ..., M^(d)} to select the final recommended locations. This step is vital for the performance of EMR, because it identifies nodes with consistently low rankings. The physical meaning of these nodes is that they are more independent of the other nodes, and thus need to be picked out as target locations.

For the complexity of EMR, the outer loop contains d time points, and the inner loop needs to compute (s + |L|) times in the worst case. Since s > |L|, the total complexity is O(d(s + |L|)) = O(d s), which is low, i.e., linear in s. The process of EMR is described in Algorithm 2.

Algorithm 2: Entropy Minimization Ranking (EMR)
Input: labeled set {L}, unlabeled set {U}, time series T = {T_1, T_2, ..., T_d}, number of recommended locations s
Output: target location set M
forall T_k in T do
    (a) Use REM(L^(k), U^(k)) to select (s - |L|) unlabeled nodes, and combine them with L^(k) to initialize the recommendation set M^(k);
    (b) Remove the one node l_0 in L^(k) whose removal increases the entropy the least, and remove l_0 from M^(k);
    (c) Use REM(L^(k), U^(k)) to select one node u_0 in U^(k), and add u_0 into M^(k);
    (d) Compare the new entropy H' with the entropy H of the previous iteration. If H' < H, go to (b); otherwise the iteration stops with result M^(k) for T_k;
end
Select M from {M^(k)}, k = 1, 2, ..., d.

Fig. 2. Comparison of AQI estimation accuracy between different methods, when M labeled locations are unselected.

VI. SIMULATION RESULTS

In this section, we evaluate the performance of the proposed ESSL model and the EMR algorithm. As described in Section II, we use the fine-grained AQI dataset [10] for verification. The dataset contains more than 100 days of data, each day with 45 labeled locations; hence, there are 4500 labeled samples in total. Although the 3D space can be divided into 640 cubes, we do not know the ground truth AQI values of the unlabeled data.
Thus, we divide the labeled samples into a training set of 3500 samples and a testing set of 1000 samples, performing cross-validation by randomly choosing the training data and repeating 1000 times to avoid stochastic errors.

A. Estimation Accuracy

We first evaluate the estimation accuracy of the proposed ESSL against other commonly used methods. We use the root-mean-square error (RMSE) as the metric for estimation accuracy (the lower the better). The proposed scheme is compared to the following baselines [12]; a sketch of the evaluation loop is given after the list.

- Deep Neural Networks (DNN) with 50 hidden layers.
- k-Nearest Neighbors (kNN).
- Classification and Regression Tree (CART).
- Support Vector Regression (SVR).
- Linear Interpolation (LI).
- Multi-variable Linear Regression (MLR).
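The comparison follows a standard repeated random-split protocol. The sketch below (ours, with assumed array shapes) shows how such an RMSE comparison can be organized; any of the listed baselines can be plugged in as `model`, provided it exposes fit/predict methods.

```python
# Hypothetical evaluation loop: repeated random 3500/1000 splits, RMSE metric.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def evaluate(model, X, y, n_train=3500, repeats=1000, seed=0):
    """X: (4500, n_features) feature matrix, y: (4500,) measured AQI values.
    `model` is any regressor with fit(X, y) and predict(X) methods."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        tr, te = idx[:n_train], idx[n_train:]
        model.fit(X[tr], y[tr])
        scores.append(rmse(y[te], model.predict(X[te])))
    return float(np.mean(scores))
```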

Fig. 3. Accuracy comparison of different recommendation algorithms, for different numbers of recommended locations s.

Fig. 2 evaluates the estimation accuracy of the different methods when several different numbers of labeled locations are removed. The testing set remains 10 randomly chosen locations. From the figure, we can see that our proposed ESSL significantly outperforms the other common methods. As labeled locations are removed, ESSL consistently performs better than the other solutions, which validates the robustness of our method.

B. Energy-Efficiency of Recommended Locations

We verify the ability of EMR to recommend the most suitable locations by testing whether it brings the largest improvement in estimation accuracy. We use 10 labeled locations as the testing set, and another set of labeled locations is used as the candidate pool from which we recommend the s best locations. We randomly choose |L| = 10 locations from the candidate pool to serve as the known labeled data, while assuming the others are unknown. Since s > |L|, we vary s accordingly. For each s, we traverse all possible recommendation sets M from the candidate pool to find an optimal M as one baseline. Also, EMR is compared to the following baselines [5]:

- Maximum Coverage: it repeatedly selects the location that has the longest distance from the last chosen one, to achieve maximum coverage of the 3D space.
- Spatial-Temporal Greedy Search: it greedily chooses the locations that are most dissimilar in both the spatial and temporal dimensions, based on (1).
- Entropy Greedy Search: it greedily selects the locations with the highest entropy as candidates.

In Fig. 3, we report the total RMSE that each algorithm achieves. The proposed EMR generally brings a much larger improvement than the other methods. Moreover, EMR also performs very close to the optimal choice of sensor locations, which demonstrates EMR's effectiveness. In Fig. 4, we show the minimum number of sensors needed to reach a given estimation accuracy (i.e., RMSE). The result indicates that EMR achieves high accuracy with much fewer sensors, and thus demonstrates that EMR provides energy-efficient solutions.

Fig. 4. The minimum number of sensors needed by different methods to reach a given RMSE value as the estimation accuracy.

VII. CONCLUSION

In this paper, we investigated how to recommend the most suitable sensor locations in 3D space for fine-grained AQI monitoring. We first built a multi-layer 3D spatial-temporal correlation model and proposed an entropy-based learning model, ESSL, for estimation. We then proposed a recommendation algorithm, EMR, to recommend the most suitable sensor locations. Experimental results showed that ESSL achieves higher AQI estimation accuracy than existing methods, and that EMR provides a near-optimal, energy-efficient sensor deployment recommendation.

REFERENCES

[1] Q. Di et al., "Air pollution and mortality in the Medicare population," New England J. of Medicine, vol. 376, no. 26, Jul. 2017.
[2] B. Zou, J. Wilson, F. Zhan, and Y. Zeng, "Air pollution exposure assessment methods utilized in epidemiological studies," J. of Environmental Monitoring, vol. 11, no. 3, Feb. 2009.
[3] T. Quang et al., "Vertical particle concentration profiles around urban office buildings," Atmospheric Chemistry and Physics, vol. 12, May 2012.
[4] C. Borrego et al., "How urban structure can affect city sustainability from an air quality perspective," Environmental Modelling & Software, vol. 21, no. 4, Apr. 2006.
[5] H. Hsieh, S. Lin, and Y. Zheng, "Inferring air quality for station location recommendation based on urban big data," ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD'15), Sydney, Australia, Aug. 2015.
[6] T. Liu et al., "Finding optimal meteorological observation locations by multi-source urban big data analysis," IEEE Int. Conf. on Cloud Computing and Big Data (CCBD'16), Macau, China, Nov. 2016.
[7] Y. Zheng et al., "Forecasting fine-grained air quality based on big data," ACM Int. Conf. on Knowledge Discovery and Data Mining (KDD'15), Sydney, Australia, Aug. 2015.
[8] Y. Yang et al., "ARMS: a fine-grained 3D AQI realtime monitoring system by UAV," IEEE Global Commun. Conf. (GLOBECOM'17), Singapore, Dec. 2017.
[9] Y. Yang et al., "AQNet: fine-grained 3D spatio-temporal air quality monitoring by aerial-ground WSN," IEEE Int. Conf. on Comput. Commun. (INFOCOM'18), Honolulu, HI, Apr. 2018.
[10] Y. Yang, Z. Zheng, K. Bian, L. Song, and Z. Han, "Real-time profiling of fine-grained air quality index distribution using UAV sensing," IEEE Internet of Things Journal, Nov. 2017.
[11] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," Proc. of the 20th Int. Conf. on Machine Learning (ICML'03), Washington, DC, Aug. 2003.
[12] I. Goodfellow, Y. Bengio, and A. Courville, "Applied Math and Machine Learning," in Deep Learning. Cambridge, MA: MIT Press, 2016.
