CS229 Final Project: Shale Gas Production Decline Prediction Using Machine Learning Algorithms

Wentao Zhang (wentaoz@stanford.edu)    Shaochuan Xu (scxu@stanford.edu)

In the petroleum industry, oil companies sometimes purchase oil and gas production wells from other companies instead of drilling new wells. The shale gas production decline curve is critical when assessing how much more natural gas a specific well can produce in the future, which matters greatly during acquisitions between oil companies: even a small underestimate or overestimate of future production may significantly undervalue or overvalue an oilfield. In this project, we use Locally Weighted Linear Regression to predict future production from the existing decline curves. We then apply K-means to group the decline curves into two categories, high and low productivity. We also try Principal Component Analysis, computing the eigenvectors of the covariance matrix and using them to predict future production both with and without K-means as a preprocessing step. Finally, the three methods are compared with each other in terms of accuracy, defined by a standardized error.

Dataset

The data we use are the monthly production rate curves of thousands of shale gas wells. To handle the different lengths of the curves and the problematic production rate data points in some of them, we modify the data slightly: such data points are replaced by a very small number (0.1), and every curve is padded with zeros at the end so that all curves have the same length and can be loaded into MATLAB as a single matrix.

Figure 1: Shale gas production decline, all 2152 curves used to learn to predict future production.
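As a concrete illustration of this preprocessing, here is a minimal Python/NumPy sketch. It is only an illustration: the project itself was implemented in MATLAB, and the function name build_data_matrix as well as the rule used to flag problematic points (non-positive values) are our own assumptions, not details taken from the report.

```python
import numpy as np

SMALL_VALUE = 0.1  # replacement for problematic production rate data points

def build_data_matrix(curves, max_len=None):
    """Stack variable-length monthly production curves into one matrix.

    curves  : list of 1-D arrays, one decline curve per well
    max_len : common length; defaults to the longest curve in the data set
    """
    if max_len is None:
        max_len = max(len(c) for c in curves)
    X = np.zeros((len(curves), max_len))      # zero-padding at the end of short curves
    for i, c in enumerate(curves):
        c = np.asarray(c, dtype=float).copy()
        c[c <= 0] = SMALL_VALUE               # replace flagged points by a small number
        X[i, :len(c)] = c
    return X
```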

Locally Weighted Linear Regression (LWLR)

Our goal is to predict the future gas production of a new well, given its historical production data and information from other wells with longer histories. Suppose that we randomly choose a decline curve r with n months in total, and we want to use the first l months to predict the remaining (n - l) months of the curve. To find curves in the training set that are similar to r, we define the distance between two curves as the squared L-2 norm of their difference. Before computing the distances, we filter the training set by removing curves whose history is shorter than n. We then pick the k wells from the filtered training set that are closest to r, give each of them a weight $w_i$, and make the prediction for r as

$$ f_{\text{predicted}} \;=\; \frac{\sum_{i \in \text{neighb}_k} w\big(d(f^{(i)}_{\text{past\_existing}},\, f_{\text{measured}})/h\big)\, f^{(i)}_{\text{future\_existing}}}{\sum_{i \in \text{neighb}_k} w\big(d(f^{(i)}_{\text{past\_existing}},\, f_{\text{measured}})/h\big)}, $$

where $\text{neighb}_k$ denotes the $k$ training wells closest to r, $f_{\text{measured}}$ is the known history of r, $f^{(i)}_{\text{past\_existing}}$ and $f^{(i)}_{\text{future\_existing}}$ are the first $l$ months and the remaining months of training well $i$, and $h$ is the longest of the neighbor distances.

Results: we restrict the number of neighbors to k = 3. Figure 2 shows four typical predictions, which are generally consistent with the real values. The predicted curves are smoother than the real ones, because each prediction is a weighted sum over multiple training wells.
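A minimal Python/NumPy sketch of this weighted-neighbor prediction rule is given below. The exponential weight kernel is our own assumption (the report does not state which kernel was used), and the sketch assumes the training curves have already been filtered and stacked into a matrix as in the preprocessing step above.

```python
import numpy as np

def lwlr_predict(x_known, train_curves, k=3):
    """Weighted nearest-neighbor prediction of the future months of a test well.

    x_known      : 1-D array, the first l known months of the test well (f_measured)
    train_curves : 2-D array (wells x months); training wells already filtered so that
                   every row has at least as many months as the full test curve
    k            : number of nearest training wells to combine (k = 3 in the report)
    """
    l = len(x_known)
    past = train_curves[:, :l]                  # f_past_existing of every training well
    d = np.sum((past - x_known) ** 2, axis=1)   # squared L-2 distance to the test history
    nbr = np.argsort(d)[:k]                     # indices of the k closest wells
    h = max(d[nbr].max(), 1e-12)                # longest neighbor distance (bandwidth)
    w = np.exp(-d[nbr] / h)                     # weight kernel w(.); the exponential
                                                # form is an assumption, not from the report
    future = train_curves[nbr, l:]              # f_future_existing of the neighbors
    return w @ future / w.sum()                 # weighted average prediction
```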

Figure 3: Predicted curves as l (the number of known months) increases, for a well-fitted example; standardized error vs. l.

Figure 4: Predicted curves as l (the number of known months) increases, for a poorly fitted example; standardized error vs. l.

In Figures 3 and 4 we vary the known history from short to long and plot the error against the number of known months. The error does not decrease as we observe more of the curve and predict less of it. The reason might be that the standardized error is defined as the average relative error over the predicted months: in the tail of the curve the absolute production values are small, so the relative errors are easily large. A better error metric would need to be defined in order to tell whether the prediction truly improves with a longer known history.

Principal Component Analysis (PCA)

Since each well has a history of up to tens of months, we intuitively want to reduce the time dimension and keep the intrinsic components that reflect production decline. First, we filter the training set by removing the wells whose history is shorter than the total number of months n of the test curve. After normalizing the data, we eigendecompose the empirical covariance matrix and extract the first 5 eigenvectors as the principal components. We then fit the known part of the test curve with a linear combination of these 5 eigenvectors; the coefficients $\theta$ of the linear combination are obtained by linear regression,

$$ y_{\text{known}} = U_l\,\theta, $$

and we predict the future decline curve as

$$ y_{\text{estimate}} = U_h\,\theta, $$

where $y_{\text{known}} \in \mathbb{R}^{l}$ is the normalized known history of the test well, $U_l \in \mathbb{R}^{l \times 5}$ consists of the first $l$ rows of the eigenvector matrix, and $U_h$ consists of the rows corresponding to the future months. Our estimate of the future production is therefore $y_{\text{estimate}}$.
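The following Python/NumPy sketch shows one way to carry out this fit-and-extrapolate step. It is a sketch under stated assumptions: the report only says the data are "normalized", so mean-centering is used here as a stand-in, and the function name pca_predict is hypothetical.

```python
import numpy as np

def pca_predict(x_known, train_curves, n_components=5):
    """Fit the known months with the leading principal components and extrapolate.

    x_known      : 1-D array, the first l months of the test well
    train_curves : 2-D array (wells x n months), training wells with history >= n
    """
    l = len(x_known)
    mean = train_curves.mean(axis=0)
    centered = train_curves - mean            # mean-centering as the normalization step (assumed)
    # leading eigenvectors of the empirical covariance, obtained via SVD of the centered data
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    U = Vt[:n_components].T                   # n x n_components matrix of principal components
    U_l, U_h = U[:l], U[l:]                   # rows for the known and the future months
    theta, *_ = np.linalg.lstsq(U_l, x_known - mean[:l], rcond=None)  # solve y_known = U_l theta
    return mean[l:] + U_h @ theta             # y_estimate = U_h theta (mean added back)
```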

Results: as can be seen from Figure 5, the predictions are either too smooth or too variable compared to the real data. This is because, at the fitting step, $\theta$ is either underfitted (giving predictions that are too smooth) or overfitted (high variance, giving predictions that fluctuate too much). Another problem with PCA is that all the training wells contribute to the estimation, which makes it imprecise for wells with very high or very low production.

PCA after K-means

If we assume that high-productivity wells are similar to each other and that low-productivity wells are similar to each other, we can group all the decline curves into two categories. We modify the K-means method to handle the practical complication that different decline curves have different lengths: the distance between a centroid and a curve is computed using only the dimensions of the shorter of the two (a code sketch of this rule is given after Figure 6). Comparing the two panels of Figure 6, this modified K-means is good enough to distinguish high-productivity wells from low-productivity wells.

Figure 6: Decline curves of the high-productivity wells (left) and the low-productivity wells (right).
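Below is a minimal Python/NumPy sketch of this modified K-means. The truncated distance matches the rule described above; the centroid update (a month-by-month mean over the member wells that have data for that month) is our own assumption, since the report does not specify how centroids are recomputed for curves of different lengths.

```python
import numpy as np

def short_dist(a, b):
    """Squared L-2 distance over the first min(len(a), len(b)) months."""
    m = min(len(a), len(b))
    return float(np.sum((np.asarray(a[:m], float) - np.asarray(b[:m], float)) ** 2))

def kmeans_curves(curves, k=2, n_iter=50, seed=0):
    """K-means for variable-length decline curves using the truncated distance above.

    curves : list of 1-D arrays of different lengths
    Returns cluster labels (for k = 2, high- vs. low-productivity groups).
    """
    rng = np.random.default_rng(seed)
    max_len = max(len(c) for c in curves)
    # initialize centroids from randomly chosen curves, zero-padded to full length
    centroids = [np.pad(np.asarray(curves[i], float), (0, max_len - len(curves[i])))
                 for i in rng.choice(len(curves), size=k, replace=False)]
    labels = np.zeros(len(curves), dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest centroid under the truncated distance
        labels = np.array([np.argmin([short_dist(c, cen) for cen in centroids])
                           for c in curves])
        # update step: month-wise mean over member wells that reach that month (assumed rule)
        for j in range(k):
            members = [np.asarray(curves[i], float) for i in np.where(labels == j)[0]]
            if members:
                centroids[j] = np.array([np.mean([m[t] for m in members if len(m) > t] or [0.0])
                                         for t in range(max_len)])
    return labels
```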

We then run PCA again after clustering the original decline curves with K-means. From Figure 7 we can see that, although the underfitting/overfitting problem still exists, the results are better than with the original PCA. This is likely because the L-2 norm distance information is incorporated into the PCA pipeline, making it a more integrated method.

Discussion

Figure 8: Errors of the three methods computed by leave-one-out cross validation.

We apply leave-one-out cross validation to all three methods, compare the predictions with the real production data, and calculate the average relative errors as in Figures 3 and 4. We also define a threshold to cap extremely large errors: a single extreme value can make the average of all relative errors enormous, and these extreme values arise during the shut-in periods of the wells, when production is nearly zero. Figure 8 confirms our intuition that LWLR is the best of the three methods, because no information is lost through dimensionality reduction. Plain PCA has the largest relative error, because the higher-order principal components, which reflect the details of the decline curves, are not included. K-means helps cluster the wells into high- and low-productivity classes, and this prior information improves PCA.
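To make the evaluation concrete, here is a small Python/NumPy sketch of the capped standardized error and the leave-one-out loop. The cap value 10.0 and the wrapper signature predict_fn(x_known, train_curves) are placeholders of our own choosing; the report does not state the actual threshold used.

```python
import numpy as np

def standardized_error(y_pred, y_true, cap=10.0):
    """Average relative error over the predicted months, with per-month errors capped.

    cap : keeps shut-in periods (near-zero production) from dominating the average;
          the actual threshold used in the project is not stated, 10.0 is a placeholder.
    """
    rel = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(np.minimum(rel, cap)))

def loo_error(curves, predict_fn, l):
    """Leave-one-out cross validation of a predictor on a list of decline curves.

    predict_fn(x_known, train_curves) must return the predicted future months,
    e.g. a wrapper around the LWLR or PCA sketches above.
    """
    errors = []
    for i, c in enumerate(curves):
        train = [curves[j] for j in range(len(curves)) if j != i]   # hold out well i
        y_pred = predict_fn(np.asarray(c[:l], float), train)
        errors.append(standardized_error(y_pred, np.asarray(c[l:], float)))
    return float(np.mean(errors))
```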