Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems

Similar documents
Talk on Bayesian Optimization

Quantifying mismatch in Bayesian optimization

A parametric approach to Bayesian optimization with pairwise comparisons

Multi-Attribute Bayesian Optimization under Utility Uncertainty

Practical Bayesian Optimization of Machine Learning. Learning Algorithms

Dynamic Batch Bayesian Optimization

STAT 518 Intro Student Presentation

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Learning Gaussian Process Models from Uncertain Data

Predictive Variance Reduction Search

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration

Neutron inverse kinetics via Gaussian Processes

Optimization of Gaussian Process Hyperparameters using Rprop

Learning to Learn and Collaborative Filtering

Probabilistic numerics for deep learning

Nonparameteric Regression:

Prediction of double gene knockout measurements

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes

The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan

Advanced statistical methods for data analysis Lecture 2

Gaussian Process Approximations of Stochastic Differential Equations

Global Optimisation with Gaussian Processes. Michael A. Osborne Machine Learning Research Group Department o Engineering Science University o Oxford

Can you predict the future..?

Lecture 5: GPs and Streaming regression

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Massachusetts Institute of Technology

arxiv: v2 [stat.ml] 16 Oct 2017

COMP 551 Applied Machine Learning Lecture 21: Bayesian optimisation

Learning Non-stationary System Dynamics Online using Gaussian Processes

A General Framework for Constrained Bayesian Optimization using Information-based Search

Gaussian Process Regression

Introduction to Gaussian Processes

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Model Selection for Gaussian Processes

Introduction to emulators - the what, the when, the why

Deep Neural Networks as Gaussian Processes

Anomaly Detection and Removal Using Non-Stationary Gaussian Processes

Probabilistic & Unsupervised Learning

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

Bayesian Deep Learning

Worst-Case Bounds for Gaussian Process Models

Probabilistic Models for Learning Data Representations. Andreas Damianou

Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach

CS 7140: Advanced Machine Learning

Algorithmisches Lernen/Machine Learning

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

An Introduction to Gaussian Processes for Spatial Data (Predictions!)

20: Gaussian Processes

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Probabilistic & Bayesian deep learning. Andreas Damianou

GAUSSIAN PROCESS REGRESSION

System identification and control with (deep) Gaussian processes. Andreas Damianou

Linear Dynamical Systems

Bayesian optimization for automatic machine learning

Learning Representations for Hyperparameter Transfer Learning

Multiple-step Time Series Forecasting with Sparse Gaussian Processes

Prediction of Data with help of the Gaussian Process Method

An Empirical Bayes Approach to Optimizing Machine Learning Algorithms

Big model configuration with Bayesian quadrature. David Duvenaud, Roman Garnett, Tom Gunter, Philipp Hennig, Michael A Osborne and Stephen Roberts.

Gaussian Process Regression Forecasting of Computer Network Conditions

A new Hierarchical Bayes approach to ensemble-variational data assimilation

Joint Emotion Analysis via Multi-task Gaussian Processes

Active Learning and Optimized Information Gathering

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Introduction to Gaussian Processes

Density Estimation. Seungjin Choi

Relevance Vector Machines

Learning Non-stationary System Dynamics Online Using Gaussian Processes

GWAS V: Gaussian processes

CS 195-5: Machine Learning Problem Set 1

HMM and IOHMM Modeling of EEG Rhythms for Asynchronous BCI Systems

The Variational Gaussian Approximation Revisited

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Modeling human function learning with Gaussian processes

Learning the hyper-parameters. Luca Martino

Thompson Sampling for the non-stationary Corrupt Multi-Armed Bandit

Gaussian Process Regression: Active Data Selection and Test Point. Rejection. Sambu Seo Marko Wallat Thore Graepel Klaus Obermayer

Advanced Introduction to Machine Learning CMU-10715

Naïve Bayes classification

Anomaly Detection and Removal Using Non-Stationary Gaussian Processes

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Variational Principal Components

Monitoring Wafer Geometric Quality using Additive Gaussian Process

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Sparse Linear Contextual Bandits via Relevance Vector Machines

CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes

Learning Tetris. 1 Tetris. February 3, 2009

arxiv: v4 [stat.ml] 28 Sep 2017

Gaussian Process Regression Model in Spatial Logistic Regression

Introduction to Gaussian Process

Gaussian process for nonstationary time series prediction

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

Transcription:

Context-Dependent Bayesian Optimization in Real-Time Optimal Control: A Case Study in Airborne Wind Energy Systems Ali Baheri Department of Mechanical Engineering University of North Carolina at Charlotte Charlotte, NC 28223 akhayatb@uncc.edu Chris Vermillion Department of Mechanical Engineering University of North Carolina at Charlotte Charlotte, NC 28223 cvermill@uncc.edu Abstract We present a framework in which Bayesian Optimization is used for real-time optimal control. In particular, Bayesian Optimization is applied to the real-time altitude optimization of an airborne wind energy AWE) system, for the purpose of maximizing net energy production. 1 Introduction Airborne Wind Energy AWE) systems are a new paradigm for wind turbines in which the structural elements of conventional wind turbines are replaced with tethers and a lifting body to harvest wind power from significantly increased altitudes typically up to 600m). At those altitudes, winds are stronger and more consistent than ground-level winds. Besides being able to operate at much higher altitudes than traditional turbines, AWE systems also provide additional control degrees of freedom that allow the systems to adjust their operating altitudes and intentionally induce crosswind motions to enhance power output. A fundamental challenge lies in choosing what altitude do we go to next, particularly in a partially observable wind speed vs. altitude wind shear) profile. For the altitude optimization problem at hand, it is desirable to employ a control system that can learn the statistical properties of the wind shear profile online, thereby alleviating the need for offline data collection prior to running the control system. One of the most studied problems in the machine learning community is the design of optimization algorithms for real-world applications using scarce data. In existing literature, this problem has been studied in the context of sequential decisionmaking problems aimed at learning the behavior of an objective function called exploration) while simultaneously trying to optimize performance called exploitation). As an efficient and systematic approach for balancing exploration and exploration in a partially observable environment, Bayesian Optimization has been applied to various real-world problems [1, 2, 3, 4]. In general, Bayesian Optimization aims to find the global optimum of an unknown, expensive-to-evaluate, and black-box function within only a few evaluations. 1.1 Energy Generation Model In this section, we formalize the objective function for the altitude optimization problem. An appropriate objective function should account for the energy generated by the turbine and the energy lost by adjusting and maintaining altitude even when maintaining altitude, control energy must be expended to make small adjustments to the tethers in order to reject typical levels of turbulence). In order to account for each of these factors, the instantaneous net power generation is expressed as [6]: P total = c 1 min V wind, V rated ) 3 c2 V 2 wind + P z Vwind, ż ), 1) 31st Conference on Neural Information Processing Systems NIPS 2017), Long Beach, CA, USA.

Figure 1: Altaeros Buoyant Airborne Turbine BAT), Image Credit: [5] In 1), z is the operating altitude, V wind z) is the wind speed at the operating altitude, V rated is the turbine s rated wind speed, P z is the instantaneous power required for or regenerated by adjusting altitude, and c 1,2 are constant parameters that depend upon the turbine s power curve and flight control system, respectively. We will use a simplified version of 1) that lumps motoring and regenerated power into a single term that penalizes instantaneous altitude adjustment. ) 3 P = c 1 min V wind z), V rated c2 Vwind 2 c 3 Vwind 2 ż, 2) Ultimately, equation 2) serves as our objective function, which should be maximized. 1.2 Bayesian Optimization Suppose that we want to maximize an unknown, black-box, and expensive-to-evaluate objective function. Due to the cost associated with evaluating this function, it is crucial to select the location of each new evaluation deliberately. Bayesian Optimization represents a promising candidate strategy for this altitude optimization, as Bayesian Optimization focuses specifically on converging to the global optimum of an unknown function within a minimal number of evaluations on the real system. The major assumption here is that function evaluation is expensive, whereas computational resources are cheap. This matches the altitude optimization problem at hand, where each evaluation of the objective function requires the physical AWE system to relocate itself to a new altitude, necessitating substantial time and energy expenditure. Broadly speaking, there are two main components to Bayesian Optimization. First, we need an appropriate model of the objective function. Second, we need to choose an acquisition function, which characterizes the fitness of each candidate value of the next operating altitude. The maximization of the expected value of this acquisition function determines the next point altitude) to operate the system at. 1.2.1 Learning Phase: Using Gaussian Processes GPs) In general, a GP is fully specified by its mean function, µz), which is assumed to be zero without loss of generality [7] a coordinate shift can be used to generate a zero mean objective function), and covariance function, kz, z ): ) P z) GP µz), kz, z ), 3) In the context of altitude optimization problem, the GP framework is used to predict the generated power, P z), at candidate altitude, z c, based on a set of t past observations, D 1:t = {z 1:t, P z 1:t )}. The power at candidate altitude z c is modeled to follow a multivariate Gaussian distribution [7]: [ ] [ ] ) yt Kt + σ N 0, ɛ 2 I t kt T, 4) P z c ) k t kz c, z c ) where y t = {P z 1 ),, P z t )} is the vector of observed function values. Moreover, the vector k t z) = [kz c, z 1 ),, kz c, z t )] encodes the covariances between the candidate altitude, z c, and 2

the past data points, z 1:t. The past-data covariance matrix, with entries [K t ] i,j) = kz i, z j ) for i, j {1,, t}, characterizes the covariances between pairs of past data points. The identity matrix is represented by I t and σ ɛ represents the noise variance [7]. A common choice for the kernel function is a Squared Exponential SE) kernel, which takes the form: kz i, z j ) = σ0 2 exp 1 ) 2 z i z j ) T Λ 2 z i z j ). 5) Kernel hyper-parameters are identified by maximizing the marginal log-likelihood of the existing observed data, D [7]: θ = arg max log py t z 1:t, θ) 6) θ Once the hyper-parameters have been identified, the predictive mean and variance at z c, conditioned on these past observations, can be expressed as: ) 1y µ t z c D) = k t z c ) K t + I t σɛ 2 T t, 7) ) 1k σt 2 z c D) = kz c, z c ) k t z c ) K t + I t σɛ 2 T t z c ), 8) 1.2.2 Optimization Phase The ultimate goal of Bayesian Optimization is to determine the next point based on past observations. Among several choices of acquisition functions, we use an acquisition function belonging to the improvement-based family [8]. More precisely, the next operating point is selected as the one that maximizes the so-called expected improvement: ) z t+1 = arg max E max {0, P t+1 z) P z) max } D 1:t z where max {0, P t+1 z) P z) max )} represents the improvement toward the best value of the objective function so far, P z) max. This quantity is referred to as expected improvement. 1.3 Context-Dependent Optimization The time-varying nature of our objective function the wind speed at a given altitude does not remain constant over time) compels us to modify the conventional Bayesian Optimization method to account for spatiotemporal variation of the wind shear profile. One way to cope with the dynamic nature of the objective function is through contextual GP [9]. More precisely, one of the benefits of modeling the objective function using GPs is that we can characterize the statistical properties of the system output based on environmental variables, called context. To deal with time-varying nature of our objective function, we consider time as the context. Employing contextual GP is very straightforward. In Section 1.2.1, the kernel of the GP is defined only in terms of altitude. To account for time-dependency, we modify the kernel structure such that it consists of an addition of two kernels, one over altitude, k z, and one over time, k t : ) k z, t), z, t ) = k z z, z ) + k t t, t ), 10) Context-Dependent Bayesian Optimization is summarized in Algorithm 1. First, at the very first two time steps, the procedure is initialized by two previously evaluated altitudes and the corresponding function values line 2). Then, at each time step, the context time) is fixed at the current instance line 4). Next, the composite kernel is constructed based upon Eq. 10) line 5). The GP model and its predictive mean and variance are computed line 6-7). Then, the acquisition function value is computed and optimization is done only over the altitude space line 8-9). Finally, the suggested candidate altitude) and the current time are augmented to the data line 10), and the process is restarted. 9) 3

Algorithm 1 Context-Dependent Bayesian Optimization CDBO) 1: procedure ALTITUDE OPTIMIZATION WITH CDBO 2: D Initialize: {z 1:2, t 1:2 ), P z 1:2 )} 3: for each time step do 4: Set the context into the current time 5: Construct a composite kernel Eq. 10) ) 6: Train a GP model from D 7: Compute predictive mean and variance of GP 8: Compute acquisition function 9: Find z that optimizes acquisition function 10: Append {z, t current time ), P z )} to D Net Energy kwh) 10000 8000 6000 4000 2000 Bayesian Optimization Off-line Optimum Fixed Altitude = 1233m) Minimum Fixed Altitude = 146m) Maximum Fixed Altitude = 1528m) 0 0 20 40 60 80 100 120 140 160 Time hour) Figure 2: Comparison of net energy production over the course of one week. Context-Dependent Bayesian Optimization outperforms off-line optimum fixed altitude, minimum fixed altitude, and maximum fixed altitude scenarios. 2 RESULTS We evaluated the proposed context-dependent Bayesian Optimization CDBO) algorithm by simulating the model based on available data, acquired by a wind profiler in Cape Henlopen, DE [10]. Here, we compare the system performance under four different control scenarios: Scenario 1: Off-line Optimum Fixed Altitude: The first scenario involves flying the AWE at an optimum fixed altitude. We identify this optimal fixed altitude by off-line calculation of energy generation at different altitudes, using given wind data. Scenario 2: Minimum Fixed Altitude: The second scenario is representative of conventional towered wind turbines. For this scenario, we simulate the performance of the system operating at the lowest available altitude 146m), which is comparable to the hub height of the world s tallest towered wind turbines. Scenario 3: Maximum Fixed Altitude: The third scenario involves flying the AWE at the maximum tabulated fixed altitude, 1528m. Scenario 4: Bayesian Optimization: The fourth scenario employs Bayesian Optimization, as outlined in Algorithm 1. Fig. 2 shows the net energy production across all scenarios. One can conclude from this figure that the CDBO algorithm results in superior net energy production when compared to other scenarios. 3 CONCLUSION We presented a data-driven approach for optimizing the altitude of an AWE system to maximize the system s net power output. In implementing this altitude optimization approach, we extended conventional Bayesian Optimization to capture time-dependent patterns of our objective function. 4

References [1] Ali Baheri, Shamir Bin-Karim, Alireza Bafandeh, and Christopher Vermillion. Real-time control using bayesian optimization: A case study in airborne wind energy systems. Control Engineering Practice, 69:131 140, 2017. [2] Roman Garnett, Michael A Osborne, and Stephen J Roberts. Bayesian optimization for sensor set selection. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, Stockholm, Sweden, 2010. [3] Ali Baheri, Joe Deese, and Chris Vermillion. Combined plant and controller design using Bayesian optimization: A case study in airborne wind energy systems. In ASME 2017 Dynamic Systems and Control Conference, Tysons Corner, US, 2017. [4] Hany Abdelrahman, Felix Berkenkamp, Jan Poland, and Andreas Krause. Bayesian optimization for maximum power point tracking in photovoltaic power plants. In Proc. European Control Conference, Aalborg, DK, 2016. [5] Altaeros Energies, inc., http://www.altaerosenergies.com. [6] Alireza Bafandeh and Chris Vermillion. Real-time altitude optimization of airborne wind energy systems using Lyapunov-based switched extremum seeking control. In American Control Conference, Boston, US, 2016. [7] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. Number ISBN 0-262-18253-X. The MIT Press, 2006. [8] Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arxiv preprint arxiv:1012.2599, 2010. [9] Andreas Krause and Cheng S Ong. Contextual gaussian process bandit optimization. In Advances in Neural Information Processing Systems, pages 2447 2455, 2011. [10] C. Archer. Wind profiler at Cape Henlopen. Lewes, Delaware, http://www.ceoe.udel.edu/ourpeople/profiles/carcher/fsmw. Technical report, University of Delaware, 2014. 5