ILC Group Annual Report 2017


D. SHEN

Letter

This report summarizes the research carried out by the Iterative Learning Control (ILC) research group in 2017. It collects the group's statistics for the year, academic activities such as conference participation, the list of seminar presentations, graduate student information, an overview of the research directions, and the papers published during the year. The core research direction of this group is iterative learning control. Around this direction, the group carried out a series of studies in 2017 and made important progress on several fronts. The main contributions are as follows:
1. For ILC under data dropouts, a successive update algorithm was proposed, and the performance of the traditional intermittent update algorithm and the newly proposed successive update algorithm was analyzed and compared;
2. The ILC problem with data dropouts occurring at both the measurement and actuator sides was studied; learning algorithm design frameworks and performance analysis methods were given for linear deterministic systems, linear stochastic systems, and affine nonlinear systems;
3. A unified ILC algorithm design and analysis framework was given for three data dropout models, namely the random sequence model, the Bernoulli variable model, and the Markov chain model;
4. For sampled-data ILC, an estimate of the upper bound on the tracking error within sampling intervals was given for the first time, characterizing the performance of sampled-data ILC, and a varying-sampling ILC scheme was proposed;
5. The ILC problem for multi-agent systems with nonlinear dynamics was studied; under output or state constraints, five ILC algorithm design schemes were given and their performance analyzed;
6. For communication environments with coupled factors such as data dropouts, random delays, and out-of-order packets, together with limited storage capacity, ILC algorithm design schemes and the corresponding convergence analyses were given.
The last part of this report collects the papers published and published online during the year.

Outline
1 Group Members
2 Research Overview
3 Academic Activity Timeline
4 Seminar Briefs
5 Publications

1 Group Members

Lanjing Wang, female, received her B.S. from the University of Electronic Science and Technology of China in 2013 and is now pursuing an M.S. at Beijing University of Chemical Technology. Research direction: ILC for multi-sensor systems. She has published 1 paper at the Data Driven Control and Learning Systems conference and 1 paper at the 43rd Annual Conference of the IEEE Industrial Electronics Society, with 1 journal paper under review.

Fanshou Zhang, male, received his B.S. from Beijing University of Chemical Technology in 2015 and is now pursuing an M.S. there. Research direction: deep learning and its applications. He has published 1 paper at ITSC (the international conference on intelligent transportation systems).

Chao Zhang, male, received his B.S. from Beijing University of Chemical Technology in 2016 and is now pursuing an M.S. there. Research direction: quantized ILC and practical applications of ILC to mobile robots. He has published 2 SCI journal papers and 1 Chinese Control Conference paper, and won the 2017 National Scholarship.

Chun Zeng, female, received her B.S. from Beijing University of Chemical Technology in 2016 and is now pursuing an M.S. there. Research direction: varying-length ILC based on composite energy functions. She has 1 journal paper under review.

Chen Liu, male, received his B.S. from Chang'an University in 2017 and is now pursuing an M.S. at Beijing University of Chemical Technology. Research direction: ILC for multi-agent systems. He has 1 paper under review at the Data Driven Control and Learning Systems conference.

Group Alumni

Yun Xu, female, received her M.S. from Beijing University of Chemical Technology in 2017 and was named an outstanding graduate of the university. During her graduate studies she published 7 journal papers (6 in SCI journals) and 4 conference papers. Her master's thesis, "Iterative Learning Control and Optimization under Incomplete Data", was named an outstanding thesis at the university level, and she won the 2016 National Scholarship.

Yanqiong Jin, female, received her B.S. from Beijing University of Chemical Technology in 2017. As an undergraduate she published 2 SCI papers on ILC. Her bachelor's thesis, "Iterative Learning Control under Two-Sided Random Data Dropouts", was named an outstanding thesis at the college level.

2 Research Overview

The core research direction of this group is iterative learning control. The main research topics include the following:
1. ILC under data dropouts: the influence of random data dropouts on ILC performance, and the corresponding algorithm design and analysis frameworks.
2. Sampled-data ILC: the tracking performance within sampling intervals when sampled-data ILC algorithms are applied to various classes of systems, and the corresponding algorithm design and analysis frameworks.
3. Quantized ILC: under the conflicting requirements of reducing the amount of data transmitted over communication channels and guaranteeing tracking performance, how to design quantizers and the corresponding ILC algorithms.
4. ILC for multi-agent systems: for various types of multi-agent systems under different communication topologies, how to design distributed ILC algorithms and analyze the resulting cooperative performance.
5. ILC in iteration-varying environments: algorithm design and analysis when factors that are fixed in classical ILC become varying, especially varying along the iteration axis.

3 Academic Activity Timeline

Attended the 6th IEEE Data Driven Control and Learning Systems Conference in Chongqing (Dong Shen, Lanjing Wang, Chao Zhang, Chun Zeng).
Academic visits in Shanghai and Hangzhou (Dong Shen).
Yun Xu received her M.S. degree and Yanqiong Jin received her B.S. degree, completing their studies.
Attended the 29th Chinese Control and Decision Conference in Chongqing (Lanjing Wang, Chao Zhang, Chun Zeng).

Attended the TCCT Workshop on MAS in Beijing (Lanjing Wang, Chao Zhang, Chun Zeng).
Attended the 36th Chinese Control Conference in Dalian (Chao Zhang).
Attended the founding meeting of the Hybrid Intelligence Committee of the Chinese Association of Automation in Xi'an, where Dr. Dong Shen was elected a member of the technical committee (Dong Shen).
Attended the 43rd Annual Conference of the IEEE Industrial Electronics Society in Beijing (Dong Shen, Lanjing Wang, Chao Zhang, Chun Zeng, Chen Liu).

Dr. Dong Shen was elevated to IEEE Senior Member.
Dr. Dong Shen was invited to the 2nd Workshop on Stochastic Control, Optimization, and Data Fusion, where he gave an invited talk, "Iterative Learning Control over Networks" (Dong Shen).
Attended the Asian Control Conference in Australia, followed by an academic seminar at RMIT in Melbourne (Dong Shen).

4 Seminar Briefs

Lanjing Wang: Yan F, Tian F, Shi Z. Iterative learning approach for traffic signal control of urban road networks [J]. IET Control Theory & Applications, 2017, 11(4).
Chao Zhang: Li T, Fu M, Xie L, et al. Distributed consensus with limited communication data rate [J]. IEEE Transactions on Automatic Control, 2011, 56(2).
Chun Zeng: Shen D, Zhang W, Xu J X. Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths [J]. Systems & Control Letters, 2016, 96.
Chen Liu: Shen D, Zhang C. Learning control for discrete-time nonlinear systems with sensor saturation and measurement noises [J]. International Journal of Systems Science, 2017(9).
Lanjing Wang: Meng D, Moore K L. Robust iterative learning control for nonrepetitive uncertain systems [J]. IEEE Transactions on Automatic Control, 2017, 62(2).
Chao Zhang: Fu M, Xie L. On design of finite-level quantization feedback control [J]. Eng.newcastle.edu.au.
Chun Zeng: Tayebi A, Chien C J. A unified adaptive iterative learning control framework for uncertain nonlinear systems [J]. IEEE Transactions on Automatic Control, 2007, 52(10).
Chen Liu: Bristow D A, Tharayil M, Alleyne A G. A survey of iterative learning control: a learning-based method for high-performance tracking control [J]. IEEE Control Systems, 2006, 26(3).
Chao Zhang: Zhang T, Li J. Iterative learning control for multi-agent systems with finite-leveled sigma-delta quantization and random packet losses [J]. IEEE Transactions on Circuits & Systems I: Regular Papers, 2017, 64(8).
Chun Zeng: Jin X, Xu J. A barrier composite energy function approach for robot manipulators under alignment condition with position constraints [J]. International Journal of Robust & Nonlinear Control, 2015, 24(17).
Chen Liu: Yang S, Xu J-X, Li X, Shen D. Iterative Learning Control for Multi-Agent Systems Coordination. Wiley, Chapter.
Chao Zhang: Bu X, Hou Z, Cui L, et al. Stability analysis of quantized iterative learning control systems using lifting representation [J]. International Journal of Adaptive Control & Signal Processing, 2017, 31(9).
Chun Zeng: Xu J X, Yan R. On initial conditions in iterative learning control [J]. IEEE Transactions on Automatic Control, 2005, 50(9).
Chen Liu: Yang S, Xu J-X, Li X, Shen D. Iterative Learning Control for Multi-Agent Systems Coordination. Wiley, Chapter 3.

5 Publications

Journal Papers
1. Dong Shen, Jian-Xin Xu. A Framework of Iterative Learning Control under Random Data Dropouts: Mean Square and Almost Sure Convergence. International Journal of Adaptive Control and Signal Processing, vol. 31, no. 12, 2017.
2. Dong Shen, Jian-Xin Xu. Distributed Adaptive Iterative Learning Control for Nonlinear Multi-Agent Systems with State Constraints. International Journal of Adaptive Control and Signal Processing, vol. 31, no. 12, 2017.
3. Dong Shen, Jian-Xin Xu. A Novel Markov Chain Based ILC Analysis for Linear Stochastic Systems Under General Data Dropouts Environments. IEEE Transactions on Automatic Control, vol. 62, no. 11, 2017.
4. Dong Zhao, Dong Shen, Youqing Wang. Fault Diagnosis and Compensation for Two-Dimensional Discrete Time Systems with Sensor Faults and Time-Varying Delays. International Journal of Robust and Nonlinear Control, vol. 27, no. 16, 2017.
5. Yun Xu, Dong Shen, Xuhui Bu. Zero-Error Convergence of Iterative Learning Control Using Quantized Information. IMA Journal of Mathematical Control and Information, vol. 34, no. 3, 2017.
6. Yun Xu, Dong Shen, Xiao-Dong Zhang. Stochastic Point-to-Point Iterative Learning Control Based on Stochastic Approximation. Asian Journal of Control, vol. 19, no. 5, 2017.
7. Dong Shen, Chao Zhang. Learning Control for Discrete-Time Nonlinear Systems with Sensor Saturation and Measurement Noise. International Journal of Systems Sciences, vol. 48, no. 13, 2017.
8. Xuefang Li, Dong Shen. Two Novel Iterative Learning Control Schemes for Systems with Randomly Varying Trial Lengths. Systems & Control Letters, vol. 107, pp. 9-16, 2017.
9. Dong Shen, Yanqiong Jin, Yun Xu. Learning Control for Linear Systems under General Data Dropouts at Both Measurement and Actuator Sides: A Markov Chain Approach. Journal of the Franklin Institute, vol. 354, no. 13, 2017.
10. Dong Shen. Almost Sure Convergence of ILC for Networked Linear Systems with Random Link Failures. International Journal of Control, Automation, and Systems, vol. 15, no. 2, 2017.
11. Dong Shen, Jian Han, Youqing Wang. Stochastic Point-to-Point Iterative Learning Tracking Without Prior Information on System Matrices. IEEE Transactions on Automation Science and Engineering, vol. 14, no. 1, 2017.

12. Yun Xu, Dong Shen, Youqing Wang. On Interval Tracking Performance Evaluation and Practical Varying Sampling ILC. International Journal of Systems Science, vol. 48, no. 8, 2017.
13. Dong Shen, Jian Han, Youqing Wang. Convergence Analysis of ILC Input Sequence for Underdetermined Linear Systems. SCIENCE CHINA Information Sciences, vol. 60, 2017.
14. Dong Shen, Chao Zhang, Yun Xu. Two Compensation Schemes of Iterative Learning Control for Networked Control Systems with Random Data Dropouts. Information Sciences, vol. 381, 2017.

Online Journal Papers
15. Saurab Verma, Dong Shen, Jian-Xin Xu. Motion Control of Robotic Fish under Dynamic Environmental Conditions using Adaptive Control Approach. IEEE Journal of Oceanic Engineering.
16. Yanqiong Jin, Dong Shen. Iterative Learning Control for Nonlinear Systems with Data Dropouts at Both Measurement and Actuator Sides. Asian Journal of Control.
17. Dong Shen. Data-Driven Learning Control for Stochastic Nonlinear Systems: Multiple Communication Constraints and Limited Storage. IEEE Transactions on Neural Networks and Learning Systems.
18. Dong Shen, Chao Zhang, Yun Xu. Intermittent and Successive ILC for Stochastic Nonlinear Systems with Random Data Dropouts. Asian Journal of Control.

Conference Papers
19. Dong Shen, Jian-Xin Xu. Zero-Error Tracking of Iterative Learning Control using Probabilistically Quantized Measurements. The 2017 Asian Control Conference (ASCC2017), Gold Coast, Australia, December 17-20, 2017.
20. Dong Shen, Lanjing Wang. On Iterative Learning Tracking Problem for Multi-Sensor Systems. The 43rd Annual Conference of the IEEE Industrial Electronics Society, Beijing, China, October 29-November 1, 2017.
21. Dong Shen, Jian-Xin Xu. Iterative Learning Control for Linear Systems with Markov Data Dropouts: Noise-free Case. The 36th Chinese Control Conference (CCC2017), Dalian, China, July 26-28, 2017.
22. Chao Zhang, Dong Shen. Zero-Error Convergence of Iterative Learning Control Using Uniform Quantizer with Encoding and Decoding Method. The 36th Chinese Control Conference (CCC2017), Dalian, China, July 26-28, 2017.
23. Chiang-Ju Chien, Ying-Chung Wang, Meng-Joo Er, Ronghu Chi, Dong Shen. An Adaptive Iterative Learning Control for Discrete-Time Nonlinear Systems with Iteration-Varying Uncertainties. IEEE 6th Data Driven Control and Learning Systems Conference (DDCLS17), Chongqing, China, May 26-27, 2017.
24. Xuefang Li, Deqing Huang, Dong Shen, Jian-Xin Xu. Boundary Tracking Control for MIMO PDE-ODE Cascade Systems via Learning Control Approach. IEEE 6th Data Driven Control and Learning Systems Conference (DDCLS17), Chongqing, China, May 26-27, 2017.
25. Lanjing Wang, Dong Shen, Xuefang Li, Chiang-Ju Chien, Ying-Chung Wang. Sampled-data Iterative Learning Control for Nonlinear Systems with Iteration Varying Lengths. IEEE 6th Data Driven Control and Learning Systems Conference (DDCLS17), Chongqing, China, May 26-27, 2017.

Received: 18 March 2017 | Revised: 15 May 2017 | Accepted: 11 July 2017
DOI: 10.1002/acs.2802

RESEARCH ARTICLE

A framework of iterative learning control under random data dropouts: Mean square and almost sure convergence

Dong Shen 1, Jian-Xin Xu 2

1 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
2 Department of Electrical and Computer Engineering, National University of Singapore, Singapore

Correspondence: Dong Shen, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. Email: shendong@mail.buct.edu.cn

Funding information: National Natural Science Foundation of China; Beijing Natural Science Foundation; China Scholarship Council.

Summary

This paper addresses the iterative learning control problem under random data dropout environments. The recent progress on iterative learning control in the presence of data dropouts is first reviewed from 3 aspects, namely, data dropout model, data dropout position, and convergence meaning. A general framework is then proposed for the convergence analysis of all 3 kinds of data dropout models, namely, the stochastic sequence model, the Bernoulli variable model, and the Markov chain model. Both mean square and almost sure convergence of the input sequence to the desired input are strictly established for noise-free systems and stochastic systems, respectively, where the measurement output suffers from random data dropouts. Illustrative simulations are provided to verify the theoretical results.

KEYWORDS: almost sure convergence, Bernoulli model, data dropout, iterative learning control, Markov chain, mean square convergence, stochastic sequence

1 INTRODUCTION

Iterative learning control (ILC) is a branch of intelligent control as it can improve tracking performance whenever a given tracking task is completed repeatedly. In such a case, the tracking information and the corresponding input signal in previous iterations/cycles/batches are used to construct the input signal for the current iteration/cycle/batch, from which a learning mechanism is introduced to ensure asymptotical convergence along the iteration axis. As a consequence, ILC is much suitable for systems that can complete a given task in a finite time interval and repeat it successively. Since its introduction in the work of Arimoto et al 1 in 1984 for robot control, ILC has gained a lot of developments both in theory and applications over the past 3 decades. 2-4 Many related topics have been studied such as robust ILC, 5 distributed ILC, 6-8 monotonic convergence, 9 interval ILC, 10 and initial resetting condition, 11 among others.

As fast developments of communication and network techniques, many systems have adopted the networked control structure, that is, the controller and the plant in such systems are separated in different sites and communicate with each other through wired/wireless networks. For example, when considering an application of ILC to the robot fish in the laboratory, 12 the control algorithm is run on a computer, and the computer communicates with the robot fish through a wireless network for data and command transmission. Similar implementation goes to the unmanned aerial vehicle routine surveillance control, where the control center for updating the control signals and the unmanned aerial vehicles for continuous cruising are separated and communicate through wireless networks.
Moreover, in the studies of distributed ILC, 6-8,13 the communication of different agents is also through wireless networks. Therefore, a natural and critical problem is the data dropout, which damages the tracking performance. This problem motivates us to consider the design and analysis of ILC in the presence of random data dropouts.

Some earlier attempts have been reported. In the next section, we will give a brief literature review of related studies on ILC in the presence of data dropouts from 3 aspects, namely, data dropout model, data dropout position, and convergence meaning. From the literature review, we have observed several facts: (1) most papers adopt the classic Bernoulli model for describing data dropouts, (2) most papers assume that the data dropouts only occur at the measurement side, and (3) convergence meaning is scattered in mathematical expectation, mean square, and almost sure senses in different papers.

In this paper, we propose a new analysis framework for the ILC problem under random data dropout environments. In this framework, the random sequence model (RSM), the Bernoulli variable model (BVM), and the Markov chain model (MCM) for data dropouts are all taken into consideration. Moreover, both mean square convergence and almost sure convergence of the input sequence to the desired input are established, from which the convergence in mathematical expectation is a direct corollary. Furthermore, although we restrict our discussions to the case that data dropouts occur at the measurement side, the extension to the general case that data dropouts occur at both measurement and actuator sides is easy to establish without additional limitations on the successive data dropouts. In addition, while we consider the classic P-type algorithm in this paper to clarify our idea, the extensions to other types of ILC algorithms such as the PD type and the current-iteration-feedback-integrated type can be derived following similar steps.

We should point out that this paper focuses on control over networks, which is distinctly different from papers concerning control of networks, such as those by Meng and Moore, 7 Xiong et al, 8 and Xiong et al. 13 By control over networks, we mean the control signal is transmitted through networks, whereas by control of networks, we mean the control is constructed for a multiagent system (consisting of several agents or subsystems). Thus, the 2 problems have different research concerns.

This paper is arranged as follows. Section 2 presents a brief literature review of the contributions on ILC in the presence of data dropouts. Section 3 provides the problem formulation including system formulation, data dropout models, and the control objective. The detailed convergence analysis under the new framework for linear systems without and with stochastic noise is elaborated in Sections 4 and 5. Illustrative simulations are given in Section 6. Section 7 concludes this paper.

Notation. $R$ is the set of real numbers, and $R^n$ is the $n$-dimensional space. $P(\cdot)$ denotes the probability of its indicated event, and $E$ denotes the mathematical expectation of its indicated random variable. $I_n$ denotes the unit matrix with dimension $n \times n$. The subscript $n$ may be omitted where no confusion exists. $0_{m \times n}$ denotes the zero matrix with dimension $m \times n$, and it is abbreviated as $0_n$ when $n = m$. The superscript $T$ is used to denote the transpose of a vector or a matrix. For a vector $x$, $\|x\|^2 = x^T x$ denotes the squared Euclidean norm with $\|x\| = \sqrt{x^T x}$, and $\|x\|_M = \sqrt{x^T M x}$ denotes a weighted norm with respect to a positive definite matrix $M$.

2 LITERATURE REVIEW

In this section, we give a brief literature review on ILC for systems with random data dropouts and classify the contributions of the existing papers from 3 aspects, namely, random data dropout models, data dropout positions, and convergence meaning.
From these 3 dimensions, we can get a comprehensive picture of the state of the art.

2.1 Data dropout models

There are only 2 states describing the transmission: successful transmission and loss. Thus, if we introduce a random variable to describe the data dropouts, it is a binary variable. Usually, we let the variable be 1 if the corresponding data packet is successfully transmitted through the wired/wireless networks; we let the variable be 0 otherwise. Moreover, such a variable is inherently random, and thus, we should introduce some additional model for the binary variable to give a characterization of the randomness of data dropouts.

The most popular model for the data dropout should go to the Bernoulli model. In this model, the random variable takes the value of 1 with success probability $p$ and the value of 0 with failure probability $q = 1 - p$. Moreover, the data dropouts for different packets occur independently. In other words, this model has a clear probability distribution and good independence. Therefore, it is widely used in many papers addressing the data dropout topic. Most ILC papers also adopted this model with/without extra requirements on data dropouts.

There are a few ILC papers dropping this model. Pan et al 28 gave an elaborate investigation of the effect of data dropouts. Thus, the authors mainly considered the case that only a single packet was lost during the transmission and provided a specific derivation for the effect on the input error and tracking performance. As to the multiple-packet-loss case, a general discussion

was given instead of strict analysis and description. Specifically, the authors claimed that the data dropout level should be far smaller than 100% to ensure a satisfactory tracking performance. The works of Shen and Wang 29,30 provided a so-called RSM for data dropouts. Specifically, the sequence of the data dropout variables along the iteration axis was not assumed to obey any specific probability distribution. In other words, the statistical property of the data dropouts can vary along the iteration axis. Thus, the steady distribution in the Bernoulli model is removed. However, to ensure asymptotical convergence of the input sequence, an additional requirement was imposed on the data dropout model in the works of Shen and Wang 29,30: that there should exist a sufficiently large number K such that during any successive K iterations, at least one data dropout variable takes the value of 1. In other words, the data should be successfully transmitted from time to time.

There is another model for data dropouts, ie, the MCM, which has been used in some papers addressing other control strategies. In this model, the data dropouts have some dependence on the previous event. That is, the loss or not of the current packet would affect the probability of successful transmission for the next packet. In the ILC under data dropouts, this model has not been discussed.

2.2 Data dropout positions

In the networked ILC, the plant and the learning controller are separated in different sites and communicate with each other through wired/wireless networks. Thus, there are 2 channels connecting the plant and the learning controller. One channel is at the measurement side to transmit the measured output information back to the learning controller. The other channel is at the actuator side to transmit the generated input signal to the plant so that the operation process can continuously run.

When considering the data dropout problem for ILC, the position at which data dropout occurs is usually assumed to be the measurement side. In other words, only the network at the measurement side is assumed to be lossy, and the network at the actuator side is assumed to work well in most papers. 19,20,23-26,29,30 In these papers, the generated input signal can always be sent to the plant without any loss. Although some papers claimed that their results can be extended to the general case where the networks at the measurement and actuator sides suffered random data dropouts, it is actually not a trivial extension. Specifically, when the network at the measurement side suffers random data dropouts, the output signal of the plant may or may not be successfully transmitted. One simple mechanism for treating the measured data is as follows: if the measured output is successfully transmitted, then the learning controller would employ such information for updating; if the measured output is lost during transmission, then the learning controller would stop updating until the corresponding output information is successfully transmitted. One may find that the lost data are simply replaced by 0 in this mechanism. For the case that data dropout occurs only at the measurement side, such a simple mechanism is sufficient to ensure the learning process as long as the network is not completely broken down. However, when considering the data dropout at the actuator side, it is clear that the lost input signal cannot be simply replaced by 0 as it would greatly damage the tracking performance.
That is, if the network at the actuator side suffers data dropouts, the lost input signal must be compensated with a suitable packet to maintain the operation process of the plant. This observation motivates the investigation on compensation mechanisms for the lost data. 18,21,22,28

Pan et al 28 gave an earlier attempt on compensating the lost data. When 1 packet of the input signal is lost at the actuator side, the one-time-instant-ahead input signal is applied to compensate for the lost one. That is, if the input at time instant t is lost, it would be compensated with the input at time instant t - 1. When 1 packet of the output signal is lost at the measurement side, a similar compensation mechanism is applied. It is worth noting that the data dropouts at the measurement side and the actuator side are separately discussed in the work of Pan et al. 28 Moreover, this mechanism was then adopted by Bu et al 18 for a Bernoulli model of random data dropouts occurring at both the measurement and actuator sides simultaneously. We should emphasize that, as a natural consequence, the data at adjacent time instants of the same iteration cannot be dropped simultaneously due to the inherent compensation requirement.

Another compensation mechanism is to apply the corresponding data from the last iteration, as shown in the works of Huang and Fang 21 and Liu and Ruan. 22 That is, if the data packet at the kth iteration is lost during transmission, it is compensated with the packet at the (k - 1)th iteration with the same time instant label. In such an assumption, successive data dropouts along the time axis are allowed; however, it restricts that there is no simultaneous data dropout at the same time instant across any 2 adjacent iterations. In other words, no successive data dropouts along the iteration axis are allowed.
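For concreteness, the sketch below restates these compensation rules as simple lookups (a Python sketch; the function names and the zero fallback are my own labels, not code or terminology from the cited papers). The third rule is the iteration-latest mechanism discussed in the next paragraph.

```python
# The three compensation rules, restated as lookups over stored inputs;
# names and the zero fallback are illustrative assumptions.
def time_ahead(u, k, t):
    # Pan et al: a lost input at time t is replaced by the input at t - 1 of
    # the same iteration (so adjacent instants must not both be lost).
    return u[k][t - 1]

def last_iteration(u, k, t):
    # Huang and Fang / Liu and Ruan: reuse iteration k - 1 at the same time
    # instant (no loss at the same t across 2 adjacent iterations).
    return u[k - 1][t]

def latest_available(received, k, t):
    # Shen and Xu: hold the iteration-latest successfully received packet,
    # which tolerates successive dropouts along both axes.
    for j in range(k, -1, -1):
        if received[j][t] is not None:
            return received[j][t]
    return 0.0  # nothing received yet: fall back to the initial input
```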

In short, the contributions in the aforementioned works 18,21,22,28 show that the newly introduced compensation mechanisms impose additional limitations on the data dropout models. In fact, the inherent difficulty of convergence analysis lies in the asynchronism between the computed input of the learning controller and the actual input fed to the plant. A recent paper 27 solved this problem according to the Bernoulli model allowing successive data dropouts along both time and iteration axes and provided a simple compensation mechanism with the iteration-latest available packet.

2.3 Convergence meaning

In this subsection, we review the analysis techniques and the related convergence results, particularly the convergence meaning in considering the randomness of data dropouts apart from optional stochastic noise.

Ahn et al provided earlier attempts on ILC for linear systems in the presence of data dropouts. 14-16 The Kalman filtering based technique, which was first proposed by Saab, 31 was applied, and thus, the mean square convergence of the input sequence was obtained. The main difference among the aforementioned papers lies in the position where data dropouts occur. Specifically, in the first paper, 14 the output vector was assumed to be lossy; in the second paper, 15 this assumption was relaxed to the case where only partial dimensions of the output may suffer data dropouts; and in the third paper, 16 the data dropouts at both the measurement and actuator sides were taken into account. In short, the Kalman filtering based technique was deeply investigated in the series of works by Ahn et al.

Bu et al gave different angles to solve this problem. In the first paper, 17 the exponential stability result of asynchronous dynamical systems 49 was referred to establish the convergence condition of ILC under data dropouts. As a result, the randomness of data dropouts was not involved in the analysis steps. In the second paper, 18 such randomness was eliminated from the recursion by taking mathematical expectation; thus, the algorithm was converted into a deterministic type, and then, the design and analysis of the convergence followed the conventional way. Therefore, the convergence was clearly in the mathematical expectation sense. In the third paper, 19 a new H-infinity framework was defined along the iteration axis, and then, the related control problem was solved in the newly defined framework. That is, the kernel objective was to satisfy an H-infinity performance index in the mean square sense. A linear matrix inequality design condition for the learning gain matrix was also provided. In the fourth paper, 20 the widely used 2-dimensional system approach was revisited to deal with data dropouts. A mean square asymptotically stable result was obtained, and the design condition for the learning gain matrix was solved through linear matrix inequality techniques. In short, the evolution dynamics along the iteration axis was carefully studied, and related techniques are applied for the design and analysis of ILC.

There are some other scattered results on this topic. 21-23,28 Pan et al 28 proposed a detailed analysis of the effect of packet loss for the sampled ILC. Specifically, a single packet loss at the measurement side and the actuator side was evaluated separately to study the inherent influence of data dropout on the tracking performance. In other words, a deterministic analysis was given according to the input error.
The results in the work of Pan et al 28 revealed that neither contraction nor expansion occurred for the input error if the corresponding packet was lost during transmission. Such a technique was further exploited and used in the work of Huang and Fang 21 to study the general data dropout case. In the work of Liu and Ruan, 22 a mathematical expectation was taken on the recursive inequality of the input error to eliminate the randomness of data dropouts, similar to the work of Bu et al, 18 and then, the conventional contraction mapping method was used to derive the convergence results. Moreover, to construct an explicit contraction mapping, the conditions in the work of Liu and Ruan 22 were much conservative, and they may be further relaxed. Similar techniques were also used in the work of Liu and Xu, 23 in incorporation with the conventional α-norm technique, to derive convergence in the mathematical expectation sense.

Shen et al mainly contributed the almost sure convergence results of ILC under data dropout environments. In the work of Shen and Wang, 24 a simple case that the whole iteration was packed and transmitted as a single packet was investigated by a switched system approach. Specifically, the evolution along the iteration axis was formulated as a switched system, and the statistical properties were recursively computed. Then, the convergence in the sense of expectation, mean square, and almost sure was established in turn. In the work of Shen and Wang, 29 based on stochastic approximation theory, the almost sure convergence of the input sequence was proved for the case that the data dropouts were modeled by an RSM. This result was then extended to the unknown control direction case in the work of Shen and Wang. 30 For the traditional Bernoulli model of data dropouts, the essential difficulty in obtaining the almost sure convergence lies in the random successive data dropouts along the iteration axis. This problem was solved in the works of Shen et al 25,26 for linear and nonlinear stochastic systems, respectively. The authors of these papers proceeded to investigate the general data dropouts at both measurement and actuator sides without any additional requirements but the Bernoulli assumption in the work of Shen and Xu. 27 When data dropouts occur at the actuator sides, there is a newly introduced asynchronism between the computed control generated by the learning controller and the actual control

fed to the plant. Such asynchronism was characterized by a Markov chain in the aforementioned work, 27 and then, the mean square and almost sure convergence was established.

TABLE 1 Classification of the papers on iterative learning control under data dropouts. Columns: Model (RSM, BVM, MCM); Position (Measurement, Actuator); Convergence (ME, MS, AS, DA). Rows: Ahn et al, 14 Ahn et al, 15 Ahn et al, 16 Bu et al, 20 Bu and Hou, 17 Bu et al, 18 Bu et al, 19 Huang and Fang, 21 Liu and Ruan, 22 Liu and Xu, 23 Pan et al, 28 Shen and Wang, 24 Shen and Wang, 29 Shen and Wang, 30 Shen and Xu, 27 Shen et al, 25 Shen et al. 26 (Per-paper check marks lost in transcription.) Abbreviations: AS, almost sure; BVM, Bernoulli variable model; DA, deterministic analysis; MCM, Markov chain model; ME, mathematical expectation; MS, mean square; RSM, random sequence model.

2.4 Further remarks

The recent progress on ILC in the presence of data dropouts is classified in Table 1 according to the data dropout model, data dropout position, and convergence meaning. From this table, we have observed several points.

In most papers, the data dropout is modeled by the Bernoulli random variable, while the results according to the RSM are rather limited. Moreover, for the MCM, no result has been reported. All the papers consider the data dropout occurring at the measurement side, and only a few papers address the case at the actuator side. As we have previously explained, the latter case would involve an essential influence on the controller design and convergence analysis. The convergence meaning is scattered in different papers. Mean square and almost sure convergence imply the convergence in the mathematical expectation sense. However, they cannot imply each other according to probability theory. Thus, it is of interest to propose an in-depth framework for the design and analysis of ILC in both senses simultaneously.

Based on this progress, we will propose a comprehensive framework for the convergence analysis of ILC under various data dropout models. In contrast to the current status, we have the following highlights. First of all, the new framework is applicable to all the proposed models of data dropouts, and thus, the blank for the MCM is filled (differing from almost all relevant papers). Moreover, our method can be extended to the actuator-side case without imposing further restrictions on the successive data dropouts (differing from the one-side data dropout papers 17,19,20,25,26,29,30 and restricted two-side data dropout papers 18,21,22,28). Furthermore, we will reveal the essential connection between the convergence results in the mean square and almost sure senses and then establish the convergence results for both noise-free and noised systems, respectively (differing from the mathematical expectation based convergence papers). In short, the ILC problem under data dropouts is deeply discussed and resolved in this paper.

3 PROBLEM FORMULATION

In this section, we will formulate the system, models for data dropouts, and the control objective in turn.

3.1 System formulation

Consider the following linear time-varying system:
$$x_k(t+1) = A_t x_k(t) + B_t u_k(t) + w_k(t+1), \quad y_k(t) = C_t x_k(t) + v_k(t), \qquad (1)$$
where $k$ is the iteration number, $k = 1, 2, \ldots$, $t$ is the time instant, $t = 0, 1, \ldots, N$, and $N$ is the iteration length. The variables $x_k(t) \in R^n$, $u_k(t) \in R^p$, and $y_k(t) \in R^q$ are the system state, input, and output, respectively. The notations $w_k(t) \in R^n$ and $v_k(t) \in R^q$ are the system and measurement noise, respectively. In addition, $A_t$, $B_t$, and $C_t$ are system matrices with appropriate dimensions. If the stochastic noise values $w_k(t)$ and $v_k(t)$ are absent, ie, $w_k(t) = v_k(t) = 0$, $\forall k, t$, we term the system a noise-free system. Otherwise, if the variables $w_k(t)$ and $v_k(t)$ are described by random variables, we term the system a stochastic system.

In this paper, we assume that the system relative degree is $\tau$, $\tau \ge 1$, that is, for any $t \ge \tau$, we have
$$C_t A_{t+1-i}^{t-1} B_{t-i} = 0, \quad 1 \le i \le \tau - 1, \qquad (2)$$
$$C_t A_{t+1-\tau}^{t-1} B_{t-\tau} \ne 0, \qquad (3)$$
where $A_i^j \triangleq A_j A_{j-1} \cdots A_i$, $j \ge i$, and $A_{i+1}^i \triangleq I_n$.

Remark 1. The relative degree implies the smallest structural delay of the input effect on its corresponding output. For example, if the relative degree $\tau = 1$, the input at time instant $t$ would have an effect on the output at time instant $t + 1$ but no effect on the output at time instant $t$. The relative degree is an intrinsic property of the system and, thus, is usually time invariant. Moreover, assuming the relative degree to be $\tau$ and starting the operation from the time instant $t = 0$, we find that the first controllable output appears at time instant $t = \tau$, which is driven by $u_k(0)$. In other words, the outputs at time $t = 0$ up to $t = \tau - 1$ are uncontrollable in such a situation. As a consequence, these outputs would be formulated in the initialization condition. In addition, considering the MIMO system formulation, the relative degree may vary for different dimensions of the output vector, that is, different dimensions of the output vector have different relative degree values. It is straightforward to extend the following derivations to this case. Therefore, we omit the tedious extensions to make a concise layout.

Denote the desired reference as $y_d(t)$, $t \in \{0, 1, \ldots, N\}$. Without loss of generality, we assume that the reference is achievable, that is, with a suitable initial value of $x_d(0)$, there exists a unique input $u_d(t)$ such that
$$x_d(t+1) = A_t x_d(t) + B_t u_d(t), \quad y_d(t) = C_t x_d(t). \qquad (4)$$
Denote the tracking error as $e_k(t) \triangleq y_d(t) - y_k(t)$, $t \in \{0, 1, \ldots, N\}$.

Remark 2. Note that the system relative degree is $\tau$, implying that the outputs at time instant $t = 0$ up to $t = \tau - 1$ cannot be affected by the input. Therefore, the actual tracking reference is $y_d(t)$, $\tau \le t \le N$, whereas the initial $\tau$ outputs from $t = 0$ up to $\tau - 1$ are regulated by the initialization condition. Moreover, the uniqueness of the desired input $u_d(t)$ can be guaranteed if the matrix $C_t A_{t+1-\tau}^{t-1} B_{t-\tau}$ is of full-column rank. That is, the input $u_d(t)$ can be recursively computed from the nominal model (4) for $t \ge \tau$ as follows:
$$u_d(t-\tau) = \left[\left(C_t A_{t+1-\tau}^{t-1} B_{t-\tau}\right)^T \left(C_t A_{t+1-\tau}^{t-1} B_{t-\tau}\right)\right]^{-1} \left(C_t A_{t+1-\tau}^{t-1} B_{t-\tau}\right)^T \left(y_d(t) - C_t A_{t-\tau}^{t-1} x_d(t-\tau)\right). \qquad (5)$$
The special case of Equation 5, with $\tau$ being 1, has been explicitly given in many existing papers. 31,32 It should be emphasized that the full-column rank requirement is not strict as it has been proved necessary for perfect tracking.
33,34 As a consequence, formulation (4) is a mild assumption for the system, which has been used in many existing ILC papers. When the coupling matrix is of full-row rank rather than full-column rank, which usually implies that the dimension of the input is greater than that of the output, it is found that only the asymptotical convergence of the tracking error is ensured in many papers

(see, eg, the work of Saab 35). Moreover, recent papers 36,37 have extended the rank conditions of coupling matrices from an iteration-invariant case to an iteration-varying case, which is a promising issue in handling nonrepetitive uncertainties.

The following mild assumptions are given for system (1).

Assumption 1. The system initial value satisfies $x_k(0) = x_d(0)$, where $x_d(0)$ is consistent with the desired reference $y_d(0)$ in the sense that $y_d(0) = C_0 x_d(0)$.

Remark 3. This initialization condition is critical for ensuring the accurate tracking performance of the whole iteration and, thus, is an important issue in the ILC field. Assumption 1 is the well-known identical initialization condition. This condition is a basic requirement for time and space resetting of the system operation and, thus, is widely used in most ILC papers. Moreover, many scholars have contributed to relaxing this condition by introducing initial rectifying or learning mechanisms; however, either additional system information or tracking information is required when using the initial learning mechanisms. 38,39 Note that the focus of this paper is in proposing the comprehensive analysis of ILC under data dropout environments; thus, we use Assumption 1 to make the paper concentrated.

Define the σ-algebra $\mathcal{F}_k = \sigma(x_i(t), u_i(t), y_i(t), w_i(t), v_i(t), 1 \le i \le k, 0 \le t \le N)$ (ie, the set of all events induced by these random variables) for $k \ge 1$.

Assumption 2. The stochastic noise variables $\{w_k(t)\}$ and $\{v_k(t)\}$ are martingale difference sequences along the iteration axis with finite conditional second moments. That is, for $t \in \{0, 1, \ldots, N\}$,
$$E\{w_{k+1}(t) \mid \mathcal{F}_k\} = 0, \quad \sup_k E\{\|w_{k+1}(t)\|^2 \mid \mathcal{F}_k\} < \infty,$$
$$E\{v_{k+1}(t) \mid \mathcal{F}_k\} = 0, \quad \sup_k E\{\|v_{k+1}(t)\|^2 \mid \mathcal{F}_k\} < \infty.$$

Remark 4. The system for which the ILC method is applicable should be repeatable so that the tracking performance can be gradually improved along the iteration axis. Consequently, the stochastic noise variables are usually independent along the iteration axis, from which Assumption 2 is mild and widely satisfied in practical applications. It is evident that the classical zero-mean white noise satisfies this assumption.

To facilitate the analysis in the following sections, we give the lifted forms of system (1). To this end, define the super-vectors as follows:
$$U_k = \left[u_k^T(0), u_k^T(1), \ldots, u_k^T(N-\tau)\right]^T, \qquad (6)$$
$$Y_k = \left[y_k^T(\tau), y_k^T(\tau+1), \ldots, y_k^T(N)\right]^T. \qquad (7)$$
Similarly, $U_d$ and $Y_d$ can be defined by replacing the subscript $k$ in the above equations with $d$. The associated transfer matrix $H$ can be formulated as
$$H = \begin{bmatrix} C_\tau A_1^{\tau-1} B_0 & 0_{q \times p} & \cdots & 0_{q \times p} \\ C_{\tau+1} A_1^{\tau} B_0 & C_{\tau+1} A_2^{\tau} B_1 & \cdots & 0_{q \times p} \\ \vdots & \vdots & \ddots & \vdots \\ C_N A_1^{N-1} B_0 & C_N A_2^{N-1} B_1 & \cdots & C_N A_{N-\tau+1}^{N-1} B_{N-\tau} \end{bmatrix}. \qquad (8)$$
Therefore, we have the following relationship between the input and the output:
$$Y_k = H U_k + M x_k(0) + \xi_k \qquad (9)$$
and
$$Y_d = H U_d + M x_d(0), \qquad (10)$$
where $M = [(C_\tau A_0^{\tau-1})^T, \ldots, (C_N A_0^{N-1})^T]^T$, and
$$\xi_k = \left[\left(\sum_{i=1}^{\tau} C_\tau A_i^{\tau-1} w_k(i) + v_k(\tau)\right)^T, \left(\sum_{i=1}^{\tau+1} C_{\tau+1} A_i^{\tau} w_k(i) + v_k(\tau+1)\right)^T, \ldots, \left(\sum_{i=1}^{N} C_N A_i^{N-1} w_k(i) + v_k(N)\right)^T\right]^T. \qquad (11)$$
Recalling the tracking error $e_k(t) = y_d(t) - y_k(t)$, we denote the lifted tracking error $E_k \triangleq Y_d - Y_k$. Then, it is evident that
$$E_k = Y_d - Y_k = H(U_d - U_k) - \xi_k, \qquad (12)$$
where Assumption 1 is applied. These formulations will be used in the convergence analysis only.
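As a quick sanity check of the lifted formulation, the following sketch (Python; the horizon, relative degree, and system matrices are invented solely for the example) assembles H and M as in Equations 8 and 9 and confirms that, for an achievable reference, the desired input is recovered from Equation 10, just as the blockwise recursion (5) would give:

```python
import numpy as np

# A minimal numerical check of the lifted model (6)-(12); all numbers are
# invented for illustration.
N, tau, n, p, q = 5, 1, 2, 1, 1
A = [np.array([[0.8, 0.1], [0.0, 0.9]]) for _ in range(N)]
B = [np.array([[0.0], [1.0]]) for _ in range(N)]
C = [np.array([[1.0, 0.5]]) for _ in range(N + 1)]

def A_prod(i, j):
    # A_i^j = A_j A_{j-1} ... A_i for j >= i, with A_{i+1}^i = I_n.
    P = np.eye(n)
    for s in range(i, j + 1):
        P = A[s] @ P
    return P

rows = N + 1 - tau
H = np.zeros((rows * q, rows * p))
for r in range(rows):        # block row r <-> output at time tau + r
    for c in range(r + 1):   # block col c <-> input at time c
        H[r*q:(r+1)*q, c*p:(c+1)*p] = C[tau + r] @ A_prod(c + 1, tau + r - 1) @ B[c]

M = np.vstack([C[t] @ A_prod(0, t - 1) for t in range(tau, N + 1)])

# For an achievable reference (4), Y_d = H U_d + M x_d(0) by (10), so U_d is
# recovered by least squares; recursion (5) gives the same answer blockwise.
rng = np.random.default_rng(0)
x_d0 = np.zeros((n, 1))
U_true = rng.standard_normal((rows * p, 1))
Y_d = H @ U_true + M @ x_d0
U_d = np.linalg.lstsq(H, Y_d - M @ x_d0, rcond=None)[0]
print(np.allclose(U_d, U_true))  # True
```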

FIGURE 1 Block diagram of the networked iterative learning control

3.2 Models for data dropouts

In this subsection, we give the 3 common models of data dropouts and detail the differences among the models. Prior to the formulation, we first present the networked structure of ILC. The block diagram of the networked ILC considered in this paper is given in Figure 1, in which, without loss of generality, only the network at the measurement side is assumed to be lossy, whereas the network at the actuator side is assumed to work well. The extension to the case that the networks at both sides suffer from the data dropout problem can be derived following the similar steps in this paper (cf Remark 10). Therefore, to make our idea easy to follow, we restrict our discussions to the one-side data dropout case.

The data dropout occurring or not, which could be regarded as a switch that opens and closes the network in a random manner, is denoted by a random variable $\gamma_k(t)$. Therefore, there are 2 possible states of the variable $\gamma_k(t)$. Specifically, we let $\gamma_k(t)$ be equal to 1 if the corresponding tracking error $y_k(t)$ is successfully transmitted and to 0 otherwise. In this paper, without loss of generality, the information for each time instant is packed as a data packet and transmitted, that is, $y_k(t)$ also denotes an individual data packet containing the output information at time instant $t$ for the $k$th iteration. In this paper, we consider the following 3 most common models of data dropouts.

Random sequence model (RSM): For each $t$, the measurement packet loss is random without obeying any certain probability distribution, but there is a positive integer $K \ge 1$ such that, at least in 1 iteration, the measurement is successfully sent back during the successive $K$ iterations.

Bernoulli variable model (BVM): The random variable $\gamma_k(t)$ is independent for different values of the time instant $t$ and the iteration number $k$. Moreover, $\gamma_k(t)$ obeys a Bernoulli distribution with
$$P(\gamma_k(t) = 1) = \bar{\gamma}, \quad P(\gamma_k(t) = 0) = 1 - \bar{\gamma}, \qquad (13)$$
where $\bar{\gamma} = E\gamma_k(t)$ with $0 < \bar{\gamma} < 1$.

Markov chain model (MCM): The random variable $\gamma_k(t)$ is independent for different values of the time instant $t$. Moreover, for each $t$, the evolution of $\gamma_k(t)$ along the iteration axis follows a 2-state Markov chain, of which the probability transition matrix is
$$P = \begin{bmatrix} P_{11} & P_{10} \\ P_{01} & P_{00} \end{bmatrix} = \begin{bmatrix} \mu & 1-\mu \\ 1-\nu & \nu \end{bmatrix} \qquad (14)$$
with $0 < \mu, \nu < 1$, where $P_{11} = P(\gamma_{k+1}(t) = 1 \mid \gamma_k(t) = 1)$, $P_{10} = P(\gamma_{k+1}(t) = 0 \mid \gamma_k(t) = 1)$, $P_{01} = P(\gamma_{k+1}(t) = 1 \mid \gamma_k(t) = 0)$, and $P_{00} = P(\gamma_{k+1}(t) = 0 \mid \gamma_k(t) = 0)$.
In fact, this model means that the output information should not be lost too much to ensure the learning ability in a somewhat deterministic point of view. Remar 6. The number K of the RSM indicates that the maximum length of successive data dropouts is K 1. Thus, the case K = 1 means no data dropout occurring, whereas the case K = 2 means no successive data dropout occurring for any (14)

FIGURE 2 Illustration of the random sequence model

2 subsequent iterations. Moreover, the value of the successive iteration number $K$ is a reflection of the rate of data dropouts. However, it is not equivalent to the data dropout rate (DDR), which can be formulated as $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \left(1 - \gamma_k(t)\right)$. In fact, DDR denotes the average level of data dropouts along the iteration axis, whereas $K$ implies the worst case of successive data dropouts. In other words, a larger $K$ value usually corresponds to a higher DDR, whereas a smaller $K$ value usually corresponds to a lower DDR. However, the connection between $K$ and DDR need not necessarily be positively related.

Remark 7. The mathematical expectation $\bar{\gamma}$ of the BVM is closely related to the DDR in light of the law of large numbers, that is, DDR is equal to $1 - \bar{\gamma}$. Specifically, the data dropout is independent along the iteration axis; thus, $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \left(1 - \gamma_k(t)\right) = 1 - E\gamma_k(t) = 1 - \bar{\gamma}$. If $\bar{\gamma} = 0$, implying that the network is completely broken down, then no information can be received from the plant, and thus, no algorithm can be applied to improve the tracking performance. If $\bar{\gamma} = 1$, implying that no data dropout occurs, then the framework converts into the classical ILC problem. In this paper, with a framework for designing and analyzing the ILC update law under data dropouts, we simply assume $0 < \bar{\gamma} < 1$. Moreover, the statistical property of $\gamma_k(t)$ is assumed to be identical for different time instants for a concise expression. The extension to the time-dependent case, ie, the case that $E\gamma_k(t) = \bar{\gamma}_t$, is straightforward without additional efforts.

Remark 8. The MCM is general for modeling the data dropouts. The transition probabilities $\mu$ and $\nu$ denote the average level of retaining the same state for successful transmission and loss, respectively. If $\mu + \nu = 1$, then the MCM converts into the BVM. That is, the BVM is a special case of the MCM. It is worth pointing out that all 3 models are widely investigated in the field of networked control systems, such as in the works of Lin and Antsaklis 40 for the RSM, Sinopoli et al 41 for the BVM, and Shi and Yu 42 for the MCM.

Remark 9. In this remark, we comment on the differences among the 3 models. The RSM differs from both the BVM and the MCM as it requires no probability distribution or statistical property of the random variable $\gamma_k(t)$. However, the RSM pays the price that the successive data dropout length is bounded, compared with the BVM and MCM. Specifically, both the BVM and the MCM admit arbitrary successive data dropouts associated with a suitable occurring probability. Consequently, the RSM cannot cover the BVM/MCM, and vice versa. It should be pointed out that the RSM implies that the data dropout is not totally stochastic. Moreover, the difference between the BVM and the MCM lies in the point that the data dropout occurs independently along the iteration axis for the BVM, while dependently for the MCM. The independence of data dropout admits some specific computations such as mean and variance (compare with the work of Shen et al 43) and then derives the convergence analysis. Such a technique is not applicable for the MCM.
Using this updating mechanism, the following convergence analysis can be extended to the general data dropout case. Specifically, when data dropout occurring at the actuator side, it is seen that the input signal generated by the controller and the one fed to the plant are not always identical. Such asynchronism between the 2 input signals can be analyzed following similar steps as in the wor of Shen and Xu 27 and shown to be bounded (for the RSM model) or Marovian (for BVM and MCM models). Thus, the analysis techniques proposed in this paper can be applied.

3.3 Control objective

The conventional control objective of ILC for a noise-free system is to construct an update law such that the generated input sequence can guarantee the asymptotical precise tracking to the desired reference, that is, the output $y_k(t)$ can track the given trajectory $y_d(t)$ asymptotically for the specified time instants. However, when dealing with stochastic systems, it is impossible to achieve this control objective because of the existence of the unpredictable stochastic noise variables $w_k(t)$ and $v_k(t)$. That is, we cannot expect that $y_k(t) \to y_d(t)$, $\forall t$, for stochastic systems, as the iterations increase to infinity. Note that the stochastic noise variables cannot be eliminated by any algorithm in advance; thus, the best achievable control objective should ensure that the desired reference can be precisely tracked by the output once these stochastic noise variables are removed. To this end, the control objective in this paper is to design an ILC algorithm guaranteeing the precise tracking of the input rather than the output, that is, $u_k(t) \to u_d(t)$ as $k \to \infty$, $t = 0, \ldots, N - \tau$. In fact, if we can guarantee that $u_k(t) \to u_d(t)$, then the following averaged index of tracking errors is minimized:
$$V_t = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \|e_k(t)\|^2 = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \|y_d(t) - y_k(t)\|^2, \quad t \ge \tau.$$
In addition, if the stochastic noise variables are absent, then the precise convergence of the input guarantees the precise convergence of the output.

Moreover, when considering the stochastic systems, it is clear that all the inputs, states, and outputs are random variables. Even if the stochastic noise variables are removed, the random data dropouts also result in the inputs, states, and outputs being random variables. Therefore, we should clarify the convergence meaning from the viewpoint of probability theory. Specifically, we have the following 3 types of convergence.

Convergence in mathematical expectation: the input sequence $\{u_k(t)\}$ is said to achieve convergence in mathematical expectation if $\lim_{k\to\infty} E u_k(t) = u_d(t)$ for $t = 0, \ldots, N - \tau$.

Mean square convergence: the input sequence $\{u_k(t)\}$ is said to achieve mean square convergence if $\lim_{k\to\infty} E\|u_k(t) - u_d(t)\|^2 = 0$ for $t = 0, \ldots, N - \tau$.

Almost sure convergence: the input sequence $\{u_k(t)\}$ is said to achieve almost sure convergence if $\lim_{k\to\infty} u_k(t) = u_d(t)$ with probability 1 for $t = 0, \ldots, N - \tau$.

As is well known, both mean square convergence and almost sure convergence imply the convergence in the mathematical expectation sense. Thus, if we establish either mean square convergence or almost sure convergence, then the convergence in mathematical expectation is a direct corollary. However, mean square convergence and almost sure convergence cannot imply each other generally. Therefore, in the rest of this paper, our analysis objective is to show the mean square convergence and almost sure convergence of the proposed ILC algorithms under all 3 data dropout models.

3.4 Preliminaries

Lemma 1. Let $\{\vartheta_k\}$ be a sequence of positive real numbers such that
$$\vartheta_{k+1} \le (1 - d_1 a_k)\vartheta_k + d_2 a_k^2 (d_3 + \vartheta_k), \qquad (15)$$
where $d_i > 0$, $i = 1, 2, 3$, are constants, and $a_k$ satisfies $a_k > 0$, $\sum_{k=1}^{\infty} a_k = \infty$, and $\sum_{k=1}^{\infty} a_k^2 < \infty$; then $\lim_{k\to\infty} \vartheta_k = 0$.

The proof of this lemma is put in the Appendix for smooth readability.

Lemma 2 (see reference 44). Let $X(n)$ and $Z(n)$ be nonnegative stochastic processes (with finite expectation) adapted to the increasing σ-algebras $\{\mathcal{F}_n\}$ and such that
$$E\{X(n+1) \mid \mathcal{F}_n\} \le X(n) + Z(n), \qquad (16)$$
$$\sum_{n=1}^{\infty} E[Z(n)] < \infty. \qquad (17)$$
Then, $X(n)$ converges almost surely as $n \to \infty$.
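Lemma 1 is easy to visualize numerically. The sketch below (Python; the constants and the gain choice a_k = 1/k are arbitrary but satisfy the lemma's conditions) iterates recursion (15) with equality and shows ϑ_k shrinking toward 0:

```python
# Numerical illustration of Lemma 1 with a_k = 1/k, which satisfies a_k > 0,
# sum a_k = infinity, and sum a_k^2 < infinity; the constants d1, d2, d3 are
# arbitrary choices for this sketch.
d1, d2, d3 = 0.5, 2.0, 1.0
theta = 10.0
for k in range(1, 100_001):
    a = 1.0 / k
    # iterate recursion (15) with equality, ie, the worst admissible case
    theta = (1 - d1 * a) * theta + d2 * a**2 * (d3 + theta)
print(theta)  # small and still shrinking: theta_k -> 0, as Lemma 1 asserts
```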

4 CONVERGENCE OF THE NOISE-FREE LINEAR SYSTEM

In this section, we consider the case that the stochastic noise variables are absent in Equation 1, that is, we consider the noise-free system, ie,
$$x_k(t+1) = A_t x_k(t) + B_t u_k(t), \quad y_k(t) = C_t x_k(t). \qquad (18)$$
For such a system, the randomness results only from the data dropouts, which provides us a concise view to address the influences of data dropouts and stochastic noise.

The P-type ILC update law is designed as follows:
$$u_{k+1}(t) = u_k(t) + \sigma \gamma_k(t+\tau) L_t e_k(t+\tau), \qquad (19)$$
for $t = 0, \ldots, N - \tau$, where $\sigma$ is a positive constant to be specified later, and $L_t \in R^{p \times q}$ is the learning gain matrix for regulating the control direction.

Remark 11. First, we emphasize again that the ILC update law is not limited to the classical P-type law, although we mainly focus on such a type in this paper to make a concise expression. Second, it is evident that the design of the positive constant $\sigma$ can be blended into the design of $L_t$. However, here, we provide the separated design procedure to elaborate on a clear design principle in the following analysis of this section as well as to provide a comparison with the design for the stochastic system case in the next section.

Now, lift the input along the time axis as in Equation 6. The update law (19) can be rewritten as follows:
$$U_{k+1} = U_k + \sigma \Gamma_k L E_k, \qquad (20)$$
where $E_k = Y_d - Y_k$, defined as in Equation 12 with $\xi_k = 0$, is the stacked vector of the tracking errors for $t = \tau, \ldots, N$, and $\Gamma_k$ and $L$ are defined by
$$\Gamma_k = \begin{bmatrix} \gamma_k(\tau) I_q & & & \\ & \gamma_k(\tau+1) I_q & & \\ & & \ddots & \\ & & & \gamma_k(N) I_q \end{bmatrix}, \qquad (21)$$
$$L = \begin{bmatrix} L_0 & & & \\ & L_1 & & \\ & & \ddots & \\ & & & L_{N-\tau} \end{bmatrix}. \qquad (22)$$
Clearly, $\Gamma_k = \operatorname{diag}\{\gamma_k(\tau), \gamma_k(\tau+1), \ldots, \gamma_k(N)\} \otimes I_q$. Recalling that $E_k = H(U_d - U_k)$ and substituting this into Equation 20, we have
$$U_{k+1} = U_k + \sigma \Gamma_k L H (U_d - U_k). \qquad (23)$$
We define $\Lambda_k \triangleq \Gamma_k L H$ and
$$LH = \begin{bmatrix} L_0 C_\tau A_1^{\tau-1} B_0 & 0_p & \cdots & 0_p \\ L_1 C_{\tau+1} A_1^{\tau} B_0 & L_1 C_{\tau+1} A_2^{\tau} B_1 & \cdots & 0_p \\ \vdots & \vdots & \ddots & \vdots \\ L_{N-\tau} C_N A_1^{N-1} B_0 & L_{N-\tau} C_N A_2^{N-1} B_1 & \cdots & L_{N-\tau} C_N A_{N-\tau+1}^{N-1} B_{N-\tau} \end{bmatrix}. \qquad (24)$$
Since $LH$ is a block lower triangular matrix, it is clear that the eigenvalue set of $LH$ is a combination of the eigenvalue sets of $L_t C_{t+\tau} A_{t+1}^{t+\tau-1} B_t$, $t = 0, \ldots, N - \tau$. Moreover, $\Gamma_k$ is a block diagonal matrix; thus, $\Lambda_k = \Gamma_k L H$ is also a block lower triangular matrix with all eigenvalues being the eigenvalues of its diagonal blocks. Specifically, the eigenvalue of $\Lambda_k$ is either equal to the eigenvalue of $LH$ or equal to zero, depending on whether the corresponding variable $\gamma_k(t)$ is 1 or 0, respectively.

Note that each $\gamma_k(t)$ has 2 possible values, ie, 1 or 0, corresponding to whether the data are successfully transmitted or not; thus, $\Gamma_k$ has $\kappa \triangleq 2^{N+1-\tau}$ possible outcomes due to the independence of $\gamma_k(t)$ for different time instants. As a consequence, the newly defined $\Lambda_k = \Gamma_k L H$ also has $\kappa$ possible outcomes. Denote the set of all possible outcomes as $S = \{\Lambda^{(1)}, \ldots, \Lambda^{(\kappa)}\}$. Without loss of generality, we denote $\Lambda^{(1)} = LH$ and $\Lambda^{(\kappa)} = 0_{(N+1-\tau)p}$, corresponding to the cases that all $\gamma_k(t)$ are equal to 1 and 0, respectively. The other $\kappa - 2$ alternatives are also block lower triangular matrices similar to $LH$ but with one or more block rows of $LH$ being zero rows, corresponding to the time instants at which the packets are lost during transmission. In other words, for a matrix $\Lambda_k$, if the data packet at time instant $t$ is lost during transmission, $t \ge \tau$, then the $(t+1-\tau)$th block row of $\Lambda_k$ is a zero block row.
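Before turning to the analysis, the following sketch (Python; a toy scalar plant with invented numbers, not the paper's simulation example) runs the P-type law (19) under BVM dropouts: whenever γ_k(t+τ) = 0, the corresponding input entry is simply not updated, yet the tracking error still vanishes along the iteration axis:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy scalar noise-free plant (18) with relative degree tau = 1; every number
# here is invented for illustration.
N, n_iter, sigma, L_t, gamma_bar = 20, 300, 0.5, 1.0, 0.7
a, b, c = 0.9, 1.0, 1.0
y_d = np.sin(2 * np.pi * np.arange(1, N + 1) / N)  # reference on t = 1..N

def run_plant(u):
    x, y = 0.0, np.zeros(N)
    for t in range(N):
        x = a * x + b * u[t]  # x_k(t+1) = A_t x_k(t) + B_t u_k(t)
        y[t] = c * x          # output y_k(t+1), driven by u_k(t)
    return y

u = np.zeros(N)
for k in range(n_iter):
    e = y_d - run_plant(u)                             # e_k(t + tau)
    gamma = (rng.random(N) < gamma_bar).astype(float)  # BVM dropouts (13)
    u = u + sigma * gamma * L_t * e                    # P-type law (19)
print(np.abs(y_d - run_plant(u)).max())  # -> ~0: zero-error tracking
```

Dropped packets merely stall the update at the affected time instants without expanding the error, which matches the eigenvalue interpretation given in Remark 12 below.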

Now, we give the design condition of the learning gain matrix $L_t$, $0 \le t \le N - \tau$.

Learning gain matrix condition: In order to ensure the convergence of the P-type update law (19), the learning gain matrix $L_t$ should satisfy that $-L_t C_{t+\tau} A_{t+1}^{t+\tau-1} B_t$ is a Hurwitz matrix, where a square matrix $M$ is called Hurwitz if all the eigenvalues of $M$ have negative real parts.

Recalling the formulation of $LH$ in Equation 24, we have that all eigenvalues of $-LH$ have negative real parts if $-L_t C_{t+\tau} A_{t+1}^{t+\tau-1} B_t$ is a Hurwitz matrix for $t = 0, \ldots, N - \tau$. By the Lyapunov theorem, for any negative definite matrix $S$ with appropriate dimension, there is a positive definite matrix $Q$ such that $(-LH)^T Q + Q(-LH) = S$. In the following, to facilitate the analysis, we let $S = -I$. That is, there exists a positive definite matrix $Q$ such that
$$(LH)^T Q + Q(LH) = I. \qquad (25)$$
Noting the difference between $\Lambda^{(i)}$ and $LH$, we have
$$\left(\Lambda^{(i)}\right)^T Q + Q\Lambda^{(i)} \ge 0, \qquad (26)$$
for $i = 2, \ldots, \kappa - 1$.

Define $\delta U_k \triangleq U_d - U_k$. Subtracting both sides of Equation 23 from $U_d$ yields
$$\delta U_{k+1} = \left(I_{(N+1-\tau)p} - \sigma \Lambda_k\right) \delta U_k. \qquad (27)$$
In the following subsections, we will give the zero-error convergence proofs of Equation 27 for the 3 models in turn. To show the convergence, it is sufficient to establish the inherent contraction mapping of $I_{(N+1-\tau)p} - \sigma \Lambda_k$, in which $\Lambda_k$ is a random matrix. For the RSM, the contraction cannot hold for each iteration, and thus, a technical lemma is first provided to obtain the joint contraction along the iteration axis. For the BVM and the MCM, the contraction is verified for each iteration based on the probability properties of the statistical models.

4.1 RSM case

When considering the RSM case, no statistical property of the data dropout can be accessed and used; however, the bounded-length assumption of successive data dropouts ensures a somewhat deterministic way for convergence analysis. To give a clear insight into the influence of the RSM case of data dropouts, we rewrite Equation 27 as follows:
$$\delta U_{k+K} = \left(I - \sigma \Lambda_{k+K-1}\right) \cdots \left(I - \sigma \Lambda_k\right) \delta U_k. \qquad (28)$$
Denote
$$\Phi_{m,n} = \left(I - \sigma \Lambda_m\right) \cdots \left(I - \sigma \Lambda_n\right), \quad m \ge n. \qquad (29)$$
Now, we give an estimate of $\Phi_{k+K-1,k}$ in the following lemma.

Lemma 3. Consider the matrix product (29). If the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A_{t+1}^{t+\tau-1} B_t$ is a Hurwitz matrix and $\sigma$ is small enough, then there exists a positive definite matrix $Q$ such that
$$\Phi_{k+K-1,k}^T Q \Phi_{k+K-1,k} \le \eta Q, \quad 0 < \eta < 1, \quad \forall k. \qquad (30)$$

Proof. As previously explained, all $\Lambda_k$ are block lower triangular matrices; thus, the summation of $\Lambda_k$ is also a block lower triangular matrix. In other words, $\sum_{i=0}^{K-1} \Lambda_{k+i}$ is a block lower triangular matrix. Moreover, the RSM assumption of data dropouts implies that all the diagonal blocks of $\sum_{i=0}^{K-1} \Lambda_{k+i}$ have eigenvalues with positive real parts for $k \ge 0$, which further implies that there exists some positive constant $c_1 > 0$ such that
$$\left(\sum_{i=0}^{K-1} \Lambda_{k+i}\right)^T Q + Q\left(\sum_{i=0}^{K-1} \Lambda_{k+i}\right) \ge c_1 I, \quad \forall k \ge 0. \qquad (31)$$

Define $\delta U_k \triangleq U_d - U_k$. Subtracting both sides of Equation 23 from $U_d$ yields

$$\delta U_{k+1} = \big(I_{(N+1-\tau)p} - \sigma \Lambda_k\big)\,\delta U_k. \quad (27)$$

In the following subsections, we give the zero-error convergence proofs of Equation 27 for the 3 models in turn. To show the convergence, it suffices to establish the inherent contraction mapping of $I_{(N+1-\tau)p} - \sigma\Lambda_k$, in which $\Lambda_k$ is a random matrix. For the RSM, the contraction cannot hold at each iteration, and thus, a technical lemma is first provided to obtain the joint contraction along the iteration axis. For the BVM and the MCM, the contraction is verified at each iteration based on the probability properties of the statistical models.

4.1 RSM case

When considering the RSM case, no statistical property of the data dropout can be accessed and used; however, the bounded-length assumption on successive data dropouts enables a somewhat deterministic convergence analysis. To gain a clear insight into the influence of the RSM case of data dropouts, we rewrite Equation 27 as follows:

$$\delta U_{k+K} = \big(I - \sigma \Lambda_{k+K-1}\big) \cdots \big(I - \sigma \Lambda_k\big)\,\delta U_k. \quad (28)$$

Denote

$$\Phi_{m,n} = \big(I - \sigma \Lambda_m\big) \cdots \big(I - \sigma \Lambda_n\big), \quad m \geq n. \quad (29)$$

Now, we give an estimate of $\Phi_{k+K-1,k}$ in the following lemma.

Lemma 3. Consider the matrix product (29). If the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix and $\sigma$ is small enough, then there exists a positive definite matrix $Q$ such that

$$\Phi^T_{k+K-1,k}\, Q\, \Phi_{k+K-1,k} \leq \eta Q, \quad 0 < \eta < 1, \ \forall k. \quad (30)$$

Proof. As previously explained, all $\Lambda_k$ are block lower triangular matrices; thus, the summation of $\Lambda_k$ is also a block lower triangular matrix. In other words, $\sum_{i=0}^{K-1} \Lambda_{k+i}$ is a block lower triangular matrix. Moreover, the RSM assumption on data dropouts implies that, for all $k \geq 0$, all the diagonal blocks of $\sum_{i=0}^{K-1} \Lambda_{k+i}$ have eigenvalues with positive real parts, which further implies that there exists some positive constant $c_1 > 0$ such that

$$\Big(\sum_{i=0}^{K-1} \Lambda_{k+i}\Big)^T Q + Q \Big(\sum_{i=0}^{K-1} \Lambda_{k+i}\Big) \geq c_1 I, \quad \forall k \geq 0. \quad (31)$$

Now, revisit the recursion of $\Phi_{k+K-1,k}$, and we have

$$\Phi^T_{k+K-1,k}\,Q\,\Phi_{k+K-1,k} = \big(I - \sigma\Lambda_k\big)^T \cdots \big(I - \sigma\Lambda_{k+K-1}\big)^T Q \big(I - \sigma\Lambda_{k+K-1}\big) \cdots \big(I - \sigma\Lambda_k\big)$$
$$= Q - \sigma\Bigg[\Big(\sum_{i=0}^{K-1}\Lambda_{k+i}\Big)^T Q + Q\Big(\sum_{i=0}^{K-1}\Lambda_{k+i}\Big)\Bigg] + \sigma^2\Bigg[\sum_{k \leq i,j \leq k+K-1}\Lambda_i^T Q \Lambda_j + \sum_{k \leq i < j \leq k+K-1}\big(Q\Lambda_i\Lambda_j + \Lambda_j^T\Lambda_i^T Q\big) + \cdots\Bigg].$$

Note that $\|\Lambda_i\| \leq \|LH\|$ and the possible combinations are finite due to the boundedness of $K$; thus, there exists a constant $c_2 > 0$ such that the last term on the right-hand side, divided by $\sigma^2$, is bounded by $c_2 I$. Moreover, $Q$ is a positive definite matrix; thus, there is a suitable constant $c_3 > 0$ such that $c_3 Q \leq I$. Then, we have

$$\Phi^T_{k+K-1,k}\,Q\,\Phi_{k+K-1,k} \leq Q - \sigma c_1 I + \sigma^2 c_2 I \leq Q - \big[\sigma c_1 - \sigma^2 c_2\big] c_3 Q$$

as long as $\sigma$ is small enough that $\sigma c_1 - \sigma^2 c_2 > 0$ and $\sigma c_1 c_3 < 1$. In such a case, denote $\eta = 1 - [\sigma c_1 - \sigma^2 c_2]c_3$, and it is clear that

$$\Phi^T_{k+K-1,k}\,Q\,\Phi_{k+K-1,k} \leq \eta Q, \quad 0 < \eta < 1, \ \forall k. \quad (32)$$

The proof is completed.

Remark 12. It is worth pointing out that the proof of Lemma 3 is quite technical; however, the underlying principle is not complicated. Specifically, the positive definite matrix $Q$ is introduced to yield well-defined expressions in the analysis. The contraction effect of $\Phi_{k+K-1,k}$ can be interpreted as follows. $\Lambda_k$ is a block lower triangular matrix, and then $I - \sigma\Lambda_k$ is a block lower triangular matrix with eigenvalues $1 - \sigma\gamma_k(t+\tau)\lambda_{t,i}$, $1 \leq i \leq p$, $0 \leq t \leq N-\tau$, where $\lambda_{t,i}$ denotes the $i$th eigenvalue of $L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$. Therefore, when $\gamma_k(t+\tau) = 0$, the corresponding eigenvalue of $I - \sigma\Lambda_k$ is 1, implying that no contraction occurs, but neither does any expansion; when $\gamma_k(t+\tau) = 1$, the corresponding eigenvalue of $I - \sigma\Lambda_k$ is less than 1 in magnitude, provided that the eigenvalues $\lambda_{t,i}$ have positive real parts and $\sigma$ is small enough, implying a contraction. The bounded-length assumption on successive data dropouts thus guarantees infinitely many contractions along the iteration axis.

Remark 13. From the technical viewpoint, the parameter $\sigma$ can be solved from the relationships $\sigma c_1 - \sigma^2 c_2 > 0$ and $\sigma c_1 c_3 < 1$, ie, $\sigma < \min\{c_1 c_2^{-1}, (c_1 c_3)^{-1}\}$. Apparently, $\sigma$ should be small when little information about the system is known. However, a small $\sigma$ value renders a large value of $\eta$, which limits the contraction effect. Thus, there is a trade-off in selecting the parameter $\sigma$. In addition, the proof provides a rather conservative estimate of $\eta$, while the actual contraction is usually more effective.

With the help of Lemma 3, we can now give the convergence of the input sequence.

Theorem 1. Consider the noise-free linear system (18) and the ILC update law (19), where the random data dropouts follow the RSM. Assume Assumption 1 holds. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix and $\sigma$ is small enough.

Proof. The proof is based on the inherent convergence principle that there is at least one contraction during any $K$ successive iterations. To this end, we group the iteration numbers by a modulo operation with respect to $K$; that is, all iterations are divided into $K$ subsets $\{iK + j, i \geq 0\}$, $0 \leq j \leq K-1$. Then, we show the strict contraction mapping of the input sequence with subscripts valued in each subset. Define a weighted norm of $\delta U_k$ as $V_k = \|\delta U_k\|^2_Q \triangleq (\delta U_k)^T Q\, \delta U_k$, which can be regarded as a Lyapunov function.
Then, $\forall\, 0 \leq j \leq K-1$, we have

$$V_{iK+j} = (\delta U_{iK+j})^T Q\, \delta U_{iK+j} = \big(\Phi_{iK+j-1,(i-1)K+j}\,\delta U_{(i-1)K+j}\big)^T Q\, \Phi_{iK+j-1,(i-1)K+j}\,\delta U_{(i-1)K+j}$$
$$= (\delta U_{(i-1)K+j})^T\, \Phi^T_{iK+j-1,(i-1)K+j}\, Q\, \Phi_{iK+j-1,(i-1)K+j}\, \delta U_{(i-1)K+j} \leq \eta\, (\delta U_{(i-1)K+j})^T Q\, \delta U_{(i-1)K+j} = \eta V_{(i-1)K+j}, \quad \forall i \geq 1,$$

where Equation 27 and Lemma 3 are used. Consequently, we have

$$E\|\delta U_{iK+j}\|^2_Q \leq \eta\, E\|\delta U_{(i-1)K+j}\|^2_Q, \quad 0 \leq j \leq K-1, \ i \geq 1. \quad (33)$$

Then, it directly leads to

$$E\|\delta U_{iK+j}\|^2_Q \leq \eta^i\, E\|\delta U_j\|^2_Q, \quad 0 \leq j \leq K-1. \quad (34)$$

Meanwhile, following the same idea as in Lemma 3, the weighted norms of the inputs for the first $K$ iterations, ie, $\delta U_j$ with $0 \leq j \leq K-1$, are bounded by the initial one. That is, $\forall\, 0 \leq j \leq K-1$, we have

$$V_j = (\delta U_j)^T Q\, \delta U_j = (\delta U_0)^T\big(I - \sigma\Lambda_0\big)^T \cdots \big(I - \sigma\Lambda_{j-1}\big)^T Q \big(I - \sigma\Lambda_{j-1}\big) \cdots \big(I - \sigma\Lambda_0\big)\delta U_0 \leq (\delta U_0)^T Q\, \delta U_0 = \|\delta U_0\|^2_Q,$$

where $U_0$ denotes the initial input. Incorporating Equation 34, we evidently derive

$$E\|\delta U_{iK+j}\|^2_Q \xrightarrow[i \to \infty]{} 0, \quad 0 \leq j \leq K-1. \quad (35)$$

Note that $Q$ is a fixed positive definite matrix; therefore, a direct corollary of Equation 35 is that $\lim_{k\to\infty} E\|\delta U_k\|^2 = 0$. In other words, the mean square convergence of the update law is established.

Next, we move to show the almost sure convergence. Recalling inequality (34) and noting that $Q$ is a positive definite matrix, we have

$$E\|\delta U_{iK+j}\|^2 \leq \lambda^{-1}_{\min}(Q)\, \eta^i\, E\|\delta U_j\|^2_Q, \quad 0 \leq j \leq K-1, \quad (36)$$

where $\lambda_{\min}(\cdot)$ denotes the smallest eigenvalue of the indicated matrix. It follows that

$$\sum_{i=0}^{\infty} E\|\delta U_{iK+j}\| \leq \sum_{i=0}^{\infty} \lambda^{-1/2}_{\min}(Q)\, \eta^{i/2}\big(E\|\delta U_j\|^2_Q\big)^{1/2} \leq \lambda^{-1/2}_{\min}(Q)\big(E\|\delta U_0\|^2_Q\big)^{1/2}\sum_{i=0}^{\infty}\eta^{i/2} = \frac{\lambda^{-1/2}_{\min}(Q)\big(E\|\delta U_0\|^2_Q\big)^{1/2}}{1 - \eta^{1/2}} < \infty,$$

which further yields

$$\sum_{k=1}^{\infty} E\|\delta U_k\| = \sum_{j=0}^{K-1}\sum_{i=0}^{\infty} E\|\delta U_{iK+j}\| < \infty.$$

Then, by the Markov inequality, for any $\epsilon > 0$, we have

$$\sum_{k=1}^{\infty} P\big(\|\delta U_k\| > \epsilon\big) \leq \sum_{k=1}^{\infty} \frac{E\|\delta U_k\|}{\epsilon} < \infty.$$

This fact leads to $P\big(\|\delta U_k\| > \epsilon, \text{ i.o.}\big) = 0$ by the Borel-Cantelli lemma, $\forall \epsilon > 0$, where i.o. is short for infinitely often. That is, $P\big(\lim_{k\to\infty}\delta U_k = 0\big) = 1$. In other words, $\delta U_k$ converges to zero almost surely. This completes the proof.

Remark 14. In this section, the noise-free system is taken into account; therefore, the precise convergence of the input sequence ensures that the system output $y_k(t)$ precisely tracks the desired reference $y_d(t)$, $\forall t$, with the help of Assumption 1. Moreover, it is noticed from Equation 34 that the update law (19) for the noise-free system ensures an exponential convergence speed. Meanwhile, this exponential convergence speed enables us to establish the almost sure convergence based on the Borel-Cantelli lemma.

4.2 BVM case

When considering the BVM case, the technical lemma used for the RSM case in the last subsection, ie, Lemma 3, is no longer valid due to the inherent randomness of the data dropouts. However, in this case, the statistical property of the random variable $\gamma_k(t)$ is valuable for establishing the convergence results. Moreover, under the BVM assumption, the data dropout variable $\gamma_k(t)$ is independent along the iteration axis; that is, for different iteration numbers $k \neq l$, $\gamma_k(t)$ is independent of $\gamma_l(t)$, $\forall t$. This independence is used in the convergence analysis as follows.

Theorem 2. Consider the noise-free linear system (18) and the ILC update law (19), where the random data dropouts follow the BVM. Assume Assumption 1 holds. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix and $\sigma$ is small enough.

Proof. We still apply the weighted norm of $\delta U_k$, $V_k = \|\delta U_k\|^2_Q = (\delta U_k)^T Q\, \delta U_k$. Then, we have

$$V_{k+1} = (\delta U_{k+1})^T Q\, \delta U_{k+1} = (\delta U_k)^T\big(I - \sigma\Lambda_k\big)^T Q \big(I - \sigma\Lambda_k\big)\delta U_k. \quad (37)$$

In the BVM case, the data dropout is independent along the iteration axis, while $\delta U_k$ is constructed based on the information of the $(k-1)$th iteration; thus, $I - \sigma\Lambda_k$ is independent of $\delta U_k$ in Equation 37. Consequently, taking the mathematical expectation of both sides of Equation 37 leads to

$$E\|\delta U_{k+1}\|^2_Q = E\big[(\delta U_k)^T\big(I - \sigma\Lambda_k\big)^T Q \big(I - \sigma\Lambda_k\big)\delta U_k\big] = E\Big[(\delta U_k)^T\, E\big(\big(I - \sigma\Lambda_k\big)^T Q \big(I - \sigma\Lambda_k\big)\big)\, \delta U_k\Big]. \quad (38)$$

Notice that

$$E\big(\big(I - \sigma\Lambda_k\big)^T Q \big(I - \sigma\Lambda_k\big)\big) = E\big(Q - \sigma\big(\Lambda_k^T Q + Q\Lambda_k\big) + \sigma^2\Lambda_k^T Q \Lambda_k\big) = Q - \sigma E\big(\Lambda_k^T Q + Q\Lambda_k\big) + \sigma^2 E\Lambda_k^T Q \Lambda_k. \quad (39)$$

Recalling the definition of $\Lambda_k$, it is evident that $E\Lambda_k = \bar\gamma LH$. Incorporating Equation 25 leads to

$$E\big(\Lambda_k^T Q + Q\Lambda_k\big) = \bar\gamma I. \quad (40)$$

On the other hand, there exists a suitable constant $c_4 > 0$ such that

$$E\Lambda_k^T Q \Lambda_k = \sum_{i=1}^{\kappa} P\big(\Lambda_k = \Lambda^{(i)}\big)\big(\Lambda^{(i)}\big)^T Q \Lambda^{(i)} \leq c_4 I, \quad (41)$$

where $P(\Lambda_k = \Lambda^{(i)})$ denotes the probability that $\Lambda_k$ takes the value $\Lambda^{(i)}$ and $\sum_{i=1}^{\kappa} P(\Lambda_k = \Lambda^{(i)}) = 1$. From Equations 39, 40, and 41, it follows that

$$E\big(\big(I - \sigma\Lambda_k\big)^T Q \big(I - \sigma\Lambda_k\big)\big) \leq Q - \sigma\big(\bar\gamma - \sigma c_4\big)I. \quad (42)$$

Using the fact that $c_3 Q \leq I$ given in the last subsection, where $c_3 > 0$ is a suitable constant, and substituting Equation 42 into Equation 38, we have

$$E\|\delta U_{k+1}\|^2_Q \leq \big(1 - \sigma(\bar\gamma - \sigma c_4)c_3\big)\, E\|\delta U_k\|^2_Q, \quad (43)$$

and consequently, we have a contraction mapping of $E\|\delta U_k\|^2_Q$ as $E\|\delta U_{k+1}\|^2_Q \leq \eta_1 E\|\delta U_k\|^2_Q$, $0 < \eta_1 < 1$, where $\eta_1 \triangleq 1 - \sigma(\bar\gamma - \sigma c_4)c_3$, as long as we select the parameter $\sigma$ small enough that $\bar\gamma - \sigma c_4 > 0$ and $\sigma\bar\gamma c_3 < 1$. Following similar steps to the proof of Theorem 1, we obtain mean square convergence and almost sure convergence to zero of the input error $\delta U_k$. This completes the proof.

Remark 15. The condition on the parameter $\sigma$ is given by 2 inequalities, ie, $\bar\gamma - \sigma c_4 > 0$ and $\sigma\bar\gamma c_3 < 1$, which lead to $\sigma < \bar\gamma c_4^{-1}$ and $\sigma < \bar\gamma^{-1}c_3^{-1}$. Since $\bar\gamma < 1$, the second range can be reduced to $\sigma < c_3^{-1}$. Thus, $\sigma < \min\{\bar\gamma c_4^{-1}, c_3^{-1}\}$. From this formulation, we find that the DDR, ie, the average level of data dropouts along the iteration axis, has an important influence on the selection of the parameter $\sigma$. Roughly speaking, the smaller $\bar\gamma$ (ie, the heavier the data dropouts), the smaller the parameter $\sigma$. Meanwhile, as previously explained, a smaller selection of $\sigma$ renders a slower convergence speed. This observation coincides with our intuitive recognition that heavy data dropouts lead to slower convergence of the ILC algorithms.
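As a quick numeric sanity check of the expectation identity $E\Lambda_k = \bar\gamma LH$ exploited in Equation 40, the following sketch averages sampled $\Gamma_k LH$ over many Bernoulli draws; $LH$ and $\bar\gamma$ are illustrative stand-ins.

```python
import numpy as np

# Monte Carlo check of E[Lambda_k] = gamma_bar * LH, with
# Lambda_k = Gamma_k @ LH and Gamma_k a Bernoulli diagonal matrix (BVM).
rng = np.random.default_rng(1)
n, gamma_bar, trials = 4, 0.7, 50000
LH = np.eye(n) + np.tril(0.2 * np.ones((n, n)), k=-1)

acc = np.zeros((n, n))
for _ in range(trials):
    Gamma = np.diag((rng.random(n) < gamma_bar).astype(float))
    acc += Gamma @ LH
print(np.allclose(acc / trials, gamma_bar * LH, atol=0.01))  # expect True
```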

4.3 MCM case

In this subsection, we move to the MCM case. The MCM case is more general than the BVM case, as the independence of $\gamma_k(t)$ along the iteration axis no longer holds. Consequently, the separation of $\delta U_k$ and $\Lambda_k$ in Equation 38 is not applicable to the MCM case. This motivates the convergence analysis proposed in this subsection. In fact, our objective here is to derive a similar contraction mapping for the MCM case. With the same design condition on the learning gain matrices $L_t$ given above, we have the following theorem for the MCM case.

Theorem 3. Consider the noise-free linear system (18) and the ILC update law (19), where the random data dropouts follow the MCM. Assume Assumption 1 holds. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix and $\sigma$ is small enough.

Proof. Note that the matrix $\Lambda_k$ takes values from the set $S = \{\Lambda^{(1)}, \ldots, \Lambda^{(\kappa)}\}$. We first point out that the evolution of $\Lambda_k$ also forms a Markov chain. In the MCM case, the random variable $\gamma_k(t)$ forms a Markov chain along the iteration axis, $\forall t$. From the definition of the Markov chain, we obtain

$$P\big(\gamma_k(t) = r^t_k \mid \gamma_{k-1}(t) = r^t_{k-1}, \ldots, \gamma_1(t) = r^t_1\big) = P\big(\gamma_k(t) = r^t_k \mid \gamma_{k-1}(t) = r^t_{k-1}\big), \quad r^t_k \in \{0,1\},\ \forall k, t.$$

Moreover, for different time instants $i \neq j$, $\gamma_k(i)$ is independent of $\gamma_k(j)$. Thus, we have

$$P\big(\gamma_k(\tau) = r^\tau_k, \ldots, \gamma_k(N) = r^N_k \mid \gamma_{k-1}(\tau) = r^\tau_{k-1}, \ldots, \gamma_{k-1}(N) = r^N_{k-1}, \ldots, \gamma_1(\tau) = r^\tau_1, \ldots, \gamma_1(N) = r^N_1\big)$$
$$= P\big(\gamma_k(\tau) = r^\tau_k, \ldots, \gamma_k(N) = r^N_k \mid \gamma_{k-1}(\tau) = r^\tau_{k-1}, \ldots, \gamma_{k-1}(N) = r^N_{k-1}\big).$$

Then, the evolution of $\Gamma_k$ along the iteration axis can also be characterized by a Markov chain, and so can $\Lambda_k$. Denote the stationary transition probability matrix $(p_{ij})_{1\leq i,j\leq\kappa}$ with $p_{ij} = P\big(\Lambda_k = \Lambda^{(j)} \mid \Lambda_{k-1} = \Lambda^{(i)}\big)$. It is evident that $\min_{1\leq i\leq\kappa} p_{i1} > 0$.

Apply the weighted norm of $\delta U_k$, $V_k = \|\delta U_k\|^2_Q = \delta U_k^T Q\, \delta U_k$. Then, we have

$$V_{k+1} = \delta U_{k+1}^T Q\, \delta U_{k+1} = \big(\delta U_k - \sigma\Lambda_k\delta U_k\big)^T Q \big(\delta U_k - \sigma\Lambda_k\delta U_k\big)$$
$$= \delta U_k^T Q\, \delta U_k - \sigma\,\delta U_k^T\big[\Lambda_k^T Q + Q\Lambda_k\big]\delta U_k + \sigma^2\,\delta U_k^T \Lambda_k^T Q \Lambda_k\,\delta U_k. \quad (44)$$

Note that $\delta U_k$ is no longer independent of $\Lambda_k$. In order to make a separation, denote the $\sigma$-algebra $\mathcal{F}_k = \sigma\big(x_j(0), u_j(i), y_j(i), \gamma_j(i),\ 1 \leq j \leq k-1,\ 1 \leq i \leq N\big)$ (ie, the set of all events induced by these random variables) for $k \geq 1$. Taking the conditional expectation of both sides of Equation 44 with respect to $\mathcal{F}_k$ leads to

$$E\big(V_{k+1} \mid \mathcal{F}_k\big) = V_k - \sigma\,\delta U_k^T\, E\big(\Lambda_k^T Q + Q\Lambda_k \mid \mathcal{F}_k\big)\,\delta U_k + \sigma^2\,\delta U_k^T\, E\big(\Lambda_k^T Q \Lambda_k \mid \mathcal{F}_k\big)\,\delta U_k. \quad (45)$$

Recalling the stationary transition probability matrix $(p_{ij})_{1\leq i,j\leq\kappa}$, we have

$$E\big(\Lambda_k^T Q + Q\Lambda_k \mid \mathcal{F}_k\big) \geq c_5 I, \quad (46)$$

where $c_5 > 0$ is a suitable constant. On the other hand, note that $Q$ is a positive definite matrix; thus, there exists $c_6 > 0$ such that

$$E\big(\Lambda_k^T Q \Lambda_k \mid \mathcal{F}_k\big) \leq c_6 I. \quad (47)$$

Using the fact that $c_3 Q \leq I$ and from Equations 45, 46, and 47, we have

$$E\big(V_{k+1} \mid \mathcal{F}_k\big) \leq \big(1 - (\sigma c_5 - \sigma^2 c_6)c_3\big)V_k. \quad (48)$$

As a result, we can select $\sigma$ small enough that $0 < (\sigma c_5 - \sigma^2 c_6)c_3 < 1$. Then, taking the mathematical expectation of both sides of Equation 48 implies

$$EV_{k+1} \leq \eta_2\, EV_k, \quad (49)$$

where $\eta_2 \triangleq 1 - (\sigma c_5 - \sigma^2 c_6)c_3$. This inequality further implies that $\lim_{k\to\infty} E(\delta U_k)^T Q\,\delta U_k = 0$. Again, $Q$ is a specified positive definite matrix; thus, we have $\lim_{k\to\infty} E(\delta U_k)^T\delta U_k = 0$. The mean square convergence is thus obtained.

Now, we move to show the almost sure convergence. In fact, from Equation 48, we have

$$E\big(V_{k+1} \mid \mathcal{F}_k\big) \leq \eta_2 V_k \leq V_k, \quad (50)$$

which yields that $\{V_k, k \geq 1\}$ forms a supermartingale. Moreover, $V_k$ is nonnegative for all $k \geq 1$. Then, by the martingale convergence theorem,45 $V_k$ converges to a limit almost surely. Since $V_k$ converges in both the mean square sense and the almost sure sense, the two limits must be identical. Therefore, $V_k$ converges to zero almost surely, which further yields the convergence of $\delta U_k$ to zero in the almost sure sense. This completes the proof.

Remark 16. From the proofs of Theorems 1 to 3, we find that the convergence analyses for the 3 cases follow the same inherent mechanism, that is, establishing a contraction mapping of a weighted norm of the input error vector (ie, $\|\delta U_k\|^2_Q$), which can also be regarded as a Lyapunov function. The difference among the 3 cases lies in the contraction length along the iteration axis. Specifically, for the BVM and the MCM, a contraction can be established at each iteration, whereas for the RSM, the contraction can only be ensured jointly over $K$ successive iterations, where $K$ is defined in the RSM.
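To make the MCM concrete, the sketch below generates a dropout sequence $\gamma_k(t)$ for one fixed $t$ as a two-state Markov chain with $P(1\to1) = \mu$ and $P(0\to0) = \nu$ (values illustrative, matching one of the cases later used in Section 6), and checks the empirical dropout rate against the stationary value $(1-\mu)/(2-\mu-\nu)$.

```python
import numpy as np

# MCM dropout sequence along the iteration axis: a two-state Markov chain
# with P(arrive -> arrive) = mu and P(drop -> drop) = nu.  The empirical
# dropout rate should approach the stationary value (1-mu)/(2-mu-nu).
rng = np.random.default_rng(2)
mu, nu = 0.8, 0.5
iterations = 100000

gamma, drops = 1, 0
for _ in range(iterations):
    if gamma == 1:
        gamma = 1 if rng.random() < mu else 0
    else:
        gamma = 0 if rng.random() < nu else 1
    drops += (gamma == 0)

print(drops / iterations)   # approx (1-mu)/(2-mu-nu) = 2/7 = 0.2857...
```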

5 CONVERGENCE OF THE STOCHASTIC LINEAR SYSTEM

In this section, we consider the stochastic linear system (1). We will give the mean square and almost sure convergence proofs for all 3 data dropout models. In addition, the update law is a slight variant of the classic P-type law (19), differing from the Kalman filtering based algorithms.31

First, due to the existence of stochastic noise, the ILC update law (19) cannot guarantee stable convergence of the input sequence. Take the lifted form (20) for an intuitive understanding of this limitation. If the input sequence $\{U_k\}$ had a stable convergent limit, then taking limits on both sides of Equation 20 would lead to $\lim_{k\to\infty} U_{k+1} = \lim_{k\to\infty} U_k + \lim_{k\to\infty}\sigma\Gamma_k L E_k$. We could then derive the simple corollary that $\lim_{k\to\infty} E_k = 0$. This corollary contradicts the randomness of $E_k$ in Equation 12. That is, the tracking error $E_k$ consists of 2 parts, ie, $H(U_d - U_k)$ and $\xi_k$; thus, it is impossible to derive $\lim_{k\to\infty} E_k = 0$. Moreover, by Assumption 2, the stochastic noise cannot be predicted and eliminated by any algorithm; thus, we have to impose an additional mechanism to reduce the effect of noise along the iteration axis.

As a matter of fact, it is well known that an appropriate decreasing gain for the correction term in the updating process is a necessary requirement to ensure convergence in recursive computations for the optimization, identification, and tracking of stochastic systems.46,47 This fact is also illustrated in the ILC literature, such as by Saab,31,35 in which a Kalman filtering based method is proposed to deal with stochastic systems, and the recursively computed learning gain matrix decreases to zero along the iteration axis. Inspired by this recognition, we replace the design parameter $\sigma$ in Equation 19 with a decreasing sequence to cope with the stochastic noise. Specifically, the ILC update law for the stochastic system is modified as follows:

$$u_{k+1}(t) = u_k(t) + a_k\gamma_k(t+\tau)L_t e_k(t+\tau), \quad (51)$$

where the learning step-size $\{a_k\}$ is a decreasing sequence satisfying

$$a_k \in (0,1), \quad a_k \to 0, \quad \sum_{k=1}^{\infty} a_k = \infty, \quad \sum_{k=1}^{\infty} a_k^2 < \infty, \quad \frac{1}{a_{k+1}} - \frac{1}{a_k} \to \chi > 0. \quad (52)$$

Remark 17. The decreasing step-size is an additional mechanism for coping with stochastic noise. Clearly, the basic selection $a_k = \alpha/k$ meets all the requirements of Equation 52, where $\alpha > 0$ is a tuning parameter. The inherent principle for introducing $a_k$ is as follows.
The tracking error $E_k$ consists of 2 parts: the inaccurate tracking part $H\delta U_k$, caused by the inaccurate input $U_k$, and the stochastic noise part $\xi_k$. After sufficiently many learning iterations, the inaccurate tracking part is expected to diminish significantly, so that the stochastic noise part dominates the tracking error. At this phase of the learning process, the decreasing step-size $a_k$ suppresses the stochastic noise to ensure stable convergence.

Remark 18. As has been shown by many results in stochastic control and optimization, the introduction of a decreasing step-size slows down the convergence speed. This is because the suppression effect of $a_k$ is imposed not only on the stochastic noise but also on the correction information. In fact, this is a classic trade-off between stable zero-error convergence and convergence speed in stochastic control. Roughly speaking, the exponential convergence speed of the noise-free case is no longer guaranteed; we can only ensure asymptotic convergence for stochastic systems. One may take interest in how to accelerate the convergence speed for practical applications. An acceleration approach with gain adaptation is given in the work of Shen and Xu,48 which can be incorporated into the proposed algorithm. However, this is beyond our scope; thus, we omit the details.
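As a worked check of the candidate step-size in Remark 17, the following verifies each requirement of (52) for $a_k = \alpha/k$; the extra bound $0 < \alpha < 1$ is assumed here only to keep $a_k$ inside $(0,1)$ from $k = 1$ onward.

```latex
% Check that a_k = \alpha/k, 0 < \alpha < 1, satisfies (52):
\begin{align*}
  a_k &= \frac{\alpha}{k} \in (0,1), \qquad a_k \to 0, \\
  \sum_{k=1}^{\infty} a_k &= \alpha \sum_{k=1}^{\infty} \frac{1}{k} = \infty
      \quad \text{(harmonic series diverges)}, \\
  \sum_{k=1}^{\infty} a_k^2 &= \alpha^2 \sum_{k=1}^{\infty} \frac{1}{k^2}
      = \frac{\alpha^2 \pi^2}{6} < \infty, \\
  \frac{1}{a_{k+1}} - \frac{1}{a_k}
      &= \frac{k+1}{\alpha} - \frac{k}{\alpha}
      = \frac{1}{\alpha} = \chi > 0 \quad \text{for every } k .
\end{align*}
```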

Similarly to the noise-free case, we lift the input along the time axis. The update law (51) is rewritten as follows:

$$U_{k+1} = U_k + a_k\Gamma_k L E_k, \quad (53)$$

where $\Gamma_k$ and $L$ are given in Equations 21 and 22. Subtracting both sides of Equation 53 from $U_d$, substituting the definition $E_k = H(U_d - U_k) - \xi_k$ (see Equation 12), and using the notation $\delta U_k = U_d - U_k$ lead to

$$\delta U_{k+1} = \big(I - a_k\Lambda_k\big)\delta U_k + a_k\Gamma_k L\xi_k, \quad (54)$$

where $\Lambda_k = \Gamma_k LH$ is specified in the last section. Before proceeding to the detailed convergence analysis for the 3 cases, we note that the design condition for the learning gain matrix $L_t$ remains the same as in the noise-free case. That is, the learning gain matrix $L_t$ should satisfy that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix.

In the following subsections, we give the detailed convergence analyses of Equation 54 for the 3 models in turn. Similarly to the noise-free case, the convergence is inherently guaranteed by the contraction property of $I - a_k\Lambda_k$. However, different from the noise-free case, the sufficiently small constant $\sigma$ is replaced by a decreasing gain sequence $\{a_k\}$. Consequently, the constant contraction of the noise-free case is no longer valid for $I - a_k\Lambda_k$. Indeed, for the RSM, an elaborate estimate of the joint contraction effect is first given, whereas for the BVM and the MCM, the contraction effect is verified according to their probability properties. Then, we establish the asymptotic convergence based on the preliminary lemmas given in Section 3.

5.1 RSM case

Similarly to the noise-free case, we first give a decreasing property for the product of multiple factors $I - a_k\Lambda_k$ and then show the convergence with the help of this technical lemma. Denote

$$\Psi_{m,n} = \big(I - a_m\Lambda_m\big)\cdots\big(I - a_n\Lambda_n\big), \quad m \geq n, \quad (55)$$

and $\Psi_{m,m+1} \triangleq I$. Then, the estimate of $\Psi_{m,n}$ is given in the following lemma.

Lemma 4. Consider the matrix product (55). If the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix, $\forall t$, then there exist constants $c_7 > 0$ and $c_8 > 0$ such that, for $m > n + K$, we have

$$\|\Psi_{m,n}\| \leq c_7\exp\Big(-c_8\sum_{i=n}^{m} a_i\Big), \quad \forall n \geq 1. \quad (56)$$

Proof. First, we recall that $(LH)^T Q + Q(LH) = I$ and $(\Lambda^{(i)})^T Q + Q\Lambda^{(i)} \geq 0$ for $i = 2, \ldots, \kappa$ (see Equations 25 and 26). The RSM assumption results in

$$\Big(\sum_{i=0}^{K-1}\Lambda_{k+i}\Big)^T Q + Q\Big(\sum_{i=0}^{K-1}\Lambda_{k+i}\Big) \geq c_1 I, \quad \forall k \geq 0. \quad (57)$$

Moreover, from Equation 52, for $1 \leq i \leq K$, we have

$$\frac{a_{m-i}}{a_m} - 1 = a_{m-i}\Big(\frac{1}{a_m} - \frac{1}{a_{m-i}}\Big) = O(a_m). \quad (58)$$

For any $m \geq n + K - 1$, we have

$$\Psi^T_{m,n} Q \Psi_{m,n} = \Psi^T_{m-1,n}\big(I - a_m\Lambda_m\big)^T Q \big(I - a_m\Lambda_m\big)\Psi_{m-1,n}$$
$$= \Psi^T_{m-K,n}\big(I - a_{m-K+1}\Lambda_{m-K+1}\big)^T\cdots\big(I - a_m\Lambda_m\big)^T Q \big(I - a_m\Lambda_m\big)\cdots\big(I - a_{m-K+1}\Lambda_{m-K+1}\big)\Psi_{m-K,n}$$
$$= \Psi^T_{m-K,n}\Bigg[Q - \Big(\sum_{i=m-K+1}^{m} a_i\Lambda_i\Big)^T Q - Q\Big(\sum_{i=m-K+1}^{m} a_i\Lambda_i\Big) + o(a_m)\Bigg]\Psi_{m-K,n}$$
$$= \Psi^T_{m-K,n}\Bigg\{Q - a_m\Bigg[\Big(\sum_{i=m-K+1}^{m}\Lambda_i\Big)^T Q + Q\Big(\sum_{i=m-K+1}^{m}\Lambda_i\Big)\Bigg] + o(a_m)\Bigg\}\Psi_{m-K,n}, \quad (59)$$

where equality (58) is invoked. Noticing $0 < a_m < 1$ for large enough $m$ and using Equation 57, we have

$$\Psi^T_{m,n}Q\Psi_{m,n} \leq \Psi^T_{m-K,n}\big(Q - a_m c_1 I + o(a_m)\big)\Psi_{m-K,n}$$
$$= \Psi^T_{m-K,n} Q^{1/2}\big(I - a_m c_1 Q^{-1} + o(a_m)\big)Q^{1/2}\Psi_{m-K,n}$$
$$\leq \Psi^T_{m-K,n} Q^{1/2}\Bigg(I - \frac{c_1}{K}Q^{-1}\sum_{i=m-K+1}^{m} a_i + o(a_m)\Bigg)Q^{1/2}\Psi_{m-K,n}$$
$$\leq \Bigg(1 - \frac{c_1}{K}\lambda_{\min}(Q^{-1})\sum_{i=m-K+1}^{m} a_i + o(a_m)\Bigg)\Psi^T_{m-K,n} Q\Psi_{m-K,n}$$
$$\leq \exp\Bigg(-c_9\sum_{i=m-K+1}^{m} a_i\Bigg)\Psi^T_{m-K,n} Q\Psi_{m-K,n} \quad (60)$$

for sufficiently large $n$, where $c_9$ is a positive constant. Therefore, for sufficiently large $n$, for example, for $n \geq n_0$ and $m \geq n + K$, we have

$$\Psi^T_{m,n}Q\Psi_{m,n} \leq c_{10}\exp\Bigg(-c_9\sum_{i=n}^{m} a_i\Bigg)I, \quad \text{with } c_{10} > 0, \quad (61)$$

which, by noticing the definition of $Q > 0$, implies

$$\|\Psi_{m,n}\| \leq c_{11}\exp\Bigg(-\frac{c_9}{2}\sum_{i=n}^{m} a_i\Bigg), \quad \text{with } c_{11} > 0. \quad (62)$$

Consequently, for $n \leq n_0 + K$, $\forall n > 0$, by Equation 62 and the definition $\Psi_{m,m+1} \triangleq I$, we have

$$\|\Psi_{m,n}\| = \|\Psi_{m,n_0}\Psi_{n_0-1,n}\| \leq c_7\exp\Bigg(-c_8\sum_{i=n}^{m} a_i\Bigg), \quad (63)$$

where $c_7$ is a suitable constant and $c_8 = c_9/2$. The proof is completed.

Remark 19. Comparing the estimate (30) of the corresponding product for the noise-free case with Equation 56 for the noisy case, we can clearly see the difference between the fixed step-size $\sigma$ and the decreasing step-size $a_k$. Specifically, the 2 estimates are consistent: if we replace the decreasing step-size $a_k$ with the fixed but small enough $\sigma$, estimate (56) actually turns into estimate (30). In other words, Equation 30 can be regarded as a special case of Equation 56.

Now, we can move on to the convergence for the RSM case.

Theorem 4. Consider the stochastic linear system (1) and the ILC update law (51), where the random data dropouts follow the RSM. Assume Assumptions 1 and 2 hold. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix.

Proof. The proof is carried out by grouping the iterations via a modulo operation with respect to $K$. To this end, all iterations are divided into $K$ subsets $\{iK + j, i \geq 0\}$, $0 \leq j \leq K-1$. Now, we check the contraction over successive $K$ iterations; that is, we check the convergence for each subset.

From Equation 54, it follows, $\forall\, 0 \leq j \leq K-1$, that

$$\delta U_{iK+j} = \Psi_{iK+j-1,(i-1)K+j}\,\delta U_{(i-1)K+j} + \sum_{l=0}^{K-1}\Psi_{iK+j-1,(i-1)K+j+l+1}\, a_{(i-1)K+j+l}\,\Gamma_{(i-1)K+j+l} L\,\xi_{(i-1)K+j+l}. \quad (64)$$

Apply the weighted norm $V_k = \|\delta U_k\|^2_Q = \delta U_k^T Q\,\delta U_k$. We have

$$V_{iK+j} = \delta U_{iK+j}^T Q\,\delta U_{iK+j} = \big(\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j}\big)^T Q\,\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j} + 2\big(\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j}\big)^T Q\phi + \phi^T Q\phi, \quad (65)$$

where

$$\phi \triangleq \sum_{l=0}^{K-1}\Psi_{iK+j-1,(i-1)K+j+l+1}\, a_{(i-1)K+j+l}\,\Gamma_{(i-1)K+j+l} L\,\xi_{(i-1)K+j+l}. \quad (66)$$

From the proof of Lemma 4, it follows that

$$\Psi^T_{iK+j-1,(i-1)K+j}\, Q\, \Psi_{iK+j-1,(i-1)K+j} \leq \big(1 - c_{12}a_{iK+j-1} + c_{13}a^2_{iK+j-1}\big)Q, \quad (67)$$

which implies that

$$\big(\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j}\big)^T Q\,\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j} \leq \big(1 - c_{12}a_{iK+j-1} + c_{13}a^2_{iK+j-1}\big)\|\delta U_{(i-1)K+j}\|^2_Q. \quad (68)$$

Noticing that $\phi$ is a sum of random noise terms and that the noise variables are zero mean and independent of the data dropout variables, we have

$$E\big[\big(\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j}\big)^T Q\phi\big] = E\big[\big(\Psi_{iK+j-1,(i-1)K+j}\delta U_{(i-1)K+j}\big)^T Q\, E\big(\phi \mid \mathcal{G}_{(i-1)K+j-1}\big)\big] = 0, \quad (69)$$

where the $\sigma$-algebra $\mathcal{G}_k$ is augmented from $\mathcal{F}_k$ as $\mathcal{G}_k = \sigma\big(x_i(t), u_i(t), y_i(t), w_i(t), v_i(t), \gamma_i(t),\ 1 \leq i \leq k,\ 0 \leq t \leq N\big)$. Moreover, by Assumption 2, the stochastic noise variables are conditionally independent along the iteration axis; thus, it follows that

$$E\phi^T Q\phi = E\Big(\sum_{l=0}^{K-1}\Psi_{iK+j-1,(i-1)K+j+l+1}\,a_{(i-1)K+j+l}\,\Gamma_{(i-1)K+j+l}L\,\xi_{(i-1)K+j+l}\Big)^T Q\Big(\sum_{l=0}^{K-1}\Psi_{iK+j-1,(i-1)K+j+l+1}\,a_{(i-1)K+j+l}\,\Gamma_{(i-1)K+j+l}L\,\xi_{(i-1)K+j+l}\Big)$$
$$= E\sum_{l=0}^{K-1} a^2_{(i-1)K+j+l}\big(\Gamma_{(i-1)K+j+l}L\,\xi_{(i-1)K+j+l}\big)^T\Psi^T_{iK+j-1,(i-1)K+j+l+1}\,Q\,\Psi_{iK+j-1,(i-1)K+j+l+1}\,\Gamma_{(i-1)K+j+l}L\,\xi_{(i-1)K+j+l}$$
$$\leq \sum_{l=0}^{K-1} a^2_{(i-1)K+j+l}\, c_7^2\exp\Bigg(-2c_8\sum_{i'=(i-1)K+j+l+1}^{iK+j-1} a_{i'}\Bigg) E\|\xi_{(i-1)K+j+l}\|^2 \leq a^2_{iK+j-1}\, c_{14}, \quad (70)$$

where $c_{14}$ is a suitable constant such that $c_{14} \geq c_7^2\sup_k E\|\xi_k\|^2 \sum_{l=0}^{K-1} a^2_{(i-1)K+j+l}/a^2_{iK+j-1}$ (the bounded factors $\|Q\|$ and $\|L\|^2$ are absorbed into $c_{14}$, and the last ratio is bounded by Equation 58). Taking the mathematical expectation of both sides of Equation 65 and substituting Equations 67 to 70, we have

$$EV_{iK+j} \leq \big(1 - c_{12}a_{iK+j-1}\big)EV_{(i-1)K+j} + c_{13}a^2_{iK+j-1}\big(EV_{(i-1)K+j} + c_{14}/c_{13}\big), \quad 0 \leq j \leq K-1. \quad (71)$$

Comparing Equation 71 with Equation 15 in Lemma 1, it is found that $EV_{iK+j}$, $a_{iK+j-1}$ (with respect to the recursive index $i$), $c_{12}$, $c_{13}$, and $c_{14}/c_{13}$ correspond to $\vartheta_{k+1}$, $a_k$ (with respect to the recursive index $k$), $d_1$, $d_2$, and $d_3$, respectively. Then, by Lemma 1, we have $\lim_{i\to\infty} EV_{iK+j} = 0$, $0 \leq j \leq K-1$. Moreover, incorporating the fact that $Q$ is a positive definite matrix, the mean square convergence is established for each subset of iteration numbers $\{iK+j, i \geq 0\}$, ie, $\lim_{i\to\infty} E\|\delta U_{iK+j}\|^2 = 0$, $0 \leq j \leq K-1$. The mean square convergence of the input sequence $\{U_k, k \geq 1\}$ to the desired input $U_d$ is thus obvious.

Next, we proceed to show the almost sure convergence of $\delta U_k$ to zero. Taking the conditional expectation of Equation 65 with respect to the $\sigma$-algebra $\mathcal{G}_{(i-1)K+j-1}$, it follows that

$$E\big(V_{iK+j} \mid \mathcal{G}_{(i-1)K+j-1}\big) \leq V_{(i-1)K+j} + c_{13}a^2_{iK+j-1}\big(V_{(i-1)K+j} + c_{14}/c_{13}\big), \quad 0 \leq j \leq K-1. \quad (72)$$

Note that the 2 terms on the right-hand side of the last inequality, ie, $V_{(i-1)K+j}$ and $c_{13}a^2_{iK+j-1}(V_{(i-1)K+j} + c_{14}/c_{13})$, correspond to $X(n)$ and $Z(n)$ in Lemma 2, respectively. Moreover, it has been shown that $EV_{(i-1)K+j}$ converges to zero as $i \to \infty$; thus, it is evident that

$$\sum_{i=0}^{\infty} E\big[c_{13}a^2_{iK+j-1}\big(V_{(i-1)K+j} + c_{14}/c_{13}\big)\big] \leq \Big(c_{13}\sup_i EV_{(i-1)K+j} + c_{14}\Big)\sum_{i=0}^{\infty} a^2_{iK+j-1} < \infty. \quad (73)$$

In other words, the conditions of Lemma 2 are fulfilled. Therefore, it follows that $V_{iK+j}$ converges almost surely as $i \to \infty$, $\forall j$. On the other hand, we have shown that $\delta U_{iK+j}$ converges to zero in the mean square sense. Then, the almost sure limit of $\delta U_{iK+j}$ must also be zero. The proof is completed.

5.2 BVM case

In this subsection, we give the convergence analysis for the BVM case. In this case, the deterministic contraction of the RSM is not valid; however, the independence of the data dropouts helps establish the convergence, similarly to the last section.

Theorem 5. Consider the stochastic linear system (1) and the ILC update law (51), where the random data dropouts follow the BVM. Assume Assumptions 1 and 2 hold. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix.

Proof. Let us recall the update law (53) as follows:

$$U_{k+1} = U_k + a_k\Gamma_k L E_k.$$

Subtracting both sides of the last equation from $U_d$, we have

$$\delta U_{k+1} = \delta U_k - a_k\Gamma_k L E_k = \delta U_k - a_k\Gamma_k LH\,\delta U_k + a_k\Gamma_k L\xi_k = \delta U_k - a_k\bar\gamma LH\,\delta U_k + a_k\big(\bar\gamma I - \Gamma_k\big)LH\,\delta U_k + a_k\Gamma_k L\xi_k. \quad (74)$$

Note that $\bar\gamma I$ is the mathematical expectation of $\Gamma_k$. Now, let us apply the weighted norm of $\delta U_k$, $V_k = \|\delta U_k\|^2_Q$, ie,

$$V_{k+1} = \delta U_{k+1}^T Q\,\delta U_{k+1}$$
$$= \delta U_k^T Q\,\delta U_k + a_k^2\bar\gamma^2\,\delta U_k^T (LH)^T Q (LH)\,\delta U_k + a_k^2\,\xi_k^T L^T\Gamma_k^T Q\,\Gamma_k L\xi_k + a_k^2\,\delta U_k^T (LH)^T\big(\bar\gamma I - \Gamma_k\big)Q\big(\bar\gamma I - \Gamma_k\big)(LH)\,\delta U_k$$
$$\quad - a_k\bar\gamma\,\delta U_k^T\big[(LH)^T Q + Q(LH)\big]\delta U_k + 2a_k\big(\delta U_k - a_k\bar\gamma LH\,\delta U_k\big)^T Q\big(\bar\gamma I - \Gamma_k\big)(LH)\,\delta U_k$$
$$\quad + 2a_k\big(\delta U_k - a_k\bar\gamma LH\,\delta U_k\big)^T Q\,\Gamma_k L\xi_k + 2a_k^2\,\delta U_k^T (LH)^T\big(\bar\gamma I - \Gamma_k\big)Q\,\Gamma_k L\xi_k. \quad (75)$$

Note that $U_k$ is constructed on the basis of the data from the $(k-1)$th iteration; thus, it is independent of the data dropout variable at the $k$th iteration, ie, $\Gamma_k$. This fact gives

$$E\big[\big(\delta U_k - a_k\bar\gamma LH\,\delta U_k\big)^T Q\big(\bar\gamma I - \Gamma_k\big)(LH)\,\delta U_k\big] = 0. \quad (76)$$

Similarly, the independence of $U_k$, $\Gamma_k$, and $\xi_k$ yields

$$E\big[\big(\delta U_k - a_k\bar\gamma LH\,\delta U_k\big)^T Q\,\Gamma_k L\xi_k\big] = 0, \quad (77)$$
$$E\big[\delta U_k^T (LH)^T\big(\bar\gamma I - \Gamma_k\big)Q\,\Gamma_k L\xi_k\big] = 0, \quad (78)$$

where Assumption 2 is applied. Taking the mathematical expectation of both sides of Equation 75 and substituting Equations 76 to 78 as well as the Lyapunov equation $(LH)^T Q + Q(LH) = I$, we have

$$EV_{k+1} = EV_k - a_k\bar\gamma\, E\,\delta U_k^T\delta U_k + a_k^2\bar\gamma^2 E\big[\delta U_k^T (LH)^T Q (LH)\,\delta U_k\big] + a_k^2 E\big[\xi_k^T L^T\Gamma_k^T Q\,\Gamma_k L\xi_k\big] + a_k^2 E\big[\delta U_k^T (LH)^T\big(\bar\gamma I - \Gamma_k\big)Q\big(\bar\gamma I - \Gamma_k\big)(LH)\,\delta U_k\big]. \quad (79)$$

According to Assumption 2, there exists a suitable constant $c_{15} > 0$ such that

$$E\big[\xi_k^T L^T\Gamma_k^T Q\,\Gamma_k L\xi_k\big] < c_{15}. \quad (80)$$

Moreover, due to the positive definiteness of $Q$, there are $c_{16} > 0$ and $c_{17} > 0$ such that

$$E\big[\delta U_k^T (LH)^T Q (LH)\,\delta U_k\big] \leq c_{16} EV_k, \quad (81)$$
$$E\big[\delta U_k^T (LH)^T\big(\bar\gamma I - \Gamma_k\big)Q\big(\bar\gamma I - \Gamma_k\big)(LH)\,\delta U_k\big] \leq c_{17} EV_k. \quad (82)$$

Substituting Equations 80 to 82 and the inequality $c_3 Q \leq I$ into Equation 79 leads to

$$EV_{k+1} \leq \big(1 - a_k\bar\gamma c_3\big)EV_k + a_k^2\big(c_{15} + (\bar\gamma^2 c_{16} + c_{17})EV_k\big). \quad (83)$$

Then, it is evident that $EV_k$ corresponds to $\vartheta_k$ in Lemma 1. Applying Lemma 1, it follows that $\lim_{k\to\infty} EV_k = 0$, which further implies $\lim_{k\to\infty} E\|\delta U_k\|^2 = 0$ by the fact that $Q$ is a positive definite matrix. The mean square convergence is thus obtained.

Next, we proceed to show the almost sure convergence with the help of Lemma 2. To this end, taking the conditional expectation of both sides of Equation 75 with respect to $\mathcal{G}_{k-1}$, it follows that

$$E\big(V_{k+1} \mid \mathcal{G}_{k-1}\big) \leq V_k + a_k^2\big(c_{15} + (\bar\gamma^2 c_{16} + c_{17})V_k\big). \quad (84)$$

Condition (17) is easy to verify for the last term of the above inequality with the help of the mean square convergence. Therefore, by Lemma 2, $\delta U_k$ converges almost surely. Similarly to the steps in the proof of Theorem 4, the almost sure convergence of the input sequence $\{U_k\}$ is verified. This completes the proof.

5.3 MCM case

In this subsection, the convergence analysis for the MCM case is given. As previously explained, the inherent difference between the BVM case and the MCM case is the iteration dependence of the data dropouts in the MCM case. Thus, the proof can be carried out by modifying the step of taking the mathematical expectation in the proof of Theorem 5 into a conditional expectation. In the following, we only give the main sketch of the proof, to save space.

Theorem 6. Consider the stochastic linear system (1) and the ILC update law (51), where the random data dropouts follow the MCM. Assume Assumptions 1 and 2 hold. Then, the input sequence $\{u_k(t)\}$, $t = 0, \ldots, N-\tau$, achieves both mean square convergence and almost sure convergence to the desired input $u_d(t)$, $t = 0, \ldots, N-\tau$, if the learning gain matrix $L_t$ satisfies that $-L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$ is a Hurwitz matrix.

Proof. Note that the data dropouts are not independent along the iteration axis; thus, it is unsuitable to take the expectation of $\Gamma_k$ as we did in Equation 74. In fact, the expression for $\delta U_{k+1}$ is

$$\delta U_{k+1} = \delta U_k - a_k\Gamma_k LH\,\delta U_k + a_k\Gamma_k L\xi_k, \quad (85)$$

and then, the expansion of $V_{k+1}$ is formulated as

$$V_{k+1} = V_k + a_k^2\,\delta U_k^T (LH)^T\Gamma_k^T Q\,\Gamma_k (LH)\,\delta U_k + a_k^2\,\xi_k^T L^T\Gamma_k^T Q\,\Gamma_k L\xi_k - a_k\,\delta U_k^T\big[\Lambda_k^T Q + Q\Lambda_k\big]\delta U_k$$
$$\quad + a_k\big[\delta U_k^T Q\,\Gamma_k L\xi_k + \xi_k^T L^T\Gamma_k^T Q\,\delta U_k\big] - a_k^2\big[\delta U_k^T\Lambda_k^T Q\,\Gamma_k L\xi_k + \xi_k^T L^T\Gamma_k^T Q\,\Lambda_k\delta U_k\big], \quad (86)$$

where $\Lambda_k = \Gamma_k LH$ has been previously defined. Note that $\xi_k$ is independent of the other signals; thus, we have

$$E\big(\delta U_k^T Q\,\Gamma_k L\xi_k + \xi_k^T L^T\Gamma_k^T Q\,\delta U_k \mid \mathcal{F}_k\big) = 0, \quad (87)$$
$$E\big(\delta U_k^T\Lambda_k^T Q\,\Gamma_k L\xi_k + \xi_k^T L^T\Gamma_k^T Q\,\Lambda_k\delta U_k \mid \mathcal{F}_k\big) = 0. \quad (88)$$

Moreover, $\Gamma_k$ is a bounded matrix; thus, there exists $c_{18} > 0$ such that

$$E\big(\xi_k^T L^T\Gamma_k^T Q\,\Gamma_k L\xi_k \mid \mathcal{F}_k\big) < c_{18}. \quad (89)$$

Furthermore, $\delta U_k$ is adapted to $\mathcal{F}_k$ according to the definition of the update law, whereas the probability transition matrix of the stochastic matrix $\Lambda_k$ has a positive probability of returning to $\Lambda^{(1)}$, ie, $\min_{1\leq i\leq\kappa} P\big(\Lambda_k = \Lambda^{(1)} \mid \Lambda_{k-1} = \Lambda^{(i)}\big) > 0$ (see Section 4.3); therefore, there exists a constant $c_{19} > 0$ such that

$$E\big(\Lambda_k^T Q + Q\Lambda_k \mid \mathcal{F}_k\big) \geq c_{19} I. \quad (90)$$

Using Equations 87 to 90, we can derive from Equation 86 that

$$E\big(V_{k+1} \mid \mathcal{F}_k\big) \leq V_k - a_k c_{19} c_3 V_k + a_k^2 c_{20} V_k + a_k^2 c_{18}, \quad (91)$$

where $c_{20} > 0$ is a constant such that

$$E\big(\delta U_k^T (LH)^T\Gamma_k^T Q\,\Gamma_k (LH)\,\delta U_k \mid \mathcal{F}_k\big) \leq c_{20} V_k.$$

Then, taking the mathematical expectation of Equation 91, we have

$$EV_{k+1} \leq \big(1 - a_k c_{19} c_3\big)EV_k + a_k^2\big(c_{18} + c_{20} EV_k\big). \quad (92)$$

From this point on, the steps are similar to those in the proof of Theorem 5. The mean square and almost sure convergence of the input sequence $\{U_k\}$ to the desired input $U_d$ can be obtained with the help of Lemmas 1 and 2. The proof is thus completed.

5.4 Further remarks

Remark 20. From the proofs in this section and the last one, we may observe the technical connections and differences among the 3 cases. Specifically, the proof for the BVM case forms the basic procedure of the technical convergence analysis: by taking the expectation over the random data dropout variable, we derive a separation formula for the input error $\delta U_{k+1}$, namely, the contraction mapping of the input error plus 2 additional zero-expectation errors (see Equation 74 for details). Then, the convergence proof is established with the help of Lemmas 1 and 2. That is, the additive formulation (74) plays a basic role in the subsequent analysis. For the RSM case and the MCM case, this basic formula must be modified accordingly. In particular, for the RSM case, the 1-iteration contraction relationship of the BVM case (ie, $(I - a_k\bar\gamma LH)\delta U_k$ in Equation 74) has to be extended to a $K$-iteration contraction relationship (see Equation 64 for a clear expression). This is also why we had to give a technical lemma estimating $\Psi_{m,n}$ (see Lemma 4) before stating the main theorem. For the MCM case, the specific separation of Equation 74 cannot be derived due to the iteration dependence of the data dropouts, but a similar contraction relationship can be obtained by taking a conditional expectation. In fact, this is the inherent difference between the BVM case and the MCM case, which originates from the model differences between the BVM and the MCM. To sum up, the RSM case and the MCM case are extensions of the BVM case in different directions, and additional treatments are developed to complete the convergence analysis.

Remark 21. The essential step in the above proofs is to establish a decreasing trend of the weighted norm of the input error, ie, $EV_k = E\|\delta U_k\|^2_Q$. For the noise-free system, a monotonically decreasing trend of $EV_k$ is derived, and the exponential convergence speed is thus guaranteed. Then, the almost sure convergence can be derived by applying the Borel-Cantelli lemma. For the noisy system, ie, the stochastic system, due to the existence of stochastic noise, it is impossible to obtain a monotonically decreasing trend of $EV_k$. However, a weak version of the decreasing trend can be established, ie, $EV_{k+1} \leq (1 - a_k d_1)EV_k + a_k^2 d_2(EV_k + d_3)$. This formula implies that the main trend of $EV_k$ is still decreasing, as shown by the term $(1 - a_k d_1)EV_k$, although it is perturbed by a faster-decaying term, as shown by $a_k^2 d_2(EV_k + d_3)$. In such a case, we can still ensure the convergence of $EV_k$ to zero. Moreover, the almost sure convergence is established with a weak version of the convergence theorem for a nonnegative supermartingale sequence. Specifically, let us revisit Lemma 2. If $Z(n)$ were removed from Equation 16, then $X(n)$ (corresponding to $V_k$ in the above theorems) would form a supermartingale, and the almost sure convergence would thus be guaranteed. Lemma 2 implies that the almost sure convergence still holds for $X(n)$ as long as the infinite sum of the additional term $Z(n)$ is finite in the mathematical expectation sense.
To sum up, the mean square convergence and the almost sure convergence for the 3 models are established in a framework based on 2 technical lemmas.

Remark 22. It should be pointed out that, although we do not provide a similar estimate of $\Psi_{m,n}$ for the BVM and MCM cases, the estimate in Lemma 4 is also valid for the latter 2 cases with $m \geq n$. Such derivations follow steps similar to the proof of Lemma 4. In fact, analogous conclusions have been merged into the convergence proofs for the latter cases (see the steps deriving Equations 79 and 91). Indeed, it is the decreasing property of $\Psi_{m,n}$ that essentially guarantees the convergence of the algorithms. In addition, the estimate of $\Psi_{m,n}$ also indicates the convergence speed for the stochastic system case. Specifically, we may write $\|\Psi_{m,1}\| = O\big(\exp\big(-\beta\sum_{k=1}^{m} a_k\big)\big)$. Let us select a specific $\{a_k\}$ as an illustration, ie, $a_k = 1/k$. It is well known that the $m$th harmonic number has the estimate $\sum_{k=1}^{m} a_k = O(\log m)$. Then, we have $\|\Psi_{m,1}\| = O\big(\exp(-\beta\log m)\big) = O\big(m^{-\beta}\big)$. This rate of convergence coincides with basic knowledge of stochastic control.

Remark 23. From the design condition of the learning gain matrix, it is seen that the critical components for ensuring convergence are the diagonal blocks of $LH$, ie, $L_t C_{t+\tau} A^{t+\tau-1}_{t+1} B_t$, involving the input/output coupling matrix, whereas the off-diagonal blocks of $LH$ have little influence on the essential convergence. From this viewpoint, the results of this paper can be extended to affine nonlinear systems without significant effort. The extension from linear systems to affine nonlinear systems has been reported in many existing papers. Here, we omit the tedious discussion to avoid repetition.
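The $m^{-\beta}$ rate in Remark 22 can be seen numerically in the scalar case. The sketch below (illustrative scalar gain $\lambda$, step-size $a_k = 1/k$) compares the product $\prod_{k=1}^{m}(1 - a_k\lambda)$ against $m^{-\lambda}$.

```python
import numpy as np

# Scalar illustration of Remark 22: with a_k = 1/k and a gain 0 < lam < 1,
# the product Psi_m = prod_{k=1}^m (1 - lam/k) behaves like m**(-lam).
lam = 0.5
m = np.arange(1, 100001)
psi = np.cumprod(1.0 - lam / m)
ratio = psi * m**lam            # should level off at a positive constant
print(ratio[999], ratio[99999])  # nearly equal, confirming O(m**-lam)
```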

6 ILLUSTRATIVE EXAMPLES

The main objective of this paper is to propose a general framework for the convergence analysis of ILC algorithms under various kinds of data dropout models. In the last 2 sections, the detailed analysis steps and techniques were elaborated. In this section, we verify the theoretical results with a time-varying linear system $(A_t, B_t, C_t)$, where $A_t$ involves the time-varying entries $0.2\exp(-t/100)$ and $\sin(t)$, $B_t = [\,0 \;\; 0.3\sin(t) \;\; 1\,]^T$, and $C_t$ is an output row vector. The iteration length is set to $N = 100$. The tracking reference $y_d(t)$ is a superposition of 2 sinusoids of the form $0.5\sin(\,\cdot\, t\pi)$. The initial state for all iterations is set to $x_k(0) = x_d(0) = 0$. The algorithm is run for 150 iterations in each case.

It should be noted that the actual tracking performance and the convergence speed depend on the average DDR. In the RSM, the assumption is made according to the worst case of successive data dropouts rather than the average DDR. In other words, a larger integer $K$ does not necessarily imply a larger DDR. In the BVM, the expectation of the random variable $\gamma_k(t)$ determines the average DDR. In the MCM, the DDR is jointly determined by the transition probability matrix; that is, it can be computed by deriving the stationary distribution. Specifically, the 3 models are simulated as follows.

RSM: We consider 5 cases for the RSM. To simulate the data missing, we first separate the iterations into groups of $M$ successive iterations, $M = 2, \ldots, 6$; that is, the iterations are separated as $\{kM+1, kM+2, \ldots, (k+1)M\}$, $k = 0, 1, 2, \ldots$, and we randomly select 1 iteration from each group, denoting the one whose data are dropped during transmission. In this setting, the bound $K$ on successive dropouts is 3. For example, take $M = 3$; then the iterations are separated as $\{1,2,3\}, \{4,5,6\}, \ldots$, and from each group, 1 iteration is selected randomly. Therefore, the DDRs for the above 5 cases are $1/2$, $1/3$, $1/4$, $1/5$, and $1/6$, respectively.

BVM: We consider 4 cases for the mathematical expectation of the random variable $\gamma_k(t)$, ie, $\bar\gamma = 0.9$, $0.7$, $0.5$, and $0.3$. The smaller the expectation, the larger the DDR. Specifically, the DDR equals $1 - \bar\gamma$. As a consequence, the DDR values for the above 4 cases are 0.1, 0.3, 0.5, and 0.7. Additionally, the no-data-dropout case, namely $\bar\gamma = 1$, is also simulated for comparison.

MCM: We consider 4 cases for the transition probability matrix, as follows: $\mu = 0.8$, $\nu = 0.5$; $\mu = 0.7$, $\nu = 0.5$; $\mu = 0.6$, $\nu = 0.6$; and $\mu = 0.5$, $\nu = 0.7$. The stationary distribution $\pi$ of a transition probability matrix $P$ can be computed from $\pi P = \pi$ and is given by $\pi = \big[\frac{1-\nu}{2-\mu-\nu}, \frac{1-\mu}{2-\mu-\nu}\big]$. Thus, the average DDR equals $\frac{1-\mu}{2-\mu-\nu}$. As a consequence, the DDR values for the above 4 cases are $2/7$, $3/8$, $1/2$, and $5/8$. Additionally, the no-data-dropout case, namely $\mu = 1$ and $\nu = 0$, is also simulated for comparison.

We first check the noise-free system case. In this case, we set $\sigma = 0.4$ and $L_t = 1$. The simulation is run for the 3 data dropout models. The maximal tracking error for each iteration is defined as $\max_{1\leq j\leq N}\|e_k(j)\|$. The maximal tracking error profiles along the iteration axis are plotted in Figure 3. The figure exhibits 2 observations. One is that the convergence slows down as the DDR increases; that is, a larger DDR results in a slower convergence speed. The other is that the maximal tracking error profiles approximate straight lines on the logarithmic axis, which demonstrates that the convergence is exponential when no noise is involved in the system.
When the system involves random noise, an additional decreasing learning gain sequence should be introduced into the ILC rule to guarantee stable convergence of the proposed algorithms. The tracking performance is shown in Figure 4, where the random noise is assumed to be zero-mean white Gaussian noise with standard deviation 0.1. In the simulation, the learning gain is set to $L_t = 1.5$, and the decreasing sequence is selected as $a_k = \frac{1}{k+1}$. We have several observations from Figure 4. First of all, due to the existence of random noise, the final tracking error cannot be reduced to zero as the iteration number increases, and the maximal tracking error profiles fluctuate heavily. Moreover, the introduction of $\{a_k\}$ makes the convergence much slower than in the noise-free case. However, this is a natural requirement for the control of stochastic systems. In addition, the influence of the DDR on the convergence speed is similar to that in the noise-free case, which implies that stochastic noise and random data dropouts impact the performance independently.
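For reproducibility, the following is a sketch of the stochastic-case experiment: update law (51) with the decreasing step-size $a_k = 1/(k+1)$, $L_t = 1.5$, Gaussian measurement noise of standard deviation 0.1, and Bernoulli (BVM) dropouts. The plant matrices and reference below are illustrative stand-ins, since parts of the paper's $A_t$, $C_t$, and $y_d$ are not fully specified here.

```python
import numpy as np

# Sketch of the Section 6 stochastic experiment under BVM dropouts.
# A(t), B(t), C, and y_d are assumed stand-ins, not the paper's exact data.
rng = np.random.default_rng(2024)

N, iterations = 100, 150
L_gain, gamma_bar, noise_std = 1.5, 0.7, 0.1

def A(t):  # assumed 2x2 time-varying dynamics
    return np.array([[0.2 * np.exp(-t / 100), 0.1],
                     [0.0, 0.5 * np.sin(t)]])

def B(t):
    return np.array([[0.3 * np.sin(t)], [1.0]])

C = np.array([[0.5, 1.0]])     # chosen so that C @ B(t) > 0 for every t
y_d = 0.5 * np.sin(np.arange(N + 1) * np.pi / 25)   # assumed reference

u = np.zeros(N)                # u_k(t), t = 0..N-1; relative degree tau = 1
for k in range(1, iterations + 1):
    x = np.zeros((2, 1))
    y = np.zeros(N + 1)
    for t in range(N):         # one pass over the iteration, x_k(0) = 0
        x = A(t) @ x + B(t) * u[t]
        y[t + 1] = (C @ x).item() + noise_std * rng.standard_normal()
    e = y_d - y                # noisy tracking error e_k(t)
    a_k = 1.0 / (k + 1)        # decreasing step-size satisfying (52)
    gamma = (rng.random(N) < gamma_bar).astype(float)   # dropout variables
    u = u + a_k * gamma * L_gain * e[1:]                # update law (51)
print("max tracking error at final iteration:",
      np.max(np.abs(y_d[1:] - y[1:])))
```

Dropping the step-size decay (ie, replacing `a_k` by a fixed $\sigma$) in this sketch reproduces the noise-free behavior of Figure 3 when `noise_std = 0`.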

FIGURE 3 Maximal tracking error profiles for the noise-free system along the iteration axis. A, Random sequence model (RSM) case, where Cases 1 to 5 correspond to the data dropout rate (DDR) being 1/2, 1/3, 1/4, 1/5, and 1/6, respectively; B, Bernoulli variable model (BVM) case, where Cases 1 to 4 correspond to the DDR being 0.1, 0.3, 0.5, and 0.7, respectively; C, Markov chain model (MCM) case, where Cases 1 to 4 correspond to the DDR being 2/7, 3/8, 1/2, and 5/8, respectively

Comparing Figures 3 and 4, we can observe the connections and differences in the tracking performance between the 2 cases. On the one hand, the convergence speed is determined by the average DDR in both cases, as the DDR is a direct index of the renewal frequency. On the other hand, the tracking precision depends strongly on the DDR in the noise-free case, whereas this dependence is not distinct in the stochastic case, because the stochastic noise dominates the tracking error after the first several iterations. In addition, the convergence speed in the stochastic case slows down greatly due to the introduction of the decreasing gain sequence.

To sum up, the simulation results verify the theoretical results given in the previous sections. Moreover, the convergence speed is determined by the selection of the learning gain matrices as well as by the DDR, where the former is a tunable factor and the latter is an external factor set by the transmission quality of the channels. This paper is devoted to establishing a general convergence analysis framework for ILC under various data dropout models; thus, we mainly employ basic simulations to validate the theoretical results.

FIGURE 4 Maximal tracking error profiles for the noisy system along the iteration axis. A, Random sequence model (RSM) case, where Cases 1 to 5 correspond to the data dropout rate (DDR) being 1/2, 1/3, 1/4, 1/5, and 1/6, respectively; B, Bernoulli variable model (BVM) case, where Cases 1 to 4 correspond to the DDR being 0.1, 0.3, 0.5, and 0.7, respectively; C, Markov chain model (MCM) case, where Cases 1 to 4 correspond to the DDR being 2/7, 3/8, 1/2, and 5/8, respectively

7 CONCLUSIONS

In this paper, we have considered the convergence analysis of ILC under random data dropout environments. To this end, a framework was given to demonstrate both the mean square and almost sure convergence properties of the classic P-type ILC update law for 3 kinds of data dropout models. Specifically, the RSM, the BVM, and the MCM were addressed in turn for both noise-free systems and stochastic systems. While we dealt with the case where the network at the measurement side suffers random data dropouts to clarify our idea, the extension to the case where the networks at both sides suffer random data dropouts directly follows the same analysis framework. In addition, the results can be extended to other types of ILC algorithms, such as PD-type and current-iteration-feedback-integrated update laws. For further research, we note that the transmission of data through networks suffers many other problems, such as transmission errors, bandwidth limitation, and transmission delay; therefore, it is of great interest to investigate how to generalize the proposed results to more general networked ILC problems.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, and the China Scholarship Council.

ORCID: Dong Shen

REFERENCES

1. Arimoto S, Kawamura S, Miyazaki F. Bettering operation of robots by learning. J Robot Syst. 1984;1(2).
2. Bristow DA, Tharayil M, Alleyne AG. A survey of iterative learning control: A learning-based method for high-performance tracking control. IEEE Control Syst Mag. 2006;26(3).
3. Ahn HS, Chen YQ, Moore KL. Iterative learning control: survey and categorization from 1998 to 2004. IEEE Trans Syst Man Cybern-Part C. 2007;37(6).
4. Shen D, Wang Y. Survey on stochastic iterative learning control. J Process Control. 2014;24(12).
5. Li X, Huang D, Chu B, Xu J-X. Robust iterative learning control for systems with norm-bounded uncertainties. Int J Robust Nonlinear Control. 2016;26.
6. Li J, Li J. Distributed adaptive fuzzy iterative learning control of coordination problems for higher order multi-agent systems. Int J Syst Sci. 2016;47(10).
7. Meng D, Moore KL. Learning to cooperate: networks of formation agents with switching topologies. Automatica. 2016;64.
8. Xiong W, Yu X, Patel R, Yu W. Iterative learning control for discrete-time systems with event-triggered transmission strategy and quantization. Automatica. 2016;72.
9. Son TD, Pipeleers G, Swevers J. Robust monotonic convergent iterative learning control. IEEE Trans Autom Control. 2016;61(4).
10. Xiong W, Ho DWC, Yu X. Saturated finite interval iterative learning for tracking of dynamic systems with HNN-structural output. IEEE Trans Neural Netw Learn Syst. 2016;27(7).
11. Wei Y-S, Li X-D. Iterative learning control for linear discrete-time systems with high relative degree under initial state vibration. IET Control Theory Appl. 2016;10(10).
12. Li X, Ren Q, Xu J-X. Precise speed tracking control of a robotic fish via iterative learning control. IEEE Trans Ind Electron. 2016;63(4).
13. Xiong W, Yu X, Chen Y, Gao J. Quantized iterative learning consensus tracking of digital networks with limited information communication. IEEE Trans Neural Netw Learn Syst. 2017;28(6).
14. Ahn HS, Chen YQ, Moore KL. Intermittent iterative learning control. In: Proceedings of the 2006 IEEE International Symposium on Intelligent Control; 2006; Munich, Germany.
15. Ahn HS, Moore KL, Chen YQ. Discrete-time intermittent iterative learning controller with independent data dropouts. In: Proceedings of the 2008 IFAC World Congress; 2008; Coex, South Korea.
16. Ahn HS, Moore KL, Chen YQ. Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems. In: Proceedings of the IEEE International Conference on Control, Automation, Robotics, and Vision; 2008; Hanoi, Vietnam.
17. Bu X, Hou Z-S, Yu F. Stability of first and high order iterative learning control with data dropouts. Int J Control Autom Syst. 2011;9(5).
18. Bu X, Yu F, Hou Z-S, Wang F. Iterative learning control for a class of nonlinear systems with random packet losses. Nonlinear Anal: Real World Appl. 2013;14(1).
19. Bu X, Hou Z-S, Yu F, Wang F. H-infinity iterative learning controller design for a class of discrete-time systems with data dropouts. Int J Syst Sci. 2014;45(9).
20. Bu X, Hou Z, Jin S, Chi R. An iterative learning control design approach for networked control systems with data dropouts. Int J Robust Nonlinear Control. 2016;26.
21. Huang L-X, Fang Y.
Convergence analysis of wireless remote iterative learning control systems with dropout compensation. Math Probl Eng. 2013;2013:1-9.
22. Liu J, Ruan X. Networked iterative learning control approach for nonlinear systems with random communication delay. Int J Syst Sci. 2016;47(16).
23. Liu C, Xu J-X, Wu J. Iterative learning control for remote control systems with communication delay and data dropout. Math Probl Eng. 2012;2012:1-14.
24. Shen D, Wang Y. ILC for networked discrete systems with random data dropouts: A switched system approach. In: Proceedings of the 33rd Chinese Control Conference; 2014; Nanjing, China.
25. Shen D, Zhang C, Xu Y. Two compensation schemes of iterative learning control for networked control systems with random data dropouts. Inform Sci. 2017;381.
26. Shen D, Zhang C, Xu Y. Intermittent and successive ILC for stochastic nonlinear systems with random data dropouts. Asian J Control. Accepted for publication.

27. Shen D, Xu JX. A novel Markov chain based ILC analysis for linear stochastic systems under general data dropouts environments. IEEE Trans Autom Control. Accepted for publication.
28. Pan Y-J, Marquez HJ, Chen T, Sheng L. Effects of network communications on a class of learning controlled non-linear systems. Int J Syst Sci. 2009;40(7).
29. Shen D, Wang Y. Iterative learning control for networked stochastic systems with random packet losses. Int J Control. 2015;88(5).
30. Shen D, Wang Y. ILC for networked nonlinear systems with unknown control direction through random lossy channel. Syst Control Lett. 2015;77.
31. Saab SS. A discrete-time stochastic learning control algorithm. IEEE Trans Autom Control. 2001;46(6).
32. Chen H-F. Almost sure convergence of iterative learning control for stochastic systems. Sci China (Series F). 2003;46(1).
33. Huang SN, Tan KK, Lee TH. Necessary and sufficient condition for convergence of iterative learning algorithm. Automatica. 2002;38(7).
34. Meng D, Jia Y, Du J, Yu F. Necessary and sufficient stability condition of LTV iterative learning control systems using a 2-D approach. Asian J Control. 2011;13(1).
35. Saab SS. Selection of the learning gain matrix of an iterative learning control algorithm in presence of measurement noise. IEEE Trans Autom Control. 2005;50(11).
36. Meng D, Moore KL. Robust iterative learning control for nonrepetitive uncertain systems. IEEE Trans Autom Control. 2017;62(2).
37. Meng D, Moore KL. Convergence of iterative learning control for SISO nonrepetitive systems subject to iteration-dependent uncertainties. Automatica. 2017;79.
38. Chen Y, Wen C, Gong Z, Sun M. An iterative learning controller with initial state learning. IEEE Trans Autom Control. 1999;44(2).
39. Sun M, Wang D. Initial shift issues on discrete-time iterative learning control with system relative degree. IEEE Trans Autom Control. 2003;48(1).
40. Lin H, Antsaklis PJ. Stability and persistent disturbance attenuation properties for networked control systems: switched system approach. Int J Control. 2005;78(18).
41. Sinopoli B, Schenato L, Franceschetti M, Poolla K, Jordan MI, Sastry SS. Kalman filtering with intermittent observations. IEEE Trans Autom Control. 2004;49(9).
42. Shi Y, Yu B. Output feedback stabilization of networked control systems with random delays modeled by Markov chains. IEEE Trans Autom Control. 2009;54(7).
43. Shen D, Zhang W, Wang Y, Chien C-J. On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths. Automatica. 2016;63(1).
44. Tsitsiklis JN, Bertsekas DP, Athans M. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans Autom Control. 1986;31(9).
45. Hall P, Heyde CC. Martingale Limit Theory and Its Applications. New York: Academic Press; 1980.
46. Benveniste A, Métivier M, Priouret P. Adaptive Algorithms and Stochastic Approximations. New York: Springer-Verlag; 1990.
47. Caines PE. Linear Stochastic Systems. New York: Wiley; 1988.
48. Shen D, Xu JX. A new iterative learning control algorithm with gain adaptation for stochastic systems. Submitted.
49. Hassibi A, Boyd SP, How JP. Control of asynchronous dynamical systems with rate constraints on events. In: Proceedings of the 38th IEEE Conference on Decision and Control; 1999; Phoenix, USA.

How to cite this article: Shen D, Xu J-X. A framework of iterative learning control under random data dropouts: Mean square and almost sure convergence. Int J Adapt Control Signal Process. 2017;31.

APPENDIX

Proof of Lemma 1. From Equation 15, we have

$$\vartheta_{k+1} \leq \big(1 - d_1 a_k + d_2 a_k^2\big)\vartheta_k + d_2 d_3 a_k^2. \quad (A1)$$
Since $a_k \to 0$, we can choose a sufficiently large integer $k_0$ such that $1 - d_1 a_k + d_2 a_k^2 < 1$ for all $k \geq k_0$, and then we have

$$\vartheta_{k+1} \leq \vartheta_k + d_4 a_k^2, \quad (A2)$$

where $d_4 \triangleq d_2 d_3$. As a result, it follows from Equation A2 and $\sum_{k=1}^{\infty} a_k^2 < \infty$ that $\sup_k \vartheta_k < \infty$, and then $\vartheta_k$ converges. Based on this boundedness, from Equation A1, we have

$$\vartheta_{k+1} \leq \big(1 - d_1 a_k\big)\vartheta_k + d_5 a_k^2, \quad (A3)$$

where $d_5 > 0$ is a suitable constant. Noticing that $\sum_{k=1}^{\infty} a_k = \infty$ and $\sum_{k=1}^{\infty} a_k^2 < \infty$, we conclude that $\lim_{k\to\infty} \vartheta_k = 0$.
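A quick numeric check of the statement just proved: iterating the recursion (A1), taken with equality as the worst case and with $a_k = 1/k$ and illustrative constants, drives $\vartheta_k$ to zero.

```python
# Numeric check of Lemma 1: theta_{k+1} = (1 - d1*a + d2*a**2)*theta
#                                         + d2*d3*a**2,  a = 1/k.
# Constants are illustrative; the iterate decays to zero (slowly).
d1, d2, d3 = 1.0, 0.5, 2.0
theta = 10.0
for k in range(1, 200001):
    a = 1.0 / k
    theta = (1 - d1 * a + d2 * a * a) * theta + d2 * d3 * a * a
print(theta)  # approx 0 (on the order of 1e-4 after 2e5 steps)
```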

Received: 2 July 2016 | Revised: 11 April 2017 | Accepted: 20 June 2017

DOI: 10.1002/acs.2799

RESEARCH ARTICLE

Distributed adaptive iterative learning control for nonlinear multiagent systems with state constraints

D. Shen (1), J.-X. Xu (2)

(1) College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
(2) Department of Electrical and Computer Engineering, National University of Singapore, Singapore

Correspondence: J.-X. Xu, Department of Electrical and Computer Engineering, National University of Singapore, Singapore. Email: elexujx@nus.edu.sg

Funding information: National Natural Science Foundation of China; Beijing Natural Science Foundation; China Scholarship Council

Summary

This paper addresses the consensus problem of nonlinear multiagent systems with state constraints. A novel γ-type barrier Lyapunov function is adopted to handle the bounded constraints. The iterative learning control strategy is introduced to estimate the unknown parameter and the basic control signal. Five control schemes are designed, in turn, to address the consensus problem comprehensively from both theoretical and practical viewpoints. These schemes include the original adaptive scheme, the projection-based scheme, the smooth function-based scheme and its alternative, and the dead-zone-like scheme. The consensus convergence and the constraint guarantees are strictly proved for each control scheme by using the barrier composite energy function approach. Illustrative simulations verify the theoretical analysis.

KEYWORDS: alignment condition, barrier function, composite energy function, iterative learning control, multiagent systems

1 INTRODUCTION

1.1 Backgrounds on MAS and ILC

In the past decade, the multiagent system (MAS) coordination and control problem has attracted much attention from the control community, and much progress has been made from different viewpoints, including formation control, synchronization of coupled oscillators, flocking, swarm tracking, area coverage, and containment control. The consensus framework provides a useful approach to solving such kinds of problems.1 The objective of distributed consensus is to reach an agreement on a common value or trajectory for some variables of interest associated with the agents. The leader-follower formulation is one of the most common frameworks for achieving consensus. In this consensus realization, the leader information is available only to a subset of the agents. Meanwhile, each agent can only use the information of its neighborhood, due to communication distance constraints or measurement limitations of onboard sensors. Therefore, the algorithms used in the consensus framework are typically distributed, which makes them more robust and extensible than centralized algorithms.

For the setting of the consensus problem, triple components are involved, ie, the agent model, the information exchange topology, and the distributed consensus algorithm. In the literature, many consensus results have been reported for the single integrator model,2,3 the double integrator model,4-6 the high-order integrator model,7 linear systems,8,9 nonlinear systems, and systems with additional hard nonlinearities such as saturation.15 It is noted that the strict-feedback system is one of the most popular nonlinear formulations. The function approximation methods were adopted in the works of Yoo13 and Cui et al15 to transform the nonlinear functions into the parameterized type.
The information exchange topology is an indispensable component for achieving consensus. It is usually modeled by a graph. As communication among agents is an important topic in the MAS literature, various communication assumptions and consensus results have been investigated.16,17 Finally, the consensus algorithm is a simple local

coordination rule that can result in complex and useful group-level behaviors; thus, it is the design emphasis, tailored to practical problems. Recently, some scholars have made developments on the multiagent consensus and formation problem using the iterative learning control (ILC) strategy, since ILC is an intelligent and mature control technique for achieving high-precision tracking performance, as surveyed in the works of Ahn et al, 21 Shen and Wang, 22 and Xu. 23 As a matter of fact, the philosophy of ILC mimics the human learning process. In this strategy, the tracking information of previous iterations and the desired trajectory are used to generate the input signal for the current iteration, so that the tracking performance improves as the learning iteration number increases. After three decades of development, ILC has shown distinct advantages in handling high-nonlinearity, high-complexity, and high-precision tracking problems. Some pioneering works have been reported on ILC for MAS. The first result on this topic was given in the work of Ahn and Chen, 32 where the authors considered the formation control problem. Later, application research proceeded on the satellite trajectory-keeping problem, 33 the mobile robot formation problem, 34 the coordinated train trajectory tracking problem, 35 etc, using the ILC technique. Using the contraction mapping method and the composite energy function method, Yang et al proposed a series of achievements on the design and analysis of ILC for MAS in the works of Yang et al, 36 Yang et al, 37 and Yang and Xu. 38 In the works of Yang et al, 36,37 the agent was modeled by a continuous affine nonlinear system, and the consensus property was proved for the fixed topology and the iteration-varying topology, respectively. In the work of Yang and Xu, 38 a class of nonlinear systems, ie, networked Lagrangian systems, was considered, and the consensus objective was achieved by using local position and speed information. Meng et al also made valuable contributions to ILC for MAS from the viewpoint of a 2-dimensional system approach. The major concern of these papers was the convergence requirements on the communication topology. To be specific, in the work of Meng et al, 39 the topology should be time and iteration invariant and connected, whereas the time-invariance condition is relaxed in the work of Meng et al. 40 The terminal united topology is required to have a spanning tree in the work of Meng et al. 41 In the work of Meng et al, 42 the topology should always have a spanning tree along the iteration axis. Recently, Meng et al studied the consensus problem by using the relative difference information. 43 In addition, ILC for MAS was also investigated by Li and Li based on the Lyapunov function approach, where the first-, second-, and high-order models of agents were considered in the works of Li and Li, respectively. It can be concluded that the elementary triple components have been addressed in previous studies and the classic ILC techniques have been applied to these problems. However, in many practical problems, consensus with state constraints should be well addressed. For example, convex input constraints were discussed in the works of Qiu et al 47 and Johansson et al 48 due to practical limitations on the driving forces of agents; the constraints were handled in an optimization framework in these papers. This observation motivates us to further investigate the learning consensus problem with state constraints.
1.2 Motivation

When considering MASs in the real world, it is found that nearly all real systems are subject to constraints in one way or another. Such constraints may arise due to various kinds of practical limitations and/or the requirements of safe operation. It is well known that the actuators of real systems are usually saturated because there is a limit on the driving force, such as voltages or currents, which therefore cannot be arbitrarily large. In many system operations, the states are also bounded within certain ranges. For example, a motorcade consisting of several vehicles is a typical MAS. As is known, no matter how a vehicle moves, it should stay on the road for safety; in other words, the position of the vehicle is limited to be within the range of the road. Meanwhile, the speed of any vehicle is also limited for safe traffic, although the speed limit may vary on different roads. In such cases, the state or the output is bounded within certain ranges. Similar examples include formation control of robots on a given site, group dance performance on a stage, coordinated search by unmanned aerial vehicles in a designated area, etc. Another example comes from the communication limitation due to physical transmission bandwidth. In this case, to guarantee a good communication effect, requirements on the transmitted data packages are commonly imposed, which can be regarded as a kind of constraint on the output of each agent. Moreover, there might also exist measurement limitations due to the use of simple and cheap devices: when the state exceeds the measurement range, one may not obtain the real output. Thus, the systems should satisfy certain constraints in order to obtain good observations. In short, it is of great value to consider the learning control of MAS with state constraints. To be specific, how to design a coordinative learning scheme that ensures state boundedness while achieving the desired consensus or formation objectives is of great interest. In other control fields, such as adaptive control, the barrier Lyapunov function (BLF) has been proposed to

address the constrained control problem. In the ILC field, Xu et al also presented some pioneering results. 49,50 However, when a MAS is taken into account, where only local information can be obtained, it is yet unclear how to realize the above target.

1.3 Our contributions

In this paper, we consider the distributed adaptive ILC design and analysis for MAS with state constraints. Here, the state constraints mean that the states are required to be bounded in a predefined zone. To preserve this requirement, a general γ-type BLF is first introduced with two specific cases, and it is then involved in the design of the following distributed algorithms. For the first-order model of each agent, we propose five control schemes, in turn, to address the learning consensus problem comprehensively from both theoretical and practical viewpoints. The first scheme is the original adaptive algorithm with a classic parameter learning process, where a sign function is introduced to generate the compensation term for unknown uncertainties. To ensure a priori boundedness of the learning parameters, the second control scheme replaces the classic learning process with a projection-based one. On the other hand, the sign function may make the control signal discontinuous as it switches between positive and negative; the sign function is therefore replaced by a hyperbolic tangent function to generate a smooth control signal in the third control scheme. To ensure zero-error consensus, a fast-decreasing parameter sequence is introduced to this scheme. However, from the practical application viewpoint, this is not quite realistic; thus, in the fourth control scheme, the decreasing parameter is replaced by a fixed one, which can only ensure bounded consensus rather than zero-error consensus. To guarantee the boundedness of the learning parameters, we adopt a forgetting-factor mechanism in the fourth scheme, which may result in additional consensus errors. Therefore, a simple dead-zone-like scheme is further given, which means the parameter learning process stops whenever the consensus error converges into a prior given bound. The consensus convergence and constraint satisfaction are strictly proved for each control scheme by using the barrier composite energy function (BCEF) method.

1.4 Paper organization and notations

The rest of this paper is arranged as follows. Section 2 proposes the problem formulation and the definition of the γ-type BLF; the five control schemes for the first-order agent model are provided, in turn, in Section 3. Section 4 provides detailed illustrative simulations to verify the theoretical results, and some concluding remarks are given in Section 5. All the proofs are put in the Appendix.

Notations: $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a weighted graph. $\mathcal{V} = \{v_1, \ldots, v_N\}$ is a nonempty set of nodes/agents, where N is the number of nodes/agents. $\mathcal{E}$ is the set of edges/arcs. $(v_i, v_j) \in \mathcal{E}$ indicates that agent j can get information from agent i. $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$ denotes the topology of the weighted graph, where $a_{ij}$ is the weight: $a_{ij} = 1$ if $(v_j, v_i) \in \mathcal{E}$; otherwise, $a_{ij} = 0$. In addition, $a_{ii} = 0$, $1 \le i \le N$. $d_i = \sum_{j=1}^{N} a_{ij}$ is the in-degree of agent i. $\mathcal{D} = \mathrm{diag}\{d_1, \ldots, d_N\}$ is the in-degree matrix. $\mathcal{L} = \mathcal{D} - \mathcal{A}$ is the Laplacian matrix of the graph. $\mathcal{N}_i$ denotes the set of all neighbors of the ith agent, where an agent $v_j$ is said to be a neighbor of agent $v_i$ if $v_i$ can get information from $v_j$. An agent does not belong to its own neighborhood.
$\varepsilon_j$ denotes the access of the jth agent to the desired trajectory, that is, $\varepsilon_j = 1$ if agent $v_j$ has direct access to the full information of the desired trajectory; otherwise, $\varepsilon_j = 0$. $\|x\|$ denotes the Euclidean norm of a vector x.

2 PROBLEM FORMULATION

Consider a MAS consisting of N (N > 2) agents, each of which is described by the following first-order SISO nonlinear system:
$$\dot{x}_{j,k} = \theta(t)^{T}\xi_j(x_{j,k}, t) + b_j(t)u_{j,k}, \qquad (1)$$
where j is the agent index and k is the iteration index. $\theta(t)^{T}\xi_j(x_{j,k},t)$ is a parametric uncertainty, where $\theta(t)$ is an unknown time-varying parameter, whereas $\xi_j(x_{j,k},t)$ is a known time-varying function. $b_j(t) \triangleq b_j(x_{j,k},t)$ is the unknown time-varying control gain. In the following, we denote $\xi_{j,k} \triangleq \xi_j(x_{j,k},t)$ and $b_j \triangleq b_j(x_{j,k},t)$ where no confusion arises. Let the desired trajectory (virtual leader) be $x_r$, which satisfies
$$\dot{x}_r = \theta(t)^{T}\xi(x_r,t) + b_r u_r, \qquad (2)$$
and similarly, $\xi_r \triangleq \xi(x_r,t)$, $b_r \triangleq b_r(x_r,t)$. The following assumptions are required for Equations 1 and 2.

Assumption 1. Assume that the control gain $b_j$ does not change its sign and has lower and upper bounds. Without loss of generality, we assume that $0 < b_{\min} \le b_j \le b_{\max}$, where $b_{\min}$ is assumed to be known.

Assumption 2. Each agent satisfies the alignment condition, that is, $x_{j,k}(0) = x_{j,k-1}(T)$. In addition, the desired trajectory is spatially closed, that is, $x_r(0) = x_r(T)$.

Remark 1. In the ILC literature, reinitialization is one of the fundamental issues and has been studied in many publications. The most common requirement is the so-called identical initialization condition, which means $x_{j,k}(0) = x_r(0)$ for all agents and all iterations. This condition has also been used in many previous papers on the learning coordination problem, such as the work of Ahn and Chen. 32 However, this condition is somewhat strict because it is hard to satisfy for many practical systems. On the other hand, it is widely observed in industry that many motion systems start from the position where they stopped in the previous iteration. For example, consider an industrial manipulator performing a pick-and-place task repetitively: the starting position is the final position of the previous task execution. Therefore, we adopt the so-called alignment condition defined in Assumption A2, which is a relaxed condition compared with the classic identical initialization condition.

Denote the tracking error of the jth agent with respect to the desired trajectory as $e_{j,k} \triangleq x_{j,k} - x_r$. Notice that not all agents can get information from the virtual leader; thus, the tracking error $e_{j,k}$ is only available for the subset of agents that have the virtual leader within their neighborhoods. Meanwhile, all agents can acknowledge the information of their neighbors within a specified distance. Therefore, for the jth agent, denote its extended observation error as follows:
$$z_{j,k} = \varepsilon_j(x_{j,k} - x_r) + \sum_{l \in \mathcal{N}_j}(x_{j,k} - x_{l,k}). \qquad (3)$$
Note that the above definition can also be formulated as
$$z_{j,k} = \varepsilon_j(x_{j,k} - x_r) + \sum_{l=1}^{N} a_{jl}(x_{j,k} - x_{l,k}). \qquad (4)$$
The control objective is to design a set of distributed controllers such that all agents perfectly track the desired trajectory in the presence of parametric uncertainties, ie,
$$\lim_{k\to\infty} e_{j,k} = 0, \quad j = 1,\ldots,N, \qquad (5)$$
and to ensure the prior given boundedness of the states of all agents for all iterations. To obtain a compact form of the MAS, denote by $\bar{e}_k$, $\bar{x}_k$, and $\bar{z}_k$ the stacks of the tracking errors, states, and extended observation errors of all agents, that is,
$$\bar{e}_k = [e_{1,k},\ldots,e_{N,k}]^T, \quad \bar{x}_k = [x_{1,k},\ldots,x_{N,k}]^T, \quad \bar{z}_k = [z_{1,k},\ldots,z_{N,k}]^T.$$
Noting the fact that $\mathcal{L}\mathbf{1} = 0$ and using the definition of $e_{j,k}$, the relationship between the extended observation error and the tracking error is given as follows:
$$\bar{z}_k = \mathcal{L}\bar{x}_k + \mathcal{E}\bar{e}_k = \mathcal{L}(\bar{x}_k - \mathbf{1}x_r) + \mathcal{E}\bar{e}_k = (\mathcal{L} + \mathcal{E})\bar{e}_k, \qquad (6)$$
where $\mathcal{E} = \mathrm{diag}\{\varepsilon_1,\ldots,\varepsilon_N\}$ and $\mathbf{1} = [1,\ldots,1]^T \in \mathbb{R}^N$. Let $\mathcal{S} = \mathcal{L} + \mathcal{E}$. The following assumption is given on the connectivity of the agents.

Assumption 3. The undirected graph $\mathcal{G}$ is connected.

Remark 2. Assumption A3 implies that the virtual leader is actually reachable from each agent, no matter whether the virtual leader lies in its neighborhood or not. Here, by reachable we mean there is a path from the virtual leader to the agent, possibly passing through several other agents. This assumption is a necessary requirement for the leader-follower consensus tracking problem. If there is an isolated agent, it is hard to ensure that the agent tracks the leader's trajectory, since no information of the virtual leader can be obtained by the isolated agent.
Moreover, although in this paper we assume an undirected graph to concentrate our discussions, the results can be extended to the directed-graph case with little effort.
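To make the notation concrete, the following minimal NumPy sketch (the topology, weights, and leader-access pattern below are illustrative assumptions, not the simulation setup of Section 4) builds $\mathcal{D}$, $\mathcal{L}$, $\mathcal{E}$, and $\mathcal{S} = \mathcal{L} + \mathcal{E}$ and verifies relation (6):

```python
import numpy as np

# Hypothetical undirected topology on 4 agents (illustrative only).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # adjacency matrix [a_jl]
eps = np.array([1, 1, 0, 0], dtype=float)    # eps_j = 1 iff agent j sees the leader

D = np.diag(A.sum(axis=1))   # in-degree matrix D = diag{d_1, ..., d_N}
L = D - A                    # graph Laplacian L = D - A
E = np.diag(eps)             # leader-access matrix
S = L + E                    # S = L + E

# Verify relation (6): z = L x + E e = (L + E) e, using L 1 = 0.
rng = np.random.default_rng(0)
x, x_r = rng.standard_normal(4), 0.3
e = x - x_r                  # tracking errors e_j = x_j - x_r
z = L @ x + E @ e            # stacked extended observation errors, Equation 4
assert np.allclose(z, S @ e)
print(np.linalg.eigvalsh(S))  # all eigenvalues positive when G is connected
```

Then, we have the positive definiteness property of $\mathcal{S}$ from the following lemma.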

Lemma 1. (Hong et al 4 ) If the undirected graph $\mathcal{G}$ is connected, and M is a nonnegative diagonal matrix with at least one positive diagonal entry, then $\mathcal{L} + M$ is symmetric positive definite, where $\mathcal{L}$ is the Laplacian matrix of $\mathcal{G}$.

Under A3, it is seen from Lemma 1 that $\mathcal{S} = \mathcal{L} + \mathcal{E}$ is positive definite. Denote its minimum and maximum singular values by $\sigma_{\min}(\mathcal{S})$ and $\sigma_{\max}(\mathcal{S})$. To ensure the state boundedness, we need a general class of BLFs satisfying the following definition.

Definition 1. We call a BLF $V(t) = V(\gamma^2(t), k_b)$ a γ-type BLF if all the following conditions hold:
- $V \to \infty$ if and only if $\gamma^2 \to k_b^2$, where $k_b$ is a certain fixed parameter in V, provided that $\gamma^2(0) < k_b^2$;
- $V \to \infty$ if and only if $\frac{\partial V}{\partial \gamma^2} \to \infty$;
- if $\gamma^2 < k_b^2$, then $V \le C\,\frac{\partial V}{\partial \gamma^2}\,\gamma^2$, where $C > 0$ is a constant;
- $\lim_{k_b\to\infty} V(\gamma^2(t), k_b) = \frac{1}{2}\gamma^2(t)$.

Remark 3. The first item of the definition ensures the boundedness of $\gamma^2$ as long as the BLF is finite, so it is fundamental. The second item is given to show the boundedness of the BLF by making use of $\frac{\partial V}{\partial \gamma^2}$ in the controller design. The third item offers flexibility of the BLF, as can be seen in the following proofs of our main results. From the last item, the defined γ-type BLF can be regarded as a generalization of the classic quadratic Lyapunov function in the sense that they are mathematically equivalent as $k_b \to \infty$.

Remark 4. Two typical examples of the so-called γ-type BLF are given as follows. The first one is of log type,
$$V(t) = \frac{k_b^2}{2}\log\frac{k_b^2}{k_b^2 - \gamma^2(t)}, \qquad (7)$$
and the other is of tan type,
$$V(t) = \frac{k_b^2}{\pi}\tan\left(\frac{\pi\gamma^2(t)}{2k_b^2}\right). \qquad (8)$$
By direct calculation, one can verify that all the items of the γ-type BLF definition are satisfied.

In the following, to simplify the notation, the time and state dependence of variables will be omitted whenever no confusion arises.

3 MAIN RESULTS

3.1 Original algorithms

Before proposing the distributed learning algorithms, we first give some auxiliary functions to deal with the state constraints. Let
$$\gamma_{j,k} = z_{j,k} = \varepsilon_j(x_{j,k} - x_r) + \sum_{l=1}^{N} a_{jl}(x_{j,k} - x_{l,k}) \qquad (9)$$
with its stabilizing function
$$\sigma_{j,k} = \left(\varepsilon_j\dot{x}_r + \sum_{l=1}^{N} a_{jl}\dot{x}_{l,k}\right) - \lambda_{j,k}^{-1}\mu_j\gamma_{j,k}, \qquad (10)$$
where
$$\lambda_{j,k} \equiv \lambda_{j,k}(t) = \frac{1}{\gamma_{j,k}}\frac{\partial V_{j,k}}{\partial \gamma_{j,k}}, \qquad V_{j,k} = V\left(\gamma_{j,k}^2, k_{b_j}\right), \qquad (11)$$
and V here is the γ-type BLF. $k_{b_j} > 0$ is the constraint level for $\gamma_{j,k}$, $\forall k$, and $\mu_j$ is a positive constant to be designed later. Under the definition of the γ-type BLF, we immediately get $\lim_{k_{b_j}\to\infty}\lambda_{j,k} = 1$ since $\lim_{k_b\to\infty} V(\gamma^2, k_b) = \frac{1}{2}\gamma^2$.
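As a quick check of these definitions (using nothing beyond Equations 7 and 11; this anticipates Remark 5 below), the log-type BLF gives

```latex
% For the log-type BLF (7), V = (k_b^2/2) \log\frac{k_b^2}{k_b^2-\gamma^2}:
\frac{\partial V}{\partial \gamma}
  = \frac{k_b^2}{2}\cdot\frac{2\gamma}{k_b^2-\gamma^2}
  = \frac{k_b^2\,\gamma}{k_b^2-\gamma^2},
\qquad
\lambda = \frac{1}{\gamma}\frac{\partial V}{\partial \gamma}
        = \frac{k_b^2}{k_b^2-\gamma^2},
% and as the constraint level recedes, the quadratic case is recovered:
\lim_{k_b\to\infty} \lambda = 1,
\qquad
\lim_{k_b\to\infty} V
  = \lim_{k_b\to\infty} \frac{k_b^2}{2}\,\frac{\gamma^2}{k_b^2}
  = \frac{\gamma^2}{2}.
```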

The distributed control law is designed as follows:
$$(\mathcal{A}_1)\quad u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\mathrm{sgn}\left(\lambda_{j,k}\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\,\mathrm{sgn}(\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}) \qquad (12)$$
with iterative updating laws
$$\hat{u}_{j,k} = \hat{u}_{j,k-1} - q_j\lambda_{j,k}\gamma_{j,k} \qquad (13)$$
$$\hat{\theta}_{j,k} = \hat{\theta}_{j,k-1} + p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}, \qquad (14)$$
where $q_j > 0$ and $p_j > 0$ are design parameters, $j = 1,\ldots,N$. The notation $\mathrm{sgn}(\cdot)$ denotes the sign function, defined as
$$\mathrm{sgn}(\chi) = \begin{cases} +1, & \chi > 0 \\ 0, & \chi = 0 \\ -1, & \chi < 0. \end{cases} \qquad (15)$$
The initial values of the iterative updating laws are set to zero, ie, $\hat{u}_{j,0} = 0$, $\hat{\theta}_{j,0} = 0$, $j = 1,\ldots,N$.

Remark 5. Consider the log-type BLF defined in Equation 7 and notice that
$$\lambda_{j,k} = \frac{k_{b_j}^2}{k_{b_j}^2 - \gamma_{j,k}^2}.$$
Then, controller $(\mathcal{A}_1)$ and its iterative updating laws become
$$u_{j,k} = \hat{u}_{j,k} - \mathrm{sgn}\!\left(\frac{k_{b_j}^2}{k_{b_j}^2 - \gamma_{j,k}^2}\right)\left[\frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\mathrm{sgn}\left(\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}\right) + \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\mathrm{sgn}(\gamma_{j,k}\sigma_{j,k})\right] \qquad (16)$$
$$\hat{u}_{j,k} = \hat{u}_{j,k-1} - \frac{q_j k_{b_j}^2\gamma_{j,k}}{k_{b_j}^2 - \gamma_{j,k}^2} \qquad (17)$$
$$\hat{\theta}_{j,k} = \hat{\theta}_{j,k-1} + \frac{p_j k_{b_j}^2\gamma_{j,k}\xi_{j,k}}{k_{b_j}^2 - \gamma_{j,k}^2}. \qquad (18)$$
The term $\mathrm{sgn}\!\left(\frac{k_{b_j}^2}{k_{b_j}^2 - \gamma_{j,k}^2}\right)$ can be further removed from the controller, as it always equals 1; this follows from the technical analysis below, in which the boundedness of the BLF is guaranteed and so is the inequality $\gamma_{j,k}^2 < k_{b_j}^2$. On the other hand, if the tan-type BLF (8) is selected, one finds
$$\lambda_{j,k} = \frac{1}{\cos^2\!\left(\frac{\pi\gamma_{j,k}^2}{2k_{b_j}^2}\right)}.$$
Then, controller $(\mathcal{A}_1)$ and its updating laws become
$$u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\mathrm{sgn}\left(\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\mathrm{sgn}(\gamma_{j,k}\sigma_{j,k}) \qquad (19)$$
$$\hat{u}_{j,k} = \hat{u}_{j,k-1} - \frac{q_j\gamma_{j,k}}{\cos^2\!\left(\frac{\pi\gamma_{j,k}^2}{2k_{b_j}^2}\right)} \qquad (20)$$
$$\hat{\theta}_{j,k} = \hat{\theta}_{j,k-1} + \frac{p_j\gamma_{j,k}\xi_{j,k}}{\cos^2\!\left(\frac{\pi\gamma_{j,k}^2}{2k_{b_j}^2}\right)}, \qquad (21)$$
since $\mathrm{sgn}(\lambda_{j,k}) = 1$ always holds. It is interesting to see that the differences between (16)-(18) and (19)-(21) lie only in the iterative updating laws.

Now, we have the following theorem on the consensus results for agents (1) under state constraints.

Theorem 1. Assume that Assumptions A1 to A3 hold for the multiagent system (1). The closed-loop system consisting of agent (1) and the control update algorithms (12) to (14) can ensure the following: (i) the tracking error $e_{j,k}(t)$ converges to zero uniformly as the iteration number goes to infinity, $j = 1,\ldots,N$; (ii) the system state $x_{j,k}$ is bounded by the predefined constraint, that is, $|x_{j,k}| < k_s$ will always be guaranteed for all iterations and all agents j, provided that $|\gamma_{j,1}| < k_{b_j}$ over [0, T] for $j = 1,\ldots,N$.

The proof is put in the Appendix.
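To make the update structure concrete, here is a minimal per-time-instant sketch of scheme $(\mathcal{A}_1)$ in Python; the function names are ours, the log-type λ of Remark 5 is used, and all quantities are assumed to be scalars or NumPy arrays evaluated pointwise in t:

```python
import numpy as np

def lam_log(gamma, kb):
    """Lambda for the log-type BLF (7): kb^2 / (kb^2 - gamma^2), valid while gamma^2 < kb^2."""
    return kb**2 / (kb**2 - gamma**2)

def update_laws_A1(u_hat_prev, theta_hat_prev, xi, gamma, lam, q_j, p_j):
    """Iterative updating laws (13)-(14): correct the previous iteration's profiles."""
    u_hat = u_hat_prev - q_j * lam * gamma
    theta_hat = theta_hat_prev + p_j * lam * gamma * xi
    return u_hat, theta_hat

def control_A1(u_hat, theta_hat, xi, gamma, sigma, lam, b_min, eps_j, d_j):
    """Controller (12): learned feedforward plus two sign-type robust terms."""
    s1 = abs(theta_hat @ xi) * np.sign(lam * gamma * (theta_hat @ xi))
    s2 = abs(sigma) * np.sign(lam * gamma * sigma) / (eps_j + d_j)
    return u_hat - (s1 + s2) / b_min
```

Here `u_hat_prev` and `theta_hat_prev` denote the stored profiles of the previous iteration evaluated at the same time instant; per (12) to (14), the current-iteration estimates are computed first and then fed to the controller.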

Remark 6. Let us take the error $e_{j,k}$, for example, with the predefined bound $|x_{j,k}| < k_s$. Since $|x_r| < k_s$, we can select an upper bound $k_{b_j} \le k_s - |x_r|$ for $e_{j,k}$ to guarantee the boundedness requirement. That is, if we can ensure $|e_{j,k}| < k_{b_j} \le k_s - |x_r|$ for all iterations, then evidently $|x_{j,k}| < k_{b_j} + |x_r| \le (k_s - |x_r|) + |x_r| = k_s$ for all iterations. Furthermore, from the relationship (6) between the extended observation error and the tracking error, the upper bound of $e_{j,k}$ can be derived by imposing a boundedness condition on $z_{j,k}$, as can be seen from the following analysis.

Remark 7. As can be seen from the algorithms (12) to (14), if the state approaches the given constraint bound, ie, $|x_{j,k}| \to k_s$, then $\lambda_{j,k} \to \infty$ from the definition of the BLF. In other words, controller (12) will be dominated by the pure ILC part $\hat{u}_{j,k}$ as the iteration number increases. This reveals the inherent effect of the learning process in the proposed algorithms.

Remark 8. In the dynamics of each agent (1), no lumped uncertainty is taken into account, to keep our discussions concentrated. 23 In fact, if there exists a lumped uncertainty, we can append a robust term to the controller so that the lumped uncertainty is compensated. Specifically, if a lumped uncertainty exists, agent (1) is reformulated as $\dot{x}_{j,k} = \theta^T\xi_j(x_{j,k},t) + b_j(t)u_{j,k} + \alpha_j(x_{j,k},t)$. If we assume that the lumped uncertainty is bounded by a known function $\bar\alpha(x_{j,k},t)$, ie, $|\alpha_j(x_{j,k},t)| \le \bar\alpha(x_{j,k},t)$, then an additional term can be added to controller (12), similar to the $\sigma_{j,k}$ term. Moreover, if the lumped uncertainty is bounded with an unknown coefficient ω, ie, $|\alpha_j(x_{j,k},t)| \le \omega\bar\alpha(x_{j,k},t)$, then another estimation process can be established, and the robust compensation term is appended to the controller based on the estimated parameter. Interested readers may refer to the work of Li et al 51 for further details. The following steps remain valid. To save space, the details are omitted.

3.2 Projection-based algorithms

In the last subsection, we proposed the original control scheme, where no projection is imposed on the updating laws. Consequently, we have to show the boundedness of the updating laws (13) and (14). However, it is only guaranteed that $\hat{u}_{j,k}$ and $\hat{\theta}_{j,k}$ are bounded; the possible maxima are unknown. In practical applications, the maxima may exceed the limitations of physical devices. In this subsection, we give a projection-based scheme as an alternative choice for engineering requirements. The controller is designed as follows:
$$(\mathcal{A}_2)\quad u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\mathrm{sgn}\left(\lambda_{j,k}\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\mathrm{sgn}(\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}) \qquad (22)$$
with projection-based iterative updating laws
$$\hat{u}_{j,k} = \mathcal{P}_u(\hat{u}_{j,k-1}) - q_j\lambda_{j,k}\gamma_{j,k} \qquad (23)$$
$$\hat{\theta}_{j,k} = \mathcal{P}_\theta(\hat{\theta}_{j,k-1}) + p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}, \qquad (24)$$
where $\mathcal{P}_u$ and $\mathcal{P}_\theta$ indicate projections, defined as follows:
$$\mathcal{P}_u[\hat{u}] = \begin{cases} \hat{u}, & |\hat{u}| \le \bar{u} \\ \mathrm{sgn}(\hat{u})\,\bar{u}, & |\hat{u}| > \bar{u} \end{cases} \qquad (25)$$
$$\mathcal{P}_\theta[\hat{\theta}] = \left[\mathcal{P}_\theta[\hat{\theta}_1], \ldots, \mathcal{P}_\theta[\hat{\theta}_l]\right]^T \qquad (26)$$
$$\mathcal{P}_\theta[\hat{\theta}_m] = \begin{cases} \hat{\theta}_m, & |\hat{\theta}_m| \le \bar{\theta}_m \\ \mathrm{sgn}(\hat{\theta}_m)\,\bar{\theta}_m, & |\hat{\theta}_m| > \bar{\theta}_m \end{cases}, \quad m = 1,\ldots,l, \qquad (27)$$
with $\sup |u_r| \le \bar{u}$ and $\sup |\theta_m| \le \bar{\theta}_m$, $m = 1,\ldots,l$. The initial values of the iterative updating laws are set to zero, ie, $\hat{u}_{j,0} = 0$, $\hat{\theta}_{j,0} = 0$, $j = 1,\ldots,N$. $p_j > 0$ and $q_j > 0$ are design parameters.

Remark 9. The projection is added to the iterative learning laws (23) and (24) to guarantee boundedness, where $\bar{u}$ and $\bar{\theta}_m$ are predefined upper bounds. In practical applications, these upper bounds can be obtained or estimated from the specific environment.
An intuitive case is that the control signals of a real control system are bounded by certain hardware limits; thus, such limits can be used as the upper bounds. When the limits cannot be directly obtained, we can conduct preliminary tests to estimate them. With the help of the projection mechanism, the convergence analysis becomes much easier, as can be seen in the proof.
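A compact sketch of the projections (25) to (27) and of the corresponding updating laws (23) and (24) follows; the function names are ours, and `np.clip` realizes exactly the saturation-type projection:

```python
import numpy as np

def proj_u(u_hat, u_bar):
    """Scalar projection (25): clip the learned input onto [-u_bar, u_bar]."""
    return np.clip(u_hat, -u_bar, u_bar)

def proj_theta(theta_hat, theta_bar):
    """Componentwise projection (26)-(27); theta_bar may be a vector of bounds."""
    return np.clip(theta_hat, -theta_bar, theta_bar)

def update_laws_A2(u_hat_prev, theta_hat_prev, xi, gamma, lam,
                   q_j, p_j, u_bar, theta_bar):
    """Projection-based updating laws (23)-(24): project the previous
    iterate first, then apply the same correction as in (13)-(14)."""
    u_hat = proj_u(u_hat_prev, u_bar) - q_j * lam * gamma
    theta_hat = proj_theta(theta_hat_prev, theta_bar) + p_j * lam * gamma * xi
    return u_hat, theta_hat
```

Now, we can formulate the following theorem for control scheme $(\mathcal{A}_2)$.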

Theorem 2. Assume that Assumptions A1 to A3 hold for the multiagent system (1). The closed-loop system consisting of agent (1) and the control update algorithms (22) to (24) can ensure the following: (i) the tracking error $e_{j,k}(t)$ converges to zero uniformly as the iteration number goes to infinity, $j = 1,\ldots,N$; (ii) the system state $x_{j,k}$ is bounded by the predefined constraint, that is, $|x_{j,k}| < k_s$ will always be guaranteed for all iterations and all agents j, provided that $|\gamma_{j,1}| < k_{b_j}$ over [0, T] for $j = 1,\ldots,N$.

The proof can be found in the Appendix.

3.3 Smooth function-based algorithms

Generally, the sign function used in control schemes $(\mathcal{A}_1)$ and $(\mathcal{A}_2)$ makes them discontinuous control schemes, which further raises the problem of existence and uniqueness of solutions. Furthermore, it may also cause chattering that might excite high-frequency unmodeled dynamics. This motivates us to seek an appropriate smooth approximation of the sign function used in controllers (12) and (22). In this paper, we take the hyperbolic tangent function to approximate the sign function. The following lemma demonstrates the relevant property of the hyperbolic tangent function.

Lemma 2. (Polycarpou and Ioannou 52 ) The following inequality holds for any $\epsilon > 0$ and any $\rho \in \mathbb{R}$:
$$0 \le |\rho| - \rho\tanh\left(\frac{\rho}{\epsilon}\right) \le \delta\epsilon, \qquad (28)$$
where δ is the constant satisfying $\delta = e^{-(\delta+1)}$, ie, $\delta \approx 0.2785$.

Now, we can propose the third control scheme as follows:
$$(\mathcal{A}_3)\quad u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}}{\eta_k}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta_k}\right) \qquad (29)$$
with iterative updating laws
$$\hat{u}_{j,k} = \hat{u}_{j,k-1} - q_j\lambda_{j,k}\gamma_{j,k} \qquad (30)$$
$$\hat{\theta}_{j,k} = \hat{\theta}_{j,k-1} + p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}, \qquad (31)$$
where $q_j > 0$ and $p_j > 0$ are design parameters, $j = 1,\ldots,N$. In addition, the parameter used in Equation 29 is given as $\eta_k = \frac{1}{k^\nu}$ with $\nu \ge 2$; then, obviously, $\sum_{k=1}^{\infty}\eta_k < 2$. The initial values of the iterative updating laws are set to zero, ie, $\hat{u}_{j,0} = 0$, $\hat{\theta}_{j,0} = 0$, $j = 1,\ldots,N$.

Remark 10. The parameter $\eta_k$ used in the hyperbolic tangent function can be predefined before the algorithms are applied. This parameter decides the compensation error of the last term on the right-hand side (RHS) of Equation 29. Here, a fast-decreasing sequence $\{\eta_k\}$ is selected to obtain uniform zero-error consensus tracking. However, as $\eta_k$ converges to zero, the plot of the hyperbolic tangent function approaches the sign function asymptotically; in other words, the hyperbolic tangent function almost coincides with the sign function after enough iterations. We will relax this condition in the next subsection.

The convergence theorem for the smooth function-based algorithms is given as follows.

Theorem 3. Assume that Assumptions A1 to A3 hold for the multiagent system (1). The closed-loop system consisting of agent (1) and the control update algorithms (29) to (31) can ensure the following: (i) the tracking error $e_{j,k}(t)$ converges to zero uniformly as the iteration number goes to infinity, $j = 1,\ldots,N$; (ii) the system state $x_{j,k}$ is bounded by the predefined constraint, that is, $|x_{j,k}| < k_s$ will always be guaranteed for all iterations and all agents j, provided that $|\gamma_{j,1}| < k_{b_j}$ over [0, T] for $j = 1,\ldots,N$.

The proof is put in the Appendix.

3.4 Alternative smooth function-based algorithms

As explained in Remark 10, the fast-convergent $\eta_k$ makes the smooth hyperbolic tangent function close to the sign function, which is discontinuous. Thus, it may not be quite favorable to engineers.
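Lemma 2 can also be checked numerically; the following short script (the grid, tolerances, and the rounded constant δ ≈ 0.2785 are ours) verifies the two-sided bound (28):

```python
import numpy as np

delta = 0.2785  # approximately solves delta = exp(-(delta + 1))
for eps in (1.0, 0.1, 0.01):
    rho = np.linspace(-5.0, 5.0, 20001)
    gap = np.abs(rho) - rho * np.tanh(rho / eps)   # quantity bounded in (28)
    assert gap.min() >= -1e-12 and gap.max() <= delta * eps + 1e-12
    print(eps, gap.max(), delta * eps)             # gap.max() approaches delta*eps
```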

We are interested in whether a fixed parameter η, possibly small, is able to ensure convergence, and, if so, what the resulting performance is. The answers to these questions are given in this subsection. Controller (29) is modified by replacing the iteration-varying parameter $\eta_k$ with a fixed one, η. However, in this case, there is a constant compensation error in the difference and differential expressions of $V^1_{j,k}(t)$. Hence, it is hard to show that the difference of $E_k(T)$ becomes negative even after sufficiently many iterations. As a result, only bounded convergence, rather than zero-error convergence, can be obtained in this case. The details are discussed in the following. Specifically, the algorithms are formulated as follows:
$$(\mathcal{A}_4)\quad u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}}{\eta}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta}\right) \qquad (32)$$
with iterative updating laws
$$\hat{u}_{j,k} = \hat{u}_{j,k-1} - q_j\lambda_{j,k}\gamma_{j,k} - \phi\hat{u}_{j,k} \qquad (33)$$
$$\hat{\theta}_{j,k} = \hat{\theta}_{j,k-1} + p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k} - \varphi\hat{\theta}_{j,k}, \qquad (34)$$
where $q_j > 0$, $p_j > 0$, $j = 1,\ldots,N$, $\phi > 0$, and $\varphi > 0$ are design parameters. The initial values of the iterative updating laws are set to zero, ie, $\hat{u}_{j,0} = 0$, $\hat{\theta}_{j,0} = 0$, $j = 1,\ldots,N$.

Remark 11. Compared with the iterative updating laws given in control schemes $(\mathcal{A}_1)$, $(\mathcal{A}_2)$, and $(\mathcal{A}_3)$, two additional terms are appended to Equations 33 and 34. These terms, ie, $\phi\hat{u}_{j,k}$ and $\varphi\hat{\theta}_{j,k}$, are leakage terms introduced to increase robustness and ensure the boundedness of the learning algorithms.

The convergence property is summarized in the following theorem.

Theorem 4. Assume that Assumptions A1 to A3 hold for the multiagent system (1). The closed-loop system consisting of agent (1) and the control update algorithms (32) to (34) can ensure the following: (i) the tracking error $\bar{e}_k(t)$ converges to the ζ-neighborhood of zero asymptotically in the sense of the $L^2_T$-norm within finitely many iterations, where
$$\zeta \triangleq \frac{2TN\delta\eta}{\mu_m\sigma^2_{\min}(\mathcal{S})} + \frac{(N+1)NT\varphi\,\theta^T\theta}{2p_m\mu_m\sigma^2_{\min}(\mathcal{S})} + \frac{\epsilon}{\sigma^2_{\min}(\mathcal{S})}, \qquad (35)$$
with T, N, φ, $\mu_m$, $p_m$, and $\sigma_{\min}(\mathcal{S})$ being the iteration length, the number of agents, the leakage gain, the minimum value of the parameter $\mu_j$ in the stabilizing function, the minimum learning gain in the parameter updating law, and the minimum singular value of $\mathcal{S}$, respectively, and $\epsilon > 0$ an arbitrarily small constant; (ii) the iterative updating parameters $\hat{u}_{j,k}$ and $\hat{\theta}_{j,k}$ are bounded in the sense of the $L^2_T$-norm, $j = 1,\ldots,N$, $\forall k$.

The proof can be found in the Appendix.

Remark 12. From the expression of ζ in Equation 35, any prespecified nonzero bound on the tracking error can be obtained by tuning the design parameters η, φ, ϵ, $p_j$, and $\mu_j$. More clearly, the magnitude of ζ is proportional to η, φ, and ϵ, where η denotes the compensation error due to the smooth hyperbolic tangent function, φ is the leakage gain in the parameter updating, and ϵ can be predefined and sufficiently small.

Remark 13. As can be seen from the proof, the leakage terms introduced in the iterative updating laws do not affect the asymptotic convergence to a neighborhood of zero. In other words, the leakage terms could be removed from Equations 33 and 34, and the bounded convergence of the tracking errors would still be established.
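The leakage terms in Equations 33 and 34 involve the current iterates $\hat{u}_{j,k}$ and $\hat{\theta}_{j,k}$; reading the equations implicitly and solving for the current iterate gives a forgetting-factor form. The following sketch adopts this implicit reading, which is our assumption rather than a construction stated in the paper:

```python
def update_laws_A4(u_hat_prev, theta_hat_prev, xi, gamma, lam,
                   q_j, p_j, phi, varphi):
    """Leakage-type updating laws (33)-(34): solving
    u_hat = u_hat_prev - q*lam*gamma - phi*u_hat for u_hat yields a
    1/(1 + phi) forgetting factor on the usual ILC correction."""
    u_hat = (u_hat_prev - q_j * lam * gamma) / (1.0 + phi)
    theta_hat = (theta_hat_prev + p_j * lam * gamma * xi) / (1.0 + varphi)
    return u_hat, theta_hat
```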
3.5 Practical dead-zone-like algorithms

Several facts can be observed from the last subsection. First, it is hard to show satisfaction of the state constraint, since the $L^2_T$-norm of the state is only proved to converge into some neighborhood of zero. In practice, we may achieve better learning performance, such as pointwise or uniform convergence, so that the state constraint is fulfilled; in theory, however, only the $L^2_T$ convergence is guaranteed. As a result, it is impossible to analyze the specific behavior of the state in the time domain. Moreover, the leakage terms used in Equations 33 and 34 aim only to ensure the boundedness of the learning process, as they are a kind of forgetting-factor mechanism. Furthermore, it is easy to see from Equation A50 that the BLF is bounded before the tracking error enters the given neighborhood, and thus the state satisfies the constraints in those iterations.

Based on these observations, we further propose the following practical dead-zone-like algorithms to ensure the state constraints:
$$(\mathcal{A}_5)\quad u_{j,k} = \hat{u}_{j,k} - \frac{1}{b_{\min}}\left|\hat{\theta}_{j,k}^T\xi_{j,k}\right|\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\hat{\theta}_{j,k}^T\xi_{j,k}}{\eta}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\tanh\!\left(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta}\right) \qquad (36)$$
with iterative updating laws
$$\hat{u}_{j,k} = \begin{cases} \hat{u}_{j,k-1} - q_j\lambda_{j,k}\gamma_{j,k}, & \int_0^T z_{j,k-1}^2\,d\tau > \varsigma \\ \hat{u}_{j,k-1}, & \text{otherwise} \end{cases} \qquad (37)$$
$$\hat{\theta}_{j,k} = \begin{cases} \hat{\theta}_{j,k-1} + p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}, & \int_0^T z_{j,k-1}^2\,d\tau > \varsigma \\ \hat{\theta}_{j,k-1}, & \text{otherwise}, \end{cases} \qquad (38)$$
where $q_j > 0$, $p_j > 0$, $j = 1,\ldots,N$, are design parameters. The initial values of the iterative updating laws are set to zero, ie, $\hat{u}_{j,0} = 0$, $\hat{\theta}_{j,0} = 0$, $j = 1,\ldots,N$. The parameter ς denotes the threshold of the stopping mechanism, which is defined later.

Remark 14. The essential mechanism of the proposed algorithms is that the learning processes of $\hat{u}_{j,k}$ and $\hat{\theta}_{j,k}$ stop updating whenever the extended observation error enters the predefined neighborhood of zero, so that the control system repeats the same tracking performance. To make the stopping mechanism active, it should be realizable using the available information. In Equations 37 and 38, the stopping criteria are established based on the extended observation error defined in Equation 3, which is accessible to the local agent; thus, the integral over the iteration interval is realizable. The essential mechanism of Equations 37 and 38 is as follows: if the tracking performance is not acceptable (as determined by the stopping criteria), the updating processes are active; otherwise, the updating process is terminated and the input signal remains unchanged. Consequently, the boundedness of the learning algorithms (37) and (38) and the state constraint condition are fulfilled naturally, as long as the bounded convergence is achieved within finitely many iterations.

Then, we have the following result.

[Figure 1: Communication graph among the agents in the network.]
[Figure 2: Trajectories of the 4 agents at the first and 20th iterations, $(\mathcal{A}_1)$ case. A, First iteration; B, 20th iteration.]

Theorem 5. Assume that Assumptions A1 to A3 hold for the multiagent system (1). The closed-loop system consisting of agent (1) and the control update algorithms (36) to (38) can ensure the following: (i) the tracking error $\bar{e}_k(t)$ converges to the predefined ς-neighborhood of zero in the sense of the $L^2_T$-norm within finitely many iterations, where
$$\varsigma \le \frac{2TN\delta\eta}{\mu_m\sigma^2_{\min}(\mathcal{S})} + \frac{\epsilon}{\sigma^2_{\min}(\mathcal{S})}, \qquad (39)$$
with T, N, $\mu_m$, and $\sigma_{\min}(\mathcal{S})$ being the iteration length, the number of agents, the minimum value of the parameter $\mu_j$ in the stabilizing function, and the minimum singular value of $\mathcal{S}$, respectively, and $\epsilon > 0$ an arbitrarily small constant; (ii) the system state $x_{j,k}$ is bounded by the predefined constraint, that is, $|x_{j,k}| < k_s$ will always be guaranteed for all iterations and all agents j, provided that $|\gamma_{j,1}| < k_{b_j}$ over [0, T] for $j = 1,\ldots,N$; (iii) the iterative updating parameters $\hat{u}_{j,k}$ and $\hat{\theta}_{j,k}$ are bounded in the sense of the $L^2_T$-norm, $j = 1,\ldots,N$, $\forall k$.

The proof is put in the Appendix.

Remark 15. Here, we assume that the upper bound of the control gain, ie, $b_{\max}$, is also known so that δ can be calculated. Note that $\mathcal{S}$ is defined by the topology, and $\mu_m$ is a design parameter. Therefore, for any given ς, one can select η and ϵ sufficiently small and $\mu_m$ sufficiently large such that Equation 39 is achieved. For the learning algorithms, the upper bound on ς in Equation 39 is satisfied as long as we define ς as
$$\varsigma = \frac{1}{\sigma^2_{\min}(\mathcal{S})}\left(\frac{2TN\delta\eta}{\mu_m} + \epsilon\right).$$

Remark 16. We have now proposed five schemes for solving the learning consensus problem under state constraints. Here, we make some clarifications on the connections and applications of these schemes. The original scheme $(\mathcal{A}_1)$ lays a basic framework of the solution to the learning consensus problem, of which several points are further improved by schemes $(\mathcal{A}_2)$ to $(\mathcal{A}_5)$. Specifically, the boundedness of the estimation process is not clear enough, and the upper bound of the estimates may exceed physical limitations; thus, scheme $(\mathcal{A}_2)$ introduces a projection mechanism to naturally ensure boundedness. Moreover, the sign function is used in both $(\mathcal{A}_1)$ and $(\mathcal{A}_2)$, which may cause chattering of the input signal and is thus not applicable to many physical systems. This problem is solved in both $(\mathcal{A}_3)$ and $(\mathcal{A}_4)$, where the sign function is approximated by a hyperbolic tangent function. In $(\mathcal{A}_3)$, we use a vanishing sequence for the parameter in the hyperbolic tangent function, which may again lead to the chattering problem as the iteration number increases. Thus, in $(\mathcal{A}_4)$, a fixed parameter is used, and then only bounded convergence is guaranteed. The last scheme $(\mathcal{A}_5)$ introduces a simple dead-zone-like mechanism that stops the continuous updating process. For practical applications, schemes $(\mathcal{A}_4)$ and $(\mathcal{A}_5)$ are preferable, as they provide smooth input profiles. The sacrifice of these schemes is convergence precision, as they only ensure that the tracking error converges to a neighborhood of zero; however, the upper bound of the convergence neighborhood can be adjusted by selecting suitable parameters. When chattering-type input profiles are allowed, schemes $(\mathcal{A}_1)$ to $(\mathcal{A}_3)$ can be applied to achieve zero-error tracking performance.

[Figure 3: Maximum tracking error along the iteration axis, $(\mathcal{A}_1)$ case.]
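A minimal sketch of the dead-zone mechanism (37) and (38) follows; the discretization of the stopping integral by `np.trapz` and the function names are ours:

```python
import numpy as np

def update_laws_A5(u_hat_prev, theta_hat_prev, xi, gamma, lam,
                   q_j, p_j, z_prev, t_grid, varsigma):
    """Dead-zone-like updating laws (37)-(38): keep learning only while the
    squared extended observation error of the previous iteration, integrated
    over [0, T], still exceeds the threshold varsigma; otherwise freeze."""
    if np.trapz(z_prev**2, t_grid) > varsigma:
        u_hat = u_hat_prev - q_j * lam * gamma
        theta_hat = theta_hat_prev + p_j * lam * gamma * xi
        return u_hat, theta_hat
    return u_hat_prev, theta_hat_prev
```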

4 ILLUSTRATIVE SIMULATIONS

To illustrate the applications of the proposed algorithms, consider a group of four agents. The communication topology is demonstrated in Figure 1, where vertex 0 represents the desired reference (virtual leader) and the dashed lines stand for the communication links between the leader and the followers; in other words, agents 1 and 2 can access the information of the leader. The solid lines stand for the communication links among the four agents. The Laplacian matrix $\mathcal{L}$ of this connection graph follows from Figure 1, and the leader-access matrix is $\mathcal{E} = \mathrm{diag}\{1, 1, 0, 0\}$.

[Figure 4: Input profiles at the 1st, 10th, and 20th iterations for each agent, $(\mathcal{A}_1)$ case. A, Agent 1; B, Agent 2; C, Agent 3; D, Agent 4.]

In the simulations, the agent dynamics are modeled by
$$\dot{x}_{j,k} = \theta(t)\left(\sin(t)\cos(t) + 0.5\sin(x_{j,k})\right) + \left(\cdots + \sin(5t)\right)u_{j,k},$$
where $\theta(t) = 2\sin(t)$. The initial states of the four agents at the first iteration are set as 1, 0.5, 0.2, and 0.8, respectively; thus, the state trajectories of the four agents at the first iteration are different. The desired reference is given as
$$x_r(t) = \sin^3(t) + 0.2\sin(5t)\cos(10t).$$
The iteration length is set to [0, π]. Both the log-type and tan-type BLFs were simulated and the results are almost the same; thus, we only present the results for the log-type BLF (7) with $k_b = 1$. The simulations are run for 20 iterations for each control scheme. Note that control scheme $(\mathcal{A}_2)$ is a technical alternative of control scheme $(\mathcal{A}_1)$: the performance of $(\mathcal{A}_2)$ does not differ from that of $(\mathcal{A}_1)$ as long as we choose the projection bounds sufficiently large. On the other hand, control scheme $(\mathcal{A}_5)$ behaves similarly to $(\mathcal{A}_4)$, because the former is a practical alternative of the latter.

[Figure 5: Trajectories of the 4 agents at the first and 20th iterations, $(\mathcal{A}_3)$ case. A, First iteration; B, 20th iteration.]
[Figure 6: Maximum tracking error along the iteration axis, $(\mathcal{A}_3)$ case.]
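For intuition, the following single-agent reduction of this setup can be stepped with forward Euler. The simplifications and assumptions are ours: $\varepsilon_j = 1$ with no neighbors, so γ reduces to the tracking error; the elided leading constant of the input gain is replaced by an assumed $b(t) = 2 + \sin(5t)$; and $k_b = 2$ with moderate gains, chosen for numerical safety rather than to match the paper's settings:

```python
import numpy as np

T, dt = np.pi, 1e-3
t = np.arange(0.0, T + dt, dt)
x_r = np.sin(t)**3 + 0.2*np.sin(5*t)*np.cos(10*t)   # desired reference
xr_dot = np.gradient(x_r, dt)
theta = 2.0*np.sin(t)                               # theta(t) = 2 sin t
b = 2.0 + np.sin(5*t)                               # assumed control gain
b_min, kb, mu, q, p = 0.9, 2.0, 10.0, 5.0, 5.0      # illustrative parameters

u_hat = np.zeros_like(t); th_hat = np.zeros_like(t)
x_T = 0.2                                           # first-iteration initial state
for k in range(20):                                 # learning iterations
    x = np.empty_like(t); x[0] = x_T                # alignment condition (A2)
    for n in range(t.size - 1):
        xi = np.sin(t[n])*np.cos(t[n]) + 0.5*np.sin(x[n])
        gam = x[n] - x_r[n]                         # gamma = z with eps=1, d=0
        lam = kb**2 / (kb**2 - gam**2)              # log-type BLF, Remark 5
        u_hat[n] -= q*lam*gam                       # updating law (13)
        th_hat[n] += p*lam*gam*xi                   # updating law (14)
        sigma = xr_dot[n] - mu*gam/lam              # stabilizing function (10)
        u = (u_hat[n]
             - abs(th_hat[n]*xi)*np.sign(lam*gam*th_hat[n]*xi)/b_min
             - abs(sigma)*np.sign(lam*gam*sigma)/b_min)   # controller (12)
        x[n+1] = x[n] + dt*(theta[n]*xi + b[n]*u)   # forward Euler step
    x_T = x[-1]                                     # carried to next iteration
    print(k, np.max(np.abs(x - x_r)))               # max tracking error per iteration
```

The printed per-iteration maximum error should shrink over the first few iterations, qualitatively mirroring Figure 3; this is a sketch under the stated assumptions, not a reproduction of the reported simulations.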

Moreover, the ultimate convergence bound can be defined a priori by choosing suitable parameters. As a result, we mainly provide simulations for control schemes $(\mathcal{A}_1)$, $(\mathcal{A}_3)$, and $(\mathcal{A}_4)$ with no leakage terms. In addition, we notice from the main theorems that no specific conditions are imposed on the learning parameters. Generally, larger values of the learning parameters may lead to a faster convergence speed at a sacrifice in tracking performance; thus, we select moderate values of the parameters to ensure acceptable convergence for practical applications.

Case 1 (Control scheme $(\mathcal{A}_1)$). The parameters used in control scheme $(\mathcal{A}_1)$, ie, Equations 12 to 14 and 10, are set to $b_{\min} = 0.9$, $q_j = 5$, $p_j = 5$, and $\mu_j = 10$. The trajectories of all agents at the first and 20th iterations are shown in Figure 2, from which it can be seen that the trajectories at the first iteration do not match the desired reference, whereas those at the 20th iteration coincide with it. In addition, the state constraints are satisfied. Define the maximum error as $\max_t |x_{j,k} - x_r|$ for the jth agent at the kth iteration. The maximum error profiles of all agents along the iteration axis are shown in Figure 3. As one can see, the maximum tracking errors of all agents are reduced substantially during the first several iterations.

[Figure 7: Input profiles at the 1st, 10th, and 20th iterations for each agent, $(\mathcal{A}_3)$ case. A, Agent 1; B, Agent 2; C, Agent 3; D, Agent 4.]

[Figure 8: Trajectories of the 4 agents at the first and 20th iterations, $(\mathcal{A}_4)$ case. A, First iteration; B, 20th iteration.]
[Figure 9: Maximum tracking error along the iteration axis, $(\mathcal{A}_4)$ case.]

It has been remarked that the input profiles may be discontinuous due to the introduction of the sign function in the control scheme. This point is verified in Figure 4, whose four subfigures show that the ultimate input profiles exhibit a large amount of chattering.

Case 2 (Control scheme $(\mathcal{A}_3)$). The parameters used in control scheme $(\mathcal{A}_3)$ are set to $b_{\min} = 0.9$, $q_j = 5$, $p_j = 1$, and $\mu_j = 10$. The decreasing parameter in the tanh function is selected as $\eta_k = \frac{1}{k^3}$. In this case, the finite summation condition, ie, $\sum_{k=1}^{\infty}\eta_k < 2$, implies that $\eta_k$ converges to zero very fast. As a result, the tanh function approximates the sign function after a few iterations; in other words, the tracking performance under control scheme $(\mathcal{A}_3)$ is almost the same as under control scheme $(\mathcal{A}_1)$. This fact is shown in the following.

[Figure 10: Input profiles at the 1st, 10th, and 20th iterations for each agent, $(\mathcal{A}_4)$ case. A, Agent 1; B, Agent 2; C, Agent 3; D, Agent 4.]

The trajectories of all agents at the first and 20th iterations are given in Figure 5. The maximum error profiles along the iteration axis for all agents are displayed in Figure 6. As can be seen, the behaviors differ little from those of control scheme $(\mathcal{A}_1)$. Moreover, the input profiles of all agents at the 1st, 10th, and 20th iterations are provided in Figure 7. From the subfigures, one can find that the chattering phenomena are not eliminated completely, although some improvement is seen compared with Figure 4 for control scheme $(\mathcal{A}_1)$. The reason lies in the fast attenuation of the parameter $\eta_k$.

Case 3 (Control scheme $(\mathcal{A}_4)$). In this case, we consider control scheme $(\mathcal{A}_4)$, where the smoothing parameter used in the tanh function is set to η = 0.1. The other parameters are selected as $b_{\min} = 0.9$, $q_j = 15$, $p_j = 20$, and $\mu_j = 10$. The trajectories of all agents at the first and 20th iterations are shown in Figure 8. The tracking performance is also good under the smooth function. The maximum tracking error profiles of all agents along the iteration axis are displayed in Figure 9. It is evident that the error reduction of this scheme is as large as that of the previous schemes. Meanwhile, the plot shows a flat trend, implying that

the maximum error might not keep decreasing as the iteration number further increases. This observation confirms our theoretical analysis. In addition, the input profiles under control scheme $(\mathcal{A}_4)$ are presented in Figure 10. The discontinuity of the input profiles seen in the previous two schemes is avoided owing to the proposed smoothing technique. These observations are consistent with our analysis in the last section.

5 CONCLUSIONS

In this paper, the adaptive distributed ILC for nonlinear MAS with state constraints is taken into account. A novel BLF is introduced to ensure the bounded constraints while improving the tracking performance. Five control schemes are designed, in turn, to address the consensus problem comprehensively from both theoretical and practical viewpoints. An original ILC scheme consisting of an iterative learning part, an unknown-parameter learning part, and a robust compensating part is first investigated, where a sign function is involved to ensure zero-error convergence. Then, a projection-based updating version is considered as an alternative for practical applications. The third scheme aims to provide a smooth approximation of the sign function in expectation of improved performance, where the smoothing parameter should satisfy a finite summation requirement. Then, the decreasing smoothing parameter is fixed a priori in the fourth scheme, which is more practical for applications. Finally, a dead-zone-like scheme is studied to complete the state constraints analysis. Consensus examples of nonlinear dynamic agents are presented to show the effectiveness of the developed algorithms. For further research, it is of great interest to consider general nonlinear systems.

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, and the China Scholarship Council.

ORCID

D. Shen

REFERENCES

1. Cao Y, Yu W, Ren W, Chen G. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Ind Inform. 2013;9(1).
2. Olfati-Saber R, Murray RM. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans Autom Control. 2004;49(9).
3. Ren W, Beard RW, Atkins EM. Information consensus in multivehicle cooperative control. IEEE Control Syst. 2007;27(2).
4. Hong Y, Hu J, Gao L. Tracking control for multi-agent consensus with an active leader and variable topology. Automatica. 2006;42(7).
5. Ren W. On consensus algorithms for double-integrator dynamics. IEEE Trans Autom Control. 2008;53(6).
6. Zhang Y, Tian YP. Consentability and protocol design of multi-agent systems with stochastic switching topology. Automatica. 2009;45.
7. Cui Y, Jia Y. Robust L2-L∞ consensus control for uncertain high-order multi-agent systems with time-delay. Int J Syst Sci. 2012;45(3).
8. Scardovi L, Sepulchre R. Synchronization in networks of identical linear systems. Automatica. 2009;45(8).
9. Yu L, Wang J. Distributed output regulation for multi-agent systems with norm-bounded uncertainties. Int J Syst Sci. 2014;45(11).
10. Chen G, Lewis FL. Distributed adaptive tracking control for synchronization of unknown networked Lagrangian systems. IEEE Trans Syst Man Cybern B. 2011;41(3).
11. Mei J, Ren W, Ma G. Distributed coordinated tracking with a dynamic leader for multiple Euler-Lagrange systems. IEEE Trans Autom Control. 2011;56(6).
12. Mehrabian AR, Khorasani K.
Constrained distributed cooperative synchronization and reconfigurable control of heterogeneous networked Euler-Lagrange multi-agent systems. Inform Sci. 2016.
13. Yoo SJ. Distributed consensus tracking for multiple uncertain nonlinear strict-feedback systems under a directed graph. IEEE Trans Neural Netw Learn Syst. 2013;24(4).
14. Yoo SJ. Synchronised tracking control for multiple strict-feedback non-linear systems under switching network. IET Control Theory Appl. 2014;8(8).

15. Cui G, Xu S, Lewis FL, Zhang B, Ma Q. Distributed consensus tracking for non-linear multi-agent systems with input saturation: a command-filtered backstepping approach. IET Control Theory Appl. 2016;10(5).
16. Tahbaz-Salehi A, Jadbabaie A. A necessary and sufficient condition for consensus over random networks. IEEE Trans Autom Control. 2008;53(3).
17. Fang L, Antsaklis PJ. On communication requirements for multi-agent consensus seeking. Paper presented at: Networked Embedded Sensing and Control Workshop NESC'05; 2005; University of Notre Dame, USA.
18. Ren W, Beard RW. Distributed Consensus in Multi-Vehicle Cooperative Control. Communications and Control Engineering Series. London: Springer-Verlag.
19. Yang S, Tan S, Xu JX. Consensus based approach for economic dispatch problem in a smart grid. IEEE Trans Power Syst. 2013;28(4).
20. Khoo S, Xie L, Man Z. Robust finite-time consensus tracking algorithm for multirobot systems. IEEE/ASME Trans Mech. 2009;14(2).
21. Ahn HS, Chen YQ, Moore KL. Iterative learning control: brief survey and categorization from 1998 to 2004. IEEE Trans Syst Man Cybern C. 2007;37(6).
22. Shen D, Wang Y. Survey on stochastic iterative learning control. J Process Control. 2014;24(12).
23. Xu JX. A survey on iterative learning control for nonlinear systems. Int J Control. 2011;84(7).
24. Huang D, Xu JX, Li X, Xu C, Yu M. D-type anticipatory iterative learning control for a class of inhomogeneous heat equations. Automatica. 2013;49.
25. Huang D, Xu JX, Venkataramanan V, Tuong Huynh TC. High-performance tracking of piezoelectric positioning stage using current-cycle iterative learning control with gain scheduling. IEEE Trans Ind Electron. 2014;61(2).
26. Huang D, Li X. Improved D-type anticipatory iterative learning control for a class of inhomogeneous heat equations. Asian J Control. In press.
27. Li X, Ren Q, Xu JX. Precise speed tracking control of a robotic fish via iterative learning control. IEEE Trans Ind Electron. 2016;63(4).
28. Shen D, Xu JX. A novel Markov chain based ILC analysis for linear stochastic systems under general data dropouts environments. IEEE Trans Autom Control. In press.
29. Shen D, Zhang W, Wang Y, Chien CJ. On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths. Automatica. 2016;63(1).
30. Shen D, Zhang C, Xu Y. Two compensation schemes of iterative learning control for networked control systems with random data dropouts. Inform Sci. 2017;381.
31. Xu Y, Shen D, Wang Y. On interval tracking performance evaluation and practical varying sampling ILC. Int J Syst Sci. 2017.
32. Ahn HS, Chen YQ. Iterative learning control for multi-agent formation. Paper presented at: ICROS-SICE International Joint Conference; August 2006; Japan.
33. Ahn HS, Moore KL, Chen YQ. Trajectory-keeping in satellite formation flying via robust periodic learning control. Int J Robust Nonlinear Control. 2010;20(14).
34. Chen X, Jia Y. Stereo vision-based formation control of mobile robots using iterative learning. Paper presented at: Proceedings of the International Conference on Humanized Systems; 2010; Tokyo, Japan.
35. Sun H, Hou Z, Li D. Coordinated iterative learning control schemes for train trajectory tracking with overspeed protection. IEEE Trans Autom Sci Eng. 2013;10(2).
36. Yang S, Xu JX, Huang D, Tan Y. Optimal iterative learning control design for multi-agent systems consensus tracking. Syst Control Lett. 2014;69.
37. Yang S, Xu JX, Huang D, Tan Y. Synchronization of heterogeneous agent systems by adaptive iterative learning control.
Asian J Control. 2015;17(6).
38. Yang S, Xu JX. Leader-follower synchronisation for networked Lagrangian systems with uncertainties: a learning approach. Int J Syst Sci. 2016;47(4).
39. Meng D, Jia Y, Du J, Yu F. Tracking control over a finite interval for multi-agent systems with a time-varying reference trajectory. Syst Control Lett. 2012;61(7).
40. Meng D, Jia Y, Du J. Coordination learning control for groups of mobile agents. J Frankl Inst. 2013;350(8).
41. Meng D, Jia Y, Du J. Multi-agent iterative learning control with communication topologies dynamically changing in two directions. IET Control Theory Appl. 2013;7(2).
42. Meng D, Jia Y, Du J. Robust consensus tracking control for multiagent systems with initial state shifts, disturbances, and switching topologies. IEEE Trans Neural Netw Learn Syst. 2015;26(4).
43. Meng D, Moore KL. Learning to cooperate: networks of formation agents with switching topologies. Automatica. 2016;64.
44. Li J, Li J. Adaptive iterative learning control for consensus of multi-agent systems. IET Control Theory Appl. 2013;7(1).
45. Li J, Li J. Coordination control of multi-agent systems with second-order nonlinear dynamics using fully distributed adaptive iterative learning. J Frankl Inst. 2015;352(6).
46. Li J, Li J. Distributed adaptive fuzzy iterative learning control of coordination problems for higher order multi-agent systems. Int J Syst Sci. 2016;47(10).
47. Qiu Z, Liu S, Xie L. Distributed constrained optimal consensus of multi-agent systems. Automatica. 2016;68.
48. Johansson B, Speranzon A, Johansson M, Johansson KH. On decentralized negotiation of optimal consensus. Automatica. 2008;44(4).
49. Xu JX, Jin X. State-constrained iterative learning control for a class of MIMO systems. IEEE Trans Autom Control. 2013;58(5).

50. Jin X, Xu JX. Iterative learning control for output-constrained systems with both parametric and nonparametric uncertainties. Automatica. 2013;49(9).
51. Li X, Huang D, Chu B, Xu JX. Robust iterative learning control for systems with norm-bounded uncertainties. Int J Robust Nonlinear Control. 2016;26.
52. Polycarpou MM, Ioannou PA. A robust adaptive nonlinear control design. Automatica. 1996;32(3).

How to cite this article: Shen D, Xu J-X. Distributed adaptive iterative learning control for nonlinear multiagent systems with state constraints. Int J Adapt Control Signal Process. 2017;31:1779-1802. https://doi.org/10.1002/acs.2799

APPENDIX

PROOF OF THEOREM 1

The proof consists of five parts. First, we investigate the decreasing property of the given BCEF in the iteration domain at t = T. Then, the finiteness of the given BCEF is shown by examining its derivative; the boundedness of the related parameters is also proved here. Next, we give the proof of convergence of the extended observation errors. In the fourth part, the verification of the constraints on the states $x_{j,k}$ for all iterations is demonstrated. Finally, the uniform consensus tracking is established based on the previous analysis.

Define the following BCEF:
$$E_k(t) = \sum_{j=1}^{N} E_{j,k}(t) = \sum_{j=1}^{N}\left(V^1_{j,k}(t) + V^2_{j,k}(t) + V^3_{j,k}(t)\right) \qquad (A1)$$
$$V^1_{j,k}(t) = V\left(\gamma^2_{j,k}(t), t\right) \qquad (A2)$$
$$V^2_{j,k}(t) = \frac{\varepsilon_j + d_j}{2p_j}\int_0^t (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \theta)\,d\tau \qquad (A3)$$
$$V^3_{j,k}(t) = \frac{\varepsilon_j + d_j}{2q_j}\int_0^t b_j\hat u^2_{j,k}\,d\tau. \qquad (A4)$$

Part I. Difference of $E_k(t)$

Denote $\Delta E_k(t) \triangleq E_k(t) - E_{k-1}(t)$ the difference of the BCEF along the iteration axis. Take time t = T and examine the difference of $E_k(T)$, ie, $\Delta E_k(T) = \sum_{j=1}^N \Delta E_{j,k}(T) = \sum_{j=1}^N\left(\Delta V^1_{j,k}(T) + \Delta V^2_{j,k}(T) + \Delta V^3_{j,k}(T)\right)$. Here, $\Delta\eta_k \triangleq \eta_k - \eta_{k-1}$, where $\eta_k$ denotes an arbitrary iteration-indexed variable. We first examine the first term $\Delta V^1_{j,k}(T)$. Noticing Equation A2, we have
$$\Delta V^1_{j,k}(T) = V^1_{j,k}(T) - V^1_{j,k-1}(T) = V^1_{j,k}(0) - V^1_{j,k-1}(T) + \int_0^T \frac{\partial V^1_{j,k}}{\partial\gamma_{j,k}}\dot\gamma_{j,k}\,d\tau$$
$$= V\left(\gamma^2_{j,k}(0), 0\right) - V\left(\gamma^2_{j,k-1}(T), T\right) + \int_0^T \frac{\partial V\left(\gamma^2_{j,k}\right)}{\partial\gamma_{j,k}}\dot\gamma_{j,k}\,d\tau. \qquad (A5)$$

Note that $\gamma_{j,k}(0) = z_{j,k}(0) = \varepsilon_j(x_{j,k}(0) - x_r(0)) + \sum_{l=1}^N a_{jl}(x_{j,k}(0) - x_{l,k}(0)) = \varepsilon_j(x_{j,k-1}(T) - x_r(T)) + \sum_{l=1}^N a_{jl}(x_{j,k-1}(T) - x_{l,k-1}(T)) = \gamma_{j,k-1}(T)$; thus, $V(\gamma^2_{j,k}(0), 0) = V(\gamma^2_{j,k-1}(T), T)$. This further yields
$$\Delta V^1_{j,k}(T) = \int_0^T \lambda_{j,k}\gamma_{j,k}\dot\gamma_{j,k}\,d\tau = \int_0^T \lambda_{j,k}\gamma_{j,k}\dot z_{j,k}\,d\tau$$
$$= \int_0^T \lambda_{j,k}\gamma_{j,k}\left[(\varepsilon_j + d_j)\dot x_{j,k} - \left(\varepsilon_j\dot x_r + \sum_{l=1}^N a_{jl}\dot x_{l,k}\right)\right]d\tau$$
$$= \int_0^T \lambda_{j,k}\gamma_{j,k}\left[(\varepsilon_j + d_j)\theta^T\xi_{j,k} + (\varepsilon_j + d_j)b_ju_{j,k} - \sigma_{j,k} - \lambda^{-1}_{j,k}\mu_j\gamma_{j,k}\right]d\tau$$
$$= \int_0^T \lambda_{j,k}\gamma_{j,k}\left(-(\varepsilon_j + d_j)\tilde\theta^T_{j,k}\xi_{j,k} + (\varepsilon_j + d_j)\hat\theta^T_{j,k}\xi_{j,k} + (\varepsilon_j + d_j)b_ju_{j,k} - \sigma_{j,k}\right)d\tau - \int_0^T \mu_j\gamma^2_{j,k}\,d\tau, \qquad (A6)$$
where $\tilde\theta_{j,k} = \hat\theta_{j,k} - \theta$ and (10) is used. From controller (12), we have
$$(\varepsilon_j + d_j)b_ju_{j,k} = (\varepsilon_j + d_j)b_j\left[\hat u_{j,k} - \frac{1}{b_{\min}}\left|\hat\theta^T_{j,k}\xi_{j,k}\right|\mathrm{sgn}\left(\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k}\right) - \frac{|\sigma_{j,k}|}{b_{\min}(\varepsilon_j + d_j)}\mathrm{sgn}(\lambda_{j,k}\gamma_{j,k}\sigma_{j,k})\right]$$
$$= (\varepsilon_j + d_j)b_j\hat u_{j,k} - (\varepsilon_j + d_j)\frac{b_j}{b_{\min}}\left|\hat\theta^T_{j,k}\xi_{j,k}\right|\mathrm{sgn}\left(\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k}\right) - \frac{b_j}{b_{\min}}|\sigma_{j,k}|\,\mathrm{sgn}(\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}). \qquad (A7)$$
Notice that the following inequalities are true:
$$(\varepsilon_j + d_j)\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k} - (\varepsilon_j + d_j)\frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}\left|\hat\theta^T_{j,k}\xi_{j,k}\right|\mathrm{sgn}\left(\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k}\right)$$
$$\le (\varepsilon_j + d_j)\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k} - (\varepsilon_j + d_j)\left|\lambda_{j,k}\gamma_{j,k}\hat\theta^T_{j,k}\xi_{j,k}\right| \le 0 \qquad (A8)-(A9)$$
$$-\lambda_{j,k}\gamma_{j,k}\sigma_{j,k} - \frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}|\sigma_{j,k}|\,\mathrm{sgn}(\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}) \le |\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}| - |\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}| \le 0, \qquad (A10)-(A11)$$
where Assumption A1 is used. Therefore, the difference $\Delta V^1_{j,k}(T)$ now becomes
$$\Delta V^1_{j,k}(T) \le \int_0^T \left[-\lambda_{j,k}\gamma_{j,k}(\varepsilon_j + d_j)\tilde\theta^T_{j,k}\xi_{j,k} + \lambda_{j,k}\gamma_{j,k}(\varepsilon_j + d_j)b_j\hat u_{j,k}\right]d\tau - \int_0^T \mu_j\gamma^2_{j,k}\,d\tau. \qquad (A12)$$
Next, we move on to check the second term $\Delta V^2_{j,k}(T)$:
$$\Delta V^2_{j,k}(T) = \frac{\varepsilon_j + d_j}{2p_j}\int_0^T \left[(\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \theta) - (\hat\theta_{j,k-1} - \theta)^T(\hat\theta_{j,k-1} - \theta)\right]d\tau$$
$$= \frac{\varepsilon_j + d_j}{2p_j}\int_0^T (\hat\theta_{j,k} - \hat\theta_{j,k-1})^T(\hat\theta_{j,k} + \hat\theta_{j,k-1} - 2\theta)\,d\tau$$
$$= -\frac{\varepsilon_j + d_j}{2p_j}\int_0^T (\hat\theta_{j,k} - \hat\theta_{j,k-1})^T(\hat\theta_{j,k} - \hat\theta_{j,k-1})\,d\tau + \frac{\varepsilon_j + d_j}{p_j}\int_0^T (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \hat\theta_{j,k-1})\,d\tau$$
$$\le \frac{\varepsilon_j + d_j}{p_j}\int_0^T (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \hat\theta_{j,k-1})\,d\tau = (\varepsilon_j + d_j)\int_0^T \lambda_{j,k}\gamma_{j,k}\tilde\theta^T_{j,k}\xi_{j,k}\,d\tau, \qquad (A13)$$
where Equation 14 is used in the last equality.

Then, for the last term $\Delta V^3_{j,k}(T)$, we have
$$\Delta V^3_{j,k}(T) = \frac{\varepsilon_j + d_j}{2q_j}\left[\int_0^T b_j\hat u^2_{j,k}\,d\tau - \int_0^T b_j\hat u^2_{j,k-1}\,d\tau\right]$$
$$= \frac{\varepsilon_j + d_j}{2q_j}\int_0^T b_j(\hat u_{j,k} + \hat u_{j,k-1})(\hat u_{j,k} - \hat u_{j,k-1})\,d\tau$$
$$= -\frac{\varepsilon_j + d_j}{2q_j}\int_0^T b_j(\hat u_{j,k} - \hat u_{j,k-1})^2\,d\tau + \frac{\varepsilon_j + d_j}{q_j}\int_0^T b_j\hat u_{j,k}(\hat u_{j,k} - \hat u_{j,k-1})\,d\tau$$
$$\le \frac{\varepsilon_j + d_j}{q_j}\int_0^T b_j\hat u_{j,k}(\hat u_{j,k} - \hat u_{j,k-1})\,d\tau = -(\varepsilon_j + d_j)\int_0^T b_j\lambda_{j,k}\gamma_{j,k}\hat u_{j,k}\,d\tau, \qquad (A14)$$
where Equation 13 is used in the last equality. Consequently, combining Equations A12 to A14 results in
$$\Delta E_{j,k}(T) \le -\int_0^T \mu_j\gamma^2_{j,k}\,d\tau, \qquad (A15)$$
which further yields
$$\Delta E_k(T) = \sum_{j=1}^N \Delta E_{j,k}(T) \le -\sum_{j=1}^N \int_0^T \mu_j\gamma^2_{j,k}\,d\tau. \qquad (A16)$$
Thus, the decreasing property of the BCEF in the iteration domain at t = T is obtained.

Part II. Finiteness of $E_k(t)$ and involved quantities

The finiteness of $E_k(t)$ will be proved for the first iteration and then generalized to the following iterations. To this end, we first establish the expression of $\dot E_k(t)$, and then show the finiteness of $E_1(t)$. For any iteration index k, we have
$$\dot E_k(t) = \sum_{j=1}^N \dot E_{j,k}(t) = \sum_{j=1}^N \left(\dot V^1_{j,k}(t) + \dot V^2_{j,k}(t) + \dot V^3_{j,k}(t)\right). \qquad (A17)$$
Similar to the derivations in Part I, for $\dot V^1_{j,k}(t)$, we have
$$\dot V^1_{j,k}(t) \le -\lambda_{j,k}\gamma_{j,k}(\varepsilon_j + d_j)\tilde\theta^T_{j,k}\xi_{j,k} + \lambda_{j,k}\gamma_{j,k}(\varepsilon_j + d_j)b_j\hat u_{j,k} - \mu_j\gamma^2_{j,k}. \qquad (A18)$$
For $\dot V^2_{j,k}(t)$, we have
$$\frac{2p_j}{\varepsilon_j + d_j}\dot V^2_{j,k}(t) = (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \theta) = \theta^T\theta - 2\theta^T\hat\theta_{j,k} + \hat\theta^T_{j,k}\hat\theta_{j,k}$$
$$= \theta^T\theta - 2\theta^T\hat\theta_{j,k-1} - 2p_j\lambda_{j,k}\gamma_{j,k}\theta^T\xi_{j,k} + \hat\theta^T_{j,k-1}\hat\theta_{j,k-1} + p^2_j\lambda^2_{j,k}\gamma^2_{j,k}\xi^T_{j,k}\xi_{j,k} + 2p_j\hat\theta^T_{j,k-1}\lambda_{j,k}\gamma_{j,k}\xi_{j,k}$$
$$\le \theta^T\theta - 2\theta^T\hat\theta_{j,k-1} + \hat\theta^T_{j,k-1}\hat\theta_{j,k-1} - 2p_j\lambda_{j,k}\gamma_{j,k}\theta^T\xi_{j,k} + 2p_j\hat\theta^T_{j,k}\lambda_{j,k}\gamma_{j,k}\xi_{j,k}$$
$$= \theta^T\theta - 2\theta^T\hat\theta_{j,k-1} + \hat\theta^T_{j,k-1}\hat\theta_{j,k-1} + 2p_j\tilde\theta^T_{j,k}\lambda_{j,k}\gamma_{j,k}\xi_{j,k}.$$

Furthermore, for $\dot V^3_{j,k}(t)$, we have
$$\frac{2q_j}{(\varepsilon_j + d_j)b_j}\dot V^3_{j,k}(t) = \hat u^2_{j,k} = (\hat u_{j,k-1} - q_j\lambda_{j,k}\gamma_{j,k})^2 = \hat u^2_{j,k-1} - 2q_j\hat u_{j,k-1}\lambda_{j,k}\gamma_{j,k} + q^2_j\lambda^2_{j,k}\gamma^2_{j,k} \le \hat u^2_{j,k-1} - 2q_j\hat u_{j,k}\lambda_{j,k}\gamma_{j,k}.$$
Combining the above three inequalities for $\dot V^1_{j,k}$, $\dot V^2_{j,k}$, and $\dot V^3_{j,k}$ leads to
$$\dot E_{j,k} \le -\mu_j\gamma^2_{j,k} + \frac{\varepsilon_j + d_j}{2p_j}\left(\theta^T\theta - 2\theta^T\hat\theta_{j,k-1} + \hat\theta^T_{j,k-1}\hat\theta_{j,k-1}\right) + \frac{\varepsilon_j + d_j}{2q_j}b_j\hat u^2_{j,k-1}$$
$$= -\mu_j\gamma^2_{j,k} + \frac{\varepsilon_j + d_j}{2p_j}(\theta - \hat\theta_{j,k-1})^T(\theta - \hat\theta_{j,k-1}) + \frac{\varepsilon_j + d_j}{2q_j}b_j\hat u^2_{j,k-1}. \qquad (A19)$$
Now, we move on to verify the finiteness of $E_1(t)$. It can be derived from Equation A16 that the finiteness or boundedness of $E_k(T)$ is ensured for each iteration provided that $E_1(T)$ is finite. Because the initial values of the updating laws (13) and (14) are set to zero, ie, $\hat u_{j,0} = 0$ and $\hat\theta_{j,0} = 0$, it follows from Equation A19 that
$$\dot E_{j,1} \le -\mu_j\gamma^2_{j,1} + \frac{\varepsilon_j + d_j}{2p_j}\theta^T\theta. \qquad (A20)$$
It is evident that $\dot E_{j,1}(t)$ is bounded over [0, T], $\forall j$. Hence, the boundedness of $E_{j,1}(t)$ over [0, T] is also obtained, $\forall j$. In particular, at t = T, $E_{j,1}(T)$ is bounded. Noticing $E_1(T) = \sum_{j=1}^N E_{j,1}(T)$, we conclude that $E_1(T)$ is bounded.

Now, we are in the position of checking the boundedness of $E_k(t)$ for $k \ge 2$, the parameter estimate $\hat\theta_{j,k}$, and the control signal $\hat u_{j,k}$. According to the definition of $E_k(t)$ and the boundedness of $E_k(T)$, the boundedness of $V^2_{j,k}(T)$ and $V^3_{j,k}(T)$ is guaranteed for all iterations. That is, for any $k \in \mathbb{Z}_+$, there are finite constants $M_1 > 0$ and $M_2 > 0$ such that
$$\int_0^t (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \theta)\,d\tau \le \int_0^T (\hat\theta_{j,k} - \theta)^T(\hat\theta_{j,k} - \theta)\,d\tau \le M_1 < \infty \qquad (A21)$$
$$\int_0^t b_j\hat u^2_{j,k}\,d\tau \le \int_0^T b_j\hat u^2_{j,k}\,d\tau \le M_2 < \infty. \qquad (A22)$$
Hence, the boundedness of $\hat\theta_{j,k}$ and $\hat u_{j,k}$ is guaranteed directly. Recalling the differential of $E_{j,k}$ in Equation A19, we have
$$E_{j,k}(t) = E_{j,k}(0) + \int_0^t \left(-\mu_j\gamma^2_{j,k} + \frac{\varepsilon_j + d_j}{2p_j}(\theta - \hat\theta_{j,k-1})^T(\theta - \hat\theta_{j,k-1}) + \frac{\varepsilon_j + d_j}{2q_j}b_j\hat u^2_{j,k-1}\right)d\tau$$
$$\le E_{j,k}(0) + \frac{\varepsilon_j + d_j}{2p_j}\int_0^t (\theta - \hat\theta_{j,k-1})^T(\theta - \hat\theta_{j,k-1})\,d\tau + \frac{\varepsilon_j + d_j}{2q_j}\int_0^t b_j\hat u^2_{j,k-1}\,d\tau$$
$$\le E_{j,k}(0) + \frac{\varepsilon_j + d_j}{2p_j}M_1 + \frac{\varepsilon_j + d_j}{2q_j}M_2. \qquad (A23)$$
Meanwhile, from the alignment condition, $E_{j,k}(0) = E_{j,k-1}(T)$ is also bounded. Therefore, it is evident that $E_{j,k}(t)$ is bounded over [0, T], and so is $E_k(t)$.

Part III. Convergence of extended observation errors

We recall
$$\Delta E_k(T) \le -\sum_{j=1}^N \int_0^T \mu_j\gamma^2_{j,k}\,d\tau. \qquad (A24)$$

Hence, for $t=T$,
$$E_k(T)=E_1(T)+\sum_{i=2}^k\Delta E_i(T)\le E_1(T)-\sum_{i=2}^k\sum_{j=1}^N\mu_j\int_0^T\gamma_{j,i}^2\,d\tau.\tag{A25}$$
Since $E_k(T)$ is positive and $E_1(T)$ is bounded, we conclude that $\sum_{i=2}^\infty\sum_{j=1}^N\mu_j\int_0^T\gamma_{j,i}^2\,d\tau$ is finite. Hence, we conclude that $\gamma_{j,k}$ converges to zero asymptotically in the sense of the $L^2$-norm as $k\to\infty$, that is,
$$\lim_{k\to\infty}\int_0^T\mu_j\gamma_{j,k}^2\,d\tau=0,\qquad j=1,\dots,N.\tag{A26}$$
Moreover, noting $\gamma_{j,k}=z_{j,k}$ from Equation 9, we have actually obtained the convergence of the extended observation errors $z_k$ in the sense of the $L^2_T$-norm, i.e., $\lim_{k\to\infty}\int_0^T\|z_k\|_2^2\,d\tau=0$.

Part IV. Constraints verification on states

In the last part, we have shown that $E_k(t)$ is bounded over $[0,T]$ for all iterations. Thus, it is guaranteed that $V^1_{j,k}(t)$, i.e., $V(\gamma_{j,k}^2(t),t)$, is bounded over $[0,T]$ for all iterations and all agents. According to the definition of the so-called γ-type BLF, we can conclude that $|\gamma_{j,k}|<k_{b_j}$ holds over $[0,T]$, $j=1,\dots,N$, $\forall k\in\mathbb Z_+$. Noticing that $\gamma_{j,k}=z_{j,k}$, we have $|z_{j,k}|<k_{b_j}$, $j=1,\dots,N$. Denote $k_m\triangleq\max_j k_{b_j}$. Then, it is evident that $\|z_k\|\le\sqrt N\,k_m$, $\forall k\in\mathbb Z_+$. On the other hand, the relationship between $z_k$ and $\bar e_k$ in (6) leads to $\bar e_k=M^{-1}z_k$, where $M$ denotes the matrix given in (6). This further yields
$$\|\bar e_k\|\le\sigma_{\max}(M^{-1})\|z_k\|\le\frac{1}{\sigma_{\min}(M)}\sqrt N\,k_m\tag{A27}$$
for all iteration numbers $k\in\mathbb Z_+$. Then, for the constraints imposed on the states, i.e., $|x_{j,k}|<k_s$, we can set $k_m=(k_s-x_r)\sigma_{\min}(M)/\sqrt N$. Under this setting, the tracking error is bounded as follows:
$$|e_{j,k}|\le\|\bar e_k\|\le\frac{1}{\sigma_{\min}(M)}\sqrt N\,k_m\le\frac{1}{\sigma_{\min}(M)}\sqrt N\,(k_s-x_r)\frac{\sigma_{\min}(M)}{\sqrt N}=k_s-x_r.\tag{A28}$$
This further yields $|x_{j,k}-x_r|=|e_{j,k}|\le k_s-x_r$, and therefore $|x_{j,k}|\le k_s-x_r+x_r=k_s$. That is, the state constraint of each agent is satisfied. In addition, the unknown function $\xi_{j,k}$ is bounded, since its argument $x_{j,k}$ has been shown to be bounded. Incorporating the result that $\gamma_{j,k}$ and $\lambda_{j,k}$ are bounded, and noting the control law (12), we can conclude that the input profile $u_{j,k}$ is also bounded.

Part V. Uniform consensus tracking

In the last part, it was shown that $|\gamma_{j,k}|$ is bounded by $k_{b_j}$ for all iterations. Recall that $\gamma_{j,k}$ also converges to zero in the sense of the $L^2_T$-norm, as shown in Part III. Then, we can conclude that $\gamma_{j,k}\to0$ uniformly as $k\to\infty$, $j=1,\dots,N$. In other words, $z_{j,k}\to0$ uniformly as $k\to\infty$, $j=1,\dots,N$. Then $z_k\to0$. Meanwhile, $z_k=M\bar e_k$, and $M$ is a positive matrix. Hence, $\bar e_k\to0$ uniformly as $k\to\infty$. In conclusion, the uniform consensus tracking is proved. The proof is completed.

PROOF OF THEOREM 2

The main proof of this theorem is similar to that of Theorem 1 with minor modifications. We use the same BCEF given by Equations A1 to A4.

The derivations from Equations A5 to A12 are kept. Modifications are made to Equations A13 and A14 as follows (with $P(\cdot)$ denoting the introduced projection operator):
$$\begin{aligned}
\Delta V^2_{j,k}(T)&=\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k}-\theta)^T(\hat\theta_{j,k}-\theta)\,d\tau-\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k-1}-\theta)^T(\hat\theta_{j,k-1}-\theta)\,d\tau\\
&\le\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k}-\theta)^T(\hat\theta_{j,k}-\theta)\,d\tau-\frac{\varepsilon_j+d_j}{2p_j}\int_0^T\big(P(\hat\theta_{j,k-1})-\theta\big)^T\big(P(\hat\theta_{j,k-1})-\theta\big)\,d\tau\\
&=\frac{\varepsilon_j+d_j}{2p_j}\int_0^T\big(\hat\theta_{j,k}-P(\hat\theta_{j,k-1})\big)^T\big(\hat\theta_{j,k}+P(\hat\theta_{j,k-1})-2\theta\big)\,d\tau\\
&=-\frac{\varepsilon_j+d_j}{2p_j}\int_0^T\big(\hat\theta_{j,k}-P(\hat\theta_{j,k-1})\big)^T\big(\hat\theta_{j,k}-P(\hat\theta_{j,k-1})\big)\,d\tau+\frac{\varepsilon_j+d_j}{p_j}\int_0^T(\hat\theta_{j,k}-\theta)^T\big(\hat\theta_{j,k}-P(\hat\theta_{j,k-1})\big)\,d\tau\\
&\le(\varepsilon_j+d_j)\int_0^T\lambda_{j,k}\gamma_{j,k}\tilde\theta_{j,k}^T\xi_{j,k}\,d\tau
\end{aligned}\tag{A29}$$
and
$$\begin{aligned}
\Delta V^3_{j,k}(T)&=\frac{(\varepsilon_j+d_j)b_j}{2q_j}\left[\int_0^T\hat u_{j,k}^2\,d\tau-\int_0^T\hat u_{j,k-1}^2\,d\tau\right]
\le\frac{(\varepsilon_j+d_j)b_j}{2q_j}\left[\int_0^T\hat u_{j,k}^2\,d\tau-\int_0^T P(\hat u_{j,k-1})^2\,d\tau\right]\\
&=\frac{(\varepsilon_j+d_j)b_j}{2q_j}\int_0^T\big(\hat u_{j,k}+P(\hat u_{j,k-1})\big)\big(\hat u_{j,k}-P(\hat u_{j,k-1})\big)\,d\tau\\
&\le\frac{(\varepsilon_j+d_j)b_j}{q_j}\int_0^T\hat u_{j,k}\big(\hat u_{j,k}-P(\hat u_{j,k-1})\big)\,d\tau
=-(\varepsilon_j+d_j)b_j\int_0^T\lambda_{j,k}\gamma_{j,k}\hat u_{j,k}\,d\tau.
\end{aligned}\tag{A30}$$
As a consequence, the decreasing property of Equations A15 and A16 is still true.

The boundedness proof in Part II becomes much simpler, because the introduced projection guarantees a priori boundedness of the differential $\dot E_{j,k}$. To be specific, the estimation of $\dot V^1_{j,k}$ does not change, but the estimations of $\dot V^2_{j,k}$ and $\dot V^3_{j,k}$ now become
$$\begin{aligned}
\frac{2p_j}{\varepsilon_j+d_j}\dot V^2_{j,k}(t)&=(\hat\theta_{j,k}-\theta)^T(\hat\theta_{j,k}-\theta)=\theta^T\theta-2\theta^T\hat\theta_{j,k}+\hat\theta_{j,k}^T\hat\theta_{j,k}\\
&=\theta^T\theta-2\theta^TP(\hat\theta_{j,k-1})-2p_j\lambda_{j,k}\gamma_{j,k}\theta^T\xi_{j,k}+P(\hat\theta_{j,k-1})^TP(\hat\theta_{j,k-1})+p_j^2\lambda_{j,k}^2\gamma_{j,k}^2\xi_{j,k}^T\xi_{j,k}+2p_jP(\hat\theta_{j,k-1})^T\lambda_{j,k}\gamma_{j,k}\xi_{j,k}\\
&\le Q_1-2p_j\lambda_{j,k}\gamma_{j,k}\theta^T\xi_{j,k}+2p_j\hat\theta_{j,k}^T\lambda_{j,k}\gamma_{j,k}\xi_{j,k}=Q_1+2p_j\tilde\theta_{j,k}^T\lambda_{j,k}\gamma_{j,k}\xi_{j,k},
\end{aligned}$$
where $Q_1\triangleq\theta^T\theta-2\theta^TP(\hat\theta_{j,k-1})+P(\hat\theta_{j,k-1})^TP(\hat\theta_{j,k-1})$ is bounded for all iterations and all agents, and
$$\frac{2q_j}{(\varepsilon_j+d_j)b_j}\dot V^3_{j,k}(t)=\hat u_{j,k}^2=\big(P(\hat u_{j,k-1})-q_j\lambda_{j,k}\gamma_{j,k}\big)^2=P(\hat u_{j,k-1})^2-2q_jP(\hat u_{j,k-1})\lambda_{j,k}\gamma_{j,k}+q_j^2\lambda_{j,k}^2\gamma_{j,k}^2\le Q_2-2q_j\hat u_{j,k}\lambda_{j,k}\gamma_{j,k},$$
where $Q_2\triangleq P(\hat u_{j,k-1})^2$ is also bounded.
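The projection $P(\cdot)$ is not defined on this page; a common realization is the radial projection onto a ball of known radius, which is enough to make $Q_1$ and $Q_2$ bounded. A minimal sketch under that assumption (the function name and the bound are illustrative, not the paper's notation):

import numpy as np

def proj(x, bound):
    # Radial projection onto the ball {v : ||v|| <= bound}: vectors inside
    # the ball pass through unchanged; vectors outside are rescaled onto
    # the sphere. Hence ||proj(x)|| <= bound always holds, and, whenever
    # ||theta|| <= bound, also ||proj(x) - theta|| <= ||x - theta||.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = np.linalg.norm(x)
    return x if n <= bound else x * (bound / n)

# Example mirroring the projected update structure: project the previous
# estimate first, then apply the correction term.
theta_prev = np.array([3.0, -4.0])                 # norm 5, outside the ball
theta_new = proj(theta_prev, 2.0) + 0.1 * np.array([1.0, 0.0])

The non-expansiveness property noted in the comment is precisely what justifies the first inequalities in (A29) and (A30).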

Thus, we have
$$\dot E_{j,k}\le-\mu_j\gamma_{j,k}^2+\frac{\varepsilon_j+d_j}{2p_j}Q_1+\frac{(\varepsilon_j+d_j)b_j}{2q_j}Q_2.\tag{A31}$$
It is seen that $\dot E_{j,k}$ is bounded. The remaining steps are similar to those of Theorem 1 and are thus omitted. This completes the proof.

PROOF OF THEOREM 3

We give the proof by making modifications to that of Theorem 1. To be specific, several steps in Parts I, II, and III need modifications, whereas Parts IV and V are identical to those in the proof of Theorem 1. The BCEF (A1) to (A4) is also adopted in this case. In the following, we give a brief proof from Part I to Part III.

Part I. Difference of $E_k(t)$

The difference of $V^1_{j,k}$ is
$$\Delta V^1_{j,k}(T)=\int_0^T\lambda_{j,k}\gamma_{j,k}\Big(-(\varepsilon_j+d_j)\tilde\theta_{j,k}^T\xi_{j,k}+(\varepsilon_j+d_j)\hat\theta_{j,k}^T\xi_{j,k}+(\varepsilon_j+d_j)b_ju_{j,k}+\sigma_{j,k}\Big)\,d\tau-\int_0^T\mu_j\gamma_{j,k}^2\,d\tau.\tag{A32}$$
Substituting controller (29), we have
$$(\varepsilon_j+d_j)b_ju_{j,k}=(\varepsilon_j+d_j)b_j\hat u_{j,k}-(\varepsilon_j+d_j)\frac{b_j}{b_{\min}}\left(\hat\theta_{j,k}^T\xi_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}}{\eta_k}\Big)+\sigma_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta_k}\Big)\right).\tag{A33}$$
Using Lemma 2, we can conclude that
$$\begin{aligned}
&(\varepsilon_j+d_j)\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}-(\varepsilon_j+d_j)\frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}}{\eta_k}\Big)\\
&\qquad\le(\varepsilon_j+d_j)\left[\big|\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}\big|-\frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k}^T\xi_{j,k}}{\eta_k}\Big)\right]
\le\frac{b_j}{b_{\min}}(\varepsilon_j+d_j)\delta\eta_k\le\bar\delta\eta_k
\end{aligned}\tag{A34}$$
and
$$\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}-\frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta_k}\Big)\le\big|\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}\big|-\frac{b_j}{b_{\min}}\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}\tanh\Big(\frac{\lambda_{j,k}\gamma_{j,k}\sigma_{j,k}}{\eta_k}\Big)\le\frac{b_j}{b_{\min}}\delta\eta_k\le\bar\delta\eta_k,\tag{A35}$$
where $\bar\delta$ is a constant satisfying $\bar\delta>\frac{b_j}{b_{\min}}(\varepsilon_j+d_j)\delta$ in consideration of Assumption A1.
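Lemma 2 invoked in (A34) and (A35) is presumably the standard hyperbolic-tangent bound from robust adaptive control (commonly attributed to Polycarpou and Ioannou); recorded here for reference:
$$0\le|x|-x\tanh\Big(\frac{x}{\eta}\Big)\le\delta\eta,\qquad\forall x\in\mathbb R,\ \eta>0,$$
where $\delta=0.2785$ is the constant satisfying $\delta=e^{-(\delta+1)}$. Since $x\tanh(x/\eta)\ge0$ and $b_j/b_{\min}\ge1$, the bound survives multiplication of the $\tanh$ term by $b_j/b_{\min}$, which is how the unknown input gains are absorbed in (A34) and (A35).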

Therefore, the difference of $V^1_{j,k}$ now becomes
$$\Delta V^1_{j,k}(T)\le\int_0^T\Big[-\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)\tilde\theta_{j,k}^T\xi_{j,k}+\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)b_j\hat u_{j,k}-\mu_j\gamma_{j,k}^2+2\bar\delta\eta_k\Big]\,d\tau.\tag{A36}$$
By the same derivations of $\Delta V^2_{j,k}(T)$ and $\Delta V^3_{j,k}(T)$ as in Equations A13 and A14, the differences of $E_{j,k}(T)$ and $E_k(T)$ take the form
$$\Delta E_{j,k}(T)\le-\int_0^T\mu_j\gamma_{j,k}^2\,d\tau+2\bar\delta\eta_kT,\tag{A37}$$
$$\Delta E_k(T)\le-\sum_{j=1}^N\int_0^T\mu_j\gamma_{j,k}^2\,d\tau+2N\bar\delta\eta_kT.\tag{A38}$$

Part II. Finiteness of $E_k(t)$ and involved quantities

Similar to the derivation of the difference $\Delta V^1_{j,k}$ in Equation A36, we have for the differential of $V^1_{j,k}$
$$\dot V^1_{j,k}(t)\le-\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)\tilde\theta_{j,k}^T\xi_{j,k}+\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)b_j\hat u_{j,k}-\mu_j\gamma_{j,k}^2+2\bar\delta\eta_k.\tag{A39}$$
Since the iterative updating laws (30) and (31) are identical to Equations 13 and 14, the differentials of $V^2_{j,k}$ and $V^3_{j,k}$ keep the same form as those given in the proof of Theorem 1. As a consequence, an additional term is appended to Equation A19, which becomes
$$\dot E_{j,k}\le-\mu_j\gamma_{j,k}^2+\frac{\varepsilon_j+d_j}{2p_j}(\theta-\hat\theta_{j,k-1})^T(\theta-\hat\theta_{j,k-1})+\frac{(\varepsilon_j+d_j)b_j}{2q_j}\hat u_{j,k-1}^2+2\bar\delta\eta_k.\tag{A40}$$
Note that $2\bar\delta\eta_k$ is bounded; thus this term does not affect the essential boundedness of $\dot E_{j,k}$. Following similar steps, it is easy to show that $E_{j,1}(t)$ and $E_1(t)$ are bounded over $[0,T]$. In particular, $E_{j,1}(T)$ is bounded.

Now we are in the position of checking the boundedness of $E_k(T)$. To show this, recall Equation A37, and we find that, for $j=1,\dots,N$,
$$E_{j,k}(T)=E_{j,1}(T)+\sum_{i=2}^k\Delta E_{j,i}(T)\le E_{j,1}(T)+2T\bar\delta\sum_{i=2}^k\eta_i-\sum_{i=2}^k\int_0^T\mu_j\gamma_{j,i}^2\,d\tau\le E_{j,1}(T)+4\bar\delta T\bar\eta-\sum_{i=2}^k\int_0^T\mu_j\gamma_{j,i}^2\,d\tau,\tag{A41}$$
where $\sum_{i=1}^\infty\eta_i\le2\bar\eta$ is used. Thus, it is evident that $E_{j,k}(T)$ is bounded for all iterations and all agents, and so is $E_k(T)$. From now on, the remaining steps of Part II in the proof of Theorem 1 can be imitated to prove the boundedness of $E_k(t)$, $t\in[0,T]$.

Part III. Convergence of extended observation errors

The proof is an immediate conclusion of the following derivations:
$$E_k(T)=E_1(T)+\sum_{i=2}^k\Delta E_i(T)\le E_1(T)-\sum_{i=2}^k\sum_{j=1}^N\int_0^T\mu_j\gamma_{j,i}^2\,d\tau+2N\bar\delta T\sum_{i=2}^k\eta_i\le E_1(T)+4N\bar\delta T\bar\eta-\sum_{i=2}^k\sum_{j=1}^N\int_0^T\mu_j\gamma_{j,i}^2\,d\tau.\tag{A42}$$
Since $E_1(T)+4N\bar\delta T\bar\eta$ is finite and $E_k(T)$ is positive, we have $\lim_{k\to\infty}\int_0^T\gamma_{j,k}^2\,d\tau=0$, $j=1,\dots,N$.

PROOF OF THEOREM 4

We still apply the BCEF given in Equations A1 to A4 and check the difference of $E_k(T)$ first.

Part I. Difference of $E_k(t)$

By steps similar to the proof of Theorem 3, we obtain with little difficulty
$$\Delta V^1_{j,k}(T)\le\int_0^T\Big[-\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)\tilde\theta_{j,k}^T\xi_{j,k}+\lambda_{j,k}\gamma_{j,k}(\varepsilon_j+d_j)b_j\hat u_{j,k}-\mu_j\gamma_{j,k}^2+2\bar\delta\eta\Big]\,d\tau.\tag{A43}$$
Next, let us consider $\Delta V^2_{j,k}(T)$, which is derived as
$$\begin{aligned}
\Delta V^2_{j,k}(T)&=\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k}-\theta)^T(\hat\theta_{j,k}-\theta)\,d\tau-\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k-1}-\theta)^T(\hat\theta_{j,k-1}-\theta)\,d\tau\\
&=-\frac{\varepsilon_j+d_j}{2p_j}\int_0^T(\hat\theta_{j,k}-\hat\theta_{j,k-1})^T(\hat\theta_{j,k}-\hat\theta_{j,k-1})\,d\tau+\frac{\varepsilon_j+d_j}{p_j}\int_0^T(\hat\theta_{j,k}-\theta)^T(\hat\theta_{j,k}-\hat\theta_{j,k-1})\,d\tau\\
&\le(\varepsilon_j+d_j)\int_0^T\lambda_{j,k}\gamma_{j,k}\tilde\theta_{j,k}^T\xi_{j,k}\,d\tau-\frac{(\varepsilon_j+d_j)\varphi}{p_j}\int_0^T\tilde\theta_{j,k}^T\hat\theta_{j,k}\,d\tau.
\end{aligned}\tag{A44}$$
Meanwhile, the following equality is true:
$$\tilde\theta_{j,k}^T\hat\theta_{j,k}=\frac12\tilde\theta_{j,k}^T\tilde\theta_{j,k}+\frac12\hat\theta_{j,k}^T\hat\theta_{j,k}-\frac12\theta^T\theta.\tag{A45}$$
Then, combining Equations A44 and A45, we have
$$\Delta V^2_{j,k}(T)\le(\varepsilon_j+d_j)\int_0^T\lambda_{j,k}\gamma_{j,k}\tilde\theta_{j,k}^T\xi_{j,k}\,d\tau+\frac{(\varepsilon_j+d_j)T\varphi}{2p_j}\theta^T\theta.\tag{A46}$$
For the third component of $E_{j,k}(T)$, we have
$$\begin{aligned}
\Delta V^3_{j,k}(T)&=\frac{(\varepsilon_j+d_j)b_j}{2q_j}\left[\int_0^T\hat u_{j,k}^2\,d\tau-\int_0^T\hat u_{j,k-1}^2\,d\tau\right]\\
&=-\frac{\varepsilon_j+d_j}{2q_j}\int_0^Tb_j(\hat u_{j,k}-\hat u_{j,k-1})^2\,d\tau+\frac{\varepsilon_j+d_j}{q_j}\int_0^Tb_j\hat u_{j,k}(\hat u_{j,k}-\hat u_{j,k-1})\,d\tau\\
&\le-(\varepsilon_j+d_j)\int_0^Tb_j\hat u_{j,k}\lambda_{j,k}\gamma_{j,k}\,d\tau-\frac{(\varepsilon_j+d_j)\phi}{q_j}\int_0^Tb_j\hat u_{j,k}^2\,d\tau\le-(\varepsilon_j+d_j)\int_0^Tb_j\hat u_{j,k}\lambda_{j,k}\gamma_{j,k}\,d\tau.
\end{aligned}\tag{A47}$$
Therefore, combining Equations A43, A46, and A47 gives
$$\Delta E_{j,k}(T)\le-\int_0^T\mu_j\gamma_{j,k}^2\,d\tau+2T\bar\delta\eta+\frac{(\varepsilon_j+d_j)T\varphi}{2p_j}\theta^T\theta\tag{A48}$$

and, consequently,
$$\Delta E_k(T)\le-\sum_{j=1}^N\int_0^T\mu_j\gamma_{j,k}^2\,d\tau+\kappa,$$
where $\kappa\triangleq2TN\bar\delta\eta+\frac{(N+1)NT\varphi}{2p_m}\theta^T\theta>0$ and $p_m=\min_jp_j$. Due to the existence of $\kappa>0$, it is impossible to conclude that $\Delta E_k(T)$ is negative even after sufficiently many iterations. Instead, we will show that the tracking error enters a neighborhood of zero within finitely many iterations.

Part II. Bounded convergence of extended observation errors and tracking errors

By steps similar to the proof of Theorem 3, it is still true that $E_1(T)$ is finite. Let $\mu_m$ denote the smallest value of $\mu_j$, $j=1,\dots,N$, i.e., $\mu_m=\min_j\mu_j$. Then, for any finite sum of $\Delta E_k(T)$, we have
$$E_k(T)=E_1(T)+\sum_{i=2}^k\Delta E_i(T)\le E_1(T)-\sum_{i=2}^k\left[\sum_{j=1}^N\int_0^T\mu_j\gamma_{j,i}^2\,d\tau-\kappa\right]\tag{A49}$$
$$\le E_1(T)-\mu_m\sum_{i=2}^k\left[\sum_{j=1}^N\int_0^T\gamma_{j,i}^2\,d\tau-\frac{\kappa}{\mu_m}\right]=E_1(T)-\mu_m\sum_{i=2}^k\left[\int_0^T\|z_i\|_2^2\,d\tau-\frac{\kappa}{\mu_m}\right].\tag{A50}$$
Due to the positiveness of $E_k(T)$, we can show the boundedness and convergence of $\int_0^T\|z_k\|_2^2\,d\tau$ from Equation A50.
(a) If $\int_0^T\|z_k\|_2^2\,d\tau$ went to infinity at the $k$th iteration, then the RHS of Equation A50 would diverge to $-\infty$ owing to the finiteness of $\kappa/\mu_m$. This contradicts the positiveness of $E_k(T)$.
(b) For any given $\epsilon>0$, there is a finite integer $k_0>0$ such that $\int_0^T\|z_k\|_2^2\,d\tau<\kappa/\mu_m+\epsilon$ for some $k\le k_0$; otherwise, $\int_0^T\|z_k\|_2^2\,d\tau>\kappa/\mu_m+\epsilon$ would hold for all $k$, and then the RHS of Equation A50 would approach $-\infty$, which again contradicts the positiveness of $E_k(T)$.
Hence, the extended observation error $\int_0^T\|z_k\|_2^2\,d\tau$ enters the specified bound $\kappa/\mu_m+\epsilon$ within finitely many iterations. Noting the relationship between extended observation errors and tracking errors, i.e., Equation 6, we have
$$\int_0^T\|\bar e_k\|_2^2\,d\tau=\int_0^T\|M^{-1}z_k\|_2^2\,d\tau\le\frac{1}{\sigma_{\min}^2(M)}\int_0^T\|z_k\|_2^2\,d\tau,\tag{A51}$$
whence the tracking error $\int_0^T\|\bar e_k\|_2^2\,d\tau$ enters the specified bound $\sigma_{\min}^{-2}(M)(\kappa/\mu_m+\epsilon)$ within finitely many iterations.

Part III. Boundedness of $\hat\theta_{j,k}$ and $\hat u_{j,k}$

The parameter updating law (34) yields
$$\hat\theta_{j,k}^T\hat\theta_{j,k}=\frac{1}{(1+\varphi)^2}\big(\hat\theta_{j,k-1}+p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}\big)^T\big(\hat\theta_{j,k-1}+p_j\lambda_{j,k}\gamma_{j,k}\xi_{j,k}\big)=\frac{1}{(1+\varphi)^2}\Big(\hat\theta_{j,k-1}^T\hat\theta_{j,k-1}+2p_j\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k-1}^T\xi_{j,k}+(p_j\lambda_{j,k}\gamma_{j,k})^2\xi_{j,k}^T\xi_{j,k}\Big).\tag{A52}$$
Using the Young inequality, we have
$$2p_j\lambda_{j,k}\gamma_{j,k}\hat\theta_{j,k-1}^T\xi_{j,k}\le c\,\hat\theta_{j,k-1}^T\hat\theta_{j,k-1}+\frac1c(p_j\lambda_{j,k}\gamma_{j,k})^2\xi_{j,k}^T\xi_{j,k},$$
where $0<c<\varphi$. Substituting the above inequality into Equation A52 yields
$$\int_0^T\|\hat\theta_{j,k}\|_2^2\,d\tau\le\frac{1+c}{(1+\varphi)^2}\int_0^T\|\hat\theta_{j,k-1}\|_2^2\,d\tau+\frac{1+\frac1c}{(1+\varphi)^2}\int_0^T(p_j\lambda_{j,k}\gamma_{j,k})^2\|\xi_{j,k}\|_2^2\,d\tau.\tag{A53}$$
It is observed that $\lambda_{j,k}$ is bounded due to the definition of the BLF and the constraints condition. $\xi_{j,k}$ is a continuous function of $z_{j,k}$; thus it is also bounded provided that $z_{j,k}$ is bounded. Meanwhile, $\gamma_{j,k}=z_{j,k}$. In conclusion, the last term on the RHS of Equation A53 is bounded for all iterations, since the boundedness of $z_{j,k}$ has been proved in Part II. Denote the upper bound by $\bar\theta$.

Then, iterating Equation A53 yields
$$\int_0^T\|\hat\theta_{j,k}\|_2^2\,d\tau\le\sum_{i=1}^{k}\Big(\frac{1+c}{(1+\varphi)^2}\Big)^{i-1}\bar\theta\le\frac{\bar\theta}{1-\frac{1+c}{(1+\varphi)^2}}\triangleq\bar c\,\bar\theta,\tag{A54}$$
where $0<c<\varphi$ ensures that the geometric ratio $\frac{1+c}{(1+\varphi)^2}$ is less than one. The boundedness of $\hat\theta_{j,k}$ is thus obtained. Following similar steps, we can further prove the boundedness of $\hat u_{j,k}$ for all iterations and all agents. This completes the proof.

PROOF OF THEOREM 5

First, to show the finite-iteration convergence to a desired neighborhood of zero, it is sufficient to follow the same steps as in the proof of Theorem 4 with the last term on the RHS of Equation A44 removed. Thus, the proof of item (i) is completed.

Next, let us verify the state constraints. We have shown that the tracking error enters a neighborhood of zero within finitely many iterations, where the magnitude of the neighborhood can be predefined; thus, for a given neighborhood bound, there is a finite integer, say $k_1$, such that the tracking error enters the given neighborhood for $k\ge k_1$. Then, it is evident that $E_k(T)$ is bounded for $k<k_1$. Thus, $V^1_{j,k}$ is also bounded, whence the constraints are satisfied following steps similar to the previous proofs. Moreover, for the iterations $k\ge k_1$, the tracking error has entered the prespecified neighborhood. Then, the learning processes (37) and (38) stop updating and the control system repeats its tracking performance. As a result, the state constraints are still satisfied.

Finally, whenever the tracking error enters the predefined neighborhood, the learning algorithms (37) and (38) stop updating, and the boundedness is therefore guaranteed. This completes the proof.
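The constraint-satisfaction arguments in Parts IV above all exploit the barrier property of the BLF. As an illustration of that mechanism, a representative log-type barrier (a common choice; the γ-type BLF used in the paper may take a different form but shares the same property) is
$$V(\gamma,t)=\frac12\log\frac{k_b^2}{k_b^2-\gamma^2},\qquad|\gamma|<k_b,$$
which is positive definite on $(-k_b,k_b)$ and grows unboundedly as $|\gamma|\to k_b$. Consequently, any uniform bound on $V$ along the trajectories forces $|\gamma|$ to stay strictly inside the bound $k_b$, which is exactly the step from the boundedness of $E_k(t)$ to $|\gamma_{j,k}|<k_{b_j}$.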


IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 62, NO. 11, NOVEMBER 2017

A Novel Markov Chain Based ILC Analysis for Linear Stochastic Systems Under General Data Dropouts Environments

Dong Shen, Member, IEEE, and Jian-Xin Xu, Fellow, IEEE

Abstract—This technical note contributes to the convergence analysis of iterative learning control (ILC) for linear stochastic systems under general data dropout environments, i.e., data dropouts occur randomly at both the measurement and actuator sides. Data updating in the memory array is arranged in such a way that the data at every time instance are updated independently, which allows successive data dropouts along both the time and iteration axes. Update mechanisms for both the computed input and the real input are proposed, and the update process of both inputs is shown to be a Markov chain. By virtue of Markov modeling, a new analysis method is developed to prove convergence in both the mean square and almost sure senses. An illustrative example verifies the theoretical results.

Index Terms—Almost sure convergence, data dropout, iterative learning control, Markov chain, mean square convergence.

Manuscript received October 6, 2016; revised December 4, 2016; accepted December 7, 2016. Date of publication December 9, 2016; date of current version October 25, 2017. This work was supported by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, and the China Scholarship Council. Recommended by Associate Editor P. Bolzern. D. Shen is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, P. R. China (e-mail: shendong@mail.buct.edu.cn). J.-X. Xu is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore (e-mail: elexujx@nus.edu.sg).

I. INTRODUCTION

In practical industrial processes, many systems complete a given task in a finite time interval and repeat the process continuously. Then, one is interested in how the repetition could help improve the control performance. This motivates the introduction of an intelligent control strategy called iterative learning control (ILC). In ILC, the tracking information from previous iterations is taken full advantage of to gradually improve the performance from iteration to iteration. After three decades of development, ILC has shown a distinct advantage in high-precision tracking and fewer requirements on system information [1]–[3]. Meanwhile, based on the fast development of communication technology, more and more control systems are currently implemented in the networked mode to enhance flexibility and robustness. For example, ILC has been applied to a two-link robotic fish in [4], where the control signal is transmitted to the robotic fish through a wireless network. Similar examples include the formation control of satellites and unmanned aerial vehicles (UAVs). In this implementation, the plant and the controller are separated and communicate with each other through wired/wireless networks. Therefore, a problem arising naturally is the data dropout phenomenon, which would damage the tracking performance. This problem motivates us to consider networked ILC under data dropout conditions, and some pioneering papers have been reported [5]–[16]. Most of the reported results considered a special case in which the data dropout problem occurs only at the measurement side [5]–[12].
In other words, the network at the actuator side was assumed to work well, so that the generated input signal can always be transmitted to the system in a timely manner. In this case, a binary-valued variable was introduced to denote the occurrence of data dropout. However, when moving to the general case in which data dropouts occur at both the measurement and actuator sides [13]–[16], the one-side transmission results cannot be extended directly, because the input signal generated by the controller and the one used by the system are not always identical, which differs from the case in [5]–[12]. In order to ensure a feasible compensation mechanism, additional requirements were imposed in the existing papers. To be specific, in [13], [14], the dropped data were compensated with the data one step back within the same iteration. Consequently, a limitation arises in that the data at adjacent time instants cannot be dropped in the same iteration. In [15], [16], the dropped data were compensated with the data at the same time instant one iteration back. However, this required that there be no simultaneous data dropout at the same time instant across any two adjacent iterations. Moreover, such requirements on data dropouts might be difficult to meet in practical applications. Thus, it is of great significance to address the general and random successive data dropout problem.

This technical note aims to complete the exploration of this topic. To be specific, general data dropouts occurring at both the measurement and actuator sides are considered for linear stochastic systems. The data dropout is modeled by a Bernoulli random variable without further conditions. In other words, successive data dropouts of arbitrary length are allowed in this technical note, which is seldom considered in previous papers. Because of the existence of successive data dropouts, the updating mechanism proposed in this technical note uses only the available data while guaranteeing strong convergence properties. Moreover, for simplicity, the P-type update law is adopted in this technical note to illustrate the learning convergence, although the learning controller is not restricted to the P-type. Furthermore, the main difficulty in proving convergence for the general data dropout problem lies in the random asynchronism between the input updating at the controller and at the system. In this technical note, we first analyze the sample path behavior of the update process and then reveal that the process actually is a Markov chain. This paves a novel way for convergence analysis. ILC for Markovian switching systems was reported in [17], where the controller and mean square stability conditions were obtained for a leader–follower network based on a group of LMIs. The techniques of this technical note differ from [17] in that we directly establish the mean square and almost sure convergence for stochastic systems with the classic P-type law and mild design conditions.

This technical note is distinguished from previous papers by the following novelties: 1) it allows randomly successive data dropouts at both the measurement and actuator sides, which makes the

input signal updates at the controller and the plant asynchronous, a situation that has not been addressed in the ILC field; 2) we formulate the control problem under data dropouts as a Markov chain, and a novel analysis framework is then provided; 3) a rigorous convergence proof in both the mean square sense and the almost sure sense under stochastic noises is given, which is hard to achieve by traditional analysis approaches; and 4) the results show the effectiveness and robustness of the traditional P-type update law against random factors. In addition, a decreasing gain sequence is incorporated into the ILC algorithm to actively cope with the system and measurement noises.

The technical note is arranged as follows. Section II gives the problem formulation. Detailed behavior analysis, the convergence proof, and performance discussions are provided in Section III. Section IV presents an illustrative example to verify the theoretical results. Concluding remarks are provided in Section V.

Fig. 1. Block diagram of the proposed ILC framework.

II. PROBLEM FORMULATION

Consider the following linear time-varying stochastic system:
$$x_k(t+1)=A_tx_k(t)+B_tu_k(t)+w_k(t),\qquad y_k(t)=C_tx_k(t)+v_k(t),\tag{1}$$
where $t=0,1,\dots,N$ denotes the time index, $N$ is the iteration length, and $k=1,2,\dots$ denotes the iteration index. $u_k(t)\in\mathbb R^p$, $y_k(t)\in\mathbb R^q$, and $x_k(t)\in\mathbb R^n$ are the system input, output, and state, respectively. The system matrices $A_t$, $B_t$, and $C_t$ have appropriate dimensions. $w_k(t)$ and $v_k(t)$ are system noises and measurement noises, respectively. The desired reference is $y_d(t)$, which satisfies the following formulation:
$$x_d(t+1)=A_tx_d(t)+B_tu_d(t),\qquad y_d(t)=C_tx_d(t).\tag{2}$$
The following assumptions are required for further analysis.

A1: The input–output coupling matrix $C_{t+1}B_t\in\mathbb R^{q\times p}$ is of full column rank, $\forall t=0,\dots,N-1$, and therefore $q\ge p$.

A2: The initial state is precisely reset, i.e., $x_k(0)=x_d(0)$.

Remark 1: The initial state reset is a critical issue in the ILC field. Assumption A2 is the well-known identical initialization condition (i.i.c.), which has been used in many ILC papers. Efforts have been dedicated to the relaxation of the i.i.c., but they require further information on the system or additional control mechanisms. However, the extension of the i.i.c. is out of our scope; thus, we simply assume A2.

Denote the increasing σ-algebra $\mathcal F_k\triangleq\sigma\{x_j(t),w_j(t),v_j(t),\,1\le j\le k,\,t=0,1,\dots,N\}$ generated by the information from the first iteration to the $k$th iteration. We give the following assumption on the stochastic noises.

A3: The stochastic noises $w_k(t)$ and $v_k(t)$ are independent for different time instants, and they are independent of each other. For each $t$, $\mathbb E\{w_k(t)\mid\mathcal F_{k-1}\}=0$, $\mathbb E\{v_k(t)\mid\mathcal F_{k-1}\}=0$, $\sup_k\mathbb E\{\|w_k(t)\|^2\mid\mathcal F_{k-1}\}<\infty$, and $\sup_k\mathbb E\{\|v_k(t)\|^2\mid\mathcal F_{k-1}\}<\infty$. Here $\mathbb E(\cdot)$ denotes the mathematical expectation operator.

The control objective of this technical note is to design a learning algorithm such that the generated input sequence tracks the desired trajectory asymptotically along the iteration axis under data dropouts and stochastic noises. Because of the existence of stochastic noises, the actual output cannot precisely track the desired trajectory. Thus, our objective is to minimize the following performance index:
$$V_t=\lim_{n\to\infty}\frac1n\sum_{k=1}^n\|y_d(t)-y_k(t)\|^2.\tag{3}$$
By Lemma 1 of [11], in order to minimize the above performance index, it is sufficient to show that the input sequence satisfies $u_k(t)\to u_d(t)$, $t=0,1,\dots,N-1$. This is the objective of the subsequent analysis.
In addition, it is noticed from Lemma 1 of [11] that the minimum of the above performance index (3) is a linear combination of the upper bounds of the covariances of the system and measurement noises $w_k(t)$ and $v_k(t)$. In other words, the ultimate tracking performance is determined by the stochastic noises.

In this technical note, the general networked framework is considered; that is, the networks at both the measurement and actuator sides suffer random data dropouts. The data dropouts are modeled by two random variables, $\sigma_k(t)$ and $\gamma_k(t)$, subject to Bernoulli distributions. Specifically, both $\sigma_k(t)$ and $\gamma_k(t)$ equal 1 if the corresponding data are transmitted successfully, and 0 otherwise. Moreover, $\mathbb P(\sigma_k(t)=1)=\bar\sigma$ and $\mathbb P(\gamma_k(t)=1)=\bar\gamma$, $0<\bar\sigma,\bar\gamma<1$, where $\mathbb P(\cdot)$ denotes the probability of the indicated event. Both $\sigma_k(t)$ and $\gamma_k(t)$ are independent for different time instants $t$ and iterations $k$.

The block diagram of the framework is illustrated in Fig. 1. In this framework, the update of the input follows the intermittent type. In other words, if the data are successfully transmitted at the measurement side, the algorithm updates its input signal; if the data are lost during transmission at the measurement side, the algorithm stops updating and retains the stored input signal of the previous iteration. At the actuator side, if the input signal is successfully transmitted, the plant uses this new input signal; if the input signal is lost, the plant operates with the stored input signal of the previous iteration. The data dropouts at the measurement and actuator sides occur independently.

In this framework, we denote the control signal generated by the learning controller, called the computed control, as $u^c_k(t)$, and the real control signal fed to the plant, called the real control, as $u^r_k(t)$. The workflow of Fig. 1 is as follows: when the system finishes one batch, all the data are transmitted back to the controller; the controller then computes the control signal for the next batch, and the computed control is transmitted to the plant so that the system can run the next batch. The updating law in the controller is formulated as
$$u^c_{k+1}(t)=\sigma_{k+1}(t)u^r_k(t)+(1-\sigma_{k+1}(t))u^c_k(t)+\sigma_{k+1}(t)a_kL_te_k(t+1),\tag{4}$$
while the control signal actually used for the $(k+1)$th iteration is
$$u^r_{k+1}(t)=\gamma_{k+1}(t)u^c_{k+1}(t)+(1-\gamma_{k+1}(t))u^r_k(t),\tag{5}$$

where $e_k(t)\triangleq y_d(t)-y_k(t)$, $L_t$ is the learning gain matrix, and $\{a_k\}$ is a decreasing sequence to ensure zero-error tracking. The sequence $\{a_k\}$ should satisfy $a_k>0$, $\sum_{k=1}^\infty a_k=\infty$, and $\sum_{k=1}^\infty a_k^2<\infty$.

Remark 2: The decreasing sequence $\{a_k\}$ used in (4) is a technical means to handle the stochastic noises. If the stochastic noises are eliminated from the system, the term $a_k$ can be removed from (4). It is well known that an appropriate decreasing gain for the correction term in updating processes is a necessary requirement to ensure convergence in recursive computation for the optimization, identification, and tracking of stochastic systems [18], [19]. This fact is also illustrated in the ILC literature [2], [20], [21].

Remark 3: The random variables $\sigma_{k+1}(t)$ and $\gamma_{k+1}(t)$ are defined independently along both the iteration and time axes. Thus, it is apparent that successive data dropouts along the time axis are allowed in this formulation. Moreover, from (4) it is noticed that if $\sigma_{k+1}(t)=1$, i.e., the data are successfully transmitted, then the computed control is updated; otherwise, if $\sigma_{k+1}(t)=0$, the computed control copies its value from the previous iteration. In the latter case, the corresponding computed control of the previous iteration may likewise have copied the value of its own previous iteration. Consequently, successive data dropouts along the iteration axis are also allowed. In addition, no extra storage beyond one batch size is required by the memory array, because only the latest data need to be stored, as shown in the updating laws (4) and (5) through the two random variables $\sigma_{k+1}(t)$ and $\gamma_{k+1}(t)$. In other words, at each time instance, if the input is updated, the updated value replaces the stored data; otherwise, the stored input keeps unchanged.

III. MAIN RESULTS

In this section, we first show that the update process of the computed control and the real control is a Markov chain, which paves a critical and novel way to obtain convergence. Then, convergence in both the mean square and almost sure senses is established. In addition, the convergence speed is explicitly discussed at the end of this section.

A. Markov Chain of the Input Sequence

In this subsection, we establish the Markov chain of the input sequences. To make our idea clear, we first consider the case of an arbitrary time instant and then generalize it to the whole iteration. Now, for an arbitrary fixed time instant $t$, $0\le t\le N-1$, let us consider the sample path behaviors of the update algorithms (4) and (5), where a sample path means an arbitrary sequence with respect to iteration. It is noticed that the computed control and the real control are the same whenever $\gamma_k=1$. In the following, the sample path behavior is called synchronization if the computed control and the real control are equal to each other; otherwise, it is called asynchronization. Moreover, it is called a renewal if both the computed control and the real control are in the state of synchronization but differ from their last synchronization. The following lemma shows that the sample path behavior actually forms a Markov chain in terms of synchronization and asynchronization.

Lemma 1: Consider the updating laws (4) and (5). The updating of the values of $u^r_k(t)$ and $u^c_k(t)$ forms a Markov chain.

Proof: We start from the $k$th iteration, where $u^r_k(t)=u^c_k(t)$; that is, the computed control and the real control are in the state of synchronization at the $k$th iteration. Then, for the $(k+1)$th iteration, four possible outcomes exist.
1) Case 1: $\sigma_{k+1}(t)=0$ and $\gamma_{k+1}(t)=1$. The probability of this case is $\mathbb P(\sigma_{k+1}(t)=0)\mathbb P(\gamma_{k+1}(t)=1)=(1-\bar\sigma)\bar\gamma$. In this case, from (4) and (5) one has $u^c_{k+1}(t)=u^c_k(t)=u^r_k(t)$ and $u^r_{k+1}(t)=u^c_{k+1}(t)=u^r_k(t)$. Thus, the computed control and the real control remain the same as in the $k$th iteration.

2) Case 2: $\sigma_{k+1}(t)=0$ and $\gamma_{k+1}(t)=0$. The probability of this case is $(1-\bar\sigma)(1-\bar\gamma)$. In this case, it is obvious that $u^c_{k+1}(t)=u^c_k(t)=u^r_k(t)$ and $u^r_{k+1}(t)=u^r_k(t)$. That is, no change occurs in either the computed control or the real control.

3) Case 3: $\sigma_{k+1}(t)=1$ and $\gamma_{k+1}(t)=1$. The probability of this case is $\bar\sigma\bar\gamma$. In this case, we find that $u^c_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$ and $u^r_{k+1}(t)=u^c_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$. In other words, the computed control and the real control are updated simultaneously and are still equal to each other. In short, a renewal occurs.

4) Case 4: $\sigma_{k+1}(t)=1$ and $\gamma_{k+1}(t)=0$. The probability of this case is $\bar\sigma(1-\bar\gamma)$. In this case, only the computed control is updated: $u^c_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$, $u^r_{k+1}(t)=u^r_k(t)$. As a result, the state becomes asynchronization.

From the above discussions, we find that (a) the computed control $u^c_{k+1}(t)$ and the real control $u^r_{k+1}(t)$ stay in the state of synchronization except in the last case; and (b) a renewal occurs when no data dropouts happen at the measurement and actuator sides, with probability $\bar\sigma\bar\gamma$. Therefore, we further discuss the last case; that is, we assume that the computed control and the real control fall into the last case at the $(k+1)$th iteration. Then, four possible outcomes exist for the $(k+2)$th iteration, with the probabilities of the four outcomes being the same as in Cases 1–4 above.

1) Case 1′: $\sigma_{k+2}(t)=0$ and $\gamma_{k+2}(t)=1$. In this case, the real control is updated: $u^c_{k+2}(t)=u^c_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$, $u^r_{k+2}(t)=u^c_{k+2}(t)=u^r_k(t)+a_kL_te_k(t+1)$. That is, the computed control and the real control achieve synchronization, and a renewal occurs.

2) Case 2′: $\sigma_{k+2}(t)=0$ and $\gamma_{k+2}(t)=0$. In this case, no change happens to either the computed control or the real control: $u^c_{k+2}(t)=u^c_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$, $u^r_{k+2}(t)=u^r_{k+1}(t)=u^r_k(t)$. The computed control and the real control thus remain in the state of asynchronization.

3) Case 3′: $\sigma_{k+2}(t)=1$ and $\gamma_{k+2}(t)=1$. In this case, both the computed control and the real control are updated: $u^c_{k+2}(t)=u^r_{k+1}(t)+a_{k+1}L_te_{k+1}(t+1)$, $u^r_{k+2}(t)=u^c_{k+2}(t)$. As a result, the computed control and the real control become synchronized again, and a renewal occurs.

4) Case 4′: $\sigma_{k+2}(t)=1$ and $\gamma_{k+2}(t)=0$. In this case, only the computed control is updated: $u^c_{k+2}(t)=u^r_{k+1}(t)+a_{k+1}L_te_{k+1}(t+1)$, $u^r_{k+2}(t)=u^r_{k+1}(t)$. However, the real control remains the same as in the $(k+1)$th iteration; thus, the computed control and the real control remain in the state of asynchronization.

The analysis indicates that (a) from the asynchronization state, the computed control $u^c_{k+2}(t)$ and the real control $u^r_{k+2}(t)$ either remain in the unchanged state or become synchronized again; and (b) a renewal occurs whenever the state changes into synchronization. As a result, we can conclude that the computed control and the real control have two states, namely, synchronization and asynchronization.
Moreover, from the state of synchronization, the probability of the inputs retaining synchronization is $1-\bar\sigma(1-\bar\gamma)$, while the probability of the inputs switching to asynchronization is $\bar\sigma(1-\bar\gamma)$. From the state of asynchronization, the probabilities of retaining asynchronization and of switching to synchronization are $1-\bar\gamma$ and $\bar\gamma$, respectively. Therefore, the two states switch between each other following a Markov chain, as shown in Fig. 2. The proof of this lemma is completed.
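Lemma 1's case analysis is easy to check numerically by simulating the scalar version of the update laws (4) and (5) at a single time instant. The sketch below is illustrative only (the scalar plant, gains, and probabilities are placeholders, not the paper's setting); it counts a renewal whenever the real control changes, and the printed frequencies can be compared with the stationary quantities derived in Subsection III.C below.

import numpy as np

rng = np.random.default_rng(0)
sigma_bar, gamma_bar = 0.8, 0.7        # success probabilities at the two sides
u_c = u_r = 0.0                        # computed and real control, one time instant
renewals = syncs = 0
K = 5000

for k in range(1, K + 1):
    e = 1.0 - 0.5 * u_r                # stand-in tracking error of a scalar plant
    a_k = 1.0 / k                      # decreasing gain: sum a_k = inf, sum a_k^2 < inf
    if rng.random() < sigma_bar:       # sigma_{k+1}(t) = 1: update law (4) fires
        u_c = u_r + a_k * e            # (learning gain L_t = 1 for simplicity)
    if rng.random() < gamma_bar:       # gamma_{k+1}(t) = 1: update law (5) fires
        if u_c != u_r:
            renewals += 1              # the real control changes: a renewal occurs
        u_r = u_c
    syncs += (u_c == u_r)

print("empirical renewal frequency:", renewals / K)
print("empirical synchronization fraction:", syncs / K)

With these placeholder probabilities, the printed frequencies settle near 0.596 and 0.745, matching the stationary values obtained from (22) and (23) in Subsection III.C.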

Fig. 2. Illustration of the Markov chain of synchronization and asynchronization. S: synchronization; A: asynchronization; ∗: there is a renewal with probability $\bar\sigma\bar\gamma$.

Remark 4: If $\bar\gamma=1$, which means there is no data dropout at the actuator side, then the matrix $P$ is singular, as its second column is zero. This implies that the computed control and the real input are always in the state of synchronization. This special case has been discussed in many previous papers. On the other hand, if $\bar\sigma=1$, which means there is no data dropout at the measurement side, then $P$ is also singular, as its first row coincides with its last row, and the Markov chain degrades to a simple Bernoulli sequence. The convergence of this special case is easy to establish.

Next, we verify that the sample path behaviors for the whole iteration also form a Markov chain. Lemma 1 indicates that, for an arbitrary time instant $t$, the states of synchronization and asynchronization between $u^c_k(t)$ and $u^r_k(t)$ form a Markov chain. From the analysis steps of Lemma 1, it can be seen that the switching of these two states is determined only by the data dropout variables at both sides, i.e., $\sigma_k(t)$ and $\gamma_k(t)$. In other words, the Markov property of the sample path behavior of $u^c_k(t)$ and $u^r_k(t)$ is irrelevant to their specific values. Moreover, notice that the random variables $\sigma_k(t)$ and $\gamma_k(t)$ modeling the data dropouts at both sides are independent for different time instants. Owing to this independence among different time instants, the combination of the $N$ Markov chains generated by the computed control and the real control for each time instant $t$, $0\le t\le N-1$, is also a Markov chain.

Specifically speaking, let us introduce a new notation $\tau_k(t)$ to describe the states of synchronization and asynchronization. That is, we let $\tau_k(t)=1$ if the computed control and the real control achieve synchronization; otherwise, $\tau_k(t)=0$ if they achieve asynchronization. As explained above, the states of synchronization and asynchronization are determined only by the data dropout variables $\sigma_k(t)$ and $\gamma_k(t)$; therefore, $\tau_k(i)$ is independent of $\tau_k(j)$ for any $i\ne j$. Meanwhile, the evolution of $\tau_k(t)$ is a Markov chain. To show the overall behavior of all time instants, we further introduce a vector $\phi_k\triangleq[\tau_k(0),\dots,\tau_k(N-1)]^T$, which is a stack of $\tau_k(t)$. Since each variable $\tau_k(t)$ is binary, the vector $\phi_k$ has $2^N$ possible outcomes, i.e., $[0,0,\dots,0]^T$, $[1,0,\dots,0]^T$, $\dots$, $[1,1,\dots,1]^T$. We denote the set of all possible values of $\phi_k$ as $\mathcal S$. The Markov property of $\phi_k$ can be proved directly from the definition of a Markov chain. Note that $\{\tau_k(t)\}$ is a Markov chain, $\forall t$, that is, $\mathbb P\{\tau_k(t)=i\mid\tau_{k-1}(t)=i_{k-1},\dots,\tau_1(t)=i_1\}=\mathbb P\{\tau_k(t)=i\mid\tau_{k-1}(t)=i_{k-1}\}$, $\forall t$, $i\in\{0,1\}$. Therefore, $\mathbb P\{\phi_k=\theta_k\mid\phi_{k-1}=\theta_{k-1},\dots,\phi_1=\theta_1\}=\mathbb P\{\phi_k=\theta_k\mid\phi_{k-1}=\theta_{k-1}\}$, where $\theta_i\in\mathcal S$. As a consequence, the switching of the inputs for the general case is also a Markov chain with $2^N$ states.

B. Convergence Analysis

In this subsection, the convergence of the input sequences in both the mean square and almost sure senses is established. To this end, the original algorithms (4) and (5) are first transformed into a switched system whose random matrix switches according to a Markov chain. We first consider the algorithms for an arbitrary fixed time instant $t$. From Fig. 2, it is observed that the transition matrix of the Markov chain is
$$P=\begin{bmatrix}p_{11}&p_{12}\\p_{21}&p_{22}\end{bmatrix}=\begin{bmatrix}1-\bar\sigma(1-\bar\gamma)&\bar\sigma(1-\bar\gamma)\\\bar\gamma&1-\bar\gamma\end{bmatrix},\tag{6}$$
where $p_{11}\triangleq\mathbb P(\tau_{k+1}(t)=1\mid\tau_k(t)=1)$, $p_{12}\triangleq\mathbb P(\tau_{k+1}(t)=0\mid\tau_k(t)=1)$, $p_{21}\triangleq\mathbb P(\tau_{k+1}(t)=1\mid\tau_k(t)=0)$, and $p_{22}\triangleq\mathbb P(\tau_{k+1}(t)=0\mid\tau_k(t)=0)$, with $p_{ij}$ being the element of $P$ at the $i$th row and $j$th column, $\tau_k(t)$ denoting the state at iteration $k$, $\forall t$, and $\tau_k(t)=1$ and $\tau_k(t)=0$ denoting the states of synchronization and asynchronization, respectively. Note that $0<\bar\sigma,\bar\gamma<1$; thus, $P$ is irreducible, aperiodic, and recurrent, which further means $P$ is ergodic. In addition, $p_{ij}>0$, $i,j=1,2$.

Note that a renewal can occur both in the state of synchronization and in that of asynchronization. Moreover, whenever a renewal occurs, both the computed control and the real control are improved. In other words, the real control is updated if a renewal occurs; otherwise, it remains unchanged. Therefore, it is concluded that the updating of the real control also follows a Markov jump pattern. We further introduce a random variable $\lambda_k(t)$ to denote whether a renewal happens or not, i.e., $\lambda_k(t)=1$ if a renewal happens, and 0 otherwise. Recalling Fig. 2, we find that the occurrence probability of a renewal depends on the state of the last iteration. That is, the evolution of $\lambda_k(t)$ is also an irreducible, aperiodic, recurrent, and ergodic Markov chain. The update of the real control has the following two formulations: when $\lambda_k(t)=0$, $u^r_{k+1}(t)=u^r_k(t)=u^r_k(t)+a_k0_{p\times q}e_k(t+1)$, and when $\lambda_k(t)=1$, $u^r_{k+1}(t)=u^r_k(t)+a_kL_te_k(t+1)$, where $0_{i\times j}$ denotes the zero matrix with appropriate dimensions. We can unify these two cases into the following one:
$$u^r_{k+1}(t)=u^r_k(t)+a_k\lambda_k(t)L_te_k(t+1),\tag{7}$$
where $\lambda_k(t)$ takes the value 0 or 1 subject to a two-state Markov chain.

In order to show convergence, we now lift the above recursion along the time axis. Specifically, denote $Y_k=[y_k^T(1),\dots,y_k^T(N)]^T\in\mathbb R^{qN}$, $U^r_k=[(u^r_k(0))^T,\dots,(u^r_k(N-1))^T]^T\in\mathbb R^{pN}$, $Y_k(0)=[(C_1A_0x_k(0))^T,\dots,(C_NA_{N-1,0}x_k(0))^T]^T\in\mathbb R^{qN}$, and
$$H=\begin{bmatrix}C_1B_0&0&0&\cdots&0\\C_2A_1B_0&C_2B_1&0&\cdots&0\\C_3A_{2,1}B_0&C_3A_2B_1&C_3B_2&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\C_NA_{N-1,1}B_0&C_NA_{N-1,2}B_1&C_NA_{N-1,3}B_2&\cdots&C_NB_{N-1}\end{bmatrix},$$
with $A_{i,j}\triangleq A_iA_{i-1}\cdots A_j$, $i\ge j$. The stochastic noise term $\epsilon_k$ is expressed as
$$\epsilon_k=\begin{bmatrix}v_k(1)+C_1w_k(0)\\v_k(2)+C_2w_k(1)+C_2A_1w_k(0)\\\vdots\\v_k(N)+\sum_{j=1}^{N}C_NA_{N-1,j}w_k(j-1)\end{bmatrix}.$$
The lifted system is $Y_k=HU^r_k+\epsilon_k+Y_k(0)$, while the desired reference $y_d(t)$ and the associated desired input $u_d(t)$ can be lifted in a similar formulation: $Y_d=HU_d+Y_d(0)$. From A2–A3, one has $Y_k(0)=Y_d(0)$, $\mathbb E\{\epsilon_k\mid\mathcal F_{k-1}\}=0$, and $\mathbb E\{\|\epsilon_k\|^2\mid\mathcal F_{k-1}\}<\infty$. In addition, the lifted tracking error is $E_k\triangleq Y_d-Y_k=H(U_d-U^r_k)-\epsilon_k$. Then the lifted form of the recursion (7) is
$$U^r_{k+1}=U^r_k+a_k\Lambda_kLE_k=U^r_k+a_k\Lambda_kLH\Delta U^r_k-a_k\Lambda_kL\epsilon_k,\tag{8}$$

where $\Lambda_k=\mathrm{diag}\{\lambda_k(0),\dots,\lambda_k(N-1)\}\otimes I_{p\times p}$, $L=\mathrm{diag}\{L_0,\dots,L_{N-1}\}\in\mathbb R^{pN\times qN}$, and $\Delta U^r_k=U_d-U^r_k$. Subtracting both sides of the last equation (8) from $U_d$ leads to
$$\Delta U^r_{k+1}=\Delta U^r_k-a_k\Lambda_kLH\Delta U^r_k+a_k\Lambda_kL\epsilon_k.\tag{9}$$
It is worth pointing out that $\Lambda_k$ is a random matrix with $2^N$ possible outcomes, because all its diagonal entries are binary, and the switching of $\Lambda_k$ among its outcomes follows an irreducible, aperiodic, recurrent, and ergodic Markov chain. In particular, there are two special cases of $\Lambda_k$, namely, $I_{pN\times pN}$ and $0_{pN\times pN}$, denoting that the inputs at all $N$ time instants are renewed and unchanged, respectively.

Next, it is sufficient to show the zero-error convergence of the recursion (9) under mild design conditions. That is, we can move on to designing the learning matrix $L$ and proposing the convergence results. Note that $H$ is a lower-triangular block matrix with its diagonal blocks being $C_{t+1}B_t$. Thus, we can design the learning gain matrix $L_t$ such that $L_tC_{t+1}B_t$ has positive eigenvalues. Before presenting the main theorem, we first give two technical lemmas for the convergence analysis.

Lemma 2: Let $\{\xi_k\}$ be a sequence of positive real numbers such that
$$\xi_{k+1}\le(1-d_1a_k)\xi_k+d_2a_k^2(d_3+\xi_k),\tag{10}$$
where $d_i>0$, $i=1,\dots,3$, are constants and $a_k$ satisfies $a_k>0$, $\sum_{k=1}^\infty a_k=\infty$, and $\sum_{k=1}^\infty a_k^2<\infty$; then $\lim_{k\to\infty}\xi_k=0$.

Proof: From (10), we have
$$\xi_{k+1}\le(1-d_1a_k+d_2a_k^2)\xi_k+d_2d_3a_k^2.\tag{11}$$
Since $\lim_{k\to\infty}a_k=0$, we can choose a sufficiently large integer $k_0>0$ such that $1-d_1a_k+d_2a_k^2<1$ for all $k\ge k_0$, and then we have
$$\xi_{k+1}\le\xi_k+d_4a_k^2,\tag{12}$$
where $d_4\triangleq d_2d_3$. As a result, it follows from (12) and $\sum_{k=1}^\infty a_k^2<\infty$ that $\sup_k\xi_k<\infty$ and that $\xi_k$ converges. Based on this boundedness, we have from (11) that $\xi_{k+1}\le(1-d_1a_k)\xi_k+d_5a_k^2$, with $d_5>0$ a suitable constant. Again, noticing $\sum_{k=1}^\infty a_k=\infty$ and $\sum_{k=1}^\infty a_k^2<\infty$, we conclude that $\lim_{k\to\infty}\xi_k=0$.

Lemma 3 ([22]): Let $X(n)$, $Z(n)$ be nonnegative stochastic processes (with finite expectation) adapted to an increasing σ-algebra $\{\mathcal F_n\}$ and such that
$$\mathbb E\{X(n+1)\mid\mathcal F_n\}\le X(n)+Z(n),\tag{13}$$
$$\sum_{n=1}^\infty\mathbb E[Z(n)]<\infty.\tag{14}$$
Then $X(n)$ converges almost surely as $n\to\infty$.

Theorem 1: Consider the linear time-varying system (1) and assume A1–A3 hold. Moreover, the data at both the measurement and actuator sides are dropped or successfully transmitted for the whole iteration together. The learning update laws (4) and (5) guarantee that the generated input sequence converges to the desired input both in the mean square sense and in the almost sure sense if the learning gain matrix $L_t$ is such that all eigenvalues of $L_tC_{t+1}B_t$ are positive, $t=0,1,\dots,N-1$. As a result, the desired reference $y_d(t)$ is asymptotically tracked according to the index (3).

Proof: From Lemma 1 and the above transformations, we have (9). Now we show that $\Delta U^r_k$ converges to zero both in the mean square sense and in the almost sure sense. Note that all eigenvalues of the matrix $LH$ are positive; thus, there exists a positive definite matrix $Q$ such that $(LH)^TQ+QLH=I$. Moreover, according to the form of $\Lambda_k$, we have $(LH)^T\Lambda_k^TQ+Q\Lambda_kLH\ge0$. Then, we define a weighted norm for $\Delta U^r_k$ as $\|\Delta U^r_k\|_Q^2\triangleq(\Delta U^r_k)^TQ\Delta U^r_k$, which can be regarded as a Lyapunov function. Now, we take the weighted norm on both sides of (9):
$$\begin{aligned}
\|\Delta U^r_{k+1}\|_Q^2&=\|\Delta U^r_k\|_Q^2+a_k^2\|\Lambda_kLH\Delta U^r_k\|_Q^2+a_k^2\|\Lambda_kL\epsilon_k\|_Q^2-a_k(\Delta U^r_k)^T\big((LH)^T\Lambda_k^TQ+Q\Lambda_kLH\big)\Delta U^r_k\\
&\quad+2a_k(\Delta U^r_k)^TQ\Lambda_kL\epsilon_k-2a_k^2(\Delta U^r_k)^T(LH)^T\Lambda_k^TQ\Lambda_kL\epsilon_k.
\end{aligned}\tag{15}$$
Define a new increasing σ-algebra $\mathcal F_k\triangleq\sigma\{x_j(t),w_j(t),v_j(t),1\le j\le k-1;\ \sigma_j(t),\gamma_j(t),1\le j\le k;\ t=0,\dots,N\}$. In view of (4) and (5), it is evident that $U^r_k\in\mathcal F_k$, that is, $U^r_k$ is adapted to $\mathcal F_k$. From A3, we have $\mathbb E\{\epsilon_k\mid\mathcal F_k\}=0$.
As a result,
$$\mathbb E\{2a_k(\Delta U^r_k)^TQ\Lambda_kL\epsilon_k\mid\mathcal F_k\}=0,\tag{16}$$
$$\mathbb E\{2a_k^2(\Delta U^r_k)^T(LH)^T\Lambda_k^TQ\Lambda_kL\epsilon_k\mid\mathcal F_k\}=0.\tag{17}$$
Therefore, it is straightforward to obtain
$$\mathbb E\{\|\Delta U^r_{k+1}\|_Q^2\mid\mathcal F_k\}\le\|\Delta U^r_k\|_Q^2+c_0a_k^2\big(\|\Delta U^r_k\|_Q^2+\mathbb E\{\|L\epsilon_k\|_Q^2\mid\mathcal F_k\}\big)-a_k(\Delta U^r_k)^T\mathbb E\{(LH)^T\Lambda_k^TQ+Q\Lambda_kLH\mid\mathcal F_k\}\Delta U^r_k,\tag{18}$$
where $c_0=\max\{\|LH\|^2,1\}$, as $\|\Lambda_k\|\le1$. Note that $\Lambda_k$ is a diagonal matrix and that the evolution of $\Lambda_k$ is an irreducible and ergodic Markov chain. Thus, there is a positive probability for $\Lambda_k$ to be the identity matrix $I_{pN\times pN}$. In addition, all the eigenvalues of the remaining $2^N-1$ possible diagonal matrices of $\Lambda_k$ are either 1 or 0. Consequently, there exists a constant $c_1>0$ such that
$$\mathbb E\{(LH)^T\Lambda_k^TQ+Q\Lambda_kLH\mid\mathcal F_k\}\ge c_1I.\tag{19}$$
By A3 we have $\mathbb E\{\|L\epsilon_k\|_Q^2\mid\mathcal F_k\}<c_2$, where $c_2>0$ is a suitable constant.

Now we move on to show the mean square convergence. Denote $\xi_k\triangleq\mathbb E\|\Delta U^r_k\|_Q^2$. Then, taking the mathematical expectation of both sides of (18) and using (19) lead to
$$\xi_{k+1}\le(1-c_1c_3a_k)\xi_k+c_0a_k^2(c_2+\xi_k),$$
where $c_3$ is a positive constant such that $I\ge c_3Q$. Then, by Lemma 2, we have $\mathbb E\|\Delta U^r_k\|_Q^2\to0$, implying that $\mathbb E\|\Delta U^r_k\|^2\to0$. That is, the zero-error convergence of $\Delta U^r_k$ in the mean square sense is proved.

Next, we proceed to show the almost sure convergence of $\Delta U^r_k$. Denote $\eta_k\triangleq\|\Delta U^r_k\|_Q^2$. Substituting (19) into (18), we have
$$\mathbb E\{\eta_{k+1}\mid\mathcal F_k\}\le(1-c_1c_3a_k)\eta_k+c_0a_k^2\big(\mathbb E\{\|L\epsilon_k\|^2\mid\mathcal F_k\}+\eta_k\big)\le\eta_k+c_0a_k^2\big(\mathbb E\{\|L\epsilon_k\|^2\mid\mathcal F_k\}+\eta_k\big).\tag{20}$$
Note that the two terms on the right-hand side of the last inequality, i.e., $\eta_k$ and $c_0a_k^2(\mathbb E\{\|L\epsilon_k\|^2\mid\mathcal F_k\}+\eta_k)$, correspond to $X(n)$ and $Z(n)$ in Lemma 3, respectively. Moreover, it is evident that $\sum_{k=1}^\infty\mathbb E\big[c_0a_k^2\big(\mathbb E\{\|L\epsilon_k\|^2\mid\mathcal F_k\}+\eta_k\big)\big]=\sum_{k=1}^\infty c_0a_k^2\big(\mathbb E\|L\epsilon_k\|^2+\xi_k\big)<\infty$, where the convergence of $\xi_k$ is used. In other words, (13) and (14) in Lemma 3 are fulfilled. Therefore, it follows that $\eta_k=\|\Delta U^r_k\|_Q^2$ converges almost surely as $k\to\infty$. On the other hand, we have shown that $\Delta U^r_k$ converges to zero in mean square; thus, $\Delta U^r_k$ converges to zero almost surely by probability theory. The proof is completed.

Remark 5: In this technical note, the data dropouts are modeled by random variables subject to Bernoulli distributions, which is a widely used model in this research area. The major reason for this assumption is to allow successive data dropouts of arbitrary length. In addition, this assumption also helps us to make an explicit convergence proof for the general data dropout problem. Note that the critical technique for establishing convergence is the Markov property of the

sample path behavior; thus, the proposed method given in this technical note can be applied to the Markovian data dropout case [23], [24]. On the other hand, the inherent update mechanism is the occurrence of renewals, which implies that the essential convergence of the proposed algorithms only requires that the data not be completely lost for each time instant.

Remark 6: If no noise is involved in the system, that is, both $w_k(t)$ and $v_k(t)$ are eliminated, then the decreasing gain $a_k$ can be removed from the algorithms. In this case, the recursion of the input error (9) reduces to
$$\Delta U^r_{k+1}=\Delta U^r_k-\Lambda_kLH\Delta U^r_k,\tag{21}$$
and an exponential convergence speed can then be obtained. Moreover, the design of the learning gain matrix $L_t$ differs with or without $a_k$. For the case with $a_k$, the condition on $L_t$ is that all eigenvalues of $L_tC_{t+1}B_t$ are positive real numbers. Roughly speaking, this condition can be relaxed to requiring that all eigenvalues of $L_tC_{t+1}B_t$ have positive real parts. On the other hand, for the case without $a_k$, the matrix $L_t$ should be such that the spectral radius of $I-L_tC_{t+1}B_t$ is less than one to ensure convergence. The latter design condition can be derived following the traditional contraction mapping method. Therefore, the introduction of the decreasing sequence $\{a_k\}$ also relaxes the design range of $L_t$.

C. Discussions on Convergence Speed

In this subsection, we give a brief description of the convergence speed of the proposed algorithms. As a matter of fact, the convergence speed depends on two individual factors, namely, the renewal frequency and the decreasing gain sequence $\{a_k\}$. The former reflects the influence of data dropouts on ILC, while the latter is a design factor originating from the stochastic approximation algorithm. In the traditional ILC problem, where no data dropout occurs, the convergence speed is determined only by the designed decreasing gain $a_k$. Now let us check the influence of data dropouts on the convergence speed. To this end, we give an explicit description of the renewal frequency. Recalling Fig. 2 and its transition matrix (6), the associated stationary distribution $\pi$ of the Markov chain can be calculated from $\pi P=\pi$ and is given as
$$\pi=\left[\frac{\bar\gamma}{\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma},\ \frac{\bar\sigma-\bar\sigma\bar\gamma}{\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma}\right].\tag{22}$$
Note that a renewal can occur both in the state of synchronization and in that of asynchronization. From Fig. 2, it is observed that the probability of the occurrence of a renewal in the state of synchronization is $\bar\sigma\bar\gamma$, while the probability in the state of asynchronization is $\bar\gamma$. Therefore, the probability of a renewal along the iteration axis can be calculated as
$$\mathbb P(\text{renewal})=\frac{\bar\gamma}{\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma}\,\bar\sigma\bar\gamma+\frac{\bar\sigma-\bar\sigma\bar\gamma}{\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma}\,\bar\gamma=\frac{\bar\sigma\bar\gamma}{\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma}.\tag{23}$$
In this technical note, for clarity of expression, we simply assume that the probability distributions for different time instants are the same. Thus, the above probability of renewal is the average over the whole iteration. This probability describes the renewal frequency of the learning algorithms in a data dropout environment. It is noticed that the probability is determined by the sum and the product of the successful transmission probabilities at both sides, i.e., $\bar\sigma+\bar\gamma$ and $\bar\sigma\bar\gamma$. As a consequence, the renewal frequency is determined neither by the worst side alone nor by the simple sum of both sides. Two facts are observed as follows.

1) Define the function $f(\bar\sigma,\bar\gamma)=\mathbb P(\text{renewal})$. Evidently, $f(\bar\sigma,\bar\gamma)=f(\bar\gamma,\bar\sigma)$. Moreover, through simple calculations, one has
$$\frac{\partial f(\bar\sigma,\bar\gamma)}{\partial\bar\sigma}=\frac{\bar\gamma^2}{(\bar\gamma+\bar\sigma-\bar\sigma\bar\gamma)^2}>0.$$
This condition means that a larger successful transmission rate corresponds to an increased number of renewals and, thus, a faster convergence speed.

2) It is well known that
$$\bar\sigma\bar\gamma\le\left(\frac{\bar\sigma+\bar\gamma}{2}\right)^2,$$
where the equality holds if and only if $\bar\sigma=\bar\gamma$. This implies that, when the sum $\bar\sigma+\bar\gamma$ is fixed, the closer $\bar\sigma$ approaches $\bar\gamma$, the larger the product $\bar\sigma\bar\gamma$ is, and so is the probability $\mathbb P(\text{renewal})$. In other words, the convergence speed increases.

IV. ILLUSTRATIVE SIMULATIONS

In this section, we apply the proposed algorithms to a permanent magnet linear motor (PMLM), which is described by the following discretized model [25]:
$$\begin{aligned}
x(t+1)&=x(t)+v(t)\Delta+\varepsilon_1(t+1),\\
v(t+1)&=v(t)-\frac{k_1^2\psi_f^2\Delta}{Rm}\,v(t)+\frac{k_2\psi_f\Delta}{Rm}\,u(t)+\varepsilon_2(t+1),\\
y(t)&=v(t)+\epsilon(t),
\end{aligned}$$
where $x$ and $v$ denote the motor position and rotor velocity, $\Delta=10$ ms is the sampling time interval, $R=8.6\ \Omega$ is the resistance of the stator, $m$ is the rotor mass, $\psi_f=0.35$ Wb is the flux linkage, and $k_1=\pi/\tau$ and $k_2=1.5\pi/\tau$, where $\tau$ is the pole pitch. The stochastic noises $\varepsilon_1(t)$, $\varepsilon_2(t)$, and $\epsilon(t)$ obey the zero-mean Gaussian distribution $\mathcal N(0,\sigma^2)$.

In this simulation, we set the whole iteration length to 1 s, i.e., $N=100$. The desired reference is $y_d(t)=\frac13[\sin(t/20)+1-\cos(3t/20)]$, $0\le t\le100$. The initial state satisfies A2. The control input for the first iteration is simply set to 0. The learning gain is $L_t=50$, and the decreasing sequence is set to $a_k=1/k$. The algorithms are run for 150 iterations.

The general algorithms (4) and (5) are used. The random variables $\gamma_k(t)$ and $\sigma_k(t)$ for data dropouts are defined separately for different time instants rather than as a unified variable for the entire iteration. We introduce the data dropout rate (DDR) as the probability $\mathbb P(\sigma_k(t)=0)$ or $\mathbb P(\gamma_k(t)=0)$. In other words, the DDR denotes the percentage of lost packages over the total packages. For simplicity, the DDRs at the measurement and actuator sides are equal in the following. In this example, four cases are simulated, namely, DDR = 10%, 20%, 30%, and 40%. The averaged tracking error profiles along the iteration axis for all cases are shown in Fig. 3, where the averaged tracking error is defined as $\bar e_k\triangleq\frac1N\sum_{t=1}^N|e_k(t)|$. As can be seen, the convergence speed slows as the DDR increases.

As shown in Theorem 1, the condition on the learning gain $L_t$ is that all eigenvalues of $L_tC_{t+1}B_t$ are positive. Moreover, a faster convergence speed can be achieved when $L_t$ is designed such that the eigenvalues have larger magnitudes; however, such a design would lead to poor transient performance, such as overshoot, before convergence. Thus, there is a trade-off between the convergence speed and the transient performance.

If no noise is involved in the system, i.e., the noises $w_k(t)$ and $v_k(t)$ are removed from system (1), then we can also delete the decreasing sequence $a_k$ from the update law (4), as mentioned in Remark 2. In this case, an exponential convergence speed is achieved, as shown in Fig. 4, where the learning gain is selected as $L_t=8$ because the previous value does not satisfy the condition given in Remark 6.
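The DDR orderings discussed here can be checked directly against (22) and (23); below is a small computation of the renewal probability for a few DDR pairs (a sketch, with $\bar\sigma=1-\mathrm{DDR1}$ and $\bar\gamma=1-\mathrm{DDR2}$; the pairs are chosen for illustration):

def renewal_prob(ddr1, ddr2):
    # Equation (23): P(renewal) = sigma*gamma / (gamma + sigma - sigma*gamma),
    # with sigma = 1 - ddr1 (measurement side) and gamma = 1 - ddr2 (actuator side).
    s, g = 1.0 - ddr1, 1.0 - ddr2
    return s * g / (g + s - s * g)

# For a fixed total DDR of 60%, the symmetric split maximizes the renewal
# probability, consistent with fact 2) above and with the observation that
# the DDR1 = DDR2 curve converges fastest.
for d1, d2 in [(0.30, 0.30), (0.20, 0.40), (0.10, 0.50)]:
    print(d1, d2, round(renewal_prob(d1, d2), 4))   # 0.5385, 0.5217, 0.4737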

Fig. 3. Tracking error profiles along the iteration axis with different DDRs: noisy case (DDR1: DDR at the measurement side; DDR2: DDR at the actuator side).

Fig. 4. Tracking error profiles along the iteration axis with different DDRs: noise-free case (DDR1: DDR at the measurement side; DDR2: DDR at the actuator side).

Fig. 5. Input sample paths along the iteration axis with different DDRs. (a) With the $a_k$ case. (b) Without the $a_k$ case.

In Fig. 4, all profiles are approximately straight lines in the logarithmic coordinate, which implies the exponential convergence property. In addition, Fig. 4 also verifies the relationship between the convergence speed and the DDR. For the above three lines, the sums of the DDRs at both sides are identical. It can be found that the fastest convergence speed belongs to the case DDR1 = DDR2. Moreover, although the worst DDR of the fourth line is 40%, it still behaves faster than the third line, where the worst DDR is 30%. These observations coincide with Subsection III.C.

To demonstrate the effect of the decreasing sequence $a_k$ for stochastic systems, we display the input profiles for (4) with and without $a_k$ in Fig. 5. As can be seen, the introduction of $a_k$ enables a stable convergence of the input, while the input keeps fluctuating if the sequence is removed from the update law. This verifies the necessity of the decreasing term in algorithms for stochastic systems. However, it should be pointed out that the decreasing sequence may make the learning controller unsuitable if large changes occur to the desired reference after several iterations. As a consequence, the design of the learning laws depends on the practical application requirements.

V. CONCLUSION

ILC under general data dropout environments is explored in this technical note. The data dropouts are allowed to occur randomly at both the measurement and actuator sides. As a result, the control update process consists of two parts, i.e., the computed input and the real input. A novel convergence analysis framework is proposed. To be specific, the update process is first proved to be a Markov chain by directly analyzing its sample path behavior. Then, convergence in both the mean square and almost sure senses is strictly established. In addition, the technical note also demonstrates the effectiveness and robustness of the conventional P-type update law against random factors.

REFERENCES

[1] H.-S. Ahn, Y. Q. Chen, and K. L. Moore, "Iterative learning control: Survey and categorization from 1998 to 2004," IEEE Trans. Syst., Man, Cybern. C, vol. 37, no. 6, 2007.
[2] D. Shen and Y. Wang, "Survey on stochastic iterative learning control," J. Process Control, vol. 24, no. 12, 2014.
[3] D. Huang, J.-X. Xu, V. Venkataramanan, and T. C. T. Huynh, "High performance tracking of piezoelectric positioning stage using current-cycle iterative learning control with gain scheduling," IEEE Trans. Ind. Electron., vol. 61, no. 2, 2014.
[4] X. Li, Q. Ren, and J.-X. Xu, "Precise speed tracking control of a robotic fish via iterative learning control," IEEE Trans. Ind. Electron., vol. 63, no. 4, 2016.

[5] H. S. Ahn, Y. Q. Chen, and K. L. Moore, "Intermittent iterative learning control," in Proc. IEEE Int. Symp. Intelligent Control.
[6] H. S. Ahn, K. L. Moore, and Y. Q. Chen, "Discrete-time intermittent iterative learning controller with independent data dropouts," in Proc. IFAC World Congr.
[7] H. S. Ahn, K. L. Moore, and Y. Q. Chen, "Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems," in Proc. IEEE Int. Conf. Control, Automation, Robotics, and Vision, 2008.
[8] X. Bu, Z.-S. Hou, and F. Yu, "Stability of first and high order iterative learning control with data dropouts," Int. J. Control, Automation, Syst., vol. 9, no. 5, 2011.
[9] X. Bu, Z.-S. Hou, F. Yu, and F. Wang, "H-infinity iterative learning controller design for a class of discrete-time systems with data dropouts," Int. J. Syst. Sci., vol. 45, no. 9, 2014.
[10] C. Liu, J.-X. Xu, and J. Wu, "Iterative learning control for remote control systems with communication delay and data dropout," Math. Problems in Eng., pp. 1–14.
[11] D. Shen and Y. Wang, "Iterative learning control for networked stochastic systems with random packet losses," Int. J. Control, vol. 88, no. 5, 2015.
[12] D. Shen and Y. Wang, "ILC for networked nonlinear systems with unknown control direction through random lossy channel," Syst. Control Lett., vol. 77, 2015.
[13] X. Bu, F. Yu, Z.-S. Hou, and F. Wang, "Iterative learning control for a class of nonlinear systems with random packet losses," Nonlinear Anal., Real World Appl., vol. 14, no. 1, 2013.
[14] Y.-J. Pan, H. J. Marquez, T. Chen, and L. Sheng, "Effects of network communications on a class of learning controlled non-linear systems," Int. J. Syst. Sci., vol. 40, no. 7, 2009.
[15] L.-X. Huang and Y. Fang, "Convergence analysis of wireless remote iterative learning control systems with dropout compensation," Math. Problems in Eng., pp. 1–9.
[16] J. Liu and X. Ruan, "Networked iterative learning control approach for nonlinear systems with random communication delay," Int. J. Syst. Sci., vol. 47, no. 16, 2016.
[17] J. Emelianova, P. Pakshin, K. Galkowski, and E. Rogers, "Stability of nonlinear discrete repetitive processes with Markovian switching," Syst. Control Lett., vol. 75, 2015.
[18] A. Benveniste, M. Métivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. New York: Springer-Verlag, 1990.
[19] P. E. Caines, Linear Stochastic Systems. New York: Wiley, 1988.
[20] S. S. Saab, "A discrete-time stochastic learning control algorithm," IEEE Trans. Autom. Control, vol. 46, no. 6, 2001.
[21] S. S. Saab, "Selection of the learning gain matrix of an iterative learning control algorithm in presence of measurement noise," IEEE Trans. Autom. Control, vol. 50, no. 11, 2005.
[22] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Trans. Autom. Control, vol. 31, no. 9, 1986.
[23] K. You and L. Xie, "Minimum data rate for mean square stabilizability of linear systems with Markovian packet losses," IEEE Trans. Autom. Control, vol. 56, no. 4, 2011.
[24] Y. Shi and B. Yu, "Output feedback stabilization of networked control systems with random delays modeled by Markov chains," IEEE Trans. Autom. Control, vol. 54, no. 7, 2009.
[25] W. Zhou, M. Yu, and D. Huang, "A high-order internal model based iterative learning control scheme for discrete linear time-varying systems," Int. J. Automation and Computing, vol. 12, no. 3, 2015.

IMA Journal of Mathematical Control and Information (2017) 34, doi:10.1093/imamci/dnw031. Advance Access Publication on June 28, 2016.

Zero-error convergence of iterative learning control using quantized error information

Yun Xu and Dong Shen
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, People's Republic of China. Corresponding author: shendong@mail.buct.edu.cn

and Xuhui Bu
School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, People's Republic of China

[Received on 18 August 2015; revised on 6 March 2016; accepted on 19 May 2016]

An iterative learning control algorithm using quantized error information is proposed in this article for both linear and nonlinear systems. The actual output is first compared with the reference signal and then the corresponding error is quantized and transmitted. A logarithmic quantizer is used to guarantee an adaptive improvement of the tracking performance. The tracking error under this scheme is proved to converge to zero asymptotically. Illustrative examples verify the theoretical results.

Keywords: iterative learning control; quantized error information; zero-error convergence.

1. Introduction

Iterative learning control (ILC), first proposed in Arimoto et al. (1984), is an important branch of intelligent control methods, which aims to improve the tracking performance by successively correcting the input signal from iteration to iteration (Bristow et al., 2006; Ahn et al., 2007; Shen & Wang, 2014). It is inspired by the intuitive idea that one could learn from previous experiences and lessons when conducting some task repeatedly, so that he/she would do it better and better. Thus, ILC is suitable for systems that accomplish a given task over a fixed time interval repetitively. For such systems, the input signal for the current cycle can be formulated using the input and output information of past cycles as well as the tracking objective. Having developed for three decades, ILC has gained extensive developments both in theory and in applications. Different kinds of systems have been covered, such as networked control systems (Shen & Wang, 2015a,b), multi-agent systems (Meng et al., 2013, 2014a,b, 2015; Meng & Moore, 2016), switched systems (Bu et al., 2013), etc. Some potential industrial applications are also reported, such as wind turbines (Tutty et al., 2014), robot manipulators (Zhao et al., 2015), automated off-highway vehicles (Liu & Alleyne, 2014), etc.

Recently, more and more practical applications employ the networked control system scheme, where the plant and the controller are usually located at different sites and communicate with each other through wired/wireless networks. In such settings, ILC under random data dropout has attracted a lot of research (Bu et al., 2014; Shen & Wang, 2015a,b). Meanwhile, the multi-agent system is also a hot topic in the control society (Meng et al., 2015; Meng & Moore, 2016), where the data communication among agents is a common highlight. In both networked control systems and multi-agent systems, the transmission burden is a critical issue to be addressed for practical applications. In order to reduce the

transmission burden, quantization is a potential alternative. Quantized estimation and control have been studied in Curry (1970), and some excellent progress in identification based on quantized observations is reported in Wang et al. (2010). A survey on quantized nonlinear control is provided in Jiang & Liu (2013). Brockett & Liberzon (2000) and Fagnani & Zampieri (2004) gave some results on quantized feedback stabilization problems, where Brockett & Liberzon (2000) considered a simple uniform quantizer with saturation, while Fagnani & Zampieri (2004) addressed both the uniform quantizer and the nested quantizer and made efforts on the tradeoff between quantization complexity and system performance. However, in the ILC field, no paper has been found on quantized control except Bu et al. (2015). Bu et al. made the first attempt on the ILC problem in Bu et al. (2015), where the output measurements were quantized by a logarithmic quantizer and fed to the controller for updating the ILC law. By using the sector bound method and the conventional contraction mapping method, it was shown that the tracking error converged to a small range whose upper bound depended on the quantization density. However, the tracking error also depended on the target value, which can be seen from the expression of the upper bound. That is, the larger the output measurement is, the larger the final tracking error bound is. This is natural because of the definition of the quantizer. Thus, in order to get a better tracking performance for a large target value, the quantizer density should be much greater. Nevertheless, is there any possible way to achieve zero-error convergence of ILC with quantized information? This is the problem that the paper addresses.

In this paper, we only consider the case where the transmission from the system output to the controller is quantized, in order to save network bandwidth. The reason is to make our idea more intuitive, and the general case is left for further study. Motivated by the basic principle of ILC, which is learning and improving iteration by iteration for a specified reference, we make an alteration to the conventional control implementation. To be specific, we first transmit the reference to the system before running the system for a certain task. Then, the system makes a comparison between its output and the given reference and quantizes the error locally. It is the quantized error that is transmitted back to the controller for successive updating. As one could see in the remainder of this paper, the quantization error reduces adaptively as the tracking error reduces, even if a sparse logarithmic quantizer is adopted. Moreover, the delivery of the desired reference might be a controversial point of our scheme. However, in the current implementation, the network from the controller to the plant is assumed to work well. In other words, the input signal is well delivered. Therefore, we could use this network to transmit the accurate reference.

The rest of the paper is arranged as follows: Section 2 provides the system formulation and problem statements; the main results and analysis are given in Section 3; the extension to nonlinear systems is shown in Section 4; Section 5 provides some illustrative simulations to verify the theoretical analysis; and Section 6 concludes this paper.

2. Problem formulation

Consider the following linear discrete-time system

$$x_k(t+1) = A x_k(t) + B u_k(t), \quad y_k(t) = C x_k(t), \qquad (2.1)$$

where $k = 1, 2, \ldots$ denotes the number of different iterations and $t = 0, 1, \ldots, N$ denotes different time instances in an iteration. Here $N$ is the iteration length. $x_k(t)$, $u_k(t)$ and $y_k(t)$ are the state, input and output, respectively. $A$, $B$ and $C$ are matrices with appropriate dimensions. Without loss of any generality, it is assumed that $CB$ is of full-column rank.

The reference is denoted by $y_d(t)$, $t \in \{0, 1, \ldots, N\}$. The control objective is to find an input sequence $\{u_k(t)\}$ such that the output $y_k(t)$ converges to $y_d(t)$ as $k \to \infty$, $\forall t$. For further analysis, the following assumptions are needed.

Assumption 2.1 The reference $y_d(t)$ is realizable, i.e., there is a unique input $u_d(t)$ such that

$$x_d(t+1) = A x_d(t) + B u_d(t), \quad y_d(t) = C x_d(t), \qquad (2.2)$$

with a suitable initial state $x_d(0)$.

Assumption 2.2 The identical initial condition is satisfied, i.e.,

$$x_k(0) = x_d(0), \qquad (2.3)$$

for all iterations, where $x_d(0)$ is the desired initial state.

In this paper, for any specified tracking reference, we first transmit it to the system before operating. Then the tracking error is generated, quantized and transmitted back to the controller. In other words, the ILC law is

$$u_{k+1}(t) = u_k(t) + L\, Q\big(y_d(t+1) - y_k(t+1)\big), \qquad (2.4)$$

where $L$ is the learning gain matrix and $Q(\cdot)$ is a selected quantizer. In this paper, a logarithmic quantizer as in Bu et al. (2015) and Elia & Mitter (2001) is adopted,

$$\mathcal{U} = \{\pm z_i : z_i = \mu^i z_0,\ i = 0, \pm 1, \pm 2, \ldots\} \cup \{0\}, \quad 0 < \mu < 1, \quad z_0 > 0, \qquad (2.5)$$

where $\mu$ is associated with the quantization density. The associated quantizer $Q(\cdot)$ is given as

$$Q(v) = \begin{cases} z_i, & \text{if } \frac{1}{1+\zeta} z_i < v \le \frac{1}{1-\zeta} z_i, \\ 0, & \text{if } v = 0, \\ -Q(-v), & \text{if } v < 0, \end{cases} \qquad (2.6)$$

with $\zeta = (1-\mu)/(1+\mu)$. It is evident that the quantizer $Q(\cdot)$ in (2.6) is symmetric and time-invariant.

Denote the tracking error $e_k(t) = y_d(t) - y_k(t)$. In Fu & Xie (2005), a sector bound method is proposed to deal with the quantization error. In this article, for a given quantization density $\mu$, we have

$$Q(e_k(t)) - e_k(t) = \Delta_k(t)\, e_k(t), \qquad (2.7)$$

where $|\Delta_k(t)| \le \zeta$ and $\zeta = (1-\mu)/(1+\mu)$.
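For concreteness, the following is a minimal Python sketch of the logarithmic quantizer (2.5)-(2.6); the function name `log_quantize` and its default parameters are ours, and the snippet simply illustrates the definitions above together with the sector bound (2.7).

```python
import numpy as np

def log_quantize(v, mu=0.85, z0=2.0):
    """Logarithmic quantizer Q(.) of (2.5)-(2.6).

    Levels are {±z_i : z_i = mu^i * z0, i = 0, ±1, ±2, ...} ∪ {0}; a positive v
    is mapped to the level z_i whose sector (z_i/(1+zeta), z_i/(1-zeta)]
    contains it, where zeta = (1 - mu)/(1 + mu).
    """
    if v == 0:
        return 0.0
    if v < 0:
        return -log_quantize(-v, mu, z0)
    zeta = (1 - mu) / (1 + mu)
    # start from the level nearest to v on a log scale, then adjust the index
    i = np.round(np.log(v / z0) / np.log(mu))
    zi = mu**i * z0
    while v > zi / (1 - zeta):   # v above this sector: need a larger level (smaller i)
        i -= 1; zi = mu**i * z0
    while v <= zi / (1 + zeta):  # v below this sector: need a smaller level (larger i)
        i += 1; zi = mu**i * z0
    return zi

# sector-bound check (2.7): |Q(v) - v| <= zeta * |v|
zeta = (1 - 0.85) / (1 + 0.85)
for v in [3.7, 0.05, -12.0]:
    assert abs(log_quantize(v) - v) <= zeta * abs(v) + 1e-12
```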

Remark 1 One may ask whether the quantization error can be regarded as a disturbance or uncertainty, so that robust control analysis techniques can then be applied to derive some results. In our opinion, this is an alternative approach to deal with the quantization error. However, this formulation of the quantization error would lead to conservative results regarding the tracking error's convergence. That is, the magnitude is not clear enough if the error is modelled as a disturbance or uncertainty. Meanwhile, following the robust control analysis techniques, one could only prove that the tracking error converges into a bounded range. In this paper, we would like to derive a more insightful convergence result.

3. Main results

In this section, the zero-error convergence of (2.4) is given in the following theorem.

Theorem 3.1 Consider system (2.1) and update law (2.4) with quantized error, and assume A2.1-A2.2 hold. If $L$ is designed such that

$$\|I - LCB\| + \zeta \|LCB\| \le \lambda < 1, \qquad (3.1)$$

then the system tracking error converges to zero as $k \to \infty$.

Proof. Denote $\delta u_k(t) = u_d(t) - u_k(t)$, $\delta x_k(t) = x_d(t) - x_k(t)$. Subtracting both sides of (2.4) from $u_d(t)$ and combining (2.7), we have

$$\delta u_{k+1}(t) = \delta u_k(t) - L Q(e_k(t+1)) = \delta u_k(t) - L\big(e_k(t+1) + \Delta_k(t+1)\, e_k(t+1)\big) = \delta u_k(t) - L\big(C\delta x_k(t+1) + \Delta_k(t+1)\, C\delta x_k(t+1)\big). \qquad (3.2)$$

From system (2.1) and A2.1, one has

$$\delta x_k(t+1) = x_d(t+1) - x_k(t+1) = A x_d(t) + B u_d(t) - A x_k(t) - B u_k(t) = A\delta x_k(t) + B\delta u_k(t). \qquad (3.3)$$

Thus, it is evident that

$$\delta u_{k+1}(t) = (I - LCB)\delta u_k(t) - \Delta_k(t+1)\, LCB\, \delta u_k(t) - \big(1 + \Delta_k(t+1)\big)\, LCA\, \delta x_k(t). \qquad (3.4)$$

Taking norms of both sides of the last equation leads to

$$\|\delta u_{k+1}(t)\| \le \|I - LCB\| \|\delta u_k(t)\| + \zeta \|LCB\| \|\delta u_k(t)\| + (1+\zeta)\|LCA\| \|\delta x_k(t)\| \le \lambda \|\delta u_k(t)\| + \theta \|\delta x_k(t)\|, \qquad (3.5)$$

where $\|I - LCB\| + \zeta\|LCB\| \le \lambda$ and $(1+\zeta)\|LCA\| = \theta$. On the other hand, taking norms of (3.3) yields

$$\|\delta x_k(t+1)\| \le \|A\| \|\delta x_k(t)\| + \|B\| \|\delta u_k(t)\|. \qquad (3.6)$$

Recursively, we have

$$\|\delta x_k(t)\| \le \|A\| \|\delta x_k(t-1)\| + \|B\| \|\delta u_k(t-1)\| \le \|A\|^2 \|\delta x_k(t-2)\| + \|A\| \|B\| \|\delta u_k(t-2)\| + \|B\| \|\delta u_k(t-1)\| \le \cdots \le \sum_{i=0}^{t-1} \|A\|^{t-1-i} \|B\| \|\delta u_k(i)\|, \qquad (3.7)$$

where the last inequality uses assumption A2.2. Combining (3.5) and (3.7), we have

$$\|\delta u_{k+1}(t)\| \le \lambda \|\delta u_k(t)\| + \theta \sum_{i=0}^{t-1} \|A\|^{t-1-i} \|B\| \|\delta u_k(i)\|. \qquad (3.8)$$

Specifically,

$$\|\delta u_{k+1}(0)\| \le \lambda \|\delta u_k(0)\|,$$
$$\|\delta u_{k+1}(1)\| \le \lambda \|\delta u_k(1)\| + \theta \|B\| \|\delta u_k(0)\|,$$
$$\vdots$$
$$\|\delta u_{k+1}(N-1)\| \le \lambda \|\delta u_k(N-1)\| + \theta \sum_{i=0}^{N-2} \|A\|^{N-2-i} \|B\| \|\delta u_k(i)\|.$$

Denote $\delta U_k = [\|\delta u_k(0)\|, \|\delta u_k(1)\|, \ldots, \|\delta u_k(N-1)\|]^T$. It is observed that

$$\delta U_{k+1} \le \Gamma\, \delta U_k, \qquad (3.9)$$

where

$$\Gamma = \begin{pmatrix} \lambda & & & 0 \\ \theta\|B\| & \lambda & & \\ \vdots & & \ddots & \\ \theta\|A\|^{N-2}\|B\| & \theta\|A\|^{N-3}\|B\| & \cdots & \lambda \end{pmatrix}. \qquad (3.10)$$

Notice that $\Gamma$ is a lower triangular matrix with diagonal elements equal to $\lambda < 1$; thus $\delta U_k \to 0$, or equivalently, $\|\delta u_k(t)\| \to 0$, $\forall t$. Noticing (3.7) and the finiteness of the time interval, we have $\|\delta x_k(t)\| \to 0$, $\forall t$. This further yields that $\|e_k(t)\| \to 0$, $\forall t$. In other words, zero-error tracking performance of the system is obtained. This completes the proof.
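The contraction argument can also be checked numerically: $\Gamma$ in (3.10) is lower triangular with every diagonal entry equal to $\lambda < 1$, so its spectral radius is $\lambda$ and $\Gamma^k \to 0$, which forces $\delta U_k \to 0$. A small sketch, with hypothetical values for $\lambda$, $\theta$, $\|A\|$ and $\|B\|$:

```python
import numpy as np

# hypothetical constants satisfying the theorem's assumptions
lam, theta, nA, nB, N = 0.4, 0.6, 1.2, 1.0, 10

# build Gamma of (3.10): lower triangular, entry (i, j) = theta*nA**(i-1-j)*nB for j < i
Gamma = lam * np.eye(N)
for i in range(N):
    for j in range(i):
        Gamma[i, j] = theta * nA**(i - 1 - j) * nB

print(np.max(np.abs(np.linalg.eigvals(Gamma))))            # spectral radius = lam < 1
print(np.linalg.norm(np.linalg.matrix_power(Gamma, 200)))  # -> 0, so dU_k -> 0
```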

Remark 1 As can be seen from (2.7), the zero-error convergence of the tracking performance is guaranteed because the quantization error is bounded by the actual tracking error. Consequently, a larger tracking error means a larger quantization error. However, this is acceptable since the learning can be rough at this stage. As the tracking error converges to zero, the quantization error also reduces to zero and thus enables zero-error tracking as the iteration number goes to infinity.

Remark 2 Notice that the quantization error is bounded by the tracking error; thus much freedom is left for the design of the quantizer, which is characterized by $\zeta$. However, as explained in Bu et al. (2015), the quantizer index $\zeta$ and the learning gain matrix $L$ are coupled with each other in the convergence condition (3.1). Thus, the selection of the quantizer is not totally free.

Remark 3 Asymptotic convergence to zero is proved in the theorem. One may be interested in a monotonic convergence that ensures a good transient performance along iterations. Noticing (3.10), it is observed that a contraction on $\|\delta U_k\|$ depends on the property of $\Gamma$, such as $\|\Gamma\| < 1$, where $\|\delta U_k\|$ and $\|\Gamma\|$ denote consistent norms of the vector and matrix. Meanwhile, the quantization and the design of the learning matrix $L$ should satisfy $\|I - LCB\| + \zeta\|LCB\| < 1$ to guarantee such monotone convergence.

4. Extension to nonlinear systems

In this section, the following nonlinear system is considered:

$$x_k(t+1) = f(x_k(t)) + B(x_k(t))\, u_k(t), \quad y_k(t) = C x_k(t), \qquad (4.1)$$

where $f(x_k(t))$ and $B(x_k(t))$ are functions of $x_k(t)$. The following assumptions are needed for further analysis.

Assumption 4.1 The reference $y_d(t)$ is realizable, i.e., there is a unique input $u_d(t)$ such that

$$x_d(t+1) = f(x_d(t)) + B(x_d(t))\, u_d(t), \quad y_d(t) = C x_d(t), \qquad (4.2)$$

with a suitable initial state $x_d(0)$.

Assumption 4.2 The nonlinear functions $f(\cdot)$, $B(\cdot)$ satisfy the globally Lipschitz condition in their arguments, i.e., there exist $b_f$ and $b_B$ such that $\|f(x_1) - f(x_2)\| \le b_f \|x_1 - x_2\|$ and $\|B(x_1) - B(x_2)\| \le b_B \|x_1 - x_2\|$.

Now the theorem of zero-error convergence for the nonlinear system is given.

Theorem 4.1 Consider system (4.1) and update law (2.4) with quantized error, and assume A2.2, A4.1-A4.2 hold. If $L$ is designed such that

$$\|I - LCB(x_k(t))\| + \zeta \|LCB(x_k(t))\| \le \lambda < 1, \quad \forall k, t, \qquad (4.3)$$

then the system tracking error converges to zero as $k \to \infty$.

Proof. The proof follows similar steps to Theorem 3.1 with minor modifications. The derivation of (3.2) is still valid. Combining (4.1) and (4.2), we have

$$\delta x_k(t+1) = x_d(t+1) - x_k(t+1) = f(x_d(t)) - f(x_k(t)) + B(x_d(t)) u_d(t) - B(x_k(t)) u_k(t) = B(x_k(t))\delta u_k(t) + f(x_d(t)) - f(x_k(t)) + [B(x_d(t)) - B(x_k(t))] u_d(t). \qquad (4.4)$$

Substituting this equation into (3.2), we have

$$\delta u_{k+1}(t) = (I - LCB(x_k(t)))\delta u_k(t) - \Delta_k(t+1)\, LCB(x_k(t))\delta u_k(t) - (1+\Delta_k(t+1))\, LC[f(x_d(t)) - f(x_k(t))] - (1+\Delta_k(t+1))\, LC[B(x_d(t)) - B(x_k(t))] u_d(t). \qquad (4.5)$$

Taking norms of the last equation and using A4.2 lead to

$$\|\delta u_{k+1}(t)\| \le \big(\|I - LCB(x_k(t))\| + \zeta\|LCB(x_k(t))\|\big) \|\delta u_k(t)\| + (1+\zeta)\|LC\| b_f \|\delta x_k(t)\| + (1+\zeta)\|LC\| \|u_d(t)\| b_B \|\delta x_k(t)\| \le \lambda \|\delta u_k(t)\| + \eta \|\delta x_k(t)\|, \qquad (4.6)$$

where $\eta = (1+\zeta)\|LC\| b_f + (1+\zeta)\|LC\| \sup_t \|u_d(t)\|\, b_B$. On the other hand, taking norms of both sides of (4.4) and using A4.2, we have

$$\|\delta x_k(t+1)\| \le b_B \|\delta u_k(t)\| + (b_f + b_B \|u_d(t)\|) \|\delta x_k(t)\|,$$

and recursively, we have

$$\|\delta x_k(t)\| \le \sum_{i=0}^{t-1} \rho^{t-1-i}\, b_B\, \|\delta u_k(i)\|, \qquad (4.7)$$

where $\rho = b_f + b_B \sup_t \|u_d(t)\|$ and A2.2 is used. Therefore, substituting (4.7) into (4.6) yields

$$\|\delta u_{k+1}(t)\| \le \lambda \|\delta u_k(t)\| + \eta \sum_{i=0}^{t-1} \rho^{t-1-i}\, b_B\, \|\delta u_k(i)\|, \qquad (4.8)$$

which can be lifted as

$$\delta U_{k+1} \le \bar\Gamma\, \delta U_k, \qquad (4.9)$$

where

$$\bar\Gamma = \begin{pmatrix} \lambda & & & 0 \\ \eta b_B & \lambda & & \\ \vdots & & \ddots & \\ \eta\rho^{N-2} b_B & \eta\rho^{N-3} b_B & \cdots & \lambda \end{pmatrix}.$$

Therefore, it is evident that $\|\delta u_k(t)\| \to 0$, $\forall t$, and further $\|\delta x_k(t)\| \to 0$ and $\|e_k(t)\| \to 0$, $\forall t$. That is, zero-error tracking performance of the system is obtained. This completes the proof.

5. Illustrative simulations

In order to make the improvements more visible, the examples of Bu et al. (2015) are taken into account in this section.

5.1 Linear system

The following linear system is considered in Bu et al. (2015):

$$x_k(t+1) = \begin{pmatrix} \cdot & \cdot \\ 1 & 0 \end{pmatrix} x_k(t) + \begin{pmatrix} 0.5 \\ 1 \end{pmatrix} u_k(t), \quad y_k(t) = (1\ \ 0.5)\, x_k(t). \qquad (5.1)$$

The desired reference is given as $y_d(t) = \sin(8t/50) + \sin(4t/50)$, $t \in [0, 100]$. The initial states are set to $x_k(0) = x_d(0) = 0$ for all $k$, and the initial input is simply chosen as $u_0(t) = 0$, $\forall t$. The parameters in the quantizer are given as $z_0 = 2$ and $\mu = 0.85$, so $\zeta = 0.0811$. In addition, the learning gain $L$ is selected to be 0.8 such that $\|I - LCB\| + \zeta\|LCB\| = 0.264 < 1$. The algorithm is performed for 20 iterations. The tracking performances at the 2nd, 5th and 20th iterations are shown in Fig. 1, where one could find that the tracking at the fifth iteration is already good and the output at the 20th iteration almost coincides with the reference. The tracking error along the iteration axis is shown in Fig. 2. The algorithm proposed in Bu et al. (2015) is also simulated to make a comparison. To be specific, the solid line denotes the algorithm provided in this paper, where the quantizer is applied to the tracking error, while the dashed line denotes the one of Bu et al. (2015), where the quantizer is applied to the actual output. As can be seen from Fig. 2, the maximum tracking error of the algorithm in Bu et al. (2015) cannot reduce to zero due to the quantization error. However, the update law in this paper ensures zero-error convergence.
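The linear example can be reproduced with a short script. The first row of the state matrix in (5.1) is not legible in this copy, so a hypothetical stable $A$ with the visible second row $(1\ \ 0)$ is used below; note that condition (3.1) only involves $CB = 1$, so the quoted constant $0.264$ and the convergence do not depend on this assumption. `log_quantize` is the sketch given after (2.7); everything else follows the stated setup.

```python
import numpy as np
# log_quantize: the logarithmic quantizer sketch defined after (2.7)

A = np.array([[0.2, 0.3], [1.0, 0.0]])   # first row hypothetical; second row from (5.1)
B = np.array([[0.5], [1.0]])
C = np.array([[1.0, 0.5]])
L_gain, mu, z0, N, iters = 0.8, 0.85, 2.0, 100, 20

zeta = (1 - mu) / (1 + mu)
CB = (C @ B).item()
print(abs(1 - L_gain * CB) + zeta * abs(L_gain * CB))  # 0.264... < 1, condition (3.1)

t = np.arange(N + 1)
yd = np.sin(8 * t / 50) + np.sin(4 * t / 50)
u = np.zeros(N)                                        # u_0(t) = 0

for k in range(iters):
    # run one iteration from x(0) = x_d(0) = 0
    x = np.zeros((2, 1)); y = np.zeros(N + 1)
    for s in range(N):
        y[s] = (C @ x).item()
        x = A @ x + B * u[s]
    y[N] = (C @ x).item()
    e = yd - y
    # error-quantized P-type update (2.4): u_{k+1}(t) = u_k(t) + L * Q(e_k(t+1))
    u = u + L_gain * np.array([log_quantize(e[s + 1], mu, z0) for s in range(N)])
    print(k, np.max(np.abs(e)))                        # maximal tracking error shrinks
```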

Fig. 1. Outputs at different iterations vs reference.

Fig. 2. Maximal tracking error along the iteration axis.

As explained before, a large reference value results in a large tracking error under Bu's update law since a logarithmic quantizer is adopted. This is shown in Fig. 3, where three references are considered, i.e., $y_d(t)$, $3y_d(t)$ and $5y_d(t)$. Obviously, as the scale of the reference is enlarged, the maximum tracking error also increases. However, as can be seen from Fig. 4, our algorithm always ensures that the tracking error converges to zero as the iteration number increases. In addition, we compute the quadratic sum of the input errors, i.e., the Euclidean norm of $\delta U_k$, and show it in Fig. 5. One could find from this plot that the summation of the input error over a whole iteration decreases monotonically along the iteration axis.

Fig. 3. Maximal tracking error for different references: output quantizer case.

Fig. 4. Maximal tracking error for different references: error quantizer case.

In order to see the effect of the quantization density on the tracking performance, we further simulate the example for different scales of density. That is, the density parameter $\mu$ is set as 0.95, 0.85, 0.75 and 0.65, respectively. The results are shown in Figs 6 and 7 for the error quantizer case and the output quantizer case, respectively. From these figures, two observations are given as follows. The first is that the larger the density is, the better the tracking performance is for both cases. The other is that different scales of quantization density have little influence in the error quantizer case, while they have great influence in the output quantizer case.

Fig. 5. Quadratic sum of input error.

Fig. 6. Maximal tracking error for different scales of density: error quantizer case.

5.2 Nonlinear system

Let us consider the following nonlinear system (Bu et al., 2015):

$$x_k(t+1) = 0.75\sin(x_k(t)) + 0.5\, u_k(t), \quad y_k(t) = 0.2\, x_k(t). \qquad (5.2)$$

Fig. 7. Maximal tracking error for different scales of density: output quantizer case.

Fig. 8. Outputs at different iterations vs reference.

The desired reference is $y_d(t) = \sin(3t/50) + 1 - \cos(t/50)$, $t \in [0, 200]$. The initial states are set to $x_k(0) = x_d(0) = 0$ for all $k$. The initial input is simply chosen as $u_0(t) = 0$, $\forall t$. The parameters in the quantizer are $z_0 = 5$ and $\mu = 0.9$, so $\zeta = 0.0526$. The learning gain is selected as $L = 5$, which leads to $\|I - LCB\| + \zeta\|LCB\| = 0.525 < 1$. The algorithm is also performed for 20 iterations. The tracking performances at the 2nd, 5th and 20th iterations are shown in Fig. 8, where the output at the 20th iteration is also satisfactory.
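As a quick arithmetic check of condition (4.3) for this example, note that $B(x) \equiv 0.5$ and $C = 0.2$ here, so $LCB$ reduces to the scalar $0.5$:

```python
# condition (4.3) for the nonlinear example (5.2): LCB = 5 * 0.2 * 0.5 = 0.5
mu, L_gain, C, B = 0.9, 5.0, 0.2, 0.5
zeta = (1 - mu) / (1 + mu)                             # ~0.0526
lam = abs(1 - L_gain * C * B) + zeta * abs(L_gain * C * B)
print(zeta, lam)                                       # lam ~0.526 < 1, matching the text up to rounding
```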

Fig. 9. Maximal tracking error along the iteration axis.

Fig. 10. Maximal tracking error for different references: output quantizer case.

A comparison of the maximum tracking error along the iteration axis is given in Fig. 9, where the solid and dashed lines denote the error quantizer and output quantizer cases, respectively. It is evident that using an error quantizer enables the tracking error to converge to zero. Similar to the linear system case, the maximum errors for different scales of references are provided in Figs 10 and 11 for the output quantizer case and the error quantizer case, respectively. The three references used in this example are $y_d(t)$, $2y_d(t)$ and $4y_d(t)$, respectively. From Fig. 10, a non-zero lower bound always exists in the output quantizer case and the tracking performance degrades as the reference scale increases. However, zero-error convergence is always guaranteed by the algorithm of this paper, as revealed in Fig. 11.

Fig. 11. Maximal tracking error for different references: error quantizer case.

Fig. 12. Quadratic sum of input error.

In addition, Fig. 12 shows the monotonically decreasing trend of the quadratic summation of the input error along the iteration axis. Similar to the linear system case, we also simulate for different scales of quantization density and show the results in Figs 13 and 14 for the error quantizer case and the output quantizer case, respectively. The observations are also similar to the linear system case.

Fig. 13. Maximal tracking error for different scales of density: error quantizer case.

Fig. 14. Maximal tracking error for different scales of density: output quantizer case.

6. Concluding remarks

In this paper, the ILC problem for discrete-time linear and nonlinear systems is discussed using quantized information. The reference is first transmitted to the plant and then compared with the actual output to derive the tracking error. The tracking error is quantized by a logarithmic quantizer and then transmitted back to the controller, which ensures adaptive learning for precise tracking performance. Using conventional contraction techniques, the tracking error is shown to converge strictly to zero as the iteration number goes to infinity. For further research, quantization of the input signal is of great interest.

Funding

National Natural Science Foundation of China; Beijing Natural Science Foundation.

References

Ahn, H. S., Chen, Y. Q. & Moore, K. L. (2007) Iterative learning control: survey and categorization from 1998 to 2004. IEEE Trans. Syst., Man, Cybern. C, 37.
Arimoto, S., Kawamura, S. & Miyazaki, F. (1984) Bettering operation of robots by learning. J. Robot. Syst., 1.
Bristow, D. A., Tharayil, M. & Alleyne, A. G. (2006) A survey of iterative learning control: a learning-based method for high-performance tracking control. IEEE Control Syst. Mag., 26.
Brockett, R. W. & Liberzon, D. (2000) Quantized feedback stabilization of linear systems. IEEE Trans. Autom. Control, 45.
Bu, X., Hou, Z., Yu, F. & Fu, Z. (2013) Iterative learning control for a class of non-linear switched systems. IET Control Theory Appl., 7.
Bu, X., Hou, Z., Yu, F. & Wang, F. (2014) H-infinity iterative learning controller design for a class of discrete-time systems with data dropouts. Int. J. Syst. Sci., 45.
Bu, X., Wang, T., Hou, Z. & Chi, R. (2015) Iterative learning control for discrete-time systems with quantised measurements. IET Control Theory Appl., 9.
Curry, R. E. (1970) Estimation and Control with Quantized Measurements. Cambridge, MA: MIT Press.
Elia, N. & Mitter, S. K. (2001) Stabilization of linear systems with limited information. IEEE Trans. Autom. Control, 46.
Fagnani, F. & Zampieri, S. (2004) Quantized stabilization of linear systems: complexity versus performance. IEEE Trans. Autom. Control, 49.
Fu, M. & Xie, L. (2005) The sector bound approach to quantized feedback control. IEEE Trans. Autom. Control, 50.
Jiang, Z.-P. & Liu, T.-F. (2013) Quantized nonlinear control: a survey. Acta Autom. Sin., 39.
Liu, N. & Alleyne, A. G. (2014) Iterative learning identification applied to automated off-highway vehicle. IEEE Trans. Control Syst. Technol., 22.
Meng, D., Jia, Y. & Du, J. (2013) Multi-agent iterative learning control with communication topologies dynamically changing in two directions. IET Control Theory Appl., 7.
Meng, D., Jia, Y. & Du, J. (2015) Robust consensus tracking control for multiagent systems with initial state shifts, disturbances, and switching topologies. IEEE Trans. Neural Netw. Learn. Syst., 26.
Meng, D., Jia, Y., Du, J. & Zhang, J. (2014a) High-precision formation control of nonlinear multi-agent systems with switching topologies: a learning approach. Int. J. Robust Nonlinear Control, 25.
Meng, D., Jia, Y., Du, J. & Zhang, J. (2014b) On iterative learning algorithms for the formation control of nonlinear multi-agent systems. Automatica, 50.
Meng, D. & Moore, K. L. (2016) Learning to cooperate: networks of formation agents with switching topologies. Automatica, 64.
Shen, D. & Wang, Y. (2014) Survey on stochastic iterative learning control. J. Process Control, 24.
Shen, D. & Wang, Y. (2015a) Iterative learning control for networked stochastic systems with random packet losses. Int. J. Control, 88.
Shen, D. & Wang, Y. (2015b) ILC for networked nonlinear systems with unknown control direction through random lossy channel. Syst. Control Lett., 77.
Tutty, O., Blackwell, M., Rogers, E. & Sandberg, R. (2014) Iterative learning control for improved aerodynamic load performance of wind turbines with smart rotors. IEEE Trans. Control Syst. Technol., 22.

Wang, L. Y., Yin, G., Zhang, J.-F. & Zhao, Y. (2010) System Identification with Quantized Observations, Theory and Applications. Boston: Birkhäuser.
Zhao, Y. M., Lin, Y., Xi, F. & Guo, S. (2015) Calibration-based iterative learning control for path tracking of industrial robots. IEEE Trans. Ind. Electron., 62.


Asian Journal of Control, Vol. 19, No. 5, September 2017. Published online 6 February 2017 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asjc.1481

STOCHASTIC POINT-TO-POINT ITERATIVE LEARNING CONTROL BASED ON STOCHASTIC APPROXIMATION

Yun Xu, Dong Shen, and Xiao-Dong Zhang

ABSTRACT

An iterative learning control algorithm with iteration-decreasing gain is proposed for stochastic point-to-point tracking systems. The almost sure convergence and asymptotic properties of the proposed recursive algorithm are strictly proved. The selection of the learning gain matrix is given. An illustrative example shows the effectiveness and asymptotic trajectory properties of the proposed approach.

Key Words: Iterative learning control, linear stochastic systems, point-to-point control, stochastic approximation.

I. INTRODUCTION

It is readily apparent that the performance of human motion tasks can be improved by repetition. This basic cognition motivates research into iterative learning control (ILC). ILC is a kind of optimization strategy that improves the tracking performance of a system which repeatedly completes some task over a fixed time interval. Many extensive studies have covered a large range of ILC topics, including the design of update laws, identical initialization conditions, robustness, optimization, transient behavior, and the combination of ILC with other control methods [1-9].

The standard ILC requires the system output to track the desired reference over the whole time interval [1-4]. However, in many practical applications such as pick-and-place robotic tasks, satellite positioning, and production line automation, only part of the reference needs to be accurately tracked, while the rest is left with a large degree of freedom. This type of ILC is called point-to-point ILC. As a special case, if only the terminal point is required to be tracked, it is termed terminal ILC [10-12].

Great efforts have been made towards the point-to-point ILC problem. In [13,14], the problem was solved by iteratively updating the reference between trials instead of the input profiles, and a strict convergence analysis was provided. The paper [14] also presented an alternative method for the point-to-point problem, where the control input was linearly parameterized in terms of basis functions that were constructed by system matrices and the parameters were updated according to specified tracking data. In addition, the continuous-time system case was addressed in [15] with a detailed comparison between experimental performance and theoretical results.

When considering a multiple-input multiple-output (MIMO) system, it is common in practice that only some components of the output vector are required to satisfy certain conditions. For example, consider a spatial motion control problem where the output position consists of three dimensions. We may impose a constraint only on the altitude but leave the other two dimensions free. This type of general point-to-point tracking problem was studied in [16] and [17] for linear and nonlinear systems, respectively. The paper [16] provided an extensive formulation and analysis of gradient-descent-based ILC and Newton-method-based ILC with various mixed constraints. Readers can also refer to [18,19] for more experimental results.

However, in all the above studies, no stochastic noises are taken into account. This observation motivates us to further consider the stochastic point-to-point ILC problem. The objective of this paper is to address ILC for stochastic point-to-point tracking systems with a general form of tracking reference as in [16,17]. Besides, we propose a stochastic approximation based solution to this new problem. It is worth pointing out that this paper is an extension of [20], where only the convergence was shown, while in this paper a low-computation algorithm is presented with a detailed analysis of convergence and asymptotic properties. In addition, [21] also presented a gain-varying update method similar to our paper. However, the gain matrix in [21] was the inverse of the system model and the iteration-varying gain was obtained by minimizing an objective function, while in this paper we apply the stochastic approximation technique and propose a simplified algorithm without using an inverse matrix.

Manuscript received November 12, 2015; revised October 12, 2016; accepted December 7, 2016. The authors are with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. Dong Shen is the corresponding author (shendong@mail.buct.edu.cn). This work is supported by the National Natural Science Foundation of China and the Beijing Natural Science Foundation.

The paper is arranged as follows. Section II provides the problem formulation; Section III gives the ILC algorithm together with its convergence and asymptotic normality analysis; Section IV provides an illustrative example to show the effectiveness; some concluding remarks are given in Section V.

Notation. $\mathbb{R}$ denotes the set of real numbers and $\mathbb{R}^m$ is the space of $m$-dimensional vectors. The superscript $T$ of a matrix denotes the transpose. $\mathcal{N}(0, Q)$ is the normal distribution with zero mean and covariance $Q$. For two sequences $\{a_k\}$ and $\{b_k\}$, we call $a_k = o(b_k)$ if $b_k \ge 0$ and $a_k/b_k \to 0$ as $k \to \infty$. $\|\cdot\|$ denotes the Euclidean norm of a vector or matrix.

II. PROBLEM FORMULATION

Consider the following LTV system:

$$x_k(t+1) = A_t x_k(t) + B_t u_k(t) + w_k(t+1), \quad y_k(t) = C_t x_k(t) + v_k(t), \qquad (1)$$

where the subscript $k$ denotes different iterations, $k = 1, 2, \ldots$, and $t$ denotes an arbitrary time in an iteration. Denote the length of one iteration as $N$. $x_k(t) \in \mathbb{R}^n$, $u_k(t) \in \mathbb{R}^p$, and $y_k(t) \in \mathbb{R}^q$ are the system state vector, input vector and output vector, respectively. The system matrices $A_t$, $B_t$ and $C_t$ are of appropriate dimensions. $w_k(t)$ and $v_k(t)$ are the system noise and measurement noise, respectively. In this paper, the noise is simply assumed to be zero-mean Gaussian white noise, i.e., with normal distribution. Besides, for different iterations and different time instances, the noise signals are uncorrelated. For any $t$, both $\{w_k(t), k = 1, 2, \ldots\}$ and $\{v_k(t), k = 1, 2, \ldots\}$ are independent and identically distributed sequences.

Assume that $C_{t+1} B_t$ is of full row rank. This condition implies that the relative degree is one. The high relative degree is an important issue in the ILC field [22,23]. The results of this paper can be extended to the high relative degree case with slight modifications similar to [22,23].

The input and output can be lifted as super-vectors $u_k = [u_k^T(0), u_k^T(1), \ldots, u_k^T(N-1)]^T \in \mathbb{R}^{pN}$ and $y_k = [y_k^T(1), y_k^T(2), \ldots, y_k^T(N)]^T \in \mathbb{R}^{qN}$. In addition, let

$$G = \begin{pmatrix} C^+B_0 & & & \\ C_2\Psi_{1,1}B_0 & C^+B_1 & & \\ \vdots & & \ddots & \\ C_N\Psi_{N-1,1}B_0 & \cdots & & C^+B_{N-1} \end{pmatrix},$$

where $C^+B_t \triangleq C_{t+1}B_t$, $\Psi_{j,i} \triangleq A_j A_{j-1} \cdots A_i$ (with $\Psi_{j,j+1} \triangleq I$), and the $(j,i)$ block of $G$ for $j > i$ is $C_j \Psi_{j-1,i+1} B_i$. Then we can rewrite (1) as

$$y_k = G u_k + y_0 + \varepsilon_k, \qquad (2)$$

where $y_0$ is the response to the initial state, given by $y_0 = [(C_1\Psi_{0,0})^T, (C_2\Psi_{1,0})^T, \ldots, (C_N\Psi_{N-1,0})^T]^T x_k(0)$, with $x_k(0)$ denoting the initial state. In this paper, two cases of $y_0$ are discussed, namely the identical initialization condition case and the asymptotic accuracy case. Without loss of any generality, it is simply assumed that $y_0 = 0$ for the former case and $y_0 \to 0$ for the latter case. The stochastic noise term $\varepsilon_k$ is expressed as $\varepsilon_k = [(v_k(1) + C_1 w_k(1))^T, \ldots, (v_k(N) + C_N \sum_{i=0}^{N-1} \Psi_{N-1,N-i}\, w_k(N-i))^T]^T$. Thus $\varepsilon_k$ is a zero-mean Gaussian noise vector with covariance $Q$, i.e., $\varepsilon_k \sim \mathcal{N}(0, Q)$. $Q$ is defined as $E(\varepsilon_k \varepsilon_k^T)$ and thus depends on both the noises and the system matrices.

For standard ILC, the desired reference is

$$y_d = [y_d^T(1), y_d^T(2), \ldots, y_d^T(N)]^T \in \mathbb{R}^{qN}. \qquad (3)$$

Denote the standard tracking error as $e_k = y_d - y_k$. In many practical applications, only part of $y_d$ rather than $y_d$ itself is required to be tracked. To this end, we give a variant of the modelling procedure of [16]. Suppose that only $l_j$ components of the output at time $j$ are required to be tracked, $0 \le l_j \le q$, $j = 1, 2, \ldots, N$. If $l_j = 0$, the output at time $j$ is completely disregarded. If $l_j \ne 0$, denote the tracked components by $1 \le n_{j,1} < n_{j,2} < \cdots < n_{j,l_j} \le q$.

Removing all the points that do not need to be followed from the original objective $y_d$, a new reference trajectory $y_r$ with dimension $l$ is obtained, where $l = \sum_{j=1}^{N} l_j$. That is, $y_r$ is a condensed reference trajectory of $y_d$.

Let us first define a row vector $\theta \in \mathbb{R}^{qN}$ with the same dimension as $y_d$. If the $i$th component of $y_d$ is required to be tracked, then define the $i$th component of $\theta$ as one, i.e., $\theta_i = 1$; otherwise define it as zero. That is,

$$\theta_i = \begin{cases} 1, & \text{if the } i\text{th component is a target}, \\ 0, & \text{otherwise}, \end{cases} \quad 1 \le i \le qN. \qquad (4)$$

Then construct a matrix $\Phi \in \mathbb{R}^{l \times qN}$ as follows:

$$\Phi_{i,j} = \begin{cases} 1, & \text{if } \theta_j = 1 \text{ and } \sum_{m=1}^{j} \theta_m = i, \\ 0, & \text{otherwise}, \end{cases} \qquad (5)$$

for $i = 1, \ldots, l$ and $j = 1, \ldots, qN$. Then, it is evident that $\Phi$ is of full row rank, i.e., $\operatorname{rank}(\Phi) = l$. Moreover, $y_r$ and $y_d$ satisfy the following relationship

$$y_r = \Phi y_d. \qquad (6)$$

For deterministic systems, the control objective is to find an input sequence such that $e_k \to 0$ as $k \to \infty$. However, this objective is not suitable for the stochastic point-to-point tracking problem for two reasons. The first is that not all components of $e_k$ are available in applications. The other is that the tracking errors cannot converge to zero because of the existence of stochastic noises. In this paper, we define $\Phi\hat{e}_k \triangleq y_r - \Phi G u_k$. The control objective is to guarantee that $\Phi\hat{e}_k \to 0$ as $k \to \infty$.

For further analysis, we have the following property of the matrix $\Phi G$.

Lemma 1. Assume that $C^+B_t$ is of full row rank, $\forall t$; then $\Phi G$ is of full row rank. That is, $\operatorname{rank}(\Phi G) = l$.

Proof. Since $C^+B_t$ is of full row rank, we have $\operatorname{rank}(C^+B_t) = q$. From the definition of $G$, it is evident that $\operatorname{rank}(G) = qN$. By Sylvester's rank inequality, $\operatorname{rank}(\Phi) + \operatorname{rank}(G) - qN \le \operatorname{rank}(\Phi G)$. This further yields $l \le \operatorname{rank}(\Phi G)$. On the other hand, $\operatorname{rank}(\Phi G) \le \min\{\operatorname{rank}(\Phi), \operatorname{rank}(G)\} = l$. Thus, we have $\operatorname{rank}(\Phi G) = l$.

III. ILC ALGORITHM AND ITS ANALYSIS

In this section, we design the ILC algorithm based on stochastic approximation to generate an input sequence such that the control objective is achieved. Let $\{a_k\}$ be a sequence satisfying

$$a_k > 0, \quad a_k \to 0, \quad \sum_{k=1}^{\infty} a_k = \infty, \quad \frac{a_k - a_{k+1}}{a_k a_{k+1}} \to \alpha \ge 0. \qquad (7)$$

The ILC update law is now defined as

$$u_{k+1} = u_k + a_k L (y_r - \Phi y_k), \qquad (8)$$

where $L$ is the learning gain matrix to be specified later. The above update law is actually a stochastic approximation algorithm. In the following, the convergence and asymptotic normality of (8) are thus proved based on stochastic approximation results. To this end, two theorems from stochastic approximation theory will be used in the analysis. For readability, both theorems are included in the appendix.

Remark 1. Compared with the conventional P-type algorithm $u_{k+1} = u_k + L(y_r - \Phi y_k)$, a decreasing gain $a_k$ is added to (8). For deterministic systems, the conventional P-type algorithm can guarantee a satisfactory convergence behavior. For stochastic systems, however, such an algorithm may not behave well, since the historical stochastic noises cannot be canceled by a fixed learning gain. That is, the input sequence fails to achieve a stable convergence due to the existence of stochastic noises. This is the first reason why we introduce $a_k$ into (8). Moreover, the decreasing gain $a_k$ can suppress stochastic noises efficiently. As a matter of fact, a decreasing gain is somewhat a necessary requirement for eliminating the influence of stochastic noises [24].

3.1 Convergence of the ILC algorithm

Let us first consider the identical initialization condition case.

Theorem 1. For system (2) with initial value $y_0 = 0$, design $L$ such that $\Phi GL$ is stable; then the ILC update law (8) with arbitrary initial input $u_0$ guarantees that $\Phi\hat{e}_k \to 0$ as $k \to \infty$.

Proof. From (8), we have

$$\Phi\hat{e}_{k+1} = \Phi\hat{e}_k - a_k \Phi GL \Phi\hat{e}_k + a_k \Phi GL \Phi\varepsilon_k. \qquad (9)$$

Notice that $\Phi GL$ in (9) corresponds to the matrix $H$ of Theorem 3 in the appendix, which is iteration-invariant and stable by the design of $L$. Moreover, one could find that $\Phi GL\Phi\varepsilon_k$ in (9) corresponds to $\mu_{k+1}$ of Theorem 3 in the appendix. Note that $\varepsilon_k$ is independent along the iteration axis, and so is $\Phi\varepsilon_k$. Therefore, it is obvious that $\sum_{k=1}^{\infty} a_k^2 E\|\Phi\varepsilon_k\|^2 \le \operatorname{trace}(\Phi Q \Phi^T) \sum_{k=1}^{\infty} a_k^2 < \infty$. By the Khintchine-Kolmogorov convergence theorem, we further have that $\sum_{k=1}^{\infty} a_k \Phi GL \Phi\varepsilon_k < \infty$. The proof is thus completed by directly applying Theorem 3.

Remark 2. Here, the stability of a matrix is defined as all its eigenvalues having negative real parts. Since $\operatorname{rank}(\Phi G) = l$, the design of $L$ could be implemented by solving a linear matrix inequality (LMI): $\Phi GL > 0$. This approach for $L$ has advantages in robustness [25]. However, such a selection of $L$ leads to a non-causal design of ILC. In the next subsection, more discussions on the selection of $L$ are given.

In many applications, the identical initialization condition may not be satisfied. However, the initial state may converge to zero asymptotically, or we could introduce an initial state learning mechanism to make it so. In this case, we could formulate the initial condition as $y_0 \to 0$. Then we have the following corollary.

Corollary 1. For system (2) with initial value $y_0 \to 0$, design $L$ such that $\Phi GL$ is stable; then the ILC update law (8) with arbitrary initial input $u_0$ guarantees that $\Phi\hat{e}_k \to 0$ as $k \to \infty$.

Proof. Notice that the initial value $y_0 \to 0$. Then an extra term is added to (9), which becomes

$$\Phi\hat{e}_{k+1} = \Phi\hat{e}_k - a_k \Phi GL \Phi\hat{e}_k + a_k \Phi GL \Phi\varepsilon_k + a_k \Phi GL \Phi y_0. \qquad (10)$$

Thus $\Phi GL\Phi y_0$ corresponds to the term $\nu_{k+1}$ of Theorem 3 in the appendix, and the condition on $\nu_k$ is obviously satisfied. Then the corollary holds according to Theorem 3.

Remark 3. In both Theorem 1 and Corollary 1, it is claimed that the initial input can be arbitrarily given. This implies that the convergence is independent of the initial value of algorithm (8). In practical applications, because we have little prior knowledge of the system information, the initial input $u_0$ is usually set to be zero. The influence of the initial input on the variance of the tracking errors was discussed in [26], where the initial input was formulated as a random variable.

3.2 Selection of learning gain matrix

In the last subsection, the design condition on the learning gain matrix $L$ is to make $\Phi GL$ stable, i.e., to ensure that all eigenvalues of $\Phi GL$ have positive real parts. Obviously, there is a large selection range for $L$ since $\Phi G$ is of full row rank. The first and apparent selection is $L = (\Phi G)^T$, which results in $\Phi GL$ being a positive definite matrix. This selection actually is the gradient algorithm, which has been adopted in [16,25]. However, on one hand, this selection requires full knowledge of the system matrices in the formulation of $G$. On the other hand, one may worry about the computation load when the iteration length $N$ is large.

In order to overcome the above disadvantages, an alternative selection of $L$ is given as follows. Note that $G$ is a block lower triangular matrix with $C^+B_t$ being its diagonal blocks. Thus these diagonal blocks play the major part in control. Hence, we could form another block diagonal matrix as $G_1 = \operatorname{diag}\{C^+B_0, \ldots, C^+B_{N-1}\}$ and then select an alternative $L$ as $L = (\Phi G_1)^T$. In this case, $GG_1^T$ is still a block lower triangular matrix with diagonal blocks $(C^+B_t)(C^+B_t)^T$. Since $C^+B_t$ is of full row rank, the diagonal block $(C^+B_t)(C^+B_t)^T$ is positive definite. Hence, all eigenvalues of $GG_1^T$ have positive real parts although it is not symmetric, and so do the eigenvalues of $\Phi GL$. We can find that only the information of the input-output coupling matrix $C^+B_t$ is required. On the other hand, the coupling matrix $C^+B_t$ encodes the control direction information, which is necessary when designing the controller. Moreover, the selection $L = (\Phi G_1)^T$ avoids the non-causality of the controller design.

In addition, the computation load can be further reduced by decomposing the lifted update law (8) into non-lifted forms. In other words, the lifted update law (8) is used for analysis convenience. To be specific, we give an illustrative example based on the SISO LTI system, i.e., $A_t \equiv A$, $B_t \equiv B$, $C_t \equiv C$. The entire output is of dimension $N$. Denote the tracking errors as $\hat{e}_{1,k}, \ldots, \hat{e}_{N,k}$. We assume there are $\kappa \le N$ outputs required to be tracked and their locations are $j_1, \ldots, j_\kappa$. Let $j_0 = 0$. Then the update law along the time axis for the $L = (\Phi G)^T$ case is given as

$$u_{k+1}(t) = \begin{cases} u_k(t) + a_k \sum_{m=i}^{\kappa} C A^{j_m - t - 1} B\, \hat{e}_{j_m,k}, & j_{i-1} \le t < j_i,\ 1 \le i \le \kappa, \\ u_k(t), & j_\kappa \le t \le N-1, \end{cases} \qquad (11)$$

while for the $L = (\Phi G_1)^T$ case the update law is given as

$$u_{k+1}(t) = \begin{cases} u_k(t) + a_k\, CB\, \hat{e}_{j_i,k}, & t = j_i - 1,\ 1 \le i \le \kappa, \\ u_k(t), & \text{otherwise}. \end{cases} \qquad (12)$$

3.3 Asymptotic properties of the ILC algorithms

In this subsection, further asymptotic analysis is carried out for the ILC algorithm with a decreasing gain. We have proved that $\Phi\hat{e}_k \to 0$, which is a path-wise result. Noticing that stochastic noise is involved in our system, $\Phi\hat{e}_k$ is a random variable. We will show that $\frac{1}{\sqrt{a_k}}\Phi\hat{e}_k$ is asymptotically normal, i.e., the distribution of $\frac{1}{\sqrt{a_k}}\Phi\hat{e}_k$ converges to a normal distribution as $k \to \infty$. This can be regarded as a statistical result and is specified as follows.

Theorem 2. For system (2) with initial value $y_0 = 0$, design $L$ such that $\Phi GL + \frac{\alpha}{2} I$ is stable, where $\alpha$ is defined by (7); then the ILC update law (8) with arbitrary initial input $u_0$ guarantees that $\frac{1}{\sqrt{a_k}}\Phi\hat{e}_k$ is asymptotically normal, i.e.,

$$\frac{1}{\sqrt{a_k}}\Phi\hat{e}_k \xrightarrow{d} \mathcal{N}(0, S), \qquad (13)$$

where

$$S = \int_0^\infty e^{-(J + \frac{\alpha}{2} I)z}\, J \tilde{Q} J^T\, e^{-(J^T + \frac{\alpha}{2} I)z}\, dz. \qquad (14)$$

Proof. Similar to Theorem 1, noticing the facts that $J = \Phi GL$ and that $\tilde{Q}$ is the covariance of $\Phi\varepsilon_k$, we have that the covariance of $\Phi GL\Phi\varepsilon_k$ is $J\tilde{Q}J^T$. The proof is completed by using Theorem 4.

For the case where the initial state can be asymptotically exactly reset, i.e., $y_0 \to 0$, the above asymptotic normality is also valid if the convergence speed of the initial value satisfies certain requirements. Thus the following corollary is presented.

Corollary 2. For system (2) with initial value $y_0 \to 0$ satisfying $y_0 = o(\sqrt{a_k})$, design $L$ such that $\Phi GL + \frac{\alpha}{2} I$ is stable, where $\alpha$ is defined by (7); then the result of Theorem 2 still holds.

Proof. Using similar steps to Theorem 1 and Theorem 2, the result directly holds based on Theorem 4 in the appendix.

Remark 4. It is obvious that the conditions in (7) are all satisfied if $a_k = \frac{\beta}{k}$ with $\beta > 0$, and then $\alpha = \frac{1}{\beta}$. $\beta$ affects the convergence rate and normality of the algorithm in practical applications. The conditions on $a_k$ are essentially required by the stochastic approximation algorithm. Roughly speaking, a rapidly decreasing $a_k$ may result in a fast convergence rate, while a large $\alpha$ imposes more limitations on the design of $L$.

IV. ILLUSTRATIVE EXAMPLE

Consider a linear stochastic system whose system matrices $A_t$, $B_t$ and $C_t$ are time-varying; the entries of $A_t$ include $0.5(1 - \cos((t-1)\pi/10))$, $\sin((t-1)\pi/10)$ and $0$. To make the illustration simple and clear, let $N = 6$; then $y_k \in \mathbb{R}^{12}$ and $u_k \in \mathbb{R}^{18}$. The noise $\varepsilon_k$ is assumed to be zero-mean Gaussian with covariance $Q = I_{12}$, where $I_n$ denotes the $n$-dimensional unit matrix. Suppose the reference points are $y_d^{(1)}(1)$, $y_d^{(2)}(3)$, $y_d^{(1)}(4)$ and $y_d^{(1)}(6)$. The selected reference trajectory is $y_r = [\,\cdots\,]^T$. The parameters in (8) are set as $a_k = 1/k$ and $L = (\Phi G_1)^T$. The algorithm is run for 200 iterations.

Fig. 1. Norm of the tracking error $y_r - \Phi G u_k$.

The norm of the modified tracking error $\Phi\hat{e}_k$ is presented in Fig. 1, labeled "decreasing learning gain" and denoted by the dash-dot line. As one can see, the error reduces to zero rapidly. This further means that the actual tracking error is asymptotically caused only by the system and measurement noises of the current iteration, which cannot be canceled by any learning algorithm. The decreasing learning gain $a_k$ plays an important role in achieving zero-convergence of the modified error $\Phi\hat{e}_k$.

As a comparison, the conventional P-type update law is also simulated, where the gain $a_k$ is fixed to 0.2. The norm of its modified tracking error $\Phi\hat{e}_k$ is presented in Fig. 1, labeled "constant learning gain" and denoted by the solid line. As one can see, the constant learning gain may make the modified tracking error decrease a little faster in the first few iterations, but it then fluctuates with a larger amplitude. In contrast, the decreasing learning gain keeps the modified tracking error decreasing as the iteration number increases. This coincides with the fact that a constant learning gain fails to ensure zero-error convergence of the modified tracking error while a decreasing learning gain can.

In addition, one may wonder whether different $\alpha$ defined in (7) has a significant influence on the convergence behavior. Noticing Remark 4, it is equivalent to show how the algorithm behaves for different $\beta$. To this end, three cases are selected, i.e., $\beta = 1$, $\beta = 2$, and $\beta = 0.5$, respectively.

Fig. 2. Norm of the modified tracking error $y_r - \Phi G u_k$ with different learning gains.

Fig. 3. The last whole output $y_{200}$ and the reference $y_r$.

Fig. 4. The empirical CDF for tracking error data at the reference points.

Fig. 5. The probability distribution function based on tracking error data at the reference points.

The decreasing learning gains are thus $a_k = 1/k$, $a_k = 2/k$, and $a_k = 0.5/k$, respectively. The modified tracking error profiles along the iteration axis are shown in Fig. 2. One may find little difference in the tracking performance among the different learning gain cases as long as the convergence condition is satisfied. Thus different $\beta$ mainly affects the design of $L$.

The whole output of the last iteration, $y_{200}$, and the reference points are shown in Fig. 3, where the dashed line with diamonds denotes all components of the output and the four circles denote the desired points. The integers on the x-axis correspond to $y^{(1)}(1), y^{(2)}(1), y^{(1)}(2), \ldots, y^{(2)}(6)$; thus the four circles in Fig. 3 denote $y_d^{(1)}(1)$, $y_d^{(2)}(3)$, $y_d^{(1)}(4)$ and $y_d^{(1)}(6)$. From the figure, one can find that the system output tracks the desired points effectively under the noisy environment.

In order to test the asymptotic normality, we repeat the above simulation 1000 times and collect all the tracking errors at the reference points of the last iteration (the 200th iteration). Fig. 4 shows the empirical cumulative distribution function (CDF) for the tracking error data at each reference point, and Fig. 5 plots the probability distribution function of the resulting distribution based on the tracking error data.
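A self-contained sketch of this simulation setup is given below. The time-varying matrices of the example are only partially legible above, so simple hypothetical LTI stand-ins with $C_{t+1}B_t$ of full row rank are used, together with a hypothetical condensed reference $y_r$ and noise level; the construction of $\Phi$ from (5), the lifted map $G$, the causal gain $L = (\Phi G_1)^T$ and the decreasing gain $a_k = 1/k$ follow the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, p, q = 6, 3, 3, 2                    # horizon and sizes as in the example
pts = [(1, 0), (3, 1), (4, 0), (6, 0)]     # tracked (time, component): y^(1)(1), y^(2)(3), y^(1)(4), y^(1)(6)

# hypothetical LTI stand-ins for the time-varying A_t, B_t, C_t;
# here C B = [[1,0,0],[0,1,0]] has full row rank, as the paper assumes
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.2], [0.1, 0.0, 0.3]])
B = np.eye(n)
C = np.eye(q, n)

# lifted input-output map G of (2): block (i, j) = C A^{i-1-j} B for i > j >= 0
G = np.zeros((q * N, p * N))
for i in range(1, N + 1):
    for j in range(i):
        G[(i - 1) * q:i * q, j * p:(j + 1) * p] = C @ np.linalg.matrix_power(A, i - 1 - j) @ B

# selection matrix Phi of (5): one identity row per tracked output component
flat = [(t - 1) * q + c for (t, c) in pts]
Phi = np.eye(q * N)[flat]

# block-diagonal G1 = diag{CB, ..., CB} and the causal gain L = (Phi G1)^T
G1 = np.kron(np.eye(N), C @ B)
L = (Phi @ G1).T

y_r = np.array([1.0, 0.5, -0.3, 0.8])     # hypothetical condensed reference
u = np.zeros(p * N)                        # u_0 = 0
for k in range(1, 201):
    y = G @ u + 0.05 * rng.standard_normal(q * N)   # lifted system (2) with noise
    u = u + (1.0 / k) * (L @ (y_r - Phi @ y))       # update law (8), a_k = 1/k
    if k % 50 == 0:
        print(k, np.linalg.norm(y_r - Phi @ G @ u)) # modified error decreases
```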

V. CONCLUSIONS

An ILC algorithm is proposed for the stochastic point-to-point tracking system in this paper. The general problem formulation is given by introducing a matrix that connects the selected reference points with the original reference. Moreover, a decreasing learning gain is introduced into the conventional P-type update law to suppress stochastic noises. The almost sure convergence of the new algorithm is directly proved in terms of the modified tracking error. Furthermore, asymptotic normality of the limiting modified tracking error is also provided. However, in our algorithm, the unrequested outputs are not considered, while these outputs leave much freedom to design the algorithm and to seek more optimization objectives if they are observable. For example, could the energy of the unrequested outputs be minimized simultaneously to save control effort? This is an interesting and open problem.

REFERENCES

1. Bristow, D. A., M. Tharayil, and A. G. Alleyne, "A survey of iterative learning control: A learning-based method for high-performance tracking control," IEEE Control Syst. Mag., Vol. 26, No. 3 (2006).
2. Ahn, H. S., Y. Q. Chen, and K. L. Moore, "Iterative learning control: survey and categorization from 1998 to 2004," IEEE Trans. Syst. Man Cybern. C, Vol. 37, No. 6 (2007).
3. Wang, Y., F. Gao, and F. J. Doyle III, "Survey on iterative learning control, repetitive control, and run-to-run control," J. Process Control, Vol. 19, No. 10 (2009).
4. Shen, D. and Y. Wang, "Survey on stochastic iterative learning control," J. Process Control, Vol. 24, No. 12 (2014).
5. Yang, S., J.-X. Xu, D. Huang, and Y. Tan, "Synchronization of heterogeneous multi-agent systems by adaptive iterative learning control," Asian J. Control, Vol. 17, No. 6 (2015).
6. Shen, D. and Y. Xu, "Iterative learning control for discrete-time stochastic systems with quantized information," IEEE/CAA J. Automatica Sinica, Vol. 3, No. 1 (2016).
7. Shen, D., W. Zhang, Y. Wang, and C.-J. Chien, "On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths," Automatica, Vol. 63, No. 1 (2016).
8. Liu, Z. and J. Liu, "Adaptive iterative learning boundary control of a flexible manipulator with guaranteed transient performance," Asian J. Control (2016).
9. Yin, Y., X. Bu, and J. Liang, "Quantized iterative learning control design for linear systems based on a 2-D Roesser model," Asian J. Control (2016).
10. Chi, R., D. Wang, F. L. Lewis, Z. Hou, and S. Jin, "Adaptive terminal ILC for iteration-varying target points," Asian J. Control, Vol. 17, No. 3 (2015).
11. Jin, S., Z. Hou, and R. Chi, "Optimal terminal iterative learning control for the automatic train stop system," Asian J. Control, Vol. 17, No. 5 (2015).
12. Liu, T., D. Wang, and R. Chi, "Neural network based terminal iterative learning control for uncertain nonlinear non-affine systems," Int. J. Adapt. Control Signal Process., Vol. 29, No. 10 (2015).
13. Freeman, C. T., Z. Cai, E. Rogers, and P. L. Lewin, "Iterative learning control for multiple point-to-point tracking application," IEEE Trans. Control Syst. Technol., Vol. 19, No. 3 (2011).
14. Son, T. D., H. S. Ahn, and K. L. Moore, "Iterative learning control in optimal tracking problems with specific data points," Automatica, Vol. 49, No. 5 (2013).
15. Owens, D. H., C. T. Freeman, and T. V. Dinh, "Norm-optimal iterative learning control with intermediate point weighting: theory, algorithms, and experimental evaluation," IEEE Trans. Control Syst. Technol., Vol. 21, No. 3 (2013).
16. Freeman, C. T. and Y. Tan, "Iterative learning control with mixed constraints for point-to-point tracking," IEEE Trans. Control Syst. Technol., Vol. 21, No. 3 (2013).
17. Freeman, C. T. and Y. Tan, "Point-to-point iterative learning control with mixed constraints," Proc. Amer. Control Conf., San Francisco, CA (2011).
18. Freeman, C. T., "Constrained point-to-point iterative learning control with experimental verification," Control Eng. Practice, Vol. 20, No. 5 (2012).
19. Chu, B., C. T. Freeman, and D. H. Owens, "A novel design framework for point-to-point ILC using successive projection," IEEE Trans. Control Syst. Technol., Vol. 23, No. 3 (2015).
20. Shen, D. and Y. Wang, "Iterative learning control for stochastic point-to-point tracking system," Proc. 12th Int. Conf. Control, Automation, Robotics and Vision, Guangzhou, China (2012).

107 Y Xu et al.: Stochastic Point-to-Point ILC Owens, D. H., C. T. Freeman, and B. Chu, An inverse model approach to multivariable norm optimal iterative learning control with auxiliary optimization, Int. J. Control, Vol. 87, No. 8, pp (2014). 22. Meng, D., Y. Jia, J. Du, and FF. Yu, Data-driven control for relative degree systems via iterative learning, IEEE Trans. Neural Netw., Vol. 22, No. 12, pp (2011). 23. Wei, Y. and X. Li, Iterative learning control for linear discrete-time systems with high relative degree under initial state vibration, IET Contr. Theory Appl., Vol. 10, No. 10, pp (2016). 24. Saab, S. S., A discrete-time stochastic learning control algorithm, IEEE Trans. Autom. Control, Vol. 46, No. 6, pp (2001). 25. Butcher, M., A. Karimi, and R. Longchamp, Iterative learning control based on stochastic approximation, Proc. IFAC World Congress, Coex, South Korea, pp (2008). 26. Meng, D., Y. Jia, and J. Du, Evaluation of initial input effects on discrete-time stochastic iterative learning control, Asian J. Control, Vol. 15, No. 6, pp (2013). 27. Chen, H. F., Stochastic Approximation and Its Applications, Kluwer Academic Publishers, Dordrecht (2002). VI. APPENDIX The following theorems on stochastic approximation algorithm are cited from [27]. Consider the following recursion with arbitrary initial value η 0, η +1 = η + a H η + a (μ +1 + ν +1 ). (15) Theorem 3. Assume the following conditions hold: A.1 {a } satisfies that a > 0, a and a 1 +1 a 1 α 0as ; 0, =1 a =, A.2 {μ } and {ν } satisfy that =1 a μ +1 < and ν 0; A.3 {H } are l l matrices satisfying that H H and H is stable. Then {η } generated by (15) tends to zero, i.e., η 0. Theorem 4. Assume A.1 and the following conditions hold A.2 {μ, } is a martingale difference sequence of l-dimension with E(μ ) = 0, lim E(μ μ T ) =Γ,and{ν } satisfies that ν = o( a ); A.3 {H } are l l matrices satisfying that H H and H + α I is stable. 2 Then 1 a η,where{η } is generated by (15), is asymptotically normal: 1 d η (0, M) a where M = e (H+ α I)z 2 Γe (HT + α I)z 2 dz. 0 Yun Xu received the B.S. degree in Automation from Beijing Institute of Petrochemical Technology, China, in Now she is pursuing a M.S. degree at College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. Her research interests are in the area of sampled-data iterative learning control and adaptive iterative learning control. Dong Shen received the B.S. degree in mathematics from Shandong University, Jinan, China, in He received the Ph.D. degree in mathematics from the Academy of Mathematics and System Science, Chinese Academy of Sciences (CAS), Beijing, China, in From 2010 to 2012, he was a Post-Doctoral Fellow with the Institute of Automation, CAS. Since 2012, he has been an associate professor with College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His current research interests include iterative learning controls, stochastic control and optimization. He has published more than 40 refereed journal and conference papers. He is the author of Stochastic Iterative Learning Control (Science Press, 2016, in Chinese) and co-author of Iterative Learning Control for Multi-Agent Systems Coordination (Wiley, 2017). Dr. Shen received IEEE CSS Beijing Chapter Young Author Prize in 2014 and Wentsun Wu Artificial Intelligence Science and Technology Progress Award in Xiao-Dong Zhang received the B.S. 
degree in Automation from Beijing University of Chemical Technology, China. He is now pursuing an M.S. degree at the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His research interests are in the area of soft-sensing and machine learning.
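The cited Theorem 3 is easy to probe numerically. The following minimal sketch simulates the recursion (15); all concrete values (the stable matrix $H$, the gain $a_k = 1/k$, the noise levels) are assumptions chosen for the demonstration, not data from [27].

```python
import numpy as np

# Minimal sketch of the recursion (15),
#   eta_{k+1} = eta_k + a_k * H_k @ eta_k + a_k * (mu_{k+1} + nu_{k+1}),
# under the conditions of Theorem 3.
rng = np.random.default_rng(0)
H = np.array([[-1.0, 0.3],
              [0.0, -0.5]])        # stable (Hurwitz): eigenvalues -1, -0.5
eta = np.array([5.0, -3.0])        # arbitrary initial value eta_0

for k in range(1, 100_001):
    a_k = 1.0 / k                  # a_k > 0, sum a_k = inf, 1/a_{k+1} - 1/a_k -> 1
    H_k = H + np.eye(2) / k        # H_k -> H, as required by A.3
    mu = rng.normal(0.0, 0.1, 2)   # i.i.d. zero-mean noise: sum a_k*mu converges a.s.
    nu = np.ones(2) / k            # vanishing bias, nu_k -> 0
    eta = eta + a_k * (H_k @ eta) + a_k * (mu + nu)

print(np.linalg.norm(eta))         # close to zero, as Theorem 3 predicts
```

With $a_k = 1/k$ and i.i.d. square-integrable noise, $\sum a_k \mu_{k+1}$ converges almost surely because $\sum a_k^2 < \infty$, so condition A.2 holds along with A.1 and A.3.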

International Journal of Systems Science, 2017, Vol. 48, No. 13

Learning control for discrete-time nonlinear systems with sensor saturation and measurement noises

Dong Shen and Chao Zhang
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, P.R. China

ABSTRACT Iterative learning control (ILC) is investigated for a class of nonlinear systems with measurement noises where the output is subject to sensor saturation. An ILC algorithm is introduced based on the measured output information rather than the actual output signal. A decreasing gain sequence is also incorporated into the learning algorithm to ensure stable convergence under stochastic noises. With the help of the stochastic approximation technique, it is strictly proved that the input sequence converges to the desired input almost surely along the iteration axis. Illustrative simulations verify the effectiveness of the proposed algorithm.

ARTICLE HISTORY Received 21 December 2016; Accepted 13 June 2017
KEYWORDS Affine nonlinear systems; sensor saturation; iterative learning control; stochastic approximation

1. Introduction

It is readily apparent that human performance of a given task can be constantly improved by repetition. Motivated by this basic cognition, iterative learning control (ILC) was developed for repetitive systems to improve the tracking performance along the iteration axis. To be specific, for systems that complete a given task over a finite time interval repeatedly, ILC generates the input signal for the current iteration from the input signals and tracking information of previous iterations as well as the desired reference. ILC is advantageous in its simple but effective control structure, which makes implementation easy. It has seen rapid development in both theoretical analysis and practical applications during the past three decades (Ahn, Chen, & Moore, 2007; Bristow, Tharayil, & Alleyne, 2006; Shen & Wang, 2014). There have been many studies on different topics of ILC, such as update law design (Khong, Nesic, & Krstic, 2016), robustness (Hao, Liu, Paszke, & Galkowski, 2016; Li, Huang, Chu, & Xu, 2016) and applications (Blackwell, Tutty, Rogers, & Sandberg, 2016; Li, Ren, & Xu, 2016). Besides, the exploration of ILC has been extended to many new problems, such as consensus of multi-agent systems (Meng & Moore, 2016; Yang & Xu, 2015), iteration-varying lengths (Shen, Zhang, Wang, & Chien, 2016; Shen, Zhang, & Xu, 2016), event-triggered control (Xiong, Yu, Patel, & Yu, 2016), control with quantised information (Bu, Wang, Hou, & Chi, 2015; Shen & Xu, 2016), control under data dropouts (Liu & Ruan, 2016; Shen & Wang, 2015b) and initial state vibration (Wei & Li, 2016). Moreover, in order to compensate for non-repeating disturbances and enhance the transient tracking performance, a feedback controller can be used in combination with ILC. Ouyang, Zhang, and Gupta (2006) proposed a novel method using this idea, in which the gains of the PD-type feedback controller adopted a switching mode to increase the convergence speed and enhance the transient performance.

Saturation is a common nonlinearity in mechanical systems due to various range limitations in devices, including both input saturation and output saturation. For example, cheap devices with inadequate range and amplifier saturation in electronic circuits lead to saturation.
Input saturation has been considered extensively in the literature, such as Bernstein and Michel (1995) and the references therein, while the outputs are usually assumed to have unlimited amplitude. However, the latter deserves thorough exploration because output saturation is common in many industrial applications. Kong, Kniep, and Tomizuka (2010) showed the saturation characteristic in electric motor systems, where the speed or angular velocity was limited because of the maximum voltage. Moreover, Chau, Qin, Sayed, Wahab, and Yang (2010) illustrated that the battery recovery effect exhibited in wireless sensor networks is subject to a saturation threshold depending on the random sensing activities. Another example was provided in Rousseau, Varela, and Chapeau-Blondeau (2003), where the sensor devices were linear for small inputs and saturated for large inputs. From these industrial illustrations one can find that output saturation exists in many mechanical systems, and it therefore deserves considerable effort in control and optimisation analysis. On the other hand, the literature on analysis and design for the output saturation problem is far scarcer than for the input saturation case; so far, only local results have been obtained. Stabilisation is an important issue in control system design under saturation. For this issue, early results were given in Kreisselmeier (1996) and Lin and Hu (2001) for the output saturation problem, where (semi-)globally asymptotic stabilisation conditions for linear feedback control laws were established for single-input-single-output (SISO) systems. The extension to multi-input-multi-output (MIMO) systems was addressed in Grip, Saberi, and Wang (2010), where the authors showed the possibility of stabilisation without further

restrictions. The output feedback H∞ control problem for linear systems subject to sensor saturation was further addressed in Cao, Lin, and Chen (2003) using the linear matrix inequality technique and was verified on a flight control system. Meanwhile, in Wang, Shen, and Liu (2012), the H∞ filtering problem was discussed for a class of nonlinear systems with randomly occurring sensor saturation. Another kind of filtering, called set-membership filtering, was proposed in Yang and Li (2009) for discrete time-varying systems subject to sensor saturation. However, the results on the output saturation problem are quite scattered, and most of them provide a constructive approach tailored to the concerned problem. This further means that research on control of systems with saturation is far from complete.

In the ILC field, few papers address algorithm design and performance analysis for systems with sensor saturation. Most ILC papers related to saturation pay attention to the input saturation case. Xu, Tan, and Lee (2004) and Hou, Xu, and Yan (2008) considered the input saturation problem, where the composite energy function method, together with an inequality property of the saturation function, was introduced for the convergence analysis. Moreover, a reference governor method was proposed in Tan, Xu, Norrlof, and Freeman (2011) to resolve the ILC problem for dynamic systems with input saturation. Furthermore, for parameterised systems with input saturation, Zhang, Hou, Chi, and Ji (2015) and Zhang, Hou, Ji, and Yin (2016) provided adaptive ILC schemes to update the associated parameters and deal with the saturation problem. This method was also applied to high-speed train control in Ji, Hou, and Zhang (2016). However, there is nearly no paper reporting progress on ILC for systems subject to output saturation. A Chinese paper (Zhang & Fang, 2011) made a primary attempt on ILC and repetitive control for linear systems and showed convergence by the conventional contraction mapping method in the λ-norm sense. It should be pointed out that all the above papers only consider deterministic systems. When systems with stochastic noises are considered, the proposed approaches fail to yield a strict convergence analysis; this remains an open problem. In addition, there have been many papers on ILC for nonlinear systems, such as Xu (1997); however, the output was assumed to be fully accessible, and no saturation of the output signal was considered in Xu (1997) and other related papers.

In this paper, we are motivated to further study ILC for a class of nonlinear systems where the output is measured by sensors with saturation. Besides, the system outputs are also corrupted by stochastic noises due to uncertain environments. As a result, the noisy output may exceed the sensor range randomly, and therefore the actual measurement signal contains uncertainties from both saturation and noises. A stochastic approximation-based ILC algorithm is proposed, and it is strictly proved that the generated input sequence converges to the desired one almost surely. Specifically, the technical contributions of this paper are as follows. First, the connection between the measured data and the original signal is established through a random but bounded variable.
This connection also helps us observe the effect of using the measured output in the updating law compared with the original system output. Moreover, a decreasing gain sequence is introduced into the updating law so that the nonlinearities and stochastic noises in the system can be well handled, and stable convergence of the input sequence along the iteration axis is thus guaranteed. Last but not least, when dealing with the multi-dimensional system, a control direction rectifying matrix is further introduced, and the convergence condition of the proposed learning framework is quite relaxed compared with existing works. The proposed approach also has potential for the quantised ILC problem. Numerical simulations are detailed to show the effectiveness of the proposed algorithm.

The rest of the paper is arranged as follows. Section 2 formulates the ILC problem for nonlinear systems with sensor saturation. The ILC algorithm and its convergence analysis are given in Section 3. The extension to the MIMO case is elaborated in Section 4. Section 5 provides numerical simulations and Section 6 concludes the paper.

Notations: $\mathbb{R}$ is the set of real numbers and $\mathbb{R}^n$ is the $n$-dimensional space. $E$ denotes the mathematical expectation of a random variable. The superscript $T$ denotes the transpose of a vector or matrix.

2. Problem formulation

Consider the following SISO nonlinear system with sensor saturation:
$$\begin{aligned} x_k(t+1) &= f(t, x_k(t)) + b(t, x_k(t))u_k(t) \\ y_k(t) &= c^T(t)x_k(t) \\ z_k(t) &= \mathrm{Sat}(y_k(t) + w_k(t)) \end{aligned} \qquad (1)$$
where $k = 1, 2, \ldots$ labels the iteration number, $t = 0, 1, \ldots, N$ denotes the time instants within one iteration, and $N$ is the length of each iteration. $u_k(t) \in \mathbb{R}$, $x_k(t) \in \mathbb{R}^n$ and $y_k(t) \in \mathbb{R}$ are the input, state and output, respectively, whereas $z_k(t)$ is the output measured by a saturated sensor and $w_k(t)$ is the measurement noise. It should be emphasised that the actual system output $y_k(t)$ is not available for the design of updating laws because of the saturation mechanism; in other words, only the measured output $z_k(t)$ can be used in the control law design. $f(\cdot,\cdot): \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ and $b(\cdot,\cdot): \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ are nonlinear continuous functions. In addition, these functions, $f(t, x_k(t))$ and $b(t, x_k(t))$, together with $c(t) \in \mathbb{R}^n$, are unknown and thus not available for the design of the learning law.

The sensor saturation is defined as
$$\mathrm{Sat}(v) = \begin{cases} -m, & v \le -m \\ v, & -m < v < m \\ m, & v \ge m \end{cases} \qquad (2)$$
for some unknown positive constant $m$. In other words, if the output coupled with random noise exceeds the measurement range, the actual measurement is the boundary value. The saturation nonlinearity here is usually due to the practical limitations of various measurement devices.

The control objective of this paper is to find an input sequence $\{u_k(t),\, t = 0, 1, \ldots, N-1\}$, based on the measured information $z_k(t)$ rather than the original output $y_k(t)$, such that the output $y_k(t)$ tracks the desired reference $y_d(t)$ precisely as the iteration number goes to infinity. The following assumptions are needed for system (1).
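Before stating the assumptions, the following minimal sketch illustrates the measurement channel in (1)-(2): the learning law never sees the true output $y_k(t)$, only $z_k(t) = \mathrm{Sat}(y_k(t) + w_k(t))$. The saturation level $m$ and the noise standard deviation below are illustrative assumptions.

```python
import numpy as np

# Sketch of the measurement channel in (1)-(2).
def sat(v: np.ndarray, m: float) -> np.ndarray:
    """Symmetric saturation Sat(.) of (2) with (unknown) positive level m."""
    return np.clip(v, -m, m)

rng = np.random.default_rng(1)
m = 1.4                                    # assumed sensor range
y = np.array([0.2, 1.0, 1.8, -2.5])        # true outputs at a few instants
w = rng.normal(0.0, 0.05, size=y.shape)    # measurement noise w_k(t)
z = sat(y + w, m)                          # all the controller can use
print(z)                                   # out-of-range outputs collapse to +/- m
```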

Assumption 2.1: The desired reference $y_d(t)$ is realisable in the sense that there exist a unique $x_d(0)$ and $u_d(t)$ such that
$$\begin{aligned} x_d(t+1) &= f(t, x_d(t)) + b(t, x_d(t))u_d(t) \\ y_d(t) &= c^T(t)x_d(t) \end{aligned} \qquad (3)$$
In addition, the desired reference lies in the measurable range, i.e. $|y_d(t)| < m$, $\forall t$.

Assumption 2.2: The initial states can be asymptotically reset in the sense that $x_d(0) - x_k(0) \to 0$ as $k \to \infty$.

Remark 2.1: In the ILC field, it is usually required that the initial state is precisely reset, i.e. $x_k(0) = x_d(0)$, $\forall k$. This is the well-known identical initialisation condition (i.i.c.). Assumption 2.2 is a relaxation of this condition. Many papers have studied the so-called re-initialisation problem, such as Chen, Wen, Gong, and Sun (1999). Moreover, for discrete-time systems, the well-known initial rectifying technique can also be applied when the initial state varies. Since this problem is outside the scope of this paper, we simply use Assumption 2.2.

Assumption 2.3: The functions $f(t, \cdot)$ and $b(t, \cdot)$ are continuous with respect to the second argument, $\forall t$.

Assumption 2.4: The real number $c^T(t+1)b(t, x)$ coupling the input and output is an unknown nonzero constant, but its sign, characterising the control direction, is assumed known. Without loss of generality, it is assumed that $c^T(t+1)b(t, x) > 0$ in the rest of this paper.

Assumption 2.5: For each time instant $t$, the measurement noise $\{w_k(t)\}$ is a sequence of independent and identically distributed random variables such that $E w_k(t) = 0$, $\sup_k E w_k^2(t) < \infty$, and $\limsup_{n} \frac{1}{n}\sum_{k=1}^{n} w_k^2(t) = R_t^w$ a.s., $\forall t$, where $R_t^w$ is unknown.

Remark 2.2: Let us make some further remarks on the assumptions. Assumption 2.1 assumes the realisability of the desired reference. Two points are included, i.e. the existence of an optimal control signal and the range limitation of the desired reference. If the desired reference is out of the saturation range, then perfect tracking is hard to achieve due to the essential limitation of the saturated output $z_k(t)$. Assumption 2.4 claims knowledge of the control direction because such a condition is a basic requirement for controller design. If the control direction is unknown, an adaptive detection mechanism should be introduced into the algorithm, similar to Shen and Wang (2015a), which would make the algorithm much more complex. Assumption 2.5 imposes conditions on the stochastic noises. Considering practical environments, i.e. the process being repeatable from iteration to iteration, it is reasonable that the noises are bounded and random.

For simplicity of writing, let us set $f_k(t) \triangleq f(t, x_k(t))$, $f_d(t) \triangleq f(t, x_d(t))$, $b_k(t) \triangleq b(t, x_k(t))$, $b_d(t) \triangleq b(t, x_d(t))$, $\delta x_k(t) \triangleq x_d(t) - x_k(t)$, $\delta u_k(t) \triangleq u_d(t) - u_k(t)$, $\delta f_k(t) \triangleq f_d(t) - f_k(t)$, $\delta b_k(t) \triangleq b_d(t) - b_k(t)$, $c^+ f_k(t) \triangleq c^T(t+1)f_k(t)$, $c^+ b_k(t) \triangleq c^T(t+1)b_k(t)$.

The following lemma contributes a basis for the convergence proof in the next section.

Lemma 2.1: Assume that Assumptions 2.1-2.5 hold for system (1). If $\lim_{k\to\infty} \delta u_k(s) = 0$, $s = 0, 1, \ldots, t-1$, then at the time instant $t$, $\delta x_k(t) \to 0$, $\delta f_k(t) \to 0$, $\delta b_k(t) \to 0$ as $k \to \infty$.

Proof: We prove the lemma by mathematical induction. From Equations (1) and (3), it follows that
$$\delta x_k(t+1) = f_d(t) - f_k(t) + b_d(t)u_d(t) - b_k(t)u_k(t) = \delta f_k(t) + \delta b_k(t)u_d(t) + b_k(t)\delta u_k(t) \qquad (4)$$
Initial step. For $t = 0$, from Assumptions 2.2 and 2.3 we have that $\delta f_k(0) \to 0$ and $\delta b_k(0) \to 0$ as $k \to \infty$, which further implies that the first two terms on the right-hand side of Equation (4) tend to zero as $k \to \infty$.
Noticing $\|b_k(0)\| \le \|b_d(0)\| + \|\delta b_k(0)\|$, we find that $b_k(0)$ is bounded. Thus, the fact $\lim_{k\to\infty}\delta u_k(0) = 0$ implies that the third term on the right-hand side of Equation (4) also goes to zero. Hence, it follows that $\delta x_k(1) \to 0$ as $k \to \infty$. Further, by Assumption 2.3 we have that $\delta f_k(1) \to 0$ and $\delta b_k(1) \to 0$ as $k \to \infty$.

Inductive step. Assume the conclusions of the lemma are true for $s = 0, 1, \ldots, t-1$; then we have $\delta x_k(t) \to 0$ as $k \to \infty$. We now proceed to show the conclusion for time instant $t$ under the condition $\lim_{k\to\infty}\delta u_k(t) = 0$. To this end, noticing Equation (4), by exactly the same steps as used above, we find that $\delta x_k(t+1) \to 0$ as $k \to \infty$. That is, the conclusions are also valid for $t$. This completes the proof.

Lemma 2.1 reveals the essential effect of the input at previous time instants ($0, 1, \ldots, t-1$) on the state at the current time instant $t$. Specifically, the state error at the current time instant approaches zero as long as the input errors at all previous time instants approach zero. In other words, the effect of the historical tracking error is asymptotically negligible if precise convergence of the input sequence is achieved asymptotically. Therefore, based on Lemma 2.1, the analysis objective in the following is to show zero-error convergence of the input sequence to the desired input defined in Assumption 2.1. The details of the analysis are specified in the next section.

3. ILC algorithm and its convergence

In this section, we design the ILC law with a decreasing gain sequence, establish the connection between the measured information and the original output signal, and provide the main theorem on asymptotic convergence. The detailed proof is put in the Appendix for smooth readability. Recall that the actual output $y_k(t)$ is not available for algorithm design; only the measured output $z_k(t)$ can be used. Thus, the available error is defined as $e_k(t) \triangleq y_d(t) - z_k(t)$, which will be employed in the updating law. We now define the learning algorithm as follows:
$$u_{k+1}(t) = u_k(t) + a_k e_k(t+1) \qquad (5)$$
where $a_k$ is an a priori defined decreasing gain such that $a_k > 0$, $a_k \to 0$ as $k \to \infty$, $\sum_{k=1}^{\infty} a_k = \infty$, and $\sum_{k=1}^{\infty} a_k^2 < \infty$. It is obvious that $a_k = a/k$ meets all these requirements, with $a > 0$ being a suitable constant. This decreasing sequence is borrowed from the stochastic approximation algorithm, and mainly aims to

ensure almost sure convergence and suppress the influence of stochastic noises.

Remark 3.1: The above algorithm (5) is actually a stochastic approximation algorithm (Borkar, 2008; Chen, 2002; Kushner & Yin, 1997). If the decreasing gain is replaced with some constant, then the algorithm is the conventional P-type update law. The introduction of a decreasing gain ensures asymptotically zero-error tracking performance under stochastic measurement noises. It is well known that a properly decreasing gain for the correction term is necessary to ensure convergence in the recursive computation of stochastic systems for optimisation, identification and tracking (Benveniste, Metivier, & Priouret, 1990; Caines, 1988). However, this decreasing gain also slows down the convergence of the proposed algorithm, which is rooted in stochastic approximation theory. If the system is deterministic, i.e. the stochastic noise term is removed from the system, then one can replace the decreasing gain with a suitable constant so that an exponential convergence speed is obtained.

Remark 3.2: As will be shown in the following convergence analysis, the inherent reason that the classic P-type learning algorithm (5) is effective for the output saturation problem is as follows. When the saturated output information is used for updating the input signal, an improvement of tracking performance is indeed achieved, although it is not as effective as when the original output information is used. In other words, output saturation may slow down the convergence speed; however, the convergence property is still guaranteed.

For any fixed $t$ we have
$$e_k(t+1) = y_d(t+1) - z_k(t+1) = y_d(t+1) - \mathrm{Sat}(y_k(t+1) + w_k(t+1)) = \gamma_k(t)\,(y_d(t+1) - y_k(t+1) - w_k(t+1)) \qquad (6)$$
where $\gamma_k(t)$ is defined as
$$\gamma_k(t) = \begin{cases} \dfrac{y_d(t+1) - m}{y_d(t+1) - y_k(t+1) - w_k(t+1)}, & \text{if } y_k(t+1) + w_k(t+1) \ge m \\[6pt] 1, & \text{if } -m < y_k(t+1) + w_k(t+1) < m \\[6pt] \dfrac{y_d(t+1) + m}{y_d(t+1) - y_k(t+1) - w_k(t+1)}, & \text{if } y_k(t+1) + w_k(t+1) \le -m \end{cases} \qquad (7)$$
It is noted that $0 < \gamma_k(t) \le 1$, $\forall t$, and $\gamma_k(t)$ is a random variable depending on $w_k(t+1)$.

Remark 3.3: From Equation (6) we find that
$$|y_d(t+1) - \mathrm{Sat}(y_k(t+1) + w_k(t+1))| = \gamma_k(t)\,|y_d(t+1) - y_k(t+1) - w_k(t+1)| \le |y_d(t+1) - y_k(t+1) - w_k(t+1)|$$
This coincides with Property 1 provided in Xu et al. (2004). However, stochastic noise is involved here, and therefore the traditional contraction mapping method is unsuitable. Consequently, we define the random variable $\gamma_k(t)$ to denote the contraction coefficient and show the convergence from the viewpoint of probability theory, directly based on the stochastic approximation technique. This is the technical difference between this paper and existing results. In addition, it should be specially emphasised that the coefficient $\gamma_k(t)$ is only a virtual quantity derived from the original learning update law (5); it is used only for the convergence analysis and is not required to be known for updating.

Next, we substitute the expression (6) of the available error $e_k(t+1)$ into the learning algorithm (5) to derive a specified regression of the input sequence. From Equations (1), (3), (4) and (6), we can rewrite Equation (5) as
$$u_{k+1}(t) = u_k(t) + a_k \gamma_k(t) c^+ b_k(t)(u_d(t) - u_k(t)) + a_k \varphi_k(t) - a_k \gamma_k(t) w_k(t+1) \qquad (8)$$
where
$$\varphi_k(t) = \gamma_k(t)\,\big(c^+ \delta f_k(t) + c^+ \delta b_k(t)u_d(t)\big) \qquad (9)$$
denotes the structural noise.
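For illustration, the following runnable sketch applies the update law (5) to a toy scalar linear plant behind a saturated noisy sensor. The plant, the gain $a = 2$ and the saturation level are assumptions made for this example only; note that the virtual coefficient $\gamma_k(t)$ of (6)-(7) is never computed by the algorithm.

```python
import numpy as np

# Sketch of the P-type law (5) with decreasing gain a_k = a/k on a toy
# scalar linear plant measured through a saturated noisy sensor.
rng = np.random.default_rng(2)
N, m, n_iter = 50, 1.2, 200
t = np.arange(N + 1)
y_d = np.sin(2 * np.pi * t / N)            # reference, inside (-m, m)

def plant(u: np.ndarray) -> np.ndarray:
    """x(t+1) = 0.9 x(t) + u(t), y(t) = x(t), x(0) = 0 every iteration."""
    y = np.zeros(N + 1)
    x = 0.0
    for i in range(N):
        y[i] = x
        x = 0.9 * x + u[i]
    y[N] = x
    return y

u = np.zeros(N)                            # u_0(t) = 0
for k in range(1, n_iter + 1):
    y = plant(u)
    z = np.clip(y + rng.normal(0.0, 0.05, N + 1), -m, m)  # saturated sensor
    e_meas = y_d - z                       # measured error used in (5)
    u = u + (2.0 / k) * e_meas[1:]         # u_{k+1}(t) = u_k(t) + a_k e_k(t+1)

print(np.max(np.abs(y_d[1:] - plant(u)[1:])))  # small residual tracking error
```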
The regression function is
$$h_{t,k}(u) = \gamma_k(t)\, c^+ b_k(t)\,(u_d(t) - u) \qquad (10)$$
It should be pointed out that the above regression function depends on $k$ for any fixed $t$; however, the root of the function is independent of $k$. We now give the first theorem of this paper.

Theorem 3.1: Assume that Assumptions 2.1-2.5 hold for system (1). Then the input sequence $\{u_k(t)\}$ generated by the learning algorithm (5) converges to the desired input $u_d(t)$ as $k \to \infty$ almost surely, $\forall t = 0, 1, \ldots, N-1$. In addition, the output $y_k(t)$ converges to the desired reference $y_d(t)$ as $k \to \infty$ almost surely, $\forall t$.

The proof is put in the Appendix. Theorem 3.1 provides the asymptotic convergence of the input sequence in the almost sure sense. This theorem indicates that the simple P-type learning law with a decreasing gain can ensure stable convergence of the input sequence to the desired input. In other words, the P-type learning law has good robustness against output saturation and stochastic noises.

Remark 3.4: In this paper, the measured output $z_k(t)$ involves stochastic noise, as shown in the last equation of (1). In some applications, the stochastic noise $w_k(t)$ may be introduced during transmission or generated by other factors, which makes it an additive term to the saturated output. In other words, the measurement equation of (1) is then formulated as $z_k(t) = \mathrm{Sat}(y_k(t)) + w_k(t)$. For this kind of formulation, the proposed ILC algorithm (5) remains effective. The convergence proof can be carried out following similar steps with minor modifications in the derivations.

The critical idea of the proof is to establish a continuous contraction along the iteration axis (i.e. $\phi_{i,j}$ defined in Equation (A3) in the Appendix) and then to show the asymptotically vanishing property of various series. It is worth pointing out that

the term $\phi_{i,j}$ is a product of successive real numbers when considering the SISO system, whence the estimate can be made easily, as real numbers always commute with each other. When considering the MIMO system, such a product must be defined with matrices rather than real numbers; therefore, the commutativity of multiplication is no longer valid. This makes it nontrivial to extend the above results to MIMO systems. The details are given in the next section.

4. Extension to the MIMO case

In this section, the proposed ILC scheme is extended to MIMO nonlinear systems by introducing a control direction rectifying matrix. The control direction Assumption 2.4 is first modified according to the multi-dimensional system formulation. Then, the modifications to the learning algorithm and its convergence analysis are given in sequence. Specifically, we consider the following nonlinear system:
$$\begin{aligned} x_k(t+1) &= f(t, x_k(t)) + B(t, x_k(t))u_k(t) \\ y_k(t) &= C(t)x_k(t) \\ z_k(t) &= \mathrm{Sat}(y_k(t) + w_k(t)) \end{aligned} \qquad (11)$$
where $u_k(t) \in \mathbb{R}^p$, $y_k(t) \in \mathbb{R}^q$, and the associated matrices $B(t, x) \in \mathbb{R}^{n \times p}$, $C(t) \in \mathbb{R}^{q \times n}$. For a vector $v = [v_1, \ldots, v_q]^T \in \mathbb{R}^q$, the saturation $\mathrm{Sat}(v) \triangleq [\mathrm{Sat}(v_1), \ldots, \mathrm{Sat}(v_q)]^T$, with $\mathrm{Sat}(v_i)$, $\forall i$, defined in Equation (2).

Assumptions 2.1-2.3 are still valid with the vectors $b(t, x)$ and $c(t)$ replaced by the matrices $B(t, x)$ and $C(t)$. The noise Assumption 2.5 is slightly modified in the expression of $R_t^w$, which becomes $R_t^w = \limsup_{n} \frac{1}{n}\sum_{k=1}^{n} w_k(t)w_k^T(t)$. In the rest of this section, we refer to these assumptions directly, without detailing the differences again, to avoid tedious duplication. Moreover, in Assumption 2.4, the control direction is required to be known a priori, as it is important for controller design. For the SISO system, the control direction is simply the sign of the coupling value, while for the MIMO system the control direction is not so simple. Thus, Assumption 2.4 is modified as follows.

Assumption 4.1: The matrix $C(t+1)B(t, x)$ coupling the input and output is of full column rank.

Remark 4.1: A direct corollary of Assumption 4.1 is that $q \ge p$. Meanwhile, Assumption 4.1 also implies that the relative degree, which characterises the inherent relationship between the input and output, is one in essence. If the system relative degree is larger than one, say $\eta$, then the results given in this paper are still valid provided that the tracking error $e_k(t+1)$ used in the algorithm is replaced with $e_k(t+\eta)$. To keep the expressions concise, we only consider the relative degree one case throughout the paper.

Now the learning law (5) is modified as
$$u_{k+1}(t) = u_k(t) + a_k L_t e_k(t+1) \qquad (12)$$
where $L_t \in \mathbb{R}^{p \times q}$ is the learning gain matrix to be defined later.

Remark 4.2: The role of $L_t$ in the algorithm (12) is as an updating direction term that is incorporated with the coupling matrix $C^+B(t, x) \triangleq C(t+1)B(t, x)$ to ensure asymptotic convergence as the iteration number goes to infinity. To be specific, the design condition on $L_t$ is that all the eigenvalues of $L_t C(t+1)B(t, x_d(t))$ have positive real parts, $\forall t$. This condition is specified in the following theorem.

Theorem 4.1: Assume that Assumptions 2.1-2.3, 2.5 and 4.1 hold for system (11). Then the input sequence $\{u_k(t)\}$ generated by the learning algorithm (12) converges to the desired input $u_d(t)$ as $k \to \infty$ almost surely, $\forall t = 0, 1, \ldots, N-1$, if the learning gain matrix $L_t$ satisfies that all eigenvalues of $L_t C(t+1)B(t, x_d(t))$ have positive real parts for all time instants.
In addition, the output $y_k(t)$ converges to the desired reference $y_d(t)$ as $k \to \infty$ almost surely, $\forall t$.

The proof can be found in the Appendix. Theorem 4.1 extends the asymptotic convergence results from SISO systems to MIMO systems. Moreover, the convergence condition is essentially imposed on the input/output coupling matrix $C^+B(t, x)$. On the one hand, there always exist feasible choices of the learning matrix $L_t$ as long as $C^+B(t, x)$ is of full column rank. On the other hand, to some extent, full column rank is a necessary requirement to ensure precise tracking (cf. Huang, Tan, & Lee, 2002). Thus, Theorem 4.1 employs a rather relaxed condition for convergence under output saturation and stochastic noises.

Remark 4.3: The major difference between the proofs of Theorems 3.1 and 4.1 lies in the estimation of the contraction terms $\phi_{i,j}$ and $\Psi_{i,j}$, where the former is a scalar while the latter is a matrix. Thus, more involved derivations have to be carried out to make the proof rigorous, as the product of matrices introduces additional difficulties.

Remark 4.4: From a comparison of the convergence conditions in Theorems 3.1 and 4.1, it can be seen that the SISO case is a special case of the MIMO case. To be specific, when considering the SISO system, we assume the control direction is known and positive, i.e. $c^+b(t, x) > 0$, as specified in Assumption 2.4. Then, according to Theorem 4.1, any positive number $L_t$ would ensure convergence in the SISO case; the reason that such a gain does not appear in the learning algorithm (5) is that we simply let it equal one in Equation (5). For the MIMO system, the gain matrix $L_t$ has to be introduced to balance the dimensions and ensure the convergence condition specified by the eigenvalues of $L_t C^+B(t, x)$.

We have now established the convergence of ILC for nonlinear systems with output saturation for both SISO and MIMO systems. The technical differences between the SISO and MIMO cases have also been detailed. In the next section, we verify these theoretical results on two examples, for SISO and MIMO systems, respectively.

5. Illustrative simulations

In this section, the proposed algorithm is applied to two examples. The first is a single-link manipulator system, which in essence is an SISO system. The second is an artificial system adopted from an existing paper, which is a two-input-two-output system.
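Before the examples, a small numerical check of the design condition in Theorem 4.1 may be helpful: a candidate gain $L_t$ is admissible if every eigenvalue of $L_t C(t+1)B(t, x_d(t))$ has a positive real part. The coupling matrix below is an assumed stand-in; the gain $0.8 I_2$ is the choice used in Example 2.

```python
import numpy as np

# Check the Theorem 4.1 design condition for a candidate gain matrix L_t.
def gain_admissible(L: np.ndarray, CB: np.ndarray) -> bool:
    """True iff all eigenvalues of L @ CB have positive real parts."""
    return bool(np.all(np.linalg.eigvals(L @ CB).real > 0))

CB = np.array([[2.0, 0.0],
               [0.0, 2.0]])                  # assumed C(t+1)B(t, x_d(t)), full column rank
print(gain_admissible(0.8 * np.eye(2), CB))  # True: the gain used in Example 2
print(gain_admissible(-np.eye(2), CB))       # False: wrong control direction
```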

Figure 1. Output tracking performance at the first, fifth and last iterations.

5.1. Example 1: single-link manipulator

A single-link manipulator can be described by the following model (Wit, Noel, Aubin, Brogliato, & Drevet, 1989):
$$(mr^2 + I)\ddot{q} + mgr\cos(q) + f(q, \dot{q}) = \tau \qquad (13)$$
where $q$, $I$ and $\tau$ are the link displacement, the moment of inertia and the applied motor torque, respectively. The point mass $m$ is the payload, located at distance $r$ from the centre of the coordinate frame. The friction force $f$ is difficult to model exactly. Letting $h$ be the sampling period, one can discretise the model by the Euler method as follows:
$$\tau(ih) = (mr^2 + I)\,\frac{q(ih+2h) - 2q(ih+h) + q(ih)}{h^2} + mgr\cos(q(ih)) + f\Big(q, \frac{q(ih+h) - q(ih)}{h}\Big)$$
Let $x^{(1)}(i) = q(ih)$, $x^{(2)}(i) = q(ih+h)$ and $u(i) = \tau(ih)$. Then the above equation can be rewritten in the following state-space form:
$$\begin{aligned} x^{(1)}(i+1) &= x^{(2)}(i) \\ x^{(2)}(i+1) &= 2x^{(2)}(i) - x^{(1)}(i) - \frac{mgrh^2}{mr^2 + I}\cos(x^{(1)}(i)) - \frac{h^2}{mr^2 + I}\, f\Big(x^{(1)}(i), \frac{x^{(2)}(i) - x^{(1)}(i)}{h}\Big) + \frac{h^2}{mr^2 + I}\,u(i) \end{aligned} \qquad (14)$$
The parameters used in the simulation are set to $r = 1.0$ m, $m = 2.0$ kg and $I = 1.5$ kg·m²; the sampling period is $h$ s and the tracking period (iteration length) is $T$ s. Moreover, the friction function is $f = (2 - 2\dot{q} + 10\dot{q})\,\mathrm{sgn}(\dot{q})$, a simplified friction model covering Coulomb stiction, asymmetries and downward bends (Wit et al., 1989), where $\mathrm{sgn}(\cdot)$ is the sign function. The system output is $y(i) = x^{(2)}(i)$. Here we assume the output is bounded with upper bound 1.4; that is, if the output is larger than 1.4, the measured output is actually 1.4. The stochastic noises satisfy $w_k(t) \sim N(0, \sigma^2)$. The reference trajectory is
$$y_d(t) = \sin(t) - 1 - \cos(5t) - 0.2t$$
The initial input is simply $u_0(t) = 0$, $\forall t$. The learning gain is $a_k = 2/k$, $k \ge 1$; the requirements on $a_k$ are clearly satisfied. The algorithm runs for 20 iterations. The tracking performance is shown in Figure 1, where the solid line denotes the reference. As can be seen, the output at the first iteration is far away from the reference, while the one at the fifth iteration improves the performance greatly. The output at the 20th iteration almost coincides with the reference. This shows the good tracking performance of the proposed algorithm.

To show the decreasing property of the errors, denote the actual tracking error by $e_k(t) = y_d(t) - y_k(t)$, where the subscript $k$ is the iteration number, and the measured tracking error by $\hat{e}_k(t) = y_d(t) - z_k(t)$, where $z_k(t) = \mathrm{Sat}(y_k(t) + w_k(t))$. Figure 2 displays the convergence of the proposed algorithm, where the maximal tracking error is defined as $\max_t |e_k(t)|$ or $\max_t |\hat{e}_k(t)|$ for the actual and measured tracking errors, respectively. The maximal tracking error profiles are marked with circles and squares for the actual and measured cases, respectively. Note that stochastic noise is involved and we introduce a decreasing learning gain $a_k$ to

Figure 2. Maximal error profiles.

suppress the influence of stochastic noises; thus the maximal error profiles cannot decrease to zero completely. However, the tracking performance is already good enough. Moreover, as addressed in Remark 3.1, the decreasing gain $a_k$ is introduced only to suppress the influence of the saturated output and the stochastic measurement noises. The disadvantage of such a gain is a somewhat slow convergence speed. However, if the system is deterministic, or the stochastic noises are negligible, the decreasing gain can be replaced with a constant gain; as a comparison, this noise-free constant-gain case is also simulated in Figure 2.

Figure 3. The system output with noises and the measured output at the first iteration.
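For readers who wish to reproduce the example, the sketch below encodes one step of the discretised manipulator (14). The parameters $r$, $m$ and $I$ follow the text, while the sampling period $h$ and the friction coefficients are assumptions, since their exact values are not fully specified above.

```python
import numpy as np

# One step of the discretised single-link manipulator model (14).
r, m_load, I_mom, g, h = 1.0, 2.0, 1.5, 9.8, 0.01   # h is an assumed value

def friction(qdot: float) -> float:
    """Simplified Coulomb-type friction (assumed form)."""
    return 2.0 * float(np.sign(qdot))

def step(x1: float, x2: float, u: float) -> tuple:
    """Map (x1(i), x2(i), u(i)) to (x1(i+1), x2(i+1)) following (14)."""
    c = h * h / (m_load * r * r + I_mom)
    x2_next = (2.0 * x2 - x1
               - c * m_load * g * r * np.cos(x1)     # gravity term
               - c * friction((x2 - x1) / h)         # friction term
               + c * u)                              # torque input
    return x2, x2_next

x1, x2 = 0.0, 0.0
x1, x2 = step(x1, x2, 1.0)   # apply torque u(0) = 1.0 for one sample
print(x1, x2)
```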

For the noise-free case, indicated by the solid line marked with diamonds in Figure 2, the constant learning gain is set to 0.5, and exponential convergence is obtained.

To show the sensor saturation, the system output with measurement noise $y_k(t) + w_k(t)$ and the measured output $z_k(t)$ at the first iteration are given in Figure 3 by the dotted and dashed lines, respectively. Here we take the first iteration just for illustration, with no special purpose. As can be seen, if the system output exceeds the measurement range, only the bound is obtained.

Figure 4. First output tracking performance at the first, fifth and 50th iterations.

Figure 5. Second output tracking performance at the first, fifth and 50th iterations.

Figure 6. Maximal error profiles of the first output.

5.2. Example 2: MIMO system case

Consider the following MIMO system (Liu, Tang, Tong, & Li, 2014):
$$\begin{aligned} x^{(1)}(t+1) &= x^{(2)}(t) \\ x^{(2)}(t+1) &= f(x(t)) + g(x(t))u(t) \\ y(t) &= x^{(1)}(t) \end{aligned} \qquad (15)$$
where $u(t) = [u^{(1)}(t), u^{(2)}(t)]^T \in \mathbb{R}^2$ and $y(t) = [y^{(1)}(t), y^{(2)}(t)]^T \in \mathbb{R}^2$ are the input and output, respectively. The state is $x(t) = [(x^{(1)}(t))^T, (x^{(2)}(t))^T]^T \in \mathbb{R}^4$ with $x^{(i)}(t) = [x^{(i1)}(t), x^{(i2)}(t)]^T \in \mathbb{R}^2$, $i = 1, 2$. The unknown functions are defined as
$$f(x(t)) = \begin{bmatrix} 0.4\,x^{(11)}(t)\big/\big(1 + (x^{(21)}(t))^2\big) \\ x^{(22)}(t)\cos(x^{(12)}(t)) \end{bmatrix}, \qquad g(x(t)) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.$$
The initial state $x(0)$ is chosen as a fixed value for all iterations. The iteration length is $N = 400$. The desired references are
$$y_d(t) = \begin{bmatrix} 5\sin\!\big(\tfrac{0.5\pi t}{50} + \tfrac{\pi}{4}\big) \\[2pt] 2\cos\!\big(\tfrac{0.5\pi t}{50} - \tfrac{\pi}{4}\big) \end{bmatrix}$$
The stochastic noise is $w(t) = [w^{(1)}(t), w^{(2)}(t)]^T$ with each dimension satisfying $w^{(i)}(t) \sim N(0, \sigma^2)$, $i = 1, 2$. For the measured output, we assume that the upper bounds for the first and second dimensions are 6 and 2.5, respectively. That is, if $y^{(1)}(t)$ plus $w^{(1)}(t)$ is larger than 6, then the measured value is 6; the second dimension is similar.

For the simulation, the initial input is simply set to zero, i.e. $u_0(t) = 0$, $\forall t$. The decreasing gain in this example is $a_k = 1/k$, $k \ge 1$. The learning matrix is set to $L_t = 0.8I_2$. The algorithm runs for 50 iterations. The tracking performances of the first and second dimensions of the output are shown in Figures 4 and 5, respectively, where the solid line denotes the desired reference while the dotted, dash-dot and dashed lines denote the actual outputs at the first, fifth and 50th iterations, respectively. One can find that the tracking performance at the fifth iteration is much improved, with the deviations mainly occurring at peaks and valleys. The output at the 50th iteration almost coincides with the desired reference.

Similarly to Example 1, the maximal tracking error profiles for the actual and measured tracking errors, denoted by solid lines marked with circles and squares, are shown in Figures 6 and 7 for the two dimensions, respectively. The decreasing trend of these curves shows the convergence of the proposed algorithm. However, due to the stochastic noises, the maximal tracking error cannot decrease to zero completely, and the decreasing gain $a_k$ slows the convergence somewhat. In addition, the noise-free case is also simulated, where the decreasing gain $a_k$ is replaced by a suitable constant gain; the corresponding maximal tracking error profiles are displayed by the lines marked with diamonds in Figures 6 and 7. To show the effect of sensor saturation, we plot the first dimension of the output as an illustration. The actual output with measurement noise $y^{(1)}(t) + w^{(1)}(t)$, the measured output and the desired reference are shown in Figure 8, labelled by dotted, dashed and solid lines.
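A compact sketch of the learning-loop ingredients of Example 2 follows: the per-channel saturation bounds (6 and 2.5), the gain matrix $L_t = 0.8I_2$ and the decreasing gain $a_k = 1/k$ are taken from the text, while the wiring to the plant (15) is omitted here.

```python
import numpy as np

# Ingredients of the MIMO learning loop (12) for Example 2.
bounds = np.array([6.0, 2.5])                # componentwise sensor limits

def sat_vec(v: np.ndarray) -> np.ndarray:
    """Componentwise Sat(.) as used in (11)."""
    return np.clip(v, -bounds, bounds)

L = 0.8 * np.eye(2)                          # learning gain matrix L_t

def mimo_update(u_k: np.ndarray, e_next: np.ndarray, k: int) -> np.ndarray:
    """u_{k+1}(t) = u_k(t) + a_k L_t e_k(t+1); rows index the time t."""
    return u_k + (1.0 / k) * e_next @ L.T

u = np.zeros((400, 2))                       # N = 400, two inputs, u_0(t) = 0
```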

Figure 7. Maximal error profiles of the second output.

Figure 8. The first system output with noises and the measured output at the first iteration.

6. Conclusions

ILC has been addressed for nonlinear systems where the output is subject to a saturation nonlinearity. A P-type update law is designed based on the stochastic approximation algorithm, where the measured tracking error rather than the actual tracking error is used for updating. In order to prove convergence, a random coefficient is introduced to describe the difference between these two tracking errors. Then, the almost sure convergence of the input sequence to the desired one is rigorously established on the basis of stochastic approximation theory. Both SISO and MIMO examples are simulated to verify the theoretical analysis. For further research, it is of great interest to consider more general nonlinearities or quantisation techniques at the output side and to discuss the inherent effect of such nonlinearities on ILC design and analysis. Moreover, in some practical situations, the sensor saturation problem may be generated by structural failure. If such failure can be reconfigured by the

system itself, then we can formulate it as a resilience problem (Zhang & Lin, 2010; Zhang & van Luttervelt, 2011), which provides a novel approach to the output saturation problem. The combination of resilience and the techniques given in this paper is potentially valuable for practical applications and requires further effort.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
National Natural Science Foundation of China; Natural Science Foundation of Beijing Municipality.

Notes on contributors
Dong Shen received the B.S. degree in mathematics from Shandong University, Jinan, China, and the Ph.D. degree in mathematics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences (CAS), Beijing, China. From 2010 to 2012, he was a Post-Doctoral Fellow with the Institute of Automation, CAS. From 2016 to 2017, he was a visiting scholar at the National University of Singapore, Singapore. Since 2012, he has been an associate professor with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His current research interests include iterative learning control, stochastic control and optimization. He has published more than 60 refereed journal and conference papers. He is the author of Stochastic Iterative Learning Control (Science Press, 2016, in Chinese) and co-author of Iterative Learning Control for Multi-Agent Systems Coordination (Wiley, 2017). Dr. Shen received the IEEE CSS Beijing Chapter Young Author Prize in 2014 and the Wentsun Wu Artificial Intelligence Science and Technology Progress Award.

Chao Zhang received the B.E. degree in automation from Beijing University of Chemical Technology, Beijing, China. He is now pursuing a master's degree at Beijing University of Chemical Technology. His research interests include iterative learning control and its applications to mobile robots.

References
Ahn, H. S., Chen, Y. Q., & Moore, K. L. (2007). Iterative learning control: Survey and categorization from 1998 to 2004. IEEE Transactions on Systems, Man, and Cybernetics - Part C, 37(6).
Benaim, M. (1996). A dynamical system approach to stochastic approximations. SIAM Journal on Control and Optimization, 34(2).
Benveniste, A., Metivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations. New York, NY: Springer-Verlag.
Bernstein, D. S., & Michel, A. N. (1995). A chronological bibliography on saturating actuators. International Journal of Robust and Nonlinear Control, 5(5).
Blackwell, M. W., Tutty, O. R., Rogers, E., & Sandberg, R. D. (2016). Iterative learning control applied to a non-linear vortex panel model for improved aerodynamic load performance of wind turbines with smart rotors. International Journal of Control, 89(1).
Borkar, V. S. (2008). Stochastic approximation: A dynamical systems viewpoint. Cambridge: Cambridge University Press.
Bristow, D. A., Tharayil, M., & Alleyne, A. G. (2006). A survey of iterative learning control: A learning-based method for high-performance tracking control. IEEE Control Systems Magazine, 26(3).
Bu, X., Wang, T., Hou, Z., & Chi, R. (2015). Iterative learning control for discrete-time systems with quantised measurements. IET Control Theory & Applications, 9(9).
Caines, P. E. (1988). Linear stochastic systems. New York, NY: Wiley.
Cao, Y. Y., Lin, Z., & Chen, B. M. (2003).
An output feedback H∞ controller design for linear systems subject to sensor nonlinearities. IEEE Transactions on Circuits and Systems - I: Fundamental Theory and Applications, 50(7).
Chau, C. K., Qin, F., Sayed, S., Wahab, M. H., & Yang, Y. (2010). Harnessing battery recovery effect in wireless sensor networks: Experiments and analysis. IEEE Journal on Selected Areas in Communications, 28(7).
Chen, H. F. (2002). Stochastic approximation and its applications. Dordrecht: Kluwer.
Chen, B. M., Lin, Z., & Shamash, Y. (2004). Linear systems theory: A structural decomposition approach. Boston: Birkhäuser.
Chen, Y. Q., Wen, C., Gong, Z., & Sun, M. (1999). An iterative learning controller with initial state learning. IEEE Transactions on Automatic Control, 44(2).
Chow, Y. S., & Teicher, H. (1997). Probability theory: Independence, interchangeability, martingales. New York, NY: Springer-Verlag.
Grip, H. F., Saberi, A., & Wang, X. (2010). Stabilization of multiple-input multiple-output linear systems with saturated outputs. IEEE Transactions on Automatic Control, 55(9).
Hao, S., Liu, T., Paszke, W., & Galkowski, K. (2016). Robust iterative learning control for batch processes with input delay subject to time-varying uncertainties. IET Control Theory & Applications, 10(15).
Hou, Z., Xu, J.-X., & Yan, J. (2008). An iterative learning approach for density control of freeway traffic flow via ramp metering. Transportation Research Part C: Emerging Technologies, 16(1).
Huang, S. N., Tan, K. K., & Lee, T. H. (2002). Necessary and sufficient condition for convergence of iterative learning algorithm. Automatica, 38(7).
Ji, H., Hou, Z., & Zhang, R. (2016). Adaptive iterative learning control for high-speed trains with unknown speed delays and input saturations. IEEE Transactions on Automation Science and Engineering, 13(1).
Khong, S. Z., Nesic, D., & Krstic, M. (2016). Iterative learning control based on extremum seeking. Automatica, 66.
Kong, K., Kniep, H. C., & Tomizuka, M. (2010). Output saturation in electric motor systems: Identification and controller design. Journal of Dynamic Systems, Measurement, and Control, 132(5).
Kreisselmeier, G. (1996). Stabilization of linear systems in the presence of output measurement saturation. Systems & Control Letters, 29.
Kushner, H. J., & Yin, G. (1997). Stochastic approximation algorithms and applications. New York, NY: Springer.
Li, X., Huang, D., Chu, B., & Xu, J.-X. (2016). Robust iterative learning control for systems with norm-bounded uncertainties. International Journal of Robust and Nonlinear Control, 26(4).
Li, X., Ren, Q., & Xu, J.-X. (2016). Precise speed tracking control of a robotic fish via iterative learning control. IEEE Transactions on Industrial Electronics, 63(4).
Lin, Z., & Hu, T. (2001). Semi-global stabilization of linear systems subject to output saturation. Systems & Control Letters, 43.
Liu, J., & Ruan, X. (2016). Networked iterative learning control approach for nonlinear systems with random communication delay. International Journal of Systems Science, 47(16).
Liu, Y., Tang, L., Tong, S., & Li, D. (2014). Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Transactions on Neural Networks & Learning Systems, 26(1).
Meng, D., & Moore, K. L. (2016). Learning to cooperate: Networks of formation agents with switching topologies. Automatica, 64.
Ouyang, P. R., Zhang, W. J., & Gupta, M. M. (2006). An adaptive switching learning control method for trajectory tracking of robot manipulators. Mechatronics, 16(1).
Rousseau, D., Varela, J. R., & Chapeau-Blondeau, F. (2003).
Stochastic resonance for nonlinear sensors with saturation. Physical Review E, 67(2).
Shen, D., & Wang, Y. (2014). Survey on stochastic iterative learning control. Journal of Process Control, 24(12).

Shen, D., & Wang, Y. (2015a). ILC for networked nonlinear systems with unknown control direction through random lossy channel. Systems & Control Letters, 77.
Shen, D., & Wang, Y. (2015b). Iterative learning control for networked stochastic systems with random data dropouts. International Journal of Control, 88(5).
Shen, D., & Xu, Y. (2016). Iterative learning control for discrete-time stochastic systems with quantized information. IEEE/CAA Journal of Automatica Sinica, 3(1).
Shen, D., Zhang, W., Wang, Y., & Chien, C.-J. (2016). On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths. Automatica, 63.
Shen, D., Zhang, W., & Xu, J.-X. (2016). Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths. Systems & Control Letters, 96.
Tan, Y., Xu, J.-X., Norrlof, M., & Freeman, C. (2011). On reference governor in iterative learning control for dynamic systems with input saturation. Automatica, 47.
Wang, Z., Shen, B., & Liu, X. (2012). H∞ filtering with randomly occurring sensor saturations and missing measurements. Automatica, 48.
Wei, Y.-S., & Li, X.-D. (2016). Iterative learning control for linear discrete-time systems with high relative degree under initial state vibration. IET Control Theory & Applications, 10(10).
de Wit, C. C., Noel, P., Aubin, A., Brogliato, B., & Drevet, P. (1989). Adaptive friction compensation in robot manipulators: Low velocities. The International Journal of Robotics Research, 10(3).
Xiong, W., Yu, X., Patel, R., & Yu, W. (2016). Iterative learning control for discrete-time systems with event-triggered transmission strategy and quantization. Automatica, 72.
Xu, J.-X. (1997). Analysis of iterative learning control for a class of nonlinear discrete-time systems. Automatica, 33(10).
Xu, J.-X., Tan, Y., & Lee, T. H. (2004). Iterative learning control design based on composite energy function with input saturation. Automatica, 40(8).
Yang, F., & Li, Y. (2009). Set-membership filtering for systems with sensor saturation. Automatica, 45.
Yang, S., & Xu, J.-X. (2016). Leader-follower synchronisation for networked Lagrangian systems with uncertainties: A learning approach. International Journal of Systems Science, 47(4).
Zhang, Y., & Fang, Y. (2011). Learning control for systems with saturated output. Acta Automatica Sinica, 37(1) (in Chinese).
Zhang, R., Hou, Z., Chi, R., & Ji, H. (2015). Adaptive iterative learning control for nonlinearly parameterised systems with unknown time-varying delays and input saturations. International Journal of Control, 88(6).
Zhang, R., Hou, Z., Ji, H., & Yin, C. (2016). Adaptive iterative learning control for a class of non-linearly parameterised systems with input saturations. International Journal of Systems Science, 47(5).
Zhang, W. J., & Lin, Y. (2010). On the principle of design of resilient systems - application to enterprise information systems. Enterprise Information Systems, 4(2).
Zhang, W. J., & van Luttervelt, C. A. (2011). Toward a resilient manufacturing system. CIRP Annals - Manufacturing Technology, 60.

Appendix

Proof of Theorem 3.1: Noticing Equation (1) and Assumption 2.1, we find that it is sufficient to prove that $u_k(t) \to u_d(t)$ a.s., $\forall t = 0, 1, \ldots, N-1$. To this end, the proof is carried out by mathematical induction along the time axis $t$, with the help of Lemma 2.1 established in the last section.

Initial step: We first verify the validity of the theorem for $t = 0$.
In this case, Equation (8) is rewritten as
$$u_{k+1}(0) = u_k(0) + a_k \gamma_k(0) c^+ b_k(0)(u_d(0) - u_k(0)) + a_k \varphi_k(0) - a_k \gamma_k(0) w_k(1) \qquad (A1)$$
Subtracting both sides of the last equation from $u_d(0)$ leads to
$$\delta u_{k+1}(0) = (1 - a_k \gamma_k(0) c^+ b_k(0))\,\delta u_k(0) - a_k \varphi_k(0) + a_k \gamma_k(0) w_k(1) \qquad (A2)$$
Since $b_k(0)$ is continuous in its second argument, i.e. the initial state, it follows from Assumption 2.2 that $b_k(0) \to b_d(0)$. In addition, Assumption 2.4 implies that the coupling number $c^+ b_k(0)$ converges to a positive constant. By Property 1.6 of Benaim (1996), which reveals that the sequence generated by a stochastic approximation algorithm is naturally bounded if the regression function is globally Lipschitz and satisfies a certain stability condition, it is easy to see that the input error sequence generated by Equation (A2) is bounded, as the regression function in Equation (A2) is linear. Then, from the system formulation (1), it follows that the output $y_k(1)$ is bounded. Combining this with Assumption 2.5, one concludes that there is a nonzero lower bound for $\gamma_k(0)$; that is, there is a suitable $0 < \nu < 1$ such that $\gamma_k(0) > \nu$, $\forall k$.

Set
$$\phi_{i,j} \triangleq (1 - a_i \gamma_i(0) c^+ b_i(0)) \cdots (1 - a_j \gamma_j(0) c^+ b_j(0)), \quad i \ge j, \qquad \phi_{i,i+1} \triangleq 1 \qquad (A3)$$
Note that $\phi_{i,j}$ is a successive product of the contraction factors $1 - a_l \gamma_l(0) c^+ b_l(0)$ with the subscript $l$ increasing from $j$ up to $i$; thus the notation is well defined in Equation (A3) when $i \ge j$. It is apparent that $\phi_{i,i} = 1 - a_i \gamma_i(0) c^+ b_i(0)$ as a special case. For the case $i < j$, the product from $j$ up to $i$ is empty, and we define $\phi_{i,j} = 1$ for $i < j$ for completeness of the notation.

It is clear that $1 - a_j \gamma_j(0) c^+ b_j(0) > 0$ and $c^+ b_j(0) \ge c$ for all sufficiently large $j$, say $j \ge j_0$, and a suitable constant $c > 0$. Then, for any $i > j$, $j \ge j_0$, we have
$$\phi_{i,j} = (1 - a_i \gamma_i(0) c^+ b_i(0))\,\phi_{i-1,j} \le (1 - a_i c\nu)\,\phi_{i-1,j} \le \exp(-a_i c_1)\,\phi_{i-1,j}$$
where the basic inequality $1 - x \le e^{-x}$, $x > 0$, is used in the last step and $c_1 = c\nu$. This further yields $\phi_{i,j} \le c_2 \exp\big(-c_1 \sum_{l=j}^{i} a_l\big)$, $j \ge j_0$, $i > j$, for some suitable $c_2 > 0$. Consequently,
$$\phi_{i,j} \le c_3 \exp\Big(-c_1 \sum_{l=j}^{i} a_l\Big), \quad i > j, \; j \ge j_0 \qquad (A4)$$
To complete the estimation, we need to cover the case $i > j_0$, $j < j_0$. For this case, we have
$$\phi_{i,j} \le \phi_{i,j_0}\,|\phi_{j_0-1,j}| \le c_4 \exp\Big(-c_1 \sum_{l=j}^{i} a_l\Big) \qquad (A5)$$

for some suitable $c_4 > 0$ with $c_4 \ge c_3$. Combining Equations (A4) and (A5) leads to the following estimate for the product $\phi_{i,j}$:
$$\phi_{i,j} \le c_4 \exp\Big(-c_1 \sum_{l=j}^{i} a_l\Big), \quad i \ge j_0, \; j > 0 \qquad (A6)$$
When $i < j_0$, the above estimate is still valid due to the finiteness of $j_0$, as long as we choose a sufficiently large $c_4$.

Now, from Equation (A2), we have
$$\delta u_{k+1}(0) = \phi_{k,1}\,\delta u_1(0) - \sum_{j=1}^{k} \phi_{k,j+1} a_j \varphi_j(0) + \sum_{j=1}^{k} \phi_{k,j+1} a_j \gamma_j(0) w_j(1) \qquad (A7)$$
where the first term on the right-hand side tends to zero as $k \to \infty$ because of Equation (A6).

Now consider the last two terms of Equation (A7). From Assumptions 2.2 and 2.3 and Equation (9), it is evident that $\varphi_k(0) \to 0$ as $k \to \infty$. Therefore, for any $\epsilon > 0$, there is a sufficiently large integer $k_1$ such that $|\varphi_k(0)| < \epsilon$, $\forall k \ge k_1$. Then, for $k > k_1$, we have
$$\Big|\sum_{j=1}^{k} \phi_{k,j+1} a_j \varphi_j(0)\Big| \le c_4 \sum_{j=1}^{k_1-1} \exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big)\, a_j |\varphi_j(0)| + \epsilon c_4 \sum_{j=k_1}^{k} \exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big)\, a_j \qquad (A8)$$
where the first term on the right-hand side is a finite sum due to the finiteness of $k_1$, in which each summand tends to zero as $k \to \infty$ because the starting index is bounded by $k_1$; consequently, the finite sum tends to zero as $k \to \infty$. For the last term on the right-hand side of Equation (A8), the following estimate can be obtained, using $a_j \le \frac{2}{c_1}(1 - e^{-c_1 a_j})$ for all sufficiently large $j$:
$$\epsilon c_4 \sum_{j=k_1}^{k} \exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big)\, a_j \le \frac{2\epsilon c_4}{c_1} \sum_{j=k_1}^{k} (1 - e^{-c_1 a_j}) \exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big) \le \frac{2\epsilon c_4}{c_1} \sum_{j=k_1}^{k} \Big[\exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big) - \exp\Big(-c_1\!\sum_{i=j}^{k}\! a_i\Big)\Big] \le \frac{2\epsilon c_4}{c_1} \qquad (A9)$$
Hence, the second term on the right-hand side of Equation (A8) tends to zero as $k \to \infty$ and $\epsilon \to 0$.

Now only the last term of Equation (A7) is left to verify. To this end, we first show that the series $\sum_{j=1}^{\infty} a_j \gamma_j(0) w_j(1)$ converges to some unknown constant. From Assumption 2.5, $\{w_j(1)\}$ is a sequence of independent and identically distributed random variables along the iteration axis with finite second moments. Moreover,
$$\sum_{k=1}^{\infty} E\big(a_k \gamma_k(0) w_k(1)\big)^2 \le \sum_{k=1}^{\infty} E\big(a_k w_k(1)\big)^2 \le R_1^w \sum_{k=1}^{\infty} a_k^2 < \infty$$
This further implies that $\sum_{k=1}^{\infty} a_k \gamma_k(0) w_k(1) < \infty$ a.s. by the Khintchine-Kolmogorov convergence theorem (Chow & Teicher, 1997). Let $\lambda_k = \sum_{j=1}^{k} a_j \gamma_j(0) w_j(1)$, $\lambda_0 = 0$. Then $\lambda_k \to \lambda < \infty$, and therefore, for any $\epsilon > 0$, there is a sufficiently large integer $k_2 > k_1$ such that $|\lambda_k - \lambda| \le \epsilon$, $\forall k \ge k_2$. By partial summation, we have
$$\begin{aligned} \sum_{j=1}^{k} \phi_{k,j+1} a_j \gamma_j(0) w_j(1) &= \sum_{j=1}^{k} \phi_{k,j+1}(\lambda_j - \lambda_{j-1}) = \lambda_k - \sum_{j=1}^{k} (\phi_{k,j+1} - \phi_{k,j})\,\lambda_{j-1} \\ &= \lambda_k - \lambda + \phi_{k,1}\lambda - \sum_{j=1}^{k} (\phi_{k,j+1} - \phi_{k,j})(\lambda_{j-1} - \lambda) \end{aligned} \qquad (A10)$$
where all the terms on the right-hand side tend to zero as $k \to \infty$ except the last one. For the last term, we further have
$$\sum_{j=1}^{k} (\phi_{k,j+1} - \phi_{k,j})(\lambda_{j-1} - \lambda) = \Big(\sum_{j=1}^{k_2} + \sum_{j=k_2+1}^{k}\Big)(\phi_{k,j+1} - \phi_{k,j})(\lambda_{j-1} - \lambda)$$
It is evident that $\lambda_k \to \lambda$ and that $\sum_{j=1}^{k_2} (\phi_{k,j+1} - \phi_{k,j})(\lambda_{j-1} - \lambda) \to 0$ as $k \to \infty$; the latter holds because it is a finite sum and $\phi_{k,j} \to 0$ as $k \to \infty$ with $j$ bounded by $k_2$. Moreover,
$$\sum_{j=k_2+1}^{k} (\phi_{k,j+1} - \phi_{k,j})(\lambda_{j-1} - \lambda) = \sum_{j=k_2+1}^{k} \phi_{k,j+1}\, a_j \gamma_j(0) c^+ b_j(0)(\lambda_{j-1} - \lambda) \qquad (A11)$$

Note that $|\lambda_j - \lambda| \le \epsilon$, $\forall j \ge k_2$. In addition, $c^+ b_j(0)$ converges to a constant. Then it follows that
$$\Big|\sum_{j=k_2+1}^{k} \phi_{k,j+1}\, a_j \gamma_j(0) c^+ b_j(0)(\lambda_{j-1} - \lambda)\Big| \le \epsilon \sup_{1 \le j < \infty} |c^+ b_j(0)|\; c_4 \sum_{j=k_2+1}^{k} \exp\Big(-c_1\!\!\sum_{i=j+1}^{k}\!\! a_i\Big)\, a_j \qquad (A12)$$
which tends to zero as $k \to \infty$ and $\epsilon \to 0$. Consequently, $\delta u_{k+1}(0) \to 0$ as $k \to \infty$. In other words, the conclusion of the theorem is true for $t = 0$.

Inductive step: Assume the theorem is valid for $s = 0, 1, \ldots, t-1$; we proceed to show that it is also true for $t$. From Equation (8), we obtain the counterpart of Equation (A2) for time instant $t$:
$$\delta u_{k+1}(t) = (1 - a_k \gamma_k(t) c^+ b_k(t))\,\delta u_k(t) - a_k \varphi_k(t) + a_k \gamma_k(t) w_k(t+1) \qquad (A13)$$
where $\varphi_k(t) = \gamma_k(t)(c^+ \delta f_k(t) + c^+ \delta b_k(t) u_d(t))$. Comparing Equation (A13) with Equation (A2), we find that the terms $\delta u_k(t)$, $1 - a_k \gamma_k(t) c^+ b_k(t)$, $\varphi_k(t)$ and $w_k(t+1)$ correspond to $\delta u_k(0)$, $1 - a_k \gamma_k(0) c^+ b_k(0)$, $\varphi_k(0)$ and $w_k(1)$, respectively. Thus, the proof can be completed following the same steps, as long as we ensure that $\varphi_k(t) \to 0$ as $k \to \infty$, since all the other terms have the same properties. By the induction hypothesis, $\lim_{k\to\infty} \delta u_k(s) = 0$ for $s = 0, 1, \ldots, t-1$; applying Lemma 2.1 then leads directly to $\delta f_k(t) \to 0$ and $\delta b_k(t) \to 0$ as $k \to \infty$. Therefore, $\varphi_k(t) \to 0$ by the boundedness of all remaining terms in the expression of $\varphi_k(t)$. As a consequence, using exactly the same steps as in the case $t = 0$, we can prove that $u_k(t) \to u_d(t)$ as $k \to \infty$. The proof of the theorem is thus completed by the mathematical induction principle.

Proof of Theorem 4.1: The proof proceeds similarly to the proof of Theorem 3.1. Here we mainly give the major differences caused by the multi-dimensional formulation. The mathematical induction method is again used; the inductive step is routine, so we mainly check the validity of the initial step. To this end, consider the case $t = 0$. The algorithm (12) results in
$$\begin{aligned} u_{k+1}(0) &= u_k(0) + a_k L_0 [y_d(1) - \mathrm{Sat}(y_k(1) + w_k(1))] \\ &= u_k(0) + a_k L_0 \Gamma_k(0) [y_d(1) - y_k(1) - w_k(1)] \\ &= u_k(0) + a_k L_0 \Gamma_k(0) C^+ B_k(0)(u_d(0) - u_k(0)) + a_k L_0 \Gamma_k(0)\psi_k(0) - a_k L_0 \Gamma_k(0) w_k(1) \end{aligned} \qquad (A14)$$
where $\Gamma_k(0) = \mathrm{diag}\{\gamma_k^{(1)}(0), \ldots, \gamma_k^{(q)}(0)\}$ is a diagonal matrix, $C^+ B_k(0) \triangleq C(1)B(0, x_k(0))$, $\delta B_k(0) = B(0, x_d(0)) - B(0, x_k(0))$, and $\psi_k(0) = C(1)\delta f_k(0) + C(1)\delta B_k(0)u_d(0)$, which converges to zero as $k \to \infty$. Subtracting both sides of the last equation from $u_d(0)$ leads to
$$\delta u_{k+1}(0) = (I - a_k L_0 \Gamma_k(0) C^+ B_k(0))\,\delta u_k(0) - a_k L_0 \Gamma_k(0)\psi_k(0) + a_k L_0 \Gamma_k(0) w_k(1) \qquad (A15)$$
which is similar to Equation (A2). Set
$$\Psi_{i,j} \triangleq (I - a_i L_0 \Gamma_i(0) C^+ B_i(0)) \cdots (I - a_j L_0 \Gamma_j(0) C^+ B_j(0)), \quad i \ge j, \qquad \Psi_{i,i+1} \triangleq I$$
Compared with the proof of Theorem 3.1, to show the almost sure convergence for the MIMO case it suffices to verify that there exist suitable constants $c_5 > 0$ and $c_6 > 0$ such that
$$\|\Psi_{i,j}\| \le c_5 \exp\Big(-c_6 \sum_{l=j}^{i} a_l\Big), \quad i \ge j, \; j \ge 1 \qquad (A16)$$
Then, by applying the same steps as in the proof of Theorem 3.1, the estimates of all terms remain valid.

First, recall the property that there is a lower bound of $\gamma_k^{(i)}(0)$, $1 \le i \le q$, say $\nu$, for simplicity. Then we have $\Gamma_k(0) \ge \nu I$. Note that $C^+ B_k(0) \to C^+ B_d(0) = C(1)B(0, x_d(0))$, and that $L_0$ is designed such that all eigenvalues of $L_0 C^+ B_d(0)$ have positive real parts. Thus, according to Lyapunov stability theory for linear systems (see, for example, Chen, Lin, & Shamash, 2004, p.
37), for any given positive definite matrix Q, there exists a positive definite matrix P such that P( L 0 C + B d (0)) + ( L 0 C + B d (0)) T P = Q or equivalently P(L 0 C + B d (0)) + (L 0 C + B d (0)) T P = Q (A17) (A18) Consequently, we can choose Q = 2I andcomputethecorresponding positive definite matrix P. Since C + B (0) C + B d (0) as and Ɣ (0) is a diagonal matrix, there is a suitable 0 > 0suchthatfor 0 P( L 0 Ɣ (0)C + B (0)) + ( L 0 Ɣ (0)C + B (0)) T P νi (A19)

For simplicity, let us denote $H_k \triangleq L_0 \Gamma_k(0) C^{+}B_k(0)$. As a consequence, for $i \ge k_0$,

$$\begin{aligned}
\Phi_{i,j}^{T} P\, \Phi_{i,j} &= \Phi_{i-1,j}^{T} (I - a_i H_i)^{T} P (I - a_i H_i)\, \Phi_{i-1,j} \\
&= \Phi_{i-1,j}^{T} \big(P + a_i^2 H_i^{T} P H_i - a_i H_i^{T} P - a_i P H_i\big)\, \Phi_{i-1,j} \\
&\le \Phi_{i-1,j}^{T} \big(P + a_i^2 H_i^{T} P H_i - \nu a_i I\big)\, \Phi_{i-1,j} \\
&= \Phi_{i-1,j}^{T} P^{\frac12} \big(I - \nu a_i P^{-1} + a_i^2 P^{-\frac12} H_i^{T} P H_i P^{-\frac12}\big) P^{\frac12}\, \Phi_{i-1,j}.
\end{aligned} \tag{A20}$$

Without loss of generality, when $k_0$ is sufficiently large, it is guaranteed that, for $i \ge k_0$,

$$\big\|I - \nu a_i P^{-1} + a_i^2 P^{-\frac12} H_i^{T} P H_i P^{-\frac12}\big\| \le 1 - 2c_6 a_i \le \exp(-2c_6 a_i), \tag{A21}$$

by noticing the boundedness of $P^{-\frac12} H_i^{T} P H_i P^{-\frac12}$. Combining Equations (A20) and (A21) leads to

$$\Phi_{i,j}^{T} P\, \Phi_{i,j} \le c_7 \exp\Big(-2c_6 \sum_{k=j}^{i} a_k\Big) I$$

with a suitable constant $c_7 > 0$. Hence,

$$\|\Phi_{i,j}\| \le \lambda_{\min}^{-\frac12}(P)\, \sqrt{c_7}\, \exp\Big(-c_6 \sum_{k=j}^{i} a_k\Big). \tag{A22}$$

As a result, we define $c_5 \triangleq \lambda_{\min}^{-\frac12}(P)\sqrt{c_7}$; then the estimation in Equation (A16) is guaranteed. The remaining steps of this proof are completely the same as in the proof of Theorem 3.1 and thus are omitted to conserve space. The proof is completed.
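The gain design step used above, i.e., choosing $L_0$ so that all eigenvalues of $L_0 C^{+}B_d(0)$ have positive real parts and then solving the Lyapunov equation (A18) with $Q = 2I$, is straightforward to check numerically. The following minimal sketch does so in Python for a hypothetical gain $L_0$ and coupling matrix $C^{+}B_d(0)$ (both matrices are illustrative assumptions, not values from the paper), using SciPy's continuous Lyapunov solver:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical 2x2 data for illustration only.
CBd = np.array([[1.0, 0.2],
                [0.1, 0.8]])          # stands in for C^+ B_d(0)
L0 = np.eye(2)                        # candidate learning gain matrix

M = L0 @ CBd
assert np.all(np.linalg.eigvals(M).real > 0), "eigenvalue condition fails"

# Solve P (L0 C^+ B_d) + (L0 C^+ B_d)^T P = Q with Q = 2I, i.e. Eq. (A18).
# solve_continuous_lyapunov(a, q) returns X with a X + X a^T = q,
# so taking a = -M^T and q = -Q yields M^T P + P M = Q.
Q = 2.0 * np.eye(2)
P = solve_continuous_lyapunov(-M.T, -Q)

assert np.all(np.linalg.eigvalsh(P) > 0)    # P is positive definite
print(np.allclose(P @ M + M.T @ P, Q))      # verifies Eq. (A18)

By the Lyapunov theorem, positive definiteness of $P$ is guaranteed exactly when the eigenvalue condition on $M$ holds, which is what the first assertion checks.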


Systems & Control Letters 107 (2017) 9–16

Two novel iterative learning control schemes for systems with randomly varying trial lengths

Xuefang Li (a), Dong Shen (b,*)

(a) Department of Electrical and Electronic Engineering, Imperial College, London, SW7 2AZ, UK
(b) College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, PR China

Article history: Received 21 December 2016; Received in revised form 30 June 2017; Accepted 3 July 2017.
Keywords: Iterative learning control; Learning convergence; Randomly varying trial length.

Abstract: This paper proposes two novel improved iterative learning control (ILC) schemes for systems with randomly varying trial lengths. Different from the existing works on ILC with variable trial lengths, which advocate replacing the missing control information by zero, the proposed learning algorithms are equipped with a searching mechanism to collect useful but avoid redundant past tracking information, which could expedite the learning speed. The searching mechanism is realized by the newly defined stochastic variables and an iteratively-moving-average operator. The convergence of the proposed learning schemes is strictly proved based on the contraction mapping methodology. Two illustrative examples are provided to show the superiorities of the proposed approaches. © 2017 Elsevier B.V. All rights reserved.

(This work is supported by National Natural Science Foundation of China ( , ) and Beijing Natural Science Foundation ( ). * Corresponding author. E-mail addresses: xuefang.li@imperial.ac.uk (X. Li), shendong@mail.buct.edu.cn (D. Shen).)

1. Introduction

In our daily lives, one can complete a given task and improve the performance gradually provided that the operation is repeated. Such a process for a human being is usually called the learning process. Inspired by this basic cognition, iterative learning control (ILC) theory has been developed for systems that are able to complete tasks over a fixed time interval and perform them repeatedly. By synthesizing the control input from the previous control input and tracking error, the controller is able to learn from past experience and improve the current tracking performance. Since first introduced by Arimoto in the 1980s [1], ILC has attracted much attention from both scholars and engineers over the past three decades, and many achievements have been made [2–13].

When considering learning, a basic premise is that the desired task should be performed under the same conditions, such as an identical initial condition and an identical trial length for all iterations. In fact, such a premise has been assumed in most of the ILC literature. However, one may find that this assumption is commonly violated in many practical applications due to system uncertainties. That is, the trial lengths may vary in the iteration domain. For instance, [14–17] provided several practical systems that run repeatedly but whose trial lengths are not identical due to complex external environments. Specifically, [14] investigated the application of ILC to humanoid robots, where the gait problems were divided into phases defined by foot strike times, and the durations of the phases were usually not the same from iteration to iteration during the learning process. Moreover, two biomedical systems, including functional electrical stimulation for upper limb movement and for gait assistance, were introduced in [15–17].
Due to the unknown dynamics and related complex factors, the learning process might end early and start the next iteration. Another example is the trajectory tracking with output constraints on a lab-scale gantry crane given in [18]. When the output constraints were violated, the load was wound up and the trial was terminated, which results in variable pass lengths for ILC [18]. Motivated by these observations, the ILC problem with iteration-varying trial lengths has attracted more and more attention in recent years.

In the existing literature, there are some works addressing ILC design problems with non-uniform trial lengths from different technical perspectives [16–24]. First, Li et al. proposed an ILC framework for both discrete-time linear and continuous-time nonlinear systems with randomly varying trial lengths by introducing a stochastic variable to describe the randomness of trial lengths in [19] and [21], respectively. In [19], to deal with the randomly varying trial lengths, an iteration-average operator over all historical data was employed in the ILC algorithm to reduce the effect of the lost tracking information. In [21], instead of using all historical control information, an iteratively-moving-average operator is adopted in the ILC law, where only the most recent control information is utilized for learning, since older control information would reduce the corrective action from the most recent trials. Moreover, to avoid the utilization of the λ-norm, a lifted framework of ILC for discrete-time linear systems was provided
in [20]. However, it is worth noting that the convergence of the tracking errors in [19–21] is derived in the sense of mathematical expectation. Fortunately, there are some works showing stronger convergence properties of ILC with non-uniform trial lengths. For example, the almost sure and mean square convergence of a P-type ILC was established in [23,24]. Specifically, [23] considered a discrete linear system where the path statistic properties of the input error, namely the mathematical expectations and covariances, were first recursively calculated along the iteration axis. Based on these recursions, convergence in the sense of expectation, mean square, and almost sure was derived in sequence. In [24], the ILC design problem was extended to a class of affine nonlinear systems, where the techniques used in [23] were no longer applicable. Thus, a modified λ-norm and a technical lemma were introduced to pave the way for showing the almost sure convergence of the tracking error. Furthermore, Seel et al. also contributed much to this topic, where the main focus lies in the monotonic convergence property [15,18,22] and practical applications [16,17]. A primary result is given in [15], where the authors presented conditions on the learning gain matrix for ensuring monotonic convergence. However, the calculation of the learning gain relies upon a completely known system model, which restricts the applicability of the proposed algorithm. A similar technique was applied to the trajectory tracking problem of a lab-scale gantry crane in [18]. An extended version of the monotonic convergence result with more detailed explanations was reported in a recent paper [22]. Additionally, [16,17] apply ILC with variable pass lengths in functional electrical stimulation (FES)-based treatment systems for stroke patients. These two works also show that the addressed problem has great significance in real-time applications.

However, it is worthwhile to highlight that a common feature of the works [18–24] on ILC with non-uniform trial lengths is to replace the missing tracking error information with zero. That is, when the tracking information is not available due to varying trial lengths, the lacked data is set to zero. Therefore, how to develop a new ILC algorithm that is able to improve the control performance for systems with iteration-varying trial lengths is an interesting and challenging problem.

Motivated by the above observations, in this paper two novel improved ILC schemes are proposed for a class of discrete-time linear systems with randomly varying trial lengths. Different from the previous works on ILC with variable trial lengths that advocate replacing the missing tracking information by zero, the proposed learning algorithms are equipped with a searching mechanism to collect useful but avoid redundant past control information, which could expedite the learning speed. The searching mechanism is realized by introducing a new stochastic variable and an iteratively-moving-average operator. The aim and main contribution of this paper is to reduce the impact of the randomly varying trial lengths on the learning control algorithm and to expedite the convergence speed. To achieve this objective, two ILC laws are proposed.
More precisely, the first ILC scheme is proposed to reduce the redundant control information that appears in the design of the ILC laws in [19–21], while the second one is developed to make full use of the effective previous control information to further expedite the learning speed. In addition, the almost sure convergence of both ILC schemes is proved in a rigorous way.

The rest of the paper is organized as follows. Section 2 presents the problem formulation. Sections 3 and 4 contribute to the controller design and convergence analysis. Furthermore, numerical simulations are given in Section 5 to verify the validity of the proposed control algorithms. Section 6 draws a conclusion of this work.

Notations. $\mathbb{R}$ is the real set and $\mathbb{R}^n$ is the $n$-dimensional space. $\mathbb{N}$ is the set of positive integers. $\|\cdot\|$ denotes the Euclidean norm of its indicated vector or matrix. Denote by $\|f(t)\|_\lambda \triangleq \sup_{t \in \{0,1,2,\ldots,T\}} \alpha^{-\lambda t}\|f(t)\|$ and $\|f(t)\|_s \triangleq \sup_{t \in \{0,1,2,\ldots,T\}} \|f(t)\|$ the λ-norm and s-norm of a vector function $f(t)$, respectively, with $\lambda > 0$ and $\alpha > 1$.

2. Problem formulation

Consider the following discrete-time linear system:

$$x_k(t+1) = A x_k(t) + B u_k(t), \qquad y_k(t) = C x_k(t), \tag{1}$$

where $k \in \mathbb{N}$ is the iteration index, $t \in \{0, 1, 2, \ldots, T_k\}$ denotes the time instant, and $T_k$ is the trial length at the $k$th iteration. Moreover, $x_k(t) \in \mathbb{R}^n$, $u_k(t) \in \mathbb{R}^p$, and $y_k(t) \in \mathbb{R}^r$ denote the state, input, and output of the system (1), respectively. Furthermore, $A$, $B$ and $C$ are constant matrices with appropriate dimensions. It is worth pointing out that the results and convergence analysis in this paper can be extended to linear time-varying systems straightforwardly; thus we consider only the time-invariant case to clarify our idea.

Let $y_d(t)$, $t \in \{0, 1, 2, \ldots, T_d\}$, be the desired output trajectory. Assume that, for any realizable output trajectory $y_d(t)$, there exists a unique control input $u_d(t) \in \mathbb{R}^p$ such that

$$x_d(t+1) = A x_d(t) + B u_d(t), \qquad y_d(t) = C x_d(t), \tag{2}$$

where $u_d(t)$ is uniformly bounded for all $t \in \{0, 1, 2, \ldots, T_d\}$, with $T_d$ being the desired trial length. The control objective is to track the desired trajectory $y_d(t)$, $t \in \{0, 1, 2, \ldots, T_d\}$, by determining a sequence of control inputs $u_k$ such that the tracking error converges as the iteration number increases. Before addressing the controller design problem, the following assumptions are imposed.

A1. The coupling matrix $CB$ is of full-column rank.

A2. The initial states satisfy $\|x_d(0) - x_k(0)\| \le \epsilon$, $\epsilon > 0$.

Remark 1. The initial state resetting problem is one of the fundamental issues in the ILC field, as it is a standard assumption to ensure perfect tracking performance. In the past three decades, some papers have been devoted to removing this condition by developing additional control mechanisms, such as [25–27]. Under assumption A2, since the initial state is different from the desired initial state, it is impossible to achieve perfect tracking. Instead, the ILC algorithms should force the system output to be as close as possible to the target.

Remark 2. It is worth noting that, unlike classic ILC theory, which requires control tasks to repeat on a fixed time interval, the trial lengths $T_k$, $k \in \mathbb{N}$, are iteration-varying and may be different from the desired trial length $T_d$. In the case where the $k$th trial length is shorter than the desired trial length, both the system output and the tracking error information will be missing and cannot be used for learning.
Thus, this paper aims to re-design ILC schemes to make up the missing signals by making full use of the previously available tracking information, and thus expedite the learning speed. Although some previous works have been published [19–21], a basic assumption there is that the probability distribution of $T_k$ is known a priori. In this paper, the proposed ILC algorithms are equipped with an automatic searching mechanism, so the probability distribution of the randomly varying trial lengths is no longer required.
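As a concrete reading of the formulation above, the sketch below simulates one iteration of system (1) with a trial length $T_k$ drawn at random, so that outputs beyond $T_k$ are simply unobserved. The matrices and the distribution of $T_k$ are illustrative assumptions only (Section 5 later uses a discrete uniform distribution on {30, ..., 50}):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (A, B, C); CB = [[1.0]] has full-column rank, so A1 holds.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.5, 1.0]])
Td = 50

def run_iteration(u, Tk):
    """Simulate (1) for t = 0..Tk with x(0) = 0; return observed y_k(0..Tk)."""
    x = np.zeros(2)
    y = np.zeros(Tk + 1)
    for t in range(Tk + 1):
        y[t] = (C @ x).item()
        if t < Tk:
            x = A @ x + B[:, 0] * u[t]
    return y

Tk = int(rng.integers(30, 51))     # T_k ~ uniform on {30,...,50} (assumption)
u0 = np.zeros(Td)                  # u_0(t) = 0 at the first iteration
y = run_iteration(u0, Tk)
gamma = np.arange(Td + 1) <= Tk    # gamma_k(t) = 1 iff the output is observed
print(Tk, y.shape, int(gamma.sum()))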

3. Controller design I and convergence analysis

In this section, a novel ILC algorithm is developed to reduce the effect of the redundant tracking information present in the ILC algorithms of [19,21], and thus expedite the convergence speed.

Recall that $T_k$ is the trial length at the $k$th iteration; it varies randomly in the iteration domain. Denote $S_{t,k}^m \triangleq \{T_{k+1-j} \mid t > T_{k+1-j},\ j = 1, 2, \ldots, m\}$, where $m > 1$ is an integer. Let $n_{t,k} = |S_{t,k}^m|$ be the number of elements in the set $S_{t,k}^m$. That is, for a given time instant $t > 0$ and a given iteration $k$, there are only $m - n_{t,k}$ iterations with available tracking information among the past $m$ iterations, and the tracking information of the other iterations is missing. To illustrate the definition of the set $S_{t,k}^m$ and the number $n_{t,k}$, a simple example is shown in Fig. 1.

Fig. 1. Illustration of $S_{t,k}^m$: set $m = 5$ and $t = 9$; then $S_{t,k}^m = \{T_k, T_{k-3}, T_{k-4}\}$, which implies that $n_{t,k} = 3$ and $m - n_{t,k} = 2$.

It is worth pointing out that the number $n_{t,k}$ is a random variable due to the randomness of the trial lengths. If we denote the probability of the occurrence of the output at time instant $t$ as $p(t)$, the mathematical expectation of $n_{t,k}$ can be calculated as $E\{n_{t,k}\} = (1 - p(t))m$. Therefore, $m - n_{t,k}$ would increase to infinity as $m$ goes to infinity in the sense of mathematical expectation. This property guarantees the reasonability of the following assumption.

A3. For a given iteration number $k$ and a time instant $t \in \{0, 1, \ldots, T_k\}$, the number $m - n_{t,k} \ge 1$. That is, there exists at least one iteration whose trial length is larger than $t$ among the past $m$ consecutive iterations.

Remark 3. Assumption A3 is imposed to guarantee the learning effectiveness. If $m - n_{t,k} = 0$, then $T_{k+1-j} < t$ for $j = 1, 2, \ldots, m$; namely, all trial lengths of the $m$ iterations adjacent to the $(k+1)$th iteration are shorter than the given time $t$. This further means that at time instant $t$ there is no output information available and nothing can be learned from the past $m$ iterations. By assuming $m - n_{t,k} \ge 1$, an effective learning process can be guaranteed. This assumption is not restrictive, as $m - n_{t,k}$ would increase to infinity as $m$ goes to infinity. In addition, assumption A3 implies that the iteration-varying trial lengths are not totally stochastic in this section; this assumption will be further relaxed in Section 4.

Similar to [19–21], we introduce a stochastic variable $\gamma_k(t)$, $t \in \{0, 1, \ldots, T_d\}$, satisfying a Bernoulli distribution and taking the binary values 0 and 1. The relation $\gamma_k(t) = 1$ represents the event that the control process can continue to the time instant $t$ at the $k$th iteration, while $\gamma_k(t) = 0$ denotes the event that the control process cannot continue to the time instant $t$. Based on the notations above and A3, the first proposed ILC law is presented as follows:

$$(\mathrm{I}):\quad u_{k+1}(t) = \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, u_{k+1-j}(t) + \frac{1}{m - n_{t,k}}\, \Gamma \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, e_{k+1-j}(t+1), \tag{3}$$

where $\Gamma$ is the learning gain matrix to be determined, and $e_k \triangleq y_d - y_k$ represents the tracking error.

Remark 4. From (3), we can see that the stochastic variable $\gamma_k(t)$ is also adopted in the control input part, i.e., the first term on the right-hand side of (3). It implies that if the trial length is shorter than the given time $t$, both the corresponding input and tracking error signals will not be involved in the updating. The major difference between ILC law (3) and the ones in [19,21] lies in the fact that the average operator does not incorporate redundant control information into the learning law, and thus the convergence speed could be expedited.

Remark 5. The initial $m$ input and tracking error signals in the searching algorithm can be determined by using other control methods, such as classic feedback control that stabilizes the controlled system; they will not affect the final convergence performance.

The convergence of the proposed ILC scheme (3) can be summarized in the following theorem.

Theorem 1. Consider the system (1) and the ILC law (3), and assume that A1–A3 hold. If the condition

$$\|I - \Gamma CB\| \le \rho < 1 \tag{4}$$

holds, then the tracking error $e_k$ converges to the $\delta\epsilon$-neighborhood of zero asymptotically in the sense of the λ-norm as $k$ goes to infinity, where $\delta > 0$ is a suitable constant to be defined later.

Proof. Denote by $\Delta x_k = x_d - x_k$ the state error, $\Delta u_k = u_d - u_k$ the input error, and $e_k = y_d - y_k$ the tracking error. Subtracting both sides of the updating law (3) from $u_d$, we have

$$\Delta u_{k+1}(t) = \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \Delta u_{k+1-j}(t) - \frac{1}{m - n_{t,k}}\, \Gamma \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, e_{k+1-j}(t+1). \tag{5}$$

From (1) and (2), it follows that

$$e_k(t+1) = CA\, \Delta x_k(t) + CB\, \Delta u_k(t). \tag{6}$$

Substituting (6) into (5) implies

$$\begin{aligned}
\Delta u_{k+1}(t) &= \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \Delta u_{k+1-j}(t) - \frac{1}{m - n_{t,k}}\, \Gamma \sum_{j=1}^{m} \gamma_{k+1-j}(t) \big[CA\,\Delta x_{k+1-j}(t) + CB\,\Delta u_{k+1-j}(t)\big] \\
&= \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, (I - \Gamma CB)\, \Delta u_{k+1-j}(t) - \frac{1}{m - n_{t,k}}\, \Gamma CA \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \Delta x_{k+1-j}(t).
\end{aligned} \tag{7}$$

Since $\Delta x_k(t) = A^t \Delta x_k(0) + \sum_{n=0}^{t-1} A^{t-n-1} B\, \Delta u_k(n)$, it follows that

$$\begin{aligned}
\Delta u_{k+1}(t) &= \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, [I - \Gamma CB]\, \Delta u_{k+1-j}(t) - \frac{1}{m - n_{t,k}}\, \Gamma C A^{t+1} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \Delta x_{k+1-j}(0) \\
&\quad - \frac{1}{m - n_{t,k}}\, \Gamma CA \sum_{j=1}^{m} \gamma_{k+1-j}(t) \sum_{n=0}^{t-1} A^{t-n-1} B\, \Delta u_{k+1-j}(n).
\end{aligned} \tag{8}$$

Taking norms on both sides of (8), we obtain

$$\|\Delta u_{k+1}(t)\| \le \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) \|I - \Gamma CB\| \|\Delta u_{k+1-j}(t)\| + \kappa \alpha^{t+1} \epsilon + \frac{\kappa \beta}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) \sum_{n=0}^{t-1} \alpha^{t-n} \|\Delta u_{k+1-j}(n)\|, \tag{9}$$

where $\frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) = 1$, $\alpha \triangleq \|A\|$, $\beta \triangleq \|B\|$, and $\kappa \triangleq \|\Gamma\|\|C\|$ are applied. Multiplying both sides of (9) by $\alpha^{-\lambda t}$ and taking the supremum with respect to the time $t$, we have

$$\|\Delta u_{k+1}(t)\|_\lambda \le \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) \|I - \Gamma CB\| \|\Delta u_{k+1-j}(t)\|_\lambda + \kappa \alpha \epsilon + \frac{\kappa \beta}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) \sup_t \Big(\alpha^{-\lambda t} \sum_{n=0}^{t-1} \alpha^{t-n} \|\Delta u_{k+1-j}(n)\|\Big). \tag{10}$$

Note that

$$\begin{aligned}
\sup_t \Big(\alpha^{-\lambda t} \sum_{n=0}^{t-1} \alpha^{t-n} \|\Delta u_{k+1-j}(n)\|\Big)
&= \sup_t \Big(\alpha^{-(\lambda-1)t} \sum_{n=0}^{t-1} \big(\alpha^{-\lambda n} \|\Delta u_{k+1-j}(n)\|\big) \alpha^{(\lambda-1)n}\Big) \\
&\le \sup_t \Big(\alpha^{-(\lambda-1)t} \sup_n \big(\alpha^{-\lambda n} \|\Delta u_{k+1-j}(n)\|\big) \sum_{n=0}^{t-1} \alpha^{(\lambda-1)n}\Big) \\
&\le \frac{1 - \alpha^{-(\lambda-1)T_d}}{\alpha^{\lambda-1} - 1}\, \|\Delta u_{k+1-j}(t)\|_\lambda;
\end{aligned} \tag{11}$$

thus, (10) becomes

$$\|\Delta u_{k+1}(t)\|_\lambda \le \frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t)\, \rho_0 \|\Delta u_{k+1-j}(t)\|_\lambda + \kappa \alpha \epsilon \le \rho_0 \max_{j=1,2,\ldots,m} \|\Delta u_{k+1-j}(t)\|_\lambda + \kappa \alpha \epsilon, \tag{12}$$

where $\rho_0 \triangleq \|I - \Gamma CB\| + \kappa \beta \frac{1 - \alpha^{-(\lambda-1)T_d}}{\alpha^{\lambda-1} - 1}$ (which satisfies $\rho_0 < 1$ for a sufficiently large $\lambda$), and the equation $\frac{1}{m - n_{t,k}} \sum_{j=1}^{m} \gamma_{k+1-j}(t) = 1$ is applied.

Define $Q_{k+1} \triangleq \|\Delta u_{k+1}(t)\|_\lambda - \frac{\kappa \alpha \epsilon}{1 - \rho_0}$. From (12), it follows that

$$Q_{k+1} \le \rho_0 \max_{j=1,2,\ldots,m} Q_{k+1-j}. \tag{13}$$

If $Q_{k+1} \le 0$, it means that $\|\Delta u_{k+1}(t)\|_\lambda$ has entered the $\frac{\kappa \alpha \epsilon}{1 - \rho_0}$-neighborhood of zero and will stay in that neighborhood. Thus, to show the bounded convergence, it is sufficient to analyze the scenario with $Q_{k+1} > 0$. Similar to (13), we have $Q_{k+2} \le \rho_0 \max_{j=1,2,\ldots,m} Q_{k+2-j}$. Note that

$$\max_{j=1,2,\ldots,m} Q_{k+2-j} \le \max\Big\{\max_{j=2,\ldots,m} Q_{k+2-j},\ Q_{k+1}\Big\} \le \max\Big\{\max_{j=1,\ldots,m} Q_{k+1-j},\ \rho_0 \max_{j=1,\ldots,m} Q_{k+1-j}\Big\} = \max_{j=1,\ldots,m} Q_{k+1-j},$$

and it follows that $Q_{k+2} \le \rho_0 \max_{j=1,\ldots,m} Q_{k+1-j}$. By induction, we obtain $Q_{k+p} \le \rho_0 \max_{j=1,\ldots,m} Q_{k+1-j}$, $p = 1, 2, \ldots, m$. Therefore,

$$\max_{p=1,\ldots,m} Q_{k+p} \le \rho_0 \max_{j=1,\ldots,m} Q_{k+1-j}, \tag{14}$$

which implies the convergence of $\max_{p=1,\ldots,m} Q_{k+p}$, i.e., $\lim_{k\to\infty} \|\Delta u_k(t)\|_\lambda \le \frac{\kappa\alpha\epsilon}{1-\rho_0}$.

Moreover, $e_k(t) = CA^t \Delta x_k(0) + C \sum_{n=0}^{t-1} A^{t-n-1} B\, \Delta u_k(n)$. Taking the λ-norm on both sides of this equation gives

$$\|e_k(t)\|_\lambda \le c\epsilon + c\beta \frac{1 - \alpha^{-(\lambda-1)T_d}}{\alpha^{\lambda-1} - 1} \|\Delta u_k(t)\|_\lambda, \tag{15}$$

where $c = \|C\|$. Due to the convergence of $\|\Delta u_k(t)\|_\lambda$, we obtain the convergence of $\|e_k(t)\|_\lambda$ to a neighborhood of zero whose bound is proportional to $\epsilon$. In other words, there exists an appropriate $\delta > 0$ such that $\lim_{k\to\infty} \|e_k(t)\|_\lambda \le \delta\epsilon$. This completes the proof.

Remark 6. It is noted that the convergence condition given in Theorem 1, i.e., (4), is the same as that of classic ILC and is independent of the probability distribution of the randomly varying trial lengths. This is one of the advantages of the proposed ILC scheme, since it shows that the same convergence condition can be applied to deal with more complex control problems. Although the probability distribution of the trial length is not involved in (4), different probability distributions will lead to different convergence speeds. In detail, for a given time instant $t$, the greater the probability of the event $T_k \ge t$, the faster the convergence speed. This can be verified from (14): for a greater probability of $T_k \ge t$, we can select a smaller $m$, which indicates that (14) will converge faster. However, due to the lack of analysis tools, it is currently difficult to present an analytic expression relating the probability distribution to the convergence speed. This is an interesting problem and should be addressed in future work.

Remark 7. The choice of $m$ in the controller (3) depends on the length of the random interval for the trial length $T_k$. If the random interval is long, the trial length varies drastically in the iteration domain. In such a case, using more previous trials will expedite the convergence speed, because some of the missing information can be made up. If the random interval is short, meaning that the trial length in each iteration changes slightly and is close to the desired trial length, it is better to use a small number of previous trials. When the randomness is low, a large number of past trials may adversely weaken the learning effect, because the large averaging operation would reduce the corrective action from the most recent trials.

4. Controller design II and convergence analysis

In this section, we develop a new ILC law that makes full use of the previous control information and thus could expedite the learning speed. In order to facilitate the controller design, the following assumption is first imposed.

A4. For a given iteration number $k > m$ and a time instant $t \in \{0, 1, \ldots, T_k\}$, we can find $m$ past iterations such that $T_{k+1-r_{k,j}} > t$, $j = 1, 2, \ldots, m$, where $r_{k,j}$, $j = 1, 2, \ldots, m$, is an increasing sequence of integers with $r_{k,j} \ge 1$.

Fig. 2. Illustration of A4: set $m = 4$ and $t = 9$; then we can find that $T_{k-j} > t$ for $j = 1, 2, 5, 6$, which implies that $r_{k,1} = 2$, $r_{k,2} = 3$, $r_{k,3} = 6$, $r_{k,4} = 7$.

Remark 8. Assumption A4 is reasonable, since we can always find enough past iterations satisfying the assumption after a sufficiently large number of iterations; otherwise, the learning process cannot be guaranteed. In practice, only the first few iterations may fail to satisfy A4. For these iterations, we can adopt the control algorithm in Section 3 or the ones in [19–21] if necessary, which will not affect the convergence of the learning algorithm. A simple example of A4 is illustrated in Fig. 2.

Based on A4, the second proposed ILC law is given as follows:

$$(\mathrm{II}):\quad u_{k+1}(t) = \frac{1}{m} \sum_{j=1}^{m} u_{k+1-r_{k,j}}(t) + \frac{1}{m}\, \Gamma \sum_{j=1}^{m} e_{k+1-r_{k,j}}(t+1). \tag{16}$$

Remark 9. From (16), it can be found that $r_{k,j}$, $j = 1, 2, \ldots, m$, are random variables because of the randomness of the trial lengths. The introduction of these random variables actually forms the searching mechanism in the control algorithm. By fully searching for and utilizing the available tracking information, (16) is able to increase the convergence speed.

Remark 10. In this work, the ILC laws (3) and (16) are totally different. Based on A3, the searching mechanism in ILC law (3) is restricted to the last $m$ iterations; within them, (3) incorporates all the available information into the controller. The main advantage of the controller (3) is that too-old tracking information, which may weaken the correction from the latest iterations, can be avoided. However, the drawback is that the available historical information may be too scanty to improve the learning process if the probability of the occurrence of the full trial length is small. The ILC law (16), by contrast, keeps searching until $m$ available output signals are found. This controller is good at collecting all useful past control information, but information far away from the current iteration may degrade the learning performance. The comparison of these two ILC laws will be presented in the numerical examples.

The second main result of this paper is summarized in the following theorem.

Theorem 2. Consider system (1) and ILC law (16), and assume that A1, A2, and A4 hold. If the condition

$$\|I - \Gamma CB\| \le \rho < 1 \tag{17}$$

holds, then the tracking error $e_k$ converges to the $\delta\epsilon$-neighborhood of zero asymptotically in the sense of the λ-norm as $k$ goes to infinity, where $\delta > 0$ is a suitable constant.

Proof. For a given time instant $t$, let $G_t \triangleq \{T_k \mid T_k > t,\ k = 1, 2, \ldots\}$. Define a new sequence $1 \le \sigma_1 < \sigma_2 < \cdots < \sigma_i < \cdots$ and let $T_{\sigma_i}$ be the $i$th element of $G_t$; then $G_t$ can be represented as $G_t = \{T_{\sigma_1}, T_{\sigma_2}, \ldots, T_{\sigma_i}, \ldots\}$. For a given iteration number $k$, if $\sigma_i < k + 1 \le \sigma_{i+1}$ and $i \ge m$, by the definition of $G_t$ the ILC law (16) can be rewritten as

$$u_{k+1}(t) = \frac{1}{m} \sum_{j=1}^{m} u_{\sigma_{i+1-j}}(t) + \frac{1}{m}\, \Gamma \sum_{j=1}^{m} e_{\sigma_{i+1-j}}(t+1). \tag{18}$$

Moreover, the control input is not updated from iteration $\sigma_i + 1$ to $\sigma_{i+1}$, namely,

$$u_{\sigma_i+1}(t) = \cdots = u_{k+1}(t) = \cdots = u_{\sigma_{i+1}}(t). \tag{19}$$

Therefore, (18) and (19) imply that

$$u_{\sigma_{i+1}}(t) = \frac{1}{m} \sum_{j=1}^{m} u_{\sigma_{i+1-j}}(t) + \frac{1}{m}\, \Gamma \sum_{j=1}^{m} e_{\sigma_{i+1-j}}(t+1). \tag{20}$$

Hence, it is sufficient to prove the convergence of the input sequence $u_{\sigma_i}$, $i = 1, 2, \ldots$. Similar to the proof of Theorem 1, the following inequality can be obtained:

$$\|\Delta u_{\sigma_{i+1}}(t)\|_\lambda \le \rho_0 \max_{j=1,2,\ldots,m} \|\Delta u_{\sigma_{i+1-j}}(t)\|_\lambda + \kappa \alpha \epsilon. \tag{21}$$

Then, following the same procedure as in the latter part of the proof of Theorem 1, the convergence of the input sequence can be derived, $\lim_{i\to\infty} \|\Delta u_{\sigma_i}(t)\|_\lambda \le \frac{\kappa\alpha\epsilon}{1-\rho_0}$, which further gives $\lim_{k\to\infty} \|\Delta u_k(t)\|_\lambda \le \frac{\kappa\alpha\epsilon}{1-\rho_0}$. Finally, the convergence of the tracking error can be proved similarly to the proof of Theorem 1, i.e., $\lim_{k\to\infty} \|e_k(t)\|_\lambda \le \delta\epsilon$. The proof is thus completed.

Remark 11. The learning algorithm (16) is stochastic due to the randomness of $r_{k,j}$. The algorithm (16) may seem deterministic, but it is essentially stochastic because of the random selection of suitable iterations, which can be seen from the subscripts of the inputs and tracking errors. In addition, due to the introduction of randomly varying trial lengths, the convergence analysis in this paper uses a sequential contraction mapping, as can be seen from (14) and (21). The major difference between the two recursions lies in the fact that the sequential contraction in (14) is deterministic while that in (21) is stochastic.

Remark 12. Similar results and convergence analysis can be extended to linear time-varying systems, namely $A = A(t)$, $B = B(t)$ and $C = C(t)$, and to nonlinear systems with Lipschitz continuous uncertainties, without significant effort. For nonlinear systems without Lipschitz conditions, the composite energy function (CEF) would be an optional approach. However, CEF-based ILC design with iteration-varying trial lengths is still an open problem.

Remark 13. In the ILC field, the 2D approach is another preferable analysis tool. For instance, in [28] the 2D approach is applied to analyze the stability property of an inferential ILC, and in [29] a systematic procedure for ILC design using the 2D approach is developed. Therefore, investigating ILC with non-uniform trial lengths by the 2D method would be an interesting research topic. Although there is no work
reported on this topic in the literature, it is not difficult to reformulate the problem addressed in this paper in the 2D framework. Due to the variation of the trial lengths, the stochastic variable is still needed to modify the 2D variables when they are unavailable/missing. However, for nonlinear systems, it is difficult to apply the 2D approach.

5. Illustrative example

In order to show the effectiveness and superiority of the proposed ILC schemes, the same discrete-time linear system $(A, B, C)$ as in [19] is considered. Let the desired trajectory be $y_d(t) = \sin(2\pi t/50) + \sin(2\pi t/5) + \sin(50\pi t)$, $t \in I_d \triangleq \{0, 1, \ldots, 50\}$, and thus $T_d = 50$. Without loss of generality, set $u_0(t) = 0$, $t \in I_d$, in the first iteration. Moreover, assume that the trial length $T_k$ varies from 30 to 50 following a discrete uniform distribution. This assumption is just a simple illustration; for other kinds of probability distribution, the proposed ILC schemes still work well, since the probability distribution of the trial lengths is not involved in the convergence conditions (4) and (17), and the influence of the probability distribution has been discussed in Remark 6.

5.1. Simulations for ILC law (I)

In this subsection, let $m = 4$ and set the learning gain as $L = 0.5$, which renders $\|I - LCB\| = 0.5 < 1$. Firstly, we consider the case with the identical initial condition, i.e., $x_k(0) = [0, 0, 0]^T$, $\forall k \in \mathbb{N}$. The behavior of the maximal tracking error, $\|e_i\|_s \triangleq \sup_{t \in I_d} \|e_i\|$, is presented in Fig. 3, which shows that the maximal tracking error decreases rapidly within 100 iterations. Meanwhile, to show the effectiveness of the proposed ILC scheme, a comparison with the ILC law in [19] is also given in Fig. 3. It is obvious that, by removing the redundant control input signals from the control law, the proposed ILC scheme outperforms the one in [19]. In detail, it can be seen that the convergence of ILC law (I) is much faster and smoother, which could be more desirable in practical applications. It is noted that the oscillations in the tracking error profiles in Fig. 3 are due to the variation of the trial lengths.

Fig. 3. The maximal tracking error profiles of the proposed ILC scheme (I) and the one in [19] (i.e., Li et al., 2014) under the identical initial condition.

The tracking performance of the ILC law (I) at different iterations is shown in Fig. 4, where we can see that after 50 iterations the difference between $y_{50}$ and $y_d$ is almost invisible.

Fig. 4. The system outputs at the 1st, 5th and 50th iterations. The reference $y_d$ is given for comparison.

Furthermore, we consider the case with an iteration-varying initial condition, namely $x_k(0) = \epsilon_k [1, 1, 1]^T$, $\forall k \in \mathbb{N}$, with $\epsilon_k = 0.05\sin(k)$. The convergence of $\|e_i\|_s$ is given in Fig. 5. It is seen that after 50 iterations $\|e_i\|_s$ cannot be reduced further, and the final convergence bound is proportional to 0.05, which is the magnitude of $x_k(0)$.

Fig. 5. The convergence of the maximal tracking error of ILC law (I) without the identical initial condition.

5.2. Simulations for ILC law (II)

This subsection demonstrates the effectiveness of the proposed ILC scheme (II). Similar to Section 5.1, we select the learning gain $L = 0.5$. Let $m = 2$ and $x_k(0) = [0, 0, 0]^T$, $\forall k \in \mathbb{N}$. Fig. 6 shows the convergence of the maximal tracking error $\|e_i\|_s$. By comparing with the ILC law in [19], we can see that the proposed ILC algorithm (II) is able to expedite the convergence speed a hundredfold.

Fig. 6. The maximal tracking error profiles of the proposed ILC scheme (II) and the one in [19] (i.e., Li et al., 2014) under the identical initial condition.

Moreover, Fig. 7 shows the comparison between the proposed ILC schemes (I) and (II). It is found that ILC (I) presents a smoother tracking performance, while ILC (II) wins by a faster convergence speed. The reason is that ILC (II) incorporates more historical learning information into the updating and thereby expedites the convergence speed; however, utilizing some older control information in the algorithm may lead to oscillation in the tracking performance. Therefore, which algorithm should be chosen depends entirely on the control targets.

Fig. 7. The comparison of the proposed ILC algorithms (I) and (II).

If the identical initial condition is not satisfied, e.g., $x_k(0) = \epsilon_k [1, 1, 1]^T$, $\forall k \in \mathbb{N}$, the ILC law (16) still works well, as shown in Fig. 8, at the cost of convergence accuracy.

Fig. 8. The convergence of the maximal tracking error of ILC law (II) without the identical initial condition.
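To make the searching mechanism of law (3) concrete, the sketch below implements ILC law (I) on a hypothetical scalar system; the paper's third-order example matrices did not survive transcription, so the system, gain, and trial-length distribution here are illustrative assumptions only. For each time instant t, the update averages over the trials among the last m whose length exceeds t, which is exactly what the indicator $\gamma_{k+1-j}(t)$ selects (we read availability of $e(t+1)$ as requiring $T > t$):

import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.5, 1.0, 1.0           # hypothetical scalar (A, B, C); |1 - L*c*b| < 1
L, m, Td, K = 0.5, 4, 50, 100     # learning gain, window, horizon, iterations
t_grid = np.arange(Td + 1)
yd = np.sin(2 * np.pi * t_grid / 50) + np.sin(2 * np.pi * t_grid / 5)

def simulate(u):
    """System (1): returns y(0..Td) for input u(0..Td-1), with x(0) = 0."""
    x, y = 0.0, np.zeros(Td + 1)
    for t in range(Td):
        y[t] = c * x
        x = a * x + b * u[t]
    y[Td] = c * x
    return y

us, es, Ts = [], [], []
u = np.zeros(Td)                              # u_0(t) = 0
for k in range(K):
    Ts.append(int(rng.integers(30, 51)))      # T_k ~ uniform{30,...,50}
    e = yd - simulate(u)                      # error, observed only up to T_k
    us.append(u.copy()); es.append(e.copy())
    u_next = np.empty(Td)
    for t in range(Td):
        # trials among the last m with T_{k+1-j} > t, i.e. gamma = 1
        avail = [j for j in range(1, min(m, len(Ts)) + 1) if Ts[-j] > t]
        if not avail:                         # excluded by A3; hold as fallback
            u_next[t] = u[t]
            continue
        # law (3): average of past inputs plus L times past errors at t+1
        u_next[t] = np.mean([us[-j][t] + L * es[-j][t + 1] for j in avail])
    u = u_next
print(np.abs(yd - simulate(u)).max())         # maximal tracking error after K trials

With these numbers the contraction condition (4) holds ($|1 - LCB| = 0.5$), so the final error is small, mirroring the qualitative behavior reported in Fig. 3.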

6. Conclusion

This paper presents two novel improved ILC schemes for systems with randomly varying trial lengths. To improve the control performance under iteration-varying trial lengths, the proposed learning algorithms are equipped with a searching mechanism to collect useful but avoid redundant past control information, which is able to expedite the learning speed. The searching mechanism is realized by introducing newly defined stochastic variables and an iteratively-moving-average operator. The convergence of the proposed learning schemes is analyzed via the contraction mapping methodology. Moreover, the efficiency of the proposed ILC schemes is verified by numerical examples. Extensions to nonlinear systems and design frameworks of ILC with other imperfect learning conditions will be investigated in the next research phase.

References

[1] S. Arimoto, S. Kawamura, F. Miyazaki, Bettering operation of robots by learning, J. Robot. Syst. 1 (2) (1984).
[2] D.A. Bristow, M. Tharayil, A.G. Alleyne, A survey of iterative learning control: a learning-based method for high-performance tracking control, IEEE Control Syst. Mag. 26 (3) (2006).
[3] H.-S. Ahn, Y.Q. Chen, K.L. Moore, Iterative learning control: survey and categorization from 1998 to 2004, IEEE Trans. Syst. Man Cybern. C 37 (6) (2007).
[4] D. Shen, Y. Wang, Survey on stochastic iterative learning control, J. Process Control 24 (12) (2014).
[5] D. Huang, J.-X. Xu, V. Venkataramanan, T.C. Tuong Huynh, High performance tracking of piezoelectric positioning stage using current-cycle iterative learning control with gain scheduling, IEEE Trans. Ind. Electron. 61 (2) (2014).
[6] D. Huang, J.-X. Xu, X. Li, C. Xu, M. Yu, D-type anticipatory iterative learning control for a class of inhomogeneous heat equations, Automatica 49 (2013).
[7] X. Li, D. Huang, B. Chu, J.-X. Xu, Robust iterative learning control for systems with norm-bounded uncertainties, Internat. J. Robust Nonlinear Control 26 (2016).
[8] D. Shen, Y. Xu, Iterative learning control for discrete-time stochastic systems with quantized information, IEEE/CAA J. Autom. Sin. 3 (1) (2016).
[9] X. Bu, Z. Hou, S. Jin, R. Chi, An iterative learning control design approach for networked control systems with data dropouts, Internat. J. Robust Nonlinear Control 26 (2016).
[10] D. Shen, J.-X. Xu, A novel Markov chain based ILC analysis for linear stochastic systems under general data dropouts environments, IEEE Trans. Autom. Control (2017).
[11] D. Meng, K.L. Moore, Learning to cooperate: networks of formation agents with switching topologies, Automatica 64 (2016).
[12] T.D. Son, G. Pipeleers, J. Swevers, Robust monotonic convergent iterative learning control, IEEE Trans. Automat. Control 61 (4) (2016).
[13] X. Li, Q. Ren, J.-X. Xu, Precise speed tracking control of a robotic fish via iterative learning control, IEEE Trans. Ind. Electron. 63 (4) (2016).
[14] R.W. Longman, K.D. Mombaur, Investigating the use of iterative learning control and repetitive control to implement periodic gaits, Lecture Notes in Control and Inform. Sci. 340 (2014).
[15] T. Seel, T. Schauer, J. Raisch, Iterative learning control for variable pass length systems, in: Proceedings of the 18th IFAC World Congress, August 28–September 2, Milano, Italy, 2011.
[16] T. Seel, C. Werner, T. Schauer, The adaptive drop foot stimulator – Multivariable learning control of foot pitch and roll motion in paretic gait, Med. Eng. Phys. 38 (11) (2016).
[17] T. Seel, C. Werner, J. Raisch, T. Schauer, Iterative learning control of a drop foot neuroprosthesis – Generating physiological foot motion in paretic gait by automatic feedback control, Control Eng. Pract. 48 (2016).
[18] M. Guth, T. Seel, J. Raisch, Iterative learning control with variable pass length applied to trajectory tracking on a crane with output constraints, in: Proceedings of the 52nd IEEE Conference on Decision and Control, Florence, Italy, 2013.
[19] X. Li, J.-X. Xu, D. Huang, An iterative learning control approach for linear systems with randomly varying trial lengths, IEEE Trans. Automat. Control 59 (7) (2014).
[20] X. Li, J.-X. Xu, Lifted system framework for learning control with different trial lengths, Int. J. Autom. Comput. 12 (3) (2015).
[21] X. Li, J.-X. Xu, D. Huang, Iterative learning control for nonlinear dynamic systems with randomly varying trial lengths, Internat. J. Adapt. Control Signal Process. 29 (11) (2015).
[22] T. Seel, T. Schauer, J. Raisch, Monotonic convergence of iterative learning control systems with variable pass length, Internat. J. Control 90 (3) (2017).
[23] D. Shen, W. Zhang, Y. Wang, C.-J. Chien, On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths, Automatica 63 (1) (2016).

[24] D. Shen, W. Zhang, J.-X. Xu, Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths, Systems Control Lett. 96 (2016).
[25] Y. Chen, C. Wen, Z. Gong, M. Sun, An iterative learning controller with initial state learning, IEEE Trans. Automat. Control 44 (2) (1999).
[26] M. Sun, D. Wang, Iterative learning control with initial rectifying action, Automatica 38 (7) (2002).
[27] J.-X. Xu, R. Yan, On initial conditions in iterative learning control, IEEE Trans. Automat. Control 50 (9) (2005).
[28] J. Bolder, T. Oomen, Inferential iterative learning control: A 2D-system approach, Automatica 71 (2016).
[29] W. Paszke, E. Rogers, K. Galkowski, Experimentally verified generalized KYP lemma based iterative learning control design, Control Eng. Pract. 53 (2016).

Journal of the Franklin Institute 354 (2017)

Learning control for linear systems under general data dropouts at both measurement and actuator sides: A Markov chain approach

Dong Shen, Yanqiong Jin, Yun Xu
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, PR China

Received 28 November 2016; received in revised form 21 March 2017; accepted 17 May 2017; available online 9 June 2017.

Abstract: This paper contributes to the convergence analysis of iterative learning control for linear systems under general data dropouts at both measurement and actuator sides. Using a simple compensation mechanism for the dropped data, the sample path behavior along the iteration axis is first analyzed and formulated as a Markov chain. Based on the Markov chain, the recursion of the input error is reformulated as a switching system, and a novel convergence proof is then established in the almost sure sense under mild design conditions. Illustrative examples are provided to verify the theoretical results. © 2017 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.

(This work was supported by National Natural Science Foundation of China ( , ) and Beijing Natural Science Foundation ( ). Corresponding author. E-mail address: shendong@mail.buct.edu.cn (D. Shen).)

1. Introduction

In practical industrial processes, a large class of systems accomplish given tasks over a finite time interval and repeat the process continuously. Such systems are expected to achieve highly precise tracking performance during the repetition. To this end, the human
learning idea is introduced for the control of repetitive systems, which motivates the development of iterative learning control (ILC) [1–9]. In ILC, the controller generates the input signal for the current iteration based on the tracking information and input signals of previous iterations, so that the tracking performance can be gradually improved along the iteration axis. ILC has the advantage of a simple but very effective control structure.

Moreover, an increasing number of systems, such as large-scale systems, have recently adopted the networked control framework to enhance flexibility and robustness. In the networked control framework, the system and controller are usually separated and communicate with each other through wired or wireless networks. For such kinds of systems, data may be dropped during transmission due to various unpredictable factors. This problem has become one of the hot topics for practical applications because it may damage the tracking performance [10–12]. This observation motivates researchers to consider ILC with the data being transmitted through wired/wireless networks.

In ILC, there exist cases where we have to implement the system in a networked control structure. For example, ILC has been successfully applied to the control of a two-link robot fish in [8]. In this application, the learning process is completed by an individual computer, and the generated signals have to be transmitted through wireless networks to the robot fish. Moreover, learning control can be applied to unmanned aerial vehicles (UAVs) surveilling a specified area; in this case, the UAVs can access the command signals only through wireless networks. Another similar example is the trajectory-keeping control in satellite formation flying [13]. In these applications, the data transmission between the plant and the learning controller has to use wireless networks, whence data dropouts may occur due to link congestion and limited transmission bandwidth, among other factors. Therefore, it is of great demand to consider the design and analysis of ILC under random data dropout environments. Indeed, several pioneering works in the ILC field have been conducted to solve this problem from different perspectives and have proposed several design and analysis techniques. However, the results are far from complete and many open problems remain.

The first attempt on ILC under data dropouts was reported by Ahn et al. in a series of conference papers [14–16]. In these papers, a Bernoulli model was used to describe the random data dropouts, a Kalman-filtering-based technique was applied to conduct the convergence analysis, and the mean square convergence of the input sequence was thus obtained. Moreover, several subsequent papers [17–20] provided a mathematical-expectation-based convergence analysis of the ILC algorithm, where random factors were first eliminated by taking the mathematical expectation of the evolution dynamics and the convergence was then analyzed in a deterministic way. The first almost sure convergence result was given in [21,22], where the data dropout was modeled by a stochastic sequence with a bounded-length requirement on successive data dropouts; stochastic approximation based techniques were then employed to derive the convergence analysis.
To recap, the above publications indeed paved promising roads for solving the ILC problem under random data dropouts; however, we should note that all the above discussions are limited to a special case in which the data transmission was assumed to be randomly dropped only from the plant to the controller (we call it the measurement side hereafter) [14–22]. In other words, the wired/wireless networks transmitting the input signal from the controller to the plant (we call it the actuator side hereafter) are assumed to work well. It is apparent that such a setting is impractical for applications, because the transmission of the measured data and of the input signals usually employs the same networks. This motivates us to further consider the ILC problem under general data
dropout environments, by which we mean that the networks at both the measurement and actuator sides suffer from data dropouts simultaneously.

It is a nontrivial procedure to extend the study from the one-side dropout case to the general data dropout case, although some papers may claim otherwise. In fact, the general case of data dropouts is seldom addressed because of the essential difficulty in the design and analysis of the compensation mechanism for the dropped data. Some attempts were given in [24–27], which provided two major compensation mechanisms for the lost data. In [24,25], when the data packet at a time instant is dropped, it is compensated with the data packet one time instant earlier within the same iteration. Thus, it is required that the data at adjacent time instants not be dropped simultaneously; that is, this compensation mechanism excludes successive data dropouts along the time axis. In [26,27], when a data packet is dropped, the data packet at the same time instant but in the previous iteration is used to compensate for the lost data. As a consequence, the data packets from adjacent iterations at the same time instant cannot be dropped simultaneously; that is, this compensation mechanism excludes successive data dropouts along the iteration axis. In short, [24–27] failed to address the random successive dropout problem along the time and/or iteration axes. The inherent reason is that a deterministic compensation mechanism was used in [24–27] to facilitate the convergence analysis, and therefore the data dropouts could not exhibit genuine randomness.

Comparing the results in the one-side dropout case [14–22] and the two-side dropout case [24–27], we observe that the major difficulty is twofold. On the one hand, the assumption that data drop only at the measurement side guarantees that the input generated by the learning controller is always fed to the system successfully, so that these two signals are identical at each iteration. In the general case, due to the data dropouts at the actuator side, this synchronization is no longer valid. In other words, an additional random asynchronism arises between the input generated by the controller and the one used by the system. On the other hand, all the papers [14–22] chose the intermittent update mechanism to deal with the data dropout problem, in which the algorithm simply sets the dropped data to zero, i.e., no data is compensated for the lost part. However, such a mechanism is not suitable for data dropouts at the actuator side. A recent paper [23] showed that the tracking performance can be seriously damaged if the intermittent update mechanism is applied at the actuator side. In consideration of these two points, it is of great value to propose a novel analysis method addressing the various sources of randomness in the general case, which therefore motivates this research.

When considering the general successive data dropout case, the main difficulty is to handle unknown random factors. In particular, besides the random data dropouts, there exists an additional asynchronism between the input generated by the controller and the one fed to the system, which distinguishes this paper from the previous papers [24–27]. In this paper, through a careful analysis of sample path behaviors, the newly introduced asynchronism is formulated as a Markov chain, which paves a novel way to establish the convergence.
The recursion of the input is then converted into a switching system based on the newly introduced Markov chain. A novel convergence proof is provided in the almost sure sense based on recursive computation of the mathematical expectation of the input error sequence. It should be emphasized that both the formulation of the Markov chain and the associated convergence analysis have not been reported in existing papers; they constitute the primary contributions and technical novelties of this paper. In addition, while we restrict our discussion to the P-type learning law for concise expressions, the results can be extended to other types of learning laws, because the Markov property is established according to the asynchronism at
the actuator side and is thus independent of specific learning laws. To the best knowledge of the authors, this paper not only provides a first study on ILC under general data dropout environments, which admit successive data dropouts along both the time and iteration axes, but also develops a framework for the design and analysis of ILC laws against various random factors. In short, the novelties of this paper are as follows: (1) we propose a general model of the two-side data dropout problem, which admits successive dropouts along both the iteration and time axes; (2) we establish a novel Markov chain model of the asynchronism between the computed input and the real input; and (3) we give a recursive computation of the expectation of the input error sequence and show the zero-error convergence in the almost sure sense.

The paper is arranged as follows: Section 2 presents the problem formulation; Section 3 shows the main result of this paper; Section 4 provides two illustrative simulations to verify the theoretical results; concluding remarks are given in Section 5.

Notations: $\mathbb{R}$ is the real number field and $\mathbb{R}^n$ is the $n$-dimensional real space. $E$ denotes the mathematical expectation. $P$ denotes the probability of the indicated event. $\rho(M)$ denotes the spectral radius of a matrix $M$. $I_{n\times n}$ denotes the unit matrix of dimension $n \times n$; the subscript $n \times n$ may be omitted when no misunderstanding is caused.

2. Problem formulation

2.1. System presentation and problem statement

Consider the lifted causal system

$$y_k = H u_k + y_k(0), \tag{1}$$

where $u_k \in \mathbb{R}^{pN}$ and $y_k \in \mathbb{R}^{qN}$ denote the lifted input vector and output vector of an iteration, respectively, and $y_k(0)$ denotes the initial response of each iteration. Here $k$ is the iteration index, $k = 1, 2, \ldots$, and $N$ denotes the iteration length. $p$ and $q$ denote the input dimension and output dimension, respectively. $H \in \mathbb{R}^{qN \times pN}$ is the system matrix, which is a lower triangular block matrix because of the causal relationship between the input and output. It is usually formulated as

$$H = \begin{bmatrix}
H_{1,1} & 0 & 0 & \cdots & 0 \\
H_{2,1} & H_{2,2} & 0 & \cdots & 0 \\
H_{3,1} & H_{3,2} & H_{3,3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
H_{N,1} & H_{N,2} & H_{N,3} & \cdots & H_{N,N}
\end{bmatrix}, \tag{2}$$

where $H_{i,j} \in \mathbb{R}^{q \times p}$ are Markov parameters [28–30]. As a special case, for the linear time-invariant (LTI) system described by the state space representation $(A, B, C)$ with relative degree one, the diagonal parameter $H_{i,i}$ is $CB$, while the off-diagonal parameter $H_{i,j}$, $i > j$, is usually computed as $CA^{i-j}B$.

Remark 1. In this paper, to make our idea clear and easy to understand, we directly employ the lifted model of a discrete-time system, i.e., Eq. (1). This formulation saves notation without any loss of generality compared with the traditional state space model. Such a model has been used in many previous studies, especially when considering the LTI
system. In addition, the proposed control laws can be applied to nonlinear systems, but the convergence analysis would be quite different and thus will be detailed in another paper.

The desired trajectory is given as $y_d$. The tracking error is denoted by $e_k \triangleq y_d - y_k$.

A1. For the desired trajectory, a unique input $u_d$ exists such that $y_d = H u_d + y_d(0)$.

A2. The system can be reset accurately for each iteration, i.e., $y_k(0) = y_d(0)$, $\forall k$.

Remark 2. Assumption A1 is imposed mainly to guarantee the convergence of the input sequence generated by the proposed algorithms. If this assumption is not satisfied, only the convergence of the tracking error can be derived. Indeed, A1 is guaranteed as long as $H$ is of full-column rank. As an illustration, when considering the LTI system $(A, B, C)$ with relative degree one, this requirement is satisfied if the input-output coupling matrix $CB$ is of full-column rank. If the relative degree is larger than one, A1 can be ensured by a suitable adjustment of $H$, similar to [29]. Assumption A2 is the requirement on the initial state condition. It is worth pointing out that many efforts have been made to remove or relax this condition, such as [9]; however, the progress is limited. In papers addressing the consensus problem by ILC, such as [31,32], these assumptions are relaxed, and the relaxations can be referenced in the study of networked ILC. In this paper, our critical objective is to handle the general data dropout problem and the involved asynchronism of the inputs; thus we simply assume A1 and A2 to keep our exposition focused.

In this paper, the general network framework is considered; that is, the networks at both the measurement and actuator sides suffer from random data dropouts. The data dropouts are modeled by two random variables, $\sigma_k$ and $\gamma_k$, subject to Bernoulli distributions for the two sides, respectively. In other words, both $\sigma_k$ and $\gamma_k$ are equal to 1 if the corresponding data are transmitted successfully, and 0 otherwise. Moreover, $P(\sigma_k = 1) = \overline{\sigma}$ and $P(\gamma_k = 1) = \overline{\gamma}$, with $0 < \overline{\sigma}, \overline{\gamma} < 1$. For a clear explanation of the critical idea, the data of each iteration is assumed to be transmitted as one package, so that the statements are concise; the extension to the case in which the data at different time instants are transmitted separately will be detailed at the end of the next section.

Remark 3. Generally, the condition $0 < \overline{\sigma}, \overline{\gamma} < 1$ implies that random data dropouts indeed exist but the network is not completely broken. If $\overline{\sigma} = 0$ or $\overline{\gamma} = 0$, the network at the measurement side or the actuator side would be completely broken; in such a case, no data can be successfully transmitted, and thus it is impossible to achieve perfect tracking. If $\overline{\gamma} = 1$, the problem reduces to the traditional one-side data dropout case, which has been well addressed in [14–22]. If $\overline{\sigma} = 1$, the network at the measurement side works well; then we only have to discuss the effect of the data dropouts at the actuator side, and the discussions below remain valid for this case.

Now we can give our problem statement as follows.

Problem statement: the objective of this paper is to design a suitable compensation mechanism for the dropped data, reveal the inherent character of the asynchronism between the input generated by the learning controller and the input fed to the system, and show the almost sure convergence of the input sequence to the desired input.
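To illustrate how the lifted model (1)-(2) is obtained in practice, the sketch below builds $H$ for an LTI system $(A, B, C)$ with relative degree one, so that $H_{i,i} = CB$ and $H_{i,j} = CA^{i-j}B$ for $i > j$. The specific matrices are hypothetical placeholders used only to exercise the construction:

import numpy as np

def lifted_matrix(A, B, C, N):
    """Lower block-triangular H of Eq. (2): H_{i,i} = CB, H_{i,j} = C A^{i-j} B."""
    q, p = C.shape[0], B.shape[1]
    H = np.zeros((q * N, p * N))
    M = C                                    # M = C A^d for the current offset d
    for d in range(N):
        blk = M @ B                          # the block on the d-th sub-diagonal
        for i in range(d, N):
            H[i * q:(i + 1) * q, (i - d) * p:(i - d + 1) * p] = blk
        M = M @ A
    return H

# Hypothetical second-order SISO example; CB = 1 is full-column rank, so A1 holds.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 1.0]])
H = lifted_matrix(A, B, C, N=4)
print(H)    # lower triangular with CB = 1 on the diagonal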

Fig. 1. Block diagram of the proposed ILC framework.

2.2. ILC algorithms

The block diagram of the control structure is illustrated in Fig. 1. For the learning controller, if the data are successfully transmitted at the measurement side, the algorithm updates its input signal; if the data are lost during transmission at the measurement side, the algorithm stops updating and retains the input signal of the previous iteration. At the actuator side, if the input signal is successfully transmitted, the plant uses this new input signal; if the input signal is lost, the plant operates the process using the stored input signal of the previous iteration.

In Fig. 1, $u_k^c$ and $u_k^r$ denote the computed control signal and the actually used control signal at the $k$th iteration, respectively. In the following, $u_k^c$ and $u_k^r$ are called the computed control and the real control, respectively. The learning algorithm for the computed control is formulated as

$$u_{k+1}^c = \sigma_{k+1} u_k^r + (1 - \sigma_{k+1}) u_k^c + \sigma_{k+1} L e_k. \tag{3}$$

The actually used control signal for the $(k+1)$th iteration is

$$u_{k+1}^r = \gamma_{k+1} u_{k+1}^c + (1 - \gamma_{k+1}) u_k^r. \tag{4}$$

From Eq. (3), it is noticed that if $\sigma_{k+1} = 1$, i.e., the data are successfully transmitted, the computed control is updated; otherwise, if $\sigma_{k+1} = 0$, the computed control copies the value of the previous iteration. It should be noted that such a copy may persist for successive iterations. On the other hand, it is noticed from Eq. (4) that if $\gamma_{k+1} = 1$, the real control is successfully updated with the latest computed control; otherwise, if $\gamma_{k+1} = 0$, the real control retains its previous value and the operation is thus repeated.

Remark 4. The learning algorithm is updated as a whole; that is, the input signal of the entire iteration is lifted and updated in Eq. (3). One may question the computational load of this algorithm when the iteration length $N$ is large. We should therefore explain that the learning algorithm (3) is formulated this way only for the convenience of the subsequent technical analysis. For practical applications, the algorithm can be divided into a time-domain-based form, as we will design the learning gain matrix $L$ in diagonal form; thus no computational problem is involved.

Remark 5. From Eqs. (3) and (4), one observes that the data of each iteration are transmitted as one package. As a matter of fact, the data dropout variable can be appended to each
In this case, the scalar dropout variables $\sigma_k$ and $\gamma_k$ are replaced by matrices $\Sigma_k$ and $\Gamma_k$, defined by $\Sigma_k = \mathrm{diag}\{\sigma_k^1, \sigma_k^2, \ldots, \sigma_k^N\}$ and $\Gamma_k = \mathrm{diag}\{\gamma_k^1, \gamma_k^2, \ldots, \gamma_k^N\}$, where $\sigma_k^i$ and $\gamma_k^j$ describe the random data dropouts at the measurement and actuator sides, respectively. The following results remain valid but require more complex derivations; the extensions are detailed at the end of the next section.

3. Main results

In this section, we give the convergence analysis of the proposed algorithms (3) and (4). The technical roadmap is as follows. We first establish a novel Markov chain model of the states of the computed input and the real input. We then derive an equivalent switching model of the input renewal process. Further, we give recursive estimates of the mathematical expectation of the input error sequence and show zero-error convergence of the input sequence by a novel method. The extension to the time-dependent data dropout case is detailed at the end of this section.

3.1. Markov chain model

In this subsection, we give a new perspective for understanding the relationship between $u_k^c$ and $u_k^r$. To this end, subtracting both sides of Eq. (3) from the desired input $u_d$ gives

$$u_d - u_{k+1}^c = u_d - \big[\sigma_{k+1} u_k^r + (1 - \sigma_{k+1}) u_k^c + \sigma_{k+1} L e_k\big] = \sigma_{k+1}(u_d - u_k^r) + (1 - \sigma_{k+1})(u_d - u_k^c) - \sigma_{k+1} L H (u_d - u_k^r) = \sigma_{k+1}(I - LH)(u_d - u_k^r) + (1 - \sigma_{k+1})(u_d - u_k^c),$$

where $e_k = H(u_d - u_k^r)$ is used. Denote $\delta u_k^r \triangleq u_d - u_k^r$ and $\delta u_k^c \triangleq u_d - u_k^c$. Then we have

$$\delta u_{k+1}^c = \sigma_{k+1}(I - LH)\,\delta u_k^r + (1 - \sigma_{k+1})\,\delta u_k^c. \qquad (5)$$

Similarly, subtracting both sides of Eq. (4) from $u_d$ yields

$$\delta u_{k+1}^r = \gamma_{k+1}\,\delta u_{k+1}^c + (1 - \gamma_{k+1})\,\delta u_k^r. \qquad (6)$$

Substituting Eq. (5) into Eq. (6), we have

$$\delta u_{k+1}^r = \big(I - \gamma_{k+1} I + \gamma_{k+1}\sigma_{k+1}(I - LH)\big)\,\delta u_k^r + \gamma_{k+1}(1 - \sigma_{k+1})\,\delta u_k^c. \qquad (7)$$

We first show the inherent Markov character of the sample behaviors of $\delta u_k^c$ and $\delta u_k^r$ as follows.

Lemma 1. The input errors $\delta u_k^c$ and $\delta u_k^r$ generated by Eqs. (3) and (4) form a Markov chain.

Proof. The computed control and the real control are evidently updated to the same new input signal when $\sigma_k = \gamma_k = 1$. We first check the sample path behavior of the learning and updating process. In the following, the path behavior is called synchronization if the computed control and the real control are equal to each other, and asynchronization otherwise. Moreover, it is called a renewal if both controls are in a state of synchronization but differ from their last synchronization.

We start from the $k$th iteration, where $\sigma_k = \gamma_k = 1$ and therefore $\delta u_k^r = \delta u_k^c$; that is, the computed control and the real control are synchronized at the $k$th iteration. For the $(k+1)$th iteration, four possible outcomes exist.

Case 1: $\sigma_{k+1} = 0$ and $\gamma_{k+1} = 1$. From Eqs. (5) and (7), $\delta u_{k+1}^c = \delta u_k^c = \delta u_k^r$ and $\delta u_{k+1}^r = \delta u_{k+1}^c = \delta u_k^r$; both controls retain the same status as at the $k$th iteration.

Case 2: $\sigma_{k+1} = 0$ and $\gamma_{k+1} = 0$. Obviously $\delta u_{k+1}^c = \delta u_k^c = \delta u_k^r$ and $\delta u_{k+1}^r = \delta u_k^r$; neither control changes.

Case 3: $\sigma_{k+1} = 1$ and $\gamma_{k+1} = 1$. We find $\delta u_{k+1}^c = (I - LH)\,\delta u_k^r$ and $\delta u_{k+1}^r = \delta u_{k+1}^c = (I - LH)\,\delta u_k^r$; the two controls are updated simultaneously and remain equal to each other. In short, a renewal occurs.

Case 4: $\sigma_{k+1} = 1$ and $\gamma_{k+1} = 0$. Only the computed control is updated: $\delta u_{k+1}^c = (I - LH)\,\delta u_k^r$ while $\delta u_{k+1}^r = \delta u_k^r$. The state becomes asynchronization.

The probabilities of the above four cases are $(1-\bar\sigma)\bar\gamma$, $(1-\bar\sigma)(1-\bar\gamma)$, $\bar\sigma\bar\gamma$, and $\bar\sigma(1-\bar\gamma)$, respectively. From this discussion we find that (a) the computed control and the real control stay synchronized except in the last case, and (b) a renewal occurs exactly when no data dropout happens at the measurement and actuator sides simultaneously.

We therefore examine the last case further; assume the controls enter asynchronization at the $(k+1)$th iteration. Then four possible outcomes exist for the $(k+2)$th iteration.

Case 1′: $\sigma_{k+2} = 0$ and $\gamma_{k+2} = 1$. The real control is updated: $\delta u_{k+2}^c = \delta u_{k+1}^c = (I - LH)\,\delta u_k^r$ and $\delta u_{k+2}^r = \delta u_{k+2}^c = (I - LH)\,\delta u_k^r$. The two controls achieve synchronization, and a renewal occurs.

Fig. 2. Illustration of the Markov chain of synchronization and asynchronization. S: synchronization; A: asynchronization; the starred transition contains a renewal with probability $\bar\sigma\bar\gamma$.

Case 2′: $\sigma_{k+2} = 0$ and $\gamma_{k+2} = 0$. No change happens to either control: $\delta u_{k+2}^c = \delta u_{k+1}^c = (I - LH)\,\delta u_k^r$, $\delta u_{k+2}^r = \delta u_{k+1}^r = \delta u_k^r$. The controls remain in the state of asynchronization.

Case 3′: $\sigma_{k+2} = 1$ and $\gamma_{k+2} = 0$. Only the computed control is updated: $\delta u_{k+2}^c = (I - LH)\,\delta u_{k+1}^r = (I - LH)\,\delta u_k^r$, $\delta u_{k+2}^r = \delta u_{k+1}^r = \delta u_k^r$. The values of both controls remain the same as at the $(k+1)$th iteration, and they stay asynchronized.

Case 4′: $\sigma_{k+2} = 1$ and $\gamma_{k+2} = 1$. Both controls are updated: $\delta u_{k+2}^c = (I - LH)\,\delta u_{k+1}^r = (I - LH)\,\delta u_k^r$ and $\delta u_{k+2}^r = \delta u_{k+2}^c = (I - LH)\,\delta u_k^r$. The controls become synchronized again, and a renewal occurs.

The probabilities of Cases 1′-4′ are $(1-\bar\sigma)\bar\gamma$, $(1-\bar\sigma)(1-\bar\gamma)$, $\bar\sigma(1-\bar\gamma)$, and $\bar\sigma\bar\gamma$, respectively, the same set as for Cases 1-4 above. The analysis indicates that (a) from the asynchronization state, the computed control and the real control either remain unchanged or become synchronized again, and (b) a renewal occurs whenever the state switches to synchronization.

We conclude that the computed control and the real control have only two states, synchronization and asynchronization, and that the two states switch between each other following a Markov chain, as shown in Fig. 2. From Fig. 2 we can read off the transition probabilities: from synchronization, the probability of retaining synchronization is $1 - \bar\sigma(1-\bar\gamma)$ and the probability of switching to asynchronization is $\bar\sigma(1-\bar\gamma)$; from asynchronization, the probabilities of retaining asynchronization and of switching to synchronization are $1-\bar\gamma$ and $\bar\gamma$, respectively. Note that although the transition probability from synchronization to synchronization is $1 - \bar\sigma(1-\bar\gamma)$, this transition contains a renewal of the inputs with probability $\bar\sigma\bar\gamma$.

All these probabilities were computed above. Since the switching between synchronization and asynchronization depends only on the state of the last iteration, the proof is completed. □

3.2. Convergence analysis

In this subsection, we give the convergence proof with the help of the Markov chain property established in the last subsection. To this end, let us first design the learning gain matrix. Note that the system matrix $H$ is block lower triangular with diagonal Markov parameters $H_{i,i}$, $1 \le i \le N$. We therefore design the learning gain matrix $L$ as a block diagonal matrix, $L = \mathrm{diag}\{L_1, \ldots, L_N\}$, chosen such that $0 < \rho(I - L_i H_{i,i}) < 1$, $1 \le i \le N$, where $\rho(\cdot)$ denotes the spectral radius; all eigenvalues of $I - L_i H_{i,i}$ then evidently lie in $(0, 1)$.

The main result of this paper is given in the following theorem.

Theorem 1. Consider the linear system (1) and assume A1-A2 hold. The learning update laws (3) and (4) guarantee zero-error convergence of the output to any desired trajectory $y_d$ asymptotically as the iteration number goes to infinity, provided the learning gain satisfies $0 < \rho(I - L_i H_{i,i}) < 1$, $1 \le i \le N$.

Proof. From Fig. 2, the transition matrix of the Markov chain is formulated as

$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 1 - \bar\sigma(1-\bar\gamma) & \bar\sigma(1-\bar\gamma) \\ \bar\gamma & 1-\bar\gamma \end{bmatrix} \qquad (8)$$

where $p_{11} \triangleq P(\tau_{k+1} = S \mid \tau_k = S)$, $p_{12} \triangleq P(\tau_{k+1} = A \mid \tau_k = S)$, $p_{21} \triangleq P(\tau_{k+1} = S \mid \tau_k = A)$, and $p_{22} \triangleq P(\tau_{k+1} = A \mid \tau_k = A)$, with $\tau_k$ the state at the $k$th iteration, $S$ the synchronization state, and $A$ the asynchronization state. Since $0 < \bar\sigma, \bar\gamma < 1$, the matrix $P$ is irreducible, aperiodic, and recurrent, hence ergodic. Solving $\pi P = \pi$ gives the stationary distribution

$$\pi = \left[ \frac{\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma}, \ \frac{\bar\sigma - \bar\sigma\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma} \right]. \qquad (9)$$

A renewal occurs only when the state changes to synchronization: with probability $\bar\sigma\bar\gamma$ when the state switches from synchronization to synchronization, and with probability $\bar\gamma$ when it switches from asynchronization to synchronization. Thus, the probability of a renewal along the iteration axis is

$$P(\text{renewal}) = \frac{\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma}\,\bar\sigma\bar\gamma + \frac{\bar\sigma - \bar\sigma\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma}\,\bar\gamma = \frac{\bar\sigma\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma}. \qquad (10)$$

Moreover, whenever a renewal occurs, both the computed control and the real control improve. With the help of the above analysis, we introduce a random variable $\lambda_k$ indicating whether a renewal happens: $\lambda_k = 1$ if a renewal happens, and $0$ otherwise. Then $\lambda_k$ obeys a Bernoulli distribution with

$$p_1 \triangleq P(\lambda_k = 1) = \frac{\bar\sigma\bar\gamma}{\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma} \quad \text{and} \quad p_2 \triangleq P(\lambda_k = 0) = 1 - p_1.$$
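As a quick side check outside the proof, the transition matrix (8), the stationary distribution (9), and the renewal probability (10) can be confirmed numerically; the sketch below does so for assumed rates $\bar\sigma$ and $\bar\gamma$.

```python
import numpy as np

# Numerical check of Eqs. (8)-(10) for illustrative rates sigma_bar, gamma_bar.
sigma_bar, gamma_bar = 0.8, 0.7
P = np.array([[1 - sigma_bar * (1 - gamma_bar), sigma_bar * (1 - gamma_bar)],
              [gamma_bar, 1 - gamma_bar]])      # states ordered (S, A)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()

den = gamma_bar + sigma_bar - sigma_bar * gamma_bar
print(pi)                                        # compare with Eq. (9)
print([gamma_bar / den, (sigma_bar - sigma_bar * gamma_bar) / den])
# Renewal probability, Eq. (10): renew from S w.p. sigma*gamma, from A w.p. gamma.
print(pi[0] * sigma_bar * gamma_bar + pi[1] * gamma_bar,
      sigma_bar * gamma_bar / den)
```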

Then the recursion of the real input error can be formulated as

$$\delta u_k^r = \lambda_k (I - LH)\,\delta u_{k-1}^r + (1 - \lambda_k)\,\delta u_{k-1}^r, \qquad (11)$$

which can be regarded as a switched system

$$\delta u_k^r = \Gamma_k\,\delta u_{k-1}^r, \qquad (12)$$

where $\Gamma_k = I - LH$ when $\lambda_k = 1$ and $\Gamma_k = I$ when $\lambda_k = 0$. Denote $\bar\Gamma_k = \Gamma_k \Gamma_{k-1} \cdots \Gamma_1$ and $S_k = \{\bar\Gamma_k : \text{taken over all sample paths}\}$. We have the following claim.

Claim 1: the mean of $S_k$, denoted $M_k$, is defined recursively by

$$M_k = \big(p_1 (I - LH) + p_2 I\big) M_{k-1}. \qquad (13)$$

The claim follows by direct calculation. By the definition of the mean, $M_k = \sum_{\bar\Gamma_k \in S_k} P(\bar\Gamma_k)\,\bar\Gamma_k$. Then

$$M_k = \sum_{\bar\Gamma_{k-1} \in S_{k-1}} P(\bar\Gamma_{k-1})\,\big(p_1 (I - LH) + p_2 I\big)\,\bar\Gamma_{k-1} = \big(p_1 (I - LH) + p_2 I\big) \sum_{\bar\Gamma_{k-1} \in S_{k-1}} P(\bar\Gamma_{k-1})\,\bar\Gamma_{k-1} = \big(p_1 (I - LH) + p_2 I\big) M_{k-1},$$

which proves the claim.

To show convergence in expectation of $\delta u_k^r$, Eq. (12) gives $E\,\delta u_k^r = E(\bar\Gamma_k\,\delta u_0^r) = (p_1 (I - LH) + p_2 I)^k\, E\,\delta u_0^r$, so it suffices to show $\rho(p_1 (I - LH) + p_2 I) < 1$. On the one hand, $I - LH$ is a lower triangular matrix whose diagonal blocks are $I - L_i H_{i,i}$, $1 \le i \le N$, and $I$ is the identity; on the other hand, $0 < p_1, p_2 < 1$ and $p_1 + p_2 = 1$. Verifying $\rho(p_1 (I - LH) + p_2 I) < 1$ therefore requires little effort. In addition, since $\rho(I - LH) < 1$, a suitable norm and a constant $0 < \mu < 1$ exist such that $p_1 \|I - LH\| + p_2 \|I\| \le \mu < 1$. The recursion (12) then leads to

$$E\|\delta u_k^r\| \le \Big(\prod_{i=1}^{k} E\|\Gamma_i\|\Big)\, E\|\delta u_0^r\| = \big(p_1 \|I - LH\| + p_2 \|I\|\big)^k\, E\|\delta u_0^r\| \le \mu^k\, E\|\delta u_0^r\|.$$

Consequently,

$$\sum_{k=1}^{\infty} E\|\delta u_k^r\| \le \sum_{k=1}^{\infty} \mu^k\, E\|\delta u_0^r\| = \frac{\mu}{1-\mu}\, E\|\delta u_0^r\| < \infty.$$

Then, by the Markov inequality, for any $\epsilon > 0$ we have

$$\sum_{k=1}^{\infty} P(\|\delta u_k^r\| > \epsilon) \le \sum_{k=1}^{\infty} \frac{E\|\delta u_k^r\|}{\epsilon} < \infty. \qquad (14)$$

Therefore $P(\|\delta u_k^r\| > \epsilon \text{ infinitely often}) = 0$ is concluded by the Borel-Cantelli lemma, $\forall \epsilon > 0$, and $P(\lim_{k\to\infty} \delta u_k^r = 0) = 1$ is obtained further. The zero-error convergence of the input is thus proved, and from the relationship $e_k = H\,\delta u_k^r$ the proof is completed. □

Remark 6. Define the function $f(\bar\sigma, \bar\gamma) = P(\text{renewal})$. Evidently $f(\bar\sigma, \bar\gamma) = f(\bar\gamma, \bar\sigma)$. Moreover, a simple calculation gives $\frac{\partial f(\bar\sigma, \bar\gamma)}{\partial \bar\sigma} = \frac{\bar\gamma^2}{(\bar\gamma + \bar\sigma - \bar\sigma\bar\gamma)^2} > 0$, which means that a larger successful transmission rate corresponds to more renewals, and thus faster algorithm convergence. This coincides with our intuitive knowledge.

Now let us further discuss the extension to the time-dependent update case. As discussed in Remark 5, we introduce $\sigma_k^t$ and $\gamma_k^t$ to denote the data dropouts occurring at time $t$ during the updating for the $k$th iteration. According to Remark 4, the control law can run time instant by time instant; it is not mandatory to use the lifted forms (3) and (4). The update algorithms for this general data dropout case take the following forms:

$$u_{k+1}^c(t) = \sigma_{k+1}^t u_k^r(t) + (1 - \sigma_{k+1}^t) u_k^c(t) + \sigma_{k+1}^t L_0 e_k(t+1), \qquad (15)$$

$$u_{k+1}^r(t) = \gamma_{k+1}^t u_{k+1}^c(t) + (1 - \gamma_{k+1}^t) u_k^r(t). \qquad (16)$$

In this case, the equivalent switching process of the input error, i.e., the counterpart of Eq. (11), is formulated as

$$\delta u_k^r = \Lambda_k (I - LH)\,\delta u_{k-1}^r + (I - \Lambda_k)\,\delta u_{k-1}^r, \qquad (17)$$

where $\Lambda_k = \mathrm{diag}\{\lambda_k^0, \ldots, \lambda_k^{N-1}\}$ with $\lambda_k^t$ defined similarly to $\lambda_k$ in Eq. (11). Hence, in contrast to Eq. (11), which involves only two cases, Eq. (17) switches among $2^N$ cases.

Note that the Markov chain established in Lemma 1 concerns the synchronism and asynchronism of the computed and real controls of the entire iteration. In the time-dependent case, the computed and real controls at each time instant follow the same path behavior: for any given $t$, the synchronous and asynchronous states of $\delta u_k^c(t)$ and $\delta u_k^r(t)$ form a two-state Markov chain. The combination of the states of all time instants is no longer described by two states only; it consists of $2^N$ states, because the controls at different time instants switch independently. The combination of $N$ independent Markov chains nevertheless still behaves as a Markov chain, now with $2^N$ states. Among these, two special states should be pointed out: the one in which the computed and real controls achieve synchronization simultaneously at all time instants, and the one in which they are in asynchronization simultaneously at all time instants. Accordingly, the switched recursion (12) becomes a switched system with $2^N$ possible $\Gamma_k$, which includes the two cases $I - LH$ and $I$ as special cases corresponding to the two special states above.
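A sketch of the time-instant-wise laws (15) and (16) on a hypothetical first-order plant is given below; the dynamics, gain $L_0$, and rates are assumptions chosen only to show the per-instant holding mechanism.

```python
import numpy as np

# Sketch of Eqs. (15)-(16): each time instant t has its own dropout variables,
# so holding happens per instant. The scalar plant below is an assumption.
rng = np.random.default_rng(1)
a, b, L0, N = 0.5, 1.0, 0.4, 50
y_d = np.sin(np.linspace(0, 2 * np.pi, N + 1))   # y_d(0) = 0 matches A2

def run(u):
    """Simulate y(t+1) = a*y(t) + b*u(t) from y(0) = 0."""
    y = np.zeros(N + 1)
    for t in range(N):
        y[t + 1] = a * y[t] + b * u[t]
    return y

u_c, u_r = np.zeros(N), np.zeros(N)
for k in range(200):
    e = y_d - run(u_r)                           # e_k(t)
    sig = rng.random(N) < 0.8                    # sigma_{k+1}^t per instant
    gam = rng.random(N) < 0.8                    # gamma_{k+1}^t per instant
    u_c = np.where(sig, u_r + L0 * e[1:], u_c)   # Eq. (15)
    u_r = np.where(gam, u_c, u_r)                # Eq. (16)
print("max tracking error:", np.abs(y_d[1:] - run(u_r)[1:]).max())
```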

Fig. 3. Output profiles at the 2nd, 5th, and 10th iterations and the desired trajectory.

Specifically, the diagonal blocks of $\Gamma_k$ are defined as follows: the $t$th diagonal block of $\Gamma_k$ is $I - L_t H_{t,t}$ if the computed and real controls at time $t$ achieve a renewal, and $I_{p \times p}$ otherwise. The convergence for the time-dependent data dropout case can then be verified following the same steps as in Theorem 1.

Remark 7. We give a further remark on the compensation mechanism, taking Eq. (16) as an example. In Eq. (16), if the data packet $u_{k+1}^c(t)$ is dropped during transmission, the latest available data $u_k^r(t)$ is used as compensation. This mechanism therefore admits successive random data dropouts along both the iteration and time axes, which differs from the existing compensation mechanisms in [24-27]. Under the time-axis-based compensation mechanism of [24,25], the update law (16) would be formulated as $u_{k+1}^r(t) = \gamma_{k+1}^t u_{k+1}^c(t) + (1 - \gamma_{k+1}^t) u_{k+1}^c(t-1)$, so data packets at adjacent time instants are not allowed to be dropped simultaneously. On the other hand, under the iteration-axis-based compensation mechanism of [26,27], the update law (16) would be formulated as $u_{k+1}^r(t) = \gamma_{k+1}^t u_{k+1}^c(t) + (1 - \gamma_{k+1}^t) u_k^c(t)$, where data dropouts at adjacent iterations are excluded.

Fig. 4. Maximal error profile along the iteration axis for DDR = 0, 10%, 20%, 30%, and 40%.

Remark 8. Note that the learning process is essentially iteration-dependent (cf. Eqs. (15) and (16)): the dependence is generated by the random data dropouts, since the input update differs according to whether a dropout occurs. However, Theorem 1 shows that the convergence condition depends only on the system matrix, which is essentially iteration-independent. This is because a simple holding mechanism is adopted when dropouts occur: the essential system dynamics is repetitive, the inherent improvement of the input sequence is determined by the system information only, and the data dropout rate mainly affects the learning speed. This fact is verified in the next section. For the case of nonrepetitive system information or uncertainties, the recent papers [33,34] suggest a potential solution in which the convergence conditions depend on a combination of system information from successive iterations.

4. Illustrative simulations

In this section, we first verify the theoretical results using a numerical simulation of a linear time-varying (LTV) system; a case study on an industrial robot is then provided to demonstrate the effectiveness. It should be emphasized that the general time-dependent data dropout problem is considered, and the algorithms (15) and (16) are applied in these simulations.

Fig. 5. Asynchronization of the computed and real input signals.

4.1. Numerical example

In this subsection, a numerical example is given to verify the theoretical analysis. To show the effectiveness of the proposed algorithm, consider an LTV system $(A_t, B_t, C_t)$ whose matrices contain time-varying entries such as $0.2\exp(-t/100)$ and $\sin(t)$ in $A_t$, with $B_t = [0 \;\; 0.3\sin(t) \;\; 1]^T$ and $C_t = [\cos(t) \;\; 0.8]$. The iteration length is $N = 100$. The initial condition is $y_k(0) = y_d(0) = 0$ for all $k$, and the control input for the first iteration is set to $u_0 = 0$. The desired trajectory is $y_d(t) = \sin(\pi t/20)\sin(\pi t/10)$.

Fig. 6. Output profiles at the 2nd, 10th, and 50th iterations and the desired trajectory.

The lifted model $H$ can be calculated directly. The learning gain is selected as $L_i = 0.4$, $1 \le i \le 100$. According to Remarks 4 and 5, we simulate the general case: the time-dependent algorithms (15) and (16) are used, and the data dropout is introduced at each time instant separately rather than for the entire iteration. The algorithms are run for 150 iterations.

We define the data dropout rate (DDR) as the probability $P(\sigma_k^t = 0)$ or $P(\gamma_k^t = 0)$; the DDR indicates the average ratio of lost transmissions over all iterations. In the simulation, for simplicity, the DDR at the measurement side is set equal to that at the actuator side, and five cases of DDR are simulated, namely DDR = 0, 10%, 20%, 30%, and 40%.

Fig. 3 shows the output tracking profiles at the 2nd, 5th, and 10th iterations as well as the desired trajectory for the cases where the DDRs at the measurement and actuator sides are both 10%, 20%, 30%, and 40%, respectively.
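The DDR experiment protocol can be reproduced with the same toy plant used in the earlier sketch; the snippet below sweeps several DDR values and reports the iteration at which the maximal error first drops below an assumed tolerance, illustrating the slower convergence for larger DDR.

```python
import numpy as np

# DDR sweep with the per-instant laws (15)-(16); plant and tolerance assumed.
rng = np.random.default_rng(4)
a, b, L0, N = 0.5, 1.0, 0.4, 50
y_d = np.sin(np.linspace(0, 2 * np.pi, N + 1))

def run(u):
    y = np.zeros(N + 1)
    for t in range(N):
        y[t + 1] = a * y[t] + b * u[t]
    return y

for ddr in (0.0, 0.1, 0.2, 0.3, 0.4):
    u_c, u_r = np.zeros(N), np.zeros(N)
    for k in range(1, 501):
        e = y_d - run(u_r)
        u_c = np.where(rng.random(N) >= ddr, u_r + L0 * e[1:], u_c)  # Eq. (15)
        u_r = np.where(rng.random(N) >= ddr, u_c, u_r)               # Eq. (16)
        if np.abs(y_d[1:] - run(u_r)[1:]).max() < 1e-6:
            print(f"DDR = {ddr:.0%}: below tolerance at iteration {k}")
            break
```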

Fig. 7. Maximal error profile along the iteration axis for DDR = 0, 10%, 20%, 30%, and 40%.

The figures show that the output converges to the reference after several iterations. Fig. 4 further displays the influence of data dropouts on convergence performance, comparing the maximal tracking error profiles for DDR = 0, 10%, 20%, 30%, and 40%. Two facts are observed. First, the convergence speed decreases as the DDR increases; this coincides with intuition, since a larger DDR means fewer updating iterations. Second, all the maximal tracking error profiles are approximately straight lines on the semilogarithmic plot, implying an exponential convergence speed and verifying the theoretical results of the last section.

In addition, to demonstrate the asynchronization of the computed and real input signals, we introduce a counter $\tau_k(t)$ for each time instant $t$, denoting the number of iterations up to the $k$th in which the computed input signal differs from the real input signal; the counter is increased only when the two signals are unequal. The profiles for all time instants are displayed in Fig. 5 for DDR = 10%, 20%, 30%, and 40%; they rise as the iteration number grows. This illustrates that asynchronization occurs randomly along the iteration axis, independently across time instants. Moreover, the larger the DDR, the larger the average value of $\tau_k(t)$ at the last iteration.

4.2. Case study on an industrial robot

In this subsection, the algorithms are applied to an industrial robot. The nominal model of the closed-loop joint is the second-order transfer function given in [35],

$$G_p(s) = \frac{948}{s^2 + a_1 s + a_0}, \qquad (18)$$

where the denominator coefficients $a_1$ and $a_0$ take the values given in [35]. The desired trajectory is $y_d(t) = \sin(\pi t/20)\cos(\pi t/10)$. The operation length is 2 s and the sampling frequency is 100 Hz; the system is discretized with iteration length $N = 200$. A constant learning gain $L_i$, $1 \le i \le 200$, is used, and the algorithms are run for 100 iterations. As in the last subsection, five cases of DDR are considered, i.e., DDR = 0, 10%, 20%, 30%, and 40%.

Fig. 6 shows the output tracking profiles at the 2nd, 10th, and 50th iterations together with the desired trajectory for the cases where the DDRs at the measurement and actuator sides are both 10%, 20%, 30%, and 40%, respectively. The output profiles evidently converge to the desired trajectory asymptotically. The maximal tracking error profiles are illustrated in Fig. 7, which shows that the exponential convergence speed is retained under general data dropout environments. This verifies the effectiveness of the proposed compensation mechanism and the associated analysis. The asynchronization phenomenon between the computed control and the real control was also simulated; the results are similar to Fig. 5 and are omitted to save space.

5. Conclusions

This paper provides the first convergence analysis of ILC for a linear system with data dropouts at both the measurement and actuator sides. The proof is carried out by carefully analyzing the sample path behavior and using Markov chain techniques. While the results are derived for the classic P-type update law, they can be generalized to other update laws such as PD-type. Extending the analysis to nonlinear systems is of great interest for future work.

References

[1] D.A. Bristow, M. Tharayil, A.G. Alleyne, A survey of iterative learning control: a learning-based method for high-performance tracking control, IEEE Control Syst. Mag. 26 (3) (2006).
[2] H.-S. Ahn, Y.Q. Chen, K.L. Moore, Iterative learning control: survey and categorization from 1998 to 2004, IEEE Trans. Syst. Man Cybern. Part C 37 (6) (2007).
[3] D. Shen, Y. Wang, Survey on stochastic iterative learning control, J. Process Control 24 (12) (2014).
[4] Q. Zhu, J.X. Xu, D. Huang, Iterative learning control design for linear discrete-time systems with multiple high-order internal models, Automatica 62 (2015).
[5] D. Meng, K.L. Moore, Learning to cooperate: networks of formation agents with switching topologies, Automatica 64 (2016).
[6] D. Shen, Y. Xu, Iterative learning control for discrete-time stochastic systems with quantized information, IEEE/CAA J. Autom. Sin. 3 (1) (2016).
[7] D. Shen, W. Zhang, Y. Wang, C.J. Chien, On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths, Automatica 63 (2016).
[8] X. Li, Q. Ren, J.X. Xu, Precise speed tracking control of a robotic fish via iterative learning control, IEEE Trans. Ind. Electron. 63 (4) (2016).
[9] Y.S. Wei, X.D. Li, Iterative learning control for linear discrete-time systems with high relative degree under initial state vibration, IET Control Theory Appl. 10 (10) (2016).
[10] R. Sakthivel, P. Selvaraj, Y. Lim, H.R. Karimi, Adaptive reliable output tracking of networked control systems against actuator faults, J. Frankl. Inst. 354 (9) (2017).

[11] J. Bai, R. Lu, H. Su, A. Xue, Modeling and H-infinity control of wireless networked control system with both delay and packet loss, J. Frankl. Inst. 352 (10) (2015).
[12] Q. Ling, A sufficient bit rate condition for mean-square stabilisation of linear systems over multiple lossy networks, IET Control Theory Appl. 10 (13) (2016).
[13] H.S. Ahn, K.L. Moore, Y.Q. Chen, Trajectory-keeping in satellite formation flying via robust periodic learning control, Int. J. Robust Nonlinear Control 20 (14) (2010).
[14] H.S. Ahn, Y.Q. Chen, K.L. Moore, Intermittent iterative learning control, in: Proceedings of the IEEE International Symposium on Intelligent Control, 2006.
[15] H.S. Ahn, K.L. Moore, Y.Q. Chen, Discrete-time intermittent iterative learning controller with independent data dropouts, in: Proceedings of the IFAC World Congress, 2008a.
[16] H.S. Ahn, K.L. Moore, Y.Q. Chen, Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems, in: Proceedings of the IEEE International Conference on Control, Automation, Robotics and Vision, 2008b.
[17] X. Bu, Z. Hou, F. Yu, Stability of first and high order iterative learning control with data dropouts, Int. J. Control Autom. Syst. 9 (5) (2011).
[18] X. Bu, Z. Hou, F. Yu, F. Wang, H-infinity iterative learning controller design for a class of discrete-time systems with data dropouts, Int. J. Syst. Sci. 45 (9) (2014).
[19] X. Bu, Z. Hou, S. Jin, R. Chi, An iterative learning control design approach for networked control systems with data dropouts, Int. J. Robust Nonlinear Control 26 (2016).
[20] C. Liu, J.X. Xu, J. Wu, Iterative learning control for remote control systems with communication delay and data dropout, Math. Probl. Eng. 2012 (705474) (2012).
[21] D. Shen, Y. Wang, Iterative learning control for networked stochastic systems with random data dropouts, Int. J. Control 88 (5) (2015a).
[22] D. Shen, Y. Wang, ILC for networked nonlinear systems with random output dropouts and unknown control direction, Syst. Control Lett. 77 (2015b).
[23] D. Shen, Almost sure convergence of ILC for networked linear systems with random link failures, Int. J. Control Autom. Syst. 15 (2) (2017).
[24] X. Bu, F. Yu, Z. Hou, F. Wang, Iterative learning control for a class of nonlinear systems with random packet losses, Nonlinear Anal. Real World Appl. 14 (1) (2013).
[25] Y.J. Pan, H.J. Marquez, T. Chen, L. Sheng, Effects of network communications on a class of learning controlled non-linear systems, Int. J. Syst. Sci. 40 (7) (2009).
[26] L.X. Huang, Y. Fang, Convergence analysis of wireless remote iterative learning control systems with dropout compensation, Math. Probl. Eng. 2013 (609284) (2013) 1-9.
[27] J. Liu, X. Ruan, Networked iterative learning control approach for nonlinear systems with random communication delay, Int. J. Syst. Sci. 47 (16) (2016).
[28] T.D. Son, G. Pipeleers, J. Swevers, Robust monotonic convergent iterative learning control, IEEE Trans. Autom. Control 61 (4) (2016).
[29] G. Pipeleers, K.L. Moore, Unified analysis of iterative learning and repetitive controllers in trial domain, IEEE Trans. Autom. Control 59 (4) (2014).
[30] S.K. Oh, J.M. Lee, Stochastic iterative learning control for discrete linear time-invariant system with batch varying reference trajectories, J. Process Control 36 (2015).
[31] D. Meng, Y. Jia, J. Du, Consensus seeking via iterative learning for multi-agent systems with switching topologies and communication time-delays, Int. J. Robust Nonlinear Control 26 (12) (2016).
[32] D. Meng, K.L. Moore, Robust cooperative learning control for directed networks with nonlinear dynamics, Automatica 75 (2017a).
[33] D. Meng, K.L. Moore, Robust iterative learning control for nonrepetitive uncertain systems, IEEE Trans. Autom. Control 62 (2) (2017b).
[34] D. Meng, K.L. Moore, Convergence of iterative learning control for SISO nonrepetitive systems subject to iteration-dependent uncertainties, Automatica 79 (2017c).
[35] B. Zhang, Y. Ye, K. Zhou, D. Wang, Case studies of filtering techniques in multirate iterative learning control, Control Eng. Pract. 26 (2014).


Stochastic Point-to-Point Iterative Learning Tracking Without Prior Information on System Matrices

Dong Shen, Member, IEEE, Jian Han, and Youqing Wang, Senior Member, IEEE

Abstract: This paper contributes to a point-to-point iterative learning control problem for stochastic systems without prior information on system matrices. The stochastic approximation technique, with gradient estimation by random differences, is introduced to design the input update law. It is strictly proved that the input sequence converges almost surely to the optimal one, which minimizes the averaged tracking performance index. An illustrative simulation shows the effectiveness of the proposed algorithm.

Note to Practitioners: In many practical applications, a system performs a given task cycle by cycle, and many industrial processes operate periodically; in a word, repeatability is inherent in these systems. This repeatability enables constant improvement of the system performance, just as we learn from experience and subsequently improve our behavior in daily life. That is exactly the control idea of this paper. On the other hand, learning by trial and error allows us to make modifications and corrections to our actions, which is why the algorithm proposed in this paper can achieve accurate tracking without system information. Specifically, the algorithm updates differently during odd and even cycles: in an odd cycle the control is perturbed slightly, while in the subsequent even cycle the control is updated by estimating the gradient from the trial information. Detailed algorithm steps are provided, with a strict analysis of the convergence and optimality properties.

Index Terms: Almost sure convergence, iterative learning control (ILC), point-to-point control, stochastic approximation.

I. INTRODUCTION

As is well known, learning is a basic skill by which humans survived in ancient times and improve their work and lives in modern times. Inspired by this intuitive idea, iterative learning control (ILC) was introduced to improve tracking performance for systems that accomplish a given task over a fixed time interval repeatedly. As learning is drawn into the input update law, ILC algorithms can
be simple but effective, as shown in many previous studies; some excellent surveys can be found in [1]-[3]. Reinforcement learning and approximate dynamic programming based control also integrate the concept of learning into the control strategy [4]; however, ILC differs from these methods in that it pays more attention to performance improvement along the iteration axis rather than along the time axis.

Standard ILC usually requires the system output to track a desired objective over the whole time interval. In many practical applications, however, only some points must be tracked accurately while the others are free. As a simple but classic example, consider a basketball player shooting from a fixed position repeatedly: what the player focuses on is whether the ball hits the target, not whether the ball tracks some prescribed trajectory. In other words, only the terminal point is considered, and this kind of ILC is termed terminal iterative learning control (TILC) [5]-[7]. As a more general case, consider a train or subway passing several stations: only the schedule at which the train arrives at each station is prescribed, while the running profile between stations is flexible. In this case, point-to-point iterative learning control (P2PILC) is more suitable for the control objective, because it allows more degrees of design freedom.

Both TILC and P2PILC have been considered in previous studies. In [8] and [9], the point-to-point tracking problem was solved by iteratively updating the reference instead of the input profile along the iteration axis, a novel approach to the point-to-point control problem whose benefit is a good use of the freedom of the trajectory. Another promising method is to update the control signal directly based on the specified tracking data, as shown in [9]. In addition, Owens et al. [10] presented a norm-optimal ILC solution to the continuous-time point-to-point problem, and Chu et al. [11] proposed a successive projection-based method. For a MIMO system, the required pass points in the above literature are the whole output vectors at the given time instants, whereas in practice only some components of the output vector may be constrained. This kind of point-to-point tracking problem was studied in [12] for linear systems and in [13] for nonlinear systems, respectively. A detailed formulation was given in [12], together with an extensive analysis of gradient-descent-based and Newton-method-based ILC under various mixed constraints. Freeman and Dinh [14] further investigated norm-optimal ILC for highly coupled systems. The stochastic linear system case was addressed in [15] based on the stochastic approximation technique.

However, model information is required in these papers to design the learning law; that is, the algorithms depend on the system matrices, which may limit the applications of P2PILC. The model-dependent design condition is relaxed in this paper, where a random difference is introduced to estimate the gradient, so that no prior information on the system matrices is required. The almost sure convergence of the proposed algorithm to an allowable set is proved and then verified by an illustrative example.

This paper introduces the Kiefer-Wolfowitz (KW) algorithm into the design of update laws for the P2PILC problem to remove the prior requirement on system matrices, and the convergence of the proposed algorithms is detailed. Both [16] and [17] also adopted the KW algorithm, but for the traditional ILC problem in which the whole reference must be tracked. Considering the essence of the point-to-point control problem, we propose a lifted form of the update laws for the whole iteration, differing from the time-instant-separated form used in [16] and [17]. In addition, the linear stochastic system considered in this paper is not well addressed in previous works. It is worth pointing out that the Kalman-filtering-based approach of [18] and [19] is also effective for stochastic ILC when partial system information is available.

The rest of this paper is arranged as follows. Section II provides the problem formulation. Section III gives the ILC algorithm and the almost sure convergence result. Section IV provides an illustrative example to show the effectiveness. The conclusion is given in Section V, and the detailed proof of the main theorem is placed in the Appendix.

II. PROBLEM FORMULATION

In this section, the system formulation, the point-to-point ILC formulation, and the control objective are given in sequence.

A. System Formulation

Consider the following linear time-varying system:

$$x_k(t+1) = A_t x_k(t) + B_t u_k(t) + w_k(t+1), \qquad y_k(t) = C_t x_k(t) + v_k(t) \qquad (1)$$

where the subscript $k$ denotes the cycle number, $k = 1, 2, \ldots$, and $t$ denotes an arbitrary time in a cycle, $t \in [0, N]$. $x_k(t) \in \mathbb{R}^n$, $u_k(t) \in \mathbb{R}^p$, and $y_k(t) \in \mathbb{R}^q$ are the system state, input, and output vectors, respectively. The system matrices $A_t$, $B_t$, and $C_t$ have appropriate dimensions. The signals $w_k(t)$ and $v_k(t)$ are the system noise and the measurement noise, respectively; they are assumed to be zero-mean Gaussian white noise, uncorrelated across cycles and time instants. The initial state $x_k(0)$ is set to $x_0$. Here, assume that the input-output coupling matrix $C_{t+1} B_t$ is of full row rank, $t = 0, 1, \ldots, N-1$.

One can rewrite the input and the output in supervector form:

$$u_k = [u_k^T(0), u_k^T(1), \ldots, u_k^T(N-1)]^T \in \mathbb{R}^{pN}, \qquad y_k = [y_k^T(1), y_k^T(2), \ldots, y_k^T(N)]^T \in \mathbb{R}^{qN}.$$

In addition, let

$$G = \begin{bmatrix} C_1 B_0 & & & \\ C_2 A_1 B_0 & C_2 B_1 & & \\ \vdots & \vdots & \ddots & \\ C_N \prod_{j=1}^{N-1} A_j\, B_0 & C_N \prod_{j=2}^{N-1} A_j\, B_1 & \cdots & C_N B_{N-1} \end{bmatrix}$$

where $\prod_{m=i}^{j} A_m = A_j A_{j-1} \cdots A_i$ for $j \ge i$ and $\prod_{m=i}^{j} A_m = I$ for $j < i$; it is obvious that $G \in \mathbb{R}^{qN \times pN}$. Then we have the following relationship between the input and the output:

$$y_k = G u_k + y_0 + \epsilon_k$$

where $y_0$ is the response to the initial condition, $y_0 = [(C_1 A_0)^T, (C_2 A_1 A_0)^T, \ldots, (C_N \prod_{m=0}^{N-1} A_m)^T]^T x_0$. Without loss of generality, it is assumed that $y_0 = 0$ or $x_0 = 0$.
The stochastic noise term $\epsilon_k$ is expressed by

$$\epsilon_k = \begin{bmatrix} v_k(1) + C_1 w_k(1) \\ v_k(2) + C_2 w_k(2) + C_2 A_1 w_k(1) \\ \vdots \\ v_k(N) + C_N \sum_{j=1}^{N} \big(\prod_{m=j}^{N-1} A_m\big) w_k(j) \end{bmatrix}.$$

Given the requirements on the system noise $\{w_k(t)\}$ and the measurement noise $\{v_k(t)\}$, it is convenient to deduce the following condition.

A1: The stochastic noise $\{\epsilon_k\}$ is a zero-mean Gaussian process with covariance $Q$, i.e., $\epsilon_k \sim N(0, Q)$.

For the standard ILC framework, the tracking objective is

$$y_d = [y_d^T(1), y_d^T(2), \ldots, y_d^T(N)]^T \in \mathbb{R}^{qN}. \qquad (2)$$

Denote the standard tracking error by $e_k = y_d - y_k$.

B. Point-to-Point Problem Formulation

As we know, many practical applications require tracking not the whole trajectory $y_d$ but a subset of it. Here an equivalent model of [12] is given. Suppose that only $l_j$ components of the output at time $j$ are required to be tracked, $0 \le l_j \le q$, $j = 1, 2, \ldots, N$. If $l_j = 0$, the output at time $j$ is completely disregarded. If $l_j \ne 0$, denote the tracked components by $1 \le n_{j,1} < n_{j,2} < \cdots < n_{j,l_j} \le q$. Now remove all points that need not be followed from the original objective $y_d$; one obtains a new reference trajectory $y_r$ of dimension $l$, where $l = \sum_{j=1}^{N} l_j$. In other words, $y_r$ is a condensed reference trajectory of $y_d$ satisfying

$$y_r = \Psi y_d \qquad (3)$$

where $\Psi \in \mathbb{R}^{l \times qN}$ is a selection matrix with element $\Psi_{i,j} = 1$ if the $i$th dimension of $y_r$ is located at the $j$th dimension of $y_d$, and $\Psi_{i,j} = 0$ otherwise. In the following, it is the condensed reference trajectory $y_r$, rather than the original $y_d$, that is available to the ILC update laws. Moreover, by the definition and construction of $\Psi$, it is obvious that $\Psi$ is of full row rank, i.e., $\mathrm{rank}(\Psi) = l$.
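The lifted matrix $G$ and the selection matrix (written $\Psi$ above) are mechanical to construct; the following sketch builds both for a small random LTV system, with all dimensions, seeds, and selected points being assumptions, and checks the rank property used below.

```python
import numpy as np

# Sketch: build the lifted matrix G and a selection matrix Psi for a small
# random LTV system. Everything here is an illustrative assumption.
rng = np.random.default_rng(2)
n, p, q, N = 3, 3, 2, 6
A = [rng.standard_normal((n, n)) * 0.3 for _ in range(N)]
B = [rng.standard_normal((n, p)) for _ in range(N)]
C = [rng.standard_normal((q, n)) for _ in range(N + 1)]

G = np.zeros((q * N, p * N))
for i in range(1, N + 1):          # output block row i uses C_i
    for j in range(1, i + 1):      # input block column j uses B_{j-1}
        Phi = np.eye(n)
        for m in range(j, i):      # state transition A_{i-1} ... A_j
            Phi = A[m] @ Phi
        G[(i - 1) * q:i * q, (j - 1) * p:j * p] = C[i] @ Phi @ B[j - 1]

# Select, e.g., component 2 at t = 1 and component 1 at t = 4 (assumed picks).
picks = [(1, 2), (4, 1)]           # (time instant, output component)
Psi = np.zeros((len(picks), q * N))
for r, (t, comp) in enumerate(picks):
    Psi[r, (t - 1) * q + (comp - 1)] = 1.0
print(np.linalg.matrix_rank(Psi @ G))  # generically equals len(picks)
```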

Considering the system formulation, the following design and analysis further need a rank property of $\Psi G$, stated in the following lemma.

Lemma 1: The matrix $\Psi G$ is of full row rank.

Proof: Since $C_{t+1} B_t$ is of full row rank, the lifted matrix $G$ is also of full row rank, i.e., $\mathrm{rank}(G) = qN$. By the definition of $\Psi$, it is evident that $\mathrm{rank}(\Psi) = l$. By Sylvester's rank inequality, $\mathrm{rank}(\Psi G) \ge \mathrm{rank}(\Psi) + \mathrm{rank}(G) - qN$, that is, $\mathrm{rank}(\Psi G) \ge l$. On the other hand, it is evident that $\mathrm{rank}(\Psi G) \le l$. Thus we find $\mathrm{rank}(\Psi G) = l$. □

C. Control Objective

For standard ILC without any noise in the system, the control objective is to find an input sequence such that $\|e_k\| \to 0$ as $k \to \infty$, where $\|\cdot\|$ denotes the 2-norm throughout the rest of this paper. However, this objective is not suitable for the stochastic system. Besides, since the focus of this paper falls on the required reference points, not all components of $e_k$ may be available in applications; a new performance index is therefore needed. Note that $e_k = y_d - y_k$ and $y_r = \Psi y_d$, which leads to

$$\Psi e_k = \Psi(y_d - y_k) = y_r - \Psi y_k \qquad (4)$$

as the real tracking information for input updating. Stochastic noise is involved in $y_k$, so one cannot expect $\Psi e_k \to 0$ as for a deterministic system. One may, however, expect that if the associated stochastic noise is eliminated, the remaining part converges to zero. To be specific, denote $\eta_k \triangleq y_r - \Psi G u_k$; then one would expect $\eta_k \to 0$. Therefore, the control objective of this paper is to design the ILC update law such that $u_k \to J$, where $J \triangleq \{u : y_r - \Psi G u = 0\}$; by $u_k \to J$ we mean that $u_k$ converges to an element of $J$.

Remark 1: As shown in [12, Lemma 1], if $\Psi G$ is of full row rank, the feasible inputs form a space $J$ of dimension $pN - l$. In other words, $J$ can be formulated as $J = \{(\Psi G)^{\dagger} y_r + u, \; u \in \mathrm{null}(\Psi G)\}$, where $(\Psi G)^{\dagger}$ denotes the pseudoinverse of $\Psi G$.

Remark 2: It is worth pointing out that $\eta_k$ denotes the tracking error excluding the stochastic noise. Thus $\eta_k \to 0$ means the system output may accurately track the reference $y_r$ asymptotically if the noise effects of the current cycle do not count. Since the system and measurement noises of the current cycle cannot be predicted in advance, $\eta_k \to 0$ is actually the best achievable tracking performance.

III. MAIN RESULTS

In this section, the ILC algorithm is designed and analyzed. Since the system matrices $\{A_t\}$, $\{B_t\}$, and $\{C_t\}$ are unknown a priori, one cannot directly calculate the corresponding input $u$ such that $y_r = \Psi G u$, nor recursively generate the input sequence based on the system matrices. To generate such an input sequence without using the system matrices, the KW stochastic approximation algorithm used in [16] and [17] is introduced, in which a random difference is used to estimate the gradient. To this end, we use a vector sequence $\{\Delta_k, k = 1, 2, \ldots\}$, where $\Delta_k \in \mathbb{R}^{pN}$, $\Delta_k \triangleq [\Delta_k^1, \Delta_k^2, \ldots, \Delta_k^{pN}]^T$. All components $\Delta_k^j$ are mutually independent and identically distributed random variables satisfying

$$|\Delta_k^j| < a, \quad \Big|\frac{1}{\Delta_k^j}\Big| < b, \quad E\frac{1}{\Delta_k^j} = 0, \qquad k = 1, 2, \ldots, \; j = 1, 2, \ldots, pN.$$

It is also assumed that $\{\Delta_k^j\}$ is independent of $\{\epsilon_k\}$. Define the $pN$-dimensional vector

$$\Delta_k^{-} \triangleq \Big[\frac{1}{\Delta_k^1}, \frac{1}{\Delta_k^2}, \ldots, \frac{1}{\Delta_k^{pN}}\Big]^T, \qquad k = 1, 2, \ldots$$

Then the ILC update algorithm is described as follows. Let $\{a_k\}$, $\{c_k\}$, and $\{M_k\}$ be sequences of real numbers satisfying

$$a_k > 0, \quad a_k \to 0, \quad \sum_{k=1}^{\infty} a_k = \infty \qquad (5)$$

$$c_k > 0, \quad c_k \to 0, \quad \sum_{k=1}^{\infty} \Big(\frac{a_k}{c_k}\Big)^{1 + \delta/2} < \infty \qquad (6)$$

$$M_k > 0, \quad M_{k+1} > M_k, \quad M_k \to \infty. \qquad (7)$$

The initial input $u_0$ is simply set to zero.
The algorithm then updates differently during the odd and even cycles. For the odd cycle, the control is defined as

$$u_{2k+1} = u_{2k} + c_k \Delta_k, \qquad (8)$$

while for the even cycle

$$u_{2(k+1)} = u_{2k} - a_k \Delta_k^{-}\, \frac{\|\Psi e_{2k+1}\|^2 - \|\Psi e_{2k}\|^2}{c_k}, \qquad (9)$$

$$u_{2(k+1)} = u_{2(k+1)}\, 1_{[\|u_{2(k+1)}\| \le M_{\sigma_k}]}, \qquad (10)$$

$$\sigma_k = \sum_{i=1}^{k-1} 1_{[\|u_{2(i+1)}\| > M_{\sigma_i}]}, \quad \sigma_0 = 0, \qquad (11)$$

where $1_{[\text{inequality}]}$ is an indicator function that equals 1 if the inequality in the bracket is fulfilled and 0 otherwise.

Remark 3: The ILC algorithm applied to system (1) with selected tracking points $y_r$ is given by (8)-(11), where the random difference is used to estimate the gradient, i.e., the control update direction. Thus, prior information on the system matrices can be dispensed with. In addition, the indicator function $1_{[\cdot]}$ is introduced to guarantee the boundedness of the input sequence.

Remark 4: Here we give some explanations of the parameters $\{a_k\}$, $\{c_k\}$, and $\{M_k\}$. $a_k$ denotes the learning step size and serves as the conventional learning gain incorporated with the random difference; its decrease suppresses the stochastic noise. $c_k$ is used to reduce the range of the perturbation asymptotically, which further leads to a stable learning of the gradient. $M_k$ is a technical device to avoid divergence of the proposed algorithm and ensure a stable improvement of the input sequence. The selection of these parameters should satisfy (5)-(7); usually, $a_k$ and $c_k$ take the form $\varrho k^{-\tau}$ with suitable positive constants $\varrho$ and $\tau$, and $M_k$ is selected as $2^k$ or $3^k$.
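A compact sketch of the odd/even scheme (8)-(11) is given below for a toy lifted model; the matrix $\Psi G$, the noise level, and the parameter sequences follow conditions (5)-(7) but are otherwise assumptions, so this is an illustration rather than the paper's experiment.

```python
import numpy as np

# Sketch of the KW-based odd/even updates (8)-(11) on an assumed lifted model.
rng = np.random.default_rng(3)
pN, l = 8, 3
PsiG = rng.standard_normal((l, pN))          # unknown to the algorithm
y_r = rng.standard_normal(l)

def measure(u):                               # Psi*e = y_r - PsiG u - noise
    return y_r - PsiG @ u - 0.05 * rng.standard_normal(l)

u = np.zeros(pN)
M = [2.0 * 2 ** i for i in range(60)]         # expanding truncation bounds
sigma = 0
for k in range(1, 5001):
    a_k = 1.0 / (k + 200) ** 0.95
    c_k = 1.0 / k ** 0.65
    delta = rng.choice([-1.0, 1.0], pN)       # +/-1 satisfies the moment conditions
    e_odd = measure(u + c_k * delta)          # odd cycle, Eq. (8)
    e_even = measure(u)
    grad_est = (1.0 / delta) * (np.linalg.norm(e_odd) ** 2
                                - np.linalg.norm(e_even) ** 2) / c_k
    u_new = u - a_k * grad_est                # even cycle, Eq. (9)
    if np.linalg.norm(u_new) <= M[sigma]:     # truncation, Eqs. (10)-(11)
        u = u_new
    else:
        u, sigma = np.zeros(pN), sigma + 1
print("residual:", np.linalg.norm(y_r - PsiG @ u))
```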

Then the following convergence theorem can be established.

Theorem 1: Consider system (1) and assume A1 holds; then the input sequence generated by (8)-(11) tends to $J$, i.e., $u_k \to J$ as $k$ goes to infinity.

Remark 5: Though the update laws are based on the KW algorithm, the convergence analysis is established using the convergence results of the well-known Robbins-Monro (RM) stochastic approximation algorithm [20]-[22]. Specifically, the proof proceeds in two steps: first, the proposed update laws are transformed into the RM algorithm formulation; then, almost sure convergence is shown by verifying the convergence conditions of the corresponding RM algorithm.

Remark 6: The difference between the odd and even cycles lies in the roles they play in the gradient estimation process. In an odd cycle the control is perturbed slightly, while in the subsequent even cycle the control is updated by estimating the gradient with the help of the trial information. The advantage of our ILC scheme is that it estimates the gradient from input/output information alone, and it is therefore a data-based control method. In many practical systems an accurate mathematical model is hard to obtain owing to complex mechanisms, time-varying environments, large scale, and other factors; the proposed data-based method is thus quite significant for practical applications.

Remark 7: Under the point-to-point framework, $J$ clearly contains more than one solution, since $\Psi G$ is of full row rank rather than full column rank. Theorem 1 ensures that the input sequence converges to a limit in $J$, but it does not guarantee that the limits of different experiment paths are identical. An interesting question therefore arises: would the input sequence converge to different limits in different experiments? The answer is no. As a matter of fact, the limit of the input sequence is $(\Psi G)^T (\Psi G G^T \Psi^T)^{-1} y_r$. However, this is beyond the scope of this note and is therefore omitted.

Remark 8: The linear stochastic system is considered in this paper. As is well known, most real-world problems involve nonlinearities, so one might be interested in the extension to nonlinear systems. A possible way is to linearize the nonlinear system locally, as shown in [13]. In addition, our algorithm requires no information on the system matrices, so it is potentially applicable to nonlinear systems; however, the analysis would be much more complicated, and we would like to address this problem in a next step.

IV. ILLUSTRATIVE SIMULATION

Fig. 1. Norm of the control objective: $\eta_k = y_r - \Psi G u_k$.

Consider an LTV system whose matrices $A_t$, $B_t$, and $C_t$ contain time-varying entries such as $\sin(0.3t)$, $\sin(2\pi/t)$, and $t^2$ terms in $A_t$, $\cos(0.5t)$ and $t^2$ terms in $B_t$, and $1.8\sin(0.6t)$, $\sin(0.2t)$, and $\sin(2\pi/t)$ terms in $C_t$. For simple illustration, let $N = 6$, so that $y_k \in \mathbb{R}^{12}$ and $u_k \in \mathbb{R}^{18}$. The noise $\epsilon_k$ is assumed to be a zero-mean Gaussian process with covariance $Q = I$. Suppose the reference points are $y_d^{(2)}(1)$, $y_d^{(2)}(3)$, $y_d^{(1)}(4)$, and $y_d^{(1)}(6)$, where the superscript denotes the component of the output vector; this describes the general point-to-point tracking problem and fixes the selection matrix $\Psi$ accordingly. It is easy to verify that $\mathrm{rank}(\Psi G) = 4$.
The arbitrarily selected reference trajectory is $y_r = [\ldots]^T$. The parameters of the algorithm are selected as follows: $\Delta_k^j$ is uniformly distributed on $[-1, -0.5] \cup [0.5, 1]$, $\forall k \ge 1$, $1 \le j \le 18$. The iteration-varying sequences are chosen as $a_k = 1/(k + 200)^{0.95}$ and $c_k = 1/(k + 1)^{0.65}$. The expanding parameter is $M_k = 2^k$. The initial input $u_0$ is simply the zero vector. The algorithm is then run for 1000 iterations following (8)-(11).

To illustrate the almost sure convergence of the algorithm, the norm of the modified tracking error, $\eta_k = y_r - \Psi G u_k$, is shown in Fig. 1. As shown in Section II, it suffices to demonstrate that $\eta_k \to 0$. Fig. 1 shows that the error $\|\eta_k\|$ decreases rapidly. This further means that the tracking error is asymptotically caused mainly by the system and measurement noises of the current iteration, which cannot be eliminated by any learning algorithm. This shows the effectiveness of the algorithm from another perspective: the proposed algorithm actually achieves the best tracking performance attainable under stochastic noise.

The whole output of the last iteration, $y_{1000}$, and the reference points $y_r$ are shown in Fig. 2, where the solid line with circles

denotes all the components of the 12-dimensional output of the last iteration and the four squares denote the reference points.

Fig. 2. Output of the last iteration $y_{1000}$ and the reference points $y_r$.

The x-labels 1-12 correspond to $y^{(1)}(1), y^{(2)}(1), y^{(1)}(2), \ldots, y^{(2)}(6)$; thus the four bold squares in Fig. 2 correspond to $y_d^{(2)}(1)$, $y_d^{(2)}(3)$, $y_d^{(1)}(4)$, and $y_d^{(1)}(6)$, respectively. It can be found from the figure that the system output tracks the reference points effectively in the noisy environment. However, the outputs at the unrequested points are given no consideration in the proposed algorithm, and from the simulations we find that they change from run to run. The unrequested outputs therefore leave much freedom to design the algorithm for further optimization objectives or additional constraints; this is an interesting and open topic in stochastic point-to-point ILC.

V. CONCLUSION

In this paper, an ILC is designed for the stochastic point-to-point control problem without prior information on the system matrices. A gradient-estimation method, namely the KW stochastic approximation algorithm, is introduced to design the ILC update law. The ILC algorithm is proved convergent and optimal with probability one under mild conditions. For further research, it is of interest to reduce the dimension of the ILC algorithm and improve the convergence rate, and to consider other control objectives such as the LQG problem or energy cost constraints.

APPENDIX

The following convergence theorem of stochastic approximation comes from [22]. Let $f(\cdot) : \mathbb{R}^p \to \mathbb{R}^p$ and $J \triangleq \{x : f(x) = 0\}$. Take a sequence of positive real numbers $M_k$ satisfying $M_{k+1} > M_k$, $M_k \to \infty$, and consider the following algorithm:

$$x_{k+1} = (x_k + a_k y_{k+1})\, 1_{[\|x_k + a_k y_{k+1}\| \le M_{\sigma_k}]} + x^*\, 1_{[\|x_k + a_k y_{k+1}\| > M_{\sigma_k}]} \qquad (12)$$

$$y_{k+1} = f(x_k) + \varepsilon_{k+1} \qquad (13)$$

$$\sigma_k = \sum_{i=1}^{k-1} 1_{[\|x_i + a_i y_{i+1}\| > M_{\sigma_i}]}, \quad \sigma_0 = 0. \qquad (14)$$

Theorem 2: Assume that the following H1-H4 hold.

H1: $a_k > 0$, $a_k \to 0$, and $\sum_{k=1}^{\infty} a_k = \infty$.

H2: There exists a continuously differentiable function $v(\cdot) : \mathbb{R}^p \to \mathbb{R}$ such that $\sup_{\delta \le d(x, J) \le \Delta} f^T(x)\,\nabla v(x) < 0$ for any $\Delta > \delta > 0$, where $d(x, J) = \inf_y \{\|x - y\|, y \in J\}$, and $v(J)$ is nowhere dense. Furthermore, there exists a constant $c_0 > 0$ such that $\|x^*\| < c_0$ and $v(x^*) < \inf_{\|x\| = c_0} v(x)$.

H3: The function $f(\cdot)$ is measurable and locally bounded.

H4: Along the subscripts $\{n_k\}$ of any convergent subsequence $x_{n_k}$,

$$\lim_{T \to 0} \limsup_{k \to \infty} \frac{1}{T} \Big\| \sum_{i=n_k}^{m(n_k, t)} a_i \varepsilon_{i+1}\, 1_{[\|x_i\| \le K]} \Big\| = 0, \quad \forall t \in [0, T],$$

if $K$ is sufficiently large, where $m(k, T) \triangleq \max\{m : \sum_{i=k}^{m} a_i \le T\}$.

Then, with any initial value $x_0$, the sequence $x_k$ defined by (12)-(14) converges to $J$ with probability one.

Proof of Theorem 1: Denote

$$L_k \triangleq \frac{\Delta_k^{-}}{c_k}\big(\|\bar e_{2k+1}\|^2 - \|\bar e_{2k}\|^2\big) \qquad (15)$$

where $\bar e_k = y_r - \Psi G u_k - \Psi\epsilon_k$, i.e., $\bar e_k = \Psi e_k$. Substituting $u_{2k+1} = u_{2k} + c_k \Delta_k$ from (8), expanding the squared norms, and grouping terms yields

$$L_k = -2(\Psi G)^T (y_r - \Psi G u_{2k}) + \delta_k + \gamma_k + \theta_k + \alpha_{2k+1} - \alpha_{2k} - \beta_{2k+1} + \beta_{2k}$$

where

$$\delta_k = -2(\Delta_k^{-}\Delta_k^T - I)(\Psi G)^T (y_r - \Psi G u_{2k}), \qquad \gamma_k = c_k \Delta_k^{-} \|\Psi G \Delta_k\|^2, \qquad \theta_k = 2\Delta_k^{-}\Delta_k^T (\Psi G)^T \Psi\epsilon_{2k+1},$$

$$\alpha_{2k+1} = \frac{\Delta_k^{-}}{c_k}\|\Psi\epsilon_{2k+1}\|^2, \qquad \alpha_{2k} = \frac{\Delta_k^{-}}{c_k}\|\Psi\epsilon_{2k}\|^2,$$

$$\beta_{2k+1} = \frac{2\Delta_k^{-}}{c_k}(y_r - \Psi G u_{2k})^T \Psi\epsilon_{2k+1}, \qquad \beta_{2k} = \frac{2\Delta_k^{-}}{c_k}(y_r - \Psi G u_{2k})^T \Psi\epsilon_{2k}.$$

Set $g(x) = -2(\Psi G)^T (y_r - \Psi G x)$ and $\xi_k = \delta_k + \gamma_k + \theta_k + \alpha_{2k+1} - \alpha_{2k} - \beta_{2k+1} + \beta_{2k}$; then the algorithm (9)-(11) is

$$u_{2(k+1)} = \big(u_{2k} - a_k g(u_{2k}) - a_k \xi_k\big)\, 1_{[\|u_{2k} - a_k g(u_{2k}) - a_k \xi_k\| \le M_{\sigma_k}]}, \qquad \sigma_k = \sum_{i=1}^{k-1} 1_{[\|u_{2(i+1)}\| > M_{\sigma_i}]}, \quad \sigma_0 = 0.$$

Comparing with Theorem 2 above, to show that $u_{2k}$ converges to $J$, conditions H1-H4 must be verified. H1 is fulfilled by the selection of $a_k$. For H2, select the Lyapunov function $v(x) = (y_r - \Psi G x)^T \Psi G (\Psi G)^T (y_r - \Psi G x)$; then, with $f(x) = -g(x)$,

$$f^T(x)\,\nabla v(x) = -4\,\|\Psi G(\Psi G)^T (y_r - \Psi G x)\|^2 < 0 \quad \text{for } x \notin J.$$

Noticing that $v(J) = \{0\}$, it is nowhere dense; moreover, H2 is fulfilled with $x^* = 0$. In the present case $g(x)$ is a linear function, so H3 is valid. It thus only remains to verify H4. Since $c_k \to 0$ as $k \to \infty$, one can get

$$\lim_{T \to 0} \limsup_{k \to \infty} \frac{1}{T}\Big\|\sum_{i=n_k}^{m(n_k, t)} a_i \gamma_i\Big\| = 0, \quad \forall t \in [0, T]. \qquad (16)$$

Therefore, to check condition H4 it suffices to show

$$\sum_{k=1}^{\infty} a_k \big[\delta_k + \theta_k + \alpha_{2k+1} - \alpha_{2k} - \beta_{2k+1} + \beta_{2k}\big]\, 1_{[\|u_{2k}\| < K]} < \infty, \quad \text{a.s.} \qquad (17)$$

First, check the term $a_k(\alpha_{2k+1} - \alpha_{2k})$. It is noticed that

$$\alpha_{2k+1} - \alpha_{2k} = \frac{\Delta_k^{-}}{c_k}\big(\|\Psi\epsilon_{2k+1}\|^2 - \|\Psi\epsilon_{2k}\|^2\big) = \frac{\Delta_k^{-}}{c_k}\Big(\mathrm{tr}\big(\Psi(\epsilon_{2k+1}\epsilon_{2k+1}^T - Q)\Psi^T\big) - \mathrm{tr}\big(\Psi(\epsilon_{2k}\epsilon_{2k}^T - Q)\Psi^T\big)\Big).$$

By assumption A1, $\{(\Delta_k^{-}/c_k)\,\mathrm{tr}(\Psi(\epsilon_{2k+1}\epsilon_{2k+1}^T - Q)\Psi^T)\}$ and $\{(\Delta_k^{-}/c_k)\,\mathrm{tr}(\Psi(\epsilon_{2k}\epsilon_{2k}^T - Q)\Psi^T)\}$ are sequences of zero-mean, mutually independent random vectors with bounded moments. Then, by the convergence theorem for martingale difference sequences [23],

$$\sum_{k=1}^{\infty} a_k (\alpha_{2k+1} - \alpha_{2k}) < \infty, \quad \text{a.s.} \qquad (18)$$

and hence

$$\sum_{k=1}^{\infty} a_k (\alpha_{2k+1} - \alpha_{2k})\, 1_{[\|u_{2k}\| < K]} < \infty, \quad \text{a.s.} \qquad (19)$$

since $\alpha_{2k+1}$ and $\alpha_{2k}$ are independent of $u_{2k}$.

Next, check the term $a_k(\beta_{2k+1} - \beta_{2k})$. Notice that $u_{2k}$ is independent of $\epsilon_{2k+1}$ and $\epsilon_{2k}$, and $\Delta_k$ is also independent of $\epsilon_{2k+1}$ and $\epsilon_{2k}$; therefore, again by the convergence theorem for martingale difference sequences [23],

$$\sum_{k=1}^{\infty} a_k (\beta_{2k+1} - \beta_{2k})\, 1_{[\|u_{2k}\| < K]} < \infty, \quad \text{a.s.} \qquad (20)$$

Now consider $a_k \theta_k$. Notice that $\Delta_k$ is independent of $\epsilon_{2k+1}$, both $\Delta_k$ and $\Delta_k^{-}$ are bounded, and $\{\epsilon_{2k+1}\}$ is a martingale difference sequence. Thus

$$\sum_{k=1}^{\infty} a_k \Delta_k^{-}\Delta_k^T (\Psi G)^T \Psi\epsilon_{2k+1} < \infty, \quad \text{a.s.} \qquad (21)$$

and hence

$$\sum_{k=1}^{\infty} a_k \theta_k\, 1_{[\|u_{2k}\| < K]} < \infty, \quad \text{a.s.} \qquad (22)$$

since $\theta_k$ is independent of $u_{2k}$.

Finally, consider the term $a_k \delta_k$. It is noticed that

$$\Delta_k^{-}\Delta_k^T - I = \begin{bmatrix} 0 & \frac{\Delta_k^2}{\Delta_k^1} & \cdots & \frac{\Delta_k^{pN}}{\Delta_k^1} \\ \frac{\Delta_k^1}{\Delta_k^2} & 0 & \cdots & \frac{\Delta_k^{pN}}{\Delta_k^2} \\ \vdots & & \ddots & \vdots \\ \frac{\Delta_k^1}{\Delta_k^{pN}} & \cdots & \frac{\Delta_k^{pN-1}}{\Delta_k^{pN}} & 0 \end{bmatrix}. \qquad (23)$$

Because $\Delta_k^i$ and $\Delta_k^j$ are mutually independent for $i \ne j$, $E(\Delta_k^{-}\Delta_k^T - I) = 0$. Moreover, both $\Delta_k^i$ and $1/\Delta_k^j$ are bounded for all $i, j$, so $\Delta_k^{-}\Delta_k^T - I$ has a finite moment of order $2 + \delta$. In addition, $\Delta_k^{-}\Delta_k^T - I$ is independent of $u_{2k}$. Therefore, one has

$$\sum_{k=1}^{\infty} a_k \delta_k\, 1_{[\|u_{2k}\| < K]} < \infty, \quad \text{a.s.} \qquad (24)$$

Thus condition H4 is verified, and the convergence theorem in the Appendix can be applied. As a result, one has

$$u_{2k} \to J, \quad \text{a.s.} \qquad (25)$$

Since $c_k \to 0$, from (8) and (25),

$$u_k \to J, \quad \text{a.s.} \qquad (26)$$

This completes the proof. □

REFERENCES

[1] H.-S. Ahn, Y. Q. Chen, and K. L. Moore, "Iterative learning control: Brief survey and categorization," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, Nov. 2007.
[2] Y. Wang, F. Gao, and F. J. Doyle, III, "Survey on iterative learning control, repetitive control, and run-to-run control," J. Process Control, vol. 19, no. 10, 2009.
[3] D. Shen and Y. Wang, "Survey on stochastic iterative learning control," J. Process Control, vol. 24, no. 12, 2014.
[4] X. Xu, Reinforcement Learning and Approximate Dynamic Programming. Beijing, China: Science Press.
[5] T. Liu, D. Wang, and R. Chi, "Neural network based terminal iterative learning control for uncertain nonlinear non-affine systems," Int. J. Adapt. Control Signal Process., vol. 29, no. 10, 2015.
[6] R. Chi, Z. Hou, S. Jin, D. Wang, and C.-J. Chien, "Enhanced data-driven optimal terminal ILC using current iteration control knowledge," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 11, Nov. 2015.
[7] S. Jin, Z. Hou, and R. Chi, "Optimal terminal iterative learning control for the automatic train stop system," Asian J. Control, vol. 17, no. 5, 2015.
[8] C. T. Freeman, Z. Cai, E. Rogers, and P. L. Lewin, "Iterative learning control for multiple point-to-point tracking application," IEEE Trans. Control Syst. Technol., vol. 19, no. 3, May 2011.
[9] T. D. Son, H.-S. Ahn, and K. L. Moore, "Iterative learning control in optimal tracking problems with specified data points," Automatica, vol. 49, no. 5, 2013.
[10] D. H. Owens, C. T. Freeman, and T. V. Dinh, "Norm-optimal iterative learning control with intermediate point weighting: Theory, algorithms, and experimental evaluation," IEEE Trans. Control Syst. Technol., vol. 21, no. 3, May 2013.
[11] B. Chu, C. T. Freeman, and D. H. Owens, "A novel design framework for point-to-point ILC using successive projection," IEEE Trans. Control Syst. Technol., vol. 23, no. 3, May 2015.
[12] C. T. Freeman and Y. Tan, "Iterative learning control with mixed constraints for point-to-point tracking," IEEE Trans. Control Syst. Technol., vol. 21, no. 3, May 2013.
[13] C. T. Freeman and Y. Tan, "Point-to-point iterative learning control with mixed constraints," in Proc. Amer. Control Conf., San Francisco, CA, USA, Jul. 2011.
[14] C. T. Freeman and T. V. Dinh, "Experimentally verified point-to-point iterative learning control for highly coupled systems," Int. J. Adapt. Control Signal Process., vol. 29, no. 3, 2015.
[15] D. Shen and Y. Wang, "Iterative learning control for stochastic point-to-point tracking system," in Proc. 12th Int. Conf. Control, Autom., Robot. Vis., Guangzhou, China, 2012.
[16] H. F. Chen, "Almost sure convergence of iterative learning control for stochastic systems," Sci. China Ser. F, vol. 46, no. 1, 2003.
[17] D. Shen and H. F. Chen, "A Kiefer-Wolfowitz algorithm based iterative learning control for Hammerstein-Wiener systems," Asian J. Control, vol. 14, no. 4, 2012.
[18] S. S. Saab, "On a discrete-time stochastic learning control algorithm," IEEE Trans. Autom. Control, vol. 46, no. 8, Aug. 2001.
[19] S. S. Saab, "Selection of the learning gain matrix of an iterative learning control algorithm in presence of measurement noise," IEEE Trans. Autom. Control, vol. 50, no. 11, Nov. 2005.
[20] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, no. 3, Sep. 1951.
[21] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint.
Cambridge, U.K.: Cambridge Univ. Press.
[22] H. F. Chen, Stochastic Approximation and Its Applications. Dordrecht, The Netherlands: Kluwer.
[23] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales. New York, NY, USA: Springer, 1978.
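To make the structure of the preceding convergence argument concrete, the following is a minimal numerical sketch of a randomly truncated stochastic-approximation ILC update in the spirit of the proof above. It is not the paper's algorithm (9)–(11) verbatim: the lifted map G, the reference, the noise level, the gain sequences a_k and c_k, and the truncation bounds M_σ are all illustrative assumptions, and the gradient estimate is a simplified SPSA-style randomized difference built from scalar costs rather than the paper's output-vector construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative lifted input-output map y = G u + noise; G is unknown to the learner.
n = 4
G = np.tril(rng.uniform(0.5, 1.5, (n, n)))      # assumed toy lifted system matrix
y_r = np.array([0.5, -0.3, 0.8, 0.2])           # assumed point-to-point reference

def cost(u):
    # Noisy measurement of the tracking cost ||y_r - G u||^2
    return np.linalg.norm(y_r - G @ u + 0.01 * rng.standard_normal(n)) ** 2

u, sigma = np.zeros(n), 0                        # input iterate and truncation counter
for k in range(1, 20001):
    a_k = 1.0 / k                                # steps with sum a_k = inf, sum a_k^2 < inf
    c_k = 1.0 / k ** 0.25                        # vanishing perturbation gains, c_k -> 0
    delta = rng.choice([-1.0, 1.0], size=n)      # Rademacher perturbation directions
    # Randomized-difference (Kiefer-Wolfowitz type) estimate of the cost gradient;
    # for +/-1 entries the elementwise reciprocal of delta equals delta itself
    g_hat = (cost(u + c_k * delta) - cost(u - c_k * delta)) / (2.0 * c_k) * delta
    u_new = u - a_k * g_hat
    if np.linalg.norm(u_new) > 10.0 * (sigma + 1):   # expanding truncation bound M_sigma
        u_new, sigma = np.zeros(n), sigma + 1        # restart inside the enlarged region
    u = u_new

print("residual ||y_r - G u|| =", np.linalg.norm(y_r - G @ u))
```

The expanding truncations play the same role as M_{σ_k} in the proof: they keep the iterates bounded without requiring prior knowledge of where the solution lies.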


INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2017

On interval tracking performance evaluation and practical varying sampling ILC

Yun Xu, Dong Shen and Youqing Wang
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, P.R. China
Contact: Dong Shen (shendong@mail.buct.edu.cn)

ABSTRACT This paper considers the evaluation of the interval tracking error for sampled control performance and an associated sampling technique to enhance the tracking performance. Upper bounds of the tracking error profile over an arbitrary sampling interval are first given for both linear and nonlinear systems. A practical sampled-data iterative learning control with varying sampling rates is then proposed to ensure a prior given tolerant tracking error. In this control strategy, once the given tracking performance at the sampling time instants is satisfied, the inter-sample behaviour is checked to determine which intervals are not satisfactory, and the sampling frequency for such intervals is then increased. Both at-sample and inter-sample tracking performance are satisfied after enough learning iterations. Two examples are simulated to demonstrate the effectiveness of the proposed sampling strategy.

ARTICLE HISTORY Received 10 August 2016; Accepted 17 December 2016

KEYWORDS Iterative learning control; sampled-data control; tolerant tracking performance; practical varying sampling; inter-sample error bound

© 2017 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

The idea of iterative learning control (ILC), which uses tracking information from previous iterations to construct a new input and can improve tracking performance for repetitive systems, was first proposed in Arimoto, Kawamura, and Miyazaki (1984). The principle of ILC mimics the human ability to use past experience and knowledge to complete a certain task repeatedly. A classic example is shooting a basketball from a fixed point: a basketball player learns from previous attempts and constantly revises his/her shooting angle and intensity, thereby ensuring that he/she can hit the basket successfully after enough attempts. ILC is simple and effective, as it requires little system information, and it is a typical data-driven approach to handle nonlinearity, strong coupling, modelling difficulty and high-precision tracking control problems. As a consequence, ILC has attracted much attention from researchers (Ahn, Chen, & Moore, 2007; Bristow, Tharayil, & Alleyne, 2006; Shen & Wang, 2014) and is widely used in industrial production (Bifaretti, Tomei, & Verrelli, 2011; Ji, Hou, & Zhang, 2016; Ouyang, Zhang, & Gupta, 2006; Xu, Chu, & Rogers, 2014; Zhao, Lin, Xi, & Guo, 2015). It is worth noting that ILC is a feedforward-type control method, which is advantageous for ensuring convergence along the iteration axis. Thus, it is also capable of combining with other control techniques for further performance improvement. For example, in Ouyang et al. (2006), a hybrid approach integrating ILC with a switching method is provided for robot manipulators to ensure a fast convergence rate.

ILC was first proposed in a continuous form called continuous ILC, which saves the continuous signals of the input, the output and the desired trajectory into a memory and generates the continuous input by using intact continuous information. However, in practice, saving all the continuous history information requires considerable storage. Therefore, discrete-time ILC was developed to address this issue. As the name suggests, discrete-time ILC involves design and analysis in discrete form.
This approach significantly reduces the amount of storage and calculation, as well as ensuring the control effect. Its simple and convenient design has attracted the interest of researchers, such as those focusing on stochastic systems (Oh & Lee, 2015), the quantised problem (Shen & Xu, 2016; Xu & Shen, 2016), the data dropouts problem (Shen & Wang, 2015a, 2015b) and the random iteration length problem (Shen, Zhang, Wang, & Chien, 2016). However, the controlled plant is often a continuous-time system rather than a discrete one. Thus, scholars have further considered so-called sampled-data iterative learning control (SDILC). SDILC designs a discrete-time ILC law for continuous-time systems based on the sampled data (Abidi & Xu, 2011; Chien, 1997; Oomen, Wijdeven, & Bosgra, 2007; Sun & Wang, 2001). The results on SDILC are rather limited, and most existing papers focus on convergence at the sampling instants, while the inter-sample performance is less evaluated. Note that convergence at the sampling instants can be treated like that of discrete-time ILC, which is easy to guarantee; however, inter-sample convergence is much harder to achieve because the control signal is constant rather than continuously varying over the specified interval. In fact, fairly few results can be found on the inter-sample tracking performance in the ILC field. While existing publications on SDILC have covered delays, initial shifts, bounded disturbances and other traditional ILC issues, in-depth study on the analysis and synthesis of SDILC still requires more effort. In the following, we give a brief literature review on SDILC in light of its kernel issues.

Chien et al. focused on the effect of combining the current feedback mechanism with ILC under bounded disturbances or noises. Specifically, for linear systems, Chien and Tai (2004) and Chien, Hung, and Chi (2014) introduced the sampled tracking error of the current batch into the ILC algorithm, where bounded convergence at the sampling instants was established. Chien and Ma (2013) showed that the convergence rate was increased when the feedback controller was incorporated with

the feedforward ILC. Similar results for nonlinear systems were given in Chien (1997, 2000) and Chien, Wang, and Chi (2014). Sun et al. contributed a series of papers on affine nonlinear systems with arbitrary relative degree (Sun & Wang, 2000, 2001) and initial shifts (Sun, Li, & Zhu, 2013; Zhu, He, & Sun, 2006). For nonlinear systems with a well-defined relative degree, Sun, Wang, and Wang (2004) used lower-order differentiations of the tracking error, with order less than the relative degree, to generate an input sequence. Moreover, Sun and Wang (2000, 2001), Sun et al. (2004, 2013) and Zhu et al. (2006) showed bounded convergence under bounded disturbances and bounded initial shifts. The initial rectifying mechanism for initial shifts was discussed in Sun et al. (2013) and Zhu et al. (2006) under the SDILC framework. In addition, Sun et al. (2013) proposed a novel varying-order algorithm differing from other papers.

Xu et al. conducted a relatively basic and comprehensive study on the design and analysis of SDILC, featuring the frequency-domain method (Abidi & Xu, 2011; Huang, Xu, Venkataramanan, & Huynh, 2014; Xu, Abidi, Niu, & Huang, 2012). It was found that a monotonic convergence condition can be derived more easily in the frequency domain than in the time domain. Criteria for the selection of learning type and sampling time were presented in Abidi and Xu (2011). An experiment on a piezoelectric motor was detailed in Xu et al. (2012). A closed-loop feedback controller was included in the SDILC design, and this controller outperformed well-tuned open-loop and PI control algorithms (Huang et al., 2014). Traditional ILC problems have also been attempted for SDILC, such as time delays (Fan, He, & Liu, 2009), optimal control (Zhou, Tan, Oetomo, & Freeman, 2013) and singular systems (Sun, Fang, & Han, 2002). However, it should be pointed out that these attempts are only first steps, and much effort is still needed to perfect them.

From the above papers, we find that most existing research focuses on convergence at the sampling instants, while the performance within the sampling intervals is less evaluated. However, when considering sampled control of a continuous-time system, perfect tracking means not only good at-sample performance but also satisfactory inter-sample behaviour. This is one major difference between sampled-data control and discrete-time control. Specifically, in ILC for discrete-time systems, the control objective is to ensure precise tracking at the specified time instants, while an optimal sampled-data ILC should pay attention to both the at-sample and inter-sample tracking performance simultaneously. To address this problem, Oomen et al. presented an optimal multirate ILC criterion under a closed-loop multirate ILC set-up and provided an experiment on a wafer stage system to illustrate better inter-sample behaviour (Oomen et al., 2007; Oomen, Wijdeven, & Bosgra, 2009, 2011). However, the topic is far from complete, as many related issues, such as the design and analysis of irregular sampled-data ILC, are still open. This is the first motivation of this paper: we aim to provide an evaluation of the inter-sample tracking performance.

Moreover, it is natural that a high sampling rate results in rich data, which further leads to precise tracking performance. Therefore, in some applications, it is straightforward to use the fastest sampling rate up to the limit of the hardware.
However, such a simple mechanism may lead to a great waste of sampling cost and computation burden for some practical systems, especially slowly varying systems such as chemical reaction processes. Specifically, some reaction processes are quite slow, and thus one operation runs for a long period. Fast sampling for this kind of slowly varying system yields an excessive cost in sampling, storage and computation. In fact, the sampling rate is usually kept slow to save cost. Nevertheless, slow sampling may result in inter-sample behaviour that is not as good as expected. Thus, it is of great interest to consider the trade-off between sampling cost and tracking performance. This is the second motivation of this paper: we aim to propose an irregular sampling technique to balance the conflict between sampling cost and tracking performance.

This paper contributes an evaluation of the inter-sample tracking error and a varying sampling technique to construct an irregular sampled-data ILC ensuring a prior given tolerant control performance. Specifically, the technical contributions of this paper consist of two parts. First, for high-accuracy tracking of continuous-time systems, the upper bounds of the inter-sample tracking errors are evaluated provided that zero-error at-sample tracking has been precisely achieved, which implies that a smaller sampling period corresponds to a smaller inter-sample tracking error. The cases of linear and nonlinear systems are discussed separately, and the evaluation for linear systems is tighter than that for nonlinear systems, as more uncertainties are involved in the latter case. Second, to balance the conflict between sampling cost and tracking performance, a practical varying sampling technique is introduced on the basis of the above observations. Specifically, we start from a slow sampling rate and increase the rate for part of the intervals according to the inter-sample tracking performance. The associated SDILC is proposed to ensure good/tolerant tracking performance for both the at-sample and inter-sample cases. Two illustrative examples are also detailed to show the effectiveness of the proposed strategy. It is worth pointing out that an alternative method for increasing the tracking performance is to compute virtual sampled data for points located within the sampling interval and use these virtual data for input updating. However, such a method would need the system information for computing the virtual information. Moreover, if system noises and/or uncertainties are involved, the computation errors should be well addressed. Our method in this paper adjusts the actual sampling rate; therefore, the available data are more accurate, and the proposed SDILC is a data-driven method.

The rest of the paper is organised as follows. Section 2 provides the problem formulation. Sections 3 and 4 derive the expression of the maximum interval tracking error for linear and nonlinear systems, respectively. Section 5 details a varying sampling rate strategy and its associated SDILC. Two practical systems illustrate the effectiveness of the proposed method in Section 6. Section 7 concludes this paper.

Notations: R denotes the set of real numbers, and R^n denotes the n-dimensional space. N denotes the set of non-negative integers. For a vector x, without further specification, ‖x‖ denotes the Euclidean norm, and for a matrix M, ‖M‖ is the induced norm.

Problem statement: Given a desired trajectory y_d(t) and a tolerant bound ϵ of the tracking error for the continuous-time system, the control objective is to assign a proper sampling strategy and a proper ILC law so that the generated input sequence u_k(jΔ) ensures that the maximal tracking error during the whole time interval is no larger than the given bound ϵ after sufficient learning iterations. That is,

max_t ‖y_d(t) − y_k(t)‖ ≤ ϵ, t ∈ [0, T], ∀k ≥ K,

where K is a sufficiently large integer.

Figure 1. Block diagram of SDILC.

2. Problem formulation

Consider the following single-input single-output nonlinear continuous-time system:

ẋ_k(t) = f(x_k(t), u_k(t)), y_k(t) = g(x_k(t)), (1)

where k denotes the iteration index, t denotes the time axis, t ∈ [0, T], and T is the iteration length. u_k(t) ∈ R^p, y_k(t) ∈ R^q and x_k(t) ∈ R^n are the input, output and state, respectively. The functions f(·), g(·) are smooth in their domain of definition. As a special case of Equation (1), the linear system is formulated as follows:

ẋ_k(t) = A x_k(t) + B u_k(t), y_k(t) = C x_k(t), (2)

where A, B and C are system matrices with appropriate dimensions.

The continuous-time system can be well handled if the continuous differential of the tracking error is used in the update law. However, it is hard to obtain the continuous differential signal in many practical applications. Meanwhile, when computers are involved in the control design, it is reasonable to generate the control signals in discrete-time form. Thus, sampled-data control is a promising way to fill the gap. To perform this implementation, a sampler and a holder are introduced to achieve the A/D and D/A transformations. The block diagram of SDILC is given in Figure 1, where a sampler is implemented at the output side to generate the sampled output, the learning controller produces the discrete input for the next iteration using the stored discrete input values and sampled outputs as well as the reference trajectory, and a holder is adopted to regain the continuous input signal for the controlled system.

In this paper, the sampling period is denoted by Δ. Without loss of any generality, it is assumed that T/Δ is an integer, and we denote N = T/Δ as the number of sampling instants. Then, only the output at the time instants jΔ, where j is a positive integer, is sampled for input updating. Therefore, we can only obtain the input signal at jΔ, j ∈ N. To generate the continuous control signal, the zero-order holder is adopted as follows:

u(t) = u(jΔ), t ∈ [jΔ, jΔ + Δ). (3)

Remark 2.1: The control objective is to guarantee a tolerant bound of the tracking errors after enough learning iterations. Thus, it is an objective in the sense of a limit. In other words, an appropriate sampling strategy should be designed such that lim_{k→∞} ‖e_k(t)‖ ≤ ϵ for arbitrary t ∈ [0, T], rather than only at the sampled time instants. However, in practical applications, verifying the limiting performance is difficult. Consequently, we set a prior acceptable iteration number as the sufficient learning iterations. In addition, the tracking error at the sampling time instants affects the inter-sample behaviour, while in practical applications it is difficult to ensure zero-error tracking performance in finite iterations. Thus, given the upper bound ϵ of the inter-sample tracking error, we set the tolerant error at the sampling time instants to be far smaller than the given bound, so that perfect tracking at the sampling instants is approximately achieved after finite learning iterations. Details are illustrated in Section 6.
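As a concrete companion to the sampler/holder loop of Figure 1 and the zero-order holder (3), the following sketch computes the exact discretization of the linear system (2) under a constant sampling period; the matrices A, B, C and the period Δ are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative continuous-time system (2): x' = A x + B u, y = C x (assumed values)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Delta = 0.1  # sampling period (assumed)

# Zero-order-hold discretization via the augmented matrix exponential:
# expm([[A, B], [0, 0]] * Delta) = [[Phi, Gamma], [0, I]]
n, m = A.shape[0], B.shape[1]
M = np.zeros((n + m, n + m))
M[:n, :n], M[:n, n:] = A, B
E = expm(M * Delta)
Phi, Gamma = E[:n, :n], E[:n, n:]

# One sampled-data step: the input u(j*Delta) is held constant over [j*Delta, (j+1)*Delta)
x_j = np.zeros((n, 1))
u_j = 1.0
x_next = Phi @ x_j + Gamma * u_j
y_next = C @ x_next
print("Phi =", Phi, "Gamma =", Gamma, "y((j+1)Delta) =", y_next, sep="\n")
```

The pair (Phi, Gamma) computed this way is exactly the sampled-data model that the learning controller sees at the instants jΔ, while the behaviour between instants is governed by the held input, which is the object studied in the next two sections.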
Remark 2.2: Even if the tracking errors at the sampling time instants are infinitesimal or approach zero, some intervals may exist where the tracking performance is not satisfactory, especially when the sampling frequency is low. To guarantee the desired tracking results, the sampling frequency needs to be adjusted in those time intervals. This motivates us to consider non-uniform sampling, i.e. different sampling rates within one iteration. Such a sampling mechanism can efficiently balance the conflict between sampling cost and tracking performance. Details are shown in Section 5.

The following assumptions are required:

Assumption 2.1: The nonlinear functions f(·) and g(·) satisfy the globally Lipschitz condition in the definition domain, i.e. ‖f(x_1, u_1) − f(x_2, u_2)‖ ≤ f_1‖x_1 − x_2‖ + f_2‖u_1 − u_2‖ and ‖g(x_1) − g(x_2)‖ ≤ g‖x_1 − x_2‖.

Assumption 2.2: The initial state is precisely reset for all iterations, i.e. x_k(0) = x_d(0).

Assumption 2.3: The desired trajectories y_d(jΔ) and y_c(t) are realisable, where y_d(jΔ) denotes the sampled output values of y_c(t). That is, for an appropriate initial state x_d(0), a unique input series u_d(jΔ), j ∈ N, called step signals, exists such that the output of (1) or (2) equals y_d(jΔ) at the sampling time instants. In addition, for an appropriate initial state x_d(0), there exists a unique input u_c(t), called continuous signals, such that the output of (1) or (2) equals y_c(t) for all t ∈ [0, T].

Remark 2.3: We give some brief remarks on the assumptions, which are all basic conditions widely used in most ILC papers. Assumption 2.1 gives the globally Lipschitz condition on the nonlinear functions, which is essential in ensuring the stability of the original continuous-time system. It has been illustrated in Chapter 7 of Xu and Tan (2016) that the system

may have a finite escape time so that the tracking problem cannot be solved. This example implies the necessity of Assumption 2.1. Moreover, Assumption 2.2 is a basic condition to guarantee the space and time repetition of the system. While relaxations of Assumption 2.2 have been discussed in many papers, the objective of this paper is to propose some fundamental results for the error evaluation. Thus, we use Assumption 2.2 to avoid additional influences of the initial error on the inter-sample tracking errors. Finally, Assumption 2.3 provides the existence condition of the input solution to the desired reference, so that the following SDILC procedures are well defined.

3. Tracking error bound for the linear system case

The tracking error in sampling intervals under the step input signal u_d(jΔ), t ∈ [jΔ, jΔ + Δ), for linear systems defined by Equation (2) is analysed in this section. The following theorem presents the upper bound of the tracking error in a sampling interval.

Theorem 3.1: If the linear system (2) is sampled uniformly with a period Δ and works under the desired step input signals u_d(jΔ), then for any sampling interval [jΔ, jΔ + Δ), the maximum value of the tracking error norm profile ‖e(jΔ + t*)‖ satisfies

‖e(jΔ + t*)‖ ≤ ‖C‖ e^{‖A‖t*} ‖B‖ μ(jΔ) t* < ‖C‖ e^{‖A‖Δ} ‖B‖ μ(jΔ) Δ, (4)

where μ(jΔ) = max_t ‖u_c(jΔ + t) − u_d(jΔ)‖, t ∈ [0, Δ], provided that the tracking errors at the sampling time instants are zero. Here, the maximum tracking error within the sampling interval [jΔ, jΔ + Δ) is assumed to be attained at time jΔ + t*.

Proof: In a sampling period [jΔ, jΔ + Δ), the inter-sample tracking error under the step signals can be expressed as follows:

e(jΔ + t) = y_c(jΔ + t) − y_d(jΔ + t) = C x_c(jΔ + t) − C x_d(jΔ + t)
= C [e^{At} x_c(jΔ) + ∫_{jΔ}^{jΔ+t} e^{A(jΔ+t−τ)} B u_c(τ) dτ] − C [e^{At} x_d(jΔ) + ∫_{jΔ}^{jΔ+t} e^{A(jΔ+t−τ)} B u_d(jΔ) dτ]
= C ∫_{jΔ}^{jΔ+t} e^{A(jΔ+t−τ)} B [u_c(τ) − u_d(jΔ)] dτ
= C ∫_0^t e^{A(t−τ)} B [u_c(jΔ + τ) − u_d(jΔ)] dτ, (5)

where x_c(jΔ) = x_d(jΔ) due to the prior condition that zero-error tracking at the sampling time instants has been achieved. According to the mean value theorem, Equation (5) can be written as

e(jΔ + t) = C e^{A(t−t′)} B [u_c(jΔ + t′) − u_d(jΔ)] t, (6)

where 0 ≤ t′ ≤ t. Taking norms on both sides of Equation (6) yields

‖e(jΔ + t)‖ ≤ ‖C‖ e^{‖A‖(t−t′)} ‖B‖ ‖u_c(jΔ + t′) − u_d(jΔ)‖ t < ‖C‖ e^{‖A‖t} ‖B‖ μ(jΔ) t, (7)

where μ(jΔ) = max_t ‖u_c(jΔ + t) − u_d(jΔ)‖, t ∈ [0, Δ]. Obviously, the value of ‖C‖ e^{‖A‖t} ‖B‖ μ(jΔ) t monotonically increases as t increases. Therefore, the above inequality renders the following inequality:

‖e(jΔ + t)‖ < ‖C‖ e^{‖A‖t} ‖B‖ μ(jΔ) t < ‖C‖ e^{‖A‖Δ} ‖B‖ μ(jΔ) Δ. (8)

For the absolute value of the tracking error, i.e. |e(jΔ + t)|, at least one maximum exists in (jΔ, jΔ + Δ) due to the conditions e(jΔ) = 0 and e(jΔ + Δ) = 0 and the finite interval length. Assume that jΔ + t* is the time when the maximum is achieved; then

‖e(jΔ + t*)‖ ≤ ‖C‖ e^{‖A‖t*} ‖B‖ μ(jΔ) t* < ‖C‖ e^{‖A‖Δ} ‖B‖ μ(jΔ) Δ.

This completes the proof.

Remark 3.1: It has been proved in Sun and Wang (2001) that the desired discrete input sequence can ensure asymptotic zero-error tracking at the sampling instants. Thus, in order to give the essential bound of the inter-sample tracking error, we assume that all tracking errors at the sampling time instants are zero. Then, we have

e(jΔ + Δ) = C ∫_0^Δ e^{A(Δ−τ)} B (u_c(jΔ + τ) − u_d(jΔ)) dτ = 0. (9)

Suppose that u_d(jΔ) were always larger (or smaller) than the upper (or lower) bound of u_c(jΔ + t); then exact tracking at the sampling point jΔ + Δ would not be realisable, i.e. e(jΔ + Δ) ≠ 0. This contradiction implies that min_{0≤τ≤Δ} u_c(jΔ + τ) < u_d(jΔ) < max_{0≤τ≤Δ} u_c(jΔ + τ).
In addition, u_c(jΔ + τ) is continuous on the interval 0 ≤ τ ≤ Δ; thus, according to the mean value theorem, there must exist at least one time instant t′, 0 < t′ < Δ, such that u_d(jΔ) = u_c(jΔ + t′). Note that in practical implementations it is difficult to ensure zero-error tracking within finite iterations; thus we use a sufficiently small threshold instead of zero to suspend the algorithm (see Section 5 for details).

4. Tracking error bound for the nonlinear system case

The estimation of the inter-sample tracking error bound for nonlinear systems is analysed in this section. It should be pointed out that, compared with the linear system case, more uncertainties are involved in the nonlinear system case; thus, the estimation for nonlinear systems is rougher than that for linear systems. This conjecture is verified in the following.

Theorem 4.1: If the nonlinear system (1) is sampled uniformly with a small period Δ and works under the desired step input signals u_d(jΔ), then for any sampling interval [jΔ, jΔ + Δ), the maximum value of the tracking error norm profile ‖e(jΔ + t*)‖

satisfies

‖e(jΔ + t*)‖ ≤ g f_2 μ(jΔ) t* / (1 − f_1 t*) < g f_2 μ(jΔ) Δ / (1 − f_1 Δ), (10)

where μ(jΔ) = max_t ‖u_c(jΔ + t) − u_d(jΔ)‖, t ∈ [0, Δ], provided that the tracking errors at the sampling time instants are zero. Here, the maximum tracking error within the sampling interval [jΔ, jΔ + Δ) is assumed to be attained at time jΔ + t*.

Proof: In a sampling period [jΔ, jΔ + Δ), the interval tracking error under the step signals can be expressed as follows:

e(jΔ + t) = g(x_c(jΔ + t)) − g(x_d(jΔ + t)), (11)

and by Assumption 2.1,

‖e(jΔ + t)‖ ≤ g ‖x_c(jΔ + t) − x_d(jΔ + t)‖. (12)

In a sampling period [jΔ, jΔ + Δ), the inter-sample system states can be written as

x(jΔ + t) = x(jΔ) + ∫_0^t f(x(jΔ + τ), u(jΔ + τ)) dτ. (13)

According to the mean value theorem, the difference of the states x_c(jΔ + t) and x_d(jΔ + t) is given by

x_c(jΔ + t) − x_d(jΔ + t) = ∫_0^t (f(x_c(jΔ + τ), u_c(jΔ + τ)) − f(x_d(jΔ + τ), u_d(jΔ))) dτ
= [f(x_c(jΔ + t′), u_c(jΔ + t′)) − f(x_d(jΔ + t′), u_d(jΔ))] t, (14)

where 0 ≤ t′ ≤ t. Taking norms on both sides of the last equality and applying the Lipschitz conditions, we obtain

‖x_c(jΔ + t) − x_d(jΔ + t)‖ ≤ [f_1 ‖x_c(jΔ + t′) − x_d(jΔ + t′)‖ + f_2 ‖u_c(jΔ + t′) − u_d(jΔ)‖] t. (15)

Define

λ = max_t ‖x_c(jΔ + t) − x_d(jΔ + t)‖, t ∈ [0, Δ], (16)
μ = max_t ‖u_c(jΔ + t) − u_d(jΔ)‖, t ∈ [0, Δ]. (17)

Expression (15) holds for all t in the interval [0, Δ), which implies

λ ≤ (f_1 λ + f_2 μ) t. (18)

If the sampling interval Δ is small enough such that 1 − f_1 Δ > 0, then by rearranging Equation (18) we obtain

λ ≤ f_2 μ t / (1 − f_1 t). (19)

Using definitions (16) and (17), we can rewrite Equation (12) as

‖e(jΔ + t)‖ ≤ g f_2 μ t / (1 − f_1 t). (20)

Assuming that the maximum appears at time jΔ + t* within the sampling period [jΔ, jΔ + Δ), we have

‖e(jΔ + t*)‖ ≤ g f_2 μ t* / (1 − f_1 t*). (21)

To establish the second inequality in Equation (10), the monotonicity of the function h(t) = g f_2 μ t / (1 − f_1 t), t ∈ [0, Δ], needs to be examined. Note that

dh(t)/dt = g f_2 μ / (1 − f_1 t)^2 > 0, (22)

which implies that h(t) is monotonically increasing, and therefore g f_2 μ t / (1 − f_1 t) < g f_2 μ Δ / (1 − f_1 Δ) is tenable. This completes the proof.

Remark 4.1: To ensure the validity of Equation (19), the denominator should be larger than zero, i.e. 1 − f_1 t > 0 (or t < 1/f_1). To satisfy this condition, Δ should be small enough to satisfy Δ < 1/f_1. It is worth mentioning that this is a sufficient requirement on the sampling interval rather than a necessary condition. Moreover, note that we keep increasing the sampling rate if the inter-sample tracking error is not satisfactory; thus the lengths of the sampling intervals decrease, so that there always exists a suitable Δ guaranteeing the validity of Equation (19).

Remark 4.2: Different analysis methods are applied to linear and nonlinear systems. For linear systems, the state equation is solvable and analytical, as shown in Equation (5), and then the tracking error can be directly expressed by the system matrices A, B and C, as shown in Equation (6). But this is not the case for nonlinear systems; in fact, an analytical solution of the state is nearly impossible to obtain. Therefore, the difference of the states x_c(jΔ + t) and x_d(jΔ + t) has to be derived from the mean value theorem, as given in Equation (14). Moreover, the relationship between the system outputs and states is nonlinear, so the norm of the difference ‖e(jΔ + t)‖ is bounded through the Lipschitz constants f_1, f_2 and g. In short, the nonlinear system formulation has more uncertainty in establishing the estimated error bound.
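Both bounds are easy to evaluate numerically. The sketch below computes the linear bound of Theorem 3.1 and, treating the Lipschitz constants as the corresponding induced norms, the nonlinear bound of Theorem 4.1 for a sequence of sampling periods. The system matrices and the continuous desired input u_c are illustrative assumptions, and μ(jΔ) is approximated by holding the left-endpoint value of u_c over each interval.

```python
import numpy as np

# Assumed system matrices and continuous desired input (illustrative only)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
u_c = lambda t: np.sin(2.0 * np.pi * t)

nA, nB, nC = (np.linalg.norm(M, 2) for M in (A, B, C))  # induced 2-norms

for Delta in (0.1, 0.05, 0.025):
    assert nA * Delta < 1.0, "Theorem 4.1 requires Delta < 1/f1 (Remark 4.1)"
    lin, nonlin = 0.0, 0.0
    for j in range(int(0.5 / Delta)):            # sampling intervals covering [0, 0.5]
        ts = np.linspace(0.0, Delta, 50)
        # mu(j*Delta): worst deviation of u_c from the held (left-endpoint) step value
        mu = np.max(np.abs(u_c(j * Delta + ts) - u_c(j * Delta)))
        lin = max(lin, nC * np.exp(nA * Delta) * nB * mu * Delta)        # Equation (8)
        nonlin = max(nonlin, nC * nB * mu * Delta / (1.0 - nA * Delta))  # Equation (10)
    print(f"Delta = {Delta:.3f}: linear bound = {lin:.2e}, nonlinear bound = {nonlin:.2e}")
```

Since μ(jΔ) itself shrinks as the interval shortens, both bounds decrease faster than linearly in Δ, which is the behaviour exploited by the varying sampling strategy of Section 5.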
Remark 4.3: Since the nonlinear system formulation introduces more uncertainties in evaluating the estimation of the error bound, it is expected that the estimated bound for the linear system is tighter than that for the nonlinear system. To show this, we restrict the estimation (10) to the linear system; then it is evident that f_1 = ‖A‖, f_2 = ‖B‖ and g = ‖C‖ in the linear case. In this situation, the distinctions between the upper bounds for linear and nonlinear systems are e^{‖A‖Δ} in Equation (8) and 1/(1 − ‖A‖Δ)

in Equation (10), respectively. Note that Δ can be selected to be small enough to ensure the positiveness of 1 − ‖A‖Δ according to Remark 4.1. Define an auxiliary function q(x) = e^x (1 − x). It is evident that q(0) = 1 and q′(x) = −x e^x; thus q′(x) < 0 for x > 0, implying that q(x) is monotonically decreasing for x > 0. This fact further yields q(x) < q(0) = 1 for x > 0, which implies e^x < 1/(1 − x) for 0 < x < 1. Consequently, the estimation for linear systems is tighter than that for nonlinear systems.

Remark 4.4: Both the linear and nonlinear cases show that a small sampling period corresponds to small inter-sample tracking errors. The validity of the linear case is evident. Here, we verify it for the nonlinear case; that is, we show that the last term of Equation (10) is monotonically increasing with respect to Δ. To this end, we denote ē(Δ) = g f_2 μ(jΔ) Δ / (1 − f_1 Δ) as a function of Δ. Then, the derivative of this function is given as follows:

dē(Δ)/dΔ = g f_2 μ(jΔ) / (1 − f_1 Δ)^2 + g f_2 μ′(jΔ) Δ / (1 − f_1 Δ), (23)

where μ′(jΔ) = dμ(jΔ)/dΔ. Roughly speaking, μ′(jΔ) is usually non-negative because the maximum deviation μ(jΔ) is non-decreasing as the interval length increases, which implies that g f_2 μ′(jΔ) Δ / (1 − f_1 Δ) ≥ 0. Therefore, the derivative dē(Δ)/dΔ is positive. Consequently, the monotonic increase with respect to the interval length is also valid for the nonlinear system case.

5. Varying sampling rate strategy

With sufficient learning iterations and a proper ILC design, asymptotic convergence at the sampling time instants is easy to achieve, while controlling the inter-sample tracking behaviour is difficult. It is impossible to achieve zero-error convergence within the sampling intervals unless the controlled plant works under the continuous input signal u_c(t). Therefore, what we can do is reduce the tracking errors in the sampling intervals and guarantee that they stay within the given bound. Sections 3 and 4 indicate that a high sampling rate corresponds to a small inter-sample tracking error bound: by increasing the sampling rate, the maximal inter-sample tracking error shows a decreasing tendency. This observation is also used to derive the so-called sampling-rate-dependent technique (Saab & Toukhtarian, 2015). Consequently, a simple idea for our control objective is to increase the sampling rate so that the interval length decreases and the inter-sample behaviour improves. However, increasing the sampling rate is not always practical, because an excessively high sampling rate usually leads to a waste of computation time, energy and storage space. On the contrary, a lower sampling rate means less calculation cost, but it may result in a larger inter-sample tracking error no matter how many learning iterations are performed. To balance the tracking performance and computation cost, a varying sampling rate strategy is presented in this section. By a varying sampling rate strategy, we mean that the sampling rate varies over different time sections of the entire iteration. To be specific, we first set a low sampling rate, and the entire operation is divided into several time intervals. Then, after enough learning iterations, we check the inter-sample behaviour to determine whether the maximal tracking error is larger than the given bound. If the maximal tracking error of an appointed interval is smaller than the given bound, then the sampling rate does not need to be increased further for this interval. Otherwise, we increase the sampling rate and divide the interval into several subintervals to further improve the performance.
A high sampling rate is expected to be applied only to the necessary sections of the entire operation, so as to reduce the computation load and improve the tracking performance. The detailed implementation of the proposed varying sampling rate strategy is given in the following steps; a compact sketch of the resulting outer loop is given after the input-sequence form (24) below. Practical verifications are given in the next section.

Step 1: The sampling period Δ is initialised based on working experience, and then N sampling instants are given, where N = [T/Δ].

Step 2: Design a discrete or sampled-data ILC update law based on the current sampling rate, where the control target is to achieve zero-error tracking performance at the sampling time instants.

Step 3: At each iteration, the maximal at-sample error is checked to determine whether it is smaller than pϵ. That is, we check whether max_j ‖e_k(jΔ)‖ ≤ pϵ is true. If true, go to Step 4; otherwise, go to Step 2. Note that p ∈ (0, 1) is a prior given, small enough constant.

Step 4: For each time interval, check whether the maximum error in the interval is smaller than ϵ. That is, we check whether max_{t ∈ [jΔ, jΔ+Δ]} ‖e_k(t)‖ ≤ ϵ is true. If it is true, no changes need to be made. If not, the sampling rate is increased for the unsatisfactory intervals. A simple rate-increase mechanism is to double the sampling rate for all unsatisfactory intervals, that is, to further divide each selected interval into two identical pieces.

Step 5: If the validation in Step 4 is satisfied for all intervals, i.e. the control objective is achieved, then terminate the algorithm. Otherwise, go to Step 2.

The tracking error can be made sufficiently small as long as the sampling rate is sufficiently high. Therefore, by increasing the sampling rate on certain time intervals and learning over sufficient iterations, it is expected that the tracking error becomes smaller than the given error bound during the entire running time [0, T], i.e. ‖y_d(t) − y_k(t)‖ ≤ ϵ, ∀t ∈ [0, T]. The rate adjustment may occur several times. The collection of iterations with the same sampling distribution is called a stage in the following. As a result, the learning process usually comprises several stages before the tracking error becomes small enough. For an arbitrary tolerant error bound, the suitable input sequence based on sampled data may be of the following form:

u_k(t) =
  u_k(j_1 Δ_1), t ∈ [j_1 Δ_1, j_1 Δ_1 + Δ_1) ⊂ [0, t_1), j_1 = 0, 1, 2, …, N_1,
  u_k(t_1 + j_2 Δ_2), t ∈ [t_1 + j_2 Δ_2, t_1 + j_2 Δ_2 + Δ_2) ⊂ [t_1, t_2), j_2 = 0, 1, 2, …, N_2,
  ⋮
  u_k(t_{m−1} + j_m Δ_m), t ∈ [t_{m−1} + j_m Δ_m, t_{m−1} + j_m Δ_m + Δ_m) ⊂ [t_{m−1}, T], j_m = 0, 1, 2, …, N_m, (24)
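The outer loop promised above can be phrased compactly as follows. This is a schematic sketch only: run_iteration and ilc_update are hypothetical stand-ins for one trial of the plant (returning a dense error profile plus the at-sample errors) and for the inner learning law (e.g. the P-type update discussed in Remark 5.1 below), and the refinement simply halves every unsatisfactory interval as in Step 4.

```python
import numpy as np

def varying_sampling_ilc(run_iteration, ilc_update, T, Delta0, eps, p=0.5, max_iter=500):
    # run_iteration(grid, u) -> (t_dense, e_dense, e_samp)  [hypothetical plant trial]
    # ilc_update(u, e_samp)  -> updated sampled input       [hypothetical learning law]
    grid = list(np.arange(0.0, T, Delta0))        # Step 1: coarse uniform sampling grid
    u = np.zeros(len(grid))
    for _ in range(max_iter):
        t_dense, e_dense, e_samp = run_iteration(grid, u)
        if np.max(np.abs(e_samp)) > p * eps:      # Step 3: at-sample accuracy not reached
            u = ilc_update(u, e_samp)             # Step 2: keep learning on current grid
            continue
        new_grid, refined = [], False
        for i, t0 in enumerate(grid):             # Step 4: inspect every sampling interval
            t1 = grid[i + 1] if i + 1 < len(grid) else T
            inside = (t_dense >= t0) & (t_dense < t1)
            new_grid.append(t0)
            if inside.any() and np.max(np.abs(e_dense[inside])) > eps:
                new_grid.append(0.5 * (t0 + t1))  # double the rate on this interval
                refined = True
        if not refined:                           # Step 5: tolerance met everywhere
            return grid, u
        u = np.interp(new_grid, grid, u)          # carry learned input to the finer grid
        grid = new_grid
    return grid, u
```

Carrying the learned input to the refined grid by interpolation, rather than restarting from zero, is one natural design choice; it preserves the at-sample accuracy already achieved on the coarse grid.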

where Δ_i is the sampling period of the ith section, and m denotes the number of sections constituting the operation length. Different sections can have different sampling rates; that is, some sections can be sampled coarsely, while other sections are sampled densely.

Remark 5.1: To illustrate the design of the update law in Step 2, we consider the linear system (2) as an example. The sampled-data dynamic equation can be described as

x_k(jΔ + Δ) = Φ x_k(jΔ) + Γ u_k(jΔ), y_k(jΔ) = C x_k(jΔ). (25)

The relative degree of the sampled-data system is assumed to be one, and the following P-type ILC is adopted to achieve zero-error tracking performance at the sampling time instants:

u_{k+1}(jΔ) = u_k(jΔ) + L e_k(jΔ + Δ), j ∈ N. (26)

The convergence condition can be formulated as

‖I − LCΓ‖ < 1, (27)

where L is the learning gain of the ILC. Note that the control matrix Γ depends on the sampling period. Thus, the convergence condition (27) and the tracking performance are not always satisfied; once Γ changes, the learning gain should be changed accordingly. Moreover, to guarantee monotonic convergence, a new type of ILC can be designed based on the design technique in Abidi and Xu (2011). For example, D-type ILC is suitable if the transfer function P(z) is second-order with a single integrator, and D²-type ILC is suitable for either a second-order or a third-order P(z) with two integrators. On the other hand, the matrix Γ determines the relative degree of the sampled-data system. There may exist a sampling frequency at which the relative degree of the sampled-data system is larger than one. Then, the convergence condition (27) becomes invalid. In such a case, the tracking error e_k(jΔ + Δ) should be replaced by e_k(jΔ + rΔ), where r denotes the corresponding relative degree. The design and analysis of these algorithms have been presented in previous studies (Abidi & Xu, 2011; Chien, 1997, 2000; Chien & Ma, 2013; Chien & Tai, 2004; Chien, Wang, & Chi, 2014; Chien, Hung, & Chi, 2014; Huang et al., 2014; Sun & Wang, 2000, 2001; Sun et al., 2004; Xu et al., 2012; Xu, Huang, Venkataramanan, & Tuong, 2013).

Remark 5.2: The type of ILC algorithm and the selected learning gains have a significant influence on the convergence speed and tracking performance. That is, different ILC algorithms and/or different gains can lead to different learning iteration numbers and computation loads. However, this subject is beyond the scope of this paper. Therefore, in Remark 5.1, we simply adopt the traditional P-type learning algorithm as an illustration. This ILC scheme is also used in the following simulations (see the next section for more details). One may argue that the traditional P-type learning law may result in poor transient performance. Consequently, more effort is required for further analysis, and many open problems remain for the design, analysis and optimisation of SDILC.

Remark 5.3: To determine exactly whether the tracking error under a certain sampling strategy is satisfactory, sufficient trials should be performed beforehand with the ILC update law until the tracking errors at the sampling instants approach zero. This is the reason why we choose p ∈ (0, 1) small enough in Step 3: a small p indicates the degree to which the input sequence has approached u_d(jΔ) when the condition is satisfied. The value of p can be different in different stages.

Remark 5.4: Using the strategy presented above, we can always find a suitable sampling rate distribution to ensure that the maximal tracking error during the whole time interval is smaller than the given bound ϵ.
On the one hand, a smaller sampling period makes the tracking error bound smaller. On the other hand, the inequality g f_2 μ(jΔ) Δ / (1 − f_1 Δ) ≤ ε leads to

Δ ≤ ε / (g f_2 μ(jΔ) + f_1 ε). (28)

From inequality (28), we obtain a sufficient value of the sampling period that guarantees the tolerant tracking performance, while the actually generated sampling period may be larger than this value.

6. Numerical experiments

Two examples are given in this section to illustrate the effectiveness of our results: one linear system case and one nonlinear system case.

6.1 Linear system case

Consider a piezomotor stage studied in Abidi and Xu (2011). The driver and motor can be modelled approximately as

ẋ_1(t) = x_2(t),
ẋ_2(t) = −(f_ν/M) x_2(t) + (f_k/M) u(t),
y(t) = x_1(t), (29)

where x_1 is the motion position, x_2 is the motion velocity, M = 1 kg is the moving mass, f_ν = 144 N is the velocity damping factor and f_k = 6 N/V is the force constant. The desired trajectory is given as y_d(t) = sin(2πt − π/2) and the iteration length is 0.5 s. Here, the tolerant tracking error ϵ is set to a prescribed small bound. For simplicity and effectiveness, the traditional P-type ILC is adopted in the following simulation; a detailed convergence analysis is given in Abidi and Xu (2011).

According to the implementation in Section 5, we first sample five instants uniformly during the entire running time. For the first stage, the learning gain is selected as 250, and the parameter p is set to a prescribed value (the values for stages 2 and 3 are given below). The performance at the sampling instants is displayed in Figure 2, where the line denotes the maximal error along the iteration axis. The maximal tracking error at the sampling instants approximates zero at the ninth iteration of the first stage. Then, we check the inter-sample behaviour, which is observed to be poor (as can be seen in Figure 3). Thus, the sampling rate is increased. As a result, the maximal error leaps at the 10th iteration. A similar leap occurs at the 34th iteration, where

the sampling rate distributions are increased again for several intervals.

Figure 2. Maximal tracking error at sampling instants of the linear system.
Figure 3. Tracking performance of the linear system using the varying sampling technique.
Figure 4. Output profiles at different stages.

Figure 2 shows the maximal tracking error at the sampling instants for the three learning stages according to the different sampling rates and distributions. Specifically, the dashed line, dot-and-dash line, solid line and dotted line denote stages 1, 2, 3 and the error bound multiplied by the parameter p, respectively. In addition, the given tracking condition at the sampling instants is satisfied at the 9th, 33rd and 60th iterations in the respective stages. The parameters p for stages 2 and 3 are set to 0.5 and 0.85, respectively. The learning gains for stages 2 and 3 are set to 250 and 600, respectively.

Figure 3 shows the tracking error profiles along the time axis at the last iteration of each stage. Specifically, the dashed line, dot-and-dash line, solid line and dotted line are the tracking error profiles of the 9th, 33rd and 60th iterations, and the given tolerant tracking error, respectively. From Figure 3, we can see that all the maximal inter-sample errors of each interval at the ninth iteration exceed the given bound ϵ. Therefore, all intervals are divided into two pieces, i.e. the sampling rate is doubled in stage 2. That is, the sampling period is 0.05 s, and 10 sampling instants are chosen in stage 2. After learning for another 24 iterations, i.e. at the 33rd iteration, the tracking error profile is displayed by the dot-and-dash line in Figure 3. The tracking error profiles of the fifth and sixth sampling intervals are smaller than ϵ; thus, the inputs of these two intervals no longer need to be updated in the following iterations. However, the tracking error profiles of the remaining intervals are still unsatisfactory. Thus, the sampling rates in these intervals are doubled again. As a result, the number of sampling instants increases to 18 in stage 3. After another 27 iterations, the maximal tracking error profile of the entire iteration does not exceed the given bound ϵ at the 60th iteration.

The system outputs are shown in Figure 4, where the dashed line, dot-and-dash line and solid line denote the system output at the 9th, 33rd and 60th iterations, respectively, and the dotted line is the desired trajectory. Figure 4 shows that the output at the ninth iteration does not track the desired trajectory well, whereas the output at the 60th iteration almost coincides with the desired trajectory. These findings show the effectiveness of the proposed strategy.

6.2 Nonlinear system case

A DC motor driving a single rigid link through a gear is used as the example for the nonlinear system case. The dynamics is the same as that given in Wang (1998),

(J_m + J_l/n²) θ̈_m + (B_m + B_l/n²) θ̇_m + (Mgl/n) sin(θ_m/n) = u, (30)

and the link angle position is related to the motor angle as

θ_l = θ_m/n, (31)

where θ_m, J_m, B_m and θ_l, J_l, B_l are the motor and link angles, inertias and damping coefficients, respectively; n is the gear ratio; u is the motor torque; M is the lumped mass; and l is the distance of the center of mass from the axis of motion.
These parameters are given as J_m = 0.3, B_m = 0.3, J_l = 0.44, B_l = 0.25, M = 0.5, g = 9.8, n = 1.6 and l = …; θ_m and θ̇_m are chosen as the state variables. The output is θ_l, and the desired trajectory is y_d(t) = (1/6)πt² − (…)πt³, t ∈ [0, 3].
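For readers who wish to reproduce the nonlinear experiment, the dynamics (30)–(31) can be put into first-order form as below. The parameter values follow the text; since the value of l did not survive the transcription, an assumed l = 0.15 is used here purely as a placeholder, and the simple Euler integrator is only one possible choice for simulating the held-input intervals.

```python
import numpy as np

# Parameters from the text; l = 0.15 is an assumed placeholder value
J_m, B_m, J_l, B_l, M, g, n, l = 0.3, 0.3, 0.44, 0.25, 0.5, 9.8, 1.6, 0.15

def motor_rhs(x, u):
    # First-order form of (30) with states x = (theta_m, dtheta_m/dt)
    theta_m, omega_m = x
    J = J_m + J_l / n**2                          # effective inertia
    B = B_m + B_l / n**2                          # effective damping
    domega = (u - B * omega_m - (M * g * l / n) * np.sin(theta_m / n)) / J
    return np.array([omega_m, domega])

def simulate_interval(x, u_held, Delta, steps=100):
    # Euler integration over one zero-order-hold interval of length Delta
    h = Delta / steps
    for _ in range(steps):
        x = x + h * motor_rhs(x, u_held)
    return x

theta_l = lambda x: x[0] / n                      # output: the link angle, Equation (31)
```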

Figure 5. Maximal tracking error at sampling instants of the nonlinear system.
Figure 6. Tracking performance of the nonlinear system using the varying sampling technique.
Figure 7. Output profiles at different stages.

The tracking error bound is set as one thousandth of max y_d(t). In the simulation, the classical P-type ILC is adopted, and the learning gain is set to 2. The parameter p is again set to a prescribed value. We first divide the entire operation into 15 intervals. Using the varying sampling technique, the total learning process consists of two stages, with 15 and 28 sampling instants for the former stage (stage 1) and the latter stage (stage 2), respectively. The sampling positions are automatically selected by the algorithm itself, similar to the linear system case.

The performance at the sampling instants is displayed in Figure 5 along the iteration axis, where the dot-and-dash line, the solid line and the dotted line denote the maximal tracking error profiles of stages 1 and 2, and the error bound multiplied by the parameter p, respectively. The at-sample tracking performance is satisfied at the 16th and the 52nd iterations for the two stages, respectively.

Figure 6 shows the entire tracking performance along the time axis, where the dot-and-dash line, the solid line and the dotted line denote the maximal tracking error profiles at the 16th and the 52nd iterations, and the given tolerant tracking error, respectively. From Figure 6, one can see that there exist two intervals whose inter-sample tracking error profiles are smaller than the given bound ϵ at the 16th iteration. As a result, the input signals for these intervals are no longer updated in the following iterations. The other intervals are unsatisfactory in the first stage, and the sampling rate is doubled for those intervals, thereby producing 28 sampling instants in the second stage. After another 36 learning iterations, the entire tracking error profile is acceptable at the 52nd iteration.

Similar to the linear system case, we show the tracking performance at the 16th and 52nd iterations in Figure 7, where the dot-and-dash line, the solid line and the dotted line denote the output profiles at the 16th and the 52nd iterations, and the desired trajectory, respectively. The tracking at the 52nd iteration is very good. In short, the proposed algorithm is also effective for nonlinear systems.

7. Concluding remarks

In this paper, the inter-sample errors are analysed and their upper bounds are given. On the basis of these upper bounds, a practical SDILC with varying sampling rates is proposed. The system samples quickly if a large inter-sample tracking error exists in the last stage, whereas the sampling frequency stays slow if the inter-sample tracking error is acceptable. For this purpose, a maximal/tolerant tracking error is first given as the control objective. The algorithm starts with a low sampling rate, and then improves the tracking performance at the sampling instants to sufficient precision by using the sampled data and the learning mechanism. The inter-sample behaviour is then checked to determine which intervals are not satisfactory, and the sampling rate is increased for these intervals. Repeating these steps ensures that the whole tracking objective can be well achieved.
Two examples demonstrate the effectiveness of our strategy. For further research, it is of great interest to deepen the theoretical analysis.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work is supported by the National Natural Science Foundation of China and the Beijing Natural Science Foundation.

Notes on contributors

Yun Xu received her B.S. degree in Automation from Beijing Institute of Petrochemical Technology, China. She is now pursuing an M.S. degree at the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. Her research interests include sampled-data iterative learning control and adaptive iterative learning control.

Dong Shen received his B.S. degree in Mathematics from Shandong University, Jinan, China, and his Ph.D. degree in Mathematics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences (CAS), Beijing, China. From 2010 to 2012, he was a Post-Doctoral Fellow with the Institute of Automation, CAS. From 2016 to 2017, he was a visiting scholar at the National University of Singapore, Singapore. Since 2012, he has been an associate professor with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His current research interests include iterative learning control, stochastic control and optimization. He has published more than 40 refereed journal and conference papers. He is the author of Stochastic Iterative Learning Control (Science Press, 2016, in Chinese), co-author of Iterative Learning Control for Multi-Agent Systems Coordination (Wiley, 2016), and co-editor of Service Science, Management and Engineering: Theory and Applications (Academic Press and Zhejiang University Press, 2012). Dr Shen received the IEEE CSS Beijing Chapter Young Author Prize in 2014 and the Wentsun Wu Artificial Intelligence Science and Technology Progress Award.

Youqing Wang received his B.S. degree from Shandong University, Jinan, Shandong, China, in 2003, and his Ph.D. degree in Control Science and Engineering from Tsinghua University, Beijing, China. He worked as a research assistant in the Department of Chemical Engineering, Hong Kong University of Science and Technology, from February 2006 to August 2007. From February 2008 to February 2010, he worked as a senior investigator in the Department of Chemical Engineering, University of California, Santa Barbara, USA. From August 2015 to November 2015, he was a visiting professor in the Department of Chemical and Materials Engineering, University of Alberta, Canada. Currently, he is a full professor at Shandong University of Science and Technology and also at Beijing University of Chemical Technology. His research interests include fault-tolerant control, state monitoring, modelling and control of biomedical processes (e.g. the artificial pancreas system), and iterative learning control. He is an (associate) editor of Multidimensional Systems and Signal Processing and the Canadian Journal of Chemical Engineering. He holds membership of two IFAC Technical Committees (TC6.1 and TC8.2). He is a recipient of several research awards (including the Journal of Process Control Survey Paper Prize and the ADCHEM 2015 Young Author Prize).

References

Abidi, K., & Xu, J.X. (2011). Iterative learning control for sampled-data systems: From theory to practice. IEEE Transactions on Industrial Electronics, 58(7).
Ahn, H.S., Chen, Y.Q., & Moore, K.L. (2007). Iterative learning control: Brief survey and categorization. IEEE Transactions on Systems, Man and Cybernetics, Part C, 37(6).
Arimoto, S., Kawamura, S., & Miyazaki, F. (1984). Bettering operation of robots by learning.
Journal of Robotic Systems, 1(2).
Bifaretti, S., Tomei, P., & Verrelli, C.M. (2011). A global robust iterative learning position control for current-fed permanent magnet step motors. Automatica, 47(1).
Bristow, D.A., Tharayil, M., & Alleyne, A.G. (2006). A survey of iterative learning control. IEEE Control Systems, 26(3).
Chien, C.J. (1997). The sampled-data iterative learning control for nonlinear systems. Proceedings of the IEEE Conference on Decision and Control, 5(5).
Chien, C.J. (2000). A sampled-data iterative learning control using fuzzy network design. International Journal of Control, 73(10).
Chien, C.J., Hung, Y.C., & Chi, R. (2014). Design and analysis of current error based sampled-data ILC with application to position tracking control of DC motors. In 11th IEEE International Conference on Control and Automation. Taichung.
Chien, C.J., & Ma, K.Y. (2013). Feedback control based sampled-data ILC for repetitive position tracking control of DC motors. In CACS International Automatic Control Conference. Nantou.
Chien, C.J., & Tai, C.L. (2004). A DSP based sampled-data iterative learning control system for brushless DC motors. In IEEE International Conference on Control Applications. Taipei.
Chien, C.J., Wang, Y.C., & Chi, R. (2014). Sampled-data adaptive iterative learning control for a class of unknown nonlinear systems. In 13th International Conference on Control Automation Robotics Vision. Singapore.
Fan, Y., He, S., & Liu, F. (2009). PD-type sampled-data iterative learning control for nonlinear systems with time delays and uncertain disturbances. In International Conference on Computational Intelligence and Security. Beijing.
Huang, D., Xu, J.X., Venkataramanan, V., & Huynh, T.C.T. (2014). High-performance tracking of piezoelectric positioning stage using current-cycle iterative learning control with gain scheduling. IEEE Transactions on Industrial Electronics, 61(2).
Ji, H., Hou, Z., & Zhang, R. (2016). Adaptive iterative learning control for high-speed trains with unknown speed delays and input saturations. IEEE Transactions on Automation Science and Engineering, 13(1).
Oh, S.K., & Lee, J.M. (2015). Stochastic iterative learning control for discrete linear time-invariant system with batch-varying reference trajectories. Journal of Process Control, 36.
Oomen, T., Wijdeven, J.V.D., & Bosgra, O. (2007). Design framework for high-performance optimal sampled-data control with application to a wafer stage. International Journal of Control, 80(6).
Oomen, T., Wijdeven, J.V.D., & Bosgra, O. (2009). Suppressing intersample behavior in iterative learning control. Automatica, 45(4).
Oomen, T., Wijdeven, J.V.D., & Bosgra, O.H. (2011). System identification and low-order optimal control of intersample behavior in ILC. IEEE Transactions on Automatic Control, 56(11).
Ouyang, P.R., Zhang, W.J., & Gupta, M.M. (2006). An adaptive switching learning control method for trajectory tracking of robot manipulators. Mechatronics, 16(1).
Saab, S.S., & Toukhtarian, R. (2015). A MIMO sampling-rate-dependent controller. IEEE Transactions on Industrial Electronics, 62(6).
Shen, D., & Wang, Y.Q. (2014). Survey on stochastic iterative learning control. Journal of Process Control, 24(12).
Shen, D., & Wang, Y.Q. (2015a). Iterative learning control for networked stochastic systems with random packet losses. International Journal of Control, 88(5).
Shen, D., & Wang, Y.Q. (2015b).
ILC for networked nonlinear systems with unknown control direction through random lossy channel. Systems & Control Letters, 77.
Shen, D., & Xu, Y. (2016). Iterative learning control for discrete-time stochastic systems with quantized information. IEEE/CAA Journal of Automatica Sinica, 3(1).
Shen, D., Zhang, W., Wang, Y.Q., & Chien, C.J. (2016). On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths. Automatica, 63(1).
Sun, P., Fang, Z., & Han, Z.Z. (2002). Sampled-data iterative learning control for singular systems. In 4th World Congress on Intelligent Control and Automation. Shanghai.

Sun, M.X., Li, Z.L., & Zhu, S. (2013). Varying-order sampled-data iterative learning control for MIMO nonlinear systems. Acta Automatica Sinica, 39(7).
Sun, M.X., & Wang, D.W. (2000). Sampled-data iterative learning control for SISO nonlinear systems with arbitrary relative degree. In Proceedings of the American Control Conference.
Sun, M.X., & Wang, D.W. (2001). Sampled-data iterative learning control for nonlinear systems with arbitrary relative degree. Automatica, 37(2).
Sun, M.X., Wang, D.W., & Wang, Y.Y. (2004). Sampled-data iterative learning control with well-defined relative degree. International Journal of Robust and Nonlinear Control, 14(8).
Wang, D.W. (1998). Convergence and robustness of discrete time nonlinear systems with iterative learning control. Automatica, 34(11).
Xu, W.K., Chu, B., & Rogers, E. (2014). Iterative learning control for robotic-assisted upper limb stroke rehabilitation in the presence of muscle fatigue. Control Engineering Practice, 31.
Xu, J.X., Abidi, K., Niu, X.L., & Huang, D.Q. (2012). Sampled-data iterative learning control for a piezoelectric motor. In Proceedings of the IEEE International Symposium on Industrial Electronics. Hangzhou.
Xu, J.X., Huang, D., Venkataramanan, V., & Tuong, H.T.C. (2013). Extreme precise motion tracking of piezoelectric positioning stage using sampled-data iterative learning control. IEEE Transactions on Control Systems Technology, 21(4).
Xu, Y., & Shen, D. (2016). Zero-error convergence of iterative learning control using quantized error information. IMA Journal of Mathematical Control and Information, in press.
Xu, J.X., & Tan, Y. (2003). Linear and nonlinear iterative learning control. Berlin: Springer.
Zhao, Y.M., Lin, Y., Xi, F.F., & Guo, S. (2015). Calibration-based iterative learning control for path tracking of industrial robots. IEEE Transactions on Industrial Electronics, 62(5).
Zhu, S., He, X.X., & Sun, M.X. (2006). Initial rectifying of a sampled-data iterative learning controller. In The Sixth World Congress on Intelligent Control and Automation. Dalian.
Zhou, S.H., Tan, Y., Oetomo, D., Freeman, C., & Mareels, I. (2013). On online sampled-data optimal learning for dynamic systems with uncertainties. In 9th Asian Control Conference (pp. 1–7). Istanbul.


Information Sciences 381 (2017)

Two updating schemes of iterative learning control for networked control systems with random data dropouts

Dong Shen, Chao Zhang, Yun Xu
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, PR China
Corresponding author: Dong Shen (shendong@mail.buct.edu.cn)

Article history: Received 9 April 2016; Revised 19 October 2016; Accepted 26 November 2016; Available online 2 December 2016

Keywords: Iterative learning control; Data dropout; Intermittent updating scheme; Successive updating scheme; Networked control system

Abstract: The iterative learning control (ILC) problem is addressed in this paper for stochastic linear systems with random data dropouts modeled by a Bernoulli random variable. Both an intermittent updating scheme and a successive updating scheme are provided on the basis of the available tracking information only, and both are shown to converge to the desired input almost surely. In the intermittent updating scheme, the algorithm updates its control signal only when data are successfully transmitted. In the successive updating scheme, the algorithm continuously updates its control signal with the latest available data in each iteration, whether the output information of the last iteration is successfully transmitted or lost. Illustrative simulations verify the convergence and effectiveness of the proposed algorithms. © 2016 Elsevier Inc. All rights reserved.

This work is supported by the National Natural Science Foundation of China and the Beijing Natural Science Foundation.

1. Introduction

Iterative learning control (ILC) is an important branch of intelligent control, especially for repetitive systems. Since first proposed by Arimoto in 1984 [5], ILC has been developed for three decades, and many excellent achievements have been reported [2,6,31]. The inherent idea of ILC is to generate the input signal for the current iteration by using the input and output information of previous iterations as well as the desired trajectory. Thus, the tracking performance is successively improved along the iteration axis, unlike traditional control strategies that improve control performance along the time axis [6]. Because of this operation mechanism, the system has to complete the given tracking task in a finite time interval and then repeat it again and again. Examples of such systems are chemical processes, robotics, and hard disk drives, to name a few. Many studies have been conducted on different ILC topics, such as update law design [19], robustness [17,20,37], frequency analysis [43], and application research [7,22,41]. Moreover, the exploration of ILC has been extended to new problems, such as multi-phase processes [38], varying tasks [44], iteration-varying lengths [23,35,36], event-triggered control [40], control with quantized information [11,34], collaborative tracking [16], and initial state vibration [39]. However, most of these studies mainly concern ILC for centralized control systems, whose controller and plant are placed together so that each piece of information can be well received and processed. Recently, networked control systems (NCSs) have been widely used because of their facility, flexibility, and robustness, aided by the fast developments of

In this kind of system implementation, data transmission is a critical issue, as data dropouts damage the tracking performance. This situation motivates the research on ILC for NCSs. Other papers have focused on ILC for a class of networked systems called multi-agent systems [21,26-28]. However, the major difference between those studies and ours lies in the description of a networked system. In [21,26-28], a multi-agent system is a networked complex system composed of multiple subsystems, in which the network denotes the topological relationship among the subsystems, and the main problem is how to achieve synchronization and/or consensus. Conversely, in the field of NCSs, the network is the transmission channel between the plant and the controller, and the main problem is how to guarantee good performance under severe transmission conditions such as data dropouts.

When considering ILC for NCSs, three major aspects should be taken into account, namely, the data dropout condition, the compensation mechanism design, and the convergence analysis. Existing papers have made progress on one or more of these aspects, but the topic is still far from complete. In what follows, we briefly review the existing results on the data dropout condition and demonstrate our motivations and contributions.

Early attempts were first conducted by Ahn et al. [1,3,4] and continued by Bu and his co-workers [9,10,12]. In these studies, data dropout is modeled by a Bernoulli random variable, which takes the value 1 when data are successfully transmitted and 0 otherwise. Under this condition, data dropouts can occur randomly and successively; the Bernoulli model has therefore been widely used to describe random data dropouts. On the basis of the statistics of random data dropouts, Ahn et al. showed mean square stability following the Kalman filtering-based design and analysis techniques first proposed by Saab in [30]. The major differences among these papers lie in the locations of the data dropouts. Specifically, only measurement loss was discussed in [1,3], and the case in which data dropouts occur in both the control and the output was addressed in [4]. Bu et al. addressed the data dropout problem from the mathematical expectation perspective in [9,10,12]. Generally, mathematical expectations are taken on both sides of the iteration recursion of the tracking errors directly in [9] for the linear system, and a stability condition is then given to derive the convergence of the expectation of the tracking error; the corresponding nonlinear case is treated in [12]. This technique is also used in [25], in which the stochastic equations were first converted into deterministic ones by taking mathematical expectations, and the subsequent analysis was conducted by the conventional contraction mapping method. Moreover, [10] provided an H-infinity ILC analysis for a discrete-time system with random data dropouts, where the H-infinity performance is defined on the basis of the mathematical expectations of the random variables. In addition, Bu and his co-workers provided a 2D analysis approach for ILC under data dropouts, in which the learning gain was generated using the linear matrix inequality (LMI) technique [8], and mean square asymptotic stability was established. Several observations on these results are listed as follows.
First, conducting a rigorous analysis directly on the generated sequence itself is difficult because of the inherent randomness. Therefore, the above works employed mathematical expectation, as in [9,10,12,25], or covariance, as in [1,3,4,8], to remove the effect of the random data dropouts. Moreover, the probability of a data dropout must be known prior to designing the learning gain matrix because of the introduction of expectation and/or covariance. This limits the application range of the proposed algorithms, as the statistics of data dropouts are commonly unknown a priori. Furthermore, the algorithms given in [1,3,4,8-10,25] all adopt the intermittent strategy to deal with random data dropouts; that is, the algorithms simply stop updating whenever data are lost during transmission. If the tracking information is available, the algorithms use it for updating; if the tracking information is lost, the algorithms do not update until new data arrive. In short, the general Bernoulli data dropout condition still faces many open problems from the perspectives of compensation mechanism design and convergence analysis.

To further extend the results in view of the compensation mechanism design, some limitations have been imposed on the data dropout condition, as in [12,18,24,29]. In practice, if the corresponding information is lost, it is of interest to ask whether one can compensate for it. These studies identify two types of compensation mechanisms. The first type is the time axis-based compensation method used in [12,29]: if the data at time t are lost, the data at time t-1 in the same iteration are used to compensate for the lost data. This compensation requires that the data at adjacent time instances not be dropped simultaneously. In [12], the expectation of the output was proved to converge to the desired reference under a condition depending on the probability of successful transmission. The paper [29] mainly discussed the effect of losing a packet at a specified sampling time. As the authors indicated, a mathematical proof was difficult for the general multiple-packet-loss case; thus, they had to assume that the data dropout rate was far less than 100%, which demonstrates another limitation on the data dropouts. The second type is the iteration axis-based compensation mechanism [18,24]: if the data at the kth iteration are lost, the data at the same time instant but from the (k-1)th iteration are used to compensate for the lost data. Meanwhile, the data from adjacent iterations must not be dropped, again a limitation on the data dropout condition. This inherent mechanism guarantees the convergence of the algorithms proposed in [18,24]. In sum, the papers [12,18,24,29] provided primary compensation mechanisms for dropped data instead of setting them to zero. These mechanisms show the possibility of compensating for lost data, but they fail to address the case of randomly successive data dropouts along the time axis and/or the iteration axis. Generally, these additional limitations imply that the data dropout is not completely stochastic, and the general random multiple-data-dropout case remains blank. Under a general data dropout environment, it is of great interest to consider a generic algorithm that updates itself with the latest available data, unknown a priori to the learning process.
In addition, a new model of random data dropouts was given in [32,33], in which a stochastic sequence was introduced to model the data dropouts with a bounded-length requirement. That is, for any given time instant, there exists an unknown maximum number of successive iterations over which data can be dropped consecutively.

As a result, this bounded-length requirement on successive dropouts is somewhat tight, as the dropout process is then not completely stochastic.

Based on the above literature review, the motivation of this paper is as follows. The Bernoulli random variable is the most common model for random data dropouts. For the generic Bernoulli model, existing works focus on the intermittent strategy, and the proposed algorithms converge in the mathematical expectation sense or the mean square sense. A few papers have considered the compensation of lost data or a direct convergence analysis of the generated sequence, but restrictive conditions are imposed on the data dropout environment. Thus, in this paper, we are motivated to study the compensation scheme and the convergence analysis in the almost sure sense under the generic Bernoulli model. Specifically, we consider a stochastic linear system with both system and measurement noises under random data dropouts; the random data dropout is modeled by a Bernoulli random variable, and no further condition is imposed.

The main contributions of this paper are twofold. The first is that two updating schemes, namely, the intermittent updating scheme (IUS) and the successive updating scheme (SUS), are proposed, analyzed, and compared; moreover, the specific dropout probability is not required beforehand in these schemes. In the IUS, the algorithm updates its control signal only when the corresponding packet is successfully transmitted. In the SUS, the algorithm keeps updating with the latest available data whether or not data dropouts occur. The other contribution is the almost sure convergence of the proposed algorithms under the Bernoulli model of data dropouts. Specifically, the input sequences generated by both algorithms are proved to converge to the desired input even when stochastic noises are involved. Note that the SUS and its associated convergence analysis are novel and have not been addressed in existing papers because of the newly involved randomness of successive dropouts.

The rest of the paper is organized as follows. The problem formulation, including the system setup, control objective, and a preliminary lemma, is presented in Section 2. The intermittent updating scheme and the successive updating scheme, with their convergence analyses, are provided in Sections 3 and 4, respectively. Illustrative simulations are given in Section 5 to verify the theoretical results, and Section 6 concludes the paper.

Notations: $\mathbb{R}$ is the real number field, and $\mathbb{R}^n$ is the $n$-dimensional real space. $\mathbb{N}$ is the set of all positive integers. $I_{n\times n}$ is the $n$-dimensional identity matrix. $\mathbb{P}$ denotes the probability of an event and $\mathbb{E}$ the mathematical expectation. $\|\cdot\|_2$ is the Euclidean norm of a vector or the induced 2-norm of a matrix; for a concise expression, the subscript 2 is omitted and the norm is written $\|\cdot\|$ in the rest of the paper. The superscript $T$ denotes the transpose of a matrix or vector. For two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = O(b_n)$ if $b_n > 0$ and there exists $L > 0$ such that $\|a_n\| \leq L b_n$, $\forall n$, and $a_n = o(b_n)$ if $b_n > 0$ and $\|a_n\|/b_n \to 0$ as $n \to \infty$. The abbreviations i.o. and a.s. denote "infinitely often" and "almost surely", respectively.

2. Problem formulation

Consider the following discrete stochastic system:

$$x_k(t+1) = A(t)x_k(t) + B(t)u_k(t) + w_k(t+1), \qquad y_k(t) = C(t)x_k(t) + v_k(t), \qquad (1)$$

where $k = 1, 2, \ldots$
denotes the iteration number, $t = 0, 1, \ldots, N$ denotes the time instant within an iteration, and $N$ is the length of each iteration. $x_k(t) \in \mathbb{R}^n$, $u_k(t) \in \mathbb{R}^p$, and $y_k(t) \in \mathbb{R}^q$ are the state, input, and output of the system, respectively. $A(t)$, $B(t)$, and $C(t)$ are system matrices with appropriate dimensions. The random variables $w_k(t)$ and $v_k(t)$ are the system noise and the measurement noise, respectively. Let $y_d(t)$, $t = 0, 1, \ldots, N$, be the tracking reference. The following mild assumptions are imposed on system (1).

A1. The input-output coupling matrix $C(t+1)B(t)$ has full column rank for all $t$.

A2. For each $t$, the independent and identically distributed (i.i.d.) sequence $\{w_k(t), k = 0, 1, \ldots\}$ is independent of the i.i.d. sequence $\{v_k(t), k = 0, 1, \ldots\}$, with $\mathbb{E}w_k(t) = 0$, $\mathbb{E}v_k(t) = 0$, $\sup_k \mathbb{E}\|w_k(t)\|^2 < \infty$, $\sup_k \mathbb{E}\|v_k(t)\|^2 < \infty$, $\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} w_k(t)w_k^T(t) = R_t^w$, and $\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} v_k(t)v_k^T(t) = R_t^v$, a.s., where $R_t^w$ and $R_t^v$ are unknown matrices.

A3. The initial state sequence $\{x_k(0)\}$ is i.i.d. with $\mathbb{E}x_k(0) = x_d(0)$, $\sup_k \mathbb{E}\|x_k(0)\|^2 < \infty$, and $\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} x_k(0)x_k^T(0) = R_0$. Further, the sequences $\{x_k(0)\}$, $\{w_k(t)\}$, and $\{v_k(t)\}$ are mutually independent.

Remark 1. For any given initial value $x_d(0)$, the desired input $u_d(t)$ can be computed recursively from the nominal model as

$$u_d(t) = \bigl[(C^{+}B(t))^T(C^{+}B(t))\bigr]^{-1}(C^{+}B(t))^T\bigl(y_d(t+1) - C(t+1)A(t)x_d(t)\bigr),$$

where A1 is used and $C^{+}B(t) \triangleq C(t+1)B(t) \in \mathbb{R}^{q\times p}$. Evidently, the following equations are fulfilled:

$$x_d(t+1) = A(t)x_d(t) + B(t)u_d(t), \qquad y_d(t) = C(t)x_d(t). \qquad (2)$$

Moreover, A1 implies that the relative degree is one and that the input dimension is not larger than the output dimension, i.e., $p \leq q$. A1 guarantees the existence of the desired control that generates the desired tracking reference from the nominal model.
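To make the recursion in Remark 1 concrete, it can be evaluated numerically; below is a minimal sketch with time-invariant matrices for simplicity. The matrices, reference, and function names here are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def desired_input(A, B, C, y_d, x_d0):
    """Recursively compute u_d(t) and x_d(t) from the nominal model (Remark 1):
    u_d(t) = [(CB)^T CB]^{-1} (CB)^T (y_d(t+1) - C A x_d(t))."""
    N = len(y_d) - 1
    n, p = B.shape
    x_d = np.zeros((N + 1, n)); x_d[0] = x_d0
    u_d = np.zeros((N, p))
    CB = C @ B                    # coupling matrix C^+B; full column rank by A1
    CB_pinv = np.linalg.pinv(CB)  # equals [(CB)^T CB]^{-1} (CB)^T here
    for t in range(N):
        u_d[t] = CB_pinv @ (y_d[t + 1] - C @ A @ x_d[t])
        x_d[t + 1] = A @ x_d[t] + B @ u_d[t]
    return u_d, x_d

# Hypothetical example data:
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
C = np.array([[0.0, 1.0]])
y_d = np.array([[np.sin(t / 20.0)] for t in range(101)])
u_d, x_d = desired_input(A, B, C, y_d, np.zeros(2))
```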

Fig. 1. Block diagram of the networked iterative learning control.

A2 is a common condition on unknown random noises in stochastic control. The independence condition in A2 is required along the iteration axis and is thus reasonable for practical applications, because the process is repeatable. The initial resetting condition A3 allows a random initial shift around the desired initial state; the classical precise resetting condition is a special case of A3. To simplify the expression, denote $w_k(0) = x_k(0) - x_d(0)$. Then, defining $\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} w_k(0)w_k^T(0) = R_0^w$, A3 fits the formulation of A2; in other words, A3 can be absorbed into A2. Thus, all the assumptions are mild.

The setup of the control system is illustrated in Fig. 1, where the plant and the learning controller are located separately and communicate via networks. Data may be dropped in the networks because of network congestion, linkage interruption, and transmission errors. However, to make the expression concise and without loss of generality, data dropout is considered only at the output side. That is, random data dropouts occur only in the network from the measurement output to the buffer, and the network from the learning controller to the control plant is assumed to work well. Similar to [1,3,4,9,12], we adopt a Bernoulli random variable to model the random data dropouts. Specifically, a random variable $\gamma_k(t)$ indicates whether the measurement packet $y_k(t)$ is successfully transmitted:

$$\gamma_k(t) = \begin{cases} 1, & y_k(t) \text{ is successfully transmitted} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

and, without loss of generality,

$$\mathbb{P}(\gamma_k(t) = 1) = \rho, \qquad \mathbb{P}(\gamma_k(t) = 0) = 1 - \rho, \qquad (4)$$

where $0 < \rho < 1$. That is, the probability that the measurement $y_k(t)$ is successfully transmitted is $\rho$ for all $t$ and $k$.

Remark 2. The Bernoulli random variable has been used in many publications to describe random data dropouts; in the field of ILC, papers [1,3,4,9,12] are typical illustrations. However, in most cases, only convergence in the mathematical expectation sense and/or the mean square sense is obtained; no result on convergence with probability 1 (w.p.1) has been reported until now. In our early papers [32,33], the convergence w.p.1 of the ILC input sequence was proved strictly for stochastic systems; however, the data dropout there was modeled by a stochastic sequence with a finite-length requirement, which means that the dropout process is not completely stochastic. Therefore, the Bernoulli random variable model is revisited in this paper and convergence w.p.1 is expected.

If the measurement packet is successfully transmitted, one can compare it with the desired reference value and compute the tracking error $e_k(t) \triangleq y_d(t) - y_k(t)$ for the update. Otherwise, no output information is received, and no tracking error can be obtained for further updating.

The conventional control objective for a deterministic system is to build an ILC algorithm that generates an input sequence such that the actual output $y_k(t)$ tracks the given trajectory $y_d(t)$ asymptotically. However, for stochastic systems, system and measurement noises, which cannot be predicted or eliminated by any algorithm, are present. Thus, we cannot expect $y_k(t) \to y_d(t)$, $\forall t$, as the iteration number goes to infinity.
Therefore, for stochastic systems, the best achievable tracking performance is that the tracking error consists only of the noise terms. To this end, the control objective of this paper is to design an ILC algorithm such that $u_k(t) \to u_d(t)$, $\forall t$, as $k \to \infty$.

Remark 3. As the stochastic noises cannot be predicted or eliminated, an intuitive idea is to minimize the following averaged tracking error index:

$$V_t = \limsup_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\|y_d(t) - y_k(t)\|^2. \qquad (5)$$

Through simple calculations, this index is found to be minimized if $u_k(t) \to u_d(t)$, $\forall t$, as $k \to \infty$. Thus, in what follows, we show the direct convergence of the input sequence to the desired input.
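The "simple calculations" behind Remark 3 can be sketched as follows; this is our reconstruction under A2-A3 for the one-step case (the state error is handled inductively), not the paper's own derivation.

```latex
% From (1)-(2):  y_d(t+1) - y_k(t+1)
%   = C^{+}B(t)\,\delta u_k(t) + C^{+}A(t)\,\delta x_k(t)
%     - C(t+1)\,w_k(t+1) - v_k(t+1).
% The noises are zero-mean, i.i.d. along k, and independent of
% \delta u_k(t) and \delta x_k(t), so the cross terms average out a.s. and
\[
V_{t+1} \;=\; \limsup_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}
  \bigl\| C^{+}B(t)\,\delta u_k(t) + C^{+}A(t)\,\delta x_k(t) \bigr\|^{2}
  \;+\; \operatorname{tr}\!\bigl( C(t+1)\,R^{w}_{t+1}\,C^{T}(t+1) + R^{v}_{t+1} \bigr).
\]
% The trace term is the unavoidable noise floor, so V_{t+1} attains its
% minimum exactly when \delta u_k(t) \to 0 (and, inductively,
% \delta x_k(t) \to 0), i.e., when u_k(t) \to u_d(t).
```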

To achieve this control objective under stochastic noises, two updating schemes are proposed in this paper; their convergence analyses are presented in the following two sections.

IUS:
$$u_{k+1}(t) = u_k(t) + a_k\gamma_k(t+1)L_t e_k(t+1), \qquad (6)$$
where $a_k$ is the learning step size and $L_t$ is the learning gain matrix.

SUS:
$$u_{k+1}(t) = u_k(t) + a_kL_t e_k^{*}(t+1), \qquad (7)$$
where $a_k$ and $L_t$ are as in the IUS case and $e_k^{*}(t)$ is the latest available tracking error, defined as

$$e_k^{*}(t) = \begin{cases} e_k(t), & \text{if } \gamma_k(t) = 1 \\ e_{k-1}^{*}(t), & \text{if } \gamma_k(t) = 0 \end{cases} \qquad (8)$$

The learning step sizes $\{a_k\}$ form a decreasing sequence satisfying

$$a_k > 0, \qquad a_k \to 0, \qquad \sum_{k=1}^{\infty}a_k = \infty, \qquad \sum_{k=1}^{\infty}a_k^2 < \infty. \qquad (9)$$

Remark 4. The decreasing sequence is introduced because a stochastic system is considered in this paper. Clearly, $a_k = \alpha/k$ meets all the requirements in (9), where the constant $\alpha > 0$ can be regarded as a tuning parameter. The step size $a_k$ suppresses the effect of the stochastic noises as the iteration number goes to infinity and guarantees zero-error convergence of the input sequence. Specifically, the tracking error contains the stochastic noises as a component; after enough learning iterations, the stochastic noise dominates the tracking error. Without a mechanism to suppress this effect, the algorithm would fail to converge to a stable limit. Therefore, we add the decreasing sequence to the conventional P-type learning algorithm.

Remark 5. Note that the proposed algorithms (6) and (7) are slightly modified versions of the conventional P-type algorithm. In other words, this study attempts to provide the first rigorous convergence analysis of ILC under random data dropouts modeled by a Bernoulli random variable; designing new ILC algorithms is not our principal objective. We believe the conventional P-type law provides robustness against severe learning conditions, so we adopt it. The introduction of the decreasing sequence $a_k$ makes (6) and (7) classical stochastic approximation algorithms; thus, the subsequent analysis is based on stochastic approximation techniques.

Remark 6. The inherent mechanism of the IUS is that the algorithm updates the input when the output is successfully received and stops updating otherwise; the input keeps its latest value if no new output is received, which is why the scheme is called intermittent. The updating frequency thus equals the successful transmission rate. Consequently, when the data dropout rate is large, the step size $a_k$ goes to 0 faster than updates occur and a slow learning speed results. An alternative to (6), which improves the performance by decreasing the step size only at updating iterations, is

$$u_{k+1}(t) = u_k(t) + a_{\kappa_k(t)}\gamma_k(t+1)L_t e_k(t+1), \qquad \kappa_k(t) = \sum_{i=1}^{k}\gamma_i(t+1).$$

Remark 7. The inherent mechanism of the SUS is that the algorithm keeps updating using the latest available packet. In other words, if the output of the last iteration is received, the algorithm updates its input with this information; if it is lost, the algorithm updates its input with the latest available output packet received previously. That is, algorithm (7) can be rewritten as

$$u_{k+1}(t) = u_k(t) + a_k\gamma_k(t+1)L_t e_k(t+1) + a_k(1-\gamma_k(t+1))L_t e_{k-1}^{*}(t+1). \qquad (10)$$

Note that no finite-length condition on the successive data dropouts is required; in fact, the number of successive dropout iterations can be arbitrarily large.
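The bookkeeping in (6)-(8) is easy to mis-implement, so a minimal scalar sketch of both update rules follows. The toy coupling constant c (standing in for $C^{+}B(t)$), the desired input, and all names are our hypothetical choices; c is picked so that $L\cdot c$ is of order one, mirroring the paper's later choice $L_t = 55$ for a plant with a small coupling value.

```python
import numpy as np

rho, L, c, u_d = 0.9, 55.0, 0.018, 1.0   # hypothetical values
rng = np.random.default_rng(0)

def ius_update(u, e, gamma, a_k):
    return u + a_k * gamma * L * e        # (6): update only if packet arrived

def sus_update(u, e, e_star, gamma, a_k):
    e_star = e if gamma == 1 else e_star  # (8): keep latest available error
    return u + a_k * L * e_star, e_star   # (7): update in every iteration

u_ius = u_sus = e_star = 0.0
for k in range(1, 301):
    a_k = 1.0 / (k + 10)                  # decreasing sequence satisfying (9)
    gamma = int(rng.random() < rho)       # Bernoulli channel (3)-(4)
    u_ius = ius_update(u_ius, c * (u_d - u_ius), gamma, a_k)
    u_sus, e_star = sus_update(u_sus, c * (u_d - u_sus), e_star, gamma, a_k)
print(u_ius, u_sus)  # both drift toward u_d = 1; SUS updates more often
```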
Remark 8. The differences between the IUS and the SUS are as follows. First, the IUS is an event-triggered update, whereas the SUS is an iteration-triggered update: the IUS updates its signal only when the measurement output of the last iteration is received, whereas the SUS updates its signal in every iteration. Moreover, the updating frequency of the IUS depends on the rate of successful transmission and is usually low; generally, the larger the data dropout rate, the lower the updating frequency. Conversely, the SUS keeps updating all the time. In consequence, the convergence speeds of the IUS and the SUS differ markedly.

The following technical lemma is used in the subsequent analysis; its proof can be found in [13].

Lemma 1. Let $H$ be a stable matrix of dimension $l\times l$. Let $a_k$ satisfy the conditions in (9), and let the $l$-dimensional vector sequences $\{\mu_k\}$ and $\{\nu_k\}$ satisfy

$$\sum_{k=1}^{\infty}a_k\mu_k < \infty \ \text{ a.s.}, \qquad \nu_k \to 0. \qquad (11)$$

Then $\{\alpha_k\}$, generated by the recursion $\alpha_{k+1} = \alpha_k + a_kH\alpha_k + a_k(\mu_k + \nu_k)$ with an arbitrary initial value $\alpha_0$, converges to zero w.p.1. Here, by stability of a matrix we mean that all its eigenvalues have negative real parts.

3. Convergence analysis for the intermittent updating scheme

In this section, we present the convergence analysis for the IUS case. Compared with the SUS case, the proof for the IUS is more intuitive, as the IUS keeps the input signal invariant whenever the corresponding output of the last iteration is lost; one therefore only needs to focus on the updating iterations. From the probabilities in (4), clearly $\mathbb{E}\gamma_k(t) = \rho$ and $\mathbb{E}\gamma_k^2(t) = \rho$. Denote $\delta x_k(t) = x_d(t) - x_k(t)$ and $\delta u_k(t) = u_d(t) - u_k(t)$. Then the update law (6) can be rewritten as

$$\begin{aligned}
u_{k+1}(t) &= u_k(t) + a_k\gamma_k(t+1)L_t e_k(t+1) \\
&= u_k(t) + a_k\gamma_k(t+1)L_t\bigl(y_d(t+1) - y_k(t+1)\bigr) \\
&= u_k(t) + a_k\gamma_k(t+1)L_tC(t+1)\delta x_k(t+1) - a_k\gamma_k(t+1)L_tv_k(t+1) \\
&= u_k(t) + a_k\gamma_k(t+1)L_tC^{+}B(t)\delta u_k(t) + a_k\gamma_k(t+1)L_tC^{+}A(t)\delta x_k(t) \\
&\qquad - a_k\gamma_k(t+1)L_tC(t+1)w_k(t+1) - a_k\gamma_k(t+1)L_tv_k(t+1) \\
&= u_k(t) + a_k\rho L_tC^{+}B(t)\delta u_k(t) + a_k(\gamma_k(t+1)-\rho)L_tC^{+}B(t)\delta u_k(t) + a_k\gamma_k(t+1)L_tC^{+}A(t)\delta x_k(t) \\
&\qquad - a_k\gamma_k(t+1)L_tC(t+1)w_k(t+1) - a_k\gamma_k(t+1)L_tv_k(t+1), \qquad (12)
\end{aligned}$$

where $C^{+}A(t) \triangleq C(t+1)A(t)$. Subtracting both sides of (12) from $u_d(t)$ leads to

$$\begin{aligned}
\delta u_{k+1}(t) &= \delta u_k(t) - a_k\rho L_tC^{+}B(t)\delta u_k(t) - a_k(\gamma_k(t+1)-\rho)L_tC^{+}B(t)\delta u_k(t) - a_k\gamma_k(t+1)L_tC^{+}A(t)\delta x_k(t) \\
&\qquad + a_k\gamma_k(t+1)L_tC(t+1)w_k(t+1) + a_k\gamma_k(t+1)L_tv_k(t+1). \qquad (13)
\end{aligned}$$

In the following, the argument $t$ or its specific value may be omitted when no confusion arises, to keep the expressions concise.

Theorem 1. Consider the stochastic system (1) and the update law (6). Design $L_t \in \mathbb{R}^{p\times q}$ such that all eigenvalues of $L_tC^{+}B(t)$ have positive real parts. Then the input $u_k(t)$ generated by (6) converges to $u_d(t)$ w.p.1 as $k$ goes to infinity, $\forall t$.

Proof. The proof proceeds by mathematical induction along the time axis $t$. The steps for $t = 1, 2, \ldots, N-1$ are identical to those for the case $t = 0$, which is expressed in detail in the following.

Initial step. Consider the case $t = 0$, for which (13) reads

$$\begin{aligned}
\delta u_{k+1}(0) &= \delta u_k(0) - a_k\rho L_0C^{+}B(0)\delta u_k(0) - a_k(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0) - a_k\gamma_k(1)L_0C^{+}A(0)\delta x_k(0) \\
&\qquad + a_k\gamma_k(1)L_0C(1)w_k(1) + a_k\gamma_k(1)L_0v_k(1). \qquad (14)
\end{aligned}$$

Considering A3 and Remark 1, we find that $\delta x_k(0) = -w_k(0)$. On the other hand, $\gamma_k(1)$ is independent of $\delta x_k(0)$, and both $\{\gamma_k(1)\}$ and $\{\delta x_k(0)\}$ are i.i.d. sequences along the iteration axis with finite second moments. In addition, $\mathbb{E}\delta x_k(0) = \mathbb{E}x_k(0) - x_d(0) = 0$. Therefore, if we denote $\epsilon_k \triangleq \gamma_k(1)L_0C^{+}A(0)\delta x_k(0)$, then $\{\epsilon_k\}$ is an i.i.d. sequence with zero mean and finite second moment. Direct calculation gives

$$\sum_{k=1}^{\infty}\mathbb{E}\|a_k\epsilon_k\|^2 = \sum_{k=1}^{\infty}a_k^2\,\mathbb{E}\gamma_k^2(1)\,\mathbb{E}\|L_0C^{+}A(0)\delta x_k(0)\|^2 \leq c_0\sum_{k=1}^{\infty}a_k^2 < \infty,$$

where $c_0 > 0$ is a suitable constant. Then $\sum_{k=1}^{\infty}a_k\gamma_k(1)L_0C^{+}A(0)\delta x_k(0) < \infty$ w.p.1 by the Khintchine-Kolmogorov convergence theorem [15]. Similarly, both $\{w_k(1)\}$ and $\{v_k(1)\}$ are i.i.d. sequences with zero means and finite second moments, and they are independent of $\gamma_k(1)$. Clearly,

$$\sum_{k=1}^{\infty}\mathbb{E}\|a_k\gamma_k(1)L_0C(1)w_k(1)\|^2 \leq \|L_0C(1)\|^2\sum_{k=1}^{\infty}a_k^2\,\mathbb{E}\gamma_k^2(1)\,\mathbb{E}\|w_k(1)\|^2 < \infty,$$
$$\sum_{k=1}^{\infty}\mathbb{E}\|a_k\gamma_k(1)L_0v_k(1)\|^2 \leq \|L_0\|^2\sum_{k=1}^{\infty}a_k^2\,\mathbb{E}\gamma_k^2(1)\,\mathbb{E}\|v_k(1)\|^2 < \infty,$$

which further leads to $\sum_{k=1}^{\infty}a_k\gamma_k(1)L_0C(1)w_k(1) < \infty$ and $\sum_{k=1}^{\infty}a_k\gamma_k(1)L_0v_k(1) < \infty$ w.p.1.

Consider the third term on the right-hand side of (14), $a_k(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0)$. Unlike the last three terms of (14), this sequence is no longer mutually independent. To deal with it, let $\mathcal{F}_k$ be the increasing $\sigma$-algebra generated by $y_j(t)$, $x_j(t)$, $w_j(t)$, $v_j(t)$, $\gamma_j(t)$, $0 \leq j \leq k$, $t \in \{0, \ldots, N\}$. According to the update law (6), $u_k(t) \in \mathcal{F}_{k-1}$. Note that $\gamma_k(1)$ is independent of $\mathcal{F}_{k-1}$ and is thus independent of $\delta u_k(0)$. Therefore,

$$\mathbb{E}\{(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0) \mid \mathcal{F}_{k-1}\} = L_0C^{+}B(0)\delta u_k(0)\,\mathbb{E}\{\gamma_k(1)-\rho \mid \mathcal{F}_{k-1}\} = 0.$$

That is, $\bigl(a_k(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0), \mathcal{F}_k\bigr)$ is a martingale difference sequence. In addition,

$$\sum_{k=1}^{\infty}\mathbb{E}\bigl\{\|a_k(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0)\|^2 \mid \mathcal{F}_{k-1}\bigr\} \leq \sup_k\|L_0C^{+}B(0)\delta u_k(0)\|^2\sum_{k=1}^{\infty}a_k^2\,\mathbb{E}(\gamma_k(1)-\rho)^2 \leq c_1\sum_{k=1}^{\infty}a_k^2 < \infty,$$

where $c_1 > 0$ is a suitable constant. Then, by the Chow convergence theorem for martingales [15], $\sum_{k=1}^{\infty}a_k(\gamma_k(1)-\rho)L_0C^{+}B(0)\delta u_k(0) < \infty$ w.p.1.

If the learning gain matrix is designed such that all eigenvalues of $L_0C^{+}B(0)$ have positive real parts, then $-\rho L_0C^{+}B(0)$ is clearly stable. Thus, applying Lemma 1 to the recursion (14), we obtain $\delta u_k(0) \to 0$ as $k \to \infty$ w.p.1.

Inductive step. Assume that the convergence of $u_k(t)$ has been proved for $t = 0, 1, \ldots, s-1$; the aim is to show the convergence for $t = s$.

From (1) and (2), we have

$$\begin{aligned}
\delta x_k(s) &= A(s-1)\delta x_k(s-1) + B(s-1)\delta u_k(s-1) - w_k(s) \\
&= A(s-1)A(s-2)\delta x_k(s-2) + A(s-1)B(s-2)\delta u_k(s-2) + B(s-1)\delta u_k(s-1) - A(s-1)w_k(s-1) - w_k(s) \\
&= \sum_{i=0}^{s-1}\Bigl(\prod_{j=i+1}^{s-1}A(j)\Bigr)B(i)\,\delta u_k(i) - \sum_{i=0}^{s}\Bigl(\prod_{j=i}^{s-1}A(j)\Bigr)w_k(i),
\end{aligned}$$

where $\prod_{\ell=i}^{j}A(\ell) = A(j)A(j-1)\cdots A(i)$ for $j \geq i$ and $\prod_{\ell=i}^{i-1}A(\ell) = I$. Replacing all $t$ in (14) with $s$ and substituting the above expression, we have

$$\begin{aligned}
\delta u_{k+1}(s) &= \delta u_k(s) - a_k\rho L_sC^{+}B(s)\delta u_k(s) - a_k(\gamma_k(s+1)-\rho)L_sC^{+}B(s)\delta u_k(s) \\
&\qquad - a_k\gamma_k(s+1)L_sC^{+}A(s)\sum_{i=0}^{s-1}\Bigl(\prod_{j=i+1}^{s-1}A(j)\Bigr)B(i)\,\delta u_k(i) \\
&\qquad + a_k\gamma_k(s+1)L_sC^{+}A(s)\sum_{i=0}^{s}\Bigl(\prod_{j=i}^{s-1}A(j)\Bigr)w_k(i) + a_k\gamma_k(s+1)L_s\bigl(C(s+1)w_k(s+1) + v_k(s+1)\bigr). \qquad (15)
\end{aligned}$$

By the induction assumption, $\delta u_k(t) \to 0$ as $k \to \infty$ w.p.1 for $t = 0, 1, \ldots, s-1$; thus $\gamma_k(s+1)L_sC^{+}A(s)\sum_{i=0}^{s-1}(\prod_{j=i+1}^{s-1}A(j))B(i)\delta u_k(i) \to 0$ w.p.1. Following the same steps as in the initial step, we obtain

$$\sum_{k=1}^{\infty}a_k(\gamma_k(s+1)-\rho)L_sC^{+}B(s)\delta u_k(s) < \infty, \qquad \sum_{k=1}^{\infty}a_k\gamma_k(s+1)L_sC^{+}A(s)\sum_{i=0}^{s}\Bigl(\prod_{j=i}^{s-1}A(j)\Bigr)w_k(i) < \infty,$$
$$\sum_{k=1}^{\infty}a_k\gamma_k(s+1)L_sC(s+1)w_k(s+1) < \infty, \qquad \sum_{k=1}^{\infty}a_k\gamma_k(s+1)L_sv_k(s+1) < \infty, \quad \text{w.p.1.}$$

Again using Lemma 1, we easily conclude that $\delta u_k(s) \to 0$ as $k \to \infty$. The proof is completed by mathematical induction.

Remark 9. As shown in Theorem 1, the design condition on the learning gain matrix $L_t$ is relaxed, as $L_t$ can be obtained by finding a feasible solution of the LMI $L_tC^{+}B(t) > 0$. If the system matrices $C(t)$ and $B(t)$ are known, an intuitive selection is $L_t = (C^{+}B(t))^T$, which makes $L_tC^{+}B(t)$ a positive definite matrix. Moreover, this condition can still be ensured under some uncertainties in the system model.

Remark 10. The proof of Theorem 1 indicates that the initial-state condition A3 can be replaced by the following: the initial state $x_k(0)$ is independent of the stochastic noises $w_k(t)$ and $v_k(t)$, and the deviation between $x_k(0)$ and $x_d(0)$ approaches zero, i.e., $\delta x_k(0) \to 0$ as $k \to \infty$. By incorporating an initial-state learning strategy such as that of [14], the range of applications can be further enlarged.

Remark 11. With slight modifications of the above proof, the convergence w.p.1 of the alternative algorithm proposed in Remark 6 is easily shown. If the learning step size $a_k$ is removed from (6), i.e.,

$$u_{k+1}(t) = u_k(t) + \gamma_k(t+1)L_te_k(t+1), \qquad (16)$$

then convergence w.p.1 can also be obtained as long as $L_t$ satisfies that the spectral norm of $I - \rho L_tC^{+}B(t)$ is less than 1; this selection of $L_t$ is more restrictive than the one given in Theorem 1. The tracking performances of the proposed algorithm (6) and the conventional P-type update law (16) are compared by simulations in Section 5.
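The design condition of Theorem 1 and Remark 9 is easy to check numerically. Below is a small sketch with a hypothetical coupling matrix; it verifies the eigenvalue condition for the suggested choice $L_t = (C^{+}B(t))^T$.

```python
import numpy as np

def gain_ok(L, CB):
    """Theorem 1 condition: all eigenvalues of L @ CB have positive real parts."""
    return bool(np.all(np.linalg.eigvals(L @ CB).real > 0))

# Hypothetical coupling matrix C(t+1)B(t) with full column rank (A1):
CB = np.array([[0.5, 0.1],
               [0.0, 0.4],
               [0.2, 0.0]])   # q = 3 outputs, p = 2 inputs
L = CB.T                      # Remark 9: L_t = (C^+B(t))^T
print(gain_ok(L, CB))         # True: CB^T CB is a positive definite Gram matrix
```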

4. Convergence analysis for the successive updating scheme

This section presents the convergence analysis for the SUS case. The proof here is technically more complex than in the IUS case, as the update information in (7) or (10) is no longer relatively definite. In other words, if the measurement output of the last iteration is lost during transmission, the information actually used in (7) is unknown in advance because of possible successive data dropouts; the update information can come from any previous iteration. To describe this situation, stochastic stopping times $\{\tau_k^t, k = 1, 2, \ldots, 0 \leq t \leq N\}$ are introduced to denote the random iteration delays of the update caused by the random data dropouts. Then the updating scheme (7) is rewritten as

$$u_{k+1}(t) = u_k(t) + a_kL_te_{\tau_k^{t+1}}(t+1), \qquad (17)$$

where the stopping time $\tau_k^{t+1} \leq k$ denotes the latest iteration, up to $k$, whose output packet at time $t+1$ was successfully transmitted. In other words, for the update of the input at time $t$ of the $(k+1)$th iteration, no information $e_m(t+1)$ with $m > \tau_k^{t+1}$ has been received, and only $e_{\tau_k^{t+1}}(t+1)$ is available. According to the SUS setting, as long as no new packet arrives, the inputs of the successive iterations are updated with the same error $e_{\tau_k^{t+1}}(t+1)$.

The coupling of the stochastic stopping times and the successive updating mechanism makes the convergence analysis much more complex. Thus, the analysis proceeds in two steps: we first show the convergence of (17) in the dropout-free case $\tau_k^t = k$, $\forall k, t$, and then consider the effect of the stopping times $\tau_k^t$. We first show the convergence of the following updating scheme:

$$u_{k+1}(t) = u_k(t) + a_kL_te_k(t+1), \qquad (18)$$

which is actually the conventional ILC for systems without random data dropouts. We have the following theorem.

Theorem 2. Consider the stochastic system (1) and the update law (18). Design $L_t \in \mathbb{R}^{p\times q}$ such that all eigenvalues of $L_tC^{+}B(t)$ have positive real parts. Then the input $u_k(t)$ generated by (18) converges to $u_d(t)$ w.p.1 as $k$ goes to infinity, $\forall t$. The proof is given in the Appendix.

Now we come to the general case (17).

Theorem 3. Consider the stochastic system (1) and the update law (17). Design $L_t \in \mathbb{R}^{p\times q}$ such that all eigenvalues of $L_tC^{+}B(t)$ have positive real parts. Then the input $u_k(t)$ generated by (17) converges to $u_d(t)$ w.p.1 as $k$ goes to infinity, $\forall t$.

Proof. Comparing (17) and (18), the effect of the random data dropouts is the additional error term

$$a_kL_t\bigl(e_k(t+1) - e_{\tau_k^{t+1}}(t+1)\bigr). \qquad (19)$$

Thus the aim of the proof is to show that this term satisfies condition (11). Specifically,

$$\begin{aligned}
a_kL_t\bigl(e_k(t+1) - e_{\tau_k^{t+1}}(t+1)\bigr) &= a_kL_tC^{+}B(t)\bigl[\delta u_k(t) - \delta u_{\tau_k^{t+1}}(t)\bigr] + a_kL_tC^{+}A(t)\bigl[\delta x_k(t) - \delta x_{\tau_k^{t+1}}(t)\bigr] \\
&\qquad - a_kL_tC(t+1)\bigl[w_k(t+1) - w_{\tau_k^{t+1}}(t+1)\bigr] - a_kL_t\bigl[v_k(t+1) - v_{\tau_k^{t+1}}(t+1)\bigr].
\end{aligned}$$

Undoubtedly, the last two terms satisfy condition (11). Similar to the proofs of Theorems 1 and 2, it can be shown by mathematical induction that the second term on the right-hand side also satisfies condition (11). Thus only the first term, $a_kL_tC^{+}B(t)[\delta u_k(t) - \delta u_{\tau_k^{t+1}}(t)]$, is left for further analysis. Recalling the update law (17), this difference expands to

$$\begin{aligned}
\delta u_k(t) - \delta u_{\tau_k^{t+1}}(t) &= -\sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC^{+}B(t)\,\delta u_{\tau_m^{t+1}}(t) - \sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC^{+}A(t)\,\delta x_{\tau_m^{t+1}}(t) \\
&\qquad + \sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC(t+1)\,w_{\tau_m^{t+1}}(t+1) + \sum_{m=\tau_k^{t+1}}^{k-1}a_mL_t\,v_{\tau_m^{t+1}}(t+1). \qquad (20)
\end{aligned}$$
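The stopping times introduced above are simply "last successful iteration" pointers, one per time instant. A minimal sketch of this bookkeeping (our variable names) is below; it also records the longest observed dropout run, which the proof must control.

```python
import numpy as np

# tau[t] after iteration k equals tau_k^t: the last iteration whose packet
# y_k(t) was received. Values here are illustrative.
rho, N, K = 0.9, 100, 300
rng = np.random.default_rng(1)
tau = np.zeros(N + 1, dtype=int)
max_gap = 0
for k in range(1, K + 1):
    received = rng.random(N + 1) < rho     # Bernoulli channel (3)-(4)
    tau[received] = k                      # pointer advances where packets arrive
    max_gap = max(max_gap, int((k - tau).max()))
print(max_gap)  # dropout run lengths k - tau_k^t are finite a.s. but unbounded
```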

To analyze the effect of (20), we need to estimate the number of successive data-dropout iterations, i.e., $k - \tau_k^t$. As the data dropouts are modeled by a Bernoulli random variable, this run length obeys a geometric distribution. To keep the notation concise, let $\tau$ denote a random variable with this geometric distribution, $\tau \sim G(\rho)$. Clearly, $\mathbb{E}\tau = \rho^{-1}$ and $\operatorname{Var}(\tau) = (1-\rho)/\rho^2$. Since $\operatorname{Var}(\tau) = \mathbb{E}(\tau - \mathbb{E}\tau)^2$, we have $\mathbb{E}\tau^2 = (2-\rho)/\rho^2$. By direct calculation,

$$\sum_{n=1}^{\infty}\mathbb{P}\{\tau \geq n^{1/2}\} = \sum_{n=1}^{\infty}\mathbb{P}\{\tau^2 \geq n\} = \sum_{n=1}^{\infty}\sum_{j=n}^{\infty}\mathbb{P}\{j \leq \tau^2 < j+1\} = \sum_{j=1}^{\infty}j\,\mathbb{P}\{j \leq \tau^2 < j+1\} \leq \mathbb{E}\tau^2 < \infty. \qquad (21)$$

Using the Borel-Cantelli lemma, we have

$$\mathbb{P}\{\tau \geq n^{1/2} \ \text{i.o.}\} = 0. \qquad (22)$$

Consequently,

$$\frac{k - \tau_k^t}{k} \to 0 \ \text{a.s.}, \qquad \frac{\tau_k^t}{k} \to 1 \ \text{a.s.}, \qquad \tau_k^t \to \infty \ \text{a.s.}, \qquad \forall t. \qquad (23)$$

Now let us prove that the last three terms on the right-hand side of (20) satisfy condition (11) of Lemma 1. By the same steps as in the proof of Theorem 1, $\sum_{m=0}^{k-1}a_mL_t\bigl(C(t+1)w_{\tau_m^{t+1}}(t+1) + v_{\tau_m^{t+1}}(t+1)\bigr)$ converges a.s. to a finite limit, $\forall t$. Therefore, in view of (23), the tail sum satisfies

$$\sum_{m=\tau_k^{t+1}}^{k-1}a_mL_t\bigl(C(t+1)w_{\tau_m^{t+1}}(t+1) + v_{\tau_m^{t+1}}(t+1)\bigr) = o(1),$$

so the last two terms of (20) satisfy condition (11) of Lemma 1. Again following the steps of the proof of Theorem 1, the second term on the right-hand side of (20) can be split into two parts, a finite summation of the input errors at past time instants and a finite summation of stochastic noises:

$$\sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC^{+}A(t)\,\delta x_{\tau_m^{t+1}}(t) = \sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC^{+}A(t)\sum_{i=0}^{t-1}\Bigl(\prod_{j=i+1}^{t-1}A(j)\Bigr)B(i)\,\delta u_{\tau_m^{t+1}}(i) - \sum_{m=\tau_k^{t+1}}^{k-1}a_mL_tC^{+}A(t)\sum_{i=0}^{t}\Bigl(\prod_{j=i}^{t-1}A(j)\Bigr)w_{\tau_m^{t+1}}(i).$$

In view of (23), the former part can be proven to converge to zero following the typical mathematical induction steps, and the latter part can be shown to satisfy condition (11) in the same way as the last term of (20).

Only the first term on the right-hand side of (20) remains. This term can be bounded almost surely from above by a sample-path-dependent constant times $\sum_{m=\tau_k^{t+1}}^{k-1}a_m$. Noting the selection of $a_k$, this quantity is bounded by $c_0a_{\tau_k^{t+1}}(k - \tau_k^{t+1})$, where $c_0$ is a suitable constant, and is thus $o(1)$. For simplicity, we directly select the standard step size $a_k = 1/k$; the general case is similar but requires a more complicated explanation. In this case,

$$a_{\tau_k^{t+1}}\bigl(k - \tau_k^{t+1}\bigr) = \frac{k - \tau_k^{t+1}}{\tau_k^{t+1}} = O\bigl(k^{1/2}\bigr)\cdot O\bigl(k^{-1}\bigr) = O\bigl(k^{-1/2}\bigr) = o(1),$$

since, by (22), the successive-dropout length $k - \tau_k^{t+1}$ exceeds $k^{1/2}$ only finitely often, while $\tau_k^{t+1}/k \to 1$ by (23).

In sum, we have shown that the effect of the random data dropouts, i.e., (20), satisfies condition (11). The convergence then follows by the same steps as in Theorem 1, together with Lemma 1.
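The geometric-moment identities used above are quick to sanity-check by simulation; a minimal sketch (the value of rho is an arbitrary example) is:

```python
import numpy as np

rho = 0.3
rng = np.random.default_rng(2)
# Number of Bernoulli(rho) trials until the first success, support {1, 2, ...}
tau = rng.geometric(rho, size=1_000_000).astype(float)
print(tau.mean(), 1 / rho)                       # E[tau]    = 1/rho
print(tau.var(), (1 - rho) / rho**2)             # Var(tau)  = (1-rho)/rho^2
print((tau**2).mean(), (2 - rho) / rho**2)       # E[tau^2]  = (2-rho)/rho^2
```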

Remark 12. Unlike in the IUS case, in the SUS case the update is conducted in every iteration using the latest available information. As shown in (17), two random factors are involved: the random data dropouts and the uncertain length of the successive iterations during which dropouts occur. As the data dropout is described by a Bernoulli random variable and is independent across iterations, the length of the successive dropout iterations is not a bounded variable. We therefore have to make an intensive estimation of the effect of this factor. This is the kernel step of our proof, and it was not dealt with in previous ILC results.

5. Illustrative simulations

5.1. System description

Let us consider a direct-current motor control problem for velocity tracking. The dynamics of a permanent magnet linear motor (PMLM) are described as follows [42]:

$$\dot{x}(t) = v(t), \qquad u(t) = k_1\psi_f\,\dot{x}(t) + R\,i(t) + L\,\dot{i}(t), \qquad f_l(t) = m\dot{v}(t) + f_{fri}(t) + f_{rip}(t) + f_{load}(t) + f_w(t), \qquad (24)$$

where $k_1 = \pi/\varsigma$. The notations are listed in Table 1.

Table 1. List of notations.
x(t): motor position; f_l(t): developed force
v(t): rotor velocity; u(t): stator voltage
i(t): stator current; R: stator resistance
L: stator inductance; ς: pole pitch
ψ_f: flux linkage; m: rotor mass
f_fri(t): frictional force; f_rip(t): ripple force
f_load(t): applied load force; f_w(t): uncertainties/disturbances

Following the simplification procedure in [42], the PMLM model can be transformed into

$$\dot{x}(t) = v(t), \qquad \dot{v}(t) = -\frac{k_1k_2\psi_f^2}{Rm}\,v(t) + \frac{k_2\psi_f}{Rm}\,u(t), \qquad y(t) = v(t), \qquad (25)$$

where $k_2 = 1.5\pi/\varsigma$. According to our formulation, the discrete time interval is set to $\Delta = 10$ ms and the whole iteration length is 1 s, that is, $N = 100$. The system is discretized by the Euler method. Taking the disturbances and noises into consideration, the following stochastic system is obtained:

$$x_k(t+1) = x_k(t) + \Delta v_k(t) + \varepsilon_1^k(t+1), \qquad v_k(t+1) = v_k(t) - \Delta\frac{k_1k_2\psi_f^2}{Rm}v_k(t) + \Delta\frac{k_2\psi_f}{Rm}u_k(t) + \varepsilon_2^k(t+1), \qquad y_k(t) = v_k(t) + \epsilon_k(t), \qquad (26)$$

where the parameters are given as $\varsigma = \ldots$ m, $R = 8.6$, $m = \ldots$ g, and $\psi_f = \ldots$ Wb; the resulting input/output coupling value is $\ldots$. The reference trajectory is $y_d(t) = \frac{1}{3}\bigl(\sin(t/20) + 1 - \cos(3t/20)\bigr)$, $0 \leq t \leq 100$. The initial control action is simply set to $u_0(t) = 0$, $\forall t$. The stochastic noises $\varepsilon_1^k(t)$, $\varepsilon_2^k(t)$, and $\epsilon_k(t)$ obey the normal distribution $N(0, \ldots)$. Note that the upper bound of the tracking reference is small, so large noises would dominate the output; to make the convergence clear, the stochastic noises are set with a small deviation.

In the following, we present the performance of the proposed algorithms and the associated comparisons in detail. Specifically, the tracking performances of the IUS and the SUS are illustrated in Section 5.2, which shows the advantage of the SUS over the IUS. The comparisons under different data dropout rates (DDRs) and different learning gains, for both the IUS and the SUS, are detailed in Sections 5.3 and 5.4, respectively. The comparison between the proposed algorithms and the conventional P-type algorithm is discussed in Section 5.5, which reveals the effect of the decreasing sequence {a_k}. These comparisons illustrate how the proposed algorithms can be applied in practice.

5.2. Tracking performances of the IUS and the SUS

In this subsection, we first verify the convergence properties of the proposed updating schemes and then compare the IUS and the SUS.
As the system is scalar at the input-output level, any positive number $L_t$ satisfies the conditions of the convergence theorems.

Fig. 2. Tracking performances of the IUS and the SUS at the final iteration: DDR = 10%.
Fig. 3. AATE of the IUS and the SUS along the iteration axis: DDR = 10%.

We first set the success probability to $\rho = 0.9$; that is, for any data packet, the probability of successful transmission is 90%. To make the expressions clear, let $\gamma = 1 - \rho$ denote the DDR, i.e., the average ratio of lost data to the whole data. In this case, the DDR is 10%, which is low. The decreasing sequence is set to $a_k = 1/(k+10)$, the learning gain is $L_t = 55$, and algorithms (6) and (7) are run for 300 iterations. The final outputs of both algorithms are shown in Fig. 2, where the solid, dashed, and dotted lines denote the tracking reference, the final output of the IUS, and the final output of the SUS, respectively. As shown in the figure, the three lines almost coincide. Therefore, both schemes converge to the desired target quickly under a low DDR and stochastic noises, and the performances of the two schemes are close to each other.

To examine the performances further, we plot the averaged absolute tracking error along the iteration axis. The averaged absolute tracking error (AATE) for the $k$th iteration is defined as $\bar{e}_k = \frac{1}{N}\sum_{t=1}^{N}\|e_k(t)\|$. As shown in Fig. 3, little difference is observed between the two schemes, confirming the above findings. Generally, the lower the DDR, the closer the performances of the IUS and the SUS; with no data dropouts, the IUS and the SUS coincide.

Subsequently, we set $\rho = 0.3$, or equivalently $\gamma = 70\%$, to further compare the two schemes. This implies a poor transmission channel, as about 70% of the data are dropped. The parameters of both algorithms are the same as in the low-DDR case above. The final outputs of both algorithms are displayed in Fig. 4.
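For readers who want to reproduce the qualitative behavior of Figs. 2-5, a compact end-to-end sketch follows. The lumped model coefficients and the noise level below are hypothetical stand-ins (the paper's exact ς, m, ψ_f, and noise variance are not reproduced here), chosen only so that the coupling $\Delta k_2\psi_f/(Rm)$ has a magnitude for which $L_t = 55$ is a sensible gain; the function names are ours.

```python
import numpy as np

# Euler-discretized PMLM (26) with lumped coefficients; values hypothetical.
DT, N, K = 0.01, 100, 300
A_V = 1.0 - DT * 0.5     # stands in for 1 - DT*k1*k2*psi_f^2/(R*m)
B_V = DT * 1.8           # stands in for DT*k2*psi_f/(R*m); 55*B_V is near 1
SIG = 0.001              # assumed small noise standard deviation
y_d = np.array([(np.sin(t / 20) + 1 - np.cos(3 * t / 20)) / 3
                for t in range(N + 1)])

def run_ilc(scheme, ddr, L=55.0, seed=0):
    """Run IUS ('ius') or SUS ('sus') for K iterations; return AATE per iteration."""
    rng = np.random.default_rng(seed)
    u = np.zeros(N)                   # u_k(t) for t = 0..N-1
    e_star = np.zeros(N + 1)          # latest available errors, for SUS
    aate = np.zeros(K)
    for k in range(1, K + 1):
        a_k = 1.0 / (k + 10)
        v = 0.0                       # identical initial resetting
        e = np.zeros(N + 1)
        for t in range(N):            # simulate one iteration of (26)
            v = A_V * v + B_V * u[t] + SIG * rng.standard_normal()
            y = v + SIG * rng.standard_normal()
            e[t + 1] = y_d[t + 1] - y
        gamma = rng.random(N + 1) < 1.0 - ddr          # Bernoulli channel
        if scheme == "ius":
            u += a_k * L * np.where(gamma[1:], e[1:], 0.0)   # (6)
        else:
            e_star = np.where(gamma, e, e_star)              # (8)
            u += a_k * L * e_star[1:]                        # (7)
        aate[k - 1] = np.abs(e[1:]).mean()
    return aate

for ddr in (0.1, 0.7):
    print(ddr, run_ilc("ius", ddr)[-1], run_ilc("sus", ddr)[-1])
```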

Fig. 4. Tracking performances of the IUS and the SUS at the final iteration: DDR = 70%.
Fig. 5. AATE of the IUS and the SUS along the iteration axis: DDR = 70%.

Different from the low-DDR case, the SUS is more advantageous than the IUS with respect to tracking accuracy for the same number of learning iterations under the high-DDR condition. As presented in Fig. 5, where the AATE along the iteration axis is shown for both schemes, the tracking errors of the SUS are much smaller than those of the IUS. The inherent reason is that the IUS stops updating whenever the corresponding packet is lost, whereas the SUS keeps updating whether or not the packet is lost; in other words, the SUS updates more times than the IUS.

5.3. Comparison with different data dropout rates

To determine the influence of different DDRs, a comparison with respect to the DDR is made between the IUS and the SUS. Four DDRs are considered, namely $\gamma = 10\%$, 30%, 50%, and 70%. Algorithms (6) and (7) are run for 300 iterations, with parameters $a_k = 1/(k+10)$ and $L_t = 55$.
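With the run_ilc sketch above, the DDR sweep of this subsection reduces to a short loop (illustrative only, reusing our hypothetical model):

```python
# Sweep the four DDRs of Section 5.3 for both schemes (uses run_ilc above).
for scheme in ("ius", "sus"):
    for ddr in (0.1, 0.3, 0.5, 0.7):
        final_aate = run_ilc(scheme, ddr)[-1]
        print(f"{scheme} DDR={ddr:.0%} final AATE={final_aate:.4f}")
```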

Fig. 6. AATE for the IUS with respect to different DDRs: DDR = 10%, 30%, 50%, and 70%.
Fig. 7. AATE for the SUS with respect to different DDRs: DDR = 10%, 30%, 50%, and 70%.

The AATE for the IUS with respect to different DDRs is shown in Fig. 6, where the dashed, dotted, solid, and dash-dotted lines denote the AATE from low DDR to high DDR, respectively. We find that the tracking accuracy and the convergence speed worsen distinctly as the DDR increases, i.e., as the transmission condition deteriorates. The SUS case is illustrated in Fig. 7, where the lines have the same meaning as in the IUS case. The tracking accuracy of the SUS shows no visible change for different DDRs. However, when the DDR increases, most data fail to be transmitted back; thus, if the learning gain is quite large, the error may increase considerably before the algorithm converges, because some available data are used for updating excessively.

5.4. Comparison with different learning gains

To determine how the learning gain $L_t$ affects practical applications, we compare the proposed algorithms with different learning gains. The parameters are $a_k = 1/(k+10)$ and $\gamma = 30\%$. Three learning gains are simulated: $L = 55$, 110, and 165. The results for the IUS and the SUS are shown in Figs. 8 and 9, respectively, where the AATE profiles are plotted. A large learning gain leads to a fast convergence speed for both the IUS and the SUS; however, the final tracking accuracy is not significantly improved by increasing the learning gain.

Fig. 8. AATE for the IUS with respect to different gains: L = 55, 110, and 165 (DDR = 30%).
Fig. 9. AATE for the SUS with respect to different gains: L = 55, 110, and 165 (DDR = 30%).

5.5. Comparison with the conventional P-type algorithm

This subsection analyzes the effect of the decreasing sequence $a_k$ by comparing the proposed algorithms with the conventional P-type algorithms. As analyzed in the previous sections, $a_k$ relaxes the selection of the learning gain and guarantees zero-error convergence under stochastic noises. When the decreasing sequence $a_k$ is removed from (6) and (7), the algorithms become conventional P-type algorithms.

We first verify the advantage in learning gain selection brought by the introduction of $a_k$. To this end, we select $L_t = 40$. As shown in Fig. 10, the conventional P-type algorithms fail to converge, with either intermittent or successive updating; here the DDR is set to a low value, $\gamma = 10\%$, and the lines denote the profiles of the maximal error, defined as $\max_t\|e_k(t)\|$, along the iteration axis. In contrast, the proposed algorithms still converge because of the introduction of $a_k$. In this example, the decreasing sequence is $a_k = 1/(k+10)$ and, for a fair comparison, the learning gain is set to $L_t = 440$ so that the coupling gain at the first iteration is $a_1L_t = 40$. The convergence is shown in Fig. 11.

Next, we compare the convergence of the proposed algorithms with that of the conventional P-type algorithms. For the conventional P-type algorithms, the learning gain is set to $L_t = 5$; for the proposed algorithms, the learning gain is set to $L_t = 55$ with $a_k = 1/(k+10)$, so that the initial coupling learning gain is the same as in the conventional P-type learning algorithms.
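The Section 5.5 comparison amounts to switching off the decreasing step size; a sketch of a conventional P-type variant of the update (our naming, on the hypothetical toy model above) is:

```python
import numpy as np

# Conventional P-type variant of (6): drop a_k, i.e., a_k = 1 for all k.
# On our hypothetical model the constant-gain iteration stays bounded but
# keeps fluctuating at the noise level instead of converging to zero error;
# whether it diverges outright, as in the paper's Fig. 10, depends on the
# plant's coupling and the chosen gain.
def p_type_ius_step(u, e, gamma, L=40.0):
    return u + L * np.where(gamma[1:], e[1:], 0.0)   # cf. (16)
```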

Fig. 10. Maximal error profiles for both the IUS and the SUS without $a_k$ along the iteration axis: DDR = 10%.
Fig. 11. Maximal error profiles for both the IUS and the SUS with $a_k$ along the iteration axis: DDR = 10%.

The comparisons are illustrated in Figs. 12 and 13, in which the DDR is 50%. In Fig. 12, the proposed IUS algorithm converges more slowly than the conventional P-type algorithm, as the decreasing sequence $a_k$ weakens the effect of the learning process; in the SUS case, however, the difference is not obvious, as shown in Fig. 13.

6. Conclusions

In this paper, ILC is considered for stochastic linear systems with random data dropouts. The data dropout is modeled by a Bernoulli random variable taking the value 1 or 0, denoting that the data are successfully transmitted or lost, respectively. Two schemes are proposed to deal with the random data dropouts: the intermittent updating scheme, which updates the control signal if the data are transmitted and does nothing otherwise, and the successive updating scheme, which always updates the control signal whether or not the data are transmitted. That is, if the data are successfully transmitted, the algorithm uses these data; if the data are lost, the algorithm uses the latest available data that have been stored. The SUS has an advantage over the IUS in tracking accuracy for the same number of learning iterations under a high data dropout rate, as it can successively improve its performance; however, the difference is rather subtle. In both schemes, the input sequences are proved to converge to the desired input in the almost sure sense under stochastic noises and random data dropouts.

Fig. 12. Maximal error profiles for the IUS along the iteration axis: DDR = 50%.
Fig. 13. Maximal error profiles for the SUS along the iteration axis: DDR = 50%.

Illustrative simulations verify the theoretical analysis. For further research, more details on the relationship between the data dropout rate and the tracking performance are of great interest.

Appendix

Proof of Theorem 2. The proof is carried out by mathematical induction, similar to the proof of Theorem 1.

Initial step. Consider the case $t = 0$. Subtracting both sides of (18) from $u_d(0)$ leads to

$$\begin{aligned}
\delta u_{k+1}(0) &= \delta u_k(0) - a_kL_0e_k(1) \\
&= \delta u_k(0) - a_kL_0C^{+}B(0)\delta u_k(0) - a_kL_0C^{+}A(0)\delta x_k(0) + a_kL_0C(1)w_k(1) + a_kL_0v_k(1). \qquad (27)
\end{aligned}$$

Similar to the proof of Theorem 1, $\sum_{k=1}^{\infty}a_kL_0C^{+}A(0)\delta x_k(0) < \infty$, $\sum_{k=1}^{\infty}a_kL_0C(1)w_k(1) < \infty$, and $\sum_{k=1}^{\infty}a_kL_0v_k(1) < \infty$, w.p.1. Note that $-L_0C^{+}B(0)$ is stable through a suitable selection of $L_0$. Thus, applying Lemma 1 to the recursion (27), we have $\delta u_k(0) \to 0$ as $k \to \infty$ w.p.1.

Inductive step. Assume that the convergence of $u_k(t)$ has been proved for $t = 0, 1, \ldots, s-1$; the aim is to show the convergence for $t = s$. The following recursion is easy to establish:

$$\begin{aligned}
\delta u_{k+1}(s) &= \delta u_k(s) - a_kL_sC^{+}B(s)\delta u_k(s) - a_kL_sC^{+}A(s)\sum_{i=0}^{s-1}\Bigl(\prod_{j=i+1}^{s-1}A(j)\Bigr)B(i)\,\delta u_k(i) \\
&\qquad + a_kL_sC^{+}A(s)\sum_{i=0}^{s}\Bigl(\prod_{j=i}^{s-1}A(j)\Bigr)w_k(i) + a_kL_sC(s+1)w_k(s+1) + a_kL_sv_k(s+1). \qquad (28)
\end{aligned}$$

By the induction assumption, $L_sC^{+}A(s)\sum_{i=0}^{s-1}(\prod_{j=i+1}^{s-1}A(j))B(i)\delta u_k(i) \to 0$ w.p.1, and the infinite weighted summation of the last three terms on the right-hand side of (28) is finite. Thus, using Lemma 1 again leads to the conclusion. The proof is completed.

References

[1] H.S. Ahn, Y.Q. Chen, K.L. Moore, Intermittent iterative learning control, in: Proc. 2006 IEEE Int. Symposium on Intelligent Control, 2006.
[2] H.S. Ahn, Y.Q. Chen, K.L. Moore, Iterative learning control: survey and categorization from 1998 to 2004, IEEE Trans. Syst. Man Cybern. Part C 37 (6) (2007).
[3] H.S. Ahn, K.L. Moore, Y.Q. Chen, Discrete-time intermittent iterative learning controller with independent data dropouts, in: Proc. 2008 IFAC World Congress, 2008.
[4] H.S. Ahn, K.L. Moore, Y.Q. Chen, Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems, in: Proc. IEEE Int. Conf. Control, Automation, Robotics and Vision, 2008.
[5] S. Arimoto, S. Kawamura, F. Miyazaki, Bettering operation of robots by learning, J. Robot. Syst. 1 (2) (1984).
[6] D.A. Bristow, M. Tharayil, A.G. Alleyne, A survey of iterative learning control: a learning-based method for high-performance tracking control, IEEE Control Syst. Mag. 26 (3) (2006).
[7] M.W. Blackwell, O.R. Tutty, E. Rogers, R.D. Sandberg, Iterative learning control applied to a non-linear vortex panel model for improved aerodynamic load performance of wind turbines with smart rotors, Int. J. Control 89 (1) (2016).
[8] X. Bu, Z. Hou, S. Jin, R. Chi, An iterative learning control design approach for networked control systems with data dropouts, Int. J. Robust Nonlinear Control 26 (2016).
[9] X. Bu, Z.-S. Hou, F. Yu, Stability of first and high order iterative learning control with data dropouts, Int. J. Control Autom. Syst. 9 (5) (2011).
[10] X. Bu, Z.-S. Hou, F. Yu, F. Wang, H-infinity iterative learning controller design for a class of discrete-time systems with data dropouts, Int. J. Syst. Sci. 45 (9) (2014).
[11] X. Bu, T. Wang, Z. Hou, R. Chi, Iterative learning control for discrete-time systems with quantised measurements, IET Control Theory Appl. 9 (9) (2015).
[12] X. Bu, F. Yu, Z.-S. Hou, F. Wang, Iterative learning control for a class of nonlinear systems with random packet losses, Nonlinear Anal. Real World Appl. 14 (1) (2013).
[13] H.F. Chen, Stochastic Approximation and Its Applications, Kluwer, Dordrecht, The Netherlands.
[14] Y. Chen, C. Wen, Z. Gong, M. Sun, An iterative learning controller with initial state learning, IEEE Trans. Autom. Control 44 (2) (1999).
[15] Y.S. Chow, H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer-Verlag, New York.
[16] S. Devasia, Iterative learning control with time-partitioned update for collaborative output tracking, Automatica 69 (2016).
[17] S. Hao, T. Liu, W. Paszke, K. Galkowski, Robust iterative learning control for batch processes with input delay subject to time-varying uncertainties, IET Control Theory Appl. 10 (15) (2016).
[18] L.-X. Huang, Y. Fang, Convergence analysis of wireless remote iterative learning control systems with dropout compensation, Math. Probl. Eng. 2013 (2013) 1-9.
[19] S.Z. Khong, D. Nesic, M. Krstic, Iterative learning control based on extremum seeking, Automatica 66 (2016).
[20] X. Li, D. Huang, B. Chu, J.-X. Xu, Robust iterative learning control for systems with norm-bounded uncertainties, Int. J. Robust Nonlinear Control 26 (2016).
[21] J. Li, J. Li, Distributed adaptive fuzzy iterative learning control of coordination problems for higher order multi-agent systems, Int. J. Syst. Sci. 47 (10) (2016).
[22] X. Li, Q. Ren, J.-X. Xu, Precise speed tracking control of a robotic fish via iterative learning control, IEEE Trans. Ind. Electron. 63 (4) (2016).
[23] X. Li, J.-X. Xu, D. Huang, Iterative learning control for nonlinear dynamic systems with randomly varying trial lengths, Int. J. Adapt. Control Signal Process. 29 (2015).
[24] J. Liu, X. Ruan, Networked iterative learning control approach for nonlinear systems with random communication delay, Int. J. Syst. Sci. 47 (16) (2016).
[25] C. Liu, J.-X. Xu, J. Wu, Iterative learning control for remote control systems with communication delay and data dropout, Math. Probl. Eng. 2012 (2012).
[26] D. Meng, W. Du, Y. Jia, Data-driven consensus control for networked agents: an iterative learning control-motivated approach, IET Control Theory Appl. 9 (14) (2015).
[27] D. Meng, Y. Jia, J. Du, J. Zhang, High-precision formation control of nonlinear multi-agent systems with switching topologies: a learning approach, Int. J. Robust Nonlinear Control 25 (13) (2015).
[28] D. Meng, K.L. Moore, Learning to cooperate: networks of formation agents with switching topologies, Automatica 64 (2016).
[29] Y.-J. Pan, H.J. Marquez, T. Chen, L. Sheng, Effects of network communications on a class of learning controlled non-linear systems, Int. J. Syst. Sci. 40 (7) (2009).
[30] S. Saab, A discrete-time stochastic learning control algorithm, IEEE Trans. Autom. Control 46 (6) (2001).
[31] D. Shen, Y. Wang, Survey on stochastic iterative learning control, J. Process Control 24 (12) (2014).
[32] D. Shen, Y. Wang, Iterative learning control for networked stochastic systems with random packet losses, Int. J. Control 88 (5) (2015).

[33] D. Shen, Y. Wang, ILC for networked nonlinear systems with unknown control direction through random lossy channel, Syst. Control Lett. 77 (2015).
[34] D. Shen, Y. Xu, Iterative learning control for discrete-time stochastic systems with quantized information, IEEE/CAA J. Autom. Sin. 3 (1) (2016).
[35] D. Shen, W. Zhang, Y. Wang, C.-J. Chien, On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths, Automatica 63 (1) (2016).
[36] D. Shen, W. Zhang, J.-X. Xu, Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths, Syst. Control Lett. 96 (2016).
[37] T.D. Son, G. Pipeleers, J. Swevers, Robust monotonic convergent iterative learning control, IEEE Trans. Autom. Control 61 (4) (2016).
[38] L. Wang, X. He, D. Zhou, Average dwell time-based optimal iterative learning control for multi-phase batch processes, J. Process Control 40 (2016).
[39] Y.-S. Wei, X.-D. Li, Iterative learning control for linear discrete-time systems with high relative degree under initial state vibration, IET Control Theory Appl. 10 (10) (2016).
[40] W. Xiong, X. Yu, R. Patel, W. Yu, Iterative learning control for discrete-time systems with event-triggered transmission strategy and quantization, Automatica 72 (2016).
[41] L. Zhang, W. Chen, J. Liu, C. Wen, A robust adaptive iterative learning control for trajectory tracking of permanent-magnet spherical actuator, IEEE Trans. Ind. Electron. 63 (1) (2016).
[42] W. Zhou, M. Yu, D.Q. Huang, A high-order internal model based iterative learning control scheme for discrete linear time-varying systems, Int. J. Autom. Comput. 12 (3) (2015).
[43] N. Zsiga, S. Dooren, P. Elbert, C.H. Onder, A new method for analysis and design of iterative learning control algorithms in the time-domain, Control Eng. Pract. 57 (2016).
[44] J. van Zundert, J. Bolder, T. Oomen, Optimality and flexibility in iterative learning control for varying tasks, Automatica 67 (2016).


Asian Journal of Control, Vol. 20, No. 6, pp. 1-13, November 2018
Published online in Wiley Online Library (wileyonlinelibrary.com), DOI: 10.1002/asjc.1656

ITERATIVE LEARNING CONTROL FOR NONLINEAR SYSTEMS WITH DATA DROPOUTS AT BOTH MEASUREMENT AND ACTUATOR SIDES

Yanqiong Jin and Dong Shen

ABSTRACT

This paper discusses iterative learning control (ILC) for nonlinear systems under a general networked control structure, in which random data dropouts occur independently at both the measurement and actuator sides. Updating algorithms are proposed for the computed input signal at the learning controller and for the real input signal at the plant, respectively. The system output is strictly proved to converge to the desired reference with probability one as the iteration number goes to infinity. A numerical simulation is provided to verify the effectiveness of the proposed mechanism and algorithms.

Key Words: Iterative learning control, data dropouts, asynchronous update laws, nonlinear systems, convergence analysis.

I. INTRODUCTION

Learning is a basic skill of humans whereby one can correct behaviors based on experience when one completes a given task repeatedly. This basic cognition is mimicked by an intelligent control strategy, namely, iterative learning control (ILC), which was first proposed by Arimoto in the last century [1]. In this control strategy, the system repeats a task over a finite interval, so that the tracking information of previous iterations can be used to correct the input signal for the current iteration; the tracking performance is thus gradually improved along the iteration axis [2-8].

In recent years, with the fast development of network and communication techniques, many systems have adopted the networked control structure, in which the plant and the controller are located at different sites and communicate with each other through wired/wireless networks. For example, unmanned aerial vehicles (UAVs) can be used for surveillance of a specified area, and the surveillance routine is usually repeatable; the control of the UAVs is then achieved through wireless networks. Another similar example is the trajectory-keeping control in satellite formation flying [9]. In these applications, the communication burden over the networks is an important concern, as the finite transfer capacity conflicts with the huge transfer demand. Generally, there are two possible approaches, active and passive, to deal with the communication problem. The active way is to artificially reduce the transferred data, as in quantized ILC [10,11], while the passive way is to design ILC algorithms that are robust to random data dropouts. In this paper, we focus on the latter approach; that is, we are interested in the control problem under random data dropouts.

Several papers have been dedicated to the design and analysis of ILC algorithms under random data dropout environments [12-27]. However, this topic has not yet been completely studied, and gaps remain to be filled compared with the general case.

(Manuscript received January 13, 2017; revised July 13, 2017; accepted August 8, 2017. The authors are with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. D. Shen is the corresponding author (shendong@mail.buct.edu.cn). This work is supported by the National Natural Science Foundation of China and the Beijing Natural Science Foundation.)
The data dropout is assumed to occur only at the measurement side in [12-21]; that is, only the output data might be lost during the transmission from the plant to the controller, while the network from the controller to the plant is assumed to work well. The inherent principle in these papers is that, if the tracking information is successfully transmitted back to the controller, then the algorithm updates its input signal; otherwise, the algorithm stops updating and retains its previous signal. The convergence was established in the mean square sense [12-14], the mathematical expectation sense [10,15-17], and the almost sure sense [18-21]. Additionally, in [17,20,21], the authors also provided an alternative compensation scheme for the lost measurement, i.e., substituting the dropped packet with the synchronous one from its previous iteration. This successive updating mode paves a novel way for data dropout problems. However, the lossless network at the actuator side makes the whole control scheme work similarly to the no-data-dropout case. If the network at the actuator side also suffers data dropouts, the control performance would be greatly influenced if we modify nothing in the control framework.

Several researchers proceeded to consider the case in which the networks at both the measurement and actuator sides suffer random data dropouts or communication delays [22-27]. In this case, the lost input packet must be compensated, because the plant should be continuously driven by some input signal. Both time-wise and iteration-wise compensation mechanisms were proposed. In [22,23], the dropped packet was compensated by its adjacent packet within the same iteration; that is, the dropped data, say $\alpha_k(t)$, is compensated with $\alpha_k(t-1)$. In [24,25], the delayed input was compensated by the packet from its previous iteration; that is, the dropped data, say $\alpha_k(t)$, is compensated with $\alpha_{k-1}(t)$. Therefore, it is clear that these mechanisms do not allow data from adjacent time instances or adjacent iterations to be dropped or delayed simultaneously. In other words, successive data dropouts are not allowed, which implies that the random data dropouts are somewhat deterministic rather than completely stochastic. In [26,27], the dropped input was compensated by the one used in the previous iteration, and thus successive data dropouts are admitted. However, due to the analysis techniques, the authors had to impose additional conditions on the data dropout rate. These observations motivate us to further relax the strict convergence conditions. To recap, it is of great interest to consider a control strategy for nonlinear systems with data dropouts occurring at both measurement and actuator sides, where no extra assumption beyond the Bernoulli distribution is required for the data dropout model.

In this paper, both networks are allowed to suffer random data dropouts, and a new memory is integrated into the plant for storing the real control signal (fed to the plant). Two kinds of asynchronization are observed in this case. The first asynchronization exists between the computed input signal generated by the learning controller and the real input signal fed to the plant, due to data dropouts at the actuator side. The other asynchronization lies in the updates at different time instances, due to independent data dropouts for different time instances. To deal with such asynchronization and randomness, we first give a novel control framework in which both input signals are updated with their available data. The convergence of the proposed algorithms is strictly shown under simple design conditions on the learning gain matrix. The classical λ-norm technique is modified to address the involved randomness, and the asynchronization is carefully analyzed to derive the convergence results.

This paper is distinguished from existing papers by the following novelties: (i) the general data dropout environment is considered for nonlinear systems, in which the networks at both measurement and actuator sides may suffer random data dropouts; (ii) no additional condition beyond the Bernoulli distribution is required to model the data dropouts; (iii) a novel updating mechanism is proposed for both the computed and real input signals; and (iv) a novel convergence proof is provided for the nonlinear system with general data dropouts. It should be emphasized that, while we adopt the classical P-type update law in this paper for generating the input signals, the proposed approach is not limited to the P-type case.
In other words, the extensions to other kinds of update laws, such as the PD-type law and ILC integrated with current-iteration feedback control, can be carried out following steps similar to those given in this paper. To avoid tedious repetition of derivations, we omit these discussions.

This paper is arranged as follows. Section II provides the problem formulation and the update algorithms. Section III gives the strict convergence analysis of the proposed algorithms. The extension to the non-affine nonlinear system case is discussed in Section IV. Illustrative simulations are provided in Section V. Section VI concludes this paper.

Notation. $\mathbb{R}$ denotes the real number set and $\mathbb{R}^n$ is the space of $n$-dimensional vectors. $P(\text{event})$ denotes the probability of the indicated event. For a random variable $X$, $EX$ is its mathematical expectation. For a vector $X \in \mathbb{R}^n$, $\|X\|_\infty$ is the $\infty$-norm of the given vector, defined as the maximal absolute value of its elements. For a matrix $M \in \mathbb{R}^{n \times n}$, the $\infty$-norm is defined by $\|M\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |m_{ij}|$, where $m_{ij}$ is the $(i,j)$th entry of $M$.

II. PROBLEM FORMULATION

Consider the following affine nonlinear system:

$$x_k(t+1) = f(t, x_k(t)) + B(t)u_k(t), \qquad y_k(t) = C(t)x_k(t), \qquad (1)$$

where $k$ is the iteration number, $k = 1, 2, \ldots$, $t$ denotes the time instance, $t = 0, 1, 2, \ldots, N$, and $N$ is the iteration length. The variables $x_k(t) \in \mathbb{R}^n$, $u_k(t) \in \mathbb{R}^p$, and $y_k(t) \in \mathbb{R}^q$ denote the system state, input, and output, respectively. $f(\cdot, \cdot)$ is a nonlinear continuous function. $C(t)$ and $B(t)$ are unknown time-varying matrices with appropriate dimensions. For brevity, we denote $C^+B(t) \triangleq C(t+1)B(t)$. To simplify the convergence analysis, we assume that $C^+B(t)$ has full column rank.

Let $y_d(t)$, $t \in \{0, 1, 2, \ldots, N\}$, be the desired reference. For a suitable initial state $x_d(0)$ such that $y_d(0) = C(0)x_d(0)$, there always exists a unique desired input $u_d(t)$ that can generate the reference signal $y_d(t)$. Specifically, the desired input $u_d(t)$ is recursively defined as follows.

$$u_d(t) = [(C^+B(t))^T C^+B(t)]^{-1} (C^+B(t))^T \big(y_d(t+1) - C(t+1)f(t, x_d(t))\big),$$
$$x_d(t+1) = f(t, x_d(t)) + B(t)u_d(t). \qquad (2)$$

With this control signal, the following equations for the desired reference are satisfied; that is, the input $u_d(t)$ computed above drives the plant to generate the desired reference $y_d(t)$:

$$x_d(t+1) = f(t, x_d(t)) + B(t)u_d(t), \qquad y_d(t) = C(t)x_d(t).$$

Fig. 1. Block diagram of the networked ILC framework.

Define the tracking error as

$$e_k(t) = y_d(t) - y_k(t). \qquad (3)$$

The following assumptions are required for the technical analysis.

A1. For all $t \in \{0, 1, 2, \ldots, N\}$, the nonlinear continuous function $f(t, \cdot): \mathbb{R}^n \to \mathbb{R}^n$ satisfies the global Lipschitz condition; that is, $\forall x_1, x_2 \in \mathbb{R}^n$,

$$\|f(t, x_1) - f(t, x_2)\| \le k_f \|x_1 - x_2\|, \qquad (4)$$

where $k_f > 0$ is the Lipschitz constant.

This assumption is made mainly for the technical analysis, as a modified λ-norm technique is employed to derive the convergence of the tracking error with probability one in the next section. For the extension from the global Lipschitz condition to a local Lipschitz condition, a possible way is to adopt techniques similar to those in [18,19]. However, this paper aims to provide a novel convergence proof for ILC under data dropouts at both measurement and actuator sides, for which the data dropout condition is rather relaxed; thus we assume the global Lipschitz condition for a concise proof.

A2. The initial state of the system is reset to $x_d(0)$ at every iteration, i.e., $x_k(0) = x_d(0)$, $\forall k \ge 1$.

This assumption is the well-known identical initialization condition (i.i.c.), one of the fundamental issues in ILC. It has been used in many ILC papers, as repetition is the basic premise of ILC. If the i.i.c. is not satisfied, then perfect tracking is hard to achieve by learning algorithms, at least for the initial portion of the desired reference. Many papers have been dedicated to relaxing the i.i.c. by introducing additional mechanisms such as an initial rectifying mechanism [28] or an initial learning mechanism [29]. Such mechanisms can be combined with the results given in this paper to deal with the initial resetting issue. Besides, if the initial state is not identically reset but lies in a bounded range around $x_d(0)$, then one can show that the tracking error converges to a small zone around zero, similarly to [30].

In this paper, we consider a general formulation of the networked ILC framework, in which the plant and the learning controller are connected by wired/wireless networks, as shown in Fig. 1. In this framework, two networks exist: one from the plant to the learning controller, namely, at the measurement side, and one from the learning controller to the plant, namely, at the actuator side. Moreover, both networks may suffer random data dropouts. To model this point, we introduce two random variables $\sigma_k(t)$ and $\gamma_k(t)$, subject to 0-1 Bernoulli distributions, for the two sides, respectively. In other words, $\sigma_k(t)$ and $\gamma_k(t)$ are equal to 1 if the corresponding data is successfully transmitted, and 0 otherwise. In addition, $P(\sigma_k(t) = 1) = \bar{\sigma}(t)$ and $P(\gamma_k(t) = 1) = \bar{\gamma}(t)$, where $0 < \bar{\sigma}(t), \bar{\gamma}(t) < 1$. Note that the two networks work individually; thus it is rational to assume that $\sigma_k(t)$ is independent of $\gamma_k(t)$.

The control objective of this paper is to design a suitable input updating scheme such that the generated input sequence ensures zero-error convergence with probability one for nonlinear systems with data dropouts. Moreover, the system output driven by such an updating scheme can track the desired reference asymptotically as the iteration number goes to infinity.
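Before proceeding to the update laws, the recursion (2) can be made concrete by a short sketch. The dynamics `f`, `B`, and `C` below are hypothetical stand-ins (they are not the system of Section V), so this is only a minimal illustration of how $u_d(t)$ and $x_d(t)$ are generated, not the paper's implementation.

```python
import numpy as np

# Hypothetical stand-in dynamics: n = 2 states, p = q = 1.
N = 10
f = lambda t, x: np.array([0.5 * np.sin(x[0]), 0.3 * np.cos(x[1])])
B = lambda t: np.array([[0.5], [1.0]])
C = lambda t: np.array([[1.0, 0.2]])

y_d = np.sin(np.pi * np.arange(N + 1) / 10)   # desired reference y_d(0..N)
x_d = np.zeros((N + 1, 2))                    # chosen so that y_d(0) = C(0) x_d(0) = 0
u_d = np.zeros((N, 1))

for t in range(N):
    CB = C(t + 1) @ B(t)                      # C^+B(t) = C(t+1) B(t), full column rank
    rhs = y_d[t + 1] - C(t + 1) @ f(t, x_d[t])
    # least-squares formula of (2):
    # u_d(t) = [(C^+B)^T (C^+B)]^{-1} (C^+B)^T (y_d(t+1) - C(t+1) f(t, x_d(t)))
    u_d[t] = np.linalg.solve(CB.T @ CB, CB.T @ rhs)
    x_d[t + 1] = f(t, x_d[t]) + B(t) @ u_d[t]  # state recursion of (2)
```

Because $C^+B(t)$ has full column rank, the normal-equation solve is well posed at every step.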
To achieve the control objective, the controller update law in this paper follows a basic holding strategy. To be specific, if the data is transmitted successfully at the measurement side, then the learning controller updates its input signal. Otherwise, if the data is lost during the transmission at the measurement side, then the learning controller stops updating and retains the previous input signal.

On the other hand, if the input signal is successfully transmitted at the actuator side, then the plant uses the newly arrived input signal. Otherwise, if the input signal is lost during the transmission, then the plant retains the previous input signal stored in the memory. To make the following expressions concise, hereafter we call the input generated by the learning controller the computed input signal, denoted by $u_k^c(t)$, and the input used for the plant the real input signal, denoted by $u_k^r(t)$. Then, the computed input signal is updated as

$$u_{k+1}^c(t) = \sigma_{k+1}(t)u_k^r(t) + [1 - \sigma_{k+1}(t)]u_k^c(t) + \sigma_{k+1}(t)L_t e_k(t+1), \qquad (5)$$

where $L_t$ is the learning gain matrix to be designed later. Moreover, the real input signal used for the plant is given as

$$u_{k+1}^r(t) = \gamma_{k+1}(t)u_{k+1}^c(t) + [1 - \gamma_{k+1}(t)]u_k^r(t). \qquad (6)$$

Remark 1. Note that the random data dropouts occur independently at both measurement and actuator sides; thus the updates of the computed and real input signals might be asynchronous. That is, the computed input is updated when the data is successfully transmitted back from the plant. However, this latest input may fail to be transmitted to the plant, so that the real input signal retains the previous one. In this case, the asynchronization between the computed and real input signals arises. Moreover, it is worth pointing out that the updates of both inputs are also asynchronous along the time axis, as the random variables $\sigma_k(t)$ and $\gamma_k(t)$ are independent for different time instances. In addition, it should be noted that the ILC scheme given in Fig. 1 requires that no transient growth problem exist when output dropouts occur, because in this case, when large transient errors occur, the controller may have no information about them due to the data dropouts and thus cannot stop the transient growth.
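As a reading aid, the update laws (5) and (6) can be sketched as below for a single-input plant. The gain `L`, the success probabilities, and the error signal are placeholders; this is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def ilc_iteration(u_c, u_r, e_next, L, sigma_bar, gamma_bar, rng):
    """One pass of the update laws (5)-(6) for all time instances at once.

    u_c, u_r : shape (N,) computed / real inputs of iteration k
    e_next   : shape (N,) tracking errors e_k(t+1), t = 0..N-1
    """
    N = len(u_c)
    sigma = rng.random(N) < sigma_bar   # sigma_{k+1}(t): measurement-side success
    gamma = rng.random(N) < gamma_bar   # gamma_{k+1}(t): actuator-side success
    # (5): update the computed input only when e_k(t+1) reaches the controller
    u_c_new = np.where(sigma, u_r + L * e_next, u_c)
    # (6): the plant adopts the new computed input only when it arrives; otherwise hold
    u_r_new = np.where(gamma, u_c_new, u_r)
    return u_c_new, u_r_new

rng = np.random.default_rng(0)
u_c, u_r = ilc_iteration(np.zeros(5), np.zeros(5), np.ones(5), 0.9, 0.85, 0.85, rng)
```

Note that (6) uses the freshly computed input of iteration $k+1$, which is why `u_r_new` is formed from `u_c_new`: the asynchronization of Remark 1 arises exactly when `sigma` is 1 but `gamma` is 0.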
III. CONVERGENCE ANALYSIS OF ILC ALGORITHMS

In this section, the convergence with probability one of the proposed algorithms (5) and (6), for both the computed and real input signals, to the desired input $u_d(t)$ is proved; consequently, the output of system (1) tracks the desired reference $y_d(t)$ asymptotically as the iteration number goes to infinity.

As remarked in the last section, there exists asynchronization in the updating of the computed and real input signals. Such asynchronization makes it nontrivial to establish the convergence proof. To this end, we first derive the expressions for both input errors and then build an augmented regression model, so that the asynchronization can be treated as internal randomness (see Lemma 1). The property of the newly introduced random matrix in the regression model of the augmented input errors is then analyzed (see Lemma 2). By applying a modified λ-norm technique according to the random asynchronization, the contraction mapping of the input errors is strictly established to show the convergence (see Theorem 1).

We first state the auxiliary lemmas, whose proofs are given in the Appendix. Denote $\delta u_k^c \triangleq u_d(t) - u_k^c(t)$ and $\delta u_k^r \triangleq u_d(t) - u_k^r(t)$ as the errors of the computed and real inputs, respectively. Define the augmented input error

$$\delta u_k(t) = [(\delta u_k^c(t))^T, (\delta u_k^r(t))^T]^T. \qquad (7)$$

Then we have the following characterization of this augmented input error.

Lemma 1. For the augmented input error given in (7), the following regression holds:

$$\delta u_{k+1}(t) = P_k(t)\delta u_k(t) - Q_k(t)[f(t, x_d(t)) - f(t, x_k(t))], \qquad (8)$$

where

$$P_k(t) = \begin{bmatrix} [1-\sigma_{k+1}(t)]I & \sigma_{k+1}(t)[I - L_t C^+B(t)] \\ \gamma_{k+1}(t)[1-\sigma_{k+1}(t)]I & \star \end{bmatrix}, \qquad (9)$$

$$Q_k(t) = \begin{bmatrix} \sigma_{k+1}(t)L_t C(t+1) \\ \gamma_{k+1}(t)\sigma_{k+1}(t)L_t C(t+1) \end{bmatrix}, \qquad (10)$$

with the expression in the position marked by $\star$ being $[1-\gamma_{k+1}(t)]I + \gamma_{k+1}(t)\sigma_{k+1}(t)[I - L_t C^+B(t)]$.

This lemma characterizes the random asynchronization between the computed and real inputs, demonstrated by the random matrix $P_k(t)$. It is clear that $P_k(t)$ depends on both $k$ and $t$, which reflects the asynchronization in the iteration domain and the time domain, respectively. Note that $\sigma_{k+1}(t)$ is independent of $\gamma_{k+1}(t)$ and both take the value 0 or 1. Thus, $P_k(t)$ has four possible outcomes: one case implies the asynchronization state between the two inputs ($\sigma_{k+1}(t) = 1$ and $\gamma_{k+1}(t) = 0$), two cases imply the synchronization state ($\gamma_{k+1}(t) = 1$), and one case implies the maintenance of the previous state ($\sigma_{k+1}(t) = \gamma_{k+1}(t) = 0$).

For the regression model (8), the contraction mapping property of the matrix $P_k(t)$ is important for the convergence analysis.
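The structure of $P_k(t)$ and $Q_k(t)$ in (9)-(10) can be checked numerically. The sketch below uses hypothetical scalar values for $L_t$, $C^+B(t)$, and $C(t+1)$, and estimates $E\|P_k(t)\|_\infty$ by Monte Carlo, previewing the contraction property established in Lemma 2 below.

```python
import numpy as np

rng = np.random.default_rng(1)
L_t, CB = 0.9, 1.0          # hypothetical scalars for L_t and C^+B(t)
C_plus = 1.0                # hypothetical scalar for C(t+1)
I = np.eye(1)

def P_Q(sigma, gamma):
    # Eq. (9): random block matrix acting on [delta u^c; delta u^r]
    P = np.block([
        [(1 - sigma) * I, sigma * (I - L_t * CB)],
        [gamma * (1 - sigma) * I,
         (1 - gamma) * I + gamma * sigma * (I - L_t * CB)],
    ])
    # Eq. (10)
    Q = np.vstack([sigma * L_t * C_plus * I, gamma * sigma * L_t * C_plus * I])
    return P, Q

sigma_bar = gamma_bar = 0.85
norms = []
for _ in range(10_000):
    sigma = float(rng.random() < sigma_bar)
    gamma = float(rng.random() < gamma_bar)
    P, _ = P_Q(sigma, gamma)
    norms.append(np.abs(P).sum(axis=1).max())   # infinity-norm: max absolute row sum
print(np.mean(norms))   # estimate of E||P_k(t)||_inf; below 1 whenever ||I - L_t C^+B|| < 1
```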

This property is clarified in the following lemma.

Lemma 2. If the learning gain matrix $L_t$ in (5) satisfies $\|I - L_t C^+B(t)\|_\infty < 1$, then

$$\sup_t E\|P_k(t)\|_\infty < 1. \qquad (11)$$

Now, the main theorem is given as follows.

Theorem 1. Consider the nonlinear system (1) and assume A1-A2 hold. If the learning gain matrix $L_t$ in (5) satisfies $\|I - L_t C^+B(t)\|_\infty < 1$, then both the computed and real input sequences generated by the algorithms (5) and (6) converge to the desired input $u_d(t)$ given in (2) with probability one as $k \to \infty$; that is, $u_k^c(t) \to u_d(t)$, $u_k^r(t) \to u_d(t)$, $\forall t$, with probability one as $k \to \infty$. Consequently, the actual tracking error $e_k(t) \to 0$ with probability one as $k \to \infty$.

Proof. Taking the $\infty$-norm on both sides of the regression model (8) for the augmented input error yields

$$\|\delta u_{k+1}(t)\|_\infty \le \|P_k(t)\delta u_k(t)\|_\infty + \|Q_k(t)[f(t, x_d(t)) - f(t, x_k(t))]\|_\infty \le \|P_k(t)\|_\infty \|\delta u_k(t)\|_\infty + k_f \|Q_k(t)\|_\infty \|\delta x_k(t)\|_\infty, \qquad (12)$$

where A1 is applied to the last inequality. Noticing the independence of the involved variables, we take the mathematical expectation of (12) and obtain

$$E\|\delta u_{k+1}(t)\|_\infty \le E\|P_k(t)\|_\infty E\|\delta u_k(t)\|_\infty + k_f E\|Q_k(t)\|_\infty E\|\delta x_k(t)\|_\infty, \qquad (13)$$

because all terms in (12) are positive and the inequality holds by the order-preservation property of the mathematical expectation for random variables.

Noticing system (1) and the desired reference model (2), as well as the control framework in Fig. 1, we have

$$\delta x_k(t+1) = [f(t, x_d(t)) - f(t, x_k(t))] + B(t)\delta u_k^r(t), \qquad (14)$$

where $\delta x_k(t) \triangleq x_d(t) - x_k(t)$. Taking the $\infty$-norm on both sides of (14) leads to

$$\|\delta x_k(t+1)\|_\infty \le \|f(t, x_d(t)) - f(t, x_k(t))\|_\infty + \|B(t)\|_\infty \|\delta u_k^r(t)\|_\infty \le k_f \|\delta x_k(t)\|_\infty + k_b \|\delta u_k^r(t)\|_\infty, \qquad (15)$$

where $k_b \triangleq \max_t \|B(t)\|_\infty$. We further take the mathematical expectation of the last inequality, where all variables are positive:

$$E\|\delta x_k(t+1)\|_\infty \le k_f E\|\delta x_k(t)\|_\infty + k_b E\|\delta u_k^r(t)\|_\infty. \qquad (16)$$

Backward iterating this inequality along the time axis further leads to

$$E\|\delta x_k(t+1)\|_\infty \le k_f^2 E\|\delta x_k(t-1)\|_\infty + k_b E\|\delta u_k^r(t)\|_\infty + k_f k_b E\|\delta u_k^r(t-1)\|_\infty \le \cdots \le k_b \sum_{i=0}^{t} k_f^{t-i} E\|\delta u_k^r(i)\|_\infty, \qquad (17)$$

where assumption A2 (i.e., $\delta x_k(0) = 0$) is applied. Consequently, we have

$$E\|\delta x_k(t)\|_\infty \le k_b \sum_{i=0}^{t-1} k_f^{t-1-i} E\|\delta u_k^r(i)\|_\infty. \qquad (18)$$

Now substituting (18) into (13) leads to

$$E\|\delta u_{k+1}(t)\|_\infty \le E\|P_k(t)\|_\infty E\|\delta u_k(t)\|_\infty + k_b E\|Q_k(t)\|_\infty \sum_{i=0}^{t-1} k_f^{t-i} E\|\delta u_k^r(i)\|_\infty. \qquad (19)$$

Because $\delta u_k^r(t)$ is part of $\delta u_k(t)$, we have $\|\delta u_k^r(t)\|_\infty \le \|\delta u_k(t)\|_\infty$ for all $t$. Thus, from (19) it follows that

$$E\|\delta u_{k+1}(t)\|_\infty \le E\|P_k(t)\|_\infty E\|\delta u_k(t)\|_\infty + k_b E\|Q_k(t)\|_\infty \sum_{i=0}^{t-1} k_f^{t-i} E\|\delta u_k(i)\|_\infty. \qquad (20)$$

Now the classical λ-norm technique can be used. Specifically, multiply both sides of the last inequality by $\alpha^{-\lambda t}$, where $\alpha > 1$ and $\lambda > 1$ are defined later, and then take the supremum over all time instances $t$.

This yields

$$\sup_t\big(\alpha^{-\lambda t} E\|\delta u_{k+1}(t)\|_\infty\big) \le \sup_t E\|P_k(t)\|_\infty \sup_t\big(\alpha^{-\lambda t} E\|\delta u_k(t)\|_\infty\big) + k_b \sup_t E\|Q_k(t)\|_\infty \sup_t\Big(\alpha^{-\lambda t} \sum_{i=0}^{t-1} k_f^{t-i} E\|\delta u_k(i)\|_\infty\Big). \qquad (21)$$

Let $\alpha > k_f$; then it is observed that

$$\sup_t\Big(\alpha^{-\lambda t} \sum_{i=0}^{t-1} k_f^{t-i} E\|\delta u_k(i)\|_\infty\Big) \le \sup_t\Big(\alpha^{-\lambda t} \sum_{i=0}^{t-1} \alpha^{t-i} E\|\delta u_k(i)\|_\infty\Big) = \sup_t\Big(\sum_{i=0}^{t-1} \alpha^{-\lambda i} E\|\delta u_k(i)\|_\infty\, \alpha^{-(\lambda-1)(t-i)}\Big) \le \sup_t\big(\alpha^{-\lambda t} E\|\delta u_k(t)\|_\infty\big)\, \sup_t\Big(\sum_{i=0}^{t-1} \alpha^{-(\lambda-1)(t-i)}\Big) = \sup_t\big(\alpha^{-\lambda t} E\|\delta u_k(t)\|_\infty\big)\, \frac{1 - \alpha^{-(\lambda-1)t}}{\alpha^{\lambda-1} - 1}. \qquad (22)$$

Define a new λ-norm of $\delta u_k(t)$ as

$$\|\delta u_k(t)\|_\lambda \triangleq \sup_t\big(\alpha^{-\lambda t} E\|\delta u_k(t)\|_\infty\big).$$

Substituting (22) into (21) yields

$$\|\delta u_{k+1}(t)\|_\lambda \le \Big(\rho + k_b\phi\,\frac{1 - \alpha^{-(\lambda-1)t}}{\alpha^{\lambda-1} - 1}\Big)\|\delta u_k(t)\|_\lambda, \qquad (23)$$

where $\rho$ and $\phi$ are defined as $\rho = \sup_t E\|P_k(t)\|_\infty$ and $\phi = \sup_t E\|Q_k(t)\|_\infty$.

Note that $P_k(t)$ and $Q_k(t)$ depend on $\sigma_{k+1}(t)$ and $\gamma_{k+1}(t)$ only, while the latter are identically and independently distributed with respect to $k$ and $t$. Thus, both $\rho$ and $\phi$ are independent of the iteration index $k$ once the mathematical expectation operator $E$ is applied. From Lemma 2, we find $\rho < 1$. Let $\alpha > \max\{1, k_f\}$; then there always exists a sufficiently large $\lambda$ such that $0 < k_b\phi\,\frac{1 - \alpha^{-(\lambda-1)t}}{\alpha^{\lambda-1} - 1} < 1 - \rho$. From this observation we further get

$$\rho \le \rho + k_b\phi\,\frac{1 - \alpha^{-(\lambda-1)t}}{\alpha^{\lambda-1} - 1} < 1. \qquad (24)$$

Thus, from (23) we have $\lim_{k\to\infty}\|\delta u_k(t)\|_\lambda = 0$, $\forall t$. Since the time length $N$ is finite, $\lim_{k\to\infty} E\|\delta u_k(t)\|_\infty = 0$, $\forall t$. Further, noting $\|\delta u_k(t)\|_\infty \ge 0$, it is clear that $\lim_{k\to\infty}\|\delta u_k(t)\|_\infty = 0$, $\forall t$, with probability one. Thus, it is apparent that $\lim_{k\to\infty}\|\delta u_k^c(t)\|_\infty = 0$ and $\lim_{k\to\infty}\|\delta u_k^r(t)\|_\infty = 0$. Furthermore, by (17) we know $\lim_{k\to\infty}\|\delta x_k(t)\|_\infty = 0$ and then $\lim_{k\to\infty} e_k(t) = 0$, $\forall t$. This completes the proof.

Remark 2. In the proof, the classical λ-norm is modified by introducing a mathematical expectation operator to the associated variables. Roughly speaking, this modification can effectively handle the newly introduced randomness (or asynchronization), which is generated by the random data dropouts at both measurement and actuator sides. This technique can be applied to deal with other similar random factors in ILC, such as iteration-varying lengths [30].

Remark 3. One may argue about the conservativeness of the λ-norm technique, which has been discussed in some previous papers. However, it is worth pointing out that the λ-norm is only used to pave the way for the convergence analysis. The intrinsic convergence property of the proposed algorithms is independent of the analysis technique. That is, a conservative analysis technique does not imply that the updating algorithms are conservative. Indeed, the P-type update law has remarkable tracking performance, and it is therefore believed that the proposed algorithms behave well under general random data dropout environments. The tracking performance of the proposed algorithms is illustrated in Section V.

Remark 4. In the proof, monotonic convergence in the λ-norm sense is shown in (23). However, one may be interested in monotonic convergence in the vector-norm sense. To this end, we can lift the augmented input error into the super-vector form $U_k = [E\|\delta u_k(0)\|_\infty, E\|\delta u_k(1)\|_\infty, \ldots, E\|\delta u_k(N-1)\|_\infty]^T$ and derive the associated matrix $\Gamma$ from (19) as a block lower-triangular matrix whose elements are the parameters of (19). Then, we have $U_{k+1} \le \Gamma U_k$. Consequently, the input error converges to zero monotonically if one can design $L_t$ satisfying $\|\Gamma\|_\infty < 1$. However, this condition requires additional system information, which may restrict its applicability.
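For reference, the modified λ-norm used above is straightforward to evaluate once the profile $E\|\delta u_k(t)\|_\infty$ is available. The sketch below uses a hypothetical error profile, purely as an illustration of the discounting effect.

```python
import numpy as np

def lambda_norm(err, alpha, lam):
    """sup_t alpha^(-lam*t) * err[t], where err[t] stands in for E||delta u_k(t)||_inf."""
    t = np.arange(len(err))
    return np.max(alpha ** (-lam * t) * err)

# A hypothetical error profile that grows along t is still small in the lambda-norm:
# later time instants are discounted, which decouples the time axis from the iteration axis.
err = np.array([0.5, 1.0, 2.0, 4.0])
print(lambda_norm(err, alpha=2.0, lam=2.0))   # 0.5
```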
Remark 5. In practical applications, the transient growth problem along the iteration axis is an important issue in ILC for ensuring a safe operation process.

A pseudospectra-analysis-based approach was proposed in [31] to solve this issue for linear systems. In this paper, we assume that the transient growth problem does not occur, as we concentrate on the general data dropout problem. Moreover, we employ a simple holding strategy at the actuator side to avoid the situation where a zero input causes large transient errors. Generally, the transient growth problem under successive data dropouts is of great importance and interest. The techniques in [31] (and the references therein) may provide a possible way to solve this problem. In addition, the references [32,33] also provide some ideas on how to solve the transient growth problem.

IV. EXTENSIONS TO NON-AFFINE NONLINEAR SYSTEMS

In this section, we consider the following discrete-time non-affine nonlinear system:

$$x_k(t+1) = g(t, x_k(t), u_k(t)), \qquad y_k(t) = C(t)x_k(t), \qquad (25)$$

where the notations have the same meaning as in (1) except for the nonlinear function $g(t, x_k(t), u_k(t))$. Here, $\forall t$, assume that $g(t, \cdot, \cdot): \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^n$ is continuously differentiable with respect to its arguments $x$ and $u$. To be specific, denote $D_{1,k}(t) \triangleq \frac{\partial g}{\partial x}\big|_{x_k^*(t)}$ and $D_{2,k}(t) \triangleq \frac{\partial g}{\partial u}\big|_{u_k^*(t)}$, where $x_k^*(t)$ denotes a vector that lies between $x_d(t)$ and $x_k(t)$, and $u_k^*(t)$ lies between $u_d(t)$ and $u_k^r(t)$. The following assumptions are used for the analysis.

A3. For the suitable initial state $x_d(0)$, there exists a unique $u_d(t)$ such that

$$x_d(t+1) = g(t, x_d(t), u_d(t)), \qquad y_d(t) = C(t)x_d(t). \qquad (26)$$

A4. For any $t \in \{0, 1, 2, \ldots, N\}$, the global Lipschitz condition holds for the nonlinear function $g(t, x, u)$ in the sense that

$$\|g(t, x_1, u_1) - g(t, x_2, u_2)\| \le k_g\|x_1 - x_2\| + k_b\|u_1 - u_2\|.$$

Without loss of any generality, assume that $D_{2,k}(t)$ is non-singular. Moreover, $\forall k, t$, $\|D_{1,k}(t)\|_\infty \le k_g$ and $\|D_{2,k}(t)\|_\infty \le k_b$.

Theorem 2. Consider the nonlinear system (25) and assume A2-A4 hold. If the learning gain matrix $L_t$ in (5) satisfies $\|I - L_t C^+D_{2,k}(t)\|_\infty < 1$, then both the computed and real input sequences generated by the algorithms (5) and (6) converge to the desired input $u_d(t)$ given in (26) with probability one as $k \to \infty$; that is, $u_k^c(t) \to u_d(t)$, $u_k^r(t) \to u_d(t)$, $\forall t$, with probability one as $k \to \infty$. Consequently, the actual tracking error $e_k(t) \to 0$ with probability one as $k \to \infty$.

Proof. The proof proceeds similarly to that of Theorem 1. Thus, here we mainly provide the major revisions arising from the general formulation. Based on (25) and (26), the state difference becomes

$$\delta x_k(t+1) = g(t, x_d(t), u_d(t)) - g(t, x_k(t), u_k^r(t)) = D_{1,k}(t)\delta x_k(t) + D_{2,k}(t)\delta u_k^r(t). \qquad (27)$$

The error dynamics is then replaced by

$$e_k(t+1) = C(t+1)\delta x_k(t+1) = C^+D_{1,k}(t)\delta x_k(t) + C^+D_{2,k}(t)\delta u_k^r(t), \qquad (28)$$

where $C^+ \triangleq C(t+1)$. Comparing (28) with (38), we observe the analogy with $B(t)$ replaced by $D_{2,k}(t)$, and the associated matrix $P_k(t)$ now turns into

$$P_k(t) = \begin{bmatrix} [1-\sigma_{k+1}(t)]I & \sigma_{k+1}(t)[I - L_t C^+D_{2,k}(t)] \\ \gamma_{k+1}(t)[1-\sigma_{k+1}(t)]I & \star \end{bmatrix}, \qquad (29)$$

where the expression in the position marked by $\star$ is $[1-\gamma_{k+1}(t)]I + \gamma_{k+1}(t)\sigma_{k+1}(t)[I - L_t C^+D_{2,k}(t)]$. This further yields

$$\delta u_{k+1}(t) = P_k(t)\delta u_k(t) - Q_k(t)D_{1,k}(t)\delta x_k(t). \qquad (30)$$

Thus, taking first the $\infty$-norm and then the mathematical expectation of (30) yields

$$E\|\delta u_{k+1}(t)\|_\infty \le E\|P_k(t)\|_\infty E\|\delta u_k(t)\|_\infty + k_g E\|Q_k(t)\|_\infty E\|\delta x_k(t)\|_\infty, \qquad (31)$$

where A4 is applied to the last inequality. From (27) and A4, backward iterating the state difference similarly to (17), we have

$$E\|\delta x_k(t)\|_\infty \le k_b \sum_{i=0}^{t-1} k_g^{t-1-i} E\|\delta u_k(i)\|_\infty. \qquad (32)$$

Similarly to (21), apply the λ-norm to both sides of the inequality (31) and combine with (32); then,

$$\sup_t\big(\alpha^{-\lambda t} E\|\delta u_{k+1}(t)\|_\infty\big) \le \sup_t E\|P_k(t)\|_\infty \sup_t\big(\alpha^{-\lambda t} E\|\delta u_k(t)\|_\infty\big) + k_b \sup_t E\|Q_k(t)\|_\infty \sup_t\Big(\alpha^{-\lambda t} \sum_{i=0}^{t-1} k_g^{t-i} E\|\delta u_k(i)\|_\infty\Big). \qquad (33)$$

According to the changes from (27) to (33), and following steps similar to the proof of Theorem 1, we have

$$\|\delta u_{k+1}(t)\|_\lambda \le \Big(\rho + k_b\phi\,\frac{1 - \alpha^{-(\lambda-1)t}}{\alpha^{\lambda-1} - 1}\Big)\|\delta u_k(t)\|_\lambda. \qquad (34)$$

Then, using the condition $\|I - L_tC^+D_{2,k}(t)\|_\infty < 1$, it is easy to obtain $0 < \rho < 1$ following a proof similar to that of Lemma 2. Hence, by choosing a sufficiently large $\lambda$, it follows that (24) is valid for this case. The proof can then be completed by routine derivations.

Remark 6. In this section, the results are extended to the non-affine nonlinear system. One may argue that the condition on $D_{2,k}(t)$ is conservative because both time- and iteration-varying factors are taken into account simultaneously. However, the condition is widely satisfied in practical applications, as the system runs around the equilibrium or the desired state $x_d(t)$. The partial derivative matrices in this neighborhood then ensure the validity of the condition and thus guarantee the convergence of the proposed algorithms. Such convergence, in turn, contributes to the validity of the condition. In addition, following similar steps, we can also extend the linear output equation to the nonlinear case. This case is omitted in this paper for brevity.

Remark 7. The problem formulations and updating laws in [26,27] are quite similar to those in this paper. The major differences between [26,27] and this paper lie in three aspects: the convergence analysis techniques, the design of the learning gain matrix, and the conditions on data dropouts. First, [26,27] established the convergence based on the limit analysis of series, while we formulate the asynchronism between the computed and real inputs by randomly switching matrices and show the convergence based on a modified contraction mapping method. Second, the selection of the learning gain matrix in [26,27] depends not only on the system information but also on the data dropout rate, while in this paper it depends only on the input/output coupling matrix. Last, additional conditions on data dropouts are imposed in [26,27], while we only require that the transmission networks are not completely broken down.

V. ILLUSTRATIVE SIMULATIONS

To show the effectiveness of the proposed ILC algorithms, let us consider the following non-affine nonlinear system:

$$x_k^{(1)}(t+1) = 0.75\sin(t)\sin\big(x_k^{(1)}(t)\big) + 0.1x_k^{(1)}(t)\cos\big(x_k^{(2)}(t)\big) - \cos(t)\big(x_k^{(2)}(t) + u_k(t)/5\big),$$
$$x_k^{(2)}(t+1) = 0.5\cos(t)\cos\big(x_k^{(2)}(t)\big) + 0.2\sin(t)\cos\big(x_k^{(1)}(t)\big) - \sin\big(u_k(t)/10\big)u_k(t),$$
$$y_k(t) = 0.1x_k^{(1)}(t) + 0.02t^{1/3}x_k^{(2)}(t),$$

where $x_k(t) = [x_k^{(1)}(t)\ x_k^{(2)}(t)]^T$ denotes the state. The iteration length is $N = 50$. The desired reference is $y_d(t) = 0.5\sin(\pi t/20)\sin(\pi t/10)$. The initial state is set as $x_k(0) = x_d(0) = 0$. Without loss of any generality, the initial input is set to $u_0(t) = 0$, $\forall t$. The learning gain $L_t$ is selected as 0.9, which satisfies the design condition given in Theorem 1, that is, $0 < 1 - L_t C^+B(t) < 1$. The proposed algorithms (5) and (6) run for 150 iterations.

To model the random data dropouts occurring at both measurement and actuator sides, in the simulation we generate the random variables $\sigma_k(t)$ and $\gamma_k(t)$ independently for different iterations and different time instances. In addition, $\sigma_k(t)$ is also independent of $\gamma_k(t)$.
Both $\sigma_k(t)$ and $\gamma_k(t)$ are binary Bernoulli random variables with expectations $\bar{\sigma}(t)$ and $\bar{\gamma}(t)$. Note that $\bar{\sigma}(t)$ and $\bar{\gamma}(t)$ are also the probabilities of successful transmission. Then the values $1 - \bar{\sigma}(t)$ and $1 - \bar{\gamma}(t)$ denote the average rate at which data is lost during the transmission. We therefore call this value the data dropout rate (DDR) in the rest of this section. In order to demonstrate the effectiveness of the learning algorithms under general data dropout conditions, three scenarios are considered in this simulation. For simplicity, we let the DDR at the measurement side equal that at the actuator side.
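The complete experiment can be sketched end to end as follows. The plant below is a simplified stand-in, not the paper's exact Section V dynamics, while $N = 50$, $L_t = 0.9$, 150 iterations, and equal DDRs at both sides follow the text; the sketch is only meant to show how the pieces fit together.

```python
import numpy as np

rng = np.random.default_rng(2)
N, ITER, L_t = 50, 150, 0.9
t_axis = np.arange(N + 1)
y_d = 0.5 * np.sin(np.pi * t_axis / 20) * np.sin(np.pi * t_axis / 10)  # desired reference

def plant(u):
    """Simplified stand-in plant (not the exact Section V dynamics)."""
    x1 = x2 = 0.0                       # x_k(0) = x_d(0) = 0, assumption A2
    y = np.zeros(N + 1)
    for t in range(N):
        x1_new = 0.75 * np.sin(t) * np.sin(x1) + 0.1 * x1 * np.cos(x2) + u[t] / 5
        x2_new = 0.5 * np.cos(t) * np.cos(x2) + 0.2 * np.sin(t) * np.cos(x1) + u[t]
        x1, x2 = x1_new, x2_new
        y[t + 1] = 0.1 * x1 + 0.02 * (t + 1) ** (1 / 3) * x2
    return y

ddr = 0.15                              # Case 1; use 0.30 / 0.45 for Cases 2 and 3
u_c = np.zeros(N)                       # computed input u_k^c
u_r = np.zeros(N)                       # real input u_k^r
mte = []
for k in range(ITER):
    e = y_d - plant(u_r)                # e_k(t) = y_d(t) - y_k(t)
    sigma = rng.random(N) < 1 - ddr     # measurement-side successes
    gamma = rng.random(N) < 1 - ddr     # actuator-side successes
    u_c = np.where(sigma, u_r + L_t * e[1:], u_c)   # update law (5)
    u_r = np.where(gamma, u_c, u_r)                 # update law (6)
    mte.append(np.abs(e).max())         # maximal tracking error of iteration k
print(mte[0], mte[-1])                  # the error typically shrinks across iterations
```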

Fig. 2. Tracking performance of the system output at the 20th, 50th, and 150th iterations under general data dropouts for three cases: (a) Case 1: DDR = 15%; (b) Case 2: DDR = 30%; (c) Case 3: DDR = 45%.

Case 1. DDR = 15% at both measurement and actuator sides. That is, $\bar{\sigma}(t) = \bar{\gamma}(t) = 0.85$, or $P(\sigma_k(t) = 1) = P(\gamma_k(t) = 1) = 0.85$.

Case 2. DDR = 30% at both measurement and actuator sides. That is, $\bar{\sigma}(t) = \bar{\gamma}(t) = 0.70$, or $P(\sigma_k(t) = 1) = P(\gamma_k(t) = 1) = 0.70$.

Case 3. DDR = 45% at both measurement and actuator sides. That is, $\bar{\sigma}(t) = \bar{\gamma}(t) = 0.55$, or $P(\sigma_k(t) = 1) = P(\gamma_k(t) = 1) = 0.55$.

Fig. 3. Maximal tracking error profiles.

The tracking performance of the system output at the 20th, 50th, and 150th iterations is illustrated in Fig. 2. As can be observed from this figure, the proposed algorithms ensure convergence of the system output to the desired reference. At the 20th iteration, the outputs of the three cases deviate from the reference, while at the 150th iteration all outputs achieve satisfactory tracking precision. Thus the proposed algorithms behave well against general data dropout conditions. On the other hand, comparing Fig. 2(a) and Fig. 2(c), it is seen that the tracking precision at the 50th iteration of the former case is better than that of the latter case. This observation implies that a large DDR slows the convergence speed. To further show this point, the maximal tracking error (MTE) profiles are displayed in Fig. 3, where the MTE is defined as $\max_t \|e_k(t)\|$ for the $k$-th iteration. In Fig. 3, four lines are plotted with different markers, denoting the cases DDR = 0, 15%, 30%, and 45%, respectively. Two facts can be seen from the figure: first, the larger the DDR, the slower the convergence speed (coinciding with Fig. 2); second, all lines decrease quickly in the semi-logarithmic coordinates, which shows the effectiveness of the proposed algorithms.

Moreover, to demonstrate the asynchronization between the computed input signal and the real input signal, we introduce a counter $\tau_k(t)$ for any given time instance $t$.
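The counter $\tau_k(t)$ can be maintained alongside the dropout draws; a minimal sketch, assuming the same Bernoulli model as above, is given below.

```python
import numpy as np

rng = np.random.default_rng(3)
N, ITER, ddr = 50, 150, 0.15
tau = np.zeros(N)                          # tau_k(t) for every time instance t
asynchronous = np.zeros(N, dtype=bool)     # True while u_k^c(t) != u_k^r(t)
for k in range(ITER):
    sigma = rng.random(N) < 1 - ddr        # controller receives e_k(t+1)
    gamma = rng.random(N) < 1 - ddr        # plant receives u_{k+1}^c(t)
    # After (5)-(6) the two inputs are synchronized whenever gamma = 1; a new
    # asynchronous state appears when the controller updates but the packet is lost.
    asynchronous = np.where(gamma, False, sigma | asynchronous)
    tau += asynchronous
print(tau.mean())   # compare with the ideal number ITER * DDR discussed for Table I
```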

The counter denotes the number of occurrences, up to the $k$-th iteration, of the case that the computed input signal is not equal to the real input signal. That is, the counter value increases only when the computed input and the real input are in an asynchronous state. In other words, if $u_k^c(t) = u_k^r(t)$, then the counter $\tau_k(t)$ is unchanged; otherwise, if $u_k^c(t) \neq u_k^r(t)$, then the counter $\tau_k(t)$ increases by one. The profiles for all time instances are plotted in Fig. 4, in which all profiles rise as the iteration number goes up. This figure illustrates that the asynchronization occurs randomly along the iteration axis and independently for different time instances. Moreover, the average value of $\tau_k(t)$ at the last iteration approximates the product of the iteration number and the DDR for all three cases. To be specific, when DDR = 15%, 30%, and 45%, the product (the expected amount of asynchronization states) is $150 \times 15\% = 22.5$, $150 \times 30\% = 45$, and $150 \times 45\% = 67.5$, respectively. To see this point, we provide the statistical results of Fig. 4 in Table I, where the ideal number denotes the product of the iteration number and the DDR (first row), the total number denotes the number of occurrences of asynchronization for each case (second row), and the average number is computed by dividing the total number by the time length $N$ (last row). It can be seen from the table that the average number almost equals the ideal number for each case.

Table I. Statistics of the asynchronization number (iteration maximum = 150): ideal number, total number, and average number for DDR = 15%, 30%, and 45%.

Fig. 4. Asynchronization of the computed and real input signals, $\tau_k(t)$: (a) Case 1: DDR = 15%; (b) Case 2: DDR = 30%; (c) Case 3: DDR = 45%.

VI. CONCLUSIONS

This paper addresses the ILC problem for nonlinear discrete-time systems with data dropouts occurring at both measurement and actuator sides. Updating laws are proposed for the computed input signal and the real input signal, whence asynchronization between the two input signals is allowed. The zero-error convergence with probability one of the system output to the desired reference is strictly proved. In addition, the results show that the simple compensating mechanism has good tracking performance and robustness against random factors. Numerical simulations verify the effectiveness of the proposed algorithms. For further research, it is of great interest to dig into the influence of random data dropouts on the tracking performance.

REFERENCES

1. Arimoto, S., S. Kawamura, and F. Miyazaki, Bettering operation of robots by learning, J. Robot. Syst., Vol. 1, No. 2 (1984).
2. Bristow, D. A., M. Tharayil, and A. G. Alleyne, A survey of iterative learning control: a learning-based method for high-performance tracking control, IEEE Control Syst. Mag., Vol. 26, No. 3 (2006).

3. Ahn, H. S., Y. Q. Chen, and K. L. Moore, Iterative learning control: survey and categorization from 1998 to 2004, IEEE Trans. Syst. Man Cybern. C, Vol. 37, No. 6 (2007).
4. Shen, D. and Y. Wang, Survey on stochastic iterative learning control, J. Process Control, Vol. 24, No. 12 (2014).
5. Zhu, Q., J.-X. Xu, D. Huang, and G.-D. Hu, Iterative learning control for linear discrete-time systems with unknown high-order internal models: a time-frequency analysis, Asian J. Control (2017).
6. Xu, Y., D. Shen, and X.-D. Zhang, Stochastic point-to-point iterative learning control based on stochastic approximation, Asian J. Control, Vol. 34, No. 3 (2017).
7. Chi, R., Z. S. Hou, S. Jin, and B. Huang, Computationally-light non-lifted data-driven norm-optimal iterative learning control, Asian J. Control (2017).
8. Shen, D. and Y. Xu, Iterative learning control for discrete-time stochastic systems with quantized information, IEEE/CAA J. Autom. Sinica, Vol. 3, No. 1 (2016).
9. Ahn, H. S., K. L. Moore, and Y. Q. Chen, Trajectory-keeping in satellite formation flying via robust periodic learning control, Int. J. Robust Nonlinear Control, Vol. 20, No. 14 (2010).
10. Zhang, T. and J. Li, Iterative learning control for multi-agent systems with finite-leveled sigma-delta quantization and random packet losses, IEEE Trans. Circuits Syst. I: Regul. Pap., Vol. 64, No. 8 (2017).
11. Zhang, T. and J. Li, Event-triggered iterative learning control for multi-agent systems with quantization, Asian J. Control (2017).
12. Ahn, H. S., Y. Q. Chen, and K. L. Moore, Intermittent iterative learning control, Proc. IEEE Int. Symp. Intell. Control, Munich, Germany (2006).
13. Ahn, H. S., K. L. Moore, and Y. Q. Chen, Discrete-time intermittent iterative learning controller with independent data dropouts, Proc. 17th IFAC World Congr., Coex, South Korea (2008).
14. Ahn, H. S., K. L. Moore, and Y. Q. Chen, Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems, Proc. 10th Int. Conf. Control Autom. Robot. Vision, Hanoi, Vietnam (2008).
15. Bu, X., Z. S. Hou, and F. Yu, Stability of first and high order iterative learning control with data dropouts, Int. J. Control Autom. Syst., Vol. 9, No. 5 (2011).
16. Bu, X., Z. S. Hou, F. Yu, and F. Wang, $H_\infty$ iterative learning controller design for a class of discrete-time systems with data dropouts, Int. J. Syst. Sci., Vol. 45, No. 9 (2014).
17. Liu, J. and X. Ruan, Networked iterative learning control design for nonlinear systems with stochastic output packet dropouts, Asian J. Control (2017).
18. Shen, D. and Y. Wang, ILC for networked nonlinear systems with unknown control direction through random lossy channel, Syst. Control Lett., Vol. 77 (2015).
19. Shen, D. and Y. Wang, Iterative learning control for networked stochastic systems with random packet losses, Int. J. Control, Vol. 88, No. 5 (2015).
20. Shen, D., C. Zhang, and Y. Xu, Intermittent and successive ILC for stochastic nonlinear systems with random data dropouts, Asian J. Control (2017).
21. Shen, D., C. Zhang, and Y. Xu, Two compensation schemes of iterative learning control for networked control systems with random data dropouts, Inf. Sci., Vol. 381 (2017).
22. Bu, X., F. Yu, Z. S. Hou, and F. Wang, Iterative learning control for a class of nonlinear systems with random packet losses, Nonlinear Anal. Real World Appl., Vol. 14, No. 1 (2013).
23. Pan, Y.-J., H. J. Marquez, T. Chen, and L. Sheng, Effects of network communications on a class of learning controlled non-linear systems, Int. J. Syst. Sci., Vol. 40, No. 7 (2009).
24. Liu, J. and X. Ruan, Networked iterative learning control approach for nonlinear systems with random communication delay, Int. J. Syst. Sci., Vol. 47, No. 16 (2016).
25. Liu, J. and X. Ruan, Networked iterative learning control design for discrete-time systems with stochastic communication delay in input and output channels, Int. J. Syst. Sci., Vol. 48, No. 9 (2017).
26. Liu, J. and X. Ruan, Networked iterative learning control for discrete-time systems with stochastic packet dropouts in input and output channels, Adv. Differ. Equ. (2017).
27. Liu, J. and X. Ruan, Synchronous-substitution-type iterative learning control for discrete-time networked control systems with Bernoulli-type stochastic packet dropouts, IMA J. Math. Control Inf. (2017).

28. Sun, M. and D. Wang, Iterative learning control with initial rectifying action, Automatica, Vol. 38, No. 7 (2002).
29. Chen, Y. Q., C. Wen, Z. Gong, and M. Sun, An iterative learning controller with initial state learning, IEEE Trans. Autom. Control, Vol. 44, No. 2 (1999).
30. Shen, D., W. Zhang, and J.-X. Xu, Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths, Syst. Control Lett., Vol. 96 (2016).
31. Bristow, D. A. and J. R. Singler, Towards transient growth analysis and design in iterative learning control, Int. J. Control, Vol. 84, No. 7 (2011).
32. Delchev, K., Iterative learning control for nonlinear systems: a bounded-error algorithm, Asian J. Control, Vol. 15, No. 2 (2013).
33. Park, K.-H. and Z. Bien, A study on iterative learning control with adjustment of learning interval for monotone convergence in the sense of sup-norm, Asian J. Control, Vol. 4, No. 1 (2002).

VII. APPENDIX

7.1 Proof of Lemma 1

Subtracting both sides of (5) from $u_d(t)$ leads to

$$\delta u_{k+1}^c = u_d(t) - u_{k+1}^c(t) = u_d(t) - \{\sigma_{k+1}(t)u_k^r(t) + [1-\sigma_{k+1}(t)]u_k^c(t) + \sigma_{k+1}(t)L_t e_k(t+1)\} = \sigma_{k+1}(t)\delta u_k^r(t) + [1-\sigma_{k+1}(t)]\delta u_k^c(t) - \sigma_{k+1}(t)L_t e_k(t+1). \qquad (35)$$

Similarly, subtracting both sides of (6) from $u_d(t)$ yields

$$\delta u_{k+1}^r = \gamma_{k+1}(t)\delta u_{k+1}^c(t) + [1-\gamma_{k+1}(t)]\delta u_k^r(t), \qquad (36)$$

where $\delta u_k^c \triangleq u_d(t) - u_k^c(t)$ and $\delta u_k^r \triangleq u_d(t) - u_k^r(t)$ denote the errors of the computed and real input signals, respectively. Moreover, from the system formulation we have

$$\delta x_k(t+1) = [f(t, x_d(t)) - f(t, x_k(t))] + B(t)\delta u_k^r(t), \qquad (37)$$

where $\delta x_k(t) \triangleq x_d(t) - x_k(t)$. Meanwhile, the tracking error is $e_k(t) = C(t)\delta x_k(t)$. Thus,

$$e_k(t+1) = C^+[f(t, x_d(t)) - f(t, x_k(t))] + C^+B(t)\delta u_k^r(t), \qquad (38)$$

where $C^+ = C(t+1)$ for short. Substituting (38) into (35) yields

$$\delta u_{k+1}^c = \sigma_{k+1}(t)[I - L_t C^+B(t)]\delta u_k^r(t) - \sigma_{k+1}(t)L_t C^+[f(t, x_d(t)) - f(t, x_k(t))] + [1-\sigma_{k+1}(t)]\delta u_k^c(t). \qquad (39)$$

Further, substituting (39) into (36) leads to

$$\delta u_{k+1}^r = [1-\gamma_{k+1}(t)]\delta u_k^r(t) + \gamma_{k+1}(t)[1-\sigma_{k+1}(t)]\delta u_k^c(t) - \gamma_{k+1}(t)\sigma_{k+1}(t)L_t C^+[f(t, x_d(t)) - f(t, x_k(t))] + \gamma_{k+1}(t)\sigma_{k+1}(t)[I - L_t C^+B(t)]\delta u_k^r(t). \qquad (40)$$

Based on (39) and (40), and noting the augmented input error $\delta u_k(t)$ and the associated matrices $P_k(t)$ and $Q_k(t)$, the regression model (8) holds obviously. This completes the proof.

7.2 Proof of Lemma 2

It is seen that $P_k(t)$ is a stochastic matrix with two random variables $\sigma_{k+1}(t)$ and $\gamma_{k+1}(t)$, and it has the following four possible realizations.

Case 1. $\sigma_{k+1}(t) = 1$, $\gamma_{k+1}(t) = 1$:

$$P^1(t) = \begin{bmatrix} 0 & I - L_t C^+B(t) \\ 0 & I - L_t C^+B(t) \end{bmatrix}.$$

Case 2. $\sigma_{k+1}(t) = 1$, $\gamma_{k+1}(t) = 0$:

$$P^2(t) = \begin{bmatrix} 0 & I - L_t C^+B(t) \\ 0 & I \end{bmatrix}.$$

Case 3. $\sigma_{k+1}(t) = 0$, $\gamma_{k+1}(t) = 1$:

$$P^3(t) = \begin{bmatrix} I & 0 \\ I & 0 \end{bmatrix}.$$

Case 4. $\sigma_{k+1}(t) = 0$, $\gamma_{k+1}(t) = 0$:

$$P^4(t) = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$$

Then we introduce four binary random variables $\mu_i$, $1 \le i \le 4$, such that $\mu_i \in \{0, 1\}$ and $\mu_1 + \mu_2 + \mu_3 + \mu_4 = 1$. Note that these four $\mu_i$ are dependent, since whenever any one equals 1, all the others must be 0. The random variable $\mu_i$ is used to describe the occurrence of $P^i(t)$ for $P_k(t)$; that is, if $P_k(t)$ takes the value $P^i(t)$, then $\mu_i = 1$. Recalling the formulation of $\sigma_k(t)$ and $\gamma_k(t)$ in Section II, we have

$$p_1 = P(\mu_1 = 1) = \bar{\sigma}(t)\bar{\gamma}(t), \qquad p_2 = P(\mu_2 = 1) = \bar{\sigma}(t)[1-\bar{\gamma}(t)],$$
$$p_3 = P(\mu_3 = 1) = [1-\bar{\sigma}(t)]\bar{\gamma}(t), \qquad p_4 = P(\mu_4 = 1) = [1-\bar{\sigma}(t)][1-\bar{\gamma}(t)].$$

Then we can obtain

$$E\|P_k(t)\|_\infty = E\Big\|\sum_{i=1}^{4}\mu_i P^i(t)\Big\|_\infty = \sum_{i=1}^{4} P(\mu_i = 1)\,\|P^i(t)\|_\infty. \qquad (41)$$

Noticing the form of $P^i(t)$, $1 \le i \le 4$, and the definition of the $\infty$-norm, we have $\|P^i(t)\|_\infty = 1$ for $i = 2, 3, 4$. For $P^1(t)$, it is apparent that $\|P^1(t)\|_\infty < 1$ as long as $L_t$ is designed to satisfy $\|I - L_t C^+B(t)\|_\infty < 1$. As long as the networks at both measurement and actuator sides are not completely broken, we must have $p_1 > 0$, and then $E\|P_k(t)\|_\infty < 1$, $\forall t$. This further implies that $\sup_t E\|P_k(t)\|_\infty < 1$. The proof is completed.

Yanqiong Jin received the B.E. degree in automation from Beijing University of Chemical Technology, Beijing, China, in 2017. She is now pursuing a master's degree at Beihang University. Her research interests include iterative learning control and its applications to motion robots.

Dong Shen received the B.S. degree in mathematics from Shandong University, Jinan, China. He received the Ph.D. degree in mathematics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences (CAS), Beijing, China, in 2010. From 2010 to 2012, he was a Post-Doctoral Fellow with the Institute of Automation, CAS. From 2016 to 2017, he was a visiting scholar at the National University of Singapore, Singapore. Since 2012, he has been an associate professor with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His current research interests include iterative learning control, stochastic control, and optimization. He has published more than 60 refereed journal and conference papers. He is the author of Stochastic Iterative Learning Control (Science Press, 2016, in Chinese) and co-author of Iterative Learning Control for Multi-Agent Systems Coordination (Wiley, 2017). Dr. Shen received the IEEE CSS Beijing Chapter Young Author Prize in 2014 and the Wentsun Wu Artificial Intelligence Science and Technology Progress Award.


IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Data-Driven Learning Control for Stochastic Nonlinear Systems: Multiple Communication Constraints and Limited Storage

Dong Shen, Member, IEEE

Manuscript received October 17, 2016; revised February 23, 2017; accepted April 17, 2017. This work was supported in part by the National Natural Science Foundation of China and in part by the Beijing Natural Science Foundation. The author is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China (e-mail: shendong@mail.buct.edu.cn). Digital Object Identifier 10.1109/TNNLS.

Abstract. This paper proposes a data-driven learning control method for stochastic nonlinear systems under random communication conditions, including data dropouts, communication delays, and packet transmission disordering. A renewal mechanism is added to the buffer to regulate the arriving packets, and a recognition mechanism is introduced to the controller for the selection of suitable update packets. Both intermittent and successive update schemes are proposed based on the conventional P-type iterative learning control algorithm, and are shown to converge to the desired input with probability one. The convergence and effectiveness of the proposed algorithms are verified by means of illustrative simulations.

Index Terms. Communication delay, data dropout, data-driven control, disorder, iterative learning control (ILC), stochastic nonlinear control systems.

I. INTRODUCTION

After three decades of development, iterative learning control (ILC) has become an important branch of intelligent control for repetitive systems [1]-[3]. Repetitive systems are those systems that can complete some given tracking task in a finite time interval and then repeat the process a number of times. For such systems, we can generate a control signal for the current iteration by incorporating the control signals and tracking information from previous iterations, so that the tracking performance is gradually improved along the iteration axis. This characteristic of ILC mimics the inherent principle of human learning and is thus effective for repetitive systems. Indeed, ILC has been explored in relation to many new issues in learning systems, such as iteration-varying lengths [4], [5], interval learning tracking [6], terminal ILC [7], primitive-based ILC [8], and quantized ILC [9], [10]. Successful applications of ILC have also been reported, including robotic fish [11], permanent magnet spherical actuators [12], and marine vibrators [13].

Most existing ILC literature is concerned with conventional centralized control systems, in which the controller and the plant are colocated and the information is transmitted without any delay or loss. However, many modern applications
(e.g., robotic fish and unmanned aerial vehicles) use networked control structures, in which the controller and the plant are located at different sites and communicate through wireless networks. Such implementation approaches are convenient, flexible, and robust because of the rapid development of fast communication and network techniques. However, they may suffer from multiple forms of randomness, such as data dropouts, communication delays, and transmission disordering, because of network congestion, broken linkages, and transmission errors. These random phenomena can critically influence the control performance. Thus, it is of great importance to investigate the performance of learning control under various communication constraints.

In addition, we should point out that there are two distinct kinds of control with networks: control of networks and control over networks. The former case usually involves the control of multiagent systems consisting of several agents or subsystems by using the neighbor agents' information [14], [15]. The latter case involves control that is conducted via the networks; that is, the system is implemented following a networked structure in which the control signal and measurement information are transmitted through the networks. In this paper, we focus on the latter case, in which the random communication constraints are of great interest.

The data dropout problem has been widely discussed in several ILC papers, whereas the other two reported issues are rarely addressed. Early attempts to address the data dropout issue were made in [16]-[23] from different viewpoints. In those studies, the data dropout was modeled by a Bernoulli random variable in [16]-[21], and by an arbitrary stochastic-sequence model with a finite length requirement in [22] and [23]. Moreover, the convergence results so obtained include mean-square convergence [16]-[18], expectation convergence [19]-[21], and almost-sure convergence [22], [23]. In addition, three different models of data dropouts were taken into account in [24]: the stochastic sequence, the Bernoulli random variable, and the Markov chain (for which a switched-system approach was introduced). However, it should be noted that in all those papers, the input signals are required to remain unchanged if no new measurement data arrive. That is, only the intermittent update scheme (IUS), specified later in this paper, was used in those papers. More investigations are thus expected to improve the control performance.

Some papers have addressed the communication delay problem [25], [26]. A P-type networked ILC scheme was proposed for discrete-time systems [25], in which the delayed data are compensated by the data from the previous iteration.

In such a scheme, successive delays along the iteration axis are not allowed. Meanwhile, in [26], successive communication delays are handled in a way similar to random asynchronism among different subsystems, and asymptotic convergence is established. There are also some papers that consider time delays, such as [27]-[30], in which the time delay is assumed to be iteration-invariant. It was pointed out in [30] that such iteration-invariant delays have little influence on the convergence. In this paper, we consider random communication delays along the iteration axis rather than the time delays addressed by much of the other literature. In addition, the random disorder problem has not been discussed in the previous ILC literature. It is such observations that motivated this paper.

Specifically, the goals of this paper are as follows. First, this paper addresses the ILC problem under multiple random constraints, including data dropouts, communication delays, and data-packet transmission disorder. To the best of our knowledge, data dropouts and communication delays have attracted some preliminary attempts, but the disorder problem has not been discussed. The major difficulty here is to describe the combined effects of data dropouts, communication delays, and packet transmission disordering in a unified framework. In contrast to our previous work [22], [23], in which only the data dropout problem was addressed, this paper is the first to show the differences arising from multiple communication constraints and limited storage conditions. Moreover, this paper also aims to propose effective schemes for dealing with multiple forms of randomness. Most previous papers used the IUS to handle a specific communication constraint, whereas here we begin by showing that the IUS is convergent under multiple constraints, and then we propose the successive update scheme (SUS) as an alternative, in which the input can still update its signal by using the latest available data when the corresponding data are lost. This is the second difference between this paper and our previous work [22], [23]. Furthermore, we should emphasize that a renewal mechanism for the data-receiving buffer and a recognition mechanism for the self-updating controller are also proposed, as we consider more complex conditions. In addition, the IUS and SUS are compared. In short, this paper proposes a framework for modeling multiple constraints, and novel update mechanisms for handling the constraints.

The main contributions of this paper are as follows.

1) A unified stochastic-sequence framework without specific statistical hypotheses is proposed for modeling multiple random constraints, including data dropouts, communication delays, and packet transmission disordering.

2) A renewal mechanism and a recognition mechanism are proposed for the buffer in order to deal with the combined effect of multiple constraints.

3) Two ILC update algorithms are proposed: an IUS and an SUS. The almost-sure convergence of these schemes is strictly proved by means of stochastic approximation theory.

The remainder of this paper is arranged as follows.
The problem is formulated in Section II, including the system setup, the communication constraints, and the control objective. The IUS and SUS are detailed in Sections III and IV, respectively, along with their convergence analyses. Illustrative simulations are given in Section V, and Section VI concludes this paper.

Notations: $\mathbb{R}$ denotes the real number field, and $\mathbb{R}^n$ is the $n$-dimensional real space. $\mathbb{N}$ is the set of all positive integers. $E$ is the mathematical expectation. A superscript $T$ denotes the transpose of a matrix or vector. For two sequences $\{a_n\}$ and $\{b_n\}$, we write $a_n = O(b_n)$ if $b_n \ge 0$ and there exists $L > 0$ such that $|a_n| \le Lb_n$, $\forall n$, and $a_n = o(b_n)$ if $b_n > 0$ and $a_n/b_n \to 0$ as $n \to \infty$.

II. PROBLEM FORMULATION

We begin this section by setting up the system and making certain weak assumptions. We then detail the communication constraints in order to establish a suitable model. The control objective is provided at the end of this section, with two primary lemmas.

A. System Setup and Assumptions

Consider the following single-input single-output (SISO) nonlinear system:

$$x_k(t+1) = f(t, x_k(t)) + b(t, x_k(t))u_k(t), \qquad y_k(t) = c(t)x_k(t) + v_k(t), \qquad (1)$$

where the subscript $k = 1, 2, \ldots$ denotes different iterations. The argument $t \in \{0, 1, \ldots, N\}$ labels the time instants in an iteration of the process, with $N$ being the length of the iteration. The system input, state, and output are $u_k(t) \in \mathbb{R}$, $x_k(t) \in \mathbb{R}^n$, and $y_k(t) \in \mathbb{R}$, respectively, and $v_k(t)$ denotes random measurement noise. Both $f(t, x_k(t))$ and $b(t, x_k(t))$ are continuous functions, where the argument $t$ indicates that the functions are time-varying, and $c(t)$ is the output coupling coefficient.

Fig. 1. Block diagram of the networked control system.

The setup of the control system is shown in Fig. 1, where the plant and the learning controller are located separately and communicate via networks. To make our main idea intuitively understandable, the communication constraints are considered for the output side only. In other words, the random communication constraints occur only on the network from the measurement output to the buffer, whereas the network from the learning controller to the control plant is assumed to work well. If the network at the actuator side were also to suffer from communication constraints, an asynchronism would arise between the control generated by the learning controller and the one fed to the plant.

This asynchronism would require more steps to establish the convergence. Indeed, such an extension could be accomplished by incorporating the path analysis techniques from [31]. In this paper, the data transmission of the measurement outputs might encounter multiple random factors, such as data dropouts, communication delays, and packet transmission disordering. Thus, as shown in Fig. 1, a buffer is required to allow the learning controller to provide a correction mechanism and ensure smooth running. The mechanism will be detailed in Section II-B.

For system (1), we need the following assumptions.

A1: The desired reference $y_d(t)$, $t \in \{0, 1, \ldots, N\}$, is realizable; i.e., there exist a suitable initial state $x_d(0)$ and an input $u_d(t)$ such that

$$x_d(t+1) = f(t, x_d(t)) + b(t, x_d(t))u_d(t), \qquad y_d(t) = c(t)x_d(t). \qquad (2)$$

A2: The real number $c(t+1)b(t, \cdot)$ that couples the input and output is unknown and nonzero. Its sign, which characterizes the control direction, is assumed to be known in advance. Without loss of generality, it is simply assumed that $c(t+1)b(t, \cdot) > 0$ for all iterations.

A3: For any $t$, the measurement noise $\{v_k(t)\}$ is an independent sequence along the iteration axis with $Ev_k(t) = 0$, $Ev_k^2(t) < \infty$, and $\limsup_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} v_k^2(t) = R_v^t$ a.s., where $R_v^t$ is unknown.

A4: The initial values can be asymptotically reset precisely, in the sense that $x_k(0) \to x_d(0)$ as $k \to \infty$, where $x_d(0)$ is given in A1.

Here, we make some remarks about these assumptions. Assumption A1 relates to the desired reference, which, if not realizable, means that no input exists that satisfies (2). In that case, we would redefine the problem statement as one that achieves the best approximation of the reference. Assumption A2 requires the control direction to be known. However, if the direction is not known a priori, we can employ techniques similar to those proposed in [23] and [32] to regulate the control direction adaptively. This assumption also implies that the relative degree of system (1) is one. In addition, it is worth pointing out that the choice of an SISO system here is only to make the algorithm and analysis concise and easy to follow. The results in this paper could be extended to a multi-input multi-output (MIMO) affine system by modifying the ILC update laws slightly; a gain matrix should multiply the tracking error term to regulate the control direction. The independence condition along the iteration axis required in A3 is rational for practical applications, because the process is repeatable. It is clear that common Gaussian white noise satisfies this assumption. Assumption A4 means that the initial state can be asymptotically precise. This assumption is relaxed compared with the conventional identical initial condition on the initial state. The initial learning or rectifying mechanisms given in [33] and [34] can be incorporated into the following analysis to further deal with the initial shift problem. However, this is beyond the present scope and thus is omitted. In addition, we do not impose the conventional global Lipschitz condition on the nonlinear functions.
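A minimal sketch of one iteration of system (1) under A1-A4 is given below; the functions `f`, `b`, and `c` are hypothetical stand-ins satisfying A2, and the Gaussian noise is one admissible choice under A3.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 30
f = lambda t, x: 0.6 * np.sin(x)           # hypothetical drift term
b = lambda t, x: 1.0 + 0.1 * np.cos(x)     # c(t+1) b(t, .) > 0, so A2 holds
c = lambda t: 1.0

def run_iteration(u, noise_std=0.05):
    """Simulate one iteration of the SISO system (1); returns the noisy output."""
    x = 0.0                                # initial state, reset per A4
    y = np.zeros(N + 1)
    y[0] = c(0) * x + noise_std * rng.standard_normal()
    for t in range(N):
        x = f(t, x) + b(t, x) * u[t]
        y[t + 1] = c(t + 1) * x + noise_std * rng.standard_normal()  # v_k(t), A3
    return y

y = run_iteration(np.zeros(N))
```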
Fig. 2. (a) Data dropout. (b) Communication delay. (c) Transmission disordering.

B. Communication Constraints

In this paper, three types of communication constraint are taken into consideration: data dropouts, communication delays, and transmission disordering. In this section, we discuss these random factors briefly and propose a unified description of the multiple communication constraints. In addition, a mechanism is provided to regulate the arriving packets.

Fig. 2 shows the three communication constraints along the iteration axis for any fixed time instant. A solid square box denotes a data packet coming from the output side of the control plant, whereas a dashed square box denotes possible storage of the buffer. For brevity, we assume throughout that the data are packed and transmitted according to the time label, and in Fig. 2, we plot only the packets with the same time label. Thus, different square boxes denote data in different iterations. Focusing on the colored box in Fig. 2(a), the packets before and after it would be successfully transmitted, whereas the colored one might be dropped during transmission. A communication delay is shown in Fig. 2(b); adjacent colored boxes arrive at the buffer nonadjacently, which results in the second colored box being delayed. Fig. 2(c) displays the disordering case, in which the second colored box arrives at the buffer ahead of the first. All these random communication conditions would make the data packets in the buffer chaotic.

For practicality and to reduce control costs, we limit the storage capacity of the buffer, which means that there will usually be insufficient storage for all the data coming from the output. In some cases, the available space may only accommodate the data of one iteration, which is the minimum buffer capacity with which to ensure the learning process. Therefore, we need to consider the possibility of limited information when we design the learning control.

To solve the problem of information chaos and limited storage, a simple renewal mechanism is proposed for the buffer. Each packet contains the whole output information at one time instant; we choose not to consider any more refined types of data partitioning. Each packet is then labeled with an iteration stamp, allowing the buffer to identify the iteration index of packets. Meanwhile, each packet is also labeled with a time stamp so that the renewals of different time instants are conducted independently. On the buffer side, only the latest packet with respect to the iteration stamp is stored in the buffer and used for updating the control signal. Here, we explain this mechanism briefly. For any fixed time t, suppose that a packet with iteration stamp k_0 is received successfully by the buffer. The buffer will then compare it with the previously stored packet to determine which iteration-stamp number is closer to the current iteration index.

If the iteration-stamp number of the stored packet is larger than that of the new arrival, then the new arrival is discarded. Otherwise, the original packet is replaced by the newly arrived one. As only the latest packet is stored, there are no excessive requirements on the size of the buffer. However, we should emphasize that more freedom in packet renewal and control design is provided if extra storage is available to accommodate more data in the buffer. In that case, additional advantages (e.g., convergence speed and tracking precision) may be obtained by designing suitable update algorithms with additional tracking information. This leads to the interesting and open problem of determining the optimal storage. In this paper, we consider the one-iteration-storage case only in order to remain focused on the topic at hand.

Under the communication constraints, the packet in the buffer will not be replaced at each iteration. One packet may be maintained in the buffer for several successive iterations, the length of which is random because of the combined effect of the above communication constraints. It is hard to impose a statistical model on the random successive duration of each packet along the iteration axis. However, the length of time for which a packet is maintained in the buffer is usually bounded, unless the network has broken down. Thus, we use the following weak assumption on the buffer renewal to describe the combined effect of multiple communication constraints.

A5: The arrival of a new packet is random and does not obey any probability distribution. However, the length between adjacent arrivals is bounded by a sufficiently large number M, which does not need to be known in advance. That is, there is a number M such that during M successive iterations, the buffer renews the output information at least once.

The communication assumption A5 is weak and practical; we impose no probability distribution, which makes it widely applicable. A finite bound is required for the length between adjacent arrivals. However, it is not necessary to know the specific value of the maximum length M. That is, only the existence of such a bound is required, and thus, the design of the ILC update law is independent of its specific value. It should be noted that the value of M corresponds to the worst-case communication conditions; usually, a larger value of M implies a harsher communication condition. Such a property is demonstrated in the illustrative simulations, in which a uniform length distribution is imposed to characterize the effect of M on the tracking performance. However, M and the average renewal frequency need not be related positively.

C. Control Objective and Preliminary Lemmas

We now present our control objective. Let F_k ≜ σ{y_j(t), x_j(t), v_j(t), 1 ≤ j ≤ k, t ∈ {0, 1, ..., N}} be the σ-algebra generated by y_j(t), x_j(t), and v_j(t), 0 ≤ t ≤ N, 1 ≤ j ≤ k. Then, the set of admissible controls is defined as U ≜ {u_{k+1}(t) is F_k-measurable, sup_k |u_k(t)| < ∞, a.s., t ∈ {0, 1, ..., N}, k = 0, 1, 2, ...}.
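The renewal mechanism is, in effect, a per-time-instant "keep the newest iteration stamp" rule. A minimal sketch follows (the class and method names are ours; a packet is modeled simply as an (iteration_stamp, value) pair):

```python
class Buffer:
    """One-iteration storage per time instant, following the renewal
    mechanism of Section II-B: only the packet with the largest
    iteration stamp is kept for each time label."""

    def __init__(self, num_instants):
        # stored[t] is the latest (iteration_stamp, output_value) or None.
        self.stored = [None] * num_instants

    def offer(self, t, iteration_stamp, value):
        """Handle a packet for time instant t that may arrive dropped-out,
        delayed, or out of order. Discard it if the stored packet already
        carries a larger iteration stamp; otherwise replace the stored one."""
        current = self.stored[t]
        if current is None or iteration_stamp >= current[0]:
            self.stored[t] = (iteration_stamp, value)
```

Because at most one packet per time instant is retained, the storage requirement matches the minimum one-iteration capacity discussed above.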
The control objective of this paper is to find an input sequence {u_k(t), k = 0, 1, ...} ∈ U under the communication constraints (i.e., data dropouts, communication delays, and packet transmission disordering) that minimizes the averaged tracking index, for t ∈ {0, 1, ..., N},

V(t) = lim sup_{n→∞} (1/n) Σ_{k=1}^{n} |y_k(t) − y_d(t)|^2        (3)

where y_d(t) is the desired reference given in A1. If we define the control output as z_k(t) = c(t)x_k(t), it is easy to show that z_k(t) → y_d(t) as k → ∞ whenever the tracking index (3) is minimized, and vice versa. In other words, index (3) implies that precise tracking is achieved once all measurement noise is eliminated. We note that when considering the optimization of a composite objective function, what is known as the advanced fine-tuning (AFT) approach [35] can be used to solve the problem. Both AFT and ILC are data-driven methods and thus can be applied to nonlinear systems. However, the implementation of AFT is more complex than that of ILC, and the learning speed of AFT can be lower, as the former has to learn more information.

For simplicity, we denote f_k(t) ≜ f(t, x_k(t)), f_d(t) ≜ f(t, x_d(t)), b_k(t) ≜ b(t, x_k(t)), b_d(t) ≜ b(t, x_d(t)), δu_k(t) ≜ u_d(t) − u_k(t), δx_k(t) ≜ x_d(t) − x_k(t), δf_k(t) ≜ f_d(t) − f_k(t), δb_k(t) ≜ b_d(t) − b_k(t), and c^+b_k(t) ≜ c(t+1)b_k(t). The subscripts k and d in f_k(t), f_d(t), b_k(t), and b_d(t) denote merely that these functions depend on the state x_k(t) or x_d(t), not that the functions are iteration-varying.

For further analysis, we require the following lemmas, the proofs of which are the same as in [22] and thus are omitted for brevity.

Lemma 1: Assume that assumptions A1–A4 hold for system (1). If lim_{k→∞} δu_k(s) = 0, s = 0, 1, ..., t, then at time t+1, δx_k(t+1) → 0, δf_k(t+1) → 0, and δb_k(t+1) → 0 as k → ∞.

Lemma 2: Assume that assumptions A1–A4 hold for system (1) and the tracking reference y_d(t). Then, index (3) is minimized, with V(t) = R_v^t for any time t, if the control sequence {u_k(i)} is admissible and satisfies u_k(i) → u_d(i) as k → ∞, i = 0, 1, ..., t−1. In this case, the input sequence {u_k(t)} is called the optimal control sequence.

Lemma 1 paves the way for connecting the state convergence at the next time instant with the input convergence at all previous time instants. This lemma plays a supporting role in the application of mathematical induction in the convergence analysis. Lemma 2 characterizes the optimal solution according to the tracking index. Based on Lemma 2, it is sufficient to show that the input sequence converges to the desired input defined in assumption A1.

In the following, we propose two update schemes for generating the optimal control sequence {u_k(t)} under the communication constraints. The first scheme is called the IUS, in which the control signal retains the latest value if no new output arrives at the buffer. The second is called the SUS, in which the control signal keeps updating even if no new packet arrives. The tracking performances of these two schemes are compared in numerical simulations.
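In practice, index (3) can only be estimated over finitely many iterations. A small helper of the following kind (our own naming) computes the finite-n average at a fixed time instant:

```python
import numpy as np

def averaged_tracking_index(outputs, y_d, t):
    """Finite-n estimate of V(t) in (3): (1/n) * sum_{k=1}^{n} (y_k(t) - y_d(t))^2,
    where `outputs` is an (n, N+1) array whose k-th row is iteration k's output."""
    errors = outputs[:, t] - y_d[t]
    return float(np.mean(errors ** 2))
```

By Lemma 2, this estimate should approach R_v^t as the input sequence converges to the desired input.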

III. IUS AND ITS CONVERGENCE

In this section, we provide an in-depth discussion of the IUS. Specifically, we begin by studying the path behavior of the IUS for an arbitrary fixed time instant t, and we provide a recognition mechanism to ensure a smooth improvement of the algorithm. We then introduce a sequence of stopping times to specify the learning algorithm, and we give the convergence results.

Under the communication constraints, for an arbitrary time instant t, the packet stored in the buffer and used for the kth iteration is the one with the (k − m_k(t))th iteration stamp, where m_k(t) is a random variable over {1, 2, ..., M}, and M is defined as in assumption A5. Some observed properties of m_k(t) are as follows. If there is no communication constraint, then m_k(t) = 1 for all k; otherwise, m_k(t) > 1. When transmission disordering occurs for any given k, we might expect m_{k+1}(t) ≥ m_k(t) + 1. In the remainder of this paper, the argument t will be omitted from m_k(t) to simplify the notation and to avoid tedious repetition. Note that m_k is a random variable; without loss of generality, there are upper and lower bounds of m_k, i.e., m ≤ m_k ≤ M with m ≥ 1, because of the communication constraints.

In the IUS, the input is generated from the latest available information and its corresponding input, that is,

u_k(t) = u_{k−m_k}(t) + a_{k−m_k} e_{k−m_k}(t+1)        (4)

where e_k(t) ≜ y_d(t) − y_k(t) and a_k is the learning step size (defined later), for all k and t. By simple calculations, we have

e_k(t+1) = c^+b_k(t) δu_k(t) + ϕ_k(t) − v_k(t+1)        (5)

where ϕ_k(t) = c^+δf_k(t) + c^+δb_k(t) u_d(t).

Before proceeding to the main theorem for the IUS case, we perform some primary analyses of the input update. Let us begin with an arbitrary iteration, say k_0, for which the input is given as u_{k_0}(t) = u_{k_0−m_{k_0}}(t) + a_{k_0−m_{k_0}} e_{k_0−m_{k_0}}(t+1). We now proceed to the next iteration, i.e., the (k_0+1)th iteration. If no packet arrives at the buffer, then m_{k_0+1} = m_{k_0} + 1, and the input for this iteration is u_{k_0+1}(t) = u_{k_0+1−m_{k_0+1}}(t) + a_{k_0+1−m_{k_0+1}} e_{k_0+1−m_{k_0+1}}(t+1) = u_{k_0−m_{k_0}}(t) + a_{k_0−m_{k_0}} e_{k_0−m_{k_0}}(t+1), where the last equality is valid because k_0 + 1 − m_{k_0+1} = k_0 + 1 − (m_{k_0} + 1) = k_0 − m_{k_0}. Consequently, u_{k_0+1}(t) = u_{k_0}(t). In other words, the input remains invariant when no new packet is received. However, according to assumption A5, this input will not remain unchanged forever. Indeed, after several iterations (say τ iterations), a new packet will arrive successfully at the buffer, and the input is then updated. However, we should carefully check the iteration stamp, say k_1, of the newly arrived packet. Specifically, noting that the iteration stamp of the packet at the k_0th iteration is k_0 − m_{k_0}, and recalling the renewal mechanism whereby only a packet with a larger iteration stamp is accepted, we have k_1 ≥ k_0 − m_{k_0}. However, the iteration stamp must be smaller than the corresponding iteration number; thus, we have k_1 ≤ k_0 + τ − 1, because we assume that the subsequent update occurs at the (k_0+τ)th iteration. In short, k_0 − m_{k_0} ≤ k_1 ≤ k_0 + τ − 1. As such, two scenarios should be considered for the iteration stamp k_1 of the newly arrived packet: Scenario 1 with k_0 − m_{k_0} ≤ k_1 ≤ k_0 − 1, and Scenario 2 with k_0 ≤ k_1 ≤ k_0 + τ − 1 (see Fig. 3). In the former scenario, updating the input at the (k_0+τ)th iteration would generate a mismatch between the iteration labels of the tracking error and the existing input.

Fig. 3. Illustration of two scenarios of new arrivals.
The algorithm is a combination of several staggered updating procedures, which makes the convergence analysis intricate. In the latter scenario, updating at the (k_0+τ)th iteration could use the input u_{k_0}(t); i.e., the update would be u_{k_0+τ}(t) = u_{k_1}(t) + a_{k_1} e_{k_1}(t+1) = u_{k_0}(t) + a_{k_1} e_{k_1}(t+1).

Remark 1: By analyzing the two scenarios in Fig. 3, we find that Scenario 1 would lead to a mismatch between the iteration labels of the tracking error and the stored input. To deal with this problem, a possible solution is to augment the capacity of the buffer to store more historical data of the input or the tracking error so that we can always match the input and the tracking error. This is an advantage of extra storage, as discussed in Section II-B. Determining the optimal capacity of the buffer and designing and analyzing the corresponding learning algorithms remain open problems. In this paper, we consider the one-iteration-storage case; thus, we have to adopt another simple method whereby we discard the packet in Scenario 1 and wait for suitable packets (see the following for details).

To make the following analysis more concise, an additional recognition mechanism is proposed to allow the learning controller to identify suitable information for updating. Assume that the latest update occurs at the k_0th iteration. If no new packet is received, then the input will remain as u_{k_0}(t). Otherwise, the controller will check whether the iteration stamp of the new packet is smaller than k_0. If so, then this packet is neglected, and the update is delayed until a new packet with an iteration-stamp number larger than or equal to k_0, say k_1, is received. The learning controller will then update its input signal using u_{k_0}(t) and e_{k_1}(t+1). Note that e_{k_1}(t+1) is actually generated by u_{k_0}(t), since k_1 ≥ k_0. This update procedure is shown in Fig. 4, where, for any fixed time instant t, the boxes in the top row denote the output packets for successive iterations. The colored packets are received by the buffer and used successfully for updating, whereas the dashed ones are lost during transmission, discarded by the renewal mechanism, or neglected by the recognition mechanism. The boxes in the bottom row denote the inputs in different iterations; the solid and dashed ones denote updating iterations and holding iterations, respectively. The arrows link each input update with its corresponding tracking information.

Fig. 4. Illustration of the recognition mechanism.
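Viewed operationally, the recognition mechanism is a filter placed between the buffer and the update law. A minimal sketch under that reading (the function name is ours):

```python
def recognize(last_update_iteration, packet):
    """Recognition mechanism: accept a buffered packet only if its iteration
    stamp is at least k_0, the iteration of the latest input update, so that
    the stored tracking error was generated by the currently held input."""
    iteration_stamp, _ = packet
    return iteration_stamp >= last_update_iteration
```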

Remark 2: Another explanation for the recognition mechanism is that we expect the input of iteration k_0 to behave generally better than previous ones, because an improvement has been made, making it unnecessary to update further using information from iterations before k_0. Meanwhile, this mechanism makes it possible to update the control signal smoothly with limited storage.

We now formulate the ILC based on the renewal and recognition mechanisms for the IUS case. For an arbitrary time instant t, we define a sequence of random stopping times {τ_i}, i = 1, 2, ..., where τ_i denotes the iteration number of the ith update of the control signal for time t, corresponding to the solid boxes in the bottom row of Fig. 4. It should be noted that {τ_i} is defined for different time instants independently, denoting the asynchronous update for different time instants; we omit the associated argument t throughout to simplify the notation. Without loss of generality, we assume that τ_0 = 0. The packet used for the ith update has iteration stamp τ_i − n_{τ_i}, corresponding to the colored boxes in the top row of Fig. 4, where n_{τ_i} is a random variable due to the communication constraints (see Section II-B), 1 ≤ n_{τ_i} ≤ M. Recalling the recognition mechanism, we have τ_i − n_{τ_i} ≥ τ_{i−1} and τ_i − τ_{i−1} ≤ 2M for all i, and the input generating e_{τ_i−n_{τ_i}}(t+1) is u_{τ_{i−1}}(t) for all t. The update algorithm can now be rewritten as

u_{τ_i}(t) = u_{τ_{i−1}}(t) + a_{τ_{i−1}} e_{τ_i−n_{τ_i}}(t+1)        (6)

and

u_k(t) = u_{τ_i}(t), τ_i ≤ k ≤ τ_{i+1} − 1.        (7)

This algorithm is, in essence, an event-triggered update, because τ_i is an unknown random stopping time and n_{τ_i} is an unknown random variable; thus, this algorithm differs from the conventional deterministic framework. The learning step size {a_k} is a decreasing sequence that satisfies a_k > 0, a_k → 0, Σ_{k=1}^{∞} a_k = ∞, Σ_{k=1}^{∞} a_k^2 < ∞, and a_{k−j} = a_k(1 + O(a_k)) for 0 ≤ j ≤ M. It is clear that a_k = 1/(k+1) meets all these requirements.

We now present the following convergence theorem for the IUS; the proof can be found in Appendix A.

Theorem 1: Consider system (1) and control objective (3), and assume that assumptions A1–A5 hold. Then, the input sequence {u_k(t)} generated by IUS (6) and (7) with the renewal and recognition mechanisms is an optimal control sequence. In other words, u_k(t) converges to u_d(t) a.s. as k → ∞ for any t, 0 ≤ t ≤ N−1.

Theorem 1 reveals the essential convergence and optimality properties of the IUS for nonlinear system (1) under multiple communication constraints and limited storage. It should be noted that the convergence is an asymptotic property in which only the limits are characterized.

Remark 3: The proposed IUS (6) and (7) updates its input only when a satisfactory packet is received. Thus, the update frequency may be low if severe communication constraints arise. In practice, the tracking performance worsens as the communication environment deteriorates, as shown in the simulations below. Roughly speaking, more severe communication constraints imply that the average gap between the τ_i is large, so that the learning step size a_{τ_i} goes to 0 relatively quickly, which might result in quite slow learning.
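For a fixed time instant t, the IUS (6)-(7) amounts to a hold-by-default rule with event-triggered corrections. A minimal per-iteration sketch follows (our naming; the caller passes packet=None whenever the buffer content has not changed since the previous iteration):

```python
def ius_step(u, tau_prev, k, packet, y_d_next, a):
    """One IUS step at iteration k for a fixed time instant t.
    `u` is the held input u_{tau_{i-1}}(t); `tau_prev` is the iteration of
    the latest update; `packet` is a (iteration_stamp, y(t+1)) pair or None;
    `a` is the step-size sequence, e.g. a[k] = 1/(k+1)."""
    if packet is None:
        return u, tau_prev                 # hold the input, as in (7)
    stamp, y_next = packet
    if stamp < tau_prev:                   # recognition mechanism: neglect
        return u, tau_prev                 # outdated tracking information
    e = y_d_next - y_next                  # e_{tau_i - n_{tau_i}}(t+1)
    return u + a[tau_prev] * e, k          # event-triggered update, as in (6)
```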
Remark 4: For any given learning step-size sequence {a_k}, an alternative modification of the algorithm could increase the convergence speed over the first iterations. Specifically, the controller records its number of updates and changes the step size in turn only when an update actually occurs. That is, algorithm (6) is replaced by u_{τ_i}(t) = u_{τ_{i−1}}(t) + a_i e_{τ_i−n_{τ_i}}(t+1). The convergence results of Theorem 1 remain valid.

Remark 4 gives an alternative IUS that could increase the convergence speed from the perspective of selecting the learning gain. However, according to Remark 3, if the communication environment is harsh, the renewal and recognition mechanisms may lower the updating frequency and, hence, the convergence speed. Motivated by this observation, we propose an alternative framework, i.e., the SUS, in Section IV.

IV. SUS AND ITS CONVERGENCE

As noted in Section III, the IUS might have a low learning speed along the iteration axis if the communication environment is seriously impaired. In such a case, the algorithm would require many iterations to achieve acceptable performance, as the update frequency is low. Thus, it is impractical in many real applications. A possible solution is to make the best use of the available information by increasing the learning step size to improve the convergence speed. In this section, we propose another scheme, called the SUS, in which the input keeps updating using the latest available packet when no new packet is received by the buffer. Such an update principle is in clear contrast to the IUS, which keeps the input invariant if no satisfactory packet is received by the buffer. Thus, we expect the SUS to be advantageous in that the tracking performance might be improved iteration by iteration.

The renewal mechanism of the buffer and the recognition mechanism of the controller remain valid for the SUS. Consequently, the random stopping time τ_i and random variable n_{τ_i} are defined in the same way as in the IUS case. For the SUS, the update for the τ_ith iteration is given as

u_{τ_i}(t) = u_{τ_i−1}(t) + a_{τ_i−1} e_{τ_i−n_{τ_i}}(t+1)        (8)

and, for τ_i < k ≤ τ_{i+1} − 1,

u_k(t) = u_{k−1}(t) + a_{k−1} e_{τ_i−n_{τ_i}}(t+1)        (9)

in which case we have

u_{τ_{i+1}−1}(t) = u_{τ_i−1}(t) + (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) e_{τ_i−n_{τ_i}}(t+1).        (10)
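By contrast, a sketch of the SUS keeps moving the input along the most recent recognized error even while the buffer is silent; only the error signal itself is event-triggered (again, the naming is ours):

```python
def sus_step(u_prev, k, latest_error, a):
    """One SUS step at iteration k for a fixed time instant t, following
    (8)-(9): u_k(t) = u_{k-1}(t) + a_{k-1} * e_{tau_i - n_{tau_i}}(t+1).
    `latest_error` is replaced by the caller whenever a new packet passes
    the renewal and recognition mechanisms (i.e., at the iterations tau_i)."""
    return u_prev + a[k - 1] * latest_error
```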

Because the algorithm keeps updating in the SUS case, it is not an event-triggered algorithm. However, we should emphasize in particular that the alteration of the error signal is event-triggered, regulated by the renewal and recognition mechanisms. Observing the subscripts of the input and step size on the right-hand side, (8) differs from (6). Specifically, in (8), the subscript of the input and step size is τ_i − 1, whereas in (6), it is τ_{i−1}. Moreover, (9) differs from (7) in the successive updating. We now have the following convergence theorem for the SUS; the proof is given in Appendix B.

Theorem 2: Consider system (1) and control objective (3), and assume that assumptions A1–A5 hold. Then, the input sequence {u_k(t)} generated by SUS (8) and (9) with the renewal and recognition mechanisms is an optimal control sequence. In other words, u_k(t) converges to u_d(t) a.s. as k → ∞ for any t, 0 ≤ t ≤ N−1.

Theorem 2 indicates the asymptotic convergence of the SUS along the iteration axis and shows the optimality of the generated input sequence. From this viewpoint, both the IUS and SUS can guarantee the convergence of the input sequence to the desired input with probability one. However, the major differences between the IUS and SUS lie in the following points. First of all, the IUS is an event-triggered update, whereas the SUS is an iteration-triggered update. Moreover, the updating frequency of the IUS depends on the rate of successful transmission, renewal, and recognition, and thus is low if the communication constraints are harsh. In contrast, the SUS keeps updating for all iterations. Thus, it is expected that the SUS can guarantee better convergence performance when the communication environment deteriorates. This point is illustrated by the simulations in Section V.

Remark 5: In this paper, we consider an SISO system for the sake of concise expression and analysis. The results can be extended to the MIMO case, in which the vectors c(t) and b(t, x) are replaced with matrices C(t) and B(t, x). We assume u_k(t) ∈ R^p and y_k(t) ∈ R^q, and then C(t) ∈ R^{q×n} and B(t, x) ∈ R^{n×p}. In such a case, the control direction is determined by the coupling matrix C(t+1)B(t, x) ∈ R^{q×p}, which is more complicated than in the SISO case. To ensure the convergence of the algorithm, an additional matrix L(t) ∈ R^{p×q} should left-multiply the error term e_k(t+1) in (6), (8), and (9) to regulate the control direction. The design condition for L(t) is that all eigenvalues of L(t)C(t+1)B(t, x) have positive real parts. The convergence proofs can be conducted following similar steps.

V. ILLUSTRATIVE SIMULATIONS

Consider the following affine nonlinear system as an example, in which the state is two-dimensional:

x_k^{(1)}(t+1) = 0.8 x_k^{(1)}(t) + 0.3 sin(x_k^{(2)}(t)) + b^{(1)} u_k(t)
x_k^{(2)}(t+1) = 0.4 cos(x_k^{(1)}(t)) x_k^{(2)}(t) + b^{(2)} u_k(t)
y_k(t) = x_k^{(1)}(t) + x_k^{(2)}(t) + v_k(t)

where x_k^{(1)}(t) and x_k^{(2)}(t) denote the first and second components of x_k(t), respectively. It is easy to check that c^+b(t) = b^{(1)} + b^{(2)} = 0.56 > 0.

Fig. 5. Illustration of iteration dwelling length.

As a simple illustration, let N = 40, and let the measurement noise v_k(t) be zero-mean Gaussian distributed. The reference trajectory is y_d(t) = 20 sin((t/20)π). The initial control action is simply u_0(t) = 0 for all t.
Here, the selection of the initial input value does not affect the inherent convergence property of the proposed algorithms. The learning gain is chosen as a_k = 1/(k+1). Each algorithm is run for 300 iterations.

For each time instant, in order to simulate the renewal and recognition mechanisms dealing with random communication constraints, we begin by generating a sequence of random numbers {τ_k} that are uniformly distributed over {1, 2, ..., M}, where M is defined as in assumption A5. Thus, in essence, τ_k denotes the random dwelling length (in iterations) of each received packet along the iteration axis, caused by the communication constraints. We should clarify that we simulate the packet alternation in the buffer directly, rather than the specific communication constraints, to illustrate the combined effects of multiple communication constraints and limited storage under the renewal and recognition mechanisms (see Fig. 4) and to provide a suitable parameter for the following comparison analysis (see assumption A5). It is then apparent that the accumulation numbers σ_j = Σ_{i=1}^{j} τ_i correspond to those iterations at which input update (6) or (8) occurs, whereas for the other iterations, input algorithm (7) or (9) works. Note that both τ_k and σ_j are random variables, indicating the event-triggered character of the input updating; neither is known prior to running the algorithms.

An illustration of τ_k is given in Fig. 5, where M = 5. As can be seen from this figure, τ_k takes random values in the set {1, 2, 3, 4, 5}. This is a simulation of the iteration dwelling length for which a packet is stored in the buffer. Thus, the average dwelling length (i.e., the mathematical expectation of τ_k) can be regarded as a data transmission rate (DTR) index. Specifically, because a uniform distribution is adopted, the mathematical expectation of τ_k is (M+1)/2. This means that, on average, a feasible packet is received and an update occurs every (M+1)/2 iterations. In the case of Fig. 5, we have M = 5 and, therefore, Eτ_k = 3; i.e., an update happens every three iterations on average. The explanation of this is twofold: the data loss rate is 2/3, and the updating is three times slower than in the case of no communication constraints.

In the following, we first show the performance of the IUS and then turn our attention to the SUS. The comparisons between the IUS and SUS are detailed at the end of this section.
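For completeness, the whole experiment can be reproduced in outline. The sketch below is ours and makes two loud assumptions: the input gains of the example plant are not fully recoverable from this transcription, so illustrative values b1 = 0.30 and b2 = 0.26 (chosen only so that b1 + b2 = 0.56) are used, and the elided noise level is set arbitrarily to a standard deviation of 0.1. As in the text, the buffer alternation is simulated directly through uniform dwelling lengths on {1, ..., M}.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 40, 300, 5
b1, b2 = 0.30, 0.26                # assumed gains; only b1 + b2 = 0.56 is given
noise_std = 0.1                    # assumed; the noise level is elided in the source
y_d = 20 * np.sin(np.arange(N + 1) / 20 * np.pi)   # reference y_d(t)
a = 1.0 / (np.arange(K + 2) + 1.0)                 # step size a_k = 1/(k+1)

def run_iteration(u):
    """One pass of the example plant; returns the noisy outputs y(0..N)."""
    x1 = x2 = 0.0
    y = np.empty(N + 1)
    for t in range(N + 1):
        y[t] = x1 + x2 + noise_std * rng.standard_normal()
        if t < N:
            x1, x2 = (0.8 * x1 + 0.3 * np.sin(x2) + b1 * u[t],
                      0.4 * np.cos(x1) * x2 + b2 * u[t])
    return y

def simulate(scheme):
    """Run the IUS or SUS for K iterations; as a simplification consistent
    with simulating the buffer alternation directly, each renewing packet
    carries the current iteration's error."""
    u = np.zeros(N)                # initial input u_0(t) = 0
    held_err = np.zeros(N)         # latest recognized error e(t+1), per t
    last_upd = np.zeros(N, dtype=int)            # iteration of latest update
    next_renew = rng.integers(1, M + 1, size=N)  # first renewal iterations
    err_norm = np.empty(K)
    for k in range(1, K + 1):
        e = y_d - run_iteration(u)
        err_norm[k - 1] = np.sqrt(np.mean(e[1:] ** 2))   # averaged abs. error
        for t in range(N):
            if k == next_renew[t]:                  # buffer renewal at k
                held_err[t] = e[t + 1]
                if scheme == "IUS":                 # event-triggered, as in (6)
                    u[t] += a[last_upd[t]] * held_err[t]
                last_upd[t] = k
                next_renew[t] += rng.integers(1, M + 1)  # next dwelling length
            if scheme == "SUS":                     # keeps updating, (8)-(9)
                u[t] += a[k - 1] * held_err[t]
    return err_norm

for scheme in ("IUS", "SUS"):
    print(scheme, "final averaged error:", simulate(scheme)[-1])
```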

Fig. 6. y_300(t) versus y_d(t) for the IUS case. (a) M = 5. (b) M = 11.

Fig. 7. Averaged absolute tracking error (Σ_{i=1}^{N} |e_k(i)|^2 / N)^{1/2}: M = 3, 5, and 11 for the IUS case.

A. IUS Case

We begin by considering the IUS case with M = 5. The tracking performance of the final iteration (i.e., the 300th iteration) is shown in Fig. 6(a), where the solid line with circles is the reference signal and the dashed line with crosses denotes the actual output y_300(t). The fact that the output tracks the desired positions demonstrates the convergence and effectiveness of the IUS. The deviations seen in Fig. 6(a) are caused mainly by the stochastic measurement noise, which cannot be canceled by any learning algorithm, because it is completely unpredictable.

As explained previously, M or (M+1)/2 corresponds to the DTR index, and thus, we are interested in the influence of M. We simulate this example further for M = 3 and M = 11, with average iteration dwelling lengths of 2 and 6, respectively. It is expected that a longer dwelling length implies a higher rate of data loss and poorer tracking performance. This point is verified in Fig. 6(b) for M = 11, where the performance is clearly worse than that in Fig. 6(a). This suggests that the number of learning iterations should be increased to improve the tracking performance.

The averaged absolute tracking error for each iteration is defined as (Σ_{i=1}^{N} |e_k(i)|^2 / N)^{1/2}. Given the stochastic noise in the index, the averaged absolute tracking error does not decrease to zero as the number of iterations goes to infinity. Fig. 7 shows the averaged absolute tracking error profiles for M = 3, 5, and 11, denoted by the solid, dashed, and dash-dotted lines, respectively. As seen in Fig. 7, a larger value of M results in larger tracking errors.

B. SUS Case

We now come to the SUS case. For clarity, we consider the same simulation cases as before. First, we consider the case of M = 5. The tracking performance of the final iteration (i.e., the 300th iteration) is shown in Fig. 8(a), where the symbols are the same as those in the IUS case. As seen from the figure, the desired reference is tracked precisely. The final tracking performance for M = 11 in the SUS case is presented in Fig. 8(b). In contrast to the IUS case, the final tracking performance is much better, even when M is large. The similarity between Fig. 8(a) and (b) suggests that a longer dwelling length does not cause significant deterioration of the learning progress. This is because the algorithm keeps updating in the SUS case.

Fig. 8. y_300(t) versus y_d(t) for the SUS case. (a) M = 5. (b) M = 11.

The averaged absolute tracking error profiles for M = 3, 5, and 11 are shown in Fig. 9 by the solid, dashed, and dash-dotted lines, respectively. Two differences can be observed between Figs. 7 and 9.

Fig. 9. Averaged absolute tracking error (Σ_{i=1}^{N} |e_k(i)|^2 / N)^{1/2}: M = 3, 5, and 11 for the SUS case.

The first is that the tracking performances after 100 learning iterations show little difference across the values of M in the SUS case. This explains the similarity between Fig. 8(a) and (b) from a different viewpoint. The second is that a large fluctuation occurs for the case M = 11, caused by successive updating with a large error during the first several iterations.

C. IUS Versus SUS

To provide a visual comparison between the IUS and SUS, we show their final outputs in Fig. 10 for the case M = 5, where the solid line, dashed line with crosses, and dash-dotted line with circles represent the reference, the IUS output, and the SUS output, respectively. The performance at time instants 8 to 13 is enlarged as a subplot. It can be seen that the SUS output surpasses the IUS output over the same iterations. This is reasonable, because the SUS updates more often than the IUS does over the same iterations.

Fig. 10. y_300(t) versus y_d(t) for M = 5: IUS versus SUS.

The averaged absolute tracking error profiles are shown in Fig. 11 for M = 5, where it can be seen that the SUS algorithm achieves faster convergence and superior tracking. However, the SUS algorithm fluctuates during the early iterations as M increases, whereas the IUS algorithm maintains a gentle descent.

Fig. 11. Averaged absolute tracking error for M = 5: IUS versus SUS.

VI. CONCLUSION

This paper addresses the ILC problem for stochastic nonlinear systems with random communication constraints, including data dropouts, communication delays, and packet transmission disordering. These communication constraints are analyzed, and a renewal mechanism is proposed to regulate the packets in the buffer. To design the ILC update laws, a recognition mechanism is added to the controller for the selection of suitable packets. Two learning schemes are proposed: the IUS and SUS. When no suitable new packet arrives, the IUS retains the latest input, whereas the SUS continues to update with the latest tracking information. Both schemes are shown to converge to the optimal input in the almost-sure sense. For further research, it would be of great interest to consider ways to accelerate the proposed schemes. When the capacity of the buffer is larger than one iteration of storage, an important issue is to determine the optimal capacity of the buffer in relation to tracking performance and economy requirements. Moreover, the corresponding design and analysis of the learning algorithms remain to be conducted. In addition, the control signal may not be able to change rapidly because of practical limitations; that is, any variation of the input should be bounded. How to integrate this issue into the problem formulation and solve it remains an open problem.

APPENDIX A
PROOF OF THEOREM 1

Because the nonlinear functions f_k(t) and b_k(t) are related to the information from previous time instants, it is difficult to show the convergence of the input for all time instants simultaneously. Therefore, for convenience, the proof is carried out by mathematical induction along the time axis t. Note that the steps for times t = 1, 2, ..., N−1 are identical to the case of the initial time t = 0, which is expressed as follows.

A. Initial Step

Consider the case of t = 0.
From algorithms (6) and (7), it is evident that to show the optimality of {u_k(0)}, it is sufficient to show the optimality of the subsequence {u_{τ_i}(0)}, i.e., to show the convergence of (6).

Note that both τ_i and n_{τ_i} are random and τ_{i−1} ≤ τ_i − n_{τ_i} ≤ τ_i − 1. For t = 0, algorithm (6) gives

δu_{τ_i}(0) = δu_{τ_{i−1}}(0) − a_{τ_{i−1}} c^+b_{τ_i−n_{τ_i}}(0) δu_{τ_i−n_{τ_i}}(0) − a_{τ_{i−1}} ϕ_{τ_i−n_{τ_i}}(0) + a_{τ_{i−1}} v_{τ_i−n_{τ_i}}(1)
= (1 − a_{τ_{i−1}} c^+b_{τ_i−n_{τ_i}}(0)) δu_{τ_{i−1}}(0) − a_{τ_{i−1}} ϕ_{τ_i−n_{τ_i}}(0) + a_{τ_{i−1}} v_{τ_i−n_{τ_i}}(1).        (11)

This recursion differs from the traditional ILC update law: here the learning gain and tracking error are event-triggered, whereas the traditional ILC update law runs every iteration. However, by A5, we have τ_i − τ_{i−1} ≤ 2M, and thus {a_{τ_i}} is a subsequence of {a_k} with the following properties: a_{τ_i} → 0, Σ_{i=1}^{∞} a_{τ_i} = ∞, and Σ_{i=1}^{∞} a_{τ_i}^2 < ∞.

Set Γ_{i,j} ≜ (1 − a_{τ_{i−1}} c^+b_{τ_i−n_{τ_i}}(0)) ··· (1 − a_{τ_{j−1}} c^+b_{τ_j−n_{τ_j}}(0)) for i ≥ j, and Γ_{i,i+1} ≜ 1. Because b_k(0) is continuous in the initial state, c^+b_{τ_i−n_{τ_i}}(0) converges to a positive constant by A4 and A2. It is clear that 1 − a_{τ_{j−1}} c^+b_{τ_j−n_{τ_j}}(0) > 0 for sufficiently large j, say j ≥ j_0. Then, for any i > j ≥ j_0, it is true that

Γ_{i,j} = (1 − a_{τ_{i−1}} c^+b_{τ_i−n_{τ_i}}(0)) Γ_{i−1,j} ≤ exp(−c a_{τ_{i−1}}) Γ_{i−1,j}

with some c > 0, where the inequality 1 − a ≤ e^{−a} is applied. It then follows that Γ_{i,j} ≤ c_0 exp(−c Σ_{l=j}^{i} a_{τ_{l−1}}) for j ≥ j_0 and some c_0 > 0; because j_0 is a finite integer, it is clear that

Γ_{i,j} = Γ_{i,j_0} Γ_{j_0−1,j} ≤ c_0 exp(−c Σ_{l=j_0}^{i} a_{τ_{l−1}}).        (12)

Now, from (11), we have

δu_{τ_i}(0) = Γ_{i,0} δu_{τ_0}(0) − Σ_{j=1}^{i} Γ_{i,j+1} a_{τ_{j−1}} ϕ_{τ_j−n_{τ_j}}(0) + Σ_{j=1}^{i} Γ_{i,j+1} a_{τ_{j−1}} v_{τ_j−n_{τ_j}}(1)        (13)

where the first term on the right-hand side tends to zero as i goes to infinity, by the definition of τ_i and Γ_{i,j} and by (12). By A4, the recognition mechanism, and the continuity of the nonlinear functions, it is clear that ϕ_{τ_j−n_{τ_j}}(0) → 0 as j → ∞. From A3, we have Σ_{j=1}^{∞} a_{τ_{j−1}} v_{τ_j−n_{τ_j}}(1) < ∞. Thus, the last two terms of (13) tend to zero following steps similar to [36, Lemma 3.1.1].

B. Inductive Step

Assume that the convergence of u_k(t) has been proven for t = 0, 1, ..., s−1. Then, from Lemma 1, we have δx_k(s) → 0 and, therefore, ϕ_k(s) → 0. Following steps similar to the case t = 0, we conclude without difficulty that δu_k(s) → 0. This completes the proof.

APPENDIX B
PROOF OF THEOREM 2

As in the proof of Theorem 1, mathematical induction is used because of the nonlinearities. On the other hand, noticing (10), we have a recursion based on stopping times, similar to (6). Thus, for any given time, we first show the convergence of the subsequence {u_{τ_i−1}(t)} and then extend it to the general sequence {u_k(t)}. There are two major differences between the proofs of Theorems 1 and 2: first, the tracking error e_{τ_i−n_{τ_i}}(t+1) used in (10) is not generated by u_{τ_i−1}(t); second, the extension from the subsequence to the general input sequence in this theorem is nontrivial.

A. Initial Step

Consider the case of t = 0. Subtracting both sides of (10) with t = 0 from u_d(0) yields

δu_{τ_{i+1}−1}(0) = δu_{τ_i−1}(0) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) e_{τ_i−n_{τ_i}}(1)
= δu_{τ_i−1}(0) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0) δu_{τ_i−n_{τ_i}}(0) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) ϕ_{τ_i−n_{τ_i}}(0) + (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) v_{τ_i−n_{τ_i}}(1).

By the definition of n_{τ_i}, we know that n_{τ_i} ≥ 1 and that there is an iteration gap between the input signal and the tracking-error information.
However, we can rewrite the last equation as

δu_{τ_{i+1}−1}(0) = (1 − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0)) δu_{τ_i−1}(0) + (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0) (δu_{τ_i−1}(0) − δu_{τ_i−n_{τ_i}}(0)) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) ϕ_{τ_i−n_{τ_i}}(0) + (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) v_{τ_i−n_{τ_i}}(1).        (14)

Note that when τ_{i−1} < τ_i − n_{τ_i} ≤ τ_i − 1, the updating from the (τ_i − n_{τ_i})th iteration to the (τ_i − 1)th iteration follows (9), and thus

δu_{τ_i−1}(0) − δu_{τ_i−n_{τ_i}}(0) = −(Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) e_{τ_{i−1}−n_{τ_{i−1}}}(1)
= −(Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) (c^+b_{τ_{i−1}−n_{τ_{i−1}}}(0) δu_{τ_{i−1}−n_{τ_{i−1}}}(0) + ϕ_{τ_{i−1}−n_{τ_{i−1}}}(0) − v_{τ_{i−1}−n_{τ_{i−1}}}(1)).        (15)

It follows from (14) and (15) that

δu_{τ_{i+1}−1}(0) = (1 − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0)) δu_{τ_i−1}(0) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0) (Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) (c^+b_{τ_{i−1}−n_{τ_{i−1}}}(0) δu_{τ_{i−1}−n_{τ_{i−1}}}(0) + ϕ_{τ_{i−1}−n_{τ_{i−1}}}(0) − v_{τ_{i−1}−n_{τ_{i−1}}}(1)) − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) ϕ_{τ_i−n_{τ_i}}(0) + (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) v_{τ_i−n_{τ_i}}(1).        (16)

Let Φ_{i,j} ≜ (1 − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0)) ··· (1 − (Σ_{k=τ_j−1}^{τ_{j+1}−2} a_k) c^+b_{τ_j−n_{τ_j}}(0)) for i ≥ j, and Φ_{i,i+1} ≜ 1. Note that b_k(0) is continuous in the initial state, and c^+b_{τ_i−n_{τ_i}}(0) converges to a positive constant as i goes to infinity by A4 and A2. Given the boundedness of τ_{i+1} − τ_i, it is clear that 1 − (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k) c^+b_{τ_i−n_{τ_i}}(0) > 0 for a large enough index, say i ≥ j_0. Thus, by steps similar to those of Theorem 1, we arrive at

Φ_{i,j} ≤ c_0 exp(−c Σ_{k=τ_j−1}^{τ_{i+1}−2} a_k), i ≥ j, j > 0        (17)

with proper c_0 and c. For brevity of notation, we denote

α_i ≜ c^+b_{τ_i−n_{τ_i}}(0) (Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) c^+b_{τ_{i−1}−n_{τ_{i−1}}}(0) δu_{τ_{i−1}−n_{τ_{i−1}}}(0)
β_i ≜ c^+b_{τ_i−n_{τ_i}}(0) (Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) ϕ_{τ_{i−1}−n_{τ_{i−1}}}(0) + ϕ_{τ_i−n_{τ_i}}(0)
γ_i ≜ c^+b_{τ_i−n_{τ_i}}(0) (Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k) v_{τ_{i−1}−n_{τ_{i−1}}}(1) + v_{τ_i−n_{τ_i}}(1).

Then, from (16), we have

δu_{τ_{i+1}−1}(0) = Φ_{i,1} δu_{τ_1−1}(0) − Σ_{j=1}^{i} Φ_{i,j+1} (Σ_{k=τ_j−1}^{τ_{j+1}−2} a_k) α_j − Σ_{j=1}^{i} Φ_{i,j+1} (Σ_{k=τ_j−1}^{τ_{j+1}−2} a_k) β_j + Σ_{j=1}^{i} Φ_{i,j+1} (Σ_{k=τ_j−1}^{τ_{j+1}−2} a_k) γ_j        (18)

where the first term on the right-hand side tends to zero as i goes to infinity. By A4, we have β_i → 0 as i → ∞. According to A5, Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k → 0 as i → ∞, Σ_{i=1}^{∞} Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k = Σ_{k=1}^{∞} a_k = ∞, and

Σ_{i=1}^{∞} (Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k)^2 ≤ 2M Σ_{i=1}^{∞} Σ_{k=τ_i−1}^{τ_{i+1}−2} a_k^2 ≤ 2M Σ_{k=1}^{∞} a_k^2 < ∞.

By following steps similar to those of Theorem 1, the last two terms on the right-hand side of (18) tend to zero as i goes to infinity. Then, to prove the zero convergence of δu_{τ_i−1}(0), it suffices to show the zero convergence of the second term on the right-hand side of (18) as i → ∞. It is obvious that α_i = O(a_{τ_i}), because of the boundedness of δu_{τ_{i−1}−n_{τ_{i−1}}}(0) and c^+b_{τ_i−n_{τ_i}}(0) and the fact that Σ_{k=τ_i−n_{τ_i}}^{τ_i−2} a_k ≤ M a_{τ_i−n_{τ_i}} ≤ M a_{τ_{i−1}}. This results in α_i → 0 as i → ∞ and, therefore, the zero convergence of Σ_{j=1}^{i} Φ_{i,j+1} (Σ_{k=τ_j−1}^{τ_{j+1}−2} a_k) α_j, following steps similar to [36, Lemma 3.1.1] or Theorem 1 above. As a result, we have shown that δu_{τ_i−1}(0) → 0 as i → ∞.

Next, let us extend this result to δu_k(0), τ_i ≤ k ≤ τ_{i+1} − 2. From (9), it follows that

δu_k(0) = δu_{τ_i−1}(0) − (Σ_{j=τ_i−1}^{k−1} a_j) e_{τ_i−n_{τ_i}}(1)
= δu_{τ_i−1}(0) − (Σ_{j=τ_i−1}^{k−1} a_j) c^+b_{τ_i−n_{τ_i}}(0) δu_{τ_i−n_{τ_i}}(0) − (Σ_{j=τ_i−1}^{k−1} a_j) (ϕ_{τ_i−n_{τ_i}}(0) − v_{τ_i−n_{τ_i}}(1))
= (1 − (Σ_{j=τ_i−1}^{k−1} a_j) c^+b_{τ_i−n_{τ_i}}(0)) δu_{τ_i−1}(0) + (Σ_{j=τ_i−1}^{k−1} a_j) c^+b_{τ_i−n_{τ_i}}(0) (δu_{τ_i−1}(0) − δu_{τ_i−n_{τ_i}}(0)) − (Σ_{j=τ_i−1}^{k−1} a_j) (ϕ_{τ_i−n_{τ_i}}(0) − v_{τ_i−n_{τ_i}}(1)), τ_i ≤ k ≤ τ_{i+1} − 2.

Then, by techniques similar to those used for (14), the zero convergence of the general δu_k(0) is proven.

B. Inductive Step

Assume that the convergence of u_k(t) has been proven for t = 0, 1, ..., s−1. Then, by Lemma 1, we have δx_k(s) → 0 and, therefore, ϕ_k(s) → 0. Following steps similar to the case t = 0, we conclude without difficulty that δu_k(s) → 0. This completes the proof.

REFERENCES

[1] D. A. Bristow, M. Tharayil, and A. G. Alleyne, "A survey of iterative learning control," IEEE Control Syst., vol. 26, no. 3, Jun.
[2] H.-S. Ahn, Y. Chen, and K. L. Moore, "Iterative learning control: Brief survey and categorization," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 37, no. 6, Nov.
[3] D. Shen and Y. Wang, "Survey on stochastic iterative learning control," J. Process Control, vol. 24, no. 12.
[4] D. Shen, W. Zhang, Y. Wang, and C.-J. Chien, "On almost sure and mean square convergence of P-type ILC under randomly varying iteration lengths," Automatica, vol. 63, Jan.
[5] D. Shen, W. Zhang, and J.-X. Xu, "Iterative learning control for discrete nonlinear systems with randomly iteration varying lengths," Syst. Control Lett., vol. 96, Oct.
[6] W. Xiong, D. W. C. Ho, and X. Yu, "Saturated finite interval iterative learning for tracking of dynamic systems with HNN-structural output," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 7, Jul.
[7] R. Chi, Z. Hou, S. Jin, D. Wang, and C.-J. Chien, "Enhanced data-driven optimal terminal ILC using current iteration control knowledge," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 11, Nov.
[8] M.-B. Radac, R.-E. Precup, and E. M. Petriu, "Model-free primitive-based iterative learning control approach to trajectory tracking of MIMO systems with experimental validation," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 11, Nov.
[9] D. Shen and Y. Xu, "Iterative learning control for discrete-time stochastic systems with quantized information," IEEE/CAA J. Autom. Sinica, vol. 3, no. 1, Jan.
[10] X. Bu and Z. Hou, "Adaptive iterative learning control for linear systems with binary-valued observations," IEEE Trans. Neural Netw. Learn. Syst., to be published.
[11] X. Li, Q. Ren, and J.-X. Xu, "Precise speed tracking control of a robotic fish via iterative learning control," IEEE Trans. Ind. Electron., vol. 63, no. 4, Apr.
[12] L. Zhang, W. Chen, J. Liu, and C. Wen, "A robust adaptive iterative learning control for trajectory tracking of permanent-magnet spherical actuator," IEEE Trans. Ind. Electron., vol. 63, no. 1, Jan.
[13] O. Sörnmo, B. Bernhardsson, O. Kröling, P. Gunnarsson, and R. Tenghamn, "Frequency-domain iterative learning control of a marine vibrator," Control Eng. Pract., vol. 47, Feb.
[14] D. Meng, Y. Jia, J. Du, and F. Yu, "Tracking algorithms for multiagent systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, Oct.
[15] D. Meng, Y. Jia, and J. Du, "Robust consensus tracking control for multiagent systems with initial state shifts, disturbances, and switching topologies," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, Apr.
[16] H.-S. Ahn, Y. Q. Chen, and K. L. Moore, "Intermittent iterative learning control," in Proc. IEEE Int. Symp. Intell. Control, Oct. 2006.
[17] H.-S. Ahn, K. L. Moore, and Y. Q. Chen, "Discrete-time intermittent iterative learning controller with independent data dropouts," IFAC Proc. Vol., vol. 41, no. 2, Dec.
[18] H.-S. Ahn, K. L. Moore, and Y. Chen, "Stability of discrete-time iterative learning control with random data dropouts and delayed controlled signals in networked control systems," in Proc. 10th Int. Conf. Control Autom. Robot. Vis. (ICARCV), Dec. 2008.
[19] X. Bu, Z.-S. Hou, and F. Yu, "Stability of first and high order iterative learning control with data dropouts," Int. J. Control Autom. Syst., vol. 9, no. 5.
[20] X. Bu, F. Yu, Z.-S. Hou, and F. Wang, "Iterative learning control for a class of nonlinear systems with measurement dropouts" (in Chinese), Control Theory Appl., vol. 29, no. 11.
[21] X. Bu, F. Yu, Z. Hou, and F. Wang, "Iterative learning control for a class of nonlinear systems with random packet losses," Nonlinear Anal. Real World Appl., vol. 14, no. 1.
[22] D. Shen and Y. Wang, "Iterative learning control for networked stochastic systems with random packet losses," Int. J. Control, vol. 88, no. 5.
[23] D. Shen and Y. Wang, "ILC for networked nonlinear systems with unknown control direction through random lossy channel," Syst. Control Lett., vol. 77, Mar.
[24] D. Shen and Y. Wang, "ILC for networked discrete systems with random data dropouts: A switched system approach," in Proc. 33rd Chin. Control Conf. (CCC), Nanjing, China, Jul. 2014.
[25] J. Liu and X. Ruan, "Networked iterative learning control approach for nonlinear systems with random communication delay," Int. J. Syst. Sci., vol. 47, no. 16.
[26] D. Shen and H.-F. Chen, "Iterative learning control for large scale nonlinear systems with observation noise," Automatica, vol. 48, no. 3, Mar.
[27] R. Zhang, Z. Hou, R. Chi, and H. Ji, "Adaptive iterative learning control for nonlinearly parameterised systems with unknown time-varying delays and input saturations," Int. J. Control, vol. 88, no. 6.
[28] L. Wang, S. Mo, D. Zhou, F. Gao, and X. Chen, "Delay-range-dependent robust 2D iterative learning control for batch processes with state delay and uncertainties," J. Process Control, vol. 23, no. 5, Jun.
[29] D. Meng and Y. Jia, "Anticipatory approach to design robust iterative learning control for uncertain time-delay systems," Asian J. Control, vol. 13, no. 1.
[30] D. Shen, Y. Mu, and G. Xiong, "Iterative learning control for nonlinear systems with deadzone input and time delay in presence of measurement noise," IET Control Theory Appl., vol. 5, no. 12, Aug.
[31] D. Shen and J.-X. Xu, "A novel Markov chain based ILC analysis for linear stochastic systems under general data dropouts environments," IEEE Trans. Autom. Control, to be published.
[32] D. Shen and Z. Hou, "Iterative learning control with unknown control direction: A novel data-based approach," IEEE Trans. Neural Netw., vol. 22, no. 12, Dec.
[33] Y. Chen, C. Wen, Z. Gong, and M. Sun, "An iterative learning controller with initial state learning," IEEE Trans. Autom. Control, vol. 44, no. 2, Feb.
[34] M. Sun and D. Wang, "Initial shift issues on discrete-time iterative learning control with system relative degree," IEEE Trans. Autom. Control, vol. 48, no. 1, Jan.
[35] E. B. Kosmatopoulos and A. Kouvelas, "Large scale nonlinear control system fine-tuning through learning," IEEE Trans. Neural Netw., vol. 20, no. 6, Jun.
[36] H.-F. Chen, Stochastic Approximation and Its Applications. Dordrecht, The Netherlands: Kluwer.

Dong Shen (M'10) received the B.S. degree in mathematics from the School of Mathematics, Shandong University, Jinan, China, in 2005, and the Ph.D. degree in mathematics from the Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences (CAS), Beijing, China. From 2010 to 2012, he was a Post-Doctoral Fellow with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, CAS.
Since 2012, he has been an Associate Professor with the College of Information Science and Technology, Beijing University of Chemical Technology. From 2016 to 2017, he was a Visiting Scholar with the National University of Singapore, Singapore. He has authored or co-authored over 50 refereed journal and conference papers. He authored the book Stochastic Iterative Learning Control (Science Press, 2016) and co-authored the book Iterative Learning Control for Multi-Agent Systems Coordination (Wiley, 2017). His current research interests include iterative learning control, stochastic control, and optimization. Dr. Shen received the IEEE CSS Beijing Chapter Young Author Prize in 2014 and the Wentsun Wu Artificial Intelligence Science and Technology Progress Award in 2012.


More information

Concurrent Engineering Pdf Ebook Download >>> DOWNLOAD

Concurrent Engineering Pdf Ebook Download >>> DOWNLOAD 1 / 6 Concurrent Engineering Pdf Ebook Download >>> DOWNLOAD 2 / 6 3 / 6 Rozenfeld, WEversheim, HKroll - Springer.US - 1998 WDuring 2005 年 3 月 1 日 - For.the.journal,.see.Conc urrent.engineering.(journal)verhagen

More information

Lecture Note on Linear Algebra 14. Linear Independence, Bases and Coordinates

Lecture Note on Linear Algebra 14. Linear Independence, Bases and Coordinates Lecture Note on Linear Algebra 14 Linear Independence, Bases and Coordinates Wei-Shi Zheng, wszheng@ieeeorg, 211 November 3, 211 1 What Do You Learn from This Note Do you still remember the unit vectors

More information

Measurement of accelerator neutron radiation field spectrum by Extended Range Neutron Multisphere Spectrometers and unfolding program

Measurement of accelerator neutron radiation field spectrum by Extended Range Neutron Multisphere Spectrometers and unfolding program Measurement of accelerator neutron radiation field spectrum by Extended Range Neutron Multisphere Spectrometers and unfolding program LI Guanjia( 李冠稼 ), WANG Qingbin( 王庆斌 ), MA Zhongjian( 马忠剑 ), GUO Siming(

More information

Mechatronics Engineering Course Introduction

Mechatronics Engineering Course Introduction Mechatronics Engineering Course Introduction Prof. Tianmiao Wang Prof. Li Wen School of Mechanical Engineering and Automation Beihang University 6/10/201 Professor biography Li Wen, Associate professor

More information

Lecture 2: Introduction to Probability

Lecture 2: Introduction to Probability Statistical Methods for Intelligent Information Processing (SMIIP) Lecture 2: Introduction to Probability Shuigeng Zhou School of Computer Science September 20, 2017 Outline Background and concepts Some

More information

Surveying,Mapping and Geoinformation Services System for the Major Natural Disasters Emergency Management in China

Surveying,Mapping and Geoinformation Services System for the Major Natural Disasters Emergency Management in China Surveying,Mapping and Geoinformation Services System for the Major Natural Disasters Emergency Management in China 21 February 2013, Bangkok, Thailand Contents 2 1 2 3 4 Working Mechanism Geoinformation

More information

Atomic & Molecular Clusters / 原子分子团簇 /

Atomic & Molecular Clusters / 原子分子团簇 / Atomic & Molecular Clusters / 原子分子团簇 / 王金兰 Email: jlwang@seu.edu.cn Department of Physics Southeast University What is nanometer? Nano is Small (10-7 --10-9 m; 1-100 nm) 10 0 m 10-1 m 10-2 m 10-3 m 10-4

More information

A new approach to inducing Ti 3+ in anatase TiO2 for efficient photocatalytic hydrogen production

A new approach to inducing Ti 3+ in anatase TiO2 for efficient photocatalytic hydrogen production Chinese Journal of Catalysis 39 (2018) 510 516 催化学报 2018 年第 39 卷第 3 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article (Special Issue of Photocatalysis

More information

Backlash analysis of a new planetary gearing with internal gear ring

Backlash analysis of a new planetary gearing with internal gear ring Journal of Chongqing University (nglish dition) [ISSN 1671-84] Vol. 9 No. September 010 Article ID: 1671-84(010)0-0151-08 To cite this article: ZHU Cai-chao XIAO Ning. Backlash analysis of a new planetary

More information

Conditional expectation and prediction

Conditional expectation and prediction Conditional expectation and prediction Conditional frequency functions and pdfs have properties of ordinary frequency and density functions. Hence, associated with a conditional distribution is a conditional

More information

Cooling rate of water

Cooling rate of water Cooling rate of water Group 5: Xihui Yuan, Wenjing Song, Ming Zhong, Kaiyue Chen, Yue Zhao, Xiangxie Li 目录. Abstract:... 2. Introduction:... 2 2.. Statement of the problem:... 2 2.2 objectives:... 2 2.3.

More information

Explainable Recommendation: Theory and Applications

Explainable Recommendation: Theory and Applications Explainable Recommendation: Theory and Applications Dissertation Submitted to Tsinghua University in partial fulfillment of the requirement for the degree of Doctor of Philosophy in Computer Science and

More information

Chapter 2 the z-transform. 2.1 definition 2.2 properties of ROC 2.3 the inverse z-transform 2.4 z-transform properties

Chapter 2 the z-transform. 2.1 definition 2.2 properties of ROC 2.3 the inverse z-transform 2.4 z-transform properties Chapter 2 the -Transform 2.1 definition 2.2 properties of ROC 2.3 the inverse -transform 2.4 -transform properties 2.1 definition One motivation for introducing -transform is that the Fourier transform

More information

Chapter 2 Bayesian Decision Theory. Pattern Recognition Soochow, Fall Semester 1

Chapter 2 Bayesian Decision Theory. Pattern Recognition Soochow, Fall Semester 1 Chapter 2 Bayesian Decision Theory Pattern Recognition Soochow, Fall Semester 1 Decision Theory Decision Make choice under uncertainty Pattern Recognition Pattern Category Given a test sample, its category

More information

The preload analysis of screw bolt joints on the first wall graphite tiles in East

The preload analysis of screw bolt joints on the first wall graphite tiles in East The preload analysis of screw bolt joints on the first wall graphite tiles in East AO ei( 曹磊 ), SONG Yuntao( 宋云涛 ) Institute of plasma physics, hinese academy of sciences, Hefei 230031, hina Abstract The

More information

Geomechanical Issues of CO2 Storage in Deep Saline Aquifers 二氧化碳咸水层封存的力学问题

Geomechanical Issues of CO2 Storage in Deep Saline Aquifers 二氧化碳咸水层封存的力学问题 Geomechanical Issues of CO2 Storage in Deep Saline Aquifers 二氧化碳咸水层封存的力学问题 Li Xiaochun( 李小春 ) Yuan Wei ( 袁维 ) Institute of Rock and Soil Mechanics Chinese Academy of Science 中国科学院武汉岩土力学研究所 Outlines ( 提纲

More information

2018 Mid-Year Examination - Sec 3 Normal (Academic) Editing Skills, Situational Writing Skills, Continuous Writing Skills

2018 Mid-Year Examination - Sec 3 Normal (Academic) Editing Skills, Situational Writing Skills, Continuous Writing Skills Subject Format / Topics English Language Paper 1 Duration: 1 h 50 min Total marks: 70 (45%) Editing Skills, Situational Writing Skills, Continuous Writing Skills Paper 2 Duration: 1 h 50 min Total marks:

More information

Lecture 2. Random variables: discrete and continuous

Lecture 2. Random variables: discrete and continuous Lecture 2 Random variables: discrete and continuous Random variables: discrete Probability theory is concerned with situations in which the outcomes occur randomly. Generically, such situations are called

More information

Anisotropic Dielectric Properties of Short Carbon Fiber Composites. FU Jin-Gang, ZHU Dong-Mei, ZHOU Wan-Cheng, LUO Fa

Anisotropic Dielectric Properties of Short Carbon Fiber Composites. FU Jin-Gang, ZHU Dong-Mei, ZHOU Wan-Cheng, LUO Fa 第 27 卷第 11 期无机材料学报 Vol. 27 No. 11 2012 年 11 月 Journal of Inorganic Materials Nov., 2012 Article ID: 1000-324X(2012)11-1223-05 DOI: 10.3724/SP.J.1077.2012.12364 Anisotropic Dielectric Properties of Short

More information

Silver catalyzed three component reaction of phenyldiazoacetate with arylamine and imine

Silver catalyzed three component reaction of phenyldiazoacetate with arylamine and imine Chinese Journal of Catalysis 39 (2018) 1594 1598 催化学报 2018 年第 39 卷第 10 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Communication Silver catalyzed

More information

Influence of surface strain on activity and selectivity of Pd based catalysts for the hydrogenation of acetylene: A DFT study

Influence of surface strain on activity and selectivity of Pd based catalysts for the hydrogenation of acetylene: A DFT study Chinese Journal of Catalysis 39 (2018) 1493 1499 催化学报 2018 年第 39 卷第 9 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article Influence of surface strain

More information

POSTERIOR CRAMÉR-RAO BOUNDS ANALYSIS FOR PASSIVE TARGET TRACKING 1

POSTERIOR CRAMÉR-RAO BOUNDS ANALYSIS FOR PASSIVE TARGET TRACKING 1 Vol5 No JOURNAL OF ELECRONICS(CHINA) January 8 POSERIOR CRAMÉR-RAO BOUNDS ANALYSIS FOR PASSIVE ARGE RACKING Zhang Jun Zhan Ronghui (School of Electronic Science and Engineering, National Univ of Defense

More information

Firms and People in Place

Firms and People in Place Firms and People in Place Driving Forces for Regional Growth Wenjuan Li Doctoral Thesis Department of Social and Economic Geography Umeå University, Sweden GERUM 2007:1 GERUM Kulturgeografi 2007:1 This

More information

2. The lattice Boltzmann for porous flow and transport

2. The lattice Boltzmann for porous flow and transport Lattice Boltzmann for flow and transport phenomena 2. The lattice Boltzmann for porous flow and transport Li Chen XJTU, 2016 Mail: lichennht08@mail.xjtu.edu.cn http://gr.xjtu.edu.cn/web/lichennht08 Content

More information

电子科技大学研究生专项奖学金申请表 学生类别

电子科技大学研究生专项奖学金申请表 学生类别 电子科技大学研究生专项奖学金申请表 姓名罗金南学号 2016112 20108 学生类别 博士 硕士 年级 2016 政治面貌团员导师姓名 田文 洪 专业 软件工程 中国银行帐户 ( 即发助研助学金的帐户 ) 6216633100000328889 申请奖学金类别世强奖学金 ( 特等 ) 个人 总结 本人在读博士研究生期间思想政治上坚定拥护党和国家的路线方针政策, 具有正确的政治方向 ; 学习科研上勤奋刻苦,

More information

Synthesis of PdS Au nanorods with asymmetric tips with improved H2 production efficiency in water splitting and increased photostability

Synthesis of PdS Au nanorods with asymmetric tips with improved H2 production efficiency in water splitting and increased photostability Chinese Journal of Catalysis 39 (2018) 407 412 催化学报 2018 年第 39 卷第 3 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Communication (Special Issue of

More information

个人简历 冯新龙 : 男, 中共党员, 理学博士, 教授, 博士生导师通信地址 : 新疆大学数学与系统科学学院邮编 :830046

个人简历 冯新龙 : 男, 中共党员, 理学博士, 教授, 博士生导师通信地址 : 新疆大学数学与系统科学学院邮编 :830046 个人简历 冯新龙 : 男, 中共党员, 理学博士, 教授, 博士生导师通信地址 : 新疆大学数学与系统科学学院邮编 :830046 Email: fxlmath@gmail.com 或 fxlmath@xju.edu.cn 研究领域 : 复杂流体的可计算建模与高效算法研究 科学计算与不确定性量化 计算流体力学 保险精算等 1. 学习经历 1994.09-1998.07 新疆大学数学与系统科学学院,

More information

复合功能 VZθ 执行器篇 Multi-Functional VZθActuator

复合功能 VZθ 执行器篇 Multi-Functional VZθActuator 篇 Multi-Functional VZθActuator VZθ 系列 VZθ Series 应用 KSS 微型滚珠丝杆花键 (BSSP), 在同一产品里实现直线运动 (z), 旋转 (θ), 吸附 (Vacuum) 功能的模组化产品 The brand new products which applied the KSS miniature Ball Screw with Ball Spline

More information

Adrien-Marie Legendre

Adrien-Marie Legendre Adrien-Marie Legendre Born: 18 Sept 1752 in Paris, France Died: 10 Jan 1833 in Paris, France 法国数学家 毕业于巴扎林学院 曾任军事学院和巴黎高师的数学教授, 并担任过政府许多部门的顾问, 后来担任艺术学院的学生监督, 直至 1833 年逝世 1783 年与 1787 年, 他先后被选为法兰西科学院院士和伦敦皇家学会会员

More information

澳作生态仪器有限公司 叶绿素荧光测量中的 PAR 测量 植物逆境生理生态研究方法专题系列 8 野外进行荧光测量时, 光照和温度条件变异性非常大 如果不进行光照和温度条件的控制或者精确测量, 那么荧光的测量结果将无法科学解释

澳作生态仪器有限公司 叶绿素荧光测量中的 PAR 测量 植物逆境生理生态研究方法专题系列 8 野外进行荧光测量时, 光照和温度条件变异性非常大 如果不进行光照和温度条件的控制或者精确测量, 那么荧光的测量结果将无法科学解释 叶绿素荧光测量中的 PAR 测量 植物逆境生理生态研究方法专题系列 8 野外进行荧光测量时, 光照和温度条件变异性非常大 如果不进行光照和温度条件的控制或者精确测量, 那么荧光的测量结果将无法科学解释 Yield can vary significantly with light level and with temperature. Without controlling irradiation

More information

A proof of the 3x +1 conjecture

A proof of the 3x +1 conjecture A proof of he 3 + cojecure (Xjag, Cha Rado ad Televso Uversy) (23..) Su-fawag Absrac: Fd a soluo o 3 + cojecures a mahemacal ool o fd ou he codo 3 + cojecures gve 3 + cojecure became a proof. Keywords:

More information

Fabrication of ultrafine Pd nanoparticles on 3D ordered macroporous TiO2 for enhanced catalytic activity during diesel soot combustion

Fabrication of ultrafine Pd nanoparticles on 3D ordered macroporous TiO2 for enhanced catalytic activity during diesel soot combustion Chinese Journal of Catalysis 39 (2018) 606 612 催化学报 2018 年第 39 卷第 4 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Communication (Special Issue on

More information

Physical design of photo-neutron source based on 15 MeV Electron Linac accelerator and Its Application

Physical design of photo-neutron source based on 15 MeV Electron Linac accelerator and Its Application TMSR Physical design of photo-neutron source based on 15 MeV Electron Linac accelerator and Its Application WANG Hongwei 10/2012 on behalf of Reactor Physics Division of TMSR Nuclear Physics Division Shanghai

More information

An IntRoduction to grey methods by using R

An IntRoduction to grey methods by using R An IntRoduction to grey methods by using R Tan Xi Department of Statistics, NUFE Nov 6, 2009 2009-2-5 Contents: A brief introduction to Grey Methods An analysis of Degree of Grey Incidence Grey GM(, )

More information

Ni based catalysts derived from a metal organic framework for selective oxidation of alkanes

Ni based catalysts derived from a metal organic framework for selective oxidation of alkanes Chinese Journal of Catalysis 37 (2016) 955 962 催化学报 2016 年第 37 卷第 6 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article (Special Issue on Environmental

More information

A Study on Dynamics and Problems of Residential Suburbanization in Xi an

A Study on Dynamics and Problems of Residential Suburbanization in Xi an Nanyang Technological University From the SelectedWorks of Man LUO Spring April 2, 2014 A Study on Dynamics and Problems of Residential Suburbanization in Xi an Man LUO Available at: https://works.bepress.com/man_luo/1/

More information

A highly efficient flower-like cobalt catalyst for electroreduction of carbon dioxide

A highly efficient flower-like cobalt catalyst for electroreduction of carbon dioxide Chinese Journal of Catalysis 39 (2018) 914 919 催化学报 2018 年第 39 卷第 5 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article A highly efficient flower-like

More information

A Tutorial on Variational Bayes

A Tutorial on Variational Bayes A Tutorial on Variational Bayes Junhao Hua ( 华俊豪 ) Laboratory of Machine and Biological Intelligence, Department of Information Science & Electronic Engineering, ZheJiang University 204/3/27 Email: huajh7@gmail.com

More information

Alternative flat coil design for electromagnetic forming using FEM

Alternative flat coil design for electromagnetic forming using FEM Alternative flat coil design for electromagnetic forming using FEM M. AHMED 1, S. K. PANTHI 1, N. RAMAKRISHNAN 2, A. K. JHA 1, A. H. YEGNESWARAN 1, R. DASGUPTA 1, S. AHMED 3 1. Advanced Materials and Processes

More information

Study on property and deterioration of whole boned bolt in heat-harm tunnel

Study on property and deterioration of whole boned bolt in heat-harm tunnel Study on the Anti-corrosion Properties of HPC in Contrasting Seasonal Change Environment Study on property and deterioration of whole boned bolt in heat-harm tunnel Dr. Fuhai LI A Prof Southwest Jiaotong

More information

2012 Typhoon Activity Prediction

2012 Typhoon Activity Prediction 2012 Typhoon Activity Prediction Published by Shanghai Typhoon Institute of China Meteorological Administration 4 May 2012 Prediction of 2012 Northwest Pacific Basin and South China Sea Tropical Cyclone

More information

Synthesis of anisole by vapor phase methylation of phenol with methanol over catalysts supported on activated alumina

Synthesis of anisole by vapor phase methylation of phenol with methanol over catalysts supported on activated alumina Chinese Journal of Catalysis 37 (216) 72 726 催化学报 216 年第 37 卷第 5 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article Synthesis of anisole by vapor

More information

第 12 届中国智能系统会议暨纪念人工智能诞生 60 周年

第 12 届中国智能系统会议暨纪念人工智能诞生 60 周年 第 12 届中国智能系统会议暨纪念人工智能诞生 60 周年 i 2016 2016 年 10 月 21-23 日, 中国 厦门 第 12 届中国智能系统会议 2016 年 10 月 22-23 日, 中国 厦门 第 12 届中国智能系统会议 2016 年 10 月 22-23 日, 中国 厦门 目录 会议简介... 1 组织机构... 2 重要信息... 5 会议报到 :... 5 会务组联系方式

More information

The Lagrange Mean Value Theorem Of Functions of n Variables

The Lagrange Mean Value Theorem Of Functions of n Variables 陕西师范大学学士学位论文 The Lagrage Mea Value Theorem Of Fuctios of Variables 作 者 单 位 数学与信息科学学院 指 导 老 师 曹怀 信 作 者 姓 名 李 碧 专 业 班 级数学与应用数学专业 4 级 班 The Lagrage Mea Value Theorem of a Fuctio of Variables LI i lass, Grade

More information

Numerical Analysis in Geotechnical Engineering

Numerical Analysis in Geotechnical Engineering M.Sc. in Geological Engineering: Subject No. 8183B2 Numerical Analysis in Geotechnical Engineering Hong-Hu ZHU ( 朱鸿鹄 ) School of Earth Sciences and Engineering, Nanjing University www.slope.com.cn Lecture

More information

Distributed adaptive neural containment control for multi-uav systems with nonlinear uncertainties under a directed graph

Distributed adaptive neural containment control for multi-uav systems with nonlinear uncertainties under a directed graph 第 3 卷第 10 期 015 年 10 月 DOI: 10.7641/CA.015.500 控制理论与应用 Control heory & Applications Vol. 3 No. 10 Oct. 015 有向图下非线性无人机群自适应合围控制 余瑶, 任昊, 张兰, 孙长银 ( 北京科技大学自动化学院, 北京 100083) 摘要 : 本文研究了有向图下具有非线性和干扰的无人机群的分布式合围控制问题.

More information

Basic& ClinicalMedicine March2017 Vol.37 No.3 : (2017) 研究论文,-./ )89:;/Ⅱ,,,,,,!,"#$,%&' ("# <= 9>? B,"# 400

Basic& ClinicalMedicine March2017 Vol.37 No.3 : (2017) 研究论文,-./ )89:;/Ⅱ,,,,,,!,#$,%&' (# <= 9>? B,# 400 2017 3 37 3 Basic& ClinicalMedicine March2017 Vol.37 No.3 :1001 6325(2017)03 0341 05 研究论文,-./01 2343567)89:;/Ⅱ,,,,,,!,"#$,%&' ("# ? =@=A B,"# 400016)!": # CD,E -./ 2343567)89 89:;/Ⅱ(Ang Ⅱ) F $% GHIJ-./KL,E

More information

Microbiology. Zhao Liping 赵立平 Chen Feng. School of Life Science and Technology, Shanghai Jiao Tong University

Microbiology. Zhao Liping 赵立平 Chen Feng. School of Life Science and Technology, Shanghai Jiao Tong University 1896 1920 1987 2006 Microbiology By Zhao Liping 赵立平 Chen Feng 陈峰 School of Life Science and Technology, Shanghai Jiao Tong University http://micro.sjtu.edu.cn 1896 1920 1987 2006 Preface : Introduction

More information

ASSESSING THE QUALITY OF OPEN ACCESS JOURNALS

ASSESSING THE QUALITY OF OPEN ACCESS JOURNALS ASSESSING THE QUALITY OF OPEN ACCESS JOURNALS 审核开放获取期刊的质量 S E M I N A R A T C H I N A O P E N A C C E S S W E E K O C T O B E R 1 9, 2 0 1 6, B E I J I N G T O M @ D O A J. O R G E D I T O R - I N - C

More information

Chapter 4. Mobile Radio Propagation Large-Scale Path Loss

Chapter 4. Mobile Radio Propagation Large-Scale Path Loss Chapter 4 Mobile Radio Propagation Large-Scale Path Loss The mobile radio channel places fundamental limitations on the performance. The transmission path between the T-R maybe very complexity. Radio channels

More information

2004 Journal of Software 软件学报

2004 Journal of Software 软件学报 000-9825/2004/5(07)0977 2004 Journal of Software 软件学报 Vol5, No7 如何测量 SMP 机群可扩放性 何家华 +, 陈国良, 单久龙 ( 中国科学技术大学计算机科学技术系国家高性能计算中心 ( 合肥 ), 安徽合肥 230027) How to Measure Scalability of SMP Cluster HE Jia-Hua +,

More information

Pore structure effects on the kinetics of methanol oxidation over nanocast mesoporous perovskites

Pore structure effects on the kinetics of methanol oxidation over nanocast mesoporous perovskites Chinese Journal of Catalysis 37 (216) 32 42 催化学报 216 年第 37 卷第 1 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Article (Special Column on New Porous

More information

5. Polymorphism, Selection. and Phylogenetics. 5.1 Population genetics. 5.2 Phylogenetics

5. Polymorphism, Selection. and Phylogenetics. 5.1 Population genetics. 5.2 Phylogenetics 5. Polymorphism, Selection 5.1 Population genetics and Phylogenetics Polymorphism in the genomes Types of polymorphism Measure of polymorphism Natural and artificial selection: the force shaping the genomes

More information

The 2016 Xiamen International Conference on Partial Differential Equations and Applications

The 2016 Xiamen International Conference on Partial Differential Equations and Applications The 2016 Xiamen International Conference on Partial Differential Equations and Applications 厦门大学数学科学学院 2016 年 6 月 14 日 6 月 18 日 Scientific Committee: Luis Caffarelli (University of Texas) Alice S.-Y. Chang

More information

Photo induced self formation of dual cocatalysts on semiconductor surface

Photo induced self formation of dual cocatalysts on semiconductor surface Chinese Journal of Catalysis 39 (2018) 1730 1735 催化学报 2018 年第 39 卷第 11 期 www.cjcatal.org available at www.sciencedirect.com journal homepage: www.elsevier.com/locate/chnjc Communication Photo induced self

More information

Easter Traditions 复活节习俗

Easter Traditions 复活节习俗 Easter Traditions 复活节习俗 1 Easter Traditions 复活节习俗 Why the big rabbit? 为什么有个大兔子? Read the text below and do the activity that follows 阅读下面的短文, 然后完成练习 : It s Easter in the UK and the shops are full of Easter

More information

Sichuan Earthquake 四川地震

Sichuan Earthquake 四川地震 Sichuan Earthquake 四川地震 1 Sichuan Earthquake 四川地震 China Mourns Victims of the Sichuan Earthquake 中国为震灾遇难者哀悼 Read the text below and do the activity that follows. 阅读下面的短文, 然后完成练习 : Flags are flying at half-mast

More information

Introduction. 固体化学导论 Introduction of Solid State Chemistry 新晶体材料 1976 年中科院要求各教研室讨论研究方向上海硅酸盐所 福州物质所. Textbooks and References

Introduction. 固体化学导论 Introduction of Solid State Chemistry 新晶体材料 1976 年中科院要求各教研室讨论研究方向上海硅酸盐所 福州物质所. Textbooks and References Introduction 固体化学导论 Introduction of Solid State Chemistry http://staff.ustc.edu.cn/~ychzhu/ 1976 年中科院要求各教研室讨论研究方向上海硅酸盐所 福州物质所 新晶体材料 陶瓷材料长春应化所稀土发光中科大无机化学 固体化学 Introduction of Solid State Chemistry Its former

More information

Lecture 13 Metabolic Diversity 微生物代谢的多样性

Lecture 13 Metabolic Diversity 微生物代谢的多样性 Lecture 13 Metabolic Diversity 微生物代谢的多样性 Chapter 17 in BROCK BIOLOGY OF MICROORGANISMS School of Life Science and Biotechnology Shanghai Jiao Tong University http://micro.sjtu.edu.cn I. The Phototrophic

More information

碳酸盐岩缝洞型油藏开发关键技术. Key Petroleum Development Technologies in Fractured & Caverned Carbonate Reservoirs Illustrated by the Example of Tahe Oilfield

碳酸盐岩缝洞型油藏开发关键技术. Key Petroleum Development Technologies in Fractured & Caverned Carbonate Reservoirs Illustrated by the Example of Tahe Oilfield 碳酸盐岩缝洞型油藏开发关键技术 以塔河油田为例 Key Petroleum Development Technologies in Fractured & Caverned Carbonate Reservoirs Illustrated by the Example of Tahe Oilfield 中国石油化工股份有限公司西北油田分公司 SINOPEC Northwest Oilfield Company

More information

La pietra da altre colline può levigare la giada di questa qui Il Classico dei Versi 可 以 攻 玉

La pietra da altre colline può levigare la giada di questa qui Il Classico dei Versi 可 以 攻 玉 T r e n d s o f H e a l t h c a r e P e r f o r m a n c e E v a l u a t i o n i n C h i n a : a n E c o n o m i c a n d S o c i a l A n a l y s i s A t h e s i s p r e s e n t e d b y H a o L i T o T h

More information

A timed communication behaviour model for distributed systems

A timed communication behaviour model for distributed systems A timed communication behaviour model for distributed systems Yanwen Chen To cite this version: Yanwen Chen. A timed communication behaviour model for distributed systems. Other [cs.oh]. Université Nice

More information

19/11/2018. Engineering Perspective of Geo-Spatial Intelligence in Smart City. Huang Qian 19 November 2018

19/11/2018. Engineering Perspective of Geo-Spatial Intelligence in Smart City. Huang Qian 19 November 2018 P1 Engineering Perspective of Geo-Spatial Intelligence in Smart City Huang Qian 19 November 2018 P2 1 What is a Smarter City? Information Intelligent Digitization Networking Wisdom Controller Sensor Energy

More information

Molecular weights and Sizes

Molecular weights and Sizes Polymer Physcs 高分子物理 olecular weghts and Szes 高分子的分子量与尺寸 1 olecular weghts and molecular weght dstrbuton 分子量与分子量分布 Characterstcs of polymer molecular weghts 高分子的分子量的特点 The molecular weghts are very hgh,

More information