
Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available.

Title: Low Complexity Stochastic Optimization-Based Model Extraction for Digital Predistortion of RF Power Amplifiers
Author(s): Kelly, Noel; Zhu, Anding
Publication date: 2016-05
Publication information: IEEE Transactions on Microwave Theory and Techniques, 64 (5): 1373-1382
Publisher: IEEE
Item record/more information: http://hdl.handle.net/10197/8389
Publisher's statement: © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Publisher's version (DOI): http://dx.doi.org/10.1109/tmtt.2016.2547383

Low Complexity Stochastic Optimization-Based Model Extraction for Digital Predistortion of RF Power Amplifiers

Noel Kelly, Student Member, IEEE, and Anding Zhu, Senior Member, IEEE

Abstract: This paper introduces a low-complexity stochastic optimization-based model coefficient extraction solution for digital predistortion of RF power amplifiers. The proposed approach uses a closed-loop extraction architecture and replaces conventional least squares training with a modified version of the simultaneous perturbation stochastic approximation (SPSA) algorithm that requires very few numerical operations per iteration, leading to a considerable reduction in hardware implementation complexity. Experimental results show that the complete closed-loop stochastic optimization-based coefficient extraction solution achieves excellent linearization accuracy while avoiding the complex matrix operations associated with conventional least squares techniques.

Index Terms: Digital predistortion, linearization, model extraction, stochastic optimization, simultaneous perturbation stochastic approximation, power amplifier.

I. INTRODUCTION

In modern wireless base stations, the radio frequency (RF) power amplifier (PA) is an inherently nonlinear device that causes in-band distortion as well as out-of-band spectral growth in the transmitted signal. These effects are particularly severe at high output power levels, when the PA is operated in a power-efficient mode. Digital predistortion (DPD) is an advanced linearization technique that compensates for PA nonlinear effects by applying an inverted model of the PA to the input signal at digital baseband before amplification [1],[2]. To apply DPD effectively, an accurate PA model must be developed first: only when the nonlinear characteristics of the PA are accurately modeled, and thus correctly reversed, can the overall response of the DPD-PA cascade become linear.
This publication has emanated from research supported in part by research grants from Science Foundation Ireland (SFI) and co-funded under the European Regional Development Fund under Grant Numbers 13/RC/2077 and 12/IA/1267. This paper is an expanded version from the IEEE MTT-S International Workshop on Integrated Nonlinear Microwave and Millimetre-wave Circuits (INMMiC), Taormina, Italy, October 1-2, 2015. The authors are with the School of Electrical and Electronic Engineering, University College Dublin, Dublin 4, Ireland (e-mail: noel.kelly.1@ucdconnect.ie; anding.zhu@ucd.ie).

In recent years, a range of advanced behavioral models for RF PAs have been developed, with many derived from the Volterra series [3]-[5]. The coefficients for these models are typically calculated by using least squares (LS) based algorithms in an indirect learning (IDL) architecture [6]-[8]. The LS algorithm offers high accuracy and fast convergence, but it uses complex matrix multiplications and inversions which require substantial hardware resources to execute. Furthermore, the complexity of these matrix operations increases with the number of coefficients employed in the model and the number of samples used in the extraction process [9]. As operating bandwidths in wireless communication systems continue to increase, the nonlinear behavior of the PAs becomes more complicated. As a result, more coefficients are needed in the models, which in turn makes the LS operation more complex and more power consuming. In addition, in future small-cell base stations, the cost and energy consumption of the digital part itself is expected to become a major consideration because the power savings in the RF part become smaller. Thus, low-complexity coefficient extraction solutions for DPD are highly desirable.

In [10], a stochastic optimization-based coefficient calculation technique was proposed as a low-complexity alternative to the LS solution.
It was derived from the simultaneous perturbation stochastic approximation (SPSA) algorithm, which uses measurements of the loss function taken with a random perturbation applied to the model coefficients to determine the coefficient updating direction and finally find the optimum solution. With this approach, the gradient approximation requires only two function measurements per iteration, regardless of the number of coefficients involved. The coefficient perturbation process requires only simple addition and subtraction operations, which leads to substantial savings in hardware resource usage in the model extraction. Due to limited space, only the basic concept was given in [10], and the SPSA was applied only in the indirect learning structure.

In this paper, we first present the complete idea of the SPSA algorithm and then give a step-by-step guide to the implementation of an enhanced version of the SPSA algorithm that is specifically suited to DPD coefficient extraction. Using information from previous iterations, the proposed modification substantially improves convergence speed. To reuse hardware resources effectively and optimize system performance, a complete SPSA-based model extraction approach using the closed-loop coefficient estimation architecture is also given. Experimental results show that the proposed approach can achieve comparable linearization

performance but use considerably fewer hardware resources than the conventional LS algorithm.

The paper is organized as follows. In Section II, conventional DPD model extraction is briefly reviewed and the associated challenges are highlighted. Stochastic optimization using the SPSA algorithm is introduced in Section III and the proposed novel application-specific solution is given in Section IV. Section V discusses practical application of the technique in the closed-loop coefficient estimation architecture, while Section VI reports simulation and experimental results. The overall findings of the work are summarized in Section VII.

II. CONVENTIONAL DPD MODEL EXTRACTION

Much effort has been devoted in recent years to developing efficient behavioral models, typically with the goal of reducing the number of terms while maintaining model accuracy [3]-[5]. One recent example, the Decomposed Vector Rotation (DVR) model [11], relates the PA input signal, $\tilde{x}(n)$, and its output, $\tilde{y}(n)$, as shown in (1), where $a_i$, $c_{si,1}$ and $c_{si,21}$ are the model coefficients, $\beta_s$ is the $s$-th threshold and $\theta(n)$ is the phase of $\tilde{x}(n)$.

$$
\tilde{y}(n) = \sum_{i=0}^{M} a_i\,\tilde{x}(n-i)
+ \sum_{s=1}^{S}\sum_{i=0}^{M} c_{si,1}\,\bigl|\,|\tilde{x}(n-i)| - \beta_s\bigr|\,e^{j\theta(n-i)}
+ \sum_{s=1}^{S}\sum_{i=0}^{M} c_{si,21}\,\bigl|\,|\tilde{x}(n-i)| - \beta_s\bigr|\,e^{j\theta(n-i)}\,|\tilde{x}(n)|
\tag{1}
$$

For calculation of the model coefficient vector, $\hat{C} = [a_1, a_2, \ldots, c_{s1,1}, \ldots]$, the indirect learning architecture is commonly used [2],[6]-[8]. In this architecture, the model extraction is conducted by using a post-inverse model, where the output of the PA, $\tilde{y}(n)$, is used as the input of the model while the input of the PA, $\tilde{u}(n)$, is used as the expected output. Because the post-inverse model has the same structure as the pre-inverse model, the extracted coefficients for the post-inverse model can be directly copied to the DPD block, as illustrated in Fig. 1. To extract the post-inverse model coefficients, the standard least squares algorithm can be employed and the coefficient vector, $\hat{C}$, is given by

$$
\hat{C} = \left(Y^{H} Y\right)^{-1} Y^{H} U
\tag{2}
$$

where $(\cdot)^{H}$ represents the Hermitian transpose and $(\cdot)^{-1}$ is the matrix inverse operator.
The vector $U$ is formed from the DPD output signal samples $\tilde{u}(n)$ and $Y$ is a matrix constructed using measured samples of the PA output, $\tilde{y}(n)$, in which each column corresponds to a term in the DVR behavioral model in (1). The large matrix inversion and multiplication operations required to execute (2) are computationally complex and resource intensive. For instance, extracting 50 coefficients with 8000 digital samples requires over 40,000,000 complex multiplication operations. Implementing such algorithms in digital hardware requires substantial dedicated hardware resources and occupies a large chip area.

Fig. 1. DPD system with indirect learning model extraction architecture.

To reduce the computational complexity, iterative coefficient extraction techniques can be considered. In particular, the recursive least-squares (RLS) and least-mean-squares (LMS) algorithms have been applied in DPD and PA modeling [12],[13]. The RLS algorithm is an example of a quasi-Newton optimization method in which the coefficient update equation uses an approximation to the Hessian matrix at each iteration. Although the RLS algorithm avoids operations involving large matrices, maintaining an accurate approximation of the Hessian matrix still requires significant complexity, particularly for higher order models. Alternatively, gradient descent-based methods, such as LMS, rely on a simple first order approximation to determine the update direction. The LMS algorithm uses a highly efficient approximation to the loss function gradient vector, so large matrix calculations are avoided and implementation complexity is greatly reduced. However, LMS is very slow to converge and typically struggles to achieve the desired model accuracy.

III. SIMULTANEOUS PERTURBATION STOCHASTIC APPROXIMATION ALGORITHM

SPSA is an efficient stochastic optimization algorithm that follows an iterative procedure in which adjustable parameters are updated incrementally to converge towards the desired minimum or maximum of an objective function.
For a given iteration, $k$, the SPSA algorithm calculates an updated coefficient vector according to:

$$
\hat{C}_{k+1} = \hat{C}_k - a_k\,\hat{g}_k(\hat{C}_k)
\tag{3}
$$

where $\hat{C}_k$ is the existing coefficient vector and $a_k$ is a weighting factor used to control convergence speed. The term $\hat{g}_k(\hat{C}_k)$ represents the loss function gradient at the coefficient vector $\hat{C}_k$ and determines the direction of the algorithm update. The key idea of SPSA is that the loss function gradient, $\hat{g}_k(\hat{C}_k)$, is estimated from measurements of the loss function instead of by direct differentiation, which substantially reduces the computational complexity [14],[15].

Let us take a simple PA forward model, shown in Fig. 2, as an example and assume that the model is constructed by using a nonlinear function with a set of coefficients. The goal is to find the optimum coefficient values that result in the closest match between the predicted output $\hat{y}[n]$ and the measured output $y[n]$ for the input $x[n]$. To extract the coefficients, a random perturbation sequence, $\Delta_k$, is added and subtracted

(with weighting $c_k$) from the current forward model coefficient vector, $\hat{C}_k$, generating two additional coefficient vectors,

$$
\hat{C}_k^{+} = \hat{C}_k + c_k\Delta_k, \qquad
\hat{C}_k^{-} = \hat{C}_k - c_k\Delta_k.
\tag{4}
$$

Fig. 2. Loss function measurement in the SPSA algorithm.

By applying the two coefficient vectors to the model, two sets of model output data are obtained and two loss function measurements, $L(\hat{C}_k^{+})$ and $L(\hat{C}_k^{-})$, are performed. The loss function gradient is then simply approximated, element by element, by

$$
\hat{g}_k(\hat{C}_k) = \frac{L(\hat{C}_k^{+}) - L(\hat{C}_k^{-})}{\hat{C}_k^{+} - \hat{C}_k^{-}}
\tag{5}
$$

and the resulting coefficient update equation is given by:

$$
\hat{C}_{k+1} = \hat{C}_k - a_k\,\frac{L(\hat{C}_k^{+}) - L(\hat{C}_k^{-})}{\hat{C}_k^{+} - \hat{C}_k^{-}}.
\tag{6}
$$

Because all elements of $\hat{C}_k$ are randomly perturbed together, the number of measurements needed for the gradient approximation is independent of the number of coefficients, which significantly reduces the per-iteration complexity. The perturbation vector itself is randomly generated at each iteration, given by:

$$
\Delta_k = \begin{bmatrix} \Delta_{k1} \\ \Delta_{k2} \\ \vdots \\ \Delta_{kW} \end{bmatrix}
\tag{7}
$$

and a Bernoulli distribution in which +1 and -1 outcomes occur with equal probability is considered a suitable choice to satisfy convergence requirements [15].

Based on the description above, a single iteration of the SPSA algorithm can be summarized as follows:

Step 1: Generate perturbation vector $\Delta_k$
Step 2: Calculate temporary coefficients $\hat{C}_k^{+}$ and $\hat{C}_k^{-}$
Step 3: Measure error function values $L(\hat{C}_k^{+})$ and $L(\hat{C}_k^{-})$
Step 4: Obtain gradient approximation $\hat{g}_k(\hat{C}_k)$
Step 5: Update coefficient vector $\hat{C}_k$

Using the update equation in (6), SPSA iteratively calculates an optimum coefficient vector for a given system with a very low number of operations per iteration. Specifically, at each iteration the algorithm requires only a small number of simple addition and subtraction operations, as defined in (4) and (5), and generation of the perturbation vector in (7). The progression of the SPSA algorithm over 4 iterations across a sample error surface is depicted in Fig. 3.

Fig. 3. Evolution of the SPSA algorithm over four iterations.

IV. MODIFIED SPSA

In the general form outlined above, SPSA can be applied across a wide range of optimization scenarios. However, in this work only a single application of the algorithm is considered, namely, calculation of a set of DPD coefficients. This restricted operating environment can be exploited to develop the original algorithm into a higher performing application-specific technique, as discussed below.

A. Quadratic Interpolation

In conventional DPD, the normalized mean squared error (NMSE) is often used as a metric to quantify the linearization performance [16]. The NMSE measures the total power of the error vector between the ideal and modeled waveforms, normalized to the ideal signal power. It is defined as:

$$
\mathrm{NMSE} = \frac{\sum_{n=1}^{N} |e(n)|^{2}}{\sum_{n=1}^{N} |u_{ideal}(n)|^{2}}
\tag{8}
$$

where $N$ is the total number of samples, and $e(n)$ is the difference between the ideal and measured signals

$$
e(n) = u_{meas}(n) - u_{ideal}(n).
\tag{9}
$$

Owing to the requirements of conventional extraction techniques, the vast majority of modern behavioral models are designed to ensure a linear relationship between the model output and its coefficients. For these linear-in-parameters models, it follows that, for each sample, the error measured at the model output has a linear relationship with the model coefficients. Consequently, if NMSE is chosen as the loss function, the output of the loss function can be expressed as a quadratic function of the model coefficients [13].

This special feature of the loss function can be used to improve SPSA performance in extracting the DPD coefficients. In the standard implementation, the loss function gradient is approximated as the slope of a line between two points located in the vicinity of the coefficient estimate, as shown in Fig. 3. When the loss function is known to have a quadratic form, however, the interpolated function is no longer an approximation but an accurate representation of the loss

function along the chosen perturbation vector. As a result, the updated coefficient estimate can be found by moving directly to the minimum of the interpolated function:

$$
\hat{C}_{k+1} = \hat{C}_k + \frac{c_k}{2}\,
\frac{\Delta_k\left(L(\hat{C}_k^{-}) - L(\hat{C}_k^{+})\right)}
{L(\hat{C}_k^{+}) + L(\hat{C}_k^{-}) - 2L(\hat{C}_k)}
\tag{10}
$$

where the factor $c_k/2$ reflects the spacing of the three sample points defined in (4). For quadratic modeling of the loss function, three unique points are required. In addition to the two temporary coefficient vectors, $\hat{C}_k^{+}$ and $\hat{C}_k^{-}$, used in conventional SPSA, a third measurement is taken by directly using the current coefficient vector, $\hat{C}_k$, to give $L(\hat{C}_k)$. Fig. 4 depicts the evolution of this process over a number of iterations for a single coefficient. Successive quadratic interpolations can be made and the coefficient set can be accurately updated to reduce the loss function value at each iteration. Using quadratic interpolation to improve SPSA performance was mentioned in [17], and the Appendix at the end of this paper provides further detail on the derivation of (10).

Fig. 4. Evolution of the quadratic interpolation SPSA algorithm over 4 iterations.

B. Steep-Descent SPSA Algorithm

Although results show that the performance of quadratic interpolation is consistent, the standard Bernoulli perturbations typically generate shallow quadratic functions that provide only a relatively small improvement at each iteration. This leads to a long convergence time in model extraction. To improve convergence speed for SPSA, in this work we propose to supplement the standard Bernoulli-based perturbations with calculated steep perturbation sequences designed to generate more efficient slices of the loss function surface.

As shown in Fig. 4, as the SPSA algorithm progresses, unique coefficient sets and their related loss function points are generated at each iteration. Because the model is linear-in-parameters, the loss function surface is quadratic. It follows that not only the coefficient sets associated with each iteration, but also any combination of two different coefficient sets, fall on a quadratic curve. For instance, as shown in Fig.
5, a coefficient set from iteration 1 and another set from iteration 5 can be used to form a new quadratic curve. This feature allows us to use the existing information from previous iterations intelligently to choose an efficient search direction.

Fig. 5. Steep-descent perturbation vector generation.

To explain how this works, consider the example above of moving from a point on the quadratic at iteration 1 to one at iteration 5. After the first iteration we select a point and label it with coordinates $(\hat{C}_1, L(\hat{C}_1))$ and similarly, after the fifth iteration, we select a second point $(\hat{C}_5, L(\hat{C}_5))$. If we subtract $\hat{C}_1$ from $\hat{C}_5$, we find the perturbation vector, $\Delta_{SD}$, required to move between the points. Now, we define a third coefficient set $\hat{C}_5 + \Delta_{SD}$ and obtain the associated loss function point $L(\hat{C}_5 + \Delta_{SD})$. The three points $(\hat{C}_1, L(\hat{C}_1))$, $(\hat{C}_5, L(\hat{C}_5))$, and $(\hat{C}_5 + \Delta_{SD}, L(\hat{C}_5 + \Delta_{SD}))$ can now be used to form a quadratic curve. Using this curve, a new minimum point and corresponding coefficient set can be found. Unlike the normal Bernoulli-based quadratic search, where the perturbation weighting is constant across the coefficient set, here the weighting varies. If two coefficient sets are located a short distance apart along the horizontal axis but are separated by a large vertical distance, a steep quadratic curve can be formed and thus the minimum point can be found much faster. We call this approach steep-descent SPSA (SD-SPSA).

For the technique to work, the main challenge is in finding a suitable perturbation vector $\Delta_{SD}$. As shown in Fig. 5, at a given iteration $k$, to calculate $\Delta_{SD}$, the SD-SPSA algorithm selects a point, denoted $(\hat{C}_{SD}, L(\hat{C}_{SD}))$, on the $q$-th past loss function, $L_{k-q}(\hat{C})$, such that there exists a steep slope between it and the current coefficient vector/loss function point, $(\hat{C}_k, L(\hat{C}_k))$. The perturbation vector required to move between these two points is calculated and designated the new steep descent perturbation, $\Delta_{SD}$.
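The quadratic interpolation update and the idea of reusing earlier coefficient sets as a search direction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-coefficient quadratic `loss`, the 2:1 mix of Bernoulli and steep steps, the memory depth of 5 and all function names are hypothetical, and the steep direction is simply the difference between the current estimate and an older stored one (the iteration 1 / iteration 5 example), rather than the full selection rule of the paper. The step uses the vertex of the parabola through the points at $-c_k$, $0$ and $+c_k$ along the direction, i.e. $t^{*} = \tfrac{c_k}{2}\,(L^{-} - L^{+})/(L^{+} + L^{-} - 2L)$.

```python
import random


def loss(c):
    """Toy quadratic loss standing in for the NMSE surface (hypothetical)."""
    target = [1.0, -2.0, 0.5]
    weights = [1.0, 4.0, 0.25]
    return sum(w * (ci - ti) ** 2 for w, ci, ti in zip(weights, c, target))


def shift(c, delta, t):
    """Return c + t * delta, element by element."""
    return [ci + t * di for ci, di in zip(c, delta)]


def quadratic_step(c, delta, c_k):
    """Fit a parabola through L(c - c_k*delta), L(c), L(c + c_k*delta)
    and move to its vertex along delta."""
    l_minus = loss(shift(c, delta, -c_k))
    l_zero = loss(c)
    l_plus = loss(shift(c, delta, c_k))
    curvature = l_plus + l_minus - 2.0 * l_zero
    if curvature <= 0.0:
        return c  # no usable minimum along this direction; keep the estimate
    t_star = 0.5 * c_k * (l_minus - l_plus) / curvature
    return shift(c, delta, t_star)


def bernoulli_perturbation(n, rng):
    """Random +/-1 perturbation vector with equal probabilities."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def steep_direction(c_now, c_past):
    """Direction joining a stored past estimate to the current one."""
    return [a - b for a, b in zip(c_now, c_past)]


rng = random.Random(0)
c = [0.0, 0.0, 0.0]
stored = [list(c)]          # coefficient sets kept from earlier iterations
losses = [loss(c)]
for k in range(1, 31):
    if k % 3 == 0:
        # Every third step reuses history: past point, current point and
        # current + delta are the three points of the parabola (c_k = 1).
        delta, c_k = steep_direction(c, stored[0]), 1.0
    else:
        delta, c_k = bernoulli_perturbation(len(c), rng), 0.1
    c = quadratic_step(c, delta, c_k)
    stored = (stored + [list(c)])[-5:]   # bounded memory depth (assumed)
    losses.append(loss(c))

print("initial loss %.4f, final loss %.6f" % (losses[0], losses[-1]))
```

Because the toy loss is exactly quadratic, each step is an exact line minimization, so the recorded loss never increases from one iteration to the next; with a measured NMSE the same update would only be as good as the quadratic assumption.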
Efficient calculation of the numerical values for $\hat{C}_{SD}$ is necessary if the SD-SPSA algorithm is to be effective. Using data from a past iteration, $k-q$, the desired $\hat{C}_{SD}$ coefficients can be expressed in terms of $\hat{C}_{k-q}$ and $\Delta_{k-q}$ as:

$$
\hat{C}_{SD} = \hat{C}_{k-q} + c_{step}\,\Delta_{k-q}
\tag{11}
$$

where $c_{step}$ is a constant determining the location of $\hat{C}_{SD}$ on the past loss function curve. The value of $c_{step}$ for a given iteration is constant across all terms in the coefficient vector $\hat{C}_{SD}$. Thus, (11) can be rearranged to give an expression for $c_{step}$ in terms of a single term in $\hat{C}_{SD}$:

$$
c_{step} = \frac{\hat{C}_{SD,p} - \hat{C}_{k-q,p}}{\Delta_{k-q,p}}.
\tag{12}
$$

As shown in Fig. 5, we define a single term in $\hat{C}_{SD}$ as:

$$
\hat{C}_{SD,p} = \hat{C}_{k,p} + w
\tag{13}
$$

where $w$ is chosen as a small value to ensure a steep slope between the two points. The resulting numerical value is used to determine $c_{step}$ in (12), which is in turn applied to (11) to find $\hat{C}_{SD}$. Finally, the desired steep descent perturbation vector, $\Delta_{SD}$, is calculated as simply the perturbation required to move from the current coefficient set $\hat{C}_k$ to the set $\hat{C}_{SD}$:

$$
\Delta_{SD} = \hat{C}_{SD} - \hat{C}_k.
\tag{14}
$$

Once $\Delta_{SD}$ is determined, the SD-SPSA algorithm follows an identical procedure to the standard quadratic interpolation SPSA for the remainder of the iteration. Two temporary coefficient vectors are formed,

$$
\hat{C}_k^{+} = \hat{C}_k + c_k\Delta_{SD}, \qquad
\hat{C}_k^{-} = \hat{C}_k - c_k\Delta_{SD}
\tag{15}
$$

and, together with the current coefficient vector $\hat{C}_k$, three loss function measurements are generated. Quadratic interpolation is performed on the resulting three unique loss function points and the updated coefficient estimate is calculated according to (10).

Calculating the steep descent perturbation sequence relies on storing both the perturbation sequence, $\Delta_k$, and the coefficient vector, $\hat{C}_k$, from past iterations. The number of loss functions to be stored can be a user-defined parameter of the algorithm. To prevent unnecessary implementation complexity, it is desirable to limit the memory depth as much as possible without impacting performance. It has been observed in this work that, by storing iteration data points at regular intervals instead of consecutively, overall storage requirements can be drastically reduced without significantly impacting performance. For example, compared with drawing on data from the past 1 consecutive iterations, similar performance can be achieved by storing 1 data samples taken periodically over the same 1 iterations.

V. PROPOSED DPD COEFFICIENT EXTRACTION

Two architectures are commonly used in DPD coefficient extraction: open-loop indirect learning and closed-loop adaptation. As shown in Fig.
1, the indirect learning puts the DPD block outside the training loop and uses the PA output as the input and the DPD output as the expected output to train a post-inverse model first, and then copies the coefficients to the pre-inverse (DPD) block. It can converge very quickly, but the final accuracy may be affected by linear impairments of the feedback loop [18]. More importantly, a complete postdistortion model must be constructed in the training process, which can increase the hardware implementation cost of the coefficient extraction. In contrast, as outlined in Fig. 6, the closed-loop approach places the DPD model inside the estimation loop and iteratively updates the coefficients using a separate error model [18],[19]. On each training run, $h$, the DPD model coefficient vector $\hat{C}_{DPD}$ is updated according to:

$$
\hat{C}_{DPD,h+1} = \hat{C}_{DPD,h} + \lambda\,\hat{C}_{error}
\tag{16}
$$

where the error model coefficient vector, $\hat{C}_{error}$, is trained to model the measured error between the original input $x[n]$ and the PA output $y[n]$,

$$
e[n] = x[n] - y[n]
\tag{17}
$$

and $\lambda$ is the adaptation factor.

Fig. 6. SD-SPSA in the closed-loop coefficient estimation architecture.

The closed-loop architecture converges more slowly, but it can typically achieve greater accuracy once it reaches the steady state. Because the error model uses the same input as the DPD model, calculation of the nonlinear modeling terms can be shared between the two blocks, which can reduce hardware resource usage. In this work, we propose to incorporate the SD-SPSA algorithm developed in Section IV into a closed-loop estimation architecture to demonstrate how the SPSA technique can simplify the DPD model extraction process.

A. SPSA-Based Closed-Loop Coefficient Extraction

Typically, LS estimation is used to calculate the error model coefficients,

$$
\hat{C}_{error} = \left(X^{H} X\right)^{-1} X^{H} E
\tag{18}
$$

where $X$ is a matrix constructed from the input samples, $x[n]$, in which each column corresponds to a term in the DPD model, and $E$ is the vector of measured errors, $e[n]$. As mentioned earlier, the LS operation involves large matrix operations.
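The closed-loop update in (16)-(18) can be illustrated with a deliberately tiny case in which the DPD model has a single linear term, so that $X$ has one column and the LS solution in (18) reduces to a ratio of two sums. The sketch below is only an illustration under stated assumptions: the "PA" is a hypothetical complex gain, the input samples and the adaptation factor `lam = 0.7` are illustrative values, the update is assumed to add $\lambda\hat{C}_{error}$ with $e[n] = x[n] - y[n]$, and none of the names come from the paper.

```python
def pa(u, gain=0.8 + 0.1j):
    """Hypothetical linear 'PA' with a complex gain error (toy stand-in)."""
    return gain * u


def ls_single_term(x, e):
    """Equation (18) for a one-column X: (x^H x)^-1 x^H e."""
    num = sum(xi.conjugate() * ei for xi, ei in zip(x, e))
    den = sum(abs(xi) ** 2 for xi in x)
    return num / den


def nmse(e, ref):
    """Error power normalized to the reference power, as a ratio (not dB)."""
    return sum(abs(v) ** 2 for v in e) / sum(abs(v) ** 2 for v in ref)


x = [1 + 0j, 0.5 - 0.5j, -0.3 + 0.8j, 0.2 + 0.1j]   # illustrative input block
c_dpd = 1 + 0j      # start with a pass-through predistorter
lam = 0.7           # adaptation factor (assumed value)
nmse_history = []

for h in range(40):                                # closed-loop training runs
    u = [c_dpd * xi for xi in x]                   # predistorted signal
    y = [pa(ui) for ui in u]                       # "captured" PA output
    e = [xi - yi for xi, yi in zip(x, y)]          # error signal, as in (17)
    nmse_history.append(nmse(e, x))
    c_error = ls_single_term(x, e)                 # error model coefficient
    c_dpd = c_dpd + lam * c_error                  # coefficient update, as in (16)

residual = abs(pa(c_dpd) - 1)   # distance of the DPD-PA cascade gain from 1
print("NMSE run 0: %.3e, run 5: %.3e" % (nmse_history[0], nmse_history[5]))
```

In this toy case the cascade gain error shrinks by the factor $|1 - \lambda g|$ on every run, which shows why a smaller adaptation factor trades convergence speed for robustness; replacing `ls_single_term` with an iterative SPSA-style estimator leaves the outer loop unchanged.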
In this work, we propose employing the SD-SPSA algorithm to train the error model. When SD-SPSA is applied in this architecture, the error model output, $e_{mod}[n]$ in Fig. 6, must be calculated in order to measure the loss function at each iteration. In other words, we need to run the error model,

$$
E_{mod} = X\,\hat{C}_{error}
\tag{19}
$$

where $E_{mod}$ is the vector of error model output samples $e_{mod}[n]$. Since the error model has the same structure as that used in the DPD block and, at each run, the matrix $X$ is already generated in the DPD block, it is not necessary to implement a full copy of the model. Instead, in the error model we only need to read out the terms of the matrix $X$ from the DPD block and multiply them by the error coefficients to generate $E_{mod}$. As shown in Fig. 6, this allows just one set of model terms to be generated in the DPD model and shared with the error model, where a different coefficient vector is applied. For

example, the term $\bigl|\,|\tilde{x}(n-i)| - \beta_s\bigr|\,e^{j\theta(n-i)}$ in the DVR model in (1) can be generated just once and multiplied by a different coefficient in the DPD and error models. Implementation of a complete model structure is highly resource-intensive and represents a significant portion of the overall digital hardware requirements of the DPD system [2]; utilizing just one model for both coefficient extraction and predistortion therefore significantly reduces the overall complexity.

Fig. 7. The closed-loop model extraction procedure.

The complete closed-loop extraction process, as outlined in Fig. 7, is composed of two training loops. On a given closed-loop coefficient extraction run, system input and output data are captured and used to generate an error signal. The SD-SPSA algorithm then runs in an internal training loop, using the captured data to extract the error model coefficients. The conditions for exiting the internal SD-SPSA training loop can be chosen by the user. In our test, a maximum number of iterations is used as the exit criterion. When SD-SPSA training is complete, the calculated error model coefficients are passed out of the internal loop and used to update the DPD coefficient estimate, as in (16). If necessary, after the DPD coefficients are updated, new data is captured and the process is repeated until a predetermined linearization target is achieved.

B. Complexity Analysis

Table I details the SD-SPSA complexity in terms of the real multiplications and additions required per iteration. Operations required to generate the error model output at each iteration are also included. Standard LS extraction complexity, as in (18), is included for reference. The results show that the coefficient update complexity in SPSA is greatly simplified. To quantify the improvement, for an example scenario with N = 8192 samples and K = 50 model terms, SD-SPSA requires 2,490,518 real multiplications while LS would need 124,483,800 operations, leading to a 98% reduction in computational complexity.
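The figures quoted above, and the memory estimate for stored iterations given with Table I, can be checked with a few lines of arithmetic. The sketch below simply evaluates the real-multiplication totals of Table I, $4N + 3K + 6NK$ for one SD-SPSA iteration and $3(2NK^2 + NK + K^3)$ for LS, together with the $32 \times K \times 2$ bits stored per past iteration; the function names are hypothetical.

```python
def sd_spsa_real_mults(n, k):
    """Real multiplications per SD-SPSA iteration (Table I total)."""
    return 4 * n + 3 * k + 6 * n * k


def ls_real_mults(n, k):
    """Real multiplications for one LS solve, as in (18) (Table I total)."""
    return 3 * (2 * n * k**2 + n * k + k**3)


def stored_bits(k, depth, bits_per_complex=32):
    """Bits needed to keep a coefficient vector and a perturbation vector
    (2 * k complex values) for each of `depth` stored iterations."""
    return bits_per_complex * k * 2 * depth


n, k = 8192, 50
spsa = sd_spsa_real_mults(n, k)
ls = ls_real_mults(n, k)
reduction = 1 - spsa / ls
memory = stored_bits(k, depth=10)

print(f"SD-SPSA: {spsa:,} real multiplications per iteration")
print(f"LS:      {ls:,} real multiplications")
print(f"Reduction: {100 * reduction:.1f} %")
print(f"Storage for 10 past iterations: {memory:,} bits")
```

Note that the SD-SPSA total is a per-iteration cost while the LS total covers one complete solve, so the comparison describes the size of the arithmetic that must be supported in hardware rather than the total work over a full training run.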
TABLE I
SD-SPSA VS. LS COMPLEXITY COMPARISON

Operation | Real Mults. | Real Additions
SD-SPSA (per iteration) | |
Calculate $\hat{C}^{+}$ and $\hat{C}^{-}$ | 0 | 4K
Measure $L(\hat{C})$, $L(\hat{C}^{+})$, $L(\hat{C}^{-})$ | 4N | 8N
Calculate SD perturbations | K | 4K
Calculate coefficient update | 2K | 2K
Error model calculations: generate $X\hat{C}^{+}$ and $X\hat{C}^{-}$ | 6NK | 6N(K-1) + 10NK
Total | 4N + 3K + 6NK | 8N + 10K + 6N(K-1) + 10NK
Least Squares (18) | |
Matrix multiplication (K×N)·(N×K) | 3NK^2 | 2(N-1)K^2 + 5NK^2
Matrix multiplication (K×K)·(K×N) | 3NK^2 | 2(K-1)KN + 5NK^2
Matrix multiplication (K×N)·(N×1) | 3NK | 2(N-1)K + 5NK
Matrix inversion (K×K) | 3K^3 | 5K^3
Total | 3(2NK^2 + NK + K^3) | 2((N-1)K + (N-1)K^2 + (K-1)NK) + 5K^3 + 10NK^2 + 5NK

N: number of training samples; K: number of coefficients.

It is notable that, although not strictly a part of the algorithm itself, the main complexity of SD-SPSA lies in calculating the error model output and the NMSE. For SD-SPSA, the only additional overhead is the calculation of the steep descent perturbations. This amounts to K real multiplications and 4K real additions, a minor cost relative to the overall complexity. For completeness, aside from the main steps detailed in Table I, a small amount of hardware resources is also required for generation of the random perturbation sequence at each iteration.

For the SD-SPSA algorithm, past iteration data must be stored. For each iteration, the saved data set contains the original coefficient vector and the associated perturbation vector. Assuming 32-bit accuracy for each complex value, for a model with K coefficients this corresponds to 32 × K × 2 bits of data per stored iteration. For the results presented in this paper, the SD-SPSA algorithm stores information from 10 past iterations, leading to a total storage requirement of 32 × K × 2 × 10 bits of data. A typical model with K = 50 will thus require only 32 Kb of storage space.
Considering that the Xilinx Virtex 7 FPGA family has between 28,620 Kb and 67,680 Kb of on-device storage capacity in the form of individual 36 Kb block RAM units, this memory requirement is well within the capacity of on-board storage in modern FPGA chips [21].

VI. SIMULATION AND EXPERIMENTAL RESULTS

In this section, simulation results for the SD-SPSA algorithm are first presented before the complete DPD coefficient estimation architecture is evaluated in a full RF test bench.

A. PA Modeling Simulation

The performance of DPD model extraction solutions relies heavily on the ability of the algorithms to achieve the desired modeling accuracy. To confirm the accuracy of the proposed SD-SPSA algorithm, a forward PA modeling scenario is considered in this subsection. The training data consists of 14,5 input/output samples captured from an LDMOS Doherty PA operated at 37 dBm and excited by a 20 MHz W-CDMA signal. A DVR model, as given in (1), with parameters S=8 and M=3 serves as the behavioral model. Both the quadratic SPSA and the SD-SPSA algorithms were tested. The SD-SPSA algorithm was operated with a 2:1 ratio of Bernoulli to steep descent perturbations, and 10 past data points were stored with a sampling interval of every 1 iterations. The LS estimation, outlined in (2), was taken as the reference.

Fig. 8. Quadratic and SD-SPSA NMSE performance. (Axes: normalized mean square error in dB versus number of iterations; traces: least squares reference, quadratic SPSA, SD-SPSA.)

TABLE II
SD-/QUADRATIC SPSA FORWARD MODELING PERFORMANCE

NMSE Accuracy (dB) | No. of Iterations Required, Quadratic SPSA | No. of Iterations Required, Proposed SD-SPSA
-3 | 55 | 33
-35 | 1,9 | 1,5
-36 | 3,4 | 1,4
-37 | 12,5 | 2,5
-37.5 | 32, | 4,
-38 | 257, | 14,
-38.2 | 865, | 3,

Least Squares NMSE: -38.32 dB.

Fig. 8 reports the NMSE performance over iterations. Both the quadratic SPSA and the SD-SPSA follow a similar training pattern in which convergence speed is high for an initial period of approximately 2, iterations before slowing down drastically as the NMSE approaches that of the LS solution. For the quadratic SPSA, this plateau effect occurs earlier than in the SD-SPSA solution. After 1, iterations, the SD-SPSA algorithm converges to a value close to the LS reference. Table II reports the number of iterations required to reach a chosen accuracy level for each algorithm and, again, the fast-converging SD-SPSA algorithm is shown to outperform standard quadratic SPSA.
In addition, in less than 5, iterations the SD-SPSA algorithm achieves an NMSE level within 1 dB of the LS reference and after 3, iterations the difference between the two is approximately 0.1 dB. Thus it is shown that, using the SD-SPSA algorithm, it is indeed possible to achieve accuracy levels comparable to those obtained with conventional least squares techniques.

B. Experimental Test

The proposed DPD model extraction solution was then evaluated in a full RF test bench. The test setup was the same as that used in [22] and is shown in Fig. 9. An LDMOS Doherty PA was again used, operated at 2.14 GHz. The input signal was generated in MATLAB running on a PC before it was passed to the RF board for modulation and up-conversion and finally sent to the PA. At the PA output, the signal, with a pre-determined block length, was down-converted and demodulated to baseband, and then captured and returned to the PC for DPD coefficient extraction.

Fig. 9. DPD test platform setup.

Following the model extraction process illustrated in Fig. 7, the experimental procedure is as follows:

i) Capture the PA input/output signal without DPD.
ii) Using the captured input/output data, apply SD-SPSA to calculate the error model coefficient vector $\hat{C}_{error}$ in the inner loop of training.
iii) Update the DPD coefficients using (16).
iv) Using the updated DPD coefficients, generate the predistorted input signal and upload it to the RF test bench.
v) Re-capture the PA output signal generated by the new predistorted signal.
vi) Repeat steps ii) to v) until the desired linearization performance is achieved.

TABLE III
SD-SPSA PERFORMANCE FOR 20 MHZ LTE SIGNAL

Scenario | NMSE (dB) | ACPR (dB), -20 MHz | ACPR (dB), +20 MHz
No DPD | -2.9 | -32.7 | -31.0
SD-SPSA (2 runs) | -33.0 | -45.6 | -48.3
SD-SPSA (4 runs) | -39.2 | -55.7 | -57.9
SD-SPSA (6 runs) | -43.6 | -58.7 | -58.0
SD-SPSA (8 runs) | -46.9 | -59.5 | -58.4
SD-SPSA (10 runs) | -45.7 | -59.2 | -59.3
Least Squares (10 runs) | -45.6 | -59.4 | -59.6

The sampling rate of the test platform was 368.64 MHz.
The closed-loop coefficient adaptation factor λ was 0.7 and on each run 15, iterations of the SD-SPSA algorithm were used to calculate the error model coefficients.

1) Performance with 20 MHz LTE Signal

A 20 MHz single-band LTE signal with 6.5 dB peak-to-average power ratio (PAPR) was used in the first test. The predistortion model was a DVR-based function, with S=8 and M=3. Table III reports the performance of the SD-SPSA algorithm over the course of a series of closed-loop estimation training runs in terms of adjacent channel power ratio (ACPR) and NMSE. After 10 runs, strong linearization performance is achieved. This is confirmed in Fig. 10, where the AM/AM and AM/PM plots are compared for the signal with and without DPD. For comparison between the proposed solution and existing methods, linearization results for a conventional LS algorithm applied in the same closed-loop architecture are reported. As in the SD-SPSA simulations, a training set of 15, input/output samples was captured to perform each training run using the LS algorithm. Results are reported in Fig. 11 and further detailed in Table III, where it can be seen that the proposed SD-SPSA solution achieves comparable time-domain NMSE and frequency-domain ACPR performance.

Fig. 10. AM/AM and AM/PM plots for 20 MHz LTE signal with and without DPD. (Axes: normalized output magnitude and phase difference in degrees against normalized input magnitude; traces: AM/AM and AM/PM without DPD and with SD-SPSA DPD.)

Fig. 11. Output spectra for 20 MHz LTE signal. (Axes: normalized power spectral density in dB against frequency offset in MHz; traces: without DPD, with SD-SPSA DPD, with least squares DPD.)

Fig. 12. Output spectra comparison for 60 MHz, 12-carrier UMTS signal. (Axes: normalized power spectral density in dB against frequency offset in MHz; traces: without DPD, with SD-SPSA DPD, with least squares DPD.)

TABLE IV
SD-SPSA PERFORMANCE FOR 60 MHz UMTS SIGNAL

Scenario                   NMSE (dB)    ACPR (dB)
                                        -5 MHz     +5 MHz
No DPD                     -9.92        -35.6      -3.1
SD-SPSA (10 runs)          -39.23       -51.32     -51.6
Least Squares (10 runs)    -39.66       -51.64     -52.1

We also conducted tests using the LMS and RLS algorithms. Due to the limited number of training samples available, LMS could not converge properly and thus the performance is very poor, with NMSE only reaching -3 dB.
RLS can achieve similar performance to the SPSA, but RLS requires more complex operations at each iteration.

2) Performance with 60 MHz UMTS Signal

To test performance in a wideband extraction scenario, a 60 MHz 12-carrier UMTS signal was used. The signal had 6.5 dB PAPR and was applied in the test bench setup in Fig. 9 with an LDMOS Doherty PA with an average output power of 34 dBm. A DVR predistortion model with S=8 and an increased memory length of M=5 was used to account for increased memory effects due to the wider bandwidth. The SD-SPSA algorithm was band-limited to an observation bandwidth of 14 MHz. Linearization performance is reported in Table IV for DPD with closed-loop coefficient estimation using both conventional LS and SD-SPSA. Both techniques are shown to achieve similar linearization performance, reducing the ACPR by approximately 20 dB and the NMSE by approximately 30 dB. Fig. 12 reports the measured spectra at the PA output for the wideband signal, confirming the strong linearization performance of the SD-SPSA algorithm, directly comparable to the LS estimation.

VII. CONCLUSION

This work presents a novel DPD model extraction solution based on a stochastic optimization technique incorporated in a closed-loop coefficient estimation architecture. The proposed algorithm avoids computationally intensive Hessian and gradient calculations, instead using loss function measurements to approximate the gradient and iteratively update the coefficient estimate. The proposed closed-loop technique also avoids generating a second set of model terms, as required in indirect learning structures. This further reduces the system complexity, making it well suited to low-cost FPGA implementation. Experimental results show that the proposed approach achieves excellent linearization performance, with accuracy comparable to that achieved using the conventional LS method.
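The idea of approximating the gradient from loss function measurements alone can be illustrated with the standard first-order SPSA recursion of Spall [14], [15]. The sketch below is that textbook form, not the modified SD-SPSA algorithm proposed in this work; the gain values, the toy quadratic loss and the names `spsa_minimize`, `big_a` and `c_true` are assumptions chosen for the example.

```python
import numpy as np

def spsa_minimize(loss, c0, n_iter=500, a=0.5, c=0.1, big_a=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Standard SPSA: two loss measurements per iteration, no explicit gradient."""
    rng = np.random.default_rng(seed)
    c_hat = np.array(c0, dtype=float)
    for k in range(n_iter):
        a_k = a / (k + 1 + big_a) ** alpha        # decaying step-size gain
        c_k = c / (k + 1) ** gamma                # decaying perturbation gain
        delta = rng.choice([-1.0, 1.0], size=c_hat.shape)  # Bernoulli +/-1 perturbation
        l_plus = loss(c_hat + c_k * delta)
        l_minus = loss(c_hat - c_k * delta)
        # Simultaneous perturbation gradient estimate (elementwise division by delta).
        g_hat = (l_plus - l_minus) / (2.0 * c_k * delta)
        c_hat = c_hat - a_k * g_hat
    return c_hat

# Toy linear-in-parameters problem: the loss is quadratic in the coefficients.
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 4))
c_true = np.array([1.0, -0.5, 0.25, 2.0])
b = A @ c_true
loss = lambda c_vec: np.mean((A @ c_vec - b) ** 2)

c_est = spsa_minimize(loss, np.zeros(4))
print(loss(np.zeros(4)), loss(c_est))
```

Each iteration costs two loss evaluations regardless of the number of coefficients, and no matrix inversion is involved, which is the property that makes this family of algorithms attractive for low-complexity hardware.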

APPENDIX

The quadratic interpolation SPSA update equation in (1) can be developed as follows. For a linear-in-parameters model, NMSE is a quadratic function of the model coefficients [13]. Thus, for a given iteration $k$ of the SPSA algorithm, a 2-dimensional section of the loss function can be defined along a given direction (determined by the perturbation sequence) in terms of any one of the model coefficients $\hat{C}_{k,p}$ as:

$$f(\hat{C}_{k,p}) = \phi_{1,k,p}\,\hat{C}_{k,p}^{2} + \phi_{2,k,p}\,\hat{C}_{k,p} + \phi_{3,k,p} \tag{A.1}$$

where the unique quadratic function parameters $\phi_{1,k,p}$, $\phi_{2,k,p}$, and $\phi_{3,k,p}$ exist for each model coefficient $p$ and iteration $k$. The minimum of (A.1) represents the best performance that can be achieved by varying the weighting factor for a given set of fixed coefficient and perturbation vectors. Finding the coefficient set corresponding to the quadratic minimum can be formulated as a Newton-based optimization problem. For a simple quadratic minimization problem, Newton's method is given by:

$$\hat{C}_{n+1} = \hat{C}_{n} - \frac{f'(\hat{C}_{n})}{f''(\hat{C}_{n})} \tag{A.2}$$

where $\hat{C}_{n+1}$ is the updated optimum parameter estimate, $\hat{C}_{n}$ the current parameter set, and $f'(\hat{C}_{n})$, $f''(\hat{C}_{n})$ are the first and second derivatives of the function $f(\hat{C}_{n})$ for which the minimum is sought. For the quadratic in (A.1), the first and second derivatives are given by:

$$f'(\hat{C}_{k,p}) = 2\phi_{1,k,p}\,\hat{C}_{k,p} + \phi_{2,k,p} \tag{A.3}$$

and

$$f''(\hat{C}_{k,p}) = 2\phi_{1,k,p}. \tag{A.4}$$

Newton's method is derived from a second-order truncated Taylor series, so only a single iteration is needed to reach the quadratic minimum:

$$\arg\min_{\hat{C}_{k,p}} f(\hat{C}_{k,p}) = \hat{C}_{k,p} - \frac{2\phi_{1,k,p}\,\hat{C}_{k,p} + \phi_{2,k,p}}{2\phi_{1,k,p}}. \tag{A.5}$$

Substituting for $\phi_{1,k,p}$ and $\phi_{2,k,p}$ using the algebraic expressions in terms of the three measured points $(\hat{C}^{+}_{k,p}, L(\hat{C}^{+}_{k}))$, $(\hat{C}_{k,p}, L(\hat{C}_{k}))$, and $(\hat{C}^{-}_{k,p}, L(\hat{C}^{-}_{k}))$, (A.5) is given by:

$$\arg\min_{\hat{C}_{k,p}} f(\hat{C}_{k,p}) = \hat{C}_{k,p} - \Delta_{k,p}\,\frac{L(\hat{C}^{+}_{k}) - L(\hat{C}^{-}_{k})}{L(\hat{C}^{+}_{k}) + L(\hat{C}^{-}_{k}) - 2L(\hat{C}_{k})}. \tag{A.6}$$

The quadratic interpolation SPSA algorithm generates the updated optimum coefficient estimate using the interpolated quadratic minimum for each term in the coefficient vector.
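The single Newton step behind (A.5) and (A.6) can be checked numerically with a short sketch. Here the two perturbed points are assumed to sit at a distance `delta` on either side of the current coefficient; how `delta` relates to the perturbation in (A.6) depends on the scaling defined in the main text, so the constant factor in front of the ratio may differ. The function name and the test quadratic are hypothetical.

```python
def quadratic_min_step(x0, delta, l_minus, l_0, l_plus):
    """Minimum of the parabola through (x0-delta, l_minus), (x0, l_0), (x0+delta, l_plus)."""
    curvature = l_plus + l_minus - 2.0 * l_0   # equals 2 * phi_1 * delta**2
    if curvature <= 0.0:
        raise ValueError("the three points do not define a convex parabola")
    slope = l_plus - l_minus                   # equals 2 * delta * f'(x0)
    # One Newton step: x0 - f'(x0) / f''(x0), written with loss measurements only.
    return x0 - 0.5 * delta * slope / curvature

# Check against a known quadratic with its minimum at x = 1.5.
f = lambda x: 3.0 * (x - 1.5) ** 2 + 0.2
x0, delta = 0.4, 0.25
x_min = quadratic_min_step(x0, delta, f(x0 - delta), f(x0), f(x0 + delta))
print(x_min)
```

Only three loss measurements and one division are needed per coefficient direction, and the quadratic parameters themselves never have to be computed explicitly.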
Thus the complete update algorithm at each iteration is a generalized version of (A.6):

$$\hat{C}_{k+1} = \hat{C}_{k} - \Delta_{k}\,\frac{L(\hat{C}^{+}_{k}) - L(\hat{C}^{-}_{k})}{L(\hat{C}^{+}_{k}) + L(\hat{C}^{-}_{k}) - 2L(\hat{C}_{k})} \tag{A.7}$$

where $\Delta_{k}$ and $\hat{C}_{k}$ are the complete perturbation and coefficient vectors respectively.

REFERENCES

[1] J. Wood, Behavioral Modeling and Linearization of RF Power Amplifiers. Norwood, MA: Artech House, 2014.
[2] F. Luo, Digital Front-End in Wireless Communications and Broadcasting. Cambridge, U.K.: Cambridge Univ., 2011.
[3] F. M. Ghannouchi and O. Hammi, "Behavioral modeling and predistortion," IEEE Microw. Mag., vol. 10, no. 7, pp. 52-64, Dec. 2009.
[4] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, "A generalized memory polynomial model for digital predistortion of RF power amplifiers," IEEE Trans. Signal Process., vol. 54, no. 10, pp. 3852-3860, Oct. 2006.
[5] A. Zhu, J. C. Pedro, and T. J. Brazil, "Dynamic deviation reduction based Volterra behavioral modeling of RF power amplifiers," IEEE Trans. Microw. Theory Techn., vol. 54, no. 12, pp. 4323-4332, Dec. 2006.
[6] L. Ding, G. T. Zhou, D. R. Morgan, Z. Ma, J. S. Kenney, J. Kim, and C. R. Giardina, "A robust digital baseband predistorter constructed using memory polynomials," IEEE Trans. Commun., vol. 52, no. 1, pp. 159-165, Jan. 2004.
[7] A. Zhu, P. J. Draxler, J. J. Yan, T. J. Brazil, D. F. Kimball, and P. M. Asbeck, "Open-loop digital predistorter for RF power amplifiers using dynamic deviation reduction-based Volterra series," IEEE Trans. Microw. Theory Techn., vol. 56, no. 7, pp. 1524-1534, Jul. 2008.
[8] C. Eun and E. J. Powers, "A new Volterra predistorter based on the indirect learning architecture," IEEE Trans. Signal Process., vol. 45, no. 1, pp. 223-227, Jan. 1997.
[9] L. Guan and A. Zhu, "Optimized low-complexity implementation of least squares based model extraction for digital predistortion of RF power amplifiers," IEEE Trans. Microw. Theory Techn., vol. 60, no. 3, pp. 594-603, Mar. 2012.
[10] N. Kelly and A. Zhu, "A modified simultaneous perturbation stochastic optimization algorithm for digital predistortion model extraction," Int. Integr. Nonlinear Microw. Millimetre-Wave Circuits Workshop (INMMIC), Taormina, Italy, Oct. 2015, pp. 1-3.
[11] A. Zhu, "Decomposed vector rotation-based behavioral modeling for digital predistortion of RF power amplifiers," IEEE Trans. Microw. Theory Techn., vol. 63, no. 2, pp. 737-744, Feb. 2015.
[12] F. M. Ghannouchi, O. Hammi, and M. Helaoui, "Characterization and identification techniques," in Behavioral Modeling and Predistortion of Wideband Wireless Transmitters, 1st ed. London, U.K.: Wiley, 2015, ch. 8, pp. 17-183.
[13] P. S. R. Diniz, "Fundamentals of adaptive filtering," in Adaptive Filtering Algorithms and Practical Implementation, 3rd ed. New York: Springer, 2008.
[14] J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Trans. Automat. Contr., vol. 37, no. 3, pp. 332-341, Mar. 1992.
[15] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," Johns Hopkins APL Tech. Dig., vol. 19, no. 4, pp. 482-492, 1998.
[16] M. S. Muha, C. J. Clark, A. A. Moulthrop, and C. P. Silva, "Validation of power amplifier nonlinear block models," in IEEE MTT-S Int. Microw. Symp. Dig., 1999, vol. 2, pp. 759-762.
[17] A. V. Keerthi and P. Choudary, "Method and apparatus to optimize adaptive radio-frequency systems," U.S. Patent 258 591, Oct. 2, 2011.
[18] R. N. Braithwaite, "Closed-loop digital predistortion (DPD) using an observation path with limited bandwidth," IEEE Trans. Microw. Theory Techn., vol. 63, no. 2, pp. 726-736, Feb. 2015.
[19] L. Guan and A. Zhu, "Dual-loop model extraction for digital predistortion of wideband RF power amplifiers," IEEE Microw. Wireless Compon. Lett., vol. 21, no. 9, pp. 501-503, Sep. 2011.

[20] L. Guan and A. Zhu, "Low-cost FPGA implementation of Volterra series-based digital predistorter for RF power amplifiers," IEEE Trans. Microw. Theory Techn., vol. 58, no. 4, pp. 866-872, Apr. 2010.
[21] 7 Series FPGAs Overview, 1st ed., Xilinx, Inc., San Jose, CA, 2015.
[22] L. Guan, R. Kearney, C. Yu, and A. Zhu, "High performance digital predistortion test platform development for wideband RF power amplifiers," Int. J. Microw. Wireless Technologies, vol. 5, no. 2, pp. 149-162, Apr. 2013.

Noel Kelly (S'15) received the BE degree in electronic engineering from the School of Electrical and Electronic Engineering, University College Dublin (UCD), Ireland, in 2012 before joining the RF and Microwave Research Group at UCD. He is originally from Sligo, Ireland, and is currently working towards the PhD degree in electronic engineering. His research interests include low complexity digital predistortion architectures, efficient field programmable gate array (FPGA) implementation solutions and digital predistortion applications for satellite communications.

Anding Zhu (S'00-M'04-SM'12) received the B.E. degree in telecommunication engineering from North China Electric Power University, Baoding, China, in 1997, the M.E. degree in computer applications from the Beijing University of Posts and Telecommunications, Beijing, China, in 2000, and the Ph.D. degree in electronic engineering from University College Dublin (UCD), Dublin, Ireland, in 2004. He is currently a Senior Lecturer with the School of Electrical and Electronic Engineering, UCD. His research interests include high-frequency nonlinear system modeling and device characterization techniques, with a particular emphasis on behavioral modeling and linearization for RF power amplifiers (PAs). He is also interested in wireless and RF system design, digital signal processing, and nonlinear system identification algorithms.