
EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Editorial

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 HUT, Espoo, Finland
vesa.valimaki@hut.fi

Augusto Sarti
Dipartimento di Elettronica e Informazione, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milan, Italy
augusto.sarti@polimi.it

Matti Karjalainen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 HUT, Espoo, Finland
matti.karjalainen@hut.fi

Rudolf Rabenstein
Multimedia Communications and Signal Processing, University Erlangen-Nuremberg, 91058 Erlangen, Germany
rabe@lnt.de

Lauri Savioja
Laboratory of Telecommunications Software and Multimedia, Helsinki University of Technology, P.O. Box 5400, 02015 HUT, Espoo, Finland
lauri.savioja@hut.fi

Model-based sound synthesis has become one of the most active research topics in musical signal processing and in musical acoustics. The earliest attempts at generating musical sound with a physical model were made over three decades ago; the first commercial products appeared only some twenty years later. Recently, many refinements to previous signal processing algorithms and several new ones have been introduced. We have learned that new signal processing methods can still be devised, or old ones modified, to advance the field. Today there exist efficient model-based synthesis algorithms for many sound sources, while there are still some for which we do not have a good model. Certain issues, such as parameter estimation and real-time control, require further work for many model-based approaches. Finally, the ability of human listeners to perceive details in synthetic sound should be accounted for, much as in perceptual audio coding, in order to optimize the algorithms. The success and future of the model-based approach depend on researchers and the results of their work.
The roots of this special issue are in a European project called ALMA (Algorithms for the Modelling of Acoustic Interactions, IST; see polimi.it/alma/), in which the guest editors and their research teams collaborated from 2001 to 2004. The goal of the ALMA project was to develop an elegant, general, and unifying strategy for the blockwise design of physical models for sound synthesis. A divide-and-conquer approach was taken, in which the elements of the structure are individually modeled and discretized, while their interaction topology is separately designed and implemented in a dynamical and physically sound fashion. As a result, several high-quality demonstrations of virtual musical instruments played in a virtual environment were developed. During the ALMA project, the guest editors realized that this special issue could be created, since the field was very active but there had not been a special issue devoted to it for a long time. This EURASIP JASP special issue presents ten examples of recent research in model-based sound synthesis. The first two papers are related to keyboard instruments. First, Giordano and Jiang discuss physical modeling synthesis of the piano using the finite-difference approach. Then Välimäki et al.

show how to synthesize the sound of the harpsichord based on measurements of a real instrument. An efficient implementation using a visual software synthesis package is given for real-time synthesis. In the third paper, Trautmann and Rabenstein present a multirate implementation of a vibrating string model that is based on the functional transformation method. In the next paper, Testa et al. investigate the modeling of stiff string behavior. The dispersive wave phenomenon, perceivable as inharmonicity in many string instrument sounds, is studied by deriving different physically inspired models. In the fifth paper, Karjalainen and Erkut propose a very interesting and general solution to the problem of how to build composite models from digital waveguides and finite-difference time-domain blocks. The next contribution is from Guillemain, who proposes a real-time synthesis model of double-reed wind instruments based on a nonlinear physical model. The paper by Howard and Rimell provides a viewpoint quite different from the others in this special issue. It deals with the design and implementation of user interfaces for model-based synthesis. An important aspect is the incorporation of tactile feedback into the interface. Arroabarren and Carlosena have studied the modeling and analysis of human voice production, particularly the vibrato used in the singing voice. Source-filter modeling and sinusoidal modeling are compared to gain a deeper insight into these phenomena. Bensa et al. bring the discussion back to the physical modeling of musical instruments, with particular reference to the piano. They propose a source/resonator model of hammer-string interaction aimed at a realistic production of piano sound. Finally, Glass and Fukudome incorporate a plucked-string model into an audio coder for audio compression and instrument synthesis. The guest editors would like to thank all the authors for their contributions.
We would also like to express our deep gratitude to the reviewers for their diligent efforts in evaluating all submitted manuscripts. We hope that this special issue will stimulate further research work on model-based sound synthesis.

Vesa Välimäki
Augusto Sarti
Matti Karjalainen
Rudolf Rabenstein
Lauri Savioja

Vesa Välimäki was born in Kuorevesi, Finland. He received the M.S. degree in technology, the Licentiate of Science degree in technology, and the Doctor of Science degree in technology, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001–2002 he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. In August 2002 he returned to HUT, where he is currently Professor of audio signal processing. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to audio and music. Dr. Välimäki is a Senior Member of the IEEE Signal Processing Society and a member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Augusto Sarti, born in 1963, received the Laurea degree (1988, cum laude) and the Ph.D. degree (1993) in electrical engineering from the University of Padua, Italy, with research on nonlinear communication systems. He completed his graduate studies at the University of California at Berkeley, where he spent two years doing research on nonlinear system control and on motion planning of nonholonomic systems.
In 1993 he joined the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, where he is now an Associate Professor. His current research interests are in the area of digital signal processing, with particular focus on sound analysis, processing, and synthesis; image processing; video coding; and computer vision. Augusto Sarti has authored over 100 scientific publications. He leads the Image and Sound Processing Group (ISPG) at the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, which has contributed to numerous national projects and 8 European research projects. He is currently coordinating the IST European Project ALMA (Algorithms for the Modelling of Acoustic interactions) and is co-coordinating the IST European Project ORIGAMI (a new paradigm for high-quality mixing of real and virtual).

Matti Karjalainen was born in Hankasalmi, Finland. He received the M.S. and the Dr.Tech. degrees in electrical engineering from the Tampere University of Technology in 1970 and 1978, respectively. Since 1980 he has been a Professor of acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology his interest is in audio signal processing, such as DSP for sound reproduction, perceptually based signal processing, music DSP, and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition; perceptual auditory modeling and spatial hearing; DSP hardware, software, and programming environments; and various branches of acoustics, including musical acoustics and the modeling of musical instruments. He has written more than 300 scientific or engineering articles and contributed to organizing several conferences and workshops. Prof.
Karjalainen is an AES Fellow and a member of the IEEE (Institute of Electrical and Electronics Engineers), ASA (Acoustical Society of America), EAA (European Acoustics Association), ICMA (International Computer Music Association), ESCA (European Speech Communication Association), and several Finnish scientific and engineering societies.

Rudolf Rabenstein received the Diplom-Ingenieur and Doktor-Ingenieur degrees in electrical engineering and the Habilitation degree in signal processing, all from the University of Erlangen-Nuremberg, Germany, in 1981, 1991, and 1996, respectively. He worked with the Telecommunications Laboratory, University of Erlangen-Nuremberg, from 1981 to 1988. From 1988 to 1991, he was with the physics department of the University of Siegen, Germany. In 1991, he returned to the Telecommunications Laboratory of the University of Erlangen-Nuremberg. His research interests are in the fields of multidimensional systems theory, multimedia signal processing, and computer music. Rudolf Rabenstein is author or coauthor of more than 100 scientific publications, has contributed to various books and book chapters, and holds several patents in audio engineering. He is a board member of the School of Engineering of the Virtual University of Bavaria, Germany, and a member of several engineering societies.

Lauri Savioja works as a Professor in the Laboratory of Telecommunications Software and Multimedia at the Helsinki University of Technology (HUT), Finland. He received the Doctor of Science degree in technology in 1999 from the Department of Computer Science, HUT. His research interests include virtual reality, room acoustics, and human-computer interaction.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Physical Modeling of the Piano

N. Giordano
Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN, USA
ng@physics.purdue.edu

M. Jiang
Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN, USA
Department of Computer Science, Montana State University, Bozeman, MT 59715, USA
jiang@cs.montana.edu

Received 1 June 2003; Revised 7 October 2003

A project aimed at constructing a physical model of the piano is described. Our goal is to calculate the sound produced by the instrument entirely from Newton's laws. The structure of the model is described along with experiments that augment and test the model calculations. The state of the model and what can be learned from it are discussed.

Keywords and phrases: physical modeling, piano.

1. INTRODUCTION

This paper describes a long-term project by our group aimed at physical modeling of the piano. The theme of this volume, model-based sound synthesis of musical instruments, is quite broad, so it is useful to begin by discussing precisely what we mean by the term physical modeling. The goal of our project is to use Newton's laws to describe all aspects of the piano. We aim to use F = ma to calculate the motion of the hammers, strings, and soundboard, and ultimately the sound that reaches the listener. Of course, we are not the first group to take such a Newton's law approach to the modeling of a musical instrument. For the piano, there have been such modeling studies of the hammer-string interaction [1, 2, 3, 4, 5, 6, 7, 8, 9], string vibrations [8, 9, 10], and soundboard motion [11]. (Nice reviews of the physics of the piano are given in [12, 13, 14, 15].) There has been similar modeling of portions of other instruments (such as the guitar [16]), and of several other complete instruments, including the xylophone and the timpani [17, 18, 19].
Our work is inspired by and builds on this previous work. At this point, we should also mention how our work relates to other modeling work, such as the digital waveguide approach, which was recently reviewed in [20]. The digital waveguide method makes extensive use of physics in choosing the structure of the algorithm; that is, in choosing the proper filter(s) and delay lines, connectivity, and so forth, to properly match and mimic the Newton's law equations of motion of the strings, soundboard, and other components of the instrument. However, as far as we can tell, certain features of the model, such as hammer-string impulse functions and the transfer function that ultimately relates the sound pressure to the soundboard motion (and other similar transfer functions), are taken from experiments on real instruments. This approach is a powerful way to produce realistic musical tones efficiently, in real time and in a manner that can be played by a human performer. However, this approach cannot address certain questions. For example, it would not be able to predict the sound that would be produced if a radically new type of soundboard were employed, or if the hammers were covered with a completely different material than the conventional felt. The physical modeling method that we describe in this paper can address such questions. Hence, we view the ideas and methods embodied in the work of Bank and coworkers [20] (and the references therein) as complementary to the physical modeling approach that is the focus of our work. In this paper, we describe the route that we have taken to assembling a complete physical model of the piano. This complete model is really composed of interacting submodels which deal with (1) the motions of the hammers and strings and their interaction, (2) soundboard vibrations, and (3) sound generation by the vibrating soundboard.
For each of these submodels we must consider several issues, including selection and implementation of the computational algorithm, determination of the values of the many parameters that are involved, and testing the submodel. After considering each of the submodels, we then describe how they are combined to produce a complete computational piano. The

quality of the calculated tones is discussed, along with the lessons we have learned from this work. A preliminary and abbreviated report on this project was given in [21].

2. OVERALL STRATEGY AND GOALS

One of the first modeling decisions that arises is the question of whether to work in the frequency domain or the time domain. In many situations, it is simplest and most instructive to work in the frequency domain. For example, an understanding of the distribution of normal mode frequencies, and of the nature of the associated eigenvectors for the body vibrations of a violin or a piano soundboard, is very instructive. However, we have chosen to base our modeling in the time domain. We believe that this choice has several advantages. First, the initial excitation (in our case, the motion of a piano hammer just prior to striking a string) is described most conveniently in the time domain. Second, the interaction between various components of the instrument, such as the strings and soundboard, is somewhat simpler when viewed in the time domain, especially when one considers the early attack portion of a tone. Third, our ultimate goal is to calculate the room pressure as a function of time, so it is appealing to start in the time domain with the hammer motion and stay in the time domain throughout the calculation, ending with the pressure as would be received by a listener. Our time domain modeling is based on finite difference calculations [1] that describe all aspects of the instrument. A second element of strategy involves the determination of the many parameters that are required for describing the piano. Ideally, one would like to determine all of these parameters independently, rather than use them as fitting parameters when comparing the modeling results to real (measured) tones. This is indeed possible for all of the parameters.
For example, dimensional parameters such as the string diameters and lengths, soundboard dimensions, and bridge positions can all be measured from a real piano. Likewise, various material properties, such as the string stiffness, the elastic moduli of the soundboard, and the acoustical properties of the room in which the numerical piano is located, are well known from very straightforward measurements. For a few quantities, most notably the force-compression characteristics of the piano hammers, it is necessary to use separate (and independent) experiments. This brings us to a third element of our modeling strategy: the problem of how to test the calculations. The final output is the sound at the listener, so one could test the model by simply evaluating the sounds via listening tests. However, it is very useful to separately test the submodels. For example, the portion of the model that deals with soundboard vibrations can be tested by comparing its predictions for the acoustic impedance with direct measurements [11, 22, 23, 24]. Likewise, the room-soundboard computation can be compared with studies of sound production by a harmonically driven soundboard [25]. This approach, involving tests against specially designed experiments, has proven to be extremely valuable. The issue of listening tests brings us to the question of goals; that is, what do we hope to accomplish with such a modeling project? At one level, we would hope that the calculated piano tones are realistic and convincing. The model could then be used to explore what various hypothetical pianos would sound like. For example, one could imagine constructing a piano with a carbon fiber soundboard, and it would be very useful to be able to predict its sound ahead of time, or to use the model in the design of the new soundboard. On a different and more philosophical level, one might want to ask questions such as: what are the most important elements involved in making a piano sound like a piano?
We emphasize that it is not our goal to make a real-time model, nor do we wish to compete with the tones produced by other modeling methods, such as sampling synthesis and digital waveguide modeling [20].

3. STRINGS AND HAMMERS

Our model begins with a piano hammer moving freely with a speed v_h just prior to making contact with a string (or strings, since most notes involve more than one string). Hence, we ignore the mechanics of the action. This mechanics is, of course, quite important from a player's perspective, since it determines the touch and feel of the instrument [26]. Nevertheless, we will ignore these issues, since (at least to a first approximation) they are not directly relevant to the composition of a piano tone, and we simply take v_h as an input parameter. Typical values are in the range 1–4 m/s [9]. When a hammer strikes a string, there is an interaction force that is a function of the compression of the hammer felt, y_f. This force determines the initial excitation and is thus a crucial factor in the composition of the resulting tone. Considerable effort has been devoted to understanding the hammer-string force [1, 2, 3, 4, 5, 6, 7, 27, 28, 29, 30, 31, 32, 33]. Hammer felt is a very complicated material [34], and there is no first-principles expression for the hammer-string force relation F_h(y_f). Much work has assumed a simple power law function

F_h(y_f) = F_0 y_f^p, (1)

where the exponent p is typically in the range 2.5–4 and F_0 is an overall amplitude. This power law form seems to be at least qualitatively consistent with many experiments, and we therefore used (1) in our initial modeling calculations. While (1) has been widely used to analyze and interpret experiments, and also in previous modeling work, it has been known for some time that the force-compression characteristic of most real piano hammers is not a simple reversible function [7, 27, 28, 29, 30].
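As a quick illustration, the power law of (1) can be evaluated directly. This is a minimal sketch; the parameter values F0 and p below are illustrative placeholders, not values fitted in the paper.

```python
# Sketch of the power-law hammer-felt force of (1): F_h(y_f) = F0 * y_f**p.
# F0 and p here are illustrative placeholders, not fitted values.

def hammer_force(y_f, F0=1.0e9, p=3.0):
    """Force (N) for felt compression y_f (m); zero when not in contact."""
    if y_f <= 0.0:  # hammer has separated from the string
        return 0.0
    return F0 * y_f ** p

# With p = 3, doubling the compression raises the force by 2**p = 8.
ratio = hammer_force(2e-3) / hammer_force(1e-3)
```

One consequence of the nonlinearity, visible above, is that louder blows (larger compression) produce a disproportionately larger force, which is why hammer velocity strongly shapes piano timbre.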
Ignoring the hysteresis has seemed reasonable, since the magnitude of the irreversibility is often found to be small. Figure 1 shows the force-compression characteristic for a particular hammer (a Steinway hammer from the note middle C) measured in two different ways. In the type I measurement, the hammer struck a stationary force sensor and the resulting force and felt compression were measured as described in [31]. We see

Figure 1: Force-compression characteristics for a particular piano hammer, measured in two different ways. In the type I experiment (dotted curve), the hammer struck a stationary force sensor and the resulting force, F_h, and felt compression, y_f, were measured. The initial hammer velocity was approximately 1 m/s. The solid curve is the force-compression relation obtained in a type II measurement, in which the same hammer impacted a piano string. This behavior is described qualitatively by (2), with parameters p = 3.5, F_0 = N, ε = 0.9, and τ = s. The dashed arrows indicate compression/decompression branches.

that for a particular value of the felt compression, y_f, the force is larger during the compression phase of the hammer-string collision than during decompression. However, this difference is relatively small, generally no more than 1% of the total force. Provided that this hysteresis is ignored, the type I result is described reasonably well by the power law function (1) with p ≈ 3. However, we will see below that (1) is not adequate for our modeling work, and this has led us to consider other forms for F_h. In order to shed more light on the hammer-string force, we developed a new experimental approach, which we refer to as a type II experiment, in which the force and felt compression are measured as the hammer impacts a string [3, 35]. Since the string rebounds in response to the hammer, the hammer-string contact time in this case is considerably longer (by a factor of approximately 3) than in the type I measurement. The force-compression relation found in this type II measurement is also shown in Figure 1. In contrast to the type I measurements, the type II results for F_h(y_f) do not consist of two simple branches (one for compression and another for decompression).
Instead, the type II result exhibits loops, which arise for the following reason. When the hammer first contacts the string, it excites pulses that travel to the ends of the string, are reflected at the ends, and then return. These pulses return while the hammer is still in contact with the string, and since they are inverted by the reflection, they cause an extra series of compression/decompression cycles for the felt. There is considerable hysteresis during these cycles, much more than might have been expected from the type I result. The overall magnitude of the type II force is also somewhat smaller; the hammer is effectively softer under the type II conditions. Since the type II arrangement is the one found in a real piano, it is important to use this hammer-force characteristic in modeling. We have chosen to model our hysteretic type II hammer measurements following the proposal of Stulov [32, 33]. He has suggested the form

F_h(y_f(t)) = F_0 [ g(y_f(t)) − (ε/τ) ∫_0^t g(y_f(t′)) exp(−(t − t′)/τ) dt′ ]. (2)

Here, τ is a characteristic (memory) time scale associated with the felt, ε is a measure of the magnitude of the hysteresis, and y_f(t) is the variation of the compression with time. In other words, (2) says that the felt remembers its previous compression history over a time of order τ, and that the force is reduced according to how much the felt has been compressed during that period. The inherent nonlinearity of the hammer is specified by the function g(z); Stulov took this to be a power law

g(z) = z^p. (3)

Stulov has compared (2) to measurements with real hammers and reported very good agreement using τ, ε, p, and F_0 as fitting parameters. Our own tests of (2) have not shown such good agreement; we have found that it provides only a qualitative (and in some cases semiquantitative) description of the hysteresis shown in Figure 1 [35].
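Because the kernel in (2) is a decaying exponential, the memory integral can be updated recursively instead of being re-summed at every time step. The sketch below assumes a fixed step dt and uses illustrative parameter values (F0, p, eps, tau are placeholders, not the fitted values from the paper):

```python
import math

def stulov_force_series(y_f, dt, F0=1.0e9, p=3.0, eps=0.9, tau=1.0e-5):
    """Hysteretic hammer force in the spirit of (2)-(3), with g(z) = z**p.

    y_f : sequence of felt-compression samples (m), spaced dt (s) apart.
    The exponential memory integral is evaluated recursively:
        I_n = exp(-dt/tau) * I_{n-1} + dt * g(y_n)
    so each step costs O(1) work instead of O(n).
    """
    decay = math.exp(-dt / tau)
    integral = 0.0
    forces = []
    for y in y_f:
        g = max(y, 0.0) ** p            # power-law nonlinearity (3)
        integral = decay * integral + dt * g
        forces.append(F0 * (g - (eps / tau) * integral))
    return forces
```

Running this on a compression ramp that goes up and then back down reproduces the qualitative hysteresis of Figure 1: at the same felt compression, the force is smaller on the decompression branch, because the memory integral has accumulated the compression history.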
Nevertheless, it is currently the best mathematical description available for the hysteresis, and we have employed it in our modeling calculations. Our string calculations are based on the equation of motion [8, 10, 36]

∂²y/∂t² = c_s² [ ∂²y/∂x² − ε ∂⁴y/∂x⁴ ] − α_1 ∂y/∂t + α_2 ∂³y/∂t³, (4)

where y(x, t) is the transverse string displacement at time t and position x along the string. Here c_s = √(T/µ) is the wave speed for an ideal string (with stiffness and damping ignored), with T the tension and µ the mass per unit length of the string. When the parameters ε, α_1, and α_2 are zero, this is just the simple wave equation. Equation (4) describes only the polarization mode for which the string displacement is parallel to the initial velocity of the hammer. The other transverse mode and also the longitudinal mode are both ignored; experiments have shown that both of these modes are excited in real piano strings [37, 38, 39], but we will leave them for future modeling work. The term in (4) that is proportional to ε arises from the stiffness of the string. It turns out that c_s √ε = (r_s/2) √(E_s/ρ_s), where r_s, E_s, and ρ_s are the radius, Young's modulus, and density of the string, respectively [9, 36]. For typical piano strings, ε is of order 10⁻⁴, so the stiffness term in (4) is small, but it cannot be neglected, as it produces the well-known effect of stretched octaves [36]. Damping is accounted for with the terms involving α_1 and α_2; one of these terms is proportional to the string velocity, while the other is proportional to ∂³y/∂t³. This combination makes the damping dependent on frequency in a manner close to that observed experimentally [8, 10]. Our numerical treatment of the string motion employs a finite difference formulation in which both time t and position x are discretized in units Δt_s and Δx_s [8, 9, 10, 40]. The string displacement is then y(x, t) = y(iΔx_s, nΔt_s) ≡ y(i, n). If the derivatives in (4) are written in finite difference form, this equation can be rearranged to express the string displacement at each spatial location i at time step n + 1 in terms of the displacement at previous time steps, as described by Chaigne and Askenfelt [8, 10]. The equation of motion (4) does not contain the hammer force. This is included by the addition of a term on the right-hand side proportional to F_h, which acts at the hammer strike point. Since the hammer has a finite width, it is customary to spread this force over a small length of the string [8]. So far as we know, the details of how this force is distributed have never been measured; fortunately, our modeling results are not very sensitive to this factor (so long as the effective hammer width is qualitatively reasonable). With this approach to the string calculation, the need for numerical stability together with the desired frequency range requires that each string be treated as 50–100 vibrating numerical elements [8, 10].

4. THE SOUNDBOARD

Wood is a complicated material [41].
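Stepping back to the string model for a moment: a minimal sketch of the explicit update implied by the finite-difference form of (4) might look as follows. This is a simplified scheme, with the damping terms (α_1, α_2) and the hammer force omitted, and with illustrative grid values; it is not the exact scheme of Chaigne and Askenfelt.

```python
def string_step(y_prev, y_now, c, dt, dx, eps):
    """One explicit step for the stiff-string equation (4), damping omitted.

    y_prev, y_now : displacement lists at time steps n-1 and n.
    Returns the displacement at step n+1. The two points nearest each end
    are held fixed, a simple clamped-end treatment.
    """
    N = len(y_now)
    r = (c * dt / dx) ** 2          # Courant number squared
    y_next = [0.0] * N
    for i in range(2, N - 2):
        d2 = y_now[i + 1] - 2 * y_now[i] + y_now[i - 1]        # ~ y_xx * dx^2
        d4 = (y_now[i + 2] - 4 * y_now[i + 1] + 6 * y_now[i]
              - 4 * y_now[i - 1] + y_now[i - 2])               # ~ y_xxxx * dx^4
        y_next[i] = (2 * y_now[i] - y_prev[i]
                     + r * (d2 - (eps / dx ** 2) * d4))
    return y_next
```

The leapfrog structure (new displacement from the two previous time levels) is what the text means by expressing y(i, n + 1) in terms of earlier time steps; the hammer force would enter as an additional term on the right-hand side near the strike point.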
Soundboards are assembled from wood that is quarter sawn, which means that two of the principal axes of the elastic constant tensor lie in the plane of the board. The equation of motion for such a thin orthotropic plate is [11, 22, 23, 24]

ρ_b h_b ∂²z/∂t² = −D_x ∂⁴z/∂x⁴ − (D_x ν_y + D_y ν_x + 4D_xy) ∂⁴z/∂x²∂y² − D_y ∂⁴z/∂y⁴ + F_s(x, y) − β ∂z/∂t, (5)

where the rigidity factors are

D_x = E_x h_b³ / [12(1 − ν_x ν_y)],
D_y = E_y h_b³ / [12(1 − ν_x ν_y)],
D_xy = G_xy h_b³ / 12. (6)

Here, our board lies in the x-y plane and z is its displacement. (These x and y directions are, of course, not the same as the x and y coordinates used in describing the string motion.) The soundboard coordinates x and y run perpendicular and parallel to the grain of the board. E_x and ν_x are Young's modulus and Poisson's ratio for the x direction, and so forth for y; G_xy is the shear modulus, h_b is the board thickness, and ρ_b is its density. The values of all elastic constants were taken from [41]. In order to model the ribs and bridges, the thickness and rigidity factors are position dependent (since these factors are different at the ribs and bridges than on the bare board), as described in [11]. There are also some additional terms that enter the equation of motion (5) at the ends of the bridges [11, 17, 18, 43]. F_s(x, y) is the force from the strings on the bridge. This force acts at the appropriate bridge location; it is proportional to the component of the string tension perpendicular to the plane of the board, and is calculated from the string portion of the model. Finally, we include a loss term proportional to the parameter β [11]. The physical origin of this term involves elastic losses within the board. We have not attempted to model this physics according to Newton's laws, but have simply chosen a value of β which yields a quality factor for the soundboard modes similar to that observed experimentally [11, 42].
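The rigidity factors of (6) translate directly into code. The sample values in the test are illustrative, not the handbook values of [41]:

```python
def rigidity_factors(E_x, E_y, G_xy, nu_x, nu_y, h_b):
    """Rigidity factors of (6) for an orthotropic plate of thickness h_b.

    E_x, E_y : Young's moduli across and along the grain (Pa)
    G_xy     : shear modulus (Pa)
    nu_x, nu_y : Poisson's ratios
    h_b      : plate thickness (m)
    """
    denom = 12.0 * (1.0 - nu_x * nu_y)
    D_x = E_x * h_b ** 3 / denom
    D_y = E_y * h_b ** 3 / denom
    D_xy = G_xy * h_b ** 3 / 12.0
    return D_x, D_y, D_xy
```

Note the strong h_b³ dependence: the position-dependent thickness at the ribs and bridges changes the local rigidity by a large factor, which is why the text emphasizes making these factors position dependent.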
Finally, we note that the soundboard acts back on the strings, since the bridge moves and the strings are attached to the bridge. Hence, the interaction of strings in a unison group, and also sympathetic string vibrations (with the dampers disengaged from the strings), are included in the model. For the solution of (5), we again employed a finite difference algorithm. The space dimensions x and y were discretized, both in steps of size Δx_b; this spatial step need not be related to the step size for the string, Δx_s. As in our previous work on soundboard modeling [11], we chose Δx_b = 2 cm, since this is just small enough to capture the structure of the board, including the widths of the ribs and bridges. Hence, the board was modeled as 100 × 100 vibrating elements. The behavior of our numerical soundboard can be judged by calculations of the mechanical impedance, Z, as defined by

Z = F / v_b, (7)

where F is an applied force and v_b is the resulting soundboard velocity. Here, we assume that F is a harmonic (single-frequency) force applied at a point on the bridge, and v_b is measured at the same point. Figure 2 shows results calculated from our model [11] for the soundboard of an upright piano. Also shown are measurements for a real upright soundboard (with the same dimensions, bridge positions, etc., as in the model). The agreement is quite acceptable, especially considering that parameters such as the dimensions of the soundboard, the position and thickness of the ribs and bridges, and the elastic constants of the board were taken

¹ In principle, one might expect the soundboard losses to be frequency dependent, as found for the string. At present there is no good experimental data on this question, so we have chosen the simplest possible model, with just a single loss term in (5).
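The drive-point impedance of (7) can be estimated from simulated (or measured) force and velocity records by projecting both onto the drive frequency. This is a generic sketch of that projection, not the paper's measurement procedure:

```python
import cmath
import math

def drive_point_impedance(force, velocity, f0, fs):
    """Estimate the mechanical impedance Z = F / v_b of (7) at the drive
    frequency f0 (Hz), given sampled force and velocity records taken at
    sample rate fs (Hz). Both records are projected onto exp(-i*2*pi*f0*t),
    so the magnitude and phase of Z at f0 are recovered."""
    n = len(force)
    phasor = [cmath.exp(-2j * math.pi * f0 * k / fs) for k in range(n)]
    F = sum(f * ph for f, ph in zip(force, phasor))
    V = sum(v * ph for v, ph in zip(velocity, phasor))
    return F / V
```

For a clean estimate the records should span an integer number of drive periods; otherwise a window function would be needed to suppress leakage.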

Figure 2: Calculated (solid curve) and measured (dotted curve) mechanical impedance for an upright piano soundboard. Here, the force was applied and the board velocity was measured at the point where the string for middle C crosses the bridge. Results from [11, 42].

from either direct measurements or handbook values (e.g., Young's modulus).

5. THE ROOM

Our time domain room modeling follows the work of Botteldooren [44, 45]. We begin with the usual coupled equations for the velocity and pressure in the room:

∂p/∂t = −ρ_a c_a² [ ∂v_x/∂x + ∂v_y/∂y + ∂v_z/∂z ],
ρ_a ∂v_x/∂t = −∂p/∂x, ρ_a ∂v_y/∂t = −∂p/∂y, ρ_a ∂v_z/∂t = −∂p/∂z, (8)

where p is the pressure, the velocity components are v_x, v_y, and v_z, ρ_a is the density, and c_a is the speed of sound in air. This family of equations is similar in form to an electromagnetic problem, and much is known about how to deal with it numerically. We employ a finite difference approach in which staggered grids in both space and time are used for the pressure and velocity. Given a time step Δt_r, the pressure is computed at times nΔt_r while the velocity is computed at times (n + 1/2)Δt_r. A similar staggered grid is used for the space coordinates, with the pressure calculated on the grid iΔx_r, jΔx_r, kΔx_r, while v_x is calculated on the staggered grid (i + 1/2)Δx_r, jΔx_r, and kΔx_r. The grids for v_y and v_z are arranged in a similar manner, as explained in [44, 45].

Sound is generated in this numerical room by the vibration of the soundboard. We situate the soundboard from the previous section on a plane perpendicular to the z direction in the room, approximately 1 m from the nearest parallel wall (i.e., the floor). At each time step the velocity v_z of the room air at the surface of the soundboard is set to the calculated soundboard velocity at that instant, as obtained from the soundboard calculation. The room is taken to be a rectangular box with the same acoustical properties for all 6 walls. The walls of the room are modeled in terms of their acoustic impedance, Z, with

p = Z v_n, (9)

where v_n is the component of the (air) velocity normal to the wall [46]. Measurements of Z for a number of materials [47] have found that it is typically frequency dependent, with the form

Z(ω) ≈ Z_0 − i Z_1 ω, (10)

where ω is the angular frequency. Incorporating this frequency domain expression for the acoustic impedance into our time domain treatment was done in the manner described in [45]. The time step for the room calculation was Δt_r = 1/ s, as explained in the next section. The choice of spatial step size Δx_r was then influenced by two considerations. First, in order for the finite difference algorithm to be numerically stable in three dimensions, one must have Δx_r/(√3 Δt_r) > c_a. Second, it is convenient for the spatial steps for the soundboard and room to be commensurate. In the calculations described below, the room step size was Δx_r = 4 cm, that is, twice the soundboard step size. When using the calculated soundboard velocity to obtain the room velocity at the soundboard surface, we averaged over 4 soundboard grid points for each room grid point. Typical numerical rooms were 3 × 4 × 4 m³, and thus contained about 10⁶ finite difference elements. Figure 3 shows results for the sound generation by an upright soundboard. Here, the soundboard was driven harmonically at the point where the string for middle C contacts the bridge, and we plot the sound pressure normalized by the board velocity at the driving point [25]. It is seen that the model results compare well with the experiments. This provides a check on both the soundboard and the room models.
PUTTING IT ALL TOGETHER

Our model involves several distinct but coupled subsystems (the hammers/strings, the soundboard, and the room), and it is useful to review how they fit together computationally. The calculation begins by giving some initial velocity to a particular hammer. This hammer then strikes a string (or strings), and they interact through either (1) or (2). This sets the string(s) for that note into motion,

Physical Modeling of the Piano

In addition, the model should transfer easily to a cluster (i.e., multi-CPU) machine. We have also explored an alternative approach to the room modeling involving ray tracing [48]. Ray tracing allows one to express the relationship between soundboard velocity and sound pressure as a multiparameter map, involving approximately 10⁴ parameters. The values of these parameters can be precalculated and stored, resulting in about an order of magnitude speed-up in the calculation as compared to the room algorithm described above.

Figure 3: Results for the sound pressure normalized by the soundboard velocity for an upright piano soundboard: calculated (solid curve) and measured (dotted curve). The board was driven at the point where the string for middle C crosses the bridge. Results from [25].

and these in turn act on the bridge and soundboard. As we have already mentioned, the vibrations of each component of our model are calculated with a finite difference algorithm, each with an associated time step. Since the systems are coupled (the strings drive the soundboard, the soundboard acts back on the strings, and the soundboard drives the room), it would be computationally simpler to use the same value of the time step for all three subsystems. However, the equation of motion for the soundboard is highly dispersive, and stability requirements demand a much smaller time step for the soundboard than is needed for the string and room simulations. Given the large number of room elements, this would greatly (and unnecessarily) slow down the calculation. We have therefore chosen instead to make the various time steps commensurate, with

Δt_r = (1/22,050) s,   Δt_s = Δt_r/4,   Δt_b = Δt_s/6,   (11)

where the subscripts correspond to the room (r), string (s), and soundboard (b).
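In code, the commensurate steps of (11) amount to three nested loops, one per subsystem. The sketch below uses stand-in counters for the actual finite difference updates (the function names are hypothetical; only the 4:1 and 6:1 step ratios come from the text):

```python
# Stand-ins for the real finite difference updates of the three subsystems.
counts = {"room": 0, "string": 0, "board": 0}

def step_board():  counts["board"] += 1    # advance soundboard by dt_b
def step_string(): counts["string"] += 1   # advance strings by dt_s
def step_room():   counts["room"] += 1     # advance room air by dt_r

N_ROOM_STEPS = 100                         # number of outer (room) steps

for _ in range(N_ROOM_STEPS):              # outer loop: room, step dt_r
    for _ in range(4):                     # dt_s = dt_r / 4
        for _ in range(6):                 # dt_b = dt_s / 6
            step_board()                   # dispersive board: smallest step
        step_string()
    step_room()

print(counts)  # {'room': 100, 'string': 400, 'board': 2400}
```

The expensive room update thus runs at the audio rate, while only the small soundboard system pays for the fine time resolution.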
To explain this hierarchy, we first note that the room time step is chosen to be compatible with common audio hardware and software; 1/Δt_r is commensurate with the data rates commonly used in CD sound formats. We then see that each room time step contains 4 string time steps; that is, the string algorithm makes 4 iterations for each iteration of the room model. Likewise, each string time step contains 6 soundboard steps. The overall computational speed is currently somewhat less than real time. With a typical personal computer (clock speed 1 GHz), a 1 minute simulation requires approximately 3 minutes of computer time. Of course, this gap will narrow in the future in accord with Moore's law.

7. ANALYSIS OF THE RESULTS: WHAT HAVE WE LEARNED AND WHERE DO WE GO NEXT?

In the previous section, we saw that a real-time Newton's-law simulation of the piano is well within reach. While such a simulation would certainly be interesting, it is not a primary goal of our work. We instead wish to use the modeling to learn about the instrument. With that in mind, we now consider the quality of the tones calculated with the current version of the model. In our initial modeling, we employed power law hammers described by (1), with parameters based on type I hammer experiments by our group [31]. The results were disappointing: it is hard to describe the tones accurately in words, but they sounded distinctly plucked and somewhat metallic. While we cannot include our calculated sounds as part of this paper, they are available on our website. After many modeling calculations, we came to the conclusion that the hammer model (for example, the power law description (1)) was the problem. Note that we do not claim that power law hammers must always give unsatisfactory results. Our point is that when the power law parameters are chosen to fit the type I behavior of real hammers, the calculated tones are poor.
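The two felt descriptions being contrasted can be sketched side by side. Below, the power law force and a Stulov-style hereditary force are implemented with made-up parameter values (k, p, eps, tau are purely illustrative, not the fitted values discussed in the text); the point is that the hereditary memory term makes the loading and unloading forces differ at equal compression, i.e., hysteresis:

```python
import math

def power_law_force(x, k=1e9, p=2.5):
    # Memoryless power law felt model in the spirit of Eq. (1): F = k * x**p.
    return k * x ** p

def stulov_force(xs, dt, k=1e9, p=2.5, eps=0.9, tau=1e-5):
    # Hereditary (hysteretic) felt model in the spirit of Eq. (2):
    # F(t) = k * [x(t)**p - (eps/tau) * int_0^t x(u)**p * exp(-(t-u)/tau) du].
    # The exponential memory integral is updated recursively, sample by sample.
    decay = math.exp(-dt / tau)
    mem = 0.0
    forces = []
    for x in xs:
        xp = x ** p
        mem = decay * mem + dt * xp        # running convolution with exp kernel
        forces.append(k * (xp - (eps / tau) * mem))
    return forces

# Symmetric compression pulse: equal compressions on the way in and out.
dt = 1e-6
xs = [1e-3 * math.sin(math.pi * n / 200) for n in range(201)]
f = stulov_force(xs, dt)

# Hysteresis: at equal compression (samples 60 and 140), the loading force
# exceeds the unloading force, unlike the memoryless power law.
print(f[60] > f[140])   # True
```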
It is certainly possible (and indeed, likely) that power law parameters that yield good piano tones can be found. However, based on our experience, it seems that these parameters should be viewed as fitting parameters, as they may not accurately describe any real hammers. This led us to the type II hammer experiments described above, and to a description of the hammer-string force in terms of the Stulov function (2), with parameters (τ, ε, etc.) taken from these type II experiments [35]. The results were much improved. While they are not yet Steinway quality, it is our opinion that the calculated tones could be mistaken for a real piano. In that sense, they pass a sort of acoustical Turing test. Our conclusion is that the hammers are an essential part of the instrument. This is hardly a revolutionary result. However, based on our modeling, we can also make a somewhat stronger statement: in order to obtain a realistic piano tone, the modeling should be based on hammer parameters observed in type II measurements, with the hysteresis included in the model. There are a number of issues that we plan to address in the future. (1) The hammer portion of the model still needs attention. Our experiments [35] indicate that while the Stulov function does provide a qualitative description of

the hammer force hysteresis, there are significant quantitative differences. It may be necessary to develop a better functional description to replace the Stulov form. (2) As it currently stands, our string model includes only one polarization mode, corresponding to vibrations parallel to the initial hammer velocity. It is well known that the other transverse polarization mode can be important [37]. This can be readily included, but will require a more general soundboard model, since the two transverse modes couple through the motion of the bridge. (3) The soundboard of a real piano is supported by a case. Measurements in our laboratory indicate that the case acceleration can be as large as 5% or so of the soundboard acceleration, so the sound emitted by the case is considerable. (4) We plan to refine the room model. Our current room model is certainly a very crude approximation to a realistic room. Real rooms have wall coverings of various types (with differing values of the acoustic impedance), and contain chairs and other objects. At our current level of sophistication, it appears that the hammers are more of a limitation than the room model, but this may well change as the hammer modeling is improved.

In conclusion, we have made good progress in developing a physical model of the piano. It is now possible to produce realistic tones using Newton's laws with realistic and independently determined instrument parameters. Further improvements of the model seem quite feasible. We believe that physical modeling can provide new insights into the piano, and that similar approaches can be applied to other instruments.

ACKNOWLEDGMENTS

We thank P. Muzikar, T. Rossing, A. Tubis, and G. Weinreich for many helpful and critical discussions. We are also indebted to A. Korty, J. Winans II, J. Millis, S. Dietz, J. Jourdan, J. Roberts, and L. Reuff for their contributions to our piano studies.
This work was supported by the National Science Foundation (NSF) through grant PHY.

REFERENCES

[1] D. E. Hall, Piano string excitation in the case of small hammer mass, Journal of the Acoustical Society of America, vol. 79, no. 1.
[2] D. E. Hall, Piano string excitation II: General solution for a hard narrow hammer, Journal of the Acoustical Society of America, vol. 81.
[3] D. E. Hall, Piano string excitation III: General solution for a soft narrow hammer, Journal of the Acoustical Society of America, vol. 81.
[4] D. E. Hall and A. Askenfelt, Piano string excitation V: Spectra for real hammers and strings, Journal of the Acoustical Society of America, vol. 83, no. 4.
[5] D. E. Hall, Piano string excitation. VI: Nonlinear modeling, Journal of the Acoustical Society of America, vol. 9, no. 1, 199.
[6] H. Suzuki, Model analysis of a hammer-string interaction, Journal of the Acoustical Society of America, vol. 8, no. 4.
[7] X. Boutillon, Model for piano hammers: Experimental determination and digital simulation, Journal of the Acoustical Society of America, vol. 83.
[8] A. Chaigne and A. Askenfelt, Numerical simulations of piano strings. I. A physical model for a struck string using finite difference method, Journal of the Acoustical Society of America, vol. 95.
[9] A. Chaigne and A. Askenfelt, Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters, Journal of the Acoustical Society of America, vol. 95, no. 3.
[10] A. Chaigne, On the use of finite differences for musical synthesis. Application to plucked stringed instruments, Journal d'Acoustique, vol. 5, 199.
[11] N. Giordano, Simple model of a piano soundboard, Journal of the Acoustical Society of America, vol. 1.
[12] H. A. Conklin Jr., Design and tone in the mechanoacoustic piano. Part I.
Piano hammers and tonal effects, Journal of the Acoustical Society of America, vol. 99, no. 6.
[13] H. Suzuki and I. Nakamura, Acoustics of pianos, Applied Acoustics, vol. 3, 199.
[14] H. A. Conklin Jr., Design and tone in the mechanoacoustic piano. Part II. Piano structure, Journal of the Acoustical Society of America, vol. 1.
[15] H. A. Conklin Jr., Design and tone in the mechanoacoustic piano. Part III. Piano strings and scale design, Journal of the Acoustical Society of America, vol. 1, no. 3.
[16] B. E. Richardson, G. P. Walker, and M. Brooke, Synthesis of guitar tones from fundamental parameters relating to construction, Proceedings of the Institute of Acoustics, vol. 1, no. 1, 199.
[17] A. Chaigne and V. Doutaut, Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars, Journal of the Acoustical Society of America, vol. 11, no. 1.
[18] V. Doutaut, D. Matignon, and A. Chaigne, Numerical simulations of xylophones. II. Time-domain modeling of the resonator and of the radiated sound pressure, Journal of the Acoustical Society of America, vol. 14, no. 3.
[19] L. Rhaouti, A. Chaigne, and P. Joly, Time-domain modeling and numerical simulation of a kettledrum, Journal of the Acoustical Society of America, vol. 15, no. 6.
[20] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, Physically informed signal processing methods for piano sound synthesis: a research overview, EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, 2003.
[21] N. Giordano, M. Jiang, and S. Dietz, Experimental and computational studies of the piano, in Proc. 17th International Congress on Acoustics, vol. 4, Rome, Italy, September 2001.
[22] J. Kindel and I.-C. Wang, Modal analysis and finite element analysis of a piano soundboard, in Proc. 5th International Modal Analysis Conference, Union College, Schenectady, NY, USA.
[23] J.
Kindel, Modal analysis and finite element analysis of a piano soundboard, M.S. thesis, University of Cincinnati, Cincinnati, Ohio, USA.
[24] N. Giordano, Mechanical impedance of a piano soundboard, Journal of the Acoustical Society of America, vol. 13, no. 4, 1998.

[25] N. Giordano, Sound production by a vibrating piano soundboard: Experiment, Journal of the Acoustical Society of America, vol. 14, no. 3.
[26] A. Askenfelt and E. V. Jansson, From touch to string vibrations. II. The motion of the key and hammer, Journal of the Acoustical Society of America, vol. 9, no. 5.
[27] T. Yanagisawa, K. Nakamura, and H. Aiko, Experimental study on force-time curve during the contact between hammer and piano string, Journal of the Acoustical Society of Japan, vol. 37.
[28] T. Yanagisawa and K. Nakamura, Dynamic compression characteristics of piano hammer, Transactions of Musical Acoustics Technical Group Meeting of the Acoustical Society of Japan, vol. 1, 198.
[29] T. Yanagisawa and K. Nakamura, Dynamic compression characteristics of piano hammer felt, Journal of the Acoustical Society of Japan, vol. 4.
[30] A. Stulov, Hysteretic model of the grand piano hammer felt, Journal of the Acoustical Society of America, vol. 97, no. 4.
[31] N. Giordano and J. P. Winans II, Piano hammers and their force compression characteristics: does a power law make sense?, Journal of the Acoustical Society of America, vol. 17, no. 4.
[32] N. Giordano and J. P. Millis, Hysteretic behavior of piano hammers, in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 37-4, Perugia, Umbria, Italy, September 2001.
[33] A. Stulov and A. Mägi, Piano hammer: Theory and experiment, in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 15, Perugia, Umbria, Italy, September 2001.
[34] J. I. Dunlop, Nonlinear vibration properties of felt pads, Journal of the Acoustical Society of America, vol. 88, 199.
[35] N. Giordano and J. P. Millis, Using physical modeling to learn about the piano: New insights into the hammer-string force, in Proc. International Congress on Acoustics, S. Furui, H. Kanai, and Y. Iwaya, Eds., pp.
III 113, Kyoto, Japan, April 2004.
[36] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA.
[37] G. Weinreich, Coupled piano strings, Journal of the Acoustical Society of America, vol. 6, no. 6.
[38] M. Podlesak and A. R. Lee, Dispersion of waves in piano strings, Journal of the Acoustical Society of America, vol. 83, no. 1.
[39] N. Giordano and A. J. Korty, Motion of a piano string: longitudinal vibrations and the role of the bridge, Journal of the Acoustical Society of America, vol. 1, no. 6.
[40] N. Giordano, Computational Physics, Prentice-Hall, Upper Saddle River, NJ, USA.
[41] V. Bucur, Acoustics of Wood, CRC Press, Boca Raton, Fla, USA.
[42] S. G. Lekhnitskii, Anisotropic Plates, Gordon and Breach Science Publishers, New York, NY, USA.
[43] J. W. S. Rayleigh, Theory of Sound, Dover, New York, NY, USA.
[44] D. Botteldooren, Acoustical finite-difference time-domain simulation in a quasi-Cartesian grid, Journal of the Acoustical Society of America, vol. 95, no. 5.
[45] D. Botteldooren, Finite-difference time-domain simulation of low-frequency room acoustic problems, Journal of the Acoustical Society of America, vol. 98, no. 6.
[46] P. M. Morse and K. U. Ingard, Theoretical Acoustics, Princeton University Press, Princeton, NJ, USA.
[47] L. L. Beranek, Acoustic impedance of commercial materials and the performance of rectangular rooms with one treated surface, Journal of the Acoustical Society of America, vol. 12, pp. 14-23, 1940.
[48] M. Jiang, Room acoustics and physical modeling of the piano, M.S. thesis, Purdue University, West Lafayette, Ind, USA.

N. Giordano obtained his Ph.D. at Yale University in 1977, and has since been at the Department of Physics at Purdue University. His research interests include mesoscopic and nanoscale physics, computational physics, and musical acoustics. He is the author of the textbook Computational Physics (Prentice-Hall, 1997).
He also collects and restores antique pianos.

M. Jiang has a B.S. degree in physics (1997) from Peking University, China, and M.S. degrees in both physics and computer science (1999) from Purdue University. Some of the work described in this paper was part of his physics M.S. thesis. After graduation, he worked as a software engineer for two years, developing Unix kernel software and device drivers. He then moved to Bozeman, Montana, where he is now pursuing a Ph.D. in computer science at Montana State University. Minghui's current research interests include the design of algorithms, computational geometry, and biological modeling and bioinformatics.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Sound Synthesis of the Harpsichord Using a Computationally Efficient Physical Model

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3, 15 HUT, Espoo, Finland
vesa.valimaki@hut.fi

Henri Penttinen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3, 15 HUT, Espoo, Finland
henri.penttinen@hut.fi

Jonte Knif
Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 51 Helsinki, Finland
jknif@siba.fi

Mikael Laurson
Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 51 Helsinki, Finland
laurson@siba.fi

Cumhur Erkut
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3, FIN-15 HUT, Espoo, Finland
cumhur.erkut@hut.fi

Received 4 June 2003; Revised 8 November 2003

A sound synthesis algorithm for the harpsichord has been developed by applying the principles of digital waveguide modeling. A modification to the loss filter of the string model is introduced that allows more flexible control of the decay rates of partials than is possible with a one-pole digital filter, which is the usual choice for the loss filter. A version of the commuted waveguide synthesis approach is used, where each tone is generated with a parallel combination of the string model and a second-order resonator that are excited with a common excitation signal. The second-order resonator, previously proposed for this purpose, approximately simulates the beating effect appearing in many harpsichord tones. The characteristic key-release thump terminating harpsichord tones is reproduced by triggering a sample that has been extracted from a recording. A digital filter model for the soundboard has been designed based on recorded bridge impulse responses of the harpsichord.
The output of the string models is injected into the soundboard filter, which imitates the reverberant nature of the soundbox and, particularly, the ringing of the short parts of the strings behind the bridge.

Keywords and phrases: acoustic signal processing, digital filter design, electronic music, musical acoustics.

1. INTRODUCTION

Sound synthesis is particularly interesting for acoustic keyboard instruments, since they are usually expensive and large and may require amplification during performances. Electronic versions of these instruments benefit from the fact that keyboard controllers using MIDI are commonly available and fit for use. Digital pianos imitating the timbre and features of grand pianos are among the most popular electronic instruments. Our current work focuses on the imitation of the harpsichord, which is expensive and relatively rare, but is still commonly used in music from the Renaissance and the baroque era. Figure 1 shows the instrument used in this study. It is a two-manual harpsichord that contains three individual sets of strings, two bridges, and a large soundboard.

Sound Synthesis of the Harpsichord Using a Physical Model

Figure 1: The harpsichord used in the measurements has two manuals, three string sets, and two bridges. The picture was taken during the tuning of the instrument in the anechoic chamber.

Instead of the wavetable and sampling techniques that are popular in digital instruments, we apply modeling techniques to design an electronic instrument that sounds nearly identical to its acoustic counterpart and faithfully responds to the player's actions, just as an acoustic instrument does. We use the modeling principle called commuted waveguide synthesis [1, 2, 3], but have modified it, because we use a digital filter to model the soundboard response. Commuted synthesis uses the basic property of linear systems that in a cascade of transfer functions their ordering can be changed without affecting the overall transfer function. This way, the complications in the modeling of the soundboard resonances extracted from a recorded tone can be hidden in the input sequence. In the original form of commuted synthesis, the input signal contains the contribution of the excitation mechanism (the quill plucking the string) and that of the soundboard with all its vibrating modes [4]. In the current implementation, the input samples of the string models are short (less than half a second) and contain only the initial part of the soundboard response; the tail of the soundboard response is reproduced with a reverberation algorithm.

Digital waveguide modeling [5] appears to be an excellent tool for the synthesis of harpsichord tones. A strong argument supporting this view is that tones generated using the basic Karplus-Strong algorithm [6] are reminiscent of the harpsichord for many listeners.¹ This synthesis technique has been shown to be a simplified version of a waveguide string model [5, 7]. However, this does not imply that realistic harpsichord synthesis is easy.
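For reference, the basic Karplus-Strong loop mentioned above fits in a few lines. This is the textbook algorithm [6], not the extended string model developed in this paper:

```python
from collections import deque
import random

def karplus_strong(f0, fs=44100, dur=1.0, seed=1):
    # Basic Karplus-Strong loop: a delay line filled with noise, with a
    # two-point moving average in the feedback path acting as the loss filter.
    rng = random.Random(seed)
    delay = int(fs / f0)                    # delay-line length sets the pitch
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(delay))
    out = []
    prev = 0.0
    for _ in range(int(fs * dur)):
        x = line.popleft()                  # y[n - L]
        y = 0.5 * (x + prev)                # average with y[n - L - 1]
        prev = x
        line.append(y)                      # feed back into the loop
        out.append(y)
    return out

tone = karplus_strong(220.0)                # roughly A3 at fs = 44100 Hz
print(len(tone))                            # 44100
```

The averaging filter attenuates high partials faster than low ones, which is exactly the behavior the loss filters discussed later shape more precisely.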
A detailed imitation of the properties of a fine instrument is challenging, even though the starting point is very promising. Careful modifications to the algorithm and proper signal analysis and calibration routines are needed for a natural-sounding synthesis. The new contributions to stringed-instrument models include a sparse high-order loop filter and a soundboard model that consists of the cascade of a shaping filter and a common reverb algorithm. The sparse loop filter consists of a conventional one-pole filter and a feedforward comb filter inserted in the feedback loop of a basic string model. Methods to calibrate these parts of the synthesis algorithm are proposed.

¹ The Karplus-Strong algorithm manages to sound something like the harpsichord in some registers only when a high sampling rate is used, such as 44.1 kHz or 22.05 kHz. At low sample rates, it sounds somewhat similar to violin pizzicato tones.

This paper is organized as follows. Section 2 gives a short overview of the construction and acoustics of the harpsichord. In Section 3, signal-processing techniques for synthesizing harpsichord tones are suggested. In particular, the new loop filter is introduced and analyzed. Section 4 concentrates on calibration methods to adjust the parameters according to recordings. The implementation of the synthesizer using a block-based graphical programming language is described in Section 5, where we also discuss the computational complexity and potential applications of the implemented system. Section 6 contains conclusions and suggests ideas for further research.

2. HARPSICHORD ACOUSTICS

The harpsichord is a stringed keyboard instrument with a long history dating back to at least the year 1440 [8]. It is the predecessor of the pianoforte and the modern piano. It belongs to the group of plucked string instruments due to its excitation mechanism.
In this section, we briefly describe the construction and the operating principles of the harpsichord and give details of the instrument used in this study. For a more in-depth discussion and description of the harpsichord, see, for example, [9, 10, 11, 12], and for a description of different types of harpsichord, the reader is referred to [1].

2.1. Construction of the instrument

The form of the instrument can be roughly described as triangular, and the oblique side is typically curved. A harpsichord has one or two manuals that control two to four sets of strings, also called registers or string choirs. Two of the string choirs are typically tuned in unison. These are called the 8′ (8-foot) registers. Often the third string choir is tuned an octave higher, and it is called the 4′ register. The manuals can be set to control different registers, usually with a limited number of combinations. This permits the player to use different registers with the left- and right-hand manuals, and therefore vary the timbre and loudness of the instrument. The 8′ registers differ from each other in the plucking point of the strings. Hence, the 8′ registers are called the 8′ back and front registers, where "back" refers to a plucking point away from the nut (and the player). The keyboard of the harpsichord typically spans four or five octaves, which became a common standard in the early 18th century. One end of each string is attached to the nut and the other to a long, curved bridge. The portion of the string behind the bridge is attached to a hitch pin, which is on top of the soundboard. This portion of the string also tends to vibrate for a long while after a key press, and it gives the instrument a reverberant feel. The nut is set on a very rigid wrest plank. The bridge is attached to the soundboard.

Figure 2: Overall structure of the harpsichord model for a single string. The model structure is identical for all strings in the three sets, but the parameter values and sample data are different.

Therefore, the bridge is mainly responsible for transmitting string vibrations to the soundboard. The soundboard is very thin, about 2 to 4 mm, and it is supported by several ribs installed in patterns that leave trapezoidal areas of the soundboard vibrating freely. The main function of the soundboard is to amplify the weak sound of the vibrating strings, but it also filters the sound. The soundboard forms the top of a closed box, which typically has a rose opening. It causes a Helmholtz resonance, the frequency of which is usually below 100 Hz [1]. In many harpsichords, the soundbox also opens to the manual compartment.

2.2. Operating principle

A plectrum (also called a quill) that is anchored onto a jack plucks the strings. The jack rests on a string, but there is a small piece of felt (called the damper) between them. One end of the wooden keyboard lever is located a small distance below the jack. As the player pushes down a key on the keyboard, the lever moves up. This action lifts the jack up and causes the quill to pluck the string. When the key is released, the jack falls back and the damper comes in contact with the string with the objective to dampen its vibrations. A spring mechanism in the jack guides the plectrum so that the string is not replucked when the key is released.

2.3. The harpsichord used in this study

The harpsichord used in this study (see Figure 1) was built by Jonte Knif (one of the authors of this paper) and Arno Pelto. It has the characteristics of harpsichords built in Italy and Southern Germany.
This harpsichord has two manuals and three string choirs, namely an 8′ back, an 8′ front, and a 4′ register. The instrument was tuned to the Vallotti tuning [13] with a fundamental frequency of 415 Hz for A4.² There are 56 keys from G1 to D6, which correspond to fundamental frequencies of about 46 Hz and 1.1 kHz, respectively, in the 8′ registers; the 4′ register is an octave higher, so the corresponding lowest and highest fundamental frequencies are about twice these values. The instrument is 240 cm long and 85 cm wide, and its strings are all made of brass. The plucking point changes from about 10% to about 50% of the string length in the bass and in the treble range, respectively. This produces a round timbre (i.e., weak even harmonics) in the treble range. In addition, the dampers have been left out of the last octave of the 4′ register to increase the reverberant feel during playing. The wood material used in the instrument has been heat treated to artificially accelerate the aging process of the wood.

² The tuning is considerably lower than the current standard (440 Hz or higher). This is typical of old musical instruments.

3. SYNTHESIS ALGORITHM

This section discusses the signal processing methods used in the synthesis algorithm. The structure of the algorithm is illustrated in Figure 2. It consists of five digital filters, two sample databases, and their interconnections. The physical model of a vibrating string is contained in block S(z). Its input is retrieved from the excitation signal database, and it can be modified during run-time with a timbre-control filter, which is a one-pole filter. In parallel with the string model, a second-order resonator R(z) is tuned to reproduce the beating of one of the partials, as proposed earlier by Bank et al. [14, 15]. While we could use more resonators, we have decided to target a maximally reduced implementation to minimize the computational cost and the number of parameters.
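The beating effect can be illustrated with plain second-order resonators. The sketch below impulse-excites two slightly detuned resonators and sums them, producing the slow amplitude beating that R(z) is meant to reproduce on one partial. All frequencies, decay times, and the detuning amount are invented for the example; the paper's parameters come from analysis of recorded tones:

```python
import math

def resonate(excitation, f_hz, tau_s, fs=44100):
    # All-pole second-order resonator: pole radius sets decay, angle sets pitch.
    r = math.exp(-1.0 / (tau_s * fs))
    w = 2.0 * math.pi * f_hz / fs
    a1, a2 = -2.0 * r * math.cos(w), r * r
    y1 = y2 = 0.0
    out = []
    for x in excitation:
        y = x - a1 * y1 - a2 * y2
        y2, y1 = y1, y
        out.append(y)
    return out

N = 44100
impulse = [1.0] + [0.0] * (N - 1)
partial = resonate(impulse, 440.0, 0.3)    # one decaying partial
shadow = resonate(impulse, 441.0, 0.3)     # detuned by 1 Hz -> 1 beat per second
mix = [p + s for p, s in zip(partial, shadow)]

# Near t = 0.5 s the two components are out of phase and nearly cancel.
null = sum(v * v for v in mix[21800:22300])
ref = sum(v * v for v in partial[21800:22300])
print(null < ref)   # True
```

In the synthesizer, only one such resonator runs in parallel with the string model, which is far cheaper than adding a second complete string.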
The sum of the string model and resonator output signals is fed through a soundboard filter, which is common to all strings. The tone corrector is an equalizer that shapes the spectrum of the soundboard filter output. By varying the coefficients g_release and g_sb, it is possible to adjust the relative levels of the string sound, the soundboard response, and the release sound. In the following, we describe the string model, the sample databases, and the soundboard model in detail, and discuss the need for modeling the dispersion of harpsichord strings.

3.1. Basic string model revisited

We use a version of the vibrating string filter model proposed by Jaffe and Smith [16]. It consists of a feedback loop, in which a delay line, a fractional delay filter, a high-order allpass filter, and a loss filter are cascaded. The delay line and the fractional delay filter determine the fundamental frequency of the tone. The high-order allpass filter [16] simulates dispersion, which

Figure 3: Structure of the proposed string model. The feedback loop contains a one-pole filter (the denominator of (1)), a feedforward comb filter called the ripple filter (the numerator of (1)), the rest of the delay line, a fractional delay filter F(z), and an allpass filter A_d(z) simulating dispersion.

is a typical characteristic of vibrating strings and which introduces inharmonicity in the sound. For the fractional delay filter, we use a first-order allpass filter, as originally suggested by Smith and Jaffe [16, 17]. This choice was made because it allows a simple and sufficient approximation of delay when a high sampling rate is used.³ Furthermore, there is no need to implement fundamental frequency variations (pitch bend) in harpsichord tones. Thus, the recursive nature of the allpass fractional delay filter, which can cause transients during pitch bends, is not harmful.

³ The sampling rate used in this work is 44,100 Hz.

The loss filter of waveguide string models is usually implemented as a one-pole filter [18], but here we use an extended version. The transfer function of the new loss filter is

H(z) = (b + r z^(−R)) / (1 + a z^(−1)),   (1)

where the scaling parameter b is defined as

b = g (1 + a),   (2)

R is the delay-line length of the ripple filter, r is the ripple depth, and a is the feedback gain. Figure 3 shows the block diagram of the string model with details of the new loss filter, which is seen to be composed of the conventional one-pole filter and a ripple filter in cascade. The total delay-line length in the feedback loop is 1 + R + L₁ plus the phase delay caused by the fractional delay filter F(z) and the allpass filter A_d(z). The overall loop gain is determined by the parameter g, which is usually selected to be slightly smaller than 1 to ensure the stability of the feedback loop.
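The first-order allpass fractional delay filter F(z) admits a one-line design: with A(z) = (a1 + z^-1)/(1 + a1 z^-1) and a1 = (1 - d)/(1 + d), the phase delay approximates d samples at low frequencies. This is the standard textbook design, shown here only to make the role of F(z) concrete; it is not the calibration procedure of the paper:

```python
import cmath

def allpass_coeff(d):
    # Classic first-order allpass fractional delay design:
    # A(z) = (a1 + z^-1) / (1 + a1 z^-1), with a1 = (1 - d) / (1 + d),
    # approximates a delay of d samples well at low frequencies.
    return (1.0 - d) / (1.0 + d)

def phase_delay(d, omega):
    # Phase delay of the allpass, in samples, at normalized frequency omega.
    a1 = allpass_coeff(d)
    zinv = cmath.exp(-1j * omega)
    h = (a1 + zinv) / (1.0 + a1 * zinv)
    return -cmath.phase(h) / omega

print(phase_delay(0.3, 0.01))   # approximately 0.3 samples
```

At the 44.1 kHz rate used here, musical fundamentals sit far below the Nyquist frequency, which is why this low-frequency approximation is sufficient.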
The feedback gain parameter a defines the overall lowpass character of the filter: a small positive value (e.g., a = 0.1) yields a mild lowpass filter, which causes high-frequency partials to decay faster than the low-frequency ones, as is natural. The ripple depth parameter r is used to control the deviation of the loss filter gain from that of the one-pole filter. (The sampling rate used in this work is 44,100 Hz.) The delay-line length R is determined as

    R = round(r_rate * L),    (3)

where r_rate is the ripple rate parameter that adjusts the ripple density in the frequency domain and L is the total delay length in the loop (in samples, or sampling intervals).

The ripple filter was developed because it was found that the magnitude response of the one-pole filter alone is overly smooth when compared to the required loop gain behavior for harpsichord sounds. Note that the ripple factor r in (1) increases the loop gain, but it is not accounted for in the scaling factor in (2). This is purposeful, because we find it useful that the loop gain oscillates symmetrically around the magnitude response of the conventional one-pole filter (obtained from (1) by setting r = 0). Nevertheless, it must be ensured somehow that the overall loop gain does not exceed unity at any of the harmonic frequencies; otherwise the system becomes unstable. It is sufficient to require that the sum g + |r| remains below one, or |r| < 1 - g. In practice, a slightly larger magnitude of r still results in a stable system when r < 0, because this choice decreases the loop gain at 0 Hz, and the conventional loop filter is a lowpass filter, so its gain at the harmonic frequencies is smaller than g. With small positive or negative values of r, it is possible to obtain wavy loop gain characteristics, where two neighboring partials have considerably different loop gains and thus decay rates.
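Equation (3) and the conservative stability bound above can be condensed into two small helpers (the numeric values in the checks are hypothetical):

```python
def ripple_delay_length(r_rate, L):
    """Eq. (3): delay-line length of the ripple filter, given the
    ripple rate and the total loop delay L (in samples)."""
    return int(round(r_rate * L))

def safely_stable(g, r):
    """Sufficient condition from the text: g + |r| < 1. A negative r
    may tolerate a slightly larger magnitude, as discussed above."""
    return g + abs(r) < 1.0
```

For example, `ripple_delay_length(0.5, 200)` gives R = 100, which makes the ripple alternate between neighboring even and odd partials.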
The frequency of the ripple is controlled by the parameter r_rate so that a value close to one results in a very slow wave, while a value close to 0.5 results in a fast variation where the loop gain for neighboring even and odd partials differs by about 2r (depending on the value of a). An example is shown in Figure 4, where the properties of a conventional one-pole loss filter are compared against the proposed ripply loss filter. Figure 4a shows that by adding a feedforward path with a small gain factor r, the loop gain characteristics can be made less regular. Figure 4b shows the corresponding reverberation time (T60) curve, which indicates how long it takes for each partial to decay by 60 dB. The T60 values are obtained by multiplying the time-constant values tau by 60/[20 log10(e)], or about 6.91.

Figure 4: The frequency-dependent (a) loop gain (magnitude response) and (b) reverberation time T60 determined by the loss filter. The dashed lines show the smooth characteristics of a conventional one-pole loss filter (g = 0.995, a = 0.5). The solid lines show the characteristics obtained with the ripply loss filter (g = 0.995, a = 0.5, r_rate = 0.5). The bold dots indicate the actual properties experienced by the partials of the synthetic tone (L = 200 samples, f1 = 220.5 Hz).

The time constants tau(k) for partial indices k = 1, 2, 3, ..., on the other hand, are obtained from the loop gain data G(k) as

    tau(k) = 1 / ( f1 ln[1/G(k)] ).    (4)

The loop gain sequence G(k) is extracted directly from the magnitude response of the loop filter at the fundamental frequency (k = 1) and at the other partial frequencies (k = 2, 3, 4, ...). Figure 4b demonstrates the power of the ripply loss filter: the second partial can be rendered to decay much more slowly than the first and the third partials. This is also perceived in the synthetic tone: soon after the attack, the second partial stands out as the loudest and longest-ringing partial. Formerly, this kind of flexibility has been obtained only with high-order loss filters [17, 19]. Still, the new filter has only two parameters more than the one-pole filter, and its computational complexity is comparable to that of a first-order pole-zero filter.

3.2. Inharmonicity

Dispersion is always present in real strings. It is caused by the stiffness of the string material. This property of strings gives rise to inharmonicity in the sound. An offspring of the harpsichord, the piano, is famous for its strongly inharmonic tones, especially in the bass range [9, 20]. This is due to the large elastic modulus and the large diameter of high-strength steel strings in the piano [9].
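The conversion of eq. (4) from loop gains to reverberation times can be sketched directly (the gains below are made up for the example):

```python
import numpy as np

def t60_from_loop_gain(G, f1):
    """Per-partial decay times from loop gains G(k) (0 < G(k) < 1)
    and fundamental frequency f1 in Hz, following eq. (4)."""
    G = np.asarray(G, dtype=float)
    tau = 1.0 / (f1 * np.log(1.0 / G))            # eq. (4): time constants (s)
    return tau * 60.0 / (20.0 * np.log10(np.e))   # -60 dB: T60 ~= 6.91 * tau

t60 = t60_from_loop_gain([0.999, 0.9995, 0.998], 220.5)
```

A loop gain closer to 1 yields a longer-ringing partial, which is exactly the lever the ripple filter uses to single out one partial.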
In waveguide models, inharmonicity is simulated with allpass filters [16, 21, 22, 23]. Naturally, it would be cost-efficient not to implement the inharmonicity, because then the allpass filter A_d(z) would not be needed at all. The inharmonicity of the recorded harpsichord tones was therefore investigated in order to find out whether it is relevant to model this property. The partials of recorded harpsichord tones were picked semiautomatically from the magnitude spectrum, and with a least-squares fit we estimated the inharmonicity coefficient B [22] for each recorded tone. The measured B values are displayed in Figure 5 together with the threshold of audibility and its 90% confidence intervals taken from listening test results [24]. It is seen that the B coefficient is above the mean threshold of audibility in all cases, but above 1400 Hz the measured values are within the confidence interval. Thus, it is not guaranteed that these cases actually correspond to audible inharmonicity. At low frequencies, in the case of the 19 lowest keys of the harpsichord, where the inharmonicity coefficients are about 10^-5, the inharmonicity is audible according to this comparison. It is thus important to implement the inharmonicity for the lowest two octaves or so, but it may also be necessary to implement the inharmonicity for the rest of the notes. This conclusion is in accordance with [1], where inharmonicity is stated to be part of the tonal quality of the harpsichord, and also with [1], where it is mentioned that the inharmonicity is less pronounced than in the piano.

3.3. Sample databases

The excitation signals of the string models are stored in a database from which they can be retrieved at onset time. The excitation sequences contain 20,000 samples (0.45 s),

Figure 5: Estimates of the inharmonicity coefficient B for all 56 keys of the harpsichord (circles connected with a thick line). Also shown are the threshold of audibility for the B coefficient (solid line) and its 90% confidence intervals (dashed lines) taken from [24].

Figure 6: Time-frequency plot of the harpsichord air radiation when the 8' bridge is excited. To exemplify the fast decay of the low-frequency modes, only the first 2 seconds and frequencies up to 400 Hz are displayed.

and they have been extracted from recorded tones by canceling the partials. The analysis and calibration procedure is discussed further in Section 4 of this paper. The idea is to include in these samples the sound of the quill scraping the string plus the beginning of the attack of the sound, so that a natural attack is obtained during synthesis and the initial levels of the partials are set properly. Note that this approach is slightly different from the standard commuted synthesis technique, where the full inverse-filtered recorded signal is used to excite the string model [18, 25]. In the latter case, all modes of the soundboard (or soundbox) are contained within the input sequence, and virtually perfect resynthesis is accomplished if the same parameters are used for inverse filtering and synthesis. In the current model, however, we have truncated the excitation signals by windowing them with the right half of a Hanning window. The soundboard response is much longer than that (several seconds), but imitating its ringing tail is taken care of by the soundboard filter (see the next subsection). In addition to the excitation samples, we have extracted short release sounds from recorded tones. One of these is retrieved and played each time a note-off command occurs.
Extracting these samples is easy: once a note is played, the player can wait until the string sound has completely decayed and then release the key. This way a clean recording of the noises related to the release event is obtained, and any extra processing is unnecessary. An alternative way would be to synthesize these knocking sounds using modal synthesis, as suggested in [26].

3.4. Modeling the reverberant soundboard and undamped strings

When a note is plucked on the harpsichord, the string vibrations excite the bridge and, consequently, the soundboard. The soundboard has its own modes depending on its size and the materials used. The radiated acoustic response of the harpsichord is reasonably flat over a frequency range from 50 to 2000 Hz [11]. In addition to exciting the air and structural modes of the instrument body, the pluck excites the part of the string that lies behind the bridge, the high modes of the low strings that the dampers cannot perfectly attenuate, and the highest octave of the 4' register strings. The resonance strings behind the bridge are about 6 to 20 cm long and have a very inharmonic spectral structure. The soundboard filter used in our harpsichord synthesizer (see Figure 2) is responsible for imitating all these features. However, as will be discussed further in Section 4.5, the lowest body modes can be ignored, since they decay fast and are present in the excitation samples. In other words, the modeling is divided into two parts: the soundboard filter models the reverberant tail, while the attack part is included in the excitation signal, which is fed to the string model. Reference [11] discusses the resonance modes of the harpsichord soundboard in detail. The radiated acoustic response of the harpsichord was recorded in an anechoic chamber by exciting the bridges (8' and 4') with an impulse hammer at multiple positions.
Figure 6 displays a time-frequency response of the 8' bridge when excited between the C3 strings, that is, approximately at the middle point of the bridge. The decay times at frequencies below 350 Hz are considerably shorter than in the frequency range from 350 to 1000 Hz. The T60 values in the respective bands are about 0.5 seconds and 4.5 seconds. This can be explained by the fact that the short string portions (note that the instrument used in this study does not have dampers in the last octave of the 4' register)

behind the bridge and the undamped strings resonate and decay slowly. As suggested by several authors, see, for example, [14, 27, 28], the impulse response of a musical instrument body can be modeled with a reverberation algorithm. Such algorithms were originally devised for imitating the impulse response of concert halls. In a previous work, we triggered a static sample of the body response with every note [29]. In contrast to the sample-based solution, which produces the same response every time, the reverberation algorithm produces additional variation in the sound: as the input signal of the reverberation algorithm is changed, or in this case as the key or register is changed, the temporal and frequency content of the output changes accordingly. The soundboard response of the harpsichord in this work is modeled with an algorithm presented in [30]. It is a modification of the feedback delay network [31], where the feedback matrix is replaced with a single coefficient and comb allpass filters have been inserted in the delay-line loops. A schematic view of the reverberation algorithm is shown in Figure 7. This structure is used because of its computational efficiency. The H_k(z) blocks represent the loss filters, the A_k(z) blocks are the comb allpass filters, and the delay lines are of length P_k. In this work, eight (N = 8) delay lines are implemented. One-pole lowpass filters are used as loss filters, which implement the frequency-dependent decay. The comb allpass filters increase the diffusion effect, and they all have the transfer function

    A_k(z) = (a_ap,k + z^-M_k) / (1 + a_ap,k z^-M_k),    (5)

where M_k are the delay-line lengths and a_ap,k are the allpass filter coefficients. To ensure stability, it is required that a_ap,k lies in (-1, 1).
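The comb allpass of (5) can be checked numerically: its magnitude response is identically 1, so it adds diffusion without coloring the frequency-dependent decay (the coefficient and length below are arbitrary):

```python
import numpy as np

def comb_allpass_response(a, M, w):
    """Frequency response of A(z) = (a + z^-M) / (1 + a*z^-M),
    the comb allpass used inside the delay-line loops (eq. (5))."""
    zM = np.exp(-1j * np.asarray(w, dtype=float) * M)
    return (a + zM) / (1.0 + a * zM)

w = np.linspace(0.0, np.pi, 512)
H = comb_allpass_response(0.5, 8, w)
```

Only the phase (and hence the echo density) is altered, which is why these sections can be nested inside the feedback loops without endangering stability as long as |a| < 1.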
In addition to the reverberation algorithm, a tone-corrector filter, as shown in Figure 2, is used to match the spectral envelope of the target response, that is, to suppress the low frequencies below 350 Hz and give some additional lowpass character at high frequencies. The choice of the parameters is discussed in Section 4.5.

4. CALIBRATION OF THE SYNTHESIS ALGORITHM

The harpsichord was brought into an anechoic chamber, where the recordings and the acoustic measurements were conducted. The registered signals enable the automatic calibration of the harpsichord synthesizer. This section describes the recordings, the signal analysis, and the calibration techniques for the string and soundboard models.

4.1. Recordings

Harpsichord tones were recorded in the large anechoic chamber of Helsinki University of Technology. Recordings were made with multiple microphones installed at a distance of about 1 m above the soundboard. The signals were recorded digitally (44.1 kHz, 16 bits) directly onto the hard disk, and to remove disturbances in the infrasonic range, they were highpass filtered. The highpass filter is a fourth-order Butterworth design with a cutoff frequency of 50 Hz, or 30 Hz for the lowest tones. The filter was applied to the signal in both directions to obtain zero-phase filtering. The recordings were compared in an informal listening test among the authors, and the signals obtained with a high-quality studio microphone by Schoeps were selected for further analysis. All 56 keys of the instrument were played separately with six different combinations of the registers that are commonly used. This resulted in 56 x 6 = 336 recordings. The tones were allowed to decay into silence, and the key release was included. The length of the single tones varied between 10 and 25 seconds, because the bass tones of the harpsichord tend to ring much longer than the treble tones.
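The zero-phase highpass cleanup of the recordings can be reproduced with standard tools; the cutoff below is a stand-in value, and the test signal is synthetic:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def infrasonic_cleanup(x, fs=44100.0, fc=30.0):
    """Fourth-order Butterworth highpass applied forward and backward
    (filtfilt) to obtain zero-phase filtering. fc is a placeholder
    cutoff, chosen per register in the text."""
    b, a = butter(4, fc / (fs / 2.0), btype='highpass')
    return filtfilt(b, a, x)

fs = 44100.0
t = np.arange(int(0.2 * fs)) / fs
x = 0.8 + np.sin(2.0 * np.pi * 1000.0 * t)   # 1 kHz tone with a DC offset
y = infrasonic_cleanup(x, fs)
```

Applying the filter in both directions squares the magnitude response but cancels the phase response, so transients in the tones are not smeared.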
For completeness, we recorded examples of different dynamic levels of different keys, although it is known that the harpsichord has a limited dynamic range due to its excitation mechanism. Short staccato tones, slow key pressings, and fast repetitions of single keys were also registered. Chords were recorded to measure the variations in attack time between simultaneously played keys. Additionally, scales and excerpts of musical pieces were played and recorded. Both bridges of the instrument were excited at several points (four and six points for the 4' and the 8' bridge, respectively) with an impulse hammer to obtain reliable acoustic soundboard responses. The force signal of the hammer and the acceleration signal obtained from an accelerometer attached to the bridge were recorded for the 8' bridge at three locations. The acoustic response was recorded in synchrony.

4.2. Analysis of recorded tones and extraction of excitation signals

Initial estimates of the synthesizer parameters can be obtained from the analysis of recorded tones. For the basic calibration of the synthesizer, the recordings were selected where each register is played alone. We use a method based on the short-time Fourier transform and sinusoidal modeling, as previously discussed in [18, 32]. The inharmonicity of harpsichord tones is accounted for in the spectral peak-picking algorithm with the help of the estimated B coefficient values. After extracting the fundamental frequency, the analysis system essentially decomposes the analyzed tone into its deterministic and stochastic parts, as in the spectral modeling synthesis method [33]. However, in our system the decay times of the partials are extracted, and the loop filter design is based on the loop gain data calculated from the decay times. The envelopes of partials in harpsichord tones exhibit beating and two-stage decay, as is usual for string instruments [34].
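The least-squares fit of the inharmonicity coefficient B (Section 3.2) can be sketched with the standard stiff-string relation f_k ~ k*f0*sqrt(1 + B*k^2); the simple fit below is an illustration rather than the authors' exact procedure, and the check uses synthetic data with a known B:

```python
import numpy as np

def estimate_B(partial_freqs, f0):
    """Least-squares estimate of the inharmonicity coefficient B from
    measured partial frequencies f_k ~ k*f0*sqrt(1 + B*k^2)."""
    f = np.asarray(partial_freqs, dtype=float)
    k = np.arange(1, len(f) + 1, dtype=float)
    y = (f / (k * f0))**2 - 1.0                # ideally equals B * k^2
    x = k**2
    return float(np.dot(x, y) / np.dot(x, x))  # LS slope through origin

# Synthetic sanity check with a known coefficient
B_true = 1e-4
k = np.arange(1, 21)
partials = k * 100.0 * np.sqrt(1.0 + B_true * k**2)
B_est = estimate_B(partials, 100.0)
```

The same B estimate can then widen the search windows of the peak-picking stage, as the text describes.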
The residual is further processed: the soundboard contribution is mostly removed (by windowing the residual signal in the time domain), and the initial level of each partial is adjusted by adding a correction obtained through sinusoidal modeling and inverse filtering [35, 36]. The resulting processed residual is used as the excitation signal of the model.
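The truncation of the excitation samples with the right half of a Hanning window (Section 3.3) can be sketched like this (the fade length here is arbitrary):

```python
import numpy as np

def truncate_excitation(x, fade_len):
    """Fade out the tail of an excitation signal with the right half
    of a Hanning window, leaving the beginning untouched."""
    x = np.asarray(x, dtype=float).copy()
    w = np.hanning(2 * fade_len)[fade_len:]   # descending half-window
    x[-fade_len:] *= w
    return x

sig = np.ones(1000)
out = truncate_excitation(sig, 200)
```

The attack portion passes unchanged, while the ringing tail is handed over to the soundboard filter.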

Figure 7: A schematic view of the reverberation algorithm used for soundboard modeling.

4.3. Loss filter design

Since the ripply loop filter is an extension of the one-pole filter that allows improved matching of the decay rate of one partial and simply introduces variations to the others, it is reasonable to design it after the one-pole filter. This kind of approach is known to be suboptimal in filter design, but the highest possible accuracy is not the main goal of this work. Rather, we seek a simple and reliable routine for automatically processing a large amount of measurement data, leaving a minimum amount of erroneous results to be fixed manually. Figure 8 shows the loop gain and T60 data for an example case. It is seen that the target data (bold dots in Figure 8) contain a fair amount of variation from one partial to the next, although the overall trend is downward as a function of frequency. Partials with indices 10, 11, 16, and 18 are excluded (set to zero), because their decay times were found to be unreliable (i.e., loop gain larger than unity). The one-pole filter response fitted using a weighted least-squares technique [18] (dashed lines in Figure 8) can follow the overall trend, but it evens out the differences between neighboring partials. The ripply loss filter can be designed using the following heuristic rules.

(1) Select the partial with the largest loop gain, starting from the second partial (the sixth partial in this case, see Figure 8), and denote its index by k_max. Usually one of the lowest partials will be picked once the outliers have been discarded.
(2) Set the absolute value of r so that, together with the one-pole filter, the magnitude response matches the target loop gain of the partial with index k_max, that is, |r| = |G(k_max) - H(k_max f1)|, where the second term is the loop gain due to the one-pole filter at that frequency (in this case |r| = 0.15). (In practice, the first partial may have the largest loop gain. However, if we tried to match it using the ripply loss filter, the r_rate parameter would go to 1, as can be seen from (6), and the delay-line length R would become equal to L rounded to an integer, as can be seen from (3). This practically means that the ripple filter would reduce to a correction of the loop gain by r, which can also be done by simply replacing the loop gain parameter g by g + r. For this reason, it is sensible to match the loop gain of a partial other than the first one.)

Figure 8: (a) The target loop gain for a harpsichord tone (f1 = 197 Hz) (bold dots), the magnitude response of the conventional one-pole filter with g = 0.996 and a = 0.96 (dashed line), and the magnitude response of the ripply loss filter with r = -0.15 and r_rate = 0.0833 (solid line). (b) The corresponding T60 data. The total delay-line length is 223.9 samples, and the delay-line length R of the ripple filter is 19 samples.

(3) If the target loop gain of the first partial is larger than the magnitude response of the one-pole filter alone at that frequency, set the sign of r to positive; otherwise set it to negative, so that the decay of the first partial is made fast (in the example case of Figure 8, the minus sign is chosen, that is, r = -0.15).

(4) If a positive r has been chosen, conduct a stability check at the zero frequency. If it fails (i.e., g + r >= 1), the value of r must be made negative by changing its sign.

(5) Set the ripple rate parameter r_rate so that the longest-ringing partial occurs at the ripple maximum nearest to 0 Hz. This means that the parameter must be chosen

according to the following rule:

    r_rate = 1/k_max       when r >= 0,
    r_rate = 1/(2 k_max)   when r < 0.    (6)

In the example case, as the ripple pattern is a negative cosine wave (in the frequency domain) and the peak should hit the 6th partial, we set the r_rate parameter equal to 1/12 = 0.0833. This implies that a minimum will occur at every 12th partial and the first maximum will occur at the 6th partial. The result of this design procedure is shown in Figure 8 with the solid line. Note that the peak is actually between the 5th and the 6th partials, because fractional delay techniques are not used in this part of the system and the delay-line length R is thus an integer, as defined in (3). It is obvious that this design method is limited in its ability to follow arbitrary target data. However, as we now know that the resolution of human hearing is also very limited in evaluating differences in decay rates [37], we find the match in most cases to be sufficiently good.

4.4. Beating filter design

The beating filter, a second-order resonator R(z) coupled in parallel with the string model (see Figure 2), is used for reproducing beating in harpsichord synthesis. In practice, we decided to choose the center frequency of the resonator so that it brings about the beating effect in one of the low-index partials that has a prominent level and a large beat amplitude. These criteria make sure that the single resonator will produce an audible effect during synthesis. In this implementation, we probed the deviation of the actual decay characteristics of the partials from the ideal exponential decay. This procedure is illustrated in Figure 9. In Figure 9a, the mean-squared error (MSE) of the deviation is shown. The lowest partial that exhibits a high deviation (the 10th partial in this example) is selected as a candidate for the most prominent beating partial. Its magnitude envelope is presented in Figure 9b by a solid curve.
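Returning to the loss-filter design of Section 4.3, rules (1)-(5) together with (6) condense into a short routine; the toy data below is fabricated to mirror the example (largest reliable loop gain at the sixth partial) and is not the paper's measurement:

```python
import numpy as np

def design_ripple(G, H1, g):
    """Heuristic ripple design. G: target loop gains per partial
    (1-based partial k is G[k-1]; unreliable entries set to 0).
    H1: one-pole filter magnitudes at the same partial frequencies."""
    G = np.asarray(G, dtype=float)
    H1 = np.asarray(H1, dtype=float)
    k_max = int(np.argmax(G[1:])) + 2        # rule (1): skip the 1st partial
    r = abs(G[k_max - 1] - H1[k_max - 1])    # rule (2)
    if G[0] <= H1[0]:                        # rule (3): sign of r
        r = -r
    if r > 0 and g + r >= 1.0:               # rule (4): DC stability check
        r = -r
    if r >= 0:                               # rule (5) via eq. (6)
        r_rate = 1.0 / k_max
    else:
        r_rate = 1.0 / (2.0 * k_max)
    return r, r_rate, k_max

H1 = np.full(10, 0.990)
G = H1.copy(); G[5] = 0.995; G[0] = 0.985    # 6th partial strongest
r, r_rate, k_max = design_ripple(G, H1, g=0.996)
```

With a negative r the negative-cosine ripple pattern puts its first maximum at partial k_max, matching the worked example above.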
It exhibits a slow beating pattern with a period of about 1.5 seconds. The second-order resonator that simulates beating can, in turn, be tuned to produce a beating pattern with this same rate. For comparison, the magnitude envelopes of the 9th and 11th partials are also shown by dashed and dash-dotted curves, respectively. The center frequency of the resonator is measured from the envelope of the partial; in practice, the offset ranges from practically 0 Hz to a few hertz. The gain of the resonator, that is, the amplitude of the beating partial, is set to be the same as that of the partial it beats against. This simple choice is backed by the recent result of Järveläinen and Karjalainen [38] that beating in string instrument tones is essentially perceived as an on/off process: if the beating amplitude is above the threshold of audibility, it is noticed, while if it is below it, it becomes inaudible. Furthermore, changes in the beating amplitude appear to be perceived inaccurately. Before knowing these results, in a former version of the synthesizer, we had already decided to use the same amplitude for the two components that produce the beating, because the mixing parameter that adjusts the beating amplitude was not giving useful audible variation [39].

Figure 9: (a) The mean-squared error of exponential curve fitting to the decay of the partials (f1 = 197 Hz), where the lowest large deviation has been circled (10th partial), and the acceptance threshold is presented with a dash-dotted line. (b) The corresponding temporal envelopes of the 9th, 10th, and 11th partials, where the slow beating of the 10th partial and deviations in decay rates are visible.
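A generic two-pole resonator (a textbook form, not necessarily the authors' exact R(z)) illustrates how a narrow resonance slightly offset from a partial produces slow beating:

```python
import numpy as np

def resonator(x, f_res, fs, radius=0.999):
    """Two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
    with poles at radius * exp(+/- j*2*pi*f_res/fs)."""
    a1 = 2.0 * radius * np.cos(2.0 * np.pi * f_res / fs)
    a2 = -radius * radius
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

fs = 8000.0
imp = np.zeros(8000); imp[0] = 1.0
ring = resonator(imp, 200.0, fs)   # slowly decaying ringing near 200 Hz
```

Summing `ring` with a partial at, say, 201 Hz would yield an envelope that waxes and wanes about once per second, i.e., a 1 Hz beat, which is the effect the beating filter adds in parallel with the string model.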
Thus, we are now convinced that it is unnecessary to add another parameter to all string models by allowing changes in the amplitude of the beating partial.

4.5. Design of soundboard filter

The reverberation algorithm and the tone correction unit are set in cascade, and together they form the soundboard model, as shown in Figure 2. To determine the soundboard filter, the parameters of the reverberation algorithm and its tone corrector have to be set. The parameters of the reverberation algorithm were chosen as proposed in [31]. To match the frequency-dependent decay, the ratio between the decay times at 0 Hz and at f_s/2 was set to 0.13, so that T60 at 0 Hz became 6.2 seconds. The lengths of the eight delay lines varied from 19 to 1999 samples. To avoid superimposing the responses, the lengths were chosen to be incommensurate numbers [40]. The lengths M_k of the delay lines in the comb allpass structures were set to 8% of the total length of each delay-line path P_k, the filter coefficients a_ap,k were all set to 0.5, and the feedback coefficient g_fb was set to 0.5.

The excitation signals for the harpsichord synthesizer are 0.45 second long, and hence contain the necessary fast-decaying modes for frequencies below 350 Hz (see Figure 6). Therefore, the tone correction section is divided into two parts: a highpass filter that suppresses frequencies below 350 Hz and another filter that imitates the spectral envelope at the middle and high frequencies. The highpass filter is a 5th-order Chebyshev type I design with a 0.5 dB passband ripple, the -6 dB point at 350 Hz, and a roll-off rate of about 50 dB per octave below the cutoff frequency. The spectral envelope filter for the soundboard model is a 10th-order IIR filter designed using linear prediction [41] from a 0.2-second-long windowed segment of the measured target response (see Figure 6 from 0.3 second to 0.5 second). Figure 10 shows the time-frequency plot of the target response and the soundboard filter for the first 1.5 seconds up to 10 kHz. The target response has a prominent lowpass characteristic, which is due to the properties of the impulse hammer. While the response should really be inverse filtered by the hammer force signal, in practice we can approximately compensate for this effect with a differentiator whose transfer function is H_diff(z) = 0.5 - 0.5 z^-1. This is done before the design of the tone corrector, so the compensation filter is not included in the synthesizer implementation.

5. IMPLEMENTATION AND APPLICATIONS

This section deals with the computational efficiency, implementation issues, and musical applications of the harpsichord synthesizer.

5.1. Computational complexity

The computational cost of implementing the harpsichord synthesizer and running it at an audio sample rate, such as 44,100 Hz, is relatively small. Table 1 summarizes the number of multiplications and additions needed per sample for various parts of the system.
In this cost analysis, it is assumed that the dispersion is simulated using a first-order allpass filter. In practice, the lowest tones require a higher-order allpass filter, while some of the highest tones may not need the allpass filter at all, so the first-order filter represents an average cost per string model. Note that the total cost per string is smaller than that of an FIR filter of order 12 (i.e., 13 multiplications and 12 additions). In practice, one voice in harpsichord synthesis is allocated one to three string models, which simulate the different registers. The soundboard model is considerably more costly than a string model: the number of multiplications is more than fourfold, and the number of additions is almost seven times larger. The complexity analysis of the comb allpass filters in the soundboard model is based on the direct form II implementation (i.e., one delay line, two multiplications, and two additions per comb allpass filter section). The implementation of the synthesizer, which is discussed in detail in the next section, is based on high-level programming and control; thus, it is not optimized for the fastest possible real-time operation. The current implementation of the synthesizer runs on a Macintosh G4 (800 MHz) computer, and it can simultaneously run 15 string models in real time without the soundboard model. With the soundboard model, it is possible to run about 10 strings. A new, faster computer and optimization of the code can increase these numbers. With optimized code and fast hardware, it may be possible to run the harpsichord synthesizer with full polyphony (i.e., 56 voices) and the soundboard in real time using current technology.

Figure 10: The time-frequency representation of (a) the recorded soundboard response and (b) the synthetic response obtained as the impulse response of a modified feedback delay network.
5.2. Synthesizer implementation

The signal-processing part of the harpsichord synthesizer is realized using a visual software synthesis package called PWSynth [42]. PWSynth, in turn, is part of a larger visual programming environment called PWGL [43]. Finally, the control information is generated using our music notation package ENP (expressive notation package) [44]. In this section, the focus is on design issues that we have encountered when implementing the synthesizer. We also give ideas on

Table 1: The number of multiplications and additions in different parts of the synthesizer.

    Part of synthesis algorithm               Multiplications   Additions
    String model
      Fractional delay allpass filter F(z)
      Inharmonizing allpass filter A_d(z)
      One-pole filter                         1
      Ripple filter                           1                 1
      Resonator R(z)                          3
      Timbre control                          1
      Mixing with release sample              1                 1
    Soundboard model
      Modified FDN reverberator
      IIR tone corrector                      11                1
      Highpass filter                         1                 9
      Mixing                                  1                 1
    Total
      Per string (without soundboard model)   13                12
      Soundboard model
      All (one string and soundboard model)   7                 77

how the model is parameterized so that it can be controlled from the music notation software. Our previous work on computer simulations of musical instruments has resulted in several applications, such as the classical guitar [39], the Renaissance lute, the Turkish ud [45], and the clavichord [29]. The two-manual harpsichord tackled in the current study is the most challenging and complex instrument that we have investigated so far. As this kind of work is experimental, and the synthesis model must be refined by interactive listening, a system is needed that is capable of making fast and efficient prototypes of the basic components of the system. Another nontrivial problem is the parameterization of the harpsichord synthesizer. In a typical case, one basic component, such as the vibrating string model, requires over 10 parameters so that it can be used in a convincing simulation. Thus, since the full harpsichord synthesizer implementation has three string sets, each having 56 strings, we need at least 1680 (3 x 56 x 10) parameters in order to control all individual strings separately. Figure 11 shows a prototype of the harpsichord synthesizer. It contains three main parts. First, the topmost box (called num-box, with the label number-of-strings) gives the number of strings within each string set used by the synthesizer.
This number can vary from 1 (useful for preliminary tests) to 56 (the full instrument). In a typical real-time situation, this number varies between 4 and 10, depending on the polyphony of the musical score to be realized. The next box of interest is called string-model. It is a special abstraction box that contains a subwindow; its contents are displayed in Figure 12. This abstraction box defines a single string model. Next, Figure 11 shows three copy-synth-patch boxes that determine the individual string sets used by the instrument. These sets are labeled as follows: harpsy1/8-fb/, harpsy1/8-ff/, and harpsy1/4-ff/. Each string set copies the string-model patch count times, where count is equal to the current number of strings (given by the number-of-strings box above). The rest of the boxes in the patch are used to mix the outputs of the string sets. Figure 12 gives the definition of a single string model. The patch consists of two types of boxes. First, the boxes with the name pwsynth-plug (the boxes with the darkest outlines in grey-scale) define the parametric entry points that are used by our control system. Second, the other boxes are low-level DSP modules, realized in C, that perform the actual sample calculation, and boxes that are used to initialize the DSP modules. The pwsynth-plug boxes point to memory addresses that are continuously updated while the synthesizer is running. Each pwsynth-plug box has a label that is used to build symbolic parameter pathnames. While the copy-synth-patch boxes (see the main patch in Figure 11) copy the string model in a loop, the system automatically generates new unique pathnames by merging the label from the current copy-synth-patch box, the current loop index, and the label found in the pwsynth-plug boxes.
Thus, pathnames like harpsy1/8-fb/1/lfgain are obtained; this one refers to the lfgain parameter (loss filter gain) of the first string of the 8' back string set of a harpsichord model called harpsy1.

Musical applications

The harpsichord synthesizer can be used as an electronic musical instrument, controlled either from a MIDI keyboard or from sequencer software. Recently, some composers have become interested in using a previously developed model-based guitar synthesizer for compositions that are either experimental in nature or extremely challenging for human players.

Sound Synthesis of the Harpsichord Using a Physical Model 945

Figure 11: The top-level prototype of the harpsichord synthesizer in PWSynth. The patch defines one string model and the three string sets used by the instrument.

Another fascinating idea is to extend the range and timbre of the instrument. A version of the guitar synthesizer that we call the super guitar has an extended range and a large number of strings [46]. We plan to develop a similar extension of the harpsichord synthesizer. In the current version of the synthesizer, the parameters have been calibrated based on recordings. One obvious application of a parametric synthesizer is to modify the timbre by deviating from the calibrated parameter values. This can lead to extended timbres that still belong to the same instrument family as the original instrument or, in extreme cases, to a novel virtual instrument that cannot be recognized by listeners. One of the most obvious subjects for modification is the decay rate, which is controlled with the coefficients of the loop filter. A well-known limitation of the harpsichord is its restricted dynamic range. In fact, it is a controversial issue whether the key velocity has any audible effect on the sound of the harpsichord. The synthesizer easily allows the implementation of an exaggerated dynamic control, where the key velocity has a dramatic effect on both the amplitude and the timbre, if desired, as in the piano or the acoustic guitar. As the key velocity information is readily available, it can be used to control the gain and the properties of a timbre control filter (see Figure ).
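As an illustration of such an exaggerated dynamic control, the sketch below maps key velocity both to a gain and to the feedback coefficient of a one-pole lowpass filter. The mapping and all constants are invented for this example; they are not the calibrated values used in the synthesizer:

```python
# Hypothetical velocity-to-control mapping: louder and brighter for higher
# key velocities. A one-pole lowpass y[n] = (1 - a) * x[n] + a * y[n-1]
# darkens the tone as its feedback coefficient a grows, so the coefficient
# is decreased with increasing velocity.

def velocity_to_controls(velocity, max_gain=1.0, dark_coef=0.9, bright_coef=0.3):
    """Map a MIDI velocity (1..127) to (gain, one-pole coefficient)."""
    v = max(1, min(127, velocity)) / 127.0
    gain = max_gain * v                              # louder with velocity
    coef = dark_coef - (dark_coef - bright_coef) * v  # brighter with velocity
    return gain, coef

gain, coef = velocity_to_controls(127)  # full velocity: maximal gain, brightest tone
```

Because the mapping is explicit, the strength of the effect can be tuned freely, from the subtle behavior of a real harpsichord to a dramatic, piano-like response.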
Luthiers who make musical instruments are interested in modern technology and want to try physics-based synthesis to learn about the instrument. A synthesizer allows varying certain parameters of the instrument design that are difficult or impossible to adjust in the real instrument. For example, the point where the quill plucks the string is structurally fixed in the harpsichord, but since it has a clear effect on the timbre, varying it is of interest. In the current harpsichord synthesizer, this would require knowing the plucking point and then inverse filtering its contribution from the excitation signal. The plucking-point contribution can then be implemented in the string model by inserting another feedforward comb filter, as discussed previously in several works [7, 16, 17, 18]. Another prospect is to vary the location of the damper. Currently, we do not have an exact model for the damper, and neither is its location a parameter. Testing this is still possible, because it is known that the nonideal functioning of the damper is related to the nodal points of the strings that coincide with the location of the damper. The ripply loss filter allows the imitation of this effect. Luthiers are also interested in the possibility of virtual prototyping without the need to actually build many versions of an instrument out of wood. The current synthesis model may not be sufficiently detailed for this purpose. A real-time or near-real-time implementation of a physical model, where several parameters can be adjusted, would be an ideal tool for testing prototypes.
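The plucking-point effect mentioned above is commonly realized as a feedforward comb filter, y[n] = x[n] - g * x[n - D], where D corresponds to the round-trip travel time between the plucking point and the string termination. A minimal sketch, with illustrative parameter values only:

```python
# Feedforward comb filter for the plucking-point effect: subtracting a
# delayed copy of the excitation attenuates the partials that have a node
# at the plucking point. Delay and gain values here are illustrative.

def pluck_position_comb(excitation, delay, gain=1.0):
    """Apply y[n] = x[n] - gain * x[n - delay] to a list of samples."""
    out = []
    for n, x in enumerate(excitation):
        delayed = excitation[n - delay] if n >= delay else 0.0
        out.append(x - gain * delayed)
    return out

print(pluck_position_comb([1.0, 0.0, 0.0, 0.0], delay=2))  # [1.0, 0.0, -1.0, 0.0]
```

Changing the delay parameter then corresponds to virtually moving the quill along the string, which is exactly the kind of experiment that is impossible on the physical instrument.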

Figure 12: The string model patch. The patch contains the low-level DSP modules and parameter entry points used by the harpsichord synthesizer.

6. CONCLUSIONS

This paper proposes signal-processing techniques for synthesizing harpsichord tones. A new extension to the loss filter of the waveguide synthesizer has been developed, which allows variations in the decay times of neighboring partials. This filter will also be useful for the waveguide synthesis of other stringed instruments. The fast-decaying modes of the soundboard are incorporated in the excitation samples of the synthesizer, while the long-ringing modes at the middle and high frequencies are imitated using a reverberation algorithm. The calibration of the synthesis model is almost automatic. The parameterization and the use of simple filters also allow manual adjustment of the timbre. A physics-based synthesizer, such as the one described here, has several musical applications, the most obvious one being its use as a computer-controlled musical instrument. Examples of single tones and musical pieces synthesized with the synthesizer are available at hut.fi/publications/papers/jasp-harpsy/.

ACKNOWLEDGMENTS

The work of Henri Penttinen has been supported by the Pythagoras Graduate School of Sound and Music Research. The work of Cumhur Erkut is part of the EU project ALMA (IST). The authors are grateful to B. Bank, P. A. A. Esquef, and J. O. Smith for their helpful comments. Special thanks go to H.
Järveläinen for her help in preparing Figure 5.

REFERENCES

[1] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, Tokyo, Japan, September.
[2] M. Karjalainen and V. Välimäki, "Model-based analysis/synthesis of the acoustic guitar," in Proc. Stockholm Music Acoustics Conference, Stockholm, Sweden, July-August.
[3] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference, Tokyo, Japan, September.
[4] J. O. Smith and S. A. Van Duyne, "Commuted piano synthesis," in Proc. International Computer Music Conference, Banff, Alberta, Canada, September.
[5] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, 1992.
[6] K. Karplus and A. Strong, "Digital synthesis of plucked string and drum timbres," Computer Music Journal, vol. 7, no. 2, 1983.
[7] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, 1998.
[8] F. Hubbard, Three Centuries of Harpsichord Making, Harvard University Press, Cambridge, Mass, USA.
[9] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA.
[10] E. L. Kottick, K. D. Marshall, and T. J. Hendrickson, "The acoustics of the harpsichord," Scientific American, vol. 264, no. 2, 1991.
[11] W. R. Savage, E. L. Kottick, T. J. Hendrickson, and K. D. Marshall, "Air and structural modes of a harpsichord," Journal of the Acoustical Society of America, vol. 91, no. 4, 1992.
[12] N. H. Fletcher, "Analysis of the design and performance of harpsichords," Acustica, vol. 37, 1977.
[13] J. Sankey and W. A. Sethares, "A consonance-based approach to the harpsichord tuning of Domenico Scarlatti," Journal of the Acoustical Society of America, vol. 101, no. 4, 1997.
[14] B. Bank, "Physics-based sound synthesis of the piano," M.S. thesis, Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary, published as Tech. Rep. 54, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland.
[15] B. Bank, V. Välimäki, L. Sujbert, and M. Karjalainen, "Efficient physics-based sound synthesis of the piano using DSP methods," in Proc. European Signal Processing Conference, vol. 4, Tampere, Finland, September.
[16] D. A. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, 1983.
[17] J. O. Smith, "Techniques for digital filter design and system identification with application to the violin," Ph.D. thesis, Stanford University, Stanford, Calif, USA.
[18] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no.
5, pp. 331-353, 1996.
[19] B. Bank and V. Välimäki, "Robust loss filter design for digital waveguide synthesis of string tones," IEEE Signal Processing Letters, vol. 10, no. 1, pp. 18-20, 2003.
[20] H. Fletcher, E. D. Blackham, and R. S. Stratton, "Quality of piano tones," Journal of the Acoustical Society of America, vol. 34, no. 6, 1962.
[21] S. A. Van Duyne and J. O. Smith, "A simplified approach to modeling dispersion caused by stiffness in strings and plates," in Proc. International Computer Music Conference, Århus, Denmark, September.
[22] D. Rocchesso and F. Scalcon, "Accurate dispersion simulation for piano strings," in Proc. Nordic Acoustical Meeting, Helsinki, Finland, June.
[23] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, "Physically informed signal processing methods for piano sound synthesis: a research overview," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, 2003.
[24] H. Järveläinen, V. Välimäki, and M. Karjalainen, "Audibility of the timbral effects of inharmonicity in stringed instrument tones," Acoustics Research Letters Online, vol. 2, no. 3, 2001.
[25] M. Karjalainen and J. O. Smith, "Body modeling techniques for string instrument synthesis," in Proc. International Computer Music Conference, Hong Kong, China, August.
[26] P. R. Cook, "Physically informed sonic modeling (PhISM): synthesis of percussive sounds," Computer Music Journal, vol. 21, no. 3, 1997.
[27] D. Rocchesso, "Multiple feedback delay networks for sound processing," in Proc. X Colloquio di Informatica Musicale, Milan, Italy, December.
[28] H. Penttinen, M. Karjalainen, T. Paatero, and H. Järveläinen, "New techniques to model reverberant instrument body responses," in Proc. International Computer Music Conference, Havana, Cuba, September 2001.
[29] V. Välimäki, M. Laurson, and C. Erkut, "Commuted waveguide synthesis of the clavichord," Computer Music Journal, vol. 27, no. 1, pp. 71-82, 2003.
[30] R. Väänänen, V. Välimäki, J. Huopaniemi, and M.
Karjalainen, "Efficient and parametric reverberator for room acoustics modeling," in Proc. International Computer Music Conference, Thessaloniki, Greece, September.
[31] J. M. Jot and A. Chaigne, "Digital delay networks for designing artificial reverberators," in Proc. 90th Convention of the Audio Engineering Society, Paris, France, February 1991.
[32] C. Erkut, V. Välimäki, M. Karjalainen, and M. Laurson, "Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar," in Proc. 108th Convention of the Audio Engineering Society, Paris, France, February 2000.
[33] X. Serra and J. O. Smith, "Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Computer Music Journal, vol. 14, no. 4, pp. 12-24, 1990.
[34] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, 1977.
[35] V. Välimäki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, 1998.
[36] T. Tolonen, "Model-based analysis and resynthesis of acoustic guitar tones," M.S. thesis, Laboratory of Acoustics and Audio Signal Processing, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, 1998, Tech. Rep. 46.
[37] H. Järveläinen and T. Tolonen, "Perceptual tolerances for decay parameters in plucked string synthesis," Journal of the Audio Engineering Society, vol. 49, no. 11, 2001.
[38] H. Järveläinen and M. Karjalainen, "Perception of beating and two-stage decay in dual-polarization string models," in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[39] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, "Methods for modeling realistic playing in acoustic guitar synthesis," Computer Music Journal, vol. 25, no. 3, 2001.
[40] W. G.
Gardner, "Reverberation algorithms," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., Kluwer Academic, Boston, Mass, USA.
[41] J. D. Markel and A. H. Gray Jr., Linear Prediction of Speech, Springer-Verlag, Berlin, Germany.
[42] M. Laurson and M. Kuuskankare, "PWSynth: a Lisp-based bridge between computer assisted composition and sound synthesis," in Proc. International Computer Music Conference, Havana, Cuba, September 2001.
[43] M. Laurson and M. Kuuskankare, "PWGL: a novel visual language based on Common Lisp, CLOS and OpenGL," in Proc. International Computer Music Conference, Gothenburg, Sweden, September 2002.

[44] M. Kuuskankare and M. Laurson, "ENP2.0: a music notation program implemented in Common Lisp and OpenGL," in Proc. International Computer Music Conference, Gothenburg, Sweden, September 2002.
[45] C. Erkut, M. Laurson, M. Kuuskankare, and V. Välimäki, "Model-based synthesis of the ud and the Renaissance lute," in Proc. International Computer Music Conference, Havana, Cuba, September 2001.
[46] M. Laurson, V. Välimäki, and C. Erkut, "Production of virtual acoustic guitar music," in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, June 2002.

Vesa Välimäki was born in Kuorevesi, Finland. He received the M.S., Licentiate of Science, and Doctor of Science degrees in technology, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001-2002, he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. He is currently Professor of audio signal processing at HUT. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to music and audio. Dr. Välimäki is a senior member of the IEEE Signal Processing Society and a member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Mikael Laurson was born in Helsinki, Finland. His formal training at the Sibelius Academy consists of a guitar diploma (1979) and a doctoral dissertation (1996).
In, he was appointed Docent in music technology at Helsinki University of Technology, Espoo, Finland. Between the years 1979 and 1985 he was active as a guitarist. Since 1989 he has been working at the Sibelius Academy as a Researcher and Teacher of computer-aided composition. After conceiving the PatchWork (PW) programming language (1986), he started a close collaboration with IRCAM resulting in the first PW release in After 1993 he has been active as a developer of various PW user libraries. Since the year 1999, Dr. Laurson has worked in a project dealing with physical modeling and sound synthesis control funded by the Academy of Finland and the Sibelius Academy Innovation Centre. Cumhur Erkut was born in Istanbul, Turkey, in He received the B.S. and the M.S. degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Doctor of Science in Technology degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in. Between 1998 and, he worked as a Researcher at the HUT Laboratory of Acoustics and Audio Signal Processing. He is currently a Postdoctoral Researcher in the same institution, where he contributes to the EU-funded research project Algorithms for the Modelling of Acoustic Interactions (ALMA, European project IST ). His primary research interests are model-based sound synthesis and musical acoustics. Henri Penttinen was born in Espoo, Finland, in He received the M.S. degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 3. He has worked at the HUT Laboratory of Acoustics and Signal Processing since 1999 and is currently a Ph.D. student there. His main research interests are signal processing algorithms, real-time audio applications, and musical acoustics. Mr. Penttinen is also active in music through playing, composing, and performing. 
Jonte Knif was born in Vaasa, Finland. He is currently studying music technology at the Sibelius Academy, Helsinki, Finland. Prior to this, he studied the harpsichord at the Sibelius Academy for five years. He has built and designed many historical keyboard instruments and adaptations, such as an electric clavichord. His present interests also include loudspeaker and studio electronics design.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Multirate Simulations of String Vibrations Including Nonlinear Fret-String Interactions Using the Functional Transformation Method

L. Trautmann
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 9158 Erlangen, Germany
traut@lnt.de
Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, P.O. Box 3, 15 HUT, Finland
Lutz.Trautmann@acoustics.hut.fi

R. Rabenstein
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 9158 Erlangen, Germany
rabe@lnt.de

Received 3 June 2003; Revised 14 November 2003

The functional transformation method (FTM) is a well-established mathematical method for accurate simulations of multidimensional physical systems from various fields of science, including optics, heat and mass transfer, electrical engineering, and acoustics. This article applies the FTM to real-time simulations of transversal string vibrations. First, a physical model of a transversally vibrating, lossy, and dispersive string is derived. Afterwards, this model is solved with the FTM for two cases: the ideally linearly vibrating string and the string interacting nonlinearly with the frets. It is shown that accurate and stable simulations can be achieved by discretizing the continuous solution at audio rate. Both simulations can also be performed with a multirate approach, with only minor degradations of the simulation accuracy but with preservation of stability. This saves almost 80% of the computational cost for the simulation of a six-string guitar, which is therefore in the range of the computational cost of digital waveguide simulations.

Keywords and phrases: multidimensional system, vibrating string, partial differential equation, functional transformation, nonlinear, multirate approach.

1. INTRODUCTION

Digital sound synthesis methods can mainly be categorized into classical direct synthesis methods and physics-based methods [1]. The first category includes all kinds of sound processing algorithms, like wavetable, granular, and subtractive synthesis, as well as abstract mathematical models, like additive or frequency modulation synthesis. What is common to all these methods is that they are based on the sound to be (re)produced. The physics-based methods, also called physical modeling methods, start from the physics of the sound production mechanism rather than from the resulting sound. This approach has several advantages over the sound-based methods. (i) The resulting sound, and especially the transitions between successive notes, always sounds acoustically realistic as long as the underlying model is sufficiently accurate. (ii) Sound variations of acoustical instruments due to different playing techniques, or between different instruments within one instrument family, are described in the physics-based methods with only a few parameters. These parameters can be adjusted in advance to simulate a distinct acoustical instrument, or they can be controlled by the musician to morph between real-world instruments and thus obtain more degrees of freedom in expressiveness and variability. The second item makes physical modeling methods quite useful for multimedia applications where only a very limited bandwidth is available for the transmission of music, as, for example, in mobile phones. In these applications, the physical model has to be transferred only once, and afterwards it is sufficient to transfer only the musical score while keeping the variability of the resulting sound. The starting points for the various existing physical modeling methods are always physical models, varying for a given vibrating object only in the model accuracy. The application of the basic laws of physics to an existing or imaginary

vibrating object results in continuous-time, continuous-space models. These models are called initial-boundary-value problems, and they contain a partial differential equation (PDE) and some initial and boundary conditions. The discretization approaches to the continuous models and the digital realizations differ between the individual physical modeling methods. One of the first physical modeling algorithms for the simulation of musical instruments was developed by Hiller and Ruiz in 1971 [2] with the finite difference method. It directly discretizes the temporal and spatial differential operators of the PDE into finite difference terms. On the one hand, this approach is computationally very demanding, since the temporal and spatial sampling intervals have to be chosen small for accurate simulations. Furthermore, stability problems occur, especially in dispersive vibrating objects, if the relationship between the temporal and spatial sampling intervals is not chosen properly [3]. On the other hand, the finite difference method is quite suitable for studies in which the vibration has to be evaluated on a dense spatial grid. Therefore, the finite difference method has mainly been used for academic studies rather than for real-time applications (see, e.g., [4, 5]). However, the finite difference method has recently become more popular also for real-time applications in conjunction with other physical modeling methods [6, 7]. A mathematically similar discretization approach is used in mass-spring models, which are closely related to the finite element method. In this approach, the vibrating structure is reduced to a finite number of mass points that are interconnected by springs and dampers. One of the first such systems for the simulation of musical instruments was the CORDIS system, which could be realized in real time on a specialized processor [8].
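As a concrete illustration of the direct discretization idea, the sketch below implements an explicit leapfrog finite-difference update for the ideal wave equation with fixed string ends. It is only an illustration: it is not the scheme of Hiller and Ruiz, and it omits the damping and stiffness terms that the string model derived later in this article includes:

```python
# Explicit leapfrog finite-difference scheme for the ideal wave equation
# y_tt = c^2 * y_xx on a string fixed at both ends. Illustrative sketch:
# the lossy, dispersive string PDE needs additional difference terms.

def fd_string_step(y_prev, y_curr, courant2):
    """One time step; courant2 = (c * dt / dx)**2 must be <= 1 (CFL bound)."""
    n = len(y_curr)
    y_next = [0.0] * n  # fixed boundaries: y[0] = y[n-1] = 0
    for m in range(1, n - 1):
        y_next[m] = (2.0 * y_curr[m] - y_prev[m]
                     + courant2 * (y_curr[m + 1] - 2.0 * y_curr[m] + y_curr[m - 1]))
    return y_next

# Triangular "pluck" as initial deflection, approximately zero initial velocity.
N = 21
y_curr = [min(m, N - 1 - m) / float(N // 2) for m in range(N)]
y_prev = y_curr[:]
for _ in range(100):
    y_prev, y_curr = y_curr, fd_string_step(y_prev, y_curr, courant2=1.0)
```

The nested loop over all grid points at every time step makes the cost of dense grids apparent, and violating the Courant condition mentioned in the comment makes the recursion diverge, illustrating both drawbacks discussed in the text.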
The finite difference method, as well as the mass-spring models, can be viewed as direct discretization approaches to the initial-boundary-value problems. Despite the stability problems, they are very easy to set up, but they are computationally demanding. In modal synthesis, first introduced in [9], the PDE is spatially discretized at not necessarily equidistant spatial points, similar to the mass-spring models. The interconnections between these discretized spatial points reflect the physical behavior of the structure. This discretization reduces the degrees of freedom of the vibration to the number of spatial points, which translates directly into the same number of temporal modes in which the structure can vibrate. The reduction does not only allow the calculation of the modes of simple structures, but it can also handle vibrational measurements of more complicated structures at a finite number of spatial points [10]. A commercial product for modal synthesis, Modalys, is described, for example, in [11]. For a review of modal synthesis and a comparison with the functional transformation method (FTM), see also [12]. The commercially and academically most popular physical modeling method of the last two decades has been the digital waveguide method (DWG), because of its computational efficiency. It was first introduced in [13] as a physically interpreted extension of the Karplus-Strong algorithm [14]. Extensions of the DWG are described, for example, in [15, 16, 17, 18]. The DWG first simplifies the PDE to the wave equation, which has an analytical solution in the form of a forward and a backward traveling wave, called the d'Alembert solution. It can be realized computationally very efficiently with delay lines. Effects like damping or dispersion occurring in the vibrating structure are included in the DWG by low-order digital filters concentrated at one point of the delay line.
This procedure ensures the computational efficiency, but the implementation loses the direct connection to the physical parameters of the vibrating structure. The focus of this article is the FTM. It was first introduced in [19] for the heat-flow equation and first used for digital sound synthesis in [20]. Extensions to the basic model of a vibrating string, and comparisons between the FTM and the physical modeling methods mentioned above, are given, for example, in [21]. In the FTM, the initial-boundary-value problem is first solved analytically by appropriate functional transformations before it is discretized for computer simulations. This ensures a high simulation accuracy as well as inherent stability. One of the drawbacks of the FTM has so far been its computational load, which is about five times higher than that of the DWG [21]. This article extends the FTM by applying a multirate approach to the discrete realization of the FTM, such that the computational complexity is significantly reduced. The extension is shown for the linearly vibrating string as well as for the nonlinear limitation of the string vibration by a fret-string interaction, as occurs in slap-bass synthesis. The article is organized as follows. Section 2 derives the physical model of a transversally vibrating, dispersive, and lossy string in terms of a scalar PDE and initial and boundary conditions. Furthermore, a model for a nonlinear fret-string interaction is given. These models are solved in Section 3 with the FTM in continuous time and continuous space. Section 4 discretizes these solutions at audio rate and derives an algorithm that guarantees stability even for the nonlinear discrete system. A multirate approach is used in Section 5 for the simulation of the continuous solution to save computational cost. It is shown that this multirate approach also works for nonlinear systems.
Section 6 compares the audio-rate and multirate solutions with respect to simulation accuracy and computational complexity.

2. PHYSICAL MODELS

In this section, a transversally vibrating, dispersive, and lossy string is analyzed using the basic laws of physics. From this analysis, a scalar PDE is derived in Section 2.1. Section 2.2 defines the initial states of the vibration, as well as the fixings of the string at the nut and the bridge end, in terms of initial and boundary conditions, respectively. In Section 2.3, the linear model is extended with a deflection-dependent force simulating the nonlinear interaction between the string and the frets, well known from slap synthesis []. In all these models, the strings are assumed to be homogeneous and isotropic. Furthermore, their surfaces are assumed to be smooth enough not to produce stress concentrations. The deflections of the strings are assumed to be small enough to change

neither the cross section area nor the tension on the string, so that the string itself behaves linearly.

2.1. Linear partial differential equation derived by basic laws of physics

The string under examination is characterized by its material and geometrical parameters. The material parameters are given by the mass density ρ, Young's modulus E, the laminar air flow damping coefficient d_1, and the viscoelastic damping coefficient d_3. The geometrical parameters consist of the length l, the cross section area A, and the moment of inertia I. Furthermore, a tension T_s is applied to the string in the axial direction. Considering only a string segment between the spatial positions x_s and x_s + Δx, the forces on this string segment can be analyzed in detail. They consist of the restoring force f_T caused by the tension T_s, the bending force f_B caused by the stiffness of the string, the laminar air flow force f_d1, the viscoelastic damping force f_d3 (modeled here without memory), and the external excitation force f_e. At x_s they result in

f_T(x_s, t) = T_s sin(φ(x_s, t)) ≈ T_s φ(x_s, t),  (1a)
f_B(x_s, t) = EI b'(x_s, t),  (1b)
f_d1(x_s, t) = d_1 Δx v(x_s, t),  (1c)
f_d3(x_s, t) = d_3 (d/dt) sin(φ(x_s, t)) ≈ d_3 φ̇(x_s, t),  (1d)

where φ(x_s, t) is the slope angle of the string, b(x_s, t) is the curvature of the string, v(x_s, t) is the velocity, a prime denotes a spatial derivative, and a dot denotes a temporal derivative. Note that in (1a) and in (1d) it is assumed that the amplitude of the string vibration is small, so that the sine function can be approximated by its argument. Similar equations can be found for the forces at the other end of the string segment at x_s + Δx. All these forces are combined by the equation of motion into

ρA Δx v̇(x_s, t) = −f_y(x_s, t) − f_d3(x_s, t) + f_y(x_s + Δx, t) + f_d3(x_s + Δx, t) − f_d1(x_s, t) + f_e(x_s, t),  (2)

where f_y = f_T − f_B.
Letting Δx → 0 and solving (2) for the excitation force density f_e1(x_s, t) = f_e(x_s, t) δ(x − x_s), four coupled equations are obtained that are valid not only on the string segment x_s ≤ x ≤ x_s + Δx but on the whole string 0 ≤ x ≤ l; δ(x) denotes the impulse function:

f_e1(x, t) = ρA v̇(x, t) + d_1 v(x, t) − f_y'(x, t) − d_3 ḃ(x, t),  (3a)
f_y(x, t) = T_s φ(x, t) − EI b'(x, t),  (3b)
b(x, t) = φ'(x, t),  (3c)
v'(x, t) = φ̇(x, t).  (3d)

An extended version of the derivation of (3) can be found in [21]. The four coupled equations (3) can be simplified to one scalar PDE with only one output variable. All the dependent variables in (3a) can be written in terms of the string deflection y(x, t) by replacing v(x, t) with ẏ(x, t) and, from (3d) together with (3b) and (3c), φ(x, t) = y'(x, t). Then (3) can be written in a general notation of scalar PDEs:

D{y(x, t)} + L{y(x, t)} + W{y(x, t)} = f_e1(x, t),  x ∈ [0, l], t ∈ [0, ∞),  (4a)

with

D{y(x, t)} = ρA ÿ(x, t) + d_1 ẏ(x, t),
L{y(x, t)} = −T_s y''(x, t) + EI y''''(x, t),  (4b)
W{y(x, t)} = W_D{W_L{y(x, t)}} = −d_3 ẏ''(x, t).

As can be seen in (4), the operator D{·} contains only temporal derivatives, the operator L{·} has only spatial derivatives, and the operator W{·} consists of mixed temporal and spatial derivatives. The PDE is valid only on the string between x = 0 and x = l and for all positive times. Equation (4) forms a continuous-time, continuous-space PDE. For a unique solution, initial and boundary conditions must be given, as specified in the next section.

2.2. Initial and boundary conditions

Initial conditions define the initial state of the string at time t = 0. This definition is written in the general operator notation as

f_i{y(x, t)} = [y(x, 0), ẏ(x, 0)]^T = 0,  x ∈ [0, l], t = 0.  (5)

Since the scalar PDE (4) is of second order with respect to time, only two initial conditions are needed. They are chosen as the initial deflection and the initial velocity of the string, as seen in (5).
For musical applications, it is a reasonable assumption that the initial states of the string vanish at time $t=0$, as given in (5). Note that this does not prevent the interaction between successively played notes, since the time is not set to zero for each note. Thus, this kind of initial condition is only used, for example, at the beginning of a piece of music.

In addition to the initial conditions, the fixing of the string at both ends must also be defined in terms of boundary conditions. In most stringed instruments, the strings are nearly fixed at the nut end ($x = x_0 = 0$) and transfer energy at the other end ($x = x_1 = l$) via the bridge to the resonant body []. For some instruments (e.g., the piano) it is also a justified assumption that the bridge fixing can be modeled as ideally rigid [3]. Then the boundary conditions are given by

$f_{bi}^T\{y(x,t)\} = \begin{bmatrix} y(x_i,t) \\ y''(x_i,t) \end{bmatrix} = \mathbf{0}$, $i \in \{0,1\}$, $t \in [0,\infty)$.  (6)

It can be seen from (6) that the string is assumed to be fixed but allowed to pivot at both ends, such that the deflection $y$ and the curvature $b = y''$ must vanish. These are boundary conditions of the first kind. For simplicity, there is no energy fed

into the system via the boundary, resulting in homogeneous boundary conditions.

The PDE (4), in conjunction with the initial conditions (5) and boundary conditions (6), forms the linear continuous-time, continuous-space initial-boundary-value problem to be solved and simulated.

2.3. Nonlinear extension to the linear model for slap synthesis

Nonlinearities are an important part of the sound production mechanisms of musical instruments [3]. One example is the nonlinear interaction of the string with the frets, well known from slap synthesis. This effect was first modeled for the DWG in [] as a nonlinear amplitude limitation. For the FTM, the effect was already applied to vibrating strings in [4]. A simplified model for this interaction interprets the fret as a spring with a high stiffness coefficient $S_\mathrm{fret}$, acting at one position $x_f$ as a force $f_f$ on the string at time instances where the string is in contact with the fret. Since this force depends on the string deflection, it is nonlinear, defined by

$f_f(x_f,t,y,y_f) = \begin{cases} -S_\mathrm{fret}\,\big(y(x_f,t) - y_f(x_f,t)\big), & \text{for } y(x_f,t) - y_f(x_f,t) > 0, \\ 0, & \text{for } y(x_f,t) - y_f(x_f,t) \le 0. \end{cases}$  (7)

The deflection of the fret from the string rest position is denoted by $y_f$. The PDE (4) becomes nonlinear by adding the slap force $f_f$ to the excitation function $f_{e1}(x,t)$. Thus, a linear and a nonlinear system for the simulation of the vibrating string are derived. Both systems are solved in the next sections with the FTM.

3. CONTINUOUS SOLUTIONS USING THE FTM

To obtain a model that can be implemented on the computer, the continuous initial-boundary-value problem has to be discretized.

[Figure 1: Procedure of the FTM for solving initial-boundary-value problems defined in the form of PDEs with IC and BC.]
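The one-sided fret-spring force of (7) can be sketched directly in code. This is a minimal illustration, not an implementation from the paper; the default stiffness value is an assumption chosen only for the example.

```python
# Sketch of the one-sided fret-spring force (7): the fret acts as a stiff
# spring that pushes back only while the string penetrates it.
# The stiffness value is an illustrative assumption, not from the paper.

def slap_force(y_xf: float, yf_xf: float, s_fret: float = 1e6) -> float:
    """Fret-string contact force at position x_f, cf. (7).

    y_xf  : string deflection at the fret position
    yf_xf : fret deflection from the string rest position
    """
    penetration = y_xf - yf_xf
    if penetration > 0.0:          # string is in contact with the fret
        return -s_fret * penetration
    return 0.0                     # no contact, no force
```

The force is zero whenever the string is on the free side of the fret, which is what makes the overall system nonlinear.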
Instead of using a direct discretization approach as described in Section 1, the continuous analytical solution is derived first and discretized subsequently. This procedure is well known from the simulation of one-dimensional systems like electrical networks. It has several advantages, including simulation accuracy and guaranteed stability.

The outline of the FTM is given in Figure 1. First, the PDE with initial conditions (IC) and boundary conditions (BC) is Laplace transformed ($\mathcal{L}\{\}$) with respect to time to derive a boundary-value problem (ODE, BC). Then a so-called Sturm-Liouville transformation ($\mathcal{T}\{\}$) is used for the spatial variable to obtain an algebraic equation. Solving for the output variable results in a multidimensional (MD) transfer function model (TFM). It is discretized, and by applying the inverse Sturm-Liouville transformation $\mathcal{T}^{-1}\{\}$ and the inverse $z$-transformation $\mathcal{Z}^{-1}\{\}$, the discretized solution in the time and space domain results.

The impulse-invariant transformation is used for the discretization shown in Figure 1. It is equivalent to the calculation of the continuous solution by inverse transformation into the continuous time and space domain with subsequent sampling. The calculation of the continuous solution is presented in Sections 3.1 to 3.5; the discretization is shown in Sections 4 and 5. For the nonlinear system, the transformations obviously cannot result in a TFM. Therefore, the procedure has to be modified slightly, resulting in an MD implicit equation, described in Section 3.6.

3.1. Laplace transformation

As known from linear electrical network theory, the Laplace transformation removes the temporal derivatives in linear and time-invariant (LTI) systems and includes, due to the differentiation theorem, the initial conditions as additive terms (see, e.g., [5]).
Since first- and second-order time derivatives occur in (4) and the initial conditions (5) are homogeneous, the application of the Laplace transformation to the initial-boundary-value problem derived in Section 2 results in

$d_D(s)\,Y(x,s) + L\{Y(x,s)\} + w_D(s)\,W_L\{Y(x,s)\} = F_{e1}(x,s)$, $x \in [0,l]$,  (8a)
$f_{bi}^T\{Y(x,s)\} = \mathbf{0}$, $i \in \{0,1\}$.  (8b)

The Laplace-transformed functions are written with capital letters, and the complex temporal frequency variable is denoted by $s = \sigma + j\omega$. It can be seen in (8a) that the temporal derivatives of (4a) are replaced with scalar multiplications by the functions

$d_D(s) = \rho A\,s^2 + d_1\,s$, $w_D(s) = d_3\,s$.  (8c)

Thus, the initial-boundary-value problem (4), (5), and (6) is replaced with the boundary-value problem (8) after Laplace transformation.

3.2. Sturm-Liouville transformation

The transformation of the spatial variable should have the same properties as the Laplace transformation has for the time variable: it should remove the spatial derivatives, and it should include the boundary conditions as additive terms. Unfortunately, there is no unique transformation available for this task, due to the finite spatial definition range in contrast to the infinite time axis. That calls for a determination of the spatial transformation at hand, depending on the spatial differential operator and the boundary conditions. Since it leads to an eigenvalue problem first solved for simplified problems by Sturm and Liouville between 1836 and 1838, this transformation is called a Sturm-Liouville transformation (SLT) [6]. Mathematical details of the SLT applied to scalar PDEs can be found in [1]. The SLT is defined by

$\mathcal{T}\{Y(x,s)\} = \bar Y(\mu,s) = \int_0^l K(\mu,x)\,Y(x,s)\,dx$.  (9)

Note that there is a finite integration range in (9), in contrast to the Laplace transformation. The transformation kernels $K(\mu,x)$ of the SLT are obtained as the set of eigenfunctions of the spatial operator $L_W = L + w_D(s)\,W_L$ with respect to the boundary conditions (8b). The corresponding eigenvalues are denoted by $\beta_\mu^4(s)$, where $\beta_\mu(s)$ is the discrete spatial frequency variable (see, e.g., [1] for details). For the boundary-value problem defined in (8) with the operators given in (4b), the transformation kernels and the discrete spatial frequency variables result in

$K(\mu,x) = \sin\!\left(\frac{\mu\pi}{l}\,x\right)$, $\mu \in \mathbb{N}$,  (10a)
$\beta_\mu^4(s) = EI\left(\frac{\mu\pi}{l}\right)^4 + \big(T_s + d_3\,s\big)\left(\frac{\mu\pi}{l}\right)^2$.  (10b)

Thus, the SLT can be interpreted as an extended Fourier series decomposition.

3.3. Multidimensional transfer function model

Applying the SLT (9) to the boundary-value problem (8) and solving for the transformed output variable $\bar Y(\mu,s)$ results in the MD TFM

$\bar Y(\mu,s) = \frac{1}{d_D(s) + \beta_\mu^4(s)}\,\bar F_e(\mu,s)$.  (11)
Hence, the transformed input force $\bar F_e(\mu,s)$ is related via the MD transfer function given in (11) to the transformed output variable $\bar Y(\mu,s)$. The denominator of the MD TFM depends quadratically on the temporal frequency variable $s$ and to the power of four on the spatial frequency variable $\beta_\mu$. This is based on the second-order temporal and fourth-order spatial derivatives occurring in the scalar PDE (4). Thus, the transfer function is a two-pole system with respect to time for each discrete spatial eigenvalue $\beta_\mu$.

3.4. Inverse transformations

As explained at the beginning of Section 3, the continuous solution in the time and space domain is now calculated by using inverse transformations.

Inverse SLT

The inverse SLT is defined by an infinite sum over all discrete eigenvalues $\beta_\mu$:

$Y(x,s) = \mathcal{T}^{-1}\{\bar Y(\mu,s)\} = \sum_\mu \frac{1}{N_\mu}\,\bar Y(\mu,s)\,K(\mu,x)$.  (12)

The inverse transformation kernel $K(\mu,x)$ and the spatial frequency variable $\beta_\mu$ are the same eigenfunctions and eigenvalues as for the forward transformation, due to the self-adjointness of the spatial operators $L$ and $W_L$ (see [1] for details). Thus, the inverse SLT can be evaluated at each spatial position by evaluating the infinite sum. Since only quadratic terms of $\mu$ occur in the denominator, it is sufficient to sum over positive values of $\mu$ and double the result to account for the negative values. The norm factor results in that case in $N_\mu = l/4$.

Inverse Laplace transformation

It can be seen from (11) with (8c) and (10b) that the transfer functions consist of two-pole systems with conjugate complex pole pairs for each discrete spatial eigenvalue $\beta_\mu$. Therefore, the inverse Laplace transformation results for each spatial frequency variable in a damped sinusoidal term, called a mode.

3.5. Continuous solution

After applying the inverse transformations to the MD TFM, the continuous solution results in

$y(x,t) = \frac{4}{\rho A l}\sum_{\mu=1}^{\infty}\left(\frac{1}{\omega_\mu}\,e^{\sigma_\mu t}\sin\!\big(\omega_\mu t\big) * \bar f_e(\mu,t)\right)K(\mu,x)\,\delta_{-1}(t)$.  (13)
The step function, denoted by $\delta_{-1}(t)$, is used since the solution is only valid for positive time instances; $*$ means temporal convolution. $\bar f_e(\mu,t)$ is the spatially transformed excitation force, derived by inserting $f_{e1}$ into (9). The angular frequencies $\omega_\mu$, as well as their corresponding damping coefficients $\sigma_\mu$, can be calculated from the poles of the transfer function model (11). They directly depend on the physical parameters of the string and can be expressed by

$\omega_\mu = \sqrt{\left(\frac{EI}{\rho A} - \left(\frac{d_3}{2\rho A}\right)^2\right)\left(\frac{\mu\pi}{l}\right)^4 + \left(\frac{T_s}{\rho A} - \frac{d_1 d_3}{2(\rho A)^2}\right)\left(\frac{\mu\pi}{l}\right)^2 - \left(\frac{d_1}{2\rho A}\right)^2}$,
$\sigma_\mu = -\frac{d_1}{2\rho A} - \frac{d_3}{2\rho A}\left(\frac{\mu\pi}{l}\right)^2$.  (14)

Thus, an analytical continuous solution (13), (14) of the initial-boundary-value problem (4), (5), (6) is derived without temporal or spatial derivatives.
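The modal data (14) can be computed directly from the physical string parameters. The following sketch uses illustrative parameter values for a nylon-like guitar string; all numbers are assumptions for demonstration, not values from the paper.

```python
import math

# Modal angular frequencies and damping coefficients from (14) for a stiff,
# lossy string. All parameter values below are illustrative assumptions.
rho_A = 5.0e-3      # mass per unit length rho*A in kg/m (assumed)
EI    = 1.0e-4      # bending stiffness E*I in N*m^2 (assumed)
Ts    = 60.0        # tension T_s in N (assumed)
d1    = 1.0e-2      # laminar air flow damping coefficient (assumed)
d3    = 1.0e-4      # viscoelastic damping coefficient (assumed)
l     = 0.65        # string length in m (assumed)

def mode(mu: int):
    """Return (omega_mu, sigma_mu) according to (14)."""
    k = mu * math.pi / l
    omega2 = ((EI / rho_A - (d3 / (2 * rho_A)) ** 2) * k ** 4
              + (Ts / rho_A - d1 * d3 / (2 * rho_A ** 2)) * k ** 2
              - (d1 / (2 * rho_A)) ** 2)
    sigma = -d1 / (2 * rho_A) - d3 / (2 * rho_A) * k ** 2
    return math.sqrt(omega2), sigma

f1 = mode(1)[0] / (2 * math.pi)   # fundamental frequency in Hz
```

With these assumed parameters the fundamental lies in the low guitar range, and $\sigma_\mu$ is negative and grows in magnitude with $\mu$, so higher modes decay faster, as expected from the viscoelastic term.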

3.6. Implicit equation for slap synthesis

The PDE (4) becomes nonlinear by adding the solution-dependent slap force $f_f(x_f,t,y,y_f)$ of (7) to the right-hand side of the linear PDE. Obviously, the application of the Laplace transformation and the SLT to the nonlinear initial-boundary-value problem cannot lead to an MD TFM, since a TFM always requires linearity. However, assuming that the nonlinearity can be represented as a finite power series and that the nonlinearity does not contain spatial derivatives, both transformations can be applied to the system [1]. With (7), both premises are given, such that the slap force can also be transformed into the frequency domains. The $Y(x,s)$-dependency of $\bar F_f$ can be expressed with (12) in terms of $\bar Y(\nu,s)$ to be consistently in the spatial frequency domain. Then an MD implicit equation is derived in the temporal and spatial frequency domain:

$\bar Y(\mu,s) = \frac{1}{d_D(s) + \beta_\mu^4(s)}\Big(\bar F_e(\mu,s) + \bar F_f\big(\mu,s,\bar Y(\nu,s)\big)\Big)$.  (15)

Note that the different argument $\nu$ in the output dependence of $\bar F_f(\mu,s,\bar Y(\nu,s))$ denotes an interaction between all modes caused by the nonlinear slap force. Details can be found in [1]. Since the transfer functions in (11) and (15) are the same, the spatial transformation kernels and frequency variables also stay the same as in the linear case. Thus, the temporal poles of (15) are also the same as in the MD TFM (11), and the continuous solution results in the implicit equation

$y(x,t) = \frac{4}{\rho A l}\sum_{\mu=1}^{\infty}\left(\frac{1}{\omega_\mu}\,e^{\sigma_\mu t}\sin\!\big(\omega_\mu t\big) * \Big(\bar f_e(\mu,t) + \bar f_f\big(\mu,t,\bar y(\nu,t)\big)\Big)\right)K(\mu,x)\,\delta_{-1}(t)$,  (16)

with $\omega_\mu$ and $\sigma_\mu$ given in (14). It is shown in the next sections that this implicit equation is turned into explicit ones by applying different discretization schemes.

4. DISCRETIZATION AT AUDIO RATE

This section describes the discretization of the continuous solutions for the linear and the nonlinear cases.
It is performed at audio rate, for example with sampling frequency $f_s = 1/T = 44.1$ kHz, where $T$ denotes the sampling interval. The discrete realization is shown as it can be implemented on the computer. For the nonlinear slap synthesis, some extensions of the discrete realization are required and, furthermore, the stability of the entire system must be controlled.

4.1. Discretization of the linear MD model

The discrete realization of the MD TFM (11) consists of a three-step procedure performed below: (1) discretization with respect to time, (2) discretization with respect to space, (3) inverse transformations.

Discretization with respect to time

Discretizing the time variable with $t = kT$, $k \in \mathbb{N}$, and assuming an impulse-invariant system, an $s$-to-$z$ mapping is applied to the MD TFM (11) with $z = e^{sT}$. This procedure directly leads to an MD TFM with the discrete-time frequency variable $z$:

$\bar Y^d(\mu,z) = \frac{T\,\big(1/\rho A\,\omega_\mu\big)\,z\,e^{\sigma_\mu T}\sin\!\big(\omega_\mu T\big)}{z^2 - 2z\,e^{\sigma_\mu T}\cos\!\big(\omega_\mu T\big) + e^{2\sigma_\mu T}}\,\bar F_e^d(\mu,z)$.  (17)

Superscript $d$ denotes discretized variables. The angular frequency variables and the damping coefficients are given in (14). Pole-zero diagrams of the continuous and the discrete system are shown in [7].

Discretization with respect to space

For the spatial frequency domain, there is no need for discretization, since the spatial frequency variable is already discrete. However, a discretization has to be applied to the spatial variable $x$. This spatial discretization consists of simply evaluating the analytical solution (13) at a limited number of arbitrary spatial positions $x_a$ on the string. They can be chosen to be the pickup positions or the fret positions, respectively.

Inverse transformations

The inverse SLT can no longer be performed for an infinite number of $\mu$, due to the temporal discretization. To avoid temporal aliasing, the number must be limited to $\mu_T$ such that $\omega_{\mu_T} T \le \pi$, which also ensures realizable computer implementations. Effects of this truncation are described in [1].
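The impulse-invariant recursion behind (17) amounts to a bank of parallel second-order resonators. The following is a minimal sketch of such a bank; the modal data and output weights are illustrative assumptions (in the FTM they would follow from (14) and the kernel weightings), not values from the paper.

```python
import math

# Sketch of a parallel second-order resonator bank: each mode is the
# impulse-invariant discretization (17) of one two-pole system.
# Modal data and weights below are illustrative assumptions.

fs = 44100.0      # audio sampling rate in Hz
T = 1.0 / fs      # sampling interval

def run_resonator_bank(modes, excitation, weights):
    """modes: list of (omega, sigma) pairs; excitation: input samples;
    weights: output weighting per mode (kernel value at the pickup)."""
    n = len(excitation)
    out = [0.0] * n
    for (omega, sigma), w in zip(modes, weights):
        a1 = 2.0 * math.exp(sigma * T) * math.cos(omega * T)  # feedback coeffs
        a2 = -math.exp(2.0 * sigma * T)
        b1 = math.exp(sigma * T) * math.sin(omega * T)        # numerator of (17)
        y1 = y2 = 0.0                                         # internal states
        for k in range(n):
            x_prev = excitation[k - 1] if k > 0 else 0.0      # z in the numerator
            y0 = a1 * y1 + a2 * y2 + b1 * x_prev
            out[k] += w * y0
            y1, y2 = y0, y1
    return out

# Impulse excitation into three assumed modes (roughly harmonic, decaying):
modes = [(2.0 * math.pi * 330.0 * m, -3.0 * m) for m in (1, 2, 3)]
y = run_resonator_bank(modes, [1.0] + [0.0] * 999, [1.0, 0.5, 0.25])
```

Each resonator needs only a few multiplications per output sample, which is the origin of the MPOS counts discussed below.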
The most important conclusion is that the sound quality is not affected, since only modes beyond the audible range are neglected.

By applying the shifting theorem, the inverse $z$-transformation results in $\mu_T$ second-order recursive systems in parallel, each one realizing one vibrational mode of the string. The structure is shown with solid lines in Figure 2. This linear structure can be implemented directly on the computer, since it only includes delay elements $z^{-1}$, adders, and multipliers. Due to (14), the coefficients of the second-order recursive systems in Figure 2 only depend on the physical parameters of the vibrating string.

4.2. Extensions for slap synthesis

The discretization procedure for the nonlinear slap synthesis can be performed with the same three steps described in Section 4.1. Here, the discretized MD TFM is extended with the output-dependent slap force $\bar F_f^d(\mu,z,\bar Y^d(\nu,z))$ and thus stays implicit. However, after discretization with respect to space as described above, and inverse $z$-transformation with application of the shifting theorem, the resulting recursive systems are explicit. This is caused by the time shift of the excitation function due to the multiplication with $z$ in the numerator of (17). Therefore, the linear system given with solid lines in Figure 2 is extended with feedback paths, denoted by dashed lines, from the output to additional inputs between

[Figure 2: Basic structure of the FTM simulations derived from the linear initial-boundary-value problem (4), (5), and (6), with several second-order resonators in parallel. Solid lines represent the basic linear system, while dashed lines represent extensions for the nonlinear slap force.]

[Figure 3: Recursive system realization of one mode of the transversal vibrating string.]

the unit delays of all recursive systems. The feedback paths are weighted with the nonlinear (NL) function (7).

4.3. Guaranteeing stability

The discretized LTI systems derived in Section 4.1 are inherently stable, as long as the underlying continuous physical model is stable, due to the use of the impulse-invariant transformation [5]. However, for the nonlinear system derived in Section 4.2, this stability consideration is not valid any more. It might happen that the passive slap force of the continuous system becomes active with the direct discretization approach [4]. To preserve the passivity of the system, and thus the inherent stability, the slap force must be limited such that the discrete impulses correspond to their continuous counterparts. The instantaneous energy of the string vibration can be calculated by monitoring the internal states of the modal deflections [1]. The slap force limitation can then be obtained directly from the available internal states. For an illustration of these internal states, the recursive system of one mode $\mu_T$ is given in Figure 3.
The variables $c_{1,e}(\mu_T)$ and $c_{1,s}(\mu_T)$, denoting the weightings of the linear excitation force $f_e^d(k)$ at $x_e$ and of the slap force $f_f^d(k)$ at $x_f$, respectively, result with (9), (10a), and (17) in

$c_{1,(e,s)}(\mu_T) = \frac{T}{\rho A\,\omega_{\mu_T}}\sin\!\big(\omega_{\mu_T}T\big)\sin\!\left(\frac{\mu_T\pi}{l}\,x_{(e,s)}\right)$.  (18)

The total instantaneous energy of the string vibration without slap force density can be calculated with [1, 8] (time step $k$ and mode number $\mu_T$ dependencies are omitted for concise notation)

$E_\mathrm{vibr}(k) = \frac{4\rho A}{l}\sum_{\mu_T}\big(\sigma_{\mu_T}^2 + \omega_{\mu_T}^2\big)\,\frac{\big(\bar y_1^d\big)^2 - 2\,\bar y_1^d\,\bar y^d\,e^{\sigma_{\mu_T}T}\cos\!\big(\omega_{\mu_T}T\big) + \big(\bar y^d\big)^2 e^{2\sigma_{\mu_T}T}}{e^{2\sigma_{\mu_T}T}\sin^2\!\big(\omega_{\mu_T}T\big)}$.  (19)

In (19), the instantaneous energy is calculated without application of the slap force, since the internal states $\bar y_1^d(\mu_T,k)$ are used (see Figure 3). For calculating the instantaneous energy $E_s(k)$ after applying the slap force, $\bar y_1^d(\mu_T,k)$ must be replaced with $\bar y_{1,s}^d(\mu_T,k)$ in (19). To meet the condition of passivity of the elastic slap collision, both energies must be related by $E_\mathrm{vibr}(k) \ge E_s(k)$. Here, only the worst-case scenario with regard to the instability problem is discussed, where both energies are the same. Inserting the corresponding expressions of (19) into this energy equality and solving for the slap force yields

$f_f^d(k) = \sum_{\mu_T} c_5(\mu_T)\Big(e^{\sigma_{\mu_T}T}\cos\!\big(\omega_{\mu_T}T\big)\,\bar y^d(\mu_T,k) - \bar y_1^d(\mu_T,k)\Big)$,  (20a)

with

$c_5(\mu_T) = \frac{c_{1,s}(\mu_T)\,\big(\sigma_{\mu_T}^2+\omega_{\mu_T}^2\big)\big/\big(e^{\sigma_{\mu_T}T}\sin(\omega_{\mu_T}T)\big)}{\sum_{\kappa_T} c_{1,s}^2(\kappa_T)\,\big(\sigma_{\kappa_T}^2+\omega_{\kappa_T}^2\big)\big/\big(e^{\sigma_{\kappa_T}T}\sin(\omega_{\kappa_T}T)\big)}$.  (20b)

The force limitation discussed here can be implemented very efficiently. Only one additional multiplication, one summation, and one binary shift are needed for each vibrational mode (see (20a)), since the more complicated constants $c_5(\mu_T)$ have to be calculated only once, and the weighting of $\bar y^d(\mu_T,k)$ has to be performed within the recursive system anyway (compare Figure 3).

Discrete realizations of the analytical solutions of the MD initial-boundary-value problems have been derived in this section. For the linear and nonlinear systems, they resulted in stable and accurate simulations of the transversal vibrating string. The drawback of these straightforward discretization approaches of the MD systems in the frequency domains is the high computational complexity of the resulting realizations. Assuming a typical nylon guitar string with 47 Hz pitch frequency, 59 eigenmodes have to be calculated up to the Nyquist frequency at 22.05 kHz. With an average of 3.1 and 4.2 multiplications per output sample (MPOS) per recursive system for the linear and the nonlinear systems, respectively, the total computational cost for the whole string results in 183 MPOS and 248 MPOS. Note that the fractions of the average MPOS result from the assumption that there are only few time instances where an excitation force acts on the string, such that the input weightings of the recursive systems do not have to be calculated at each sample step.
Since this is also assumed for the nonlinear slap force, the fractional part in the nonlinear system is higher than in the linear system. These computational costs are approximately five times higher than those of the most efficient physical modeling method, the DWG [1]. The next section shows that this disadvantage of the FTM can be fixed by using a multirate approach for the simulation of the recursive systems.

5. DISCRETIZATION WITH A MULTIRATE APPROACH

The basic idea of using a multirate approach for the FTM realization is that the single modes have a very limited bandwidth as long as the damping coefficients $\sigma_\mu$ are small. Subdividing the temporal spectrum into different bands that are processed independently of each other, the modes within these bands can be calculated with a sampling rate that is a fraction of the audio rate. Thus, the computational complexity can be reduced with this method. The sidebands generated by this procedure at audio rate are suppressed with a synthesis filter bank when all bands are added up to the output signal. The input signals of the subsampled modes also have to be subsampled. To avoid aliasing, the respective input signals for the modes are obtained by processing the excitation signal $f_e^d(k)$ through an analysis filter bank. This general procedure is shown with solid lines in Figure 4. It shows several modes (RS #i), each one running at its respective downsampled rate. This filter bank approach is discussed in detail in the next two sections, for the linear as well as for the nonlinear model of the FTM.

5.1. Discretization of the linear MD model

For the realization of the structure shown in Figure 4, two major tasks have to be fulfilled [9]: (1) designing an analysis and a synthesis filter bank that can be realized efficiently, (2) developing an algorithm that can simulate band changes of single sinusoids to keep the flexibility of the FTM.
Filter bank design

There are numerous design procedures for filter banks, mainly specialized to perfect or nearly perfect reconstruction requirements [3]. In the structure shown in Figure 4 there is no need for perfect reconstruction as in sound-processing applications, since the sound production mechanism is performed within the single downsampled frequency bands. Therefore, inaccuracies of the interpolation filters can be corrected by additional weightings of the subsampled recursive systems. Linear-phase filters with finite impulse responses (FIR) are used for the filter bank, due to the variability of the single sinusoids over time. Furthermore, a real-valued generation of the sinusoids in the form of second-order recursive systems, as shown in Figure 2, is preferred to complex-valued first-order recursive systems. This approach avoids, on the one hand, additional real-valued multiplications of complex numbers. On the other hand, the nonlinear slap implementation can be performed in a similar way for the multirate approach, as explained for the audio-rate realization in Section 4.2. A multirate realization of the FTM with complex-valued first-order systems is described in [31].

To fulfill these prerequisites and the requirement of low-order filters for computational efficiency, with necessarily flat filter edges, a filter bank with different downsampling factors for different bands has to be designed. A first step is to design a basic filter bank with $P_{ED}$ equidistant filters, all using the same downsampling factor $r_{ED} = P_{ED}$. Due to the flat filter edges, there will be $P_{ED} - 1$ frequency gaps between the single filters that have neither a sufficient passband amplification nor a sufficient stopband attenuation. These gaps are

filled with low-order FIR filters that realize the interpolation with different downsampling factors than $r_{ED}$. The combination of all filters forms the filter bank. It is used for both the analysis and the synthesis filter bank, as shown in Figure 4.

[Figure 4: Structure of the multirate FTM. Solid lines represent the basic linear system, while dashed and dotted lines represent the extensions for the nonlinear slap force. RS means recursive system. The arrow between RS #3 and RS #4 indicates a band change.]

An example of this procedure is shown in Figure 5 with $P_{ED} = 4$. The total number of bands is $P = 7$. The frequency regions where the single filters are used as passbands in the filter bank are separated by vertical dashed lines. The filters are designed by a weighted least-squares method such that they meet the desired passband bandwidths and stopband attenuations. Note that there are several frequency regions for each filter where the frequency response is not specified explicitly. These so-called don't-care bands occur since only a part of the Nyquist bandwidth in the downsampled domain is used for the simulation of the modes. Thus, there can only be images of these sinusoids in the upsampled version in distinct regions. All other parts of the spectrum are don't-care bands; for the lowpass filter they are shown as gray areas in Figure 5. Magnitude ripples of ±3 dB are allowed in the passband, which can be compensated by a correction of the weighting factors of the single sinusoids. The stopbands are attenuated by at least 60 dB, which is sufficient for most hearing conditions. Merely in studio-like hearing conditions, larger stopband attenuations must be used, such that artifacts produced by using the filter bank cannot be heard.
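A filter bank of this kind only pays off if its polyphase realization stays cheap: implemented in polyphase form, each FIR filter contributes its number of coefficients divided by its downsampling factor to the per-sample multiplication count. A small sketch of this bookkeeping, with assumed filter orders and downsampling factors loosely inspired by the seven-band example (not the paper's exact design):

```python
# Worst-case multiplication count of one polyphase FIR filter bank:
# each band contributes (filter order M_p) / (downsampling factor r_p)
# multiplications per audio output sample.
# The orders and factors below are illustrative assumptions.

def filterbank_cost_mpos(orders, factors):
    """Multiplications per output sample (MPOS) of a polyphase filter bank."""
    assert len(orders) == len(factors)
    return sum(m / r for m, r in zip(orders, factors))

orders  = [72, 48, 34, 40, 56, 48, 64]   # FIR orders M_p (assumed)
factors = [12, 6, 4, 4, 4, 4, 6]         # downsampling factors r_p (assumed)
cost = filterbank_cost_mpos(orders, factors)
```

The point of the polyphase structure is visible here: a long filter in a strongly downsampled band can cost less per audio sample than a short filter running at a high rate.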
Due to the different specifications of the filters concerning bandwidths and edge steepnesses, they have different orders and thus different group delays. To compensate for the different group delays, delay lines of length $(M_\mathrm{max} - M_p)/2$ are used in conjunction with the filters. The number of coefficients of the interpolation filters is denoted by $M_p$, where $M_\mathrm{max}$ is the maximum order of all filters. The delay lines consume some memory space but no additional computational cost [3].

[Figure 5: Top: frequency responses of the equidistant filters (with downsampling factor four in this example). Center: frequency responses of the filters with other downsampling factors. Bottom: frequency response of the filter bank. The downsampling factors $r$ are given within the corresponding passbands. The FIR filter orders are between $M_\mathrm{min} = 34$ and $M_\mathrm{max} = 72$ in this example. They realize a stopband attenuation of at least 60 dB and allow passband ripples of ±3 dB.]

Realizing the filter bank in a polyphase structure, each filter bank results in a computational cost of

$C_\mathrm{filterbank} = \sum_{p=1}^{P} \frac{M_p}{r_p}\ \mathrm{MPOS}$,  (21)

with the downsampling factors $r_p$ of each band. For the example given above, each filter bank needs 73 MPOS. In (21) it is assumed that each band contains at least one mode to be reproduced, so that it is a worst-case scenario. As long as the excitation signal is known in advance, the excitations for each band can be precalculated, such that only the synthesis filter bank must be implemented in real time. The case that the excitation signals are known and stored as wavetables in advance is quite frequent in physical modeling algorithms, although the pure physicality of the model is lost by this approach. For example, for string simulations, typical plucking or striking situations can be described by appropriate excitation signals which are determined in advance.

The practical realization of the multirate approach starts with the calculation of the modal frequencies $\omega_{\mu_T}$ and their corresponding damping coefficients $\sigma_{\mu_T}$. The frequency determines in which band the mode is synthesized. The coefficients of the recursive systems, as shown in Figure 2 for the audio-rate realization, have to be modified in the downsampled domain, since the sampling interval $T$ is replaced by

$T^{(r)} = r\,T^{(1)} = r\,T$.  (22)

Superscript $(r)$ denotes the downsampled simulation with factor $r$. The downsampling factors of the different bands $r_p$ are given in the top and center plots of Figure 5. No further adjustments have to be performed for the coefficients of the recursive systems in the multirate approach, since modes can be realized in the downsampled baseband or each of the corresponding images.

Band changes of single modes

One advantage of the FTM is that the physical parameters of a vibrating object can be varied while playing.
This is not only valid for successively played notes but also within one note, as occurs, for example, in vibrato playing. As soon as one or several modes are at the edges of the filter bank bands, these variations can cause the modes to change bands while they are active. This is shown with an arrow in Figure 4. In such a case, the reproduction cannot be performed by just adjusting the coefficients of the recursive systems with (22) to the new downsampling rate and using the other interpolation filter. This procedure would result in strong transients and in a modification of the modal amplitudes and phases. Therefore, a three-step procedure has to be applied to the band-changing modes: (1) adjusting the internal states of the recursive systems such that no phase shift and no amplitude difference occur in the upsampled output signal from this mode, (2) canceling the filter output of the band-changing mode, (3) training of the new interpolation filter to avoid transient behavior.

Similar to the calculation of the instantaneous energy for slap synthesis, the instantaneous amplitude and phase can also be calculated from the internal states of a second-order recursive system, $\bar y_1$ and $\bar y$. They can be calculated for the old band with downsampling factor $r_1$, as well as for the new band with factor $r_2$. Demanding the equality of both amplitudes and phases, the internal states of the new band are calculated from the internal states of the old band as

$\bar y_1^{(r_2)} = \bar y_1^{(r_1)}\,\frac{\sin\!\big(\omega_\mu r_2 T\big)}{\sin\!\big(\omega_\mu r_1 T\big)} + \bar y^{(r_1)}\,e^{\sigma_\mu r_1 T}\left(\cos\!\big(\omega_\mu r_2 T\big) - \frac{\sin\!\big(\omega_\mu r_2 T\big)}{\tan\!\big(\omega_\mu r_1 T\big)}\right)$,
$\bar y^{(r_2)} = \bar y^{(r_1)}\,e^{\sigma_\mu (r_1 - r_2) T}$.  (23)

The second item of the three-step procedure means that the output of the synthesis interpolation filter must not contain those modes that are leaving that band at time instance $k_\mathrm{ch}T$ for time steps $kT \ge k_\mathrm{ch}T$.
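The state adjustment of step (1) can be illustrated without the closed form: recover the in-phase and quadrature components of the damped mode from the two old internal states, then re-evaluate the delayed state on the new downsampled grid. The state convention assumed here (current value and value one downsampled step earlier) is an illustrative choice, not necessarily the paper's exact one.

```python
import math

# Band-change state remapping sketch: recover amplitude/phase information
# from the old states (factors r1) and resample on the new grid (factor r2).
# State convention (y1 = current value, y2 = value one downsampled step
# earlier) is an assumption for illustration.

def remap_states(y1, y2, omega, sigma, r1, r2, T):
    """Return (y1_new, y2_new) for the new downsampling factor r2."""
    # Model: y(t) = A * exp(sigma*t) * sin(omega*t + phi); with the current
    # instant factored out, y1 = u*sin(theta) and
    # y2 = u * exp(-sigma*r1*T) * sin(theta - omega*r1*T).
    s1 = y1
    # Solve for the quadrature component u*cos(theta):
    c1 = (s1 * math.cos(omega * r1 * T)
          - y2 * math.exp(sigma * r1 * T)) / math.sin(omega * r1 * T)
    # The current value is rate-independent; the previous value lies
    # r2 audio samples back on the new grid:
    y2_new = math.exp(-sigma * r2 * T) * (
        s1 * math.cos(omega * r2 * T) - c1 * math.sin(omega * r2 * T))
    return s1, y2_new
```

Because amplitude and phase are matched exactly, the upsampled output of the mode continues without a discontinuity at the band change.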
Since the filter bank is a causal system of length $M_pT$, the information of the band change must either be given in advance at $(k_\mathrm{ch} - M_p)T$, or a turbo filtering procedure has to be applied. In turbo filtering, the calculations of several sample steps are performed within one sampling interval, at the cost of a higher peak computational complexity. In this case, the turbo filtering must recalculate the previous outputs of the modes leaving the band and subtract their contribution from the interpolated output for time instances $kT \ge k_\mathrm{ch}T$. Due to the higher peak computational complexity of the turbo filtering and the low orders of the interpolation filters, the additional delay of $M_pT$ is preferred here.

In the same way as the band-changing mode must not have an effect on the band it leaves from $k_\mathrm{ch}T$ on, it must also be included in the interpolation filter of the new band from this time instance on. In other words, the new interpolation filter must be trained to correctly produce the desired mode without transients, as addressed in the third item of the three-step procedure above. This can also be performed with the turbo processing procedure at a higher computational cost, or with the delay of $M_pT$ between the information of the band change and its effect in the output signal.

Now the linear solution (13) of the transversal vibrating string derived with the FTM is also realized with a multirate approach. Since the single modes are produced at a lower rate than the audio rate, this procedure saves computational cost in comparison to the direct discretization procedure derived in Section 4.1. The amount of computational savings with this procedure is discussed in more detail in a later section.

5.2. Extensions for slap synthesis

In the discretization approach described in Section 4.2, the output $y^d(x_a,k)$ is fed back to the recursive systems via the path of the external force $f_e^d(k)$ (compare Figure 2). Using the same path in the multirate system shown in Figure 4

Multirate Simulations of String Vibrations Using the FTM 959

would result in a long delay within the feedback path due to the delays in the interpolation filters of the analysis and the synthesis filter banks. Furthermore, the analysis filter bank need not be realized in real time as long as the excitation signal is known in advance. Fortunately, the recursive systems directly calculate the instantaneous deflection of the single modes, albeit in the downsampled domain. Considering a system where modes are simulated only in the baseband, the signal can be fed back between the down- and upsampling boxes in Figure 4, and thus directly in the downsampled domain. In comparison to the full-rate system, the observation of the penetration of the string into the fret might be delayed by up to (r_p − 1)T seconds. This delay results in a different slap force, but stability is still guaranteed by applying the stabilization procedure described in Section 4.3. However, in realistic simulations there are also modes in higher frequency bands, not just in the baseband. This modifies the simulations described above in two ways: (i) the deflection of the string, and thus the penetration into the fret, depends on the modes of all bands; (ii) there is an interaction between all modes in all bands due to the nonlinear slap force. The calculation of the instantaneous string deflection at the downsampled rates is rarely possible, since there are various downsampling rates, as shown in Figure 4. Thus, there are only a few time instances k_all T where the modal deflections are updated in all bands at the same time.
Since in almost all bands one sample value of the recursive systems represents more than half the period of the mode, it is not reasonable to use the previously calculated sample value for the calculation of the deflection at time instances kT ≠ k_all T. However, all the equidistant bands of the filter bank, as shown at the top of Figure 5, have the same downsampling factor and can thus represent the same time instances for the calculation of the deflection. Furthermore, most of the energy of guitar string vibrations is in the lower modes [8], such that the deflection is mostly defined by the modes simulated in the lowest bands. Therefore, the string deflection is determined here at every r₁-th audio sample from all equidistant bands, and at every audio sample with (k mod r₁ = 0) and (k mod r₂ = 0) from all equidistant bands together with the bands having the downsampling rate of the lowest band-pass. This is shown in the right dashed and dotted paths in Figure 4. In the example of Figure 5, at every fourth audio sample the deflection is calculated from the four equidistant bands, and at every twelfth audio sample it is calculated also from the second and sixth bands. In the same way that the string deflection is calculated with varying participation of the different bands, the slap force is also only applied to the modes in these bands, as shown in the left dashed and dotted paths in Figure 4. This procedure has two effects: firstly, there is no interaction between all modes at all (downsampled) time instances from the slap force. Secondly, the slap force itself, being an impulse-like signal with a bright spectrum, is filtered by the filter bank. The first effect is not that important, since the procedure still ensures interactions between most modes; it only restricts them to a few time instances, in the example above every fourth or twelfth audio sample. These low delays of the interaction are not noticeable.
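The varying participation of bands can be captured by a tiny scheduler: a band takes part in the deflection and slap-force update only at audio samples that lie on its downsampled grid. The band rates below (four bands at r = 4, two at r = 12) are illustrative stand-ins for the Figure 5 configuration, not values taken from the paper.

```python
def participating_bands(k, band_rates):
    """Indices of the filter-bank bands whose downsampled time grid contains
    audio sample k. band_rates[i] is the downsampling factor r_i of band i;
    the string deflection (and the slap force) is evaluated only from and
    applied only to these bands at audio step k."""
    return [i for i, r in enumerate(band_rates) if k % r == 0]
```

At k = 12 all six bands participate, at k = 4 only the equidistant ones, and in between the fret interaction is simply not evaluated.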
The second effect can be handled by adding impulses directly to the interpolation filters of the synthesis filter bank. The weights of the impulses in each band are determined by the difference between the sum of all slap force impulses in all bands and the slap force impulses applied in that band. In that way, a slap force applied only to baseband modes produces a nearly white slap signal at audio rate. The stabilization procedure described in Section 4.3 can also be applied to the multirate realization of the nonlinear slap force. The only differences to the audio rate simulations are that T is replaced by r_p T as given in (), and that the summation for the calculation of the stable slap force f_f^d(k), as given in (a), is only performed over the modes realized in the participating bands. Thus, there are time instances where the slap force is applied only to the modes in the equidistant bands, and time instances where it is also applied to bands with another downsampling factor. This is shown with the dotted lines in Figure 4. Due to the different cases of participating bands, two versions of the constants c₅(µT) also have to be calculated, since the products and sums in (b) depend only on the participating modes. Now, a stable and realistic simulation of the nonlinear slap force is also obtained in the multirate realization. In the nonlinear case, the simulation accuracy obviously decreases with higher downsampling factors and thus with an increasing number of bands. This effect is discussed in more detail in the next section.

6. SIMULATION ACCURACY AND COMPUTATIONAL COMPLEXITY

In the previous sections, stable, linear and nonlinear, discrete FTM models have been derived. In the next sections, the simulation accuracies of these models and their corresponding computational complexities are discussed.

6.1. Simulation accuracies

For the linearly vibrating string, the discrete realization of the single modes at full rate is an exactly sampled version of the continuous modes.
This holds as long as the input force can be modeled with discrete impulses, since the impulse-invariant transformation is used, as explained in Section 4.1. However, the exactness of the complete system is lost with the truncation of the summation of partials in (1) to avoid aliasing effects. Therefore, the results are only accurate as long as the excitation signal has only low energy in the truncated high frequency range. This is true for the guitar and most other musical instruments [8] and, furthermore, the neglected higher partials cannot be perceived by the human auditory system as long as the sampling interval T is chosen small enough. Since the audible modes are simulated exactly and the simulation error lies outside the audible range, the FTM is used here as an optimized discretization approach for sound synthesis applications.
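The audio-rate reference model can be sketched as a bank of second-order recursive resonators in which every partial at or above the Nyquist frequency is simply dropped, mirroring the truncation of the modal sum. Function and parameter names are illustrative, not the paper's notation, and the excitation here is a single impulse rather than a general force signal.

```python
import numpy as np

def ftm_modes_audio_rate(freqs, decays, weights, fs, n):
    """Impulse response of a bank of second-order recursive resonators.

    Each mode is a damped sinusoid with frequency f (Hz) and decay rate d (1/s),
    realized by y[k] = a1*y[k-1] + a2*y[k-2] with poles exp((-d + j*2*pi*f)/fs).
    Modes at or above the Nyquist frequency fs/2 are skipped (truncation of the
    partial sum to avoid aliasing)."""
    T = 1.0 / fs
    y = np.zeros(n)
    for f, d, w in zip(freqs, decays, weights):
        if f >= fs / 2:              # truncated partial: contributes nothing
            continue
        a1 = 2 * np.exp(-d * T) * np.cos(2 * np.pi * f * T)
        a2 = -np.exp(-2 * d * T)
        y1 = y2 = 0.0
        for k in range(n):
            y0 = (w if k == 0 else 0.0) + a1 * y1 + a2 * y2  # impulse input
            y[k] += y0
            y1, y2 = y0, y1
    return y
```

The impulse response of each resonator is w * exp(-d k T) * sin(2πf (k+1) T) / sin(2πf T), an exactly sampled damped sinusoid, which is the sense in which the full-rate realization is exact.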

960 EURASIP Journal on Applied Signal Processing

In multirate simulations of linear systems as described in Section 5.1, the single modes are produced exactly within the downsampled domain. But due to the imperfection of the analysis filter bank, modes are excited not only by the correct frequency components of the excitation force, but also by aliasing terms that occur with downsampling. In the same way, the images produced by upsampling the outputs of the recursive systems are not suppressed perfectly by the synthesis filter bank. However, the filter banks have been designed such that the stopband suppressions are at least 60 dB. This is sufficient for most listening conditions as defined in Section 5.1. Furthermore, the filters are designed in a least-mean-squares sense such that the energy of the side lobes in the stopbands is minimized. Further filter bank optimizations with respect to the human auditory system are difficult, since the filter banks are designed only once for all kinds of mode configurations concerning their positions and amplitude relations in the simulated spectrum. In the audio rate string model excited nonlinearly with the slap force as described in Section 4.2, the truncation of the infinite sum in (16) also affects the accuracy of the lower modes through the nonlinearity. The simulations are accurate only as long as the external excitation and the nonlinearity have low contributions to the higher modes. Although the external excitation rarely contributes to the higher modes, there is an interaction between all modes due to the slap force. This interaction grows with the modal frequencies. It can be seen directly in the coefficients c₅(µT) in (b), since they have larger absolute values for higher frequencies. However, the force contributions of the omitted modes are distributed to the simulated modes, since the denominator of (b) decreases for fewer simulated partials.
Furthermore, the sign of c₅(µT) changes with µT due to (18), as does the expression in parentheses of (a) with time. Thus, there is a bidirectional interaction between low and high modes and not only an energy shift from low to high frequencies. Neglecting modes outside the audible range results in fewer energy fluctuations of the audible modes. But since the neglected energy fluctuations have high frequencies, they are also outside the audible range. In the multirate implementation of the nonlinear model as described in Section 5.2, the interactions between almost all modes are retained. It is more critical here that the observation of the fret-string penetration might be delayed by several audio samples. This not only circumvents the strict limitation of the string deflection by the fret, but also changes the modal interactions, because the nonlinear system is not time-invariant. However, the audible slap effect stays similar to the full-rate simulations and sounds realistic. Audio examples can be found at traut/jasp4/sounds.html. It has been shown that the FTM realizes the continuous solutions of the physical models of the vibrating string accurately. With the multirate approach, the FTM loses the exactness of the linear audio rate model, but the inaccuracies cannot be heard. For the nonlinear model, the multirate approach leads to audible differences compared to the audio rate simulations, but the characteristics of the slap sounds are preserved. Thus, simplifications and computational savings due to the filter bank approach are made here with respect to the human auditory system.

6.2. Computational complexities

The computational complexities of the FTM are explained with two typical examples: a single bass guitar string simulated in different qualities, and a six-string acoustic guitar. The first example simulates the vibration of one bass guitar string with a fundamental frequency of 41 Hz.
The corresponding physical parameters can be found, for example, in [1]. This string is simulated in different sound qualities by varying the number of simulated modes from 1 to 117, which corresponds to the simulation of all modes up to the Nyquist frequency at a sampling frequency of f_s = 44.1 kHz. Figure 6 shows the dependency of the computational complexities on the number of simulated modes, and thus on the simulation accuracy or sound quality. The procedure used here to enhance the sound quality consists of simulating more and more modes in consecutive order from the lowest mode on. Thus, the enhancement of the sound quality sounds like opening the lowpass in subtractive synthesis. The upper plot shows the computational complexities for the linear system, simulated at audio rate and with the multirate approach using filter banks with P = 7 and P = 15. The bottom plot shows the corresponding graphs for the nonlinear systems. It is assumed that the external forces act on the string only at one tenth of the output samples, such that the weighting of the inputs does not have to be performed at each time instance. Thus, each linear recursive system needs 3.1 MPOS for the calculation of one output sample, whereas the nonlinear system needs 4. MPOS. It can be seen that the multirate implementations are much more efficient than the audio-rate simulations, except for simulations with very few modes. With all 117 simulated modes, the relation between audio rate and multirate simulations (P = 7) is 363 MPOS to 157 MPOS for the linear system and 49 MPOS to 187 MPOS for the nonlinear system. This is a reduction of the computational complexity of more than 6%. The steps in the multirate graphs reflect the fixed offset of the filter bank realization and the fact that the interpolations of the filter bank bands are only calculated as long as at least one mode is simulated in those bands.
On the one hand, the regions between the steps are steeper for the filter bank with P = 7 than for that with P = 15, since filter banks with more bands provide higher downsampling factors. On the other hand, the steps are higher for filter banks with more bands, due to the higher interpolation filter orders. In this example, the multirate approach with P = 7 is superior to the filter bank with P = 15 for high qualities, since there are only a few modes simulated in the higher bands of P = 15, but the filter bank offset is higher. For other configurations with a higher number of simulated modes, this situation is different, as shown in the next example.

Figure 6: Computational complexities of the FTM simulations dependent on the number of simulated modes, at audio rate (dotted line) and with multirate approaches with P = 7 (dashed line) and P = 15 (solid line). (a): linearly vibrating string; (b): vibrating string with nonlinear slap forces.

The second example shows the computational complexities of the simultaneous simulation of six independent strings as they occur in an acoustic guitar. Obviously, only one interpolation filter bank is needed for all strings. The average number of simulated modes for each guitar string is assumed to be 6. In contrast to the first example, it is assumed that the modes are equally distributed in the frequency domain, such that at least one mode is simulated in each band. Figure 7 shows that the computational complexities depend on the choice of the filter bank. On the one hand, each filter bank needs a fixed amount of computational cost, which grows with the number of bands. On the other hand, filter banks with more bands provide higher downsampling factors for the production of the sinusoids, which saves computational cost. Thus, the choice of the optimal filter bank depends on the number of simultaneously simulated modes. For practical implementations, this has to be estimated in advance. It can be seen that for the linear case (solid line) the minimum computational cost is 7 MPOS, using the filter bank with P = 11. In the nonlinear case, the filter bank with P = 15 has the minimum computational cost, with 319 MPOS for the simulation of all six strings. Compared to the audio-rate simulations with 1116 MPOS and 151 MPOS for the linear and nonlinear cases, respectively, the multirate simulations allow computational savings of up to 79%.
Thus, the multirate simulations have a computational complexity of approximately 45 MPOS (53 MPOS) for each linearly (nonlinearly) simulated string.

Figure 7: Computational complexities of the FTM simulations of a six-string guitar dependent on the number of bands P of the multirate approach. Solid line: linearly vibrating string. Dashed line: vibrating string with nonlinear slap forces.

Compared to high-quality DWG simulations, the computational complexities of the multirate FTM approach are nearly the same. Linear DWG simulations need up to 4 MPOS for the realization of the reflection filters [1], and the nonlinear limitation of the string by the fret additionally needs 3 MPOS per fret position [].
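The cost trade-off described in this section can be captured in a small model: audio-rate cost grows linearly with the number of modes, while the multirate cost adds, per active band, the fixed interpolation-filter cost plus the mode cost reduced by the band's downsampling factor. All constants below are illustrative, not the measured MPOS figures of the paper.

```python
def mpos_audio_rate(n_modes, fs=44100, ops_per_mode=3.0):
    """Million operations per second for n_modes resonators at full audio rate."""
    return n_modes * ops_per_mode * fs / 1e6

def mpos_multirate(modes_per_band, band_rates, filter_cost, fs=44100, ops_per_mode=3.0):
    """Multirate cost: each band's modes run at fs/r, and every band hosting at
    least one mode adds the fixed cost of its interpolation filter (the 'steps'
    in the complexity curves)."""
    total = 0.0
    for n, r, c in zip(modes_per_band, band_rates, filter_cost):
        if n > 0:
            total += n * ops_per_mode * fs / r / 1e6 + c
    return total
```

With many modes the per-mode savings dominate the fixed filter cost, while for a single mode the filter offset makes the multirate system more expensive, reproducing the crossover visible in Figure 6.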

7. CONCLUSIONS

The complete procedure of the FTM has been described, from the basic physical analysis of a vibrating structure resulting in an initial boundary value problem, via its analytical solution, to efficient digital multirate implementations. The transversally vibrating, dispersive and lossy string with a nonlinear slap force served as an example. The novel contribution is a thorough investigation of the implementation and the properties of a multirate realization. It has been shown that the differences between audio-rate and multirate simulations of the linearly vibrating string are not audible. The differences of the nonlinear simulations were audible, but the multirate approach preserves the characteristics of the slap sound. The application of the multirate approach saves almost 80% of the computational cost at audio rate. Thus, it is nearly as efficient as the most popular physical modeling method, the DWG. The multirate FTM is by no means limited to the example of vibrating strings. It can be applied in a similar way to spatially multidimensional systems, like membranes or plates, or even to other physical problems, like heat flow or diffusion.

ACKNOWLEDGMENTS

The authors would like to thank Vesa Välimäki for numerous discussions and his help in the filter bank design for the multirate FTM. Furthermore, the financial support of the Deutsche Forschungsgemeinschaft (DFG) for this research is gratefully acknowledged.

REFERENCES

[1] C. Roads, S. Pope, A. Piccialli, and G. De Poli, Eds., Musical Signal Processing, Swets & Zeitlinger, Lisse, The Netherlands, 1997.
[2] L. Hiller and P. Ruiz, Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I, Journal of the Audio Engineering Society, vol. 19, no. 6, 1971.
[3] A. Chaigne and V. Doutaut, Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars, Journal of the Acoustical Society of America, vol. 101, no. 1, 1997.
[4] A. Chaigne, On the use of finite differences for musical synthesis. Application to plucked stringed instruments, Journal d'Acoustique, vol. 5, no. 2, 1992.
[5] A. Chaigne and A. Askenfelt, Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods, Journal of the Acoustical Society of America, vol. 95, no. 2, 1994.
[6] M. Karjalainen, 1-D digital waveguide modeling for improved sound synthesis, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.
[7] C. Erkut and M. Karjalainen, Finite difference method vs. digital waveguide method in string instrument modeling and synthesis, in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[8] C. Cadoz, A. Luciani, and J. Florens, Responsive input devices and sound synthesis by simulation of instrumental mechanisms: the CORDIS system, Computer Music Journal, vol. 8, no. 3, pp. 60-73, 1984.
[9] J. M. Adrien, Dynamic modeling of vibrating structures for sound synthesis, modal synthesis, in Proc. AES 7th International Conference, Audio Engineering Society, Toronto, Canada, May 1989.
[10] G. De Poli, A. Piccialli, and C. Roads, Eds., Representations of Musical Signals, MIT Press, Cambridge, Mass, USA, 1991.
[11] G. Eckel, F. Iovino, and R. Caussé, Sound synthesis by physical modelling with Modalys, in Proc. International Symposium on Musical Acoustics, Le Normant, Dourdan, France, July 1995.
[12] L. Trautmann and R. Rabenstein, Digital Sound Synthesis by Physical Modeling Using the Functional Transformation Method, Kluwer Academic Publishers, New York, NY, USA, 2003.
[13] D. A. Jaffe and J. O. Smith, Extensions of the Karplus-Strong plucked-string algorithm, Computer Music Journal, vol. 7, no. 2, 1983.
[14] K. Karplus and A. Strong, Digital synthesis of plucked-string and drum timbres, Computer Music Journal, vol. 7, no. 2, 1983.
[15] J. O. Smith, Physical modeling using digital waveguides, Computer Music Journal, vol. 16, no. 4, 1992.
[16] J. O. Smith, Efficient synthesis of stringed musical instruments, in Proc. International Computer Music Conference, Tokyo, Japan, September 1993.
[17] M. Karjalainen, V. Välimäki, and Z. Jánosy, Towards high-quality sound synthesis of the guitar and string instruments, in Proc. International Computer Music Conference, Tokyo, Japan, September 1993.
[18] M. Karjalainen, V. Välimäki, and T. Tolonen, Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond, Computer Music Journal, vol. 22, no. 3, 1998.
[19] R. Rabenstein, Discrete simulation of dynamical boundary value problems, in Proc. EUROSIM Simulation Congress, Vienna, Austria, September 1995.
[20] L. Trautmann and R. Rabenstein, Digital sound synthesis based on transfer function models, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE Signal Processing Society, New Paltz, NY, USA, October 1999.
[21] L. Trautmann, B. Bank, V. Välimäki, and R. Rabenstein, Combining digital waveguide and functional transformation methods for physical modeling of musical instruments, in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, June 2002.
[22] E. Rank and G. Kubin, A waveguide model for slapbass synthesis, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, IEEE Signal Processing Society, Munich, Germany, April 1997.
[23] M. Kahrs and K. Brandenburg, Eds., Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[24] L. Trautmann and R. Rabenstein, Stable systems for nonlinear discrete sound synthesis with the functional transformation method, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.
[25] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems, John Wiley & Sons, Chichester, West Sussex, UK, 2001.
[26] R. V. Churchill, Operational Mathematics, McGraw-Hill, New York, NY, USA, 3rd edition, 1972.
[27] R. Rabenstein and L. Trautmann, Digital sound synthesis of string instruments with the functional transformation

method, Signal Processing, vol. 83, no. 8, 2003.
[28] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1998.
[29] L. Trautmann and V. Välimäki, A multirate approach to physical modeling synthesis using the functional transformation method, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE Signal Processing Society, New Paltz, NY, USA, October 2003.
[30] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.
[31] S. Petrausch and R. Rabenstein, Sound synthesis by physical modeling using the functional transformation method: Efficient implementation with polyphase filterbanks, in Proc. International Conference on Digital Audio Effects, London, UK, September 2003.
[32] B. Bank, Accurate and efficient method for modeling beating and two-stage decay in string instrument synthesis, in Proc. MOSART Workshop on Current Research Directions in Computer Music, Barcelona, Spain, November 2001.

L. Trautmann received his Diplom-Ingenieur and Doktor-Ingenieur degrees in electrical engineering from the University of Erlangen-Nuremberg in 1998 and 2002, respectively. In 2003 he worked as a postdoc in the Laboratory of Acoustics and Audio Signal Processing at the Helsinki University of Technology, Finland. His research interests are in the simulation of multidimensional systems, with a focus on digital sound synthesis using physical models. Since 1999, he has published more than 5 scientific papers, book chapters, and books. He holds several patents on digital sound synthesis.

R. Rabenstein received his Diplom-Ingenieur and Doktor-Ingenieur degrees in electrical engineering from the University of Erlangen-Nuremberg in 1981 and 1991, respectively, as well as the Habilitation in signal processing. He worked with the Telecommunications Laboratory of this university from 1981 to 1987 and again since 1991. From 1987 to 1991, he was with the physics department of the University of Siegen, Germany. His research interests are in the fields of multidimensional systems theory and simulation, multimedia signal processing, and computer music. He serves in the IEEE TC on Signal Processing Education. He is a board member of the School of Engineering of the Virtual University of Bavaria and has participated in several national and international research cooperations.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Physically Inspired Models for the Synthesis of Stiff Strings with Dispersive Waveguides

I. Testa
Dipartimento di Scienze Fisiche, Università di Napoli Federico II, Complesso Universitario di Monte S. Angelo, 816, Italy
italo.testa@na.infn.it

G. Evangelista
Dipartimento di Scienze Fisiche, Università di Napoli Federico II, Complesso Universitario di Monte S. Angelo, 816, Italy
gianpaolo.evangelista@na.infn.it

S. Cavaliere
Dipartimento di Scienze Fisiche, Università di Napoli Federico II, Complesso Universitario di Monte S. Angelo, 816, Italy
cavaliere@na.infn.it

Received 3 June 2003; Revised 17 November 2003

We review the derivation and design of digital waveguides from physical models of stiff systems, useful for the synthesis of sounds from strings, rods, and similar objects. A transform method approach is proposed to solve the classic fourth-order equations of stiff systems in order to reduce them to two second-order equations. By introducing scattering boundary matrices, the eigenfrequencies are determined and their dependence on the mode number n is discussed for the clamped, hinged, and intermediate cases. On the basis of the frequency-domain physical model, the numerical discretization is carried out, showing how the insertion of an all-pass delay line generalizes the Karplus-Strong algorithm for the synthesis of ideally flexible vibrating strings. Knowing the physical parameters, the synthesis can proceed using the generalized structure. Another point of view is offered by Laguerre expansions and frequency warping, which are introduced in order to show that a stiff system can be treated as a nonstiff one, provided that the solutions are warped. A method to compute the all-pass chain coefficients and the optimum warping curves from sound samples is discussed.
Once the optimum warping characteristic is found, the length of the dispersive delay line to be employed in the simulation is simply determined from the requirement of matching the desired fundamental frequency. The regularization of the dispersion curves by means of optimum unwarping is experimentally evaluated.

Keywords and phrases: physical models, dispersive waveguides, frequency warping.

1. INTRODUCTION

Interest in digital audio synthesis techniques has been reinforced by the possibility of transmitting signals to a wider audience within the structured audio paradigm, in which algorithms and restricted sets of data are exchanged [1]. Among these techniques, the physically inspired models play a privileged role, since the data are directly related to physical quantities and can be easily and intuitively manipulated in order to obtain realistic sounds in a flexible framework. Applications include, among others, the simulation of a physical situation producing a class of sounds, as, for example, a closing door, a car crash, the hiss made by a crawling creature, human-computer interaction and, of course, the simulation of musical instruments. In the general physical modeling technique, continuous-time solutions of the equations describing the physical system are sought. However, due to the complexity of real physical systems (from the classic design of musical instruments to the molecular structure of extended objects), solutions of these equations cannot generally be found in an analytic way, and one has to resort to numerical methods or approximations. In many cases, the resulting approximation scheme only approximately resembles the exact model. For this reason, one could better define these methods as physically inspired models, as first proposed in [2], where the mathematical equations or solutions of the physical problem serve as a solid base to inspire the actual synthesis scheme.
One of the advantages of using physically inspired models for sound synthesis is that they allow us to select the physical parameters that actually influence the sound, so that a trade-off between completeness and particular goals can be achieved.

In the following, we will focus on stiff vibrating systems, including rods and stiff strings as encountered in pianos. However, extensions to two- or three-dimensional systems can be carried out with little effort. Vibrating physical systems have been extensively studied over the last thirty years for their key role in many musical instruments. The wave equation can be directly approximated by means of finite difference equations [3, 4, 5, 6, 7], or by discretization of the wave functions as proposed by Jaffe and Smith [8, 9], who reinterpreted and generalized the Karplus-Strong algorithm [10] in a wave propagation setting. The outcome of the approximation of the time-domain solution of the wave equation is the design of a digital waveguide simulating the string itself: the sound signal simulation is achieved by means of an appropriate excitation signal, such as white noise. However, in order to achieve a more realistic and flexible synthesis, the interaction of the excitation system with the vibrating element is, in turn, physically modeled. Digital waveguide methods for the simulation of physical models have been widely used [11, 12, 13, 14, 15, 16]. One of the reasons for their success is that they are appropriate for real-time synthesis [17, 18, 19, 20]. This result allowed a change of approach in modeling musical instruments based on vibrating strings: waveguides can be designed for modeling the core of the instrument, that is, the vibrating string, but they are also suitable for the integration of interaction models, for example, for the excitation due to a hammer [21] or to a bow [9], the radiation of sound due to the body of the instrument [22, 23, 24, 25], and different side-effects in plucked strings [26]. It must be pointed out that, the interactions being highly nonlinear, their modeling and the determination of the range of stability are not an easy task.
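The generalization mentioned in the abstract, in which part of the Karplus-Strong loop delay is replaced by a chain of first-order all-pass sections so that the loop delay becomes frequency dependent, can be sketched as follows. All coefficients here are illustrative; matching a real string requires the fitting procedures discussed later in the paper.

```python
import numpy as np

def dispersive_ks(excitation, delay, n_allpass, a, g, n):
    """Karplus-Strong loop with a chain of identical first-order all-pass
    sections A(z) = (a + z^-1)/(1 + a z^-1) in the feedback path, so that high
    partials see a different loop delay than low ones (a crude stiffness model).
    delay: integer loop delay; g < 1: loop loss; n: output length."""
    buf = np.zeros(delay)            # non-dispersive part of the loop delay
    ap_state = np.zeros(n_allpass)   # one direct-form-II state per section
    out = np.empty(n)
    idx = 0
    for k in range(n):
        x = (excitation[k] if k < len(excitation) else 0.0) + g * buf[idx]
        for i in range(n_allpass):
            v = x - a * ap_state[i]  # DF-II: v[k] = x[k] - a*v[k-1]
            x = a * v + ap_state[i]  # y[k] = a*v[k] + v[k-1]
            ap_state[i] = v
        out[k] = x
        buf[idx] = x
        idx = (idx + 1) % delay
    return out
```

With a = 0 the loop reduces to the ordinary Karplus-Strong recursion; increasing the number of all-pass sections increases the frequency-dependent part of the loop delay and hence the inharmonic stretching of the partials.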
In this paper, we will review the design of a digital waveguide simulating a vibrating stiff system, focusing on stiff strings and treating bars as a limit case where the tension is negligible. The purpose is to derive a general framework inspiring the determination of a discrete numerical model. A frequency-domain approach has been privileged, which allows us to separate the fourth-order differential equation of stiff systems into two second-order equations, as shown in Section 2. This approach is also useful for the simulation of two-dimensional (2D) systems such as thin plates. By enforcing proper boundary conditions, we obtain the eigenfrequencies and the eigenfunctions of the system as found, for the case of strings, in the classic works by Fletcher [27, 28]. Once the exact solutions are completely characterized, their numerical approximation is discussed [29, 30], together with its justification based on physical reasoning. The discretization of the continuous-time domain solutions is carried out in Section 3, which naturally leads to dispersive waveguides based on a long chain of all-pass filters. From a different point of view, the derived structure can be described in terms of Laguerre expansions and frequency warping [31]. In this framework, a stiff system can be shown to be equivalent to a nonstiff (Karplus-Strong like) system whose solutions are frequency warped, provided that the initial and the possibly moving boundary conditions are properly unwarped [32, 33]. As a side effect, this property can be exploited in order to perform an analysis of piano sounds by means of pitch-synchronous frequency-warped wavelets, in which the excitation can be separated from the resonant sound components [34]. The models presented in this paper provide at least two entry points for the synthesis.
If the physical parameters and boundary conditions are completely known, or if it is desired to specify them to model arbitrary strings or rods, then the eigenfunctions, and hence the dispersion curve, can be determined. The problem is then reduced to that of finding the best approximation of the continuous-time dispersion curve by the phase response of a suitable all-pass chain, using the methods illustrated in Section 3. Another entry point is offered if sound samples of an instrument are available. In this case, the parameters of the synthesis model can be determined by finding the warping curve that best fits the data given by the frequencies of the partials, together with the length of the dispersive delay line. This is achieved by means of a regularization method applied to the experimental dispersion data, as reported in Section 4. The physical entry point is to be preferred in those situations where sound samples are not available, for example, when we are modeling a nonexisting instrument by extension of the physical model, such as a piano with an unusual speaking length. The other entry point is best for approximating real instrument sounds. However, in this case, the synthesis is limited to existing sources, although some variations can be obtained in terms of the warping parameters, which are related to, but do not directly represent, physical factors.

2. PHYSICAL STIFF SYSTEMS

In this section, we present a brief overview of the stiff string and rod equations of motion and of their solutions. The purpose is twofold. On the one hand, these equations give the necessary background for the physical modeling of stiff strings. On the other hand, we show that their frequency-domain solution ultimately provides the link between continuous-time and discrete-time models, useful for the derivation of the digital waveguide and suitable for their simulation. This link naturally leads to Laguerre expansions for the solution and to frequency warping equivalences.
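The frequency warping equivalence rests on the phase of a first-order all-pass section; one common convention gives the warping map below. The function name is ours, and λ is the warping parameter (|λ| < 1).

```python
import numpy as np

def warp_map(omega, lam):
    """Frequency-warping curve: phase of the first-order all-pass
    A(z) = (z^-1 - lam)/(1 - lam*z^-1) evaluated on the unit circle, up to sign
    convention. For lam > 0 low frequencies are stretched upward, for lam < 0
    they are compressed; lam = 0 is the identity map."""
    return omega + 2 * np.arctan2(lam * np.sin(omega), 1 - lam * np.cos(omega))
```

The map fixes 0 and π and is strictly monotonic, so a warped model keeps the ordering of partials while altering their spacing, which is exactly what is needed to mimic the dispersion of a stiff system with a nonstiff one.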
Furthermore, enforcing proper boundary conditions determines the eigenfrequencies and eigenfunctions of the system, useful for fitting experimentally measured resonant modes to the ones obtained by simulation. This fit allows us to determine the parameters of the waveguide through optimization.

2.1. Stiff string and bar equation

The equation of motion for the stiff string can be determined by studying the equilibrium of a thin plate [35, 36]. One obtains the following fourth-order differential equation for the deformation y(x, t) of the string:

ε² ∂⁴y(x, t)/∂x⁴ − ∂²y(x, t)/∂x² = −(1/c²) ∂²y(x, t)/∂t², ε² = EI/T, c² = T/(ρS), (1)

EURASIP Journal on Applied Signal Processing

featuring the Young modulus E of the material, the moment of inertia I with respect to the transversal axis of the cross-section of the string (for a circular section of radius r, I = πr⁴/4, as in [36]), the tension T of the string, and the mass per unit length ρS. Note that for ε → 0, (1) becomes the well-known equation of the vibrating string [35]. Otherwise, if the applied tension T is negligible, we obtain

ε² ∂⁴y(x, t)/∂x⁴ = −∂²y(x, t)/∂t², ε² = EI/(ρS), (2)

which is the equation for the transversal vibrations of rods. Solutions of (1) and (2) are best found in terms of the Fourier transform of y(x, t) with respect to time:

Y(x, ω) = ∫ y(x, t) exp(−iωt) dt, (3)

where ω is the angular frequency, related to the frequency f by ω = 2πf. By taking the Fourier transform of both members of (1) and (2), we obtain

ε² ∂⁴Y(x, ω)/∂x⁴ − ∂²Y(x, ω)/∂x² − (ω²/c²) Y(x, ω) = 0 (4)

for the stiff string and

ε² ∂⁴Y(x, ω)/∂x⁴ − ω² Y(x, ω) = 0 (5)

for the rod. The second-order spatial differential operator ∂²/∂x² is defined as a repeated application of the L² space extension of the i(∂/∂x) operator [37]. To this purpose, we seek solutions whose spatial and frequency dependencies can be factored, according to the separation-of-variables method, as follows:

Y(x, ω) = W(ω)X(x). (6)

Substituting (6) in (4) and (5) results in the elimination of the W(ω) term, obtaining ordinary differential equations whose characteristic equations are, respectively,

ε²λ⁴ − λ² − ω²/c² = 0 (stiff string), ε²λ⁴ − ω² = 0 (rod). (7)

The elementary solutions for the spatial part X(x) have the form X(x) = C exp(λx). It is important to note that in both cases the characteristic equations have the following form:

(λ² − ξ₁²)(λ² − ξ₂²) = 0, (8)

where ξ₁ and ξ₂ are, in general, complex numbers that depend on ω. Equation (8) allows us to factor both equations (4) and (5) as follows:

[∂²/∂x² − ξ₁²][∂²/∂x² − ξ₂²] Y(x, ω) = 0. (9)

The operator ∂²/∂x² is self-adjoint with respect to the L² scalar product [37].
Therefore, (9) can be separated into the following two independent equations:

[∂²/∂x² − ξ₁²] Y₁(x, ω) = 0, [∂²/∂x² − ξ₂²] Y₂(x, ω) = 0, (10)

where

Y(x, ω) = Y₁(x, ω) + Y₂(x, ω). (11)

As we will see, (10) justifies the use, with proper modifications, of a second-order generalized waveguide based on progressive and regressive waves for the numerical simulation of stiff systems.

2.2. General solution of the stiff string and bar equations

In this section, we provide the general solution of (9). The particular eigenfunctions and eigenfrequencies of rods and stiff strings are determined by proper boundary conditions and are treated in Section 2.3. From (7), it can be shown that

ξ₁± = ±i √((√(1 + 4ω²ε²/c²) − 1)/(2ε²)), ξ₂± = ±√((√(1 + 4ω²ε²/c²) + 1)/(2ε²)) (stiff string),
ξ₁± = ±i √(ω/ε), ξ₂± = ±√(ω/ε) (rod). (12)

Note that in both cases the eigenvalues ξ₁± are complex (purely imaginary) numbers, while the ξ₂± are real numbers. It is also worth noting that

ξ₁² + ξ₂² = 1/ε² (stiff string), ξ₁² + ξ₂² = 0 (rod), (13)

where ξ₁ and ξ₂ correspond to the positive choice of the sign in front of the square root in (12). As expected, if we let T → 0, then both sets of eigenvalues of the stiff string tend to those found for the rod. Using the equations in (10), we then have, for both strings and rods,

Y₁(x, ω) = c₁⁺ exp(ξ₁x) + c₁⁻ exp(−ξ₁x), Y₂(x, ω) = c₂⁺ exp(ξ₂x) + c₂⁻ exp(−ξ₂x), (14)

where the c₁±, c₂± are, in general, functions of ω. Note that Y₁(x, ω) is an oscillating term while, since ξ₂ is real, Y₂(x, ω) is nonoscillating. For finite-length strings, both positive and negative real exponentials are to be retained.
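The two families of roots in (12), and the invariant (13) tying them together, can be checked numerically. The sketch below (with hypothetical parameter values, not taken from the text) solves the characteristic equation (7) as a quadratic in λ²:

```python
import math

def stiff_string_roots(omega, eps, c):
    """Solve eps^2*lam^4 - lam^2 - omega^2/c^2 = 0 (eq. (7)) as a quadratic
    in lam^2.  Returns (xi1^2, xi2^2): the negative root gives the
    oscillatory (imaginary) pair xi1, the positive root the real pair xi2."""
    s = math.sqrt(1.0 + 4.0 * omega**2 * eps**2 / c**2)
    return (1.0 - s) / (2.0 * eps**2), (1.0 + s) / (2.0 * eps**2)

# hypothetical parameter values, for illustration only
eps, c = 0.02, 1.0e4
for f in (55.0, 440.0, 3520.0):
    x1_sq, x2_sq = stiff_string_roots(2 * math.pi * f, eps, c)
    assert x1_sq < 0 < x2_sq
    # property (13): xi1^2 + xi2^2 = 1/eps^2, independently of omega
    assert abs((x1_sq + x2_sq) - 1.0 / eps**2) < 1e-9 / eps**2
```

The sum of the two λ² roots equals 1/ε² by Vieta's formulas, which is exactly property (13).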

From (12), we see that the primary effect of stiffness is the dependency on frequency of the argument (from now on, phase) of the solutions of (4) and (5). Therefore, the propagation of the wave from one section of the string located at x to the adjacent section located at x + Δx is obtained by multiplication by a frequency-dependent factor exp(ξ₁Δx). Consequently, the group velocity u, defined as u ≡ (dξ₁'/dω)⁻¹ with ξ₁ = iξ₁', also depends on frequency. This results in a dispersion of the wave packet, characterized by the function ξ₁(ω), whose modulus is plotted in Figure 1 for the case of a brass string using the following values of the physical parameters:

r = 1 mm, ρ = 8.44 g cm⁻³, with the tension T and the Young modulus E set to values representative of brass. (15)

Clearly, the previous example is a very crude approximation of a physical piano string (e.g., real-life piano strings in the low register are built out of more than one material, and a copper or brass wire is wrapped around a steel core). For the sake of completeness, we give the explicit expression of u in both the cases we are studying. We have

u = √( 2c(c² + 4ω²ε²) / (c + √(c² + 4ω²ε²)) ) (stiff string), u = 2√(ωε) (rod). (16)

If T → 0, the two group velocities are equal. Moreover, if in the first line of (16) we let ε → 0, then u → c, which is the limit case of the ideally flexible vibrating string. These facts further justify the use of a dispersive waveguide in the numerical simulation. With respect to this point, a remark is in order: the dispersion introduced by stiffness can be treated as a limiting nonphysical consequence of the Euler-Bernoulli beam equation

d²/dx²[ EI d²y/dx² ] = p, (17)

where p is the distributed load acting on the beam. It is nonphysical in the sense that u → ∞ as ω → ∞. However, in the discrete-time domain, this nonphysical situation is avoided if we suppose all the signals to be bandlimited.
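The frequency dependence of u can be verified numerically. Rather than coding the closed form (16), the sketch below (hypothetical parameter values, not from the text) differentiates ξ₁'(ω) directly:

```python
import math

def xi1_prime(omega, eps, c):
    # magnitude of the oscillatory wavenumber of the stiff string (eq. (12))
    s = math.sqrt(1.0 + 4.0 * omega**2 * eps**2 / c**2)
    return math.sqrt((s - 1.0) / (2.0 * eps**2))

def group_velocity(omega, eps, c, d=1.0):
    # u = (d xi1'/d omega)^(-1), evaluated by a central difference
    return 2.0 * d / (xi1_prime(omega + d, eps, c) - xi1_prime(omega - d, eps, c))

c, eps = 1.0e4, 0.01   # hypothetical values, for illustration only
u_lo = group_velocity(2 * math.pi * 100.0, eps, c)
u_hi = group_velocity(2 * math.pi * 5000.0, eps, c)
assert u_hi > u_lo > 0.99 * c    # stiffness: high frequencies travel faster,
assert abs(u_lo - c) < 0.05 * c  # and u -> c at low frequency (cf. (16))
```

This reproduces the qualitative behavior discussed above: the wave packet disperses because high-frequency components propagate faster than low-frequency ones.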
2.3. Complete characterization of the stiff string and rod solutions

Boundary conditions for real piano strings lie in between the conditions of clamped extrema,

Y(−L/2, ω) = Y(L/2, ω) = 0, ∂Y(x, ω)/∂x |−L/2 = ∂Y(x, ω)/∂x |L/2 = 0, (18)

and of hinged extrema [5, 16, 31, 35, 36],

Y(−L/2, ω) = Y(L/2, ω) = 0, ∂²Y(x, ω)/∂x² |−L/2 = ∂²Y(x, ω)/∂x² |L/2 = 0. (19)

Figure 1: Plot of the modulus of ξ₁ for the solution of the stiff model equation.

Before determining the conditions for the eigenfrequencies of the considered stiff systems, we find a more compact way of writing (18) and (19). Starting from the factorized form (10) of the stiff system equation, and using the symbols introduced in Section 2.2, we have

Y₁(x, ω) = ψ₁⁺(x, ω) + ψ₁⁻(x, ω), Y₂(x, ω) = ψ₂⁺(x, ω) + ψ₂⁻(x, ω), (20)

where we let

ψ₁±(x, ω) = c₁± exp(ξ₁± x), ψ₂±(x, ω) = c₂± exp(ξ₂± x). (21)

Conditions (18) can then be rewritten as follows:

Y₁(−L/2, ω) = −Y₂(−L/2, ω), Y₁(L/2, ω) = −Y₂(L/2, ω),
∂Y₁(x, ω)/∂x |−L/2 = −∂Y₂(x, ω)/∂x |−L/2, ∂Y₁(x, ω)/∂x |L/2 = −∂Y₂(x, ω)/∂x |L/2. (22)

At the terminations of the string or of the rod, we have

ψ₁⁺ + ψ₁⁻ = −(ψ₂⁺ + ψ₂⁻), ξ₁(ψ₁⁺ − ψ₁⁻) = −ξ₂(ψ₂⁺ − ψ₂⁻), (23)

which can be rewritten in matrix form:

[1 1; ξ₁ ξ₂] [ψ₁⁻; ψ₂⁻] = [−1 −1; ξ₁ ξ₂] [ψ₁⁺; ψ₂⁺]. (24)

By left-multiplying both members of (24) by the inverse of the matrix [1 1; ξ₁ ξ₂], we have

[ψ₁⁻; ψ₂⁻] = S_c [ψ₁⁺; ψ₂⁺], (25)

where we let

S_c ≡ (1/(ξ₂ − ξ₁)) [ −(ξ₁ + ξ₂)  −2ξ₂ ; 2ξ₁  ξ₁ + ξ₂ ]. (26)

The matrix S_c relates the incident wave with the reflected wave at the boundaries. Independently of the roots ξᵢ, it has the following properties:

det S_c = −1, S_c [1; −1] = [1; −1]. (27)

In the case of a stiff system hinged at both ends (see (19)), we have

ψ₁⁺ + ψ₁⁻ = −(ψ₂⁺ + ψ₂⁻), ξ₁²(ψ₁⁺ + ψ₁⁻) = −ξ₂²(ψ₂⁺ + ψ₂⁻), (28)

which, in matrix form, becomes

[1 1; ξ₁² ξ₂²] [ψ₁⁻; ψ₂⁻] = −[1 1; ξ₁² ξ₂²] [ψ₁⁺; ψ₂⁺]. (29)

By taking the inverse of the matrix [1 1; ξ₁² ξ₂²], we obtain

[ψ₁⁻; ψ₂⁻] = S_h [ψ₁⁺; ψ₂⁺], (30)

where

S_h = −[1 0; 0 1]. (31)

The S_h matrix for the hinged stiff system is independent of the roots ξᵢ. The matrices S_h and S_c are related in the following way:

S_h² = S_c² = I, S_h S_c = S_c S_h. (32)

In conclusion, the boundary conditions for stiff systems can be expressed in terms of matrices that can be used in the numerical simulation of stiff systems. Moreover, since the real-life boundary conditions for stiff strings in the piano lie in between the conditions given in (18) and (19), we can combine the two matrices S_c and S_h in order to enforce more general conditions, as illustrated in Section 3. In the following, we will solve (4) and (5) applying separately these two sets of boundary conditions.

2.3.1. The clamped stiff string and rod

In order to characterize the eigenfunctions in the case of conditions (18), in (12) we let

ξ₁ = iξ₁' (33)

for both the stiff string and the rod solution. By definition, ξ₁' is a real number. Moreover, for the rod, we have ξ₁' = ξ₂.
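The reflection matrix for the clamped termination can be exercised numerically. The sketch below (pure Python, with hypothetical root values) builds S_c as in (26) and verifies that the reflected wave indeed enforces zero displacement and zero slope at the boundary:

```python
def clamped_reflection(xi1, xi2):
    # S_c (eq. (26)): maps incident [psi1+, psi2+] to reflected [psi1-, psi2-]
    d = xi2 - xi1
    return [[-(xi1 + xi2) / d, -2.0 * xi2 / d],
            [2.0 * xi1 / d,     (xi1 + xi2) / d]]

xi1, xi2 = 0.7, 2.9   # hypothetical root magnitudes, for illustration only
S = clamped_reflection(xi1, xi2)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
assert abs(det + 1.0) < 1e-12                     # det S_c = -1 (eq. (27))
p = [1.0, -0.3]                                   # incident pair (psi1+, psi2+)
r = [S[0][0] * p[0] + S[0][1] * p[1],
     S[1][0] * p[0] + S[1][1] * p[1]]             # reflected pair
assert abs(sum(p) + sum(r)) < 1e-12               # displacement y = 0 at the end
assert abs(xi1 * (p[0] - r[0]) + xi2 * (p[1] - r[1])) < 1e-12  # slope y' = 0
```

The two final assertions are precisely the clamped conditions (23), so the matrix behaves as a boundary reflection operator for any incident pair.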
With this choice, it can be shown that conditions (18) for the stiff string lead to the equations [35, 38]

tan(ξ₁'L/2)/tanh(ξ₂L/2) = −ξ₂/ξ₁' (even modes), tan(ξ₁'L/2)/tanh(ξ₂L/2) = ξ₁'/ξ₂ (odd modes), (34)

while, for the rod, we have

cos(ξ₁'L) cosh(ξ₂L) = 1. (35)

Equations (34) and (35) can be solved numerically. In particular, taking into account the second line in (12), the solutions of (35) are [35]

ω_n = (π²/4)(3.011², 5², 7², ..., (2n + 1)²) α', α' = ε/L². (36)

A similar trend can be obtained for the stiff string. In view of their historical and practical relevance, we report here the numerical approximation of the allowed eigenfrequencies of the stiff string given by Fletcher [27]:

ω_n ≈ nπ(c/L) √(1 + n²π²α²) (1 + 2α + 4α²), α = ε/L. (37)

If we expand the above expression in a series of powers of α truncated to the second order, we have the following approximate formula, valid for small values of stiffness:

ω_n ≈ (nπc/L) [1 + 2α + (1 + n²π²/8)(2α)²]. (38)

The last approximation does not apply to bars. For ε = 0, we have α = 0, and the eigenfrequencies tend to the well-known formula for the vibrating string [35]:

ω_n = nω₁. (39)

Typical curves of the relative spacing χ_n ≡ Δω_n/ω₁, where Δω_n ≡ ω_{n+1} − ω_n, of the eigenfrequencies of the stiff string are

shown in Figure 2 for variable r, where the values of the other physical parameters are the same as in (15). Due to the dependency of the phase of the solution on frequency, the eigenfrequencies of the stiff string are not equally spaced. For a small radius r, hence for a low degree of stiffness of the string (see (12)), the relative spacing is almost constant for all the considered orders of eigenfrequencies. However, for higher stiffness, the spacing of the eigenfrequencies increases, in a first approximation, as a linear function of the order of the eigenfrequency. The above results are summarized by the typical warping curves of the system, shown in Figure 3, in which the deviation from linearity ω_n − nω₁ is plotted versus frequency.

Figure 2: Typical eigenfrequencies relative-spacing curves of the clamped stiff string for different values of the radius r of the section S.

Figure 3: Typical warping curves of the clamped stiff string for different values of the radius r of the section S.

In the stiff string case, we have two sets of eigenfunctions, one having even parity and the other one having odd parity, whose analytical expressions are respectively given by [38]

Y(x, ω) = C(ω) [cos(ξ₁'x)/cos(ξ₁'L/2) − cosh(ξ₂x)/cosh(ξ₂L/2)],
Y(x, ω) = C(ω) [sin(ξ₁'x)/sin(ξ₁'L/2) − sinh(ξ₂x)/sinh(ξ₂L/2)], (40)

where C(ω) is a constant that can be calculated by imposing the initial conditions.

2.3.2. Hinged stiff string and rod

Conditions (19) lead to the following set of equations for the stiff string,

sin(ξ₁'L) sinh(ξ₂L) = 0, ξ₁'² + ξ₂² = 0, (41)

and for the rod,

sin(ξ₁'L) sinh(ξ₂L) = 0. (42)

The second line in (41) has no solutions, since both ξ₁' and ξ₂ are real functions.
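As a brief numerical aside (not part of the original text), one can check that the truncated expansion (38) agrees with Fletcher's formula (37) for small stiffness; the parameter values below are hypothetical:

```python
import math

def fletcher_omega(n, c, L, alpha):
    # eq. (37): omega_n ~ n*pi*(c/L)*sqrt(1 + n^2 pi^2 alpha^2)*(1 + 2 alpha + 4 alpha^2)
    return (n * math.pi * c / L) * math.sqrt(1.0 + (n * math.pi * alpha)**2) \
           * (1.0 + 2.0 * alpha + 4.0 * alpha**2)

def series_omega(n, c, L, alpha):
    # eq. (38): expansion of (37) to second order in alpha
    return (n * math.pi * c / L) * (1.0 + 2.0 * alpha
                                    + (1.0 + n**2 * math.pi**2 / 8.0) * (2.0 * alpha)**2)

c, L, alpha = 2.0e4, 100.0, 1e-3   # hypothetical values, for illustration only
for n in range(1, 20):
    w37, w38 = fletcher_omega(n, c, L, alpha), series_omega(n, c, L, alpha)
    assert abs(w37 - w38) / w37 < 1e-4   # the two formulas agree for small alpha
```

For α = 0 both collapse to the harmonic series ω_n = nω₁ of (39).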
It follows that hinged stiff systems are described only by (42). In this equation, sinh(ξ₂L) = 0 has no solution; hence the eigenfrequencies are determined by the condition

ξ₁' = nπ/L. (43)

Using the parameters α' and α defined in (36) and (37), respectively, the eigenfrequencies of the hinged stiff string are exactly expressed as follows:

ω_n = nπ(c/L) √(n²π²α² + 1), (44)

while for the rod we have

ω_n = n²π²α'. (45)

As the tension T → 0, (44) tends to (45). Figure 4 shows the relative spacing of the eigenfrequencies in the case of the hinged stiff string. The relative-spacing curves are very similar to the ones of the clamped string, and so are the warping curves of the system, as shown in Figure 5. Using (45), we can give an analytic expression for the relative spacing of the eigenfrequencies of the hinged rod:

Δω_n = π²α'(2n + 1). (46)

Equation (43) leads to the following set of odd and even

Figure 4: Typical eigenfrequencies relative-spacing curves of the hinged stiff string for different values of the radius r of the section S.

Figure 5: Typical warping curves of the hinged stiff string for different values of the radius r of the section S.

Figure 6: Basic Karplus-Strong delay cascade: the input X(z) feeds a chain of P delays, z^{−P}, whose output Y(z) is fed back through the low-pass loop filter G(z).

eigenfunctions for the stiff string [38]:

Y_n(x, ω) = D(ω) sin(2nπx/L), Y_n(x, ω) = D(ω) cos((2n + 1)πx/L), (47)

where D(ω) must be determined by enforcing the initial conditions. It is worth noting that both functions in (47) are independent of the stiffness parameter ε. In Section 3, we will use the obtained results in order to implement the dispersive waveguides digitally simulating the solutions of (4) and (5). Finally, we need to stress the fact that the eigenfrequencies of the hinged stiff string are similar to the ones of the clamped case except for the factor (1 + 2α + 4α²). Therefore, for small values of stiffness, they do not differ too much. This can also be seen from the similarity of the warping curves obtained with the two types of boundary conditions. Taking into account the fact that the boundary conditions of real piano strings lie in between these two cases, we can conclude that the eigenfrequencies of real piano strings can be calculated by means of the approximate formula [27, 28]

ω_n ≈ An √(Bn² + 1), (48)

where A and B can be obtained from measurements. Approximation (48) is useful in order to match measured vibrating modes against the model eigenfrequencies.

3. NUMERICAL APPROXIMATIONS OF STIFF SYSTEMS

Most of the problems encountered when dealing with the continuous-time equation of the stiff string consist in determining the general solution and in relating the initial and boundary conditions to the integration constants of the equation.
In this section, we will show that we can use a similar technique also in discrete time, which yields a numerical transform method for the computation of the solution. In Section 2, we noted that (1) becomes the equation of the vibrating string in the case of a negligible stiffness coefficient ε. It is well known that the technique known as the Karplus-Strong algorithm implements the discrete-time domain solution of the vibrating string equation [8], allowing us to reach good-quality acoustic results. The block diagram of the adopted loop circuit is shown in Figure 6. The transfer function of the digital loop chain can be written as follows:

H(z) = 1 / (1 − z^{−P} G(z)), (49)

where the loop filter G(z) takes into account losses due to nonrigid terminations and to internal friction, and P is the number of sections in which the string is subdivided, as obtained from time and space sampling. Loop-filter design can be based on measured partial amplitude and frequency trajectories [18], or on linear predictive coding (LPC)-type methods [9]. The filter G(z) can be modeled as IIR or FIR, and it must be estimated from samples of the sound or from a model of the string losses, where, for stability, we need |G(e^{jΩ})| < 1. Clearly, in the IIR case or in the nonlinear-phase FIR case, the phase response of the loop filter introduces a

limited amount of dispersion. Additional phase terms in the form of all-pass filters can be added in order to tune the string model to the required pitch [13] and contribute to further dispersion. Since the group velocity of a traveling wave in a stiff system depends on frequency (see (16)), it is natural to substitute, in discrete time, the cascade of unit delays with a chain of circuit elements whose phase responses do depend on frequency. One can show that the only choice that leads to rational transfer functions is given by a chain of first-order all-pass filters [39, 40]. More complex physical systems, for example, as in the simulation of a monaural room, call for substituting the delay chain with a more general filter, as illustrated in [41]:

A(z, u) = (z^{−1} − u) / (1 − u z^{−1}), (50)

whose phase characteristic is

θ(Ω) = −Ω − 2 arctan( u sin Ω / (1 − u cos Ω) ). (51)

Figure 7: First-order all-pass phase plotted for various values of u.

The phase characteristics (51) are plotted in Figure 7 for various values of u. A comparison between the curve in Figure 1 and the ones in Figure 7 gives more elements of plausibility for the approximation of the solution phase of the stiff model equations, given in (12), with the all-pass filter phase (51). Adopting a circuit scheme similar to that of the Karplus-Strong algorithm [10], in which the unit delays are replaced by first-order all-pass filters, the approximation is given by

ξ₁'(Ω f_s) ≈ −(P/L) θ(Ω), (52)

where f_s is the sampling frequency. Note that, by definition, both members of (52) are real numbers. Therefore, in the z-domain, a nonstiff system can be mapped into a stiff system by means of the frequency warping map

z^{−1} → A(z). (53)

Figure 8: Dispersive waveguide used to simulate dispersive systems: the chain of P delays of Figure 6 is replaced by a cascade of P all-pass filters A(z), with low-pass loop filter G(z).

The resulting circuit is shown in Figure 8. Note that the feedback all-pass chain results in delay-free loops.
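As an illustrative sketch (not from the paper), the loop of Figure 8 can be simulated directly; all parameter values below are hypothetical, and a unit delay is kept in the feedback path as a crude way to break the delay-free loop:

```python
import random

class Allpass1:
    """First-order all-pass A(z, u) = (z^-1 - u)/(1 - u*z^-1) (eq. (50))."""
    def __init__(self, u):
        self.u, self.x1, self.y1 = u, 0.0, 0.0
    def step(self, x):
        y = self.u * self.y1 + self.x1 - self.u * x
        self.x1, self.y1 = x, y
        return y

def dispersive_ks(P, u, g, excitation, n):
    """Loop of Figure 8: P all-pass sections in cascade, scalar loss g in the
    feedback.  The extra unit delay in the feedback path sidesteps the
    delay-free loop (the methods cited in the text resolve it exactly)."""
    chain = [Allpass1(u) for _ in range(P)]
    out, prev = [], 0.0
    for i in range(n):
        x = (excitation[i] if i < len(excitation) else 0.0) + g * prev
        for ap in chain:
            x = ap.step(x)
        out.append(x)
        prev = x
    return out

random.seed(1)
burst = [random.uniform(-1.0, 1.0) for _ in range(30)]
out = dispersive_ks(P=30, u=0.2, g=0.98, excitation=burst, n=6000)
# |A(e^jW)| = 1 and |g| < 1, so the loop is stable and the tone decays
assert max(abs(v) for v in out[5000:]) < max(abs(v) for v in out[:1000])
```

Since each all-pass has unit magnitude response, stability is governed by the loss g alone, while the chain contributes only the dispersive phase.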
Computationally, these loops can be resolved by the methods illustrated in [34, 42, 43]. Moreover, the phase response of the loop filter G(z) contributes to the dispersion, and it must be taken into account in the global model. The circuit in Figure 8 can be optimized in order to take into account the losses and the coupling among strings (e.g., as in the piano). In the framework of this paper, we confine our interest to the design of the stiff system filter. For a review of the design of lossy filters and coupling models, see [17].

3.1. Stiff system filter parameters determination

Within the framework of the approximation (52), in the case of the dispersive waveguide, the integer parameter P can be obtained by constraining the two functions to attain the same values at the extrema of the bandwidth. Since θ(π) = −π, we have

P = ξ₁'(π f_s) L / π. (54)

As we will see, condition (54) is not the only one that can be obtained for the parameter P. The deviation from linearity introduced by the warping θ(Ω) can be written as follows:

Δ(Ω) ≡ θ(Ω) + Ω = −2 arctan( u sin Ω / (1 − u cos Ω) ). (55)

The function Δ(Ω) is plotted, for different values of u, in Figure 9. One can see that the absolute value of Δ(Ω) has a maximum, which corresponds to the maximum deviation of θ(Ω) from linearity. It can be shown that this maximum occurs for

Ω = Ω_M = arccos(u), (56)
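These phase identities are easy to confirm numerically. The following sketch assumes nothing beyond (51); it checks the band-edge value θ(π) = −π and the location of the extremal deviation:

```python
import math

def allpass_phase(Om, u):
    # eq. (51): theta(Om) = -Om - 2*arctan(u*sin(Om)/(1 - u*cos(Om)))
    return -Om - 2.0 * math.atan2(u * math.sin(Om), 1.0 - u * math.cos(Om))

u = 0.4
assert abs(allpass_phase(math.pi, u) + math.pi) < 1e-12   # theta(pi) = -pi
# the deviation theta(Om) + Om is extremal at Om_M = arccos(u),
# where it works out to -2*arcsin(u)
OmM = math.acos(u)
dev = allpass_phase(OmM, u) + OmM
assert abs(dev + 2.0 * math.asin(u)) < 1e-12
grid = [allpass_phase(k * math.pi / 1000.0, u) + k * math.pi / 1000.0
        for k in range(1, 1000)]
assert min(grid) >= dev - 1e-9   # no sampled frequency deviates more
```

The grid scan confirms that Ω_M is indeed the global extremum of the deviation over (0, π).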

Figure 9: Plot of the deviation from linearity of the all-pass filter phase for different values of the parameter u.

for which the maximum deviation is

Δ(Ω_M, u) = −2 arcsin(u). (57)

Substituting (56) in (51), we have

θ(Ω_M) = −π/2 − arcsin(u). (58)

Since the solution phase ξ₁' is approximated by θ(Ω) through (52), it has to satisfy the condition

ξ₁'(Ω_M f_s) L/P ≈ π/2 + arcsin(u), (59)

and therefore we have the following condition on P:

P ≈ L ξ₁'( f_s arccos(u) ) / ( π/2 + arcsin(u) ). (60)

For higher-order (order-Q) all-pass filters, (60) can be written as follows:

P ≈ (1/Q) Σ_{i=1}^{Q} L ξ₁'( f_s arccos(u_i) ) / ( π/2 + arcsin(u_i) ). (61)

An optimization algorithm can be used to obtain the vector parameter u. Based on our experiments, we estimated that an optimal order Q is 4 for the piano string. Therefore, using the values in (15) for the 58 Hz tone of an L = 200 cm brass string, we obtain P = 209. Although this is not a model of a real-life wound inhomogeneous piano string, this example gives a rough idea of the typical number of required all-pass sections. The computation of this long all-pass chain can be too heavy for real-time applications. Therefore, an approximation of the chain by means of a cascade of an all-pass of order much smaller than P with unit delays is usually sought [13, 29, 30]. A simple and accurate approach is to model the all-pass as a cascade of first-order sections with variable real parameter u [38]. However, a more general approach calls for including in the design second-order all-pass sections, equivalent to pairs of complex-conjugate first-order sections [29]. In Section 4, we will bypass this estimation procedure based on the theoretical eigenfunctions of the string and estimate the all-pass parameters and the number of sections from samples of the piano.
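The two ways of sizing P can be compared in code. The sketch below uses hypothetical parameter values and also checks that both rules collapse to the classical nondispersive delay-line length f_s L/c when stiffness and warping vanish:

```python
import math

def xi1_prime(omega, eps, c):
    # oscillatory wavenumber of the stiff string (eq. (12))
    s = math.sqrt(1.0 + 4.0 * omega**2 * eps**2 / c**2)
    return math.sqrt((s - 1.0) / (2.0 * eps**2))

def P_band_edge(eps, c, L, fs):
    # eq. (54): theta(pi) = -pi, so match the curves at omega = pi*fs
    return xi1_prime(math.pi * fs, eps, c) * L / math.pi

def P_max_deviation(u, eps, c, L, fs):
    # eq. (60): match at the frequency of maximum all-pass dispersion
    return L * xi1_prime(fs * math.acos(u), eps, c) / (math.pi / 2.0 + math.asin(u))

eps, c, L, fs = 0.05, 2.0e4, 200.0, 44100.0   # hypothetical values
assert P_band_edge(eps, c, L, fs) > 0
assert P_max_deviation(0.3, eps, c, L, fs) > 0
# sanity: with negligible stiffness and u -> 0, both rules reduce to the
# classical nondispersive delay-line length P = fs*L/c
for P in (P_band_edge(1e-4, c, L, fs), P_max_deviation(1e-6, 1e-4, c, L, fs)):
    assert abs(P - fs * L / c) / (fs * L / c) < 1e-3
```

In practice the two estimates differ for stiff strings, which is why the text treats (54) as only one of the available conditions on P.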
3.2. Laguerre sequences

An invertible and orthogonal transform, which is related to the all-pass chain included in the stiff string model, is given by the Laguerre transform [44, 45]. The Laguerre sequences l_i[m, u] are best defined in the z-domain as follows:

L_i(z, u) = ( √(1 − u²) / (1 − u z^{−1}) ) ( (z^{−1} − u) / (1 − u z^{−1}) )^i. (62)

Thus, the Laguerre sequences can be obtained from the z-domain recurrence

L_0(z, u) = √(1 − u²) / (1 − u z^{−1}), L_{i+1}(z, u) = A(z) L_i(z, u), (63)

where A(z) is defined as in (50). Comparison of (62) with (50) shows that the phase of the z-transform of the Laguerre sequences is suitable for approximating the phase of the solution of the stiff model equation. A biorthogonal generalization of the Laguerre sequences, calling for a variable u from section to section, is illustrated in [46]. This is linked to the refined approximation of the solution previously shown.

3.3. Initial conditions

Putting together the results obtained in Section 2, we can write the solution of the stiff model Y(ω, x) as follows (see (11) and (14)):

Y(ω, x) = c₁⁺(ω) exp(iξ₁'x) + c₁⁻(ω) exp(−iξ₁'x). (64)

We are here disregarding the transient term due to ξ₂, since it does not influence the acoustic frequencies of the system. In discrete time and space, we let x = m(L/P) as in [10]. With the approximation (52), (64) becomes

Y(m, Ω) ≈ c₁⁺(Ω) exp(−imθ(Ω)) + c₁⁻(Ω) exp(imθ(Ω)). (65)

Substituting (63) in (65), we have

Y(Ω, m) ≈ c₁⁺(Ω) L_{−m}(Ω, u)/L_0(Ω, u) + c₁⁻(Ω) L_m(Ω, u)/L_0(Ω, u), (66)

where we have used the fact that

A(e^{iΩ}, u) = (e^{−iΩ} − u)/(1 − u e^{−iΩ}) = exp(iθ(Ω)). (67)
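A minimal generator for the Laguerre sequences can be obtained by running the recurrence (63) on an impulse; this is a sketch, with the truncation length chosen so that the discarded tails are negligible:

```python
import math

def laguerre(i, u, N):
    """First N samples of the Laguerre sequence l_i[n, u], built by running
    the recurrence (63): L_0 = sqrt(1-u^2)/(1 - u*z^-1), L_{i+1} = A(z)*L_i."""
    seq = [math.sqrt(1.0 - u * u) * u**n for n in range(N)]  # impulse response of L_0
    for _ in range(i):
        # A(z) = (z^-1 - u)/(1 - u*z^-1):  y[n] = u*y[n-1] + x[n-1] - u*x[n]
        x, y = seq, [0.0] * N
        for n in range(N):
            y[n] = (u * y[n - 1] + x[n - 1] if n else 0.0) - u * x[n]
        seq = y
    return seq

u, N = 0.5, 4000   # hypothetical warping parameter, long truncation
l2, l3 = laguerre(2, u, N), laguerre(3, u, N)
assert abs(sum(a * a for a in l2) - 1.0) < 1e-6        # unit norm
assert abs(sum(a * b for a, b in zip(l2, l3))) < 1e-6  # orthogonality
```

The assertions verify numerically the orthonormality that makes the Laguerre transform invertible, as claimed in the text.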

By defining

V±(Ω) ≡ c₁±(Ω) / L_0(Ω, u), (68)

(66) can be written as follows:

Y(m, Ω) ≈ V⁺(Ω) L_{−m}(Ω, u) + V⁻(Ω) L_m(Ω, u). (69)

Taking the inverse discrete-time Fourier transform (IDTFT) on both sides of (69), we obtain

y[m, n] ≈ y⁺[m, n] + y⁻[m, n], (70)

where

y⁺[m, n] = Σ_k v⁺[n − k] l_{−m}[k, u], y⁻[m, n] = Σ_k v⁻[n − k] l_m[k, u], (71)

and the sequences v±[n] are the IDTFTs of V±(Ω). For the sake of conciseness, we do not report here the expression of v±[n] in terms of the constants c₁±. For further details, see [31, 38]. The numerical solution y[m, n] can be written in terms of a generic initial condition

y[m, 0] = y⁺[m, 0] + y⁻[m, 0]. (72)

In order to do this, we resort to the extension of the Laguerre sequences to negative arguments,

l_{−m}[n, u] = l_m[n, u], n ≥ 0, l_{−m}[n, u] = l_m[−n, u], n < 0, (73)

and to the property

l_m[n, u] = l_n[m, u]. (74)

If we introduce the quantities

y_k±[u] = Σ_m y±[m, 0] l_k[±m, u], l_k[±m, u] = l_{±m}[k, u], (75)

then, with a simple mathematical manipulation, (71) can be written as follows:

y⁺[m, n] = Σ_k y_k⁺[u] l_{−m}[k + n, u], y⁻[m, n] = Σ_k y_k⁻[u] l_m[k + n, u]. (76)

Therefore, the numerical solution becomes

y[m, n] = Σ_k y_k⁺[u] l_{−m}[k + n, u] + Σ_k y_k⁻[u] l_m[k + n, u]. (77)

We have just shown that the solution of the discrete-time stiff model equation can be written as a Laguerre expansion of the initial condition. At the same time, this shows that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping obtained by Laguerre expansion.

3.4. Boundary conditions

In Section 2.3, we discussed the boundary conditions of the stiff model equation in continuous time (see (18) and (19)). In this section, we discuss the homogeneous boundary conditions (i.e., the first line in both (18) and (19)) in the discrete-time domain.
Using approximation (52) and letting the number of sections P of the stiff system be an even integer, we can write the homogeneous conditions as follows (see also (69)):

Y(−P/2, Ω) = V⁺(Ω) L_{P/2}(Ω, u) + V⁻(Ω) L_{−P/2}(Ω, u) = 0,
Y(P/2, Ω) = V⁺(Ω) L_{−P/2}(Ω, u) + V⁻(Ω) L_{P/2}(Ω, u) = 0. (78)

As in the continuous-time case, (78) can be expressed in matrix form:

[ L_{P/2}(Ω, u)  L_{−P/2}(Ω, u) ; L_{−P/2}(Ω, u)  L_{P/2}(Ω, u) ] [V⁺(Ω); V⁻(Ω)] = [0; 0]. (79)

As shown in Section 3.3, the functions V±(Ω) are determined by means of the Laguerre expansion of the initial-condition sequences through (71) and (76). For any choice of these initial conditions, the determinant of the coefficient matrix in (79) must be zero, yielding the following condition:

[L_{P/2}(Ω, u)]² − [L_{−P/2}(Ω, u)]² = 0. (80)

Recalling the z-domain expression of the Laguerre sequences, we have

sin[θ(Ω)P] = 0, that is, θ(Ω) = −kπ/P, k = 1, 2, 3, .... (81)

In the stiff string case, the eigenfrequencies of the system are not harmonically related. In our approximation of the phase of the solution with the digital all-pass phase, harmonicity is recovered at a different level: the displacement of the all-pass phase values is harmonic according to the law written in (81). The distance between two consecutive values of this phase is π/P. Due to the nonrigid terminations, the real-life boundary conditions can be given in terms of frequency-dependent functions, which are included in the loop filter. In mapping the stiff structure to a nonstiff one, care must be taken in unwarping the loop filter as well.

4. SYNTHESIS OF SOUND

In order to implement a piano simulation via the physical model, we need to determine the design parameters of the

Figure 10: Computed optimized all-pass parameters u.

Figure 11: Warped deviation from linearity.

dispersive waveguide, that is, the number of all-pass sections and the coefficients u_i of the all-pass filters. This task could be performed by means of lengthy measurements or estimations of the physical variables, such as tension, Young modulus, density, and so forth. However, as we already remarked, due to the constitutive complexity of real-life piano strings and terminations, this task appears quite difficult and leads to inaccurate results. In fact, the given physical model only approximately matches the real situation. Indeed, in order to model and justify the measured eigenfrequencies, we resorted to Fletcher's experimental model described by (48). However, in that case, we ignore the exact form of the eigenfunctions, which is required in order to determine the number of sections of the waveguide and the other parameters. A more pragmatic and effective approach is to estimate the waveguide parameters directly from the measured eigenfrequencies ω_n. These can be extracted, for example, from recorded samples of notes played by the piano under exam. Fletcher's parameters A and B can be calculated as follows:

A = (1/2n) √( (16ω_n² − ω_{2n}²)/3 ), B = (1/n²) (γ² − 1)/(4 − γ²), γ = ω_{2n}/(2ω_n). (82)

In practice, in the model where the all-pass parameters u_i are equal throughout the delay line, one does not even need to estimate Fletcher's parameters. In fact, in view of the equivalence of the stiff string model with the warped nonstiff model, one can directly determine, through optimization, the parameter u that makes the dispersion curve of the eigenfrequencies the closest to a straight line, using a suitable distance. A result of this optimization is shown in Figure 10. It must be pointed out that our point of view differs from the one proposed in [29, 30], where the objective is the minimization of the number of nontrivial all-pass sections in the cascade.

Figure 12: Optimized all-pass parameters u for the A#3 tone.

Given the optimum warping curve, the number of sections is then determined by forcing the pitch of the cascade of the nonstiff model (Karplus-Strong like) with warping to match the required fundamental frequency of the recorded tone. An example of this method is shown in Figure 11, where the measured warping curves pertaining to several piano keys in the low register, as estimated from the resonant eigenfrequencies, are shown. In Figure 12, the optimum sequence of all-pass parameters u for the examined tones is shown. Finally, in Figure 13, the plot of the regularized dispersion curves obtained by means of optimum unwarping is shown. For further details about this method, see [47, 48, 49]. Frequency warping has also been employed in conjunction with 2D waveguide meshes in the effort of reducing the artificial

dispersion introduced by the nonisotropic spatial sampling [50]. Since the required warping curves do not match the first-order all-pass phase characteristic, in order to overcome this difficulty, a technique including resampling operators has been used in [50, 51], according to a scheme first introduced in [33] and further developed in [52] for the wavelet transforms. However, the downsampling operators inevitably introduce aliasing. While in the context of wavelet transforms this problem is tackled with multichannel filter banks, this is not the case for 2D waveguide meshes.

Figure 13: Optimum unwarped regularized dispersion curves.

5. CONCLUSIONS

In order to support the design and use of digital dispersive waveguides, we reviewed the physical model of stiff systems, using a frequency-domain approach in both continuous and discrete time. We showed that, for dispersive propagation in discrete time, the Laguerre transform allows us to write the solution of the stiff model equation in terms of an orthogonal expansion of the initial conditions and to recover harmonicity at the level of the displacement of the all-pass phase values. Consequently, we showed that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping, in turn obtained by Laguerre expansion. Finally, we showed that, due to this equivalence, the all-pass coefficients can be computed by means of optimization algorithms matching the stiff model with a warped nonstiff one. The exploration of physical models of musical instruments requires mathematical or physical approximations in order to make the problem tractable. When available, the solutions will only partially reflect the ensemble of mechanical and acoustic phenomena involved. However, the physical models serve as a solid background for the construction of physically inspired models, which are flexible numerical approximations of the solutions. Per se, these approximations are interesting for the synthesis of virtual instruments. However, in order to fine-tune the physically inspired models to real instruments, one needs methods for the estimation of the parameters from samples of the instrument. In this paper, we showed that dispersion from stiffness is a simple case in which the solution of the raw physical model suggests a discrete-time model, which is flexible enough to be used in the synthesis and which provides realistic results when its characteristics are estimated from the samples.

REFERENCES

[1] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, Structured audio: creation, transmission, and rendering of parametric sound representations, Proceedings of the IEEE, vol. 86, no. 5, pp. 922–940, 1998.
[2] P. Cook, Physically informed sonic modeling (PhISM): synthesis of percussive sounds, Computer Music Journal, vol. 21, no. 3, 1997.
[3] L. Hiller and P. Ruiz, Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I, Journal of the Audio Engineering Society, vol. 19, no. 6, 1971.
[4] L. Hiller and P. Ruiz, Synthesizing musical sounds by solving the wave equation for vibrating objects: Part II, Journal of the Audio Engineering Society, vol. 19, no. 7, 1971.
[5] A. Chaigne and A. Askenfelt, Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods, Journal of the Acoustical Society of America, vol. 95, no. 2, 1994.
[6] A. Chaigne and A. Askenfelt, Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters, Journal of the Acoustical Society of America, vol. 95, no. 3, 1994.
[7] A. Chaigne, On the use of finite differences for musical synthesis. Application to plucked stringed instruments, Journal d'Acoustique, vol. 5, no. 2, 1992.
[8] D. A. Jaffe and J. O.
Smith III, Extensions of the Karplus- Strong plucked-string algorithm, The Music Machine, C. Roads, Ed., pp , MIT Press, Cambridge, Mass, USA, [9] J. O. Smith III, Techniques for digital filter design and system identification with application to the violin, Ph.D. thesis, Electrical Engineering Department, Stanford University (CCRMA), Stanford, Calif, USA, June [1] K. Karplus and A. Strong, Digital synthesis of plucked-string and drum timbres, The Music Machine, C.Roads,Ed.,pp , MIT Press, Cambridge, Mass, USA, [11] J. O. Smith III, Physical modeling using digital waveguides, Computer Music Journal, vol. 16, no. 4, pp , 199. [1] J. O. Smith III, Physical modeling synthesis update, Computer Music Journal, vol., no., pp , [13] S. A. Van Duyne and J. O. Smith III, A simplified approach to modeling dispersion caused by stiffness in strings and plates, in Proc International Computer Music Conference, pp , Aarhus, Denmark, September [14] J. O. Smith III, Principles of digital waveguide models of musical instruments, in Applications of Digital Signal Processing to Audio and Acoustics, M.KahrsandK.Brandenburg, Eds., pp , Kluwer Academic Publishers, Boston, Mass, USA, [15] M. Karjalainen, T. Tolonen, V. Välimäki, C. Erkut, M. Laurson, and J. Hiipakka, An overview of new techniques and effects in model-based sound synthesis, Journal of New Music Research, vol. 3, no. 3, pp. 3 1, 1.

54 976 EURASIP Journal on Applied Signal Processing [16] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, The simulation of piano string vibration: from physical models to finite difference schemes and digital waveguides, Journal of the Acoustical Society of America, vol. 114, no., pp , 3. [17] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, Physically informed signal processing methods for piano sound synthesis: a research overview, EURASIP Journal on Applied Signal Processing, vol. 3, no. 1, pp , 3. [18] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, Physical modeling of plucked string instruments with application to real-time sound synthesis, Journal of the Audio Engineering Society, vol. 44, no. 5, pp , [19] J. O. Smith III, Efficient synthesis of stringed musical instruments, in Proc International Computer Music Conference, pp , Tokyo, Japan, September [] M. Karjalainen, V. Välimäki, and Z. Jánosy, Towards highquality sound synthesis of the guitar and string instruments, in Proc International Computer Music Conference, pp , Tokyo, Japan, September [1] G. Borin and G. De Poli, A hysteretic hammer-string interaction model for physical model synthesis, in Proc. Nordic Acoustical Meeting, pp , Helsinki, Finland, June [] G. E. Garnett, Modeling piano sound using digital waveguide filtering techniques, in Proc International Computer Music Conference, pp , Urbana, Ill, USA, August [3] J. O. Smith III and S. A. Van Duyne, Commuted piano synthesis, in Proc International Computer Music Conference,pp , Banff, Canada, September [4] S. A. Van Duyne and J. O. Smith III, Developments for the commuted piano, in Proc International Computer Music Conference,pp , Banff, Canada, September [5] M. Karjalainen and J. O. Smith III, Body modeling techniques for string instrument synthesis, in Proc International Computer Music Conference, pp. 3 39, Hong Kong, August [6] M. Karjalainen, V. Välimäki, and T. 
Tolonen, Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond, Computer Music Journal, vol., no. 3, pp. 17 3, [7] H. Fletcher, Normal vibration frequencies of a stiff piano string, Journal of the Acoustical Society of America, vol. 36, no. 1, pp. 3 9, [8] H. Fletcher, E. D. Blackham, and R. Stratton, Quality of piano tones, Journal of the Acoustical Society of America, vol. 34, no. 6, pp , 196. [9] D. Rocchesso and F. Scalcon, Accurate dispersion simulation for piano strings, in Proc. Nordic Acoustical Meeting, pp , Helsinki, Finland, June [3] D. Rocchesso and F. Scalcon, Bandwidth of perceived inharmonicity for physical modeling of dispersive strings, IEEE Trans. Speech, and Audio Processing, vol. 7, no. 5, pp , [31] I. Testa, G. Evangelista, and S. Cavaliere, A physical model of stiff strings, in Proc. Institute of Acoustics (Internat. Symp. on Music and Acoustics), vol. 19, pp. 19 4, Edinburgh, UK, August [3] S. Cavaliere and G. Evangelista, Deterministic least squares estimation of the Karplus-Strong synthesis parameter, in Proc. International Workshop on Physical Model Synthesis, pp , Firenze, Italy, June [33] G. Evangelista and S. Cavaliere, Discrete frequency warped wavelets: theory and applications, IEEE Trans. Signal Processing, vol. 46, no. 4, pp , [34] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi, Frequency-warped signal processing for audio applications, Journal of the Audio Engineering Society, vol. 48, no. 11, pp ,. [35] N. H. Fletcher and T. D. Rossing, Principles of Vibration and Sound, Springer-Verlag, New York, NY, USA, [36] L. D. Landau and E. M. Lifšits, Theory of Elasticity, Editions Mir, Moscow, Russia, [37] N. Dunford and J. T. Schwartz, Linear Operators; Part : Spectral Theory, Self Adjoint Operators in Hilbert Space, John Wiley & Sons, New York, NY, USA, 1st edition, [38] I. 
Testa, Sintesi del suono generato dalle corde vibranti: un algoritmo basato su un modello dispersivo, Physics degree thesis, Università Federico II di Napoli, Napoli, Italy, [39] H. W. Strube, Linear prediction on a warped frequency scale, Journal of the Acoustical Society of America, vol. 68, no. 4, pp , 198. [4] J. A. Moorer, The manifold joys of conformal mapping: applications to digital filtering in the studio, Journal of the Audio Engineering Society, vol. 31, no. 11, pp , [41] J.-M. Jot and A. Chaigne, Digital delay networks for designing artificial reverberators, in Proc. 9th Convention Audio Engineering Society, Paris, France, preprint no. 33, February, [4] M. Karjalainen, A. Härmä, and U. K. Laine, Realizable warped IIR filters and their properties, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 5 8, Munich, Germany, April [43] A. Härmä, Implementation of recursive filters having delay free loops, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp , Seattle, Wash, USA, May [44] P. W. Broome, Discrete orthonormal sequences, Journal of the ACM, vol. 1, no., pp , [45] A. V. Oppenheim, D. H. Johnson, and K. Steiglitz, Computation of spectra with unequal resolution using the fast Fourier transform, Proceedings of the IEEE, vol. 59, pp , [46] G. Evangelista and S. Cavaliere, Audio effects based on biorthogonal time-varying frequency warping, EURASIP Journal on Applied Signal Processing, vol. 1, no. 1, pp. 7 35, 1. [47] G. Evangelista and S. Cavaliere, Auditory modeling via frequency warped wavelet transform, in Proc. European Signal Processing Conference, vol. I, pp , Rhodes, Greece, September [48] G. Evangelista and S. Cavaliere, Dispersive and pitchsynchronous processing of sounds, in Proc. Digital Audio Effects Workshop, pp. 3 36, Barcelona, Spain, November [49] G. Evangelista and S. 
Cavaliere, Analysis and regularization of inharmonic sounds via pitch-synchronous frequency warped wavelets, in Proc International Computer Music Conference, pp , Thessaloniki, Greece, September [5] L. Savioja and V. Välimäki, Reducing the dispersion error in the digital waveguide mesh using interpolation and frequency-warping techniques, IEEE Trans. Speech, and Audio Processing, vol. 8, no., pp ,. [51] L. Savioja and V. Välimäki, Multiwarping for enhancing the frequency accuracy of digital waveguide mesh simulations, IEEE Signal Processing Letters, vol. 8, no. 5, pp , 1. [5] G. Evangelista, Dyadic Warped Wavelets, vol. 117 of Advances in Imaging and Electron Physics, Academic Press, 1.

I. Testa was born in Napoli, Italy, on September 1. He received the Laurea in physics from the University of Napoli Federico II in 1997 with a dissertation on physical modeling of vibrating strings. In the following years, he has been engaged in research on the didactics of physics, in the field of secondary school teacher training on the use of computer-based activities, and in teaching computer architecture for the information sciences course. He is currently teaching electronics and telecommunications at the Vocational School Galileo Ferraris, Napoli.

G. Evangelista received the Laurea in physics (with the highest honor) from the University of Napoli, Napoli, Italy, in 1984 and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Irvine, in 1987 and 199, respectively. Since 1995, he has been an Assistant Professor with the Department of Physical Sciences, University of Napoli Federico II. From 1998 to , he was a Scientific Adjunct with the Laboratory for Audiovisual Communications, Swiss Federal Institute of Technology, Lausanne, Switzerland. From 1985 to 1986, he worked at the Centre d'Etudes de Mathématique et Acoustique Musicale (CEMAMu/CNET), Paris, France, where he contributed to the development of a DSP-based sound synthesis system, and from 1991 to 1994, he was a Research Engineer at the Microgravity Advanced Research and Support Center, Napoli, where he was engaged in research on image processing applied to fluid motion analysis and material science. His interests include digital audio, speech, music, and image processing; coding; wavelets; and multirate signal processing. Dr. Evangelista was a recipient of the Fulbright fellowship.
S. Cavaliere received the Laurea in electronic engineering (with the highest honor) from the University of Napoli Federico II, Napoli, Italy. Since 1974, he has been with the Department of Physical Sciences, University of Napoli, first as a Research Associate and then as an Associate Professor. From 197 to 1973, he was with CNR at the University of Siena. In 1986, he spent an academic year at the Media Laboratory, Massachusetts Institute of Technology, Cambridge. From 1987 to 1991, he received a research grant for a project devoted to the design of VLSI chips for real-time sound processing and for the realization of the Musical Audio Research Station, a workstation for sound manipulation, at IRIS, Rome, Italy. He has also been a Research Associate with INFN for the realization of very large systems for data acquisition from nuclear physics experiments (KLOE in Frascati and ARGO in Tibet) and for the development of techniques for the detection of signals in high-level noise in the Virgo experiment. His interests include sound and music signal processing, in particular for the Web; signal transforms and representations; VLSI; and specialized computers for sound manipulation.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Digital Waveguides versus Finite Difference Structures: Equivalence and Mixed Modeling

Matti Karjalainen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland
matti.karjalainen@hut.fi

Cumhur Erkut
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland
cumhur.erkut@hut.fi

Received 3 June 2003; Revised 4 December 2003

Digital waveguides and finite difference time domain schemes have been used in physical modeling of spatially distributed systems. Both of them are known to provide exact modeling of ideal one-dimensional (1D) band-limited wave propagation, and both of them can be composed to approximate two-dimensional (2D) and three-dimensional (3D) mesh structures. Their equal capabilities in physical modeling have been shown for special cases and have been assumed to cover generalized cases as well. The ability to form mixed models by joining substructures of both classes through converter elements has been proposed recently. In this paper, we formulate a general digital signal processing (DSP)-oriented framework where the functional equivalence of these two approaches is systematically elaborated and the conditions for building mixed models are studied. An example of mixed modeling of a 2-D waveguide is presented.

Keywords and phrases: acoustic signal processing, hybrid models, digital waveguides, scattering, FDTD model structures.

1. INTRODUCTION

Discrete-time simulation of spatially distributed acoustic systems for sound and voice synthesis finds its roots both in the modeling of speech production and of musical instruments. The Kelly-Lochbaum vocal tract model [1] introduced a one-dimensional transmission line simulation of speech production with two-directional delay lines and scattering junctions for nonhomogeneous vocal tract profiles.
Delay sections discretize the d'Alembert solution of the wave equation [2], and the scattering junctions implement the acoustic continuity laws of pressure and volume velocity in a tube of varying diameter. Further simplification led to the synthesis models used as the basis for linear prediction of speech [3]. A similar modeling approach to musical instruments, such as string and wind instruments, was formulated later and named the technique of digital waveguides (DWGs) [4, 5]. For computational efficiency reasons, in DWGs two-directional delay lines are often reduced to single delay loops [6]. DWGs have been further discussed in two-dimensional (2D) and three-dimensional (3D) modeling [5, 7, 8, 9, 10], sometimes combined with a finite difference approach into DWG meshes.

Finite difference schemes [11] were introduced to the simulation of vibrating strings as a numerical integration solution of the wave equation [12, 13], and the approach has been developed further, for example, in [14] as a finite difference time domain (FDTD) simulation. The second-order finite difference scheme including propagation losses was formulated as a digital filter structure in [15], and its stability issues were discussed in [16]. This particular structure is the main focus of the finite difference discussions in the rest of this paper, and we will refer to it as the FDTD model structure.

DWG and FDTD approaches to discrete-time simulation of spatially distributed systems show a high degree of functional equivalence. As discussed in [5], in the one-dimensional band-limited case, the ideal wave propagation can be exactly modeled by both methods. The basic difference is that the FDTD model structures process the signals as they are, whereas DWGs process their wave decomposition. There are other known differences between DWGs and FDTD model structures. One of them is the instabilities ("spurious responses") found in FDTD model structures, but not in DWGs, as responses to specific excitations.
Another difference is the numeric behavior in finite precision computation.
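The 1-D functional equivalence discussed above can be checked numerically. The sketch below (a hypothetical pure-Python toy with a periodic boundary and matched initial conditions, not taken from the paper) propagates the same initial displacement with a W-type pair of shifted delay lines and with a K-type leapfrog recursion; with zero initial velocity and a 50/50 traveling-wave split, the two outputs coincide sample by sample:

```python
def simulate_dwg(y0, steps):
    """W-model: split y0 into right- and left-going halves and shift
    them one cell per time step (circular boundary for simplicity)."""
    n = len(y0)
    right = [0.5 * v for v in y0]
    left = [0.5 * v for v in y0]
    for _ in range(steps):
        right = [right[(k - 1) % n] for k in range(n)]  # shift right
        left = [left[(k + 1) % n] for k in range(n)]    # shift left
    return [r + l for r, l in zip(right, left)]

def simulate_fdtd(y0, steps):
    """K-model: leapfrog recursion y[k,n+1] = y[k-1,n] + y[k+1,n] - y[k,n-1],
    with the n = -1 state chosen to match zero initial velocity."""
    n = len(y0)
    prev = [0.5 * (y0[(k - 1) % n] + y0[(k + 1) % n]) for k in range(n)]
    cur = list(y0)
    for _ in range(steps):
        nxt = [cur[(k - 1) % n] + cur[(k + 1) % n] - prev[k]
               for k in range(n)]
        prev, cur = cur, nxt
    return cur

# Impulse initial displacement: both methods give identical samples.
y0 = [0.0] * 16
y0[8] = 1.0
assert simulate_dwg(y0, 5) == simulate_fdtd(y0, 5)
```

The agreement is exact here (all values are binary fractions); in general finite-precision arithmetic the two structures behave differently, which is one of the differences discussed above.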

Comparison of these two different paradigms has been developed further in [1, 17, 18]. In [17], the interesting and important possibility of building mixed models with submodels of DWG and FDTD types was introduced, and it was generalized to elements with arbitrary wave impedances in [18]. The problem of functional comparison and compatibility analysis has remained, however, and is the topic of this paper.

The rest of the paper is organized as follows. Section 2 provides the background information and notation that will be used in the following sections. A summary of wave-based modeling and finite difference modeling is also included in this section. Section 3 provides the derivation of the FDTD model structures, including the source terms, scattering, and the continuity laws. Based on the wave equation in the acoustical domain, this section highlights the functional equivalence of DWGs and FDTD model structures. It also presents a way of building mixed models. The formal proofs of equivalence are provided in the appendix. Section 4 is devoted to real-time implementation of mixed models. Finally, Section 5 draws conclusions and indicates future directions.

2. BACKGROUND

Sound synthesis algorithms that simulate spatially distributed acoustic systems usually provide discrete-time solutions to a hyperbolic partial differential equation, that is, the wave equation. According to the domain of simulation, the variables correspond to different physical quantities. The physical variables may further be characterized by their mathematical nature. An across variable is defined here to describe a difference between two values of an irrotational potential function (a function that integrates or sums up to zero over closed trajectories), whereas a through variable is defined here to describe a solenoidal function (a quantity that integrates or sums up to zero over closed surfaces).
For example, in the acoustical domain, the deviation from the steady-state pressure p(x, t) is an across variable and the volume velocity u(x, t) is a through variable, where x is the spatial vector variable and t is the temporal scalar variable. Similarly, in the mechanical domain, the across variable is the force and the through variable is the velocity. The ratio of the across and through variables yields the impedance Z. The admittance is the inverse of Z, that is, Y = 1/Z.

In a one-dimensional (1D) medium, the spatial vector variable reduces to a scalar variable x, so that in a homogeneous, lossless, unbounded, and source-free medium the wave equation is written

y_tt = c^2 y_xx, (1)

where y is a physical variable, subscript tt refers to the second partial derivative in time t, xx to the second partial derivative in the spatial variable x, and c is the speed of the wavefront in the medium of interest. For example, in the mechanical domain (e.g., a vibrating string) we are primarily interested in transversal wave motion, for which c = sqrt(T/µ), where T is the tension force and µ is the mass per unit length of the string [2]. The impedance is closely related to the tension T, mass density µ, and propagation speed c, and is given by Z = sqrt(Tµ) = T/c. In the acoustical domain, the admittance is also related to the acoustical propagation speed c. For instance, the admittance of a tube with a constant cross-sectional area A is given by

Y = A/(ρc), (2)

where ρ is the gas density in the tube. The two common forms of discretizing the wave equation for numerical simulation are through the traveling wave solution and by finite difference formulation.

2.1. Wave-based modeling

The traveling wave formulation is based on the d'Alembert solution as the propagation of two opposite-direction waves, that is,

y(x, t) = y→(x − ct) + y←(x + ct), (3)

where the arrows denote the right-going and left-going components of the total waveform.
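As a worked example of the relations above (c = sqrt(T/µ), Z = sqrt(Tµ) = T/c, and Y = A/(ρc)), the following sketch uses arbitrary illustrative values, chosen only to give round numbers:

```python
import math

# Hypothetical string parameters (illustrative, not from the paper).
T = 64.0       # tension force (N)
mu = 0.0625    # mass per unit length (kg/m)
c_string = math.sqrt(T / mu)   # transversal wave speed, c = sqrt(T/mu)
Z_string = math.sqrt(T * mu)   # wave impedance, Z = sqrt(T*mu) = T/c

# Hypothetical tube parameters for the acoustical domain.
rho = 1.2      # gas density (kg/m^3)
c_air = 343.0  # propagation speed (m/s)
A = 1.0e-4     # constant cross-sectional area (m^2)
Y_tube = A / (rho * c_air)     # acoustic admittance, Y = A/(rho*c)
```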
Assuming that the signals are bandlimited to half of the sampling rate, we may sample the traveling waves without losing any information by selecting T as the sample interval and X as the position interval between samples so that T = X/c. Sampling is applied on a discrete time-space grid in which n and k are related to time and position, respectively. The discretized version of (3) becomes [5]

y(k, n) = y→(k − n) + y←(k + n). (4)

It follows that the wave propagation can be computed by updating the state variables in two delay lines by

y→_{k,n+1} = y→_{k−1,n},  y←_{k,n+1} = y←_{k+1,n}, (5)

that is, by simply shifting the samples to the right and left, respectively. The shift is implemented with a pair of delay lines, and this kind of discrete-time modeling is called DWG modeling [5]. Since the physical variables are split into directional wave components, we will refer to such models as W-models.

According to (3) or (4), a single physical variable (either through or across) is computed by summing the traveling waves, whereas the other one may be computed implicitly via the impedance. If the medium is nonhomogeneous, then the admittance varies as a function of the spatial variable. In this case, the energy transfer between the wave components should be computed according to Kirchhoff-type continuity laws, ensuring that the total energy is preserved. These laws may be derived utilizing the irrotational and solenoidal nature of the across and through variables, respectively. In the DWG equivalent, a change in Y across a junction of waveguide sections causes scattering, and the scattering junctions of interconnected ports, with given admittances and wave variables,

have to be formulated [5]. For instance, in a parallel junction of N waveguides in the acoustical domain, the Kirchhoff constraints are

P_1 = P_2 = ... = P_N = P_J,
U_1 + U_2 + ... + U_N + U_ext = 0, (6)

where P_i and U_i are the total pressure and volume velocity of the ith branch (note that capital letters denote a transform variable; for instance, P_i is the z-transform of the signal p_i(n)), P_J is the common pressure of the coupled branches, and U_ext is an external volume velocity injected into the junction. Such a junction is illustrated in Figure 1. When port pressures are represented by incoming wave components P_i^+, outgoing wave components by P_i^-, admittances attached to each port by Y_i, and

P_i = P_i^+ + P_i^-,  U_i^+ = Y_i P_i^+, (7)

the junction pressure P_J can be obtained as

P_J = (1/Y_tot) ( U_ext + 2 Σ_{i=1}^{N} Y_i P_i^+ ), (8)

where Y_tot = Σ_{i=1}^{N} Y_i is the sum of all admittances connected to the junction. Outgoing pressure waves are obtained from (7) to yield P_i^- = P_J − P_i^+. The resulting junction, a W-node, is depicted in Figure 2. The delay lines or termination admittances (see the appendix) are connected to the W-ports of a W-node. A useful addition to DWG theory is to adopt wave digital filters (WDFs) [1, 19] as discrete-time simulators of lumped-parameter elements. Being based on W-modeling, they are computationally compatible with the W-type DWGs [1, 18, 20].

2.2. Finite difference modeling

In the most commonly used way to discretize the wave equation by finite differences, the partial derivatives in (1) are approximated by centered differences. The centered difference approximation to the spatial partial derivative y_x is given by [11]

y_x ≈ [ y(x + Δx/2, t) − y(x − Δx/2, t) ] / Δx, (9)

where Δx is the spatial sampling interval. A similar expression is obtained for the temporal partial derivative if x is kept constant and t is replaced by t ± Δt/2, where Δt is the discrete-time sampling interval. Iterating the difference approximations, the second-order partial derivatives in (1) are approximated by

y_xx ≈ ( y_{x+Δx,t} − 2 y_{x,t} + y_{x−Δx,t} ) / Δx^2,
y_tt ≈ ( y_{x,t+Δt} − 2 y_{x,t} + y_{x,t−Δt} ) / Δt^2, (10)

Figure 1: Parallel junction of admittances Y_i with associated pressure waves indicated. A volume velocity input U_ext is also attached.
Iterating the difference approximations, second-order partial derivatives in (1) areapproximated by ( ) yx x,t y x,t y x x,t y xx x, ( ) (1) yx,t t y x,t y x,t t y tt t, 1 Note that capital letters denote a transform variable. For instance, P i is the z-transform of the signal p i (n). i=1 Y 1 Y U ext P 1 P 1 P J Figure 1: Parallel junction of admittances Y i with associated pressure waves indicated. A volume velocity input U ext is also attached. where the short-hand notation y x,t is used instead of y(x, t). By selecting t = x/c, and using index notation k = x/ x and n = t/ t,(1)resultin Y n P y k,n1 = y k 1,n y k1,n y k,n 1. (11) From (11) we can see that a new sample y k,n1 at position k and time index n 1 is computed as the sum of its neighboring position values minus the value at the position itself one sample period earlier. Since y k,n1 is a physical variable, we will refer to models based on finite differences as K-models, with a reference to Kirchhoff type of physical variables. 3. FORMULATION OF THE FDTD MODEL STRUCTURE The equivalence of the traveling wave and the finite difference solution of the ideal wave equation (given in (5)and(11), respectively) has been shown, for instance, in [5]. Based on this functional equivalence, (11) has been previously expanded without a formal derivation to a scattering junction with arbitrary port impedances, where (8) is used as a template for the expansion [18]. The resulting FDTD model structure is illustrated in Figure 3 for a three-port junction. A comparison of the FDTD model structure in Figure 3 and the DWG scattering junction in Figure reveals the functional similarities of the two methods. However, a formal, generalized, and unified derivation of the FDTD model structure without an explicit reference to the DWG method remains to be presented. This section presents such a derivation based on the equations of motion of the gas in a tube. 
Note that, because of the analogy between different physical domains, once the formulation is derived, it can be used in different domains as well. Therefore, the derivation below is not limited to the acoustical domain, and the resulting structure can also be used in other domains.

3.1. Source terms

In order to explain the excitation U_ext and the associated filter H(z) = 1 − z^{−2} in Figure 3, we consider a piece of tube of

Figure 2: (a) N-port scattering junction (three ports are shown) with port admittances Y_i. Incoming and outgoing pressure waves are P_i^+ and P_i^-, respectively. W-port 1 is terminated by admittance Y_1. (b) Abstract representation of the W-node in (a).

Figure 3: (a) Digital filter structure for the finite difference approximation of a three-port scattering node with port admittances Y_i. Only the total pressure P_J (a K-variable) is explicitly available. (b) Abstract representation of the K-node in (a).

constant cross-sectional area A that includes an ideal volume velocity source s(t). The pressure p and the volume velocity u (the variables in the acoustical domain, as explained in the previous section) satisfy the following PDE set:

ρ ∂u/∂t + A ∂p/∂x = 0,
(A/(ρc^2)) ∂p/∂t + ∂u/∂x = s, (12)

where ρ is the gas density and c is the propagation speed. This set may be combined to yield a single PDE in p and the source term:

∂^2 p/∂t^2 − (ρc^2/A) ∂s/∂t = c^2 ∂^2 p/∂x^2. (13)

Defining

s̄(t) = (1/2) [ s(t + Δt/2) + s(t − Δt/2) ] + O(Δt^2), (14)

using the index notation k = x/Δx and n = t/Δt, and applying centered differences (see Section 2.2) to (13) with Δx/Δt = c yields the following difference equation:

p_k(n+1) = p_{k+1}(n) + p_{k−1}(n) − p_k(n−1) + (ρcΔx/(2A)) ( s_k(n+1) − s_k(n−1) ). (15)

Note that ρc/A is the acoustic impedance that converts the volume velocity source s(t) to a pressure. Since the model output is the pressure at time step n + 1, it follows that the source is delayed by two samples, subtracted from its current value, and scaled, corresponding to the filter 1 − z^{−2} for U_ext in Figure 3.

3.2. Admittance discontinuity and scattering

Now consider an unbounded, source-free tube with a cross section A(x) that is a smooth real function of the spatial variable x. In this case, the governing PDEs can be combined into a single PDE in the pressure alone [1]:

∂^2 p/∂t^2 = (c^2/A(x)) ∂/∂x ( A(x) ∂p/∂x ), (16)

which is the Webster horn equation. Discretizing this equation by centered differences yields the following difference equation:

[ p_k(n+1) − 2 p_k(n) + p_k(n−1) ] / Δt^2 = (c^2/A_k) [ A_{k+1/2} ( p_{k+1}(n) − p_k(n) ) − A_{k−1/2} ( p_k(n) − p_{k−1}(n) ) ] / Δx^2, (17)

where A_k = A(kΔx). By selecting Δx = cΔt and using the following approximation twice,

A_k = (1/2) ( A_{k−1/2} + A_{k+1/2} ) + O(Δx^2), (18)

(17) becomes

p_k(n+1) + p_k(n−1) = 2 [ A_{k−1/2} p_{k−1}(n) + A_{k+1/2} p_{k+1}(n) ] /
( A_{k−1/2} + A_{k+1/2} ). (19)

Finally, by defining Y_{k±1/2} = A_{k±1/2}/(ρc), we obtain

p_k(n+1) + p_k(n−1) = (2/Y_tot) ( Y_{k−1/2} p_{k−1}(n) + Y_{k+1/2} p_{k+1}(n) ), (20)

where the term Y_tot = Y_{k−1/2} + Y_{k+1/2} may be interpreted as the sum of all admittances connected to the kth cell. This recursion is implemented with the filter structure illustrated in Figure 4. The output of the structure is the junction pressure p_{J,k}(n). It is worth noting that (20) is functionally the same as the DWG scattering representation given in (8), if the admittances are real. The more general case of complex admittances is considered in the appendix.

Whereas the DWG formulation can easily be extended to N-port junctions, this extension is not necessarily possible for a K-model, where the continuity laws are generally not satisfied. In the next subsection, we investigate the continuity laws within the FDTD model structure.

3.3. Continuity laws

We denote the pressure across the impedance 2/ΣY_i as p_a(n), and the volume velocity through the same impedance as u_t(n), with reference to Figure 4. According to these notations, Ohm's law in the acoustical domain yields

p_a(n) = 2 u_t(n)/Y_tot, (21)

whereas the Kirchhoff continuity laws can be written as

p_a(n) = p_k(n+1) + p_k(n−1), (22)
u_t(n) = Y_{k−1/2} p_{k−1}(n) + Y_{k+1/2} p_{k+1}(n). (23)

Inserting (21) into (23) eliminates u_t(n), and the result may be combined with (22) to give the following equation for the combined continuity laws:

p_k(n+1) + p_k(n−1) = (2/Y_tot) ( Y_{k−1/2} p_{k−1}(n) + Y_{k+1/2} p_{k+1}(n) ). (24)

This relation is exactly the recursion of the FDTD model structure given in (20), but obtained here solely from the continuity laws. We thus conclude that the continuity laws are automatically satisfied by the FDTD model structure of Figure 4. It is worth noting that more ports may be added to the structure without violating the continuity laws for any number of linear, time-invariant (LTI) admittances, as long as Y_tot = Σ Y_i. For N ports connected to the ith cell, (23) becomes

U_t = Σ_{i=1}^{N} z^{−1} Y_i P_{J,i}, (25)

Figure 4: Digital filter structure for the finite difference approximation of an unbounded, source-free tube with a spatially varying cross section.

Figure 5: An FDTD node (left) and a DWG node (right) forming a part of a hybrid waveguide. There is a KW-converter between the K- and W-models. Y_i are the wave admittances of the W-lines, K-pipes, and the KW-converter between the junction nodes. P_1 and P_2 are the junction pressures of the K-node and the W-node, respectively.
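A minimal sketch of one update of the two-port junction recursion implemented by the structure of Figure 4 (the relation written out in (20) and (24)); the function name and the numeric values below are illustrative assumptions, not code from the paper:

```python
def fdtd_junction_step(p_prev, p_left, p_right, Y_left, Y_right):
    """One time step of the two-port FDTD junction:
    p_k(n+1) = (2/Y_tot) * (Y_left * p_{k-1}(n) + Y_right * p_{k+1}(n))
               - p_k(n-1),
    where Y_tot is the sum of the admittances connected to the cell."""
    Y_tot = Y_left + Y_right
    return (2.0 / Y_tot) * (Y_left * p_left + Y_right * p_right) - p_prev

# With equal admittances the update degenerates to the ideal
# recursion p_k(n+1) = p_{k-1}(n) + p_{k+1}(n) - p_k(n-1).
```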
Here,wederivehowahybridmodel(showninFigure 5) can be constructed in a 1-D waveguide between a K-node N 1 (left) and a W-node N (right), aligned with the spatial grids k = 1 and, respectively. The derivation is based on the fact that the junction pressures are available in both types of nodes, but in the DWG case not at the W-ports. If N 1 and N would be both W-nodes (see Figure 8 in the appendix), the traveling wave entering into the node N could be calculated as P = z 1 P1 = z 1( P 1 z 1 ) P = z 1 P 1 z P. (7) Note that P 1 is available in the K-node N 1 in Figure 5. Conversely, if N 1 and N would be both K-nodes, the junction pressure z 1 P would be needed for calculation of P 1 (see Figure 1 in the appendix). Although P is implicitly available in N, it can also be obtained by summing up the wave

components within the converter:

z^{−1} P_2 = z^{−1} ( P_2^+ + P_2^- ). (28)

Equation (27) may be inserted into (28) to yield the following transfer matrix of the two-port KW-converter element:

[ P_2^+      ]   [ z^{−1}   −z^{−2}             ] [ P_1   ]
[ z^{−1} P_2 ] = [ z^{−2}   z^{−1}(1 − z^{−2})  ] [ P_2^- ]. (29)

The KW-converter in Figure 5 essentially performs the calculations given in (29) and interconnects the K-type port of an FDTD node and the W-type port of a DWG node. The signal behavior in a mixed modeling structure is further investigated in the appendix.

Figure 6: Part of a 2-D waveguide mesh composed of (a) K-type FDTD elements (left bottom): K-pipes (kp) and K-nodes (k); (b) W-type DWG elements (top and right): delay-controllable W-lines (wl), W-nodes (w), and terminating admittances (yt); and (c) converter elements (kw) to connect K- and W-type elements into a mixed model.

4. IMPLEMENTATION OF MIXED MODELS

The functional equivalence and mixed modeling paradigm of DWGs and FDTDs presented above allow for flexible building of physical models from K- and W-type substructures. In this way, it is possible to exploit the advantages of each type. In this section, we will explore a simple example of a digital waveguide model that shows how mixed models can be built. Before that, a short discussion of the pros and cons of the different paradigms in practical realizations is presented.

4.1. K-modeling versus W-modeling: pros and cons

An advantage of W-modeling is its numerical robustness. By proper formulation, stability is guaranteed also with fixed-point arithmetic [5, 19]. Another useful property is the relatively straightforward way of using fractional delays [] when building digital waveguides, which makes, for example, tuning and run-time variation of musical instrument models convenient. In general, it seems that W-modeling is the right choice in most 1-D cases.
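As a concrete reference for the W-type computation, the parallel N-port junction update of (6)-(8) can be sketched as follows (a hypothetical helper, not the BlockCompiler API):

```python
def w_node_scatter(p_in, Y, u_ext=0.0):
    """Parallel N-port W-node: given incoming waves p_in[i], port
    admittances Y[i], and an optional external volume velocity u_ext,
    return the junction pressure
        p_J = (u_ext + 2 * sum(Y[i] * p_in[i])) / sum(Y)
    and the outgoing waves p_out[i] = p_J - p_in[i]."""
    Y_tot = sum(Y)
    p_j = (u_ext + 2.0 * sum(yi * pi for yi, pi in zip(Y, p_in))) / Y_tot
    p_out = [p_j - pi for pi in p_in]
    return p_j, p_out

# Sanity check: with two equal admittances, a wave passes through
# the junction without reflection (p_out mirrors the inputs).
p_j, p_out = w_node_scatter([1.0, 0.0], [1.0, 1.0])
```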
The advantages of K-modeling by FDTD waveguides appear when realizing mesh-like structures, such as 2-D and 3-D meshes [7, 8]. In such cases, the number of unit delays (memory positions) per node is two for any dimensionality, while for a DWG mesh it is two times the dimensionality of the mesh. A disadvantage of FDTDs is their inherent lack of numerical robustness and their tendency toward instability for signal frequencies near DC and the Nyquist frequency. Furthermore, FDTD junction nodes cannot be made memoryless, which may be a limitation in nonlinear and parametrically varying models.

4.2. 2-D waveguide mesh case

Figure 6 illustrates a part of a 2-D mixed model structure that is based on a rectangular FDTD waveguide mesh for efficient and memory-saving computation, with DWG elements at the boundaries. Such a model could be, for example, a membrane

Digital Waveguides versus Finite Difference Structures

of a drum or, in a 3-D case, a room enclosed by walls. When there is a need to attach W-type termination admittances to the model, or to vary the propagation delays within the system, a change from K-elements to W-elements through converters is a useful property. Furthermore, variable-length delays can be used, for example, for passive nonlinearities at the terminations to simulate gongs and other instruments in which nonlinear mode coupling takes place [23]. The same principle can be used to simulate shock waves in brass instrument bores [24]. In such cases, the delay lengths are made dependent on the signal value passing through the delay elements.

In Figure 6, the elements denoted by kp are K-type pipes between K-type nodes. Elements kw are K-to-W converters, and elements wl are W-lines, where the arrows indicate that they are controllable fractional delays. Elements yt are terminating admittances. In the general case, scattering can be controlled by varying the admittances, although the computational efficiency is improved if the admittances are made equal. On a modern PC, a 2-D mesh of a few hundred elements can run in real time at full audio rate. By decimated computation, bigger models can be computed if a lower cutoff frequency is permitted, allowing large physical dimensions of the mesh.

4.3. Mixed modeling in BlockCompiler

The development of the K- and W-models above has led to a systematic formulation of computational elements for both paradigms and for mixed modeling. The W-lines and K-pipes, as well as the related junction nodes, are useful abstractions for a formal specification of model implementation. We have developed a software tool for physical modeling called the BlockCompiler [20], designed in particular for flexible modeling and efficient real-time computation of the models. The BlockCompiler contains two levels: (a) model creation and (b) model implementation.
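As a minimal illustration of the K/W functional equivalence that such mixed-modeling tools rely on, the following sketch (a toy example, not BlockCompiler code) simulates an ideal 1-D string with fixed ends both as a DWG with two directional delay lines and as the FDTD recursion p_k(n+1) = p_{k+1}(n) + p_{k-1}(n) - p_k(n-1), and confirms that the junction pressures coincide sample by sample.

```python
import random

def dwg_step(r, l):
    """One time step of an ideal-string DWG with fixed (p = 0) ends:
    right-going waves shift right, left-going waves shift left, and
    the end samples are reflected with a sign inversion."""
    N = len(r) - 1
    nr = [0.0] * (N + 1)
    nl = [0.0] * (N + 1)
    for k in range(1, N + 1):
        nr[k] = r[k - 1]
    for k in range(N):
        nl[k] = l[k + 1]
    nr[0] = -nl[0]          # enforce p = 0 at the left end
    nl[N] = -nr[N]          # enforce p = 0 at the right end
    return nr, nl

def fdtd_step(p_now, p_prev):
    """One step of the equivalent FDTD recursion with fixed ends."""
    N = len(p_now) - 1
    p_next = [0.0] * (N + 1)
    for k in range(1, N):
        p_next[k] = p_now[k + 1] + p_now[k - 1] - p_prev[k]
    return p_next

random.seed(1)
N = 16
r = [random.uniform(-1, 1) for _ in range(N + 1)]
l = [random.uniform(-1, 1) for _ in range(N + 1)]
r[0], l[N] = -l[0], -r[N]   # start from a state consistent with the ends

frames = [[ri + li for ri, li in zip(r, l)]]
for _ in range(20):
    r, l = dwg_step(r, l)
    frames.append([ri + li for ri, li in zip(r, l)])

# The FDTD recursion, started from the first two DWG pressure frames,
# must reproduce every later frame exactly.
p_prev, p_now = frames[0], frames[1]
for ref in frames[2:]:
    p_prev, p_now = p_now, fdtd_step(p_now, p_prev)
    assert max(abs(a - b) for a, b in zip(p_now, ref)) < 1e-12
```

Note the memory trade-off discussed in Section 4.1: the DWG carries two wave variables per node, while the FDTD carries the two most recent pressure frames.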
The model creation level is written in the Common Lisp programming language for maximal flexibility in symbolic, object-based manipulation of model structures. A set of DSP-oriented and physics-oriented computational blocks is available. New block classes can be created either as macro classes composed of predefined elementary blocks or by writing new elementary blocks. The blocks are connected through ports: inputs and outputs for DSP blocks, and K- or W-type ports for physical blocks. A fully interconnected model is called a patch.

The model implementation level is a code generator that schedules the blocks, writes C source code into a file, compiles it on the fly, and allows for streaming sound in real time or computation by stepping in a sample-by-sample mode. The C code can also be exported to other platforms, such as the Mustajuuri audio platform [25] and pd [26]. Sound examples of mixed models can be found at

5. SUMMARY AND CONCLUSIONS

This paper has presented a formulation of a specific FDTD model structure and has shown its functional equivalence to DWGs. Furthermore, an example of mixed models consisting of FDTD and DWG blocks and converter elements was reported. The formulation allows for high flexibility in building 1-D or higher-dimensional physical models from interconnected blocks.

The DWG method is used as the primary example of the wave-based methods in this paper. Naturally, the KW-converter formulation is applicable to any W-method, such as wave digital filters (WDFs) [19]. In the future, we plan to extend our examples to include WDF excitation blocks. Other important future directions are the analysis of the dynamic behavior of parametrically varying hybrid models, as well as benchmark tests of the computational costs of the proposed structures. Matlab scripts and demos related to DWGs and FDTDs can be found at waveguide-modeling/.

APPENDIX A.
PROOFS OF EQUIVALENCE

The proofs of functional equivalence between the DWG and FDTD formulations used in this article are given below. The approach is based on the Thevenin and Norton theorems [27].

A.1. Termination in a DWG network

Passive termination of a DWG junction port by a given admittance Y is equivalent to attaching a delay line of infinite length and wave admittance Y. In the DWG case, this means an infinitely long sequence of admittance-matched unit delay lines. Since there is no back-scattering in finite time, we can use the left-side port termination of Figure 2, with zero volume velocity at the input terminal. Thus, the admittance filter Y_1 is not needed in the computation; it only has to be included when forming the filter 1/ΣY_i.

A.2. Termination in an FDTD network

Deriving the passive port termination for an FDTD junction is not as obvious as for a DWG junction. We can again apply an infinitely long sequence of admittance-matched FDTD sections, as depicted in Figure 7 on the left-hand side. With the notations given, and with z-transforms of the variables and admittances, we can write

P_0 = (2Y_1/ΣY_i) P_{-1} z^{-1} + (2/ΣY_i) Σ_{i=2}^{M} Y_i P_i z^{-1} - P_0 z^{-2},  (A.1a)
P_{-1} = P_0 z^{-1} + P_{-2} z^{-1} - P_{-1} z^{-2},  (A.1b)
P_k = P_{k+1} z^{-1} + P_{k-1} z^{-1} - P_k z^{-2}  for k < -1,  (A.1c)

where ΣY_i denotes Σ_{i=1}^{M} Y_i, P_i, i = 2,...,M, are the pressures of the other neighboring junctions linked through admittances Y_i to junction 0, and P_k, k = -1, -2,..., are the pressures at the junctions of the admittance-matched chain terminating junction 0. By applying (A.1c) to (A.1b) iteratively for

Figure 7: FDTD structure terminated by an admittance-matched chain of FDTD elements on the left-hand side.

Figure 8: Structure for derivation of signal behavior in a DWG network.

k = -2,...,-N, we get

P_{-1} = P_0 z^{-1} + P_{-N-1} z^{-N} - P_{-N} z^{-(N+1)}.  (A.2)

When N → ∞, the last two terms cease to have any effect on P_{-1} in any finite time span, and they can thus be discarded. When the result P_{-1} = P_0 z^{-1} is used in (A.1a), we get

P_0 = (2Y_1/ΣY_i) P_0 z^{-2} + (2/ΣY_i) Σ_{i=2}^{M} Y_i P_i z^{-1} - P_0 z^{-2},  (A.3)

where the first term on the right-hand side can be interpreted as a way to implement the termination as a feedback through unit delays, as illustrated in Figure 3 for the left-hand port of the FDTD junction.

A.3. Signal behavior in a DWG network

Figure 8 illustrates a case where an arbitrarily large interconnected DWG network is reduced so that only two scattering junctions, connected through a unit delay line of wave admittance Y_2, are shown explicitly. A Norton equivalent source U_ext feeds junction node 1, with an equivalent termination admittance Y_1. Junction node 2 is terminated by a Norton equivalent admittance Y_3. We now derive the signal propagation from U_ext to the junction pressure P_1 and the transmission ratio between the pressures P_2 and P_1. If these transfer functions are equal for the DWG, the FDTD, and the mixed case with the KW-converter, the models are functionally equivalent for any topologies and parameter values equivalent between these cases. This is due to the superposition principle and the Norton theorem.

Figure 9: FDTD structure for derivation of the volume velocity source (U_ext) to junction pressure (P_J) transfer function.
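The termination result (A.2)-(A.3) can also be verified in the time domain: for runs shorter than the chain, a long admittance-matched FDTD chain and the delayed-feedback termination must produce identical junction pressures. The sketch below is illustrative (arbitrary admittances and input signal, not taken from the paper).

```python
import random

def terminated_chain(s, Y1, Y2, K):
    """Junction 0 (admittance Y1 toward the termination, Y2 toward an
    external neighbor pressure signal s) terminated by K matched FDTD
    sections, as in Figure 7. Returns the junction pressure p0."""
    p_now = [0.0] * (K + 1)   # index 0 = junction, 1..K = chain
    p_old = [0.0] * (K + 1)
    out = []
    for n in range(len(s)):
        p_new = [0.0] * (K + 1)
        sn1 = s[n - 1] if n >= 1 else 0.0
        p_new[0] = 2 * (Y1 * p_now[1] + Y2 * sn1) / (Y1 + Y2) - p_old[0]
        for k in range(1, K):
            p_new[k] = p_now[k - 1] + p_now[k + 1] - p_old[k]
        p_new[K] = p_now[K - 1] - p_old[K]
        out.append(p_new[0])
        p_old, p_now = p_now, p_new
    return out

def terminated_feedback(s, Y1, Y2):
    """Same junction with the chain replaced by feedback of its own
    pressure through two unit delays, following (A.3)."""
    p1 = p2 = 0.0             # p0(n-1) and p0(n-2)
    out = []
    for n in range(len(s)):
        sn1 = s[n - 1] if n >= 1 else 0.0
        p0 = 2 * (Y1 * p2 + Y2 * sn1) / (Y1 + Y2) - p2
        out.append(p0)
        p2, p1 = p1, p0
    return out

random.seed(0)
s = [random.uniform(-1, 1) for _ in range(30)]
a = terminated_chain(s, Y1=0.8, Y2=1.3, K=40)  # chain longer than the run
b = terminated_feedback(s, Y1=0.8, Y2=1.3)
assert max(abs(x - y) for x, y in zip(a, b)) < 1e-12
```

The chain length K only needs to exceed half the round-trip time of the run, since no reflection from the far end can return within the simulated interval.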

Figure 10: FDTD structure for derivation of the signal relation between two junction pressures.

Figure 11: Mixed modeling structure for derivation of the DWG to FDTD pressure relation.

From Figure 8, we can write directly for the propagation of the equivalent source U_ext to the junction pressure P_1:

P_1 = U_ext / (Y_1 + Y_2).  (A.4)

The signal transmission ratio between P_2 and P_1 can be derived from the following set of equations (A.5a), (A.5b), and (A.5c):

P_2 = (2Y_2/(Y_2 + Y_3)) P_1^- z^{-1},  (A.5a)
P_1^- = P_1 - P_2^- z^{-1},  (A.5b)
P_2^- = P_2 - P_1^- z^{-1}.  (A.5c)

By eliminating the wave variables P_1^- and P_2^-,

P_1^- = (P_1 - P_2 z^{-1}) / (1 - z^{-2}),
P_2^- = (P_2 - P_1 z^{-1}) / (1 - z^{-2}),
P_2 = (2Y_2/(Y_2 + Y_3)) z^{-1} (P_1 - P_2 z^{-1}) / (1 - z^{-2}),  (A.6)

and by solving for P_2/P_1, we get

P_2/P_1 = 2Y_2 z^{-1} / (Y_2 + Y_3 + (Y_2 - Y_3) z^{-2}).  (A.7)

In the special case of admittance match, Y_2 = Y_3, we get P_2/P_1 = z^{-1}. Forms (A.4) and (A.7) are now the reference for proving equivalence with the FDTD and mixed modeling cases.

A.4. Signal behavior in an FDTD network

Using the notations of Figure 9, which shows a Norton equivalent for an FDTD network, we can write

P_J = U_ext (1 - z^{-2})/(Y_1 + Y_2) + (2Y_1/(Y_1 + Y_2)) P_J z^{-2} + (2Y_2/(Y_1 + Y_2)) P_J z^{-2} - P_J z^{-2},  (A.8)

which after simplification yields

P_J = U_ext / (Y_1 + Y_2),  (A.9)

which is equivalent to the DWG form (A.4). Notice that the form (1 - z^{-2}) in feeding U_ext to the node has zeros on the unit circle at angles nπ (n integer), compensating poles inherent in the FDTD backbone structure. This degrades the numerical robustness of the structure around these frequencies.

For the structure of two FDTD nodes in Figure 10, we can write the equation

P_2 = -P_2 z^{-2} + (2Y_3/(Y_2 + Y_3)) P_2 z^{-2} + (2Y_2/(Y_2 + Y_3)) P_1 z^{-1},  (A.10)

which simplifies to

P_2/P_1 = 2Y_2 z^{-1} / (Y_2 + Y_3 + (Y_2 - Y_3) z^{-2}),  (A.11)

which is equivalent to the DWG form (A.7). This completes the proof of the equivalence of the DWG and FDTD structures.

A.5. Signal behavior in a mixed modeling structure

To prove the equivalence of signal behavior also in the mixed modeling structure of Figure 5 with a KW-converter, we have to analyze the junction signal relations in both directions. We first prove the equivalence in the FDTD to DWG direction. According to Figure 5, we can write

P_2 = (2Y_2/(Y_2 + Y_3)) P_1 z^{-1} - (2Y_2/(Y_2 + Y_3)) P_2^- z^{-2},
P_2^- = P_2 - (P_1 z^{-1} - P_2^- z^{-2}).  (A.12)

Eliminating P_2^- and solving for P_2/P_1 again yields the form (A.7), proving the equivalence.

According to Figure 11, we can analyze the signal relationship in the DWG to FDTD direction by writing

P_2 = (2Y_3/(Y_2 + Y_3)) P_2 z^{-2} - P_2 z^{-2} + (2Y_2/(Y_2 + Y_3)) (P_1^- - P_1^- z^{-2} + P_2 z^{-1}) z^{-1},
P_1^- = P_1 - (P_2 z^{-1} - P_1^- z^{-2}).  (A.13)

By eliminating P_1^- and solving for P_2/P_1, we again get the form (A.7). This concludes the proof of the equivalence of the mixed modeling case to the corresponding DWG and thus also to the FDTD structures.

ACKNOWLEDGMENTS

This work is part of the Algorithms for the Modelling of Acoustic Interactions (ALMA) project and has been supported by the Academy of Finland as a part of the project Technology for Audio and Speech Processing (SA 53537).

REFERENCES

[1] J. L. Kelly and C. C. Lochbaum, "Speech synthesis," in Proc. 4th International Congress on Acoustics, pp. 1-4, Copenhagen, Denmark, September 1962.
[2] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 2nd edition, 1998.
[3] J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, NY, USA, 1976.
[4] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992.
[5] J. O. Smith, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K.
Brandenburg, Eds., Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[6] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17-32, 1998.
[7] S. A. Van Duyne and J. O. Smith, "Physical modeling with the 2-D digital waveguide mesh," in Proc. International Computer Music Conference, pp. 40-47, Tokyo, Japan, September 1993.
[8] L. Savioja, T. J. Rinne, and T. Takala, "Simulation of room acoustics with a 3-D finite difference mesh," in Proc. International Computer Music Conference, Aarhus, Denmark, September 1994.
[9] L. Savioja, Modeling techniques for virtual acoustics, Ph.D. thesis, Helsinki University of Technology, Espoo, Finland, 1999.
[10] S. D. Bilbao, Wave and scattering methods for the numerical integration of partial differential equations, Ph.D. thesis, Stanford University, Stanford, Calif, USA, May 2001.
[11] J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks/Cole, Pacific Grove, Calif, USA, 1989.
[12] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 1," Journal of the Audio Engineering Society, vol. 19, no. 6, 1971.
[13] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 2," Journal of the Audio Engineering Society, vol. 19, no. 7, 1971.
[14] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, 1992.
[15] M. Karjalainen, "1-D digital waveguide modeling for improved sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, Orlando, Fla, USA, May 2002.
[16] C. Erkut and M. Karjalainen, "Virtual strings based on a 1-D FDTD waveguide model: Stability, losses, and traveling waves," in Proc.
Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, June 2002.
[17] C. Erkut and M. Karjalainen, "Finite difference method vs. digital waveguide method in string instrument modeling and synthesis," in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[18] M. Karjalainen, C. Erkut, and L. Savioja, "Compilation of unified physical models for efficient sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, Hong Kong, China, April 2003.
[19] A. Fettweis, "Wave digital filters: Theory and practice," Proc. IEEE, vol. 74, no. 2, pp. 270-327, 1986.
[20] M. Karjalainen, "BlockCompiler: Efficient simulation of acoustic and audio systems," in Proc. 114th Audio Engineering Society Convention, Amsterdam, The Netherlands, March 2003, preprint.
[21] M. Karjalainen, "Time-domain physical modeling and real-time synthesis using mixed modeling paradigms," in Proc. Stockholm Music Acoustics Conference, vol. 1, Stockholm, Sweden, August 2003.
[22] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine, "Splitting the unit delay - tools for fractional delay filter design," IEEE Signal Processing Magazine, vol. 13, no. 1, pp. 30-60, 1996.
[23] J. R. Pierce and S. A. Van Duyne, "A passive nonlinear digital filter design which facilitates physics-based sound synthesis of highly nonlinear musical instruments," Journal of the Acoustical Society of America, vol. 101, no. 2, 1997.
[24] R. Msallam, S. Dequidt, S. Tassart, and R. Caussé, "Physical model of the trombone including nonlinear propagation ef-

fects," in Proc. International Symposium on Musical Acoustics, vol. 2, Edinburgh, Scotland, UK, August 1997.
[25] T. Ilmonen, "Mustajuuri - an application and toolkit for interactive audio processing," in Proc. International Conference on Auditory Display, Espoo, Finland, July 2001.
[26] M. Puckette, "Pure data," in Proc. International Computer Music Conference, pp. 224-227, Thessaloniki, Greece, September 1997.
[27] J. E. Brittain, "Thevenin's theorem," IEEE Spectrum, vol. 27, no. 3, p. 42, 1990.

Matti Karjalainen was born in Hankasalmi, Finland. He received the M.S. and Dr.Tech. degrees in electrical engineering from the Tampere University of Technology in 1970 and 1978, respectively. Since 1980, he has been a professor of acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology, his interests lie in audio signal processing, such as digital signal processing (DSP) for sound reproduction, perceptually based signal processing, as well as music DSP and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition, perceptual auditory modeling and spatial hearing, DSP hardware, software, and programming environments, as well as various branches of acoustics, including musical acoustics and the modeling of musical instruments. He has written more than 300 scientific and engineering articles and has contributed to organizing several conferences and workshops. Prof. Karjalainen is an Audio Engineering Society (AES) fellow and a member of the Institute of Electrical and Electronics Engineers (IEEE), the Acoustical Society of America (ASA), the European Acoustics Association (EAA), the International Computer Music Association (ICMA), the European Speech Communication Association (ESCA), and several Finnish scientific and engineering societies.

Cumhur Erkut was born in Istanbul, Turkey. He received his B.S. and M.S.
degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Dr.Tech. degree in electrical engineering from the Helsinki University of Technology (HUT), Espoo, Finland, in 2002. Between 1998 and 2002, he worked as a researcher at the Laboratory of Acoustics and Audio Signal Processing of HUT. He is currently a postdoctoral researcher at the same institution, where he contributes to the EU-funded research project Algorithms for the Modelling of Acoustic Interactions (ALMA). His primary research interests are model-based sound synthesis and musical acoustics.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

A Digital Synthesis Model of Double-Reed Wind Instruments

Ph. Guillemain
Centre National de la Recherche Scientifique, Laboratoire de Mécanique et d'Acoustique, 31 chemin Joseph-Aiguier, 13402 Marseille cedex 20, France
guillem@lma.cnrs-mrs.fr

Received 3 June 2003; Revised 9 November 2003

We present a real-time synthesis model for double-reed wind instruments based on a nonlinear physical model. One specificity of double-reed instruments, namely, the presence of a confined air jet in the embouchure, for which a physical model has been proposed recently, is included in the synthesis model. The synthesis procedure involves the use of the physical variables via a digital scheme giving the impedance relationship between pressure and flow in the time domain. Comparisons are made between the behavior of the model with and without the confined air jet in the case of a simple cylindrical bore and that of a more realistic bore, the geometry of which is an approximation of an oboe bore.

Keywords and phrases: double reed, synthesis, impedance.

1. INTRODUCTION

The simulation of woodwind instrument sounds has been investigated for many years, since the pioneering studies by Schumacher [1] on the clarinet, which did not focus on digital sound synthesis. Real-time-oriented techniques, such as the famous digital waveguide method (see, e.g., Smith [2] and Välimäki [3]) and wave digital models [4], have been introduced in order to obtain efficient digital descriptions of resonators in terms of incoming and outgoing waves, and have been used to simulate various wind instruments.

The resonator of a clarinet can be considered approximately cylindrical as a first approximation, and its embouchure is large enough to be compatible with simple airflow models. In double-reed instruments, such as the oboe, the resonator is not cylindrical but conical, and the size of the air jet is comparable to that of the embouchure.
In this case, the dissipation of the air jet is no longer free, and the jet remains confined in the embouchure, giving rise to additional aerodynamic losses.

Here, we describe a real-time digital synthesis model for double-reed instruments based, on the one hand, on a recent study by Vergez et al. [5], in which the formation of the confined air jet in the embouchure is taken into account, and, on the other hand, on an extension of the method presented in [6] for synthesizing the clarinet. This method avoids the need for incoming and outgoing wave decompositions, since it deals only with the relationship between the impedance variables, which makes it easy to transpose the physical model to a synthesis model.

The physical model is first summarized in Section 2. In order to obtain the synthesis model, a suitable form of the flow model is then proposed, a dimensionless version is written, and the similarities with single-reed models (see, e.g., [7]) are pointed out. The resonator model is obtained by associating several elementary impedances and is described in terms of the acoustic pressure and flow. Section 3 presents the digital synthesis model, which first requires discrete-time equivalents of the reed displacement and the impedance relations. The explicit scheme solving the nonlinear model, which is similar to that proposed in [6], is then briefly summarized. In Section 4, the synthesis model is used to investigate the effects of the changes in the nonlinear characteristics induced by the confined air jet.

2. PHYSICAL MODEL

The main physical components of the nonlinear synthesis model are as follows.

(i) The linear oscillator modeling the first mode of reed vibration.
(ii) The nonlinear characteristics relating the flow to the pressure and to the reed displacement at the mouthpiece.
(iii) The impedance equation linking pressure and flow.

Figure 1 shows a highly simplified embouchure model for an oboe and the corresponding physical variables described in Sections 2.1 and 2.2.

Figure 1: Embouchure model and physical variables.

2.1. Reed model

Although this paper focuses on the simulation of double-reed instruments, oboe experiments have shown that the displacements of the two reeds are symmetrical [5, 8]. In this case, a classical single-mode model seems to suffice to describe the variations in the reed opening. The opening is based on the relative displacement y(t) of the two reeds when a difference in acoustic pressure occurs between the mouth pressure p_m and the acoustic pressure p_j(t) of the air jet formed in the reed channel. If we denote the resonance frequency, damping coefficient, and mass of the reeds by ω_r, q_r, and µ_r, respectively, the relative displacement satisfies the equation

d²y(t)/dt² + ω_r q_r dy(t)/dt + ω_r² y(t) = -(p_m - p_j(t))/µ_r.  (1)

Based on the reed displacement, the opening of the reed channel, denoted S_i(t), is expressed by

S_i(t) = Θ(y(t) + H) w (y(t) + H),  (2)

where w denotes the width of the reed channel, H denotes the distance between the two reeds at rest (y(t) = 0 and p_m = 0), and Θ is the Heaviside function, whose role is to keep the opening of the reeds nonnegative by canceling it when y(t) + H < 0.

2.2. Nonlinear characteristics

2.2.1. Physical bases

In the case of the clarinet or saxophone, it is generally recognized that the acoustic pressure p_r(t) and volume velocity v_r(t) at the entrance of the resonator are equal to the pressure p_j(t) and volume velocity v_j(t) of the air jet in the reed channel (see, e.g., [9]). In oboe-like instruments, the smallness of the reed channel leads to the formation of a confined air jet.
According to a recent hypothesis [5], p_r(t) is in this case no longer equal to p_j(t); the two quantities are related as follows:

p_j(t) = p_r(t) + (1/2) ρ Ψ q(t)²/S_ra²,  (3)

where Ψ is taken to be a constant related to the ratio between the cross section of the jet and the cross section at the entrance of the resonator, q(t) is the volume flow, and ρ is the mean air density. In what follows, we assume that the area S_ra, corresponding to the cross section of the reed channel at the point where the flow is spread over the whole cross section, is equal to the area S_r at the entrance of the resonator.

The relationship between the mouth pressure p_m, the pressure of the air jet p_j(t), the velocity of the air jet v_j(t), and the volume flow q(t), classically used when dealing with single-reed instruments, is based on the stationary Bernoulli equation rather than on the Backus model (see, e.g., [10] for justification and comparisons with measurements). This relationship, which is still valid here, is

p_m = p_j(t) + (1/2) ρ v_j(t)²,
q(t) = S_j(t) v_j(t) = α S_i(t) v_j(t),  (4)

where α, which is assumed to be constant, is the ratio between the cross section of the air jet S_j(t) and the reed opening S_i(t). It should be mentioned that the aim of this paper is to propose a digital sound synthesis model that takes the dissipation of the air jet in the reed channel into account. For a detailed physical description of this phenomenon, readers can consult [5], from which the notation used here was borrowed.

2.2.2. Flow model

In the framework of the digital synthesis model on which this paper focuses, it is necessary to express the volume flow q(t) as a function of the difference between the mouth pressure p_m and the pressure at the entrance of the resonator p_r(t). From (4), we obtain

v_j(t) = sqrt((2/ρ)(p_m - p_j(t))),  (5)
q²(t) = α² S_i²(t) v_j²(t).
(6)

Substituting the value of p_j(t) given by (3) into (5) gives

v_j²(t) = (2/ρ)(p_m - p_r(t)) - Ψ q(t)²/S_r².  (7)

Using (6), this gives

q²(t) = α² S_i²(t) ((2/ρ)(p_m - p_r(t)) - Ψ q(t)²/S_r²),  (8)

from which we obtain the expression for the volume flow, namely, the nonlinear characteristics

q(t) = sign(p_m - p_r(t)) (α S_i(t) / sqrt(1 + Ψ α² S_i²(t)/S_r²)) sqrt((2/ρ)|p_m - p_r(t)|).  (9)

2.3. Dimensionless model

The reed displacement and the nonlinear characteristics are converted into the dimensionless equations used in the synthesis model. For this purpose, we first take the reed displacement equation and replace the air jet pressure p_j(t) by the

expression involving the variables q(t) and p_r(t) (equation (3)):

d²y(t)/dt² + ω_r q_r dy(t)/dt + ω_r² y(t) = -(p_m - p_r(t))/µ_r + ρ Ψ q(t)²/(2 µ_r S_r²).  (10)

Along similar lines to what has been done in the case of single-reed instruments [11], y(t) is normalized with respect to the static beating-reed pressure p_M defined by p_M = H ω_r² µ_r. We denote by γ the ratio γ = p_m/p_M and replace y(t) by x(t), where the dimensionless reed displacement is defined by x(t) = y(t)/H + γ. With these notations, (10) becomes

(1/ω_r²) d²x(t)/dt² + (q_r/ω_r) dx(t)/dt + x(t) = p_r(t)/p_M + ρ Ψ q(t)²/(2 p_M S_r²),  (11)

and the reed opening is expressed by

S_i(t) = Θ(1 - γ + x(t)) w H (1 - γ + x(t)).  (12)

Likewise, we use the dimensionless acoustic pressure p_e(t) and the dimensionless acoustic flow u_e(t) defined by

p_e(t) = p_r(t)/p_M,   u_e(t) = (ρc/S_r) q(t)/p_M,  (13)

where c is the speed of sound. With these notations, the reed displacement and the nonlinear characteristics are finally rewritten as follows:

(1/ω_r²) d²x(t)/dt² + (q_r/ω_r) dx(t)/dt + x(t) = p_e(t) + Ψ β_u u_e(t)²,  (14)

and, using (9) and (12),

u_e(t) = Θ(1 - γ + x(t)) sign(γ - p_e(t)) ζ (1 - γ + x(t)) sqrt(|γ - p_e(t)|) / sqrt(1 + Ψ β_x (1 - γ + x(t))²)
       = F(x(t), p_e(t)),  (15)

where ζ, β_x, and β_u are defined by

ζ = (c α w H/S_r) sqrt(2ρ/p_M),   β_x = α² w² H²/S_r²,   β_u = H ω_r² µ_r/(2 ρ c²).  (16)

This dimensionless model is comparable to the model described, for example, in [7, 9] in the case of single-reed instruments, where the dimensionless acoustic pressure p_e(t), the dimensionless acoustic flow u_e(t), and the dimensionless reed displacement x(t) are linked by the relations

(1/ω_r²) d²x(t)/dt² + (q_r/ω_r) dx(t)/dt + x(t) = p_e(t),
u_e(t) = Θ(1 - γ + x(t)) sign(γ - p_e(t)) ζ (1 - γ + x(t)) sqrt(|γ - p_e(t)|).  (17)

In addition to the parameter ζ, the two other parameters β_x and β_u depend on the height H of the reed channel at rest. Although, for the sake of clarity in the notations, the variable t has been omitted, γ, ζ, β_x, and β_u are functions of time (but slowly varying compared to the other variables).
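The nonlinear characteristics (15) are straightforward to implement. The sketch below uses illustrative parameter values (the numbers for γ, ζ, Ψ, and β_x are assumptions, not taken from the paper) and checks two qualitative properties: zero flow when the reed channel is closed, and a reduction of the flow relative to the single-reed case (17) when the confined-jet term Ψ is switched on.

```python
import math

def flow(x, p_e, gamma, zeta, Psi, beta_x):
    """Dimensionless nonlinear characteristics u_e = F(x, p_e) of (15).
    All parameter values passed below are illustrative only."""
    opening = 1.0 - gamma + x
    if opening <= 0.0:               # Heaviside factor: reeds closed
        return 0.0
    dp = gamma - p_e
    sign = 1.0 if dp >= 0.0 else -1.0
    return (sign * zeta * opening * math.sqrt(abs(dp))
            / math.sqrt(1.0 + Psi * beta_x * opening ** 2))

# Closed reed channel gives zero flow; otherwise the flow follows the
# sign of the pressure difference, and Psi = 0 recovers the single-reed
# characteristics of (17).
assert flow(-1.2, 0.0, gamma=0.4, zeta=0.35, Psi=1.0, beta_x=0.8) == 0.0
u = flow(0.0, 0.1, gamma=0.4, zeta=0.35, Psi=1.0, beta_x=0.8)
u0 = flow(0.0, 0.1, gamma=0.4, zeta=0.35, Psi=0.0, beta_x=0.8)
assert 0.0 < u < u0    # the confined jet adds losses, reducing the flow
```

This mirrors the qualitative effect discussed in the paper: the denominator in (15), absent in (17), lowers the flow for a given pressure difference.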
Taking the difference between the jet pressure and the resonator pressure into account results in a flow which is no longer proportional to the reed displacement, and a reed displacement which is no longer linked to p_e(t) by an ordinary linear differential equation.

2.4. Resonator model

We now consider the simplified resonator of an oboe-like instrument. It is described as a truncated, divergent, linear conical bore connected to a mouthpiece including the backbore, to which the reeds are attached, and an additional bore, the volume of which corresponds to the volume of the missing part of the cone. This model is identical to that summarized in [12].

2.4.1. Cylindrical bore

The dimensionless input impedance of a cylindrical bore is expressed first. By assuming that the radius of the bore is large in comparison with the boundary-layer thicknesses, the classical Kirchhoff theory leads to the value of the complex wavenumber for a plane wave,

k(ω) = ω/c - (i^{3/2}/2) η c √ω,

where η is a constant depending on the radius R of the bore: η = (2/(R c^{3/2}))(√l_v + (C_p/C_v - 1)√l_t). Typical values of the physical constants, in mks units, are l_v = 4·10^{-8}, l_t = 5.6·10^{-8}, and C_p/C_v = 1.4 (see, e.g., [13]). The transfer function of a cylindrical bore of infinite length between x = 0 and x = L, which constitutes the propagation filter associated with the Green formulation, including the propagation delay, dispersion, and dissipation, is then given by F(ω) = exp(-ik(ω)L). Assuming that the radiation losses are negligible, the dimensionless input impedance of the cylindrical bore is classically expressed by

C(ω) = i tan(k(ω)L).  (18)

In this equation, C(ω) is the ratio between the Fourier transforms P_e(ω) and U_e(ω) of the dimensionless variables p_e(t) and u_e(t) defined by (13). The input admittance of the cylindrical bore is denoted by C^{-1}(ω).
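A quick numerical sanity check of (18) can be done with an illustrative loss constant (the values of c, L, η, and the exact loss term below are assumptions, not the paper's): the closed form i·tan(k(ω)L) agrees with the round-trip form (1 - e^{-2ik(ω)L})/(1 + e^{-2ik(ω)L}), the standard tangent identity that underlies delay-line implementations of the impedance.

```python
import cmath

c = 340.0      # speed of sound in m/s (illustrative)
L = 0.5        # bore length in m (illustrative)
eta = 2e-5     # visco-thermal loss constant (illustrative magnitude)

def k_lossy(w):
    # Complex wavenumber with a small loss/dispersion term; the exact
    # constant does not matter for the identity checked below.
    return w / c - 0.5 * (1j ** 1.5) * eta * c * cmath.sqrt(w)

def C_tan(w):
    # Closed form (18): C(w) = i tan(k(w) L).
    return 1j * cmath.tan(k_lossy(w) * L)

def C_loop(w):
    # Round-trip form: (1 - e^{-2ikL}) / (1 + e^{-2ikL}).
    F2 = cmath.exp(-2j * k_lossy(w) * L)
    return (1 - F2) / (1 + F2)

# The two formulations agree at any frequency where tan is finite.
for f in (50.0, 220.0, 997.0):
    w = 2 * cmath.pi * f
    assert abs(C_tan(w) - C_loop(w)) < 1e-9
```

With losses included, the denominator 1 + e^{-2ikL} never vanishes exactly, so the impedance peaks stay finite, consistent with the discussion of near-harmonic peaks that follows.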
A different formulation of the impedance relation of a cylindrical bore, which is compatible with a time-domain implementation and was proposed in [6], is used and extended here. It consists in rewriting (18) as

C(ω) = 1/(1 + exp(-2ik(ω)L)) - exp(-2ik(ω)L)/(1 + exp(-2ik(ω)L)).  (19)

Figure 2 shows the interpretation of (19) in terms of looped propagation filters. The transfer function of this model corresponds directly to the dimensionless input impedance of a cylindrical bore. It is the sum of two parts. The upper part corresponds to the first term of (19) and the

Figure 2: Impedance model of a cylindrical bore.

Figure 3: Impedance model of a conical bore.

lower part corresponds to the second term. The filter having the transfer function F(ω)² = exp(-2ik(ω)L) stands for the back-and-forth path of the dimensionless pressure waves, with a sign change at the open end of the bore. Although k(ω) includes both dissipation and dispersion, the dispersion is small (e.g., in the case of a cylindrical bore with a radius of 7 mm), and the peaks of the input impedance of a cylindrical bore can be said to be nearly harmonic. In particular, this intrinsic dispersion can be neglected, unlike the dispersion introduced by the geometry of the bore (e.g., the input impedance of a truncated conical bore cannot be assumed to be harmonic).

2.4.2. Conical bore

From the input impedance of the cylindrical bore, the dimensionless input impedance of the truncated, divergent, conical bore can be expressed as a parallel combination of a cylindrical bore and an air bore:

S_2(ω) = 1/(1/(iωx_e/c) + 1/C(ω)),  (20)

where x_e is the distance between the apex and the input. It is expressed in terms of the angle θ of the cone and the input radius R as x_e = R/sin(θ/2). The parameter η involved in the definition of C(ω) in (20), which depends on the radius and characterizes the losses included in k(ω), is calculated by considering the radius of the cone at (5/12)L. This value was determined empirically, by comparing the impedance given by (20) with the input impedance of the same conical bore obtained with a series of elementary cylinders with different diameters (a stepped cone), using the transmission line theory.

Denoting by D the differentiation operator D(ω) = iω and rewriting (20) in the form S_2(ω) = D(ω)(x_e/c)/(1 + D(ω)(x_e/c)C^{-1}(ω)), we propose the equivalent scheme in Figure 3.

2.4.3. Oboe-like bore

The complete bore is a conical bore combined with a mouthpiece. The mouthpiece consists of a combination of two bores:

(i) a short cylindrical bore with length L_1, radius R_1, surface S_1, and characteristic impedance Z_1. This is the backbore to which the reeds are attached. Its radius is small in comparison with that of the main conical bore, whose characteristic impedance is denoted Z_c = ρc/S_r;

(ii) an additional short cylindrical bore with length L_2, radius R_2, surface S_2, and characteristic impedance Z_2. Its radius is large in comparison with that of the backbore. Its role is to add a volume corresponding to the truncated part of the complete cone. This makes it possible to reduce the geometrical dispersion responsible for inharmonic impedance peaks in the backbore/conical bore combination.

The impedance C_1(ω) of the short cylindrical backbore is based on an approximation of i tan(k_1(ω)L_1) for small values of k_1(ω)L_1. It takes the dissipation into account and neglects the dispersion. Assuming that the radius R_1 is large in comparison with the boundary-layer thicknesses, and using (19), C_1(ω) is first approximated by

C_1(ω) ≈ (1 - exp(-2η_1 c √(ω/2) L_1) exp(-2iωL_1/c)) / (1 + exp(-2η_1 c √(ω/2) L_1) exp(-2iωL_1/c)),  (21)

which, since L_1 is small, is finally simplified as

C_1(ω) ≈ (1 - exp(-2η_1 c √(ω/2) L_1)(1 - 2iωL_1/c)) / (1 + exp(-2η_1 c √(ω/2) L_1)).  (22)

Writing G(ω) = (1 - exp(-2η_1 c √(ω/2) L_1))/(1 + exp(-2η_1 c √(ω/2) L_1)) and H(ω) = (L_1/c)(1 - G(ω)), the expression of C_1(ω) reads

C_1(ω) = G(ω) + iωH(ω).  (23)

This approximation avoids the need for a second delay line in the sampled formulation of the impedance.

The transmission line equation relates the acoustic pressure p_n and the flow u_n at the entrance of a cylindrical bore (with characteristic impedance Z_n, length L_n, and wavenumber k_n) to the acoustic pressure p_{n+1} and the flow u_{n+1} at its exit. With dimensioned variables, it reads

p_n(ω) = cos(k_n(ω)L_n) p_{n+1}(ω) + iZ_n sin(k_n(ω)L_n) u_{n+1}(ω),
u_n(ω) = (i/Z_n) sin(k_n(ω)L_n) p_{n+1}(ω) + cos(k_n(ω)L_n) u_{n+1}(ω),  (24)

yielding

p_n(ω)/u_n(ω) = (p_{n+1}(ω)/u_{n+1}(ω) + iZ_n tan(k_n(ω)L_n)) / (1 + (i/Z_n) tan(k_n(ω)L_n) p_{n+1}(ω)/u_{n+1}(ω)).  (25)
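The chaining rule (25) can be exercised directly. The sketch below (illustrative values, not the paper's implementation) implements the input impedance ratio of one cylindrical section and checks the expected consistency property: two sections with the same characteristic impedance compose into a single section of the summed phase length, by the tangent addition formula.

```python
import cmath

def chain_impedance(zeta_load, Zn, knLn):
    # Input impedance ratio p_n/u_n of one cylindrical section with
    # characteristic impedance Zn and phase k_n L_n, loaded by
    # p_{n+1}/u_{n+1} = zeta_load, following (25).
    t = cmath.tan(knLn)
    return (zeta_load + 1j * Zn * t) / (1 + 1j * t * zeta_load / Zn)

Z = 1.7                 # common characteristic impedance (illustrative)
load = 0.3 - 0.8j       # arbitrary complex load impedance ratio
one = chain_impedance(load, Z, 1.1)
two = chain_impedance(chain_impedance(load, Z, 0.6), Z, 0.5)
assert abs(one - two) < 1e-9
```

When the characteristic impedances of consecutive sections differ, as in the backbore/main-bore combination above, the composition no longer collapses to a single section, which is precisely what produces the mouthpiece's effect on the impedance peaks.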
Denoting by D the differentiation operator D(ω) = iω and rewriting (20) in the form S(ω) = D(ω)(x_e/c) / ( 1 + D(ω)(x_e/c) C⁻¹(ω) ), we propose the equivalent scheme in Figure 3.

2.4.3. Oboe-like bore

The complete bore is a conical bore combined with a mouthpiece. The mouthpiece consists of a combination of two bores:

(i) a short cylindrical bore with length L₁, radius R₁, surface S₁, and characteristic impedance Z₁. This is the backbore, to which the reeds are attached. Its radius is small in comparison with that of the main conical bore, the characteristic impedance of which is denoted Z = ρc/S_r;

(ii) an additional short cylindrical bore with length L₂, radius R₂, surface S₂, and characteristic impedance Z₂. Its radius is large in comparison with that of the backbore. Its role is to add a volume corresponding to the truncated part of the complete cone. This makes it possible to reduce the geometrical dispersion responsible for inharmonic impedance peaks in the combination backbore/conical bore.

The impedance C₁(ω) of the short cylindrical backbore is based on an approximation of i tan(k₁(ω)L₁) for small values of k₁(ω)L₁. It takes the dissipation into account and neglects the dispersion. Assuming that the radius R₁ is large in comparison with the boundary layer thickness, and using (19), C₁(ω) is first approximated by

C₁(ω) ≈ ( 1 − exp(−η₁c√ω/L₁) exp(−2iωL₁/c) ) / ( 1 + exp(−η₁c√ω/L₁) exp(−2iωL₁/c) ), (21)

which, since L₁ is small, is finally simplified as

C₁(ω) ≈ ( 1 − exp(−η₁c√ω/L₁)(1 − 2iωL₁/c) ) / ( 1 + exp(−η₁c√ω/L₁) ). (22)

By noting G(ω) = ( 1 − exp(−η₁c√ω/L₁) ) / ( 1 + exp(−η₁c√ω/L₁) ) and H(ω) = (L₁/c)(1 − G(ω)), the expression of C₁(ω) reads

C₁(ω) = G(ω) + iωH(ω). (23)

This approximation avoids the need for a second delay line in the sampled formulation of the impedance.
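The small-k₁L₁ approximation (23) can be checked against the exact i·tan(k₁L₁) form. Since the paper's η-based loss expression is garbled in this copy, the sketch below replaces it with a generic constant round-trip loss E = exp(−2αL₁); the G + iωH identity derived above holds for any such E.

```python
import cmath
import math

c = 340.0      # speed of sound (m/s), assumed
L1 = 0.02      # backbore length (m), illustrative value
alpha = 0.05   # loss coefficient (1/m), assumed placeholder for the eta-based law

def C1_exact(w):
    # dimensionless impedance of a short open cylinder: i*tan(k1*L1),
    # with a lossy wavenumber k1 = w/c - i*alpha (dispersion neglected)
    k1 = w / c - 1j * alpha
    return 1j * cmath.tan(k1 * L1)

def C1_approx(w):
    # small-k1*L1 approximation (23): C1 = G + i*w*H
    E = math.exp(-2 * alpha * L1)      # round-trip loss magnitude
    G = (1 - E) / (1 + E)
    H = (L1 / c) * (1 - G)
    return G + 1j * w * H
```

Up to about 1 kHz the relative error stays within a few percent for this geometry, which is why a single real constant G plus a frequency-proportional imaginary part suffices for the backbore.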
The transmission line equation relates the acoustic pressure p_n and the flow u_n at the entrance of a cylindrical bore (with characteristic impedance Z_n, length L_n, and wavenumber k_n) to the acoustic pressure p_{n+1} and the flow u_{n+1} at its exit. With dimensioned variables, it reads

p_n(ω) = cos(k_n(ω)L_n) p_{n+1}(ω) + iZ_n sin(k_n(ω)L_n) u_{n+1}(ω),
u_n(ω) = (i/Z_n) sin(k_n(ω)L_n) p_{n+1}(ω) + cos(k_n(ω)L_n) u_{n+1}(ω), (24)

yielding

p_n(ω)/u_n(ω) = ( p_{n+1}(ω)/u_{n+1}(ω) + iZ_n tan(k_n(ω)L_n) ) / ( 1 + (i/Z_n) tan(k_n(ω)L_n) p_{n+1}(ω)/u_{n+1}(ω) ). (25)
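A quick consistency check of (25): chaining two half-length cylinder segments must give the same input impedance as one full-length segment. The values below (frequency, length, loss) are illustrative assumptions.

```python
import cmath

Z = 1.0                                # characteristic impedance (dimensionless here)
c = 340.0
k = 2 * cmath.pi * 700 / c - 0.02j     # lossy wavenumber at 700 Hz (illustrative)
L = 0.5

def chain(z_load, kn, Ln, Zn):
    # input impedance p_n/u_n from the load impedance p_{n+1}/u_{n+1}, eq. (25)
    t = cmath.tan(kn * Ln)
    return (z_load + 1j * Zn * t) / (1 + (1j / Zn) * t * z_load)

# open far end: p/u = 0; one segment of length L gives i*Z*tan(k*L) ...
z_direct = chain(0.0, k, L, Z)
# ... and must equal two chained segments of length L/2
z_split = chain(chain(0.0, k, L / 2, Z), k, L / 2, Z)
```

This composition property is what makes the "stepped cone" reference computation mentioned above possible: any bore profile can be approximated by chaining short cylinders through (25).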

Figure 4: Impedance model of the simplified resonator.

Using the notations introduced in (20) and (23), the input impedance of the combination backbore/main conical bore reads

p₁(ω)/u₁(ω) = ( Z S(ω) + Z₁C₁(ω) ) / ( 1 + (Z/Z₁) S(ω) C₁(ω) ), (26)

which is simplified as p₁(ω)/u₁(ω) = Z S(ω) + Z₁C₁(ω), since Z₁ ≫ Z. In the same way, the input impedance of the whole bore reads

p(ω)/u(ω) = ( p₁(ω)/u₁(ω) + iZ₂ tan(k₂(ω)L₂) ) / ( 1 + (i/Z₂) tan(k₂(ω)L₂) p₁(ω)/u₁(ω) ), (27)

which, since Z₂ ≪ Z₁, is simplified as

p(ω)/u(ω) = ( p₁(ω)/u₁(ω) ) / ( 1 + (i/Z₂) tan(k₂(ω)L₂) p₁(ω)/u₁(ω) ). (28)

Since L₂ is small and the radius is large, the losses included in k₂(ω) can be neglected, and hence k₂(ω) = ω/c and tan(k₂(ω)L₂) ≈ (ω/c)L₂. Under these conditions, the input impedance of the bore is given by

p(ω)/u(ω) = 1 / ( 1/( p₁(ω)/u₁(ω) ) + i(ω/c)(L₂/Z₂) )
          = 1 / ( 1/( Z S(ω) + Z₁C₁(ω) ) + i(ω/c)( L₂S₂/(ρc) ) ). (29)

If we take V to denote the volume of the short additional bore, V = L₂S₂, and rewrite (29) with the dimensionless variables P_e and U_e (U_e = Z u_e), the dimensionless input impedance of the whole resonator relating the variables P_e(ω) and U_e(ω) becomes

Z_e(ω) = P_e(ω)/U_e(ω) = (1/Z) / ( iωV/(ρc²) + 1/( Z₁C₁(ω) + Z S(ω) ) ). (30)

After rearranging (30), we propose the equivalent scheme in Figure 4. It can be seen from (30) that the mouthpiece is equivalent to a Helmholtz resonator consisting of a hemispherical cavity with volume V and radius R_b, such that V = (4/6)πR_b³, connected to a short cylindrical bore with length L₁ and radius R₁.

Figure 5: Nonlinear synthesis model.
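Equations (19), (20), (23), and (30) can be chained numerically to evaluate the dimensionless input impedance of the whole resonator. The sketch below uses assumed geometry and a generic visco-thermal loss coefficient in place of the paper's η-based law (garbled in this copy); it only illustrates that resonance peaks emerge from the assembled model.

```python
import cmath
import math

# geometry (illustrative values; the paper's own values are garbled in this copy)
rho, c = 1.2, 340.0
R, theta, L = 0.002, math.radians(1.5), 0.45   # main cone: input radius, apex angle, length
R1, L1 = 0.0015, 0.02                          # backbore
Rb = 0.006                                     # hemispherical cavity radius
x_e = R / math.sin(theta / 2)                  # apex-to-input distance, eq. after (20)
V = (4.0 / 6.0) * math.pi * Rb**3              # cavity volume
Z = rho * c / (math.pi * R**2)                 # main bore characteristic impedance
Z1 = rho * c / (math.pi * R1**2)               # backbore characteristic impedance

def alpha(w, r):
    # assumed visco-thermal loss coefficient (1/m), stand-in for the eta-law
    return 3e-5 * math.sqrt(w / (2 * math.pi)) / r

def Ze(w):
    k = w / c - 1j * alpha(w, R)                     # lossy wavenumber, no dispersion
    e = cmath.exp(-2j * k * L)
    C = 1 - 2 * e / (1 + e)                          # cylinder, eq. (19)
    S = 1 / (1 / (1j * w * x_e / c) + 1 / C)         # cone, eq. (20)
    E = math.exp(-2 * alpha(w, R1) * L1)             # backbore round-trip loss
    G = (1 - E) / (1 + E)
    C1 = G + 1j * w * (L1 / c) * (1 - G)             # backbore, eq. (23)
    return (1 / Z) / (1j * w * V / (rho * c**2) + 1 / (Z1 * C1 + Z * S))  # eq. (30)

mags = [abs(Ze(2 * math.pi * f)) for f in range(50, 4000, 10)]
```

The magnitude curve shows impedance peaks well above its low-frequency value, the qualitative behavior that the synthesis model relies on.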
2.5. Summary of the physical model

The complete dimensionless physical model consists of three equations:

(1/ω_r²) d²x(t)/dt² + (q_r/ω_r) dx(t)/dt + x(t) = p_e(t) − Ψβ_u u_e(t), (31)

u_e(t) = Θ(1 − γ + x(t)) [ ζ(1 − γ + x(t)) / ( 1 + Ψβ_x(1 − γ + x(t))² ) ] sign(γ − p_e(t)) √|γ − p_e(t)|, (32)

P_e(ω) = Z_e(ω) U_e(ω). (33)

These equations enable us to introduce the reed and the nonlinear characteristics in the form of two nonlinear loops, as shown in Figure 5. The first loop relates the output p_e to the input u_e of the resonator, as in models of single-reed instruments. The second nonlinear loop corresponds to the u_e-dependent changes in x. The output of the model is given by the three coupled variables p_e, u_e, and x. The control parameters of the model are the length L of the main conical bore and the parameters H(t) and p_m(t), from which ζ(t), β_x(t), β_u(t), and γ(t) are calculated.

In the context of sound synthesis, it is necessary to calculate the external pressure. Here we consider only the propagation within the main cylindrical part of the bore. Assuming again that the radiation impedance can be neglected, the external pressure corresponds to the time derivative of the flow at the exit of the resonator, p_ext(t) = du_s(t)/dt. Using transmission line theory, one directly obtains

U_s(ω) = exp(−ik(ω)L) ( P_e(ω) + U_e(ω) ). (34)

From the perceptual point of view, the quantity exp(−ik(ω)L) can be left aside, since it stands for the losses corresponding to a single travel between the embouchure and the open end. This simplification leads to the following expression for the external pressure:

p_ext(t) = d/dt ( p_e(t) + u_e(t) ). (35)

3. DISCRETE-TIME MODEL

In order to draw up the synthesis model, it is necessary to use a discrete formulation in the time domain for the reed displacement and the impedance models. The discretization schemes used here are similar to those described in [6] for the clarinet, and summarized in [12] for brass instruments and saxophones.

3.1. Reed displacement

We take e(t) to denote the excitation of the reed, e(t) = p_e(t) − Ψβ_u u_e(t). Using (31), the Fourier transform of the ratio X(ω)/E(ω) can be readily written as

X(ω)/E(ω) = ω_r² / ( ω_r² − ω² + iω q_r ω_r ). (36)

An inverse Fourier transform provides the impulse response h(t) of the reed model:

h(t) = ( ω_r / √(1 − q_r²/4) ) exp( −(1/2) q_r ω_r t ) sin( √(1 − q_r²/4) ω_r t ). (37)

Equation (37) shows that h(t) satisfies h(0) = 0. This property is most important in what follows. In addition, the range of variations allowed for q_r is ]0, 2[. The discrete-time version of the impulse response uses two centered numerical differentiation schemes, which provide unbiased estimates of the first and second derivatives when they are applied to sampled second-order polynomials:

iω ≃ (f_e/2)( z − z⁻¹ ), −ω² ≃ f_e²( z − 2 + z⁻¹ ), (38)

where z = exp(iω̃), ω̃ = ω/f_e, and f_e is the sampling frequency. With these approximations, the digital transfer function of the reed is given by

X(z)/E(z) = z⁻¹ / ( f_e²/ω_r² + f_e q_r/(2ω_r) + ( 1 − 2f_e²/ω_r² ) z⁻¹ + ( f_e²/ω_r² − f_e q_r/(2ω_r) ) z⁻² ), (39)

yielding a difference equation of the type

x(n) = b₁ₐ e(n−1) − a₁ₐ x(n−1) − a₂ₐ x(n−2). (40)

This difference equation keeps the property h(0) = 0. Figure 6 shows the frequency response of this approximated reed model (solid line) superimposed on the exact one (dotted line). This discrete reed model is stable under the condition ω_r < f_e √(4 − q_r²). Under this condition, the modulus of the poles of the transfer function is given by √( (2f_e − ω_r q_r) / (2f_e + ω_r q_r) ) and is always smaller than 1.
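The coefficients of (40) and the pole-modulus formula can be checked directly. The reed parameters below are illustrative assumptions, chosen to satisfy the stability condition at the CD sample rate.

```python
import cmath
import math

f_e = 44100.0                 # sampling frequency (Hz)
f_r, q_r = 2500.0, 0.2        # reed resonance and damping (illustrative values)
w_r = 2 * math.pi * f_r

def reed_coeffs(w_r, q_r, f_e):
    # coefficients of (40), obtained from (39) after normalizing by the z^0 term
    A = f_e**2 / w_r**2 + f_e * q_r / (2 * w_r)
    b1 = 1.0 / A
    a1 = (1 - 2 * f_e**2 / w_r**2) / A
    a2 = (f_e**2 / w_r**2 - f_e * q_r / (2 * w_r)) / A
    return b1, a1, a2

b1, a1, a2 = reed_coeffs(w_r, q_r, f_e)
# poles of z^2 + a1*z + a2; stability requires |pole| < 1
pole = (-a1 + cmath.sqrt(a1 * a1 - 4 * a2)) / 2
closed_form = math.sqrt((2 * f_e - w_r * q_r) / (2 * f_e + w_r * q_r))
```

Because the numerator of (39) is a pure z⁻¹, the first output sample in response to an impulse is zero, preserving h(0) = 0, which is what later permits the explicit resolution of the coupled system.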
Figure 6: Approximated (solid line) and exact (dotted line) reed frequency response, with parameter values f_r = 5 Hz, q_r = ., and f_e = 44.1 kHz.

This stability condition makes this discretization scheme unsuitable for use at low sampling rates, but in practice, at the CD-quality sample rate, this problem does not arise for a reed resonance frequency of up to 5 kHz with a quality factor of up to 0.5. For a more detailed discussion of discretization schemes, readers can consult, for example, [14]. The bilinear transformation does not provide a suitable discretization scheme for the reed displacement: in this case, the impulse response does not satisfy the property h(0) = 0 of the continuous model.

3.2. Impedance

A time-domain equivalent of the inverse Fourier transform of the impedance Z_e(ω) given by (30) is now required. Here we express p_e(n) as a function of u_e(n). The losses in the cylindrical bore element contributing to the impedance of the whole bore are modeled with a digital low-pass filter. This filter approximates the back-and-forth losses described by F(ω) = −exp(−2ik(ω)L) and neglects the (small) dispersion. So that they can be adjusted to the geometry of the resonator, the coefficients of the filter are expressed analytically as functions of the physical parameters, rather than obtained by numerical approximations and minimizations. For this purpose, a one-pole filter is used,

F(ω̃) = −b₀ exp(−iω̃D) / ( 1 − a₁ exp(−iω̃) ), (41)

where ω̃ = ω/f_e, and D = 2f_e(L/c) is the pure delay corresponding to a back-and-forth path of the waves. The parameters b₀ and a₁ are calculated so that F(ω) = F(ω̃) for two given values of ω, and are solutions of the system

|F(ω̃₁)| √( 1 + a₁² − 2a₁ cos(ω̃₁) ) = b₀,
|F(ω̃₂)| √( 1 + a₁² − 2a₁ cos(ω̃₂) ) = b₀, (42)
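The two conditions (42) determine a₁ and b₀ in closed form: equating the two expressions for b₀ yields a quadratic in a₁ whose roots are reciprocal, so exactly one lies inside the unit circle. A sketch follows, with the target round-trip magnitudes |F(ω̃₁,₂)| taken from a generic visco-thermal loss law (an assumption here, since the paper's η-based expression is garbled in this copy):

```python
import math

f_e, c, L, R = 44100.0, 340.0, 0.5, 0.007   # illustrative sampling rate and geometry

def loss_mag(f):
    # assumed round-trip loss exp(-2*alpha*L), alpha ~ 3e-5*sqrt(f)/R per meter
    return math.exp(-2 * L * 3e-5 * math.sqrt(f) / R)

def fit_one_pole(f1, f2):
    # solve (42): |F(w_i)| * sqrt(1 + a1^2 - 2*a1*cos(w_i)) = b0 for i = 1, 2
    w1, w2 = 2 * math.pi * f1 / f_e, 2 * math.pi * f2 / f_e  # normalized frequencies
    G1, G2 = loss_mag(f1), loss_mag(f2)
    P = G1**2 - G2**2
    Q = G1**2 * math.cos(w1) - G2**2 * math.cos(w2)
    a1 = (Q - math.sqrt(Q**2 - P**2)) / P     # reciprocal roots: this one is inside |z|=1 here
    b0 = G1 * math.sqrt(1 + a1**2 - 2 * a1 * math.cos(w1))
    return b0, a1

b0, a1 = fit_one_pole(340.0, 2000.0)
```

By construction, the fitted filter matches the prescribed magnitude exactly at both design frequencies; everywhere else the one-pole shape interpolates the √ω-type loss curve.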

where |F(ω̃₁,₂)| = exp(−ηc√(ω₁,₂)/L). The first value ω₁ is an analytical approximation of the frequency of the first impedance peak of the truncated conical bore (a closed-form function of c, L, and x_e), in order to ensure a suitable height of the impedance peak at the fundamental frequency. It is important to keep this feature to obtain a realistic digital simulation of the continuous dynamical system, since the linear impedance is associated with the nonlinear characteristics. This ensures that the decay time of the fundamental frequency of the approximated impulse response of the impedance matches the exact value, which is important in the case of fast changes in γ (e.g., attack transients). The second value ω₂ corresponds to the resonance frequency of the Helmholtz resonator, ω₂ = c√( S₁/(L₁V) ).

The phase of F(ω̃) has a nonlinear part, given by −arctan( a₁ sin(ω̃) / (1 − a₁ cos(ω̃)) ). This part differs from the nonlinear part of the phase of F(ω), which is given by −ηc√ω/L. Although these two quantities are different, and although the phase of F(ω̃) is determined by the choice of a₁, which is calculated from the modulus, it is worth noting that in both cases the dispersion is always very small, has a negative value, and is monotonic up to the frequency (f_e/2π) arccos(a₁). Consequently, in both cases, in the case of a cylindrical bore, up to this frequency, the distance between successive impedance peaks decreases as their rank increases: ω_{n+1} − ω_n < ω_n − ω_{n−1}.

Using (19) and (41), the impedance of the cylindrical bore unit C(ω) is then expressed by

C(z) = 1 − 2b₀ z^(−D) / ( 1 − a₁ z⁻¹ + b₀ z^(−D) ). (43)

Since L₁ is small, the frequency-dependent function G(ω) involved in the definition of the impedance of the short backbore C₁(ω) can be approximated by a constant, corresponding to its value at ω₂. The bilinear transformation is used to discretize D(ω) = iω: D(z) = 2f_e (z − 1)/(z + 1).
The combination of all these parts according to (30) yields the digital impedance of the whole bore in the form

Z_e(z) = ( Σ_{k=0}^{4} b_ck z^(−k) + Σ_{k=0}^{3} b_cdk z^(−D−k) ) / ( 1 + Σ_{k=1}^{4} a_ck z^(−k) + Σ_{k=0}^{3} a_cdk z^(−D−k) ), (44)

where the coefficients b_ck, a_ck, b_cdk, and a_cdk are expressed analytically as functions of the geometry of each part of the bore. This leads directly to the difference equation, which can be conveniently written in the form

p_e(n) = b_c0 u_e(n) + Ṽ, (45)

where Ṽ includes all the terms that do not depend on the current time sample n:

Ṽ = Σ_{k=1}^{4} b_ck u_e(n−k) + Σ_{k=0}^{3} b_cdk u_e(n−D−k) − Σ_{k=1}^{4} a_ck p_e(n−k) − Σ_{k=0}^{3} a_cdk p_e(n−D−k). (46)

Figure 7: (a) approximated (solid lines) and exact (dotted lines) input impedance; (b) approximated (solid lines) and exact (dotted lines) impulse response. Geometrical parameters L = 0.46 m, R = 0.16 m, θ = , L₁ = . m, R₁ = 0.15 m, and R_b = 0.6 m.

Figure 7 shows an oboe-like bore input impedance, both approximated (solid line) and exact (dotted line), together with the corresponding impulse responses.

3.3. Synthesis algorithm

The sampled expressions for the impulse responses of the reed displacement and the impedance models are now used to write the sampled equivalent of the system of (31), (32), and (33):

x(n) = b₁ₐ ( p_e(n−1) − Ψβ_u u_e(n−1) ) − a₁ₐ x(n−1) − a₂ₐ x(n−2), (47)

p_e(n) = b_c0 u_e(n) + Ṽ, (48)

u_e(n) = W sign( γ − p_e(n) ) √|γ − p_e(n)|, (49)

where W is

W = Θ( 1 − γ + x(n) ) ζ( 1 − γ + x(n) ) / ( 1 + Ψβ_x( 1 − γ + x(n) )² ). (50)

This system of equations is an implicit system, since u_e(n) has to be known in order to be able to compute p_e(n) with the impedance equation (48). Likewise, u_e(n) is obtained from the nonlinear equation (49) and requires p_e(n) to be known. Thanks to the specific reed discretization scheme presented in Section 3.1, calculating x(n) with (47) does not require p_e(n) and u_e(n) to be known. This makes it possible to solve this system explicitly, as shown in [6], thus doing away with the need for schemes such as the K-method [15]. Since W is always positive, considering the two cases γ − p_e(n) ≥ 0 and γ − p_e(n) < 0 successively, substituting the expression for p_e(n) from (48) into (49) eventually gives

u_e(n) = (1/2) sign( γ − Ṽ ) ( −b_c0 W² + W √( b_c0² W² + 4|γ − Ṽ| ) ). (51)

The acoustic pressure and flow in the mouthpiece at sampling time n are then finally obtained by the sequential calculation of Ṽ with (46), x(n) with (47), W with (50), u_e(n) with (51), and p_e(n) with (48). The external pressure p_ext(n) is calculated as the difference between the sum of the internal pressure and the flow at sampling times n and n−1.

4. SIMULATIONS

The effects of introducing the confined air jet into the nonlinear characteristics are now studied in the case of two different bore geometries. In particular, we consider a cylindrical resonator, whose impedance peaks are odd harmonics, and a resonator whose impedance contains all the harmonics. We start by checking numerically the validity of the resolution scheme in the case of the cylindrical bore. (Sound examples are available at guillemain/eurasip.html.)

4.1. Cylindrical resonator

We first consider a cylindrical resonator, and make the parameter Ψ vary linearly up to 4 during the sound synthesis procedure (1.5 seconds). The attack transient corresponds to an abrupt increase in γ at t = 0. During the decay phase, starting at t = 1.3 seconds, γ decreases linearly towards zero. Its steady-state value is γ = 0.56. The other parameters are constant: ζ = 0.35, β_x = , β_u = . The reed parameters are ω_r = π.315 rad/s and q_r = 0.5. The resonator parameters are R = 0.55 m and L = 0.46 m.
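The sequential update (46)–(51) can be sketched for the simplest case of a purely cylindrical bore, whose difference equation follows from C(z) in (43) (so that b_c0 = 1 and Ṽ has only four terms). All numerical values below are illustrative assumptions, not the paper's settings.

```python
import math

f_e = 44100
c, L = 340.0, 0.5
D = round(2 * f_e * L / c)          # round-trip delay in samples
b0_, a1_ = 0.95, 0.02               # loss filter (41); assumed values, b0/(1-a1) < 1

# reed coefficients of (47), cf. (39)-(40)
w_r, q_r = 2 * math.pi * 2500.0, 0.4
A = f_e**2 / w_r**2 + f_e * q_r / (2 * w_r)
b1a = 1 / A
a1a = (1 - 2 * f_e**2 / w_r**2) / A
a2a = (f_e**2 / w_r**2 - f_e * q_r / (2 * w_r)) / A

# control and nonlinearity parameters (illustrative)
gamma, zeta, Psi, beta_x, beta_u = 0.45, 0.35, 0.1, 0.01, 0.01
bc0 = 1.0                           # direct impedance coefficient for this bore

N = 2000
p = [0.0] * N; u = [0.0] * N; x = [0.0] * N; p_ext = [0.0] * N
for n in range(N):
    # (46) for the cylinder: negative indices read still-zero history
    V = -a1_ * u[n-1] - b0_ * u[n-D] + a1_ * p[n-1] - b0_ * p[n-D]
    # (47): reed displacement, needs only past samples
    x[n] = b1a * (p[n-1] - Psi * beta_u * u[n-1]) - a1a * x[n-1] - a2a * x[n-2]
    opening = 1 - gamma + x[n]
    # (50): W = 0 when the reed beats (opening <= 0)
    W = zeta * opening / (1 + Psi * beta_x * opening**2) if opening > 0 else 0.0
    s = 1.0 if gamma - V >= 0 else -1.0
    # (51): explicit solution of the implicit nonlinear system
    u[n] = 0.5 * s * (-bc0 * W**2 + W * math.sqrt(bc0**2 * W**2 + 4 * abs(gamma - V)))
    p[n] = bc0 * u[n] + V           # (48)
    p_ext[n] = (p[n] + u[n]) - (p[n-1] + u[n-1])   # (35) as a first difference
```

Note the order of operations matches the paper's sequence (Ṽ, then x, W, u_e, p_e): the z⁻¹ numerator of the reed filter is precisely what removes the delay-free loop.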
Figure 8 shows superimposed curves. In the top figure, the digital impedance of the bore is shown in dotted lines, and the ratio between the Fourier transforms of the signals p_e(n) and u_e(n) in solid lines; in the bottom figure, the digital reed transfer function is shown in dotted lines, and the ratio of the Fourier transforms of the signals x(n) and p_e(n) − Ψ(n)β_u u_e(n) (including attack and decay transients) in solid lines. As can be seen, the curves are perfectly superimposed. There is no need to check the nonlinear relation between u_e(n), p_e(n), and x(n), which is satisfied by construction, since u_e(n) is obtained explicitly as a function of the other variables in (51). In the case of the oboe-like bore, the results obtained using the resolution scheme are equally accurate.

Figure 8: (a) impedance (dotted line) and ratio between the spectra of p_e and u_e (solid line); (b) reed transfer function (dotted line) and ratio of the spectra between x and p_e − Ψβ_u u_e (solid line).

4.1.1. The case of the beating reed

The first example corresponds to a beating reed situation, which is simulated by choosing a steady-state value of γ greater than 0.5 (γ = 0.56). Figure 9 shows the spectrogram (dB) of the external pressure generated by the model. The values of the spectrogram are coded with a grey-scale palette (small values are dark and high values are bright). The bright horizontal lines correspond to the harmonics of the external pressure.

Figure 9: Spectrogram of the external pressure for a cylindrical bore and a beating reed, where γ = 0.56.

Figure 10: u_e(n) versus p_e(n): (a) t = 0.25 s, (b) t = 0.5 s.
Figure 11: u_e(n) versus p_e(n): (a) t = 0.75 s, (b) t = 1 s.
Figure 12: Spectrogram of the external pressure for a cylindrical bore and a nonbeating reed, where γ = 0.498.
Figure 13: u_e(n) versus p_e(n): (a) t = 0.25 s, (b) t = 0.5 s.

Increasing the value of Ψ mainly affects the pitch and only slightly affects the amplitudes of the harmonics. In particular, at high values of Ψ, a small increase in Ψ results in a strong decrease in the pitch. A cancellation of the self-oscillation process can be observed at around t = 1.2 seconds, due to the high value of Ψ, since it occurs before γ starts decreasing. Odd harmonics have a much higher level than even harmonics, as occurs in the case of the clarinet. Indeed, the even harmonics originate mainly from the flow, which is taken into account in the calculation of the external pressure. However, it is worth noticing that the level of the second harmonic increases with Ψ. Figures 10 and 11 show the flow u_e(n) versus the pressure p_e(n), obtained during a small number (3) of oscillation periods, at around t = 0.25 seconds, t = 0.5 seconds, t = 0.75 seconds, and t = 1 second. The existence of two different paths, corresponding to the opening or closing of the reed, is due to the inertia of the reed. This phenomenon is also observed on single-reed instruments (see, e.g., [14]). A discontinuity appears in the whole path because the reed is beating: this cancels the opening (and hence the flow) while the pressure is still varying. The shape of the curve changes with respect to Ψ. This shape is in agreement with the results presented in [5].

4.1.2. The case of the nonbeating reed

The second example corresponds to a nonbeating reed situation, which is obtained by choosing a steady-state value of γ smaller than 0.5 (γ = 0.498).
Figure 12 shows the spectrogram of the external pressure generated by the model. Increasing the value of Ψ results in a sharp change in the level of the high harmonics at around t = 0.4 seconds, a slight change in the pitch, and a cancellation of the self-oscillation process at around t = 0.8 seconds, corresponding to a smaller value of Ψ than that observed in the case of the beating reed. Figure 13 shows the flow u_e(n) versus the pressure p_e(n) at around t = 0.25 seconds and t = 0.5 seconds. Since the reed is no longer beating, the whole path remains continuous. The changes in its shape with respect to Ψ are smaller than in the case of the beating reed.

4.2. Oboe-like resonator

In order to compare the effects of the confined air jet with the geometry of the bore, we now consider an oboe-like bore,

whose input impedance and geometric parameters correspond to those of Figure 7. The other parameters have the same values as in the case of the cylindrical resonator, and the steady-state value of γ is γ = 0.4.

Figure 14: (a) external acoustic pressure; (b), (c) attack and decay transients.

Figure 14 shows the pressure p_ext(t). Increasing the effect of the air jet confinement with Ψ, and hence the aerodynamical losses, results in a gradual decrease in the signal amplitude. The change in the shape of the waveform with respect to Ψ can be seen in the blowups corresponding to the attack and decay transients. Figure 15 shows the spectrogram of the external pressure generated by the model. Since the impedance includes all the harmonics (and not only the odd ones, as in the case of the cylindrical bore), the output pressure also includes all the harmonics. This makes for a considerable perceptual change in the timbre in comparison with the cylindrical geometry. Since the input impedance of the bore is not perfectly harmonic, it is not possible to determine whether the moving formants are caused by a change in the value of Ψ or by a phasing effect resulting from the slightly inharmonic nature of the impedance. Increasing the value of Ψ affects the amplitude of the harmonics and slightly changes the pitch. In addition, as in the case of the cylindrical bore with a nonbeating reed, a large value of Ψ brings the self-oscillation process to an end.

Figure 15: Spectrogram of the external pressure for an oboe-like bore, where γ = 0.4.
Figure 16: u_e(n) versus p_e(n): (a) t = 0.25 s, (b) t = 0.5 s.
Figure 17: u_e(n) versus p_e(n): (a) t = 0.75 s, (b) t = 1 s.

Figures 16 and 17 show the flow u_e(n) versus the pressure p_e(n) at around t = 0.25 seconds, t = 0.5 seconds, t = 0.75 seconds, and t = 1 second.
The shape and evolution with Ψ of the nonlinear characteristics are similar to what occurs in the case of a cylindrical bore with a beating reed.

5. CONCLUSION

The synthesis model described in this paper includes the formation of a confined air jet in the embouchure of double-reed instruments. A dimensionless physical model, whose form is suitable for transposition to a digital synthesis model, is proposed. The resonator is modeled using a time-domain equivalent of the input impedance and does not require the use of wave variables. This facilitates the modeling of the digital coupling between the bore, the reed, and the nonlinear characteristics, since all the components of the model use the same physical variables. It is thus possible to obtain an explicit resolution of the nonlinear coupled system, thanks to the specific discretization scheme of the reed model. This approach is applicable to other self-oscillating wind instruments using the same flow model, but it still needs to be compared with other methods. This synthesis model was used to study the influence of the confined jet on the generated sound, by carrying out a real-time implementation. Based on the results of informal listening tests with an oboe player, the sound and the dynamics of the transients obtained are fairly realistic. The simulations show that the shape of the resonator is the main factor determining the timbre of the instrument in steady-state parts, and that the confined jet plays a role at the control level of the model, since it increases the oscillation step and therefore plays an important role mainly in the transient parts.

ACKNOWLEDGMENTS

The author would like to thank Christophe Vergez for helpful discussions on the physical flow model, and Jessica Blanc for reading the English.

REFERENCES

[1] R. T. Schumacher, "Ab initio calculations of the oscillation of a clarinet," Acustica, vol. 48, pp. 71–85, 1981.
[2] J. O. Smith III, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K.
Brandenburg, Eds., Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[3] V. Välimäki and M. Karjalainen, "Digital waveguide modeling of wind instrument bores constructed of truncated cones," in Proc. International Computer Music Conference, Computer Music Association, San Francisco, 1994.
[4] M. van Walstijn and M. Campbell, "Discrete-time modeling of woodwind instrument bores using wave variables," Journal of the Acoustical Society of America, vol. 113, no. 1, 2003.
[5] C. Vergez, A. Almeida, R. Caussé, and X. Rodet, "Toward a simple physical model of double-reed musical instruments: influence of aero-dynamical losses in the embouchure on the coupling between the reed and the bore of the resonator," Acustica, vol. 89, 2003.
[6] Ph. Guillemain, J. Kergomard, and Th. Voinier, "Real-time synthesis of wind instruments using nonlinear physical models," submitted to Journal of the Acoustical Society of America.
[7] J. Kergomard, "Elementary considerations on reed-instrument oscillations," in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.
[8] A. Almeida, C. Vergez, R. Caussé, and X. Rodet, "Physical study of double-reed instruments for application to sound synthesis," in Proc. International Symposium on Musical Acoustics, pp. 1–6, Mexico City, Mexico, December 2002.
[9] A. Hirschberg, "Aero-acoustics of wind instruments," in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.
[10] S. Ollivier, Contribution à l'étude des oscillations des instruments à vent à anche simple, Thèse de l'Université du Maine, Le Mans, France.
[11] T. A. Wilson and G. S. Beavers, "Operating modes of the clarinet," Journal of the Acoustical Society of America, vol. 56, no. 2, pp. 653–658, 1974.
[12] Ph. Guillemain, J. Kergomard, and Th. Voinier, "Real-time synthesis models of wind instruments based on physical models," in Proc.
Stockholm Music Acoustics Conference, Stockholm, Sweden, 2003.
[13] A. D. Pierce, Acoustics: An Introduction to Its Physical Principles and Applications, McGraw-Hill, New York, NY, USA, 1981; reprinted by the Acoustical Society of America, Woodbury, NY, USA, 1989.
[14] F. Avanzini and D. Rocchesso, "Efficiency, accuracy, and stability issues in discrete-time simulations of single reed instruments," Journal of the Acoustical Society of America, vol. 111, no. 5, 2002.
[15] G. Borin, G. De Poli, and D. Rocchesso, "Elimination of delay-free loops in discrete-time models of nonlinear acoustic systems," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 5, pp. 597–605, 2000.

Ph. Guillemain was born in 1967 in Paris. Since 1995, he has worked as a full-time researcher at the Centre National de la Recherche Scientifique (CNRS) in Marseille, France. He obtained his Ph.D. in 1994 on the additive synthesis modeling of natural sounds using time-frequency and wavelet representations. Since 1989, he has been working in the field of musical sound analysis, synthesis, and transformation using signal models and phenomenological models, with an emphasis on propagative models, their link with physics, and the design and control of real-time compatible synthesis algorithms.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Real-Time Gesture-Controlled Physical Modelling Music Synthesis with Tactile Feedback

David M. Howard
Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO1 5DD, UK
dh@ohm.york.ac.uk

Stuart Rimell
Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO1 5DD, UK
smr1@ohm.york.ac.uk

Received 3 June 2003; Revised 13 November 2003

Electronic sound synthesis continues to offer huge potential possibilities for the creation of new musical instruments. The traditional approach is, however, seriously limited in that it incorporates only auditory feedback, and it will typically make use of a sound synthesis model (e.g., additive, subtractive, wavetable, and sampling) that is inherently limited and very often nonintuitive to the musician. In a direct attempt to challenge these issues, this paper describes a system that provides tactile as well as acoustic feedback, with real-time synthesis that invokes a more intuitive response from players, since it is based upon mass-spring physical modelling. Virtual instruments are set up via a graphical user interface in terms of the physical properties of basic, well-understood sounding objects such as strings, membranes, and solids. These can be interconnected to form complex integrated structures. Acoustic excitation can be applied at any point mass via virtual bowing, plucking, striking, a specified waveform, or any external sound source. Virtual microphones can be placed at any point masses to deliver the acoustic output. These aspects of the instrument are described, along with the nature of the resulting acoustic output.

Keywords and phrases: physical modelling, music synthesis, haptic interface, force feedback, gestural control.

1.
INTRODUCTION

Musicians are always searching for new sounds and new ways of producing sounds in their compositions and performances. The availability of modern computer systems has enabled considerable processing power to be made available on the desktop, and such machines have the capability of enabling sound synthesis techniques to be employed in real time that would have required large dedicated computer systems just a few decades ago. Despite the increased incorporation of computer technology in electronic musical instruments, the search is still on for virtual instruments that are closer, in terms of how they are played, to their physical acoustic counterparts. The system described in this paper aims to integrate music synthesis by physical modelling with novel control interfaces for real-time use in composition and live performances. Traditionally, sound synthesis has relied on techniques involving oscillators, wavetables, filters, time envelope shapers, and digital sampling of natural sounds (e.g., [1]). More recently, physical models of musical instruments have been used to generate sounds which have more natural qualities and have control parameters which are less abstract and more closely related to musicians' experiences with acoustic instruments [2, 3, 4, 5]. Professional electroacoustic musicians require control over all aspects of the sounds with which they are working, in much the same way as a conductor is in control of the sound produced by an orchestra. Such control is not usually available from traditional synthesis techniques, since user adjustment of available synthesis parameters rarely leads to obviously predictable acoustic results. Physical modelling, on the other hand, offers the potential of more intuitive control, because the underlying technique is related directly to the physical vibrating properties of objects, such as strings and membranes, with which the user can interact through inference relating to expectation.
The acoustic output from traditional electronic musical instruments is often described as cold or lifeless by players and audience alike. Indeed, many report that such sounds become less interesting with extended exposure. The acoustic output from acoustic musical instruments, on the other hand, is often described as warm, intimate or organic. The application of physical modelling for sound synthesis produces output sounds that resemble much more closely their physical counterparts.

80 1 EURASIP Journal on Applied Signal Processing The success of a user interface for an electronic musical instrument might be judged on its ability to enable the user to experience the illusion of directly manipulating objects, and one approach might be the use of virtual reality interfaces. However, this is not necessarily the best way to achieve such a goal in the context of a musical instrument, since a performing musician needs to be actively in touch visually and acoustically not only with other players, but also with the audience. This is summed up by Shneiderman [6]: virtual reality is a lively new direction for those who seek the immersion experience, where they block out the real world by having goggles on their heads. In any case, traditionally trained musicians rely less on visual feedback with their instrument and more on tactile and sonic feedback as they become increasingly accustomed to playing it. For example, Hunt and Kirk [7] note that observation of competent pianists will quickly reveal that they do not need to look at their fingers, let alone any annotation (e.g., sticky labels with the names of the notes on) which beginners commonly use. Graphics are a useful way of presenting information (especially to beginners), but are not the primary channel which humans use when fully accustomed to a system. There is evidence to suggest that the limited information available from the conventional screen and mouse interface is certainly limiting and potentially detrimental for creating electroacousticmusic.buxton [8] suggests that the visual senses are overstimulated, whilst the others are understimulated. In particular, he suggests that tactile input devices also provide output to enable the user to relate to the system as an object rather than an abstract system, every haptic input device can also be considered to provide output. 
This would be through the tactile or kinaesthetic feedback that it provides to the user...some devices actually provide force feedback, as with some special joysticks. Fitzmaurice [9] proposes graspable user interfaces as real objects which can be held, manipulated, positioned, and conjoined in order to make interfaces which are more akin to the way a human interacts with the real world. It has further been noted that the haptic senses provide the second most important means (after the audio output) by which users observe and interact with the behaviour of musical instruments [10], and that complex and realistic musical expression can only result when both tactile (vibrational and textural) and proprioceptive cues are available in combination with aural feedback [11]. Considerable activity exists on capturing human gesture (megaproject.org/) [12]. Specific to the control of musical instruments is the provision of tactile feedback [13], electronic keyboards that have a feel close to a real piano [14], haptic feedback bows that simulate the feel and forces of real bows [15], and the use of finger-fitted vibrational devices in open-air gestural musical instruments [16]. Such haptic control devices are generally one-off, relatively expensive, and designed to operate linked with specific computer systems; as such, they are essentially inaccessible to the musical masses. A key feature of our instrument is its potential for wide applicability, and therefore inexpensive and widely available PC force feedback gaming devices are employed to provide its real-time gestural control and haptic feedback. The instrument described in this paper, known as Cymatic [17], took its inspiration from the fact that traditional acoustic instruments are controlled by direct physical gesture, whilst providing both aural and tactile feedback.
Cymatic has been designed to provide players with an immersive, easy to understand, and tactile musical experience that is more commonly associated with acoustic instruments but rarely found with computer-based instruments. The audio output from Cymatic is derived from a physical modelling synthesis engine which has its origins in TAO [3]. It shares some common approaches with other physical modelling sound synthesis environments such as Mosaic [4] and Cordis-Anima [5]. Cymatic makes use of the more intuitive approach to sound synthesis offered by physical modelling to provide a building block approach to the creation of virtual instruments, based on elemental structures in one (string), two (sheet), three (block), or more dimensions that can be interconnected to form complex virtual acoustically resonant structures. Such instruments can be excited acoustically, controlled in real-time via gestural devices that incorporate force feedback to provide a tactile response in addition to the acoustic output, and heard after placing one or more virtual microphones at user-specified positions within the instrument.

2. DESIGNING AND PLAYING CYMATIC INSTRUMENTS

Cymatic is a physical modelling synthesis system that makes use of a mass-spring paradigm with which it synthesises resonating structures in real-time. It is implemented in C on a Windows-based PC, and it incorporates support for standard force feedback PC gaming controllers to provide gestural control and tactile feedback. Acoustic output is realised via a sound card that provides support for ASIO audio drivers. Operation of Cymatic is a two-stage process: (1) virtual instrument design and (2) real-time sound synthesis. Virtual instrument design is accomplished via a graphical interface, with which individual building block resonating elements including strings, sheets, and solids can be incorporated in the instrument and interconnected on a user-specified mass-to-mass basis.
The ends of strings and the edges of sheets and blocks can be locked as desired. The tension and mass parameters of the masses and springs within each building block element can be user defined in value and either left fixed or placed under dynamic control using a gestural controller during synthesis. Virtual instruments can be customised in shape, enabling arbitrary structures to be realised by deleting or locking any of the individual masses. Each building block resonating element will behave as a vibrating structure. The individual axial resonant frequencies will be determined by the number of masses along the given axis, the sampling rate, and the specified mass and tension values. Standard relationships hold in terms of the relative values of resonant frequency between building blocks; for example, a string twice the length of another will have a fundamental frequency that is one octave lower.

Gesture-Tactile Physical Modelling Synthesis

Figure 1: Example build-up of a Cymatic virtual instrument starting with a string with 45 masses (top left), then adding a sheet of 7 by 9 masses (bottom left), then a block of 4 by 4 by 3 masses (top right), and finally the completed instrument (bottom right). Key: mic1, audio output virtual microphone on the sheet at mass (4, 1); random, random excitation at mass 33 of the string; bow, bowed excitation at mass (,, ) of the block; join (dotted line) between string mass 18 and sheet mass (1, 5); join (dotted line) between sheet mass (6, 3) and block mass (3,, 1).

An excitation function, selected from the following list, can be placed on any mass within the virtual instrument: pluck, bow, random, sine wave, square wave, triangular wave, or live audio. Parameters relating to the selected excitation, including excitation force and its velocity and time of application where appropriate, can be specified by the user. Multiple excitations can be specified, on the basis that each is applied to its own individual mass element. Monophonic audio output to the sound card is achieved via a virtual microphone placed on any individual mass within the instrument. Stereophonic output is available either from two individual microphones or from any larger number of microphones, where the output from each is panned between the left and right channels as desired. Cymatic supports whatever range of sampling rates is available on the sound card. For example, when used with an Edirol UA-5 USB audio interface, the following are available: 8 kHz, 9.6 kHz, 11.025 kHz, 12 kHz, 16 kHz, 22.05 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 88.2 kHz, and 96 kHz. Figure 1 illustrates the process of building up a virtual instrument. The instrument has been built up from a string of 45 masses, a sheet of 7 by 9 masses, and a block of 4 by 4 by 3 masses.
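The octave relationship described above can be checked against the analytic mode frequencies of an ideal fixed-ended mass-spring chain. The sketch below is illustrative only — it uses the textbook chain formula, not Cymatic's own code, and the unit stiffness and mass values are assumptions:

```python
import math

def mode_freqs(n_masses, stiffness, mass, n_modes=3):
    """Analytic angular mode frequencies of a chain of equal masses joined by
    identical springs, with both ends locked:
    w_j = 2*sqrt(K/m) * sin(j*pi / (2*(n_masses + 1)))."""
    w0 = 2.0 * math.sqrt(stiffness / mass)
    return [w0 * math.sin(j * math.pi / (2 * (n_masses + 1)))
            for j in range(1, n_modes + 1)]

# a 22-mass string vs. a 45-mass string: (22 + 1) * 2 = 45 + 1, i.e. double the length
short_string = mode_freqs(22, stiffness=1.0, mass=1.0)
long_string = mode_freqs(45, stiffness=1.0, mass=1.0)
ratio = short_string[0] / long_string[0]  # close to 2: one octave lower
```

For the low modes the sine is nearly linear in its argument, so doubling the number of masses at fixed mass and tension very nearly halves every resonant frequency, which is the behaviour quoted in the text.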
There is an interconnection between the string (mass 18 from the left) and the sheet (mass (1, 5)), and between the sheet (mass (6, 3)) and the block (mass (3,, 1)), as indicated by the dotted lines (interconnection is a simple process based on clicking on the relevant masses). Two excitations have been included: a random input to the string at mass 33 and a bowed excitation to the block at mass (,, ). The basic sheet and block have been edited: masses have been removed from both, as indicated by the gaps in their structure, and the masses on the back surface of the block have all been locked. The audio output is derived from a virtual microphone placed on the sheet at mass (4, 1). These are indicated in the figure as random, bow, and mic1, respectively. Individual components, excitations, and microphones can be added, edited, or deleted as desired. The instrument is controlled in real-time using a Microsoft Sidewinder Force Feedback Pro joystick and a Logitech iFeel mouse. The various gestures that can be captured by these devices can be mapped to any of the parameters associated with the physical modelling process on an element-by-element basis. The joystick offers four degrees of freedom (x, y, z-twist movement and a rotary throttle controller) and eight buttons. The mouse has two degrees of freedom (x, y) and three buttons. Cymatic parameters that can be controlled include the mass or tension of any of the basic elements that make up the instrument and the parameters associated with the chosen excitation, such as bowing pressure, excitation force, or excitation velocity. The buttons can be configured to suppress the effect of any of the gestural movements, enabling the user to move to a new position while making no change; the change can then be made instantaneously by releasing the button. In this way, step variations can be accommodated.
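The suppress-button behaviour just described can be sketched as a small state machine. This is a hypothetical illustration — the class and parameter names are ours, not Cymatic's:

```python
class AxisMapping:
    """Maps a normalised controller axis reading in [0, 1] onto a synthesis
    parameter. While the suppress button is held, axis motion is ignored; on
    release the parameter jumps straight to the new axis position, giving the
    step variation described in the text."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.value = lo

    def update(self, axis, button_held):
        if not button_held:
            self.value = self.lo + axis * (self.hi - self.lo)
        return self.value

# hypothetical tension parameter mapped onto one joystick axis
tension = AxisMapping(lo=0.1, hi=2.0)
v1 = tension.update(0.5, button_held=False)  # continuous control
v2 = tension.update(0.9, button_held=True)   # held: moving causes no change
v3 = tension.update(0.9, button_held=False)  # released: instantaneous step
```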
The force feedback capability of the joystick allows tactile feedback to be provided with a high degree of customisability. It receives its force instructions via MIDI through the combined MIDI/joystick port found on most PC sound cards, and Cymatic outputs the appropriate MIDI messages to control its force feedback devices. The Logitech iFeel mouse is an optical mouse which implements Immersion's iFeel technology. It contains a vibrotactile device to produce tactile feedback over a range of frequencies and amplitudes via the Immersion TouchSense Entertainment software, which converts any audio signal to tactile sensations. The force feedback amplitude is controlled by the acoustic amplitude of the signal from a user-specified virtual microphone, which might be involved in the provision of the main acoustic output, or which could be solely responsible for the control of tactile feedback.

3. PHYSICAL MODELLING SYNTHESIS IN CYMATIC

Physical modelling audio synthesis in Cymatic is carried out by solving for the mechanical interaction between the masses and springs that make up the virtual instrument on a sample-by-sample basis. The central difference method of numerical integration is employed as follows:

x(t + dt) = x(t) + v(t + dt/2) dt,
v(t + dt/2) = v(t - dt/2) + a(t) dt,   (1)

where x = mass position, v = mass velocity, a = mass acceleration, t = time, and dt = sampling interval.

The mass velocity is calculated half a time step ahead of its position, which results in a more stable model than an implementation of the Euler approximation. The acceleration at time t of a cell is calculated by the classical equation

a = F/m,   (2)

where F = the sum of all the forces on the cell and m = cell mass. Three forces are acting on the cell:

F_total = F_spring + F_damping + F_external,   (3)

where F_spring = the force on the cell from springs connected to neighbouring cells, F_damping = the frictional damping force on the cell due to the viscosity of the medium, and F_external = the force on the cell from external excitations. F_spring is calculated by summing the force on the cell from the springs connecting it to its neighbours, each calculated via Hooke's law:

F_spring = k (p_n - p),   (4)

where k = spring constant, p_n = the position of the nth neighbour, and p = the position of the current cell. F_damping is the frictional force on the cell caused by the viscosity of the medium in which the cell is contained. It is proportional to the cell velocity, where the constant of proportionality is the damping parameter of the cell:

F_damping = -ρ v(t),   (5)

where ρ = the damping parameter of the cell and v(t) = the velocity of the cell at time t. The acceleration of a particular cell at any instant can be established by combining these forces into (2):

a(t) = (1/m) ( Σ k (p_n - p) - ρ v(t) + F_external ).   (6)

The position, velocity, and acceleration are calculated once per sampling interval for each cell in the virtual instrument.

Figure 2: Cymatic virtual instrument consisting of a string and modified sheet. They are joined together between mass 3 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at point 1 of the string and the virtual microphone is located at mass (6, 3) of the sheet.
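Equations (1)-(6) combine into a single per-sample update. The following is a minimal sketch of such a central-difference step for a one-dimensional string with locked ends — our own illustrative code, not Cymatic's implementation:

```python
def step(pos, vel, k, rho, mass, dt, f_ext):
    """Advance a fixed-ended mass-spring string by one sample.
    pos: mass positions x(t); vel: half-step velocities v(t - dt/2);
    f_ext: external excitation force applied to each mass."""
    n = len(pos)
    new_vel = vel[:]
    for i in range(1, n - 1):  # end masses stay locked
        # Hooke's law summed over both neighbours, Eq. (4)
        f_spring = k * (pos[i - 1] - pos[i]) + k * (pos[i + 1] - pos[i])
        # total force, Eq. (3), with viscous damping -rho*v, Eq. (5)
        f_total = f_spring - rho * vel[i] + f_ext[i]
        a = f_total / mass                  # a = F/m, Eq. (2)
        new_vel[i] = vel[i] + a * dt        # v(t + dt/2) = v(t - dt/2) + a(t) dt
    new_pos = [p + v * dt for p, v in zip(pos, new_vel)]  # x(t + dt) = x(t) + v(t + dt/2) dt
    new_pos[0], new_pos[-1] = pos[0], pos[-1]  # keep the ends locked
    return new_pos, new_vel

# pluck-like displacement of the middle mass of a five-mass string
pos, vel = [0.0, 0.0, 1.0, 0.0, 0.0], [0.0] * 5
pos, vel = step(pos, vel, k=1.0, rho=0.0, mass=1.0, dt=0.1, f_ext=[0.0] * 5)
```

A virtual microphone is then just the position of one chosen cell, read out once per sampling interval.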
Any virtual microphones in the instrument output their cell positions to provide an output audio waveform.

4. CYMATIC OUTPUTS

Audio spectrograms provide a representation that enables the detailed nature of the acoustic output from Cymatic to be observed visually. Figure 2 shows a virtual Cymatic instrument consisting of a string and a modified sheet, which are joined together between mass 3 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at mass 1 of the string and a virtual microphone (mic1) is located at mass (4, 3) of the sheet. Figure 3 shows the force feedback joystick settings dialog used to control the virtual instrument; it can be seen that the component mass of the string and the component tension, damping, and mass of the sheet are controlled by the X, Y, Z, and slider (throttle) functions of the joystick. Three of the buttons have been set to suppress X, Y, and Z; a feature which enables a new setting to be jumped to as desired, for example, by pressing button 1, moving the joystick in the X axis, and then releasing button 1. Force feedback is applied based on the output amplitude level from mic1.

Figure 3: Force feedback joystick settings dialog.

Figure 4: Spectrogram of output from the Cymatic virtual instrument, shown in Figure 2, consisting of a string and modified sheet.

Figure 4 shows a spectrogram of the output from mic1 of the instrument. The tonality visible (horizontal banding

in the spectrogram) is entirely due to the resonant properties of the string and sheet themselves, since the input excitation is random. Variations in the tonality are rendered through gestural control of the joystick, and the step change notable just before halfway through is a result of using one of the suppress buttons.

Figure 5: Spectrogram of a section of "the child is sleeping" by Stuart Rimell, showing Cymatic alone (from the start to A), the word "hush" sung by the four-part choir (A to B), and the "st" of "still" at C.

Cymatic was used in a public live concert in December, for which a new piece, "the child is sleeping", was specially composed by Stuart Rimell for a cappella choir and Cymatic ( dmh). It was performed by the Beningbrough Singers in York, conducted by David Howard. The composer performed the Cymatic part, which made use of three cymbal-like structures controlled by the mouse and joystick. The choir provided a backing in the form of a slow-moving carol in four-part harmony, while Cymatic played an obbligato solo line. The spectrogram in Figure 5 illustrates this with a section which has Cymatic alone (up to point A); the choir then enters singing "hush be still", with the "sh" of "hush" showing at point B and the "st" of "still" at point C. In this particular Cymatic example, the sound colours being used lie at the extremes of the vocal spectral range, but there are clearly tonal elements visible in the Cymatic output. Indeed, these were essential as a means of giving the choir their starting pitches.

5. DISCUSSION AND CONCLUSIONS

An instrument known as Cymatic has been described, which provides its players with an immersive, easy to understand, and tactile musical experience that is rarely found with computer-based instruments, but commonly expected from acoustic musical instruments.
The audio output from Cymatic is derived from a physical modelling synthesis engine, which enables virtual instruments with arbitrary shapes to be built up by interconnecting one- (string), two- (sheet), three- (block), or higher-dimensional basic building blocks. An acoustic excitation chosen from bowing, plucking, striking, or a waveform is applied at any mass element, and the output is derived from a virtual microphone placed at any other mass element. Cymatic is controlled via gestural controllers that incorporate force feedback to provide the player with tactile as well as acoustic feedback. Cymatic has the potential to enable new musical instruments to be explored that can produce original and inspiring new timbral palettes, since virtual instruments that are not physically realisable can be implemented. In addition, interaction with these instruments can include aspects that cannot be used with their physical counterparts, such as deleting part of the instrument while it is sounding, or changing its physical properties in real-time during performance. The design of the user interface ensures that all of these activities can be carried out in a manner that is more intuitive than with traditional electronic instruments, since it is based on the resonant properties of physical structures. A user can therefore make sense of what s/he is doing through reference to the likely behaviour of strings, sheets, and blocks. Cymatic has the further potential, as processing speed increases, to move well away from the real physical world while maintaining the link with this intuition, since the spatial dimensionality of the virtual instruments can in principle be extended well beyond the three dimensions of the physical world.

Cymatic provides the player with an increased sense of immersion, which is particularly useful when developing performance skills, since it reinforces the visual and aural feedback cues and helps the player internalise models of the instrument's response to gesture. Tactile feedback also has the potential to prove invaluable in group performance, where traditionally computer instruments have placed an over-reliance on visual feedback, thereby detracting from the player's visual attention, which should be directed elsewhere in a group situation, for example, towards a conductor.

ACKNOWLEDGMENTS

The authors acknowledge the support of the Engineering and Physical Sciences Research Council UK under Grant number GR/M They also thank the anonymous referees for their helpful and useful comments.

REFERENCES

[1] M. Russ, Sound Synthesis and Sampling, Focal Press, Oxford, UK.
[2] J. O. Smith III, "Physical modelling synthesis update," Computer Music Journal.
[3] M. D. Pearson and D. M. Howard, "Recent developments with the TAO physical modelling system," in Proc. International Computer Music Conference, Hong Kong, China, August.
[4] J. D. Morrison and J. M. Adrien, "MOSAIC: A framework for modal synthesis," Computer Music Journal, vol. 17, no. 1.
[5] C. Cadoz, A. Luciani, and J. L. Florens, "CORDIS-ANIMA: A modelling system for sound and image synthesis, the general formalism," Computer Music Journal, vol. 17, no. 1, pp. 19-29.
[6] J. Preece, "Interview with Ben Shneiderman," in Human-Computer Interaction, Y. Rogers, H. Sharp, D. Benyon, S. Holland, and J. Preece, Eds., Addison Wesley, Reading, Mass, USA.
[7] A. D. Hunt and P. R. Kirk, Digital Sound Processing for Music and Multimedia, Focal Press, Oxford, UK.
[8] W. Buxton, "There is more to interaction than meets the eye: Some issues in manual input," in User Centered System Design: New Perspectives on Human-Computer Interaction, D. A. Norman and S. W.
Draper, Eds., Lawrence Erlbaum Associates, Hillsdale, NJ, USA.
[9] G. W. Fitzmaurice, "Graspable user interfaces," D.Phil. thesis, University of Toronto, Ontario, Canada.
[10] B. Gillespie, "Introduction: haptics," in Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics, P. R. Cook, Ed., pp. 9 45, MIT Press, London, UK.
[11] D. M. Howard, S. Rimell, A. D. Hunt, P. R. Kirk, and A. M. Tyrrell, "Tactile feedback in the control of a physical modelling music synthesiser," in Proc. 7th International Conference on Music Perception and Cognition, C. Stevens, D. Burnham, G. McPherson, E. Schubert, and J. Renwick, Eds., pp. 4 7, Casual Publications, Adelaide, Australia.
[12] S. Kenji, H. Riku, and H. Shuji, "Development of an autonomous humanoid robot, iSHA, for harmonized human-machine environment," Journal of Robotics and Mechatronics, vol. 14, no. 5.
[13] C. Cadoz, A. Luciani, and J. L. Florens, "Responsive input devices and sound synthesis by simulation of instrumental mechanisms: The Cordis system," Computer Music Journal, vol. 8, no. 3, pp. 6 73.
[14] B. Gillespie, "Haptic display of systems with changing kinematic constraints: The virtual piano action," Ph.D. dissertation, Stanford University, Stanford, Calif, USA.
[15] C. Nichols, "The vBow: Development of a virtual violin bow haptic human-computer interface," in Proc. New Interfaces for Musical Expression Conference, Dublin, Ireland, May.
[16] J. Rovan and V. Hayward, "Typology of tactile sounds and their synthesis in gesture-driven computer music performance," in Trends in Gestural Control of Music, M. Wanderley and M. Battier, Eds., pp. 97 3, Editions IRCAM, Paris, France.
[17] D. M. Howard, S. Rimell, and A. D. Hunt, "Force feedback gesture controlled physical modelling synthesis," in Proc. Conference on New Interfaces for Musical Expression, Montreal, Canada, May 2003.

David M. Howard holds a first-class B.S.
degree in electrical and electronic engineering from University College London (1978) and a Ph.D. in human communication from the University of London (1985). His Ph.D. topic was the development of a signal processing unit for use with a single-channel cochlear implant hearing aid. He is now with the Department of Electronics at the University of York, UK, teaching and researching in music technology. His specific research areas include the analysis and synthesis of music, singing, and speech. Current activities include the application of bio-inspired techniques for music synthesis, physical modelling synthesis for music, singing, and speech, and real-time computer-based visual displays for professional voice development. David is a Chartered Engineer, a Fellow of the Institution of Electrical Engineers, and a Member of the Audio Engineering Society. Outside work, David finds time to conduct a local 1-strong choir from the tenor line and to play the pipe organ. Stuart Rimell holds a B.S. in electronic music and psychology as well as an M.S. in digital music technology, both from the University of Keele, UK. He worked for 18 months with David Howard at the University of York on the development of the Cymatic system. He studied electroacoustic composition for 3 years under Mike Vaughan and Rajmil Fischman. Stuart is interested in the exploration of new and fresh creative musical methods and their computer-based implementation for electronic music composition. Stuart is a guitarist and he also plays euphonium, trumpet, and piano, and has been writing music for over 1 years. His compositions have been recognized internationally through prizes from the prestigious Bourges festival of electronic music in 1999 and performances of his music worldwide.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models

Ixone Arroabarren
Department of Electrical and Electronic Engineering, Universidad Publica de Navarra, Campus de Arrosadia, 316 Pamplona, Spain
ixone.arroabarren@unavarra.es

Alfonso Carlosena
Department of Electrical and Electronic Engineering, Universidad Publica de Navarra, Campus de Arrosadia, 316 Pamplona, Spain
carlosen@unavarra.es

Received 4 July 2003; Revised 3 October 2003

The application of inverse filtering techniques for high-quality singing voice analysis-synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have shown that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has traditionally been studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source-filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.

Keywords and phrases: voice quality, source-filter model, inverse filtering, singing voice, vibrato, sinusoidal model.

1. INTRODUCTION

Inverse filtering provides a noninvasive method to study voice quality.
In this context, high-quality speech synthesis is developed using a source-filter model, where voice texture is controlled by glottal source characteristics. Efforts to apply this approach to singing voice have failed, and the reasons are not clear: the unsuitability of the model, the different range of frequencies, or both could be the cause. Lyric singers, being professionals, have an efficiency requirement; as a result, they are trained to change their formant positions, moving them towards the positions of the first harmonics, which could be another reason for the model's failure [1]. This paper aims to shed light on this problem by comparing two salient methods for glottal source and vocal tract response (VTR) estimation with a novel frequency-domain method proposed by the authors. In this way, the inverse filtering approach will be tested in singing voice analysis. In order to have a benchmark, the source-filter model will be compared to the sinusoidal model, and this comparison will be performed thanks to a particular feature of singing voice: vibrato. Regarding voice production models, we can distinguish two approaches as follows. (i) On the one hand, interactive models are closer to the physical features of the vocal system. This system is composed of two resonant cavities (subglottal and supraglottal) which are connected by a valve, the glottis, where the vocal folds are located. The movement of the vocal folds provides the harmonic nature of the air flow of voiced sounds, and also controls the coupling between the two resonant cavities, which will be different during the open and closed phases. As a result of this effect, the VTR will change during a single fundamental period and there will be a relationship between the glottal source and the VTR. This physical behavior has been modeled in several ways, by physical models [2] or aerodynamic models [3, 4].
From the signal processing point of view, in [4] the VTR variation is related to the glottal area, which controls the coupling of the cavities, and this relationship is represented by a frequency modulation of the central frequency and bandwidth of the formants. Another effect of the source-tract interaction is the increase of the skewness of the glottal source [4], which emphasizes the difference between the glottal area and the glottal source [5].

(ii) On the other hand, noninteractive models separate the glottal source and the VTR, and both are independently modeled as linear time-varying systems. This is the case of the source-filter model proposed by Fant in [6]. The VTR is modeled as an all-pole filter in the case of nonnasal sounds. For the glottal source, several waveform models have been proposed [7, 8, 9], but all of them try to include some of the features of the source-tract interaction, typically the asymmetric shape of the pulse. These models provide a high-quality synthesis framework for speech with a low computational complexity. The synthesis is preceded by an analysis stage, which is divided into two steps: an inverse filtering step, where the glottal source and the VTR are separated [9, 10, 11, 12, 13], and a parameterization step, where the most relevant parameters of both elements are obtained [14, 15, 16]. In general, inverse filtering techniques yield worse results as the fundamental frequency increases, as is the case for women and children in speech and in singing voice. In the latter case, singing voice, the number of published works is very scarce [1, 17]. In [1], the glottal source features are studied in speech and singing voice by acoustic and electroglottographic signals [18, 19]. From these works, it is not apparent what the main limitation of inverse filtering in singing voice is. It might be possible that the source-tract interaction is more complex than in speech, which would represent a paradox in the noninteractive assumption [20]. Another reason mentioned in [1] is that perhaps the glottal source models used in speech are not suitable for singing voice. These statements are not demonstrated, but are interesting questions that should be answered. On the other hand, in [17] the noninteractive source-filter model is used as a high-quality singing voice synthesis approach.
The main contribution of that work is the development of an analysis procedure that estimates the parameters of the synthesis model [1, 1]. However, there is no evidence that could point to differences between speech and singing, as indicated in [1]. One of the goals of the present work is to clarify whether noninteractive models are able to model singing voice in the same way as high-quality speech, or whether, on the contrary, the source-tract interaction is different from that in speech and precludes this linear model assumption. If the noninteractive model could model singing voice, the reason for the failure of inverse filtering techniques would simply be the high fundamental frequency of singing voice. To this end, we will compare in this paper three different inverse filtering techniques, one of them novel and proposed recently by the authors, in order to obtain the source-filter decomposition. Though they work correctly for speech and low-frequency signals, we will show their limitations as the fundamental frequency increases. This is described in Section 2. Since the fundamental frequency in singing voice is higher than in speech, it seems obvious that the above-mentioned methods fail, apparently due to the limited spectral information provided in high-pitched signals. To compensate for that, we claim that the introduction of a feature such as vibrato may serve to increase the information available, by virtue of the frequency-modulated nature, and therefore wider bandwidth, of vibrato [22, 23, 24]. Frequency variations are influenced by the VTR, and this effect can be used to obtain information about it. With this in mind, it is not surprising that vibrato has traditionally been analyzed by sinusoidal modeling [25, 26], the most important limitation being the impossibility of separating the sound generation and the VTR.

Figure 1: Noninteractive source-filter model of the voice production system: glottal source → VTR → lip radiation (1 - l z^-1) → singing voice.
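The noninteractive chain of Figure 1 can be sketched end to end: a glottal flow pulse train, differentiated to fold the lip radiation into the source (giving the GSD), then filtered by an all-pole resonance. This is a deliberately crude illustration, assuming a raised-cosine flow pulse with abrupt closure and a single formant; none of the parameter values come from the paper:

```python
import math

def glottal_pulse(n_period, open_frac=0.6):
    """Crude glottal flow pulse: raised-cosine open phase, abrupt closure."""
    n_open = int(n_period * open_frac)
    return [0.5 * (1.0 - math.cos(math.pi * n / n_open)) if n < n_open else 0.0
            for n in range(n_period)]

def all_pole_resonator(x, f_c, bw, fs):
    """Single-formant all-pole filter: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bw / fs)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * f_c / fs)
    b2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + b1 * y1 + b2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

fs = 16000
flow = glottal_pulse(fs // 200) * 10                    # ten periods at 200 Hz
gsd = [b - a for a, b in zip([0.0] + flow[:-1], flow)]  # GSD: lip radiation folded into the source
voice = all_pole_resonator(gsd, f_c=700.0, bw=100.0, fs=fs)  # one assumed formant
```

Inverse filtering runs this chain backwards: estimate the all-pole filter from the voice signal, then filter the signal through its inverse to recover the GSD.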
In Section 3, we will take a step forward by introducing a source-filter model which accounts for the physical origin of the main features of singing voice. Making use of this model, we will also demonstrate how the simpler sinusoidal model can serve to obtain information complementary to inverse filtering, particularly in those conditions where the latter method fails.

2. INVERSE FILTERING

Throughout this section, the noninteractive source-filter model depicted in Figure 1 will be considered, and some of the possible estimation algorithms for it will be reviewed. According to the block diagram in Figure 1, singing voice production can be modeled by a glottal source excitation that is linearly modified by the VTR and the lip radiation diagram. Typically, the VTR is modeled by an all-pole filter and, relying on the linearity of the model, the lip radiation system is combined with the glottal source, in such a way that the glottal source derivative (GSD) is considered as the vocal tract excitation. In this context, during the last decades many inverse filtering algorithms to estimate the model elements have been proposed. This technique is usually accomplished in two steps. In the first one, the GSD waveform and the VTR are estimated. In the second one, these signals are parameterized in a few numerical values. This whole analysis can be practically implemented in several ways. For the sake of clarity, we can group these possibilities into two types. (i) In the first group, the two identification steps are combined in a single algorithm, for instance in [9, 10]. There, a mathematical model for the GSD and the autoregressive (AR) model for the VTR are considered, and the VTR and GSD model parameters are estimated simultaneously. In this way, the GSD model parameterizes a given phonation type. Several different algorithms follow this structure, but all of them are invariably time-domain implementations that require glottal closure instant (GCI) detection [7].
Therefore, they suffer from a high computational load, which makes them very cumbersome.
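Before turning to the individual algorithms, the forward direction of the noninteractive model in Figure 1 is easy to sketch: an excitation standing in for the GSD drives an all-pole VTR filter. The minimal sketch below uses invented formant frequencies and bandwidths purely for illustration; it is not the paper's synthesis setup.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                      # sampling rate (Hz), assumed for the example
f0 = 120.0                      # fundamental frequency (Hz), illustrative

# Crude GSD stand-in: one negative impulse per glottal cycle.
n = np.arange(int(0.1 * fs))
gsd = np.zeros(n.shape[0])
gsd[::int(fs / f0)] = -1.0

# All-pole VTR with two resonances (formants); values are invented for the sketch.
poles = []
for fc, bw in [(700.0, 100.0), (1200.0, 120.0)]:
    r = np.exp(-np.pi * bw / fs)                     # pole radius from bandwidth
    poles += [r * np.exp(2j * np.pi * fc / fs),
              r * np.exp(-2j * np.pi * fc / fs)]     # conjugate pair per formant
a = np.real(np.poly(poles))     # denominator coefficients, a[0] = 1
voice = lfilter([1.0], a, gsd)  # synthetic "voice": GSD filtered by the VTR
```

Inverse filtering, discussed next, tries to undo exactly this operation: recover `gsd` and `a` from `voice` alone.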

Figure 2: Block diagram of the AbS inverse filtering algorithm.

(ii) The procedures in the second group split the whole process into two stages. For the first step, different inverse filtering techniques have been proposed [11, 13]. These algorithms remove the GSD effect from the speech signal, and the VTR is obtained by linear prediction (LP) [8] or, alternatively, by discrete all-pole (DAP) modeling [9], which avoids the fundamental frequency dependence of the former. For this comparative study, three inverse filtering approaches have been selected. The first is the analysis-by-synthesis (AbS) procedure presented in [9]; the second is the one proposed by the authors in [13], glottal spectrum based (GSB) inverse filtering. In this way, both groups of algorithms mentioned above are represented. In addition, closed phase covariance (CPC) [1] has been added to the comparison. This approach is difficult to classify because it only obtains the VTR, as in the second group, but it is a time-domain implementation, as in the first. The most interesting feature of this algorithm is that it is less affected by the formant ripple due to source-tract interaction, because it only takes into account the time interval in which the vocal folds are closed. In what follows, the three approaches are briefly described and finally compared.

2.1. Analysis by synthesis

This inverse filtering algorithm was proposed in [9]. It is based on covariance LPC [9], but the least-squares error is modified in order to include the input of the system:

\[ E = \sum_{n=0}^{N-1}\bigl(s(n)-\hat{s}(n)\bigr)^{2} = \sum_{n=0}^{N-1}\Bigl(s(n)-\Bigl(\sum_{k=1}^{p}a_{k}\,s(n-k)+a_{p+1}\,g(n)\Bigr)\Bigr)^{2}, \tag{1} \]

where g(n) represents the GSD, and

\[ H(z) = \frac{a_{p+1}}{1-\sum_{k=1}^{p}a_{k}z^{-k}} \tag{2} \]

represents the VTR.
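For a fixed GSD waveform g(n), minimizing the error in (1) over the coefficients a_1, ..., a_{p+1} is an ordinary linear least-squares problem; the AbS procedure of [9] wraps its iterative search for the voice-source parameters around such a solve. The sketch below shows only that inner linear step; the helper name and the toy signal are ours, not from [9].

```python
import numpy as np

def fit_vtr_with_source(s, g, p):
    """Least-squares fit of s(n) ~ sum_k a_k s(n-k) + a_{p+1} g(n), as in (1)."""
    rows, targets = [], []
    for n in range(p, len(s)):
        rows.append(np.concatenate([s[n - p:n][::-1], [g[n]]]))
        targets.append(s[n])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coeffs[:p], coeffs[p]        # AR coefficients and source gain a_{p+1}

# Toy check: a signal generated exactly by the model should be recovered exactly.
rng = np.random.default_rng(0)
g = rng.standard_normal(400)            # stand-in excitation
a_true, gain_true = np.array([0.5, -0.3]), 2.0
s = np.zeros(400)
for n in range(2, 400):
    s[n] = a_true[0] * s[n - 1] + a_true[1] * s[n - 2] + gain_true * g[n]

a_est, gain_est = fit_vtr_with_source(s, g, 2)
```

In the real algorithm, g(n) itself depends on the LF-model parameters, which is what makes the outer optimization nonlinear and iterative.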
Since neither the VTR nor the GSD parameters are known, an iterative algorithm is proposed and a simultaneous search is developed. The block diagram of the algorithm is represented in Figure 2. As in covariance LP without a source term, this approach allows shorter analysis windows. However, the stability of the system is not guaranteed, and a stabilization step must be included for this purpose. Also, since it is a time-domain implementation, the voice source model must be synchronized with the speech signal, and a high sampling frequency is mandatory in order to obtain satisfactory results. As a consequence, the computational load is also high. The GSD parameter optimization depends on the chosen model. In the results shown in Section 2.4, the LF model is selected because it is one of the most powerful GSD models, allowing independent control of the three main features of the glottal source: open quotient, asymmetry coefficient, and spectral tilt. The disadvantage of this model is its computational load. For more details on the topic, readers are referred to [8]. Regarding fundamental frequency limits, it is shown in [1] that this algorithm provides unsatisfactory results for medium- and high-pitched signals.

2.2. Glottal spectrum based inverse filtering

This technique was proposed by the authors in [13] and will be briefly described here. Unlike the technique described in the previous section, it is essentially a frequency-domain implementation. In the AbS approach, the GSD effect was included in the LP error, and the AR coefficients were obtained by covariance LPC. In our case, a short-term spectrum of speech is considered (3 or 4 fundamental periods), and the GSD effect is removed from the speech spectrum. Then, the AR coefficients of (2) are obtained by DAP modeling [9]. For this spectral implementation, the KLGLOTT88 model [7] has been considered. It is less powerful than the LF model, but simpler to implement.
As shown in Figure 3, there is a basic voicing waveform controlled by the open quotient (O_q) and the amplitude of voicing (AV), with the spectral tilt introduced by a first-order low-pass filter.
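The KLGLOTT88 basic voicing waveform is the polynomial pulse g(t) = a t^2 - b t^3 over the open phase and zero elsewhere, with a and b chosen so that g vanishes at the end of the open phase O_q T_0 and peaks at AV. A sketch under those standard definitions; the sampling details and the tilt coefficient mu are our own illustrative assumptions:

```python
import numpy as np
from scipy.signal import lfilter

def klglott88_period(f0, oq, av, fs):
    """One period of the KLGLOTT88 basic voicing waveform g(t) = a t^2 - b t^3."""
    T0 = 1.0 / f0
    t = np.arange(int(fs * T0)) / fs
    Te = oq * T0                        # end of the open phase
    b = 27.0 * av / (4.0 * Te ** 3)     # makes max(g) = av, reached at t = 2*Te/3
    a = b * Te                          # ensures g(Te) = 0
    return np.where(t < Te, a * t ** 2 - b * t ** 3, 0.0)

g = klglott88_period(f0=100.0, oq=0.6, av=1.0, fs=16000)

# Spectral tilt as in Figure 3: first-order low-pass 1/(1 - mu z^-1);
# mu = 0.9 is an illustrative value, not taken from the paper.
g_tilt = lfilter([1.0], [1.0, -0.9], g)
```

The pulse is smooth at opening and has an abrupt derivative at closure, which is what gives the GSD its strong excitation at the glottal closure instant.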

Figure 3: Block diagram of the KLGLOTT88 model: a basic voicing waveform controlled by O_q and AV, followed by a first-order spectral-tilt low-pass filter.

Figure 4: Block diagram of the GSB inverse filtering algorithm: short-term spectrum, peak detection, removal of the basic voicing spectrum, DAP modeling of an (N+1)th-order all-pole filter, and separation of vocal tract and spectral tilt.

In our inverse filtering algorithm, once the short-term spectrum is calculated, the glottal source effect is removed by spectral division, using the spectrum of the basic voicing waveform (3), which can be obtained directly as the Fourier transform of the basic voicing waveform [3]:

\[ G(f) = \frac{27\,AV}{4\,O_{q}^{3}T_{o}^{3}} \left[ e^{-j2\pi f O_{q}T_{o}} \left( \frac{6}{(2\pi f)^{4}} + \frac{4j\,O_{q}T_{o}}{(2\pi f)^{3}} - \frac{O_{q}^{2}T_{o}^{2}}{(2\pi f)^{2}} \right) + \frac{2j\,O_{q}T_{o}}{(2\pi f)^{3}} - \frac{6}{(2\pi f)^{4}} \right]. \tag{3} \]

The spectral tilt and the VTR are combined in an (N+1)th-order all-pole filter. The block diagram of the algorithm is shown in Figure 4. Since DAP modeling is the most important part of the algorithm, we should explain its rationale. In classical autocorrelation LP [8], it is a well-known effect that, as the fundamental frequency increases, the resulting transfer function is biased by the spectral peaks of the signal. This happens because the signal is assumed to be the impulse response of the system, an assumption that is obviously not entirely correct. To avoid this problem, an alternative proposed in [9] is to base the LP error on the spectral peaks instead of on the time-domain samples. Unfortunately, this error calculation relies on an aliased version of the true autocorrelation of the signal, and the aliasing grows as the fundamental frequency increases, so the resulting transfer function is again incorrect. To solve this problem, DAP modeling uses the Itakura-Saito error instead of the least-squares error, and it can be shown that this error is minimized using only the spectral-peak information. The details of the algorithm are explained in [9].
This technique allows higher fundamental frequencies than classical autocorrelation LP, but for proper operation it requires a sufficient number of spectral peaks in order to estimate the correct transfer function. Thus, this inverse filtering algorithm also has a limit on the highest achievable fundamental frequency.

Figure 5: Closed phase interval in voice (EGG, speech, and GSD waveforms).

Figure 6: Closed phase covariance (CPC): GCI detection, closed phase detection, interval selection, and covariance LPC yielding the vocal tract parameters.

2.3. Closed phase covariance

This inverse filtering technique was proposed in [31]. It is also based on covariance LP, like the AbS approach explained above. However, instead of removing the effect of the GSD from a long speech interval, classical covariance LP is applied only to the portion of a single cycle in which the vocal folds are closed. In this time interval there is no GSD information to be removed, and the application of covariance LP leads to the correct transfer function. Considering the linearity of the model shown in Figure 1, the closed phase interval is the time interval in which the GSD is zero. This situation is depicted in Figure 5. The most difficult step in this technique is detecting the closed phase in the speech signal. In [1], two-channel speech processing is proposed, making use of electroglottographic signals to detect the closed phase. Electroglottography (EGG) is a technique used to register laryngeal behavior indirectly by measuring the electrical impedance across the throat during speech. Rapid variation in the conductance is mainly caused by movement of the vocal folds. As they approximate and the physical contact between them increases, the impedance decreases, which results in a relatively higher current flow through the larynx structures. Therefore, this signal provides information about the contact surface of the vocal folds.
The complete inverse filtering algorithm is represented in Figure 6.
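The covariance LP core of CPC is simply a least-squares AR fit restricted to samples inside the closed phase. A sketch with a toy AR(2) signal whose excitation is silenced over a pretend closed phase; the interval indices are chosen arbitrarily here, whereas the real algorithm obtains them from the EGG-based detection:

```python
import numpy as np

def covariance_lp(s, start, stop, p):
    """Covariance LP over s[start:stop]: fit s(n) ~ sum_k a_k s(n-k)."""
    rows = np.array([s[n - p:n][::-1] for n in range(start, stop)])
    rhs = np.array([s[n] for n in range(start, stop)])
    a, *_ = np.linalg.lstsq(rows, rhs, rcond=None)
    return a

# Toy signal: AR(2) process driven by noise, except over a "closed phase"
# (samples 48..91) where the excitation is zero, mimicking a zero GSD.
rng = np.random.default_rng(1)
e = rng.standard_normal(200)
e[48:92] = 0.0
s = np.zeros(200)
for n in range(2, 200):
    s[n] = 1.0 * s[n - 1] - 0.5 * s[n - 2] + e[n]

a = covariance_lp(s, 50, 90, 2)   # fit using closed-phase samples only
```

Because no excitation enters the fitted window, the AR coefficients are recovered exactly; with a nonzero GSD inside the window, the fit would be biased, which is precisely why CPC restricts itself to the closed phase.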

Figure 7: Estimated GSD ((a) and (b)) and estimated VTR ((c) and (d)) for vowel a at a low and a high fundamental frequency, obtained by the GSB, CPC, and AbS methods and compared with the originals.

In Figure 6, a GCI detection block [7] is included because, even though the acoustic and electroglottographic signals are recorded simultaneously, there is a propagation delay between the acoustic signal recorded at the microphone and the impedance variation at the neck of the singer. Thus, precise synchronization is mandatory. Since this technique is based on covariance LP, it can work with very short window lengths. However, as the fundamental frequency increases, the closed phase gets shorter, and much less information is left for the vocal tract estimation. This fact imposes a fundamental frequency limit, even when covariance LP is used.

2.4. Practical results

Now that the basics of the three inverse filtering techniques have been presented, they will be compared by simulations and also by making use of natural singing voice recordings. The main goal of this analysis is to see how the three techniques compare in terms of their fundamental frequency limitations.

2.4.1. Simulation results

First, the noninteractive model for voice production shown in Figure 1 is used to synthesize artificial test signals. The lip radiation effect and the glottal source are combined in a mathematical model for the GSD, again making use of the LF model. It is well known [1, 17] that the formant positions can affect inverse filtering results. In [3], it is also shown that the lower the central frequency of the first formant, the higher the source-tract interaction. So, the interaction is higher in vowels whose first formant central frequency is lower.
Therefore, in order to cover all possible situations, two vocal tract all-pole filters have been used for synthesizing the test signals: one representing the Spanish vowel a, and the other representing the Spanish vowel e. In the latter case, the first formant is located at lower frequencies. In order to observe the fundamental frequency dependence of the inverse filtering techniques, this parameter has been varied from 1 Hz to 3 Hz in 5 Hz steps. For each fundamental frequency, the three algorithms have been applied, and the GSD as well as the VTR has been estimated. In Figures 7a to

7d, the GSD and the VTR estimated by the three approaches are shown for two different fundamental frequencies. Note that here, and in other figures, the DC level has been arbitrarily shifted to facilitate comparison.

Figure 8: Fundamental frequency dependence. (a) Error F1 in vowel a. (b) Error F1 in vowel e. (c) Error GSD in vowel a. (d) Error GSD in vowel e.

Comparing the results obtained by the three inverse filtering approaches, it can be seen that, as the fundamental frequency increases, the error in both the GSD and the VTR increases. Recalling the implementation of the algorithms, the CPC uses only the time interval in which the GSD is zero. When the fundamental frequency is low, the result of this technique is the closest to the original. The other two techniques both show slight variations in the closed phase, because in both cases the glottal source effect is removed from the speech signal only approximately. In contrast, when the fundamental frequency is high, the AbS approach gives comparatively the best result; however, it provides neither the correct GSD nor the correct VTR. In Figure 8, the relative error in the first formant central frequency and the error in the GSD are represented for the three methods, calculated according to the following expressions:

\[ \mathrm{Error}_{F_{1}} = \frac{\bigl|\hat{F}_{1}-F_{1}\bigr|}{F_{1}}, \qquad \mathrm{Error}_{\mathrm{GSD}} = \frac{1}{N}\sum_{n=0}^{N-1}\bigl(g(n)-\hat{g}(n)\bigr)^{2}, \tag{4} \]

where F_1 represents the first formant central frequency, and g(n) and ĝ(n) are the original and estimated GSD waveforms, respectively. Although the simulation model does not take source-tract interaction into account, Figure 8 shows that the inverse filtering results depend on the first formant position, becoming worse as it moves to lower frequencies.
It can also be seen that both errors increase as the fundamental frequency increases. Therefore, the main conclusion of this simulation-based study is that inverse filtering results exhibit a fundamental frequency dependence even when applied to a noninteractive source-filter model.
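The two error measures of (4), a relative first-formant error and a mean GSD error, are immediate to compute. The toy values below are invented, and the exact normalization used in the paper may differ:

```python
import numpy as np

def error_f1(f1_est, f1_true):
    """Relative error in the first formant central frequency."""
    return abs(f1_est - f1_true) / f1_true

def error_gsd(g, g_est):
    """Mean squared difference between original and estimated GSD waveforms."""
    g = np.asarray(g, dtype=float)
    g_est = np.asarray(g_est, dtype=float)
    return np.mean((g - g_est) ** 2)

e1 = error_f1(720.0, 700.0)                       # invented formant estimate
e2 = error_gsd([0.0, 1.0, 0.0], [0.0, 0.8, 0.1])  # invented GSD samples
```

Normalizing the formant error by F1 makes results comparable across vowels, whose first formants sit at different frequencies.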

Figure 9: Estimated GSD ((a) and (c)) and estimated VTR ((b) and (d)) for vowel a at two fundamental frequencies, from the same singer (GSB, CPC, and AbS).

2.4.2. Natural singing voice results

For this analysis, three male professional singers were recorded: two tenors and one baritone. They were asked to sing notes of different fundamental frequencies, in order to register samples over their whole tessitura. In addition, different vocal tract configurations were considered, and thus this exercise was repeated for the five Spanish vowels a, e, i, o, u. The singing material was recorded in a professional studio, in such a way that reverberation was reduced as much as possible. Acoustic and electroglottographic signals were recorded synchronously, with a bandwidth of kHz, and stored in .wav format. In order to remove low-frequency ambient noise, the signals were filtered by a high-pass linear-phase FIR filter whose cut-off frequency was set to 75% of the fundamental frequency. This filtering was also applied to the electroglottographic signals, because of the low-frequency artifacts, due to larynx movements, that are typical of this kind of signal. In Figures 9a to 9c, the results obtained for different fundamental frequencies and vowel a, for the same singer, are shown. These results are also representative of the other singers' recordings and of the different vowels. By comparing Figures 9a and 9c, it is possible to conclude that, for a low fundamental frequency, the three algorithms provide very close results. In the case of CPC, the GSD presents less formant ripple in the closed phase interval. Regarding the VTR, the central frequencies of the formants and the frequency responses are very similar.
Nevertheless, in the case of a high fundamental frequency, the GSDs resulting from the three analyses are very different from those of Figure 9a, and also from the waveform given by the LF model. The calculated VTR likewise differs greatly across the three methods. Thus, the conclusions for natural recorded voices are similar to those obtained with synthetic signals.

3. VIBRATO IN SINGING VOICE

3.1. Definition

In Section 2, inverse filtering techniques successfully employed in speech processing have been used for singing voice

processing. It has been shown that, as the fundamental frequency increases, they reach a limit, and thus an alternative technique should be used. As we will show in this section, the introduction of vibrato in singing voice provides more information about what may be happening. Vibrato in singing voice can be defined as a small quasiperiodic variation of the fundamental frequency of the note. As a result of this variation, all of the harmonics of the voice also present an amplitude variation, because of the filtering effect of the VTR. Owing to these nonstationary characteristics of the signal, singing voice has been modeled by the modified sinusoidal model [5, 6]:

\[ s(t) = \sum_{i=0}^{N-1} a_{i}(t)\cos\theta_{i}(t) + r(t), \tag{5} \]

where

\[ \theta_{i}(t) = 2\pi\int_{0}^{t} f_{i}(\tau)\,d\tau, \tag{6} \]

and a_i(t) is the instantaneous amplitude of the partial, f_i(t) the instantaneous frequency of the partial, and r(t) the stochastic residual. The acoustic signal is composed of a set of components (partials), whose amplitudes and frequencies change with time, plus a stochastic residual, which is modeled by a time-varying spectral density. Detailed information on how these time-varying characteristics can be measured is also given in [5, 6]. Of the two features of a vibrato signal, frequency and amplitude variations, frequency is the most widely studied and characterized. In [3, 33], the instantaneous frequency is characterized and decomposed into three main components, which account for three musically meaningful characteristics, namely,

\[ f(t) = i(t) + e(t)\cos\varphi(t), \tag{7} \]

where

\[ \varphi(t) = 2\pi\int_{0}^{t} r(\tau)\,d\tau, \tag{8} \]

f(t) being the instantaneous frequency; i(t) the intonation of the note, which corresponds to slow variations of pitch; e(t) the extent, or amplitude, of the pitch variations; and r(t) the rate, or frequency, of the pitch variations. All of them are time-dependent magnitudes and depend on the musical context and on the singer's talent and training.
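Equations (5)-(8) translate directly into a discrete-time synthesis: integrate the rate to obtain the vibrato phase, form each partial's instantaneous frequency, integrate again for θ_i(t), and sum the cosines. A minimal sketch with constant intonation, extent, and rate; all parameter values are illustrative, the amplitudes a_i are arbitrary, and the residual r(t) of (5) is omitted:

```python
import numpy as np

fs = 16000                                   # sampling rate (Hz), assumed
t = np.arange(int(2.0 * fs)) / fs            # two seconds of signal

# Vibrato parameters (illustrative): intonation, extent, rate, kept constant.
intonation, extent, vib_rate = 220.0, 4.0, 5.5          # Hz, Hz, Hz
phi = 2.0 * np.pi * np.cumsum(np.full(t.shape, vib_rate)) / fs   # eq. (8)
f0 = intonation + extent * np.cos(phi)                           # eq. (7)

# Harmonic sum as in (5)-(6); harmonic i has frequency i*f0 and amplitude 1/i.
voice = np.zeros_like(t)
for i in range(1, 6):                                 # first five partials
    theta = 2.0 * np.pi * np.cumsum(i * f0) / fs      # eq. (6), discrete integral
    voice += (1.0 / i) * np.cos(theta)                # eq. (5), residual omitted
```

Because every harmonic inherits the fundamental's frequency modulation scaled by its harmonic number, higher partials sweep progressively wider frequency ranges, which is the property exploited throughout this section.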
In the case of intonation, its value depends on the sung note and, thus, on the context. Extent and rate, however, are mostly singer-dependent features, typical values being 1% of the intonation value and 5 Hz, respectively. Regarding the amplitude variation of the harmonics during vibrato, no well-established parameterization is accepted, and probably none exists, because this variation is different for each harmonic. It is therefore not surprising that amplitude variation has been the topic of interest of only a few papers.

Figure 10: AM-FM representation for the first harmonics. Anechoic tenor recording, vowel a.

The first work on this topic is [34], where the perceptual relevance of the instantaneous amplitude to spectral envelope discrimination is proven. In [], the relevance of this feature is demonstrated experimentally for the synthesis of singing voice; its physical cause is also tackled, and a representation of the instantaneous amplitude versus the instantaneous frequency of the harmonics is introduced for the first time. This representation is proposed as a means of obtaining local information about the VTR in limited frequency ranges. Something similar is done in [35], where the singing voice is synthesized using this local information of the VTR. We have also contributed in this direction, for instance in [3], where the instantaneous amplitude is decomposed into two parts, one representing the sound intensity variation and the other representing the amplitude variation determined by the local VTR, in an attempt to separate the contributions of the source and the vocal tract. Moreover, in [4], different time-frequency processing tools have been used and compared in order to identify the relationship between instantaneous amplitude and instantaneous frequency.
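A simple way to reproduce the kind of instantaneous-amplitude and instantaneous-frequency measurements discussed above, for a single isolated partial, is the analytic signal. The cited works use sinusoidal-modeling analyzers instead, so the Hilbert-transform route below is only a stand-in to illustrate what an AM-FM trajectory is:

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(4 * fs) / fs
# A single frequency-modulated partial (constant amplitude, vibrato-like FM).
f_true = 440.0 + 8.0 * np.cos(2 * np.pi * 5.5 * t)
phase = 2 * np.pi * np.cumsum(f_true) / fs
x = np.cos(phase)

z = hilbert(x)                                   # analytic signal
amp = np.abs(z)                                  # instantaneous amplitude
freq = np.diff(np.unwrap(np.angle(z))) * fs / (2 * np.pi)  # inst. frequency

# An AM-FM representation would plot amp[1:] against freq, time implicit.
```

For a real voice, each harmonic would first be isolated (e.g., by band-pass filtering or sinusoidal tracking) before this per-partial measurement is applied.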
In that work, the AM-FM representation is defined as the plot of instantaneous amplitude versus instantaneous frequency, with time as an implicit parameter. This representation is compared to the magnitude response of an all-pole filter, which is typically used for VTR modeling. Two main conclusions are drawn. The first is that these two representations can be compared only when anechoic recordings are considered; otherwise, the instantaneous magnitudes are affected by reverberation. The second is that, since the input is frequency modulated and frequency modulation is not a linear operation, the phase of the all-pole system affects the AM-FM representation, leading to a representation different from the vocal tract magnitude response. The relevance of this effect, however, depends on the formant bandwidths and on the vibrato characteristics, in this case the vibrato rate. It was also shown that in natural vibrato the phase effect of the VTR is not noticeable, because the vibrato rate is slow compared to the formant bandwidths. Figure 10 constitutes a good example of the kind of AM-FM representation we are talking about. In it, each

harmonic's instantaneous amplitude is represented versus its instantaneous frequency. For this case, only two vibrato cycles, in which the vocal intensity does not change significantly, have been considered. As the harmonic number increases, the frequency range swept by each harmonic widens. Comparing Figures 10 and 9b, the AM-FM representation of the former is very similar to the VTR of the latter. However, in the case of the AM-FM representation, no source-filter separation has been made, and thus both elements are merged in that representation. The results obtained by other authors [, 35] are quite similar regarding the instantaneous amplitude versus instantaneous frequency representation; however, in those works no comment is made about the recording conditions.

3.2. Simplified noninteractive source-tract model with vibrato

The main conclusion from the results presented above is that vibrato might be used to extract more information about the glottal source and the VTR in singing voice. Therefore, we propose here a simplified noninteractive source-filter model with vibrato, which serves as a signal model of vibrato production and explains the results provided by sinusoidal modeling. We first make some basic assumptions about what happens to the GSD and the VTR during vibrato. These assumptions are based on perceptual aspects of vibrato and on the AM-FM representation of natural singing voice. (1) The GSD characteristics remain constant during vibrato, and only the fundamental frequency of the voice changes. This assumption is justified by the fact that, perceptually, there is no phonation change during a single note. (2) The intensity of the sound is constant, at least over one or two vibrato cycles. (3) The VTR remains invariant during vibrato. This assumption relies on the fact that the vocalization does not change along the note. (4) The three vibrato characteristics remain constant.
This last assumption is not strictly true, but the time constants of these magnitudes are considerably larger than the fundamental period of the signal. Taking these four assumptions into account, the simplified noninteractive source-filter model with vibrato can be represented by the block diagram in Figure 11: an LF-model glottal source derivative, whose fundamental frequency F_o(t) carries the vibrato intonation, rate, and extent, drives the all-pole VTR H(z) = 1/(1 - sum_{k=1}^{p} a_k z^{-k}).

Figure 11: Noninteractive source-filter model with vibrato.

Based on this model, we simulate the production of vibrato. The GSD characteristics are the same as in Section 2.4, and the VTR has been implemented as an all-pole filter whose frequency response represents the Spanish vowel a. A frequency variation typical of vibrato has been applied to the GSD, with a 1 Hz intonation, an extent of 1% of the intonation value, and a rate of 5.5 Hz, all of them kept constant over the complete recording. We have applied to the resulting signal both inverse filtering (where the presence or absence of vibrato does not influence the algorithm) and sinusoidal modeling, where the instantaneous amplitude and instantaneous frequency of each harmonic need to be measured. The results obtained for this simulation are shown in Figures 12, 13, 14, and 15. In Figure 12, inverse filtering results are shown for a short-window analysis: when the fundamental frequency is low, the GSD and the VTR are well separated. In Figures 13a and 13b, the sinusoidal modeling results are shown: the frequency variations of the harmonics of the signal are clearly observed and, as a result, so are the amplitude variations. In Figure 14, the AM-FM representation of the partials is shown. Considering the AM-FM representation of every partial and comparing it to the VTR shown in Figure 12, it is possible to conclude that this method provides local information about the VTR.
However, as no source-filter decomposition has been performed, each AM-FM representation is shifted in amplitude depending on the GSD spectral features. This effect is a result of keeping the GSD parameters constant during vibrato. Comparing Figures 14 and 15, it can be noticed that, if the GSD magnitude spectrum is removed from the AM-FM representation of the harmonics, the resulting AM-FM representation provides only VTR information. The result of this operation is shown in Figure 16. For this simplified noninteractive source-filter model with vibrato, the instantaneous parameters of sinusoidal modeling provide complementary information about both the GSD and the VTR: when inverse filtering works, the GSD effect can be removed from the AM-FM representation provided by sinusoidal modeling, and only the information of the VTR remains.

3.3. Natural singing voice

The relationship between these two signal models, the noninteractive source-filter model and the sinusoidal model, has been established for a synthetic signal in which vibrato was included under the four assumptions stated at the beginning of this section. Now, the question is whether this relationship also holds in natural singing voice. Therefore, both kinds of signal analysis will now be applied to natural singing voice. In order to stay close to the simulation conditions, some precautions have been taken in the recording process. (1) The musical context has been selected in order to control intensity variations of the sound. Singers were asked to sing a word of three notes, where the first and the last simply provide musical support and the note in between is a long sustained note. This note is two semitones higher than the two accompanying ones.

Figure 12: Inverse filtering results (GSB algorithm). (a) GSD. (b) VTR.

Figure 13: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 14: AM-FM representation.

Figure 15: GSD short-term spectrum. Blackman-Harris window.

(2) The recordings have been made in a studio where reverberation is reduced, though not completely eliminated as in an anechoic room. In this situation, the AM-FM representation will show slight deviations from the actual VTR, but a qualitative study is still possible. In Figures 17, 18, 19, and 20, the results of these analyses are shown for a low-pitched baritone recording, F_o = 18 Hz, vowel a. In contrast to Figures 12, 13, 14, and 15, here there is no reference for the original GSD and VTR. Comparing Figures 12b, 13b and 17b, 18b, the instantaneous frequency variation is similar in simulation and in natural singing voice. However, the extent of vibrato in this baritone recording is lower than in the synthetic signal. In the case of the instantaneous amplitude, the natural singing voice results are not as regular as the synthetic ones, because of reverberation and the irregularities of natural voice. Regarding the intensity of the sound, there are no large variations in instantaneous amplitude, and so, over one or two vibrato cycles, it can be considered constant. In this situation, the AM-FM representation of the harmonics, shown in Figure 19, is very similar to the synthetic signal's AM-FM representation, though the already mentioned irregularities are present.

Figure 16: AM-FM representation without source, compared with the VTR.

In Figure 20, the GSD spectrum is shown for the signal of Figures 17a and 18a. It is very similar to the synthetic GSD spectrum; both are low-frequency periodic signals, although it shows slight variations in its harmonic amplitudes that will be explained later. Now, the GSD spectrum so obtained is used to remove the source information from the AM-FM representation and extract that of the VTR. The result of this operation is shown in Figure 21. As in the case of the synthetic signal, the compensated AM-FM representation is very close to the VTR obtained by inverse filtering, although the match is not as perfect as for the synthetic signal. From this comparison of the two signal models, it is possible to conclude that the simplified noninteractive source-filter model with vibrato can explain, in an approximate way, what happens in singing voice when vibrato is present. It is now possible to say that the GSD and the VTR do not vary greatly over a few vibrato cycles. In this way, the instantaneous amplitude and frequency obtained by sinusoidal modeling provide more, and complementary, information about the GSD and the VTR during vibrato than the known analysis methods.
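Removing the source from the AM-FM representation, as described above, amounts to subtracting the GSD log-magnitude, interpolated at each instantaneous frequency, from the instantaneous amplitude in dB. A sketch of that compensation step, with entirely invented data standing in for the measured trajectories and the GSD spectrum:

```python
import numpy as np

def compensate_am_fm(inst_freq, inst_amp_db, gsd_freqs, gsd_mag_db):
    """Subtract the interpolated GSD log-magnitude from the AM-FM amplitude."""
    gsd_db = np.interp(inst_freq, gsd_freqs, gsd_mag_db)
    return inst_amp_db - gsd_db

# Invented example: one harmonic sweeping 430-450 Hz under a tilted source.
f = np.linspace(430.0, 450.0, 50)                 # instantaneous frequency (Hz)
amp_db = 20.0 - 0.01 * (f - 440.0) ** 2           # measured amplitude (dB)
src_f = np.array([0.0, 250.0, 500.0, 1000.0])     # GSD spectrum samples (Hz)
src_db = np.array([0.0, -6.0, -12.0, -18.0])      # GSD magnitude (dB)

vtr_db = compensate_am_fm(f, amp_db, src_f, src_db)   # local VTR section (dB)
```

Each harmonic yields one such compensated segment, and together the segments sample the VTR magnitude response over the frequency ranges swept by the vibrato.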
It is important to note that the AM-FM representation by itself does not provide information about the GSD and the VTR separately; rather, in the vicinity of each harmonic, it represents a small section of the VTR. In order to know exactly what happens to the GSD and the VTR during vibrato, precautions have to be taken with the recording conditions. Even in nonoptimal conditions, however, the AM-FM representation of vibrato provides information complementary to that of inverse filtering methods.

4. DISCUSSION OF RESULTS AND CONCLUSIONS

In Section 2, inverse filtering techniques have been reviewed, and their dependence on the fundamental frequency has been shown. It seems obvious that, regardless of the particular technique, inverse filtering of speech fails as frequency increases. In natural singing voice, where pitch is inherently high, there are no references against which to verify whether this is the only cause of the failure. In Section 3, with the aim of answering this question, a novel noninteractive source-filter model has been introduced for singing voice modeling, including vibrato as an additional feature. It has been shown that this model can represent vibrato production in singing voice. In addition, this model has established a relationship between sinusoidal modeling and the source-filter model, through what the authors have coined the AM-FM representation. In this last section, the AM-FM representation is used again in singing voice analysis, in order to determine whether other effects appear in singing voice as the fundamental frequency increases. To this end, the same analysis as in Section 3 has been applied to the signal database of Section 2, corresponding to the three male singers' recordings. On the one hand, inverse filtering is applied and the GSD and the VTR are estimated. On the other hand, sinusoidal modeling is considered, and the two instantaneous magnitudes (frequency and amplitude of each harmonic) are measured.
Then, the AM-FM representation is obtained for each (frequency-modulated) harmonic, and the GSD is removed from this representation using the GSD obtained by inverse filtering. In Figure 22, the results obtained for several fundamental frequencies of the baritone singer are shown. As in Section 2, these results are representative of the other singers' recordings and of the other vowels. Regarding the AM-FM representation, Figure 22 shows that, as the fundamental frequency increases, the frequency range swept by one harmonic widens, because of the relationship between extent and intonation. Also, as the fundamental frequency increases, the AM-FM representations of two consecutive harmonics become more separated, a direct consequence of their harmonic relationship. Beyond these obvious effects, there is no other evident consequence of the fundamental frequency increase in this analysis, and thus the simplified noninteractive source-filter model with vibrato can model high-pitched singing voice with vibrato from the signal point of view. The main limitation of the plain AM-FM representation is that no source-filter separation is possible unless it is combined with another method; thus, from it alone, nothing can be said about the exact shapes of the GSD and the VTR. However, the main advantage of this representation is that it has no fundamental frequency limit, and so it can be applied to every singing voice sample with vibrato. This conclusion brings further evidence that the noninteractive source-filter model remains valid for singing voice. We can summarize the main contributions and conclusions of this work as follows.

Figure 17: Inverse filtering results (GSB inverse filtering algorithm). (a) GSD. (b) VTR.

Figure 18: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 19: AM-FM representation.

Figure 21: AM-FM representation without source, together with the VTR given by inverse filtering.

Figure 20: GSD short-term spectrum and spectral peaks (Blackman-Harris window).

(i) Several representative inverse filtering techniques have been critically compared when applied to speech. It has been shown that all of them fail as the fundamental frequency increases, as is the case in singing voice.

(ii) A novel noninteractive source-filter model has been proposed for singing voice, which includes vibrato as a possible feature.

(iii) The existence of vibrato and the above-mentioned model have made it possible to relate the source-filter model (i.e., inverse filtering techniques) and the simple sinusoidal

Vibrato in Singing Voice

Figure 22: AM-FM representation after removing the source, with the VTR given by inverse filtering. (a) F₀ = 11 Hz, vowel a; (b) F₀ = 156 Hz, vowel a; (c) F₀ = 7 Hz, vowel a.

model. In other words, although both are signal models for singing voice, the first one is related to voice production while the second one is a general signal model; thanks to vibrato, the two can be linked.

(iv) Even though sinusoidal modeling does not make it possible to obtain separate information about the sound source and the VTR, the AM-FM representation gives complementary information, particularly in the high frequency ranges where inverse filtering does not work.

ACKNOWLEDGMENTS

The Gobierno de Navarra and the Universidad Pública de Navarra are gratefully acknowledged for financial support. The authors would also like to acknowledge the support of Xavier Rodet and Axel Roebel (IRCAM, Paris), the material and medical support of Ana Martínez Arellano, and the collaboration of the student Daniel Erro, who implemented some of the algorithms.

REFERENCES

[1] N. Henrich, Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception, Ph.D. thesis, Paris 6 University, Paris, France, 2001.
[2] B. H. Story, An overview of the physiology, physics and modeling of the sound source for vowels, Acoustical Science and Technology, vol. 23, no. 4, 2002.
[3] B. Guerin, M. Mrayati, and R. Carre, A voice source taking account of coupling with the supraglottal cavities, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 76), vol. 1, Philadelphia, Pa, USA, April 1976.
[4] T. V. Ananthapadmanabha and G.
Fant, Calculation of the true glottal flow and its components, Speech Communication, vol. 1, no. 3-4, 1982.
[5] M. Berouti, D. G. Childers, and A. Paige, Glottal area versus glottal volume-velocity, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 77), Cambridge, Mass, USA, May 1977.
[6] G. Fant, Acoustic Theory of Speech Production, Mouton, The Hague, The Netherlands, 1960.
[7] D. H. Klatt and L. C. Klatt, Analysis, synthesis, and perception of voice quality variations among female and male talkers, Journal of the Acoustical Society of America, vol. 87, no. 2, 1990.
[8] G. Fant, J. Liljencrants, and Q. Lin, A four-parameter model of glottal flow, Speech Transmission Laboratory Quarterly Progress and Status Report, vol. 85, pp. 1-13, 1985.
[9] H. Fujisaki and M. Ljungqvist, Proposal and evaluation of models for the glottal source waveform, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 86), vol. 11, Tokyo, Japan, April 1986.
[10] A. K. Krishnamurthy and D. G. Childers, Two-channel speech analysis, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, 1986.
[11] P. Alku and E. Vilkman, Estimation of the glottal pulseform based on discrete all-pole modeling, in Proc. 2nd International Conf. on Spoken Language Processing (ICSLP 94), Yokohama, Japan, September 1994.
[12] H.-L. Lu and J. O. Smith, Joint estimation of vocal tract filter and glottal source waveform via convex optimization, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 99), pp. 79-92, New Paltz, NY, USA, October 1999.
[13] I. Arroabarren and A. Carlosena, Glottal spectrum based inverse filtering, in Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH 2003), Geneva, Switzerland, September 2003.

[14] E. L. Riegelsberger and A. K. Krishnamurthy, Glottal source estimation: methods of applying the LF-model to inverse filtering, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 93), Minneapolis, Minn, USA, April 1993.
[15] B. Doval, C. d'Alessandro, and B. Diard, Spectral methods for voice source parameters estimation, in Proc. 5th European Conference on Speech Communication and Technology (EUROSPEECH 97), vol. 1, Rhodes, Greece, September 1997.
[16] I. Arroabarren and A. Carlosena, Glottal source parameterization: a comparative study, in Proc. ISCA Tutorial and Research Workshop on Voice Quality: Functions, Analysis and Synthesis, Geneva, Switzerland, August 2003.
[17] H.-L. Lu, Toward a high-quality singing synthesizer with vocal texture control, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 2002.
[18] N. Henrich, B. Doval, and C. d'Alessandro, Glottal open quotient estimation using linear prediction, in Proc. International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Firenze, Italy, September 1999.
[19] N. Henrich, B. Doval, C. d'Alessandro, and M. Castellengo, Open quotient measurements on EGG, speech and singing signals, in Proc. 4th International Workshop on Advances in Quantitative Laryngoscopy, Voice and Speech Research, Jena, Germany, April 2000.
[20] N. Henrich, C. d'Alessandro, and B. Doval, Spectral correlates of voice open quotient and glottal flow asymmetry: theory, limits and experimental data, in Proc. 7th European Conference on Speech Communication and Technology (EUROSPEECH 2001), Aalborg, Denmark, September 2001.
[21] H.-L. Lu and J. O. Smith, Glottal source modeling for singing voice synthesis, in Proc. International Computer Music Conference (ICMC), Berlin, Germany, August 2000.
[22] R. Maher and J. Beauchamp, An investigation of vocal vibrato for synthesis, Applied Acoustics, vol. 30, no. 2-3, 1990.
[23] I. Arroabarren, M. Zivanovic, and A.
Carlosena, Analysis and synthesis of vibrato in lyric singers, in Proc. 11th European Signal Processing Conference (EUSIPCO 2002), Toulouse, France, September 2002.
[24] I. Arroabarren, M. Zivanovic, X. Rodet, and A. Carlosena, Instantaneous frequency and amplitude of vibrato in singing voice, in Proc. IEEE 28th Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 03), Hong Kong, China, April 2003.
[25] R. J. McAulay and T. F. Quatieri, Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, 1986.
[26] X. Serra, Musical sound modeling with sinusoids plus noise, in Musical Signal Processing, C. Roads, S. Pope, A. Picialli, and G. De Poli, Eds., Swets & Zeitlinger, Lisse, The Netherlands, 1997.
[27] C. Ma, Y. Kamp, and L. F. Willems, A Frobenius norm approach to glottal closure detection from the speech signal, IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, 1994.
[28] J. Makhoul, Linear prediction: a tutorial review, Proceedings of the IEEE, vol. 63, no. 4, 1975.
[29] A. El-Jaroudi and J. Makhoul, Discrete all-pole modeling, IEEE Trans. Signal Processing, vol. 39, no. 2, 1991.
[30] B. Doval and C. d'Alessandro, Spectral correlates of glottal waveform models: an analytic study, in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 97), Munich, Germany, April 1997.
[31] D. Y. Wong, J. D. Markel, and A. H. Gray, Least squares glottal inverse filtering from the acoustic speech waveform, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 4, 1979.
[32] E. Prame, Vibrato extent and intonation in professional western lyric singing, Journal of the Acoustical Society of America, vol. 102, no. 1, 1997.
[33] I. Arroabarren, M. Zivanovic, J. Bretos, A. Ezcurra, and A. Carlosena, Measurement of vibrato in lyric singers, IEEE Trans. Instrumentation and Measurement, vol. 51, no. 4, 2002.
[34] S. McAdams and X. Rodet, The role of FM-induced AM in dynamic spectral profile analysis, in Basic Issues in Hearing, H. Duifhuis, J.
Horst, and H. Wit, Eds., Academic Press, London, UK, 1988.
[35] M. Mellody and G. H. Wakefield, Signal analysis of the singing voice: low-order representations of singer identity, in Proc. International Computer Music Conference (ICMC), Berlin, Germany, August 2000.

Ixone Arroabarren was born in Arizkun, Navarra, Spain, on December 11. She received her Eng. degree in telecommunications in 1999 from the Public University of Navarra, Pamplona, Spain, where she is currently pursuing her Ph.D. degree in the area of signal processing techniques as they apply to musical signals. She has collaborated in industrial projects for the vending machine industry.

Alfonso Carlosena was born in Navarra, Spain, in 196. He received his M.Sc. degree with honors and his Ph.D. in physics in 1985 and 1989, respectively, both from the University of Zaragoza, Spain. From 1986 to 199 he was an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of Zaragoza, Spain. Since October 199 he has been an Associate Professor with the Public University of Navarra, where he has also served as Head of the Technology Transfer Office. In March he was promoted to Full Professor at the same university. He has also been a Visiting Scholar at the Swiss Federal Institute of Technology, Zurich, and at New Mexico State University, Las Cruces. His current research interests are in the areas of analog circuits and signal processing, digital signal processing, and instrumentation, where he has published over sixty papers in international journals and a similar number of conference presentations. He is currently leading several industrial projects for local firms.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

A Hybrid Resynthesis Model for Hammer-String Interaction of Piano Tones

Julien Bensa
Laboratoire de Mécanique et d'Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 134 Marseille Cedex, France
bensa@lma.cnrs-mrs.fr

Kristoffer Jensen
Datalogisk Institut, Københavns Universitet, Universitetsparken 1, 2100 København, Denmark
krist@diku.dk

Richard Kronland-Martinet
Laboratoire de Mécanique et d'Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 134 Marseille Cedex, France
kronland@lma.cnrs-mrs.fr

Received 7 July 2003; Revised 9 December 2003

This paper presents a source/resonator model of hammer-string interaction that produces realistic piano sound. The source is generated using a subtractive signal model. Digital waveguides are used to simulate the propagation of waves in the resonator. This hybrid model allows resynthesis of the vibration measured on an experimental setup. In particular, the nonlinear behavior of the hammer-string interaction is taken into account in the source model and is well reproduced. The behavior of the model parameters (the resonant part and the excitation part) is studied with respect to the velocities and the notes played. The model exhibits physically and perceptually related parameters, allowing easy control of the sound produced. This research is an essential step in the design of a complete piano model.

Keywords and phrases: piano, hammer-string interaction, source-resonator model, analysis/synthesis.

1. INTRODUCTION

This paper is a contribution to the design of a complete piano-synthesis model. (Sound examples obtained using the method described in this paper can be found at kronland/jasp/sounds.html.) It is the result of several attempts [1, ], eventually leading to a stable and robust methodology. We address here the modeling, for synthesis, of a key aspect of piano tones: the hammer-string interaction.
This model will ultimately need to be linked to a soundboard model to accurately simulate piano sounds. The design of a synthesis model is strongly linked to the specificity of the sounds to be produced and to the expected use of the model. This work was done in the framework of the analysis-synthesis of musical sounds; we seek both to reconstruct a given piano sound and to use the synthesis model in a musical context. The perfect reconstruction of given sounds is a strong constraint: the synthesis model must be designed so that its parameters can be extracted from the analysis of natural sounds. In addition, playing the synthesis model requires a good relationship between the physics of the instrument, the synthesis parameters, and the generated sounds. This relationship is crucial to a good interaction between the digital instrument and the player, and it constitutes one of the most important aspects our piano model has to address. Music based on so-called sound objects, like electro-acoustic music or musique concrète, relies on synthesis models that allow subtle and natural transformations of the sounds. The notion of natural transformation of sounds here means transforming them so that the result corresponds to a physical modification of the instrument. As a consequence, such sound transformations call for the model to include physical descriptions of the instrument. Nevertheless, the physics of musical instruments is sometimes too complicated to be exhaustively taken into account, or not modeled well enough to lead to satisfactory sounds. This is the case for the piano, in which hundreds of mechanical components are connected [3], and for which

the hammer-string interaction still poses physical modeling problems. To account for the necessary simplifications made in the physical description of piano sounds, we have used hybrid models, obtained by combining physical and signal synthesis models [4, 5]. The physical model simulates the physical behavior of the instrument, whereas the signal model seeks to recreate the perceptual effect produced by the instrument. The hybrid model provides a perceptually plausible resynthesis of a sound as well as intimate manipulations in a physically and perceptually relevant way. Here, we have used a physical model to simulate the linear string vibration, and a physically informed signal model to simulate the nonlinear interaction between the string and the hammer. An important problem linked to hybrid models is the coupling of the physical and the signal models. To use a source-resonator model, the source and the resonator must be uncoupled. Yet this is not the case for the piano, since the hammer interacts with the strings during 2 to 5 milliseconds [6, 7]. A significant part of the piano sound characteristics is due to this interaction. Even though this observation is true from a physical point of view, this short interaction period is not in itself of great importance from a perceptual point of view. The attack consists of two parts, due to two vibration paths [8]: one percussive, a result of the impact of the key on the frame, and another that starts when the hammer strikes the strings. Schaeffer [9] showed that cutting the first milliseconds of a piano sound (for a bass note, for which the impact of the key on the frame is less perceptible) does not alter the perception of the sound. We have informally carried out such an experiment by listening to various piano sounds cleared of their attack.
We found that, from a perceptual point of view, when the noise due to the impact of the key on the frame is not too great (compared to the vibrating energy provided by the string), the hammer-string interaction is not audible in itself. Nevertheless, this interaction undoubtedly plays an important role as an initial condition for the string motion. This is a substantial point justifying the dissociation of the string model and the source model in the design of our synthesis model. Thus, the resulting model consists of what is commonly called a source-resonator system (as illustrated in Figure 1). Note that the model still makes sense for high-frequency notes, for which the impact noise is of importance. Actually, the hammer-string interaction lasts only a couple of milliseconds, while the impact sound is an additional sound, which can be simulated using predesigned samples. Since waves are still running in the resonator after the release of the key, repeated keystrokes are naturally taken into account by the model. Laroche and Meillier [1] used such a source-resonator technique for the synthesis of piano sound. They showed that realistic piano tones can be produced using IIR filters to model the resonator and common excitation signals for several notes. Their simple resonator model, however, yielded excitation signals that were too long (from 4 to 5 seconds) to accurately reproduce the piano sound. Moreover, that model took into account neither the coupling between strings nor the dependence of the excitation on the velocity and octave variations.

Figure 1: Hybrid model of piano sound synthesis: a control signal drives the source (nonlinear signal model), which feeds an excitation to the resonator (physical model), producing the sound.

Smith proposed efficient resonators [11] using the so-called digital waveguide. This approach simulates the physics of the propagating waves in the string. Moreover, the waveguide parameters are naturally correlated with the physical parameters, making for easy control.
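The core of such a resonator can be sketched in a few lines. The fragment below is a deliberately crude stand-in, not the paper's model: the loop filter F is lumped into a single constant gain g, and the excitation is a toy noise burst, just to show the single-loop recursion a waveguide resonator performs:

```python
import numpy as np

fs = 16000
f0 = 220.0                       # illustrative pitch, not from the paper
D = round(fs/f0)                 # loop delay in samples
g = 0.995                        # lumped loop gain, crude stand-in for F

rng = np.random.default_rng(0)
e = np.zeros(fs)                 # one second of signal
e[:D] = rng.standard_normal(D)   # toy excitation burst

# Single-loop waveguide resonator: s[k] = e[k] + g * s[k - D]
s = np.zeros_like(e)
for k in range(len(e)):
    s[k] = e[k] + (g*s[k - D] if k >= D else 0.0)
```

Each pass through the loop multiplies the circulating wave by g, so the output is a quasi-periodic tone at about f0 whose partials decay exponentially; the paper's model replaces g by the frequency-dependent filter F to shape per-partial decay and inharmonicity.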
Borin and Bank [12, 13] used this approach to design a synthesis model of piano tones based on physical considerations, coupling digital waveguides with a force generator simulating the hammer impact. The commuted synthesis concept [14, 15, 16] uses the linearity of the digital waveguide to commute and combine elements. For the piano, a hybrid model was then proposed, combining a digital waveguide, a phenomenological hammer model, and a time-varying filter that simulates the soundboard behavior. Our model is an extension of these previous works, to which we added the strong constraint of resynthesis capability. Here, the resonator was modeled using a physically related model, the digital waveguide, and the source, intended to generate the initial condition for the string motion, was modeled using a signal-based nonlinear model. The advantages of such a hybrid model are numerous: (i) it is simple enough that the parameters can be accurately estimated from the analysis of real sounds; (ii) it takes into account the most relevant physical characteristics of the piano strings (including coupling between strings) and it permits the playing to be controlled (the velocity of the hammer); (iii) it simulates the perceptual effect due to the nonlinear behavior of the hammer-string interaction, and it allows sound transformations with both physical and perceptual approaches. Even though the model we propose is not computationally costly, we address here its design and its calibration rather than its real-time implementation. Hence, the calculations and reasoning are done in the frequency domain. The time-domain implementation should give rise to a companion article.

2. THE RESONATOR MODEL

Several physical models of transverse wave propagation on a struck string have been published in the literature [17, 18, 19, 20]. The string is generally modeled using a one-dimensional wave equation.
The specific features of the piano string that are important in wave propagation (dispersion due to the stiffness of the string and frequency-dependent losses) are further incorporated through several perturbation terms. To account for the hammer-string interaction, this equation is then coupled to a nonlinear force term, leading to a system of

equations for which an analytical solution cannot be exhibited. Since the string vibration is transmitted to the radiating soundboard only at the bridge level, it is not useful to numerically calculate the entire spatial motion of the string. The digital waveguide technique [11] provides an efficient way of simulating the vibration of the string at the bridge level when the string is struck at a given location by the hammer. Moreover, the parameters of such a model can be estimated from the analysis of real sounds [1].

2.1. The physics of vibrating strings

We present here the main features of the physical modeling of piano strings. Consider the propagation of transverse waves in a stiff damped string governed by the motion equation [1]

∂²y/∂t² − c² ∂²y/∂x² + κ² ∂⁴y/∂x⁴ + 2b₁ ∂y/∂t − 2b₂ ∂³y/(∂x²∂t) = P(x, t),  (1)

where y is the transverse displacement, c the wave speed, κ the stiffness coefficient, and b₁ and b₂ the loss parameters. Frequency-dependent loss is introduced via the mixed time-space derivative term (see [1, ] for more details). We apply fixed boundary conditions

y|_{x=0} = y|_{x=L} = 0,  ∂²y/∂x²|_{x=0} = ∂²y/∂x²|_{x=L} = 0,  (2)

where L is the length of the string. After the hammer-string contact, the force P is equal to zero and this system can be solved. An analytical solution can be expressed as a sum of exponentially damped sinusoids:

y(x, t) = Σ_{n=1}^{∞} a_n(x) e^{−α_n t} e^{iω_n t},  (3)

where a_n is the amplitude, α_n the damping coefficient, and ω_n the frequency of the nth partial. Due to the stiffness, the waves are dispersed and the partial frequencies, which are not perfectly harmonic, are given by [3]

ω_n = n ω₀ √(1 + Bn²),  (4)

where ω₀ is the fundamental radial frequency of the string without stiffness and B is the inharmonicity coefficient [3]. The losses are frequency dependent and are expressed by [1]

α_n = b₁ + b₂ (π²/(2BL²)) (−1 + √(1 + 4B(ω_n/ω₀)²)).  (5)

The spectral content of the piano sound, and of most musical instruments, is modified with respect to the dynamics.
For the piano, this nonlinear behavior consists of an increase in the brightness of the sound, and it is linked mainly to the hammer-string contact (the nonlinear generation of longitudinal waves also contributes to the increase in brightness; we do not take this phenomenon into account since we are interested only in transverse waves). The stiffness of the hammer felt increases with the impact velocity. In the next paragraph, we show how the waveguide model parameters are related to the amplitudes, damping coefficients, and frequencies of each partial.

2.2. Digital waveguide modeling

2.2.1. The single string case: elementary digital waveguide

To model wave propagation in a piano string, we use a digital waveguide model [11]. In the single string case, the elementary digital waveguide model (named G) we used consists of a single loop system (Figure 2) including

(i) a delay line (a pure delay filter named D) simulating the time the waves take to travel back and forth in the medium;
(ii) a filter (named F) taking into account the dissipation and dispersion phenomena, together with the boundary conditions; the modulus of F is related to the damping of the partials and its phase to the inharmonicity of the string;
(iii) an input E corresponding to the frequency-dependent energy transferred to the string by the hammer;
(iv) an output S representing the vibration signal measured at an extremity of the string (at the bridge level).

Figure 2: Elementary digital waveguide (named G), with input E(ω), delay D(ω), loop filter F(ω), and output S(ω).

The output of the digital waveguide driven by a delta function can be expanded as a sum of exponentially damped sinusoids. The output thus coincides with the solution of the motion equation of transverse waves in a stiff damped string for a source term given by a delta function force.
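The damped-sinusoid expansion, together with (4) and (5), can be made concrete in a few lines. All parameter values in this Python sketch are invented for illustration; they are not the values measured in the paper:

```python
import numpy as np

# Illustrative string parameters only -- not the paper's measurements
f0, B = 110.0, 1.5e-4          # fundamental (Hz), inharmonicity coefficient
b1, b2 = 0.6, 6e-5             # loss parameters of eq. (1)
L, fs = 1.0, 16000             # string length (m), sampling rate (Hz)

n = np.arange(1, 26)                                 # first 25 partials
omega0 = 2*np.pi*f0                                  # fundamental radial frequency
omega_n = n*omega0*np.sqrt(1 + B*n**2)               # eq. (4): stretched partials
xi_n = -1 + np.sqrt(1 + 4*B*(omega_n/omega0)**2)     # eq. (9) at omega_n
alpha_n = b1 + b2*(np.pi**2/(2*B*L**2))*xi_n         # eq. (5): decay rates

# eq. (3): bridge-level response as a sum of damped sinusoids
t = np.arange(0, 2.0, 1/fs)
a_n = 1.0/n                                          # arbitrary modal amplitudes
y = (a_n[:, None]*np.exp(-np.outer(alpha_n, t))
     *np.cos(np.outer(omega_n, t))).sum(axis=0)
```

With (4) substituted into (5), the decay rate reduces to b₁ + b₂(nπ/L)², so higher partials die out faster, and the √(1+Bn²) factor stretches the partials progressively sharp, exactly the two features the waveguide filter F must reproduce.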
As shown in [1, 4], the modulus and phase of F are related to the damping coefficients and the frequencies of the partials by the expressions

|F(ω_n)| = e^{−α_n D},  arg(F(ω_n)) = ω_n D − 2nπ,  (6)

with ω_n and α_n given by (4) and (5). After some calculations (see [1]), we obtain the expressions of the modulus and the phase of the loop filter in terms of the physical parameters:

|F(ω)| ≈ exp(−D [b₁ + b₂ (π²/(2BL²)) ξ]),  (7)

arg(F(ω)) ≈ Dω − Dω₀ √(ξ/(2B)),  (8)

with

ξ = −1 + √(1 + 4Bω²/ω₀²)  (9)

in terms of the inharmonicity coefficient B [3].

2.2.2. The multiple strings case: coupled digital waveguides

In the middle and the treble range of the piano, there are two or three strings for each note in order to increase the efficiency of the energy transmission towards the bridge. The vibration produced by this coupled system is not the superposition of the vibrations produced by each string; it is the result of a complex coupling between the modes of vibration of these strings [5]. This coupling leads to phenomena like beats and double decays in the amplitudes of the partials, which constitute one of the most important features of the piano sound. Beats are used by professionals to precisely tune the doublets or triplets of strings. To resynthesize the vibration of several strings at the bridge level, we use coupled digital waveguides. Smith [14] proposed a coupling model with two elementary waveguides. He assumed that the two strings were coupled to the same termination and that the losses were lumped into the bridge impedance. This technique leads to a simple model requiring only one loss filter, but the decay times and the coupling of the modes are not independent. Välimäki et al. [6] proposed another approach that couples two digital waveguides through real gain amplifiers. In that case, the coupling is the same for each partial, and the time behavior of the partials is similar. For synthesis purposes, Bank [7] showed that a perceptually plausible beating sound can be obtained by adding only a few resonators in parallel. We have designed two models, with two and three coupled digital waveguides, which are an extension of Välimäki et al.'s approach. They separate the time behavior of the components by using complex-valued, frequency-dependent linear filters to couple the waveguides. The three-coupled digital waveguide is shown in Figure 3.
The two models accurately simulate the energy transfer between the strings (see Section 2.4.3). A related method [8] (with an example of piano coupling) has recently become available in the context of digital waveguide networks. Each string is modeled using an elementary digital waveguide (named G₁, G₂, G₃; the loop filters and delays are named F₁, F₂, F₃ and D₁, D₂, D₃, respectively). The coupled model is then obtained by connecting the output of each elementary waveguide to the input of the others through coupling filters. The coupling filters simulate the wave propagation along the bridge and are thus correlated with the distance between the strings. In the case of a doublet of strings, the two coupling filters (named C) are identical. In the case of a triplet of strings, the coupling filters of adjacent strings (named C_a) are equal but differ from the coupling filters of the extreme strings (named C_e). The excitation signal is assumed to be the same for each elementary waveguide, since we suppose the hammer strikes the strings in a similar way.

Figure 3: The three-coupled digital waveguide (bottom) and the corresponding physical system at the bridge level (top).

To ensure the stability of the different models, specific relations have to be respected. First, the modulus of the loop filters must be less than 1. Second, for coupled digital waveguides, the following relations must be verified:

|C² G₁ G₂| < 1  (10)

in the case of two-coupled waveguides, and

|G₁G₂C_a² + G₁G₃C_e² + G₂G₃C_a² + 2G₁G₂G₃C_a²C_e| < 1  (11)

in the case of three-coupled waveguides. Assuming that those relations are verified, the models are stable. This work takes place in the general analysis-synthesis framework, meaning that the objective is not only to simulate sounds but also to reconstruct a given sound. The model must therefore be calibrated carefully.
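Both stability requirements can be checked numerically before running a coupled model. In the Python sketch below, the loop gains, delays, damping values, and coupling responses are all invented first-order choices, not filters estimated from recordings; it first forms single-waveguide loop-filter magnitudes from (ω_n, α_n) pairs via (6), then evaluates the left-hand side of (11) on a frequency grid:

```python
import numpy as np

fs, f0 = 16000, 110.0
n = np.arange(1, 11)
omega_n = 2*np.pi*f0*n*np.sqrt(1 + 1.5e-4*n**2)  # made-up partial frequencies
alpha_n = 0.6 + 0.05*n**2                        # made-up damping coefficients
D = round(fs/f0)                                 # loop delay in samples
F_mod = np.exp(-alpha_n*D/fs)                    # |F(omega_n)|, eq. (6): must be < 1

# Three-string condition (11) with idealized loop/coupling responses
w = np.linspace(1e-3, np.pi, 512)                # normalized frequency grid
loop = lambda g, d: g*np.exp(-1j*w*d)            # toy elementary-waveguide response
G1, G2, G3 = loop(0.995, D), loop(0.993, D), loop(0.994, D)
Ca = 0.02*np.exp(-1j*w*5)                        # adjacent-string coupling filter
Ce = 0.01*np.exp(-1j*w*10)                       # extreme-string coupling filter
lhs = np.abs(G1*G2*Ca**2 + G1*G3*Ce**2 + G2*G3*Ca**2
             + 2*G1*G2*G3*Ca**2*Ce)
stable = bool(np.all(F_mod < 1) and np.all(lhs < 1))
```

With weak coupling gains (a few percent, as here) the combined terms of (11) stay far below 1, which is consistent with the text's remark that the coupled models remain stable once the individual loop-filter moduli are below 1.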
The next section presents the inverse problem, which allows the waveguide parameters to be calculated from experimental data. We then describe the experiment and the measurements for one-, two-, and three-coupled strings. We then show the validity and the accuracy of the analysis-synthesis process by comparing synthetic and original signals. Finally, the behavior of the signals of the real piano is verified.

2.3. The inverse problem

We address here the estimation of the parameters of each elementary waveguide, as well as of the coupling filters, from the analysis of a single signal (measured at the bridge level). For this, we assume that in the case of three coupled strings the signal is composed of a sum of three exponentially decaying sinusoids for each partial (and, respectively, one and two exponentially decaying sinusoids in the case of one and two strings). The estimation method is a generalization of the one described in [9] for one and two strings. It can be summarized as follows: start by isolating each triplet of the measured signal through bandpass filtering (a truncated Gaussian window); then use the Hilbert transform to get the corresponding analytic signal and obtain the average frequency of the component by differentiating the phase of this analytic signal; finally, extract from each triplet the three amplitudes, damping coefficients, and frequencies of each partial by a parametric method (the Steiglitz-McBride method [3]). The second part of the process is described in detail in the appendix. In brief, we identify the Fourier transform of the sum of the three exponentially damped sinusoids (the measured signal) with the transfer function of the digital waveguide (the model output). This identification leads to a linear system that admits an analytical solution in the case of one or two strings. In the case of three coupled strings, the solution can be found only numerically. The process gives an estimation of the modulus and of the phase of each filter near the resonance peaks as a function of the amplitudes, damping coefficients, and frequencies. Once the resonator model is known, we extract the excitation signal by a deconvolution process with respect to the waveguide transfer function.
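For the single-string case, the Hilbert-transform step of this procedure can be sketched directly; in the Python fragment below, a synthetic damped sinusoid with invented parameters stands in for one isolated (bandpass-filtered) partial. The two- and three-string cases additionally require the parametric Steiglitz-McBride step, which is omitted here:

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(0, 1.0, 1/fs)
alpha_true, f_true = 3.0, 220.0          # invented partial parameters
x = np.exp(-alpha_true*t)*np.cos(2*np.pi*f_true*t)

z = hilbert(x)                           # analytic signal of the partial
sl = slice(fs//20, -fs//20)              # discard transform edge effects
# average frequency from the derivative of the analytic phase
f_est = np.gradient(np.unwrap(np.angle(z)))[sl].mean()*fs/(2*np.pi)
# damping coefficient from a straight-line fit to the log-envelope
alpha_est = -np.polyfit(t[sl], np.log(np.abs(z[sl])), 1)[0]
```

Both estimates land very close to the true values because the partial is well separated in frequency; in practice the accuracy is limited by the bandpass isolation and by the signal-to-noise ratio of the recording.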
Since the transfer function has been identified near the resonant peaks, the excitation is also estimated at discrete frequency values corresponding to the partial frequencies. This excitation corresponds to the signal that has to be injected into the resonator to resynthesize the actual sound.

2.4. Analysis of experimental data and validation of the resonator model

We first describe here an experimental setup allowing the measurement of the vibration of one, two, or three strings struck by a hammer at different velocities. Then we show how to estimate the resonator parameters from those measurements, and finally, we compare original and synthesized signals. This experimental setup is an essential step in validating the estimation method. Indeed, estimating the parameters of one-, two-, or three-coupled digital waveguides from only one signal is not a trivial process. Moreover, in a real piano, many physical phenomena are not taken into account in the model presented in the previous section. It is therefore necessary to verify the validity of the model in a laboratory experiment before applying the method to the piano case.

2.4.1. Experimental setup

On top of a massive concrete support, we attached a piece of a bridge taken from a real piano. On the other extremity of the structure, we attached an agraffe on a hardwood support. The strings are tightened between the bridge and the agraffe and tuned manually. The strings are clearly not totally uncoupled from their support. Nevertheless, this experiment has been used to record signals of struck strings in order to validate the synthesis models, and it was entirely satisfactory for this purpose. One, two, or three strings are struck with a hammer linked to an electronically piloted key.

Figure 4: Amplitude of filter F as a function of frequency and of hammer velocity.
By imposing different voltages on the system, one can control the hammer velocity in a reproducible way. The precise velocity is measured immediately after escapement by an optical sensor (MTI, probe module 15H) pointing at the side of the hammer head. The vibration at the bridge is measured by an accelerometer (B&K 4374). The signals are recorded directly on digital audio tape. The acceleration signals correspond to hammer velocities between 0.8 m/s and 5.7 m/s.

2.4.2. Filter estimation

From the signals collected on the experimental setup, a set of data was extracted. For each hammer velocity, the waveguide filters and the corresponding excitation signals were estimated using the techniques described above. The filters were studied in the frequency domain; it is not the purpose of this paper to describe a time-domain method or to fit the transfer function using IIR or FIR filters. Figure 4 shows the modulus of the filter response F for the first twenty-five partials in the case of tones produced by a single string. Here the hammer velocity varies from 0.7 m/s to 4 m/s. One notices that the modulus of the waveguide filter is similar for all hammer velocities. The resonator represents the strings, which do not change during the experiment. If the estimated resonator remains the same for different hammer velocities, all the nonlinear behavior due to the dynamics has been captured by the excitation part; the resonator and the source are then well separated. This result validates our approach based on a source-resonator separation. For high-frequency partials, however, the filter modulus decreases slightly as a function of the hammer velocity. This nonlinear behavior is not directly linked to the

hammer-string contact. It is mainly due to nonlinear phenomena involved in the wave propagation. At large amplitudes of motion, the tension modulation introduces greater internal losses (this effect is even more pronounced in plucked strings than in struck strings). The filter modulus slowly decreases (as a function of frequency) from a value close to 1. Since the higher partials are more damped than the lower ones, the amplitude of the filter decreases as the frequency increases. The value of the filter modulus (close to 1) suggests that the losses are weak. This is true for the piano string and is even more obvious on this experimental setup, since the lack of a soundboard limits the radiated acoustic field. More losses are expected in a real piano.

Figure 5: Amplitude of filter F2 (three-coupled waveguide model) as a function of the frequency and of the hammer velocity.

Figure 6: Phase of filter F2 as a function of the frequency and of the hammer velocity.

We now consider the case of multiple strings. From a physical point of view, the behavior of the filters F1, F2, and F3 (which characterize the intrinsic losses) of the coupled digital waveguides should be similar to the behavior of the filter F of a single string, since the strings are supposed to be identical. This is verified except for the high-frequency partials, as shown in Figure 5 for filter F2 of the three-coupled waveguide model. Some artifacts pollute the plot at high frequencies. The poor signal-to-noise ratio at high frequencies and low velocities introduces error terms in the analysis process, leading to errors in the amplitudes of the loop filters (for instance, a very small value of the modulus of one loop filter may be compensated by a value greater than one for another loop filter; the stability of the coupled waveguides is then preserved).
Nevertheless, this does not alter the synthetic sound, since the corresponding (high-frequency) partials are weak and of short duration. The phase is also of great importance, since it is related to the group delay of the signal and is consequently directly linked to the frequencies of the partials. The phase is a nonlinear function of the frequency (see (8)). It is constant with respect to the hammer velocity (see Figure 6), since the frequencies of the partials are always the same (linearity of the wave propagation).

Figure 7: Modulus of filter Ca as a function of the frequency and of the hammer velocity.

The coupling filters simulate the energy transfer between the strings and are frequency dependent. Figure 7 shows one of these coupling filters for different values of the hammer velocity. The amplitude is constant with respect to the hammer velocity (up to signal-to-noise ratio limitations at high frequency and low velocity), showing that the coupling is independent of the amplitude of the vibration. The coupling rises with the frequency. The peaks at frequencies 7 Hz and 13 Hz correspond to a maximum of the coupling.

2.4.3. Accuracy of the resynthesis

At this point, one can resynthesize a given sound by using a single or multicoupled digital waveguide and the parameters extracted from the analysis. For the synthetic sounds to be identical to the originals, the filters must be described precisely. The model was implemented in the frequency domain, as described in Section 2, thus taking into account the exact amplitude and phase of the filters (for instance, for a three-coupled digital waveguide, we have to implement three

delays and five complex filters, with their moduli and phases). Nevertheless, for real-time synthesis purposes, the filters can be approximated by low-order IIR filters (see, e.g., [6]). This aspect will be developed in future reports.

Figure 8: Amplitude modulation laws (velocity of the bridge) for the first six partials, one string, of the (a) original and (b) resynthesized sound.

Figure 9: Amplitude modulation laws (velocity of the bridge) for the first six partials, two strings, of the (a) original and (b) resynthesized sound.

Figure 10: Amplitude modulation laws (velocity of the bridge) for the first six partials, three strings, of the (a) original and (b) resynthesized sound.

By injecting the excitation signal obtained by deconvolution into the waveguide model, the signal measured on the experimental setup is reproduced. Figures 8, 9, and 10 show the amplitude modulation laws (velocity of the bridge) of the first six partials of the original and the resynthesized sounds. The variations of the temporal envelope are generally well retained, and for the coupled systems (Figures 9 and 10), the beating phenomena are well reproduced. The slight differences, which are not audible, are due to fine physical phenomena (coupling between the horizontal and vertical modes of the string) that are not taken into account in our model. In the one-string case, consider the second and sixth partials of the original sound in Figure 8. We can see beats (periodic amplitude modulations) that reveal coupling phenomena even with only one string.
Indeed, the horizontal and vertical modes of vibration of the string are coupled through the bridge. This coupling was not taken into account in this study, since the phenomenon is of less importance than the coupling between two different strings. Nevertheless, we have shown in [9] that the coupling between two modes of vibration can also be simulated using a two-coupled digital waveguide model. The accuracy of the resynthesis validates a posteriori our model and the source-resonator approach.

2.5. Behavior and control of the resonator through measurements on a real piano

To take into account the note dependence of the resonator, we made a set of measurements on a real piano, a Yamaha Disklavier C6 grand piano equipped with sensors.

Figure 11: Modulus of the waveguide filters for notes A0, F1, and D3, original and modeled.

Figure 12: Waveform of three excitation signals of the experimental setup, corresponding to three different hammer velocities.

The vibrations of the strings were measured at the bridge by an accelerometer, and the hammer velocities were measured by a photonic sensor. Data were collected for several velocities and several notes. We used the estimation process described in Section 2.3 for the previous experimental setup and extracted, for each note and each velocity, the corresponding resonator and source parameters. As expected, the behavior of the resonator as a function of the hammer velocity for a given note is similar to the one described in Section 2.4.2 for the signals measured on the experimental setup. The filters are similar with respect to the hammer velocity. Their modulus is close to one, but slightly weaker than previously, since it now takes into account the losses due to the acoustic field radiated by the soundboard. The resynthesis of the piano measurements through the resonator model and the excitation obtained by deconvolution is perceptually satisfactory, since the sound is almost indistinguishable from the original one. In contrast, the shape of the filters changes as a function of the note. Figure 11 shows the modulus of the waveguide filter F for several notes (in the multiple-string case, we calculated an average filter by arithmetic averaging). The modulus of the loop filter is related to the losses undergone by the wave over one period. Note that this modulus increases with the fundamental frequency, indicating decreasing loss over one period as the treble range is approached.
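The link between measured decay rates and the loop-filter modulus can be sketched with the single-string form of the appendix relation |F(ω_k)|⁻¹ = 1 + α_k D; the fundamental frequency and the decay-rate profile below are assumptions, not measured data:

```python
import numpy as np

# Loop-filter modulus per partial from decay rates, using the single-string
# form of the appendix relation |F|^{-1} = 1 + alpha * D.
# All numerical values are illustrative.
f0 = 110.0                      # fundamental frequency (Hz), assumed
D = 1.0 / f0                    # loop delay = one period (s)
k = np.arange(1, 26)            # first twenty-five partials
alpha = 1.5 + 0.05 * k**2       # decay rates (1/s), increasing with frequency

F_mod = 1.0 / (1.0 + alpha * D) # loop-filter modulus at each partial frequency
```

The modulus stays close to one (weak losses) and decreases toward the higher partials, as in Figures 4 and 11.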
The relations (7) and (8), relating the physical parameters to the waveguide parameters, allow the resonator to be controlled in a physically relevant way. We can change the length of the strings, the inharmonicity, or the losses. But to remain in accordance with the physical system, we have to take into account the interdependence of some parameters. For instance, the fundamental frequency is obviously related to the length of the string, to the tension, and to the linear mass. If we modify the length of the string, we also have to modify the fundamental frequency accordingly, considering that the tension and the linear mass are unchanged. This aspect has been taken into account in the implementation of the model.

3. THE SOURCE MODEL

In the previous section, we observed that the waveguide filters are almost invariant with respect to the hammer velocity. In contrast, the excitation signal (obtained as explained in Section 2.3 and related to the impact of the hammer on the string) varies nonlinearly as a function of the velocity, thereby accounting for the timbre variations of the resulting piano sound. From the extracted excitation signals, we here study this behavior and design a source model by using signal methods, so as to simulate it precisely. The source signal is then convolved with the resonator filter to obtain the piano bridge signal.

3.1. Nonlinear source behavior as a function of the hammer velocity

Figure 12 shows the excitation signals extracted from the measurement of the vibration of a single string struck by a hammer at three velocities corresponding to pianissimo, mezzo-forte, and fortissimo musical playing. The excitation duration is about 5 milliseconds, which is shorter than what Laroche and Meillier [1] proposed and in accordance with the duration of the hammer-string contact [6]. Since this interaction is nonlinear, the source also behaves nonlinearly.
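The deconvolution used to extract the excitation can be sketched in the frequency domain; the measured signal and the resonator transfer function below are synthetic stand-ins, and the regularization constant eps is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frequency-domain deconvolution of the excitation: E(w) = S(w) / T(w).
# s stands in for the measured bridge signal, T for a (toy, illustrative)
# one-loop waveguide response; eps regularizes the division where |T| is small.
n = 4096
s = rng.standard_normal(n)                       # placeholder measured signal
S = np.fft.rfft(s)
w = np.linspace(0, np.pi, S.size)
T = 1.0 / (1.0 - 0.95 * np.exp(-1j * w * 50))    # toy resonator transfer function
eps = 1e-6
E = S * np.conj(T) / (np.abs(T) ** 2 + eps)      # regularized spectral division
e = np.fft.irfft(E, n)                           # estimated excitation signal
```

Reinjecting e through T reproduces s up to the regularization error, which is the frequency-domain counterpart of the resynthesis check described above.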
Figure 13 shows the spectra of several excitation signals obtained for a single string at different velocities regularly spaced between 0.8 and 4 m/s. The excitation corresponding to fortissimo provides more energy than the ones corresponding to mezzo-forte and pianissimo. But this increased

amplitude is frequency dependent: the higher partials increase more rapidly than the lower ones for the same hammer velocity. This increase in the high partials corresponds to an increase in brightness with respect to the hammer velocity. It can be better visualized by considering the spectral centroid [31] of the excitation signals. Figure 14 shows the behavior of this perceptually relevant (brightness) criterion [3] as a function of the hammer velocity. Clearly, for one, two, or three strings, the spectral centroid increases, corresponding to an increased brightness of the sound. In addition to the change of slope, which translates into the change of brightness, Figure 13 shows several irregularities common to all velocities, among which is a periodic modulation related to the location of the hammer impact on the string.

Figure 13: Amplitude of the excitation signals for one string and several velocities.
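The spectral centroid used as the brightness criterion can be computed directly from a magnitude spectrum; the two test signals below are synthetic illustrations:

```python
import numpy as np

def spectral_centroid(x, fs):
    """Spectral centroid (Hz) of a signal, from its magnitude spectrum."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

fs = 44100
t = np.arange(0, 0.1, 1 / fs)
# A "brighter" signal (more high-frequency energy) has a higher centroid.
soft = np.sin(2 * np.pi * 440 * t)
loud = soft + 0.8 * np.sin(2 * np.pi * 3520 * t)
c_soft = spectral_centroid(soft, fs)
c_loud = spectral_centroid(loud, fs)
```

Applied to the extracted excitation signals, this criterion grows with the hammer velocity, as in Figure 14.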
The static spectrum takes into account all the information that is invariant with respect to the hammer velocity. It is a function of the characteristics of the hammer and the strings. The spectral deviation and the gain both shape the spectrum as a function of the hammer velocity. The spectral deviation simulates the shifting of the energy toward the high frequencies, and the gain models the global increase of amplitude. Earlier versions of this model were presented in [1, 2]. This type of model has, in addition, been shown to work well for many instruments [33]. In the early days of digital waveguides, Jaffe and Smith [24] modeled the velocity-dependent spectral deviation as a one-pole lowpass filter. Laursen et al. [34] proposed a second-order biquad filter to model the differences between guitar tones of different dynamics. A similar approach was developed by Smith and Van Duyne in the time domain [15]: the hammer-string interaction force pulses were simulated using three impulses passed through three lowpass filters which depend on the hammer velocity. In our case, a more accurate method is needed to resynthesize the original excitation signal faithfully.

Figure 14: The spectral centroid of the excitation signals for one (plain), two (dash-dotted), and three (dotted) strings.

Figure 15: Diagram of the subtractive source model.

3.2.1. The static spectrum

We define the static spectrum as the part of the excitation that is invariant with the hammer velocity. Considering the expression of the amplitude of the partials, a_n, for a hammer striking a string fixed at its extremities (see Valette and Cuesta [19]), and knowing that the spectrum of the excitation is related to the amplitudes of the partials by E = a_n D [9], the static spectrum E_s can be expressed as

E_s(ω_n) = 4L sin(nπx/L) / [T nπ (1 + n²B)],   (12)

where T is the string tension, L the string length, B the inharmonicity factor, and x the striking position. We can easily measure the striking position, the string length, and the inharmonicity factor on our experimental setup. The tension, on the other hand, can only be estimated; it can be calculated from the fundamental frequency and the linear mass of the string. Figure 16 shows this static spectrum for a single string. Many irregularities, however, are not taken into account, for several reasons; we will see later that they matter from a perceptual point of view. Equation (12) is still used, however, when the hammer position is changed. This is useful when one plays with a different temperament, because it reduces dissonance.

3.2.2. The deviation with the dynamics

The spectral deviation and the gain take into account the dependency of the excitation signal on velocity. They are estimated by dividing the spectrum of the excitation signal by the static spectrum for all velocities:

d(ω) = E(ω)/E_s(ω),   (13)

where E is the original excitation signal. Figure 17 shows this deviation for three hammer velocities. The deviation effectively strengthens the fortissimo, in particular for the medium and high partials. Its evolution with frequency is regular and can successfully be fitted by a first-order exponential polynomial (as shown in Figure 17),

d̂ = exp(af + g),   (14)

Figure 16: The static spectrum E_s(ω).

Figure 17: Dynamic deviation of three excitation signals of the experimental setup, original and modeled.
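Equations (12)-(14) can be sketched numerically; the string constants, the deviation parameters, and the harmonic frequency grid below are all illustrative assumptions:

```python
import numpy as np

# Static spectrum of (12) and the deviation fit of (13)-(14).
# All numerical values (length, tension, inharmonicity, ...) are illustrative.
L_s, T_s, B, x0 = 0.62, 800.0, 1e-4, 0.08   # length (m), tension (N), B, strike point (m)
n = np.arange(1, 51)                         # partial indices
f = n * 220.0                                # partial frequencies (Hz), roughly harmonic
Es = 4 * L_s * np.abs(np.sin(n * np.pi * x0 / L_s)) / (T_s * n * np.pi * (1 + B * n**2))

# Synthetic "measured" excitation: static spectrum shaped by exp(a f + g).
a_true, g_true = -2e-4, 1.3
E = Es * np.exp(a_true * f + g_true)

# Estimate a and g by a linear fit of log d(f) = a f + g, cf. (13)-(14).
d = E / Es
a_est, g_est = np.polyfit(f, np.log(d), 1)
```

Because log d is linear in f under model (14), the tilt a and gain g fall out of an ordinary least-squares line fit.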
where d̂ is the modeled deviation. The term g corresponds to the gain (independent of the frequency) and the term af corresponds to the spectral deviation. The variables g and a depend on the hammer velocity. To get a usable source model, we must consider the behavior of these parameters at different dynamics. Figure 18 shows the two parameters for several hammer velocities. The model is consistent, since their behavior is regular. But the tilt increases with the hammer velocity, showing an asymptotic and nonlinear behavior. This observation can be directly related to the physics of the hammer. As we have seen, when the felt is compressed, it becomes harder and thus gives more energy to the high frequencies. But, at high velocities, the felt is totally compressed and its hardness is almost constant. Thus, the amplitude of the corresponding string wave increases further but its spectral content stays roughly the same. We have fitted this asymptotic behavior by an exponential model (see Figure 18), for each parameter g and a,

g(v) = α_g − β_g exp(−γ_g v),
a(v) = α_a − β_a exp(−γ_a v),   (15)

where α_i (i = g, a) is the asymptotic value, β_i (i = g, a) is the deviation from the asymptotic value at zero velocity (the dynamic range), and γ_i (i = g, a) is the velocity exponential coefficient, governing how sensitive the attribute is to a velocity change. The parameters of this exponential model were found using a nonlinear weighted curve fit.

Figure 18: Parameters g (gain, top) and a (spectral deviation, bottom) as a function of the hammer velocity for the experimental setup signals, original and modeled (dashed).

3.3. Resynthesis of the excitation signal

For a given velocity, the excitation signal can now be recreated using (13), (14), and (15). The inverse Fourier transform of this source model convolved with the transfer function of the resonator leads to a realistic sound of a string struck by a hammer. The increase in brightness with the dynamics is well reproduced. But from a resynthesis point of view, this model is not yet satisfactory: the reproduced signal differs from the original one; it sounds too regular and monotonous. To understand this drawback of our model, we calculated the error made, by dividing the original excitation signal by the modeled one for each velocity. The corresponding curves are shown in Figure 19 for three velocities. Notice that this error term does not depend on the hammer velocity, meaning that our static spectrum model is too simple and does not take into account the irregularities of the original spectrum. These irregularities are due to many phenomena, including the width of the hammer-string contact, hysteretic phenomena in the felt, nonlinear phenomena in the string, and mode resonances of the hammer. To obtain a more realistic sound with our source model, we include this error term in the static spectrum. The resulting original and resynthesized signals are shown in Figure 20.
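The asymptotic exponential fits of (15) can be reproduced with a nonlinear least-squares curve fit; the ground-truth parameter values and the small perturbation below are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

# Asymptotic exponential model of (15), fitted by nonlinear least squares.
# The "measured" samples are synthetic, for illustration only.
def asymptotic(v, alpha, beta, gamma):
    return alpha - beta * np.exp(-gamma * v)

v = np.linspace(0.5, 5.0, 10)                 # hammer velocities (m/s)
g_meas = asymptotic(v, 2.0, 1.5, 0.8)         # ground truth: alpha, beta, gamma
g_meas = g_meas + 0.01 * np.sin(7 * v)        # small deterministic perturbation

params, _ = curve_fit(asymptotic, v, g_meas, p0=(1.0, 1.0, 1.0))
alpha_est, beta_est, gamma_est = params
```

The same fit is applied independently to the gain g(v) and to the spectral deviation a(v).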
The deviations of the resulting excitations are perceptually insignificant. The synthesized sound obtained is then close to the original one.

Figure 19: Example of the error spectrum. The large errors generally fall in the weak parts of the spectrum.

Figure 20: Original and modeled excitation spectra for three different hammer velocities for the experimental setup signals.

3.4. Behavior and control of the source through measurements on a real piano

The source model parameters were calculated for a subset of the piano data, namely the notes A0, F1, B1, G2, C3, G3, D4, E5, and F6. Each note has approximately ten velocities, from about 0.4 m/s up to between 3 and 6 m/s. For all notes, the source extracted from the signals measured on the piano behaves with respect to the hammer velocity like the data obtained with the experimental setup. The dynamic deviation is well modeled by the gain g and the spectral deviation parameter a. As in Section 3.2, their behavior as a function of the velocity is well fitted by an asymptotic exponential curve. From a perceptual point of view, an increased hammer velocity corresponds both to an increased loudness and a relative increase in high frequencies, leading to a brighter tone. Equations (15) make it possible to resynthesize the excitation signal for a given note and hammer velocity. However, the parameters g and a used in the modeling are linked in a complex way to the two most important perceptual features of the tone, that is, loudness and brightness. Thus, without a thorough knowledge of the model, the user will not be able to adjust the parameters of the virtual piano to obtain a satisfactory tone. For an intuitive control of the model, the user needs to be given access to these perceptual parameters, loudness and brightness, closely corresponding to energy and spectral centroid. The energy En is directly correlated to the perception of loudness and the spectral centroid Ba to the
The energy En is directly correlated to the perception of loudness and the spectral centroid Ba to the

perception of brightness [3]. These parameters are given by

En = (1/T) ∫₀^{Fs/2} E(f)² df,
Ba = ∫₀^{Fs/2} E(f) f df / ∫₀^{Fs/2} E(f) df,   (16)

where f is the frequency and Fs the sampling frequency. To synthesize an excitation signal having a given energy and spectral centroid, we must express the parameters g and a as functions of Ba and En. The centroid actually depends only on a:

Ba = ∫₀^{Fs/2} E_s(f) e^{af} f df / ∫₀^{Fs/2} E_s(f) e^{af} df.   (17)

We numerically calculate the expression of a as a function of Ba and store the solution in a table. Alternatively, assuming that the brightness change is unaffected by the shape of the static spectrum E_s, the spectral deviation parameter a can be calculated directly from the given brightness [35]. Knowing a, we can calculate g from the energy En by the relation

g = (1/2) log( En T / ∫₀^{Fs/2} E_s(f)² e^{2af} df ).   (18)

The behavior of Ba and En as a function of the hammer velocity then determines the dynamic range of the instrument, and it must be defined by the user. Figure 21 shows the behavior of the spectral centroid and the energy for several notes. The curves have similar behavior and differ mainly by a multiplicative constant. We have fitted their asymptotic behavior by an exponential model, similar to what was done with (15). These functions are applied to the synthesis of each excitation signal and thus characterize the dynamic range of the virtual instrument. The user can easily change the dynamic range of the virtual instrument by changing the shape of these functions. Calculating the excitation signal is then done as follows. To a given note and velocity, we associate a spectral centroid Ba and an energy En (using the asymptotic exponential fit); a is then obtained from the spectral centroid and g from the energy (equation (18)). One finally gets the spectral deviation which, multiplied by the static spectrum, allows the excitation signal to be calculated.
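The table-based inversion of (17) can be sketched as follows; the static spectrum, the grid ranges, and the target brightness are illustrative assumptions:

```python
import numpy as np

# Invert (17): tabulate the centroid Ba as a function of the tilt a on a grid,
# then look up a for a target Ba by interpolation. Es below is synthetic.
f = np.linspace(0.0, 22050.0, 2048)              # frequency axis up to Fs/2
Es = np.exp(-f / 4000.0)                         # illustrative static spectrum

def centroid(a):
    w = Es * np.exp(a * f)                       # tilted spectrum, cf. (17)
    return np.sum(w * f) / np.sum(w)             # uniform grid: sums suffice

a_grid = np.linspace(-5e-4, 5e-4, 201)
ba_grid = np.array([centroid(a) for a in a_grid])  # monotonically increasing in a

ba_target = 3000.0                                # desired brightness (Hz)
a_lookup = np.interp(ba_target, ba_grid, a_grid)  # table inversion
```

The centroid is strictly increasing in a (its derivative is the spectral variance under the tilted weight), so the table lookup is well defined.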
Figure 21: Spectral centroid (a) and energy (b) for several notes as a function of the hammer velocity, original (plain) and modeled (dotted).

4. CONCLUSION

The reproduction of the piano bridge vibration is undoubtedly the most important first step in piano sound synthesis. We have shown that a hybrid model consisting of a resonant part and an excitation part is well adapted for this purpose. After accurate calibration, the sounds obtained are perceptually close to the original ones for all notes and velocities. The resonator, which simulates the phenomena occurring in the strings themselves, is modeled by a digital waveguide model that is very efficient in simulating the wave propagation. The resonator model exhibits physical parameters, such as the string tension and the inharmonicity coefficient, allowing a physically relevant control of the resonator. It also takes into account the coupling effects, which are extremely relevant for perception. The source is extracted using a deconvolution process and is modeled using a subtractive signal model. The source model consists of three parts (static spectrum, spectral deviation, and gain) that depend on the velocity and the note played. To get an intuitive control of the source model, we exhibited two parameters, the spectral centroid and the energy, strongly related to the perceptual parameters brightness and loudness. This perceptual link permits an easy control of the dynamic characteristics of the piano. Thus, the tone of a given piano can be synthesized using a hybrid model. This model is currently implemented in real time in the Max/MSP software environment.

APPENDIX

INVERSE PROBLEM, THREE-COUPLED DIGITAL WAVEGUIDE

We show in this appendix how the parameters of a three-coupled digital waveguide model can be expressed as functions of the modal parameters. This method is an extension of the model presented in [9].

The signal measured at the bridge is the result of the vibration of three coupled strings. Each partial is actually constituted by at least three components, having frequencies slightly different from the frequencies of each individual string. We write the measured signal as a sum of exponentially damped sinusoids:

s(t) = Σ_k [ a_1k e^{-α_1k t} e^{iω_1k t} + a_2k e^{-α_2k t} e^{iω_2k t} + a_3k e^{-α_3k t} e^{iω_3k t} ],   (A.1)

with a_1k, a_2k, and a_3k the initial amplitudes, α_1k, α_2k, α_3k the damping coefficients, and ω_1k, ω_2k, ω_3k the frequencies of the components of the kth partial. The Fourier transform of s(t) is

S(ω) = Σ_k [ a_1k/(α_1k + i(ω − ω_1k)) + a_2k/(α_2k + i(ω − ω_2k)) + a_3k/(α_3k + i(ω − ω_3k)) ].   (A.2)

We identify this expression locally in frequency with the output T(ω) of the three-coupled waveguide model (see Figure 3):

T(ω) = N_1/N_2,   (A.3)

with

N_1 = (F_1 + F_2 + F_3) e^{-iωD} − 2[F_1F_2(1 − C_a²) + F_1F_3(1 − C_e²) + F_2F_3(1 − C_a²)] e^{-2iωD} + F_1F_2F_3 (3 − 4C_a² − C_e² + 4C_a²C_e) e^{-3iωD},

N_2 = 1 − (F_1 + F_2 + F_3) e^{-iωD} + [F_1F_2(1 − C_a²) + F_1F_3(1 − C_e²) + F_2F_3(1 − C_a²)] e^{-2iωD} − F_1F_2F_3 (1 − 2C_a² − C_e² + 2C_a²C_e) e^{-3iωD},   (A.4)

where F_i (i = 1, 2, 3) are the loop filters of the digital waveguides G_i (i = 1, 2, 3) (without loss of generality, one can assume that D_1 = D_2 = D_3 = D, since the differences in the delays can be taken into account in the phases of the filters F_i). Since T(ω) is a rational fraction of third-order polynomials in e^{-iωD} (see (6)), it can be decomposed into a sum of three rational fractions of first-order polynomials in e^{-iωD}:

T(ω) = P(ω)e^{-iωD}/(1 − X(ω)e^{-iωD}) + Q(ω)e^{-iωD}/(1 − Y(ω)e^{-iωD}) + R(ω)e^{-iωD}/(1 − Z(ω)e^{-iωD}).   (A.5)

The vibrations generated by the model are assimilated to a superposition of three series of partials whose frequencies and decay times are governed by the quantities X(ω), Y(ω), and Z(ω).
By identification between (A.3) and (A.5), we determine the following system of six equations:

P + Q + R = F_1 + F_2 + F_3,   (A.6)
PY + PZ + QX + QZ + RX + RY = 2[F_1F_2(1 − C_a²) + F_1F_3(1 − C_e²) + F_2F_3(1 − C_a²)],   (A.7)
PYZ + QXZ + RXY = F_1F_2F_3 (3 − 4C_a² − C_e² + 4C_a²C_e),   (A.8)
X + Y + Z = F_1 + F_2 + F_3,   (A.9)
XY + XZ + YZ = F_1F_2(1 − C_a²) + F_1F_3(1 − C_e²) + F_2F_3(1 − C_a²),   (A.10)
XYZ = F_1F_2F_3 (1 − 2C_a² − C_e² + 2C_a²C_e).   (A.11)

We identify (A.2) with the excitation signal times the transfer function T (equation (A.5)):

S(ω) = E(ω)T(ω).   (A.12)

Assuming that two successive modes do not overlap (this assumption is verified for piano sounds) and writing

X(ω) = |X(ω)| e^{iΦ_X(ω)}, Y(ω) = |Y(ω)| e^{iΦ_Y(ω)}, Z(ω) = |Z(ω)| e^{iΦ_Z(ω)},   (A.13)

we express (A.12) near each resonance as

a_1k/(α_1k + i(ω − ω_1k)) + a_2k/(α_2k + i(ω − ω_2k)) + a_3k/(α_3k + i(ω − ω_3k))
≈ E(ω)P(ω)e^{-iωD}/(1 − |X(ω)| e^{-i(ωD − Φ_X(ω))}) + E(ω)Q(ω)e^{-iωD}/(1 − |Y(ω)| e^{-i(ωD − Φ_Y(ω))}) + E(ω)R(ω)e^{-iωD}/(1 − |Z(ω)| e^{-i(ωD − Φ_Z(ω))}).   (A.14)

We identify term by term the members of this equation. Taking, for example,

a_1k/(α_1k + i(ω − ω_1k)) ≈ E(ω)P(ω)e^{-iωD}/(1 − |X(ω)| e^{-i(ωD − Φ_X(ω))}),   (A.15)

the resonance frequencies of each triplet, ω_1k, ω_2k, and ω_3k, correspond to the minima of the three denominators

1 − |X(ω)| e^{-i(ωD − Φ_X(ω))}, 1 − |Y(ω)| e^{-i(ωD − Φ_Y(ω))}, 1 − |Z(ω)| e^{-i(ωD − Φ_Z(ω))}.   (A.16)

If we assume that the moduli |X(ω)|, |Y(ω)|, and |Z(ω)| are close to one (this assumption is realistic because the propagation is weakly damped), we determine the values of ω_1k,

ω_2k, and ω_3k:

ω_1k = (Φ_X(ω_1k) + 2kπ)/D,
ω_2k = (Φ_Y(ω_2k) + 2kπ)/D,
ω_3k = (Φ_Z(ω_3k) + 2kπ)/D.   (A.17)

Taking ω = ω_1k + ε with ε arbitrarily small,

a_1k/(α_1k + iε) ≈ E(ω_1k + ε) P(ω_1k + ε) e^{-iΦ_X(ω_1k+ε)} e^{-iεD} / (1 − |X(ω_1k + ε)| e^{-iεD}).   (A.18)

A limited expansion e^{-iεD} ≈ 1 − iεD + O(ε²) around ε = 0 (at the zeroth order for the numerator and at the first order for the denominator) gives

E(ω_1k + ε) P(ω_1k + ε) e^{-iΦ_X(ω_1k+ε)} e^{-iεD} ≈ E(ω_1k) P(ω_1k) e^{-iΦ_X(ω_1k)},
1 − |X(ω_1k + ε)| e^{-iεD} ≈ 1 − |X(ω_1k)|(1 − iεD).   (A.19)

Assuming that P(ω) and X(ω) are locally constant (in the frequency domain), we identify term by term (the two members being considered as functions of the variable ε). We deduce the expressions of |X(ω)|, |Y(ω)|, and |Z(ω)| as functions of the amplitudes and decay coefficients of each mode:

|X(ω_1k)|⁻¹ = α_1k D + 1,
|Y(ω_2k)|⁻¹ = α_2k D + 1,
|Z(ω_3k)|⁻¹ = α_3k D + 1.   (A.20)

We also get the relations

E(ω_1k) P(ω_1k) = a_1k D |X(ω_1k)|,
E(ω_2k) Q(ω_2k) = a_2k D |Y(ω_2k)|,
E(ω_3k) R(ω_3k) = a_3k D |Z(ω_3k)|.   (A.21)

From the measured signal, we estimate the modal parameters a_1k, a_2k, a_3k, α_1k, α_2k, α_3k, ω_1k, ω_2k, and ω_3k. Using (A.17) and (A.20), we calculate X, Y, and Z. We still have nine unknown variables: P, Q, R, E, C_a, C_e, F_1, F_2, and F_3. But we also have a system of nine equations ((A.6), (A.7), (A.8), (A.9), (A.10), (A.11), and (A.21)). Assuming that the resonance frequencies of a triplet are close and that the variables P, Q, R, E, C_a, C_e, F_1, F_2, F_3, X, Y, and Z have a locally smooth behavior, we then express the waveguide parameters as functions of the temporal parameters. For the sake of simplicity, we write E_k = E(ω_1k) = E(ω_2k). Using (A.6) and (A.9), we obtain P_k + Q_k + R_k = X_k + Y_k + Z_k. Thanks to (A.21) we finally get the expression of the excitation signal at the resonance frequencies:

E_k = D (a_1k |X_k| + a_2k |Y_k| + a_3k |Z_k|) / (|X_k| + |Y_k| + |Z_k|).   (A.22)
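The chain (A.20)-(A.22) from the modal parameters of one partial triplet to the excitation amplitude can be sketched as follows; the loop delay, amplitudes, and damping coefficients are illustrative, not measured values:

```python
import numpy as np

# From the modal parameters of one partial triplet to waveguide quantities,
# following (A.20) and (A.22). All numerical values are illustrative.
D = 1.0 / 131.0                        # loop delay: one period of a C3-like note (s)
a = np.array([1.0, 0.6, 0.4])          # amplitudes a_1k, a_2k, a_3k
alpha = np.array([2.0, 2.5, 3.1])      # damping coefficients alpha_ik (1/s)

XYZ = 1.0 / (1.0 + alpha * D)          # (A.20): |X|, |Y|, |Z| at the resonance peaks
E_k = D * np.dot(a, XYZ) / np.sum(XYZ) # (A.22): excitation at this partial
```

Repeating this for every partial yields the excitation spectrum at the discrete partial frequencies, as used in Section 2.3.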
In the case of a two-coupled digital waveguide, the corresponding system admits analytical solutions (see [9]). But in the case of the three-coupled digital waveguide, we have not found analytical expressions for the variables P, Q, R, C_a, C_e, F_1, F_2, and F_3; we have therefore solved the system numerically.

REFERENCES

[1] J. Bensa, K. Jensen, R. Kronland-Martinet, and S. Ystad, Perceptual and analytical analysis of the effect of the hammer impact on piano tones, in Proc. International Computer Music Conference, Berlin, Germany, August 2000.
[2] J. Bensa, F. Gibaudan, K. Jensen, and R. Kronland-Martinet, Note and hammer velocity dependence of a piano string model based on coupled digital waveguides, in Proc. International Computer Music Conference, Havana, Cuba, September 2001.
[3] A. Askenfelt, Ed., Five Lectures on the Acoustics of the Piano, Royal Swedish Academy of Music, Stockholm, Sweden, 1990. Lectures by H. A. Conklin, A. Askenfelt and E. Jansson, D. E. Hall, G. Weinreich, and K. Wogram.
[4] S. Ystad, Sound modeling using a combination of physical and signal models, Ph.D. thesis, Université de la Méditerranée, Marseille, France.
[5] S. Ystad, Sound modeling applied to flute sounds, Journal of the Audio Engineering Society, vol. 48, no. 9, 2000.
[6] A. Askenfelt and E. V. Jansson, From touch to string vibrations. II: The motion of the key and hammer, Journal of the Acoustical Society of America, vol. 90, no. 5, 1991.
[7] A. Askenfelt and E. V. Jansson, From touch to string vibrations. III: String motion and spectra, Journal of the Acoustical Society of America, vol. 93, no. 4, 1993.
[8] X. Boutillon, Le piano: modélisation physique et développements technologiques, in Congrès Français d'Acoustique, Colloque C, Lyon, France, 1990.
[9] P. Schaeffer, Traité des objets musicaux, Éditions du Seuil, Paris, France, 1966.
[10] J. Laroche and J. L. Meillier, Multichannel excitation/filter modeling of percussive sounds with application to the piano, IEEE Trans.
Speech, and Audio Processing, vol.,no.,pp , [11] J. O. Smith III, Physical modeling using digital waveguides, Computer Music Journal, vol. 16, no. 4, pp , 199. [1] G. Borin, D. Rochesso, and F. Scalcon, A physical piano model for music performance, in Proc. International Computer Music Conference, pp , Computer Music Association, Thessaloniki, Greece, September [13] B. Bank, Physics-based sound synthesis of the piano, M.S. thesis, Budapest University of Technology and Economics, Budapest, Hungary,, published as Report 54, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, bank. [14] J. O. Smith III, Efficient synthesis of stringed musical instruments, in Proc. International Computer Music Conference, pp , Computer Music Association, Tokyo, Japan, September [15] J.O.SmithIIIandS.A.VanDuyne, Commutedpianosynthesis, in Proc. International Computer Music Conference, pp , Computer Music Association, Banff, Canada, September [16] S. A. Van Duyne and J. O. Smith III, Developments for the commuted piano, in Proc. International Computer Music

113 Hybrid Resynthesis of Piano Tones 135 Conference, pp , Computer Music Association, Banff, Canada, September [17] A. Chaigne and A. Askenfelt, Numerical simulations of struck strings. I. A physical model for a struck string using finite difference methods, Journal of the Acoustical Society of America, vol. 95, no., pp , [18] X. Boutillon, Model for piano hammers: Experimental determination and digital simulation, Journal of the Acoustical Society of America, vol. 83, no., pp , [19] C. Valette and C. Cuesta, Mécanique de la corde vibrante, Traité des nouvelles technologies. Série Mécanique. Hermès, Paris, France, [] D. E. Hall and A. Askenfelt, Piano string excitation V: Spectra for real hammers and strings, Journal of the Acoustical Society of America, vol. 83, no. 6, pp , [1] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, The simulation of piano string vibration: from physical model tofinitedifference schemes and digital waveguides, Journal of the Acoustical Society of America, vol. 114, no., pp , 3. [] A. Chaigne and V. Doutaut, Numerical simulations of xylophones. I. Time-domain modeling of the vibration bars, Journal of the Acoustical Society of America, vol. 11, no. 1, pp , [3] H. Fletcher, E. D. Blackham, and R. Stratton, Quality of piano tones, Journal of the Acoustical Society of America, vol. 34, no. 6, pp , 196. [4] D. A. Jaffe and J. O. Smith III, Extensions of the Karplus- Strong plucked-string algorithm, Computer Music Journal, vol. 7, no., pp , [5] G. Weinreich, Coupled piano strings, Journal of the Acoustical Society of America, vol. 6, no. 6, pp , [6] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, Physical modeling of plucked string instruments with application to real-time sound synthesis, Journal of the Audio Engineering Society, vol. 44, no. 5, pp , [7] B. Bank, Accurate and efficient modeling of beating and twostage decay for string instrument synthesis, in Proc. 
Workshop on Current Research Directions in Computer Music, pp , Barcelona, Spain, November 1. [8] D. Rocchesso and J. O. Smith III, Generalized digital waveguide networks, IEEE Trans. Speech, and Audio Processing, vol. 11, no. 3, pp. 4 54, 3. [9] M. Aramaki, J. Bensa, L. Daudet, P. Guillemain, and R. Kronland-Martinet, Resynthesis of coupled piano string vibrations based on physical modeling, Journal of New Music Research, vol. 3, no. 3, pp. 13 6,. [3] K. Steiglitz and L. E. McBride, A technique for the identification of linear systems, IEEE Trans. Automatic Control, vol. 1, pp , [31] J. Beauchamp, Synthesis by spectral amplitude and brightness matching of analyzed musical instrument tones, Journal of the Audio Engineering Society, vol. 3, no. 6, pp , 198. [3] S. McAdams, S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff, Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes, Psychological Research, vol. 58, pp , 199. [33] K. Jensen, Musical instruments parametric evolution, in Proc. International Symposium on Musical Acoustics, pp , Computer Music Association, Mexico City, Mexico, December. [34] M. Laursen, C. Erkut, V. Välimäki, and M. Kuuskankara, Methods for modeling realistic playing in acoustic guitar synthesis, Computer Music Journal, vol. 5, no. 3, pp , 1. [35] K. Jensen, Timbre models of musical sounds, Ph.D. thesis, Datalogisk Institut, Kobehavns Universitet, Copenhagen, Denmark, DIKU Tryk, Technical Report No 99/7, Julien Bensa obtained in 1998 his Master s degree (DEA) in acoustics, signal processing, and informatics applied to music from the Pierre et Marie Curie University, Paris, France. He received in 3 a Ph.D. in acoustics and signal processing from the University of Aix-Marseille II for his work on the analysis and synthesis of piano sounds using physical and signal models (available on line at bensa). 
He currently holds a postdoc position in the Laboratoire d'Acoustique Musicale, Paris, France, and works on the relation between the parameters of synthesis models of musical instruments and the perceived quality of the corresponding tones.

Kristoffer Jensen got his Master's degree in computer science at the Technical University of Lund, Sweden, and a DEA in signal processing at the ENSEEIHT, Toulouse, France. His Ph.D. was delivered and defended in 1999 at the Department of Datalogy, University of Copenhagen, Denmark, treating analysis/synthesis, signal processing, classification, and modeling of musical sounds. Kristoffer Jensen has a broad background in signal processing, including musical, speech recognition, and acoustic antenna topics. He has been involved in synthesizers for children, state-of-the-art next-generation effect processors, and signal processing in music informatics. His current research topic is signal processing with musical applications and related fields, including perception, psychoacoustics, physical models, and expression of music. He currently holds a position at the Department of Datalogy as Assistant Professor.

Richard Kronland-Martinet received a Ph.D. in acoustics from the University of Aix-Marseille II, France. He received a Doctorat d'État ès Sciences in 1989 for his work on analysis and synthesis of sounds using time-frequency and time-scale representations. He is currently Director of Research at the National Center for Scientific Research (CNRS), Laboratoire de Mécanique et d'Acoustique in Marseille, where he is the head of the group Modeling, Synthesis and Control of Sound and Musical Signals. His primary research interests are in analysis and synthesis of sounds with a particular emphasis on musical sounds.
He has recently been involved in a multidisciplinary research project associating sound synthesis processes and brain imaging techniques (functional nuclear magnetic resonance, fNMR) to better understand the way the brain processes sounds and music.

EURASIP Journal on Applied Signal Processing 2004:7, © 2004 Hindawi Publishing Corporation

Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis

Alexis Glass
Department of Acoustic Design, Graduate School of Design, Kyushu University, Shiobaru, Minami-ku, Fukuoka, Japan
alexis@andes.ad.design.kyushu-u.ac.jp

Kimitoshi Fukudome
Department of Acoustic Design, Faculty of Design, Kyushu University, Shiobaru, Minami-ku, Fukuoka, Japan
fukudome@design.kyushu-u.ac.jp

Received 8 July 2003; Revised 13 December 2003

A sound recording of a plucked string instrument is encoded and resynthesized using two stages of prediction. In the first stage of prediction, a simple physical model of a plucked string is estimated and the instrument excitation is obtained. The second stage of prediction compensates for the simplicity of the model in the first stage by encoding either the instrument excitation or the model error using warped linear prediction. These two methods of compensation are compared with each other and with the case of single-stage warped linear prediction; adjustments are introduced, and their applications to instrument synthesis and MPEG-4 audio compression within the structured audio format are discussed.

Keywords and phrases: warped linear prediction, audio compression, structured audio, physical modelling, sound synthesis.

1. INTRODUCTION

Since the discovery of the Karplus-Strong algorithm [1] and its subsequent reformulation as a physical model of a string, a subset of the digital waveguide [2], physical modelling has seen the rapid development of increasingly accurate and disparate instrument models. Not limited to string model implementations of the digital waveguide, such as the kantele [3] and the clavichord [4], models for brass, woodwind, and percussive instruments have made physical modelling ubiquitous.
With the increasingly complex models, however, the task of parameter selection has become correspondingly difficult. Techniques for calculating the loop filter coefficients and excitation for basic plucked string models have been refined [5, 6] and can be quickly calculated. However, as the one-dimensional model gave way to models with weakly interacting transverse and vertical polarizations, research has looked to new ways of optimizing parameter selection. These new methods of optimizing parameter selection use neural networks or genetic algorithms [7, 8] to automate tasks which would otherwise take human operators an inordinate amount of time to adjust. This research has yielded more accurate instrument models, but for some applications it also leaves a few problems unaddressed. The MPEG-4 structured audio codec allows for the implementation of any coding algorithm, from linear predictive coding to adaptive transform coding to, at its most efficient, the transmission of instrument models and performance data [9]. This coding flexibility means that MPEG-4 has the potential to implement any coding algorithm and to be within an order of magnitude of the most efficient codec for any given input data set [10]. Moreover, for sources that are synthetic in nature, or can be closely approximated by physical or other instrument models, structured audio promises levels of compression orders of magnitude better than what is currently possible using conventional pure signal-based codecs. Current methods used to parameterize physical models from recordings require, however, a great deal of time for complex models [8]. They also often require very precise and comprehensive original recordings, such as recordings of the impulse response of the acoustic body [5, 11], in order to achieve reproductions that are indistinguishable from the original.
Given current processor speeds, these limitations preclude the use of genetic algorithm parameter selection techniques for real-time coding.

WLP in Physical Modelling for Audio Compression 137

Real-time coding is also made exceedingly difficult in such cases where body impulse responses are not available or playing styles vary from model expectations. This paper proposes a solution to this real-time parameterization and coding problem for string modelling in the marriage of two common techniques, the basic plucked string physical model and warped linear prediction (WLP) [12]. The justifications for this approach are as follows. Most string recordings can be analyzed using the techniques developed by Smith, Karjalainen et al. [2, 6] in order to parameterize a basic plucked string model, and a considerable prediction gain can be achieved using these techniques. The excitation signal for the plucked string model is constituted by an attack transient that represents the plucking of the string according to the player's style and plucking position [11], and is followed by a decay component. This decay component includes the body resonances of the instrument [11, 13], beating introduced by the string's three-dimensional movement, and further excitation caused by the player's performance. Additional excitations from the player's performance include deliberate expression through vibrato or even unintentional influences, such as scratching of the string or the rattling caused by the string vibrating against the fret with weak fingering pressure. The body resonances and contributions from the three-dimensional movement of the string mean that the excitation signal is strongly correlated and therefore a good candidate for WLP coding. Furthermore, while residual quantization noise in a warped predictive codec is shaped so as to be masked by the signal's spectral peaks [12], in one of the proposed topologies, the noise in the physical model's excitation signal is likewise shaped into the modelled harmonics.
This shaping of the noise by the physical model results in distortion that, if audible, is neither unnatural nor distracting, thereby allowing codec sound quality to degrade gracefully with decreasing bit rate. In the ideal case, we imagine that at the lowest bit rate, the guitar would be transmitted using only the physical model parameters and that, with increasing excitation bit rate, the reproduced guitar timbre would become closer to the original target.

This paper is composed of six sections. Following the introduction, the second section describes the plucked string model used in this experiment and the analysis methods used to parameterize it. The third section describes the recording of a classic guitar and an electric guitar for testing. The coding of the guitar tones using a combination of physical modelling and warped linear predictive coding is outlined in Section 4. Section 5 analyzes the results from simulated coding scenarios using the recorded samples from Section 3 and the topologies of Section 4, while investigating methods of further improving the quality of the codec. Section 6 concludes the paper.

2. MODEL STRUCTURE

A simple linear string model extended from the Karplus-Strong algorithm by Jaffe and Smith [14] was used in this study, comprised of one delay line z^{−L} with a first-order allpass fractional delay filter F(z) and a single-pole low-pass loop filter G(z), as shown in Figure 1, where

F(z) = (a + z^{−1}) / (1 + a z^{−1}),   (1)
G(z) = g (1 + a_1) / (1 + a_1 z^{−1}),   (2)

and the overall transfer function of the system can be expressed as

H(z) = 1 / (1 − F(z) G(z) z^{−L}).   (3)

Figure 1: Topology of a basic plucked string physical model (input x(n), feedback loop through F(z), G(z), and z^{−L}, output y(n)).

This string model is very simple, and much more accurate and versatile models have been developed since [6, 11, 15].
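As an illustration (not the authors' code), the feedback loop of (1)–(3) can be sketched directly in the time domain; the parameter values in the usage note below are arbitrary assumptions:

```python
import numpy as np

def pluck(excitation, L, a, a1, g, n_samples):
    """Sketch of the string loop of Figure 1: y(n) = x(n) + [F(z)G(z)z^-L] y(n).

    F(z) = (a + z^-1)/(1 + a z^-1)   -- first-order allpass fractional delay
    G(z) = g(1 + a1)/(1 + a1 z^-1)   -- one-pole lowpass loop filter
    """
    out = np.zeros(n_samples)
    delay = np.zeros(L)              # circular buffer implementing z^-L
    f_x1 = f_y1 = g_y1 = 0.0         # one sample of state per first-order filter
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        d = delay[n % L]                      # y(n - L)
        f = a * d + f_x1 - a * f_y1           # allpass F(z)
        f_x1, f_y1 = d, f
        gy = g * (1.0 + a1) * f - a1 * g_y1   # lowpass G(z)
        g_y1 = gy
        y = x + gy                            # close the feedback loop
        delay[n % L] = y
        out[n] = y
    return out
```

For example, a 100-sample delay line (441 Hz at 44.1 kHz) excited with a short noise burst and a loop gain g slightly below 1 produces a decaying, string-like tone.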
For the purposes of this study, however, it was required that the model could be quickly and accurately parameterized without the use of complex or time-consuming algorithms, and sufficient that it offers a reasonable first-stage coding gain. The algorithms used to parameterize the first-order model are described in detail in [15] and will only be outlined here as they were implemented for this study. In the first stage of the model parameterization, the pitch of the target sound was detected from the target's autocorrelation function. The length of the delay line z^{−L} and the fractional delay filter F(z) were determined by dividing the sampling frequency (44.1 kHz) by the pitch of the target. Next, the magnitudes of the lowest harmonics were tracked using short-term Fourier transforms (STFTs). The magnitude of each harmonic versus time was recorded on a logarithmic scale after the attack transient of the pluck was determined to have dissipated and until the harmonic had decayed 40 dB or disappeared into the noise floor. A linear regression was performed on each harmonic's decay to determine its slope, β_k, as shown in Figure 2, and the measured loop gain for each harmonic, G_k, was calculated according to the following equation:

G_k = 10^{β_k L / (20H)},   k = 1, 2, ..., N_h,   (4)

where L is the length of the delay line (including the fractional component) and H is the hop size (adjusted to account for hop overlap). The loop gain at DC, g, was estimated to equal the loop gain of the first harmonic, G_1, as in [15]. Because the target guitar sounds were arbitrary and nonideal, the harmonic envelope trajectories were quite noisy in some cases, so additional measures had to be introduced to stop tracking harmonics when their decays became too erratic or, as in some cases, negative. In such cases as when the guitar fret was held with insufficient pressure, additional transients occurred after the first attack transient, and this tended to raise the gain factor in the loop filter, resulting in a model that did not accurately reflect string losses. For the purposes of this study, such effects were generally ignored so long as a positive decay could be measured from the harmonics tracked.

Figure 2: The temporal envelopes of the lowest four harmonics of a guitar pluck (dashed) and their estimated decays (solid).

Figure 3: Schematic for classic guitar pluck recording (microphone in the anechoic chamber; microphone amplifier and PC with the Layla interface in a separate room).

The first-order loop filter coefficient a_1 was estimated by minimizing the weighted error between the target loop gains G_k, as calculated in (4), and candidate filters G(z) from (2). A weighting function W_k, suggested by [15] and defined as

W_k = 1 / (1 − G_k),   (5)

was used such that the error could be calculated as follows:

E(a_1) = Σ_{k=1}^{N_h} W_k (G_k − G(e^{jω_k}, a_1))²,   (6)

where ω_k is the frequency of the harmonic being evaluated and −1 < a_1 < 1. This error function is roughly quadratic in the vicinity of the minimum, and parabolic interpolation was found to yield accurate values for the minimum in less time than iterative methods. For controlled calibration of the loop filter extraction algorithm, synthesized plucked string samples were created using the extended Karplus-Strong algorithm and the model as described by Välimäki [11], with two string polarizations and a weak sympathetic coupling between the strings.

3. DATA ACQUISITION

The purpose of the algorithms explored in this research was to resynthesize real, nontrivial plucked string sounds using the combination of the basic plucked string model and WLP coding.
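The per-harmonic loop-gain estimate of Section 2 can be sketched as follows; the dB convention in the exponent is my reading of (4), and the ordinary least-squares fit stands in for the paper's linear regression:

```python
import numpy as np

def harmonic_loop_gain(env_db, L, H):
    """Loop gain of one harmonic from its decay envelope, after (4).

    env_db : harmonic magnitude in dB, one value per STFT frame
    L      : delay-line length in samples (including the fractional part)
    H      : effective hop size in samples
    """
    frames = np.arange(len(env_db))
    beta_k = np.polyfit(frames, env_db, 1)[0]    # regression slope, dB per frame
    return 10.0 ** (beta_k * L / (20.0 * H))     # gain per trip around the loop
```

As a sanity check, a harmonic whose amplitude decays by a factor r every sample should yield a loop gain of r^L.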
No special care was taken, therefore, in the selection of the instruments to be used or the nature of the guitar tones to be analyzed and resynthesized, beyond that they were monophonic, recorded in an anechoic chamber, and each pluck was preceded by silence to facilitate the analysis process. A schematic of the recording environment and signal flow for the classic guitar is pictured in Figure 3. Two guitars were recorded. The first, a classic guitar, was recorded in an anechoic chamber with the guitar held approximately 5 cm from a Brüel & Kjær type 4191 free-field 1/2″ microphone, the output of which was amplified by a Falcon Range 1/2″ type 2669 microphone preamp with a Brüel & Kjær type 5935 power supply and fed into a PC through a Layla 24/96 multitrack recording system. The electric guitar was recorded through its line out and a Yamaha O3D mixer into the Layla. A variety of plucking styles were recorded in both cases, along with the application of vibrato, string scratching, and several cases where insufficient finger pressure on the frets led to further string excitation (i.e., a rattling of the string) after the initial pluck. After capturing approximately 8 minutes of playing with each guitar, suitable candidates for the study were selected on the basis of their unique timbres, durations, and potential difficulty for accurate resynthesis using existing plucked string models. More explicitly, in the case of the classic guitar, bright plucks of E1 (82 Hz) were recorded along with several recordings of B1 (124 Hz), where weak finger pressure led to a rattling of the string. Another sample selected involved this weak finger pressure leading to an early damping of the string by the fret hand, though without the nearly instantaneous subsequent decay that a fully damped string would yield. A third, higher pitch was recorded with an open string at E3 (335 Hz).
In the case of the electric guitar, two samples were used: one of slapped E1 (82 Hz) with almost no decay and another of E2 (165 Hz) with some vibrato applied.

Figure 4: The decomposition of an excitation into (a) attack and (b) decay. The attack window is 20 milliseconds long. In this case, decay refers to the portion of the pluck where the greatest attenuation is a result of string losses. Because the string is not otherwise damped, it may also be considered to be the sustain segment of the envelope.

4. ANALYSIS/RESYNTHESIS ALGORITHMS

4.1. Warped linear prediction

Frequency warping methods [16] can be used with linear prediction coding so that the prediction resolution closely matches the human auditory system's nonuniform frequency resolution. Härmä found that WLP realizes a basic psychoacoustic model [12]. As a control for the study, the target signal was therefore first processed using a twentieth-order WLP coder of lattice structure. The lattice filter's reflection coefficients were not quantized, and after inverse filtering, the residual was split into two sections, attack and decay, which were quantized using a mid-riser algorithm. The step size in the mid-riser quantizer was set such that the square error of the residual was minimized. The number of bits per sample in the attack residual (BITSA) was set to each of BITSA = {16, 8, 4} for each of the bits per sample in the decay residual BITSD = {2, 1}. The frame size for the coding was set to equal two periods of the guitar pluck being coded, and the reflection coefficients were linearly interpolated between frames. The bit allocation method was used in order to match the case of the topologies that use a first-stage physical model predictor, where more bits were allocated to the attack excitation than to the decay excitation. Härmä found in [12] that near-transparent quality could be achieved with 3 bits per sample using a WLP codec.
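The mid-riser quantization with an error-minimizing step size can be sketched as below; the grid search over candidate step sizes is an assumption, since the paper does not say how the minimization was carried out:

```python
import numpy as np

def midriser_quantize(x, bits, step):
    """Mid-riser quantizer: reconstruction levels at odd multiples of step/2."""
    idx = np.floor(x / step)
    idx = np.clip(idx, -2**(bits - 1), 2**(bits - 1) - 1)  # 2^bits levels
    return step * (idx + 0.5)

def best_step(x, bits, candidates):
    """Pick the step size minimizing the squared reconstruction error."""
    errs = [np.sum((x - midriser_quantize(x, bits, s))**2) for s in candidates]
    return candidates[int(np.argmin(errs))]
```

With the step chosen this way, increasing the bit allocation strictly reduces the residual's quantization error, which is the behaviour the BITSA/BITSD comparison relies on.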
It is therefore reasonable to suggest that the WLP used here could have been optimized by distributing the high number of bits used in the attack throughout the length of the sound to be coded. However, since similar optimizations could also be made in the two-stage algorithms, only the simplest method was investigated in this study.

4.2. Windowed excitation

As the most basic implementation of the physical model, the residual from the string model's inverse filter can be windowed and used as the excitation for the model. In this study, the excitation was first coded using a warped linear predictive coder of order 20 and with BITSA bits of quantization for each sample of the residual. In many cases, the first 10 milliseconds of the excitation contain enough information about the pluck and the guitar's body resonances for accurate resynthesis [13, 15]. The beating caused by the slight three-dimensional movement of the string and the rattling caused by the energetic plucks used in the study, however, were significant enough that a longer excitation was used. Specifically, the window used was thus unity for the first 10 milliseconds of the excitation and then decayed as the second half of a Hanning window for the following 10 milliseconds. An example of this windowed excitation can be seen in the top of Figure 4. This windowed excitation, considered as the attack component, was input to the string model for comparison to the WLP case and used in the modified extended Karplus-Strong algorithm which will now be described.

4.3. Two-stage coding topologies

As described in [9], structured audio allows for the parameterization and transmission of audio using arbitrary codecs. These codecs may be comprised of instrument models, effect models, psychoacoustic models, or combinations thereof. The most common methods used for the psychoacoustic compression of audio are transform codecs, such as MP3 [17] and ATRAC [18], and time-domain approaches such as WLP [12].
Because the specific application being considered here is that of the guitar, the first stage of our codec is the simple string model described in Section 2. The second stage of coding was then approached using one of two methods: (1) the model's output signal error (referred to as model error) could be immediately coded using WLP, or (2) the model's excitation could be coded using WLP, with the attack segment of the excitation receiving more bits as in the WLP case of Section 4.1. The topologies of these two strategies are illustrated in Figure 5. Both topologies require the inverse filtering of the target pluck sound in order to extract the excitation. The decomposition of the excitation into attack and decay components for the first topology, as formerly proposed by Smith [19] and implemented by Välimäki and Tolonen in [13], reflects the wideband and high-amplitude portion which marks the beginning of the excitation signal and the decay which typically contains lower frequency components from body resonances or from the three-dimensional movement of the string.

Figure 5: The WLP Coding of Model Error (WLPCME) topology (top) and WLP Coding of Model Excitation (WLPCMX) topology (bottom). Here, s represents the plucked string recording to be coded and ŝ the reconstructed signal. In this diagram, WLPC indicates the WLP coder, or inverse filter, and WLPD indicates the WLP decoder. Q is the quantizer, with BITSA and BITSD being the number of bits with which the respective signals are quantized.

However, whereas the authors of [13] synthesized the decay excitation at a lower sampling rate, justified by its predominantly lower frequency components, the excitations in our study often contained wideband excitations following the initial attack, and no such multirate synthesis was therefore used. Typical attack and decay decomposition of an excitation is shown in Figure 4. The high-frequency decay components are a result of the mismatch between the string model and the source recording.

Warped linear prediction coding of model error

The WLPCME topology from Figure 5 was implemented such that WLP was applied to the model error as follows:

s_wex = h * x̂_attack,
e_model = s − s_wex,   (7)
ŝ = s_wex + ê_model,

where s is the recorded plucked string input, h is the impulse response of the derived plucked string model from (3), x̂_attack is the WLP-coded windowed excitation introduced in Section 4.2, s_wex is the pluck resynthesized using only the windowed excitation, and e_model is the model error.
ê_model is thus the model error coded using WLP and BITSD bits per sample, and ŝ is the reconstructed pluck.

Warped linear prediction coding of model excitation

In this case, the model excitation was coded instead of the model error. Following the string model inverse filtering, the excitation is whitened using a twentieth-order WLP inverse filter. Next, the signal is quantized with BITSA bits per sample allotted to the residual in the attack, and BITSD bits per sample for the decay residual. This process can be expressed in the following terms:

x_full = h⁻¹ * s,
x̃_attack = q_BITSA(p⁻¹ * x_full · w_attack),
x̃_decay = q_BITSD(p⁻¹ * x_full · w_decay),   (8)
x̂_full = p * (x̃_attack + x̃_decay),
ŝ = h * x̂_full,

where s is the original instrument recording being modelled, h⁻¹ is the string model's inverse filter, and x_full is thus the model excitation. x̃_attack is therefore the string model excitation whitened by the WLP inverse filter, p⁻¹, and quantized to BITSA, while x̃_decay is likewise whitened and quantized to BITSD. The sum of the attack and decay is then resynthesized by the WLP decoder, p. The resulting x̂_full is subsequently considered as excitation to the string model, h, to form the resynthesized plucked string sound ŝ.

5. SIMULATION RESULTS AND DISCUSSION

In order to evaluate the effectiveness of the two proposed topologies, a measure of the sound quality was required. Informal listening tests suggested that the WLPCMX topology offered slightly improved sound quality and a more musical coding at lower bit rates, although it came at the cost of a much brighter timbre. At very low bit rates, WLPCMX introduced considerable distortion, especially for sound sources that were poorly matched by the string model. WLPCME, on the other hand, was equivalent in sound quality to WLPC and sometimes worse. Resynthesis using the windowed excitation yielded passable guitar-like timbres, but in none of the test cases came close to reproducing the nuance or fullness of the original target sounds.
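Abstracting the filters as callables, the WLPCMX chain of (8) can be sketched as below; every argument here (h, h_inv, p, p_inv, quant) is a hypothetical stand-in for the string model, the WLP filters, and the quantizer, not an interface from the paper:

```python
def wlpcmx(s, h, h_inv, p, p_inv, quant, w_attack, w_decay, bits_a, bits_d):
    """Two-stage WLPCMX coding/decoding chain, following (8)."""
    x_full = h_inv(s)                        # string-model inverse filter -> excitation
    r = p_inv(x_full)                        # WLP inverse filter -> whitened residual
    x_attack = quant(r * w_attack, bits_a)   # finely quantized attack residual
    x_decay = quant(r * w_decay, bits_d)     # coarsely quantized decay residual
    x_hat = p(x_attack + x_decay)            # WLP decoder
    return h(x_hat)                          # string-model resynthesis
```

With identity filters, a lossless quantizer, and complementary windows (w_attack + w_decay = 1), the chain returns the input unchanged, which is a useful sanity check on the plumbing.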

For a more formal evaluation of the simulated codecs' sound quality, an objective measure was calculated by measuring the spectral distance between the frequency-warped STFTs, S_k, of the original pluck recording and the resynthesized output, Ŝ_k, created using the codecs. The frequency-warped STFT sequences were created by first warping each successive frame of each signal using cascaded all-pass filters [16], followed by a Hanning window and a fast Fourier transform (FFT). The method by which the bark spectral distance (BSD) was measured is as follows:

BSD_k = (1/N) Σ_{n=0}^{N−1} (log₁₀ S_k(n) − log₁₀ Ŝ_k(n))²,   (9)

with the mean BSD for the whole sample being the unweighted mean over all frames k. A typical profile of BSD versus time is shown in Figure 6 for the three cases WLPC, WLPCMX, and WLPCME. In the first round of simulations, all six input samples as described in Section 3 were processed using each of the algorithms described in Section 4. The resulting mean BSDs were then calculated to be as shown in Figure 7. Subjective evaluation of the simulated coding revealed that as bit rate decreased, the WLPCMX topology maintained a timbre that, while brighter than the target, was recognizable as a guitar. In contrast, the other methods became noisy and synthetic. Objective evaluation of these same results reveals that both topologies using a first-stage physical model predictor have greater spectral distortion than the case of WLPC, particularly in the case of the recordings with very slow decays (i.e., with a high DC loop gain g). In identifying the cause of this distortion, we must first consider the model prediction. The degradation occurs for the following reasons in each of the two topologies.
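Equation (9) can be sketched as below; the frequency warping of the STFT frames is assumed to have been applied beforehand, and the eps floor is a hypothetical guard to keep the logarithm finite:

```python
import numpy as np

def mean_bsd(S, S_hat, eps=1e-12):
    """Mean Bark spectral distance between two magnitude STFTs, after (9).

    S, S_hat : arrays of shape (frames, bins) holding warped magnitude spectra
    """
    d = np.log10(S + eps) - np.log10(S_hat + eps)
    bsd_per_frame = np.mean(d**2, axis=1)    # BSD_k, one value per frame k
    return float(np.mean(bsd_per_frame))     # unweighted mean over frames
```

Identical spectra give a distance of zero, and any mismatch gives a strictly positive score, so lower means closer to the original.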
(A) In the case of WLPCME, the beating that is caused by the three-dimensional vibration of the string causes considerable phase deviation from the phase of the modelled pluck, and the model error often becomes greater in magnitude than the original signal itself. This leads to a noisier reconstruction by the resynthesizer. Additionally, small model parameterization errors in pitch and the lack of vibrato in the model result in phase deviations.

(B) In the case of WLPCMX, with a low bit rate in the residual quantization stage of the linear predictor, a small error in the coding of the excitation is magnified by the resynthesis filter (string model). In addition to this, as noted in [15], the inverse filter may not have been of sufficiently high order to cancel all harmonics, and high-frequency noise, magnified by the WLP coding, may have been further shaped by the plucked string synthesizer into bright higher harmonics.

Figure 6: Bark-scale spectral distortion (dB) versus time (s). WLPC is solid, WLPCMX is dash-dotted, and WLPCME is the dashed line.

Figure 7: Mean Bark-scale spectral distortion (dB) using each of WLPC, WLPCME, and WLPCMX (left to right) for (1) E3 classic; (2) E1 classic; (3) B1 classic (rattle); (4) B1 classic (rattle); (5) E1 electric; and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.

The distortion caused by the topology in (A) seems impossible to improve significantly without using a more complex model that considers the three-dimensional vibration of the string, such as the model proposed by Välimäki et al. [11] and previously raised in Section 2. Performance control, such as vibrato, would also have to be extracted from the input for a locked phase to be achieved in the resynthesized pluck. The topology of (B), however, allows for some improvement in the reconstructed signal quality by compromising between the prediction gain of the first stage and the WLP coding of the second stage.
More explicitly, if the loop filter gain were decreased, the cumulative error introduced by the quantization in the WLP stage would be correspondingly decreased. Such a downwards adjustment of the loop filter gain to minimize coding noise results in a physical model that represents a plucked string with an exaggerated decay.
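The effect of scaling the loop gain can be illustrated with a minimal Karplus-Strong-style string loop. This is only a simplified stand-in for the paper's string model (the actual loop filter and warped predictor are not reproduced), and the function names are illustrative:

```python
def string_loop(excitation, delay, loop_gain, multiplier):
    # y[n] = x[n] + m * g * 0.5 * (y[n-L] + y[n-L-1]):
    # a Karplus-Strong-style recursion whose averaging loop filter is
    # scaled by the multiplier m. With m < 1 the decay shortens, so less
    # of any quantization error in the excitation x accumulates in the
    # feedback loop.
    y = [0.0] * len(excitation)
    for n in range(len(excitation)):
        fb = 0.0
        if n >= delay + 1:
            fb = 0.5 * (y[n - delay] + y[n - delay - 1])
        y[n] = excitation[n] + multiplier * loop_gain * fb
    return y

def tail_energy(y, start):
    # Energy of the decaying tail, used to compare decay lengths.
    return sum(v * v for v in y[start:])
```

Running this with a multiplier of 0.7 instead of 1.0 leaves far less energy in the tail of the response, i.e., the reduced-gain predictor behaves like a string with an exaggerated decay.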

This almost makes the physical-model prediction stage appear more like the long-term pitch predictor in a more conventional linear prediction (LP) codec targeted at speech. However, there is still the critical difference that the physical model contains the low-pass component of the loop filter and can still be thought of as modelling the behaviour of a (highly damped) guitar string. To obtain an appropriate value for the loop gain, multiplier tests were run on all six target samples. The electric guitar recordings and the recordings of the classical guitar at E3 represented ideal cases; there were no rattles subsequent to the initial pluck, and the pitch changed negligibly throughout their lengths. Amongst the remaining recordings, the two rattling guitar recordings represented two timbres very difficult to model without a lengthy excitation or a much more complex model of the guitar string. The mean BSD measure for the electric guitar at E1 is shown in Figure 8. As can be seen from Figure 8, reducing the loop gain of the physical model predictor increased the performance of the codec and yielded superior BSD scores for loop gain multipliers between 0.1 and 0.9. The greater the model mismatch, as in the case of the recordings with rattling strings, the less the string model predictor lowered the mean BSD. Recordings that the model did not closely match also reached their minimal mean BSDs at lower loop gains (e.g., 0.5 to 0.7). The simulation used to produce Figure 7 was performed again using a single, approximately optimal, loop gain multiplier of 0.7. The results from this simulation are pictured in Figure 9. The decreased BSD for all the samples in Figure 9 confirms the efficacy of the two-stage codec.
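The frame-wise BSD of Eq. (9), which underlies the comparisons in Figures 6 through 9, can be sketched as follows. The frequency warping via cascaded all-pass filters [16] is omitted here, so the magnitude spectra are assumed to be already warped; the function names are illustrative, not from the paper:

```python
import math

def frame_bsd(S, S_hat, eps=1e-12):
    # Eq. (9): mean squared difference of the log10 magnitude spectra of
    # one (frequency-warped) STFT frame of the original signal S and the
    # resynthesized signal S_hat. eps guards against log of zero bins.
    N = len(S)
    return sum(
        (math.log10(max(abs(a), eps)) - math.log10(max(abs(b), eps))) ** 2
        for a, b in zip(S, S_hat)
    ) / N

def mean_bsd(frames, frames_hat):
    # Unweighted mean of the per-frame BSD over all frames k, as used
    # for the whole sample.
    per_frame = [frame_bsd(S, Sh) for S, Sh in zip(frames, frames_hat)]
    return sum(per_frame) / len(per_frame)
```

Identical spectra give a BSD of zero, and a uniform factor-of-ten magnitude error gives a BSD of one per frame, so the measure grows with spectral mismatch regardless of overall level.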
Informal subjective listening tests, described briefly at the beginning of this section, also confirmed that decreasing the bit rate reduced the similarity of the reproduced timbre to the original without obscuring the fact that it was a guitar pluck, and without the thickening of the mix that occurs due to the shaped noise in the WLPC codec. This improvement offered by the two-stage codec becomes even more noticeable at lower bit rates, such as with a constant 1 bit per sample quantization of the WLP residual over both attack and decay. To evaluate the utility of the proposed WLPCMX, it is important to compare it to the alternatives. Existing purely signal-based approaches such as MP3 and WLPC have proven their usefulness for encoding arbitrary wideband audio signals at low bit rates while preserving transparent quality. As an example, Härmä found that wideband audio could be coded using WLPC at 3 bits per sample with good quality [12]. These models can be implemented in real time with minimal computational overhead but, like sample-based synthesis, do not represent the transmitted signal parametrically in a form that is related to the original instrument. Pure signal-based approaches, using psychoacoustic models, are thus limited to the extent to which they can remove psychoacoustically redundant data from an audio stream.

Figure 8: Mean Bark-scale spectral distortion versus loop gain multiplier. WLPCMX is solid and WLPC is dash-dotted.

Figure 9: Mean Bark-scale spectral distortion (dB) using each of WLPC and WLPCMX (left to right) for (1) E3 classic; (2) E1 classic; (3) B1 classic (rattle); (4) B1 classic (rattle); (5) E1 electric; and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.

On the other hand, increasingly complex physical models can now reproduce many classes of instruments with excellent quality. Assuming a good calibration or, in the best case,
a performance made using known physical modelling algorithms, transmission of model parameters and continuous controllers would result in a bit rate at least an order of magnitude lower than that of pure signal-based methods. As an example, consider an average score file from a modern sequencing program using only virtual instruments and software effects: the file size (including simple instrument and effect model algorithms) is on the order of 500 kB. For an average song length of approximately 4 minutes, this corresponds to a bit rate of approximately 17 kbps. For optimized scores and simple instrument models, the bit rate could be lower than 1 kbps. Calibration of these complex instrument models to resynthesize acoustic instruments remains an obstacle for real-time use in coding, however. Likewise, parametric models are flexible within the class for which they are designed, but an arbitrary performance may contain elements not supported by the model. Such a performance cannot be reproduced by the pure physical model and may, indeed, result in poor model calibration for the performance as a whole.

This preliminary study of the WLPCMX topology offers a compromise between the pure physical-model-based and pure signal-based approaches. For the case of the monophonic plucked string considered in this study, a lower spectral distortion was realized using the model-based predictor. Because more bits were assigned to the attack portion of the string recording, the actual long-term bit rate of the codec is related to the frequency of plucks; its worst case is limited by the rate of the WLP stage (assuming a loop gain multiplier of 0), and its best case, given a close match between model and recording, approaches the physical-model case. For recordings that were well modelled by the string model, such as the electric guitar at E1 and E2 and the E3 classical guitar sample, subjective tests suggested that equivalent quality could be achieved with 1 bit per sample less than in the WLPC case. Limitations of the string model prevent it from capturing all the nuances of the recording, such as the rattling of the classical guitar's string, but these unmodelled features are successfully encoded by the WLP stage. Because the predictor reflects the acoustics of a plucked string, the degradation in quality at lower bit rates sounds more natural.

6. CONCLUSIONS

The implementation of a two-stage audio codec using a physical model predictor followed by WLP was simulated, and the subjective and objective sound quality was analyzed. Two codec topologies were investigated.
In the first topology, the instrument response was estimated by windowing the first milliseconds of the excitation, and this estimate was subtracted from the target sample, with the difference being coded using WLP. In the second topology, the excitation to the plucked-string physical model was coded using WLP before being reconstructed by reapplying the coded excitation to the string model shown in Figure 1. Tests revealed that the limitations of the physical model caused the model error in the first topology to be of greater amplitude than the target sound, and the codec therefore operated with inferior quality to the WLPC control case. The second topology, however, showed promise in subjective tests, whereby a decrease in the bits allocated to the coding of the decay segment of the excitation reduced the similarity of the timbre without changing its essential likeness to a plucked string. A further simulation was performed wherein the loop gain of the physical model was reduced in order to limit the propagation of the excitation's quantization error due to the physical model's long time constant. This improved objective measures of the sound quality beyond those achieved by the comparable WLPC design while maintaining the codec's advantages exposed by the subjective tests. Whereas the target plucks became noisy when coded at 1 bit per sample using WLPC, the allocation of quantization noise to higher harmonics in the second topology meant that the same plucks took on a drier, brighter timbre when coded at the same bit rate. WLP can easily be performed in real time, and it could thus be applied to coding model excitations both in audio coders and in real-time instrument synthesizers. Analysis of polyphonic scenes is still beyond the scope of the model, however, and the realization of highly polyphonic instruments would entail a corresponding increase in the computational demands of the WLP in the decoding of the excitation.
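The structure of the second topology — subtract a physical-model prediction, code the residual, then reapply the prediction at the decoder — can be outlined as below. A plain uniform quantizer stands in for the WLP residual coder, and all names are illustrative; the real codec's warped predictor and BITSA/BITSD bit allocation are not reproduced here:

```python
def quantize(x, bits, x_max=1.0):
    # Uniform quantizer over [-x_max, x_max], standing in for the WLP
    # residual coding stage.
    step = 2.0 * x_max / (2 ** bits)
    return [max(-x_max, min(x_max, round(v / step) * step)) for v in x]

def encode(target, model_prediction, bits):
    # Stage 1: subtract the physical-model prediction from the target.
    # Stage 2: code the residual at the given bit budget.
    residual = [t - p for t, p in zip(target, model_prediction)]
    return quantize(residual, bits)

def decode(coded_residual, model_prediction):
    # Reapply the model prediction to reconstruct the signal.
    return [p + r for p, r in zip(model_prediction, coded_residual)]
```

The better the string model predicts the pluck, the smaller the residual and hence the smaller the quantization error at a fixed bit budget — the source of the coding gain over plain WLPC, and also of the error magnification when the model mismatches.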
Future exploration of the two-stage physical model/WLP coding scheme should investigate more accurate physical models, such as the vertical/transverse string model mentioned in Section 1, which might allow the first topology investigated in this paper to realize coding gains. Implementation of more complicated models reintroduces, however, the difficulty of accurately parameterizing them, though this increased complexity is partially offset by the increased tolerance for error that the excitation coding allows.

ACKNOWLEDGMENTS

The authors would like to thank the Japanese Ministry of Education, Culture, Sports, Science and Technology for funding this research. They are also grateful to Professor Yoshikawa for his guidance throughout, and to the students of the Signal Processing Lab for their assistance, particularly in making the guitar recordings.

REFERENCES

[1] K. Karplus and A. Strong, "Digital synthesis of plucked-string and drum timbres," Computer Music Journal, vol. 7, no. 2, pp. 43-55, 1983.
[2] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74-91, 1992.
[3] C. Erkut, M. Karjalainen, P. Huang, and V. Välimäki, "Acoustical analysis and model-based sound synthesis of the kantele," Journal of the Acoustical Society of America, vol. 112, no. 4, pp. 1681-1691, 2002.
[4] V. Välimäki, M. Laurson, C. Erkut, and T. Tolonen, "Model-based synthesis of the clavichord," in Proc. International Computer Music Conference, pp. 50-53, Berlin, Germany, August-September 2000.
[5] V. Välimäki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766-778, 1998.
[6] M. Karjalainen, V. Välimäki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17-32, 1998.
[7] A. Cemgil and C. Erkut, "Calibration of physical models using artificial neural networks with application to plucked string instruments," in Proc.
International Symposium on Musical Acoustics, Edinburgh, UK, August 1997.
[8] J. Riionheimo and V. Välimäki, "Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 8, pp. 791-805, 2003.
[9] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, "Structured audio: Creation, transmission, and rendering of parametric sound representations," Proceedings of the IEEE, vol. 86, no. 5, pp. 922-940, 1998.

[10] E. D. Scheirer, "Structured audio, Kolmogorov complexity, and generalized audio coding," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 914-931, 2001.
[11] M. Karjalainen, V. Välimäki, and Z. Jánosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference, pp. 56-63, Tokyo, Japan, 1993.
[12] A. Härmä, "Audio coding with warped predictive methods," Licentiate's thesis, Helsinki University of Technology, Espoo, Finland.
[13] V. Välimäki and T. Tolonen, "Multirate extensions for model-based synthesis of plucked string instruments," in Proc. International Computer Music Conference, pp. 244-247, Thessaloniki, Greece, September 1997.
[14] D. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56-69, 1983.
[15] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331-353, 1996.
[16] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011-1031, 2000.
[17] K. Brandenburg and G. Stoll, "ISO/MPEG-1 audio: A generic standard for coding of high-quality digital audio," Journal of the Audio Engineering Society, vol. 42, no. 10, pp. 780-792, 1994.
[18] K. Tsutsui, H. Suzuki, O. Shimoyoshi, M. Sonohara, K. Akagiri, and R. M. Heddle, "ATRAC: Adaptive transform acoustic coding for MiniDisc," reprinted from the 93rd Audio Engineering Society Convention, San Francisco, Calif, USA, 1992.
[19] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, Tokyo, Japan, September 1993.

Alexis Glass received his B.S.E.E. from Queen's University, Kingston, Ontario, Canada. During his bachelor's degree, he interned for nine months at Toshiba Semiconductor in Kawasaki, Japan. After graduating, he worked for a defense firm in Kanata, Ontario, and a videogame developer in Montreal, Quebec, before winning a Monbusho Scholarship from the Japanese government to pursue graduate studies at Kyushu Institute of Design (KID, now Kyushu University, Graduate School of Design). In 2001, he received his Master of Design from KID and is currently a doctoral candidate there. His interests include sound, music signal processing, instrument modelling, and electronic music.

Kimitoshi Fukudome was born in Kagoshima, Japan. He received his B.E., M.E., and Dr.E. degrees from Kyushu University in 1966, 1968, and 1988, respectively. He joined Kyushu Institute of Design's Department of Acoustic Design as a Research Associate in 1971 and has been an Associate Professor there since 1992. With the October 1, 2003 integration of Kyushu Institute of Design into Kyushu University, his affiliation has changed to the Department of Acoustic Design, Faculty of Design, Kyushu University. His research interests include digital signal processing for 3D sound systems, binaural stereophony, engineering acoustics, and direction of arrival (DOA) estimation with sphere-baffled microphone arrays.


More information

NONLINEAR STRUCTURAL DYNAMICS USING FE METHODS

NONLINEAR STRUCTURAL DYNAMICS USING FE METHODS NONLINEAR STRUCTURAL DYNAMICS USING FE METHODS Nonlinear Structural Dynamics Using FE Methods emphasizes fundamental mechanics principles and outlines a modern approach to understanding structural dynamics.

More information

HS AP Physics 1 Science

HS AP Physics 1 Science Scope And Sequence Timeframe Unit Instructional Topics 5 Day(s) 20 Day(s) 5 Day(s) Kinematics Course AP Physics 1 is an introductory first-year, algebra-based, college level course for the student interested

More information

Oscillations and Waves

Oscillations and Waves Oscillations and Waves Oscillation: Wave: Examples of oscillations: 1. mass on spring (eg. bungee jumping) 2. pendulum (eg. swing) 3. object bobbing in water (eg. buoy, boat) 4. vibrating cantilever (eg.

More information

Course Name: AP Physics. Team Names: Jon Collins. Velocity Acceleration Displacement

Course Name: AP Physics. Team Names: Jon Collins. Velocity Acceleration Displacement Course Name: AP Physics Team Names: Jon Collins 1 st 9 weeks Objectives Vocabulary 1. NEWTONIAN MECHANICS and lab skills: Kinematics (including vectors, vector algebra, components of vectors, coordinate

More information

EFFECTS OF PRESTRESSES ON NATURAL FREQUENCIES OF A BUCKLED WOODEN PLATE: A NUMERICAL AND EXPERIMENTAL INVESTIGATION

EFFECTS OF PRESTRESSES ON NATURAL FREQUENCIES OF A BUCKLED WOODEN PLATE: A NUMERICAL AND EXPERIMENTAL INVESTIGATION EFFECTS OF PRESTRESSES ON NATURAL FREQUENCIES OF A BUCKLED WOODEN PLATE: A NUMERICAL AND EXPERIMENTAL INVESTIGATION Adrien Mamou-Mani, Sylvie Le Moyne, Joël Frelat, Charles Besnainou, François Ollivier

More information

VIBRATION ENERGY FLOW IN WELDED CONNECTION OF PLATES. 1. Introduction

VIBRATION ENERGY FLOW IN WELDED CONNECTION OF PLATES. 1. Introduction ARCHIVES OF ACOUSTICS 31, 4 (Supplement), 53 58 (2006) VIBRATION ENERGY FLOW IN WELDED CONNECTION OF PLATES J. CIEŚLIK, W. BOCHNIAK AGH University of Science and Technology Department of Robotics and Mechatronics

More information

Research Article The Microphone Feedback Analogy for Chatter in Machining

Research Article The Microphone Feedback Analogy for Chatter in Machining Shock and Vibration Volume 215, Article ID 976819, 5 pages http://dx.doi.org/1.1155/215/976819 Research Article The Microphone Feedback Analogy for Chatter in Machining Tony Schmitz UniversityofNorthCarolinaatCharlotte,Charlotte,NC28223,USA

More information

: TEACHING DIFFERENTIAL EQUATIONS WITH AN ENGINEERING FOCUS

: TEACHING DIFFERENTIAL EQUATIONS WITH AN ENGINEERING FOCUS 2006-915: TEACHING DIFFERENTIAL EQUATIONS WITH AN ENGINEERING FOCUS Stephen Pennell, University of Massachusetts-Lowell Stephen Pennell is a Professor in the Department of Mathematical Sciences at the

More information

Sound Radiation Of Cast Iron

Sound Radiation Of Cast Iron Purdue University Purdue e-pubs International Compressor Engineering Conference School of Mechanical Engineering 2002 Sound Radiation Of Cast Iron N. I. Dreiman Tecumseh Products Company Follow this and

More information

Lecture 4 Notes: 06 / 30. Energy carried by a wave

Lecture 4 Notes: 06 / 30. Energy carried by a wave Lecture 4 Notes: 06 / 30 Energy carried by a wave We want to find the total energy (kinetic and potential) in a sine wave on a string. A small segment of a string at a fixed point x 0 behaves as a harmonic

More information

CIVL 8/7117 Chapter 12 - Structural Dynamics 1/75. To discuss the dynamics of a single-degree-of freedom springmass

CIVL 8/7117 Chapter 12 - Structural Dynamics 1/75. To discuss the dynamics of a single-degree-of freedom springmass CIV 8/77 Chapter - /75 Introduction To discuss the dynamics of a single-degree-of freedom springmass system. To derive the finite element equations for the time-dependent stress analysis of the one-dimensional

More information

Vibration analysis of free isotropic cracked plates

Vibration analysis of free isotropic cracked plates Computational Methods and Experimental Measurements XII 475 Vibration analysis of free isotropic cracked plates M. Alfano & L. Pagnotta Department of Mechanical Engineering, University of Calabria, Italy

More information

Lecture 9: Waves in Classical Physics

Lecture 9: Waves in Classical Physics PHYS419 Lecture 9 Waves in Classical Physics 1 Lecture 9: Waves in Classical Physics If I say the word wave in no particular context, the image which most probably springs to your mind is one of a roughly

More information

Chapter 13, Vibrations and Waves. 1. A large spring requires a force of 150 N to compress it only m. What is the spring constant of the spring?

Chapter 13, Vibrations and Waves. 1. A large spring requires a force of 150 N to compress it only m. What is the spring constant of the spring? CHAPTER 13 1. A large spring requires a force of 150 N to compress it only 0.010 m. What is the spring constant of the spring? a. 125 000 N/m b. 15 000 N/m c. 15 N/m d. 1.5 N/m 2. A 0.20-kg object is attached

More information

G r a d e 1 1 P h y s i c s ( 3 0 s ) Final Practice exam

G r a d e 1 1 P h y s i c s ( 3 0 s ) Final Practice exam G r a d e 1 1 P h y s i c s ( 3 0 s ) Final Practice exam G r a d e 1 1 P h y s i c s ( 3 0 s ) Final Practice Exam Instructions The final exam will be weighted as follows: Modules 1 6 15 20% Modules

More information

Modification of a Sophomore Linear Systems Course to Reflect Modern Computing Strategies

Modification of a Sophomore Linear Systems Course to Reflect Modern Computing Strategies Session 3220 Modification of a Sophomore Linear Systems Course to Reflect Modern Computing Strategies Raymond G. Jacquot, Jerry C. Hamann, John E. McInroy Electrical Engineering Department, University

More information

OBJECT-BASED SOUND SYNTHESIS. Augusto Sarti, Stefano Tubaro

OBJECT-BASED SOUND SYNTHESIS. Augusto Sarti, Stefano Tubaro OBJECT-BASED SOUND SYNTHESIS Augusto Sarti, Stefano Tubaro Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza L. Da Vinci 32, 2033 Milano, Italy E-mail: sarti/tubaro@elet.polimi.it

More information

Prediction of the Sound Reduction Index: Application to Monomurs Walls

Prediction of the Sound Reduction Index: Application to Monomurs Walls paper ID: 15 /p.1 Prediction of the Sound Reduction Index: Application to Monomurs Walls Thomas Buzzi, Cécile Courné, André Moulinier, Alain Tisseyre TISSEYRE & ASSOCIES, www.planete-acoustique.com 16

More information

I WAVES (ENGEL & REID, 13.2, 13.3 AND 12.6)

I WAVES (ENGEL & REID, 13.2, 13.3 AND 12.6) I WAVES (ENGEL & REID, 13., 13.3 AND 1.6) I.1 Introduction A significant part of the lecture From Quantum to Matter is devoted to developing the basic concepts of quantum mechanics. This is not possible

More information

Chapter 11. Vibrations and Waves

Chapter 11. Vibrations and Waves Chapter 11 Vibrations and Waves Driven Harmonic Motion and Resonance RESONANCE Resonance is the condition in which a time-dependent force can transmit large amounts of energy to an oscillating object,

More information

Corso di Laurea in LOGOPEDIA FISICA ACUSTICA MOTO OSCILLATORIO

Corso di Laurea in LOGOPEDIA FISICA ACUSTICA MOTO OSCILLATORIO Corso di Laurea in LOGOPEDIA FISICA ACUSTICA MOTO OSCILLATORIO Fabio Romanelli Department of Mathematics & Geosciences University of Trieste Email: romanel@units.it What is an Oscillation? Oscillation

More information

Chapter 9. Reflection, Refraction and Polarization

Chapter 9. Reflection, Refraction and Polarization Reflection, Refraction and Polarization Introduction When you solved Problem 5.2 using the standing-wave approach, you found a rather curious behavior as the wave propagates and meets the boundary. A new

More information

AP PHYSICS 1 BIG IDEAS AND LEARNING OBJECTIVES

AP PHYSICS 1 BIG IDEAS AND LEARNING OBJECTIVES AP PHYSICS 1 BIG IDEAS AND LEARNING OBJECTIVES KINEMATICS 3.A.1.1: The student is able to express the motion of an object using narrative, mathematical, and graphical representations. [SP 1.5, 2.1, 2.2]

More information

Elimination of Delay-Free Loops in Discrete-Time Models of Nonlinear Acoustic Systems

Elimination of Delay-Free Loops in Discrete-Time Models of Nonlinear Acoustic Systems IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 8, NO. 5, SEPTEMBER 2000 597 Elimination of Delay-Free Loops in Discrete-Time Models of Nonlinear Acoustic Systems Gianpaolo Borin, Giovanni De Poli,

More information

ROOM RESONANCES USING WAVE BASED GEOMET- RICAL ACOUSTICS (WBGA)

ROOM RESONANCES USING WAVE BASED GEOMET- RICAL ACOUSTICS (WBGA) ROOM RESONANCES USING WAVE BASED GEOMET- RICAL ACOUSTICS (WBGA) Panos Economou, Panagiotis Charalampous P.E. Mediterranean Acoustics Research & Development Ltd, Cyprus email: panos@pemard.com Geometrical

More information

Work. Work and Energy Examples. Energy. To move an object we must do work Work is calculated as the force applied to the object through a distance or:

Work. Work and Energy Examples. Energy. To move an object we must do work Work is calculated as the force applied to the object through a distance or: Work To move an object we must do work Work is calculated as the force applied to the object through a distance or: W F( d) Work has the units Newton meters (N m) or Joules 1 Joule = 1 N m Energy Work

More information

Topic 4 &11 Review Waves & Oscillations

Topic 4 &11 Review Waves & Oscillations Name: Date: Topic 4 &11 Review Waves & Oscillations 1. A source produces water waves of frequency 10 Hz. The graph shows the variation with horizontal position of the vertical displacement of the surface

More information

FROM NEAR FIELD TO FAR FIELD AND BEYOND

FROM NEAR FIELD TO FAR FIELD AND BEYOND More info about this article: h Czech Society for Nondestructive Testing 32 nd European Conference on Acoustic Emission Testing Prague, Czech Republic, September 07-09, 2016 FROM NEAR FIELD TO FAR FIELD

More information

TORSION PENDULUM: THE MECHANICAL NONLINEAR OSCILLATOR

TORSION PENDULUM: THE MECHANICAL NONLINEAR OSCILLATOR TORSION PENDULUM: THE MECHANICAL NONLINEAR OSCILLATOR Samo Lasič, Gorazd Planinšič,, Faculty of Mathematics and Physics University of Ljubljana, Slovenija Giacomo Torzo, Department of Physics, University

More information

r1 (D) r 2 = 2 r 1 (E) r 2 = 4r 1 2

r1 (D) r 2 = 2 r 1 (E) r 2 = 4r 1 2 April 24, 2013; Page 2 PART A FOR EACH OF THE FOLLOWING QUESTIONS IN PART A, ENTER THE MOST APPROPRIATE RESPONSE ON THE OMR SHEET. A1. A thin rod of mass M and length L is initially able to rotate through

More information

PHY 123 Lab 8 - Standing Waves

PHY 123 Lab 8 - Standing Waves 1 PHY 123 Lab 8 - Standing Waves (updated 10/29/13) The purpose of this lab is to study (transverse) standing waves on a vibrating string. Important! You need to print out the 2 page worksheet you find

More information

Physics 142 Mechanical Waves Page 1. Mechanical Waves

Physics 142 Mechanical Waves Page 1. Mechanical Waves Physics 142 Mechanical Waves Page 1 Mechanical Waves This set of notes contains a review of wave motion in mechanics, emphasizing the mathematical formulation that will be used in our discussion of electromagnetic

More information

Chapter 16 Lectures. Oscillatory Motion and Waves

Chapter 16 Lectures. Oscillatory Motion and Waves Chapter 16 Lectures January-06-18 5:48 PM Oscillatory Motion and Waves Oscillations are back-and-forth motions. Sometimes the word vibration is used in place of oscillation; for our purposes, we can consider

More information

NUMERICAL MODELLING OF RUBBER VIBRATION ISOLATORS

NUMERICAL MODELLING OF RUBBER VIBRATION ISOLATORS NUMERICAL MODELLING OF RUBBER VIBRATION ISOLATORS Clemens A.J. Beijers and André de Boer University of Twente P.O. Box 7, 75 AE Enschede, The Netherlands email: c.a.j.beijers@utwente.nl Abstract An important

More information

AP Physics C Mechanics

AP Physics C Mechanics 1 AP Physics C Mechanics Simple Harmonic Motion 2015 12 05 www.njctl.org 2 Table of Contents Click on the topic to go to that section Spring and a Block Energy of SHM SHM and UCM Simple and Physical Pendulums

More information

Model tests and FE-modelling of dynamic soil-structure interaction

Model tests and FE-modelling of dynamic soil-structure interaction Shock and Vibration 19 (2012) 1061 1069 1061 DOI 10.3233/SAV-2012-0712 IOS Press Model tests and FE-modelling of dynamic soil-structure interaction N. Kodama a, * and K. Komiya b a Waseda Institute for

More information

MEC3403. Dynamics II. Introductory book. Faculty of Engineering and Surveying

MEC3403. Dynamics II. Introductory book. Faculty of Engineering and Surveying MEC3403 Dynamics II Faculty of Engineering and Surveying Introductory book Semester 2 2010 Published by University of Southern Queensland Toowoomba Queensland 4350 Australia http://www.usq.edu.au University

More information

MEASUREMENT OF INPUT IMPEDANCE OF AN ACOUSTIC BORE WITH APPLICATION TO BORE RECONSTRUCTION

MEASUREMENT OF INPUT IMPEDANCE OF AN ACOUSTIC BORE WITH APPLICATION TO BORE RECONSTRUCTION MEASUREMENT OF INPUT IMPEDANCE OF AN ACOUSTIC BORE WITH APPLICATION TO BORE RECONSTRUCTION Maarten van Walstijn Murray Campbell David Sharp Department of Physics and Astronomy, University of Edinburgh,

More information

a) period will increase b) period will not change c) period will decrease

a) period will increase b) period will not change c) period will decrease Physics 101 Tuesday 11/3/11 Class 21" Chapter 13.4 13.7" Period of a mass on a spring" Energy conservation in oscillations" Pendulum" Damped oscillations" " A glider with a spring attached to each end

More information

Vibrations Qualifying Exam Study Material

Vibrations Qualifying Exam Study Material Vibrations Qualifying Exam Study Material The candidate is expected to have a thorough understanding of engineering vibrations topics. These topics are listed below for clarification. Not all instructors

More information

Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras

Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras Dynamics of Ocean Structures Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras Module - 1 Lecture - 13 Undamped and Damped Systems II (Refer Slide

More information

Physics 1. and graphical representations. Express the motion of an object using narrative, mathematical,

Physics 1. and graphical representations. Express the motion of an object using narrative, mathematical, Theme Strand Topics Physics The Marriage of Motion and Energy Kinematics Motion in one direction Motion in two directions Content Statement Learning Targets: I can describe the similarities and differences

More information

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : ,

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : , PW W A V E S Syllabus : Wave motion. Longitudinal and transverse waves, speed of wave. Dplacement relation for a progressive wave. Principle of superposition of waves, reflection of waves, Standing waves

More information

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : ,

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : , PW W A V E S PW CONCEPTS C C Equation of a Travelling Wave The equation of a wave traveling along the positive x-ax given by y = f(x vt) If the wave travelling along the negative x-ax, the wave funcion

More information

Foundation Engineering Dr. Priti Maheshwari Department Of Civil Engineering Indian Institute Of Technology, Roorkee

Foundation Engineering Dr. Priti Maheshwari Department Of Civil Engineering Indian Institute Of Technology, Roorkee Foundation Engineering Dr. Priti Maheshwari Department Of Civil Engineering Indian Institute Of Technology, Roorkee Module - 02 Lecture - 15 Machine Foundations - 3 Hello viewers, In the last class we

More information