Digital Sound Synthesis by Physical Modelling


Symposium on Image and Signal Processing and Analysis (ISPA 01), Pula, Croatia, June 2001

Digital Sound Synthesis by Physical Modelling

Rudolf Rabenstein and Lutz Trautmann
Telecommunications Laboratory, University of Erlangen-Nürnberg
Cauerstr. 7, D-91058 Erlangen
{rabe, traut}@lnt.de

Abstract

After recent advances in the coding of natural speech and audio signals, the synthetic creation of musical sounds is also gaining importance. Various methods for waveform synthesis are currently used in digital instruments and software synthesizers. A family of new synthesis methods is based on physical models of vibrating structures (string, drum, etc.) rather than on descriptions of the resulting waveforms. This article describes various approaches to digital sound synthesis in general and discusses physical modelling methods in particular. Physical models in the form of partial differential equations are presented. Then it is shown how to derive discrete-time models which are suitable for real-time DSP implementation. Applications to computer music are given as examples.

1. Introduction

The last 50 years have seen tremendous advances in electrical, electronic, and digital information transmission and processing. From the very beginning, the available technology has been used not only to send written or spoken messages but also for more entertaining purposes: to make music! An early example is the Musical Telegraph of Elisha Gray in 1876, based on the telephone technology of that time. Later examples used vacuum tube oscillators throughout the first half of the last century, transistorized analog synthesizers in the 1960s, and the first digital instruments in the 1970s. By the end of the last century, digital soundcards with various methods for sound reproduction and generation were commonplace in any personal computer. The development is advancing rapidly. One driving force is certainly the availability of ever more powerful hardware.
Cheap memory allows sound samples to be stored in high quality and astonishing variety. The increase in processing power makes it possible to compute sounds in real time. But new algorithms and more powerful software also give desktop computers the functionality of stereo equipment or sound studios. One example is new coding schemes for high quality audio. Together with rising bitrates for file transmission on the internet, they have made digital music recordings freely available on the world wide web. Another example is the combination of high performance sound cards, high capacity and fast access hard disks, and sophisticated software for audio recording, processing, and mixing. A high-end personal computer equipped with these components and programs provides the full functionality of a small home recording studio. While more powerful hardware and software turn a single computer into a music machine, advances in standardization pave the way to networked solutions. The benefits of audio coding standards have already been mentioned. But the new MPEG-4 video and audio coding standard provides not only natural but also synthetic audio coding. This means that not only compressed samples of recorded music can be transmitted, but also digital scores similar to MIDI, together with algorithms for the sound generation. Finally, the concept of Structured Audio allows an acoustic scene to be broken down into its components, which can be transmitted and manipulated independently. While natural audio coding is a well researched subject with widespread applications, the creation of synthetic high quality music is a topic of active development. For some time, applications were confined to the refinement of digital musical instruments and software synthesizers. Recently, digital sound synthesis has found its way into the MPEG-4 video and audio coding standard. The most recent and maybe most interesting family of synthesis algorithms is based on physical models of vibrating structures.
This article will highlight some of the methods for digital sound synthesis with special emphasis on physical modelling. Section 2 presents a survey of synthesis methods. Two algorithms for physical modelling are described in section 3. Applications to computer music are given in section 4.

2. Digital Sound Synthesis

2.1. Overview

Four methods for the synthesis of musical sounds are presented in ascending order of modelling complexity [5]. The first method, wavetable synthesis, is based on samples of recorded sounds with little consideration of their physical nature. Spectral synthesis creates sounds from models of their time-frequency behaviour. The parameters of these models are derived from descriptions of the desired waveforms. Nonlinear synthesis allows the creation of spectrally rich sounds with very modest complexity of the synthesis algorithms. In contrast to spectral synthesis, the parameters of these nonlinear models are not related to the produced waveforms in a straightforward way. The most advanced method, physical modelling, is based on models of the physical properties of the vibrating structure which produces the sound. Rather than imitating a waveform, these models simulate the physical behaviour of a string, drum, etc. Such simulations are numerically demanding, but modern hardware allows real-time implementations under practical conditions.

2.2. Wavetable Synthesis

The most widespread method for sound generation in digital musical instruments today is wavetable synthesis, also simply called sampling. Here, the term wavetable synthesis will be used, since sampling strictly denotes the time discretization of continuous signals in the sense of signal theory. In wavetable synthesis, recorded or synthesized musical events are stored in the internal memory and played back on demand. Therefore wavetable synthesis does not require a parameterized sound source model. It consists only of a database of digitized musical events (the wavetable) and a set of playback tools. The musical events are typically temporal parts of single notes recorded from various instruments and at various frequencies. The musical events must be long enough to capture the attack of the real sounds as well as a portion of the sustain.
Capturing the attack is necessary to reproduce the typical sound of an instrument. Recording a sufficiently long sustain period avoids a strict periodicity during playback. The playback tools consist of various techniques for sound variation during reproduction. The most important components of this toolset are pitch shifting, looping, enveloping, and filtering. They are discussed here only briefly; see [3, chapter 8] and [5] for a more detailed treatment.

Pitch shifting allows a wavetable to be played at different pitches. Recording notes at all possible frequencies for all instruments of interest would require excessive memory. To avoid this, only a subset of the frequency range is recorded. Missing keys are reconstructed from the closest recorded frequency by pitch variation during playback. Pitch shifting is accomplished by sample rate conversion techniques. Pitch variation is only possible within a range of a few semitones without noticeable alteration of the sound characteristics (Mickey Mouse effect).

Looping stands for recursive read-out of the wavetable during playback. It is applied due to memory limitations as well as length variations of the played notes. As mentioned above, only a certain period is recorded, long enough to capture the richness of the sound. This period is extended by looping to produce the required duration of the tone. Care has to be taken to avoid discontinuities at the loop boundaries.

Enveloping denotes the application of a time-varying gain function to the looped wavetable. Since the typical attack-decay-sustain-release (ADSR) envelope of an instrument is destroyed by looping, it can be reconstructed or modified by enveloping.

Filtering modifies the time-dependent spectral content of a note as enveloping changes its amplitude. Usually recursive digital filters of low order with adjustable coefficients are used.
This allows not only a better sound variability than present in the originally recorded wavetables but also time-varying effects which are not possible with acoustic instruments. Despite these playback tools for sound alteration (and others not mentioned here), the sound variation of wavetable synthesis is limited by the recorded material. However, with the availability of cheap memory, wavetable synthesis has become popular for two reasons: low computational cost and ease of operation. More advanced synthesis techniques need more processing power and require more skill of the performing musician to fully exploit their advantages.

2.3. Spectral Synthesis

While wavetable synthesis is based on sampled waveforms in the time domain, spectral synthesis produces sounds from frequency domain models. There is a variety of methods based on a common generic signal representation: the superposition of basis functions \psi_l(t) with time-varying amplitudes F_l(t),

f(t) = \sum_l F_l(t) \, \psi_l(t).   (1)

Only a short description of the main approaches is given here, based on [3, chapter 9], [5], and [2]. Practical implementations often consist of combinations of these methods.
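Equation (1) can be read directly as an algorithm: the output is computed sample by sample as a sum of basis functions weighted by time-varying envelopes. A minimal sketch, not from the paper (the function names, the sinusoidal basis, and all parameter values are our own illustration):

```python
import math

def synthesize(basis, amps, n_samples, fs=8000):
    """Generic spectral synthesis per eq. (1):
    f(t) = sum_l F_l(t) * psi_l(t), evaluated at t = n/fs."""
    return [sum(F(n / fs) * psi(n / fs) for F, psi in zip(amps, basis))
            for n in range(n_samples)]

# Two harmonics of 220 Hz with exponentially decaying amplitudes
basis = [lambda t, k=k: math.sin(2 * math.pi * 220 * k * t) for k in (1, 2)]
amps = [lambda t, a=a: a * math.exp(-3.0 * t) for a in (1.0, 0.5)]
f = synthesize(basis, amps, 800)
```

With a sinusoidal basis this reduces to additive synthesis; other choices of \psi_l(t) (grains, wavetables) give the variants described next.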

Additive Synthesis. In additive synthesis, (1) describes the superposition of sinusoids

f(t) = \sum_l F_l(t) \sin(\phi_l(t)) + n(t).   (2)

Sometimes a noise source n(t) is added to account for the stochastic character which is not modelled well by sinusoids. In the simplest case, each phase function \phi_l(t) is given by a constant frequency and phase term, \phi_l(t) = \omega_l t + \phi_l. In practical synthesis, the time signals in (2) are represented by samples and the synthesized sound is processed in subsequent frames. The time variation of the amplitude and the frequency of the sinusoids is accounted for by changing the values of F_l, \omega_l, and possibly \phi_l from frame to frame.

Subtractive Synthesis. Subtractive synthesis shapes signals by taking away frequency components from a spectrally rich excitation signal. This is achieved by exciting time-varying filters with noise. This approach is closely related to filtering in wavetable synthesis. However, in subtractive synthesis the filter input is a synthetic signal rather than a wavetable. Since harmonic tones cannot be well approximated by filtered noise, subtractive synthesis is mostly used in conjunction with other synthesis methods.

Granular Synthesis. In granular synthesis, the basis functions \psi_l(t) in (1) are chosen to be concentrated in time and frequency. These basis functions are called atoms or grains. Building sounds from such grains is called granular synthesis. Sound grains can be obtained by various means: from windowed sine segments, from wavetables, from Gabor expansions, or with wavelet techniques.

2.4. Nonlinear Synthesis

In the previous sections, linear sound synthesis methods have been described. They range from the computationally cheap wavetable synthesis with low variability to the computationally expensive additive synthesis, where arbitrary access to the components of a sound is possible. Using nonlinear models for sound synthesis leads to computationally cheap methods with rich spectra. The disadvantage of these methods is that the resulting time functions or spectra cannot be calculated analytically in most cases. Also, the effect of parameter changes on the timbre of the sound cannot be predicted, except for very simple schemes. Nevertheless, nonlinear synthesis provides low-cost synthetic sounds with a wide variety of time functions and spectra.

The simplest case of nonlinear synthesis is discussed here. Making the phase term in the sine function time-dependent leads to the frequency modulation (FM) method. In its simplest form, the time function f(t) is given by

f(t) = F(t) \sin(\omega_0 t + \phi(t)).   (3)

[Figure 1. Frequency modulation: a modulator oscillator (VCO 1, frequency \omega_m) drives a carrier oscillator (VCO 2, frequency \omega_0) to produce f(t)]

The implementation consists of at least two coupled oscillators. In (3) the carrier \sin(\omega_0 t) is modulated by the time-dependent modulator \phi(t), such that the frequency becomes time-dependent with \omega(t) = \omega_0 + (\partial/\partial t)\phi(t). If the modulator is also sinusoidal with \phi(t) = \eta \sin(\omega_m t), as shown in Fig. 1, the resulting spectrum consists of the carrier frequency \omega_0 and side frequencies at \omega_0 \pm n\omega_m, n \in \mathbb{N}. The relations between the amplitudes of the discrete frequencies can be varied with the modulation index \eta. They are given by the values of the Bessel functions of order n with argument \eta. Four different FM spectra for a carrier frequency of 1 kHz and different modulator frequencies and modulation indices are shown in Fig. 2. The spectrum for \eta = 1 has a simple rational relation between \omega_0 and \omega_m, resulting in a harmonic spectrum. Increasing the modulation index to \eta = 2 preserves the distance of the frequency lines but increases their number (top right). A slight decrease of \omega_m moves the frequency components closer together and produces a non-harmonic spectrum (bottom left). Spectrally very rich sounds can be produced by combining small values of the modulation frequency \omega_m with high modulation indices, as shown for \eta = 8.
However, due to the dependence on only a few parameters, arbitrary spectra as in additive synthesis cannot be produced. Therefore this method fails to reproduce natural instruments. Nevertheless, FM is frequently used in synthesizers and in sound cards for personal computers, often with more than just two oscillators in a variety of different connections.

2.5. Physical Modelling

Wavetable synthesis, spectral synthesis, and nonlinear synthesis are based on sound descriptions in the time and frequency domain. A family of methods called physical modelling goes one step further by directly modelling the sound production mechanism instead of the sound. Invoking the laws of acoustics and elasticity theory results in the physical description of the main vibrating structures of musical instruments by partial differential equations. Most methods are based on the wave equation, which describes wave propagation in solids and in air [7].

[Figure 2. Typical FM spectra for modulation indices \eta = 1, 2, and 8 and different modulator frequencies; horizontal axes: f in kHz]

Finite Difference Methods. The most direct approach is the discretization of the wave equation by finite difference approximations of the partial derivatives with respect to time and space. However, a faithful reproduction of the harmonic spectrum of an instrument requires small step sizes in time and space. The resulting numerical expense is considerable. The application of this approach to piano strings has, for example, been shown by [1]. A physical motivation of the space discretization is given by the mass-spring models described in [4].

Modal Synthesis. Vibrating structures can also be described in terms of their characteristic frequencies or modes and the associated decay rates. This approach allows the formulation of couplings between different substructures. Except for simple cases, the determination of the eigenmodes can only be conducted by experiments [4].

Digital Waveguides. A well known theoretical approach to the solution of the wave equation in one spatial dimension is the d'Alembert solution. It separates the wave propagation process into a pair of waves travelling in opposite directions without dispersion or losses. This separation is the basis of the digital waveguides described in [9], [3, chapter 10], and [5, chapter 7]. The digital model consists of a bidirectional delay line with coupling coefficients between the taps approximating losses and dispersion. The digital waveguide method has been refined by proper adjustment of the delay lines using fractional delay filters [5]. Applications to string instruments are found in [4] and to woodwind instruments in [7]. Couplings between sections with different wave impedances are modelled by scattering junctions. They approximate the partial reflections at discontinuities. Waveguide methods have also been extended to two and three spatial dimensions, however with a considerable increase in computational demand. Physical modelling by digital waveguides is incorporated into various commercial musical instruments, using appropriate models for excitation (e.g. plucked, struck, and bowed strings) and boundary conditions. Furthermore, it provides a sound basis for the creation of artificial instruments like bowed flutes.

Transfer Function Models. This relatively new approach starts directly at the partial differential equation (PDE) describing the continuous vibrations in a musical instrument. It transforms the PDE with suitable functional transformations into a multidimensional (MD) transfer function model (TFM). For the time variable, the Laplace transformation is used. The spatial transformation depends on the PDE and its boundary conditions. This leads to a generalized Sturm-Liouville type problem whose solutions are the eigenfunctions K(x, \beta_\mu) and the eigenvalues \beta_\mu. They are used in the spatial transformation as the transformation kernel and as the spatial frequency variable [2]. The physical effects modelled by the PDE, like longitudinal and transversal oscillations, loss, and dispersion, are treated analytically with this method. Moreover, the TFM explicitly takes initial and boundary conditions, as well as linear and nonlinear excitation functions, into account. The discretization of this continuous model for computer implementation, based on analog-to-discrete transformations, preserves not only the inherent stability but also the natural frequencies of the oscillating body. All parameters of this method are strictly based on physical parameters (dimensions as well as material parameters), and the output signal is calculated analytically from these parameters. Digital waveguides and multidimensional transfer function models are covered in more detail in section 3.
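The finite difference approach mentioned above can be sketched for the simple wave equation: replacing both second derivatives in (4) by centered differences gives the explicit update y[m,k+1] = 2y[m,k] - y[m,k-1] + \lambda^2 (y[m+1,k] - 2y[m,k] + y[m-1,k]) with \lambda = cT/h, which is stable for \lambda \le 1. A minimal sketch (grid size, step count, and function name are our own illustration, not from the paper):

```python
def fd_string_step(y_now, y_prev, lam2):
    """One explicit finite-difference step for the 1-D wave equation
    with fixed ends (y = 0 at both boundaries).
    lam2 = (c*T/h)**2 must be <= 1 for stability (CFL condition)."""
    M = len(y_now)
    y_next = [0.0] * M
    for m in range(1, M - 1):
        y_next[m] = (2 * y_now[m] - y_prev[m]
                     + lam2 * (y_now[m + 1] - 2 * y_now[m] + y_now[m - 1]))
    return y_next

# Plucked initial condition: triangular deflection, zero initial velocity
M = 50
y0 = [min(m, M - 1 - m) / (M / 2) for m in range(M)]
y0[0] = y0[-1] = 0.0
y_prev, y_now = y0[:], y0[:]    # equal first two states: zero velocity
for _ in range(200):
    y_prev, y_now = y_now, fd_string_step(y_now, y_prev, 1.0)
```

At \lambda = 1 the scheme reproduces the travelling-wave solution exactly on the grid; smaller \lambda introduces numerical dispersion, which is one reason faithful instrument simulation needs fine grids.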
2.6. Structured Audio

The techniques described above had initially been confined to proprietary hardware and software in musical instruments, or to dedicated programs for the generation of digital music. Although there have been tremendous efforts in the standardization of multimedia services, they were mostly directed at the compression of natural audio and video material. Synthetic sounds were of no concern until the emergence of MPEG-4 standardization. While still advancing the coded representation of natural audiovisual scenes, MPEG-4 introduced a tool for digital sound synthesis under the name of structured audio (SA) [6, 8, 3]. The idea is not to transmit coded sounds as in natural audio, but a highly parametric description of music (such as a musical score) from which the sound is synthesized at the decoder.

Among other tools, structured audio provides score languages to encode the musical parameters (pitch, duration, etc.) as well as methods for sound synthesis. To describe musical scores, the very popular MIDI standard has been included in MPEG-4 structured audio. In addition, a more advanced structured audio score language (SASL) has been created to provide enhanced control of algorithmic and wavetable synthesis. For sound synthesis, two different methods also exist: a programming language for musical synthesis algorithms, the structured audio orchestra language (SAOL), and a standard for the storage and transmission of wavetables, the structured audio sample bank format (SASBF). SAOL is an object oriented programming language with special commands and variable types for real-time sound synthesis. It differs from conventional programming languages by providing three different time scales for the generation of synthetic waveforms. To each time scale belongs a certain data type, such that variables of that type are automatically evaluated at the corresponding rate. The fastest time scale is the a-rate, which is performed at the sampling rate. A medium time scale is the control rate (k-rate) for updating envelopes and other control signals. Typical values for the k-rate are a few cycles per second. An even slower rate is the instrument rate (i-rate) for the initialization of timbre parameters. These may be updated asynchronously, e.g. at the beginning of each note. Furthermore, SAOL provides special high-level sound processing commands like signal and envelope generation, parametric filtering, evaluation of MIDI data, and access to wavetables. Since SAOL is a general purpose language, it can be used to realize any of the sound synthesis algorithms described above. Since SAOL code and sample banks are transmitted together with the score data, the synthesized sound at the decoder will be exactly the same as intended at the encoder side.
This is in contrast to the reproduction of MIDI files, where the sound quality is determined by the individual instruments or wavetables of the listener's audio equipment. Although no real-time structured audio encoder exists to date, it can be expected that synthetic audio reproduction will become an alternative to coded natural audio in the near future [8]. It has already been demonstrated that structured audio provides the highest sound quality at very low bitrates, not attainable by natural audio coders. Of course, synthetic audio is restricted to music which can be completely described by musical scores of some kind. It cannot reproduce sounds without an underlying model, e.g. from microphone recordings. Furthermore, the available synthesis methods and sound production models still have to be refined. Advances can be expected in the field of physical models, which are discussed in the following section.

3. Physical Modelling

Sound synthesis by physical modelling requires two essential steps: the description of a vibrating structure by the principles of physics, and the transformation into a discrete-time, discrete-space model which is suitable for computer implementation. Each step requires certain simplifications and allows variations. These are discussed in the following sections.

3.1. Vibration Models

Deformable bodies may exhibit vibrations in various frequency ranges. The exact description of such vibrations requires breaking up the body into small volume elements. Setting up a balance between the forces of inertia and deformation for each element leads to a PDE for the deflection from the rest position. The derivation of these PDE descriptions for vibrating strings, reeds, membranes, and other elastic bodies can be found e.g. in [6, 10, 7]. Various PDE models for a vibrating string are presented as examples. The time and space coordinates are denoted by t and x. Only one space coordinate is considered for simplicity.
y(x,t) denotes the deflection of the string from the rest position. Furthermore, a number of material constants and shape parameters are required, such as the Young's modulus E and the density \rho of the material, the cross section area A, and the moment of inertia I of the string.

The simplest model results for undamped longitudinal waves. It takes the form of the well-known wave equation with second order derivatives in time and space

E \frac{\partial^2 y(x,t)}{\partial x^2} - \rho \frac{\partial^2 y(x,t)}{\partial t^2} = 0.   (4)

For sound generation, transversal waves are more important, since they transmit energy to the resonance body and the surrounding air. They are characterized by a fourth-order spatial derivative

EI \frac{\partial^4 y(x,t)}{\partial x^4} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} = 0.   (5)

Typically, a string is under tension by a certain force F, resulting in an additional second order term

EI \frac{\partial^4 y(x,t)}{\partial x^4} - F \frac{\partial^2 y(x,t)}{\partial x^2} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} = 0.   (6)

Further refinement of the model by inclusion of rotational vibrations and shear effects finally leads to the Timoshenko equation from elasticity theory. Rather than refining the model in this direction, we extend (6) by an external force per length f(x,t). Furthermore, damping is considered by additional terms with the

decay variables d_1 and d_3:

EI \frac{\partial^4 y}{\partial x^4} - F \frac{\partial^2 y}{\partial x^2} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} + d_1 \frac{\partial y}{\partial t} - d_3 \frac{\partial^3 y}{\partial t\,\partial x^2} = f(x,t). \qquad (7)

Note that for perfectly flexible (E = 0) or very thin (I = 0) strings with no damping (d_1 = d_3 = 0), (7) has the same structure as the wave equation (4), however with different coefficients. Its solution can be written as a superposition of a forward and a backward travelling wave (d'Alembert solution)

y(x,t) = y_l(x + ct) + y_r(x - ct) \qquad (8)

where c = \sqrt{F/(\rho A)} is the propagation speed of the waves. If the first term in (7) does not vanish, the travelling waves are subject to dispersion. If the decay variables d_1 and d_3 are nonzero, the term with d_1 introduces frequency-independent damping and the term with d_3 introduces frequency-dependent losses.

The vibration modes of a string are determined not only by the PDE but also by the boundary conditions at the ends x = x_0 and x = x_1. To solve (7) we need four boundary conditions, because the highest order of the spatial derivatives is four. In most musical instruments the string is fixed at the ends, as shown in Fig. 3.

Figure 3. Mechanical fixing of a string

The boundary conditions for this situation require that the deflection (9) and the curvature (10) at these points are zero [6] (see Fig. 4):

y(x_0, t) = 0, \qquad y(x_1, t) = 0, \qquad (9)
y''(x_0, t) = 0, \qquad y''(x_1, t) = 0. \qquad (10)

The double prime denotes the second-order spatial derivative y'' = \partial^2 y / \partial x^2.

Figure 4. Boundary conditions for a string fixed at both ends

Elastic fixing at the ends of the string, or interface conditions to the sound board, are described in the same way, e.g. by prescribing a certain linear combination of y(x_0, t) and y''(x_0, t). The boundary conditions can also include an excitation function at the boundary, as occurs in woodwind instruments.

Typical excitation modes for musical instruments are plucking or striking the string. These modes are expressed in mathematical terms as initial conditions of the PDE. Because the highest time derivative in (7) is of order two, we need two initial conditions: one for the initial value of the deflection and one for its time derivative. The plucked string is characterized by a given deflection profile at t = 0, while the time derivative is zero [6]. The struck string is given by a first-order time derivative profile, while the deflection is zero at t = 0. The corresponding initial conditions are

y(x, 0) = y_{i0}(x), \qquad (11)
\dot{y}(x, 0) = y_{i1}(x). \qquad (12)

The dot denotes the time derivative \dot{y}(x,t) = \partial y / \partial t. The initial profile of a string plucked close to x_1 is shown in Fig. 5, while Fig. 6 shows the initial velocity of a string struck by a hammer at the position x_e. In general, both y_{i0}(x) and y_{i1}(x) can be specified independently of each other.

Figure 5. Initial conditions for a plucked string

Figure 6. Initial conditions for a struck string

The PDE descriptions presented in this section constitute highly accurate physical models of strings and other vibrating structures. Extensions to two and three space dimensions are straightforward with a more general definition of the spatial differentiation operators. However, a computer implementation of these models requires suitable discretization schemes for the time and space coordinates. Two approaches of practical importance are the digital waveguide method and the functional transformation method. They are described below in detail.
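The travelling-wave solution (8) can be checked numerically: a superposition of a forward and a backward pulse should satisfy the undamped, non-stiff wave equation up to discretization error. A minimal sketch; the Gaussian pulse shape and all parameter values are illustrative assumptions, not taken from the text:

```python
import numpy as np

# d'Alembert solution (8): y(x,t) = y_l(x + c t) + y_r(x - c t).
# We verify that the residual y_tt - c^2 y_xx vanishes up to the
# discretization error of the central differences.
c = 250.0                                  # c = sqrt(F/(rho*A)) in m/s (illustrative)
pulse = lambda u: np.exp(-(u / 0.05) ** 2) # smooth travelling pulse shape
y = lambda x, t: pulse(x + c * t) + pulse(x - c * t)

x = np.linspace(-1.0, 1.0, 2001)
t0, dt = 1e-4, 1e-7
dx = x[1] - x[0]

# second derivatives by central differences
ytt = (y(x, t0 + dt) - 2 * y(x, t0) + y(x, t0 - dt)) / dt**2
yxx = (y(x + dx, t0) - 2 * y(x, t0) + y(x - dx, t0)) / dx**2

residual = np.max(np.abs(ytt - c**2 * yxx)) / np.max(np.abs(ytt))
print(residual)                            # small relative residual
```

The relative residual shrinks further as the space and time grids are refined, as expected for a second-order finite-difference check.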

3.2. Digital Waveguide Method

The digital waveguide method is based on the analogy between wave propagation mechanisms in elastic bodies, reeds, and electromagnetic waveguides and their counterparts in digital delay lines. Its application to computer music has been presented e.g. in [9], [3, chapter 10], [5, chapter 7], and [15, 14, 7].

The simplest vibration model is the wave equation, either (4) for longitudinal waves or the simplification of (7) for transversal waves. A representation of the corresponding travelling-wave solution (8) by a continuous waveguide is shown in Fig. 7.

Figure 7. Continuous waveguide

It can be transformed into an equivalent discrete structure by sampling the solution y(x,t) on a space-time grid with x = mh and t = kT, where m and k are the discrete space and time coordinates and h and T are the corresponding step sizes. When the spatial step size h is set equal to the distance that a wave with propagation speed c travels during one time step T, i.e. h = cT, then the sampled waves y[m,k] = y(mh, kT) are related by

y[m,k] = y_l[m + k] + y_r[m - k]. \qquad (13)

The spatial shifts by the distance h are realized by sampling the continuous waveguide in the x-direction, and the time shifts are realized by delay elements (z^{-1}). The resulting dual delay line structure of a digital waveguide is shown in Fig. 8.

Figure 8. Digital waveguide

It is capable of reproducing the travelling-wave solution of the wave equation, but it does not account for loss or dispersion from more detailed vibration models such as (7). These effects can be approximated by including additional filter elements H(z) in the delay lines (see Fig. 9). These elements may consist of

• a multiplication to model frequency-independent damping,
• a selective filter to model frequency-dependent damping, or
• an allpass to model frequency-dependent delay.

Figure 9. Digital waveguide with loss and dispersion filters

Boundary conditions are accounted for by a proper termination of the dual delay line waveguide. These terminations are also realized by digital filters, as shown in Fig. 10. Similar to the filters H(z) for loss and dispersion, the boundary reflection filters R_l(z) and R_r(z) represent

• a phase shift for an open or closed line termination,
• a real constant 0 < r < 1 as a frequency-independent reflection factor, or
• a digital filter for frequency-dependent reflections.

Figure 10. Digital waveguide with boundary reflection filters

The dual delay line waveguide structure in Fig. 10 gives a complete picture of how wave propagation, loss, dispersion, and reflection at the boundaries can be approximated. In addition, initial conditions are realized by the initial values of the delay elements for k = 0. On the other hand, a practical implementation under real-time constraints calls for some simplifications. First, all delay elements can be combined into a single delay line, represented by a multiple delay element of 2M samples. Then the various filters H(z) at each delay element, as well as the boundary reflection filters R_l(z) and R_r(z), are combined

into a smaller number of different filters. In practical implementations, their transfer functions are not derived directly from H(z), R_l(z), and R_r(z). Instead, they are designed to reproduce a desired waveform. It has turned out that three different filters are adequate to model the correct pitch, the dispersion, and the frequency-dependent damping [14]. Fig. 11 shows the resulting arrangement of a single delay line and three digital filters.

Figure 11. Efficient realization of a digital waveguide with excitation function f(t) and output y(t): the filters H_fd(z), H_disp(z), and H_TP(z) in a loop with the delay z^{-2M}

The functions of these filters are:

H_fd(z): fractional delay filter to produce the exact pitch,
H_disp(z): dispersion filter for deviations from pure wave propagation,
H_TP(z): lowpass filter to model the frequency-dependent damping.

They are assumed to be orthogonal in the sense that they can be designed independently of each other.

3.3. Functional Transformation Method

The functional transformation method derives a discrete model of the vibrating body from its multidimensional transfer function description. Such an approach is well known from the design of digital filters in the one-dimensional case. There, it starts from the description of a continuous-time electrical network by an ordinary differential equation. Application of the Laplace transformation turns the differential equation into a transfer function. Suitable discretization schemes, like the impulse-invariant transformation or others, convert the transfer function of the continuous-time network into the transfer function of a discrete-time system, which is suitable for computer implementation.

It is worthwhile to review the reasons why the Laplace transformation is well suited for the derivation of transfer functions from ordinary differential equations:

1. The Laplace transformation turns time derivatives into multiplications with algebraic functions of the complex frequency variable.

2.
The Laplace transformation turns the initial conditions of a differential equation into additive terms.

By virtue of these properties, the differential equation with initial conditions is converted into an algebraic equation. Solving this equation for the Laplace transform of the output quantity yields the transfer function of the network.

The same approach can also be applied to multidimensional systems described by PDEs. Again, the Laplace transformation can be applied with respect to the time variable. However, the result will still contain derivatives with respect to space. Now assume that a transformation exists for the space variable with properties similar to those of the Laplace transformation for the time variable. Then application of this transformation would turn the boundary-value problem into an algebraic equation. This approach relies on the existence of a transformation with respect to space with differentiation properties similar to those of the Laplace transformation. It has been shown how such transformations can be obtained for the PDEs presented in section 3.1 and many others [12, 11]. The following four-step procedure can then be applied to derive a discrete model of a vibrating body from a PDE model:

1. Application of the Laplace transformation with respect to time removes the time derivatives and turns the initial-boundary-value problem into a boundary-value problem for the space variable.

2. Application of a suitable transformation for the space variable removes the spatial derivatives and turns the boundary-value problem into an algebraic equation.

3. Solution of the algebraic equation for the transform of the solution of the PDE. The resulting MD transfer function is the frequency-domain equivalent of the initial continuous-time, continuous-space PDE description.

4. Discretization of the MD transfer function to obtain a discrete-time, discrete-space model of the vibrating body.
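The two Laplace-transform properties listed above can be verified numerically. The sketch below uses the arbitrary test signal y(t) = e^{-2t}, for which the derivative is known in closed form, and checks that L{y'}(s) = s Y(s) - y(0):

```python
import numpy as np

# Numerical check of the Laplace differentiation property for y(t) = e^{-2t}:
# differentiation in time becomes multiplication by s plus an additive
# initial-value term, L{y'}(s) = s * Y(s) - y(0).
def laplace(f, s, t):
    w = f * np.exp(-s * t)
    return np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(t))   # trapezoidal rule

t = np.linspace(0.0, 40.0, 200001)
s = 1.5
y = np.exp(-2.0 * t)
dy = -2.0 * np.exp(-2.0 * t)           # y'(t), known in closed form

lhs = laplace(dy, s, t)                # L{y'}(s)
rhs = s * laplace(y, s, t) - 1.0       # s * Y(s) - y(0), with y(0) = 1
print(lhs, rhs)                        # both approach -2/(s+2)
```

Exact values are Y(s) = 1/(s+2) and L{y'}(s) = -2/(s+2), so both sides equal -2/3.5 here up to quadrature error.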
This procedure is now demonstrated for the simple PDE model already considered in section 3.2. Then extensions to other PDE models and other types of boundary conditions will be discussed.

The derivation of a transfer function model of a vibrating string is presented by a simple example. We consider the wave equation for a string with fixed ends (boundary conditions (9)) and with initial conditions according to (11), (12):

\ddot{y}(x,t) - c^2 y''(x,t) = 0, \qquad x_0 < x < x_1,
y(x,0) = y_{i0}(x), \qquad x_0 < x < x_1,
\dot{y}(x,0) = y_{i1}(x), \qquad x_0 < x < x_1,
y(x_0,t) = 0,
y(x_1,t) = 0. \qquad (14)

\ddot{y}(x,t) and y''(x,t) denote the second-order derivatives with respect to time and space. Laplace transformation with respect to the time variable, Y(x,s) = \mathcal{L}\{y(x,t)\}, turns this initial-boundary-value problem into a boundary-value problem for the space variable x:

s^2 Y(x,s) - c^2 Y''(x,s) = s\, y_{i0}(x) + y_{i1}(x),
Y(x_0,s) = 0, \qquad (15)
Y(x_1,s) = 0.

Note that the second-order time derivative has turned into a multiplication with s^2 and that the initial values from (14) appear as additive terms on the right-hand side. To remove also the spatial derivative and to account for the boundary conditions, we apply the spatial transformation

\mathcal{T}\{Y(x)\} = \bar{Y}(\beta_\mu) = \int_{x_0}^{x_1} Y(x)\, K(\beta_\mu, x)\, dx \qquad (16)

with the transformation kernel

K(\beta_\mu, x) = \sin\big(\beta_\mu (x - x_0)\big) \qquad (17)

and the discrete spatial frequency

\beta_\mu = \mu \frac{\pi}{x_1 - x_0}, \qquad \mu \in \mathbb{N}. \qquad (18)

This special form of the spatial transformation (finite sine transformation) has been chosen because the transformation kernel (17) fulfils the same boundary conditions as the deflection y(x,t) of the string (compare (14)):

K(\beta_\mu, x_0) = 0, \qquad (19)
K(\beta_\mu, x_1) = 0. \qquad (20)

The transformation kernel K(\beta_\mu, x) represents the spatial eigenfunctions of the string. In other words, the frequency-domain quantities \bar{Y}(\beta_\mu) represent the amplitudes of the corresponding eigenfunctions. The shape of the first three eigenfunctions is shown in Fig. 12.

Figure 12. Shape of the eigenfunctions K(\beta_\mu, x), \mu = 1, 2, 3

Using the conditions (19) and (20) and integration by parts, we can show the differentiation property of the transformation \mathcal{T}:

\mathcal{T}\{Y''(x)\} = \int_{x_0}^{x_1} Y''(x)\, K(\beta_\mu, x)\, dx = -\beta_\mu^2\, \bar{Y}(\beta_\mu). \qquad (22)
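The differentiation property (22) can also be checked numerically, using a test function that fulfils the boundary conditions (19), (20) of the kernel but is not itself an eigenfunction. The quadratic profile and grid below are illustrative choices:

```python
import numpy as np

# Check of the differentiation property (22) with Y(x) = (x - x0)(x1 - x),
# which vanishes at both ends but is not an eigenfunction; Y''(x) = -2.
x0, x1 = 0.0, 0.65
x = np.linspace(x0, x1, 20001)
Y = (x - x0) * (x1 - x)
Ypp = -2.0 * np.ones_like(x)                 # Y''(x)

mu = np.arange(1, 11)
beta = mu * np.pi / (x1 - x0)                # spatial frequencies (18)
K = np.sin(np.outer(beta, x - x0))           # kernel K(beta_mu, x) from (17)

def sine_transform(f):                       # finite sine transform (16)
    w = f[None, :] * K
    return np.sum(0.5 * (w[:, 1:] + w[:, :-1]) * np.diff(x), axis=1)

lhs = sine_transform(Ypp)                    # T{Y''}
rhs = -beta**2 * sine_transform(Y)           # -beta_mu^2 * Ybar(beta_mu)
print(np.max(np.abs(lhs - rhs)))             # agreement up to quadrature error
```

Both sides equal -2(1 - (-1)^mu)/beta_mu in closed form, so the two arrays agree to quadrature accuracy.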
In reverse, the inverse transformation constitutes an expansion of the deflection y(x) in terms of the eigenfunctions K(\beta_\mu, x)

y(x) = \mathcal{T}^{-1}\{\bar{Y}(\beta_\mu)\} = \sum_{\mu} \frac{1}{N_\mu}\, \bar{Y}(\beta_\mu)\, K(\beta_\mu, x) \qquad (21)

with suitable normalization factors N_\mu. Application of (16) and (22) now turns the boundary-value problem (15) into an algebraic equation

s^2 \bar{Y}(\beta_\mu, s) + c^2 \beta_\mu^2\, \bar{Y}(\beta_\mu, s) = s\, \bar{y}_{i0}(\beta_\mu) + \bar{y}_{i1}(\beta_\mu). \qquad (23)

It is straightforward to solve (23) for the transform \bar{Y}(\beta_\mu, s) = \mathcal{T}\{\mathcal{L}\{y(x,t)\}\} of the solution:

\bar{Y}(\beta_\mu, s) = \frac{s}{s^2 + c^2 \beta_\mu^2}\, \bar{y}_{i0}(\beta_\mu) + \frac{1}{s^2 + c^2 \beta_\mu^2}\, \bar{y}_{i1}(\beta_\mu). \qquad (24)

This result is the desired transfer function model. It can also be written in the form

\bar{Y}(\beta_\mu, s) = \bar{G}_{i0}(\beta_\mu, s)\, \bar{y}_{i0}(\beta_\mu) + \bar{G}_{i1}(\beta_\mu, s)\, \bar{y}_{i1}(\beta_\mu) \qquad (25)

with the transfer functions for the initial values \bar{y}_{i0}(\beta_\mu) and \bar{y}_{i1}(\beta_\mu)

\bar{G}_{i0}(\beta_\mu, s) = \frac{s}{s^2 + c^2 \beta_\mu^2}, \qquad (26)
\bar{G}_{i1}(\beta_\mu, s) = \frac{1}{s^2 + c^2 \beta_\mu^2}. \qquad (27)

These transfer functions describe the string in the same way as the initial-boundary-value problem (14). However, in contrast to the original PDE model, they provide a convenient transition to a discrete-time, discrete-space model. First, we note that the spatial frequency \beta_\mu is already a discrete variable. Thus it is sufficient to discretize the time variable. This is accomplished by any analog-to-discrete transformation, e.g. the impulse-, step-, or ramp-invariant or the bilinear transformation. Since the initial values may be seen as the result of impulse excitations, the impulse-invariant transformation

provides optimal results. It turns the second-order transfer functions into

\frac{s}{s^2 + c^2 \beta_\mu^2} \;\rightarrow\; \frac{z^2 - z\cos(\omega_\mu T)}{z^2 - 2z\cos(\omega_\mu T) + 1},

\frac{1}{s^2 + c^2 \beta_\mu^2} \;\rightarrow\; \frac{z\,\sin(\omega_\mu T)/\omega_\mu}{z^2 - 2z\cos(\omega_\mu T) + 1}

with \omega_\mu = c\,\beta_\mu. Then the discrete-time transfer function model takes the form

\bar{Y}_d(\beta_\mu, z) = \frac{z^2 - z\cos(\omega_\mu T)}{z^2 - 2z\cos(\omega_\mu T) + 1}\, \bar{y}_{i0}(\beta_\mu) + \frac{z\,\sin(\omega_\mu T)/\omega_\mu}{z^2 - 2z\cos(\omega_\mu T) + 1}\, \bar{y}_{i1}(\beta_\mu). \qquad (28)

Inverse z-transformation finally gives, for each value of the discrete spatial frequency index \mu, one difference equation in the discrete time variable k:

\bar{y}_d(\beta_\mu, k) = 2\cos(\omega_\mu T)\, \bar{y}_d(\beta_\mu, k-1) - \bar{y}_d(\beta_\mu, k-2) + \bar{y}_{i0}(\beta_\mu)\, \gamma_0(k) + \left( \frac{\sin(\omega_\mu T)}{\omega_\mu}\, \bar{y}_{i1}(\beta_\mu) - \cos(\omega_\mu T)\, \bar{y}_{i0}(\beta_\mu) \right) \gamma_0(k-1) \qquad (29)

where \gamma_0(k) denotes the discrete-time impulse sequence. The structure of this difference equation is shown in Fig. 13. The spatial transforms of the initial value profiles act as inputs for the first time steps. The second-order recursive system computes the time history of the eigenfunction with frequency \omega_\mu.

Figure 13. Second-order difference equation

Since the simplest model, the lossless wave equation, has been assumed, the second-order system in Fig. 13 shows no decay and would ring forever. Furthermore, there exists such a second-order system for each value of \mu, and the final output has to be recovered by the inverse spatial transformation (21) from all partial results \bar{y}_d(\beta_\mu, k) in the audible frequency range. Fig. 14 shows this situation for the more general string model which contains loss terms. Different from Fig. 13, each second-order system now exhibits a decaying oscillation. The individual decay rate for each frequency is determined by the coefficient c_\mu. The partial results from each recursive system are weighted with the values of the eigenfunctions K(\beta_\mu, x_a) at a certain listening position x_a along the string. The final result y_d(x_a, k) represents the sampled oscillation of a string element at position x_a.

Figure 14.
Parallel arrangement of recursive systems

Although based on many simplifications, this example shows that the functional transformation method (FTM) provides an exact and systematic way from the PDE description of a vibrating string to a discrete model suitable for computer implementation. The coefficients of the discrete model are expressed directly in terms of the parameters of the physical model. Extensions of the FTM in many directions have been presented in [12, 11] and other literature cited there. The main topics are briefly discussed:

• The PDEs in section 3.1 exhibit more complex differentiation operators in time and space than the wave equation presented above. Higher-order differential operators with respect to time simply introduce higher-order polynomials in s into the transfer functions, resulting in recursive systems of higher order. Higher-order spatial operators require a careful construction of the spatial transformation \mathcal{T}. The suitable theoretical framework for this task is the theory of special boundary-value problems of the Sturm-Liouville type.

• A more realistic treatment of the fixing of a string or other vibrating bodies requires boundary conditions of the second or third kind to be considered as well. This is also possible in the context of the Sturm-Liouville theory mentioned above.

• So far, only problems with one spatial dimension have been presented for simplicity. The extension to two or three dimensions poses no fundamental difficulty. An example with two spatial dimensions is given in [11].

4. Applications to Computer Music

After the presentation of sound synthesis methods by physical modelling, we now show some applications to computer music. The focus is on the functional transformation method (FTM), because it is the most flexible and most accurate physical modelling method.

4.1. Modelling of Musical Instruments

The parallel arrangement of recursive systems shown in Fig. 14 is the core of a number of musical instrument models. They all share the same physical model of a vibrating string, but they differ in the kind of excitation. The simplest one is a string plucked with a certain force profile. The models for a bowed string and for a string struck by a hammer employ different nonlinear excitation models. Finally, an FTM model of a drum is presented.

Plucked String. The simplest way to model a plucked string is to choose the initial value y_{i0}(x) according to Fig. 5 and to set y_{i1}(x) to zero. A more advanced model uses a certain time and space profile for the excitation by a suitable force per length f(x,t) as in (7). Including this excitation model in the discrete-time system from Fig. 14 results in an excitation of the inputs a(\mu), while the inputs b(\mu) remain unaffected. This simple but versatile excitation model is shown in Fig. 15.

Figure 15. FTM model of a plucked string

Bowed String. The action of the bow on a string is not only time dependent but depends also on the velocity of the string. It can be described as a nonlinear stick-slip action between bow and string. It can be realized with the feedback structure shown in Fig. 16. The input variable is the bow velocity.

Figure 16. FTM model of a bowed string

Hammered String. To model a real hammer-string interaction, the dynamics of the hammer have to be taken into account. The hammer deflection can be modelled by one second-order recursive system. The input force for this recursive system is the negative of the input force for the recursive systems of the string. The hammer interacts nonlinearly with the string because of the nonlinear force-deflection law of the hammer felt. The input variable here is the initial hammer velocity v_h. The algorithm is shown in Fig. 17. The nonlinear operation includes a delay for computability.

Figure 17. FTM model of a hammered string

Vibrations of a Drum. The extension of the above string models to membranes leads to a spatial transformation with two-dimensional eigenfunctions. A result from [11] shows the vibrations of a circular drum excited with a drum stick at different points. As is well known among drummers, an excitation closer to the boundary produces a more interesting sound than an excitation in the center.
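A minimal end-to-end sketch of the plucked-string case: sine-transform a triangular initial deflection (cf. Fig. 5), iterate the lossless second-order recursion (29) for each eigenfrequency omega_mu = c*beta_mu, and weight the partials with K(beta_mu, x_a) as in the parallel structure of Fig. 14. All numerical values (length, fundamental frequency, pluck and listening positions) are illustrative assumptions:

```python
import numpy as np

# FTM plucked string in its simplest (lossless) form: forward transform (16),
# per-mode recursion (29) with ybar_i1 = 0, inverse transform (21) at x_a.
fs = 44100.0
T = 1.0 / fs
L = 0.65                                   # string length (x0 = 0, x1 = L)
c = 2.0 * L * 220.0                        # wave speed chosen so that f_1 = 220 Hz
x = np.linspace(0.0, L, 2001)
xp, xa = 0.8 * L, 0.3 * L                  # pluck and listening positions
y0 = np.where(x <= xp, x / xp, (L - x) / (L - xp))   # triangular pluck profile

mu = np.arange(1, 40)
beta = mu * np.pi / L                      # spatial frequencies (18)
K = np.sin(np.outer(beta, x))
w_int = y0[None, :] * K                    # integrand of the forward transform (16)
Ybar = np.sum(0.5 * (w_int[:, 1:] + w_int[:, :-1]) * np.diff(x), axis=1)

n_samp = int(0.05 * fs)
cw = np.cos(c * beta * T)                  # cos(omega_mu T)
yd = np.zeros((len(mu), n_samp))
yd[:, 0] = Ybar                            # ybar_i0 * gamma_0(k) term of (29)
yd[:, 1] = 2.0 * cw * yd[:, 0] - cw * Ybar # gamma_0(k-1) term, ybar_i1 = 0
for k in range(2, n_samp):
    yd[:, k] = 2.0 * cw * yd[:, k - 1] - yd[:, k - 2]   # recursion (29)

# inverse transformation (21) evaluated at x_a: weights K(beta_mu, x_a) / N_mu
y_out = (np.sin(beta * xa) / (L / 2.0)) @ yd
print(y_out[0])                            # approximately the deflection at x_a
```

The first output sample reproduces the initial deflection at the listening position up to the truncation error of the 39-mode expansion; a loss term per mode (the coefficients c_mu of Fig. 14) would make each partial decay.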

Figure 18. Vibrations of a circular drum with excitation in the center and close to the boundary

4.2. Musical Instrument Morphing

So far, we have assumed that the physical properties of the instrument models do not change with time. This is a reasonable constraint for most real instruments. For a virtual instrument, however, the model parameters are also at the disposal of the player. The FTM described above therefore permits sound variations of the following kind: during operation of an instrument, its physical parameters are slowly changed from one set of parameters to another. As a consequence, the timbre of the instrument changes gradually, e.g. from a guitar string to a xylophone. Of course, combinations of material parameters that cannot occur in real instruments are also possible. This directed change of the sound characteristic of a virtual instrument is called instrument morphing. It requires close control over the physical parameters of the model, as provided by the FTM.

5. Conclusions

Digital sound synthesis is an emerging application of multimedia processing. With ever increasing computing power, real-time implementation of demanding physical models has become feasible. The advantage of physical modelling over conventional sound reproduction or synthesis methods lies in the combination of highly flexible and at the same time physically correct models. The high flexibility allows the player of a virtual instrument to control all parameters of the model during operation, while the physical correctness ensures stable operation and meaningful results for all parameter variations.

Future developments are expected in different directions. The complexity of the models for strings, membranes, bells, tubes, and other objects will certainly increase. Furthermore, the interactions between the different kinds of models for the different components of an instrument have to be established and implemented. Finally, the control of the player over the virtual instrument will be extended by new, human-gesture-based interfaces.

References

[1] A. Chaigne and A. Askenfelt. Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods. J. Acoust. Soc. Am., 95(2):1112-1118, 1994.
[2] M. Goodwin and M. Vetterli. Time-frequency signal models for music analysis, transformation, and synthesis. In Proc. IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, pages 133-136, 1996.
[3] M. Kahrs and K. Brandenburg, editors. Applications of Digital Signal Processing to Audio and Acoustics. Kluwer Academic Publishers, Boston, 1998.
[4] G. De Poli, A. Piccialli, and C. Roads. Representations of Musical Signals. MIT Press, Cambridge, Mass., 1991.
[5] C. Roads, S. Pope, A. Piccialli, and G. De Poli, editors. Musical Signal Processing. Swets & Zeitlinger, Lisse, 1997.
[6] T. Rossing and N. Fletcher. Principles of Vibration and Sound. Springer, New York, 1995.
[7] G. P. Scavone. Digital waveguide modeling of the non-linear excitation of single reed woodwind instruments. In Proc. Int. Computer Music Conference, 1995.
[8] E. D. Scheirer, R. Väänänen, and J. Huopaniemi. AudioBIFS: Describing audio scenes with the MPEG-4 multimedia standard. IEEE Transactions on Multimedia, 1(3):237-250, September 1999.
[9] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74-91, 1992.
[10] M. Tohyama, H. Suzuki, and Y. Ando. The Nature and Technology of Acoustic Space. Academic Press, London, 1995.
[11] L. Trautmann, S. Petrausch, and R. Rabenstein. Physical modeling of drums by transfer function methods. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '01). IEEE, 2001.
[12] L. Trautmann and R. Rabenstein. Digital sound synthesis based on transfer function models. In Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 1999.
[13] R. Väänänen. Synthetic audio tools in MPEG-4 standard. In Proc.
108th AES Convention. Audio Engineering Society, February 2000. Preprint 5080.
[14] V. Välimäki, J. Huopaniemi, and M. Karjalainen. Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society, 44(5):331-353, 1996.
[15] V. Välimäki and T. Takala. Virtual musical instruments - natural sound using physical models. Organised Sound, 1(2):75-86, 1996.
[16] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer. Structured audio: Creation, transmission, and rendering of parametric sound representations. Proceedings of the IEEE, 86(5):922-940, 1998.
[17] L. J. Ziomek. Fundamentals of Acoustic Field Theory and Space-Time Signal Processing. CRC Press, Boca Raton, 1995.
[18] G. Zoia and C. Alberti. An audio virtual DSP for multimedia frameworks. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP '01), 2001.