David Weenink. First semester 2007

1 Institute of Phonetic Sciences University of Amsterdam First semester 2007

6 Definition of Pitch
Definition (ANSI, in psychoacoustics): Pitch is that auditory attribute of sound according to which sounds can be ordered on a scale from low to high.
This ordering is unique only for sine tones! Complex tones in general can have several pitches.
Perception of pitch is a complex phenomenon. The pitch of a pure tone depends on:
- Intensity
- Spectral composition
- Type and intensity of superimposed additional sound
- Which ear is stimulated

7 Intensity
Stevens, S.S. (1935), The relation of pitch to intensity, J. Acoust. Soc. Am. 6.
Measurements were made of the amount by which the pitch of tones ranging from 150 to 12,000 Hz is changed by an increase in intensity. Observers were presented alternately with two tones of different frequency and required to make them sound equal in pitch by varying the intensity of one of the tones. The results, when plotted as equal pitch contours, show (1) the pitch of high tones increases with intensity, (2) the pitch of low tones decreases with intensity, (3) the point at which the effect reverses varies with intensity level. The fact that the points of reversal correspond quite closely with the points of greatest sensitivity of the ear, as shown by contours of equal loudness, suggests that the change in pitch with intensity is due to the resonant characteristics of the ear.

8 Superimposed Additional Sound
Allanson, J.T., Schenkel, K.D. (1965). The effect of band-limited noise on the pitch of pure tones. J. Sound Vib. 2.
An investigation has been made of the effect of a band of noise, one-third of an octave wide, on the perceived pitch of a pure tone. In general, the pitch was found to move away from the interfering noise. However, in contrast with the results of earlier workers, the shifts in pitch were found to be quite small, and it is suggested that this may be due to the difference in experimental procedures.

9 Spectral Composition
Terhardt, E. (1971). Die Tonhöhe Harmonischer Klänge und das Oktavintervall. Acustica 24.
The frequencies of a sinusoidal tone and of a complex tone with the same pitch are slightly different. The investigations show that usually the pitch of a complex tone is lower than the pitch of a sinusoidal tone of the same (fundamental) frequency. The frequency ratio corresponding to the subjectively correct pitch interval of a musical octave usually differs slightly from the value 2. This phenomenon was investigated with low pure tones and with complex tones. The results for complex tones are explained by the octave intervals that were found with simple tones and the pitch differences between simple and complex tones.

10 Interaural Difference
Observation: When one and the same tone is alternately presented to the right and left ear alone, there is a good chance that a slight but, for a fixed frequency, systematic difference between the pitch from the right ear and that from the left can be noticed.
It escapes notice because the pitches of the two ears merge in conscious perception.
Pitch of sine tones is basically created by each ear individually!
Ref: persons/ter/top/diplacus.html

13 Relations between Pitch and Frequency
Frequency is a signal property.
Pitch is a subjective property (Du: toonhoogte).
There is a monotone relation between pitch and F0.

17 Time-domain: inverse filtering, autocorrelation, ...
Frequency-domain: via harmonic structure in the spectrum, cepstrum, harmonic sieve, ...
Auditory modeling: auditory filterbank → neural transduction → spike generation → interval detection → candidates.
A good pitch meter gives several candidates for pitch.

18 Praat's Pitch Meter
Table: Accuracy of the algorithm with Hanning window.

Periods per window    Pitch determination error ΔF/F
                      sine wave     pulse train
> 3                   <             <
> 6                   <             <
> 12                  <             <
> 3                   < 10^-6 (Gaussian window)

20 Praat's Pitch Algorithm
Two steps:
1. Find the pitch candidates and their strengths.
2. Determine the optimal path in candidate space on the basis of costs and merits.

24 Finding the Candidates (1)
[Figure: normalized autocorrelation against lag τ (s); the peak at τ_max has height r', leaving 1 - r' below 1.]
$r(\tau) = \int x(t)\,x(t+\tau)\,dt$ (autocorrelation)
$r'(\tau) = r(\tau)/r(0)$ (normalized autocorrelation)
Every peak in the autocorrelation function is a pitch candidate.
$\mathrm{HNR} = 10 \log_{10} \dfrac{r'(\tau_{\max})}{1 - r'(\tau_{\max})}$ (harmonics-to-noise ratio in dB)
Position and amplitude of each peak are found by sin x/x interpolation.
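
The candidate search on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Praat's implementation: it uses NumPy, keeps only the single strongest peak instead of every peak, and applies neither a window correction nor sin x/x interpolation. The function names, the 10 kHz sampling frequency and the 75 to 500 Hz search range are choices made for this example.

```python
import numpy as np


def normalized_autocorrelation(x):
    """Return r'(tau) = r(tau) / r(0) for lags 0 .. len(x) - 1 (in samples)."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]


def strongest_candidate(x, fs, f_min, f_max):
    """Return (frequency, strength, HNR in dB) of the highest peak.

    Only lags between 1/f_max and 1/f_min are searched (assumed range).
    """
    r = normalized_autocorrelation(x)
    lo, hi = int(fs / f_max), int(fs / f_min)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    strength = float(r[lag])
    hnr_db = 10.0 * np.log10(strength / (1.0 - strength))
    return fs / lag, strength, hnr_db


fs = 10000.0                                  # assumed sampling frequency (Hz)
t = np.arange(1000) / fs                      # 0.1 s of signal
f0, strength, hnr_db = strongest_candidate(
    np.sin(2 * np.pi * 200.0 * t), fs, f_min=75.0, f_max=500.0)
print(f0, strength, hnr_db)
```

For this pure 200 Hz sine the strength stays below 1 because the raw autocorrelation of a finite stretch of signal tapers off with lag, so the HNR comes out finite.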

25 Finding the Candidates (2): Autocorrelation and Window Correction
[Figure: six panels. (1) x(t), (2) w(t), (3) a(t) = w(t)x(t), each against time from 0 to 24 ms; (4) r_a(τ), (5) r_w(τ), (6) r_x(τ), each against lag in ms.]
x(t) = ( sin 2π140t) sin 2π280t
x(t) multiplied by w(t) gives a(t).
r_a(τ) divided by r_w(τ) gives r_x(τ).
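
The window correction can be tried out numerically. The sketch below is an illustration, not Praat's code. It assumes NumPy, a sampling frequency of 14 kHz (so that one 140 Hz period is exactly 100 samples), a Hanning window of 24 ms, and a test signal of the form (1 + d_mod sin 2π140t) sin 2π280t with an assumed modulation depth d_mod = 0.3. Only lags up to half the window length are corrected, because r_w becomes very small at longer lags.

```python
import numpy as np


def normalized_autocorrelation(x):
    """Normalized autocorrelation for non-negative lags (in samples)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]


fs = 14000.0                                  # assumed sampling frequency (Hz)
d_mod = 0.3                                   # assumed modulation depth
t = np.arange(int(round(0.024 * fs))) / fs    # 24 ms analysis window
x = (1.0 + d_mod * np.sin(2 * np.pi * 140 * t)) * np.sin(2 * np.pi * 280 * t)

w = np.hanning(len(t))                        # w(t)
a = w * x                                     # a(t) = w(t) x(t)

r_a = normalized_autocorrelation(a)
r_w = normalized_autocorrelation(w)

half = len(t) // 2                            # correct only the reliable lags
r_x = r_a[:half] / r_w[:half]                 # estimate of the signal's own r

period = int(round(fs / 140))                 # lag of one 140 Hz period
print(r_a[period], r_x[period])
```

At the lag of one full period the windowed autocorrelation r_a is pulled well below 1 by the window, while the corrected r_x comes out close to 1, as expected for a periodic signal.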

26 Finding the Optimal Path (1): The Problem
N analysis frames with M candidates per frame: finding the best path by brute force means comparing O(M^N) paths.
N = 100 and M = 10 gives 10^100 possible paths. At 10^9 paths/s this would take about 3 × 10^83 years.
We need something better: dynamic programming reduces the computing time from O(M^N) to O(N M)!
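
The arithmetic behind these numbers, written out for N = 100 and M = 10 (taking one year as roughly 3.15 × 10^7 seconds, a value assumed here):

```latex
\[
M^N = 10^{100}, \qquad
\frac{10^{100}\ \text{paths}}{10^{9}\ \text{paths/s}} = 10^{91}\ \text{s}
\approx \frac{10^{91}}{3.15 \times 10^{7}}\ \text{years}
\approx 3 \times 10^{83}\ \text{years}, \qquad
N \cdot M = 10^{3}.
\]
```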

28 Finding the Optimal Path (2): The Solution
Introduce positive and negative costs and find the path with the maximum total cost with a dynamic programming algorithm. There are two kinds of costs involved:
1. Costs within each frame
2. Costs for transitions between frames

31 Finding the Optimal Path (3): Costs Within
On the + side: $r'(\tau_{\max})$
On the - side: $\mathrm{OctaveCost} \cdot \log_2(\mathrm{MinimumPitch} \cdot \tau_{\max})$
$R = r'(\tau_{\max}) - \mathrm{OctaveCost} \cdot \log_2(\mathrm{MinimumPitch} \cdot \tau_{\max})$
The best local candidate has the highest R.
OctaveCost favors higher frequencies, which matters for a perfectly periodic signal.
For $x(t) = (1 + d_{\mathrm{mod}} \sin 2\pi F t)\,\sin 2\pi 2F t$, the perceived pitch is 2F when $d_{\mathrm{mod}} < 0.3$.
Criterion: $\mathrm{OctaveCost} = d_{\mathrm{mod}}^2$. Default is 0.01, i.e. $d_{\mathrm{mod}} = 0.1$.
Optimal path: connect all the best local candidates.
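
A small numerical illustration of the local strength R may help. The function name, the minimum pitch of 75 Hz and the two candidates are assumptions made for this example. For a perfectly periodic 200 Hz signal the autocorrelation peaks at one period and at two periods are equally high, and the OctaveCost term is what breaks the tie in favour of the higher frequency.

```python
import math


def local_strength(r_peak, tau_max, minimum_pitch, octave_cost=0.01):
    """R = r'(tau_max) - OctaveCost * log2(MinimumPitch * tau_max)."""
    return r_peak - octave_cost * math.log2(minimum_pitch * tau_max)


minimum_pitch = 75.0                          # assumed analysis floor (Hz)

# Two equally high peaks of a perfectly periodic 200 Hz signal:
r_200 = local_strength(1.0, 1.0 / 200.0, minimum_pitch)   # lag of one period
r_100 = local_strength(1.0, 2.0 / 200.0, minimum_pitch)   # lag of two periods

print(r_200, r_100, r_200 - r_100)
```

The two candidates lie exactly one octave apart, so their strengths differ by exactly OctaveCost, and the 200 Hz candidate wins.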

34 Finding the Optimal Path (4): Costs Between
On the transition from frame i to i + 1 we can have octave jumps or voiced-unvoiced jumps. Transition costs:
$\mathrm{VoicedUnvoicedCost}$ if $F_i = 0$ xor $F_{i+1} = 0$
$\mathrm{OctaveJumpCost} \cdot \left|\log_2 (F_i / F_{i+1})\right|$ if $F_i \neq 0$ and $F_{i+1} \neq 0$
Considerations:
- Increasing VoicedUnvoicedCost decreases the number of voiced-unvoiced transitions.
- Increasing OctaveJumpCost decreases the number of octave jumps.
- OctaveJumpCost = 0 and VoicedUnvoicedCost = 0: no path finder, the local best candidate is selected.
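
The path finder can be sketched as a small dynamic programming routine. This is a minimal illustration under stated assumptions, not Praat's code: each frame is a list of (frequency, strength) pairs, the strength is taken to be the local R, a transition between two voiceless candidates is assumed to cost nothing, and the cost values 0.35 and 0.14 are illustrative choices. The sketch compares every pair of candidates in neighbouring frames, so it evaluates on the order of N·M² transitions, which is still polynomial.

```python
import math


def transition_cost(f1, f2, octave_jump_cost=0.35, voiced_unvoiced_cost=0.14):
    """Cost of going from a candidate with frequency f1 to one with f2."""
    if f1 == 0 and f2 == 0:
        return 0.0                            # assumption: no cost
    if f1 == 0 or f2 == 0:
        return voiced_unvoiced_cost
    return octave_jump_cost * abs(math.log2(f1 / f2))


def best_path(frames, **costs):
    """Return the frequencies along the path with the highest total score.

    frames: list of frames, each a list of (frequency_hz, strength) pairs.
    """
    score = [s for _, s in frames[0]]
    back = []                                 # back pointers per frame
    for prev, cur in zip(frames, frames[1:]):
        new_score, pointers = [], []
        for f2, s2 in cur:
            options = [score[j] - transition_cost(f1, f2, **costs)
                       for j, (f1, _) in enumerate(prev)]
            j_best = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[j_best] + s2)
            pointers.append(j_best)
        score = new_score
        back.append(pointers)
    k = max(range(len(score)), key=score.__getitem__)
    path = [k]
    for pointers in reversed(back):           # trace the best path backwards
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [frames[i][k][0] for i, k in enumerate(path)]


frames = [
    [(100, 0.90), (200, 0.80)],
    [(100, 0.80), (200, 0.85)],               # locally, 200 Hz looks better
    [(100, 0.90), (200, 0.80)],
]
with_costs = best_path(frames)
without_costs = best_path(frames, octave_jump_cost=0.0, voiced_unvoiced_cost=0.0)
print(with_costs, without_costs)
```

With the transition costs switched on, the octave jump in the middle frame is suppressed. With both costs set to zero the routine simply returns the local best candidate of each frame.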

35 Extra Correction in the PathFinder
Extra correction for timestep values different from 0.01:
timestepcorrection = 0.01 / timestep
OctaveJumpCost *= timestepcorrection
VoicedUnvoicedCost *= timestepcorrection

42 The Pitch Object
Attributes (Sampled):
xmin, xmax        Time domain
nx                Number of analysis frames
dx                Timestep
x1                Centre of first frame
ceiling           Maximum frequency; a candidate above it counts as voiceless
maxncandidates    Maximum number of candidates
frame             Row (1..nx) of Frames

47 Frame
Frame attributes:
intensity      Relative intensity (0: silence)
ncandidates    Number of candidates in frame
candidate      Row (1..nCandidates) of Candidate
Candidate attributes:
frequency      In Hz, 0 for voiceless
strength       A number in [0, 1]
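
The attribute lists of the Pitch object, its frames and their candidates map naturally onto nested records. The Python dataclasses below are a hypothetical mirror of that layout for illustration only. The class names, the snake_case spellings and the helper methods are choices made here, and the frame-time formula x1 + (i - 1)·dx assumes the usual convention for sampled objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    frequency: float                          # Hz, 0 for voiceless
    strength: float                           # a number in [0, 1]


@dataclass
class Frame:
    intensity: float                          # relative intensity, 0: silence
    candidates: List[Candidate] = field(default_factory=list)

    @property
    def n_candidates(self) -> int:
        return len(self.candidates)


@dataclass
class Pitch:
    xmin: float                               # start of the time domain (s)
    xmax: float                               # end of the time domain (s)
    dx: float                                 # timestep (s)
    x1: float                                 # centre of the first frame (s)
    ceiling: float                            # maximum frequency (Hz)
    max_n_candidates: int
    frames: List[Frame] = field(default_factory=list)

    @property
    def nx(self) -> int:
        return len(self.frames)

    def frame_time(self, i: int) -> float:
        """Centre time of frame i, counted from 1 as on the slide."""
        return self.x1 + (i - 1) * self.dx

    def is_voiced(self, candidate: Candidate) -> bool:
        """A candidate is voiced if its frequency lies in (0, ceiling)."""
        return 0.0 < candidate.frequency < self.ceiling


pitch = Pitch(xmin=0.0, xmax=0.05, dx=0.01, x1=0.015, ceiling=600.0,
              max_n_candidates=15,
              frames=[Frame(0.5, [Candidate(120.0, 0.9), Candidate(0.0, 0.1)]),
                      Frame(0.4, [Candidate(0.0, 0.3)])])
print(pitch.nx, pitch.frame_time(2), pitch.frames[0].n_candidates)
```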

48 From Pitch to Sound (1)
To make a sound with a prescribed F0 contour: start with a PitchTier object, i.e. a series of (time, pitch) points without voiced-unvoiced info.
[Figure: pitch (Hz) against time (s).] With two points specified, the pitch follows the blue line.
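
The figure itself is not reproduced here, so the following is only a guess at what the line through the points looks like: it assumes linear interpolation between neighbouring points and a constant value before the first and after the last point. The function name and the two example points are hypothetical.

```python
import numpy as np


def pitch_tier_value(point_times, point_pitches, t):
    """Pitch (Hz) at time(s) t for a tier given as (time, pitch) points.

    Assumption: linear interpolation between points, constant outside them.
    """
    return np.interp(t, point_times, point_pitches)


times = [0.0, 1.0]                            # two example points (s)
pitches = [100.0, 200.0]                      # their pitch values (Hz)

print(pitch_tier_value(times, pitches, 0.5))
print(pitch_tier_value(times, pitches, [-1.0, 0.25, 2.0]))
```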

49 From Pitch to Sound (2)
Next: To Sound (pulse train)...
We want to generate a signal s(t) specified as $s(t) = \sum_{i=1}^{n} p_i\,\delta(t - t_i)$, where the $t_i$ are the positions of the pulses.
This is not a band-limited signal: the times $t_i$ are not necessarily at multiples of the sampling period T.
How do we make a band-limited signal from s(t)? Filtering with a low-pass filter!
We do not know the spectrum of s(t), so we compute the convolution with the impulse response of the low-pass filter.

50 Intermezzo: Pulses and Low-Pass Filter
Convolution of a pulse series with the impulse response of a low-pass filter.
The impulse response of a square low-pass filter on $[-f_c, f_c]$ is:
$h(t) = \int_{-\infty}^{+\infty} H(f)\,e^{2\pi i f t}\,df = \int_{-f_c}^{+f_c} \frac{1}{2f_c}\,e^{2\pi i f t}\,df = \frac{\sin 2\pi f_c t}{2\pi f_c t}$
Now the convolution:
$g(t) = h(t) * s(t) = \int h(\tau)\,s(t - \tau)\,d\tau = \int \frac{\sin 2\pi f_c \tau}{2\pi f_c \tau} \sum_{i=1}^{n} p_i\,\delta(t - t_i - \tau)\,d\tau = \sum_{i=1}^{n} \int \frac{\sin 2\pi f_c \tau}{2\pi f_c \tau}\,p_i\,\delta(t - t_i - \tau)\,d\tau = \sum_{i=1}^{n} p_i\,\frac{\sin 2\pi f_c (t - t_i)}{2\pi f_c (t - t_i)}$
We have used $\int f(t)\,\delta(t - t_0)\,dt = f(t_0)$. This is like the resample formula.
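
The final sum can be evaluated directly on a sample grid. The sketch below assumes NumPy, whose `np.sinc(x)` is sin(πx)/(πx), so that sin(2πf_c u)/(2πf_c u) equals `np.sinc(2 * f_c * u)`. The function name, the default cutoff at half the sampling frequency and the example values are assumptions made for this illustration.

```python
import numpy as np


def band_limited_pulses(pulse_times, amplitudes, fs, duration, f_c=None):
    """Sample g(t) = sum_i p_i sin(2 pi f_c (t - t_i)) / (2 pi f_c (t - t_i))."""
    if f_c is None:
        f_c = fs / 2.0                        # assumed cutoff: Nyquist frequency
    t = np.arange(int(round(duration * fs))) / fs
    g = np.zeros_like(t)
    for t_i, p_i in zip(pulse_times, amplitudes):
        g += p_i * np.sinc(2.0 * f_c * (t - t_i))
    return t, g


fs = 1000.0                                   # assumed sampling frequency (Hz)

# A pulse exactly on a sample point stays a single non-zero sample.
_, on_grid = band_limited_pulses([0.010], [1.0], fs, duration=0.05)

# A pulse halfway between two samples is spread over its neighbours.
_, off_grid = band_limited_pulses([0.0105], [1.0], fs, duration=0.05)

print(on_grid[9:12], off_grid[9:12])
```

The second case shows why the convolution is needed: a pulse between two sample points cannot be represented by a single sample, and the two nearest samples both take the value 2/π.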

51 Check
Create Sound from formula... sinxx Mono sin(2*pi*1000*(x-0.5))/(2*pi*1000*(x-0.5))
To Spectrum... n
Edit
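
The same check can be done outside Praat. The following sketch assumes NumPy, a duration of 1 s and a sampling frequency of 8000 Hz (the slide's own values are not shown). It samples the formula with f_c = 1000 Hz and looks at the magnitude spectrum, which should be flat below 1000 Hz and close to zero above it.

```python
import numpy as np

fs = 8000.0                                   # assumed sampling frequency (Hz)
f_c = 1000.0                                  # cutoff frequency in the formula
t = np.arange(int(fs)) / fs                   # 1 s of samples (assumed duration)

# sin(2*pi*1000*(x-0.5)) / (2*pi*1000*(x-0.5)), written with np.sinc
x = np.sinc(2.0 * f_c * (t - 0.5))

magnitude = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

passband = magnitude[freqs < 900.0]           # well inside the pass band
stopband = magnitude[freqs > 1100.0]          # well inside the stop band
print(np.median(passband), stopband.max())
```

In these DFT units the pass band level is about fs / (2 f_c) = 4. Some ripple remains close to 1000 Hz because the sin x/x function is cut off after one second.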

52 References
Boersma, P. (1993), Accurate short-time analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, IFA Proceedings 17.
Hermes, D.J. (1993), Pitch analysis, in Visual Representations of Speech Signals, Cooke, M., Beet, S. & Crawford, M. (eds.), John Wiley & Sons Ltd.
Terhardt, E., Akustische Kommunikation - Grundlagen mit Hörbeispielen. Springer, Berlin/Heidelberg.
