Ken-Ichi Sakakibara, Hiroshi Imagawa, Tomoko Konishi, Kazumasa Kondo, Emi Zuiki Murano, Masanobu Kumada, and Seiji Niimi: Vocal fold and false vocal fold vibrations in throat singing and synthesis of khoomei

Ken-Ichi Sakakibara∗1, Hiroshi Imagawa∗2, Tomoko Konishi, Kazumasa Kondo, Emi Zuiki Murano∗2, Masanobu Kumada∗3, and Seiji Niimi∗4
∗1 NTT Communication Science Laboratories, ∗2 The University of Tokyo, ∗3 National Rehabilitation Center for the Disabled, ∗4 International University of Health and Welfare

Introduction

Throat singing is a traditional singing style of people who live around the Altai mountains. Khöömei in Tyva and Khöömij in Mongolia are representative styles of throat singing. Throat singing is sometimes called biphonic singing, multiphonic singing, overtone singing, or harmonic singing because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high (in the range from 1 kHz to 3 kHz) above the drone. Many variations of singing styles in throat singing are classified according to singers and regions. However, it is possible to classify these variations objectively in terms of a source-filter model of speech production. The laryngeal voices of throat singing can be classified into (i) a pressed voice and (ii) a kargyraa voice based on listeners' impressions, acoustical characteristics, and the singers' personal observations on voice production. The pressed voice is the basic laryngeal voice in throat singing and is used as the drone. The kargyraa voice is a very low-pitched voice that ranges outside the modal register. The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract [1]. In Tyvan khöömei, sygit is a style in which singers articulate by touching the tongue to the palate, and khöömei is one in which they articulate by pursing the lips. We have also physiologically observed the two different laryngeal voices and estimated the patterns of the vocal fold and false vocal fold vibrations [6]. We have simulated the vibration patterns by physical modeling of the larynx using a 2 × 2-mass model. Based on the physiological observations and the simulation, we propose a new laryngeal voice model and synthesis system for throat singing.

https://www.academia.edu/14948935/Vocal_fold_and_false_vocal_fold_vibrations_in_throat_singing_and_synthesis_of_khoomei?email_work_card=view-paper

Ken-Ichi Sakakibara, Leonardo Fuks, Hiroshi Imagawa, Niro Tayama: Growl voice in ethnic and pop styles

Proceedings of the International Symposium on Musical Acoustics, March 31st to April 3rd 2004 (ISMA2004), Nara, Japan

Ken-Ichi Sakakibara, Leonardo Fuks, Hiroshi Imagawa, Niro Tayama

NTT Communication Science Laboratories, NTT Corporation, Japan

Department of Otolaryngology, The University of Tokyo, Japan

School of Music, Universidade Federal do Rio de Janeiro, Brazil

Department of Speech Physiology, The University of Tokyo, Japan

International Medical Center of Japan, Japan

kis@brl.ntt.co.jp leofuks@serv.com.ufrj.br

imagawa@m.u-tokyo.ac.jp ntayama@imcj.hosp.go.jp

Growl voice in ethnic and pop styles


Among the so-called extended vocal techniques, vocal growl is a rather common effect in some ethnic (e.g. the Xhosa people in South Africa) and pop styles (e.g. jazz, Louis Armstrong-type) of music. Growl usually consists of simultaneous vibrations of the vocal folds and supra-glottal structures of the larynx, in either harmonic or sub-harmonic co-oscillation. This paper examines the growl mechanism using videofluoroscopy and high-speed imaging, and its acoustical characteristics by spectral analysis and model simulation. In growl, the larynx position is usually high and the aryepiglottic folds vibrate. The aryepiglottic constriction is associated with a unique shape of the vocal tract, including the larynx tube, and characterizes growl.

1. Introduction

The term growl originally referred to low-pitched sounds uttered by animals, such as dogs, or to similar sounds made by humans, and it is therefore described mainly by auditory-perceptual impression. Growl is widely observed in singing as well as in shouting and aroused speech.

Growl phonation also refers to the phonation observed in some singing styles, such as the jazz singing of Louis Armstrong and Cab Calloway [2, 3]. Many jazz, blues, and gospel singers use growl in a similar manner. Besides such pop music from North America, growl styles are widely found in the pop music of other areas: in Brazil, samba singers (particularly carnival lead voices), the pop star Elza Soares, and the country singing duo Bruno & Marrone use it; in Japan, Enka (a popular emotive style) singers, such as Harumi Miyako, employ it frequently. Some singers use growl extensively throughout a song, while others use it as a vocal effect for expressive emphasis.

In ethnic music, one of the most prominent uses of growl is found in umngqokolo, a vocal tradition of the Xhosa people in South Africa [11]. In Japanese Noh theatre, the percussionists' voice, Kakegoe, may present growl at the beginning of phonation.

Growl may have perceptual similarities with rough or harsh voice. In phonetic terms, growl is sometimes described as a voiced aryepiglottic trill [3]. However, there has been no clear evidence of its production mechanism, such as physiological observation of the aryepiglottic vibration.

In throat singing (Tyvan khoomei and Mongolian khoomij), ventricular and vocal fold vibration has been observed for the two different laryngeal voices (drone and kargyraa) [4, 9]. In drone, the basic voice in throat singing with a whistle-like high overtone, the ventricular folds vibrate at the same frequency as the vocal folds. In kargyraa, which usually sounds one octave (or more) lower than the modal register, the ventricular folds vibrate at f0/2 when the vocal folds vibrate at f0. Moreover, some singers can produce triple-periodic kargyraa, in which the ventricular folds vibrate at f0/3. In this paper, the phonation mode with ventricular and vocal fold vibration is called VVM (vocal-ventricular mode) [4]. In growl, there is no clear evidence of ventricular fold vibration.

TO READ THE WHOLE ARTICLE, PLEASE CLICK ON THE LINK BELOW:

https://www.researchgate.net/publication/228485036_Growl_voice_in_ethnic_and_pop_styles

Ken-Ichi Sakakibara, Leonardo Fuks, Hiroshi Imagawa, Niro Tayama: Growl Voice in Ethnic and Pop Styles

CLICK ON THIS LINK TO READ THE WHOLE PAPER :

https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3A528dbdd7-5280-4bf2-b448-c79009c9a6d2

Growl Voice in Ethnic and Pop Styles

Ken-Ichi Sakakibara (1, 2), Leonardo Fuks (3), Hiroshi Imagawa (4), Niro Tayama (5)

1 NTT Communication Science Laboratories, NTT Corporation, Japan

2 Department of Otolaryngology, The University of Tokyo, Japan

3 School of Music, Universidade Federal do Rio de Janeiro, Brazil

4 Department of Speech Physiology, The University of Tokyo, Japan

5 International Medical Center of Japan, Japan

Abstract

This paper examines the growl mechanism using videofluoroscopy and high-speed imaging, and its acoustical characteristics by spectral analysis and model simulation. In growl, the larynx position is usually high and the aryepiglottic folds vibrate. The aryepiglottic constriction is associated with a unique shape of the vocal tract, including the larynx tube, and characterizes growl.

Ken-ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi: Vocal fold and false vocal fold vibrations in throat singing and synthesis of khoomei


Vocal fold and false vocal fold vibrations in throat singing and synthesis of khoomei

2000

We observed laryngeal movements in throat singing using physiological methods: the simultaneous recording of singing sounds, EGG, and high-speed digital images. We observed vocal fold and false vocal fold vibration and estimated the vibration patterns. We also estimated the laryngeal voices by using an inverse filtering method and simulated the vibration pattern using a new physical model: the 2 × 2-mass model.

Publication Date: 2000


Vocal fold and false vocal fold vibrations in throat singing and synthesis of khöömei

Ken-Ichi Sakakibara∗1, Hiroshi Imagawa∗2, Tomoko Konishi, Kazumasa Kondo, Emi Zuiki Murano∗2, Masanobu Kumada∗3, and Seiji Niimi∗4

∗1 NTT Communication Science Laboratories, ∗2 The University of Tokyo, ∗3 National Rehabilitation Center for the Disabled, ∗4 International University of Health and Welfare

Abstract

We observed laryngeal movements in throat singing using physiological methods: the simultaneous recording of singing sounds, EGG, and high-speed digital images. We observed vocal fold and false vocal fold vibration and estimated the vibration patterns. We also estimated the laryngeal voices by using an inverse filtering method and simulated the vibration pattern using a new physical model: the 2 × 2-mass model. From these observations, we propose a laryngeal voice model for throat singing and a synthesis system for throat singing.

1 Introduction

Throat singing is a traditional singing style of people who live around the Altai mountains. Khöömei in Tyva and Khöömij in Mongolia are representative styles of throat singing. Throat singing is sometimes called biphonic singing, multiphonic singing, overtone singing, or harmonic singing because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high (in the range from 1 kHz to 3 kHz) above the drone. Many variations of singing styles in throat singing are classified according to singers and regions. However, it is possible to classify these variations objectively in terms of a source-filter model of speech production.

The laryngeal voices of throat singing can be classified into (i) a pressed voice and (ii) a kargyraa voice based on listeners' impressions, acoustical characteristics, and the singers' personal observations on voice production. The pressed voice is the basic laryngeal voice in throat singing and is used as the drone. The kargyraa voice is a very low-pitched voice that ranges outside the modal register.

The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract [1]. In Tyvan khöömei, sygit is a style in which singers articulate by touching the tongue to the palate, and khöömei is one in which they articulate by pursing the lips.

We have physiologically observed the two different laryngeal voices and estimated the patterns of the vocal fold and false vocal fold vibrations [6]. We have also simulated the vibration patterns by physical modeling of the larynx using a 2 × 2-mass model. Based on the physiological observations and the simulation, we propose a new laryngeal voice model and synthesis system for throat singing.
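As a rough numerical illustration of the drone-plus-overtone description above, the following sketch lists which harmonics of a drone fall inside the 1 kHz to 3 kHz band. The drone pitch of 150 Hz and the helper name `harmonics_in_band` are assumptions chosen for illustration; they are not values reported in this paper.

```python
import math

def harmonics_in_band(f0_hz, low_hz=1000.0, high_hz=3000.0):
    """Return the harmonic numbers n for which n * f0_hz lies in [low_hz, high_hz]."""
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))

drone_hz = 150.0  # hypothetical drone pitch, not taken from the paper
candidates = harmonics_in_band(drone_hz)
for n in candidates:
    print(n, n * drone_hz)  # harmonic number and its frequency in Hz
```

Under these assumptions, the singer would be selecting the whistle-like melody from the harmonics returned by `harmonics_in_band`, while the drone itself stays fixed.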

2 Physiological observations

2.1 Methods

We observed laryngeal movements in throat singing directly and indirectly by simultaneously recording high-speed digital images, EGG (electroglottography) waveforms, and sound waveforms (Fig. 1). The high-speed digital images were captured at 4501 frames/s through a fiberscope inserted into the nasal cavity of the singer. Sound and EGG waveforms were sampled at 12 bits per sample with a sampling frequency of 18 kHz [4]. Two singers, both healthy, participated as subjects. One studied khöömei in Tyva and the other studied khöömij in Mongolia.

Fig. 1: High-speed digital image system.

2.2 Results

Common laryngeal movements were observed in the two singers for each of the two laryngeal voices.

Contact: K.-I. Sakakibara, kis@brl.ntt.co.jp, NTT Communication Science Labs, 3-1, Morinosato Wakamiya, Atsugi-shi, 243-0198, Japan

Pressed voice

In pressed-voice production, the following features of the laryngeal movements were observed. (1) Overall constriction of the supra-structures of the glottis was observed, which made it difficult to observe vibrations of the vocal folds (VFs) directly. (2) Vibration of the supra-structures of the glottis, whose edges are presumably the false vocal folds (FVFs), was observed in the high-speed digital images. (3) The period of the FVF vibrations was almost equal to the period of the EGG waveform. (4) The slope of the EGG curve changed at the beginning of the closed phase of the FVFs; the impedance of the EGG reached its maximal value when the FVFs were open and its minimal value when they were closed (Fig. 2). The graph at the bottom of Fig. 2 depicts the locus of the edges of the FVFs: the upper line is the locus of the left edge and the lower line is the locus of the right edge.

Kargyraa voice

In kargyraa-voice production, the following features of the laryngeal movement were observed. (1) Overall constriction of the supra-structures of the glottis was observed. (2) The constriction was looser than that in the case of the pressed voice. (3) Vibration of the supra-structures of the glottis, whose edges are presumably the FVFs, was observed. (4) The FVFs were observed to alternate between almost completely closed and open phases. (5) Vibration of the VFs was observed during the open period of the FVFs. (6) The doubled period of the FVF vibration was equal to the period of the sound waveform. (7) When the FVFs were almost completely closed, the power of the sound became weaker. (8) In the EGG waveform, two different shapes alternated, and the period of the EGG waveform was equal to that of the sound waveform (Fig. 3).

Fig. 2: Pressed voice (from above: sound, EGG, edges of FVF).

Fig. 3: Kargyraa voice (from above: sound, EGG, edges of FVF).

2.3 Discussion

Two common features were observed in the mechanisms of the two different laryngeal voice productions: (1) overall constriction of the supra-structures of the glottis and (2) vibration of the supra-structures of the glottis, which presumably are the FVFs. These features are not observed in vowel production in ordinary speech. The differences between the two laryngeal voice productions are (1) the narrowness of the constriction and (2) the manner of FVF vibration.

The EGG waveforms for the pressed voice and kargyraa voice represent the contact area of the supra-structures of the glottis as well as that of the VFs. However, taking into account the high-speed digital images and sound waveforms, the EGG waveforms can be assumed to mainly represent the contact area of the VFs. Thus, we can conclude that the VF vibrations and FVF vibrations have opposite phase in the pressed-voice case. In the kargyraa voice, the FVFs can be assumed to close once for every two periods of closure of the VFs, and this closing blocks the airflow and contributes to the generation of the subharmonic tone of kargyraa.

In a previous study, the open quotient (OQ) in throat singing was estimated to be smaller on the basis of acoustical features [2]. However, for both the pressed and kargyraa voices, our physiological observation suggests that the OQ is difficult to estimate because of the contribution of the supra-structures of the glottis. Therefore, the OQ was not estimated.

In the synthesis of throat singing sounds, as pointed out in [1], glottal source modeling is needed to reproduce the timbre. Our physiological observations suggest that the glottal source model of throat singing should include the FVF vibrations as well as the VF vibrations [7].
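The link between the FVFs closing once per two VF periods and the subharmonic tone of kargyraa can be checked numerically. The sketch below is not the authors' model: it assumes raised-cosine flow pulses, a sampling rate of 8 kHz, a VF frequency of 200 Hz, and an attenuation factor of 0.2 for every second pulse, all hypothetical. It then compares the spectral magnitude at half the VF frequency with and without that attenuation.

```python
import math

def pulse_train(f0_hz, fs_hz, n_periods, every_other_gain):
    """VF flow pulses at f0_hz; every second pulse is scaled by every_other_gain."""
    period = round(fs_hz / f0_hz)  # samples per VF period
    samples = []
    for k in range(n_periods):
        gain = every_other_gain if k % 2 else 1.0
        for n in range(period):
            # Raised-cosine pulse as a stand-in for one VF cycle (assumption).
            samples.append(gain * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / period)))
    return samples

def magnitude_at(signal, freq_hz, fs_hz):
    """Normalized magnitude of the discrete Fourier sum at a single frequency."""
    re = sum(x * math.cos(2.0 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    im = sum(x * math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)

FS = 8000.0  # assumed sampling rate in Hz
F0 = 200.0   # assumed VF frequency in Hz

open_fvf = pulse_train(F0, FS, 20, every_other_gain=1.0)  # FVFs never block the flow
gated = pulse_train(F0, FS, 20, every_other_gain=0.2)     # FVFs damp every second pulse

print(magnitude_at(open_fvf, F0 / 2, FS))  # negligible: no component at f0/2
print(magnitude_at(gated, F0 / 2, FS))     # clearly non-zero: subharmonic at f0/2
```

With identical pulses the signal repeats every VF period, so nothing appears at half the VF frequency; once every second pulse is damped, the repetition period doubles and a component one octave below the VF frequency emerges.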

3 Laryngeal voice model of throat singing

In this paper, we define the glottal airflow as the airflow through the glottis to the area between the FVFs, and the laryngeal airflow as the airflow through the area between the FVFs to the pharynx.

Glottal airflow estimation

From recorded sounds, we estimated the laryngeal airflow using the inverse filtering technique. In the pressed voice, the estimated laryngeal airflow curve had a small notch just after the curve reached a peak, and the closing of the VFs was apparently not complete (Fig. 4). In the kargyraa voice, the estimated laryngeal airflow curve had two peaks in each period. From our physiological observation, the VFs vibrate twice in each period of the FVF vibration, and the estimated laryngeal airflow curve showed that in one of the two VF vibrations, the closing of the VFs was not complete (Fig. 5).
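The paper does not spell out its inverse filtering procedure, so the following is only a minimal sketch of the general idea, assuming the vocal tract can be approximated by a single two-pole resonator with known center frequency and bandwidth (2000 Hz and 200 Hz here, both hypothetical). Filtering a source through the resonator and then applying the exact inverse difference equation recovers the source.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (two-pole resonator)."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs_hz)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs_hz) * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    return 1.0 - b - c, b, c

def resonate(x, a, b, c):
    """Forward filter: source -> resonated sound."""
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        y2, y1 = y1, yn
        out.append(yn)
    return out

def inverse_resonate(y, a, b, c):
    """Inverse filter: undo the resonator to estimate the source."""
    y1 = y2 = 0.0
    out = []
    for yn in y:
        out.append((yn - b * y1 - c * y2) / a)
        y2, y1 = y1, yn
    return out

FS = 16000.0  # assumed sampling rate in Hz
a, b, c = resonator_coeffs(2000.0, 200.0, FS)  # hypothetical formant
source = [0.5 * (1.0 - math.cos(2.0 * math.pi * 150.0 * n / FS)) for n in range(800)]
sound = resonate(source, a, b, c)
estimate = inverse_resonate(sound, a, b, c)
print(max(abs(s - e) for s, e in zip(source, estimate)))  # reconstruction error
```

In practice the resonances are not known in advance and must themselves be estimated from the recording, which is the difficult part that this sketch leaves out.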

Fig. 4: Inverse filtered laryngeal airflow of pressed voices for two singers (panels: sound, EGG, laryngeal airflow, airflow derivative).

Fig. 5: Inverse filtered laryngeal airflow of kargyraa voices for two singers (panels: sound, EGG, airflow derivative, laryngeal airflow).

All the power spectra of the estimated glottal airflows showed an increase of power in the range from 1 to 3 kHz, which is where the second formant frequency, corresponding to the whistle-like overtone, appears in throat singing (Figs. 6–8).

Fig. 6: Inverse filtered airflow spectrum of normal voice for two singers.

Fig. 7: Inverse filtered airflow spectrum of pressed voice for two singers.

Fig. 8: Inverse filtered airflow spectrum of kargyraa voice for two singers.

A 2 × 2-mass model

For a physical simulation of the VF and FVF vibrations, we propose a 2 × 2-mass model as a self-oscillating model of VF and FVF vibrations (Fig. 9). This model was devised by adding a two-mass model for the FVFs to the ordinary two-mass model for the VFs. The mechanical transmission of vibrations between the VFs and FVFs was not considered. The laryngeal ventricle is modeled as a rigid (non-deforming) cylinder with a uniform cross-sectional area of 5 cm² and a height of 16 cm. In the simulation, the 2 × 2-mass model oscillated stably. The simulation of laryngeal movements using the 2 × 2-mass model agreed with the above assumptions for the two laryngeal movement patterns of throat singing, for both the pressed and kargyraa voices (Fig. 10). With suitable model parameters, the 2 × 2-mass model can also simulate an ordinary glottal source in the same way as the two-mass model [3].

Fig. 9: 2 × 2-mass model for the VFs and FVFs (labels: trachea, vocal folds, laryngeal ventricle, false vocal folds, vocal tract).

Fig. 10: Laryngeal airflow obtained by using the 2 × 2-mass model (left: pressed voice, right: kargyraa voice; panels: sound waveform, laryngeal airflow; scale: 1000 cc/s).

Laryngeal voice model

From the physiological observations and the estimated laryngeal voices, we assume that (1) in pressed-voice production, the VFs and FVFs vibrate in almost opposite phase; and (2) in kargyraa-voice production, two closed phases of the VFs appear in one period of the glottal volume flow waveform, and the VFs are incompletely closed in one of the two closed phases. Under these assumptions, we propose a laryngeal voice model for throat singing and synthesize throat singing sounds.

Our proposed laryngeal voice model is obtained as follows. We generate an almost sine-shaped glottal airflow, because Fig. 4 indicates that the glottal flow of throat singing is nearly symmetric (Step 1). The glottal airflow is modulated by the vibration of the FVFs (Step 2). Turbulent noise is added according to the open width of the FVFs (Step 3). The output is filtered by the transfer function of the laryngeal ventricle (Step 4) [3].
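Steps 1 to 4 can be sketched as a short signal chain. This is an illustrative reading of the steps, not the implementation used in the paper: the multiplicative FVF modulation, the modulation depth, the noise level, and the ventricle resonance (2000 Hz center frequency, 400 Hz bandwidth) are all assumptions, as are the function and parameter names.

```python
import math
import random

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (two-pole resonator)."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs_hz)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs_hz) * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    return 1.0 - b - c, b, c

def laryngeal_voice(f0_hz, fvf_ratio, fvf_phase, n_samples, fs_hz=16000.0,
                    depth=0.8, noise_level=0.02, seed=0):
    """Hypothetical four-step laryngeal source; returns the laryngeal airflow."""
    rng = random.Random(seed)
    a, b, c = resonator_coeffs(2000.0, 400.0, fs_hz)  # assumed ventricle resonance
    y1 = y2 = 0.0
    out = []
    for n in range(n_samples):
        theta = 2.0 * math.pi * f0_hz * n / fs_hz
        # Step 1: almost sine-shaped (symmetric) glottal airflow.
        glottal = 0.5 * (1.0 - math.cos(theta))
        # Step 2: FVF open width in [1 - depth, 1] modulates the glottal airflow.
        narrowing = 0.5 * (1.0 - math.cos(fvf_ratio * theta + fvf_phase))
        opening = 1.0 - depth * narrowing
        flow = glottal * opening
        # Step 3: turbulent noise scaled by the FVF open width.
        flow += noise_level * opening * rng.gauss(0.0, 1.0)
        # Step 4: filter with the (assumed) laryngeal ventricle resonance.
        y = a * flow + b * y1 + c * y2
        y2, y1 = y1, y
        out.append(y)
    return out

# Pressed voice: FVFs at the VF frequency, narrowest when the glottal flow peaks.
pressed = laryngeal_voice(150.0, fvf_ratio=1.0, fvf_phase=0.0, n_samples=1600)
# Kargyraa voice: FVFs at half the VF frequency, narrowing every second VF pulse.
kargyraa = laryngeal_voice(150.0, fvf_ratio=0.5, fvf_phase=math.pi / 2.0, n_samples=1600)
```

The two calls differ only in the FVF frequency ratio and phase, which mirrors the two assumptions stated above: opposite-phase vibration for the pressed voice and one FVF closure per two VF periods for the kargyraa voice.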

Fig. 11: Block diagram for laryngeal voice model (blocks: glottal airflow, glottal area Ag, false glottal area, laryngeal ventricle resonance, laryngeal airflow).

4 Synthesis of throat singing

Based on the Klatt synthesizer [5], we propose a synthesis model for throat singing that uses the proposed laryngeal voice model as the source and time-varying formants obtained from recorded throat singing sounds as resonating filters (Fig. 12). Compared with an ordinary glottal airflow model, some improvement in timbre was observed.
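The resonating filters of a Klatt-style cascade synthesizer are second-order digital resonators whose coefficients are computed from a formant frequency and bandwidth. The sketch below cascades two such resonators with per-sample formant tracks. The impulse-train source, the formant tracks (a fixed first formant and a second formant gliding from 1200 Hz to 2400 Hz), and the bandwidths are hypothetical stand-ins for the tracks that the paper obtains from recordings.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Klatt-style resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at 0 Hz."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs_hz)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs_hz) * math.cos(2.0 * math.pi * freq_hz / fs_hz)
    return 1.0 - b - c, b, c

def cascade_formants(source, formant_tracks_hz, bandwidths_hz, fs_hz):
    """Pass the source through one resonator per formant, updating coefficients per sample."""
    signal = list(source)
    for track, bw_hz in zip(formant_tracks_hz, bandwidths_hz):
        y1 = y2 = 0.0
        filtered = []
        for x, freq_hz in zip(signal, track):
            a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
            y = a * x + b * y1 + c * y2
            y2, y1 = y1, y
            filtered.append(y)
        signal = filtered
    return signal

FS = 16000.0  # assumed sampling rate in Hz
N = 1600
source = [1.0 if n % 100 == 0 else 0.0 for n in range(N)]  # 160 Hz impulse train (placeholder source)
f1_track = [500.0] * N                                      # hypothetical fixed first formant
f2_track = [1200.0 + 1200.0 * n / (N - 1) for n in range(N)]  # hypothetical gliding second formant
voiced = cascade_formants(source, [f1_track, f2_track], [80.0, 60.0], FS)
```

Replacing the impulse train with a laryngeal source signal and the hypothetical tracks with measured formant trajectories would give the overall structure described above, under the same simplifying assumptions.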

Conclusion

We observed the laryngeal movements in throat singing, including both the VF and FVF vibrations. The FVF vibrations contribute to the production of both laryngeal voices of throat singing. We also estimated the laryngeal voice source and simulated the laryngeal movements by using a 2 × 2-mass model. Based on these observations, we proposed a laryngeal source model and a synthesis model for throat singing. These models can also simulate the normal voice. All the power spectra of the simulated glottal airflows showed an increase of power in the range below 3 kHz, where the second formant frequency, corresponding to the whistle-like overtone in throat singing, appears. Our study indicates that the glottal source, as well as the articulation of the tongue and lips, contributes to the production of the whistle-like overtone.

Fig. 12: Block diagram of khöömei synthesizer.

Fig. 13: Synthesized laryngeal airflows, synthesized sounds by khöömei synthesis system, and power spectra of synthesized sounds (left: pressed voice, right: kargyraa voice).

Acknowledgments

We wish to thank Seiji Adachi, Zoya Kyrgys, Koichi Makigami, Naotoshi Osaka, Yoshinao Shiraki, and Masahiko Todoriki for their help and useful discussion.

Bibliography

[1] S. Adachi and M. Yamada. An acoustical study of sound production in biphonic singing xöömij. J. Acoust. Soc. Am., 105(5):2920–2932, 1999.

[2] G. Bloothooft, E. Bringmann, M. van Cappellen, J. B. van Luipen, and K. P. Thomassen. Acoustics and perception of overtone singing. J. Acoust. Soc. Am., 92(4):1827–1836, 1992.

[3] H. Imagawa, K.-I. Sakakibara, T. Konishi, E. Z. Murano, and S. Niimi. Throat singing synthesis by a laryngeal voice model based on vocal fold and false vocal fold vibrations. Tech. Rep. IECE, SP2000-140:71–78, Feb. 2001. In Japanese.

[4] S. Kiritani, H. Imagawa, and H. Hirose. Vocal cord vibration in the production of consonants: observation by means of high-speed digital imaging using a fiberscope. J. Acoust. Soc. Jpn. (E), 17:1–8, 1996.

[5] D. H. Klatt. Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am., 67(3):971–995, 1980.

[6] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific American, (Sep. 1999):80–87, 1999.

[7] K.-I. Sakakibara, S. Adachi, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, M. Todoriki, H. Imagawa, and S. Niimi. Observation of vocal fold vibrations in Tyvan and Mongolian throat singing. Tech. Rep. Musical Acoust., Acoust. Soc. Jpn., 19-4:41–48, Sep. 2000. In Japanese.