Abstract: Submission #26

Automated Speech Segmentation: Example of an African Language

Brigitte BIGI

Additional Fields

Abstract: Speech segmentation is the process of identifying boundaries between speech units in the speech signal and determining where in time they occur. It requires linguistic resources for the target language: a lexicon (the words to be recognized), a word dictionary (their pronunciations as sequences of phonemes), and an acoustic model (a stochastic representation of input waveform patterns per phoneme).

The SPPAS software tool implements language- and task-independent algorithms. This multilingual approach was applied to the African language Naija (Nigerian Pidgin). We developed language resources for a tokenizer, for an automatic system that predicts the pronunciation of words, and for their segmentation.
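The roles of the tokenizer and the word dictionary described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy English entries, and the phoneme symbols are hypothetical and are not taken from SPPAS or from the Naija resources, and the acoustic-model alignment step is left out entirely.

```python
import re

# Hypothetical toy word dictionary: word -> phoneme sequence.
# Entries and phoneme symbols are invented for illustration only.
PRON_DICT = {
    "hello": ["h", "@", "l", "@U"],
    "world": ["w", "3:", "l", "d"],
}

# Placeholder used when a token has no dictionary entry (an assumption).
UNKNOWN = ["<unk>"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phonetize(tokens, pron_dict):
    """Pair each token with its phoneme sequence, or UNKNOWN if missing."""
    return [(tok, pron_dict.get(tok, UNKNOWN)) for tok in tokens]


if __name__ == "__main__":
    for token, phonemes in phonetize(tokenize("Hello, world!"), PRON_DICT):
        print(token, " ".join(phonemes))
```

In a full pipeline, the phoneme sequences produced this way would then be aligned in time with the audio using the acoustic model, which this sketch does not attempt.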

Résumé: Speech segmentation consists of identifying the units in the speech signal and determining where they occur in time. Linguistic resources for the target language must be defined: a lexicon (the words to be recognized), a word dictionary (their pronunciations as sequences of phonemes), and an acoustic model (a stochastic representation per phoneme).

The SPPAS software tool implements language- and task-independent algorithms. This multilingual approach was applied to the African language Naija (Nigerian Pidgin). We developed linguistic resources for a tokenizer, a grapheme-to-phoneme converter, and their alignment with the signal.


Category:	Poster
Session:	6 December Session P4: African Languages Poster Session

