START Conference Manager    

A South African Corpus of Multilingual Code-switched Soap Opera Speech

Febe de Wet, Ewald van der Westhuizen and Thomas Niesler


Categories

Category:  Poster
Session:  6 December Session P4: African Languages Poster Session

Additional Fields

 
Abstract:   We introduce a speech corpus containing multilingual code-switching, compiled from South African soap operas. The corpus contains monolingual as well as code-switched examples of English, isiZulu, isiXhosa, Setswana and Sesotho speech. The last four are indigenous languages, all belonging to the Southern Bantu family. IsiZulu and isiXhosa are Nguni languages that, while distinct, are linguistically similar and to some degree mutually intelligible. The same applies to Setswana and Sesotho, which are Sotho-Tswana languages. The data contains both inter-sentential and intra-sentential code-switching. Intra-sentential code-switching occurs as alternation, as insertion, and as intra-word switching.

 
Resume:   Sethula i-corpus yenkulumo equkethe ukushintshwa kwekhodi yezilimi eziningi ehlanganiswe kuma-soap opera waseNingizimu Afrika. I-corpus iqukethe izibonelo zesiNgisi nesiZulu nesiXhosa nesiTswana nesiSuthu ezinolimi olulodwa kanye nezibonelo ezishintshile ikhodi. Ezine zokugcina ziyizilimi zomdabu, zonke zingabomndeni waseSouthern Bantu. IsiZulu nesiXhosa yizilimi zesiNguni, nakuba zihlukile, ngezinga elithile ziyaqondana futhi zifana ngohlelo. Kwenzeka okufanayo nesiTswana futhi nesiSuthu, eziyizilimi zesiSuthu-Tswana. Idatha iqukethe ukushintshwa kwekhodi okungaphandle kwemisho futhi okungaphakathi kwemisho. Ukushintshwa kwekhodi okungaphakathi kwemisho kwenzeka njengokushintshana (alternation), ukufakwa (insertion) kanye nokushintshwa ngaphakathi kwamagama (intra-word switches).

