START Conference Manager    

Heuristic guided probabilistic graphic language modelling for morphological segmentation of isiXhosa

Lulamile Mzamo, Albert Helberg and Sonja Bosch


Categories

Category:  Poster
Session:  6 December Session P4: African Languages Poster Session

Additional Fields

 
Abstract:   The IsiXhosa Heuristics Maximum Likelihood Segmenter (XHMLS), an unsupervised isiXhosa segmenter, is evaluated. The study's contribution is the use of isiXhosa word-morphology heuristics to guide the probabilistic graphical modelling (PGM) of isiXhosa segmentation. Four guided PGMs, each with optional modified Kneser-Ney (mKN) smoothing, are presented. XHMLS’s boundary identification accuracy of 78.7% outperforms the benchmark Morfessor-Baseline’s 77.2%, and its F1 score of 68.0% exceeds Morfessor-Baseline’s 48.9% by a wider margin, when modelled with circumfixing and smoothing. The study shows that better word segmentation performance could be achieved in the unsupervised morphological segmentation of isiXhosa if a representative and smoothed PGM is used.
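
The accuracy and F1 figures quoted in the abstract measure different things, which is why two segmenters can be close on one and far apart on the other. The following is a minimal Python sketch of how boundary-level accuracy and F1 are commonly computed. It is not the authors' evaluation code. The function names, the "-" separator, and the two toy segmentations are assumptions made for illustration and are not taken from the paper's data.

```python
def boundary_labels(segmented, sep="-"):
    """Return one 0/1 label per gap between adjacent letters (1 = boundary)."""
    morphs = segmented.split(sep)
    labels = []
    for i, morph in enumerate(morphs):
        # Gaps inside a morph are non-boundaries.
        labels.extend([0] * (len(morph) - 1))
        # The gap after every morph except the last is a boundary.
        if i < len(morphs) - 1:
            labels.append(1)
    return labels


def score(gold, predicted):
    """Boundary accuracy, precision, recall and F1 over paired segmentations."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        g_labels, p_labels = boundary_labels(g), boundary_labels(p)
        if len(g_labels) != len(p_labels):
            raise ValueError(f"segmentations spell different words: {g!r} vs {p!r}")
        for a, b in zip(g_labels, p_labels):
            if a and b:
                tp += 1
            elif b:
                fp += 1
            elif a:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumed for illustration): the prediction misses two of three boundaries.
gold = ["ndi-ya-hamba", "aba-ntu"]
predicted = ["ndi-yahamba", "abantu"]
result = score(gold, predicted)
```

In this toy case most letter gaps are non-boundaries, so an under-segmenting prediction still gets most gaps right and scores a high accuracy, while its F1 is pulled down by the missed boundaries. The same effect may help explain how similar accuracies can coexist with very different F1 scores.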

 
Resume:   I-IsiXhosa Heuristics Maximum Likelihood Segmenter (XHMLS), isicaluli-mbhalo sesiXhosa esingagadwanga, siyavavanywa. Igalelo loluphando kukusetyenziswa kwendlela amagama esiXhosa aguquka ngayo njengesikhokelo somFanekiso-mBoniso-Thuba (FBT) ekucaluleni isiXhosa. Ii-FBT ezikhokelweyo ezine, ezinokukhetha ukusebenzisa ugudiso lwe-Kneser-Ney elungisiweyo (mKN), ziyaboniswa. Inkcaneko yokukhomba imida yezimilo ye-XHMLS eyi-78.8% igqitha eyomgangatho-jikelele oyiMorfessor-Baseline, we-77.2%, kwaye igqithe ngakumbi ngenqaku le-f1, ngo-68.0%, xa ithelekiswa neyeMorfessor-Baseline engu-48.9%, xa inkokhelo izizimi-macala yaye igudiswe nge-mKN. Olu phononongo lubonisa ukuba ucalulo-magama lwesiXhosa olungcono lungafumaneka xa kusetyenziwa isicaluli-magama esingagadwanga se-FBT esisifuziseleyo isiXhosa sibe sigudiswe nge-mKN.

