Ganda (lg) subword embeddings

| Vocab size | Vocab | Model | 25 dim | 50 dim | 100 dim | 200 dim | 300 dim |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 3000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 5000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 10000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 25000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
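
The txt links in the table above presumably point to plain-text embedding files. As a rough sketch of how such a file could be read, the snippet below assumes the common word2vec text layout (a header line with the entry count and the dimension, then one token and its vector per line); this layout is an assumption, not something the page states. The function name `read_w2v_text` is hypothetical, and the toy values are invented for illustration rather than taken from the actual Ganda files.

```python
import io


def read_w2v_text(handle):
    """Parse word2vec-style text: a 'count dim' header, then 'token v1 ... v_dim' lines."""
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} entries, found {len(vectors)}")
    return dim, vectors


# Toy stand-in for a downloaded txt file (assumed layout, invented numbers).
toy_file = io.StringIO(
    "3 2\n"
    "▁mu 0.1 -0.2\n"
    "▁buli 0.3 0.4\n"
    "ako -0.5 0.6\n"
)
dim, vectors = read_w2v_text(toy_file)
print(dim, sorted(vectors))
```

For the real files, swapping the `io.StringIO` object for `open(path, encoding="utf-8")` should be enough if the layout assumption holds.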

Training corpus sample, encoded with different BPE vocabulary sizes

| Vocab size | lgwiki sample |
| --- | --- |
| original | in terminology development, it is always best to form words by blending, compounding, eponyms or semantic extension. for example, the luganda concept |
| | these are some of the concepts you need to explain the themes of work, energy, and power in luganda: |
| | buli kintu kigezaako obutayagala gugenda mu mbeera ya nkyukakyuka oba ka tugambe kigezaako okulemera mu mbeera gye kibaddemu. buli kintu kiremera mu m |
| 1000 | ▁in ▁te r m ino logy ▁de ve lo p ment , ▁i t ▁is ▁a lwa y s ▁b es t ▁to ▁for m ▁wor ds ▁by ▁b len d ing , ▁co mp ou nd ing , ▁e p on y ms ▁or ▁se ma n tic ▁e x t ensi on . ▁for ▁e x a mp le , ▁the ▁luganda ▁concep t |
| | ▁the se ▁are ▁so me ▁of ▁the ▁concep ts ▁yo u ▁ne e d ▁to ▁e x p la in ▁the ▁the me s ▁of ▁wor k , ▁en ergy , ▁and ▁po we r ▁in ▁luganda : |
| | ▁buli ▁kintu ▁ki geza ako ▁obuta yagala ▁gu genda ▁mu ▁mbeera ▁ya ▁n kyuka kyuka ▁oba ▁ka ▁tu ga mbe ▁ki geza ako ▁oku le mera ▁mu ▁mbeera ▁gye ▁kiba dde mu . ▁buli ▁kintu ▁ki re mera ▁mu ▁m |
| 3000 | ▁in ▁termino logy ▁development , ▁it ▁is ▁a lwa y s ▁b es t ▁to ▁for m ▁words ▁by ▁blen ding , ▁compound ing , ▁e pon y ms ▁or ▁se man tic ▁ex tensi on . ▁for ▁ex a mp le , ▁the ▁luganda ▁concept |
| | ▁the se ▁are ▁some ▁of ▁the ▁concepts ▁you ▁ne ed ▁to ▁ex pla in ▁the ▁the mes ▁of ▁wor k , ▁energy , ▁and ▁po wer ▁in ▁luganda : |
| | ▁buli ▁kintu ▁ki gezaako ▁obuta yagala ▁gu genda ▁mu ▁mbeera ▁ya ▁n kyukakyuka ▁oba ▁ka ▁tugambe ▁ki gezaako ▁okule mera ▁mu ▁mbeera ▁gye ▁kiba ddemu . ▁buli ▁kintu ▁kire mera ▁mu ▁m |
| 5000 | ▁in ▁terminology ▁development , ▁it ▁is ▁a lwa ys ▁b es t ▁to ▁form ▁words ▁by ▁blending , ▁compound ing , ▁e pon y ms ▁or ▁se mantic ▁extension . ▁for ▁exa mple , ▁the ▁luganda ▁concept |
| | ▁these ▁are ▁some ▁of ▁the ▁concepts ▁you ▁ne ed ▁to ▁ex pla in ▁the ▁the mes ▁of ▁work , ▁energy , ▁and ▁power ▁in ▁luganda : |
| | ▁buli ▁kintu ▁ki gezaako ▁obuta yagala ▁gu genda ▁mu ▁mbeera ▁ya ▁nkyukakyuka ▁oba ▁ka ▁tugambe ▁ki gezaako ▁okule mera ▁mu ▁mbeera ▁gye ▁kiba ddemu . ▁buli ▁kintu ▁kire mera ▁mu ▁m |
| 10000 | ▁in ▁terminology ▁development , ▁it ▁is ▁a lwa ys ▁b est ▁to ▁form ▁words ▁by ▁blending , ▁compounding , ▁e ponyms ▁or ▁semantic ▁extension . ▁for ▁example , ▁the ▁luganda ▁concept |
| | ▁these ▁are ▁some ▁of ▁the ▁concepts ▁you ▁need ▁to ▁explain ▁the ▁the mes ▁of ▁work , ▁energy , ▁and ▁power ▁in ▁luganda : |
| | ▁buli ▁kintu ▁ki gezaako ▁obutayagala ▁gugenda ▁mu ▁mbeera ▁ya ▁nkyukakyuka ▁oba ▁ka ▁tugambe ▁ki gezaako ▁okule mera ▁mu ▁mbeera ▁gye ▁kiba ddemu . ▁buli ▁kintu ▁kire mera ▁mu ▁m |
| 25000 | ▁in ▁terminology ▁development , ▁it ▁is ▁alwa ys ▁best ▁to ▁form ▁words ▁by ▁blending , ▁compounding , ▁eponyms ▁or ▁semantic ▁extension . ▁for ▁example , ▁the ▁luganda ▁concept |
| | ▁these ▁are ▁some ▁of ▁the ▁concepts ▁you ▁need ▁to ▁explain ▁the ▁the mes ▁of ▁work , ▁energy , ▁and ▁power ▁in ▁luganda : |
| | ▁buli ▁kintu ▁kigezaako ▁obutayagala ▁gugenda ▁mu ▁mbeera ▁ya ▁nkyukakyuka ▁oba ▁ka ▁tugambe ▁kigezaako ▁okulemera ▁mu ▁mbeera ▁gye ▁kibaddemu . ▁buli ▁kintu ▁kiremera ▁mu ▁m |
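
To make the samples above easier to compare, here is a small sketch that counts the pieces in the opening of the third sample sentence at two vocabulary sizes and joins them back into text. It assumes, as the samples suggest, that `▁` marks the start of a new word and that pieces without it continue the previous word. The helper name `decode_pieces` is hypothetical and is not part of any library.

```python
def decode_pieces(pieces):
    """Join subword pieces and turn the word-start marker back into spaces."""
    return "".join(pieces).replace("▁", " ").strip()


# Opening of the third sample sentence, copied from the table above.
samples = {
    1000: "▁buli ▁kintu ▁ki geza ako ▁obuta yagala ▁gu genda ▁mu ▁mbeera".split(),
    25000: "▁buli ▁kintu ▁kigezaako ▁obutayagala ▁gugenda ▁mu ▁mbeera".split(),
}

for vocab_size, pieces in samples.items():
    print(vocab_size, len(pieces), decode_pieces(pieces))
```

Both encodings decode to the same text; the smaller vocabulary simply needs more pieces to cover it, which is the trade-off the table illustrates.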