Interlingua (International Auxiliary Language Association) (ia) subword embeddings

Vocab size  vocab model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
10000       vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
25000       vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
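
For readers who want to inspect a downloaded txt file without extra dependencies, here is a minimal sketch of a reader. It assumes the txt files follow the common word2vec text layout (a header line with vocabulary size and dimension, then one token and its vector per line); this page does not state the layout, so treat that as an assumption. The function name `read_w2v_text` and the toy values are hypothetical and only illustrate the idea.

```python
import io


def read_w2v_text(handle):
    """Parse word2vec-style text: a 'count dim' header, then 'token v1 ... v_dim' rows.

    Assumption: the embedding txt files use this layout.
    """
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


# Toy stand-in for a downloaded file; the numbers are invented, not real embeddings.
toy = io.StringIO("3 2\n▁le 0.1 0.2\n▁de 0.3 0.4\nino -0.5 0.6\n")
dim, vectors = read_w2v_text(toy)
print(dim, sorted(vectors))
```

With a real file, the same function could be called on `open(path, encoding="utf-8")`, where `path` is wherever the txt download was saved.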

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  iawiki sample
original iste occupation esseva invertite in 0000 e 0000 quando le potentias alliate liberava le areas de europa occupate per germania.
le violino es le minime del grande familia de instrumentos musical de chordas. le violino ha quatro chordas que es accordate al notas g, d, a, e e, e
secundo le norma iso 0000, le septimana se initia in le lunedi. isto se conforma al termino "fin de septimana" pro le sequentia de sabbato e dominica.
1000 ▁iste ▁oc cu p ation ▁esseva ▁in ver tite ▁in ▁0000 ▁e ▁0000 ▁quando ▁le ▁pote n tias ▁al lia te ▁li be rava ▁le ▁a re as ▁de ▁euro pa ▁oc cu p ate ▁per ▁german ia .
▁le ▁vi ol ino ▁es ▁le ▁min i me ▁del ▁grande ▁fam ilia ▁de ▁ins tru mentos ▁mus ic al ▁de ▁ch or das . ▁le ▁vi ol ino ▁ha ▁qua tro ▁ch or das ▁que ▁es ▁ac c or d ate ▁al ▁no tas ▁g , ▁d , ▁a , ▁e ▁e , ▁e
▁secundo ▁le ▁nor ma ▁is o ▁0000, ▁le ▁sep ti man a ▁se ▁ini tia ▁in ▁le ▁l un e di . ▁isto ▁se ▁con for ma ▁al ▁termin o ▁" f in ▁de ▁sep ti man a " ▁pro ▁le ▁se qu entia ▁de ▁sa b ba to ▁e ▁d omin ica .
3000 ▁iste ▁occup ation ▁esseva ▁in ver tite ▁in ▁0000 ▁e ▁0000 ▁quando ▁le ▁pote n tias ▁al lia te ▁libe rava ▁le ▁areas ▁de ▁europa ▁occup ate ▁per ▁germania .
▁le ▁vi ol ino ▁es ▁le ▁min ime ▁del ▁grande ▁familia ▁de ▁instru mentos ▁musical ▁de ▁ch or das . ▁le ▁vi ol ino ▁ha ▁quatro ▁ch or das ▁que ▁es ▁ac c ord ate ▁al ▁no tas ▁g , ▁d , ▁a , ▁e ▁e , ▁e
▁secundo ▁le ▁nor ma ▁is o ▁0000, ▁le ▁septiman a ▁se ▁initia ▁in ▁le ▁l un e di . ▁isto ▁se ▁con for ma ▁al ▁termino ▁" f in ▁de ▁septiman a " ▁pro ▁le ▁se quentia ▁de ▁sa b ba to ▁e ▁domin ica .
5000 ▁iste ▁occup ation ▁esseva ▁inver tite ▁in ▁0000 ▁e ▁0000 ▁quando ▁le ▁poten tias ▁allia te ▁libe rava ▁le ▁areas ▁de ▁europa ▁occup ate ▁per ▁germania .
▁le ▁vi ol ino ▁es ▁le ▁min ime ▁del ▁grande ▁familia ▁de ▁instrumentos ▁musical ▁de ▁ch or das . ▁le ▁vi ol ino ▁ha ▁quatro ▁ch or das ▁que ▁es ▁ac c ord ate ▁al ▁no tas ▁g , ▁d , ▁a , ▁e ▁e , ▁e
▁secundo ▁le ▁norma ▁iso ▁0000, ▁le ▁septiman a ▁se ▁initia ▁in ▁le ▁l une di . ▁isto ▁se ▁con forma ▁al ▁termino ▁" fin ▁de ▁septiman a " ▁pro ▁le ▁se quentia ▁de ▁sa b ba to ▁e ▁domin ica .
10000 ▁iste ▁occupation ▁esseva ▁inver tite ▁in ▁0000 ▁e ▁0000 ▁quando ▁le ▁potentias ▁allia te ▁libe rava ▁le ▁areas ▁de ▁europa ▁occupate ▁per ▁germania .
▁le ▁viol ino ▁es ▁le ▁minime ▁del ▁grande ▁familia ▁de ▁instrumentos ▁musical ▁de ▁chor das . ▁le ▁viol ino ▁ha ▁quatro ▁chor das ▁que ▁es ▁acc ordate ▁al ▁notas ▁g , ▁d , ▁a , ▁e ▁e , ▁e
▁secundo ▁le ▁norma ▁iso ▁0000, ▁le ▁septimana ▁se ▁initia ▁in ▁le ▁l une di . ▁isto ▁se ▁con forma ▁al ▁termino ▁" fin ▁de ▁septimana " ▁pro ▁le ▁sequentia ▁de ▁sabba to ▁e ▁dominica .
25000 ▁iste ▁occupation ▁esseva ▁invertite ▁in ▁0000 ▁e ▁0000 ▁quando ▁le ▁potentias ▁alliate ▁liberava ▁le ▁areas ▁de ▁europa ▁occupate ▁per ▁germania .
▁le ▁violino ▁es ▁le ▁minime ▁del ▁grande ▁familia ▁de ▁instrumentos ▁musical ▁de ▁chordas . ▁le ▁violino ▁ha ▁quatro ▁chordas ▁que ▁es ▁accordate ▁al ▁notas ▁g , ▁d , ▁a , ▁e ▁e , ▁e
▁secundo ▁le ▁norma ▁iso ▁0000, ▁le ▁septimana ▁se ▁initia ▁in ▁le ▁lunedi . ▁isto ▁se ▁conforma ▁al ▁termino ▁" fin ▁de ▁septimana " ▁pro ▁le ▁sequentia ▁de ▁sabbato ▁e ▁dominica .
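
The table above can be read as follows: each piece that starts with ▁ opens a new word, and the other pieces continue the current word. The sketch below takes the first sample sentence at three of the vocabulary sizes, joins the pieces back into text, and counts them, which makes the effect of a larger vocabulary visible as fewer, longer pieces. The helper name `detokenize` is hypothetical; the strings are copied from the table.

```python
def detokenize(pieces):
    """Join subword pieces and turn the word-boundary marker back into spaces."""
    return "".join(pieces).replace("▁", " ").strip()


original = (
    "iste occupation esseva invertite in 0000 e 0000 quando le potentias "
    "alliate liberava le areas de europa occupate per germania."
)

# First sample sentence, segmented at three vocabulary sizes (from the table).
segmentations = {
    1000: "▁iste ▁oc cu p ation ▁esseva ▁in ver tite ▁in ▁0000 ▁e ▁0000 "
          "▁quando ▁le ▁pote n tias ▁al lia te ▁li be rava ▁le ▁a re as "
          "▁de ▁euro pa ▁oc cu p ate ▁per ▁german ia .",
    5000: "▁iste ▁occup ation ▁esseva ▁inver tite ▁in ▁0000 ▁e ▁0000 "
          "▁quando ▁le ▁poten tias ▁allia te ▁libe rava ▁le ▁areas "
          "▁de ▁europa ▁occup ate ▁per ▁germania .",
    25000: "▁iste ▁occupation ▁esseva ▁invertite ▁in ▁0000 ▁e ▁0000 "
           "▁quando ▁le ▁potentias ▁alliate ▁liberava ▁le ▁areas "
           "▁de ▁europa ▁occupate ▁per ▁germania .",
}

counts = {}
roundtrip_ok = {}
for vocab_size, line in segmentations.items():
    pieces = line.split()  # pieces are separated by plain spaces in the table
    counts[vocab_size] = len(pieces)
    roundtrip_ok[vocab_size] = detokenize(pieces) == original
    print(vocab_size, counts[vocab_size], roundtrip_ok[vocab_size])
```

Every segmentation decodes to the same sentence; only the number of pieces changes, so the choice of vocabulary size trades sequence length against the number of distinct subword units.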