Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size    scowiki sample
original it is still worn durin ceremonies by some scots an canadian military units an by fowk that play the bagpipes (a scots wind instrument). it is quite co
* sales - uised for recordin sales invyces * sales credits - uised for recordin sales credit notes * purchases - uised for recordin purchase invyces *
* 0000 - gloria macapagal-arroyo became preses o the philippines. * 0000 - george w. bush became the 00rd preses o the unitit states. * 0000 - barack
1000 ▁it ▁is ▁st ill ▁w orn ▁durin ▁c er em on ies ▁by ▁some ▁scot s ▁an ▁can ad ian ▁m il it ary ▁unit s ▁an ▁by ▁fowk ▁that ▁play ▁the ▁b ag p ip es ▁( a ▁scot s ▁w ind ▁in st r um ent ). ▁it ▁is ▁qu ite ▁co
▁* ▁s al es ▁- ▁uised ▁for ▁rec ord in ▁s al es ▁in v y ces ▁* ▁s al es ▁c red it s ▁- ▁uised ▁for ▁rec ord in ▁s al es ▁c red it ▁n ot es ▁* ▁p ur ch as es ▁- ▁uised ▁for ▁rec ord in ▁p ur ch as e ▁in v y ces ▁*
▁* ▁0000 ▁- ▁g l or ia ▁ma c ap ag al - ar ro y o ▁bec ame ▁pres es ▁o ▁the ▁ph il ip p in es . ▁* ▁0000 ▁- ▁ge or ge ▁w . ▁b ush ▁bec ame ▁the ▁00 r d ▁pres es ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁bar ack
3000 ▁it ▁is ▁still ▁w orn ▁durin ▁cer em on ies ▁by ▁some ▁scots ▁an ▁canad ian ▁mil it ary ▁unit s ▁an ▁by ▁fowk ▁that ▁play ▁the ▁b ag p ip es ▁( a ▁scots ▁wind ▁inst r ument ). ▁it ▁is ▁qu ite ▁co
▁* ▁sal es ▁- ▁uised ▁for ▁record in ▁sal es ▁inv y ces ▁* ▁sal es ▁c red its ▁- ▁uised ▁for ▁record in ▁sal es ▁c red it ▁not es ▁* ▁pur ch as es ▁- ▁uised ▁for ▁record in ▁pur ch ase ▁inv y ces ▁*
▁* ▁0000 ▁- ▁gl oria ▁mac ap ag al - ar roy o ▁became ▁preses ▁o ▁the ▁philipp ines . ▁* ▁0000 ▁- ▁george ▁w . ▁b ush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁bar ack
5000 ▁it ▁is ▁still ▁w orn ▁durin ▁cer em on ies ▁by ▁some ▁scots ▁an ▁canadian ▁military ▁units ▁an ▁by ▁fowk ▁that ▁play ▁the ▁b ag p ip es ▁( a ▁scots ▁wind ▁instr ument ). ▁it ▁is ▁qu ite ▁co
▁* ▁sales ▁- ▁uised ▁for ▁record in ▁sales ▁inv y ces ▁* ▁sales ▁c red its ▁- ▁uised ▁for ▁record in ▁sales ▁c red it ▁not es ▁* ▁pur ch ases ▁- ▁uised ▁for ▁record in ▁pur ch ase ▁inv y ces ▁*
▁* ▁0000 ▁- ▁gl oria ▁mac ap ag al - ar roy o ▁became ▁preses ▁o ▁the ▁philippines . ▁* ▁0000 ▁- ▁george ▁w . ▁b ush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁bar ack
10000 ▁it ▁is ▁still ▁w orn ▁durin ▁cerem on ies ▁by ▁some ▁scots ▁an ▁canadian ▁military ▁units ▁an ▁by ▁fowk ▁that ▁play ▁the ▁bag p ip es ▁( a ▁scots ▁wind ▁instrument ). ▁it ▁is ▁quite ▁co
▁* ▁sales ▁- ▁uised ▁for ▁recordin ▁sales ▁inv y ces ▁* ▁sales ▁cred its ▁- ▁uised ▁for ▁recordin ▁sales ▁cred it ▁not es ▁* ▁purch ases ▁- ▁uised ▁for ▁recordin ▁purch ase ▁inv y ces ▁*
▁* ▁0000 ▁- ▁gl oria ▁mac ap ag al - ar roy o ▁became ▁preses ▁o ▁the ▁philippines . ▁* ▁0000 ▁- ▁george ▁w . ▁bush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁bar ack
25000 ▁it ▁is ▁still ▁worn ▁durin ▁ceremonies ▁by ▁some ▁scots ▁an ▁canadian ▁military ▁units ▁an ▁by ▁fowk ▁that ▁play ▁the ▁bag p ipes ▁( a ▁scots ▁wind ▁instrument ). ▁it ▁is ▁quite ▁co
▁* ▁sales ▁- ▁uised ▁for ▁recordin ▁sales ▁inv y ces ▁* ▁sales ▁credits ▁- ▁uised ▁for ▁recordin ▁sales ▁credit ▁notes ▁* ▁purch ases ▁- ▁uised ▁for ▁recordin ▁purchase ▁inv y ces ▁*
▁* ▁0000 ▁- ▁gloria ▁mac ap agal - ar roy o ▁became ▁preses ▁o ▁the ▁philippines . ▁* ▁0000 ▁- ▁george ▁w . ▁bush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁barack
50000 ▁it ▁is ▁still ▁worn ▁durin ▁ceremonies ▁by ▁some ▁scots ▁an ▁canadian ▁military ▁units ▁an ▁by ▁fowk ▁that ▁play ▁the ▁bagpipes ▁( a ▁scots ▁wind ▁instrument ). ▁it ▁is ▁quite ▁co
▁* ▁sales ▁- ▁uised ▁for ▁recordin ▁sales ▁inv y ces ▁* ▁sales ▁credits ▁- ▁uised ▁for ▁recordin ▁sales ▁credit ▁notes ▁* ▁purchases ▁- ▁uised ▁for ▁recordin ▁purchase ▁inv y ces ▁*
▁* ▁0000 ▁- ▁gloria ▁macap agal - arroyo ▁became ▁preses ▁o ▁the ▁philippines . ▁* ▁0000 ▁- ▁george ▁w . ▁bush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁barack
100000 ▁it ▁is ▁still ▁worn ▁durin ▁ceremonies ▁by ▁some ▁scots ▁an ▁canadian ▁military ▁units ▁an ▁by ▁fowk ▁that ▁play ▁the ▁bagpipes ▁( a ▁scots ▁wind ▁instrument ). ▁it ▁is ▁quite ▁co
▁* ▁sales ▁- ▁uised ▁for ▁recordin ▁sales ▁invy ces ▁* ▁sales ▁credits ▁- ▁uised ▁for ▁recordin ▁sales ▁credit ▁notes ▁* ▁purchases ▁- ▁uised ▁for ▁recordin ▁purchase ▁invy ces ▁*
▁* ▁0000 ▁- ▁gloria ▁macapagal - arroyo ▁became ▁preses ▁o ▁the ▁philippines . ▁* ▁0000 ▁- ▁george ▁w . ▁bush ▁became ▁the ▁00 rd ▁preses ▁o ▁the ▁unitit ▁states . ▁* ▁0000 ▁- ▁barack
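
The rows above can be read back into plain text by joining the pieces and treating ▁ as a word-start marker. The following is a minimal sketch of that reading, using a fragment of the first sample at two vocabulary sizes. The helper name detokenize and the variable names are illustrative assumptions, not part of any library, and the sketch only reverses the segmentation; it does not train or apply a BPE model.

```python
# Word-start marker used in the table rows (U+2581, LOWER ONE EIGHTH BLOCK).
WORD_START = "\u2581"


def detokenize(pieces):
    """Join subword pieces and turn word-start markers back into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


# The same fragment as segmented in the 1000 and 50000 rows of the table.
pieces_1000 = "▁play ▁the ▁b ag p ip es".split()
pieces_50000 = "▁play ▁the ▁bagpipes".split()

for label, pieces in (("1000", pieces_1000), ("50000", pieces_50000)):
    print(label, len(pieces), repr(detokenize(pieces)))
```

Both segmentations decode to the same text; the larger vocabulary simply covers the fragment with fewer pieces.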