Supported Languages
The following table lists the languages supported by the BBF that we have additionally tested for correctness:
| Language Name | ISO 639-1 Code | Pre-processor Source Code | Pre-processing Type |
|---|---|---|---|
| English | en | Snowball | Stemmer |
| Spanish | es | Snowball | Stemmer |
| French | fr | Snowball | Stemmer |
| German | de | Snowball | Stemmer |
| Portuguese | pt | Snowball | Stemmer |
| Catalan | ca | Snowball | Stemmer |
| Luxembourgish | lb | spellux | Lemmatizer |
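To illustrate how a table like this could drive pre-processing, here is a minimal Python sketch that looks up the pre-processing type for an ISO 639-1 code. The `PreProcessing` enum, the `SUPPORTED_LANGUAGES` dictionary, and `preprocessing_for` are hypothetical names introduced for this example, not part of the BBF.

```python
from enum import Enum


class PreProcessing(Enum):
    """Kind of text normalization applied to a language (hypothetical names)."""
    STEMMER = "stemmer"
    LEMMATIZER = "lemmatizer"


# Hypothetical lookup table mirroring the supported-languages table,
# keyed by ISO 639-1 code. Illustration only, not the BBF's configuration.
SUPPORTED_LANGUAGES = {
    "en": ("English", PreProcessing.STEMMER),
    "es": ("Spanish", PreProcessing.STEMMER),
    "fr": ("French", PreProcessing.STEMMER),
    "de": ("German", PreProcessing.STEMMER),
    "pt": ("Portuguese", PreProcessing.STEMMER),
    "ca": ("Catalan", PreProcessing.STEMMER),
    "lb": ("Luxembourgish", PreProcessing.LEMMATIZER),
}


def preprocessing_for(iso_code: str) -> PreProcessing:
    """Return the pre-processing type for an ISO 639-1 code.

    Codes are normalized (whitespace stripped, lowercased) before lookup;
    codes outside the tested set raise ValueError.
    """
    entry = SUPPORTED_LANGUAGES.get(iso_code.strip().lower())
    if entry is None:
        raise ValueError(f"unsupported language code: {iso_code!r}")
    return entry[1]


if __name__ == "__main__":
    print(preprocessing_for("LB").value)  # lemmatizer
```

Keying the lookup on the ISO 639-1 code keeps it independent of how language names are spelled, and an unknown code fails loudly instead of silently skipping normalization.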
Snowball stemmer
The Snowball stemmer supports more languages than those listed in the table above. For the complete list, see https://snowballstem.org/algorithms/.
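As a sketch of how the stemmer rows could be wired to Snowball, the Python snippet below maps ISO 639-1 codes to Snowball algorithm names and stems tokens with the third-party `snowballstemmer` package from PyPI. The mapping and function names are illustrative assumptions, not the BBF's code. Whether a given algorithm such as `catalan` is available depends on the installed package version, so the sketch checks `snowballstemmer.algorithms()` before stemming.

```python
# Hypothetical mapping from ISO 639-1 codes to the lowercase algorithm
# names used by the Snowball project. Illustration only.
SNOWBALL_ALGORITHMS = {
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "de": "german",
    "pt": "portuguese",
    "ca": "catalan",
}


def snowball_algorithm_for(iso_code: str) -> str:
    """Map an ISO 639-1 code to a Snowball algorithm name."""
    try:
        return SNOWBALL_ALGORITHMS[iso_code.strip().lower()]
    except KeyError:
        raise ValueError(f"no Snowball stemmer configured for {iso_code!r}") from None


def stem_words(iso_code: str, words: list[str]) -> list[str]:
    """Stem a list of tokens with the third-party `snowballstemmer` package."""
    # Imported lazily so the mapping above works without the dependency installed.
    import snowballstemmer

    algorithm = snowball_algorithm_for(iso_code)
    # Some algorithms are only present in newer package versions.
    if algorithm not in snowballstemmer.algorithms():
        raise ValueError(f"installed snowballstemmer lacks {algorithm!r}")
    return snowballstemmer.stemmer(algorithm).stemWords(words)
```

With the package installed, `stem_words("en", ["running"])` returns `["run"]`.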
Luxembourgish Pre-processing
To support Luxembourgish, we wanted to enable Luxembourgish-specific text processing. For that purpose, we chose what we consider the best available tool: the lemmatizer created by Christoph Purschke as part of “spellux - Automatic text normalization for Luxembourgish”.