For low-resource languages, adapting existing multilingual embedding models through vocabulary trimming and task-specific fine-tuning can produce efficient, locally deployable alternatives to large proprietary models without sacrificing performance.
This paper introduces SkMTEB, the first comprehensive benchmark for evaluating text embedding models on Slovak, a low-resource language.
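To make the idea of vocabulary trimming concrete, the following is a minimal sketch, not the procedure used in this paper. It assumes a toy vocabulary and embedding table, and the function name `trim_vocabulary` and all data are hypothetical. The sketch keeps only the embedding rows for tokens that occur in a target-language corpus, plus special tokens, and remaps token ids to a compact range. A real pipeline would also have to update the tokenizer itself, which is omitted here.

```python
from typing import Dict, Iterable, List, Tuple


def trim_vocabulary(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    corpus_token_ids: Iterable[int],
    special_tokens: Iterable[str],
) -> Tuple[Dict[str, int], List[List[float]], Dict[int, int]]:
    """Keep embedding rows only for tokens seen in the corpus or marked special.

    Hypothetical helper for illustration; returns the trimmed vocabulary,
    the trimmed embedding table, and the old-id to new-id mapping.
    """
    # Token ids observed in the target-language corpus.
    keep_ids = set(corpus_token_ids)
    # Special tokens are always retained, even if they never occur in the corpus.
    keep_ids.update(vocab[tok] for tok in special_tokens if tok in vocab)

    # Sort so that surviving tokens keep their original relative order.
    old_ids = sorted(keep_ids)
    old_to_new = {old: new for new, old in enumerate(old_ids)}

    new_vocab = {tok: old_to_new[i] for tok, i in vocab.items() if i in old_to_new}
    new_embeddings = [embeddings[i] for i in old_ids]
    return new_vocab, new_embeddings, old_to_new


# Toy multilingual vocabulary (assumed data): two English and two Slovak tokens.
vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3, "dobrý": 4, "deň": 5}
embeddings = [
    [0.0, 0.0],
    [0.1, 0.1],
    [0.2, 0.3],
    [0.4, 0.5],
    [0.6, 0.7],
    [0.8, 0.9],
]
# Token ids produced by tokenizing a (tiny) Slovak corpus.
corpus_token_ids = [4, 5, 5, 4]

new_vocab, new_embeddings, old_to_new = trim_vocabulary(
    vocab, embeddings, corpus_token_ids, special_tokens=["<pad>", "<unk>"]
)
print(new_vocab)
print(len(embeddings), "->", len(new_embeddings), "embedding rows")
```

In this toy case the English-only tokens are dropped and the embedding table shrinks from six rows to four. In a large multilingual model the embedding matrix typically accounts for a substantial share of the parameters, which is why trimming it can reduce model size without touching the transformer layers.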