A standardized, reproducible method for converting print dictionaries into machine-readable lexical resources, providing critical infrastructure for Arabic NLP that did not previously exist.
This paper describes how to convert a printed Arabic-English dictionary into a digital, machine-readable format using international standards (ISO LMF and TEI Lex-0). The authors tested their approach on a sample section of the dictionary and achieved 91% accuracy in parsing the dictionary's structure, while extracting synonyms and word features with precision ranging from 85% to 98%.
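To make the target format more concrete, here is a minimal sketch of what a single entry might look like once encoded in a TEI Lex-0 style. It is not the authors' pipeline: the helper name `build_entry`, the entry identifier, and the sample word are illustrative assumptions, and only a small subset of TEI Lex-0 elements (`entry`, `form`, `orth`, `gramGrp`, `gram`, `sense`, `cit`, `quote`) is used.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the reserved "xml:" prefix (used for xml:id, xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"
XML_LANG = f"{{{XML_NS}}}lang"


def build_entry(entry_id, lemma, pos, translations, lang="ar", target_lang="en"):
    """Build a simplified TEI Lex-0 style <entry> element (hypothetical helper).

    Each translation becomes its own <sense> with a translation-equivalent <cit>.
    """
    entry = ET.Element("entry", {XML_ID: entry_id, XML_LANG: lang})

    # Headword: <form type="lemma"><orth>...</orth></form>
    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = lemma

    # Grammatical information: <gramGrp><gram type="pos">...</gram></gramGrp>
    gram_grp = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram_grp, "gram", {"type": "pos"}).text = pos

    # One sense per English equivalent.
    for i, translation in enumerate(translations, start=1):
        sense = ET.SubElement(entry, "sense", {XML_ID: f"{entry_id}.s{i}"})
        cit = ET.SubElement(
            sense, "cit", {"type": "translationEquivalent", XML_LANG: target_lang}
        )
        ET.SubElement(cit, "quote").text = translation

    return entry


# Illustrative entry (not taken from the paper's data): Arabic "كتاب" = "book".
entry = build_entry("kitab", "كتاب", "noun", ["book"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

A real conversion would also need the TEI namespace, a document header, and the richer sense structure (synonyms, usage labels, morphological features) that the paper evaluates; this sketch only shows the general shape of an entry.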