Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars
Quick Take
This paper proposes a method to convert the Arabic-English Al-Mawrid dictionary into a machine-readable format using parsing expression grammars. The approach turns each dictionary entry into an explicit hierarchical structure. This makes the dictionary more usable for natural language processing applications, even though Arabic dictionaries lack a standardized microstructure.
Key Points
- The method converts each dictionary entry from a flat stream of words and punctuation marks into an explicit hierarchical structure.
- The parser, the main step in a cascade of processing steps, was implemented with the parsing expression grammar (PEG) formalism.
- Each entry is composed of subentries, and each subentry consists of defining phrases, domain labels, cross-references, and translation equivalents.
- The study demonstrates that Arabic dictionaries can be structured automatically or semi-automatically with plausible accuracy.
- Arabic dictionaries lack a standardized microstructure. The method works around this by first inducing the microstructure of the dictionary being processed.
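The entry hierarchy can be sketched as a small data model. It follows the components named in the paper's abstract: subentries made of defining phrases, domain labels, cross-references, and translation equivalents. This is a minimal illustration that assumes Python dataclasses. The class names `Entry` and `Subentry`, the `headword` field, and the `to_json` method are hypothetical. This summary does not specify the paper's actual output format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical data model for a structured dictionary entry.
# Names are illustrative only; they are not taken from the paper.
@dataclass
class Subentry:
    defining_phrases: List[str] = field(default_factory=list)
    domain_labels: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)
    translation_equivalents: List[str] = field(default_factory=list)

@dataclass
class Entry:
    headword: str
    subentries: List[Subentry] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists, so the whole
        # hierarchy is serialized explicitly; ensure_ascii=False keeps Arabic readable.
        return json.dumps(asdict(self), ensure_ascii=False)
```

For example, `Entry("كتاب", [Subentry(translation_equivalents=["book"])]).to_json()` yields a JSON object in which every component of the entry has its own named slot. Empty components still appear, as empty lists. That explicitness is what distinguishes the structured form from the original stream of words and punctuation.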
Article Content
arXiv:2606.25231v1. Abstract: Dictionaries are rich sources of lexical information about words, which many applications of natural language processing and human language technology require. However, publishers prepare printed dictionaries for human use, not for machine processing. This paper presents a method to partially structure a machine-readable version of the Arabic-English Al-Mawrid dictionary.
The method converted the entries of Al-Mawrid from a stream of words and punctuation marks into hierarchical structures. The hierarchical structure expresses the components of each dictionary entry in an explicit format. A dictionary entry is composed of subentries, and each subentry consists of defining phrases, domain labels, cross-references, and translation equivalents. We designed the proposed method as a cascade of steps in which parsing is the main step.
We implemented the parser using the parsing expression grammar (PEG) formalism. In conclusion, although Arabic dictionaries lack a standardized microstructure, this study demonstrated that it is possible to structure them automatically or semi-automatically with plausible accuracy after inducing their microstructure.
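The code below is a minimal parser-combinator sketch of the core parsing expression grammar (PEG) operators: sequence, ordered choice, greedy repetition, and optional. It applies them to a toy entry format: `headword: equivalent, equivalent; [label] equivalent; = cross-reference`. This format is hypothetical and is not Al-Mawrid's actual layout, which the paper induces from the dictionary itself. All function names (`seq`, `choice`, `many`, `parse_entry`, and so on) are illustrative. This summary does not describe the paper's own grammar or tooling.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, pos) to (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def regex(pattern: str) -> Parser:
    # Terminal: match a regular expression anchored at the current position.
    rx = re.compile(pattern)
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def seq(*parsers: Parser) -> Parser:
    # PEG sequence e1 e2 ...: every element must succeed, in order.
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def choice(*parsers: Parser) -> Parser:
    # PEG ordered choice e1 / e2: the first alternative that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def many(p: Parser) -> Parser:
    # PEG e*: greedy repetition; it never gives back what it consumed.
    def parse(text, pos):
        values = []
        while True:
            r = p(text, pos)
            if r is None or r[1] == pos:  # stop on failure or no progress
                return values, pos
            value, pos = r
            values.append(value)
    return parse

def optional(p: Parser) -> Parser:
    # PEG e?: succeed with None (consuming nothing) if e fails.
    def parse(text, pos):
        r = p(text, pos)
        return r if r is not None else (None, pos)
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    # Transform the value of a successful parse.
    def parse(text, pos):
        r = p(text, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse

ws = regex(r"\s*")

def tok(pattern: str) -> Parser:
    # Skip leading whitespace, then match the pattern.
    return fmap(seq(ws, regex(pattern)), lambda v: v[1])

# Toy grammar (hypothetical format, not Al-Mawrid's real microstructure):
#   entry  <- headword ":" sense (";" sense)*
#   sense  <- label? (xref / equivs)
#   label  <- "[" [^\]]+ "]"
#   xref   <- "=" [^;]+
#   equivs <- phrase ("," phrase)*
headword = fmap(tok(r"[^:]+"), str.strip)
label = fmap(seq(tok(r"\["), tok(r"[^\]]+"), tok(r"\]")), lambda v: v[1].strip())
xref = fmap(seq(tok(r"="), tok(r"[^;]+")), lambda v: {"xref": v[1].strip()})
phrase = fmap(tok(r"[^,;\[\]=]+"), str.strip)
equivs = fmap(seq(phrase, many(fmap(seq(tok(","), phrase), lambda v: v[1]))),
              lambda v: {"equivalents": [v[0]] + v[1]})
sense = fmap(seq(optional(label), choice(xref, equivs)),
             lambda v: {"label": v[0], **v[1]})
entry = fmap(seq(headword, tok(":"), sense,
                 many(fmap(seq(tok(";"), sense), lambda v: v[1])), ws),
             lambda v: {"headword": v[0], "senses": [v[2]] + v[3]})

def parse_entry(text: str) -> dict:
    # Succeed only if the whole input is consumed.
    r = entry(text, 0)
    if r is None or r[1] != len(text):
        raise ValueError("input is not a complete entry")
    return r[0]
```

For example, `parse_entry("head: eq1, eq2; [law] eq3; = other")` returns a headword plus three senses. The first sense is a plain list of equivalents, the second carries the domain label `law`, and the third is a cross-reference. Because ordered choice commits to the first alternative that succeeds, `xref` is always tried before `equivs`. Reordering alternatives is one way a PEG grammar expresses precedence between competing readings. This backtracking version omits packrat memoization, which a production parser would likely add to guarantee linear-time parsing.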