The Quranic Arabic Corpus
Welcome to the Quranic Arabic Corpus, an annotated linguistic resource which shows the Arabic grammar, syntax and morphology for each word in the Holy Quran. The corpus provides three levels of analysis: morphological annotation, a syntactic treebank and a semantic ontology.
Developed at the Institute for Artificial Intelligence (AI) at the University of Leeds, the grammatical data was initially produced by a computer program that independently derived the rules of Classical Arabic grammar using machine learning. By learning from example annotation, the AI system provides detailed grammatical analysis for each word in the Quran. The resulting information has been made available online as a free educational resource.
Similar to Wikipedia, the corpus website is publicly accessible and contains no advertising. Although the annotation was generated automatically by an artificial intelligence system, a team of volunteer editors continually updates and improves the content to ensure accuracy, comparing it against authentic, traditional sources of Quranic grammar.
The corpus is a unique online resource for studying the Classical Arabic language of the Quran through the traditional analysis of iʿrāb (إعراب), focusing on morphology (ṣarf - صرف) and syntactic analysis (naḥw - نحو). The website is used worldwide to access deep linguistic analysis of the Quran and as a study aid for learning Classical Arabic. Reviving the historical approach to linguistics taken by traditional Arabic grammarians, the website contains new grammatical information in unprecedented detail that is not available anywhere else on the internet or in book form.
What is the Quran?
The Quran is a significant religious text followed by believers of the Islamic faith. Written in Quranic Arabic, the Quran contains 6,236 numbered verses (āyāt) and is divided into 114 chapters. An example verse (21:30) from the Quran:
How you can get involved
This project contributes to original research on the Quran by applying natural language computing technology to analyze the Arabic text of each verse. The word by word grammar is very accurate, but ensuring complete accuracy is not possible without your help. If you come across a word and feel that a better analysis could be provided, you can suggest a correction online by clicking on the Arabic word.
The map above shows worldwide interest in the Quranic Arabic Corpus. The website is used by over 5 million people a year from 165 different countries. Help us review the information on this website so that together we can build the most accurate linguistic resource for Quranic Arabic.
The Quranic Arabic Corpus is divided into several research projects that focus on different aspects of linguistics. The main projects are the treebank, which covers morphology (صرف) and syntax (نحو), and the ontology, which maps concepts in the Quran.
The Quranic Arabic Treebank
The Quranic Treebank is an effort to map out the entire grammar of the Quran by linking Arabic words through dependencies. The linguistic structure of verses is represented using formal graphs (from mathematical graph theory). The Quranic Arabic Corpus provides a novel visualization of Quranic syntax using hybrid dependency graphs that combine aspects of both modern constituency and dependency grammar.
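To make the idea of linking words through dependencies concrete, here is a minimal Python sketch of a verse represented as a labelled directed graph. It is an illustration only: the class name `DependencyGraph`, the placeholder words and the relation labels are assumptions made for this example, not the corpus's actual data format or tag set, and the sketch covers plain word-to-word dependencies rather than the hybrid graphs described above.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Words of one verse plus labelled head -> dependent edges (hypothetical model)."""
    words: list[str]
    # Each edge is (head index, dependent index, relation label).
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_edge(self, head: int, dependent: int, relation: str) -> None:
        # Reject edges that point outside the verse.
        for index in (head, dependent):
            if not 0 <= index < len(self.words):
                raise IndexError("edge refers to a word outside the verse")
        self.edges.append((head, dependent, relation))

    def dependents(self, head: int) -> list[tuple[str, str]]:
        # Return (word, relation) pairs for every word governed by `head`.
        return [(self.words[d], rel) for h, d, rel in self.edges if h == head]


# Placeholder tokens stand in for Arabic words; the labels are illustrative.
graph = DependencyGraph(words=["verb", "noun_1", "noun_2"])
graph.add_edge(0, 1, "subject")
graph.add_edge(0, 2, "object")
```

Calling `graph.dependents(0)` lists the two words attached to the verb together with their relation labels, which is the kind of lookup a dependency visualization relies on.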
The Ontology of Quranic Concepts
The Quranic Ontology uses knowledge representation to define the key concepts in the Quran and shows the relationships between these concepts using predicate logic. Named entities in verses, such as the names of historic people and places mentioned in the Quran, are linked to concepts in the ontology.
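As a rough illustration of relating concepts with predicates, the Python sketch below stores facts as (predicate, subject, object) triples and answers a simple "is this entity an instance of that concept?" query by following subclass links. The predicate names `instance_of` and `subclass_of`, the function names and the placeholder entities are all assumptions for this example; they are not taken from the ontology itself.

```python
# Hypothetical facts: (predicate, subject, object).
FACTS = {
    ("instance_of", "entity_a", "concept_x"),
    ("subclass_of", "concept_x", "concept_y"),
    ("subclass_of", "concept_y", "concept_root"),
}


def superconcepts(concept, facts):
    """Collect every concept reachable from `concept` via subclass_of links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for predicate, sub, sup in facts:
            if predicate == "subclass_of" and sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_instance(entity, concept, facts):
    """True if `entity` is an instance of `concept` directly or by inheritance."""
    for predicate, subject, direct in facts:
        if predicate == "instance_of" and subject == entity:
            if direct == concept or concept in superconcepts(direct, facts):
                return True
    return False
```

With these placeholder facts, `is_instance("entity_a", "concept_root", FACTS)` holds because the entity's concept inherits from the root through two subclass links.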
University of Leeds
In research publications, please cite: Parsing by Machine Learning from a Classical Arabic Treebank.