Technology

Our Technology is...

 

Technology White Paper:

To download our Technology White Paper click Here

 

Glossary:



A) Technological Information:



1- Natural Language Processing:

Natural Language Processing (NLP) is an interdisciplinary field that combines computer science and linguistics.

Natural language generation systems convert information from computer databases into readable human language.

Natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Many problems within NLP apply to both generation and understanding; for example, a computer must be able to model morphology (the structure of words) in order to understand an English sentence, but a model of morphology is also needed for producing a grammatically correct English sentence.

2- Indexing:

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics and computer science.
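
As a rough illustration of the collect, parse and store steps, the following minimal sketch builds an inverted index that maps each word to the documents containing it. The function names and sample documents are assumptions made for this example; they do not describe the indexing used by this search system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parse step: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Store step: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Hypothetical sample collection.
docs = {
    1: "Search engines index documents",
    2: "An index speeds up retrieval",
}
index = build_index(docs)
print(sorted(index["index"]))  # [1, 2]
```

Looking up a word is then a single dictionary access instead of a scan over every document, which is what makes retrieval fast.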

3- Morphological Search:

Morphological Search ensures that the search algorithm considers all inflected forms of words when creating the search index and searching for the requested words and phrases.
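
One simple way to picture this is to reduce every word to a base form both when indexing and when searching. The sketch below uses a small hand-written lookup table (LEMMAS, a hypothetical stand-in for a real morphological analyzer) and English words for readability.

```python
# Hypothetical lookup table: inflected form -> base form.
LEMMAS = {
    "search": "search", "searches": "search",
    "searched": "search", "searching": "search",
    "index": "index", "indexes": "index", "indexed": "index",
}


def lemma(word):
    """Return the base form of a word, or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


def build_lemma_index(documents):
    """Index each document under the base form of every word it contains."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            index.setdefault(lemma(word), set()).add(doc_id)
    return index


def search(index, query_word):
    """Find documents containing any inflected form of the query word."""
    return sorted(index.get(lemma(query_word), set()))


docs = {
    1: "she searched the archive",
    2: "searching takes time",
    3: "the archive is large",
}
idx = build_lemma_index(docs)
print(search(idx, "searches"))  # [1, 2]
```

The query "searches" matches documents that contain "searched" and "searching" because all three share the same base form in the table.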

4- Word Adjacency or Word Proximity:

Searching using adjacency or proximity means that the words specified in the search query are to appear in the results with a maximum number of intervening words. This maximum number is specified by using an adjacency or proximity operator.
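
The check behind a proximity operator can be sketched as follows. The function name within_proximity and its max_gap parameter are assumptions for illustration; the actual operator syntax of this search system is not shown here.

```python
def within_proximity(text, first, second, max_gap):
    """Return True if `first` and `second` occur in `text` with at most
    `max_gap` intervening words, in either order."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


text = "arabic spelling and grammar correction"
print(within_proximity(text, "arabic", "correction", 3))  # True
print(within_proximity(text, "arabic", "correction", 2))  # False
```

A max_gap of 0 corresponds to strict adjacency: the two words must appear next to each other.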

5- Boolean:

Where does the term Boolean originate from?

Boolean searching is built on a method of symbolic logic developed by George Boole, a 19th-century English mathematician. Most online databases and search engines support Boolean searches. Boolean search techniques make searches more effective by filtering out many unrelated documents.

It is possible to compose some complex search expressions using Boolean logic on this search system. To do so, use the following terms:

'AND' (Boolean AND) – in Arabic و

'OR' (Boolean OR) – in Arabic أو

'NOT' (Boolean NOT) – in Arabic ليس

Parentheses can also be used when conducting an advanced Boolean search.

For example: art AND (school OR college) - this expresses a search for records that contain the word art together with either school or college, i.e. records about art schools or art colleges.
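
Over an index that maps each word to a set of record ids, the three operators correspond to set intersection, union and difference. The sketch below evaluates the example query by hand; the index contents and the helper docs_with are hypothetical.

```python
# Hypothetical index: word -> ids of the records that contain it.
index = {
    "art": {1, 2, 4},
    "school": {1, 3},
    "college": {2, 3},
}
all_docs = {1, 2, 3, 4}


def docs_with(term):
    """Return the set of record ids containing the term."""
    return index.get(term, set())


# art AND (school OR college)
result = docs_with("art") & (docs_with("school") | docs_with("college"))
print(sorted(result))  # [1, 2]

# art AND NOT school
not_school = docs_with("art") & (all_docs - docs_with("school"))
print(sorted(not_school))  # [2, 4]
```

The parentheses matter: without them, the query would usually be read as (art AND school) OR college, which would also return record 3.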

6- Wildcard:

A wildcard character can be used to substitute for any other character or characters in a string.

The asterisk (*) usually stands for any sequence of zero or more characters, and the question mark (?) usually stands for exactly one character. Wildcard search is useful for non-dictionary words that cannot be matched using linguistic (morphological) rules; such words can still be found by replacing the uncertain part of the word with wildcard characters.
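
The behaviour of the two wildcard characters can be tried out with Python's standard fnmatch module, which uses the same * and ? conventions. The word list and the helper wildcard_search are assumptions for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary to search in.
words = ["color", "colour", "colors", "cooler", "collar"]


def wildcard_search(pattern, vocabulary):
    """Return the words that match a pattern containing * and ?."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]


print(wildcard_search("colo*r", words))  # ['color', 'colour']
print(wildcard_search("colo?r", words))  # ['colour']
print(wildcard_search("col*r", words))   # ['color', 'colour', 'collar']
```

The first two calls show the difference between the operators: * also matches zero characters, so "color" is found, while ? requires exactly one character. Note that fnmatch additionally treats square brackets as a character set, which goes beyond the two wildcards described here.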

7- Redundancy:

Redundancy in search results is the occurrence of unnecessary results that are not relevant to the word(s) being searched for.

8- Morphological Analyzer for Arabic Words (KMorph):

The morphological analyzer for Arabic words, KMorph, is capable of analyzing a word and returning its prefix, stem, suffix, triliteral root and meaning. It is also capable of morphological generation, i.e. generating inflected word forms, given the prefix, stem and suffix.

It can also be integrated into third party products in order to provide Arabic morphological search.
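
To make the idea of analysis and generation concrete, here is a deliberately tiny sketch. It is not KMorph's API: the affix lists, the functions analyze and generate, and the stem list are all hypothetical, and the sketch returns only prefix, stem and suffix (no root or meaning).

```python
# Hypothetical affix lists; a real analyzer relies on far larger resources.
PREFIXES = ["", "و", "ال", "وال", "ب", "بال"]
SUFFIXES = ["", "ة", "ات", "ون", "ين", "ها"]


def analyze(word, known_stems):
    """Return every (prefix, stem, suffix) split whose stem is known."""
    analyses = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in known_stems:
                analyses.append((prefix, stem, suffix))
    return analyses


def generate(prefix, stem, suffix):
    """Morphological generation in its simplest form: concatenation."""
    return prefix + stem + suffix


stems = {"كتاب", "معلم"}
for word in ["والكتاب", "المعلمون"]:
    print(word, analyze(word, stems))
```

Real Arabic morphology also involves spelling changes at affix boundaries and ambiguous analyses, which plain concatenation does not capture.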

9- Spelling Verification and Correction (KSpell):

KSpell is a spelling verification and correction technology designed specifically for the Arabic language. It applies Natural Language Processing through a fast Arabic morphological analyzer.

The speller is capable of discovering spelling errors and suggesting corrections for the most frequent misspellings of Arabic words.

The speller/corrector consists of two components: one verifies a word's spelling, while the other suggests a list of correct words.
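
The two-component structure can be sketched as two small functions. This is only an illustration: it checks words against a fixed word list (LEXICON, hypothetical) instead of a morphological analyzer, and it ranks suggestions by string similarity using the standard difflib module, which is not necessarily how KSpell works.

```python
from difflib import get_close_matches

# Hypothetical word list standing in for a morphological analyzer.
LEXICON = ["كتاب", "كاتب", "مكتبة", "مدرسة"]


def is_correct(word):
    """Verification component: accept only words found in the lexicon."""
    return word in LEXICON


def suggest(word, limit=3):
    """Correction component: propose the most similar known words."""
    return get_close_matches(word, LEXICON, n=limit, cutoff=0.6)


word = "كتلب"  # a misspelling used only for this example
if not is_correct(word):
    print(suggest(word))
```

Keeping the two components separate means the fast verification step runs on every word, while the more expensive suggestion step runs only for words that fail it.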

10- Morpho-Syntactic Error:

The term originates within the field of Machine Translation. Morpho-syntactic errors are errors that result from misapplying morphological inflection and syntactic rules.
Simple spell checkers can only spot errors that produce non-words; errors that produce correctly spelled words go unnoticed.

B) Linguistic Information:

1- Root:

The roots of verbs and most nouns in Semitic languages are characterized as a sequence of consonants or "radicals" (hence also the term consonantal root). Such abstract consonantal roots are used in the derivation of actual words by adding the vowels and non-root consonants (or "transfixes") which go with a particular morphological category around the root consonants, in an appropriate way, generally following specific patterns. It is a peculiarity of Semitic linguistics that a large majority of these consonantal roots are triliteral.

2- Derivation:

Derivation is the process of forming new words from a root (called "Derived Forms"), using certain "derivational patterns". For example, the following "Derived Forms" may be formed from the root (س ل م):
- Using the derivational pattern <فَعِل> to form the word (سَلِم)
- Using the derivational pattern <فاعِل> to form the word (سالِم)
- Using the derivational pattern <فَعالة> to form the word (سَلامة)
- Using the derivational pattern <فَعيل> to form the word (سَليم)
- Using the derivational pattern <إفعال> to form the word (إسلام)
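
The derivations listed above can be reproduced mechanically by treating the letters ف, ع and ل in a pattern as placeholders for the first, second and third root consonants. The function derive below is a hypothetical helper written for this illustration, and the patterns are given without short-vowel marks.

```python
def derive(root, pattern):
    """Fill a derivational pattern with the consonants of a triliteral root.

    The letters ف, ع and ل in the pattern stand for the first, second
    and third root consonant; every other character is copied unchanged.
    """
    first, second, third = root
    table = str.maketrans({"ف": first, "ع": second, "ل": third})
    return pattern.translate(table)


root = ("س", "ل", "م")
for pattern in ["فعل", "فاعل", "فعالة", "فعيل", "إفعال"]:
    print(pattern, "->", derive(root, pattern))
```

The substitution is done in a single pass, so the ل of the root is not mistaken for the ل placeholder. A limitation of this sketch is that a pattern containing a literal ف, ع or ل that is not a placeholder would be filled incorrectly.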

3- Inflected Forms:

Arabic is a highly inflected language. Whereas derivation forms new words from a root, inflected forms are generated from "Derived Forms", using a complex system of prefixes and suffixes for verbs and nouns. An Arabic word may have up to 10,000 inflected forms.

4- Semantic(s):

In linguistics, semantics is the subfield that is devoted to the study of meaning, as inherent at the levels of words, phrases, sentences, and even larger units of discourse (referred to as texts).

5- Orthography:

The orthography of a language specifies the correct way of using a specific writing system to write the language. The Arabic alphabet has 28 basic letters. There are no distinct upper and lower case letter forms.
Both printed and written Arabic are cursive, with most of the letters directly connected to the letter that immediately follows. Each individual letter can have up to four distinct forms, based on its position within the word. These forms are:

  • Initial: at the beginning of a word; or in the middle of a word, after a non-connecting letter.
  • Medial: between two connecting letters (non-connecting letters lack a medial form).
  • Final: at the end of a word following a connecting letter.
  • Isolated: at the end of a word following a non-connecting letter; or used independently.
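
The four rules above can be expressed as a short function. This is a simplified sketch: it assumes undiacritized input, it treats only the six basic non-connecting letters (ا د ذ ر ز و) as non-connecting, and the names NON_CONNECTING and letter_forms are chosen for this example.

```python
# Letters that never connect to the letter that follows them (simplified set).
NON_CONNECTING = set("ادذرزو")


def letter_forms(word):
    """Return the positional form of each letter in an undiacritized word."""
    forms = []
    for i, letter in enumerate(word):
        # Connected on the right if the preceding letter connects forward.
        joined_to_previous = i > 0 and word[i - 1] not in NON_CONNECTING
        # Connected on the left if a letter follows and this one connects.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTING
        if joined_to_previous and joins_next:
            forms.append("medial")
        elif joined_to_previous:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


print(letter_forms("كتاب"))  # ['initial', 'medial', 'final', 'isolated']
```

In كتاب the letter ا takes its final form because it does not connect forward, and the last letter ب is therefore isolated even though it ends the word.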

6- Long and Short vowels:

Long vowels are written, but short vowels usually are not (they can be marked with optional diacritics), so the reader must be familiar with the language to infer the missing vowels.
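
Because the same word may appear with or without short-vowel marks, search software commonly compares words after removing them. The following minimal sketch strips the Unicode marks in the range U+064B to U+0652 (the tanwin marks, fatha, damma, kasra, shadda and sukun); the function name strip_diacritics is an assumption for this example.

```python
import re

# Arabic short-vowel and related marks: U+064B (fathatan) to U+0652 (sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text):
    """Remove short-vowel marks so that vowelled and unvowelled spellings match."""
    return DIACRITICS.sub("", text)


vowelled = "\u0633\u064e\u0644\u0650\u0645"  # the word سلم with fatha and kasra
print(strip_diacritics(vowelled) == "\u0633\u0644\u0645")  # True
```

Long vowels are ordinary letters and are left untouched by this function.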

7- Spelling error:

Spelling errors take two forms:

  • Typographical error:

A mistake in printing, typesetting, or typing, especially one caused by striking an incorrect key on a keyboard.

  • Linguistic error:

A mistake in writing or typing that is caused by not knowing the correct form of the word.

All Rights Reserved. www.AlKhawarizmy.com