 



"Search Beyond the Words"!

 



The Arabic language possesses unique characteristics. Although there are about 22 million Arab internet users and about 100 million Arabic web pages, international companies have neglected the distinct nature of the Arabic language.

 



The Problem in Detail | The Solution that alKhawarizmy Introduces | Who Uses KSearch | KSearch Features | Product Editions

The Problem in Detail:

Suppose an Arabic-speaking user interested in international politics wishes to search for the phrase "اجتماع الأمم المتحدة" ("United Nations meeting"), using an internet search engine or the search engine of a particular news website.

 

A conventional search engine returns only results that contain the query words exactly as typed. Phrases such as "وقد اجتمع أعضاء الأمم المتحدة" ("Members of the United Nations met") will not appear in the results. KSearch, by contrast, uses morphology to find all the inflected forms of a word, and so retrieves such results.

 

Consider another case. Suppose a user wishes to search for the phrase "حادث الطائرة المصرية" ("The Egyptian airplane crash/accident"). Current search engines may produce results containing the phrase "تحادثوا عن عمل فني متميز" ("[They] conversed about a distinctive work of art"), which is not what the user wanted. The problem stems from the fact that the word "حادث" is ambiguous.


KSearch, on the other hand, offers the user a choice of meanings for the word "حادث": either (crash/accident) or (converse). Restricting the search to the chosen meaning reduces the number of irrelevant results.

 



The Solution that alKhawarizmy Introduces:

 

KSearch is a search engine capable of searching through large volumes of Arabic web pages and documents and quickly returning comprehensive and accurate results. It is built around the features of the Arabic language; without this support, a search engine is effectively useless for searching Arabic content.


 

Who Uses KSearch:

 

The need for such a search engine arises from several important facts. The amount of human knowledge stored electronically doubles every year, and textual information constitutes about 70% of it[1]. Arabic content consists of about 100 million web pages, in addition to hundreds of thousands of electronic documents archived in companies, organizations, government bodies, etc.

 

Newspapers are one example of large organizations holding a large amount of Arabic content: 76 Arabic newspapers (out of a total of 140)[2] have websites with a search engine. There are 22 million Arab users who enter Arabic search queries; this may be attributed to the fact that 65% of Arab internet users (in 2005) cannot read English[3].

 

alKhawarizmy also conducted research on a sample of 100 different Arabic websites[4]. It found that 48% of the sample use conventional search engines (with neither morphological nor meaning-based search). The remaining 52% have no search engine at all, even though 60% of the sample hold a large amount of Arabic content.

 

Therefore, there is a vast market for KSearch, consisting of the following:
- Arabic websites that already have a search engine and wish to upgrade to a smarter one.
- Arabic websites that do not have a search engine and wish to add one.
- Arabic companies and organizations that have intranets and wish to extract information from the Arabic documents on their networks quickly and efficiently, saving the valuable time otherwise spent going through redundant search results.



[1] According to the School of Information Management and Systems, UC Berkeley, U.S.A.
[2] According to Google Directory, http://www.google.com/dirhp
[3] According to Madar Company for Market Research, www.madarresearch.com (via IslamOnline.net)
[4] Research conducted on a random sample of 100 Arabic websites obtained from Google Directory

 


 

 

KSearch Features: 

1.   Arabic Morphological Search:

If a user were to search for the word "اجتماع" (meeting), the search results would contain various inflected forms, such as "اجتمع" ([he] met), "يجتمعون" ([they] meet), etc. Traditional engines that do not have morphological search might, at most, retrieve words that have "اجتماع" (a meeting) as part of the word, such as "اجتماعهم" ([their] meeting), "واجتماع" ([and] a meeting).
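
The difference between exact matching and morphological matching can be illustrated with a minimal Python sketch. It assumes a tiny hand-written table (`LEMMA_OF`) that maps surface forms to a shared label; the table and the function names are hypothetical stand-ins for a real morphological analyzer and do not describe how KSearch is actually implemented.

```python
# Hypothetical toy lexicon: surface form -> shared lemma label.
# A real system would use a morphological analyzer, not a hand-written table.
LEMMA_OF = {
    "اجتماع": "meet",
    "اجتمع": "meet",
    "يجتمعون": "meet",
}


def normalize(token):
    """Map a token to its lemma label, or to itself if it is not in the toy lexicon."""
    return LEMMA_OF.get(token, token)


def exact_search(query, documents):
    """Return indexes of documents that contain the query word exactly as typed."""
    return [i for i, doc in enumerate(documents) if query in doc.split()]


def morphological_search(query, documents):
    """Return indexes of documents that contain any form sharing the query's lemma."""
    target = normalize(query)
    return [
        i for i, doc in enumerate(documents)
        if any(normalize(tok) == target for tok in doc.split())
    ]


DOCS = [
    "اجتماع الأمم المتحدة",
    "وقد اجتمع أعضاء الأمم المتحدة",
    "عمل فني متميز",
]

print(exact_search("اجتماع", DOCS))          # [0]
print(morphological_search("اجتماع", DOCS))  # [0, 1]
```

In this sketch the second document is found only by the morphological search, because it contains an inflected form rather than the query word itself.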

 

2.   Differentiation between Word Meanings:

If a query word has more than one meaning, the user may choose the meaning to search for. The search results will then consist largely of the inflected forms of the word that belong to that meaning. This helps reduce the irrelevant results that arise from morphological search alone.
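
As a rough illustration of meaning selection, the sketch below assumes a hypothetical sense inventory (`SENSES`) that lists, for each ambiguous query word, the surface forms belonging to each meaning. The data structure, the function names and the two-entry inventory are illustrative assumptions only.

```python
# Hypothetical sense inventory: query word -> {sense label: surface forms}.
SENSES = {
    "حادث": {
        "crash/accident": {"حادث"},
        "converse": {"تحادثوا"},
    }
}


def list_senses(word):
    """Return the sense labels offered to the user for an ambiguous word."""
    return sorted(SENSES.get(word, {}))


def search_by_sense(word, sense, documents):
    """Return indexes of documents containing a form that belongs to the chosen sense."""
    forms = SENSES[word][sense]
    return [i for i, doc in enumerate(documents) if forms & set(doc.split())]


DOCS = [
    "حادث الطائرة المصرية",
    "تحادثوا عن عمل فني متميز",
]

print(list_senses("حادث"))                               # ['converse', 'crash/accident']
print(search_by_sense("حادث", "crash/accident", DOCS))  # [0]
```

Choosing the "crash/accident" sense excludes the second document, which a purely morphological match on the same word might have returned.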

 

3.   Search using Logical Operators:

In addition to the "All Words" and "Any Words" search types, the system includes "Logical Search" or "Boolean Search". Logical Search allows users to search for exact phrases or to combine words with the logical operators AND, OR and NOT. It also allows users to specify word adjacency (proximity) by giving the number of intervening words and whether the words must appear in the order entered.
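
One common way to support such queries is a positional inverted index, sketched below in Python. This is a generic textbook technique under simplifying assumptions (whitespace tokenization, no morphology), and the function names are hypothetical; the source does not state how KSearch implements logical search.

```python
from collections import defaultdict


def build_index(documents):
    """Build a positional index: term -> {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(documents):
        for pos, term in enumerate(text.split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def docs_with(index, term):
    """Return the set of document ids that contain the term."""
    return set(index.get(term, {}))


def near(index, first, second, max_between, ordered=True):
    """Return docs where the two terms have at most max_between words between them."""
    hits = set()
    for doc_id in docs_with(index, first) & docs_with(index, second):
        for p in index[first][doc_id]:
            for q in index[second][doc_id]:
                gap = q - p if ordered else abs(q - p)
                if 1 <= gap <= max_between + 1:
                    hits.add(doc_id)
    return hits


DOCS = [
    "اجتماع الأمم المتحدة",
    "وقد اجتمع أعضاء الأمم المتحدة",
    "حادث الطائرة المصرية",
]
index = build_index(DOCS)
all_docs = set(range(len(DOCS)))

print(docs_with(index, "الأمم") & docs_with(index, "المتحدة"))  # AND -> {0, 1}
print(docs_with(index, "اجتماع") | docs_with(index, "حادث"))    # OR  -> {0, 2}
print(all_docs - docs_with(index, "اجتماع"))                    # NOT -> {1, 2}
print(near(index, "اجتماع", "المتحدة", max_between=1))          # {0}
```

AND, OR and NOT map directly onto set intersection, union and difference, while the proximity check compares the stored word positions.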

 

4.   Search using Wildcards:

The user can search for proper nouns of non-Arabic origin using wildcards. The wildcards supported are ? (denoting a single Arabic character) and * (denoting any number of Arabic characters).
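
A minimal sketch of how the two wildcards could be translated into a regular expression is shown below. It assumes that "Arabic character" means the basic letter range U+0621 to U+064A and that * may match zero characters; both are assumptions, as are the function names and the two illustrative spellings used in the example.

```python
import re

# Assumption: an "Arabic character" is a letter in the range U+0621..U+064A.
ARABIC_LETTER = "[\u0621-\u064A]"


def wildcard_to_regex(pattern):
    """Translate ? (one Arabic letter) and * (any number of them) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(ARABIC_LETTER)
        elif ch == "*":
            parts.append(ARABIC_LETTER + "*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_match(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


# Two illustrative spellings of the same foreign proper noun.
print(wildcard_match("واشنط*ن", "واشنطن"))   # True
print(wildcard_match("واشنط*ن", "واشنطون"))  # True
print(wildcard_match("واشنط?ن", "واشنطن"))   # False
```

The * pattern accepts both spellings, whereas the ? pattern requires exactly one letter in that position.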

 

5.   Search words are highlighted in the results pages. This is particularly important for Arabic, as it saves the user from having to look through a page for every inflected form of a search word; without this feature, the search system would be useless.

 

6.   The following document formats are supported: HTML, TXT, RTF and PDF, as well as Unicode-encoded documents.

 

7.   The Arabic dictionary that supports KSearch is a comprehensive dictionary of contemporary Arabic that includes up-to-date words used in the various media.

 

8.   Fast Indexing Engine that uses 64-bit Technology:

KSearch includes a fast indexing engine for the various file formats supported (Intranet Edition), as well as for the databases supported (Web Edition).

 

The indexing rate reaches 20,000 words/sec. (1.2 million words/min.) on a desktop PC equipped with an AMD Athlon XP 2500+ processor, 1GB of memory and an IDE hard disk drive.

 

In addition, the indexing engine uses 64-bit technology, which places no practical limit on the size of the generated index; 32-bit technology limits the index size to 4GB.
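
The 4GB figure follows from simple address arithmetic. As a worked check, assuming the index is addressed by byte offsets stored in unsigned integers (an assumption; the source does not describe the index layout):

```latex
\[
2^{32}\ \text{bytes} = 4 \times 2^{30}\ \text{bytes} = 4\ \text{GB},
\qquad
2^{64}\ \text{bytes} = 2^{32} \times 4\ \text{GB} \approx 1.8 \times 10^{19}\ \text{bytes}.
\]
```

Under that assumption, a 64-bit offset can address an index far larger than any disk available to the desktop PC described above.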

 

9.   Comprehensive Index Management:

The indexing system includes comprehensive index management that allows a user to divide groups of documents or web pages into separate indexes, for flexible management.

The system also supports deleting, updating and merging indexes, as well as adding files or folders to an index and deleting them from it.
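
The operations listed above can be pictured with a small Python sketch over simple in-memory indexes. The structure (term mapped to a set of file paths), the function names and the example paths are all hypothetical and only illustrate the idea of separate indexes that can be merged and edited.

```python
def index_files(files):
    """Build an index from {path: text}: term -> set of paths containing it."""
    index = {}
    for path, text in files.items():
        for term in text.split():
            index.setdefault(term, set()).add(path)
    return index


def merge_indexes(a, b):
    """Return a new index that combines two separate indexes."""
    merged = {term: set(paths) for term, paths in a.items()}
    for term, paths in b.items():
        merged.setdefault(term, set()).update(paths)
    return merged


def remove_file(index, path):
    """Delete one file from an index, dropping terms that no longer occur."""
    for term in list(index):
        index[term].discard(path)
        if not index[term]:
            del index[term]


# Two separate indexes for two hypothetical document groups.
news = index_files({"news/a.txt": "اجتماع الأمم المتحدة"})
reports = index_files({"reports/b.txt": "حادث الطائرة المصرية"})

combined = merge_indexes(news, reports)
remove_file(combined, "news/a.txt")
print("اجتماع" in combined)  # False
print("حادث" in combined)    # True
```

Because `merge_indexes` copies the path sets, removing a file from the combined index leaves the two original indexes unchanged.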



Product Editions:

The following editions of KSearch will be supported in both versions 1.0 and 2.0:

 

1.   KSearch Web Edition:

In this edition, the website's database is indexed (using the KSearch indexing engine) and the indexes are stored on the site's server. Users can then search via a browser, which will use the KSearch search engine installed on the server.

 

2.   KSearch Intranet Edition:

In this edition, a company's documents are indexed on a centralized server connected to the company's intranet. Anyone on the company’s network may search these documents via a browser, through an internal web application that accesses the KSearch search engine.

 

 
     
© 2007 alKhawarizmy Language Software - All rights reserved.