The Problem in Detail:
Suppose an Arabic-speaking user who follows international politics
searches for the phrase "اجتماع الأمم المتحدة" ("United Nations meeting"),
either on an internet search engine or on the search engine of a
particular news website.
A conventional engine returns only text that contains the query
words exactly as typed. Phrases such as "وقد اجتمع أعضاء الأمم المتحدة"
("Members of the United Nations met") will not appear in the results.
KSearch uses morphology to find all the inflected forms of a word,
so it does return such phrases.
Consider a second case. Suppose a user searches for the phrase
"حادث الطائرة المصرية" ("The Egyptian airplane crash/accident").
Current search engines may return results containing the phrase
"تحادثوا عن عمل فني متميز" ("[They] conversed about a distinctive work of art"),
which is not what the user wanted. The problem is that the word
"حادث" is ambiguous.
KSearch, on the other hand, offers the user a choice of meanings
for the word "حادث": either (crash/accident) or (converse).
Restricting the search to the chosen meaning reduces the number of
irrelevant results.
The Solution that alKhawarizmy Introduces:
KSearch is a search engine that searches large volumes of Arabic
web pages and documents and quickly returns comprehensive,
accurate results. It is built around the features of the Arabic
language; a search engine without this support is effectively
useless for Arabic content.
Who Uses KSearch:
The need for such a search engine arises from several facts. The
amount of human knowledge stored electronically doubles every
year, and textual information makes up about 70% of it.
Arabic content comprises about 100 million web pages, in
addition to hundreds of thousands of electronic documents
archived in companies, organizations, government bodies,
etc.
As an example of large organizations that hold a large amount of
Arabic content, 76 Arabic newspapers (out of a total of 140) have
a website with a search engine. There are 22 million users who
enter search queries in Arabic; this may be attributed to the fact
that 65% of Arab internet users (as of 2005) cannot read English.
alKhawarizmy also surveyed a sample of 100 different Arabic websites.
It found that 48% of the sample use conventional search
engines (without morphological or meaning-based search). The
remaining 52% have no search engine at all, even though 60% of
the sample hold a large amount of Arabic content.
There is therefore a large market for KSearch, consisting of
the following:
- Arabic websites that already have a search engine and wish
to upgrade to acquire a smarter search engine.
- Arabic websites that do not have a search engine and wish
to add one.
- Arabic companies and organizations that have intranets and
wish to be able to extract information from Arabic documents
residing on their networks, quickly and efficiently, thereby
saving valuable time going through redundant search results.
KSearch Features:
1. Arabic Morphological Search:
If a user searches for the word "اجتماع" (meeting), the search
results contain its various inflected forms, such as "اجتمع"
([he] met), "يجتمعون" ([they] meet), etc. Traditional engines
without morphological search retrieve, at most, words that
contain "اجتماع" (a meeting) as a substring, such as
"اجتماعهم" ([their] meeting) and "واجتماع" ([and] a meeting).
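The idea of matching inflected forms can be sketched as follows. This is only an illustration, not KSearch's implementation: the `LEMMA_OF` table is a hypothetical, hand-written stand-in for a real morphological analyzer, and the function names are assumptions made for this example.

```python
from collections import defaultdict

QUERY = "اجتماع"

# Hypothetical hand-written table: surface form -> shared lemma key.
# A real system would derive this with a morphological analyzer.
LEMMA_OF = {
    QUERY: QUERY,
    "اجتمع": QUERY,
    "يجتمعون": QUERY,
    "اجتماعهم": QUERY,
    "واجتماع": QUERY,
}


def build_index(docs):
    """Map each lemma key to the ids of documents containing any of its forms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[LEMMA_OF.get(token, token)].add(doc_id)
    return index


def search(index, word):
    """Look the query word up by its lemma key rather than its literal form."""
    return sorted(index.get(LEMMA_OF.get(word, word), set()))


docs = {
    1: "اجتمع أعضاء الأمم المتحدة",
    2: "يجتمعون غدا",
    3: "الطائرة المصرية",
}
index = build_index(docs)
print(search(index, QUERY))  # [1, 2]
```

Neither document 1 nor document 2 contains the query word as typed; both are found because their tokens share the query's lemma key.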
2. Differentiation between Word Meanings:
If an input query word has more than one meaning, the user may
choose the meaning to search for. The search results will then
largely contain the inflected forms of the word that belong
to that meaning. This helps reduce the irrelevant results that
morphological search alone would produce.
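A minimal sketch of meaning selection, assuming each meaning of an ambiguous word carries its own set of inflected forms. The `SENSE_FORMS` table and its entries are hypothetical and purely illustrative; they are not taken from the KSearch dictionary.

```python
AMBIGUOUS = "حادث"
PLURAL_ACCIDENTS = "حوادث"
THEY_CONVERSED = "تحادثوا"

# Hypothetical sense table: meaning label -> inflected forms of that meaning.
SENSE_FORMS = {
    AMBIGUOUS: {
        "accident": {AMBIGUOUS, PLURAL_ACCIDENTS},
        "converse": {AMBIGUOUS, THEY_CONVERSED},
    }
}


def senses_of(word):
    """Return the meanings the user can choose from (empty if unambiguous)."""
    return sorted(SENSE_FORMS.get(word, {}))


def search_by_sense(docs, word, sense):
    """Return ids of documents containing a form that belongs to the chosen meaning."""
    forms = SENSE_FORMS[word][sense]
    return sorted(d for d, tokens in docs.items() if forms & set(tokens))


docs = {
    1: [AMBIGUOUS, "الطائرة", "المصرية"],
    2: [THEY_CONVERSED, "عن", "عمل", "فني", "متميز"],
    3: [PLURAL_ACCIDENTS, "الطرق"],
}
print(senses_of(AMBIGUOUS))                          # ['accident', 'converse']
print(search_by_sense(docs, AMBIGUOUS, "accident"))  # [1, 3]
print(search_by_sense(docs, AMBIGUOUS, "converse"))  # [1, 2]
```

Document 1 appears under both meanings because the bare form is shared by both; this is one reason the results contain the chosen meaning only "largely" rather than exclusively.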
3. Search using Logical Operators:
In addition to the "All Words" and "Any Words" search types, the
system includes "Logical Search", also called "Boolean Search".
Logical Search lets users search for exact phrases or combine
terms with the logical operators AND, OR and NOT. It also lets
users specify word adjacency (proximity): the number of
intervening words allowed, and whether the words must appear in
the order the user entered them.
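The operators described above map naturally onto set operations and position comparisons. The sketch below assumes documents are already split into tokens; `docs_with` and `near` are hypothetical helper names, and nothing here reflects KSearch's actual query syntax.

```python
MET = "اجتمع"
UN = "الأمم"
CRASH = "حادث"

docs = {
    1: [MET, "أعضاء", UN, "المتحدة"],
    2: [CRASH, "الطائرة", "المصرية"],
    3: [UN, "المتحدة", "تبحث", CRASH, "الطائرة"],
}


def docs_with(docs, word):
    """Return the set of document ids whose tokens include the word."""
    return {doc_id for doc_id, tokens in docs.items() if word in tokens}


def near(tokens, first, second, max_between, ordered=True):
    """True if the two words occur with at most max_between words between them.

    With ordered=True, `first` must come before `second`.
    """
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    for i in first_pos:
        for j in second_pos:
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= max_between + 1:
                return True
    return False


with_un = docs_with(docs, UN)
with_crash = docs_with(docs, CRASH)
print(sorted(with_un & with_crash))  # AND -> [3]
print(sorted(with_un | with_crash))  # OR  -> [1, 2, 3]
print(sorted(with_un - with_crash))  # NOT -> [1]
print(near(docs[1], MET, UN, max_between=1))  # True: one word in between
```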
4. Search using Wildcards:
The user can search for proper nouns of non-Arabic origin using
wildcards. The supported wildcards are ? (denoting a single
Arabic character) and * (denoting any number of Arabic
characters).
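One possible way to interpret these two wildcards is to translate the pattern into a regular expression. The sketch makes two assumptions not stated in the text: that "Arabic character" means the basic letters U+0621 to U+064A, and that * may match zero characters. The two spellings used as test words are illustrative transliterations of a foreign place name.

```python
import re

# Assumption: an "Arabic character" is a basic Arabic letter, U+0621..U+064A.
ARABIC_LETTER = "[\u0621-\u064A]"

SHORT = "واشنطن"
LONG = "واشنطون"


def wildcard_to_regex(pattern):
    """Translate ? and * into regex pieces; all other characters match literally."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(ARABIC_LETTER)        # exactly one Arabic letter
        elif ch == "*":
            parts.append(ARABIC_LETTER + "*")  # zero or more Arabic letters (assumed)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, word):
    """True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


STAR = "واشن" + "*"            # prefix followed by any number of letters
SINGLE = "واشنط" + "?" + "ن"   # exactly one letter before the final letter

print(matches(STAR, SHORT), matches(STAR, LONG))      # True True
print(matches(SINGLE, LONG), matches(SINGLE, SHORT))  # True False
```

The patterns are built by concatenation only to keep the character order unambiguous when right-to-left and left-to-right characters are mixed on one line.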
5. Search words are highlighted in the results pages. This is
particularly important for Arabic, as it saves the user from
having to look through a page for each inflected form of a
search word; without this feature the search system would be
useless.
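Highlighting inflected forms, rather than only the literal query word, could look like the sketch below. The bracket markers and the set of matched forms passed in are assumptions for illustration; a web results page would more likely wrap matches in markup.

```python
MEETING = "اجتماع"
MET = "اجتمع"


def highlight(text, forms, open_mark="[", close_mark="]"):
    """Wrap every token that is one of the matched forms in marker strings."""
    return " ".join(
        (open_mark + token + close_mark) if token in forms else token
        for token in text.split()
    )


# The forms set stands in for whatever the morphological search matched.
forms = {MEETING, MET}
line = " ".join(["وقد", MET, "أعضاء", "الأمم", "المتحدة"])
print(highlight(line, forms))
```

Here the form meaning "[he] met" is marked even though the user typed the word for "meeting".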
6. The following document formats are supported: HTML, TXT, RTF
and PDF, as well as Unicode-encoded documents.
7. The Arabic dictionary that supports KSearch is a comprehensive
dictionary of contemporary Arabic that includes up-to-date
words used in the various media.
8. Fast Indexing Engine that uses 64-bit Technology:
KSearch includes a fast indexing engine for the supported file
formats (Intranet Edition) and for the supported databases
(Web Edition).
The indexing rate reaches 20,000 words/sec., or 1.2 million
words/min., on a desktop PC equipped with an AMD Athlon XP 2500+
processor, 1 GB of memory and an IDE hard disk drive.
In addition, the indexing engine uses 64-bit technology, which
does not limit the size of the generated index; 32-bit
technology limits the index size to 4 GB.
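The figures above can be checked with a little arithmetic. The sketch assumes the 4 GB ceiling comes from addressing the index with 32-bit byte offsets; the text does not state the cause, so this is an interpretation. The function names are hypothetical.

```python
WORDS_PER_SECOND = 20_000  # peak indexing rate quoted in the text


def words_per_minute(rate_per_second=WORDS_PER_SECOND):
    """Convert a per-second indexing rate to a per-minute rate."""
    return rate_per_second * 60


def seconds_to_index(word_count, rate_per_second=WORDS_PER_SECOND):
    """Estimate indexing time at the peak rate (real throughput will vary)."""
    return word_count / rate_per_second


def max_addressable_bytes(offset_bits):
    """Largest file size reachable with byte offsets of the given width (assumed cause)."""
    return 2 ** offset_bits


GIB = 2 ** 30
print(words_per_minute())                # 1200000
print(max_addressable_bytes(32) // GIB)  # 4
print(max_addressable_bytes(64) // GIB)  # 17179869184
```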
9. Comprehensive Index Management:
The indexing system includes comprehensive index management,
which lets a user divide groups of documents or web pages into
separate indexes for flexible management.
The system can also delete, update and merge indexes, and add
files or folders to an index or delete them from it.
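Merging indexes and removing a file from an index can be illustrated on a toy in-memory structure. This sketch assumes an index is a mapping from term to the set of file paths containing it; KSearch's on-disk format is not described in the text, and the paths shown are made up.

```python
def merge_indexes(a, b):
    """Return a new index holding the postings of both inputs."""
    merged = {term: set(paths) for term, paths in a.items()}
    for term, paths in b.items():
        merged.setdefault(term, set()).update(paths)
    return merged


def remove_file(index, path):
    """Return a new index without the given file; drop terms left with no files."""
    result = {}
    for term, paths in index.items():
        remaining = paths - {path}
        if remaining:
            result[term] = remaining
    return result


news = {"meeting": {"news/a.html"}, "crash": {"news/b.html"}}
reports = {"meeting": {"reports/q1.pdf"}, "budget": {"reports/q1.pdf"}}

combined = merge_indexes(news, reports)
print(sorted(combined["meeting"]))  # ['news/a.html', 'reports/q1.pdf']

trimmed = remove_file(combined, "reports/q1.pdf")
print(sorted(trimmed))  # ['crash', 'meeting']
```

Both functions return new dictionaries and leave their inputs untouched, which keeps separate indexes independent after a merge.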
Product Editions:
The following editions of
KSearch will be supported in both versions 1.0 and 2.0:
1. KSearch Web Edition:
In this edition, the website's
database is indexed (using the KSearch indexing engine) and
the indexes are stored on the site's server. Users can then
search via a browser, which will use the KSearch search
engine installed on the server.
2. KSearch Intranet Edition:
In this edition, a company's
documents are indexed on a centralized server connected to
the company's intranet. Anyone on the company’s network may
search these documents via a browser, through an internal
web application that accesses the KSearch search engine.