During the
past several years, different methods have been introduced to prepare Indian
language documents by entering the text through specific transliteration
schemes. Data entry through transliteration is quite close to phonetic
mapping of Indian language characters to the letters of the Roman alphabet.
Notable among these methods are the following.
The ITRANS package, developed by Avinash Chopde, prints documents through
LaTeX. ITRANS now supports several Indian languages, but the transliteration
scheme is not uniform across them. Nevertheless, it is a highly recommended
package for printing documents. A substantial number of Sanskrit and Hindi
documents have been prepared using ITRANS.
The RIT package, developed by RamaRao Kanneganti and Ananda Kishore, prepares
Telugu text. RIT is similar to ITRANS but offers greater flexibility during
data entry. RIT also relies on LaTeX to produce output. A large number of
Telugu documents (poetry and texts) are available in RIT format.
The ADAMI and ADHAWIN packages of Dr. Srinivasan are exclusively for Tamil
and are based on the principle of using TrueType fonts to view documents
under Windows or the Mac. They constituted one of the earliest approaches
(1995) to dealing with Tamil text on a PC. Data entry in Tamil uses a special
transliteration scheme different from that of ITRANS. The software uses a
specific internal code for the Tamil characters (an 8-bit representation),
which permits direct viewing on the screen using TrueType fonts specially
designed for this purpose. The Adhawin package is quite popular among the
Tamil community on the net. A good number of Tamil documents have already
been prepared using Adhawin. As of 2000, these schemes have yielded to other
methods for dealing with Tamil text on a system.
The MYLAI fonts software, developed by Dr. Kalyana Sundaram of the Swiss
Federal Institute of Technology, is again for use with Tamil. Data entry for
this scheme is based on transliterated Tamil using Roman letters, but the
transliteration scheme is different from that of Adhawin. Mylai fonts are
supported under Windows, Mac and Unix, and an account of the transliteration
scheme used is available. Both MYLAI and Adhawin permit Tamil text to be
included within web pages.
A comparison of the above schemes is provided on a separate page.
Other Schemes
A review of the archives of Indian language documents on the net reveals
several other transliteration schemes and fonts. The Indology site in England
has electronic texts of Sanskrit documents prepared in CSX format, a special
input method recommended in 1990 for Sanskrit data entry using a DOS feature
called code page switching. ITRANS, which is more recent, offers facilities
to convert from CSX to the ITRANS format.
The Tamil archives of the Institute of Indology and Tamil Studies (IITS) in
Germany hold texts of Tamil Sangam literature and many Sanskrit documents.
These archives are based on the transliteration scheme recommended by the
University of Madras, a fairly well known and accepted standard. This scheme
is somewhat different from the other transliteration schemes used for Tamil.
The Mahabharata and Ramayana texts have been prepared by Prof. Muneo Tokunaga
of Kyoto University in Japan. The massive effort that has gone into preparing
the archives deserves special mention, as do the dedication and patience with
which the project was undertaken and completed.
A list
of resources for documents in Indian languages is provided by the linked
site in Helsinki, Finland.
Apart from these,
several schemes (and software) have been introduced to permit display of
Indian language text on the web. Many of these schemes are contributions
from groups interested in web sites for different magazines.
Transliteration schemes and Fonts
Since transliteration schemes have become popular, it may be useful to
discuss some technical issues relating to the methods used to prepare HTML
documents with embedded Indian language text. The following paragraphs
discuss how fonts, character encodings and transliteration relate to one
another.
Ultimately, the display of characters on a screen or printer requires what is
known as a rendering program, which generates the shape of each character
from the encoding used to represent it. Typically this is accomplished
through FONTS, and web browsers have excellent capabilities to handle
different types of fonts. HTML documents may contain text in specific
languages by specifying the font to be used while displaying the text. This
was by and large the method used to include text in other languages within an
HTML document. It must be noted that the browser viewing the document should
be able to load the required FONT locally. Only then is the text correctly
displayed. In the absence of the required font, the browser will use some
default font and the text will not be intelligible at all.
It has been argued (with enough justification) that displaying multilingual
text using the above approach is not the right way to deal with multiple
scripts in a single document. Applications should be aware of the character
set used to represent the text in a document. The above approach takes care
of only the display, so linguistic processing is severely hampered. For
Indian languages, the font-specific method has nevertheless remained in use,
in spite of the variations observed between different fonts even for a given
language. The situation has changed somewhat since Unicode support was
introduced, and today Indian language text can be handled through Unicode,
though some serious linguistic issues remain to be answered.
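The point about applications knowing the character set can be illustrated with a small sketch in Python. It assumes nothing beyond the standard `unicodedata` module; the helper names `describe` and `script_of` are made up for this illustration, and the byte value 0xE5 is an arbitrary stand-in for a glyph position in some font-specific 8-bit encoding. In Unicode text each character identifies its own script, whereas a font-encoded byte carries no such information once the font is absent.

```python
import unicodedata

def describe(text):
    """Return the Unicode character name of every character in text."""
    return [unicodedata.name(ch) for ch in text]

def script_of(ch):
    """Guess the script from the first word of the character name (a rough heuristic)."""
    return unicodedata.name(ch).split()[0]

# Unicode text: the script is recoverable from the character itself.
print(describe("\u0b95"))   # ['TAMIL LETTER KA']
print(script_of("\u0915"))  # DEVANAGARI

# A font-specific 8-bit code (0xE5 is an arbitrary, hypothetical glyph slot):
# without the font, software can only treat it as a Latin-1 character.
font_byte = bytes([0xE5])
print(describe(font_byte.decode("latin-1")))  # ['LATIN SMALL LETTER A WITH RING ABOVE']
```

Taking the first word of the character name is only a convenient approximation of the script, used here to keep the sketch short.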
Transliteration schemes employed while entering Indian language texts may
have no connection with FONTS at all. That is, the transliteration mechanism
is only a means of identifying which letters (vowels, conjuncts, consonants
etc.) are present in the text that should be displayed in the specified
Indian language. The transliterated text is then converted into the required
encoding of the characters to be shown, and this encoding is specific to the
FONT chosen to display the text.
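The two stages described here, identifying the letters and then converting them to a font-specific encoding, might be sketched as follows. Everything in the tables is hypothetical: the Roman tokens do not reproduce ITRANS, RIT, Adhawin or Mylai, and the byte values are invented glyph positions. Real schemes must also deal with conjuncts and vowel signs, which this sketch ignores.

```python
# Stage 1 table (hypothetical): Roman tokens -> abstract letter names.
TOKENS = {
    "a": "VOWEL_A",
    "aa": "VOWEL_AA",
    "k": "CONS_KA",
    "kh": "CONS_KHA",
    "m": "CONS_MA",
}

# Stage 2 table (hypothetical): letter names -> 8-bit glyph positions in one font.
FONT_A = {
    "VOWEL_A": 0xC1,
    "VOWEL_AA": 0xC2,
    "CONS_KA": 0xD0,
    "CONS_KHA": 0xD1,
    "CONS_MA": 0xD9,
}

def tokenize(text):
    """Split transliterated text into letter names using greedy longest match."""
    longest = max(len(token) for token in TOKENS)
    letters = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "kh" wins over "k".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in TOKENS:
                letters.append(TOKENS[chunk])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return letters

def encode(letters, font_table):
    """Convert letter names into the byte codes expected by one font."""
    return bytes(font_table[name] for name in letters)

letters = tokenize("khaam")
print(letters)                        # ['CONS_KHA', 'VOWEL_AA', 'CONS_MA']
print(encode(letters, FONT_A).hex())  # d1c2d9
```

Keeping the letter names as an intermediate form means the first stage is independent of any font; only the second table has to change when a different font is chosen.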
Packages like ITRANS rely on METAFONT, which is basically a program that
generates the display from descriptions of the shape of each character.
Packages like Adhawin use FONTS which were designed for handling Tamil. In
these, the transliteration directly relates to the placement of the glyph
(the shape of the Tamil letter) within the ADHAWIN font, which must be used
by the package. Data entry thus becomes font specific, and the text prepared
using the package will not be rendered correctly if some other font is used
for the display.
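Why font-specific data breaks under another font can be seen in a minimal sketch. The two tables below are invented for illustration and do not describe the actual layout of the Adhawin, Mylai or any other font; they only assume that two 8-bit fonts place the same letters at different positions.

```python
# Two hypothetical 8-bit fonts that place the same letters at different positions.
FONT_X = {"KA": 0xA1, "MA": 0xA2, "LA": 0xA3}
FONT_Y = {"KA": 0xB5, "MA": 0xA1, "LA": 0xA2}

def encode(letters, font):
    """Store letters as the glyph positions of one particular font."""
    return bytes(font[name] for name in letters)

def render(data, font):
    """Look up each stored byte in a font; '?' marks a position with no glyph."""
    reverse = {code: name for name, code in font.items()}
    return [reverse.get(byte, "?") for byte in data]

data = encode(["KA", "MA", "LA"], FONT_X)
print(render(data, FONT_X))  # ['KA', 'MA', 'LA']
print(render(data, FONT_Y))  # ['MA', 'LA', '?']
```

The stored bytes are identical in both calls; only the font used to interpret them differs, which is enough to turn the text into different letters or into nothing at all.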
Preparing web pages (HTML documents) which include Indian language text
requires the use of word processors which support FONTS. Even with this
support, additional factors must be taken into account while entering text.
Due to the fairly complex nature of the Indian scripts, data entry is quite
cumbersome with most word processors. A point to remember here is that word
processors are not yet universal enough to run on all platforms. At least in
the context of Indian language text entry, a standardized editor is very much
required.