Transliteration Principles
Transliteration refers to the process of writing the words and sentences
of one language using the letters and special symbols of another, so that
they can be read and pronounced from that second script.
Transliteration is thus meant to preserve the sounds of the syllables in
words. It is helpful in situations where one does not know the
script of a language but nevertheless knows how to speak and understand it.
For several decades now, Roman transliteration has been used to represent texts
of Indian languages, especially Sanskrit. In many printed books, a key to
the transliteration is printed at the beginning in the form of a table.
Since it is difficult to represent the aksharas of Sanskrit using just
the twenty-six letters of the Roman alphabet, scholars have used varying schemes
to accommodate sounds for which no single Roman letter is appropriate.
Here are some examples of transliteration as per the schemes which were in
general use in the past. The schemes are somewhat arbitrary in their choice of
Roman letters.
Sometimes phonetic symbols are used in place of the normal Roman letters. Phonetic
symbols are basically the letters of the Roman alphabet with special marks
known as diacritic marks. Here are some examples of transliteration using
symbols from the phonetic alphabet. In the second set of aksharas shown
below, one sees the use of special symbols from the ASCII character set in
place of diacritics.
Roman transliteration which makes use of diacritic marks works better for
Indian languages, and in the last few decades some standardization has been
effected based on the recommendations of the National Library in Calcutta.
Roman letter assignments in this scheme are phonetically equivalent to the
aksharas of Sanskrit or other Indian languages. As indicated earlier, the
phonetic alphabet with diacritic marks is very helpful for representing text
in Indian languages. Such letters are also easily typeset, for typefaces are
available specifically for this purpose. Typesetting was, however, done
manually for nearly a century until special word processing and typesetting
applications were developed for computers. These programs make use of high
quality fonts to produce good printouts and displays. However, most of them
rely on indirect data entry methods to generate the phonetic symbols.
The primary difficulty in data entry of the phonetic symbols is that there
is no provision to input the symbols directly using the standard ASCII keyboard.
Desktop publishing and word processing programs provide means by which the
glyph code of the symbol is input using the numeric keypad. While this
works, it is not a natural way to type. Transliteration methods
which use only the displayable ASCII symbols do not run into this problem
since the ASCII letters can be typed in directly. A special computer program
is, however, required to interpret the input string and produce the Indian
language display or printout. This is precisely what the currently popular
transliteration schemes attempt. Schemes such as ITRANS, RIT, and ADHAWIN use
only the standard displayable ASCII letters and symbols to transliterate
the text. These schemes allow multiple representations for certain syllables
and long vowels, but the processing program handles this well.
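As a rough illustration of how a processing program can accept more than one
spelling for the same sound, the sketch below converts a small subset of
ITRANS-style input into Roman letters with diacritics, using greedy longest
match. The table covers only a handful of letters, and the names
ITRANS_TO_IAST and itrans_to_iast are made up for this example; it is not the
code of any actual ITRANS implementation.

```python
# A small, illustrative subset of ITRANS-style spellings and the
# corresponding Roman letters with diacritics. Several keys map to the
# same output, which is how alternative spellings are tolerated.
ITRANS_TO_IAST = {
    "aa": "ā", "A": "ā",
    "ii": "ī", "I": "ī",
    "uu": "ū", "U": "ū",
    "R^i": "ṛ", "RRi": "ṛ",
    "sh": "ś", "Sh": "ṣ", "shh": "ṣ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ",
}

MAX_KEY_LEN = max(len(key) for key in ITRANS_TO_IAST)


def itrans_to_iast(text):
    """Convert ASCII input to diacritic form by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first so that "aa" wins over "a"
        # and "shh" wins over "sh".
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in ITRANS_TO_IAST:
                out.append(ITRANS_TO_IAST[chunk])
                i += len(chunk)
                break
        else:
            # No table entry: the letter stands for itself.
            out.append(text[i])
            i += 1
    return "".join(out)


print(itrans_to_iast("Arya"))      # ārya
print(itrans_to_iast("aarya"))     # ārya (alternative spelling, same result)
print(itrans_to_iast("kR^iShNa"))  # kṛṣṇa
```

Trying longer keys first is what lets two-letter and three-letter spellings
coexist with single capital letters without ambiguity.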
Lack of uniformity between different schemes.
While transliteration-based data input is very useful, one must
remember that the schemes themselves vary, even for a given language. The
consequence is that data entry procedures change depending
on the scheme and, worse still, a given transliterated string will produce
different outputs for different languages/scripts. Take for instance the word
'yoga'. The transliterated data input for this string using the "ITRANS"
scheme is "yogA". However, when you use this string to get an output in Tamil
using other schemes, you will get
[image: incorrect Tamil rendering]
as opposed to
[image: correct Tamil rendering]
which is the correct transliteration. The real issue here is that the short
forms of the vowels "o" and "e" are present only in the Southern languages.
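The effect of the short and long vowel distinction can be sketched as follows.
The two vowel tables are hypothetical conventions invented for this example
(they are not the actual tables of ITRANS or of any other named scheme): in
SCHEME_A the letter "o" always means the long vowel, while in SCHEME_B "o" is
the short vowel and "O" the long one. Only the letters needed for the ASCII
string "yoga" are covered.

```python
# Tamil code points used in this sketch.
YA = "\u0baf"            # TAMIL LETTER YA
KA = "\u0b95"            # TAMIL LETTER KA (Tamil writes "g" with this letter)
SIGN_SHORT_O = "\u0bca"  # TAMIL VOWEL SIGN O (short)
SIGN_LONG_O = "\u0bcb"   # TAMIL VOWEL SIGN OO (long)

CONSONANTS = {"y": YA, "g": KA}

# Hypothetical convention with no short "o": the letter "o" is long.
SCHEME_A = {"a": "", "o": SIGN_LONG_O}
# Hypothetical convention with both lengths: "o" is short, "O" is long.
SCHEME_B = {"a": "", "o": SIGN_SHORT_O, "O": SIGN_LONG_O}


def render(text, vowel_signs):
    """Render consonant+vowel input; "a" is the inherent vowel (no sign)."""
    out = []
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in vowel_signs:
            out.append(vowel_signs[ch])
        else:
            raise ValueError(f"letter {ch!r} is not covered by this sketch")
    return "".join(out)


long_form = render("yoga", SCHEME_A)   # long vowel sign after YA
short_form = render("yoga", SCHEME_B)  # short vowel sign after YA
print(long_form == short_form)                # False
print(render("yOga", SCHEME_B) == long_form)  # True
```

The same four ASCII letters therefore give two different Tamil strings, and
the typist has to know which convention the processing program follows.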
Transliteration schemes have to face the problem of letters present in one
language and not in another. Thus, unless a superset of letters from
all the Indian languages is formed, uniform transliteration is ruled
out. Even if such a superset were identified, unique Roman
letter combinations are not easily found for complex aksharas.
Moreover, the large number of vowels in Indian scripts adds to the complexity
of transliteration.
String Processing using transliterated text.
One useful feature of the transliterated representation of Indian language
strings is that conventional string processing programs may be used to process
the text. However, applications such as sorting will produce erroneous results,
as the sorting orders of the aksharas and of the Roman letters are quite
different. Many string processing applications, such as processing a sentence,
may nevertheless work properly, so long as the input strings do not contain
special characters which are needed for transliteration but which also happen
to be delimiters fixed for the parsing routines.
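The sorting problem can be seen with four short strings. The sketch below
compares plain ASCII sorting with a sort key built from the traditional letter
order (vowels, then consonants from "k" to "h"). It assumes an ITRANS-like
spelling in which capital "T" and "D" stand for the retroflex consonants; the
list AKSHARA_ORDER is a simplified subset written for this example, and
conjuncts, anusvara and visarga are ignored.

```python
# Simplified traditional letter order in an ITRANS-like spelling.
AKSHARA_ORDER = [
    "a", "A", "i", "I", "u", "U", "e", "ai", "o", "au",
    "k", "kh", "g", "gh", "c", "ch", "j", "jh",
    "T", "Th", "D", "Dh", "N",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v", "sh", "Sh", "s", "h",
]
RANK = {token: index for index, token in enumerate(AKSHARA_ORDER)}


def tokens(word):
    """Split a word into letters of the scheme, preferring two-letter units."""
    result = []
    i = 0
    while i < len(word):
        two = word[i:i + 2]
        if len(two) == 2 and two in RANK:
            result.append(two)
            i += 2
        elif word[i] in RANK:
            result.append(word[i])
            i += 1
        else:
            raise ValueError(f"letter {word[i]!r} is not covered by this sketch")
    return result


def akshara_key(word):
    """Sort key: the rank of each letter in the traditional order."""
    return [RANK[token] for token in tokens(word)]


words = ["ta", "Ta", "da", "Da"]
print(sorted(words))                   # ['Da', 'Ta', 'da', 'ta']
print(sorted(words, key=akshara_key))  # ['Ta', 'Da', 'ta', 'da']
```

Plain sorting places all capital letters before all small letters, which has
nothing to do with the order in which the aksharas are arranged in the script.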
With transliterated input, the representation for syllables is multibyte,
with a varying number of bytes for different syllables. For example, if we were
to examine the aksharas in the second row of the letters seen in the image
above, we will see that the last two words contain two aksharas each
(samyuktaksharas are treated as aksharas since they constitute one syllable).
However, the word "Arya" has four ASCII letters while the word "dhR^shTvA" has
nine. So, linguistically speaking, transliteration using Roman letters may
not be the best choice for text processing at the level of a syllable.
It would be helpful to have a representation which uses a fixed number of
bytes for each syllable. Such a representation would be ideally suited for
studying the metrical structure of poems or slokas.
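The varying length per syllable can be checked with a few lines of code. The
sketch below splits a transliterated word into aksharas, taking an akshara to
be any run of consonant letters followed by one vowel, and then counts the
ASCII letters in each. The list of vowel spellings is an assumption made for
this example and covers only what the two words quoted above require;
consonants left over at the end of a word are ignored.

```python
import re

# Vowel spellings assumed for this sketch; longer spellings come first.
VOWEL = r"R\^i|R\^|aa|ai|au|ii|uu|[aAiIuUeo]"

# One akshara: any letters that do not begin a vowel, then one vowel.
AKSHARA_RE = re.compile(rf"(?:(?!{VOWEL}).)*(?:{VOWEL})")


def split_aksharas(word):
    """Return the aksharas of a transliterated word as ASCII substrings."""
    return AKSHARA_RE.findall(word)


for word in ["Arya", "dhR^shTvA"]:
    aksharas = split_aksharas(word)
    print(word, aksharas, [len(a) for a in aksharas])
# Arya ['A', 'rya'] [1, 3]
# dhR^shTvA ['dhR^', 'shTvA'] [4, 5]
```

Both words have two aksharas, yet the number of ASCII letters per akshara
ranges from one to five, so a syllable cannot be located by position alone.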
Transliteration features in the IIT Madras Software.
The Multilingual software from IIT Madras incorporates features
to help deal with transliterated text. The multilingual editor has a data
entry method that allows transliterated text to be typed in directly and the
text viewed in local scripts. A .llf file is also automatically created by
the editor. Those familiar with ITRANS-based input will find this feature
helpful. We also have some utilities for viewing and converting transliterated
text. Additional information about the utilities is available.