Unicode for Tamil
Among the Indian languages,
Tamil employs a comparatively simple script with an equally simple
orthography. Surprisingly, this simplicity was already evident two thousand
years ago. The Brahmi script for Tamil employed a limited number of shapes,
restricted to just the twelve vowels and eighteen consonants. Conjunct
aksharas involving two or more consonants were simply composed from the
generic (vowelless) forms of the consonants, so ligatures and conjunct forms
were avoided altogether. Pure Tamil could therefore be written with
18 x 12 = 216 consonant-vowel shapes together with 18 generic consonant
shapes and 12 vowel shapes. The special letter known as the "ayda" letter
should also be included in the set, which brings the number to 247 shapes.
This number can be reduced further if typesetting is done using overlapped
shapes.
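The count of 247 can be checked with a few lines of arithmetic. The
following is only a minimal Python sketch; the variable names are
illustrative and are not taken from any standard.

```python
# Shape inventory for pure Tamil, as counted in the text.
VOWELS = 12       # vowel shapes
CONSONANTS = 18   # generic (vowelless) consonant shapes
AYDA = 1          # the special "ayda" letter

# Each consonant combines with each vowel to give one syllable shape.
consonant_vowel = CONSONANTS * VOWELS

total = consonant_vowel + CONSONANTS + VOWELS + AYDA
print(consonant_vowel, total)  # 216 247
```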
Today, Tamil orthography
continues to retain its simplicity, though the shapes have changed from
the days of Brahmi. It might seem that 247 code values would adequately
represent the syllables of Tamil. This is indeed so, but in practice one
must concede the presence of consonants from Sanskrit which have come into
fairly regular use. There are six Sanskrit consonants which should be
included, along with their combinations with the twelve vowels and their
generic forms. This adds 6 x 12 + 6 = 78 more shapes to the set. The full
set thus has 325 shapes, which exceeds the 256 values available in 8 bits,
so more than 8 bits per code are required.
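The step from the shape count to the code width can be worked through as
follows. This is a rough Python sketch of the arithmetic only, assuming one
fixed-width code per shape; the names are illustrative.

```python
PURE_TAMIL = 247          # shapes counted for pure Tamil
SANSKRIT_CONSONANTS = 6   # additional consonants in regular use
VOWELS = 12

# Each added consonant needs 12 consonant-vowel shapes plus 1 generic form.
extra = SANSKRIT_CONSONANTS * VOWELS + SANSKRIT_CONSONANTS
full_set = PURE_TAMIL + extra

# Smallest number of bits that can give every shape a distinct code.
bits_needed = (full_set - 1).bit_length()
print(extra, full_set, bits_needed)  # 78 325 9
```

An 8-bit code offers only 256 values, so at least 9 bits would be needed
under this assumption.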
Unicode for Tamil
initially specified codes only for the basic vowels and consonants together
with the medial vowel forms. The "pulli" that turns a consonant into
its generic (linguistically defined) form is also viewed as a medial vowel.
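To make this concrete, the sketch below builds a generic consonant and a
consonant-vowel syllable from individual code points and prints their
names. It is an illustration only: the choice of the letter KA and the
vowel sign I is arbitrary, and describe() is a hypothetical helper. Note
that the Unicode character database names the pulli TAMIL SIGN VIRAMA.

```python
import unicodedata

KA = "\u0B95"      # TAMIL LETTER KA, carries the implied vowel "a"
PULLI = "\u0BCD"   # TAMIL SIGN VIRAMA, i.e. the pulli
SIGN_I = "\u0BBF"  # TAMIL VOWEL SIGN I, a medial vowel form


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


generic_k = KA + PULLI     # consonant with its implied vowel removed
syllable_ki = KA + SIGN_I  # consonant followed by a medial vowel

for label, text in (("generic k", generic_k), ("ki", syllable_ki)):
    print(label, text, describe(text))
```

In both cases the text is two code points long: the pulli is attached to
the consonant in the same way as a medial vowel sign.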
The initial assignment
of Unicode for Tamil included only nine numerals, omitting a code for
zero, since it was presumed that symbols were available for ten, hundred
and so on. This omission was the subject of much debate, and a recent
version of Unicode has added the zero. The Sanskrit consonant "sha"
(the one seen in Shree), which was not included in the earlier version,
has also now been included.
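The numerals and the added letter can be inspected with Python's standard
unicodedata module. This is a small sketch; the code points listed are my
reading of the Tamil block (U+0BE6 for zero, U+0BE7 to U+0BEF for one to
nine, U+0BF0 to U+0BF2 for ten, hundred and thousand, and U+0BB6 for the
letter seen in Shree).

```python
import unicodedata

ZERO = "\u0BE6"  # TAMIL DIGIT ZERO
SHA = "\u0BB6"   # TAMIL LETTER SHA, the consonant seen in Shree
NINE_DIGITS = [chr(cp) for cp in range(0x0BE7, 0x0BF0)]  # one to nine
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"

print(unicodedata.name(ZERO), unicodedata.digit(ZERO))
print([unicodedata.digit(d) for d in NINE_DIGITS])
print([unicodedata.numeric(n) for n in (TEN, HUNDRED, THOUSAND)])
print(unicodedata.name(SHA))
```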
Differing opinions among the experts in respect of Tamil Unicode.
During the past few
years, the assignment of Unicode for Tamil has been much debated in the
computing circles of Tamilnadu. Regrettably, the discussions have not
resulted in any form of consensus. The differing opinions have to do with
the interpretation of the linguistic definition of a consonant. The basic
issue is whether codes must be assigned to a generic consonant (without any
vowel) or whether it is all right to view a consonant as one with the
implied "ah". Apparently, the experts have woken up to the fact that the
coding method should conform to linguistic requirements. The absence of
consensus has created more confusion, and it is unlikely that any of the
ideas discussed will find a place in the coding scheme.
Given below are some
links to pages discussing the issue of Unicode for Tamil. Some of them
are proposals for effecting changes to current Unicode assignments. Some
of the URLs are difficult to reproduce since they contain spaces or other
wide characters. It should be possible to access the documents with a bit
of effort though.
http://www.infitt.org/ti2000/tamilinaiyam/papers/D1vkuniok.pdf
(Tamil Encoding in Unicode
- A Comparative Study)
http://www.angelfire.com/empire/thamizh/2/aanGilam/index1.html#flaws
(Problems as perceived by
an expert)
http://www.tscii.org/Fonts%20and%20Utilties/Documents/tscii_spec.htm
(Though not directly related
to Unicode, a standard that is popular)
http://groups.msn.com/Tamil-Unicode
http://www.venkatarangan.com/blog/default,month,2005-08.aspx
(The URL may be difficult
to input. Try a search for the last part of the URL)
http://www.araichchi.net/kanini/unicode/Tamil_Unicode.html
http://mailman.ldc.upenn.edu/pipermail/lodl/2004-September/000090.html
www.ss003b3751.pwp.blueyonder.co.uk/Tamil/Unicode/Tamil%20Unicode.html
http://www.webtamilan.com/tamilinayam/unicodetamil1.htm
http://ta.wikipedia.org/wiki/
(Please search for Natkeeran at this site)