The conference also agreed on a standard for data entry in Tamil. Three
different keyboard layouts were adopted to suit different groups of users.
The first is the so-called phonetic keyboard, in which the basic text is
entered using lower case keys alone. The second scheme, referred to as the
Romanized keyboard, specifies data entry based on the Roman letter that
comes closest to the sound of the vowel or consonant. The third is the
layout seen in standard Tamil typewriters. Details of the keyboard layouts,
and of some software that supports the schemes, are included in the pages
at the Tamilnet99 web site.
The conferences held in 2000 and 2001 (links available on the right) do
not seem to have led to significant additional recommendations. The
proliferation of fonts seems to continue, as does that of specific encoding
recommendations. Most of the efforts seem to relate to the Win9X platforms.
The new TISCII recommendation seems to be gaining some ground, as seen from
the increasing number of references to it on the web.
Unicode for Tamil
Unicode has become a world standard, and many computer applications provide
Unicode support so that multilingual text can be handled. The Unicode standard
proposed for Tamil has not taken into consideration some of the important
linguistic issues. Even among professionals, there seems to be considerable
difference of opinion about the adequacy of Unicode.
At the Systems Development Laboratory, the view held is that Unicode is not
really suited for text processing in Indian languages, though the data entry
and display requirements could be handled with the current Unicode assignments
for Indian languages. There are differing views about the suitability of
Unicode, even with respect to Tamil. The specific issues are addressed in a
separate section on Unicode with specific reference to Tamil.
The relevance of the IITM Software
The IITM
software, on account of its flexible approach to computing with Indian
languages, was able to support the requirements specified in the Tamilnet99
standards. The multilingual editor conforming to the Tamilnet99 standards
was easily developed at the lab and has been made available for general
use. Please follow the appropriate links on the right.
The real
power of the IITM software becomes apparent in applications that require
linguistic analysis of text in Tamil. The links below refer to many applications
that have been developed at IIT Madras for document preparation and linguistic
processing with Tamil.
Multilingual text editor
A simple but very effective text editor conforming to the syllabic
requirements for linguistic processing. The editor can generate displays
with a variety of fonts, including Tamilnet99, Anjal, Murasu, Mylai, TISCII
and more. It also provides distinct internal storage codes for letters which
have the same shape but differing sounds. The data entry scheme for the
editor is flexible, and the Tamilnet99 standard for data entry is fully
supported. Look up the features in the section on Multilingual Editor
(linked above).
A text-to-speech enhanced version of the editor has been provided for the
benefit of the visually handicapped. Text prepared with the editor may be
pasted into Word, Outlook Express and other Windows applications. The editor
is distributed free of charge, and executable binaries for Microsoft Windows
as well as Linux may be downloaded from this web site. The link above
provides additional details.
Letter and word Frequency Count programs
The set of utilities developed for linguistic processing in different Indian
languages includes a program for computing the frequencies of occurrence of
vowels, consonants and their combinations in any given text. Essentially,
the program counts the syllables (aksharas) and tabulates the results in a
useful manner. The link above will also take you to the results of frequency
counts of the aksharas in Tirukkural and Sambhandar Tevaram. The results
have much to reveal.
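As an illustration only, a count of this kind can be sketched in Python.
This is not the IITM program: it assumes Unicode-encoded input rather than
the syllable-level internal codes used by the IITM software, and the function
names split_aksharas and akshara_frequencies are hypothetical. Each akshara
is approximated as a base letter followed by any dependent vowel sign or
pulli; rare conjuncts are not treated specially.

```python
import unicodedata
from collections import Counter


def split_aksharas(text):
    """Split text into approximate Tamil aksharas (hypothetical helper).

    Assumption: input is Unicode. A unit is one Tamil base character plus
    any combining marks (vowel signs, pulli) that immediately follow it.
    Characters outside the Tamil block (U+0B80..U+0BFF) are skipped.
    """
    units = []
    unit_open = False  # True while the previous character was Tamil
    for ch in text:
        if not ("\u0b80" <= ch <= "\u0bff"):
            unit_open = False
            continue
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if is_mark and unit_open:
            units[-1] += ch  # attach vowel sign or pulli to its base
        else:
            units.append(ch)
            unit_open = True
    return units


def akshara_frequencies(text):
    """Tabulate how often each akshara occurs in the text."""
    return Counter(split_aksharas(text))


if __name__ == "__main__":
    sample = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil" in Tamil script
    for akshara, count in akshara_frequencies(sample).most_common():
        print(akshara, count)
```

For the sample word, the sketch produces three units: the bare consonant,
the consonant with its vowel sign, and the consonant with pulli.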
Sorting utilities
Lexical ordering according
to the specified order of the letters of Tamil has been a major issue and
this specific problem has not been given sufficient attention by those
developing standards for Tamil. The Unicode assignment for Tamil is a hopelessly
mangled set of the letters but the claim is that Unicode does not purport
to preserve lexical ordering!
The IITM Software preserves the lexical ordering, and utilities are available
for sorting and indexing text prepared using the Multilingual Editor. It is
also possible to write utilities in PERL to effect text processing and regular
expression matching with text in Indian languages. The section on PERL modules
for Indian languages has additional information.
There is also the fundamental question of what constitutes a proper sorting
order for Tamil. This question can be answered only after the full set of
aksharas and special symbols required for regular use has been correctly
identified and codes have been assigned to the set. We have a separate
section discussing the set of Tamil characters that would adequately
represent Tamil and meet its linguistic processing requirements.
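The gap between code point order and the conventional letter order can be
illustrated with a short Python sketch. It is not the IITM sorting utility:
it assumes Unicode input, assumes the commonly taught order of the twelve
vowels, aytham and the eighteen consonants, and uses a hypothetical function
name, tamil_sort_key. It ranks text letter by letter only and does not settle
how vowel signs, pulli or Grantha letters should be ordered.

```python
# Assumed conventional order: 12 vowels, then aytham, then 18 consonants.
VOWELS = ("\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a"
          "\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94\u0b83")
# ka nga ca nya tta nna ta na pa ma ya ra la va zha lla rra nnna
CONSONANTS = ("\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
              "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9")
RANK = {ch: i for i, ch in enumerate(VOWELS + CONSONANTS)}


def tamil_sort_key(word):
    """Sort key that ranks listed letters by their conventional position.

    Characters not in the table (vowel signs, pulli, Grantha letters,
    non-Tamil text) fall back to code point order after all listed letters.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


if __name__ == "__main__":
    letters = ["\u0bb5", "\u0bb4", "\u0bb2", "\u0bb1"]  # va, zha, la, rra
    print("code point order:  ", sorted(letters))
    print("conventional order:", sorted(letters, key=tamil_sort_key))
```

Sorting by code point places rra first, because its Unicode position comes
before la; the key function restores the sequence la, va, zha, rra.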
Email and chat
Sending and receiving email with text in Tamil, or handling chat, has been
greatly simplified. All that is required is to cut text entered into the
Multilingual Editor and paste it into the application: Outlook Express,
Instant Messenger or a similar one. The link above discusses the principles
and also explains how you can send the required text as an attachment, so
that email in Tamil can be sent and received on Linux systems too.
Tirukkural - On-line reference
A comprehensive on-line reference for Tirukkural, permitting the text of
Kural to be viewed from virtually any graphics enabled browser, has been
included at this site. No fonts of any kind are required. The pages also
allow you to search for words in the text of Kural, where the search word
may be entered directly into the web page in Tamil. A wordlist consisting of
all the words in the text of Kural is also presented, with reference
information on the couplet containing each word. We do not know of any other
site on the web that offers a service close to what is provided here (as of
May 2006).
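A wordlist of this kind is essentially an index from each word to the
couplets that contain it. The Python sketch below is a hypothetical
illustration, not the code behind this site: the function name
build_wordlist and the dictionary-of-couplets input format are assumptions,
and the sample entries are placeholder words rather than actual Kural text.

```python
from collections import defaultdict


def build_wordlist(couplets):
    """Map each word to the sorted couplet numbers in which it occurs.

    couplets: dict of couplet number -> text (assumed input format).
    Words are taken to be whitespace-separated; no stemming is attempted.
    """
    index = defaultdict(set)
    for number, text in couplets.items():
        for word in text.split():
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


if __name__ == "__main__":
    # Placeholder data only; real use would load the couplets from a file.
    sample = {1: "word_a word_b", 2: "word_b word_c"}
    wordlist = build_wordlist(sample)
    print(wordlist.get("word_b", []))  # couplets containing the search word
```

A search for a word then reduces to a single lookup in the returned
dictionary.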
Search engines which can accept query strings in Tamil
An example of a web-based application that searches for words in the text
of Tirukkural, Tolkappiam and other works in Tamil. It uses a Java-based web
interface that allows text strings to be entered in Tamil.
Text to Speech generation
A look at the approach taken by IIT Madras to synthesize speech in Tamil as
well as other Indian languages. The results are extremely satisfying, despite
a robotic flavour to the synthesized output. The speech enhanced Multilingual
Editor and other applications have found acceptance in the visually
handicapped community in Tamilnadu as well as other states in India. We
believe this to be the very first demonstration of continuous text to speech
generation on the web in any Indian language. If you are intrigued, we invite
you to see and hear the output in the on-line demo.
Data Base applications
Using PERL to work with Tamil