About IITM Software
The set of computer programs developed at the Systems Development Laboratory, Department
of Computer Science and Engineering, IIT Madras, India, goes by the name IITM
Software. The software permits the development of applications supporting
multilingual user interfaces in all the Indian languages. With some applications,
users will also be able to interact with a computer in their own mother tongue.
The IITM software is useful for teaching people about computers and for
training them in their use, directly in their own mother tongue.
The distinguishing aspect of the IIT Madras software is that multilingual
issues are handled at the level of the application itself, so support
within the Operating System is not called for. By taking this approach,
the IIT Madras team has been able to provide a useful set of applications
for data entry, printing and processing of text on a variety of computer
systems. Data prepared on one computer may be moved to other systems without
the need for special conversion utilities.
Almost all the multilingual developments in the world today seem to go in
the direction of providing support within the Operating System to handle
the specific languages of the world. The IIT Madras team has observed
that while such an approach may indeed work, getting a common system to accommodate
all the Indian languages is not going to be easy. Providing user interaction
on a system by writing applications which depend on both the platform and
the language is going to result in too many variations across platforms.
Accomplishing the same at the level of an application not only eliminates
system dependent programming but also provides a consistent and uniform
user interface across all systems.
The software developed at the Institute may be classified as under.
The IITM software project was begun in 1991 and initially concentrated on
developing a uniform internal representation for the aksharas of Indian languages.
This approach is consistent with the writing systems which are syllabic in
nature, as is the case with all Indian scripts. This led to a system of sixteen
bit codes for the aksharas of the languages and the scheme supported more
than 12000 different aksharas. Subsequently, a C library, supporting functions
similar to the curses library of Unix, was developed for different platforms
and a simple editor, a viewer and a PostScript printing utility were completed.
This library included an effective character rendering utility to display
the aksharas of our languages using primitives made up of curves. This
approach allowed text to be rendered uniformly on all systems. No fonts were
used by the library as there were (and still are) no standards for Indian
language fonts on any of the platforms. Besides the Indian languages, the
system was also able to deal with Greek, Hebrew and Japanese Hiragana.
The first three applications developed using the library were 1) a multilingual
viewer for viewing text, 2) a screen editor capable of handling all Indian
scripts and 3) a printing utility to generate a PostScript file from the
text prepared using the editor so that hard copy output may be obtained.
The multilingual viewer could be invoked as an external application
to view Indian language text from web browsers and email applications that
supported MIME attachments. It was a simple but very effective approach to
displaying Indian languages on web browsers and it did not require fonts
of any sort to be installed at the browser end. All three applications
were developed for use on many different platforms, including DOS, Win-3.x,
Win95/NT, Unix systems including Linux, Sun workstations, HP systems, IBM
RS6000 machines, Silicon Graphics systems and finally the Macintosh.
The character rendering program was also able to generate a .gif file of the
text to be displayed. By generating the .gif file on the fly, it was possible
to serve Indian language documents that could be seen on virtually any
graphics based browser. A search engine was also developed by adapting an
existing indexing program to work with 16 bit characters. This, together with
the .gif file generation, allowed web based search applications to work with
Indian language text. Samskritapriyah, a volunteer group in Madras,
used the software to put up on-line lessons for learning Sanskrit using this
approach, and these were received very well.
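To illustrate why an indexing program has to be adapted to 16 bit characters, the sketch below (in Python, not the original implementation) compares searching a text as a sequence of 16 bit codes with searching the same data as raw bytes. The code values and the little-endian byte order are assumptions made for the example; the actual akshara codes and the indexing program used are not described here.

```python
import struct

def find_codes(text, pattern):
    """Return the positions (counted in 16 bit codes) where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]

def byte_offsets(data, needle):
    """Return every byte offset where needle occurs in data, aligned or not."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets

# Hypothetical 16 bit codes, chosen only to show the effect.
text = [0x0102, 0x0203, 0x0301, 0x0102]
pattern = [0x0102]

# Searching code by code finds the two genuine occurrences.
code_hits = find_codes(text, pattern)

# Searching the same data byte by byte (little-endian assumed) also reports
# a match at an odd offset, which straddles two neighbouring codes.
raw = struct.pack(f"<{len(text)}H", *text)
needle = struct.pack(f"<{len(pattern)}H", *pattern)
raw_hits = byte_offsets(raw, needle)

print(code_hits)  # [0, 3]
print(raw_hits)   # [0, 3, 6]
```

The byte level search reports an extra hit at offset 3, which does not correspond to any akshara in the text. A search tool written for 8 bit characters would need to treat each 16 bit code as one unit to avoid such false matches.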
The second phase of the development of the IITM software concentrated on enhancing
the system to support fonts. This is one of the most complex problems since
there is no standardization whatsoever in respect of Indian language fonts.
Font designers had used arbitrary encodings and arbitrary choices for the
glyphs themselves in generating the fonts. As a consequence, Indian
language applications were necessarily font specific and most certainly
language specific.
The IIT Madras system handled this effectively by developing a layer between
the character codes and the font rendering program. This layer used a
table to derive the glyphs for any akshara, so multilingual text could
be displayed merely by switching tables. The approach cleanly separates the
internal representation from display rendering, which makes linguistic
processing very effective. Also, any font that has the necessary glyphs to
render all the required aksharas could be used for
the display.
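The table driven layer described above can be sketched roughly as follows. This is a minimal illustration in Python, not the IITM library itself: the akshara codes, the glyph indices and the names FONT_A_TABLE, FONT_B_TABLE and render_glyphs are all hypothetical, since the actual code assignments and font tables are not given here.

```python
# Hypothetical akshara codes, for illustration only.
AKSHARA_KA = 0x0101
AKSHARA_KI = 0x0102

# One table per font: akshara code -> sequence of glyph indices in that font.
# The glyph indices are invented; real fonts use their own arbitrary encodings.
FONT_A_TABLE = {
    AKSHARA_KA: [40],
    AKSHARA_KI: [91, 40],  # two glyphs in this font, sign glyph listed first
}
FONT_B_TABLE = {
    AKSHARA_KA: [12],
    AKSHARA_KI: [12, 77],  # a different font, different glyphs and ordering
}

def render_glyphs(codes, table):
    """Translate a sequence of akshara codes into glyph indices using one font table."""
    glyphs = []
    for code in codes:
        if code not in table:
            raise KeyError(f"font table has no glyphs for akshara code {code:#06x}")
        glyphs.extend(table[code])
    return glyphs

# The stored text never changes; only the table passed to the layer does.
text = [AKSHARA_KA, AKSHARA_KI]
glyphs_a = render_glyphs(text, FONT_A_TABLE)
glyphs_b = render_glyphs(text, FONT_B_TABLE)

print(glyphs_a)  # [40, 91, 40]
print(glyphs_b)  # [12, 12, 77]
```

Because the text is held only as akshara codes, string processing never sees font specific glyph values, and supporting another font amounts to supplying another table.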
By this time it was clear that the key to developing multilingual interfaces was
a language and font independent internal representation. Since this had already
been implemented as part of the initial design, application development became
easier. The font based output also satisfied the requirements of users who
were looking for quality printed output for publications.
By June 1998, the development team had decided to restrict the development of
applications to two platforms only, Linux and Microsoft Windows, since the
process of development for a variety of systems was becoming unwieldy.
The MFC based Windows editor is a particularly useful piece of software since
it provides Word compatible documents in the .rtf format. The multilingual
editor, together with a word processor or DTP application, can produce
truly high quality printed documents.
The project taken up at the Systems Development Laboratory represents a unique
experiment in system design: development has continued over a period of
more than fourteen years, resulting in a set of useful applications for use
in the country. During the second half of 1998, the development team worked
at enhancing the system to provide support for disabled
persons. We have been able to synthesize speech
in Indian languages from the text representation supported by the software.
It has also been possible for us to produce Braille output
in Indian languages consistent with the Bharati Braille recommendations.
Once the multilingual text has been prepared, the Braille output may be obtained
using any standard Braille embosser connected to the system. These two applications
have added strength to the IIT Madras software as they can provide visually
handicapped persons a means of accessing information much more meaningfully.
A number of programs listed above were essentially developed during the period 1998-2005.
The lab had set up a web site (the one you are currently viewing) to present
the IITM software to the users and had also included some useful on-line demos
with Java applets and search engines. Application development for the visually
handicapped continues to receive priority, as do text and linguistic processing
applications. The Perl modules for dealing with Indian scripts (16 bit codes)
should allow easy development of many linguistic applications. A page describing
the different applications is also included at the site.
A note about the .llf files
The syllable level coding which forms the internal representation
of text in Indian languages is special to the IITM Software. Each akshara,
be it a pure vowel, consonant or conjunct, has a unique sixteen bit representation
which is also uniform across all the languages. This representation includes
numerals and special symbols used in the writing systems of India.
All applications developed using the IITM software library use the sixteen bit
representation for string processing. The .llf format is just a series of 16 bit
codes, much as plain ASCII text is a series of seven bit codes, each stored
within a byte. All programs built around the software use the .llf
format. The .llf format is a binary format and is not amenable to editing
with any other software. The IITM software does include utilities to convert
the representation to ISCII, Unicode and Roman transliterated text, and
hence one should be able to handle files prepared using other software by
converting them to the .llf format.
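As a rough illustration of what "a series of 16 bit codes" means in practice, the following Python sketch packs a list of codes into bytes and reads them back. It is not part of the IITM software. The byte order (little-endian here), the absence of any file header, the sample code values and the function names pack_codes and unpack_codes are all assumptions, since none of these details are specified above.

```python
import struct

def pack_codes(codes, byte_order="<"):
    """Pack 16 bit codes into bytes, two bytes per code.

    byte_order "<" (little-endian) is an assumption; pass ">" for big-endian.
    """
    return struct.pack(f"{byte_order}{len(codes)}H", *codes)

def unpack_codes(data, byte_order="<"):
    """Recover the list of 16 bit codes from bytes produced by pack_codes."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes (two per code)")
    count = len(data) // 2
    return list(struct.unpack(f"{byte_order}{count}H", data))

# Hypothetical code values standing in for three aksharas.
codes = [0x0101, 0x0102, 0x2F00]
data = pack_codes(codes)
restored = unpack_codes(data)

print(len(data))  # 6
print(restored == codes)  # True
```

Every code occupies exactly two bytes, so the length of the data is always twice the number of aksharas, and a general purpose text editor that works byte by byte would not interpret the contents correctly.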