Multilingual Systems: Source Code Distribution
The software developed at IITM may be grouped into the following categories. The platforms are also indicated. The links lead to pages with additional information about the sources and the development tools used.
1. Applications relating to text preparation (MS Windows, X-Windows, Java)
IITM local language library: The basic library of functions for handling local language text. The linked page has the details of the library as well as the sources.
iitmfced, Linux editor, r2leditor: Text editors. iitmfced is the version for Microsoft Windows, Linux_ed is the version for Linux, and r2leditor is the application for handling right-to-left scripts (Urdu, Arabic, etc.). Basic documentation is included on the individual page describing each application. A description of the editors and their features is included elsewhere in these pages.
The open source version of the multilingual editor can be accessed from http://sourceforge.net/projects/imli/.
2. Conversion utilities
llf2html, llf2gif, llf2jpg, llf2png, llf2pdf and llf2ps: These convert local language text to other formats. They can be used with CGI programs to deliver Indian language text in flexible ways. The sources are for Linux.
tconvert and tview: Convert text in transliterated formats such as ITRANS, RIT, etc., to llf, the syllable-level representation of text in local languages. The features supported by tconvert and tview are described on a separate page.
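The core of a transliteration converter such as tconvert is a longest-match scan that decides, at each consonant, whether the inherent vowel is kept, replaced by a vowel sign, or suppressed. The sketch below illustrates that idea for a tiny ITRANS-like subset. It is not tconvert's code: the output is Unicode Devanagari rather than llf (whose encoding is not described on this page), and the `VOWELS`/`CONSONANTS` tables and function names are hypothetical.

```python
# Minimal sketch: greedy longest-match transliteration of a tiny ITRANS-like
# subset. ASSUMPTION: output is Unicode Devanagari, not llf.

# key -> (independent vowel, dependent vowel sign); "a" is the inherent vowel
VOWELS = {
    "aa": ("\u0906", "\u093E"), "a": ("\u0905", ""),
    "ii": ("\u0908", "\u0940"), "i": ("\u0907", "\u093F"),
    "uu": ("\u090A", "\u0942"), "u": ("\u0909", "\u0941"),
    "ai": ("\u0910", "\u0948"), "e": ("\u090F", "\u0947"),
    "au": ("\u0914", "\u094C"), "o": ("\u0913", "\u094B"),
}
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "gh": "\u0918",
    "ch": "\u091A", "j": "\u091C", "T": "\u091F", "D": "\u0921",
    "t": "\u0924", "th": "\u0925", "d": "\u0926", "dh": "\u0927",
    "n": "\u0928", "p": "\u092A", "b": "\u092C", "bh": "\u092D",
    "m": "\u092E", "y": "\u092F", "r": "\u0930", "l": "\u0932",
    "v": "\u0935", "sh": "\u0936", "s": "\u0938", "h": "\u0939",
}
VIRAMA = "\u094D"  # suppresses the inherent vowel of a consonant


def next_token(text: str, i: int) -> str:
    """Return the longest key (2 or 1 characters) starting at position i."""
    for size in (2, 1):
        chunk = text[i:i + size]
        if chunk in CONSONANTS or chunk in VOWELS:
            return chunk
    return text[i]  # not part of the scheme: copied through unchanged


def transliterate(text: str) -> str:
    out = []
    bare_consonant = False  # last output was a consonant with no vowel yet
    i = 0
    while i < len(text):
        tok = next_token(text, i)
        i += len(tok)
        if tok in CONSONANTS:
            if bare_consonant:
                out.append(VIRAMA)  # consonant cluster, e.g. "st"
            out.append(CONSONANTS[tok])
            bare_consonant = True
        elif tok in VOWELS:
            independent, sign = VOWELS[tok]
            out.append(sign if bare_consonant else independent)
            bare_consonant = False
        else:
            if bare_consonant:
                out.append(VIRAMA)  # word-final consonant without a vowel
            out.append(tok)
            bare_consonant = False
    if bare_consonant:
        out.append(VIRAMA)
    return "".join(out)
```

For example, `transliterate("namaste")` returns "नमस्ते": the "s" is followed directly by "t", so a virama joins them into a conjunct.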
Utilities to convert between Unicode, ISCII and llf.
llf2brl: Utility for producing Bharati Braille output. The application is described on a separate page.
3. Linguistic text processing
Regular expression matching
Indexing text databases (Linux sources). The principles of indexing Indian language text are described on a separate page.
Search engines suited to Indian language text. The link is for the source. A description of the search methods is given on a separate page.
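The indexing principles themselves are described on a separate page. As a generic illustration only, the sketch below builds an inverted index (word to set of document ids) and answers multi-word queries by intersecting the sets. It assumes text is split into words on whitespace; the IITM tools may index at a different unit (for example, aksharas), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each whitespace-separated word to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents containing every word in the query (AND search)."""
    words = query.split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the keys are plain strings, the same code works for Devanagari or any other script, provided documents and queries use the same encoding.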
Generating concordances
Frequency of occurrence of aksharas in a text database. A description of the method is given on a separate page.
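Counting akshara frequencies requires first segmenting text into aksharas (syllable units) and then tallying them. The sketch below assumes Unicode Devanagari input, whereas the IITM tools work on llf. It uses a simplified pattern: zero or more consonant+virama pairs, a consonant, an optional vowel sign or virama, and an optional candrabindu/anusvara/visarga; or an independent vowel with an optional modifier. Joiners, Vedic marks and other scripts are deliberately ignored.

```python
import re
from collections import Counter
from typing import Iterable, List

# ASSUMPTION: Unicode Devanagari input; simplified akshara pattern.
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant + optional nukta
VOWEL_SIGN = "[\u093E-\u094C]"                     # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"                       # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0904-\u0914]"
VIRAMA = "\u094D"

AKSHARA = re.compile(
    "(?:" + CONSONANT + VIRAMA + ")*" + CONSONANT
    + "(?:" + VOWEL_SIGN + "|" + VIRAMA + ")?" + MODIFIER + "?"
    + "|" + INDEPENDENT_VOWEL + MODIFIER + "?"
)


def aksharas(text: str) -> List[str]:
    """Split text into aksharas; characters outside the pattern are skipped."""
    return AKSHARA.findall(text)


def akshara_frequencies(texts: Iterable[str]) -> Counter:
    """Count how often each akshara occurs across all texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(aksharas(text))
    return counts
```

For example, `aksharas("नमस्ते")` gives `["न", "म", "स्ते"]`: the conjunct स्ते is counted as one akshara, not as separate letters.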
4. Fonts and font generation utilities (MS Windows and Linux)
Script: The IITM font generation tool (see the Script documentation).
This is based on the principle of displaying aksharas through strokes, each stroke being a Bezier curve. The set of aksharas is specified in terms of shapes made up of strokes, and the tool allows these shapes to be generated interactively.
The data structures employed here may be used to display Indian language text on any computer without the need for fonts; only a graphical display is required. The IITM C-callable local language library provides functions equivalent to getch(), putch(), etc.
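Displaying text from strokes rather than fonts comes down to evaluating each Bezier stroke at a series of parameter values and plotting the resulting points. The sketch below assumes each stroke is a cubic Bezier with four control points in the unit square, and that a shape is simply a list of strokes; the real Script data structures are not described here, so the types and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# ASSUMPTION: each stroke is a cubic Bezier given by 4 control points.
Stroke = Tuple[Point, Point, Point, Point]


def bezier_point(stroke: Stroke, t: float) -> Point:
    """Evaluate a cubic Bezier at parameter t in [0, 1] (Bernstein form)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = stroke
    u = 1.0 - t
    b0, b1, b2, b3 = u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def sample_stroke(stroke: Stroke, steps: int = 16) -> List[Point]:
    """Approximate a stroke by a polyline of steps + 1 points."""
    return [bezier_point(stroke, i / steps) for i in range(steps + 1)]


def rasterize(shape: Sequence[Stroke], width: int, height: int,
              steps: int = 64) -> List[List[bool]]:
    """Mark the grid cells touched by any stroke of a shape (unit-square coords)."""
    grid = [[False] * width for _ in range(height)]
    for stroke in shape:
        for x, y in sample_stroke(stroke, steps):
            col = min(int(x * width), width - 1)
            row = min(int((1.0 - y) * height), height - 1)  # y axis points up
            grid[row][col] = True
    return grid
```

Since a Bezier curve lies inside the convex hull of its control points, keeping control points in the unit square guarantees that every sampled point maps to a valid grid cell.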
Font generation is accomplished as follows. A skeletal shape for an akshara is first generated. Metafont is used to apply a brush pattern to this shape and to generate a bitmapped image. "limn", a utility that generates the outline of a bitmapped shape, is then used to convert the bitmap to an outline description. Finally, the output of "limn" is processed to produce a PostScript font (.pfa or .pfb).
5. Shells and script processing (Linux and MS Windows)
Local language shells for interactive use. They support the equivalents of the basic commands in a Unix shell and allow local language names to be used for files. The source is for Linux.
Local language shells with speech output for use by the visually handicapped.
Scripting languages: A simple mechanism for writing scripts in local languages. Such scripts may be interpreted by the script interpreter running under the shell. A link to the source is provided on the linked page.
6. Database systems (MS Windows and Linux)
PERL utilities to interface with mysql. Data is stored in a mysql database and accessed through the client interfaces. The IITM encoding scheme is retained in the stored data. The source provided here is an example of a cgi-bin script that interfaces with mysql data (an online dictionary).
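Retaining the IITM encoding in the database means the text must be stored and compared as raw bytes, without any character-set conversion. The sketch below shows that idea for a dictionary lookup. It swaps in Python's built-in sqlite3 for mysql and PERL purely so the example is self-contained; the table layout, function names and byte values are hypothetical placeholders, not actual IITM codes.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# ASSUMPTION: sqlite3 stands in for mysql. Headwords are stored as BLOBs so
# their bytes (in whatever encoding, e.g. the IITM scheme) are kept unchanged.


def make_dictionary(entries: Iterable[Tuple[bytes, str]]) -> sqlite3.Connection:
    """Create an in-memory dictionary table and load (headword, meaning) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (headword BLOB PRIMARY KEY, meaning TEXT)")
    conn.executemany("INSERT INTO dictionary VALUES (?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, headword: bytes) -> Optional[str]:
    """Return the meaning for an exact byte-for-byte headword match, if any."""
    row = conn.execute(
        "SELECT meaning FROM dictionary WHERE headword = ?", (headword,)
    ).fetchone()
    return row[0] if row else None
```

A cgi-bin script would decode the query from the request into the same byte encoding before calling `lookup`, so that stored and queried headwords compare equal.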
7. Text to speech related applications (MS Windows and Linux)
Speaking local language text from a .llf file. The speech engine used is MBROLA. The source is for MS Windows; the Linux source is included separately. The application is described on a separate page.
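MBROLA is a diphone synthesizer that takes a .pho file as input: one line per phoneme giving its name, its duration in milliseconds, and optional (position %, pitch Hz) targets, with "_" denoting silence and ";" starting a comment. The sketch below turns syllables, already split into phonemes, into that format. The phoneme names, durations and flat pitch are placeholder assumptions; the actual names must match the MBROLA voice used, and the llf-to-phoneme mapping of the IITM application is not shown here.

```python
from typing import List, Sequence

# ASSUMPTIONS: hypothetical phoneme names; fixed durations and a flat pitch.
VOWEL_PHONEMES = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}


def syllables_to_pho(syllables: Sequence[Sequence[str]], pitch_hz: int = 120) -> str:
    """Build MBROLA .pho text: 'phoneme duration [position% pitch]' per line."""
    lines: List[str] = ["; sketch: syllables converted to MBROLA phonemes", "_ 100"]
    for syllable in syllables:
        for ph in syllable:
            if ph in VOWEL_PHONEMES:
                # vowels are longer and carry a pitch target at 50% of their duration
                lines.append(f"{ph} 120 50 {pitch_hz}")
            else:
                lines.append(f"{ph} 70")
    lines.append("_ 100")  # trailing silence
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and passed to the MBROLA program together with a voice database, in the form `mbrola <voice> speech.pho speech.wav`.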
Screen readers and applications with speech output for text preparation in Indian languages, including Sounded, the speech-enhanced multilingual editor.
Enhancements to permit JAWS for DOS (a free screen reader) to work under MS Windows and Linux. The linked page describes the application and includes the links for the sources.
Enhancements to Lynx, the text-based web browser, enabling it to function as a screen reader supporting both English and local language text. The linked page has details on the sources as well.
8. Development relating to PERL (Linux)
PERL modules to support the processing of Indian language text. The PERL application is written as a .llf file using the multilingual editor. This file is processed by a preprocessor module and then executed directly by the standard PERL interpreter.
9. Applications for the web (Linux: PERL, Java)
Search applet: Java-based user interfaces supporting data entry in Indian languages in a web browser. The source is an example of how the pcf font classes are used to render Indian language text in an applet downloaded from a server.