Unicode for Indian Languages: A discussion
Support for Unicode
in applications catering to Indian languages is a highly debated issue.
Though Unicode has emerged as a viable standard and is finding increasing
use all over the world, there are some real difficulties in using it in
practice for building applications supporting multilingual user interfaces
in Indian languages. The conceptual basis for Unicode, though well accepted
for the western languages (scripts), does not fully conform to the linguistic
requirements seen in our languages.
At the Systems Development
Laboratory, IIT Madras, where some meaningful multilingual solutions consistent
with the linguistic requirements for all the Indian languages have been
developed and distributed as well, there is a strong feeling that Unicode
will not really help. It is true that Unicode is a world standard proposed
and accepted by a large community of academics, professionals and users.
Unfortunately, it does not really blend with the syllabic writing systems
used in India, much less provide the means to express linguistic content
without ambiguity and in a manner that ties in well with our own understanding
of languages.
What we have tried to say
here reflects the above view.
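The mismatch shows up in even a single syllable. The conjunct syllable क्षि (kshi) is read and written as one unit (aksara), but Unicode stores it as four separate code points in logical order. The short Python sketch below is an illustration added here and is not part of the lab's material. It uses only the standard `unicodedata` module to list those code points. The helper `describe` and the constant `KSHI` are hypothetical names chosen for this example.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# One written syllable, kshi, stored by Unicode as four code points:
# KA + VIRAMA + SSA + VOWEL SIGN I (logical, not visual, order).
KSHI = "\u0915\u094D\u0937\u093F"

if __name__ == "__main__":
    for line in describe(KSHI):
        print(line)
    print("code points in one syllable:", len(KSHI))
```

The vowel sign I is stored after the conjunct but drawn to its left. As a result, the order in which the text is stored differs from the order in which it is seen, and a shaping engine has to reorder the code points before display.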
Multilingual Computing: A view from SDL
Introduction
Viewpoint
Idiosyncrasies of the writing systems
Defining Linguistic requirements
Dealing with Text consistent with Linguistic requirements
Multilingual computing requirements (for India)
Unicode for Indian Languages
The conceptual basis for Unicode
Unicode for Indian languages/scripts
Data entry and associated problems
Issues in rendering Unicode
Using a shaping engine to render Unicode text
Discussion on sorting or collation
The conceptual basis of the OpenType font
Unicode support in Microsoft applications
Uniscribe, the shaping engine
Limitations of Uniscribe
A review of some Microsoft applications in respect of handling linguistic content
Recommendations for Developers of Indian language Applications
Use of TrueType fonts to render Unicode Text
Can we simplify handling Unicode text?
Guidelines for development under Linux
Examples of Unicode Rendering by different applications (Windows and Linux): circa 2003, circa 2007
Summary of Observations
The lab's experiences in working with Unicode are summarized in the linked page. As of this update (June 2006), we have not seen an application in any of the Indian languages that can be cited as a satisfactory Unicode-based implementation. Though a number of developers are counting on Unicode, it will not be easy to localize our languages in a way that is consistent with the requirements of computing with Indian languages.
Note
These pages were added to the Acharya web site during March-April 2003. The discussions deal with conceptual issues.
____________________
We have tried to provide as much information as possible to relate the many different aspects of computing in Indian languages to Unicode. Since the discussions relate to text representation in terms of syllables, some repetition of the basic principles of syllabic writing systems discussed in the linked pages is unavoidable. Each topic is, in a way, self-contained.
Examples involving Microsoft applications were generated under WinXP/2000 and Microsoft Office 2000. It is certainly possible that the inconsistencies we have reported have already been addressed in the newer versions of the software available today (June 2005).
Text displayed in the vernacular
to illustrate specific linguistic issues was generated using the Multilingual
Editor developed in the lab. The text is actually sent to your browser
as an image generated on the fly, to allow more or less guaranteed viewing
of the text on any browser.
_______________________
Acknowledgment
Special thanks to Sri. Karthik Venkatesan, a friend of the lab who gave valuable suggestions on organizing these pages.