Unicode rendering examples

Devanagari rendering under Ubuntu and Windows Vista

Tamil rendering under Ubuntu and Windows Vista
Devanagari (Ubuntu)
Shown at the left is a Google results page for a search for pages that refer to the Acharya site. As displayed by Firefox under Ubuntu, the page shows displaced matras as well as linearized rendering of syllables. The display under Firefox on a Vista system is correct.
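Linearized rendering means that the codes of a conjunct are drawn one by one (for example क् followed by ष instead of the single conjunct क्ष). Before a renderer can choose a conjunct glyph, it has to group the stored codes into syllables. The Python sketch below shows such a grouping under simplifying assumptions: the consonant range, the rule that every combining mark belongs to the preceding syllable, and the helper names `is_consonant`, `is_combining` and `syllables` are all hypothetical and far cruder than a real shaping engine.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halanth)


def is_consonant(ch: str) -> bool:
    """Simplified test: KA..HA (U+0915-U+0939) and the nukta letters
    QA..YYA (U+0958-U+095F). Real shaping tables are more detailed."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def is_combining(ch: str) -> bool:
    """Vowel signs, nukta, virama, anusvara etc. (categories Mn and Mc)."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def syllables(text: str) -> list[str]:
    """Group code points into rough syllables: combining marks attach to
    the preceding syllable, and a consonant that follows a virama joins
    the same cluster (this is where a conjunct glyph would be chosen)."""
    groups: list[str] = []
    for ch in text:
        if groups and (is_combining(ch)
                       or (is_consonant(ch) and groups[-1].endswith(VIRAMA))):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # क्षत्रिय
print(syllables(word))
```

A zero width non joiner placed after the halanth breaks the group, which is how a user deliberately asks for the linearized form: `syllables("\u0915\u094D\u200C\u0937")` yields three pieces instead of one.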
Shown at the left is a page from Google that includes Unicode text in Tamil. The rendering under Ubuntu is totally inappropriate: the medial vowel shapes are in the wrong place. Worse still, syllables with the vowel "u" are rendered completely wrongly.

Tamil has the advantage of a simpler script, in which syllable formation is relatively easy. Unfortunately, the application uses inappropriate algorithms to render syllables.

The rendering under Windows is correct, though one must keep in mind that Unicode for Tamil does not address all the requirements.
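The misplaced medial vowel shapes arise because Tamil stores a vowel sign after the consonant, while the signs for e, ee and ai are drawn to its left and those for o, oo and au are drawn on both sides. The sketch below shows the reordering a renderer has to perform; it assumes simple consonant + vowel sign syllables, uses a hypothetical helper `visual_order`, and relies on Python's standard `unicodedata.normalize` for the decomposition step.

```python
import unicodedata

# Tamil vowel signs that are drawn to the LEFT of the consonant although
# they are stored after it: E (U+0BC6), EE (U+0BC7), AI (U+0BC8).
PRE_BASE = {"\u0BC6", "\u0BC7", "\u0BC8"}


def visual_order(syllable: str) -> list[str]:
    """Return the drawing order for a simple consonant + vowel sign syllable.

    NFD splits the two-part vowel signs O (U+0BCA), OO (U+0BCB) and
    AU (U+0BCC) into a left part and a right part; every left part is then
    moved in front of the consonant. Ligated forms (e.g. with "u") are
    not handled."""
    parts = list(unicodedata.normalize("NFD", syllable))
    left = [p for p in parts if p in PRE_BASE]
    rest = [p for p in parts if p not in PRE_BASE]
    return left + rest


ko = "\u0B95\u0BCA"  # கொ, stored as KA + VOWEL SIGN O
print([f"U+{ord(p):04X}" for p in visual_order(ko)])
```

For கொ the result is U+0BC6, U+0B95, U+0BBE: the left part of the vowel sign moves in front of the consonant. A renderer that skips this step draws the whole sign after the consonant, which is the kind of misplacement seen in the Ubuntu screenshot. The sketch does not cover the "u" and "uu" forms, which fuse with the consonant into distinct shapes and need a ligature table.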
The display at the left is the WordPad screen under Windows Vista. It highlights the problems of Unicode data entry: it has been possible to create identical displays for two different strings. This example shows that preparing a text string for a query may be an extremely difficult task. Zero-width non-joiners can cause confusion when the user linearizes a syllable. WordPad allows a zero-width non-joiner to be entered, but Notepad does not. The underlying problem is that the same syllable can be created in two different ways, one with a single code and the other with two codes, which results in ambiguity. The nukta character is not a linguistic entity, and assigning a Unicode value to it is as inappropriate as assigning codes to medial vowel forms. The linguistic structure demands that codes be assigned so that syllables can be written clearly and identified without confusion.
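One partial remedy for the query problem is to reduce every string to a canonical key before it is stored or searched. The sketch below assumes Python's `unicodedata` module; the helper name `query_key` and the decision to drop joiner characters are assumptions, not an established search practice. Unicode normalization makes the single-code form (U+0958, क़) and the two-code form (KA + NUKTA) compare equal, but it leaves zero-width non-joiners in place, so they are removed separately.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def query_key(text: str, drop_joiners: bool = True) -> str:
    """Map differently typed but identically displayed strings to one key.

    NFC unifies canonically equivalent sequences, e.g. QA (U+0958) and
    KA + NUKTA (U+0915 U+093C). Joiners survive normalization, so they
    are optionally dropped as well (an assumption: a joiner can change
    which visual form the user intended)."""
    key = unicodedata.normalize("NFC", text)
    if drop_joiners:
        key = key.replace(ZWNJ, "").replace(ZWJ, "")
    return key


single = "\u0958"        # क़ typed as one code (DEVANAGARI LETTER QA)
double = "\u0915\u093C"  # क़ typed as KA + NUKTA
print(single == double, query_key(single) == query_key(double))
```

Dropping joiners trades precision for recall: a query typed with a zero-width non-joiner then also matches the conjunct form, which may or may not be what the user intended. Normalization does nothing for the display ambiguity itself; it only makes searching consistent.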
The two screenshots at the left illustrate that Unicode rendering is necessarily application dependent. The applications are WordPad and the word processor in Microsoft Works, both running under Windows Vista. The rendering in WordPad is correct, while the Works word processor shows a totally incorrect display. Unicode values are assigned to Indian language letters in such a way that a syllable is represented by a sequence of several codes. To accommodate different renderings of the same syllable, Unicode allows the use of special characters known as modifiers. Different applications do not handle these modifiers properly. Moreover, the rendering algorithm can never be standardized, because of the variations permitted in syllable representation. This is why rendering is basically the responsibility of the application. Note how the medial vowel shapes are incorrectly placed as well as incorrectly rendered.
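Since a displayed syllable gives no hint of the codes behind it, a small dump utility helps when checking what an application has actually stored. The sketch below uses Python's `unicodedata.name`; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, int]]:
    """List each code point of `text` with its Unicode name and the
    number of bytes it occupies in UTF-8."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), len(ch.encode("utf-8")))
        for ch in text
    ]


syllable = "\u0915\u094D\u0937\u093F"  # the single syllable क्षि (kshi)
for cp, name, nbytes in describe(syllable):
    print(cp, name, nbytes)
```

The syllable क्षि occupies four codes (12 bytes in UTF-8), and the vowel sign I, though drawn first, is stored last. Placing that sign correctly is left entirely to the rendering application.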
Unicode rendering often assumes that it should be possible to handle arbitrarily long syllables. It is easy to force syllable formation on the application using valid Unicode values that have only a symbolic value and not strictly a linguistic one; the codes for the medial vowels and the nukta are a few examples. These codes can confuse the application when arbitrarily long syllables are attempted. The display at the left results from entering a series of halanth codes (almost 400 in this case), at which point the data entry state machine starts to misbehave. A copy of the file is available for you to verify this. Please note the differences in rendering between WordPad and Notepad. One has to accept that the rendering issue cannot be totally divorced from the application: when the application decides how a specific case will be handled, uniformity across applications is lost. While one can dismiss this as a pathological example, one should remember that Unicode allows a user to compose a syllable with special modifier codes. Hence it may be virtually impossible to discern the internal representation of a displayed string, information that is essential when typing query strings for searches.
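An application can protect its data entry state machine by rejecting sequences that have no linguistic meaning before they reach the renderer. The sketch below is hypothetical: the helper `check_entry`, the consonant range and the limit of four consonants per cluster are illustrative assumptions, not rules taken from the Unicode standard.

```python
import re

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halanth)

# Hypothetical policy: at most MAX_CONJUNCT consonants may be chained with
# viramas inside one cluster. The value 4 is an illustrative assumption.
MAX_CONJUNCT = 4
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"  # simplified consonant range

# MAX_CONJUNCT or more "consonant + virama" pairs followed by one more consonant
LONG_CHAIN = re.compile(f"(?:{CONSONANT}{VIRAMA}){{{MAX_CONJUNCT},}}{CONSONANT}")


def check_entry(text: str) -> list[str]:
    """Return the problems found in a Devanagari string before it is
    handed to a renderer or stored as a search query."""
    problems = []
    if VIRAMA * 2 in text:
        problems.append("consecutive halanth codes")
    if LONG_CHAIN.search(text):
        problems.append(f"more than {MAX_CONJUNCT} consonants joined in one cluster")
    return problems


entered = "\u0915" + VIRAMA * 400  # KA followed by 400 halanth codes
print(check_entry(entered))
```

With such a check, input like the long run of halanth codes described above is flagged at entry time instead of being passed on to the renderer.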