A tutorial on Fonts for Indian languages
This presentation is a tutorial on Fonts for Indian languages and scripts.
Over the years, many different fonts have been introduced in the context
of data entry, display and printing of documents in Indian scripts. Each
font is associated with a specific data entry scheme recommended by the
designer, and the software is often platform specific. Also, each designer
has included in the fonts a set of glyphs adequate for simple publications.
The idea behind this tutorial is to give our viewers an insight into
the vagaries of fonts as well as the attempt by IIT Madras to provide a
minimum level of standardization in designing fonts for the scripts.
This tutorial covers the following topics.
A set of fonts (for 11 scripts) with compliments from IIT Madras
(Bengali, Devanagari, Gurmukhi, Gujarati, Kannada, Malayalam, Oriya,
Tamil, Telugu, Roman Diacritics and Urdu).
The set of fonts is provided in these formats: TrueType, PostScript (pfb),
Unix (BDF) and Macintosh TrueType.
Points to remember
Fonts for Indian languages
cannot be based on any standard encoding specific to the syllabic writing
systems they follow, because no such encoding exists.
Font encoding makes sense
only when one character code maps into one glyph. For writing systems which
are based on syllables, a character string making up a syllable has to
be mapped into a single shape corresponding to that of the syllable. Hence,
fonts traditionally designed for Indian languages just include a collection
of basic shapes for the vowels, consonants and other ligatures used in
the writing system. The application has the responsibility to figure out
what glyphs should be combined to generate the shape for a syllable. Thus
the concept of the character set does not apply.
Virtually no standardization
is possible in Indian language fonts because variations in the representation
of a syllable are permitted.
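
The second point above, that the application must decide which glyphs to combine for a syllable, can be sketched roughly as follows. This is only an illustration under stated assumptions: the glyph names, the glyph locations and the rule table are all hypothetical and do not describe any actual font.

```python
# Hypothetical glyph table: the names and locations are invented for illustration.
GLYPHS = {"ka": 65, "i_matra": 105, "kssa_conjunct": 200}

# Hypothetical composition rules: a syllable (a tuple of character names)
# maps to the ordered list of glyph names that together draw its shape.
SYLLABLE_RULES = {
    ("ka",): ["ka"],
    # In Devanagari the vowel sign "i" is drawn to the left of the consonant.
    ("ka", "i"): ["i_matra", "ka"],
    # A conjunct may be drawn with a single ligature glyph.
    ("ka", "halant", "ssa"): ["kssa_conjunct"],
}

def glyphs_for_syllable(syllable):
    """Return the glyph locations the application draws, in visual order."""
    names = SYLLABLE_RULES[tuple(syllable)]
    return [GLYPHS[name] for name in names]

two_glyphs = glyphs_for_syllable(["ka", "i"])
one_glyph = glyphs_for_syllable(["ka", "halant", "ssa"])
```

The same character string could map to a different glyph sequence in another font, which is one way to picture why a single character set to glyph correspondence does not hold here.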
The concept of the font
In simple
terms, a font provides for displaying a set of symbols through well defined
shapes for each symbol. The symbol is a generic concept and the font is
one specific representation of a set of symbols. Traditionally,
the symbols mentioned here have been the letters of the alphabet in a particular
language along with punctuation marks and special characters. Fonts used
to be created by craftsmen and artists during the days of printing machines
that used movable type faces. Today, fonts are created by artists and designers
who work with computer based tools.
In a font, the
specific shape for a symbol is described either in terms of a digital image
through bit maps or in terms of a filled outline. The former is called
a bit mapped font and the latter, an outline font. An outline font specifies
the shape for a symbol in mathematical terms using curves. The mathematical
description allows the shape to be drawn at different sizes by scaling
the parameters suitably. Outline fonts are increasingly being used on account
of their scalability. The descriptions result in a pictorial representation
or shape for each symbol, which is referred to as a glyph.
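
The scalability of an outline font can be illustrated with a minimal sketch. It assumes a hypothetical outline stored as control points on a 1000-unit design grid; real outline formats describe curves, but the scaling step applies to their control points in the same way.

```python
def scale_outline(points, units_per_em, pixels_per_em):
    """Scale outline control points from design units to pixels."""
    return [
        (x * pixels_per_em / units_per_em, y * pixels_per_em / units_per_em)
        for x, y in points
    ]

# Hypothetical triangular outline on a 1000-unit design grid.
outline = [(0, 0), (500, 1000), (1000, 0)]

small = scale_outline(outline, 1000, 12)  # drawn 12 pixels tall
large = scale_outline(outline, 1000, 48)  # same shape, 48 pixels tall
```

A bit-mapped font has no such step: each size needs its own stored image.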
The number of symbols
which are displayable by a font is generally limited by the value of the
index used to access a specific shape within the font. Eight-bit fonts
are limited to 256 glyphs, but computer systems typically recognize only a
subset of these, usually between 96 and 240 glyphs (approximately).
Given below is a table
displaying the glyphs present in the familiar Times-New-Roman font. Notice
that the letters of the Roman alphabet along with special characters and
punctuation marks are present as glyphs in the font. The glyphs shown here
occupy positions from 32 to 255. The first 32 are not assigned in most
fonts.
Each glyph
in the font is specified by a name as well as a glyph index which locates
the glyph in an ordered arrangement of the glyphs. In the fonts for the
Roman alphabet, it is common to locate the glyph for a letter in the place
corresponding to the ASCII code of the letter. Thus, the glyph for "capital
b", i.e., "B", will be at location 66 (the locations are numbered
from zero). Most of the frequently used glyphs are seen in the first
128 locations of the font. The second half, known as the upper ASCII,
usually contains special symbols which are required in printed text to
indicate phonetic aspects of the letter, or reference symbols for footnotes,
etc.
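
As a quick check of the ASCII convention described above, the short sketch below returns the zero-based glyph location for a printable ASCII letter. The function name is only illustrative.

```python
def glyph_location(letter):
    """Zero-based glyph location for a letter in an ASCII-ordered font."""
    code = ord(letter)
    if not 32 <= code <= 126:
        raise ValueError("not a printable ASCII character")
    return code

location_of_b = glyph_location("B")
```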
In most computer systems,
a provision is available to display text using a font that may be selected
by the user. The text to be displayed is represented through the ASCII
codes of the characters to be shown. These codes may span the range 32-126,
the usual set of values for the letters of the alphabet, or the range 160-254
normally reserved for special symbols.
There is no specific
recommendation available on what symbols should get displayed via the upper
ASCII range, though the International Organization for Standardization (ISO)
has recommended that the glyphs for some of the languages of Europe and the
Middle East be assigned these locations. The term character set is often used to refer
to the set of numeric codes assigned to the letters of the alphabet of
a language. Thus, for a specified language, the code assigned to a letter
of the alphabet will be the same in all computers so that application programs
may recognize the letter from its internal code. If the glyph location
for that letter also coincides with this code, then a one to one relationship
exists between the code for a character and its glyph. For most European
languages, one letter invariably gets represented through one glyph.
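
The idea of a fixed code per letter can be tried out with a small sketch. It assumes that ISO 8859-1 is a representative example of the kind of 8-bit character set discussed above; the text itself does not name a particular standard.

```python
def code_in_charset(letter, charset="iso-8859-1"):
    """Return the single 8-bit code a character set assigns to a letter."""
    return letter.encode(charset)[0]

E_ACUTE = "\u00e9"  # the letter e with an acute accent

code_b = code_in_charset("B")            # lower half, same as ASCII
code_e_acute = code_in_charset(E_ACUTE)  # upper half of the 8-bit range
```

If a font places the glyph for each letter at the location given by this code, the code and the glyph location coincide, which is the one to one relationship described above.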
The fixing of glyph
locations for a letter of the alphabet has the most important advantage
that the text to be displayed may be shown using many different fonts.
This is precisely the idea behind word processors permitting selected text
to be displayed in a font chosen by the user.
As of today, fonts
for most of the languages of the world are limited to 8 bit codes for specifying
the glyph positions. In other words, the number of symbols (or glyphs)
required to display text in most languages is less than 256 and hence 8-bit
fonts work well. However, many languages of Asia, including those of Japan,
China and Korea, cannot be specified through 8-bit codes for their letters, as
there are far too many of them. The Japanese character set includes some 24000
symbols while most of the scripts of India provide for as many as 12000-14000
individually differing aksharas. Later we will see how the aksharas
of Indian languages may still be handled using 8 bit fonts, i.e., fonts
supporting only up to 256 glyphs.
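
The arithmetic behind these limits can be worked out directly. The sketch below computes the smallest index width, in bits, that can give each symbol its own code, using the symbol counts quoted above.

```python
def bits_needed(symbol_count):
    """Smallest number of bits that can index symbol_count distinct symbols."""
    return max(1, (symbol_count - 1).bit_length())

bits_for_8bit_font = bits_needed(256)
bits_for_japanese = bits_needed(24000)
bits_for_aksharas = (bits_needed(12000), bits_needed(14000))
```

With one code per symbol, the quoted counts need well over 8 bits, so an 8-bit font can cover them only if each akshara is built from a smaller set of glyphs.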
A font consists of a set
of glyphs which are arranged in some order inside the font. It is customary
to view this order in the form of a rectangular grid.
A character in a text string
could be displayed by using its code (often plain ASCII) to index into the array
and select the shape. In practice, however, the code is first related to a
character from a character set, and the character is identified by its name
in the set.
The name is then used to identify
the location of the glyph with the required shape. This approach allows
the designer some freedom in designing the glyphs without having to worry
about the different character sets that the same glyph may be associated
with.
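
The two-step lookup described above (code to character name, then name to glyph) can be sketched as follows. The codes for the two character sets follow ISO 8859-1 and Mac OS Roman, and the glyph names follow the common PostScript naming style; the glyph locations inside the font are hypothetical.

```python
# Character sets: code -> character name.
ISO_8859_1 = {66: "B", 233: "eacute"}
MAC_ROMAN = {66: "B", 142: "eacute"}

# Hypothetical font: character name -> glyph location (positions invented).
FONT_GLYPHS = {"B": 37, "eacute": 112}

def glyph_for_code(code, charset, font=FONT_GLYPHS):
    """Resolve a character code to a glyph location via the character's name."""
    name = charset[code]
    return font[name]

from_iso = glyph_for_code(233, ISO_8859_1)
from_mac = glyph_for_code(142, MAC_ROMAN)
```

Both calls reach the same glyph even though the codes differ, which is the freedom the naming step gives the designer.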