The problem of text-to-speech output has received considerable
attention in recent years, and several good solutions are available in
hardware as well as in software. The phonetic nature of Indian
languages allows phoneme-based speech synthesis to achieve sufficient
clarity. A simple concatenation of phoneme sounds is surprisingly effective,
though it lacks intonation.
The IITM software uses a syllable-level representation of the text, and each
syllable translates directly into a sound that can be synthesized or simply
played from a prerecorded piece of audio.
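As a rough illustration of playing syllables from prerecorded audio, the following Python sketch simply concatenates clips. It assumes a hypothetical in-memory bank that maps each syllable to a list of 16-bit PCM samples; `concatenate_syllables` and `write_wav` are illustrative names, not part of the IITM software, and like any plain concatenation the sketch adds no smoothing or intonation.

```python
import struct
import wave
from typing import Dict, List

# Hypothetical bank: syllable -> prerecorded clip as 16-bit PCM sample values.
SyllableBank = Dict[str, List[int]]


def concatenate_syllables(syllables: List[str], bank: SyllableBank,
                          gap_samples: int = 0) -> List[int]:
    """Join the prerecorded clips for `syllables` in order.

    `gap_samples` zero-valued samples (silence) are placed between clips.
    A syllable with no recording raises KeyError.
    """
    output: List[int] = []
    for index, syllable in enumerate(syllables):
        if index > 0 and gap_samples > 0:
            output.extend([0] * gap_samples)
        output.extend(bank[syllable])
    return output


def write_wav(path: str, samples: List[int], rate: int = 16000) -> None:
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16 bits per sample
        out.setframerate(rate)
        out.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

For instance, `write_wav("word.wav", concatenate_syllables(["va", "nak", "kam"], bank))` would produce a playable file, given a bank that holds recordings for those three (hypothetical) syllables.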
To begin with, the IITM development team has experimented with the MBROLA
speech synthesizer and has found that this phoneme- and diphone-based
approach to speech synthesis yields clear speech output.
The MBROLA system produces synthesized speech from a representation of text
known as the .pho format. A text-to-speech system uses some algorithm to
convert the text into a .pho file, which is then fed to the synthesizer.
The .pho file describes the sound output symbolically (phonemes, their
durations and pitch targets) rather than storing the audio itself, so it is
remarkably compact and serves as a very efficient compression mechanism as well!
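To make the format concrete: a .pho file is plain text with one phoneme per line, giving the phoneme symbol, its duration in milliseconds and optional pairs of (position in percent, pitch in Hz), with `_` denoting silence. The Python sketch below writes such lines and compares their size with the raw audio they describe. The phoneme symbols, durations and pitch values are made up for illustration, and 16 kHz, 16-bit mono output is an assumption about the voice database.

```python
from typing import List, Sequence, Tuple

# One .pho entry: (phoneme symbol, duration in ms, [(position %, pitch Hz), ...]).
PhoEntry = Tuple[str, int, Sequence[Tuple[int, int]]]


def to_pho(entries: List[PhoEntry]) -> str:
    """Render entries as MBROLA .pho text, one phoneme per line."""
    lines = []
    for phoneme, duration_ms, pitch_points in entries:
        fields = [phoneme, str(duration_ms)]
        for position, pitch_hz in pitch_points:
            fields += [str(position), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def raw_audio_bytes(entries: List[PhoEntry], sample_rate: int = 16000,
                    bytes_per_sample: int = 2) -> int:
    """Size of the uncompressed mono PCM audio the entries describe."""
    total_ms = sum(duration_ms for _, duration_ms, _ in entries)
    return total_ms * sample_rate * bytes_per_sample // 1000


# Illustrative entries: silence, "n", "a" with a falling pitch, silence.
example: List[PhoEntry] = [
    ("_", 100, []),
    ("n", 60, [(50, 120)]),
    ("a", 120, [(0, 130), (100, 110)]),
    ("_", 100, []),
]
pho_text = to_pho(example)
print(len(pho_text.encode("ascii")), "bytes of .pho text")   # 44
print(raw_audio_bytes(example), "bytes of raw audio")         # 12160
```

Under these assumptions the 0.38 s fragment takes 44 bytes as .pho text against 12,160 bytes of raw audio, a ratio of roughly 276 to 1.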
The IIT Madras software has a utility that converts the local language
representation (.llf) to a .pho file by simple table lookup, so speech output
can be generated on the fly, quickly and efficiently. MBROLA supports multilingual
speech output by allowing programs to choose the diphone database for each
language dynamically. Unfortunately, the databases required for Indian languages
are not yet available for MBROLA, so the IITM team has experimented with other
available databases whose phonemes are close to those of Indian languages. We
have found that the Swedish phonemes are well suited to producing speech output
in Indian languages, whereas English and American voices cannot quite reproduce
the pronunciation required for the Indian aksharas.
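The table-lookup step can be sketched as follows. The internal layout of the .llf format is not described here, so this Python sketch assumes the text has already been split into romanized syllables; the `SYLLABLE_TABLE` entries (phoneme symbols and durations) are hypothetical placeholders, not the actual IITM mappings onto the Swedish voice.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: romanized syllable -> [(phoneme, duration ms), ...].
# A real table would cover every akshara and use the voice's own phoneme symbols.
SYLLABLE_TABLE: Dict[str, List[Tuple[str, int]]] = {
    "va": [("v", 60), ("a", 100)],
    "nak": [("n", 60), ("a", 100), ("k", 70)],
    "kam": [("k", 70), ("a", 100), ("m", 80)],
}


def syllables_to_pho(syllables: List[str],
                     table: Dict[str, List[Tuple[str, int]]] = SYLLABLE_TABLE,
                     pause_ms: int = 100) -> str:
    """Convert a syllable sequence to .pho text by direct table lookup.

    The utterance is wrapped in `pause_ms` of silence ("_") at each end.
    """
    lines = [f"_ {pause_ms}"]
    for syllable in syllables:
        try:
            phones = table[syllable]
        except KeyError:
            raise ValueError(f"no table entry for syllable {syllable!r}") from None
        lines.extend(f"{phoneme} {duration}" for phoneme, duration in phones)
    lines.append(f"_ {pause_ms}")
    return "\n".join(lines) + "\n"
```

For the Tamil greeting split as `["va", "nak", "kam"]`, the function emits one .pho line per phoneme between two 100 ms pauses. Each syllable costs a single dictionary access, so the conversion time grows linearly with the length of the text, which suits on-the-fly output.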
You can get an idea of the quality of the synthesized speech output by listening
to the audio clip of the passage shown below. The text is in three languages:
Tamil, Telugu and Hindi. The audio is in RealAudio format.
The synthesized speech output corresponding to the above passage is available
in RealAudio format and in MP3 format.
You may also download the .pho file corresponding to this output and play it
on your own system if you have MBROLA installed. Since a .pho file is only a
compact description of the required sound output, you will need the Swedish
voice database (SW1) to play the downloaded file.
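To play the downloaded file, MBROLA is run with the voice database, the input .pho file and an output audio file as its arguments. The Python sketch below wraps that call; the database path `/usr/share/mbrola/sw1/sw1` is an assumption that depends on where SW1 is installed, and `mbrola_command` and `synthesize` are illustrative helper names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

# Assumed install location of the Swedish SW1 voice; adjust for your system.
SW1_DATABASE = Path("/usr/share/mbrola/sw1/sw1")


def mbrola_command(database: Path, pho_file: Path, wav_file: Path) -> List[str]:
    """Build the argument list: mbrola <database> <input.pho> <output.wav>."""
    return ["mbrola", str(database), str(pho_file), str(wav_file)]


def synthesize(pho_file: Union[str, Path], wav_file: Union[str, Path],
               database: Path = SW1_DATABASE) -> None:
    """Run MBROLA on a .pho file, raising if the tool is missing or fails."""
    if shutil.which("mbrola") is None:
        raise FileNotFoundError("mbrola executable not found on PATH")
    subprocess.run(mbrola_command(database, Path(pho_file), Path(wav_file)),
                   check=True)
```

For example, `synthesize("passage.pho", "passage.wav")` writes a WAV file that any audio player can then play.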
The development team at IIT Madras has made available several useful
applications in Indian languages that provide speech output as part of the
user interface. They also include standard English applications that run in
text mode under DOS as well as Linux. Many of the English applications have
also been enabled to run under Windows 9x/2000/XP, with screen-reading functions
supported by JAWS for DOS, which has been adapted to work on a PC without the
need for an external synthesizer.
This approach allows virtually any text-based Internet application (typically
one running under Unix) to work under Windows as well, using the Unix-to-Windows
porting tools provided by Cygwin. IIT Madras has developed the interface that
lets JAWS for DOS work with a sound card on Win9x/2000/XP systems, and this
opens up several possibilities for the visually handicapped to learn to use
computers. Many of the applications support Indian-language user interfaces
and also offer text-to-speech capabilities.
Given below is a list of useful applications for the visually handicapped,
developed at the Systems Development Laboratory, IIT Madras. The English-based
applications among these were already available under Unix; they have been
ported to the Windows environment and enhanced to work with the sound hardware
on a PC.
1. The Multilingual Editor with speech enhancements.
2. A sound-enhanced Web browser based on Lynx.
3. PC Pine working with JAWS for DOS.
4. A utility to speak out text from a file prepared using the IITM software.
5. JAWS for DOS without an external synthesizer.
All these applications are distributed free of charge for the benefit of the
visually handicapped in the Southeast Asian region. The applications themselves
are discussed on separate pages of this website.
It must be noted that MBROLA is a superb piece of software that provides
high-quality voice output. As of now, we have not generated a special Indian
language database for use with MBROLA but have used one of the existing voice
databases (Swedish). With suitable Indian language databases for MBROLA, the
present speech output could be improved to sound more like the way Indians
speak, and its quality could come very close to natural speech. This work is
being taken up and will be completed in due course. IIT Madras would be very
happy to interact with other teams working on speech synthesis using MBROLA.