The earliest version of MaryTTS was developed around 2000 by Marc Schröder as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. For many years, it evolved and matured, first as an in-house Text-to-Speech (TTS) component, and subsequently as a fully open-source TTS platform with a growing community. Since its origins, MaryTTS has changed significantly, but it remains true to its original goal: a Modular Architecture for Research in sYnthesis written in Java.
The history of MaryTTS can be broken down roughly into several phases:
Marc and the other early developers have all moved on to other institutions and projects, so little is known about the earliest days of MaryTTS. DFKI wanted to develop its own in-house solution for TTS, and the earliest incarnation of what was to become MaryTTS consisted of a collection of shell scripts piping data to each other. When Marc joined DFKI, he realized that a sustainable and extensible TTS platform would require redesigning the system and implementing it in an object-oriented way. The decision was made in favor of Java (over C++), since platform portability was an important criterion.
Originally, as described in the first reference publication, MARY used MBROLA as a back-end for diphone synthesis of German. Support for English was added by integrating code from FreeTTS, a Java port of Flite, which was in turn a lightweight runtime implementation of the venerable Festival. In a 2005 software project, students from Saarland University used MARY to create a TTS system for Tibetan.
When MARY development moved the source code from CVS to Subversion, the project was already using Apache Ant to manage its build. Around the same time, the DFKI-internal MARY code was forked into an open-source project, OpenMARY, and managed using a Trac instance at http://mary.opendfki.de/.
Development on MARY was boosted by funding from the German Research Foundation (DFG) for a three-year project, “PAVOQUE”. This provided the resources needed to implement an open-source unit-selection engine, which offers much more natural-sounding output than diphone synthesis, as well as tools to create new TTS voices and even add support for new languages.
The internal DFKI MARY was effectively discontinued as development efforts focused on OpenMARY. Additional funding was granted by the EU project SEMAINE, which integrated MARY into a multimodal spoken dialog system with expressive virtual characters. Together with the final year of the PAVOQUE project and collaboration in another EU project (SSPNET), this made for a phase of extensive development on MaryTTS. In 2011, development was migrated to GitHub, and Apache Maven replaced Ant as the build management tool.
A number of departures led to the closure of the DFKI Speech Group headed by Marc Schröder, and the new Multimodal Speech Processing group, led by Ingmar Steiner, took over the MaryTTS project. Development on MaryTTS is currently funded primarily through research grants from the DFG, in particular the Cluster of Excellence Multimodal Computing and Interaction (MMCI) and the Collaborative Research Center SFB-1102.