'KALDI'

History of the Kaldi project

Kaldi began its existence at the 2009 Johns Hopkins University workshop, cumbersomely titled "Low Development Cost, High Quality Speech Recognition for New Languages and Domains" (see Acknowledgements).

The focus of that project was Subspace Gaussian Mixture Model (SGMM) based modeling and some investigations into lexicon learning. The software which is now Kaldi began to be developed there, but the recipe we developed at that time was still dependent on HTK. A list of participants in that workshop, official and unofficial, is (alphabetically by last name):

Mohit Agarwal, Pinar Akyazi, Lukas Burget, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel, Martin Karafiat, Feng Kai, Daniel Povey, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas.

Some of the participants of that workshop agreed to meet again in the summer of 2010 in Brno, Czech Republic (hosted by the Brno University of Technology). The aim of that workshop was to create a recipe based on the work done in 2009 that was clean and releasable, and to create a general-purpose speech toolkit as a byproduct. The problem we were trying to solve was that our previous recipe was based on disparate scripts involving both HTK and our own early "Kaldi" code, and was not easy to encapsulate. We also felt that a well-engineered, modern, general-purpose speech toolkit with an open license would be an asset to the speech-recognition community. During August of 2010 the following group of people met in Brno to work on this (again alphabetically):

Pinar Akyazi, Lukas Burget, Gilles Boullianne, Ondrej Glembek, Arnab Ghoshal, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Daniel Povey, Yanmin Qian, Petr Schwarz, Jan Silowsky, Georg Stemmer, and Karel Vesely.

We also had some remote help around this time and shortly afterward from Sandeep Boda, Sandeep Reddy and Haihua Xu, who helped with coding, code cleanup and documentation. We were visited by Michael Riley, who helped us to understand OpenFst and gave some lectures on FSTs. We would also like to acknowledge the help of Honza Cernocky (for allowing us to have the workshop and helping to organize it), Renata Kohlova (administration), and Tomas Kasparek (system administration). It is possible that this list of contributors contains oversights; any important omissions are unlikely to be intentional.

A lot of code was written during the summer of 2010, but we still did not have a complete working system. Some of the participants of the 2010 workshop continued working to complete the toolkit and get a working set of training scripts. The code was released on May 14th, 2011.

Acknowledgements

The JHU 2009 workshop was supported by National Science Foundation Grant Number IIS-0833652, with supplemental funding from Google Research, DARPA's GALE program and the Johns Hopkins University Human Language Technology Center of Excellence. BUT researchers were partially supported during this time by Czech Ministry of Trade and Commerce project no. FR-TI1/034, Grant Agency of Czech Republic project no. 102/08/0707, and Czech Ministry of Education project no. MSM0021630528. Arnab Ghoshal was partially supported by the European Community's Seventh Framework Programme under grant agreement number 213850 (SCALE).

The work of BUT researchers on Kaldi is currently supported by the Technology Agency of the Czech Republic under project No. TA01011328.

We would like to acknowledge the support of Geoffrey Zweig and Alex Acero at Microsoft Research, as well as the generosity of Henrique (Rico) Malvar in allowing the use of his FFT code. Thanks are also due to Patrick Nguyen for his help in organizing the JHU'09 workshop and with the Wall Street Journal recipe. We would also like to acknowledge the help of faculty and staff at Johns Hopkins University's Center for Language and Speech Processing during the JHU'09 workshop: particularly Desiree Cleves, Sanjeev Khudanpur and the late Fred Jelinek.
