**************
Introduction
**************

This directory contains data based on the University of Wisconsin X-ray Microbeam Database (referred to here as XRMB).

The original XRMB manual can be found here:  http://www.haskins.yale.edu/staff/gafos_downloads/ubdbman.pdf

We acknowledge John Westbury for providing the original data and for permitting this post-processed version to be redistributed.  The original data collection was supported (in part) by research grant number R01 DC 00820 from the National Institute of Deafness and Other Communicative Disorders, U.S. National Institutes of Health.

The post-processed data provided here was produced as part of work supported in part by NSF grant IIS-1321015.

Some of the original XRMB articulatory data was missing due to issues such as pellet tracking errors.  The data has been reconstructed in using the algorithm described in this paper:  

Wang, Arora, and Livescu, "Reconstruction of articulatory measurements with smoothed low-rank matrix completion," SLT 2014.
http://ttic.edu/livescu/papers/wang_SLT2014.pdf

The data provided here has been used for multi-view acoustic feature learning in this paper:

Wang, Arora, Livescu, and Bilmes, "Unsupervised learning of acoustic features via deep canonical correlation analysis," ICASSP 2015.
http://ttic.edu/livescu/papers/wang_ICASSP2015.pdf

If you use this version of the data, please cite the papers above.


**************
File description
**************

http://ttic.edu/livescu/XRMB_data/full/XRMBf2KALDI_window7_single1.mat
http://ttic.edu/livescu/XRMB_data/full/XRMBf2KALDI_window7_single2.mat

These two files contain the multi-view data, (X1,X2) for training, (XV1, XV2) for tuning, (XTe1, XTe2) for testing.  View 1 (X1, XV1, XTe1) is the acoustic view (found in the first file), whereas view 2 is the articulation view (found in the second file).  The matrices are of dimension # samples x # features.  The features for both views are concatenated over a 7-frame window.  trainID, tuneID, testID are the corresponding speaker labels (found in the 2nd file).  Phone labels 0-38 are also provided (<train,tune,test>Label), but were not used in the feature learning in the paper above.  Silence frames have been excluded from multi-view training and are not included in this file.

http://ttic.edu/livescu/XRMB_data/full/XRMBf2KALDI_recog.mat

Contains the data used for phonetic recognition experiments.  Here we only include the acoustic view.  NEWMFCC is the acoustic data, and NEWLABEL contains the phone labels.  Silence frames are included.

Note: XV1 and XTe1 in the first file overlap with (are subsampled from) NEWMFCC in the second file in a complicated way.

The phone labels 0-40 correspond to
'"AA"'    '"AE"'    '"AH"'    '"AO"'    '"AW"'    '"AY"'  '"B"'    '"CH"'    '"D"'    '"DH"'    '"EH"'    '"ER"'    '"EY"'    '"F"'    '"G"'    '"HH"'    '"IH"'    '"IY"'    '"JH"'    '"K"'    '"L"'    '"M"'    '"N"'    '"NG"'    '"OW"'    '"OY"'    '"P"'    '"R"'    '"S"'    '"SH"'  '"T"'    '"TH"'    '"UH"'    '"UW"'    '"V"'    '"W"'   '"Y"'    '"Z"'    '"ZH"'    '"cg"'    '"sp"'