CMUdict
The Carnegie Mellon Pronouncing Dictionary.
The Carnegie Mellon University Pronouncing Dictionary (CMUdict), created by the Speech Group in the School of Computer Science at CMU, is "an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words".
Usage
var cmudict = require( '@stdlib/datasets/cmudict' );
cmudict( [options] )
Returns datasets from the Carnegie Mellon Pronouncing Dictionary (CMUdict).
var data = cmudict();
/* returns
    {
        'dict': {...},
        'phones': {...},
        'symbols': [...],
        'vp': {...}
    }
*/
The function accepts the following options:
- data: dataset name. The following names are recognized:
    - dict: the main pronouncing dictionary.
    - phones: manners of articulation for each sound.
    - symbols: complete list of ARPABET symbols used by the dictionary.
    - vp: verbal pronunciations of punctuation marks.
To return only the main pronouncing dictionary, set the data option to dict.
var opts = {
    'data': 'dict'
};
var data = cmudict( opts );
/* returns
    {
        'A': 'AH0',
        'A(1)': 'EY1',
        'A\'S': 'EY1 Z',
        // ...
    }
*/
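Each value is a space-separated string of ARPABET symbols, so splitting on whitespace yields the phoneme sequence. The sample keys also suggest that alternate pronunciations are stored under the headword with a numeric suffix such as (1). The following minimal sketch assumes that convention; getPronunciations is a hypothetical helper, not part of the package, and sample holds only the three entries shown above.

```javascript
// Excerpt of the `dict` dataset, copied from the example output above.
var sample = {
    'A': 'AH0',
    'A(1)': 'EY1',
    'A\'S': 'EY1 Z'
};

/**
* Hypothetical helper (not part of the package): returns every pronunciation
* recorded for a word. Assumes alternates are keyed as WORD(1), WORD(2), ...,
* a convention inferred from the 'A(1)' entry above.
*/
function getPronunciations( dict, word ) {
    var out = [];
    var base = word.toUpperCase();
    var key = base;
    var i = 1;
    while ( Object.prototype.hasOwnProperty.call( dict, key ) ) {
        // Split the space-separated ARPABET string into individual phonemes:
        out.push( dict[ key ].split( ' ' ) );
        key = base + '(' + i + ')';
        i += 1;
    }
    return out;
}

var prons = getPronunciations( sample, 'a' );
// returns [ [ 'AH0' ], [ 'EY1' ] ]
```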
To return only the manner of articulation for each sound, set the data option to phones.
var opts = {
    'data': 'phones'
};
var data = cmudict( opts );
/* returns
    {
        'AA': 'vowel',
        'AE': 'vowel',
        'AH': 'vowel',
        // ...
    }
*/
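Dictionary entries append a stress digit to each vowel (for example, EY1), while the phones keys shown above are bare symbols, so the digit has to be removed before a lookup. The following minimal sketch combines the 'A\'S' entry from the dict example with a phones excerpt; articulationManners is a hypothetical helper, and the EY and Z values are assumed rather than taken from the output above.

```javascript
// Excerpt of the `phones` dataset. AA, AE, and AH appear in the output above;
// the EY and Z entries are assumed values added for illustration.
var phones = {
    'AA': 'vowel',
    'AE': 'vowel',
    'AH': 'vowel',
    'EY': 'vowel',
    'Z': 'fricative'
};

/**
* Hypothetical helper: returns the manner of articulation for each phoneme in a
* pronunciation string. Stress digits are stripped first because `phones` is
* keyed by bare symbols.
*/
function articulationManners( table, pronunciation ) {
    return pronunciation.split( ' ' ).map( function toManner( phoneme ) {
        var base = phoneme.replace( /[012]$/, '' );
        return table[ base ]; // undefined if the symbol is missing from `table`
    });
}

var manners = articulationManners( phones, 'EY1 Z' );
// returns [ 'vowel', 'fricative' ]
```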
To return only ARPABET symbols used by the dictionary, set the data option to symbols.
var opts = {
    'data': 'symbols'
};
var data = cmudict( opts );
/* returns
    [
        'AA',
        'AA0',
        'AA1',
        // ...
    ]
*/
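The symbol list contains both bare phonemes and their stress-marked variants (AA, AA0, AA1, and so on). A minimal sketch of grouping the variants under their base phoneme; groupByBase is a hypothetical helper, and in the excerpt below the AE entries are assumed for illustration.

```javascript
// Excerpt of the `symbols` dataset; the AE entries are assumed for illustration.
var symbols = [ 'AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2' ];

/**
* Hypothetical helper: groups symbols by base phoneme, listing the
* stress-marked variants of each.
*/
function groupByBase( list ) {
    var out = {};
    list.forEach( function add( sym ) {
        var base = sym.replace( /[012]$/, '' );
        if ( !Object.prototype.hasOwnProperty.call( out, base ) ) {
            out[ base ] = [];
        }
        if ( sym !== base ) {
            // Only stress-marked variants are recorded under their base:
            out[ base ].push( sym );
        }
    });
    return out;
}

var groups = groupByBase( symbols );
// returns { 'AA': [ 'AA0', 'AA1', 'AA2' ], 'AE': [ 'AE0', 'AE1', 'AE2' ] }
```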
To return only the verbal pronunciations of punctuation marks, set the data option to vp.
var opts = {
    'data': 'vp'
};
var data = cmudict( opts );
/* returns
    {
        '!exclamation-point': 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T',
        '"close-quote': 'K L OW1 Z K W OW1 T',
        '"double-quote': 'D AH1 B AH0 L K W OW1 T',
        // ...
    }
*/
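Each key shown above begins with the punctuation mark itself, followed by its spelled-out name. Assuming every key follows that pattern, a regular expression can separate the mark from the name, which also exposes marks with more than one reading (such as the double quote). namesByMark is a hypothetical helper, and vp holds only the three entries shown above.

```javascript
// The three entries shown in the example output above.
var vp = {
    '!exclamation-point': 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T',
    '"close-quote': 'K L OW1 Z K W OW1 T',
    '"double-quote': 'D AH1 B AH0 L K W OW1 T'
};

/**
* Hypothetical helper: maps each leading punctuation mark to its spelled-out
* names, assuming keys have the form <punctuation><name>.
*/
function namesByMark( data ) {
    var out = {};
    Object.keys( data ).forEach( function add( key ) {
        var match;
        var mark;
        match = key.match( /^([^A-Za-z]+)(.+)$/ );
        if ( match === null ) {
            return; // key does not follow the assumed format
        }
        mark = match[ 1 ];
        if ( !Object.prototype.hasOwnProperty.call( out, mark ) ) {
            out[ mark ] = [];
        }
        out[ mark ].push( match[ 2 ] );
    });
    return out;
}

var names = namesByMark( vp );
// returns { '!': [ 'exclamation-point' ], '"': [ 'close-quote', 'double-quote' ] }
```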
Notes
- Vowels carry a trailing lexical stress marker (0: no stress, 1: primary stress, 2: secondary stress).
- The phoneme set is based on the ARPAbet symbol set developed for speech recognition.
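Because only vowels carry a stress digit, a word's stress pattern can be read off its pronunciation by keeping those trailing digits in order. A minimal sketch; stressPattern is a hypothetical helper, and the inputs are pronunciations taken from the vp example above.

```javascript
/**
* Hypothetical helper: returns the lexical stress pattern of an ARPABET
* pronunciation, one digit per vowel (0: no stress, 1: primary, 2: secondary).
*/
function stressPattern( pronunciation ) {
    return pronunciation.split( ' ' ).filter( function isVowel( phoneme ) {
        // Only vowels end in a stress digit:
        return /[012]$/.test( phoneme );
    }).map( function toDigit( phoneme ) {
        return phoneme.slice( -1 );
    }).join( '' );
}

var doubleQuote = stressPattern( 'D AH1 B AH0 L K W OW1 T' );
// returns '101'

var exclamation = stressPattern( 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T' );
// returns '20102'
```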
Examples
var cmudict = require( '@stdlib/datasets/cmudict' );
var opts = {};
opts.data = 'phones';
console.dir( cmudict( opts ) );
opts.data = 'symbols';
console.dir( cmudict( opts ) );
opts.data = 'dict';
console.dir( cmudict( opts ) );
CLI
Usage
Usage: cmudict [options]
Options:
  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --data name           Dataset name: dict, phones, symbols, vp.
Notes
- If the --data option is set to a supported dataset name, the CLI prints the contents of the respective dataset file as plain text. Otherwise, the output format is newline-delimited JSON (NDJSON).
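NDJSON output can be consumed in Node.js by parsing each non-empty line as a separate JSON value. A minimal sketch; parseNDJSON is a hypothetical helper, and the sample string is made up for illustration rather than captured from the cmudict CLI.

```javascript
/**
* Hypothetical helper: parses newline-delimited JSON (one JSON value per
* line), skipping blank lines and tolerating Windows line endings.
*/
function parseNDJSON( text ) {
    return text.split( /\r?\n/ ).filter( function nonEmpty( line ) {
        return line.trim() !== '';
    }).map( function parse( line ) {
        return JSON.parse( line );
    });
}

// Hypothetical input; not actual output from the CLI.
var text = '{"a":1}\n["b","c"]\n';
var values = parseNDJSON( text );
// returns [ { 'a': 1 }, [ 'b', 'c' ] ]
```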
Examples
$ cmudict --data symbols
AA
AA0
AA1
AA2
...
License
The data files (databases) and their contents are licensed under a BSD-2-Clause license. The software is licensed under the Apache License, Version 2.0.