CMUdict
The Carnegie Mellon Pronouncing Dictionary.
The Carnegie Mellon University Pronouncing Dictionary (CMUdict), created by the Speech Group in the School of Computer Science at CMU, is "an open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words".
Usage
var cmudict = require( '@stdlib/datasets/cmudict' );
cmudict( [options] )
Returns datasets from the Carnegie Mellon Pronouncing Dictionary (CMUdict).
var data = cmudict();
/* returns
{
'dict': {...},
'phones': {...},
'symbols': [...],
'vp': {...}
}
*/
The function accepts the following options:
data: dataset name. The following names are recognized:
- dict: the main pronouncing dictionary
- phones: manners of articulation for each sound
- symbols: complete list of ARPABET symbols used by the dictionary
- vp: verbal pronunciations of punctuation marks
To return only the main pronouncing dictionary, set the data option to 'dict'.
var opts = {
'data': 'dict'
};
var data = cmudict( opts );
/* returns
{
'A': 'AH0',
'A(1)': 'EY1',
'A\'S': 'EY1 Z',
// ...
}
*/
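Entries such as 'A(1)' above appear to store alternate pronunciations under a numbered suffix. The following sketch gathers every pronunciation of a word from the returned object. It assumes that suffix convention, and it assumes that numbering starts at 1 with no gaps. The helper name pronunciations is hypothetical and not part of the package API. The sketch runs on a small excerpt rather than the full dataset.

```javascript
// Hypothetical helper (not part of the package): collect every pronunciation
// listed for a word, ASSUMING alternates are keyed as 'WORD(1)', 'WORD(2)', ...
function pronunciations( dict, word ) {
    var hasOwn = Object.prototype.hasOwnProperty;
    var key = word.toUpperCase();
    var out = [];
    var i;

    // Base entry (e.g., 'A'):
    if ( hasOwn.call( dict, key ) ) {
        out.push( dict[ key ] );
    }
    // Numbered alternates (e.g., 'A(1)'), stopping at the first missing index:
    for ( i = 1; hasOwn.call( dict, key+'('+i+')' ); i++ ) {
        out.push( dict[ key+'('+i+')' ] );
    }
    return out;
}

// Excerpt copied from the output shown above:
var dict = {
    'A': 'AH0',
    'A(1)': 'EY1',
    'A\'S': 'EY1 Z'
};

var prons = pronunciations( dict, 'a' );
// returns [ 'AH0', 'EY1' ]
```

Upper-casing the input mirrors the upper-case keys in the excerpt. With the full dataset, pass the object returned by cmudict( { 'data': 'dict' } ) instead of the excerpt.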
To return only sound articulation manners, set the data option to 'phones'.
var opts = {
'data': 'phones'
};
var data = cmudict( opts );
/* returns
{
'AA': 'vowel',
'AE': 'vowel',
'AH': 'vowel',
// ...
}
*/
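Dictionary vowels carry a stress digit (for example, 'EY1'), so the digit must be stripped before a phoneme can be looked up in the phones dataset. The following is a minimal sketch. It assumes the returned object maps bare ARPABET symbols to manner labels, as shown above. The helper articulations is hypothetical. The 'EY' and 'Z' entries in the excerpt are assumed, not copied from the output above.

```javascript
// Hypothetical helper: label each phoneme of a pronunciation string with its
// manner of articulation, ignoring any trailing stress digit on vowels.
function articulations( phones, pron ) {
    var hasOwn = Object.prototype.hasOwnProperty;
    return pron.split( ' ' ).map( function label( p ) {
        // Strip a trailing stress digit (0, 1, or 2), if present:
        var base = p.replace( /[0-2]$/, '' );
        var manner = hasOwn.call( phones, base ) ? phones[ base ] : 'unknown';
        return [ p, manner ];
    });
}

// Assumed excerpt ('AA', 'AE', 'AH' as shown above; 'EY' and 'Z' added for illustration):
var phones = {
    'AA': 'vowel',
    'AE': 'vowel',
    'AH': 'vowel',
    'EY': 'vowel',
    'Z': 'fricative'
};

var labels = articulations( phones, 'EY1 Z' );
// returns [ [ 'EY1', 'vowel' ], [ 'Z', 'fricative' ] ]
```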
To return only ARPABET symbols used by the dictionary, set the data option to 'symbols'.
var opts = {
'data': 'symbols'
};
var data = cmudict( opts );
/* returns
[
'AA',
'AA0',
'AA1',
// ...
]
*/
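The symbols list mixes bare phonemes with their stress-marked variants ('AA', 'AA0', 'AA1', ...). The following minimal sketch collapses those variants into the underlying phoneme inventory. It assumes stress is always a single trailing digit from 0 through 2. The helper basePhonemes and the abbreviated input array are illustrative only.

```javascript
// Hypothetical helper: collapse stress variants (e.g., 'AA0', 'AA1', 'AA2')
// into their base phoneme ('AA'), preserving first-seen order.
function basePhonemes( symbols ) {
    var seen = {};
    var out = [];
    var base;
    var i;
    for ( i = 0; i < symbols.length; i++ ) {
        // Remove a single trailing stress digit, if present:
        base = symbols[ i ].replace( /[0-2]$/, '' );
        if ( !seen[ base ] ) {
            seen[ base ] = true;
            out.push( base );
        }
    }
    return out;
}

// Abbreviated, illustrative input (not the full symbols list):
var symbols = [ 'AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'B' ];

var bases = basePhonemes( symbols );
// returns [ 'AA', 'AE', 'B' ]
```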
To return only the verbal pronunciations of punctuation marks, set the data option to 'vp'.
var opts = {
'data': 'vp'
};
var data = cmudict( opts );
/* returns
{
'!exclamation-point': 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T',
'"close-quote': 'K L OW1 Z K W OW1 T',
'"double-quote': 'D AH1 B AH0 L K W OW1 T',
// ...
}
*/
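Each key in the vp dataset appears to combine the punctuation mark with a spelled-out name ('!exclamation-point'). The following sketch looks up entries by mark under that assumption. It treats the leading run of non-alphanumeric characters in a key as the mark. lookupMark is a hypothetical helper, and the input is the excerpt shown above.

```javascript
// Hypothetical helper: return the named pronunciations for a punctuation mark,
// ASSUMING each key is the mark followed by a spelled-out name.
function lookupMark( vp, mark ) {
    var hasOwn = Object.prototype.hasOwnProperty;
    var out = [];
    var m;
    var k;
    for ( k in vp ) {
        if ( !hasOwn.call( vp, k ) ) {
            continue;
        }
        // Leading non-alphanumeric run = mark; remainder = name:
        m = k.match( /^([^A-Za-z0-9]+)(.+)$/ );
        if ( m && m[ 1 ] === mark ) {
            out.push( { 'name': m[ 2 ], 'pronunciation': vp[ k ] } );
        }
    }
    return out;
}

// Excerpt copied from the output shown above:
var vp = {
    '!exclamation-point': 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T',
    '"close-quote': 'K L OW1 Z K W OW1 T',
    '"double-quote': 'D AH1 B AH0 L K W OW1 T'
};

var quotes = lookupMark( vp, '"' );
// returns [ { 'name': 'close-quote', ... }, { 'name': 'double-quote', ... } ]
```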
Notes
- Vowels carry a lexical stress marker (0: No stress, 1: Primary stress, 2: Secondary stress).
- The phoneme set is based on the ARPABET symbol set developed for speech recognition.
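Because only vowels carry stress markers, a word's stress pattern can be read directly from a pronunciation string. The following minimal sketch keeps one digit per vowel. stressPattern is a hypothetical helper. Counting vowels this way is a common approximation of syllable count, not something the dataset reports itself. The input is the '!exclamation-point' entry from the vp example.

```javascript
// Hypothetical helper: extract the lexical stress pattern of a pronunciation,
// one digit per vowel (0: no stress, 1: primary, 2: secondary).
function stressPattern( pron ) {
    return pron.split( ' ' ).filter( function isVowel( p ) {
        // Only vowel phonemes end in a stress digit:
        return /[0-2]$/.test( p );
    }).map( function stress( p ) {
        return p.charAt( p.length-1 );
    }).join( '' );
}

var pattern = stressPattern( 'EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T' );
// returns '20102'

var nVowels = pattern.length;
// returns 5
```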
Examples
var cmudict = require( '@stdlib/datasets/cmudict' );
var opts = {};
opts.data = 'phones';
console.dir( cmudict( opts ) );
opts.data = 'symbols';
console.dir( cmudict( opts ) );
opts.data = 'dict';
console.dir( cmudict( opts ) );
CLI
Usage
Usage: cmudict [options]
Options:
-h, --help Print this message.
-V, --version Print the package version.
--data name Dataset name: dict, phones, symbols, vp.
Notes
- If the --data option is set to a supported dataset name, the CLI prints the contents of the respective dataset file as plain text. Otherwise, the output format is newline-delimited JSON (NDJSON).
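NDJSON output contains one independent JSON value per line, so a consumer can parse it line by line. The following generic sketch shows such a parser. parseNDJSON is a hypothetical helper, and the sample text is made up; it does not reflect the records the CLI actually writes.

```javascript
// Hypothetical helper: parse newline-delimited JSON (one JSON value per
// non-empty line) into an array of values.
function parseNDJSON( text ) {
    return text.split( /\r?\n/ ).filter( function nonEmpty( line ) {
        return line.trim().length > 0;
    }).map( function parse( line ) {
        return JSON.parse( line );
    });
}

// Made-up sample input, NOT actual CLI output:
var records = parseNDJSON( '{"a":1}\n{"b":2}\n' );
// returns [ { 'a': 1 }, { 'b': 2 } ]
```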
Examples
$ cmudict --data symbols
AA
AA0
AA1
AA2
...
License
The data files (databases) and their contents are licensed under a BSD-2-Clause license. The software is licensed under the Apache License, Version 2.0.