Spam Assassin
The Spam Assassin public mail corpus.
Usage
var corpus = require( '@stdlib/datasets/spam-assassin' );
corpus()
Returns the Spam Assassin public mail corpus.
var data = corpus();
// returns [{...},{...},...]
Each array element has the following fields:
- id: message id (relative to message group)
- group: message group
- checksum: object containing checksum info
- text: message text (including headers)
The message group may be one of the following:
- easy-ham-1: easier to detect non-spam e-mails (2500 messages)
- easy-ham-2: easier to detect non-spam e-mails collected at a later date (1400 messages)
- hard-ham-1: harder to detect non-spam e-mails (250 messages)
- spam-1: spam e-mails (500 messages)
- spam-2: spam e-mails collected at a later date (1396 messages)
The checksum object contains the following fields:
- type: checksum type (e.g., MD5)
- value: checksum value
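As a rough illustration of working with the fields above, the following sketch tallies messages per group and splits them into spam and non-spam. The helper name tallyGroups and the sample records are hypothetical and not part of the package. The sketch assumes that group names beginning with "spam" identify spam messages, which matches the group list above.

```javascript
// Hypothetical helper (not part of the package): tally messages per group
// and count spam versus non-spam ("ham") messages.
function tallyGroups( data ) {
    var counts = {};
    var spam = 0;
    var ham = 0;
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;

        // Assumption: group names starting with "spam" mark spam messages.
        if ( g.indexOf( 'spam' ) === 0 ) {
            spam += 1;
        } else {
            ham += 1;
        }
    }
    return {
        'counts': counts,
        'spam': spam,
        'ham': ham
    };
}

// Made-up records in the documented shape; in practice, pass the array
// returned by `corpus()` instead.
var sample = [
    { 'id': 'a', 'group': 'easy-ham-1', 'checksum': { 'type': 'MD5', 'value': 'x' }, 'text': 'Subject: hi\n\nHello.' },
    { 'id': 'b', 'group': 'easy-ham-1', 'checksum': { 'type': 'MD5', 'value': 'y' }, 'text': 'Subject: re\n\nThanks.' },
    { 'id': 'a', 'group': 'spam-2', 'checksum': { 'type': 'MD5', 'value': 'z' }, 'text': 'Subject: WIN\n\nClick.' }
];

var out = tallyGroups( sample );
console.log( out.counts );
console.log( 'spam: %d, ham: %d', out.spam, out.ham );
```

Because message ids are relative to their group, a record is only uniquely identified by the combination of group and id.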
Examples
var corpus = require( '@stdlib/datasets/spam-assassin' );
var data;
var i;
data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}
CLI
Usage
Usage: spam-assassin [options]
Options:
  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --format fmt          Output format: 'txt' or 'ndjson'.
Notes
- The CLI supports two output formats: plain text (txt) and newline-delimited JSON (NDJSON). The default output format is txt.
Examples
$ spam-assassin
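Output produced with --format ndjson can be read back in JavaScript one line at a time. The sketch below is a minimal illustration: parseNDJSON is a hypothetical helper, the input string is made up, and it assumes that each output line is a single JSON object carrying the fields described under Usage.

```javascript
// Hypothetical helper (not part of the package): parse newline-delimited
// JSON into an array of objects.
function parseNDJSON( str ) {
    var lines = str.split( /\r?\n/ );
    var out = [];
    var i;
    for ( i = 0; i < lines.length; i++ ) {
        // Skip blank lines, such as the one left by a trailing newline.
        if ( lines[ i ].trim() === '' ) {
            continue;
        }
        out.push( JSON.parse( lines[ i ] ) );
    }
    return out;
}

// Made-up input standing in for captured CLI output.
var raw = '{"id":"a","group":"spam-1","text":"x"}\n{"id":"b","group":"easy-ham-1","text":"yz"}\n';

var records = parseNDJSON( raw );
console.log( 'Parsed %d records', records.length );
console.log( 'First group: %s', records[ 0 ].group );
```

In practice, the raw string would come from a file or stream holding the CLI output rather than from a literal.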
License
The data files (databases) are licensed under an Open Data Commons Public Domain Dedication & License 1.0 and their contents are licensed under Creative Commons Zero v1.0 Universal. The software is licensed under Apache License, Version 2.0.