Boston House Prices

A (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).

Usage

var dataset = require( '@stdlib/datasets/harrison-boston-house-prices-corrected' );

dataset()

Returns a (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).

var data = dataset();
/* returns
    [
        {
            'crim': 0.00632,
            'zn': 18.00,
            'indus': 2.310,
            'chas': 0,
            'nox': 0.5380,
            'rm': 6.5750,
            'age': 65.20,
            'dis': 4.0900,
            'rad': 1,
            'tax': 296.0,
            'ptratio': 15.30,
            'b': 396.90,
            'lstat': 4.98,
            'medv': 24.00,
            'cmedv': 24.00
        },
        ...
    ]
*/

Notes

  • The data consists of 15 attributes:

    • crim: per capita crime rate by town
    • zn: proportion of residential land zoned for lots over 25,000 square feet
    • indus: proportion of non-retail business acres per town
    • chas: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
    • nox: nitric oxides concentration (parts per 10 million)
    • rm: average number of rooms per dwelling
    • age: proportion of owner-occupied units built prior to 1940
    • dis: weighted distances to five Boston employment centers
    • rad: index of accessibility to radial highways
    • tax: full-value property-tax rate per $10,000
    • ptratio: pupil-teacher ratio by town
    • b: 1000(Bk-0.63)^2 where Bk is the proportion of blacks by town
    • lstat: percent lower status of the population
    • medv: median value of owner-occupied homes in $1000's
    • cmedv: corrected median value of owner-occupied homes in $1000's
  • The dataset can be used to predict two dependent variables: 1) nitric oxides concentration (nox) and 2) median home value (medv or cmedv).

  • The median home value field appears to be censored at 50.00 (corresponding to a median value of $50,000). Censoring is suggested by the fact that a median value of exactly $50,000 is reported in 16 cases, while only 15 cases have values between $40,000 and $50,000. Values are rounded to the nearest hundred dollars. Harrison and Rubinfeld do not, however, mention any censoring.

  • The dataset contains eight corrections to miscoded median values, as documented by Gilley and Pace (1996).
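
The censoring and correction notes above can be checked directly against the records. The following is a minimal sketch, not part of the package: `summarize` is a hypothetical helper, the sample records use made-up values so that the snippet runs without the package installed, and the bounds used for "between $40,000 and $50,000" (40.0 inclusive, 50.0 exclusive) are an assumption.

```javascript
// Hypothetical sample records (values made up for illustration). With the package
// installed, the full dataset could be loaded instead:
// var data = require( '@stdlib/datasets/harrison-boston-house-prices-corrected' )();
var sample = [
    { 'medv': 24.0, 'cmedv': 24.0 },
    { 'medv': 50.0, 'cmedv': 50.0 },
    { 'medv': 43.8, 'cmedv': 43.8 },
    { 'medv': 22.0, 'cmedv': 22.1 }
];

// Hypothetical helper: tallies capped values and lists the indices of records
// whose original (`medv`) and corrected (`cmedv`) median values differ.
function summarize( records ) {
    var out;
    var r;
    var i;

    out = {
        'atCap': 0,
        'nearCap': 0,
        'corrected': []
    };
    for ( i = 0; i < records.length; i++ ) {
        r = records[ i ];
        if ( r.medv === 50.0 ) {
            // Value sits exactly at the suspected censoring threshold:
            out.atCap += 1;
        } else if ( r.medv >= 40.0 && r.medv < 50.0 ) {
            // Assumed bounds for "between $40,000 and $50,000":
            out.nearCap += 1;
        }
        if ( r.medv !== r.cmedv ) {
            out.corrected.push( i );
        }
    }
    return out;
}

var summary = summarize( sample );
console.log( summary );
```

Passing the full dataset to `summarize` in place of `sample` gives the counts that the notes describe, subject to the boundary assumption stated above.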

Examples

var Plot = require( '@stdlib/plot' );
var dataset = require( '@stdlib/datasets/harrison-boston-house-prices-corrected' );

var data;
var plot;
var opts;
var x;
var y;
var i;

data = dataset();

// Extract housing data...
x = [];
y = [];
for ( i = 0; i < data.length; i++ ) {
    x.push( data[ i ].rm );
    y.push( data[ i ].cmedv );
}

// Create a plot instance:
opts = {
    'lineStyle': 'none',
    'symbols': 'closed-circle',
    'xLabel': 'Average Number of Rooms',
    'yLabel': 'Corrected Median Value',
    'title': 'Number of Rooms vs Median Value'
};
plot = new Plot( [ x ], [ y ], opts );

// Render the plot:
console.log( plot.render( 'html' ) );
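
Since the notes mention predicting median home value, the `x` and `y` arrays extracted above could also be passed to a simple least-squares fit. The following is a rough sketch under stated assumptions: `fitLine` is a hypothetical helper (not a stdlib API), it fits a single-predictor ordinary least-squares line, and the toy arrays are made-up values used only so the snippet runs on its own.

```javascript
// Hypothetical helper: fits y = intercept + slope * x by ordinary least squares.
function fitLine( x, y ) {
    var slope;
    var sxx;
    var sxy;
    var mx;
    var my;
    var n;
    var i;

    n = x.length;
    mx = 0.0;
    my = 0.0;
    for ( i = 0; i < n; i++ ) {
        mx += x[ i ];
        my += y[ i ];
    }
    mx /= n;
    my /= n;

    // Accumulate the centered sums of squares and cross-products:
    sxx = 0.0;
    sxy = 0.0;
    for ( i = 0; i < n; i++ ) {
        sxx += ( x[ i ] - mx ) * ( x[ i ] - mx );
        sxy += ( x[ i ] - mx ) * ( y[ i ] - my );
    }
    slope = sxy / sxx;
    return {
        'slope': slope,
        'intercept': my - ( slope * mx )
    };
}

// Toy values (made up); replace with the `x` (rm) and `y` (cmedv) arrays from the example above:
var fit = fitLine( [ 4, 5, 6, 7 ], [ 10, 12, 14, 16 ] );
console.log( fit.slope, fit.intercept );
```

Because the median values appear to be censored at 50.00, a fit over the full dataset would likely understate the relationship at the upper end.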

CLI

Usage

Usage: harrison-boston-house-prices-corrected [options]

Options:

  -h,    --help                Print this message.
  -V,    --version             Print the package version.
         --format fmt          Output format: 'csv' or 'ndjson'.

Notes

  • The CLI supports two output formats: comma-separated values (CSV) and newline-delimited JSON (NDJSON). The default output format is CSV.

Examples

$ harrison-boston-house-prices-corrected
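
Output produced with `--format ndjson` could be consumed from JavaScript roughly as follows. This is a sketch only: `parseNDJSON` is a hypothetical helper, and it assumes that each non-empty output line is one JSON object with the same field names as the records returned by `dataset()`. The sample text is hand-written for illustration, not captured CLI output.

```javascript
// Hypothetical helper: parses newline-delimited JSON text into an array of objects.
function parseNDJSON( text ) {
    var lines;
    var out;
    var i;

    lines = text.split( '\n' );
    out = [];
    for ( i = 0; i < lines.length; i++ ) {
        // Skip blank lines (e.g., after a trailing newline):
        if ( lines[ i ].trim() !== '' ) {
            out.push( JSON.parse( lines[ i ] ) );
        }
    }
    return out;
}

// Hand-written sample text (the second record is made up):
var text = '{"rm":6.575,"cmedv":24}\n{"rm":6.0,"cmedv":20.5}\n';
var rows = parseNDJSON( text );
console.log( rows.length );
```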

References

  • Harrison, David, and Daniel L. Rubinfeld. 1978. "Hedonic housing prices and the demand for clean air." Journal of Environmental Economics and Management 5 (1): 81–102. doi:10.1016/0095-0696(78)90006-2.
  • Gilley, Otis W., and R. Kelley Pace. 1996. "On the Harrison and Rubinfeld Data." Journal of Environmental Economics and Management 31 (3): 403–5. doi:10.1006/jeem.1996.0052.

License

The data files (databases) are licensed under the Open Data Commons Public Domain Dedication & License 1.0, and their contents are licensed under Creative Commons Zero v1.0 Universal. The software is licensed under the Apache License, Version 2.0.
