incrBinaryClassification
Incrementally perform binary classification using stochastic gradient descent (SGD).
Usage
var incrBinaryClassification = require( '@stdlib/ml/incr/binary-classification' );
incrBinaryClassification( N[, options] )
Returns an accumulator function which incrementally performs binary classification using stochastic gradient descent.
// Create an accumulator for performing binary classification on 3-dimensional data:
var accumulator = incrBinaryClassification( 3 );
The function accepts the following options:
- intercept: boolean indicating whether to include an intercept. If true, an element equal to one is implicitly added to each provided feature vector (note, however, that the model does not perform regularization of the intercept term). If false, the model assumes that feature vectors are already centered. Default: true.
- lambda: regularization parameter. The regularization parameter determines the amount of shrinkage inflicted on the model coefficients. Higher values reduce the variance of the model coefficient estimates at the expense of introducing bias. Default: 1.0e-4.
- learningRate: an array-like object containing the learning rate function and associated parameters. The learning rate function determines how quickly the model coefficients are updated toward the optimal coefficients. Must be one of the following:
  - ['constant', ...]: constant learning rate function. To set the learning rate, provide a second array element. By default, when the learning rate function is 'constant', the learning rate is set to 0.02.
  - ['basic']: basic learning rate function according to the formula 10/(10+t), where t is the current iteration.
  - ['invscaling', ...]: inverse scaling learning rate function according to the formula eta0/pow(t, power_t), where eta0 is the initial learning rate and power_t is the exponent controlling how quickly the learning rate decreases. To set the initial learning rate, provide a second array element. By default, the initial learning rate is 0.02. To set the exponent, provide a third array element. By default, the exponent is 0.5.
  - ['pegasos']: Pegasos learning rate function according to the formula 1/(lambda*t), where t is the current iteration and lambda is the regularization parameter.
  Default: ['basic'].
- loss: loss function. Must be one of the following:
  - hinge: hinge loss function. Corresponds to a soft-margin linear Support Vector Machine (SVM), which can handle non-linearly separable data.
  - log: logistic loss function. Corresponds to Logistic Regression.
  - modifiedHuber: Huber loss function variant for classification.
  - perceptron: hinge loss function without a margin. Corresponds to the original perceptron by Rosenblatt (1957).
  - squaredHinge: squared hinge loss function SVM (L2-SVM).
  Default: 'log'.
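The learning rate formulas listed above can be sketched in plain JavaScript. This is only an illustration of the documented formulas, not the package's implementation: the helper name `learningRate` and its argument order are hypothetical, and the iteration counter `t` is assumed to start at 1.

```javascript
// Hypothetical helper (not part of the package API): evaluates a learning rate
// specification of the form accepted by the `learningRate` option.
// Assumes the iteration counter `t` starts at 1.
function learningRate( spec, t, lambda ) {
    var power;
    var eta0;
    switch ( spec[ 0 ] ) {
    case 'constant':
        // The second element sets the rate (documented default: 0.02):
        return ( spec.length > 1 ) ? spec[ 1 ] : 0.02;
    case 'basic':
        return 10.0 / ( 10.0+t );
    case 'invscaling':
        // The second element is eta0, the third is power_t:
        eta0 = ( spec.length > 1 ) ? spec[ 1 ] : 0.02;
        power = ( spec.length > 2 ) ? spec[ 2 ] : 0.5;
        return eta0 / Math.pow( t, power );
    case 'pegasos':
        return 1.0 / ( lambda*t );
    default:
        throw new Error( 'unrecognized learning rate function: ' + spec[ 0 ] );
    }
}

console.log( learningRate( [ 'basic' ], 10, 1.0e-4 ) );
// => 0.5

console.log( learningRate( [ 'invscaling', 0.02, 0.5 ], 4, 1.0e-4 ) );
// => 0.01
```

Note that the Pegasos rate depends on lambda, so a small regularization parameter yields very large steps during the first iterations.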
By default, the model contains an intercept term. To omit the intercept, set the intercept option to false:
var array = require( '@stdlib/ndarray/array' );
// Create a model with the intercept term:
var acc = incrBinaryClassification( 2, {
'intercept': true
});
var coefs = acc( array( [ 1.4, 0.5 ] ), 1 );
// returns <ndarray>
var dim = coefs.length;
// returns 3
// Create a model without the intercept term:
acc = incrBinaryClassification( 2, {
'intercept': false
});
coefs = acc( array( [ 1.4, 0.5 ] ), -1 );
// returns <ndarray>
dim = coefs.length;
// returns 2
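For reference, the loss functions selectable via the loss option can be written as functions of the margin z = y*lp, where y is the response (+1 or -1) and lp is the linear predictor. The sketch below uses the standard textbook definitions; the object name `LOSSES` is hypothetical, and the package's internal formulation may differ in detail.

```javascript
// Hypothetical lookup table (not part of the package API). Each function maps
// the margin z = y*lp to a loss value using the standard textbook definition.
var LOSSES = {
    'hinge': function hinge( z ) {
        return Math.max( 0.0, 1.0-z );
    },
    'log': function log( z ) {
        return Math.log( 1.0 + Math.exp( -z ) );
    },
    'modifiedHuber': function modifiedHuber( z ) {
        var h;
        if ( z >= -1.0 ) {
            // Quadratic near the margin:
            h = Math.max( 0.0, 1.0-z );
            return h * h;
        }
        // Linear for badly misclassified observations:
        return -4.0 * z;
    },
    'perceptron': function perceptron( z ) {
        // Hinge loss without a margin:
        return Math.max( 0.0, -z );
    },
    'squaredHinge': function squaredHinge( z ) {
        var h = Math.max( 0.0, 1.0-z );
        return h * h;
    }
};

console.log( LOSSES.hinge( 0.5 ) );
// => 0.5

console.log( LOSSES.perceptron( 0.5 ) );
// => 0
```

A correctly classified observation with margin 0.5 still incurs hinge loss but no perceptron loss, which is the practical difference between the two.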
accumulator( x, y )
If provided a feature vector x and response value y (either +1 or -1), the accumulator function updates a binary classification model; otherwise, the accumulator function returns the current binary classification model coefficients.
var array = require( '@stdlib/ndarray/array' );
// Create an accumulator:
var acc = incrBinaryClassification( 2 );
// Provide data to the accumulator...
var x = array( [ 1.0, 0.0 ] );
var coefs = acc( x, -1 );
// returns <ndarray>
x.set( 0, 0.0 );
x.set( 1, 1.0 );
coefs = acc( x, 1 );
// returns <ndarray>
x.set( 0, 0.5 );
x.set( 1, 1.0 );
coefs = acc( x, 1 );
// returns <ndarray>
coefs = acc();
// returns <ndarray>
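To give a sense of what a single update might involve, here is a minimal sketch of one SGD step for the logistic loss with L2 regularization, written with plain arrays. It is an assumption-laden illustration, not the package's implementation: the function name `sgdStep` is hypothetical, the learning rate `eta` is passed in directly, and the intercept is stored as the last coefficient and left unregularized.

```javascript
// Hypothetical sketch of one SGD update (logistic loss, L2 penalty).
// `w` holds N feature coefficients followed by the intercept.
function sgdStep( w, x, y, eta, lambda ) {
    var lp;
    var N;
    var g;
    var i;

    N = x.length;

    // Linear predictor: dot product plus intercept...
    lp = w[ N ];
    for ( i = 0; i < N; i++ ) {
        lp += w[ i ] * x[ i ];
    }
    // Negative gradient of log(1+exp(-y*lp)) with respect to lp:
    g = y / ( 1.0 + Math.exp( y*lp ) );

    // Shrink the feature coefficients and step along the negative gradient:
    for ( i = 0; i < N; i++ ) {
        w[ i ] = ( ( 1.0 - ( eta*lambda ) ) * w[ i ] ) + ( eta*g*x[ i ] );
    }
    // The intercept is updated without regularization:
    w[ N ] += eta * g;
    return w;
}

var w = sgdStep( [ 0.0, 0.0, 0.0 ], [ 1.0, 0.0 ], -1, 0.5, 0.0 );
console.log( w[ 0 ], w[ 2 ] );
// => -0.25 -0.25
```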
accumulator.predict( X[, type] )
Computes predicted response values for one or more observation vectors X.
var array = require( '@stdlib/ndarray/array' );
// Create a model with the intercept term:
var acc = incrBinaryClassification( 2 );
// ...
var label = acc.predict( array( [ 0.5, 2.0 ] ) );
// returns <ndarray>
Provided an ndarray having shape (..., N), where N is the number of features, the returned ndarray has shape (...) (i.e., the number of dimensions is reduced by one) and data type float64. For example, if provided a one-dimensional ndarray, the method returns a zero-dimensional ndarray whose only element is the predicted response value.
By default, the method returns the predicted label (type='label'). To return the predicted probability of a +1 response value given either the logistic (log) or modified Huber (modifiedHuber) loss functions, set the second argument to 'probability'.
var array = require( '@stdlib/ndarray/array' );
// Create a model with the intercept term:
var acc = incrBinaryClassification( 2, {
'loss': 'log'
});
// ...
var phat = acc.predict( array( [ 0.5, 2.0 ] ), 'probability' );
// returns <ndarray>
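For the logistic loss, the probability of a +1 response is conventionally obtained by passing the linear predictor through the logistic (sigmoid) function. The sketch below shows that mapping in isolation; the function name `sigmoid` is hypothetical, and the mapping used for the modified Huber loss is not covered here.

```javascript
// Hypothetical helper: maps a linear predictor to P(y=+1|x) under the
// logistic model. Not part of the package API.
function sigmoid( lp ) {
    return 1.0 / ( 1.0 + Math.exp( -lp ) );
}

console.log( sigmoid( 0.0 ) );
// => 0.5
```

A linear predictor of zero lies exactly on the decision boundary, so both classes are equally likely.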
In order to return the linear predictor (i.e., the signed distance to the hyperplane, which is computed as the dot product between the model coefficients and the provided feature vector x, plus the intercept), set the second argument to 'linear'.
var array = require( '@stdlib/ndarray/array' );
// Create a model with the intercept term:
var acc = incrBinaryClassification( 2, {
'loss': 'log'
});
// ...
var lp = acc.predict( array( [ 0.5, 2.0 ] ), 'linear' );
// returns <ndarray>
Given a feature vector x = [x_0, x_1, ...] and model coefficients c = [c_0, c_1, ...], the linear predictor is equal to (x_0*c_0) + (x_1*c_1) + ... + c_intercept.
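The formula above can be checked by hand with plain arrays. In this sketch, the function name `linearPredictor` and the coefficient values are hypothetical; the intercept is assumed to be stored as the last coefficient, matching how the returned coefficients are indexed elsewhere in this README.

```javascript
// Hypothetical helper: computes the linear predictor for a single feature
// vector. If `coefs` has one more element than `x`, the extra (last) element
// is treated as the intercept.
function linearPredictor( coefs, x ) {
    var lp;
    var N;
    var i;

    N = x.length;
    lp = ( coefs.length > N ) ? coefs[ N ] : 0.0;
    for ( i = 0; i < N; i++ ) {
        lp += coefs[ i ] * x[ i ];
    }
    return lp;
}

// Made-up coefficients: two features plus an intercept...
var c = [ 3.0, -2.0, 1.0 ];

console.log( linearPredictor( c, [ 0.5, 2.0 ] ) );
// => -1.5

// Applying the helper to each row of a 3x2 input yields 3 values, mirroring
// how `predict` reduces the number of dimensions by one:
var X = [ [ 0.9, 0.1 ], [ 0.1, 0.9 ], [ 0.9, 0.9 ] ];
var lps = X.map( function onRow( row ) {
    return linearPredictor( c, row );
});
console.log( lps.length );
// => 3
```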
Notes
- The underlying binary classification model performs L2 regularization of model coefficients, shrinking them toward zero by penalizing their squared euclidean norm.
- Stochastic gradient descent is sensitive to the scaling of the features. One is advised to either scale each feature to [0,1] or [-1,1] or to transform each feature into z-scores with zero mean and unit variance. One should keep in mind that the same scaling has to be applied to both the training data and any data provided for prediction in order to obtain accurate predictions.
- In general, the more data provided to an accumulator, the more reliable the model predictions.
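As a rough illustration of the scaling advice above, the following sketch estimates per-feature means and standard deviations from a batch of training vectors and then applies the same transformation to any later vector. The helper names `fitScaler` and `transform` are hypothetical, and the sketch assumes a batch of training data is available up front, which may not hold in a purely streaming setting.

```javascript
// Hypothetical helper: estimates per-feature mean and (population) standard
// deviation from an array of feature vectors.
function fitScaler( X ) {
    var mean = [];
    var sd = [];
    var M = X.length;
    var N = X[ 0 ].length;
    var d;
    var i;
    var j;
    for ( j = 0; j < N; j++ ) {
        mean.push( 0.0 );
        sd.push( 0.0 );
        for ( i = 0; i < M; i++ ) {
            mean[ j ] += X[ i ][ j ] / M;
        }
        for ( i = 0; i < M; i++ ) {
            d = X[ i ][ j ] - mean[ j ];
            sd[ j ] += ( d*d ) / M;
        }
        // Guard against constant features to avoid division by zero:
        sd[ j ] = Math.sqrt( sd[ j ] ) || 1.0;
    }
    return {
        'mean': mean,
        'sd': sd
    };
}

// Hypothetical helper: converts a feature vector to z-scores using a
// previously fitted scaler.
function transform( scaler, x ) {
    return x.map( function onValue( v, j ) {
        return ( v - scaler.mean[ j ] ) / scaler.sd[ j ];
    });
}

var scaler = fitScaler( [ [ 1.0, 10.0 ], [ 3.0, 30.0 ] ] );

console.log( transform( scaler, [ 3.0, 30.0 ] ) );
// => [ 1, 1 ]
```

The scaled vectors, rather than the raw ones, would then be wrapped in an ndarray and passed to both the accumulator and its predict method.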
Examples
var normal = require( '@stdlib/random/base/normal' );
var binomial = require( '@stdlib/random/base/binomial' );
var array = require( '@stdlib/ndarray/array' );
var exp = require( '@stdlib/math/base/special/exp' );
var incrBinaryClassification = require( '@stdlib/ml/incr/binary-classification' );
// Create a new accumulator:
var acc = incrBinaryClassification( 2, {
'intercept': true,
'lambda': 1.0e-3,
'loss': 'log'
});
// Incrementally update the classification model...
var phat;
var x;
var i;
for ( i = 0; i < 10000; i++ ) {
x = array( [ normal( 0.0, 1.0 ), normal( 0.0, 1.0 ) ] );
phat = 1.0 / ( 1.0+exp( -( ( 3.0*x.get(0) ) - ( 2.0*x.get(1) ) + 1.0 ) ) );
acc( x, ( binomial( 1, phat ) ) ? 1.0 : -1.0 );
}
// Retrieve model coefficients:
var coefs = acc();
console.log( 'Feature coefficients: %d, %d', coefs.get( 0 ), coefs.get( 1 ) );
console.log( 'Intercept: %d', coefs.get( 2 ) );
// Predict new observations...
x = array( [ [ 0.9, 0.1 ], [ 0.1, 0.9 ], [ 0.9, 0.9 ] ] );
var out = acc.predict( x );
console.log( 'x = [%d, %d]; label = %d', x.get( 0, 0 ), x.get( 0, 1 ), out.get( 0 ) );
console.log( 'x = [%d, %d]; label = %d', x.get( 1, 0 ), x.get( 1, 1 ), out.get( 1 ) );
console.log( 'x = [%d, %d]; label = %d', x.get( 2, 0 ), x.get( 2, 1 ), out.get( 2 ) );
out = acc.predict( x, 'probability' );
console.log( 'x = [%d, %d]; P(y=1|x) = %d', x.get( 0, 0 ), x.get( 0, 1 ), out.get( 0 ) );
console.log( 'x = [%d, %d]; P(y=1|x) = %d', x.get( 1, 0 ), x.get( 1, 1 ), out.get( 1 ) );
console.log( 'x = [%d, %d]; P(y=1|x) = %d', x.get( 2, 0 ), x.get( 2, 1 ), out.get( 2 ) );
out = acc.predict( x, 'linear' );
console.log( 'x = [%d, %d]; lp = %d', x.get( 0, 0 ), x.get( 0, 1 ), out.get( 0 ) );
console.log( 'x = [%d, %d]; lp = %d', x.get( 1, 0 ), x.get( 1, 1 ), out.get( 1 ) );
console.log( 'x = [%d, %d]; lp = %d', x.get( 2, 0 ), x.get( 2, 1 ), out.get( 2 ) );
References
- Rosenblatt, Frank. 1957. "The Perceptron–a perceiving and recognizing automaton." 85-460-1. Buffalo, NY, USA: Cornell Aeronautical Laboratory.
- Zhang, Tong. 2004. "Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms." In Proceedings of the Twenty-First International Conference on Machine Learning, 116. New York, NY, USA: Association for Computing Machinery. doi:10.1145/1015330.1015332.
- Shalev-Shwartz, Shai, Yoram Singer, Nathan Srebro, and Andrew Cotter. 2011. "Pegasos: primal estimated sub-gradient solver for SVM." Mathematical Programming 127 (1): 3–30. doi:10.1007/s10107-010-0420-4.