casm.learn.FittingData

class casm.learn.FittingData(X, y, cv, sample_weight=[], scoring=None, penalty=0.0, tdata=None)

FittingData holds the feature values, target values, sample weights, and related settings used to solve the weighted linear model

L*X * b = L*y

for the coefficients b, where the weight matrix is W = L * L.transpose().
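For diagonal weights, solving L*X * b = L*y in the least-squares sense is the same as solving the weighted normal equations X^T W X b = X^T W y. The sketch below checks this with NumPy on made-up toy data; it assumes diagonal weights with L = diag(sqrt(w)) and uses numpy.linalg.lstsq, which is not necessarily the solver CASM uses.

```python
import numpy as np

# Toy data (invented): 5 samples, 2 features (a constant column and one "correlation").
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])
w = np.array([1.0, 2.0, 1.0, 2.0, 1.0])  # per-sample weights

# For diagonal weights, L = diag(sqrt(w)) satisfies W = L @ L.T.
L = np.diag(np.sqrt(w))
W = L @ L.T

# Solve L X b = L y in the least-squares sense.
b, *_ = np.linalg.lstsq(L @ X, L @ y, rcond=None)

# Same coefficients from the weighted normal equations X^T W X b = X^T W y.
b_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(np.allclose(b, b_normal))  # True
```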

X

The training input samples (correlations).

Type:: array-like of shape (n_samples, n_features)

y

The target values (property values).

Type:: array-like of shape: (n_samples, 1)

cv

Provides the train/test splits used for cross-validation.

Type:: cross-validation generator or an iterable
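scikit-learn accepts as cv an iterable that yields (train, test) pairs of index arrays. The helper below is hypothetical, a plain-NumPy stand-in similar to sklearn.model_selection.KFold(n_splits).split(X) without shuffling, included only to show the shape of such an iterable.

```python
import numpy as np


def kfold_splits(n_samples, n_splits):
    """Hypothetical helper: yield (train_indices, test_indices) for contiguous k-fold CV."""
    indices = np.arange(n_samples)
    for test in np.array_split(indices, n_splits):
        train = np.setdiff1d(indices, test)  # every index not in the test fold
        yield train, test


# Materialize as a list so the splits can be iterated more than once.
splits = list(kfold_splits(6, 3))
for train, test in splits:
    print(train, test)
```

A generator is exhausted after one pass, so storing the splits in a list is the safer choice if the same cv object may be reused for several fits.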

n_samples

The number of samples / target values (number of rows in X)

Type:: int

n_features

The number of features (number of columns in X)

Type:: int

W

The sample weight matrix, normalized as described under the sample_weight parameter.

Type:: array-like of shape: (n_samples, n_samples)

L

A factor of the weight matrix, W = L * L.transpose(), used to generate weighted_X and weighted_y.

Type:: array-like of shape: (n_samples, n_samples)

weighted_X

Weighted training input data, weighted_X = L*X.

Type:: array-like of shape: (n_samples, n_features)

weighted_y

Weighted target values, weighted_y = L*y.

Type:: array-like of shape: (n_samples, 1)
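For a symmetric positive-definite W, one way to obtain an L with W = L * L.transpose() is the Cholesky factorization, since numpy.linalg.cholesky returns a lower-triangular factor satisfying exactly that identity. This is an assumption for illustration, not a statement of how CASM computes L; the weight matrix and data below are invented.

```python
import numpy as np

# Hypothetical symmetric positive-definite weight matrix for 3 samples.
W = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([[0.0], [1.0], [2.1]])  # shape (n_samples, 1)

# numpy.linalg.cholesky returns a lower-triangular L with W = L @ L.T.
L = np.linalg.cholesky(W)
assert np.allclose(L @ L.T, W)

# Weighted data following the definitions above: weighted_X = L*X, weighted_y = L*y.
weighted_X = L @ X
weighted_y = L @ y
print(weighted_X.shape, weighted_y.shape)  # (3, 2) (3, 1)
```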

scoring

A string or a scorer callable object/function with signature scorer(estimator, X, y), passed as the scoring parameter of sklearn.model_selection.cross_val_score. The default, None, uses estimator.score().

Type:: string, callable or None, optional, default: None
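A scorer is any callable with the signature scorer(estimator, X, y). The sketch below defines a hypothetical negative-RMS scorer (negative because scikit-learn treats larger scores as better) and exercises it with a minimal stand-in estimator; neither name comes from CASM.

```python
import numpy as np


def neg_rms_scorer(estimator, X, y):
    """Hypothetical scorer with signature scorer(estimator, X, y).

    Returns the negative root-mean-square error of estimator.predict(X).
    """
    residual = np.ravel(y) - np.ravel(estimator.predict(X))
    return -np.sqrt(np.mean(residual ** 2))


class MeanModel:
    """Minimal stand-in estimator that always predicts a constant value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)


X = np.zeros((4, 2))
y = np.array([1.0, 3.0, 1.0, 3.0])
score = neg_rms_scorer(MeanModel(2.0), X, y)  # residuals are all +/-1
print(score)  # -1.0
```

A callable like this could be supplied as scoring=neg_rms_scorer; scikit-learn's sklearn.metrics.make_scorer (with greater_is_better=False) is an alternative way to wrap an error metric.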

penalty

The CV score is increased by penalty*(number of selected basis functions).

Type:: float, optional, default=0.0
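The penalty is a simple additive term. Assuming, as the word "increased" suggests, that lower CV scores are preferred, it discourages selecting more basis functions. The helper below is hypothetical and represents the selection as a boolean mask over the features.

```python
import numpy as np


def penalized_cv_score(cv_score, selected, penalty=0.0):
    """Hypothetical helper: cv_score + penalty * (number of selected basis functions)."""
    n_selected = int(np.count_nonzero(selected))  # True entries in the mask
    return cv_score + penalty * n_selected


mask = np.array([True, False, True, True])  # 3 of 4 basis functions selected
print(penalized_cv_score(0.5, mask, penalty=0.25))  # 1.25
```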

data

Optionally stores TrainingData.data, with the weighted_X and weighted_y data added. No checks are made for consistency between tdata.X, tdata.y and X, y, or other parameters.

Type:: pandas.DataFrame, optional, default=None

Parameters:

X (array-like of shape (n_samples, n_features)) – The training input samples (correlations).
y (array-like of shape: (n_samples, 1)) – The target values (property values).
cv (cross-validation generator or an iterable) – Provides the train/test splits used for cross-validation.
sample_weight (None, 1-d array-like of shape (n_samples,), or 2-d array-like of shape (n_samples, n_samples)) –
Sample weights. Below, N and Nvalue both denote n_samples.

if sample_weight is None or empty (the default, unweighted):
W = np.matlib.eye(N)

if sample_weight is 1-dimensional:
W = np.diag(sample_weight)*Nvalue/np.sum(sample_weight)

if sample_weight is 2-dimensional (must be Hermitian, positive-definite):
W = sample_weight*Nvalue/np.sum(sample_weight)
scoring (string, callable or None, optional, default=None) – A string or a scorer callable object/function with signature scorer(estimator, X, y), passed as the scoring parameter of sklearn.model_selection.cross_val_score. The default, None, uses estimator.score().
penalty (float, optional, default=0.0) – The CV score is increased by penalty*(number of selected basis functions).
tdata (TrainingData instance, optional, default=None) – Optionally stores TrainingData.data, with the weighted_X and weighted_y data added. No checks are made for consistency between tdata.X, tdata.y and X, y, or other parameters.
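The three sample_weight cases can be sketched as a single function. This is an illustration under stated assumptions, not CASM's implementation: the function name is hypothetical, N and Nvalue are taken to be n_samples, np.eye stands in for np.matlib.eye (so plain ndarrays are returned rather than numpy.matrix objects), and the symmetric positive-definite requirement for 2-d weights is not checked.

```python
import numpy as np


def normalized_weight_matrix(sample_weight, n_samples):
    """Hypothetical sketch of the sample_weight -> W normalization described above."""
    if sample_weight is None or len(sample_weight) == 0:
        return np.eye(n_samples)  # unweighted: identity matrix
    sw = np.asarray(sample_weight, dtype=float)
    if sw.ndim == 1:
        # 1-d: diagonal matrix scaled so the weights sum to n_samples
        return np.diag(sw) * n_samples / np.sum(sw)
    # 2-d: scaled so all entries sum to n_samples (positive-definiteness not checked)
    return sw * n_samples / np.sum(sw)


W1 = normalized_weight_matrix([1.0, 1.0, 2.0], 3)
print(np.trace(W1))  # 3.0
```

With this scaling, uniform 1-d weights of any magnitude reduce to the identity matrix, so only the relative weights matter.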

Methods

__init__(X, y, cv[, sample_weight, scoring, ...]) – param X: The training input samples (correlations).