casm.learn.FittingData

class casm.learn.FittingData(X, y, cv, sample_weight=[], scoring=None, penalty=0.0, tdata=None)

FittingData holds the feature values, target values, sample weights, and related settings used to solve the weighted linear model

L * X * b = L * y

for the coefficients b, where the weights are given by W = L * L.transpose().
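As a minimal sketch of the model above, assuming diagonal weights (so that L can be taken as the elementwise square root of W, which satisfies W = L * L.transpose()) and using NumPy's least-squares solver in place of whatever estimator casm.learn actually uses, the weighted problem reduces to an ordinary least-squares problem on the transformed data. The data are illustrative only.

```python
import numpy as np

# Illustrative data: 6 samples, 2 features (a constant column and one variable).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0],
              [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
noise = np.array([[0.1], [-0.2], [0.05], [0.0], [0.1], [-0.1]])
y = X @ np.array([[0.5], [2.0]]) + noise  # shape (n_samples, 1)

# Diagonal weights; for a diagonal W, L = sqrt(W) satisfies W = L * L.transpose().
w = np.array([1.0, 2.0, 1.0, 3.0, 1.0, 1.0])
W = np.diag(w)
L = np.diag(np.sqrt(w))

# Transform the problem, then solve L*X * b = L*y by ordinary least squares.
weighted_X = L @ X
weighted_y = L @ y
b, *_ = np.linalg.lstsq(weighted_X, weighted_y, rcond=None)

# Same answer from the weighted normal equations (X^T W X) b = X^T W y.
b_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(b.ravel(), b_normal.ravel())
```

Solving the transformed system minimizes the weighted sum of squared residuals, sum_i w_i * (y_i - X_i b)^2, which is why the two results agree.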

X

The training input samples (correlations).

Type: array-like of shape (n_samples, n_features)

y

The target values (property values).

Type: array-like of shape (n_samples, 1)

cv

Provides train/test splits.

Type: cross-validation generator or an iterable
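Following the scikit-learn convention, an iterable of (train_indices, test_indices) pairs can serve as cv; sklearn.model_selection.KFold(n_splits=...).split(X) is one common generator of such pairs. The sketch below builds a simple contiguous k-fold split with NumPy only, so it runs without scikit-learn; the helper name kfold_splits is hypothetical and not part of casm.learn.

```python
import numpy as np

def kfold_splits(n_samples, n_splits):
    """Hypothetical helper: yield (train, test) index arrays for contiguous k-fold CV."""
    indices = np.arange(n_samples)
    # np.array_split tolerates n_samples that is not divisible by n_splits.
    for test in np.array_split(indices, n_splits):
        train = np.setdiff1d(indices, test)
        yield train, test

# Materialize as a list so the same splits can be reused across several fits,
# unlike a generator, which is exhausted after one pass.
cv = list(kfold_splits(n_samples=10, n_splits=3))
for train, test in cv:
    print(len(train), len(test))
```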

n_samples

The number of samples, i.e. the number of target values (the number of rows in X).

Type: int

n_features

The number of features (number of columns in X)

Type: int

W

The sample weight matrix.

Type: array-like of shape (n_samples, n_samples)

L

The factor used to generate weighted_X and weighted_y, where W = L * L.transpose().

Type: array-like of shape (n_samples, n_samples)

weighted_X

The weighted training input data, weighted_X = L*X.

Type: array-like of shape (n_samples, n_features)

weighted_y

The weighted target values, weighted_y = L*y.

Type: array-like of shape (n_samples, 1)

scoring

A string, or a scorer callable object or function with signature scorer(estimator, X, y), passed as the scoring parameter of sklearn.model_selection.cross_val_score. If None (the default), estimator.score() is used.

Type: string, callable or None, optional, default=None
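A scorer passed as scoring is called as scorer(estimator, X, y); under the scikit-learn convention it returns a value where higher is better. The sketch below defines a negative root-mean-square-error scorer together with a tiny least-squares estimator so that it runs with NumPy alone. Both names (neg_rms_scorer, LstsqEstimator) are hypothetical stand-ins, not casm.learn or scikit-learn API.

```python
import numpy as np

class LstsqEstimator:
    """Hypothetical minimal estimator with fit/predict, for illustration only."""

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return X @ self.coef_

def neg_rms_scorer(estimator, X, y):
    """Scorer with signature scorer(estimator, X, y); values closer to 0 are better."""
    residual = np.asarray(y) - estimator.predict(X)
    return -float(np.sqrt(np.mean(residual ** 2)))

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])  # exactly y = 1 + 2*x
est = LstsqEstimator().fit(X, y)
print(neg_rms_scorer(est, X, y))  # essentially zero for this exact fit
```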

penalty

The CV score is increased by penalty * (number of selected basis functions).

Type: float, optional, default=0.0
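As a worked sketch of the penalty rule, assuming the unpenalized CV score is already available (for example, an RMS error from cross-validation) and that the selected basis functions are marked by a boolean mask, the penalized score is a single addition. The function name penalized_cv_score is hypothetical.

```python
import numpy as np

def penalized_cv_score(cv_score, selected, penalty=0.0):
    """Hypothetical helper: add penalty * (number of selected basis functions)."""
    n_selected = int(np.count_nonzero(selected))
    return cv_score + penalty * n_selected

selected = np.array([True, False, True, True, False])  # 3 of 5 basis functions selected
print(penalized_cv_score(0.010, selected, penalty=0.001))  # 0.010 + 0.001 * 3
```

With the default penalty of 0.0 the CV score is returned unchanged, so the penalty only matters when comparing candidate fits that select different numbers of basis functions.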

data

Optionally stores TrainingData.data with the weighted_X and weighted_y data added. No checks are made for consistency between tdata.X, tdata.y and X, y, or any other parameters.

Type: pandas.DataFrame, optional, default=None

Parameters

X (array-like of shape (n_samples, n_features)) – The training input samples (correlations).
y (array-like of shape (n_samples, 1)) – The target values (property values).
cv (cross-validation generator or an iterable) – Provides train/test splits.
sample_weight (None, 1-d array-like of shape (n_samples,), or 2-d array-like of shape (n_samples, n_samples)) –
Sample weights, converted to W as follows (N is the number of samples, n_samples):

if sample_weight is None: (default, unweighted)
W = np.matlib.eye(N)

if sample_weight is 1-dimensional:
W = np.diag(sample_weight)*Nvalue/np.sum(sample_weight)

if sample_weight is 2-dimensional (must be Hermitian, positive-definite):
W = sample_weight*Nvalue/np.sum(sample_weight)
scoring (string, callable or None, optional, default=None) – A string, or a scorer callable object or function with signature scorer(estimator, X, y), passed as the scoring parameter of sklearn.model_selection.cross_val_score. If None (the default), estimator.score() is used.
penalty (float, optional, default=0.0) – The CV score is increased by penalty * (number of selected basis functions).
tdata (TrainingData instance, optional, default=None) – Optionally stores TrainingData.data with the weighted_X and weighted_y data added. No checks are made for consistency between tdata.X, tdata.y and X, y, or any other parameters.
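The three sample_weight cases above can be sketched as follows. This is a reconstruction from the listed formulas, not the casm.learn source: it assumes that Nvalue equals the number of samples N, it uses np.eye instead of the deprecated np.matlib.eye (so it returns an ndarray rather than a matrix), and the function name make_weight_matrix is hypothetical.

```python
import numpy as np

def make_weight_matrix(sample_weight, n_samples):
    """Hypothetical helper mirroring the documented sample_weight rules.

    Assumption: Nvalue in the documented formulas is n_samples.
    """
    if sample_weight is None:
        # Unweighted: identity matrix.
        return np.eye(n_samples)
    sw = np.asarray(sample_weight, dtype=float)
    if sw.ndim == 1:
        # Diagonal weights, rescaled so the diagonal sums to n_samples.
        return np.diag(sw) * n_samples / np.sum(sw)
    if sw.ndim == 2:
        # Full weight matrix (should be Hermitian, positive-definite),
        # rescaled so that all of its entries sum to n_samples.
        return sw * n_samples / np.sum(sw)
    raise ValueError("sample_weight must be None, 1-d, or 2-d")

W = make_weight_matrix([1.0, 1.0, 2.0, 4.0], n_samples=4)
print(np.diag(W))  # relative weights kept, rescaled to sum to 4
```

Rescaling by N / sum(sample_weight) keeps only the relative weights meaningful, so multiplying every weight by a constant leaves W unchanged.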

Methods

__init__(X, y, cv[, sample_weight, scoring, …])
