casm.learn.FittingData

class casm.learn.FittingData(X, y, cv, sample_weight=[], scoring=None, penalty=0.0, tdata=None)[source]

FittingData holds feature values, target values, sample weights, etc. used to solve:

L*X * b = L*y

a weighted linear model where the weights are given by W = L * L.transpose().
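As a rough illustration of the weighted model above, the following NumPy sketch builds W from hypothetical diagonal sample weights and solves L*X*b = L*y by least squares. It takes L as the Cholesky factor of W, which is an assumption: the text only states W = L * L.transpose(). The data and the names rng, w, and b_true are made up for the example and are not part of casm.learn.

```python
import numpy as np

# Made-up training data: 6 samples, 2 features, noise-free targets.
rng = np.random.default_rng(0)
n_samples, n_features = 6, 2
X = rng.normal(size=(n_samples, n_features))
b_true = np.array([[1.5], [-0.5]])
y = X @ b_true  # shape (n_samples, 1)

# Hypothetical per-sample weights placed on the diagonal of W.
w = np.array([1.0, 2.0, 1.0, 0.5, 3.0, 1.0])
W = np.diag(w)

# Assumption: L is the lower Cholesky factor, so that W = L @ L.T.
L = np.linalg.cholesky(W)

# Weighted data, following weighted_X = L*X and weighted_y = L*y.
weighted_X = L @ X
weighted_y = L @ y

# Least-squares solution of L*X*b = L*y.
b, *_ = np.linalg.lstsq(weighted_X, weighted_y, rcond=None)
```

Because the targets here are noise-free and X has full column rank, the weighted solution b should coincide with b_true regardless of the weights; with noisy data the weights would change the fit.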

X

The training input samples (correlations).

Type

array-like of shape (n_samples, n_features)

y

The target values (property values).

Type

array-like of shape: (n_samples, 1)

cv

Provides the train/test splits used for cross-validation.

Type

cross-validation generator or an iterable
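A minimal sketch of the "iterable" form of cv, assuming it follows the scikit-learn convention of yielding (train_indices, test_indices) pairs as integer index arrays. The helper two_fold_splits is hypothetical and only serves the example; a scikit-learn cross-validation generator could be used instead.

```python
import numpy as np

def two_fold_splits(n_samples):
    """Hypothetical helper: return a list of (train_indices, test_indices) pairs."""
    indices = np.arange(n_samples)
    even, odd = indices[::2], indices[1::2]
    # Each fold trains on one half of the samples and tests on the other.
    return [(even, odd), (odd, even)]

cv = two_fold_splits(6)
```

Every sample appears in exactly one test set here, which is the usual expectation for a k-fold style split.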

n_samples

The number of samples / target values (number of rows in X)

Type

int

n_features

The number of features (number of columns in X)

Type

int

W

Contains sample weights.

Type

array-like of shape: (n_samples, n_samples)

L

Used to generate weighted_X and weighted_y, W = L * L.transpose().

Type

array-like of shape: (n_samples, n_samples)

weighted_X

Weighted training input data, weighted_X = L*X.

Type

array-like of shape: (n_samples, n_features)

weighted_y

Weighted target values, weighted_y = L*y.

Type

array-like of shape: (n_samples, 1)

scoring

A string, or a scorer callable object / function with signature scorer(estimator, X, y). Passed as the scoring parameter of sklearn.model_selection.cross_val_score; the default, None, uses estimator.score().

Type

string, callable or None, optional, default: None
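A sketch of a callable with the scorer(estimator, X, y) signature described above. The names neg_rms_error and ConstantModel are hypothetical. The error is negated because scikit-learn scorers are conventionally "greater is better"; how casm.learn post-processes the returned value is not covered here.

```python
import numpy as np

def neg_rms_error(estimator, X, y):
    """Hypothetical scorer: negative root-mean-square error of the predictions."""
    residual = np.asarray(y) - np.asarray(estimator.predict(X))
    return -float(np.sqrt(np.mean(residual ** 2)))

class ConstantModel:
    """Toy stand-in for a fitted estimator; predicts one value for every sample."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full((len(X), 1), self.value)

# Residuals are -1 and +1, so the RMS error is 1 and the score is -1.
score = neg_rms_error(ConstantModel(2.0), np.zeros((2, 1)), np.array([[1.0], [3.0]]))
```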

penalty

The CV score is increased by ‘penalty*(number of selected basis functions)’.

Type

float, optional, default=0.0
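The effect of penalty can be sketched as simple arithmetic. The function penalized_cv_score and the boolean mask selected are hypothetical names used only for illustration; the formula itself is the one stated above.

```python
def penalized_cv_score(cv_score, n_selected, penalty=0.0):
    """Return the CV score increased by penalty * (number of selected basis functions)."""
    return cv_score + penalty * n_selected

# Hypothetical selection mask: True marks a selected basis function.
selected = [True, False, True, True, False]
n_selected = sum(selected)  # 3 selected basis functions

result = penalized_cv_score(0.02, n_selected, penalty=0.001)
```

With the default penalty of 0.0 the score is unchanged; a positive penalty makes models with more selected basis functions score higher.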

data

Optionally, store TrainingData.data with weighted_X and weighted_y data added. No checks are made that tdata.X and tdata.y are consistent with X, y, or the other parameters.

Type

pandas.DataFrame, optional, default=None

Parameters
  • X (array-like of shape (n_samples, n_features)) – The training input samples (correlations).

  • y (array-like of shape: (n_samples, 1)) – The target values (property values).

  • cv (cross-validation generator or an iterable) – Provides the train/test splits used for cross-validation.

  • sample_weight (None, 1-d array-like of shape: (n_samples, 1), or 2-d array-like of shape: (n_samples, n_samples)) –

    Sample weights.

    if sample_weight is None: (default, unweighted)

    W = np.matlib.eye(N)

    if sample_weight is 1-dimensional:

    W = np.diag(sample_weight)*Nvalue/np.sum(sample_weight)

    if sample_weight is 2-dimensional (must be Hermitian, positive-definite):

    W = sample_weight*Nvalue/np.sum(sample_weight)

  • scoring (string, callable or None, optional, default=None) – A string, or a scorer callable object / function with signature scorer(estimator, X, y). Passed as the scoring parameter of sklearn.model_selection.cross_val_score; the default, None, uses estimator.score().

  • penalty (float, optional, default=0.0) – The CV score is increased by ‘penalty*(number of selected basis functions)’.

  • tdata (TrainingData instance, optional, default=None) – Optionally, store TrainingData.data with weighted_X and weighted_y data added. No checks are made that tdata.X and tdata.y are consistent with X, y, or the other parameters.
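The three sample_weight cases listed above can be sketched with NumPy as follows. This is not the library's implementation: build_weight_matrix is a hypothetical helper, it assumes that both N and Nvalue in the formulas denote n_samples, and it uses np.eye (which returns a plain ndarray) in place of np.matlib.eye.

```python
import numpy as np

def build_weight_matrix(n_samples, sample_weight=None):
    """Hypothetical helper mirroring the three documented sample_weight cases."""
    if sample_weight is None:
        # Unweighted: W is the identity.
        return np.eye(n_samples)
    sw = np.asarray(sample_weight, dtype=float)
    if sw.ndim == 1:
        # 1-dimensional: weights on the diagonal, normalized to sum to n_samples.
        return np.diag(sw) * n_samples / np.sum(sw)
    if sw.shape == (n_samples, n_samples):
        # 2-dimensional: scaled so that all elements sum to n_samples.
        # The input is expected to be Hermitian and positive-definite.
        return sw * n_samples / np.sum(sw)
    raise ValueError("sample_weight must be None, 1-d, or (n_samples, n_samples)")

W_unweighted = build_weight_matrix(3)
W_diag = build_weight_matrix(3, [1.0, 1.0, 2.0])
W_full = build_weight_matrix(2, [[2.0, 0.0], [0.0, 2.0]])
```

In the 1-dimensional case the normalization keeps the trace of W equal to n_samples, so uniform weights reduce to the unweighted identity.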

__init__(X, y, cv, sample_weight=[], scoring=None, penalty=0.0, tdata=None)[source]

Methods

__init__(X, y, cv[, sample_weight, scoring, …])
