CASM
1.1.0
A Clusters Approach to Statistical Mechanics
#include <ConfigIONovelty.hh>
A DatumFormatter class to measure the 'novelty' of a configuration with respect to a population of configurations. Larger numbers indicate a more novel configuration, and a very large number (>~100) indicates a configuration that is linearly independent from the population (in terms of its correlations).
The novelty is based on the Mahalanobis distance. Given a population of configurations, indexed by 'i', and basis functions, indexed by 'j', that depend on the DoFs of those configurations, we define a correlation matrix
corr_mat(i,j) = <function 'j' evaluated in configuration 'i'>
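and an average correlation, 'avg_corr', obtained by averaging over the rows of corr_mat. Here Nconfig denotes the number of configurations (rows of corr_mat) and Ncorr the number of basis functions (columns).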
The covariance of the correlations is given as
covar = corr_mat.transpose()*corr_mat/Nconfig - avg_corr.transpose()*avg_corr
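The two definitions above can be sketched in plain C++ as follows. This is a minimal illustration, not CASM's implementation: it stores corr_mat as a row-major std::vector of rows instead of the Eigen types the class uses, and the names Matrix, Vector, average_correlation, and covariance are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical aliases: corr_mat[i][j] = basis function j evaluated in configuration i.
using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// avg_corr: the average of the rows of corr_mat (one entry per basis function).
Vector average_correlation(const Matrix& corr_mat) {
  assert(!corr_mat.empty());
  const std::size_t n_config = corr_mat.size();
  const std::size_t n_corr = corr_mat[0].size();
  Vector avg(n_corr, 0.0);
  for (const Vector& row : corr_mat) {
    assert(row.size() == n_corr);  // every configuration has the same number of correlations
    for (std::size_t j = 0; j < n_corr; ++j) avg[j] += row[j];
  }
  for (double& a : avg) a /= static_cast<double>(n_config);
  return avg;
}

// covar = corr_mat^T * corr_mat / Nconfig - avg_corr^T * avg_corr
Matrix covariance(const Matrix& corr_mat) {
  const Vector avg = average_correlation(corr_mat);
  const std::size_t n_config = corr_mat.size();
  const std::size_t n_corr = avg.size();
  Matrix covar(n_corr, Vector(n_corr, 0.0));
  // Accumulate corr_mat^T * corr_mat one configuration (row) at a time.
  for (const Vector& row : corr_mat)
    for (std::size_t j = 0; j < n_corr; ++j)
      for (std::size_t k = 0; k < n_corr; ++k) covar[j][k] += row[j] * row[k];
  // Divide by Nconfig and subtract the outer product of avg_corr with itself.
  for (std::size_t j = 0; j < n_corr; ++j)
    for (std::size_t k = 0; k < n_corr; ++k)
      covar[j][k] = covar[j][k] / static_cast<double>(n_config) - avg[j] * avg[k];
  return covar;
}
```

For the two-configuration population corr_mat = [[1, 0], [3, 2]], avg_corr is [2, 1] and covar is [[1, 1], [1, 1]]; covar is singular because both configurations lie on one line in correlation space.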
The Mahalanobis distance, 'M', for a particular correlation vector, 'C' (which corresponds to a particular configuration), is
M = sqrt( (C-avg_corr) * inv(covar) * (C-avg_corr).transpose() )
We call inv(covar) the 'Gram matrix', the conventional name for the matrix that defines a scalar product; here it defines the Mahalanobis scalar product.
We use a slightly different definition for the novelty measure, 'N', which is
N = sqrt( (C-avg_corr) * inv(covar+epsilon*identity) * (C-avg_corr).transpose() / Ncorr )
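N differs from M in two ways. First, covar is regularized by adding a small diagonal matrix (epsilon*identity, where epsilon is ~1e-5/Ncorr), which keeps the inverse well defined even when covar is singular. Second, the squared distance is divided by Ncorr, so that the result does not depend strongly on the number of basis functions in the basis set.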
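The regularized novelty N can be sketched as follows, assuming avg_corr and covar have already been computed from the definitions given earlier. The helper names solve and novelty are hypothetical; rather than forming inv(covar+epsilon*identity) explicitly, the sketch applies it through Gaussian elimination with partial pivoting, a swapped-in technique chosen for brevity rather than anything taken from CASM.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical row-major matrix
using Vector = std::vector<double>;

// Hypothetical helper: solve A x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value and overwritten during elimination.
Vector solve(Matrix A, Vector b) {
  const std::size_t n = b.size();
  assert(A.size() == n);
  for (std::size_t col = 0; col < n; ++col) {
    // Pick the row with the largest pivot magnitude for numerical stability.
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < n; ++r)
      if (std::abs(A[r][col]) > std::abs(A[pivot][col])) pivot = r;
    std::swap(A[col], A[pivot]);
    std::swap(b[col], b[pivot]);
    // Eliminate the entries below the pivot.
    for (std::size_t r = col + 1; r < n; ++r) {
      const double f = A[r][col] / A[col][col];
      for (std::size_t k = col; k < n; ++k) A[r][k] -= f * A[col][k];
      b[r] -= f * b[col];
    }
  }
  // Back substitution on the upper-triangular system.
  Vector x(n, 0.0);
  for (std::size_t i = n; i-- > 0;) {
    double s = b[i];
    for (std::size_t k = i + 1; k < n; ++k) s -= A[i][k] * x[k];
    x[i] = s / A[i][i];
  }
  return x;
}

// Hypothetical helper: N = sqrt( d * inv(covar + epsilon*I) * d^T / Ncorr ), d = C - avg_corr.
double novelty(const Vector& C, const Vector& avg_corr, Matrix covar, double epsilon) {
  const std::size_t n_corr = C.size();
  assert(avg_corr.size() == n_corr && covar.size() == n_corr);
  Vector d(n_corr);
  for (std::size_t j = 0; j < n_corr; ++j) d[j] = C[j] - avg_corr[j];
  // Regularize: covar + epsilon*I is invertible even when covar is singular.
  for (std::size_t j = 0; j < n_corr; ++j) covar[j][j] += epsilon;
  // y = inv(covar + epsilon*I) * d, computed by a linear solve.
  const Vector y = solve(covar, d);
  double q = 0.0;
  for (std::size_t j = 0; j < n_corr; ++j) q += d[j] * y[j];
  return std::sqrt(q / static_cast<double>(n_corr));
}
```

With covar = [[1, 1], [1, 1]], avg_corr = [2, 1], and epsilon = 1e-5/Ncorr, the configuration C = [3, 2] (displaced along the population's line) gives N of about 0.71, while C = [3, 0] (displaced perpendicular to it) gives N of about 447, consistent with the >~100 scale for linearly independent configurations. Because the class stores m_gram_mat, it presumably forms the Gram matrix once per population; this sketch re-solves on every call, which is simpler but repeats the elimination.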
Definition at line 52 of file ConfigIONovelty.hh.
Public Types | |
typedef DataObject | DataObject |
typedef long | difference_type |
typedef DataFormatterDictionary< DataObject, BaseDatumFormatter< DataObject > > | DictType |
Public Member Functions | |
Novelty () | |
std::unique_ptr< Novelty > | clone () const |
double | evaluate (const Configuration &_config) const override |
bool | init (const Configuration &_tmplt) const override |
std::string | short_header (const Configuration &_config) const override |
bool | parse_args (const std::string &args) override |
virtual ValueType | operator() (const DataObject &obj) const |
Return requested data from obj, throwing std::runtime_error if not valid.
virtual ValueType | evaluate (const DataObject &obj) const =0 |
virtual void | inject (const DataObject &_data_obj, DataStream &_stream, Index pass_index=0) const override |
Default implementation injects each element, via operator<<.
virtual void | print (const DataObject &_data_obj, std::ostream &_stream, Index pass_index=0) const override |
Default implementation prints each element in a column, via operator<<.
virtual jsonParser & | to_json (const DataObject &_data_obj, jsonParser &json) const override |
Default implementation calls jsonParser& to_json(const ValueType&, jsonParser&).
const std::string & | name () const |
Returns a name for the formatter, which becomes the tag used for parsing.
const std::string & | description () const |
Returns a short description of the formatter and its allowed arguments (if any). This description is used to automatically generate help screens.
virtual DatumFormatterClass | type () const |
const DictType & | home () const |
Access the dictionary containing this formatter, set during DictType::lookup.
void | set_home (const DictType &home) const |
Set the dictionary containing this formatter (called during DictType::lookup).
virtual bool | init (const DataObject &_template_obj) const |
Perform all initialization steps using _template_obj. Returns true if initialization is successful and false if _template_obj has insufficient data to complete initialization.
virtual bool | validate (const DataObject &_data_obj) const |
Returns true if _data_obj has valid values for requested data.
virtual std::vector< std::string > | col_header (const DataObject &_template_obj) const |
Returns a header string for each scalar produced by the formatter. Parsing the entries in the col_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a)", "comp(c)".
virtual std::string | long_header (const DataObject &_template_obj) const |
Returns a long expression for each scalar produced by the formatter. Parsing the long_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a) comp(c)".
virtual std::string | short_header (const DataObject &_template_obj) const |
Returns a short expression for the formatter. Parsing the short_header should allow the formatter to be recreated (but the short header does not specify a subset of the elements). Ex: "clex(formation_energy)" or "comp".
virtual Index | num_passes (const DataObject &_data_obj) const |
Protected Types | |
typedef multivector< Index >::X< 2 > | IndexContainer |
Protected Member Functions | |
void | _parse_index_expression (const std::string &_expr) |
void | _add_rule (const std::vector< Index > &new_rule) const |
const IndexContainer & | _index_rules () const |
Private Member Functions | |
Novelty * | _clone () const override |
Clone.
Private Attributes | |
std::string | m_selection |
Specifies which selection to use as the population.
Eigen::MatrixXd | m_gram_mat |
Gram matrix, which defines the Mahalanobis scalar product.
Eigen::VectorXd | m_avg_corr |
The average correlation vector of the population.
DataFormatter< Configuration > | m_format |
Formatter which is used to obtain correlations.
std::string | m_name |
std::string | m_description |
IndexContainer | m_index_rules |
const DictType * | m_home |
DataObject [inherited]
Definition at line 334 of file DataFormatter.hh.
DictType [inherited]
Definition at line 337 of file DataFormatter.hh.
difference_type [inherited]
Definition at line 335 of file DataFormatter.hh.
IndexContainer [protected] [inherited]
Definition at line 459 of file DataFormatter.hh.
Novelty () [inline]
Definition at line 54 of file ConfigIONovelty.hh.
_add_rule (const std::vector< Index > &new_rule) const [inline] [protected] [inherited]
Definition at line 472 of file DataFormatter.hh.
_clone () const [inline] [override] [private] [virtual]
Clone.
Implements CASM::BaseValueFormatter< ValueType, DataObject >.
Definition at line 90 of file ConfigIONovelty.hh.
_index_rules () const [inline] [protected] [inherited]
Definition at line 476 of file DataFormatter.hh.
_parse_index_expression (const std::string &_expr) [protected] [inherited]
Derived DatumFormatters have some optional functionality for parsing index expressions in order to make it easy to handle ranges such as:
    formattername(3,4:8)
in which case DerivedDatumFormatter::parse_args() is called with the string "3,4:8". By dispatching that string to BaseDatumFormatter::_parse_index_expression(), m_index_rules will be populated with {{3,4},{3,5},{3,6},{3,7},{3,8}}.
Definition at line 470 of file DataFormatter_impl.hh.
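The expansion described above can be illustrated with a small stand-alone sketch. It is not CASM's _parse_index_expression(); it assumes that tokens are separated by commas, that "a:b" denotes an inclusive range, and that the rules are the Cartesian product of the tokens, which reproduces the five rules in the example. Index is a stand-in alias, and expand_token, parse_index_expression, and IndexRules are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Index = long;                                // stand-in for CASM's Index type
using IndexRules = std::vector<std::vector<Index>>;  // hypothetical, mirrors IndexContainer

// Hypothetical helper: "5" -> {5}, "4:8" -> {4,5,6,7,8} (inclusive range).
std::vector<Index> expand_token(const std::string& tok) {
  const std::size_t colon = tok.find(':');
  if (colon == std::string::npos) return {std::stol(tok)};
  const Index lo = std::stol(tok.substr(0, colon));
  const Index hi = std::stol(tok.substr(colon + 1));
  assert(lo <= hi);
  std::vector<Index> out;
  for (Index i = lo; i <= hi; ++i) out.push_back(i);
  return out;
}

// Hypothetical helper: expand "3,4:8" into the Cartesian product of its tokens.
IndexRules parse_index_expression(const std::string& expr) {
  IndexRules rules(1);  // start from a single empty rule
  std::stringstream ss(expr);
  std::string tok;
  while (std::getline(ss, tok, ',')) {
    IndexRules next;
    // Extend every existing rule with every value of the current token.
    for (const std::vector<Index>& prefix : rules)
      for (Index i : expand_token(tok)) {
        std::vector<Index> rule = prefix;
        rule.push_back(i);
        next.push_back(rule);
      }
    rules = next;
  }
  return rules;
}
```

Malformed tokens with non-numeric bounds make std::stol throw std::invalid_argument, so a production parser would need explicit error reporting.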
clone () const [inline]
Definition at line 65 of file ConfigIONovelty.hh.
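The summary pairs a public clone() returning std::unique_ptr< Novelty > with a private virtual _clone() returning Novelty *. This matches a common C++ idiom: raw-pointer return types can be covariant in overrides while std::unique_ptr return types cannot, so the virtual function returns a raw pointer and a non-virtual wrapper takes ownership. A minimal sketch with hypothetical Base and Derived classes (not CASM types):

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class illustrating the clone/_clone split.
class Base {
 public:
  virtual ~Base() = default;
  // Non-virtual wrapper: takes ownership of the raw pointer returned by _clone().
  std::unique_ptr<Base> clone() const { return std::unique_ptr<Base>(_clone()); }
  virtual int id() const { return 0; }

 private:
  virtual Base* _clone() const = 0;
};

// Hypothetical derived class, playing the role Novelty plays in the summary.
class Derived : public Base {
 public:
  // Hides Base::clone() and returns the more specific smart pointer.
  std::unique_ptr<Derived> clone() const { return std::unique_ptr<Derived>(_clone()); }
  int id() const override { return 1; }

 private:
  // Covariant return type: Derived* overrides Base*.
  Derived* _clone() const override { return new Derived(*this); }
};
```

Calling clone() through a Base reference still produces a copy of the dynamic type, because Base::clone() dispatches to the overriding _clone().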
col_header (const DataObject &_template_obj) const [inline] [virtual] [inherited]
Returns a header string for each scalar produced by the formatter. Parsing the entries in the col_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a)", "comp(c)".
Reimplemented in CASM::Base2DDatumFormatter< Container, DataObject >, CASM::Base1DDatumFormatter< Container, DataObject >, and CASM::DatumFormatterAlias< DataObject >.
Definition at line 389 of file DataFormatter.hh.
description () const [inline] [inherited]
Returns a short description of the formatter and its allowed arguments (if any). This description is used to automatically generate help screens.
Definition at line 352 of file DataFormatter.hh.
evaluate (const Configuration &_config) const [override]
Definition at line 82 of file ConfigIONovelty.cc.
evaluate (const DataObject &obj) const [pure virtual] [inherited]
home () const [inline] [inherited]
Access the dictionary containing this formatter, set during DictType::lookup.
Definition at line 360 of file DataFormatter.hh.
init (const Configuration &_tmplt) const [override]
Definition at line 13 of file ConfigIONovelty.cc.
init (const DataObject &_template_obj) const [inline] [virtual] [inherited]
Perform all initialization steps using _template_obj. Returns true if initialization is successful and false if _template_obj has insufficient data to complete initialization.
Reimplemented in CASM::Base2DDatumFormatter< Container, DataObject >, CASM::Base1DDatumFormatter< Container, DataObject >, and CASM::DatumFormatterAlias< DataObject >.
Definition at line 376 of file DataFormatter.hh.
inject (const DataObject &_data_obj, DataStream &_stream, Index pass_index=0) const [inline] [override] [virtual] [inherited]
Default implementation injects each element, via operator<<.
Implements CASM::BaseDatumFormatter< DataObject >.
Reimplemented in CASM::Base2DDatumFormatter< Container, DataObject >, and CASM::Base1DDatumFormatter< Container, DataObject >.
Definition at line 821 of file DataFormatterTools.hh.
long_header (const DataObject &_template_obj) const [inline] [virtual] [inherited]
Returns a long expression for each scalar produced by the formatter. Parsing the long_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a) comp(c)".
Definition at line 399 of file DataFormatter.hh.
name () const [inline] [inherited]
Returns a name for the formatter, which becomes the tag used for parsing.
Definition at line 347 of file DataFormatter.hh.
num_passes (const DataObject &_data_obj) const [inline] [virtual] [inherited]
If data must be printed on multiple rows, returns the number of rows needed to output all data from _data_obj. The DataFormatter class will subsequently pass over _data_obj multiple times to complete printing (if necessary).
Reimplemented in CASM::Base2DDatumFormatter< Container, DataObject >, and CASM::DatumFormatterAlias< DataObject >.
Definition at line 424 of file DataFormatter.hh.
operator() (const DataObject &obj) const [inline] [virtual] [inherited]
Return requested data from obj, throwing std::runtime_error if not valid.
Definition at line 803 of file DataFormatterTools.hh.
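A minimal sketch of that contract follows, assuming that validity is decided by validate() before evaluate() is called. ValueFormatterSketch and LengthFormatter are hypothetical classes, not CASM types; the base-class validate() mirrors the documented default of always returning true.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical base illustrating a validate-then-evaluate operator().
template <typename DataObject, typename ValueType>
class ValueFormatterSketch {
 public:
  virtual ~ValueFormatterSketch() = default;
  // Default: every object is considered valid.
  virtual bool validate(const DataObject&) const { return true; }
  virtual ValueType evaluate(const DataObject& obj) const = 0;

  // Throws instead of returning a value computed from invalid data.
  ValueType operator()(const DataObject& obj) const {
    if (!validate(obj)) throw std::runtime_error("invalid data object");
    return evaluate(obj);
  }
};

// Hypothetical formatter: the length of a string, valid only for non-empty strings.
class LengthFormatter : public ValueFormatterSketch<std::string, std::size_t> {
 public:
  bool validate(const std::string& s) const override { return !s.empty(); }
  std::size_t evaluate(const std::string& s) const override { return s.size(); }
};
```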
parse_args (const std::string &args) [override] [virtual]
If DatumFormatter accepts arguments, parse them here. Arguments are assumed to be passed from the command line via: formattername(argument1,argument2,...)
from which DerivedDatumFormatter::parse_args() receives the string "argument1,argument2,...". Returns true if parse is successful, false if not (e.g., takes no arguments, already initialized, malformed input, etc.).
Reimplemented from CASM::BaseDatumFormatter< DataObject >.
Definition at line 41 of file ConfigIONovelty.cc.
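The calling convention above, where formattername(argument1,argument2,...) hands parse_args() the string "argument1,argument2,...", can be sketched as a split at the first '(' and the last ')'. The helper split_formatter_query is hypothetical and only illustrates the convention; it is not taken from CASM's own parser and does not handle several formatters in one query string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: split "name(args)" into {"name", "args"};
// a bare "name" yields {"name", ""}.
std::pair<std::string, std::string> split_formatter_query(const std::string& query) {
  const std::size_t open = query.find('(');
  if (open == std::string::npos) return {query, ""};
  const std::size_t close = query.rfind(')');
  assert(close != std::string::npos && close > open);  // require a matching ')'
  // Everything strictly between the parentheses is the argument string.
  return {query.substr(0, open), query.substr(open + 1, close - open - 1)};
}
```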
print (const DataObject &_data_obj, std::ostream &_stream, Index pass_index=0) const [inline] [override] [virtual] [inherited]
Default implementation prints each element in a column, via operator<<.
Implements CASM::BaseDatumFormatter< DataObject >.
Reimplemented in CASM::Base2DDatumFormatter< Container, DataObject >, and CASM::Base1DDatumFormatter< Container, DataObject >.
Definition at line 833 of file DataFormatterTools.hh.
set_home (const DictType &home) const [inline] [inherited]
Set the dictionary containing this formatter (called during DictType::lookup).
Definition at line 364 of file DataFormatter.hh.
short_header (const Configuration &_config) const [override]
Definition at line 52 of file ConfigIONovelty.cc.
short_header (const DataObject &_template_obj) const [inline] [virtual] [inherited]
Returns a short expression for the formatter. Parsing the short_header should allow the formatter to be recreated (but the short header does not specify a subset of the elements). Ex: "clex(formation_energy)" or "comp".
Reimplemented in CASM::DatumFormatterAlias< DataObject >, and CASM::DataFormatterOperator< ValueType, ArgType, DataObject >.
Definition at line 417 of file DataFormatter.hh.
to_json (const DataObject &_data_obj, jsonParser &json) const [inline] [override] [virtual] [inherited]
Default implementation calls jsonParser& to_json(const ValueType&, jsonParser&).
Implements CASM::BaseDatumFormatter< DataObject >.
Definition at line 847 of file DataFormatterTools.hh.
type () const [inline] [virtual] [inherited]
Reimplemented in CASM::DatumFormatterAlias< DataObject >, and CASM::DataFormatterOperator< ValueType, ArgType, DataObject >.
Definition at line 354 of file DataFormatter.hh.
validate (const DataObject &_data_obj) const [inline] [virtual] [inherited]
Returns true if _data_obj has valid values for requested data.
Default implementation always returns true.
Reimplemented in CASM::Generic2DDatumFormatter< Container, DataObject >, CASM::Generic1DDatumFormatter< Container, DataObject >, CASM::GenericDatumFormatter< ValueType, DataObject >, CASM::DatumFormatterAlias< DataObject >, and CASM::DataFormatterOperator< ValueType, ArgType, DataObject >.
Definition at line 381 of file DataFormatter.hh.
m_avg_corr [mutable] [private]
The average correlation vector of the population.
Definition at line 99 of file ConfigIONovelty.hh.
m_description [private] [inherited]
Definition at line 486 of file DataFormatter.hh.
m_format [mutable] [private]
Formatter which is used to obtain correlations.
Definition at line 102 of file ConfigIONovelty.hh.
m_gram_mat [mutable] [private]
Gram matrix, which defines the Mahalanobis scalar product.
Definition at line 96 of file ConfigIONovelty.hh.
m_home [mutable] [private] [inherited]
Definition at line 488 of file DataFormatter.hh.
m_index_rules [mutable] [private] [inherited]
Definition at line 487 of file DataFormatter.hh.
m_name [private] [inherited]
{ return notstd::make_unique<DerivedDatumFormatter>(*this);}
Definition at line 485 of file DataFormatter.hh.
m_selection [mutable] [private]
Specifies which selection to use as the population.
Definition at line 93 of file ConfigIONovelty.hh.