CASM
A Clusters Approach to Statistical Mechanics
CASM::ConfigIO::Novelty Class Reference (abstract)

#include <ConfigIONovelty.hh>

Detailed Description

A DatumFormatter class to measure the 'novelty' of a configuration with respect to a population of configurations. Larger numbers indicate a more novel configuration, and a very large number (>~100) indicates a configuration that is linearly independent of the population (in terms of its correlations).

The novelty is based on the Mahalanobis distance. Given a population of configurations, indexed by 'i', and basis functions, indexed by 'j', that depend on the DoFs of those configurations, we define a correlation matrix

corr_mat(i,j) = <function 'j' evaluated in configuration 'i'>

and an average correlation, 'avg_corr', which is obtained by averaging over the rows of corr_mat.

The covariance of the correlations is given as

covar = corr_mat.transpose()*corr_mat/Nconfig - avg_corr.transpose()*avg_corr

The Mahalanobis distance, 'M', for a particular correlation vector, 'C' (which corresponds to a particular configuration), is

M = sqrt( (C-avg_corr) * inv(covar) * (C-avg_corr).transpose() )

We call inv(covar) the 'Gram matrix', the conventional term for the matrix that defines a scalar product.

We use a slightly different definition for the novelty measure, 'N', which is

N = sqrt( (C-avg_corr) * inv(covar+epsilon*identity) * (C-avg_corr).transpose() / Ncorr )

This measure is regularized by adding a small matrix to covar (epsilon*identity, where epsilon is ~1e-5/Ncorr), and it is divided by Ncorr so that the result does not depend strongly on the number of basis functions in the basis set.
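
The formulas above can be sketched in standard C++ as follows. This is an illustrative sketch only, not the CASM implementation: the names Vec, Mat, solve and novelty are hypothetical, a plain Gaussian elimination stands in for whatever linear algebra the library uses, and epsilon is fixed at 1e-5/Ncorr as described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;  // hypothetical alias, not a CASM type
using Mat = std::vector<Vec>;     // row-major: corr_mat[i][j]

// Solve A x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value because they are modified in place.
Vec solve(Mat A, Vec b) {
  const std::size_t n = b.size();
  for (std::size_t k = 0; k < n; ++k) {
    std::size_t p = k;  // row holding the largest pivot candidate
    for (std::size_t r = k + 1; r < n; ++r)
      if (std::abs(A[r][k]) > std::abs(A[p][k])) p = r;
    std::swap(A[k], A[p]);
    std::swap(b[k], b[p]);
    for (std::size_t r = k + 1; r < n; ++r) {
      const double f = A[r][k] / A[k][k];
      for (std::size_t c = k; c < n; ++c) A[r][c] -= f * A[k][c];
      b[r] -= f * b[k];
    }
  }
  Vec x(n);
  for (std::size_t k = n; k-- > 0;) {  // back substitution
    double s = b[k];
    for (std::size_t c = k + 1; c < n; ++c) s -= A[k][c] * x[c];
    x[k] = s / A[k][k];
  }
  return x;
}

// Novelty N of correlation vector C with respect to the rows of corr_mat.
double novelty(const Mat &corr_mat, const Vec &C) {
  assert(!corr_mat.empty() && corr_mat[0].size() == C.size());
  const std::size_t Nconfig = corr_mat.size(), Ncorr = C.size();

  // avg_corr: average over the rows of corr_mat
  Vec avg(Ncorr, 0.0);
  for (const Vec &row : corr_mat)
    for (std::size_t j = 0; j < Ncorr; ++j) avg[j] += row[j] / Nconfig;

  // covar = corr_mat^T * corr_mat / Nconfig - avg^T * avg, plus epsilon * identity
  const double epsilon = 1e-5 / Ncorr;
  Mat covar(Ncorr, Vec(Ncorr, 0.0));
  for (std::size_t a = 0; a < Ncorr; ++a)
    for (std::size_t b = 0; b < Ncorr; ++b) {
      for (const Vec &row : corr_mat) covar[a][b] += row[a] * row[b] / Nconfig;
      covar[a][b] -= avg[a] * avg[b];
      if (a == b) covar[a][b] += epsilon;
    }

  // d = C - avg_corr;  y = inv(covar + epsilon * identity) * d
  Vec d(Ncorr);
  for (std::size_t j = 0; j < Ncorr; ++j) d[j] = C[j] - avg[j];
  const Vec y = solve(covar, d);

  double q = 0.0;
  for (std::size_t j = 0; j < Ncorr; ++j) q += d[j] * y[j];
  return std::sqrt(q / Ncorr);
}
```

In this sketch, a vector equal to the population average gives a novelty of zero, while a vector with a component along a direction in which the population has no variance is divided by epsilon alone and therefore yields a value well above 100.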

Definition at line 42 of file ConfigIONovelty.hh.

Public Types

enum  FormatterType
 
typedef DataObject DataObject
 
typedef long difference_type
 
typedef DataFormatterDictionary< DataObject, BaseDatumFormatter< DataObject > > DictType
 

Public Member Functions

 Novelty ()
 
std::unique_ptr< Novelty > clone () const
 
double evaluate (const Configuration &_config) const override
 
void init (const Configuration &_tmplt) const override
 
std::string short_header (const Configuration &_config) const override
 
bool parse_args (const std::string &args) override
 
virtual ValueType operator() (const DataObject &obj) const
 Return requested data from obj, throwing std::runtime_error if not valid. More...
 
virtual ValueType evaluate (const DataObject &obj) const =0
 
virtual void inject (const DataObject &_data_obj, DataStream &_stream, Index pass_index=0) const override
 Default implementation injects each element, via operator<<. More...
 
virtual void print (const DataObject &_data_obj, std::ostream &_stream, Index pass_index=0) const override
 Default implementation prints each element in a column, via operator<<. More...
 
virtual jsonParser & to_json (const DataObject &_data_obj, jsonParser &json) const override
 Default implementation calls jsonParser& to_json(const ValueType&, jsonParser&) More...
 
const std::string & name () const
 Returns a name for the formatter, which becomes the tag used for parsing. More...
 
const std::string & description () const
 Returns a short description of the formatter and its allowed arguments (if any). This description is used to automatically generate help screens. More...
 
virtual FormatterType type () const
 
const DictType & home () const
 Const access to the dictionary containing this formatter, set during DictType::lookup. More...
 
void set_home (const DictType &home) const
 Set the dictionary containing this formatter, set during DictType::lookup. More...
 
virtual void init (const DataObject &_template_obj) const
 
virtual bool validate (const DataObject &_data_obj) const
 Returns true if _data_obj has valid values for requested data. More...
 
virtual std::vector< std::string > col_header (const DataObject &_template_obj) const
 Returns a header string for each scalar produced by the formatter. Parsing the entries in the col_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a)", "comp(c)". More...
 
virtual std::string long_header (const DataObject &_template_obj) const
 Returns a long expression for each scalar produced by the formatter. Parsing the long_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a) comp(c)". More...
 
virtual std::string short_header (const DataObject &_template_obj) const
 Returns a short expression for the formatter. Parsing the short_header should allow the formatter to be recreated (but the short header does not specify a subset of the elements). Ex: "clex(formation_energy)" or "comp". More...
 
virtual Index num_passes (const DataObject &_data_obj) const
 

Protected Types

typedef multivector< Index >::X< 2 > IndexContainer
 

Protected Member Functions

void _parse_index_expression (const std::string &_expr)
 
void _add_rule (const std::vector< Index > &new_rule) const
 
const IndexContainer & _index_rules () const
 

Private Member Functions

Novelty * _clone () const override
 Clone. More...
 

Private Attributes

std::string m_selection
 specifies which selection to use as the population More...
 
Eigen::MatrixXd m_gram_mat
 Gram matrix, which defines the Mahalanobis scalar product. More...
 
Eigen::VectorXd m_avg_corr
 The average correlation vector of the population. More...
 
DataFormatter< Configuration > m_format
 Formatter which is used to obtain correlations. More...
 

Member Typedef Documentation

Definition at line 313 of file DataFormatter.hh.

Definition at line 315 of file DataFormatter.hh.

typedef multivector<Index>::X<2> CASM::BaseDatumFormatter< DataObject >::IndexContainer
protected inherited

Definition at line 435 of file DataFormatter.hh.

Member Enumeration Documentation

Definition at line 314 of file DataFormatter.hh.

Constructor & Destructor Documentation

CASM::ConfigIO::Novelty::Novelty ( )
inline

Definition at line 46 of file ConfigIONovelty.hh.

Member Function Documentation

void CASM::BaseDatumFormatter< DataObject >::_add_rule ( const std::vector< Index > &  new_rule) const
inline protected inherited

Definition at line 447 of file DataFormatter.hh.

Novelty* CASM::ConfigIO::Novelty::_clone ( ) const
inline override private virtual

Clone.

Implements CASM::BaseValueFormatter< ValueType, DataObject >.

Definition at line 76 of file ConfigIONovelty.hh.

const IndexContainer& CASM::BaseDatumFormatter< DataObject >::_index_rules ( ) const
inline protected inherited

Definition at line 451 of file DataFormatter.hh.

void CASM::BaseDatumFormatter< DataObject >::_parse_index_expression ( const std::string &  _expr)
protected inherited

Derived DatumFormatters have some optional functionality for parsing index expressions in order to make it easy to handle ranges such as:

formatter_name(3,4:8)

in which case DerivedDatumFormatter::parse_args() is called with the string "3,4:8". By dispatching that string to BaseDatumFormatter::_parse_index_expression(), m_index_rules will be populated with {{3,4},{3,5},{3,6},{3,7},{3,8}}.
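
The expansion described above can be illustrated with a small standalone sketch. It is not the CASM parser: expand_index_expression and Rules are hypothetical names, Index is aliased to long as an assumption, and only the two forms shown in the example (a single integer, or an ascending range lo:hi) are handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Index = long;                            // assumption: stand-in for CASM's Index
using Rules = std::vector<std::vector<Index>>; // hypothetical alias for the rule list

// Expand a comma-separated index expression into every combination of indices.
// Each field is either a single integer ("3") or an inclusive range ("4:8").
Rules expand_index_expression(const std::string &expr) {
  if (expr.empty()) return {};
  Rules rules{{}};  // start from a single empty rule and extend it field by field
  std::stringstream ss(expr);
  std::string field;
  while (std::getline(ss, field, ',')) {
    const auto colon = field.find(':');
    const Index lo = std::stol(field.substr(0, colon));
    const Index hi =
        (colon == std::string::npos) ? lo : std::stol(field.substr(colon + 1));
    assert(lo <= hi);  // descending ranges are not handled in this sketch
    Rules next;
    for (const auto &rule : rules)
      for (Index i = lo; i <= hi; ++i) {
        next.push_back(rule);      // copy the partial rule
        next.back().push_back(i);  // append the index for this field
      }
    rules = std::move(next);
  }
  return rules;
}
```

Called with "3,4:8", this sketch produces the five rules listed above, in the same order.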

std::unique_ptr<Novelty> CASM::ConfigIO::Novelty::clone ( ) const
inline

Definition at line 52 of file ConfigIONovelty.hh.

virtual std::vector<std::string> CASM::BaseDatumFormatter< DataObject >::col_header ( const DataObject & _template_obj) const
inline virtual inherited

Returns a header string for each scalar produced by the formatter. Parsing the entries in the col_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a)", "comp(c)".

  • Default uses col_header

Reimplemented in CASM::Base1DDatumFormatter< Container, DataObject >, and CASM::DatumFormatterAlias< DataObject >.

Definition at line 373 of file DataFormatter.hh.

const std::string& CASM::BaseDatumFormatter< DataObject >::description ( ) const
inline inherited

Returns a short description of the formatter and its allowed arguments (if any). This description is used to automatically generate help screens.

Definition at line 332 of file DataFormatter.hh.

double CASM::ConfigIO::Novelty::evaluate ( const Configuration & _config) const
override

Definition at line 80 of file ConfigIONovelty.cc.

template<typename ValueType, typename DataObject>
virtual ValueType CASM::BaseValueFormatter< ValueType, DataObject >::evaluate ( const DataObject & obj) const
pure virtual inherited

const DictType& CASM::BaseDatumFormatter< DataObject >::home ( ) const
inline inherited

Const access to the dictionary containing this formatter, set during DictType::lookup.

Definition at line 341 of file DataFormatter.hh.

void CASM::ConfigIO::Novelty::init ( const Configuration & _tmplt) const
override

Definition at line 12 of file ConfigIONovelty.cc.

virtual void CASM::BaseDatumFormatter< DataObject >::init ( const DataObject & _template_obj) const
inline virtual inherited

template<typename ValueType, typename DataObject>
virtual void CASM::BaseValueFormatter< ValueType, DataObject >::inject ( const DataObject & _data_obj,
DataStream & _stream,
Index  pass_index = 0 
) const
inline override virtual inherited

Default implementation injects each element, via operator<<.

Implements CASM::BaseDatumFormatter< DataObject >.

Reimplemented in CASM::Base1DDatumFormatter< Container, DataObject >.

Definition at line 747 of file DataFormatterTools.hh.

virtual std::string CASM::BaseDatumFormatter< DataObject >::long_header ( const DataObject & _template_obj) const
inline virtual inherited

Returns a long expression for each scalar produced by the formatter. Parsing the long_header should reproduce the exact query described by the formatter. Ex: "clex(formation_energy)" or "comp(a) comp(c)".

  • Default uses col_header

Definition at line 382 of file DataFormatter.hh.

const std::string& CASM::BaseDatumFormatter< DataObject >::name ( ) const
inline inherited

Returns a name for the formatter, which becomes the tag used for parsing.

Definition at line 326 of file DataFormatter.hh.

virtual Index CASM::BaseDatumFormatter< DataObject >::num_passes ( const DataObject & _data_obj) const
inline virtual inherited

If data must be printed on multiple rows, returns the number of rows needed to output all data from _data_obj. The DataFormatter class will subsequently pass over _data_obj multiple times to complete printing (if necessary).

Reimplemented in CASM::DatumFormatterAlias< DataObject >.

Definition at line 406 of file DataFormatter.hh.

template<typename ValueType, typename DataObject>
virtual ValueType CASM::BaseValueFormatter< ValueType, DataObject >::operator() ( const DataObject & obj) const
inline virtual inherited

Return requested data from obj, throwing std::runtime_error if not valid.

Definition at line 729 of file DataFormatterTools.hh.

bool CASM::ConfigIO::Novelty::parse_args ( const std::string &  args)
override virtual

If DatumFormatter accepts arguments, parse them here. Arguments are assumed to be passed from the command line via: formattername(argument1,argument2,...)

from which DerivedDatumFormatter::parse_args() receives the string "argument1,argument2,..." Returns true if parse is successful, false if not (e.g., takes no arguments, already initialized, malformed input, etc).
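
As a rough illustration of the convention described above, the following sketch separates a query such as "clex(formation_energy)" into the formatter name and the argument string that parse_args() would receive. FormatterQuery and split_query are hypothetical helpers, not CASM functions, and malformed input is treated here as a precondition violation, whereas parse_args() itself reports failure by returning false.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: not part of CASM.
struct FormatterQuery {
  std::string name;  // text before the opening parenthesis
  std::string args;  // text between the outermost parentheses
};

// Split "formattername(argument1,argument2)" into its name and argument string.
// A query without parentheses yields an empty argument string.
FormatterQuery split_query(const std::string &query) {
  const auto open = query.find('(');
  if (open == std::string::npos) return {query, ""};
  const auto close = query.rfind(')');
  // Sketch-only precondition: a closing parenthesis must follow the opening one.
  assert(close != std::string::npos && close > open);
  return {query.substr(0, open), query.substr(open + 1, close - open - 1)};
}
```

For the query "formatter_name(3,4:8)", this sketch yields the name "formatter_name" and the argument string "3,4:8".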

Reimplemented from CASM::BaseDatumFormatter< DataObject >.

Definition at line 39 of file ConfigIONovelty.cc.

template<typename ValueType, typename DataObject>
virtual void CASM::BaseValueFormatter< ValueType, DataObject >::print ( const DataObject & _data_obj,
std::ostream &  _stream,
Index  pass_index = 0 
) const
inline override virtual inherited

Default implementation prints each element in a column, via operator<<.

  • Prints "unknown" if validation fails

Implements CASM::BaseDatumFormatter< DataObject >.

Reimplemented in CASM::Base1DDatumFormatter< Container, DataObject >.

Definition at line 757 of file DataFormatterTools.hh.

void CASM::BaseDatumFormatter< DataObject >::set_home ( const DictType & home) const
inline inherited

Set the dictionary containing this formatter, set during DictType::lookup.

Definition at line 346 of file DataFormatter.hh.

std::string CASM::ConfigIO::Novelty::short_header ( const Configuration & _config) const
override

Definition at line 51 of file ConfigIONovelty.cc.

virtual std::string CASM::BaseDatumFormatter< DataObject >::short_header ( const DataObject & _template_obj) const
inline virtual inherited

Returns a short expression for the formatter. Parsing the short_header should allow the formatter to be recreated (but the short header does not specify a subset of the elements). Ex: "clex(formation_energy)" or "comp".

Reimplemented in CASM::DatumFormatterAlias< DataObject >, and CASM::DataFormatterOperator< ValueType, ArgType, DataObject >.

Definition at line 400 of file DataFormatter.hh.

template<typename ValueType, typename DataObject>
virtual jsonParser& CASM::BaseValueFormatter< ValueType, DataObject >::to_json ( const DataObject & _data_obj,
jsonParser & json 
) const
inline override virtual inherited

Default implementation calls jsonParser& to_json(const ValueType&, jsonParser&)

  • Does nothing if validation fails

Implements CASM::BaseDatumFormatter< DataObject >.

Definition at line 769 of file DataFormatterTools.hh.

virtual bool CASM::BaseDatumFormatter< DataObject >::validate ( const DataObject & _data_obj) const
inline virtual inherited

Returns true if _data_obj has valid values for requested data.

The default implementation always returns true.

Reimplemented in CASM::Generic1DDatumFormatter< Container, DataObject >, CASM::GenericDatumFormatter< ValueType, DataObject >, CASM::DatumFormatterAlias< DataObject >, and CASM::DataFormatterOperator< ValueType, ArgType, DataObject >.

Definition at line 363 of file DataFormatter.hh.

Member Data Documentation

Eigen::VectorXd CASM::ConfigIO::Novelty::m_avg_corr
mutable private

The average correlation vector of the population.

Definition at line 87 of file ConfigIONovelty.hh.

DataFormatter<Configuration> CASM::ConfigIO::Novelty::m_format
mutable private

Formatter which is used to obtain correlations.

Definition at line 90 of file ConfigIONovelty.hh.

Eigen::MatrixXd CASM::ConfigIO::Novelty::m_gram_mat
mutable private

Gram matrix, which defines the Mahalanobis scalar product.

Definition at line 84 of file ConfigIONovelty.hh.

std::string CASM::ConfigIO::Novelty::m_selection
mutable private

Specifies which selection to use as the population.

Definition at line 81 of file ConfigIONovelty.hh.


The documentation for this class was generated from the following files: