Completion Checks#

The CompletionCheck class can be used to check when a Monte Carlo simulation is complete. It is constructed with a CompletionCheckParams instance which allows:

setting the requested absolute or relative precision for convergence of sampled data
setting cutoff parameters for number of steps or passes, number of samples, amount of simulated time or amount of elapsed clocktime, with:
- minimums, which force the simulation to keep running until a minimium value is reached
- maximums, which force the simulation to stop running when a maximum value is reached
controlling when completion checks are performed
customizing the method used to check for equilibration
customizing the method used to calculate statistics

A CompletionCheckParams instance can be read from a Python dict for input file parsing.

Performing completion checks#

Example usage:

import libcasm.monte as monte

# ~~~ Initialize a random number engine ~~~
random_number_engine = monte.RandomNumberEngine()

# ~~~ User input: Specify the system ~~~
# For example, the prim, cluster expansions, composition axes, etc.
prim = ...
composition_axes = ...
formation_energy_clexulator = ...
formation_energy_coeff = ...

# ~~~ Construct the system ~~~
# Construct a data structure that makes system data accessible
# to the Monte Carlo calculator and sampling functions
system = SystemType(
    prim=prim,
    composition_axes=composition_axes,
    formation_energy_clexulator=formation_energy_clex_clexulator,
    formation_energy_coeff=formation_energy_clex_coeff,
    ...
)

# ~~~ Construct the Monte Carlo calculator ~~~
# A Monte Carlo calculator implements the Monte Carlo method and holds
# method specific data along with access to the system.
# For example, this would have the current configuration and conditions
mc_calculator = MonteCarloCalculatorType(
    system,
    random_number_engine,
    ...
)

# ~~~ Construct sampling functions ~~~
# Sampling functions should be given access to the mc_calculator to
# be able to take samples when called, and stored in a StateSamplingFunctionMap
sampling_functions = monte.StateSamplingFunctionMap()
_functions = [
    make_formation_energy_f(mc_calculator),
    make_potential_energy_f(mc_calculator),
    make_parametric_composition_f(mc_calculator),
]
for _f in _functions:
    sampling_functions[_f.name] = _f

# ~~~ User input: Set CompletionCheckParams ~~~
params = monte.CompletionCheckParams()
params.cutoff_params.max_count = 100

completion_check = monte.CompletionCheck(params)

# ~~~ User input: Specify quantities to be sampled ~~~
# should be keys into sampling_functions
quantities = [
    "formation_energy",
    "potential_energy",
    "parametric_composition",
]

# ~~~ Construct data samplers - for all requested quantities ~~~
samplers = monte.SamplerMap()
for quantity_name in quantities:
    if quantity_name in sampling_functions:
        f = sampling_functions[quantity_name]
        samplers[f.name] = monte.Sampler(
            shape=f.shape,
            component_names=f.component_names,
        )

# this is required, but can be left with 0 samples to indicate unweighted
sample_weight = monte.Sampler(shape=[])

# method log specifies where to periodically write status messages
# internally, method log tracks the elapsed clock time
method_log = monte.MethodLog(path_to_logfile)

n_steps = 0
i_sample_period = 0
sample_period = 1000
while not completion_check.count_check(
    samplers=samplers,
    sample_weight=sample_weight,
    count=n_steps,
    method_log=method_log,
):
    # ... do Monte Carlo step ...
    n_steps += 1

    # ... periodically sample data ...
    if i_sample_period == sample_period:
        for name, f in sampling_functions.items():
            samplers[name].append(f())

Examples#

Run for a user-specified number of steps:

import libcasm.monte as monte
params = monte.CompletionCheckParams()
params.cutoff_params.max_count = 100

completion_check = monte.CompletionCheck(params)

n_steps = 0
while not completion_check.count_check(
    samplers=samplers,
    sample_weight=sample_weight,
    count=n_steps,
    method_log=method_log,
):
    # ... do Monte Carlo step ...
    n_steps += 1

Set a limit for the maximum elapsed clock time (in seconds):

Note

Because checking the clock time at every Monte Carlo step slows down calculations excessively, clock time limits are only checked after the number of samples changes, and the actual clock time at which a simulation is stopped may longer than the limit set.

import libcasm.monte as monte
params = monte.CompletionCheckParams()
params.cutoff_params.max_count = 3.6e2 # stop after running at least 1 hr

completion_check = monte.CompletionCheck(params)

n_steps = 0
while not completion_check.count_check(
    samplers=samplers,
    sample_weight=sample_weight,
    count=n_steps,
    method_log=method_log,
):
    # ... do Monte Carlo step ...
    n_steps += 1

The ensemble average value of a property can be estimated from Metropolis Monte Carlo simulations as

\[\langle X \rangle \approx \bar{X} = \frac{\sum_l^N X_l}{N},\]

where:

\(X_l\) is the \(l\)-th of \(N\) observations of property \(X\)
\(\langle X \rangle\) is the ensemble average
\(\bar{X}\) is the mean of the observations.

The error, \(\bar{X} - \langle X \rangle\), is normally distributed and approaches zero in the limit of large \(N\) according to the central limit theorem,

\[\bar{X}-\langle X \rangle \approx \mathcal{N}(0, \sigma^2/N).\]

For stationary distributions, the variance of the error, \(\sigma^2\), can be calculated from the lag \(k\) autocovariance between observations \(X_j\) and \(X_{j+k}\) as

\[ \begin{align}\begin{aligned}\sigma^2 = \gamma_0 + 2 \sum^\infty_{k=1} \gamma_k,\\\gamma_k = \mathrm{Cov}\left( X_j, X_{j+k} \right).\end{aligned}\end{align} \]

After an initial equilibration period, samples drawn from Monte Carlo simulations have a stationary distribution. Therefore, the error can be estimated from the observations by estimating \(\sum^\infty_{k=1} \gamma_k\).

If the autocovariance decays like \(\gamma_k = \gamma_0 \rho^{-|k|}\), then the infinite sum can be evaluated to give

\[\sigma^2 = \gamma_0 \left(\frac{1+\rho}{1-\rho}\right).\]

Van de Walle and Asta introduced methods for determining the initial equilibration period from which observations should be discarded before calculating the mean, and for calculating \(\rho\) from the remaining observations to estimate the error in the mean. These methods are implemented in CASM in the default_equilibration_check() and BasicStatisticsCalculator methods, respectively, and used as the defaults for automatic convergence to user-specified precision.

Users may also implement and use alternative methods through the equilibration_check_f and calc_statistics_f parameters of the CompletionCheckParams class.

Equilibration check#

The default_equilibration_check() method partitions an array of observations into three ranges:

the equilibriation stage, [0, start1),
the first partition, [start1, start2),
and the second partition, [start2, N),

where N is len(observations), and start1 and start2 are indices into the observations array such that 0 <= start1 < start2 <= N, and the number of elements in the first and second partition are the same (within 1).

The simulation is considered equilibrated at observation start1 if the mean of the elements in the first and second partition are approximately equal to the desired precsion, (abs(mean1 - mean2) < prec).

Additionally, in CASM, the value start1 is incremented as much as needed to ensure that the equilibration stage has observations on either side of the overall mean.

The result of default_equilibration_check() is of type IndividualEquilibrationResult, which has two attributes which are set as follows:

If all observations are approximately equal, then:
- is_equilibrated = True
- N_samples_for_equilibration = 0
If the equilibration conditions are met, the result contains:
- is_equilibrated = true
- N_samples_for_equilibration = start1
If the equilibration conditions are not met, the result contains:
- is_equilibrated = false
- N_samples_for_equilibration = <undefined>

If samples are weighted, as in the n-fold way algorithm, then the same partitioning method is used, but with weighted observations calculated using:

weighted_observation[i] = sample_weight[i] * observation[i] * N / W

where:

W = np.sum(sample_weight)

The same weight_factor N/W applies for all properties.

Calculated precision#

The value of \(\rho\) depends on the details of the system and the Monte Carlo method. The method of estimating \(\rho\) introduced by Van de Walle and Asta `` and implemented in BasicStatisticsCalculator, calculates \(\hat{\rho}\), an estimate for \(\rho\), by searching for the smallest value of \(k\) for which \(\hat{\gamma}_k/\hat{\gamma}_0 \le 1/2\).

If samples are weighted, as in the n-fold way algorithm, then BasicStatisticsCalculator uses one of two optional approaches which re-sample the data to generate \(N'\) equally weighted observations. The weighted observations can be considered a time series, where the time interval associated with each sample value is equal to the sample weight. Then the time series can be sampled at \(N'\) regular intervals. Given the resampled data, the two approaches are:

Calculate \(\bar{X} = \sum_l X_l w_l / \sum_l w_l\) and \(\hat{\gamma}_0 = \mathrm{Var}(X)\) directly from the original observations and sample weights, and only calculate \(\hat{\rho}\) from resampled observations
Calculate \(\bar{X}\), \(\hat{\gamma}_0\), and \(\hat{\rho}\) from the resampled observations

Given \(\hat{\gamma}_0\) and \(\hat{\rho}\), the error in the mean, \(\bar{X} \pm p\), is calculated to a user-specified confidence level according to

\[p = \sqrt{2} * \mathrm{erf}^{-1}(c) \sqrt{ \frac{\hat{\gamma}_0}{N} \left(\frac{1+\hat{\rho}}{1-\hat{\rho}}\right) }\]

where \(c\) is the confidence level, and \(p\) is the calculated precision.