Training Local-Source Models
The three local-source models are trained through the same entry point,
scripts/run_train.py, and use the same basic data/config structure. The main
differences are the model selected with --model, whether formal charges are
needed, and which loss terms are physically meaningful.
This page covers local-source-specific training. The general config format is described in data and config files, and the density-coefficient conventions are described in atomic multipoles.
Minimal LocalCharges Example
A minimal LocalCharges config can train on energies, forces, dipoles, and
atomic multipoles:
heads:
  default:
    info_keys:
      energy: REF_energy
      dipole: REF_dipole
      external_field: external_field
      total_charge: total_charge
    arrays_keys:
      forces: REF_forces
      atomic_multipoles: REF_atomic_multipoles
train_schedule:
  0:
    name: stage1
    start: 0
    end: 99
    loss:
      atomic_multipoles: 100.0
      dipole: 1.0
      energy_per_atom: 1.0
      forces: 100.0
    lr: 0.01
  1:
    name: stage2
    start: 100
    end: 199
    loss:
      dipole: 10.0
      energy_per_atom: 1000.0
      forces: 100.0
    lr: 0.001
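To make the schedule semantics concrete, the following sketch mirrors the two stages above as a plain Python dictionary and looks up which stage is active at a given epoch. It assumes that start and end are inclusive epoch indices and that lr sits at the stage level next to loss; both are readings of the example rather than confirmed behaviour. The helper name active_stage is hypothetical and is not part of scripts/run_train.py.

```python
# Plain-Python mirror of the train_schedule section (illustration only).
TRAIN_SCHEDULE = {
    0: {"name": "stage1", "start": 0, "end": 99, "lr": 0.01,
        "loss": {"atomic_multipoles": 100.0, "dipole": 1.0,
                 "energy_per_atom": 1.0, "forces": 100.0}},
    1: {"name": "stage2", "start": 100, "end": 199, "lr": 0.001,
        "loss": {"dipole": 10.0, "energy_per_atom": 1000.0, "forces": 100.0}},
}


def active_stage(schedule, epoch):
    """Return the stage whose [start, end] range contains epoch.

    Assumes inclusive bounds; raises ValueError if no stage covers the epoch.
    """
    for index in sorted(schedule):
        stage = schedule[index]
        if stage["start"] <= epoch <= stage["end"]:
            return stage
    raise ValueError(f"no stage covers epoch {epoch}")


if __name__ == "__main__":
    for epoch in (0, 99, 100):
        stage = active_stage(TRAIN_SCHEDULE, epoch)
        print(epoch, stage["name"], sorted(stage["loss"]))
```

A lookup like this is a quick way to confirm that stages are contiguous and that each stage carries the loss terms you expect before starting a long run.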
Loss terms and their weights can differ between training stages: here stage2 drops atomic_multipoles and raises the energy_per_atom weight. A matching training command is:
python scripts/run_train.py \
  --name nonpol_example \
  --config config.yaml \
  --train_file train.xyz \
  --valid_fraction 0.2 \
  --test_file train.xyz \
  --E0s average \
  --model LocalCharges \
  --hidden_irreps '64x0e + 64x1o' \
  --r_max 6.0 \
  --batch_size 8 \
  --valid_batch_size 8 \
  --eval_interval 2 \
  --error_table PerAtomRMSE \
  --device cuda \
  --default_dtype float64 \
  --atomic_multipoles_max_l 1 \
  --atomic_multipoles_smearing_width 1.5 \
  --electrostatic_pbc_method mixed_periodic \
  --restart_latest \
  --save_cpu
Replace the REF_* keys with the actual extxyz keys in the dataset. At the
start of training, check the logged data-loading summary to make sure all
intended quantities were found.
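Before launching a run, the same check can be done offline. The sketch below inspects one extxyz comment line and reports which of the configured keys are absent. It is a simplified standard-library parser written for this page, not the loader used by scripts/run_train.py, and it only handles plain key=value and key="quoted value" entries; the helper names header_keys and missing_keys are hypothetical.

```python
import shlex


def header_keys(comment_line):
    """Return (info_keys, array_names) found in one extxyz comment line."""
    info, arrays = set(), set()
    for token in shlex.split(comment_line):
        key, _, value = token.partition("=")
        if key == "Properties":
            # Properties is a flat list of name:type:ncols triples.
            arrays.update(value.split(":")[0::3])
        else:
            info.add(key)
    return info, arrays


def missing_keys(comment_line, info_wanted, arrays_wanted):
    """List the wanted keys that the comment line does not provide."""
    info, arrays = header_keys(comment_line)
    return sorted(set(info_wanted) - info), sorted(set(arrays_wanted) - arrays)


if __name__ == "__main__":
    line = ('Lattice="10 0 0 0 10 0 0 0 10" '
            'Properties=species:S:1:pos:R:3:REF_forces:R:3 '
            'REF_energy=-1.5 REF_dipole="0 0 0.1" total_charge=0')
    print(missing_keys(line,
                       ["REF_energy", "REF_dipole", "total_charge"],
                       ["REF_forces", "REF_atomic_multipoles"]))
```

In this example the frame has no REF_atomic_multipoles column, so an atomic_multipoles loss term would have nothing to fit against for that frame.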
Common Parameters
These settings are shared by local-source models:
- --model selects the architecture. Local-source choices are LocalCharges, LocalSplitCharges, and FixedChargeBaselinedMACE.
- --atomic_multipoles_max_l selects the highest multipole order. See atomic multipoles.
- --atomic_multipoles_smearing_width sets the Gaussian smearing width. See atomic multipoles.
- --electrostatic_pbc_method selects the electrostatic boundary handling used during training. See boundary conditions.
For homogeneous datasets, use a fixed boundary mode such as pbc, slab, or
realspace. Use mixed_periodic only for intentionally mixed datasets.
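One way to decide whether a dataset counts as homogeneous is to tally the periodicity flags of its configurations. The sketch below assumes that homogeneity can be judged from per-configuration pbc flags alone, which this page does not state explicitly. It deliberately does not map flag patterns to the mode names pbc, slab, or realspace, since that mapping is documented under boundary conditions rather than here. The function names are hypothetical.

```python
from collections import Counter


def pbc_patterns(pbc_flags):
    """Count distinct periodicity patterns, e.g. (True, True, False)."""
    return Counter(tuple(bool(flag) for flag in pbc) for pbc in pbc_flags)


def is_homogeneous(pbc_flags):
    """True when every configuration shares one periodicity pattern."""
    return len(pbc_patterns(pbc_flags)) <= 1


if __name__ == "__main__":
    bulk_only = [(True, True, True)] * 3
    bulk_and_slab = [(True, True, True), (True, True, False)]
    print(is_homogeneous(bulk_only), pbc_patterns(bulk_only))
    print(is_homogeneous(bulk_and_slab), pbc_patterns(bulk_and_slab))
```

If the tally shows a single pattern, a fixed boundary mode is the natural choice; more than one pattern suggests either splitting the dataset or using mixed_periodic on purpose.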
Recommended Losses
As discussed in losses, you can compose a loss from many different options. As well as energy_per_atom and forces, with these models you can also use:
- atomic_multipoles, when reference density coefficients are available
- dipole_per_atom or dipole, when reference dipoles are available
These terms are optional: training works without them, but they can be fitted when reference data are available. For the LocalCharges model, it is recommended to also train on total_charge_per_atom or total_charge, because its predicted total charge is not guaranteed to be correct by construction. This does not apply to LocalSplitCharges or FixedChargeBaselinedMACE.
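To illustrate why a total-charge term helps LocalCharges, the sketch below computes the mismatch between the sum of predicted atomic charges and the reference total charge, normalised by the number of atoms. This is an illustrative quantity only; the exact form of the total_charge_per_atom loss in the code is not specified on this page, and the charges used are made-up values.

```python
def total_charge_per_atom_error(predicted_charges, reference_total_charge):
    """Signed error in the summed charge, divided by the number of atoms."""
    n_atoms = len(predicted_charges)
    return (sum(predicted_charges) - reference_total_charge) / n_atoms


if __name__ == "__main__":
    # Hypothetical per-atom charges for a neutral water molecule (O, H, H).
    charges = [-0.80, 0.42, 0.41]
    print(total_charge_per_atom_error(charges, 0.0))
```

Nothing in an unconstrained per-atom prediction forces the charges to sum to the reference value, so a penalty on this mismatch gives the optimiser a direct signal to reduce it.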
Settings for LocalCharges
LocalCharges does not require any additional settings beyond the general settings in concepts.
Settings for LocalSplitCharges
LocalSplitCharges requires formal charges. These formal charges define
the conserved total charge, while the model learns local charge transfers on
top.
There are two ways to provide them.
Fixed Per-Species Formal Charges
Use --atomic_formal_charges when each element has one formal charge. For example, for training on liquid water one would use:
--atomic_formal_charges "{1: 1.0, 8: -2.0}"
The dictionary keys are atomic numbers. The model construction code expects a
charge for every element in the training z_table.
Use this mode when formal oxidation states are fixed by chemistry and do not vary between configurations.
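The water example can be checked with a few lines of Python. The sketch below parses the dictionary string, verifies that every element present has a formal charge, and sums the formal charges over one molecule. Parsing with ast.literal_eval is an assumption made for this illustration; how scripts/run_train.py actually reads the flag is not described here, and the helper names are hypothetical.

```python
import ast


def parse_formal_charges(text):
    """Parse a string like "{1: 1.0, 8: -2.0}" into {atomic_number: charge}."""
    charges = ast.literal_eval(text)
    return {int(z): float(q) for z, q in charges.items()}


def check_coverage(formal_charges, atomic_numbers):
    """Return the atomic numbers that have no formal charge assigned."""
    return sorted(set(atomic_numbers) - set(formal_charges))


def formal_total_charge(formal_charges, atomic_numbers):
    """Sum the per-species formal charges over one configuration."""
    return sum(formal_charges[z] for z in atomic_numbers)


if __name__ == "__main__":
    charges = parse_formal_charges("{1: 1.0, 8: -2.0}")
    water = [8, 1, 1]  # O, H, H
    print(check_coverage(charges, water))
    print(formal_total_charge(charges, water))
```

For a neutral water molecule the formal charges sum to zero, which is the conserved total charge the model then redistributes through local charge transfer.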
Per-Atom Formal Charges From Data
Use --formal_charges_from_data when formal charges vary by atom or by
configuration:
--formal_charges_from_data
The config must then map an array key for charges:
heads:
default:
arrays_keys:
charges: formal_charges
Use this mode when the dataset already contains per-atom oxidation states or when the same element can appear in more than one formal charge state.
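For reference, a dataset frame that carries per-atom formal charges might look like the output of the sketch below, which formats one non-periodic extxyz frame with a formal_charges column matching the array key in the config above. The two-ion system is a toy example, the helper extxyz_frame is hypothetical, and real datasets would normally be written with a dedicated I/O library rather than by hand.

```python
def extxyz_frame(symbols, positions, formal_charges, total_charge):
    """Format one non-periodic extxyz frame with a formal_charges column."""
    header = ("Properties=species:S:1:pos:R:3:formal_charges:R:1 "
              f'total_charge={total_charge} pbc="F F F"')
    lines = [str(len(symbols)), header]
    for symbol, (x, y, z), charge in zip(symbols, positions, formal_charges):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f} {charge:.1f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy example: the same element in two different formal charge states.
    frame = extxyz_frame(
        symbols=["Fe", "Fe"],
        positions=[(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)],
        formal_charges=[2.0, 3.0],
        total_charge=5.0,
    )
    print(frame, end="")
```

Because the charge is stored per atom, the same element can carry different values within one frame, which is the case that a per-species dictionary cannot express.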
Settings for Polarizability Output
LocalSplitCharges can add a direct polarizability readout with:
--compute_polarizability
To train this output, the config must map a polarizability info key and the
loss should include polarizability. The local-source calculator exposes
polarizability for MACELocalSplitCharges models that were trained with
this readout.
Settings for FixedChargeBaselinedMACE
FixedChargeBaselinedMACE uses fixed formal monopoles as the long-range charge
density. It does not learn atomic multipoles or charge transfer. This model currently supports formal charges specified per species. Use --atomic_formal_charges as described above.