Software design goals and features
The mathematical and algorithmic structure of the basis function expansion (BFE) technique is common to all variants of the method. Object-oriented software patterns naturally promote code reuse by allowing the programmer to implement algorithms generally, with variants overriding the general case only when necessary. Moreover, the object-oriented approach allows the compiler to enforce the underlying mathematical and physical relationships that the code represents. This approach makes the code easier to maintain, verify, and debug.
The classes are organized into major abstract classes that represent functions, and relations between functions, common to n-body simulations and the BFE method in particular:
Phase space. Each phase-space point is described by a Particle, which contains the mass, position, velocity, and an arbitrary number of user-defined attributes (e.g. particular non-collisionless properties such as chemical state, or various per-particle diagnostics such as computational effort).
Distributions. Particles are grouped together into a Component container. A Component represents a particular distinct phase-space distribution. For example, EXP uses individual components to represent astronomical entities such as disks and dark-matter halos. A force method appropriate for the geometry of each phase-space distribution is associated with each component (PotAccel, see below). The Component instances are gathered into a container, the ComponentContainer. The ComponentContainer instance, for example, triggers the evaluation of the acceleration for every particle in each component. Specifically, the time-step routine calls the compute_potential() method of the simulation's ComponentContainer instance to begin a force evaluation.
Force methods. The PotAccel class defines the core interface for describing a gravitational potential and acceleration. From this abstract base class, the properties specific to basis function expansions are layered on by the derived class Basis. Further specializations include details specific to harmonic expansions in AxisymmetricBasis and the standard spherical and cylindrical geometries appropriate for halos and disks in Sphere and Cylinder, respectively. Most non-basis classes, such as Direct, which defines a direct ring code, have PotAccel as their parent class.
External forces. The ExternalForce class provides additional, possibly time-dependent, analytic or self-consistent forces to be applied to any or every component. These methods can be very general manipulators of the phase space, not necessarily forces per se. For example, EXP uses an external force class called PeriodicBC to enforce periodic boundary conditions. The user-specified external forces are gathered up into an ExternalForce list and evaluated at each step by the ComponentContainer. Per-particle accelerations may be added to each particle's acceleration vector if appropriate.
Output routines. The Output base class integrates over phase-space components to provide summary statistics of various sorts and snapshots of the phase space itself. Each Output instance either integrates over the entire phase space or over individually specified phase-space components. For example, the OutLog class provides the standard global quantities (center of mass and velocity, angular momentum, kinetic and potential energies, etc.) for each component and for the simulation as a whole. The OrbTrace class records the trajectory of a list of specific particles. The OutCoef class records the basis expansion coefficients. All of these Output instances are gathered up into an OutputContainer. After each time step, each instance is asked whether it wants to run; the run frequency is controlled by input parameters.
As described in EXP time stepping, Hamilton's equations are solved by a kick-drift-kick version of the time-centered leapfrog algorithm. In principle, time evolution could (and perhaps should) be defined by an evolution operator class; this may be done in a future version of EXP. At present, the time evolution is driven by calls to the ComponentContainer instance to evaluate the accelerations.
Overall organization of the code
The main() routine is defined in src/expand.cc. In essence, main() calls three subroutines:

begin_run(): this code initializes the three key container objects:

The ComponentContainer holds phase-space and force definitions for each distinct component. For example, a three-dimensional flattened disk or an approximately spherical halo, along with its associated force, would each be represented by a particular Component instance. The begin_run() routine instantiates the particular phase-space components defined in the YAML configuration and adds them to the ComponentContainer.

The ExternalCollection defines force methods that are applied to phase-space components but are not defined by the components themselves. For example, an external force, such as an analytic tidal field, would be instantiated as a member of the ExternalCollection. This collection may have zero members.

The OutputContainer keeps a collection of procedures that may operate on each component, or on specific components in the ComponentContainer, at a time-step frequency defined by the parameter specification for each Output method. These are specified for construction in the YAML configuration.

do_step(): called inside of main()'s time-step loop, do_step() implements the kick-drift-kick leapfrog time-integration algorithm described in multistep.

clean_up(): calls the OutputContainer run method one last time, and then deletes the three container instances created by begin_run().
Parallel usage
EXP is designed to use MPI for internode communication and parallelization, and POSIX threads (pthreads) and OpenMP for intranode parallelization. Although MPI can be used exclusively, without multiple threads per process, by starting one process for every core on each node, basis tables and other data will then be duplicated in each of these heavyweight processes. This may lead to poor allocation of memory resources and out-of-core conditions. By using multiple threads per process on compute nodes with multiple cores, the storage of internal arrays and tables is not duplicated.
The Global stanza (see YAML config) allows the user to specify the number of threads per process using the nthrds parameter. For example, on a cluster where nodes have 20 cores, specifying nthrds: 20 and one process per node, using

mpirun -bind-to none -npernode 1 exp -f myjob.yml

or, for a Slurm job,

srun --ntasks-per-node=1 exp -f myjob.yml

allows code bottlenecks to use pthreads rather than MPI for parallelization. For MPI parallelization only, specify nthrds: 1, which is the default value. Not all force methods are thread aware, but many are.
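For reference, the nthrds setting sits inside the Global stanza of the job's YAML configuration. A minimal sketch, in which nthrds is the only key taken from the text above (the real stanza carries many other run parameters):

```yaml
# Minimal sketch of the Global stanza; only nthrds is taken from the
# text above, and the real stanza contains many other parameters.
Global:
  nthrds: 20   # 20 worker threads per MPI process (one per core)
```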
Both pyEXP and EXP use OpenMP to parallelize bottleneck loops. The FieldGenerator in pyEXP, in particular, can achieve linear scaling with the number of available cores. To take advantage of this, set the OMP_NUM_THREADS environment variable to the number of available cores on your machine. For example, in Bash:

export OMP_NUM_THREADS=16

for a 16-core architecture.