$GDDI group (parallel runs only)
This group controls the partitioning of a large set of
processors into sub-groups of processors, each of which
might compute separate quantum chemistry tasks. If there
is more than one processor in a group, the task assigned to
that group will run in parallel within that processor
group. Note that the implementation of groups in DDI
requires that group boundaries fall on SMP node
boundaries, not on individual processor cores.
For example, the FMO method can farm out its different
monomer or dimer computations to different processor
subgroups. This is advantageous because the monomers are
fairly small and therefore do not scale to very many
processors, whereas the monomer, dimer, and perhaps trimer
calculations are very numerous and can be farmed out
across a large parallel system.
At present, only a few procedures in GAMESS can
utilize processor groups, namely
a) the FMO method which breaks large calculations into
many small ones,
b) VSCF, which has to evaluate the energy at many
geometries,
c) numerical derivatives which do the same calculation
at many geometries (for gradients, see NUMGRD in $CONTRL,
for hessians, see HESS=SEMINUM/FULLNUM in $FORCE),
d) replica-exchange MD (see REMD in $MD).
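As a small illustration (the NGROUP value here is just a
placeholder), an FMO job that wants its nodes split into
eight processor groups needs only

    $GDDI NGROUP=8 $END

in its input; the nodes fired up when GAMESS is launched
are then divided as evenly as possible into the eight
groups, and the FMO fragment tasks are farmed out among
them.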
NGROUP = the number of groups in GDDI. Default is 0, which
         means standard DDI (all processes in one group).
NSUBGR = the number of "subgroups" in GDDI/3.
         All cores are first divided into NGROUP worlds,
         then each world is divided into NSUBGR groups.
         At present, only two types of runs can use GDDI/3:
         1. the semi-analytic FMO Hessian and
         2. the minimum energy crossing point search with FMO.
         (Default: 0, do not divide)
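For instance (the group counts are illustrative only), a
semi-analytic FMO Hessian run could request

    $GDDI NGROUP=4 NSUBGR=2 $END

so that the cores are first divided into 4 worlds and each
world is then divided into 2 groups, for 8 groups in total.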
PAROUT = flag to create punch and log files for all nodes.
         If set to .FALSE., these files are opened only on
         the group masters.
BALTYP = load balancing at the group level; otherwise
         similar to BALTYP in $SYSTEM. The BALTYP in
         $SYSTEM controls intragroup load balancing, while
         the one in $GDDI controls intergroup load
         balancing. It applies only to FMO runs.
         (default is DLB)
NUMDLB = perform dynamic load balancing in blocks of
         NUMDLB indices.
         Using values larger than 1 issues fewer DLB
         requests, reducing the load on the master node.
         Default: 1
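As an illustration (the values are arbitrary), a many-group
run could coarsen the load-balancing granularity with

    $GDDI NGROUP=16 NUMDLB=5 $END

so that dynamic load balancing hands out work five indices
at a time instead of one.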
MANNOD = manual division of nodes into groups. Subgroups
         must split on node boundaries (a node contains one
         or more cores). Provide an array of node counts
         whose sum must equal the number of nodes fired up
         when GAMESS is launched.
         Note the distinction between nodes and cores (also
         called processors). If you are using six quad-core
         nodes, you might enter
            NGROUP=3 MANNOD(1)=2,2,2
         so that eight cores (two quad-core nodes) go into
         each subgroup.
         If MANNOD is not given (the most common case), the
         NGROUP groups are chosen to have equal numbers of
         nodes in them. For example, an 8-node run that
         asks for NGROUP=3 will set up 3,3,2 nodes/group.
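As a further illustration (hypothetical node counts), an
8-node run whose first group must handle larger fragments
could be divided unevenly with

    $GDDI NGROUP=3 MANNOD(1)=4,2,2 $END

giving the first group 4 nodes and the other two groups 2
nodes each; the sum 4+2+2=8 matches the number of nodes
launched.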
NODINP = a logical switch to turn on node-specific input
         (each node reads its own input file; note that you
         should modify rungms to copy those files), as
         required for REUS or REMD restarts.
         Default: .FALSE.
Note that nodes with very large core counts may be too
large for good scaling with certain kinds of subgroup runs.
Any such 'fat' nodes can be divided into "logical nodes" by
using the kickoff option :cpus= for TCP/IP-based runs, or
the environment variable DDI_LOGICAL_NODE_SIZE for MPI-
based runs. See the DDI instructions.
Note on memory usage in GDDI: Distributed memory MEMDDI is
allocated globally, MEMDDI/p words per computing process,
where p is the total number of processors. This means an
individual subgroup has access to MANNOD(i)*ncores*MEMDDI/p
words of distributed memory, where ncores is the number of
cores per node. Thus, if you use groups of
various sizes, each group will have different amounts of
distributed memory (which can be desirable if you have
fragments of various sizes in FMO).
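As a worked example with made-up numbers: suppose 8
quad-core nodes (p = 32 processes) are divided by
NGROUP=3 MANNOD(1)=4,2,2. Each process holds MEMDDI/32
words, so the three groups have access to
4*4*MEMDDI/32 = MEMDDI/2, 2*4*MEMDDI/32 = MEMDDI/4, and
MEMDDI/4 words of distributed memory, respectively.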
===========================================================