(21 June 2016)
**************************************
* *
* Section 5 - Programmer's Reference *
* *
**************************************
This section describes features of GAMESS programming
which are true for all machines. See the section 'hardware
specifics' for information about specific machines. The
contents of this section are:
Installation overview
Running Distributed Data Parallel GAMESS
parallelization history
DDI compute and data server processes
memory allocations and check jobs
representative performance examples
Altering program limits
Names of source code modules
Programming Conventions
Parallel broadcast identifiers
Disk files used by GAMESS
disk files in parallel runs
Contents of the direct access file 'DICTNRY'
--------------------------------------------------------------------
Installation overview
Very specific compiling directions are given in a file
provided with the GAMESS distribution, namely
~/gamess/machines/readme.unix
and this should be followed closely. The directions here
are of a more general nature.
Before starting the installation, you should also see
the pages about your computer in the 'Hardware Specifics'
section of this manual, and at the compiler version notes
that are kept in the script 'comp'. There might be some
special instructions for your machine.
The first step in installing GAMESS should be to print
the manual. If you are reading this, you've got that done!
The second step would be to get the source code activator
compiled and linked (note that the activator must be
activated manually before it is compiled). Third, you
should now compile all the quantum chemistry sources.
Fourth, compile the DDI message passing library, and its
process kickoff program. Fifth, link the GAMESS program.
Finally, run all the short examples provided with GAMESS,
and very carefully compare the key results shown in the
'sample input' section against your outputs. These
"correct" results are from a IBM RS/6000, so there may be
very tiny (last digit) precision differences for other
machines. That's it! The rest of this section gives a
little more detail about some of these steps.
* * * * *
GAMESS will run on essentially any machine with a
FORTRAN 77 compiler. However, even given the F77 standard
there are still a number of differences between various
machines. For example, some chips still use 32 bit
integers, as primitive as that may seem, while many chips
allow for 64 bit processing (and hence very large run-time
memory usage). It is also necessary to have a C compiler,
as the message passing library is implemented entirely in
that language.
Although there are many types of computers, there is
only one (1) version of GAMESS.
This portability is made possible mainly by keeping
machine dependencies to a minimum (that is, writing in
FORTRAN77, not vendor specific language extensions). The
unavoidable few statements which do depend on the hardware
are commented out, for example, with "*I64" in columns 1-4.
Before compiling GAMESS on a 64 bit machine, these four
columns must be replaced by 4 blanks. The process of
turning on a particular machine's specialized code is
dubbed "activation".
A semi-portable FORTRAN 77 program to activate the
desired machine dependent lines is supplied with the GAMESS
package as program ACTVTE. Before compiling ACTVTE on your
machine, use your text editor to activate the very few
machine dependent lines in ACTVTE before compiling it. Be
careful not to change the DATA initialization!
* * * * *
The quantum chemistry source code of GAMESS is in the
directory
~/gamess/source
and consists almost entirely of unactivated FORTRAN source
code, stored as *.src. There is a bit of C code in this
directory to implement runtime memory allocation.
The task of building an executable for GAMESS is:
activate compile link
*.SRC ---> *.FOR ---> *.OBJ ---> *.EXE
source FORTRAN object executable
code code code image
where the intermediate files *.FOR and *.OBJ are discarded
once the executable has been linked. It may seem odd at
first to delete FORTRAN code, but this can always be
reconstructed from the master source code using ACTVTE.
The advantage of maintaining only one master version is
obvious. Whenever any improvements are made, they are
automatically in place for all the currently supported
machines. There is no need to make the same changes in a
plethora of other versions.
* * * * *
The Distributed Data Interface (DDI) is the message
passing layer, supporting the parallel execution of GAMESS.
It is stored in the directory tree
~/gamess/ddi
It is necessary to compile this software, even if you don't
intend to run on more than one processor. This directory
contains a file readme.ddi with directions about compiling,
and customizing your computer to enable the use of System V
memory allocation routines. It also has information about
some high end parallel computer systems.
* * * * *
The control language needed to activate, compile, and
link GAMESS on your brand of computer involves several
scripts, namely:
COMP compiles a single quantum chemistry module.
COMPALL compiles all quantum chemistry source modules.
COMPDDI compiles the distributed data interface, and
generates a process kickoff program, ddikick.x.
LKED link-edit (links) together quantum chemistry
object code, and the DDI library, to produce a
binary executable gamess.x.
RUNGMS runs a GAMESS job, in serial or parallel.
RUNALL uses RUNGMS to run all the example jobs.
There are files related to some utility programs:
MBLDR.* model builder (internal to Cartesian)
CARTIC.* Cartesian to internal coordinates
CLENMO.* cleans up $VEC groups
DK3.F prepare relativistic AO contractions.
There are files related to two-D X-windows graphics, in:
~/gamess/graphics
Better back-end graphics (3D as well as 2D) is available in
the MacMolPlt program, now available for all popular
desktop operating systems.
Running Distributed Data Parallel GAMESS
GAMESS consists of many FORTRAN files implementing its
quantum chemistry, and some C language files implementing
the Distributed Data Interface (DDI). The directions for
compiling DDI, configuring the system parameters to permit
execution of DDI programs, and how to use the 'ddikick.x'
program which "kicks off" GAMESS processes may be found in
the file readme.ddi. If you are not the person installing
the GAMESS software, you can skip reading that.
Efficient use of GAMESS requires an understanding of
three critical issues: The first is the difference between
two types of memory (replicated MWORDS and distributed
MEMDDI) and how these relate to the physical memory of the
computer which you are using. Second, you must understand
to some extent the degree to which each type of computation
scales so that the proper number of CPUs is selected.
Finally, many systems run -two- GAMESS processes on every
processor, and if you read on you will find out why this is
so.
Since all code needed to implement the Distributed Data
Interface (DDI) is provided with the GAMESS source code
distribution, the program compiles and links ready for
parallel execution on all machine types. Of course, you
may choose to run on only one processor, in which case
GAMESS will behave as if it is a sequential code, and the
full functionality of the program is available.
parallelization history
We began to parallelize GAMESS in 1991 as part of the
joint ARPA/Air Force piece of the Touchstone Delta project.
Today, nearly all ab initio methods run in parallel,
although some of these still have a step or two running
sequentially only. Only the RHF+CI gradients have no
parallel method coded. We have not parallelized the semi-
empirical MOPAC runs, and probably never will. Additional
parallel work occurred as a result of a DoD CHSSI software
initiative in 1996. This led to the DDI-based parallel
RHF+MP2 gradient program, after development of the DDI
programming toolkit itself. Since 2002, the DoE program
SciDAC has sponsored additional parallelization. The DDI
toolkit has been used since its 1999 introduction to add
codes for UHF+MP2 gradient, ROHF+ZAPT2 energy, and MCSCF
wavefunctions as well as their analytic Hessians or MCQDPT2
energy correction.
In 1991, the parallel machine of choice was the Intel
Hypercube although small clusters of workstations could
also be used as a parallel computer. In order to have
the best blend of portability and functionality, we chose
in 1991 to use the TCGMSG message passing library rather
than one of the early vendor's specialized libraries. As
the major companies began to market parallel machines, and
as MPI version 1 emerged as a standard, we began to use
MPI on some equipment in 1996, while still using the very
resilient TCGMSG library on everything else. However, in
June 1999, we retired our old friend TCGMSG when the
message passing library used by GAMESS changed to the
Distributed Data Interface, or DDI. An SMP-optimized
version of DDI was included with GAMESS in April 2004.
Three people have been extremely influential upon the
current parallel methodology. Theresa Windus, a graduate
student in the early 1990s, created the first parallel
versions. Graham Fletcher, a postdoc in the late 1990s,
is responsible for the addition of distributed data
programming concepts. Ryan Olson rewrote the DDI software
in 2003-4 to support the modern SMP architectures well, and
this was released in April 2004 as our standard message
passing implementation.
DDI compute and data server processes
DDI contains the usual parallel programming calls, such
as initialization/closure, point to point messages, and
the collective operations global sum and broadcast. These
simple parts of DDI support all parallel methods developed
in GAMESS from 1991-1999, which were based on replicated
storage rather than distributed data. However, DDI also
contains additional routines to support distributed memory
usage.
DDI attempts to exploit the entire system in a scalable
way. While our early work concentrated on exploiting the
use of p processors and p disks, it required that all data
in memory be replicated on every one of the p CPUs. The
use of memory also becomes scalable only if the data is
distributed across the aggregate memory of the parallel
machine. The concept of distributed memory is contained in
the Remote Memory Access portion of MPI version 2, but so
far MPI-2 is not available from American computer vendors.
The original concept of distributed memory was implemented
in the Global Array toolkit of Pacific Northwest National
Laboratory (see http://www.emsl.pnl.gov/pub/docs/global).
Basically, the idea is to provide three subroutine
calls to access memory on other processors (in the local or
even remote nodes): PUT, GET, and ACCUMULATE. These give
access to a class of memory which is assumed to be slower
than local memory, but faster than disk:
<--- fastest slowest --->
registers cache(s) local_memory remote_memory disks tapes
<--- smallest biggest --->
Because DDI accesses memory on other CPUs by means of an
explicit subroutine call, the programmer is aware that a
message must be transmitted. This awareness of the access
overhead should encourage algorithms that transfer many
data items in a single message. Use of a subroutine call
to reach remote memory is a recognition of the non-uniform
memory access (NUMA) nature of parallel computers. In
other words, the Distributed Data Interface (DDI) is an
explicitly message passing implementation of global shared
memory.
In order to have one CPU pass data items to a second
CPU when the second CPU needs them, without significant
delay, the computing job on the first CPU must interrupt
its computation briefly to furnish the data. This type of
communication is referred to as "one sided messages" or
"active messages" since the first CPU is an unwitting
participant in the process, which is driven entirely by the
requirements of the second CPU.
The Cray T3E has a library named SHMEM to support this
type of one sided messages (and good hardware support for
this too) so, on the T3E, GAMESS runs as a single process
per CPU. Its memory image looks like this:
node 0 node 1
p=0 p=1
--------------- ---------------
| GAMESS | | GAMESS |
| quantum | | quantum |
| chem code | | chem code |
--------------- ---------------
| DDI code | | DDI code |
--------------- --------------- input keywords:
| replicated | | replicated | <-- MWORDS
| data | | data |
-----------------------------------------
| | | | | | <-- MEMDDI
| | distributed| | distributed | |
| | data | | data | |
| | | | | |
| | | | | |
| | | | | |
| --------------- --------------- |
-----------------------------------------
where the box drawn around the distributed data is meant to
imply that a large data array is residing in the memory of
all processes (in this example, half on one and half on the
other).
Note that the input keyword MWORDS gives the amount of
storage used to duplicate small matrices on every CPU,
while MEMDDI gives the -total- distributed memory required
by the job. Thus, if you are running on p CPUs, the memory
that is used on any given CPU is
total on any 1 CPU = MWORDS + MEMDDI/p
Since MEMDDI is very large, its units are in millions of
words. Since good execution speed requires that you not
exceed the physical memory belonging to your CPUs, it is
important to understand that when MEMDDI is large, you will
need to choose a sufficiently large number of CPUs to keep
the memory on each reasonable.
To repeat, the DDI philosophy is to add more processors
not just for their compute performance or extra disk space,
but also to aggregate a very large total memory. Bigger
problems will require more CPUs to obtain sufficiently
large total memories! We will give an example of how you
can estimate the number of CPUs a little ways below.
If the GAMESS task running as process p=1 in the above
example needs some values previously computed, it issues a
call to DDI_GET. The DDI routines in process p=1 then
figure out where this "patch" of data actually resides in
the big rectangular distributed storage. Suppose this is
on process p=0. The DDI routines in p=1 send a message to
p=0 to interupt its computations, after which p=0 sends a
bulk data message to process p=1's buffer. This buffer
resides in part of the replicated storage of p=1, where
computations can occur. Note that the quantum chemistry
layer of process p=1 was sheltered from most of the details
regarding which CPU owned the patch of data that process
p=1 wanted to obtain. These details are managed by the DDI
layer.
Note that with the exception of DDI_ACC's addition of
new terms into a distributed array, no arithmetic is done
directly upon the distributed data. Instead, distributed
data is accessed only by DDI_GET, DDI_PUT (its counterpart
for storage of data items), and DDI_ACC (which accumulates
new terms into the distributed data). DDI_GET and DDI_PUT
can be thought of as analogous to FORTRAN READ and WRITE
statements that transfer data between disk storage and
local memory where computations may occur.
It is the programmer's challenge to minimize the
number of GET/PUT/ACC calls, and to design algorithms that
maximize the chance that the patches of data are actually
within the local CPU's portion of the distributed data.
Since the SHMEM library is available only on a few
machines, all other platforms adopt the following memory
model, which involves ?two- GAMESS processes running on
every processor:
node 0 node 1
p=0 p=1
--------------- ---------------
| GAMESS X| | GAMESS X| compute
| quantum | | quantum | processes
| chem code | | chem code |
--------------- ---------------
| DDI code | | DDI code |
--------------- --------------- keyword:
| replicated | | replicated | <-- MWORDS
| data | | data |
--------------- ---------------
p=2 p=3
--------------- ---------------
| GAMESS | | GAMESS | data
| quantum | | quantum | servers
| chem code | | chem code |
--------------- ---------------
| DDI code X| | DDI code X|
--------------- ---------------
----------------------------------------- keyword:
| | | | | | <-- MEMDDI
| | distributed| | distributed | |
| | data | | data | |
| | | | | |
| | | | | |
| | | | | |
| --------------- --------------- |
-----------------------------------------
The first half of the processes do quantum chemistry, and
the X indicates that they spend most of their time
executing some sort of chemistry. Hence the name "compute
process". Soon after execution, the second half of the
processes call a DDI service routine which consists of an
infinite loop to deal with GET, PUT, and ACC requests until
such time as the job ends. The X shows that these "data
servers" execute only DDI support code. (This makes the
data server's quantum chemistry routines the equivalent of
the human appendix). The whole problem of interupts is now
in the hands of the operating system, as the data servers
are distinct processes. To follow the same example as
before, when the compute process p=1 needs data that turns
out to reside on process 0, a request is sent to the data
server p=2 to transfer information back to the compute
process p=1. The compute process p=0 is completely unaware
that such a transaction has occurred.
The formula for the memory required by any single CPU
is unchanged, if p is the total number of CPUs used,
total on any 1 CPU = MWORDS + MEMDDI/p.
As a technical matter, if you are running on a system
where all processors are in the same node (the SGI Altix is
an example), or if you are running on an IBM SP where LAPI
assists in implementing one-sided messaging, then the data
server processes are not started. The memory model in the
illustration above is correct, if you just mentally omit
the data server processes from it. In all cases, where the
SHMEM library is not used, the distributed arrays are
created by System V memory calls, shmget/shmat, and their
associated semaphore routines. Your system may need to be
reconfigured to allow allocation of large shared memory
segments, see 'readme.ddi' for more details.
The parallel CCSD and CCSD(T) programs add a third kind
of memory to the mix: node-replicated. This is data (e.g.
the doubles amplitudes) that is stored only once per node.
Thus, this is more copies of the data than once per
parallel job (fully distributed MEMDDI) but fewer than once
per CPU (replicated MWORDS). A picture of the memory model
for the CCSD(T) program can be found in the "readme.ddi"
file, so is not duplicated here. There is presently no
keyword for this type of memory, but the system limit on
the total SystemV memory does apply. It is important to
perform a check run when using CCSD(T) and carefully follow
the printout of its memory requirements.
memory allocations and check jobs
At present, not all runs require distributed memory.
For example, in an SCF computation (no hessian or MP2 to
follow) the memory needed is on the order of the square of
the basis set size, for such quantities as the orbital
coefficients, density, Fock, overlap matrices, and so on.
These are simply duplicated on every CPU in the MWORDS (or
the older keyword MEMORY in $SYSTEM) region. In this case
the data server processes still run, but are dormant
because no distributed memory access is attempted.
However, closed and open shell MP2 calculations, MCSCF
wavefunctions, and their analytic hessian or MCQPDT energy
correction do use distributed memory when run in parallel.
Thus it is important to know how to obtain the correct
value for MEMDDI in a check run, and how to compute how
many CPUs are needed to do the run.
Check runs (EXETYP=CHECK) need to run quickly, and the
fastest turn around always comes on one CPU only. Runs
which do not currently exploit MEMDDI distributed storage
will formally allocate their MWORDS needs, and feel out
their storage needs while skipping almost all of the real
work. Since MWORDS is replicated, the amount that is
needed on 1 CPU remains unchanged if you later do the true
computation on more than 1 CPU.
Check jobs which involve MEMDDI storage are a little
bit trickier. As noted, we want to run on only 1 CPU to
get fast turn around. However, MEMDDI is typically a large
amount of memory, and this is unlikely to be available on
a single CPU. The solution is that the check job will not
actually allocate the MEMDDI storage, instead it just
remembers what you gave as input and checks to see if this
will be adequate. As someone once said, MEMDDI is a "fairy
tale number" during a check job. So, you can input a big
value like MEMDDI=25000 (25,000 million words is equal to
25,000 * 1,000,000 * 8 = 200 GBytes) and run this check job
on a computer with only 1024 MB = 1 GB of memory per
processor. Let us say that a closed shell MP2 check run
for this case gives the output of
SCALED *PER-NODE* MEMORY REQUIREMENT
NODES DISTRIBUTED/MWORDS REPLICATED/WORDS TOTAL/MBYTES
1 952 7284508 7624
The real run requires MWORDS=8 MEMDDI=960. Note that we
have just rounded up a bit from the 7.3 and 952 in this
output, for safety.
Of course, the actual computation will have to run on a
large number of such processors, because you don't have 200
GB on your CPU, we are assuming 1024 MB (1 GB). Let us
continue to compute how many processors are needed. We
need to reserve some memory for the operating system (25
MB, say) and for the GAMESS program and local storage (let
us say 50 MB, for GAMESS is a big program, and the compute
processes should be swapped into memory). Thus our
hypothetical 1024 MB processor has 950 MB available,
assuming no one else is running. In units of words, this
machine has 950/8 = 118 million words available for your
run. We must choose the number of processors p to satisfy
needed <= available
MWORDS + MEMDDI/p <= free physical memory
8 + 960/p <= 118
so solving for p, we learn this example requires p >= 9
compute processes. The answer for roughly 8 GB of
distributed memory on 1 GB processors was not 8, because
the O/S, GAMESS itself, and the MWORDS requirements
together mean less than 1 GB could be contributed to the
distributed total. More CPUs than 9 do not require
changing MWORDS or MEMDDI, but will run faster than 9.
Fewer CPUs than 9 do not have enough memory to run!
One more subtle point about CHECK runs with MEMDDI is
that since you are running on 1 CPU only, the code does not
know that you wish to run the parallel algorithm instead of
the sequential algorithm. You must force the CHECK job
into the parallel section of the program by
$system parall=.true. $end
There's no harm leaving this line in for the true runs, as
any job with more than one compute process is parallel
regardless of the input keyword PARALL.
The check run for MCQDPT jobs will print three times
a line like this
MAXIMUM MEMDDI THAT CAN BE USED IN ... IS x MWORDS
Typically the 2nd such step, transforming over all
occupied and virtual canonical orbitals, will be the
largest of the three requirements. Its size can be
guesstimated before running, as
(Nao*Nao+Nao)/2 * ((Nocc*Nocc+Nocc)/2 + Nocc*Nvirt)
where Nocc = NMOFZC+NMODOC+NMOACT, Nvirt=NMOEXT, and
Nao is the size of the atomic basis. Unlike the closed
shell MP2 program, this section still does extensive
I/O operations even when MEMDDI is used, so it may be
useful to consider the three input keywords DOORD0,
PARAIO, and DELSCR when running this code.
representative performance examples
This section describes the way in which the various
quantum chemistry computations run in parallel, and shows
some typical performance data. This should give you as the
user some idea how many CPUs can be efficiently used for
various SCFTYP and RUNTYP jobs
The performance data you will see below were obtained
on a 16 CPU Intel Pentium II Linux (Beowulf-type) cluster
costing $49,000, of which $3,000 went into the switched
Fast Ethernet component. 512 MB/CPU means this cluster has
an aggregate memory of 8 GB. For more details, see
http://www.msg.chem.iastate.edu/GAMESS/dist.pc.shtml.
This is a low quality network, which exposes jobs with
higher communication requirements, by noting when the wall
time is much longer than the CPU.
---
The HF wavefunctions can be evaluated in parallel using
either conventional disk storage of the integrals, or via
direct recomputation of the integrals. Some experimenting
will show which is more effective on your hardware. As an
example of the scaling performance of RHF, ROHF, UHF, or
GVB jobs that involve only computation of the energy or its
gradient, we include here a timing table from the 16 CPU PC
cluster. The molecule is luciferin, which together with the
enzyme luciferase is involved in firefly light production.
The chemical formula is C11N2S2O3H8, and RHF/6-31G(d) has
294 atomic orbitals. There's no molecular symmetry. The
run is done as direct SCF, and the CPU timing data is
p=1 p=2 p=4 p=8 p=16
1e- ints 1.1 0.6 0.4 0.3 0.2
Huckel guess 14 12 11 10 10
15 RHF iters 5995 2982 1493 772 407
properties 6.0 6.6 6.6 6.8 6.9
1e- gradient 9.7 4.7 2.3 1.2 0.7
2e- gradient 1080 541 267 134 68
---- ---- ---- ---- ----
total CPU 7106 3547 1780 925 492 seconds
total wall 7107 3562 1815 950 522 seconds
Note that direct SCF should run with the wall time very
close to the CPU time as there is essentially no I/O and
not that much communication (MEMDDI storage is not used by
this kind of run). Running the same molecule as
DFTTYP=B3LYP yields
p=1 p=2 p=4 p=8 p=16
1e- ints 1.1 0.7 0.3 0.3 0.2
Huckel guess 14 12 10 10 9
23 DFT iters 14978 7441 3681 1876 961
properties 6.6 6.4 6.5 7.0 6.5
1e- gradient 9.7 4.7 2.3 1.3 0.7
2e- grid grad 5232 2532 1225 595 303
2e- AO grad 1105 550 270 136 69
---- ---- ---- ---- ----
total CPU 21347 10547 5197 2626 1349
total wall 21348 10698 5368 2758 1477
and finally if we run an RHF analytic hessian, using AO
basis integrals, the result is
p=1 p=2 p=4 p=8 p=16
1e- ints 1.2 0.6 0.4 0.3 0.2
Huckel guess 14 12 10 10 10
14 RHF iters 5639 2851 1419 742 390
properties 6.4 6.5 6.6 7.0 6.7
1e- grd+hss 40.9 20.9 11.9 7.7 5.8
2e- grd+hss 21933 10859 5296 2606 1358
CPHF 40433 20396 10016 5185 2749
----- ----- ----- ---- ----
total CPU 68059 34146 16760 8559 4519
total wall 68102 34273 17430 9059 4978
CPU speedups for 1->16 processors for RHF gradient, DFT
gradient, and RHF analytic hessian are 14.4, 15.8, and 15.1
times faster, respectively. The wall clock times are close
to the CPU time, indicating very little communication is
involved. If you are interested in an explanation of how
the parallel SCF is implimented, see the main GAMESS paper,
M.W.Schmidt, K.K.Baldridge, J.A.Boatz, S.T.Elbert,
M.S.Gordon, J.H.Jensen, S.Koseki, N.Matsunaga,
K.A.Nguyen, S.J.Su, T.L.Windus, M.Dupuis, J.A.Montgomery
J.Comput.Chem. 14, 1347-1363(1993)
---
The CIS energy and gradient code is also programmed to
have the construction of Fock-like matrices as its
computational kernel. Its scaling is therefore very
similar to that just shown, for porphin C20N4H14, DH(d,p)
basis, 430 AOs:
p=1 p=2 p=4 p=8 p=16
setup 25 25 25 25 25
1e- ints 5.1 2.7 1.5 1.0 0.6
orb. guess 30 25 23 22 21
RHF iters 1647 850 452 251 152
RHF props 19 19 19 19 19
CIS energy 36320 18166 9098 4620 2398
CIS lagrang 6092 3094 1545 786 408
CPHF 20099 10183 5163 2688 1444
CIS density 2468 1261 632 324 170
CIS props 19 19 19 19 19
1e- grad 40.9 18.2 9.2 4.7 2.4
2e- grad 1644 849 423 223 122
----- ----- ---- ---- ----
total CPU 68424 34526 17420 8994 4791
total wall 68443 34606 17853 9258 4985
which is a speedup of 14.3 for 1->16.
---
For the next type of computation, we discuss the MP2
correction. For closed shell RHF + MP2 and unrestricted
UHF + MP2, the gradient program runs in parallel using
distributed memory, MEMDDI. In addition, the ROHF + MP2
energy correction for OSPT=ZAPT runs in parallel using
distributed memory, but OSPT=RMP does not use MEMDDI in
parallel jobs. All distributed memory parallel MP2 runs
resemble RHF+MP2, which is therefore the only example given
here.
The example is a benzoquinone precursor to hongconin, a
cardioprotective natural product. The formula is C11O4H10,
and 6-31G(d) has 245 AOs. There are 39 valence orbitals
included in the MP2 treatment, and 15 core orbitals.
MEMDDI must be 156 million words, so the memory computation
that was used above tells us that our 512 MB/CPU PC cluster
must have at least three processors to aggregate the
required MEMDDI. MOREAD was used to provide converged RHF
orbitals, so only 3 RHF iterations are performed. The
timing data are CPU and wall times (seconds) in the 1st/2nd
lines:
p=3 p=4 p=12 p=16
RHF iters 241 181 65 51
243 184 69 55
MP2 step 5,953 4,399 1,438 1,098
7,366 5,669 2,239 1,700
2e- grad 1,429 1,135 375 280
1,492 1,183 413 305
----- ----- --- ---
total CPU 7,637 5,727 1,890 1,440
total wall 9,116 7,053 2,658 2,077
3-->12 4-->16
CPU speedup 4.04 3.98
wall speedup 3.43 3.40
The wall clock time will be closer to the CPU time if the
quality of the network between the computer is improved
(remember, this run used just switched Fast Ethernet). As
noted, the number of CPUs is more influenced by a need to
aggregate the necessary total MEMDDI, more than by concerns
about scalability. MEMDDI is typically large for MP2
parallel runs, as it is proportional to the number of
occupied orbitals squared times the number of AOs squared.
For more details on the distributed data parallel MP2
program, see
G.D.Fletcher, A.P.Rendell, P.Sherwood
Mol.Phys. 91, 431-438(1997)
G.D.Fletcher, M.W.Schmidt, M.S.Gordon
Adv.Chem.Phys. 110, 267-294 (1999)
G.D.Fletcher, M.W.Schmidt, B.M.Bode, M.S.Gordon
Comput.Phys.Commun. 128, 190-200 (2000)
---
The next type of computation we will consider is
analytic computation of the nuclear Hessian (force constant
matrix). The performance of the RHF program, based on AO
integrals, was given above, as its computational kernel
(Fock-like builds) scales just as the SCF itself. However,
for high spin ROHF, low spin open shell SCF and TCSCF (both
done with GVB), the only option is MO basis integrals. The
integral transformation is parallel according to
T.L.Windus, M.W.Schmidt, M.S.Gordon
Theoret.Chim.Acta 89, 77-88(1994).
It distributes 'passes' over processors, so as to
parallelize the transformation's CPU time but not the
replicated memory, or the AO integral time. Finally the
response equation step is hardly parallel at all. The test
example is an intermediate in the ring opening of
silacyclobutane, GVB-PP(1) or TCSCF, 180 AOs for 6-
311G(2d,2p):
p=1 p=2 p=4 p=8 p=16
2e- ints 83 42 21 11 5
GVB iters 648 333 179 104 67
replicate 2e- n/a 81 81 81 82
transf. 476 254 123 67 51
1e- grd+hss 7 4 2 2 1
2e- grd+hss 4695 2295 1165 596 313
CP-TCSCF 344 339 331 312 325
---- ---- ---- ---- ----
total CPU 6256 3351 1904 1189 848
total wall 6532 3538 2072 1399 1108
Clearly, the final response equation (CPHF) step is a
sequential bottleneck, as is the fact that the orbital
hessian in this step is stored entirely on the disk space
of CPU 0. Since the integral transformation is run in
replicated MWORDS memory, rather than distributing this,
and since it also needs a duplicated AO integral file be
stored on every CPU, the code is clearly not scalable to
very many processors. Typically we would not request more
than 3 or 4 processors for an analytic ROHF or GVB hessian.
The final analytic hessian type is for MCSCF. The
scalability of the MCSCF wavefunction will be given just
below, but the response equation step for MCSCF is clearly
quite scalable. The integral transformation for the
response equation step uses distributed memory MEMDDI, and
should scale like the MP2 program (documented above). The
test case has 8e- in 8 orbitals, and the time reflect this,
with most of the work involving the 4900 determinants.
Total speedup for 4->16 is 4.11, due to luckier work
distributing for 16 CPUs:
p=4 p=16
MCSCF wfn 113.5 106.1
DDI transf. 68.4 19.3
1e- grd+hss 1.5 0.6
2e- grd+hss 2024.9 509.8
CPMCHF RHS 878.8 225.8 (RHS=right hand
sides)
CPMCHF iters 115343.5 27885.9
-------- --------
total CPU 118430.8 28747.6
total wall 119766.0 30746.4
This code can clearly benefit from using many processors,
with scalability of the MCSCF step itself almost moot.
---
Now lets turn to MCSCF energy/gradient runs. We will
illustrate two convergers, SOSCF and then FULLNR. The
former uses a 'pass' type of integral transformation (ala
the GVB hessian job above), and runs in replicated memory
only (no MEMDDI). The FULLNR converger is based on the MP2
program's distributed memory integral transformation, so it
uses MEMDDI. In addition, the parallel implementation of
the FULLNR step never forms the orbital hessian explicitly,
doing Davidson style iterations to predict the new
orbitals. Thus the memory demand is almost entirely
MEMDDI.
The example we choose is at a transition state for the
water molecule assisted proton transfer in the first
excited stat of 7-azaindole. The formula is C7N2H6(H2O),
there are 190 active orbitals, and the active space is the
10 pi electrons in 9 pi orbitals of the azaindole portion.
There are 15,876 determinants used in the MCSCF
calculation, and 5,292 CSFs in the perturbation calculation
to follow. See Figure 6 of G.M.Chaban, M.S.Gordon
J.Phys.Chem.A 103, 185-189(1999) if you are interested in
this chemistry. The timing data for the SOSCF converger
are
p=1 p=2 p=4 p=8 p=16
dup. 2e- ints 327.6 331.3 326.4 325.8 326.5
transform. 285.1 153.6 88.4 57.8 47.3
det CI 39.3 39.4 38.9 38.3 38.1
2e- dens. 0.4 0.5 0.5 0.5 0.5
orb. update 39.2 25.9 17.4 12.8 11.0
iters 2-16 5340.0 3153.5 2043.7 1513.6 1308.5
1e- grad 5.3 2.3 1.3 0.7 0.4
2e- grad 695.6 354.9 179.4 93.2 50.9
------ ------ ------ ------ ------
total CPU 6,743 4,071 2,705 2,052 1,793
total wall 13,761 8,289 4,986 3,429 3,899
whereas the FULLNR convergers runs like this
p=1 p=2 p=4 p=8 p=16
2e- DDI trans. 2547 1385 698 354 173
det. CI 39 39 38 38 38
DM2 0.5 0.5 0.5 0.5 0.5
FULLNR 660 376 194 101 51
iters 2-9 24324 13440 6942 3669 1940
1e- grad 5.3 2.3 1.2 0.7 0.4
2e- grad 700 352 181 95 51
------ ------ ---- ---- ----
total CPU 28, 15,605 8,066 4,268 2,265
total wall 28,290 20,719 12,866 8,292 5,583
The first iteration is broken down into its primary steps
from the integral transformation to the orbital update,
inclusive. The SOSCF program is clearly faster, and should
be used when the number of processors is modest (say up to
8), however the largest molecules will benefit from using
more processors and the much more scalable FULLNR program.
One should note that the CI calculation was more or less
serial here. This data comes from before the ALDET and
ORMAS codes were given a replicated memory parallization,
so scaling in the CI step should now be OK, to say 8 or 16
CPUs. However, these two CI code's use of replicated
memory in the CI step limits MCSCF scalability in the large
active space limit.
Now let's consider the second order pertubation
correction for this example. As noted, it is an excited
state, so the test corrects two states simultaneously (S0
and S1). The parallel multireference perturbation program
is described in
H.Umeda, S.Koseki, U.Nagashima, M.W.Schmidt
J.Comput.Chem. 22, 1243-1251 (2001)
The run is given the converged S1 orbitals, so that it can
skip directly to the perturbation calculation:
p=1 p=2 p=4 p=8 p=16
2e- ints 332 332 329 328 331
MCQDPT 87921 43864 22008 11082 5697
----- ----- ----- ----- -----
total CPU 88261 44205 22345 11418 6028
total wall 91508 45818 23556 12350 6852
This corresponds to a speedup for 1->16 of 14.6.
---
In summary, most ab initio computations will run in
less time on more than one processor. However, some things
can be run only on 1 CPU, namely
semi-empirical runs
RHF+CI gradient
Coupled-Cluster calculations
Some steps run with little or no speedup, forming
sequential bottlenecks that limit scalability. They do not
prevent jobs from running in parallel, but restrict the
total number of processors that can be effectively used:
ROHF/GVB hessians: solution of response equations
MCSCF: Hamiltonian and 2e- density matrix (CI)
energy localizations: the orbital localization step
transition moments/spin-orbit: the final property step
MCQDPT reference weight option
Future versions of GAMESS will address these bottlenecks.
A short summary of the useful number of CPUs (based on
data like the above) would be
RHF, ROHF, UHF, GVB energy/gradient, their
DFT analogs, and CIS excited states 16-32+
MCSCF energy/gradient
SOSCF 4-8
FULLNR 8-32+
analytic hessians
RHF 16-32+
ROHF/GVB 4-8
MCSCF 64-128+
MPLEVL=2
RHF, UHF, ROHF OSPT=ZAPT 8-256+
ROHF OSPT=RMP energy 8
MCSCF 16+
Altering program limits
Almost all arrays in GAMESS are allocated dynamically,
but some variables must be held in common as their use is
ubiquitous. An example would be the common block /NSHEL/
which holds the ab initio atom's basis set. The following
Unix script, which we call 'mung' (see Wikipedia entry for
recursive acronyms), changes the PARAMETER statements that
set various limitations:
#!/bin/csh
#
# automatically change GAMESS' built-in dimensions
#
chdir /u1/mike/gamess/source
#
foreach FILE (*.src)
set FILE=$FILE:r
echo ===== redimensioning in $FILE =====
echo "C dd-mmm-yy - SELECT NEW DIMENSIONS" \
> $FILE.munged
sed -e "/MXATM=2000/s//MXATM=500/" \
-e "/MXAO=8192/s//MXAO=2047/" \
-e "/MXGSH=30/s//MXGSH=30/" \
-e "/MXSH=5000/s//MXSH=1000/" \
-e "/MXGTOT=20000/s//MXGTOT=5000/" \
-e "/MXRT=100/s//MXRT=100/" \
-e "/MXFRG=1050/s//MXFRG=65/" \
-e "/MXDFG=5/s//MXDFG=1/" \
-e "/MXPT=2000/s//MXPT=100/" \
-e "/MXFGPT=12000/s//MXFGPT=2000/" \
-e "/MXSP=500/s//MXSP=100/" \
-e "/MXTS=20000/s//MXTS=2500/" \
-e "/MXABC=6000/s//MXABC=1/" \
$FILE.src >> $FILE.munged
mv $FILE.munged $FILE.src
end
exit
The script shows how to reduce memory, by decreasing
the number of atoms and basis functions, and reducing the
storage for the effective fragment and PCM solvent models.
Of course, the 'mung' script can also be used to
increase the dimensions!
To fully turn off effective fragment storage, use
MXFRG=4, MXDFG=1, MXPT=8, MXFGPT=4. To fully turn off PCM
storage, use MXSP=1, MXTS=1. The parameters currently used
for GAMESS imply about 75 MBytes of storage tied up in
common blocks, which is not unreasonable, even in a laptop.
Reducing the storage size makes sense mainly on microkernel
type systems, without virtual memory managers.
In this script,
MXATM = max number of ab initio atoms
MXAO = max number of basis functions
MXGSH = max number of Gaussians per shell
MXSH = max number of symmetry unique shells
MXGTOT= max number of symmetry unique Gaussians
MXRT = max number of MCSCF/CI states
MXFRG = max number of effective fragment potentials
MXDFG = max number of different effective fragments
MXPT = max number of points in any one term of any EFP
MXFGPT= maximum storage for all EFPs, and is sized for
a large number of EFPs with a small number of
points (solvent applications), or a smaller
number of EFPs with many points (biochemistry).
MXSP = max number of spheres (sfera) in PCM
MXTS = max number of tesserae in PCM
MXABC = max number of A,B,C matrices in the COSMO
algorithm. The default value of 6000 allows
the construction of cavities for roughly 150
to 200 atoms.
Names of source code modules
The source code for GAMESS is divided into a number of
sections, called modules, each of which does related
things, and is a handy size to edit. The following is a
list of the different modules, what they do, and notes on
their machine dependencies.
machine
module description dependency
------- ------------------------- ----------
ALDECI Ames Lab determinant full CI code 1
ALGNCI Ames Lab determinant general CI code
BASCCN Dunning cc-pVxZ basis sets
BASECP SBKJC and HW valence basis sets
BASEXT DH, MC, 6-311G extended basis sets
BASG3L G3Large basis sets
BASHUZ Huzinaga MINI/MIDI basis sets to Xe
BASHZ2 Huzinaga MINI/MIDI basis sets Cs-Rn
BASKAR Karlsruhe (Ahlrichs) TZV basis sets
BASN21 N-21G basis sets
BASN31 N-31G basis sets
BASPCN Jensen polarization consistent basis sets
BASSTO STO-NG basis sets
BLAS level 1 basic linear algebra subprograms
CCAUX auxiliary routines for CC calculations
CCDDI parallel CCSD(T) program
CCQAUX auxiliaries for CCSD(TQ) program
CCQUAD renormalized CCSD(TQ) corrections
CCSDT renormalized CCSD(T) program 1
CEEIS corr. energy extrap. by intrinsic scaling
CEPA SR and MR-CEPA,AQCC,CPF calculations
CHGPEN screening for charge penetration of EFPs
CISGRD CI singles and its gradient 1
COSMO conductor-like screening model
COSPRT printing routine for COSMO
CPHF coupled perturbed Hartree-Fock 1
CPMCHF multiconfigurational CPHF 1
CPROHF open shell/TCSCF CPHF 1
DCCC divide and conquer coupled cluster
DCGRD divide and conquer gradients
DCGUES divide and conquer orbital guess
DCINT2 divide and conquer AO integrals 1
DCLIB divide and conquer library routines
DCMP2 divide and conquer MP2 1
DCSCF divide and conquer SCF
DCTRAN divide and conquer integral transf. 1
DDILIB message passing library interface code
DELOCL delocalized coordinates
DEMRPT determinant-based MCQDPT
DFT grid-free DFT drivers 1
DFTAUX grid-free DFT auxiliary basis integrals
DFTBX density-functional tight binding, main driver 3
DFTBFO DFTB, Fock matrix construction
DFTBGR DFTB, analytic gradient
DFTBHS DFTB, analytic Hessian
DFTBLB DFTB library, including reading the input
DFTBPB DFTB with periodic boundary condiutions
DFTBSK DFTB: Slater-Koster tables (parameters)
DFTBTD Time-dependent DFTB
DFTDIS empirical dispersion correction to DFT
DFTFUN grid-free DFT functionals
DFTGRD grid DFT implementation
DFTINT grid-free DFT integrals 1
DFTXCA grid DFT functionals, hand coded
DFTXCB grid DFT functionals, from repository
DFTXCC grid DFT functionals for meta-GGA
DFTXCD grid DFT functionals B97, etc
DFTXCE grid DFT functionals for PKZB/TPSS family
DFTXCF grid DFT functionals for CAMB3LYPdir
DFTXCG grid DFT functional for revTPSS
DGEEV general matrix eigenvalue problem
DGESVD single value decomposition
DIAB MCSCF state diabatization
DMULTI Amos' distributed multipole analysis
DRC dynamic reaction coordinate
EAIPCC EA-EOM and IP-EOM method
ECP pseudopotential integrals
ECPDER pseudopotential derivative integrals
ECPLIB initialization code for ECP
ECPPOT HW and SBKJC internally stored potentials
EFCHTR fragment charge transfer
EFDRVR fragment only calculation drivers
EFELEC fragment-fragment interactions
EFGRD2 2e- integrals for EFP numerical hessian
EFGRDA ab initio/fragment gradient integrals
EFGRDB " " " " "
EFGRDC " " " " "
EFINP effective fragment potential input
EFINTA ab initio/fragment integrals
EFINTB " " " "
EFMO EFP + FMO interfacing
EFPAUL effective fragment Pauli repulsion
EFPCM EFP/PCM interfacing
EFPCOV EFP style QM/MM boundary code
EFPFMO FMO and EFP interface
EFTEI QM/EFP 2e- integrals 1
EIGEN Givens-Householder, Jacobi diagonalization
ELGLIB elongation method utility routines
ELGLOC elongation method orbital localization
ELGSCF elongation method Hartree-Fock 1
EOMCC equation of motion excited state CCSD
EWALD Ewald summations for EFP model
EXCORR interface to MPQC?s R12 programs
FFIELD finite field polarizabilitie
FMO n-mer drivers for Fragment Molecular Orbital
FMOAFO FMO: adaptive frozen orbitals for DFTB
FMOCP couple-perturbed equations, known as SCZV
FMOESD elestrostatic potential derivatives for FMO
FMOGRD gradient routines for FMO
FMOH1A FMO: one-electron integrals for Hessian
FMOH2A FMO: two-electron integrals for Hessian
FMOH2B FMO: two-electron integrals for Hessian
FMOH2C FMO: two-electron integrals for Hessian
FMOHSS analytic Hessian
FMOINT integrals for FMO
FMOIO input/output and printing for FMO
FMOLIB utilities for FMO
FMOMM FMO: multipole method for electrostatics
FMOPBC periodic boundary conditions for FMO
FMOPRP properties for FMO
FRFMT free format input scanner
FSODCI determinant based second order CI
G3 G3(MP2,CCSD(T)) thermochemistry
GAMESS main program, important driver routines
GLOBOP Monte Carlo fragment global optimizer
GMCPT general MCQDPT multireference PT code 1
GRADEX traces gradient extremals
GRD1 one electron gradient integrals
GRD2A two electron gradient integrals 1
GRD2B specialized sp gradient integrals
GRD2C general spdfg gradient integrals
GUESS initial orbital guess
GUGDGA Davidson CI diagonalization 1
GUGDGB " " " 1
GUGDM 1 particle density matrix
GUGDM2 2 particle density matrix 1
GUGDRT distinct row table generation
GUGEM GUGA method energy matrix formation 1
GUGSRT sort transformed integrals 1
GVB generalized valence bond HF-SCF 1
HESS hessian computation drivers
HSS1A one electron hessian integrals
HSS1B " " " "
HSS2A two electron hessian integrals 1
HSS2B " " " "
INPUTA read geometry, basis, symmetry, etc.
INPUTB " " " "
INPUTC " " " "
INT1 one electron integrals
INT2A two electron integrals (Rys) 1
INT2B two electron integrals (s,p,L rot.axis)
INT2C ERIC TEI code, and its s,p routines 11
INT2D ERIC special code for d TEI
INT2F ERIC special code for f TEI
INT2G ERIC special code for g TEI
INT2R s,p,d,L rotated axis integral package
INT2S s,p,d,L quadrature code
INT2T s,p,d,L quadrature code
INT2U s,p,d,L quadrature code
INT2V s,p,d,L quadrature code
INT2W s,p,d,L quadrature code
INT2X s,p,d,L quadrature code
IOLIB input/output routines,etc. 2
IVOCAS improved virtual orbital CAS energy 1
LAGRAN CI Lagrangian matrix 1
LOCAL various localization methods 1
LOCCD LCD SCF localization analysis
LOCPOL LCD SCF polarizability analysis 1
LOCSVD singular value decomposition localization
LRD local response dispersion correction
LUT local unitary transformation IOTC
MCCAS FOCAS/SOSCF MCSCF calculation 1
MCJAC JACOBI MCSCF calculation
MCPGRD model core potential nuclear gradient
MCPINP model core potential input
MCPINT model core potential integrals
MCPL10 model core potential library
MCPL20 " " " "
MCPL30 " " " "
MCPL40 " " " "
MCPL50 " " " "
MCPL60 " " " "
MCPL70 " " " "
MCPL80 " " " "
MCQDPT multireference perturbation theory 1
MCQDWT weights for MR-perturbation theory
MCQUD QUAD MCSCF calculation 1
MCSCF FULLNR MCSCF calculation 1
MCTWO two electron terms for FULLNR MCSCF 1
MDEFP molecular dynamics using EFP particles
MEXING minimum energy crossing point search
MLTFMO multiscale solvation in FMO
MM23 MMCC(2,3) corrections to EOMCCSD
MOROKM Morokuma energy decomposition 1
MNSOL U.Minnesota solution models
MP2 2nd order Moller-Plesset 1
MP2DDI distributed data parallel MP2
MP2GRD CPHF and density for MP2 gradients 1
MP2GR2 disk based MP2 gradient program
MP2IMS disk based MP2 energy program
MPCDAT MOPAC parameterization
MPCGRD MOPAC gradient
MPCINT MOPAC integrals
MPCMOL MOPAC molecule setup
MPCMSC miscellaneous MOPAC routines
MTHLIB printout, matrix math utilities
NAMEIO namelist I/O simulator
NEOSTB dummy routines for NEO program
NMR nuclear magnetic resonance shifts 1
ORDINT sort atomic integrals 1
ORMAS1 occ. restricted multiple act. space CI
PARLEY communicate to other programs
PCM Polarizable Continuum Model setup
PCMCAV PCM cavity creation
PCMCV2 PCM cavity for gradients
PCMDER PCM gradients
PCMDIS PCM dispersion energy
PCMIEF PCM integral equation formalism
PCMPOL PCM polarizabilities
PCMVCH PCM repulsion and escaped charge
PRMAMM atomic multipole moment expansion
PRPEL electrostatic properties
PRPLIB miscellaneous properties
PRPPOP population properties
QEIGEN 128 bit precision RI for relativity 11
QFMM quantum fast multipole method
QMFM additional QFMM code
QMMM dummy routines for Tinker/SIMOMM program
QREL relativistic transformations
QUANPO Quantum Chem Polarizable force field
RAMAN Raman intensity
RHFUHF RHF, UHF, and ROHF HF-SCF 1
ROHFCC open shell CC computations 1
RXNCRD intrinsic reaction coordinate
RYSPOL roots for Rys polynomials
SCFLIB HF-SCF utility routines, DIIS code
SCFMI molecular interaction SCF code
SCRF self consistent reaction field
SOBRT full Breit-Pauli spin-orbit compling
SOFFAC spin-orbit matrix element form factors
SOLIB spin-orbit library routines
SOZEFF 1e- spin-orbit coupling terms
STATPT geometry and transition state finder
SURF PES scanning
SVPCHG surface volume polarization (SS(V)PE)
SVPINP input/output routines for SS(V)PE
SVPLEB Lebedev grids for SS(V)PE integration
SYMORB orbital symmetry assignment
SYMSLC " " "
TDDEFP EFP solvent effects on TD-DFT
TDDFT time-dependent DFT excitations
TDDFUN functionals for TD-DFT
TDDFXC exchange-corr. grid pts. for TD-DFT
TDDGRD gradient code for TD-DFT
TDDINT integral terms for TD-DFT 1
TDDNLR non-linear (two photon) TD-DFT
TDDXCA TD-DFT functional derivatives
TDDXCC TD-DFT functional derivatives
TDDXCD TD-DFT functional der. for metaGGA
TDHF time-dependent Hartree-Fock polarzblity 1
TDX extended time-dependent RHF
TDXIO input/output for extended TDHF
TDXITR iterative procedures in extended TDHF
TDXNI non-iterative tasks in extended TDHF
TDXPRP properties from extended TDHF
TRANS partial integral transformation 1
TRFDM2 two particle density backtransform 1
TRNSTN CI transition moments
TRUDGE nongradient optimization
UMPDDI distributed data parallel MP2
UNPORT unportable, nasty code 3,4,5,6,7,8
UTDDFT unrestricted TD-DFT 1
VBDUM dummy routines for VB programs
VECTOR vectorized version routines 10
VIBANL normal coordinate analysis
VSCF anharmonic frequencies
VVOS valence virtual orbitals
ZAPDDI distrib. data ZAPT2 open shell PT gradient
ZHEEV complex matrix diagonalization
ZMATRX internal coordinates
UNIX versions use the C code ZUNIX.C for dynamic memory.
The machine dependencies noted above are:
1) packing/unpacking 2) OPEN/CLOSE statments
3) machine specification 4) fix total dynamic memory
5) subroutine walkback 6) error handling calls
7) timing calls 8) LOGAND function
10) vector library calls 11) REAL*16 data type
Note that the message passing support (DDI) for GAMESS is
implemented in C (for most machines), and is stored in a
separate subdirectory. Please see the ~/games/ddi tree for
more information about the Distributed Data Interface's
code and usage.
Programming Conventions
The following "rules" should be adhered
to in making changes in GAMESS. These
rules are important in maintaining
portability, and should be adhered to.
The following rule is so important that it is not given
a number,
The Golden Rule: make sure your code not only has no
compiler diagnostics (try as many compilers as possible),
but that it also has no FTNCHEK diagnostics. The FTNCHEK
program of Robert Moniot is a fantastic debugging tool, and
results in the great portability of GAMESS. You can learn
how to get FTNCHEK, and how to use it from the script
~/gamess/tools/checkgms
Rule 1. If there is a way to do it that works on all
computers, do it that way. Commenting out statements for
the different types of computers should be your last
resort. If it is necessary to add lines specific to your
computer, PUT IN CODE FOR ALL OTHER SUPPORTED MACHINES.
Even if you don't have access to all the types of supported
hardware, you can look at the other machine specific
examples found in GAMESS, or ask for help from someone who
does understand the various machines. If a module does not
already contain some machine specific statements (see the
above list) be especially reluctant to introduce
dependencies.
Rule 2. Write a double precision program, and let the
source activator handle any conversion to single precision,
when that is necessary:
a) Use IMPLICIT DOUBLE PRECISION(A-H,O-Z) specification
statements throughout. Not REAL*8. Integer type should be
just INTEGER, so that compiler flags can select 64 or 32
bit integers at compile time.
b) All floating point constants should be entered as if
they were in double precision, in a format that the souce
code activator can recognize as being uniquely a number.
Namely, the constants should contain a decimal point, a
number after the decimal, and a signed, two digit exponent.
A legal constant is 1.234D-02. Illegal examples are 1D+00,
5.0E+00, 3.0D-2. Check for illegals by
grep "[0-9][DE][0-9]" *.src
grep "[0-9][.]D" *.src
grep "[0-9][.][0-9][DE][0-9]" *.src
grep "[0-9][DE][+-][1-9][^0-9]" *.src
c) Double precision BLAS names are used throughout, for
example DDOT instead of SDOT, and DGEMM instead of SGEMM.
The source code activator ACTVTE will
automatically convert these double
precision constructs into the correct
single precision expressions for machines
that have 64 rather than 32 bit words.
Rule 3. FORTRAN 77 allows for generic functions. Thus
the routine SQRT should be used in place of DSQRT, as this
will automatically be given the correct precision by the
compilers. Use ABS, COS, INT, etc. Your compiler manual
will tell you all the generic names.
Rule 4. Every routine in GAMESS begins with a card
containing the name of the module and the routine. An
example is "C*MODULE xxxxxx *DECK yyyyyy". The second
star is in column 18. Here, xxxxxx is the name of the
module, and yyyyyy is the name of the routine. This rule
is designed to make it easier for a person completely
unfamiliar with GAMESS to find routines.
Rule 5. Whenever a change is made to a module, this
should be recorded at the top of the module. The
information required is the date, initials of the person
making the change, and a terse summary of the change.
Rule 6. No imbedded tabs, statements must lie between
columns 7 and 72, etc. In other words, old style syntax.
* * *
The next few "rules" are not adhered to
in all sections of GAMESS. Nonetheless
they should be followed as much as
possible, whether you are writing new
code, or modifying an old section.
Rule 7. Stick to the FORTRAN naming convention for
integer (I-N) and floating point variables (A-H,O-Z). If
you've ever worked with a program that didn't obey this,
you'll understand why.
Rule 8. Always use a dynamic memory allocation routine
that calls the real routine. A good name for the memory
routine is to replace the last letter of the real routine
with the letter M for memory.
Rule 9. All the usual good programming techniques,
such as indented DO loops ending on CONTINUEs, IF-THEN-ELSE
where this is clearer, 3 digit statement labels in
ascending order, no three branch GO TO's, descriptive
variable names, 4 digit FORMATs, etc, etc.
The next set of rules relates to coding
practices which are necessary for the
parallel version of GAMESS to function
sensibly. They must be followed without
exception!
Rule 10. All open, rewind, and close operations on
sequential files must be performed with the subroutines
SEQOPN, SEQREW, and SEQCLO respectively. You can find
these routines in IOLIB, they are easy to use. SQREAD,
SQWRIT, and various integral I/O routines like PREAD are
used to process the contents of such files. The variable
DSKWRK tells if you are processing a distributed file (one
split between all compute processes, DSKWRK=.TRUE.) or a
single file on the master process (DSKWRK=.FALSE.,
resulting in broadcasts of the data from the master to all
other CPUs).
Rule 11. All READ and WRITE statements for the
formatted files 5, 6, 7 (variables IR, IW, IP, or named
files INPUT, OUTPUT, PUNCH) must be performed only by the
master task. Therefore, these statements must be enclosed
in "IF (MASWRK) THEN" clauses. The MASWRK variable is
found in the /PAR/ common block, and is true on the master
process only. This avoids duplicate output from the other
processes.
Rule 12. All error termination is done by "CALL ABRT"
rather than a STOP statement. Since this subroutine never
returns, it is OK to follow it with a STOP statement, as
compilers may not be happy without a STOP as the final
executable statment in a routine. The purpose of calling
ABRT is to make sure that all parallel tasks get shut down
properly.
Parallel broadcast identifiers
GAMESS uses DDI calls to pass messages between the
parallel processes. Every message is identified by a
unique number, hence the following list of how the numbers
are used at present. If you need to add to these, look at
the existing code and use the following numbers as
guidelines to make your decision. All broadcast numbers
must be between 1 and 32767.
20 : Parallel timing
100 - 199 : DICTNRY file reads
200 - 204 : Restart info from the DICTNRY file
210 - 214 : Pread
220 - 224 : PKread
225 : RAread
230 : SQread
250 - 265 : Nameio
275 - 310 : Free format
325 - 329 : $PROP group input
350 - 354 : $VEC group input
400 - 424 : $GRAD group input
425 - 449 : $HESS group input
450 - 474 : $DIPDR group input
475 - 499 : $VIB group input
500 - 599 : matrix utility routines
800 - 830 : Orbital symmetry
900 : ECP 1e- integrals
910 : 1e- integrals
920 - 975 : EFP and SCRF integrals
980 - 999 : property integrals
1000 - 1025 : SCF wavefunctions
1030 - 1041 : broadcasts in DFT
1050 : Coulomb integrals
1200 - 1215 : MP2
1300 - 1320 : localization
1495 - 1499 : reserved for Jim Shoemaker
1500 : One-electron gradients
1505 - 1599 : EFP and SCRF gradients
1600 - 1602 : Two-electron gradients
1605 - 1620 : One-electron hessians
1650 - 1665 : Two-electron hessians
1700 - 1750 : integral transformation
1800 : GUGA sorting
1850 - 1865 : GUGA CI diagonalization
1900 - 1910 : GUGA DM2 generation
2000 - 2010 : MCSCF
2100 - 2120 : coupled perturbed HF
2150 - 2200 : MCSCF hessian
2300 - 2309 : spin-orbit jobs
Disk files used by GAMESS
These files must be defined by your control language in
order to execute GAMESS. For example, on UNIX the "name"
field shown below should be set in the environment to the
actual file name to be used. Most runs will open only a
subset of the files shown below, with only files 5, 6, 7,
and 10 used by every run. Files 1, 2, 3 (both), 4, 5, 6,
7, and 35 contain formatted data, while all others are
binary (unformatted) files. Files ERICFMT, EXTBAS, and
MCPPATH are used to read data into GAMESS. Files MAKEFP,
TRAJECT, RESTART, and PUNCH are supplemental output files,
containing more concise summaries than the log file for
certain kinds of data.
unit name contents
---- ---- --------
1 MAKEFP effective fragment potential from MAKEFP run
2 ERICFMT Fm(t) interpolation table data, a data file
named ericfmt.dat, supplied with GAMESS.
3 MCPPATH a directory of model core potentials and
associated basis sets, supplied with GAMESS
3 EXTBAS external basis set library (user supplied)
3 GAMMA 3rd nuclear derivatives
4 TRAJECT trajectory results for IRC, DRC, or MD runs.
summary of results for RUNTYP=GLOBOP.
35 RESTART restart data for numerical HESSIAN runs,
numerical gradients, or for RUNTYP=VSCF.
Used as a scratch unit during MAKEFP.
5 INPUT Namelist input file. This MUST be a disk
file, as GAMESS rewinds this file often.
6 OUTPUT Print output (main log file).
If not defined, UNIX systems will use the
file "standard output" for this.
7 PUNCH Punch output. A copy of the $DATA deck,
orbitals for every geometry calculated,
hessian matrix, normal modes from FORCE,
properties output, etc. etc. etc.
8 AOINTS Two e- integrals in AO basis
9 MOINTS Two e- integrals in MO basis
10 DICTNRY Master dictionary, for contents see below.
11 DRTFILE Distinct row table file for -CI- or -MCSCF-
12 CIVECTR Eigenvector file for -CI- or -MCSCF-
13 CASINTS semi-transformed ints for FOCAS/SOSCF MCSCF
scratch file during spin-orbit coupling
14 CIINTS Sorted integrals for -CI- or -MCSCF-
15 WORK15 GUGA loops for Hamiltonian diagonal;
ordered two body density matrix for MCSCF;
scratch storage during GUGA Davidson diag;
Hessian update info during 2nd order SCF;
[ij|ab] integrals during MP2 gradient
density matrices during determinant CI
16 WORK16 GUGA loops for Hamiltonian off-diagonal;
unordered GUGA DM2 matrix for MCSCF;
orbital hessian during MCSCF;
orbital hessian for analytic hessian CPHF;
orbital hessian during MP2 gradient CPHF;
two body density during MP2 gradient
17 CSFSAVE CSF data for state to state transition runs.
18 FOCKDER derivative Fock matrices for analytic hess
19 WORK19 used during CP-MCHF response equations
20 DASORT Sort file for various -MCSCF- or -CI- steps;
also used by SCF level DIIS
21 DFTINTS four center overlap ints for grid-free DFT
21 DIABDAT density/CI info during MCSCF diabatization
22 DFTGRID mesh information for grid DFT
23 JKFILE shell J, K, and Fock matrices for -GVB-;
Hessian update info during SOSCF MCSCF;
orbital gradient and hessian for QUAD MCSCF
24 ORDINT sorted AO integrals;
integral subsets during Morokuma analysis
25 EFPIND electric field integrals for EFP
26 PCMDATA gradient and D-inverse data for PCM runs
27 PCMINTS normal projections of PCM field gradients
26 SVPWRK1 conjugate gradient solver for SV(P)SE
27 SVPWRK2 conjugate gradient solver for SV(P)SE
26 COSCAV scratch file for COSMO's solvent cavity
27 COSDATA output file to process by COSMO-RS program
27 COSPOT DCOSMO input file, from COSMO-RS program
28 MLTPL QMFM file, no longer used
29 MLTPLT QMFM file, no longer used
30 DAFL30 direct access file for FOCAS MCSCF's DIIS,
direct access file for NEO's nuclear DIIS,
direct access file for DC's DIIS.
form factor sorting for Breit spin-orbit
31 SOINTX Lx 2e- integrals during spin-orbit
32 SOINTY Ly 2e- integrals during spin-orbit
33 SOINTZ Lz 2e- integrals during spin-orbit
34 SORESC RESC symmetrization of SO ints
35 RESTART documented at the beginning of this list
37 GCILIST determinant list for general CI program
38 HESSIAN hessian for FMO optimisations;
gradient for FMO with restarts
39 QMMTEI reserved for future use
40 SOCCDAT CSF list for SOC;
fragment densities/orbitals for FMO
41 AABB41 aabb spinor [ia|jb] integrals during UMP2
42 BBAA42 bbaa spinor [ia|jb] integrals during UMP2
43 BBBB43 bbbb spinor [ia|jb] integrals during UMP2
44 REMD replica exchange molecular dynamics data
45 UNV LUT-IOTC's unitary transf. of V ints
46 UNPV LUT-IOTC's unitary transf. of pVp ints
files 50-63 are used for MCQDPT runs.
files 50-54 are also used by CODE=IMS MP2 runs.
unit name contents
---- ---- --------
50 MCQD50 Direct access file for MCQDPT, its
contents are documented in source code.
51 MCQD51 One-body coupling constants for
CAS-CI and other routines
52 MCQD52 One-body coupling constants for perturb.
53 MCQD53 One-body coupling constants extracted
from MCQD52
54 MCQD54 One-body coupling constants extracted
further from MCQD52
55 MCQD55 Sorted 2e- AO integrals
56 MCQD56 Half transformed 2e- integrals
57 MCQD57 transformed 2e- integrals of (ii|ii) type
58 MCQD58 transformed 2e- integrals of (ei|ii) type
59 MCQD59 transformed 2e- integrals of (ei|ei) type
60 MCQD60 2e- integral in MO basis arranged for
perturbation calculations
61 MCQD61 One-body coupling constants between state
and CSF
62 MCQD62 Two-body coupling constants between state
and CSF
63 MCQD63 canonical Fock orbitals (FORMATTED)
64 MCQD64 Spin functions and orbital configuration
functions (FORMATTED)
unit name contents
---- ---- --------
for RI-MP2 calculations only
51 RIVMAT 2c-2e inverse matrix
52 RIT2A 2nd index transformation data
53 RIT3A 3rd index transformation data
54 RIT2B 2nd index data for beta orbitals of UMP2
55 RIT3B 3rd index data for beta orbitals of UMP2
unit name contents
---- ---- --------
for RUNTYP=NMR only
61 NMRINT1 derivative integrals for NMR
62 NMRINT2 " " " "
63 NMRINT3 " " " "
64 NMRINT4 " " " "
65 NMRINT5 " " " "
66 NMRINT6 " " " "
for RUNTYP=MAKEFP (or dynamic polarizability run)
67 DCPHFH2 magnetic hessian in dynamic polarizability
68 DCPHF21 magnetic hessian times electronic hessian
for NEO runs, only (DAFL30 has nuclear DIIS)
67 ELNUINT electron-nucleus AO integrals
68 NUNUINT nucleus-nucleus AO integrals
69 NUMOIN nucleus-nucleus MO integrals
70 NUMOCAS nucleus-nucleus half transformed integrals
71 NUELMO nucleus-electron MO integrals
72 NUELCAS nucleus-electron half transformed integrals
for elongation method, only
70 ELGDOS elongation density of states
71 ELGDAT elongation frozen/active region data
72 ELGPAR elongation geometry optimization info
74 ELGCUT elongation cutoff information
75 ELGVEC elongation localized orbitals
77 ELINTA elongation 2e- for cut-off part
78 EGINTB elongation 2e- for next elongation
79 EGTDHF elongation TDHF (future use)
80 EGTEST elongation test file
99 PT2INT integrals for MPQC?s PT2 R-12 correction
99 PT2RDM 2 particle reduced density for MPQC?s R-12
99 PT2BAS geom/basis/orbs for MPQC?s R-12 correction
files 70-98 are used for closed shell Coupled-Cluster,
all of these are direct access files.
unit name contents
---- ---- --------
70 CCREST T1 and T2 amplitudes for restarting
71 CCDIIS amplitude converger's scratch data
72 CCINTS MO integrals sorted by classes
73 CCT1AMP T1 amplitudes and some No*Nu intermediates
for MMCC(2,3)
74 CCT2AMP T2 amplitudes and some No**2 times Nu**2
intermediates for MMCC(2,3)
75 CCT3AMP M3 moments
76 CCVM No**3 times Nu - type main intermediate
77 CCVE No times Nu**3 - type main intermediate
78 CCAUADS Nu**3 times No intermediates for (TQ)
79 QUADSVO No*Nu**2 times No intermediates for (TQ)
80 EOMSTAR initial vectors for EOMCCSD calculations
81 EOMVEC1 iterative space for R1 components
82 EOMVEC2 iterative space for R2 components
83 EOMHC1 singly excited components of H-bar*R
84 EOMHC2 doubly excited components of H-bar*R
85 EOMHHHH intermediate used by EOMCCSD
86 EOMPPPP intermediate used by EOMCCSD
87 EOMRAMP converged EOMCCSD right (R) amplitudes
88 EOMRTMP converged EOMCCSD amplitudes for MEOM=2
(if the max. no. of iterations exceeded)
89 EOMDG12 diagonal part of H-bar
90 MMPP diagonal parts for triples-triples H-bar
91 MMHPP diagonal parts for triples-triples H-bar
92 MMCIVEC Converged CISD vectors
93 MMCIVC1 Converged CISD vectors for mci=2
(if the max. no. of iterations exceeded)
94 MMCIITR Iterative space in CISD calculations
95 EOMVL1 iterative space for L1 components
96 EOMVL2 iterative space for L2 components
97 EOMLVEC converged EOMCCSD left eigenvectors
98 EOMHL1 singly excited components of L*H-bar
99 EOMHL2 doubly excited components of L*H-bar
the next group of files (70-95) is for open shell CC:
unit name contents
---- ---- --------
70 AMPROCC restart info CCSD/Lambda eq./EA-EOM/IP-EOM
71 ITOPNCC working copy of the same information
72 FOCKMTX subsets of F-alpha and F-beta matrices
73 LAMB23 data during CC(2,3) step
74 VHHAA [i,k|j,l]-[i,l|j,k] alpha/alpha
75 VHHBB [i,k|j,l]-[i,l|j,k] beta/beta
76 VHHAB [i,k|j,l] alpha/beta
77 VMAA [j,l|k,a]-[j,a|k,l] alpha/alpha
78 VMBB [j,l|k,a]-[j,a|k,l] beta/beta
79 VMAB [j,l|k,a] alpha/beta
80 VMBA [j,l|k,a] beta/alpha
81 VHPRAA [a,j|c,l]-[a,l|c,j] alpha/alpha
82 VHPRBB [a,j|c,l]-[a,l|c,j] beta/beta
83 VHPRAB [a,j|b,l] alpha/beta
84 VHPLAA [a,b|k,l]-[a,l|b,k] alpha/alpha
85 VHPLBB [a,b|k,l]-[a,l|b,k] beta/beta
86 VHPLAB [a,b|k,l] alpha/beta
87 VHPLBA [a,b|k,l] beta/alpha
88 VEAA [a,b|c,l]-[a,l|b,c] alpha/alpha
89 VEBB [a,b|c,l]-[a,l|b,c] beta/beta
90 VEAB [a,j|c,d] alpha/beta
91 VEBA [a,j|c,d] beta/alpha
92 VPPPP all four virtual integrals
93 INTERM1 one H-bar, some two H-bar, etc.
94 INTERM2 some two H-bar, etc.
95 INTERM3 remaining two H-bar intermediates
96 ITSPACE iterative subspace data for EA-EOM/IP-EOM
97 INSTART initial guesses for EA-EOM or IP-EOM runs
98 ITSPC3 triples iterative data for EA-EOM
unit name contents
---- ---- --------
files 201-239 may be used by RUNTYP=TDHFX
201 OLI201...running consecutively up to
239 OLI239
files 250-257 are used by divide-and-conquer runs
file 30 is used for the DC-DIIS data
250 DCSUB subsystem atoms (central and buffer)
251 DCVEC subsystem orbitals
252 DCEIG subsystem eigenvalues
253 DCDM subsystem density matrices
254 DCDMO old subsystem density matrices
255 DCQ subsystem Q matrices
256 DCW subsystem orbital weights
257 DCEDM subsystem energy-weighted density matrices
files 297-299 are used by hyperpolarizability analysis
297 LHYPWRK preordered LMOs
298 LHYPKW2 reassigned LMOs
299 BONDDPF bond dipoles with electric fields
Unit 301 is used for direct access using an internally
assigned filename during divide and conquer MP2 runs.
disk files in parallel runs
When a file is opened by the master compute process (which
is rank 0), its name is that defined by the 'setenv'. On
other processes (ranks 1 up to p-1, where p is the number
of running processes), the rank 'nnn' is appended to the
file name, turning the name xxx.Fyy into xxx.Fyy.nnn. The
number of digits in nnn is adjusted according to the total
number of processes started. Thus the common situation of
a SMP node sharing a single disk for several processes, on
up to the case of a machine like the Cray XT having only
one disk partition for all nodes does not lead to file name
conflicts.
By the way, only the master process needs to read the
environment to learn file names: these names are sent as
network messages to the other processes.
When DDI subgroups are not in use, the variable DSKWRK (in
common /par/) defines the strategy. A large file like 2e-
AO integrals (AOINTS) is computed as several smaller files,
which taken together have all the integrals. When all
processes are supposed to process files private to each
process, DSKWRK is .TRUE., and every process has a file,
usually containing different values. For smaller data,
such as CI vectors, where all processes want to store
exactly the same data, only the master process needs to
maintain the file. This situation is DSKWRK=.FALSE. When
the data is to be recovered from disk, only the master
process reads the disk, after which, the data is sent as a
broadcast message to all other processes. The special file
DICTNRY is always processed in this second way, so data
recovered from it is the same (to the least significant
bits) on every process. Another example of a file read by
only one process is the run's INPUT file.
If DDI subgroups are used, DSKWRK is ignored, and every
process opens every file. These are often left empty,
except on the master process in each subgroup. The input
file (INPUT) is exempt from having the rank added to its
name, so that a machine with a common file system can have
all processes read from the same input file. If the groups
have different disks, the INPUT must be copied to the
master process of every group: a simple way to ensure that
is to copy INPUT to every node's work disk. Similarly, the
OUTPUT file (and a few other files like PUNCH) are written
by every group master. If the run goes badly, these extra
output files may be interesting, but most of the time the
OUTPUT from the master of the first subgroup has enough
information. The OUTPUT of non-group-masters is not very
interesting.
The DICTNRY file is also treated in a special way when
running in groups, and that should be described here.
Contents of the direct access file 'DICTNRY'
1. Atomic coordinates
2. various energy quantities in /ENRGYS/
3. Gradient vector
4. Hessian (force constant) matrix
5-6. not used
7. PTR - symmetry transformation for p orbitals
8. DTR - symmetry transformation for d orbitals
9. FTR - symmetry transformation for f orbitals
10. GTR - symmetry transformation for g orbitals
11. Bare nucleus Hamiltonian integrals
12. Overlap integrals
13. Kinetic energy integrals
14. Alpha Fock matrix (current)
15. Alpha orbitals
16. Alpha density matrix
17. Alpha energies or occupation numbers
18. Beta Fock matrix (current)
19. Beta orbitals
20. Beta density matrix
21. Beta energies or occupation numbers
22. Error function interpolation table
23. Old alpha Fock matrix
24. Older alpha Fock matrix
25. Oldest alpha Fock matrix
26. Old beta Fock matrix
27. Older beta Fock matrix
28. Oldest beta Fock matrix
29. Vib 0 gradient in FORCE (numerical hessian)
30. Vib 0 alpha orbitals in FORCE
31. Vib 0 beta orbitals in FORCE
32. Vib 0 alpha density matrix in FORCE
33. Vib 0 beta density matrix in FORCE
34. dipole derivative tensor in FORCE.
35. frozen core Fock operator, in AO basis
36. RHF/UHF/ROHF Lagrangian (see 402-404)
37. floating point part of common block /OPTGRD/
int 38. integer part of common block /OPTGRD/
39. ZMAT of input internal coords
int 40. IZMAT of input internal coords
41. B matrix of redundant internal coords
42. pristine core Fock matrix in MO basis (see 87)
43. Force constant matrix in internal coordinates.
44. SALC transformation
45. symmetry adapted Q matrix
46. S matrix for symmetry coordinates
47. ZMAT for symmetry internal coords
int 48. IZMAT for symmetry internal coords
49. B matrix
50. B inverse matrix
51. overlap matrix in Lowdin basis,
temp Fock matrix storage for ROHF
52. genuine MOPAC overlap matrix
53. MOPAC repulsion integrals
54. exchange integrals for screening
55. orbital gradient during SOSCF MCSCF
56. orbital displacement during SOSCF MCSCF
57. orbital hessian during SOSCF MCSCF
58. reserved for Pradipta
59. Coulomb integrals in Ruedenberg localizations
60. exchange integrals in Ruedenberg localizations
61. temp MO storage for GVB and ROHF-MP2
62. temp density for GVB
63. dS/dx matrix for hessians
64. dS/dy matrix for hessians
65. dS/dz matrix for hessians
66. derivative hamiltonian for OS-TCSCF hessians
67. partially formed EG and EH for hessians
68. MCSCF first order density in MO basis
69. alpha Lowdin populations
70. beta Lowdin populations
71. alpha orbitals during localization
72. beta orbitals during localization
73. alpha localization transformation
74. beta localization transformation
75. fitted EFP interfragment repulsion values
76. model core potential information
77. model core potential information
78. "Erep derivative" matrix associated with F-a terms
79. "Erep derivative" matrix associated with S-a terms
80. EFP 1-e Fock matrix including induced dipole terms
81. interfragment dispersion values
82. MO-based Fock matrix without any EFP contributions
83. LMO centroids of charge
84. d/dx dipole velocity integrals
85. d/dy dipole velocity integrals
86. d/dz dipole velocity integrals
87. unmodified h matrix during SCRF or EFP, AO basis
88. PCM solvent operator contribution to Fock
89. EFP multipole contribution to one e- Fock matrix
90. ECP coefficients
int 91. ECP labels
92. ECP coefficients
int 93. ECP labels
94. bare nucleus Hamiltonian during FFIELD runs
95. x dipole integrals, in AO basis
96. y dipole integrals, in AO basis
97. z dipole integrals, in AO basis
98. former coords for Schlegel geometry search
99. former gradients for Schlegel geometry search
100. dispersion contribution to EFP gradient
records 101-248 are used for NLO properties
101. U'x(0) 149. U''xx(-2w;w,w) 200. UM''xx(-w;w,0)
102. y 150. xy 201. xy
103. z 151. xz 202. xz
104. G'x(0) 152. yy 203. yz
105. y 153. yz 204. yy
106. z 154. zz 205. yz
107. U'x(w) 155. G''xx(-2w;w,w) 206. zx
108. y 156. xy 207. zy
109. z 157. xz 208. zz
110. G'x(w) 158. yy 209. U''xx(0;w,-w)
111. y 159. yz 210. xy
112. z 160. zz 211. xz
113. U'x(2w) 161. e''xx(-2w;w,w) 212. yz
114. y 162. xy 213. yy
115. z 163. xz 214. yz
116. G'x(2w) 164. yy 215. zx
117. y 165. yz 216. zy
118. z 166. zz 217. zz
119. U'x(3w) 167. UM''xx(-2w;w,w) 218. G''xx(0;w,-w)
120. y 168. xy 219. xy
121. z 169. xz 220. xz
122. G'x(3w) 170. yy 221. yz
123. y 171. yz 222. yy
124. z 172. zz 223. yz
125. U''xx(0) 173. U''xx(-w;w,0) 224. zx
126. xy 174. xy 225. zy
127. xz 175. xz 226. zz
128. yy 176. yz 227. e''xx(0;w,-w)
129. yz 177. yy 228. xy
130. zz 178. yz 229. xz
131. G''xx(0) 179. zx 230. yz
132. xy 180. zy 231. yy
133. xz 181. zz 232. yz
134. yy 182. G''xx(-w;w,0) 233. zx
135. yz 183. xy 234. zy
136. zz 184. xz 235. zz
137. e''xx(0) 185. yz 236. UM''xx(0;w,-w)
138. xy 186. yy 237. xy
139. xz 187. yz 238. xz
140. yy 188. zx 239. yz
141. yz 189. zy 240. yy
142. zz 190. zz 241. yz
143. UM''xx(0) 191. e''xx(-w;w,0) 242. zx
144. xy 192. xy 243. zy
145. xz 193. xz 244. zz
146. yy 194. yz
147. yz 195. yy
148. zz 196. yz
197. zx
198. zy
199. zz
245. old NLO Fock matrix
246. older NLO Fock matrix
247. oldest NLO Fock matrix
249. polarizability derivative tensor for Raman
250. transition density matrix in AO basis
251. static polarizability tensor alpha
252. X dipole integrals in MO basis
253. Y dipole integrals in MO basis
254. Z dipole integrals in MO basis
255. alpha MO symmetry labels
256. beta MO symmetry labels
257. dipole polarization integrals during EFP1
258. Vnn gradient during MCSCF hessian
259. core Hamiltonian from der.ints in MCSCF hessian
260-261. reserved for Dan
262. MO symmetry integers during determinant CI
263. PCM nuclei/induced nuclear Charge operator
264. PCM electron/induced nuclear Charge operator
265. pristine alpha guess (MOREAD or Huckel+INSORB)
266. EFP/PCM IFR sphere information
267. fragment LMO expansions, for EFP Pauli
268. fragment Fock operators, for EFP Pauli
269. fragment CMO expansions, for EFP charge transfer
270. reserved for non-orthogonal FMO dimer guess
271. orbital density matrix in divide and conquer
int 272. subsystem data during divide and conquer
273. old alpha Fock matrix for D&C Anderson-like DIIS
274. old beta Fock matrix for D&C Anderson-like DIIS
275. not used
276. Vib 0 Q matrix in FORCE
277. Vib 0 h integrals in FORCE
278. Vib 0 S integrals in FORCE
279. Vib 0 T integrals in FORCE
280. Zero field LMOs during numerical polarizability
281. Alpha zero field dens. during num. polarizability
282. Beta zero field dens. during num. polarizability
283. zero field Fock matrix. during num. polarizability
284. Fock eigenvalues for multireference PT
285. density matrix or Fock matrix over LMOs
286. oriented localized molecular orbitals
287. density matrix of oriented LMOs
288. DM1 during CEPA-style calculations
289. DM2 during CEPA-style calculations
290. pristine (gas phase) h during solvent runs
291. "repulsion" integrals during EFP1
292-299. not used
301. Pocc during MP2 (RHF or ZAPT) or CIS grad
302. Pvir during MP2 gradient (UMP2= 411-429)
303. Wai during MP2 gradient
304. Lagrangian Lai during MP2 gradient
305. Wocc during MP2 gradient
306. Wvir during MP2 gradient
307. P(MP2/CIS)-P(RHF) during MP2 or CIS gradient
308. SCF density during MP2 or CIS gradient
309. energy weighted density in MP2 or CIS gradient
311. Supermolecule h during Morokuma
312. Supermolecule S during Morokuma
H matrix in HOP for FMO
313. Monomer 1 orbitals during Morokuma
314. Monomer 2 orbitals during Morokuma
315. combined monomer orbitals during Morokuma
316. RHF density in CI grad; nonorthog. MOs in SCF-MI
317. unzeroed Fock matrix when MOs are frozen
318. MOREAD orbitals when MOs are frozen
319. bare Hamiltonian without EFP contribution
320. MCSCF active orbital density
321. MCSCF DIIS error matrix
322. MCSCF orbital rotation indices
323. Hamiltonian matrix during QUAD MCSCF
324. MO symmetry labels during MCSCF
325. final uncanonicalized MCSCF orbitals
326-329. not used
330. CEL matrix during PCM
331. VEF matrix during PCM
332. QEFF matrix during PCM
333. ELD matrix during PCM
334. PVE tesselation info during PCM
335. PVE tesselation info during PCM
336. frozen core Fock operator, in MO basis
337-339. not used
340. DFT alpha Fock matrix
341. DFT beta Fock matrix
342. DFT screening integrals
343. DFT: V aux basis only
344. DFT density gradient d/dx integrals
345. DFT density gradient d/dy integrals
346. DFT density gradient d/dz integrals
347. DFT M[D] alpha density resolution in aux basis
348. DFT M[D] beta density resolution in aux basis
349. DFT orbital description
350. overlap of true and auxiliary DFT basis
351. previous iteration DFT alpha density
352. previous iteration DFT beta density
353. DFT screening matrix (true and aux basis)
354. DFT screening integrals (aux basis only)
355. h in MO basis during DDI integral transformation
356. alpha symmetry MO irrep numbers if UHF/ROHF
357. beta symmetry MO irrep numbers if UHF/ROHF
358-369. not used
370. left transformation for pVp
371. right transformation for pVp
370. basis A (large component) during NESC
371. basis B (small component) during NESC
372. difference basis set A-B1 during NESC
373. basis N (rel. normalized large component)
374. basis B1 (small component) during NESC
375. charges of non-relativistic atoms in NESC
376. common nuclear charges for all NESC basis
377. common coordinates for all NESC basis
378. common exponent values for all NESC basis
372. left transformation for V during RESC
373. right transformation for V during RESC
374. 2T, T is kinetic energy integrals during RESC
375. pVp integrals during RESC
376. V integrals during RESC
377. Sd, overlap eigenvalues during RESC
378. V, overlap eigenvectors during RESC
379. Lz integrals
380. reserved for Ly integrals.
381. reserved for Lx integrals.
382. X, AO orthogonalisation matrix during RESC
383. Td, eigenvalues of 2T during RESC
384. U, eigenvectors of kinetic energy during RESC
385. exponents and contraction for the original basis
int 386. shell integer arrays for the original basis
387. exponents and contraction for uncontracted basis
int 388. shell integer arrays for the uncontracted basis
389. Transformation to contracted basis
390. S integrals in the internally uncontracted basis
391. charges of non-relativistic atoms in RESC
392. copy of one e- integrals in MO basis in SO-MCQDPT
393. Density average over all $MCQD groups in SO-MCQDPT
394. overlap integrals in 128 bit precision
395. kinetic ints in 128 bit precision, for relativity
396. non-relativistic h, copy used by LUT-IOTCC
397. Lx spin-orbit integrals for MCP2E
398. Ly spin-orbit integrals for MCP2E
399. Lz spin-orbit integrals for MCP2E
400. not used
401. dynamic polarizability tensors
402. GVB Lagrangian
403. MCSCF Lagrangian
404. GUGA CI Lagrangian (see 308 for CIS)
405. molecular dip-dip polarizability
406. MEX search state 1 alpha orbitals
407. MEX search state 1 beta orbitals
408. MEX search state 2 alpha orbitals
409. MEX search state 2 beta orbitals
410. not used
411. alpha Pocc during UMP2 gradient (see 301-309)
412. alpha Pvir during UMP2 gradient
413. alpha Wai during UMP2 gradient
414. alpha Lagrangian Lai during UMP2 gradient
415. alpha Wocc during UMP2 gradient
416. alpha Wvir during UMP2 gradient
417. alpha P(MP2/CIS)-P(RHF) during UMP2/USFTDDFT grad
418. alpha SCF density during UMP2/USFTDDFT gradient
419. alpha energy wghted density in UMP2/USFTDDFT grad
420. not used
421-429. same as 411-419, for beta orbitals
430. not used
440-469. reserved for NEO
470. QUAMBO expansion matrix
471. excitation vectors for FMO-TDDFT
472. X+Y in MO basis during TD-DFT gradient
473. X-Y in MO basis during TD-DFT gradient
474. X+Y in AO basis during TD-DFT gradient
475. X-Y in AO basis during TD-DFT gradient
476. excited state density during TD-DFT gradient
P matrix in FMO-DFTB/AFO
477. energy-weighted density in AO basis for TD-DFT
478-489. not used
490. transition Lagrangian right hand side during NACME
491. gradients vectors during NACME
492. NACME vectors during NACME
493. difference gradient in conical intersection search
nuclear charges in FMO/FD
494. derivative coupling vector in CI search
nuclear coordinates in FMO/FD
495. mean energy gradient in CI search
496. unused
497. temp storage of gradient of 1st state in CI search
498. interface data for ab initio multiple spawning
499-500. not used
501. A2 cavity data in COSMO
502. A3 cavity data in COSMO
503. AMTSAV cavity data in COSMO
504-510. not used
511. effective polarizability in LRD
512. C6 coefficients in LRD
513. C8 coefficients in LRD
514. C10 coefficients in LRD
515. atomic pair LRD energy
520. Malmqvist factorized orb transformation (wrt 325)
521. SVD localized orthogonal orbitals
522. SVD localized nonorthogonal orbitals
523. initial-to-SVD LMO nonorthogonal transformation
524. SVD LMO nonorthogonal-to-orthogonal transformation
525. initial-to-SVD LMO orthog transformation (wrt 15)
526. 1st order density for orthogonal SVD localized MOs
527. collective orbital reordering for Malmqvist
528. atom-to-orbital assignment for SVD orbitals
529. Malmqvist re-ordered set of SVD LMOs
530. oriented SVD density in the order of record 529
531. oriented or SVD atom-to-orbital assignment for CT
532. block zapped 'standard Fock operator' in AO basis
533. overlap of stored atom's MBS with current basis
534. occupied+external orthog loc (natural) orbitals
535. atom-to-orbital assignment for record 534 orbitals
536. specialized SVD density matrix for EXTERNAL NOS
537. VVOS no-transfer orbitals+appropriate LMOs.
538. occupied+VVOS orbitals right after VVOS formation
539. nonorthogonal SVD localized orbitals
540. atom-to-orbital assignment for record 539 orbitals
541. pristine MCSCF orbs during diabatization
542. reference geometry orbs during diabatization
543. PT2 state rotation during diabatization
544. PT2 state energies during diabatization
545. PT2's CAS-CI largest CI coefs, in diabatization
546. Group labels for SVD orbitals.
547. Atom labels for oriented orbitals.
548. Group labels for oriented orbitals.
549. Quasi-atomic orbitals during No Charge Transfer
550. Current guess orbitals during No Charge Transfer
551. Atom labels during No Charge Transfer
552. Determinant NCT density for SVD/oriented orbitals.
553. Total NCT density mtx for SVD/oriented orbitals.
554. pseudodensity mtx from right coupled cluster NOs.
555. Unmodified input orbs for checking active space.
556. DFTB atom-resolved Mulliken populations
557. DFTB shell-resolved Mulliken populations
558. DFTB shell-resolved spin populations
559. DFTB atom-resolved shift contributions
560. DFTB shell-resolved shift contributions
561. DFTB shell-resolved shift contributions from spin
562. DFTB alpha occupation numbers
563. DFTB beta occupation numbers
564. DFTB non-perturbed Hamiltonian in FMO
565. DFTB reserved
566. DFTB atom-resolved shift of ESP in FMO
567. DFTB atom-resolved shift of ESP in FMO (DFTB3)
568. DFTB Slater-Kostner tables
569. DFTB U matrix for FMO-DFTB/AFO
570-571. FMO/FDD/PCM: Lagrangian and energies
572-579. unused.
580-599. reserved for Aaron
600. alpha loc. transformation in LMOEDA
601. alpha localized orbs in LMOEDA
602. beta loc. transformation in LMOEDA
603. beta localized orbs in LMOEDA
604. alpha Coulomb operator in LMOEDA
605. alpha exchange operator in LMOEDA
606. alpha density in LMOEDA
607. beta Coulomb operator in LMOEDA
608. beta exchange operator in LMOEDA
609. beta density in LMOEDA
610-950. mostly not used, but
615,617 EFMO?
801-809. xx,xy,xz,yx,yy,yz,zx,zy,zz quadrupole MO ints.
810-815. xx,xy,xz,yy,yz,xx quadrupole AO ints.
816. LMO dipole-quadrupole polarizability
817. molecular dipole-quadrupole polarizability
In order to correctly pass data between different
machine types when running in parallel, it is required that
a DAF record must contain only floating point values, or
only integer values. No logical or Hollerith data may be
stored. The final calling argument to DAWRIT and DAREAD
must be 0 or 1 to indicate floating point or integer values
are involved. The records containing integers are so
marked in the list below.
Physical record 1 (containing the DAF directory) is
written whenever a new record is added to the file. This
is invisible to the programmer. The numbers shown above
are "logical record numbers", and are the only thing that
the programmer need be concerned with.
Edited by Shiro KOSEKI.