$RIMP2 group (optional, relevant if CODE=RIMP2 in $MP2)
This group controls the resolution-of-the-identity (RI) MP2
program, which evaluates the MP2 energy approximately. The
RI approximation greatly reduces the computer resources
required while introducing only a small error in the
energies, so very large atomic basis sets may be used.
The input below controls both the use of computer
resources and the accuracy of the calculation. See also
$AUXBAS, which selects the auxiliary basis set; that choice
also affects the accuracy of the calculation.
The program is enabled for parallel calculation and is
tuned for today's SMP nodes. It is limited to energy
calculations for RHF or UHF references, without any
solvent effects.
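For orientation, the textbook closed-shell form of the RI (density-fitting) approximation is written out below. The notation is assumed rather than taken from this program: i,j are occupied orbitals, a,b virtual orbitals, P,Q auxiliary functions, and V is the two-center matrix over the auxiliary basis. It shows how the 4-index integrals become products of 3-index quantities; the RIMP2 code's exact factorization (for example, a Cholesky factor of V in place of V^{-1/2}) and its UHF form may differ.

```latex
V_{PQ} = (P|Q), \qquad
(ia|jb) \approx \sum_{P,Q} (ia|P)\,\left[\mathbf{V}^{-1}\right]_{PQ}\,(Q|jb)
        = \sum_{Q} B^{Q}_{ia}\,B^{Q}_{jb},
\qquad
B^{Q}_{ia} = \sum_{P} (ia|P)\,\left[\mathbf{V}^{-1/2}\right]_{PQ}

E^{(2)}_{\mathrm{RHF}} = \sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{virt}}
  \frac{(ia|jb)\left[\,2\,(ia|jb) - (ib|ja)\,\right]}
       {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
```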
IAUXBF = 0 uses Cartesian Gaussians
= 1 uses spherical harmonics
for the auxiliary basis set used to expand the
MP2 energy expression into products of 3-index
matrices. The default is inherited from ISPHER.
The next two control computer resources, trading memory for
disk storage.
GOSMP = flag requesting shared memory use. The default
is .TRUE. in multi-core nodes, but .FALSE. in a
uniprocessor. This option means only one copy of
certain large matrices is stored per node.
USEDM = a flag to store two and three center repulsion
integrals in distributed memory (.TRUE.), or in
disk files (.FALSE.).
Selection of this flag requires MEMDDI in $SYSTEM.
The default is .TRUE.
The RI approximation reduces CPU time, memory requirements,
and total disk storage compared to an exact calculation.
Experimenting with these two keywords lets you tune the
program to your hardware. For example, choosing
GOSMP=.TRUE. and USEDM=.TRUE. runs without any extra disk
files, while GOSMP=.TRUE. and USEDM=.FALSE. minimizes
memory usage (and network usage) at the expense of disk I/O.
Total memory usage per node can be obtained by running
EXETYP=CHECK. Note the largest replicated memory printed
in the RIMP2 output and divide it by 1000000 to get the
correct input for MWORDS (rounding up a bit). Note the
largest shared memory requirement printed, also dividing by
1000000 and rounding up a bit. Note the distributed memory
requirement, which is already in megawords and is the
correct input for MEMDDI. Then, assuming you use p total
compute processes on multiple n-way nodes, the memory per
node is
GBytes/node = 8*(n*MWORDS + shared + n*MEMDDI/p)/1024
Turning off GOSMP reduces the shared memory to 0 but
increases MWORDS, which is multiplied by the number of
cores per node! Turning off USEDM sets MEMDDI=0, since
disk storage is used instead.
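The memory estimate above is simple arithmetic, sketched here in Python. The function names are hypothetical, and "round up a bit" is modelled as a fractional safety margin whose 5% default is an assumption, not a program rule. The GOSMP and USEDM flags only zero the corresponding terms; the larger MWORDS needed with GOSMP=.FALSE. must still come from your own CHECK run.

```python
import math


def to_megawords(words: float, margin: float = 0.05) -> int:
    """Convert a word count printed by an EXETYP=CHECK run to megawords.

    Divides by 1000000 and rounds up after adding a safety margin
    (the 5% default is an assumption).
    """
    return math.ceil(words * (1.0 + margin) / 1_000_000)


def gbytes_per_node(n: int, p: int, mwords: float, shared: float,
                    memddi: float, gosmp: bool = True,
                    usedm: bool = True) -> float:
    """GBytes/node = 8*(n*MWORDS + shared + n*MEMDDI/p)/1024.

    n      -- compute processes per node (n-way nodes)
    p      -- total compute processes over all nodes
    mwords -- MWORDS, replicated memory per process, in megawords
    shared -- largest shared-memory requirement, in megawords
    memddi -- MEMDDI, distributed memory, in megawords
    """
    if not gosmp:
        shared = 0.0   # no shared copy; caller supplies the larger MWORDS
    if not usedm:
        memddi = 0.0   # integrals are kept on disk instead
    return 8.0 * (n * mwords + shared + n * memddi / p) / 1024.0


if __name__ == "__main__":
    # MWORDS=50, shared=95, MEMDDI=580 as in the taxol example in this section.
    print(round(gbytes_per_node(8, 8, 50, 95, 580), 1))   # 8.4 (one 8-way node)
    print(round(gbytes_per_node(8, 16, 50, 95, 580), 1))  # 6.1 (two 8-way nodes)
```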
If additional memory is available, increasing MWORDS can
reduce the occupied orbital batch level, "LV". A larger
MWORDS permits a smaller LV, which in turn reduces the
computational time and the network traffic or disk I/O.
The LV value used is the last line appearing after
"CHECKING SIZE OF OCCUPIED ORBITAL BATCH".
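Picking that line out of a long output file can be scripted. This is a hypothetical sketch: it makes no assumption about the format of the LV lines themselves, but it does assume the block after the header ends at the first blank line that follows some text, which you should check against a real output file.

```python
from typing import Optional

HEADER = "CHECKING SIZE OF OCCUPIED ORBITAL BATCH"


def last_line_after_header(output_text: str, header: str = HEADER) -> Optional[str]:
    """Return the last non-blank line of the block after the final header.

    Assumption: leading blank lines are skipped, and the block ends at the
    first blank line after it has started.
    """
    lines = output_text.splitlines()
    start = None
    for idx, line in enumerate(lines):
        if header in line:
            start = idx + 1          # keep the most recent occurrence
    if start is None:
        return None
    last = None
    for line in lines[start:]:
        if line.strip():
            last = line.strip()
        elif last is not None:       # blank line after the block: stop
            break
    return last
```

Read the whole output file into a string and pass it in; the returned line is the one the text above says to use for LV.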
The next four keywords control numerical accuracy, but see
$AUXBAS, which has an even larger influence on accuracy!
OTHAUX = flag to orthogonalize the RI basis set by
diagonalization of the overlap matrix. If there
is reason to suspect linear dependence may exist
in the RI basis, select this option to obtain a
more numerically stable result. Larger RI basis
sets such as CCT and ACCT, in particular, may
benefit from selecting this. (default=.FALSE.)
STOL = threshold below which eigenvalues of the overlap
matrix, and their eigenvectors, are removed; ignored
if OTHAUX=.FALSE. This
keyword is analogous to QMTTOL in $CONTRL for the
true AO basis. (default= 1.0d-6)
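As an illustration of what OTHAUX and STOL describe, here is textbook canonical orthogonalization: diagonalize the overlap matrix S, drop eigenvectors whose eigenvalue is below STOL, and scale the rest by 1/sqrt(eigenvalue) so that X^T S X = 1. This is a pure-Python sketch with hypothetical function names and a small Jacobi eigensolver; the RIMP2 code's actual procedure may differ in detail.

```python
import math


def jacobi_eigh(a, max_sweeps=50, tol=1e-24):
    """Eigen-decompose a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, vectors); eigenvector k is column k of vectors.
    """
    n = len(a)
    a = [row[:] for row in a]                                  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):                             # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                             # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0                        # zeroed by this rotation
                for k in range(n):                             # V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def canonical_orthogonalize(s_mat, stol=1.0e-6):
    """Return X (n rows, one column per kept direction) with X^T S X = 1."""
    vals, vecs = jacobi_eigh(s_mat)
    n = len(s_mat)
    cols = []
    for k, lam in enumerate(vals):
        if lam < stol:
            continue                     # near-linear dependence: discard
        cols.append([vecs[i][k] / math.sqrt(lam) for i in range(n)])
    return [[col[i] for col in cols] for i in range(n)]
```

For three normalized functions where the third is an exact combination of the first two, S has one zero eigenvalue, and X keeps only two columns.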
IVMTD = selects the procedure for removing redundancies
when inverting the two-center, two-electron matrix.
= 0 use Cholesky decomposition (default)
= 2 use diagonalization
VTOL = threshold at which to remove redundancies. This
is ignored unless IVMTD=2 (default= 1.0d-6)
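The Cholesky route (IVMTD=0) can be illustrated without forming an explicit inverse: factor V = L L^T and use x^T V^-1 y = (L^-1 x).(L^-1 y), where x and y hold 3-index integrals over the auxiliary functions. This is a plain sketch with hypothetical names. Here a V that is not numerically positive definite simply raises an error; how the program handles redundancies in this route, and the VTOL-based diagonalization of IVMTD=2, are not modelled.

```python
import math


def cholesky(v):
    """Lower-triangular L with V = L L^T for a symmetric positive-definite V."""
    n = len(v)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = v[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError(f"matrix not positive definite at column {j}")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (v[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for y, with L lower triangular."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def ri_pair(L, x, y):
    """RI estimate of (ia|jb) = x^T V^-1 y, with x = (ia|P), y = (jb|P)."""
    bx = forward_solve(L, x)
    by = forward_solve(L, y)
    return sum(p * q for p, q in zip(bx, by))


if __name__ == "__main__":
    V = [[4.0, 2.0], [2.0, 3.0]]            # toy two-center matrix (P|Q)
    L = cholesky(V)
    print(ri_pair(L, [1.0, 2.0], [3.0, 1.0]))  # equals x^T V^-1 y = 0.375 up to rounding
```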
Don't forget to see also the $AUXBAS input group!
An example of this program follows. The molecule is taxol,
with 1032 AOs and MOs in the 6-31G(d) basis, correlating
164 valence orbitals. The RI basis set used is SVP, which
matches the true basis set in quality. There are 4175 AOs
in the RI basis. The job was run on a single 8-way node
(n=8, p=1,2,4,8), using MWORDS=50 (leading to LV=6),
MEMDDI=580, and the largest shared memory needed is 95
million words. The total node memory is thus
(8 bytes/word)*(8*50 + 95 + 8*580/8)/1024 = 8.4 GBytes,
which fits easily into a modern 16 GByte node. It reduces to
(8 bytes/word)*(8*50 + 95 + 8*580/16)/1024 = 6.1 GBytes/node
if two 8-way nodes are used (p=16). The scaling is:
     p     SCF   RI-MP2   job total
     1    7391     7919       15366
     2    3718     4131        7860
     4    1857     2290        4174
     8     952     1488        2479
    16     486      758        1276   (two 8-way nodes)
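Speedup and parallel efficiency follow directly from the job totals in the table. The sketch below is plain arithmetic on those numbers (treated as unitless times); the dictionary and function names are made up for illustration.

```python
# Job totals from the taxol table, keyed by the number of compute processes p.
JOB_TOTAL = {1: 15366, 2: 7860, 4: 4174, 8: 2479, 16: 1276}


def speedup_and_efficiency(totals):
    """Map p -> (speedup relative to p=1, parallel efficiency = speedup/p)."""
    base = totals[1]
    return {p: (base / t, base / t / p) for p, t in sorted(totals.items())}


if __name__ == "__main__":
    for p, (s, e) in speedup_and_efficiency(JOB_TOTAL).items():
        print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:4.0%}")
```

For these times the whole job runs about 12 times faster on 16 processes than on one, i.e. roughly 75% parallel efficiency.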
The numerical results are E(RI-MP2) = -2920.607512 versus
the exact E(MP2) = -2920.606231 hartree. The 0.0013 hartree
error should be judged against the total second-order
correlation energy, -8.7855 hartree, keeping in mind that
the time for the second-order energy is similar to the SCF time.
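The comparison above amounts to a relative error, computed here from the quoted energies (hartree); the variable names are only for illustration.

```python
E_RI = -2920.607512     # RI-MP2 total energy from the example
E_EXACT = -2920.606231  # exact MP2 total energy
E_CORR = -8.7855        # total second-order correlation energy

abs_err = abs(E_RI - E_EXACT)
rel_err = abs_err / abs(E_CORR)

if __name__ == "__main__":
    print(f"absolute error = {abs_err:.6f} hartree")       # 0.001281 hartree
    print(f"relative error = {100 * rel_err:.3f}% of E(2)")  # 0.015% of E(2)
```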
===========================================================
Edited by Shiro KOSEKI on Mon Feb 13 10:50:16 2017.