Note: Only changes specific to the parallel codes are noted here.
      Check the serial UPDATES file for changes common to both - NRB.


30/06/21
Removed tabs.

13/12/18 pstg1r (1.4)
Changed the LAMAX default from 2*MAXLA to LAMAX=2, since the p/stgf default is to ignore multipoles .gt.2.
This helps the R-matrix script, which does not set LAMAX explicitly. For codes
compiled with MZLMX=2, p/stg2/3/r ignore .gt. 2, but p/stg1/r does not.

18/08/17 pstgicf (2.5), pstgicfdamp (2.3)
Ported serial change to TCCDW.DAT matching with term.dat.

17/08/17 pstg2r (1.9), pstgjk (1.7)
Port serial fixes for KCUT operation.

24/07/17 pstg2r (1.9), pstgjk (1.7)
Port of serial KCUT operation.

17/10/16 p/stg2.5 (1.6)
Pass through LS target e-vectors for TECs, if present.
(p/stg2 has an option flag to do this, but it was not originally ported into 2.5.)

14/10/16 pstgjk (1.6)
Ported serial TEC coding.

12/10/16 pstg2r (1.8)
Refine use of allocate in sr.setmx1.

27/03/15 pstgd_dip, pstgd_dip_split
KFLAG was not initialized when a non-DARC H.DAT is in use. If it is taken to be zero,
BP runs fail (the code tries to use a non-allocated array). Initialize KFLAG=1
to avoid both problems (the K quantum number is not used anyway).

21/01/15 pstg2.5 (1.5)
Re-synched with Connor's version (use the actual dimension for the H(N+1) allocate).

04/11/14 pstg3r_dip
Minor: ensure DIPORDER indexes the zero-channel case in sync with STG2 operation, and
set the mapping index to zero so that pstgd/pstgd_split catch it and exit gracefully.

03/11/14 pstgd_dip, pstgd_dip_split
Corrected for the DMAT2 reversal of the ABUT & BBUT dipole Buttle corrections, but differently
from the serial case (see also UPDATES), since they are not in the CALL DMAT2 argument list.
However, no use is made of DMAT1, so we can simply reverse them on the DMAT WRITE.

31/10/14 pstgd_dip_split
Introduce ndip_per_subworld: =1 just follows the previous operation, i.e. all dipoles are
treated simultaneously and masternproc=ndipole*nproc_per_subworld. More generally,
ndip_per_subworld dipoles are treated sequentially within a sub-world, a la pstgd_dip.
So masternproc=(ndipole/ndip_per_subworld)*nproc_per_subworld. Currently the same
ndip_per_subworld is required for all sub-worlds, i.e. ndipole/ndip_per_subworld must be
an integer. This is useful for radiation-damped EIE on small clusters, e.g. when the
number of dipoles exceeds the number of nodes.
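
The processor-count rule for pstgd_dip_split can be checked before submitting a job.
The following is a minimal sketch, not code from pstgd_dip_split: the function name
masternproc is borrowed from the variable in the note, and the integer check encodes
the stated requirement that ndipole/ndip_per_subworld be an integer.

```python
def masternproc(ndipole: int, nproc_per_subworld: int, ndip_per_subworld: int = 1) -> int:
    """Total process count (-np) for pstgd_dip_split, per the UPDATES notes.

    Sub-worlds run concurrently; each uses nproc_per_subworld processes and
    treats ndip_per_subworld dipoles one after another.
    """
    if ndipole < 1 or nproc_per_subworld < 1 or ndip_per_subworld < 1:
        raise ValueError("all counts must be >= 1")
    # Every sub-world must handle the same number of dipoles.
    if ndipole % ndip_per_subworld != 0:
        raise ValueError("ndipole/ndip_per_subworld must be an integer")
    num_subworlds = ndipole // ndip_per_subworld
    return num_subworlds * nproc_per_subworld


# ndip_per_subworld=1 reproduces the original operation: ndipole*nproc_per_subworld.
print(masternproc(ndipole=8, nproc_per_subworld=16))                       # 128
# Two dipoles per sub-world halves the number of concurrent sub-worlds.
print(masternproc(ndipole=8, nproc_per_subworld=16, ndip_per_subworld=2))  # 64
```

Raising an error on a non-integer ratio mirrors the current restriction rather
than silently truncating, which is what a Fortran integer division would do.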

27/10/14 pstgd_dip_split
DSCRATCH could be deleted before all processors had finished reading it (small cases).

24/10/14 pstgd_dip, pstgd_dip_split
Small channel case (NPROC>No. Channels) missed a call to MPI_barrier for "unused" procs.
Small channel case (NPROC>No. Channels) missed a de-allocate for "unused" procs.

23/10/14 pstg2r (1.8)
Ported writes for stgjk TECs from serial code, for convenience.

23/10/14 pstg2r_dip 
INAST.lt.0 flags the read, *directly* after &STG2B, of -INAST (N+1)-SLp symmetries
which are to be skipped from the set auto-generated by MINLT,MAXLT,MINST,MAXST.
This works around symmetries which have zero channels and subsequently cause
pstgd_dip to fail. Such cases are not easy to work around later (see below), so it is
always best to remove them at root.
Note, INAST>0 is not an option, due to the prescribed way in which the code generates
the dipole symmetry pairs.
Ported back into p/stg2r, mainly for consistency, since INAST>0 was/is a workaround.
Note: pstgd *cannot* handle zero-channel cases because the STG2XXX files are indexed
including them, while the DIPORDER/symmetryXX files from pstg3_dip exclude them.
There is no simple workaround, save deleting them from sizeH.dat and the corresponding
STG2HXXX and STG2DXXX files *and* renumbering all higher symmetry files.

15/10/13 pstg3r_split
Minor fixes.

23/08/13 pstgjk (1.5)
Uniform common block sizes fixed. Re-ordering of Hamiltonian operations in LSCONT/SPINOR.

15/08/13 pstg2.5 (1.5)
CF is allocated. pstgjk_split support file output: JSPLIT.DAT.

08/08/13 pstg3r (4.4)
IONEONE skips file reads for IPWINT>1.

07/08/13 pstg3r (4.3), pstg3r_split, pstg3r_dip
If the user sets MAXST too large in stg2, then all L will have symmetries with
zero channels; in particular, the final stg2 symmetry will have zero channels.
This causes the parallel pstg3 codes to exit ungracefully (i.e. write buffers are not
flushed, so H.DAT is incomplete). stg2 has flagged MORE symmetries (it did not know
that the next symmetry had zero channels), but stg3 ignores zero-channel symmetries
(they were not written to STG2H.DAT) and progresses to the next non-zero one. In the
final instance there is none, so the *final* (zero) matrix size (actually undefined
here) is used to initialize parameters for the parallel diagonalization set-up prior
to the read of H from STG2H.DAT. The solution is to flag the end of sizeH.dat and
exit, rather than try to proceed to the serial exit point.
(Serial stg3r does not have the same initialization set-up, and the code exits
gracefully on reaching the end of STG2H.DAT.)

20/06/13 pstg2r (1.7)
CPB: RK integrals allocated and modularised. Asymptotic potential coefficients minimised.
 
20/06/13 pstgf (2.8)
DGEMM in SR.RINIT is now called repeatedly to carry out the half-matrix
multiply by sub-block, so the full WMAT transpose is no longer needed.

16/06/13 pstgf (2.8), pstgfdamp (2.7), pstgbf0damp (1.5) - copied from serial UPDATES
********************************************************
***IMPORTANT: the size of present-day R-matrix calculations often
leads to much larger box sizes than historically. The determination
of outer region solutions from Coulomb series at the box boundary increasingly
fails. This is particularly true for MQDT operation. The previous
default was to drop channels for such failures, as they were deemed
to be deeply closed (i.e. classically forbidden). Now the 09/10/08
option is the *default*, i.e. for such cases, move inside the box, where
starting values can be determined, and Numerov out to the box boundary.
This is controlled by the IOMSW parameter.
Old defaults: IOMSW=1  (IQDT.gt.0, i.e. MQDT) IOMSW=0  (IQDT.le.0, i.e. non-MQDT)
New defaults: IOMSW=11 (IQDT.gt.0, i.e. MQDT) IOMSW=10 (IQDT.le.0, i.e. non-MQDT)
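
The old and new IOMSW defaults above depend only on the sign of IQDT. The helper below is
a hypothetical restatement of that table for scripting input decks; it is not how
pstgf sets IOMSW internally, and the name default_iomsw is an assumption.

```python
def default_iomsw(iqdt: int, new_defaults: bool = True) -> int:
    """Default IOMSW for a given IQDT, restating the old/new table above."""
    mqdt = iqdt > 0  # IQDT.gt.0 selects MQDT operation
    if new_defaults:
        # From 16/06/13: on Coulomb-series failure, move inside the box and Numerov out.
        return 11 if mqdt else 10
    # Before 16/06/13: failing channels were dropped as deeply closed.
    return 1 if mqdt else 0


for iqdt in (-1, 0, 1, 2):
    print(iqdt, default_iomsw(iqdt, new_defaults=False), default_iomsw(iqdt))
```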

19/03/13 pstgjk (1.4)
The 1.3->1.4 update of 27/01/11 did not pass N2HDAT to (regular) pstg3r;
it had to be set manually in dstg3.

13/03/11 pstgd_dip_split (CPB)
Distinguish between subworld and globalworld barrier calls (masteriam).

07/03/11 pstgd_dip_split - new code (CPB)
Process dipole pairs concurrently.

Namelist : &predip nproc_per_subworld, ndip, ndipole /

nproc_per_subworld = procs per dipole, ideally maxc in large cases
ndipole = number of dipoles
ndip = 1 length (default), 0 velocity
(Total no. of processors -np used is ndipole*nproc_per_subworld.)

04/03/11 pstg3r_dip (CPB)
Minor re-working of e-vector gathering.

04/03/11 pstg2r_dip (CPB)
Minor but key change - add LSp info to sizeH.dat

27/01/11 pstgjk 1.4 (CPB)
1/Use longer record length for Direct Access files
2/Integer*8 to allow H beyond rank sqrt(2**31)
3/MXHLS now in PARAM - user to set from sizeH.dat to avoid inflation.

18/12/10 pstg2.5 (CPB)
Allocate HNP1.

01/11/10 pstgjk (1.3)
N2HDAT got out of sync. with pstg3r.

03/02/10 pstg2r_dip
Fix to a FORMAT for certain compilers.

02/02/10 pstg2r (1.6)
CPB: replace large COMMON by MODULE/ALLOCATE to reduce memory usage.

18/12/09 pstg3r_split (CPB)
FULL H.DAT writes for icomplete_hdat.eq.1/npw_sub_world.eq.1  

10/09/09 pstg3r (4.3), pstg3r_split, pstg3r_dip
Comment-out no longer used SR.DA2 and, hence, /MEMORY/ so that MZMEG
does not add to memory use.

04/09/09 pstg1r (1.4), pstg2r (1.5)
Additional coding (to serial) for handling LNOEX variable.

13/08/09 pstg2nx (1.2)
Crashed if more processors were specified than there are L values in the L-range.

14/07/09 pstg3r_split - new code (CPB)
pstg3r_split.f is a natural extension of the pstg3r (4.3) code.
If enough processors are available, diagonalise every Hamiltonian concurrently.

Namelist : &prediag npw_per_subworld=1 /  
(ie number of partial waves to diagonalize per subworld )

If there are npw partial waves in total, then there are

num_subworlds=npw/npw_per_subworld  sub-worlds

and

num_subworlds * npcol * nprow MUST equal number of procs set by -np
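
The -np constraint above can be computed up front. This is a minimal sketch, not part of
pstg3r_split; the function name is hypothetical, and it assumes npw is an exact multiple
of npw_per_subworld, since the note does not say how a remainder would be handled.

```python
def pstg3r_split_np(npw: int, npw_per_subworld: int, nprow: int, npcol: int) -> int:
    """Value that -np must equal for pstg3r_split.

    Assumes npw is an exact multiple of npw_per_subworld.
    """
    if npw_per_subworld < 1 or npw % npw_per_subworld != 0:
        raise ValueError("npw must be a positive multiple of npw_per_subworld")
    num_subworlds = npw // npw_per_subworld
    # Presumably each sub-world diagonalises on its own nprow x npcol process grid.
    return num_subworlds * nprow * npcol


print(pstg3r_split_np(npw=24, npw_per_subworld=1, nprow=2, npcol=2))  # 96
print(pstg3r_split_np(npw=24, npw_per_subworld=2, nprow=2, npcol=2))  # 48
```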

13/07/09  pstg3r_dip pstgd_dip
Further updates for NDIVD.

05/06/09 pstg3r_dip pstgd_dip
Updates for large memory cases:
NDIVD for pstg3r_dip (NDIV=1 recovers the old single pass; the default
should be fine).
DGEMM commented-out for pstgd_dip.f. DGEMM is faster than the native coding
but is memory hungry.

29/05/09 pstgbf0damp (1.3)
Don's fix to pstgbf0damp for writing partial PI files.
(This is parallel specific.)

25/02/09 pstg3r (4.3)
Implemented default dynamic setting of NDIV, which controls
the chunking of the surface amplitudes sent back to processor
zero for aggregating and writing to H.DAT; basically, NDIV is the
number of chunks. If NDIV.gt.1 then the format of H.DAT is (slightly)
changed: there are NDIV records for WMAT. The stgb/f suite is already programmed
to handle this, as are the main utility codes (stgf for some time).
Obscure utility codes or old versions of stgb/f will fail for
NDIV.gt.1. NDIV can be set explicitly by the user in the &matrixdat
namelist, at the cost of slower running and/or, in extreme cases, running
out of memory and/or exceeding the allowed buffer size for send/receive.
***Note, NDIV.gt.1 by default only for matrices of rank .ge. 5500.

09/10/08 pstgf (2.6), pstgfdamp (2.4), pstgbf0damp (1.2)
Archived for serial changes.

26/09/08 pstg3r_dip
Mike's mod to regular pstg3r for small matrices has been ported here.

24/09/08 pstg2_dip
The number of processors vs the number of symmetries needed to be checked on
all processors so that the code can stop gracefully if the check fails.

26/03/08 pstg2_dip, pstgjk_dip, pstg3r_dip
Minor fixes.

05/12/07 pstg2_dip, pstgjk_dip, pstgd_dip, pstg3r_dip
Dipole enabled versions of the parallel inner region BP codes
put online (passwd protected). Mainly due to CPB. The existing
(non-dipole) parallel codes will remain and continue to be corrected.
Only changes specific to the dipole enabled codes will be noted.

13/08/07 pstg3r (4.3)
Mike (Witthoeft) added code to temporarily reset block size when handling
the diagonalization of small matrices.

02/07/07 pstg3r (4.3)
Selecting symmetries with IPWINIT,IPWFINAL did not work with IONEONE=1
(case of one symmetry per STG2HXXX.DAT file).

18/03/07 pstg1r (1.3) -> (1.2)
Revert to the previous version, as per serial 2.38 -> 2.37.
                                                  
11/01/07 pstg1r (1.3)
Archived v1.2 to consolidate serial changes and start v1.3 for latest
serial changes, see serial UPDATES.

30/05/06 pstg3r (4.3)
The NDIV change of 14/02/06 failed to deallocate VECTOR, causing memory hogging/failure.

15/02/06 pstgbf0damp (1.1)
First release of pstgbf0damp, based on latest serial stgbf0damp (4.5)
and using same parallel coding as pstgfdamp etc., including striped
energy mesh for small clusters.

15/02/06 pstgf (2.6), pstgfdamp (2.4)
Reduced memory requirement.

14/02/06 pstg3r (4.3)
Write blocked WMAT if user specifies NDIV, no. of blocks. Default, one.

25/10/05 pstg2r (1.4)
If no coupled channels for a symmetry, N+1 removal file (AMATXXX.DAT) got out
of sync. with STG2HXXX.DAT - CPB.

24/10/05 pstg3r (4.2)
Bug in N+1 removal (introduced accidentally in 19/02/04 v3.9) when LS energies
are read in - CPB.

12/07/05 pstg3r (4.2), pstg3nx (1.5)
Set IPRINT.LT.0 to suppress screen writes (default 0), in NAMELIST STG3A.

08/07/05 pstgicf(2.4), pstgicfdamp (2.2)
Fine mesh was not being set correctly (internally) for interpolation (IMODE=-1).

02/07/05 pstgf (2.6), pstgfdamp (2.4), pstgicf(2.4), pstgicfdamp (2.2)
Implemented interpolation (IMODE=-1) operation in pstgicf. This requires
the user to define a suitable distribution of interpolation energies in pstgf,
via IMODE=0. (The default distribution is to distribute NPROCE=MXE/NPROC energies
per processor linearly across the entire energy range, with an energy step on a
processor of EINCR*NPROC. This balances the load optimally for pstgf/icf, since the
time per energy is highly dependent on the number of channels open/closed.)
For interpolation we need sequential energies on a processor; the user
sets their number via NSEQ. To still attempt to balance the load, choose NSEQ.lt.NPROCE
so that NPROCE/NSEQ=NCINT is an integer; the code determines NCINT appropriately
if NSEQ is user set. (Alternatively, one can set NCINT and the code
determines the appropriate NSEQ.) Thus we have stripes of energies per processor
across the entire energy range. The more stripes there are, the better the
load balance, but we also need the stripes to be wide enough (NSEQ large enough)
to minimise the number of boundaries, because the same K-matrix is
calculated twice at the boundary energy for "adjacent" processors, i.e.
there is no message passing of K-matrices between processors.
NSEQ=10 and NCINT.ge.5 seems a good initial choice. This does mean NPROCE.ge.50,
and so would appear to be better suited to clusters than to massively
parallel systems. But still, if pstgicf time is the issue, or the size of the
K-matrices transferred from pstgf, then it may still be advantageous.
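
The default and striped energy distributions can be illustrated with energy indices i,
where energy i is taken to be E0 + i*EINCR. The sketch below is hypothetical: the
function names are invented, and the exact order in which pstgf assigns stripes to
processors is an assumption. It also ignores the duplicated boundary energies
described above; it only shows the round-robin default and the NCINT=NPROCE/NSEQ
bookkeeping.

```python
def default_indices(p: int, mxe: int, nproc: int) -> list[int]:
    """Default mesh: processor p (0-based) takes every nproc-th energy index."""
    if mxe % nproc != 0:
        raise ValueError("MXE/NPROC must be an integer")
    nproce = mxe // nproc
    # The energy step on a processor is therefore EINCR*NPROC.
    return [p + k * nproc for k in range(nproce)]


def ncint_from_nseq(mxe: int, nproc: int, nseq: int) -> int:
    """NCINT = NPROCE/NSEQ, the number of stripes per processor."""
    if mxe % nproc != 0:
        raise ValueError("MXE/NPROC must be an integer")
    nproce = mxe // nproc
    if nproce % nseq != 0:
        raise ValueError("NPROCE/NSEQ must be an integer")
    return nproce // nseq


def striped_indices(p: int, mxe: int, nproc: int, nseq: int) -> list[int]:
    """Assumed striped layout: NCINT blocks of NSEQ sequential energy indices.

    Block b = s*nproc + p goes to processor p, so each processor's stripes are
    spread across the whole energy range. Boundary duplication is ignored.
    """
    ncint = ncint_from_nseq(mxe, nproc, nseq)
    indices = []
    for s in range(ncint):
        start = (s * nproc + p) * nseq
        indices.extend(range(start, start + nseq))
    return indices


mxe, nproc, nseq = 500, 10, 10                      # NPROCE=50, NCINT=5
print(ncint_from_nseq(mxe, nproc, nseq))            # 5
print(default_indices(0, mxe, nproc)[:3])           # [0, 10, 20]
print(striped_indices(0, mxe, nproc, nseq)[8:12])   # [8, 9, 100, 101]
```

Each processor still gets NPROCE energies in both layouts; the striped version trades
some load balance for runs of NSEQ consecutive energies that can be interpolated.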

The fine mesh in pstgicf (IMODE=-1) is defined differently than in serial
stgicf (where the IMESH1 MXE, E0, EINCR was "repeated" on a finer subset
of stgf energies) so as to simplify coding. In NAMELIST IMESH1 the user just 
sets IEQ, the number of steps between the pstgf energies, so IEQ=1 leads to 
pstgicf mesh being equal to the pstgf mesh (the default).

Interpolation (IMODE=-1) has also been introduced for standalone pstgf operation
(e.g. BP, DARC). The same mesh distribution should be used as discussed above,
but now refers to the fine mesh. Use ieq again to specify in effect the coarse
mesh. (Note, in serial stgf this was the ieq.lt.0 option. In parallel pstgf/pstgicf
no notice is taken of the sign of ieq. It only behaves like serial ieq.lt.0.)
The OMEGAXXX files still contain duplicate boundary-energy omegas, unlike those
resulting from pstgicf, because there is effectively a single energy mesh in
pstgf, while pstgicf has the coarse mesh from pstgf and its own fine mesh. This
does not affect trapezoidal rule quadrature, since dE=0.0, but might be an
issue for dropping "odd points", since a spike at a duplicated energy would not
appear as a single spike. OMADD should be modified to take account of the degenerate mesh.

14/03/05 pstgb (1.1)
First release of parallel version of stgb.

09/12/04 pstg3nx (1.5)
Read of Buttle correction from parallel pstg2nx (multiple RADn.XXX files)
was incorrect. Shows up in ICFT on weak inelastic transitions as 
elastic LS are mixed-in by TCCs. Use of serial stg2nx was/is fine.

27/08/04 pstgf (2.5), pstgfdamp (2.3)
Default BLAS in RINIT now uses DGEMM (DDOT option commented-out).
Less cache intensive and faster if an openMP library is used.
Without openMP it might be slower, as DGEMM multiplies out the full
R-matrix while DDOT usage takes advantage of R-matrix symmetry to
only generate half. (If compiling BLAS yourself, you can trivially
modify DGEMM to generate only the LOWER half of the R-matrix.)

23/08/04 pstg2.5 (1.3)
Initialize ISTOP.

31/07/04 pstg3r (4.2)
KAB's method for eliminating weakly coupled N+1 terms. (TAPERD implementation
quite different from serial case.)

21/07/04 pstg2r (1.4)
Need all processors to OPEN CONFIG file.

17/05/04 pstg3r (4.1), pstgf (2.5), pstgfdamp (2.3)
Partitioned R-matrix, as serial except note use of pdsyevx in pstg3r.

04/05/04 pstgicfdamp (2.1)
The default pstgf/damp write of K-matrices (split by symmetry) is inconsistent
with pstgicfdamp, which expects them split by processor/energy alone,
as is the case for pstgf/pstgicf. Changed default to energy split only.

27/04/04 pstgf (2.4), pstgfdamp (2.2)
DARC interface from serial codes.

21/04/04 pstgicf_blas (2.3)
Fixed bug in BLAS version only. Introduced in serial stgicf v4.3.
v2.2/4.2 o.k.

15/04/04 pstg2.5 (1.3)
If there is more than one Jp symmetry per file, then the final symmetry of the final
file may not be transformed, almost certainly so if a minimal
set of LSp symmetries was used.

10/04/04 pstgf (2.3)
Simplified IMESH=3 option (no need for MPI).

10/04/04 pstgfdamp (2.1)
Initial public release.

09/04/04 pstgicfdamp (2.1)
Initial public release. Enables use of IQDT=1 (unphysical S-matrix)
with pstgf for ICFT. Faster than IQDT=2 (unphysical K-matrix) when
about 2/3, or more, channels open.

04/04/04 pstg2nx (1.2)
Tidied up MPI etc.

04/04/04 pstg3nx (1.4)
Updated to handle multiple RADn.XXX n=1,2,3 files. Needs sizeNX.dat
written by pstg2nx. Default operation looks for serial RADn.DAT n=1,2,3
first and uses them if present.

30/03/04 pstg2nx (1.1)
Initial release. Simple use requires no extra input, automatically
distributes L (equally) over all procs. Write sizeNX.dat for pstg3nx.
To control distribution of L over processors use IWAVE(I), I=1,nproc,
the number of L's to use on each proc. See code for further details.

20/03/04 pstgf (4.3)
Use of BLAS DDOT in RINIT (with WMAT indices interchanged) gives
large speed-up (800+ channels, 18k poles) when using optimized
library (as opposed to self-compiling BLAS).

16/03/04 pstg2.5 (1.3), pstgjk (1.3)
Allow multiple Jp symmetries per processor. Requires total number
of symmetries to be an integer multiple of the number of processors
still, i.e. same number of symmetries on each processor.

13/03/04 pstgjk (1.2), pstg3r (3.9)
Consistency for N2HDAT and IFL test in pstg3r.

05/03/04 pstg2r (1.4)
Serial UPDATES.

27/02/04 pstg3nx (1.3)
Rework gathering of surface amplitudes, as pstg3r (CPB).

26/02/04 pstgjk (1.2)
Imp. corrected COMMON /LRPOT/ alignment problem from serial code.
Simplified use of MPI.

24/02/04 pstg2.5 (1.2)
Made more user friendly (and idiot proof). Simplified use of MPI.

21/02/04 pstg2.5 (1.1), pstgjk (1.1)
First release (CPB).

21/02/04 pstg2r (1.3)
Trivial, but necessary, mod for pstgjk.

19/02/04 pstg3r (3.9)
Interface with Parallel pstgjk/precupd.
Rework gathering of surface amplitudes.
INTEGER*8 for large cases. All CPB.

10/02/03 pstgf (2.2)
IMESH=3 (Non-linear grid) now implemented in parallel (CPB).
MXE/NPROC=integer required.

19/12/02 pstg3nx (1.2)
The "dstg3" read by exchange and non-exchange codes was inconsistent:
exchange read /matrixdat/ BEFORE any observed energies while
non-exchange read it AFTER. If the exchange dstg3 order is
read by non-exchange the observed energies are messed-up
but one doesn't notice this until one looks at the final OMEGA file.
pstg3nx has been changed to read /matrixdat/ BEFORE observed energies,
as per the parallel exchange code. 
We may still change this ordering in both codes so as to follow
the original non-exchange. The dataset would be consistent with
the serial code then. Watch this space!

10/12/02 Initial "public" release of Parallel Classic R-matrix suite.

pstg1r (1.2), pstg2r(1.3), pstg3r (3.8),
pstg3nx.f (1.2)
pstgf (2.1), pstgicf (2.1)