Note: Only changes specific to the parallel codes are noted here. Check the serial UPDATES file for changes common to both - NRB.

30/06/21
Removed tabs.

13/12/18 pstg1r (1.4)
Changed LAMAX default from 2*MAXLA to LAMAX=2, since the p/stgf default is to ignore .gt.2. Helpful for the R-matrix script since it does not set LAMAX explicitly. For codes compiled with MZLMX=2, p/stg2/3/r ignore .gt.2, but p/stg1/r does not.

18/08/17 pstgicf (2.5), pstgicfdamp (2.3)
Ported serial change to TCCDW.DAT matching with term.dat.

17/08/17 pstg2r (1.9), pstgjk (1.7)
Port serial fixes for KCUT operation.

24/07/17 pstg2r (1.9), pstgjk (1.7)
Port of serial KCUT operation.

17/10/16 p/stg2.5 (1.6)
Pass thru LS target e-vectors for TECs, if present. (p/stg2 has an option flag to do this but it was not originally ported into 2.5.)

14/10/16 pstgjk (1.6)
Ported serial TEC coding.

12/10/16 pstg2r (1.8)
Refine use of allocate in sr.setmx1.

27/03/15 pstgd_dip, pstgd_dip_split
KFLAG was not initialized when a non-DARC H.DAT was in use. If taken to be zero, it fails for BP (tries to use a non-allocated array). Initialize KFLAG=1 to avoid both. (The K quantum number is not used anyway.)

21/01/15 pstg2.5 (1.5)
Re-synched with Connor's (use actual dimension for H(N+1) allocate).

04/11/14 pstg3r_dip
Minor: ensure DIPORDER indexes the zero-channel case in sync with STG2 operation and sets the mapping index to zero, for pstgd/pstgd_split to catch and exit gracefully.

03/11/14 pstgd_dip, pstgd_dip_split
Corrected for the DMAT2 reversal of the ABUT & BBUT dipole Buttle corrections, but differently from the serial case (see also UPDATES), since they are not in the CALL DMAT2 argument list. But no use is made of DMAT1, and so we can simply reverse on the DMAT WRITE.

31/10/14 pstgd_dip_split
Introduce ndip_per_subworld: =1 just follows the previous operation, i.e. all dipoles are treated simultaneously and masternproc=ndipole*nproc_per_subworld. More generally, ndip_per_subworld dipoles are treated sequentially within a sub-world a la pstgd_dip.
So, masternproc=(ndipole/ndip_per_subworld)*nproc_per_subworld. Currently the same ndip_per_subworld is required for all sub-worlds, i.e. ndipole/ndip_per_subworld must be an integer. This is useful for radiation-damped EIE on small clusters, e.g. when the number of dipoles exceeds the number of nodes.

27/10/14 pstgd_dip_split
DSCRATCH could be deleted before all processors had finished reading it (small cases).

24/10/14 pstgd_dip, pstgd_dip_split
Small channel case (NPROC > no. of channels) missed a call to MPI_barrier for "unused" procs. Small channel case (NPROC > no. of channels) missed a de-allocate for "unused" procs.

23/10/14 pstg2r (1.8)
Ported writes for stgjk TECs from serial code, for convenience.

23/10/14 pstg2r_dip
INAST.lt.0 flags the read, *directly* after &STG2B, of -INAST (N+1)-SLp symmetries which are to be skipped from the set auto-generated by MINLT,MAXLT,MINST,MAXST. This is to work around symmetries which have zero channels and subsequently cause pstgd_dip to fail. These are not easy to work around (see below), so it is always best to remove them at root. Note, INAST>0 is not an option, due to the prescribed way in which the code generates the dipole symmetry pairs. Ported back into p/stg2r, mainly for consistency, since INAST>0 was/is a workaround.
Note: pstgd *cannot* handle zero-channel cases because the STG2XXX files are indexed including them while the DIPORDER/symmetryXX files from pstg3_dip exclude them. There is no simple workaround, save deleting them from sizeH.dat and the corresponding STG2HXXX and STG2DXXX files *and* renumbering all higher symmetry files.

15/10/13 pstg3r_split
Minor fixes.

23/08/13 pstgjk (1.5)
Uniform common block sizes fixed. Re-ordering of Hamiltonian operations in LSCONT/SPINOR.

15/08/13 pstg2.5 (1.5)
CF is allocated. pstgjk_split support file output: JSPLIT.DAT.

08/08/13 pstg3r (4.4)
IONEONE skips file reads for IPWINT>1.
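The sub-world rules for pstgd_dip_split (31/10/14 and 07/03/11 entries) and pstg3r_split (14/07/09 entry) fix the value that must be given to -np. A minimal Python sketch of that arithmetic follows, useful for checking a job script before submission. The function names are hypothetical, and the requirement that npw/npw_per_subworld be an integer is an assumption (the pstg3r_split entry only states the product rule).

```python
def pstgd_dip_split_nprocs(ndipole, nproc_per_subworld, ndip_per_subworld=1):
    """Processor count (-np) for pstgd_dip_split (hypothetical helper).

    masternproc = (ndipole/ndip_per_subworld)*nproc_per_subworld, with the
    same ndip_per_subworld on every sub-world, so the division must be exact.
    ndip_per_subworld=1 recovers the 07/03/11 rule ndipole*nproc_per_subworld.
    """
    if ndip_per_subworld < 1 or ndipole % ndip_per_subworld != 0:
        raise ValueError("ndipole/ndip_per_subworld must be a positive integer")
    return (ndipole // ndip_per_subworld) * nproc_per_subworld


def pstg3r_split_nprocs(npw, npw_per_subworld, nprow, npcol):
    """Processor count (-np) for pstg3r_split: num_subworlds*npcol*nprow.

    Assumption: npw must divide exactly into sub-worlds.
    """
    if npw_per_subworld < 1 or npw % npw_per_subworld != 0:
        raise ValueError("npw/npw_per_subworld assumed to be an integer here")
    num_subworlds = npw // npw_per_subworld
    return num_subworlds * npcol * nprow


if __name__ == "__main__":
    # 8 dipoles, 16 procs per sub-world, 2 dipoles treated in turn per sub-world
    print(pstgd_dip_split_nprocs(8, 16, 2))  # 64
    # 12 partial waves, 3 per sub-world, 2x4 process grid per sub-world
    print(pstg3r_split_nprocs(12, 3, 2, 4))  # 32
```

Raising an error, rather than silently rounding, mirrors the entries' insistence that the counts divide exactly.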
07/08/13 pstg3r (4.3), pstg3r_split, pstg3r_dip
If the user sets MAXST too large in stg2 then all L will have symmetries with zero channels; in particular, the final stg2 symmetry will have zero channels. This causes the parallel pstg3 codes to exit ungracefully (i.e. write buffers are not flushed, so H.DAT is incomplete). The *final* (zero) matrix size (actually, it is undefined here) is used to initialize parameters for the parallel diagonalization set-up prior to the read of H from STG2H.DAT, since stg2 has flagged MORE symmetries (it did not know that the next symmetry was zero). But stg3 ignores zero-channel symmetries (they were not written to STG2H.DAT) and progresses to the next non-zero one, and in the final instance there is none. The solution is to flag the end of sizeH.dat and exit rather than try to proceed to the serial exit point. (Serial stg3r does not have the same initialization set-up and the code exits gracefully on reaching the end of STG2H.DAT.)

20/06/13 pstg2r (1.7)
CPB: RK integrals allocated and modularised. Asymptotic potential coefficients minimised.

20/06/13 pstgf (2.8)
DGEMM in SR.RINIT is now called repeatedly to carry out the half-matrix multiply by sub-block - the full WMAT transpose is no longer needed.

16/06/13 pstgf (2.8), pstgfdamp (2.7), pstgbf0damp (1.5) - copied from serial UPDATES
********************************************************
***IMPORTANT: the size of present-day R-matrix calculations often leads to much larger box sizes than was historically the case. The determination of outer region solutions from Coulomb series at the box boundary increasingly fails. This is particularly true for MQDT operation. The previous default was to drop channels for such failures, as they were deemed to be deeply closed (i.e. classically forbidden). Now the 09/10/08 option is the *default*, i.e. for such cases, move inside the box, where starting values can be determined, and Numerov out to the box boundary. This is controlled by the IOMSW parameter.
Old defaults: IOMSW=1 (IQDT.gt.0, i.e.
MQDT), IOMSW=0 (IQDT.le.0, i.e. non-MQDT)
New defaults: IOMSW=11 (IQDT.gt.0, i.e. MQDT), IOMSW=10 (IQDT.le.0, i.e. non-MQDT)

19/03/13 pstgjk (1.4)
The 1.3->1.4 update of 27/01/11 did not pass N2HDAT to (regular) pstg3r; it had to be set manually in dstg3.

13/03/11 pstgd_dip_split (CPB)
Distinguish between subworld and globalworld barrier calls (masteriam).

07/03/11 pstgd_dip_split - new code (CPB)
Process dipole pairs concurrently.
Namelist: &predip nproc_per_subworld, ndip, ndipole /
  nproc_per_subworld = procs per dipole, ideally maxc in large cases
  ndipole = number of dipoles
  ndip = 1 length (default), 0 velocity
(Total no. of processors -np used is ndipole*nproc_per_subworld.)

04/03/11 pstg3r_dip (CPB)
Minor re-working of e-vector gathering.

04/03/11 pstg2r_dip (CPB)
Minor but key change - add LSp info to sizeH.dat.

27/01/11 pstgjk 1.4 (CPB)
1/ Use longer record length for Direct Access files.
2/ Integer*8 to allow H beyond rank sqrt(2**31).
3/ MXHLS now in PARAM - user to set from sizeH.dat to avoid inflation.

18/12/10 pstg2.5 (CPB)
Allocate HNP1.

01/11/10 pstgjk (1.3)
N2HDAT got out of sync. with pstg3r.

03/02/10 pstg2r_dip
Fix to a FORMAT for certain compilers.

02/02/10 pstg2r (1.6)
CPB: replace large COMMON by MODULE/ALLOCATE to reduce memory usage.

18/12/09 pstg3r_split (CPB)
FULL H.DAT writes for icomplete_hdat.eq.1/npw_sub_world.eq.1.

10/09/09 pstg3r (4.3), pstg3r_split, pstg3r_dip
Comment-out no longer used SR.DA2 and, hence, /MEMORY/ so that MZMEG does not add to memory use.

04/09/09 pstg1r (1.4), pstg2r (1.5)
Additional coding (to serial) for handling LNOEX variable.

13/08/09 pstg2nx (1.2)
Crashed if more processors were specified than the L-range.

14/07/09 pstg3r_split - new code (CPB)
pstg3r_split.f is a natural extension of the pstg3r (4.3) code. If processors are available, diagonalise every Hamiltonian concurrently.
Namelist: &prediag npw_per_subworld=1 / (i.e. number of partial waves to diagonalize per sub-world)
If there are npw partial waves in total, then there are num_subworlds=npw/npw_per_subworld sub-worlds, and num_subworlds*npcol*nprow MUST equal the number of procs set by -np.

13/07/09 pstg3r_dip, pstgd_dip
Further updates for NDIVD.

05/06/09 pstg3r_dip, pstgd_dip
Updates for large memory cases: NDIVD for pstg3r_dip (NDIV=1 recovers the old single pass; the default should be fine). DGEMM commented-out for pstgd_dip.f: DGEMM is faster than native coding but is memory hungry.

29/05/09 pstgbf0damp (1.3)
Don's fix to pstgbf0damp for writing partial PI files. (This is parallel specific.)

25/02/09 pstg3r (4.3)
Implemented default dynamic setting of NDIV, which controls the chunk length of surface amplitudes sent back to processor zero for aggregating and writing to H.DAT - basically, the number of chunks. If NDIV.gt.1 then the format of H.DAT is (slightly) changed - NDIV records for WMAT. The stgb/f suite is already programmed to handle this, as are the main utility codes (stgf for some time). Obscure utility codes or old versions of stgb/f will fail for NDIV.gt.1. NDIV can be set explicitly by the user in the &matrixdat namelist, at the cost of slower running and/or, in extreme cases, running out of memory and/or exceeding the allowed buffer size for send/receive.
***Note, NDIV.gt.1 by default only for matrices of rank .ge. 5500.

09/10/08 pstgf (2.6), pstgfdamp (2.4), pstgbf0damp (1.2)
Archived for serial changes.

26/09/08 pstg3r_dip
Mike's mod to regular pstg3r for small matrices has been ported here.

24/09/08 pstg2_dip
No. of processors vs. no. of symmetries needed to be checked on all processors so that the code can stop gracefully if the check fails.

26/03/08 pstg2_dip, pstgjk_dip, pstg3r_dip
Minor fixes.

05/12/07 pstg2_dip, pstgjk_dip, pstgd_dip, pstg3r_dip
Dipole-enabled versions of the parallel inner region BP codes put online (passwd protected). Mainly due to CPB.
The existing (non-dipole) parallel codes will remain and continue to be corrected. Only changes specific to the dipole-enabled codes will be noted.

13/08/07 pstg3r (4.3)
Mike (Witthoeft) added code to temporarily reset the block size when handling the diagonalization of small matrices.

02/07/07 pstg3r (4.3)
Selecting symmetries with IPWINIT,IPWFINAL did not work with IONEONE=1 (case of one symmetry per STG2HXXX.DAT file).

18/03/07 pstg1r (1.3) -> (1.2)
Revert back to previous version, as per serial 2.38 -> 2.37.

11/01/07 pstg1r (1.3)
Archived v1.2 to consolidate serial changes and start v1.3 for latest serial changes, see serial UPDATES.

30/05/06 pstg3r (4.3)
NDIV change of 14/02/06 failed to deallocate VECTOR, causing memory hog/failure.

15/02/06 pstgbf0damp (1.1)
First release of pstgbf0damp, based on latest serial stgbf0damp (4.5) and using the same parallel coding as pstgfdamp etc., including striped energy mesh for small clusters.

15/02/06 pstgf (2.6), pstgfdamp (2.4)
Reduced memory requirement.

14/02/06 pstg3r (4.3)
Write blocked WMAT if user specifies NDIV, no. of blocks. Default, one.

25/10/05 pstg2r (1.4)
If no coupled channels for a symmetry, N+1 removal file (AMATXXX.DAT) got out of sync. with STG2HXXX.DAT - CPB.

24/10/05 pstg3r (4.2)
Bug in N+1 removal (introduced accidentally in 19/02/04 v3.9) if read-in LS energies - CPB.

12/07/05 pstg3r (4.2), pstg3nx (1.5)
Set IPRINT.LT.0 to suppress screen writes (default 0), in NAMELIST STG3A.

08/07/05 pstgicf (2.4), pstgicfdamp (2.2)
Fine mesh was not being set correctly (internally) for interpolation (IMODE=-1).

02/07/05 pstgf (2.6), pstgfdamp (2.4), pstgicf (2.4), pstgicfdamp (2.2)
Implemented interpolation (IMODE=-1) operation in pstgicf. This requires the user to define a suitable distribution of interpolation energies in pstgf, via IMODE=0. (The default distribution is to distribute NPROCE=MXE/NPROC energies per processor linearly across the entire energy range, with energy step on a processor EINCR*NPROC.
This balances the load optimally for pstgf/icf, since the time per energy is highly dependent on the number of channels open/closed.)
For interpolation we need sequential energies on a processor; the user sets the number via NSEQ. To still attempt to balance the load, choose NSEQ.lt.NPROCE so that NPROCE/NSEQ=NCINT is an integer - the code determines NCINT appropriately if NSEQ is user set. (Alternatively, one can set NCINT and the code determines the appropriate NSEQ.) Thus we have stripes of energies per processor across the entire energy range. The more stripes there are the better the load balance, but we also need the stripes to be wide enough (NSEQ large enough) so as to minimise the number of boundaries, because the same K-matrix is calculated twice at the boundary energy for "adjacent" processors, i.e. there is no message passing of K-matrices between processors. NSEQ=10 and NCINT.ge.5 seems a good initial choice. This does mean NPROCE.ge.50, and so would appear to be better suited to clusters than to massively parallel systems. But still, if pstgicf time is the issue, or the size of the K-matrices transferred from pstgf, then it may still be advantageous.
The fine mesh in pstgicf (IMODE=-1) is defined differently than in serial stgicf (where the IMESH1 MXE, E0, EINCR was "repeated" on a finer subset of stgf energies) so as to simplify coding. In NAMELIST IMESH1 the user just sets IEQ, the number of steps between the pstgf energies, so IEQ=1 leads to the pstgicf mesh being equal to the pstgf mesh (the default).
Interpolation (IMODE=-1) has also been introduced for standalone pstgf operation (e.g. BP, DARC). The same mesh distribution should be used as discussed above, but it now refers to the fine mesh. Use IEQ again to specify, in effect, the coarse mesh. (Note, in serial stgf this was the ieq.lt.0 option. In parallel pstgf/pstgicf no notice is taken of the sign of ieq; it only behaves like serial ieq.lt.0.)
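The two energy distributions described above can be sketched in Python as lists of mesh indices i (energy E0 + i*EINCR) per processor. This is an illustration, not the pstgf coding: the function names are hypothetical, processor p starting at index p in the default layout is assumed, and the exact placement of stripe boundaries is assumed to be one shared energy between consecutive stripes. The trapezoid check anticipates the note that follows about duplicate boundary energies in the OMEGAXXX files.

```python
def default_distribution(mxe, nproc):
    """Default layout: each processor takes every NPROC-th energy, i.e. a
    step of EINCR*NPROC on that processor (start offset p is assumed)."""
    if mxe % nproc != 0:
        raise ValueError("MXE/NPROC must be an integer")
    nproce = mxe // nproc
    return [[p + nproc * j for j in range(nproce)] for p in range(nproc)]


def striped_distribution(mxe, nproc, nseq):
    """Striped layout for interpolation: NCINT = NPROCE/NSEQ stripes of NSEQ
    sequential energies per processor, dealt out round-robin.

    Assumed: adjacent stripes share their boundary energy, so that energy is
    computed twice, once by each of two "adjacent" processors.
    """
    if mxe % nproc != 0:
        raise ValueError("MXE/NPROC must be an integer")
    nproce = mxe // nproc
    if nseq < 2 or nproce % nseq != 0:
        raise ValueError("need NSEQ >= 2 and NPROCE/NSEQ (NCINT) an integer")
    ncint = nproce // nseq
    mesh = [[] for _ in range(nproc)]
    for stripe in range(nproc * ncint):
        start = stripe * (nseq - 1)  # first point = last point of previous stripe
        mesh[stripe % nproc].extend(range(start, start + nseq))
    return mesh


def trapezoid(x, y):
    """Trapezoidal rule; a repeated abscissa has dx = 0 and adds nothing."""
    return sum(0.5 * (x[i + 1] - x[i]) * (y[i + 1] + y[i]) for i in range(len(x) - 1))


if __name__ == "__main__":
    print(default_distribution(6, 3))      # [[0, 3], [1, 4], [2, 5]]
    stripes = striped_distribution(12, 2, 3)
    print(stripes)                         # [[0, 1, 2, 4, 5, 6], [2, 3, 4, 6, 7, 8]]
    merged = sorted(stripes[0] + stripes[1])  # boundary energies 2, 4, 6 appear twice
    unique = sorted(set(merged))
    print(trapezoid(merged, merged), trapezoid(unique, unique))  # 32.0 32.0
```

Each processor still handles exactly NPROCE energies, and the merged list carries one duplicate per stripe boundary, which is why fewer, wider stripes (larger NSEQ) mean fewer repeated K-matrices.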
The OMEGAXXX files still contain duplicate boundary energy omegas, unlike those resulting from pstgicf, because there is effectively a single energy mesh in pstgf while pstgicf has the coarse mesh from pstgf and its own fine mesh. This does not affect trapezoidal rule quadrature since dE=0.0, but might be an issue for dropping "odd points", since this would not appear as a single spike. OMADD should be modified to take account of a degenerate mesh.

14/03/05 pstgb (1.1)
First release of parallel version of stgb.

09/12/04 pstg3nx (1.5)
Read of Buttle correction from parallel pstg2nx (multiple RADn.XXX files) was incorrect. Shows up in ICFT on weak inelastic transitions, as elastic LS are mixed-in by TCCs. Use of serial stg2nx was/is fine.

27/08/04 pstgf (2.5), pstgfdamp (2.3)
Default BLAS in RINIT now uses DGEMM (DDOT option commented-out). Less cache intensive, and faster if an openMP library is used. Without openMP it might be slower, as DGEMM multiplies out the full R-matrix while DDOT usage takes advantage of R-matrix symmetry to generate only half. (If compiling BLAS yourself, you can trivially modify DGEMM to generate only the LOWER half of the R-matrix.)

23/08/04 pstg2.5 (1.3)
Initialize ISTOP.

31/07/04 pstg3r (4.2)
KAB's method for eliminating weakly coupled N+1 terms. (TAPERD implementation quite different from serial case.)

21/07/04 pstg2r (1.4)
Need all processors to OPEN CONFIG file.

17/05/04 pstg3r (4.1), pstgf (2.5), pstgfdamp (2.3)
Partitioned R-matrix, as serial, except note use of pdsyevx in pstg3r.

04/05/04 pstgicfdamp (2.1)
The default pstgf/damp write of K-matrices (split by symmetry) is inconsistent with pstgicfdamp, which expects them split by processor/energy alone, as is the case for pstgf/pstgicf. Changed default to energy split only.

27/04/04 pstgf (2.4), pstgfdamp (2.2)
DARC interface from serial codes.

21/04/04 pstgicf_blas (2.3)
Fixed bug in BLAS version only. Introduced in serial stgicf v4.3. v2.2/4.2 o.k.
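The DGEMM/DDOT trade-off in the 27/08/04 entry (and the DDOT speed-up of 20/03/04) rests on the R-matrix being symmetric. The Python sketch below assumes the standard pole expansion R_ij(E) = sum_k w_ik w_jk/(E_k - E), with any constant prefactor (e.g. 1/a) omitted, and a hypothetical layout w[i][k] for the surface amplitude of channel i and pole k. It illustrates the idea only and is not the RINIT coding: one routine forms every element, as a full matrix multiply does; the other forms only the lower half with dot products and mirrors it.

```python
def r_matrix_full(w, poles, energy):
    """All n*n elements, as a general (DGEMM-style) multiply produces them."""
    d = [1.0 / (ek - energy) for ek in poles]  # 1/(E_k - E); E must not equal a pole
    n = len(w)
    return [[sum(w[i][k] * d[k] * w[j][k] for k in range(len(poles)))
             for j in range(n)] for i in range(n)]


def r_matrix_lower(w, poles, energy):
    """Only the n*(n+1)/2 elements with j <= i, one dot product each
    (DDOT-style), mirrored using R_ij = R_ji. Also returns the dot count."""
    d = [1.0 / (ek - energy) for ek in poles]
    n = len(w)
    r = [[0.0] * n for _ in range(n)]
    ndot = 0
    for i in range(n):
        scaled = [wik * dk for wik, dk in zip(w[i], d)]  # w_ik/(E_k - E)
        for j in range(i + 1):
            r[i][j] = r[j][i] = sum(s * wjk for s, wjk in zip(scaled, w[j]))
            ndot += 1
    return r, ndot


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]  # 3 channels, 2 poles
    full = r_matrix_full(w, [2.0, 4.0], 0.0)
    half, ndot = r_matrix_lower(w, [2.0, 4.0], 0.0)
    print(ndot)      # 6 dot products instead of 9
    print(full == half)  # True
```

For n channels the half-matrix route does n(n+1)/2 dot products rather than n*n, roughly halving the arithmetic; the entry's point is that a tuned, threaded DGEMM can still win despite doing twice the work.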
15/04/04 pstg2.5 (1.3)
If more than one Jp symmetry per file, then the final symmetry of the final file may not be transformed, almost certainly so if a minimal set of LSp symmetries was used.

10/04/04 pstgf (2.3)
Simplified IMESH=3 option (no need for MPI).

10/04/04 pstgfdamp (2.1)
Initial public release.

09/04/04 pstgicfdamp (2.1)
Initial public release. Enables use of IQDT=1 (unphysical S-matrix) with pstgf for ICFT. Faster than IQDT=2 (unphysical K-matrix) when about 2/3, or more, of the channels are open.

04/04/04 pstg2nx (1.2)
Tidied up MPI etc.

04/04/04 pstg3nx (1.4)
Updated to handle multiple RADn.XXX n=1,2,3 files. Needs sizeNX.dat written by pstg2nx. Default operation looks for serial RADn.DAT n=1,2,3 first and uses them if present.

30/03/04 pstg2nx (1.1)
Initial release. Simple use requires no extra input; it automatically distributes L (equally) over all procs. Writes sizeNX.dat for pstg3nx. To control the distribution of L over processors use IWAVE(I), I=1,nproc, the number of L's to use on each proc. See code for further details.

20/03/04 pstgf (4.3)
Use of BLAS DDOT in RINIT (with WMAT indices interchanged) gives a large speed-up (800+ channels, 18k poles) when using an optimized library (as opposed to self-compiling BLAS).

16/03/04 pstg2.5 (1.3), pstgjk (1.3)
Allow multiple Jp symmetries per processor. Still requires the total number of symmetries to be an integer multiple of the number of processors, i.e. the same number of symmetries on each processor.

13/03/04 pstgjk (1.2), pstg3r (3.9)
Consistency for N2HDAT and IFL test in pstg3r.

05/03/04 pstg2r (1.4)
Serial UPDATES.

27/02/04 pstg3nx (1.3)
Rework gathering of surface amplitudes, as pstg3r (CPB).

26/02/04 pstgjk (1.2)
Imp. corrected COMMON /LRPOT/ alignment problem from serial code. Simplified use of MPI.

24/02/04 pstg2.5 (1.2)
Made more user friendly (and idiot proof). Simplified use of MPI.

21/02/04 pstg2.5 (1.1), pstgjk (1.1)
First release (CPB).

21/02/04 pstg2r (1.3)
Trivial, but necessary, mod for pstgjk.
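The work-distribution rules in the 16/03/04 (pstg2.5/pstgjk) and 30/03/04 (pstg2nx) entries can be sketched as follows. This is a Python illustration under stated assumptions, not the codes' own logic: the function names are hypothetical, contiguous blocks of symmetries per processor are assumed, and consecutive assignment of L values with every L covered exactly once by IWAVE is also assumed.

```python
def symmetries_per_processor(nsym, nproc):
    """Same number of Jp symmetries on every processor, so nsym must be an
    integer multiple of nproc. Contiguous blocks are an assumption."""
    if nproc < 1 or nsym % nproc != 0:
        raise ValueError("total symmetries must be a multiple of the number of processors")
    per = nsym // nproc
    return [list(range(p * per, (p + 1) * per)) for p in range(nproc)]


def split_l_by_iwave(lvalues, iwave):
    """Give processor I the next IWAVE(I) values of L.

    Assumed: the counts cover every L exactly once, in order.
    """
    if sum(iwave) != len(lvalues):
        raise ValueError("IWAVE counts must cover every L exactly once")
    out, pos = [], 0
    for count in iwave:
        out.append(list(lvalues[pos:pos + count]))
        pos += count
    return out


if __name__ == "__main__":
    print(symmetries_per_processor(12, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    print(split_l_by_iwave(list(range(10)), [4, 3, 3]))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

An uneven IWAVE, as in the second example, is the kind of manual control the pstg2nx entry describes when the default equal split does not suit the cost of the individual L.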
19/02/04 pstg3r (3.9)
Interface with parallel pstgjk/precupd. Rework gathering of surface amplitudes. INTEGER*8 for large cases. All CPB.

10/02/03 pstgf (2.2)
IMESH=3 (non-linear grid) now implemented in parallel (CPB). MXE/NPROC=integer required.

19/12/02 pstg3nx (1.2)
The "dstg3" read by exchange and non-exchange codes was inconsistent: exchange read /matrixdat/ BEFORE any observed energies while non-exchange read it AFTER. If the exchange dstg3 order is read by non-exchange, the observed energies are messed-up, but one doesn't notice this until one looks at the final OMEGA file. pstg3nx has been changed to read /matrixdat/ BEFORE observed energies, as per the parallel exchange code. We may still change this ordering in both codes so as to follow the original non-exchange. The dataset would then be consistent with the serial code. Watch this space!

10/12/02
Initial "public" release of Parallel Classic R-matrix suite.
pstg1r (1.2), pstg2r (1.3), pstg3r (3.8), pstg3nx.f (1.2), pstgf (2.1), pstgicf (2.1)