Updated 2024.09.06: a speedup was observed.

Environment

deepin Community Edition 23 (RC2), lammps-17Apr2024

First, install Kokkos

Installing it with spack is fairly simple:

git clone -c feature.manyFiles=true https://github.com/spack/spack.git spack
cd spack/bin/
./spack install kokkos

Three commands and it is done.

Compile LAMMPS

cd ./lammps-17Apr2024/src
make clean-all
make yes-kokkos
make peachrl_kokkos -j 8

Here MAKE/Makefile.peachrl_kokkos is as follows:

# ubuntu = Ubuntu Linux box, g++, openmpi, FFTW3

# you have to install the packages g++, mpi-default-bin, mpi-default-dev,
# libfftw3-dev, libjpeg-dev and libpng12-dev to compile LAMMPS with this
# makefile

SHELL = /bin/sh

# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler

CC =        mpicxx
CCFLAGS =    -g -O3
SHFLAGS =    -fPIC
DEPFLAGS =    -M

LINK =        mpicxx
LINKFLAGS =    -g -O3 
LIB = 
SIZE =        size

ARCHIVE =    ar
ARFLAGS =    -rc
SHLIBFLAGS =    -shared
KOKKOS_DEVICES = OpenMP

# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"

# LAMMPS ifdef settings
# see possible settings in Section 3.5 of the manual

LMP_INC =    -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_PNG -DLAMMPS_FFMPEG

# MPI library
# see discussion in Section 3.4 of the manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library

MPI_INC =       -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1
MPI_PATH = 
MPI_LIB =

# FFT library
# see discussion in Section 3.5.2 of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library

FFT_INC =        -DFFT_FFTW3
FFT_PATH = 
FFT_LIB = -lfftw3

# JPEG and/or PNG library
# see discussion in Section 3.5.4 of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library

JPG_INC =       
JPG_PATH =     
JPG_LIB = -ljpeg -lpng

#  library for loading shared objects (defaults to -ldl, should be empty on Windows)
# uncomment to change the default

# override DYN_LIB =

# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section

include Makefile.package.settings
include Makefile.package

EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB) $(DYN_LIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)

# Path to src files

vpath %.cpp ..
vpath %.h ..

# Link target

$(EXE): main.o $(LMPLIB) $(EXTRA_LINK_DEPENDS)
	$(LINK) $(LINKFLAGS) main.o $(EXTRA_PATH) $(LMPLINK) $(EXTRA_LIB) $(LIB) -o $@
	$(SIZE) $@

# Library targets

$(ARLIB): $(OBJ) $(EXTRA_LINK_DEPENDS)
	@rm -f ../$(ARLIB)
	$(ARCHIVE) $(ARFLAGS) ../$(ARLIB) $(OBJ)
	@rm -f $(ARLIB)
	@ln -s ../$(ARLIB) $(ARLIB)

$(SHLIB): $(OBJ) $(EXTRA_LINK_DEPENDS)
	$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o ../$(SHLIB) \
		$(OBJ) $(EXTRA_LIB) $(LIB)
	@rm -f $(SHLIB)
	@ln -s ../$(SHLIB) $(SHLIB)

# Compilation rules

%.o:%.cpp
	$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<

# Individual dependencies

depend : fastdep.exe $(SRC)
	@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1

fastdep.exe: ../DEPEND/fastdep.c
	cc -O -o $@ $<

sinclude .depend
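
In the traditional make build, the target passed to make is the suffix of the machine makefile's name, and the resulting executable in src is called lmp_<machine>. A minimal sketch of that mapping follows. The helper names machine_of and exe_of are hypothetical, and the shorter names used in the launch commands (such as lmp4kk) are presumably renamed copies of that executable, which is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LAMMPS): derive the make target and the
# executable name from the path of a machine makefile.
machine_of() {
    base=${1##*/}             # strip directories -> Makefile.peachrl_kokkos
    echo "${base#Makefile.}"  # strip the prefix  -> peachrl_kokkos
}

exe_of() {
    echo "lmp_$(machine_of "$1")"
}

machine_of MAKE/Makefile.peachrl_kokkos   # target to pass to make
exe_of MAKE/Makefile.peachrl_kokkos       # executable written into src
```
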

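Command

Just add -sf kk to the command line when launching the run. This automatically appends the "/kk" suffix to every style that has a Kokkos-accelerated variant, so the input script does not need to be modified.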

mpirun -np 8 lmp4kk -k on t 1 -sf kk -in in.script
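
In this command, -k on t 1 enables Kokkos with one thread per MPI rank. When trying other thread counts, it may help to check that ranks times threads does not exceed the number of cores. The sketch below only prints the launch line and does not run it. The helper kokkos_launch_line and its core-count argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAMMPS): print the launch line for a given
# number of MPI ranks and Kokkos threads per rank, refusing combinations that
# would use more threads than there are cores.
kokkos_launch_line() {
    ranks=$1
    threads=$2
    cores=$3
    input=$4
    if [ $((ranks * threads)) -gt "$cores" ]; then
        echo "error: $ranks ranks x $threads threads exceeds $cores cores" >&2
        return 1
    fi
    # -k on t N : enable Kokkos with N threads per MPI rank
    # -sf kk    : use the kk variant of every style that has one
    echo "mpirun -np $ranks lmp4kk -k on t $threads -sf kk -in $input"
}

kokkos_launch_line 8 1 8 in.script
```
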

However, it did not seem to speed up my calculation at all...?


Update 2024.09.06: resolving the warning [OMP_PROC_BIND]

I noticed a warning:

Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
  In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
  For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
  For unit testing set OMP_PROC_BIND=false
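
The warning itself names environment variables to set. A minimal sketch of doing that in the launching shell for an OpenMP build follows, using the values the message gives for OpenMP 4.0 or newer. Whether this improves performance for a given run is not something the text here establishes. The launch line is echoed rather than executed.

```shell
#!/bin/sh
# Values taken from the warning text (OpenMP 4.0 or newer).
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

# The launch line itself is unchanged; it is only printed here, not run.
echo "mpirun -np 8 lmp4kk -k on t 1 -sf kk -in in.script"
```
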

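The manual covers this, so I adjusted the Makefile following instructions elsewhere in the manual (this setup is not suitable for multithreading; only t=1 works):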

# ubuntu = Ubuntu Linux box, g++, openmpi, FFTW3

# you have to install the packages g++, mpi-default-bin, mpi-default-dev,
# libfftw3-dev, libjpeg-dev and libpng12-dev to compile LAMMPS with this
# makefile

SHELL = /bin/sh

# ---------------------------------------------------------------------
# compiler/linker settings
# specify flags and libraries needed for your compiler

CC =        mpicxx
CCFLAGS =    -g -O3
SHFLAGS =    -fPIC
DEPFLAGS =    -M

LINK =        mpicxx
LINKFLAGS =    -g -O3 
LIB = 
SIZE =        size

ARCHIVE =    ar
ARFLAGS =    -rc
SHLIBFLAGS =    -shared
KOKKOS_DEVICES = Pthread
KOKKOS_USE_TPLS=hwloc 
# or KOKKOS_USE_TPLS=libnuma

# ---------------------------------------------------------------------
# LAMMPS-specific settings, all OPTIONAL
# specify settings for LAMMPS features you will use
# if you change any -D setting, do full re-compile after "make clean"

# LAMMPS ifdef settings
# see possible settings in Section 3.5 of the manual

LMP_INC =    -DLAMMPS_GZIP -DLAMMPS_JPEG -DLAMMPS_PNG -DLAMMPS_FFMPEG

# MPI library
# see discussion in Section 3.4 of the manual
# MPI wrapper compiler/linker can provide this info
# can point to dummy MPI library in src/STUBS as in Makefile.serial
# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts
# INC = path for mpi.h, MPI compiler settings
# PATH = path for MPI library
# LIB = name of MPI library

MPI_INC =       -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1 
MPI_PATH = 
MPI_LIB =

# FFT library
# see discussion in Section 3.5.2 of manual
# can be left blank to use provided KISS FFT library
# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings
# PATH = path for FFT library
# LIB = name of FFT library

FFT_INC =        -DFFT_FFTW3
FFT_PATH = 
FFT_LIB = -lfftw3

# JPEG and/or PNG library
# see discussion in Section 3.5.4 of manual
# only needed if -DLAMMPS_JPEG or -DLAMMPS_PNG listed with LMP_INC
# INC = path(s) for jpeglib.h and/or png.h
# PATH = path(s) for JPEG library and/or PNG library
# LIB = name(s) of JPEG library and/or PNG library

JPG_INC =       
JPG_PATH =     
JPG_LIB = -ljpeg -lpng

#  library for loading shared objects (defaults to -ldl, should be empty on Windows)
# uncomment to change the default

# override DYN_LIB =

# ---------------------------------------------------------------------
# build rules and dependencies
# do not edit this section

include Makefile.package.settings
include Makefile.package

EXTRA_INC = $(LMP_INC) $(PKG_INC) $(MPI_INC) $(FFT_INC) $(JPG_INC) $(PKG_SYSINC)
EXTRA_PATH = $(PKG_PATH) $(MPI_PATH) $(FFT_PATH) $(JPG_PATH) $(PKG_SYSPATH)
EXTRA_LIB = $(PKG_LIB) $(MPI_LIB) $(FFT_LIB) $(JPG_LIB) $(PKG_SYSLIB) $(DYN_LIB)
EXTRA_CPP_DEPENDS = $(PKG_CPP_DEPENDS)
EXTRA_LINK_DEPENDS = $(PKG_LINK_DEPENDS)

# Path to src files

vpath %.cpp ..
vpath %.h ..

# Link target

$(EXE): main.o $(LMPLIB) $(EXTRA_LINK_DEPENDS)
	$(LINK) $(LINKFLAGS) main.o $(EXTRA_PATH) $(LMPLINK) $(EXTRA_LIB) $(LIB) -o $@
	$(SIZE) $@

# Library targets

$(ARLIB): $(OBJ) $(EXTRA_LINK_DEPENDS)
	@rm -f ../$(ARLIB)
	$(ARCHIVE) $(ARFLAGS) ../$(ARLIB) $(OBJ)
	@rm -f $(ARLIB)
	@ln -s ../$(ARLIB) $(ARLIB)

$(SHLIB): $(OBJ) $(EXTRA_LINK_DEPENDS)
	$(CC) $(CCFLAGS) $(SHFLAGS) $(SHLIBFLAGS) $(EXTRA_PATH) -o ../$(SHLIB) \
		$(OBJ) $(EXTRA_LIB) $(LIB)
	@rm -f $(SHLIB)
	@ln -s ../$(SHLIB) $(SHLIB)

# Compilation rules

%.o:%.cpp
	$(CC) $(CCFLAGS) $(SHFLAGS) $(EXTRA_INC) -c $<

# Individual dependencies

depend : fastdep.exe $(SRC)
	@./fastdep.exe $(EXTRA_INC) -- $^ > .depend || exit 1

fastdep.exe: ../DEPEND/fastdep.c
	cc -O -o $@ $<

sinclude .depend
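
The two makefiles differ only in their KOKKOS_ lines (KOKKOS_DEVICES, plus KOKKOS_USE_TPLS in the second one). A small sketch for listing just those assignments in a machine makefile is given below. The helper kokkos_settings is hypothetical, and the demonstration runs on a throwaway file rather than a real makefile.

```shell
#!/bin/sh
# Hypothetical helper: print only the KOKKOS_* assignments of a makefile.
# Commented-out lines are skipped because the pattern is anchored at line start.
kokkos_settings() {
    grep -E '^[[:space:]]*KOKKOS_[A-Z_]+[[:space:]]*=' "$1"
}

# Demonstration on a temporary file holding a few lines from the makefile above.
tmp=$(mktemp)
printf '%s\n' \
    'CC =        mpicxx' \
    'KOKKOS_DEVICES = Pthread' \
    'KOKKOS_USE_TPLS=hwloc' \
    '# or KOKKOS_USE_TPLS=libnuma' > "$tmp"
kokkos_settings "$tmp"
rm -f "$tmp"
```
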

Compiled this way, there does seem to be a speedup? For a 19600-atom polyimide wall relaxed at 300 K for a while:

mpirun -np 8 lmp4kkh -k on t 1 -sf kk -in in.script
"Performance: 0.082 ns/day, 291.026 hours/ns, 3.818 timesteps/s, 74.831 katom-step/s, 98.8% CPU use with 8 MPI tasks x 1 OpenMP threads"

mpirun -np 8 lmp4kkh -in in.script
"Performance: 0.062 ns/day, 389.210 hours/ns, 2.855 timesteps/s, 55.954 katom-step/s, 99.7% CPU use with 8 MPI tasks x no OpenMP threads"
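
To put a number on the comparison, the ns/day figures from the two Performance lines can be divided. The sketch below does this with awk. The helper names ns_per_day and speedup are hypothetical, and the input lines are shortened copies of the output quoted above.

```shell
#!/bin/sh
# Extract the ns/day figure (second field) from a LAMMPS "Performance:" line.
ns_per_day() {
    awk '/Performance:/ { print $2; exit }'
}

# Ratio of two rates, rounded to two decimals.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

kk=$(echo 'Performance: 0.082 ns/day, 291.026 hours/ns, 3.818 timesteps/s' | ns_per_day)
plain=$(echo 'Performance: 0.062 ns/day, 389.210 hours/ns, 2.855 timesteps/s' | ns_per_day)
speedup "$kk" "$plain"   # prints 1.32
```

For this single test case the Kokkos run is roughly 1.3 times faster in ns/day.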