64-bit version: Segmentation fault with Intel MPI
Compiling the 64-bit version of DALTON 2020.1 with Intel's oneAPI toolchain using a setup command like
$ ./setup --fc mpiifx --cc mpiicx --cxx mpiicpx --mkl parallel --mpi --int64
for oneAPI 2024 leads to a segmentation fault after the SYMGRP output when running calculations with more than 1 MPI process. This happens on all oneAPI versions, using both the classic (ifort etc.) and LLVM (ifx etc.) compilers.
The segfault occurs in gen1int/gen1int_host.F90:82 in a call to MPI_Bcast; I have attached a sample calculation with debug flags and backtrace:
I am fairly certain this issue is due to the use of the mpif.h header in DALTON, which defines MPI_INTEGER_KIND to be 4 bytes:
$ grep -n MPI_INTEGER_KIND $I_MPI_ROOT/include/mpif.h
406: INTEGER MPI_INTEGER_KIND
407: PARAMETER (MPI_INTEGER_KIND=4)
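For illustration, the mismatch can be reproduced outside of DALTON; the following is a minimal sketch (a toy program of my own, not taken from the DALTON sources), built with something like mpiifx -i8 repro.f90 and run on 2+ processes:

! repro.f90 -- toy demonstration of the integer-kind mismatch
program repro
    implicit none
    include 'mpif.h'  ! declares the MPI constants as default INTEGER
    integer :: ierr, n
    double precision :: buf(8)
    call MPI_Init(ierr)
    n = 8
    buf = 1.0d0
    ! Under -i8, n, ierr, and the parameters from mpif.h are all 8-byte
    ! integers, while the default (LP64) Intel MPI Fortran bindings
    ! expect 4-byte integers, so the argument kinds no longer match.
    call MPI_Bcast(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
    call MPI_Finalize(ierr)
end program repro

Whether this toy crashes or merely misbehaves will depend on the MPI build and platform, but the kind mismatch is the same one behind the backtrace in the attached calculation.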
Manually changing all #include "mpif.h" statements to use mpi statements placed before the implicit line makes the segfault disappear, and calculations run as expected; however, I've seen some comments in the code saying that use mpi causes problems with MPICH in return. Therefore, seeing as there is already a USE_MPI_MOD_F90 preprocessor flag, would it be possible to automate the choice between header and module through something like
#include "mpi_mod.h"
#include "implicit.h"
#include "mpi_header.h"
with
! mpi_mod.h
#if defined(VAR_MPI) && defined(USE_MPI_MOD_F90)
use mpi
#endif
and
! mpi_header.h
#if defined(VAR_MPI) && !defined(USE_MPI_MOD_F90)
#include "mpif.h"
#endif
Having to remember to include two headers is certainly not ideal, but since use mpi has to come before the implicit statement and the #include "mpif.h" has to come after (?), I don't really see another way.
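In context, a routine would then read something like the following sketch (the subroutine itself is made up for illustration, and I'm assuming implicit.h keeps providing DALTON's usual implicit typing):

! Hypothetical call site after the proposed change: mpi_mod.h expands
! to "use mpi" when USE_MPI_MOD_F90 is defined, and mpi_header.h falls
! back to the old mpif.h include otherwise.
      SUBROUTINE EXBCST(WORK, N)
#include "mpi_mod.h"
#include "implicit.h"
#include "mpi_header.h"
      DIMENSION WORK(N)
#if defined(VAR_MPI)
      CALL MPI_BCAST(WORK, N, MPI_DOUBLE_PRECISION, 0,
     &               MPI_COMM_WORLD, IERR)
#endif
      RETURN
      END

Either way the ordering constraint is satisfied: the use statement lands before the implicit typing from implicit.h, and the mpif.h declarations land after it.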
Let me know if you need any additional info. Also, I'd be happy to submit a PR if you're fine with the proposed solution.
Best, Yannick Lemke
Edit: I just noticed this is not the actual issue tracker, should I move this to https://gitlab.com/dalton/dalton/-/issues?