Installing Required Tools and Libraries
To develop parallel Fortran applications on Debian, you need a Fortran compiler and parallel computing libraries. The GNU Fortran compiler (gfortran) is the most common choice: it supports OpenMP (shared-memory parallelism) out of the box, and Coarray Fortran natively in single-image mode (multi-image runs additionally use the OpenCoarrays library). For MPI (Message Passing Interface), install Open MPI or MPICH—both are widely used for distributed-memory parallelism. Use the following commands to install the necessary tools:
sudo apt update
sudo apt install gfortran # Fortran compiler with OpenMP/Coarray support
sudo apt install libomp-dev # LLVM OpenMP runtime (only needed for Clang/Flang; gfortran ships its own libgomp)
sudo apt install openmpi-bin libopenmpi-dev # OpenMPI implementation
Verify the installations with gfortran --version and mpif90 --version; to confirm OpenMP support, compile and run a small test program that calls omp_get_num_threads().
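Such a check might look like the following minimal sketch (the file and program name omp_check are invented for this example):

```fortran
! omp_check.f90 -- prints the thread count seen inside a parallel region
program omp_check
    use omp_lib
    implicit none
    !$omp parallel
    !$omp single
    print *, "OpenMP threads: ", omp_get_num_threads()
    !$omp end single
    !$omp end parallel
end program omp_check
```

Compile it with gfortran -fopenmp omp_check.f90 -o omp_check; a reported thread count greater than 1 confirms OpenMP is active.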
OpenMP: Shared-Memory Parallelism
OpenMP is ideal for multi-core processors, using compiler directives to parallelize loops. Below is a simple Fortran program that calculates the sum of sine values in parallel:
program parallel_sum
    use omp_lib
    implicit none
    integer, parameter :: n = 1000
    real(kind=8) :: sum = 0.0d0
    integer :: i

    !$omp parallel do reduction(+:sum)   ! Parallelize the loop with a sum reduction
    do i = 1, n
        sum = sum + sin(real(i, kind=8))
    end do
    !$omp end parallel do

    print *, "Sum: ", sum
end program parallel_sum
Compilation: Add the -fopenmp flag to enable OpenMP support:
gfortran -fopenmp -o parallel_sum parallel_sum.f90
Execution: Run the executable directly (OpenMP uses threads, so no special launcher is needed); set the OMP_NUM_THREADS environment variable beforehand to control the thread count:
./parallel_sum
Key Notes: Use reduction to avoid manual synchronization for operations like sums. For irregular loops, consider schedule(dynamic) to balance load.
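To illustrate the schedule(dynamic) note above, here is a sketch in which iteration cost grows with i (the work function is invented for this example):

```fortran
! Sketch: dynamic scheduling balances a loop whose iterations cost unequal time
program dynamic_demo
    implicit none
    integer :: i
    real(kind=8) :: total = 0.0d0

    !$omp parallel do schedule(dynamic, 16) reduction(+:total)
    do i = 1, 1000
        total = total + work(i)          ! later iterations do more work
    end do
    !$omp end parallel do

    print *, "Total: ", total
contains
    real(kind=8) function work(i)
        integer, intent(in) :: i
        integer :: j
        work = 0.0d0
        do j = 1, i                      ! cost proportional to i
            work = work + sin(real(j, kind=8))
        end do
    end function work
end program dynamic_demo
```

Chunks of 16 iterations are handed out on demand, so threads that drew cheap iterations pick up more work instead of idling.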
MPI: Distributed-Memory Parallelism
MPI is designed for distributed-memory systems (clusters), using message passing for inter-process communication. The following example broadcasts a 3×3 matrix from the root process (rank 0) to all processes; each process then computes the sum of the matrix elements, and MPI_Reduce adds those per-process sums on the root:
program mpi_matrix_sum
    use mpi_f08
    implicit none
    integer :: ierr, rank, nprocs
    real(kind=8), dimension(3, 3) :: matrix
    real(kind=8) :: local_sum, global_sum

    ! Initialize MPI
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! Root process initializes the matrix
    if (rank == 0) then
        matrix = reshape([1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0, &
                          6.0d0, 7.0d0, 8.0d0, 9.0d0], [3, 3])
    end if

    ! Broadcast the matrix to all processes
    call MPI_Bcast(matrix, 9, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

    ! Each process computes its local (scalar) sum of the matrix elements
    local_sum = sum(matrix)

    ! Reduce the local sums to a global sum (the root process gets the result)
    call MPI_Reduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, MPI_COMM_WORLD, ierr)

    ! Root process prints the result
    if (rank == 0) then
        print *, "Global sum: ", global_sum
    end if

    ! Finalize MPI
    call MPI_Finalize(ierr)
end program mpi_matrix_sum
Compilation: Use the MPI compiler wrapper mpif90 (or its newer synonym mpifort)—both Open MPI and MPICH provide these wrappers, which invoke gfortran with the correct MPI flags:
mpif90 -o mpi_matrix_sum mpi_matrix_sum.f90
Execution: Use mpiexec or mpirun to launch with the desired number of processes (e.g., 4):
mpiexec -np 4 ./mpi_matrix_sum
Key Notes: Use MPI_Bcast for data distribution and MPI_Reduce for collective operations. Minimize communication (e.g., use collective operations instead of individual sends/receives) for better performance.
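The overlap advice can be sketched with nonblocking operations. The ring exchange below is a toy example invented for illustration (not part of the matrix program): each rank posts MPI_Irecv/MPI_Isend, computes independently while the messages are in flight, then waits:

```fortran
! Sketch: overlapping communication with computation via nonblocking MPI
program overlap_sketch
    use mpi_f08
    implicit none
    integer :: rank, nprocs, left, right, i
    type(MPI_Request) :: reqs(2)
    real(kind=8) :: send_buf, recv_buf, busy

    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs)

    right = mod(rank + 1, nprocs)            ! neighbors in a ring
    left  = mod(rank - 1 + nprocs, nprocs)
    send_buf = real(rank, kind=8)

    ! Post the transfers first...
    call MPI_Irecv(recv_buf, 1, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, reqs(1))
    call MPI_Isend(send_buf, 1, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, reqs(2))

    ! ...then do independent computation while messages progress
    busy = sum([(sin(real(i, kind=8)), i = 1, 10000)])

    call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE)
    print *, "Rank", rank, "received", recv_buf
    call MPI_Finalize()
end program overlap_sketch
```

Compile with mpif90 and launch with mpiexec as above; the pattern generalizes to halo exchanges in stencil codes.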
Coarray Fortran: Partitioned Global Address Space (PGAS)
Coarray Fortran is a modern, standardized approach to parallel programming, supported by gfortran via the -fcoarray flag (natively for single-image runs; multi-image runs link against the OpenCoarrays library). The following example computes the global sum of an array using coarrays, with each image initializing and holding one portion of the data:
program coarray_sum
    implicit none
    integer, parameter :: n = 1000
    integer :: i, img, local_n, my_image
    real :: a(n)[*]          ! Coarray declaration (each image has its own copy)
    real :: global_sum

    my_image = this_image()        ! Current image ID (1 to num_images())
    local_n = n / num_images()     ! Elements per image (assumes an even split)

    ! Each image fills the first local_n entries of its copy with its slice of 1..n
    do i = 1, local_n
        a(i) = real((my_image - 1) * local_n + i)
    end do

    ! Synchronize all images before reading remote data
    sync all

    ! Image 1 collects and sums every image's local portion
    if (my_image == 1) then
        global_sum = 0.0
        do img = 1, num_images()
            global_sum = global_sum + sum(a(1:local_n)[img])
        end do
        print *, "Global sum: ", global_sum
    end if
end program coarray_sum
Compilation: Use -fcoarray=single for single-image testing. For multi-image runs, compile with -fcoarray=lib and link the OpenCoarrays library (on Debian, the libcoarrays-openmpi-dev package; its caf wrapper supplies the flags for you):
gfortran -fcoarray=single -o coarray_sum coarray_sum.f90
caf -o coarray_sum coarray_sum.f90 # multi-image build via OpenCoarrays
Execution: A binary built with -fcoarray=single always runs as a single image. Launch the OpenCoarrays build with cafrun or mpiexec (Coarray Fortran commonly uses MPI under the hood):
cafrun -n 4 ./coarray_sum
Key Notes: Coarrays simplify parallel programming by providing a unified syntax for shared and distributed memory. Use sync all to ensure synchronization between images.
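If your gfortran is recent enough to support the Fortran 2018 collective subroutines (a reasonable assumption for current Debian releases, but worth verifying), co_sum can replace the manual gather loop entirely; a sketch under that assumption:

```fortran
! Sketch: co_sum reduces partial sums across all images in one call
program co_sum_demo
    implicit none
    integer :: i, local_n, me
    real :: partial

    me = this_image()
    local_n = 1000 / num_images()        ! assumes the image count divides 1000

    partial = 0.0
    do i = 1, local_n
        partial = partial + real((me - 1) * local_n + i)
    end do

    call co_sum(partial, result_image=1) ! image 1 receives the total
    if (me == 1) print *, "Global sum: ", partial
end program co_sum_demo
```

No explicit sync all is needed around the reduction; the collective call itself coordinates the participating images.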
Performance Optimization Tips
Use schedule(dynamic) for irregular loops to balance load. For example:
!$omp parallel do schedule(dynamic) reduction(+:sum)
Use reduction clauses for operations like sums, products, or maxima to avoid manual locks.
Prefer collective communication (e.g., MPI_Reduce instead of individual MPI_Send/MPI_Recv calls), and overlap communication with computation where possible (e.g., compute while waiting for messages).
Profile with gprof (for CPU usage) or Intel VTune (for memory access patterns) to identify bottlenecks.
Optimize array operations: use Fortran's array syntax, e.g., c = matmul(a, b), which the compiler can vectorize.
Compile with optimization flags such as -O3 (aggressive optimization) and -march=native (target the current CPU architecture):
gfortran -O3 -march=native -fopenmp program.f90 -o optimized_program
Link against tuned math libraries such as OpenBLAS and ScaLAPACK:
sudo apt install libopenblas-dev libscalapack-openmpi-dev
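As one example of leaning on a tuned library, the sketch below calls the standard BLAS routine dgemm (double-precision matrix multiply), which OpenBLAS provides; the program name is invented for illustration:

```fortran
! Sketch: C := alpha*A*B + beta*C via the BLAS routine dgemm
program blas_matmul
    implicit none
    integer, parameter :: n = 3
    real(kind=8) :: a(n, n), b(n, n), c(n, n)
    external :: dgemm

    a = 1.0d0
    b = 2.0d0
    ! alpha = 1.0, beta = 0.0, no transposition of A or B
    call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
    print *, c(1, 1)    ! each entry is 6.0 (sum of n products 1*2)
end program blas_matmul
```

Build with gfortran blas_matmul.f90 -lopenblas; for large matrices, OpenBLAS's threaded dgemm typically outperforms a naive triple loop by a wide margin.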