Installing Required Tools and Libraries
To develop parallel Fortran applications on Debian, you need a Fortran compiler and parallel computing libraries. The GNU Fortran compiler (gfortran) is the most common choice, and it supports OpenMP and Coarray Fortran out of the box. For MPI (Message Passing Interface), install OpenMPI or MPICH—both are widely used for distributed memory parallelism. Use the following commands to install the necessary tools:
sudo apt update
sudo apt install gfortran # Fortran compiler with OpenMP/Coarray support
sudo apt install openmpi-bin libopenmpi-dev # OpenMPI (C and Fortran bindings)
Verify the installation with:
gfortran --version # Should show GNU Fortran version
mpif90 --version # mpif90 is a wrapper; this prints the underlying GNU Fortran version
These steps ensure you have the basic tools to compile and run parallel Fortran programs.
1. OpenMP: Shared Memory Parallelism
OpenMP is ideal for multi-core CPUs, using compiler directives to parallelize loops and sections of code. It’s easy to implement and integrates seamlessly with Fortran. Key steps include adding use omp_lib to access OpenMP functions (e.g., omp_get_thread_num) and using directives like !$omp parallel do to parallelize loops.
Example: Parallel Array Addition
The following program uses OpenMP to double each element of an array in parallel. The private(i) clause ensures each thread has its own loop variable, avoiding race conditions:
program parallel_add
  use omp_lib
  implicit none
  integer, parameter :: n = 1000   ! n must be a named constant to dimension a and b
  integer :: i
  real :: a(n), b(n)

  ! Initialize arrays
  do i = 1, n
    a(i) = i * 1.0
    b(i) = 0.0
  end do

  ! Parallel region: distribute loop iterations across threads
  !$omp parallel do private(i)
  do i = 1, n
    b(i) = a(i) * 2.0
  end do
  !$omp end parallel do

  ! Print result (outside the parallel region, only one thread executes this)
  print *, "First 10 elements of b:", b(1:10)
end program parallel_add
Compilation and Execution
Compile with -fopenmp to enable OpenMP support:
gfortran -fopenmp parallel_add.f90 -o parallel_add
Run the program (it will use all available cores by default):
./parallel_add
You should see output like First 10 elements of b: 2.00000000 4.00000000 ..., confirming parallel execution.
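When every iteration updates the same variable (for example, accumulating a sum), a reduction clause is safer than private, because OpenMP gives each thread its own partial result and combines them at the end. A minimal sketch (the program name omp_reduction is illustrative):

```fortran
program omp_reduction
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: total

  total = 0.0
  ! Each thread accumulates a private partial sum; OpenMP adds them at the end
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + real(i)
  end do
  !$omp end parallel do

  print *, "Sum of 1..n:", total   ! 500500.0 for n = 1000
end program omp_reduction
```

Compile it the same way: gfortran -fopenmp omp_reduction.f90 -o omp_reduction. Without the reduction clause, concurrent updates to total would be a race condition.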
2. MPI: Distributed Memory Parallelism
MPI is designed for distributed systems (e.g., clusters) and uses message passing for communication between processes. Each process runs independently, and data is explicitly shared via messages. Fortran programs use the mpi_f08 module (modern Fortran) or use mpi (legacy) to access MPI functions.
Example: Hello World with Process Information
This program initializes MPI, gets the process rank (unique ID) and total number of processes, and prints a message:
program hello_mpi
  use mpi_f08
  implicit none
  integer :: rank, nprocs   ! avoid the name "size", which shadows an intrinsic

  ! Initialize MPI environment (the ierror argument is optional with mpi_f08)
  call MPI_Init()
  ! Get current process rank and total number of processes
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs)
  ! Print message from each process
  print *, "Hello from process", rank, "of", nprocs
  ! Finalize MPI environment
  call MPI_Finalize()
end program hello_mpi
Compilation and Execution
Compile with mpif90 (the OpenMPI Fortran compiler wrapper):
mpif90 hello_mpi.f90 -o hello_mpi
Run with mpiexec, specifying the number of processes (e.g., 4):
mpiexec -n 4 ./hello_mpi
You’ll see output from 4 processes, each with a unique rank (0 to 3).
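Beyond point-to-point messages, MPI provides collective operations that combine data from all processes in a single call. A minimal sketch using MPI_Reduce to sum one value per process on rank 0 (the program name and the variable names local/total are illustrative):

```fortran
program mpi_reduce_demo
  use mpi_f08
  implicit none
  integer :: rank, nprocs, local, total

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs)

  ! Each process contributes one value; MPI_Reduce sums them on rank 0
  local = rank + 1
  call MPI_Reduce(local, total, 1, MPI_INTEGER, MPI_SUM, 0, MPI_COMM_WORLD)

  if (rank == 0) print *, "Sum over processes:", total   ! 1+2+...+nprocs

  call MPI_Finalize()
end program mpi_reduce_demo
```

Compile with mpif90 and run with mpiexec -n 4 as above; with 4 processes, rank 0 reports 10.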
3. Coarray Fortran: Partitioned Global Address Space (PGAS)
Coarray Fortran is a modern, standardized approach (Fortran 2008+) for parallel programming. It uses a partitioned global address space, where each process (image) can access data on other images using coarray notation ([image_index]). It’s simpler than MPI for shared-data problems but requires compiler support: gfortran supports a single image with -fcoarray=single, and multiple images with -fcoarray=lib plus the OpenCoarrays library.
Example: Parallel Sum of an Array
This program splits an array across multiple images, computes local sums, and combines them into a global sum using coarrays:
program coarray_sum
  implicit none
  integer, parameter :: n = 1000
  integer :: i, local_n, me, nimg
  real, allocatable :: a(:)
  real :: local_sum[*]        ! coarray: one copy per image
  real :: global_sum

  me   = this_image()         ! current image ID (1 to num_images())
  nimg = num_images()
  local_n = n / nimg          ! elements per image (assumes nimg divides n)
  allocate(a(local_n))

  ! Each image initializes its own block of the global index range 1..n
  do i = 1, local_n
    a(i) = real((me - 1) * local_n + i)
  end do

  ! Compute the local sum and publish it in the coarray
  local_sum = sum(a)

  sync all                    ! ensure all images finish local computation
  if (me == 1) then
    global_sum = 0.0
    do i = 1, nimg
      global_sum = global_sum + local_sum[i]   ! remote read from image i
    end do
    print *, "Global sum:", global_sum
  end if
end program coarray_sum
Compilation and Execution
For a quick single-image test, compile with -fcoarray=single and run the binary directly:
gfortran -fcoarray=single coarray_sum.f90 -o coarray_sum
./coarray_sum
To run on multiple images, gfortran needs the OpenCoarrays library (on Debian, sudo apt install libcoarrays-openmpi-dev). Compile with -fcoarray=lib, link against libcaf_mpi, and run with mpiexec (OpenCoarrays uses MPI under the hood):
gfortran -fcoarray=lib coarray_sum.f90 -o coarray_sum -lcaf_mpi
mpiexec -n 4 ./coarray_sum
The output should show the global sum of the array (500500 for n=1000).
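Coarray notation also supports remote writes, which makes simple data distribution patterns very compact. A minimal sketch in which image 1 pushes a value to every other image (the program name coarray_bcast is illustrative; Fortran 2018 also offers the co_broadcast intrinsic for this):

```fortran
program coarray_bcast
  implicit none
  integer :: x[*]   ! one integer per image
  integer :: i

  x = 0
  if (this_image() == 1) then
    x = 42
    do i = 2, num_images()
      x[i] = x      ! remote write: push the value to image i
    end do
  end if

  sync all          ! readers wait until image 1 has finished writing
  print *, "Image", this_image(), "has x =", x
end program coarray_bcast
```

After the sync all barrier, every image prints x = 42.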
4. Performance Optimization Tips
- OpenMP: use schedule(dynamic) for irregular loops to balance load, and reduction clauses for operations like sums (avoids manual synchronization).
- MPI: prefer collective operations (e.g., MPI_Reduce instead of individual sends/receives) and overlap communication with computation.
- Profiling: use gprof (for CPU usage) or Intel VTune (for memory access patterns) to identify bottlenecks.
5. Additional Libraries for High-Performance Computing
For numerical linear algebra (a common task in scientific computing), use optimized libraries that support parallelism:
sudo apt install libopenblas-dev # OpenBLAS: optimized BLAS/LAPACK routines
sudo apt install libscalapack-openmpi-dev # ScaLAPACK: distributed-memory linear algebra (OpenMPI build)
These libraries provide highly optimized routines for matrix operations, eigenvalue problems, and more, leveraging parallel hardware.
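As a quick check that OpenBLAS is usable from Fortran, here is a minimal sketch that multiplies a 2x2 matrix by the identity using the standard BLAS routine sgemm (the program name blas_demo is illustrative):

```fortran
! Compile and link against OpenBLAS:
!   gfortran blas_demo.f90 -o blas_demo -lopenblas
program blas_demo
  implicit none
  real :: a(2,2), b(2,2), c(2,2)

  a = reshape([1.0, 2.0, 3.0, 4.0], [2, 2])
  b = reshape([1.0, 0.0, 0.0, 1.0], [2, 2])  ! identity matrix
  c = 0.0

  ! c = 1.0 * a * b + 0.0 * c
  call sgemm('N', 'N', 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2)

  print *, c   ! equals a, since b is the identity
end program blas_demo
```

OpenBLAS picks an optimized, multithreaded kernel at runtime, so the same call scales to large matrices without code changes.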