Before proceeding, make sure you have gathered the relevant details about AVIDD's filesystems.
We will see how MPI I/O can be used by multiple processes to read and write to files. All processes will read from/write to the same file using different offsets.
Note: MPI I/O can only be used with parallel file-systems like GPFS. It is not designed to work with NFS and the like. Refer to the MPI distribution's documentation to check which file-systems are supported.
Within the program illustrating MPI I/O (shown below), we also introduce very basic error handling mechanisms.
Important: Error handling is especially important when you are dealing with files. For example, if you open a file for writing and encounter "write" errors, then it is your responsibility to close the file (handle) and also unlink (delete) it before your program terminates. Refer to the I/O section of the MPI-2 specification.
/**********************************************************************
Copyright 2005, The Trustees of Indiana University. All rights reserved.
This program illustrates how MPI I/O can be used to copy a file...
i.e. read from it and then write to a new file.
**********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h> /* Include the MPI definitions */
#define USE_VT 1 /* 1 to call Intel Trace Collector API, 0 otherwise */
#if USE_VT
#include "VT.h"
#endif
void ErrorMessage(int, int, char*);
int main(int argc, char *argv[])
{
int numprocs, myrank;
int start, end, length;
char* buffer;
int error=0, my_get_size_error=0, get_size_error=0;
MPI_Status status;
MPI_File fh;
MPI_Offset filesize;
/* Initialize MPI */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (argc != 3)
{
fprintf(stderr, "Usage: %s FileToReadIn FileToWriteTo\n", argv[0]);
exit(-1);
}
/* Open file to read */
/* FIRST way of handling errors, not very clean, note open file */
/* MPI_Abort () lets you return an error code */
/* MPI_SUCCESS defined in mpi_errno.h which is included in mpi.h */
/* While using LAM, linking to libVT causes MPI_File_open () to crash ; This problem */
/* does not exist if MPICH is used */
#if USE_VT
VT_traceoff(); /* Don't want to trace MPI_File_open () while using LAM */
#endif
error = MPI_File_open(MPI_COMM_WORLD, argv[1],
MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
#if USE_VT
VT_traceon(); /* Restart tracing */
#endif
if(error != MPI_SUCCESS) {
fprintf(stdout, "Proc %d: Could not open file %s ; error code: %d\n", myrank, argv[1], error);
MPI_Abort (MPI_COMM_WORLD, -1);
}
/* Get the size of file */
error = MPI_File_get_size(fh, &filesize);
/* Not an efficient use of variables, but our objective is to show a mechanism to handle errors */
if(error != MPI_SUCCESS) my_get_size_error = 1;
/* SECOND way of handling errors, much safer, still not complete. */
MPI_Allreduce (&my_get_size_error, &get_size_error, 1, MPI_INT,
MPI_LOR, MPI_COMM_WORLD);
if (get_size_error == 0) {
/* Calculate the range for each process to read */
length = filesize / numprocs;
start = length * myrank;
if (myrank == numprocs-1)
end = filesize;
else
end = start + length;
/* MPI_Offset is not an int, so cast it explicitly before printing */
fprintf(stdout, "Proc %d: range = (%d, %d); filesize was %lld\n", myrank, start, end, (long long)filesize);
/* Allocate space */
buffer = (char *)malloc((end - start) * sizeof(char));
if (buffer == NULL) ErrorMessage(-1, myrank, "malloc");
/* Each process reads in different part of the file */
MPI_File_seek(fh, start, MPI_SEEK_SET);
error = MPI_File_read(fh, buffer, end-start, MPI_BYTE, &status);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_read");
}
else {
fprintf (stderr, " Proc: %d: Error getting file size, will close file now and quit ... \n", myrank);
}
/* Close the file; Would have gotten here straight away if there was an error getting size */
MPI_File_close(&fh);
if (get_size_error == 0) {
/* Open file to write */
#if USE_VT
VT_traceoff(); /* Don't want to trace MPI_File_open () while using LAM */
#endif
error = MPI_File_open(MPI_COMM_WORLD, argv[2],
MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
#if USE_VT
VT_traceon(); /* Restart tracing */
#endif
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");
error = MPI_File_write_at(fh, start, buffer, end-start, MPI_BYTE, &status);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_write_at");
/* close the file */
MPI_File_close(&fh);
}
/* Finalize MPI */
MPI_Finalize();
return 0;
}
void ErrorMessage(int error, int rank, char* string)
{
fprintf(stderr, "Process %d: Error %d in %s\n", rank, error, string);
MPI_Finalize();
exit(-1);
}
To compile this program, at your shell prompt (head node bhX) in the MPI_IO directory type:
[agopu@bh2 agopu]$ cd ~/MPI_Tutorial/MPI_IO
[agopu@bh2 MPI_IO]$ make
Important Note about LAM MPI_IO and ITC/A:
We have found that linking a LAM program that uses MPI_IO to the Intel Trace
library (libVT), i.e. using the -lVT flag in the Makefile, causes the MPI_File_open ()
function to crash. While we work with the developers of ITC/A and of LAM to fix this, we recommend two
solutions for now:
Avoid use of -lVT within your Makefile, or
Turn tracing off using VT_traceoff () before the MPI_File_open () function
call (and turn it back on with VT_traceon () following the function call), as the program above does.
To run the program on the interactive nodes you obtained through qsub -I (bcXX), switch to that terminal, create a sample file in your GPFS directory, and then run the mpi_io program as shown below:
[agopu@bc56 agopu]$ cd ~/MPI_Tutorial/MPI_IO
[agopu@bc56 MPI_IO]$ ls -ld /N/gpfs/${USER}
drwx--x--x 4 agopu hpc 8192 Feb 22 17:02 /N/gpfs/agopu
If you do not have a GPFS directory, then create one:
[agopu@bc56 MPI_IO]$ mkdir /N/gpfs/${USER}/
[agopu@bc56 MPI_IO]$ cat > /N/gpfs/${USER}/mydatafile
Some data goes here
Some more data goes here
'C' programming rocks!
We all live under the same sun
Novocaine for the soul
Confortably Numb
La Villa Strangiato
How much more data can I type with my (poor) typing skills? (Hit Ctrl D to quit ...)
[agopu@bc56 MPI_IO]$ ls -l /N/gpfs/${USER}/mydatafile*
-rw------- 1 agopu hpc 275 Feb 22 17:03 /N/gpfs/agopu/mydatafile
[agopu@bc56 MPI_IO]$ lamboot $PBS_NODEFILE
[agopu@bc56 MPI_IO]$ mpirun C mpi_io /N/gpfs/${USER}/mydatafile /N/gpfs/${USER}/mydatafile2
[agopu@bc56 MPI_IO]$ lamhalt
You can expect to see something like this:
Proc 0: range = (0, 50); filesize was 202
Proc 2: range = (100, 150); filesize was 202
Proc 1: range = (50, 100); filesize was 202
Proc 3: range = (150, 202); filesize was 202
After the program completes, ls -l should show you the newly created file. If everything went smoothly, the file sizes should be the same, and running diff should not produce any output!
[agopu@bc56 MPI_IO]$ ls -l /N/gpfs/${USER}/mydatafile*
-rw------- 1 agopu hpc 275 Feb 22 17:03 /N/gpfs/agopu/mydatafile
-rw------- 1 agopu hpc 275 Feb 22 17:10 /N/gpfs/agopu/mydatafile2
[agopu@bc56 MPI_IO]$ diff /N/gpfs/${USER}/mydatafile /N/gpfs/${USER}/mydatafile2
[agopu@bc56 MPI_IO]$
Now that you've seen the program work (hopefully!), let us see how the code worked and what the various MPI functions mean...
The MPI_File_open () function opens the file identified by the filename given as its second parameter on all processes in the MPI_COMM_WORLD communicator. This is a collective function: all processes in the communicator must call it together, with the same access mode and with filenames that refer to the same file.
The third parameter specifies how the file should be opened, e.g., for writing, reading or both, and whether it should be created if it doesn't exist. In our example, the input file is opened in read-only mode. All file-open modes are listed in the MPI-2 specification for MPI_File_open (). The options can be combined with the bitwise OR operator, |.
The fourth parameter can be used to pass additional hints about how and where the file should be opened. This is an advanced feature, so we simply pass MPI_INFO_NULL and do not worry about it here.
The MPI_File_get_size () function gives the file size, which will be used later on to determine the offset for each process.
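The range calculation that the program performs with this file size can be sketched on its own, without MPI. The helper below, compute_range, and its byte_range type are hypothetical names that do not appear in the program above; they only reproduce its arithmetic, in which each process gets filesize / numprocs bytes and the last process also takes the remainder. The sketch uses long long instead of int, on the assumption that files larger than an int can describe may be of interest.

```c
#include <assert.h>

/* Byte range [start, end) assigned to one process.
 * Hypothetical helper; mirrors the arithmetic in the program above. */
typedef struct {
    long long start;
    long long end;
} byte_range;

byte_range compute_range(long long filesize, int numprocs, int rank)
{
    byte_range r;
    long long length;

    assert(numprocs > 0 && rank >= 0 && rank < numprocs);

    length = filesize / numprocs;   /* integer division: remainder is dropped */
    r.start = length * rank;
    /* The last process reads up to the end of the file,
     * so it also picks up the remainder bytes. */
    r.end = (rank == numprocs - 1) ? filesize : r.start + length;
    return r;
}
```

For a 202-byte file and four processes, compute_range yields the ranges (0, 50), (50, 100), (100, 150) and (150, 202), which are the ranges in the sample output shown earlier.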
The MPI_File_seek () function moves each process's file pointer to the position in the file where that process will start reading data. Its second parameter is an offset (called my_offset here; it is start in our program) and its third parameter selects one of three seek modes: MPI_SEEK_SET (the pointer is set exactly to my_offset), MPI_SEEK_CUR (the pointer is set to its current position plus my_offset), MPI_SEEK_END (the pointer is set to the end of the file plus my_offset). In general my_offset can be positive or negative, i.e., you can use it to move in both directions within the file.
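The three seek modes amount to simple arithmetic, which the following sketch models in plain C. It is only a model and makes no MPI calls: the seek_mode enum and the seek_position function are hypothetical stand-ins for MPI_SEEK_SET, MPI_SEEK_CUR, MPI_SEEK_END and the effect of MPI_File_seek (), and it assumes the default file view, in which positions are counted in bytes.

```c
#include <assert.h>

/* Stand-ins for MPI_SEEK_SET, MPI_SEEK_CUR and MPI_SEEK_END.
 * These names are hypothetical; they only model the semantics. */
typedef enum { SEEK_MODE_SET, SEEK_MODE_CUR, SEEK_MODE_END } seek_mode;

/* Returns the file-pointer position that a seek would produce. */
long long seek_position(long long current, long long filesize,
                        long long offset, seek_mode whence)
{
    long long pos = 0;

    switch (whence) {
    case SEEK_MODE_SET: pos = offset;            break; /* absolute position */
    case SEEK_MODE_CUR: pos = current + offset;  break; /* relative to current */
    case SEEK_MODE_END: pos = filesize + offset; break; /* relative to end */
    }
    assert(pos >= 0);  /* seeking before the start of the file is an error */
    return pos;
}
```

In this model, a pointer at position 10 in a 202-byte file moves to 50 with (50, SEEK_MODE_SET), to 6 with (-4, SEEK_MODE_CUR), and to 200 with (-2, SEEK_MODE_END).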
The MPI_File_read () function reads data into the buffer specified as the second parameter. The amount to be read is given by the third parameter, a count of elements of the datatype given by the fourth parameter (MPI_BYTE in our program).
The MPI_File_write_at () function writes data from the buffer, specified by the third parameter, to the position in the file specified by the second parameter. Because the offset is passed explicitly, no separate seek is needed before the write.
The MPI_File_close function closes the file opened by the function MPI_File_open.
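The MPI_Allreduce () function is similar to MPI_Reduce (), but every process receives the result of the reduction (rather than just the root process receiving it). In our program it combines the per-process error flags with MPI_LOR, so that all processes agree on whether any of them failed to get the file size.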
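What the MPI_LOR reduction computes over the error flags can be modelled sequentially, as in the sketch below. The function allreduce_lor is a hypothetical helper, not an MPI call: flags[i] stands for the value of my_get_size_error contributed by rank i, and the return value stands for the get_size_error that every rank would receive.

```c
#include <assert.h>

/* Sequential model of MPI_Allreduce with MPI_LOR over one int per process.
 * flags[i] stands for the value contributed by rank i (hypothetical helper). */
int allreduce_lor(const int *flags, int numprocs)
{
    int result = 0;
    int i;

    assert(numprocs > 0);

    for (i = 0; i < numprocs; i++) {
        if (flags[i] != 0)
            result = 1;   /* logical OR: any nonzero input makes the result true */
    }
    return result;        /* every rank would receive this same value */
}
```

The point of the design is that a single failing process is enough to make every process skip the read and write steps, so that all of them reach the collective MPI_File_close () call together.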
The MPI_Abort () function terminates all the MPI processes associated with the communicator specified. It also allows you to specify a return error code.
Have you spotted a bug in the code shown above?
Hint: Try running the code on a different input file while using the same output filename. What do you see? Is there a bug? If yes, what is it?
For more comprehensive information on MPI I/O and (associated) error handling, refer to the MPI I/O section of Gustav Meglicki's I590 notes.