Michael Mohan
The project for this module was a program, written in both MPI and OpenMP (OMP), that read in texts and patterns of characters and output the positions of the patterns within the texts as fast as possible using parallel programming. The sequential and OMP programs can be downloaded here and the source code for the project can be found here. The presentation slides discussing the approach used for the project can be downloaded below.

Parallel Pattern Finding Program
This project contained three different versions of the pattern-finding program, each written in C. The basic sequential version provided the general algorithm used to find patterns within text files, along with a baseline for the times taken to do so. The OMP and MPI versions then used OMP and MPI techniques to parallelise aspects of the algorithm and improve on those times. There were two modes of operation: either only the first instance of the pattern needed to be found in the text, or every instance did. The program read in a control file describing each search, so that it knew how to run. Each line of the control file holds three integers separated by spaces: the first is either 0 or 1, to denote whether only the first instance of the pattern needs to be found or every instance; the second indicates the text to use; and the third indicates the pattern to use.
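As an illustration, a two-line control file might look like the following (which of 0 and 1 selects "first instance only" is an assumption here, as the mapping is an internal detail of the program):

    0 1 2
    1 3 1

Here the first line requests a search of text1.txt for pattern2.txt in one mode, and the second line a search of text3.txt for pattern1.txt in the other mode.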
The control file, along with the text files and pattern files, is read in from a directory called “inputs” (or “small-inputs” in the sequential version of the program). The patterns and texts are stored as “.txt” files with the names “patternN.txt” and “textN.txt”, where N is the number used to specify the pattern or text; this is how the programs located and read the relevant files once the pattern and text numbers had been read from the control file. The control file can contain numerous lines with different combinations of text, pattern, and search mode, and the programs read each line in turn and performed the pattern-finding algorithm for it. The output of each program was a results file giving details of the patterns found within the text. Each line of the results file holds three integers: the first indicating the text, the second indicating the pattern, and the third giving the position of the pattern within the text. If the pattern was not found in the text, the position was output as -1. If there were multiple occurrences of the pattern in the text, and the program was to find all occurrences, then each position was written as a separate line in the results file.
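For example, with hypothetical inputs, a results file for a search of text1.txt for pattern2.txt that matched at two positions, followed by a failed search of text3.txt for pattern1.txt, might read:

    1 2 41
    1 2 187
    3 1 -1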
The OMP program was given 8 cores to use for parallel programming, and used a parallel for loop to split the work between the 8 threads. The position of the pattern in the text was held in a shared variable, which was only updated if the position found was smaller than the value already stored (or if it was the first instance found so far). When only the first instance of the pattern was needed, a thread whose current text position was already beyond a previously found match abandoned its remaining comparisons, since no earlier match could come from it; after the loop terminated, the shared position variable was checked and written to the results file, including the case where the pattern was not found and the position remained -1. When all occurrences were desired, the details of each occurrence were written directly to the results file from within the loop as they were found.
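A minimal sketch of this shared-position approach (not the author's actual code; the function name, the skip check, and the use of a critical section for the update are assumptions) could look like this in C:

    #include <string.h>
    #include <omp.h>

    /* Sketch: find the smallest position of pattern in text using an
       OpenMP parallel for loop and a shared best-position variable. */
    long find_first(const char *text, long text_len,
                    const char *pattern, long pat_len)
    {
        long best = -1;  /* shared result: smallest match position, or -1 */

        #pragma omp parallel for schedule(static)
        for (long i = 0; i <= text_len - pat_len; i++) {
            long current;
            #pragma omp atomic read
            current = best;
            /* Skip positions beyond a match that has already been found. */
            if (current != -1 && i > current)
                continue;
            if (memcmp(&text[i], pattern, (size_t)pat_len) == 0) {
                #pragma omp critical
                {
                    /* Keep only the smallest matching position. */
                    if (best == -1 || i < best)
                        best = i;
                }
            }
        }
        return best;
    }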
The MPI program was given 16 cores to use for parallel programming. The MPI approach was more fine-grained than the OMP approach, and a fair bit more complex in its application: the text itself was split across 16 MPI processes, in order to balance the load of the sequential pattern searching across 16 separate workers. Once the text had been split up, each portion of the text, along with a copy of the pattern, was sent to the corresponding process. A set of boolean arrays was also created to mark, for each position of the text, whether the pattern occurred there; every entry was initialised to false, and these arrays were distributed to the processes as well. If the pattern was bigger than the whole text, the algorithm was not run at all. If the pattern was bigger than an individual process's share of the text, each process's text block was extended by the pattern size before being sent out, in case the pattern began near the end of the preceding block; for the last process, the result was automatically set to -1 in this case. If the pattern was the same size as the text, or if there were more processes than characters in the text file, the pattern was searched for sequentially, to avoid spending communication and synchronisation overhead on a redundant parallel approach. If the text did not divide evenly across the processes, the leftover characters were distributed among the processes until the whole text was covered.
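A simplified sketch of the chunking and local search follows. It is assumption-laden rather than the project's actual code: the real program read the text from a file and distributed the pieces explicitly, whereas this sketch hard-codes a small text and has each rank compute its own chunk bounds, extending every chunk by the pattern length so a match straddling a boundary is not missed:

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const char *text = "abcabcabd";   /* stand-in for a text file */
        const char *pattern = "abd";
        long text_len = (long)strlen(text);
        long pat_len  = (long)strlen(pattern);

        /* Each rank takes an equal chunk; the last rank absorbs any
           remainder, and every other chunk is extended by the overlap. */
        long chunk = text_len / nprocs;
        long start = rank * chunk;
        long end = (rank == nprocs - 1) ? text_len
                                        : start + chunk + pat_len - 1;
        if (end > text_len) end = text_len;

        long local = -1;
        for (long i = start; i <= end - pat_len; i++) {
            if (memcmp(&text[i], pattern, (size_t)pat_len) == 0) {
                local = i;   /* global position of the first local match */
                break;
            }
        }

        /* Reduce to the smallest global position; map "no match" above
           any valid position so MPI_MIN ignores it. */
        long sentinel = (local == -1) ? text_len + 1 : local;
        long global;
        MPI_Reduce(&sentinel, &global, 1, MPI_LONG, MPI_MIN, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("first match at %ld\n", global > text_len ? -1 : global);

        MPI_Finalize();
        return 0;
    }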
As with the OMP program, execution differed depending on whether all occurrences of the pattern were needed or only the first. If only the first occurrence was needed, a process finished as soon as the pattern was found in its section of the text. In both cases, each process returned a position indicating where the pattern was found, or -1 otherwise, along with its boolean array marking the local positions of the pattern within its individual chunk of text. Once the processing across every process was complete, the first process was treated as the master and used to gather the positions and boolean arrays together. The positions were combined so that only the minimum global position of the pattern was preserved, and the boolean arrays were merged into a single global array, translating the local positions into positions relative to the overall text. If only the first occurrence was needed, the global position variable was used to write either that position or -1 to the results file. Otherwise, the global array was iterated through and every marked position was written to the results file; if there were none (indicated here by the global position being -1 rather than a position in the text), -1 was written in this case as well.
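The gathering of the boolean arrays can be sketched along these lines (again a simplification under stated assumptions: it pretends all chunks are a fixed equal size and fakes a single local match, whereas the real program also handled uneven chunks and the first-match reduction shown earlier):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 8   /* hypothetical per-rank chunk size */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        unsigned char local[CHUNK] = {0};
        if (rank == 1) local[3] = 1;   /* pretend rank 1 matched at local index 3 */

        unsigned char *global = NULL;
        if (rank == 0)
            global = calloc((size_t)CHUNK * nprocs, 1);

        MPI_Gather(local, CHUNK, MPI_UNSIGNED_CHAR,
                   global, CHUNK, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            /* Index i in the gathered array is already the global text
               position: rank r's local index j lands at r * CHUNK + j. */
            for (long i = 0; i < (long)CHUNK * nprocs; i++)
                if (global[i])
                    printf("match at global position %ld\n", i);
            free(global);
        }
        MPI_Finalize();
        return 0;
    }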
The parallel programs were run on Unix. The OMP program was compiled with icc (the Intel C/C++ compiler), and the MPI program with mpicc. Execution scripts were used to pass in the instructions to compile and run each program; they can be found along with the source code here. For assessment, the programs were executed as batch jobs on a Dell server cluster, and the elapsed execution time of each program was reported by the time utility. The programs were run with a variety of different pattern and text inputs, and the outputs containing the times taken to run the parallel programs on the Dell cluster can also be found with the source code and the execution scripts.
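As a rough guide, the compile-and-run steps looked something like the following (the source file names are placeholders, and the exact flags used in the execution scripts may differ):

    icc -qopenmp pattern_omp.c -o pattern_omp
    OMP_NUM_THREADS=8 time ./pattern_omp

    mpicc pattern_mpi.c -o pattern_mpi
    time mpirun -np 16 ./pattern_mpi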