Analysis of Parallel Computing Models and Tools


1. INTRODUCTION

There has been a tremendous increase in the performance of the single processing unit (CPU) over the last two decades, but heat dissipation and energy consumption issues halted this progress around 2003 and limited further increases in CPU clock frequencies [1]. This forced processor developers to switch to designs with multiple processing units, known as cores [2]. There are two approaches: the first integrates a few cores (two to ten), typically in desktops and laptops, while the second incorporates a large number of cores (several hundred) and is oriented towards the execution throughput of parallel programs. This has brought a tremendous change to the software development community, and the resulting interest in parallel programming has been termed the "concurrency revolution" [3]. Parallel programming is related to High Performance Computing (HPC), which executes applications on multiple processors to increase performance. Unfortunately, the performance of parallel programs has not matched the peak performance of the processors, so the programming burden remains, and the responsibility for scaling parallelism now lies in the hands of the application developer [4].

There are basically two approaches to application parallelization: autoparallelization and parallel programming [5]. In the first approach, sequential applications are parallelized directly by recompiling them with parallelizing compilers, but because automatic parallelization is complex, the amount of parallelism obtained is low. In the second approach, applications are developed explicitly to exploit parallelism: tasks are divided appropriately and assigned to the respective processors for execution. Parallel programming therefore gives higher performance than autoparallelization. Architectures of parallel systems are divided into two categories: shared memory and distributed memory [6]. In a shared memory architecture all processors share the same memory address space, while in a distributed memory architecture every processor has its own memory space. Today's largest and fastest computers employ both shared and distributed memory architectures.

2. Parallel Programming Models

Let us consider parallel programming models that use a shared or distributed memory approach: the POSIX threads model, the shared memory OpenMP model, and the distributed memory message passing model.

2.1 POSIX Threads

Here there are many concurrent execution paths, known as threads, that are controlled independently. A thread is a lightweight process having its own program counter and execution stack [7]. This is a flexible model, but it is low level and usually associated with shared memory and the operating system. In 1995 a standard was released [21]: POSIX.1c, Threads extensions (IEEE Std 1003.1c-1995), usually called Pthreads (POSIX stands for Portable Operating System Interface). Pthreads is a set of C programming language types and procedure calls, implemented as a header (pthread.h) and a library for creating and manipulating threads. In the POSIX model, heap memory and global variables are shared by the threads. This often causes programming difficulties, since one may want variables that are global within a thread but not shared between threads. The programmer must be aware of deadlocks and race conditions when threads access shared data. For protecting critical sections, Pthreads provides mutexes (mutual exclusion locks) and semaphores. Because of the unstructured nature of Pthreads, programs are difficult to develop and maintain.
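As a minimal sketch in C (the worker function, the counter variable and the thread count are illustrative choices, not part of the standard), the following program creates several Pthreads that update a shared counter, with a mutex protecting the critical section:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4

    /* Shared data: a counter protected by a mutex. */
    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Each thread repeatedly increments the shared counter
       inside a critical section. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&counter_lock);   /* enter critical section */
            counter++;
            pthread_mutex_unlock(&counter_lock); /* leave critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NUM_THREADS];

        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, NULL);

        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        printf("counter = %ld\n", counter);
        return 0;
    }

Without the lock/unlock pair, the concurrent increments would form exactly the kind of race condition mentioned above.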

2.2 OpenMP

OpenMP is an Application Programming Interface (API) for shared memory systems whose aim is to simplify shared memory parallel programming [8]. OpenMP was designed to support High Performance Computing (HPC) programs and to be portable across shared memory architectures. OpenMP implementations for Fortran, C and C++ are available in both free and proprietary compilers. As OpenMP is specifically designed for parallel applications, it uses threads in a highly structured manner.
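As an illustrative sketch (assuming a compiler with OpenMP support, typically enabled with a flag such as -fopenmp; the array names and sizes are arbitrary), a loop can be parallelized with a single directive:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N], c[N];
        double sum = 0.0;

        /* Initialize the input arrays sequentially. */
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        /* The parallel for directive splits the loop iterations among
           threads; the reduction clause combines per-thread partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
            sum += c[i];
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }

This directive-based style is what makes OpenMP's use of threads structured: a parallel region begins and ends at well-defined points in the source code.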

2.3 Message Passing

Message passing is a parallel programming model in which inter-process communication is carried out by exchanging messages. Message passing is the model for distributed memory systems, where communication cannot be achieved by sharing variables. Over time a standard has evolved: the Message Passing Interface (MPI). MPI is a specification for message passing operations [4]. MPI is not a language but a library: it specifies the calling sequences, names and results of functions or subroutines to be called from C, C++ or Fortran programs. A program can be compiled with an ordinary compiler but must be linked with the MPI library. On distributed architectures, MPI is the de facto standard for HPC applications. In the message passing model, separate memory is allocated for each process executing in parallel. Communication occurs when a part of the address space of one process is copied into the address space of another process. The operation is cooperative and normally occurs when one process executes a send and another process executes a matching receive operation. It is the programmer's job to decide which tasks are computed by each process. MPI communication models comprise point-to-point, collective, one-sided and parallel I/O operations. MPI is suited to applications where portability is important, both in time and in space, and it is a good choice for applications with dynamic data structures and task-parallel computations.
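A minimal point-to-point sketch (assuming it is launched with at least two processes, e.g. with mpirun -np 2; the exchanged value is arbitrary) shows the cooperative send/receive pattern described above:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Process 0 sends one integer to process 1. */
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0 sent %d\n", value);
        } else if (rank == 1) {
            /* Process 1 posts the matching receive. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }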

3. HETEROGENEOUS PARALLEL PROGRAMMING MODELS

NVIDIA introduced its first programmable GPU, the GeForce3, in early 2001. The 2003 SIGGRAPH/Eurographics Graphics Hardware workshop, held in San Diego, showed a shift from graphics to non-graphics applications of GPUs, and the new concept of GPGPU (general-purpose computing on graphics processing units) was born. It is now possible to have one or more GPUs and one or more host CPUs inside a single system; such a system can be called heterogeneous. Integrating a multi-core CPU and a GPU on the same die is termed an Accelerated Processing Unit (APU). NVIDIA popularized CUDA (Compute Unified Device Architecture) as the primary model and language for programming its GPUs. More recently, the industry worked together on the Open Computing Language (OpenCL) standard as a common model for heterogeneous programming. In addition, various proprietary solutions, such as Microsoft's DirectCompute or Intel's Array Building Blocks (ArBB), are also available.

3.1 CUDA

Fig. CUDA (OpenCL) architecture.

CUDA is a parallel programming model developed by NVIDIA. The CUDA project was started in 2006, with the first CUDA SDK released in early 2007. The CUDA model has been designed so that applications scale transparently with the increasing number of processor cores provided by GPUs. CUDA provides a software environment that allows developers to use C as a high-level programming language. For CUDA, the parallel system consists of a host processor (the CPU) and a computation resource (the GPU). The computation is done on the GPU by a set of threads that run in parallel. The GPU thread architecture has a two-level hierarchy consisting of the block and the grid. The block is a group of tightly coupled threads, each of which is identified by a thread ID. The grid is a group of loosely coupled blocks of similar size and dimension. An entire grid is handled by a single GPU, and there is no synchronization between blocks. The GPU is organized as a collection of multiprocessors, where each multiprocessor is responsible for handling one or more blocks of a grid; a block is never divided across multiple multiprocessors. Threads within a block can cooperate by sharing data through shared memory and by synchronizing their execution to coordinate memory accesses.
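A sketch in CUDA C (vector addition with illustrative names; unified memory via cudaMallocManaged is used here only to keep the example short) shows how a kernel is launched over a grid of blocks of threads:

    #include <cuda_runtime.h>
    #include <stdio.h>

    #define N 1048576

    /* Kernel: each thread computes one output element. blockIdx and
       threadIdx identify the block within the grid and the thread
       within the block, as described above. */
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        size_t bytes = N * sizeof(float);
        float *a, *b, *c;

        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);

        for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        /* Launch a grid with enough 256-thread blocks to cover N elements. */
        int threadsPerBlock = 256;
        int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, N);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);

        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Each thread computes its global index from blockIdx, blockDim and threadIdx, which is the usual way of mapping the two-level thread hierarchy onto array elements.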

3.2 OpenCL

OpenCL is an open standard for general purpose parallel programming across GPUs, CPUs, and other processors. The first specification, OpenCL 1.0, was finished in late 2008 by the Khronos Group. Essentially, OpenCL distinguishes between the host (CPU) and devices (usually GPUs or CPUs). The idea of OpenCL is to write kernels (functions that execute on OpenCL devices) and to use its APIs for creating and managing these kernels. Kernels are compiled for the target device at runtime, during the execution of the application, which enables the host application to take advantage of all the compute devices in the system.
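For comparison, a sketch of a corresponding OpenCL kernel source is shown below; the host-side setup (platform and device selection, context and command queue creation, program build and the kernel launch via clEnqueueNDRangeKernel) is omitted, so this is not a complete program:

    /* OpenCL C kernel analogous to the CUDA example above. */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c,
                          const int n)
    {
        int i = get_global_id(0);   /* index of this work-item */
        if (i < n)
            c[i] = a[i] + b[i];
    }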

3.3 DirectCompute

DirectCompute is Microsoft's approach to GPU programming. DirectCompute is part of the Microsoft DirectX collection of APIs and is also known as the DirectX 11 Compute Shader. It was initially released with the DirectX 11 API, but it runs on both DirectX 10 and DirectX 11 GPUs. One drawback of DirectCompute is that it only works on Windows platforms.

3.4 Array Building Blocks

Intel's Array Building Blocks (ArBB) provides a generalized vector-parallel programming solution for data-intensive mathematical computation. ArBB consists of a standard C++ library interface along with a powerful runtime.

4. DISTRIBUTED PROGRAMMING

Distributed programming is one of the most important parts of the parallel computing landscape. Here individual computing units are interconnected by a network. Distributed computing systems are specifically designed for High Performance Computing (HPC). A detailed treatment of distributed systems can be found in [9]. Some of the approaches to distributed programming are: Grid Computing [10], CORBA, Distributed Component Object Model (DCOM), Web Services, Service-Oriented Architecture (SOA), Representational State Transfer (REST), and Internet Communications Engine (Ice).

5. HYBRID PROGRAMMING

The basic goal of the hybrid model is to exploit the strengths of both models: the memory savings, efficiency and ease of programming of the shared memory model along with the scalability of the distributed memory model. Instead of developing new languages or runtimes, we can combine the already available models and tools; this approach is termed hybrid (parallel) programming. In this kind of programming we can also use GPUs as a source of computing power. Some of the hybrid approaches are: combining Pthreads and MPI, combining MPI and OpenMP, combining CUDA and Pthreads, combining CUDA and OpenMP, and combining CUDA and MPI.
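A common combination is MPI across nodes with OpenMP within each node. The sketch below (the loop body is arbitrary, illustrative work) requests funneled thread support from MPI, parallelizes local work with OpenMP threads, and combines per-process results with an MPI reduction:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int provided, rank;

        /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls,
           which holds here because all MPI calls are outside the
           OpenMP parallel region. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local_sum = 0.0, global_sum = 0.0;

        /* Each MPI process parallelizes its local work with OpenMP. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < 1000000; i++)
            local_sum += (double)i * (rank + 1);

        /* Combine the per-process results with a collective reduction. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }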

6. LANGUAGES WITH PARALLEL SUPPORT

We have seen various programming models, but of special interest to the field are languages with built-in parallel support. These languages are normally HPC oriented and fall under the umbrella of the distributed and shared memory models. Some of the languages that support parallel programming are Java, Cilk, High Performance Fortran (HPF), Z-level Programming Language (ZPL), Erlang, and Glasgow Parallel Haskell (GpH).

7. CONCLUSIONS

Fig. Trend in Computational Field related to Parallelism

This work covers the parallel programming landscape. We have seen a classification of parallel programming models; the parallel programming models Pthreads, OpenMP and message passing (MPI); the heterogeneous parallel programming models CUDA, OpenCL, DirectCompute and Array Building Blocks; distributed programming and hybrid programming; and finally the programming languages that support parallel programming.


