Internal Communication Optimization In Galaxy

Published Date: 02 Nov 2017

1 INTRODUCTION

This project attempts to improve the performance of job processing in Galaxy by optimizing internal communication. The project is divided into two parts. In the first part, I implemented a message queue as the medium of communication between the entities of Galaxy that are involved in job execution: the Galaxy web servers, the job handlers and the job manager. The exact role of each of these entities is explained in the next section. In the second part, I implemented a fault-tolerance mechanism in which job handlers are checked for liveness before jobs are dispatched to them. This mechanism identifies faulty handlers early and saves job processing time that would otherwise be lost. In this draft, I explain the thought process behind implementing a message queue as opposed to the current scheme of operation, and I describe how job handler fault tolerance is implemented.

1.1 Galaxy Architecture

The GALAXY Framework is a web tool which aids in biomedical research [1]. It wraps existing computational tools and provides a neat interface for users who are not accustomed to running such tools manually. GALAXY is coded completely in Python on the back end and also uses HTML5, SQLite and the Django framework. The core components of the GALAXY Framework are the toolbox, the job manager, the job handler and the web interface [2].

The toolbox acts as a bridge between the command-line interface of a computational tool and its web interface. The toolbox identifies the type of data, the inputs, the outputs and the other dependencies that the user provides and receives as an end result.

The job manager deals with the details of executing tools. It decides the order in which jobs must be run and orders the datasets accordingly for the expected result. In addition to deciding which of the configured handlers runs a job and maintaining a queue of all requested jobs, the manager acts as the link between the web interface and the back end.

The web interface provides a means to interact with the GALAXY framework using a regular web browser. It has a web-based interface to access the tools and a model to use different datasets.

The job handler provides the processing and dispatching mechanism inside GALAXY. The handler executes the jobs assigned by the manager and reports back the results to the web interface.

1.2 Current model for job processing

Galaxy uses a SQLite database for storing transient data and for record keeping. Currently, job-related information is stored as a row in a table of this database. Consider a typical use case: a user runs a text manipulation tool on a loaded dataset. The request is transferred from the web interface to the back end of Galaxy, where the job parameters are extracted and an entry is made in the Jobs table of the SQLite database. The job manager has a background thread which keeps polling the Jobs table in an attempt to pick up newly added jobs. Once a job is picked up, a handler is assigned to it (some jobs can be handled by specific handlers), and the job manager updates the job's entry in the Jobs table with the handler information. Each handler has a similar background thread which keeps polling the Jobs table for new jobs assigned to it. New jobs are picked up and processed.
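The polling scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not Galaxy's actual code: the table layout, the column names, the state values and the modulo-based assignment rule are all assumptions made for the example, and each function represents a single pass of what would be a repeating background loop.

```python
import sqlite3


def create_jobs_table(conn):
    # Hypothetical, simplified stand-in for the Jobs table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job ("
        "id INTEGER PRIMARY KEY, tool TEXT, state TEXT, handler TEXT)"
    )


def submit_job(conn, tool):
    """Web interface: record a new job with no handler assigned yet."""
    cur = conn.execute(
        "INSERT INTO job (tool, state, handler) VALUES (?, 'new', NULL)", (tool,)
    )
    conn.commit()
    return cur.lastrowid


def manager_poll_once(conn, handlers):
    """Job manager: one polling pass that assigns a handler to each new job."""
    rows = conn.execute(
        "SELECT id FROM job WHERE state = 'new' AND handler IS NULL ORDER BY id"
    ).fetchall()
    for (job_id,) in rows:
        handler = handlers[job_id % len(handlers)]  # assumed assignment rule
        conn.execute("UPDATE job SET handler = ? WHERE id = ?", (handler, job_id))
    conn.commit()
    return len(rows)


def handler_poll_once(conn, handler_name):
    """Job handler: one polling pass that picks up jobs assigned to this handler."""
    rows = conn.execute(
        "SELECT id FROM job WHERE state = 'new' AND handler = ? ORDER BY id",
        (handler_name,),
    ).fetchall()
    for (job_id,) in rows:
        conn.execute("UPDATE job SET state = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    return [job_id for (job_id,) in rows]
```

Note that every pass issues a query even when no new job exists, and every job costs at least one insert, two selects and two updates before it starts running.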

As one can imagine, this model involves many database accesses. In addition, as the number of handlers and web servers increases, performance degrades further, because each database transaction has to take a lock, which leaves the other threads polling the table in a wait state. Maintaining consistency will also be hard when the individual entities are deployed on separate machines.

2 METHODOLOGY

The current model communicates the addition of a new job by adding an entry to the Jobs table in the database. The background threads in the job manager and the handlers poll the table and pick up the new jobs. As mentioned in the previous section, this polling can degrade performance. I therefore proposed a model in which the background threads do not need to poll and instead receive new jobs asynchronously. At the heart of the model is a message queue placed between the web interface and the manager, together with further queues between the manager and the respective handlers. When a new job is introduced into the system, the web interface populates the Jobs table as before and sends the job parameters as a message to the job manager. The job manager consumes the message and assigns the job to a dedicated handler. The manager then sends a message comprising the job parameters to the assigned handler, and the handler executes the job. In the next section, I explain how a message queue works.
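The message flow of the proposed model can be illustrated with a small in-process sketch. It uses Python's standard `queue.Queue` and threads as a stand-in for the real message queue, which the text does not name; the message fields, the `STOP` sentinel and the round-robin assignment are assumptions made for the example.

```python
import queue
import threading

STOP = None  # sentinel message, used only to shut this sketch down


def manager_loop(manager_queue, handler_queues):
    """Consume job messages from the web interface and forward each to a handler."""
    names = sorted(handler_queues)
    assigned = 0
    while True:
        message = manager_queue.get()  # blocks until a message arrives, no polling
        if message is STOP:
            for name in names:
                handler_queues[name].put(STOP)
            break
        name = names[assigned % len(names)]  # assumed round-robin assignment
        assigned += 1
        handler_queues[name].put(dict(message, handler=name))


def handler_loop(name, handler_queue, results):
    """Consume job messages addressed to this handler and record their execution."""
    while True:
        message = handler_queue.get()
        if message is STOP:
            break
        results.put((name, message["job_id"]))


def run_pipeline(jobs, handler_names):
    """Wire up one manager queue and one queue per handler, then push jobs through."""
    manager_queue = queue.Queue()
    handler_queues = {name: queue.Queue() for name in handler_names}
    results = queue.Queue()
    threads = [
        threading.Thread(target=manager_loop, args=(manager_queue, handler_queues))
    ]
    for name, q in handler_queues.items():
        threads.append(threading.Thread(target=handler_loop, args=(name, q, results)))
    for thread in threads:
        thread.start()
    for job in jobs:  # the web interface sends the job parameters as messages
        manager_queue.put(job)
    manager_queue.put(STOP)
    for thread in threads:
        thread.join()
    executed = []
    while not results.empty():
        executed.append(results.get())
    return sorted(executed)
```

In this sketch neither the manager nor the handlers ever query a table to discover work; each blocks on its own queue and wakes only when a message is delivered.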

2.1 Message Queue

Message queues provide an asynchronous communication protocol, meaning that the sender and the receiver of a message do not need to interact with the message queue at the same time. Using a message queue can be viewed as a kind of connectionless programming, in which a queue holds your input to another application or server. Connectionless in this context means you don't have a permanent connection to the application or server. Instead, you log your input in a queue, and the other application or server then takes it from the queue, processes it, and sometimes puts it back in the same or a different queue. The opposite of connectionless is connection oriented, and basically any standard database system falls into this category.

2.2 Why Use a Message Queue and Not a Database Table?

A message queue can be of good use in many situations, but what makes it different from using a database table? A call to a DBMS is normally synchronous, whereas a call to a message queue is asynchronous. This means that, from the client's point of view, it is generally faster to access a message queue than to access a database table. With synchronous access the client application has to wait for the server to respond, whereas with asynchronous access a query is sent to the server, and the client application then continues its normal duties. When the server has finished processing the query, it will notify you and pass you the result. When you use a database table, you must comply with a fairly strict format for adding data. There are certain fields that must be supplied, and you don't really have the ability to add extra information. Although message queues also require certain fields, it is much easier to attach extra information that would be hard to store in a database table.

The new model is an optimization over the current model for three reasons. First, the new model doesn't poll the Jobs table repeatedly and filter the resulting data for specifics. Second, in the new model the Jobs table is accessed only when the assigned handler executes the job. Third, the message queue communicates over plain TCP sockets, as opposed to executing complex queries against a relational database.

2.3 Heartbeats

A GALAXY system can be configured with multiple job handlers in order to increase the number of jobs that can be processed simultaneously. Since the job manager and the job handlers are separate entities, the job manager has no way of knowing when one of the handlers suddenly stops functioning or becomes unresponsive. The jobs assigned to that handler become stuck and have to be dealt with manually. To make matters worse, the manager keeps assigning jobs to the dead handler, as per the initial configuration. To avoid such scenarios, it makes sense for the handlers to communicate periodically to let the relevant components know that they are still available. The idea that I came up with is that each job handler periodically updates an entry in the database with the current timestamp, and the manager, whenever it is about to assign a job to a particular handler, checks that the database entry for that handler is not stale. The manager assigns a job to the handler only if the check is positive.
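A minimal sketch of this heartbeat check is given below. The table name, the column names and the 30-second staleness threshold are assumptions chosen for illustration, and falling back to the next candidate handler in `pick_live_handler` is one possible policy rather than something the text prescribes.

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 30.0  # assumed threshold, not taken from the project


def create_heartbeat_table(conn):
    # Hypothetical table holding one row per handler.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS handler_heartbeat ("
        "handler TEXT PRIMARY KEY, last_seen REAL NOT NULL)"
    )


def record_heartbeat(conn, handler, now=None):
    """Handler side: store the current timestamp, replacing the previous one."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO handler_heartbeat (handler, last_seen) VALUES (?, ?)",
        (handler, now),
    )
    conn.commit()


def is_alive(conn, handler, now=None, stale_after=STALE_AFTER_SECONDS):
    """Manager side: a handler is alive if its last heartbeat is recent enough."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT last_seen FROM handler_heartbeat WHERE handler = ?", (handler,)
    ).fetchone()
    return row is not None and (now - row[0]) <= stale_after


def pick_live_handler(conn, candidates, now=None):
    """Return the first candidate whose heartbeat is fresh, or None if all are stale."""
    for handler in candidates:
        if is_alive(conn, handler, now=now):
            return handler
    return None
```

Each handler would call `record_heartbeat` from a periodic timer, and the manager would call `is_alive` immediately before dispatching a job. A handler that has never written a heartbeat is treated as unavailable.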

This heartbeat mechanism is even more useful in a distributed scenario where each entity is placed on a different machine, because the probability that some entity fails increases as the system becomes more distributed.

3 PERFORMANCE ANALYSIS

The improvement in GALAXY job processing was measured using probes at two locations in the code: the place where the job parameters are extracted from the message sent by the web interface, and the place where the handler picks up the job for processing and execution. Whenever a new job is added to the system, both probes come into action and report the timestamps at which they are executed. The difference between the two timestamps gives a nearly accurate measure of the latency incurred in transferring the job.
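Such a pair of probes could look roughly like the sketch below. The class, the probe names `received` and `picked_up`, and the injectable clock are hypothetical and are used only to illustrate the measurement; the actual probes in the project may have been simple timestamp log statements.

```python
import time


class LatencyProbes:
    """Records one timestamp per probe per job and reports the gap between them."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the probes can be tested deterministically
        self._stamps = {}

    def mark(self, job_id, probe):
        """Called at a probe location; stores the current time for this job."""
        self._stamps.setdefault(job_id, {})[probe] = self._clock()

    def latency(self, job_id):
        """Seconds between parameter extraction and pickup by the handler."""
        stamps = self._stamps[job_id]
        return stamps["picked_up"] - stamps["received"]
```

The code that extracts the job parameters would call `mark(job_id, "received")`, and the handler would call `mark(job_id, "picked_up")` when it starts working on the job.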

Probe 1          Probe 2          Time difference (s)
1367465657.33    1367465656.84    0.49
1367465757.63    1367465757.13    0.50
1367465942.17    1367465941.07    1.10
1367466019.05    1367466140.81    1.00
1367466141.67    1367466140.81    0.86
1367466464.94    1367466464.09    0.85
1367466552.96    1367466552.08    0.88

In the table above, column 1 shows the timestamps recorded by probe 1 and column 2 shows the timestamps recorded by probe 2. The difference between the timestamps, which indicates the time taken by the system to start servicing a job, is around 0.5 seconds at the beginning but rises to between 0.85 and 1.1 seconds in the later iterations. This might be attributed to the time taken by SQLAlchemy, the database toolkit, to filter the results for each handler and for jobs in a particular state.

In the table below we can see the numbers reported by the probes after using the message queue.

Probe 1          Probe 2          Time difference (s)
1367471110.09    1367471109.70    0.39
1367471202.92    1367471202.52    0.40
1367471321.88    1367471321.55    0.33
1367471397.49    1367471397.19    0.30
1367471448.57    1367471448.24    0.33
1367471487.85    1367471487.55    0.30
1367471535.78    1367471535.44    0.34

From the table it is clear that the time taken to start servicing a job is between 0.3 and 0.4 s, which is less than 50% of the 0.85 to 1.1 s taken in the later iterations of the previous case. This represents an improvement of more than 50%. On a large-scale system like GALAXY, which runs online for public usage, a 50% reduction in this latency means more throughput and a better user experience.
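The size of the improvement can be checked directly from the time differences listed in the two tables. The short calculation below uses only those recorded values; the variable and function names are chosen for this example.

```python
from statistics import mean

# Time differences in seconds, copied from the two tables.
polling = [0.49, 0.50, 1.10, 1.00, 0.86, 0.85, 0.88]
message_queue = [0.39, 0.40, 0.33, 0.30, 0.33, 0.30, 0.34]


def summarize(before, after):
    """Return the mean latency before, the mean after, and the relative reduction."""
    mean_before = mean(before)
    mean_after = mean(after)
    reduction = 1.0 - mean_after / mean_before
    return mean_before, mean_after, reduction


mean_before, mean_after, reduction = summarize(polling, message_queue)
print(f"polling: {mean_before:.2f} s, message queue: {mean_after:.2f} s, "
      f"reduction: {reduction:.0%}")
```

Averaged over all seven runs, the latency falls from about 0.81 s to about 0.34 s, a reduction of roughly 58%. Seven samples per configuration is a small basis, so the figure is indicative rather than precise.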

4 FUTURE WORK

Currently the GALAXY architecture has a job manager component which lies between the web interface and the job handlers. The manager can become a bottleneck when the system is under heavy traffic. As a next step, each handler should pick up jobs directly from the web interface and carry the logic for managing dependencies between jobs that are pushed at the same time. This would eliminate the bottleneck caused by the job manager component and reduce the latency of job processing even further, since removing the manager also removes the message queue activity at the manager.

Though GALAXY is capable of being distributed for large-scale usage, it is mostly run as a whole on a single machine, where each entity runs as a separate process. In such cases, the message queue can be replaced by inter-process communication mechanisms such as shared memory or FIFOs. This would eliminate the overhead of using a message queue in localized scenarios.
