Cache Size On Global Miss Rate

02 Nov 2017


A microprocessor cannot work without memory. Processor speed has been increasing at a much greater rate than memory speed, resulting in a processor-memory gap. To bridge this gap, modern computers rely heavily on a hierarchical memory organization with a small amount of fast memory called cache. Cache also plays an important role in processor organization, whether uniprocessor or multiprocessor: it provides a quicker supply of data for execution by forming a bridge between the faster processor on one side and the relatively slower main memory on the other. Deciding cache size in the early definition stage of the design process of a multiprocessor architecture is a critical task. This paper deals with finding the optimum cache size in centralized shared memory architectures, taking into account the miss rate, one of the most important metrics for measuring the performance of cache systems. The study was carried out using SMPCache 3.0, a trace-driven cache simulator.

Keywords: Centralized Shared Memory Architectures, Cache Size, Miss Rate.

1. Introduction

In recent years demand for multiprocessors has increased. Their use has grown from mostly scientific and engineering applications to other areas as well, such as databases, file servers, and media servers. Multiprocessor architectures vary depending on the size of the machine and differ from vendor to vendor. Shared-memory architectures, which have become dominant in small and medium-sized machines, provide a single view of memory shared among all processors; these are referred to as centralized shared memory multiprocessor architectures. As in uniprocessors, caching is used to achieve good performance in multiprocessors [3]. It reduces the latency of accesses by bringing data closer to the processor [6], and it also reduces communication traffic and bandwidth requirements by satisfying requests without having to access the network. Achieving the shared memory model in the presence of caches requires special mechanisms to maintain a coherent view of memory. These mechanisms enforce a cache coherence protocol and are usually implemented in hardware for performance. Hardware-based cache coherence provides better results compared to software-implemented coherence [1]. A hybrid software-hardware coherence mechanism has been proposed in [2].

Cache memory plays a vital role in the performance of multiprocessor architectures. To achieve high performance it is essential to select an optimum cache size, but deciding the size of caches in multiprocessor architectures is a critical task. Cache size should not be chosen at random, since a poor choice heavily affects the performance of the multiprocessor architecture. Many factors determine cache size, such as block size, associativity, and number of blocks, with much attention paid to the miss ratio [8]. It is therefore essential to experiment with different configurations, varying the cache size and taking different metrics into account, before implementation. This paper focuses on miss rate, one of the most important metrics for measuring the performance of cache systems. The rest of the paper is organized as follows. Section 2 discusses cache misses and their types. The methodology, along with the trace-driven simulation tool and memory traces, is discussed in Section 3. The simulation setup and the results, showing the influence of cache size on miss rate, are presented in Sections 4 and 5 respectively. Finally, conclusions and future work are given in Section 6.

2. Tradeoffs in Cache Size

The cache organization is a critical issue in uniprocessor as well as multiprocessor architectures. Selecting an optimum cache size is essential because it affects the miss rate, one of the most important metrics for measuring cache performance. In uniprocessor architectures, cache misses are categorized into the "3 Cs": compulsory, capacity, and conflict misses [7]. Many studies have examined how cache size affects each category of miss.

Compulsory misses

Compulsory misses, also referred to as cold misses, occur on the first reference to a memory block by a processor. Since data cannot exist in the cache without first being brought in, these misses cannot be avoided. Cold misses can only be reduced by increasing the block size, so that a single cold miss brings in more data that may also be accessed.
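For intuition, the effect of block size on cold misses can be sketched in a few lines of Python (a hypothetical illustration, not part of the SMPCache experiments): for a sequential access stream, each cold miss covers one block, so quadrupling the block size cuts cold misses by four.

```python
def cold_misses(addresses, block_words):
    """Count first-reference (cold) misses: one per distinct block touched."""
    seen = set()
    misses = 0
    for addr in addresses:
        block = addr // block_words
        if block not in seen:
            seen.add(block)
            misses += 1
    return misses

# 256 sequential word accesses:
print(cold_misses(range(256), 8))   # 32 cold misses with 8-word blocks
print(cold_misses(range(256), 32))  # 8 cold misses with 32-word blocks
```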

Capacity misses

Capacity misses occur in fully associative caches. When there is no room for the block referenced by a processor during the execution of a program, one of the blocks already mapped is replaced according to the replacement policy. Capacity misses are reduced by enlarging the cache.
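As a toy illustration (our own sketch, independent of the SMPCache runs), a fully associative LRU cache that is smaller than a looping working set thrashes on every access, while enlarging it to hold the working set leaves only the cold misses:

```python
from collections import OrderedDict

def fa_lru_misses(block_trace, capacity):
    """Misses in a fully associative cache of `capacity` blocks with LRU replacement."""
    cache = OrderedDict()
    misses = 0
    for b in block_trace:
        if b in cache:
            cache.move_to_end(b)           # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[b] = None
    return misses

trace = list(range(8)) * 4        # 8-block working set looped 4 times
print(fa_lru_misses(trace, 4))    # 32: every access misses (capacity thrashing)
print(fa_lru_misses(trace, 8))    # 8: only the cold misses remain
```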

Conflict misses

Conflict misses occur in direct-mapped and set-associative caches. A conflict arises when a block is replaced by another block and the processor later requests the replaced block. They are misses that would not have occurred in a fully associative cache. Conflict misses are reduced by increasing the associativity or increasing the number of blocks (increasing cache size or reducing block size).
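A small sketch (hypothetical, not taken from the paper's experiments) makes the distinction concrete: two blocks that map to the same direct-mapped slot evict each other repeatedly, while a fully associative cache of the same total capacity misses only on the first references.

```python
from collections import OrderedDict

def dm_misses(block_trace, num_blocks):
    """Misses in a direct-mapped cache: block b maps to slot b % num_blocks."""
    slots = [None] * num_blocks
    misses = 0
    for b in block_trace:
        i = b % num_blocks
        if slots[i] != b:
            misses += 1
            slots[i] = b
    return misses

def fa_misses(block_trace, num_blocks):
    """Misses in a fully associative LRU cache of the same capacity."""
    cache = OrderedDict()
    misses = 0
    for b in block_trace:
        if b in cache:
            cache.move_to_end(b)
        else:
            misses += 1
            if len(cache) >= num_blocks:
                cache.popitem(last=False)
            cache[b] = None
    return misses

trace = [0, 4] * 8          # blocks 0 and 4 both map to slot 0 of a 4-block cache
print(dm_misses(trace, 4))  # 16: every access is a conflict miss
print(fa_misses(trace, 4))  # 2: only the two cold misses
```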

Coherence misses

Cache-coherent multiprocessors introduce a fourth category of misses: coherence misses. These occur when blocks of data are shared among multiple caches. Sharing is of two types: true sharing and false sharing. True sharing occurs when a data word produced by one processor is used by another. False sharing occurs when independent data words for different processors happen to be placed in the same block.
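A toy write-invalidate model (our own simplification, not the MESI protocol used later in the paper) shows how false sharing turns independent writes into coherence misses once the words land in the same block:

```python
def coherence_misses(accesses, block_words):
    """accesses: list of (proc, word_addr, is_write).
    A write invalidates every other cache's copy of the block."""
    holders = {}                          # block -> set of procs holding a valid copy
    misses = {p: 0 for p, _, _ in accesses}
    for p, addr, is_write in accesses:
        b = addr // block_words
        have = holders.setdefault(b, set())
        if p not in have:
            misses[p] += 1                # block not (or no longer) in p's cache
        if is_write:
            holders[b] = {p}              # invalidate all other copies
        else:
            have.add(p)
    return misses

# P0 writes word 0, P1 writes word 1, repeatedly:
acc = [(0, 0, True), (1, 1, True)] * 4
print(coherence_misses(acc, 8))  # {0: 4, 1: 4}: false sharing, the words share a block
print(coherence_misses(acc, 1))  # {0: 1, 1: 1}: one-word blocks, only cold misses
```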

3. Methodology

It is beneficial to evaluate the performance of cache systems using simulation tools. Industry uses simulation extensively during processor and system design because it is the easiest and least expensive way to explore design options. Simulation is even more important in research, to evaluate radical new ideas and characterize the nature of the design space. Trace-driven simulation is often a cost-effective method to estimate the performance of computer system designs. Especially when designing caches, trace-driven simulation is a very popular way to study and evaluate computer architectures, obtaining an acceptable estimate of performance before a system is built. Fig. 1 outlines the workflow of trace-driven simulation.
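To make the idea concrete, here is a minimal trace-driven simulator sketch in Python (our own illustration; SMPCache itself is a far more complete tool): it replays a stream of word addresses through a set-associative LRU cache and reports hits and misses.

```python
from collections import OrderedDict

def simulate(trace, num_blocks, ways, block_words):
    """Replay a trace of word addresses through a set-associative LRU cache."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # one LRU-ordered dict per set
    hits = misses = 0
    for addr in trace:
        block = addr // block_words
        s = sets[block % num_sets]
        if block in s:
            hits += 1
            s.move_to_end(block)        # most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)   # evict the least recently used way
            s[block] = None
    return hits, misses

# 1024 sequential word accesses, 16-block four-way cache, 32-word blocks:
print(simulate(range(1024), 16, 4, 32))  # (992, 32): one cold miss per block
```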

Trace driven simulation tool

In this work we have used the most recent version of SMPCache, a trace-driven simulation tool, to evaluate the performance of cache systems. This version, "SMPCache 3.0", was designed by the ARCO Research Group at the Department of Technologies of Computers and Communications, University of Extremadura (Spain). It combines "SMPCache 2.0" [5], which supports bus-based shared memory multiprocessor (SMP) architectures, and "DSM Cache" [4], which supports distributed shared memory (DSM) multiprocessor architectures. Unlike those two tools, SMPCache 3.0 also supports multilevel caching, an important feature of modern architectures.

Memory Traces

We have used multiprocessor traces with tens of millions of memory accesses (references) for two benchmarks, SPEECH and SIMPLE. These traces were provided by David Chaiken (then of MIT) for NMSU PARL and represent several real parallel applications. The traces had different formats, such as the canonical format for multiprocessor traces developed by Anant Agarwal, and they have been converted to the SMPCache trace format. The detailed memory accesses, along with instructions, data readings, and data writings, for SPEECH and SIMPLE are shown in Table 1 and Table 2 respectively.

SPEECH
  Accesses       : 11771664
  Instructions   : 1
  Data Readings  : 9211160
  Data Writings  : 2560503

Table 1. SPEECH memory trace file details

SIMPLE
  Accesses       : 27030092
  Instructions   : 11594172
  Data Readings  : 11541252
  Data Writings  : 3894668

Table 2. SIMPLE memory trace file details

Fig.1 Working with Trace Driven Simulator

4. Simulation Setup

The experiments were conducted using the SMPCache 3.0 trace-driven simulator with the traces discussed above. The system is configured with the following architectural characteristics:

Processors in SMP = 8.

Cache coherence protocol = MESI.

Scheme for bus arbitration = LRU.

Word width (bits) = 16.

Words by block = 32 (block size = 64 bytes).

Blocks in main memory = 524288 (main memory size = 32 MB).

Mapping = Set-Associative.

Cache sets = varied with the number of blocks in cache, to keep the caches four-way set-associative.

Number of ways = Number of blocks in cache / Number of cache sets.

Replacement policy = LRU.

Blocks in cache are varied through 16 (cache size = 1 KB), 32, 64, 128, 256, 512, 1024, and 2048 (cache size = 128 KB). For each selected configuration, the global miss rate is obtained for the trace files SPEECH and SIMPLE.
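The cache sizes quoted above follow directly from the block count: size = blocks × 32 words × 2 bytes. A quick check of the arithmetic (illustrative only):

```python
WORD_BYTES = 2        # 16-bit words
WORDS_PER_BLOCK = 32  # so each block is 64 bytes
WAYS = 4

for blocks in [16, 32, 64, 128, 256, 512, 1024, 2048]:
    size_kb = blocks * WORDS_PER_BLOCK * WORD_BYTES // 1024
    num_sets = blocks // WAYS
    print(f"{blocks:5d} blocks -> {size_kb:4d} KB, {num_sets:4d} sets of {WAYS} ways")
# 16 blocks -> 1 KB ... 2048 blocks -> 128 KB
```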

5. Results

The experiments were performed with a unified cache. By varying the number of blocks, the influence of cache size on miss rate was examined. Fig. 2 displays the global miss rate for the two traces, SPEECH and SIMPLE, as cache size varies from 1 KB to 128 KB. Tables 3 and 4 show the detailed results for the SPEECH and SIMPLE memory trace files respectively, including number of hits, hit rate, number of misses, and miss rate for cache sizes from 1 KB to 128 KB. For both traces, the 128 KB cache shows the minimum miss rate.

Cache Size (KB)   # Hits      Hit Rate (%)   # Misses    Miss Rate (%)
1                 1426490     12.118         10345174    87.882
2                 1426490     12.118         10345174    87.882
4                 4148893     35.245         7622771     64.755
8                 5017752     42.626         6753912     57.374
16                5521651     46.906         6250013     53.094
32                7035183     59.764         4736481     40.236
64                7214731     61.289         4556933     38.711
128               7287619     61.908         4484045     38.092

Table 3. Hit and miss rate for SPEECH trace file

Cache Size (KB)   # Hits      Hit Rate (%)   # Misses    Miss Rate (%)
1                 10481957    38.779         16548135    61.221
2                 10481957    38.779         16548135    61.221
4                 11967766    44.276         15062326    55.724
8                 14888574    55.081         12141518    44.919
16                17203493    63.646         9826599     36.354
32                17491493    64.711         9538599     35.289
64                17767655    65.733         9262437     34.267
128               17835789    65.985         9194303     34.015

Table 4. Hit and miss rate for SIMPLE trace file

Fig. 2. Miss rate vs. cache size (1 KB to 128 KB)

6. Conclusion and Future Work

In this work it has been observed that the global miss rate decreases as cache size increases. For the SPEECH trace the global miss rate decreases from 87.882% (cache size 1 KB) to 38.092% (cache size 128 KB), and for the SIMPLE trace it decreases from 61.221% (1 KB) to 34.015% (128 KB). Increasing the number of blocks in each configuration makes room for mapping more blocks from memory, which in turn helps reduce the miss rate.

In future work we will consider split and multilevel caching to find the impact of cache size on miss rate. We also plan to study the cache miss categories (the 3 Cs plus coherence misses) and their influence on the performance of multiprocessor architectures.

Acknowledgements

We are very thankful to Miguel A. Vega-Rodriguez, Associate Professor and member of the ARCO Research Group at the Department of Technologies of Computers and Communications, University of Extremadura (Spain), for providing the SMPCache 3.0 trace-driven simulator for this research.


