Processors For Handheld Systems Mobile Devices

Published Date: 02 Nov 2017

SiSoftware Sandra is a benchmarking suite that supports both 32-bit and 64-bit systems and is used to analyze and diagnose a computer. It also provides detailed information about the installed hardware and software, which is essential when updating drivers or upgrading hardware. The suite includes tests for the following elements: CPU, chipset, memory, graphics card, printers, sound card, network, AGP, ports, Windows components, ODBC connections, FireWire and USB. Sandra can analyze not only the hardware but also the software installed and running on the computer, and it lets users select individual test modules for each hardware element. Other useful features include generating a report about the computer or a benchmark result. The program is small (less than 50MB) and user friendly, although it currently runs only on the Windows platform. Most importantly, the Lite version is free to use and includes numerous useful features.

Processors for handheld systems/mobile devices

This section is a study of the SiSoftware benchmarks published at http://www.sisoftware.net/?d=qa&f=xolo

With the introduction of Intel's Atom x86 SoC for smartphones, the need arises to compare it with the ARM SoCs that currently dominate the market in order to determine the Atom's performance. The Lava Xolo X900 (which uses the Atom), HTC One S, Sony Xperia S and Motorola Xoom were compared for this purpose. Since the Atom is a single-core CPU with 2 threads, comparing it with ARM's latest quad-core CPUs would not be fair to the Atom. Hence, only dual-core phones and a dual-core tablet were chosen for the test. The test, however, does not compare the devices overall but only the following hardware aspects: CPU, GPU and memory performance.

To test the CPU performance, the "Native Processing Performance" test was used in order to measure the actual CPU performance instead of the JVM performance. The CPUs' arithmetic performance (integer and double), multi-media SIMD performance (integer, float and double), cryptographic performance (AES and SHA2) and multi-core efficiency were tested.

Next, since Android applications run on a Java virtual machine (Dalvik), it is important to test the JVM performance and how well the JVM is optimized to take advantage of the latest instruction sets. The Java CPU arithmetic tests (integer and double) and Java CPU multi-media tests (integer, float and double) were used to compare the Atom with the other SoCs.

Another important factor is memory performance, since it directly affects CPU efficiency. Hence, cache bandwidth and latency as well as memory bandwidth and latency were tested using all the access patterns (sequential, in-page random and full random).

Last but not least, the graphics performance of the SoCs also needs to be compared. However, none of the GPU cores being compared has general-purpose GPU (GPGPU) capabilities. Since modern applications and UIs require high graphics performance, it is essential to consider it when comparing the chips. Video OpenGL shading tests (float FP32 and float FP64) were used to determine the graphics performance of the chips.

The test results show that Intel's Atom x86 SoC, a new architecture for Android devices, performs relatively well in most of the native code benchmarks even when compared with modern dual-core ARM SoCs. The Atom performs a lot better than the dual-core ARM CPUs not only in the SSE SIMD tests but also in the cache, memory and inter-thread bandwidth and latency tests. However, the Atom still cannot compete with the latest Android phones that use quad-core ARM CPUs. Intel will need a dual-core CPU with Hyper-Threading technology in order to go up against those quad-core CPUs, but whether the TDP of a dual-core part will remain the same is still hard to tell. Since the latest ARM processor, the Cortex-A15, and NEON have not been released yet, Intel now has a great opportunity to gain market share by deploying its already proven technologies.

Memory hierarchy

This section is a study of the SiSoftware benchmarks published at http://www.sisoftware.net/?d=qa&f=ben_mem_latency

Latency is defined as the time, or the number of clocks, taken to transfer a data block from the cache or main memory (RAM). Hence, the lower the latency, the better the performance. For the CPU to execute an instruction, both the data and the instruction must first be brought into the registers; only then can the CPU proceed. Otherwise, the CPU has to wait for them to arrive. This CPU wait time is the latency, and it is usually expressed in clocks for caches and in nanoseconds for main memory.

Since the latency of the RAM can strongly affect CPU efficiency, it is important to measure it. Even if execution becomes faster, overall performance will not increase if the CPU has to keep waiting. To reduce the effect of main memory latency, modern CPUs have several levels of internal caches, which are connected using the back side bus. The latency of a cache is much lower than that of main memory. Most CPUs now have a hierarchy of caches known as L1D, L2 and L3, with L1D being the fastest and smallest and L3 the largest and slowest. Not only the running speed but also the various memory timings, such as access, command and transfer timings, can affect the overall memory latency. The ratio of memory latency to CPU L1D latency also gives very useful information about the memory speed compared to the CPU caches.
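To make the units concrete, the following minimal sketch converts a cache latency given in clocks into nanoseconds and computes the memory-to-L1D latency ratio described above. The clock speed and latency figures are hypothetical values chosen for illustration, not measurements from the benchmark.

```python
def clocks_to_ns(clocks: float, clock_ghz: float) -> float:
    """Convert a latency in CPU clocks to nanoseconds."""
    # One clock at f GHz lasts 1/f nanoseconds.
    return clocks / clock_ghz


def ns_to_clocks(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to CPU clocks."""
    return ns * clock_ghz


# Hypothetical example values (assumptions, not benchmark results).
CLOCK_GHZ = 2.2
L1D_LATENCY_CLOCKS = 3
MEMORY_LATENCY_NS = 80.0

l1d_ns = clocks_to_ns(L1D_LATENCY_CLOCKS, CLOCK_GHZ)
memory_clocks = ns_to_clocks(MEMORY_LATENCY_NS, CLOCK_GHZ)
ratio = memory_clocks / L1D_LATENCY_CLOCKS

print(f"L1D: {l1d_ns:.2f} ns, memory: {memory_clocks:.0f} clocks, "
      f"memory/L1D ratio: {ratio:.1f}x")
```

With these assumed numbers, one main memory access costs as much as several dozen L1D hits, which is why the ratio is a handy single figure for comparing memory speed with cache speed.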

However, modern CPUs bring data into the cache ahead of time by guessing which instructions or data are going to be needed next, then fetching and storing them in the caches. By doing so, the CPU can get the data from the caches instead of main memory, and the CPU waiting time is greatly reduced. This work is done by the prefetchers, which guess the next instructions and data and bring them into the caches.

Another factor that can affect the result is virtual memory performance (paging). Paging makes use of a page table, which is usually implemented together with a Translation Lookaside Buffer (TLB) to improve performance. The TLB is a cache that contains mappings from virtual to physical addresses. In other words, it is a cache that holds a subset of the page table. When the CPU wants to access data, it checks the TLB first. If the required mapping is not found in the TLB, a TLB miss has occurred. In that case, the TLB has to be updated with the required mapping, which causes more delay.
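The lookup described above can be sketched as a toy model. This is only an illustration: the class name `TinyTLB`, the 4 KB page size, the LRU replacement policy and the page table contents are all assumptions made for the example, and real TLBs are hardware structures with their own organization.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes; a common small-page size, assumed here


class TinyTLB:
    """A toy fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries: int):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int, page_table: dict) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1  # TLB miss: consult the page table (slow path)
            self.map[vpn] = page_table[vpn]
            if len(self.map) > self.entries:
                self.map.popitem(last=False)  # evict the least recently used
        return self.map[vpn] * PAGE_SIZE + offset


# Hypothetical page table: virtual page n maps to physical frame n + 100.
page_table = {vpn: vpn + 100 for vpn in range(8)}

tlb = TinyTLB(entries=2)
for vaddr in (0, 8, 4096, 16, 8192, 0):
    tlb.translate(vaddr, page_table)

print(f"hits={tlb.hits} misses={tlb.misses}")
```

Every access to a page whose mapping is not cached takes the slow path, so a program that touches more pages than the TLB has entries pays the miss penalty repeatedly.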

Hence, the following 3 tests are used for testing the memory: Sequential Access Pattern, In-Page Random Access Pattern and Full Random Access Pattern. The tests showed that the latency is not fixed but varies greatly with how the memory is accessed (the access pattern) and with the page size. How applications access memory and how they allocate and manage it will affect the resulting latencies. Moreover, the results also show that the prefetchers can greatly improve CPU efficiency, since the CPU does not have to wait for instructions and data to be fetched from main memory. Lastly, the extra latency caused by a TLB miss is also significant, which is why CPUs these days have multiple TLBs. As the memory used by applications grows, TLB misses also increase. Since the Microsoft Windows OS does not allow large pages, larger TLBs could compensate.
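The difference between the three access patterns can be illustrated with a small address generator. This is a sketch under assumed parameters (4 KB pages, 64-byte lines) and is not how Sandra implements its tests; the count of page switches is used only as a rough proxy for how hard a pattern is on the TLB.

```python
import random

PAGE_SIZE = 4096  # bytes per page (assumed)
LINE_SIZE = 64    # bytes per cache line (assumed)


def sequential(n_pages: int) -> list:
    """Visit every cache line in address order."""
    return list(range(0, n_pages * PAGE_SIZE, LINE_SIZE))


def in_page_random(n_pages: int, rng: random.Random) -> list:
    """Visit pages in order, but shuffle the lines inside each page."""
    addrs = []
    for page in range(n_pages):
        lines = list(range(0, PAGE_SIZE, LINE_SIZE))
        rng.shuffle(lines)
        addrs.extend(page * PAGE_SIZE + off for off in lines)
    return addrs


def full_random(n_pages: int, rng: random.Random) -> list:
    """Visit every cache line exactly once, in a fully shuffled order."""
    addrs = sequential(n_pages)
    rng.shuffle(addrs)
    return addrs


def page_switches(addrs: list) -> int:
    """Count consecutive accesses that land on a different page."""
    pages = [a // PAGE_SIZE for a in addrs]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)


rng = random.Random(0)
patterns = [
    ("sequential", sequential(16)),
    ("in-page random", in_page_random(16, rng)),
    ("full random", full_random(16, rng)),
]
for name, addrs in patterns:
    print(f"{name}: {page_switches(addrs)} page switches")
```

Sequential and in-page random access change pages only when a page is exhausted, while full random access changes pages on most accesses. In-page random access defeats simple sequential prefetching without adding TLB pressure, which is why it is useful as a separate test.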

SPEC benchmarks

SPEC CPU2006

It is an industry-standardized software suite from SPEC that is designed to benchmark the CPU, mainly by stressing the processor, the memory subsystem and the compiler. SPEC's main reason for producing it is to let users perform compute-intensive performance measurements across different kinds of hardware using workloads that are derived from real user applications.

SPECint/SPECfp

These are parts of the SPEC benchmarking suite. SPECint is used to benchmark the CPU's integer performance, while SPECfp is used to benchmark the CPU's floating point performance. SPECint contains 12 integer test components, which are written in C or C++. SPECfp, on the other hand, contains 17 floating point test components, which are written in C, C++ or Fortran. Both SPECint and SPECfp are used to benchmark single-core CPUs.

The benchmark components cover the following application areas.

SPECint
Programming Language
Compression
C Compiler
Combinatorial Optimization
Artificial Intelligence: Go
Search Gene Sequence
Artificial Intelligence: Chess
Physics/Quantum Computing
Video Compression
Discrete Event Simulation
Path-finding Algorithms
XML Processing

SPECfp
Fluid Dynamics
Quantum Chemistry
Physics/Quantum Chromodynamics
Physics/CFD
Biochemistry/Molecular Dynamics
Physics/General Relativity
Biology/Molecular Dynamics
Finite Element Analysis
Linear Programming, Optimization
Image Ray-tracing
Structural Mechanics
Computational Electromagnetics
Quantum Chemistry
Weather
Speech Recognition

SPECintRate/SPECfpRate

SPECintRate and SPECfpRate are similar to SPECint and SPECfp, but they are used to benchmark multicore CPUs. It is important not to compare an int score with an intRate score. Even if a single-core CPU has a higher SPECint score than a multicore CPU, it does not mean that it will perform better than the multicore CPU, because SPECint does not make use of multiple cores in its benchmarking. SPECintRate and SPECfpRate measure the overall performance of the CPU by using as many cores and threads as possible.

SPECJVM98/SPECJBB

SPECJVM98 is a benchmark that is designed to measure the performance of Java Virtual Machine (JVM) client platforms. It includes 8 different tests: 7 of them are used to benchmark computing performance and the remaining one is used to validate some features of Java. However, with the introduction of the new version, SPECJVM2008, SPECJVM98 is no longer supported.

SPECJBB (latest version: SPECJBB2005) is used for benchmarking server-side Java performance. It reproduces the function of a 3-tier system with emphasis on the middle tier, which is responsible for the business logic and object manipulation. In addition, the CPUs, the caches and the scalability of shared memory processors (SMPs) are also exercised by the benchmark. With the introduction of XML processing and BigDecimal computation, the SPECJBB score is a better reflection of real applications.

SPECPower

SPECpower_ssj2008 is one of SPEC's industry-standard benchmarking suites. It benchmarks the performance and power characteristics of server computers as well as multi-node computers. With growing concern about the high energy and power consumption of servers, it is important to benchmark not only the computing performance but also the power/energy performance.

Dhrystone/Whetstone

The Dhrystone benchmark contains a collection of integer operations that are widely used by applications. Whetstone, on the other hand, contains a collection of floating point operations that are designed to benchmark the Floating-Point Unit (FPU) or the co-processor. These operations are especially important in scientific, statistical and engineering applications. Both Dhrystone and Whetstone are synthetic benchmarks: simple programs that put a workload on individual components. Hence, Dhrystone and Whetstone scores do not represent real-life application performance. However, they are useful for comparing the speed of different CPUs.
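The idea of a synthetic benchmark, a small fixed workload timed in isolation, can be sketched as follows. The two kernels below are invented for illustration and are not the Dhrystone or Whetstone code; in Python the result mostly reflects interpreter overhead, so the numbers are only meaningful relative to each other on the same setup.

```python
import time


def integer_kernel(iterations: int) -> int:
    """A fixed mix of integer operations (multiply, add, mask, shift, xor)."""
    acc = 1
    for i in range(iterations):
        acc = (acc * 31 + i) & 0xFFFFFFFF
        acc ^= acc >> 7
    return acc


def float_kernel(iterations: int) -> float:
    """A fixed mix of floating point operations (divide, add, multiply)."""
    x = 1.0
    for i in range(1, iterations + 1):
        x = (x + i / 3.0) * 0.5
    return x


def ops_per_second(kernel, iterations: int) -> float:
    """Time one run of the kernel and return loop iterations per second."""
    start = time.perf_counter()
    kernel(iterations)
    elapsed = time.perf_counter() - start
    return iterations / max(elapsed, 1e-9)  # guard against a zero reading


for name, kernel in (("integer", integer_kernel), ("float", float_kernel)):
    rate = ops_per_second(kernel, 200_000)
    print(f"{name} kernel: {rate / 1e6:.2f} M iterations/s")
```

Because the workload is tiny and fixed, such a score isolates one component, which is exactly why it cannot stand in for real application performance.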

OpenGL

OpenGL stands for Open Graphics Library and is a widely used graphics application programming interface (API). It is usually used to interact with a Graphics Processing Unit (GPU) to render 2D and 3D graphics. It is a language-independent, cross-platform API that is available on many platforms. It focuses only on rendering graphics and does not provide any functions for input or audio. OpenGL contains a collection of functions for rendering, special effects, texture mapping and other useful tasks. Since it is a cross-platform API, applications written for it can be deployed widely.

OpenCL

OpenCL stands for Open Computing Language. It is a low-level API for parallel programming on computer systems that include different types of computational units. Such computer systems are also known as heterogeneous systems.

Efficiency Metrics

Performance vs. Cost (Cost Efficiency)

Cost efficiency can be defined as the ratio of performance to price. Even though the performance of a computer system is important, price is still an important factor in deciding which computer to buy. Depending on the type of performance, the result can be either "the higher, the better" or "the lower, the better". For computing power and transfer rates, it is "the higher, the better". For timing measurements such as latency, however, it is "the lower, the better". Hence, it is important to know the type of parameter that is being measured in order to know whether higher or lower is better.

Capacity vs. Power (Size Efficiency)

It is the ratio of capacity to power. It indicates how much capacity a device can provide for 1 unit of power. For storage devices, size efficiency is also an important factor. Dissipating more power means producing more heat. Hence, for data centers and server farms, where power and cooling are expensive, it is very important to consider size efficiency. Since the measured quantity is always capacity, the higher the result, the better the component.

Performance vs. Power (Power Efficiency)

Power efficiency is also known as energy efficiency. It is the ratio of performance to power. It is very useful in estimating long-term cost efficiency, since better power efficiency means less power dissipated and hence lower energy and cooling costs. Therefore, power efficiency is also a very important factor for data centers and server farms. As with cost efficiency, the result of a power efficiency benchmark can be either "higher is better" or "lower is better".
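The three efficiency metrics above share one shape: a measured quantity divided by a resource. The following minimal sketch shows that shape and how the "higher is better" or "lower is better" direction decides the winner. The two systems and all of their numbers are made up for illustration, and the helper names are hypothetical.

```python
def efficiency(metric: float, resource: float) -> float:
    """Generic ratio: measured quantity per unit of resource (price, watts)."""
    if resource <= 0:
        raise ValueError("resource must be positive")
    return metric / resource


def better(a: float, b: float, higher_is_better: bool) -> str:
    """Return which of two results wins for the given metric direction."""
    if a == b:
        return "tie"
    return "a" if (a > b) == higher_is_better else "b"


# Hypothetical systems (all values are assumptions for the example).
system_a = {"gops": 30.0, "latency_ns": 90.0, "price": 400.0,
            "watts": 35.0, "capacity_gb": 250.0}
system_b = {"gops": 16.0, "latency_ns": 80.0, "price": 300.0,
            "watts": 35.0, "capacity_gb": 500.0}

# Cost efficiency on a throughput metric: higher is better.
cost_a = efficiency(system_a["gops"], system_a["price"])
cost_b = efficiency(system_b["gops"], system_b["price"])
print("cost efficiency winner:", better(cost_a, cost_b, higher_is_better=True))

# Power efficiency on a timing metric: lower is better.
lat_a = efficiency(system_a["latency_ns"], system_a["watts"])
lat_b = efficiency(system_b["latency_ns"], system_b["watts"])
print("latency per watt winner:", better(lat_a, lat_b, higher_is_better=False))

# Size efficiency is always capacity per watt: higher is better.
size_a = efficiency(system_a["capacity_gb"], system_a["watts"])
size_b = efficiency(system_b["capacity_gb"], system_b["watts"])
print("size efficiency winner:", better(size_a, size_b, higher_is_better=True))
```

Keeping the direction as an explicit flag avoids the common mistake of reading a lower-is-better result, such as a latency-based figure, as if a larger number were a win.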

Memory Bandwidth and latency

Memory bandwidth is defined as the data transfer rate between the memory and the processor. In the simplest case it is the product of the memory bus width and the transfer rate, although bandwidth calculations for modern memory are no longer that simple. Latency, on the other hand, is the time taken to receive a response after a request is sent, usually expressed as a number of clock cycles. Memory latencies are usually displayed as X1-X2-X3-X4, where X1, X2, X3 and X4 represent the column address strobe (CAS) latency, the row address strobe (RAS) to CAS delay, the row precharge time and the active-to-precharge delay respectively. The lower the numbers, the better the performance.
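As a worked example of both definitions, the sketch below applies the simple "bus width times transfer rate" formula and converts a timing string from clocks to nanoseconds. It uses the nominal parameters of a 64-bit DDR2-667 module with 5-5-5-15 timings, the same kind of module that appears in the report later in this document; the function names are hypothetical, and real sustained bandwidth is lower than this peak.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in MB/s (1 MB = 10**6 bytes): width times rate."""
    return bus_width_bits / 8 * transfers_per_s / 1e6


def timing_ns(cycles: int, io_clock_mhz: float) -> float:
    """Convert a timing given in memory clock cycles to nanoseconds."""
    return cycles * 1000.0 / io_clock_mhz


# DDR2-667: 333.33 MHz I/O clock, two transfers per clock (double data rate).
IO_CLOCK_MHZ = 1000.0 / 3.0
transfers_per_s = 2 * IO_CLOCK_MHZ * 1e6

bandwidth = peak_bandwidth_mb_s(64, transfers_per_s)
print(f"peak bandwidth: {bandwidth:.0f} MB/s")  # the origin of the PC2-5300 label

# The 5-5-5-15 timing string, in the X1-X2-X3-X4 order described above.
timings = {"CAS latency": 5, "RAS to CAS delay": 5,
           "row precharge": 5, "active to precharge": 15}
for name, cycles in timings.items():
    print(f"{name}: {cycles} clocks = {timing_ns(cycles, IO_CLOCK_MHZ):.1f} ns")
```

The conversion also shows why timing numbers are only comparable at the same clock: 5 clocks at a faster memory clock is a shorter absolute delay than 5 clocks at a slower one.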

Cache/Memory Bandwidth

Cache bandwidth is the rate of data transfer between the CPU and the cache. Memory bandwidth, on the other hand, is the rate of data transfer between the CPU and the system memory (RAM). The cache is connected through the back side bus and the memory through the front side bus. Even though the front side bus speed does not represent the actual bandwidth of the memory, it represents the maximum data transfer rate between the CPU and the memory. Hence, the front side bus speed limits the memory bandwidth of the system.

Computer Overview

The following is part of the report generated by the SiSoftware Sandra about the computer overview.

Computer Overview

ID
Host Name | Zachary-PC
Workgroup | WORKGROUP

Computer
Model | Acer Aspire 4920
Serial Number | LXAKW0X6378***********
Chassis | Acer Notebook
Mainboard | Acer Tahoe
Serial Number | LXAKW0X6378***********
BIOS | Phoenix (OEM) V1.20 02/01/2008
Total Memory | 3GB DDR2 SO-DIMM

Processors
Processor | Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz (2C 2.2GHz, 4MB L2)
Socket/Slot | FC µPGA (Socket P)

Since it is a dual-core processor, it is marked as "2C".

Chipset
Memory Controller | Acer Mobile PM965/GM965/GL960 Express Processor to DRAM Controller 4x 199MHz (796MHz), 2x 1.5GB DDR2 SO-DIMM 664MHz 128-bit

Memory Module(s)
Memory Module | Hynix (Hyundai) HYMP125S64CP8-Y5 2GB DDR2 SO-DIMM PC2-5300U DDR2-666 (5-5-5-15 3-20-5-3)
Memory Module | Transcend JM667QSJ-1G 1GB DDR2 SO-DIMM PC2-5300U DDR2-666 (5-5-5-15 3-20-5-3)

Video System
Monitor/Panel | Samsung Generic PnP Monitor (1600x1200, LTN141W3-L01 , 14.0")
Video Adapter | ATI Mobility Radeon HD 2400 XT (SM4.0 600MHz, 256MB DDR2 800MHz 64-bit, PCIe 1.00 x16)

Graphics Processor

Storage Devices
WDC WD2500BEVS-22UST0 (250GB, SATA150, 2.5", 5400rpm, 8MB Cache) | 233GB (C:) (D:) (U:)
TSSTcorp CDDVDW TS-L632H (ATA33, DVD+-RW, CD-RW, 2MB Cache) | N/A (E:)

Logical Storage Devices
DATA (D:) | 88GB (NTFS) @ WDC WD2500BEVS-22UST0 (250GB, SATA150, 2.5", 5400rpm, 8MB Cache)
New Volume (U:) | 10GB (NTFS) @ WDC WD2500BEVS-22UST0 (250GB, SATA150, 2.5", 5400rpm, 8MB Cache)
Hard Disk (C:) | 132GB (NTFS) @ WDC WD2500BEVS-22UST0 (250GB, SATA150, 2.5", 5400rpm, 8MB Cac

Operating System
Windows System | Microsoft Windows 7 Ultimate 6.01.7601 (Service Pack 1)
Platform Compliance | x86

From the report, we can see a lot of useful information such as the BIOS type, the chipset, the memory modules installed and so on. This information is very useful when updating the BIOS, upgrading the memory or updating drivers. The report shows that both memory modules installed in the system have the same latency of 5-5-5-15.

Processor

Model | Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
Speed | 2.2GHz (99%)
Minimum/Maximum/Turbo Speed | 600MHz - 2.2GHz - 2.2GHz
Peak Processing Performance (PPP) | 17.6GFLOPS
Adjusted Peak Performance (APP) | 5.28WG
Cores per Processor | 2 Unit(s)
Threads per Core | 1 Unit(s)
Front Side Bus Speed | 199MHz
Type | Laptop/Netbook
Revision/Stepping | F / B
Microcode | MU060F0BB6
Latest Version | MU060F0BC1
L1D (1st Level) Data Cache | 2x 32kB, Write/Back, 8-Way, 64bytes Line Size
L1I (1st Level) Code Cache | 2x 32kB, Write/Back, 8-Way, 64bytes Line Size
L2 (2nd Level) Data/Unified Cache | 4MB, ECC, Write/Back, 16-Way, 64bytes Line Size, 2 Thread(s)

ECC (Error-Correcting Code) memory can detect and correct the most common kinds of internal data corruption. Memory protected by ECC is free of single-bit errors. It is very useful in cases where no data corruption can be tolerated under any condition, especially in scientific and financial applications.

Unified cache means that both data and instructions are stored in one cache. The L2 cache holds both data and instructions, whereas L1D caches only data and L1I caches only code/instructions.
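The cache descriptions in the report (size, number of ways, line size) are enough to work out how each cache is organized. The sketch below applies the standard set-associative relation, sets = size / (ways × line size), and shows how an address splits into tag, set index and line offset. The function names are hypothetical, and details such as whether the index comes from the virtual or physical address are ignored.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    lines = size_bytes // line_bytes
    return lines // ways


def split_address(addr: int, sets: int, line_bytes: int) -> tuple:
    """Split an address into (tag, set index, offset within the line)."""
    offset = addr % line_bytes
    line_number = addr // line_bytes
    return line_number // sets, line_number % sets, offset


KB, MB = 1024, 1024 * 1024

# Parameters taken from the report: 64-byte lines throughout.
l1d_sets = cache_sets(32 * KB, ways=8, line_bytes=64)
l2_sets_t7500 = cache_sets(4 * MB, ways=16, line_bytes=64)
l2_sets_t9300 = cache_sets(6 * MB, ways=24, line_bytes=64)

print(f"L1D: {l1d_sets} sets")
print(f"L2 (4MB, 16-way): {l2_sets_t7500} sets")
print(f"L2 (6MB, 24-way): {l2_sets_t9300} sets")
print("0x12345 in L1D ->", split_address(0x12345, l1d_sets, 64))
```

Both L2 configurations come out with the same number of sets, so under this model the larger 6MB cache gains its extra capacity purely from additional ways.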

Comparing the performance of 2 computer systems for stock valuation

It is essential for a stock broker to evaluate a stock before making any decision. There are many companies with varying assets, income and pay-outs, and these data change in real time. Hence, stock valuation is a very CPU-intensive process. The system needs to calculate the various factors that can affect the price and keep them updated, and a powerful CPU is essential for doing so. In this report, 2 different CPUs are compared and the results are discussed.

CPUs specifications

The following 2 processors are used in this report.

Specification | T9300 | T7500
Model | Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz | Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz
Speed | 2.7GHz (103%) | 2.2GHz (99%)
Minimum/Maximum/Turbo | 600MHz - 2.5GHz - 2.6GHz | 600MHz - 2.2GHz - 2.2GHz
Peak Processing Performance (PPP) | 21.55GFLOPS | 17.6GFLOPS
Adjusted Peak Performance (APP) | 6.47WG | 5.28WG
Cores per Processor | 2 Unit(s) | 2 Unit(s)
Threads per Core | 1 Unit(s) (both)
Front Side Bus Speed | 200MHz | 199MHz
Type | Laptop/Netbook (both)
L1D (1st level) Data Cache | 2x 32kB, Write/Back, 8-Way, 64bytes Line Size (both)
L1I (1st level) Code Cache | 2x 32kB, Write/Back, 8-Way, 64bytes Line Size (both)
L2 (2nd level) Data/Unified Cache | 6MB, ECC, Write/Back, 24-Way, 64bytes Line Size, 2 Thread(s) | 4MB, ECC, Write/Back, 16-Way, 64bytes Line Size, 2 Thread(s)

We can see from the specifications that the T9300 is better than the T7500 in terms of CPU clock speed (by 13.6%) and L2 cache size. However, the specifications alone do not tell us how much better the T9300 will perform in the stock valuation process. Therefore, we need to run benchmarks to test the performance of the 2 processors.

Processor native arithmetic performance

Benchmark Results | T9300 | T7500
Aggregate Native Performance | 20.69GOPS | 16GOPS
Dhrystone Integer Native ALU | 23.55GIPS | 19GIPS
Whetstone Double Native SSE3 | 18.17GFLOPS | 13.34GFLOPS

Stock valuation requires a lot of integer as well as floating point calculations, so the CPU must do well in both. Therefore, we used the "Dhrystone Integer Native ALU" test as the integer performance benchmark and the "Whetstone Double Native" test as the floating point performance benchmark. By using the "native" tests, we get the actual performance of the CPUs, not the JVM performance.

Since calculation performance is being measured, the higher the score, the better the performance. From the results, we can see that the T9300 is 29.3% faster than the T7500 in aggregate native performance. In native integer performance, the T9300 is only 23.9% faster than the T7500. In the native double performance test, however, the T9300 beats the T7500 by 36.2%.
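The percentages quoted above follow directly from the benchmark table. A minimal sketch of the arithmetic, using the table values as inputs (the function and variable names are hypothetical):

```python
def percent_faster(new: float, base: float) -> float:
    """Relative advantage of `new` over `base`, in percent."""
    return (new / base - 1.0) * 100.0


# (T9300, T7500) scores from the native arithmetic results table.
native_results = {
    "Aggregate Native Performance (GOPS)": (20.69, 16.0),
    "Dhrystone Integer Native ALU (GIPS)": (23.55, 19.0),
    "Whetstone Double Native SSE3 (GFLOPS)": (18.17, 13.34),
}

for test, (t9300, t7500) in native_results.items():
    print(f"{test}: T9300 is {percent_faster(t9300, t7500):.1f}% faster")
```

The baseline matters: the same gap expressed relative to the T9300 instead of the T7500 would give smaller percentages, so comparisons should always state which CPU is the reference.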

Processor native arithmetic speed efficiency

Speed efficiency (performance vs. speed) can be indirectly used to get the cost efficiency (long term) and power efficiency. Hence, it is important to see how the 2 CPUs perform in terms of speed efficiency.

Performance vs. Speed | T9300 | T7500
Aggregate Native Performance | 7.68MOPS/MHz | 7.27MOPS/MHz
Dhrystone Integer Native ALU | 8.74MIPS/MHz | 8.68MIPS/MHz
Whetstone Double Native SSE3 | 6.74MFLOPS/MHz | 6.08MFLOPS/MHz

The results show that the T9300's speed efficiency is only 5.6% better than the T7500's in the aggregate native performance test. In the native integer speed efficiency test, the T9300 is only 0.69% better than the T7500. Even though the T9300 performs 23.9% better than the T7500 in the native integer performance test, the difference in speed efficiency between the 2 CPUs is negligible. Lastly, in the native double speed efficiency test, the difference is only 10.9%.
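Speed efficiency is simply performance divided by clock speed. The sketch below recomputes it from the aggregate native scores, assuming the measured speeds listed in the specifications (2.7GHz and 2.2GHz) are the clocks used. Because those speeds and scores are rounded, the results only approximate the reported figures.

```python
def speed_efficiency(perf_giga: float, clock_mhz: float) -> float:
    """Performance per MHz in mega-units (for example MOPS/MHz)."""
    return perf_giga * 1000.0 / clock_mhz


# Assumed inputs: rounded values from the specification and result tables.
MEASURED_MHZ = {"T9300": 2700.0, "T7500": 2200.0}
AGGREGATE_GOPS = {"T9300": 20.69, "T7500": 16.0}

eff = {cpu: speed_efficiency(AGGREGATE_GOPS[cpu], MEASURED_MHZ[cpu])
       for cpu in MEASURED_MHZ}
for cpu, value in eff.items():
    print(f"{cpu}: {value:.2f} MOPS/MHz")

# The efficiency ratio equals the performance ratio divided by the clock ratio.
perf_ratio = AGGREGATE_GOPS["T9300"] / AGGREGATE_GOPS["T7500"]
clock_ratio = MEASURED_MHZ["T9300"] / MEASURED_MHZ["T7500"]
eff_ratio = eff["T9300"] / eff["T7500"]
print(f"performance x{perf_ratio:.3f}, clock x{clock_ratio:.3f}, "
      f"efficiency x{eff_ratio:.3f}")
```

Dividing out the clock shows that most of the T9300's raw advantage comes from its higher clock speed, which is why the efficiency gap is so much smaller than the performance gap.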

Java arithmetic performance

Next, the Java arithmetic performance of the CPUs is benchmarked and compared. Since many applications nowadays are written in Java, it is important for the CPUs to have good Java arithmetic performance. To benchmark it, the "Dhrystone Integer Java", "Whetstone Double Java" and "Whetstone Float Java" tests were used.

Benchmark Results | T9300 | T7500
Aggregate Java Performance | 12GOPS | 10.35GOPS
Dhrystone Integer Java | 17GIPS | 15.3GIPS
Whetstone Double Java | 8.44GFLOPS | 7GFLOPS
Whetstone Float Java | 8.74GFLOPS | 6GFLOPS

In the Java performance test, the T9300 performs 15.9% better than the T7500 overall. It clearly beats the T7500 in the Java float benchmark, with 45.7% better performance. In the Java integer test, however, the T9300 is only 11.1% faster than the T7500, and in the Java double test it is 20.6% faster.

Java arithmetic speed efficiency

We also need to test the CPUs' Java arithmetic speed efficiency in order to determine their power efficiency and long-term energy efficiency.

Performance vs. Speed | T9300 | T7500
Aggregate Java Performance | 4.45MOPS/MHz | 4.32MOPS/MHz
Dhrystone Integer Java | 6.33MIPS/MHz | 6.39MIPS/MHz
Whetstone Double Java | 3.13MFLOPS/MHz | 2.92MFLOPS/MHz
Whetstone Float Java | 3.24MFLOPS/MHz | 2.51MFLOPS/MHz

In the Java arithmetic speed efficiency test, the T9300 is 3% more efficient than the T7500 in aggregate Java performance. In the Java integer test, however, the T7500 is more efficient than the T9300 by 1%. This shows that even though the T9300's Java integer benchmark result is better than the T7500's, it does not necessarily have better speed efficiency. In the Java double and float speed efficiency tests, the T9300 is better than the T7500 by 7.2% and 29% respectively.

Conclusion

From the above results, we can see that even though the T9300, with its higher clock speed and larger L2 cache, performs significantly better than the T7500 in the native and Java arithmetic tests, the difference in speed efficiency between the 2 CPUs is on average less than 5%. Therefore, we can conclude that the aggregate native and Java performance results are what matter if we are looking for the highest possible performance. On the other hand, if we are looking for the "best system", the speed efficiency results are what matter, since it is generally accepted that the most efficient system is "the best". In this case, we can conclude that the T9300 is the better-suited CPU for stock valuation systems, since we are looking for the best possible performance.
