Wednesday, June 23, 2010


An analysis of ARM and MIPS multi-core and multi-threaded embedded processor technology
Building multi-core (homogeneous or heterogeneous) and multi-threading technology into embedded devices can indeed bring many benefits, the most obvious being better system performance.
RISC embedded technology faces more and more challenges, but as long as compatibility with existing embedded software is maintained, these techniques may well be a good way to extend its future applicability and effectively raise the performance of new systems.
Multi-core or multi-threaded? The application decides
Multi-core and multi-threading both help performance, but building these technologies in does not by itself guarantee performance. What matters most is what the application environment requires. Take the mobile phone as an example: the SoCs used in phones are partly multi-core, but the cores integrated in a phone's application processor SoC do not all share the same architecture. Homogeneous multi-core designs are actually quite rare in practical embedded systems.
Multi-threaded processors play an important role in automotive electronics and embedded networking, and some manufacturers combine several multi-threaded chips into a computing architecture that is both multi-core and multi-threaded. In other words, the two approaches are not an either/or choice: faced with this question, many manufacturers end up mixing the two, or developing their own design, according to actual application requirements. It also shows that, when choosing an embedded system architecture, the processor itself is only one aspect of the application. Getting the maximum performance the application needs means weighing many different considerations.
The technology debate is more than an emotional dispute
A truly homogeneous multi-core architecture: ARM11 MPCore
In the field of embedded multi-core application processors, ARM is the current technology leader. The company has no fabs of its own and sells its processor architecture purely as IP, but because it positioned itself correctly it has won a strong market position in just a few years, and most of the world's handheld devices embed ARM processor technology.
Looking at its technology history, the early ARM7 architecture could by itself handle some audio codec applications. The ARM9 core, which added 16-bit saturating arithmetic instructions and ran faster, could not only handle audio encoding and decoding but also encode MPEG-4 at QCIF (one quarter of CIF resolution) at 15 frames per second at about 80 MHz. With the SIMD instructions added in the ARM11's v6 instruction set architecture and its higher speed, H.264 encoding at VGA resolution becomes possible. Going further, the latest Cortex-A8, working with its NEON SIMD architecture as an accelerator, can complete MPEG-4 VGA encoding at 30 frames per second in only half the cycles the ARM11 needs. In practice, the job requires about 300 MHz. To make these options more practical for users, ARM is developing a prototype parallelizing compiler that can extract data parallelism and map it onto the SIMD hardware.
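To show why saturating arithmetic matters for audio codecs, here is a minimal sketch in Python. It is not ARM code and does not model any specific ARM instruction; the function names are made up for illustration. It only contrasts an ordinary wrapping 16-bit add with a saturating one.

```python
# Illustrative only: contrasts wrapping and saturating signed 16-bit addition.
# Function names are hypothetical and do not correspond to ARM mnemonics.

INT16_MIN, INT16_MAX = -32768, 32767

def wrapping_add16(a, b):
    """Add two signed 16-bit values with two's-complement wraparound."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s

def saturating_add16(a, b):
    """Add two signed 16-bit values, clamping to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

# Two loud audio samples whose sum overflows 16 bits.
loud_a, loud_b = 30000, 10000
wrapped = wrapping_add16(loud_a, loud_b)    # sign flips: audible as a harsh click
clipped = saturating_add16(loud_a, loud_b)  # clamps at full scale instead
print(wrapped, clipped)
```

With wraparound, an overflowing sum flips sign, while saturation clips at full scale, which is much less objectionable in audio. Doing the clamp in a single instruction saves the compare-and-branch sequence a codec would otherwise need on every sample.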
ARM11 MPCore structure diagram.
The ARM11 MPCore is built from ARM11 cores and belongs to the v6 instruction set architecture. Depending on application needs, the MPCore can be configured with up to four processors, and according to ARM its peak performance reaches about 2600 Dhrystone MIPS. MPCore is a textbook homogeneous multi-core processor, made up of four cores based on the ARM11 architecture. The advantage of a multi-core design is that performance rises significantly at the same clock frequency, so it is expected to do well in multitasking applications, which makes it a good fit for future home consumer electronics. For example, a set-top box could record several TV channels at once while also playing video on demand over the Internet, and a car navigation system could provide navigation while still having spare capacity to stream video entertainment to rear-seat passengers.
In such applications, a multi-core embedded processor can show a strong performance advantage. According to ARM's data, MPCore supports up to four-way cache-coherent symmetric multiprocessing (SMP), four-way asymmetric multiprocessing (AMP), and hybrid systems that mix symmetric and asymmetric operation across the four cores. In theory, this design flexibility lets it meet the computing needs of a wide range of applications, whether the priority is system responsiveness or data throughput.
The ARM11 MPCore was announced in early 2004 and formally became available for licensing in 2005. So far, products using it are concentrated in home appliances and automotive electronics, and there are not many of them. Has industry demand for this level of processor performance simply not appeared yet? In automotive electronics, the demands on microprocessors keep growing. In the past a single core was basically enough for general automotive use, but as more and more electronic aids are integrated into the car, the work to be handled grows more complex and already far exceeds what traditional automotive microcontrollers can bear. It can be expected that over the next few years more and more car manufacturers will use similar multi-core architectures to achieve reasonable system response times.
In home appliances, few products actually need such a complex core. In audio and video products, the most widespread category, most manufacturers use dedicated hardware decoding circuits or a DSP for encoding and decoding, so running codec work directly on a multi-core processor brings no obvious efficiency gain. In mobile applications, power consumption is still what most vendors focus on. Even though the ARM11 MPCore achieves low power for a multi-core design, it still cannot match single-core versions, so it has little visibility in mobile applications. But with Intel pushing the MID (Mobile Internet Device), such products are expected to become a great opportunity for the ARM11 MPCore architecture. Even Silverthorne, the next-generation 45nm successor to Stealey, still consumes more than five times the power of MPCore (counting the chipset's power consumption), and it is only single-core, so it is clearly less flexible in application than MPCore. One point is worth noting, however: Silverthorne carries with it a large base of x86 software, and in this respect ARM and other RISC-based processors are at a clear disadvantage.
In the RISC-based MID category, one can also consider ARM's latest processor architecture, the Cortex-A8. This new processor is based on the ARMv7 architecture and integrates a DSP-style SIMD processing unit that gives excellent acceleration for streaming applications, making it ideal for multimedia and even gaming on handheld devices like the MID. Strictly speaking, Cortex-A8 can also be regarded as a kind of multi-core system, but unlike the homogeneous MPCore it pairs a general-purpose processor core with a DSP core in a heterogeneous arrangement. I believe this draws on a lot of the development experience behind Texas Instruments' ARM application processors.
Cortex-A8 structure diagram.
In fact, Nokia's N770/N800 already has all the features of a MID in a more compact size. Unfortunately, with the original 1500 mAh rechargeable battery, continuous use lasts only about 3.5 hours, which is not far from the UMPC products generally on the market and less than Intel's MID products. The power advantage of a system built on an ARM processor (the N800 is based on the ARM1136J(F)-S core, the i.MX31 applications processor) does not stand out here, although its standby time is slightly longer than the MID's.
MIPS sticks to the multi-threaded route
Perhaps partly as a matter of pride, MIPS and ARM insist on different technology strategies: ARM develops multi-processor (MP) cores, while MIPS develops multi-threading (MT). Conceptually, MP and MT are both aimed at improving the overall performance of the processor, and both can shorten the processing time of any current multi-threaded software application. But the two techniques use different hardware architectures to reduce processing time, so for any specific piece of program code, MP and MT improve processor performance to different degrees.
This result follows from the very different R&D philosophies of the two IP vendors. MT focuses on making effective use of the processing units and memory controller, saving transistors as far as possible and raising performance on that premise. MP simply replicates as many cores on the chip as the system's performance requires, which is a completely different approach. MP offers a broader range of applications but is somewhat extravagant. By comparison, MT strikes a wiser balance between cost and performance.
Many people compare MP and MT, but to some extent such comparisons make little sense: the basic design concepts are poles apart, so their uses naturally cannot be lumped together. Technically, both approaches to hardware multiprocessing make software optimization considerably more complex than on a single-core architecture. To avoid conflicts over the processing units and memory controller, an MT design may be much more complicated, but an MP design faces the same problems to some extent (especially multi-core architectures with a shared cache and memory controller). Whether the parallelism is at the instruction level or the thread level, the best way to write programs differs from the way traditional single-core, single-threaded programs are written.
As for MT in general: when a single processor core computes, memory access speed often cannot keep up with rising processor frequency, so a cache miss leaves the execution pipeline idle for a long time. As we all know, among a system's storage units the fastest are the registers inside the processor, followed by the L1 cache, the L2 cache, and finally main memory, with a speed difference of up to a thousand times or more. To obtain instructions or data, the processor must first fetch them from the cache into registers for the operation. The results go back to the cache and are written back to main memory when there is idle time. When the processor goes to the cache and finds that the required data is not there, it must spend an enormous amount of time locating and reading it from main memory. Dozens of high-frequency clock cycles may be wasted during this time, and the pipeline sits idle while it waits for the data to arrive.
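The cost of those idle cycles can be estimated with a simple back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a measurement: the function name, the miss rate of 0.02 misses per instruction, and the 50-cycle penalty are all made-up example values (the text above only says "dozens" of cycles), and the model ignores any overlap between misses and useful work.

```python
# Rough model: fraction of cycles a single-threaded core spends on useful work.
# All numeric inputs are illustrative assumptions, not figures from ARM or MIPS.

def pipeline_utilization(misses_per_instr, miss_penalty_cycles, base_cpi=1.0):
    """Useful cycles divided by total cycles for a simple in-order core.

    base_cpi            -- cycles per instruction when every access hits
    misses_per_instr    -- average cache misses per executed instruction
    miss_penalty_cycles -- stall cycles paid on each miss
    """
    stall_cpi = misses_per_instr * miss_penalty_cycles
    return base_cpi / (base_cpi + stall_cpi)

# Example: 2 misses per 100 instructions, 50 stall cycles per miss.
util = pipeline_utilization(0.02, 50)
print(f"pipeline busy about {util:.0%} of the time")
```

Under these assumed numbers the pipeline does useful work only about half the time. The other half is exactly the idle capacity that a second hardware thread could try to fill.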
If multi-threading is used to bring in other ready threads to fill these idle slots, the performance gain is very noticeable: it does not double, but 20% to 40% is quite possible. Achieving this increases the transistor count by only about 15%. By contrast, going from a single core to a dual core of the same general architecture gains about 40% to 70% in performance while almost doubling the number of transistors, which shows how efficient MIPS MT technology is. But MT has a serious drawback: in multi-threaded work, overly frequent context switches can cause significant performance loss.
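The efficiency argument can be made concrete by dividing each speedup by the matching growth in transistor count. The sketch below uses only the ranges quoted above; the helper name is hypothetical, and "almost doubled" is approximated as exactly 2.0x.

```python
# Performance per transistor, relative to a single-threaded single core (= 1.0).
# Inputs are the ranges quoted in the text; 2.0x for dual core is an approximation.

def perf_per_transistor(speedup, transistor_ratio):
    """Relative performance obtained per unit of transistor budget."""
    return speedup / transistor_ratio

mt_low = perf_per_transistor(1.20, 1.15)   # MT, pessimistic end
mt_high = perf_per_transistor(1.40, 1.15)  # MT, optimistic end
mp_low = perf_per_transistor(1.40, 2.0)    # dual core, pessimistic end
mp_high = perf_per_transistor(1.70, 2.0)   # dual core, optimistic end

print(f"MT: {mt_low:.2f} to {mt_high:.2f}")
print(f"MP: {mp_low:.2f} to {mp_high:.2f}")
```

On these figures MT comes out above 1.0 across its whole range, while the dual core stays below 1.0 even in the best case. The dual core still delivers more absolute performance, so the comparison is about cost efficiency rather than peak speed.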
MIPS 74K processor schematic.
MIPS Technologies, Inc. has a broad product line, including the single-threaded 24K and 74K series and the multi-threaded 34K series. The 74K was announced only in June of this year. On a 65nm process its operating frequency exceeds 1 GHz, and it combines a general-purpose processor with DSP capability, but its overall performance and power consumption are slightly worse than those of the similarly structured ARM Cortex-A8. The star of the multi-threaded line is the 34K series. Its core can be configured with one or two virtual processing elements (VPEs) and up to 5 thread contexts (TCs), which gives plenty of flexibility. Put plainly, the two-VPE approach makes a single core emulate a dual core, so that a 34K core can run two separate operating systems or one two-way symmetric multiprocessing operating system.
The MIPS32 34Kc core uses a 90nm process and runs at 500 MHz under worst-case operating conditions. The core area is 2.1 mm², and the core consumes 0.56 mW/MHz at 1.0 V. The series comprises the 34Kc, 34Kf, 34Kc Pro, and 34Kf Pro. The 34Kf cores include a hardware floating-point unit that is fully compliant with the IEEE 754 specification. The 34Kc Pro and 34Kf Pro cores add the CorExtend feature, which lets SoC designers extend the instruction set themselves.
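Those figures translate into an absolute power number with one multiplication. The sketch below is simple arithmetic on the values quoted above; the function name is made up for illustration, and it assumes the 0.56 mW/MHz figure applies at the 500 MHz worst-case frequency, which the text does not state explicitly.

```python
# Core-only power estimate from the quoted 34Kc figures (90nm, 1.0 V).
# Assumes the mW/MHz rating holds at 500 MHz; cache and I/O power are excluded.

def core_power_mw(mw_per_mhz, freq_mhz):
    """Dynamic core power in milliwatts at a given clock frequency."""
    return mw_per_mhz * freq_mhz

power_mw = core_power_mw(0.56, 500)      # roughly 280 mW
density_mw_per_mm2 = power_mw / 2.1      # roughly 133 mW per square millimetre

print(f"{power_mw:.0f} mW, {density_mw_per_mm2:.0f} mW/mm^2")
```

At roughly 280 mW for the core alone, the 34Kc sits comfortably in embedded territory, though a complete SoC would add memory, bus, and peripheral power on top.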
MIPS 34K processor schematic.
According to MIPS's own estimates, compared with the 24K from the same family, a 34K configured with 2 VPEs and 2 TCs can outperform the 24K by more than 60%, while chip area grows by roughly 14%. The cache miss ratio rises from 4.41% to 5.16% because of multi-threading, which is considered within the acceptable range.
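To put these estimates in perspective, the following sketch works out the performance gained per unit of chip area and the relative growth in cache misses. It uses only the numbers quoted above, treats "more than 60%" as exactly 1.60x (so the result is a lower bound), and the helper name is hypothetical.

```python
# 34K (2 VPEs, 2 TCs) relative to 24K, using the MIPS estimates quoted above.

def relative_change(old, new):
    """Fractional change from old to new (0.10 means +10%)."""
    return (new - old) / old

speedup = 1.60        # "more than 60%" faster, taken at its lower bound
area_ratio = 1.14     # roughly 14% more chip area

perf_per_area = speedup / area_ratio             # at least about 1.40
miss_increase = relative_change(4.41, 5.16)      # about +17% more misses

print(f"performance per area: {perf_per_area:.2f}x")
print(f"cache miss ratio grows by {miss_increase:.0%}")
```

So the threads do compete for cache, raising the miss ratio by about 17% in relative terms, but the net effect is still around 1.4 times the performance per unit of area.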
However, compared with the single-core 74K, the 34K is likewise not well suited to compute-intensive environments such as networking or multimedia streaming, and adding VPEs and TCs also increases chip area. Although MT's technical limitations make it unsuitable for multimedia encoding and decoding, in automotive electronics manufacturers have already successfully combined two 34K processors into a dual-core, multi-threaded design that delivers very good performance. With this precedent, we can predict that more solutions combining MIPS multi-core and multi-threading will appear. How much of the cost advantage remains when the cores are deployed this way, though, is something solution providers will have to worry about.
Want more info? Get in touch with me at dan at 2211 dot in
