Besides unveiling the Cortex-A7 processor, Wednesday's press event was also a kind of second debut for the Cortex-A15. The A15 will go into ARM tablets and some high-end smartphones in the second half of 2012, and it is by far the best candidate for an ARM-based MacBook Air, should Apple choose to go this route. Equally important, the A15 will also power the upcoming wave of ARM-based cloud server parts that have yet to be announced.
As part of the press materials for the A7 launch, ARM released the first detailed block diagram of the Cortex-A15 that I have been able to find. The company also had the first working A15 silicon on display, running Android. So let's take a look at the A15 from top to bottom, since it represents the medium-term future not only of the mobile gadgets that we all know and crave, but possibly also of some of the servers that these devices connect to.
Deeply pipelined, out-of-order
The A15 is a 15-stage out-of-order architecture, making it the same length as Intel's venerable Core 2 (Penryn), which has only recently been phased out of all Apple machines. Fifteen stages used to be considered quite a long pipeline, but by today's standards it is modest. Still, a pipeline this deep means that the chip will scale to higher clock frequencies; indeed, it will need to scale to higher frequencies to deliver more performance, and higher frequencies mean more power. So the A15's pipeline is the first place where you can see the scales tipped slightly in favor of performance over power.
The power impact of the A15's pipeline depth pales in comparison, however, to the fact that the chip is out-of-order. With out-of-order processing, the sequential instruction stream that flows into the processor's front end is dynamically rearranged before it is executed; after execution, the instructions are put back in program order and the results are written back to memory. All this reordering increases performance, but it also dramatically increases power consumption. To rearrange the instruction stream so that it executes optimally, and then restore the original order, chip designers have to add a number of extra storage structures to the CPU: rename registers, issue queues, and a kind of bookkeeping structure that tracks the instructions in flight and then puts them back in order.
These storage structures are in constant use even when most other parts of the processor are idle and powered down, so they are a power drain, or "hot spot." You can think of the extra hardware that goes with out-of-order processing as the lone referee in a football game: some players on the field are idle whenever the ball is nowhere near their position, but the referee is constantly on the run because he is always on the ball (so to speak). These out-of-order storage structures must also be fast and have a fair number of read and write ports, which adds to their size, complexity, and power requirements.
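To make the reorder-then-restore idea concrete, here is a minimal sketch of out-of-order completion with in-order retirement. It is a toy model, not a description of the A15's actual hardware: the function name, the instruction tuples, and the latencies are all hypothetical, issue width is assumed unlimited, and each register is assumed to be written only once (as if renaming had already happened).

```python
def simulate(program, latency):
    """Toy model: instructions start as soon as their inputs are ready,
    but results are retired strictly in program order.

    program: list of (name, op, dest_reg, [src_regs]) in program order.
    latency: dict mapping op -> execution latency in cycles (assumed values).
    Returns (finish_order, retire_cycles).
    """
    ready_at = {}   # register -> cycle at which its value becomes available
    complete = []   # completion cycle of each instruction, in program order
    for name, op, dest, srcs in program:
        # An instruction may start once every source register is ready.
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        done = start + latency[op]
        ready_at[dest] = done
        complete.append(done)

    # Order in which instructions actually finish (ties go to the older one).
    by_finish = sorted(range(len(program)), key=lambda i: (complete[i], i))
    finish_order = [program[i][0] for i in by_finish]

    # In-order retirement: an instruction cannot retire before its elders.
    retire_cycles, last = [], 0
    for done in complete:
        last = max(last, done)
        retire_cycles.append(last)
    return finish_order, retire_cycles


LATENCY = {"load": 4, "add": 1, "mul": 3}   # illustrative numbers only
PROGRAM = [
    ("i0", "load", "r1", []),      # slow load
    ("i1", "add",  "r2", ["r1"]),  # must wait for the load
    ("i2", "add",  "r3", []),      # independent, can run ahead
    ("i3", "mul",  "r4", ["r3"]),  # depends only on i2
]
finish_order, retire_cycles = simulate(PROGRAM, LATENCY)
```

In this sketch the independent instructions `i2` and `i3` finish ahead of `i1`, yet none of them retires before the older load has completed, which is the bookkeeping the extra storage structures exist to support.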
The block diagram
The block diagram shows the A15's pipeline in some detail. The pipeline starts with a five-stage fetch phase, in which instructions are fetched from the L1 cache and predecoded. Instructions then move to the decode stage, where they are decoded into micro-ops. (Yes, despite the fact that ARM is the most classic of classic RISC architectures, the A15 still uses the old "CISCy" trick of decoding ISA instructions into a smaller internal instruction format for reordering.) After the decode phase, rename registers are assigned and the instructions are dispatched.
There is also a loop cache in the decode phase, a feature that is playing an increasingly significant role in processor front ends. A loop cache holds the instructions that form a loop kernel in decoded, micro-op form, so that the front end does not have to decode them again and again on each loop iteration. This feature saves power and improves effective decode bandwidth.
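The effect of a loop cache on decoder activity can be sketched as follows. This is an illustrative model only: the address-keyed FIFO structure, the function name, and the capacity of eight entries are assumptions made for the example, not details of the A15's implementation.

```python
def count_decodes(trace, loop_cache_entries):
    """Count how many times the decoder has to run for a fetch trace.

    trace: instruction addresses in fetch order.
    loop_cache_entries: capacity of the hypothetical loop cache (0 = none).
    """
    cache = {}      # address -> decoded micro-ops (dicts keep insertion order)
    decodes = 0
    for addr in trace:
        if addr in cache:
            continue                    # hit: reuse micro-ops, decoder idle
        decodes += 1                    # miss: decode the instruction
        if loop_cache_entries > 0:
            if len(cache) >= loop_cache_entries:
                cache.pop(next(iter(cache)))    # evict the oldest entry
            cache[addr] = f"uops@{addr:#x}"
    return decodes


# A four-instruction loop kernel executed for 100 iterations.
loop_trace = [0x100, 0x104, 0x108, 0x10C] * 100
without_cache = count_decodes(loop_trace, loop_cache_entries=0)
with_cache = count_decodes(loop_trace, loop_cache_entries=8)
```

With the loop kernel resident in the cache, the decoder runs once per instruction rather than once per instruction per iteration, which is where the power and bandwidth savings come from.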
The A15 can dispatch up to three instructions per cycle to one of eight issue queues. Like the A15's pipeline depth, a dispatch width of three instructions per cycle is relatively modest by today's standards. Some architectures do more, but you quickly hit a point of diminishing returns beyond three per cycle, because the instruction-level parallelism in most code is just not that high. A four-wide dispatch would be aggressive for a part designed for mobile, which is why ARM went three-wide.
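The diminishing-returns argument can be illustrated with a small scheduling sketch. The program below is synthetic and is built so that its instruction-level parallelism is exactly three (three independent dependency chains); the unit latencies and the function name are assumptions for the example, and the numbers are not measurements of real code or of the A15.

```python
def cycles_needed(deps, width):
    """Cycles to issue every instruction when at most `width` instructions
    can issue per cycle, oldest ready instruction first, unit latency.

    deps[i] lists the indices of earlier instructions that i depends on.
    """
    n = len(deps)
    issued_in = [None] * n      # cycle in which each instruction issued
    remaining = n
    cycle = 0
    while remaining:
        cycle += 1
        issued = 0
        for i in range(n):
            if issued == width:
                break
            if issued_in[i] is not None:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(issued_in[d] is not None and issued_in[d] < cycle
                   for d in deps[i]):
                issued_in[i] = cycle
                issued += 1
        remaining -= issued
    return cycle


# Instruction i depends on instruction i-3: three interleaved chains.
deps = [[i - 3] if i >= 3 else [] for i in range(12)]
cycles_by_width = {w: cycles_needed(deps, w) for w in (1, 2, 3, 4)}
```

Going from one-wide to three-wide cuts the cycle count sharply, but the fourth slot sits empty because the code has no fourth independent instruction to fill it.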
Instructions issue to one of eight pipelines: five arithmetic-logic pipes, a branch pipe, and two memory pipes. Three of the five arithmetic-logic units (ALUs) are scalar integer units; two of these appear to be single-cycle units used only for simple instructions, and the third is a four-cycle multiply unit for more complex instructions.
As is common these days, the floating-point and vector hardware is combined into two units, and the A15's floating-point/vector pipeline is 10 stages deep. This gives the A15 a modest but still respectable amount of floating-point horsepower, and it probably will not need much more than this in its target applications. In smartphone and tablet SoCs, the A15 will probably be paired with some kind of helper cores, such as the two ARM M4 cores in TI's OMAP 5430 SoC. These smaller cores consist exclusively of ARM vector units (i.e., they implement the NEON ISA extension), so some vector processing can be offloaded to them. Other vector workloads will undoubtedly be sent to the GPU in SoCs where that is possible. And in its cloud server incarnation, the A15 will mostly run integer workloads, so there is not much need for real FPU horsepower.
To conclude our discussion of the A15's back end: it has the typical two memory pipelines for loads and stores, which are really just limited integer units used exclusively for address generation. Then there is a branch unit for executing branches. Note that separating branch execution into its own pipeline with a dedicated issue queue is a classic feature of the PowerPC family; in Intel's products, the branch hardware is clustered with the integer ALUs, and it certainly does not get a separate issue queue.
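As a compact summary of the back end described above, the eight pipelines can be written down as a small table. The pipe names and instruction-class labels are hypothetical, chosen for this sketch; only the counts and groupings follow the description in the text.

```python
from collections import Counter

# (pipe name, instruction class it serves). Names are illustrative labels.
PIPES = [
    ("int_simple_0", "simple_int"),   # single-cycle integer
    ("int_simple_1", "simple_int"),   # single-cycle integer
    ("int_multiply", "multiply"),     # four-cycle multiply
    ("fp_vector_0",  "fp_vector"),    # floating-point/vector
    ("fp_vector_1",  "fp_vector"),    # floating-point/vector
    ("branch",       "branch"),       # dedicated branch pipe
    ("mem_0",        "memory"),       # load/store address generation
    ("mem_1",        "memory"),       # load/store address generation
]


def candidate_pipes(instr_class):
    """Return the pipes that could accept an instruction of this class."""
    return [name for name, cls in PIPES if cls == instr_class]


pipes_per_class = Counter(cls for _, cls in PIPES)
```

For example, `candidate_pipes("branch")` returns only the dedicated branch pipe, reflecting the PowerPC-style separation noted above.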
The A15 in context
The Cortex-A15 is in many ways a relatively straightforward iteration of a basic design that has been around in one form or another ever since commodity processors made the leap to out-of-order processing. It falls well within a general lineage that goes back to the Pentium Pro and the PowerPC 604. This is in no way a slam on the A15, since the same can be said of Intel's current designs. (AMD's Bulldozer is something of an exception, and the results of its deviations from the norm have so far been mixed.)
Among the current generation of processors, the closest peer to the A15 is probably AMD's Bobcat, which is essentially an out-of-order take on Intel's Atom, right down to the fact that it dispatches only two instructions per cycle (compared with three for the A15) to the execution core. Intel's in-order Atom will also be a direct competitor, but I think it is likely that Intel will take Atom out-of-order when the time comes, producing something that looks like Bobcat and the A15. My best guess is that they will do it at 22nm.
All three of these chips (ARM's Cortex-A15, Intel's Atom, and AMD's Bobcat) will target both mobile devices and the cloud server space. As for the latter, all of them lend themselves to lightweight server tasks such as running a web server, and in micro-server configurations like those from SeaMicro and Smooth-Stone, all three will offer better power efficiency on many types of cloud server workloads than larger, more complex parts such as Intel's Xeon.
The MacBook Air question
On the client side, it is certain that a future version of the iPad will be based on the A15, either alone or in a big.LITTLE configuration with the A7 (my money is on the latter). The A15 will also be the first ARM part that is really suitable for laptops, which matters not only for Chromebooks and possible Android laptops, but potentially for Apple as well. There are persistent rumors that Apple will move the MacBook Air to ARM; I imagine that the rumors persist because Apple almost certainly keeps an up-to-date internal ARM port of OS X running on test hardware. In this context, there are a few important things to remember.
Years ago I heard the backstory of Apple's switch to Intel firsthand from a number of people on the IBM side of things, and what I learned was that Steve Jobs agonized over the decision and waited until the morning of the keynote before pulling the trigger. He actually went into that day with two keynote presentations prepared: one for a PowerPC-based product line, and one for the switch. When he pulled out the switch presentation, the IBM team was as completely stunned as the rest of the world, as was the PA Semi team, who had been personally assured by Jobs that their dual-core PowerPC part would find its way into Apple's laptops.
This little anecdote confirms three things about Jobs and Apple that longtime Apple CPU watchers know well:
1) As he often said, Steve Jobs liked options. From the beginning of OS X, Apple kept an internal, top-secret x86 port of the whole system up to date in case Jobs one day decided to make the switch, which in the end he actually did.
2) For whatever reasons of ego and romance, Steve Jobs also liked having his own, non-commodity CPU hardware. This attachment to owning the hardware is the reason that Apple was one of the original ARM partners when ARM was launched, it is why Apple stuck with PowerPC through the lean years, and it is why Jobs agonized over the decision to finally give up the dream and go with Intel.
3) Jobs was notoriously fickle and mercurial when it came to making major hardware decisions, much to the chagrin of Apple's processor partners. The story of the Mac's CPU is a tale of heartbreak and surprise: not so much heartbreak on the part of consumers, though there was some of that, but heartbreak on the part of Motorola, IBM, and PA Semi.
Number 1 above is my reason for saying that Apple clearly has a working internal prototype of an ARM-based MacBook Air running an ARM port of Lion. Numbers 2 and 3 are the reason I would think that, if Steve Jobs were still with us, we would eventually see an A15-based MacBook Air. The fact that he is gone makes this much less likely, though not necessarily impossible.
Intel's Ivy Bridge mobile parts are incredibly competitive from a performance-per-watt standpoint, and no one in their right mind would seem to want to base the MacBook Air on the A15. Staying on Intel through at least 2013 makes the most technical and business sense to any rational observer. But while Jobs was still alive, there was always a chance of a jump to the A15, thanks to the combination of his lifelong obsession with the idea of having his own in-house CPU hardware and his notorious mercurialness. Now that he is gone, however, Apple will be free to do the sensible thing and pull the plug on the ARM-based MacBook Air prototype, so that it can spend those engineering resources on something productive. Whether they actually will is anyone's guess, but I think they should.