# End of Moore's law?


fir    460

I am not sure what the situation is today or how fast CPUs are gaining power. When you read about the improvements in new Intel architectures, you see tiny values of about 5% per new generation. If so, one could suspect that CPU speed is now growing very slowly, say 5% a year or something, but I am still not sure it is really that slow. Could anybody say how much faster a home PC will be five years from now? Only twice as fast? Maybe less? More? (I am not asking about the GPU, just CPU program speed.) Also, how big is the difference in execution speed between the average PC model in the shop and the more expensive models? Are the expensive models 300% faster, or maybe just 30% faster? Is there some reliable benchmark showing the differences?

I would say that when people give such numbers, they seem overoptimistic to me. For example, I tested some of my raytracing code (single-core code) on a 2003 or 2004 Pentium 4 machine and on strong 2012 machines, and it was about 5 times faster. That is noticeable, but it is almost a 10-year gap, and now I think it may grow noticeably slower.
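The question above is really about compound growth. A small Python sketch (the function names are hypothetical, chosen for illustration) shows what a constant yearly gain adds up to, and which yearly gain a given total speedup implies. The roughly 9-year gap used for the raytracer comparison is an assumption read from "2003 or 2004" to "2012".

```python
def compound_speedup(gain: float, years: float) -> float:
    """Total speedup after `years` of a constant fractional yearly gain."""
    return (1.0 + gain) ** years


def yearly_gain(total_speedup: float, years: float) -> float:
    """Constant yearly gain that would produce `total_speedup` over `years`."""
    return total_speedup ** (1.0 / years) - 1.0


# About 5% per year for 5 years: roughly 1.28x, far from 2x.
print(f"5%/year for 5 years: {compound_speedup(0.05, 5):.2f}x")
# Doubling in 5 years would need roughly 15% per year.
print(f"2x in 5 years needs: {yearly_gain(2.0, 5):.1%} per year")
# 5x over an assumed ~9-year gap is roughly 20% per year.
print(f"5x over 9 years: {yearly_gain(5.0, 9):.1%} per year")
```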

Edited by fir

##### Share on other sites
fir    460

Moore's law is about transistor density, not computing power: the number of transistors you can cram into a particular area seems to double every 2 years.

It's slowing down, and for the past 5 years or so, there's been more of a focus on using those extra transistors to create more cores, rather than using them to speed up a single core.

If you're writing single-threaded code now, it will only be a few % faster in 5 years. But if you're writing code right now for a 4-core CPU, but written in a way where it can scale up to a 32-core CPU, then maybe in 5 years you'll be able to run that code 700% faster...
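The arithmetic behind this post can be sketched as below, assuming transistor counts double every 2 years (an empirical trend, not a law of physics) and, for the core count, the best case of perfectly parallel work with no shared bottlenecks. The function names are illustrative only. Note that "700% faster" is the same thing as an 8x speedup.

```python
def transistor_factor(years: float, doubling_period: float = 2.0) -> float:
    """Growth in transistor count if it doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def ideal_core_speedup(old_cores: int, new_cores: int) -> float:
    """Best case only: perfectly parallel code, no shared bottlenecks."""
    return new_cores / old_cores


def percent_faster(speedup: float) -> float:
    """Express a speedup ratio as 'N% faster' (2x == 100% faster)."""
    return (speedup - 1.0) * 100.0


print(transistor_factor(6))          # 6 years = 3 doublings
s = ideal_core_speedup(4, 32)        # 4 -> 32 cores
print(s, percent_faster(s))
```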

I do not think so either.

I remember in 2007 or 2008 there were 4-core processors in the shops (new, maybe, but not too expensive), and today, five years later, I still see 4 cores in the shops. Not much growth, so sadly I doubt that in 2018 we will have 32-core PCs at home. If the cores are 30% faster and there are 8 cores instead of four, that is slow progress.

IMO the cores could be redesigned. It is very bad, for example, that there is no hardware instruction for cross and dot products, vector length, normalisation, etc. This should be a reasonably easy way of improving the throughput of some important algorithms.

##### Share on other sites
fir    460

I remember in 2007 or 2008 there were 4-core processors in the shops (new, maybe, but not too expensive), and today, five years later, I still see 4 cores in the shops. Not much growth, so sadly I doubt that in 2018 we will have 32-core PCs at home. If the cores are 30% faster and there are 8 cores instead of four, that is slow progress.

IMO the cores could be redesigned. It is very bad, for example, that there is no hardware instruction for cross and dot products, vector length, normalisation, etc. This should be a reasonably easy way of improving the throughput of some important algorithms.

You are taking too narrow a view of processors:
- CPU manufacturers have been largely focussed on mobile for the last few years, and there have been huge gains in both performance and power efficiency in the mobile sector.
- GPU manufacturers have been pushing massive increases in the number of parallel cores and, as you suggest, highly specialised instruction sets to accelerate domain-specific tasks.

Probably this is true, but personally I am not much interested in either mobile or GPUs. I am focused on desktop CPUs and am still hungry for pure MIPS and FLOPS, so the slowdown worries me. (If more power per core is hard to obtain, more cores is okay, but sadly, as I said, the core count does not grow even by one additional core a year; it grows much slower.)

[Or at least it looks like that. I am not quite sure whether this view and estimate of the present and future is correct, but as far as I know, that is how it looks.]

##### Share on other sites
swiftcoder    18432

Probably this is true, but personally I am not much interested in either mobile or GPUs. I am focused on desktop CPUs and am still hungry for pure MIPS and FLOPS, so the slowdown worries me.

That is what CUDA / OpenCL / DirectCompute are designed to solve. Toolkits like OpenCL let you write code which can execute across CPU and/or GPU cores as available.

Even integrated GPUs are faster than most general-purpose CPUs for highly parallel tasks, and dedicated GPUs are orders of magnitude more powerful.

I remember in 2007 or 2008 there were 4-core processors in the shops (new, maybe, but not too expensive), and today, five years later, I still see 4 cores in the shops. Not much growth, so sadly I doubt that in 2018 we will have 32-core PCs

I missed this the first time round. While 4-8 cores is still the norm for desktop computers, machines with more cores certainly exist...

If you have the cash to burn, HP will sell you a 24-core workstation off-the-shelf (for a cool \$10,000). And I'm pretty sure there are higher core counts (maybe even for less \$\$) if you look at server hardware.

Edited by swiftcoder

##### Share on other sites
fir    460

Well, interesting. As to OpenCL, I know it only slightly. It seems to be the way to go eventually, but I read somewhere that it is not very well designed or something like that (it was probably a blog by a man called aneru). But maybe it will mature one day. Today it seems to me that multicore coding is harder and more limited than single-core coding, and OpenCL coding is harder and more limited still.

Is the power of OpenCL processing growing faster than the power of CPUs/cores? I do not know...

As to the HP 24-core workstation, interesting. Is this the kind of 24-core machine you could buy as a Christmas gift for some maniacal gamer and coder, who could run games on it or run 24-processor Windows code on it? If so, I think such machines may get cheaper and reach the shops in about 5 years (but right now it does not look like that, and it seems to me that the consumer computer is still 4-core).

##### Share on other sites
fir    460

These new quad-cores have used their smaller transistors to achieve better performance, with much higher clock speeds, more cache, better pathways to RAM, new instruction sets, 16-wide SIMD operations, added an integrated GPU, more complex pipelines, etc...

Well, they do all that, but when you measure the difference in processing ability of one core of a quad-core from 2007 versus 2013, you may observe a 2x speedup, maybe less (I do not know exactly how much), and the growth may still be slowing down. It seems to me that you maybe can't count on single-core efficiency improvements (but as I said, I am not quite sure; I should read some real-life benchmarks, which should show something).

##### Share on other sites
swiftcoder    18432

It seems to me that you maybe can't count on single-core efficiency improvements

Yes, that is entirely correct, and has been so for the better part of a decade. Did you read the free lunch article frob posted?

The future of software performance is in parallelisation. Luckily, we have many excellent sources of massively parallel computing power. GPUs, Intel's accelerator boards, networked cloud computers...

##### Share on other sites
fir    460

It seems to me that you maybe can't count on single-core efficiency improvements

Yes, that is entirely correct, and has been so for the better part of a decade. Did you read the free lunch article frob posted?

The future of software performance is in parallelisation. Luckily, we have many excellent sources of massively parallel computing power. GPUs, Intel's accelerator boards, networked cloud computers...

The other thing worth mentioning is RAM speed; few people I know mention it. RAM speed seems to be limited, and I do not know the technical reason for it. As to CPUs, there is the famous 3-4 GHz barrier (or something like that); I do not know whether the RAM speed limit is in some way related to that, but this RAM chip speed limit may be more crucial than the CPU throughput limit.

Is there some chance that someone will invent a much faster RAM technology, or is this the same molecular phenomenon as the 4 GHz clock barrier? I should find and read something good on that.

##### Share on other sites
fir    460

It seems to me that you maybe can't count on single-core efficiency improvements

Yes, that is entirely correct, and has been so for the better part of a decade. Did you read the free lunch article frob posted?

This is long to read, but it has a statement that also comes to my mind: if so (if the power of a core stops growing), then optimization should now be not less important (as many say) but more important. Also, optimization may be easier than rewriting for multithreading.

##### Share on other sites
swiftcoder    18432

Also, optimization may be easier than rewriting for multithreading.

Optimisation is hard, multithreading is hard. Picking which one is easier is probably not a simple task :)

There are also hard limits to the gains available from optimisation of single-threaded code.

Modern compilers are very good at optimising single-CPU operations (SIMD auto-vectorisation, etc), so most of your gains are in either improvements to algorithms, or improvements in cache locality. The optimal algorithms in most common problem domains tend to be fairly well known, so algorithmic gains are rare. That leaves cache locality improvements as about your only option...

##### Share on other sites
King Mir    2490

It seems to me that you maybe can't count on single-core efficiency improvements

Yes, that is entirely correct, and has been so for the better part of a decade. Did you read the free lunch article frob posted?

The future of software performance is in parallelisation. Luckily, we have many excellent sources of massively parallel computing power. GPUs, Intel's accelerator boards, networked cloud computers...

The other thing worth mentioning is RAM speed; few people I know mention it. RAM speed seems to be limited, and I do not know the technical reason for it. As to CPUs, there is the famous 3-4 GHz barrier (or something like that); I do not know whether the RAM speed limit is in some way related to that, but this RAM chip speed limit may be more crucial than the CPU throughput limit.

Is there some chance that someone will invent a much faster RAM technology, or is this the same molecular phenomenon as the 4 GHz clock barrier? I should find and read something good on that.

RAM has indeed lagged behind CPUs in speed, which is why modern CPUs have so much cache. But caches are small and can only prefetch local memory, so randomly accessing a large amount of memory, as in a database, can be a major bottleneck. Furthermore, this bottleneck cannot be combated by increasing the number of cores, because there is only one memory bus per CPU. This isn't at all a new thing, but it is the reason why locality of reference is such a big deal for programmers today.

Increasing the speed of RAM would be nice, but major breakthroughs are rare, so the trend is likely to continue. I guess the reason that memory has not gotten much faster is A) because it instead grew in size, and B) because CPUs have improved not only because of smaller transistors, but also architecture improvements, which you can't do as much of for memory.
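A minimal sketch of what locality of reference means in code, assuming a matrix stored row-major in one flat buffer (the helper names are hypothetical). Both functions compute the same sum; only the order in which memory is touched differs. In CPython the interpreter overhead largely hides the difference, so treat this as an illustration of the access pattern that matters in compiled languages such as C or C++, not as a benchmark.

```python
from array import array


def make_matrix(rows, cols):
    # Row-major layout: element (r, c) is stored at index r * cols + c,
    # so elements of the same row sit next to each other in memory.
    return array("d", (float(r * cols + c) for r in range(rows) for c in range(cols)))


def sum_row_major(buf, rows, cols):
    # Inner loop walks adjacent addresses, so each cache line fetched
    # from RAM is fully used before moving on.
    total = 0.0
    for r in range(rows):
        base = r * cols
        for c in range(cols):
            total += buf[base + c]
    return total


def sum_col_major(buf, rows, cols):
    # Inner loop jumps `cols` elements at a time; for large matrices each
    # access can land on a different cache line, wasting most of it.
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += buf[r * cols + c]
    return total


m = make_matrix(256, 256)
print(sum_row_major(m, 256, 256) == sum_col_major(m, 256, 256))
```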

##### Share on other sites
StubbornDuck    602

It seems fitting to mention Amdahl's law here: the shortest possible time in which you can execute a parallel program is the time taken by its longest sequential part.

Amdahl's law is often cited by very pessimistic people. There's Gustafson's law as a counter: Programmers tend to adjust their problems to be able to parallelize a larger fraction of them.
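The two laws can be written as simple formulas. This Python sketch assumes `p` is the parallel fraction of the work (for Amdahl, of the original single-core run; for Gustafson, of the run on the n-core machine) and ignores synchronization overhead. With p = 0.95, Amdahl gives about 12.5x on 32 cores and never more than 20x, while Gustafson's scaled-problem view gives 30.45x.

```python
def amdahl_speedup(p: float, cores: int) -> float:
    """Fixed problem size: the serial part (1 - p) never shrinks."""
    return 1.0 / ((1.0 - p) + p / cores)


def gustafson_speedup(p: float, cores: int) -> float:
    """Problem grows with the machine: p is measured on the n-core run."""
    return (1.0 - p) + p * cores


for n in (4, 8, 32):
    print(n, round(amdahl_speedup(0.95, n), 2), round(gustafson_speedup(0.95, n), 2))
```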

Optimisation is hard, multithreading is hard

Sure, but to elaborate a bit, we can build other concurrency models to make parallel programming easier. The actor model is lovely. Then there's task-based parallelism, etc. Optimization remains hard, however, because parallelizing your problem does not necessarily gain you performance once all the overhead is counted, even though actor message passing is very understandable.
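A minimal sketch of the actor idea in Python, using only the standard library: the actor owns its state, only its own thread touches that state, and other threads interact with it purely by putting messages in its mailbox (a `queue.Queue`). `CounterActor` and its message names are hypothetical, chosen for illustration; real actor frameworks add supervision, distribution and typed messages on top of this.

```python
import queue
import threading


class CounterActor:
    """Hypothetical minimal actor: private state + a mailbox + one thread."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Messages are handled one at a time, in arrival order,
        # so no lock is needed around _count.
        while True:
            msg, reply_to = self._mailbox.get()
            if msg == "stop":
                return
            if msg == "incr":
                self._count += 1
            elif msg == "get" and reply_to is not None:
                reply_to.put(self._count)

    def send(self, msg, reply_to=None):
        """Fire-and-forget: enqueue a message and return immediately."""
        self._mailbox.put((msg, reply_to))

    def ask(self, msg):
        """Send a message and block until the actor replies."""
        reply = queue.Queue(maxsize=1)
        self.send(msg, reply)
        return reply.get()

    def stop(self):
        self.send("stop")
        self._thread.join()
```

Any number of threads can call `send("incr")` concurrently without a lock of their own, because only the actor thread ever modifies `_count`. The mailbox is also where the overhead mentioned above shows up: every message costs a lock-protected enqueue inside `queue.Queue` and a hand-off to another thread.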

##### Share on other sites
fir    460

Furthermore, this bottleneck cannot be combated by increasing the number of cores, because there is only one memory bus per CPU.

Did you mean that when I have a 4- or 24-core 'CPU', the cores reach the memory through one common 'bus' and collide on the way? (I do not know such hardware stuff, but it seems unbelievable.)

##### Share on other sites
swiftcoder    18432

Did you mean that when I have a 4- or 24-core 'CPU', the cores reach the memory through one common 'bus' and collide on the way?

Pretty much, yes. The technical term would be 'bus contention'.

There are more exotic architectures that avoid this problem to a greater or lesser degree (e.g. the various 'tiled CPU' architectures), but nothing in the mainstream.
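A toy model of bus contention, with made-up numbers: all cores stream data through one shared memory bus, each core wants a fixed bandwidth, and caches are ignored. The function name and the figures are hypothetical; real memory controllers, multiple memory channels and caches make the picture more complicated.

```python
def streaming_speedup(cores: int, bus_gb_s: float, per_core_gb_s: float) -> float:
    """Speedup over one core for a purely memory-streaming workload.

    Each core wants `per_core_gb_s` of RAM bandwidth, but all cores share
    one bus of `bus_gb_s`. Speedup grows linearly only until the bus is
    saturated, then stays flat. Caches are ignored entirely.
    """
    single = min(per_core_gb_s, bus_gb_s)
    total = min(cores * per_core_gb_s, bus_gb_s)
    return total / single


# Made-up numbers: a 20 GB/s bus and cores that each want 5 GB/s.
for n in (1, 2, 4, 8, 24):
    print(n, "cores ->", streaming_speedup(n, 20.0, 5.0), "x")
```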

##### Share on other sites
Hodgman    51234

Well, they do all that, but when you measure the difference in processing ability of one core of a quad-core from 2007 versus 2013, you may observe a 2x speedup

That depends on the code. If you port your SSE code over to the new AVX instructions, then that's a potential/theoretical 4x speedup just there, plus if we say that the CPU is 2x faster overall, then that's an 8x speedup in total.

That's theoretical best case though, if you re-write your code to completely exploit the new CPU.

the cores reach the memory through one common 'bus' and collide on the way?

In very simple terms, the CPU plugs into the motherboard, and the RAM plugs into the motherboard. The motherboard has a link (bus) between the two components. That bus itself has a maximum speed (and these have been getting faster over time).

The problem is that CPUs have been improving at x% per year, while RAM and the bus have been improving at y% per year, and y is smaller than x.

For example (I'm just making up numbers), imagine that every two years CPUs become 2x as fast as before, but new RAM/motherboards are only 1.5x as fast as before.

RAM has still gotten faster... but the CPU has gotten faster at a faster rate.

Say you then measure the CPU in instructions per second, and you measure RAM/bus in bytes per second...

Let's say that Computer A does 100 instructions/second and 100 bytes/second. In relative terms, that's 1 byte per instruction.

Let's say that two years later, Computer B does 200 instructions/second and 150 bytes/second. In relative terms, that's 0.75 bytes per instruction!

Then this means that the number of Bytes per Instruction is actually decreasing over time. CPUs are getting so fast, that they can crunch data faster than the RAM can deliver that data...
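These made-up numbers can be run forward a few more generations. This sketch simply repeats the 2x CPU / 1.5x RAM growth per two years from the post (the function name is hypothetical), so the bytes-per-instruction ratio shrinks by a factor of 0.75 at every step.

```python
def bytes_per_instruction_over_time(generations, cpu_growth=2.0, ram_growth=1.5,
                                    instr_per_s=100.0, bytes_per_s=100.0):
    """Bytes of RAM bandwidth available per instruction, one entry per
    two-year generation, using the illustrative growth rates above."""
    ratios = []
    for _ in range(generations + 1):
        ratios.append(bytes_per_s / instr_per_s)
        instr_per_s *= cpu_growth
        bytes_per_s *= ram_growth
    return ratios


print(bytes_per_instruction_over_time(3))  # [1.0, 0.75, 0.5625, 0.421875]
```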

Edited by Hodgman

##### Share on other sites
King Mir    2490

Futhermore, this bottleneck cannot be combated by increasing the number of cores, because there is only one memory bus per CPU.

Did you mean that when I have a 4- or 24-core 'CPU', the cores reach the memory through one common 'bus' and collide on the way? (I do not know such hardware stuff, but it seems unbelievable.)

Essentially yes.

Though it's not really along the way, because main memory itself is shared. The bus is just the point where the cores converge.

The problem does not apply to the CPU caches, so when a program is able to keep all its data in at least the L3 cache, there is no such bottleneck.
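A rough way to check whether a working set could stay in L3, as described above. The 8 MiB L3 size is an assumption for illustration (real sizes vary by CPU model), the helper names are hypothetical, and the check ignores associativity and other data competing for the cache, so a result near the limit should be read as "probably not".

```python
def working_set_bytes(elements: int, bytes_per_element: int) -> int:
    return elements * bytes_per_element


def fits_in_cache(working_set: int, cache_bytes: int) -> bool:
    # Rough check only: real caches are set-associative and shared with
    # other data, so a working set close to the limit will still miss.
    return working_set <= cache_bytes


L3_BYTES = 8 * 1024 * 1024  # assumed 8 MiB L3; check your own CPU's spec

for n in (512, 1024, 2048):
    ws = working_set_bytes(n * n, 4)  # n x n matrix of 32-bit floats
    print(f"{n}x{n} floats: {ws / 2**20:.0f} MiB, fits: {fits_in_cache(ws, L3_BYTES)}")
```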

##### Share on other sites
fir    460

Well, they do all that, but when you measure the difference in processing ability of one core of a quad-core from 2007 versus 2013, you may observe a 2x speedup

That depends on the code. If you port your SSE code over to the new AVX instructions, then that's a potential/theoretical 4x speedup just there, plus if we say that the CPU is 2x faster overall, then that's an 8x speedup in total.

That's theoretical best case though, if you re-write your code to completely exploit the new CPU.

IMO you have it wrong here: 4-float SSE was available on 2003 machines, and in 2012 or 2013 you get 8 floats, but this by itself does not give you a 2x speedup. It can, but only in some cases, so it does not make a 4x difference; maybe 1.5x on average or less, or about that.
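SSE registers are 128 bits wide (4 single-precision floats) and AVX registers are 256 bits (8 floats), so the per-instruction gain from SSE to AVX is at most 2x, and only the part of the run time that actually vectorizes gets it. The Amdahl-style sketch below makes that explicit; the function name is hypothetical, and the fraction 2/3 is picked only because it reproduces the "1.5x on average" guess above, not because it was measured.

```python
def vectorized_speedup(simd_fraction: float, width_gain: float) -> float:
    """Amdahl-style estimate for widening SIMD.

    simd_fraction: share of the original run time spent in code that
                   benefits from wider vectors (assumed, 0..1).
    width_gain:    per-instruction gain from wider registers, e.g. 2.0
                   going from 4-float SSE to 8-float AVX.
    """
    rest = 1.0 - simd_fraction
    return 1.0 / (rest + simd_fraction / width_gain)


# If two thirds of the time is SIMD-friendly, 4 -> 8 lanes gives ~1.5x overall.
print(vectorized_speedup(2 / 3, 8 / 4))
```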

As to raw core power, I did not test 2007-8 cores against 2012-13 cores, so I am not sure whether it would be 2x, but it should be "about" that. I ran my simple single-core (non-SSE) raytracing code on a 2003 Pentium 4 and on a couple of new 2012 machines, and frame time in milliseconds went down about 5x. I suspect the speedup between 2003 and 2008 was larger than the speedup between 2008 and 2013, so I conclude that 2008->2013 would be about 2x down or somewhat less. I think that is about right.

If someone would like to run my benchmark-like simple raytracer, I can give you something close to the test I used then (no spyware or malware here, but since I cannot be completely sure about some old-school virus, anyone afraid to run it could check it with an antivirus or something):

https://dl.dropboxusercontent.com/u/42887985/r30.zip

(Someone said that it does not run on his Win8; I tested it on XP only.)

##### Share on other sites
Hodgman    51234
2013 CPUs have 16-wide SIMD, 4x better than the old 4-wide ones.
*In theory* that can give a 4x improvement over the old ones. In practice, you'll probably be bottlenecked by RAM, or non-SIMD-friendly code.

As above, Moore's law doesn't predict 2x performance per two years, just 2x complexity. These CPUs, 6 years apart, are 8x more complex as predicted, and could potentially offer 8x performance increases in some cases, but yes, in other cases it might be even less than a 2x increase.

Be careful when benchmarking a particular app, like your link, because you're not just benchmarking the CPU, but the whole system.
It could be that the CPU is capable of giving you a 10x boost, but it's spending half its time stalled waiting for RAM, so it only ends up giving you a 5x boost, etc...

Watching the multi-core performance increases in the GPU market is more interesting IMHO, as they're working with highly parallel applications, and therefore have a lot more freedom to simply increase core counts and SIMD widths rather than do all the fancy stuff that Intel does to boost single-core performance.

##### Share on other sites
fir    460

2013 CPUs have 16-wide SIMD, 4x better than the old 4-wide ones.
*In theory* that can give a 4x improvement over the old ones. In practice, you'll probably be bottlenecked by RAM, or non-SIMD-friendly code.

As above, Moore's law doesn't predict 2x performance per two years, just 2x complexity. These CPUs, 6 years apart, are 8x more complex as predicted, and could potentially offer 8x performance increases in some cases, but yes, in other cases it might be even less than a 2x increase.

Be careful when benchmarking a particular app, like your link, because you're not just benchmarking the CPU, but the whole system.
It could be that the CPU is capable of giving you a 10x boost, but it's spending half its time stalled waiting for RAM, so it only ends up giving you a 5x boost, etc...

Watching the multi-core performance increases in the GPU market is more interesting IMHO, as they're working with highly parallel applications, and therefore have a lot more freedom to simply increase core counts and SIMD widths rather than do all the fancy stuff that Intel does to boost single-core performance.

Are the 16-float-wide AVX units already in production/shops? I didn't know about that. In general we agree, I think, and some interesting points were made here. I didn't know that 24-core PCs are available today; it seemed to me that the growth in the number of cores had strangely stopped (4 cores in 2007, four cores now), but maybe it will grow. Maybe I should also get interested in OpenCL computing; it may be interesting, though I sometimes hear critical opinions, so I will not really know until I check it myself.

##### Share on other sites
frob    44911

it seemed to me that the growth in the number of cores had strangely stopped (4 cores in 2007, four cores now), but maybe it will grow

Just monitor the latest rounds of processors coming out of both AMD and Intel.  There is always an impressive list of processor details.

For consumer hardware you could get an i7 which has 6 hyperthreaded cores (6/12) that look like 12 fully featured processors to your operating system. For the server side, the E7 Xeon has 10/20, with 10 hyperthreaded cores that look like 20 fully featured processors. Both are fairly common on the market right now.
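From a program's point of view, the hyperthreads show up as separate processors. In Python, `os.cpu_count()` reports logical CPUs, so on the 6/12 i7 above it would be expected to report 12. This sketch sizes a worker pool from that count; the helper names are hypothetical. One caveat: with CPython's global interpreter lock, threads do not speed up pure-Python arithmetic, so for CPU-bound work you would swap in `concurrent.futures.ProcessPoolExecutor`, which needs a picklable, module-level function instead of the lambda.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def logical_cpus() -> int:
    # os.cpu_count() counts logical CPUs (hardware threads), not physical
    # cores, and may return None if the count cannot be determined.
    return os.cpu_count() or 1


def parallel_sum_of_squares(values, workers=None):
    workers = workers or logical_cpus()
    # Split the input into roughly one contiguous chunk per worker.
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda part: sum(v * v for v in part), chunks))


print(logical_cpus(), "logical CPUs")
print(parallel_sum_of_squares(list(range(1000))))
```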

AMD chips have more internal cores, but internally the way they share resources can give better results or worse results than the Intel chip equivalent. Both chipsets have their strengths and weaknesses and the main performance difference between them is the way they handle floating point and SIMD instructions. So a 24-core AMD chip may have better or worse performance for your application than a 10/20 Intel chip.

Either way, Moore was referring to cost per component, and component didn't necessarily refer to transistor. Some very smart people have gone back over technology and found that for hundreds of years the rate of change is similar even when the highest technology was weaving looms and mechanical gears. When we eventually make the transition to some future computing medium, be it quantum computing or analog computing or DNA-based computing or whatever, we will likely see a similar trend in cost-per-component.

##### Share on other sites
fir    460

it seemed to me that the growth in the number of cores had strangely stopped (4 cores in 2007, four cores now), but maybe it will grow

Just monitor the latest rounds of processors coming out of both AMD and Intel.  There is always an impressive list of processor details.

For consumer hardware you could get an i7 which has 6 hyperthreaded cores (6/12) that look like 12 fully featured processors to your operating system. For the server side, the E7 Xeon has 10/20, with 10 hyperthreaded cores that look like 20 fully featured processors. Both are fairly common on the market right now.

AMD chips have more internal cores, but internally the way they share resources can give better results or worse results than the Intel chip equivalent.

Both of the chipsets share resources, but they do it differently. Both chipsets have their strengths and weaknesses and the main performance difference between them is the way they handle floating point and SIMD instructions. So a 24-core AMD chip may have better or worse performance for your application than a 10/20 Intel chip.

One workload may be better on one chip, another workload better on a different chip.

On Agner Fog's site (the well-known assembly guy) there was something about it, and he said that hyperthreading does not give much (as far as I remember), so I think counting hyperthreads is possibly much more marketing than performance (so if that HP machine has 24 with hyperthreading, it is maybe really much closer to 12).

But of course 10 is not 4, so it counts. (Maybe I can assume that in five years I will have 10 cores at home. Damn, I will be 42 then, my whole life devoted to computers :\ )