From DDR3 to DDR4: Bandwidth by the Numbers

 

We’ve done a couple of FAQs and Q&As, but we haven’t yet painted a clear, by-the-numbers picture of what DDR4 really offers over DDR3. On the desktop side, the lower power consumption is offset somewhat by the fact that the only platform that supports it starts chugging power the instant overclocking gets involved, while 16GB DIMMs (one of the key advantages of DDR4) aren’t expected to be available until 2015.

That leaves us with performance. A lot of users are concerned that the increased timings on DDR4 make it inferior to DDR3 at similar speeds, but that doesn’t really tell the whole story. While DDR2 and DDR3 were architecturally very similar and took some time to separate, DDR4 is host to a few internal architectural changes that affect overall latency and performance. Those changes allow it to see benefits over DDR3 right out of the gate.

I want to stress that this exercise, at least right now, is academic: there is no platform currently available that supports both DDR3 and DDR4. So if you want DDR4, you’re using Haswell-E, and vice versa. That makes this comparison a little bit difficult since it’s tough to quantify in apples-to-apples terms whether or not DDR4 really is “faster” than DDR3.

For testing, I used three platforms with both single rank and dual rank DIMMs. Dual rank DIMMs increase parallelization a little bit at the cost of a very minor hit in latency, typically about 1ns. In lay terms, denser DIMMs (e.g. 8GB) get a little more mileage than lower capacity, single rank DIMMs (e.g. 4GB). Single rank DIMMs pretty much have to bank on hitting higher speeds to make up the deficit.

These are the testbeds I used:

| | Haswell | Ivy Bridge-E | Haswell-E |
| --- | --- | --- | --- |
| CPU | Intel Core i7-4790K | Intel Core i7-4930K | Intel Core i7-5960X |
| Motherboard | ASUS Z97-WS | ASUS P9X79 Pro | ASUS X99-Deluxe |
| Single Rank Kit | CMY16GX3M4A3000C12R | CMD16GX3M4A2933C12 | CMD16GX4M4B3200C16 |
| Dual Rank Kit | CMY32GX3M4A2800C12R | CMY64GX3M8A2400C11R | CMD32GX4M4A2800C16 |
| Memory Channels | 2x DDR3 | 4x DDR3 | 4x DDR4 |

Note that in each case, the CPU’s core clock was set to 4GHz and uncore clock was set to 3GHz.

And these are the latencies I tested with at each speed:

| Speed | DDR3 | DDR4 |
| --- | --- | --- |
| 1600 MHz | 10-10-10-30 | |
| 1866 MHz | 11-13-13-31 | |
| 2133 MHz | 11-13-13-31 | 15-15-15-35 |
| 2400 MHz | 11-13-13-31 | 15-15-15-35 |
| 2666 MHz | 11-13-13-31 | 15-15-15-35 |
| 2800 MHz | 12-14-14-36 | 16-16-16-36 |
| 3000 MHz | | 16-16-16-36 |
| 3200 MHz | | 16-16-16-36 |

You can see I’ve tried to make it as apples-to-apples as possible, but these are different architectures and memory controllers. For bandwidth testing, I used AIDA64.
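The timings in the table above are counted in clock cycles, so a higher CAS number at a higher speed doesn’t automatically mean a longer wait. Here’s a minimal sketch of the conversion to nanoseconds. It assumes the speeds in the table are data rates (two transfers per clock), so one cycle lasts 2000 divided by the data rate; the function name `cas_latency_ns` is just for illustration, and this covers only the CAS component, not the total latency AIDA64 measures.

```python
def cas_latency_ns(cas_cycles: int, data_rate: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    Assumption: data_rate is the DIMM's rated speed (e.g. 2133), which is
    twice the I/O clock in MHz, so one cycle lasts 2000 / data_rate ns.
    """
    return cas_cycles * 2000.0 / data_rate


# CAS values per speed, taken from the timing table above.
ddr3 = {1600: 10, 1866: 11, 2133: 11, 2400: 11, 2666: 11, 2800: 12}
ddr4 = {2133: 15, 2400: 15, 2666: 15, 2800: 16, 3000: 16, 3200: 16}

for name, timings in (("DDR3", ddr3), ("DDR4", ddr4)):
    for rate, cas in timings.items():
        print(f"{name}-{rate} CL{cas}: {cas_latency_ns(cas, rate):.2f} ns")
```

By this math, DDR4 at 3200MHz and CAS 16 works out to 10ns, while DDR3 at 1600MHz and CAS 10 works out to 12.5ns.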

I’m keen to point out before we get started that it’s tough to actually quantify “faster” since there are essentially four disciplines you’re looking at: three that are bandwidth related and one that is latency related. It’s more sensible to look for trends.

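| READ (MB/s) | 1600 | 1866 | 2133 | 2400 | 2666 | 2800 | 3000 | 3200 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Haswell 1R | 23233 | 26392 | 30586 | 34079 | 22850 | 23917 | | |
| Haswell 2R | 23982 | 27840 | 31833 | 35406 | 23585 | | | |
| Ivy-E 1R | 41435 | 48444 | 50573 | 55197 | | | | |
| Ivy-E 2R | 43670 | 50520 | 57341 | 59831 | | | | |
| Haswell-E 1R | | | 54514 | 57664 | 60025 | 59651 | 60848 | 62407 |
| Haswell-E 2R | | | 56771 | 60231 | 62164 | 61045 | | |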

Right off the bat, you can see Haswell’s dual-channel memory controller is going to have a hard time keeping up with the quad-channel memory controllers on Ivy Bridge-E and Haswell-E. What’s notable, though, is that DDR3 and DDR4 are very close at the same clock speed despite DDR4’s increased CAS latency. In fact, if you’re using single rank DIMMs, DDR4 is measurably faster than DDR3: about 54.5GB/s versus 50.6GB/s at 2133MHz.

You may also be seeing Haswell’s memory bandwidth take a bath after 2400MHz; this is something independently verifiable. Latency continues to improve past 2400MHz, but memory bandwidth takes a consistent hit. Meanwhile, Haswell-E’s DDR4 controller takes a slight dip at 2800MHz when we have to shift to CAS16 from CAS15, but resumes climbing at 3000MHz and 3200MHz.
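To put the read numbers in context, it helps to compare them against the theoretical peak of each configuration. The sketch below is a rough illustration, not anything AIDA64 reports: it assumes each memory channel is 64 bits (8 bytes) wide, that the rated speed is the number of mega-transfers per second, and that 1 MB means one million bytes. The helper names are made up for this example.

```python
BYTES_PER_TRANSFER = 8  # assumption: each memory channel is 64 bits wide


def peak_bandwidth_mbs(channels: int, data_rate: int) -> int:
    """Theoretical peak in MB/s: channels x 8 bytes x mega-transfers/s."""
    return channels * BYTES_PER_TRANSFER * data_rate


def efficiency(measured_mbs: int, channels: int, data_rate: int) -> float:
    """Fraction of the theoretical peak that a measured result reaches."""
    return measured_mbs / peak_bandwidth_mbs(channels, data_rate)


# (platform, channels, speed, measured read in MB/s) from the READ table above.
samples = [
    ("Haswell 1R", 2, 1600, 23233),
    ("Ivy-E 1R", 4, 2133, 50573),
    ("Haswell-E 1R", 4, 2133, 54514),
    ("Haswell-E 1R", 4, 3200, 62407),
]

for platform, channels, rate, measured in samples:
    peak = peak_bandwidth_mbs(channels, rate)
    share = efficiency(measured, channels, rate)
    print(f"{platform} @ {rate}: {measured} of {peak} MB/s ({share:.0%})")
```

Under these assumptions, Haswell-E’s single rank read result sits at roughly 80% of the theoretical peak at 2133MHz and roughly 61% at 3200MHz, so the absolute numbers keep climbing even as the controller uses a smaller share of what the DIMMs could deliver on paper.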

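| WRITE (MB/s) | 1600 | 1866 | 2133 | 2400 | 2666 | 2800 | 3000 | 3200 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Haswell 1R | 23715 | 27157 | 30852 | 34819 | 22926 | 24053 | | |
| Haswell 2R | 25132 | 29222 | 33248 | 37415 | 24158 | | | |
| Ivy-E 1R | 30537 | 33096 | 37845 | 42184 | | | | |
| Ivy-E 2R | 31488 | 52746 | 60432 | 43347 | | | | |
| Haswell-E 1R | | | 46711 | 46817 | 46919 | 46888 | 46927 | 47009 |
| Haswell-E 2R | | | 47758 | 47832 | 47892 | 47912 | | |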

At this point it’s obvious Intel’s Ivy Bridge-E and Haswell DDR3 controllers just weren’t architected to handle high speeds. Ivy Bridge-E with dual rank DDR3 does post higher write speeds at 1866MHz and 2133MHz (about 53GB/s and 60GB/s) than DDR4 manages at any speed, but it falls back to about 43GB/s at 2400MHz. DDR4’s write speed, meanwhile, is essentially constant and stable at about 47GB/s.

While write speeds are obviously a weak point in Haswell-E’s DDR4 memory controller, they’re really the only one.
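To see just how flat the DDR4 write results are, here’s a quick sketch that measures the spread between the slowest and fastest write result for a platform, using the figures from the WRITE table above. The `spread_percent` helper is only an illustration of the arithmetic.

```python
def spread_percent(values: list[int]) -> float:
    """Gap between the fastest and slowest result, as a % of the slowest."""
    return (max(values) - min(values)) / min(values) * 100


# Write results in MB/s, copied from the WRITE table above.
write_results = {
    "Ivy-E 1R (1600-2400)": [30537, 33096, 37845, 42184],
    "Haswell-E 1R (2133-3200)": [46711, 46817, 46919, 46888, 46927, 47009],
    "Haswell-E 2R (2133-2800)": [47758, 47832, 47892, 47912],
}

for label, values in write_results.items():
    print(f"{label}: {spread_percent(values):.1f}% spread")
```

The Haswell-E results vary by less than 1% across the whole range of tested speeds, while Ivy Bridge-E’s single rank results vary by more than a third.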

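| COPY (MB/s) | 1600 | 1866 | 2133 | 2400 | 2666 | 2800 | 3000 | 3200 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Haswell 1R | 21557 | 24347 | 27535 | 30596 | 22256 | 23324 | | |
| Haswell 2R | 23794 | 27262 | 30494 | 33635 | 23557 | | | |
| Ivy-E 1R | 39492 | 44368 | 49485 | 54149 | | | | |
| Ivy-E 2R | 43876 | 51026 | 58393 | 59667 | | | | |
| Haswell-E 1R | | | 52447 | 57749 | 62076 | 62793 | 65558 | 68990 |
| Haswell-E 2R | | | 56066 | 61703 | 67144 | 59848 | | |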

With dual rank DIMMs, DDR4’s memory copy performance starts slightly behind DDR3 at 2133MHz and then pulls ahead at 2400MHz; with single rank DIMMs, DDR4 leads at both speeds. Judging from the synthetics so far, it seems like users who want to start getting the most out of Haswell-E should be looking at 2666MHz kits at a minimum. Again, mainstream Haswell’s dual-channel DDR3 controller is totally outclassed by the fatter pipes of these higher end, hexa-core and octa-core processors.
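The copy comparison at matched speeds boils down to a simple percentage difference. The sketch below runs that arithmetic on the Ivy Bridge-E and Haswell-E figures from the COPY table above; `percent_gain` is a made-up helper name, and only the two speeds both quad-channel platforms were tested at are included.

```python
def percent_gain(ddr3_mbs: int, ddr4_mbs: int) -> float:
    """How much faster (positive) or slower (negative) DDR4 is, in percent."""
    return (ddr4_mbs - ddr3_mbs) / ddr3_mbs * 100


# (rank, speed, Ivy-E DDR3 copy, Haswell-E DDR4 copy) from the COPY table above.
matched = [
    ("1R", 2133, 49485, 52447),
    ("2R", 2133, 58393, 56066),
    ("1R", 2400, 54149, 57749),
    ("2R", 2400, 59667, 61703),
]

for rank, speed, ddr3, ddr4 in matched:
    print(f"{rank} @ {speed}: DDR4 {percent_gain(ddr3, ddr4):+.1f}% vs DDR3")
```

The only matched case where DDR4 trails is dual rank at 2133MHz, by about 4%.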

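| LATENCY (ns) | 1600 | 1866 | 2133 | 2400 | 2666 | 2800 | 3000 | 3200 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Haswell 1R | 60.2 | 57.2 | 53.6 | 48.6 | 46.1 | 45.6 | | |
| Haswell 2R | 61.5 | 58.3 | 53.7 | 49.4 | 46.5 | | | |
| Ivy-E 1R | 78.3 | 72.2 | 65.7 | 60.6 | | | | |
| Ivy-E 2R | 80.8 | 67.7 | 60.1 | 61.7 | | | | |
| Haswell-E 1R | | | 71.3 | 66.2 | 62.3 | 63.3 | 61.2 | 56.3 |
| Haswell-E 2R | | | 72.9 | 67.5 | 63.3 | 64.6 | | |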

Latency is probably the biggest bugbear in the transition from DDR3 to DDR4. But users expecting DDR4 to grossly underperform DDR3 due to the higher CAS latency are going to be in for a surprise: as you ramp DDR4 to its intended speeds, latency actually drops below DDR3 (excepting Haswell’s dual-channel controller, which is just plain lower latency than both quad-channel controllers). So while it’s true that DDR4 measured roughly 6 to 13ns slower than DDR3 at the same clock speed in these tests, it still has lower latency at its mainstream speeds, and the deficit isn’t any greater than if you were going from Haswell’s dual-channel controller to Ivy Bridge-E’s quad-channel.

Ultimately that’s kind of the takeaway here: DDR4 starts at very high speeds with room to scale higher, and at those entry level speeds, it’s faster and more capable than its predecessor in almost every test. Mainstream DDR4 actually winds up with lower overall latency and higher bandwidth than mainstream DDR3. In the future we’ll be testing DDR4 in practical applications to see if there are performance gains to be had from exceeding the baseline 2133MHz, but for now it’s clear that if nothing else, DDR4 is a more than worthy successor to DDR3, and fears regarding the higher timings resulting in substantially increased overall latency are by and large unfounded.
