### Accurate Statistical-Based DDR4 Margin Estimation using SSN Induced Jitter Model

Hee-Soo Lee, Cindy Cui, Heidi Barnes, and Luis Bolunas, Keysight Technologies











#### **HEESOO LEE**

*SI/PI/EM Application Engineer, Keysight Technologies EEsof EDA* hee-soo\_lee@keysight.com



#### **Cindy Cui**

Application Engineer, Keysight Technologies EEsof EDA cindy\_cui@keysight.com





# **DDR4 Highlights**

#### Highlights:

- Lower VDD voltage and Pseudo-Open Drain (POD) signaling reduce power consumption by 40%
- Internal VREF training is performed within the IC receiver to optimize the VREF level, with retraining at regular intervals
- Data lines are calibrated at the IC to reduce their skew to the strobe
- Data bus inversion (DBI)

| Specification               | DDR2        | DDR3         | DDR4      |
|-----------------------------|-------------|--------------|-----------|
| Voltage                     | 1.8 V       | 1.5 / 1.35 V | 1.2 V     |
| Per Pin Data Rate<br>(Mbps) | 400-1066    | 800-2133     | 1600-3200 |
| Channel Bandwidth<br>(GBps) | 3.2-8.5     | 6.4-17       | 12.8-25.6 |
| Component Density           | 512 MB–2 GB | 1-8 GB       | 2-16 GB   |

Image Source: Micron Technology



DDR3 Push-Pull

DDR4 Pseudo-Open Drain





UBM

# **Timing Margin vs. BER**

Requires new specs beyond traditional timing margin:

- Higher data rate: reduced UI and smaller margin
- Reduced VDDQ to achieve the power consumption spec
- Timing margin is eroded by ISI and RJ
- Adding a safety margin creates over-engineered solutions

#### Shrinking Timing Margins in Picoseconds

| Generation | Data Valid Window | Chip Margin | Package / Board Margin | DRAM Margin |
|------------|-------------------|-------------|------------------------|-------------|
| DDR1       | 2,500             | 900         | 800                    | 800         |
| DDR2       | 938               | 256         | 425                    | 256         |
| DDR3       | 469               | 188         | 140                    | 140         |
| DDR4       | 313               | 125         | 93                     | 93          |

Data rates range from 400 Mbps (DDR1) to 3,200 Mbps (DDR4).

Image Source: Altera






# **New JEDEC DQ Specification**

- Receiver requirements are defined by masks instead of setup/hold times and DC voltage swings
- Bit Error Rate (BER) specification:
  - Simpler definition of DRAM requirements and system design
  - The BER spec recovers timing and noise margin
  - Eliminates troublesome slew rate derating
  - Jitter includes the sum of deterministic and random jitter terms for a specified BER
  - The design specification is BER < 1e<sup>-16</sup>
- How many bits for 1e<sup>-16</sup> BER?
  - 10 quadrillion bits (1 / 1e<sup>-16</sup>), equivalent to 1.25 petabytes
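The scale of a direct 1e-16 BER check can be worked out with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate; the 3200 Mbps rate is an assumption taken from the fastest DDR4 per-pin rate in the highlights table, and decimal petabytes (1e15 bytes) are assumed.

```python
# Back-of-the-envelope sizing of a direct BER = 1e-16 check on one DQ line.
TARGET_BER = 1e-16
DATA_RATE_BPS = 3200e6  # assumption: fastest DDR4 per-pin rate listed earlier

bits_needed = round(1 / TARGET_BER)   # bits sent for one expected error
bytes_needed = bits_needed / 8
petabytes = bytes_needed / 1e15       # assumption: 1 PB = 1e15 bytes
seconds = bits_needed / DATA_RATE_BPS
days = seconds / 86400

print(f"bits needed : {bits_needed:.3e}")
print(f"data volume : {petabytes:.2f} PB")
print(f"time on wire: {days:.1f} days at {DATA_RATE_BPS / 1e6:.0f} Mbps")
```

Even before simulation cost is considered, simply transmitting that many bits on a single line takes weeks, which is why a bit-by-bit check at this BER is impractical.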





| Symbol      | Parameter                        | 1600, 1866, 2133 min | 1600, 1866, 2133 max | 2400 min | 2400 max | 2666, 3200 min | 2666, 3200 max | Unit | Note    |
|-------------|----------------------------------|----------------------|----------------------|----------|----------|----------------|----------------|------|---------|
| VdIVW_total | Rx Mask voltage - p-p total      | -                    | 136<br>(note 12)     | -        | tbd      | -              | tbd            | mV   | 1,2,4,6 |
| VdIVW_dv    | Rx Mask voltage - deterministic  | -                    | 136                  | -        | tbd      | -              | tbd            | mV   | 1,5,13  |
| TdIVW_total | Rx timing window total           | -                    | 0.2<br>(note 12)     | -        | tbd      | -              | tbd            | UI*  | 1,2,4,6 |
| TdIVW_dj    | Rx deterministic timing          |                      | 0.2                  | -        | tbd      | -              | tbd            | UI*  | 1,5,13  |
| VIHL_AC     | DQ AC input swing pk-pk          |                      | -                    | tbd      | -        | tbd            | -              | mV   | 7       |
| TdIPW       | DQ input pulse width             |                      |                      | tbd      |          | tbd            |                | UI*  | 8       |
| Tdqs_off    | DQ to DQS Setup offset           |                      | tbd                  | -        | tbd      | -              | tbd            | UI*  | 9       |
| Tdqh_off    | DQ to DQS Hold offset            |                      | tbd                  | -        | tbd      | -              | tbd            | UI*  | 9       |
| Tdqs_dd_off | DQ to DQ Setup offset            |                      | tbd                  | -        | tbd      | -              | tbd            | UI*  | 10      |
| Tdqh_dd_off | DQ to DQ Hold offset             |                      | tbd                  | -        | tbd      | -              | tbd            | UI*  | 10      |
| SRIN_dIVW   | Input Slew Rate over VdIVW_total | tbd                  | 9                    | tbd      | tbd      | tbd            | tbd            | V/ns | 11      |

#### NOTE :

Data Rx mask voltage and timing total input valid window, where VdIVW is centered around Vcent\_DQ(pin avg). The data Rx mask is applied per bit and should include voltage and temperature drift terms. The design specification is BER < 1e-16, and how this varies for lower BER is tbd. The BER will be characterized and extrapolated if necessary using a dual-Dirac method from a higher BER (tbd).

#### JESD79-4, page 202



# **Bit-by-bit (SPICE-Like) vs. Statistical Approach**

#### Bit-by-bit (SPICE-Like, Transient) Approach

- Bit-by-bit simulation takes too long to run for 10 quadrillion bits
- At least 1 million bits (1e<sup>6</sup>) are required to do jitter separation and predict the eye opening accurately using dual-Dirac extrapolation with the bit-by-bit approach
- Example: 4587 seconds for a simple DQ test case

#### Statistical Approach (13 seconds, 350X faster)

- Statistical calculation of DQ and DQS eye probabilities at ultra-low BER in seconds, not days, without running an actual bit sequence
- No need for risky dual-Dirac extrapolation
- Example: 13 seconds for the simple DQ test case
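For context, the dual-Dirac extrapolation that the bit-by-bit flow relies on can be sketched as below. This is the commonly used textbook form, TJ(BER) = DJ + 2·Q(BER)·RJ, with Q taken as the one-sided Gaussian tail multiplier (transition density is ignored). The jitter values are hypothetical and are not taken from the slides.

```python
from statistics import NormalDist


def q_scale(ber: float) -> float:
    """Gaussian multiplier Q such that the one-sided tail probability is `ber`."""
    return -NormalDist().inv_cdf(ber)


def total_jitter_ps(dj_ps: float, rj_rms_ps: float, ber: float) -> float:
    """Dual-Dirac estimate of total jitter: TJ = DJ + 2 * Q(BER) * RJ_rms."""
    return dj_ps + 2.0 * q_scale(ber) * rj_rms_ps


# Hypothetical jitter values, for illustration only.
DJ_PS, RJ_RMS_PS = 30.0, 2.0
for ber in (1e-6, 1e-8, 1e-16):
    tj = total_jitter_ps(DJ_PS, RJ_RMS_PS, ber)
    print(f"BER={ber:.0e}  Q={q_scale(ber):.2f}  TJ={tj:.1f} ps")
```

Because Q multiplies the fitted RJ, any error in the DJ/RJ separation made at around 1e-6 is amplified when extrapolating ten decades down to 1e-16, which is the risk the statistical approach avoids.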







JAN 31-FEB 2, 2017


# Simultaneous Switching Noise (SSN)

- SSN is generated when drivers switch concurrently with fast rising/falling edges
- Two primary SSN mechanisms:
  - Crosstalk
    - Mutual coupling from aggressor signals to the victim
  - Delta-I noise due to the inductance of both the power and ground planes
    - The switching current on the power and ground planes induces a fluctuating voltage drop of $L \cdot di/dt$
    - The voltage drop is proportional to the inductance and the switching speed
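A rough feel for the delta-I term comes from evaluating L·di/dt directly. In the sketch below, the inductance, per-driver current step, and edge time are hypothetical placeholders, and di/dt is approximated as a linear ramp with all drivers switching in the same direction at once (a worst case).

```python
def delta_i_noise_v(inductance_h, delta_i_a, edge_time_s, n_drivers=1):
    """Peak L*di/dt drop, approximating di/dt as n_drivers * delta_i / edge_time."""
    return inductance_h * n_drivers * delta_i_a / edge_time_s


# Hypothetical values, for illustration only.
L_EFF_H = 50e-12     # effective PDN loop inductance
DELTA_I_A = 10e-3    # current step per driver
T_EDGE_S = 100e-12   # rise/fall time

for n in (1, 16, 64):
    drop_mv = delta_i_noise_v(L_EFF_H, DELTA_I_A, T_EDGE_S, n) * 1e3
    print(f"{n:2d} drivers switching together: {drop_mv:.0f} mV")
```

Halving the edge time or doubling the inductance doubles the drop, which is the proportionality stated above.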








### **Delta-I Noise With Statistical Approach**

#### Assumptions made in statistical approach

- Statistical methodology assumes the system to be LTI (Linear Time Invariant)
- The amplitude and jitter noise by crosstalk and ISI are well taken care of by the statistical approach

#### Dilemma:

- Delta-I induced amplitude and jitter noise are time variant, so they are not taken into consideration with the statistical approach
- For the ultra-low BER value, 1e<sup>-16</sup>, the statistical approach is required

1. Transient analysis to get an impulse response of channel, TX, and RX



2. Statistical analysis with the statistical distribution of a conceptually infinite non-repeating bit pattern
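The idea behind step 2 can be illustrated with a toy calculation. The sketch below is not the simulator's algorithm; it only shows how, for an LTI channel, the ISI amplitude distribution at one sampling instant follows from the pulse-response cursors by convolution, with no bit sequence being run. The cursor values are hypothetical, and jitter, crosstalk, and multiple sampling phases are left out.

```python
from collections import defaultdict


def isi_pdf(cursors_mv):
    """Exact PDF of the ISI sum for independent, equiprobable +/- symbols.

    Each cursor contributes +c or -c with probability 1/2, so the PDF is the
    convolution of one two-point distribution per cursor.
    """
    pdf = {0: 1.0}
    for c in cursors_mv:
        nxt = defaultdict(float)
        for level, p in pdf.items():
            nxt[level + c] += 0.5 * p
            nxt[level - c] += 0.5 * p
        pdf = dict(nxt)
    return pdf


def excursion_at_ber(pdf, ber):
    """Lowest ISI level x for which P(ISI <= x) exceeds `ber`."""
    cumulative = 0.0
    for level in sorted(pdf):
        cumulative += pdf[level]
        if cumulative > ber:
            return level
    raise ValueError("ber must be below 1")


# Hypothetical cursor amplitudes in mV, for illustration only.
MAIN_CURSOR_MV = 500
ISI_CURSORS_MV = [50, 30, 20, 10]

pdf = isi_pdf(ISI_CURSORS_MV)
for ber in (0.0, 0.1):
    inner_mv = MAIN_CURSOR_MV + excursion_at_ber(pdf, ber)
    print(f"BER > {ber}: inner eye level = {inner_mv} mV")
```

With enough cursors the same convolution reaches probabilities far below 1e-16, which is how a statistical engine reports eye contours at BER levels no transient run could reach.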






## **Solution - Mask Correction Factor (MCF)**

#### Definition of MCF:

- The difference in eye height and eye width between two simulations, one with and one without the delta-I noise contribution
  - Eye height difference: amplitude noise correction factor
  - Eye width difference: jitter noise correction factor
- Usage of MCF:
  - Apply to the mask data to compensate for delta-I induced noise in the statistical analysis
  - Correct the eye height and eye width values at a given BER level











### **MCF Extraction Procedure**

#### Steps to extract MCF:

- Run transient simulations on two cases, one with the PDN and one without.
- Find the eye height and eye width at the expected BER level for each case.
- Extract the amplitude and jitter MCF by subtracting the values of the two cases.

### Note for PDN model:

Use a PDN model that extends to a high enough frequency to avoid extrapolation errors and to model the switching speed accurately.

| Amplitude MCF | Jitter MCF |
|---------------|------------|
| 25 mV         | 19 ps      |



| Measurement | With PDN  | Without PDN |
|-------------|-----------|-------------|
| WidthAtBER  | 3.771E-10 | 3.958E-10   |
| HeightAtBER | 0.423     | 0.448       |
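The subtraction in the extraction steps can be checked against the eye measurements reported on this slide. The sketch assumes that the smaller eye (3.771E-10 s, 0.423 V) is the case with the PDN and the larger eye (3.958E-10 s, 0.448 V) is the case without it; the function and dictionary keys are illustrative names.

```python
def extract_mcf(eye_without_pdn, eye_with_pdn):
    """Return (amplitude MCF in mV, jitter MCF in ps) as without-minus-with."""
    amp_mv = (eye_without_pdn["height_v"] - eye_with_pdn["height_v"]) * 1e3
    jit_ps = (eye_without_pdn["width_s"] - eye_with_pdn["width_s"]) * 1e12
    return amp_mv, jit_ps


# Eye measurements from the slide; the with/without assignment is an assumption.
EYE_WITHOUT_PDN = {"width_s": 3.958e-10, "height_v": 0.448}
EYE_WITH_PDN = {"width_s": 3.771e-10, "height_v": 0.423}

amp_mcf_mv, jit_mcf_ps = extract_mcf(EYE_WITHOUT_PDN, EYE_WITH_PDN)
print(f"Amplitude MCF: {amp_mcf_mv:.0f} mV, Jitter MCF: {jit_mcf_ps:.0f} ps")
```

The rounded results reproduce the 25 mV and 19 ps listed in the MCF table.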




### **Relationship Between MCF and # of DQ Lines**

### Total current draw vs. number of DQ lines

- If the bit pattern on each of the 64 DQ lines is identical, the total current drawn from the source increases linearly with the number of DQ lines. In a real system the bit patterns are random, so the relationship is not linear
- Extract the MCF with all 64 DQ lines running non-identical bit patterns
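The difference between identical and random patterns can be seen in a small Monte Carlo experiment. This is only a toy model: the number of lines with a rising edge in the same unit interval is used as a stand-in for peak switching current, and Python's `random` module stands in for PRBS generators with different seeds. Both are assumptions, not the simulation setup used in the slides.

```python
import random


def peak_rising_edges(n_lines, n_bits, identical, seed=1):
    """Peak number of lines with a 0->1 transition in the same unit interval."""
    rng = random.Random(seed)
    if identical:
        pattern = [rng.getrandbits(1) for _ in range(n_bits)]
        lines = [pattern] * n_lines
    else:
        lines = [[rng.getrandbits(1) for _ in range(n_bits)]
                 for _ in range(n_lines)]
    peak = 0
    for k in range(1, n_bits):
        rising = sum(1 for bits in lines if bits[k - 1] == 0 and bits[k] == 1)
        peak = max(peak, rising)
    return peak


for n in (4, 16, 64):
    same = peak_rising_edges(n, 2000, identical=True)
    rand = peak_rising_edges(n, 2000, identical=False)
    print(f"{n:2d} lines: identical peak = {same}, random peak = {rand}")
```

With identical patterns the peak always equals the line count, while with independent patterns it grows much more slowly, so the MCF has to be extracted with the real number of lines rather than scaled up from a few.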

#### Cases with 4, 8, 12, 16, 20, 24, 28, 32, 36, 48, and 64 DQ lines














### **Test Example for MCF vs. # of DQ Lines**

#### Transmitter

- 64 PRBS sources, each with a different seed value
- 64 "kintexu.ibs" power-aware IBIS models

#### Results

- Amplitude correction factor:
  - 49 mV with 64 DQs and 24 mV with 16 DQs
- Jitter correction factor:
  - 25 ps with 64 DQs and 6 ps with 16 DQs











With PDN 64 Lines









#### Receiver

- 64 Micron "z80a v5p0.ibis" models
- Package models included

### **Solution Validation**

#### Xilinx® KCU 105 FPGA Platform Board

- Provides a hardware environment for developing and evaluating designs targeting the UltraScale<sup>™</sup> XCKU040-2FFVA1156E device
- Provides features common to many evaluation systems, including DDR4, HDMI, SFP+, PCIe, Ethernet PHY, etc.
- 9.27 x 5 inch, 16-layer PCB

#### DDR4 Memory

- 2 GB of Micron DDR4 component memory (four [256 Mb x 16] devices)
- 64 DQ lines between the FPGA and the DDR4 memory, with a single Power Delivery Network








### **MCF Extraction for KCU105 Board**

### MCF Extraction

- Pre-layout models used for the channel
- Transient simulation with and without the PDN on DQ lines with 1e<sup>6</sup> bits
- Significant increase in amplitude and jitter noise
  - Amplitude, jitter correction factor: 94 mV, 16 ps





| Amplitude MCF | Jitter MCF |
|---------------|------------|
| 94 mV         | 16 ps      |



## **Statistical Analysis – KCU105 Board**

### PCB EM Modeling

- Accurate EM models for the PCB, which include the channels (DQ, DQS, etc.) and the PDN
- Include only one I/O bank (16 bits) for faster EM model generation, assuming minimal crosstalk between I/O banks
- Vendor-supplied decoupling capacitor models

### DDR Bus Simulation (Statistical Approach)

Simulations at two BER levels, 1e<sup>-8</sup> and 1e<sup>-16</sup>:

- @ BER = 1e<sup>-8</sup>
  - Eye height = 374 mV, Eye width = 356 ps
- @ BER = 1e<sup>-16</sup>
  - Eye height = 367 mV, Eye width = 348 ps

#### DQ35 Eye Diagram



| measurement | @ 1E-16 BER | @ 1E-8 BER |
|-------------|-------------|------------|
| WidthAtBER  | 3.479E-10   | 3.563E-10  |
| HeightAtBER | 0.367       | 0.374      |



### **Measurement Setup**

### Measurement:

- Keysight DSAV334A Infiniium oscilloscope
- N6462A DDR4 Compliance Test Application
- Measured on DQ35 at the 2400 Mbps speed grade with 109 million bits, which is close to 1e<sup>-8</sup> BER









### **Measured Data**

Measurement at 109 million bits:

- Eye width: 339 ps
- Eye height: 271 mV

|            | Measurement Result<br>(@ 1E<sup>-8</sup> BER) |
|------------|-----------------------------------------------|
| Eye Width  | 339 ps                                        |
| Eye Height | 271 mV                                        |





### **Side-By-Side Comparison**

Statistical analysis vs. measurement on DQ35, no correction:

- Reasonable agreement
- Larger amplitude and jitter noise in the measured data due to the delta-I noise contribution





### **Corrected Mask – Still Within Spec!**



| Amplitude MCF | Jitter MCF |
|---------------|------------|
| 94 mV         | 16 ps      |

|            | DDR4 DQ Mask in JEDEC Spec | New DQ Mask After Correction Factor |
|------------|----------------------------|-------------------------------------|
| Eye Width  | 0.2 UI                     | 0.23 UI                             |
| Eye Height | 130 mV                     | 224 mV                              |
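The mask correction amounts to enlarging the JEDEC mask by the MCF values. The sketch below reproduces that arithmetic; the 2400 Mbps data rate used to convert picoseconds to UI is an assumption based on the speed grade of the measurement, and the variable names are illustrative.

```python
DATA_RATE_MBPS = 2400            # assumption: speed grade of the measurement
UI_PS = 1e6 / DATA_RATE_MBPS     # one unit interval in picoseconds

MASK_WIDTH_UI = 0.2              # mask width from the table above
MASK_HEIGHT_MV = 130             # mask height from the table above
AMPLITUDE_MCF_MV = 94
JITTER_MCF_PS = 16

corrected_height_mv = MASK_HEIGHT_MV + AMPLITUDE_MCF_MV
corrected_width_ps = MASK_WIDTH_UI * UI_PS + JITTER_MCF_PS
corrected_width_ui = corrected_width_ps / UI_PS

print(f"Corrected mask height: {corrected_height_mv} mV")
print(f"Corrected mask width : {corrected_width_ps:.1f} ps "
      f"({corrected_width_ui:.3f} UI)")
```

Under these assumptions the height matches the 224 mV in the table, and the width comes out at roughly 0.238 UI, which the table lists as 0.23 UI.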






# **Eye Height and Width with MCF Applied**

#### Excellent agreement:

- 2% eye width difference between simulation and measurement @ 1e<sup>-8</sup> BER
- 2.2% eye height difference between simulation and measurement @ 1e<sup>-8</sup> BER

|            | Sim @ 1E<sup>-16</sup> BER<br>w/o correction factor | Sim @ 1E<sup>-16</sup> BER<br>with correction factor | Sim @ 1E<sup>-8</sup> BER<br>w/o correction factor | Sim @ 1E<sup>-8</sup> BER<br>with correction factor | Measurement result<br>(@ 1E<sup>-8</sup> BER) | Sim/measure<br>difference |
|------------|------------------------------|------------------------------|------------------------------|------------------------------|-------------------------|---------------------------|
| Eye Width  | 323 ps                       | 307 ps                       | 348 ps                       | 332 ps                       | 339 ps                  | 2%                        |
| Eye Height | 360 mV                       | 266 mV                       | 371 mV                       | 277 mV                       | 271 mV                  | 2.2%                      |
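The corrected columns and the difference column can be reproduced from the uncorrected simulation results, the MCF values, and the measurement. The sketch below uses the numbers from the table; the variable and function names are illustrative.

```python
AMPLITUDE_MCF_MV = 94
JITTER_MCF_PS = 16

# Uncorrected statistical results and the measurement, from the table.
SIM_1E8 = {"width_ps": 348, "height_mv": 371}
SIM_1E16 = {"width_ps": 323, "height_mv": 360}
MEASURED_1E8 = {"width_ps": 339, "height_mv": 271}


def apply_mcf(eye):
    """Shrink a statistical eye by the jitter and amplitude correction factors."""
    return {"width_ps": eye["width_ps"] - JITTER_MCF_PS,
            "height_mv": eye["height_mv"] - AMPLITUDE_MCF_MV}


def percent_diff(sim, meas):
    """Absolute difference relative to the measured value, in percent."""
    return abs(sim - meas) / meas * 100.0


corrected_1e8 = apply_mcf(SIM_1E8)
corrected_1e16 = apply_mcf(SIM_1E16)
width_diff = percent_diff(corrected_1e8["width_ps"], MEASURED_1E8["width_ps"])
height_diff = percent_diff(corrected_1e8["height_mv"], MEASURED_1E8["height_mv"])

print(f"Corrected @ 1e-8 : {corrected_1e8}")
print(f"Corrected @ 1e-16: {corrected_1e16}")
print(f"Width diff {width_diff:.1f}%, height diff {height_diff:.1f}%")
```

The differences come out at slightly over 2% for width and about 2.2% for height, consistent with the rounded values in the last column.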




### Conclusion

- A statistical simulation approach must be used for DDR4 to reach the ultra-low BER of 1e<sup>-16</sup>.
- The statistical approach assumes the system is linear and time invariant, so the delta-I noise contribution to SSN is ignored.
- The proposed mask correction factor (MCF) improves the accuracy of DDR4 statistical simulation by compensating for the delta-I noise contribution.
- Simulated results with MCF agree well with the measured data.







### References

- [1] H. Shi, G. Liu, and A. Liu, "Analysis of FPGA simultaneous switching noise in three domains: time, frequency, and spectrum", Proc. DesignCon 2006, Feb. 2006.
- [2] James P. Libous and Daniel P. O'Connor, "Measurement, Modeling, and Simulation of Flip-Chip CMOS ASIC Simultaneous Switching Noise on a Multilayer Ceramic BGA", IEEE Trans. on Components, Packaging, and Manufacturing Technology, Part B, Vol. 20, No. 3, August 1997.
- [3] Penglin Niu, Fangyi Rao, Juan Wang, et al., "Ultrascale DDR4 De-emphasis and CTLE Feature Optimization with Statistical Engine for BER Specification", DesignCon 2015.
- [4] JEDEC DDR4 SDRAM Specification, JESD79-4A, November 2013.
- [5] Fangyi Rao, Vuk Borich, Henock Abebe, and Ming Yan, "Rigorous Modeling of Transmit Jitter for Accurate and Efficient Statistical Eye Simulation", DesignCon 2010.
- [6] Keysight, "A New Methodology for Next-Generation DDR4", Application Note.
- [7] Ai-Lee Kuan, "Making Your Most Accurate DDR4 Compliance Measurements", DesignCon 2013.
- [8] Larry Smith and H. Shi, "Design for Signal and Power Integrity", DesignCon 2007.





# Thank you!

### **QUESTIONS?**






