Case Study: The Broken Link
How JTAG and Boundary Scan Test the "Untestable" Board
1. Introduction: The "It-Works-Alone" Problem
Our DFT journey so far has been inside the chip. We've used Scan, MBIST, and At-Speed tests to gain 99.9% confidence that the *silicon die itself* is perfect. This "perfect" chip is then packaged (e.g., in a Ball Grid Array (BGA)) and soldered to a Printed Circuit Board (PCB) alongside other chips (like DRAM, a Wi-Fi module, etc.).
The system is assembled, powered on, and... it's dead.
We have a new, critical problem:
- The CPU chip is perfect.
- The DRAM chip is perfect.
- But the connection between them is broken.
A single solder ball under the BGA package didn't melt correctly, or a tiny trace on the PCB has a hairline crack. This is a board-level manufacturing defect. How do we find it?
2. The Design Under Test (DUT): A System-on-Board
Our DUT is no longer a single chip, but a *system*:
- Our Chip: A CPU with a 32-bit data bus.
- Another Chip: A DRAM controller.
- The Board: A PCB that connects them with 32 tiny traces.
The "defect" is a single "open" on the DATA[17] solder ball.
3. The "Old" (Failed) Solution: The Bed-of-Nails Tester
In the 1980s, you would test this with a "bed-of-nails" fixture. This was a giant press with thousands of tiny, spring-loaded pins ("pogos") that would physically touch test pads on the PCB.
- Problem 1: No Access. Our `DATA[17]` trace is on an *inner layer* of the PCB, and the pin is a *solder ball* hidden under the chip. There is no physical way for a "nail" to touch it.
- Problem 2: Density. Modern boards have thousands of connections. A bed-of-nails tester would be astronomically complex and expensive.
- Conclusion: For dense, BGA-based boards like this one, the method is physically impractical: the nets that matter most cannot be reached at all.
4. The DFT Solution: IEEE 1149.1 (JTAG) Boundary Scan
The solution, standardized as IEEE 1149.1, is to build the "bed of nails" *inside the chip itself*. This is Boundary Scan.
Here's what we add:
- The TAP Controller: This is the "brain." It's a small state machine stepped by TCK and steered by TMS. The interface has four mandatory pins: TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock), and TMS (Test Mode Select), plus an optional fifth, TRST (Test Reset). This interface is the Test Access Port (TAP), commonly known as JTAG.
- Boundary Scan Cells (BSC): We add one special flop-and-MUX "cell" *right next to* every single I/O pin of the chip.
- The Boundary Scan Register (BSR): We chain all these individual BSCs together into one long scan chain that runs from TDI to TDO.
This BSR chain creates a "virtual" wall that isolates the chip's *internal logic* from its *external pins*. We can now control the pins *directly* from the JTAG port, like a puppet master.
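To make the "small state machine" concrete, here is a minimal sketch of the TAP controller's next-state table in Python. The 16 state names and transitions follow IEEE 1149.1; the names `TAP_NEXT` and `walk` are illustrative choices for this sketch, not part of any tool or library.

```python
# Next-state table for the 16-state TAP controller (IEEE 1149.1).
# Each entry maps state -> (next state if TMS=0, next state if TMS=1).
# The state advances once per rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",    "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",      "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",      "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",      "Update-DR"),
    "Pause-DR":         ("Pause-DR",      "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",      "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",    "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",      "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",      "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",      "Update-IR"),
    "Pause-IR":         ("Pause-IR",      "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",      "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply one TMS value per TCK cycle and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, TMS = 0,1,0,0 lands in Shift-DR (ready to shift the BSR).
print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
# From idle, TMS = 1,1,0,0 lands in Shift-IR (ready to load an instruction).
print(walk("Run-Test/Idle", [1, 1, 0, 0]))
```

One useful property of this table is that holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is how a tester can synchronize with a chip whose state it does not know.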
5. The Test Procedure: `EXTEST` (External Test)
Here is how we find our broken DATA[17] solder ball.
- Chain the Board: The ATE (or board tester) connects to the JTAG ports of *all* JTAG-compliant chips on the board, daisy-chaining them: `ATE -> CPU-TDI -> CPU-TDO -> DRAM-TDI -> DRAM-TDO -> ATE`.
- Enter EXTEST: The tester uses the TMS/TCK pins, which are wired to every chip in parallel, to "talk" to all TAP controllers at once, issuing the `EXTEST` instruction.
- Isolate: This instruction configures all the BSC MUXes to *disconnect* the chips' internal logic from the pins. The CPU's core no longer drives anything off-chip; the pins are controlled *only* by the boundary scan flops.
- Scan-In: The tester shifts a test pattern (e.g., `...0101...`) into the *entire* board's chain. The bits for the CPU's *output* pins are loaded.
- Update: The tester "updates" the BSRs. The CPU's boundary scan cells now *drive* this `...0101...` pattern *out* of the CPU's pins and onto the PCB traces.
- Capture: The tester steps the TAP controllers through their capture state. The DRAM's boundary scan cells (which are in "input" mode) *capture* the values arriving on its pins.
- Scan-Out: The tester shifts the *entire* chain's contents out and reads the data that was captured by the DRAM.
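The scan-in, update, capture, and scan-out steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not a real tester flow: each chip's BSR is assumed to hold exactly one cell per data pin (real registers also contain control and other cells), the chain order is CPU then DRAM, and the names `WIDTH`, `shift`, `extest`, `open_nets`, and `float_value` are invented for this sketch.

```python
WIDTH = 32  # data bus width from the case study


def shift(chain, bits_in):
    """Shift bits into the chain (index 0 is nearest TDI).

    Returns the bits that fall out of the far end (TDO), in order.
    """
    out = []
    for b in bits_in:
        out.append(chain[-1])
        chain[:] = [b] + chain[:-1]
    return out


def extest(pattern, open_nets=frozenset({17}), float_value=0):
    """Drive `pattern` from the CPU cells and return what the DRAM cells capture.

    open_nets:   indices of broken DATA nets (assumption: DATA[17] is open).
    float_value: assumed level an undriven DRAM input happens to read.
    """
    chain = [0] * (2 * WIDTH)             # cells 0..31 = CPU, 32..63 = DRAM
    desired = list(pattern) + [0] * WIDTH

    # Scan-In: the first bit shifted in travels furthest, so feed it reversed.
    shift(chain, reversed(desired))

    # Update: the CPU cells now drive their values onto the board.
    driven = chain[:WIDTH]

    # The board: intact nets carry the value, open nets leave the input floating.
    received = [float_value if i in open_nets else driven[i] for i in range(WIDTH)]

    # Capture: the DRAM cells latch whatever is on their pins.
    chain[WIDTH:] = received

    # Scan-Out: shift everything out and restore chain order.
    captured = shift(chain, [0] * (2 * WIDTH))[::-1]
    return captured[WIDTH:]


pattern = [i % 2 for i in range(WIDTH)]   # DATA[17] is driven to 1
result = extest(pattern)
mismatches = [i for i in range(WIDTH) if result[i] != pattern[i]]
print(mismatches)
```

With the alternating pattern, DATA[17] is driven high while the floating input is assumed to read low, so the only mismatching bit is index 17. Running the same function with `open_nets=frozenset()` returns the pattern unchanged, which models a healthy board.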
The "Moment of Truth":
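- Expected (at DRAM): `...0101...` on its data bus.
- Actual (at DRAM): `...01Z1...`, where `Z` marks the undriven DATA[17] input. A capture flop can only store a 0 or a 1, so the floating pin reads as whichever level it drifts to. If it floats low, this one pattern can even come back as `...0101...` and look correct, which is why a tester typically also applies the inverted pattern: because of the "open" solder ball, DATA[17] cannot follow both.
- Diagnosis: The tester compares the expected and actual patterns and instantly pinpoints the fault: "Failure between CPU pin G12 (DATA[17]) and DRAM pin B4 (DATA[17])."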
The board can now be sent for precise repair (re-soldering or "reballing" the CPU).
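The comparison step can be sketched in a few lines of Python. This is an illustrative sketch only: the pin names for DATA[17] come from the case study, the `NETLIST` mapping stands in for a real board netlist, the captured values are hand-written example data rather than tester output, and `failing_bits` and `report` are hypothetical helper names.

```python
# Hypothetical netlist fragment: bit index -> (driver pin, receiver pin).
# Only DATA[17] is given in the case study; a real flow would load the
# full mapping from the board's design data.
NETLIST = {17: ("CPU pin G12", "DRAM pin B4")}


def failing_bits(runs):
    """Collect every bit index that mismatched in at least one pattern.

    runs: list of (expected_bits, captured_bits) pairs, one per pattern.
    """
    bad = set()
    for expected, captured in runs:
        bad |= {i for i, (e, c) in enumerate(zip(expected, captured)) if e != c}
    return bad


def report(bits):
    """Turn failing bit indices into human-readable diagnosis lines."""
    lines = []
    for i in sorted(bits):
        src, dst = NETLIST.get(i, ("CPU pin ?", "DRAM pin ?"))
        lines.append(f"Failure between {src} (DATA[{i}]) and {dst} (DATA[{i}])")
    return lines


# Example data: an alternating pattern and its inverse on a 32-bit bus.
pattern_a = [i % 2 for i in range(32)]       # DATA[17] expected 1
pattern_b = [1 - b for b in pattern_a]       # DATA[17] expected 0

# Assume the open input floats low: it reads 0 for both patterns.
captured_a = list(pattern_a); captured_a[17] = 0   # mismatch
captured_b = list(pattern_b); captured_b[17] = 0   # matches by coincidence

for line in report(failing_bits([(pattern_a, captured_a), (pattern_b, captured_b)])):
    print(line)
```

Taking the union across patterns is the point of the design: the inverted pattern alone would pass in this example, so a single-pattern test could miss the open entirely.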
6. Final Analysis: The Payoff
Of the strategies compared below, Boundary Scan is the *only* one that finds this board-level fault without physical access to the net.
| Test Strategy | Find Internal Faults? | Find Board-Level Faults? | Required Access | Cost/Complexity |
|---|---|---|---|---|
| No DFT | No | No | N/A | Total Failure |
| Bed of Nails | No | Yes (on old chips) | Needs 100% physical access | Astronomically High |
| Internal Scan | Yes | No | 4-pin JTAG port | Low |
| Boundary Scan | Yes (via INTEST) | Yes (via EXTEST) | 4-pin JTAG port | Low |
The Payoff is Diagnostics:
- Without Boundary Scan, a failing board is a "black box." We have no idea why it's failing. The entire $1000 board is thrown in the trash.
- With Boundary Scan, we get a "Google Maps" for the failure, pointing to the exact broken trace or solder joint. The $1000 board is repaired for $5.
- Bonus: The JTAG TAP is the "front door" we use to access *all other* DFT (Internal Scan, MBIST, OCCs, etc.).
7. Post-Silicon Validation: The Moment of Truth
The first chip arrives. How do we test our *Boundary Scan* logic?
- Check the TAP: The first thing an ATE does is try to talk to the TAP controller. It issues a command to read the chip's IDCODE. If the chip responds with its correct 32-bit ID (e.g., `0xDEADBEEF`), we know the JTAG pins and the core TAP logic are working.
- Run `BYPASS`: We issue the `BYPASS` instruction. This shrinks the chip's scan chain to a *single flop*. We scan 1 bit in and check that it comes out 1 clock later. This confirms the bypass path.
- Test the BSR: We run a test that scans a `0101...` pattern into the full Boundary Scan Register (BSR) and scans it right back out, *without* capturing or updating. This confirms the BSR chain *itself* isn't broken.
- Run `INTEST`: We can use `INTEST` (Internal Test) to use the BSR to drive values *into* our own chip's logic and capture the result. This confirms the BSR's connection to the internal logic.
Only after all this is confirmed can we trust EXTEST to find board-level faults.
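The first two checks lend themselves to a short sketch. The IDCODE field layout used here (4-bit version, 16-bit part number, 11-bit manufacturer ID, and a least-significant bit fixed at 1) follows IEEE 1149.1; the function names `decode_idcode` and `bypass_delay` are made up for this example, and `0xDEADBEEF` is the placeholder value from the text, not a real device ID.

```python
def decode_idcode(idcode):
    """Split a 32-bit IDCODE into its IEEE 1149.1 fields."""
    return {
        "version":      (idcode >> 28) & 0xF,     # bits 31..28
        "part_number":  (idcode >> 12) & 0xFFFF,  # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF,    # bits 11..1
        "marker":       idcode & 0x1,             # bit 0, always 1 for IDCODE
    }


def bypass_delay(num_devices, bits_in):
    """Model a chain of devices in BYPASS: each adds one flop of delay.

    Returns the bit stream observed at the final TDO. The bypass flops
    start at 0, the value they load in the capture state.
    """
    regs = [0] * num_devices
    out = []
    for b in bits_in:
        out.append(regs[-1])
        regs = [b] + regs[:-1]
    return out


fields = decode_idcode(0xDEADBEEF)
print({name: hex(value) for name, value in fields.items()})

# One device in BYPASS: the 1 shifted in appears one clock later.
print(bypass_delay(1, [1, 0, 0]))
# Two devices in BYPASS: the same 1 appears two clocks later.
print(bypass_delay(2, [1, 0, 0, 0]))
```

Counting the clocks until a marker bit reappears is also a common way to check how many devices are actually in the chain, since each bypassed device contributes exactly one bit of delay.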
8. Conclusion
This case study shows that DFT isn't just about testing the *inside* of the chip; it's about testing the chip's *place in the world*.
- Why Boundary Scan? To test the connections *between* chips on a board, which are physically impossible to probe.
- How? By creating a "virtual," scannable I/O ring around the chip's logic, controlled by the JTAG TAP.
- The Result: We can find and diagnose board-level "open" and "short" defects with pinpoint accuracy, saving millions in manufacturing and repair costs.