If we had to guess at the most-feared word in the CPU industry, "erratum" would be a top contender. When a major silicon bug makes itself known, like the Pentium FDIV bug and the AMD Phenom TLB problems of years past, the costs can be enormous. These bugs only cause problems bad enough to make their way into the public eye once in a blue moon, but today's is a doozy. The Register reports that a circuit degradation problem inside some Intel Atom C2000 embedded CPUs can render systems with those chips inside permanently unable to boot.
El Reg says that neither Intel nor any of the companies whose products appear to be affected by the issue would confirm or deny that C2000 Atom CPUs are the point of failure, but Synology tipped Intel's hand when it issued a statement to the site regarding some of its storage appliances. That statement confirmed that Synology products with C2000 chips inside were affected, but the company said it hadn't seen an increase in failure rates related to the parts. Synology subsequently requested that The Register remove all mentions of Intel and Intel products from its earlier statement.
Synology isn't the only company that used Intel C2000 products, however. Cisco recently issued an advisory to its customers noting that it "became aware of an issue related to a component manufactured by one supplier that affects some Cisco products. In some units, we have seen the clock signal component degrade over time." The company wouldn't name the supplier to The Register, but Intel's own erratum update (AVR54 in this document) states that "the SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning" in affected chips. Those clock signals are apparently related to critical functions performed on boot, and without them, a system is dead in the water. The Register used that info to put two and two together.
Cisco did provide a list of products it expects will experience increased failures related to the C2000 processors, and it says the issue could start rearing its head after the affected gear has been in operation for 18 months or so. The company will replace affected components so long as they're under warranty or a service contract. If you have some Cisco switches or ASA security appliances, be sure to check whether they're affected.
For its part, Intel told The Register that a "board-level workaround" could be applied to systems in production with the affected steppings of its products. The blue team will also be issuing a new stepping of the Atom C2000 series to resolve the issue at an SoC level. There's no telling what the cost of this issue will be to Intel or its suppliers once the pace of failures picks up, though.