For years, CPU manufacturers rode process shrinks (Moore's Law) together with Dennard scaling to push clock frequencies upward. That drove single-core speeds up for decades, until around 2005, when Intel hit a thermal wall near 4 GHz per core. Since then, single-core clock speeds have scaled far more slowly with each process shrink. That's a big reason why per-core performance from Sandy Bridge -> Ivy Bridge -> Haswell -> Broadwell increased only ~10% per generation.
Next they started increasing the number of cores per CPU. That helped for a while, but it's hitting diminishing returns too: most consumer-level software doesn't benefit beyond a core or two, taking advantage of multiple cores usually requires substantial rewrites of your code, and even then most ordinary programs won't scale well past 4-8 cores because of Amdahl's Law.
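To make the Amdahl's Law point concrete, here's a minimal sketch (the 90%-parallelizable figure is just an illustrative assumption, not from any particular benchmark): even a program where 90% of the work parallelizes perfectly gets less than a 5x speedup from 8 cores, and can never exceed 10x no matter how many cores you throw at it.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's Law: overall speedup when only part of a program parallelizes.

    parallel_fraction: fraction of the work that can run in parallel (0..1)
    cores: number of cores the parallel part is spread across
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical program that is 90% parallelizable:
for n in (2, 4, 8, 64):
    print(f"{n:2d} cores -> {amdahl_speedup(0.9, n):.2f}x")
# 8 cores yields only ~4.7x, and the speedup is capped at 10x
# (1 / 0.1) even with unlimited cores.
```

The serial 10% dominates quickly, which is why piling on cores stops paying off for most desktop workloads.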
So their next use of the die space freed up by process shrinks appears to be a) adding massive amounts of on-die cache and b) adding new instructions and offloading some computations to dedicated silicon (video compression/decompression, encryption, speech recognition, audio processing). Broadwell is rumored to have 128 MB of cache and an audio DSP. Imagine Commonlake+1 packing 1 GB of on-die Z-RAM cache and lots of dedicated silicon goodies.
But there's only so much cache you can use effectively before diminishing returns kick in again. So, assuming CPU manufacturers can keep pulling off process shrinks into the post-silicon era, how could they effectively use the extra die space to increase computation?