Tune in for our Skylake live stream tonight with David Kanter

We're having a special episode of the TR Podcast live stream tonight at 6:30 PM PT/9:30 PM ET. David Kanter will be joining us to talk about this week's Skylake news—and especially about the CPU architecture changes in Intel's new processor.

If you have questions for us to address on the show, you can join us live on our Twitch channel or drop them into the comments below.

I'm sure David will have some interesting insights for us, so tune in if you can. These conversations with him are always a highlight. Can't wait!

Comments closed
    • camelNotation
    • 4 years ago

    I have a question about the eDRAM implementation:

    Are there possible disadvantages with Skylake’s memory-side cache implementation, vs. the L4 for Haswell/Broadwell? For example, would the tags have to consume 2MB of SRAM on the eDRAM controller?
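
    For scale, a rough back-of-the-envelope, assuming Skylake keeps Crystalwell-like parameters (128 MB of eDRAM, 64-byte lines) and roughly a byte of tag-plus-state per line; all of these figures are assumptions rather than confirmed specs:

        // Back-of-the-envelope for memory-side-cache tag storage.
        // All parameters are assumptions, not confirmed Skylake specs.
        #include <cstdio>

        int main() {
            const long long edram_bytes = 128LL * 1024 * 1024;  // 128 MB of eDRAM (assumed)
            const long long line_bytes  = 64;                   // 64 B cache lines
            const long long tag_bits    = 8;                    // ~1 B of tag + state per line (guess)

            const long long lines          = edram_bytes / line_bytes;  // ~2M lines
            const long long tag_sram_bytes = lines * tag_bits / 8;      // ~2 MB of tag SRAM

            std::printf("%lld lines, ~%lld KB of tag SRAM\n", lines, tag_sram_bytes / 1024);
            return 0;
        }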

    • chuckula
    • 4 years ago

    Dear Doctor K!

    Diving into the deeper minutiae of Skylake, we see some pretty significant improvements to the front end and to OoOE resources like the internal register file and the number of in-flight instructions. On top of that, they've doubled the inter-core bandwidth over the ring bus in high-performance mode. However, the improvements in most workloads haven't been that impressive, with the exception of workloads that make heavy use of AVX/AVX2.

    So here’s the two-part question:
    1. Do you think that most of these improvements in Skylake were really meant to support AVX-512 even though Intel didn’t actually include (or activate) it in the production Skylakes for whatever reason?

    2. Even though Skylake actually has some pretty impressive micro-optimizations, there really wasn't anything in the CPU that would be considered a macro-optimization, like fundamentally different execution units. Do you foresee anything in the next 3-5 years that could be a big departure from the basic microarchitecture that's been in place since Sandy Bridge?

      • Milo Burke
      • 4 years ago

      I can't wait to hear Scott read "OoOE" out loud. And he knows not using the acronym would be condescending.

    • Milo Burke
    • 4 years ago

    Windows 10 is quite perky on my laptop’s Silvermont > Bay Trail Z3735F processor. It’s a 2.2 watt, quad-core, 1.3 GHz little wonder with a list price of $17. It allows for a passively cooled laptop, and it’s perfectly sufficient for that non-techy friend.

    I haven't heard much about Airmont > Cherry Trail, released earlier this year. Did it not score as many design wins? Or is Intel less interested than it used to be in creating products like this?

    Lastly, what new goodness can we expect from Goldmont > Willow Trail?

    • SeaPUMonkey
    • 4 years ago

    According to Intel's Turbo Boost Technology Frequency Table, the i7-6700K only boosts to 4.2 GHz when one core is active. The i7-4790K boosts to 4.2 GHz even when all four cores are active. Why is Skylake clocked 5%-7.5% lower than Haswell when it overclocks better and has a higher TDP on a smaller process? Could a 14nm process that's optimized for lower power and frequency be the culprit? And if so, will Intel develop a high-performance/high-power 14nm process for Kaby Lake, making it a kind of tick?

    • CScottG
    • 4 years ago

    Yes, a question – and one I’d appreciate being answered here at some point after the live stream.

    Question:

    Are there any plans to update the Atom C2750/C2550 any time soon (within 6 months), and/or

    Is there a near equivalent having these same features:

    1. low cost
    2. multiple real cores (8 or 4)
    3. very low power (20 or 14 watts or less, hopefully less)
    4. capable of 32 GB+ of ECC memory (preferably registered)

    I ask this because I'm considering a near-term purchase of an Atom C2550 board as an always-on personal "net" appliance (VMs assigned different tasks, mostly browsing-related). I want all four features; I'm not willing to give up any of them.

    Of course I’d like to see lower power, registered memory, and some added extras relating to virtual machine hardware access and added security.

    (As a reference, the board I'm looking at is the Supermicro A1SAM-2550F. While I'd like the eight-core version, I'll take this lower-power, lower-priced alternative; four cores should be enough for my purpose.)

      • zzing123
      • 4 years ago

      Not knocking the C2750/2550/2530 in any way, as they're damned good chips for what they are, but for not a lot more money you can get a much, much more potent platform in Xeon-D (see: http://www.servethehome.com/intel-xeon-d-1540-power-performance-preview/) via the Supermicro X10SDV-TLN4F board, if you're looking at that sort of thing.

      That said, I've not seen any talk of a Skylake version of Xeon-D. With DDR4, it'd be theoretically possible to get 256GB of RAM on mini-ITX, which, compared to the Broadwell Xeon-D's already ridiculous 128GB, would make it an even more amazing platform. Any news on the wire for a Skylake Xeon-D?

        • CScottG
        • 4 years ago

        While it's a beautiful bit of kit, that board + CPU is about $900 and is rated at 45 watts.

        The C2550 board + CPU is about $260 and is rated at 14 watts.

        So the two aren't really comparable. (Bummer.)

    • zzing123
    • 4 years ago

    Sorry for the fairly esoteric question related to software development as opposed to normal Tech Report hardware stuff.

    With Skylake, we finally have working transactional memory (TSX) extensions and also a new set of extensions for memory bounds checks (MPX). TSX is a favourite area of DK's, but as a managed-language (Java/C#) developer, MPX looks to me perfectly matched to one of the major performance issues of these languages: memory bounds checking.

    From a gaming perspective it's relevant mostly because most game engines use an interpreted scripting engine like Lua for a lot of their AI, which could take advantage of MPX if it's engineered as intelligently as the JavaScript runtimes in most browsers. Unity, however, is a major engine based on C# that would be dramatically boosted by hardware-accelerated bounds checking in the CLR.

    The question I have, therefore, is: since Intel already provides C/C++ and Fortran compilers and related tools, why don't they contribute modifications to the JVM and CLR to promote adoption of these extensions? And would it be possible to drop a line to Intel and ask why they don't?

    Developers of a managed language target the common framework, not the CPU. It's the distribution (e.g. the platform-specific JVM/CLR) that exploits the CPU, so Intel need only support a few frameworks to reach a much wider range of codebases.
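
    To make the bounds-checking overhead concrete, here is a minimal C++ sketch (purely illustrative, not actual JIT output) of the compare-and-branch a managed runtime effectively wraps around every array access today, which is the kind of check MPX-style hardware bounds registers are meant to take over:

        #include <cstddef>
        #include <stdexcept>

        // The explicit software bounds check a JIT effectively inserts before an
        // array load. MPX's pitch is to hold the bounds in dedicated registers
        // and let BNDCL/BNDCU-style checks run alongside the access.
        int checked_load(const int* data, std::size_t length, std::size_t index) {
            if (index >= length)                  // the check hardware could accelerate
                throw std::out_of_range("index out of range");
            return data[index];                   // the actual load
        }

        int main() {
            int values[4] = {10, 20, 30, 40};
            return checked_load(values, 4, 2);    // returns 30
        }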

    • Ninjitsu
    • 4 years ago

    Skylake has a 4-way set-associative L2 cache, instead of the usual 8-way. Bandwidth has apparently been boosted to compensate. Anything worth worrying about?
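
    For context, assuming the client parts keep a 256 KB L2 with 64-byte lines (an assumption on my part), halving the associativity doubles the set count, so one more address bit goes into the set index. A quick sketch of the arithmetic:

        #include <cmath>
        #include <cstdio>

        // Assumed client-class L2 parameters: 256 KB capacity, 64 B lines.
        int main() {
            const int cache_bytes   = 256 * 1024;
            const int line_bytes    = 64;
            const int way_options[] = {8, 4};     // Haswell-style vs. Skylake-style

            for (int ways : way_options) {
                const int lines      = cache_bytes / line_bytes;           // 4096 lines either way
                const int sets       = lines / ways;                       // 512 vs. 1024 sets
                const int index_bits = static_cast<int>(std::log2(sets));  // 9 vs. 10 index bits
                std::printf("%d-way: %d sets, %d index bits\n", ways, sets, index_bits);
            }
            return 0;
        }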

      • SeaPUMonkey
      • 4 years ago

      And is there any chance that the Xeon Skylake gets a 512 KB, 8-way set-associative L2 cache, maybe for AVX-512?

    • jts888
    • 4 years ago

    It looks like Skylake is the only GPU/APU to fully support every tier of every DX12 feature, including the optional extensions.

    Could you guys discuss the performance and quality implications of some of the features:
    1. Rasterizer-ordered views
    2. Conservative rasterization
    3. Stencil ref values from shaders
    4. Resource binding Tier 2 vs. Tier 3
    5. Tiled 3D textures
    6. Low-precision (16-bit) floats

    The only optional feature that seems both straightforward and useful to me offhand is the GPU depth sorting, but it's not apparent to me how much faster this is than doing it on the host CPU.

    I know that you're probably bringing in David to talk more about Skylake, but I always enjoy hearing his thoughts on more general architectural issues as well. Also, what's the deal with Intel's tick-tock failure and the recent announcement of Kaby Lake? Gossip about Intel's 10nm struggles and about post-14nm lithography in general would be appreciated.

    • techguy
    • 4 years ago

    Can we turn this into a movement? https://techreport.com/forums/viewtopic.php?f=2&t=116298 I asked during the last live stream featuring DK how likely Intel is to offer such a product; now the question is: can we force their hand with enough feedback from end users and the tech press?

      • Ninjitsu
      • 4 years ago

      When Intel gave us Crystalwell, Anand and some others were pestering Intel to do this. Intel delivered, everyone on TR screamed MEH. Then Scott published his review, everyone screamed HOLY GRAIL. And now Scott and the rest of us want Intel to do this again.

      And then we'll all scream MEH again, 'cause Intel.

      It’s hilarious. But yeah, I hope they give us Skylake-C eventually.

      • wingless
      • 4 years ago

      Will the L4 cache provide the same level of benefits in DX12 as it did in DX11?

      • klagermkii
      • 4 years ago

      There are no quad-core Skylake laptop chips with eDRAM either (i.e., Iris 540 or better), so I don't think this is a purposeful desktop snub.

      It's quite possible that when those chips are released in early 2016, we'll see an announcement of a corresponding desktop chip.

    • jts888
    • 4 years ago

    I have a few questions about current (Maxwell/GCN) and upcoming (Pascal/Arctic Islands) GPU architectures.

    Firstly, in light of the recent Ashes of the Singularity DX12 benchmark drama, could any of you clarify the status of Maxwell being able to do compute shaders asynchronously with more general graphics computation, and whether this functionality is even explicitly part of DX12 or just an implementation optimization?

    Second, is there any strong information currently available suggesting that Pascal/A.I.s will be architecturally much different from their predecessors, other than being 16/14nm FinFET and supporting HBM 2.0? Maxwell might have asynchronous-compute capacity left on the table, and GCN appears to use larger ROP pixel blocks than Nvidia (4×4 vs. 4×2), which causes it to suffer disproportionate pixel-throughput loss on very highly tessellated scenes, in addition to its currently missing DX 12_1 optional features. (Also, if DP 1.3 doesn't make it into both chips I'll be seriously disappointed, but AMD didn't even add HDMI 2.0 to the R9 300/Fury series.)

    Finally, how long should we expect GPUs to be stuck on 14/16nm? TSMC's 28nm node will have been around for four full years by the time Pascal/Arctic Islands roll out, and everything I hear about sub-10nm fabrication, with quad (or more) patterning, water-cooled dielectric mirrors, and inefficient EUV light generation, just sounds extremely cost-ineffective.

      • Freon
      • 4 years ago

      I'm also interested to hear opinions on the whole Ashes DX12 situation. There seem to be some interesting behaviors depending on how the work is parallelized.

        • jts888
        • 4 years ago

        The layman's theory is that ALU-only ("compute") shaders can run concurrently with shaders that use the fixed-function units (tessellator, triangle setup, ROP/TMU) on GCN when the ALUs would otherwise be idle, and that while Maxwell can juggle ~32 queues of compute-shader work, it can't process them simultaneously with the primary graphics instruction flow.

        According to the AotS devs, Nvidia’s DX12 driver accepts requests for this sort of overlapped computation, but performance drops heavily enough when doing so that they believe that Maxwell is doing some sort of expensive context switch going between “compute” and “graphics” modes.

        I would like to hear David’s speculation on this, but it doesn’t seem like something Nvidia would make public if it really is a shortcoming in Maxwell’s design.
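
        For reference, this is roughly what "async compute" looks like from the API side: DX12 exposes a separate compute queue type alongside the direct (graphics) queue, and whether work submitted to the two actually overlaps on the GPU is left to the hardware and driver. A minimal, untested C++ sketch of the queue setup (Windows-only; links against d3d12.lib):

            #include <windows.h>
            #include <d3d12.h>
            #include <wrl/client.h>
            using Microsoft::WRL::ComPtr;

            // DX12 only exposes the queues; overlapping their work (as GCN appears
            // to do) versus serializing it is up to the hardware and driver.
            int main() {
                ComPtr<ID3D12Device> device;
                if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                             IID_PPV_ARGS(&device))))
                    return 1;

                D3D12_COMMAND_QUEUE_DESC graphicsDesc = {};
                graphicsDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy

                D3D12_COMMAND_QUEUE_DESC computeDesc = {};
                computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute + copy only

                ComPtr<ID3D12CommandQueue> graphicsQueue, computeQueue;
                device->CreateCommandQueue(&graphicsDesc, IID_PPV_ARGS(&graphicsQueue));
                device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

                // Command lists are recorded and submitted per queue; ID3D12Fence
                // objects coordinate any cross-queue dependencies.
                return 0;
            }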

      • Pwnstar
      • 4 years ago

      I can answer the Pascal question now: there isn’t any info like that. Maybe David knows non-public stuff?

      • TheFinalNode
      • 4 years ago

      I'd also like to hear Scott and David's outlook on the 14/16nm processes. When AnandTech announced in April 2013 that Raja Koduri was back at AMD (with the headline subtitled "The King is Back"), it was made to seem that Raja would be heading big changes to upcoming GPUs and how they're developed, as if he might be leading a new architecture, much as Jim Keller was when he came back to the company in August 2012.

      Assuming that Arctic Islands is GCN 1.4, a power-efficiency-focused FinFET adaptation of the GCN architecture, could 2017 be the year a successor architecture to GCN is released? I know mobile SoC designers can release a new architecture on a new process node without as much difficulty, because the main cores take up only a small fraction of the die area, but will AMD and Nvidia move to a sort of tick-tock cycle, where we see a new manufacturing process, then a new architecture on that more mature process, refinements in the third year, and then repeat?
