I’ll try to do a proper write-up later, but I thought some of you might like to see my raw notes from the ATI Stream computing event in San Francisco today. Interesting stuff.
ATI Stream Computing event
CEO of ATI, Dave Orton
CEO – That title will last another three and a half weeks or so
Fairly unique launch to a direction we’re taking as a company, industry
Targeted to support top commercial, … partners to solve world-class problems
Lots of computational power in the graphics processor. How do you leverage that and find a class of problems that can map to that?
Why “stream?” Because it’s about data flow. This is what differentiates stream computing from other types of computation (like CPU). Can achieve the same kinds of advantages you find for graphics on the GPU.
-Climate research – up to 20X the speed of CPUs
-Homeland security – communication analysis, facial recognition, video/audio recognition
-Risk assessment and derivatives pricing
-Seismic modeling and analysis – oil and gas – flows of the data work extremely well with stream computing – ~10X overall improvement over CPU
-Can couple data and visual processing here
-ASCI type platforms to simulate first nanosecond of a nuclear explosion
-Search – very large databases, lends itself very well to stream computing
-Consumer applications – photo searches
-Opportunity to do physics processing in video games, automotive design, codecs
Orders of magnitude faster than CPU, but not trying to compete with CPU. We need CPUs, support each other in a very complementary way. Will address a whole new class of problems with Stream computing. CPU and GPU will work symbiotically to attack these kinds of problems.
ATI-AMD has great opportunity to create a platform solution as well. Over time, you will see this evolve. How do we drive the ecosystem of hardware, software, and overall solutions?
Why here talking about this instead of five years ago? These problems are becoming more relevant to a broader audience. Research has been done. GPU has evolved to being multiple processors. 48 today, general FP engines. Direction now is to move to common elements for geometry, raster processing. Today, it’s 48, future, maybe it’s 96.
CPUs are now talking about two, four cores.
We have 375 gigaflops, 64 GB/s. We’re about a third of a teraflop. Next generation will exceed half a teraflop on a single GPU.
This class of problems scales very well for parallel processors.
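To make the "data flow" idea concrete, here is a minimal Python sketch of a stream kernel. It is only an illustration, not ATI's programming model: the names `saxpy_kernel` and `run_stream` are made up for this example, and a plain loop stands in for the GPU's parallel processors. The point is that each output element depends only on its own inputs, so the stream can be split across any number of processors without changing the result.

```python
# Illustrative sketch only: hypothetical names, CPU loop standing in for a GPU.

def saxpy_kernel(a, x, y):
    """Per-element kernel: reads one x and one y, produces one output."""
    return a * x + y


def run_stream(kernel, a, xs, ys, num_processors=1):
    """Apply the kernel to every element, split into num_processors chunks.

    Each chunk is independent, so on real hardware the chunks could run
    at the same time. Here they simply run one after another.
    """
    n = len(xs)
    out = [0.0] * n
    chunk = (n + num_processors - 1) // num_processors  # ceiling division
    for p in range(num_processors):
        start = p * chunk
        end = min(start + chunk, n)
        for i in range(start, end):
            out[i] = kernel(a, xs[i], ys[i])
    return out


xs = [float(i) for i in range(10)]
ys = [1.0] * 10
one = run_stream(saxpy_kernel, 2.0, xs, ys, num_processors=1)
many = run_stream(saxpy_kernel, 2.0, xs, ys, num_processors=48)
print(one == many)  # True: the split does not affect the answer
```

The 48 here just echoes the processor count Orton quoted. Any count gives the same output, which is why this class of problem scales as processors are added.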
Vijay S. Pande – Assoc. Prof. of Chemistry, Director of Folding@Home project, Stanford U.
Michael Mullany – VP of Marketing, PeakStream, Inc.
Chas Boyd – Architect, Graphics Platform Unit, Microsoft Corp.
Jeff Yates – VP Product Management, Havok
Surprise at the end
We are working on diseases that involve mis-folding of proteins.
Healthy brain vs. Alzheimer’s brain – Amyloid plaque, toxic agent
Toxic element is a protein that is mis-assembling
We understand what the toxic elements are, but don’t know why they’re misfolding or how to stop them
Further experiments are very, very challenging
Turned to computer simulation, distributed computing – Folding@Home
Currently have power of 200K CPU supercomputer – more powerful than all NSF supercomputers combined
People fold wherever there’s electricity (map)
People go to huge extents to compete
Would be really exciting if we could get PCs to fold faster
We turned to GPGPU
FaH on X1900 XT – 20-40X speed increase – What used to take FaH 30 years can now be done in one year
New FaH client for GPUs – can get ~100 gigaflops per processor
Oct 2: Release beta version, other versions to follow
If just 5% of folders join when we turn it on, we’ll have a petaflop machine by the end of the day
Demo – CPU vs. GPU – CPU is almost static, GPU shows constant motion happening
GPU temp is still relatively cool as far as GPUs go, power consumption is low – about 80W per GPU
New results: announcements of the first FAH results on Alzheimer’s in the coming months
GPUs will give us new possibilities
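The petaflop claim is easy to sanity-check from the figures in the talk. The sketch below assumes roughly 200,000 active folders (my reading of the "200K CPU" figure; the actual participant count wasn't stated) and that each folder who joins contributes one GPU at about 100 gigaflops.

```python
# Back-of-the-envelope check; the folder count is an assumption.
ASSUMED_FOLDERS = 200_000   # read from the "200K CPU" figure
JOIN_PERCENT = 5            # "just 5% of folders"
GFLOPS_PER_GPU = 100        # "~100 gigaflops per processor"

gpus = ASSUMED_FOLDERS * JOIN_PERCENT // 100
total_gflops = gpus * GFLOPS_PER_GPU
total_pflops = total_gflops / 1_000_000  # 1 petaflop = 1,000,000 gigaflops

print(f"{gpus} GPUs -> {total_pflops:.1f} petaflops")
```

Under those assumptions, 10,000 GPUs at 100 gigaflops each comes to one petaflop, which matches the number quoted.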
Michael Mullany, PeakStream
New company, just launched last week
Have built a software app for stream computing
HPC is a huge market, lots of use in high-margin businesses
Autos, aerospace, oil & gas
These customers care about performance, appetite for new capacity is insatiable – Chemists want a zettaflop machine
Machine has to have good perf, good perf/watt, good perf/sq. foot
Why is stream computing exciting? It’s all about FLOPS.
GPUs have way more FLOPS than CPUs, growing faster.
Making that power accessible to people is a barrier.
HPC guys like their existing tools, libraries, etc.
What’s missing is an application platform that makes stream computing accessible. We provide a platform.
Working with Hess – largest independent US oil and gas producer, $23bln revenue last year.
HPC is used to find oil and how to extract it
Let off controlled explosions at surface, capture echoes and analyze to determine rock layers, etc.
Demo of acoustic wave explosions working on CPU and on GPU.
GPU about 15X as fast.
Working with Hess on a number of algorithms, including this one.
Can take on new challenges and levels of resolutions
Standard dev tools – gcc, Intel compiler
PeakStream VM hides the hardware specifics
Smaller hardware footprint and power footprint
Financial – Derivatives
Some options are very difficult to determine value – have to simulate
Targeting oil & gas, financial services, defense, and academia
Defense – working on signal processing in a mobile application. They were working with a GPU, but we came in and delivered a 5X improvement.
Academia – working on computational fluid dynamics.
All about making stream computing power accessible.
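For a sense of why derivatives pricing suits this kind of hardware, here is a rough Python sketch of valuing an option by simulation. It is not PeakStream's code or API. The function names, the geometric Brownian motion price model, and all parameter values are my assumptions for illustration. A plain European call is used because it has a closed-form answer to check against; the hard-to-value options mentioned in the talk would be ones without such a formula.

```python
import math
import random


def monte_carlo_call(s0, strike, rate, vol, years, paths, seed=0):
    """Estimate a European call price by simulating terminal prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * years
    spread = vol * math.sqrt(years)
    total = 0.0
    for _ in range(paths):
        # Each path is independent of every other path.
        terminal = s0 * math.exp(drift + spread * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    # Average payoff, discounted back to today.
    return math.exp(-rate * years) * total / paths


def black_scholes_call(s0, strike, rate, vol, years):
    """Closed-form price, used here only to check the simulation."""
    root_t = vol * math.sqrt(years)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol * vol) * years) / root_t
    d2 = d1 - root_t

    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return s0 * cdf(d1) - strike * math.exp(-rate * years) * cdf(d2)


estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=20_000)
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"simulated {estimate:.2f}, closed form {exact:.2f}")
```

Because no path depends on any other, the loop body is the sort of work that could be spread across many stream processors, with only the final sum needing to be combined.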
Chas Boyd, Microsoft
Worked on DirectX in past, now I’m doing GPGPU.
Not only is compute power a constraint, but also memory bandwidth, and GPU has much more.
Identified this as a strategic area, working with all of our partners on this.
Huge ecosystem – middleware, developers, OEMs, processor vendors. We try to act as a go-between to improve perf for everyone.
Incorporating GPU concepts into the OS itself. Fairly major restructuring of OS stack and organizational structure.
Two new ones – image editing and user interface itself.
Real-time image editing using GPU. No waiting half a second to see if change works. Making multi-gigapixel processing work at full speed requires GPU.
Entire Windows UI – Aero.
This is now a MS OS running to a large extent on a GPU, number of parts of MS working on GPU is growing exponentially.
DirectX – physics, particles, fluids, collisions, being successfully mapped to GPU using DirectX APIs. Over time, we’re evolving API to facilitate, from DX10 and forward.
Not only non-graphical apps are being accelerated, but also graphical problems by using GPU in a non-graphical way. Depth-of-field and focus blur effects in real time. Differential equations and heat diffusion run at higher performance.
Can run prefix sort, using GPU as a general-purpose processor, can solve a problem that came from graphics and put it back over into graphics with a net win.
Enabling developers to think of problems two ways: graphical problem or stream computing problem. Can get better solutions with flexibility.
MS recognizes GPU work is strategic.
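As a small illustration of why something like heat diffusion maps well to a GPU, here is a Python sketch of one explicit finite-difference step in one dimension. This is not anything Microsoft showed. The function name, the wrap-around boundaries, and the constants are my own assumptions to keep the example short.

```python
def diffuse_step(u, alpha=0.25):
    """One explicit finite-difference step of 1D heat diffusion.

    Periodic boundaries (the ends wrap around) keep the sketch short.
    Every output cell reads only the previous array, so all cells could
    be updated at once on parallel hardware.
    """
    n = len(u)
    return [
        u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
        for i in range(n)
    ]


# Start with a single hot spot and let it spread.
temps = [0.0] * 16
temps[8] = 100.0
initial_total = sum(temps)

for _ in range(50):
    temps = diffuse_step(temps)

print(f"peak {max(temps):.2f}, total {sum(temps):.2f}")
```

The hot spot flattens out while the total heat stays the same. Each cell's new value comes from a fixed neighborhood of old values, which is the same access pattern as an image filter, so it fits the "graphical problem or stream computing problem" framing above.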
Jeff Yates, Havok
Physics is central to interaction in games
Now looking to unbridled action
Physics is increasingly at the center of game computation
Games are needing to rebalance how they’ve used computational power
Physics today with one CPU – “stacks of objects”
Need more power to reach total immersion
Stream computing allows us to do that
Stream processing + physics = 10X jump
-Built specifically for stream computing
-Physics on a massive scale
-10s to 1000s of objects
Why use the GPU? (slide)
-GPU is a next-gen platform
-Huge installed base
Boulder demo shown at Computex
Rigid body objects
Pushing forward, with new hardware even better
Moving now into a domain where we can actually start engaging gameplay on the GPU
Demo of gameplay physics – Brick War
Castle – Lego-like bricks
13,500 objects, full rigid-body dynamics
Goal of game is to knock down other team’s men using cannon
Playing game of Brick War
Collision detection drives rigid-body dynamics and sound
Crumble mode – all objects fall at once
Rendering is done on one CPU, physics is done on a second
Product going into beta, into hands of developers and in game very soon
Game devs will need to reapproach how they think of physics, previous limits will be gone
Just some examples of where we are today and a sense of where it can go in the future.
Not a product launch today. ATI wanted to recognize there’s a vision and an opportunity here. Talking about our commitment to this segment.
Bob Drebin and Raja are here, ask them what we are doing at the core of our GPUs. We recognize we need to address other areas from a low-level hardware and software perspective.
Working on marketing stream computing.
Peddie: Do you envision the shaders in your GPUs becoming full IEEE 32-bit compliant?
Orton: Won’t reveal our roadmap, but we recognize the need.
Chas: We’re all interested in seeing more convergence here, not going to make sense if data isn’t in the same format.
Bob: Each generation is moving closer to what IEEE standard has.
Aaron R from InfoWeek: For Chas, you showed Aero and image editing. Can you talk about other stream/GPU features that would come out post-Vista release? The type of features MS adds later?
Chas: DX10 is coming. Most of Vista built on DX9, but there’s more opportunity with DX10 going forward. Working with partners. Maybe that’s publishing a paper for GPGPU with DX10, other things are hooks into the OS.
Aaron: When does DX10 ship?
Chas: Will ship with Vista.
IDC guy: Curious where we are in coordinating operations of CPU and GPU.
Mullany: We have simplified that greatly with smart runtime. We try to bring it back to CPU as rarely as possible.
Orton: Your profiler checks that?
Mullany: Yes. Makes sure you’re not thrashing data between GPU and CPU.
Chas: One of the goals with DX10 was to polish switching between CPU and GPU, make that as easy as possible.
Wil: What are advantages of GPU over physics add-in card?
Jeff: Same piece of hardware can do multiple things with GPU, there’s a challenge for a piece of hardware that can do one thing.
Charlie: Add-in card has a set level of power, whereas GPUs vary widely. That’s a problem?
Jeff: Yes, there will need to be a target, in fact lower bar, that game developers will need to establish. Sensitively tuned for core gameplay especially. I think developers will get that done very soon.
PC Pro Germany: How unique is stream computing for ATI? For MS, will this be possible with other graphics cards?
Chas: We try to make interfaces as agnostic as possible.
PC Pro: Specific features in ATI hardware?
Chas: Every vendor has slight strengths, but we don’t see broad divergences that need to be managed.
Orton: This is an angle of pursuit. As we launch more products, you will have a better idea of what we’ve done differently. PeakStream and Stanford have looked at that.
Bob: We don’t have specific benchmarks. Definitely different architectures have different perf, and ATI GPUs are quite good at that.
Raja: FaH, Havok, PeakStream chose X1900 because it has best perf, order of magnitude better than what is out there.
Orton: We’re not trying to say which is faster today, but stay tuned.
Peddie: Current GPUs have things in them not relevant to GPGPU. Ever envision making specific GPGPU products?
Orton: One of the big advantages of GPGPU is that they are GPUs. As Jeff said, flexibility is good, lot of scale for that. But economics will define that over time, not committing to a roadmap today.
Chas: I was once concerned about conflict between GPGPU and graphics applications, but it turns out we were able to use GPU’s GP power to solve a graphics problem. We get synergies that benefit both sides of space.
Theo V. from the Inq: When it comes to building HPC data center, there are certain platform requirements. Are you thinking about those requirements when building next-gen products? I can’t imagine an X1900 XT in a 1U chassis.
PeakStream: It’s about performance per watt and per square foot. Our customers are using 2U chassis, and it’s very acceptable. We’d love to see 1U and half U factors, but don’t need them to get started.
Orton: The platform is the whole infrastructure. What we’re launching today is a direction. We’ve talked about software and not hardware. You’ll see more from vertically integrated software houses.
Red Herring: What will this do in terms of cognition, cognitive computing?
Raja: I can take that. Answer is yes. GPGPU research recently has focused on cognition on GPU. We are just starting to get impressive results. Making algorithms possible to deploy in a mass-market way – facial recognition, gesture recognition. We can talk more offline and show you some demos.
Some guy: How big can this market become, and what needs to happen?
PeakStream: Program opportunity is what we were founded to solve. Better profiles, smarter compilers, we have a JIT compiler – lots of opportunity for optimization there.
Orton: HPC market has lived in area of libraries. Vijay is attacking problem that scales phenomenally well. MS and Havok are bringing it into consumer space.
IDC data talks about HPC in terms of revenues, etc. But we’re talking about opening up new areas such as search. Is that being captured?
IDC: No, Google search is not.
Orton: See this could open that up. Hard to size.
ATI guy: Scientific and technical markets will be where it starts, but vision is expanding into many areas. Opportunities are out there, starting with big-value, well-defined markets.
Orton: It’s easy to lose focus, but we have some clear proof points today. Continuity of data across CPU and graphics helps it develop.
Some guy: Do we see synergy between what’s happening in GPUs and what happened when CPUs first formed?
Chas: Yeah, a lot of commonality there. Idea is that it will be equally easy to target CPU and GPU. Many lessons learned on CPU side will be applied to that problem.
Orton: Not trying to replace CPU at all. This is focused on data and what classes you can move through a GPU more efficiently. Have broadened beyond graphics.
Peddie: Characterizing GPGPU as a coprocessor or is there a collision with multi-core CPUs from Intel and AMD?
Orton: I think both. Looking less today at computer architecture than at data. If data lines up well to move through GPU effectively, that’s what we’re trying to address. CPU is also trying to address broader set of problems.
Bob: Have to get from point A to point B. There are classes of problems. You can take a processor that does one thing at a time and do it a bunch of times. Might be less efficient. Maybe with CPU bandwidth is limit. Certain things are sequential and others are highly data parallel.
Orton: Might take more offline.
Some gal: Vijay, you planning to migrate to less distributed environment now that you have GPUs?
Vijay: We want to continue distributed, but we have in house an X1900 cluster. Will also make our software available for others so they can push their work faster with GPU.
Talk w/Vijay Pande
ATI is currently 8X faster than Nvidia. Nvidia has our code, running it internally, hope we can close the gap. But even 4X difference is large, and ATI is getting faster all of the time.
Lot of work goes into qualifying GPUs internally so they can run.
Making apps like this run on a GPU requires a lot of development work. Currently, science is best served by using ATI chips. Nv may come in future.