Does anyone have any technical insight on what Haswell's new instructions (TSX, AVX2, and FMA3) mean and how they will work?
These terms encompass sets of instructions intended to address particular weaknesses/points of emphasis in x86 or to open new areas of development:
TSX -- Transactional Memory support. The previously-linked RWT article is probably the best introduction to the details of Intel's implementation, but you may need to do some background reading on the general concept to make much sense of it. At the highest level, this offers a potential mechanism to get more real throughput out of multi-threaded code by eliminating unnecessary serialization (where the code running in one thread thinks it needs to wait on changes being made by another thread, when in a particular case it actually doesn't). There are actually two separate mechanisms defined by Intel: one (Hardware Lock Elision, HLE) which has limited potential but can be implemented without major changes to existing code, and another (Restricted Transactional Memory, RTM) which is more exciting but will require significant re-architecting of most codebases.
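To make the second mechanism a bit more concrete, here is a rough sketch of the usual pattern for Intel's RTM interface: attempt the critical section as a hardware transaction, and fall back to an ordinary lock if it keeps aborting. The intrinsics `_xbegin`, `_xend`, and `_xabort` come from `<immintrin.h>` and are only compiled in when the build enables RTM (e.g. `-mrtm` on GCC/Clang). Everything else (`counter_add`, `counter_get`, `fallback_lock`, the retry count of 3) is a made-up name or number for illustration, not anything from Intel's documentation. Without RTM the code simply takes the lock.

```c
#include <stdatomic.h>

#if defined(__RTM__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>      /* __get_cpuid_count */
#include <immintrin.h>  /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#define HAVE_RTM_BUILD 1

/* CPUID leaf 7, sub-leaf 0, EBX bit 11 reports RTM support.
   A real program would cache this instead of asking every time. */
static int cpu_has_rtm(void) {
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return (int)((ebx >> 11) & 1u);
    return 0;
}
#else
#define HAVE_RTM_BUILD 0
#endif

static atomic_int fallback_lock = 0;  /* 0 = free, 1 = held (hypothetical) */
static long shared_counter = 0;

static void lock_acquire(void) {
    while (atomic_exchange_explicit(&fallback_lock, 1, memory_order_acquire))
        ;  /* spin until the previous holder releases */
}

static void lock_release(void) {
    atomic_store_explicit(&fallback_lock, 0, memory_order_release);
}

void counter_add(long delta) {
#if HAVE_RTM_BUILD
    if (cpu_has_rtm()) {
        for (int attempt = 0; attempt < 3; ++attempt) {
            if (_xbegin() == _XBEGIN_STARTED) {
                /* Reading the lock puts it in the transaction's read set,
                   so a thread taking the lock aborts this transaction. */
                if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                    _xabort(0xff);
                shared_counter += delta;
                _xend();  /* commit: the update becomes visible atomically */
                return;
            }
            /* Aborted (conflict, capacity, lock held...): retry. */
        }
    }
#endif
    /* Fallback path: plain mutual exclusion. */
    lock_acquire();
    shared_counter += delta;
    lock_release();
}

long counter_get(void) {
    lock_acquire();
    long value = shared_counter;
    lock_release();
    return value;
}
```

The point of the pattern is that threads touching unrelated data never actually serialize on `fallback_lock` when the transactional path succeeds; the lock only matters when the hardware gives up.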
AVX2 -- further extensions to the AVX set of instructions introduced with Sandy Bridge. This is the latest generation of SIMD instructions (in the lineage that stretches back through SSE to MMX), extending 256-bit operations to integer data and, among other things, adding a GATHER instruction (which collects operands from a series of non-contiguous memory locations, potentially eliminating a lot of gyrations involving multiple loads and moves when working with common data structures).
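For anyone who hasn't met gather before, here is a plain scalar model of what an 8-lane gather of floats computes. It is only a sketch of the semantics: the real thing is a single instruction (exposed to C as the `_mm256_i32gather_ps` intrinsic), and the function name `gather8_f32` is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an 8-lane gather: lane i loads base[index[i]].
   The indices may point anywhere in the table, in any order. */
void gather8_f32(float out[8], const float *base, const int32_t index[8]) {
    for (size_t lane = 0; lane < 8; ++lane)
        out[lane] = base[index[lane]];
}
```

A typical use would be pulling one field out of an array of structs, e.g. indices 0, 4, 8, ... to collect the first float of eight consecutive 4-float records. Without gather, that takes eight separate loads plus shuffles to pack them into one vector register.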
FMA -- Fused Multiply-Add. This is a particular operation that is very common in floating point code (including physics, 3D, and image/video/audio processing): two numbers are multiplied together and added to a third, without losing any precision in the intermediate step. The 3 in FMA3 refers to the 3-operand encoding, in which the result overwrites one of the three inputs (FMA4 adds a fourth operand so the destination can be a separate register). CPUs without an FMA instruction have to issue a separate multiply and add to get a similar result, which takes more cycles/time and rounds twice instead of once; this has been one long-standing criticism of x86's SIMD implementation versus competitors such as Power's AltiVec. Of some concern is the divergence of AMD and Intel with respect to FMA implementations; without going into the history and details of their various FMA3 and FMA4 specs, we should just note that a potential incompatibility exists which might (further) limit the adoption of the instruction. However, Intel clearly has the momentum wrt future marketshare and (especially) developer-relation resources, while AMD has shown a history of conforming to Intel implementations (eventually) when necessary, so FMA3 at least should see adoption over time.
AVX2 and FMA (which really should be considered just part of AVX, if not for the circumstances of its birth) are clear enhancements for floating point code, and of course that makes them potentially interesting for immediate real-world applications like gaming. TSX is significantly more speculative and long-term (much of the work to date with transactional memory has remained in academia, because the software-only implementations were too slow to be of practical utility), and its most likely first applications will be in specialized contexts like HPC where the hardware is known and the entire codebase belongs to one set of developers. Thereafter it will show up in server applications like MySQL that employ many interdependent threads. Its migration to the actual OS or consumer-level applications is probably quite distant, and will await lessons learned by both Intel and software architects experimenting with this first iteration (there are some obvious limitations in this initial implementation, and I would expect many of those to be mitigated before it sees employment in, for example, the Windows kernel). Of course the nature of open source, and the level of academic interest in the subject, means that there will be plenty of experimental forks of existing codebases to take advantage of Intel's work -- which is kind of exciting, in that we don't really know how those will turn out. The total gains generally will be modest -- especially with the more limited "easy" version Intel offers -- but since this will be the first widespread deployment of a heretofore experimental concept, interesting results might turn up.
However, it's worth emphasizing that -- like any new instructions -- all of these require changes in existing code; there's no "free" upgrade here like we often see with other CPU features. And we haven't even seen much AVX code yet (AFAICT) in non-specialized contexts, despite all the shipments of AVX-enabled SB/IB, so patience is required. The vast majority of developers don't work at the level of assembly language anyway, so at the very least they wait on compiler support; but in most cases the dependency is on underlying code they don't own -- libraries, game engines, etc. -- which have to be modified to varying degrees (FMA being a relatively easy in-place replacement, TSX a potential throw-it-out-and-start-over). Meanwhile these ISA extensions only exist in new processors, which will constitute a tiny (though ever-growing) fraction of the market, so the return on investment for developers doing those changes is initially very small. The new code has to be put behind switches so that it falls back to the boring instructions we're already using when it is run on pre-Haswell CPUs. (One exception to this is Intel's "easy" version of TSX, which is cleverly designed to work correctly even on older processors, but as I said the potential gains from that version won't be huge.)
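As a rough idea of what "behind switches" looks like in practice, here is a sketch of runtime dispatch for a dot product. It assumes GCC or Clang on x86: `__builtin_cpu_supports` and the `target` function attribute are compiler extensions, and the names `dot`, `select_dot`, `dot_scalar`, and `dot_avx2_fma` are hypothetical. On any other compiler or architecture the sketch just uses the scalar loop.

```c
#include <stddef.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define X86_DISPATCH 1
#else
#define X86_DISPATCH 0
#endif

/* Baseline path: runs on any CPU. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

#if X86_DISPATCH
/* Haswell-class path: only this function is compiled with AVX2/FMA
   enabled, so the rest of the program stays safe on older CPUs. */
__attribute__((target("avx2,fma")))
static float dot_avx2_fma(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);  /* acc = va * vb + acc */
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k)
        sum += lanes[k];
    for (; i < n; ++i)  /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}
#endif

typedef float (*dot_fn)(const float *, const float *, size_t);

/* The "switch": ask the CPU once which path it can run. */
static dot_fn select_dot(void) {
#if X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2_fma;
#endif
    return dot_scalar;
}

float dot(const float *a, const float *b, size_t n) {
    static dot_fn impl = NULL;  /* cached choice; not made thread-safe here */
    if (!impl)
        impl = select_dot();
    return impl(a, b, n);
}
```

Note that the two paths can differ in the last bits of the result for general inputs, since the vector version sums in a different order and rounds once per multiply-add. That is one more reason library authors have to test both sides of the switch.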
So while these are all arguably exciting features, they're not going to do anything for you the day Haswell arrives... and in fact there's pretty much an inverse relationship between how exciting the feature is and how immediately we'll see its benefits. FMA and "easy" TSX are relatively cheap but not super exciting, AVX2 is more exciting but will take longer, and "real" TSX could be pseudo-revolutionary but probably will only be of any real note sometime after Broadwell ships.