Soft Machines reveals performance targets for its VISC processors

Soft Machines continues to develop CPUs built using its VISC architecture. If you're not already familiar with VISC, be sure to read our in-depth look at the architecture from October of last year. As a quick refresher, VISC works by inserting a middleware layer between the operating system's native instruction set architecture above and the hardware beneath. This middleware translates that native ISA into VISC's internal instruction set before distributing the workload across the processor's virtual cores.

VISC's trick is that even in single-threaded workloads, it can break that work into chunks that Soft Machines calls "threadlets." In turn, a VISC CPU can distribute the work of a demanding single thread on a virtual core across multiple hardware cores. It can also dynamically provision computing resources in mixed workloads where a demanding thread and a lighter-weight task need simultaneous access to CPU resources. Soft Machines says that flexible resource allocation lets VISC deliver two to three times the instructions per clock of traditional CPUs.
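Soft Machines hasn't published the details of how that splitting is done, so the toy Python sketch below is only meant to illustrate the general idea: carve a single stream of operations into threadlet-sized groups at dependency boundaries, then deal those groups out to physical cores. The heuristic, the Op and Threadlet classes, and the round-robin assignment are all invented for the example rather than taken from Soft Machines' design.

    # Toy illustration of VISC-style "threadlet" extraction. Not Soft Machines'
    # algorithm: the splitting heuristic, the Op/Threadlet classes, and the
    # round-robin assignment are all invented for this sketch.
    from dataclasses import dataclass, field

    @dataclass
    class Op:
        dest: str      # register written by this operation
        srcs: tuple    # registers it reads

    @dataclass
    class Threadlet:
        ops: list = field(default_factory=list)
        writes: set = field(default_factory=set)

    def split_into_threadlets(stream, max_len=4):
        """Group dependent ops into the same threadlet; an op that doesn't
        depend on the current threadlet starts a new one, as does hitting
        max_len. Real hardware would also have to handle memory dependences,
        register renaming across cores, and much more."""
        threadlets = [Threadlet()]
        for op in stream:
            current = threadlets[-1]
            depends = any(src in current.writes for src in op.srcs)
            if current.ops and (not depends or len(current.ops) >= max_len):
                threadlets.append(Threadlet())
                current = threadlets[-1]
            current.ops.append(op)
            current.writes.add(op.dest)
        return threadlets

    def assign_round_robin(threadlets, n_cores=2):
        """Spread threadlets across physical cores."""
        cores = [[] for _ in range(n_cores)]
        for i, t in enumerate(threadlets):
            cores[i % n_cores].append(t)
        return cores

    stream = [Op("r1", ("a",)), Op("r2", ("b",)), Op("r3", ("r1", "r2")),
              Op("r4", ("c",)), Op("r5", ("r4",)), Op("r6", ("r3", "r5"))]
    for n, core in enumerate(assign_round_robin(split_into_threadlets(stream, max_len=2))):
        print(f"core {n}: {[len(t.ops) for t in core]} ops per threadlet")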

The company says it remains on track to deliver Shasta, its first commercial core, this year, and it's releasing more detailed performance targets today for its Shasta, Shasta+, and Tahoe cores to back up its claims of 2-4x increases in performance per watt over competing architectures.

We still don't know many details about the Shasta CPU itself. We do know that it presents one or two virtual cores to the operating system on top of two physical cores. Each physical core has 1MB of L2 cache. The CPU uses a 64-bit ISA, and Soft Machines expects it to run at speeds up to 2GHz.

Soft Machines won't make its own chips. Instead, the company will license its VISC cores to hardware partners that want to implement the technology in their own SoCs. The company will also offer a VISC SoC of its own that it'll work with partners to customize for use in devices like smartphones, tablets, and entry-level convertibles.

The company thinks it can deliver up to 2.5 times the performance of competing chips at the same power level, and it believes its chips will be up to four times as efficient when delivering the same performance. As we'll soon see, those figures are averages of the company's test results rather than hard maximums.

To establish its performance targets for Shasta, Shasta+, and Tahoe, Soft Machines says it's tested a variety of production SoCs and CPUs using the SPEC CPU 2006 benchmarks, including an ARM Cortex-A72, Apple's A9X, and Intel's Skylake Core i5-6200U. The company says it normalized each chip's results to assume a 16-nm process and 1MB L2 or last-level cache per core.
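The company hasn't said exactly what those normalizations involve, but conceptually they mean scaling each chip's measured score and power to a common process node and cache configuration before comparing. Here's a minimal sketch of that kind of adjustment; the scaling factors are purely hypothetical, not Soft Machines' numbers.

    # Hypothetical illustration of result normalization. The scaling factors
    # below are invented for the example, not taken from Soft Machines.
    def normalize(score, power_w, process_nm, l2_per_core_kb,
                  target_nm=16, target_l2_kb=1024):
        # assume modest first-order gains from a process shrink...
        process_perf_scale = (process_nm / target_nm) ** 0.5
        process_power_scale = target_nm / process_nm
        # ...and a small linear bump per extra 512KB of per-core cache
        cache_perf_scale = 1.0 + 0.05 * ((target_l2_kb - l2_per_core_kb) / 512.0)
        return score * process_perf_scale * cache_perf_scale, power_w * process_power_scale

    # e.g. a chip measured on 28 nm with 512KB of L2 per core
    print(normalize(score=30.0, power_w=4.0, process_nm=28, l2_per_core_kb=512))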

The power and performance results for the Shasta, Shasta+, and Tahoe chips were produced with the company's internal performance simulator using the SimPoint input method. The company says the performance and power usage of its 28-nm prototype silicon correlate well with its simulator results for that chip: within 5% for performance and 10% for power. Given that track record, Soft Machines is confident about the results it's sharing today, as well.

The first chart the company is releasing today demonstrates the efficiency of each chip. The Y-axis charts the energy used per unit of SPEC CPU 2006 score, while the X-axis is a geometric mean of the benchmark's integer and floating-point tests. The nearest-term comparison here is the ARM Cortex-A72 versus the Shasta virtual core running on two physical cores.
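Put another way, each point on the chart boils down to energy divided by score, plotted against the combined SPEC result. A quick sketch of how those two axes would be computed, with placeholder inputs rather than Soft Machines' data:

    # Illustrative only: how the two chart axes appear to be derived.
    # The inputs below are placeholder numbers, not Soft Machines' measurements.
    from math import sqrt

    def chart_point(specint, specfp, avg_power_w, runtime_s):
        perf = sqrt(specint * specfp)        # X-axis: geometric mean of int and fp scores
        energy_j = avg_power_w * runtime_s   # total energy consumed over the run
        return perf, energy_j / perf         # (X, Y): score, energy per unit of score

    print(chart_point(specint=5.0, specfp=4.0, avg_power_w=3.0, runtime_s=600.0))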

By this measure, Soft Machines says the Shasta core is two to three times as efficient as a projected 3.23GHz Cortex-A72 when delivering the same performance. If the company's Shasta+ chip arrives on a 10nm process in 2017 as expected, it could be about 4.5 times as efficient as Apple's A9X CPU cores, while 2018's Tahoe chip (running one virtual core on four physical cores) could be as much as seven times as efficient as a Core i5-6200U-class Skylake chip.

The second chart demonstrates how much SPEC CPU 2006 performance each of the tested chips delivers as power and frequency scale. For the same power, the company claims a pair of 2.3GHz Shasta cores running one virtual thread offers 1.8 times as much performance as a theoretical Cortex-A72 at 3.4GHz, while a Shasta chip delivering the same performance as the A72 can do so at one-third the power. Shasta+ and Tahoe could deliver even greater performance.
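Those two claims are really just two cuts through the same performance-versus-power curves: hold power constant and compare performance, or hold performance constant and compare power. Here's a small sketch of reading both ratios off a pair of curves, using made-up sample points:

    # Reading iso-power and iso-performance ratios off two perf/power curves.
    # The sample points are invented for illustration, not measured data.
    import numpy as np

    power_w   = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # watts
    perf_a72  = np.array([10.0, 16.0, 20.0, 23.0, 25.0])  # arbitrary score units
    perf_visc = np.array([18.0, 29.0, 36.0, 41.0, 45.0])

    # Iso-power: compare performance at the same power budget (3 W here)
    perf_ratio = np.interp(3.0, power_w, perf_visc) / np.interp(3.0, power_w, perf_a72)

    # Iso-performance: compare power needed to hit the same score (20 points here)
    power_ratio = np.interp(20.0, perf_visc, power_w) / np.interp(20.0, perf_a72, power_w)

    print(f"{perf_ratio:.1f}x the performance at equal power")
    print(f"{power_ratio:.2f}x the power at equal performance")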

If these numbers hold true, they offer reason to be optimistic about VISC's performance claims. We'll just have to wait and see whether that's the case later this year when VISC processors are expected to appear in actual hardware.

Comments closed
    • ChrisGX
    • 7 years ago

    VISC, perhaps, shares some common ground with what the researchers of “MorphCore”, including a certain Chris Wilkerson of Intel Labs, have been investigating in recent years. VISC may amount to a small advance, a large advance or no advance at all over current processor designs. Still, it is a bit rich to suggest there can’t be anything to this VISC thing because Intel hasn’t seen or acknowledged (so far) the potential of that particular tangent of development. Furthermore, it is an ill-informed suggestion, as Intel’s involvement in the MorphCore research shows.

    • robliz2Q
    • 7 years ago

    Well, it is more rigid.. 1 thread is on one core, and each core can ONLY run 1 thread at a time. SMT allows a stalled thread to wait on a memory load or store, say, whilst an alternative is active.

    SMT, superscalar, and OoOE never allow functional units to serve a thread running on another CPU. Hence the description “more rigid” fits.. VISC seems to smear things out.

    • Mr Bill
    • 7 years ago

    If this paradigm could work, it would already have been done at the silicon level by Intel or AMD.

    • the
    • 7 years ago

    AMD and Soft Machines also have some common investors too.

    However, a lot of the VISC block diagrams are similar to those used by AMD for Bulldozer. This makes sense from a high level, as both companies focus on sharing common components between multiple cores. The front end responsible for decoding and scheduling is shared in both implementations, for example. Still, their implementations should be radically different, as the native ISA used by Soft Machines doesn’t even need to be exposed.

    The ISA on the virtual cores could be x86, which Soft Machines translates into the native ISA used by the physical cores. Also noteworthy is that Soft Machines does not share back-end execution resources like Bulldozer does.

    • SuperSpy
    • 7 years ago

    If this was legit I’d expect an Intel buy-out in 3.. 2.. 1..

    • just brew it!
    • 7 years ago

    The slides and article indicate quite clearly that they are from simulations, not real hardware.

    • TopHatKiller
    • 7 years ago

    AMD invested in ’em. Bizarre rumors circulated that the ‘visc architecture’ was Zen.

    • Duct Tape Dude
    • 7 years ago

    Thanks, NTMBK. Reading his points and skepticism makes me feel a lot better.

    • mesyn191
    • 7 years ago

    Yea they’ll have to pull off some tricks to make the higher bandwidth help. Even doing “dumb” things like embedding some SRAM into the HBM would do wonders with certain operations like prefetch even if they can’t lower the latency much. I think they’ll have better tricks than that though. HBM is still very new.

    • thedosbox
    • 7 years ago

    Yeah, I’m always sceptical of graphs that don’t start at a zero base.

    • robliz2Q
    • 7 years ago

    HBM2 looks like another step down the integration path, so smaller, more compact, neater systems.
    BUT, small L1 and slow L2 cache were part of the reason Bulldozer was a failure. Same-latency, high-bandwidth RAM helps graphics but is unlikely to unblock the slowing performance gains. As Rza79 was correct in pointing out, L1/L2 do a good job most of the time (at least for tuned software).

    The frustration is understandable, because Dennard scaling ceased around 2005.

    • robliz2Q
    • 7 years ago

    Benchmarks tend to be very well-behaved programs; there’s no guarantee the virtual thread can be spread onto 2 physical cores in more general usage, which may show more serialisation.

    When you write high-performance programs you find you can do all kinds of predictable pipelined operations seemingly almost for free, but adding in 1 branch too many or exceeding the L1 cache severely reduces performance, and that’s exactly what more typical program logic requires.

    VISC figures so far are carefully controlled … a big pinch of salt is required. The original IPC model figures, for instance, were based on a slow clock. Skylake “equivalent” and normalisation for planned processes with assumptions about caches.. come on, it’s smelly. How often do marketing ppl show cherry-picked figures?

    • Rza79
    • 7 years ago

    I don’t agree. I think both of you are thinking of a real-life scenario which includes user input and loading from a storage device. Well, that’s not how CPU performance is measured.
    Modern-day CPUs have L1 caches that exceed 1TB/s, very good prefetching, excellent branch prediction, … so cache misses are not as big a problem as you describe. Proof is in running a Skylake with DDR4-2133 or 3200: it provided barely any performance increase.
    You might be taking the terrible ARM CPU memory subsystem/performance as a guideline. You shouldn’t.
    If feeding the CPU were such a problem, then a 5960X should run circles around a 6700K in single-threaded loads with its quad-channel memory controller. But it doesn’t.

    No, the big issue is ILP and how to extract it. Intel and Apple have very wide architectures, and I don’t see them going wider any time soon. That’s the whole point of Soft Machines’ VISC: it’s their solution to this problem.

    • Klimax
    • 7 years ago

    Just those highly unusual graphs (aka the presentation of data) sound the warning alarm damn loudly already. And the rest of the info doesn’t make it any better. Looks like heavily cherry-picked data and comparisons.

    AMD’s marketing for Zen is, in comparison, clarity itself….

    • NTMBK
    • 7 years ago

    Kanter posted his thoughts on RWT forums: [url<]http://www.realworldtech.com/forum/?threadid=156965&curpostid=156979[/url<] He sounds pretty sceptical.

    • vargis14
    • 7 years ago

    Pretty lofty goals…I doubt this will happen in 2018. But it is 2 years away, just not feeling it.


    • mesyn191
    • 7 years ago

    Yea, they’re not really all that bad for most tasks people actually need to use, just not as good as Intel, and people don’t normally like settling for 2nd best unless it’s for a very good reason. Which AMD can’t really offer right now for their CPUs.

    Current HBM’s latency is already pretty decent, comparable to system RAM but with crazy bandwidth. HBM2 should be similar or better. Hynix’s 2014 Hot Chips presentation has some latency numbers on slide 14 vs DDR3 and GDDR5. Too bad we probably won’t see it on an APU until 2017 at the earliest.

    • mesyn191
    • 7 years ago

    Those are details as to why it was a failure. But at least it got made.

    VISC doesn’t even exist yet as a product. It’s all just software modelling at this point. There is no shake-up of any sort going on.

    It’s quite likely VISC’s “low latency hardware” + slightly modified ILP-explicit ISA approach might end up being just as much of a failure as Itanium’s magic compilers.

    Also, if you think Haswell/Skylake are “rigid single active thread designs” then you don’t know anything about them. The fact that they have SMT alone shows that description to be naive. There are tons of things going on in the background of these chips to improve ILP, IPC, and prefetching to keep the chip fed.

    • Unknown-Error
    • 7 years ago

    Nice slides! Reminds me of AMD.

    • ronch
    • 7 years ago

    I’ll need to see it to believe it.

    For now, I’ll think of it as a bunch of pretty slides.

    • guardianl
    • 7 years ago

    Nvidia’s Denver CPU core + dynamic binary translation layer is effectively the same solution as VISC, but they have the advantage of just scheduling for a single CPU core with many execution ports (i.e. doing 7 integer adds simultaneously) instead of multiple CPU cores (simpler in terms of cache latency management, etc.). We know how that turned out – OK in SPEC scores, not so good in real-world performance.

    I will be [b<]shocked[/b<] if this amounts to anything more than Denver 2.0. Nvidia has 20 years of software experience parallelizing serial code for their GPUs and yet they couldn’t jump ahead on the perf/watt curve or in absolute performance. Soft Machines’ VISC is effectively cold fusion until independently proven otherwise.

    • willmore
    • 7 years ago

    Portable stuff doesn’t care how cheap power is at the wall.

    • robliz2Q
    • 7 years ago

    Plus, Intel will be refining Skylake. Performance-wise, even if they are competitive, they’ll lag hugely on sales, never mind profits. It wasn’t so many years ago that AMD CPUs had superior performance, yet they still struggled with market share.

    • robliz2Q
    • 7 years ago

    But Itanic used VLIW and assumed scheduling magic via compiler tech that never materialised.

    VISC seems to be splitting up the tasks in low-latency hardware as part of instruction decode, and it has access to run-time information that a compiler can rarely deduce. The fact it’s a shake-up of various blocks of things done before is probably a good sign, though I don’t believe it’s going to accelerate single-thread desktop performance in general. Surely it has to be good if it can extract as much ILP as, say, Haswell/Skylake, with their wider but more rigid single-active-thread designs.

    The 50% better IPC than Haswell wasn’t so impressive once the short pipeline and slow clock of the test samples were considered; they benefit hugely from unrealistically low-latency cache & RAM due to the CPU’s sluggish cycle time.

    • robliz2Q
    • 7 years ago

    To be fair though, even the kind of AMD CPU totally derided on forums like these (low-power Jaguar cores, for instance) does an amazingly fast job at stuff like calculating all 32-bit prime numbers compared to early 32-bit processors. I was just looking today and realised the L1 caches I use total more than the main memory of the first computers I programmed, and the L3 cache is larger than a decent 32-bit UNIX workstation’s. What staggers me is how inefficient all this eye candy I see on screen must be; I don’t feel like I’ve really gained a lot.

    HBM on package is interesting, though right now it’s aimed at bandwidth for graphics rather than latency. Still, that alone will probably be the beginning of the end for discrete graphics cards, once it’s packaged with APUs.

    • robliz2Q
    • 7 years ago

    Well, I read that each tasklet has its own register rename file, so they can run simultaneously without save/restore of context; indeed the global part understands that one tasklet could be dependent on another assigned to another core, and a VISC physical core has fewer functional units than 8-way Haswell/Skylake. It seems like each physical core can be hyper-threaded, which reduces the effect of memory stalls and ups throughput, and each OS thread can be split onto several (but simpler) physical cores.
    So indeed, what is a thread/core is fuzzed 🙂

    I suspect simplicity is the key: a tasklet with a small number of micro-ops ought to complete quickly, so the global scheduling can simply wait for tasklets to be done, then power down unneeded cores, without the OS scheduler needing to know a thing.

    A lot of the value must be in the VISC micro-ops providing ILP without a great long pipeline. The early high IPC was helped by a relatively short pipeline, but also a slowish clock, reducing comparative clock latencies.

    My gut hunch is.. it’s much more likely to work effectively in a phone than as a competitor to Intel on desktop; though as they intend to license this technology, there’s nothing to stop AMD or Intel moving in this direction. Actual evidence is the power curves, plotted against a < 7W scale.

    • mesyn191
    • 7 years ago

    Yup. Performance gains are stalling out with each new CPU design, and it has nothing to do with a lack of hardware resources or clever tech. They just can’t feed these modern CPUs fast enough or properly with existing cache and memory tech.

    That is why HBM and XPoint are such a big deal. Slapping a decent chunk (1GB or so) of either on die or on package with a fast, low-latency bus to the CPU could be a game changer.

    • mesyn191
    • 7 years ago

    The stuff that VISC is claiming to do also requires a complex core, though. And much of what they’re describing as new has actually sorta been done for years by OoOE cores, without getting the performance and performance-vs-power gains they’re claiming. CPU designs that attempted to make explicit use of ILP (as UberGerbil mentioned: Itanium) were not all that good and quite frankly were notoriously difficult to make use of outside of a few workloads.

    Many others have made big claims before with similar methods but at best delivered mediocre results. I wouldn’t go making any assumptions about what this will or won’t be a threat to until we see real-world benchmarks from an actual shipping product. I sure as heck wouldn’t trust what the company says either until then.

    • synthtel2
    • 7 years ago

    [quote<][...] must be multiple streams of micro-ops being run as tasklets on each core.[/quote<] That's not hard for a particular instruction stream, but if it has to work on [i<]any[/i<] arbitrary instruction stream, the data dependency logic is hellish. If they've nailed that, then that's the real story here.

    By context switching I meant in the cores, not as something the OS sees. I'm pretty sure there's a more appropriate term, but I forgot it.

    After a bit more thought, it looks like they're treating their HW cores a lot like typical CPU designs treat execution units, so one good question might be what a more typical CPU design with more stuff in the EUs and less in the core at large would look like. It would presumably allow more EUs with less power overhead, which does seem to be roughly the idea here, but at the expense of quite a bit of die area? I wonder if they're sharing context in a physical area between the two cores so they can power gate the second very aggressively. That might nullify the context switching issues (for lack of a better term).

    One way or another, I have a feeling we're working with significantly fuzzier definitions of a core than usual.

    • robliz2Q
    • 7 years ago

    Except modern CPUs often wait 100s of cycles on cache stalls, never mind loading data from flash storage; I don’t see how the virtual core layer can help with that. Probably most of the time in a phone, say, 2-3 virtual cores end up just using 1 physical core, allowing power saving.

    • robliz2Q
    • 7 years ago

    It’s fairly simple.. they have 1 virtual core in the benchmark running on 2 physical cores, on a benchmark that’s very parallel, so there must be multiple streams of micro-ops being run as tasklets on each core. The simulated Skylake cores can’t even use hyper-threading, as the benchmark is single-threaded.

    As the OS only sees 1 thread, there’s NO context switching in that benchmark.

    The pitch is that simpler, narrower cores than Skylake avoid complexity which costs power, and the power disadvantages of high voltage at high clock rates. Some clever hardware scheduler splits virtual threads over physical cores that provide a pool of resources.

    How it works out on real software.. something like Dalvik on Android is going to matter more than SPEC2006.

    • robliz2Q
    • 7 years ago

    The pitch doesn’t seem to be about absolute performance, but about staying on the linear parts of performance/watt curves, using “ideal” voltage and simpler, narrower physical cores with shallower pipelines that pool resources.

    If it works out, great, but extended battery life isn’t going to boost fps in those FPSes.

    • tipoo
    • 7 years ago

    Actually many CPU power curves I’ve seen have looked a lot like that. Plotting performance vs wattage on any given architecture does tend to have a nice curve.

    • robliz2Q
    • 7 years ago

    They’re not advocating 1 huge core, but some kind of thread slicing, dicing & re-assembling via a virtual layer: cores with shallower pipelines that pool resources, rather than a very wide, complex core like Haswell & Skylake. A complex core spends more and more on clever power management rather than on doing work. So 1 Haswell core was getting an IPC of about 1.5, and 2 simpler virtual cores got 2, but as you suspect that’s just an IPC per core of 1.

    Smaller, frugal physical cores can mean more per die, BUT real-world code outside HPC, games & graphics tends not to scale that well due to lock contention on updates. So the argument seems to be more about performance/watt than blazing single-thread performance.

    Perhaps this technique will be better for mobile & HPC than as a threat to the i7; modern processes don’t suit speed-demon-style CPUs.

    • WhatMeWorry
    • 7 years ago

    Their graphs are too pretty. The curves are suspiciously perfect.

    • synthtel2
    • 7 years ago

    EDIT: Never mind, the threadlet splitting thing appears to be in hardware. I don’t have a clue now. Large parts of what’s below may not apply.

    I’m trying to figure out how exactly they think they can get so much higher single-threaded IPC than Intel at similar power/clocks. Looking at more traditional CPU design, wider cores have diminishing returns after a point (roughly where Intel is?) because instruction-level parallelism has limitations (depending on what code it’s running, of course). Just going with wider cores does keep increasing power use though, because of leakage and wire lengths. Hence the problem with things like POWER8 that use massive cores and rely on SMT to fully utilize them (since it doesn’t usually have enough ILP available to fully utilize a core with one thread).
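    (A toy way to see that ceiling: in the invented scheduling model below, a chain of dependent ops takes the same number of cycles no matter how wide the machine is, while independent ops keep scaling with width.)

        # Invented toy model: peak ILP is bounded by the longest dependency
        # chain, regardless of how many execution ports a core has.
        def min_cycles(ops, width):
            """Greedily schedule ops on a 'width'-wide machine; each op lists
            the indices of the ops it depends on, with 1-cycle latency."""
            finish = {}
            for i, deps in enumerate(ops):
                ready = max((finish[d] for d in deps), default=0)
                while sum(1 for f in finish.values() if f == ready + 1) >= width:
                    ready += 1          # issue slots full, slip to the next cycle
                finish[i] = ready + 1
            return max(finish.values())

        independent = [[] for _ in range(8)]               # 8 ops, no dependencies
        chain = [[i - 1] if i else [] for i in range(8)]   # each op needs the previous one

        for width in (2, 4, 8):
            print(width, min_cycles(independent, width), min_cycles(chain, width))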

    Splitting high-ILP workloads across smaller cores helps with wire lengths and makes things easier to power gate, so it should be easier on power consumption, but it seems like it would be a lot tougher to get the performance out of. For small segments of ILP-heavy code, the cores would need to be extremely fast at context switching. For larger segments, they already appear to be using a Denver-oid software translation layer, so they have lots of options available for creative use of cores…. [i<]but[/i<] this gets really tough to verify really quick. Actual programmers have enough trouble with use of multiple cores as is.

    I notice that the benchmark they're using has lots of long-running tasks with relatively few potential branches. [super<]1 2[/super<] They're mostly easier workloads for something Denver-oid. That kind of situation would also let the software layer spend more time optimizing a given chunk of code for whatever multi-core shenanigans they're doing.

    My prediction is that it will be very workload-dependent, a lot like Denver. It might do pretty well on tasks like a lot of those in SPEC (tho I'm still skeptical about that efficiency they're claiming), but I bet it will fall down in less repetitive code like a lot of OS stuff.

    [super<]1[/super<] [url<]https://www.spec.org/cpu2006/CINT2006/[/url<]
    [super<]2[/super<] [url<]https://www.spec.org/cpu2006/CFP2006/[/url<]

    • chuckula
    • 7 years ago

    Notice that they chose SPEC, which is quite parallelizable, for their comparison. I can see how making one “huge” core that behaves similarly to two [or more] regular cores in a benchmark that would already scale quite well to multiple cores would show strong results when you don’t let anyone else use more than one core.

    The stickier question is: How well does VISC do in a workload that does not scale well to multiple cores on other architectures? When it’s actually one core vs. one [huge] core, what is the real performance benefit?

    • blastdoor
    • 7 years ago

    I wonder how Denver would look on that graph, after the dynamic code optimizer has had a chance to digest the SPEC benchmark suite…

    • the
    • 7 years ago

    They’ve actually had functional hardware for testing since 2014, when they first did their major press release about the technology. They showed this off to investors, but what they need to do is let the media have access to do their own independent testing so their claims can be verified.

    • UberGerbil
    • 7 years ago

    This is the key question. The design, as best I understand it, assumes there is latent parallelism that can be exploited sufficiently to not only offer that performance boost but to do so even while incurring the penalty of a translation layer. That seems… unlikely? I mean, it’s not like Intel is executing x86 directly either, anymore: they’re converting everything to their own internal µops and executing that. So on the one hand, you can argue that the translation layer isn’t expensive; but on the other hand, if there’s exploitable parallelism, Intel should already be picking up a good chunk of it with the functional unit pipelines they have in place.

    Itanium is an interesting data point — that architecture relied on the compiler to find the ILP, and it was rather rigid in how that could be exploited (instruction bundles, etc). And when that ILP actually delivered, it needed a huge (for its day) cache to keep all the pipes fed. Soft Machines clearly believes hardware at runtime can do a better job of finding ILP than software at compile time, and they might be right, but how much better? And to the extent it’s successful, won’t that require a lot of cache also? Granted cache is cheap, and easy to engineer, but it does eat die space.

    They may be exploiting other tricks, like the scout threads found in Rock, but those tend to play havoc with power efficiency.

    I don’t know; this is way outside my area of expertise for anything beyond armchair analysis. But the history of these sorts of attempts is almost uniformly disappointing (Transmeta was actually maybe the most successful of them, and… yeah). Despite all the assurances on their slides, I can’t help but think they might be getting misled by their own internally developed simulator. The worst lies are those you tell yourself without knowing it. But I’d be happy to be wrong.

    • Rza79
    • 7 years ago

    It’s very important to remember that it’s outperforming the rest by such a margin on a [b<]single thread[/b<] because of the inherent advantages of the VISC architecture. The theory behind it is that VISC doesn't need two threads to maximize its cores. As such, while the rest will basically double their performance with two threads, VISC won't (at least in theory). Meaning it will be a killer in responsiveness and single-threaded apps.

    • brucethemoose
    • 7 years ago

    So, mid-late 2016 will be interesting.

    We get Zen on the desktop, new ARMv8 cores in phones, and this weird VISC thing… Whether they can compete with the established giants (Intel, Apple) is totally up in the air, but it should be fun to watch either way.

    • anotherengineer
    • 7 years ago

    I wonder if anyone would care, or if it would even get off the ground, if electricity was 2 cents/kWh.

    It’d be interesting though if it comes to fruition and works as promised.

    • Peter.Parker
    • 7 years ago

    On paper, this looks amazing. Kinda like VW diesel emission levels…

    • faramir
    • 7 years ago

    Impressive vaporware is impressive.

    • jts888
    • 7 years ago

    It’s obvious that people should be skeptical of an unknown entity claiming a 2x perf/watt advantage against Intel, but does anyone know what the basic rationale for the claims is?

    I frankly don’t see how a 1.3/1.4 GHz CPU could beat a 2.7/2.8 GHz Skylake core in absolute performance, unless there was a crazily wide instruction issue/retire pipeline, which, as seen with Itanium, is extremely hard to design and successfully implement.

    • UberGerbil
    • 7 years ago

    Skeptical. Still so skeptical. We’ve seen so many of these radical designs with radical claims of radical improvements over the established players. Would love to see one pan out and shake things up, but…. When it comes to another one, at this point I think we all [url=http://bloximages.newyork1.vip.townnews.com/stltoday.com/content/tncms/assets/v3/editorial/4/7d/47d1b3b7-2e2a-5f53-a8ec-efc8ec833653/538d063947f14.preview-620.jpg<]live in a metaphorical Missouri[/url<].

    • chuckula
    • 7 years ago

    SHASTA: Coolest codename ever. Only improvement: Name the followup chip McNasty.

    • Duct Tape Dude
    • 7 years ago

    Wow! With no hardware, this doesn’t change my view of VISC at all since last time!

    Still cannot decide if actually competitive… Need… updated podcast… with Kanter…

    • Aranarth
    • 7 years ago

    Agreed!

    • terminalrecluse
    • 7 years ago

    nothing to see here until you guys review one
