Personal computing discussed


 
Noinoi
Gerbil Team Leader
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Tue Jan 22, 2019 9:46 pm

setaG_lliB wrote:
jihadjoe wrote:
End User wrote:
Don't hurt me!

What is love?

I think it's bitrate that's really important. Those old procs will be a lot less happy when given a 50Mbps file (basically UHD Blu-ray raws). And to complicate things further, NVDEC seems to be capable of partially accelerating video decode even on Kepler-era cards. I remember running DXVA Checker on a GTX 670 and it showed a significant amount of stuff as being hardware accelerated. Will post a screenie when I pull that rig out.

At first, I was concerned that the GTX 680 and 970 GPUs in some of the machines were partially off-loading the decode process. However, that doesn't seem to be the case.

Using the C2D E8600-based machine, I tested VP9 playback with the GMA X4500HD IGP, a GTX 680, GTX 970, and GTX 1080Ti. Only the GTX 1080Ti significantly lowered CPU usage and made 2160p/60 completely smooth.

Same results with HEVC playback: CPU usage was reduced only when the GTX 1080Ti was installed (MPC-HC's playback status also indicated that hardware decoding was in use--something it didn't show with the IGP, GTX 680, or GTX 970).


The last paragraph interests me - as you're using a 10-bit video, have you tried 8-bit?
i5-9600K@4.9 | Patriot 2x16GB | Asus GTX 970 | Aorus Z390 Pro Wifi | Intel 660p 512GB + Kingston Fury 240GB + 2x4TB WD HDDs | Win 10
 
meerkt
Gerbil Jedi
Posts: 1699
Joined: Sun Aug 25, 2013 2:55 am

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Wed Jan 23, 2019 1:05 pm

setaG_lliB wrote:
VP9 playback with the GMA X4500HD IGP, a GTX 680, GTX 970, and GTX 1080Ti. Only the GTX 1080Ti significantly lowered CPU usage

Apparently VP9 and H.265 are supported by GM206 (GTX 950, some GTX 960s) but not by earlier GM20x parts (the other GTX 9xx cards).
 
jihadjoe
Gerbil Elite
Posts: 835
Joined: Mon Dec 06, 2010 11:34 am

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Wed Jan 23, 2019 6:09 pm

Ok, as promised, here are my results. I think the "no acceleration" thing probably only applies to VP9 or 10-bit HEVC files, as Kepler seems to offload 8-bit HEVC just fine despite having no fixed-function hardware.

First, DXVA Checker listing the GTX 670 as definitely accelerating HEVC_VLD_Main:
[Image: DXVA Checker output]

Next is GPU-Z and Task Manager while playing a 1080p 16Mbps HEVC file. Notice how GPU load is around 30%, suggesting that decoding is being accelerated, but probably on the shaders instead of fixed-function hardware.
[Image: GPU-Z and Task Manager, 1080p 16Mbps clip]

Now a 4K 35Mbps file. GPU load goes up to 60%, while CPU load is about the same as with the 1080p 16Mbps clip, suggesting the decode is being done almost entirely by the GPU:
[Image: GPU-Z and Task Manager, 4K 35Mbps clip]
 
setaG_lliB
Gerbil First Class
Topic Author
Posts: 144
Joined: Wed Mar 03, 2010 6:02 pm

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Tue Mar 05, 2019 6:47 pm

As I've finally started backing up my small but growing UHD BD collection to the file server, I've been able to do some high bit rate 4K HEVC testing.

Some notes on testing:
-I packaged the audio and video streams into an MKV file. No re-encoding was done.
-Playback was through MPC-HC 1.7.13 x64
-The video track is 10-bit HEVC @ 3840x2160p24 with HDR10 and an average bit rate of 68Mbps.
-1.78:1 aspect ratio = no letterboxing to lighten CPU load
-DTS:X audio (the base audio layer is DTS-HD MA 7.1, 24-bit/48kHz)
-For testing, I took the average CPU load during 1 minute of playing a scene with lots of motion.
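For anyone who wants to repeat this without eyeballing Task Manager, ffmpeg can time a decode-only run with something like `ffmpeg -benchmark -i clip.mkv -f null -` (clip.mkv standing in for your test file) and print a closing bench line. Here's a small Python sketch that turns that line into an average CPU share -- the exact field layout (`utime`/`stime`/`rtime`) is assumed from recent ffmpeg builds, so check your own output first:

```python
import re

def cpu_share(bench_line: str, threads: int = 1) -> float:
    """Parse ffmpeg's -benchmark summary line, e.g.
    'bench: utime=41.234s stime=1.002s rtime=60.500s',
    and return the average CPU usage as a fraction.
    With threads > 1, the result is normalized to the whole
    machine (like Task Manager's overall percentage)."""
    fields = dict(re.findall(r"(utime|stime|rtime)=([\d.]+)s", bench_line))
    busy = float(fields["utime"]) + float(fields["stime"])  # CPU-seconds spent decoding
    return busy / (float(fields["rtime"]) * threads)        # share of available CPU time

# Example: 2.5 CPU-seconds over a 10-second wall-clock run = 25% of one core
print(f"{cpu_share('bench: utime=2.0s stime=0.5s rtime=10.0s'):.0%}")  # → 25%
```

This measures decode throughput rather than real-time playback load, but it makes runs on different CPUs directly comparable.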

CPUs tested so far:

Core 2 Quad Q6600 (4c/4t Kentsfield overclocked to 3.6GHz, SSSE3, dual DDR3-1600)
w/ GTX 970 (software based decoding in MPC-HC): 100% CPU usage, not playing at the correct speed
w/ GTX 1080Ti (hardware based decoding in MPC-HC): 4-12%

Phenom II X4 980 Black Edition (4c/4t Deneb overclocked to 4GHz, SSE4a--which approximately zero programs use, so for all intents and purposes this is more of an SSSE3 CPU--dual DDR3-1600)
w/ GTX 970: 100%, not playing back at the correct speed
w/ GTX 1080Ti: 2-9%

Core i5-2400 (4c/4t Sandy Bridge locked @ 3.4GHz, SSE4.2/AVX, dual DDR3-1333)
w/ GTX 970: 56-76%
w/ GTX 1080Ti: 1-3%

Core i5-4670 (4c/4t Haswell @ 3600-3700MHz, SSE4.2/AVX2, dual DDR3-1600)
w/ GTX 970: 42-60%
w/ GTX 1080Ti: 0-2%

Core i7-4930K (6c/12t Ivy Bridge-E overclocked to 4.6GHz, SSE4.2/AVX, quad DDR3-2400)
w/ GTX 970: 12-18%
w/ GTX 1080Ti: 0%

Ryzen 7 1700 (8c/16t overclocked to 4.1GHz, SSE4.2/AVX2, dual DDR4-3000)
w/ GTX 970: 9-14%
w/ GTX 1080Ti: 0%

486-DX2 @ 80MHz w/ 128MB of RAM (had to max out the RAM to prevent it from completely running out of memory)
Time for Opera to display a single 3840x2160 frame taken from the video (in .jpg format)
26 minutes, 10 seconds

Some observations:
-It's pretty satisfying to see a crusty old Sandy Bridge i5 not only play UHD BD content, but also handle it without hardware assist. Not long ago Intel was boasting that only 7th gen CPUs could pull off such a feat. ;)
-Looks like HEVC is much harder to decode than VP9. Looking at my table on the first page, even a 3.4GHz i5-3470 can handle 4K/60fps VP9 without hardware acceleration. However, based on these 24fps HEVC results, I'm gonna go ahead and guess that not even a 3.7GHz i5-4670 would be able to handle HEVC at 4K/60.
-As I've mentioned before, it seems that HEVC decoding benefits greatly from SSE4.1: Sandy Bridge @ 3.4GHz is much faster than Kentsfield @ 3.6GHz. I do have a Yorkfield-based C2Q with SSE4.1 kicking around somewhere that can handle 4.2GHz. If/when I find it, I'll have to plug it into my LGA775 board and see how it does!
-I just realized that the 486 may have had an easier time drawing that massive jpg through a DOS image viewer, instead of through a web browser running on top of Win98. Oops.
 
Juzu
Gerbil In Training
Posts: 1
Joined: Tue Aug 27, 2019 6:52 pm

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Tue Aug 27, 2019 7:06 pm

I have been trying to optimize my ancient laptop to play back VP9 (i.e. YouTube) on lightweight Linux.

The laptop has a 2GHz Celeron from 2007, with 2GB of DDR2-667. It has some ATI mobile GPU (I don't remember the model at the moment, but that shouldn't matter).

Is it possible to use hardware acceleration in some way to speed up part of the work--VP9 decoding, deblocking, deringing, or up/downscaling (it has a 1440x900 display), for example?

This ancient laptop has an absolutely great display for its age, and because it has 2GB, it would be a really usable Linux laptop for my mother.
The most demanding task she does is watching YouTube videos, which currently play at about 15-30fps, depending on the size of the window or whether I use full screen.

I'm willing to compile everything from scratch, even with the development version of GCC 10, to squeeze out every bit of performance, if I know there is a real chance of getting it working. But I suspect that without some kind of help from the GPU, it's not going to work?

If there are faster software decoders in certain browsers, or possibly even a module that could be compiled into the browser, I might be willing to compromise to get 30fps in YouTube's theater mode. If that doesn't work, I'll have to buy my mother a new laptop. Which sucks, because that old laptop would probably last her at least 5 years.
 
Concupiscence
Gerbil Elite
Posts: 707
Joined: Tue Sep 25, 2012 7:58 am
Location: Dallas area, Texas, USA
Contact:

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Tue Aug 27, 2019 9:10 pm

Juzu wrote:
I have been trying to optimize my ancient laptop to play back VP9 (i.e. YouTube) on lightweight Linux.

The laptop has a 2GHz Celeron from 2007, with 2GB of DDR2-667. It has some ATI mobile GPU (I don't remember the model at the moment, but that shouldn't matter).

Is it possible to use hardware acceleration in some way to speed up part of the work--VP9 decoding, deblocking, deringing, or up/downscaling (it has a 1440x900 display), for example?

This ancient laptop has an absolutely great display for its age, and because it has 2GB, it would be a really usable Linux laptop for my mother.
The most demanding task she does is watching YouTube videos, which currently play at about 15-30fps, depending on the size of the window or whether I use full screen.

I'm willing to compile everything from scratch, even with the development version of GCC 10, to squeeze out every bit of performance, if I know there is a real chance of getting it working. But I suspect that without some kind of help from the GPU, it's not going to work?

If there are faster software decoders in certain browsers, or possibly even a module that could be compiled into the browser, I might be willing to compromise to get 30fps in YouTube's theater mode. If that doesn't work, I'll have to buy my mother a new laptop. Which sucks, because that old laptop would probably last her at least 5 years.


Your best bet for prolonging that laptop's life is the browser extension h264ify, which will force YouTube to serve H.264 video. That should work fine on the laptop, and while it takes more bandwidth, it comes with the bonus of not draining the battery, stuttering, or making it too hot to sit on your lap.
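All h264ify really does is hide the VP9/AV1 formats so YouTube's player falls back to AVC streams that old fixed-function decoders can handle. A Python sketch of that selection logic, using format dicts shaped like yt-dlp's metadata (the field names here are an assumption, not YouTube's actual API):

```python
def prefer_h264(formats):
    """Pick the highest-resolution AVC (H.264) video format,
    mimicking what h264ify effectively does: skip VP9/AV1 so an
    old CPU can lean on its hardware H.264 decoder.
    `formats` is a list of dicts with 'vcodec' and 'height' keys
    (modeled on yt-dlp's format metadata -- an assumption here)."""
    avc = [f for f in formats if f.get("vcodec", "").startswith("avc1")]
    return max(avc, key=lambda f: f["height"]) if avc else None

formats = [
    {"format_id": "248", "vcodec": "vp9",         "height": 1080},
    {"format_id": "137", "vcodec": "avc1.640028", "height": 1080},
    {"format_id": "136", "vcodec": "avc1.4d401f", "height": 720},
]
print(prefer_h264(formats)["format_id"])  # → 137 (the 1080p AVC stream)
```

Outside the browser, something similar should be achievable with yt-dlp's format selectors, e.g. `yt-dlp -f "bestvideo[vcodec^=avc1]+bestaudio"` piped into mpv, though I'd verify that syntax against the yt-dlp docs for your version.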
Science: Core i9 7940x, 64 gigs RAM, Vega FE, Xubuntu 20.04
Work: Ryzen 5 3600, 32 gigs RAM, Radeon RX 580, Win10 Pro
Tinker: Core i5 2400, 8 gigs RAM, Radeon R9 280x, Xubuntu 20.04 + MS-DOS 7.10

Read me at https://www.wallabyjones.com/
 
MileageMayVary
Gerbil XP
Posts: 370
Joined: Thu Dec 10, 2015 9:18 am
Location: Baltimore

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Wed Aug 28, 2019 10:00 am

setaG_lliB wrote:
486-DX2 @ 80MHz w/ 128MB of RAM (had to max out the RAM to prevent it from completely running out of memory)
Time for Opera to display a single 3840x2160 frame taken from the video (in .jpg format)
26 minutes, 10 seconds

...
-I just realized that the 486 may have had an easier time drawing that massive jpg through a DOS image viewer, instead of through a web browser running on top of Win98. Oops.


:lol: :lol: :lol:
Main rig: Ryzen 3600X, R9 290@1100MHz, 16GB@2933MHz, 1080-1440-1080 Ultrasharps.
 
Concupiscence
Gerbil Elite
Posts: 707
Joined: Tue Sep 25, 2012 7:58 am
Location: Dallas area, Texas, USA
Contact:

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Wed Aug 28, 2019 10:09 am

MileageMayVary wrote:
setaG_lliB wrote:
486-DX2 @ 80MHz w/ 128MB of RAM (had to max out the RAM to prevent it from completely running out of memory)
Time for Opera to display a single 3840x2160 frame taken from the video (in .jpg format)
26 minutes, 10 seconds

...
-I just realized that the 486 may have had an easier time drawing that massive jpg through a DOS image viewer, instead of through a web browser running on top of Win98. Oops.


:lol: :lol: :lol:


Ah, those simpler pre-SIMD, pre-superscalar, low-clocked days of yore. That's just brutal. Hell, I thought it was cruel when this happened.
Science: Core i9 7940x, 64 gigs RAM, Vega FE, Xubuntu 20.04
Work: Ryzen 5 3600, 32 gigs RAM, Radeon RX 580, Win10 Pro
Tinker: Core i5 2400, 8 gigs RAM, Radeon R9 280x, Xubuntu 20.04 + MS-DOS 7.10

Read me at https://www.wallabyjones.com/
 
The Egg
Minister of Gerbil Affairs
Posts: 2938
Joined: Sun Apr 06, 2008 4:46 pm

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Wed Aug 28, 2019 10:18 am

I've spoken about it before, but the first PC I called my own was an AMD 5x86 at 133MHz, which was a 486 on steroids, and it actually had standard PCI slots. Wish I still had it so I could max out the RAM, get as modern a GPU as I could in there, and then screw around.


On the topic of granny's laptop... I'm starting to think this use case would be much better served by an old/low-end iPad in combination with a Bluetooth keyboard.
 
ferit
Gerbil In Training
Posts: 1
Joined: Fri Jan 24, 2020 9:42 pm

Re: Let's take a look at the VP9 decoding performance of 20 different processors!

Fri Jan 24, 2020 9:45 pm

Shouldn't these all be CPU-only, though? I just ran a 4K VP9 video on YouTube on an i5-4460 with no GPU, and usage was between 60-90%, with 3% of frames dropped. Couldn't tell, though; it was very smooth. 16GB RAM and an SSD.
