Chrispy_ wrote:Same again on the Pitcairn; it doesn't seem to wake the GPU from its low-power state for some reason...
GPU selected Pitcairn
Op = Add
Time 1.92994ms GOps/s 34.7725
Op = Mul
Time 3.76447ms GOps/s 17.8269
Op = Fma
Time 3.71207ms GOps/s 18.0786
Chrispy_ wrote:...and it won't run on an Optimus laptop, regardless of whether I force Intel or NVIDIA graphics.
"Device ID is not in range"
biffzinker wrote:Not working for me; it shows "device id not in range."
Edit: Tried running it on a GeForce GTX 560 2GB
Device selected Cypress
Op = Add
Time 1.39822ms GOps/s 47.9958
Op = Mul
Time 2.1868ms GOps/s 30.6882
Op = Fma
Time 2.22738ms GOps/s 30.1291
Device selected Cypress
Op = Add
Time 0.476667ms GOps/s 140.788
Op = Mul
Time 0.611111ms GOps/s 109.814
Op = Fma
Time 0.650222ms GOps/s 103.209
Orwell wrote:Tried running it on a 5850 with Catalyst 13.4:
The program was unable to start correctly (0xc000007b). Click OK to close the application.
Okay.
Orwell wrote::P
I blindly increased the problem size at line 60 to 4096*4096 (from 1024*16, I believe) and it looks much better now:
GPU selected Cypress
Op = Add
Time 145.021ms GOps/s 473.859
Op = Mul
Time 284.319ms GOps/s 241.698
Op = Fma
Time 284.232ms GOps/s 241.773
Compiled with GCC 4.7.1 x64 bundled with this: http://sourceforge.net/projects/orwelldevcpp/, using the OpenCL libraries provided by AMD.
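As a sanity check on the problem-size change, the reported numbers can be turned back into an operation count. This is only a sketch: it assumes the printed GOps/s is simply total operations divided by elapsed time, which is an inference from the figures in this thread rather than something read from the source. The helper name `implied_ops` is made up for illustration.

```python
import math

def implied_ops(time_ms: float, gops_per_s: float) -> float:
    """Total operations implied by a reported time and rate (assumes rate = ops / time)."""
    return (time_ms * 1e-3) * (gops_per_s * 1e9)

# Figures copied from the thread:
# Add on Pitcairn (original small problem size) and Add on Cypress (4096*4096).
small = implied_ops(1.92994, 34.7725)
large = implied_ops(145.021, 473.859)

print(f"small run: {small:.4g} ops, about 2^{math.log2(small):.2f}")
print(f"large run: {large:.4g} ops, about 2^{math.log2(large):.2f}")
print(f"work ratio: {large / small:.1f}")
```

Under that assumption the small run does roughly 2^26 operations and finishes in under 2 ms, so fixed launch overhead and clock ramp-up plausibly dominate the measurement. The larger run does about 1024 times more work, which matches going from 1024*16 to 4096*4096.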
Orwell wrote:Here's the end result for my 5850:
This proves the add part in FMA is basically free on this card. Yay. Here's the source code:
http://wilcobrouwer.nl/bestanden/MaxFLOPS%20Orwell.7z
So, yes, I was bored.
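To put a number on the "basically free" claim, here is a quick check using the Cypress figures quoted earlier in the thread (Mul 284.319 ms, Fma 284.232 ms). Counting one FMA as two floating-point operations is a common convention and is an assumption here; judging by the printed rates, the program itself seems to count it as a single op.

```python
# Timings copied from the Cypress run with the 4096*4096 problem size.
mul_ms, fma_ms = 284.319, 284.232
fma_gops = 241.773  # reported GOps/s for Fma

# Relative time difference between Fma and Mul, in percent.
rel_diff_pct = (fma_ms - mul_ms) / mul_ms * 100.0

# Assumption: one FMA = 2 floating-point operations (multiply + add).
fma_gflops = 2.0 * fma_gops

print(f"Fma vs Mul time difference: {rel_diff_pct:+.3f}%")
print(f"Fma throughput counted as 2 flops each: {fma_gflops:.1f} GFLOPS")
```

The two timings differ by well under a tenth of a percent, which is what you would expect if the extra add costs nothing measurable.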
biffzinker wrote:
D:\>maxflopscl
Device selected GeForce GTX 560
Device compute units: 7
Device extensions:
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Testing DP performance
Op = Add
Time 157.156ms GOps/s 54.6588
Op = Mul
Time 157.154ms GOps/s 54.6593
Op = Fma
Time 157.637ms GOps/s 54.4918
D:\>
560 running 985/1970
JustAnEngineer wrote:Radeon HD7950
Device selected Tahiti
Device compute units: 28
Device extensions:
cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_image2d_from_buffer
FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Testing DP performance
Op = Add
Time 165.452ms GOps/s 830.686
Op = Mul
Time 164.605ms GOps/s 417.48
Op = Fma
Time 164.678ms GOps/s 417.296
maxflopscl >max.txt
Device selected Cayman
Device compute units: 24
Device extensions:
cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_amd_image2d_from_buffer_read_only
FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Testing DP performance
Op = Add
Time 104.77ms GOps/s 655.91
Op = Mul
Time 103.034ms GOps/s 333.479
Op = Fma
Time 103.164ms GOps/s 333.058
Device selected GeForce GTX 460
Device compute units: 7
Device extensions:
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Testing DP performance
Op = Add
Time 107.804ms GOps/s 39.8405
Op = Mul
Time 107.73ms GOps/s 39.868
Op = Fma
Time 107.988ms GOps/s 39.7726
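Since the results are being pasted as raw console output, a small parser can make them easier to compare side by side. This is a sketch that assumes the `Device selected ...` / `Op = ...` / `Time ...ms GOps/s ...` layout seen in the posts above; `parse_results` is a hypothetical helper, not part of maxflopscl.

```python
import re

# Patterns for the three line types that carry results in the pasted output.
DEVICE_RE = re.compile(r"^(?:Device|GPU) selected (.+)$")
OP_RE = re.compile(r"^Op = (\w+)$")
TIME_RE = re.compile(r"^Time ([\d.]+)ms GOps/s ([\d.]+)$")

def parse_results(text):
    """Return a list of (device, op, time_ms, gops) tuples from pasted output."""
    rows, device, op = [], None, None
    for line in text.splitlines():
        line = line.strip()
        if m := DEVICE_RE.match(line):
            device = m.group(1)
        elif m := OP_RE.match(line):
            op = m.group(1)
        elif (m := TIME_RE.match(line)) and op is not None:
            rows.append((device, op, float(m.group(1)), float(m.group(2))))
            op = None  # each Time line belongs to the Op line just before it
    return rows

# Sample copied from the GeForce GTX 460 post (extension list omitted).
sample = """Device selected GeForce GTX 460
Device compute units: 7
Testing DP performance
Op = Add
Time 107.804ms GOps/s 39.8405
Op = Mul
Time 107.73ms GOps/s 39.868
Op = Fma
Time 107.988ms GOps/s 39.7726"""

for device, op, time_ms, gops in parse_results(sample):
    print(f"{device:18s} {op:4s} {time_ms:9.3f} ms {gops:9.3f} GOps/s")
```

Lines that match none of the patterns, such as the extension list, are simply skipped, so whole posts can be pasted in unchanged.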