I have a question for people who are knowledgeable about Pascal's FP16 rate on consumer GPUs. I know that GP100 can run two FP16 ops in the same time as one FP32 op, but as far as I'm aware GP102, GP104, GP106 and GP108 cannot: their FP16 rate is 1/64 of their FP32 rate.
My question is: is this a hardware limitation or a software one? I was talking to someone about it and they seem to think that consumer Pascal GPUs can do a 2:1 FP16:FP32 rate but Nvidia just hasn't enabled it in the drivers. I note that even the Tesla P40 cannot do 2x FP16.
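
To put the two ratios in numbers, here is a rough sketch of what they would imply for peak FP16 throughput. The 10 TFLOPS FP32 figure is a made-up round number for illustration, not the spec of any real card, and the function name `fp16_tflops` is just something I picked.

```python
from fractions import Fraction

# FP16:FP32 throughput ratios as described in the post.
FP16_RATIO = {
    "GP100 (2:1)": Fraction(2, 1),
    "GP102/104/106/108 (1:64)": Fraction(1, 64),
}


def fp16_tflops(fp32_tflops, ratio):
    """Peak FP16 throughput implied by an FP32 peak and an FP16:FP32 ratio."""
    return float(fp32_tflops * ratio)


if __name__ == "__main__":
    fp32 = 10  # hypothetical FP32 peak in TFLOPS, for illustration only
    for name, ratio in FP16_RATIO.items():
        print(f"{name}: {fp16_tflops(fp32, ratio):.3f} TFLOPS FP16")
```

At the same FP32 peak, the two ratios differ by a factor of 128, which is why it matters whether the limit is in the silicon or in the driver.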