How to test a Tesla card?

Wed Aug 30, 2017 4:32 pm

I've been battling a problem with a Tesla K40c card and a laser-based flow cytometer for a client. The analysis software can use the Tesla card to speed up plotting.

We paired the Tesla card with a Dell Precision T7910, as Dell specifically said it was compatible.

All seemed good, so we got it on-site and installed the software. While using the software we realized it wasn't using the Tesla card: it was slow, and the software reports which engine is in use. It was suggested we downrev the CUDA toolkit to 7.5. I did a fresh install of the driver that includes this version and rebooted. After the reboot, Device Manager showed that the Tesla card couldn't start, specifically:

This device cannot start. (Code 10)

Insufficient system resources exist to complete the API

This was not the case before I tried installing the different driver version. Nothing I did seemed to resolve it except switching PCIe slots. This Dell has 2 full x16 slots; the other is populated with an el cheapo Quadro card. Whenever I mess with the drivers this happens again, and I can fix it by removing the card, booting up again, shutting down, and putting the card back in. Then it shows up fine in Device Manager.

I mention this because it leaves open the possibility of a compatibility issue or a problem with the card itself, and I'd like to find a way to rule that out.

I can force the software to try to use the Tesla card, but if I do, the program crashes. The vendor is looking into the problem, but they are taking forever, and they swear they've never once had a problem in hundreds of other deployments.

Since the Tesla card doesn't have display capability, testing it is a bit interesting. Does anyone know of a way to see if the card is functioning as intended?
