Nvidia improves virtual graphics monitoring in its latest Grid update

— 8:26 AM on August 24, 2016

Nvidia first took the wraps off its Tesla M10 high-density server graphics card (can we call them virtualizable GPUs or vGPUs yet?) back in May, and it's now shipping those cards in volume. The company says system administrators will soon be able to get Tesla M10s in 23 systems from its major hardware partners, including Dell, Asus, Gigabyte, and ASRock Rack, and that number should only grow from here on out. To help sysadmins cope with the hundreds of virtual users that might soon reside on M10s within an organization, Nvidia is also updating its Grid software platform with deeper performance profiling and analytics tools for planning, deployment, and support of virtual GPU users.

Nvidia's improved management tools address both host (server) management and virtual client monitoring. With the new Grid software, admins will be able to get information about the number of virtual graphics instances in use and the number they can potentially create. They'll also be able to see usage information for the stream processors on board each card, the percentage of the card's frame buffer that's in use, and the load on each card's dedicated video encode and decode hardware. For each guest vGPU instance, admins can now get information on encoder and decoder usage, frame buffer occupancy, and the vGPU utilization.

The updated Grid platform isn't just about monitoring. Nvidia says vGPU provisioning today is to some degree a matter of guesswork, and it thinks the data it's exposing about vGPU usage through Grid will let system administrators tailor their virtual user profiles better when they're planning a vGPU deployment. That data-driven planning could help prevent admins from giving too big a slice of the vGPU to folks in accounting, for example, and it might also avoid under-allocation of a card for CAD or pro graphics users. That more precise provisioning could help organizations keep vGPU deployment costs down.
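To make the provisioning idea concrete, here is a minimal sketch of how observed frame buffer data might drive profile selection. The profile sizes, the 20% headroom, and the function name are all hypothetical choices for illustration; the actual vGPU profiles on offer depend on the card and the Grid release, and Nvidia hasn't described any particular sizing algorithm.

```python
# Hypothetical frame buffer sizes (MiB) for the vGPU profiles on offer.
# Real profile names and sizes depend on the card and the Grid release.
PROFILE_SIZES_MIB = [512, 1024, 2048, 4096, 8192]


def pick_profile_mib(peak_fb_samples_mib, headroom=0.2, sizes=PROFILE_SIZES_MIB):
    """Return the smallest profile that fits the observed peak plus headroom.

    peak_fb_samples_mib: observed frame buffer usage samples for one user class.
    headroom: assumed safety margin (0.2 means 20% above the observed peak).
    Returns None when no listed profile is large enough.
    """
    if not peak_fb_samples_mib:
        raise ValueError("need at least one usage sample")
    needed = max(peak_fb_samples_mib) * (1 + headroom)
    for size in sorted(sizes):
        if size >= needed:
            return size
    return None
```

With these assumed numbers, a user class that peaks at 410 MiB would land in the 512 MiB profile, while one that peaks at 1500 MiB would need the 2048 MiB profile.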

With this data, Nvidia thinks admins will also enjoy the ability to monitor the way users are actually using vGPUs in production. That information will presumably allow admins to fine-tune their user profiles in deployment and make sure that each user class is getting the resources it needs for the best performance. Better provisioning could reduce user support calls, as well, but if a support incident should arise, that same data may also let an organization's helpdesk figure out the causes of performance issues and fix them faster. Nvidia thinks those operational improvements will also help lower costs.

Nvidia isn't keeping Grid monitoring data inside a walled garden, either. The company says that the new performance data exposed by Grid will immediately be available through Windows tools like Perfmon and through Nvidia's own nvidia-smi utility, and it'll also integrate with monitoring tools from eG Innovations, Lakeside, and Liquidware Labs. Organizations with their own proprietary monitoring tools will also be able to grab data from Grid using the platform's SDK.
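For shops that want to script against this data, a minimal sketch of polling Nvidia's SMI command-line utility (`nvidia-smi`) from Python might look like the following. The `--query-gpu` fields used here are from the general-purpose `nvidia-smi` CLI and report on physical GPUs; whether the new per-vGPU counters are exposed through the same query interface in this Grid release is an assumption we haven't verified. The function names are our own.

```python
import csv
import io
import subprocess

# Host-level fields from the general nvidia-smi query interface.
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into a list of per-GPU dicts."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, util, used, total = (field.strip() for field in record)
        used_mib, total_mib = int(used), int(total)
        gpus.append({
            "index": int(index),
            "gpu_util_percent": int(util),
            "fb_used_mib": used_mib,
            "fb_total_mib": total_mib,
            # Frame buffer occupancy as a percentage of the total.
            "fb_percent": 100.0 * used_mib / total_mib,
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the host and return parsed per-GPU metrics."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

This sketch assumes every field comes back as a number; on hardware where a counter isn't reported, the parser would need to handle a non-numeric placeholder instead.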

Both Tesla M10 servers and the August 2016 Grid software update should be available immediately.

