Exam NCA-GENM Topic 1 Question 165 Discussion

Actual exam question for NVIDIA's NCA-GENM exam
Question #: 165
Topic #: 1
You are deploying a text-to-speech application using NVIDIA Riva. The application needs to handle a large volume of concurrent requests with minimal latency. Which of the following Riva deployment configurations would be MOST appropriate?

Suggested Answer: C

For high-throughput, low-latency applications, deploying Riva across multiple GPUs with Triton Inference Server is the best fit. Triton supports dynamic batching, which groups incoming requests into a single inference call to raise GPU utilization, and it can run model instances on multiple GPUs to absorb increased load. Riva exposes a gRPC API to clients and uses Triton as its inference backend.
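To make the batching idea above concrete, here is a minimal Python sketch of how requests could be grouped and spread across GPUs. It is a toy simulation, not Triton's or Riva's actual scheduler: the names `Request`, `dynamic_batches`, `assign_to_gpus`, `max_batch_size`, and `max_queue_delay_ms` are all hypothetical, and the round-robin GPU assignment is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """A hypothetical TTS request, identified by id and arrival time."""
    request_id: int
    arrival_ms: float


def dynamic_batches(
    requests: List[Request],
    max_batch_size: int,
    max_queue_delay_ms: float,
) -> List[List[Request]]:
    """Group requests into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_queue_delay_ms after the batch's first request.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        is_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_ms - current[0].arrival_ms > max_queue_delay_ms
        )
        if current and (is_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


def assign_to_gpus(num_batches: int, num_gpus: int) -> List[int]:
    """Assign each batch to a GPU index in round-robin order (an assumption)."""
    return [i % num_gpus for i in range(num_batches)]


# Example: four requests arrive in a burst, two more arrive later.
arrivals = [0.0, 1.0, 2.0, 3.0, 50.0, 51.0]
requests = [Request(i, t) for i, t in enumerate(arrivals)]
batches = dynamic_batches(requests, max_batch_size=3, max_queue_delay_ms=5.0)
gpu_for_batch = assign_to_gpus(len(batches), num_gpus=2)
```

The trade-off the sketch exposes is the one the explanation relies on: a larger batch size or a longer queue delay improves GPU utilization but adds waiting time to each request, so both limits have to be tuned against the latency target.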

by Phoenix at Nov 05, 2025, 06:06 AM
