Exam NCA-GENM Topic 1 Question 165 Discussion
Actual exam question for NVIDIA's NCA-GENM exam
Question #: 165
Topic #: 1
You are deploying a text-to-speech application using NVIDIA Riva. The application needs to handle a large volume of concurrent requests with minimal latency. Which of the following Riva deployment configurations would be MOST appropriate?
Suggested Answer: C
For high-throughput, low-latency applications, deploying Riva across multiple GPUs with Triton Inference Server is the best fit. Triton's dynamic batching groups incoming requests into batches to maximize GPU utilization, and serving across multiple GPUs lets the deployment scale to handle increased load. Riva leverages gRPC to communicate with Triton.
by Phoenix at Nov 05, 2025, 06:06 AM
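
To make the dynamic batching idea in the explanation above concrete, here is a minimal Python sketch. It is a toy simulation, not Triton's scheduler and not a Riva API: the `Request` type, `form_batches`, `assign_round_robin`, and the parameters `max_batch_size` and `max_delay_ms` are all hypothetical names chosen for illustration. The assumption is that a batch closes when it is full or when the next request arrives too long after the batch's first request, and that closed batches are spread over GPUs in round-robin order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    request_id: int
    arrival_ms: float  # time at which the request reached the server


def form_batches(requests: List[Request], max_batch_size: int,
                 max_delay_ms: float) -> List[List[int]]:
    """Group requests into batches (hypothetical policy, for illustration).

    The open batch is closed when it is full, or when the next request
    arrives more than max_delay_ms after the batch's first request.
    """
    batches: List[List[int]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_ms - current[0].arrival_ms > max_delay_ms
        )
        if current and (batch_full or waited_too_long):
            batches.append([r.request_id for r in current])
            current = []
        current.append(req)
    if current:
        batches.append([r.request_id for r in current])
    return batches


def assign_round_robin(batches: List[List[int]],
                       num_gpus: int) -> Dict[int, List[List[int]]]:
    """Spread batches over GPUs in round-robin order (assumed policy)."""
    return {
        gpu: [b for i, b in enumerate(batches) if i % num_gpus == gpu]
        for gpu in range(num_gpus)
    }


if __name__ == "__main__":
    arrivals = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
    reqs = [Request(i, t) for i, t in enumerate(arrivals)]
    batches = form_batches(reqs, max_batch_size=4, max_delay_ms=5.0)
    print(batches)                         # [[0, 1, 2, 3], [4, 5]]
    print(assign_round_robin(batches, 2))  # {0: [[0, 1, 2, 3]], 1: [[4, 5]]}
```

In this sketch six requests become two GPU calls instead of six, which is the utilization gain the explanation refers to. The delay bound is what keeps latency in check: a larger `max_delay_ms` tends to produce fuller batches at the cost of longer waits for the earliest request in each batch.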