Route each user prompt through a prefill engine to compute attention tables, then feed the resulting activations to a decode engine so that generation meets the throughput target.
Connects inference engines and routes token data streams from inputs to outputs.
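The two-stage routing described above can be sketched as follows. This is a minimal illustration, not an actual engine API: `PrefillEngine`, `DecodeEngine`, `Router`, and `PrefillResult` are hypothetical names, the arithmetic stands in for real model computation, and the throughput target is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefillResult:
    """Hypothetical container for what the prefill stage hands to decode."""
    prompt_tokens: List[int]
    cache: List[int]  # stand-in for the per-token attention tables


class PrefillEngine:
    """Toy prefill stage: processes the whole prompt in one pass."""

    def run(self, prompt_tokens: List[int]) -> PrefillResult:
        # Placeholder computation: one cache entry per prompt token.
        cache = [t * 2 for t in prompt_tokens]
        return PrefillResult(prompt_tokens=list(prompt_tokens), cache=cache)


class DecodeEngine:
    """Toy decode stage: emits one token per step and extends the cache."""

    def run(self, state: PrefillResult, max_new_tokens: int) -> List[int]:
        cache = list(state.cache)  # copy so the prefill result is not mutated
        output: List[int] = []
        for _ in range(max_new_tokens):
            # Placeholder for a model step: derive the next token from the cache.
            next_token = sum(cache) % 100
            output.append(next_token)
            cache.append(next_token * 2)
        return output


class Router:
    """Connects the two engines and moves data from input to output."""

    def __init__(self, prefill: PrefillEngine, decode: DecodeEngine) -> None:
        self.prefill = prefill
        self.decode = decode

    def handle(self, prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
        state = self.prefill.run(prompt_tokens)          # stage 1: prefill
        return self.decode.run(state, max_new_tokens)    # stage 2: decode


if __name__ == "__main__":
    router = Router(PrefillEngine(), DecodeEngine())
    print(router.handle([1, 2, 3], max_new_tokens=2))
```

Keeping the two stages behind separate objects means each one could, in principle, be scheduled or scaled independently, which is the usual motivation for splitting prefill from decode.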