Benchmark

LLM latency baseline on March 30, 2026

This page records a short public request against the OpenAI-compatible chat endpoint on `probqa.com`. The purpose is to show the live request path, token accounting, and end-to-end latency for a small completion, not to claim maximum throughput.

Request

Public API call used for the baseline

curl https://probqa.com/v1/chat/completions \
  -H "Authorization: Bearer cgs_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [
      {"role": "system", "content": "Be concise and useful."},
      {"role": "user", "content": "Give a one-sentence description of an online SAT solver."}
    ],
    "max_tokens": 96
  }'
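If you would rather drive the same request from Python, here is a minimal stdlib-only sketch. The base URL, model, headers, and placeholder key are taken from the curl example above; the timing helper is an assumption about how you might capture end-to-end latency yourself, not a description of how the numbers on this page were measured.

```python
import json
import time
import urllib.request

BASE_URL = "https://probqa.com/v1/chat/completions"  # same endpoint as the curl above
API_KEY = "cgs_live_your_key"  # placeholder key, as in the curl example


def build_payload(user_prompt, max_tokens=96):
    """Mirror the request body from the curl example."""
    return {
        "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
        "messages": [
            {"role": "system", "content": "Be concise and useful."},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
    }


def timed_completion(payload):
    """POST the payload and return (parsed response JSON, end-to-end seconds)."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body, time.perf_counter() - start
```

With a real key, `timed_completion(build_payload("Give a one-sentence description of an online SAT solver."))` reproduces the request above and reports wall-clock latency for the full round trip.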

Result

Measured output

Transport

HTTP status: `200`
Response size: `778 bytes`

Usage

Prompt tokens: `32`
Completion tokens: `38`
Total tokens: `70`
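These figures are read directly from the `usage` object in the response body. Assuming the endpoint follows the OpenAI chat-completions response schema (as the curl example implies), a small helper to pull them out might look like:

```python
def summarize_usage(response):
    """Extract token accounting from an OpenAI-compatible chat response.

    Expects the standard `usage` object with prompt_tokens,
    completion_tokens, and total_tokens fields.
    """
    u = response["usage"]
    # Sanity check: total should equal prompt + completion.
    assert u["prompt_tokens"] + u["completion_tokens"] == u["total_tokens"]
    return u["prompt_tokens"], u["completion_tokens"], u["total_tokens"]
```

For the response recorded on this page, this returns `(32, 38, 70)`.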

Preview

“An online SAT (Satisfiability) solver is a web-based tool that takes a Boolean satisfiability problem as input and returns a solution or determines that the problem is unsatisfiable.”

Interpretation

Why this matters to buyers

The request completed with HTTP `200`, and the 38-token completion came in well under the 96-token budget, with token accounting reported in the standard `usage` object. Because the endpoint accepts the unmodified OpenAI chat-completions request shape, existing clients and SDKs should only need a base URL and API key change to run the same workload, with no custom client code.

Next Step

Use the same curl shape on your own workload

Trial credits let you test the exact OpenAI-compatible surface with your own prompts, your own system messages, and your own token budgets.