InferX — Serverless GPU Inference Platform for Production Workloads

Model: translategemma-27b-it-FP8-Dynamic

Namespace: Trial
Model Name: translategemma-27b-it-FP8-Dynamic
Type: text2text
Standby GPU: File
Standby Pageable: File
Standby Pinned Memory: File
GPU Count: 1
vRam (MB): 32000
CPU: 20.0
Memory (MB): 80000
State: Normal
Revision: 112

Sample Rest Call
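A minimal sketch of calling the deployed model over REST. The gateway host, route, and payload fields below are assumptions (the actual InferX API is not shown on this page); only the namespace, model name, and revision come from the tables on this page.

```python
import json
import urllib.request

# Hypothetical endpoint: the real InferX gateway host and route are
# assumptions, as is the payload shape.
ENDPOINT = "http://inferx.example.com/v1/Trial/translategemma-27b-it-FP8-Dynamic/generate"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request for the text2text model (payload fields assumed)."""
    payload = {
        "namespace": "Trial",
        "model": "translategemma-27b-it-FP8-Dynamic",
        "revision": 112,  # deployed revision shown in the model table above
        "prompt": prompt,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Translate to French: Hello, world!")
# To actually send it: urllib.request.urlopen(req)
```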

Pods

Tenant: public
Namespace: Trial
Pod Name: public/Trial/translategemma-27b-it-FP8-Dynamic/112/138
State: Ready
Required Resource:
    CPU: 20000
    Mem: 80000
    CacheMem: 0
    GPU Type: Any
    GPU Count: 1
    GPU vRam: 32000
    GPU Contexts: 0
Allocated Resource:
    Node Name: g8398d4
    CPU: 20000
    Memory: 78976
    Cache Memory: 1024
GPU:
    GPU Type: NVIDIA H100 80GB HBM3
    vRam: 32000
    Slot Size: 268435456
    Total Slot Count: 285
    Max Context Per GPU: 1

Tenant: public
Namespace: Trial
Pod Name: public/Trial/translategemma-27b-it-FP8-Dynamic/112/144
State: Standby
Required Resource:
    CPU: 20000
    Mem: 80000
    CacheMem: 0
    GPU Type: Any
    GPU Count: 1
    GPU vRam: 32000
    GPU Contexts: 0
Allocated Resource:
    Node Name: g8398d4
    CPU: 0
    Memory: 0
    Cache Memory: 0
GPU:
    GPU Type: NVIDIA H100 80GB HBM3
    vRam: 0
    Slot Size: 268435456
    Total Slot Count: 285
    Max Context Per GPU: 1
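The GPU entries above report a Slot Size of 268435456 bytes (256 MiB) and 285 total slots on the H100. A quick sketch of the implied slot arithmetic, assuming the vRam figures are in MiB (the page does not state the unit, but the numbers are consistent with it):

```python
# Slot accounting implied by the pod listing above.
# Assumption: the vRam figure (32000) is MiB; the unit is not stated on the page.
SLOT_SIZE_BYTES = 268_435_456                       # from the listing
SLOT_SIZE_MIB = SLOT_SIZE_BYTES // (1024 * 1024)    # 256 MiB per slot
TOTAL_SLOTS = 285                                   # slots exposed on the H100 80GB

# Total slot-managed vRAM on the card (the remainder of the 81920 MiB
# is presumably held back by the platform).
managed_vram_mib = TOTAL_SLOTS * SLOT_SIZE_MIB

# Slots consumed by this model's 32000 MiB reservation (ceiling division).
model_vram_mib = 32_000
slots_needed = -(-model_vram_mib // SLOT_SIZE_MIB)

print(SLOT_SIZE_MIB, managed_vram_mib, slots_needed)  # 256 72960 125
```

So the Ready pod's 32000 MiB reservation maps onto exactly 125 of the 285 slots, leaving the rest available for other contexts on the same card.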

Logs

Tenant  Namespace  Model Name                         Revision  Id   Node Name  Create Time          Exit Info  State
public  Trial      translategemma-27b-it-FP8-Dynamic  112       115  g8398d4    2026-03-01 23:07:06  None       log

Snapshot History

Tenant  Namespace  Model Name                         Revision  Node Name  State      Detail     Update Time
public  Trial      translategemma-27b-it-FP8-Dynamic  112       g8398d4    Scheduled  Scheduled  2026-03-01 22:58:59
public  Trial      translategemma-27b-it-FP8-Dynamic  112       g8398d4    Done       Done       2026-03-01 23:07:06
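The two snapshot rows above bracket the time from scheduling to completion for revision 112; the elapsed time can be computed directly from the timestamps:

```python
from datetime import datetime

# Timestamps taken from the Snapshot History rows above.
scheduled = datetime.fromisoformat("2026-03-01 22:58:59")
done = datetime.fromisoformat("2026-03-01 23:07:06")

elapsed = done - scheduled
print(elapsed)  # 0:08:07 -> snapshotting took 8 minutes 7 seconds
```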

Model Spec


Policy