InferX | Serverless Inference Platform

Model CR-70B

Namespace	Model Name	Type	Standby GPU	Standby Pageable	Standby Pinned Memory	GPU Count	vRam (MB)	CPU	Memory (MB)	State	Revision
ActionAnalytics	CR-70B	text2text	Mem	File	File	4	71000	20.0	80000	Normal	79

Pods

Tenant	Namespace	Pod Name	State	Required Resource	Allocated Resource	GPU
public	ActionAnalytics	public/ActionAnalytics/CR-70B/79/137	Ready	CPU 20000 Mem 80000 CacheMem 0 GPU Type Any GPU Count 4 GPU vRam 71000 GPU Contexts 0	Node Name g8398d4 CPU 20000 Memory 78976 Cache Memory 1024	GPU Type NVIDIA H100 80GB HBM3 vRam 71168 Slot Size 268435456 Total Slot Count 285 Max Context Per GPU 1
public	ActionAnalytics	public/ActionAnalytics/CR-70B/79/142	Standby	CPU 20000 Mem 80000 CacheMem 0 GPU Type Any GPU Count 4 GPU vRam 71000 GPU Contexts 0	Node Name g8398d4 CPU 0 Memory 0 Cache Memory 0	GPU Type NVIDIA H100 80GB HBM3 vRam 0 Slot Size 268435456 Total Slot Count 285 Max Context Per GPU 1
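
The figures in the Ready pod row can be cross-checked with a little arithmetic. The sketch below assumes, based only on the numbers shown on this page, that GPU memory is allocated in whole slots of "Slot Size" bytes and that a request is rounded up to the next slot boundary. That rounding rule is an inference, not something the page states. The variable names are illustrative.

```python
import math

# Values copied from the Pods table above.
SLOT_SIZE_BYTES = 268435456      # "Slot Size"
TOTAL_SLOT_COUNT = 285           # "Total Slot Count"
REQUESTED_VRAM_MB = 71000        # "GPU vRam" under Required Resource
ALLOCATED_VRAM_MB = 71168        # "vRam" reported for the Ready pod
REQUESTED_MEM_MB = 80000         # "Mem" under Required Resource
ALLOCATED_MEM_MB = 78976         # "Memory" under Allocated Resource
ALLOCATED_CACHE_MB = 1024        # "Cache Memory" under Allocated Resource

# One slot expressed in MiB.
slot_mb = SLOT_SIZE_BYTES // (1024 * 1024)

# Assumption: the request is rounded up to a whole number of slots.
slots_needed = math.ceil(REQUESTED_VRAM_MB / slot_mb)
rounded_vram_mb = slots_needed * slot_mb

# Upper bound implied by the total slot count.
total_slot_mb = TOTAL_SLOT_COUNT * slot_mb

print(slot_mb, slots_needed, rounded_vram_mb, total_slot_mb)
print(rounded_vram_mb == ALLOCATED_VRAM_MB)
print(ALLOCATED_MEM_MB + ALLOCATED_CACHE_MB == REQUESTED_MEM_MB)
```

Under that assumption the rounded request matches the allocated vRam shown for the Ready pod, and the allocated memory plus cache memory adds up to the requested 80000 MB. The Standby pod reports zero for all allocated values.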

Logs

Tenant	Namespace	Model Name	Revision	ID	Node Name	Create Time	Exit Info	State
public	ActionAnalytics	CR-70B	79	82	g8398d4	2026-03-01 22:47:46	None	log

Snapshot History

Tenant	Namespace	Model Name	Revision	Node Name	State	Detail	Update Time
public	ActionAnalytics	CR-70B	79	g8398d4	Scheduled	Scheduled	2026-03-01 22:36:58
public	ActionAnalytics	CR-70B	79	g8398d4	Done	Done	2026-03-01 22:47:46

Model Spec

{
    "image": "vllm/vllm-openai:v0.9.0",
    "commands": [
        "--model",
        "ActionAnalytics/CR-70B",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.95",
        "--max-model-len",
        "15360",
        "--tensor-parallel-size=4"
    ],
    "resources": {
        "GPU": {
            "Count": 4,
            "vRam": 71000
        }
    },
    "envs": [
        [
            "HF_HUB_OFFLINE",
            "1"
        ],
        [
            "TRANSFORMERS_OFFLINE",
            "1"
        ]
    ],
    "sample_query": {
        "apiType": "text2text",
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
        "prompts": [
            "Write a Python function that computes Fibonacci numbers. Explain time complexity.",
            "Translate the following Chinese text to English: \u4eca\u5929\u5929\u6c14\u5f88\u597d\u3002",
            "Explain general relativity in simple language.",
            "Write a legal contract clause about liability and indemnification.",
            "Summarize the plot of a fantasy novel involving dragons.",
            "Solve this calculus integral: \u222b x^3 log(x) dx",
            "Generate a JSON schema describing a user profile.",
            "Explain why emojis like \ud83d\ude00\ud83d\udd25\ud83d\ude80 represent byte-level tokens."
        ],
        "dataUrl": "",
        "body": {
            "max_tokens": "1000",
            "model": "ActionAnalytics/CR-70B",
            "stream": "true",
            "temperature": "0"
        },
        "loadingTimeout": 90
    }
}
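
As a rough illustration of how the "sample_query" fields could be turned into a completions request, the sketch below assembles a URL and a JSON payload from the "path", "prompt", and "body" values in the spec. Several things here are assumptions: the base URL is a placeholder (the page does not show the platform's actual endpoint), the helper names `coerce` and `build_request` are hypothetical, and converting the string-encoded values ("1000", "true", "0") to typed values assumes the receiving endpoint expects an integer, a boolean, and a number, as OpenAI-compatible completions APIs do. The platform may already perform this conversion itself.

```python
import json

# Subset of the "sample_query" object from the Model Spec above.
SAMPLE_QUERY = {
    "path": "v1/completions",
    "prompt": "write a quick sort algorithm.",
    "body": {
        "max_tokens": "1000",
        "model": "ActionAnalytics/CR-70B",
        "stream": "true",
        "temperature": "0",
    },
}


def coerce(value):
    """Parse a string-encoded JSON scalar; return other values unchanged."""
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        # Not valid JSON (e.g. a model name) or not a string: keep as-is.
        return value


def build_request(base_url, sample_query):
    """Return (url, payload) for one call. base_url is a hypothetical placeholder."""
    url = base_url.rstrip("/") + "/" + sample_query["path"].lstrip("/")
    payload = {key: coerce(val) for key, val in sample_query["body"].items()}
    payload["prompt"] = sample_query["prompt"]
    return url, payload


# Placeholder host and port; substitute the real endpoint for the deployment.
url, payload = build_request("http://localhost:8000", SAMPLE_QUERY)
print(url)
print(json.dumps(payload, indent=2))
```

The sketch only builds the request and prints it. Sending it, and handling the streamed response that "stream": true would produce, is left out because the page gives no details about the endpoint or authentication.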
