# Pinstripes

> Pinstripes is an OpenAI-compatible LLM inference API with no rate limits and no per-account concurrency caps. You can run hundreds of agents simultaneously and each gets its own dedicated throughput.

Homepage: https://pinstripes.io
API endpoint: https://api.pinstripes.io/v1
Docs: https://pinstripes.io/docs
Models: https://pinstripes.io/models
Benchmarks: https://pinstripes.io/benchmarks
Pricing: https://pinstripes.io/pricing

---

## What Pinstripes is

Pinstripes provides inference on open-weight models via an API that is fully compatible with the OpenAI SDK. Switching requires changing two lines: `base_url` and `api_key`. Streaming, function calling, structured output, and all standard OpenAI client features work without modification.

All models are served with no rate limits. Spinning up 500 parallel agents costs no more per token than running one. Pricing is pay-per-token with no minimum, no subscription, and no seat fees.

Infrastructure runs on cloud GPU nodes in the EU and USA only. Requests are processed and immediately deleted — nothing is logged, stored, or used for training.

---

## Models and Pricing (as of June 2026)

Prices in USD per 1 million tokens.

| Model              | Provider     | Model ID                    | Input   | Cached  | Output  | Speed      | Context |
|--------------------|--------------|-----------------------------|---------|---------|---------| -----------|---------|
| Qwen3.6 35B        | Qwen/Alibaba | ps/qwen3.6-35b-a3b-dflash   | $0.140  | $0.040  | $0.450  | 244 tok/s  | 131K    |
| GLM 4.5 Air        | Zhipu AI     | ps/glm-4.5-air              | $0.120  | $0.035  | $0.450  | 244 tok/s  | 131K    |
| Qwen3 30B          | Qwen/Alibaba | ps/qwen3-30b-a3b            | $0.090  | $0.025  | $0.200  | 183 tok/s  | 131K    |
| GPT-OSS 120B       | OpenAI       | ps/gpt-oss-120b             | $0.040  | $0.010  | $0.400  | 183 tok/s  | 128K    |
| Qwen3 Coder 30B    | Qwen/Alibaba | ps/qwen3-coder-30b-a3b      | $0.300  | $0.090  | $0.600  | 156 tok/s  | 131K    |
| MiniMax M2.7       | MiniMax      | ps/minimax-m2.7             | $0.260  | $0.080  | $0.550  | 156 tok/s  | 1M      |

---

## Benchmarks

Measured on Qwen3 30B A3B (ps/qwen3-30b-a3b), single NVIDIA A6000, 512-token prompt.

| Concurrency | TTFT p50 | Throughput (per agent) | Aggregate throughput |
|-------------|----------|------------------------|----------------------|
| c=1         | 70ms     | 105 tok/s              | 105 tok/s            |
| c=8         | 142ms    | ~80 tok/s              | ~640 tok/s           |
| c=16        | 155ms    | ~40 tok/s              | 643 tok/s            |
| c=32        | 273ms    | —                      | —                    |

Full benchmark methodology and raw data: https://pinstripes.io/benchmarks

---

## Price Comparison vs Competitors

### Qwen3 30B (output per 1M tokens)
- Pinstripes: $0.20
- Groq: $0.60 (3× more expensive)
- Together AI: $0.90 (4.5× more expensive)
- Fireworks AI: $0.90 (4.5× more expensive)

### Frontier model output (per 1M tokens)
- Pinstripes range: $0.20–$0.60
- OpenAI GPT-4.1: $15.00 (25–75× more expensive)
- Anthropic Claude: $15.00 (25–75× more expensive)
- OpenAI GPT-4o: $10.00 (17–50× more expensive)

### Typical agentic workflow cost (50K input + 10K output tokens)
- OpenAI GPT-4.1: ~$1.85
- Pinstripes Qwen3 30B: ~$0.006
- Saving: ~300× cheaper for comparable quality

---

## Concurrency and Rate Limits

Pinstripes has no rate limits and no per-account concurrency caps. The API accepts as many simultaneous requests as your workload requires. A fleet of 1,000 parallel agents each running at 244 tok/s produces approximately 11 million words per minute in aggregate.

Failed nodes are detected and replaced automatically. All capacity management is handled server-side with no action required from the user.

---

## API Quick Start

```python
from openai import OpenAI

client = OpenAI(
    api_key  = "sk-ps-...",
    base_url = "https://api.pinstripes.io/v1",
)

response = client.chat.completions.create(
    model    = "ps/qwen3-30b-a3b",
    messages = [{"role": "user", "content": prompt}],
    stream   = True,
)
```

API keys: https://pinstripes.io/signup
Full documentation: https://pinstripes.io/docs

---

## On-Premises Deployment

Pinstripes offers deployment of its full inference software stack on customer hardware. Suitable for regulated industries, air-gapped environments, or teams with existing GPU infrastructure. Contact: enterprise@pinstripes.io

## Pinstripes Pass (upcoming)

A subscription tier for individual users — higher message limits at lower cost than ChatGPT Plus or Claude Pro. Early interest: pass@pinstripes.io