Open Source · Free

sshop doesn't complete tasks. It runs functions.

A fleet of Tailors takes your business objectives and runs them continuously — adapting, improving, and acting. No babysitting. No finish line.

Most AI tools wait for instructions. You still need someone to give them.

Give sshop an objective once — a margin floor, a competitive position, a product signal to watch — and the Tailors run it. Every hour. Every day. Getting better each time.

Not a task. A function.

What you deploy

Entire functions. Running continuously.

Software Development

Today

Engineers handle bug fixes, architecture decisions, dependency upgrades, and large-scale refactors manually. A major refactor is a sprint. A breaking dependency is a day of careful patching.

With sshop

After ingesting your codebase, Tailors understand your architecture, conventions, and test suite. Dependency upgrades, refactors, and bug hunts run as thousands of parallel branches — each scored by running your tests. The winning approach arrives as a reviewed, tested PR.

Live Pricing System

Today

A pricing analyst monitors competitors, demand signals, and conversion rates. They propose changes, test them, and measure impact.

With sshop

Tailors monitor the same signals continuously. They propose changes, run A/B tests against real traffic, measure revenue impact, and adjust. Your team sets the margin floor. The Tailors handle the rest — 24/7.
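One way to picture the margin floor is as a hard constraint applied to every proposed price before it reaches an A/B test. The sketch below is an illustration under assumptions, not sshop's actual implementation: `PriceProposal`, `enforce_margin_floor`, and the margin definition (price minus unit cost, divided by price) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceProposal:
    """A hypothetical price change proposed by a Tailor."""
    sku: str
    unit_cost: float
    proposed_price: float


def margin(price: float, unit_cost: float) -> float:
    """Gross margin as a fraction of price (assumed definition)."""
    return (price - unit_cost) / price


def enforce_margin_floor(p: PriceProposal, floor: float) -> float:
    """Return the proposed price, raised if needed so that margin >= floor."""
    if not 0.0 <= floor < 1.0:
        raise ValueError("floor must be in [0, 1)")
    # Lowest price that still satisfies (price - cost) / price >= floor.
    min_price = p.unit_cost / (1.0 - floor)
    return max(p.proposed_price, min_price)


proposal = PriceProposal(sku="SKU-1", unit_cost=60.0, proposed_price=70.0)
safe_price = enforce_margin_floor(proposal, floor=0.25)
print(round(safe_price, 2))  # 80.0
```

Here a proposed price of 70 would leave a margin of about 14%, so the guard lifts it to 80, the lowest price that meets a 25% floor.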

Product Discovery Pipeline

Today

A product manager synthesizes support tickets, churn surveys, and competitor updates into a prioritized roadmap.

With sshop

Tailors ingest all of it, generate feature hypotheses, build testable mockups, and surface the highest-impact candidates weekly. Your team reviews the outputs, not the inputs.

Supply Chain Operations

Today

A procurement manager tracks stock levels, drafts purchase orders, manages supplier relationships, and reconciles inbound deliveries.

With sshop

Tailors monitor stock velocity and lead times. They send purchase orders at the right moment, flag supplier issues, and reconcile deliveries. Your team handles exceptions.
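"The right moment" for a purchase order is commonly expressed with the textbook reorder-point rule: order once stock on hand falls to the expected demand over the supplier lead time plus a safety buffer. The sketch below uses that standard rule as a stand-in. The source does not say which policy Tailors apply, and the function and parameter names are hypothetical.

```python
def reorder_point(daily_velocity: float, lead_time_days: float,
                  safety_stock: float = 0.0) -> float:
    """Stock level at which to reorder: demand during lead time plus a buffer."""
    return daily_velocity * lead_time_days + safety_stock


def should_reorder(on_hand: float, daily_velocity: float,
                   lead_time_days: float, safety_stock: float = 0.0) -> bool:
    """True once stock on hand has fallen to or below the reorder point."""
    return on_hand <= reorder_point(daily_velocity, lead_time_days, safety_stock)


# Selling 40 units/day with a 7-day lead time and a 60-unit buffer.
print(reorder_point(40, 7, 60))        # 340
print(should_reorder(500, 40, 7, 60))  # False
print(should_reorder(320, 40, 7, 60))  # True
```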

Before execution

Tailors know your systems before the first branch runs.

An ingest phase happens before task execution. Agents crawl your accessible systems and build a rich knowledge base that every Tailor queries at runtime.

01

Discovery

Agents crawl your accessible systems — Git repos, docs, API specs, database schemas. You control what they can reach.

02

Guardrails

You select what to ingest and set access controls: read-only, rate limits, scope boundaries. Nothing runs without your sign-off.

03

Knowledge build

Everything is extracted into a knowledge graph, a searchable corpus, and embeddings. The structure of your systems becomes queryable.

04

Seed context

When Tailors run, they query this knowledge. They know your conventions, your architecture, your current implementation before writing a single line.
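The guardrails in step 02 (sign-off, read-only access, rate limits, scope boundaries) can be pictured as a single check that every crawl request must pass. This is a minimal sketch under assumptions: `SourcePolicy` and its fields are hypothetical, and a simple request budget stands in for a real rate limiter.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class SourcePolicy:
    """Hypothetical guardrail for one ingest source."""
    allowed_paths: list[str]          # scope boundary, as glob patterns
    read_only: bool = True
    max_requests: int = 1000          # crude stand-in for a rate limit
    approved: bool = False            # the operator's sign-off
    requests_made: int = field(default=0, init=False)

    def check(self, path: str, write: bool = False) -> bool:
        """Return True only if this access is inside every guardrail."""
        if not self.approved:
            return False
        if write and self.read_only:
            return False
        if self.requests_made >= self.max_requests:
            return False
        if not any(fnmatch(path, pat) for pat in self.allowed_paths):
            return False
        self.requests_made += 1
        return True


policy = SourcePolicy(allowed_paths=["docs/*", "src/*.py"], max_requests=2)
print(policy.check("docs/pricing.md"))         # False: not signed off yet
policy.approved = True
print(policy.check("docs/pricing.md"))         # True
print(policy.check("secrets/keys.txt"))        # False: out of scope
print(policy.check("src/app.py", write=True))  # False: read-only
print(policy.check("src/app.py"))              # True
print(policy.check("docs/readme.md"))          # False: request budget spent
```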

Why this matters

  • Tailors understand your codebase, architecture, and conventions before the first line is written.
  • The Eval agent writes sharper scoring criteria because it knows what 'good' looks like for your company specifically.
  • No hallucinated APIs. No phantom conventions. Every branch is grounded in your actual schemas and docs.
  • 'Run our entire pricing system' works because Tailors already understand your current implementation.
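Grounding a branch in ingested material comes down to retrieval: given a query, find the closest documents in the corpus. The sketch below shows the general idea with cosine similarity over hand-made three-dimensional vectors. It is an assumption-laden toy, not sshop's retrieval layer: real embeddings come from a model, and the document names here are invented.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]


# Hand-made vectors standing in for real embeddings (hypothetical documents).
corpus = {
    "pricing/rules.md": [0.9, 0.1, 0.0],
    "checkout/api.yaml": [0.1, 0.9, 0.1],
    "inventory/schema.sql": [0.0, 0.2, 0.9],
}
print(top_k([0.8, 0.3, 0.0], corpus, k=2))
# ['pricing/rules.md', 'checkout/api.yaml']
```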

Process

Fully autonomous, end to end.

Task description → eval criteria generation → parallel execution → scoring → next generation. Zero human configuration at any step.

01

Describe the function

One sentence. Natural language. 'Manage our inventory reordering.' That's it.

02

Eval writes its own criteria

No eval script to write. The Eval agent derives its own success metrics from your description — informed by your ingested knowledge graph. It knows what 'good' looks like for your company.

03

Thousands of branches run

Not 10, not 50 — thousands of independent Tailors attempt solutions in parallel on Pinstripes. Each queries your domain knowledge corpus. No cross-branch communication. No echo chambers.

04

Eval scores automatically

Every branch is scored against the criteria the Eval agent wrote. No human review required. Selection is deterministic.

05

Winner seeds next generation

The best branch becomes the baseline. The next generation starts from there. Score is guaranteed ≥ previous — monotone by construction.

score(gen n+1) ≥ score(gen n) — guaranteed by construction
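The five steps above can be sketched as a small elitist search loop. This is a toy model under assumptions, not sshop's implementation: the `score` function stands in for Eval-generated criteria, a branch is just a seeded random perturbation of a number, and a thread pool with 64 branches stands in for thousands of Tailors on Pinstripes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def score(candidate: float) -> float:
    """Toy stand-in for the Eval agent: higher is better, peak at 0.7."""
    return 1.0 - abs(candidate - 0.7)


def run_branch(args: tuple[float, int]) -> float:
    """One independent branch: perturb the baseline using its own seeded RNG."""
    baseline, seed = args
    rng = random.Random(seed)          # no state shared between branches
    return baseline + rng.uniform(-0.1, 0.1)


def next_generation(baseline: float, gen: int, branches: int = 64) -> float:
    """Run all branches in parallel, then keep the best of branches + incumbent."""
    jobs = [(baseline, gen * branches + i) for i in range(branches)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        candidates = list(pool.map(run_branch, jobs))
    # Elitism: the incumbent competes too, so the best score cannot drop.
    return max([baseline] + candidates, key=score)


best = 0.2
history = [score(best)]
for gen in range(1, 4):
    best = next_generation(best, gen)
    history.append(score(best))

print([round(s, 3) for s in history])
```

Because the incumbent is always part of the pool that `max` selects from, `history` is non-decreasing regardless of what the random branches produce. Fixed seeds and a first-wins tie-break keep selection deterministic.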

The guarantee

Monotone improvement. Formal, not aspirational.

Each generation seeds from the previous winner. The next generation cannot start below the current best score. Your system only moves forward.

This isn't a probabilistic claim. It's a property of the architecture. Beam search over a scored population with elitist selection keeps the previous winner in the running, so the best score can stay flat from one generation to the next but can never decrease.

∀n: score(gen n+1) ≥ score(gen n)

where score(gen n) is the best branch score in generation n; monotone by elitist beam selection
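The inequality follows in one step from elitist selection. A short derivation, assuming (the source does not state these explicitly) that the scoring function stays fixed across generations and that the previous winner is kept in the candidate pool. The symbols S, b_n, and C_{n+1} are introduced here for illustration.

```latex
% S: scoring function fixed by the Eval agent (assumed constant across generations)
% b_n: winning branch of generation n;  C_{n+1}: new candidates in generation n+1
\begin{align*}
b_{n+1} &= \operatorname*{arg\,max}_{c \,\in\, C_{n+1} \cup \{b_n\}} S(c)
  && \text{(elitist selection: the incumbent stays in the pool)} \\
S(b_{n+1}) &= \max\Bigl( S(b_n),\ \max_{c \in C_{n+1}} S(c) \Bigr) \\
&\ge S(b_n)
\end{align*}
```

The inequality is weak: a generation in which no new branch beats the incumbent leaves the score unchanged.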

Benchmark — Task 077

Baseline: 0.448
Gen 1: 0.556
Gen 2: 0.590
Gen 3: 0.621
Gen 3 vs. baseline: +38.6%

Three generations. Thousands of branches per generation. Each generation guaranteed ≥ previous.
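The +38.6% figure is the relative gain of the Gen 3 score over the baseline. A short check of that arithmetic, using only the four scores reported for Task 077:

```python
scores = {"Baseline": 0.448, "Gen 1": 0.556, "Gen 2": 0.59, "Gen 3": 0.621}
baseline = scores["Baseline"]

for label, value in scores.items():
    gain = (value - baseline) / baseline * 100   # relative gain in percent
    print(f"{label}: {value:.3f} ({gain:+.1f}%)")
# Baseline: 0.448 (+0.0%)
# Gen 1: 0.556 (+24.1%)
# Gen 2: 0.590 (+31.7%)
# Gen 3: 0.621 (+38.6%)
```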

Visualized

Thousands of branches. One winner. Repeat.

Every generation, thousands of independent Tailors attempt solutions in parallel. Eval scores each. The best beam seeds the next round — guaranteed at or above the current best.

[Figure: the Eval agent auto-generates success criteria from the function description. Gen 1 branches score 0.391, 0.512, 0.621 (★ winner), 0.448, 0.503, and thousands more. Gen 2 is seeded from the winner and starts at ≥ 0.621.]

Usage

Ingest once. Deploy once. Then it runs forever.

terminal

$ sshop ingest --source git,confluence,postgres

Discovery: crawling accessible systems...

847 source files (git)

234 doc pages (confluence, read-only)

23 API schemas (openapi)

12 DB schemas (postgres, read-only)

Knowledge build...

12,400 embeddings · 3,847-node graph

Domain context ready

$ sshop deploy "optimize our checkout conversion rate"

Eval agent generating success criteria...

[domain context loaded — pricing system architecture understood]

14 metrics derived · baseline: 0.448

Gen 1: 2,048 Tailors on Pinstripes...

Best: 0.621 (+38.6%) · guarantee ≥ 0.448

Gen 2: seeding from 0.621...

Best: 0.651 (+45.3%) · guarantee ≥ 0.621

Gen 3: running [0.667 best · 2,048 active]

Transparency

Trust it to run unsupervised

Every decision the Tailors make is logged, auditable, and backtrackable. If something looks wrong, you see exactly what it did and why. You're not trusting a black box — you're reading a paper trail.

You set the margin floor, the risk tolerance, the criteria that matter. The Tailors optimize within those bounds, and every step is visible.

── checkout-conversion / gen 3 ─────

14:02:01 EVAL criteria: 14 metrics active

14:02:03 GEN 3 2,048 Tailors spawned

14:02:31 Tailor #1847 scored 0.667 ← leading

14:02:31 Tailors #0-2047 except #1847 dropped

14:02:31 score(gen 3) ≥ score(gen 2)

14:02:34 GEN 4 seeding from Tailor #1847

──────────────────────────────────

runtime: 31s/gen · cost: $0.041/gen
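A log is "backtrackable" when each decision records what it was seeded from, so any winner can be traced back to the baseline. The sketch below shows one way to model that with parent links. `AuditEntry` and `lineage` are hypothetical names, the scores are the ones shown in the terminal example, and the branch ids other than #1847 are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One logged decision: which branch won a generation, and from what seed."""
    generation: int
    branch_id: int
    score: float
    parent: Optional["AuditEntry"]   # winner this generation was seeded from
    reason: str


def lineage(entry: AuditEntry) -> list[AuditEntry]:
    """Walk parent links back to the baseline, oldest first."""
    chain = []
    node: Optional[AuditEntry] = entry
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain[::-1]


base = AuditEntry(0, 0, 0.448, None, "baseline")
g1 = AuditEntry(1, 312, 0.621, base, "highest score in gen 1")   # id invented
g2 = AuditEntry(2, 905, 0.651, g1, "highest score in gen 2")     # id invented
g3 = AuditEntry(3, 1847, 0.667, g2, "highest score in gen 3")

for e in lineage(g3):
    print(f"gen {e.generation}  Tailor #{e.branch_id}  {e.score:.3f}  {e.reason}")
```

Frozen entries cannot be edited after the fact, which is the property an audit trail needs.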

Built on Pinstripes

Thousands of Tailors actually fire simultaneously. Beam search is real parallelism — not 2,048 requests queued one at a time. Pinstripes has no rate limits, no concurrency caps. You pay per token.

The unlimited concurrency isn't a nice-to-have. It's load-bearing. Queue any branch and you've broken the model. We built orchestration and inference at the same company so they work together.
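A rough calculation shows why queuing matters. It takes the 2,048 Tailors and the 31 s per generation from the example log, and assumes (this is not stated in the source) that every branch takes about the full 31 s and that a concurrency cap forces branches to run in waves. `wall_clock_seconds` is a hypothetical helper.

```python
BRANCHES = 2_048          # Tailors per generation, from the example log
SECONDS_PER_BRANCH = 31   # assumption: each branch takes about one generation's runtime


def wall_clock_seconds(branches: int, per_branch: float, concurrency: int) -> float:
    """Wall-clock time when at most `concurrency` branches run at once."""
    waves = -(-branches // concurrency)   # ceiling division
    return waves * per_branch


for limit in (1, 64, BRANCHES):
    secs = wall_clock_seconds(BRANCHES, SECONDS_PER_BRANCH, limit)
    print(f"concurrency {limit:>5}: {secs:>8.0f} s ({secs / 3600:.2f} h)")
```

Under these assumptions a fully serialized generation takes 63,488 s (about 17.6 hours), a cap of 64 takes 992 s, and an uncapped run takes 31 s.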

Unlimited concurrent branches
$5/mo starting price
1B+ tokens/week (Starter)

See it in action

Compare that to the team you'd otherwise need.

sshop is free. Open source. Runs on Pinstripes starting at $5/month.