API Docs

Trigger crawls & pull data out as JSON.

Kick off a fresh crawl when you deploy, then read every audit’s data — pages, issues, broken links, images, scripts and CSS — through a simple HTTP API. Point a script or an AI agent at the results, and fix issues automatically.

Beta API endpoint: https://eu1.website-toolkit.co.uk

Quick start

The API does two things: start a crawl for a domain, and read the data a crawl produced. Here’s one of each, plus pulling the broken links.

# Your API key (ask us for one for the beta service)
KEY="your-api-key"
BASE="https://eu1.website-toolkit.co.uk"

# 1. Start a fresh crawl of a domain
curl -s -X POST -H "Authorization: Bearer $KEY" \
 "$BASE/api/crawl-site/example.com"

# 2. Pull the issues from the most recent crawl
curl -s -H "Authorization: Bearer $KEY" \
 "$BASE/api/v1/example.com/latest/issues"

# 3. Pull just the broken links
curl -s -H "Authorization: Bearer $KEY" \
 "$BASE/api/v1/example.com/latest/broken-links"

Authentication

Every request needs your API key. Keys are issued per account; a revoked key stops working immediately. There are two ways to send it:

  • Authorization header (recommended): Authorization: Bearer your-api-key. Headers aren’t written to access logs, and GitHub Actions automatically masks the secret.
  • Token in the URL: For callers that can only fire a plain URL (some webhooks, uptime pingers). Convenient, but the token can appear in server logs.
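
As a rough illustration of the two options above, here is a minimal Python sketch that builds a crawl-trigger request either way. The helper name build_trigger_request is hypothetical and not part of the API; only the Bearer header and the two URL shapes come from this page.

```python
from urllib.parse import quote

BASE = "https://eu1.website-toolkit.co.uk"


def build_trigger_request(domain, key, key_in_url=False):
    """Return (url, headers) for a crawl trigger.

    Hypothetical helper for illustration only.
    """
    safe_domain = quote(domain, safe="")
    if key_in_url:
        # Token in the URL: no header needed, but the key can show up in logs.
        return f"{BASE}/api/crawl-site/{quote(key, safe='')}/{safe_domain}", {}
    # Header auth (recommended): the key never appears in the URL.
    return f"{BASE}/api/crawl-site/{safe_domain}", {"Authorization": f"Bearer {key}"}
```

Keeping the choice behind one flag makes it easy to default to header auth and fall back to the URL form only for callers that cannot set headers.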

Your data stays yours.

The read API only serves crawls that belong to your account. Free, public audits are not retrievable through the API. Requests for someone else’s data return a 404.

Trigger a crawl

Start a full crawl of a domain, for example straight after a deploy. The trigger is flexible by design: use GET or POST, with the key in either the header or the URL.

POST  /api/crawl-site/{domain}          (key in header)
GET   /api/crawl-site/{domain}          (key in header)
POST  /api/crawl-site/{token}/{domain}  (key in URL)
GET   /api/crawl-site/{token}/{domain}  (key in URL)

# Header auth (recommended)
curl -s -X POST -H "Authorization: Bearer $KEY" \
 "$BASE/api/crawl-site/example.com"

# Token-in-URL (for webhooks / pingers that can't set headers)
curl -s "$BASE/api/crawl-site/your-api-key/example.com"

On success you get 202 Accepted:

{ "queued": true, "domain": "example.com", "url": "https://example.com" }
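
A script that triggers the crawl will usually want to confirm that the request really was queued. The sketch below does that in Python with the standard library only. The function names are hypothetical, and it assumes the success body always has the shape shown above.

```python
import json
import urllib.request

BASE = "https://eu1.website-toolkit.co.uk"


def is_queued(status, body):
    """True when the trigger answered 202 and the body reports queued: true."""
    return status == 202 and body.get("queued") is True


def trigger_crawl(domain, key):
    """POST the trigger with header auth and return (status, parsed JSON body)."""
    req = urllib.request.Request(
        f"{BASE}/api/crawl-site/{domain}",
        method="POST",
        headers={"Authorization": f"Bearer {key}"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status, json.loads(resp.read().decode("utf-8"))
```

A deploy script could then call is_queued(*trigger_crawl("example.com", key)) and fail the step when it returns False.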

Trigger from GitHub Actions

The cleanest setup: a workflow that fires the trigger on every deploy, with the key stored as a repository secret. GitHub redacts the secret from run logs automatically.

name: Re-crawl on deploy
on:
  push:
    branches: [main] # or: workflow_run, after your deploy job

jobs:
  trigger-crawl:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger a fresh crawl
        run: |
          curl -sS -X POST \
            -H "Authorization: Bearer ${{ secrets.CRAWLER_KEY }}" \
            --fail-with-body \
            "https://eu1.website-toolkit.co.uk/api/crawl-site/example.com"

List crawls for a domain

GET /api/v1/{domain}

{domain} is the site’s hostname, e.g. example.com. Returns up to 25 crawls, newest first, plus the list of request types you can ask for.

{
  "domain": "example.com",
  "types": ["pages","issues","links","images","css","scripts",
           "broken-links","broken-images","broken-css","broken-scripts",
           "words","summary"],
  "crawls": [
    {
      "crawlId": "8f1c2e4a-1b2c-4d3e-9f8a-7b6c5d4e3f2a",
      "startTime": "2026-06-20T09:00:00.000Z",
      "endTime": "2026-06-20T09:04:12.000Z",
      "status": "Completed",
      "pages": 128,
      "brokenPages": 1, "brokenLinks": 4,
      "brokenImages": 0, "brokenCss": 0, "brokenJs": 0,
      "warnings": 7, "mode": "advanced"
    }
  ]
}
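
If you want an explicit crawl ID rather than the word latest, the response above can be scanned for the newest finished crawl. This is a minimal sketch: the function name is hypothetical, and "Completed" is the only status value shown on this page, so any other status is simply skipped.

```python
def newest_completed(listing):
    """Return the crawlId of the first crawl whose status is "Completed", or None.

    Relies on the documented newest-first ordering of the crawls array.
    """
    for crawl in listing.get("crawls", []):
        if crawl.get("status") == "Completed":
            return crawl["crawlId"]
    return None
```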

Get the data feed

GET /api/v1/{domain}/{crawlId}/{type}

  • {crawlId}: a crawl ID from the list-crawls call (GET /api/v1/{domain}), or the word latest.
  • {type}: one of the request types below.

List types are paginated with ?page=1&pageSize=100. pageSize defaults to 100 and is capped at 1000. Each response wraps the rows in an envelope:

{ "total": 128, "page": 1, "pageSize": 100, "rows": [ /* … */ ] }
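
The envelope carries everything needed to walk the pages: with total 128 and pageSize 100 there are two pages. Here is a minimal Python sketch of that loop. fetch_page is a caller-supplied function (hypothetical) that performs the HTTP request and returns the decoded envelope, which keeps the paging logic separate from the network code.

```python
import math


def page_count(total, page_size):
    """Number of pages needed to read `total` rows at `page_size` rows per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total / page_size)


def iter_rows(fetch_page, page_size=100):
    """Yield every row, calling fetch_page(page, page_size) until done.

    fetch_page must return the envelope: {"total", "page", "pageSize", "rows"}.
    """
    page = 1
    while True:
        envelope = fetch_page(page, page_size)
        rows = envelope["rows"]
        yield from rows
        # Stop on the last page, or on an empty page as a safety net.
        if not rows or page >= page_count(envelope["total"], envelope["pageSize"]):
            break
        page += 1
```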

Request types & columns

Type            What you get
pages           Every crawled page: status, title, H1, load time, size, redirect target, SEO issues.
issues          Detected problems: type, the page it was found on, a recommendation, destination.
links           Every link checked: link URL, owning page, link text, exists flag.
images          Every image: image URL, page, alt text, size, exists flag.
css             Every stylesheet: URL, page, size, exists flag.
scripts         Every script: URL, page, location, size, exists flag.
broken-links    Only the links whose target failed.
broken-images   Only the images that failed to load.
broken-css      Only the stylesheets that failed.
broken-scripts  Only the scripts that failed.
words           Site-level content-integrity word list (not crawl-scoped).
summary         The crawl’s headline metrics (single object, not paginated).

Column reference

pages : content_status, content_url, page_title, heading_1, load_time_ms,
        size_kb, page_redirects_to, seo_issues
issues : id, issue_type, found_on_url, recommendation, destination, screenshot_path
links : link_exists, link_url, owning_page_url, link_text, link_advisory, screenshot_path
images : image_exists, image_url, image_page_url, image_alt, size_kb
css : css_exists, css_url, css_page_url, size_kb
scripts : script_exists, script_url, script_page_url, script_location, size_kb

The *_exists columns are "Yes"/"No"-style strings; the broken-* types are pre-filtered to failures.
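
Because the flags arrive as strings, a script that filters the unfiltered types (links, images, css, scripts) will probably want to normalise them first. The sketch below is one way to do it. The exact spellings the API emits are not specified here, so the accepted values beyond "Yes" and "No" are assumptions, and both function names are hypothetical.

```python
def exists_flag(value):
    """Map a "Yes"/"No"-style string to True/False, or None if unrecognised."""
    text = str(value).strip().lower()
    if text in ("yes", "y", "true", "1"):  # assumed spellings
        return True
    if text in ("no", "n", "false", "0"):  # assumed spellings
        return False
    return None


def missing_links(rows):
    """Return the link_url of every row whose link_exists flag is false."""
    return [row["link_url"] for row in rows if exists_flag(row.get("link_exists")) is False]
```

Returning None for unknown values keeps an unexpected spelling from being silently counted as either working or broken.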

The summary type

{
  "crawlId": "8f1c2e4a-…",
  "summary": { /* timings, counts, mode, … */ },
  "broken": 5,
  "warnings": 7
}
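
The top-level broken and warnings counts lend themselves to a simple pass/fail gate in a deploy pipeline. A minimal sketch, assuming the payload shape above; the function name and both thresholds are illustrative choices, not something the API defines.

```python
def crawl_passes(payload, max_broken=0, max_warnings=None):
    """Return True when the summary payload is within the given limits.

    max_warnings=None means warnings are not checked.
    """
    if payload["broken"] > max_broken:
        return False
    if max_warnings is not None and payload["warnings"] > max_warnings:
        return False
    return True
```

With the example payload above (5 broken, 7 warnings), crawl_passes(payload) is False under the default limit of zero broken resources.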

The words type

{
  "domain": "example.com",
  "total": 1432,
  "rows": [
    { "word": "checkout", "status": "approved",
      "first_seen_url": "https://example.com/cart", "flagged_pages": null }
  ]
}
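
A quick way to get a feel for a word list is to count rows per status and pull out anything not yet approved. This sketch assumes only the row shape shown above. "approved" is the only status value that appears on this page, so other values are treated as opaque strings, and the function names are hypothetical.

```python
from collections import Counter


def status_counts(payload):
    """Count rows per status value in a words payload."""
    return Counter(row.get("status") for row in payload.get("rows", []))


def not_approved(payload):
    """Return the words whose status is anything other than "approved"."""
    return [row["word"] for row in payload.get("rows", []) if row.get("status") != "approved"]
```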

Examples

List broken links to fix

curl -s -H "Authorization: Bearer $KEY" \
  "$BASE/api/v1/example.com/latest/broken-links" \
  | jq '.rows[] | {link_url, owning_page_url}'

Node.js — Pull every issue across all pages

const BASE = 'https://eu1.website-toolkit.co.uk';
const KEY = process.env.CRAWLER_API_KEY;
const headers = { Authorization: `Bearer ${KEY}` };

async function fetchAll(domain, crawlId, type) {
  const out = [];
  for (let page = 1; ; page++) {
    const url = `${BASE}/api/v1/${domain}/${crawlId}/${type}?page=${page}&pageSize=500`;
    const r = await fetch(url, { headers });
    if (!r.ok) throw new Error(`${type} ${r.status}`);
    const { rows, total, pageSize } = await r.json();
    out.push(...rows);
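    // Stop on the last page; the empty-page check guards against looping forever.
    if (rows.length === 0 || page * pageSize >= total) break;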
  }
  return out;
}

const issues = await fetchAll('example.com', 'latest', 'issues');
console.log(`${issues.length} issues`);

Python — Summary + broken resources

import os, requests

BASE = "https://eu1.website-toolkit.co.uk"
H = {"Authorization": f"Bearer {os.environ['CRAWLER_API_KEY']}"}

def get(domain, crawl, kind):
    r = requests.get(f"{BASE}/api/v1/{domain}/{crawl}/{kind}", headers=H, timeout=30)
    r.raise_for_status()
    return r.json()

summary = get("example.com", "latest", "summary")
print("broken:", summary["broken"], "warnings:", summary["warnings"])

for kind in ("broken-links", "broken-images", "broken-css", "broken-scripts"):
    print(kind, "→", get("example.com", "latest", kind)["total"])

Error responses

All errors are JSON: { "error": "…" }.

Status  Meaning
202     Crawl trigger accepted; the crawl is queued.
400     Invalid domain, invalid crawlId, unknown type, or the domain isn’t reachable.
401     Missing or unrecognised API key.
403     Crawl trigger: your key isn’t permitted to crawl that domain.
404     No data for the domain, unknown crawl, or data that isn’t yours.
429     Crawl trigger: your plan’s crawl allowance is exhausted.
500     Something went wrong on our end.
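
Since every error shares the same JSON shape, client code can handle failures in one place. The sketch below pulls out the message and applies a simple retry policy. The policy is an assumption rather than documented behaviour: it treats only a 500 as worth retrying, because the 4xx codes above all need a change on the caller's side. Both function names are hypothetical.

```python
import json


def error_message(raw_body):
    """Return the "error" string from an error body, or None if the shape differs."""
    try:
        body = json.loads(raw_body)
    except (TypeError, ValueError):
        # Not JSON at all (for example an HTML page from a proxy).
        return None
    return body.get("error") if isinstance(body, dict) else None


def should_retry(status):
    """Assumed policy: retry only on 500; 4xx responses will not fix themselves."""
    return status == 500
```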

Need help or a new feature?

Need an API key for the beta service, or want a request type that isn’t here yet? We’d love to hear what you’re building.