Documentation

HTML API

Overview

Here is the list of the different parameters you can use with SparkProxy's HTML API.

You can also discover this API using our Postman collection covering every SparkProxy feature.

All Parameters

Quick reference of every parameter supported by the API.

Parameter [type] (default)	Description
`api_key` [string] required	Your API key. Pass it via the X-API-Key request header.
`url` [string] required	The URL of the page you want to scrape.
`render_js` [boolean](true)	Render the page with a headless Chromium browser. Set to false for a plain HTTP fetch (1 credit, 3× faster).
`js_scenario` [JSON object]({})	JavaScript actions to execute after page load: click, fill, scroll, wait, evaluate, screenshot.
`wait_for` [string]("")	CSS selector to wait for before capturing the page. Gives up after 30 seconds. Requires render_js=true.
`wait` [integer](0)	Extra seconds to wait after the page load event before capturing (max 30). Requires render_js=true.
`wait_for_and_click` [string]("")	Wait for a CSS selector to appear in the DOM, then click it before capturing. Requires render_js=true.
`scroll` [boolean](true)	Auto-scroll after load to trigger infinite scroll and lazy-loaded images. Requires render_js=true.
`human` [boolean](true)	Simulate human-like mouse movement and random interaction delays. Requires render_js=true.
`block_resources` [boolean](false)	Block images, fonts, stylesheets, and media for faster loads. Requires render_js=true.
`block_ads` [boolean](false)	Block ad networks and tracking scripts. Requires render_js=true.
`window_width` [integer](1280)	Viewport width in pixels (200–7680). Requires render_js=true.
`window_height` [integer](720)	Viewport height in pixels (200–4320). Requires render_js=true.
`premium_proxy` [boolean](false)	Route through a residential proxy from the premium pool. 10 credits (no JS) or 25 credits (with JS).
`country_code` [string]("")	Exit through a proxy in a specific country. Pass an ISO 3166-1 alpha-2 code (e.g. US, GB, DE). +5 credits.
`stealth` [boolean](false)	Add extra stealth layers: homepage pre-warm, forced Google referrer, extended idle delays. +5 credits.
`own_proxy` [string]("")	Route through your own proxy. Accepts ip:port, ip:port:user:pass, http://user:pass@host:port, or socks5://.
`proxy_type` [string]("")	Select from file-based proxy pools. Accepts premium or ad_free.
`forward_headers` [JSON object]({})	Custom HTTP headers merged with browser defaults. Any key you set overrides the corresponding default.
`referrer` [string]("")	Override the HTTP Referer header sent to the target. Requires render_js=true.
`format` [string]("html")	Response format: html, md, mdx, json, screenshot, or pdf. screenshot and pdf require render_js=true.
`json_response` [boolean](false)	Wrap the response in a JSON envelope with job_id, status_code, duration_ms, and base64-encoded body.
`transparent_status_code` [boolean](false)	Mirror the target page's HTTP status code in the API response. Only applies to render_js=false.
`session_id` [string]("")	Label for the browser profile in per-job logs. Does not persist cookies across requests.
`cookies` [JSON array]([])	Inject cookies into the browser before the page loads. Each cookie needs name, value, and domain.
`device` [string]("desktop")	Emulate a device type: desktop, mobile, tablet, or random. Sets viewport, User-Agent, and touch events. Requires render_js=true.
`custom_ua` [string]("")	Override the browser's User-Agent string entirely.
`callback_url` [string]("")	Webhook URL for async mode. Returns 202 immediately; POSTs the full result to your URL on completion.
`extract_rules` [JSON object]({})	Pull structured data from the page using CSS selectors. Returns an extracted object instead of raw HTML.
`tag` [string]("")	Custom label attached to the request and echoed in the response for grouping and filtering.

Quick Start

Send your first request in under 60 seconds.

Get your API key

Send your first request

Pass the target URL and your API key. The API returns the fully-rendered HTML of the page.

Python

import requests

API_KEY = "YOUR_API_KEY"

response = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": API_KEY},
    params={"url": "https://example.com", "render_js": "true"}
)
print(response.text)

Inspect the response

By default the response body is the raw HTML of the page. Use json_response=true to get metadata alongside the content.

// Add json_response=true to get metadata alongside content
// GET /api/v1?url=https://example.com&render_js=true&json_response=true

{
  "job_id": "job_abc123",
  "result_url": "https://scrape.sparkproxy.io/api/v1/files/job_abc123",
  "format": "html",
  "expires_at": "2026-06-27T16:00:00.000Z",
  "status_code": 200,
  "duration_ms": 3840,
  "meta": {
    "title": "Example Domain",
    "description": "This domain is for use in illustrative examples."
  },
  "credits_used": 5
}

Authentication

You can pass your API key in three ways. We recommend the header.

Method	Example	Notes
Header	`X-API-Key: YOUR_KEY`	Recommended
Query param	`?api_key=YOUR_KEY`	Easy for quick tests
Body field	`{ "api_key": "YOUR_KEY" }`	POST requests only

GET / POST https://scrape.sparkproxy.io/api/v1

API Reference

Full parameter documentation with code examples in 7 languages.

URL

`url` · string · required

The page you want to scrape. Needs to be a full http:// or https:// URL. We block private and internal addresses. For batch scraping, pass multiple comma-separated URLs — see the Batch Scraping section.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={"url": "https://example.com"},
)
print(r.text)

Batch Scraping

Pass multiple comma-separated URLs in the url parameter to scrape them all in one request. You get back a results array with one entry per URL, each containing its own status, content, and timing data.

Heads up: Batch mode only works with render_js=false (plain HTTP fetch). The whole batch costs 1 credit regardless of URL count. For headless scraping, send separate requests.

Response shape

{
  "job_id":      "abc123",
  "duration_ms": 1840,
  "credits_used": 1,
  "tag":         "my-label",          // only present when tag param was provided
  "results": [
    { "url": "https://example.com",       "success": true,  "httpStatus": 200,  "body": "...", "error": null },
    { "url": "https://example.com/about", "success": false, "httpStatus": null, "body": null,  "error": "Connection refused" }
  ]
}

Python

import requests

urls = [
    "https://example.com",
    "https://example.com/about",
    "https://example.com/contact",
]

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":       ",".join(urls),
        "render_js": "false",  # required for batch mode
    },
)
data = r.json()
for result in data["results"]:
    print(result["url"], result["httpStatus"], result["success"])
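A batch can partly succeed, so it often helps to separate the entries that worked from those that did not. The sketch below assumes the response shape shown above; `split_batch_results` is a hypothetical helper name, not part of the API, and the sample payload is hand-built so that no network call is needed.

```python
def split_batch_results(data):
    """Split a batch response into (succeeded, failed) lists.

    Assumes the documented shape: data["results"] is a list of entries
    with "url", "success", "httpStatus", "body", and "error" keys.
    """
    succeeded, failed = [], []
    for entry in data.get("results", []):
        (succeeded if entry.get("success") else failed).append(entry)
    return succeeded, failed


# Sample payload copied from the response shape above (no network call).
sample = {
    "job_id": "abc123",
    "duration_ms": 1840,
    "credits_used": 1,
    "results": [
        {"url": "https://example.com", "success": True,
         "httpStatus": 200, "body": "...", "error": None},
        {"url": "https://example.com/about", "success": False,
         "httpStatus": None, "body": None, "error": "Connection refused"},
    ],
}

ok, bad = split_batch_results(sample)
for entry in bad:
    print("retry candidate:", entry["url"], "-", entry["error"])
```

The failed URLs can then be joined with commas and sent again as a smaller batch.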

Headless Browser

SparkProxy uses a headless Chromium browser by default. The following parameters control how the browser behaves.

JS Rendering

`render_js` · boolean · default: true

When true (the default), Chromium fully renders the page: JavaScript runs, dynamic content loads, and SPAs hydrate. Set it to false to skip the browser and do a plain HTTP fetch instead. That's 3x faster and costs 1 credit instead of 5.

Python

import requests

# render_js=false → plain HTTP fetch (1 credit, faster)
r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={"url": "https://example.com", "render_js": "false"},
)
print(r.text)

JS Execution

`js_scenario` · JSON object · optional · +5 credits · render_js=true · max 50 instructions

A sequence of browser actions that run after the page loads. Pass an object with an instructions array (max 50 entries). Each instruction is an object where the key is the action name. Great for clicking buttons, filling forms, or triggering lazy-loaded content before capture.

Key	Value	Description
`click`	`CSS selector`	Wait for element then click it
`wait_for`	`CSS selector`	Pause until element appears in DOM (30 s timeout)
`fill`	`{"selector": "...", "value": "..."}`	Type into an input field
`wait`	`milliseconds (number)`	Pause for N milliseconds (capped at 30 000 ms)
`scroll`	`pixels (number)`	Scroll down by N pixels
`evaluate`	`JS string`	Run arbitrary JavaScript on the page
`screenshot`	`true`	Capture a JPEG at this point in the sequence

Python

import requests

# Each instruction is { "action_key": value } — NOT { "type": "...", "key": value }
scenario = {
    "instructions": [
        { "click":    "#cookie-banner-close" },
        { "wait_for": ".main-content" },
        { "scroll":   800 },
        { "wait":     1500 },
        { "fill":     { "selector": "#search", "value": "laptop" } },
        { "click":    "#search-submit" },
        { "evaluate": "window.scrollTo(0, 0)" },
    ]
}

r = requests.post(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "url":         "https://example.com",
        "js_scenario": scenario,
    },
)
print(r.text)
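Checking a scenario locally before sending it can save a round trip. The following is a minimal sketch based only on the rules in the table above; `validate_scenario` is a hypothetical client-side helper, and the "exactly one action key per instruction" check is an assumption drawn from the example format rather than a documented server rule.

```python
ALLOWED_ACTIONS = {"click", "wait_for", "fill", "wait", "scroll", "evaluate", "screenshot"}
MAX_INSTRUCTIONS = 50
MAX_WAIT_MS = 30_000


def validate_scenario(scenario):
    """Return a list of problems found in a js_scenario dict (empty if none)."""
    instructions = scenario.get("instructions")
    if not isinstance(instructions, list):
        return ["'instructions' must be a list"]
    problems = []
    if len(instructions) > MAX_INSTRUCTIONS:
        problems.append(f"too many instructions: {len(instructions)} > {MAX_INSTRUCTIONS}")
    for i, step in enumerate(instructions):
        # Assumption: each instruction carries exactly one action key.
        if not isinstance(step, dict) or len(step) != 1:
            problems.append(f"step {i}: expected an object with exactly one action key")
            continue
        (action, value), = step.items()
        if action not in ALLOWED_ACTIONS:
            problems.append(f"step {i}: unknown action '{action}'")
        elif action == "fill" and not (
            isinstance(value, dict) and set(value) >= {"selector", "value"}
        ):
            problems.append(f"step {i}: fill needs 'selector' and 'value'")
        elif action == "wait" and isinstance(value, (int, float)) and value > MAX_WAIT_MS:
            problems.append(f"step {i}: wait of {value} ms exceeds the {MAX_WAIT_MS} ms cap")
    return problems


print(validate_scenario({"instructions": [{"click": "#ok"}, {"wait": 1500}]}))
```

An empty list means the scenario passed these local checks; the API may still apply rules of its own.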

Wait for Selector

`wait_for` · string · optional · render_js=true

A CSS selector the scraper waits for before capturing the page. The browser stays open until that element shows up in the DOM. Handy for pages with skeleton screens or lazy-loaded content. Gives up after 30 seconds.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":      "https://example.com",
        "wait_for": "#main-content",   # wait until element is in DOM
    },
)
print(r.text)

Wait for Browser

`wait` · integer (seconds) · default: 0 · max: 30 · render_js=true

Extra seconds to wait after the browser's load event before capturing the page. Use this when content appears after page load but there's no reliable CSS selector to wait for.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":  "https://example.com",
        "wait": "3",   # wait 3 seconds after load event
    },
)
print(r.text)

Blocking Images & CSS

`block_resources` · boolean · default: false · render_js=true

`block_ads` · boolean · default: false · render_js=true

block_resources=true blocks images, fonts, stylesheets, and media so pages load faster and use less bandwidth. block_ads=true cuts out ad networks and tracking scripts.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":             "https://example.com",
        "block_resources": "true",  # block images, fonts, CSS
        "block_ads":       "true",  # block ad/tracking scripts
    },
)
print(r.text)

Viewport Size

`window_width` · integer · 200–7680 · default: 1280 · render_js=true

`window_height` · integer · 200–4320 · default: 720 · render_js=true

Set the browser viewport size in pixels. Go wider for desktop layouts, narrower to test responsive breakpoints. For tall full-page screenshots, bump up window_height or use device=mobile.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":           "https://example.com",
        "format":        "screenshot",
        "window_width":  "1920",
        "window_height": "1080",
    },
)
# r.content is a JPEG image
with open("screenshot.jpg", "wb") as f:
    f.write(r.content)

Wait and Click

`wait_for_and_click` · string · optional · render_js=true

Waits for a CSS selector to appear in the DOM, then clicks it before capturing the page. Great for dismissing cookie banners, closing modals, or triggering content hidden behind a gated click. Times out after 30 seconds if the selector never appears.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":                 "https://example.com",
        "wait_for_and_click":  "#accept-cookies",  # dismiss banner then capture
    },
)
print(r.text)

Scroll & Human Mode

`scroll` · boolean · default: true · render_js=true

`human` · boolean · default: true · render_js=true

scroll=true automatically scrolls the page after load to trigger infinite scroll and lazy-loaded images. human=true simulates human-like mouse movement and random interaction delays, helping bypass sites that fingerprint cursor behaviour. Both are on by default — set either to false to disable.

Python

import requests

# Disable scroll and human simulation (e.g. for fast static pages)
r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":    "https://example.com",
        "scroll": "false",
        "human":  "false",
    },
)
print(r.text)

Proxies

SparkProxy routes requests through a pool of rotating proxies. Use these parameters to control proxy type, location, and anonymity level.

Premium Proxy

`premium_proxy` · boolean · default: false · 10 credits (no JS) · 25 credits (with JS)

Routes the request through a residential proxy from our premium pool. These are real ISP IPs that are much harder to detect and block. Worth using on sites with aggressive anti-bot defenses.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":           "https://example.com",
        "premium_proxy": "true",
    },
)
print(r.text)

GeoLocation

`country_code` · string (ISO 3166-1 alpha-2) · optional · +5 credits

Exit the request through a proxy in a specific country. Pass an ISO 3166-1 alpha-2 code like US, GB, DE, or JP. Great for geo-restricted content or comparing prices across regions. Adds 5 credits on top of the base cost regardless of rendering mode.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":          "https://example.com",
        "country_code": "US",  # route through US proxy
    },
)
print(r.text)

Stealth Mode

`stealth` · boolean · default: false · +5 credits · requires render_js=true

Every request already includes randomised browser fingerprints, stealth Chromium patches, human-like mouse movement, and WebRTC leak prevention. Enabling stealth=true adds three extra layers on top: a homepage pre-warm visit before the target page, a forced Google referrer, and extended idle delays (1.5–3.5 s vs 0.5–2 s). Use it on the toughest targets like e-commerce, travel, and finance sites.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":     "https://example.com",
        "stealth": "true",
    },
)
print(r.text)

Own Proxy

`own_proxy` · string · optional

Route the request through your own proxy server. Accepted formats: ip:port, ip:port:user:pass, http://user:pass@host:port, or socks5://host:port. SOCKS5 requires render_js=true.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":       "https://example.com",
        # placeholder credentials and host; substitute your own proxy
        "own_proxy": "http://user:pass@proxy.example.com:8080",
    },
)
print(r.text)
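Because own_proxy accepts several string formats, a quick local check can catch typos before a request is spent. This is a rough sketch that recognises only the four formats listed above and applies the SOCKS5 rule client-side. `check_own_proxy` and its return labels are hypothetical, and the server may well accept inputs (IPv6 literals, for instance) that this simple check rejects.

```python
from urllib.parse import urlsplit


def check_own_proxy(value, render_js=True):
    """Classify an own_proxy string; raise ValueError if it looks wrong."""
    if "://" in value:
        parts = urlsplit(value)
        scheme = parts.scheme.lower()
        # Only http:// and socks5:// are listed in the documentation above.
        if scheme not in ("http", "socks5") or not parts.hostname or parts.port is None:
            raise ValueError(f"unsupported proxy URL: {value!r}")
        if scheme == "socks5" and not render_js:
            raise ValueError("SOCKS5 proxies need render_js=true")
        return scheme + "_url"
    fields = value.split(":")
    if len(fields) == 2 and fields[1].isdigit():
        return "ip_port"
    if len(fields) == 4 and fields[1].isdigit():
        return "ip_port_user_pass"
    raise ValueError(f"unrecognised own_proxy format: {value!r}")


# 203.0.113.7 is a reserved documentation address, used here as a placeholder.
print(check_own_proxy("203.0.113.7:8080"))
```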

Proxy Type

`proxy_type` · string · optional · premium | ad_free

Select from your server's file-based proxy pools defined in proxies.config.js. Accepts premium or ad_free. Can be combined with country_code to pick a country-specific entry from the chosen pool. proxy_type=premium costs 10 credits without JS or 25 credits with JS (same tier as premium_proxy=true); ad_free has no extra credit cost.

Python

import requests

# premium pool (10 credits without JS, 25 with JS): high-quality residential IPs from proxies.config.js
r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":        "https://example.com",
        "proxy_type": "premium",       # or "ad_free" (no extra credits)
        "country_code": "US",          # optional: pick US entry from the pool
    },
)
print(r.text)

Headers

Control the HTTP headers sent to the target site.

Forward Headers

`forward_headers` · JSON object · optional

Send custom HTTP headers to the target URL. Pass a JSON object of key/value pairs. Your headers are merged with the browser defaults, and any key you set will override that default.

Python

import requests
import json

headers_to_forward = {
    "Accept-Language": "en-US,en;q=0.9",
    "Referer":         "https://google.com",
}

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":            "https://example.com",
        "forward_headers": json.dumps(headers_to_forward),
    },
)
print(r.text)

Pure Header Forwarding

Need complete control over every header, including User-Agent, Accept-Language, and Referer? Set them all in forward_headers and we'll send exactly what you provide, no additions from our side.

Python

import requests
import json

# Override all key headers to appear as a real Chrome browser
custom_headers = {
    "User-Agent":      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
    "Accept":          "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer":         "https://www.google.com/",
    "DNT":             "1",
}

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":             "https://example.com",
        "forward_headers": json.dumps(custom_headers),
    },
)
print(r.text)

Custom Referrer

`referrer` · string · optional · max 1000 chars · render_js=true

Sets the HTTP Referer header on the request. Some sites check this to decide whether to serve content or show a paywall. Pass a plausible origin URL like a search engine result page to make the request look like it came from organic traffic. When omitted, the API automatically picks a realistic referrer from a weighted pool (Google, Bing, DuckDuckGo, Reddit, Facebook, or none) so you get varied, natural-looking traffic without doing anything.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":      "https://example.com/article",
        "referrer": "https://www.google.com/search?q=example",
    },
)
print(r.text)

Response Format

`format` · string · default: html

Controls what the API returns. The result file is stored and served via /api/v1/files/:jobId. screenshot and pdf require render_js=true — the API returns 422 if you combine them with render_js=false. md, mdx, and json are rendered properly only with render_js=true; with render_js=false the raw HTTP response body is returned (no error — the format parameter is effectively ignored in plain HTTP mode).

Format	Description
`html`	Fully-rendered HTML page source
`md`	Markdown, stripped to clean readable text (render_js=true)
`mdx`	MDX file extension, same Markdown content as md (render_js=true)
`screenshot`	Full-page JPEG screenshot, quality 80 (render_js=true)
`pdf`	PDF printout of the page (render_js=true)
`json`	Structured JSON: headings, links, images, JSON-LD, and page meta (render_js=true)

Python

import requests

# Screenshot example: response body is a JPEG
r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":    "https://example.com",
        "format": "screenshot",  # html | md | mdx | screenshot | pdf | json
    },
)
with open("page.jpg", "wb") as f:
    f.write(r.content)

`json_response` · boolean · default: false

Returns a JSON envelope instead of streaming the content directly. The response shape is { "job_id", "status_code", "duration_ms", "body", "encoding": "base64", "credits_used" } — the page content is base64-encoded in body. Headless mode additionally includes a meta object with page title, description, and other metadata. Both status_code and meta may be null if the page did not respond normally. If a tag was provided, it is also echoed back in the envelope.
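Since the envelope carries the page content base64-encoded, the body has to be decoded before use. A minimal sketch, assuming the envelope shape described above; `decode_envelope` is a hypothetical helper and the sample envelope is hand-built rather than a real API response.

```python
import base64


def decode_envelope(envelope):
    """Return the page content from a json_response envelope as bytes."""
    if envelope.get("encoding") != "base64":
        raise ValueError("expected a base64-encoded body")
    return base64.b64decode(envelope["body"])


# Hand-built sample in the documented shape (no network call).
sample = {
    "job_id": "abc123",
    "status_code": 200,
    "duration_ms": 1200,
    "body": base64.b64encode(b"<html><title>Example Domain</title></html>").decode("ascii"),
    "encoding": "base64",
    "credits_used": 5,
}

html = decode_envelope(sample).decode("utf-8", errors="replace")
print(html)
```

Keeping the result as bytes until the last step means the same helper also works for binary formats such as screenshot or pdf.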

`transparent_status_code` · boolean · default: false · render_js=false only

When true, the API HTTP status mirrors the target page's status (e.g. 404 if the target 404s). Only applies to plain HTTP mode (render_js=false). Headless mode always returns 200 on success or 530 on failure.

Session

`session_id` · string · optional · max 128 chars · render_js=true

Labels this job's browser profile in per-job logs. Each request gets a fresh browser profile regardless of the session_id value — cookies and login state do not persist across separate API requests.

Python

import requests

SESSION = "my-session-abc123"

# session_id only labels the browser profile in per-job logs.
# Each request still starts with a fresh profile, so the two calls
# below do not share cookies or login state.
for page in ("https://example.com/pricing", "https://example.com/docs"):
    r = requests.get(
        "https://scrape.sparkproxy.io/api/v1",
        headers={"X-API-Key": "YOUR_API_KEY"},
        params={
            "url":        page,
            "session_id": SESSION,
        },
    )
    print(page, r.status_code)

# To scrape a page that needs a login, pass the cookies parameter instead.

Timeout

`wait` · integer (seconds) · default: 0 · max: 30 · render_js=true

Extra seconds the browser waits after the page load event before capturing. Capped at 30. The page navigation timeout itself is handled server-side: 90 seconds on the first attempt, then automatic retries at 120 s and 180 s, for up to 3 attempts in total. If all attempts fail, the job returns 530 and credits are refunded.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":  "https://example.com",
        "wait": "10",  # wait 10 s after the load event (max 30)
    },
)
print(r.text)
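Server-side retries mean a synchronous headless call can take several minutes, so the client's own timeout should be at least as generous. The arithmetic below is a rough sketch that reads the timeouts above as three back-to-back attempts of 90 s, 120 s, and 180 s. The `margin` value and the `client_timeout` helper are assumptions for illustration, and queueing time is not modelled.

```python
NAVIGATION_TIMEOUTS_S = (90, 120, 180)  # first attempt, then the two retries
MAX_WAIT_S = 30


def client_timeout(wait=0, margin=15):
    """Rough upper bound, in seconds, for one synchronous headless request."""
    if not 0 <= wait <= MAX_WAIT_S:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_S} seconds")
    # 90 + 120 + 180 = 390 s of navigation, plus the post-load wait and a margin.
    return sum(NAVIGATION_TIMEOUTS_S) + wait + margin


print(client_timeout(wait=10))  # 390 + 10 + 15 = 415
```

The result can be passed as the `timeout` argument of `requests.get`, for example `timeout=client_timeout(wait=10)`. For jobs where waiting this long is not acceptable, the callback_url mode avoids holding the connection open.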

Custom Cookies

`cookies` · JSON array · optional · render_js=true

Inject cookies into the browser before the page loads. Pass a JSON array of cookie objects. Each cookie requires name, value, and domain (e.g. "example.com"). You can also set path, httpOnly, secure, and sameSite.

Python

import requests
import json

# domain is required — without it Playwright rejects the cookie
cookies = [
    {"name": "auth_token", "value": "eyJhbGciOiJIUzI1NiJ9...", "domain": "example.com", "path": "/"},
    {"name": "user_pref",  "value": "dark-mode",               "domain": "example.com", "path": "/"},
]

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":     "https://example.com/dashboard",
        "cookies": json.dumps(cookies),
    },
)
print(r.text)
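When cookies come from a plain name/value mapping, such as one exported from another HTTP client, a small converter keeps the required fields consistent. This is a sketch only; `build_cookies` is a hypothetical helper that fills in the name, value, and domain fields required above, plus an optional path.

```python
import json


def build_cookies(pairs, domain, path="/"):
    """Turn a {name: value} dict into the JSON string for the cookies parameter."""
    if not domain:
        raise ValueError("domain is required for every cookie")
    cookies = [
        {"name": name, "value": str(value), "domain": domain, "path": path}
        for name, value in pairs.items()
    ]
    return json.dumps(cookies)


cookies_param = build_cookies(
    {"auth_token": "abc", "user_pref": "dark-mode"},
    domain="example.com",
)
print(cookies_param)
```

The returned string can be passed directly as the `cookies` value in `params`.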

Devices

`device` · string · default: desktop · render_js=true

Emulate a device type. This sets the viewport size, User-Agent, and touch events to match that device.

Value	Emulation
`desktop`	1280 × 720, desktop UA (default)
`mobile`	390 × 844, iPhone UA
`tablet`	768 × 1024, iPad UA
`random`	Randomly selects desktop, mobile, or tablet with equal probability

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":    "https://example.com",
        "device": "mobile",  # desktop | mobile | tablet | random
    },
)
print(r.text)

Custom User-Agent

`custom_ua` · string · optional · max 500 chars

Override the browser's User-Agent string entirely. When set, the API ignores the device pool and uses exactly the string you provide. The viewport is still determined by the device parameter. Useful for mimicking a specific browser version, a known bot, or any custom identity.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":       "https://example.com",
        "custom_ua": "Mozilla/5.0 (compatible; MyBot/1.0; +https://mysite.com/bot)",
    },
)
print(r.text)

Webhooks

`callback_url` · string · optional · https recommended

Fire-and-forget mode. Pass a callback_url and the API returns 202 Accepted immediately with the job ID — no waiting for Chromium to finish. When the job completes (or fails), SparkProxy POSTs the full result JSON to your URL. For headless requests (render_js=true), the payload mirrors the synchronous response plus a top-level success boolean. For plain HTTP requests (render_js=false), the webhook always delivers a JSON envelope with the page content base64-encoded in a body field — regardless of the json_response parameter.

202 response (immediate)

{ "job_id": "abc123", "status": "queued" }

Webhook POST body (on completion)

{
  "success":     true,
  "job_id":      "abc123",
  "result_url":  "https://scrape.sparkproxy.io/api/v1/files/abc123",
  "format":      "html",
  "expires_at":  "2026-06-27T16:00:00.000Z",
  "status_code": 200,
  "duration_ms": 4200,
  "meta": { "title": "Example Domain", "description": "...", "canonical": "...", "wordCount": 42 },
  "credits_used": 5,
  "tag":         "my-label"           // only present when tag param was provided
}

// On failure:
{
  "success":      false,
  "job_id":       "abc123",
  "error":        "CAPTCHA challenge could not be bypassed",
  "reason":       "captcha_blocked",
  "captcha_type": "cloudflare_turnstile",
  "attempts":     3
}

Python

import requests

# Step 1 — fire the scrape, get 202 immediately
r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":          "https://example.com",
        "callback_url": "https://your-server.com/webhook/sparkproxy",
    },
)
print(r.status_code)  # 202
print(r.json())       # {"job_id": "abc123", "status": "queued"}

# Step 2 — receive the result at your endpoint (e.g. Flask)
# from flask import Flask, request
# app = Flask(__name__)
# @app.route("/webhook/sparkproxy", methods=["POST"])
# def webhook():
#     data = request.json
#     print("Job", data["job_id"], "success:", data["success"])
#     return "", 200
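Keeping the payload handling separate from the web framework makes it easy to test without a running server. The sketch below assumes the webhook payload shapes shown above; `handle_webhook` and its status labels are hypothetical names chosen for illustration.

```python
def handle_webhook(payload):
    """Return a (status, detail) tuple for a webhook POST body."""
    job_id = payload.get("job_id")
    if payload.get("success"):
        # Headless payloads carry result_url; plain HTTP payloads carry a base64 body.
        detail = payload.get("result_url") or "inline body"
        return ("done", f"{job_id}: {detail}")
    if payload.get("reason") == "captcha_blocked":
        return ("blocked", f"{job_id}: {payload.get('captcha_type')}")
    return ("failed", f"{job_id}: {payload.get('error')}")


# Payloads trimmed from the examples above.
print(handle_webhook({
    "success": True,
    "job_id": "abc123",
    "result_url": "https://scrape.sparkproxy.io/api/v1/files/abc123",
}))
print(handle_webhook({
    "success": False,
    "job_id": "abc123",
    "error": "CAPTCHA challenge could not be bypassed",
    "reason": "captcha_blocked",
    "captcha_type": "cloudflare_turnstile",
}))
```

Inside a Flask route, the function would be called with `request.json`.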

POST / PUT

All parameters work exactly the same way with POST and a JSON body. Switch to POST when payloads like js_scenario get too big to fit in a URL.

Python

import requests

r = requests.post(
    "https://scrape.sparkproxy.io/api/v1",
    headers={
        "X-API-Key":    "YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url":       "https://example.com",
        "render_js": True,
        "stealth":   True,
        "format":    "screenshot",
        "js_scenario": {
            "instructions": [
                {"click": "#accept-cookies"},
                {"wait":  1000},
            ]
        },
    },
)
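# format=screenshot returns a JPEG body, as in the GET examples
with open("page.jpg", "wb") as f:
    f.write(r.content)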

Data Extraction

`extract_rules` · JSON object · optional · render_js=true

Pull structured data directly out of a page without writing a parser. Pass a JSON object where each key becomes a field in the response. The value can be a plain CSS selector string (returns the element's text content) or an object with a selector and type for richer extraction.

Selector formats

{
  "title":   "h1",                                // string: querySelector → textContent
  "links":   { "selector": "a.nav", "type": "list" },  // list: querySelectorAll → string[]
  "href":    { "selector": "a.cta", "type": "href" },  // href: → el.href (full URL)
  "img":     { "selector": "img.hero", "type": "src" } // src:  → el.src (full URL)
}

Response shape (replaces the normal result_url response)

{
  "job_id":      "abc123",
  "duration_ms": 2340,
  "credits_used": 5,
  "tag":         "my-label",          // only present when tag param was provided
  "extracted": {
    "title":       "Example Domain",
    "description": "This domain is for illustrative examples.",
    "links":       ["Home", "About", "Contact"],
    "href":        "https://example.com/signup"
  },
  "meta": {
    "title":       "Example Domain",
    "description": "This domain is for use in illustrative examples.",
    "canonical":   "https://example.com/",
    "h1":          ["Example Domain"],
    "wordCount":   42
  }
}

Python

import requests

rules = {
    # Plain string: querySelector → textContent
    "title":       "h1",
    "description": "meta[name='description']",
    "price":       ".product-price",
    # Object form: querySelectorAll → list of textContent strings
    "features":    { "selector": "ul.features li", "type": "list" },
    # Object form: returns el.href (full URL)
    "buy_link":    { "selector": "a.buy-now",       "type": "href" },
    # Object form: returns el.src (full URL)
    "hero_img":    { "selector": "img.hero",         "type": "src"  },
}

r = requests.post(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "url":          "https://example.com/product",
        "extract_rules": rules,
    },
)
data = r.json()
print(data["extracted"])
# { "title": "...", "price": "$29.99", "features": ["Fast", "Reliable"], "buy_link": "https://..." }

Tag

`tag` · string · optional · max 128 chars

Attach a custom label to any request. The tag comes back in the response so you can group and filter by project, campaign, or feature. Allowed characters: alphanumeric, _, -, ., :, @, /.

Python

import requests

r = requests.get(
    "https://scrape.sparkproxy.io/api/v1",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url":           "https://example.com",
        "tag":           "price-monitor/amazon",  # echoed back in the response
        "json_response": "true",  # JSON envelope, so r.json() can read the tag
    },
)
data = r.json()
print(data.get("tag"))  # "price-monitor/amazon"
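The character and length rules for tags can be checked locally with a regular expression. This is a sketch under one assumption: "alphanumeric" is read as ASCII letters and digits. `is_valid_tag` is a hypothetical helper name.

```python
import re

# ASCII letters and digits plus _ - . : @ / and at most 128 characters.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_\-.:@/]{1,128}")


def is_valid_tag(tag):
    """Return True if tag satisfies the documented character and length rules."""
    return TAG_PATTERN.fullmatch(tag) is not None


print(is_valid_tag("price-monitor/amazon"))
print(is_valid_tag("has space"))
```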

Credit Cost

Credits are deducted before the request runs and automatically refunded on failure. Base cost is determined by proxy tier × rendering mode. The four add-ons (stealth, country_code, js_scenario, screenshot/PDF) each add 5 credits on top of the base cost.

Feature	Credits
Rotating proxy, no JS (render_js=false)	+1
Rotating proxy, with JS (render_js=true)	+5
Premium proxy, no JS (premium_proxy=true, render_js=false)	+10
Premium proxy, with JS (premium_proxy=true, render_js=true)	+25
Stealth Proxy with residential network	coming soon
Add-on: stealth browser mode (stealth=true, render_js=true)	+5
Add-on: geo-targeted proxy (country_code)	+5
Add-on: JS scenario (js_scenario, render_js=true)	+5
Add-on: screenshot or PDF format (render_js=true)	+5
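To make the pricing concrete, the table can be turned into a small estimator. This is a sketch that assumes add-ons stack additively on the base cost, as the table suggests; `estimate_credits` is a hypothetical helper, not an API call, and it covers single-URL requests only.

```python
def estimate_credits(render_js=True, premium=False, stealth=False,
                     country_code="", js_scenario=False, fmt="html"):
    """Estimate the credit cost of one request from the table above."""
    needs_browser = stealth or js_scenario or fmt in ("screenshot", "pdf")
    if needs_browser and not render_js:
        raise ValueError("stealth, js_scenario, screenshot and pdf require render_js=true")

    # Base cost: proxy tier x rendering mode.
    if premium:
        cost = 25 if render_js else 10
    else:
        cost = 5 if render_js else 1

    # Each add-on contributes 5 credits.
    for enabled in (stealth, bool(country_code), js_scenario, fmt in ("screenshot", "pdf")):
        if enabled:
            cost += 5
    return cost


# Premium proxy with JS (25) + geo-targeting (5) = 30
print(estimate_credits(premium=True, country_code="US"))
```

For example, a default headless request with stealth, a JS scenario, and a screenshot comes to 5 + 5 + 5 + 5 = 20 credits under these assumptions.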

Status Codes

HTTP status codes returned by the SparkProxy API, not the page you scraped.

Code	Meaning
`200`	Result body or JSON envelope returned
`202`	Job queued (webhook mode) — result will be POSTed to callback_url when ready
`401`	Missing or invalid API key
`402`	Not enough credits, top up your account
`404`	File not found or does not belong to this API key (files endpoint)
`410`	File expired and was not backed up to remote storage
`422`	Invalid parameters, check the error message
`429`	Rate limit or concurrency limit exceeded
`500`	Internal server error, credits are refunded
`503`	Service temporarily unavailable (database or upstream down)
`530`	Scrape failed, the target returned an error or timed out

Error response bodies

Every error response is JSON with at least an error string. Some include extra fields to help you handle the failure without guessing.

401 Missing or invalid API key

{ "error": "api_key is required" }
// or
{ "error": "Invalid API key" }

402 Insufficient credits

{
  "error":             "Insufficient credits",
  "credits_required":  10,
  "credits_remaining": 3
}

422 Invalid parameters

{ "error": "url is required" }
// or
{ "error": "Invalid format. Valid: html, md, mdx, screenshot, pdf, json" }
// or
{ "error": "Requests to private, loopback, or internal addresses are not allowed", "url": "..." }
// or
{ "error": "SOCKS5 proxies are not supported with render_js=false. Use render_js=true or supply an HTTP proxy." }
// or
{ "error": "stealth=true requires render_js=true" }
// or
{ "error": "js_scenario requires render_js=true" }
// or
{ "error": "format=screenshot and format=pdf require render_js=true" }
// or
{ "error": "Batch scraping (multiple URLs) requires render_js=false. For headless scraping, send one URL per request." }
// or
{ "error": "callback_url: Requests to private, loopback, or internal addresses are not allowed" }
// or
{ "error": "own_proxy cannot point to a private or internal address" }

429 Rate limit or concurrency limit

// Rate limit (per-minute window)
{
  "error":               "Rate limit exceeded",
  "retry_after_seconds": 60,
  "limit":               60
}
// Concurrency limit (too many parallel requests)
{
  "error":               "Concurrency limit reached",
  "active":              3,
  "limit":               3,
  "retry_after_seconds": 5
}
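Both 429 bodies include retry_after_seconds, which a client can use to pace its retries. A minimal sketch follows: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable returning an object with `.status_code` and `.json()` (a `requests.Response`, for instance), and the 5-second fallback for a missing field is an assumption rather than documented behaviour.

```python
import time


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() again after each 429, waiting as long as the API suggests."""
    response = send()
    for _ in range(max_attempts - 1):
        if response.status_code != 429:
            break
        # Assumption: fall back to 5 s if retry_after_seconds is absent.
        sleep(response.json().get("retry_after_seconds", 5))
        response = send()
    return response
```

A typical call wraps the request in a lambda, for example `call_with_retry(lambda: requests.get(API_URL, headers=headers, params=params))`, where `API_URL`, `headers`, and `params` are your own values. The `sleep` argument exists so that the waiting can be replaced in tests.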

530 Scrape failed

// General failure (network error, timeout, anti-bot block, etc.)
{
  "error":        "Timeout 30000ms exceeded",            // raw error from Playwright / axios
  "reason":       "Page load exceeded configured timeout", // human-readable translation; "Unexpected error — check the full error message" for unclassified failures
  "captcha_type": null,               // null for non-CAPTCHA failures
  "attempts":     3,                  // total attempts made (always equals MAX_RETRIES=3 when all fail)
  "job_id":       "abc123"
}
// CAPTCHA block (reason is always "captcha_blocked" for CAPTCHA failures)
{
  "error":        "CAPTCHA challenge (cloudflare_turnstile) could not be bypassed with the current proxy",
  "reason":       "captcha_blocked",
  "captcha_type": "cloudflare_turnstile", // "cloudflare_turnstile" | "hcaptcha" | "recaptcha" — null for behavioral blocks (DataDome, PerimeterX, Akamai)
  "attempts":     3,
  "job_id":       "abc123"
}

500 Internal server error

{ "error": "Internal server error" }
// job_id is included when the failure occurred after the job was created:
{ "error": "Internal server error", "job_id": "abc123" }
// Credits are always refunded on 500

503 Service temporarily unavailable

{ "error": "Service temporarily unavailable" }
// Retry with exponential back-off

// Files endpoint — local file expired and remote storage fetch failed:
{ "error": "File unavailable — remote storage fetch failed" }

410 File expired (files endpoint only)

{ "error": "File has expired and was not backed up to remote storage" }
// Files are stored locally for FILE_TTL_HOURS (default 2 h) then deleted.
// If remote backup is enabled, the file remains available after local expiry.

Response Headers

Plain HTTP responses (render_js=false) include these headers. Headless responses return the same values inside the JSON body (job_id, credits_used, duration_ms).

Header	Description
`X-Job-Id`	Unique job ID, use it with `/api/v1/files/:jobId` to fetch the result
`X-Credits-Used`	Number of credits consumed by this request
`X-Duration-Ms`	Total server-side execution time in milliseconds
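For plain HTTP responses, these headers can be collected into the same field names the headless JSON body uses. This sketch assumes the two numeric headers hold integer strings; `job_info_from_headers` is a hypothetical helper that accepts any mapping, including `r.headers` from `requests`.

```python
def job_info_from_headers(headers):
    """Extract job metadata from plain HTTP (render_js=false) response headers."""
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "job_id": headers.get("X-Job-Id"),
        "credits_used": as_int("X-Credits-Used"),
        "duration_ms": as_int("X-Duration-Ms"),
    }


# Hand-built sample headers (no network call).
info = job_info_from_headers({
    "X-Job-Id": "abc123",
    "X-Credits-Used": "1",
    "X-Duration-Ms": "412",
})
print(info)
```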