Parameters

All request parameters for /v1/parse — from basic URL parsing to screenshots and extraction hints.

Request body

All parameters are sent as JSON in the request body.

`url` required

Type: string

The full URL of the product page to extract. Must include the scheme (https://). Redirects are followed automatically.

{ "url": "https://www.shop.com/products/blue-widget" }

URLs must resolve to a product page. Category pages, homepages, and search result pages will return a 422 error.

`extractionHint`

Type: string · Default: none

Natural-language instructions that guide the extraction AI. Use this when you need data the standard schema doesn't capture, or when a site structures its data unusually.

{
  "url": "https://example.com/product",
  "extractionHint": "Return each storage variant separately. Include estimated shipping time and country of origin."
}

Good hints:

"Split colour and storage into separate variant entries"
"Include the quantity break pricing table"
"Return the warranty period as a separate field called warranty_years"
"The page shows a sale price and a crossed-out original price — capture both"

Hints are only used when the adaptive or visual extraction path is triggered. They have no effect when JSON-LD is found (fast-path).

`includeScreenshot`

Type: boolean | "full-height" · Default: false

When true, captures a viewport screenshot (1280×900) during extraction. Set to "full-height" for a full-page screenshot of the entire scrollable content.

{
  "url": "https://example.com/product",
  "includeScreenshot": true
}

Screenshots are stored for 7 days and linked in the response as artifacts.screenshotUrl. Useful for:

Debugging extraction issues visually
Compliance audit trails
Verifying the page state at extraction time

Screenshots add ~200–400ms to the extraction time. They do not consume an extra request from your quota.

`forceRefresh`

Type: boolean · Default: false

When true, ignores the cached domain extraction script and regenerates it from scratch. Use this when a site has recently redesigned its product pages and the existing script is returning stale selectors.

{
  "url": "https://example.com/product",
  "forceRefresh": true
}

forceRefresh: true triggers full visual extraction and will consume an API credit even if the domain was previously on the free fast-path. Only use it when you know the site layout has changed.

`strategies`

Type: string[] · Default: ["jsonld", "cached-script", "adaptive"]

Override the extraction strategy order. Available strategies:

Strategy	Description
`jsonld`	Extract from embedded JSON-LD / microdata schema
`cached-script`	Run domain-specific cached extraction selectors
`adaptive`	Generate new selectors using AI (triggers on first visit)
`visual`	Screenshot + vision model extraction

{
  "url": "https://example.com/product",
  "strategies": ["visual", "jsonld"]
}

This example forces visual extraction first, then falls back to JSON-LD. Useful for sites where the JSON-LD is incomplete but the page renders all data visually.

`minWait`

Type: number (milliseconds) · Default: 0

Extra time to wait after the page loads before extracting. Use for pages with heavy JavaScript rendering or lazy-loaded product data.

{
  "url": "https://example.com/product",
  "minWait": 1500
}

Values above 5000 are clamped to 5000ms.

`requestId`

Type: string · Default: auto-generated

A client-supplied identifier for this request. Appears in dashboard logs and error reports, making it easier to correlate extractions with your own application logs.

{
  "url": "https://example.com/product",
  "requestId": "order-import-batch-2024-05-12-item-42"
}

Must be 3–128 characters. Alphanumeric, hyphens, and underscores only.

Parameter matrix

Parameter	Free	Pro	Scale
`url`	✓	✓	✓
`extractionHint`	✓	✓	✓
`includeScreenshot`	—	✓	✓
`forceRefresh`	—	✓	✓
`strategies`	—	✓	✓
`minWait`	—	✓	✓
`requestId`	✓	✓	✓
Batch mode	—	Up to 20	Custom

Previous Parse Endpoint Next Response Schema

Edit this page on GitHub