Web Data Extraction API

Turn any web page into clean structured JSON using CSS selectors, XPath, or AI-powered schema detection.

The Agenty Extract API turns any web page into clean JSON. Define CSS selectors or XPath rules, or let the AI auto-detect the schema from a simple field list. The API handles JavaScript rendering, pagination, and proxy rotation, so you can focus on the data instead of the plumbing.

Features

CSS & XPath: Selector-based extraction with attribute pickers.
AI auto-schema: Describe fields in plain English and let AI map them.
Nested objects: Extract lists of cards with child fields.
Pagination: Follow next buttons or infinite scroll automatically.
JS rendering: Headless Chrome handles SPAs and dynamic content.
Proxy rotation: Geo-targeted residential and datacenter proxies.
Smart retries: Automatic retries with backoff on transient errors.
Batch mode: Process thousands of URLs in a single job.
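
To make the "retries with backoff" idea concrete, here is a minimal client-side sketch of exponential backoff with jitter. It illustrates the general technique only and is not Agenty's implementation; `TransientError`, `retry_with_backoff`, and the default delays are hypothetical names and values chosen for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. timeouts)."""


def retry_with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds or the attempts run out.

    The wait doubles after each failure (0.5s, 1s, 2s, ...) and gets up to
    10% random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Non-transient errors are deliberately not caught, so they fail fast.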

Use cases

Price and inventory monitoring across e-commerce sites
Lead generation from directories and B2B catalogs
Real-estate and rental listing aggregation
Product review and rating collection for analytics
Building training datasets for LLMs and recommender systems

API examples

Extract structured data with cURL (bash)

curl -X POST https://api.agenty.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "selectors": {
      "title": "h1",
      "price": ".price",
      "image": "img.main-image@src"
    }
  }'
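
The `image` rule above ends in `@src`, which appears to be the attribute picker: a CSS selector followed by `@` and an attribute name. The sketch below shows one way such a rule could be split on the client side, for example to validate rules before sending them. `split_selector` is a hypothetical helper, and the reading that a rule without `@attr` means "take the element's text" is an assumption based on this example, not Agenty's documented grammar. It is meant for CSS rules only, since XPath uses `@` with a different meaning.

```python
def split_selector(rule):
    """Split 'img.main-image@src' into ('img.main-image', 'src').

    Rules without a trailing '@attr' return (rule, None).
    Assumption: this mirrors the example request only.
    """
    selector, sep, attribute = rule.rpartition("@")
    # No '@' at all, or nothing on one side of it: treat as a plain selector.
    if not sep or not selector or not attribute:
        return rule, None
    return selector, attribute
```

For instance, `split_selector("img.main-image@src")` yields the selector `img.main-image` and the attribute `src`, while `split_selector("h1")` yields `h1` and `None`.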

AI-powered auto extraction, no selectors (bash)

curl -X POST https://api.agenty.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "ai": true,
    "schema": ["product name", "price", "rating", "image"]
  }'
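
The two cURL requests differ only in their JSON body: one sends `selectors`, the other sends `ai` plus `schema`. A small helper can build either body from the same call site. This is a sketch using only the keys shown in the examples above; `build_extract_payload` is a hypothetical name, and the rule that exactly one mode must be chosen is an assumption made for the example, not a documented API constraint.

```python
def build_extract_payload(url, selectors=None, schema=None):
    """Return the JSON body for a selector-based or an AI-based request."""
    # Assumption: a request uses either selectors or an AI schema, not both.
    if (selectors is None) == (schema is None):
        raise ValueError("pass exactly one of selectors or schema")
    payload = {"url": url}
    if selectors is not None:
        payload["selectors"] = dict(selectors)
    else:
        payload["ai"] = True
        payload["schema"] = list(schema)
    return payload
```

The returned dictionary can be serialized with `json.dumps` or passed as the `json=` argument of an HTTP client.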

Extract structured data in Node.js (javascript)

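// Requires Node.js 18+ (built-in fetch) and an ES module for top-level await.
const res = await fetch('https://api.agenty.ai/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    selectors: { title: 'h1', price: '.price' },
  }),
});
// fetch does not reject on HTTP error statuses, so check explicitly.
if (!res.ok) {
  throw new Error(`Extract request failed: HTTP ${res.status}`);
}
const data = await res.json();
console.log(data);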

Extract structured data in Python (python)

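import requests

res = requests.post(
    "https://api.agenty.ai/v1/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    # json= serializes the body and sets the Content-Type header.
    json={
        "url": "https://example.com/products",
        "selectors": {"title": "h1", "price": ".price"},
    },
    timeout=30,  # requests has no default timeout; avoid hanging forever
)
res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(res.json())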

How Agenty compares

Feature	Agenty	Diffbot	ScrapingBee	ScrapeStack
CSS selector extraction	Yes	Yes	Yes	Yes
AI auto-schema detection	Yes	Yes	No	No
Pagination handling	Yes	Limited	Yes	Limited
JavaScript rendering	Yes	Yes	Yes	Yes
Free tier	Yes	Yes	Yes	Yes

Frequently asked questions

What is the Web Data Extraction API?

The Agenty Extract API lets you extract structured data from any web page using CSS selectors, XPath, or AI-powered schema detection. The output is clean JSON ready for your database, CRM, or analytics pipeline.
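
As an illustration of feeding the JSON output into a downstream pipeline, the sketch below writes a list of extracted records to CSV text using only the Python standard library. The record shape (a flat dictionary per item, keyed by field name) is an assumption made for this example, and `records_to_csv` is a hypothetical helper, not part of the API.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text.

    Missing fields are left blank and unexpected fields are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # skip keys not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Fixing the column list up front keeps the output stable even when individual pages are missing a field.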

Do I need to write CSS selectors?

No. Set "ai": true in the request body and pass a "schema" field list such as ["product name", "price", "rating"]. The AI analyses the page and maps the fields automatically.

Can you handle pagination?

Yes. Provide a pagination config with the next-button selector for traditional pagination, or scroll parameters for infinite scroll.

Does it work with JavaScript-heavy sites?

Yes. We use headless Chrome to render JavaScript before extraction, so React, Vue, and Angular single-page apps work out of the box.

Is there a free tier?

Yes. Every account starts with free credits. Check our pricing page for current limits.

Explore more

Web scraping agent

Web scraping with AI

Start scraping data from any website using Agenty's AI-powered web scraping agents.

Custom web scraping at scale
Real-time price monitoring
LLM training data curation
Structured JSON & CSV exports
Anti-bot bypass built-in
99.9% uptime SLA
