Web Data Extraction API
Turn any web page into clean structured JSON using CSS selectors, XPath, or AI-powered schema detection.
The Agenty Extract API turns any web page into clean JSON. Define CSS selectors or XPath rules, or let the AI auto-detect the schema from a simple field list. The API handles JavaScript rendering, pagination, and proxy rotation, so you can focus on the data instead of the plumbing.
Features
- CSS & XPath: Selector-based extraction with attribute pickers.
- AI auto-schema: Describe fields in plain English and let AI map them.
- Nested objects: Extract lists of cards with child fields.
- Pagination: Follow next buttons or infinite scroll automatically.
- JS rendering: Headless Chrome handles SPAs and dynamic content.
- Proxy rotation: Geo-targeted residential and datacenter proxies.
- Smart retries: Automatic retries with backoff on transient errors.
- Batch mode: Process thousands of URLs in a single job.
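The attribute picker mentioned in the first feature appears in the examples on this page as a rule like img.main-image@src. The sketch below shows one plausible reading of that syntax: everything before the last @ is the selector, and what follows is the attribute to read. The function name split_selector_rule is hypothetical, and the splitting rule is inferred from that single example rather than from a documented grammar.

```python
from typing import Optional, Tuple


def split_selector_rule(rule: str) -> Tuple[str, Optional[str]]:
    """Split 'selector@attr' into (selector, attr); attr is None for plain rules.

    Assumption: the last '@' separates the selector from the attribute name.
    This mirrors the 'img.main-image@src' rule shown on this page only.
    """
    selector, sep, attr = rule.rpartition("@")
    if not sep or not selector or not attr:
        # No usable '@' picker: treat the whole rule as a plain selector.
        return rule, None
    return selector.strip(), attr.strip()


# Selector map taken from the request examples on this page.
rules = {"title": "h1", "price": ".price", "image": "img.main-image@src"}
parsed = {field: split_selector_rule(rule) for field, rule in rules.items()}
```

Here parsed maps image to the pair ("img.main-image", "src"), while title and price keep their selector with no attribute, which presumably means the element's text is returned.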
Use cases
- Price and inventory monitoring across e-commerce sites
- Lead generation from directories and B2B catalogs
- Real-estate and rental listing aggregation
- Product review and rating collection for analytics
- Building training datasets for LLMs and recommender systems
API examples
# cURL: extract fields with CSS selectors
curl -X POST https://api.agenty.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "selectors": {
      "title": "h1",
      "price": ".price",
      "image": "img.main-image@src"
    }
  }'

# cURL: let the AI map a plain-English field list
curl -X POST https://api.agenty.ai/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "ai": true,
    "schema": ["product name", "price", "rating", "image"]
  }'

// JavaScript (fetch): extract fields with CSS selectors
const res = await fetch('https://api.agenty.ai/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    selectors: { title: 'h1', price: '.price' },
  }),
});
const data = await res.json();
console.log(data);

# Python (requests): extract fields with CSS selectors
import requests

res = requests.post(
    "https://api.agenty.ai/v1/extract",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://example.com/products",
        "selectors": {"title": "h1", "price": ".price"},
    },
)
print(res.json())

How Agenty compares
| Feature | Agenty | Diffbot | ScrapingBee | ScrapeStack |
|---|---|---|---|---|
| CSS selector extraction | Yes | Yes | Yes | Yes |
| AI auto-schema detection | Yes | Yes | No | No |
| Pagination handling | Yes | Limited | Yes | Limited |
| JavaScript rendering | Yes | Yes | Yes | Yes |
| Free tier | Yes | Yes | Yes | Yes |
Frequently asked questions
What is the Web Data Extraction API?
The Agenty Extract API lets you extract structured data from any web page using CSS selectors, XPath, or AI-powered schema detection. The output is clean JSON ready for your database, CRM, or analytics pipeline.
Do I need to write CSS selectors?
No. Set "ai": true in the request body and pass a simple field list in "schema", such as ["product name", "price", "rating"]. The AI analyses the page and maps the fields automatically.
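As a rough sketch, the two request shapes shown on this page (a selectors map, or ai plus a schema list) can be built by one small helper. The helper names build_extract_payload and build_request are hypothetical, and the rule that a request uses exactly one of the two modes is an assumption of this sketch, not something this page states. Only the Python standard library is used.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Endpoint as it appears in the examples on this page.
EXTRACT_URL = "https://api.agenty.ai/v1/extract"


def build_extract_payload(
    url: str,
    selectors: Optional[Dict[str, str]] = None,
    fields: Optional[List[str]] = None,
) -> dict:
    """Build a selector-based or an AI-based request body.

    Assumption: a request uses exactly one of the two modes.
    """
    if (selectors is None) == (fields is None):
        raise ValueError("pass exactly one of selectors or fields")
    if selectors is not None:
        return {"url": url, "selectors": dict(selectors)}
    # AI mode: same keys as the cURL example ("ai" and "schema").
    return {"url": url, "ai": True, "schema": list(fields)}


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request with the headers used on this page."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_extract_payload(
    "https://example.com/products",
    fields=["product name", "price", "rating"],
)
request = build_request(payload, "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request, timeout=30)
```

Keeping payload construction separate from sending makes it easy to switch between AI mode and explicit selectors without touching the HTTP code.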
Can you handle pagination?
Yes. Provide a pagination config with the next-button selector for traditional pagination, or scroll parameters for infinite scroll.
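This page does not give the field names of the pagination config, so the sketch below only illustrates the two modes described above. Every key in it (pagination, type, selector, scroll_times) and the helper name pagination_config are placeholders chosen for illustration; check the API reference for the real names before using them.

```python
from typing import Optional


def pagination_config(
    next_selector: Optional[str] = None,
    scroll_times: Optional[int] = None,
) -> dict:
    """Return a pagination block for one of the two modes.

    All key names are hypothetical placeholders, not the documented schema.
    """
    if (next_selector is None) == (scroll_times is None):
        raise ValueError("pass exactly one of next_selector or scroll_times")
    if next_selector is not None:
        # Traditional pagination: follow the element matched by the selector.
        return {"type": "next_button", "selector": next_selector}
    if scroll_times < 1:
        raise ValueError("scroll_times must be at least 1")
    # Infinite scroll: scroll the page a fixed number of times.
    return {"type": "infinite_scroll", "scroll_times": scroll_times}


# Request body combining the selectors from this page with a placeholder
# pagination block.
payload = {
    "url": "https://example.com/products",
    "selectors": {"title": "h1", "price": ".price"},
    "pagination": pagination_config(next_selector="a.next"),
}
```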
Does it work with JavaScript-heavy sites?
Yes. We use headless Chrome to render JavaScript before extraction, so React, Vue, and Angular single-page apps work out of the box.
Is there a free tier?
Yes. Every account starts with free credits. Check our pricing page for current limits.