Article Content Extraction API

Extract clean article body, title, author, and publish date from any blog or news page — without the ads and clutter.

The Agenty Content API extracts the main article body from any blog or news URL, automatically removing navigation, ads, sidebars, and footers. Get the title, author, publish date, hero image, and a clean HTML or plain-text body — ready for aggregators, newsletters, or LLM pipelines.

Features

Auto content detection: Find the main article block on any page.
Title & metadata: Title, author, publish date, and language.
Clean HTML or text: Output as semantic HTML or plain text.
Multi-language: Works on 50+ languages out of the box.
Media URLs: Extract hero images and embedded videos.
Tags & categories: Pull article tags when present in markup.
Word count: Word count and estimated reading time.
Paywall support: Pass cookies for subscriber content.
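
The word count and reading time figures can also be reproduced on the client from the plain-text body, which is handy for checking a response. The sketch below is an illustration only, not Agenty's implementation: the function name reading_stats and the 200 words-per-minute rate are assumptions.

```python
import math
import re


def reading_stats(text: str, words_per_minute: int = 200) -> dict:
    """Return the word count and an estimated reading time in whole minutes.

    The 200 wpm default is an assumed average reading speed, not a value
    documented by the API.
    """
    # Treat every run of non-whitespace characters as one word.
    words = re.findall(r"\S+", text)
    count = len(words)
    # Round up so that any non-empty text takes at least one minute.
    minutes = math.ceil(count / words_per_minute) if count else 0
    return {"word_count": count, "reading_time_minutes": minutes}
```

Whitespace splitting undercounts languages written without spaces between words, such as Chinese or Japanese, so a language-aware tokenizer would be needed for those.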

Use cases

News and blog aggregation feeds
Newsletter and content curation pipelines
Competitive content and SEO analysis
Building clean text corpora for LLM training
Reader-mode features in apps and extensions
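
For the aggregation and corpus-building use cases, extracted articles are often stored one JSON object per line (JSONL). The following is a minimal sketch of that step, assuming each article is a dictionary with the title and text fields used in the API examples; the helper name write_jsonl is hypothetical.

```python
import io
import json
from typing import Iterable, Mapping, TextIO


def write_jsonl(articles: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write one JSON record per article and return how many were written."""
    written = 0
    for article in articles:
        text = (article.get("text") or "").strip()
        if not text:
            # Skip pages where no article body was extracted.
            continue
        record = {"title": article.get("title", ""), "text": text}
        # ensure_ascii=False keeps non-English text readable in the file.
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


# Usage with an in-memory buffer; a real pipeline would pass an open file.
buffer = io.StringIO()
count = write_jsonl(
    [{"title": "A", "text": "hello"}, {"title": "B", "text": "  "}],
    buffer,
)
```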

API examples

Extract article content with cURL

curl -G "https://api.agenty.ai/v1/content" \
  --data-urlencode "url=https://example.com/blog/post" \
  -H "Authorization: Bearer YOUR_API_KEY"

Extract article content in Node.js

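// Build the query string so the target URL is percent-encoded.
const params = new URLSearchParams({ url: 'https://example.com/blog/post' });
const res = await fetch(`https://api.agenty.ai/v1/content?${params}`, {
  headers: { Authorization: 'Bearer YOUR_API_KEY' },
});
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
const article = await res.json();
console.log(article.title, article.text);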

Extract article content in Python

import requests

res = requests.get(
    "https://api.agenty.ai/v1/content",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"url": "https://example.com/blog/post"},
    timeout=30,  # avoid hanging indefinitely on a slow page
)
res.raise_for_status()  # fail loudly on 4xx/5xx responses
article = res.json()
print(article["title"], article["text"])
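
Downstream code is usually easier to maintain when the decoded JSON is turned into a typed object first. The sketch below assumes the title and text fields from the examples above; the author and publish_date keys, the Article class, and the parse_article helper are hypothetical names, since the response schema is not listed here.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Article:
    title: str
    text: str
    author: Optional[str] = None        # assumed field name
    publish_date: Optional[str] = None  # assumed field name


def parse_article(payload: Mapping[str, Any]) -> Article:
    """Build an Article from a decoded JSON response.

    Raises ValueError if the fields used in the examples are missing.
    """
    missing = [key for key in ("title", "text") if key not in payload]
    if missing:
        raise ValueError(f"response is missing fields: {', '.join(missing)}")
    return Article(
        title=payload["title"],
        text=payload["text"],
        # Optional metadata falls back to None when the key is absent.
        author=payload.get("author"),
        publish_date=payload.get("publish_date"),
    )
```

With the Python example above, this would be called as parse_article(res.json()).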

How Agenty compares

Feature	Agenty	Readability	Mercury	Postlight
Automatic content detection	Yes	Yes	Yes	Yes
Author & date extraction	Yes	Limited	Yes	Yes
Multi-language (50+)	Yes	Limited	Yes	Yes
Image & video extraction	Yes	No	Yes	Yes
Hosted API + free tier	Yes	Self-host	Self-host	Yes

Frequently asked questions

What is the Article Content Extraction API?

The Agenty Content API automatically identifies and extracts the main article from any web page. It returns clean structured data: title, author, publish date, article body, and embedded media URLs.

Can I get plain text instead of HTML?

Yes. Set outputFormat: "text" to receive plain text with paragraphs preserved. The default is "html", which returns clean semantic HTML.
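
As a rough sketch, the option could be added to the request URL as shown below. This assumes outputFormat is accepted as a query parameter next to url, which is not confirmed here; build_content_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.agenty.ai/v1/content"


def build_content_url(page_url: str, output_format: str = "html") -> str:
    """Return the request URL for a page and an output format.

    Assumption: outputFormat travels in the query string alongside url.
    """
    if output_format not in ("html", "text"):
        raise ValueError("output_format must be 'html' or 'text'")
    # urlencode percent-encodes the nested page URL.
    query = urlencode({"url": page_url, "outputFormat": output_format})
    return f"{API_ENDPOINT}?{query}"


request_url = build_content_url("https://example.com/blog/post", "text")
```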

Does it work with paywalled content?

Yes. Pass cookies or auth headers via the headers parameter. We also support session-based authentication for platforms like Medium and Substack.
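
One way to picture this is a request payload that carries the subscriber cookie inside the headers parameter. The sketch below is hedged accordingly: the nested {"Cookie": ...} shape and the build_paywall_request helper are assumptions, and the way the payload is transmitted (query string or JSON body) is not stated here.

```python
import json


def build_paywall_request(page_url: str, cookie: str) -> dict:
    """Return request parameters that forward a subscriber cookie."""
    if not cookie.strip():
        raise ValueError("cookie must not be empty")
    return {
        "url": page_url,
        # `headers` is the parameter named in the answer above; the
        # {"Cookie": ...} structure inside it is an assumption.
        "headers": {"Cookie": cookie},
    }


payload = build_paywall_request("https://example.com/members/post", "session=abc")
encoded = json.dumps(payload)  # serialized form, ready to send
```

Keep session cookies out of source control and logs, since they grant access to the subscriber account.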

Is there a free tier?

Yes. All accounts include a free tier. Visit our pricing page for details.

Explore more

Web scraping agent

Web scraping with AI

Start scraping data from any website using Agenty's AI-powered web scraping agents.

Custom web scraping at scale
Real-time price monitoring
LLM training data curation
Structured JSON & CSV exports
Anti-bot bypass built-in
99.9% uptime SLA
