HTML to Markdown API

Convert any URL into clean, LLM-ready Markdown — perfect for RAG pipelines, AI agents, and documentation migrations.

The Agenty Markdown API takes a URL and returns the page's main content as Markdown. It strips navigation, ads, and scripts while preserving headings, lists, tables, code blocks (with language hints), and image alt text. The result is compact, token-efficient Markdown that you can drop straight into a vector store, RAG pipeline, or AI agent context window.

Use it to feed live web content into LLMs without writing custom scrapers, to migrate legacy HTML docs into static site generators, or to archive articles as portable Markdown files.
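For the archiving use case, one possible approach is to derive a stable file name from the source URL and write the returned Markdown to disk. The sketch below assumes you already hold the Markdown string returned by the API. The helper names `slug_from_url` and `archive_markdown` are hypothetical and not part of the Agenty API.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Build a filesystem-friendly slug from a URL's host and path."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return slug or "index"


def archive_markdown(markdown: str, url: str, out_dir: str = "archive") -> Path:
    """Write Markdown to <out_dir>/<slug>.md and return the file path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{slug_from_url(url)}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Query strings are ignored when building the slug, so two URLs that differ only in their query would map to the same file. Extend the slug if that matters for your archive.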

API examples

Convert a web page to LLM-ready Markdown with cURL
curl -X POST https://api.agenty.ai/v1/markdown \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs"
  }'
Convert to Markdown with YAML frontmatter for static sites
curl -X POST https://api.agenty.ai/v1/markdown \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/post",
    "frontmatter": ["title", "author", "date"]
  }'
Fetch LLM-ready Markdown in Node.js and send it to an LLM
// Requires the openai package; run as an ES module so top-level await works
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Convert the page to Markdown
const res = await fetch('https://api.agenty.ai/v1/markdown', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ url: 'https://example.com/docs' }),
});
if (!res.ok) {
  throw new Error(`Markdown API request failed with status ${res.status}`);
}
const { markdown } = await res.json();

// 2. Feed it straight into an LLM as context
const answer = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: 'Answer questions using the provided docs.' },
    { role: 'user', content: `Docs:\n\n${markdown}\n\nQuestion: What does this page describe?` },
  ],
});
console.log(answer.choices[0].message.content);
Convert a URL to Markdown in Python for a RAG pipeline
import requests

res = requests.post(
    "https://api.agenty.ai/v1/markdown",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"url": "https://example.com/docs"},
    timeout=30,
)
res.raise_for_status()  # fail early on HTTP errors
markdown = res.json()["markdown"]

# Chunk on level-2 headings, then embed for your vector store
chunks = markdown.split("\n## ")
print(f"{len(chunks)} chunks ready for embedding")
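Splitting on "\n## " removes the heading marker from every chunk after the first, which can make chunks harder to interpret once embedded. A minimal sketch of a heading-aware alternative, using only the standard library. `chunk_by_heading` is a hypothetical helper, and splitting at level-2 headings is an assumption about how your documents are structured.

```python
import re


def chunk_by_heading(markdown: str, level: int = 2) -> list[str]:
    """Split Markdown into chunks that each start at a heading of `level`."""
    # Zero-width lookahead: split just before the heading so its marker stays in the chunk.
    pattern = re.compile(rf"^(?={'#' * level} )", flags=re.MULTILINE)
    parts = pattern.split(markdown)
    return [part.strip() for part in parts if part.strip()]
```

This sketch does not look inside fenced code blocks, so a line that starts with "## " inside a code sample would also start a new chunk. Deeper headings such as "### " stay with their parent section.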

How Agenty compares

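| Feature | Agenty | MarkDrop | Html2Markdown | Turndown |
| --- | --- | --- | --- | --- |
| URL to Markdown (hosted API) | Yes | No | Library only | Library only |
| LLM-ready output | Yes | Partial | No | No |
| Table support | Yes (GFM) | Yes | Limited | Plugin |
| Code language detection | Yes | No | No | No |
| Frontmatter support | Yes | No | No | No |
| Free tier | Yes | Yes | Open source | Open source |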

Frequently asked questions

What is the HTML to Markdown API?

The Agenty Markdown API converts any web page into clean, LLM-ready Markdown. It preserves structure (headings, lists, tables), keeps code blocks with language hints, and retains image references — so the output works in both RAG pipelines and static site generators.

Why is this better than raw HTML for LLMs?

Markdown is far more token-efficient than raw HTML, typically 3–5x smaller for the same article, and the conversion strips noise such as <script> tags, ads, and inline styles. That means your LLM context window is spent on real content, not markup, which improves both cost and answer quality in RAG and agent workflows.
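To check the size difference on your own pages, you can compare the raw HTML with the converted Markdown. The sketch below uses character count divided by four as a rough token estimate. That heuristic is an assumption for illustration, not a real tokenizer, and `estimate_tokens` and `compression_ratio` are hypothetical helpers. The sample strings are made up for the demo.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def compression_ratio(html: str, markdown: str) -> float:
    """Return how many times larger the HTML is than the Markdown, by estimated tokens."""
    return estimate_tokens(html) / estimate_tokens(markdown)


# Illustrative inputs only; substitute a real page and its converted Markdown.
html = '<div class="post"><script>track()</script><h1 style="margin:0">Hello</h1><p>World</p></div>'
markdown = "# Hello\n\nWorld"
print(f"HTML is about {compression_ratio(html, markdown):.1f}x larger")
```

For accurate numbers, swap `estimate_tokens` for the tokenizer that matches your model.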

Does it support tables?

Yes. HTML tables are converted to GitHub-flavored Markdown tables. Complex tables with colspan or rowspan are flattened into simple aligned tables.
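Because the output uses GitHub-flavored Markdown tables, downstream code can read them with a few lines of parsing. A minimal sketch, assuming a well-formed table with a header row, a delimiter row, and no escaped pipe characters inside cells. `parse_gfm_table` is a hypothetical helper, not part of the API.

```python
def parse_gfm_table(table: str) -> list[dict[str, str]]:
    """Parse a simple GFM table into a list of row dictionaries keyed by header."""

    def cells(line: str) -> list[str]:
        # Drop the optional leading and trailing pipes, then split on the rest.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    lines = [line for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("a GFM table needs a header row and a delimiter row")
    header = cells(lines[0])
    # lines[1] is the delimiter row (for example | --- | :---: |), so skip it.
    return [dict(zip(header, cells(line))) for line in lines[2:]]
```

Each data row becomes a dictionary, which is convenient for loading into a dataframe or for attaching as structured metadata to a chunk.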

Can I include YAML frontmatter?

Yes. Pass a frontmatter array with field names like "title", "author", "date", and the API prepends YAML frontmatter to the Markdown output — ready for Hugo, Astro, Next.js, or Jekyll.
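If you need the metadata and the body separately, you can split them after the call. The sketch below assumes the frontmatter is delimited by `---` lines at the top of the output (the usual convention for the generators listed above) and holds flat `key: value` pairs. The exact output format is an assumption, and `split_frontmatter` is a hypothetical helper. For nested YAML, a full YAML parser would be the safer choice.

```python
def split_frontmatter(markdown: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body); metadata is empty when no frontmatter is found."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    try:
        # Find the closing delimiter that follows the opening one.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return {}, markdown
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            # Strip surrounding whitespace and optional quotes from the value.
            meta[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Values are kept as strings, so a field such as `date` still needs parsing if you want a real date object.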

Is there a free tier?

Yes. All new accounts include a free tier. See our pricing page for limits.

Web scraping with AI

Start scraping data from any website using Agenty's AI-powered web scraping agents.

No credit card required
14-day free trial
Cancel anytime