The extract API lets you automatically extract structured data from a web page. Structured data, including schema.org, RDFa, Microdata, and JSON-LD, can be extracted by passing the page URL to the /extract API.
GET
Send a GET request to the https://browser.agenty.com/api/extract endpoint with the url query parameter and your apiKey to extract structured data.
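The url value is itself a URL, so it may contain characters such as & or ? that could be misread as part of the /extract query string. The following is a minimal sketch of building the request URL with percent-encoding, assuming Node.js 10 or later (where URL is a global). The helper name buildExtractUrl and the example.com page are hypothetical and used only for illustration; they are not part of the Agenty API.

```javascript
// Endpoint taken from the documentation above.
const EXTRACT_ENDPOINT = "https://browser.agenty.com/api/extract";

// Hypothetical helper: returns the full request URL for a given page.
// searchParams.set() percent-encodes each value, so a page URL that has
// its own query string is passed through as a single "url" parameter.
function buildExtractUrl(apiKey, pageUrl) {
  const endpoint = new URL(EXTRACT_ENDPOINT);
  endpoint.searchParams.set("apiKey", apiKey);
  endpoint.searchParams.set("url", pageUrl);
  return endpoint.toString();
}

// Example page URL (an assumption) that contains "?" and "&".
const requestUrl = buildExtractUrl(
  "YOUR_API_KEY",
  "https://example.com/menu?id=1&lang=en"
);
console.log(requestUrl);
```

The resulting string can be passed directly to fetch or pasted into Postman.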
Consider the following sample web page, which contains restaurant microdata in its HTML:
<!DOCTYPE html>
<html>
<head>
<title>Restaurant Schema test</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/Restaurant">
<span itemprop="name">GreatFood</span>
<div itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
<span itemprop="ratingValue">4</span> stars -
based on <span itemprop="reviewCount">250</span> reviews
</div>
<div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="streetAddress">1901 Lemur Ave</span>
<span itemprop="addressLocality">Sunnyvale</span>,
<span itemprop="addressRegion">CA</span> <span itemprop="postalCode">94086</span>
</div>
<span itemprop="telephone">(408) 714-1489</span>
<a itemprop="url" href="http://www.greatfood.com">www.greatfood.com</a>
Hours:
<meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
<meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
<meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
Categories:
<span itemprop="servesCuisine">
Middle Eastern
</span>,
<span itemprop="servesCuisine">
Mediterranean
</span>
Price Range: <span itemprop="priceRange">$$</span>
Takes Reservations: Yes
</div>
</body>
</html>
Sending an HTTP GET request to the /extract API with this page's URL, from Postman or any programming language, returns the extracted structured data:
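const fetch = require('node-fetch');

// Replace {{API_KEY}} with your own API key.
fetch("https://browser.agenty.com/api/extract?apiKey={{API_KEY}}&url=https://agenty.github.io/Agenty.TestData/scraping/schema/Restaurant-schema.html", {
  method: 'GET'
})
  // res.json() returns a Promise, so resolve it before logging the data.
  .then(res => res.json())
  .then(data => console.log(data))
  .catch(err => console.error(err));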
Sample response
{
"metatags": {
"openingHours": [
"Mo-Sa 11:00-14:30",
"Mo-Th 17:00-21:30",
"Fr-Sa 17:00-22:00"
]
},
"microdata": {
"Restaurant": [
{
"@context": "http://schema.org/",
"@type": "Restaurant",
"name": "GreatFood",
"aggregateRating": {
"@context": "http://schema.org/",
"@type": "AggregateRating",
"ratingValue": "4",
"reviewCount": "250"
},
"address": {
"@context": "http://schema.org/",
"@type": "PostalAddress",
"streetAddress": "1901 Lemur Ave",
"addressLocality": "Sunnyvale",
"addressRegion": "CA",
"postalCode": "94086"
},
"telephone": "(408) 714-1489",
"url": "http://www.greatfood.com",
"openingHours": [
"Mo-Sa 11:00-14:30",
"Mo-Th 17:00-21:30",
"Fr-Sa 17:00-22:00"
],
"servesCuisine": [
"Middle Eastern",
"Mediterranean"
],
"priceRange": "$$"
}
]
},
"rdfa": {},
"jsonld": {}
}
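Once the response is parsed, the microdata object can be read like any other JSON. The sketch below pulls a few fields from each Restaurant item. It uses a trimmed copy of the sample response above. The helper name summarizeRestaurants is hypothetical, and the handling of missing keys and single-value fields is a defensive assumption rather than documented API behavior.

```javascript
// Trimmed copy of the sample response shown above.
const sampleResponse = {
  microdata: {
    Restaurant: [
      {
        "@type": "Restaurant",
        name: "GreatFood",
        address: {
          "@type": "PostalAddress",
          streetAddress: "1901 Lemur Ave",
          addressLocality: "Sunnyvale",
          addressRegion: "CA",
          postalCode: "94086"
        },
        servesCuisine: ["Middle Eastern", "Mediterranean"],
        priceRange: "$$"
      }
    ]
  },
  rdfa: {},
  jsonld: {}
};

// Hypothetical helper: returns one summary object per Restaurant item.
// Missing keys yield an empty list or undefined fields (an assumption
// about how to treat pages that carry no restaurant microdata).
function summarizeRestaurants(response) {
  const restaurants = response?.microdata?.Restaurant ?? [];
  return restaurants.map((item) => ({
    name: item.name,
    city: item.address?.addressLocality,
    // concat() normalizes a single string or an array into an array.
    cuisines: [].concat(item.servesCuisine ?? [])
  }));
}

console.log(summarizeRestaurants(sampleResponse));
```

The optional chaining used here requires Node.js 14 or later.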