Web Scraping API

Learn to use our Web Scraping API by reading this documentation and kickstart your data extraction journey!

Start scraping any website with only a few lines of code using our Web Scraping API. Our API handles all types of blockages and CAPTCHAs internally so that you can focus on extracting the data you need.

Our API endpoint is: https://api.serpdog.io/scrape

Guide

Our API is easy to use and is designed to be used by developers. Here are a few things to consider before we get started:

  • The request will be retried until it can be completed (up to 60 seconds). If the request cannot be completed within 60 seconds, we will return a 408 error. You will not be charged for the unsuccessful request; you are only charged for successful requests (200 status code). Make sure to catch these errors! They will occur on roughly 1-2% of requests.

  • If you exceed 1000 requests per month on your free plan, you will receive a 403 error.

  • Each request will return the raw HTML of the web page as requested by the user.
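The status codes mentioned above can be handled in client code along these lines. This is a minimal sketch, not an official client: the names classify_status and fetch_html, the outcome labels, the limit of 3 attempts, and the 70-second client timeout are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Status codes described in the notes above.
SUCCESS = 200
QUOTA_EXCEEDED = 403
TIMED_OUT = 408


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse outcome label (labels are illustrative)."""
    if status == SUCCESS:
        return "ok"
    if status == TIMED_OUT:
        return "retry"
    if status == QUOTA_EXCEEDED:
        return "quota_exceeded"
    return "error"


def fetch_html(request_url: str, max_attempts: int = 3) -> str:
    """Fetch a fully built request URL, retrying only on 408 responses."""
    for attempt in range(max_attempts):
        try:
            # Assumed client timeout, slightly above the API's 60-second window.
            with urllib.request.urlopen(request_url, timeout=70) as response:
                return response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            outcome = classify_status(err.code)
            if outcome == "retry" and attempt < max_attempts - 1:
                continue  # 408 requests are not charged, so retrying is safe
            raise RuntimeError(f"request failed: {outcome} (HTTP {err.code})") from err
    raise RuntimeError("no attempts were made")
```

Retrying only on 408 keeps the client from repeating requests that fail because the monthly quota is exhausted (403), which a retry cannot fix.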

Here is the list of default parameters you can use with this API:

Parameters
Description

api_key required

This is your API key.

url required

Type: String

The URL of the page you want to scrape.

premium

Type: String

Use the premium parameter to scrape difficult-to-scrape websites.

render_js

Type: Boolean

Default: true

Use this parameter to render the JavaScript on the web page using a headless browser.

wait

Type: Integer [0,35000]

Default: 0

Use this parameter to wait a given amount of time (in milliseconds) for a heavy website to load.

country

Type: String

Location of the premium residential proxy.

Usage

You can use the Serpdog API by sending a GET request to https://api.serpdog.io/scrape with two parameters: api_key (your API key) and url (the URL you want to scrape). This API endpoint is the only one you must interact with to access all of Serpdog's web scraping services.

URL

This is the URL of the page you want to scrape.

Note: You should always pass the URL in the encoded form. For example, the & character should be encoded as %26.

sudo apt-get install gridsite-clients
urlencode "YOUR_URL"

URL parameter in the API request:

curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=false"

Our API will respond with the raw HTML data of the target URL.

<html>
  <head>
  </head>
  <body>
    .......
  </body>
</html>
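The encoding step and request URL described above can also be sketched in Python using only the standard library. The function name build_request_url and the example target URL are assumptions for illustration; the endpoint and parameter names are the ones documented here.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://api.serpdog.io/scrape"


def build_request_url(api_key: str, target_url: str, render_js: bool = True) -> str:
    """Return the GET URL with the target URL percent-encoded."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "render_js": "true" if render_js else "false",
    }
    # quote with safe="" encodes every reserved character, so "&" becomes "%26".
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote, safe='')}"


# Hypothetical target URL containing characters that must be encoded.
print(build_request_url("APIKEY", "https://example.com/?a=1&b=2", render_js=False))
```

Encoding the whole target URL this way keeps its own query string from being mistaken for parameters of the API request.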

JavaScript Rendering

If you need to render JavaScript on a page while crawling, Serpdog offers the option to fetch these pages using a headless browser, which is only available on the Premium plans. To use this feature, set render_js=true and a headless browser instance will be used to fetch the page. Each request with normal rotating proxies costs 5 credits, while requests with premium proxies cost 25 credits.

By default render_js=true.

If you do not need to render JavaScript, you can use the render_js=false parameter in the GET request to fetch the URL without a headless browser.

curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=true"

Here is a sample response:

<!DOCTYPE html>
<html>
<head>
	<title>Sample Page</title>
</head>
<body>
	<div id="app">
		<h1>Welcome to our website!</h1>
		<p>This page was dynamically rendered using JavaScript.</p>
		<ul>
			<li v-for="item in items">{{ item }}</li>
		</ul>
	</div>
	<script src="https://cdn.jsdelivr.net/npm/vue"></script>
	<script>
		new Vue({
			el: '#app',
			data: {
				items: ['Item 1', 'Item 2', 'Item 3']
			}
		})
	</script>
</body>
</html>

Proxies

If you're scraping websites that are difficult to scrape, such as search engines, social networks, or certain e-commerce sites, premium proxies (also called residential proxies) are a good option to consider. These proxies are less likely to be blocked and can be helpful in overcoming issues and error codes that may arise during the scraping process.

You can use the parameter premium=true in your API request to enable the use of Premium proxies.

Credit Cost:

  1. You will be charged 25 request credits for using premium proxies with JavaScript rendering (render_js=true).

  2. You will be charged only 10 request credits if you use premium proxies without JavaScript rendering (render_js=false).

curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&premium=true"
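The credit costs stated in this documentation can be collected into a small lookup, which may help when estimating usage. This is only a sketch: the function name credit_cost is an assumption, and the cost of a request with neither premium proxies nor JavaScript rendering is not stated in this documentation, so the function returns None for that case rather than guessing.

```python
from typing import Optional


def credit_cost(premium: bool, render_js: bool) -> Optional[int]:
    """Return the credits charged per successful request, if documented."""
    if premium:
        # Premium proxies: 25 credits with JavaScript rendering, 10 without.
        return 25 if render_js else 10
    if render_js:
        # Normal rotating proxies with JavaScript rendering.
        return 5
    # Normal proxies without rendering: cost not stated in this documentation.
    return None


for premium in (False, True):
    for render_js in (False, True):
        print(f"premium={premium}, render_js={render_js}: {credit_cost(premium, render_js)}")
```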

Geolocation

You have the option to choose the proxy location by specifying the country code using the parameter country=country_code.

For instance, to use premium proxies from the USA, you can set both premium=true and country=us parameters in your API call. The API supports the most popular country codes in the ISO 3166-1 format. Below is the list of the supported country codes.

Country           Code

United States     us
India             in
China             cn
Russia            ru
Brazil            br
Mexico            mx
France            fr
Italy             it
Australia         au
Germany           de
Spain             es
Canada            ca
United Kingdom    uk

curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&country=in"
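A client can check the country code against the supported list before sending a request. The sketch below assumes a hypothetical helper named geo_params; it enables premium=true by default because the text above pairs the country parameter with premium proxies, and that default is an assumption rather than documented behavior.

```python
# Country codes listed in the table above.
SUPPORTED_COUNTRIES = {
    "us": "United States", "in": "India", "cn": "China", "ru": "Russia",
    "br": "Brazil", "mx": "Mexico", "fr": "France", "it": "Italy",
    "au": "Australia", "de": "Germany", "es": "Spain", "ca": "Canada",
    "uk": "United Kingdom",
}


def geo_params(country: str, premium: bool = True) -> dict:
    """Return query parameters for a proxy location, rejecting unknown codes."""
    code = country.strip().lower()
    if code not in SUPPORTED_COUNTRIES:
        raise ValueError(f"unsupported country code: {country!r}")
    params = {"country": code}
    if premium:
        params["premium"] = "true"
    return params


print(geo_params("US"))
```

The returned dictionary can be merged with api_key and url before the query string is encoded.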

Delay for a fixed time

To ensure the Serpdog scraper captures fully rendered HTML on code-heavy websites, use the wait parameter to instruct it to wait for a fixed amount of time before returning the content.

The wait parameter accepts a value in milliseconds ranging from 0 to 35000. By including the wait parameter in the API call, Serpdog's headless browsers will pause for the set duration before returning the page's HTML, ensuring the page is fully rendered.

Its default value is 0.

Note: render_js should be set to true when using this parameter.

curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&wait=4000"
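The range and the render_js requirement described above can be enforced on the client side before a request is sent. In this sketch the helper name wait_params is an assumption; the 0 to 35000 millisecond range comes from this documentation.

```python
MAX_WAIT_MS = 35000  # upper bound of the documented range, in milliseconds


def wait_params(wait_ms: int) -> dict:
    """Return query parameters for a fixed delay, validating the documented range."""
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_MS} ms, got {wait_ms}")
    # The wait parameter relies on the headless browser, so render_js is set to true.
    return {"render_js": "true", "wait": str(wait_ms)}


print(wait_params(4000))
```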
