Learn to use our Web Scraping API by reading this documentation and kickstart your data extraction journey!
Start scraping any website with only a few lines of code using our Web Scraping API. Our API handles all types of blockages and CAPTCHAs internally so that you can focus on extracting the data you need.
Our API endpoint is: https://api.serpdog.io/scrape
Guide
Our API is easy to use and is designed to be used by developers.
Here are a few things to consider before we get started:
The request will be retried until it can be completed (up to 60 seconds). If the request still fails after 60 seconds, we return a 408 error and you are not charged for it: only successful requests (200 status code) are billed. Make sure to catch these errors! They occur on roughly 1-2% of requests.
If you exceed 1000 requests per month on your free plan, you will receive a 403 error.
Each request returns the raw HTML of the requested web page.
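The status codes above can be handled along these lines. This is a minimal sketch using only the Python standard library; the function names, the labels, and the 70-second client timeout (chosen to exceed the 60-second retry window) are illustrative assumptions, not part of the Serpdog API.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def describe_status(status_code: int) -> str:
    """Map the status codes mentioned in this guide to a short label."""
    if status_code == 200:
        return "success"         # billed; the body is the raw HTML
    if status_code == 408:
        return "timeout"         # not billed; the request can be sent again
    if status_code == 403:
        return "quota_exceeded"  # documented cause: free-plan monthly limit
    return "unexpected"          # anything this guide does not describe


def fetch_html(request_url: str, timeout: float = 70.0) -> Tuple[str, Optional[str]]:
    """Send one GET request and return (label, html or None).

    request_url is the full API URL, including api_key and url.
    The timeout value is an assumption, not a documented setting.
    """
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return describe_status(response.status), body
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses such as 408 and 403.
        return describe_status(err.code), None
```

Network-level failures (DNS errors, dropped connections) are not caught here and would surface as exceptions to the caller.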
Here is the list of default parameters you can use with this API:
Parameters | Description
api_key (required) | This is your API key.
url (required) | Type: String. The URL of the page you want to scrape.
premium | Type: String. Use the premium parameter to scrape difficult-to-scrape websites.
render_js | Type: Boolean. Default: true. Use this parameter to render the JavaScript on the web page using the headless browser.
wait | Type: Integer [0, 35000]. Default: 0. Use this parameter to wait a given amount of time (in milliseconds) for a heavy website to load.
country | Type: String. Location of premium residential proxy.
Usage
You can use the Serpdog API by sending a GET request to https://api.serpdog.io/scrape with two parameters: api_key (your API key) and url (the URL you want to scrape). This API endpoint is the only one you must interact with to access all of Serpdog's web scraping services.
URL
This is the URL of the page you want to scrape.
Note: You should always pass the URL in the encoded form. For example, the & character should be encoded as %26.
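As an illustration of the encoding note, the request URL can be assembled with Python's standard urllib.parse.urlencode, which percent-encodes reserved characters in each value. The function name, the placeholder key, and the example target URL are assumptions made for this sketch.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.serpdog.io/scrape"


def build_request_url(api_key: str, target_url: str) -> str:
    """Return the GET URL with api_key and an encoded url parameter."""
    # urlencode escapes reserved characters, so "&" in the target
    # URL becomes "%26" instead of starting a new API parameter.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Hypothetical key and target page, for illustration only.
print(build_request_url("YOUR_API_KEY", "https://example.com/search?q=shoes&page=2"))
```

Without encoding, the "&page=2" part of the target URL would be read as a separate parameter of the API call rather than as part of the page address.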
If you need to render JavaScript on a page while crawling, Serpdog offers the option to fetch these pages using a headless browser, which is only available on the Premium plans. To use this feature, set render_js=true and a headless browser instance will be used to fetch the page. Each request with normal rotating proxies costs 5 credits, while requests with premium proxies cost 25 credits.
By default render_js=true.
If you do not need to render JavaScript, you can use the render_js=false parameter in the GET request to fetch the URL without a headless browser.
Here is the sample response which can be returned:
<!DOCTYPE html>
<html>
<head>
  <title>Sample Page</title>
</head>
<body>
  <div id="app">
    <h1>Welcome to our website!</h1>
    <p>This page was dynamically rendered using JavaScript.</p>
    <ul>
      <li v-for="item in items">{{ item }}</li>
    </ul>
  </div>
  <script src="https://cdn.jsdelivr.net/npm/vue"></script>
  <script>
    new Vue({ el: '#app', data: { items: ['Item 1', 'Item 2', 'Item 3'] } })
  </script>
</body>
</html>
Proxies
If you're scraping websites that are difficult to scrape, such as search engines, social networks, or certain e-commerce sites, premium proxies (also called residential proxies) are a good option to consider. These proxies are less likely to be blocked and can be helpful in overcoming issues and error codes that may arise during the scraping process.
You can use the parameter premium=true in your API request to enable the use of Premium proxies.
Credit Cost:
You will be charged 25 request credits for using premium proxies with JavaScript rendering (render_js=true).
You will be charged only 10 request credits if you use premium proxies without JavaScript rendering (render_js=false).
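The credit costs quoted in this guide can be collected into a small lookup for budgeting. This is only a sketch: the function name is hypothetical, the 5-credit figure is read as applying to normal proxies with JavaScript rendering, and the cost of normal proxies without JavaScript rendering is not stated here, so it is left undefined.

```python
from typing import Optional

# Costs quoted in this guide, keyed by (premium, render_js).
# (False, False) is intentionally absent: no figure is given for it.
CREDIT_COSTS = {
    (False, True): 5,   # normal rotating proxies, JavaScript rendering
    (True, True): 25,   # premium proxies, JavaScript rendering
    (True, False): 10,  # premium proxies, no JavaScript rendering
}


def credits_per_request(premium: bool, render_js: bool = True) -> Optional[int]:
    """Return the quoted cost, or None where this guide states none."""
    return CREDIT_COSTS.get((premium, render_js))


def estimate_credits(successful_requests: int, premium: bool, render_js: bool = True) -> Optional[int]:
    """Estimate total credits; only successful (200) requests are billed."""
    cost = credits_per_request(premium, render_js)
    return None if cost is None else cost * successful_requests
```

For example, 40 successful premium requests with JavaScript rendering would come to 40 × 25 = 1000 credits under these figures.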
You have the option to choose the proxy location by specifying the country code using the parameter country=country_code.
For instance, to use premium proxies from the USA, you can set both premium=true and country=us parameters in your API call. The API supports the most popular country codes in the ISO 3166-1 format. Below is the list of the supported country codes.
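A request that selects a proxy location might be built as follows. This sketch assumes the parameter names from the table above; the function name and the placeholder key are hypothetical, and premium is always set to true because country is described as the location of the premium residential proxy.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.serpdog.io/scrape"


def build_geo_request_url(api_key: str, target_url: str, country_code: str) -> str:
    """Return a GET URL that asks for a premium proxy in one country."""
    params = {
        "api_key": api_key,
        "url": target_url,        # percent-encoded by urlencode
        "premium": "true",        # country applies to premium proxies
        "country": country_code,  # e.g. "us", as in the example above
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hypothetical key and target page, for illustration only.
print(build_geo_request_url("YOUR_API_KEY", "https://example.com/", "us"))
```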
To ensure the Serpdog scraper captures fully rendered HTML on code-heavy websites, use the wait parameter to instruct it to wait for a fixed amount of time before returning the content.
The wait parameter accepts a value in milliseconds ranging from 0 to 35000. By including the wait parameter in the API call, Serpdog's headless browsers will pause for the set duration before returning the page's HTML, ensuring the page is fully rendered.
Its default value is 0.
Note: render_js must be set to true when using this parameter.
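Checking the wait value on the client side before sending a request could look like the sketch below. The helper name is hypothetical; the range and the render_js requirement come from the description above.

```python
MAX_WAIT_MS = 35000  # upper bound of the documented range


def build_wait_params(wait_ms: int) -> dict:
    """Return the query parameters needed to wait wait_ms milliseconds."""
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_MS} ms, got {wait_ms}")
    # wait relies on the headless browser, so render_js is set to true.
    return {"render_js": "true", "wait": str(wait_ms)}


# Example: wait 5 seconds for a heavy page to finish rendering.
print(build_wait_params(5000))
```

These parameters would be merged with api_key and url before the query string is encoded.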