# Web Scraping API

Start scraping any website with our Web Scraping API in just a few lines of code. The API handles blocks and CAPTCHAs internally so that you can focus on extracting the data you need.

Our API endpoint is: `https://api.serpdog.io/scrape`

### Guide

Our API is easy to use and designed with developers in mind.

Here are a few things to consider before we get started:

* Each request is retried until it succeeds, for up to 60 seconds. If a request still fails after 60 seconds, we return a 408 error. You are charged only for successful requests (200 status code), so unsuccessful requests cost nothing. Make sure to catch these errors; they occur on roughly 1-2% of requests.
* If you exceed 1000 requests per month on your free plan, you will receive a 403 error.
* Each request returns the raw HTML of the requested web page.
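Since failed requests are not charged but still need to be handled, a small retry wrapper is useful. Below is a minimal Python sketch; the `fetch_page` helper is our own illustration, not part of an official client:

```python
import requests

def fetch_page(url, api_key, max_attempts=3):
    """Fetch a page through the scrape endpoint, handling the documented errors.

    408 -> the 60-second window expired; the request was not charged, so retry.
    403 -> the free-plan limit of 1000 requests/month was exceeded; do not retry.
    """
    for _ in range(max_attempts):
        resp = requests.get(
            "https://api.serpdog.io/scrape",
            params={"api_key": api_key, "url": url},
        )
        if resp.status_code == 200:
            return resp.text  # successful (charged) request
        if resp.status_code == 403:
            raise RuntimeError("Monthly request limit exceeded (403)")
        if resp.status_code != 408:
            resp.raise_for_status()  # surface any unexpected error
        # 408: timed out and not charged, so loop and retry
    raise RuntimeError(f"Request timed out {max_attempts} times (408)")
```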

Here is the list of default parameters you can use with this API:

| Parameter | Description |
| :-------: | :---------: |
| <p>api\_key<br><br><mark style="color:red;">required</mark></p> | <p>Type: <code>String</code><br><br>Your API key.</p> |
| <p>url<br><br><mark style="color:red;">required</mark></p> | <p>Type: <code>String</code><br><br>The URL of the page you want to scrape.</p> |
| premium | <p>Type: <code>String</code><br><br>Use this parameter to scrape difficult-to-scrape websites with premium residential proxies.</p> |
| render\_js | <p>Type: <code>Boolean</code><br><br>Default: <code>true</code><br><br>Use this parameter to render the page's JavaScript in a headless browser.</p> |
| wait | <p>Type: <code>Integer</code> <code>\[0, 35000]</code><br><br>Default: <code>0</code><br><br>Use this parameter to wait for a heavy website to load for a given amount of time (in milliseconds).</p> |
| country | <p>Type: <code>String</code><br><br>Location of the premium residential proxy.</p> |

### Usage

You can use the Serpdog API by sending a GET request to `https://api.serpdog.io/scrape` with two required parameters: `api_key` (your API key) and `url` (the URL you want to scrape). This is the only endpoint you need to interact with to access all of Serpdog's web scraping services.

### URL

This is the URL of the page you want to scrape data from.

**Note: Always pass the URL in encoded form. For example, the `&` character should be encoded as `%26`.**

{% tabs %}
{% tab title="cURL" %}

```shell
sudo apt-get install gridsite-clients
urlencode "YOUR_URL"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
encoded_url = encodeURIComponent("YOUR_URL")
```

{% endtab %}

{% tab title="Python" %}

```python
from urllib.parse import quote
encoded_url = quote("YOUR_URL")
```

{% endtab %}

{% tab title="Java" %}

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

String encoded_url = URLEncoder.encode("YOUR_URL", StandardCharsets.UTF_8.toString());
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'uri'
encoded_url = URI.encode_www_form_component("YOUR_URL")
```

{% endtab %}

{% tab title="PHP" %}

```php
$encoded_url = urlencode("YOUR_URL");
```

{% endtab %}
{% endtabs %}

Using the `url` parameter in an API request:

{% tabs %}
{% tab title="cURL" %}

```shell
curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=false"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
const axios = require('axios');

axios.get('https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=false')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
payload = {'api_key': 'APIKEY', 'url':'YOUR_URL' , 'render_js':'false'}
resp = requests.get('https://api.serpdog.io/scrape', params=payload)
print (resp.text)
```

{% endtab %}

{% tab title="Java" %}

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

try {
    String url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=false";
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestMethod("GET");
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // Read the response body line by line
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            response.append(readLine);
        }
        in.close();
        System.out.println(response.toString());
    } else {
        throw new Exception("Error in API call");
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'

params = {
  :api_key => "APIKEY",
  :url => "YOUR_URL",
  :render_js => "false"
}
uri = URI('https://api.serpdog.io/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=false";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Note: these two lines disable SSL certificate verification; avoid them in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
```

{% endtab %}
{% endtabs %}

Our API will respond with the raw HTML data of the target URL.

```html
<html>
  <head>
  </head>
  <body>
    .......
  </body>
</html>
```
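Because the response is raw HTML rather than parsed JSON, you bring your own parser. As an illustration, this sketch pulls the `<title>` out of a response using only Python's standard library (BeautifulSoup or similar would work just as well):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def page_title(html):
    """Return the <title> text of a raw HTML string, or "" if absent."""
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title.strip()
```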

### JavaScript Rendering

If you need to **render JavaScript** on a page while crawling, Serpdog offers the option to fetch these pages using a headless browser, which is only available on the Premium plans. To use this feature, set `render_js=true` and a headless browser instance will be used to fetch the page. Each request with normal rotating proxies costs 5 credits, while requests with premium proxies cost 25 credits.

By default, `render_js` is `true`.

If you do not need to render JavaScript, set `render_js=false` in the GET request to fetch the URL without a headless browser.

{% tabs %}
{% tab title="cURL" %}

```shell
curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=true"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
const axios = require('axios');

axios.get('https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=true')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
payload = {'api_key': 'APIKEY', 'url':'YOUR_URL' , 'render_js':'true'}
resp = requests.get('https://api.serpdog.io/scrape', params=payload)
print (resp.text)
```

{% endtab %}

{% tab title="Java" %}

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

try {
    String url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=true";
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestMethod("GET");
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // Read the response body line by line
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            response.append(readLine);
        }
        in.close();
        System.out.println(response.toString());
    } else {
        throw new Exception("Error in API call");
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'

params = {
  :api_key => "APIKEY",
  :url => "YOUR_URL",
  :render_js => "true"
}
uri = URI('https://api.serpdog.io/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&render_js=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Note: these two lines disable SSL certificate verification; avoid them in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
```

{% endtab %}
{% endtabs %}

Here is a sample of the response that can be returned:

```html
<!DOCTYPE html>
<html>
<head>
	<title>Sample Page</title>
</head>
<body>
	<div id="app">
		<h1>Welcome to our website!</h1>
		<p>This page was dynamically rendered using JavaScript.</p>
		<ul>
			<li v-for="item in items">{{ item }}</li>
		</ul>
	</div>
	<script src="https://cdn.jsdelivr.net/npm/vue"></script>
	<script>
		new Vue({
			el: '#app',
			data: {
				items: ['Item 1', 'Item 2', 'Item 3']
			}
		})
	</script>
</body>
</html>
```

### Proxies

If you're scraping websites that are difficult to scrape, such as search engines, social networks, or certain e-commerce sites, **premium proxies** (also called residential proxies) are a good option to consider. These proxies are **less likely to be blocked** and can be helpful in overcoming issues and error codes that may arise during the scraping process.

You can set `premium=true` in your API request to enable premium proxies.

Credit cost:

1. You will be charged 25 request credits when using premium proxies with JavaScript rendering (`render_js=true`).
2. You will be charged only 10 request credits when using premium proxies without JavaScript rendering (`render_js=false`).
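The documented credit costs can be summarized in a small lookup table. This helper is our own illustration and covers only the combinations stated above (the cost of a plain request without premium proxies or rendering is not listed here):

```python
def request_credits(premium, render_js):
    """Credits charged per successful request, for the documented combinations."""
    costs = {
        # (premium, render_js) -> credits
        (False, True): 5,   # rotating proxies + JavaScript rendering
        (True, True): 25,   # premium proxies + JavaScript rendering
        (True, False): 10,  # premium proxies, no JavaScript rendering
    }
    try:
        return costs[(premium, render_js)]
    except KeyError:
        raise ValueError("Cost not documented for this combination")
```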

{% tabs %}
{% tab title="cURL" %}

```shell
curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&premium=true"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
const axios = require('axios');

axios.get('https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&premium=true')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
payload = {'api_key': 'APIKEY', 'url':'YOUR_URL' , 'premium':'true'}
resp = requests.get('https://api.serpdog.io/scrape', params=payload)
print (resp.text)
```

{% endtab %}

{% tab title="Java" %}

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

try {
    String url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&premium=true";
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestMethod("GET");
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // Read the response body line by line
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            response.append(readLine);
        }
        in.close();
        System.out.println(response.toString());
    } else {
        throw new Exception("Error in API call");
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'

params = {
  :api_key => "APIKEY",
  :url => "YOUR_URL",
  :premium => "true"
}
uri = URI('https://api.serpdog.io/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&premium=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Note: these two lines disable SSL certificate verification; avoid them in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
```

{% endtab %}
{% endtabs %}

### Geolocation

You can choose the proxy location by specifying a country code with the parameter `country=country_code`.

For instance, to use premium proxies from the USA, set both `premium=true` and `country=us` in your API call. The API supports the most popular country codes in the [ISO 3166-1](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) format. Below is the list of supported country codes.

|     Country    | Code |
| :------------: | :--: |
|  United States |  us  |
|      India     |  in  |
|      China     |  cn  |
|     Russia     |  ru  |
|     Brazil     |  br  |
|     Mexico     |  mx  |
|     France     |  fr  |
|      Italy     |  it  |
|    Australia   |  au  |
|     Germany    |  de  |
|      Spain     |  es  |
|     Canada     |  ca  |
| United Kingdom |  uk  |
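When the country code comes from user input, it may be worth validating it against the table above before spending credits on a request. A hypothetical Python check:

```python
# Country codes from the table above.
SUPPORTED_COUNTRIES = {
    "us", "in", "cn", "ru", "br", "mx", "fr",
    "it", "au", "de", "es", "ca", "uk",
}

def validate_country(code):
    """Return the lower-cased code if it appears in the supported list."""
    code = code.lower()
    if code not in SUPPORTED_COUNTRIES:
        raise ValueError(f"Unsupported country code: {code!r}")
    return code
```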

{% tabs %}
{% tab title="cURL" %}

```shell
curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&country=in"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
const axios = require('axios');

axios.get('https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&country=in')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
payload = {'api_key': 'APIKEY', 'url':'YOUR_URL' , 'country':'in'}
resp = requests.get('https://api.serpdog.io/scrape', params=payload)
print (resp.text)
```

{% endtab %}

{% tab title="Java" %}

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

try {
    String url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&country=in";
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestMethod("GET");
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // Read the response body line by line
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            response.append(readLine);
        }
        in.close();
        System.out.println(response.toString());
    } else {
        throw new Exception("Error in API call");
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'

params = {
  :api_key => "APIKEY",
  :url => "YOUR_URL",
  :country => "in"
}
uri = URI('https://api.serpdog.io/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&country=in";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Note: these two lines disable SSL certificate verification; avoid them in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
```

{% endtab %}
{% endtabs %}

### Delay for a fixed time

To ensure the Serpdog scraper captures fully rendered HTML on code-heavy websites, use the `wait` parameter to make it wait for a fixed amount of time before returning the content.

The `wait` parameter accepts a value in milliseconds ranging from 0 to 35000. By including the wait parameter in the API call, Serpdog's headless browsers will pause for the set duration before returning the page's HTML, ensuring the page is fully rendered.

**Its default value is 0.**

**Note: `render_js` must be set to `true` when using this parameter.**

{% tabs %}
{% tab title="cURL" %}

```shell
curl "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&wait=4000"
```

{% endtab %}

{% tab title="Node JS" %}

```javascript
const axios = require('axios');

axios.get('https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&wait=4000')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    console.log(error);
  });
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
payload = {'api_key': 'APIKEY', 'url':'YOUR_URL' , 'wait':'4000'}
resp = requests.get('https://api.serpdog.io/scrape', params=payload)
print (resp.text)
```

{% endtab %}

{% tab title="Java" %}

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

try {
    String url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&wait=4000";
    HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
    connection.setRequestMethod("GET");
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        // Read the response body line by line
        BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String readLine;
        while ((readLine = in.readLine()) != null) {
            response.append(readLine);
        }
        in.close();
        System.out.println(response.toString());
    } else {
        throw new Exception("Error in API call");
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
```

{% endtab %}

{% tab title="Ruby" %}

```ruby
require 'net/http'

params = {
  :api_key => "APIKEY",
  :url => "YOUR_URL",
  :wait => "4000"
}
uri = URI('https://api.serpdog.io/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
```

{% endtab %}

{% tab title="PHP" %}

```php
<?php
$url = "https://api.serpdog.io/scrape?api_key=APIKEY&url=YOUR_URL&wait=4000";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
// Note: these two lines disable SSL certificate verification; avoid them in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
```

{% endtab %}
{% endtabs %}
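The parameters above compose freely, subject to the constraints noted in each section (`wait` bounds, and `wait` requiring `render_js=true`). A hypothetical Python builder that enforces them before a request is sent:

```python
def scrape_params(api_key, url, render_js=True, wait=0, premium=False, country=None):
    """Build a query-parameter dict for /scrape, enforcing documented constraints."""
    if not 0 <= wait <= 35000:
        raise ValueError("wait must be between 0 and 35000 milliseconds")
    if wait > 0 and not render_js:
        raise ValueError("wait requires render_js=true")
    params = {
        "api_key": api_key,
        "url": url,
        "render_js": str(render_js).lower(),
    }
    if wait > 0:
        params["wait"] = str(wait)
    if premium:
        params["premium"] = "true"
    if country is not None:
        params["country"] = country
    return params
```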
