How to Automate Web Scraping with n8n and ScrapingBot API

Automation  ·  10 min read  ·  Published: 07/05/2026


Combining n8n and ScrapingBot gives you the best of both worlds: a visual no-code workflow builder and a battle-tested scraping API that handles JavaScript rendering, rotating IPs, and anti-bot measures. In this guide, you will learn how to connect n8n's HTTP Request node to the ScrapingBot API, handle pagination and errors, and ship production-ready scraping automations — without writing complex infrastructure code.

1. Why combine n8n and ScrapingBot API?

Building a scraping pipeline typically requires two things: a tool to extract data from pages, and a tool to orchestrate what happens with that data. In practice, most developers end up stitching these together manually with custom scripts. n8n and ScrapingBot solve this more cleanly:

| Tool | What it does |
| --- | --- |
| n8n | Visual workflow builder — triggers, branching, batching, scheduling, and integrations with 400+ services |
| ScrapingBot | Scraping API — handles JavaScript rendering, geo-location, anti-bot measures, and rotating IPs |

Together, they let you build a full data pipeline — from scraping a page to storing results in a database, sending a Slack alert, or updating a Google Sheet — all without maintaining brittle infrastructure.

Combining n8n with the ScrapingBot API is particularly powerful for teams that want to automate data collection without writing custom scrapers. Furthermore, n8n's visual interface makes it easy to iterate on and debug each step independently.

2. Prerequisites

Before building your workflow, make sure you have the following ready:

  • An n8n instance — Desktop app, self-hosted, or n8n Cloud
  • Your ScrapingBot username and API key — available in your ScrapingBot dashboard
  • A target URL you want to scrape
💡 Note: ScrapingBot offers free access with 100 credits per month — no payment information required. Sign up at scraping-bot.io to get your credentials.

3. Setting up the ScrapingBot API in n8n with the HTTP Request node

Step 1 — Create your credentials

First, set up a reusable credential in n8n so you don't have to paste your API key into every node:

  1. In n8n, go to Credentials → Add Credential
  2. Select Basic Auth
  3. Name it ScrapingBot API
  4. Set User to your ScrapingBot username
  5. Set Password to your ScrapingBot API key
  6. Click Save
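
Under the hood, this credential simply becomes a standard HTTP Basic Authorization header. As a rough sketch (Node.js, with placeholder values), this is what n8n attaches to each request:

// Placeholder values: substitute your own ScrapingBot credentials
const username = 'YOUR_USERNAME';
const apiKey = 'YOUR_API_KEY';

// Basic Auth is just "username:password", base64-encoded
const authHeader = 'Basic ' + Buffer.from(`${username}:${apiKey}`).toString('base64');
// Sent on each request as: Authorization: Basic <encoded value>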

Step 2 — Configure the HTTP Request node

Next, add an HTTP Request node to your workflow and configure it as follows:

| Field | Value |
| --- | --- |
| HTTP Method | POST |
| URL | https://api.scraping-bot.io/scrape/raw-html |
| Authentication | Basic Auth → select ScrapingBot API |
| Body Content Type | JSON |
| Response Format | JSON |

Step 3 — Set the request body

In the JSON body field, pass the URL you want to scrape along with any options:

{
  "url": "https://example.com/products",
  "options": {
    "premiumProxy": false,
    "country": "us",
    "waitForNetworkIdle": true
  }
}

For dynamic URLs coming from a previous node (for example, a Google Sheets row), use n8n's expression syntax instead:

{
  "url": "{{ $json.url }}",
  "options": {
    "premiumProxy": false
  }
}
💡 Available options: premiumProxy (boolean) enables residential IPs for harder targets. country sets the geo-location (e.g. "fr", "de", "us"). waitForNetworkIdle waits for all JS to finish loading before returning the HTML.
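
If you want to sanity-check your credentials and options outside n8n first, here is a minimal Node.js (18+) sketch of the same request the HTTP Request node sends. The endpoint and body mirror the configuration above, and the credentials are placeholders; save it as an .mjs file and run it with node:

// Minimal reproduction of the HTTP Request node, for debugging outside n8n
const username = 'YOUR_USERNAME'; // placeholder: your ScrapingBot username
const apiKey = 'YOUR_API_KEY';    // placeholder: your ScrapingBot API key

const response = await fetch('https://api.scraping-bot.io/scrape/raw-html', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Basic ' + Buffer.from(`${username}:${apiKey}`).toString('base64'),
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    options: { premiumProxy: false, country: 'us', waitForNetworkIdle: true },
  }),
});

const data = await response.json();
console.log(data.statusCode, data.captchaFound);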

4. Parsing the response

Understanding the response structure

ScrapingBot returns a structured JSON object. The main field you will use is html, which contains the fully rendered page content:

{
  "html": "<html>...rendered page content...</html>",
  "statusCode": 200,
  "captchaFound": false,
  "host": "example.com"
}

Extracting data with the HTML Extract node

After the HTTP Request node, add an HTML Extract node to pull specific data from the response. For example, to extract all product titles from a page:

| Field | CSS Selector | Return Value |
| --- | --- | --- |
| productTitle | h2.product-title | Text |
| productPrice | span.price | Text |
| productUrl | a.product-link | HTML Attribute → href |
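
When the HTML Extract node is not flexible enough, the same extraction can be sketched in a Code node instead. The selector below is the hypothetical h2.product-title from the table above, and regex-based HTML parsing is brittle, so treat this as a fallback rather than a replacement:

// n8n Code node, "Run Once for All Items" mode
// Pulls product titles out of the raw HTML returned by ScrapingBot
const results = [];
for (const item of $input.all()) {
  const html = item.json.html ?? '';
  const re = /<h2 class="product-title"[^>]*>([\s\S]*?)<\/h2>/g;
  for (const match of html.matchAll(re)) {
    results.push({ json: { productTitle: match[1].trim() } });
  }
}
return results;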

Checking for errors before parsing

Always add an IF node after the HTTP Request to check that the scrape succeeded before processing the data:

// Condition in the IF node
{{ $json.statusCode === 200 && $json.captchaFound === false }}

If the condition is false, route that branch to a retry or error handler instead of continuing the workflow.

5. Handling multiple URLs

Recommended workflow structure

When you need to scrape a large list of URLs, batching is essential to avoid overloading the target server and hitting rate limits. Here is the recommended pattern:

| Step | Node | Purpose |
| --- | --- | --- |
| 1 | Trigger (Manual or Cron) | Start the workflow |
| 2 | Google Sheets / Database | Read the list of URLs to scrape |
| 3 | Split In Batches | Process 5–10 URLs at a time |
| 4 | HTTP Request → ScrapingBot | Scrape each URL |
| 5 | Wait | Add a 500–1500ms delay between batches |
| 6 | HTML Extract / Code | Parse the response |
| 7 | Write results | Push to database, sheet, or CRM |

Adding a polite delay

In the Wait node, set a random delay between requests to avoid triggering rate limits:

// In a Code node before the Wait node
// Generate a random delay between 500ms and 1500ms
const delay = Math.floor(Math.random() * 1000) + 500;
return [{ json: { delay } }];

Then in the Wait node, set the duration to {{ $json.delay }} milliseconds. As a result, your workflow behaves more like a human browser and is far less likely to get blocked.

6. Common errors and how to fix them

Even with ScrapingBot handling most protections, errors can still occur. Here is how to handle the most common ones:

| Error | Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Wrong credentials | Double-check your username and API key in the n8n credential |
| 429 Too Many Requests | Rate limit exceeded | Increase the delay between requests or reduce batch size |
| captchaFound: true | CAPTCHA not bypassed | Enable premiumProxy: true in the request options |
| statusCode: 404 | Page no longer exists | Add an IF node to skip 404s and log them separately |
| Empty HTML response | JavaScript not rendered | Set waitForNetworkIdle: true in the options |
| Workflow timeout | Too many URLs in one run | Reduce batch size and add a Wait node between batches |
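
You can also turn this table into routing logic. As a sketch, a Code node can tag each item with an action field for a downstream Switch node; the action names are invented for this example, while the response fields match the structure shown in section 4:

// n8n Code node, "Run Once for All Items" mode
// Maps each ScrapingBot response to a routing decision for a Switch node
return $input.all().map(item => {
  const { statusCode, captchaFound, html } = item.json;
  let action = 'ok';
  if (statusCode === 401) action = 'fix_credentials';
  else if (statusCode === 429) action = 'slow_down';
  else if (captchaFound) action = 'enable_premium_proxy';
  else if (statusCode === 404) action = 'skip';
  else if (!html) action = 'retry_with_network_idle';
  return { json: { ...item.json, action } };
});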

Adding retry logic

For transient errors, add automatic retries using n8n's built-in retry mechanism. In the HTTP Request node settings, enable "Retry on Fail" and set:

  • Max Tries: 3
  • Wait Between Tries: 2000ms

Additionally, for persistent failures, route them to a dedicated error branch that logs the failed URL to a Google Sheet or sends a Slack notification for manual review.

7. Production recipes

Once your basic workflow is working, here are three ready-to-ship automations you can build today:

Price monitor

Track product prices and get alerted when they change:

  1. Cron trigger — run every hour
  2. HTTP Request → ScrapingBot scrapes the product page
  3. HTML Extract — pulls the current price
  4. IF node — compares with the last stored price (a comparison sketch follows this list)
  5. Slack / Email node — sends an alert if the price changed
  6. Google Sheets — updates the stored price
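
Here is a minimal sketch of that comparison, done in a Code node placed just before the IF node. The field names productPrice and lastPrice are assumptions; adapt them to your HTML Extract output and Google Sheets columns:

// n8n Code node, "Run Once for All Items" mode
// Normalizes price strings to numbers and flags changes for the IF node
// Assumes dot-decimal prices; adapt the parsing for other locales
return $input.all().map(item => {
  const toNumber = s => parseFloat(String(s).replace(/[^0-9.]/g, ''));
  const current = toNumber(item.json.productPrice);
  const previous = toNumber(item.json.lastPrice);
  return { json: { ...item.json, current, previous, changed: current !== previous } };
});

The IF node in step 4 can then simply test {{ $json.changed }}.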

Lead capture pipeline

Turn a list of company pages into enriched CRM records:

  1. Google Sheets — reads a list of company LinkedIn or website URLs
  2. Split In Batches — processes 5 URLs at a time
  3. HTTP Request → ScrapingBot scrapes each page
  4. Code node — extracts name, email, phone, address (see the sketch after this list)
  5. HubSpot / Salesforce node — creates or updates the contact record
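
As a rough sketch of the Code node in step 4: the regexes below are generic assumptions that catch common email and phone formats, and real pages usually need per-site selectors instead:

// n8n Code node, "Run Once for All Items" mode
// Naive regex extraction of contact details from the scraped HTML
return $input.all().map(item => {
  const html = item.json.html ?? '';
  const email = (html.match(/[\w.+-]+@[\w-]+\.[\w.-]+/) || [null])[0];
  const phone = (html.match(/\+?\d[\d\s().-]{7,}\d/) || [null])[0];
  return { json: { source: item.json.host, email, phone } };
});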

SEO audit

Audit your entire site for missing titles, broken H1s, and status codes:

  1. HTTP Request — fetches your sitemap.xml
  2. XML node — extracts all URLs from the sitemap
  3. Split In Batches — processes pages in groups of 10
  4. HTTP Request → ScrapingBot scrapes each page
  5. HTML Extract — pulls title, H1, meta description, status code (a Code-node variant is sketched below)
  6. Google Sheets — exports the full audit as a CSV-ready spreadsheet
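
And a minimal sketch of the checks in step 5, flagging a missing <title> and any page that does not have exactly one <h1>. The output field names are assumptions chosen to match typical spreadsheet columns:

// n8n Code node, "Run Once for All Items" mode
// Basic on-page checks over the raw HTML from ScrapingBot
return $input.all().map(item => {
  const html = item.json.html ?? '';
  const title = (html.match(/<title[^>]*>([\s\S]*?)<\/title>/i) || [null, ''])[1].trim();
  const h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  return {
    json: {
      statusCode: item.json.statusCode,
      title,
      missingTitle: title === '',
      brokenH1: h1Count !== 1,
    },
  };
});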

Ready to automate your scraping workflows? Get 100 free credits when you sign up for ScrapingBot — no credit card required.

Try ScrapingBot for free →
