Automate Web Scraping in n8n with the ScrapingBot API
Combining n8n and ScrapingBot gives you the best of both worlds: a visual no-code workflow builder and a battle-tested scraping API that handles JavaScript rendering, rotating IPs, and anti-bot measures. In this guide, you will learn how to connect n8n's HTTP Request node to the ScrapingBot API, handle pagination and errors, and ship production-ready scraping automations — without writing complex infrastructure code.
Table of contents
1. Why combine n8n and ScrapingBot API?
2. Prerequisites
3. Setting up n8n ScrapingBot API with HTTP Request node
4. Parsing the response
5. Handling multiple URLs
6. Common errors and how to fix them
7. Production recipes
1. Why combine n8n and ScrapingBot API?
Building a scraping pipeline typically requires two things: a tool to extract data from pages, and a tool to orchestrate what happens with that data. In practice, most developers end up stitching these together manually with custom scripts. n8n and ScrapingBot solve this more cleanly:
| Tool | What it does |
|---|---|
| n8n | Visual workflow builder — triggers, branching, batching, scheduling, and integrations with 400+ services |
| ScrapingBot | Scraping API — handles JavaScript rendering, geo-location, anti-bot measures, and rotating IPs |
Together, they let you build a full data pipeline — from scraping a page to storing results in a database, sending a Slack alert, or updating a Google Sheet — all without maintaining brittle infrastructure.
The n8n + ScrapingBot API combination is particularly powerful for teams that want to automate data collection without writing custom scrapers, and n8n's visual interface makes it easy to iterate on and debug each step independently.
2. Prerequisites
Before building your workflow, make sure you have the following ready:
- An n8n instance — Desktop app, self-hosted, or n8n Cloud
- Your ScrapingBot username and API key — available in your ScrapingBot dashboard
- A target URL you want to scrape
3. Setting up n8n ScrapingBot API with HTTP Request node
Step 1 — Create your credentials
First, set up a reusable credential in n8n so you don't have to paste your API key into every node:
- In n8n, go to Credentials → Add Credential
- Select Basic Auth
- Name it ScrapingBot API
- Set User to your ScrapingBot username
- Set Password to your ScrapingBot API key
- Click Save
Step 2 — Configure the HTTP Request node
Next, add an HTTP Request node to your workflow and configure it as follows:
| Field | Value |
|---|---|
| HTTP Method | POST |
| URL | https://api.scraping-bot.io/scrape/raw-html |
| Authentication | Basic Auth → select ScrapingBot API |
| Body Content Type | JSON |
| Response Format | JSON |
Step 3 — Set the request body
In the JSON body field, pass the URL you want to scrape along with any options:
```json
{
  "url": "https://example.com/products",
  "options": {
    "premiumProxy": false,
    "country": "us",
    "waitForNetworkIdle": true
  }
}
```
For dynamic URLs coming from a previous node (for example, a Google Sheets row), use n8n's expression syntax instead:
```json
{
  "url": "{{ $json.url }}",
  "options": {
    "premiumProxy": false
  }
}
```
- premiumProxy (boolean): enables residential IPs for harder targets
- country: sets the geo-location (e.g. "fr", "de", "us")
- waitForNetworkIdle: waits for all JavaScript to finish loading before the HTML is returned
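Before wiring everything together, it can help to sanity-check your credentials and request body outside n8n. Here is a minimal sketch using Node 18+ and its built-in fetch; the username and API key are placeholders for your own values:

```javascript
// Standalone sanity check of the ScrapingBot request outside n8n.
// Run with Node 18+ (built-in fetch), e.g. `node check.mjs`.
const USERNAME = "your-scrapingbot-username"; // placeholder
const API_KEY = "your-scrapingbot-api-key";   // placeholder

const response = await fetch("https://api.scraping-bot.io/scrape/raw-html", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Same Basic Auth as the n8n credential: base64 of "username:apiKey"
    Authorization: "Basic " + Buffer.from(`${USERNAME}:${API_KEY}`).toString("base64"),
  },
  body: JSON.stringify({
    url: "https://example.com/products",
    options: { premiumProxy: false, country: "us", waitForNetworkIdle: true },
  }),
});

const data = await response.json();
console.log(response.status, data.html ? data.html.slice(0, 200) : data);
```

If this prints a 200 status and the start of the rendered HTML, the same credentials and body will work in the HTTP Request node.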
4. Parsing the response
Understanding the response structure
ScrapingBot returns a structured JSON object. The main field you will use is html, which contains the fully rendered page content:
```json
{
  "html": "<html>...rendered page content...</html>",
  "statusCode": 200,
  "captchaFound": false,
  "host": "example.com"
}
```
Extracting data with the HTML Extract node
After the HTTP Request node, add an HTML Extract node to pull specific data from the response. For example, to extract all product titles from a page:
| Field | CSS Selector | Return Value |
|---|---|---|
| productTitle | h2.product-title | Text |
| productPrice | span.price | Text |
| productUrl | a.product-link | HTML Attribute → href |
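If you prefer code over point-and-click selectors, a Code node can do the same extraction. Here is a sketch assuming a self-hosted n8n where cheerio has been whitelisted as an external module (NODE_FUNCTION_ALLOW_EXTERNAL=cheerio); the selectors and the wrapping div structure are illustrative, matching the table above:

```javascript
// n8n Code node (Run Once for All Items).
// Assumes cheerio is whitelisted via NODE_FUNCTION_ALLOW_EXTERNAL=cheerio (self-hosted only).
const cheerio = require('cheerio');

const results = [];
for (const item of $input.all()) {
  const $ = cheerio.load(item.json.html ?? '');
  $('h2.product-title').each((_, el) => {
    const card = $(el).closest('div'); // illustrative: assumes one wrapping div per product
    results.push({
      json: {
        productTitle: $(el).text().trim(),
        productPrice: card.find('span.price').text().trim(),
        productUrl: card.find('a.product-link').attr('href') ?? null,
      },
    });
  });
}
return results;
```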
Checking for errors before parsing
Always add an IF node after the HTTP Request to check that the scrape succeeded before processing the data:
```
// Condition in the IF node
{{ $json.statusCode === 200 && $json.captchaFound === false }}
```
If the condition is false, route that branch to a retry or error handler instead of continuing the workflow.
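If you want both branches to carry the failure context, one option is a small Code node before the IF that tags each item; a sketch:

```javascript
// n8n Code node: tag each scrape result so the IF node can branch on a single flag.
return $input.all().map((item) => ({
  json: {
    ...item.json,
    scrapeOk: item.json.statusCode === 200 && item.json.captchaFound === false,
  },
}));
```

The IF condition then becomes simply {{ $json.scrapeOk }}.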
5. Handling multiple URLs
Recommended workflow structure
When you need to scrape a large list of URLs, batching is essential to avoid overloading the target server and hitting rate limits. Here is the recommended pattern:
| Step | Node | Purpose |
|---|---|---|
| 1 | Trigger (Manual or Cron) | Start the workflow |
| 2 | Google Sheets / Database | Read the list of URLs to scrape |
| 3 | Split In Batches | Process 5–10 URLs at a time |
| 4 | HTTP Request → ScrapingBot | Scrape each URL |
| 5 | Wait | Add a 500–1500ms delay between batches |
| 6 | HTML Extract / Code | Parse the response |
| 7 | Write results | Push to database, sheet, or CRM |
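If the sheet or database isn't wired up yet, you can stub step 2 with a Code node that emits one item per URL; a sketch with placeholder URLs:

```javascript
// n8n Code node standing in for the Google Sheets step while you test.
// Each returned item becomes one input to the Split In Batches node.
const urls = [
  'https://example.com/products?page=1', // placeholders
  'https://example.com/products?page=2',
  'https://example.com/products?page=3',
];
return urls.map((url) => ({ json: { url } }));
```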
Adding a polite delay
In the Wait node, set a random delay between requests to avoid triggering rate limits:
```javascript
// In a Code node before the Wait node:
// generate a random delay between 500ms and 1500ms.
const delay = Math.floor(Math.random() * 1000) + 500;
return [{ json: { delay } }];
```
Then in the Wait node, set the duration to {{ $json.delay }} milliseconds. As a result, your workflow behaves more like a human browser and is far less likely to get blocked.
6. Common errors and how to fix them
Even with ScrapingBot handling most protections, errors can still occur. Here is how to handle the most common ones:
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Wrong credentials | Double-check your username and API key in the n8n credential |
| 429 Too Many Requests | Rate limit exceeded | Increase the delay between requests or reduce batch size |
| captchaFound: true | CAPTCHA not bypassed | Enable premiumProxy: true in the request options |
| statusCode: 404 | Page no longer exists | Add an IF node to skip 404s and log them separately |
| Empty HTML response | JavaScript not rendered | Set waitForNetworkIdle: true in the options |
| Workflow timeout | Too many URLs in one run | Reduce batch size and add a Wait node between batches |
Adding retry logic
For transient errors, add automatic retries using n8n's built-in retry mechanism. In the HTTP Request node settings, enable "Retry on Fail" and set:
- Max Tries: 3
- Wait Between Tries: 2000ms
Additionally, for persistent failures, route them to a dedicated error branch that logs the failed URL to a Google Sheet or sends a Slack notification for manual review.
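On that error branch, a small Code node can shape each failure into a log-ready row before the Google Sheets or Slack node; the field names below are assumptions about what your workflow passes through:

```javascript
// n8n Code node on the error branch: one log row per failed scrape.
return $input.all().map((item) => ({
  json: {
    failedUrl: item.json.url ?? 'unknown',        // assumes the URL was kept on the item
    statusCode: item.json.statusCode ?? null,
    captchaFound: item.json.captchaFound ?? false,
    failedAt: new Date().toISOString(),
  },
}));
```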
7. Production recipes
Once your basic workflow is working, here are three ready-to-ship automations you can build today:
Price monitor
Track product prices and get alerted when they change:
- Cron trigger — run every hour
- HTTP Request → ScrapingBot scrapes the product page
- HTML Extract — pulls the current price
- IF node — compares with the last stored price (see the sketch after this list)
- Slack / Email node — sends an alert if the price changed
- Google Sheets — updates the stored price
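The comparison in step 4 is the only nontrivial logic. Here is a sketch of a Code node placed just before the IF; the productPrice and lastPrice field names are assumptions about what the HTML Extract and Google Sheets nodes return:

```javascript
// n8n Code node: normalize the scraped price and compare it with the stored one.
return $input.all().map((item) => {
  // Strip currency symbols and separators, e.g. "$1,299.00" -> 1299
  const currentPrice = parseFloat(String(item.json.productPrice).replace(/[^0-9.]/g, ''));
  const lastPrice = parseFloat(item.json.lastPrice);
  return {
    json: {
      ...item.json,
      currentPrice,
      priceChanged: !Number.isNaN(lastPrice) && currentPrice !== lastPrice,
    },
  };
});
```

The IF node condition is then just {{ $json.priceChanged }}.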
Lead capture pipeline
Turn a list of company pages into enriched CRM records:
- Google Sheets — reads a list of company LinkedIn or website URLs
- Split In Batches — processes 5 URLs at a time
- HTTP Request → ScrapingBot scrapes each page
- Code node — extracts name, email, phone, address (a partial sketch follows this list)
- HubSpot / Salesforce node — creates or updates the contact record
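For step 4, here is a deliberately naive sketch covering just email and phone; real pages almost always need tuned selectors or regexes:

```javascript
// n8n Code node: naive contact extraction from the raw HTML.
// These regexes are illustrations only; adjust them for your real targets.
return $input.all().map((item) => {
  const html = item.json.html ?? '';
  const email = html.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/)?.[0] ?? null;
  const phone = html.match(/\+?\d[\d\s().-]{7,}\d/)?.[0] ?? null;
  return { json: { ...item.json, email, phone } };
});
```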
SEO audit
Audit your entire site for missing titles, broken H1s, and status codes:
- HTTP Request — fetches your sitemap.xml
- XML node — extracts all URLs from the sitemap
- Split In Batches — processes pages in groups of 10
- HTTP Request → ScrapingBot scrapes each page
- HTML Extract — pulls the title, H1, and meta description (the status code comes from the HTTP response itself); see the sketch after this list
- Google Sheets — exports the full audit as a CSV-ready spreadsheet
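As an alternative to the HTML Extract step, a Code node can compute the audit flags directly. A sketch, again assuming cheerio is whitelisted as an external module; the output field names are illustrative:

```javascript
// n8n Code node: derive audit flags for each scraped page.
// Assumes NODE_FUNCTION_ALLOW_EXTERNAL=cheerio on a self-hosted instance.
const cheerio = require('cheerio');

return $input.all().map((item) => {
  const $ = cheerio.load(item.json.html ?? '');
  const title = $('title').first().text().trim();
  const h1Count = $('h1').length;
  return {
    json: {
      url: item.json.url ?? null,
      title: title || null,
      missingTitle: title === '',
      h1Count,
      brokenH1: h1Count !== 1, // zero or duplicate H1s both count as broken
      metaDescription: $('meta[name="description"]').attr('content') ?? null,
    },
  };
});
```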
Ready to automate your scraping workflows? Get 100 free credits when you sign up for ScrapingBot — no credit card required.
Try ScrapingBot for free →