Is Web Scraping Legal? A 2026 Guide for B2B Data and RevOps

Scraping public data is generally legal in the US and Canada, but the answer turns on three things: what you scrape, how you scrape, and whether you ignore a site's Terms of Service. In the US, over 60% of disputes are resolved on ToS grounds and 90% of legal actions now stem from ToS breaches or copyright infringement rather than unauthorised access claims.

If you run RevOps, this is probably already your problem. Salesforce records are incomplete, HubSpot lists are messy, reps want better account intelligence, and someone on the team has suggested using Clay, ZoomInfo, or a custom scraper to fill the gaps. The data looks public. The legal risk is harder to see.

That's where most B2B teams get tripped up. They ask, “Is web scraping legal?” as if there's a universal yes or no. There isn't. There's a workable operating model, though. If you're sourcing public firmographic data, respecting site rules, avoiding personal data where you lack a lawful basis, and using responsible collection methods, you can reduce risk materially and still improve enrichment coverage across Salesforce Sales Cloud, Account Engagement, Service Cloud, Revenue Cloud, and HubSpot Sales and Marketing Hubs.

The RevOps Dilemma: Scraping Data vs Staying Compliant

It usually starts with a pipeline problem. Salesforce account records are missing industry and employee count. HubSpot lists are too messy to route cleanly. Your team opens Clay, looks at public company pages, directories, and LinkedIn profiles, and asks the obvious question: can we just pull what we need and load it into the CRM?

You can. You just cannot treat that as a free data grab.

For a RevOps leader, the core issue is operational control. The CFAA and precedent like the hiQ case matter, but only because they affect everyday GTM engineering decisions: what your enrichment workflow touches, which fields flow into Salesforce and HubSpot, and whether your team is collecting company data or pulling personal data into systems that trigger privacy and consent obligations.

What RevOps leaders should evaluate

Set a rule before you set up the workflow. Every scraped or enriched field should have a documented source, collection method, owner, and approved use case.

Focus on three checks:

Access method: Collect only from public pages. Stay out of logins, paywalls, CAPTCHAs, gated directories, and any setup that requires bypassing a control.
Data type: Prioritise firmographic and company-level data. Treat named-person data, emails, phone numbers, and profile details as higher risk and subject to tighter review.
Site rules: Check the site's Terms of Service and technical restrictions before you automate anything. If the site says no bots or blocks automated collection, do not force it.

Operating rule: If your team cannot explain where a field came from, how it was collected, and why it is permitted in your CRM, keep it out of Salesforce and HubSpot.
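
The rule above can be sketched as a small provenance record plus a load check. This is a minimal illustration, not a Salesforce or HubSpot feature: the `FieldProvenance` class, its field names, and the method labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FieldProvenance:
    """Provenance for one enriched CRM field (all names are illustrative)."""
    field_name: str
    source_url: str         # where the value was found
    collection_method: str  # e.g. "official_api", "vendor", "public_scrape"
    owner: str              # person or team accountable for the field
    approved_use: str       # e.g. "routing", "icp_scoring"
    collected_on: date


def missing_provenance(p: FieldProvenance) -> list[str]:
    """Return the names of required provenance items that are blank."""
    required = {
        "source_url": p.source_url,
        "collection_method": p.collection_method,
        "owner": p.owner,
        "approved_use": p.approved_use,
    }
    return [name for name, value in required.items() if not value.strip()]


def can_load_into_crm(p: FieldProvenance) -> bool:
    """Apply the operating rule: incomplete provenance means no CRM load."""
    return not missing_provenance(p)
```

In practice a check like this would sit in front of whatever job writes to the CRM, so a field with a blank source or no approved use never reaches routing or scoring.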

Legal theory manifests as a systems design issue. A bad enrichment decision does not stay isolated inside Clay or a one-off scraping job. It spreads into lead scoring, routing, segmentation, outreach, reporting, and retention workflows. Once questionable data lands in core systems, cleanup gets expensive fast.

The right operating model

Treat scraping as one input inside a controlled enrichment program. Put first-party forms, partner data, official APIs, and approved vendors above scraping in your source hierarchy. Use scraping selectively, with approval rules and auditability built in.

Clay can help you operationalise that model. It is useful for orchestrating enrichment steps, standardising outputs, and reducing the chaos of ad hoc scripts. It does not make a risky collection method compliant. Your policy, source rules, and CRM governance do that.

The practical standard is simple. If a workflow saves time but creates unclear rights to collect, store, or use the data, it is a bad workflow. RevOps owns that decision.

Understanding the Core Legal Landscape

Your team enriches an account in Clay, pushes the record into Salesforce, syncs it to HubSpot, and triggers outbound. The legal question is not abstract at that point. It sits inside field mapping, source logging, and the rules you set for what enters your GTM systems.

The main legal split is straightforward. The Computer Fraud and Abuse Act (CFAA) focuses on access. Contract claims focus on whether your team ignored a site's Terms of Service. For RevOps, that distinction matters because a workflow can avoid one problem and still create another.

In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly available data likely did not violate the CFAA. That ruling changed how teams should assess public web data. It did not give RevOps a free pass to collect whatever a bot can reach, push it into Salesforce, and call the process compliant.

Public access lowers one category of risk. It does not remove contractual, copyright, or privacy risk.

That is the practical lesson for B2B marketing ops. If a source site bars automated collection, rate-limits bots, or restricts reuse of its content, your exposure shifts from anti-hacking arguments to breach of contract and related claims. In day-to-day operations, that is usually the primary concern. Your CRM does not care whether the bad data came from a Python script, a contractor, or an enrichment vendor. Once it lands in routing logic, sequences, and reports, the business owns the consequences.

What hiQ changes for RevOps teams

hiQ gives your team a narrower rule than many blog posts suggest. It says public pages are harder to treat as "unauthorized access" under the CFAA. It does not say public pages are open for unrestricted commercial extraction.

Use that rule operationally:

Public page access is not enough. Review site terms before you automate collection.
Collection method matters. Blocking controls, login walls, and technical restrictions increase risk fast.
Downstream use matters. Reusing scraped material inside enrichment, outreach, or profiling workflows can create separate exposure.

If you want a lower-risk data foundation, prioritise a first-party data strategy for B2B marketing operations over scraped third-party inputs whenever possible.

What this looks like inside Salesforce, HubSpot, and Clay

Bad legal decisions usually show up as bad systems design.

A risky setup has familiar symptoms. A Clay table pulls records from unclear sources. Salesforce fields have no source metadata. HubSpot lists inherit personal data without any review of collection rights. No one can tell whether a phone number came from an official company page, a prohibited directory scrape, or a reseller with weak documentation.

A controlled setup does the opposite. It records source URLs, labels collection methods, separates company-level fields from person-level fields, and blocks sync for records that fail policy checks. That discipline also makes deletion and suppression easier if a person objects to collection or asks to be removed. Teams handling those requests should review ContentRemoval.com's data removal advice as part of their response process.
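
A controlled setup of this kind can be sketched as a pre-sync gate. The record shape, the `PERSON_LEVEL_FIELDS` and `APPROVED_METHODS` sets, and the `person_data_reviewed` flag are hypothetical; your own field names and review process will differ.

```python
# Hypothetical field and method lists; replace with your own policy.
PERSON_LEVEL_FIELDS = frozenset({"full_name", "email", "direct_phone", "job_title"})
APPROVED_METHODS = frozenset({"official_api", "approved_vendor", "public_scrape"})


def split_fields(fields: dict) -> tuple[dict, dict]:
    """Separate company-level fields from person-level fields."""
    company = {k: v for k, v in fields.items() if k not in PERSON_LEVEL_FIELDS}
    person = {k: v for k, v in fields.items() if k in PERSON_LEVEL_FIELDS}
    return company, person


def sync_blockers(record: dict) -> list[str]:
    """Return the reasons a record must not sync; an empty list means it may."""
    reasons = []
    if not record.get("source_url"):
        reasons.append("missing source_url")
    if record.get("collection_method") not in APPROVED_METHODS:
        reasons.append("unapproved collection_method")
    _, person = split_fields(record.get("fields", {}))
    if person and not record.get("person_data_reviewed", False):
        reasons.append("person-level fields not reviewed")
    return reasons
```

Returning the reasons rather than a bare true or false keeps the block auditable: the same list can be written to a log or a review queue.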

The standard to apply

Treat “publicly accessible” as a starting point for review, not a green light.

If your RevOps team cannot show all four of these items for a scraped field, do not load it into Salesforce or HubSpot:

the exact source
the collection method
the site rules that apply
the approved business use

That is how legal precedent becomes operational policy. The hiQ decision shapes the access analysis. Your actual risk lives in source selection, ToS review, field governance, and how aggressively your enrichment pipeline turns raw web data into revenue actions.

Navigating Privacy Regulations and Personal Data

A company name is one thing. A named individual's contact details are another.

That distinction matters because privacy law cares less about your growth targets than your data handling. In California, scraping legality isn't just about access. It's about whether you're collecting personal data, whether you can justify processing it, and whether your workflow respects the limits built into CCPA and CPRA.

In 2022, California's Attorney General took action against a data broker for scraping 100 million personal records without consent. Data cited for California also shows that 70% of scraping activities in the state are deemed legal only if they exclude personal data and respect ToS, while 55% of disputes involve extraction of personal data or copyright-protected content.

A useful risk spectrum for RevOps

Not all enrichment fields carry the same legal profile. Use this simple operational lens:

Data type	Risk level	Why it matters
Public firmographic data	Lower	Company-level data is generally less sensitive if collected from public sources and handled within site rules
Public organisational data	Medium	Team pages, role descriptions, and office details can create issues if tied back to identifiable individuals
Personal contact data	Higher	Names, emails, job titles, and direct dials increase privacy obligations and downstream compliance risk
Sensitive personal data	Highest	Sensitive categories create the strongest regulatory exposure and should stay out of routine scraping workflows

The mistake B2B teams keep making

Teams often treat “work email” or “job title” as harmless because the context is commercial. Regulators don't care about your pipeline logic. If the data identifies a person, you need a defensible processing basis and clear handling rules.

That's why I push RevOps teams to separate account enrichment from contact enrichment. If your immediate goal is segmentation, routing, territory design, or ICP scoring, you can often get most of the value from firmographic fields alone. If you want contact-level enrichment, your standards must get tighter.

Minimise first: Only collect fields you can actively use in Salesforce or HubSpot.
Classify before sync: Tag personal fields separately from firmographic ones.
Review retention: Don't keep scraped personal data indefinitely just because your CRM can store it.
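
The three habits above (minimise, classify, review retention) could be expressed roughly as follows. The tier names mirror the risk spectrum table, but the field-to-tier mapping and the retention periods are placeholder assumptions, not legal guidance. Sensitive data is deliberately absent because it should not enter the workflow at all.

```python
from datetime import date, timedelta

# Illustrative mapping of CRM fields to the risk tiers in the table above.
RISK_TIER = {
    "industry": "lower",
    "employee_count": "lower",
    "office_location": "medium",
    "team_page_role": "medium",
    "full_name": "higher",
    "email": "higher",
    "job_title": "higher",
    "direct_phone": "higher",
}

# Placeholder retention periods in days; None means no automatic expiry.
RETENTION_DAYS = {"lower": None, "medium": 365, "higher": 180}


def classify(field_name: str) -> str:
    """Unknown fields default to the stricter tier until someone reviews them."""
    return RISK_TIER.get(field_name, "higher")


def minimise(fields: dict, fields_in_use: set) -> dict:
    """Keep only the fields the CRM actively uses."""
    return {k: v for k, v in fields.items() if k in fields_in_use}


def is_past_retention(field_name: str, collected_on: date, today: date) -> bool:
    """True when a field has been held longer than its tier allows."""
    limit = RETENTION_DAYS[classify(field_name)]
    return limit is not None and today - collected_on > timedelta(days=limit)
```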

If your campaign can run on account-level data, don't scrape personal data just because it's available.

This is also where a first-party data strategy becomes more valuable than aggressive third-party collection. If you're trying to reduce dependence on scraped personal data, this guide to a first-party data strategy is a smarter long-term play than forcing more risky enrichment into your CRM.

On the consumer side, people are getting more proactive about limiting exposure on data collection sites. That's one reason RevOps teams should understand the privacy mindset behind ContentRemoval.com's data removal advice. If individuals are actively trying to remove their information from circulation, your sourcing model should assume higher scrutiny, not lower.

International Considerations: A Canadian Perspective

Canada is more straightforward in one respect. The key issue is lawful access.

Under PIPEDA, scraping publicly available data can be permissible. But the line gets sharp fast when a team moves beyond open pages and starts collecting from authenticated systems, gated databases, or blocked endpoints.

A 2023 analysis by the Office of the Privacy Commissioner of Canada (OPC) found that scraping data behind a login is unauthorised collection in Canada. It also established an important distinction: violating Terms of Service is typically a civil matter, while circumventing technical barriers can trigger criminal charges under Section 342.1 of the Criminal Code, the closest Canadian counterpart to the CFAA.

What Canadian GTM teams should treat as off-limits

If your workflow touches Canadian sources, stop treating all automation as equal. These activities create very different risk profiles:

Public static pages: Lower risk if your collection is respectful and transparent.
Logged-in environments: High risk because consent and authorised access become immediate issues.
Blocked or challenged pages: High risk because bypassing barriers changes the legal character of the activity.

That's why your GTM engineering team shouldn't build “clever” workarounds for CAPTCHAs, IP blocking, or gated business directories. The technical accomplishment isn't worth the legal downside.

The practical Canadian standard

The safest Canadian operating posture is narrow and disciplined:

Scrape only publicly available pages.
Avoid collecting PII unless you have a clear lawful basis.
Don't bypass controls. Not logins, not challenges, not blocks.
Make bot behaviour transparent.
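
One way to encode "don't bypass controls" is to have the collector recognise a barrier and stop, rather than retry or route around it. The sketch below uses simple heuristics; the status codes, path hints, and challenge markers are assumptions and will not catch every control a site may use.

```python
from typing import Optional
from urllib.parse import urlparse

# Heuristic signals that a site is refusing or gating automated access.
BLOCK_STATUSES = {401, 403, 407, 429}
LOGIN_PATH_HINTS = ("/login", "/signin", "/sign-in", "/auth")
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def access_barrier(status_code: int, final_url: str, body: str = "") -> Optional[str]:
    """Return a short label if the response looks like a barrier, else None.

    A non-None result means: stop collecting from this source and escalate.
    It never means: try again with a different IP, header, or account.
    """
    if status_code in BLOCK_STATUSES:
        return f"http_{status_code}"
    path = urlparse(final_url).path.lower()
    if any(path.startswith(hint) for hint in LOGIN_PATH_HINTS):
        return "login_redirect"
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge_page"
    return None
```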

If your data programme spans multiple regions, broad privacy awareness matters beyond Canada alone. For teams monitoring cross-border requirements, expert insights on global privacy laws can help frame how local rules differ before you expand a sourcing workflow internationally.

A Canadian-safe scraping process isn't just “less aggressive”. It's explicitly designed around lawful access.

Practical Risk Mitigation for Your RevOps Team

Most legal exposure in scraping doesn't come from one dramatic mistake. It comes from preventable operational sloppiness.

That's good news for RevOps managers because operations can fix sloppiness. You don't need to become a lawyer. You need guardrails your team can enforce inside enrichment jobs, sync logic, and workflow automation.

Technical benchmarks from Canadian rulings and OPC compliance reports show that 85% of disputes arise from ToS breaches combined with excessive request rates above 5 requests per second. The operational best practices are clear: cap collection at 1 request per second, respect robots.txt, and identify your bot with an explicit user-agent string to reduce interference and liability.

The non-negotiable checklist

Use this as your baseline policy for any scraping or scraping-adjacent enrichment process.

Prefer official APIs first: If Salesforce AppExchange, HubSpot integrations, or vendor APIs can provide the field, use them. APIs create cleaner permissions, better reliability, and clearer vendor accountability.
Respect robots.txt: It's not just etiquette. It shows your team made a deliberate effort to honour site-level access preferences.
Use a clear user-agent: Anonymous scraping looks evasive. Transparent identification supports a good-faith posture.
Throttle aggressively: Keep collection conservative. If your process needs volume, use more time, not more pressure.
Exclude restricted paths: Product dashboards, search interfaces, member areas, and login-dependent pages should be blocked in your collection rules by default.
Log provenance: Every scraped field entering Salesforce or HubSpot should have a traceable source and collection date.
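
Three of these controls (robots.txt, a clear user-agent, and throttling) can be sketched with Python's standard library. The `PoliteGate` class and the user-agent string are hypothetical. The robots.txt text is passed in as a string so the example stays self-contained; a real job would fetch it from the site first.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical identifier; use a name and contact URL your team controls.
USER_AGENT = "ExampleCoEnrichmentBot/1.0 (+https://example.com/bot-info)"
MIN_INTERVAL_SECONDS = 1.0  # at most one request per second


class PoliteGate:
    """Checks robots.txt rules and spaces requests at least min_interval apart."""

    def __init__(self, robots_txt, min_interval=MIN_INTERVAL_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self._robots = RobotFileParser()
        self._robots.parse(robots_txt.splitlines())
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user-agent to fetch the URL."""
        return self._robots.can_fetch(USER_AGENT, url)

    def wait_turn(self) -> float:
        """Sleep until the next request is permitted; return seconds waited."""
        waited = 0.0
        if self._last_request is not None:
            remaining = self._min_interval - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request = self._clock()
        return waited
```

A collector would call `allowed(url)` before every fetch, skip the URL on a false result, and call `wait_turn()` immediately before sending the request with `USER_AGENT` in its headers. Restricted paths such as member areas can be covered the same way with an internal deny list, even where a site's robots.txt is silent.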

Why security and compliance have to work together

Scraping controls shouldn't sit in a legal memo no one reads. They belong in your technical design.

If your team is automating enrichment across multiple systems, fold these controls into the same governance model you use for integration hardening. This overview of API security best practices is useful because the same principle applies: the more automated the flow, the more intentional your controls need to be.

Teams that need a wider operational framework can also look at UTMStack compliance management for examples of how compliance controls can be formalised rather than left to manual judgement.

Operational test: If a new RevOps hire can't tell whether a source is approved, your policy is too vague.

Building Compliant Enrichment Pipelines with Modern Tools

The best enrichment strategy isn't “scrape everything.” It's to sequence your sources by risk.

That's why I recommend Clay for B2B teams that need flexible enrichment. Clay is a strong tool for scraping and data orchestration because it lets you build structured workflows across APIs, approved vendors, and public web research instead of relying on ad hoc browser hacks.

A sensible enrichment waterfall

A compliant pipeline should follow an order like this:

First source: Official API or native integration.
Second source: Approved vendor dataset.
Third source: Manual verification or analyst review for high-value accounts.
Last resort: Responsible scrape of public, low-risk firmographic data from the company's own public web presence.
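
The ordering above can be sketched as a simple waterfall that stops at the first source returning data and records which source won. This is a generic illustration rather than how Clay implements waterfalls; the source names and lookup callables are placeholders you would wire to real integrations, and the manual-review step would typically be a lookup into a queue of analyst-verified values.

```python
from typing import Callable, Optional

# A source is a (name, lookup) pair; lookup returns a dict of fields or None.
Source = tuple[str, Callable[[str], Optional[dict]]]


def enrich(domain: str, sources: list[Source]) -> Optional[dict]:
    """Try sources in priority order and return the first non-empty result.

    The winning source name is stored alongside the fields so provenance
    survives the trip into the CRM.
    """
    for name, lookup in sources:
        result = lookup(domain)
        if result:
            return {"source": name, **result}
    return None
```

Because the list order is the policy, moving scraping from last to first would be a visible, reviewable change rather than a quiet tweak inside a script.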

That model does two things. It reduces legal exposure, and it improves data quality. Public scraping is often messy when used as a first-line source. It works far better as a fallback for missing account-level fields than as the backbone of contact enrichment.

How this fits Salesforce and HubSpot

In Salesforce Sales Cloud, this approach keeps enrichment tied to account governance, duplicate rules, field-level permissions, and source attribution. In HubSpot, it helps you avoid uncontrolled property sprawl and keeps lists, workflows, and lead scoring from depending on questionable data.

If your team is still treating enrichment as a one-off append job, fix that first. Enrichment should be an orchestrated system with source ranking, confidence logic, and suppression rules. That's the difference between operational maturity and data debt.

For teams refining their sourcing design, this explanation of what enrichment means in practice is useful because it frames enrichment as a governed revenue operation, not just a data grab.

Your Go-Forward Plan for Legal Data Sourcing

If you've been asking “Is Web Scraping Legal?”, the honest business answer is this: sometimes, yes. Carelessly, no.

Public, non-personal firmographic data is the lowest-risk category. Personal data raises the stakes fast. Logged-in sources, blocked pages, and aggressive request patterns create avoidable exposure. Your biggest risk usually isn't anti-hacking law, but poor operational discipline around site rules, privacy obligations, and data governance.

The policy your organisation should adopt now

Every B2B company doing enrichment should create a Data Sourcing Policy. Not eventually. Now.

That policy should answer five questions in plain language:

Approved sources: Which websites, vendors, and APIs are allowed?
Approved data classes: Which fields are allowed, restricted, or prohibited?
Approved methods: When can the team use APIs, vendors, manual research, or scraping?
Technical controls: What are the rate limits, user-agent rules, robots.txt rules, and logging requirements?
Review ownership: Who signs off when a new source or workflow gets added?
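
A policy like this is easier to enforce when it also exists as data the pipeline can read. The sketch below shows one possible shape. Every hostname, field name, method label, and contact in it is an invented placeholder.

```python
from urllib.parse import urlparse

# Placeholder policy; each key answers one of the five questions above.
DATA_SOURCING_POLICY = {
    "approved_sources": {"example.com", "api.vendor.example"},
    "data_classes": {
        "allowed": {"industry", "employee_count", "hq_city"},
        "restricted": {"job_title", "email"},
        "prohibited": {"health_status", "home_address"},
    },
    "approved_methods": {"official_api", "approved_vendor", "manual_research", "public_scrape"},
    "technical_controls": {
        "max_requests_per_second": 1,
        "respect_robots_txt": True,
        "user_agent": "ExampleCoEnrichmentBot/1.0",
        "log_provenance": True,
    },
    "review_owner": "revops-lead@example.com",
}


def policy_violations(url, field_name, method, policy=DATA_SOURCING_POLICY):
    """Return human-readable violations for one planned collection."""
    violations = []
    host = urlparse(url).hostname or ""
    if host not in policy["approved_sources"]:
        violations.append(f"source not approved: {host}")
    classes = policy["data_classes"]
    if field_name in classes["prohibited"]:
        violations.append(f"field prohibited: {field_name}")
    elif field_name in classes["restricted"]:
        violations.append(f"field needs sign-off from {policy['review_owner']}: {field_name}")
    elif field_name not in classes["allowed"]:
        violations.append(f"field not classified: {field_name}")
    if method not in policy["approved_methods"]:
        violations.append(f"method not approved: {method}")
    return violations
```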

What a strong RevOps leader does next

Don't wait for legal to clean up a broken enrichment process after the fact. Fix the process before scale makes it expensive.

Start with a source audit inside Salesforce and HubSpot. Identify where each key field comes from. Separate account-level data from personal data. Remove anything your team can't explain. Then redesign your enrichment waterfall so low-risk, permissioned sources sit at the top and scraping sits at the edge, tightly controlled.
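
A first pass at that audit could be as simple as the sketch below. It assumes you can export a mapping of CRM field names to their documented source; the `PERSONAL_FIELDS` set and the bucket names are illustrative.

```python
# Illustrative list of fields that identify a person.
PERSONAL_FIELDS = frozenset({"full_name", "email", "direct_phone", "job_title"})


def audit_fields(field_sources: dict) -> dict:
    """Bucket CRM fields by what the audit should do with them.

    field_sources maps a field name to its documented source, or to
    None or an empty string when nobody can say where it came from.
    """
    report = {"keep_account": [], "review_personal": [], "remove_unexplained": []}
    for name in sorted(field_sources):
        source = field_sources[name]
        if not source:
            report["remove_unexplained"].append(name)
        elif name in PERSONAL_FIELDS:
            report["review_personal"].append(name)
        else:
            report["keep_account"].append(name)
    return report
```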

That approach won't make your data programme flashy. It will make it defensible.

Build a sourcing model that your legal team, ops team, and sales leadership can all understand without translation.

The teams that win here aren't the teams that scrape the most. They're the teams that source data consistently, document it properly, and keep risky collection methods out of the core revenue engine.

If your Salesforce or HubSpot instance is full of mystery fields, fragile enrichments, and undocumented workflows, MarTech Do can help you audit the stack, clean up the data model, and build a compliant RevOps system that supports growth without inviting unnecessary risk.