Original research

AI Bot Blocking Index: How 390 B2B SaaS Sites Treat AI Crawlers

Almost no B2B SaaS company blocks AI search crawlers in robots.txt. We fetched the robots.txt of 390 B2B SaaS websites and checked how each one treats 17 known AI crawlers, from ChatGPT and Perplexity's retrieval bots to training crawlers like GPTBot and CCBot. To avoid a big-brand bias, the sample spans two cohorts: 153 recognizable established brands and 237 active growth-stage B2B SaaS companies from the Y Combinator directory. Across the 362 sites with a readable robots.txt, only 3 (0.8%) block an AI search crawler. The real gatekeeper sits one layer up: 48.2% of all 390 sites (188) are served behind Cloudflare, whose default bot rules can block AI crawlers regardless of what robots.txt says.

390 B2B SaaS sites · 17 AI crawlers · scanned 2026-06-27 · 362 reachable, 28 unreachable

Key findings

0.8% block an AI search crawler in robots.txt (3 of 362 reachable sites)
48.2% are served behind Cloudflare's edge (188 of 390 sites)
15% have any explicit AI-crawler rule at all (55 of 362 reachable sites)
0 use a blanket wildcard Disallow: / (across all 390 sites)

Block rate by crawler

Across the 362 reachable sites, no AI crawler is blocked by more than 2.2% of sites. The most-blocked crawler is Common Crawl's CCBot at 2.2%; the search and user-initiated retrieval bots that actually feed AI answers each sit at or below 0.3%.

| Crawler | User-agent | Operator | Type | Blocked | Rate |
| --- | --- | --- | --- | --- | --- |
| ChatGPT Search | OAI-SearchBot | OpenAI | search | 1 | 0.3% |
| Claude Search | Claude-SearchBot | Anthropic | search | 1 | 0.3% |
| Perplexity Search | PerplexityBot | Perplexity | search | 1 | 0.3% |
| Apple AI Search | Applebot | Apple | search | 1 | 0.3% |
| DuckAssist | DuckAssistBot | DuckDuckGo | search | 1 | 0.3% |
| ChatGPT (user-initiated) | ChatGPT-User | OpenAI | user | 1 | 0.3% |
| Claude (user-initiated) | Claude-User | Anthropic | user | 0 | 0.0% |
| Perplexity (user-initiated) | Perplexity-User | Perplexity | user | 0 | 0.0% |
| OAI-AdsBot (ChatGPT Ads validation) | OAI-AdsBot | OpenAI | ads | 0 | 0.0% |
| GPTBot (OpenAI training) | GPTBot | OpenAI | training | 6 | 1.7% |
| ClaudeBot (Anthropic training) | ClaudeBot | Anthropic | training | 2 | 0.6% |
| Google-Extended (training) | Google-Extended | Google | training | 4 | 1.1% |
| Meta AI (training) | Meta-ExternalAgent | Meta | training | 5 | 1.4% |
| Bytespider (ByteDance training) | Bytespider | ByteDance | training | 7 | 1.9% |
| CCBot (Common Crawl) | CCBot | Common Crawl | training | 8 | 2.2% |
| Cohere AI (training) | cohere-ai | Cohere | training | 3 | 0.8% |
| Applebot-Extended (training) | Applebot-Extended | Apple | training | 3 | 0.8% |

Established brands vs growth-stage startups

Splitting the sample by maturity rules out a big-brand bias: both cohorts overwhelmingly leave AI crawlers open. Established brands block an AI search crawler at 1.3% and growth-stage YC companies at 0.5%. The one place they diverge is the edge: 50.2% of the YC companies sit behind Cloudflare versus 45.1% of the established brands, so the silent edge-block exposure is actually higher among younger companies.

| Cohort | Sites | Reachable | Blocks AI search | Behind Cloudflare | Any AI rule |
| --- | --- | --- | --- | --- | --- |
| Established brands | 153 | 149 | 2 (1.3%) | 69 (45.1%) | 24 (16.1%) |
| YC growth-stage | 237 | 213 | 1 (0.5%) | 119 (50.2%) | 31 (14.6%) |
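The percentages in the cohort table use two different denominators: the Cloudflare share is taken over all sites in a cohort, while the robots.txt figures are taken over reachable sites only. The short sketch below recomputes them from the counts in the table. The dictionary keys and the function names `pct` and `summarize` are illustrative, not part of the published dataset.

```python
# Counts copied from the cohort table above.
COHORTS = {
    "Established brands": {
        "sites": 153, "reachable": 149,
        "blocks_search": 2, "cloudflare": 69, "any_rule": 24,
    },
    "YC growth-stage": {
        "sites": 237, "reachable": 213,
        "blocks_search": 1, "cloudflare": 119, "any_rule": 31,
    },
}


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def summarize(cohort):
    """Recompute the three percentages for one cohort."""
    return {
        # robots.txt figures: only sites whose robots.txt could be read
        "blocks_search": pct(cohort["blocks_search"], cohort["reachable"]),
        "any_rule": pct(cohort["any_rule"], cohort["reachable"]),
        # edge figure: every site in the cohort, reachable or not
        "cloudflare": pct(cohort["cloudflare"], cohort["sites"]),
    }


def combined(cohorts):
    """Sum the raw counts across cohorts, then recompute the percentages."""
    keys = ("sites", "reachable", "blocks_search", "cloudflare", "any_rule")
    total = {key: sum(c[key] for c in cohorts.values()) for key in keys}
    return total, summarize(total)


if __name__ == "__main__":
    for name, cohort in COHORTS.items():
        print(name, summarize(cohort))
    print("All sites", combined(COHORTS))
```

Summing the raw counts before dividing, rather than averaging the two cohort percentages, is what reproduces the headline figures of 3 of 362 and 188 of 390.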

What this means for AI visibility

If you want AI engines to read your pages, robots.txt is rarely the problem. The data shows B2B SaaS sites overwhelmingly leave AI crawlers alone in robots.txt: only 55 of 362 reachable sites have any explicit AI rule, and not one uses a blanket Disallow: /. The blocking that does happen is concentrated on training crawlers (GPTBot, CCBot, Bytespider), not the search and retrieval bots that decide whether you get cited.

The bigger exposure is the edge. With 48.2% of sites behind Cloudflare, a default bot-management rule can challenge or block an AI crawler before it ever reaches your robots.txt. A page can be perfectly permissive in robots.txt and still earn zero citations because the WAF turned the crawler away. For the full explainer on which bots to allow and how to fix the Cloudflare default, read AI Crawlers in 2026: which bots to allow and which to block.
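One rough way to look for this kind of edge behavior is to request the same page with a browser-like User-Agent and with a crawler-like one, then compare the responses. The sketch below is a heuristic under stated assumptions, not the method used for this index: the `Server: cloudflare` and `CF-RAY` header check, the set of status codes treated as a block, and the function names are all illustrative. A spoofed User-Agent also does not prove how the real crawler is treated, because edge rules may verify the crawler's IP range as well.

```python
import urllib.error
import urllib.request

# Status codes treated here as "turned away at the edge" (an assumption).
BLOCK_STATUSES = {403, 429, 503}


def fetch_as(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent; return (status, lowercased headers)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = {k.lower(): v for k, v in response.headers.items()}
            return response.status, headers
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry the headers we want to inspect.
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def behind_cloudflare(headers):
    """Heuristic: Cloudflare-served responses carry these headers."""
    return headers.get("server", "").lower() == "cloudflare" or "cf-ray" in headers


def likely_edge_block(browser_status, bot_status, bot_headers):
    """True when a browser gets the page but the crawler-like request is refused."""
    return (
        browser_status == 200
        and bot_status in BLOCK_STATUSES
        and behind_cloudflare(bot_headers)
    )


if __name__ == "__main__":
    url = "https://example.com/"  # placeholder; replace with your own page
    browser_status, _ = fetch_as(url, "Mozilla/5.0")
    # "GPTBot" is used as a bare token here, not the crawler's full UA string.
    bot_status, bot_headers = fetch_as(url, "GPTBot")
    print(likely_edge_block(browser_status, bot_status, bot_headers))
```

A `True` result is a prompt to check the bot settings at the edge, not a confirmed block.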

Methodology

The sample combines two documented cohorts, deduplicated: 153 recognizable, established B2B SaaS brands (a curated convenience sample across CRM, marketing, dev tooling, analytics, collaboration, HR/finance, security, and support), and 237 active B2B SaaS companies from the Y Combinator company directory (the public dataset, filtered reproducibly to companies tagged both B2B and SaaS, with status Active, Public, or Acquired, a resolvable website, and a team of 10 or more). The two cohorts let us check whether company maturity changes the result. It does not, materially.

Each robots.txt was fetched on 2026-06-27 and evaluated against 17 known AI crawler user-agents using the same RFC 9309 parser that powers our free AI Bot Access Checker. 28 of the 390 sites were unreachable, returned no robots.txt, or blocked our request at scan time; the remaining 362 were parsed. A site is counted as effectively blocking a crawler when its robots.txt has an explicit Disallow: / for that user-agent, or a wildcard (User-agent: *) Disallow: / with no overriding Allow. Cloudflare edge behavior is reported as a separate exposure signal, not counted as a robots.txt block. The full per-site, per-crawler results are below and downloadable.
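The counting rule can be sketched in a few lines of Python. This is a simplified illustration, not the parser behind the AI Bot Access Checker: it matches user-agent tokens by exact case-insensitive comparison, treats any non-empty Allow in the matching group as an override (applied to explicit groups as well as the wildcard, which is an assumption), and ignores path-pattern matching. The function names are illustrative.

```python
def parse_groups(robots_txt):
    """Split robots.txt into a list of (user_agents, rules) groups."""
    groups = []
    agents, rules = [], []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                # A user-agent line after rules starts a new group.
                groups.append((agents, rules))
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
            seen_rule = True
    if agents:
        groups.append((agents, rules))
    return groups


def is_blocked(robots_txt, user_agent):
    """Apply the counting rule: Disallow: / with no overriding Allow."""
    groups = parse_groups(robots_txt)
    token = user_agent.lower()
    # Prefer groups that name the crawler; otherwise fall back to the wildcard.
    matched = [rules for agents, rules in groups if token in agents]
    if not matched:
        matched = [rules for agents, rules in groups if "*" in agents]
    rules = [rule for group in matched for rule in group]
    has_root_disallow = any(f == "disallow" and v == "/" for f, v in rules)
    has_allow = any(f == "allow" and v for f, v in rules)
    return has_root_disallow and not has_allow


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin\n"
    print(is_blocked(sample, "GPTBot"))         # True
    print(is_blocked(sample, "OAI-SearchBot"))  # False
```

A crawler named in its own group is judged only by that group, so a site can block GPTBot while leaving the wildcard group, and with it every search crawler, open.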

The full dataset (390 sites)

Every scanned site, sorted by domain and tagged by cohort. The complete per-crawler matrix is also available to download as JSON or CSV.

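| Domain | Cohort | Status | Behind Cloudflare | Any AI rule | Blocks AI search | Crawlers blocked |
| --- | --- | --- | --- | --- | --- | --- |
| 15five.com | Established | 200 | Yes | No | No | 0 |
| 1password.com | Established | 200 | Yes | No | No | 0 |
| abacus.com | YC | 200 | No | No | No | 0 |
| abstra.io | YC | 200 | No | No | No | 0 |
| accessowl.io | YC | 200 | No | No | No | 0 |
| activecampaign.com | Established | 200 | Yes | No | No | 0 |
| advantageclub.co | YC | 200 | No | No | No | 0 |
| aha.io | Established | 200 | No | No | No | 0 |
| ahazou.com | YC | 200 | Yes | No | No | 0 |
| ahrefs.com | Established | 200 | Yes | No | No | 0 |
| airtable.com | Established | 200 | No | No | No | 0 |
| alex.com | YC | 200 | Yes | Yes | No | 0 |
| algolia.com | YC | 200 | Yes | No | No | 0 |
| alguna.com | YC | 200 | Yes | No | No | 0 |
| ambition.com | YC | 200 | Yes | Yes | No | 0 |
| amplitude.com | Established | 200 | No | Yes | No | 0 |
| animaapp.com | YC | 200 | No | No | No | 0 |
| apollo.io | Established | 200 | Yes | Yes | No | 0 |
| apptimize.com | YC | 200 | No | No | No | 0 |
| appx.co.in | YC | 200 | Yes | No | No | 0 |
| arengu.com | YC | unreachable | Yes | No | No | 0 |
| arintra.com | YC | 200 | Yes | Yes | No | 0 |
| asana.com | Established | 200 | No | No | No | 0 |
| ascen.com | YC | 200 | Yes | No | No | 0 |
| aspireapp.com | YC | 200 | Yes | No | No | 0 |
| atlas.so | YC | unreachable | No | No | No | 0 |
| atlassian.com | Established | 200 | No | No | No | 0 |
| auth0.com | Established | 200 | Yes | No | No | 0 |
| bamboohr.com | Established | 200 | Yes | Yes | No | 0 |
| basecamp.com | Established | 200 | Yes | No | No | 0 |
| benchling.com | YC | unreachable | No | No | No | 0 |
| bik.ai | YC | 200 | No | No | No | 0 |
| bill.com | Established | 200 | Yes | No | No | 0 |
| bitrise.io | YC | 200 | Yes | No | No | 0 |
| bonusly.com | Established | 200 | Yes | No | No | 0 |
| boostly.com | YC | 200 | No | No | No | 0 |
| borong.com | YC | 200 | No | No | No | 0 |
| box.com | Established | 200 | Yes | No | No | 0 |
| braze.com | Established | 200 | Yes | No | No | 0 |
| brex.com | Established | 200 | No | No | No | 0 |
| brightedge.com | Established | 200 | No | No | No | 0 |
| briohr.com | YC | 200 | No | No | No | 0 |
| buffer.com | Established | 200 | Yes | No | No | 0 |
| buildwithfern.com | YC | 200 | No | No | No | 0 |
| bujeti.com | YC | 200 | No | No | No | 0 |
| calendly.com | Established | 200 | Yes | Yes | No | 1 |
| calltree.ai | YC | unreachable | No | No | No | 0 |
| cashflowportal.com | YC | 200 | Yes | No | No | 0 |
| cekura.ai | YC | 200 | No | No | No | 0 |
| chargebee.com | Established | 200 | No | Yes | No | 0 |
| chipax.com | YC | 200 | No | No | No | 0 |
| churnzero.com | Established | 200 | No | No | No | 0 |
| cinder.ai | YC | 200 | No | No | No | 0 |
| circleci.com | Established | 200 | No | Yes | No | 0 |
| clari.com | Established | 200 | Yes | No | No | 0 |
| clearscope.io | Established | 200 | No | No | No | 0 |
| clickup.com | Established | 200 | No | No | No | 0 |
| close.com | Established | 200 | Yes | No | No | 0 |
| cloudchipr.com | YC | 200 | Yes | No | No | 0 |
| cloudflare.com | Established | 200 | Yes | Yes | No | 0 |
| clueso.io | YC | 200 | No | No | No | 0 |
| cococart.co | YC | 200 | No | No | No | 0 |
| coda.io | Established | 200 | No | No | No | 0 |
| command.ai | YC | 200 | No | No | No | 0 |
| communityphone.org | YC | 200 | No | No | No | 0 |
| complif.com | YC | 200 | Yes | No | No | 0 |
| compup.io | YC | 200 | No | No | No | 0 |
| conductor.com | Established | 200 | Yes | No | No | 0 |
| confidentlims.com | YC | 200 | Yes | Yes | No | 0 |
| confidotech.com | YC | 200 | Yes | No | No | 0 |
| confluent.io | Established | 200 | No | Yes | No | 0 |
| contentful.com | Established | 200 | No | No | No | 0 |
| copper.com | Established | 200 | No | No | No | 0 |
| copy.ai | Established | 200 | Yes | No | No | 0 |
| cortex.io | YC | 200 | No | No | No | 0 |
| coval.dev | YC | 200 | No | No | No | 0 |
| crowdstrike.com | Established | 200 | Yes | No | No | 0 |
| cultureamp.com | Established | 200 | No | No | No | 0 |
| curacel.co | YC | 200 | Yes | No | No | 0 |
| customer.io | Established | 200 | No | No | No | 0 |
| customily.com | YC | 200 | Yes | Yes | No | 0 |
| databricks.com | Established | 200 | Yes | No | No | 0 |
| datadoghq.com | Established | 200 | No | No | No | 0 |
| datamechanics.co | YC | unreachable | Yes | No | No | 0 |
| datlo.com | YC | 200 | No | No | No | 0 |
| dealls.com | YC | 200 | Yes | No | No | 0 |
| deel.com | Established | 200 | Yes | Yes | No | 0 |
| demodesk.com | YC | 200 | Yes | Yes | No | 0 |
| devcycle.com | YC | 200 | Yes | No | No | 0 |
| digitalocean.com | Established | 200 | Yes | No | No | 0 |
| dimorder.com | YC | 200 | No | No | No | 0 |
| disclosures.io | YC | 200 | Yes | No | No | 0 |
| distro.app | YC | 200 | No | No | No | 0 |
| docker.com | Established | 200 | No | No | No | 0 |
| docusign.com | Established | 200 | No | No | No | 0 |
| dojah.io | YC | 200 | Yes | No | No | 0 |
| doublehq.com | YC | 200 | Yes | No | No | 0 |
| draftwise.com | YC | 200 | Yes | No | No | 0 |
| drift.com | Established | unreachable | Yes | No | No | 0 |
| dropbox.com | Established | 200 | No | No | No | 0 |
| dyspatch.io | YC | 200 | Yes | No | No | 0 |
| edenworkplace.com | YC | 200 | Yes | No | No | 0 |
| elyos.ai | YC | 200 | Yes | No | No | 0 |
| embeddables.com | YC | 200 | No | No | No | 0 |
| enableus.com | YC | unreachable | No | No | No | 0 |
| epsilon3.io | YC | 200 | No | Yes | No | 0 |
| expensify.com | Established | 200 | Yes | No | No | 0 |
| expent.ai | YC | 200 | Yes | No | No | 0 |
| fathom.ai | YC | 200 | Yes | No | No | 0 |
| fibery.io | Established | 200 | No | No | No | 0 |
| figma.com | Established | 200 | No | Yes | Yes | 8 |
| finmark.com | YC | 200 | Yes | No | No | 0 |
| firstbase.io | YC | 200 | Yes | No | No | 0 |
| fivetran.com | Established | 200 | No | No | No | 0 |
| fondo.com | YC | 200 | Yes | No | No | 0 |
| frase.io | Established | 200 | No | Yes | No | 0 |
| freshdesk.com | Established | 200 | No | No | No | 0 |
| freshpaint.io | YC | 200 | Yes | No | No | 0 |
| freshworks.com | Established | 200 | Yes | No | No | 0 |
| front.com | Established | 200 | Yes | No | No | 0 |
| fullstory.com | Established | 200 | No | No | No | 0 |
| gainsight.com | Established | 200 | Yes | Yes | No | 0 |
| gaspos.co | YC | unreachable | No | No | No | 0 |
| gem.com | YC | 200 | No | No | No | 0 |
| getagency.com | YC | 200 | Yes | Yes | No | 0 |
| getdbt.com | Established | 200 | No | No | No | 0 |
| getguru.com | Established | 200 | Yes | No | No | 0 |
| getofficely.com | YC | 200 | Yes | No | No | 0 |
| getontop.com | YC | 200 | Yes | No | No | 0 |
| getshogun.com | YC | 200 | No | No | No | 0 |
| getstrada.com | YC | 200 | No | No | No | 0 |
| getswipe.in | YC | 200 | Yes | Yes | No | 1 |
| getthematic.com | YC | 200 | No | No | No | 0 |
| gimbooks.com | YC | unreachable | No | No | No | 0 |
| github.com | Established | 200 | No | No | No | 0 |
| gitlab.com | Established | 200 | Yes | No | No | 0 |
| gitprime.com | YC | unreachable | No | No | No | 0 |
| givechariot.com | YC | 200 | No | No | No | 0 |
| gladly.com | Established | 200 | No | Yes | No | 0 |
| glowing.io | YC | 200 | Yes | No | No | 0 |
| go.givecampus.com | YC | 200 | Yes | No | No | 0 |
| gocopia.com | YC | 200 | No | No | No | 0 |
| gong.io | Established | 200 | Yes | No | No | 0 |
| goodkind.com | YC | 200 | No | No | No | 0 |
| gotrebol.com | YC | 200 | Yes | Yes | No | 0 |
| govpredict.com | YC | 200 | Yes | No | No | 0 |
| grammarly.com | Established | 200 | No | No | No | 0 |
| greatquestion.co | YC | 200 | Yes | No | No | 0 |
| greenhouse.io | Established | 200 | Yes | No | No | 0 |
| guesty.com | YC | 200 | No | No | No | 0 |
| guruhotel.com | YC | 200 | No | Yes | No | 0 |
| gusto.com | Established | unreachable | Yes | No | No | 0 |
| haddock.app | YC | 200 | Yes | No | No | 0 |
| hadrius.com | YC | 200 | Yes | No | No | 0 |
| hashicorp.com | Established | unreachable | No | No | No | 0 |
| heap.io | Established | 200 | No | No | No | 0 |
| helloconduit.com | YC | 200 | Yes | No | No | 0 |
| helpscout.com | Established | 200 | No | No | No | 0 |
| herondata.io | YC | 200 | Yes | No | No | 0 |
| hireart.com | YC | 200 | Yes | No | No | 0 |
| hockeystack.com | YC | 200 | Yes | No | No | 0 |
| hona.com | YC | 200 | Yes | No | No | 0 |
| hootsuite.com | Established | 200 | No | No | No | 0 |
| hotglue.com | YC | 200 | No | No | No | 0 |
| hotjar.com | Established | 200 | No | No | No | 0 |
| hubspot.com | Established | 200 | Yes | No | No | 0 |
| hudu.com | YC | 200 | No | No | No | 0 |
| humaans.io | YC | 200 | No | No | No | 0 |
| humand.co | YC | 200 | No | Yes | No | 0 |
| inaccord.com | YC | 200 | Yes | No | No | 0 |
| inevent.com | YC | 200 | No | No | No | 0 |
| infisical.com | YC | 200 | No | No | No | 0 |
| instrumentl.com | YC | 200 | Yes | Yes | No | 0 |
| intercom.com | Established | 200 | No | No | No | 0 |
| inventive.ai | YC | 200 | Yes | No | No | 0 |
| itsrever.com | YC | 200 | Yes | No | No | 0 |
| jasper.ai | Established | 200 | Yes | Yes | No | 0 |
| jfrog.com | Established | 200 | No | No | No | 0 |
| jobvite.com | Established | 200 | No | No | No | 0 |
| joinarc.com | YC | 200 | No | No | No | 0 |
| joindeed.com | YC | 200 | Yes | No | No | 0 |
| journey.io | YC | 200 | No | No | No | 0 |
| jutsoo.com | YC | 200 | Yes | No | No | 0 |
| kapital.mx | YC | 200 | No | No | No | 0 |
| klaviyo.com | Established | 200 | Yes | Yes | No | 0 |
| kular.ai | YC | 200 | Yes | Yes | No | 0 |
| kustomer.com | Established | 200 | No | No | No | 0 |
| lapzo.com | YC | 200 | Yes | No | No | 0 |
| lattice.com | Established | 200 | Yes | No | No | 0 |
| launchdarkly.com | Established | 200 | No | No | No | 0 |
| lever.co | Established | 200 | No | Yes | No | 0 |
| linear.app | Established | 200 | Yes | No | No | 0 |
| linkana.com | YC | 200 | Yes | Yes | No | 0 |
| loom.com | Established | 200 | No | Yes | No | 1 |
| luciq.ai | YC | 200 | Yes | No | No | 0 |
| luminai.com | YC | unreachable | No | No | No | 0 |
| mailchimp.com | Established | 200 | No | No | No | 0 |
| mailwarm.com | YC | 200 | No | No | No | 0 |
| make.com | Established | 200 | Yes | No | No | 0 |
| malga.io | YC | 200 | No | No | No | 0 |
| mantys.io | YC | 200 | Yes | No | No | 0 |
| marketmuse.com | Established | 200 | No | No | No | 0 |
| marketo.com | Established | 200 | No | No | No | 0 |
| marqvision.com | YC | 200 | Yes | No | No | 0 |
| medmehealth.com | YC | 200 | Yes | No | No | 0 |
| merlinai.co | YC | 200 | No | No | No | 0 |
| mesh.ai | YC | 200 | Yes | No | No | 0 |
| methodfi.com | YC | 200 | No | No | No | 0 |
| middesk.com | YC | unreachable | Yes | No | No | 0 |
| miliopay.com | YC | unreachable | No | No | No | 0 |
| miro.com | Established | 200 | No | No | No | 0 |
| mixpanel.com | Established | 200 | No | No | No | 0 |
| mixrank.com | YC | 200 | No | No | No | 0 |
| modelml.com | YC | 200 | No | No | No | 0 |
| momomedical.com | YC | unreachable | No | No | No | 0 |
| monday.com | Established | 200 | Yes | Yes | No | 0 |
| mongodb.com | Established | 200 | No | No | No | 0 |
| moz.com | Established | 200 | Yes | Yes | No | 0 |
| mozartdata.com | YC | 200 | Yes | No | No | 0 |
| mutiny.com | Established | 200 | No | Yes | No | 3 |
| mutinyhq.com | YC | 200 | No | No | No | 0 |
| myworkpay.com | YC | 200 | Yes | No | No | 0 |
| nanonets.com | YC | 200 | No | No | No | 0 |
| natero.com | YC | unreachable | No | No | No | 0 |
| navattic.com | YC | 200 | No | No | No | 0 |
| netlify.com | Established | 200 | No | Yes | No | 0 |
| netsuite.com | Established | 200 | No | No | No | 0 |
| newrelic.com | Established | 200 | No | No | No | 0 |
| notion.so | Established | 200 | Yes | No | No | 0 |
| novahq.com | YC | 200 | Yes | No | No | 0 |
| okta.com | Established | 200 | Yes | No | No | 0 |
| okteto.com | YC | 200 | No | No | No | 0 |
| olaclick.com | YC | 200 | No | Yes | No | 0 |
| omnistrate.com | YC | 200 | Yes | No | No | 0 |
| oneleet.com | YC | 200 | No | No | No | 0 |
| oneschema.co | YC | 200 | Yes | No | No | 0 |
| onesignal.com | YC | 200 | Yes | No | No | 0 |
| onswitchboard.com | YC | 200 | Yes | No | No | 0 |
| optimizely.com | Established | 200 | No | No | No | 0 |
| otterly.ai | Established | unreachable | No | No | No | 0 |
| outreach.io | Established | 200 | Yes | No | No | 0 |
| outset.ai | YC | 200 | No | No | No | 0 |
| paddle.com | Established | 200 | No | No | No | 0 |
| pagerduty.com | Established | 200 | Yes | No | No | 0 |
| palettehq.com | YC | 200 | Yes | No | No | 0 |
| paloaltonetworks.com | Established | 200 | No | No | No | 0 |
| panoramaed.com | YC | 200 | Yes | Yes | Yes | 6 |
| parrotsoftware.io | YC | 200 | Yes | Yes | No | 0 |
| partnerstack.com | YC | 200 | Yes | No | No | 0 |
| peakflo.co | YC | 200 | No | No | No | 0 |
| peec.ai | Established | 200 | No | No | No | 0 |
| pendo.io | Established | 200 | No | No | No | 0 |
| peoplebox.ai | YC | 200 | No | No | No | 0 |
| persistiq.com | YC | 200 | No | No | No | 0 |
| pipedrive.com | Established | 200 | Yes | No | No | 0 |
| plaid.com | Established | 200 | No | No | No | 0 |
| plane.com | YC | 200 | No | No | No | 0 |
| plerk.io | YC | unreachable | No | No | No | 0 |
| podium.com | YC | 200 | No | No | No | 0 |
| pointone.com | YC | 200 | No | No | No | 0 |
| polotab.com | YC | 200 | No | Yes | No | 0 |
| popl.co | YC | 200 | Yes | No | No | 0 |
| porter.run | YC | 200 | Yes | No | No | 0 |
| posthog.com | Established | 200 | No | No | No | 0 |
| postman.com | Established | 200 | Yes | No | No | 0 |
| preki.com | YC | unreachable | No | No | No | 0 |
| productboard.com | Established | 200 | No | No | No | 0 |
| properfinance.io | YC | unreachable | No | No | No | 0 |
| pump.co | YC | 200 | No | No | No | 0 |
| qualtrics.com | Established | 200 | No | No | No | 0 |
| quanwellbeing.com | YC | unreachable | No | No | No | 0 |
| quartzy.com | YC | 200 | Yes | No | No | 0 |
| quotabook.com | YC | 200 | No | No | No | 0 |
| rallyuxr.com | YC | 200 | Yes | No | No | 0 |
| ramp.com | Established | 200 | Yes | No | No | 0 |
| rapid7.com | Established | 200 | Yes | No | No | 0 |
| rdash.io | YC | 200 | No | No | No | 0 |
| re2.ai | YC | 200 | No | No | No | 0 |
| ready.net | YC | 200 | No | No | No | 0 |
| rebill.com | YC | 200 | Yes | No | No | 0 |
| recurly.com | Established | 200 | No | No | No | 0 |
| rehook.ai | YC | unreachable | No | No | No | 0 |
| revl.com | YC | 200 | No | Yes | No | 0 |
| ridecell.com | YC | unreachable | Yes | No | No | 0 |
| rippling.com | Established | 200 | Yes | No | No | 0 |
| rootly.com | YC | unreachable | Yes | No | No | 0 |
| runway.team | YC | 200 | Yes | No | No | 0 |
| salarybox.in | YC | 200 | No | Yes | No | 0 |
| salesforce.com | Established | 200 | No | No | No | 0 |
| salesloft.com | Established | 200 | Yes | No | No | 0 |
| salespatriot.com | YC | 200 | No | No | No | 0 |
| salt.pe | YC | 200 | No | No | No | 0 |
| salvy.com.br | YC | 200 | No | No | No | 0 |
| santehq.com | YC | 200 | Yes | No | No | 0 |
| scispot.io | YC | 200 | Yes | Yes | No | 0 |
| secoda.co | YC | 200 | Yes | Yes | No | 1 |
| segment.com | Established | 200 | No | No | No | 0 |
| semrush.com | Established | 200 | No | No | No | 0 |
| sendbird.com | YC | 200 | No | Yes | No | 3 |
| sentinelone.com | Established | 200 | Yes | No | No | 0 |
| sentry.io | Established | 200 | No | No | No | 0 |
| seoclarity.net | Established | 200 | Yes | No | No | 0 |
| shortcut.com | Established | 200 | Yes | No | No | 0 |
| sigmamind.ai | YC | 200 | Yes | No | No | 0 |
| signoz.io | YC | 200 | No | No | No | 0 |
| slack.com | Established | 200 | No | No | No | 0 |
| smartrecruiters.com | Established | 200 | Yes | No | No | 0 |
| smartsheet.com | Established | 200 | No | No | No | 0 |
| snapdocs.com | YC | 200 | Yes | No | No | 0 |
| snowflake.com | Established | 200 | Yes | No | No | 0 |
| snyk.io | Established | 200 | No | No | No | 0 |
| spadeworks.co | YC | 200 | No | No | No | 0 |
| spott.io | YC | 200 | Yes | Yes | No | 0 |
| sproutsocial.com | Established | 200 | No | No | No | 0 |
| stafbook.com | YC | 200 | Yes | No | No | 0 |
| stilt.com | YC | 200 | Yes | No | No | 0 |
| storylane.io | YC | 200 | Yes | Yes | No | 0 |
| stream.claims | YC | 200 | Yes | No | No | 0 |
| stripe.com | Established | 200 | No | No | No | 0 |
| submittable.com | YC | 200 | Yes | Yes | No | 0 |
| suger.io | YC | 200 | Yes | No | No | 0 |
| sully.ai | YC | 200 | No | No | No | 0 |
| superb-ai.com | YC | 200 | No | No | No | 0 |
| superorder.com | YC | 200 | Yes | No | No | 0 |
| supertokens.com | YC | 200 | Yes | No | No | 0 |
| surferseo.com | Established | 200 | Yes | No | No | 0 |
| surveymonkey.com | Established | 200 | No | No | No | 0 |
| swif.ai | YC | 200 | Yes | No | No | 0 |
| swiftsku.com | YC | 200 | Yes | No | No | 0 |
| swiftype.com | YC | 200 | No | No | No | 0 |
| syncly.app | YC | 200 | No | No | No | 0 |
| sytex.io | YC | 200 | No | No | No | 0 |
| tableau.com | Established | 200 | No | Yes | No | 1 |
| tailor.tech | YC | 200 | No | No | No | 0 |
| talentropy.ai | YC | unreachable | No | No | No | 0 |
| tamarind.bio | YC | 200 | No | No | No | 0 |
| tenable.com | Established | 200 | Yes | No | No | 0 |
| tesorio.com | YC | 200 | No | No | No | 0 |
| torch.io | YC | 200 | No | No | No | 0 |
| tractian.com | YC | 200 | Yes | No | No | 0 |
| trello.com | Established | 200 | No | No | No | 0 |
| truewind.ai | YC | 200 | No | No | No | 0 |
| trycoast.com | YC | 200 | Yes | No | No | 0 |
| tryprofound.com | Established | 200 | Yes | No | No | 0 |
| tryscribe.com | YC | 200 | No | No | No | 0 |
| tryterra.co | YC | 200 | No | No | No | 0 |
| twenty.com | YC | 200 | Yes | No | No | 0 |
| twilio.com | Established | 200 | No | No | No | 0 |
| tyltgo.com | YC | unreachable | No | No | No | 0 |
| typeform.com | Established | 200 | Yes | No | No | 0 |
| ubits.com | YC | 200 | Yes | No | No | 0 |
| unbounce.com | Established | 200 | Yes | No | No | 0 |
| unravelcarbon.com | YC | 200 | Yes | No | No | 0 |
| unsaddl.com | YC | 200 | Yes | No | No | 0 |
| upkeep.com | YC | 200 | Yes | No | No | 0 |
| uplane.com | YC | 200 | No | No | No | 0 |
| useagave.com | YC | 200 | Yes | No | No | 0 |
| usecaribou.com | YC | 200 | No | No | No | 0 |
| usefini.com | YC | 200 | No | No | No | 0 |
| usehatch.ai | YC | 200 | Yes | No | No | 0 |
| usekaya.com | YC | 200 | No | No | No | 0 |
| usemotion.com | YC | 200 | Yes | No | No | 0 |
| usenash.com | YC | 200 | Yes | Yes | No | 7 |
| userpilot.com | Established | 200 | Yes | No | No | 0 |
| vector.co | YC | 200 | Yes | No | No | 0 |
| vercel.com | Established | 200 | No | No | No | 0 |
| verihubs.com | YC | 200 | Yes | No | No | 0 |
| versori.com | YC | 200 | Yes | No | No | 0 |
| vitalize.care | YC | 200 | No | No | No | 0 |
| vobi.com.br | YC | 200 | Yes | No | No | 0 |
| volopay.com | YC | 200 | No | Yes | No | 8 |
| walkme.com | Established | 200 | No | No | No | 0 |
| waterplan.com | YC | 200 | No | No | No | 0 |
| webflow.com | Established | 200 | Yes | No | No | 0 |
| withblaze.app | YC | unreachable | No | No | No | 0 |
| wondercraft.ai | YC | 200 | No | No | No | 0 |
| workday.com | Established | 200 | No | No | No | 0 |
| wrike.com | Established | 200 | Yes | No | No | 0 |
| writesonic.com | Established | 200 | Yes | No | No | 0 |
| xero.com | Established | 200 | No | No | No | 0 |
| ycombinator.com | YC | 200 | Yes | No | No | 0 |
| zapier.com | Established | 200 | No | Yes | No | 0 |
| zeal.com | YC | 200 | Yes | Yes | No | 0 |
| zendesk.com | Established | 200 | Yes | No | No | 0 |
| zeorouteplanner.com | YC | 200 | No | Yes | No | 0 |
| zoho.com | Established | 200 | No | No | No | 0 |
| zoom.us | Established | 200 | Yes | No | No | 0 |
| zoominfo.com | Established | 200 | Yes | Yes | Yes | 4 |
| zscaler.com | Established | 200 | Yes | No | No | 0 |
| zuddl.com | YC | 200 | Yes | No | No | 0 |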

AI-Advisors AI Bot Blocking Index, 390 B2B SaaS sites, scanned 2026-06-27. Free to cite and reuse with attribution (CC BY 4.0).

Check your own site

Run your domain through the free AI Bot Access Checker to see, in seconds, which AI crawlers your robots.txt allows and blocks, then confirm your edge layer is not turning them away. For the strategy behind the data, read AI Crawlers in 2026 and how to optimize a website for AI search.
