Bot Deterrence
Once TollBit is set up for your website, you can configure bot deterrence on your existing cloud security platform to forward known bot traffic to your new tollbit subdomain.
At a high level, you are modifying your existing bot blocking solution so that, instead of returning an error response when it detects a bad bot, it forwards that traffic to us through your tollbit subdomain.
The example solutions provided here assume that you do not currently have bot detection and blocking in place. If you do, these examples should still show you how to update your existing blocking rules so that they forward detected bots to your tollbit subdomain instead. Forwarded bots will see a message like the following:
{
"message": "You are not authorized to access this content without a valid TollBit Token. Please follow this URL to find out more.",
"url": "https://tollbit.com"
}
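The CloudFlare and Fastly examples below build their redirect target the same way. They drop a leading www. from the host, prefix it with tollbit., and keep the original path and query string. Here is a minimal sketch of that mapping. It assumes your subdomain is tollbit.<your-domain>, and toTollbitUrl is a hypothetical helper name used only for illustration:

```javascript
// Hypothetical helper: maps a URL on your main site to the equivalent
// URL on your tollbit subdomain (assumes tollbit.<your-domain>).
function toTollbitUrl(originalUrl) {
  const url = new URL(originalUrl)
  // Drop a leading "www." so www.example.com and example.com
  // both map to tollbit.example.com
  const host = url.host.startsWith('www.') ? url.host.slice(4) : url.host
  // Keep the original path and query string
  return `https://tollbit.${host}${url.pathname}${url.search}`
}

console.log(toTollbitUrl('https://www.example.com/news/story?id=7'))
// https://tollbit.example.com/news/story?id=7
```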
AWS WAF + CloudFront
You can use a combination of AWS Web ACLs and CloudFront to detect and redirect bots. This example will use a Web ACL with a WAF rule to detect bots, and then have CloudFront redirect bot traffic.
First, go to WAF & Shield in the AWS console and create a new Web ACL. Make sure the Web ACL is created for CloudFront distributions. Then add your existing CloudFront distribution to it under the "Associated AWS resources" section of the page.
Once you've created the Web ACL, you can choose any rules you'd like to enable bot detection. AWS Marketplace offers managed bot detection rules that you can add to your ACL, and we provide our own WAF rule as well. To use our WAF rule, select the option to add your own rules and rule groups, open the JSON editor, and paste in the following rule:
{
"Name": "cloudfront-agent-rule",
"Priority": 0,
"Statement": {
"OrStatement": {
"Statements": [
{
"ByteMatchStatement": {
"SearchString": "ChatGPT-User",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "PerplexityBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "GPTBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "anthropic-ai",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "CCBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Google-Extended",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Amazonbot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "FacebookBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Claude-Web",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "cohere-ai",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Omgilibot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "omgili",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "YouBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Bytespider",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Diffbot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "OAI-SearchBot",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
},
{
"ByteMatchStatement": {
"SearchString": "Applebot-Extended",
"FieldToMatch": {
"SingleHeader": {
"Name": "user-agent"
}
},
"TextTransformations": [
{
"Priority": 0,
"Type": "NONE"
}
],
"PositionalConstraint": "CONTAINS"
}
}
]
}
},
"Action": {
"Allow": {
"CustomRequestHandling": {
"InsertHeaders": [
{
"Name": "Bot",
"Value": "true"
}
]
}
}
},
"VisibilityConfig": {
"SampledRequestsEnabled": true,
"CloudWatchMetricsEnabled": true,
"MetricName": "cloudfront-agent-rule"
}
}
This rule matches the user agents of the top known AI bots. For the action, be sure to choose "Allow" and add a custom request header. Ours is called bot, but you can use any unique name. If you change it, update the header name checked in the CloudFront function to match.
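Because the rule repeats one ByteMatchStatement per bot, it can be easier to generate the JSON from a list than to edit it by hand. This is a sketch that reproduces the structure of the rule above. The userAgents array is an illustrative subset, and byteMatch and buildWafRule are hypothetical helper names:

```javascript
// Illustrative subset; substitute the full list of agents you want to match
const userAgents = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']

// One case-sensitive "contains" match on the user-agent header
function byteMatch(searchString) {
  return {
    ByteMatchStatement: {
      SearchString: searchString,
      FieldToMatch: { SingleHeader: { Name: 'user-agent' } },
      TextTransformations: [{ Priority: 0, Type: 'NONE' }],
      PositionalConstraint: 'CONTAINS',
    },
  }
}

// Builds a rule shaped like the one above; headerName is the custom header
// that AWS WAF will expose as x-amzn-waf-<headerName>
function buildWafRule(agents, headerName) {
  return {
    Name: 'cloudfront-agent-rule',
    Priority: 0,
    // OrStatement combines two or more statements, so a single agent
    // is emitted as a bare ByteMatchStatement instead
    Statement:
      agents.length === 1
        ? byteMatch(agents[0])
        : { OrStatement: { Statements: agents.map(byteMatch) } },
    Action: {
      Allow: {
        CustomRequestHandling: {
          InsertHeaders: [{ Name: headerName, Value: 'true' }],
        },
      },
    },
    VisibilityConfig: {
      SampledRequestsEnabled: true,
      CloudWatchMetricsEnabled: true,
      MetricName: 'cloudfront-agent-rule',
    },
  }
}

// Paste the printed JSON into the WAF JSON editor
console.log(JSON.stringify(buildWafRule(userAgents, 'bot'), null, 2))
```

If you pass a different headerName, the CloudFront function has to check x-amzn-waf-<headerName> instead of x-amzn-waf-bot.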
Next, open CloudFront and go to the "Functions" tab. Create a new function and paste in the following JavaScript:
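function handler(event) {
  var request = event.request
  // The WAF rule inserts a "Bot" header, which AWS WAF prefixes with "x-amzn-waf-"
  if (request.headers['x-amzn-waf-bot'] !== undefined) {
    var host = request.headers.host.value
    // Drop a leading "www." so the redirect targets tollbit.<your-domain>
    if (host.indexOf('www.') === 0) {
      host = host.slice(4)
    }
    var newurl = `https://tollbit.${host}${request.uri}`
    // Return a redirect response instead of forwarding the request to the origin
    return {
      statusCode: 302,
      statusDescription: 'Found',
      headers: { location: { value: newurl } },
    }
  }
  // Not flagged by the WAF rule: continue to the origin unchanged
  return request
}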
Earlier, our WAF rule set a header called bot on the request if it matched the rule. AWS WAF automatically prefixes custom request header names with x-amzn-waf-, so the header to look for is x-amzn-waf-bot. If this header exists, our WAF rule detected that this request is a bot request, and the function responds with a redirect to your tollbit subdomain. Once you are ready, save your changes and publish the function.
On the Publish tab, associate this function with your existing CloudFront distribution using the Viewer request event type.
This function will run on every viewer request to your distribution, so make sure you have tested it before completing this step.
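Besides the Test tab in the CloudFront console, you can exercise the handler logic locally in Node with hand-built events. This is a sketch under two assumptions. First, mockEvent is a hypothetical helper that fills in only the fields the handler reads; real viewer-request events carry more. Second, the handler copy below is a variant that also strips a leading www., matching the CloudFlare and Fastly examples:

```javascript
// Variant of the CloudFront function: redirects WAF-flagged requests and
// additionally strips a leading "www." from the host
function handler(event) {
  var request = event.request
  if (request.headers['x-amzn-waf-bot'] !== undefined) {
    var host = request.headers.host.value
    if (host.indexOf('www.') === 0) host = host.slice(4)
    return {
      statusCode: 302,
      statusDescription: 'Found',
      headers: { location: { value: `https://tollbit.${host}${request.uri}` } },
    }
  }
  return request
}

// Hypothetical helper: minimal mock event with only the fields the handler reads
function mockEvent(host, uri, isBot) {
  var headers = { host: { value: host } }
  if (isBot) headers['x-amzn-waf-bot'] = { value: 'true' }
  return { request: { uri: uri, headers: headers } }
}

var botResult = handler(mockEvent('www.example.com', '/article', true))
console.log(botResult.statusCode, botResult.headers.location.value)
// 302 https://tollbit.example.com/article

var humanEvent = mockEvent('example.com', '/article', false)
console.log(handler(humanEvent) === humanEvent.request) // true
```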
CloudFlare
There are several levels of bot detection and forwarding that you can configure for CloudFlare, depending on whether or not you are on their Enterprise plan.
The code snippets here are for a clean CloudFlare environment. If you have existing workers that are processing requests from your domain, you will need to integrate these scripts into your existing worker.
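If you already run a module-syntax worker (one with export default { fetch }), one way to integrate bot forwarding is to wrap its fetch handler. That way, known bots are redirected before your existing logic runs. This is a minimal sketch: withBotForwarding and existingWorker are hypothetical names, and botList is an abbreviated subset. It relies on the Request and Response globals that Workers (and Node 18+) provide:

```javascript
// Abbreviated list; use the full list from the worker examples in this section
const botList = ['GPTBot', 'ClaudeBot', 'PerplexityBot']

// Hypothetical wrapper: redirect known bots, otherwise defer to the existing worker
function withBotForwarding(existingWorker) {
  return {
    fetch(request, env, ctx) {
      const userAgent = request.headers.get('User-Agent') || ''
      if (botList.some((bot) => userAgent.includes(bot))) {
        const url = new URL(request.url)
        // Drop a leading "www." and keep the path and query string
        const host = url.host.startsWith('www.') ? url.host.slice(4) : url.host
        return Response.redirect(`https://tollbit.${host}${url.pathname}${url.search}`, 302)
      }
      return existingWorker.fetch(request, env, ctx)
    },
  }
}

// Stand-in for your current worker's default export
const existingWorker = {
  fetch(request) {
    return new Response('origin content')
  },
}

// In a real worker this would be: export default withBotForwarding(existingWorker)
const worker = withBotForwarding(existingWorker)
```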
Bot Deterrence on any Plan (Including Free)
Follow the steps described here up until you have created a new worker. Give the worker a name that helps you keep track of its purpose (such as bot-forwarding-worker).
Once you've created this worker, click Edit code and do the following to set up your forwarding worker.
If you have already created a CloudFlare worker for log forwarding, DO NOT create a new worker. Use your existing worker when following these instructions. You cannot have two CloudFlare workers on the same route; if you try, only one of them will receive requests.
If you have not set up log forwarding and just want to forward bot traffic, put this code in your worker.js file.
// this is a non-exhaustive list of agents that we recommend you get started with first
// Add any other agents you would like to forward into this list.
const botList = [
'ChatGPT-User',
'PerplexityBot',
'GPTBot',
'anthropic-ai',
'CCBot',
'Claude-Web',
'ClaudeBot',
'cohere-ai',
'YouBot',
'Diffbot',
'OAI-SearchBot',
]
export default {
fetch(request) {
const userAgent = request.headers.get('User-Agent') || ''
const path = request.url.replace(
'https://' + request.headers.get('host'),
'',
)
let host = request.headers.get('host') || ''
if (host.startsWith('www.')) {
// remove www
host = host.slice(4)
}
for (var i = 0; i < botList.length; i++) {
if (userAgent.includes(botList[i])) {
return Response.redirect('https://tollbit.' + host + path, 302)
}
}
// Default behaviour
return fetch(request)
},
}
If you have set up log forwarding, replace the contents of your worker.js file with this code instead. Make sure to copy your TollBit token into the tollbitToken constant.
// this is a non-exhaustive list of agents that we recommend you get started with first
// Add any other agents you would like to forward into this list.
const botList = [
'ChatGPT-User',
'PerplexityBot',
'GPTBot',
'anthropic-ai',
'CCBot',
'Claude-Web',
'ClaudeBot',
'cohere-ai',
'YouBot',
'Diffbot',
'OAI-SearchBot',
]
const CF_APP_VERSION = '1.0.0'
const tollbitLogEndpoint = 'https://log.tollbit.com/log'
const tollbitToken = 'YOUR_SECRET_KEY_HERE'
const sleep = (ms) => {
return new Promise((resolve) => {
setTimeout(resolve, ms)
})
}
const makeid = (length) => {
let text = ''
const possible = 'ABCDEFGHIJKLMNPQRSTUVWXYZ0123456789'
for (let i = 0; i < length; i += 1) {
text += possible.charAt(Math.floor(Math.random() * possible.length))
}
return text
}
const buildLogMessage = (request, response) => {
const logObject = {
timestamp: new Date().toISOString(),
client_ip: '', // worker only is able to get cloudflare edge IP, leaving blank
geo_country: request.cf['country'],
geo_city: request.cf['city'],
geo_postal_code: request.cf['postalCode'],
geo_latitude: request.cf['latitude'],
geo_longitude: request.cf['longitude'],
host: request.headers.get('host'),
url: request.url.replace('https://' + request.headers.get('host'), ''),
request_method: request.method,
request_protocol: request.cf['httpProtocol'],
request_user_agent: request.headers.get('user-agent'),
request_latency: null, // cloudflare does not have latency information
response_state: null,
response_status: response.status,
response_reason: response.statusText,
response_body_size: Number(response.headers.get('content-length')) || null, // Response has no contentLength property
}
return logObject
}
// Batching
const BATCH_INTERVAL_MS = 20000 // 20 seconds
const MAX_REQUESTS_PER_BATCH = 500 // 500 logs
const WORKER_ID = makeid(6)
let workerTimestamp
let batchTimeoutReached = true
let logEventsBatch = []
// Backoff
const BACKOFF_INTERVAL = 10000
let backoff = 0
async function addToBatch(body, event) {
logEventsBatch.push(body)
if (logEventsBatch.length >= MAX_REQUESTS_PER_BATCH) {
event.waitUntil(postBatch(event))
}
return true
}
async function handleRequest(event) {
const { request } = event
const response = await fetch(request)
const isBotRequest = checkIfBotRequest(request)
const eventBody = buildLogMessage(request, response)
event.waitUntil(addToBatch(eventBody, event))
if (isBotRequest) {
const path = request.url.replace(
'https://' + request.headers.get('host'),
'',
)
let host = request.headers.get('host') || ''
if (host.startsWith('www.')) {
// remove www
host = host.slice(4)
}
return Response.redirect('https://tollbit.' + host + path, 302)
}
return response
}
const fetchAndSetBackOff = async (lfRequest, event) => {
if (backoff <= Date.now()) {
const resp = await fetch(tollbitLogEndpoint, lfRequest)
if (resp.status === 403 || resp.status === 429) {
backoff = Date.now() + BACKOFF_INTERVAL
}
}
event.waitUntil(scheduleBatch(event))
return true
}
const postBatch = async (event) => {
const batchInFlight = [...logEventsBatch.map((e) => JSON.stringify(e))]
logEventsBatch = []
const body = batchInFlight.join('\n')
const request = {
method: 'POST',
headers: {
TollbitKey: `${tollbitToken}`,
'Content-Type': 'application/json',
},
body,
}
event.waitUntil(fetchAndSetBackOff(request, event))
}
const scheduleBatch = async (event) => {
if (batchTimeoutReached) {
batchTimeoutReached = false
await sleep(BATCH_INTERVAL_MS)
if (logEventsBatch.length > 0) {
event.waitUntil(postBatch(event))
}
batchTimeoutReached = true
}
return true
}
const checkIfBotRequest = (request) => {
const userAgent = request.headers.get('User-Agent') || ''
for (var i = 0; i < botList.length; i++) {
if (userAgent.includes(botList[i])) {
return true
}
}
return false
}
addEventListener('fetch', (event) => {
event.passThroughOnException()
if (!workerTimestamp) {
workerTimestamp = new Date().toISOString()
}
event.waitUntil(scheduleBatch(event))
event.respondWith(handleRequest(event))
})
This code checks each request's User-Agent against a list of known AI bot user agents, which we will periodically update. Matching requests are redirected to your tollbit subdomain, and all other requests pass through to your site unchanged.
This worker will intercept and potentially redirect traffic to your site. Deploy it incrementally to a subset of your pages first. QA it thoroughly to make sure it is not blocking human traffic or good bot traffic (Googlebot, etc.) before rolling it out across your entire website.
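One way to roll forwarding out gradually is to gate it on a path allowlist and widen the list as QA passes. This is a sketch only. ROLLOUT_PATH_PREFIXES and isInRollout are hypothetical names, and the prefixes are placeholders for sections of your own site:

```javascript
// Hypothetical rollout gate: only forward bots on these path prefixes.
// Widen the list (or use ['/']) once QA is complete.
const ROLLOUT_PATH_PREFIXES = ['/blog/', '/news/']

function isInRollout(requestUrl) {
  const { pathname } = new URL(requestUrl)
  return ROLLOUT_PATH_PREFIXES.some((prefix) => pathname.startsWith(prefix))
}

// Combine with the existing user-agent check, e.g. in handleRequest:
//   if (isBotRequest && isInRollout(request.url)) { ...redirect... }
console.log(isInRollout('https://example.com/blog/post-1')) // true
console.log(isInRollout('https://example.com/shop/item')) // false
```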
Enterprise
If you are on CloudFlare Enterprise, you should be able to use the Bot Management product to get a bot score for each request. You can extend the checkIfBotRequest function in the log-forwarding worker above so that it also returns true when the bot score is below a threshold you choose.
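This is a sketch of checkIfBotRequest extended with a score check. It assumes Bot Management exposes the score as request.cf.botManagement.score (1 means almost certainly automated, 99 almost certainly human) along with a verifiedBot flag. BOT_SCORE_THRESHOLD is a hypothetical value you would tune against your own traffic:

```javascript
// Hypothetical threshold; tune it against your own traffic
const BOT_SCORE_THRESHOLD = 30
const botList = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']

const checkIfBotRequest = (request) => {
  // Known AI user agents are checked first, so they are forwarded
  // even if Cloudflare classifies them as verified bots
  const userAgent = request.headers.get('User-Agent') || ''
  if (botList.some((bot) => userAgent.includes(bot))) {
    return true
  }
  // botManagement is only populated on plans with Bot Management
  const botManagement = (request.cf && request.cf.botManagement) || {}
  // Leave verified bots (such as search engine crawlers) alone
  if (botManagement.verifiedBot) {
    return false
  }
  return typeof botManagement.score === 'number' && botManagement.score < BOT_SCORE_THRESHOLD
}
```

Requests without a score fall back to the user-agent check alone, so the function behaves like the original on plans where Bot Management data is unavailable.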
Fastly
Fastly allows you to set up redirects using VCL snippets. This section walks through forwarding requests from known bots to your tollbit subdomain.
The code shown here is for a clean Fastly environment. If you have any existing VCL scripts that intercept requests, you will need to integrate these scripts into your existing workflow.
Go to the Deliver tab and select the domain you wish to add bot forwarding to. On the right side of the screen, click the Edit configuration button and choose to clone your current active version.
On the left-hand sidebar, click "VCL Snippets".
Create a snippet and name it something like tollbit-bot-forwarding-recv. This VCL code detects whether a request uses one of our known bad user agents and, if so, forwards it to your tollbit subdomain. Make sure the snippet's placement is within the recv subroutine.
Copy and paste the following code block into the VCL input field and save. Don't worry: this VCL will not take effect until you activate the Fastly version you are editing.
if (req.http.user-agent ~ "(?i)chatgpt-user|perplexitybot|gptbot|anthropic-ai|ccbot|claude-web|claudebot|cohere-ai|youbot|diffbot|oai-searchbot") {
if (std.prefixof(req.http.host, "www.")) {
set req.http.host = std.replace_prefix(req.http.host, "www.", "tollbit.");
} else {
set req.http.host = "tollbit." + req.http.host;
}
error 600;
}
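Before activating, it can help to sanity-check which user agents the regex catches and what host the snippet produces. This is a sketch that ports the recv logic to JavaScript. The i flag stands in for VCL's (?i), which should be equivalent for a plain alternation like this one but is an assumption. fastlyBotPattern, rewriteHost, and the sample user-agent strings are all illustrative:

```javascript
// Same alternation as the VCL snippet; the "i" flag mirrors VCL's (?i)
const fastlyBotPattern = /chatgpt-user|perplexitybot|gptbot|anthropic-ai|ccbot|claude-web|claudebot|cohere-ai|youbot|diffbot|oai-searchbot/i

// Mirrors the recv snippet's host rewrite: swap "www." for "tollbit.", else prepend it
function rewriteHost(host) {
  return host.startsWith('www.') ? 'tollbit.' + host.slice(4) : 'tollbit.' + host
}

// Illustrative user-agent strings
const samples = [
  'Mozilla/5.0 (compatible; GPTBot/1.1)',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36',
]
for (const ua of samples) {
  console.log(fastlyBotPattern.test(ua), ua)
}
console.log(rewriteHost('www.example.com')) // tollbit.example.com
```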
Next, create another VCL snippet named something like tollbit-bot-forwarding-error. This time, make sure the placement is within the error subroutine.
Paste the following code into this snippet. It sets the correct status code and headers for the redirect triggered by the previous snippet.
if (obj.status == 600) {
set obj.status = 307;
set obj.response = "Temporary Redirect";
set obj.http.Location = req.protocol + "://" + req.http.host + req.url;
set obj.http.cache-control = "max-age=0";
return (deliver);
}
The VCL scripts you just added will intercept and potentially redirect traffic to your main site. Please ensure that you have tested this in a test environment or for a small subset of pages before activating this across your entire site.
This should now be all you need to forward known bot traffic to your tollbit subdomain! You can activate these changes by clicking "Apply".
Akamai
Akamai allows you to set up redirection rules at the edge using Cloudlets. Specifically, they provide Edge Redirector Cloudlets that help you manage redirection using certain matching rules.
Start by creating an Edge Redirector policy. Follow the documentation here to do so in a way that fits how your Akamai instance is set up.
Once you have set up your policy, follow the documentation here to set up rules for your Edge Redirector. Because we want to redirect based on the User-Agent header, we need to create a redirector with advanced matching rules. Create a match type based on the request header. The name of the header should be User-Agent, and the value should be a tab-separated list of bad user agents. You can use the following list:
ChatGPT-User PerplexityBot GPTBot anthropic-ai CCBot Claude-Web ClaudeBot cohere-ai YouBot Diffbot OAI-SearchBot
For the operator, use is one of, without case sensitivity. These settings should let you match our known bad user agents. In the redirection rule, set the redirect URL to your tollbit subdomain.
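Tab characters are easy to lose when copying a list from a web page, so you may prefer to generate the match value. This is a small sketch. akamaiUserAgents and matchValue are illustrative names, and the list is the one given above:

```javascript
// Join the user-agent list with tab characters for the Akamai match value field
const akamaiUserAgents = [
  'ChatGPT-User', 'PerplexityBot', 'GPTBot', 'anthropic-ai', 'CCBot', 'Claude-Web',
  'ClaudeBot', 'cohere-ai', 'YouBot', 'Diffbot', 'OAI-SearchBot',
]
const matchValue = akamaiUserAgents.join('\t')
console.log(matchValue)
```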
Cloudlets Policy Manager evaluates rules from top to bottom, and picks the first rule that matches. If you have other Cloudlets with rules that also intercept requests, they may match before the rule you just added.
Click save rule to save your changes, and you should be ready to activate! Follow the steps here to do so.
The rule you just added will intercept and potentially redirect traffic to your main site. Please ensure that you have tested this in a test environment or for a small subset of pages before activating this across your entire site.
Vercel
To redirect bots to your TollBit subdomain you can use Vercel's Custom WAF rules.
Create a new rule. Set the rule to look at the User Agent and select Matches expression. Copy the following regex as the expression to match. Feel free to modify it to add or remove bots.
(ChatGPT-User|PerplexityBot|GPTBot|anthropic-ai|CCBot|Claude-Web|ClaudeBot|cohere-ai|YouBot|Diffbot|OAI-SearchBot)
Set the rule to redirect to your TollBit subdomain by changing the Then option to Redirect, and enter your TollBit subdomain as the destination.
Finally, click save rule. The rule won't take effect until you publish the change.
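To check locally which user agents the expression catches, you can run it through JavaScript's RegExp. Its semantics (unanchored, case-sensitive by default) may differ from Vercel's matcher, so treat this as a rough check rather than a guarantee. vercelBotPattern and matchedBot are hypothetical names:

```javascript
// The Vercel expression, evaluated with JavaScript RegExp semantics
const vercelBotPattern = /(ChatGPT-User|PerplexityBot|GPTBot|anthropic-ai|CCBot|Claude-Web|ClaudeBot|cohere-ai|YouBot|Diffbot|OAI-SearchBot)/

// Returns the first listed bot name found in the user agent, or null
function matchedBot(userAgent) {
  const match = vercelBotPattern.exec(userAgent)
  return match ? match[1] : null
}

console.log(matchedBot('Mozilla/5.0 (compatible; ClaudeBot/1.0)')) // ClaudeBot
console.log(matchedBot('Mozilla/5.0 (compatible; gptbot/1.0)')) // null
```

Under these semantics a lowercase variant such as gptbot is not matched, unlike the Fastly snippet, which uses (?i).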