Bot Deterrence

Once TollBit is set up for your website, you can configure bot deterrence on your existing cloud security platform so that known bot traffic is forwarded to your new tollbit subdomain.

At a high level, you are modifying your existing bot blocking solution so that, instead of returning an error response when it detects a bad bot, it forwards that traffic to us through your tollbit subdomain.

The example solutions here assume that you do not currently have bot detection and blocking in place. If you do, these examples should make it straightforward to update your current blocking rules so that they forward detected bots to your tollbit subdomain instead. Forwarded bots will see a message like the following:

{
  "message": "You are not authorized to access this content without a valid TollBit Token. Please follow this URL to find out more.",
  "url": "https://tollbit.com"
}
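Stripped of platform details, every example on this page makes the same decision: if the User-Agent contains a known bot name, redirect to the same path on the tollbit subdomain; otherwise serve the request normally. A minimal sketch of that logic, assuming a shortened bot list and the www-stripping behavior used in the Cloudflare and Fastly examples (the helper name tollbitRedirectUrl is hypothetical):

```javascript
// Illustrative subset of the user agents listed on this page
const BOT_LIST = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']

// Hypothetical helper: returns the TollBit redirect URL for a bot request,
// or null if the request should be served normally.
function tollbitRedirectUrl(userAgent, requestUrl, botList = BOT_LIST) {
  const isBot = botList.some((bot) => (userAgent || '').includes(bot))
  if (!isBot) {
    return null
  }
  const url = new URL(requestUrl)
  // Strip a leading "www." so www.example.com maps to tollbit.example.com
  const host = url.hostname.startsWith('www.') ? url.hostname.slice(4) : url.hostname
  // Keep the path and query string so the bot lands on the equivalent page
  return `https://tollbit.${host}${url.pathname}${url.search}`
}

console.log(tollbitRedirectUrl('Mozilla/5.0 (compatible; GPTBot/1.0)', 'https://www.example.com/news/story?id=7'))
// https://tollbit.example.com/news/story?id=7
console.log(tollbitRedirectUrl('Mozilla/5.0 (Windows NT 10.0)', 'https://www.example.com/news/story'))
// null
```

Substring matching here is case-sensitive, like the AWS rule (no text transformation) and the Cloudflare workers; the Fastly snippet (`(?i)`) and the Akamai rule match case-insensitively.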

AWS WAF + CloudFront

You can use a combination of AWS Web ACLs and CloudFront to detect and redirect bots. This example will use a Web ACL with a WAF rule to detect bots, and then have CloudFront redirect bot traffic.

First, open WAF & Shield in the AWS console and create a new Web ACL. Make sure the ACL is created for CloudFront distributions, and add your existing CloudFront distribution to it under the "Associated AWS resources" section of the page.

Once you've created the ACL, you can add any rules you like for bot detection. AWS Marketplace offers managed bot detection rules that you can add to your ACL, and we provide our own WAF rule as well. To use our rule, choose the option to add your own rules and rule groups, open the JSON editor, and paste in the following rule:

{
  "Name": "cloudfront-agent-rule",
  "Priority": 0,
  "Statement": {
    "OrStatement": {
      "Statements": [
        {
          "ByteMatchStatement": {
            "SearchString": "ChatGPT-User",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "PerplexityBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "GPTBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "anthropic-ai",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "CCBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Google-Extended",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Amazonbot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "FacebookBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Claude-Web",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "cohere-ai",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Omgilibot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "omgili",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "YouBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Bytespider",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Diffbot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "OAI-SearchBot",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "Applebot-Extended",
            "FieldToMatch": {
              "SingleHeader": {
                "Name": "user-agent"
              }
            },
            "TextTransformations": [
              {
                "Priority": 0,
                "Type": "NONE"
              }
            ],
            "PositionalConstraint": "CONTAINS"
          }
        }
      ]
    }
  },
  "Action": {
    "Allow": {
      "CustomRequestHandling": {
        "InsertHeaders": [
          {
            "Name": "Bot",
            "Value": "true"
          }
        ]
      }
    }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "cloudfront-agent-rule"
  }
}

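This rule detects the best-known AI bots by their User-Agent strings. For the action, be sure to choose "Allow" and add a custom request header. The rule above already does this with a header named Bot, but you can name it anything unique; if you rename it, update the header name checked in the CloudFront function below to match.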
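Every entry in the rule's OrStatement differs only in its SearchString, so editing the JSON by hand is error-prone: a single missing brace makes the JSON editor reject the whole rule. A minimal sketch, assuming Node.js, that generates the same rule structure from a list of user agents; buildAgentRule and byteMatch are hypothetical helpers, not AWS APIs, and the output is meant to be pasted into the JSON editor:

```javascript
// Hypothetical helper: one user-agent substring match, same shape as the JSON above
function byteMatch(agent) {
  return {
    ByteMatchStatement: {
      SearchString: agent,
      FieldToMatch: { SingleHeader: { Name: 'user-agent' } },
      TextTransformations: [{ Priority: 0, Type: 'NONE' }],
      PositionalConstraint: 'CONTAINS',
    },
  }
}

// Hypothetical helper: builds the full rule, including the Allow action with a custom header
function buildAgentRule(agents, headerName = 'Bot', ruleName = 'cloudfront-agent-rule') {
  if (agents.length === 0) {
    throw new Error('at least one user agent is required')
  }
  const statements = agents.map(byteMatch)
  return {
    Name: ruleName,
    Priority: 0,
    // An OrStatement combines two or more statements, so a single agent is used directly
    Statement: statements.length === 1 ? statements[0] : { OrStatement: { Statements: statements } },
    Action: {
      Allow: {
        CustomRequestHandling: {
          InsertHeaders: [{ Name: headerName, Value: 'true' }],
        },
      },
    },
    VisibilityConfig: {
      SampledRequestsEnabled: true,
      CloudWatchMetricsEnabled: true,
      MetricName: ruleName,
    },
  }
}

const agents = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']
console.log(JSON.stringify(buildAgentRule(agents), null, 2))
```

Keep headerName in sync with the CloudFront function, which looks for x-amzn-waf- followed by this name in lowercase.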

Next, open CloudFront and go to the "Functions" tab. Create a new function and paste in the following JavaScript:

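function handler(event) {
  // var works in both CloudFront Functions JavaScript runtimes (1.0 and 2.0)
  var request = event.request
  // AWS WAF adds this header (x-amzn-waf- plus your custom header name) when the bot rule matches
  if (request.headers['x-amzn-waf-bot'] !== undefined) {
    var host = request.headers.host.value
    var uri = request.uri
    var newurl = 'https://tollbit.' + host + uri
    var response = {
      statusCode: 302,
      statusDescription: 'Found',
      headers: { location: { value: newurl } },
    }
    return response
  }
  // Not flagged by WAF: pass the request through unchanged
  return request
}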

Earlier, our WAF rule set a header called Bot on any request that matched the rule. AWS WAF prefixes custom request header names with x-amzn-waf-, so the header to look for is actually x-amzn-waf-bot (CloudFront Functions see header names in lowercase). If this header exists, our WAF rule has identified the request as a bot request, so the function redirects it to your tollbit subdomain; all other requests pass through unchanged. When you are ready, save and publish the function. Then, on the Publish tab, associate it with your existing CloudFront distribution as a viewer request function.
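Besides the Test tab in the CloudFront Functions console, you can exercise the function logic under Node.js before publishing. The event built below is a hypothetical, trimmed-down object containing only the fields the function reads (request.uri, the host header, and the x-amzn-waf-bot header added by WAF); real viewer request events carry many more fields. makeEvent is a hypothetical test helper.

```javascript
// Copy of the CloudFront function above
function handler(event) {
  var request = event.request
  if (request.headers['x-amzn-waf-bot'] !== undefined) {
    var newurl = 'https://tollbit.' + request.headers.host.value + request.uri
    return {
      statusCode: 302,
      statusDescription: 'Found',
      headers: { location: { value: newurl } },
    }
  }
  return request
}

// Hypothetical helper: builds a minimal viewer request event for local testing
function makeEvent(uri, host, isBot) {
  var headers = { host: { value: host } }
  if (isBot) {
    headers['x-amzn-waf-bot'] = { value: 'true' }
  }
  return { request: { uri: uri, headers: headers } }
}

var botResult = handler(makeEvent('/news/story', 'example.com', true))
console.log(botResult.statusCode, botResult.headers.location.value) // 302 https://tollbit.example.com/news/story

var humanResult = handler(makeEvent('/news/story', 'example.com', false))
console.log(humanResult.uri) // /news/story
```

Note that in CloudFront Functions, event.request.uri does not include the query string (it is exposed separately as event.request.querystring), so as written the redirect drops any query parameters.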

Cloudflare

There are several levels of bot detection and forwarding that you can configure in Cloudflare, depending on whether or not you are on its Enterprise plan.

Bot Deterrence on any Plan (Including Free)

Follow the steps described here up until you have created a new worker. Give the worker a name that helps you keep track of its purpose (such as bot-forwarding-worker). Once you've created it, click Edit code and set up your forwarding worker as follows.

If you have not set up log forwarding and just want to forward bot traffic, put this code in your worker.js file.

// this is a non-exhaustive list of agents that we recommend you get started with first
// Add any other agents you would like to forward into this list.
const botList = [
  'ChatGPT-User',
  'PerplexityBot',
  'GPTBot',
  'anthropic-ai',
  'CCBot',
  'Claude-Web',
  'ClaudeBot',
  'cohere-ai',
  'YouBot',
  'Diffbot',
  'OAI-SearchBot',
]

export default {
  fetch(request) {
    const userAgent = request.headers.get('User-Agent') || ''
    // Parse the URL instead of string-replacing "https://" + host, so that
    // http:// requests and query strings are handled correctly
    const url = new URL(request.url)
    const path = url.pathname + url.search
    let host = url.hostname
    if (host.startsWith('www.')) {
      // remove www
      host = host.slice(4)
    }
    if (botList.some((bot) => userAgent.includes(bot))) {
      return Response.redirect('https://tollbit.' + host + path, 302)
    }

    // Default behaviour: pass the request through to your origin
    return fetch(request)
  },
}
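To check the redirect decision without deploying, you can run the same logic under Node.js 18 or later, which provides global Request and Response classes like the Workers runtime. This sketch assumes two things: the pass-through fetch is injected as a parameter (the hypothetical passThrough argument, so the check never touches your origin), and the bot list is shortened for brevity. routeRequest and fakeOrigin are hypothetical names.

```javascript
const botList = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']

// Same decision as the worker, with the origin fetch passed in so it can be stubbed
async function routeRequest(request, passThrough) {
  const userAgent = request.headers.get('User-Agent') || ''
  if (!botList.some((bot) => userAgent.includes(bot))) {
    return passThrough(request)
  }
  const url = new URL(request.url)
  // remove www, as the worker does
  const host = url.hostname.startsWith('www.') ? url.hostname.slice(4) : url.hostname
  return Response.redirect('https://tollbit.' + host + url.pathname + url.search, 302)
}

// Stub origin that stands in for fetch(request)
const fakeOrigin = async () => new Response('origin content', { status: 200 })

const botRequest = new Request('https://www.example.com/news?page=2', {
  headers: { 'User-Agent': 'Mozilla/5.0 (compatible; GPTBot/1.2)' },
})
routeRequest(botRequest, fakeOrigin).then((res) => {
  console.log(res.status, res.headers.get('location')) // 302 https://tollbit.example.com/news?page=2
})
```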

If you have set up log forwarding, replace the contents of your worker.js file with this code instead. Make sure to copy your TollBit token into the tollbitToken constant.

// this is a non-exhaustive list of agents that we recommend you get started with first
// Add any other agents you would like to forward into this list.
const botList = [
  'ChatGPT-User',
  'PerplexityBot',
  'GPTBot',
  'anthropic-ai',
  'CCBot',
  'Claude-Web',
  'ClaudeBot',
  'cohere-ai',
  'YouBot',
  'Diffbot',
  'OAI-SearchBot',
]

const CF_APP_VERSION = '1.0.0'

const tollbitLogEndpoint = 'https://log.tollbit.com/log'
const tollbitToken = 'YOUR_SECRET_KEY_HERE'

const sleep = (ms) => {
  return new Promise((resolve) => {
    setTimeout(resolve, ms)
  })
}

const makeid = (length) => {
  let text = ''
  const possible = 'ABCDEFGHIJKLMNPQRSTUVWXYZ0123456789'
  for (let i = 0; i < length; i += 1) {
    text += possible.charAt(Math.floor(Math.random() * possible.length))
  }
  return text
}

const buildLogMessage = (request, response) => {
  const logObject = {
    timestamp: new Date().toISOString(),
    client_ip: '', // worker only is able to get cloudflare edge IP, leaving blank
    geo_country: request.cf['country'],
    geo_city: request.cf['city'],
    geo_postal_code: request.cf['postalCode'],
    geo_latitude: request.cf['latitude'],
    geo_longitude: request.cf['longitude'],
    host: request.headers.get('host'),
    url: request.url.replace('https://' + request.headers.get('host'), ''),
    request_method: request.method,
    request_protocol: request.cf['httpProtocol'],
    request_user_agent: request.headers.get('user-agent'),
    request_latency: null, // cloudflare does not have latency information
    response_state: null,
    response_status: response.status,
    response_reason: response.statusText,
    response_body_size: response.headers.get('content-length'), // Response has no contentLength property
  }
  return logObject
}

// Batching
const BATCH_INTERVAL_MS = 20000 // 20 seconds
const MAX_REQUESTS_PER_BATCH = 500 // 500 logs
const WORKER_ID = makeid(6)

let workerTimestamp

let batchTimeoutReached = true
let logEventsBatch = []

// Backoff
const BACKOFF_INTERVAL = 10000
let backoff = 0

async function addToBatch(body, event) {
  logEventsBatch.push(body)

  if (logEventsBatch.length >= MAX_REQUESTS_PER_BATCH) {
    event.waitUntil(postBatch(event))
  }

  return true
}

async function handleRequest(event) {
  const { request } = event

  const response = await fetch(request)
  const isBotRequest = checkIfBotRequest(request)

  const eventBody = buildLogMessage(request, response)
  event.waitUntil(addToBatch(eventBody, event))

  if (isBotRequest) {
    // Parse the URL so the path and query string survive for both http and https requests
    const url = new URL(request.url)
    const path = url.pathname + url.search
    let host = url.hostname
    if (host.startsWith('www.')) {
      // remove www
      host = host.slice(4)
    }
    return Response.redirect('https://tollbit.' + host + path, 302)
  }
  return response
}

const fetchAndSetBackOff = async (lfRequest, event) => {
  if (backoff <= Date.now()) {
    const resp = await fetch(tollbitLogEndpoint, lfRequest)
    if (resp.status === 403 || resp.status === 429) {
      backoff = Date.now() + BACKOFF_INTERVAL
    }
  }

  event.waitUntil(scheduleBatch(event))

  return true
}

const postBatch = async (event) => {
  const batchInFlight = [...logEventsBatch.map((e) => JSON.stringify(e))]
  logEventsBatch = []
  const body = batchInFlight.join('\n')
  const request = {
    method: 'POST',
    headers: {
      TollbitKey: `${tollbitToken}`,
      'Content-Type': 'application/json',
    },
    body,
  }
  event.waitUntil(fetchAndSetBackOff(request, event))
}

const scheduleBatch = async (event) => {
  if (batchTimeoutReached) {
    batchTimeoutReached = false
    await sleep(BATCH_INTERVAL_MS)
    if (logEventsBatch.length > 0) {
      event.waitUntil(postBatch(event))
    }
    batchTimeoutReached = true
  }
  return true
}

const checkIfBotRequest = (request) => {
  const userAgent = request.headers.get('User-Agent') || ''

  for (var i = 0; i < botList.length; i++) {
    if (userAgent.includes(botList[i])) {
      return true
    }
  }
  return false
}

addEventListener('fetch', (event) => {
  event.passThroughOnException()

  if (!workerTimestamp) {
    workerTimestamp = new Date().toISOString()
  }

  event.waitUntil(scheduleBatch(event))
  event.respondWith(handleRequest(event))
})

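This code sends a log entry for every request to TollBit in batches (roughly every 20 seconds, or as soon as 500 entries have accumulated) and redirects any request whose User-Agent contains an entry from botList to your tollbit subdomain. We will periodically update our recommended list of known bad user agents.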
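The batching in this worker buffers log objects in memory and sends them as a single request body: each object is serialized with JSON.stringify and the results are joined with newlines, one JSON object per line. A stripped-down sketch of just that buffering, assuming Node.js and a hypothetical LogBatcher class whose caller-supplied send function stands in for the POST to the TollBit log endpoint (timers and backoff are left out):

```javascript
// Hypothetical class mirroring addToBatch/postBatch from the worker, minus timers and backoff
class LogBatcher {
  constructor(maxPerBatch, send) {
    this.maxPerBatch = maxPerBatch
    this.send = send // stands in for the POST to the log endpoint
    this.batch = []
  }

  add(logObject) {
    this.batch.push(logObject)
    // Like addToBatch: flush as soon as the batch is full
    if (this.batch.length >= this.maxPerBatch) {
      this.flush()
    }
  }

  flush() {
    if (this.batch.length === 0) {
      return
    }
    // Like postBatch: one JSON object per line
    const body = this.batch.map((entry) => JSON.stringify(entry)).join('\n')
    this.batch = []
    this.send(body)
  }
}

const sentBodies = []
const batcher = new LogBatcher(2, (body) => sentBodies.push(body))
batcher.add({ url: '/a', response_status: 200 })
batcher.add({ url: '/b', response_status: 302 }) // batch is full: flushed here
batcher.add({ url: '/c', response_status: 200 })
batcher.flush() // the worker does this on a timer (BATCH_INTERVAL_MS)
console.log(sentBodies.length) // 2
```

In the worker the batch lives in a module-level variable, so each running instance of the worker keeps its own separate batch.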

Enterprise

If you have Cloudflare Enterprise, you should be able to use the Bot Management product to get a bot score for each request. You can add logic to the checkIfBotRequest function in the code above so that it also returns true when the bot score is below a threshold you choose.
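A sketch of that change, assuming Bot Management is enabled so that request.cf.botManagement.score is populated (Cloudflare bot scores run from 1 to 99, with lower scores more likely automated). The threshold of 30 is an assumed starting point, not a TollBit recommendation, and the check that skips Cloudflare-verified bots (request.cf.botManagement.verifiedBot) is an added assumption meant to avoid redirecting crawlers such as search engines purely on score. It would replace checkIfBotRequest in the log-forwarding worker; botList is shortened here.

```javascript
const botList = ['ChatGPT-User', 'PerplexityBot', 'GPTBot', 'ClaudeBot', 'OAI-SearchBot']

// Assumed threshold: scores below this are treated as bots; tune it for your traffic
const BOT_SCORE_THRESHOLD = 30

const checkIfBotRequest = (request) => {
  const userAgent = request.headers.get('User-Agent') || ''
  // Known AI user agents are always redirected, as before
  if (botList.some((bot) => userAgent.includes(bot))) {
    return true
  }
  // request.cf.botManagement is only present when Bot Management is enabled
  const botManagement = request.cf?.botManagement
  if (!botManagement || botManagement.verifiedBot) {
    return false
  }
  const score = botManagement.score
  // Treat missing or zero scores as unknown rather than as bots
  return typeof score === 'number' && score > 0 && score < BOT_SCORE_THRESHOLD
}
```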

Fastly

Fastly allows you to set up redirects using VCL snippets. In this section, we will go over forwarding requests from known bots to your tollbit subdomain.

Go to the Deliver tab and select the domain you wish to add bot forwarding to. On the right side of the screen, click the Edit configuration button and choose to clone your current active version.

On the left hand sidebar, click "VCL Snippets".

Create a snippet, name it something like tollbit-bot-forwarding-recv, and make sure its placement is within the recv subroutine. This snippet detects whether a request uses one of our known bad user agents and, if so, rewrites the host to your tollbit subdomain and raises a custom error (600) that the next snippet turns into a redirect.

Copy and paste the following code block into the VCL input field and save. Don't worry: this VCL will not take effect until you activate the Fastly version you are editing.

if (req.http.user-agent ~ "(?i)chatgpt-user|perplexitybot|gptbot|anthropic-ai|ccbot|claude-web|claudebot|cohere-ai|youbot|diffbot|oai-searchbot") {
  if (std.prefixof(req.http.host, "www.")) {
    set req.http.host = std.replace_prefix(req.http.host, "www.", "tollbit.");
  } else {
    set req.http.host = "tollbit." + req.http.host;
  }
  error 600;
}

Next, create another VCL snippet, named something like tollbit-bot-forwarding-error. This time, make sure the placement is within the error subroutine.

Paste the following code into this snippet. It catches the 600 error raised by the previous snippet and sets the status code and headers for the redirect.

if (obj.status == 600) {
  set obj.status = 307;
  set obj.response = "Temporary Redirect";
  set obj.http.Location = req.protocol + "://" + req.http.host + req.url;
  set obj.http.cache-control = "max-age=0";
  return (deliver);
}

This should now be all you need to forward known bot traffic to your tollbit subdomain! You can activate these changes by clicking "Apply".
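Once the new version is active, you can confirm the behavior from outside by sending a request with a bot User-Agent and checking that the response redirects to your tollbit subdomain. A sketch for Node.js 18 or later, which has a global fetch; checkBotRedirect is a hypothetical helper, redirect: 'manual' stops fetch from following the redirect so the status and Location header can be inspected, and the fetch function is a parameter so the check can also be run against a stub. The same check works for the other platforms on this page, which use 302 rather than Fastly's 307.

```javascript
// Hypothetical helper: does a bot user agent get redirected to the tollbit subdomain?
async function checkBotRedirect(pageUrl, fetchImpl = fetch) {
  const res = await fetchImpl(pageUrl, {
    redirect: 'manual', // do not follow, so the redirect itself can be inspected
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; GPTBot/1.2)' },
  })
  const location = res.headers.get('location') || ''
  const isRedirect = [301, 302, 307, 308].includes(res.status)
  return {
    status: res.status,
    location,
    // Resolve relative Location values against the page URL before checking the host
    ok: isRedirect && new URL(location, pageUrl).hostname.startsWith('tollbit.'),
  }
}

// Usage against your own site (replace the URL):
// checkBotRedirect('https://www.example.com/some-article').then(console.log)
```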

Akamai

Akamai allows you to set up redirection rules at the edge using Cloudlets. Specifically, they provide Edge Redirector Cloudlets that help you manage redirection using certain matching rules.

Start by creating an Edge Redirector policy. Follow the documentation here to do so in a way that fits how your Akamai instance is set up.

Once you have set up your policy, follow the documentation here to set up rules for your Edge Redirector. Because we are redirecting based on the User-Agent header, you will need to create a redirect rule with advanced matching. Create a match type based on the request header, with User-Agent as the header name and a tab-separated list of bad user agents as the value. You can use the following list:

ChatGPT-User PerplexityBot GPTBot anthropic-ai CCBot Claude-Web ClaudeBot cohere-ai YouBot Diffbot OAI-SearchBot

For the operator, use "is one of" and turn off case sensitivity. These settings match our known bad user agents. In the redirect rule, set the redirect URL to your tollbit subdomain.

Click save rule to save your changes, and you should be ready to activate! Follow the steps here to do so.

Vercel

To redirect bots to your TollBit subdomain you can use Vercel's Custom WAF rules.

Create a new rule. Set the rule to look at the User Agent and select "Matches expression". Use the following regex as the expression to match; feel free to modify it to add or remove bots.

(ChatGPT-User|PerplexityBot|GPTBot|anthropic-ai|CCBot|Claude-Web|ClaudeBot|cohere-ai|YouBot|Diffbot|OAI-SearchBot)

Set the rule to redirect to your TollBit subdomain by changing the Then option to Redirect and entering your TollBit subdomain. The rule should look like this when you're finished.

Finally, click Save Rule. The change won't take effect until you publish it.
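Before saving, you may want to check which user agents the expression catches. A sketch that uses the same pattern as a JavaScript regular expression against a few illustrative user agent strings; whether Vercel evaluates the expression with identical semantics (for example, case sensitivity or partial matching) is an assumption worth confirming in Vercel's documentation. Without anchors, a JavaScript regex matches anywhere in the string, so a full browser-style user agent containing GPTBot is caught.

```javascript
// Same alternation as the Vercel rule above
const botPattern = /(ChatGPT-User|PerplexityBot|GPTBot|anthropic-ai|CCBot|Claude-Web|ClaudeBot|cohere-ai|YouBot|Diffbot|OAI-SearchBot)/

// Illustrative user agent strings
const samples = [
  'Mozilla/5.0 (compatible; GPTBot/1.2)',
  'Mozilla/5.0 (compatible; ClaudeBot/1.0)',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36',
  'gptbot/1.0', // lowercase: not matched, because this JavaScript regex is case-sensitive
]

for (const ua of samples) {
  console.log(botPattern.test(ua) ? 'redirect' : 'allow', '|', ua)
}
```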
