Create and Activate Rates
This is the core of the TollBit product, and where you can set prices on your content. At the moment there are a few ways to set rates. Rates currently have the following hierarchy: bot -> page -> keyword -> time -> directory. This means that when determining the price of a page for a particular request, we first check whether the request is from a bot that matches any of your bot rates. If so, we return that rate. If there are no bot matches, we then check whether the requested page matches any of your page rates. We keep going down the chain, trying to find a match, and if we find no matches at the end, the price is assumed to be 0.
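The lookup order above can be sketched as a first-match chain. This is only an illustration, not TollBit's implementation: the request shape, the matcher functions, and the sample rates are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical: a matcher returns a price when one of its rates applies, else None.
Matcher = Callable[[dict], Optional[float]]

def resolve_price(request: dict, matchers: list[Matcher]) -> float:
    """Return the first matching rate in precedence order, or 0 if none match."""
    for match in matchers:
        price = match(request)
        # Compare against None so that a matching rate of 0 still wins.
        if price is not None:
            return price
    return 0.0

# Illustrative matchers with made-up rates (keyword and time omitted for brevity).
def bot_rate(request: dict) -> Optional[float]:
    return {"PartnerBot": 0.0}.get(request["user_agent"])

def page_rate(request: dict) -> Optional[float]:
    return {"/election-results": 0.01}.get(request["path"])

def directory_rate(request: dict) -> Optional[float]:
    return 0.001 if request["path"].startswith("/") else None

# Order matters: earlier matchers override later ones.
CHAIN = [bot_rate, page_rate, directory_rate]
```

In this sketch a request from PartnerBot for /election-results resolves to the bot rate of 0 even though a page rate also matches, because the bot matcher is consulted first.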
When you first onboard onto the platform, your rates will not be active and demand-side users will not be able to fetch your content through TollBit. To activate your rates and price your content, click the "Activate" button at the top of the rates page. Once activated, all your rates will be live.
A Quick Word on our Standard Licenses
The TollBit platform has two standard licenses. The first, which is the default, is the "On Demand" license. At a high level, this allows data consumers to use each piece of your content once as part of a summary. The second is the "Full Display" license. Again at a high level, this allows users to use your content once as part of a summary and to display the content in its original format within their application. You can review the full licenses within your portal on the Rates page, and rates can be independently set and activated per standard license. You can switch between the two licenses via the license dropdown on the page.
Bot Rates
These rates allow you to set special rates for any specific bots that access your platform, and will override all other rates. You should set this type of rate if you have struck a licensing deal with a company that employs a particular user agent, and want to give them special rates to access your content (usually 0).
Page Rates
These rates allow you to set a rate for a specific page on your website. If you have a page that you know gets high bot traffic (e.g. sports or election results), or if you have a very high quality piece of original reporting, you can set a special rate for that page. This will override all other rates except bot rates.
Keyword Rates
These rates allow you to set a price for pages that may contain a particular keyword. If you know that there are some high profile sporting events coming up, you may want to set a higher price for pages that mention football or basketball. This rate is still in beta.
Time Rates
This rate allows you to define how the price of a page should change over time. You set a starting price for just published or updated content, and can define what the price of the content should be after a set amount of time passes from the last modified time. This rate allows you to automatically price content without needing to constantly manage the dashboard.
Time Rates depend on your sitemap using the lastmod XML tags. This is how we know when a page was last modified.
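As a rough sketch of how a time rate could be derived from a sitemap, the snippet below reads a page's lastmod value and picks a price based on the page's age. The function names, the (age, price) step format, and the choice to treat a lastmod without a timezone as UTC are assumptions made for illustration; only the sitemap namespace comes from the sitemaps.org protocol.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def last_modified(sitemap_xml: str, page_url: str) -> Optional[datetime]:
    """Return the <lastmod> of page_url as an aware datetime, or None if absent."""
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc == page_url and lastmod:
            # fromisoformat rejects a trailing "Z" before Python 3.11.
            parsed = datetime.fromisoformat(lastmod.strip().replace("Z", "+00:00"))
            # Assumption: a lastmod without an offset is treated as UTC.
            return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    return None

def time_rate(start_price: float, steps: list, modified: datetime, now: datetime) -> float:
    """steps: (minimum age, price) pairs sorted by ascending age (hypothetical format)."""
    price = start_price
    for min_age, step_price in steps:
        if now - modified >= min_age:
            price = step_price
    return price

# Made-up schedule: 0.01 when fresh, 0.005 after a day, 0.001 after a week.
STEPS = [(timedelta(days=1), 0.005), (timedelta(days=7), 0.001)]
```

A page missing from the sitemap, or listed without a lastmod tag, yields None here, which mirrors why the tag is required for this rate type.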
Directory Rates
These rates let you set a flat fee for all the content within a page directory of your site. For a quick way to instantly price your content, you can set a price for your top level directory, and this will automatically apply to all pages. You can drill down into further subdirectories and set pricing there, and it will override any price in a higher directory. For example, you can set a base price of $0.001 at the root level, and then set a price of $0.005 for the /sports directory. Everything under /sports will now be $0.005 while something under /cooking will still be $0.001.
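The override behaviour in the example above amounts to picking the deepest directory that has a rate. A minimal sketch, assuming rates are stored in a plain mapping from directory path to price (the function name and storage format are hypothetical):

```python
def directory_price(path: str, directory_rates: dict) -> float:
    """Return the rate of the deepest configured directory that contains path."""
    segments = [segment for segment in path.split("/") if segment]
    # Walk from the most specific directory back up to the root ("/").
    for depth in range(len(segments), -1, -1):
        directory = "/" + "/".join(segments[:depth])
        if directory in directory_rates:
            return directory_rates[directory]
    return 0.0  # no directory rate configured anywhere on the path

# The rates from the example above.
RATES = {"/": 0.001, "/sports": 0.005}

print(directory_price("/sports/nfl/week-1", RATES))
print(directory_price("/cooking/pasta", RATES))
```

Matching on whole path segments means a page such as /sportsbook/odds falls back to the root rate rather than picking up the /sports price.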
Transactions
This page provides an audit trail where you can see all the requests that have been made to your website through TollBit. For each request, you are able to see the user agent that made the request, the page they hit, and the price they paid for that page.
Asset Management
Control what data to include or exclude when developers request content from your website through TollBit.
You can filter out certain types of assets from the HTML of your website, such as images, links or embedded content.
Note that in order to filter these out properly, these assets need to be included in your website using well-formed HTML. For example, we won't be able to filter out a hyperlink if it's not within an <a> tag.
For more advanced use cases, you can filter out all elements with a specific HTML class.
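To illustrate why well-formed markup matters, here is a small sketch built on Python's standard html.parser module. It collects page text while skipping elements selected by tag name or class. It is not how TollBit filters assets; the class name FilteredText and its options are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never receive an end tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class FilteredText(HTMLParser):
    """Collects page text, skipping elements dropped by tag name or class."""

    def __init__(self, drop_tags=(), drop_classes=()):
        super().__init__()
        self.drop_tags = set(drop_tags)
        self.drop_classes = set(drop_classes)
        self.parts = []
        self.skipping = []  # tags opened since the dropped element began

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skipping or tag in self.drop_tags or classes & self.drop_classes:
            self.skipping.append(tag)

    def handle_endtag(self, tag):
        if self.skipping and self.skipping[-1] == tag:
            self.skipping.pop()

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)

def filtered_text(html: str, drop_tags=(), drop_classes=()) -> str:
    parser = FilteredText(drop_tags, drop_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

PAGE = ('<p>Read <a href="https://example.com">our guide</a> '
        'or visit https://example.com/raw.</p>'
        '<div class="promo">Subscribe!</div>')
print(filtered_text(PAGE, drop_tags={"a"}, drop_classes={"promo"}))
```

The link wrapped in an <a> tag and the element with the promo class are removed, while the bare URL typed into the paragraph survives because nothing in the markup identifies it as a link.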
Legal
For any partners that you have struck a deal with, you can upload a custom license for the user agents of that specific partner. Any requests made to TollBit with that partner's user agents will include the license that you uploaded in the transactions.
API Auth Settings
Most of our partners have data available on the open internet. However, if your content is behind an API that requires authentication, you can use this page to set up authentication so our agent can fetch your data on behalf of end users. We support both OAuth and header-based authentication. In the OAuth case, you would set your OAuth endpoint and the payload to POST, which will include the user id and secret key. Finally, in the Token Key field, you would put the exact key in the JSON response whose value corresponds to the bearer token that we should use to make authenticated requests.
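The role of the Token Key field can be shown with a short sketch. The response body and the key name access_token below are hypothetical examples; your OAuth endpoint may use a different key, which is exactly what the field is for.

```python
import json

def bearer_header(token_response_body: str, token_key: str) -> dict:
    """Build an Authorization header from a JSON token response and a Token Key."""
    payload = json.loads(token_response_body)
    if token_key not in payload:
        raise KeyError(f"Token Key {token_key!r} not found in the OAuth response")
    return {"Authorization": f"Bearer {payload[token_key]}"}

# Hypothetical response from an OAuth endpoint.
SAMPLE_RESPONSE = '{"access_token": "abc123", "expires_in": 3600}'

print(bearer_header(SAMPLE_RESPONSE, "access_token"))
```

If the configured key does not match a key in the response exactly (for example token instead of access_token), no bearer token can be extracted, so the value entered in the field has to match the response character for character.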
Content Formatting
We format content to most effectively integrate within AI applications and into LLM contexts. This feature comes out of the box when using TollBit. All sites onboarded with TollBit will work with this functionality.
What does the formatted content look like?
This formatting process makes no changes to the original content. We simply clean the content so that it is ready for your data pipeline. Specifically, the data comes back as a markdown representation of the original web page. The "main" field of the "content" response will likely contain the actual content of the article without any clutter of navigational components or social media links. Should you want to use those components, you may get them from the "header" and "footer" fields if we were able to parse them out.
Finally, the "metadata" field may contain additional information that isn't part of the original content, but can provide additional context around the content. This could include raw data, follow-up links, or additional topics for the end user to explore.
The following is what a user who hits our FAQs page might see.
Example Content Formatting
{
"content": {
"header": "",
"main": "
# TollBit - FAQs
[Get started](https://signup.tollbit.com)
# FAQs
[Request a demo](https://signup.tollbit.com)
-
## What is TollBit?
TollBit is a first-of-its-kind platform to help websites ensure fair compensation for their content and data. The platform allows AI bots and data scrapers to pay websites directly, rewarding quality content creation and mitigating the legal uncertainty of scraping.
-
## How does the platform work?
On the supply side, TollBit’s clients are companies with openly accessible websites, whose data are vulnerable to scraping. They includes publishers, sites with user-generated content, and sites that allow end users to take action - such as e-commerce sites.
Using TollBit, websites can sign up rates and rules can be set for autonomous (non-human) access to any specific URL. TollBit also provides powerful analytics and visibility to companies about autonomous traffic.
On the other side, companies doing the scraping today can use TollBit to access content and data on websites for a fee in exchange for licensing and a cleaner more digestible version of the URL page.
TollBit enables websites to realize the true value of their data, which would otherwise be prone to payment-free scraping.
-
## Are you onboarding publishers?
Yes, we are onboarding publishers and partners.
-
## A number of publishers are cutting their own content licensing deals with tech companies - how would this platform impact the platform?
TollBit is an “and” product. We encourage our websites to pursue 1:1 licensing deals when they make sense. TollBit can also help provide critical missing infrastructure for licensing deals, including reporting, rate limits, and authentication.
-
## How do you expect to set pricing/value of content?
On-demand access rates are something that will depend on the unique needs and business model of our individual clients. Publishers set their own rates on TollBit. However, private licensing deal terms are never public.
-
## Was this developed in concert with any publishers?
The development of the TollBit platform was informed by conversations with dozens of publishers and the product will continue to be improved with their feedback.
-
## Do publishers need to have a paywall in order to use TollBit to generate revenue? If information is already in the public domain then how can AI companies be expected to pay?
Publishers do not have to have an existing paywall in place to generate revenue via TollBit.
-
## Are there restrictions on how content can be used once licensed through TollBit?
Yes, there are specific scopes and restrictions of the on-demand license.
If you use the content/data in a form that is not covered by the license, then the license is not valid and you do not have protection or permission for use.
Made with care in Nashua, Boston, and
New York City
© Novoscribe, Inc. 2024",
"footer": ""
},
"metadata": "",
"rate": {
"priceMicros": 20000,
"currency": "USD",
"licenseType": "ON_DEMAND_LICENSE",
"licensePath": "...",
"error": ""
}
}
As you can see, none of the original content is affected in any way. Just some trimming down of the extra HTML!
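A consumer of this response might pull out the markdown and the price as follows. This is a sketch under one stated assumption: that priceMicros is expressed in millionths of the currency unit, so 20000 would correspond to 0.02 USD. The example does not state the unit, so verify this before relying on it. The function name summarize is hypothetical.

```python
import json
from decimal import Decimal

MICROS_PER_UNIT = Decimal(1_000_000)  # assumption: micros are millionths of a unit

def summarize(response_body: str) -> dict:
    """Extract the article markdown and pricing details from a response body."""
    data = json.loads(response_body)
    rate = data["rate"]
    return {
        "markdown": data["content"]["main"],
        # Decimal avoids floating point rounding when converting the price.
        "price": Decimal(rate["priceMicros"]) / MICROS_PER_UNIT,
        "currency": rate["currency"],
        "license": rate["licenseType"],
    }
```

The "main" markdown can then be passed to an LLM as is, with the price and license type recorded alongside it for auditing.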
Why is this format good for AI?
This formatting maintains crucial context for the LLM, such as titles, paragraphs, and links. At the same time, it strips away the excessive HTML tags, scripts, and other clutter that come back from scraping typical websites. The result should preserve the value of the content while keeping down the number of tokens you pass to the LLM.