The ToS;DR Crawler is essential to the functionality of Phoenix.
By crawling a service, we ensure that its documents are mirrored and cannot be altered until the next crawl (verified using CRC).
We do not index websites on our own; every website is crawled manually by curators or staff on our site.
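The CRC check mentioned above can be sketched as comparing a checksum of the mirrored copy with a checksum of a freshly fetched copy. This page does not say which CRC variant is used or how the checksum is stored, so the sketch assumes CRC-32 as computed by Python's zlib module; the helper names `crc32_of` and `has_changed` are hypothetical.

```python
import zlib


def crc32_of(document: bytes) -> int:
    """Return the CRC-32 checksum of a document's raw bytes (assumed variant)."""
    # Masking keeps the result an unsigned 32-bit value on every Python version.
    return zlib.crc32(document) & 0xFFFFFFFF


def has_changed(mirrored: bytes, fetched: bytes) -> bool:
    """Return True if a freshly fetched copy differs from the mirrored copy."""
    return crc32_of(mirrored) != crc32_of(fetched)


if __name__ == "__main__":
    stored = b"Terms of Service v1"
    print(has_changed(stored, b"Terms of Service v1"))  # False
    print(has_changed(stored, b"Terms of Service v2"))  # True
```

CRC-32 is cheap and reliably detects accidental changes, but it is not a cryptographic hash.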
Identifying the ToS;DR Crawler
All ToS;DR Crawlers send an identifying user agent with every request.
Check for the following user agent:
ToSDRCrawler/1.0.0 (+https://to.tosdr.org/bot)
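A minimal server-side sketch of this check, assuming a hypothetical helper `is_tosdr_crawler` that receives the raw User-Agent header value. It matches on the `ToSDRCrawler/` prefix rather than the full string; that is an assumption made so that a future version number does not break the check.

```python
from typing import Optional

# Prefix of the user agent listed on this page; the version part may change (assumption).
CRAWLER_UA_PREFIX = "ToSDRCrawler/"


def is_tosdr_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header identifies the ToS;DR Crawler."""
    return user_agent is not None and user_agent.startswith(CRAWLER_UA_PREFIX)


if __name__ == "__main__":
    print(is_tosdr_crawler("ToSDRCrawler/1.0.0 (+https://to.tosdr.org/bot)"))  # True
    print(is_tosdr_crawler("Mozilla/5.0"))  # False
```

User agents are easy to spoof, so a check like this is best combined with the IP addresses listed under Crawler Clusters.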
robots.txt
If you want to forbid crawling for any reason, add the following directives to your robots.txt:
User-Agent: TosDRCrawler
Disallow: YOUR_PATH
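To check that such a rule blocks what you expect before deploying it, you can feed it to Python's standard `urllib.robotparser`. This is a sketch only: `/private/` is a hypothetical stand-in for YOUR_PATH, `example.com` is a placeholder domain, and Python's parser is not necessarily the one the ToS;DR Crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "/private/" stands in for YOUR_PATH.
ROBOTS_TXT = """\
User-Agent: TosDRCrawler
Disallow: /private/
"""

CRAWLER_UA = "ToSDRCrawler/1.0.0 (+https://to.tosdr.org/bot)"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.can_fetch(CRAWLER_UA, "https://example.com/private/tos.html"))  # False
    print(rp.can_fetch(CRAWLER_UA, "https://example.com/terms"))  # True
```

User-agent matching in robots.txt is case-insensitive, which is why `TosDRCrawler` in the directive still matches the `ToSDRCrawler` user agent.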
Crawler Clusters
Our crawlers send their requests from the following IP addresses:
176.9.76.173
144.76.3.178
45.136.28.177
87.78.131.160
157.245.142.64
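A minimal sketch for recognising requests from these addresses, using Python's standard `ipaddress` module. The list is copied from this page and may change over time; `is_crawler_ip` is a hypothetical helper that takes the client address as seen by your server.

```python
import ipaddress

# Crawler cluster addresses as listed on this page (may change over time).
CRAWLER_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in (
        "176.9.76.173",
        "144.76.3.178",
        "45.136.28.177",
        "87.78.131.160",
        "157.245.142.64",
    )
)


def is_crawler_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed crawler addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in CRAWLER_IPS
    except ValueError:
        # Not a valid IPv4/IPv6 address string.
        return False


if __name__ == "__main__":
    print(is_crawler_ip("176.9.76.173"))  # True
    print(is_crawler_ip("8.8.8.8"))  # False
```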
Crawler problems
If you operate the website, the most common causes of crawling problems are:
- Cloudflare
- robots.txt
- iptables-based restrictions (see Crawler Clusters)
- User-agent-based blocking
To fix these, add our server IP addresses or our user agent to the respective whitelist.
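For the iptables case, a small script can print one ACCEPT rule per crawler address for you to review and apply. This sketch assumes the site is served over TCP ports 80 and 443 and that filtering happens in the default `INPUT` chain; `iptables_allow_rules` is a hypothetical helper, and the script only prints commands, it does not run them.

```python
from typing import Iterable, List

# Crawler cluster addresses as listed on this page.
CRAWLER_IPS = [
    "176.9.76.173",
    "144.76.3.178",
    "45.136.28.177",
    "87.78.131.160",
    "157.245.142.64",
]


def iptables_allow_rules(ips: Iterable[str], ports: Iterable[int] = (80, 443)) -> List[str]:
    """Build iptables commands that accept HTTP(S) traffic from each address."""
    port_list = ",".join(str(p) for p in ports)
    # -I inserts at the top of INPUT so the rule takes effect before any DROP rules.
    return [
        f"iptables -I INPUT -p tcp -s {ip} -m multiport --dports {port_list} -j ACCEPT"
        for ip in ips
    ]


if __name__ == "__main__":
    for rule in iptables_allow_rules(CRAWLER_IPS):
        print(rule)
```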