If you are visiting this page you probably saw our web crawler, MLBot, accessing your web site.
We are building an index of media on the web. We've put much effort into being a good internet citizen by making our crawler as polite as possible.
Your server logs may show multiple HTTP accesses to the same files. We respect your bandwidth and do not download entire media files from your server. We access only small file segments that are likely to contain metadata for indexing. Each separate HTTP request will appear in your server log, but we are not downloading the file multiple times.
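To illustrate why one file can show up several times in a log, here is a minimal Python sketch of how a client could ask for small segments of a file using the standard HTTP Range header. This is an assumption for illustration only: this page does not state which mechanism MLBot uses, and the function names, URL, and byte counts below are hypothetical. The requests are built but never sent.

```python
import urllib.request


def build_range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for `length` bytes starting at byte `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # the end position in a Range header is inclusive
    request = urllib.request.Request(url, method="GET")
    request.add_header("Range", f"bytes={start}-{end}")
    return request


def build_tail_request(url: str, length: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for the last `length` bytes of a file."""
    if length <= 0:
        raise ValueError("length must be > 0")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Range", f"bytes=-{length}")  # suffix range: final N bytes
    return request


# Hypothetical example: metadata is often near the start or the end of a media file.
head = build_range_request("http://example.com/media/song.mp3", 0, 4096)
tail = build_tail_request("http://example.com/media/song.mp3", 128)
print(head.get_header("Range"))
print(tail.get_header("Range"))
```

A server that supports range requests normally answers each of these with a 206 Partial Content response containing only the requested bytes, and each one is logged as its own entry. Two log lines for the same file in this sketch would add up to a few kilobytes transferred, not two full downloads.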
We understand your bandwidth and server time are valuable. It is important for us and for our business to be welcomed on the internet as a well-behaved crawler. Although we try very hard to be polite, there is always the possibility of a bug. If you notice any unusual activity from MLBot, please report it to:
We want to hear from you and we will respond promptly. Your feedback has helped us to improve MLBot.
If you have a problem or concern about MLBot, we would much prefer to have the chance to address it. If you need to block MLBot, however, we do respect the robots.txt exclusion list. To block MLBot from some parts of your web site, you can use the following example:
User-agent: MLBot
Disallow: /upload_dir/
Disallow: /draft_podcasts/
In this example, /upload_dir/ and /draft_podcasts/ are directories that will be blocked to MLBot and won't be crawled. Other parts of your web site will still be crawled.
To block MLBot from your entire web site you can use this:
User-agent: MLBot
Disallow: /
Please note that our web crawler caches robots.txt files, so it can take 24 to 48 hours before a changed file is re-read.
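If you would like to check your rules before publishing them, the following Python sketch feeds the two robots.txt examples above to the parser in Python's standard library. It shows how a standards-following parser reads those rules; it is not MLBot's own parser, which may differ in details. The helper name is_allowed, the sample paths, and the "OtherBot" user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two examples from this page, as they would appear in a robots.txt file.
PARTIAL_BLOCK = """\
User-agent: MLBot
Disallow: /upload_dir/
Disallow: /draft_podcasts/
"""

FULL_BLOCK = """\
User-agent: MLBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, path)


# Partial block: only the listed directories are closed to MLBot.
print(is_allowed(PARTIAL_BLOCK, "MLBot", "/upload_dir/song.mp3"))
print(is_allowed(PARTIAL_BLOCK, "MLBot", "/music/song.mp3"))

# Full block: everything is closed to MLBot; other crawlers are unaffected.
print(is_allowed(FULL_BLOCK, "MLBot", "/music/song.mp3"))
print(is_allowed(FULL_BLOCK, "OtherBot", "/music/song.mp3"))
```

Because the rules are listed under "User-agent: MLBot", they apply to MLBot only. Crawlers that are not named in the file are not restricted by these entries.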
More information on robots.txt can be found at http://www.robotstxt.org
Thank you,
The metadata labs team