Add file "robots.txt"

"robots.txt" is a file which allows the website owner to request that bots, crawlers, and scrapers do not access their website, discouraging potentially malicious or unwanted behaviour.
2024-02-02 19:19:55 +00:00
commit 94de9053e6
parent df8c88fad4
1 changed file with 43 additions and 0 deletions
--- a/robots.txt
+++ b/robots.txt
@@ -0,0 +1,43 @@
+# Inferencium - Website - robots.txt
+# Version: 1.0.0-beta.1
+
+# Copyright 2024 Jake Winters
+# SPDX-License-Identifier: BSD-3-Clause
+
+
+# ChatGPT
+User-agent: ChatGPT-User
+Disallow: /
+
+User-agent: GPTbot
+Disallow: /
+
+
+# Google Bard
+User-agent: Google-Extended
+Disallow: /
+
+
+# iThenticate (http://www.slysearch.com/)
+## A tool which crawls the internet in search of copyright and intellectual property violations
+## which may be of interest to clients. These tools have no right to scan my website for such
+## purposes.
+User-agent: SlySearch
+Disallow: /
+
+
+# NameProtect (http://www.nameprotect.com/botinfo.html)
+## A tool which crawls the internet in search of brand and intellectual property violations which
+## may be of interest to clients. These tools have no right to scan my website for such purposes.
+User-agent: NPBot
+Disallow: /
+
+
+# Turnitinbot (http://www.turnitin.com/robot/crawlerinfo.html)
+## A tool to scan the internet to allow educational institutions to compare content against
+## students' work in order to prevent plagiarism. These tools set a bad precedent for
+## open-source content as it may be marked as copyrighted/plagiarised when it's actually legally
+## available for use under the copyright holder's license. I allow complete usage of my content for
+## educational purposes, without exception.
+User-agent: Turnitinbot
+Disallow: /
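
The rules added above can be sanity-checked locally before deployment. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocked_agents`, the `ROBOTS_TXT` constant (the rules from this commit with comments stripped), and the `https://example.org/` URL are assumptions made for illustration and are not part of the commit. It only shows how a compliant parser would interpret the file; robots.txt is advisory, so whether a given crawler honours it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# The rules from this commit, with the comments stripped for brevity.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTbot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: SlySearch
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: Turnitinbot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.org/"):
    """Map each user agent to True if the rules disallow it from fetching url.

    Hypothetical helper; the URL is a placeholder, not the real site.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "Turnitinbot", "Googlebot"]
    for agent, blocked in blocked_agents(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Agents not named in the file, such as `Googlebot` in this sketch, remain allowed because the file defines no `User-agent: *` group. The standard-library parser compares agent names case-insensitively, so `GPTbot` in the file also covers a crawler identifying itself as `GPTBot`.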