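"robots.txt" is a file which allows the website owner to ask bots, crawlers, scrapers, and other automated agents not to access their website, or parts of it. Compliance is voluntary: well-behaved crawlers honour these rules, but malicious or unwanted bots can simply ignore them.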
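A compliant crawler typically runs a robots.txt parser over these directives before requesting pages. The sketch below is minimal and assumes Python 3.9+ with its standard-library `urllib.robotparser`. It parses a trimmed copy of the groups in the robots.txt file below (comments removed) and checks whether a few user-agent tokens may fetch a page. The URL `https://example.com/blog/post.html` and the helper names `build_parser` and `check` are hypothetical placeholders, not part of the site. Nothing here enforces the rules; a crawler that never performs such a check is not stopped by robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the directives in the robots.txt file (explanatory comments
# omitted). urllib.robotparser matches user-agent tokens case-insensitively,
# so "GPTbot" here also matches a crawler identifying itself as "GPTBot".
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTbot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: SlySearch
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: Turnitinbot
Disallow: /
"""

# Hypothetical placeholder URL; every path behaves the same here because each
# group above uses "Disallow: /".
PAGE = "https://example.com/blog/post.html"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a "last checked" time so can_fetch() does not treat the rules as unread.
    parser.modified()
    return parser


def check(parser: RobotFileParser, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether a compliant crawler may fetch url."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    agents = ["GPTBot", "ChatGPT-User", "TurnitinBot", "Googlebot"]
    for agent, allowed in check(rules, agents, PAGE).items():
        print(f"{agent:<13}{'allowed' if allowed else 'blocked'}")
```

Run as a script, it reports GPTBot, ChatGPT-User, and TurnitinBot as blocked and Googlebot as allowed. Googlebot is allowed because no group names it. Google-Extended is a control token that governs whether Google may use crawled content for its generative AI products, rather than a separate crawler, so blocking it does not stop ordinary Google Search crawling.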
# Inferencium - Website - robots.txt
# Version: 1.0.0-beta.1

# Copyright 2024 Jake Winters
# SPDX-License-Identifier: BSD-3-Clause


# ChatGPT
User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /


# Google Bard
User-agent: Google-Extended
Disallow: /


# iThenticate (http://www.slysearch.com/)
## A tool which crawls the internet in search of copyright and intellectual property violations
## which may be of interest to clients. These tools have no right to scan my website for such
## purposes.
User-agent: SlySearch
Disallow: /


# NameProtect (http://www.nameprotect.com/botinfo.html)
## A tool which crawls the internet in search of brand and intellectual property violations which
## may be of interest to clients. These tools have no right to scan my website for such purposes.
User-agent: NPBot
Disallow: /


# Turnitinbot (http://www.turnitin.com/robot/crawlerinfo.html)
## A tool to scan the internet to allow educational institutions to compare content against
## students' work in order to prevent plagiarism. These tools set a bad precedent for
## open-source content, as it may be marked as copyrighted/plagiarised when it's actually legally
## available for use under the copyright holder's license. I allow complete usage of my content for
## educational purposes, without exception.
User-agent: Turnitinbot
Disallow: /