VIPnytt bot

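Basic Information

The basic information about VIPnytt bot is listed below. For some spiders and crawlers that do not follow common conventions, some of this information may be unknown.

Spider/crawler name: VIPnytt bot

Type: Feed aggregator

Developer: VIP nytt AS

Current status: Active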

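User Agents

The user-agent strings, IP addresses, server names and countries observed for the VIPnytt bot spider/crawler are listed below:

User-agent string: RobotsTxtParser-VIPnytt/2.1 (+https://github.com/VIPnytt/RobotsTxtParser/blob/master/README.md)

First seen: 2021-10-30 20:31:37

Last seen: 2024-11-07 17:51:23

Obeys robots.txt: Unknown

Sources

IP addresses (2)	Server name	Country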
195.154.133.41	195-154-133-41.rev.poneytelecom.eu	FR
52.59.102.78	ec2-52-59-102-78.eu-central-1.compute.amazonaws.com	DE

User-agent string: RobotsTxtParser-VIPnytt/2.0 (+https://github.com/VIPnytt/RobotsTxtParser/blob/master/README.md)

First seen: 2018-08-29 01:23:19

Last seen: 2021-11-02 17:35:31

Obeys robots.txt: Unknown

Sources

IP addresses (2)	Server name	Country
3.127.119.2	ec2-3-127-119-2.eu-central-1.compute.amazonaws.com	DE
62.138.3.191	astra4433.startdedicated.de	FR

User-agent string: SitemapParser-VIPnytt/1.0 (+https://github.com/VIPnytt/SitemapParser/blob/master/README.md)

First seen: 2018-04-04 15:08:00

Last seen: 2018-04-04 15:08:00

Obeys robots.txt: Unknown

Sources

IP addresses (1)	Server name	Country
104.207.143.191	?	US

User-agent string: Mozilla/5.0 (compatible; jpg-newsbot/2.0; +http://vipnytt.no/bot.html)

First seen: 2015-12-10 08:05:00

Last seen: 2016-04-20 23:57:21

Obeys robots.txt: Unknown

Sources

IP addresses (3)	Server name	Country
212.251.196.81	?	NO
84.202.187.83	?	NO
95.34.60.49	49.60.34.95.customer.cdi.no	NO

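Access Control

Learn how to control VIPnytt bot's access to your site and prevent it from crawling inappropriately.

Should you block VIPnytt bot?

Usually not. Consider blocking this type of crawler only if you do not want feed-aggregator sites or apps to fetch your content and your site does not offer a feed subscription.

Blocking via robots.txt

You can block VIPnytt bot, or restrict what it may access, by adding user-agent rules to your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether the bot actually follows these rules.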

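# robots.txt
# The rules below should block this agent in most cases.
# "VIPnytt bot" is this site's label; the other lines are the product
# tokens actually sent in the user-agent strings listed above.
User-agent: VIPnytt bot
User-agent: RobotsTxtParser-VIPnytt
User-agent: SitemapParser-VIPnytt
User-agent: jpg-newsbot
Disallow: /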
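Whether a crawler obeys a group depends on how it matches its own name against the User-agent lines. RFC 9309 describes exact, case-insensitive matching of the crawler's product token, with a fallback to the `*` group. The sketch below illustrates that rule under this assumption; `selectGroupRules()` is a hypothetical helper, not part of any library mentioned on this page.

```php
<?php
// Minimal sketch of RFC 9309 group selection, NOT the RobotsTxtParser implementation.
// Returns the Allow/Disallow rules a crawler with $productToken should obey:
// rules from every group naming the token (case-insensitive), else the "*" group.
function selectGroupRules(string $robotsTxt, string $productToken): array
{
    $groups = [];
    $current = null;
    $lastWasAgent = false;
    foreach (preg_split('/\r\n|\n|\r/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if (strpos($line, ':') === false) {
            continue; // blank or malformed line
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if (!$lastWasAgent) {
                // A User-agent line after rules starts a new group;
                // consecutive User-agent lines share one group.
                $groups[] = ['agents' => [], 'rules' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = strtolower($value);
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if (($field === 'allow' || $field === 'disallow') && $current !== null) {
            $groups[$current]['rules'][] = [$field, $value];
        }
    }

    $token = strtolower($productToken);
    $found = false;
    $matched = [];
    $wildcard = [];
    foreach ($groups as $group) {
        if (in_array($token, $group['agents'], true)) {
            $found = true;
            $matched = array_merge($matched, $group['rules']);
        } elseif (in_array('*', $group['agents'], true)) {
            $wildcard = array_merge($wildcard, $group['rules']);
        }
    }
    return $found ? $matched : $wildcard;
}
```

Under this strict reading, a group headed only by `User-agent: VIPnytt bot` would not apply to a crawler announcing `RobotsTxtParser-VIPnytt`; some crawlers use looser matching, so treat this as an illustration rather than a guarantee.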

You don't have to do this by hand: our WordPress plugin Spider Analyser can block unwanted spiders and crawlers for you.

More Information

An easy-to-use, extensible robots.txt parser library with full support for literally every directive and specification on the Internet.

Use cases:

Permission checks
Fetch crawler rules
Sitemap discovery
Host preference
Dynamic URL parameter discovery
robots.txt rendering
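Permission checks come down to matching a URL path against Allow/Disallow patterns. The sketch below shows the matching rule described in RFC 9309: the longest matching pattern wins, Allow wins a tie, and `*` and a trailing `$` are supported. It is a hypothetical, standalone illustration; `ruleMatches()` and `isPathAllowed()` are invented names, not functions of this library, and the sketch assumes paths are already encoded consistently.

```php
<?php
// Minimal sketch of RFC 9309 path matching (not the RobotsTxtParser code):
// the rule with the longest matching pattern wins; on a tie, Allow wins.
// Supports the "*" wildcard and the "$" end-of-path anchor.

function ruleMatches(string $pattern, string $path): bool
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape regex metacharacters, then turn the escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '/'));
    return preg_match('/^' . $regex . ($anchored ? '$' : '') . '/', $path) === 1;
}

/** @param array<int, array{0: string, 1: string}> $rules list of [directive, pattern] */
function isPathAllowed(array $rules, string $path): bool
{
    $bestLength = -1;
    $allowed = true; // no matching rule means access is allowed
    foreach ($rules as [$directive, $pattern]) {
        if ($pattern === '' || !ruleMatches($pattern, $path)) {
            continue; // an empty pattern matches nothing
        }
        $length = strlen($pattern);
        $isAllow = strtolower($directive) === 'allow';
        if ($length > $bestLength || ($length === $bestLength && $isAllow)) {
            $bestLength = $length;
            $allowed = $isAllow;
        }
    }
    return $allowed;
}
```

For example, with `Disallow: /admin` and `Allow: /admin/public`, the path `/admin/public/info` is allowed because the Allow pattern is the longer match.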

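Advantages

(compared to most other robots.txt libraries)

Automatic robots.txt download. (optional)
Integrated caching system. (optional)
Crawl-delay handler.
Documentation available.
Support for literally every single directive, from every specification.
HTTP status code handler, according to Google's specification.
Dedicated User-Agent parser and group determiner library, for maximum accuracy.
Provides additional data such as preferred host, dynamic URL parameters, sitemap locations, etc.
Supported protocols: HTTP, HTTPS, FTP, SFTP and FTP/S.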
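The HTTP status code handling mentioned above can be pictured as a small decision table. The following sketch is based on Google's published robots.txt documentation: 2xx means parse the file, 3xx means follow the redirect, 4xx other than 429 means behave as if no robots.txt exists, and 429 or 5xx means treat the site as fully disallowed for now. `robotsTxtPolicy()` and its return strings are hypothetical, the library's own handler may differ, and the fallback for any other code is an assumption of this sketch.

```php
<?php
// Hypothetical helper, not part of RobotsTxtParser: maps the HTTP status of a
// robots.txt fetch to a crawl policy, roughly following Google's documentation.
function robotsTxtPolicy(int $statusCode): string
{
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'parse';        // use the rules in the response body
    }
    if ($statusCode >= 300 && $statusCode < 400) {
        return 'follow';       // follow the redirect (a limited number of hops)
    }
    if ($statusCode === 429 || ($statusCode >= 500 && $statusCode < 600)) {
        return 'disallow-all'; // server trouble: treat the whole site as disallowed
    }
    if ($statusCode >= 400 && $statusCode < 500) {
        return 'allow-all';    // client error: act as if no robots.txt exists
    }
    return 'disallow-all';     // any other code: be conservative (assumption)
}
```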

Requirements:

PHP 7.3+ or 8.0+
PHP extensions:
- cURL
- mbstring
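Before running Composer, you may want to confirm that the interpreter meets these requirements. `missingRequirements()` below is a hypothetical helper, not part of the library; it only relies on PHP's built-in `version_compare()` and `extension_loaded()`.

```php
<?php
// Hypothetical pre-install check (not shipped with the library): lists which of
// the stated requirements (PHP 7.3+, cURL, mbstring) are missing.
function missingRequirements(string $phpVersion, callable $isLoaded): array
{
    $missing = [];
    if (version_compare($phpVersion, '7.3.0', '<')) {
        $missing[] = 'PHP >= 7.3';
    }
    foreach (['curl', 'mbstring'] as $extension) {
        if (!$isLoaded($extension)) {
            $missing[] = "ext-$extension";
        }
    }
    return $missing;
}

// Check the interpreter that is running this script.
$missing = missingRequirements(PHP_VERSION, 'extension_loaded');
echo $missing === [] ? "All requirements met\n" : 'Missing: ' . implode(', ', $missing) . "\n";
```

Passing the version and the extension check as arguments keeps the function testable without depending on the local setup.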

Installation

The recommended way to install the robots.txt parser is through Composer. Add this to your composer.json file:


{
    "require": {
        "vipnytt/robotstxtparser": "^2.1"
    }
}

Then run: php composer update

Getting Started

Basic usage example

<?php
// Composer's autoloader (path assumes this script sits in the project root)
require __DIR__ . '/vendor/autoload.php';

$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');
if ($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')) {
    // Access is granted
}
if ($client->userAgent('MyBot')->isDisallowed('http://example.com/admin')) {
    // Access is denied
}
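UriClient is constructed from just the site's base URI, so it presumably fetches robots.txt on its own. Per RFC 9309, robots.txt lives at `/robots.txt` on the same scheme, host and port, so every page URL maps to exactly one robots.txt URL. The hypothetical `robotsTxtUrl()` helper below sketches that mapping; it is an illustration, not how UriClient works internally.

```php
<?php
// Hypothetical helper: derive the robots.txt location for any URL on a site.
// robots.txt always sits at the root of the scheme + host (+ port).
function robotsTxtUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return strtolower($parts['scheme']) . '://' . strtolower($parts['host']) . $port . '/robots.txt';
}
```

Path, query string and fragment are dropped, because the rules in a single robots.txt cover the whole origin.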

A small excerpt of the basic methods

<?php
// Composer's autoloader (path assumes this script sits in the project root)
require __DIR__ . '/vendor/autoload.php';

// Example robots.txt body; in practice, pass the content you fetched yourself
$robotsTxtContent = "User-agent: *\nDisallow: /admin\nCrawl-delay: 5\nSitemap: http://example.com/sitemap.xml\n";

// Syntax: $baseUri, [$statusCode:int|null], [$robotsTxtContent:string], [$encoding:string], [$byteLimit:int|null]
$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);
// Permission checks
$allowed = $client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html'); // bool
$denied = $client->userAgent('MyBot')->isDisallowed('http://example.com/admin'); // bool
// Crawl delay rules
$crawlDelay = $client->userAgent('MyBot')->crawlDelay()->getValue(); // float | int
// Dynamic URL parameters
$cleanParam = $client->cleanParam()->export(); // array
// Preferred host
$host = $client->host()->export(); // string | null
$hostWithFallback = $client->host()->getWithUriFallback(); // string
$isPreferredHost = $client->host()->isPreferred(); // bool
// XML Sitemap locations
$sitemaps = $client->sitemap()->export(); // array

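The above is just a taste of the basics; a whole bunch of more advanced and/or specialized methods are available for almost any purpose. Visit the cheat sheet for the technical details.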
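Sitemap, Host and Clean-param lines are not tied to a User-agent group, which is why the excerpt above reads them through client-level methods rather than through userAgent(). The sketch below collects those directives from raw robots.txt text. `extractGlobalDirectives()` is hypothetical and its return shape is an assumption; it only loosely mirrors the kind of data that sitemap(), host() and cleanParam() expose, and keeping only the first Host line is a choice made for this sketch.

```php
<?php
// Minimal sketch (not the library's parser): collect group-independent
// directives (Sitemap, Host, Clean-param) from raw robots.txt text.
function extractGlobalDirectives(string $robotsTxt): array
{
    $result = ['sitemap' => [], 'host' => null, 'clean-param' => []];
    foreach (preg_split('/\r\n|\n|\r/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if (strpos($line, ':') === false) {
            continue;
        }
        // Split on the first colon only, so URLs keep their "https:" part.
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        switch (strtolower($field)) {
            case 'sitemap':
                $result['sitemap'][] = $value;
                break;
            case 'host':
                if ($result['host'] === null) { // this sketch keeps the first Host only
                    $result['host'] = $value;
                }
                break;
            case 'clean-param':
                $result['clean-param'][] = $value; // raw "params [path]" value
                break;
        }
    }
    return $result;
}
```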

Visit the documentation for more information.