VIPnytt bot

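The VIPnytt bot spider/crawler is classified as a feed fetcher and is developed and operated by VIP nytt AS. Read on for its basic information, user-agent strings, and access-control options.

Basic information

The table below lists basic information for VIPnytt bot. For less well-behaved spiders and crawlers, some fields may be unknown.

Spider/crawler name: VIPnytt bot

Type: Feed fetcher

Developer: VIP nytt AS

Current status: Active

User agents

The tables below list the user-agent strings of the VIPnytt bot spider/crawler, together with the IP addresses, hostnames, and countries they were seen from: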

User-agent string: RobotsTxtParser-VIPnytt/2.1 (+https://github.com/VIPnytt/RobotsTxtParser/blob/master/README.md)

First seen: 2021-10-30 20:31:37

Last seen: 2024-11-07 17:51:23

Obeys robots.txt: Unknown

Sources

IP address (2)	Hostname	Country
195.154.133.41	195-154-133-41.rev.poneytelecom.eu	FR
52.59.102.78	ec2-52-59-102-78.eu-central-1.compute.amazonaws.com	DE

User-agent string: RobotsTxtParser-VIPnytt/2.0 (+https://github.com/VIPnytt/RobotsTxtParser/blob/master/README.md)

First seen: 2018-08-29 01:23:19

Last seen: 2021-11-02 17:35:31

Obeys robots.txt: Unknown

Sources

IP address (2)	Hostname	Country
3.127.119.2	ec2-3-127-119-2.eu-central-1.compute.amazonaws.com	DE
62.138.3.191	astra4433.startdedicated.de	FR

User-agent string: SitemapParser-VIPnytt/1.0 (+https://github.com/VIPnytt/SitemapParser/blob/master/README.md)

First seen: 2018-04-04 15:08:00

Last seen: 2018-04-04 15:08:00

Obeys robots.txt: Unknown

Sources

IP address (1)	Hostname	Country
104.207.143.191	?	US

User-agent string: Mozilla/5.0 (compatible; jpg-newsbot/2.0; +http://vipnytt.no/bot.html)

First seen: 2015-12-10 08:05:00

Last seen: 2016-04-20 23:57:21

Obeys robots.txt: Unknown

Sources

IP address (3)	Hostname	Country
212.251.196.81	?	NO
84.202.187.83	?	NO
95.34.60.49	49.60.34.95.customer.cdi.no	NO
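
To act on the user-agent strings listed above, for example when filtering access logs, a site can match incoming requests against their product tokens. The following is a minimal sketch in plain PHP; the constant `VIPNYTT_TOKENS` and the helper `matchVipnyttToken()` are hypothetical names introduced for illustration and are not part of any VIPnytt library. User-agent strings can be spoofed, so a match is only a hint.

```php
<?php
// Product tokens taken from the user-agent strings listed above.
const VIPNYTT_TOKENS = [
    'RobotsTxtParser-VIPnytt',
    'SitemapParser-VIPnytt',
    'jpg-newsbot',
];

/**
 * Hypothetical helper: returns the matching product token,
 * or null when the user-agent string contains none of them.
 */
function matchVipnyttToken(string $userAgent): ?string
{
    foreach (VIPNYTT_TOKENS as $token) {
        // Case-insensitive match on "<token>/<digit>", i.e. token plus version.
        $pattern = '~' . preg_quote($token, '~') . '/\d~i';
        if (preg_match($pattern, $userAgent) === 1) {
            return $token;
        }
    }
    return null;
}
```

Calling `matchVipnyttToken($_SERVER['HTTP_USER_AGENT'] ?? '')` in a request handler returns the matched token, or null for any other client.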

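Access control

Learn how to control VIPnytt bot's access and keep it from crawling your site inappropriately.

Should you block VIPnytt bot?

Usually not. Consider blocking this type of crawler only if you do not want feed sites or apps to fetch your content and your site does not offer feed subscriptions.

Blocking via robots.txt

You can block VIPnytt bot or restrict its access by setting user-agent rules in your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether it actually obeys these rules.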

# robots.txt
# The following rules will generally block this agent
User-agent: VIPnytt bot
Disallow: /
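
The `User-agent: VIPnytt bot` rule uses the directory's display name, which does not appear in any of the user-agent strings listed earlier. Crawlers that follow the Robots Exclusion Protocol normally match on the product token of their own user-agent string, so a rule group that names the listed tokens may be more effective. The sketch below assumes each client matches on the token before the slash in its user-agent string; robots.txt compliance is listed as unknown for all of them, so treat it as untested.

```
# robots.txt
# Assumption: each client matches on the product token from its user-agent string
User-agent: RobotsTxtParser-VIPnytt
User-agent: SitemapParser-VIPnytt
User-agent: jpg-newsbot
Disallow: /
```

Several `User-agent` lines may share one rule group, so a single `Disallow: /` applies to all three tokens.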

You do not have to do this by hand: our WordPress plugin Spider Analyser can block unwanted spiders and crawlers for you.

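More information

An easy-to-use, extensible robots.txt parser library with full support for every directive and specification on the Internet.

Use cases:

Permission checks
Fetching crawler rules
Sitemap discovery
Host preference
Dynamic URL parameter discovery
robots.txt rendering

Advantages

(Compared to most other robots.txt libraries)

Automatic robots.txt download. (optional)
Integrated caching system. (optional)
Crawl-delay handler.
Documentation available.
Support for every directive, from every specification.
HTTP status code handler, according to Google's specification.
Dedicated User-Agent parser and group determiner library, for maximum accuracy.
Provides additional data such as the preferred host, dynamic URL parameters, and sitemap locations.
Supported protocols: HTTP, HTTPS, FTP, SFTP and FTP/S.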

Requirements:

PHP 7.3+ or 8.0+
PHP extensions:
- cURL
- mbstring

Installation

The recommended way to install the robots.txt parser is through Composer. Add this to your composer.json file:

{
    "require": {
        "vipnytt/robotstxtparser": "^2.1"
    }
}

Then run: composer update (or php composer.phar update if Composer is installed as a local PHAR file)

Getting started

Basic usage example

<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// Downloads and parses the robots.txt of the given site
$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');

if ($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')) {
    // Access is granted
}
if ($client->userAgent('MyBot')->isDisallowed('http://example.com/admin')) {
    // Access is denied
}

A small excerpt of the basic methods

<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// Example robots.txt body; in practice, pass the content you have already fetched
$robotsTxtContent = "User-agent: *\nDisallow: /admin\n";

// Syntax: $baseUri, [$statusCode:int|null], [$robotsTxtContent:string], [$encoding:string], [$byteLimit:int|null]
$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);

// Permission checks
$allowed = $client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html'); // bool
$denied = $client->userAgent('MyBot')->isDisallowed('http://example.com/admin'); // bool

// Crawl delay rules
$crawlDelay = $client->userAgent('MyBot')->crawlDelay()->getValue(); // float | int

// Dynamic URL parameters
$cleanParam = $client->cleanParam()->export(); // array

// Preferred host
$host = $client->host()->export(); // string | null
$hostWithFallback = $client->host()->getWithUriFallback(); // string
$isPreferredHost = $client->host()->isPreferred(); // bool

// XML Sitemap locations
$sitemaps = $client->sitemap()->export(); // array

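The above is only a basic taster; many more advanced and specialized methods are available for almost any purpose. See the cheat sheet for the technical details.

See the documentation for more information.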