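How to Analyze Search Engine Spider Logs
Search engine spider logs are a very powerful resource that site owners underuse. Analyzing them shows how each search engine crawls the site's content and how its spider behaves over a period of time.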
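As a rough illustration of this kind of analysis, the sketch below counts requests per crawler-like user agent and records which paths each one fetched. It assumes the server writes the common "combined" access log format. The sample lines, the documentation-range IP addresses, the `ExampleSpider/1.0` agent, and the keyword list are all made up for the example, so adapt them to your own logs.

```python
import re
from collections import Counter

# Assumed layout: combined log format
# IP identity user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_spiders(lines, keywords=("bot", "spider", "crawl")):
    """Count hits and requested paths per user agent that looks like a crawler.

    The keyword match is a simple heuristic, not a reliable bot detector.
    """
    hits = Counter()
    paths = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent")
        if any(word in agent.lower() for word in keywords):
            hits[agent] += 1
            paths.setdefault(agent, Counter())[match.group("path")] += 1
    return hits, paths


# Hypothetical sample data (IPs come from reserved documentation ranges).
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "KOCMOHABT bot"',
    '192.0.2.10 - - [01/Jan/2024:10:00:05 +0000] "GET /about HTTP/1.1" 200 734 "-" "KOCMOHABT bot"',
    '198.51.100.7 - - [01/Jan/2024:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleSpider/1.0"',
    '203.0.113.5 - - [01/Jan/2024:10:02:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    hits, paths = summarize_spiders(SAMPLE_LOG)
    for agent, count in hits.most_common():
        print(agent, count, dict(paths[agent]))
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`, since the function only needs an iterable of lines.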
| IP address (18) | Hostname | Country |
|---|---|---|
| 140.238.153.137 | 140.238.153.137 | CA |
| 140.238.155.192 | ? | CA |
| 35.203.245.147 | 147.245.203.35.gae.googleusercontent.com | US |
| 150.230.28.205 | 150.230.28.205 | CA |
| 2600:1900:2000:1d:400::7 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2001:2::28 | ipv6.gae.googleusercontent.com | US |
| 107.178.231.227 | 227.231.178.107.gae.googleusercontent.com | US |
| 2600:1900:2000:37:400::15 | ipv6.gae.googleusercontent.com | US |
| 107.178.232.188 | 188.232.178.107.gae.googleusercontent.com | US |
| 2600:1900:2001:3::11 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2000:1b:400::1f | ipv6.gae.googleusercontent.com | US |
| 192.18.145.111 | ? | CA |
| 132.145.96.13 | 132.145.96.13 | CA |
| 2600:1900:2000:1b:400::14 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2000:1b:400::22 | ipv6.gae.googleusercontent.com | US |
| 107.178.236.10 | 10.236.178.107.gae.googleusercontent.com | US |
| 2600:1900:2001:3::7 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2001:10::2c | ipv6.gae.googleusercontent.com | US |
| 107.178.237.3 | 3.237.178.107.gae.googleusercontent.com | US |
| 18.220.23.193 | ec2-18-220-23-193.us-east-2.compute.amazonaws.com | US |
| 35.203.251.50 | 50.251.203.35.gae.googleusercontent.com | US |
| 2600:1900:2000:9::9 | ipv6.gae.googleusercontent.com | US |
| 107.178.236.26 | 26.236.178.107.gae.googleusercontent.com | US |
| 107.178.239.195 | 195.239.178.107.gae.googleusercontent.com | US |
| 2600:1900:2000:50::4 | ipv6.gae.googleusercontent.com | US |
| 129.153.52.33 | 129.153.52.33 | US |
| 107.178.200.212 | 212.200.178.107.gae.googleusercontent.com | US |
| 35.203.251.54 | 54.251.203.35.gae.googleusercontent.com | US |
| 129.153.49.13 | 129.153.49.13 | US |
| 129.153.56.182 | ? | US |
| 107.178.200.204 | 204.200.178.107.gae.googleusercontent.com | US |
| 107.178.238.49 | 49.238.178.107.gae.googleusercontent.com | US |
| 192.18.152.135 | ? | CA |
| 107.178.238.46 | 46.238.178.107.gae.googleusercontent.com | US |
| 2600:1900:2000:1d:400:0:1:e00 | ipv6.gae.googleusercontent.com | US |
| 40.233.84.134 | a40-233-84-134.deploy.static.akamaitechnologies.com | US |
| 140.238.146.226 | 140.238.146.226 | CA |
| 150.230.26.190 | ? | CA |
| 107.178.200.205 | 205.200.178.107.gae.googleusercontent.com | US |
| 140.238.145.147 | 140.238.145.147 | CA |
| 2600:1900:0:2d06::300 | 2600:1900:0:2d06::300 | US |
| 150.230.24.255 | ? | CA |
| 140.238.142.80 | 140.238.142.80 | CA |
| IP address (18) | Hostname | Country |
|---|---|---|
| 2600:1900:2000:1b:400::14 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2000:1b:400::22 | ipv6.gae.googleusercontent.com | US |
| 107.178.236.10 | 10.236.178.107.gae.googleusercontent.com | US |
| 2600:1900:2001:3::7 | ipv6.gae.googleusercontent.com | US |
| 2600:1900:2001:10::2c | ipv6.gae.googleusercontent.com | US |
| 107.178.237.3 | 3.237.178.107.gae.googleusercontent.com | US |
| 18.220.23.193 | ec2-18-220-23-193.us-east-2.compute.amazonaws.com | US |
| 35.203.251.50 | 50.251.203.35.gae.googleusercontent.com | US |
| 2600:1900:2000:9::9 | ipv6.gae.googleusercontent.com | US |
| 107.178.236.26 | 26.236.178.107.gae.googleusercontent.com | US |
| IP address (59) | Hostname | Country |
|---|---|---|
| 107.178.236.2 | ? | US |
| 2600:1900:2001:10::21 | ipv6.gae.googleusercontent.com | US |
| 35.203.245.186 | ? | US |
| 35.203.252.119 | ? | US |
| 35.203.252.115 | ? | US |
| 107.178.237.3 | ? | US |
| 35.203.245.112 | ? | US |
| 2600:1900:2001:3::28 | ipv6.gae.googleusercontent.com | US |
| 35.203.245.188 | ? | US |
| 35.203.252.151 | ? | US |
| IP address (19) | Hostname | Country |
|---|---|---|
| 107.178.194.58 | ? | US |
| 107.178.194.86 | ? | US |
| 107.178.194.56 | ? | US |
| 107.178.195.202 | ? | US |
| 107.178.194.120 | ? | US |
| 107.178.194.90 | ? | US |
| 107.178.194.92 | ? | US |
| 107.178.195.215 | ? | US |
| 107.178.194.187 | ? | US |
| 107.178.194.23 | ? | US |
| IP address (1) | Hostname | Country |
|---|---|---|
| 107.178.194.11 | ? | US |
| IP address (2) | Hostname | Country |
|---|---|---|
| 107.178.195.186 | ? | US |
| 107.178.194.118 | ? | US |
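An unknown spider or crawler may be good or bad for your site, depending on what it actually is. Site owners therefore need to analyze the behavior of these unidentified crawlers further before making a final decision. In our experience, however, spiders and crawlers that declare neither a name nor a purpose usually have something to hide, and their activity should be controlled, for example by blocking them.
You can block KOCMOHABT bot or restrict its access by setting user-agent rules in your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether the bot actually follows these rules.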

    # robots.txt
    # The following rules will normally block this user agent
    User-agent: KOCMOHABT bot
    Disallow: /

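Before deploying the rule, it can help to confirm that a standard parser reads it the way you intend. The minimal sketch below feeds the rule above to Python's standard-library `urllib.robotparser`. The helper name `is_allowed`, the test path, and the `ExampleBrowser` agent are hypothetical. Note that this only shows how one parser interprets the file. It says nothing about whether the bot itself obeys robots.txt, which still has to be checked in the access logs.

```python
from urllib.robotparser import RobotFileParser

# The rule from the section above, as it would appear in robots.txt.
ROBOTS_TXT = """\
# robots.txt
User-agent: KOCMOHABT bot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if the stdlib parser lets `user_agent` fetch `path`.

    `is_allowed` is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # The targeted bot should be refused; other agents have no matching rule.
    print(is_allowed(ROBOTS_TXT, "KOCMOHABT bot", "/any/page"))
    print(is_allowed(ROBOTS_TXT, "ExampleBrowser", "/any/page"))
```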
You do not need to do this by hand: our WordPress plugin Spider Analyser can block unwanted spiders and crawlers for you.