archive.org_bot
archive.org_bot is a tool-type spider/crawler developed and operated by the Internet Archive. Read on for its basic information, user agents, and access control.
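Basic information
The basic information for archive.org_bot is listed below. Some less standards-compliant spiders and crawlers may have incomplete information.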
- Spider/crawler name
- archive.org_bot
- Type
- Tool
- Developer
- Internet Archive
- Current status
- Active
User agents
The user-agent strings, IP addresses, server names, and countries observed for archive.org_bot are listed below:
- User-agent string
- Mozilla/5.0 (compatible; NL-Israel_IAHarvester2024/3.3.0; +http://https://archive.org/details/archive.org_bot)
- First seen
- 2024-02-09 14:47:50
- Last seen
- 2024-02-14 10:24:49
- Obeys robots.txt
- Unknown
- Source
IP addresses (2) |
Server name |
Country |
207.241.235.85 |
wbgrp-crawl047.us.archive.org |
US |
207.241.234.202 |
wbgrp-crawl044.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20140702-2247 +http://archive.org/details/archive.org_bot)
- First seen
- 2016-07-09 07:46:39
- Last seen
- 2023-10-25 16:58:29
- Obeys robots.txt
- Unknown
- Source
IP addresses (36) |
Server name |
Country |
207.241.234.164 |
iw902707.archive.org |
US |
207.241.230.235 |
iw800709.archive.org |
US |
207.241.229.70 |
iw802605.archive.org |
US |
207.241.233.181 |
iw902904.archive.org |
US |
207.241.229.74 |
iw802506.archive.org |
US |
207.241.226.61 |
iw601303.archive.org |
US |
207.241.226.104 |
iw600707.archive.org |
US |
207.241.229.68 |
iw902602.archive.org |
US |
207.241.229.192 |
iw801604.archive.org |
US |
207.241.229.80 |
iw802207.archive.org |
US |
207.241.225.70 |
iw600209.archive.org |
US |
207.241.225.178 |
iw600808.archive.org |
US |
207.241.225.156 |
iw601403.archive.org |
US |
207.241.225.53 |
iw600409.archive.org |
US |
78.161.160.233 |
78.161.160.233.dynamic.ttnet.com.tr |
TR |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Archive-It; +http://archive-it.org/files/site-owners.html)
- First seen
- 2016-06-24 16:25:53
- Last seen
- 2022-05-25 20:12:33
- Obeys robots.txt
- Unknown
- Source
IP addresses (18) |
Server name |
Country |
207.241.232.221 |
wbgrp-crawl220.us.archive.org |
US |
207.241.232.173 |
wbgrp-crawl234.us.archive.org |
US |
207.241.231.104 |
wbgrp-svc210.us.archive.org |
US |
207.241.231.52 |
wbgrp-crawl214.us.archive.org |
US |
207.241.234.99 |
wbgrp-svc249.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.94 |
wbgrp-svc229.us.archive.org |
US |
207.241.231.105 |
wbgrp-svc209.us.archive.org |
US |
207.241.231.103 |
wbgrp-svc211.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)
- First seen
- 2010-12-30 16:35:46
- Last seen
- 2022-04-11 17:19:46
- Obeys robots.txt
- Unknown
- Source
IP addresses (97) |
Server name |
Country |
3.145.67.195 |
ec2-3-145-67-195.us-east-2.compute.amazonaws.com |
US |
113.96.250.18 |
113.96.250.18 |
CN |
207.241.229.50 |
crawl812.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.229.148 |
crawl802.us.archive.org |
US |
207.241.229.51 |
crawl811.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.233.159 |
crawl806.us.archive.org |
US |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.233.177 |
crawl853.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver; Archive-It; +http://archive-it.org/files/site-owners-special.html)
- First seen
- 2018-05-05 21:56:46
- Last seen
- 2021-11-27 08:37:49
- Obeys robots.txt
- Unknown
- Source
IP addresses (11) |
Server name |
Country |
207.241.229.109 |
wbgrp-crawl011.us.archive.org |
US |
207.241.234.246 |
wbgrp-crawl036.us.archive.org |
US |
207.241.231.194 |
wbgrp-crawl019.us.archive.org |
US |
207.241.232.216 |
wbgrp-crawl225.us.archive.org |
US |
207.241.231.196 |
wbgrp-crawl021.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.218 |
wbgrp-crawl223.us.archive.org |
US |
207.241.231.190 |
wbgrp-crawl015.us.archive.org |
US |
207.241.232.96 |
wbgrp-crawl241.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.3.0 bot@archive.org +https://archive.org/details/archive.org_bot)
- First seen
- 2016-11-23 11:48:52
- Last seen
- 2019-08-11 10:52:41
- Obeys robots.txt
- Unknown
- Source
IP addresses (2) |
Server name |
Country |
207.241.231.81 |
wbgrp-svc281.us.archive.org |
US |
207.241.226.41 |
wbgrp-crawl005.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.1.1 http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-11 14:52:31
- Last seen
- 2019-06-03 23:18:59
- Obeys robots.txt
- No
- Source
IP addresses (16) |
Server name |
Country |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.225 |
crawl891.us.archive.org |
US |
207.241.234.64 |
crawl505.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.231.150 |
crawl421.us.archive.org |
US |
207.241.234.62 |
crawl503.us.archive.org |
US |
207.241.234.61 |
crawl502.us.archive.org |
US |
207.241.234.63 |
crawl504.us.archive.org |
US |
207.241.231.132 |
crawl500.us.archive.org |
US |
207.241.231.149 |
crawl422.us.archive.org |
US |
207.241.231.163 |
crawl345.us.archive.org |
US |
207.241.231.164 |
crawl344.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.229.30 |
crawl838.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
207.241.231.43 |
crawl855.us.archive.org |
US |
207.241.235.183 |
crawl861.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.1.1 http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-11 14:52:31
- Last seen
- 2019-06-03 23:18:59
- Obeys robots.txt
- Unknown
- Source
IP addresses (16) |
Server name |
Country |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.229.30 |
crawl838.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-20 13:04:06
- Last seen
- 2019-05-22 14:14:23
- Obeys robots.txt
- Unknown
- Source
IP addresses (19) |
Server name |
Country |
207.241.231.170 |
crawl825.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.171 |
crawl824.us.archive.org |
US |
207.241.232.43 |
crawl849.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot)
- First seen
- 2016-06-14 23:11:42
- Last seen
- 2019-02-24 01:39:52
- Obeys robots.txt
- No
- Source
IP addresses (7) |
Server name |
Country |
179.43.155.171 |
179.43.155.171 |
CH |
5.183.92.86 |
5.183.92.86 |
DE |
51.158.111.157 |
157-111-158-51.rev.cloud.scaleway.com |
FR |
207.241.226.230 |
wwwb-app14.us.archive.org |
US |
207.241.225.227 |
wwwb-app1.us.archive.org |
US |
207.241.232.121 |
wwwb-app52.us.archive.org |
US |
207.241.225.236 |
wwwb-app6.us.archive.org |
US |
207.241.227.105 |
wwwb-app54.us.archive.org |
US |
207.241.225.246 |
wwwb-app4.us.archive.org |
US |
207.241.226.219 |
wwwb-app15.us.archive.org |
US |
207.241.225.226 |
wwwb-app0.us.archive.org |
US |
207.241.225.235 |
wwwb-app8.us.archive.org |
US |
79.110.49.145 |
79.110.49.145 |
US |
109.205.213.134 |
109.205.213.134 |
AZ |
3.110.51.173 |
ec2-3-110-51-173.ap-south-1.compute.amazonaws.com |
IN |
137.184.12.53 |
137.184.12.53 |
US |
13.52.237.32 |
ec2-13-52-237-32.us-west-1.compute.amazonaws.com |
US |
3.0.56.33 |
ec2-3-0-56-33.ap-southeast-1.compute.amazonaws.com |
SG |
5.8.11.202 |
5.8.11.202 |
RU |
13.37.213.184 |
ec2-13-37-213-184.eu-west-3.compute.amazonaws.com |
FR |
105.110.165.177 |
105.110.165.177 |
DZ |
3.89.39.91 |
ec2-3-89-39-91.compute-1.amazonaws.com |
US |
81.161.238.40 |
81.161.238.40 |
NL |
178.215.236.240 |
178.215.236.240 |
FR |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot)
- First seen
- 2016-06-14 23:11:42
- Last seen
- 2019-02-24 01:39:52
- Obeys robots.txt
- Unknown
- Source
IP addresses (7) |
Server name |
Country |
207.241.225.246 |
wwwb-app4.us.archive.org |
US |
207.241.225.236 |
wwwb-app6.us.archive.org |
US |
207.241.225.226 |
wwwb-app0.us.archive.org |
US |
207.241.232.121 |
wwwb-app52.us.archive.org |
US |
207.241.227.105 |
wwwb-app54.us.archive.org |
US |
207.241.226.230 |
wwwb-app14.us.archive.org |
US |
207.241.225.235 |
wwwb-app8.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/3.3.0 +http://www.archive.org/details/archive.org_bot)
- First seen
- 2015-09-23 21:15:00
- Last seen
- 2015-09-24 08:11:30
- Obeys robots.txt
- Unknown
- Source
IP addresses (1) |
Server name |
Country |
207.241.226.37 |
wbgrp-crawl009.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20120116.200628 +http://www.archive.org/details/archive.org_bot)
- First seen
- 2012-03-25 19:13:00
- Last seen
- 2012-09-13 22:02:30
- Obeys robots.txt
- Unknown
- Source
IP addresses (1) |
Server name |
Country |
207.241.237.214 |
crawl435.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 +http://www.archive.org)
- First seen
- 2010-06-11 23:06:12
- Last seen
- 2010-07-06 08:30:52
- Obeys robots.txt
- No
- Source
IP addresses (1) |
Server name |
Country |
207.241.229.224 |
crawl892.us.archive.org |
US |
207.241.231.188 |
crawl895.us.archive.org |
US |
207.241.233.139 |
crawl865.us.archive.org |
US |
207.241.233.159 |
crawl806.us.archive.org |
US |
207.241.229.33 |
crawl835.us.archive.org |
US |
207.241.232.38 |
crawl109.us.archive.org |
US |
207.241.234.182 |
crawl804.us.archive.org |
US |
207.241.231.37 |
crawl897.us.archive.org |
US |
207.241.231.163 |
crawl345.us.archive.org |
US |
207.241.231.152 |
crawl409.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.232.40 |
crawl107.us.archive.org |
US |
207.241.233.116 |
crawl825.us.archive.org |
US |
207.241.225.181 |
crawl858.us.archive.org |
US |
207.241.235.168 |
crawl902.us.archive.org |
US |
185.237.252.77 |
m18077.contaboserver.net |
DE |
3.145.67.195 |
ec2-3-145-67-195.us-east-2.compute.amazonaws.com |
US |
113.96.250.18 |
113.96.250.18 |
CN |
207.241.229.50 |
crawl812.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.229.148 |
crawl802.us.archive.org |
US |
207.241.229.51 |
crawl811.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.232.221 |
wbgrp-crawl220.us.archive.org |
US |
207.241.232.173 |
wbgrp-crawl234.us.archive.org |
US |
207.241.231.104 |
wbgrp-svc210.us.archive.org |
US |
207.241.231.52 |
wbgrp-crawl214.us.archive.org |
US |
207.241.234.99 |
wbgrp-svc249.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.94 |
wbgrp-svc229.us.archive.org |
US |
207.241.231.105 |
wbgrp-svc209.us.archive.org |
US |
207.241.231.103 |
wbgrp-svc211.us.archive.org |
US |
207.241.231.170 |
crawl825.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.171 |
crawl824.us.archive.org |
US |
207.241.232.43 |
crawl849.us.archive.org |
US |
207.241.228.179 |
ia360937.us.archive.org |
US |
95.216.55.129 |
static.129.55.216.95.clients.your-server.de |
FI |
207.241.233.247 |
crawl800.us.archive.org |
US |
207.241.234.15 |
crawl807.us.archive.org |
US |
207.241.235.229 |
crawl905.us.archive.org |
US |
207.241.235.230 |
crawl906.us.archive.org |
US |
207.241.236.59 |
crawl908.us.archive.org |
US |
207.241.236.85 |
crawl910.us.archive.org |
US |
95.217.88.52 |
95-217-88-52.yaip.io |
FI |
3.218.67.10 |
ec2-3-218-67-10.compute-1.amazonaws.com |
US |
149.100.158.27 |
149.100.158.27 |
US |
78.161.160.233 |
78.161.160.233.dynamic.ttnet.com.tr |
TR |
207.241.235.164 |
crawl901.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
207.241.236.58 |
crawl907.us.archive.org |
US |
103.56.17.252 |
103.56.17.252 |
CN |
44.203.103.78 |
ec2-44-203-103-78.compute-1.amazonaws.com |
US |
54.183.113.139 |
ec2-54-183-113-139.us-west-1.compute.amazonaws.com |
US |
100.27.12.252 |
ec2-100-27-12-252.compute-1.amazonaws.com |
US |
207.241.236.82 |
crawl113.us.archive.org |
US |
207.241.234.235 |
crawl805.us.archive.org |
US |
207.241.234.96 |
wbgrp-svc246.us.archive.org |
US |
207.241.232.217 |
wbgrp-crawl224.us.archive.org |
US |
207.241.236.193 |
crawl917.us.archive.org |
US |
207.241.225.114 |
crawl919.us.archive.org |
US |
207.241.236.83 |
crawl346.us.archive.org |
US |
94.156.68.162 |
94.156.68.162 |
NL |
207.241.235.133 |
crawl900.us.archive.org |
US |
207.241.225.134 |
crawl917.us.archive.org |
US |
207.241.236.213 |
crawl809.us.archive.org |
US |
2a01:4f9:3071:2b63::2 |
2a01:4f9:3071:2b63::2 |
FI |
207.241.237.22 |
crawl918.us.archive.org |
US |
81.161.238.40 |
81.161.238.40 |
NL |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 +http://www.archive.org)
- First seen
- 2010-06-11 23:06:12
- Last seen
- 2010-07-06 08:30:52
- Obeys robots.txt
- Unknown
- Source
IP addresses (1) |
Server name |
Country |
207.241.228.179 |
ia360937.us.archive.org |
US |
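The user-agent strings above share a handful of recognizable product tokens (archive.org_bot, special_archiver, heritrix, IAHarvester, Archive-It, Wayback Machine Live Record). As a minimal sketch, assuming you only have the raw User-Agent header from an access log, the Python below sorts a request into one of these variants and extracts a version number when one follows a known token. The labels (`wayback-live-record`, `ia-harvester`, and so on), the token order, and the `classify_ua` name are hypothetical choices for illustration, not an official Internet Archive classification. A match only tells you what the client claims to be: not every server name in the tables above is under archive.org.

```python
import re
from typing import Optional, Tuple

# Hypothetical labels for the product tokens seen in the user-agent strings above.
# Order matters: a string such as "archive.org_bot; Archive-It" contains several
# tokens, so the more specific ones are checked first.
_VARIANTS = [
    ("wayback machine live record", "wayback-live-record"),
    ("special_archiver", "special_archiver"),
    ("iaharvester", "ia-harvester"),
    ("archive-it", "archive-it"),
    ("heritrix", "heritrix"),
    ("archive.org_bot", "archive.org_bot"),
]

# A version directly after a known token, e.g. "heritrix/3.3.0-SNAPSHOT..."
# or "archive.org_bot/heritrix-1.15.4" (the latter yields 1.15.4).
_VERSION = re.compile(
    r"(?:heritrix|special_archiver|archive\.org_bot|IAHarvester\d*)[/-](\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def classify_ua(user_agent: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (variant, version) for an Internet Archive-style UA string, else None."""
    lowered = user_agent.lower()
    for token, label in _VARIANTS:
        if token in lowered:
            match = _VERSION.search(user_agent)
            return (label, match.group(1) if match else None)
    return None  # no known token: not one of the variants listed above
```

For example, the first string in the list above, `Mozilla/5.0 (compatible; NL-Israel_IAHarvester2024/3.3.0; ...)`, is classified as `("ia-harvester", "3.3.0")`.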
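Access control
Learn how to control archive.org_bot's access and keep it from crawling your site inappropriately.
Should you block archive.org_bot?
Probably not. Tool-type crawlers usually appear only when a site owner uses such a tool to request a service for the site. That said, the webmaster should assess the actual situation before deciding.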
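Before deciding, it can help to confirm that a request claiming to be archive.org_bot really comes from the Internet Archive. The sketch below uses forward-confirmed reverse DNS: look up the IP's hostname, require it to end in `.archive.org`, then check that the hostname resolves back to the same IP. The suffix rule is an assumption drawn from the tables above (genuine-looking hosts such as crawl812.us.archive.org or wbgrp-crawl047.us.archive.org), not from a published Internet Archive policy. The function name `is_archive_org_ip` and its injectable `reverse`/`forward` resolver parameters are hypothetical, added so the check can be exercised without network access.

```python
import socket
from typing import Callable, List


def _reverse_lookup(ip: str) -> str:
    # PTR lookup; raises socket.herror (a subclass of OSError) if there is no record.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    # Every address the hostname resolves to; raises socket.gaierror on failure.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_archive_org_ip(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
    suffix: str = ".archive.org",  # assumption based on the server names listed above
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be archive.org_bot."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record at all
    if not host.endswith(suffix):
        return False  # e.g. an amazonaws.com or ISP hostname
    try:
        # Whoever controls an IP can set its PTR record, so the hostname must
        # also resolve back to the same address.
        return ip in forward(host)
    except OSError:
        return False
```

Called with its defaults, e.g. `is_archive_org_ip("207.241.229.50")`, it performs live DNS lookups, so the result depends on the Internet Archive's DNS records at the time you run it. IPv6 addresses may be formatted differently by the two lookups; normalize them (for example with the `ipaddress` module) if you rely on this for IPv6.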
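Blocking via robots.txt
You can block archive.org_bot or restrict its access by adding user-agent rules to your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether the crawler actually follows these rules.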
# robots.txt
# The following rules will generally block this user agent
User-agent: archive.org_bot
Disallow: /
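Besides a plugin, one way to check whether the crawler respects these rules is to replay the paths it requested (for example, taken from your access log) against your robots.txt using Python's standard `urllib.robotparser`. This is a sketch: `find_violations` is a hypothetical helper, the sample paths are made up, and the verdict reflects how Python's parser reads the file, which may differ from how the Internet Archive's crawlers read it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the section above.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""


def find_violations(robots_txt: str, user_agent: str, requested_paths: list[str]) -> list[str]:
    """Return the requested paths that robots_txt forbids for user_agent.

    Hypothetical helper. Pass the crawler's product token (e.g. "archive.org_bot"):
    RobotFileParser only compares rules against the part before the first "/",
    so a full "Mozilla/5.0 (...)" string would never match.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


# Made-up sample of paths an archive.org_bot request fetched.
requested = ["/", "/about.html", "/blog/2024/post"]
print(find_violations(ROBOTS_TXT, "archive.org_bot", requested))
# → ['/', '/about.html', '/blog/2024/post']
```

Any path in the result is one the crawler fetched despite the rule. Note that Python's parser treats the rule as applying only to agents whose token contains `archive.org_bot`, so a request that identifies only as `special_archiver` or `heritrix` is not covered by it here; whether those crawlers honor an archive.org_bot rule is not stated in this page.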
You don't have to do this manually: our WordPress plugin Spider Analyser can block unwanted spiders and crawlers for you.
More information
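The Internet Archive is a non-profit digital library that preserves web data and makes it available for research through the Wayback Machine. We have been archiving the web since 1996 and have saved more than 150 billion web files so far.
The Internet Archive works with universities, libraries, and other institutions to preserve the world's cultural heritage. Beyond our web preservation work, we provide free access to more than 2 million digitized books, more than 600,000 audio items, and more than 300,000 video items. Visit our news and announcements forum to read more about our projects.
Webmasters: the archive.org_bot user agent is used for archive.org's broad crawls of the web. archive.org tries to crawl slowly enough not to interfere with normal web activity. You can learn more in the Wayback Machine FAQ. If you notice problems with archive.org_bot's behavior, contact archive.org at: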
bot@archive.org
archive.org believes that preserving web data for future generations is a priority. Unlike print media, web pages can and do vanish into thin air.