archive.org_bot
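The archive.org_bot spider/crawler is classified as a tool and is developed and operated by the Internet Archive. Read on for its basic information, user agents, and access control.
Basic information
The basic information for archive.org_bot is listed below. For spiders and crawlers that do not follow conventions closely, some details may be unknown.
- Spider/crawler name
- archive.org_bot
- Type
- Tool
- Developer
- Internet Archive
- Current status
- Active
User agents
The user-agent strings, IP addresses, server names, and countries recorded for the archive.org_bot spider or crawler are listed in the tables below: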
- User-agent string
- Mozilla/5.0 (compatible; NL-Israel_IAHarvester2024/3.3.0; +http://https://archive.org/details/archive.org_bot)
- First seen
- 2024-02-09 14:47:50
- Last seen
- 2024-02-14 10:24:49
- Obeys robots.txt
- Unknown
- Source
-
IP address (2) |
Server name |
Country |
207.241.235.85 |
wbgrp-crawl047.us.archive.org |
US |
207.241.234.202 |
wbgrp-crawl044.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20140702-2247 +http://archive.org/details/archive.org_bot)
- First seen
- 2016-07-09 07:46:39
- Last seen
- 2023-10-25 16:58:29
- Obeys robots.txt
- Unknown
- Source
-
IP address (36) |
Server name |
Country |
207.241.234.164 |
iw902707.archive.org |
US |
207.241.230.235 |
iw800709.archive.org |
US |
207.241.229.70 |
iw802605.archive.org |
US |
207.241.233.181 |
iw902904.archive.org |
US |
207.241.229.74 |
iw802506.archive.org |
US |
207.241.226.61 |
iw601303.archive.org |
US |
207.241.226.104 |
iw600707.archive.org |
US |
207.241.229.68 |
iw902602.archive.org |
US |
207.241.229.192 |
iw801604.archive.org |
US |
207.241.229.80 |
iw802207.archive.org |
US |
207.241.225.70 |
iw600209.archive.org |
US |
207.241.225.178 |
iw600808.archive.org |
US |
207.241.225.156 |
iw601403.archive.org |
US |
207.241.225.53 |
iw600409.archive.org |
US |
78.161.160.233 |
78.161.160.233.dynamic.ttnet.com.tr |
TR |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Archive-It; +http://archive-it.org/files/site-owners.html)
- First seen
- 2016-06-24 16:25:53
- Last seen
- 2022-05-25 20:12:33
- Obeys robots.txt
- Unknown
- Source
-
IP address (18) |
Server name |
Country |
207.241.232.221 |
wbgrp-crawl220.us.archive.org |
US |
207.241.232.173 |
wbgrp-crawl234.us.archive.org |
US |
207.241.231.104 |
wbgrp-svc210.us.archive.org |
US |
207.241.231.52 |
wbgrp-crawl214.us.archive.org |
US |
207.241.234.99 |
wbgrp-svc249.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.94 |
wbgrp-svc229.us.archive.org |
US |
207.241.231.105 |
wbgrp-svc209.us.archive.org |
US |
207.241.231.103 |
wbgrp-svc211.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)
- First seen
- 2010-12-30 16:35:46
- Last seen
- 2022-04-11 17:19:46
- Obeys robots.txt
- Unknown
- Source
-
IP address (97) |
Server name |
Country |
3.145.67.195 |
ec2-3-145-67-195.us-east-2.compute.amazonaws.com |
US |
113.96.250.18 |
113.96.250.18 |
CN |
207.241.229.50 |
crawl812.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.229.148 |
crawl802.us.archive.org |
US |
207.241.229.51 |
crawl811.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.233.159 |
crawl806.us.archive.org |
US |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.233.177 |
crawl853.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver; Archive-It; +http://archive-it.org/files/site-owners-special.html)
- First seen
- 2018-05-05 21:56:46
- Last seen
- 2021-11-27 08:37:49
- Obeys robots.txt
- Unknown
- Source
-
IP address (11) |
Server name |
Country |
207.241.229.109 |
wbgrp-crawl011.us.archive.org |
US |
207.241.234.246 |
wbgrp-crawl036.us.archive.org |
US |
207.241.231.194 |
wbgrp-crawl019.us.archive.org |
US |
207.241.232.216 |
wbgrp-crawl225.us.archive.org |
US |
207.241.231.196 |
wbgrp-crawl021.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.218 |
wbgrp-crawl223.us.archive.org |
US |
207.241.231.190 |
wbgrp-crawl015.us.archive.org |
US |
207.241.232.96 |
wbgrp-crawl241.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.3.0 bot@archive.org +https://archive.org/details/archive.org_bot)
- First seen
- 2016-11-23 11:48:52
- Last seen
- 2019-08-11 10:52:41
- Obeys robots.txt
- Unknown
- Source
-
IP address (2) |
Server name |
Country |
207.241.231.81 |
wbgrp-svc281.us.archive.org |
US |
207.241.226.41 |
wbgrp-crawl005.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.1.1 http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-11 14:52:31
- Last seen
- 2019-06-03 23:18:59
- Obeys robots.txt
- No
- Source
-
IP address (16) |
Server name |
Country |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.225 |
crawl891.us.archive.org |
US |
207.241.234.64 |
crawl505.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.231.150 |
crawl421.us.archive.org |
US |
207.241.234.62 |
crawl503.us.archive.org |
US |
207.241.234.61 |
crawl502.us.archive.org |
US |
207.241.234.63 |
crawl504.us.archive.org |
US |
207.241.231.132 |
crawl500.us.archive.org |
US |
207.241.231.149 |
crawl422.us.archive.org |
US |
207.241.231.163 |
crawl345.us.archive.org |
US |
207.241.231.164 |
crawl344.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.229.30 |
crawl838.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
207.241.231.43 |
crawl855.us.archive.org |
US |
207.241.235.183 |
crawl861.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; special_archiver/3.1.1 http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-11 14:52:31
- Last seen
- 2019-06-03 23:18:59
- Obeys robots.txt
- Unknown
- Source
-
IP address (16) |
Server name |
Country |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.229.30 |
crawl838.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot http://www.archive.org/details/archive.org_bot)
- First seen
- 2018-04-20 13:04:06
- Last seen
- 2019-05-22 14:14:23
- Obeys robots.txt
- Unknown
- Source
-
IP address (19) |
Server name |
Country |
207.241.231.170 |
crawl825.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.171 |
crawl824.us.archive.org |
US |
207.241.232.43 |
crawl849.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot)
- First seen
- 2016-06-14 23:11:42
- Last seen
- 2019-02-24 01:39:52
- Obeys robots.txt
- No
- Source
-
IP address (7) |
Server name |
Country |
179.43.155.171 |
179.43.155.171 |
CH |
5.183.92.86 |
5.183.92.86 |
DE |
51.158.111.157 |
157-111-158-51.rev.cloud.scaleway.com |
FR |
207.241.226.230 |
wwwb-app14.us.archive.org |
US |
207.241.225.227 |
wwwb-app1.us.archive.org |
US |
207.241.232.121 |
wwwb-app52.us.archive.org |
US |
207.241.225.236 |
wwwb-app6.us.archive.org |
US |
207.241.227.105 |
wwwb-app54.us.archive.org |
US |
207.241.225.246 |
wwwb-app4.us.archive.org |
US |
207.241.226.219 |
wwwb-app15.us.archive.org |
US |
207.241.225.226 |
wwwb-app0.us.archive.org |
US |
207.241.225.235 |
wwwb-app8.us.archive.org |
US |
79.110.49.145 |
79.110.49.145 |
US |
109.205.213.134 |
109.205.213.134 |
AZ |
3.110.51.173 |
ec2-3-110-51-173.ap-south-1.compute.amazonaws.com |
IN |
137.184.12.53 |
137.184.12.53 |
US |
13.52.237.32 |
ec2-13-52-237-32.us-west-1.compute.amazonaws.com |
US |
3.0.56.33 |
ec2-3-0-56-33.ap-southeast-1.compute.amazonaws.com |
SG |
5.8.11.202 |
5.8.11.202 |
RU |
13.37.213.184 |
ec2-13-37-213-184.eu-west-3.compute.amazonaws.com |
FR |
105.110.165.177 |
105.110.165.177 |
DZ |
3.89.39.91 |
ec2-3-89-39-91.compute-1.amazonaws.com |
US |
81.161.238.40 |
81.161.238.40 |
NL |
178.215.236.240 |
178.215.236.240 |
FR |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot)
- First seen
- 2016-06-14 23:11:42
- Last seen
- 2019-02-24 01:39:52
- Obeys robots.txt
- Unknown
- Source
-
IP address (7) |
Server name |
Country |
207.241.225.246 |
wwwb-app4.us.archive.org |
US |
207.241.225.236 |
wwwb-app6.us.archive.org |
US |
207.241.225.226 |
wwwb-app0.us.archive.org |
US |
207.241.232.121 |
wwwb-app52.us.archive.org |
US |
207.241.227.105 |
wwwb-app54.us.archive.org |
US |
207.241.226.230 |
wwwb-app14.us.archive.org |
US |
207.241.225.235 |
wwwb-app8.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/3.3.0 +http://www.archive.org/details/archive.org_bot)
- First seen
- 2015-09-23 21:15:00
- Last seen
- 2015-09-24 08:11:30
- Obeys robots.txt
- Unknown
- Source
-
IP address (1) |
Server name |
Country |
207.241.226.37 |
wbgrp-crawl009.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; heritrix/3.1.1-SNAPSHOT-20120116.200628 +http://www.archive.org/details/archive.org_bot)
- First seen
- 2012-03-25 19:13:00
- Last seen
- 2012-09-13 22:02:30
- Obeys robots.txt
- Unknown
- Source
-
IP address (1) |
Server name |
Country |
207.241.237.214 |
crawl435.us.archive.org |
US |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 +http://www.archive.org)
- First seen
- 2010-06-11 23:06:12
- Last seen
- 2010-07-06 08:30:52
- Obeys robots.txt
- No
- Source
-
IP address (1) |
Server name |
Country |
207.241.229.224 |
crawl892.us.archive.org |
US |
207.241.231.188 |
crawl895.us.archive.org |
US |
207.241.233.139 |
crawl865.us.archive.org |
US |
207.241.233.159 |
crawl806.us.archive.org |
US |
207.241.229.33 |
crawl835.us.archive.org |
US |
207.241.232.38 |
crawl109.us.archive.org |
US |
207.241.234.182 |
crawl804.us.archive.org |
US |
207.241.231.37 |
crawl897.us.archive.org |
US |
207.241.231.163 |
crawl345.us.archive.org |
US |
207.241.231.152 |
crawl409.us.archive.org |
US |
207.241.229.149 |
crawl801.us.archive.org |
US |
207.241.231.165 |
crawl339.us.archive.org |
US |
207.241.232.40 |
crawl107.us.archive.org |
US |
207.241.233.116 |
crawl825.us.archive.org |
US |
207.241.225.181 |
crawl858.us.archive.org |
US |
207.241.235.168 |
crawl902.us.archive.org |
US |
185.237.252.77 |
m18077.contaboserver.net |
DE |
3.145.67.195 |
ec2-3-145-67-195.us-east-2.compute.amazonaws.com |
US |
113.96.250.18 |
113.96.250.18 |
CN |
207.241.229.50 |
crawl812.us.archive.org |
US |
207.241.231.143 |
crawl428.us.archive.org |
US |
207.241.229.148 |
crawl802.us.archive.org |
US |
207.241.229.51 |
crawl811.us.archive.org |
US |
207.241.231.151 |
crawl420.us.archive.org |
US |
207.241.231.147 |
crawl424.us.archive.org |
US |
207.241.232.221 |
wbgrp-crawl220.us.archive.org |
US |
207.241.232.173 |
wbgrp-crawl234.us.archive.org |
US |
207.241.231.104 |
wbgrp-svc210.us.archive.org |
US |
207.241.231.52 |
wbgrp-crawl214.us.archive.org |
US |
207.241.234.99 |
wbgrp-svc249.us.archive.org |
US |
207.241.231.193 |
wbgrp-crawl018.us.archive.org |
US |
207.241.232.175 |
wbgrp-crawl232.us.archive.org |
US |
207.241.231.94 |
wbgrp-svc229.us.archive.org |
US |
207.241.231.105 |
wbgrp-svc209.us.archive.org |
US |
207.241.231.103 |
wbgrp-svc211.us.archive.org |
US |
207.241.231.170 |
crawl825.us.archive.org |
US |
207.241.231.144 |
crawl427.us.archive.org |
US |
207.241.231.148 |
crawl423.us.archive.org |
US |
207.241.229.150 |
crawl809.us.archive.org |
US |
207.241.233.160 |
crawl805.us.archive.org |
US |
207.241.229.48 |
crawl814.us.archive.org |
US |
207.241.229.214 |
crawl805.us.archive.org |
US |
207.241.231.171 |
crawl824.us.archive.org |
US |
207.241.232.43 |
crawl849.us.archive.org |
US |
207.241.228.179 |
ia360937.us.archive.org |
US |
95.216.55.129 |
static.129.55.216.95.clients.your-server.de |
FI |
207.241.233.247 |
crawl800.us.archive.org |
US |
207.241.234.15 |
crawl807.us.archive.org |
US |
207.241.235.229 |
crawl905.us.archive.org |
US |
207.241.235.230 |
crawl906.us.archive.org |
US |
207.241.236.59 |
crawl908.us.archive.org |
US |
207.241.236.85 |
crawl910.us.archive.org |
US |
95.217.88.52 |
95-217-88-52.yaip.io |
FI |
3.218.67.10 |
ec2-3-218-67-10.compute-1.amazonaws.com |
US |
149.100.158.27 |
149.100.158.27 |
US |
78.161.160.233 |
78.161.160.233.dynamic.ttnet.com.tr |
TR |
207.241.235.164 |
crawl901.us.archive.org |
US |
207.241.229.32 |
crawl836.us.archive.org |
US |
207.241.236.58 |
crawl907.us.archive.org |
US |
103.56.17.252 |
103.56.17.252 |
CN |
44.203.103.78 |
ec2-44-203-103-78.compute-1.amazonaws.com |
US |
54.183.113.139 |
ec2-54-183-113-139.us-west-1.compute.amazonaws.com |
US |
100.27.12.252 |
ec2-100-27-12-252.compute-1.amazonaws.com |
US |
207.241.236.82 |
crawl113.us.archive.org |
US |
207.241.234.235 |
crawl805.us.archive.org |
US |
207.241.234.96 |
wbgrp-svc246.us.archive.org |
US |
207.241.232.217 |
wbgrp-crawl224.us.archive.org |
US |
207.241.236.193 |
crawl917.us.archive.org |
US |
207.241.225.114 |
crawl919.us.archive.org |
US |
207.241.236.83 |
crawl346.us.archive.org |
US |
94.156.68.162 |
94.156.68.162 |
NL |
207.241.235.133 |
crawl900.us.archive.org |
US |
207.241.225.134 |
crawl917.us.archive.org |
US |
207.241.236.213 |
crawl809.us.archive.org |
US |
2a01:4f9:3071:2b63::2 |
2a01:4f9:3071:2b63::2 |
FI |
207.241.237.22 |
crawl918.us.archive.org |
US |
81.161.238.40 |
81.161.238.40 |
NL |
- User-agent string
- Mozilla/5.0 (compatible; archive.org_bot/heritrix-1.15.4 +http://www.archive.org)
- First seen
- 2010-06-11 23:06:12
- Last seen
- 2010-07-06 08:30:52
- Obeys robots.txt
- Unknown
- Source
-
IP address (1) |
Server name |
Country |
207.241.228.179 |
ia360937.us.archive.org |
US |
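The tables above mix server names under archive.org with unrelated hosts (cloud providers, residential ISPs) that sent the same user-agent strings. A minimal Python sketch of how a log entry might be triaged follows. It assumes, based only on the hostnames listed above, that a reverse-DNS name ending in `.archive.org` suggests Internet Archive infrastructure; the function names and the token list are illustrative, and a production check would also need to confirm the hostname resolves back to the same IP.

```python
# Product tokens taken from the user-agent strings listed in the tables above.
UA_TOKENS = ("archive.org_bot", "special_archiver", "heritrix", "IAHarvester")


def claims_internet_archive(user_agent: str) -> bool:
    """Return True if the user-agent string contains one of the known tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in UA_TOKENS)


def is_archive_org_host(hostname: str) -> bool:
    """Return True if the hostname is archive.org or one of its subdomains.

    Assumption: this suffix test is a heuristic inferred from the tables,
    not an official verification method.
    """
    host = hostname.rstrip(".").lower()
    return host == "archive.org" or host.endswith(".archive.org")


def classify(user_agent: str, hostname: str) -> str:
    """Label a request as 'other', 'likely genuine', or 'unverified'."""
    if not claims_internet_archive(user_agent):
        return "other"
    return "likely genuine" if is_archive_org_host(hostname) else "unverified"


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (compatible; archive.org_bot "
          "+http://www.archive.org/details/archive.org_bot)")
    print(classify(ua, "crawl812.us.archive.org"))
    print(classify(ua, "m18077.contaboserver.net"))
```

Requests labelled "unverified" carry an Internet Archive user agent but come from a host outside archive.org, which is the pattern seen in several rows above.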
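Access control
Learn how to control archive.org_bot's access permissions and keep it from crawling inappropriately.
Should you block archive.org_bot?
Probably not. Tool-type crawlers usually appear only when a site owner uses such a tool to make a service request against the site. That said, webmasters should decide based on their own situation.
Blocking via robots.txt
You can block archive.org_bot, or limit its access, by setting user-agent rules in your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether it actually follows these rules.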
# robots.txt
# The following rules will generally block this agent
User-agent: archive.org_bot
Disallow: /
You do not need to do this by hand; our WordPress plugin Spider Analyser can block unwanted spiders or crawlers.
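Before deploying a robots.txt rule, it can help to confirm that it says what you intend. The sketch below uses Python's standard `urllib.robotparser` to parse the rules from a string and ask whether a given agent may fetch a path. The helper name `is_blocked` and the sample path are illustrative. Note that this only checks what the rules state; whether a crawler actually obeys them can only be seen in your access logs.

```python
from urllib.robotparser import RobotFileParser

# The same rules as shown above, kept inline so no network access is needed.
ROBOTS_TXT = """\
# robots.txt
User-agent: archive.org_bot
Disallow: /
"""


def is_blocked(robots_txt: str, agent_token: str, path: str) -> bool:
    """Return True if the rules in robots_txt disallow agent_token from path.

    Pass the bare product token (for example "archive.org_bot"), not the
    full "Mozilla/5.0 (...)" header: the parser only looks at the text
    before the first "/" of the agent string when matching.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent_token, path)


if __name__ == "__main__":
    print(is_blocked(ROBOTS_TXT, "archive.org_bot", "/any/page.html"))
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", "/any/page.html"))
```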
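More information
The Internet Archive is a non-profit digital library that preserves web data and makes it available for research through the Wayback Machine. We have been archiving the web since 1996 and have so far preserved more than 150 billion archived web files.
The Internet Archive works with universities, libraries, and other institutions to protect the world's cultural heritage. Beyond our web preservation work, we provide free access to more than 2 million digital books, more than 600,000 audio items, and more than 300,000 video items. Visit our news and announcements forum to read more about our projects.
Webmasters: the user agent archive.org_bot is used for archive.org's broad crawls of the web. archive.org tries to crawl slowly enough not to interfere with normal web activity. You can learn more in the Wayback Machine FAQ. If you find that archive.org_bot is misbehaving, contact archive.org at bot@archive.org.
archive.org believes that preserving web data for future generations is an urgent priority. Unlike print media, web pages can and do vanish into thin air.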