
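How to analyze search engine spider logs
Search engine spider log files are a powerful resource that webmasters underuse. Analyzing them shows how each search engine crawls the site's content and how its spiders behave over a period of time.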
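
As a rough illustration of this kind of analysis, the sketch below counts requests per spider in a web server access log. It assumes the log uses the common "combined" format. The `SPIDER_TOKENS` list, the `count_spider_hits` helper, and the sample log lines (which use documentation-only IP addresses) are all hypothetical and are not taken from this site's data.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent substrings to track; adjust for your own site.
SPIDER_TOKENS = ("Googlebot", "bingbot", "Baiduspider", "Splash")


def count_spider_hits(lines):
    """Return a Counter mapping each spider token to its number of requests."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        for token in SPIDER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                break
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [01/Jan/2024:00:00:05 +0000] "GET /robots.txt HTTP/1.1" '
    '200 64 "-" "Splash"',
    '192.0.2.20 - - [01/Jan/2024:00:00:06 +0000] "GET /private/ HTTP/1.1" '
    '200 512 "-" "Splash"',
    '192.0.2.30 - - [01/Jan/2024:00:00:09 +0000] "GET / HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for spider, count in count_spider_hits(SAMPLE_LOG).most_common():
        print(spider, count)
```
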
IP address (1) | Server name | Country |
---|---|---|
45.146.204.152 | 45.146.204.152 | GB |
74.85.210.138 | 74.85.210.138 | US |
188.240.49.6 | 188.240.49.6 | US |
103.251.167.10 | this-is-a-TOR-EXIT-NODE.union | NL |
207.244.252.135 | m14435.contaboserver.net | US |
194.124.247.4 | 194.124.247.4 | GB |
45.66.177.139 | 45.66.177.139 | GB |
139.28.123.92 | 139.28.123.92 | GB |
45.154.193.43 | 45.154.193.43 | GB |
217.9.18.84 | 217.9.18.84 | GB |
3.86.82.0 | ec2-3-86-82-0.compute-1.amazonaws.com | US |
3.84.24.248 | ec2-3-84-24-248.compute-1.amazonaws.com | US |
172.105.36.254 | 172-105-36-254.ip.linodeusercontent.com | IN |
3.120.243.195 | ec2-3-120-243-195.eu-central-1.compute.amazonaws.com | DE |
195.201.86.130 | static.130.86.201.195.clients.your-server.de | DE |
35.205.240.69 | 69.240.205.35.bc.googleusercontent.com | BE |
34.76.161.99 | 99.161.76.34.bc.googleusercontent.com | US |
34.77.20.92 | 92.20.77.34.bc.googleusercontent.com | US |
104.199.11.99 | 99.11.199.104.bc.googleusercontent.com | BE |
192.158.28.133 | 133.28.158.192.bc.googleusercontent.com | US |
35.187.19.247 | 247.19.187.35.bc.googleusercontent.com | US |
34.77.147.121 | 121.147.77.34.bc.googleusercontent.com | US |
136.243.74.184 | static.184.74.243.136.clients.your-server.de | DE |
159.69.137.134 | static.134.137.69.159.clients.your-server.de | DE |
62.3.25.48 | 62.3.25.48 | IE |
136.243.129.165 | static.165.129.243.136.clients.your-server.de | DE |
78.46.91.252 | static.252.91.46.78.clients.your-server.de | DE |
51.158.125.26 | 26-125-158-51.instances.scw.cloud | FR |
163.172.188.13 | 13-188-172-163.instances.scw.cloud | FR |
163.172.162.101 | 101-162-172-163.instances.scw.cloud | FR |
196.19.199.41 | 196.19.199.41 | US |
51.15.226.71 | 71-226-15-51.instances.scw.cloud | FR |
51.15.201.79 | 79-201-15-51.instances.scw.cloud | FR |
51.158.97.202 | 202-97-158-51.instances.scw.cloud | FR |
3.144.9.251 | ec2-3-144-9-251.us-east-2.compute.amazonaws.com | US |
3.19.58.18 | ec2-3-19-58-18.us-east-2.compute.amazonaws.com | US |
85.94.197.203 | itvpn.adsender.us | IT |
67.220.86.160 | main-db.shadowmap.com | US |
202.120.37.109 | 202.120.37.109 | CN |
35.90.121.79 | ec2-35-90-121-79.us-west-2.compute.amazonaws.com | US |
146.70.189.181 | 146.70.189.181 | IE |
IP address (1) | Server name | Country |
---|---|---|
3.120.243.195 | ec2-3-120-243-195.eu-central-1.compute.amazonaws.com | DE |
IP address (63) | Server name | Country |
---|---|---|
195.201.86.130 | static.130.86.201.195.clients.your-server.de | DE |
35.205.240.69 | 69.240.205.35.bc.googleusercontent.com | BE |
34.76.161.99 | 99.161.76.34.bc.googleusercontent.com | US |
34.77.20.92 | 92.20.77.34.bc.googleusercontent.com | US |
104.199.11.99 | 99.11.199.104.bc.googleusercontent.com | BE |
192.158.28.133 | 133.28.158.192.bc.googleusercontent.com | US |
35.187.19.247 | 247.19.187.35.bc.googleusercontent.com | US |
34.77.147.121 | 121.147.77.34.bc.googleusercontent.com | US |
136.243.74.184 | static.184.74.243.136.clients.your-server.de | DE |
159.69.137.134 | static.134.137.69.159.clients.your-server.de | DE |
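You may want to consider blocking it. Crawlers typically download publicly available internet content, which is free to access by default. However, if you do not want your content used for unauthorized purposes, you should block them.
You can block Splash, or restrict what it may access, by setting user-agent rules in your site's robots.txt. We recommend installing the Spider Analyser plugin to check whether it actually follows these rules.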
    # robots.txt
    # The following rules will normally block this user agent
    User-agent: Splash
    Disallow: /
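
To sanity-check what rules like these express before deploying them, one option is Python's standard `urllib.robotparser` module. The sketch below is a minimal example under stated assumptions: the `is_allowed` helper and the `example.com` URLs are hypothetical, and the check only reports what the rules say. It cannot tell you whether a crawler actually obeys them, which is why log-based verification is still needed.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt snippet above.
ROBOTS_TXT = """\
User-agent: Splash
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Pass the crawler's product token (e.g. "Splash"), not a full
    browser-style user-agent string, because the parser only compares
    the part before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used for illustration.
    print(is_allowed(ROBOTS_TXT, "Splash", "https://example.com/any/page"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/any/page"))
```

With these rules, the first check reports that Splash is disallowed everywhere, while agents without a matching entry remain allowed.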
You do not need to do this manually: our WordPress plugin Spider Analyser can block unwanted spiders and crawlers for you.