Googlebot cannot access images hosted on CloudFront: reported as "Blocked by robots.txt"

2024-05-17 • Q&A

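Google Search Console reports that my images hosted on AWS CloudFront are "Blocked by robots.txt". I am looking for a way to get Google to recognize that these are valid images and to stop flagging the image URLs as "Crawl anomaly".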

Here is an overview of my setup:

My main site is www.crazyengineers.com, and its robots.txt is:

User-agent: *
Disallow:

I have configured a CloudFront CDN to serve images from data.crazyengineers.com, and the images load correctly in users' browsers. Here is a sample image URL:

https://data.crazyengineers.com/old-attachments/6/6576-slide.jpg

To try to fix this, I created a public robots.txt and placed it in my origin bucket. I can access it from a browser at

https://data.crazyengineers.com/robots.txt, and it has the following content:

User-agent: Googlebot
Disallow:
User-agent: Googlebot-image
Disallow:
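
As a quick local sanity check, the rules above can be run through Python's standard urllib.robotparser. This is only a sketch: Python's parser is not Google's parser, and the helper name is_allowed is made up for illustration. It does show whether the file, as written, would block the sample image for either user agent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content served from data.crazyengineers.com, as quoted above.
CDN_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:
User-agent: Googlebot-image
Disallow:
"""

# The sample image URL from the question.
IMAGE_URL = "https://data.crazyengineers.com/old-attachments/6/6576-slide.jpg"


def is_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: parse robots.txt text and test a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot", "Googlebot-Image"):
    print(agent, is_allowed(CDN_ROBOTS_TXT, agent, IMAGE_URL))
```

Under Python's parser both agents come back as allowed, since an empty Disallow line permits everything. That suggests the rules themselves are not what is blocking the image, so the cause is more likely in how or when the file is being fetched.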

However, Google Search Console still reports that the URL is not available to Google because it is blocked by robots.txt.
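
One way to narrow this down is to look at what the CDN returns for robots.txt when the request carries a Googlebot-style User-Agent, since a browser check does not prove a crawler gets the same response. The sketch below is a rough approximation only: the User-Agent string and the helper names build_request and fetch_status are assumptions for illustration, and a real Googlebot request also comes from Google's own IP ranges, which this cannot reproduce.

```python
import urllib.error
import urllib.request

# Assumption: a Googlebot-style User-Agent string, used purely for illustration.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Hypothetical helper: build a GET request with a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return (HTTP status, Content-Type) for the URL, including error statuses."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report them as data instead.
        return err.code, err.headers.get("Content-Type")
```

Calling fetch_status("https://data.crazyengineers.com/robots.txt") and then the same function with the image URL shows whether either request gets something other than a 200. A 5xx response for robots.txt is worth ruling out in particular, because Google documents that it treats server errors on robots.txt as if the whole site were disallowed.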

Can anyone help me figure out how to get Google to actually access the images and clear the "Crawl anomaly" error in GSC?