User-Agent error when web scraping with Python 3

This is my first time web scraping. When I use page = requests.get(URL), it works fine, but when I add a User-Agent header:

headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}

page = requests.get(URL, headers=headers)

I get this error:

    title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

What is wrong with this? Should I give up?

scutengineer3 answered: User-Agent error when web scraping with Python 3

I think the page contains invalid HTML, so BeautifulSoup cannot find your element.
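To illustrate where the error itself comes from: `find` returns `None` when nothing matches the id, and calling `.get_text()` on `None` raises this AttributeError. Here is a minimal sketch of a guard. The HTML snippets and the `get_title` helper are made up for illustration and are not taken from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippets for illustration only; not the real page markup.
html_with_title = '<span id="productTitle"> Example Product </span>'
html_without_title = '<p>No title element here</p>'


def get_title(html):
    """Return the text of the element with id="productTitle", or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id="productTitle")
    if element is None:
        # find() returns None when nothing matches, so check before calling get_text()
        return None
    return element.get_text(strip=True)


print(get_title(html_with_title))     # Example Product
print(get_title(html_without_title))  # None
```

With a guard like this the script reports a missing element instead of crashing, which makes it easier to see whether the element is really absent from the HTML you received.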

Try prettifying the HTML first:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/dp/B07JP9QJ15/ref=dp_cerb_1'
headers = {
    "User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}
page = requests.get(URL, headers=headers)

# Parse once and re-serialize with prettify(), then parse the cleaned-up markup again
pretty = BeautifulSoup(page.text, 'html.parser').prettify()
soup = BeautifulSoup(pretty, 'html.parser')
print(soup.find(id='productTitle').get_text())

Which returns:

Dell UltraSharp U2719D - LED Monitor - 27 inch
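If the prettify step does not help in your case, it may be worth checking what the server actually sent back before blaming the parser. This is a rough sketch under one assumption that is not confirmed in this thread: the server may respond with a different page when the User-Agent changes. `summarize_page` and `fetch_and_summarize` are hypothetical helper names, not part of requests or BeautifulSoup:

```python
import requests
from bs4 import BeautifulSoup


def summarize_page(status_code, html, element_id="productTitle"):
    """Return a short diagnostic string for a fetched page (hypothetical helper)."""
    if status_code != 200:
        return f"HTTP {status_code}: request did not succeed"
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # Show the page <title> so it is clear which page was actually returned
        page_title = soup.title.get_text(strip=True) if soup.title else "(no <title>)"
        return f"id={element_id!r} not found; page <title> is: {page_title}"
    return f"found: {element.get_text(strip=True)}"


def fetch_and_summarize(url, headers):
    """Fetch url with the given headers and summarize the response (needs network access)."""
    response = requests.get(url, headers=headers, timeout=10)
    return summarize_page(response.status_code, response.text)
```

Calling `print(fetch_and_summarize(URL, headers))` with the `URL` and `headers` from the code above prints either the product title or a hint about which page came back.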

Article link: https://www.f2er.com/2813011.html
