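When I try to scrape a website (Amazon in this case, but the same thing happens on many other sites), the content of the GET response looks fine when I print it: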
print(response.content)
But when I convert it to a BeautifulSoup object, the closing </body> and </html> tags jump up, so part of the content ends up after them:
<html>
<head>
...
</head>
<body>
...
</body>
</html>
... # more content that needs to go in the body
Edit: here is the code:
import requests
from bs4 import BeautifulSoup
# Browser-like User-Agent so the request is not rejected outright
myUserAgent = {
    "User-Agent":
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}
URL = 'https://www.amazon.com/AOC-I1659FWUX-USB-Powered-Portable-1920x1080/dp/B06Y8SSQG5'
response = requests.get(URL, headers=myUserAgent)
# Parse the raw bytes with Python's built-in HTML parser
soup = BeautifulSoup(response.content, 'html.parser')
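
One way to narrow this down is to check whether the tree shape depends on the parser passed to BeautifulSoup. The sketch below is only an illustration under assumptions: `SAMPLE` is made-up markup that imitates the output above (a `<div>` arriving after `</body></html>`), and `count_divs` and `compare_parsers` are hypothetical helper names, not part of bs4. It also assumes that `lxml` and `html5lib` may or may not be installed, and skips whichever is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup: a <div> arrives after </body></html>,
# imitating the structure shown in the question.
SAMPLE = (
    "<html><head><title>t</title></head>"
    "<body><div>inside</div></body></html>"
    "<div>late</div>"
)


def count_divs(markup, parser):
    """Return (divs inside <body>, divs in the whole document) for one parser."""
    soup = BeautifulSoup(markup, parser)
    body = soup.body
    in_body = len(body.find_all("div")) if body is not None else 0
    return in_body, len(soup.find_all("div"))


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Run count_divs for each parser; None marks a parser that is not installed."""
    results = {}
    for name in parsers:
        try:
            results[name] = count_divs(markup, name)
        except FeatureNotFound:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, counts in compare_parsers(SAMPLE).items():
        print(name, counts)
```

With `html.parser`, the late `<div>` stays outside `<body>` because that parser keeps closing tags where it finds them. If another parser reports the same count for both numbers, it has moved the late content into the body. Passing `response.content` instead of `SAMPLE` runs the same check on the real page.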