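I'm new to Python and am currently learning web scraping. The task is to scrape the first 5 pages of issues from the Dell Community Inspiron forum. I have code that runs and returns the information I need. However, I can't get just the text: my current code returns the text together with the surrounding HTML. I have tried putting `.text` in various places in the code, but every time I do, I only get errors.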
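The most common error is: "AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?"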
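The error message itself hints at the cause. A minimal sketch, run against a small made-up HTML snippet rather than the live Dell page (the snippet and its `login-bold` markup are assumptions for illustration): `find_all()` returns a `ResultSet`, which is a list of `Tag` objects and has no `.text`, while each individual `Tag` (or the single result of `find()`) does.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for part of one forum listing page.
html = """
<table><tbody>
  <tr><td><span class="login-bold">alice</span></td></tr>
  <tr><td><span class="login-bold">bob</span></td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of Tags), not a single Tag.
authors = soup.find_all("span", attrs={"class": "login-bold"})
try:
    authors.text  # raises the AttributeError quoted above
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError

# Option 1: take the text of each Tag in the ResultSet.
author_names = [tag.get_text(strip=True) for tag in authors]
print(author_names)  # ['alice', 'bob']

# Option 2: find() returns one Tag (or None), which has .text.
first = soup.find("span", attrs={"class": "login-bold"})
print(first.text)  # alice
```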
Here is my code:
```python
from urllib.request import urlopen
from bs4 import BeautifulSoup
import os, csv
from time import sleep
import requests

pages = [
    'https://www.dell.com/community/Inspiron/bd-p/Inspiron',
    'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/2',
    'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/3',
    'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/4',
    'https://www.dell.com/community/Inspiron/bd-p/Inspiron/page/5',
]

data = []
for page in pages:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')
    rows = soup.select('tbody tr')
    for row in rows:
        d = dict()
        d['title'] = soup.find_all('a', attrs={'class': 'page-link lia-link-navigation lia-custom-event'})
        d['author'] = soup.find_all('span', attrs={'class': 'login-bold'})
        d['time'] = soup.find_all('span', attrs={'class': 'local-time'})
        d['kudos'] = soup.find_all('div', attrs={'class': 'lia-component-messages-column-message-kudos-count'})
        d['messages'] = soup.find_all('div', attrs={'class': 'lia-component-messages-column-message-replies-count'})
        d['views'] = soup.find_all('div', attrs={'class': 'lia-component-messages-column-topic-views-count'})
        d['solved'] = soup.find_all('td', attrs={'aria-label': 'triangletop lia-data-cell-secondary lia-data-cell-icon'})
        d['latest'] = soup.find_all('span', attrs={'cssclass': 'lia-info-area-item'})
        data.append(d)
    sleep(10)

print(data[0])
```
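One likely reason the output contains HTML (and the same data repeated in every dict) is that the inner loop calls `soup.find_all(...)` instead of searching inside `row`, so each `d` holds lists of `Tag` objects for the whole page. Below is a possible per-row sketch, assuming each thread row on the live page actually contains these class names (not verified against Dell's current markup). The helpers `cell_text` and `parse_rows` and the sample HTML are hypothetical. `solved` and `latest` are left out because it is unclear what markup those selectors should target.

```python
from bs4 import BeautifulSoup

def cell_text(row, name, css_class):
    """Hypothetical helper: stripped text of the first matching tag in `row`, or None."""
    tag = row.find(name, class_=css_class)  # search only inside this row
    return tag.get_text(strip=True) if tag is not None else None

def parse_rows(html):
    """Hypothetical helper: one dict of plain strings per 'tbody tr' row."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for row in soup.select("tbody tr"):
        data.append({
            "title": cell_text(row, "a", "page-link"),
            "author": cell_text(row, "span", "login-bold"),
            "time": cell_text(row, "span", "local-time"),
            "kudos": cell_text(row, "div", "lia-component-messages-column-message-kudos-count"),
            "messages": cell_text(row, "div", "lia-component-messages-column-message-replies-count"),
            "views": cell_text(row, "div", "lia-component-messages-column-topic-views-count"),
        })
    return data

# Made-up sample row; the replies and views cells are deliberately missing.
SAMPLE = """
<table><tbody>
  <tr>
    <td><a class="page-link lia-link-navigation lia-custom-event" href="#">Laptop won't boot</a></td>
    <td><span class="login-bold">alice</span></td>
    <td><span class="local-time">01-02-2020</span></td>
    <td><div class="lia-component-messages-column-message-kudos-count">3</div></td>
  </tr>
</tbody></table>
"""

rows = parse_rows(SAMPLE)
print(rows[0]["title"], rows[0]["author"], rows[0]["kudos"])
```

In the original script, the whole inner `for row in rows:` block could then be replaced by `data.extend(parse_rows(r.content))`, keeping the `requests.get` call and the `sleep(10)` between pages.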
Any help would be greatly appreciated. Thanks!