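I want to use BeautifulSoup to scrape the text from a subclass inside a <p> HTML tag, but the output is an empty array.
I have already tried using only the parent class (msg-content-cell), only the subclass (f1vbk p-msg-head-body), and finally leaving out the p tag.
Here is my Python program: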
    import time

    import requests
    from bs4 import BeautifulSoup


    class CrawledArticle():
        def __init__(self, heading, message):
            self.heading = heading
            self.message = message


    class ArticleFetcher():
        def fetch(self):
            url = "https://www.verkehrsinformation.de/?road=A8&region=%25"
            articles = []
            time.sleep(1)
            r = requests.get(url)
            doc = BeautifulSoup(r.text, "html.parser")
            for heading in doc.select(".td-msg-head-heading"):
                heading = heading.select(".td-msg-head-heading")
            for message in doc.select(".msg-content-cell"):
                message = message.select(".msg-content-cell .f1vbk p-msg-head-body p")
                crawled = CrawledArticle(heading, message)
                articles.append(crawled)
            return articles
Here is an excerpt of the HTML source, from which I want to extract the text "zwischen beratzhausen (95) und Parsberg (94)":
    </div>
    <div id="a3itHKyCfOGlFAIL" class="table-row newmsg">
        <div class="msg-content-cell">
            <div class="row bg-white cursor-pointer" onclick="window.location.href='/staumeldung/?token=a3itHKyCfOGlFAIL&sp=ro:%|re:2|pg:1'">
                <div class="td-msg-head-heading">
                    <p class="f1vbk p-msg-head-heading">
                        A3 Passau Richtung Nürnberg:
                    </p>
                </div>
                <div class="td-msg-head-info">
                </div>
            </div>
            <div class="row bg-white cursor-pointer" onclick="window.location.href='/staumeldung/?token=a3itHKyCfOGlFAIL&sp=ro:%|re:2|pg:1'">
                <p class="f1vbk p-msg-head-body">
                    zwischen beratzhausen (95) und Parsberg (94) Wanderbaustelle.
                    <!--<a class="extendlink l1vbku">Mehr</a>...//-->
                </p>
                <p class="p-msg-head-body pull-right f1vbk">
                    <a class="extendlink l1vbku">Kartenansicht</a> |
                    <a class="extendlink l1vbku">Alle Details</a>
                </p>
            </div>
        </div>
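I expected to get the text from the subclass "f1vbk p-msg-head-body" inside the <p> HTML tag, but the output is an empty array.
What is the difference compared to the class "td-msg-head-heading"? How do I get the plain text?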
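
One likely cause, sketched here against a trimmed copy of the excerpt above rather than the live page: in a CSS selector a space is a descendant combinator, and a name without a leading dot is read as a tag name. So ".f1vbk p-msg-head-body p" looks for a <p> inside a <p-msg-head-body> element inside something with class f1vbk, which does not exist. Classes that sit on the same element are chained without spaces instead (p.f1vbk.p-msg-head-body). Note also that select() only searches descendants and returns a list of tags, so calling it again with the same class on an already matched element gives an empty list, and the text still has to be pulled out with get_text(). The names HTML and extract_articles are made up for this illustration, and the :not(.pull-right) filter assumes the link paragraph always carries the pull-right class.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the excerpt from the question (onclick attributes dropped).
# Assumption: the live page follows the same structure.
HTML = """
<div id="a3itHKyCfOGlFAIL" class="table-row newmsg">
  <div class="msg-content-cell">
    <div class="row bg-white cursor-pointer">
      <div class="td-msg-head-heading">
        <p class="f1vbk p-msg-head-heading">
          A3 Passau Richtung Nürnberg:
        </p>
      </div>
    </div>
    <div class="row bg-white cursor-pointer">
      <p class="f1vbk p-msg-head-body">
        zwischen beratzhausen (95) und Parsberg (94) Wanderbaustelle.
        <!--<a class="extendlink l1vbku">Mehr</a>...//-->
      </p>
      <p class="p-msg-head-body pull-right f1vbk">
        <a class="extendlink l1vbku">Kartenansicht</a> |
        <a class="extendlink l1vbku">Alle Details</a>
      </p>
    </div>
  </div>
</div>
"""

doc = BeautifulSoup(HTML, "html.parser")

# The selector from the question: "p-msg-head-body" has no leading dot,
# so it is treated as a tag name and nothing matches.
broken = doc.select(".msg-content-cell .f1vbk p-msg-head-body p")


def extract_articles(doc):
    """Return (heading, message) text pairs, one per .msg-content-cell."""
    articles = []
    for cell in doc.select(".msg-content-cell"):
        # Look up both parts inside the same cell so they stay paired.
        heading = cell.select_one(".td-msg-head-heading p")
        # Chained classes match one element that has all of them;
        # :not(.pull-right) skips the paragraph that only holds links.
        message = cell.select_one("p.f1vbk.p-msg-head-body:not(.pull-right)")
        if heading is None or message is None:
            continue  # skip cells that do not follow the assumed layout
        articles.append(
            (heading.get_text(strip=True), message.get_text(" ", strip=True))
        )
    return articles


if __name__ == "__main__":
    print(len(broken))
    for heading, message in extract_articles(doc):
        print(heading, message)
```

In the original fetch() the same idea would mean iterating over the cells once and building each CrawledArticle from the two get_text() results, instead of running two separate loops over headings and messages.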