Web scraping: BeautifulSoup

I am trying to use Beautiful Soup with Python/Pandas to extract all of the "Places of interest" from a Wikipedia page and put them into a dataframe. For example:

https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris

import requests
from bs4 import BeautifulSoup

url_Paris_01 = requests.get('https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris').text
soup_Paris_01 = BeautifulSoup(url_Paris_01, "html.parser")

for headline in soup_Paris_01.find_all("span",{"class": "mw-headline"}):
    print(headline.text)

Geography
Demography
Historical population
Immigration
Quarters
Economy
Education
Map
Cityscape
**Places of interest**
Bridges
Streets and squares
See also
References
External links

This does not work:

soup_Paris_01.find_all('li',attrs={"id":"Places_of_interest"}) 

I can see that my "Places of interest" sit under a headline tag:

Places of interest

yanbinfighting answered: Web scraping: BeautifulSoup

First find the "Places of interest" headline via its span tag, then take the ul that follows it and call find_all() on all the anchor tags under that ul:

from bs4 import BeautifulSoup
import requests

url_Paris_01 = requests.get('https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris').text
soup_Paris_01 = BeautifulSoup(url_Paris_01, "html.parser")

# Locate the "Places of interest" headline span, then the first <ul> after it
places_of_interest = soup_Paris_01.find("span", id="Places_of_interest").find_next('ul')
for place in places_of_interest.find_all('a'):
    print(place.get('title'))  # the link's title attribute (None if the anchor has no title)
    print(place.text)          # the link text
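
Since the question also asks for the results in a Pandas dataframe, a minimal sketch of that last step, under the same assumptions as the answer above, might look like the following; the column names "name" and "title" are my own choice, not from the original post.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url_Paris_01 = requests.get('https://en.wikipedia.org/wiki/1st_arrondissement_of_Paris').text
soup_Paris_01 = BeautifulSoup(url_Paris_01, "html.parser")

# Collect the link text and the title attribute of every anchor in the list
places_of_interest = soup_Paris_01.find("span", id="Places_of_interest").find_next('ul')
rows = [{"name": a.text, "title": a.get('title')} for a in places_of_interest.find_all('a')]

df = pd.DataFrame(rows)
print(df)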