You are not appending values to beautified_value; you are just overwriting it over and over. You can use list.append, for example:
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.worldometers.info/world-population/population-by-country/"
output = requests.get(url)
soup = BeautifulSoup(output.text, "html.parser")
table = soup.find("table")

# Column names come from the <th> cells
header_tags = table.find_all("th")
headers = [header.text.strip() for header in header_tags]

# Skip the header row, then collect one list of cell texts per table row
data_rows = table.find_all("tr")[1:]
beautified_value = []
for row in data_rows:
    value = row.find_all("td")
    # Append a real list (not a generator) so each row is stored as-is
    beautified_value.append([dp.text.strip() for dp in value])

df = pd.DataFrame(data=beautified_value, columns=headers)
print(df)
Prints:
   #  Country (or dependency)  Population (2020)  Yearly Change  Net Change  Density (P/Km²)  Land Area (Km²)  Migrants (net)  Fert. Rate  Med. Age  Urban Pop %  World Share
0  1                    China      1,439,323,776         0.39 %   5,540,090              153        9,388,211        -348,399         1.7        38         61 %      18.47 %
1  2                    India      1,380,004,385         0.99 %  13,586,631              464        2,973,190        -532,687         2.2        28         35 %      17.70 %
2  3            United States        331,002,651         0.59 %   1,937,734               36        9,147,420         954,806         1.8        38         83 %       4.25 %
...
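The difference between overwriting and appending can be seen without fetching the page. This is only a minimal sketch: `raw_rows` is made-up data standing in for the scraped cell texts, and the names `overwritten` and `collected` are illustrative, not part of the original code.

```python
# Hypothetical raw cell texts, standing in for the scraped <td> contents
raw_rows = [
    [" 1 ", " China "],
    [" 2 ", " India "],
]

overwritten = []  # rebound on every pass, so earlier rows are lost
collected = []    # grows by one row per pass

for row in raw_rows:
    cells = [cell.strip() for cell in row]
    overwritten = cells        # assignment: replaces the previous value
    collected.append(cells)    # append: keeps every row

print(overwritten)  # ['2', 'India']
print(collected)    # [['1', 'China'], ['2', 'India']]
```

Only `collected` holds one entry per row, which is the shape pd.DataFrame expects for its data argument.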
Use read_html. Note that I had to set the user agent manually with requests; otherwise the request fails with a 403 error:
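from io import StringIO
import requests
import pandas as pd
url = "https://www.worldometers.info/world-population/population-by-country/"
html = requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text  # browser-like User-agent avoids the 403
df = pd.read_html(StringIO(html))[0]  # first table on the page; StringIO avoids the literal-HTML deprecation warning in recent pandas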
|   | # | Country (or dependency) | Population (2020) | Yearly Change | Net Change | Density (P/Km²) | Land Area (Km²) | Migrants (net) | Fert. Rate | Med. Age | Urban Pop % | World Share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | China | 1439323776 | 0.39% | 5540090 | 153 | 9388211 | -348399 | 1.7 | 38 | 61% | 18.47% |
| 1 | 2 | India | 1380004385 | 0.99% | 13586631 | 464 | 2973190 | -532687 | 2.2 | 28 | 35% | 17.70% |
| 2 | 3 | United States | 331002651 | 0.59% | 1937734 | 36 | 9147420 | 954806 | 1.8 | 38 | 83% | 4.25% |
| 3 | 4 | Indonesia | 273523615 | 1.07% | 2898047 | 151 | 1811570 | -98955 | 2.3 | 30 | 56% | 3.51% |
| 4 | 5 | Pakistan | 220892340 | 2.00% | 4327022 | 287 | 770880 | -233379 | 3.6 | 23 | 35% | 2.83% |
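To see what setting the user agent actually changes, the request can be built and inspected without sending it. This is a rough sketch that swaps in the standard library's urllib.request in place of requests, purely so it runs offline; the variable name `req` is illustrative.

```python
from urllib.request import Request

url = "https://www.worldometers.info/world-population/population-by-country/"

# Build (but do not send) a request carrying a browser-like User-Agent
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})

# urllib stores header names in capitalized form, i.e. "User-agent"
print(req.get_header("User-agent"))
print(req.full_url)
```

With requests, the headers= argument shown in the answer plays the same role: it replaces the library's default user agent string, which this site appears to reject.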