Scraping a table from Wikipedia with Python

2024-05-06 • Q&A

I am trying to scrape a Wikipedia table using Python and Beautiful Soup. When I try to get the table's column headers with a for loop, I get this error:

NameError                                 Traceback (most recent call last)
<ipython-input-18-948408e65d8d> in <module>
      1 # Header attributes of the table
      2 header=[th.text.rstrip() 
----> 3         for th in rows[0].find_all('th')]
      4 print(header)
      5 print('------------')

NameError: name 'rows' is not defined

How can I fix this?

Code:

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_municipalities_of_Norway"

# Send the URL request. If it succeeds, the expected HTTP response status code is 200.

s=requests.Session()
response=s.get(url,timeout=10)
response

res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')

# Title of the Wikipedia page

soup.title.string

# Get the right table to scrape

right_table=soup.find('table',{"class":'sortable wikitable'})
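
A side note on this lookup: when the class value is given as a single string with a space in it, Beautiful Soup compares it against the exact `class` attribute string, so the match depends on the order of the class names. The sketch below uses a small made-up HTML snippet (not the real page) where the classes are written in the opposite order, and shows a CSS selector via `select_one` as an order-independent alternative. Whether the live page needs this is an assumption to check against the returned HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: same two classes, but written in the opposite order.
html = '<table class="wikitable sortable"><tr><th>Name</th></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: fails when the order differs.
exact = soup.find("table", {"class": "sortable wikitable"})

# CSS selector: matches a table carrying both classes, in any order.
by_selector = soup.select_one("table.wikitable.sortable")

print(exact is None)
print(by_selector is not None)
```

If `right_table` comes back as `None`, any later call on it raises an `AttributeError`, so it is worth checking before going further.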

# Header attributes of the table

header=[th.text.rstrip() 
        for th in rows[0].find_all('th')]
print(header)
print('------------')
print(len(header))
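
The traceback says `rows` was never assigned, and nothing in the code above defines it. One likely fix is to collect the table's `<tr>` elements from `right_table` before indexing into them. The sketch below shows that step on a tiny stand-in table (the `sample_html` string is a made-up sample, not the real page), so it runs without a network request:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table, kept tiny on purpose.
sample_html = """
<table class="sortable wikitable">
  <tr><th>Number</th><th>Name</th><th>County</th></tr>
  <tr><td>301</td><td>Oslo</td><td>Oslo</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")
right_table = soup.find("table", {"class": "sortable wikitable"})

# The missing step: gather the table rows before using rows[0].
rows = right_table.find_all("tr")

# Header attributes of the table, taken from the first row.
header = [th.text.rstrip() for th in rows[0].find_all("th")]
print(header)
print("------------")
print(len(header))
```

In the original script, the same `rows = right_table.find_all('tr')` line would go right after `right_table` is assigned.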

|    | Number[1](ISO 3166-2:NO) | Name | Adm. center | County | Population(2017)[2] | Area(km²)[3] | CountyMap | Arms | Language form[4] | Mayor[5] | Party |
|----:|---------------------------:|:-----------------------------|:---------------------|:---------------------|----------------------:|---------------:|------------:|-------:|:-------------------------|:----------------------------|:--------|
| 0 | 301 | Oslo | Oslo | Oslo | 673469 | 454.03 | nan | nan | Neutral | Marianne Borgen | SV |
| 1 | 1101 | Eigersund | Egersund | Rogaland | 14898 | 431.66 | nan | nan | Bokmål | Leif Erik Egaas | H |
| 2 | 1103 | Stavanger | Stavanger | Rogaland | 141186 | 262.52 | nan | nan | Bokmål | Kari Nessa Nordtun | Ap |
| 3 | 1106 | Haugesund | Haugesund | Rogaland |
