Python BeautifulSoup - Trouble parsing a table and avoiding unwanted rows

2024-05-29 • Q&A

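I am trying to parse table data from a Wikipedia page for a Django project.

The user selects the time period they want to view (the century variable), and the requested Name, Image, and Designation information is returned in a table.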

In some of the wiki tables, I am trying to skip the header rows that appear in the middle of the table.
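One possible way to skip such rows, as a minimal sketch: a header row in the middle of a table typically has no `<td>` cells at all (only `<th>`), or a single cell carrying a `colspan` attribute. The sample HTML and the helper name `is_header_row` are made up for illustration; the markup of the real page may differ.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics a wikitable with header rows in the middle.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Name</th><th>Designation</th></tr>
  <tr><td>Io</td><td>Jupiter I</td></tr>
  <tr><th colspan="2">1650s</th></tr>
  <tr><td colspan="2">Section note</td></tr>
  <tr><td>Titan</td><td>Saturn VI</td></tr>
</table>
"""

def is_header_row(row):
    """Return True for rows with no data cells or with one spanning cell."""
    cells = row.find_all("td")
    return not cells or cells[0].has_attr("colspan")

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="wikitable")

# Keep only the rows that hold actual data.
data_rows = [row for row in table.find_all("tr") if not is_header_row(row)]
names = [row.find_all("td")[0].get_text(strip=True) for row in data_rows]
print(names)  # ['Io', 'Titan']
```

Filtering rows this way leaves the parsed tree untouched, so there is no need to call `decompose()` on the rows being skipped.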

I am new to Python and BeautifulSoup, so I am finding this hard. How can I get the text from the Name and Designation columns and the image from the Image column, while avoiding {{ 3}}?
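As a rough sketch of the extraction step: `get_text(strip=True)` returns the text of a cell, and `find("img")` locates the image tag inside a cell so its `src` can be read. The sample row and the helper name `extract_row` are hypothetical, and the assumption that image URLs can be protocol-relative (starting with `//`) should be checked against the real page.

```python
from bs4 import BeautifulSoup

# Made-up row; real Wikipedia cells may contain extra links or footnotes.
ROW_HTML = """
<table><tr>
  <td><a href="/wiki/Ganymede_(moon)">Ganymede</a></td>
  <td><a href="/wiki/File:Ganymede.jpg"><img src="//upload.example.org/Ganymede.jpg" width="50"></a></td>
  <td>Jupiter III</td>
</tr></table>
"""

def extract_row(cells):
    """Return (name, image_url, designation) from three consecutive cells."""
    name = cells[0].get_text(strip=True)
    img = cells[1].find("img")            # None when the cell has no image
    image_url = img.get("src") if img else None
    if image_url and image_url.startswith("//"):
        image_url = "https:" + image_url  # make protocol-relative URLs absolute
    designation = cells[2].get_text(strip=True)
    return name, image_url, designation

row = BeautifulSoup(ROW_HTML, "html.parser").find("tr")
record = extract_row(row.find_all("td"))
print(record)  # ('Ganymede', 'https://upload.example.org/Ganymede.jpg', 'Jupiter III')
```

Returning `None` for a missing image keeps the caller free to decide how an empty Image cell should be displayed.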


import requests
from bs4 import BeautifulSoup
import pandas as pd
from django.shortcuts import render  # needed for the render() call at the end of the view

getPage = requests.get("https://en.wikipedia.org/wiki/Timeline_of_discovery_of_Solar_System_planets_and_their_moons")

source = getPage.content
soup = BeautifulSoup(source,'html.parser')
# `century` is the section id chosen by the user, e.g. '17th_century'
discoverTable = soup.find(id=century).find_next('table', {"class": "wikitable"})

create_table = [['Name','Image','Designation']]
Prehistory_rows = discoverTable.find_all('tr')[2:]
rows = discoverTable.find_all('tr')[3:]

for row in Prehistory_rows:
    if century == 'Prehistory':  # ('Prehistory') is a plain string, so `in` did a substring test
        name_cell = row.find_all('td')[0].get_text()
        image_cell = row.find_all('td')[1]
        designation_cell = row.find_all('td')[2].get_text()
        display_info = [name_cell,image_cell,designation_cell]
        create_table.append(display_info)

remaining_span = 0  # rows still covered by an earlier date cell's rowspan

for row in rows:
    if century in ('17th_century','18th_century','19th_century','1901-1950'):
        cells = row.find_all('td')
        # skip header rows in the middle of the table (no <td>, or a cell spanning columns)
        if not cells or cells[0].has_attr('colspan'):
            continue
        if remaining_span > 0:
            # an earlier first cell still spans this row, so Name is the first cell here
            offset = 0
            remaining_span -= 1
        else:
            # the first cell precedes Name; remember how many following rows it covers
            # (assumes only the first column uses rowspan)
            offset = 1
            remaining_span = int(cells[0].get('rowspan', 1)) - 1
        if len(cells) < offset + 3:
            continue  # not enough cells for Name, Image and Designation
        name_cell = cells[offset].get_text(strip=True)
        image_cell = cells[offset + 1]
        designation_cell = cells[offset + 2].get_text(strip=True)
        display_info = [name_cell,image_cell,designation_cell]  # three values, matching the three columns
        create_table.append(display_info) # add rows to the table


df = pd.DataFrame(create_table[1:],columns = create_table[0])
tabledata = df.to_html(escape = False,classes='discoverTable',index=False)

context = {
    'tabledata' : tabledata,'century' : century,}

return render(request,'discoveryTimeline/discovery_result.html',context)
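
The code above stores the whole Image cell in the table, so the rendered HTML ends up with a `<td>` nested inside another `<td>`. A possible alternative, sketched here with hypothetical records and a made-up helper `image_html`, is to store a plain `<img>` string per row and let pandas render it unescaped.

```python
import pandas as pd

# Hypothetical records: (name, image URL or None, designation)
records = [
    ("Titan", "https://example.org/titan.png", "Saturn VI"),
    ("Io", None, "Jupiter I"),
]

def image_html(url, alt):
    """Build an <img> tag, or an empty string when there is no image."""
    return f'<img src="{url}" alt="{alt}">' if url else ""

rows = [[name, image_html(url, name), designation]
        for name, url, designation in records]
df = pd.DataFrame(rows, columns=["Name", "Image", "Designation"])

# escape=False keeps the <img> markup intact; lifting max_colwidth avoids
# long cell strings being shortened with "..." in the rendered HTML.
with pd.option_context("display.max_colwidth", None):
    tabledata = df.to_html(escape=False, classes="discoverTable", index=False)
```

In the Django template, the string would then need to be marked as safe, for example `{{ tabledata|safe }}`, because Django escapes HTML in template variables by default.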

y4328978 answered: Python BeautifulSoup - Trouble parsing a table and avoiding unwanted rows