Scraping specific attributes with Beautiful Soup (Python)

2024-05-07 • Q&A

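I need help scraping the word "CPCAdvertising.com" from the highlighted span tag (see the attached screenshot of the HTML). I'm not sure how to iterate correctly. Here is what I have so far: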

import requests
from bs4 import BeautifulSoup

url = 'https://www.flippa.com/search?filter[property_type]=domain&filter[status]=won&filter[sale_method]=auction&page[number]={}&page[size]=250'
price_list = []
domain_list = []
for page_number in range(1, 121):
    # Fetch inside the loop so that each iteration requests a new page
    flippa_page = requests.get(url.format(page_number))
    soup = BeautifulSoup(flippa_page.content, 'lxml')
    for span_tag in soup.find_all('span'):
        # .get() avoids a KeyError for spans that have no class attribute
        domain_list.append(span_tag.get('class'))

HTML Screenshot

Your URL didn't work for me, so I used a different URL from the same site. In any case, you can specify the class in the find_all() call, like this:

import requests
from bs4 import BeautifulSoup

flippa_page = requests.get('https://flippa.com/10339489-e-commerce-sports-and-outdoor')
src = flippa_page.content
soup = BeautifulSoup(src,'lxml')

for s in soup.find_all('span',{'class': 'ListingList-itemPrice'}):
    # Print out the text within the tag
    print(s.text.strip())
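
As a minimal offline sketch of the same idea, the snippet below parses a small hand-written HTML fragment instead of fetching a live page. The markup is an assumption modelled on the span classes quoted elsewhere in this thread (Basic___propertyName, Basic___title), and extract_texts is a hypothetical helper name; the real Flippa markup may differ.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption, not the real Flippa page).
SAMPLE_HTML = """
<div>
  <span class="Basic___propertyName">CPCAdvertising.com</span>
  <span class="Basic___title">CPCAdvertising.com - One Dollar Reserve !!</span>
  <span>no class here</span>
</div>
"""


def extract_texts(html, css_class):
    """Return the stripped text of every <span> that carries css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [s.get_text(strip=True) for s in soup.find_all("span", class_=css_class)]


# Collect the text of the tags, not their class attribute.
domain_list = extract_texts(SAMPLE_HTML, "Basic___propertyName")
print(domain_list)
```

Filtering by class in find_all() means spans without a class attribute are skipped rather than raising an error, and get_text() returns the visible text, which is what the question is after.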

The word should be in span_tag.string.
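
One caveat worth sketching: .string only returns text when the tag has a single child, while get_text() works for nested content as well. The HTML below is made up for illustration and is not taken from the Flippa page.

```python
from bs4 import BeautifulSoup

# Two illustrative spans: one with plain text, one with a nested <b> tag.
html = (
    '<span class="a">CPCAdvertising.com</span>'
    '<span class="b">CPCAdvertising.com <b>sold</b></span>'
)
soup = BeautifulSoup(html, "html.parser")
simple, nested = soup.find_all("span")

simple_string = simple.string    # single text child: the text itself
nested_string = nested.string    # text plus a <b> child: None
nested_text = nested.get_text()  # joins the text of all descendants

print(simple_string, nested_string, nested_text)
```

If the target span might contain other tags, get_text() is probably the safer choice.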

Using html.parser instead of lxml, I was able to find the span tags:

for item in soup.find_all('span'):
    if 'CPCAdvertising.com' in str(item.contents):
        print(item)

<span class="Basic___propertyName">CPCAdvertising.com</span>
<span class="Basic___title">CPCAdvertising.com - One Dollar Reserve !!</span>
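
The same text search can also be sketched with the string argument of find_all(), which is available in Beautiful Soup 4.4.0 and later, together with a compiled regular expression. The sample HTML is an assumption built from the two spans printed above plus one unrelated span.

```python
import re

from bs4 import BeautifulSoup

# Sample markup assumed for illustration; only the first two spans mention the domain.
html = """
<span class="Basic___propertyName">CPCAdvertising.com</span>
<span class="Basic___title">CPCAdvertising.com - One Dollar Reserve !!</span>
<span class="Basic___title">Other.com</span>
"""
soup = BeautifulSoup(html, "html.parser")

# re.escape keeps the dot in the domain name from matching any character.
pattern = re.compile(re.escape("CPCAdvertising.com"))
matches = soup.find_all("span", string=pattern)

for tag in matches:
    print(tag["class"], tag.string)
```

Note that string= is matched against each tag's .string, so spans that contain nested tags would not be found this way.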

For some reason I could not get it to parse with lxml. If you can give me a hint, could you tell me which lxml library you are using so I can check?
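
One way to check the parser setup is sketched below. It assumes, without confirmation from this thread, that the problem could be a missing or outdated lxml installation; make_soup is a hypothetical helper that falls back to html.parser when the preferred parser is not available.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, falling back to the built-in one."""
    try:
        return BeautifulSoup(markup, preferred), preferred
    except FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return BeautifulSoup(markup, "html.parser"), "html.parser"


soup, parser_used = make_soup(
    '<span class="Basic___propertyName">CPCAdvertising.com</span>'
)
print(parser_used, soup.span.string)

# Report the installed lxml version, if any, so it can be compared.
try:
    from lxml import etree
    print("lxml version:", etree.LXML_VERSION)
except ImportError:
    print("lxml is not installed")
```

If both parsers are installed but give different results on the real page, the difference is more likely down to how each one repairs malformed HTML than to the installation itself.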


TYCandy answered: Scraping specific attributes with Beautiful Soup (Python)
