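How to scrape the price

How do I scrape the price of one specific item?

The HTML contains several divs with class="pb-current-price", but I am only interested in the $2,299.99 price. How can I do that?

Thanks.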

                 <div class="pb pb-large-view pb-theme-default">
                  <div class="pb-current-price ">
                   <span class="">
                    $2,299.99
                   </span>
                  </div>
                 </div>

import requests
import bs4 as bs 
from lxml import html

url = ""
agent = {"User-Agent":""}
url_get = requests.get(url,headers=agent) #,cookies=cookies)


tree = html.fromstring(url_get.content)

prices = tree.xpath('//div[@class="pb-sale-price "]/span/text()')
print(prices)

Running the code above prints an empty list, [], for prices.

huanwujun answered: How to scrape the price

Ciao

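I worked through your code. A few things before the snippet:

1) You are searching for "pb-sale-price " instead of "pb-current-price "
2) As mentioned in the comments, I could not access your HTML page, so I simulated the response from the HTML snippet you gave us
3) For completeness, I also simulated a second item

Now the code: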

import requests  # only needed for the commented-out live request below
from lxml import html


# Simulated HTML response
string="""
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $2,299.99
  </span>
</div>
</div>

<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $799.99
  </span>
</div>
</div>
"""


url = "https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611"
agent = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36"}
# cookies = {"cookie": "COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}
# url_get = requests.get(url, headers=agent)  # , cookies=cookies)

# tree = html.fromstring(url_get.content)  # use this line for the live page
tree = html.fromstring(string)  # parse the simulated HTML instead
# print(html.tostring(tree).decode("utf-8"))

prices = tree.xpath('//div[@class="pb-current-price "]/span/text()')

# Output cleaning: strip the surrounding whitespace and newlines
prices = [x.strip() for x in prices]
print(prices)

Output

['$2,299.99', '$799.99']
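
To keep only the one price you care about, a minimal sketch follows. It assumes the wanted item is the first match in document order, which I cannot confirm for the live page. The variable names and the wrapping div are mine, and contains() is used so the trailing space in the class attribute does not matter.

```python
from decimal import Decimal

from lxml import html

# Simulated markup with the same shape as the snippet above (assumption:
# the live page may be structured differently).
snippet = """
<div>
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $2,299.99
  </span>
</div>
</div>
<div class="pb pb-large-view pb-theme-default">
<div class="pb-current-price ">
  <span class="">
  $799.99
  </span>
</div>
</div>
</div>
"""

tree = html.fromstring(snippet)

# contains() matches the class whether or not it has a trailing space.
xpath = '//div[contains(@class, "pb-current-price")]/span/text()'
all_prices = [text.strip() for text in tree.xpath(xpath)]

# Take the first match, or None if the selector found nothing.
first_price = all_prices[0] if all_prices else None

# Convert "$2,299.99" into a number for comparisons or arithmetic.
amount = None
if first_price is not None:
    amount = Decimal(first_price.lstrip("$").replace(",", ""))

print(first_price, amount)
```

If the item is not always first, scoping the XPath to a container unique to that product would be more robust than indexing.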

Hope this helps,
Antonino

PS: I strongly recommend that you also read this beautiful article


The price you are showing is the regular price. You can get it from one of the script tags like this:

import re

import requests

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get('https://www.bestbuy.com/site/lg-65-class-oled-b9-series-2160p-smart-4k-uhd-tv-with-hdr/6360611.p?skuId=6360611&intl=nosplash', headers=headers)
# The price sits in JSON embedded in a script tag, where quotes are escaped as \"
p = re.compile(r'regularPrice\\":([\d.]+),')
price = p.findall(r.text)[0]
print(price)
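
To see what the regex is matching without hitting the site, here is a small offline sketch. The page_text string is a made-up excerpt: I am only assuming the script tag holds JSON with backslash-escaped quotes, and the real payload will have many more fields. It also uses search() with a guard, so a missing field gives None instead of an IndexError.

```python
import re

# Hypothetical excerpt of the script tag content (not copied from the
# real page). Note the escaped quotes: \"regularPrice\":2299.99,
page_text = r'<script>var data = "{\"regularPrice\":2299.99,\"skuId\":\"6360611\"}";</script>'

# \\ in the pattern matches the literal backslash before the closing quote.
pattern = re.compile(r'regularPrice\\":([\d.]+),')

match = pattern.search(page_text)
price = float(match.group(1)) if match else None
print(price)
```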