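I need to scrape all the products on this page: website
To do that, I have to click each photo and then scrape the data on the page it opens.
I managed to write the script that scrapes the data from a single product page.
I have to extract the name, price, description...
Here is my code: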
import scrapy
from scrapy_selenium import SeleniumRequest
from scrapy.selector import Selector
from selenium.webdriver.common.keys import Keys
from scrapy_splash import SplashRequest
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from shutil import which


class AsoswomennewSpider(scrapy.Spider):
    name = 'asoswomennew'
    allowed_domains = ['www.asos.com']
    start_urls = ['https://www.asos.com/monki/monki-lisa-cropped-vest-top-with-ruched-side-in-black/prd/23590636?colourwayid=60495910&cid=2623']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before adding the driver.
        super().__init__(*args, **kwargs)
        chrome_option = Options()
        chrome_option.add_argument("--headless")
        chrome_path = which("chromedriver")
        # Keep the driver on the spider so other methods can reach it.
        # Note: parse() below receives Scrapy's own response, not this page.
        self.driver = webdriver.Chrome(executable_path=chrome_path, options=chrome_option)
        self.driver.set_window_size(1920, 1080)
        self.driver.get('https://www.asos.com/monki/monki-lisa-cropped-vest-top-with-ruched-side-in-black/prd/23590636?colourwayid=60495910&cid=2623')

    def parse(self, response):
        yield {
            'name': response.xpath("//div[@class='product-hero']/h1/text()").get(),
            # This is an XPath expression, so it goes through xpath(), not css().
            'price': response.xpath('//*[contains(@class,"current-price")]').get(),
            'description': response.xpath("//div[@class='product-description']/ul/li/span/text()").getall(),
            'about_me': response.xpath("//div[@class='about-me']/p/text()").getall(),
            'brand_description': response.xpath("//div[@class='brand-description']/p/text()").getall(),
        }
Now I need to loop over every picture and run the script above for each one.
Can anyone help me?
Thanks!
PS: My start_url needs to be changed to this one: 'https://www.asos.com/women/new-in/new-in-clothing/cat/?cid=2623&nlid=ww|new+in|new+products|clothing'
That is the main (listing) page, so I need to set a callback for the URL of each item.