Downloading images causes duplicate values and errors

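The spider was running fine and saving data to MongoDB, but it suddenly started saving duplicate values in the database as well as in the JSON and CSV output. When I remove the image-download code it works fine, but I need the images. Can anyone help? Thanks in advance.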

items.py

import scrapy


class BucketItem(scrapy.Item):
    # define the fields for your item here like:
    store_name = scrapy.Field()
    category = scrapy.Field()
    sub_category = scrapy.Field()
    name = scrapy.Field()
    unit = scrapy.Field()
    price = scrapy.Field()
    link = scrapy.Field()
    image_urls = scrapy.Field()  # URLs for the images pipeline to download
    images = scrapy.Field()      # download results, filled in by the pipeline

pipelines.py

import pymongo


class BucketPipeline(object):

    def __init__(self):
        # Connect to a local MongoDB instance.
        self.conn = pymongo.MongoClient('localhost', 27017)
        db = self.conn['bucket']
        self.collection = db['products_tb']

    def process_item(self, item, spider):
        # insert_one() replaces the deprecated Collection.insert().
        self.collection.insert_one(dict(item))
        return item

spider.py

        image_url = [product.css('.productVisuals img::attr(src)').extract_first()]

        items = response.meta['items']
        items['name'] = product_name
        items['unit'] = unit
        items['price'] = product_price
        items['link'] = product_link
        items['image_urls'] = image_url
        # print(items['name'])
        yield items
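
One possible explanation for the duplicates, offered as an assumption rather than a confirmed diagnosis: the fragment above fills in the same `items` object from `response.meta` for every product. While an image download is pending the item is held in the pipeline, so later assignments can overwrite an item that has not been stored yet. The sketch below reproduces that effect with plain dictionaries and shows that copying per product avoids it. The names `parse_shared` and `parse_copied` and the sample values are hypothetical.

```python
def parse_shared(base, names):
    """Reuse one dict for every product, as the spider fragment does."""
    pending = []  # stands in for items waiting on image downloads
    for name in names:
        base['name'] = name       # mutates the single shared object
        pending.append(base)
    # Snapshot only after all assignments, like a delayed pipeline would.
    return [dict(item) for item in pending]


def parse_copied(base, names):
    """Give each product its own copy before filling it in."""
    pending = []
    for name in names:
        item = dict(base)         # shallow copy; scrapy.Item also has .copy()
        item['name'] = name
        pending.append(item)
    return [dict(item) for item in pending]


# Made-up sample data for illustration only.
shared = parse_shared({'category': 'snacks'}, ['chips', 'cookies'])
copied = parse_copied({'category': 'snacks'}, ['chips', 'cookies'])
print([item['name'] for item in shared])  # ['cookies', 'cookies']
print([item['name'] for item in copied])  # ['chips', 'cookies']
```

In the spider this would correspond to something like `items = response.meta['items'].copy()` before the assignments.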

[scrapy.pipelines.files] WARNING: File (code: 400): Error downloading file from https://qne.com.pk/../product_images/14768.jpg
dtzel answered: Downloading images causes duplicate values and errors

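I'm not sure which site you are downloading from, but besides the "src" attribute you may also find a "data-src" attribute... If the "src" values share some common text, you may want to use an if statement to pick up "data-src" instead.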

    if product.xpath('.//div..../img[contains(@src,"some_common_text")]'):
        item['image_urls'] = product.xpath('.//div...../img/@data-src').extract()
    else:
        item['image_urls'] = product.xpath('.//div...../img/@src').extract()
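
The check above can be sketched as a small standalone function. This is only an illustration under assumptions: `pick_image_url`, its `placeholder_text` parameter and the sample paths are hypothetical, and it assumes the page serves a relative `src` such as `../product_images/14768.jpg`. It also resolves the chosen value against the page URL with `urljoin`, which collapses leading `..` segments in relative links. A URL that still contains `/../`, like the one in the 400 warning, may be what the server is rejecting.

```python
from urllib.parse import urljoin


def pick_image_url(page_url, src, data_src=None, placeholder_text=None):
    """Choose data-src when src looks like a placeholder, then make it absolute."""
    chosen = src
    # Mirror the answer's if/else: common text in src means use data-src.
    if data_src and placeholder_text and placeholder_text in (src or ''):
        chosen = data_src
    if not chosen:
        return None
    # urljoin resolves relative paths, including leading ".." segments.
    # An already absolute URL is returned unchanged.
    return urljoin(page_url, chosen)


print(pick_image_url('https://qne.com.pk/', '../product_images/14768.jpg'))
print(pick_image_url('https://qne.com.pk/', 'img/loading.gif',
                     data_src='product_images/14768.jpg',
                     placeholder_text='loading'))
```

Inside a Scrapy callback, `response.urljoin(src)` performs the same resolution against the response URL. Note that `image_urls` expects a list, so the result would be wrapped as `[url]`.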