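My spider was running fine and saving data to MongoDB, but it suddenly started saving duplicate values in the database as well as in the JSON and CSV output. When I remove the code that downloads images it works fine, but I need the images. Can anyone help me? Thanks in advance.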
items.py
import scrapy

class BucketItem(scrapy.Item):
    # define the fields for your item here like:
    store_name = scrapy.Field()
    category = scrapy.Field()
    sub_category = scrapy.Field()
    name = scrapy.Field()
    unit = scrapy.Field()
    price = scrapy.Field()
    link = scrapy.Field()
    # image_urls is read by the images pipeline; images is filled in by it
    image_urls = scrapy.Field()
    images = scrapy.Field()

import pymongo

class BucketPipeline(object):
    def __init__(self):
        # connect to the local MongoDB server
        self.conn = pymongo.MongoClient(
            'localhost', 27017
        )
        db = self.conn['bucket']
        self.collection = db['products_tb']

    def process_item(self, item, spider):
        # store every item that reaches the pipeline as a new document
        self.collection.insert(dict(item))
        return item

spider.py
image_url = [product.css('.productVisuals img::attr(src)').extract_first()]
items = response.meta['items']
items['name'] = product_name
items['unit'] = unit
items['price'] = product_price
items['link'] = product_link
items['image_urls'] = image_url
# print(items['name'])
yield items
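
One possible cause of the duplicates, offered as an assumption rather than a confirmed diagnosis: `response.meta['items']` is the same object for every product on the page, so each `yield items` hands the same mutable item to the pipelines. If the images pipeline holds the item while the download finishes, its fields may already have been overwritten by a later product by the time it is saved. The sketch below uses plain dicts instead of Scrapy items to show the effect and a copy-per-product workaround; the function names and sample values are hypothetical.

```python
import copy

def build_items_shared(base, product_names):
    """Reuse one item object for every product (the pattern in the snippet above)."""
    out = []
    for name in product_names:
        base['name'] = name      # overwrites the field on the single shared object
        out.append(base)
    return out

def build_items_copied(base, product_names):
    """Give every product its own copy of the item."""
    out = []
    for name in product_names:
        item = copy.deepcopy(base)   # independent object per product
        item['name'] = name
        out.append(item)
    return out

shared = build_items_shared({'category': 'fruit'}, ['apple', 'banana'])
copied = build_items_copied({'category': 'fruit'}, ['apple', 'banana'])

print([i['name'] for i in shared])   # both entries point at the same dict
print([i['name'] for i in copied])   # each entry keeps its own name
```

In the spider this would correspond to copying the item taken from `response.meta` inside the product loop before filling in `name`, `unit`, `price`, `link` and `image_urls`.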
Error downloading file from <https://qne.com.pk/../product_images/14768.jpg> referred in
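
The URL in this error still contains a `../` segment, which suggests the `src` attribute is a relative path that was joined to the domain without being resolved. A minimal sketch of resolving it against the page URL with the standard library follows; the page URL used here is a hypothetical example, not taken from the site, and the helper name is made up for illustration.

```python
from urllib.parse import urljoin

def absolute_image_url(page_url, src):
    """Resolve a possibly relative img src against the URL of the page it came from."""
    return urljoin(page_url, src)

# Hypothetical page URL; only the relative src comes from the error message.
page_url = "https://qne.com.pk/category/page.html"
src = "../product_images/14768.jpg"

print(absolute_image_url(page_url, src))
```

Inside a Scrapy callback, `response.urljoin(src)` does the same thing using the response's own URL as the base, so the value stored in `image_urls` could be built that way.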