Adding headers to Scrapy?

I wrote the following web-scraping code in Python/Scrapy:

# -*- coding: utf-8 -*-
import scrapy


class HousesearchspiderSpider(scrapy.Spider):
    name = "housesearchspider"
    # Picked up by Scrapy's UserAgentMiddleware for every request of this spider
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
    # Scrapy waits this long between requests, so no manual sleep() is needed
    download_delay = 10.0
    start_urls = [
        'https://www.website.com/filter1/filter2/',
    ]

    def parse(self, response):
        for detail in response.css('div.search-result-content'):
            yield {
                # A div with both classes is matched by chaining them with "."
                'price': detail.css('div.search-result-info.search-result-info-price ::text').get(),
                'size': detail.css('ul.search-result-kenmerken ::text').get(),
                'postcode': detail.css('small.search-result-subtitle ::text').get(),
                'street': detail.css('h2.search-result-title ::text').get(),
            }

        next_page = response.css('li.next a::attr(href)').get()

        if next_page is not None:
            next_page = response.urljoin(next_page)
            # No sleep() here: it would block Scrapy's event loop; download_delay throttles instead
            yield scrapy.Request(next_page, callback=self.parse)

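But I get blocked with that user_agent. I'd like to add headers and use yield scrapy.Request(url, headers=headers) so that the request looks exactly like one from a real browser (a bit like the Beautiful Soup version below, which works but is cumbersome):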

response = requests.get(url, headers=headers)

I can't find much documentation or many examples showing exactly how to include these headers in Scrapy. Can anyone help?

Answer by super_1987cl:

For the requests generated from start_urls, you can set USER_AGENT and DEFAULT_REQUEST_HEADERS in settings.py.
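A minimal sketch of what that could look like in a project's settings.py. The header values are illustrative placeholders (in practice you would copy them from your browser's developer tools); USER_AGENT is read by UserAgentMiddleware and DEFAULT_REQUEST_HEADERS by DefaultHeadersMiddleware.

```python
# settings.py (sketch): project-wide defaults applied to every request,
# including the ones Scrapy builds from start_urls.

# Read by UserAgentMiddleware; a spider's user_agent attribute overrides it.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
)

# Read by DefaultHeadersMiddleware; it only fills in headers a request
# does not already carry. These values are placeholders, not requirements.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```

Setting DEFAULT_REQUEST_HEADERS replaces Scrapy's built-in default dict rather than merging into it, so list every header you want. The same keys can also be scoped to a single spider through its custom_settings class attribute.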

For each request you yield from your own code, you can pass the headers keyword argument:

yield scrapy.Request(next_page, headers=your_headers, callback=self.parse)

scrapy.Request now has a cookies parameter. Don't put cookies in the headers, because the cookies middleware won't pick them up.

request_with_cookies = Request(url="http://www.example.com", cookies={'currency': 'USD', 'country': 'UY'})

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request
