Scrapy crawl ValueError

I'm new to Python and Scrapy. I followed a tutorial on scraping quotes.toscrape.com.

I typed the code exactly as it appears in the tutorial, but I keep getting ValueError: invalid hostname: when I run scrapy crawl quotes. I'm doing this in PyCharm on a Mac.

I tried both single and double quotes around the URL in start_urls = [], but that did not fix the error.

This is what the code looks like:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http: // quotes.toscrape.com /'
    ]

    def parse(self,response):
        title = response.css('title').extract()
        yield {'titletext':title}

It is supposed to scrape the site's title.

This is what the error looks like:

2019-11-08 12:52:42 [scrapy.core.engine] INFO: Spider opened
2019-11-08 12:52:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min),scraped 0 items (at 0 items/min)
2019-11-08 12:52:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-11-08 12:52:42 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http:///robots.txt>: invalid hostname: 
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py",line 44,in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.scraper] ERROR: Error downloading <GET http:///%20//%20quotes.toscrape.com%20/>
Traceback (most recent call last):
  File "/Users/newuser/PycharmProjects/ScrapyTutorial/venv/lib/python2.7/site-packages/scrapy/core/downloader/middleware.py",spider=spider)))
ValueError: invalid hostname: 
2019-11-08 12:52:42 [scrapy.core.engine] INFO: Closing spider (finished)
samdd answered: Scrapy crawl ValueError

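Do not put spaces in the URL. With the spaces, the URL has no host part, which is why the log shows a request to http:///robots.txt and the error reports an empty hostname.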

start_urls = [
    'http://quotes.toscrape.com/'
]
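
To see why the spaces matter, here is a minimal sketch using only the standard library rather than Scrapy itself. It assumes Python 3 (the traceback above comes from Python 2.7, where the equivalent module is urlparse), and the helper names host_of and remove_whitespace are made up for illustration. The space after "http:" stops the parser from recognizing the "//" that introduces the host, so the host comes back empty.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the network location (host[:port]) parsed from url, or '' if none."""
    return urlsplit(url).netloc


def remove_whitespace(url):
    """Drop every whitespace character from url (hypothetical helper)."""
    return "".join(url.split())


bad = 'http: // quotes.toscrape.com /'   # URL as written in the question
good = 'http://quotes.toscrape.com/'     # URL from the answer

print(repr(host_of(bad)))               # '' : no host was parsed
print(repr(host_of(good)))              # 'quotes.toscrape.com'
print(remove_whitespace(bad) == good)   # True
```

Stray spaces like these can be introduced by a code formatter or by copy and paste, so if the error returns it is worth rechecking the string literal itself.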