Try this:
Edit: also strip the IP before sending the request
import scrapy

# Read the IP list once, one IP per line (raw_input is Python 2; use input() on Python 3)
lists = open(raw_input('IP list file name: '), 'r').read().split('\n')

class jeffbullasSpider(scrapy.Spider):
    name = "iptohost"
    allowed_domains = ["api.hackertarget.com"]
    url = "https://api.hackertarget.com/reverseiplookup/?q={}"

    def start_requests(self):
        for ip in lists:
            # strip stray whitespace/newlines from the IP before building the URL
            yield scrapy.Request(url=self.url.format(ip.strip()), callback=self.parse)

    def parse(self, response):
        print(response.xpath('//body//text()').get())
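As a side note, hackertarget returns its quota messages (e.g. "API count exceeded - Increase Quota with Membership") in the response body with HTTP 200, so they end up in the scraped items just like real domains. A small guard in parse can skip them; a minimal sketch, where is_valid_result is a hypothetical helper, not part of the original spider:

```python
def is_valid_result(text):
    """Return True only for a genuine reverseiplookup answer (hypothetical helper).

    hackertarget reports quota errors in the body with a 200 status,
    so filter those messages out before yielding an item.
    """
    if not text:
        return False
    return 'API count exceeded' not in text
```

In parse you would then only print (or yield) the response text when is_valid_result(text) is true.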
I have another question. I added proxy rotation to Scrapy, and after saving the output with:

scrapy crawl iptohost -o some.json -t json &> some.text

the file contains not only my domain results but also the proxy log lines mixed in with them.

My output:
2019-11-10 10:39:50 [rotating_proxies.expire] DEBUG: Proxy <http://197.157.219.25:8080> is DEAD
2019-11-10 10:39:50 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://api.hackertarget.com/reverseiplookup/?q=61.112.2.178> with another proxy (failed 4 times,max retries: 5)
2019-11-10 10:39:50 [rotating_proxies.expire] DEBUG: Proxy <http://139.59.99.119:8080> is DEAD
2019-11-10 10:39:50 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://api.hackertarget.com/reverseiplookup/?q=195.11.184.130> with another proxy (failed 5 times,max retries: 5)
2019-11-10 10:39:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.hackertarget.com/reverseiplookup/?q=195.11.184.130> (referer: None)
2019-11-10 10:39:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.hackertarget.com/reverseiplookup/?q=185.179.235.40> (referer: None)
[u'capitalinstant.com']
[u'142.92.242.192']
[u'API count exceeded - Increase Quota with Membership']
[u'API count exceeded - Increase Quota with Membership']
[u'API count exceeded - Increase Quota with Membership']
How can I remove the proxy log lines so that I keep only my domain results? Thanks a lot.
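Those rotating_proxies DEBUG lines are Scrapy log output, which goes to stderr, while `&>` redirects stdout and stderr into the same file. One way to keep them apart (a sketch; the file names are placeholders) is to let the `-o` feed export write the items and send the log somewhere else:

```shell
# Items (the domain results) go to some.json via the feed exporter;
# the log goes to crawl.log instead of being mixed into the results.
scrapy crawl iptohost -o some.json --logfile crawl.log

# Alternatively, raise the log level so DEBUG lines are suppressed entirely:
scrapy crawl iptohost -o some.json -L INFO
```

Either way, some.json then contains only the exported items, not the proxy retry messages.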