Previously, I used the Selenium webdriver for web scraping on various sites. On a whim, I decided to try using rotating proxy IP addresses for scraping, since I wanted to learn what that was all about. I searched online, found this article, and decided to give it a try:
https://medium.com/ml-book/multiple-proxy-servers-in-selenium-web-driver-python-4e856136199d
But when I used it in my code, I couldn't even access any damn website. Even the 'get' statement doesn't work :(
I get this message in the Anaconda Spyder console.
Note: I took a screenshot and put it here:
Then I deleted the code I had copied from the article. Even then, my code would not stop connecting through the proxy!!!! It simply refuses to run without the proxy, as if my code were taking revenge on me.
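As far as I can tell, selenium's DesiredCapabilities.CHROME is a class-level dict shared by the whole interpreter session, and Spyder reuses the same console between runs, which might explain why the proxy sticks even after deleting the snippet. Here is a minimal stand-in sketch of that behavior using a plain class (not selenium itself; the address is hypothetical):

```python
class DesiredCapabilities:
    # stand-in mimicking selenium's pattern: CHROME is a class-level dict
    # shared by every caller in the same interpreter session
    CHROME = {"browserName": "chrome"}

# the article's snippet mutates that shared dict in place
DesiredCapabilities.CHROME["proxy"] = {
    "httpProxy": "203.0.113.7:3128",  # hypothetical proxy address
    "proxyType": "MANUAL",
}

# deleting the snippet from the script does not undo a mutation already
# applied in a long-lived console; the key has to be removed explicitly
DesiredCapabilities.CHROME.pop("proxy", None)
print("proxy" in DesiredCapabilities.CHROME)  # → False
```

So in a long-lived Spyder console, the 'proxy' entry apparently has to be popped (or the console restarted) before the driver stops using it.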
Here is my code:
import xlrd
import pandas as pd
import datetime as dt
import xlwings as xw
import sys
import math
import xlwt
from xlwt import Workbook
import openpyxl
from openpyxl import load_workbook
from collections import Counter
import shutil as shu
import os
import time
from selenium import webdriver
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy  # needed by RequestProxy() below
sz = ('Coast_Retail_-_Auto_Weekly_Update.xlsx')
sz1 = xlrd.open_workbook(sz)
sz2 = sz1.sheet_by_index(0)
hz='Coast_Retail_-_Auto_Weekly_Update.xlsx'
hz1=load_workbook(hz)
hz2=hz1.worksheets[0]
req_proxy = RequestProxy()  # you may get a different number of proxies each time you run this
proxies = req_proxy.get_proxy_list()  # builds the proxy list
Proxy = proxies[0].get_address()
webdriver.DesiredCapabilities.CHROME['proxy']={
"httpProxy":Proxy,"ftpProxy":Proxy,"sslProxy":Proxy,"proxyType":"MANUAL",}
d = webdriver.Chrome(executable_path=r'R:\Sulaiman\temp_code_vineet\nick\chromedriver.exe')
time.sleep(5)
d.get("https://tfl.compass.inovatec.ca")
time.sleep(5)
un = d.find_element_by_id("username")
un.send_keys("vpande")
pw = d.find_element_by_id("Password")
pw.send_keys("v123456A")
sb = d.find_element_by_class_name("red-btn")
sb.click()
time.sleep(5)
qz=[]
for i in range(4, sz2.nrows):
    try:
        if sz2.cell_value(i, 13) == "Booked":
            fn = sz2.cell_value(i, 0)
            ln = sz2.cell_value(i, 1)
            fun = fn + " " + ln
            sch = d.find_element_by_class_name("search")
            sch.send_keys(fun)
            sch.send_keys(u'\ue007')  # \ue007 is the Enter key (Keys.ENTER)
            time.sleep(5)
            d.find_element_by_xpath('//*[@id="body"]/section/div/div[2]/div[1]/div[2]/a[2]').click()
            time.sleep(5)
            x = d.find_element_by_xpath('/html/body/div[5]/section/div/div[2]/div/div[2]/div[2]/div[4]/span[2]').text
            print(x)
            qz.append(x)
            d.get("https://tfl.compass.inovatec.ca")
            time.sleep(5)
    except:
        print("err at " + str(i))
        pass
print(qz)
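For reference, the rotation part I was aiming for would look roughly like this once 'get' works again: pick a different proxy from the pool before each driver session and build the capability dict from it. A sketch with hypothetical addresses (the real pool would come from req_proxy.get_proxy_list() as in the article):

```python
import random

# hypothetical proxy pool; in the article this comes from
# req_proxy.get_proxy_list() and proxies[i].get_address()
proxy_pool = ["203.0.113.7:3128", "198.51.100.4:8080", "192.0.2.9:3128"]

def pick_proxy(pool):
    # rotate by picking a random proxy for each new driver session
    return random.choice(pool)

proxy = pick_proxy(proxy_pool)
caps = {
    "httpProxy": proxy,
    "ftpProxy": proxy,
    "sslProxy": proxy,
    "proxyType": "MANUAL",
}
print(caps["proxyType"])  # → MANUAL
```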