Scraping phone numbers from OLX ads

2024-04-24 • Q&A

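I'm trying to build a scraper for the OLX site (www.olx.pl) using requests and BeautifulSoup. Most of the data is no problem, but the phone number is hidden (you have to click it first). I watched the Network tab in Chrome's inspector while clicking it manually. There is an AJAX request carrying this: "?pt=5d1480fbad0a1f2006e865bfdf7a6fb07f244b82e17ab0ea4c5eaddc43f9da391b098e1926642564ffb781655d55be270c6913f7526a08298f43b24c0169636bb". This is the phoneToken, which can be found in the page source (it changes every time the page is loaded). I tried sending the same kind of request with the requests library, but the response was "000 000 000". I can get the phone number with Selenium, but it loads far too slowly.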

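The question is: is there a way to get around these secured phone tokens? Or, failing that, how can I speed up scraping the phone number with Selenium to, say, 1-2 seconds?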

Example ad: https://www.olx.pl/561666735

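Edit: Actually, I now get a response saying that my IP address has been blocked (but only with requests; the IP is not blocked when I load the page manually). Unfortunately, I made some changes and can no longer reproduce the code that returned "000 000 000". Here is part of my current code.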

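import requests

# Placeholder: the original `headers` dict is not shown in the question.
headers = {"User-Agent": "Mozilla/5.0"}

def scrap_phone(id):
    s = requests.Session()
    url = "https://www.olx.pl/{}".format(id)
    response = s.get(url, headers=headers)
    page_text = response.text
    # getting short id: the 5 characters that follow 'id':'
    index_of_short_id = page_text.index("'id':'")
    short_id = page_text[index_of_short_id:index_of_short_id+11].split("'")[-1]
    # getting phone token: the first quoted string after "phoneToken"
    index_of_token = page_text.index("phoneToken")
    phone_token = page_text[index_of_token+10:index_of_token+150].split("'")[1]
    url = "https://www.olx.pl/ajax/misc/contact/phone/{}".format(short_id)
    # The request seen in the browser carried the token as a query string
    # (?pt=...); here it is sent as POST form data.
    data = {
        'pt': phone_token
    }
    response = s.post(url, data=data, headers=headers)
    print(response.text)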

scrap_phone(540006276)
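
The fixed-offset slicing above breaks as soon as the short id or the token changes length. Below is a minimal sketch of the same extraction with regular expressions, plus a check for the all-zero placeholder reply ("000 000 000") mentioned earlier. The patterns only mirror the markers the code above searches for (`'id':'` and `phoneToken`); the real page markup is an assumption and may differ. The helper names `extract_short_id_and_token` and `is_placeholder_number` are hypothetical, and the sample string is made up for illustration.

```python
import re

# Assumed markers, taken from the slicing code in the question;
# the actual OLX page source may look different or change over time.
SHORT_ID_RE = re.compile(r"'id':'([^']+)'")
PHONE_TOKEN_RE = re.compile(r"phoneToken[^']*'([^']+)'")


def extract_short_id_and_token(page_text):
    """Return (short_id, phone_token); raise ValueError if either is missing."""
    id_match = SHORT_ID_RE.search(page_text)
    token_match = PHONE_TOKEN_RE.search(page_text)
    if id_match is None or token_match is None:
        raise ValueError("short id or phoneToken not found in page source")
    return id_match.group(1), token_match.group(1)


def is_placeholder_number(response_text):
    """True if the reply contains digits and all of them are zero."""
    digits = re.sub(r"\D", "", response_text)
    return bool(digits) and set(digits) == {"0"}


# Made-up snippet for illustration only, not real OLX markup.
sample = "var phoneToken = '5d1480fb'; ... {'id':'AbC12'}"
print(extract_short_id_and_token(sample))
print(is_placeholder_number("000 000 000"))
```

Raising ValueError when a marker is missing makes it easier to tell a changed page layout (or a block page) apart from a rejected token, which the original `str.index` calls would only report as a generic "substring not found".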


fishyuying answered: Scraping phone numbers from OLX ads
