I'm new to Python. I made a web-scraping script that works fine when I use up to about 80 URLs.
I added a progress bar, and it shows the script getting stuck after roughly 95 URLs in the loop. I tried running it with other website URLs from the list, but it still gets stuck.
Does anyone have a solution for this?
Here is the script:
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup
from requests.exceptions import ChunkedEncodingError, ConnectionError, MissingSchema
from tqdm import tqdm
from urllib3.exceptions import HTTPError as BaseHTTPError

data = pd.read_excel(r'Z:\001\input.xlsx')
urllist = data['IDE_WEBSITE'].tolist()
wordlist = ('woord1', 'woord2')
results = []
errors = []

for word in wordlist:
    for url in tqdm(urllist):
        try:
            r = requests.get(url, allow_redirects=False)
            soup = BeautifulSoup(r.content.lower(), 'lxml')
            words = soup.find_all(text=lambda text: text and word.lower() in text)
            count = len(words)
            time.sleep(1)
            if count > 0:
                result = {'url': url, 'count': count, 'the_word': word}
                results.append(result)
        except ConnectionError:
            error1 = {'url': url, 'error': 'connection_error'}
            errors.append(error1)
            urllist.remove(url)
            continue
        except BaseHTTPError:
            error2 = {'url': url, 'error': 'base_error'}
            errors.append(error2)
            urllist.remove(url)
            continue
        except ChunkedEncodingError:
            error3 = {'url': url, 'error': 'encoding_error'}
            errors.append(error3)
            urllist.remove(url)
            continue
        except MissingSchema:
            urllist.remove(url)
            continue

df_errors = pd.DataFrame(errors)
print(errors)
print(urllist)
df_results = pd.DataFrame(results)
df_results.to_excel(r'Z:\001_Personal\results_nmb_run3.xlsx', index=False, header=True)
df_errors.to_excel(r'Z:\001_Personal\errors_nmb_run3.xlsx', header=True)
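Two things in this loop are worth checking. First, `requests.get` is called without a `timeout`, so a single server that accepts the connection but never responds will block the script forever, which would look exactly like the progress bar freezing. Second, `urllist.remove(url)` mutates the very list that `tqdm(urllist)` is iterating over, which silently skips elements and throws off the progress bar's total. The sketch below (using hypothetical placeholder strings, not real URLs) illustrates the mutation problem:

```python
# Illustration only: removing items from a list while iterating
# over it skips the element that slides into the removed slot.
urls = ['bad1', 'ok1', 'bad2', 'ok2']
visited = []

for u in urls:
    visited.append(u)
    if u.startswith('bad'):
        urls.remove(u)  # shrinks the list mid-iteration

print(visited)  # ['bad1', 'bad2'] -- 'ok1' and 'ok2' were never visited
print(urls)     # ['ok1', 'ok2']

# Safer pattern: iterate over a copy, so removals don't affect
# the sequence being walked.
urls = ['bad1', 'ok1', 'bad2', 'ok2']
visited = []
for u in list(urls):
    visited.append(u)
    if u.startswith('bad'):
        urls.remove(u)

print(visited)  # ['bad1', 'ok1', 'bad2', 'ok2'] -- every item seen

# For the hang itself, passing a timeout makes a dead server raise
# requests.exceptions.Timeout instead of blocking indefinitely, e.g.:
#   r = requests.get(url, allow_redirects=False, timeout=10)
```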