Using Beautiful Soup to scrape all the questions a person has answered on Quora

How can I write Beautiful Soup code to scrape all the questions that a specific user has answered?

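Input:
The author's URL
Example: https://www.quora.com/profile/AUTHOR/answers

Output:
Column 1: The question the author answered
Example: "Lorem ipsum question"

Column 2: URL of the answered question
Example: https://www.quora.com/lorem-ipsum-question

Column 3: URL of the answered question
Example: https://www.quora.com/lorem-ipsum-question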
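
The three-column layout described above could be written out with Python's standard csv module. This is only a minimal sketch: the header names (question, column_2_url, column_3_url) and the write_rows helper are hypothetical, and the single row reuses the placeholder values from the question rather than real scraped data.

```python
import csv
import io


def write_rows(rows, out):
    """Write (question, url, url) tuples as CSV, preceded by a header row."""
    writer = csv.writer(out)
    # Header names are assumptions; the question only numbers the columns.
    writer.writerow(["question", "column_2_url", "column_3_url"])
    writer.writerows(rows)


# Placeholder row built from the example values in the question.
rows = [(
    "Lorem ipsum question",
    "https://www.quora.com/lorem-ipsum-question",
    "https://www.quora.com/lorem-ipsum-question",
)]

buffer = io.StringIO()  # swap for open("answers.csv", "w", newline="") to write a file
write_rows(rows, buffer)
print(buffer.getvalue())
```

Writing to any file-like object keeps the helper independent of how the rows were collected, so it can be fed by either of the scraping approaches in the answers.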

Answers

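This script prints all the answered questions and their URLs found on the page. The page also uses infinite scrolling, which sends POST requests to https://www.quora.com/graphql/gql_para_POST?q=UserProfileAnswersMostRecent_RecentAnswers_Query, but I was not able to get data from that endpoint (you can see the requests in Developer Tools -> Network tab):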

import re
import json
import requests


url = 'https://www.quora.com/profile/Nana-Bello-Shehu/answers'
html_data = requests.get(url).text

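# The query results are embedded in an inline script as a JSON string.
# Take the last match and decode twice: first the string literal, then the document.
d = re.findall(r'window\.ansFrontendGlobals\.data\.inlineQueryResults\.results\[".*?"\] = ("{.*}");', html_data)[-1]
d = json.loads(json.loads(d))

for e in d['data']['user']['recentPublicAndPinnedAnswersConnection']['edges']:
    # Skip entries that are not answers
    if e['node']['__typename'] != 'Answer':
        continue

    # The question title is itself JSON-encoded rich text
    q = json.loads(e['node']['question']['title'])
    title = q['sections'][0]['spans'][0]['text']
    u = 'https://www.quora.com' + e['node']['question']['url']
    print('{:<90} {}'.format(title, u))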

Prints:

Do pictures speak louder than words?                                                       https://www.quora.com/Do-pictures-speak-louder-than-words
Does true love exist?                                                                      https://www.quora.com/Does-true-love-exist-8
What picture made your blood boil?                                                         https://www.quora.com/What-picture-made-your-blood-boil
What are the before and after pics of people who are drug addicts for several years?       https://www.quora.com/What-are-the-before-and-after-pics-of-people-who-are-drug-addicts-for-several-years
What was the funniest thing you saw/heard today?                                           https://www.quora.com/What-was-the-funniest-thing-you-saw-heard-today
Are there any truly selfless acts, motives, or people?                                     https://www.quora.com/Are-there-any-truly-selfless-acts-motives-or-people
Which famous person in history who is idolized, was actually a horrible person?            https://www.quora.com/Which-famous-person-in-history-who-is-idolized-was-actually-a-horrible-person
What is something that you read recently and is worth sharing?                             https://www.quora.com/What-is-something-that-you-read-recently-and-is-worth-sharing
How do I get the attention of my crush?                                                    https://www.quora.com/How-do-I-get-the-attention-of-my-crush
What are some heart touching stories of best friends?                                      https://www.quora.com/What-are-some-heart-touching-stories-of-best-friends
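
The double json.loads call in the script can be tried offline on a small stand-in for the page. The sketch below is an illustration only: sample_html is a hypothetical miniature of the inline script (the real page is far larger and its structure may change), and the title_text helper, which joins every span instead of taking only the first, rests on the assumption that a title can be split across several spans.

```python
import json
import re

PATTERN = r'window\.ansFrontendGlobals\.data\.inlineQueryResults\.results\[".*?"\] = ("{.*}");'


def title_text(raw_title):
    """Decode the JSON-encoded title and join the text of every span."""
    rich = json.loads(raw_title)
    return "".join(
        span["text"] for section in rich["sections"] for span in section["spans"]
    )


def extract_rows(html):
    """Return (title, url) pairs for every Answer node in the embedded payload."""
    raw = re.findall(PATTERN, html)[-1]
    data = json.loads(json.loads(raw))  # string literal first, then the document
    edges = data["data"]["user"]["recentPublicAndPinnedAnswersConnection"]["edges"]
    return [
        (title_text(e["node"]["question"]["title"]),
         "https://www.quora.com" + e["node"]["question"]["url"])
        for e in edges
        if e["node"]["__typename"] == "Answer"
    ]


# Hypothetical miniature payload, shaped like the keys the script reads.
payload = {"data": {"user": {"recentPublicAndPinnedAnswersConnection": {"edges": [
    {"node": {"__typename": "Answer", "question": {
        "url": "/Does-true-love-exist-8",
        "title": json.dumps({"sections": [{"spans": [
            {"text": "Does true "}, {"text": "love exist?"}]}]}),
    }}},
    {"node": {"__typename": "Post"}},  # non-answer entries are skipped
]}}}}
sample_html = (
    '<script>window.ansFrontendGlobals.data.inlineQueryResults.results["k"] = '
    + json.dumps(json.dumps(payload)) + ';</script>'
)

rows = extract_rows(sample_html)
print(rows)
```

Keeping the parsing in a function that takes an HTML string makes it possible to check the regular expression and the decoding steps without sending a request to Quora.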

I think the simplest approach is to use Selenium:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Firefox(service=Service(executable_path='c:/program/geckodriver.exe'))
url = 'https://www.quora.com/profile/Nana-Bello-Shehu/answers'

driver.get(url)

SCROLL_TIME = 2  # seconds to wait after each scroll for new content to load

last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll to the bottom so that the next batch of answers is loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_TIME)

    # Stop once the page height no longer grows
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

qbox = driver.find_elements(By.CSS_SELECTOR, '.qu-pb--medium')
for qb in qbox:
    print(qb.find_element(By.CSS_SELECTOR, 'span.qu-userSelect--text').text)
    # href is already an absolute URL, so no prefix is added.
    # In the output below this selector matched the author's profile link.
    print(qb.find_element(By.CSS_SELECTOR, 'a.q-box.qu-cursor--pointer.qu-hover--textDecoration--underline').get_attribute('href'))
    print('\n')

Output:

Do pictures speak louder than words?
https://www.quora.com/profile/Nana-Bello-Shehu


Does true love exist?
https://www.quora.com/profile/Nana-Bello-Shehu


What picture made your blood boil?
https://www.quora.com/profile/Nana-Bello-Shehu


What are the before and after pics of people who are drug addicts for several years?
https://www.quora.com/profile/Nana-Bello-Shehu


What was the funniest thing you saw/heard today?
https://www.quora.com/profile/Nana-Bello-Shehu


Are there any truly selfless acts, motives, or people?
https://www.quora.com/profile/Nana-Bello-Shehu

and so on...

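This script scrolls to the end of the page and collects all the questions. You can try lowering SCROLL_TIME to make the script run faster, but with a short scroll time the script will sometimes stop before it has reached the end of the page.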
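
One way to reduce the risk of stopping too early is to require several unchanged height readings in a row before giving up, instead of just one. The sketch below is not part of the original answer: scroll_until_stable, max_stalls and the FakePage class are hypothetical names, and the function takes plain callables so it can be tried without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_down, pause=2.0, max_stalls=3,
                        sleep=time.sleep):
    """Scroll until the height is unchanged for max_stalls consecutive checks."""
    last_height = get_height()
    stalls = 0
    while stalls < max_stalls:
        scroll_down()
        sleep(pause)
        new_height = get_height()
        if new_height == last_height:
            stalls += 1          # nothing new loaded yet, try again
        else:
            stalls = 0           # page grew, reset the counter
            last_height = new_height
    return last_height


class FakePage:
    """Stand-in for a slow page: the height sometimes lags one scroll behind."""

    def __init__(self, heights):
        self.heights = heights
        self.i = 0

    def height(self):
        return self.heights[min(self.i, len(self.heights) - 1)]

    def scroll(self):
        self.i += 1


page = FakePage([100, 200, 200, 300])
final_height = scroll_until_stable(page.height, page.scroll, pause=0,
                                   max_stalls=2, sleep=lambda seconds: None)
print(final_height)

# With Selenium the callables would wrap the same two scripts used above:
# scroll_until_stable(
#     lambda: driver.execute_script("return document.body.scrollHeight"),
#     lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
# )
```

On the fake page a loop that stops at the first unchanged reading would end at height 200, while the tolerant version continues to 300. The trade-off is a few extra pauses at the real end of the page.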

Notes:

  1. You need Selenium.
  2. You need Firefox.
  3. You need geckodriver. The script currently loads it from c:/program/geckodriver.exe, so if you put geckodriver in a different location you need to change executable_path.
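
If geckodriver is already on the system PATH, its location can be looked up instead of hard-coded. This is a small sketch using only the standard library; resolve_geckodriver is a hypothetical helper name, and whether the lookup finds anything depends on the local installation.

```python
import shutil


def resolve_geckodriver(explicit_path=None):
    """Return explicit_path if given, else geckodriver's location on PATH (or None)."""
    if explicit_path:
        return explicit_path
    return shutil.which("geckodriver")


# An explicit path always wins; otherwise PATH is searched.
print(resolve_geckodriver("c:/program/geckodriver.exe"))
print(resolve_geckodriver())
```

The returned value can be used wherever the script expects executable_path; a result of None means geckodriver was not found and a path has to be supplied by hand.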