Extracting abstracts from https://ash.confex.com/ash/2019/webprogram/start.html with Selenium in Python

2024-05-07 • Q&A

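I am trying to extract the full text of every abstract on https://ash.confex.com/ash/2019/webprogram/start.html. I have 100 keywords, such as adoptive cell immunotherapy, adoptive cell therapy, allogeneic, autologous, artificial T cell receptor, BCMA, TACI, CD123, CD19, and CD20, stored in an Excel file.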

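I can enter one keyword with Selenium, but I need to run through all the keywords one at a time, then open each abstract and collect the title, authors, affiliations, date, background, methods, results, and conclusion, and save them to an Excel file.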
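For the per-abstract fields, one possible way to split an abstract's text into background, methods, results, and conclusion is sketched below. It is only a sketch under assumptions: it assumes the abstract text labels its sections with headings such as "Background:" or "Conclusions:", which has not been checked against the ASH pages, and `split_sections` and `SECTION_LABELS` are hypothetical names introduced here.

```python
import re

# Assumed section labels; the real abstracts may use different headings.
SECTION_LABELS = ["Background", "Methods", "Results", "Conclusion"]

# Matches a label (optionally with a trailing "s", e.g. "Conclusions")
# followed by a colon, ignoring case.
_LABEL_RE = re.compile(
    r"\b(" + "|".join(SECTION_LABELS) + r")s?\s*:",
    flags=re.IGNORECASE,
)


def split_sections(abstract_text):
    """Split abstract text into a dict keyed by section label.

    Labels that do not occur in the text map to an empty string.
    If a label occurs more than once, the last occurrence wins.
    """
    sections = {label: "" for label in SECTION_LABELS}
    matches = list(_LABEL_RE.finditer(abstract_text))
    for i, match in enumerate(matches):
        # Normalise "methods" / "METHODS" to the canonical "Methods".
        label = match.group(1).capitalize()
        start = match.end()
        # A section runs until the next label, or to the end of the text.
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(abstract_text)
        sections[label] = abstract_text[start:end].strip()
    return sections


if __name__ == "__main__":
    # Made-up text for illustration only, not taken from a real abstract.
    example = "Background: Some context. Methods: What was done."
    print(split_sections(example))
```

A label-based split like this will misfire when a phrase such as "the following results:" appears in running text, so the output is worth spot-checking on a few abstracts before running all 100 keywords.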

    import webbrowser
    import os
    import requests
    from bs4 import BeautifulSoup
    import sys
    import wget
    import pandas as pd
    import time
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    # Selenium 3 style; newer Selenium versions take the driver path through a Service object
    driver = webdriver.Chrome('D:\\crome drive\\chromedriver.exe')
    driver.get('https://ash.confex.com/ash/2019/webprogram/start.html')
    # Type the search term into the search box, then submit the form
    driver.find_element(By.ID, "words").send_keys("CAR-T")
    driver.find_element(By.NAME, "submit").click()
    #driver.find_element_by_tag_name("resulttitle")
    #driver.find_element_by_class_name("a")

    # Parse the results page and select every result title block
    soup_level1 = BeautifulSoup(driver.page_source, 'lxml')
    #fl=soup_level1.find_all(class_='soup_level1')
    results = soup_level1.find_all('div', attrs={'class': 'resulttitle'})
    #links = [x.get_attribute('href') for x in driver.find_elements_by_link_text('View Abstract')]
    #htmls = []
    mn = []

    # Keep the links inside each result title whose href contains 'Session'
    for result in results:
        for link in result.find_all('a', href=True):
            if 'Session' in link['href']:
                mn.append('https://ash.confex.com/ash/2019/webprogram/' + link['href'])


    # response = requests.post('https://ash.confex.com/ash/2017/webprogram/Session11552.html')
    # soup = BeautifulSoup(response.text)
    #
    # list_of_attributes = {"class": "cricon"}  # A list of attributes that you want to check in a tag
    # tag1 = soup.findAll('div',attrs=list_of_attributes)

    # Save the collected links to Excel (raw string so the backslashes are kept literally)
    dt = pd.DataFrame()
    dt['Main Links'] = mn
    dt.to_excel(r'D:\Ash 2019\Ash_Main_links2.xlsx')

I get empty results.
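To make the link-collection step easier to debug, it may help to separate it from the browser. The sketch below is not a confirmed fix: it takes a results page as an HTML string and returns absolute URLs for every link found inside a `div` with class `resulttitle` (the class name comes from the code above). It uses only the standard library so it runs without a browser. `ResultLinkParser` and `extract_result_links` are hypothetical names, and the `Paper123.html`-style hrefs in the demo are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://ash.confex.com/ash/2019/webprogram/"


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="resulttitle">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._div_depth = 0  # greater than 0 while inside a resulttitle div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Track nesting so inner divs do not end the result block early.
            if self._div_depth or "resulttitle" in classes:
                self._div_depth += 1
        elif tag == "a" and self._div_depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1


def extract_result_links(html, base_url=BASE_URL):
    """Return absolute URLs for all links inside resulttitle blocks."""
    parser = ResultLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


if __name__ == "__main__":
    # Invented sample markup; the real page structure is an assumption.
    sample = '<div class="resulttitle"><a href="Paper123.html">Title</a></div>'
    print(extract_result_links(sample))
```

In the Selenium flow, `driver.page_source` could be passed to `extract_result_links` once per keyword. Unlike the code above, this sketch does not filter on `'Session'`. One thing worth checking is whether the hrefs on the results page actually contain that string, because if they do not, the filter drops every link.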


tanxx answered: Extracting abstracts from https://ash.confex.com/ash/2019/webprogram/start.html with Selenium in Python
