I can't parse a page and get its links with Nokogiri

2024-06-02 • Q&A

I can't parse https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/ with Nokogiri to get the list of links.

What am I doing wrong?

links = Nokogiri::HTML('https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/')

or

links = Nokogiri::XML('https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/')

--->

#(Document:0x3fcdda1b988c {
  name = "document",children = [
    #(DTD:0x3fcdda1b5b24 { name = "html" }),#(Element:0x3fcdda1b46fc {
      name = "html",children = [
        #(Element:0x3fcdda1b0804 {
          name = "body",children = [
            #(Element:0x3fcdda1ac920 {
              name = "p",children = [ #(Text "https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/")]
              })]
          })]
      })]
  })

puts links.to_html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/</p></body></html>
=> nil

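This cannot work, because the whole page is built with JavaScript. The body of the document contains nothing but a script tag. Open the page source or look at the raw response, rather than only the rendered DOM in the web inspector/developer tools. Note also that Nokogiri::HTML parses the string it is given and does not fetch URLs, which is why the output above is just the URL wrapped in a <p> element; the page has to be downloaded first and its body passed to the parser.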

view-source:https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/

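Nokogiri is only an HTML parser, not a browser, so it does not run JavaScript. You could use a headless browser such as phantom.js, but you probably just want to find the API that provides the data you need. A web scraper is usually the wrong answer to any problem.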
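As one way to act on "find the API": the index page appears to fill itself in from the storage bucket's XML listing, so the file names can be read from that XML directly. The following is a minimal sketch, assuming the bucket root accepts a prefix query parameter and answers with Contents/Key elements (check the raw response before relying on this). The helper names listing_url and listing_keys are hypothetical, the sample XML is hand-written rather than a captured response, and REXML is used only so the sketch runs without extra gems; Nokogiri::XML would do the same job.

```ruby
require 'rexml/document'
require 'uri'

STORAGE = 'https://chromedriver.storage.googleapis.com/'

# Build the listing URL for one version directory (assumed query format).
def listing_url(version)
  "#{STORAGE}?#{URI.encode_www_form(prefix: "#{version}/")}"
end

# Collect the text of every <Key> under <Contents> in a listing document.
# Element names are compared directly, so the default XML namespace on
# the root element does not get in the way.
def listing_keys(xml)
  root = REXML::Document.new(xml).root
  root.elements.to_a
      .select { |el| el.name == 'Contents' }
      .flat_map { |el| el.elements.to_a.select { |child| child.name == 'Key' } }
      .map(&:text)
end

# Hand-written sample in the assumed response shape.
SAMPLE = <<~XML
  <?xml version='1.0' encoding='UTF-8'?>
  <ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
    <Name>chromedriver</Name>
    <Contents><Key>79.0.3945.36/chromedriver_linux64.zip</Key></Contents>
    <Contents><Key>79.0.3945.36/chromedriver_mac64.zip</Key></Contents>
  </ListBucketResult>
XML

keys = listing_keys(SAMPLE)
links = keys.map { |key| STORAGE + key }
puts links
```

Against the live bucket, the XML would come from downloading listing_url('79.0.3945.36') (for instance with open-uri) and passing the body to listing_keys, with no browser involved.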

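For example, I found a more interesting solution :)

require 'open-uri'
require 'nokogiri'

# page is the Capybara session, after the headless browser has rendered the listing
link_driver = Nokogiri::HTML(page.source).at('a:contains("mac")')['href']
chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
# binwrite keeps the zip intact on every platform
File.binwrite('filename.zip', URI.parse(chromedriver_storage + link_driver).read)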

contains("mac") can be changed to contains("linux") or contains("win"); it makes no difference, just pick the build for whichever operating system you need.
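If the same script has to run on several machines, the search word could be derived from the host instead of being edited by hand. This is only a sketch: the helper names platform_keyword and link_selector are hypothetical, and the patterns matched against RbConfig's host_os value are assumptions about what typical Ruby builds report.

```ruby
require 'rbconfig'

# Map a host OS string (as found in RbConfig::CONFIG['host_os']) to the
# word used inside a:contains(...). "darwin" is tested first because it
# also contains the letters "win".
def platform_keyword(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /darwin|mac os/i then 'mac'
  when /linux/i then 'linux'
  when /mswin|mingw|cygwin/i then 'win'
  else raise ArgumentError, "unsupported platform: #{host_os}"
  end
end

# Build the CSS selector for the download link of the given platform.
def link_selector(host_os = RbConfig::CONFIG['host_os'])
  %(a:contains("#{platform_keyword(host_os)}"))
end

puts link_selector('darwin19')
```

Called without an argument, link_selector uses the platform Ruby itself is running on, and the result can be passed to Nokogiri's at method in place of the hard-coded selector.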

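And a second solution: parse the chromedriver.chromium.org page and collect information about all versions. If the version on the site is newer than mine, I substitute the new version number into the string and download it.

require 'open-uri'

chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
chromedriver = '79.0.3945.36/' # obtained with Capybara, trimmed down to just the version
zip = 'chromedriver_mac64.zip'
link = chromedriver_storage + chromedriver + zip
File.binwrite('filename.zip', URI.parse(link).read)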
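The "is the site version newer than mine" check is easy to get wrong with plain string comparison, since "79.0.3945.100" sorts before "79.0.3945.36" as text. A small sketch of one way to do it, using Gem::Version from RubyGems to compare the dotted numbers segment by segment; newer? and download_link are hypothetical helper names, and the site version below is an illustrative value rather than one read from the page.

```ruby
require 'rubygems' # provides Gem::Version; loaded by default in modern Ruby

STORAGE_URL = 'https://chromedriver.storage.googleapis.com/'

# True when the version found on the site is newer than the local one.
# Gem::Version compares numerically, segment by segment.
def newer?(site_version, local_version)
  Gem::Version.new(site_version) > Gem::Version.new(local_version)
end

# Assemble the download link from the version and the archive name.
def download_link(version, zip = 'chromedriver_mac64.zip')
  "#{STORAGE_URL}#{version}/#{zip}"
end

local = '79.0.3945.36'
site = '80.0.3987.16' # illustrative value, not scraped
link = download_link(site) if newer?(site, local)
puts link
```

When the site version is not newer, link stays nil and nothing needs to be downloaded.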

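It turns out that the parser, running in headless mode, can be put into a crontab job to keep the driver up to date with the current browser version.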

f19870215 answered: I can't parse a page and get its links with Nokogiri
