I am parsing a web page with Nokogiri and want to extract the image URLs. This is my setup:
require 'nokogiri'
require 'open-uri'
# URI.open is used because Kernel#open no longer handles URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open('https://themeforest.net/search?sort=sales'))
The target site contains the following markup:
<div class="_2_3rp " style="padding-top:50.847457627118644%">
<div style="">
<img class="_1xvs1" src="https://themeforest.img.customer.envatousercontent.com/files/274559780/screenshots/00-Preview.jpg?auto=compress%2Cformat&fit=crop&crop=top&w=590&h=300&s=37354d884fd0f3b574238e013b4ea423"
title="Avada | Responsive Multi-Purpose Theme"
alt="Avada | Responsive Multi-Purpose Theme" style="left: 0%;">
</div>
</div>
However, when I run:
puts doc.search("//div[@class = '_2_3rp ']")
I get the following:
<div class="_2_3rp " style="padding-top:50.847457627118644%"><div style="height:100%" class="lazyload-placeholder"></div></div>
<div class="_2_3rp " style="padding-top:50.847457627118644%"><div style="height:100%" class="lazyload-placeholder"></div></div>
.....
=> nil
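Why do I get the lazyload-placeholder div instead of the img element with class _1xvs1? Is there any way to work around this and get past the image placeholder to the actual image?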