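I am trying to scrape the Taobao website using Scrapy-sharp. I can log in (filling in the username and password), but it then goes to a page that shows a "slider captcha", as shown in the image below.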
I know this can be solved with the puppeteer library, but with puppeteer I am running into the same problems other people have reported (see https://stackoverflow.com/posts/comments/103786166?noredirect=1).
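My question is: is there a way to solve the slider captcha with Scrapy-sharp and HtmlAgilityPack? How can we get the bounding box of the slider, and then trigger mouse events from Scrapy-sharp?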
My scraping code is as follows:
ScrapingBrowser Browser = new ScrapingBrowser();
Browser.AllowAutoRedirect = true;
// Browser has settings you can access in setup
Browser.AllowMetaRedirect = true;
WebPage PageResult = Browser.NavigateToPage(new Uri("https://login.m.taobao.com/login_oversea.htm?loginFrom=wap_tmall&assets_js=mui%2Ffeloader%2F4.0.22%2Ffeloader-min.js,mui%2Ftmapp-standalone%2F4.0.3%2Fseed.js,mui%2Ftmapp-standalone%2F4.0.3%2Flogin-download.js&assets_css=3.0.8%2Fmobile%2Ftmallh5.css&redirectURL=https%3A%2F%2Fwww.tmall.com%2F"));
PageWebForm form = PageResult.FindFormById("loginForm");
form["TPL_username"] = "<<someusername>>";
form["TPL_password"] = "********";
form.Method = HttpVerb.Post;
WebPage resultsPage = form.Submit();
PageWebForm searchForm = resultsPage.FindForm("searchTop");
searchForm.Method = HttpVerb.Post;
searchForm["q"] = "nike";
//subsequent pages
//var postResults = searchForm.Submit(new Uri(@"https://list.tmall.com/m/search_items.htm?page_size=20&page_no=3&q=Nike&type=p&tmhkh5=&spm=a220m.6910245.a2227oh.d100&from=mallfp..m_1_searchbutton&searchType=&closedKey="));
//1st page//
var postResults= searchForm.Submit(new Uri(@"https://list.tmall.com/search_product.htm?q=nike&type=p&tmhkh5=&spm=a220m.8599659.a2227oh.d100&from=mallfp..m_1_searchbutton&searchType=default&closedKey="));
//PageWebForm verForm = resultsPage.FindFormById("verifyForm");
//verForm.Method = HttpVerb.Post;
//verForm.action = "https://passport.taobao.com/iv/h5/h_5_verify_modes.htm";
//WebPage postResults = verForm.Submit();
//var divs = JsonConvert.DeserializeObject<RootObject>(postResults.Content);
var divs = postResults.Html.SelectNodes("//div[@class='product ']");
However, after the form is submitted, I am redirected to a page with the slider captcha. Any tips or suggestions on how to get past this?