Scrape Fandango Node.js

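This is my first attempt at scraping with Node.js. The movie list on Fandango is nested inside several divs, so I assumed I would have to do something like this: $('div[id="page"]').find('div > div > div > div > ul > li').each. But when I log the HTML to the console, it looks different from what I see when inspecting the page in Chrome. Some movies are missing, and the ul class name in the log is different. Is this normal?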

 const axios = require('axios');
 const cheerio = require('cheerio');

 const url = 'https://www.fandango.com/movies-in-theaters';

 axios(url)
   .then(response => {
     const html = response.data;
     console.log(html);
     const $ = cheerio.load(html);
     const movies = $('ul.browse-movielist > li');

     const openingThisWeek = [];

     movies.each(function () {
       console.log("Found the list");   // this doesn't get called
       // Chain the classes with dots; separated by spaces they are read as descendant tag names.
       const title = $(this).find('.heading-style-1.browse-movielist--title.poster-card--title').text();
       openingThisWeek.push({ title });
     });

     console.log(openingThisWeek);
   })
   .catch(console.error);
huakaijianyueming answered: Scrape Fandango Node.js

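Fandango renders the opening-movies list on the client side. axios only downloads the initial HTML and never runs the page's JavaScript, so the list is not in the response and cheerio has nothing to select. That is why the logged HTML differs from what Chrome's inspector shows.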
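One way to confirm this is to check whether the class you are selecting appears anywhere in the raw response. The following is a minimal sketch using only plain string matching; `hasClass` is a hypothetical helper, and the two sample strings (including the `js-movie-list` class) are made up for illustration and are not Fandango's real markup.

```javascript
// Returns true when `html` contains a class attribute that includes
// `className` as a whole token. A rough string check, not an HTML parser.
function hasClass(html, className) {
  const classAttr = /class\s*=\s*["']([^"']*)["']/g;
  let match;
  while ((match = classAttr.exec(html)) !== null) {
    if (match[1].split(/\s+/).includes(className)) return true;
  }
  return false;
}

// Hypothetical samples: what a server might send vs. what the browser ends up showing.
const serverHtml = '<div id="page"><div class="js-movie-list"></div></div>';
const renderedHtml = '<ul class="browse-movielist"><li>...</li></ul>';

console.log(hasClass(serverHtml, "browse-movielist"));   // false
console.log(hasClass(renderedHtml, "browse-movielist")); // true
```

Passing `response.data` from the axios call to a check like this should tell you whether the list is present before any JavaScript runs.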

An alternative is to scrape the data with a headless browser, which does execute the page's scripts. I'm using puppeteer:

const puppeteer = require("puppeteer");
const cheerio = require("cheerio");

(async () => {
  const url = "https://www.fandango.com/movies-in-theaters";

  // Launch a headless browser so the page's JavaScript actually runs.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  // Wait until the client-side code has rendered at least one list item.
  await page.waitForSelector(".browse-movielist > li");
  // Grab the rendered DOM, then hand it to cheerio as before.
  const body = await page.evaluate(() => document.body.outerHTML);
  await browser.close();
  const $ = cheerio.load(body);
  const movies = [];
  $(".browse-movielist > li").each((i, item) => {
    const $item = $(item);
    const title = $item.find(".poster-card--title").text().trim();
    movies.push({
      title
    });
  });
  console.log(movies);
})();
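
The cheerio step is optional, since puppeteer can run the extraction inside the page itself. Here is a minimal sketch of that variant, assuming the same selectors as the code above still match the site. `extractTitles` is a hypothetical helper, and `fakePage` is a stand-in object so the sketch runs without launching a browser; with a real puppeteer page, `waitForSelector` throws if nothing matches before its timeout.

```javascript
// Collects trimmed title text from a puppeteer-like `page` object.
// The default selector is an assumption based on the code above.
async function extractTitles(page, selector = ".browse-movielist > li .poster-card--title") {
  // Wait until client-side rendering has produced at least one match.
  await page.waitForSelector(selector);
  // The callback runs in the browser; only the returned strings come back to Node.
  return page.$$eval(selector, (nodes) => nodes.map((n) => n.textContent.trim()));
}

// Stand-in for a real page, for illustration only.
const fakePage = {
  waitForSelector: async () => {},
  $$eval: async (_selector, fn) => fn([{ textContent: "  Movie A " }, { textContent: "Movie B" }]),
};

extractTitles(fakePage).then((titles) => console.log(titles)); // [ 'Movie A', 'Movie B' ]
```

In the script above you would call `await extractTitles(page)` before `browser.close()` instead of reading `document.body.outerHTML`.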