Getting information from a scraped page with Cheerio

2024-05-02 • Q&A

I'm working on a Node web-scraping server built with Puppeteer and Cheerio.

My front end is a React app that plays music through the SoundCloud widget, but the widget API only works with proper SoundCloud URLs.

So I'm adding an input to the UI that sends a request through the scraper to SoundCloud's search page and takes the results from there.
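The search text arrives as free-form user input, so it has to be URL-encoded before it goes into the search URL. A minimal sketch using the built-in `encodeURIComponent`; the helper name `buildSearchUrl` is hypothetical:

```javascript
// Hypothetical helper: turn free-form user input into a SoundCloud search URL.
// encodeURIComponent escapes every space as %20 and also escapes characters
// such as & and # that would otherwise break the query string.
function buildSearchUrl(query) {
  return `https://soundcloud.com/search?q=${encodeURIComponent(query)}`;
}

const input = 'goo goo dolls';
// With a string pattern, replace() only swaps the first match:
console.log(input.replace(' ', '%20')); // goo%20goo dolls
console.log(buildSearchUrl(input));     // https://soundcloud.com/search?q=goo%20goo%20dolls
```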

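Since I only care about the URL and the song name, getting

$('a.soundTitle__title')

from the scraped page should be enough: the name is held in the child span tag (in its "class" attribute), and the URL is the href.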

The problem is that for the span / song name I get Cheerio's default output:

{
  options: {
    withDomLvl1: true,
    normalizeWhitespace: false,
    xml: false,
    decodeEntities: true
  },
  length: 0,
  prevObject: {
    options: {
      withDomLvl1: true,
      decodeEntities: true
    }
  }
}

and "undefined" for the href URL.

Even when I try .text() / .attr('class') / etc., I get undefined, this same default response, or the error "text() is not a function".

Here is my code:

... req

const addaptReq = encodeURIComponent(req.text); // encodes every space, not just the first
const url = `https://soundcloud.com/search?q=${addaptReq}`;

let myBrowser;

puppeteer
  .launch()
  .then(browser => {
    myBrowser = browser;
    return myBrowser.newPage();
  })
  .then(page => {
    return page.goto(url).then(() => {
      return page.content();
    });
  })
  .then(html => {
    // console.log(html)
    $('a.soundTitle__title', html).each(result => {
      let songName = $('span', result);
      let songURL = $(result).attr('href');

      // hopefully songName will give the span text (from its class) and songURL the href URL

      console.log(songName, songURL);
    });
  })
  .then(() => {
    myBrowser.close();
  })
  .catch(err => {
    console.log(err);
  });

I've attached a screenshot of SoundCloud with devtools open; maybe I'm doing something wrong there?

Thanks in advance!

yk7732637 answered:

const puppeteer = require('puppeteer');

(async () => {
  // Search term from the command line, defaulting to 'goo goo dolls'
  const input = process.argv.length < 3 ? 'goo goo dolls' : process.argv[2];
  const maximumSongs = 100;
  const titlesArray = [];
  const searchquery = encodeURIComponent(input);
  const url = `https://soundcloud.com/search?q=${searchquery}`;

  const browser = await puppeteer.launch({ headless: true, devtools: false });
  const [page] = await browser.pages();
  page.setDefaultNavigationTimeout(0);

  // Block fonts, images and media so the page loads faster
  await page.setRequestInterception(true);
  page.on('request', request => {
    const type = request.resourceType();
    if (type === 'font' || type === 'image' || type === 'media') {
      request.abort();
    } else {
      request.continue();
    }
  });

  // Read the JSON the search page fetches from SoundCloud's API
  // instead of scraping the rendered HTML
  let closed = false;
  page.on('requestfinished', async request => {
    if (closed || !request.url().includes('api-v2.soundcloud.com/search')) return;
    const response = request.response();
    if (!response) return;
    let content;
    try {
      content = await response.json();
    } catch (err) {
      return; // body is not JSON or is no longer available
    }
    for (const song of content.collection || []) {
      if (typeof song.title === 'undefined') continue;
      if (titlesArray.length >= maximumSongs) {
        // Enough results collected: close the browser once and stop
        if (!closed) {
          closed = true;
          await browser.close();
        }
        return;
      }
      console.log(`[${titlesArray.length + 1}] ${song.title}`);
      console.log(`${song.permalink_url}\n`);
      titlesArray.push(song.title);
    }
  });

  await page.goto(url, { waitUntil: 'networkidle2' });
  // Keep scrolling so the page keeps requesting more results
  await page.evaluate(() => {
    setInterval(() => window.scrollBy(0, 100), 250);
  });
})();
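The answer works by reading the JSON the search page fetches from `api-v2.soundcloud.com` rather than parsing the rendered HTML. The parsing step can be pulled out into a pure function, which makes it easy to test without a browser. A sketch, where `extractTracks` and the sample payload are hypothetical and the field names (`collection`, `title`, `permalink_url`) are taken from the answer's code rather than from any documented API:

```javascript
// Hypothetical helper: collect up to `limit` { title, url } pairs from one
// page of search results. Entries without a `title` are skipped, as in the answer.
function extractTracks(content, limit) {
  const tracks = [];
  for (const item of content.collection || []) {
    if (tracks.length >= limit) break;
    if (typeof item.title === 'undefined') continue;
    tracks.push({ title: item.title, url: item.permalink_url });
  }
  return tracks;
}

// Hypothetical sample payload, for illustration only
const sample = {
  collection: [
    { kind: 'user', username: 'someone' },
    { kind: 'track', title: 'Iris', permalink_url: 'https://soundcloud.com/example/iris' },
    { kind: 'track', title: 'Slide', permalink_url: 'https://soundcloud.com/example/slide' },
  ],
};

console.log(extractTracks(sample, 1));
```

Inside the `requestfinished` handler, the result of `response.json()` could be passed to such a function and the returned URLs sent back to the React front end.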
