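Is there a way to download files from a website using VBScript?
I know how to download a single file from a website, but how do I make it loop? Also, how can I search a specific page for a specific file extension and download the files if any are found?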
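Dim xhr, stream, pdf, pdfUrls, index
Set xhr = CreateObject("Microsoft.XMLHTTP")
' Placeholder list: finding these URLs on the site is the open question
pdfUrls = Array("https://website.com/wp-content/uploads/2016/12/xxxx.pdf")
index = 1   ' must be initialised, otherwise the first file is saved as ".pdf"
For Each pdf In pdfUrls
    xhr.Open "GET", pdf, False
    xhr.Send
    Set stream = CreateObject("ADODB.Stream")
    With stream
        .Type = 1                                            ' adTypeBinary
        .Open
        .Write xhr.responseBody
        .SaveToFile "C:\temp\" & CStr(index) & ".pdf", 2     ' 2 = overwrite
        .Close
    End With
    Set stream = Nothing
    index = index + 1
Next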
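Suppose the site is https://website.com/productpage/. Some of its links share the same structure, https://website.com/products/xx-x-xx-x/, so every link I need starts with https://website.com/products/. Judging from the page source, there seem to be 33 such links.
Each of those pages contains PDF files: sometimes one, sometimes 3 or 4. The PDF links look like https://website.com/wp-content/uploads/2016/12/xxxx.pdf, where xxxx.pdf can be any file name.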
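The two URL shapes described above can be expressed as regular expressions with VBScript's built-in RegExp object. A minimal sketch, assuming product pages always look like https://website.com/products/<slug>/ and that uploads always sit in a four-digit year folder followed by a two-digit month folder, as in the examples; `LinkKind` is a hypothetical helper name:

```vbscript
Option Explicit

' Classify a URL as "product", "pdf" or "other".
' Assumption: uploads follow /wp-content/uploads/YYYY/MM/<name>.pdf
Function LinkKind(url)
    Dim product, pdf
    Set product = New RegExp
    product.Pattern = "^https://website\.com/products/[^/]+/$"
    product.IgnoreCase = True
    Set pdf = New RegExp
    pdf.Pattern = "^https://website\.com/wp-content/uploads/\d{4}/\d{2}/[^/]+\.pdf$"
    pdf.IgnoreCase = True
    If product.Test(url) Then
        LinkKind = "product"
    ElseIf pdf.Test(url) Then
        LinkKind = "pdf"
    Else
        LinkKind = "other"
    End If
End Function

WScript.Echo LinkKind("https://website.com/products/xx-x-xx-x/")                 ' product
WScript.Echo LinkKind("https://website.com/wp-content/uploads/2016/12/xxxx.pdf") ' pdf
WScript.Echo LinkKind("https://website.com/productpage/")                        ' other
```

The overview page itself does not match the product pattern because it lacks the `/products/` segment.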
Here is what I got working for a single file:
Dim xHttp: Set xHttp = CreateObject("Microsoft.XMLHTTP")
Dim bStrm: Set bStrm = CreateObject("ADODB.Stream")
xHttp.Open "GET", "https://website.com/wp-content/uploads/2016/12/xxxx.pdf", False
xHttp.Send
With bStrm
    .Type = 1                              ' adTypeBinary
    .Open
    .Write xHttp.responseBody
    .SaveToFile "C:\temp\xxxx.pdf", 2      ' 2 = overwrite
    .Close
End With
Edit:
It should work like this:
- Get all the required links
- Open each link
- Search for links ending in ".pdf"
- Download the files to C:\temp\
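The four steps above could be sketched in VBScript as follows. This is a minimal, untested sketch that assumes the links appear as absolute URLs in ordinary href attributes of the HTML the server returns (not added later by JavaScript), that C:\temp\ already exists, and that each PDF is saved under its original file name (two uploads with the same name in different month folders would overwrite each other). `HttpGet`, `GetLinks`, `FileNameFromUrl` and `DownloadFile` are hypothetical helper names, not library functions.

```vbscript
Option Explicit

' Placeholder values taken from the question; adjust to the real site.
Const START_URL    = "https://website.com/productpage/"
Const PRODUCT_BASE = "https://website.com/products/"
Const PDF_BASE     = "https://website.com/wp-content/uploads/"
Const OUT_DIR      = "C:\temp\"   ' assumption: folder already exists

' Synchronous GET; returns the finished request object.
' Raises an error for any status other than 200.
Function HttpGet(url)
    Dim xhr
    Set xhr = CreateObject("MSXML2.XMLHTTP")
    xhr.Open "GET", url, False
    xhr.Send
    If xhr.Status <> 200 Then
        Err.Raise vbObjectError + 1, "HttpGet", "HTTP " & xhr.Status & " for " & url
    End If
    Set HttpGet = xhr
End Function

' Distinct href values that start with prefix (but are longer than it)
' and end with suffix; case-insensitive, an empty suffix matches anything.
Function GetLinks(html, prefix, suffix)
    Dim re, m, url, found
    Set re = New RegExp
    re.Pattern = "href\s*=\s*[""']([^""']+)[""']"
    re.IgnoreCase = True
    re.Global = True
    Set found = CreateObject("Scripting.Dictionary")
    found.CompareMode = 1   ' vbTextCompare: case variants count as one link
    For Each m In re.Execute(html)
        url = m.SubMatches(0)
        If Len(url) > Len(prefix) And _
           LCase(Left(url, Len(prefix))) = LCase(prefix) And _
           LCase(Right(url, Len(suffix))) = LCase(suffix) Then
            If Not found.Exists(url) Then found.Add url, True
        End If
    Next
    GetLinks = found.Keys
End Function

' Last path segment of a URL, e.g. ".../2016/12/xxxx.pdf" -> "xxxx.pdf"
Function FileNameFromUrl(url)
    FileNameFromUrl = Mid(url, InStrRev(url, "/") + 1)
End Function

' Fetch url and write the raw bytes into OUT_DIR, overwriting any old copy.
Sub DownloadFile(url)
    Dim xhr, stream
    Set xhr = HttpGet(url)
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                                        ' adTypeBinary
    stream.Open
    stream.Write xhr.responseBody
    stream.SaveToFile OUT_DIR & FileNameFromUrl(url), 2    ' 2 = overwrite
    stream.Close
End Sub

Dim page, products, pdfs, p, f
' Step 1: collect every product link from the overview page
Set page = HttpGet(START_URL)
products = GetLinks(page.responseText, PRODUCT_BASE, "")
For Each p In products
    ' Steps 2-3: open the product page and collect its PDF links
    Set page = HttpGet(p)
    pdfs = GetLinks(page.responseText, PDF_BASE, ".pdf")
    For Each f In pdfs
        ' Step 4: download each PDF into OUT_DIR
        DownloadFile f
        WScript.Echo "Saved " & FileNameFromUrl(f)
    Next
Next
```

Running it with `cscript getpdfs.vbs` (file name hypothetical) sends the `WScript.Echo` lines to the console instead of message boxes. A regular expression over href attributes is used rather than a DOM parser to keep the script dependency-free; if the site's markup differs, the pattern in `GetLinks` is the part to adjust.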
Structure of the website:
https://website.com/productpage/
    https://website.com/products/xx-x/
        https://website.com/wp-content/uploads/2016/12/xx-xx.pdf
    https://website.com/products/xxxxx-xsx/
        https://website.com/wp-content/uploads/2018/12/x-xx-x.pdf
        https://website.com/wp-content/uploads/2015/12/x-x-xx.pdf
        https://website.com/wp-content/uploads/2019/12/xxx-x.pdf
    https://website.com/products/x-xx-xsx/
        https://website.com/wp-content/uploads/2014/12/x-xxx.pdf
        https://website.com/wp-content/uploads/2013/12/x-x-x-x.pdf
    https://website.com/products/xx-x-xsx/
        https://website.com/wp-content/uploads/2012/12/x-xxxx.pdf