How can I retrieve the XML pages linked from a web page?

2024-05-07 • Q&A

I have an HTTP link like the one below, where the last number in the link is the gene ID.

http://glioblastoma.alleninstitute.org/ish/gene/show/22

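Clicking it takes you to a page whose footer says "This data is also available as XML", with a link to the XML file. I want to download and save the XML files for several hundred such gene IDs. How can I do that without opening each page and saving the file by hand?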

That link takes you to [deep breath] ...

http://glioblastoma.alleninstitute.org/api/v2/data/SectionDataSet/query.xml?order=specimen_types.name$asc&numRows=all&criteria=rma::criteria,products[id$eq17],genes[id$eq22],rma::include,genes,specimen(donor(age),specimen_types),section_images[image_type$eqPrimary],rma::options[only$eq%27genes.id,id,specimens.id,specimens.external_specimen_name,specimen_types.name,donors.molecular_subtype,ages.name,sub_images.path,sub_images.x,sub_images.y,sub_images.width,sub_images.height,sub_images.tier_count,failed%27]

...which is part of the site's API. Note that the number 22 appears in it and can be replaced with any other gene ID.
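Since the question is about numeric gene IDs, the substitution just described can be sketched as a small shell function. This is only a minimal sketch under stated assumptions: the names `gene_url` and `download_genes`, the input file `geneIds` (one ID per line), and the `gene_<id>.xml` output naming are all made up for illustration; the URL itself is the query URL from the answer with 22 swapped for the requested ID.

```shell
#!/bin/sh
# Print the query URL from the answer with the gene ID ($1) substituted in.
# The fixed parts stay in single quotes so the shell leaves "$asc" and "$eq" alone.
gene_url() {
    printf '%s%s%s\n' \
        'http://glioblastoma.alleninstitute.org/api/v2/data/SectionDataSet/query.xml?order=specimen_types.name$asc&numRows=all&criteria=rma::criteria,products[id$eq17],genes[id$eq' \
        "$1" \
        '],rma::include,genes,specimen(donor(age),specimen_types),section_images[image_type$eqPrimary],rma::options[only$eq%27genes.id,id,specimens.id,specimens.external_specimen_name,specimen_types.name,donors.molecular_subtype,ages.name,sub_images.path,sub_images.x,sub_images.y,sub_images.width,sub_images.height,sub_images.tier_count,failed%27]'
}

# Download one XML file per gene ID listed (one per line) in the file "$1".
# The input file and the gene_<id>.xml naming are hypothetical choices.
download_genes() {
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue    # skip blank lines
        # --globoff stops curl from treating [ ] in the URL as a range pattern
        curl --fail --silent --show-error --globoff \
            -o "gene_${id}.xml" "$(gene_url "$id")" \
            || echo "download failed for gene ID $id" >&2
    done < "$1"
}
```

With the IDs saved one per line in a file, running `download_genes geneIds` would write `gene_22.xml` and so on into the current directory, printing a message for any ID whose request fails.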

If you prefer common names, then...

http://glioblastoma.alleninstitute.org/api/v2/data/SectionDataSet/query.xml?order=specimen_types.name$asc&numRows=all&criteria=rma::criteria,genes[acronym$eqABL1],failed%27]

...lets you search for ABL1 or MECOM.

To script this, I suggest putting all your gene acronyms into a file, one per line, and then running the following:

xargs -I{} curl -o {}.xml 'http://glioblastoma.alleninstitute.org/api/v2/data/SectionDataSet/query.xml?order=specimen_types.name$asc&numRows=all&criteria=rma::criteria,genes\[acronym$eq{}\],failed%27\]' < geneAcronyms
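For a few hundred requests it may help to make the download restartable. The following is a rough sketch of the same idea written as an explicit loop, assuming the shortened acronym URL from the command above is used verbatim. The names `acronym_url` and `fetch_acronyms`, the `PAUSE` variable with its one-second default, and the `failed.txt` log are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Print the shortened acronym query URL with the gene acronym ($1) substituted in.
acronym_url() {
    printf '%s%s%s\n' \
        'http://glioblastoma.alleninstitute.org/api/v2/data/SectionDataSet/query.xml?order=specimen_types.name$asc&numRows=all&criteria=rma::criteria,genes[acronym$eq' \
        "$1" \
        '],failed%27]'
}

# Fetch <acronym>.xml for every acronym listed (one per line) in the file "$1".
# Acronyms whose request fails are collected in failed.txt (hypothetical name).
fetch_acronyms() {
    : > failed.txt                          # start with an empty failure log
    while IFS= read -r gene || [ -n "$gene" ]; do
        [ -n "$gene" ] || continue          # skip blank lines
        [ -s "$gene.xml" ] && continue      # already downloaded on an earlier run
        # --globoff stops curl from treating [ ] in the URL as a range pattern
        if ! curl --fail --silent --show-error --globoff \
                -o "$gene.xml" "$(acronym_url "$gene")"; then
            echo "$gene" >> failed.txt
            rm -f "$gene.xml"               # drop any partial output
        fi
        sleep "${PAUSE:-1}"                 # brief pause between requests
    done < "$1"
}
```

Calling `fetch_acronyms geneAcronyms` a second time would skip acronyms that already have a non-empty XML file, so an interrupted run can be resumed; `failed.txt` can then be passed back in as the input file to retry only the failures.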

Answered by ql8568