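I am completely new to programming and to Python. I am trying to parse many XML files to extract some data and save it as a CSV file. My code (put together from various examples I found on Stack Overflow) only parses the last file in the directory (path). What am I doing wrong? Is there a problem with the indentation or the order of the code?
The code is as follows: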
import xml.etree.ElementTree as ET
import csv
import os
fields = [
    ('300$a', 'volume'),
    ('300$b', 'numero'),
    ('300$c', 'parte'),
    ('300$d', 'pag'),
    ('245$a', 'title-group/article-title[1]'),
    ('242$a', 'title-group/article-title[2]'),
    ('242$y', 'lng'),
    ('024$a', 'article-id[@pub-id-type="doi"]'),
    ('041$a',
    ('590$a', 'Art'),
    ('590$b', 'focus'),
    ('546$a', 'lng_abstract'),
    ('520$a', 'abstract/p[1]'),
    'abstract/p[2]'),
    'abstract/p[3]'),
    ('Surname_1', 'contrib-group/contrib[1]/name/surname'),
    ('Given_1', 'contrib-group/contrib[1]/name/given-names'),
    ('Surname_2', 'contrib-group/contrib[2]/name/surname'),
    ('Given_2', 'contrib-group/contrib[2]/name/given-names'),
    ('Surname_3', 'contrib-group/contrib[3]/name/surname'),
    ('Given_3', 'contrib-group/contrib[3]/name/given-names'),
    ('Surname_4', 'contrib-group/contrib[4]/name/surname'),
    ('Given_4', 'contrib-group/contrib[4]/name/given-names'),
]
path = r'E:\Files\Nueva carpeta'
for filename in os.listdir(path):
    if not filename.endswith('.xml'):
        continue
    fullname = os.path.join(path, filename)
    tree = ET.parse(fullname)
    root = tree.getroot()
with open('article-meta.csv', 'w') as f_article:
    csv_article_meta = csv.DictWriter(f_article, fieldnames=[field for field, match in fields])
    csv_article_meta.writeheader()
    for node in tree.iter('article-meta'):
        row = {}
        for field_name, match in fields:
            try:
                row[field_name] = node.find(match).text
            except AttributeError as e:
                row[field_name] = ''
        csv_article_meta.writerow(row)
The XML is as follows:
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
.
.
.
<article-meta>
<article-id>S0104-40602017000700027</article-id>
<article-id pub-id-type="doi">10.1590/0104-4060.52923</article-id>
<title-group>
<article-title xml:lang="pt">
<![CDATA[
A inclusão das pessoas com deficiência: panorama inclusivo no ensino superior no Brasil e em Portugal
]]>
</article-title>
<article-title xml:lang="en">
<![CDATA[
Inclusion of people with disabilities: Inclusive panorama in higher education in Brazil and Portugal
]]>
</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>
<![CDATA[ Pereira ]]>
</surname>
<given-names>
<![CDATA[ Carlos Eduardo Candido ]]>
</given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>
<![CDATA[ Albuquerque ]]>
</surname>
<given-names>
<![CDATA[ Cristina Maria Pinto ]]>
</given-names>
.
.
.
</article-meta>
</front>
Sorry, I am still learning English.
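
A possible explanation, offered as a sketch rather than a definitive answer: the symptom fits two readings of the indentation. If the `with open(...)` block sits after the `for` loop, `tree` only holds the last file parsed by the time the rows are written. If it sits inside the loop, mode `'w'` truncates `article-meta.csv` on every iteration, so only the last file's rows survive. In both cases one way out is to open the CSV once and do the per-file work inside the loop. The function name `xml_dir_to_csv`, the shortened `FIELDS` list, the `sorted()` call, the whitespace stripping, and the `newline=''` and `encoding='utf-8'` arguments are assumptions made for illustration and are not part of the code in the question.

```python
import csv
import os
import xml.etree.ElementTree as ET

# Shortened, illustrative subset of the field list from the question.
FIELDS = [
    ('245$a', 'title-group/article-title[1]'),
    ('024$a', 'article-id[@pub-id-type="doi"]'),
    ('Surname_1', 'contrib-group/contrib[1]/name/surname'),
]


def xml_dir_to_csv(path, out_csv, fields=FIELDS):
    """Write one CSV row per <article-meta> element found in any .xml file in path."""
    # Open the output once, before the loop, so rows from earlier files are kept.
    with open(out_csv, 'w', newline='', encoding='utf-8') as f_article:
        writer = csv.DictWriter(f_article, fieldnames=[name for name, _ in fields])
        writer.writeheader()
        for filename in sorted(os.listdir(path)):
            if not filename.endswith('.xml'):
                continue
            tree = ET.parse(os.path.join(path, filename))
            # Extract rows inside the loop, while `tree` still refers to this file.
            for node in tree.iter('article-meta'):
                row = {}
                for name, match in fields:
                    # findtext returns the default when no element matches.
                    row[name] = node.findtext(match, default='').strip()
                writer.writerow(row)
```

With the path from the question, the call would look like `xml_dir_to_csv(r'E:\Files\Nueva carpeta', 'article-meta.csv')`. Using `findtext` with a default replaces the `try`/`except AttributeError` pattern, and `.strip()` removes the whitespace that surrounds the CDATA text in the sample XML.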