How can I parse multiple XML files with Python and save the data as a CSV file?

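I'm completely new to programming and to Python. I'm trying to parse many XML files to extract some data and save it as a CSV file. My code (pieced together from various examples I found on Stack Overflow) only parses the last file in the directory (path). What am I doing wrong? Is there a problem with the indentation or with the order of the code?

Here is the code: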

import xml.etree.ElementTree as ET
import csv
import os


fields = [
    ('300$a','volume'),('300$b','numero'),('300$c','parte'),('300$d','pag'),('245$a','title-group/article-title[1]'),('242$a','title-group/article-title[2]'),('242$y','lng'),('024$a','article-id[@pub-id-type="doi"]'),('041$a',('590$a','Art'),('590$b','focus'),('546$a','lng_abstract'),('520$a','abstract/p[1]'),'abstract/p[2]'),'abstract/p[3]'),('Surname_1','contrib-group/contrib[1]/name/surname'),('Given_1','contrib-group/contrib[1]/name/given-names'),('Surname_2','contrib-group/contrib[2]/name/surname'),('Given_2','contrib-group/contrib[2]/name/given-names'),('Surname_3','contrib-group/contrib[3]/name/surname'),('Given_3','contrib-group/contrib[3]/name/given-names'),('Surname_4','contrib-group/contrib[4]/name/surname'),('Given_4','contrib-group/contrib[4]/name/given-names')]

path = r'E:\Files\Nueva carpeta'

for filename in os.listdir(path):
    if not filename.endswith('.xml'):
        continue

fullname = os.path.join(path,filename)

tree = ET.parse(fullname)
root = tree.getroot()

with open('article-meta.csv','w') as f_article:
    csv_article_meta = csv.DictWriter(f_article,fieldnames=[field for field,match in fields])
    csv_article_meta.writeheader()

    for node in tree.iter('article-meta'):
        row = {}

        for field_name,match in fields:
            try:
                row[field_name] = node.find(match).text
            except AttributeError as e:
                row[field_name] = ''

    csv_article_meta.writerow(row)

Here is the XML:

<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
.
.
.
    <article-meta>
        <article-id>S0104-40602017000700027</article-id>
        <article-id pub-id-type="doi">10.1590/0104-4060.52923</article-id>
        <title-group>
            <article-title xml:lang="pt">
                <![CDATA[
                    A inclusão das pessoas com deficiência: panorama inclusivo no ensino superior no Brasil e em Portugal
                    ]]>
        </article-title>
        <article-title xml:lang="en">
        <![CDATA[
            Inclusion of people with disabilities: Inclusive panorama in higher education in Brazil and Portugal
            ]]>
        </article-title>
        </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>
                            <![CDATA[ Pereira ]]>
                        </surname>
                        <given-names>
                            <![CDATA[ Carlos Eduardo Candido ]]>
                        </given-names>
                        </name>
                        <xref ref-type="aff" rid="Aff"/>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>
                            <![CDATA[ Albuquerque ]]>
                        </surname>
                        <given-names>
                            <![CDATA[ Cristina Maria Pinto ]]>
                        </given-names>
                        .
                        .
                        .
    </article-meta>
</front>

Sorry, I'm still learning English.

Answer by aaawang312:

for filename in os.listdir(path):
    if not filename.endswith('.xml'):
        continue

fullname = os.path.join(path,filename)

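In the code above you loop over all the files and skip the ones that are not XML, but when a file is an XML file you do nothing with it! filename is the loop variable that holds the current item during iteration, so once the loop has finished it still holds the last item returned by listdir. When you later call os.path.join(path,filename), you therefore only ever get the path of that last file, and only that file is parsed.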
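The effect can be reproduced without touching the file system. The following is a minimal sketch; the list names is a made-up stand-in for what os.listdir(path) might return, and last_seen is a hypothetical name used only for illustration.

```python
# Hypothetical directory listing, standing in for os.listdir(path).
names = ['a.xml', 'notes.txt', 'b.xml']

for name in names:
    if not name.endswith('.xml'):
        continue
    # Nothing else happens inside the loop, so no file is processed here.

# In Python the loop variable survives the loop and keeps its last value.
last_seen = name
print(last_seen)  # b.xml
```

Any work that should happen once per file has to be indented so that it sits inside the for loop.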

Here is a rough sketch of how to get the paths of all the XML files:

xml_file_paths = [os.path.join(path,curr_f_name) for curr_f_name in os.listdir(path) if curr_f_name.endswith('.xml')]
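Putting the pieces together, one possible way to restructure the script is sketched below. It is not the only way to do it. The function name xml_dir_to_csv and the shortened FIELDS list are assumptions made for illustration (the three entries are copied from the question's fields list). The key points are that the CSV file is opened once, before the loop, and that parsing and writing happen inside the loop, once per file.

```python
import csv
import os
import xml.etree.ElementTree as ET

# Shortened, illustrative subset of the (column name, path) pairs from the question.
FIELDS = [
    ('024$a', 'article-id[@pub-id-type="doi"]'),
    ('245$a', 'title-group/article-title[1]'),
    ('Surname_1', 'contrib-group/contrib[1]/name/surname'),
]


def xml_dir_to_csv(xml_dir, csv_path, fields):
    """Write one CSV row per <article-meta> element found in each .xml file of xml_dir."""
    header = [name for name, _ in fields]
    # Open the output once, so later files do not overwrite earlier rows.
    with open(csv_path, 'w', newline='', encoding='utf-8') as f_out:
        writer = csv.DictWriter(f_out, fieldnames=header)
        writer.writeheader()
        for filename in sorted(os.listdir(xml_dir)):
            if not filename.endswith('.xml'):
                continue
            # Everything below runs once per XML file.
            tree = ET.parse(os.path.join(xml_dir, filename))
            for node in tree.iter('article-meta'):
                row = {}
                for name, match in fields:
                    found = node.find(match)
                    text = found.text if found is not None else None
                    # Strip the whitespace that surrounds the CDATA content.
                    row[name] = text.strip() if text else ''
                writer.writerow(row)


# Example call, using the directory from the question:
# xml_dir_to_csv(r'E:\Files\Nueva carpeta', 'article-meta.csv', FIELDS)
```

Checking for None instead of catching AttributeError keeps the empty-string fallback from the question while making it explicit which case is being handled.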