Chinese characters in input XML cause XSLT transformation to emit invalid character references in output XML

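I am trying to figure out why a simple XSLT transformation, which is only supposed to turn XML into XML, does not produce valid output.

The transformation simply copies everything: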

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" encoding="utf-8" />
    <xsl:template match="*|@*">
        <xsl:copy>
            <xsl:apply-templates select="*|@*|text()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

With an input XML file like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕𠀆𠀅𠀍𠁀</name>
</foo>

This is the result:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="uri:foo">
  <name>丕&#55360;&#56326;&#55360;&#56325;&#55360;&#56333;&#55360;&#56384;</name>
</foo>

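All the tools I use rely on the (Java) Apache Xalan 2.7.1 XSLT processor, including Eclipse (Mars) with the XSL Developer Tools plugin, in which I created this example.

That plugin reports that the input XML is well-formed but the output XML is not (the character reference &#55360; does not refer to a valid XML character).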
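The plugin's complaint can be reproduced with plain arithmetic. The sketch below decodes the first two references from the output above: each one names half of a UTF-16 surrogate pair, and only the combined code point is a legal XML character. The class name and the `isXmlChar` helper are illustrative and not part of Xalan; the ranges are taken from the Char production of XML 1.0.

```java
// Illustrative only: decodes the first two character references from the
// broken output and checks them against the XML 1.0 Char production.
final class SurrogateReferenceDemo {

    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        char high = (char) 55360; // 0xD840, a high surrogate
        char low = (char) 56326;  // 0xDC06, a low surrogate

        // Combine the two UTF-16 code units into one Unicode code point.
        int codePoint = Character.toCodePoint(high, low);

        System.out.println(Character.isSurrogatePair(high, low)); // true
        System.out.printf("U+%X%n", codePoint);                   // U+20006
        System.out.println(isXmlChar(high));                      // false
        System.out.println(isXmlChar(codePoint));                 // true

        // A single reference to the whole code point would be valid XML.
        System.out.println("&#" + codePoint + ";");               // &#131078;
    }
}
```

So a serializer that needs to escape this character should emit one reference, `&#131078;`, rather than one reference per surrogate.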

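Why does my XSLT processor generate invalid XML, and how can I prevent it from doing so?

The actual code is similar to this (you need Xalan on the classpath):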

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XSLTTest {

    private final TransformerFactory xalanTransFact;

    public XSLTTest() {
        xalanTransFact = new org.apache.xalan.processor.TransformerFactoryImpl();
    }

    public Templates createCustomTransformation(
            File transformation
    ) throws TransformerException,IOException {
        InputStreamReader readerTransformation = null;
        try {
            readerTransformation = new InputStreamReader(
                    new FileInputStream(transformation),StandardCharsets.UTF_8);  
            Templates transformer = xalanTransFact.newTemplates(
                    new StreamSource(readerTransformation)
            );
            return transformer;
        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {
            try {
                if (readerTransformation != null) {
                    readerTransformation.close();
                }
            } catch (IOException ex) {} 
        }
    }

    public File applyCustomTransformation(
            Transformer transformer,Reader transformeeReader,Path out,boolean indent
    ) throws TransformerException,IOException {
        Writer writer = null;
        try {

            File file = out.toFile();
            writer = new OutputStreamWriter(new FileOutputStream(file),StandardCharsets.UTF_8);

            if (indent) {
                transformer.setOutputProperty(OutputKeys.INDENT,"yes");
                transformer.setOutputProperty(
                        "{http://xml.apache.org/xslt}indent-amount",String.valueOf(2));
            }
            transformer.setOutputProperty(OutputKeys.METHOD,"xml");
            transformer.setOutputProperty(OutputKeys.ENCODING,"utf-8");

            transformer.transform(
                    new StreamSource(transformeeReader),new StreamResult(writer));

            return file;

        } catch (TransformerException | IOException ex) {
            throw ex;
        } finally {      
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException ex) {}
        }
    }

    private void saveToFile(File selectedFile,String content)
            throws FileNotFoundException,IOException {
        Writer writer = null;
        try {
            writer = new OutputStreamWriter(
                    new FileOutputStream(selectedFile),StandardCharsets.UTF_8);
            writer.write(content);
            writer.flush();
        }
        catch (FileNotFoundException ex) {
            throw ex;
        } catch (IOException ex) {
            throw ex;
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException ex) {
                }
            }
        }
    }

    public static void main(String[] args) throws IOException,TransformerException {
        String xslText = "" +
"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
"<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n" +
"    version=\"1.0\">\n" +
"    <xsl:output method=\"xml\" encoding=\"utf-8\" />\n" +
"    <xsl:template match=\"*|@*\">\n" +
"        <xsl:copy>\n" +
"            <xsl:apply-templates select=\"*|@*|text()\" />\n" +
"        </xsl:copy>\n" +
"    </xsl:template>\n" +
"</xsl:stylesheet>";

        String xmlToParse = "" +
"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
"<foo xmlns=\"uri:foo\">\n" +
"  <name>丕𠀆𠀅𠀍𠁀</name>\n" +
"</foo>";

        XSLTTest test = new XSLTTest();

        Path xsl = Files.createTempFile("test",".xsl");
        test.saveToFile(xsl.toFile(),xslText);        
        Templates templates = test.createCustomTransformation(xsl.toFile());
        Transformer transformer = templates.newTransformer();

        Path xml = Files.createTempFile("test-out",".xml");
        StringReader reader = new StringReader(xmlToParse);
        test.applyCustomTransformation(transformer,reader,xml,true);

        System.out.println("Result is at: " + xml.toString());
    }
}

For various reasons, I cannot switch to a different XSLT processor.

Answer by just483:

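As @VGR wrote in a comment, this is a manifestation of the bug https://issues.apache.org/jira/browse/XALANJ-2419.

A comment on that JIRA issue suggests a workaround: use UTF-16 instead of UTF-8 as the output encoding of the transformation, because the bug only affects the latter.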

So, in my example, the line

transformer.setOutputProperty(OutputKeys.ENCODING,"utf-8");

needs to be replaced with

// workaround for https://issues.apache.org/jira/browse/XALANJ-2419
transformer.setOutputProperty(OutputKeys.ENCODING,"utf-16");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");            
writer.write("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");

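Everything else stays the same. The actual file is still written as UTF-8, because the bytes on disk are produced by the OutputStreamWriter that wraps the file, but the transformation treats its output as UTF-16 internally. The XML declaration is written by hand so that it still announces utf-8, which matches what the writer actually produces.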
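One way to confirm that the workaround took effect is to scan the serialized output for decimal character references that fall in the surrogate range. The following is a minimal sketch, not part of Xalan or JAXP: the class and method names are hypothetical, and it only handles decimal references (the form seen in the broken output above), not hexadecimal ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists character references that name a lone
// UTF-16 surrogate (0xD800..0xDFFF), which are never valid in XML.
final class SurrogateReferenceCheck {

    // Matches decimal references such as &#55360; (hex form not covered).
    private static final Pattern DECIMAL_REFERENCE =
            Pattern.compile("&#(\\d{1,7});");

    static List<String> findSurrogateReferences(String xml) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = DECIMAL_REFERENCE.matcher(xml);
        while (matcher.find()) {
            int value = Integer.parseInt(matcher.group(1));
            if (value >= 0xD800 && value <= 0xDFFF) {
                hits.add(matcher.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String broken = "<name>&#55360;&#56326;</name>";
        String fine = "<name>&#131078;</name>";

        System.out.println(findSurrogateReferences(broken)); // [&#55360;, &#56326;]
        System.out.println(findSurrogateReferences(fine));   // []
    }
}
```

Running such a check over the file returned by applyCustomTransformation should yield an empty list once the output encoding is switched to UTF-16.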
