Why does a US-ASCII encoded file contain non-US-ASCII characters?

2024-04-29 • Q&A

When I run file -i file_name in a Linux shell, the result is:

text/plain; charset=us-ascii

However, the file contains the character â, which belongs to extended ASCII and lies outside the 7-bit US-ASCII range.
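
One way to check which bytes of a file fall outside US-ASCII is to look for any byte with the high bit set (value 0x80 or above). The following is only a minimal sketch: the class name AsciiCheck and the method firstNonAscii are hypothetical names introduced for illustration, and the file path is taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original question.
public class AsciiCheck {

    // Returns the offset of the first byte outside the 7-bit US-ASCII range,
    // or -1 if every byte is below 0x80.
    public static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the file is small enough to read into memory at once.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstNonAscii(data);
        if (offset < 0) {
            System.out.println("All bytes are US-ASCII");
        } else {
            System.out.printf("First non-ASCII byte 0x%02X at offset %d%n",
                    data[offset] & 0xFF, offset);
        }
    }
}
```

Comparing the reported offset against what file -i says can show whether the detection result and the actual byte content disagree.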

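If I modify the file in Vim and save it as a new file, the new file's encoding is reported as ISO-8859-1.

I also tried generating a new file in Java with the following code; the new file's encoding is likewise reported as ISO-8859-1.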

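import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

try (FileInputStream fileInputStream = new FileInputStream(in);
     BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
     FileOutputStream fileOutputStream = new FileOutputStream(out);
     BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream)) {
    byte[] buffer = new byte[100];
    int bytesRead;

    // Write only the bytes actually read; writing the whole buffer would
    // append stale bytes whenever a read returns fewer than 100 bytes.
    while ((bytesRead = bufferedInputStream.read(buffer)) != -1) {
        bufferedOutputStream.write(buffer, 0, bytesRead);
    }
}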

If I use iconv to convert the new file from ISO-8859-1 to US-ASCII, it reports:

iconv: illegal input sequence at position 15
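
The iconv failure is consistent with â having no representation in US-ASCII. The same check can be sketched in Java with CharsetEncoder.canEncode; the class name EncodableCheck and the method firstUnencodable are hypothetical names used only for illustration, and the sample string is made up.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question.
public class EncodableCheck {

    // Returns the index of the first character that US-ASCII cannot encode,
    // or -1 if the whole string is representable in US-ASCII.
    public static int firstUnencodable(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample text containing â (U+00E2).
        String sample = "plain text \u00e2 more text";
        System.out.println(firstUnencodable(sample));
    }
}
```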

I don't understand how a file reported as US-ASCII can contain non-US-ASCII characters. And how can I create such a file?

Thanks!

helei0838 answered: Why does a US-ASCII encoded file contain non-US-ASCII characters?