Is there an all-inclusive charset to avoid "java.nio.charset.MalformedInputException: Input length = 1"?

I wrote a simple word-count program in Java that reads through the text files in a directory.

However, I keep getting the error:

java.nio.charset.MalformedInputException: Input length = 1

from this line of code:

 BufferedReader reader = Files.newBufferedReader(file,Charset.forName("UTF-8"));

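I know I'm probably getting this because the Charset I'm using doesn't cover some of the characters in the text files, some of which contain characters from other languages. But I want to include those characters.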
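The exception is really about bytes rather than missing characters: a decoder throws MalformedInputException when the input is not a valid byte sequence in the charset it decodes with. A minimal sketch, assuming (hypothetically) that a file was saved as ISO-8859-1, where é is the single byte 0xE9, reproduces it with a strict decoder. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode with REPORT for both error kinds. This is also the default of
    // cs.newDecoder(), which is what Files.newBufferedReader uses.
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, stored as ISO-8859-1: the last byte is 0xE9.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        try {
            System.out.println("ISO-8859-1: " + decodeStrict(latin1, StandardCharsets.ISO_8859_1));
            decodeStrict(latin1, StandardCharsets.UTF_8);
        } catch (MalformedInputException e) {
            // 0xE9 on its own is not a complete UTF-8 sequence.
            System.out.println("UTF-8 failed: " + e);
        } catch (CharacterCodingException e) {
            System.out.println("Other coding error: " + e);
        }
    }
}
```

The same bytes decode without error as ISO-8859-1, so the fix is to decode with the encoding the file was actually written in, not to look for a bigger character set.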

Later I read in the JavaDocs that the Charset argument is optional and only makes reading files more efficient, so I changed my code to:

 BufferedReader reader = Files.newBufferedReader(file);

But some files still throw MalformedInputException. I don't know why.

Is there an all-inclusive Charset that will let me read text files containing many different types of characters?

Thanks.

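You probably want a list of supported encodings. For each file, try each encoding in turn, perhaps starting with UTF-8. Whenever you catch a MalformedInputException, try the next encoding.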
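A minimal sketch of that approach, assuming Java 8+ and a caller-supplied list of candidate charsets (the class name, method name, and candidate order are illustrative, not from the answer). Putting UTF-8 first makes sense because it is strict: text in another encoding rarely decodes cleanly as UTF-8. A charset that decodes without error is not thereby proven correct; ISO-8859-1, for example, never fails.

```java
import java.io.IOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CharsetFallback {
    // Illustrative result holder: the charset that worked and the decoded lines.
    static final class Decoded {
        final Charset charset;
        final List<String> lines;

        Decoded(Charset charset, List<String> lines) {
            this.charset = charset;
            this.lines = lines;
        }
    }

    // Try each candidate charset in order. Files.readAllLines uses a strict
    // decoder, so bad input surfaces as MalformedInputException (a subclass
    // of CharacterCodingException) and we move on to the next candidate.
    // Other IOExceptions (missing file, permissions) are not swallowed.
    static Decoded readWithFallback(Path file, List<Charset> candidates) throws IOException {
        for (Charset cs : candidates) {
            try {
                return new Decoded(cs, Files.readAllLines(file, cs));
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next one.
            }
        }
        throw new IOException("No candidate charset could decode " + file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fallback", ".txt");
        // "cafe" with an acute e, written as ISO-8859-1: not valid UTF-8.
        Files.write(tmp, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
        Decoded d = readWithFallback(tmp,
                Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        System.out.println(d.charset + " -> " + d.lines);
        Files.delete(tmp);
    }
}
```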

If you create the BufferedReader with Files.newBufferedReader:

 Files.newBufferedReader(Paths.get("a.txt"), StandardCharsets.UTF_8);

the following exception may be thrown when the application runs:

 java.nio.charset.MalformedInputException: Input length = 1

But

 new BufferedReader(new InputStreamReader(new FileInputStream("a.txt"),"utf-8"));

works fine.

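The difference is that the former uses the CharsetDecoder's default action.

The default action for malformed-input and unmappable-character errors is to report them.

The latter uses the REPLACE action: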

 cs.newDecoder().onMalformedInput(CodingErrorAction.REPLACE).onUnmappableCharacter(CodingErrorAction.REPLACE)
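To keep the NIO Files API but get that lenient behavior, a sketch (assuming that replacing bad bytes with U+FFFD is acceptable for your use, such as a word count) passes such a decoder to InputStreamReader explicitly. newLenientReader is a hypothetical helper name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LenientReader {
    // Like Files.newBufferedReader(path, cs), but malformed or unmappable
    // input is replaced (with U+FFFD for decoders) instead of throwing
    // MalformedInputException.
    static BufferedReader newLenientReader(Path path, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(new InputStreamReader(Files.newInputStream(path), decoder));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lenient", ".txt");
        // "caf", then 0xE9 (e-acute in ISO-8859-1), then "!": not valid UTF-8.
        Files.write(tmp, new byte[] {'c', 'a', 'f', (byte) 0xE9, '!'});
        try (BufferedReader r = newLenientReader(tmp, StandardCharsets.UTF_8)) {
            // The invalid byte is replaced rather than reported.
            System.out.println(r.readLine());
        }
        Files.delete(tmp);
    }
}
```

The trade-off is silent data loss: the replaced bytes are gone, so this suits counting words better than copying files.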

I also ran into an exception with this error message:

    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(Unknown Source)
        at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at java.io.OutputStreamWriter.write(Unknown Source)
        at java.io.BufferedWriter.flushBuffer(Unknown Source)
        at java.io.BufferedWriter.write(Unknown Source)
        at java.io.Writer.write(Unknown Source)

and found that a strange error occurs when I use

 BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath));

in a class that converts a value of a generic type to a string and writes it, for the string "orazg 54":

    // key is of generic type <Key extends Comparable<Key>>
    writer.write(item.getKey() + "\t" + item.getValue() + "\n");

The string has length 9 and contains characters with the following code points:

111 114 97 122 103 9 53 52 10
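Every code point listed above is ASCII, and UTF-8 can encode all of them, so this exact string should not trigger the error. One hypothesis the post does not confirm is that the string actually reaching the writer was different, for example a key whose toString() contained an unpaired surrogate. A small diagnostic sketch (loneSurrogateIndices is a hypothetical helper) dumps the code points and reports unpaired surrogates:

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Return the char indices of unpaired surrogates. A strict UTF-8 encoder
    // reports these as malformed input.
    static List<Integer> loneSurrogateIndices(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++; // well-formed pair: skip its low half
            } else if (Character.isSurrogate(c)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "orazg\t54\n";
        // Print each code point, matching the list in the post.
        s.codePoints().forEach(cp -> System.out.print(cp + " "));
        System.out.println();
        System.out.println("unpaired surrogates at: " + loneSurrogateIndices(s));
    }
}
```

Calling it on the real value just before writer.write(...) would show whether the hypothesis holds.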

However, if the BufferedWriter in the class is replaced with:

    FileOutputStream outputStream = new FileOutputStream(filePath);
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(outputStream));

it writes this string successfully, without exceptions. Also, if I write the same String built from the chars, it still works fine:

    String string = new String(new char[] {111, 114, 97, 122, 103, 9, 53, 52, 10});
    BufferedWriter writer = Files.newBufferedWriter(Paths.get("a.txt"));
    writer.write(string);
    writer.close();

I had never before hit an exception when writing any String with the first BufferedWriter. It is a strange bug that occurs with a BufferedWriter created by java.nio.file.Files.newBufferedWriter(path, options).
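A plausible explanation, not verified against the poster's data: Files.newBufferedWriter wraps an encoder from cs.newEncoder(), whose default action is REPORT, whereas new OutputStreamWriter(outputStream) uses REPLACE. That is the same asymmetry as with the two readers in the previous answer. The sketch below assumes an unpaired surrogate as the bad input and shows both policies on the UTF-8 encoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderPolicyDemo {
    // Encode s as UTF-8, applying the given action to malformed/unmappable input.
    static byte[] encodeUtf8(String s, CodingErrorAction action) throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String bad = "a\uDC00b"; // assumed bad input: a lone low surrogate
        // REPLACE (OutputStreamWriter's policy): the bad char becomes the
        // encoder's replacement bytes instead of an error.
        byte[] replaced = encodeUtf8(bad, CodingErrorAction.REPLACE);
        System.out.println(new String(replaced, StandardCharsets.UTF_8));
        // REPORT (the default of cs.newEncoder(), used by Files.newBufferedWriter).
        try {
            encodeUtf8(bad, CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            System.out.println("REPORT: " + e);
        }
    }
}
```

If this is the cause, the two writers do not differ in what they accept; the second one silently writes a replacement byte where the first one fails.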

ISO-8859-1 is an all-inclusive charset in the sense that it is guaranteed never to throw MalformedInputException. So it is good for debugging, even if your input is not in this charset. So:

 req.setCharacterEncoding("ISO-8859-1");

My input contained some left and right double-quote characters. Both US-ASCII and UTF-8 threw MalformedInputException on them, but ISO-8859-1 worked.
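ISO-8859-1 never fails because it assigns a character to every byte value from 0x00 to 0xFF. That also means it does not decode multi-byte text correctly. A short sketch (Latin1Demo is an illustrative name) shows both effects: all 256 byte values decode, while the UTF-8 bytes of a left double quote (U+201C) come back as three unrelated characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    // Strictly decode bytes as ISO-8859-1 (default REPORT actions).
    static String decodeLatin1(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.ISO_8859_1.newDecoder()
                .decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Every byte value n decodes, to the char with code n.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) all[i] = (byte) i;
        System.out.println(decodeLatin1(all).length());

        // UTF-8 text is misread: U+201C is the three bytes E2 80 9C in UTF-8,
        // and ISO-8859-1 turns each byte into a separate character.
        byte[] quote = "\u201C".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeLatin1(quote).length());
    }
}
```

So ISO-8859-1 is a safe way to see where a file breaks, but for a word count over UTF-8 files it would split each non-ASCII character into several wrong ones.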

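I wrote the following to print a list of results to standard output, one per available charset. Note that it also tells you which line failed (as a 0-based line number), in case you are troubleshooting which character causes the problem.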

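    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.SortedMap;

    public static void testCharset(String fileName) {
        SortedMap<String, Charset> charsets = Charset.availableCharsets();
        for (String k : charsets.keySet()) {
            int line = 0; // 0-based index of the line being read
            boolean success = true;
            try (BufferedReader b = Files.newBufferedReader(Paths.get(fileName), charsets.get(k))) {
                // readLine() returns null only at end of file; ready() returning
                // false does not guarantee end of file.
                while (b.readLine() != null) {
                    line++;
                }
            } catch (IOException e) {
                success = false;
                System.out.println(k + " failed on line " + line);
            }
            if (success) System.out.println("************************* Success " + k);
        }
    }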

Well, the problem is that Files.newBufferedReader(Path path) is implemented like this:

    public static BufferedReader newBufferedReader(Path path) throws IOException {
        return newBufferedReader(path, StandardCharsets.UTF_8);
    }

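So specifying UTF-8 explicitly changes nothing, except to make the choice visible in your code. Switching to StandardCharsets.UTF_16 will not give you a "wider" charset either: UTF-8 and UTF-16 can both encode every Unicode character. The exception means the file's bytes are not valid in the charset you decode with, so you need to decode with the encoding the file was actually written in.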
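A quick check of that point, as an illustrative sketch: both encodings round-trip the same text, including an emoji outside the Basic Multilingual Plane, while decoding UTF-8 bytes as UTF-16 does not reproduce it. The sample string is arbitrary.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8VsUtf16 {
    // Encode and decode s with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Latin with an accent, two CJK characters, and an emoji (a surrogate pair in Java).
        String text = "caf\u00e9 \u4e2d\u6587 \uD83D\uDE00";
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
        System.out.println(roundTrip(text, StandardCharsets.UTF_16).equals(text));

        // What matters is matching the bytes' real encoding: UTF-8 bytes
        // decoded as UTF-16 produce different text.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_16).equals(text));
    }
}
```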

You could try something like this, or just copy and paste the piece below.

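    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.util.ArrayList;
    import java.util.List;

    // Wrapped in a method for illustration; f and keyword are the inputs the snippet assumed.
    static List<String> linesContaining(File f, String keyword) throws IOException {
        List<Charset> candidates = new ArrayList<>();
        candidates.add(Charset.defaultCharset()); // try the default one first
        candidates.addAll(Charset.availableCharsets().values());
        for (Charset charset : candidates) {
            try {
                List<String> values = new ArrayList<>();
                for (String line : Files.readAllLines(f.toPath(), charset)) {
                    line = line.trim();
                    if (line.contains(keyword)) values.add(line);
                }
                return values; // no exception: this charset worked
            } catch (CharacterCodingException e) {
                // decoding failed: try the next charset
            }
        }
        // Give up once every charset has been tried instead of looping forever;
        // other I/O errors (e.g. a missing file) propagate immediately.
        throw new IOException("No available charset could decode " + f);
    }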
