获取Python可以编码的所有编码的列表

我正在编写一个脚本,将在Python 2.6中尝试将字节编码为许多不同的编码。 有什么办法可以获得我可以迭代的可用编码列表吗?

我试图这样做的原因是因为用户有一些文本编码不正确。 有有趣的人物。 我知道这个搞乱了的unicodeangular色。 我希望能够给他们一个答案,如“你的文本编辑器正在将该string解释为X编码,而不是Y编码”。 我想我会尝试使用一种编码对该字符进行编码,然后使用另一种编码重新对其进行解码,然后查看是否获得相同的字符序列。

即这样的事情:

for encoding1, encoding2 in itertools.permutation(encodinglist(), 2): try: unicode_string = my_unicode_character.encode(encoding1).decode(encoding2) except: pass 

不幸的是, encodings.aliases.aliases.keys()不是一个合适的答案。

aliases (如人们所预期的那样)包含几个不同的键被映射到相同值的情况,例如1252cp1252都被映射到cp1252 。 如果使用set(aliases.values())来代替aliases.keys()则可以节省时间。

但有一个更糟糕的问题: aliases不包含没有别名的编解码器(如cp856,cp874,cp875,cp737和koi8_u)。

 >>> from encodings.aliases import aliases >>> def find(q): ... return [(k,v) for k, v in aliases.items() if q in k or q in v] ... >>> find('1252') # multiple aliases [('1252', 'cp1252'), ('windows_1252', 'cp1252')] >>> find('856') # no codepage 856 in aliases [] >>> find('koi8') # no koi8_u in aliases [('cskoi8r', 'koi8_r')] >>> 'x'.decode('cp856') # but cp856 is a valid codec u'x' >>> 'x'.decode('koi8_u') # but koi8_u is a valid codec u'x' >>> 

同样值得注意的是,无论您获得完整的编解码器列表,忽略与编码/解码字符集无关的编解码器可能是一个好主意,但可以执行一些其他转换,例如zlibquopribase64

这给我们带来了一个问题,你为什么要“尝试将字节编码成许多不同的编码”。 如果我们知道的话,我们可能会把你引向正确的方向。

首先,这是不明确的。 一个DE代码字节到unicode,一个EN代码unicode字节。 你想要做什么?

你真正想要实现什么:你是否试图确定使用哪个编解码器来解码一些传入的字节,并计划尝试使用所有可能的编解码器? [注:latin1将解码任何东西]你是否试图确定一些unicode文本的语言,试图编码所有可能的编解码器? [注:utf8将编码任何东西]。

这里的其他答案似乎表明, 编程构build这个列表是困难的,充满陷阱。 但是,这样做可能是不必要的,因为文档包含Python支持的标准编码的完整列表,并且自从Python 2.3以来已经完成。

您可以在以下位置find这些列表(针对迄今为止发布的每种稳定版本的语言):

下面是每个Python文档版本的列表。 请注意,如果您想要向后兼容,而不仅仅是支持特定版本的Python,则可以从最新的 Python版本复制列表,并在尝试使用它之前检查运行程序的Python中是否存在每个编码 。

Python 2.3(59编码)

 ['ascii', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp869', 'cp874', 'cp875', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8'] 

Python 2.4(85编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8'] 

Python 2.5(86编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 2.6(90编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 2.7(93编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.0(89编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.1(90编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.2(92编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.3(93编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp65001', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.4(96编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp65001', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.5(98编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp65001', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

Python 3.6(98编码)

 ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp273', 'cp424', 'cp437', 'cp500', 'cp720', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp858', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1125', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp65001', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'iso8859_16', 'johab', 'koi8_r', 'koi8_t', 'koi8_u', 'kz1048', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'utf_8_sig'] 

如果他们与任何人的用例有关,请注意,文档还列出了一些特定于Python的编码 ,其中许多似乎主要供Python内部使用,或者在某些方面是奇怪的,比如'undefined'编码如果您尝试使用它,则始终引发exception。 如果像这里的问题提问者那样,你可能想要完全忽略它们,你试图找出在真实世界中遇到的一些文本使用了什么编码。 从Python 3.6开始,列表如下:

 ["idna", "mbcs", "oem", "palmos", "punycode", "raw_unicode_escape", "rot_13", "undefined", "unicode_escape", "unicode_internal", "base64_codec", "bz2_codec", "hex_codec", "quopri_codec", "string_escape", "uu_codec", "zlib_codec"] 

最后,如果你想更新我的表,以获得更新版本的Python,下面是我用来生成它们的(原始的,不是非常健壮的)脚本:

 import requests import lxml.html import pprint for version, url in [ ('2.3', 'https://docs.python.org/2.3/lib/node130.html'), ('2.4', 'https://docs.python.org/2.4/lib/standard-encodings.html'), ('2.5', 'https://docs.python.org/2.5/lib/standard-encodings.html'), ('2.6', 'https://docs.python.org/2.6/library/codecs.html#standard-encodings'), ('2.7', 'https://docs.python.org/2.7/library/codecs.html#standard-encodings'), ('3.0', 'https://docs.python.org/3.0/library/codecs.html#standard-encodings'), ('3.1', 'https://docs.python.org/3.1/library/codecs.html#standard-encodings'), ('3.2', 'https://docs.python.org/3.2/library/codecs.html#standard-encodings'), ('3.3', 'https://docs.python.org/3.3/library/codecs.html#standard-encodings'), ('3.4', 'https://docs.python.org/3.4/library/codecs.html#standard-encodings'), ('3.5', 'https://docs.python.org/3.5/library/codecs.html#standard-encodings'), ('3.6', 'https://docs.python.org/3.6/library/codecs.html#standard-encodings'), ]: html = requests.get(url).text doc = lxml.html.fromstring(html) standard_encodings_table = doc.xpath( '//table[preceding::h2[.//text()[contains(., "Standard Encodings")]]][//th/text()="Codec"]' )[0] codecs = standard_encodings_table.xpath('.//td[1]/text()') print("## Python %s (%i encodings)" % (version, len(codecs))) print('<pre><code>' + pprint.pformat(codecs) + '</code></pre>') 

也许你应该尝试使用通用编码检测器 (chardet)库,而不是自己实现它。

 >>> import chardet >>> s = '\xe2\x98\x83' # ☃ >>> chardet.detect(s) {'confidence': 0.505, 'encoding': 'utf-8'} 

您可以使用一种技术来列出encodings包中的所有模块。

 import pkgutil import encodings false_positives = set(["aliases"]) found = set(name for imp, name, ispkg in pkgutil.iter_modules(encodings.__path__) if not ispkg) found.difference_update(false_positives) print found 

我怀疑在编解码器模块中有这样的方法/function,但是如果你看到encoding/__init__.py ,searchfunctionsearch通过编码模块文件夹,所以你可以做同样的

 >>> os.listdir(os.path.dirname(encodings.__file__)) ['cp500.pyc', 'utf_16_le.py', 'gb18030.py', 'mbcs.pyc', 'undefined.pyc', 'idna.pyc', 'punycode.pyc', 'cp850.py', 'big5hkscs.pyc', 'mac_arabic.py', '__init__.pyc', 'string_escape.py', 'hz.py', 'cp037.py', 'cp737.py', 'iso8859_5.pyc', 'iso8859_13.pyc', 'cp861.pyc', 'cp862.py', 'iso8859_9.pyc', 'cp949.py', 'base64_codec.pyc', 'koi8_r.py', 'iso8859_2.py', 'ptcp154.pyc', 'uu_codec.pyc', 'mac_croatian.pyc', 'charmap.pyc', 'iso8859_15.pyc', 'euc_jp.py', 'cp1250.py', 'iso8859_10.pyc', 'koi8_r.pyc', 'unicode_escape.pyc', 'cp863.pyc', 'iso8859_4.pyc', 'cp852.py', 'unicode_internal.py', 'big5hkscs.py', 'cp1257.pyc', 'cp1254.py', 'shift_jisx0213.py', 'shift_jis.pyc', 'cp869.pyc', 'hp_roman8.py', 'iso8859_4.py', 'cp775.py', 'cp1251.py', 'mac_cyrillic.pyc', 'mac_greek.pyc', 'mac_roman.pyc', 'iso8859_11.pyc', 'iso8859_6.py', 'utf_8_sig.py', 'iso8859_3.py', 'iso2022_jp_1.py', 'ascii.py', 'cp1026.pyc', 'cp1250.pyc', 'cp950.py', 'raw_unicode_escape.py', 'euc_jis_2004.pyc', 'cp775.pyc', 'euc_kr.py', 'mac _greek.py', 'big5.pyc', 'shift_jis_2004.pyc', 'gbk.pyc', 'cp1254.pyc', 'cp1255.pyc', 'cp855.pyc', 'string_escape.pyc', 'cp949.pyc', 'cp1258.pyc', 'iso8859_3.pyc', 'mac_iceland.pyc', 'cp1251.pyc', 'cp860.py', 'cp856.py', 'cp874.py', 'iso2022_kr.py', 'cp856.pyc', 'rot_13.py', 'palmos.py', 'iso2022_jp_2.pyc', 'mac_farsi.py', 'koi8_u.pyc', 'cp1256.py', 'iso8859_10.py', 'tis_620.py', 'iso8859_14.pyc', 'cp1253.py', 'cp1258.py', 'cp437.py', 'cp862.pyc', 'mac_turkish.py', 'undefined.py', 'euc_kr.pyc', 'gb18030.pyc', 'aliases.pyc', 'iso8859_9.py', 'uu_codec.py', 'gbk.py', 'quopri_codec.pyc', 'iso8859_7.py', 'mac_iceland.py', 'iso8859_2.pyc', 'euc_jis_2004.py', 'iso2022_jp_3.pyc', 'cp874.pyc', '__init__.py', 'mac_roman.py', 'iso8859_16.py', 'cp866.py', 'unicode_internal.pyc', 'mac_turkish.pyc', 'johab.pyc', 'cp037.pyc', 'punycode.py', 'cp1253.pyc', 'euc_jisx0213.pyc', 'iso2022_jp_2004.pyc', 'iso2022_kr.pyc', 'zlib_codec.pyc', 'cp932.py', 'cp1255.py', 'iso2022_jp_1.pyc', 'cp857.pyc', 'cp424.pyc', 'iso2022_jp_2.py', 'iso2022_jp.pyc', 'mbcs.py', 'utf_8.py', 'palmos.pyc', 'cp1252.pyc', 'aliases.py', 'quopri_codec.py', 'latin_1.pyc', 'iso2022_jp.py', 'zlib_codec.py', 'cp1026.py', 'cp860.pyc', 'cp1252.py', 'hex_codec.pyc', 'iso8859_1.pyc', 'cp850.pyc', 'cp861.py', 'iso8859_15.py', 'cp865.pyc', 'hp_roman8.pyc', 'iso8859_7.pyc', 'mac_latin2.py', 'iso8859_11.py', 'mac_centeuro.pyc', 'iso8859_6.pyc', 'ascii.pyc', 'mac_centeuro.py', 'iso2022_jp_3.py', 'bz2_codec.py', 'mac_arabic.pyc', 'euc_jisx0213.py', 'tis_620.pyc', 'shift_jis_2004.py', 'utf_8.pyc', 'cp855.py', 'mac_romanian.pyc', 'iso8859_8.py', 'cp869.py', 'ptcp154.py', 'utf_16_be.py', 'iso2022_jp_ext.pyc', 'bz2_codec.pyc', 'base64_codec.py', 'latin_1.py', 'charmap.py', 'hz.pyc', 'cp950.pyc', 'cp875.pyc', 'cp1006.pyc', 'utf_16.py', 'shift_jisx0213.pyc', 'cp424.py', 'cp932.pyc', 'iso8859_5.py', 'mac_romanian.py', 'utf_8_sig.pyc', 'iso8859_1.py', 'cp875.py', 'cp437.pyc', 'cp865.py', 'utf_7.py', 'utf_16_be.pyc', 'rot_13.pyc', 'euc_jp.p yc', 'raw_unicode_escape.pyc', 'iso8859_8.pyc', 'utf_16.pyc', 'iso8859_14.py', 'iso8859_16.pyc', 'cp852.pyc', 'cp737.pyc', 'mac_croatian.py', 'mac_latin2.pyc', 'iso2022_jp_ext.py', 'cp1140.py', 'mac_cyrillic.py', 'cp1257.py', 'cp500.py', 'cp1140.pyc', 'shift_jis.py', 'unicode_escape.py', 'cp864.py', 'cp864.pyc', 'cp857.py', 'hex_codec.py', 'mac_farsi.pyc', 'idna.py', 'johab.py', 'utf_7.pyc', 'cp863.py', 'iso8859_13.py', 'koi8_u.py', 'gb2312.pyc', 'cp1256.pyc', 'cp866.pyc', 'iso2022_jp_2004.py', 'utf_16_le.pyc', 'gb2312.py', 'cp1006.py', 'big5.py'] 

但任何人都可以注册一个编解码器,所以这将不会是详尽的列表。

Python源代码在Tools/unicode/listcodecs.py有一个脚本,其中列出了所有的编解码器。

然而,列出的编解码器中有一些不是Unicode到字节转换器,就像quopri_codecquopri_codecbz2_codec一样, quopri_codec bz2_codec Machin指出的那样。

可能你可以这样做:

 from encodings.aliases import aliases print aliases.keys() 

下面是列出stdlib编码包中定义的所有编码的编程方式,注意这不会列出用户定义的编码。 这结合了其他答案中的一些技巧,但实际上使用编解码器的规范名称生成工作列表。

 import encodings import pkgutil import pprint all_encodings = set() for _, modname, _ in pkgutil.iter_modules( encodings.__path__, encodings.__name__ + '.', ): try: mod = __import__(modname, fromlist=[str('__trash')]) except (ImportError, LookupError): # A few encodings are platform specific: mcbs, cp65001 # print('skip {}'.format(modname)) pass try: all_encodings.add(mod.getregentry().name) except AttributeError as e: # the `aliases` module doensn't actually provide a codec # print('skip {}'.format(modname)) if 'regentry' not in str(e): raise pprint.pprint(sorted(all_encodings))