将任意GUID编码为可读的ASCII（33-127）的最有效方法是什么？

GUID的标准string表示大约需要36个字符。这是非常好的，但也很浪费。我想知道如何使用33-127范围内的所有ASCII字符以尽可能最短的方式对其进行编码。天真的实现产生22个字符，只是因为128位 / 6位产生22。

霍夫曼编码是我的第二好，唯一的问题是如何select代码….

当然，编码必须是无损的。

使用基地85.请参阅4.1节。 为什么85？ 紧凑的IPv6地址表示方法

一个IPv6地址，就像一个GUID由八个16位块组成。

这是一个古老的问题，但我必须解决这个问题，以便我正在努力实现向后兼容的系统。

确切的要求是客户端生成的标识符将写入数据库并存储在20个字符的唯一列中。它从来没有显示给用户，也没有以任何方式索引。

由于我无法消除这个要求，所以我真的希望使用一个Guid（这在统计上是独一无二的），如果我可以将它无损地编码成20个字符，那么考虑到约束条件，这将是一个很好的解决scheme。

Ascii-85允许您将4个字节的二进制数据编码为5个字节的Ascii数据。所以一个16字节的GUID只能用这种编码方式放入20个ASCII字符中。 Guid可以有3.1962657931507848761677563491821e + 38个离散值，而20个字符的Ascii-85可以有3.8759531084514355873123178482056e + 38个离散值。

写入数据库时，我对截断有一些担心，所以在编码中不包含空格字符。我也遇到了sorting问题，我通过从编码中排除小写字符来解决这个问题。此外，它只会通过参数化命令，所以任何特殊的SQL字符都会自动转义。

我已经包含了C＃代码来执行Ascii-85编码和解码以防止任何人在那里。显然，取决于你的用法，你可能需要select不同的字符集，因为我的约束使我select了一些不寻常的字符，如“ß”和“Ø” – 但这很容易：

 /// <summary> /// This code implements an encoding scheme that uses 85 printable ascii characters /// to encode the same volume of information as contained in a Guid. /// /// Ascii-85 can represent 4 binary bytes as 5 Ascii bytes. So a 16 byte Guid can be /// represented in 20 Ascii bytes. A Guid can have /// 3.1962657931507848761677563491821e+38 discrete values whereas 20 characters of /// Ascii-85 can have 3.8759531084514355873123178482056e+38 discrete values. /// /// Lower-case characters are not included in this encoding to avoid collation /// issues. /// This is a departure from standard Ascii-85 which does include lower case /// characters. /// In addition, no whitespace characters are included as these may be truncated in /// the database depending on the storage mechanism - ie VARCHAR vs CHAR. /// </summary> internal static class Ascii85 { /// <summary> /// 85 printable ascii characters with no lower case ones, so database /// collation can't bite us. No ' ' character either so database can't /// truncate it! /// Unfortunately, these limitation mean resorting to some strange /// characters like 'Æ' but we won't ever have to type these, so it's ok. /// </summary> private static readonly char[] kEncodeMap = new[] { '0','1','2','3','4','5','6','7','8','9', // 10 'A','B','C','D','E','F','G','H','I','J', // 20 'K','L','M','N','O','P','Q','R','S','T', // 30 'U','V','W','X','Y','Z','|','}','~','{', // 40 '!','"','#','$','%','&','\'','(',')','`', // 50 '*','+',',','-','.','/','[','\\',']','^', // 60 ':',';','<','=','>','?','@','_','¼','½', // 70 '¾','ß','Ç','Ð','€','«','»','¿','•','Ø', // 80 '£','†','‡','§','¥' // 85 }; /// <summary> /// A reverse mapping of the <see cref="kEncodeMap"/> array for decoding /// purposes. /// </summary> private static readonly IDictionary<char, byte> kDecodeMap; /// <summary> /// Initialises the <see cref="kDecodeMap"/>. /// </summary> static Ascii85() { kDecodeMap = new Dictionary<char, byte>(); for (byte i = 0; i < kEncodeMap.Length; i++) { kDecodeMap.Add(kEncodeMap[i], i); } } /// <summary> /// Decodes an Ascii-85 encoded Guid. /// </summary> /// <param name="ascii85Encoding">The Guid encoded using Ascii-85.</param> /// <returns>A Guid decoded from the parameter.</returns> public static Guid Decode(string ascii85Encoding) { // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii. // Since a Guid is 16 bytes long, the Ascii-85 encoding should be 20 // characters long. if(ascii85Encoding.Length != 20) { throw new ArgumentException( "An encoded Guid should be 20 characters long.", "ascii85Encoding"); } // We only support upper case characters. ascii85Encoding = ascii85Encoding.ToUpper(); // Split the string in half and decode each substring separately. var higher = ascii85Encoding.Substring(0, 10).AsciiDecode(); var lower = ascii85Encoding.Substring(10, 10).AsciiDecode(); // Convert the decoded substrings into an array of 16-bytes. var byteArray = new[] { (byte)((higher & 0xFF00000000000000) >> 56), (byte)((higher & 0x00FF000000000000) >> 48), (byte)((higher & 0x0000FF0000000000) >> 40), (byte)((higher & 0x000000FF00000000) >> 32), (byte)((higher & 0x00000000FF000000) >> 24), (byte)((higher & 0x0000000000FF0000) >> 16), (byte)((higher & 0x000000000000FF00) >> 8), (byte)((higher & 0x00000000000000FF)), (byte)((lower & 0xFF00000000000000) >> 56), (byte)((lower & 0x00FF000000000000) >> 48), (byte)((lower & 0x0000FF0000000000) >> 40), (byte)((lower & 0x000000FF00000000) >> 32), (byte)((lower & 0x00000000FF000000) >> 24), (byte)((lower & 0x0000000000FF0000) >> 16), (byte)((lower & 0x000000000000FF00) >> 8), (byte)((lower & 0x00000000000000FF)), }; return new Guid(byteArray); } /// <summary> /// Encodes binary data into a plaintext Ascii-85 format string. /// </summary> /// <param name="guid">The Guid to encode.</param> /// <returns>Ascii-85 encoded string</returns> public static string Encode(Guid guid) { // Convert the 128-bit Guid into two 64-bit parts. var byteArray = guid.ToByteArray(); var higher = ((UInt64)byteArray[0] << 56) | ((UInt64)byteArray[1] << 48) | ((UInt64)byteArray[2] << 40) | ((UInt64)byteArray[3] << 32) | ((UInt64)byteArray[4] << 24) | ((UInt64)byteArray[5] << 16) | ((UInt64)byteArray[6] << 8) | byteArray[7]; var lower = ((UInt64)byteArray[ 8] << 56) | ((UInt64)byteArray[ 9] << 48) | ((UInt64)byteArray[10] << 40) | ((UInt64)byteArray[11] << 32) | ((UInt64)byteArray[12] << 24) | ((UInt64)byteArray[13] << 16) | ((UInt64)byteArray[14] << 8) | byteArray[15]; var encodedStringBuilder = new StringBuilder(); // Encode each part into an ascii-85 encoded string. encodedStringBuilder.AsciiEncode(higher); encodedStringBuilder.AsciiEncode(lower); return encodedStringBuilder.ToString(); } /// <summary> /// Encodes the given integer using Ascii-85. /// </summary> /// <param name="encodedStringBuilder">The <see cref="StringBuilder"/> to /// append the results to.</param> /// <param name="part">The integer to encode.</param> private static void AsciiEncode( this StringBuilder encodedStringBuilder, UInt64 part) { // Nb, the most significant digits in our encoded character will // be the right-most characters. var charCount = (UInt32)kEncodeMap.Length; // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii. // Since a UInt64 is 8 bytes long, the Ascii-85 encoding should be // 10 characters long. for (var i = 0; i < 10; i++) { // Get the remainder when dividing by the base. var remainder = part % charCount; // Divide by the base. part /= charCount; // Add the appropriate character for the current value (0-84). encodedStringBuilder.Append(kEncodeMap[remainder]); } } /// <summary> /// Decodes the given string from Ascii-85 to an integer. /// </summary> /// <param name="ascii85EncodedString">Decodes a 10 character Ascii-85 /// encoded string.</param> /// <returns>The integer representation of the parameter.</returns> private static UInt64 AsciiDecode(this string ascii85EncodedString) { if (ascii85EncodedString.Length != 10) { throw new ArgumentException( "An Ascii-85 encoded Uint64 should be 10 characters long.", "ascii85EncodedString"); } // Nb, the most significant digits in our encoded character // will be the right-most characters. var charCount = (UInt32)kEncodeMap.Length; UInt64 result = 0; // Starting with the right-most (most-significant) character, // iterate through the encoded string and decode. for (var i = ascii85EncodedString.Length - 1; i >= 0; i--) { // Multiply the current decoded value by the base. result *= charCount; // Add the integer value for that encoded character. result += kDecodeMap[ascii85EncodedString[i]]; } return result; } }

另外，这里是unit testing。他们没有我想要的那么彻底，我不喜欢Guid.NewGuid()被使用的非确定性，但他们应该让你开始：

 /// <summary> /// Tests to verify that the Ascii-85 encoding is functioning as expected. /// </summary> [TestClass] [UsedImplicitly] public class Ascii85Tests { [TestMethod] [Description("Ensure that the Ascii-85 encoding is correct.")] [UsedImplicitly] public void CanEncodeAndDecodeAGuidUsingAscii85() { var guidStrings = new[] { "00000000-0000-0000-0000-000000000000", "00000000-0000-0000-0000-0000000000FF", "00000000-0000-0000-0000-00000000FF00", "00000000-0000-0000-0000-000000FF0000", "00000000-0000-0000-0000-0000FF000000", "00000000-0000-0000-0000-00FF00000000", "00000000-0000-0000-0000-FF0000000000", "00000000-0000-0000-00FF-000000000000", "00000000-0000-0000-FF00-000000000000", "00000000-0000-00FF-0000-000000000000", "00000000-0000-FF00-0000-000000000000", "00000000-00FF-0000-0000-000000000000", "00000000-FF00-0000-0000-000000000000", "000000FF-0000-0000-0000-000000000000", "0000FF00-0000-0000-0000-000000000000", "00FF0000-0000-0000-0000-000000000000", "FF000000-0000-0000-0000-000000000000", "FF000000-0000-0000-0000-00000000FFFF", "00000000-0000-0000-0000-0000FFFF0000", "00000000-0000-0000-0000-FFFF00000000", "00000000-0000-0000-FFFF-000000000000", "00000000-0000-FFFF-0000-000000000000", "00000000-FFFF-0000-0000-000000000000", "0000FFFF-0000-0000-0000-000000000000", "FFFF0000-0000-0000-0000-000000000000", "00000000-0000-0000-0000-0000FFFFFFFF", "00000000-0000-0000-FFFF-FFFF00000000", "00000000-FFFF-FFFF-0000-000000000000", "FFFFFFFF-0000-0000-0000-000000000000", "00000000-0000-0000-FFFF-FFFFFFFFFFFF", "FFFFFFFF-FFFF-FFFF-0000-000000000000", "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF", "1000000F-100F-100F-100F-10000000000F" }; foreach (var guidString in guidStrings) { var guid = new Guid(guidString); var encoded = Ascii85.Encode(guid); Assert.AreEqual( 20, encoded.Length, "A guid encoding should not exceed 20 characters."); var decoded = Ascii85.Decode(encoded); Assert.AreEqual( guid, decoded, "The guids are different after being encoded and decoded."); } } [TestMethod] [Description( "The Ascii-85 encoding is not susceptible to changes in character case.")] [UsedImplicitly] public void Ascii85IsCaseInsensitive() { const int kCount = 50; for (var i = 0; i < kCount; i++) { var guid = Guid.NewGuid(); // The encoding should be all upper case. A reliance // on mixed case will make the generated string // vulnerable to sql collation. var encoded = Ascii85.Encode(guid); Assert.AreEqual( encoded, encoded.ToUpper(), "The Ascii-85 encoding should produce only uppercase characters."); } } }

我希望这可以节省一些麻烦。

另外，如果你发现任何错误，那么让我知道;-)

你有95个字符可用 – 所以，超过6位，但不是多达7（实际上约6.57）。您可以使用128 / log2（95）=大约19.48个字符，编码为20个字符。如果以编码forms保存2个字符值得丢失可读性，就像（伪代码）：

 char encoded[21]; long long guid; // 128 bits number for(int i=0; i<20; ++i) { encoded[i] = chr(guid % 95 + 33); guid /= 95; } encoded[20] = chr(0);

它基本上是通用的“在一些基础上编码”的代码，除了不需要反转“数字”，因为顺序的随意（小端更直接和自然）。为了从编码string中得到guid，以非常相似的方式，以95为底的多项式计算（从每个数字中减去33后）：

 guid = 0; for(int i=0; i<20; ++i) { guid *= 95; guid += ord(encoded[i]) - 33; }

基本上使用Horner的多项式评估方法。

只需去Base64 。

使用33的全部范围（顺便说一下，错误的wirh空间是怎么回事？）127给你95个可能的字符。表示基地95中的2^128可能的guid值将使用20个字符。这（模仿像下降nybbles将是不变的）是最好的，你可以做的。节省自己的麻烦 – 使用基地64。

假设所有GUID都是由相同的algorithm生成的，那么在应用任何其他编码之前，您可以通过不对algorithm半字节进行编码来节省4位。

一个任意的 GUID？ “天真”algorithm将产生最佳结果。进一步压缩GUID的唯一方法是利用“任意”约束排除的数据中的模式。

将任意GUID编码为可读的ASCII（33-127）的最有效方法是什么？

非常快的3D距离检查？

什么是最快的子stringsearchalgorithm？

单程飞行旅行的问题

实时数据捕获的百分比

尾recursion如何工作？

解释梅克尔树用于最终一致性

在不使用临时variables的情况下交换两个variables

为什么会有人使用set而不是unordered_set？

这个时候的复杂性实际上是O（n ^ 2）吗？

Python – 一个字典慢find每个字符的频率？