How do I convert between big-endian and little-endian in C++?
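Edit: To be clear, I have to translate binary data (double-precision floating-point values and 32-bit and 64-bit integers) from one CPU architecture to another. This doesn't involve networking, so ntoh() and similar functions won't work here.
Edit #2: The answer I accepted applies directly to the compilers I'm targeting (which is why I chose it). However, there are other very good, more portable answers here.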
If you are using Visual C++, do the following: include intrin.h and call the following functions:
For 16-bit numbers:
    unsigned short _byteswap_ushort(unsigned short value);
For 32-bit numbers:
    unsigned long _byteswap_ulong(unsigned long value);
For 64-bit numbers:
    unsigned __int64 _byteswap_uint64(unsigned __int64 value);
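8-bit numbers (chars) don't need to be converted.
Although these are only defined for unsigned values, they work for signed integers as well.
For floats and doubles it is more difficult than for plain integers, because they may or may not be in the host machine's byte order. You can get little-endian floats on big-endian machines and vice versa.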
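One way to handle the floating-point case is to reverse the raw bytes before they are ever treated as a double. The sketch below assumes both machines use the same 64-bit floating-point format and differ only in byte order; the helper names load_swapped_double and store_swapped_double are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: decode a double from sizeof(double) bytes that were
// written by a machine of the opposite byte order.
// Assumption: both machines use the same floating-point format.
inline double load_swapped_double(const unsigned char* src) {
    unsigned char tmp[sizeof(double)];
    for (std::size_t i = 0; i < sizeof(double); ++i)
        tmp[i] = src[sizeof(double) - 1 - i];   // reverse while copying
    double out;
    std::memcpy(&out, tmp, sizeof out);         // reinterpret only after the swap
    return out;
}

// Hypothetical helper: encode a double for a machine of the opposite byte order.
inline void store_swapped_double(unsigned char* dst, double value) {
    unsigned char tmp[sizeof(double)];
    std::memcpy(tmp, &value, sizeof tmp);
    for (std::size_t i = 0; i < sizeof(double); ++i)
        dst[i] = tmp[sizeof(double) - 1 - i];
}
```

Keeping the foreign representation in a byte buffer means a swapped bit pattern is never held in a floating-point variable.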
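Other compilers have similar intrinsics as well.
In GCC, for example, you can directly call:
    uint32_t __builtin_bswap32 (uint32_t x)
    uint64_t __builtin_bswap64 (uint64_t x)
(no need to include anything). AFAIK bits.h declares the same function in a non-GCC-centric way as well.
A 16-bit swap is just a bit rotate.
Calling the intrinsics instead of rolling your own gives you the best performance and code density, by the way.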
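The two sets of intrinsics can be hidden behind one set of functions. This is only a sketch: the wrapper names swap_bytes16, swap_bytes32 and swap_bytes64 are invented here, and the plain shift-and-mask fallback is an assumption for compilers that offer neither intrinsic.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>   // _byteswap_ulong, _byteswap_uint64
#endif

// Hypothetical wrapper names; only the intrinsics come from the compilers.
inline std::uint16_t swap_bytes16(std::uint16_t x) {
    // A 16-bit swap is a rotate by 8 bits.
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

inline std::uint32_t swap_bytes32(std::uint32_t x) {
#if defined(_MSC_VER)
    return _byteswap_ulong(x);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    // Fallback: plain shifts and masks.
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

inline std::uint64_t swap_bytes64(std::uint64_t x) {
#if defined(_MSC_VER)
    return _byteswap_uint64(x);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(x);
#else
    // Fallback: swap each 32-bit half, then exchange the halves.
    const std::uint64_t lo = swap_bytes32(static_cast<std::uint32_t>(x));
    const std::uint64_t hi = swap_bytes32(static_cast<std::uint32_t>(x >> 32));
    return (lo << 32) | hi;
#endif
}
```

Callers then write swap_bytes32(value) everywhere and the compiler-specific part stays in one place.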
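Simply put:
    #include <climits>
    #include <cstddef>

    template <typename T>
    T swap_endian(T u)
    {
        static_assert(CHAR_BIT == 8, "CHAR_BIT != 8");

        // Note: reading a union member other than the one last written is
        // formally undefined in C++, although common compilers support it.
        union
        {
            T u;
            unsigned char u8[sizeof(T)];
        } source, dest;

        source.u = u;

        for (std::size_t k = 0; k < sizeof(T); k++)
            dest.u8[k] = source.u8[sizeof(T) - k - 1];

        return dest.u;
    }
Usage: swap_endian<uint32_t>(42).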
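From Rob Pike's "The Byte Order Fallacy":
Let's say your data stream has a little-endian-encoded 32-bit integer. Here's how to extract it (assuming unsigned bytes):
    i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
    i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
TL;DR: don't worry about your platform's native order. All that counts is the byte order of the stream you're reading from, and you'd better hope it's well defined.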
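Note: it was remarked in the comments that, absent explicit type conversion, it is important that data be an array of unsigned char or uint8_t. Using signed char or char (if signed) will result in data[x] being promoted to an integer, and data[x] << 24 potentially shifting a 1 into the sign bit, which is UB.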
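The note above can be made concrete by converting each byte to an unsigned 32-bit value before shifting, so that no shift is performed on a signed int. A minimal sketch, with the function names load_le32 and load_be32 chosen here for illustration:

```cpp
#include <cstdint>

// Read a little-endian 32-bit integer from a byte stream.
// Each byte is widened to uint32_t first, so every shift is unsigned.
inline std::uint32_t load_le32(const unsigned char* data) {
    return  static_cast<std::uint32_t>(data[0])
         | (static_cast<std::uint32_t>(data[1]) << 8)
         | (static_cast<std::uint32_t>(data[2]) << 16)
         | (static_cast<std::uint32_t>(data[3]) << 24);
}

// Read a big-endian 32-bit integer from a byte stream.
inline std::uint32_t load_be32(const unsigned char* data) {
    return  static_cast<std::uint32_t>(data[3])
         | (static_cast<std::uint32_t>(data[2]) << 8)
         | (static_cast<std::uint32_t>(data[1]) << 16)
         | (static_cast<std::uint32_t>(data[0]) << 24);
}
```

Neither function looks at the host's byte order, which is the point of the quoted article.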
If you are doing this for network/host compatibility, you should use:
    ntohl() // Network to Host byte order (Long)
    htonl() // Host to Network byte order (Long)
    ntohs() // Network to Host byte order (Short)
    htons() // Host to Network byte order (Short)
If you are doing this for some other reason, one of the byte_swap solutions presented here will work just fine.
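For the networking case, a minimal sketch of how these functions are typically used around a byte buffer. It assumes a POSIX system where they are declared in <arpa/inet.h> (on Windows they come from the Winsock headers instead); put_net32 and get_net32 are hypothetical helper names.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX; assumption: not building on Windows)
#include <cstdint>
#include <cstring>

// Hypothetical helper: write a 32-bit value into a buffer in network
// (big-endian) byte order.
inline void put_net32(unsigned char* out, std::uint32_t host_value) {
    const std::uint32_t net = htonl(host_value);
    std::memcpy(out, &net, sizeof net);
}

// Hypothetical helper: read a 32-bit network-order value from a buffer.
inline std::uint32_t get_net32(const unsigned char* in) {
    std::uint32_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

On a big-endian host both conversions are no-ops, so the same code works on either kind of machine.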
I took a few suggestions from this post and put them together to form this:
    #include <boost/type_traits.hpp>
    #include <boost/static_assert.hpp>
    #include <boost/detail/endian.hpp>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <stdexcept>

    enum endianness
    {
        little_endian,
        big_endian,
        network_endian = big_endian,

    #if defined(BOOST_LITTLE_ENDIAN)
        host_endian = little_endian
    #elif defined(BOOST_BIG_ENDIAN)
        host_endian = big_endian
    #else
    #error "unable to determine system endianness"
    #endif
    };

    namespace detail {

    template<typename T, std::size_t sz>
    struct swap_bytes
    {
        inline T operator()(T val)
        {
            throw std::out_of_range("data size");
        }
    };

    template<typename T>
    struct swap_bytes<T, 1>
    {
        inline T operator()(T val)
        {
            return val;
        }
    };

    template<typename T>
    struct swap_bytes<T, 2>
    {
        inline T operator()(T val)
        {
            return ((((val) >> 8) & 0xff) | (((val) & 0xff) << 8));
        }
    };

    template<typename T>
    struct swap_bytes<T, 4>
    {
        inline T operator()(T val)
        {
            return ((((val) & 0xff000000) >> 24) |
                    (((val) & 0x00ff0000) >>  8) |
                    (((val) & 0x0000ff00) <<  8) |
                    (((val) & 0x000000ff) << 24));
        }
    };

    template<>
    struct swap_bytes<float, 4>
    {
        inline float operator()(float val)
        {
            // memcpy instead of pointer casts, to avoid strict-aliasing violations
            std::uint32_t mem;
            std::memcpy(&mem, &val, sizeof mem);
            mem = swap_bytes<std::uint32_t, sizeof(std::uint32_t)>()(mem);
            std::memcpy(&val, &mem, sizeof val);
            return val;
        }
    };

    template<typename T>
    struct swap_bytes<T, 8>
    {
        inline T operator()(T val)
        {
            return ((((val) & 0xff00000000000000ull) >> 56) |
                    (((val) & 0x00ff000000000000ull) >> 40) |
                    (((val) & 0x0000ff0000000000ull) >> 24) |
                    (((val) & 0x000000ff00000000ull) >> 8 ) |
                    (((val) & 0x00000000ff000000ull) << 8 ) |
                    (((val) & 0x0000000000ff0000ull) << 24) |
                    (((val) & 0x000000000000ff00ull) << 40) |
                    (((val) & 0x00000000000000ffull) << 56));
        }
    };

    template<>
    struct swap_bytes<double, 8>
    {
        inline double operator()(double val)
        {
            // memcpy instead of pointer casts, to avoid strict-aliasing violations
            std::uint64_t mem;
            std::memcpy(&mem, &val, sizeof mem);
            mem = swap_bytes<std::uint64_t, sizeof(std::uint64_t)>()(mem);
            std::memcpy(&val, &mem, sizeof val);
            return val;
        }
    };

    template<endianness from, endianness to, class T>
    struct do_byte_swap
    {
        inline T operator()(T value)
        {
            return swap_bytes<T, sizeof(T)>()(value);
        }
    };

    // specialisations when attempting to swap to the same endianness
    template<class T>
    struct do_byte_swap<little_endian, little_endian, T>
    {
        inline T operator()(T value) { return value; }
    };

    template<class T>
    struct do_byte_swap<big_endian, big_endian, T>
    {
        inline T operator()(T value) { return value; }
    };

    } // namespace detail

    template<endianness from, endianness to, class T>
    inline T byte_swap(T value)
    {
        // ensure the data is only 1, 2, 4 or 8 bytes
        BOOST_STATIC_ASSERT(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8);
        // ensure we're only swapping arithmetic types
        BOOST_STATIC_ASSERT(boost::is_arithmetic<T>::value);

        return detail::do_byte_swap<from, to, T>()(value);
    }
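There is an assembly instruction called BSWAP that will do the swap for you, extremely fast. You can read about it here.
Visual Studio, or more precisely the Visual C++ runtime library, has platform intrinsics for this, called _byteswap_ushort(), _byteswap_ulong(), and _byteswap_uint64(). Similar intrinsics should exist for other platforms, but I don't know what they would be called.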
We've done this with templates. You could do something like this:
    #include <cstddef>
    #include <cstdint>

    // Primary template (not shown in the original snippet; this declaration
    // is assumed). Only the specializations below are defined.
    template<std::size_t N>
    inline void endian_byte_swapper(char* dest, char const* src);

    // Specialization for 2-byte types.
    template<>
    inline void endian_byte_swapper< 2 >(char* dest, char const* src)
    {
        // Use bit manipulations instead of accessing individual bytes from memory, much faster.
        std::uint16_t* p_dest = reinterpret_cast< std::uint16_t* >(dest);
        std::uint16_t const* const p_src = reinterpret_cast< std::uint16_t const* >(src);
        *p_dest = static_cast< std::uint16_t >((*p_src >> 8) | (*p_src << 8));
    }

    // Specialization for 4-byte types.
    template<>
    inline void endian_byte_swapper< 4 >(char* dest, char const* src)
    {
        // Use bit manipulations instead of accessing individual bytes from memory, much faster.
        std::uint32_t* p_dest = reinterpret_cast< std::uint32_t* >(dest);
        std::uint32_t const* const p_src = reinterpret_cast< std::uint32_t const* >(src);
        *p_dest = (*p_src >> 24) | ((*p_src & 0x00ff0000) >> 8) | ((*p_src & 0x0000ff00) << 8) | (*p_src << 24);
    }
The procedure for going from big-endian to little-endian is the same as going from little-endian to big-endian.
Here's some example code:
    void swapByteOrder(unsigned short& us)
    {
        us = (us >> 8) |
             (us << 8);
    }

    void swapByteOrder(unsigned int& ui)
    {
        ui = (ui >> 24) |
             ((ui<<8) & 0x00FF0000) |
             ((ui>>8) & 0x0000FF00) |
             (ui << 24);
    }

    void swapByteOrder(unsigned long long& ull)
    {
        ull = (ull >> 56) |
              ((ull<<40) & 0x00FF000000000000) |
              ((ull<<24) & 0x0000FF0000000000) |
              ((ull<<8) & 0x000000FF00000000) |
              ((ull>>8) & 0x00000000FF000000) |
              ((ull>>24) & 0x0000000000FF0000) |
              ((ull>>40) & 0x000000000000FF00) |
              (ull << 56);
    }
If you are doing this to transfer data between different platforms, look at the ntoh and hton functions.
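The same way you do in C:
    unsigned short big = 0xdead;
    unsigned short little = (((big & 0xff)<<8) | ((big & 0xff00)>>8));
You could also declare a vector of unsigned chars, memcpy the input value into it, reverse the bytes into another vector and memcpy the bytes out, but that will take orders of magnitude longer than bit-twiddling, especially with 64-bit values.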
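For comparison, the memcpy-and-reverse variant described above might look like the sketch below. It uses a fixed-size local array rather than a vector, and the name byte_reversed is made up here. How it performs relative to bit-twiddling depends on the compiler and optimization level, so measure before choosing.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: return a copy of `value` with its bytes reversed.
template <typename T>
T byte_reversed(T value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only byte-copyable types can be reversed this way");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));     // value -> bytes
    std::reverse(bytes, bytes + sizeof(T));    // reverse the byte order
    std::memcpy(&value, bytes, sizeof(T));     // bytes -> value
    return value;
}
```

Usage: byte_reversed<std::uint32_t>(0x11223344u) yields 0x44332211.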
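On most POSIX systems (though it is not in the POSIX standard) there is an endian.h, which can be used to determine what encoding your system uses. From there it's something like this:
    unsigned int change_endian(unsigned int x)
    {
        unsigned char *ptr = (unsigned char *)&x;
        return ((unsigned int)ptr[0] << 24) | ((unsigned int)ptr[1] << 16) | ((unsigned int)ptr[2] << 8) | ptr[3];
    }
This swaps the order (from big-endian to little-endian):
If you have the number 0xDEADBEEF (stored as 0xEFBEADDE on a little-endian system), ptr[0] will be 0xEF, ptr[1] is 0xBE, and so on.
But if you want to use it for networking, then htons, htonl and htonll (and their inverses ntohs, ntohl and ntohll) will be helpful for converting from host order to network order.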
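Where endian.h is not available, the host's byte order can also be probed at run time. The sketch below does that and swaps only when needed; host_is_little_endian, swap32 and host_to_be32 are names invented for this example, and it assumes the host is either plain little-endian or plain big-endian.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: inspect the first byte of a known 16-bit value.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;   // low-order byte stored first => little-endian
}

inline std::uint32_t swap32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Hypothetical helper: produce a value whose in-memory bytes are big-endian.
inline std::uint32_t host_to_be32(std::uint32_t x) {
    return host_is_little_endian() ? swap32(x) : x;
}
```

After host_to_be32(0xDEADBEEF), the bytes in memory read DE AD BE EF on either kind of host.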
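Note that, at least on Windows, htonl() is much slower than its intrinsic counterpart _byteswap_ulong(). The former is a DLL library call into ws2_32.dll, the latter is a single BSWAP assembly instruction. Therefore, if you are writing some platform-dependent code, prefer using the intrinsics for speed:
    #define htonl(x) _byteswap_ulong(x)
This may be especially important for .PNG image processing, where all integers are saved in big-endian format with the explanation "One can use htonl()..." (which will slow down typical Windows programs if you are not prepared).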
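As an illustration of the PNG case, the sketch below reads the image dimensions without calling htonl() at all, by assembling each big-endian field from its bytes. It assumes the usual layout at the start of a PNG file (8-byte signature, 4-byte chunk length, the 4-byte type "IHDR", then 4-byte width and height); read_be32, PngSize and png_size are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a big-endian 32-bit value from four bytes.
inline std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

struct PngSize {
    std::uint32_t width;
    std::uint32_t height;
};

// Hypothetical helper: `file` points at the first byte of a PNG file.
// Returns false if the buffer is too short or the first chunk is not IHDR.
inline bool png_size(const unsigned char* file, std::size_t len, PngSize& out) {
    if (len < 24) return false;
    if (file[12] != 'I' || file[13] != 'H' || file[14] != 'D' || file[15] != 'R')
        return false;
    out.width  = read_be32(file + 16);
    out.height = read_be32(file + 20);
    return true;
}
```

The signature bytes are not validated here; a real reader would check them as well.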
Most platforms have a system header file that provides efficient byte-swap functions. On Linux it is <endian.h>. You can wrap it nicely in C++:
    #include <iostream>
    #include <cstddef>

    #include <endian.h>

    template<std::size_t N> struct SizeT {};

    #define BYTESWAPS(bits) \
    template<class T> inline T htobe(T t, SizeT<bits / 8>) { return htobe ## bits(t); } \
    template<class T> inline T htole(T t, SizeT<bits / 8>) { return htole ## bits(t); } \
    template<class T> inline T betoh(T t, SizeT<bits / 8>) { return be ## bits ## toh(t); } \
    template<class T> inline T letoh(T t, SizeT<bits / 8>) { return le ## bits ## toh(t); }

    BYTESWAPS(16)
    BYTESWAPS(32)
    BYTESWAPS(64)

    #undef BYTESWAPS

    template<class T> inline T htobe(T t) { return htobe(t, SizeT<sizeof t>()); }
    template<class T> inline T htole(T t) { return htole(t, SizeT<sizeof t>()); }
    template<class T> inline T betoh(T t) { return betoh(t, SizeT<sizeof t>()); }
    template<class T> inline T letoh(T t) { return letoh(t, SizeT<sizeof t>()); }

    int main()
    {
        std::cout << std::hex;
        std::cout << htobe(static_cast<unsigned short>(0xfeca)) << '\n';
        std::cout << htobe(0xafbeadde) << '\n';

        // Use ULL suffix to specify integer constant as unsigned long long
        std::cout << htobe(0xfecaefbeafdeedfeULL) << '\n';
    }
Output (on a little-endian host):
    cafe
    deadbeaf
    feeddeafbeefcafe
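I like this one, just for style :-)
    /* Relies on a compound literal (C99; a GNU extension in C++)
       and assumes a 4-byte long. */
    long swap(long i) {
        char *c = (char *) &i;
        return * (long *) (char[]) {c[3], c[2], c[1], c[0] };
    }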
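Seriously... I don't understand why all the solutions are so complicated! How about the simplest, most general template function that swaps any type of any size under any circumstances on any operating system?
    #include <utility>  // std::swap

    template <typename T>
    void SwapEnd(T& var)
    {
        char* varArray = reinterpret_cast<char*>(&var);
        for (long i = 0; i < static_cast<long>(sizeof(var) / 2); i++)
            std::swap(varArray[sizeof(var) - 1 - i], varArray[i]);
    }
It's the magic power of C and C++ together! Simply swap the original variable character by character.
Remember that I didn't use the simple assignment operator "=", because some objects will be messed up when their endianness is flipped and the copy constructor (or assignment operator) won't work. Therefore, it's more reliable to copy them char by char.
To call it, just use
    double x = 5;
    SwapEnd(x);
and now x has the opposite byte order.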
I have this code that allows me to convert from HOST_ENDIAN_ORDER (whatever it is) to LITTLE_ENDIAN_ORDER or BIG_ENDIAN_ORDER. I use a template, so if I try to convert from HOST_ENDIAN_ORDER to LITTLE_ENDIAN_ORDER and they happen to be the same on the machine I compile for, no code is generated.
Here is the code with some comments:
    #include <boost/detail/endian.hpp>
    #include <boost/static_assert.hpp>
    #include <boost/type_traits.hpp>

    // We define some constants for little, big and host endianness. Here I use
    // BOOST_LITTLE_ENDIAN/BOOST_BIG_ENDIAN to check the host endianness. If you
    // don't want to use boost you will have to modify this part a bit.
    enum EEndian
    {
        LITTLE_ENDIAN_ORDER,
        BIG_ENDIAN_ORDER,
    #if defined(BOOST_LITTLE_ENDIAN)
        HOST_ENDIAN_ORDER = LITTLE_ENDIAN_ORDER
    #elif defined(BOOST_BIG_ENDIAN)
        HOST_ENDIAN_ORDER = BIG_ENDIAN_ORDER
    #else
    #error "Unable to determine the endianness of the target system."
    #endif
    };

    // This function swaps the bytes of a value given its size as a template
    // parameter (could sizeof be used?).
    template <class T, unsigned int size>
    inline T SwapBytes(T value)
    {
        union
        {
            T value;
            char bytes[size];
        } in, out;

        in.value = value;

        for (unsigned int i = 0; i < size / 2; ++i)
        {
            out.bytes[i] = in.bytes[size - 1 - i];
            out.bytes[size - 1 - i] = in.bytes[i];
        }

        return out.value;
    }

    // Here is the function you will use. Again there are two compile-time
    // assertions that use the boost library. You could probably comment them
    // out, but if you do, be careful not to use this function for anything
    // other than integer types. This function needs to be called like this:
    //
    //     int x = someValue;
    //     int i = EndianSwapBytes<HOST_ENDIAN_ORDER, BIG_ENDIAN_ORDER>(x);
    //
    template<EEndian from, EEndian to, class T>
    inline T EndianSwapBytes(T value)
    {
        // Assert: the data to swap is 2, 4 or 8 bytes long
        BOOST_STATIC_ASSERT(sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8);

        // Assert: the data to swap is of an arithmetic type
        BOOST_STATIC_ASSERT(boost::is_arithmetic<T>::value);

        // If from and to are the same, we don't swap.
        if (from == to)
            return value;

        return SwapBytes<T, sizeof(T)>(value);
    }
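Here's a generalized version I came up with off the top of my head, for swapping a value in place. The other suggestions would be better if performance is a problem.
    #include <cstddef>
    #include <utility>  // std::swap

    template<typename T>
    void ByteSwap(T * p)
    {
        for (std::size_t i = 0; i < sizeof(T)/2; ++i)
            std::swap(((char *)p)[i], ((char *)p)[sizeof(T)-1-i]);
    }
Disclaimer: I haven't tried to compile or test this yet.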
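If you take the common pattern for reversing the order of bits in a word, and cull the part that reverses bits within each byte, then you're left with something that only reverses the bytes within a word. For 64 bits:
    x = ((x & 0x00000000ffffffff) << 32) ^ ((x >> 32) & 0x00000000ffffffff);
    x = ((x & 0x0000ffff0000ffff) << 16) ^ ((x >> 16) & 0x0000ffff0000ffff);
    x = ((x & 0x00ff00ff00ff00ff) << 8) ^ ((x >> 8) & 0x00ff00ff00ff00ff);
The compiler should clean out the superfluous bit-masking operations (I left them in to highlight the pattern), but if it doesn't, you can rewrite the first line this way:
    x = ( x << 32) ^ (x >> 32);
That should normally simplify down to a single rotate instruction on most architectures (ignoring that the whole operation is probably one instruction).
On a RISC processor, the large, complicated constants may cause the compiler difficulties. You can trivially calculate each of the constants from the previous one, though. Like so:
    uint64_t k = 0x00000000ffffffff; /* compiler should know a trick for this */
    x = ((x & k) << 32) ^ ((x >> 32) & k);
    k ^= k << 16;
    x = ((x & k) << 16) ^ ((x >> 16) & k);
    k ^= k << 8;
    x = ((x & k) << 8) ^ ((x >> 8) & k);
If you like, you can write that as a loop. It won't be efficient, but just for fun:
    int i = sizeof(x) * CHAR_BIT / 2;
    uintmax_t k = ((uintmax_t)1 << i) - 1;  /* shift a wide 1, not an int */
    while (i >= 8)
    {
        x = ((x & k) << i) ^ ((x >> i) & k);
        i >>= 1;
        k ^= k << i;
    }
And for completeness, here's the simplified 32-bit version of the first form:
    x = ( x << 16) ^ (x >> 16);
    x = ((x & 0x00ff00ff) << 8) ^ ((x >> 8) & 0x00ff00ff);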
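Wrapped into functions with fixed-width unsigned types, the simplified forms above could look like this. It is a sketch only; the names reverse_bytes32 and reverse_bytes64 are invented here.

```cpp
#include <cstdint>

// 32-bit: exchange the 16-bit halves, then the bytes within each half.
inline std::uint32_t reverse_bytes32(std::uint32_t x) {
    x = (x << 16) ^ (x >> 16);
    x = ((x & 0x00ff00ffu) << 8) ^ ((x >> 8) & 0x00ff00ffu);
    return x;
}

// 64-bit: exchange 32-bit halves, then 16-bit pairs, then single bytes.
inline std::uint64_t reverse_bytes64(std::uint64_t x) {
    x = (x << 32) ^ (x >> 32);
    x = ((x & 0x0000ffff0000ffffull) << 16) ^ ((x >> 16) & 0x0000ffff0000ffffull);
    x = ((x & 0x00ff00ff00ff00ffull) << 8) ^ ((x >> 8) & 0x00ff00ff00ff00ffull);
    return x;
}
```

Using unsigned fixed-width types matters here: the right shifts must bring in zeros for the pattern to work.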
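If a big-endian 32-bit unsigned integer looks like 0xAABBCCDD, which is equal to 2864434397, then that same 32-bit unsigned integer looks like 0xDDCCBBAA on a little-endian processor, where it is also equal to 2864434397.
If a big-endian 16-bit unsigned short looks like 0xAABB, which is equal to 43707, then that same 16-bit unsigned short looks like 0xBBAA on a little-endian processor, where it is also equal to 43707.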
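To see this on your own machine, you can copy a value's bytes out of memory and look at their order. A small sketch, where memory_bytes is a name made up for this example:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the four bytes of `value` in the order
// in which the host stores them in memory.
inline std::array<unsigned char, 4> memory_bytes(std::uint32_t value) {
    std::array<unsigned char, 4> out{};
    std::memcpy(out.data(), &value, out.size());
    return out;
}
```

For memory_bytes(0xAABBCCDD), a little-endian host gives DD CC BB AA and a big-endian host gives AA BB CC DD, while the value itself is 2864434397 on both.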
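Here are a couple of handy #define macros to swap bytes from little-endian to big-endian and vice versa:
    // can be used for short, unsigned short, word, unsigned word (2-byte integer types)
    #define BYTESWAP16(n) ((((n) & 0xFF00) >> 8) | (((n) & 0x00FF) << 8))

    // can be used for unsigned int (4-byte unsigned integer types)
    #define BYTESWAP32(n) ((BYTESWAP16(((n) & 0xFFFF0000) >> 16)) | ((BYTESWAP16((n) & 0x0000FFFF)) << 16))

    // can be used for unsigned long long (8-byte unsigned integer types)
    #define BYTESWAP64(n) ((BYTESWAP32(((n) & 0xFFFFFFFF00000000) >> 32)) | ((BYTESWAP32((n) & 0x00000000FFFFFFFF)) << 32))

    // Not usable on float or double directly: the bitwise operators need
    // integer operands, so copy the bytes into a same-size unsigned integer first.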
Just thought I'd add my own solution here, since I haven't seen it anywhere. It's a small, portable C++ templated function that only uses bit operations.
    // Intended for unsigned integer types.
    template<typename T>
    inline static T swapByteOrder(const T& val)
    {
        int totalBytes = sizeof(val);
        T swapped = (T) 0;
        for (int i = 0; i < totalBytes; ++i)
        {
            swapped |= (val >> (8 * (totalBytes - i - 1)) & 0xFF) << (8 * i);
        }
        return swapped;
    }
With the code below you can easily switch between big-endian and little-endian:
    #include <stdint.h>  /* uint16_t, uint32_t (instead of #define-ing them) */

    #define swap16(x) ((((uint16_t)(x) & 0x00ff)<<8)| \
                       (((uint16_t)(x) & 0xff00)>>8))

    #define swap32(x) ((((uint32_t)(x) & 0x000000ff)<<24)| \
                       (((uint32_t)(x) & 0x0000ff00)<<8)| \
                       (((uint32_t)(x) & 0x00ff0000)>>8)| \
                       (((uint32_t)(x) & 0xff000000)>>24))
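Wow, I couldn't believe some of the answers I've read here. There's actually an assembly instruction that does this faster than anything else: bswap. You could simply write a function like this...
    __declspec(naked) uint32_t EndianSwap(uint32_t value)
    {
        __asm
        {
            mov eax, dword ptr[esp + 4]
            bswap eax
            ret
        }
    }
It is MUCH faster than the intrinsics that have been suggested. I've disassembled them and looked. The above function has no prologue/epilogue, so it has virtually no overhead at all.
    unsigned long _byteswap_ulong(unsigned long value);
Doing 16-bit is just as easy, except that you'd use xchg al, ah. bswap only works on 32-bit registers.
64-bit is a little more tricky, but not overly so. Much better than all of the above examples with loops and templates.
There are some caveats here... First, bswap is only available on 80x486 CPUs and above. Is anyone planning on running it on a 386?!? If so, you can still replace bswap with...
    mov ebx, eax
    shr ebx, 16
    xchg bl, bh
    xchg al, ah
    shl eax, 16
    or eax, ebx
Also, inline assembly is only available in x86 code in Visual Studio. A naked function cannot be inlined and isn't available in x64 builds either. In that case, you're going to have to use the compiler intrinsics.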
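A portable technique for implementing optimizer-friendly, unaligned, non-native-endian accessors. They work on every compiler, every boundary alignment and every byte ordering. These unaligned routines are supplemented, or mooted, depending on native endianness and alignment. This is a partial listing, but you get the idea. The BO* names are constant values based on native byte ordering.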
    /* Old-style (K&R) parameter declarations rewritten as prototypes so the
       listing also compiles as C++. */
    uint32_t sw_get_uint32_1234(uint32_1234 *pu32)
    {
        union { uint32_1234 u32_1234; uint32_t u32; } bou32;
        bou32.u32_1234[0] = (*pu32)[BO32_0];
        bou32.u32_1234[1] = (*pu32)[BO32_1];
        bou32.u32_1234[2] = (*pu32)[BO32_2];
        bou32.u32_1234[3] = (*pu32)[BO32_3];
        return(bou32.u32);
    }

    void sw_set_uint32_1234(uint32_1234 *pu32, uint32_t u32)
    {
        union { uint32_1234 u32_1234; uint32_t u32; } bou32;
        bou32.u32 = u32;
        (*pu32)[BO32_0] = bou32.u32_1234[0];
        (*pu32)[BO32_1] = bou32.u32_1234[1];
        (*pu32)[BO32_2] = bou32.u32_1234[2];
        (*pu32)[BO32_3] = bou32.u32_1234[3];
    }

    #if HAS_SW_INT64
    int64 sw_get_int64_12345678(int64_12345678 *pi64)
    {
        union { int64_12345678 i64_12345678; int64 i64; } boi64;
        boi64.i64_12345678[0] = (*pi64)[BO64_0];
        boi64.i64_12345678[1] = (*pi64)[BO64_1];
        boi64.i64_12345678[2] = (*pi64)[BO64_2];
        boi64.i64_12345678[3] = (*pi64)[BO64_3];
        boi64.i64_12345678[4] = (*pi64)[BO64_4];
        boi64.i64_12345678[5] = (*pi64)[BO64_5];
        boi64.i64_12345678[6] = (*pi64)[BO64_6];
        boi64.i64_12345678[7] = (*pi64)[BO64_7];
        return(boi64.i64);
    }
    #endif

    int32_t sw_get_int32_3412(int32_3412 *pi32)
    {
        union { int32_3412 i32_3412; int32_t i32; } boi32;
        boi32.i32_3412[2] = (*pi32)[BO32_0];
        boi32.i32_3412[3] = (*pi32)[BO32_1];
        boi32.i32_3412[0] = (*pi32)[BO32_2];
        boi32.i32_3412[1] = (*pi32)[BO32_3];
        return(boi32.i32);
    }

    void sw_set_int32_3412(int32_3412 *pi32, int32_t i32)
    {
        union { int32_3412 i32_3412; int32_t i32; } boi32;
        boi32.i32 = i32;
        (*pi32)[BO32_0] = boi32.i32_3412[2];
        (*pi32)[BO32_1] = boi32.i32_3412[3];
        (*pi32)[BO32_2] = boi32.i32_3412[0];
        (*pi32)[BO32_3] = boi32.i32_3412[1];
    }

    uint32_t sw_get_uint32_3412(uint32_3412 *pu32)
    {
        union { uint32_3412 u32_3412; uint32_t u32; } bou32;
        bou32.u32_3412[2] = (*pu32)[BO32_0];
        bou32.u32_3412[3] = (*pu32)[BO32_1];
        bou32.u32_3412[0] = (*pu32)[BO32_2];
        bou32.u32_3412[1] = (*pu32)[BO32_3];
        return(bou32.u32);
    }

    void sw_set_uint32_3412(uint32_3412 *pu32, uint32_t u32)
    {
        union { uint32_3412 u32_3412; uint32_t u32; } bou32;
        bou32.u32 = u32;
        (*pu32)[BO32_0] = bou32.u32_3412[2];
        (*pu32)[BO32_1] = bou32.u32_3412[3];
        (*pu32)[BO32_2] = bou32.u32_3412[0];
        (*pu32)[BO32_3] = bou32.u32_3412[1];
    }

    float sw_get_float_1234(float_1234 *pf)
    {
        union { float_1234 f_1234; float f; } bof;
        bof.f_1234[0] = (*pf)[BO32_0];
        bof.f_1234[1] = (*pf)[BO32_1];
        bof.f_1234[2] = (*pf)[BO32_2];
        bof.f_1234[3] = (*pf)[BO32_3];
        return(bof.f);
    }

    void sw_set_float_1234(float_1234 *pf, float f)
    {
        union { float_1234 f_1234; float f; } bof;
        bof.f = (float)f;
        (*pf)[BO32_0] = bof.f_1234[0];
        (*pf)[BO32_1] = bof.f_1234[1];
        (*pf)[BO32_2] = bof.f_1234[2];
        (*pf)[BO32_3] = bof.f_1234[3];
    }

    double sw_get_double_12345678(double_12345678 *pd)
    {
        union { double_12345678 d_12345678; double d; } bod;
        bod.d_12345678[0] = (*pd)[BO64_0];
        bod.d_12345678[1] = (*pd)[BO64_1];
        bod.d_12345678[2] = (*pd)[BO64_2];
        bod.d_12345678[3] = (*pd)[BO64_3];
        bod.d_12345678[4] = (*pd)[BO64_4];
        bod.d_12345678[5] = (*pd)[BO64_5];
        bod.d_12345678[6] = (*pd)[BO64_6];
        bod.d_12345678[7] = (*pd)[BO64_7];
        return(bod.d);
    }

    void sw_set_double_12345678(double_12345678 *pd, double d)
    {
        union { double_12345678 d_12345678; double d; } bod;
        bod.d = d;
        (*pd)[BO64_0] = bod.d_12345678[0];
        (*pd)[BO64_1] = bod.d_12345678[1];
        (*pd)[BO64_2] = bod.d_12345678[2];
        (*pd)[BO64_3] = bod.d_12345678[3];
        (*pd)[BO64_4] = bod.d_12345678[4];
        (*pd)[BO64_5] = bod.d_12345678[5];
        (*pd)[BO64_6] = bod.d_12345678[6];
        (*pd)[BO64_7] = bod.d_12345678[7];
    }
These typedefs have the benefit of raising compiler errors if not used with accessors, thus mitigating forgotten accessor bugs.
    typedef char int8_1[1], uint8_1[1];

    typedef char int16_12[2], uint16_12[2]; /* little endian */
    typedef char int16_21[2], uint16_21[2]; /* big endian */

    typedef char int24_321[3], uint24_321[3]; /* Alpha Micro, PDP-11 */

    typedef char int32_1234[4], uint32_1234[4]; /* little endian */
    typedef char int32_3412[4], uint32_3412[4]; /* Alpha Micro, PDP-11 */
    typedef char int32_4321[4], uint32_4321[4]; /* big endian */

    typedef char int64_12345678[8], uint64_12345678[8]; /* little endian */
    typedef char int64_34128756[8], uint64_34128756[8]; /* Alpha Micro, PDP-11 */
    typedef char int64_87654321[8], uint64_87654321[8]; /* big endian */

    typedef char float_1234[4]; /* little endian */
    typedef char float_3412[4]; /* Alpha Micro, PDP-11 */
    typedef char float_4321[4]; /* big endian */

    typedef char double_12345678[8]; /* little endian */
    typedef char double_78563412[8]; /* Alpha Micro? */
    typedef char double_87654321[8]; /* big endian */
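The same idea can be sketched in C++ with a small struct in place of an array typedef. This is not part of the listing above: be_uint32, load and store are hypothetical names, and the sketch covers only the big-endian 32-bit case.

```cpp
#include <cstdint>

// Hypothetical type: a 32-bit unsigned value stored as four bytes in
// big-endian order, whatever byte order the host uses.
struct be_uint32 {
    unsigned char bytes[4];
};

// Accessor: decode the stored bytes into a host integer.
inline std::uint32_t load(const be_uint32& v) {
    return (static_cast<std::uint32_t>(v.bytes[0]) << 24)
         | (static_cast<std::uint32_t>(v.bytes[1]) << 16)
         | (static_cast<std::uint32_t>(v.bytes[2]) << 8)
         |  static_cast<std::uint32_t>(v.bytes[3]);
}

// Accessor: encode a host integer into the stored bytes.
inline void store(be_uint32& v, std::uint32_t x) {
    v.bytes[0] = static_cast<unsigned char>(x >> 24);
    v.bytes[1] = static_cast<unsigned char>(x >> 16);
    v.bytes[2] = static_cast<unsigned char>(x >> 8);
    v.bytes[3] = static_cast<unsigned char>(x);
}
```

Because be_uint32 does not convert to an integer implicitly, writing something like std::uint32_t n = field; fails to compile, which gives the same protection against forgotten accessors. A byte array member also has no alignment requirement beyond 1.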
I recently wrote a macro to do this in C, but it's equally valid in C++:
    #define REVERSE_BYTES(...) do for(size_t REVERSE_BYTES=0; REVERSE_BYTES<sizeof(__VA_ARGS__)>>1; ++REVERSE_BYTES)\
        ((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES],\
        ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES],\
        ((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES];\
    while(0)
It accepts any type and reverses the bytes in the passed argument. Example usages:
    #include <stdio.h>

    int main(){
        unsigned long long x = 0xABCDEF0123456789;
        printf("Before: %llX\n",x);
        REVERSE_BYTES(x);
        printf("After : %llX\n",x);

        /* Valid C (no room for the terminating NUL); C++ rejects this initializer. */
        char c[7]="nametag";
        printf("Before: %c%c%c%c%c%c%c\n",c[0],c[1],c[2],c[3],c[4],c[5],c[6]);
        REVERSE_BYTES(c);
        printf("After : %c%c%c%c%c%c%c\n",c[0],c[1],c[2],c[3],c[4],c[5],c[6]);
    }
Which prints:
    Before: ABCDEF0123456789
    After : 8967452301EFCDAB
    Before: nametag
    After : gateman
The above is perfectly copy/paste-able, but there's a lot going on here, so I'll break down how it works piece by piece:
The first notable thing is that the entire macro is encased in a do while(0) block. This is a common idiom to allow normal semicolon use after the macro.
Next up is the use of a variable named REVERSE_BYTES as the for loop's counter. The name of the macro itself is used as a variable name to ensure that it doesn't clash with any other symbols that may be in scope wherever the macro is used. Since the name is being used within the macro's expansion, it won't be expanded again when used as a variable name here.
Within the for loop, there are two bytes being referenced and XOR swapped (so a temporary variable name is not required):
    ((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES]
    ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES]
__VA_ARGS__ represents whatever was given to the macro, and is used to increase the flexibility of what may be passed in (albeit not by much). The address of this argument is then taken and cast to an unsigned char pointer to permit the swapping of its bytes via array [] subscripting.
The final peculiar point is the lack of {} braces. They aren't necessary because all of the steps in each swap are joined with the comma operator, making them one statement.
Finally, it's worth noting that this is not the ideal approach if speed is a top priority. If this is an important factor, some of the type-specific macros or platform-specific directives referenced in other answers are likely a better option. This approach, however, is portable to all types, all major platforms, and both the C and C++ languages.
I am really surprised no one mentioned the htobeXX and beXXtoh functions (spelled betohXX on OpenBSD). They are defined in endian.h and are very similar to the network functions htonX.
Here's how to read a double stored in IEEE 754 64 bit format, even if your host computer uses a different system.
    #include <math.h>
    #include <stdio.h>

    /*
     * read a double from a stream in ieee754 format regardless of host
     * encoding.
     * fp - the stream
     * bigendian - set if big bytes first, clear for little bytes first
     */
    double freadieee754(FILE *fp, int bigendian)
    {
        unsigned char buff[8];
        int i;
        double fnorm = 0.0;
        unsigned char temp;
        int sign;
        int exponent;
        double bitval;
        int maski, mask;
        int expbits = 11;
        int significandbits = 52;
        int shift;
        double answer;

        /* read the data */
        for (i = 0; i < 8; i++)
            buff[i] = fgetc(fp);
        /* just reverse if not big-endian */
        if (!bigendian)
        {
            for (i = 0; i < 4; i++)
            {
                temp = buff[i];
                buff[i] = buff[8 - i - 1];
                buff[8 - i - 1] = temp;
            }
        }
        sign = buff[0] & 0x80 ? -1 : 1;
        /* exponent in raw format */
        exponent = ((buff[0] & 0x7F) << 4) | ((buff[1] & 0xF0) >> 4);

        /* read in the mantissa. Top bit is 0.5, the successive bits half */
        bitval = 0.5;
        maski = 1;
        mask = 0x08;
        for (i = 0; i < significandbits; i++)
        {
            if (buff[maski] & mask)
                fnorm += bitval;

            bitval /= 2.0;
            mask >>= 1;
            if (mask == 0)
            {
                mask = 0x80;
                maski++;
            }
        }
        /* handle zero specially */
        if (exponent == 0 && fnorm == 0)
            return 0.0;

        shift = exponent - ((1 << (expbits - 1)) - 1); /* exponent = shift + bias */
        /* nans have exp 1024 and non-zero mantissa */
        if (shift == 1024 && fnorm != 0)
            return sqrt(-1.0);
        /* infinity */
        if (shift == 1024 && fnorm == 0)
        {
    #ifdef INFINITY
            return sign == 1 ? INFINITY : -INFINITY;
    #endif
            return (sign * 1.0) / 0.0;
        }
        if (shift > -1023)
        {
            answer = ldexp(fnorm + 1.0, shift);
            return answer * sign;
        }
        else
        {
            /* denormalised numbers */
            if (fnorm == 0.0)
                return 0.0;
            shift = -1022;
            while (fnorm < 1.0)
            {
                fnorm *= 2;
                shift--;
            }
            answer = ldexp(fnorm, shift);
            return answer * sign;
        }
    }
For the rest of the suite of functions, including the write and the integer routines see my github project
Try Boost::endian, and DO NOT IMPLEMENT IT YOURSELF!
Here's a link
Look up bit shifting, as this is basically all you need to do to swap from little -> big endian. Then depending on the bit size, you change how you do the bit shifting.