Why is Calli faster than a delegate call?

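I was playing around with Reflection.Emit and came across the little-used EmitCalli. Intrigued, I wondered whether it behaves any differently from a regular method call, so I whipped up the code below: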

    using System;
    using System.Diagnostics;
    using System.Reflection.Emit;
    using System.Runtime.InteropServices;
    using System.Security;

    [SuppressUnmanagedCodeSecurity]
    static class Program
    {
        const long COUNT = 1 << 22;

        static readonly byte[] multiply =
            IntPtr.Size == sizeof(int)
                ? new byte[] { 0x8B, 0x44, 0x24, 0x04, 0x0F, 0xAF, 0x44, 0x24, 0x08, 0xC3 }
                : new byte[] { 0x0f, 0xaf, 0xca, 0x8b, 0xc1, 0xc3 };

        static void Main()
        {
            var handle = GCHandle.Alloc(multiply, GCHandleType.Pinned);
            try
            {
                //Make the native method executable
                uint old;
                VirtualProtect(handle.AddrOfPinnedObject(),
                    (IntPtr)multiply.Length, 0x40, out old);

                var mulDelegate = (BinaryOp)Marshal.GetDelegateForFunctionPointer(
                    handle.AddrOfPinnedObject(), typeof(BinaryOp));

                var T = typeof(uint); //To avoid redundant typing

                //Generate the method
                var method = new DynamicMethod("Mul", T, new Type[] { T, T }, T.Module);
                var gen = method.GetILGenerator();
                gen.Emit(OpCodes.Ldarg_0);
                gen.Emit(OpCodes.Ldarg_1);
                gen.Emit(OpCodes.Ldc_I8, (long)handle.AddrOfPinnedObject());
                gen.Emit(OpCodes.Conv_I);
                gen.EmitCalli(OpCodes.Calli, CallingConvention.StdCall,
                    T, new Type[] { T, T });
                gen.Emit(OpCodes.Ret);

                var mulCalli = (BinaryOp)method.CreateDelegate(typeof(BinaryOp));

                var sw = Stopwatch.StartNew();
                for (int i = 0; i < COUNT; i++) { mulDelegate(2, 3); }
                Console.WriteLine("Delegate: {0:N0}", sw.ElapsedMilliseconds);
                sw.Reset();

                sw.Start();
                for (int i = 0; i < COUNT; i++) { mulCalli(2, 3); }
                Console.WriteLine("Calli:    {0:N0}", sw.ElapsedMilliseconds);
            }
            finally
            {
                handle.Free();
            }
        }

        delegate uint BinaryOp(uint a, uint b);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool VirtualProtect(
            IntPtr address, IntPtr size, uint protect, out uint oldProtect);
    }

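I ran the code in both x86 mode and x64 mode. The results (elapsed milliseconds)?

32-bit:

Delegate version: 994

Calli version: 46

64-bit:

Delegate version: 326

Calli version: 83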

I guess the question is obvious by now... why is there such a huge speed difference?

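Update:

I also created a 64-bit P/Invoke version:

Delegate version: 284

Calli version: 77

P/Invoke version: 31

Apparently, P/Invoke is faster still... Is this a problem with my benchmarking, or is there something going on that I don't understand? (I'm in release mode, by the way.)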

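Given your performance numbers, I assume you must be using the 2.0 framework, or something similar? The numbers are much better in 4.0, but the "Marshal.GetDelegate" version is still slower.

The thing is that not all delegates are created equal.

A delegate to a managed-code function is essentially just a straight function call (on x86, that's a __fastcall), with a little "switcheroo" added if you're calling a static function (but that's only 3 or 4 instructions on x86).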
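As a rough baseline for comparison (not part of the original benchmark), one could time a delegate bound to an ordinary managed method with the same signature. This is only a sketch: `ManagedDelegateBaseline`, `ManagedMul` and `Time` are hypothetical names introduced here, and the numbers it prints will depend on the runtime and machine.

```csharp
using System;
using System.Diagnostics;

static class ManagedDelegateBaseline
{
    // Same iteration count as the benchmark in the question.
    const int Count = 1 << 22;

    delegate uint BinaryOp(uint a, uint b);

    // Hypothetical managed counterpart of the native multiply routine.
    public static uint ManagedMul(uint a, uint b)
    {
        return a * b;
    }

    // Invokes the delegate Count times and returns the elapsed milliseconds.
    static long Time(BinaryOp op)
    {
        uint sink = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Count; i++)
        {
            sink += op(2, 3);
        }
        sw.Stop();
        GC.KeepAlive(sink); // use the result so the loop body is not trivially dead
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Delegate bound to a managed static method: no interop stub is involved.
        BinaryOp managed = ManagedMul;

        Time(managed); // warm-up run so JIT compilation is not measured
        Console.WriteLine("Managed delegate: {0:N0}", Time(managed));
    }
}
```

Running this next to the two timings from the question should give a feel for how much of the cost comes from the interop stub rather than from delegate invocation itself.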

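On the other hand, a delegate created by "Marshal.GetDelegateForFunctionPointer" is a straight function call into a "stub" function, which adds some overhead (marshalling and whatnot) before calling the unmanaged function. In this case there is very little marshalling, and the marshalling for this call appears to be largely optimized away in 4.0 (though it most likely still goes through the ML interpreter on 2.0). But even in 4.0, there is a stack walk demanding unmanaged-code permission that is not part of your calli delegate.

I've generally found that, short of knowing someone on the .NET dev team, your best bet for figuring out what's going on with managed/unmanaged interop is to do a little digging with WinDbg and SOS.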

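Hard to answer :) Anyway, I'll try.

EmitCalli is faster because it is a raw bytecode call. I suspect SuppressUnmanagedCodeSecurity also disables some checks, such as stack overflow or array out-of-bounds index checks. So the code is unsafe and runs at full speed.

The delegate version will have some compiled code to check typing, and it will also perform a dereferencing call (because a delegate is like a typed function pointer).

My two cents!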
