Creating batches in LINQ

Can someone suggest a way to create batches of a certain size in LINQ?

Ideally I'd like to be able to perform operations in chunks of some configurable size.

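You don't need to write any code. Use MoreLINQ's Batch method, which batches the source sequence into sized buckets (MoreLINQ is available as a NuGet package you can install):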

    int size = 10;
    var batches = sequence.Batch(size);
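If you are on .NET 6 or later, the framework itself ships a similar operator, System.Linq.Enumerable.Chunk, which returns each batch as an array (the last one may be shorter). A minimal sketch, assuming .NET 6+; the ChunkDemo class and its Describe helper are hypothetical names used only to make the batches easy to inspect:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Hypothetical helper: formats each batch as "a,b,c".
    public static List<string> Describe(IEnumerable<int> sequence, int size)
    {
        // Enumerable.Chunk (.NET 6+) yields arrays of at most `size` elements.
        return sequence.Chunk(size)
                       .Select(batch => string.Join(",", batch))
                       .ToList();
    }
}
```

For the numbers 0 to 9 and a size of 3, Describe returns "0,1,2", "3,4,5", "6,7,8" and "9". Like the bucket approach, Chunk materializes each batch before handing it out.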

MoreLINQ implements Batch as:

    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source, int size)
    {
        TSource[] bucket = null;
        var count = 0;

        foreach (var item in source)
        {
            if (bucket == null)
                bucket = new TSource[size];   // start a new bucket

            bucket[count++] = item;
            if (count != size)
                continue;                     // bucket not full yet

            yield return bucket;

            bucket = null;                    // the next item allocates a fresh array
            count = 0;
        }

        // Return the last, partially filled bucket, if any
        if (bucket != null && count > 0)
            yield return bucket.Take(count);
    }
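The quoted method does no argument checking, and because a method containing yield return runs none of its body until the caller starts enumerating, a check placed inside it would also be deferred. A common way to validate at call time is to put the checks in a non-iterator public method that hands off to a private iterator. A sketch under that assumption; BucketBatch and BatchIterator are hypothetical names, and the bucket logic is the same as above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketBatch
{
    // Not an iterator, so these checks run as soon as Batch is called.
    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");
        return BatchIterator(source, size);
    }

    // Bucket-based iterator: fills an array of `size` items, yields it, starts a new one.
    private static IEnumerable<IEnumerable<TSource>> BatchIterator<TSource>(
        IEnumerable<TSource> source, int size)
    {
        TSource[] bucket = null;
        var count = 0;
        foreach (var item in source)
        {
            if (bucket == null) bucket = new TSource[size];
            bucket[count++] = item;
            if (count != size) continue;
            yield return bucket;
            bucket = null;
            count = 0;
        }
        if (bucket != null && count > 0)
            yield return bucket.Take(count);   // trailing partial bucket
    }
}
```

With this split, new[] { 1 }.Batch(0) throws immediately instead of on the first MoveNext.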

    public static class MyExtensions
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
        {
            return items.Select((item, inx) => new { item, inx })
                        .GroupBy(x => x.inx / maxItems)
                        .Select(g => g.Select(x => x.item));
        }
    }

Usage:

    List<int> list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    foreach (var batch in list.Batch(3))
    {
        Console.WriteLine(String.Join(",", batch));
    }

Output:

    0,1,2
    3,4,5
    6,7,8
    9
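One property of the GroupBy approach worth knowing: GroupBy has to read the entire source before it can hand out the first group, so the whole sequence ends up in memory. The sketch below makes that visible with a hypothetical counting source; GroupByBufferingDemo, CountingSource, Pulled and BatchByGroup are illustration-only names, and BatchByGroup is the same query as the extension method above:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByBufferingDemo
{
    // Hypothetical instrumentation: how many items the source has produced so far.
    public static int Pulled;

    public static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same index/GroupBy query as MyExtensions.Batch above.
    public static IEnumerable<IEnumerable<T>> BatchByGroup<T>(IEnumerable<T> items, int maxItems)
    {
        return items.Select((item, inx) => new { item, inx })
                    .GroupBy(x => x.inx / maxItems)
                    .Select(g => g.Select(x => x.item));
    }

    // Returns how many source items were read just to obtain the first batch.
    public static int PulledBeforeFirstBatch(int n, int size)
    {
        Pulled = 0;
        using (var batches = BatchByGroup(CountingSource(n), size).GetEnumerator())
        {
            batches.MoveNext();
        }
        return Pulled;
    }
}
```

Asking for the first batch of 3 from a 1000-item source reads all 1000 items.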

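All of the above perform poorly with large batches or when memory is limited. I had to write my own version that pipelines (note that items are never accumulated anywhere):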

    public static class BatchLinq
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
        {
            if (size <= 0)
                throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

            using (IEnumerator<T> enumerator = source.GetEnumerator())
                while (enumerator.MoveNext())
                    yield return TakeIEnumerator(enumerator, size);
        }

        private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
        {
            int i = 0;
            do
                yield return source.Current;
            while (++i < size && source.MoveNext());
        }
    }

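Edit: A known issue with this approach is that each batch must be enumerated, and enumerated fully, before moving on to the next batch. For example, this doesn't work: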

    // Select the first item of every 100 items
    Batch(list, 100).Select(b => b.First())
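The failure can be reproduced with a short sketch that copies the pipelined implementation above and takes the first item of every batch. Each First() call reads only one item, and the outer loop then advances the shared enumerator by a single MoveNext, so the next "batch" starts at the very next element. For the numbers 0 to 9 with a batch size of 3, the query returns all ten numbers rather than 0,3,6,9. FirstOfEachBatch is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchLinq
{
    // Pipelined Batch copied from the answer above (no buffering).
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");
        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
    {
        int i = 0;
        do yield return source.Current;
        while (++i < size && source.MoveNext());
    }

    // Hypothetical helper: the "first item of every batch" query.
    public static List<int> FirstOfEachBatch(IEnumerable<int> list, int size)
    {
        return list.Batch(size).Select(b => b.First()).ToList();
    }
}
```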

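If you start with sequence defined as an IEnumerable<T>, and you know it can safely be enumerated multiple times (for example, because it is an array or a list), you can use this simple pattern to process the elements in batches: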

    while (sequence.Any())
    {
        var batch = sequence.Take(10);
        sequence = sequence.Skip(10);
        // do whatever you need to do with each batch here
    }
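A self-contained version of that pattern, collecting each batch into a list so the result can be checked; SkipTakeBatching and ProcessInBatches are hypothetical names. Keep in mind that on a lazily evaluated source, each Any and Take call enumerates the original source again from the start and skips past everything already processed, which is why the pattern is only suggested for collections that are cheap to re-enumerate:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeBatching
{
    // Hypothetical helper wrapping the while (sequence.Any()) pattern.
    public static List<List<T>> ProcessInBatches<T>(IEnumerable<T> sequence, int size)
    {
        var result = new List<List<T>>();
        while (sequence.Any())
        {
            var batch = sequence.Take(size);   // captures the current `sequence`
            sequence = sequence.Skip(size);    // the remainder for the next round
            result.Add(batch.ToList());        // "process" the batch by materializing it
        }
        return result;
    }
}
```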

I'm joining this very late, but I found something more interesting.

We can use Skip and Take here to get better performance.

    public static class MyExtensions
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
        {
            return items.Select((item, index) => new { item, index })
                        .GroupBy(x => x.index / maxItems)
                        .Select(g => g.Select(x => x.item));
        }

        public static IEnumerable<T> Batch2<T>(this IEnumerable<T> items, int skip, int take)
        {
            return items.Skip(skip).Take(take);
        }
    }

Next I tested with 100,000 records. Only the loop in the Batch case takes noticeably longer.

Console application code:

    static void Main(string[] args)
    {
        List<string> Ids = GetData("First");
        List<string> Ids2 = GetData("tsriF");

        Stopwatch FirstWatch = new Stopwatch();
        FirstWatch.Start();
        foreach (var batch in Ids2.Batch(5000))
        {
            // Console.WriteLine("Batch Output:= " + string.Join(",", batch));
        }
        FirstWatch.Stop();
        Console.WriteLine("Done Processing time taken:= " + FirstWatch.Elapsed.ToString());

        Stopwatch Second = new Stopwatch();
        Second.Start();
        int Length = Ids2.Count;
        int StartIndex = 0;
        int BatchSize = 5000;
        while (Length > 0)
        {
            var SecBatch = Ids2.Batch2(StartIndex, BatchSize);
            // Console.WriteLine("Second Batch Output:= " + string.Join(",", SecBatch));
            Length = Length - BatchSize;
            StartIndex += BatchSize;
        }
        Second.Stop();
        Console.WriteLine("Done Processing time taken Second:= " + Second.Elapsed.ToString());
        Console.ReadKey();
    }

    static List<string> GetData(string name)
    {
        List<string> Data = new List<string>();
        for (int i = 0; i < 100000; i++)
        {
            Data.Add(string.Format("{0} {1}", name, i.ToString()));
        }
        return Data;
    }

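The timings were:

First (Batch): 00:00:00.0708, 00:00:00.0660

Second (Skip and Take): 00:00:00.0008, 00:00:00.0008

Note that Batch2 returns a deferred Skip/Take query and the second loop never enumerates SecBatch, so the second timing measures only building the queries, not reading any items.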
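The deferral can be observed directly: calling Batch2 only builds Skip/Take iterators and reads nothing from the source until the result is enumerated. A sketch with a hypothetical counting source (DeferredSkipTakeDemo, CountingSource and Pulled are illustration-only names; Batch2 is the same as above):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredSkipTakeDemo
{
    // Hypothetical instrumentation: number of items the source has produced.
    public static int Pulled;

    public static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same as Batch2 above: returns a deferred query, does no work yet.
    public static IEnumerable<T> Batch2<T>(this IEnumerable<T> items, int skip, int take)
    {
        return items.Skip(skip).Take(take);
    }
}
```

Right after Batch2(source, 5000, 5000) returns, Pulled is still 0; items are only read when the query is enumerated, for example with ToList().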

Same idea as MoreLINQ, but using a List instead of an array. I haven't benchmarked it, but for some people readability matters more:

    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        List<T> batch = new List<T>();
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count >= size)
            {
                yield return batch;
                // Start a new list instead of calling batch.Clear(): the caller may
                // still be holding on to the list that was just yielded.
                batch = new List<T>();
            }
        }
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
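Whether the list is reused with batch.Clear() or replaced with a new list matters as soon as a caller keeps batches around, for example by calling ToList() on the result. This sketch contrasts the two strategies; ListBatchAliasing, BatchReusingList, BatchFreshList and Materialize are hypothetical names. With a single reused list, every materialized "batch" is the same object and ends up holding only the last items:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListBatchAliasing
{
    // Reuses one List<T> and clears it after each yield (shown for contrast).
    public static IEnumerable<IEnumerable<T>> BatchReusingList<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>();
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count >= size)
            {
                yield return batch;
                batch.Clear(); // also empties the list the caller just received
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Allocates a new List<T> per batch, so yielded batches stay independent.
    public static IEnumerable<IEnumerable<T>> BatchFreshList<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>();
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count >= size)
            {
                yield return batch;
                batch = new List<T>();
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Materializes all batches first, then formats them as "a,b|c,d".
    public static string Materialize(IEnumerable<IEnumerable<int>> batches)
    {
        var kept = batches.ToList(); // keeps a reference to every yielded batch
        return string.Join("|", kept.Select(b => string.Join(",", b)));
    }
}
```

For the numbers 0 to 6 and a batch size of 3, Materialize returns "6|6|6" for the reused list and "0,1,2|3,4,5|6" for fresh lists.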

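Here is an extension of Nick Whaley's answer above that fixes the known issue mentioned in its edit: a batch no longer has to be fully enumerated before moving on to the next one.

Elements still have to be enumerated in strictly sequential order, and no more than once. Any elements not consumed in the inner loop are discarded (and trying to access them again through a saved iterator throws InvalidOperationException: Enumeration already finished.).

You can test a complete example on .NET Fiddle.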

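    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BatchLinq
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
        {
            if (size <= 0)
                throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

            using (var enumerator = source.GetEnumerator())
                while (enumerator.MoveNext())
                {
                    var inner = new BatchInner();
                    var e = inner.Take(enumerator, size);
                    yield return e;
                    // If the caller did not finish this batch, skip its remaining items
                    // so that the next batch starts at the right position.
                    if (!inner.done)
                        e.Count();
                }
        }

        private class BatchInner
        {
            public bool done = false;
            // Position within the batch; a field rather than a local so that the
            // e.Count() call above resumes counting where the caller stopped.
            private int i = 0;

            // Renamed from BatchInner: a method cannot share its enclosing class's name.
            public IEnumerable<T> Take<T>(IEnumerator<T> source, int size)
            {
                do
                    yield return source.Current;
                while (++i < size && source.MoveNext());
                done = true;
            }
        }
    }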

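I couldn't think of a way to get the done value without a wrapper class, so unfortunately there is one allocation per batch (I don't know whether that can be optimized away). I'd like to hoist it out of the outer loop, but that could cause problems with multiple iterators being active at the same time. It might work, but I'd need to be more certain of the exact semantics involved than I am.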

    static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts, int batchSize)
    {
        return from @group in ts.Select((x, i) => new { x, i })
                                .ToLookup(xi => xi.i / batchSize)
               select @group.Select(xi => xi.x);
    }
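Unlike GroupBy, ToLookup executes immediately, so this version reads (and stores) the whole source as soon as TakeBatch is called, before any batch is requested. A sketch with a hypothetical counting source to show that; LookupBatching, CountingSource and Pulled are illustration-only names, and TakeBatch is the same query as above:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupBatching
{
    // Hypothetical instrumentation: number of items the source has produced.
    public static int Pulled;

    public static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same query as TakeBatch above; ToLookup runs eagerly when this is called.
    public static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts, int batchSize)
    {
        return from @group in ts.Select((x, i) => new { x, i })
                                .ToLookup(xi => xi.i / batchSize)
               select @group.Select(xi => xi.x);
    }
}
```

Groups come out in the order their keys first appear, so the batches keep the source order.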
