Bash: limiting the number of concurrent jobs?

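Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making & block when there are more than n concurrent jobs running in the background.

I know I can implement this with ps | grep-style tricks, but is there an easier way?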

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:

 parallel gzip ::: *.log

It will run one gzip per CPU core until all log files are gzipped.

If it is part of a larger loop, you can use sem instead:

 for i in *.log ; do
     echo "$i" Do more stuff here
     sem -j+0 gzip "$i" ";" echo done
 done
 sem --wait

It will do the same, but gives you a chance to do more stuff for each file.

If GNU Parallel is not packaged for your distribution, you can install it with:

 (wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

It will download, check the signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
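Where GNU Parallel isn't available, xargs -P can give a similar fixed-size pool for the simple one-command-per-file case. This is a different tool, swapped in for comparison: -P is supported by GNU and BSD xargs but is not part of POSIX, and the scratch directory and sample files below exist only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: one gzip per file, at most $jobs_max at a time, using xargs -P
# instead of GNU Parallel. The scratch directory and files are illustrative.

dir=$(mktemp -d)                      # scratch directory for sample .log files
for i in 1 2 3 4 5; do
    echo "line $i" > "$dir/app$i.log"
done

# Use the CPU count when GNU nproc exists, otherwise fall back to 2.
jobs_max=$(nproc 2>/dev/null || echo 2)

# -0: NUL-separated names, -n 1: one file per gzip, -P: max parallel processes
printf '%s\0' "$dir"/*.log | xargs -0 -n 1 -P "$jobs_max" gzip

ls "$dir"                             # should list app1.log.gz through app5.log.gz
```

Unlike sem, xargs gives you no hook to run extra shell code per file between launches.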

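The following script shows one way of doing this with functions. You can either put the bgxupdate and bgxlimit functions in your script, or keep them in a separate file that is sourced from your script with: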

 . /path/to/bgx.sh

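The advantage is that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another, totally separate group with a limit of 3).

It uses the bash built-in jobs to get a list of sub-processes, but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit function: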

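1. Set up an empty group variable.
2. Transfer that to bgxgrp.
3. Call bgxlimit with the limit and the command you want to run.
4. Transfer the new group back to your group variable.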

Of course, if you only have one group, just use bgxgrp directly rather than transferring in and out.

 #!/bin/bash

 # bgxupdate - update active processes in a group.
 #   Works by transferring each process to new group
 #   if it is still active.
 # in:  bgxgrp   - current group of processes.
 # out: bgxgrp   - new group of processes.
 # out: bgxcount - number of processes in new group.

 bgxupdate() {
     bgxoldgrp=${bgxgrp}
     bgxgrp=""
     ((bgxcount = 0))
     bgxjobs=" $(jobs -pr | tr '\n' ' ')"
     for bgxpid in ${bgxoldgrp} ; do
         echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
         if [[ $? -eq 0 ]] ; then
             bgxgrp="${bgxgrp} ${bgxpid}"
             ((bgxcount = bgxcount + 1))
         fi
     done
 }

 # bgxlimit - start a sub-process with a limit.
 #   Loops, calling bgxupdate until there is a free
 #   slot to run another sub-process. Then runs it
 #   and updates the process group.
 # in:  $1     - the limit on processes.
 # in:  $2+    - the command to run for new process
 #               ("-" just waits for a free slot).
 # in:  bgxgrp - the current group of processes.
 # out: bgxgrp - new group of processes

 bgxlimit() {
     bgxmax=$1 ; shift
     bgxupdate
     while [[ ${bgxcount} -ge ${bgxmax} ]] ; do
         sleep 1
         bgxupdate
     done
     if [[ "$1" != "-" ]] ; then
         "$@" &
         bgxgrp="${bgxgrp} $!"
     fi
 }

 # Test program, create group and run 6 sleeps with
 #   limit of 3.

 group1=""
 echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
 echo
 for i in 1 2 3 4 5 6 ; do
     bgxgrp=${group1} ; bgxlimit 3 sleep ${i}0 ; group1=${bgxgrp}
     echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
 done

 # Wait until all others are finished.

 echo
 bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
 while [[ ${bgxcount} -ne 0 ]] ; do
     oldcount=${bgxcount}
     while [[ ${oldcount} -eq ${bgxcount} ]] ; do
         sleep 1
         bgxgrp=${group1} ; bgxupdate ; group1=${bgxgrp}
     done
     echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
 done

Here's a sample run:

 0 12:38:00 [ ]

 1 12:38:00 [ 3368 ]
 2 12:38:00 [ 3368 5880 ]
 3 12:38:00 [ 3368 5880 2524 ]
 4 12:38:10 [ 5880 2524 1560 ]
 5 12:38:20 [ 2524 1560 5032 ]
 6 12:38:30 [ 1560 5032 5212 ]

 9 12:38:50 [ 5032 5212 ]
 9 12:39:10 [ 5212 ]
 9 12:39:30 [ ]

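The whole thing starts at 12:38:00 and, as you can see, the first three processes run immediately.
Each process sleeps for n*10 seconds, so the fourth process doesn't start until the first one exits (at time t=10, or 12:38:10). You can see that process 3368 has disappeared from the list before 1560 is added.
Similarly, the fifth process (5032) starts when the second (5880) exits at time t=20.
Finally, the sixth process (5212) starts when the third (2524) exits at time t=30.
Then the fourth process exits at t=50 (started at 10, duration 40), the fifth at t=70 (started at 20, duration 50), and the sixth at t=90 (started at 30, duration 60).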

Or, in timeline form:

 Process:  1  2  3  4  5  6
 --------  -  -  -  -  -  -
 12:38:00  ^  ^  ^
 12:38:10  v  |  |  ^
 12:38:20     v  |  |  ^
 12:38:30        v  |  |  ^
 12:38:40           |  |  |
 12:38:50           v  |  |
 12:39:00              |  |
 12:39:10              v  |
 12:39:20                 |
 12:39:30                 v

A small bash script could help you:

 # content of script exec-async.sh
 joblist=($(jobs -p))
 while (( ${#joblist[*]} >= 3 ))
 do
     sleep 1
     joblist=($(jobs -p))
 done
 "$@" &

If you call:

 . exec-async.sh sleep 10

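...four times, the first three calls will return immediately, and the fourth call will block until fewer than three jobs are running.

You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.

The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.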

Suppose you'd like to write code like this:

 for x in $(seq 1 100); do     # 100 things we want to put into the background.
     max_bg_procs 5            # Define the limit. See below.
     your_intensive_job &
 done

max_bg_procs should be put in your .bashrc:

 function max_bg_procs {
     if [[ $# -eq 0 ]] ; then
         echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
         echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
         return
     fi
     local max_number=$((0 + ${1:-0}))
     while true; do
         local current_number=$(jobs -pr | wc -l)
         if [[ $current_number -lt $max_number ]]; then
             break
         fi
         sleep 1
     done
 }

This is probably good enough for most purposes, but it is not optimal.

 #!/bin/bash
 n=0
 maxjobs=10
 for i in *.m4a ; do
     # ( DO SOMETHING ) &

     # limit jobs
     if (( $(($((++n)) % $maxjobs)) == 0 )) ; then
         wait # wait until all have finished (not optimal, but most times good enough)
         echo $n wait
     fi
 done
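As written, the loop body above is commented out, and jobs from a final, partial batch are never waited for. A runnable sketch of the same batch idea, with a hypothetical process_item function standing in for the real work and a closing wait:

```bash
#!/usr/bin/env bash
# Batch-limiting sketch: start jobs, and after every maxjobs-th launch wait
# for the whole batch. process_item is a hypothetical placeholder.

process_item() {
    sleep 0.2                  # stand-in for real work
    echo "done: $1"
}

maxjobs=4
n=0
for item in item{1..10}; do
    process_item "$item" &
    if (( ++n % maxjobs == 0 )); then
        wait                   # blocks until every job in this batch has exited
    fi
done
wait                           # 10 items, batches of 4: the last 2 jobs are waited for here
```

Each batch lasts as long as its slowest job, so slots sit idle while the batch drains; that is the "not optimal" part.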

This is the simplest way:

 waitforjobs() {
     while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
 }

Call this function before forking off any new job:

 waitforjobs 10
 run_another_job &

To have as many background jobs as there are cores on the machine, use $(nproc) instead of a fixed number like 10.
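A sketch combining this with a CPU count, under a few assumptions: wait -n needs bash 4.3 or newer; cpu_count is a hypothetical helper that falls back to getconf _NPROCESSORS_ONLN where GNU nproc is missing (stock macOS, for example) and to 1 as a last resort; and this variant of waitforjobs counts only running jobs (jobs -rp) read into an array, rather than piping jobs -p into wc.

```bash
#!/usr/bin/env bash
# Sketch: waitforjobs driven by the CPU count. Requires bash 4.3+ for wait -n.

# Hypothetical helper: nproc (GNU coreutils), else getconf, else 1.
cpu_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Variant of waitforjobs that counts only running jobs.
waitforjobs() {
    local -a running
    running=($(jobs -rp))
    while (( ${#running[@]} >= $1 )); do
        wait -n                        # returns when any one job exits
        running=($(jobs -rp))
    done
}

limit=$(cpu_count)
for item in a b c d e f g h; do        # placeholder work items
    waitforjobs "$limit"
    { sleep 0.2; echo "processed $item"; } &
done
wait                                    # the loop itself never waits for the final jobs
```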

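If you're willing to do this outside of pure bash, you should look into a job queuing system.

For instance, there's GNU Queue or PBS. And for PBS, you might want to look into Maui for configuration.

Both systems will require some configuration, but it's entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems are used on supercomputing clusters, where you want to allocate a specific amount of memory or computing time to any given batch job; however, there's no reason you can't use one of them on a single desktop computer without regard for compute time or memory limits.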

Have you considered starting ten long-running listener processes and communicating with them via named pipes?
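A related named-pipe pattern, shown here instead of long-running listeners, uses the pipe as a pool of tokens (a counting semaphore): each job takes a token before it starts and writes it back when it finishes, so at most max_jobs jobs run at once. A minimal sketch, assuming Linux, where opening a FIFO read-write with <> does not block; the token text and the sleep placeholder are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: a FIFO used as a token pool limiting concurrency to $max_jobs.
max_jobs=3

fifo=$(mktemp -u)               # unused temporary path (racy, fine for a sketch)
mkfifo "$fifo"
exec 3<>"$fifo"                 # open read-write on fd 3 so open() does not block
rm -f "$fifo"                   # the open descriptor keeps the pipe alive

for ((i = 0; i < max_jobs; i++)); do
    printf 'token\n' >&3        # preload the pool with max_jobs tokens
done

for task in 1 2 3 4 5 6 7 8; do
    read -r -u 3 token          # take a token; blocks while all are in use
    {
        sleep 0.3               # placeholder for real work on "$task"
        echo "task $task finished"
        printf '%s\n' "$token" >&3   # give the token back
    } &
done

wait                            # wait for the last jobs
exec 3>&-                       # close the pool
```

Only the main loop reads from the pipe, so each read receives a whole token line.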

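You could use ulimit -u; see http://ss64.com/bash/ulimit.html. Note that this caps the total number of processes for your user, and once the cap is reached new forks fail with an error instead of blocking until a job finishes.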

On Linux, I use this to limit bash jobs to the number of available CPUs (this can be overridden by setting CPU_NUMBER).

 [ "$CPU_NUMBER" ] || CPU_NUMBER="$(nproc 2>/dev/null || echo 1)"

 while [ "$1" ]; do
     {
         # do something with "$1" in parallel

         echo "[$# items left] $1 done"
     } &

     while true; do
         # load the PIDs of all child processes to the array
         joblist=($(jobs -p))
         if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
             # when the job limit is reached, wait for *single* job to finish
             wait -n
         else
             # stop checking when we're below the limit
             break
         fi
     done
     # it's great we executed zero external commands to check!

     shift
 done

 # wait for all currently active child processes
 wait

The following function (developed from tangens' answer above; either copy it into your script or source it from a file):

 job_limit () {
     # Test for single positive integer input
     if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
     then

         # Check number of running jobs
         joblist=($(jobs -rp))
         while (( ${#joblist[*]} >= $1 ))
         do

             # Block until the first listed job finishes; the "|| wait" chain
             # only moves on to the next job if that one exits non-zero
             command='wait '${joblist[0]}
             for job in ${joblist[@]:1}
             do
                 command+=' || wait '$job
             done
             eval $command
             joblist=($(jobs -rp))
         done
     fi
 }

1) Only requires inserting a single line to limit an existing loop:

 while :
 do
     task &
     job_limit `nproc`
 done

2) Waits on completion of existing background tasks rather than polling, which improves efficiency for fast tasks
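On bash 4.3 or newer, wait -n returns as soon as any background job exits, which is closer to "wait for any job" than the || chain. A sketch of a variant (the name job_limit_n is hypothetical) that uses wait -n when the running bash supports it and otherwise waits for the first running job listed by jobs -rp:

```bash
#!/usr/bin/env bash
# Sketch: job_limit variant that prefers wait -n (bash 4.3+) and falls back
# to waiting on the first listed running job on older bash versions.
job_limit_n() {
    local max=$1
    local -a running
    [[ $max =~ ^[1-9][0-9]*$ ]] || return 1    # same input check as job_limit
    running=($(jobs -rp))
    while (( ${#running[@]} >= max )); do
        if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 3) )); then
            wait -n                    # any job finishing frees a slot
        else
            wait "${running[0]}"       # older bash: wait for one specific job
        fi
        running=($(jobs -rp))
    done
}

# Same usage pattern as above: start a task, then call the limiter.
for i in 1 2 3 4 5 6; do
    sleep 0.3 &
    job_limit_n 2
done
wait
```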
