Finding islands of zeros in a sequence

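Imagine you have a very long sequence. What is the most efficient way of finding the intervals where the sequence is all zeros (or, more precisely, where the sequence drops to near-zero values, abs(X)<eps)?

For simplicity, let's assume the following sequence:

    sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];

I am trying to get the following information:

    startIndex  EndIndex  Duration
    3           6         4
    12          12        1
    14          16        3
    25          26        2
    30          30        1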

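Then, using this information, we find the intervals whose duration is >= some specified value (say 3) and return the indices of the values in all of those intervals combined:

    indices = [3 4 5 6 14 15 16];

That last part is related to a previous question:

MATLAB: vectorized array creation from a list of start/end indices

This is what I have so far: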

    sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
    len = length(sig);
    thresh = 3;

    %# align the signal with itself successively shifted by one
    %# v will thus contain 1 in the starting locations of the zero interval
    v = true(1,len-thresh+1);
    for i=1:thresh
        v = v & ( sig(i:len-thresh+i) == 0 );
    end

    %# extend the 1's till the end of the intervals
    for i=1:thresh-1
        v(find(v)+1) = true;
    end

    %# get the final indices
    v = find(v);

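I am looking to vectorize/optimize the code, but I am open to other solutions. I should stress that space and time efficiency are very important, since I am processing a large number of long bio-signals.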

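These are the steps I would take to solve your problem in a vectorized way, starting from a given vector sig:

First, threshold the vector to get a vector tsig of zeros and ones (zeros where the absolute value of the signal is close enough to zero, ones elsewhere):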
```
 tsig = (abs(sig) >= eps); %# Using eps as the threshold 
```

Next, find the starting index, ending index, and duration of each string of zeros using the functions DIFF and FIND:

    dsig = diff([1 tsig 1]);
    startIndex = find(dsig < 0);
    endIndex = find(dsig > 0)-1;
    duration = endIndex-startIndex+1;

Then, find the strings of zeros with a duration greater than or equal to some value (such as 3, from your example):

    stringIndex = (duration >= 3);
    startIndex = startIndex(stringIndex);
    endIndex = endIndex(stringIndex);

Finally, use the method from my answer to the linked question to generate your final set of indices:

    indices = zeros(1,max(endIndex)+1);
    indices(startIndex) = 1;
    indices(endIndex+1) = indices(endIndex+1)-1;
    indices = find(cumsum(indices));
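
For readers working in Python, the same four steps can be sketched with NumPy. This is only an illustrative translation, not part of the answer above: the function name `zero_islands` and its parameters `min_len` and `tol` are assumptions, and the result is shifted to 1-based indices so that it can be compared with the MATLAB output.

```python
import numpy as np

def zero_islands(sig, min_len=3, tol=np.finfo(float).eps):
    """Return 1-based indices of samples in near-zero runs of length >= min_len."""
    sig = np.asarray(sig, dtype=float)
    # Step 1: threshold (1 where the signal is non-zero, 0 where it is near zero)
    tsig = (np.abs(sig) >= tol).astype(int)
    # Step 2: pad with ones so runs touching either end still produce edges
    dsig = np.diff(np.concatenate(([1], tsig, [1])))
    start = np.flatnonzero(dsig < 0)      # 0-based first sample of each run
    end = np.flatnonzero(dsig > 0) - 1    # 0-based last sample of each run
    # Step 3: keep only the runs that are long enough
    keep = (end - start + 1) >= min_len
    start, end = start[keep], end[keep]
    # Step 4: +1 at each start, -1 just past each end, then a cumulative sum
    marks = np.zeros(sig.size + 1, dtype=int)
    marks[start] += 1
    marks[end + 1] -= 1
    return np.flatnonzero(np.cumsum(marks)[:-1]) + 1
```

Called on the example signal with `min_len=3`, this should select the same samples as the MATLAB steps, namely those in the runs 3..6 and 14..16.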

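You can approach this as a string search task, by finding substrings of zeros of length thresh (the STRFIND function is very fast):

    startIndex = strfind(sig, zeros(1,thresh));

Note that longer runs get marked at multiple locations, but they are joined together once we add the in-between positions, from each start at startIndex to its end at startIndex+thresh-1:

    indices = unique( bsxfun(@plus, startIndex', 0:thresh-1) )';

Note that you can always swap this last step for the CUMSUM/FIND solution by @gnovice from the linked question.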
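
A rough NumPy counterpart of this windowed search is sketched below. It is an assumption-laden illustration rather than a port of STRFIND: it uses `numpy.lib.stride_tricks.sliding_window_view` (available from NumPy 1.20) to test every window of length `thresh`, the function name `zero_islands_windowed` is hypothetical, and it tests for exact zeros only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def zero_islands_windowed(sig, thresh=3):
    """Return 1-based indices covered by any all-zero window of length thresh."""
    is_zero = np.asarray(sig) == 0
    if is_zero.size < thresh:
        return np.array([], dtype=int)
    # Row k of the view is is_zero[k:k+thresh]; a window start is a row that is all True
    starts = np.flatnonzero(sliding_window_view(is_zero, thresh).all(axis=1))
    # Expand each start into start..start+thresh-1; unique() merges the overlaps
    return np.unique(starts[:, None] + np.arange(thresh)) + 1
```

As in the MATLAB version, a run longer than `thresh` yields several overlapping window starts, and the final `unique` call is what joins them.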

Here it is in NumPy (also answered here):

    import numpy as np

    def nonzero_intervals(vec):
        '''
        Find islands of non-zeros in the vector vec
        '''
        if len(vec) == 0:
            return []
        elif not isinstance(vec, np.ndarray):
            vec = np.array(vec)

        edges, = np.nonzero(np.diff((vec == 0) * 1))
        edge_vec = [edges + 1]
        if vec[0] != 0:
            edge_vec.insert(0, [0])
        if vec[-1] != 0:
            edge_vec.append([len(vec)])
        edges = np.concatenate(edge_vec)
        # list() so the result can be compared and reused under Python 3
        return list(zip(edges[::2], edges[1::2]))

For example:

    a = [1, 2, 0, 0, 0, 3, 4, 0]
    intervals = nonzero_intervals(a)
    assert intervals == [(0, 2), (5, 7)]

    function indice=sigvec(sig,thresh)
    %extend sig head and tail to avoid 0 head and 0 tail
    exsig=[1,sig,1];
    %convolution sig with extend sig
    cvexsig=conv(exsig,ones(1,thresh));
    tempsig=double(cvexsig==0);
    indice=find(conv(tempsig,ones(1,thresh)))-thresh;
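
The double-convolution idea can also be sketched in NumPy. The version below is a variation under stated assumptions, not a line-by-line port: it convolves a 0/1 mask instead of the raw signal, so that samples of opposite sign cannot cancel inside a window, and it uses `mode="valid"` instead of padding the ends with ones. The name `zero_islands_conv` is hypothetical.

```python
import numpy as np

def zero_islands_conv(sig, thresh=3):
    """Return 1-based indices of samples inside all-zero windows of length thresh."""
    # Work on a 0/1 mask so positive and negative samples cannot cancel in the sum
    nonzero = (np.asarray(sig) != 0).astype(int)
    if nonzero.size < thresh:
        return np.array([], dtype=int)
    kernel = np.ones(thresh, dtype=int)
    # window_sum[k] counts the non-zero samples in sig[k:k+thresh]
    window_sum = np.convolve(nonzero, kernel, mode="valid")
    all_zero = (window_sum == 0).astype(int)
    # A full convolution spreads each window start over the thresh samples it covers
    covered = np.convolve(all_zero, kernel, mode="full")
    return np.flatnonzero(covered) + 1
```

The first convolution finds the windows that contain no non-zero sample, and the second marks every sample belonging to at least one such window.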

gnovice's answer above can be modified to find the indices of the non-zero elements in a vector:

    tsig = (abs(sig) >= eps);
    dsig = diff([0 tsig 0]);
    startIndex = find(dsig > 0);
    endIndex = find(dsig < 0)-1;
    duration = endIndex-startIndex+1;

As gnovice showed, we first do a threshold test so that "near zero" really becomes zero:

    logcl = abs(sig(:)) >= zero_tolerance;

Then find the regions where the cumulative sum does not increase over a window of thresh samples. A zero is prepended to the cumulative sum so that islands(k) describes the window that starts at index k:

    cs = cumsum([0 ; logcl]);
    islands = cs(1+thresh:end) == cs(1:end-thresh);

Remember gnovice's great method for filling in ranges of indices:

    v = zeros(1,max(endInd)+1);   %# An array of zeroes
    v(startInd) = 1;              %# Place 1 at the starts of the intervals
    v(endInd+1) = v(endInd+1)-1;  %# Add -1 one index after the ends of the intervals
    indices = find(cumsum(v));    %# Perform a cumulative sum and find the nonzero entries

Note that our islands vector already has ones at the startInd locations, and for our purposes endInd always comes thresh spots later (longer runs simply show up as runs of ones in islands):

    endcap = zeros(thresh,1);
    indices = find(cumsum([islands ; endcap] - [endcap ; islands]))

Test:

    sig = [1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 0];
    thresh = 3;
    logcl = abs(sig(:)) >= .1;
    cs = cumsum([0 ; logcl]);
    islands = cs(1+thresh:end) == cs(1:end-thresh);
    endcap = zeros(thresh,1);
    indices = find(cumsum([islands ; endcap] - [endcap ; islands]))

    indices =
         3
         4
         5
         6
        14
        15
        16
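
A NumPy sketch of this cumulative-sum idea follows, for comparison. The function name `zero_islands_cumsum` and the default tolerance are assumptions made for illustration. The sketch prepends a zero to the cumulative sum so that window k starts exactly at sample k, and it assumes `thresh >= 1`.

```python
import numpy as np

def zero_islands_cumsum(sig, thresh=3, tol=0.1):
    """Return 1-based indices of samples in near-zero runs of length >= thresh."""
    nonzero = np.abs(np.asarray(sig, dtype=float)) >= tol
    # cs[k] = number of non-zero samples among the first k samples, so cs[0] = 0
    cs = np.concatenate(([0], np.cumsum(nonzero)))
    # islands[k] is 1 when sig[k:k+thresh] contains no non-zero sample
    islands = (cs[thresh:] == cs[:-thresh]).astype(int)
    endcap = np.zeros(thresh, dtype=int)
    # +1 where a window starts, -1 thresh samples later; the running sum is > 0 inside
    steps = np.concatenate((islands, endcap)) - np.concatenate((endcap, islands))
    return np.flatnonzero(np.cumsum(steps)) + 1
```

Only one cumulative sum over the signal and one over the step vector are needed, which is what makes this variant attractive for long signals.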

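I think the most MATLAB-like, "vectorized" way of doing this is to compute the convolution of your signal with a filter like [-1 1]. You should look at the documentation of the function conv. Then use find on the output of conv to get the relevant indices.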
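
One possible reading of this suggestion, sketched in NumPy: convolve a zero-padded 0/1 mask of the zero samples with the kernel [-1, 1], so that the output is -1 where a run of zeros begins and +1 just after it ends. The padding, the mask, and the name `zero_runs_by_conv` are assumptions, since the answer does not spell out the details.

```python
import numpy as np

def zero_runs_by_conv(sig):
    """Return (start, end, duration) arrays, 1-based and inclusive, one entry per run of zeros."""
    is_zero = (np.asarray(sig) == 0).astype(int)
    # Pad with zeros so that runs touching either end still produce both edges
    padded = np.concatenate(([0], is_zero, [0]))
    # With the kernel [-1, 1] and mode="valid", edges[k] = padded[k] - padded[k + 1]
    edges = np.convolve(padded, [-1, 1], mode="valid")
    start = np.flatnonzero(edges == -1) + 1  # run begins at 0-based sample k, reported as k + 1
    end = np.flatnonzero(edges == 1)         # run ended at 0-based sample k - 1, reported as k
    return start, end, end - start + 1
```

On the example signal from the question this should give the start, end, and duration columns of the table the question asks for. Filtering by duration and expanding to indices can then proceed as in the other answers.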
