计算将一个排列转换为另一个排列所需的相邻交换

我们给出了两个小写拉丁字母的序列。它们的长度相同，并且具有相同数量的给定types的字母（第一个与第二个字母具有相同数量的t等等）。我们需要find将第一个序列转化为第二个序列所需的最小互换次数（ 通过“互换”我们的意思是改变两个相邻字母的次序 ）。我们可以安全地假设每两个序列可以相互转换。我们可以用蛮力来做到这一点，但是序列太长了。

input：
序列的长度（至less2，最多999999），然后是两个序列。

输出：
表示序列变为相同所需的交换次数的整数。

例：
{5，aaaaa，aaaaa}应输出{0}，
{4，abcd，acdb}应该输出{2}。

我想起来的第一件事就是泡泡。我们可以简单地说明每个交换的顺序。问题是：a）O（n ^ 2）最糟糕的情况b）我不相信这会给我每个案件的最小数量…即使是最优化的泡沫似乎也没有办法。我们可以执行鸡尾酒sorting来解决龟的问题 – 但它会给我最好的performance吗？或者也许有一些更简单/更快？

这个问题也可以表述为： 当唯一允许的操作是换位时，我们如何确定两个string之间的编辑距离？

这是一个简单而有效的解决scheme：

令Q[ s2[i] ] = the positions character s2[i] is on in s2 。假设P[i] = on what position is the character corresponding to s1[i] in the second string 。

build立Q和P：

 for ( int i = 0; i < s1.size(); ++i ) Q[ s2[i] ].push_back(i); // basically, Q is a vector [0 .. 25] of lists temp[0 .. 25] = {0} for ( int i = 0; i < s1.size(); ++i ) P[i + 1] = 1 + Q[ s1[i] ][ temp[ s1[i] ]++ ];

例：

  1234 s1: abcd s2: acdb Q: Q[a = 0] = {0}, Q[b = 1] = {3}, Q[c = 2] = {1}, Q[d = 3] = {2} P: P[1] = 1, P[2] = 4 (because the b in s1 is on position 4 in s2), P[3] = 2 P[4] = 3

P有2反转（ 4 2和4 3 ），所以这是答案。

这个解决scheme是O(n log n)因为可以在O(n)完成build筑P和Q ，并且合并sorting可以在O(n log n)计数倒数。

关于将一个置换转换成另一个置换所需的最小数量（不一定相邻）的交换，您应该使用的度量是Cayley距离，它基本上是置换的大小 – 循环的数量。

计算置换中的周期数是一个相当微不足道的问题。一个简单的例子。假设排列521634。

如果你检查第一个位置，你有5个，第五个你有3个，第三个你有1个，closures第一个周期。 2在第二个位置，所以它自己做一个循环，第四个和第六个做最后一个循环（第四个在第六个位置，第四个在第六个）。如果你想在身份置换中转换这个置换（用最less的交换次数），你需要独立地重新sorting每个周期。交换的总数是排列的长度（6）减去循环次数（3）。

给定任何两个置换，它们之间的距离等于第一个与第二个的倒数的组合和身份（如上所述计算）之间的距离。因此，唯一需要做的是组合第一个排列和第二个排列的倒数，并计算结果中的循环数。所有这些操作都是O（n），因此您可以在线性时间内获得最less的交换次数。

你在找什么可能是相同的“ 肯德尔头距离 ”，这是（规范化）差异的协调减去不和谐的对。参见维基百科，声称它相当于气泡分类距离。

在R中，函数不仅可用于计算tau，

 cor( X, method="kendall", use="pairwise" ) ,

而且也用于testing差异的显着性，例如

 cor.test( x1, x2, method="kendall" ) ,

他们甚至能够正确地考虑到联系。

看到这里更多。

我已经写了一个类的Permutation ，其中可以返回一些转换需要将给定的排列转换为身份。这是通过创build轨道（周期）并计算其长度来完成的。术语取自Kostrikin A.，I.，“线性代数I介绍” 。

包括：

 #include <iostream> #include <vector> #include <set> #include <algorithm> #include <iterator>

class排列：

 class Permutation { public: struct ei_element { /* element of the orbit*/ int e; /* identity index */ int i; /* actual permutation index */ }; typedef std::vector<ei_element> Orbit; /* a cycle */ Permutation( std::vector<int> const& i_vector); /* permute i element, vector is 0 indexed */ int pi( int i) const { return iv[ i - 1]; } int i( int k) const { return pi( k); } /* i_k = pi(k) */ int q() const { /* TODO: return rank = q such that pi^q = e */ return 0; } int n() const { return n_; } /* return the sequence 1, 2, ..., n */ std::vector<int> const& Omega() const { return ev; } /* return vector of cycles */ std::vector<Orbit> const& orbits() const { return orbits_; } int l( int k) const { return orbits_[ k].size(); } /* length of k-th cycle */ int transpositionsCount() const; /* return sum of all transpositions */ void make_orbits(); private: struct Increment { int current; Increment(int start) : current(start) {} int operator() () { return current++; } }; int n_; std::vector<int> iv; /* actual permutation */ std::vector<int> ev; /* identity permutation */ std::vector<Orbit> orbits_; };

定义：

 Permutation::Permutation( std::vector<int> const& i_vector) : n_( i_vector.size()), iv( i_vector), ev( n_) { if ( n_) { /* fill identity vector 1, 2, ..., n */ Increment g ( 1); std::generate( ev.begin(), ev.end(), g); } } /* create orbits (cycles) */ void Permutation::make_orbits() { std::set<int> to_visit( ev.begin(), ev.end()); // identity elements to visit while ( !to_visit.empty()) { /* new cycle */ Orbit orbit; int first_to_visit_e = *to_visit.begin(); to_visit.erase( first_to_visit_e); int k = first_to_visit_e; // element in identity vector /* first orbit element */ ei_element element; element.e = first_to_visit_e; element.i = i( first_to_visit_e); orbit.push_back( element); /* traverse permutation until cycle is closed */ while ( pi( k) != first_to_visit_e && !to_visit.empty()) { k = pi( k); ei_element element; element.e = k; element.i = pi( k); orbit.push_back( element); to_visit.erase( k); } orbits_.push_back( orbit); } }

和：

 /* return sum of all transpositions */ int Permutation::transpositionsCount() const { int count = 0; int k = 0; while ( k < orbits_.size()) { count += l( k++) - 1; /* sum += l_k - 1 */ } return count; }

用法：

 /* * */ int main(int argc, char** argv) { //1, 2, 3, 4, 5, 6, 7, 8 identity (e) int permutation[] = {2, 3, 4, 5, 1, 7, 6, 8}; // actual (i) std::vector<int> vp( permutation, permutation + 8); Permutation p( vp); p.make_orbits(); int k = p.orbits().size(); std::cout << "Number of cycles:" << k << std::endl; for ( int i = 0; i < k; ++i) { std::vector<Permutation::ei_element> v = p.orbits()[ i]; for ( int j = 0; j < v.size(); ++j) { std::cout << v[ j].e << "," << v[ j].i << " | "; } std::cout << std::endl; } std::cout << "Steps needed to create identity permutation: " << p.transpositionsCount(); return 0; }

输出：

循环次数：3

1,2 | 2,3 | 3,4 | 4,5 | 5,1 |

6,7 | 7,6 |

8,8 |

创build身份排列所需的步骤：5

运行成功（总时间：82ms）

coliru

在这种情况下，“ 肯德尔距离 ”algorithm是精确的解决scheme，其中必须find相邻元素的交换次数。

例。

eyssaasse（ 基本string ）
seasysaes

基本string为每个元素提供索引： e = 0， y = 1， s = 2， s = 3， a = 4， a = 5， s = 6， s = 7， e = 8;

有些元素是重复的，所以：
1）创build一个字典，元素是键，值是索引列表：

idx = {' e '=> [ 0,8 ]，' y '=> [1]，' s '=> [2,3,6,7]，' a '=> [4,5]}

2）使用idx字典中的元素索引创build第二个string的索引图：

seasysaes – > 204316587 （循环“seasysaes”，从idx中的每个键列表中popup下一个索引）

3）创build该地图的所有配对组合列表，204316587：20 24 23 21 26 25 28 27 04 03 01 06 … 65 68 67 58 57 87;
循环通过这些对计数那些第一个数字大于第二个数字。
这个数字是string之间相邻交换的寻求数量 。

Python脚本：

 from itertools import combinations, cycle word = 'eyssaasse' # base string cmpr = 'seasysaes' # a string to find number of swaps from the base string swaps = 0 # 1) chars = {c: [] for c in word} [chars[c].append(i) for i, c in enumerate(word)] for k in chars.keys(): chars[k] = cycle(chars[k]) # 2) idxs = [next(chars[c]) for c in cmpr] # 3) for cmb in combinations(idxs, 2): if cmb[0] > cmb[1]: swaps += 1 print(swaps)

'eyssaasse'和'seasysaes'之间的掉电次数是7次。
对于“reviver”和“vrerevi”，它是8。

通过反转O（n）中的目标置换，将O（n）中的置换组合，然后find从那里到的交换次数，可以将置换从一个转换到另一个可以转换成类似问题（置换中的交换次数）身份置换。鉴于：

 int P1[] = {0, 1, 2, 3}; // abcd int P2[] = {0, 2, 3, 1}; // acdb // we can follow a simple algebraic modification // (see http://en.wikipedia.org/wiki/Permutation#Product_and_inverse): // P1 * P = P2 | premultiply P1^-1 * // P1^-1 * P1 * P = P1^-1 * P2 // I * P = P1^-1 * P2 // P = P1^-1 * P2 // where P is a permutation that makes P1 into P2. // also, the number of steps from P to identity equals // the number of steps from P1 to P2. int P1_inv[4]; for(int i = 0; i < 4; ++ i) P1_inv[P1[i]] = i; // invert the first permutation O(n) int P[4]; for(int i = 0; i < 4; ++ i) P[i] = P2[P1_inv[i]]; // chain the permutations int num_steps = NumSteps(P, 4); // will return 2 // now we just need to count the steps

要计算这些步骤，可以devise一个简单的algorithm，例如：

 int NumSteps(int *P, int n) { int count = 0; for(int i = 0; i < n; ++ i) { for(; P[i] != i; ++ count) // could be permuted multiple times swap(P[P[i]], P[i]); // look where the number at hand should be } // count number of permutations return count; }

这总是将一个物品换成一个应该在身份排列中的地方，因此在每一步它都会撤消并计算一次交换。现在，假设它返回的交换次数确实是最小的，那么algorithm的运行时间就会受到限制，并保证完成（而不是陷入无限循环）。它将以O(m)交换或O(m + n)循环迭代运行，其中m是交换count返回的count ）， n是序列（ 4 ）中项目的数量。请注意， m < n始终为真。因此，这应该优于O(n log n)解，因为这里的交换的上限是O(n - 1)或者是循环迭代的O(n + n - 1) ，实际上O(n) （在后一种情况下省略2的常数因子）。

该algorithm仅适用于有效置换，对于具有重复值的序列，它将无限循环，并且对具有[0, n)以外的值的序列进行超出边界的数组访问（和崩溃[0, n) 。一个完整的testing用例可以在这里find（用Visual Studio 2008构build，algorithm本身应该相当便携）。它生成长度为1到32的所有可能的排列，并检查用广度优先search（BFS）生成的解决scheme，似乎适用于长度为1到12的所有排列，然后它变得相当慢，但是我认为它将继续工作。

计算将一个排列转换为另一个排列所需的相邻交换

通过有效词将一个词转换成另一个词的algorithm

平行四边形内的随机点

在x和y坐标的numpy数组中寻找最近点的索引

查找未sorting数组的中位数

如何检查两个列表是否在Python中循环相同

生成一组置换（最有效）

LogLog和HyperLogLogalgorithm来计算大基数

地理围栏 – 指向内部/外部多边形

带数组结构的数组

为什么散列表扩展通常是通过加倍大小来完成的？