最快的方法来确定一个整数的平方根是一个整数
我正在寻找最快的方式来确定一个long
价值是一个完美的平方(即其平方根是另一个整数)。 我已经通过使用内置的Math.sqrt()函数,简单的方法做到了这一点,但是我想知道是否有办法通过将自己限制为只有整数的域来更快地实现。 维护一个查找表是不切实际的(因为大约有2 31.5个整数的平方小于2 63 )。
这是我现在正在做的非常简单直接的方式:
public final static boolean isPerfectSquare(long n) { if (n < 0) return false; long tst = (long)(Math.sqrt(n) + 0.5); return tst*tst == n; }
注意:我在许多Project Euler问题中使用了这个函数。 所以没有人会永远保持这个代码。 而这种微观优化实际上可能会有所作为,因为部分挑战是在不到一分钟的时间内完成每一个algorithm,而这个function在一些问题上需要被调用数百万次。
更新2 :由A. Rex发布的新解决scheme已被certificate更快。 在前10亿个整数的运行中,解决scheme只需要使用原始解决scheme的34%。 虽然John Carmack对n的小值有点更好,但与此解决scheme相比,其好处却相当小。
A. Rex解决scheme转换为Java:
private final static boolean isPerfectSquare(long n) { // Quickfail if( n < 0 || ((n&2) != 0) || ((n & 7) == 5) || ((n & 11) == 8) ) return false; if( n == 0 ) return true; // Check mod 255 = 3 * 5 * 17, for fun long y = n; y = (y & 0xffffffffL) + (y >> 32); y = (y & 0xffffL) + (y >> 16); y = (y & 0xffL) + ((y >> 8) & 0xffL) + (y >> 16); if( bad255[(int)y] ) return false; // Divide out powers of 4 using binary search if((n & 0xffffffffL) == 0) n >>= 32; if((n & 0xffffL) == 0) n >>= 16; if((n & 0xffL) == 0) n >>= 8; if((n & 0xfL) == 0) n >>= 4; if((n & 0x3L) == 0) n >>= 2; if((n & 0x7L) != 1) return false; // Compute sqrt using something like Hensel's lemma long r, t, z; r = start[(int)((n >> 3) & 0x3ffL)]; do { z = n - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1L << 33) ); return false; } private static boolean[] bad255 = { false,false,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true , true ,true ,false,false,true ,true ,false,true ,false,true ,true ,true ,false, true ,true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true , true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,false, true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false, true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,false,true , true ,true ,true ,false,true ,true ,false,false,true ,true ,true ,true ,true , true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,true , true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,true , true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,true , true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true , true ,false,false,true ,true ,true ,true ,true ,false,true ,true ,false,true , true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,true , false,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,true , true ,true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true , false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true , true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,false, true ,true ,true ,true ,false,true ,true ,true ,false,true ,true ,true ,true , false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false, true ,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,false, true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,false,true , true ,false,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false, true ,true ,true ,false,true ,false,true ,true ,true ,true ,true ,true ,true , true ,true ,true ,true ,true ,false,true ,false,true ,true ,true ,false,true , true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,false, false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,false,true , true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,false,true , true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true , true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,false, true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true , false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true , true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true , true ,true ,true ,true ,true ,false,true ,true ,false,true ,false,true ,true , false,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false, true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,true ,true , true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true , true ,true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,false, true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,true , true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true , true ,true ,true ,false,false }; private static int[] start = { 1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11, 1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203, 129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395, 1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587, 257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779, 1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971, 385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163, 1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355, 513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547, 1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739, 641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931, 1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973, 769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781, 1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589, 897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397, 1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205, 1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013, 959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821, 1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629, 831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437, 1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245, 703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53, 1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139, 575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331, 1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523, 447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715, 1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907, 319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099, 1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291, 191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483, 1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675, 63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867, 2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037, 65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845, 1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653, 193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461, 1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269, 321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077, 1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885, 449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693, 1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501, 577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309, 1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117, 705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75, 1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267, 833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459, 1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651, 961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843, 1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035, 1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227, 895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419, 1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611, 767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803, 1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995, 639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909, 1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717, 511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525, 1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333, 383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141, 1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949, 255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757, 1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565, 127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373, 1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181 };
更新 :我已经尝试了下面介绍的不同解决scheme。
- 经过详尽的testing,我发现Math.sqrt()的结果加
0.5
是没有必要的,至less不是在我的机器上。 - John Carmack破解速度更快,但是从n = 410881开始,结果不正确。 但是,正如BobbyShaftoe所build议的那样 ,我们可以使用Carmack hack(n <410881)。
- 牛顿的方法比
Math.sqrt()
慢了一点。 这可能是因为Math.sqrt()
使用类似于牛顿的方法,但在硬件中实现,因此它比在Java中快得多。 另外,牛顿法还需要使用双打。 - 修改后的牛顿方法使用了一些技巧,只涉及整数运算,需要一些黑客来避免溢出(我希望这个函数能够处理所有正整数的64位有符号整数),并且它仍然比
Math.sqrt()
慢Math.sqrt()
。 - 二进制sorting甚至更慢。 这是有道理的,因为二进制斩将平均需要16遍才能find一个64位数的平方根。
约翰·D·库克 ( John D. Cook )提出了一个显示出改进的build议。 你可以观察到一个完美的正方形的最后一个hex数字(即最后4位)必须是0,1,4或9.这意味着75%的数字可以立即被消除为可能的正方形。 实施这个解决scheme导致运行时间减less了大约50%。
根据约翰的build议,我研究了一个完美正方形的最后n位的属性。 通过分析最后6位,我发现最后6位只有12个数值是可能的。 这意味着81%的数值可以在不使用任何math的情况下被消除。 实施这个解决scheme使运行时间减less了8%(与我的原始algorithm相比)。 分析多于6位会导致可能的结束位列表太大而不实用。
这里是我使用的代码,它运行在原始algorithm所需时间的42%(基于对前1亿个整数的运行)。 对于n小于410881的值,它仅运行原algorithm所需时间的29%。
private final static boolean isPerfectSquare(long n) { if (n < 0) return false; switch((int)(n & 0x3F)) { case 0x00: case 0x01: case 0x04: case 0x09: case 0x10: case 0x11: case 0x19: case 0x21: case 0x24: case 0x29: case 0x31: case 0x39: long sqrt; if(n < 410881L) { //John Carmack hack, converted to Java. // See: http://www.codemaestro.com/reviews/9 int i; float x2, y; x2 = n * 0.5F; y = n; i = Float.floatToRawIntBits(y); i = 0x5f3759df - ( i >> 1 ); y = Float.intBitsToFloat(i); y = y * ( 1.5F - ( x2 * y * y ) ); sqrt = (long)(1.0F/y); } else { //Carmack hack gives incorrect answer for n >= 410881. sqrt = (long)Math.sqrt(n); } return sqrt*sqrt == n; default: return false; } }
备注 :
- 根据John的testing,在C ++中使用
or
语句的速度比使用switch
要快,但在Java和C#中,似乎没有区别。 - 我也试着做一个查找表(作为一个私人的64个布尔值的静态数组)。 然后,而不是switch或
or
语句,我只是说if(lookup[(int)(n&0x3F)]) { test } else return false;
。 令我吃惊的是,这只是(稍微)慢一点。我不知道为什么。这是因为在Java中检查数组边界 。
我想出了一种方法,比起你的6位+ Carmack + sqrt代码,至less用我的CPU(x86)和编程语言(C / C ++)快35%。 你的结果可能会有所不同,特别是因为我不知道Java因素将如何发挥。
我的方法是三重的:
- 首先,滤除明显的答案。 这包括负数,并查看最后4位。 (我发现看最后六个没有帮助。)我也回答是0(在阅读下面的代码,请注意,我的input是
int64 x
。)if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true;
- 接下来,检查它是一个平方255 = 3 * 5 * 17。因为这是三个不同素数的乘积,所以只有大约1/8的255的残基是正方形。 然而,根据我的经验,调用模运算符(%)的成本高于获得的好处,所以我使用包含255 = 2 ^ 8-1的小技巧来计算残差。 (不pipe好坏,我不是用一个字读出单个字节的技巧,只是逐位和倒换。)
int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); // At this point, y is between 0 and 511. More code can reduce it farther.
为了实际检查残留物是否是方块,我在预先计算的表格中查找答案。
if( bad255[y] ) return false; // However, I just use a table of size 512
- 最后,尝试使用类似于Hensel引理的方法来计算平方根。 (我不认为它是直接适用的,但它有一些修改。)在这之前,我用二分search划分了2的所有权力:
if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2;
在这一点上,我们的数字是一个正方形,它必须是1模8。
if((x & 7) != 1) return false;
Hensel引理的基本结构如下。 (注意:未经testing的代码;如果不起作用,请尝试t = 2或8)
int64 t = 4, r = 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; // Repeat until t is 2^33 or so. Use a loop if you want.
这个想法是,在每次迭代时,你要在r的“当前”x平方根上添加一个位; 每个平方根都是精确的模2的大,大的幂,即t / 2。 最后,r和t / 2-r将是x模t / 2的平方根。 (注意,如果r是x的平方根,那么-r也是如此,即使是模数也是如此,但要小心,以某些数为模,甚至可以有2个以上的平方根,特别是包括2的幂。 )因为我们的实际平方根小于2 ^ 32,那么我们实际上可以检查r或t / 2-r是否是真正的平方根。 在我的实际代码中,我使用了以下修改的循环:
int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) );
这里的加速可以通过三种方式获得:预先计算的起始值(相当于循环的约10次迭代),循环的早期退出,以及跳过一些t值。 对于最后一部分,我看
z = r - x * x
,并设置t是2分z的一个小技巧的最大功率。 这允许我跳过不会影响r值的t值。 在我的情况下,预先计算的起始值挑出了“最小正数”平方根模8192。
即使这个代码对你来说工作不快,我希望你喜欢它包含的一些想法。 完整的,经过testing的代码如下,包括预先计算的表。
typedef signed long long int int64; int start[1024] = {1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11, 1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203, 129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395, 1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587, 257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779, 1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971, 385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163, 1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355, 513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547, 1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739, 641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931, 1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973, 769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781, 1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589, 897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397, 1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205, 1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013, 959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821, 1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629, 831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437, 1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245, 703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53, 1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139, 575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331, 1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523, 447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715, 1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907, 319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099, 1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291, 191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483, 1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675, 63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867, 2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037, 65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845, 1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653, 193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461, 1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269, 321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077, 1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885, 449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693, 1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501, 577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309, 1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117, 705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75, 1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267, 833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459, 1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651, 961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843, 1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035, 1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227, 895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419, 1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611, 767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803, 1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995, 639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909, 1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717, 511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525, 1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333, 383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141, 1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949, 255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757, 1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565, 127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373, 1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181}; bool bad255[512] = {0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1, 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1, 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1, 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1, 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1, 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1, 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1, 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 0,0}; inline bool square( int64 x ) { // Quickfail if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true; // Check mod 255 = 3 * 5 * 17, for fun int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); if( bad255[y] ) return false; // Divide out powers of 4 using binary search if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2; if((x & 7) != 1) return false; // Compute sqrt using something like Hensel's lemma int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) ); return false; }
我对这个派对很迟,但我希望能提供一个更好的答案。 (假设我的基准是正确的)也快得多 。
long goodMask; // 0xC840C04048404040 computed below { for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i); } public boolean isSquare(long x) { // This tests if the 6 least significant bits are right. // Moving the to be tested bit to the highest position saves us masking. if (goodMask << x >= 0) return false; final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x); // Each square ends with an even number of zeros. if ((numberOfTrailingZeros & 1) != 0) return false; x >>= numberOfTrailingZeros; // Now x is either 0 or odd. // In binary each odd square ends with 001. // Postpone the sign test until now; handle zero in the branch. if ((x&7) != 1 | x <= 0) return x == 0; // Do it in the classical way. // The correctness is not trivial as the conversion from long to double is lossy! final long tst = (long) Math.sqrt(x); return tst * tst == x; }
第一个testing很快捕获了大多数非正方形。 它使用一个长的包装64项表,所以没有arrays访问成本(间接和边界检查)。 对于统一的随机long
,在这里结束的概率为81.25%。
第二个testing捕获所有具有奇数个二进制因子的数字。 Long.numberOfTrailingZeros
的方法非常快,因为它被JIT编译成一个i86指令。
在丢弃尾随零后,第三个testing处理以二进制结束011,101或111的数字,它们不是完美的正方形。 它也关心负数,并处理0。
最后的testing回落到double
算术。 由于double
只有53位尾数,所以从long
到double
的转换包括大值的舍入。 尽pipe如此,testing是正确的(除非certificate是错误的)。
试图纳入mod255的想法是不成功的。
你将不得不做一些基准testing。 最好的algorithm将取决于你的input分布。
你的algorithm可能几乎是最佳的,但你可能想要做一个快速的检查,以排除一些可能性,然后调用你的平方根例程。 例如,通过按位进行“and”来查看hex数字的最后一位数字。 完美的正方形只能在基数为16的0,1,4或9中结束,因此对于75%的input(假设它们是均匀分布的),可以避免调用平方根来换取一些非常快的位。
基普基准执行hex技巧的以下代码。 当testing数字1到100,000,000时,这个代码运行的速度是原来的两倍。
public final static boolean isPerfectSquare(long n) { if (n < 0) return false; switch((int)(n & 0xF)) { case 0: case 1: case 4: case 9: long tst = (long)Math.sqrt(n); return tst*tst == n; default: return false; } }
当我在C ++中testing类似的代码时,它实际上比原来的运行速度慢。 但是,当我删除switch语句时,hex技巧再次使代码快两倍。
int isPerfectSquare(int n) { int h = n & 0xF; // h is the last hex "digit" if (h > 9) return 0; // Use lazy evaluation to jump out of the if statement as soon as possible if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8) { int t = (int) floor( sqrt((double) n) + 0.5 ); return t*t == n; } return 0; }
消除switch语句对C#代码没有什么影响。
I was thinking about the horrible times I've spent in Numerical Analysis course.
And then I remember, there was this function circling around the 'net from the Quake Source code:
float Q_rsqrt( float number ) { long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // wtf? y = * ( float * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed #ifndef Q3_VM #ifdef __linux__ assert( !isnan(y) ); // bk010122 - FPE? #endif #endif return y; }
Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).
It should be usable and might even be faster, it's from one of the phenomenal id software's game!
It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:
I originally found it at: http://www.codemaestro.com/reviews/9
Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method
You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:
- the
* (long*) &y
is basically a fast convert-to-long function so integer operations can be applied on the raw bytes. - the
0x5f3759df - (i >> 1);
line is a pre-calculated seed value for the approximation function. - the
* (float*) &i
converts the value back to floating point. - the
y = y * ( threehalfs - ( x2 * y * y ) )
line bascially iterates the value over the function again.
The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you… then you could add as much iteration as you need.
This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F
multiply operation) and replace it with a few fixed number of multiplication operations instead.
I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root , algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.
If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:
(n+1)^2 = n^2 + 2n + 1 (n-1)^2 = n^2 - 2n + 1
So having calculated n^2
, the options are:
-
n^2 = target
: done, return true -
n^2 + 2n + 1 > target > n^2
: you're close, but it's not perfect: return false -
n^2 - 2n + 1 < target < n^2
: ditto -
target < n^2 - 2n + 1
: binary chop on a lowern
-
target > n^2 + 2n + 1
: binary chop on a highern
(Sorry, this uses n
as your current guess, and target
for the parameter. Apologise for the confusion!)
I don't know whether this will be faster or not, but it's worth a try.
EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x)
, so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.
I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see those old results in the edit history of this answer, but they're not accurate, as I made a mistake, and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:
// This is faster because a number is divisible by 2^4 or more only 6% of the time // and more than that a vanishingly small percentage. while((x & 0x3) == 0) x >>= 2; // This is effectively the same as the switch-case statement used in the original // answer. if((x & 0x7) != 1) return false;
However, this simple line, which most of the time adds one or two very fast instructions, greatly simplifies the switch-case
statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.
The algorithms below are as follows:
- Internet – Kip's posted answer
- Durron – My modified answer using the one-pass answer as a base
- DurronTwo – My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.
Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong())
0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials benchmark us linear runtime Internet 39.7 ============================== Durron 37.8 ============================ DurronTwo 36.0 =========================== vm: java trial: 0
And here is a sample runtime if it's run on the first million longs only:
0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials benchmark ms linear runtime Internet 2.93 =========================== Durron 2.24 ===================== DurronTwo 3.16 ============================== vm: java trial: 0
As you can see, DurronTwo
does better for large inputs, because it gets to use the magic trick very very often, but gets clobbered compared to the first algorithm and Math.sqrt
because the numbers are so much smaller. Meanwhile, the simpler Durron
is a huge winner because it never has to divide by 4 many many times in the first million numbers.
Here's Durron
:
public final static boolean isPerfectSquareDurron(long n) { if(n < 0) return false; if(n == 0) return true; long x = n; // This is faster because a number is divisible by 16 only 6% of the time // and more than that a vanishingly small percentage. while((x & 0x3) == 0) x >>= 2; // This is effectively the same as the switch-case statement used in the original // answer. if((x & 0x7) == 1) { long sqrt; if(x < 410881L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); i = 0x5f3759df - ( i >> 1 ); y = Float.intBitsToFloat(i); y = y * ( 1.5F - ( x2 * y * y ) ); sqrt = (long)(1.0F/y); } else { sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
And DurronTwo
public final static boolean isPerfectSquareDurronTwo(long n) { if(n < 0) return false; // Needed to prevent infinite loop if(n == 0) return true; long x = n; while((x & 0x3) == 0) x >>= 2; if((x & 0x7) == 1) { long sqrt; if (x < 41529141369L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); //using the magic number from //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf //since it more accurate i = 0x5f375a86 - (i >> 1); y = Float.intBitsToFloat(i); y = y * (1.5F - (x2 * y * y)); y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate sqrt = (long) ((1.0F/y) + 0.2); } else { //Carmack hack gives incorrect answer for n >= 41529141369. sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
And my benchmark harness: (Requires Google caliper 0.1-rc5)
public class SquareRootBenchmark { public static class Benchmark1 extends SimpleBenchmark { private static final int ARRAY_SIZE = 10000; long[] trials = new long[ARRAY_SIZE]; @Override protected void setUp() throws Exception { Random r = new Random(); for (int i = 0; i < ARRAY_SIZE; i++) { trials[i] = Math.abs(r.nextLong()); } } public int timeInternet(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++; } } return trues; } public int timeDurron(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++; } } return trues; } public int timeDurronTwo(int reps) { int trues = 0; for(int i = 0; i < reps; i++) { for(int j = 0; j < ARRAY_SIZE; j++) { if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++; } } return trues; } } public static void main(String... args) { Runner.main(Benchmark1.class, args); } }
UPDATE: I've made a new algorithm that is faster in some scenarios, slower in others, I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241
, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:
if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF
, or 3) the residue + 0x1FFFFFE
. Of course, we need to have a lookup table for residues modulo 0xFFFFFF
, which is about a 3mb file (in this case stored as ascii text decimal numbers, not optimal but clearly improvable with a ByteBuffer
and so forth. But since that is precalculation it doesn't matter so much. You can find the file here (or generate it yourself):
public final static boolean isPerfectSquareDurronThree(long n) { if(n < 0) return false; if(n == 0) return true; long x = n; while((x & 0x3) == 0) x >>= 2; if((x & 0x7) == 1) { if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false; long sqrt; if(x < 410881L) { int i; float x2, y; x2 = x * 0.5F; y = x; i = Float.floatToRawIntBits(y); i = 0x5f3759df - ( i >> 1 ); y = Float.intBitsToFloat(i); y = y * ( 1.5F - ( x2 * y * y ) ); sqrt = (long)(1.0F/y); } else { sqrt = (long) Math.sqrt(x); } return sqrt*sqrt == x; } return false; }
I load it into a boolean
array like this:
private static boolean[] goodLookupSquares = null; public static void initGoodLookupSquares() throws Exception { Scanner s = new Scanner(new File("24residues_squares.txt")); goodLookupSquares = new boolean[0x1FFFFFE]; while(s.hasNextLine()) { int residue = Integer.valueOf(s.nextLine()); goodLookupSquares[residue] = true; goodLookupSquares[residue + 0xFFFFFF] = true; goodLookupSquares[residue + 0x1FFFFFE] = true; } s.close(); }
Example runtime. It beat Durron
(version one) in every trial I ran.
0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials 33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials 67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials benchmark us linear runtime Internet 40.7 ============================== Durron 38.4 ============================ DurronThree 36.2 ========================== vm: java trial: 0
It should be much faster to use Newton's method to calculate the Integer Square Root , then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.
Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.
I want this function to work with all positive 64-bit signed integers
Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.
It's been pointed out that the last d
digits of a perfect square can only take on certain values. The last d
digits (in base b
) of a number n
is the same as the remainder when n
is divided by b
d
, ie. in C notation n % pow(b, d)
.
This can be generalized to any modulus m
, ie. n % m
can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.
By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.
Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.
First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.
A solution would then be like this:
boolean isPerfectSquare(long number) { if (number < 0) return false; if (number < 2) return true; for (int i = 0; ; i++) { long square = squareTable[i]; if (square > number) return false; while (number % square == 0) { number /= square; } if (number == 1) return true; } }
I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.
Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.
For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.
An integer problem deserves an integer solution. 从而
Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n
. Then test whether r**2 = n
exactly. This takes time O(log n).
If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You starting by computing your increasing function f (above f(t) = t**2 - n
) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.
You should get rid of the 2-power part of N right from the start.
2nd Edit The magical expression for m below should be
m = N - (N & (N-1));
and not as written
End of 2nd edit
m = N & (N-1); // the lawest bit of N N /= m; byte = N & 0x0F; if ((m % 2) || (byte !=1 && byte !=9)) return false;
1st Edit:
Minor improvement:
m = N & (N-1); // the lawest bit of N N /= m; if ((m % 2) || (N & 0x07 != 1)) return false;
End of 1st edit
Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. Ie you run the floating point part on 6% of the numbers.
This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.
- Mod-256 test
- Inexact mod-3465 test (avoids integer division at the cost of some false positives)
- Floating-point square root, round and compare with input value
I also experimented with these modifications but they did not help performance:
- Additional mod-255 test
- Dividing the input value by powers of 4
- Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)
public class SquareTester { public static boolean isPerfectSquare(long n) { if (n < 0) { return false; } else { switch ((byte) n) { case -128: case -127: case -124: case -119: case -112: case -111: case -103: case -95: case -92: case -87: case -79: case -71: case -64: case -63: case -60: case -55: case -47: case -39: case -31: case -28: case -23: case -15: case -7: case 0: case 1: case 4: case 9: case 16: case 17: case 25: case 33: case 36: case 41: case 49: case 57: case 64: case 65: case 68: case 73: case 81: case 89: case 97: case 100: case 105: case 113: case 121: long i = (n * INV3465) >>> 52; if (! good3465[(int) i]) { return false; } else { long r = round(Math.sqrt(n)); return r*r == n; } default: return false; } } } private static int round(double x) { return (int) Double.doubleToRawLongBits(x + (double) (1L << 52)); } /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */ private static final long INV3465 = 0x8ffed161732e78b9L; private static final boolean[] good3465 = new boolean[0x1000]; static { for (int r = 0; r < 3465; ++ r) { int i = (int) ((r * r * INV3465) >>> 52); good3465[i] = good3465[i+1] = true; } } }
The following simplification of maaartinus's solution appears to shave a few percentage points off the runtime, but I'm not good enough at benchmarking to produce a benchmark I can trust:
long goodMask; // 0xC840C04048404040 computed below { for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i); } public boolean isSquare(long x) { // This tests if the 6 least significant bits are right. // Moving the to be tested bit to the highest position saves us masking. if (goodMask << x >= 0) return false; // Remove an even number of trailing zeros, leaving at most one. x >>= (Long.numberOfTrailingZeros(x) & (-2); // Repeat the test on the 6 least significant remaining bits. if (goodMask << x >= 0 | x <= 0) return x == 0; // Do it in the classical way. // The correctness is not trivial as the conversion from long to double is lossy! final long tst = (long) Math.sqrt(x); return tst * tst == x; }
It would be worth checking how omitting the first test,
if (goodMask << x >= 0) return false;
would affect performance.
This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:
def isexactsqrt(v) value = v.abs residue = value root = 0 onebit = 1 onebit <<= 8 while (onebit < residue) onebit >>= 2 while (onebit > residue) while (onebit > 0) x = root + onebit if (residue >= x) then residue -= x root = x + onebit end root >>= 1 onebit >>= 2 end return (residue == 0) end
Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O – it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:
#include <iostream> using namespace std; typedef unsigned long long int llint; class ISqrt { // Integer Square Root llint value; // Integer whose square root is required llint root; // Result: floor(sqrt(value)) llint residue; // Result: value-root*root llint onebit, x; // Working bit, working value public: ISqrt(llint v = 2) { // Constructor Root(v); // Take the root }; llint Root(llint r) { // Resets and calculates new square root value = r; // Store input residue = value; // Initialise for subtracting down root = 0; // Clear root accumulator onebit = 1; // Calculate start value of counter onebit <<= (8*sizeof(llint)-2); // Set up counter bit as greatest odd power of 2 while (onebit > residue) {onebit >>= 2; }; // Shift down until just < value while (onebit > 0) { x = root ^ onebit; // Will check root+1bit (root bit corresponding to onebit is always zero) if (residue >= x) { // Room to subtract? residue -= x; // Yes - deduct from residue root = x + onebit; // and step root }; root >>= 1; onebit >>= 2; }; return root; }; llint Residue() { // Returns residue from last calculation return residue; }; }; int main() { llint big, i, q, r, v, delta; big = 0; big = (big-1); // Kludge for "big number" ISqrt b; // Make q sqrt generator for ( i = big; i > 0 ; i /= 7 ) { // for several numbers q = b.Root(i); // Get the square root r = b.Residue(); // Get the residue v = q*q+r; // Recalc original value delta = vi; // And diff, hopefully 0 cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n"; }; return 0; };
The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java I believe because it doesn't conform to IEEE.
So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, ie building in floating point library calls if necessary. 好吧。
In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find usefull, is a simple hack, an extension of the special case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have 64 bit boolean lookup.
typedef unsigned long long UVLONG UVLONG pp1,pp2; void init2() { for (int i = 0; i < 64; i++) { for (int j = 0; j < 64; j++) if (isPerfectSquare(i * 64 + j)) { pp1 |= (1 << j); pp2 |= (1 << i); break; } } cout << "pp1=" << pp1 << "," << pp2 << "\n"; } inline bool isPerfectSquare5(UVLONG x) { return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false; }
The routine isPerfectSquare5 runs in about 1/3 the time on my core2 duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.
Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.
Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.
The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.
There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.
Project Euler is mentioned in the tags and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80 byte buffer.
I used java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1) and the final error was just one step below the final divisor and the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)
One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square – the final error (not correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy with several Euler problems.
I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.
Just replace your:
if(n < 410881L){...}
code with this one:
if (n < 11043908100L) { //John Carmack hack, converted to Java. // See: http://www.codemaestro.com/reviews/9 int i; float x2, y; x2 = n * 0.5F; y = n; i = Float.floatToRawIntBits(y); //using the magic number from //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf //since it more accurate i = 0x5f375a86 - (i >> 1); y = Float.intBitsToFloat(i); y = y * (1.5F - (x2 * y * y)); y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate sqrt = Math.round(1.0F / y); } else { //Carmack hack gives incorrect answer for n >= 11043908100. sqrt = (long) Math.sqrt(n); }
I checked all of the possible results when the last n bits of a square is observed. By successively examining more bits, up to 5/6th of inputs can be eliminated. I actually designed this to implement Fermat's Factorization algorithm, and it is very fast there.
public static boolean isSquare(final long val) { if ((val & 2) == 2 || (val & 7) == 5) { return false; } if ((val & 11) == 8 || (val & 31) == 20) { return false; } if ((val & 47) == 32 || (val & 127) == 80) { return false; } if ((val & 191) == 128 || (val & 511) == 320) { return false; } // if((val & a == b) || (val & c == d){ // return false; // } if (!modSq[(int) (val % modSq.length)]) { return false; } final long root = (long) Math.sqrt(val); return root * root == val; }
The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3
It first tests whether it has a square residual with moduli of power of two, then it tests based on a final modulus, then it uses the Math.sqrt to do a final test. I came up with the idea from the top post, and attempted to extend upon it. I appreciate any comments or suggestions.
Update: Using the test by a modulus, (modSq) and a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.
Considering for general bit length (though I have used specific type here), I tried to design simplistic algo as below. Simple and obvious check for 0,1,2 or <0 is required initially. Following is simple in sense that it doesn't try to use any existing maths functions. Most of the operator can be replaced with bit-wise operators. I haven't tested with any bench mark data though. I'm neither expert at maths or computer algorithm design in particular, I would love to see you pointing out problem. I know there is lots of improvement chances there.
int main() { unsigned int c1=0 ,c2 = 0; unsigned int x = 0; unsigned int p = 0; int k1 = 0; scanf("%d",&p); if(p % 2 == 0) { x = p/2; } else { x = (p/2) +1; } while(x) { if((x*x) > p) { c1 = x; x = x/2; }else { c2 = x; break; } } if((p%2) != 0) c2++; while(c2 < c1) { if((c2 * c2 ) == p) { k1 = 1; break; } c2++; } if(k1) printf("\n Perfect square for %d",c2); else printf("\n Not perfect but nearest to :%d :",c2); return 0; }
If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (eg into categories by largest bit set), then checking the value against an array of perfect squares within that range.
Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.
Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.
If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?
Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method — Newton's, with a very good first guess.
Regarding your current best, I see two micro-optimizations:
- move the check vs. 0 after the check using mod255
- rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.
即:
// Divide out powers of 4 using binary search if((n & 0x3L) == 0) { n >>=2; if((n & 0xffffffffL) == 0) n >>= 32; if((n & 0xffffL) == 0) n >>= 16; if((n & 0xffL) == 0) n >>= 8; if((n & 0xfL) == 0) n >>= 4; if((n & 0x3L) == 0) n >>= 2; }
Even better might be a simple
while ((n & 0x03L) == 0) n >>= 2;
Obviously, it would be interesting to know how many numbers get culled at each checkpoint — I rather doubt the checks are truly independent, which makes things tricky.
"I'm looking for the fastest way to determine if a long value is a perfect square (ie its square root is another integer)."
The answers are impressive, but I failed to see a simple check :
check whether the first number on the right of the long it a member of the set (0,1,4,5,6,9) . If it is not, then it cannot possibly be a 'perfect square' .
例如。
4567 – cannot be a perfect square.
It ought to be possible to pack the 'cannot be a perfect square if the last X digits are N' much more efficiently than that! I'll use java 32 bit ints, and produce enough data to check the last 16 bits of the number – that's 2048 hexadecimal int values.
…
好。 Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code:
public static void main(String[] args) { final int BITS = 16; BitSet foo = new BitSet(); for(int i = 0; i< (1<<BITS); i++) { int sq = (i*i); sq = sq & ((1<<BITS)-1); foo.set(sq); } System.out.println("int[] mayBeASquare = {"); for(int i = 0; i< 1<<(BITS-5); i++) { int kk = 0; for(int j = 0; j<32; j++) { if(foo.get((i << 5) | j)) { kk |= 1<<j; } } System.out.print("0x" + Integer.toHexString(kk) + ", "); if(i%8 == 7) System.out.println(); } System.out.println("};"); }
and here are the results:
(ed: elided for poor performance in prettify.js; view revision history to see.)
Here is a slightly different take, and it is not the fastest (in terms of benchmark) but it is quite simple and fast to write. If you just want to know if the square is a non-decimal number, and if you do not care if it is an integer, or a long, here is an example of what you can do:
public static boolean isRootLong(long originalNumber) { Double root = Math.sqrt(originalNumber); Long roundedRoot = Math.round(root); long square = Math.pow(roundedRoot); return square == originalNumber; }
If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability.