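How would you count occurrences of a string within a string?
I was working on something and realised I wanted to count how many times a character (here, '/') appears in a string. It then struck me that there are several ways to do it, but I couldn't decide which is the best (or the easiest).
At the moment I'm going with something like: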
string source = "/once/upon/a/time/"; int count = source.Length - source.Replace("/", "").Length;
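But I don't like it at all. Any takers?
I don't really want to dig out RegEx for this, do I?
I know my string is going to contain the term I'm searching for, so you can assume that…
Of course, for search strings where length > 1:
string haystack = "/once/upon/a/time"; string needle = "/"; int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;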
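One caveat with the expression above: `string.Replace` throws an `ArgumentException` when the search string is empty, and the division would fail for a zero-length needle anyway. The following is a minimal sketch of a guarded version; the names `ReplaceCounter` and `CountByReplace` are made up for illustration and do not come from the thread.

```csharp
using System;

public static class ReplaceCounter
{
    // Counts non-overlapping occurrences of needle by measuring how much
    // shorter the haystack becomes once every occurrence is removed.
    public static int CountByReplace(string haystack, string needle)
    {
        // Guard: string.Replace rejects an empty search string, and the
        // division below would be by zero for a zero-length needle.
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int removed = haystack.Length - haystack.Replace(needle, "").Length;
        return removed / needle.Length;
    }
}
```

Calling `ReplaceCounter.CountByReplace("/once/upon/a/time", "/")` counts the four slashes in that string, and an empty needle simply yields 0 instead of an exception.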
If you're using .NET 3.5, you can do this in a one-liner with LINQ:
int count = source.Count(f => f == '/');
If you don't want to use LINQ, you can do it with:
int count = source.Split('/').Length - 1;
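You might be surprised to learn that your original technique seems to be about 30% faster than either of these! I've just done a quick benchmark with "/once/upon/a/time/" and the results are as follows:
Your original = 12s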
source.Count = 19s
source.Split = 17s
foreach (from bobwienholt's answer) = 10s
(The times are for 50,000,000 iterations, so you're unlikely to notice much difference in the real world.)
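Timings like the ones above can be reproduced with a small harness. This is only a sketch, assuming a `Stopwatch`-based loop is representative enough; `CountBenchmark` and its method names are invented here, and the numbers will differ by machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class CountBenchmark
{
    // Runs the counting function the requested number of times and
    // returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, int> counter, string source, int iterations)
    {
        int sink = 0; // results are summed so every call's result is used
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += counter(source);
        watch.Stop();
        GC.KeepAlive(sink);
        return watch.Elapsed;
    }

    // The four candidates from the thread, hard-wired to count '/'.
    public static int ByReplace(string s) => s.Length - s.Replace("/", "").Length;
    public static int ByLinq(string s) => s.Count(c => c == '/');
    public static int BySplit(string s) => s.Split('/').Length - 1;
    public static int ByForeach(string s)
    {
        int count = 0;
        foreach (char c in s)
            if (c == '/') count++;
        return count;
    }
}
```

For example, `CountBenchmark.Time(CountBenchmark.ByReplace, "/once/upon/a/time/", 50000000)` times the Replace-based variant over 50 million iterations; running it in a release build gives the more meaningful comparison.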
string source = "/once/upon/a/time/"; int count = 0; foreach (char c in source) if (c == '/') count++;
Has to be faster than the source.Replace() by itself.
int count = new Regex(Regex.Escape(needle)).Matches(haystack).Count;
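The `Regex.Escape` call in the one-liner above matters whenever the needle contains regex metacharacters. The sketch below contrasts an escaped and an unescaped search; `RegexCounter` and its two methods are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class RegexCounter
{
    // Treats needle as literal text by escaping regex metacharacters first.
    public static int CountLiteral(string haystack, string needle)
    {
        // An empty pattern matches at every position, so it is excluded here.
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;
        return Regex.Matches(haystack, Regex.Escape(needle)).Count;
    }

    // Passes the text straight through as a pattern, for comparison only.
    public static int CountUnescaped(string haystack, string pattern)
    {
        return Regex.Matches(haystack, pattern).Count;
    }
}
```

With the input "a.b.c", the escaped search for "." finds the two literal dots, while the unescaped pattern "." matches every one of the five characters.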
If you want to be able to search for whole strings, and not just characters:
src.Select((c, i) => src.Substring(i)).Count(sub => sub.StartsWith(target))
Read as: "for each character in the string, take the rest of the string starting from that character as a substring; count it if it starts with the target string."
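Because the expression above tests every start position, overlapping occurrences are all counted, unlike the Replace-based approach. It also allocates one substring per character, and `StartsWith(string)` without a comparison argument is culture-sensitive. A sketch of the same idea without the allocations, using an ordinal comparison, might look like this (`OverlapCounter` and `CountOverlapping` are illustrative names, not from the thread):

```csharp
using System;
using System.Linq;

public static class OverlapCounter
{
    // Tests every possible start index, so overlapping matches all count.
    public static int CountOverlapping(string src, string target)
    {
        if (string.IsNullOrEmpty(src) || string.IsNullOrEmpty(target))
            return 0;
        if (target.Length > src.Length)
            return 0;

        // Only indexes where the whole target still fits are considered.
        int lastStart = src.Length - target.Length;
        return Enumerable.Range(0, lastStart + 1)
            .Count(i => string.CompareOrdinal(src, i, target, 0, target.Length) == 0);
    }
}
```

For "aaa" and "aa" this counts 2 (start indexes 0 and 1), where the Replace-and-divide technique would report 1.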
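I've done some research and found that Richard Watson's solution is the fastest in most cases. Here is a table with the results of every solution in the post (except those using Regex, because it throws exceptions when given strings like "test{test"):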
Name           | Short/char | Long/char | Short/short | Long/short | Long/long
Inspite        |        134 |      1853 |          95 |       1146 |       671
LukeH_1        |        346 |      4490 |         N/A |        N/A |       N/A
LukeH_2        |        152 |      1569 |         197 |       2425 |      2171
Bobwienholt    |        230 |      3269 |         N/A |        N/A |       N/A
Richard Watson |         33 |       298 |         146 |        737 |       543
StefanosKargas |        N/A |       N/A |         681 |      11884 |     12486
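You can see that when counting occurrences of short substrings (1-5 characters) in short strings (10-50 characters), the original algorithm is preferred.
Also, for multi-character substrings you should use the following code (based on Richard Watson's solution):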
int count = 0, n = 0; if(substring != "") { while ((n = source.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1) { n += substring.Length; ++count; } }
LINQ works on all collections, and since a string is just a collection of characters, how about this nice little one-liner:
var count = source.Count(c => c == '/');
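Make sure you have using System.Linq; at the top of your code file, as .Count is an extension method from that namespace.
These both only work for single-character search terms…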
countOccurences("the", "the answer is the answer"); int countOccurences(string needle, string haystack) { return (haystack.Length - haystack.Replace(needle,"").Length) / needle.Length; }
This may turn out to be better for longer needles…
But there has to be a more elegant way. 🙂
string source = "/once/upon/a/time/"; int count = 0; int n = 0; while ((n = source.IndexOf('/', n)) != -1) { n++; count++; }
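On my computer it's about 2 seconds faster than the for-every-character solution over 50 million iterations.
2013 revision:
Change the string to a char[] and iterate through that. It cuts a further second or two off the total time for 50m iterations!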
char[] testchars = source.ToCharArray(); foreach (char c in testchars) { if (c == '/') count++; }
This is quicker still:
char[] testchars = source.ToCharArray(); int length = testchars.Length; for (int n = 0; n < length; n++) { if (testchars[n] == '/') count++; }
For good measure, iterating from the end of the array down to 0 seems to be the fastest, by about 5%.
int length = testchars.Length; for (int n = length-1; n >= 0; n--) { if (testchars[n] == '/') count++; }
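I was wondering why this could be and was Googling around (I recall something about reverse iteration being quicker), and came upon this SO question, which annoyingly uses the string-to-char[] technique already. I think the reversal trick is new in this context, though.
What is the fastest way to iterate through individual characters in a string in C#?
Edit: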
source.Split('/').Length-1
Regex.Matches(input, Regex.Escape("stringToMatch")).Count
In C#, a nice string substring counter is this unexpectedly tricky fellow:
public static int CCount(String haystack, String needle) { return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1; }
string s = "65 fght 6565 4665 hjk"; int count = 0; foreach (Match m in Regex.Matches(s, "65")) count++;
private int CountWords(string text, string word) { int count = (text.Length - text.Replace(word, "").Length) / word.Length; return count; }
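Because the original solution was the fastest for chars, I suppose it will also be for strings. So here is my contribution.
For context: I was looking for words like "failed" and "succeeded" in a log file.
Gr, Ben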
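For log scanning like the case described above, keep in mind that `string.Replace(string, string)` is case-sensitive, so "Failed" and "failed" would be counted separately. If that is not what you want, a case-insensitive count can be sketched with `IndexOf` and `StringComparison.OrdinalIgnoreCase`. The class name, method name and sample log text below are hypothetical.

```csharp
using System;

public static class LogWordCounter
{
    // Counts non-overlapping occurrences of word in text, ignoring case.
    public static int CountIgnoreCase(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
            return 0;

        int count = 0, pos = 0;
        while ((pos = text.IndexOf(word, pos, StringComparison.OrdinalIgnoreCase)) != -1)
        {
            pos += word.Length; // continue after the match just found
            count++;
        }
        return count;
    }
}
```

With the made-up log line "Job 1 Failed. job 2 succeeded. Job 3 FAILED.", searching for "failed" counts both spellings.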
For anyone wanting a ready-to-use String extension method, here is what I use, based on the best of the posted answers:
public static class StringExtension { /// <summary> Returns the number of occurences of a string within a string, optional comparison allows case and culture control. </summary> public static int Occurrences(this System.String input, string value, StringComparison stringComparisonType = StringComparison.Ordinal) { if (String.IsNullOrEmpty(value)) return 0; int count = 0; int position = 0; while ((position = input.IndexOf(value, position, stringComparisonType)) != -1) { position += value.Length; count += 1; } return count; } /// <summary> Returns the number of occurences of a single character within a string. </summary> public static int Occurrences(this System.String input, char value) { int count = 0; foreach (char c in input) if (c == value) count += 1; return count; } }
public static int GetNumSubstringOccurrences(string text, string search) { int num = 0; int pos = 0; if (!string.IsNullOrEmpty(text) && !string.IsNullOrEmpty(search)) { while ((pos = text.IndexOf(search, pos)) > -1) { num ++; pos += search.Length; } } return num; }
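I think the easiest way to do this is to use a regular expression. That way you get the same split count as you would with myVar.Split('x'), but in a multi-character setting. The split count is one more than the number of matches, so subtract 1 to get the number of occurrences.
string myVar = "do this to count the number of words in my wording so that I can word it up!"; int count = Regex.Split(myVar, "word").Length - 1;
A generic function for counting occurrences of a string:
public int getNumberOfOccurencies(String inputString, String checkString) { if (checkString.Length > inputString.Length || checkString.Equals("")) { return 0; } int lengthDifference = inputString.Length - checkString.Length; int occurencies = 0; /* <= so that a match ending at the very end of inputString is also checked */ for (int i = 0; i <= lengthDifference; i++) { if (inputString.Substring(i, checkString.Length).Equals(checkString)) { occurencies++; i += checkString.Length - 1; } } return occurencies; }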
string source = "/once/upon/a/time/"; int count = 0, n = 0; while ((n = source.IndexOf('/', n) + 1) != 0) count++;
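A variation on Richard Watson's answer: slightly faster, with the gain growing the more often the character occurs in the string, and less code!
Though I must say that, without extensively testing every scenario, I did see a very significant speed improvement by using: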
int count = 0; for (int n = 0; n < source.Length; n++) if (source[n] == '/') count++;
String in string:
Find "etc" in " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc."
var strOrigin = " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc."; var searchStr = "etc"; int count = (strOrigin.Length - strOrigin.Replace(searchStr, "").Length)/searchStr.Length;
Check the performance before discarding this one as unsound/clumsy…
string Name = "Very good nice one is very good but is very good nice one this is called the term"; bool valid=true; int count = 0; int k=0; int m = 0; while (valid) { k = Name.Substring(m,Name.Length-m).IndexOf("good"); if (k != -1) { count++; m = m + k + 4; } else valid = false; } Console.WriteLine(count + " Times accures");
var conditionalStatement = conditionSetting.Value; /* order of replace matters: replace the two-character operators before the single characters, so that ">=" and "==" each become one '~' */ conditionalStatement = conditionalStatement.Replace("==", "~").Replace("!=", "~").Replace(">=", "~").Replace("<=", "~").Replace('=', '~').Replace('!', '~').Replace('>', '~').Replace('<', '~'); var listOfValidConditions = new List<string>() { "!=", "==", ">", "<", ">=", "<=" }; if (conditionalStatement.Count(x => x == '~') != 1) { result.InvalidFieldList.Add(new KeyFieldData(batch.DECurrentField, "The IsDoubleKeyCondition does not contain a supported conditional statement. Contact System Administrator.")); result.Status = ValidatorStatus.Fail; return result; }
I needed to do something similar to test conditional statements from a string.
I replaced what I was looking for with a single character and counted the instances of that single character.
Obviously, you need to check beforehand that the single character you're using does not already exist in the string, to avoid incorrect counts.
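The sentinel check described above can be made explicit. This is a rough sketch under the assumption that rejecting the input is acceptable when the sentinel is already present; `SentinelCounter`, `CountOperators` and the default '~' sentinel are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Linq;

public static class SentinelCounter
{
    // Replaces every operator with a sentinel character and counts the sentinels.
    public static int CountOperators(string statement, string[] operators, char sentinel = '~')
    {
        if (statement == null) throw new ArgumentNullException(nameof(statement));
        if (operators == null) throw new ArgumentNullException(nameof(operators));

        // The count is only meaningful if the sentinel is not already present.
        if (statement.IndexOf(sentinel) >= 0)
            throw new ArgumentException("Input already contains the sentinel character.", nameof(statement));

        // Longer operators go first so ">=" is not split into '>' and '='.
        foreach (string op in operators.OrderByDescending(o => o.Length))
            statement = statement.Replace(op, sentinel.ToString());

        return statement.Count(c => c == sentinel);
    }
}
```

A statement such as "a >= b" then counts as exactly one operator, and an input that already contains '~' is rejected instead of silently producing a wrong count.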
string s = "HOWLYH THIS ACTUALLY WORKSH WOWH"; int count = 0; for (int i = 0; i < s.Length; i++) if (s[i] == 'H') count++;
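It just checks every character in the string; if the character is the one you are searching for, it adds one to count.
If you check out this webpage, 15 different ways of doing this are benchmarked there, including using parallel loops.
The fastest way appears to be either a single-threaded for loop (if you have .NET < 4.0) or a Parallel.For loop (if you are on .NET > 4.0 with thousands of checks).
Assuming "ss" is your search string and "ch" is your character array (if you have more than one char you're looking for), here's the basic gist of the code that had the fastest single-threaded run time:
for (int x = 0; x < ss.Length; x++) { for (int y = 0; y < ch.Length; y++) { for (int a = 0; a < ss[x].Length; a++) { if (ss[x][a] == ch[y]) { /* it's found; do what you need to here */ } } } }
The benchmark source code is provided too, so you can run your own tests.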
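The parallel variant mentioned above is not shown in the answer. As a hedged sketch of what a `Parallel.For` count over many strings could look like, the code below keeps a per-task running total and merges it once per task; `ParallelCounter` and `CountChar` are invented names, and this is not the benchmarked code from the linked page.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCounter
{
    // Counts how many characters across all strings equal target.
    public static int CountChar(string[] strings, char target)
    {
        int total = 0;
        Parallel.For(0, strings.Length,
            () => 0,                                      // per-task running count
            (i, state, local) =>
            {
                foreach (char c in strings[i])
                    if (c == target) local++;
                return local;
            },
            local => Interlocked.Add(ref total, local));  // merge once per task
        return total;
    }
}
```

The per-task local avoids contention on the shared counter inside the inner loop. For a handful of short strings the scheduling overhead will likely outweigh any gain, which matches the "thousands of checks" condition in the answer.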
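Thought I would throw my extension method into the ring (see the comments for more info). I have not done any formal benchmarking, but I think it has to be very fast for most scenarios.
EDIT: OK, so this SO question got me wondering how the performance of our current implementation would stack up against some of the solutions presented here. I decided to do a little benchmarking and found that our solution was very much in line with the performance of the solution provided by Richard Watson, up until you search large strings (100 KB+) with large substrings (32 KB+) and many embedded repetitions (10K+). At that point our solution was around 2x to 4x slower. Given this, and the fact that we really like the solution presented by Richard Watson, we have refactored our solution accordingly. I just wanted to make this available to anyone who might benefit from it.
Our original solution: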
/// <summary>
/// Counts the number of occurrences of the specified substring within
/// the current string.
/// </summary>
/// <param name="s">The current string.</param>
/// <param name="substring">The substring we are searching for.</param>
/// <param name="aggressiveSearch">Indicates whether or not the algorithm
/// should be aggressive in its search behavior (see Remarks). Default
/// behavior is non-aggressive.</param>
/// <remarks>This algorithm has two search modes - aggressive and
/// non-aggressive. When in aggressive search mode (aggressiveSearch =
/// true), the algorithm will try to match at every possible starting
/// character index within the string. When false, all subsequent
/// character indexes within a substring match will not be evaluated.
/// For example, if the string was 'abbbc' and we were searching for
/// the substring 'bb', then aggressive search would find 2 matches
/// with starting indexes of 1 and 2. Non aggressive search would find
/// just 1 match with starting index at 1. After the match was made,
/// the non aggressive search would attempt to make its next match
/// starting at index 3 instead of 2.</remarks>
/// <returns>The count of occurrences of the substring within the string.</returns>
public static int CountOccurrences(this string s, string substring, bool aggressiveSearch = false)
{
    // if s or substring is null or empty, substring cannot be found in s
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
        return 0;

    // if the length of substring is greater than the length of s,
    // substring cannot be found in s
    if (substring.Length > s.Length)
        return 0;

    var sChars = s.ToCharArray();
    var substringChars = substring.ToCharArray();
    var count = 0;
    var sCharsIndex = 0;

    // substring cannot start in s beyond following index
    var lastStartIndex = sChars.Length - substringChars.Length;

    while (sCharsIndex <= lastStartIndex)
    {
        if (sChars[sCharsIndex] == substringChars[0])
        {
            // potential match checking
            var match = true;
            var offset = 1;
            while (offset < substringChars.Length)
            {
                if (sChars[sCharsIndex + offset] != substringChars[offset])
                {
                    match = false;
                    break;
                }
                offset++;
            }

            if (match)
            {
                count++;
                // if aggressive, just advance to next char in s, otherwise,
                // skip past the match just found in s
                sCharsIndex += aggressiveSearch ? 1 : substringChars.Length;
            }
            else
            {
                // no match found, just move to next char in s
                sCharsIndex++;
            }
        }
        else
        {
            // no match at current index, move along
            sCharsIndex++;
        }
    }

    return count;
}
Here is our revised solution:
/// <summary>
/// Counts the number of occurrences of the specified substring within
/// the current string.
/// </summary>
/// <param name="s">The current string.</param>
/// <param name="substring">The substring we are searching for.</param>
/// <param name="aggressiveSearch">Indicates whether or not the algorithm
/// should be aggressive in its search behavior (see Remarks). Default
/// behavior is non-aggressive.</param>
/// <remarks>This algorithm has two search modes - aggressive and
/// non-aggressive. When in aggressive search mode (aggressiveSearch =
/// true), the algorithm will try to match at every possible starting
/// character index within the string. When false, all subsequent
/// character indexes within a substring match will not be evaluated.
/// For example, if the string was 'abbbc' and we were searching for
/// the substring 'bb', then aggressive search would find 2 matches
/// with starting indexes of 1 and 2. Non aggressive search would find
/// just 1 match with starting index at 1. After the match was made,
/// the non aggressive search would attempt to make its next match
/// starting at index 3 instead of 2.</remarks>
/// <returns>The count of occurrences of the substring within the string.</returns>
public static int CountOccurrences(this string s, string substring, bool aggressiveSearch = false)
{
    // if s or substring is null or empty, substring cannot be found in s
    if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
        return 0;

    // if the length of substring is greater than the length of s,
    // substring cannot be found in s
    if (substring.Length > s.Length)
        return 0;

    int count = 0, n = 0;
    while ((n = s.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        if (aggressiveSearch)
            n++;
        else
            n += substring.Length;
        count++;
    }

    return count;
}
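To make the two search modes concrete, here is a compact, standalone sketch of the same IndexOf loop applied to the 'abbbc' / 'bb' example from the remarks. It assumes an ordinal comparison rather than the InvariantCulture comparison used above, and `OverlapModes.Count` is a hypothetical name.

```csharp
using System;

public static class OverlapModes
{
    // overlapping = true corresponds to the "aggressive" mode described above.
    public static int Count(string s, string substring, bool overlapping)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
            return 0;

        int count = 0, n = 0;
        while ((n = s.IndexOf(substring, n, StringComparison.Ordinal)) != -1)
        {
            // Step one char to allow overlaps, or skip the whole match.
            n += overlapping ? 1 : substring.Length;
            count++;
        }
        return count;
    }
}
```

`OverlapModes.Count("abbbc", "bb", true)` finds the matches at indexes 1 and 2, while passing `false` finds only the one at index 1 and resumes at index 3.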
My initial take gave me something like:
public static int CountOccurrences(string original, string substring) { if (string.IsNullOrEmpty(substring)) return 0; if (substring.Length == 1) return CountOccurrences(original, substring[0]); if (string.IsNullOrEmpty(original) || substring.Length > original.Length) return 0; int substringCount = 0; for (int charIndex = 0; charIndex < original.Length; charIndex++) { for (int subCharIndex = 0, secondaryCharIndex = charIndex; subCharIndex < substring.Length && secondaryCharIndex < original.Length; subCharIndex++, secondaryCharIndex++) { if (substring[subCharIndex] != original[secondaryCharIndex]) goto continueOuter; } if (charIndex + substring.Length > original.Length) break; charIndex += substring.Length - 1; substringCount++; continueOuter: ; } return substringCount; } public static int CountOccurrences(string original, char @char) { if (string.IsNullOrEmpty(original)) return 0; int substringCount = 0; for (int charIndex = 0; charIndex < original.Length; charIndex++) if (@char == original[charIndex]) substringCount++; return substringCount; }
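The needle-in-a-haystack approach using replace and division takes 21+ seconds, whereas this takes about 15.2.
Edit: after adding a bit that adds substring.Length - 1 to charIndex (like it should), it's at 11.6 seconds.
Edit 2: I used a string that had 26 two-character strings; here are the times updated for the same sample texts:
Needle in a haystack (OP's version): 7.8 seconds
Suggested mechanism: 4.6 seconds.
Edit 3: After adding the single-character corner case, it went down to 1.2 seconds.
Edit 4: For context: 50 million iterations were used.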
string str = "aaabbbbjjja"; int count = 0; int size = str.Length; string[] strarray = new string[size]; for (int i = 0; i < str.Length; i++) { strarray[i] = str.Substring(i, 1); } Array.Sort(strarray); str = ""; for (int i = 0; i < strarray.Length - 1; i++) { if (strarray[i] == strarray[i + 1]) { count++; } else { count++; str = str + strarray[i] + count; count = 0; } } count++; str = str + strarray[strarray.Length - 1] + count;
This counts the occurrences of each character. For this example the output will be "a4b4j3".
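The same per-character summary can also be sketched with LINQ grouping instead of sorting an array of one-character strings. This is an alternative technique, not the answer's method, and `CharFrequency.Summarize` is a made-up name.

```csharp
using System.Linq;

public static class CharFrequency
{
    // Builds a summary such as "a4b4j3": each distinct character in
    // ascending order, followed by the number of times it occurs.
    public static string Summarize(string input)
    {
        if (string.IsNullOrEmpty(input))
            return "";

        return string.Concat(
            input.GroupBy(c => c)
                 .OrderBy(g => g.Key)
                 .Select(g => g.Key.ToString() + g.Count()));
    }
}
```

For the sample string "aaabbbbjjja", `CharFrequency.Summarize` produces "a4b4j3", the same output as the loop above.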
string search = "/string"; var occurrences = Regex.Matches(search, @"\/").Count;
This counts every exact (case-sensitive) match of the pattern, here "/", and stores the number in the variable "occurrences".