C# Sanitize File Name

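I've recently been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names from the ID3 tags (thanks, TagLib-Sharp!), and I noticed that I was getting a System.NotSupportedException: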

"The given path's format is not supported."

This was thrown by either File.Copy() or Directory.CreateDirectory().

It didn't take long to realize that my file names needed to be sanitized. So I did the obvious thing:

```csharp
public static string SanitizePath_(string path, char replaceChar)
{
    string dir = Path.GetDirectoryName(path);
    foreach (char c in Path.GetInvalidPathChars())
        dir = dir.Replace(c, replaceChar);

    string name = Path.GetFileName(path);
    foreach (char c in Path.GetInvalidFileNameChars())
        name = name.Replace(c, replaceChar);

    // GetDirectoryName strips the trailing separator, so "dir + name" would
    // glue the two together; Path.Combine puts the separator back.
    return Path.Combine(dir, name);
}
```

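To my surprise, I continued to get exceptions. It turned out that ':' is not in the set returned by Path.GetInvalidPathChars(), because it is valid in a path root. I suppose that makes sense, but this has to be a pretty common problem. Does anyone have some short code that sanitizes a path? The most thorough version I've come up with is the following, but it feels like it is probably overkill.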

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

private static List<char> _BadChars;

// replaces invalid characters with replaceChar
public static string SanitizePath(string path, char replaceChar)
{
    // construct a list of characters that can't show up in filenames.
    // need to do this because ":" is not in InvalidPathChars
    if (_BadChars == null)
    {
        _BadChars = new List<char>(Path.GetInvalidFileNameChars());
        _BadChars.AddRange(Path.GetInvalidPathChars());
        // originally Utility.GetUnique, a helper not shown; Distinct() does the same
        _BadChars = _BadChars.Distinct().ToList();
    }

    // remove root
    string root = Path.GetPathRoot(path);
    path = path.Remove(0, root.Length);

    // split on the directory separator character. Need to do this
    // because the separator is not valid in a filename.
    List<string> parts = new List<string>(path.Split(new char[] { Path.DirectorySeparatorChar }));

    // check each part to make sure it is valid.
    for (int i = 0; i < parts.Count; i++)
    {
        string part = parts[i];
        foreach (char c in _BadChars)
        {
            part = part.Replace(c, replaceChar);
        }
        parts[i] = part;
    }

    // originally Utility.Join, a helper not shown; string.Join does the same
    return root + string.Join(Path.DirectorySeparatorChar.ToString(), parts);
}
```

Any improvements that make this function faster and less baroque would be much appreciated.
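A less baroque variant might make a single pass with a StringBuilder and a HashSet<char>, leaving the root and the directory separators alone and applying the stricter file-name set (which includes ':' on Windows) to everything else. This is a sketch that assumes .NET Core or .NET 5+, where Path.GetPathRoot does not validate characters (on .NET Framework it may throw for input containing characters such as '|' or '\0'); the class name PathSanitizer is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class PathSanitizer
{
    // Built once; a HashSet gives constant-time lookups.
    private static readonly HashSet<char> BadChars =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string SanitizePath(string path, char replaceChar)
    {
        // Keep the root (e.g. "C:\" on Windows) as-is; GetPathRoot returns
        // "" for relative paths and null for empty input.
        string root = Path.GetPathRoot(path) ?? string.Empty;

        var sb = new StringBuilder(path.Length);
        sb.Append(root);

        for (int i = root.Length; i < path.Length; i++)
        {
            char c = path[i];
            bool isSeparator = c == Path.DirectorySeparatorChar
                            || c == Path.AltDirectorySeparatorChar;
            // Separators are invalid in file names but must survive in a path.
            sb.Append(isSeparator || !BadChars.Contains(c) ? c : replaceChar);
        }

        return sb.ToString();
    }
}
```

The separator check is what makes it safe to use the file-name set on a whole path: '\' and '/' are in that set on Windows but have to be kept between segments.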

To clean up a file name, you could do this:

```csharp
private static string MakeValidFileName(string name)
{
    string invalidChars = System.Text.RegularExpressions.Regex.Escape(
        new string(System.IO.Path.GetInvalidFileNameChars()));
    string invalidRegStr = string.Format(@"([{0}]*\.+$)|([{0}]+)", invalidChars);

    return System.Text.RegularExpressions.Regex.Replace(name, invalidRegStr, "_");
}
```
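The pattern in this answer has two alternatives: the first matches a run of dots at the very end of the name (optionally preceded by invalid characters), the second matches any run of invalid characters. Each match becomes a single "_". To make the behaviour visible independent of the OS, this sketch hard-codes the printable Windows-invalid characters as an assumption; real code should keep using Path.GetInvalidFileNameChars(). The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexSanitizer
{
    // Assumption: printable subset of the Windows-invalid characters,
    // hard-coded so the result does not depend on the current OS.
    private const string InvalidChars = "\"<>|:*?\\/";

    public static string MakeValidFileName(string name)
    {
        string escaped = Regex.Escape(InvalidChars);
        // Alternative 1: optional invalid chars followed by trailing dots at the end.
        // Alternative 2: any run of one or more invalid chars.
        string pattern = string.Format(@"([{0}]*\.+$)|([{0}]+)", escaped);
        return Regex.Replace(name, pattern, "_");
    }
}
```

For example, "a<<b" becomes "a_b" and "song?." becomes "song_", while an inner dot as in "v1.2.mp3" is left alone.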

A shorter solution:

```csharp
var invalids = System.IO.Path.GetInvalidFileNameChars();
var newName = String.Join("_", origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries))
                    .TrimEnd('.');
```
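The effect of StringSplitOptions.RemoveEmptyEntries is easy to miss: a run of invalid characters collapses to one "_", and invalid characters at either end disappear instead of turning into "_". A consequence is that a name made up only of invalid characters and dots comes back empty, so this sketch adds a hypothetical fallback parameter for that case.

```csharp
using System;
using System.IO;

public static class SplitJoinSanitizer
{
    // fallback is a hypothetical addition: returned when nothing valid is left.
    public static string Clean(string origFileName, string fallback = "_")
    {
        char[] invalids = Path.GetInvalidFileNameChars();

        // RemoveEmptyEntries drops the empty pieces between adjacent invalid
        // characters, so a run of them collapses to a single "_", and invalid
        // characters at either end vanish instead of becoming "_".
        string newName = String.Join("_",
                origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries))
            .TrimEnd('.');

        return newName.Length > 0 ? newName : fallback;
    }
}
```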

Based on Andre's excellent answer, but taking into account Spud's comment on reserved words, I made this version:

```csharp
/// <summary>
/// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
/// </summary>
/// <remarks>
/// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
/// </remarks>
public static string CoerceValidFileName(string filename)
{
    var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
    var invalidReStr = string.Format(@"[{0}]+", invalidChars);

    var reservedWords = new[]
    {
        "CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
        "COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
        "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
    foreach (var reservedWord in reservedWords)
    {
        // Escape the word so the "$" in "CLOCK$" is matched literally
        // instead of acting as an end-of-string anchor.
        var reservedWordPattern = string.Format("^{0}\\.", Regex.Escape(reservedWord));
        sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
    }

    return sanitisedNamePart;
}
```

These are my unit tests:

```csharp
// SimpleInvalid and InvalidExtension assume Windows, where '\' is an invalid
// file-name character; on Linux/macOS only '/' and '\0' are.
[Test]
public void CoerceValidFileName_SimpleValid()
{
    var filename = @"thisIsValid.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual(filename, result);
}

[Test]
public void CoerceValidFileName_SimpleInvalid()
{
    var filename = @"thisIsNotValid\3\\_3.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("thisIsNotValid_3__3.txt", result);
}

[Test]
public void CoerceValidFileName_InvalidExtension()
{
    var filename = @"thisIsNotValid.t\xt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("thisIsNotValid.t_xt", result);
}

[Test]
public void CoerceValidFileName_KeywordInvalid()
{
    var filename = "aUx.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("_reservedWord_.txt", result);
}

[Test]
public void CoerceValidFileName_KeywordValid()
{
    var filename = "auxillary.txt";
    var result = PathHelper.CoerceValidFileName(filename);
    Assert.AreEqual("auxillary.txt", result);
}
```
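Note that the pattern ^WORD\. only matches a reserved word followed by a dot, so a bare "aux" or "CON" passes through unchanged. Windows treats a device name followed by any extension (for example NUL.tar.gz) the same as the bare name, so a check on the part before the first dot covers both cases. A sketch, assuming the classic device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9 (plus CLOCK$ from the list above); the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class ReservedNames
{
    // Assumption: classic Windows device names, compared case-insensitively.
    private static readonly HashSet<string> Reserved = BuildSet();

    private static HashSet<string> BuildSet()
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "CON", "PRN", "AUX", "NUL", "CLOCK$"
        };
        for (int i = 1; i <= 9; i++)
        {
            set.Add("COM" + i);
            set.Add("LPT" + i);
        }
        return set;
    }

    // True when the part of the name before the first '.' is a device name,
    // so "aux" and "aux.txt" are caught but "auxillary.txt" is not.
    public static bool IsReserved(string fileName)
    {
        int dot = fileName.IndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        return Reserved.Contains(stem);
    }
}
```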
This one-liner simply removes the invalid characters instead of replacing them:

```csharp
string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
```

I use the System.IO.Path.GetInvalidFileNameChars() method to check for invalid characters, and I've had no problems.

I use the following code:

```csharp
foreach (char invalidchar in System.IO.Path.GetInvalidFileNameChars())
{
    filename = filename.Replace(invalidchar, '_');
}
```
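If you only need to check a name rather than repair it, String.IndexOfAny with the same array is enough. A minimal sketch (the class and method names are hypothetical); it does not detect reserved device names or trailing dots:

```csharp
using System.IO;

public static class FileNameCheck
{
    // True when the name is non-empty and contains none of the characters
    // the current OS reports as invalid in file names.
    public static bool HasNoInvalidChars(string fileName)
    {
        return !string.IsNullOrEmpty(fileName)
            && fileName.IndexOfAny(Path.GetInvalidFileNameChars()) < 0;
    }
}
```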

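I think the problem is that you first call Path.GetDirectoryName on the bad string. If it contains characters that are not valid in file names, .NET can't tell which parts of the string are directories, and it throws. You have to do string comparisons instead.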

Assuming it is only the file name that is bad, not the entire path, try this:

```csharp
public static string SanitizePath(string path, char replaceChar)
{
    int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
    var sb = new System.Text.StringBuilder();
    sb.Append(path.Substring(0, filenamePos));

    for (int i = filenamePos; i < path.Length; i++)
    {
        char filenameChar = path[i];
        foreach (char c in Path.GetInvalidFileNameChars())
        {
            if (filenameChar.Equals(c))
            {
                filenameChar = replaceChar;
                break;
            }
        }
        sb.Append(filenameChar);
    }

    return sb.ToString();
}
```

I've had success with this in the past.

Nice and short, and static :-)

```csharp
public static string returnSafeString(string s)
{
    foreach (char character in Path.GetInvalidFileNameChars())
    {
        s = s.Replace(character.ToString(), string.Empty);
    }

    foreach (char character in Path.GetInvalidPathChars())
    {
        s = s.Replace(character.ToString(), string.Empty);
    }

    return (s);
}
```

I wanted to preserve the characters in some way, rather than simply replacing them with an underscore.

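One idea was to replace each character with a similar-looking character that is (in my case) unlikely to be used as a regular character. So I took the list of invalid characters and found a look-alike for each.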

Below are the functions for encoding and decoding with the look-alikes.

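This code does not cover every character returned by System.IO.Path.GetInvalidFileNameChars(), so it is up to you to extend the mapping or to replace any remaining characters with an underscore.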

```csharp
private static Dictionary<string, string> EncodeMapping()
{
    //-- Following characters are invalid for windows file and folder names.
    //-- \/:*?"<>|
    Dictionary<string, string> dic = new Dictionary<string, string>();
    dic.Add(@"\", "Ì");   // U+00CC
    dic.Add("/", "Í");    // U+00CD
    dic.Add(":", "¦");    // U+00A6
    dic.Add("*", "¤");    // U+00A4
    dic.Add("?", "¿");    // U+00BF
    dic.Add(@"""", "ˮ");  // U+02EE
    dic.Add("<", "«");    // U+00AB
    dic.Add(">", "»");    // U+00BB
    dic.Add("|", "│");    // U+2502
    return dic;
}

public static string Escape(string name)
{
    foreach (KeyValuePair<string, string> replace in EncodeMapping())
    {
        name = name.Replace(replace.Key, replace.Value);
    }

    //-- handle dot at the end
    // (CropRight is not part of .NET; Substring does the same job)
    if (name.EndsWith("."))
        name = name.Substring(0, name.Length - 1) + "°";

    return name;
}

public static string UnEscape(string name)
{
    foreach (KeyValuePair<string, string> replace in EncodeMapping())
    {
        name = name.Replace(replace.Value, replace.Key);
    }

    //-- handle dot at the end
    if (name.EndsWith("°"))
        name = name.Substring(0, name.Length - 1) + ".";

    return name;
}
```

You can choose your own look-alikes. I used the Character Map app in Windows (%windir%\system32\charmap.exe) to pick mine.

I will update this code as I discover further adjustments.
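One caveat with this scheme: UnEscape maps every look-alike back, so a name that already contains one of them (for example a genuine '«') will not survive an Escape/UnEscape round trip; "«Live».mp3" would come back as "<Live>.mp3". A hypothetical guard that could be checked before relying on the round trip:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookAlikeGuard
{
    // The substitute characters from the mapping above, plus the '°'
    // used for a trailing dot.
    private static readonly HashSet<char> LookAlikes = new HashSet<char>
    {
        '\u00CC', '\u00CD', '\u00A6', '\u00A4', '\u00BF',
        '\u02EE', '\u00AB', '\u00BB', '\u2502', '\u00B0'
    };

    // Conservative: rejects any occurrence of a look-alike, although UnEscape
    // only rewrites '°' when it is the last character.
    public static bool IsRoundTripSafe(string name)
    {
        return !name.Any(c => LookAlikes.Contains(c));
    }
}
```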

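Your code would be cleaner if you appended the directory and file name together and sanitized the result, rather than sanitizing them separately. As for sanitizing away the ':', just look at the second character in the string: if it equals replaceChar, replace it with a colon. Since this app is for your own use, such a solution should be perfectly sufficient.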
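A sketch of the colon-restoring step described above, under an added assumption: to avoid touching a relative name such as "a_b.mp3", it only acts when the first character is a letter and the third is a directory separator. The helper name is hypothetical.

```csharp
public static class DriveColon
{
    // After the whole path has been sanitized, the drive colon in "C:\..."
    // has become replaceChar too. Put it back only when the path looks like
    // "<letter><replaceChar><separator>".
    public static string RestoreDriveColon(string sanitizedPath, char replaceChar)
    {
        bool looksLikeDrive = sanitizedPath.Length >= 3
            && char.IsLetter(sanitizedPath[0])
            && sanitizedPath[1] == replaceChar
            && (sanitizedPath[2] == '\\' || sanitizedPath[2] == '/');

        return looksLikeDrive
            ? sanitizedPath[0] + ":" + sanitizedPath.Substring(2)
            : sanitizedPath;
    }
}
```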

Here is a version that replaces invalid characters when a replacement is given and drops them otherwise:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public class Program
{
    public static void Main()
    {
        try
        {
            var badString = "ABC\\DEF/GHI<JKL>MNO:PQR\"STU\tVWX|YZA*BCD?EFG";
            Console.WriteLine(badString);
            Console.WriteLine(SanitizeFileName(badString, '.'));
            Console.WriteLine(SanitizeFileName(badString));
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
    }

    private static string SanitizeFileName(string fileName, char? replacement = null)
    {
        if (fileName == null)
        {
            return null;
        }

        if (fileName.Length == 0)
        {
            return "";
        }

        var sb = new StringBuilder();
        var badChars = Path.GetInvalidFileNameChars().ToList();

        foreach (var @char in fileName)
        {
            if (badChars.Contains(@char))
            {
                if (replacement.HasValue)
                {
                    sb.Append(replacement.Value);
                }
                continue;
            }
            sb.Append(@char);
        }

        return sb.ToString();
    }
}
```

Here's an efficient, lazily initialized extension method based on Andre's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace LT
{
    public static class Utility
    {
        static string invalidRegStr;

        public static string MakeValidFileName(this string name)
        {
            if (invalidRegStr == null)
            {
                var invalidChars = System.Text.RegularExpressions.Regex.Escape(
                    new string(System.IO.Path.GetInvalidFileNameChars()));
                invalidRegStr = string.Format(@"([{0}]*\.+$)|([{0}]+)", invalidChars);
            }

            return System.Text.RegularExpressions.Regex.Replace(name, invalidRegStr, "_");
        }
    }
}
```
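The null check above is not thread-safe, although the worst case is only that the pattern string gets built twice. An alternative sketch keeps a static readonly Regex, which the runtime initializes once in a thread-safe way before first use, and builds it with RegexOptions.Compiled so the pattern is not re-parsed on each call; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileNameExtensions
{
    // Static readonly field: the CLR initializes it exactly once, thread-safely,
    // no later than the first access, so no manual null check is needed.
    private static readonly Regex InvalidRegex = new Regex(
        string.Format(@"([{0}]*\.+$)|([{0}]+)",
            Regex.Escape(new string(Path.GetInvalidFileNameChars()))),
        RegexOptions.Compiled);

    public static string ToValidFileName(this string name)
    {
        return InvalidRegex.Replace(name, "_");
    }
}
```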