Remove HTML tags from a String
I am trying to remove HTML tags from a string so that I can output clean text.
[This actually works]
    let str = string.stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil)
    print(str)
Edit: Swift 3
    let str = string.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression, range: nil)
    print(str)
Hmm, I tried your function and it works on a small example:
    let string = "<!DOCTYPE html> <html> <body> <h1>My First Heading</h1> <p>My first paragraph.</p> </body> </html>"
    let str = string.stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil)
    print(str)
    // output: "   My First Heading My first paragraph.  "
    // (the spaces that separated the removed tags are left in place)
Can you give an example where it fails?
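The output above still carries the whitespace that separated the removed tags. If the goal is clean text, one possible follow-up step is to collapse runs of whitespace and trim the ends. This is a minimal sketch in Swift 3+ syntax; the variable names `stripped` and `cleaned` are only illustrative, and the `\s+` cleanup is an assumption about what "clean" means here, not part of the original answer.

```swift
import Foundation

let html = "<!DOCTYPE html> <html> <body> <h1>My First Heading</h1> <p>My first paragraph.</p> </body> </html>"

// Step 1: remove anything that looks like a tag (same pattern as above).
let stripped = html.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)

// Step 2 (assumed cleanup): collapse whitespace runs to one space, then trim.
let cleaned = stripped
    .replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    .trimmingCharacters(in: .whitespacesAndNewlines)

print(cleaned) // My First Heading My first paragraph.
```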
Since HTML is not a regular language (HTML is a context-free language), you cannot parse it reliably with regular expressions. See: Using regular expressions to parse HTML: why not?
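A few inputs where the `<[^>]+>` pattern gives a surprising result may make this concrete. This is only an illustrative sketch: `stripTags` is a hypothetical wrapper around the same call used in the answers above, and the sample strings are made up for the demonstration.

```swift
import Foundation

// Hypothetical wrapper around the regex-based approach discussed above.
func stripTags(_ html: String) -> String {
    return html.replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
}

// A ">" inside a quoted attribute value ends the match too early,
// so part of the tag leaks into the output.
let attribute = stripTags("<a title=\"a > b\">link</a>")
print(attribute) // prints:  b">link  (with a leading space)

// Character references are not decoded.
let entity = stripTags("<p>Fish &amp; chips</p>")
print(entity) // Fish &amp; chips

// The contents of <script> are kept as if they were visible text.
let script = stripTags("<script>alert(1)</script>Hello")
print(script) // alert(1)Hello
```

For markup you control and know to be simple, the regex may still be good enough; the cases above are the kind of input where a real HTML parser behaves differently.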
I would consider using NSAttributedString instead.
    let htmlString = "LCD Soundsystem was the musical project of producer <a href='http://www.last.fm/music/James+Murphy' class='bbcode_artist'>James Murphy</a>, co-founder of <a href='http://www.last.fm/tag/dance-punk' class='bbcode_tag' rel='tag'>dance-punk</a> label <a href='http://www.last.fm/label/DFA' class='bbcode_label'>DFA</a> Records. Formed in 2001 in New York City, New York, United States, the music of LCD Soundsystem can also be described as a mix of <a href='http://www.last.fm/tag/alternative%20dance' class='bbcode_tag' rel='tag'>alternative dance</a> and <a href='http://www.last.fm/tag/post%20punk' class='bbcode_tag' rel='tag'>post punk</a>, along with elements of <a href='http://www.last.fm/tag/disco' class='bbcode_tag' rel='tag'>disco</a> and other styles. <br />"
    let htmlStringData = htmlString.dataUsingEncoding(NSUTF8StringEncoding)!
    let options: [String: AnyObject] = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding]
    let attributedHTMLString = try! NSAttributedString(data: htmlStringData, options: options, documentAttributes: nil)
    let string = attributedHTMLString.string
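The snippet above uses Swift 2 names. A rough equivalent in current Swift might look like the sketch below. The function name `plainText(fromHTML:)` is hypothetical, and the code assumes an Apple platform where UIKit or AppKit is available, since the HTML importer lives there; it is compiled out elsewhere. Apple's documentation also advises against calling the HTML importer from a background thread, so this is best run on the main thread.

```swift
import Foundation
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

#if canImport(UIKit) || canImport(AppKit)
/// Hypothetical helper: parses `html` as an HTML document and returns only its text.
/// Returns nil instead of crashing (the original used `!` and `try!`).
func plainText(fromHTML html: String) -> String? {
    guard let data = html.data(using: .utf8) else { return nil }
    let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
        .documentType: NSAttributedString.DocumentType.html,
        .characterEncoding: String.Encoding.utf8.rawValue
    ]
    let attributed = try? NSAttributedString(data: data, options: options, documentAttributes: nil)
    return attributed?.string
}

let sample = "producer <a href='http://www.last.fm/music/James+Murphy' class='bbcode_artist'>James Murphy</a>"
print(plainText(fromHTML: sample) ?? "could not parse")
#endif
```

Unlike the regex, this route also decodes character references such as `&amp;`, at the cost of running a full HTML import.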
I am using the following extension to remove specific HTML elements:
    extension String {
        // Removes the opening and closing markup of one tag (case-insensitive);
        // the text between the tags is kept.
        func deleteHTMLTag(tag: String) -> String {
            return self.stringByReplacingOccurrencesOfString("(?i)</?\(tag)\\b[^<]*>", withString: "", options: .RegularExpressionSearch, range: nil)
        }

        // Removes each of the given tags in turn.
        func deleteHTMLTags(tags: [String]) -> String {
            var mutableString = self
            for tag in tags {
                mutableString = mutableString.deleteHTMLTag(tag)
            }
            return mutableString
        }
    }
This makes it possible to remove only the <a> tags from a string, for example:
    let string = "my html <a href=\"\">link text</a>"
    let withoutHTMLString = string.deleteHTMLTag("a") // Will be "my html link text"
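For Swift 3 and later, the same idea could be written roughly as follows. This is a sketch, not the author's code: the names `deletingHTMLTag` and `deletingHTMLTags` are hypothetical, and it departs from the extension above in two labeled ways. The tag name is escaped before being placed in the pattern, and `[^>]*` replaces `[^<]*`, because the greedy `[^<]*>` form can run past the end of the tag and swallow text when the content itself contains a `>`.

```swift
import Foundation

extension String {
    /// Hypothetical Swift 3+ counterpart of `deleteHTMLTag`.
    func deletingHTMLTag(_ tag: String) -> String {
        // Assumption: the tag name should be matched literally, so escape it.
        let name = NSRegularExpression.escapedPattern(for: tag)
        // [^>]* stops at the first ">", so text after the tag is left alone.
        let pattern = "(?i)</?\(name)\\b[^>]*>"
        return replacingOccurrences(of: pattern, with: "", options: .regularExpression)
    }

    /// Removes each of the given tags in turn.
    func deletingHTMLTags(_ tags: [String]) -> String {
        return tags.reduce(self) { $0.deletingHTMLTag($1) }
    }
}

let link = "my html <a href=\"\">link text</a>"
print(link.deletingHTMLTag("a")) // my html link text

let mixed = "<b>bold</b> and <a href=\"#\">a > b</a>"
print(mixed.deletingHTMLTags(["a", "b"])) // bold and a > b
```

The word boundary `\b` keeps a request for `a` from also matching tags such as `<abbr>`. The caveats about regular expressions and HTML raised earlier in the thread still apply.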