A Swift regex class that abstracts away some of the complexities of using NSRegularExpression.
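Every time I use NSRegularExpression in Swift, I make the same mistakes over and over with ranges, and with converting between NSRange and Range<String.Index>.
On top of that, extracting content with capture groups is tedious and error-prone. I wanted to abstract away some of the things I keep getting wrong.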
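For comparison, here is a minimal sketch of the manual bookkeeping that plain NSRegularExpression requires for the same kind of capture-group extraction. It uses only Foundation; the sample text and variable names are made up for illustration, and it is not how DSFRegex is implemented.

```swift
import Foundation

// Sample input in the <number>\t"<string>" form (made up for this sketch).
let text = "12\t\"apples\""
let regex = try NSRegularExpression(pattern: #"(\d*)\t"([^"]+)""#)

// NSRegularExpression works in NSString (UTF-16) coordinates, so the search
// range has to be built as an NSRange covering the whole Swift string.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

var extracted: [(number: String, string: String)] = []
for result in regex.matches(in: text, options: [], range: fullRange) {
    // Every capture group comes back as an NSRange that must be converted
    // to Range<String.Index> before it can subscript a Swift String.
    guard
        let numberRange = Range(result.range(at: 1), in: text),
        let stringRange = Range(result.range(at: 2), in: text)
    else { continue }
    extracted.append((number: String(text[numberRange]), string: String(text[stringRange])))
}
print(extracted)
```

Each match needs two range conversions and an optional unwrap per capture group, which is the kind of repetition the class is meant to hide.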
let inputText: String = <some text to match against>
// Build the regex to match against (in this case, <number>\t<string>)
// This regex has two capture groups, one for the number and one for the string.
let regex = try DSFRegex(#"(\d*)\t\"([^\"]+)\""#)
// Retrieve ALL the matches for the supplied text
let searchResult = regex.matches(for: inputText)
// Loop over each of the matches found, and print them out
searchResult.forEach { match in
   let foundStr = inputText[match.range] // The text of the entire match
   let numberVal = inputText[match.captures[0]] // Retrieve the first capture group text.
   let stringVal = inputText[match.captures[1]] // Retrieve the second capture group text.
   Swift.print("Number is \(numberVal), String is \(stringVal)")
}
The basic structure of the 'matches' result is as follows:
Matches
 > matches: An array of regex matches
   > range: A match range. This range specifies the match range within the original text being searched
   > captures: An array of capture groups
     > A capture range. This range represents the range of a capture within the original text being searched
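The outline above can be read as a small data model. The sketch below mirrors it with two hypothetical types, `SketchMatches` and `SketchMatch`; these names and the sample data are assumptions for illustration and are not the library's declarations.

```swift
import Foundation

// Hypothetical types that mirror the outline above; not DSFRegex's own declarations.
struct SketchMatch {
    /// Range of the whole match within the searched text.
    let range: Range<String.Index>
    /// One range per capture group, also within the searched text.
    let captures: [Range<String.Index>]
}

struct SketchMatches {
    /// Every match found in the searched text.
    let matches: [SketchMatch]
}

// Build a result by hand for the text "id=42", as if a pattern with two
// capture groups had matched the whole string.
let text = "id=42"
let keyRange = text.range(of: "id")!     // safe: the literal is present in `text`
let valueRange = text.range(of: "42")!   // safe: the literal is present in `text`
let result = SketchMatches(matches: [
    SketchMatch(range: text.startIndex..<text.endIndex, captures: [keyRange, valueRange])
])

// All ranges index directly into the original Swift String.
let captureTexts = result.matches[0].captures.map { String(text[$0]) }
print(text[result.matches[0].range], captureTexts)
```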
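All ranges handed to the caller (and, conversely, all ranges passed in to the regex object) are expressed in terms of the Swift String that was passed in for matching.
This matters because NSRegularExpression works with NSString, and code point and character range information differs between NSString and String: NSString counts UTF-16 code units, whereas String counts Characters. The difference shows up especially with characters in the higher Unicode ranges (such as the emoji 🇦🇲 👨👩👦).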
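To make the mismatch concrete, here is a small Foundation-only sketch. The sample text is made up; the conversions use the standard `NSRange(_:in:)` and `Range(_:in:)` initializers rather than anything from DSFRegex.

```swift
import Foundation

let text = "🇦🇲 1234"

// The flag is a single Character, but it occupies four UTF-16 code units.
let characterCount = text.count
let utf16Count = (text as NSString).length
print(characterCount, utf16Count)

let regex = try NSRegularExpression(pattern: #"\d+"#)
let whole = NSRange(text.startIndex..<text.endIndex, in: text)

// Force unwraps are acceptable here only because the input is fixed and known to match.
let result = regex.firstMatch(in: text, options: [], range: whole)!
let swiftRange = Range(result.range, in: text)!

// The same match starts at a different offset depending on the unit of measure.
let utf16Offset = result.range.location
let characterOffset = text.distance(from: text.startIndex, to: swiftRange.lowerBound)
print(utf16Offset, characterOffset, text[swiftRange])
```

Using the UTF-16 offset as if it were a Character offset would point past the start of the digits, which is the class of bug that keeping everything in String ranges avoids.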
You create a regex match object by passing a regex pattern to the constructor. The constructor throws an error if the pattern is malformed or cannot be compiled.
// Match against dummy phone numbers XXXX-YYY-ZZZ
let phoneNumberRegex = try DSFRegex(#"(\d{4})-(\d{3})-(\d{3})"#)
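Because construction can throw, callers usually wrap it in `do`/`catch` or `try?`. The sketch below shows the pattern with Foundation's NSRegularExpression, whose initializer also throws on an invalid pattern; `compile` is a hypothetical helper, and the same handling would presumably apply to `try DSFRegex(...)`.

```swift
import Foundation

/// Hypothetical helper: returns nil instead of throwing when the pattern is invalid.
func compile(_ pattern: String) -> NSRegularExpression? {
    do {
        return try NSRegularExpression(pattern: pattern)
    } catch {
        // The thrown error describes why the pattern could not be compiled.
        print("Invalid pattern '\(pattern)': \(error.localizedDescription)")
        return nil
    }
}

let valid = compile(#"(\d{4})-(\d{3})-(\d{3})"#)
let invalid = compile(#"(\d{4}"#)   // unbalanced parenthesis, so compilation fails
print(valid != nil, invalid != nil)
```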
To check whether a string matches the regex, use the hasMatch method.
let hasAMatch = phoneNumberRegex.hasMatch("0499-999-999") // true
let noMatch = phoneNumberRegex.hasMatch("0499 999 999") // false
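A check of this kind can be sketched on top of NSRegularExpression by asking only for the first match. `containsMatch` is a hypothetical name, and this is an assumption about the general approach rather than a description of how hasMatch is implemented.

```swift
import Foundation

/// Hypothetical helper: true if `regex` matches anywhere inside `text`.
func containsMatch(_ regex: NSRegularExpression, in text: String) -> Bool {
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)
    // Only existence matters, so stop at the first match instead of collecting all of them.
    return regex.firstMatch(in: text, options: [], range: whole) != nil
}

let phone = try NSRegularExpression(pattern: #"(\d{4})-(\d{3})-(\d{3})"#)
let dashed = containsMatch(phone, in: "0499-999-999")
let spaced = containsMatch(phone, in: "0499 999 999")
print(dashed, spaced)
```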
To extract all of the match information, use the matches method.
let result = phoneNumberRegex.matches(for: "0499-999-999 0491-111-444 4324-222-123")
result.forEach { match in
   // The text of the entire match
   let matchText = result.text(for: match)
   Swift.print("Match `\(matchText)`")
   for capture in match.captures {
      // The text of each capture group within this match
      let captureText = result.text(for: capture)
      Swift.print(" - `\(captureText)`")
   }
}
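If you have a large input text, a complex regex that takes a while to process, or limited memory, you can enumerate the matches instead of processing everything up front.
The enumeration method lets you stop processing at any point (for example, if you are under a time limit, or are looking for one specific match within the text).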
/// Find all email addresses within a text
let inputString = "… some input string …"
let emailRegex = try DSFRegex("… some regex …")
emailRegex.enumerateMatches(in: inputString) { (match) -> Bool in
   // Extract match information
   let matchRange = match.range
   let matchText = inputString[match.range]
   Swift.print("Found '\(matchText)' at range \(matchRange)")

   // Continue processing
   return true
}
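For reference, the underlying Foundation API signals early exit through a stop flag rather than a Bool return value. This sketch uses plain NSRegularExpression with made-up input, and is shown only to illustrate early termination, not how DSFRegex wraps it.

```swift
import Foundation

let text = "a1 b22 c333 d4444"
let regex = try NSRegularExpression(pattern: #"\d+"#)
let whole = NSRange(text.startIndex..<text.endIndex, in: text)

var found: [String] = []
regex.enumerateMatches(in: text, options: [], range: whole) { result, _, stop in
    // `result` is optional because the block can also be called for progress reports.
    guard let result = result, let range = Range(result.range, in: text) else { return }
    found.append(String(text[range]))

    // Setting the stop flag ends the enumeration; later matches are never evaluated.
    if found.count == 2 { stop.pointee = true }
}
print(found)
```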
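A string search cursor is useful when you search a string sporadically, for example in response to the user clicking a "Next" button. The cursor keeps track of the current match and is used when finding the next match in the string.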
var searchCursor: DSFRegex.Cursor?
var content: String
@IBAction func startSearch(_ sender: Any) {
   // The DSFRegex constructor throws, so bail out if the pattern fails to compile
   guard let regex = try? DSFRegex(... some pattern ...) else { return }

   // Find the first match in the string
   self.searchCursor = self.content.firstMatch(for: regex)
   self.displayForCurrentSearch()
}

@IBAction func nextSearchResult(_ sender: Any) {
   if let previous = self.searchCursor {
      // Find the next match in the string, continuing from the previous match
      self.searchCursor = self.content.nextMatch(for: previous)
   }
   self.displayForCurrentSearch()
}

internal func displayForCurrentSearch() {
   // Update the UI reflecting the search result found in self.searchCursor
   ...
}
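One way to picture an incremental cursor is as a value that remembers the regex and the range of the last match, so the next search can start where the previous one ended. The sketch below builds that on NSRegularExpression; `SketchCursor`, `findMatch`, and `nextMatch(after:in:)` are hypothetical names, and this is an assumption about the general technique, not DSFRegex.Cursor's actual implementation.

```swift
import Foundation

/// Hypothetical cursor: the regex in use plus the range of the most recent match.
struct SketchCursor {
    let regex: NSRegularExpression
    let range: Range<String.Index>
}

/// Finds the first match that starts at or after `start`, or nil if there is none.
func findMatch(_ regex: NSRegularExpression, in text: String, from start: String.Index) -> SketchCursor? {
    let remaining = NSRange(start..<text.endIndex, in: text)
    guard let result = regex.firstMatch(in: text, options: [], range: remaining),
          let range = Range(result.range, in: text) else { return nil }
    return SketchCursor(regex: regex, range: range)
}

/// Continues the search from the end of the previous match.
/// Note: a pattern that can match the empty string would need extra handling to keep advancing.
func nextMatch(after cursor: SketchCursor, in text: String) -> SketchCursor? {
    return findMatch(cursor.regex, in: text, from: cursor.range.upperBound)
}

let content = "cat hat bat"
let regex = try NSRegularExpression(pattern: #"\wat"#)

var words: [String] = []
var cursor = findMatch(regex, in: content, from: content.startIndex)
while let current = cursor {
    words.append(String(content[current.range]))
    cursor = nextMatch(after: current, in: content)
}
print(words)
```

Because the cursor is a plain value, it can be stored in a property between user actions, as in the view controller example above.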
Returns a new string in which every match of the regular expression is replaced by the template string.
// Redact email addresses within the text
let emailRegex = try DSFRegex("… some regex …")
let redacted = emailRegex.stringByReplacingMatches(
   in: inputString,
   withTemplate: NSRegularExpression.escapedTemplate(for: "<REDACTED-EMAIL-ADDRESS>")
)
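The call to `escapedTemplate(for:)` matters whenever the replacement text might contain template metacharacters such as `$` or `\`. The following sketch uses Foundation's NSRegularExpression directly, with a made-up input and pattern, to contrast an expanded template with an escaped one.

```swift
import Foundation

let text = "mail bob@example.com now"
let regex = try NSRegularExpression(pattern: #"(\w+)@(\w+)\.com"#)
let whole = NSRange(text.startIndex..<text.endIndex, in: text)

// In a template, $1 and $2 are replaced by the text of the capture groups.
let expanded = regex.stringByReplacingMatches(
    in: text, options: [], range: whole, withTemplate: "$1 at $2")

// Escaping the template first makes "$1" come through as literal text.
let literal = regex.stringByReplacingMatches(
    in: text, options: [], range: whole,
    withTemplate: NSRegularExpression.escapedTemplate(for: "<$1>"))

print(expanded)
print(literal)
```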
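DSFRegex: The primary class used to perform regex matching.
Matches: An object containing all of the results of matching a regex against a text. It also provides a number of methods to help extract text from match and/or capture objects.
Match: A single match object. Stores the range of the match within the original string. If capture groups are defined in the regex, it also contains an array of capture objects.
Capture: A capture represents a single range matched by a capture group within a regex result. Each match may contain zero or more captures, depending on the capture groups defined in the regex.
DSFRegex.Cursor: An incremental cursor object used when searching via the String extension.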
pod 'DSFRegex', :git => 'https://github.com/dagronf/DSFRegex/'
Add https://github.com/dagronf/DSFRegex to your project.
Copy the files from Sources/DSFRegex into your project.
For more examples and usage, see the suite of tests in the Tests folder.
let phoneNumberRegex = try DSFRegex(#"(\d{4})-(\d{3})-(\d{3})"#)
let results = phoneNumberRegex.matches(for: "4499-999-999 3491-111-444 4324-222-123")
// results.numberOfMatches == 3
// results.text(match: 0) == "4499-999-999"
// results.text(match: 1) == "3491-111-444"
// results.text(match: 2) == "4324-222-123"
// Just retrieve the text for each of the matches
let textMatches = results.textMatching() // == ["4499-999-999", "3491-111-444", "4324-222-123"]
If you are only interested in the first match, use:
let first = phoneNumberRegex.firstMatch(in: "4499-999-999 3491-111-444 4324-222-123")
let allMatches = phoneNumberRegex.matches(for: "0499-999-999 0491-111-444 4324-222-123")
for match in allMatches.matches.enumerated() {
   // match.offset is the index of the match, match.element is the match itself
   let matchText = allMatches.text(for: match.element)
   Swift.print("Match (\(match.offset)) -> `\(matchText)`")
   for capture in match.element.captures.enumerated() {
      let captureText = allMatches.text(for: capture.element)
      Swift.print(" Capture (\(capture.offset)) -> `\(captureText)`")
   }
}
Output:
Match (0) -> `0499-999-999`
 Capture (0) -> `0499`
 Capture (1) -> `999`
 Capture (2) -> `999`
Match (1) -> `0491-111-444`
 Capture (0) -> `0491`
 Capture (1) -> `111`
 Capture (2) -> `444`
Match (2) -> `4324-222-123`
 Capture (0) -> `4324`
 Capture (1) -> `222`
 Capture (2) -> `123`
/// Find all email addresses within a text
let emailRegex = try DSFRegex("… some regex …")
let inputString = "This is a test.\n noodles@compuserve4.nginix.com and sillytest32@gmail.com, grubby@supernoodle.org lives here"
var count = 0
emailRegex.enumerateMatches(in: inputString) { (match) -> Bool in
   count += 1

   // Extract match information
   let matchRange = match.range
   let nsRange = NSRange(matchRange, in: inputString)
   let matchText = inputString[match.range]
   Swift.print("\(count) - Found '\(matchText)' at range \(nsRange)")

   // Stop processing once two matches have been found
   return count < 2
}
Output:
1 - Found 'noodles@compuserve4.nginix.com' at range {17, 30}
2 - Found 'sillytest32@gmail.com' at range {52, 21}
MIT License
Copyright (c) 2024 Darren Ford
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.