Fuzi (斧子)

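A fast and lightweight XML/HTML parser in Swift that makes your life easier. [Documentation]

Fuzi is based on a Swift port of Mattt Thompson's Ono(斧). It uses most of Ono's low-level implementation, with a moderate redesign of classes and interfaces to follow standard Swift conventions, along with several bug fixes.

Fuzi(斧子) means "axe", in homage to Ono(斧), which in turn is inspired by Nokogiri (鋸), which means "saw".

A Quick Look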

let xml = "..."
// or
// let xmlData = <some NSData or Data>
do {
  let document = try XMLDocument(string: xml)
  // or
  // let document = try XMLDocument(data: xmlData)
  
  if let root = document.root {
    // Accessing all child nodes of root element
    for element in root.children {
      print("\(element.tag): \(element.attributes)")
    }
    
    // Getting child element by tag & accessing attributes
    if let length = root.firstChild(tag:"Length", inNamespace: "dc") {
      print(length["unit"])     // `unit` attribute
      print(length.attributes)  // all attributes
    }
  }
  
  // XPath & CSS queries
  for element in document.xpath("//element") {
    print("\(element.tag): \(element.attributes)")
  }
  
  if let firstLink = document.firstChild(css: "a, link") {
    print(firstLink["href"])
  }
} catch let error {
  print(error)
}

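Features

Inherited from Ono

Extremely performant document parsing and traversal, powered by libxml2
Support for both XPath and CSS queries
Automatic conversion of date and number values
Correct, common-sense handling of XML namespaces for elements and attributes
Ability to load HTML and XML documents from String, NSData, or [CChar]
Comprehensive test suite
Full documentation

Improvements in Fuzi

Simple, modern API following standard Swift conventions, with no more return types like AnyObject! that force unnecessary type casts
Customizable date and number formatters
Some bug fixes
More convenience methods for HTML documents
Access to XML nodes of all types (including text, comment, etc.)
Support for more CSS selectors (yet to come)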
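
As a rough illustration of the date and number conversion listed above, the sketch below reads an element's value through numberValue and dateValue after swapping in a custom DateFormatter on the document. The sample XML, its tag names, and the "yyyy-MM-dd" format are invented for this example. The property names (numberValue, dateValue, dateFormatter) are taken from Fuzi's API documentation and should be checked against the version you install.

```swift
import Foundation
import Fuzi

// Hypothetical sample document, invented for this sketch.
let xml = "<album><tracks>42</tracks><released>2016-01-01</released></album>"

var trackCount: Int? = nil
var releaseDate: Date? = nil

do {
  // Module-qualified because Foundation on macOS also declares an XMLDocument class.
  let document = try Fuzi.XMLDocument(string: xml)

  // Assumption: the document's formatter is consulted lazily, so it is
  // replaced here before any dateValue is read.
  let formatter = DateFormatter()
  formatter.locale = Locale(identifier: "en_US_POSIX")
  formatter.timeZone = TimeZone(identifier: "UTC")
  formatter.dateFormat = "yyyy-MM-dd"
  document.dateFormatter = formatter

  if let root = document.root {
    trackCount = root.firstChild(tag: "tracks")?.numberValue?.intValue
    releaseDate = root.firstChild(tag: "released")?.dateValue
  }
} catch {
  print(error)
}

print(trackCount as Any, releaseDate as Any)
```

Both values stay nil if the element is missing or its text cannot be parsed, so callers can fall back to stringValue in that case.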

Requirements

iOS 8.0+ / Mac OS X 10.9+
Xcode 8.0+

For Swift 2.3, use version 0.4.0.

Installation

There are 4 ways to install Fuzi into your project.

Using CocoaPods

You can install Fuzi with CocoaPods by adding it to your Podfile:

platform :ios, '8.0'
use_frameworks!

target 'MyApp' do
	pod 'Fuzi', '~> 1.0.0'
end

Then run the following command:

$ pod install

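Using Swift Package Manager

The Swift Package Manager is now built into Xcode 11 (currently in beta). You can add Fuzi as a dependency by choosing File > Swift Packages > Add Package Dependency... or by clicking + in the Swift Packages tab of the project file. Use https://github.com/cezheng/Fuzi as the repository and Xcode should resolve the current version automatically.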
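
If you manage dependencies with a Package.swift manifest instead of the Xcode UI, the declaration might look like the sketch below. The package and target name MyApp simply mirror the Podfile example, and the "1.0.0" version requirement is borrowed from the CocoaPods and Carthage snippets. Both are assumptions, so pick the release that actually matches your toolchain.

```swift
// swift-tools-version:5.1
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    dependencies: [
        // Repository URL from this README; the version requirement is an assumption.
        .package(url: "https://github.com/cezheng/Fuzi", from: "1.0.0"),
    ],
    targets: [
        .target(name: "MyApp", dependencies: ["Fuzi"]),
    ]
)
```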

Manually

Add all *.swift files in the Fuzi directory to your project.
In your Xcode project's Build Settings:
1. Find Search Paths and add $(SDKROOT)/usr/include/libxml2 to Header Search Paths.
2. Find Linking and add -lxml2 to Other Linker Flags.

Using Carthage

Create a Cartfile or Cartfile.private in the root directory of your project, and add the following line:

github "cezheng/Fuzi" ~> 1.0.0

Run the following command:

$ carthage update

Then do the following in Xcode:

Drag the Fuzi.framework built by Carthage into your target's General -> Embedded Binaries.
In Build Settings, find Search Paths and add $(SDKROOT)/usr/include/libxml2 to Header Search Paths.

Usage

XML

import Fuzi

let xml = "..."
do {
  // if encoding is omitted, it defaults to NSUTF8StringEncoding
  let document = try XMLDocument(string: xml, encoding: String.Encoding.utf8)
  if let root = document.root {
    print(root.tag)
    
    // define a prefix for a namespace
    document.definePrefix("atom", defaultNamespace: "http://www.w3.org/2005/Atom")
    
    // get first child element with given tag in namespace(optional)
    print(root.firstChild(tag: "title", inNamespace: "atom"))

    // iterate through all children
    for (index, element) in root.children.enumerated() {
      print("\(index) \(element.tag): \(element.attributes)")
    }
  }
  // you can also use CSS selectors against an XMLDocument when you feel it makes sense
} catch let error as XMLError {
  switch error {
  case .noError: print("wth this should not appear")
  case .parserFailure, .invalidData: print(error)
  case .libXMLError(let code, let message):
    print("libxml error code: \(code), message: \(message)")
  }
}
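
To show how a defined prefix can be used beyond firstChild, here is a small sketch that runs a namespaced XPath query. The Atom feed is an invented sample. definePrefix is spelled as in the example above, and the sketch assumes that the registered prefix is usable inside XPath expressions, which is worth verifying on the Fuzi version you use.

```swift
import Fuzi

// Invented sample feed whose elements live in the default Atom namespace.
let feed = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>
"""

var entryTitles: [String] = []

do {
  // Module-qualified to avoid a clash with Foundation's XMLDocument on macOS.
  let document = try Fuzi.XMLDocument(string: feed)

  // Bind the prefix "atom" to the document's default namespace.
  document.definePrefix("atom", defaultNamespace: "http://www.w3.org/2005/Atom")

  // Select only the titles nested inside <entry>, not the feed-level title.
  for title in document.xpath("//atom:entry/atom:title") {
    entryTitles.append(title.stringValue)
  }
} catch {
  print(error)
}

print(entryTitles)
```

Without the prefix, an XPath such as //entry/title would match nothing here, because unprefixed names in XPath refer to elements in no namespace.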

HTML

HTMLDocument is a subclass of XMLDocument.

import Fuzi

let html = "<html>...</html>"
do {
  // if encoding is omitted, it defaults to NSUTF8StringEncoding
  let doc = try HTMLDocument(string: html, encoding: String.Encoding.utf8)
  
  // CSS queries
  if let elementById = doc.firstChild(css: "#id") {
    print(elementById.stringValue)
  }
  for link in doc.css("a, link") {
      print(link.rawXML)
      print(link["href"])
  }
  
  // XPath queries
  if let firstAnchor = doc.firstChild(xpath: "//body/a") {
    print(firstAnchor["href"])
  }
  for script in doc.xpath("//head/script") {
    print(script["src"])
  }
  
  // Evaluate XPath functions
  if let result = doc.eval(xpath: "count(/*/a)") {
    print("anchor count : \(result.doubleValue)")
  }
  
  // Convenient HTML methods
  print(doc.title) // gets <title>'s innerHTML in <head>
  print(doc.head)  // gets <head> element
  print(doc.body)  // gets <body> element
  
} catch let error {
  print(error)
}

I don't care about error handling

import Fuzi

let xml = "..."

// Don't show me the errors, just don't crash
if let doc1 = try? XMLDocument(string: xml) {
  //...
}

let html = "<html>...</html>"

// I'm sure this won't crash
let doc2 = try! HTMLDocument(string: html)
//...

I want to access Text Nodes

Not only text nodes: you can specify the types of nodes you would like to access.

let document = ...
// Get all child nodes that are Element nodes, Text nodes, or Comment nodes
document.root?.childNodes(ofTypes: [.Element, .Text, .Comment])
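
As a slightly fuller sketch of the call above, the snippet below walks the returned nodes and branches on each node's type. The sample XML is invented, and the type, stringValue, and rawXML properties are assumed to be available on every returned node, as described in Fuzi's API documentation.

```swift
import Fuzi

// Invented sample: one text node, one comment, and one element under <p>.
let xml = "<p>Hello <!-- note --><b>world</b></p>"

var descriptions: [String] = []

// Module-qualified to avoid a clash with Foundation's XMLDocument on macOS.
if let document = try? Fuzi.XMLDocument(string: xml), let root = document.root {
  for node in root.childNodes(ofTypes: [.Element, .Text, .Comment]) {
    if node.type == .Text {
      descriptions.append("text: \(node.stringValue)")
    } else if node.type == .Comment {
      descriptions.append("comment: \(node.stringValue)")
    } else {
      descriptions.append("element: \(node.rawXML)")
    }
  }
}

print(descriptions)
```

The nodes come back in document order, so mixed content such as text interleaved with inline elements can be reconstructed from the result.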

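Migrating From Ono?

Looking at the example programs is the quickest way to see the differences. The following 2 examples do exactly the same thing.

Ono Example

Fuzi Example

Accessing child nodes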

Ono

[doc firstChildWithTag:tag inNamespace:namespace];
[doc firstChildWithXPath:xpath];
[doc firstChildWithCSS:css];
for (ONOXMLElement *element in parent.children) {
  //...
}
[doc childrenWithTag:tag inNamespace:namespace];

Fuzi

doc.firstChild(tag: tag, inNamespace: namespace)
doc.firstChild(xpath: xpath)
doc.firstChild(css: css)
for element in parent.children {
  //...
}
doc.children(tag: tag, inNamespace:namespace)

Iterating through query results

Ono

Conforms to NSFastEnumeration.

// simply iterating through the results
// mark `__unused` to unused params `idx` and `stop`
[doc enumerateElementsWithXPath:xpath usingBlock:^(ONOXMLElement *element, __unused NSUInteger idx, __unused BOOL *stop) {
  NSLog(@"%@", element);
}];

// stop the iteration at second element
[doc enumerateElementsWithXPath:xpath usingBlock:^(ONOXMLElement *element, NSUInteger idx, BOOL *stop) {
  *stop = (idx == 1);
}];

// getting element by index 
ONOXMLElement *nthElement = [(NSEnumerator*)[doc CSS:css] allObjects][n];

// total element count
NSUInteger count = [(NSEnumerator*)[doc XPath:xpath] allObjects].count;

Fuzi

Conforms to Swift's Sequence (SequenceType before Swift 3) and Indexable.

// simply iterating through the results
// no need to write the unused `idx` or `stop` params
for element in doc.xpath(xpath) {
  print(element)
}

// stop the iteration at second element
for (index, element) in doc.xpath(xpath).enumerated() {
  if index == 1 {
    break
  }
}

// getting element by index 
if let nthElement = doc.css(css)[n] {
  //...
}

// total element count
let count = doc.xpath(xpath).count
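
Because query results behave as a Swift sequence, the standard sequence methods should apply to them as well. The sketch below collects every href from the anchors of an invented HTML snippet with compactMap (Swift 4.1 or later), skipping anchors that have no such attribute. It assumes the attribute subscript returns an optional String, as in the examples above.

```swift
import Fuzi

// Invented sample page: the second anchor deliberately has no href.
let html = "<html><body><a href=\"/a\">A</a><a>no link</a><a href=\"/b\">B</a></body></html>"

var hrefs: [String] = []

if let doc = try? HTMLDocument(string: html) {
  // compactMap drops the nil results for anchors without an href attribute.
  hrefs = doc.css("a").compactMap { $0["href"] }
}

print(hrefs)
```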

Evaluating XPath functions

Ono

ONOXPathFunctionResult *result = [doc functionResultByEvaluatingXPath:xpath];
result.boolValue;    //BOOL
result.numericValue; //double
result.stringValue;  //NSString

Fuzi

if let result = doc.eval(xpath: xpath) {
  result.boolValue   //Bool
  result.doubleValue //Double
  result.stringValue //String
}

License

Fuzi is released under the MIT license. See LICENSE for details.