stefanspringer1/swiftxml
---
Getting started
Define the following dependencies in Package(...) in the file Package.swift, e.g. with a minimum version number, which allows updates up to (but not including) the next major version (replace each ... accordingly):
dependencies: [
...
.package(url: "https://github.com/stefanspringer1/SwiftXML.git", from: "..."),
.package(url: "https://github.com/stefanspringer1/LoopsOnOptionals.git", from: "..."),
...
]
The LoopsOnOptionals package is optional but can be convenient when using SwiftXML, and it is necessary for the examples used here where a loop over an optional sequence is used. Cf. the section “Related packages” below. You might not need or want to use the LoopsOnOptionals package for your own code.
Add the dependency "SwiftXML" and optionally the dependency "LoopsOnOptionals" to your target in Package.swift:
.target(
name: "...",
dependencies: [
...
"SwiftXML",
"LoopsOnOptionals",
...
],
...
),
Add the following import statements (or just the first of them) in any source file where you would like to use SwiftXML:
import SwiftXML
import LoopsOnOptionals
These import statements are omitted in the following code samples.
The example in this section does not use XML namespaces. For information on using XML namespaces, see the corresponding section below.
Suppose you have an XML file input.xml with the following content:
<book>
<table label="1">
<title>A table with numbers</title>
<tr>
<td>7</td>
<td>8</td>
</tr>
</table>
</book>
It is very easy to parse this XML file into an XDocument instance:
let textAllowedInElementWithName = ["title", "td"]
let document = try parseXML(
fromPath: "input.xml",
registeringAttributes: .selected(["label"]),
textAllowedInElementWithName: textAllowedInElementWithName
)
The textAllowedInElementWithName: argument helps to remove unnecessary whitespace as long as no other method, e.g. an upcoming validation feature, is used to remove it. You might as well omit this argument and leave the whitespace as it is in the XML source. The registeringAttributes: argument registers certain attributes so they can be accessed directly; the direct access to the “label” attributes is used in this example.
You can easily access and change elements in your document:
for table in document.elements("table") {
// ... do something with the table ...
}
document.elements("table") returns a lazy sequence over all <table> elements anywhere in your document. This iteration over the elements of a document by name does not have to search for those elements: elements in a document (and attributes, as far as they are registered) can be accessed directly by their names. For a specific name, the order of such an iteration is the order in which the items have been added to the document, so matching items that are added during the iteration will be part of the iteration.
To then iterate through the rows of a table:
for row in table.children("tr") {
// ... do something with the row ...
}
table.children("tr") also returns a lazy sequence. In both cases, you can change the XML tree of the document during the iteration without disturbing the iteration.
You can also define rules for the transformation of a document like the ones cited at the top and apply them as follows:
transformation.execute(inDocument: document)
In the code of this transformation you see an XElement being initialized at some point. The SwiftXML package uses names not starting with XML, but names starting with just X instead. Names starting with XML belong to the FoundationXML implementation which is part of the Swift toolchain but which is not used here. The parts of an XML tree (and also the XML document) are defined by classes so they can be easily passed around via function calls while keeping their identity. The subscript notation element["..."] of an element is used to get or set an attribute value.
Note that the order of the rules in a transformation is significant. In the given order, each rule is applied as long as a matching item is found, and the whole collection of rules is applied again and again until none of the rules finds an item.
After applying this transformation, the document can then be written to a file:
try document.write(
toPath: "output.xml",
pretty: true,
textAllowedInElementWithName: textAllowedInElementWithName
)
The pretty: true argument (which can also be set for the echo(...) and serialized(...) methods) adds linebreaks and indentation to make the serialized XML look pretty (and the textAllowedInElementWithName: argument makes sure that no mixed content gets unwanted whitespace added). This is convenient here in the examples, but in practice you might prefer to omit this argument or to use a production (see the corresponding documentation below).
The content of the file output.xml is then:
<book>
<table>
<tr>
<td>7</td>
<td>8</td>
</tr>
</table>
<paragraph role="caption">Table (1): A table with numbers</paragraph>
</book>
Properties of the library
The library reads XML from a source into an XML document instance, and provides methods to transform (or manipulate) the document, and others to write the document to a file.
The library should be efficient and applications that use it should be very intelligible.
### Limitations
- The encoding of the source must be UTF-8 (ASCII is considered as a subset of it). (So UTF-16 as required by the XML standard is not supported.) The parser checks for correct UTF-8 encoding and also checks (according to the data available to the currently used Swift implementation) if a found codepoint is a valid Unicode codepoint.
- Currently no Unicode character normalization is done, even if the declared XML version is 1.1.
- Validation of an XML tree against an XML schema is not available yet (you might use [Libxml2Validation](https://github.com/stefanspringer1/Libxml2Validation) instead).
- An XML tree (e.g. a document) must not be examined or changed concurrently.
### Manipulation of an XML document
Unlike some other libraries for XML, the manipulation of the document as built in memory is “in place”, i.e. no new XML document is built. The goal is to be able to apply many isolated manipulations to an XML document efficiently. But it is always possible to clone a document easily, with references to or from the old version.
The following features are important:
- All iterations over content in the document using the corresponding library functions are lazy by default, i.e. the iteration only looks at one item at a time and does not collect all items in advance.
- While lazily iterating over content in the document in this manner, the document tree can be changed without negatively affecting the iteration.
- Elements and attributes of specific names and processing instructions of specific targets can be efficiently found by a corresponding lazy iteration without having to traverse the entire tree (for attributes it is necessary to configure the names of the attributes that should be “registered”). Such an iteration proceeds in the order in which the items have been added to the document. Items that are added during the iteration will then also be found by the same iteration.
The following code takes any `<item>` with an integer value of `multiply` larger than 1 and additionally inserts an item with a `multiply` number one less, while removing the `multiply` value on the existing item (the library will be explained in more detail in subsequent sections):
```swift
let document = try parseXML(fromText: """
<a><item multiply="3"/></a>
""")
for item in document.elements("item") {
if let multiply = item["multiply"], let n = Int(multiply), n > 1 {
item.insertPrevious {
XElement("item", ["multiply": n > 2 ? String(n-1) : nil])
}
item["multiply"] = nil
}
}
document.echo()
```
The output is:
```text
<a><item/><item/><item/></a>
```
Note that in this example – just to show you that it works – each new item is being inserted _before_ the current node but is then still being processed.
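Why the inserted items are still visited can be illustrated with a small stand-alone model. The sketch below is not SwiftXML code: `Registry` and `inRegistrationOrder` are hypothetical names, and the only assumption is what the text above states, namely that items of one name are iterated lazily in the order in which they were registered.

```swift
// Toy model: a per-name registry that is iterated lazily in registration order.
final class Registry<Item> {
    private var items: [Item] = []

    func register(_ item: Item) { items.append(item) }

    /// Lazy sequence that reads the registry again at every step,
    /// so items registered during the iteration are still visited.
    var inRegistrationOrder: AnySequence<Item> {
        AnySequence { () -> AnyIterator<Item> in
            var index = 0
            return AnyIterator<Item> {
                guard index < self.items.count else { return nil }
                defer { index += 1 }
                return self.items[index]
            }
        }
    }
}

let registry = Registry<Int>()
registry.register(3)

var visited: [Int] = []
for multiply in registry.inRegistrationOrder {
    visited.append(multiply)
    // Register a follow-up item while the iteration is running.
    if multiply > 1 { registry.register(multiply - 1) }
}
// visited is now [3, 2, 1]
```

Because the iterator looks at the registry anew at every step, the values 2 and 1 that are registered inside the loop are visited as well, independently of where a corresponding node would sit in a tree.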
The elements returned by an iteration can even be removed without stopping the (lazy!) iteration:
```swift
let document = try parseXML(fromText: """
<a><item id="1" remove="true"/><item id="2"/><item id="3" remove="true"/><item id="4"/></a>
""")
document.traverse { content in
if let element = content as? XElement, element["remove"] == "true" {
element.remove()
}
}
document.echo()
```
The output is:
```text
<a><item id="2"/><item id="4"/></a>
```
Of course, since those iterations are regular sequences, all the usual Swift library functions like `map` and `filter` can be used. But in many cases, it might be better to use conditions on the content iterators (see the section on finding related content with filters) or chaining of content iterators (see the section on chained iterators).
The user of the library can also provide sets of rules to be applied (see the code at the beginning and a full example in the section about rules). In such a rule, the user defines what to do with an element or attribute with a certain name. A set of rules can then be applied to a document, i.e. the rules are applied in the order of their definition. This is repeated, guaranteeing that a rule is only applied once to the same object (if not fully removed from the document and added again, see the section below on document membership), until no more application takes place. So elements can be added during application of a rule and then later be processed by the same or another rule.
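The repetition described above can be pictured with a stand-alone toy model. This is only a sketch of the stated behaviour under simplifying assumptions (a flat list instead of a tree); `Item`, `Rule` and `apply` are hypothetical names and not part of SwiftXML.

```swift
// Toy model of repeated rule application; it is not SwiftXML's implementation.
final class Item {
    let name: String
    init(_ name: String) { self.name = name }
}

struct Rule {
    let name: String                            // name of the items the rule handles
    let action: (Item, inout [Item]) -> Void    // may add new items
}

func apply(_ rules: [Rule], to items: inout [Item]) {
    // Remember, per rule, which items it has already handled.
    var handled = Array(repeating: Set<ObjectIdentifier>(), count: rules.count)
    var somethingApplied = true
    while somethingApplied {                    // repeat the whole set of rules
        somethingApplied = false
        for (ruleIndex, rule) in rules.enumerated() {   // in order of definition
            var index = 0
            while index < items.count {         // also sees items added meanwhile
                let item = items[index]
                index += 1
                guard item.name == rule.name,
                      handled[ruleIndex].insert(ObjectIdentifier(item)).inserted
                else { continue }
                rule.action(item, &items)
                somethingApplied = true
            }
        }
    }
}

var items = [Item("a")]
let rules = [
    Rule(name: "b") { _, items in items.append(Item("c")) },
    Rule(name: "a") { _, items in items.append(Item("b")) },
]
apply(rules, to: &items)
let names = items.map { $0.name }   // ["a", "b", "c"]
```

In the first pass the rule for "b" finds nothing, but the rule for "a" adds a "b" item; the second pass then lets the rule for "b" handle it, and the third pass finds nothing new and ends the loop.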
### The use of unnamed arguments
All methods that iterate over elements or attributes with specific names receive the names or targets via an unnamed argument. Code for the processing of XML can contain quite complex chains and conditions, and writing e.g. `children(ofName: "a")` instead of `children("a")`, etc., makes complex code less readable. Since something like `children("a")` is easy to understand, this shorter form has been used.
The method `processingInstructions(ofTarget:)` is typically used outside of such complex chains or conditionals, so the trade-off here is to keep the argument name for this method, although this does break consistency a bit.
### Other properties
The library uses the [SwiftXMLParser](https://github.com/stefanspringer1/SwiftXMLParser) to parse XML which implements the according protocol from [SwiftXMLInterfaces](https://github.com/stefanspringer1/SwiftXMLInterfaces).
Depending on the configuration of the parse process, all parts of the XML source can be retained in the XML document, including all comments and parts of an internal subset, e.g. all entity or element definitions. (Element definitions and attribute list definitions are, besides their reported element names, only retained as their original textual representation; they are not parsed into any other representation.)
In the current implementation, the XML library does not implement any validation, i.e. validation against a DTD or other XML schema, telling us e.g. if an element of a certain name can be contained in an element of another certain name. The user has to use other libraries (e.g. [Libxml2Validation](https://github.com/stefanspringer1/Libxml2Validation)) for such validation before reading or after writing the document. Besides validating the structure of an XML document, validation is also important for knowing if the occurrence of a whitespace text is significant (i.e. should be kept) or not. (E.g., whitespace text between elements representing paragraphs of a text document is usually considered insignificant.) To compensate for that last issue, the user of the library can provide a function that decides if an instance of whitespace text between elements should be kept or not. Also, possible default values of attributes have to be set by the user if desired once the document tree is built.
This library gives full control over how to handle entities. Named entity references can persist inside the document even if they are not defined. During parsing, named entity references are classified as internal or external entity references, the external entity references being those which are referenced by external entity definitions in the internal subset inside the document type declaration of the document. Replacements of internal entity references by text can be done automatically according to the internal subset and/or controlled by the application.
Automated inclusion of the content of external parsed entities can be configured; the content might then be wrapped by elements carrying the corresponding information about the entities.
For any error during parsing an error is thrown and no document is then provided.
---
**NOTE**
The description of the library that follows might not include all types and methods. Please see the documentation produced by DocC or use autocompletion in a suitable integrated development environment (IDE).
---
Reading XML
The following functions take a source and return an XML document instance (XDocument). The source can either be provided as a URL, a path to a file, a text, or binary data.
Reading from a URL which references a local file:
public func parseXML(
fromURL url: URL,
namespaceAware: Bool = false,
silentEmptyRootPrefix: Bool = false,
registeringAttributes: AttributeRegisterMode = .none,
registeringAttributeValuesFor: AttributeRegisterMode = .none,
registeringAttributesForNamespaces: AttributeWithNamespaceURIRegisterMode = .none,
registeringAttributeValuesForForNamespaces: AttributeWithNamespaceURIRegisterMode = .none,
sourceInfo: String? = nil,
textAllowedInElementWithName: [String]? = nil,
internalEntityAutoResolve: Bool = false,
internalEntityResolver: InternalEntityResolver? = nil,
internalEntityResolverHasToResolve: Bool = true,
insertExternalParsedEntities: Bool = false,
externalParsedEntitySystemResolver: ((String) -> URL?)? = nil,
externalParsedEntityGetter: ((String) -> Data?)? = nil,
externalWrapperElement: String? = nil,
keepComments: Bool = false,
keepCDATASections: Bool = false,
eventHandlers: [XEventHandler]? = nil,
immediateTextHandlingNearEntities: ImmediateTextHandlingNearEntities = .atExternalEntities
) throws -> XDocument
And accordingly:
func parseXML(
fromPath: String,
...
) throws -> XDocument
func parseXML(
fromText: String,
...
) throws -> XDocument
func parseXML(
fromData: Data,
...
) throws -> XDocument
If you want to be indifferent about which kind of source to process, use XDocumentSource for the source definition and use:
func parseXML(
from: XDocumentSource,
...
) throws -> XDocument
The optional textAllowedInElementWithName argument can specify which elements allow text content, so that insignificant whitespace can easily be removed during parsing as long as no validation method is available; this list can also be used for pretty-printing XML. If no text is allowed in the context but the text is not whitespace, an error is thrown.
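The decision described in the last paragraph can be sketched as a small stand-alone function. This is a toy model of the stated behaviour, not the parser's code; `textToKeep` and `TextNotAllowed` are hypothetical names, and the model assumes that without a list all text is kept unchanged.

```swift
// Toy model of the described whitespace handling; not SwiftXML's implementation.
struct TextNotAllowed: Error {
    let element: String
    let text: String
}

/// Returns the text to keep, or nil if it is dropped as insignificant whitespace.
func textToKeep(
    _ text: String,
    inElement name: String,
    textAllowedIn allowed: [String]?
) throws -> String? {
    guard let allowed = allowed else { return text }        // no list: keep everything
    if allowed.contains(name) { return text }               // text is allowed here
    if text.allSatisfy({ $0.isWhitespace }) { return nil }  // whitespace only: drop it
    throw TextNotAllowed(element: name, text: text)         // real text where none is allowed
}

let allowed = ["title", "td"]
let kept = try textToKeep("7", inElement: "td", textAllowedIn: allowed)        // "7"
let dropped = try textToKeep("\n  ", inElement: "tr", textAllowedIn: allowed)  // nil
```

With the list from the “Getting started” example, the linebreaks and indentation between the rows and cells are dropped, while the cell content is kept.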
All internal entity references in attribute values have to be replaced by text during parsing. In order to achieve this (in case that internal entity references occur at all in attribute values in the source), an InternalEntityResolver can be provided. An InternalEntityResolver has to implement the following method:
func resolve(
entityWithName: String,
forAttributeWithName: String?,
atElementWithName: String?
) -> String?
This method is always called when a named entity reference is encountered (either in text or in an attribute value) which is classified as an internal entity. It either returns the textual replacement for the entity or, by returning nil, does not resolve the entity. By default, the resolver has to resolve all entities presented to it, otherwise an error is thrown. You can remove this enforcement by setting internalEntityResolverHasToResolve: false in the call of the parse function; then, when the resolver returns nil, the entity reference is not replaced by text but is kept without any further notice. In the case of a named entity in an attribute value, an error is always thrown when no replacement is given. The function arguments forAttributeWithName (name of the attribute) and atElementWithName (name of the element) have values if and only if the entity is encountered inside an attribute value.
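A resolver along these lines could look like the following sketch. To keep it compilable without the package, it declares a local stand-in protocol (`EntityResolving`, a hypothetical name) with the method signature quoted above; in real code you would conform to the library's InternalEntityResolver instead. `TableEntityResolver` and its lookup tables are likewise only illustrations.

```swift
// Stand-in protocol with the method signature quoted above, so that this
// sketch compiles without the package. It is an assumption for illustration;
// real code would conform to the library's InternalEntityResolver instead.
protocol EntityResolving {
    func resolve(
        entityWithName: String,
        forAttributeWithName: String?,
        atElementWithName: String?
    ) -> String?
}

/// Hypothetical resolver backed by two lookup tables.
struct TableEntityResolver: EntityResolving {
    let inText: [String: String]
    let inAttributes: [String: String]

    func resolve(
        entityWithName entityName: String,
        forAttributeWithName attributeName: String?,
        atElementWithName elementName: String?
    ) -> String? {
        if attributeName != nil {
            // Inside an attribute value: prefer the attribute table.
            return inAttributes[entityName] ?? inText[entityName]
        }
        // In text: returning nil leaves the entity unresolved.
        return inText[entityName]
    }
}

let resolver = TableEntityResolver(
    inText: ["copy": "©"],
    inAttributes: ["copy": "(c)"]
)
let textReplacement = resolver.resolve(
    entityWithName: "copy", forAttributeWithName: nil, atElementWithName: nil)
let attributeReplacement = resolver.resolve(
    entityWithName: "copy", forAttributeWithName: "label", atElementWithName: "table")
let unresolved = resolver.resolve(
    entityWithName: "unknown", forAttributeWithName: nil, atElementWithName: nil)
```

Here the attribute name is only used to tell the two contexts apart; a resolver could just as well choose a replacement depending on the concrete attribute or element name.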
If internalEntityAutoResolve is set to true, the parser first tries to replace the internal entities by using the declarations in the internal subset of the document before calling an InternalEntityResolver.
The content of external parsed entities is not inserted by default, but it is if you set insertExternalParsedEntities to true. You can provide a method in the argument externalParsedEntitySystemResolver to resolve the system identifier of the external parsed entity to a URL. You can also provide a method in the argument externalParsedEntityGetter to get the data for the system identifier (if externalParsedEntitySystemResolver is provided, then externalParsedEntitySystemResolver first has to return nil). Finally, the system identifier is just added as a path component to the source URL (if it exists) and the parser tries to load the entity from there.
When the content of an external parsed entity is inserted, you can declare an element name externalWrapperElement: the inserted content then gets wrapped into an element of that name with the information about the entity in the attributes name, systemID, and path (path being optional, as an external parsed entity might get resolved without an explicit path). (During later processing, you might want to change this representation, e.g. if the external parsed entity reference is the only content of an element, you might replace the wrapper by its content and set the corresponding information as attachments of the parent element, so validation of the document succeeds.)
One or more event handlers can be passed to a parseXML call; they implement XEventHandler from SwiftXMLInterfaces. This allows the user of the library to catch any event during parsing, like entering or leaving an element. E.g., the resolving of an internal entity reference could depend on the location inside the document (and not only on the name of the element or attribute), so this information can be collected by such an event handler.
keepComments (default: false) decides if comments should be preserved (as XComment); otherwise they are discarded without notice. keepCDATASections (default: false) decides if CDATA sections should be preserved (as XCDATASection); otherwise all CDATA sections get resolved as text.
Content of a document
An XML document (XDocument) can contain the following content:
- XElement: an element
- XText: a text
- XInternalEntity: an internal entity reference
- XExternalEntity: an external entity reference
- XCDATASection: a CDATA section
- XProcessingInstruction: a processing instruction
- XComment: a comment
- XLiteral: containing text that is meant to be serialized “as is”, i.e. no escaping e.g. of < and & is done; it could contain XML code that is to be serialized literally, hence its name
XLiteral is never the result of parsing XML, but might get added by an application. Subsequent XLiteral content is (just like XText, see the section on handling of text) always automatically combined.
Those content items are of type XContent, whereas the more general type XNode might be content or an XDocument.
The following is read from the internal subset:
- XInternalEntityDeclaration: an internal entity declaration
- XExternalEntityDeclaration: an external entity declaration
- XUnparsedEntityDeclaration: a declaration of an unparsed external entity
- XNotationDeclaration: a notation declaration
- XParameterEntityDeclaration: a parameter entity declaration
- XElementDeclaration: an element declaration
- XAttributeListDeclaration: an attribute list declaration
They can be accessed via property declarationsInInternalSubset.
A document gets the following additional properties from the XML source (some values might be nil):
- encoding: the encoding from the XML declaration
- publicID: the public identifier from the document type declaration
- sourcePath: the path to the XML source
- standalone: the standalone value from the XML declaration
- systemID: the system identifier from the document type declaration
- xmlVersion: the XML version from the XML declaration
When not set explicitly in the XML source, some of those values are set to a sensible value.
Displaying XML
When printing a content via print(...), only a top-level representation like the start tag is printed and never the whole tree. When you would like to print the whole tree or document, use:
func echo(pretty: Bool, indentation: String, terminator: String)
pretty defaults to false; if it is set to true, linebreaks and spaces are added for pretty print. indentation defaults to two spaces, terminator defaults to "\n", i.e. a linebreak is then printed after the output.
With more control:
func echo(usingProductionTemplate: XProductionTemplate, terminator: String)
Productions are explained in the next section.
When you want a serialization of a whole tree or document as text (String), use the following method:
func serialized(pretty: Bool) -> String
pretty again defaults to false and has the same effect.
With more control:
func serialized(usingProductionTemplate: XProductionTemplate) -> String
Do not use serialized to print a tree or document; use echo instead, because echo is more efficient in this case.
Writing XML
Any XML node (including an XML document) can be written, including the tree of nodes that is started by it, via the following methods.
func write(toURL: URL, usingProductionTemplate: XProductionTemplate) throws
func write(toPath: String, usingProductionTemplate: XProductionTemplate) throws
func write(toFile: FileHandle, usingProductionTemplate: XProductionTemplate) throws
func write(toWriter: Writer, usingProductionTemplate: XProductionTemplate) throws
You can also use the WriteTarget protocol to allow all the above possibilities:
func write(to writeTarget: WriteTarget, usingProductionTemplate: XProductionTemplate) throws
With the argument usingProductionTemplate: you can define a production, i.e. details of the serialization, e.g. whether linebreaks are inserted to make the result look pretty. Its value defaults to an instance of DefaultProductionTemplate, which gives a standard output.
The definition of such a production comes in two parts, a template that can be initialized with values for a further configuration of the serialization, and an active production which is to be applied to a certain target. This way the user has the ability to define completely what the serialization should look like, and then apply this definition to one or several serializations. In more detail:
An XProductionTemplate has a method activeProduction(for writer: Writer) -> XActiveProduction which, by using the writer, initializes an XActiveProduction where the corresponding events trigger a writing to the writer. The configuration for such a production is to be provided via arguments to the initializer of the XProductionTemplate.
So an XActiveProduction defines how each part of the document is written, e.g. whether > or " are written literally or as predefined XML entities in text sections. The production in the above function calls defaults to an instance of DefaultProductionTemplate, which results in instances of ActiveDefaultProduction. ActiveDefaultProduction should be extended if only some details of how the document is written are to be changed. The productions ActivePrettyPrintProduction (which might be used by defining a PrettyPrintProductionTemplate) and ActiveHTMLProduction (which might be used by defining an HTMLProductionTemplate) already extend ActiveDefaultProduction; they can be used to pretty-print XML or to output HTML. But you can also extend one of those classes yourself, e.g. you could override func writeText(text: XText) and func writeAttributeValue(name: String, value: String, element: XElement) to again write some characters as named entity references. Or you just provide an instance of DefaultProduction itself and change its linebreak property to define how line breaks should be written (e.g. Unix or Windows style). You might also want to consider func sortAttributeNames(attributeNames: [String], element: XElement) -> [String] to sort the attributes for output.
Example: write a linebreak before all elements:
class MyProduction: DefaultProduction {
override func writeElementStartBeforeAttributes(element: XElement) throws {
try write(linebreak)
try super.writeElementStartBeforeAttributes(element: element)
}
}
try document.write(toFile: "myFile.xml", usingProduction: MyProduction())
For generality, the following method is provided to apply any XActiveProduction to a node and its contained tree:
func applyProduction(activeProduction: XActiveProduction) throws
Cloning and document versions
Any node (including an XML document) can be cloned, including the tree of nodes that is started by it, using the following method:
var clone: XNode
(The result will be more specific if the subject is known to be more specific.)
By default, the clone of a document will register the same attributes and values, but by default clones lose their attachments. You can change this by calling clone(keepAttachments:registeringAttributes:registeringValuesForAttributes:) for a document or clone(keepAttachments:) for an element. (Those arguments have default values which produce the default behaviour; use nil for the two AttributeRegisterMode values to achieve this explicitly.) Those arguments are also available for makeVersion().
Any content and the document itself possesses the property backlink that can be used as a relation between a clone and the original node. If you create a clone by using the clone property, the backlink value of a node in the clone points to the original node. So when working with a clone, you can easily look at the original nodes.
(A backlink might also be set manually by the methods setting(backlink:) or copyingBacklink(from:), which might come in handy in transformations.)
Note that the backlink references the original node weakly, i.e. if you do not keep a reference to the original node or tree, then the original node disappears and the backlink property will be nil.
If you would like to use cloning to just save a version of your document to a copy, use its following method:
func makeVersion()
In that case a clone of the document will be created, but with the backlink property of an original node pointing to the clone, and the backlink property of the clone will point to the old backlink value of the original node. I.e. if you apply makeVersion() several times, when following the backlink values starting from a node in your original document, you will go through all versions of this node, from the newer ones to the older ones. The backlinks property gives you exactly that chain of backlinks. Unlike when using clone, a strong reference to such a document version will be remembered by the document, so the nodes of the clone will be kept. Use forgetVersions(keeping:Int) on the document in order to stop this remembering, just keeping the last number of versions defined by the argument keeping (keeping defaults to 0). In the oldest version then still remembered or, if no remembered version is left, in the document itself, all backlink values will then be set to nil.
The finalBacklink property follows the whole chain of backlink values and gives you the last value in this chain.
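The relation between clones, versions and backlinks can be pictured with a small stand-alone model. The following sketch is not SwiftXML code: `ModelNode` is a hypothetical class that only mimics the behaviour described above (a weak backlink, versions kept alive by the original, and a chain from newer to older versions) for a single node instead of a whole tree.

```swift
// Toy model of the backlink behaviour described above; it is not SwiftXML code.
final class ModelNode {
    var text: String
    weak var backlink: ModelNode?            // weak, as described for backlinks
    private var versions: [ModelNode] = []   // strong references to saved versions

    init(_ text: String) { self.text = text }

    /// Clone: the backlink of the clone points to the original.
    func clone() -> ModelNode {
        let copy = ModelNode(text)
        copy.backlink = self
        return copy
    }

    /// Version: the original points to the new copy, which gets the old backlink.
    func makeVersion() {
        let version = ModelNode(text)
        version.backlink = backlink
        versions.append(version)
        backlink = version
    }

    /// The chain of backlinks, newest version first.
    var backlinks: [ModelNode] {
        var chain: [ModelNode] = []
        var current = backlink
        while let node = current {
            chain.append(node)
            current = node.backlink
        }
        return chain
    }

    /// The last value in the chain of backlinks.
    var finalBacklink: ModelNode? { backlinks.last }
}

let node = ModelNode("draft 1")
node.makeVersion()
node.text = "draft 2"
node.makeVersion()
node.text = "draft 3"
let history = node.backlinks.map { $0.text }   // ["draft 2", "draft 1"]

var original: ModelNode? = ModelNode("original")
let copy = original!.clone()
original = nil   // no strong reference is left, so copy.backlink is nil now
```

The versions stay reachable because the original holds them strongly, whereas the plain clone loses its backlink as soon as the original is no longer referenced.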
Sometimes, only a “shallow” clone is needed, i.e. the node itself without the whole tree of nodes with the node as root. In this case, just use:
func shallowClone(forwardref: Bool) -> XNode
The backlink is then set just like when using clone.
The property backlinkOrSelf gives the backlink or – if it is nil – the subject itself.
Content properties
If the parser (as is the case with the SwiftXMLParser) reports where a part of the document is located in the text (i.e. at what line and column it starts and at what line and column it ends), the property sourceRange: XTextRange (using XTextRange from SwiftXMLInterfaces) returns it for the respective node:
Example:
let document = try parseXML(fromText: """
<a>
    <b>Hello</b>
</a>
""", textAllowedInElementWithName: ["b"])
for content in document.allContent {
    if let sourceRange = content.sourceRange {
        print("\(sourceRange): \(content)")
    }
    else {
        content.echo()
    }
}
Output:
1:1 - 3:4: <a>
2:5 - 2:16: <b>
2:8 - 2:12: Hello
Element names can be read and set by using the property name of an element. After setting a new name different from the existing one, the element is registered with the new name in the document, if it is part of a document. Setting the same name does not change anything (it is an efficient non-change).
For a text content (XText) its text can be read and set via its property value. So there is no need to replace an XText content by another to change text. Please also see the section below on handling of text.
The attributes of an element can be read and set via the “index notation”. If an attribute is not set, nil is returned; conversely, setting an attribute to nil removes it. Setting an attribute with a new name or removing an attribute changes the registering of attributes in the document, if the element is part of a document. Setting a non-nil value of an attribute that already exists is an efficient non-change concerning the registering of attributes.
Example:
// setting the "id" attribute to "1":
myElement["id"] = "1"
// reading an attribute:
if let id = myElement["id"] {
print("the ID is \(id)")
}
You can also get a sequence of attribute values (optional Strings) from a sequence of elements.
Example:
let document = try parseXML(fromText: """
<test>
<b id="1"/>
<b id="2"/>
<b id="3"/>
</test>
""")
print(document.children.children["id"].joined(separator: ", "))
Result:
1, 2, 3
If you want to get an attribute value and at the same time remove the attribute, use the method pullAttribute(...) of the element.
To get the names of all attributes of an element, use:
var attributeNames: [String]
Note that you can also get a (lazy) sequence of the attribute values of a certain attribute name from a (lazy) sequence of elements by using the same index notation:
print(myElement.children("myChildName")["myAttributeName"].joined(separator: ", "))
All nodes can have “attachments”. Those are objects that can be attached via a textual key. Those attachments are not considered as belonging to the formal XML tree.
Those attachments are realized as a dictionary attached as a member of each node.
You can also set attachments immediately when creating an element or a document by using the argument attached: of the initializer. (Note that in this argument, some values might be nil for convenience.)
Get the XPath of a node via:
var xPath: String
Traversals
Traversing a tree depth-first starting from a node (including a document) can be done by the following methods:
func traverse(down: (XNode) throws -> (), up: ((XNode) throws -> ())? = nil) rethrows
func traverse(down: (XNode) async throws -> (), up: ((XNode) async throws -> ())? = nil) async rethrows
For a “branch”, i.e. a node that might contain other nodes (like an element, as opposed to e.g. text, which does not contain other nodes), the closure given to the optional up: argument is called when returning from the traversal of its content (also in the case of an empty branch).
Example:
document.traverse { node in
if let element = node as? XElement {
print("entering element \(element.name)")
}
}
up: { node in
if let element = node as? XElement {
print("leaving element \(element.name)")
}
}
Note that the root of the traversal is not to be removed during the traversal.
Direct access to elements
As mentioned in the general description, the library allows you to efficiently find elements of a certain name in a document without having to traverse the whole tree.
Finding the elements of a certain name:
func elements(prefix:_: String...) -> XElementsOfSameNameSequence
Example:
for paragraph in myDocument.elements("paragraph") {
if let id = paragraph["id"] {
print("found paragraph with ID \"\(id)\"")
}
}
Find the elements of several alternative names by passing several names in the according argument. Note that, just as with the methods for single names, what you add during the iteration will then also be considered.
You can also use a prefix in the first (optional) argument for direct access to elements having a certain prefix (if you use nil as the value of this argument, the according elements that do not have a prefix are found). See more about prefixes in the section “Prefixes and namespaces” below.
Direct access to attributes
To directly find where an attribute with a certain name is set, you can use an analogue to the direct access to elements, but for efficiency reasons you have to specify the attribute names which can be used for such a direct access. You specify these attribute names when creating a document (e.g. XDocument(registeringAttributes: .selected(["id", "label"]))) or indirectly when using the parse functions (e.g. try parseXML(fromText: "...", registeringAttributes: .selected(["id", "label"]))). You can also register attributes for a certain namespace or a prefix and then list them by additionally using the prefix: argument, see the section on prefixes and namespaces.
Example:
let document = try parseXML(fromText: """
<test>
<x a="1"/>
<x b="2"/>
<x c="3"/>
<x d="4"/>
</test>
""", registeringAttributes: .selected(["a", "c"]))
let registeredAttributesInfo = document.registeredAttributes("a", "b", "c", "d").map{ "\($0.name)=\"\($0.value)\" in \($0.element)" }.joined(separator: ", ")
print(registeredAttributesInfo) // "a="1" in <x a="1">, c="3" in <x c="3">"
let allValuesInfo = document.elements("x").compactMap{
if let name = $0.attributeNames.first, let value = $0[name] { "\(name)=\"\(value)\" in \($0)" } else { nil }
}.joined(separator: ", ")
print(allValuesInfo) // "a="1" in <x a="1">, b="2" in <x b="2">, c="3" in <x c="3">, d="4" in <x d="4">"
You can register attribute values by using the argument registeringValuesForAttributes: when parsing or creating a document:
let source = """
<a>
<b id="1"/>
<b id="2"/>
<b refid="1">First reference to "1".</b>
<b refid="1">Second reference to "1".</b>
</a>
"""
let document = try parseXML(fromText: source, registeringValuesForAttributes: .selected(["id", "refid"]))
print(#"id="1":"#)
print(document.registeredValues("1", forAttribute: "id").map{ $0.element.description }.joined(separator: "\n"))
print()
print(#"refid="1":"#)
print(document.registeredValues("1", forAttribute: "refid").map{ $0.element.serialized() }.joined(separator: "\n"))
Result:
id="1":
<b id="1">
refid="1":
<b refid="1">First reference to "1".</b>
<b refid="1">Second reference to "1".</b>
If the value of an attribute should be unique, find the according element by:
if let element = document.registeredValues("1", forAttribute: "id").first?.element {
...
}
NOTE
- document.registeredAttributes("id") or document.registeredAttributes("refid") would give you an empty sequence in the above example; you would have to add registeringAttributes: .selected(["id", "refid"]) to also find these attributes by name only.
- As registeredValues(forAttribute:) returns a lazy sequence that also considers new values that are set during its iteration, you might first make an array out of the sequence via Array(document.registeredValues(...)) if you plan to change the according values.
- It was decided not to introduce rules for attribute values for the time being.
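The advice to take an array snapshot before changing registered values could look like the following sketch. The sample document is made up, and setting an attribute through the string subscript of the element is assumed to work as in the index notation mentioned above:

```swift
import SwiftXML

let document = try parseXML(fromText: """
    <a><b id="1"/><b refid="1"/><b refid="1"/></a>
    """, registeringValuesForAttributes: .selected(["id", "refid"]))

// Take a snapshot first: the lazy sequence would otherwise also consider
// the values that are changed inside the loop.
let references = Array(document.registeredValues("1", forAttribute: "refid"))

for reference in references {
    // Assumption: attributes are set via the string subscript of the element.
    reference.element["refid"] = "2"
}

print(references.count)
```

The snapshot fixes the set of items to work on before the first change, so the loop cannot be influenced by its own modifications.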
Processing instructions
Processing instructions provide additional information to the application processing an XML document. They are written as <?target data?> in the XML source, where target and data are replaced with concrete values, or created via XProcessingInstruction(target: "...", data: "..."). The data part is optional, and it includes all whitespace that is included in the serialization of the processing instruction except the first space character after the target. A processing instruction can be understood as a command specified by the target with an argument specified by the data. Alternatively, the target could specify the application that placed it or is intended to work on it, or the type of problem or the topic to which the processing instruction relates, with the data providing the details. Namespace prefixes cannot be used in targets (a target with a colon is not correct syntax when namespaces are enabled), but a prefix-like naming convention for targets can still be useful. Processing instructions should be placed carefully: if they are placed in the middle of text, this can disrupt a text search at that point.
SwiftXML provides efficient direct access to the processing instructions of specific targets via the processingInstructions(ofTarget:) method:
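let source = """
<a>
<?MyTarget Hello world!?>
<b>Blabla.</b>
<?MyTarget This has the same target.?>
<b>Blabla.</b>
<?OtherTarget This has another target.?>
<b>Blabla.</b>
</a>
"""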
let document = try parseXML(fromText: source)
print(
document.processingInstructions(ofTarget: "MyTarget")
.map { $0.data ?? "" }.joined(separator: "\n")
)
print("----")
print(
document.processingInstructions(ofTarget: "MyTarget", "OtherTarget")
.map { $0.data ?? "" }.joined(separator: "\n")
)
Output:
Hello world!
This has the same target.
----
Hello world!
This has the same target.
This has another target.
If you want to delete all processing instructions of specific targets (e.g., before starting a process that sets exactly those processing instructions), you can do so using the following simple notation:
document.processingInstructions(ofTarget: "MyTarget", "OtherTarget").remove()
Chained iterators
Iterators can also be chained. The second iterator is executed on each of the nodes encountered by the first iterator. All this iteration is lazy, so the first iterator only searches for the next node once the second iterator is done with the current node found by the first iterator.
Example:
let document = try parseXML(fromText: """
<a>
<b>
<c>
<d/>
</c>
</b>
</a>
""")
for element in document.descendants.descendants { print(element) }
Output:
<b>
<c>
<d>
<c>
<d>
<d>
In such chains, operations that find a single node when applied to a single node (like parent) also work, and you can use e.g. insertNext (see the section on tree manipulations), with (see the next section on constructing XML), or echo().
When using an index with a String, you get a sequence of the according attribute values (where set):
for childID in element.children["id"] {
print("found child ID \(childID)")
}
Note that when using an Int as subscript value for a sequence of content, you get the child of the according index:
if let secondChild = element.children[2] {
print("second child: \(secondChild)")
}
NOTE
If you use this subscript notation [n] for a sequence of XContent, XElement, or XText, then – despite using integer values – this is not (!) a random access to the elements (each time using such a subscript, the sequence is followed until the according item is found by counting), and the counting starts at 1 as in the XPath language, and not at 0 as e.g. for Swift arrays.
You should see this integer subscript more as a subscript with names, the integer values being the names that the positions are given in the XML, where counting from 1 is common.
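The counting behaviour described in this note can be pictured with a small standalone model in plain Swift. The helper item(atPosition:) is hypothetical and not part of SwiftXML; it only illustrates a lookup that counts from 1 and walks the sequence on every call:

```swift
extension Sequence {
    /// Returns the item at the 1-based position `position`, or nil if the
    /// sequence is shorter. The sequence is walked from its start on every
    /// call, so this is a counting lookup and not random access.
    func item(atPosition position: Int) -> Element? {
        guard position >= 1 else { return nil }
        var count = 0
        for item in self {
            count += 1
            if count == position { return item }
        }
        return nil
    }
}

let names = ["first", "second", "third"].lazy.map { $0.uppercased() }

print(names.item(atPosition: 1) ?? "-") // FIRST
print(names.item(atPosition: 3) ?? "-") // THIRD
print(names.item(atPosition: 0) ?? "-") // -
print(names.item(atPosition: 4) ?? "-") // -
```

Because each lookup costs as many steps as the requested position, iterating over the sequence directly is the better choice when several items are needed.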
Constructing XML
### Constructing an empty element
When constructing an element (without content), the name is given as the first (nameless) argument and the attribute values are given as a (nameless) dictionary.
Example: constructing an empty “paragraph” element with attributes `id="1"` and `style="note"`:
```swift
let myElement = XElement("paragraph", ["id": "1", "style": "note"])
```
### About the insertion of content
We would first like to give some important hints before we explain the corresponding functionalities in detail.
Note that when inserting content into an element or document and that content already exists somewhere else, the inserted content is _moved_ from its original place, and not copied. If you would like to insert a copy, insert the result of using the `clone` property of the content.
Be “courageous” when formulating your code, more might function than you might have thought. Anticipating the explanations in the following sections, e.g. the following code examples _do_ work:
Moving the “a” children and the “b” children of an element to the beginning of the element:
```swift
element.addFirst {
element.children("a")
element.children("b")
}
```
As the content is first constructed and then inserted, there is no infinite loop here.
Note that in the result, the order of the content is just as defined inside the braces `{...}`, so in the example the resulting `element` first contains the “a” children and then the “b” children.
Wrap an element with another element:
```swift
element.replace {
XElement("wrapper") {
element
}
}
```
The content that you define inside braces `{...}` is constructed from the inside to the outside. From the notes above you might then think that `element` in the example is not at its original place any more when the content of the “wrapper” element has been constructed, before the replacement could actually happen. Yes, this is true, but nevertheless the `replace` method still knows where to insert this “wrapper” element. The operation does work as you would expect from a naïve perspective.
An instance of any type conforming to `XContentConvertible` (it has to implement its `collectXML(by:)` method) can be inserted as XML:
```swift
struct MyStruct: XContentConvertible {
let text1: String
let text2: String
func collectXML(by xmlCollector: inout XMLCollector) {
xmlCollector.collect(XElement("text1") { text1 })
xmlCollector.collect(XElement("text2") { text2 })
}
}
let myStruct1 = MyStruct(text1: "hello", text2: "world")
let myStruct2 = MyStruct(text1: "greeting", text2: "you")
let element = XElement("x") {
myStruct1
myStruct2
}
element.echo(pretty: true)
```
Result:
```xml
<x>
<text1>hello</text1>
<text2>world</text2>
<text1>greeting</text1>
<text2>you</text2>
</x>
```
For `XContentConvertible` there is also the `xml` property that returns an according array of `XContent`.
When constructing CDATA sections and comments, you can also use the `XCDATASection { ... }` and `XComment { ... }` notation, but only with `String` content.
### Defining content
When constructing an element, its contents are given in braces `{...}` (those braces are the `builder` argument of the initializer).
```swift
let myElement = XElement("div") {
XElement("hr")
XElement("paragraph") {
"Hello world"
}
XElement("hr")
}
```
(The text `"Hello world"` could also be given as `XText("Hello world")`. The text will be converted into such an XML node automatically.)
The content might be given as an array or an appropriate sequence:
```swift
let myElement = XElement("div") {
XElement("hr")
myOtherElement.content
XElement("hr")
}
```
When not defining content, using `map` might be a sensible option:
```swift
let element = XElement("z") {
XElement("a") {
XElement("a1")
XElement("a2")
}
XElement("b") {
XElement("b1")
XElement("b2")
}
}
for content in element.children.map({ $0.children.first }) { print(content?.name ?? "-") }
```
Output:
```text
a1
b1
```
The same applies to e.g. the `filter` method, which, besides letting the code look more complex when used instead of the filter options described above, is not a good option when defining content.
The content of elements containing other elements while defining their content is built from the inside to the outside. Consider the following example:
```swift
let b = XElement("b")
let a = XElement("a") {
b
"Hello"
}
a.echo(pretty: true)
print("\n------\n")
b.replace {
XElement("wrapper1") {
b
XElement("wrapper2") {
b.next
}
}
}
a.echo(pretty: true)
```
First, the element “wrapper2” is built, and at that moment the sequence `b.next` contains the text `"Hello"`. So we will get as output:
```text
<a><b/>Hello</a>
------
<a>
<wrapper1>
<b/>
<wrapper2>Hello</wrapper2>
</wrapper1>
</a>
```
### Document membership in constructed elements
Elements that are part of a document (`XDocument`) are registered in the document. The reason is that this allows fast access to elements respectively attributes of a certain name via `elements(_:)` respectively `attributes(_:)`, and the rules (see the section about rules) use these registers (note that for efficiency reasons, the attribute names to be used in such a way have to be configured when a document is created).
At the moment of constructing a new element with its content defined in `{...}` braces during construction, the element is not part of any document. The nodes inserted into it leave the document tree, but they are not (!) unregistered from the document. I.e. the iteration `elements(_:)` will still find them, and according rules will apply to them. The reason for this behaviour is the common case of the new element getting inserted into the same document. If the content of the new element were first unregistered from the document and then reinserted into the same document, its elements would count as new elements, and the mentioned iterations might iterate over them again.
If you would like the content of a newly built element to get unregistered from the document, use its method `adjustDocument()`. This method propagates the current document of the element to its content. For a newly built element this document is `nil`, which unregisters a node from its document. You might also set the argument `adjustDocument` to `true` in the initializer of the element to automatically call `adjustDocument()` when the building of the new element is accomplished. This call or setting is only necessary at the top-level element; it is propagated through the whole tree.
Note that if you insert an element into another element that is part of a document, the new child gets registered in the document of its new parent if not already registered there (and unregistered from any different document where it was registered before).
Example: a newly constructed element gets added to a document:
```swift
let document = try parseXML(fromText: """
<a><b id="1"/><b id="2"/></a>
""")
for element in document.elements("b") {
print("applying the rule to \(element)")
if element["id"] == "2" {
element.insertNext {
XElement("c") {
element.previous
}
}
}
}
print("\n-----------------\n")
document.echo()
```
Output:
```text
applying the rule to <b id="1">
applying the rule to <b id="2">
-----------------
<a><b id="2"/><c><b id="1"/></c></a>
```
As you can see from the `print` commands in the last example, the element `<b id="1">` does not lose its “connection” to the document (although it seems to get added again to it), so it is only iterated over once by the iteration.
## Tree manipulations
Besides changing the node properties, an XML tree can be changed by the following methods. Some of them return the subject itself as a discardable result. For the content specified in `{...}` (the builder) the order is preserved.
Add nodes at the end of the content of an element or a document respectively:
```swift
func add(builder: () -> [XContent])
```
Add nodes to the start of the content of an element or a document respectively:
```swift
func addFirst(builder: () -> [XContent])
```
Add nodes as the nodes previous to the node:
```swift
func insertPrevious(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])
```
Add nodes as the nodes next to the node:
```swift
func insertNext(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])
```
A more precise type is returned from `insertPrevious` and `insertNext` if the type of the subject is more precisely known.
By using the next two methods, a node gets removed.
Remove the node from the tree structure and the document:
```swift
func remove()
```
You might also use the method `removed()` of a node to remove the node and still continue to use it.
Replace the node by other nodes:
```swift
func replace(_ insertionMode: InsertionMode = .following, builder: () -> [XContent])
```
Note that the content that replaces a node is allowed to contain the node itself.
Sometimes you might want to insert not the node itself, but a replacement of it. The `replacedBy` method first replaces the node before inserting the result somewhere else. The example at the top (which is used in the "Getting started" section) contains such an instance:
```swift
table.insertNext {
XElement("caption") {
if let label = table["label"] {
"Table "; label; ": "
}
table.firstChild("title")?.replacedBy { $0.content }
}
}
```
`replacedBy` can also be applied to sequences. It has to do a real first replacement in order to prevent nodes from leaving the document in an unwanted way (note that the node where `replacedBy` is applied might itself be part of the replacement, so we cannot break with the usual replacement scheme).
The first replacement is isolated to prevent the concatenation of texts at its borders, this isolation is removed when the result of `replacedBy` is inserted somewhere else as in the example above.
---
**NOTE**
The `replacedBy` method is convenient at times, but as it actually first replaces something before inserting the replacement elsewhere, it is less efficient than doing this first replacement later e.g. in a following rule. With a more complete set of rules you might not see a need to use `replacedBy`.
---
Clear the contents of an element or a document respectively:
```swift
func clear()
```
Test if an element or a document is empty:
```swift
var isEmpty: Bool
```
Set the contents of an element or a document respectively:
```swift
func setContent(builder: () -> [XContent])
```
Example:
```swift
for table in myDocument.elements("table") {
table.insertNext {
XElement("legend") {
"this is the table legend"
}
XElement("caption") {
"this is the table caption"
}
}
}
```
Note that by default iterations continue with new nodes inserted by `insertPrevious` or `insertNext` also being considered. In the following cases, you have to add the `.skipping` directive to get the output as noted below (in the second case, you even get an infinite loop if you do not set `.skipping`):
```swift
let element = XElement("top") {
XElement("a1") {
XElement("a2")
}
XElement("b1") {
XElement("b2")
}
XElement("c1") {
XElement("c2")
}
}
element.echo(pretty: true)
print("\n---- 1 ----\n")
for content in element.content {
content.replace(.skipping) {
content.content
}
}
element.echo(pretty: true)
print("\n---- 2 ----\n")
for content in element.contentReversed {
content.insertPrevious(.skipping) {
XElement("I" + ((content as? XElement)?.name ?? "?"))
}
}
element.echo(pretty: true)
```
Output:
```text
<top>
<a1>
<a2/>
</a1>
<b1>
<b2/>
</b1>
<c1>
<c2/>
</c1>
</top>
---- 1 ----
<top>
<a2/>
<b2/>
<c2/>
</top>
---- 2 ----
<top>
<Ia2/>
<a2/>
<Ib2/>
<b2/>
<Ic2/>
<c2/>
</top>
```
Note that there is no such mechanism for skipping inserted content when not using `insertPrevious`, `insertNext`, or `replace`, e.g. when using `add`. Consider the combination `descendants.add`: there is then no “natural” way to correct the traversal of the tree. (A more common use case would be something like `descendants("table").add { XElement("caption") }`, so this should not be a problem in common cases, but something you should be aware of.)
When using `insertNext`, `replace` etc. in chained iterators, the definition of the content in the braces `{...}` gets _executed_ for each item in the sequence. You should use the `collect` function to build content specifically for the current item instead. E.g. in the last example, you might use with the same result:
```swift
print("\n---- 1 ----\n")
element.content.replace { content in
collect {
content.content
}
}
element.echo(pretty: true)
print("\n---- 2 ----\n")
element.contentReversed.insertPrevious { content in
collect {
XElement("I" + ((content as? XElement)?.name ?? "?"))
}
}
element.echo(pretty: true)
```
You can also do without `collect` by using an explicit loop:
```swift
let e = XElement("a") {
XElement("b")
XElement("c")
}
for descendant in e.descendants({ $0.name != "added" }) {
descendant.add { XElement("added") }
}
e.echo(pretty: true)
```
Output:
```text
<a>
<b>
<added/>
</b>
<c>
<added/>
</c>
</a>
```
Note that a new `<added/>` is created each time. From what has already been said, it should be clear that this “duplication” does not work with existing content (unless you use `clone` or `shallowClone`):
```swift
let myElement = XElement("a") {
XElement("to-add")
XElement("b")
XElement("c")
}
for descendant in myElement.descendants({ $0.name != "to-add" }) {
descendant.add {
myElement.descendants("to-add")
}
}
myElement.echo(pretty: true)
```
Output:
```text
<a>
<b/>
<c>
<to-add/>
</c>
</a>
```
As a general rule, when inserting a content, and that content is already part of another element or document, that content does not get duplicated, but removed from its original position.
Use `clone` (or `shallowClone`) when you actually want content to get duplicated, e.g. using `myElement.descendants("to-add").clone` in the last example would then output:
```text
<a>
<to-add/>
<b>
<to-add/>
</b>
<c>
<to-add/>
<to-add/>
</c>
</a>
```
By default, when you insert content, this new content is also followed (insertion mode `.following`), as this best reflects the dynamic nature of this library. If you do not want this, set `.skipping` as first argument of `insertPrevious` or `insertNext`. For example, consider the following code:
```swift
let myElement = XElement("top") {
XElement("a")
}
for element in myElement.descendants {
if element.name == "a" {
element.insertNext() {
XElement("b")
}
}
else if element.name == "b" {
element.insertNext {
XElement("c")
}
}
}
myElement.echo(pretty: true)
```
Output:
```text
<top>
<a/>
<b/>
<c/>
</top>
```
When `<b/>` gets inserted, the traversal also follows this inserted content. When you would like to skip the inserted content, use `.skipping` as the first argument of `insertNext`:
```swift
...
element.insertNext(.skipping) {
XElement("b")
}
...
```
Output:
```text
<top>
<a/>
<b/>
</top>
```
Similarly, if you replace a node, the content that gets inserted in place of the node is by default included in the iteration. Example: Assume you would like to replace every occurrence of some `<bold>` element by its content:
```swift
let document = try parseXML(fromText: """
<text><bold><bold>Hello</bold></bold></text>
""")
for bold in document.descendants("bold") { bold.replace { bold.content } }
document.echo()
```
The output is:
```text
<text>Hello</text>
```
Handling of text
Subsequent text nodes (XText) are always automatically combined, and text nodes with empty text are automatically removed. The same treatment is applied to XLiteral nodes.
This can be very convenient when processing text, e.g. it is then very straightforward to apply regular expressions to the text in a document. But there might be some stumbling blocks involved here, when the different behaviour of text nodes and other nodes affects the result of your manipulations.
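The automatic combination of texts can be pictured with a small standalone model in plain Swift. The type Content and the function normalized(_:) are hypothetical and not the node types or the implementation of SwiftXML; the sketch only shows the effect of combining directly adjacent texts and dropping empty ones:

```swift
// A toy content model, not the node types of SwiftXML.
enum Content: Equatable {
    case text(String)
    case element(String)
}

/// Combines directly adjacent texts and drops texts that are empty.
func normalized(_ content: [Content]) -> [Content] {
    var result = [Content]()
    for item in content {
        switch item {
        case .text(let value):
            // Empty texts are dropped.
            if value.isEmpty { continue }
            if case .text(let previous)? = result.last {
                // The previous item is a text as well: combine both.
                result[result.count - 1] = .text(previous + value)
            } else {
                result.append(item)
            }
        case .element:
            result.append(item)
        }
    }
    return result
}

let content: [Content] = [
    .text("Hello"), .text(""), .text(" world"), .element("br"), .text("!"),
]
let combined = normalized(content)
print(combined.count) // 3
```

In the model, "Hello" and " world" end up in one text, which is what makes a search for a phrase like "Hello world" straightforward; the text after the element stays separate.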
You can avoid the merging of a text with other texts by setting its isolated property to true (you can also choose to set this value during initialization of an XText). Consider the following example where the occurrences of a search text get a greenish background. In this example, you do not want part to be merged into text during the iteration:
let document = try parseXML(fromText: """
<a>Hello world, the world is nice.</a>
""")
let searchText = "world"
document.traverse { node in
if let text = node as? XText {
if text.value.contains(searchText) {
text.isolated = true
var addSearchText = false
for part in text.value.components(separatedBy: searchText) {
text.insertPrevious {
if addSearchText {
XElement("span", ["style": "background:LightGreen"]) {
searchText
}
}
part
}
addSearchText = true
}
text.remove()
text.isolated = false
}
}
}
document.echo()
Output:
<a>Hello <span style="background:LightGreen">world</span>, the <span style="background:LightGreen">world</span> is nice.</a>
Note that when e.g. inserting nodes, their XText nodes are treated as being isolated while being moved.
A String can be used where an XText is required, e.g. you can write "Hello" as XText.
XText, as well as XLiteral and XCDATASection, conforms to the XTextualContentRepresentation protocol, i.e. they all have a String property of name value that can be read and set and which represents content as it would be written into the serialized document (with some character escapes necessary in the case of XText when it is being written). Note that XComment does not conform to the XTextualContentRepresentation protocol.
Rules
When you only want to apply a few changes to a document, just go directly to the few according elements and apply the changes you want. But if you would like to transform a whole document into “something else”, you need a better tool to organise your manipulations of the document: you need a “transformation”.
As mentioned in the general description, a set of rules XRule in the form of a transformation instance of type XTransformation can be used as follows.
In a rule, the user defines what to do with elements or attributes of certain names. The set of rules can then be applied to a document, i.e. the rules are applied in the order of their definition. This is repeated, guaranteeing that a rule is only applied once to the same object (if not removed from the document and added again), until no application takes place. So elements can be added during application of a rule and then later be processed by the same or another rule.
Example:
let document = try parseXML(fromText: """
<a><formula id="1"/></a>
""")
var count = 1
let transformation = XTransformation {
XRule(forElements: "formula") { element in
print("\n----- Rule for element \"formula\" -----\n")
print(" \(element)")
if count == 1 {
count += 1
print(" add image")
element.insertPrevious {
XElement("image", ["id": "\(count)"])
}
}
}
XRule(forElements: "image") { element in
print("\n----- Rule for element \"image\" -----\n")
print(" \(element)")
if count == 2 {
count += 1
print(" add formula")
element.insertPrevious {
XElement("formula", ["id": "\(count)"])
}
}
}
}
transformation.execute(inDocument: document)
print("\n----------------------------------------\n")
document.echo()
Output:
----- Rule for element "formula" -----
<formula id="1">
add image
----- Rule for element "image" -----
<image id="2">
add formula
----- Rule for element "formula" -----
<formula id="3">
----------------------------------------
<a><formula id="3"/><image id="2"/><formula id="1"/></a>
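The repeated application of rules can be pictured with a small standalone model in plain Swift. The types Item and Rule and the function execute(_:on:) are hypothetical and not the implementation used by SwiftXML; the sketch only mirrors the described behaviour: rules are tried in the order of their definition, each rule is applied at most once to the same item, and the whole set is repeated until no rule has been applied:

```swift
// A toy model of the rule loop, not the implementation used by SwiftXML.
final class Item {
    let name: String
    init(_ name: String) { self.name = name }
}

struct Rule {
    let name: String
    let action: (Item, inout [Item]) -> Void
}

/// Applies the rules and returns the names of the rules in the order of
/// their application.
func execute(_ rules: [Rule], on items: inout [Item]) -> [String] {
    var log = [String]()
    // Remembers, per rule, the items the rule has already been applied to.
    var applied = Array(repeating: Set<ObjectIdentifier>(), count: rules.count)
    var anyApplication = true
    while anyApplication {
        anyApplication = false
        for (index, rule) in rules.enumerated() {
            // Search again after every application, so that items added by
            // an action are found as well.
            while let item = items.first(where: {
                $0.name == rule.name && !applied[index].contains(ObjectIdentifier($0))
            }) {
                applied[index].insert(ObjectIdentifier(item))
                rule.action(item, &items)
                log.append(rule.name)
                anyApplication = true
            }
        }
    }
    return log
}

var count = 1
var items = [Item("formula")]
let rules = [
    Rule(name: "formula") { _, items in
        if count == 1 { count += 1; items.append(Item("image")) }
    },
    Rule(name: "image") { _, items in
        if count == 2 { count += 1; items.append(Item("formula")) }
    },
]
let log = execute(rules, on: &items)
print(log) // ["formula", "image", "formula"]
```

The order of the applications in the model matches the order of the rule headings printed in the example above: the formula added by the “image” rule is only processed in the next round.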
You can also formulate rules with the prefix: argument, see the section on prefixes and namespaces.
As a side note, for such an XTransformation the lengths of the element names do not really matter: apart from the initialization of the transformation before the execution and from what happens inside the rules, the application of the rules is not less efficient if the element names are longer.
Instead of using a transformation with a very large number of rules, you should use several transformations, each dedicated to a separate “topic”. E.g. for some document format you might first transform the inline elements and then the block elements. Splitting a transformation into several transformations practically does not hurt performance.
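Splitting the work into two transformations might look like the following sketch. It only uses calls shown elsewhere in this README; the sample document and the element names are made up for illustration:

```swift
import SwiftXML

let document = try parseXML(fromText: """
    <document><paragraph>Some <bold>text</bold>.</paragraph></document>
    """)

// First topic: the inline elements.
let inlineTransformation = XTransformation {
    XRule(forElements: "bold") { element in
        element.replace {
            XElement("b") { element.content }
        }
    }
}

// Second topic: the block elements.
let blockTransformation = XTransformation {
    XRule(forElements: "paragraph") { element in
        element.replace {
            XElement("p") { element.content }
        }
    }
}

// The transformations are executed one after the other.
inlineTransformation.execute(inDocument: document)
blockTransformation.execute(inDocument: document)

document.echo()
```

When the second transformation starts, all inline elements have already been transformed, so its rules can rely on that state of the document.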
Note that the order of the rules matters: In the given order, each rule is applied as long as an according item has been found, and the whole collection of rules is applied again and again until none of the rules has found an item. If you need to look up e.g. the parent of the element in a rule, it is important to know if this parent has already been changed by another rule, i.e. if a preceding rule has transformed this element. An example is given in the following section “Transformations with inverse order”. The usage of several transformations as described in the preceding paragraph might help here. Methods to work with better contextual information are described in the sections “Transformations with attachments for context information”, “Transformations with document versions”, and “Transformations with traversals” below.
Also note that using an XTransformation you can only transform a whole document. In the section “Transformations with traversals” below, another option is described for transforming any XML tree.
A transformation can be stopped by calling stop() on the transformation, although that only works indirectly:
var transformationAlias: XTransformation? = nil
let transformation = XTransformation {
XRule(forElements: "a") { _ in
transformationAlias?.stop()
}
}
transformationAlias = transformation
transformation.execute(inDocument: myDocument)
Transformations with inverse order
As noted in the last section, the order of rules is crucial in some transformations, e.g. if the original context is important.
The “inverse order” of rules goes from the inner elements to the outer elements so that the context is still unchanged when a rule applies. Note the lookup of element.parent?.name to differentiate the color of the text:
let document = try parseXML(fromText: """
<document>
<section>
<hint>
<paragraph>This is a hint.</paragraph>
</hint>
<warning>
<paragraph>This is a warning.</paragraph>
</warning>
</section>
</document>
""", textAllowedInElementWithName: { $0 == "paragraph" })
let transformation = XTransformation {
XRule(forElements: "paragraph") { element in
let style: String? = if element.parent?.name == "warning" {
"color:Red"
} else {
nil
}
element.replace {
XElement("p", ["style": style]) {
element.content
}
}
}
XRule(forElements: "hint", "warning") { element in
element.replace {
XElement("div") {
XElement("p", ["style": "bold"]) {
element.name.uppercased()
}
element.content
}
}
}
}
transformation.execute(inDocument: document)
document.echo(pretty: true)
Result:
<document>
<section>
<div>
<p style="bold">HINT</p>
<p>This is a hint.</p>
</div>
<div>
<p style="bold">WARNING</p>
<p style="color:Red">This is a warning.</p>
</div>
</section>
</document>
This method might not be fully applicable in some transformations.
Transformations with attachments for context information
To have information about the context in the original document of transformed elements, attachments might be used. See how in the following code attached: ["source": element.name] is used in the construction of the div element, and how this information is then used in the rules for the paragraph element (the input document is the same as in the section “Transformations with inverse order” above; note that the inverse order described in that section is not used here):
let transformation = XTransformation {
XRule(forElements: "hint", "warning") { element in
element.replace {
XElement("div", attached: ["source": element.name]) {
XElement("p", ["style": "bold"]) {
element.name.uppercased()
}
element.content
}
}
}
XRule(forElements: "paragraph") { element in
let style: String? = if element.parent?.attached["source"] as? String == "warning" {
"color:Red"
} else {
nil
}
element.replace {
XElement("p", ["style": style]) {
element.content
}
}
}
}
transformation.execute(inDocument: document)
document.echo(pretty: true)
The result is the same as in the section “Transformations with inverse order” above.
Transformations with document versions
As explained in the above section about rules, sometimes you need to know the original context of a transformed element. For this you can use document versions, as explained below.
Note that this method comes with a penalty regarding efficiency because it needs to create a (temporary) clone, but for very difficult transformations it might come in handy. The method might be used when you need to examine the original context in a complex way.
You first create a document version (this creates a clone such that your current document contains backlinks to the clone), and in certain rules, you might then copy the backlink from the node to be replaced by using the withBackLinkFrom: argument in the creation of an element (the input document is the same as in the section “Transformations with inverse order” above):
let transformation = XTransformation {
XRule(forElements: "hint", "warning") { element in
element.replace {
XElement("div", withBackLinkFrom: element) {
XElement("p", ["style": "bold"]) {
element.name.uppercased()
}
element.content
}
}
}
XRule(forElements: "paragraph") { element in
let style: String? = if element.parent?.backlink?.name == "warning" {
"color:Red"
} else {
nil
}
element.replace {
XElement("p", ["style": style]) {
element.content
}
}
}
}
// make a clone with inverse backlinks,
// pointing from the original document to the clone:
document.makeVersion()
transformation.execute(inDocument: document)
// remove the clone:
document.forgetLastVersion()
document.echo(pretty: true)
The result is the same as in the section “Transformations with inverse order” above.
Transformations with traversals
There is also another possibility for formulating transformations, which uses traversals and which can also be applied to parts of a document or to XML trees that are not part of a document.
As the XML tree can be changed during a traversal, you can traverse an XML tree and change the tree during the traversal by e.g. formulating manipulations according to the name of the current element inside a switch statement.
If you then formulate manipulations during the down direction of the traversal, you know that parents or other ancestors of the current node have already been transformed. Conversely, if you formulate manipulations only inside the up: traversal part and never manipulate any ancestors of the current element, you know that the parent and other ancestors are still the original ones (the input document is the same as in the section “Transformations with inverse order” above):
for section in document.elements("section") {
section.traverse { node in
// -
} up: { node in
if let element = node as? XElement {
guard node !== section else { return }
switch element.name {
case "paragraph":
let style: String? = if element.parent?.name == "warning" {
"color:Red"
} else {
nil
}
element.replace {
XElement("p", ["style": style]) {
element.content
}
}
case "hint", "warning":
element.replace {
XElement("div") {
XElement("p", ["style": "bold"]) {
element.name.uppercased()
}
element.content
}
}
default:
break
}
}
}
}
document.echo(pretty: true)
As the root of the traversal must not be removed during the traversal, there is a corresponding guard statement.
The result is the same as in the section “Transformations with inverse order” above.
Note that when using traversals for transforming an XML tree, using several transformations instead of one does have a negative impact on efficiency.
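The following is a minimal sketch of a traversal applied to an XML tree that is not part of a document. The small tree is a made-up example, and only calls that already appear in this section (traverse with an up: part, replace, content, echo) are used:

```swift
import SwiftXML

// A standalone tree that is not part of any document (hypothetical example data).
let tree = XElement("section") {
    XElement("hint") {
        XElement("paragraph") { "Take care." }
    }
}

tree.traverse { _ in
    // nothing to do on the way down
} up: { node in
    // Only handle elements, and never replace the root of the traversal.
    guard let element = node as? XElement, element !== tree else { return }
    if element.name == "paragraph" {
        element.replace {
            XElement("p") {
                element.content
            }
        }
    }
}

tree.echo()
```

Because the manipulation happens only in the up: part, the parent of each paragraph is still the original one when the paragraph is replaced.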
Keeping element identity during transformations
When transforming elements, it might be convenient to keep the identity of the transformed elements, so that e.g. the backlink property also works for a parent. In that case it might be better to just change the name and the attributes of an element during the transformation instead of replacing it by a new one.
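A minimal sketch of such an in-place change follows. It assumes that the name property of an element can be set directly and that renaming inside a rule is safe; the input is a made-up fragment, not the document from the earlier sections:

```swift
import SwiftXML

// Hypothetical input for illustration.
let document = try parseXML(fromText: """
    <section><warning><paragraph>Be careful.</paragraph></warning></section>
    """)

let transformation = XTransformation {
    XRule(forElements: "paragraph") { element in
        // No element is replaced in this sketch, so the parent is still
        // the original element and its name can be examined directly.
        if element.parent?.name == "warning" {
            element["style"] = "color:Red"
        }
        // Assumption: `name` is settable; the element itself stays the same object.
        element.name = "p"
    }
}

transformation.execute(inDocument: document)
document.echo(pretty: true)
```

Any reference held to the paragraph element before the transformation still points to the renamed element afterwards.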
Prefixes and namespaces
```swift
let mathMLPrefix = document.prefix(forNamespaceURI: "http://www.w3.org/1998/Math/MathML")
for element in document.elements(prefix: mathMLPrefix, "math", "mo", "mi") {
print("element \"\(element.name)\" with prefix \"\(element.prefix ?? "")\"")
}
```
A namespace is referenced by a unique URI (Uniform Resource Identifier) and is supposed to differentiate between elements of different purpose or origin. Namespaces in the serialization of an XML document (i.e. in the textual file) are defined as attributes in the form `xmlns:prefix="namespace URI"` or `xmlns="namespace URI"` (with some values for the prefix `prefix` and the namespace URI `namespace URI`) and are valid in their respective contexts (i.e. in the according part of the XML tree). Elements that belong to that namespace then have the name `prefix:name` (with some value for `name`) in this serialization. `prefix:name` is then also called the “qualified” name and the value `name` is called the “local” name. In the case of `xmlns="namespace URI"`, all elements in the context without a prefix are supposed to belong to the namespace, as long as no competing definition occurs. An attribute can also have a namespace prefix; it is then considered an addition to the attributes that an element already has, which should _not_ have a namespace prefix set.
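To make the terms “qualified” and “local” name concrete, here is a small sketch in plain Swift. The function `splitQualifiedName` is a hypothetical helper for illustration only; it is not part of SwiftXML and works purely on the text, without checking whether the prefix is actually declared:

```swift
/// Splits a qualified name as written in a serialization into its prefix
/// (if any) and its local name. Hypothetical helper, not a SwiftXML API.
func splitQualifiedName(_ qualifiedName: String) -> (prefix: String?, name: String) {
    // Without a colon there is no prefix, the whole text is the local name.
    guard let colon = qualifiedName.firstIndex(of: ":") else {
        return (nil, qualifiedName)
    }
    let prefix = String(qualifiedName[..<colon])
    let name = String(qualifiedName[qualifiedName.index(after: colon)...])
    return (prefix, name)
}

let qualified = splitQualifiedName("math:mi")
print(qualified.prefix ?? "(no prefix)", qualified.name)

let unqualified = splitQualifiedName("title")
print(unqualified.prefix ?? "(no prefix)", unqualified.name)
```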
The handling of namespaces in this library differs from other libraries for XML in that the prefix plays a more prominent role. In addition to the name, elements can also have a prefix, which is not only useful for referencing namespaces, but can also be used independently of namespaces to distinguish between elements with the same name during the processing of a document. It is also very straightforward to write code that works regardless of whether a namespace is used for the corresponding elements, without losing the definiteness if the namespace is declared.
In this library, an element has the `prefix` property, which is `nil` by default and which denotes the prefix (without the colon), and the `name` property, which denotes the “local” name (there is no method to get the “qualified” name, as this should not be of any use during processing). Prefixes are crucial for direct access to elements and thus also differentiate the rules accordingly. For elements with a prefix, rules or searches based on element names like `children("someName")` have to use the additional `prefix:` argument as in `children(prefix: myPrefix, "someName")` to find the according elements; the method call `children("someName")` only finds elements which do not have the prefix set. If really needed, use e.g. `children({ $0.name == "..." })` to find elements with a certain name independently of the prefix. You can also search only by prefix, e.g. with `descendants(prefix: myPrefix)`. If you use these methods without any arguments, e.g. `descendants()`, note that _only elements without prefix are found;_ this is different from using the according property, e.g. `descendants`, which is independent of prefixes. Use the methods `has(prefix:name:)` and `set(prefix:name:)` of an element to conveniently check and set the two values of `prefix` and `name`. Using an empty prefix value as in `myElement.prefix = ""` actually sets the prefix to `nil`.
If you need a new prefix independently of namespaces, use the method `registerIndependentPrefix(withPrefixSuggestion:)` of the document, which returns the actual prefix to be used. If you add an element to a document with a literal prefix (not using the `prefix` property), this prefix will not be used as a prefix by subsequent calls of `register(namespaceURI:withPrefixSuggestion:)` or `registerIndependentPrefix(withPrefixSuggestion:)`, but nothing prevents collisions with previously registered prefixes.
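The three kinds of search described above can be compared in a short sketch. The input is a made-up fragment with one `mi` element inside the MathML namespace and one without any prefix; only methods named in this section are used:

```swift
import SwiftXML

let mathML = "http://www.w3.org/1998/Math/MathML"

// Hypothetical input: one prefixed and one unprefixed "mi" element.
let document = try parseXML(fromText: """
    <a xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mi>x</m:mi><mi>y</mi></a>
    """, namespaceAware: true)

// The actual prefix may differ from the one in the source, so ask the document.
let prefix = document.prefix(forNamespaceURI: mathML)

for a in document.elements("a") {
    // Search with prefix: elements carrying the namespace prefix.
    let prefixed = Array(a.children(prefix: prefix, "mi"))
    // Search without prefix: only elements that have no prefix set.
    let unprefixed = Array(a.children("mi"))
    // Search by condition: the closure only looks at the local name.
    let regardless = Array(a.children({ $0.name == "mi" }))
    print(prefixed.count, unprefixed.count, regardless.count)
}
```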
When reading a document, namespace prefix definitions are only recognized if the argument `namespaceAware` is set to `true` in the call of the parse function used. The namespaces with their prefixes are registered at the document, according namespace attributes (`xmlns:...=...` or `xmlns=...`) are not (!) set in the tree and only appear in a serialization of the document or of parts of it (they appear after every other attribute at the top element of the serialization). To register a new namespace with its prefix, use the method `register(namespaceURI:withPrefixSuggestion:)` of the document which returns the actual prefix to be used.
During the reading of the document, an element that uses a namespace prefix defined in its context then gets the name _without_ the prefix (and without the separating colon); the prefix (without the separating colon) is separately stored in the `prefix` property of the element (which by default is `nil`). The actual prefixes might get changed during this process to avoid multiple prefix definitions for the same namespace URI or collisions; use the method `prefix(forNamespaceURI:)` of the document to get the actual prefix. On the other hand, an element with a colon in its original name whose literal prefix does not match a defined namespace prefix in its context then always keeps the full name and gets the prefix value `nil`. But such a literal prefix might cause the actual value of a namespace prefix to change during reading, so that in a serialization of the document the element does not acquire a different meaning.
When you add an element to a document with a `prefix` property for which a namespace URI is registered, you supposedly want to reference this namespace.
During serialization, every prefix value which is not `nil` is written as the prefix of the name (with a separating colon). Use the arguments `overwritingPrefixesForNamespaceURIs:` and `overwritingPrefixes:` of the serialization and output methods (each with an according map which has the prefixes for the serialization as values) to change prefixes in the serialization, where an empty String value means not outputting a prefix. Independently from those two arguments, use the argument `suppressDeclarationForNamespaceURIs:` to suppress the according namespace declarations in the output. Be careful with those settings as there is no check for consistency.
Some XML documents declare a namespace at the top of the document in the form `xmlns="..."`, i.e. without a prefix, to define the schema to be used for the document. When reading such a document with `namespaceAware: true`, a prefix is consequently created for this namespace and used for all according elements to preserve their affiliation to the namespace. Rules and the usual name-based searches then have to take that prefix into account. If you want to avoid this, use the setting `silentEmptyRootPrefix: true` when parsing. The according namespace URI is then still registered at the document, but with prefix value `""`, and the according elements then have no prefix value set (their prefix value is `nil`), so no prefix value has to be considered in rules and searches. We then call this namespace a “silent” namespace. The method `prefix(forNamespaceURI:)` of the document returns `nil` for such a namespace, so you can use the prefix returned by this method for rules and searches regardless of the setting of `silentEmptyRootPrefix:` and also use this prefix in the construction of according elements. When adding an element without a prefix in the face of a silent namespace, the element is considered to belong to the silent namespace.
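The following sketch parses the same source once with the default setting and once with a silent namespace, and then runs the same search on both documents. The namespace URI and the element names are made up, and the order of the `namespaceAware:` and `silentEmptyRootPrefix:` arguments is an assumption:

```swift
import SwiftXML

let uri = "http://example.org/ns" // hypothetical namespace URI
let source = """
    <article xmlns="http://example.org/ns"><title>Hello</title></article>
    """

// Default setting: a prefix is created for the namespace declared without prefix.
let explicit = try parseXML(fromText: source, namespaceAware: true)
let explicitPrefix = explicit.prefix(forNamespaceURI: uri)

// Silent namespace: the elements keep the prefix value nil.
let silent = try parseXML(fromText: source, namespaceAware: true, silentEmptyRootPrefix: true)
let silentPrefix = silent.prefix(forNamespaceURI: uri)

// The same prefix-aware search is used for both documents.
for (document, prefix) in [(explicit, explicitPrefix), (silent, silentPrefix)] {
    for title in document.elements(prefix: prefix, "title") {
        print("found \"\(title.name)\" with prefix \"\(title.prefix ?? "")\"")
    }
}
```

Writing searches with the prefix returned by `prefix(forNamespaceURI:)` keeps the code independent of the `silentEmptyRootPrefix:` setting.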
When writing code that takes a possible prefix into account (i.e. the code should work regardless of whether `prefix` has a value or is `nil`), test your code with appropriate prefixes set, e.g. with the default `silentEmptyRootPrefix: false` and an according namespace definition in the source.
When moving elements between documents, missing namespaces with their prefixes are added to the target document, and prefixes of the moved elements are adjusted if necessary. For a removed or cloned element, the according namespace URI can still be found as long as the original document still exists and has not changed this value, so the element then behaves the same as being directly moved between documents.
Generally, there is no need to change any prefix for a registered namespace during processing (there are also no tools added that would simplify this); just use the prefix returned by `prefix(forNamespaceURI:)` and, if necessary, define prefixes for a serialization.
Attributes can also have a prefix; set or get an attribute value via `element[prefix,name]`, where `prefix` might be `nil`. With regard to namespaces, attributes are treated in the same way as elements, but independently of them. The attributes that do not have an explicit namespace prefix set in the source do not get a namespace prefix during parsing, regardless of whether the corresponding element belongs to a namespace. So e.g. `myElement["id"]` also returns the according attribute value of an element that has a prefix, and `document.registeredAttributes("id")` finds all these attributes if `"id"` is a registered attribute name. Consequently, changing the prefix of an element does not change the prefixes of its attributes. For registering attributes with namespace prefixes or their values during parsing, namespace URIs must be given, and these provisions are translated into the according prefixes when the document has been parsed. When creating a document without parsing, to register attributes with prefixes or their values, the prefixes themselves must be specified.
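A short sketch of attribute access on an element that has a prefix follows. The input and the attribute name `display` are made up for illustration, and the two-argument subscript is used as described above:

```swift
import SwiftXML

let mathML = "http://www.w3.org/1998/Math/MathML"

// Hypothetical input: a prefixed element with an unprefixed attribute.
let document = try parseXML(fromText: """
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" id="eq1"><m:mi>x</m:mi></m:math>
    """, namespaceAware: true)
let prefix = document.prefix(forNamespaceURI: mathML)

for math in document.elements(prefix: prefix, "math") {
    // The attribute has no prefix in the source, so it is read without one,
    // although the element itself has a prefix.
    print(math["id"] ?? "(not set)")
    // Set and read an attribute that carries the namespace prefix.
    math[prefix, "display"] = "block"
    print(math[prefix, "display"] ?? "(not set)")
}
```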
Example:
```swift
let source = """
<a>
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML"><math:mi>x</math:mi></math:math>
<b xmlns:math2="http://www.w3.org/1998/Math/MathML">
<math2:math><math2:mi>n</math2:mi><math2:mo>!</math2:mo></math2:math>
</b>
</a>
"""
let document = try parseXML(fromText: source, namespaceAware: true)
document.echo()
```
The resulting output is:
```xml
<a xmlns:math="http://www.w3.org/1998/Math/MathML">
<math:math><math:mi>x</math:mi></math:math>
<b>
<math:math><math:mi>n</math:mi><math:mo>!</math:mo></math:math>
</b>
</a>
```
When searching for elements with prefixes, those prefixes have to be used in the calls of the according methods, see the example at the beginning of this section which has as output:
```text
element "math" with prefix "math"
element "mi" with prefix "math"
element "math" with prefix "math"
element "mi" with prefix "math"
element "mo" with prefix "math"
```
A rule for one of those elements then could be formulated as follows:
```swift
let mathMLPrefix = myDocument.prefix(forNamespaceURI: "http://www.w3.org/1998/Math/MathML")
let transformation = XTransformation {
XRule(forPrefix: mathMLPrefix, "mo") { mo in
...
}
...
}
```
In the examples above, if the namespace URI is not declared in the source (and no according prefixes set at the elements), then the method `prefix(forNamespaceURI:)` of the document returns `nil`, and the code is still valid.
### Using async/await
You can use `traverse` with closures using `await`. And you can use the `async` property of the [Swift Async Algorithms package](https://github.com/apple/swift-async-algorithms) (giving an `AsyncLazySequence`) to apply `map` etc. with closures using `await` (e.g. `element.children.async.map { await a.f($0) }`).
Currently, the SwiftXML package defines a `forEachAsync` method for closure arguments using `await`, but this method might be removed in future versions of the package if the Swift Async Algorithms package defines it for `AsyncLazySequence`.
### Convenience extensions
`XContent` has the following extensions that are very convenient when working with XML in a complex manner:
- `applying`: apply some changes to an instance and return the instance
- `pulling`: take the content and give something else back, e.g. “pulling” something out of it
- `fullfilling`: test a condition for an instance and return the instance if the condition is true, else return `nil`
- `fullfills`: test a condition on an instance and return its result
(`fullfilling` is, in principle, a variant of the `filter` method for just one item.)
It is difficult to show the convenience of those extensions with simple examples, where it is easy to formulate the code without them. But they come in handy when the situation gets more complex.
Example:
```swift
let element1 = XElement("a") {
XElement("child-of-a") {
XElement("more", ["special": "yes"])
}
}
let element2 = XElement("b")
if let childOfA = element1.fullfilling({ $0.name == "a" })?.children.first,
childOfA.children.first?.fullfills({ $0["special"] == "yes" && $0["moved"] != "yes" }) == true {
element2.add {
childOfA.applying { $0["moved"] = "yes" }
}
}
element2.echo()
```
Result:
```text
<b><child-of-a moved="yes"><more special="yes"/></child-of-a></b>
```
`applying` is also predefined for a content sequence or an element sequence, where it is shorter than using the `map` method in the general case (where a `return` statement might have to be included), and you can directly use it to define content (without the `asContent` property described above):
```swift
let myElement = XElement("a") {
XElement("b", ["inserted": "yes"]) {
XElement("c", ["inserted": "yes"])
}
}
print(Array(myElement.descendants.applying{ $0["inserted"] = "yes" }))
```
Result:
```text
[<b inserted="yes">, <c inserted="yes">]
```
Tools
You can read the document properties (including an empty representation of the root element) without parsing the whole document as in the following example (you need to import SwiftXMLInterfaces in order to use XDocumentSource):
let documentProperties = try XDocumentSource.url(myURL).readDocumentProperties()
This is very useful if you need basic information about the kind of document before the actual parsing process, in order to configure the parsing accordingly.
public func copyXStructure(from start: XContent, to end: XContent, upTo: XElement? = nil, correction: ((StructureCopyInfo) -> XContent)?) -> XContent?
Copies the structure from start to end, optionally up to the upTo value. start and end must have a common ancestor. Returns nil if there is no common ancestor. The returned element is a clone of the upTo value if a) it is not nil and b) upTo is an ancestor of the common ancestor or the ancestor itself. Else it is the clone of the common ancestor (but generally with a different content in both cases). The correction can do some corrections.
Debugging
If one uses multiple instances of XRule bundled into an XTransformation to transform a whole document, it can be useful to know which actions belonging to which rules "touched" an element. In debug builds, the file names and line numbers of all actions executed by a transformation are recorded in the encounteredActionsAt property.