XML (Extensible Markup Language) can be found everywhere, from system configuration files to websites. XML files are logically organized: they group related items together and attach properties to each item. However, it’s not always easy to work with the data that is held inside an XML file.
In this series of articles, I’m going to show you how to use PowerShell to work with XML files. But before we can dig into the nuts and bolts of searching and updating XML with PowerShell, you might benefit from learning the terms and some basics about working with XML files.
To get you up to speed with what XML is all about, I’ll explain some key XML concepts that you’ll need to understand before we continue. More specifically, I’ll cover the importance of nodes, attributes, comments, and namespaces.
Nodes are the basic building blocks of data in XML, and you’ll be working with nodes most of the time. So what are nodes? Each node is one item in the nested hierarchy of properties and data inside the XML code itself, as shown in the image above.
Attributes are properties for an individual node. The node will identify all of the data about the object, but the node itself can have attributes, such as title or name. So what makes it an attribute instead of a node? It’s all in the XML markup. If the property is in its own XML tag, then it’s a node. If it’s included as part of an existing node, then it’s an attribute.
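To make the difference between a node and an attribute concrete, here is a minimal sketch. The library, book, and author names are made up for illustration and don't come from any real file.

```powershell
# Hypothetical sample: <book> is a node, title is an attribute on that node,
# and <author> is a child node because it has its own tag.
[xml]$Sample = @'
<library>
  <book title="PowerShell Basics">
    <author>Jane Doe</author>
  </book>
</library>
'@

$Book = $Sample.library.book

# Attribute: stored inside the <book> tag itself
$BookTitle = $Book.GetAttribute('title')

# Child node: stored in its own <author> tag
$BookAuthor = $Book.author

"$BookTitle by $BookAuthor"
```

The title lives inside the opening book tag, so it's an attribute. The author has its own tag, so it's a node.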
Similar to HTML, comments in XML start with the <!-- characters and end with -->. The following line shows a sample:
<!-- start and end with -->
Any text between those opening and closing characters is considered a comment and will not be processed as an XML node that holds data.
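As a quick illustration, the sketch below loads a small, made-up settings document and lists the type of each item under the root. The element names are assumptions for the example only.

```powershell
# Hypothetical sample containing one comment and one element
[xml]$Sample = @'
<settings>
  <!-- start and end with -->
  <option>enabled</option>
</settings>
'@

# List each child of <settings> along with its node type
$ChildTypes = @($Sample.settings.ChildNodes | ForEach-Object { "$($_.NodeType): $($_.LocalName)" })
$ChildTypes
```

The comment shows up with a node type of Comment rather than Element, which is how you can tell it apart from the data you actually care about.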
Namespaces can add a lot of complexity to the XML discussion. While they’re an important part of XML in general, they’re not as important when working with a specific XML file in PowerShell. Just know that if two different vendors both store customer data in XML, that customer data could look completely different in the two files. One vendor could have very little defined in its customer data XML, while the second vendor has a robust data collection.
The differences between the two customer XMLs are captured by namespaces, which identify the definition that each part of an XML file belongs to. A namespace is identified by a Uniform Resource Identifier (URI), which usually looks like a web address but mainly serves as a unique name for the definition.
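Here is a small sketch of the two-vendor idea. Both documents and both example.com URIs are hypothetical; they only show where the namespace appears and how to read it.

```powershell
# Two hypothetical vendors, each declaring its own namespace URI on the root node
[xml]$VendorA = '<customer xmlns="http://example.com/vendor-a"><name>Contoso</name></customer>'
[xml]$VendorB = '<customer xmlns="http://example.com/vendor-b"><name>Contoso</name><tier>Gold</tier></customer>'

# NamespaceURI reports which definition each root node belongs to
$UriA = $VendorA.DocumentElement.NamespaceURI
$UriB = $VendorB.DocumentElement.NamespaceURI

$UriA
$UriB
```

Both files have a customer node, but the namespace tells you they follow different definitions.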
One of the hardest and most repeated lessons that you’re likely to learn is that XML is case sensitive. You won’t be able to get results for a node or attribute if you’ve got a capital letter in your search where a lowercase letter should be.
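The sketch below shows the kind of miss I’m talking about, using an XPath search through the SelectSingleNode method. The Servers document is a made-up example.

```powershell
# Hypothetical sample with capitalized node names
[xml]$Sample = '<Servers><Server Name="web01" /></Servers>'

# The search path matches the casing in the file, so the node is found
$Match = $Sample.SelectSingleNode('/Servers/Server')

# Same search in lowercase: nothing matches, so the result is $null
$NoMatch = $Sample.SelectSingleNode('/servers/server')

$null -eq $Match     # False
$null -eq $NoMatch   # True
```

No error is raised for the lowercase search. You just get nothing back, which is why this one is so easy to overlook.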
Now that we’ve covered the basics of what XML is and how it is structured, let’s discuss ways that you can get XML code into PowerShell. Here are a few PowerShell cmdlets that are useful for getting XML data into and out of a session.
Use the Export-CliXML cmdlet to turn data that you’re processing into an XML file that can be saved or used later.
If you’ve already saved your data as CliXML, then it’s really easy to get it back by using the Import-CliXML cmdlet. Just point it at the XML file you exported previously and save the output as a variable.
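A minimal round trip might look like the sketch below. The object, its property names, and the MyData.xml file name are placeholders I made up; the file is written to the temp folder and removed at the end.

```powershell
# Hypothetical file name in the current user's temp folder
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'MyData.xml'

# Some sample data to save
$Data = [pscustomobject]@{ Name = 'web01'; Port = 8080 }

# Export the object to a CliXML file
$Data | Export-Clixml -Path $Path

# Import it again and save the output as a variable
$Restored = Import-Clixml -Path $Path
$Restored.Name
$Restored.Port

# Clean up the sample file
Remove-Item -Path $Path
```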
Sometimes you’ve got an XML file that doesn’t want to import because the namespace is not clearly defined. You can bypass a lot of headaches by simply getting the content and saving it as XML. This is really easy to do by using a variable that is cast as XML. By casting your variable, you can specify what kind of data the variable holds. Just put [XML], including the square brackets, immediately before the variable where it is declared. Keep it on the same line, because the cast is part of the variable declaration and goes directly before the $.
[XML]$MyXMLVariable
Used by itself, that cast has nothing to work with, because the variable is still empty. To populate the variable, combine the cast with an assignment and use the Get-Content cmdlet to read in the content of an existing file.
[XML]$MyXMLVariable = Get-Content "C:\Windows\system32\inetsrv\config\applicationHost.config"
This would load up a web server’s applicationHost.config file, a frequently accessed XML file used by IIS administrators. Because the cast is part of the assignment, the variable stays typed as XML, and I don’t have to keep including the “[XML]” designator over and over again when working with that variable. You only need to cast it when the variable is created.
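If you don’t have IIS installed, you can try the same technique against a throwaway file. The sketch below writes a tiny, made-up config file to the temp folder and casts its content to XML; the file name and its contents are assumptions for the example, not a real IIS configuration.

```powershell
# Hypothetical sample file in the current user's temp folder
$Path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.config'
Set-Content -Path $Path -Value '<configuration><site name="Default Web Site" /></configuration>'

# Cast and populate the variable in one statement
[xml]$Config = Get-Content -Path $Path
$TypeName = $Config.GetType().FullName
$SiteName = $Config.configuration.site.GetAttribute('name')

# The variable keeps its XML type, so new text assigned to it is converted too
$Config = '<configuration><site name="Second Site" /></configuration>'
$SecondType = $Config.GetType().FullName

$TypeName
$SiteName
$SecondType

# Clean up the sample file
Remove-Item -Path $Path
```

Both type names come back as System.Xml.XmlDocument, even though the second assignment didn’t repeat the cast.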
A node that is a subnode of another is called a child node. A node that has a child node is the parent node of those children. The parent and child terminology works across multiple levels, so you could be correct in calling a node a grandchild or great-grandchild of another node.
Another pair of terms used to describe this relationship is ancestor and descendant. The difference is that ancestor describes every node higher up in the chain instead of one specific hierarchical relationship. In the same way, all children, grandchildren, and so on are described as descendants of a node.
Siblings describes nodes that are at the same level in the hierarchy and share the same parent.
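These relationships map onto properties you can read in PowerShell. The sketch below uses a made-up family document, with node names chosen only to mirror the terms above.

```powershell
# Hypothetical sample: <family> contains <parent>, which contains two <child> nodes
[xml]$Sample = @'
<family>
  <parent>
    <child>first</child>
    <child>second</child>
  </parent>
</family>
'@

# Start from the first <child> node
$FirstChild = $Sample.SelectSingleNode('/family/parent/child')

# Parent: one level up
$ParentName = $FirstChild.ParentNode.LocalName

# Ancestor two levels up (the grandparent)
$GrandparentName = $FirstChild.ParentNode.ParentNode.LocalName

# Sibling: same level, same parent
$SiblingText = $FirstChild.NextSibling.InnerText

# Descendants: every node anywhere below <family>
$DescendantCount = $Sample.SelectNodes('/family//*').Count

$ParentName
$GrandparentName
$SiblingText
$DescendantCount
```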
Attributes are titles and descriptors for a specific node. This is different from a child node, which provides further data about the node. An attribute is usually a name or a number that applies only to that specific instance of the node.
InnerXML does not include the XML for the node itself, but instead has the XML for only the descendants of the node.
OuterXML is the XML for the node as well as for all descendant nodes.
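The difference is easiest to see side by side. The sketch below reads both properties from the same node in a made-up book document.

```powershell
# Hypothetical sample document
[xml]$Sample = '<book title="PowerShell Basics"><author>Jane Doe</author></book>'
$Book = $Sample.DocumentElement

# InnerXml: only what sits inside the <book> node
$Inner = $Book.InnerXml

# OuterXml: the <book> node itself plus everything inside it
$Outer = $Book.OuterXml

$Inner   # <author>Jane Doe</author>
$Outer   # <book title="PowerShell Basics"><author>Jane Doe</author></book>
```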
With a basic understanding of XML documents and how to get them in and out of PowerShell, now you’re ready to go a little bit deeper into working with them.