Parsing YAML Documents

When we talk about 'loading' a YAML stream, we mean that a YAML document is translated into native types. In Ruby, this might be a Hash, an Array or any other Ruby object. But before YAML is loaded into those types, it must be parsed. Parsing is the stage where the structure of the document becomes apparent, but not the native typing.

YAML.rb gives you access to a YAML document before it is transformed. At this stage, the document is represented as a tree of YAML::YamlNode objects. This tree is useful for working with the data as a raw structure, much as the XML world has its DOM API. You can also use YPath queries to retrieve data from the structure. Schemas can be applied to the YamlNode tree to check that the structure is intact and syntactically correct.

The YAML::parse and YAML::parse_documents methods are the ways of accessing this parsed data.
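
To make the difference between loading and parsing concrete, here is a minimal sketch. It assumes the Psych engine bundled with current Ruby releases rather than the YamlNode API described in this text, so the node classes and accessors (Psych::Nodes::Mapping, root, children, value) are Psych's and not YAML.rb's. The sample document is made up for illustration.

```ruby
require 'yaml'

# Illustrative document, not taken from a real file.
source = "title: YAML.rb\nversion: 1\n"

# Loading goes all the way to native Ruby types: the result is a Hash
# and the version has been typed as an Integer.
loaded = YAML.load(source)

# Parsing stops one step earlier. The structure is known (a mapping
# with two pairs), but every scalar is still untyped text.
doc    = YAML.parse(source)
root   = doc.root                            # assumption: Psych node API
values = root.children.map { |node| node.value }

p loaded
p root.class
p values
```

In the parsed tree the version is still the string "1"; it only becomes an Integer once the document is loaded.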

Parsing a single document

The YAML::parse method has the same syntax as the YAML::load method. A single IO object or String containing a YAML document is passed in to the method. Rather than returning a native Ruby object, though, the YAML::parse method returns a YamlNode representing the document.

tree = YAML::parse( File.open( "README" ) )
puts tree.type_id
# prints:
#   map

title = tree.select( "/title" )[0]
puts title.value
# prints:
#   YAML.rb

obj_tree = tree.transform
puts obj_tree['title']
# prints:
#   YAML.rb
Ex. 43: Parsing a YAML document

The YamlNode returned contains type and value information for the root-level collection or scalar. If, for example, the document contains a mapping at the root level, then the YamlNode will have a type_id of 'map', and a map of YamlNodes will be contained in the object's 'value' property.

node = YAML::parse( <<EOY )
one: 1
two: 2
EOY

puts node.type_id
# prints: 'map'

p node.value['one']
# prints key and value nodes: 
#   [ #<YAML::YamlNode:0x8220278 @type_id="str", @value="one", @kind="scalar">, 
#     #<YAML::YamlNode:0x821fcd8 @type_id="int", @value="1", @kind="scalar"> ]

# A mapping node can also be indexed like a Hash to get just the value node
p node['one']
# prints: #<YAML::YamlNode:0x821fcd8 @type_id="int", @value="1", @kind="scalar"> 
Ex. 30: YamlNode representing a root-level mapping

Traversing a tree of YamlNodes can be painstaking in comparison to having the native types around. YPath statements are a much quicker means of querying for the data you need. YPath queries also give you a way to build new sets of YamlNodes for transformation.
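
As a rough illustration of what manual traversal involves, the sketch below collects every player's given name by walking the node tree by hand. It assumes the Psych engine bundled with current Ruby rather than the YamlNode API described here, and the mapping_value helper is hypothetical, written only for this example.

```ruby
require 'yaml'

players = YAML.parse(<<EOY)
player:
  - given: Sammy
    family: Sosa
  - given: Ken
    family: Griffey
  - given: Mark
    family: McGwire
EOY

# Hypothetical helper: return the value node stored under +key+ in a
# mapping node, or nil if the key is absent. Psych keeps a mapping's
# children as a flat list of alternating key and value nodes.
def mapping_value(mapping, key)
  mapping.children.each_slice(2) do |k, v|
    return v if k.value == key
  end
  nil
end

# Step down from the root mapping to the sequence, then into each entry.
sequence = mapping_value(players.root, "player")
given    = sequence.children.map { |entry| mapping_value(entry, "given").value }
p given
```

Every level of the document needs its own lookup step, which is the bookkeeping a path query is meant to replace.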

The YamlNode#select method can be used to retrieve a sequence of matching nodes. The YamlNode#transform method can be applied to a YamlNode to complete the loading of a node into a native Ruby type.

players = YAML::parse( <<EOY )
  player:
    - given: Sammy
      family: Sosa
    - given: Ken
      family: Griffey
    - given: Mark
      family: McGwire
EOY

given = players.select( "/player/*/given" )
p given.transform
# prints:
#   ["Sammy", "Ken", "Mark"]
Ex. 45: Transforming the results of a YPath selection

Parsing many documents

The YAML::parse_documents method works like the YAML::load_documents method, except that the block receives a YamlNode for each document in the stream instead of a native object. YPath expressions, schema validations, and transformations can all be applied to this YamlNode, as described above.

require 'yaml'
# The block form of File.open closes the log once every document is parsed
File.open( "/var/log/apache.yaml" ) { |log|
  YAML::parse_documents( log ) { |tree|
    at = tree.select('/at')[0].value
    type = tree.select('/type')[0].value
    puts "#{at} #{type}"
  }
}
Ex. 44: Parsing YAML documents from a stream
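
For readers on a current Ruby, a similar per-document loop can be sketched with Psych's parse_stream. This is an assumption beyond the text above: it uses Psych's node classes, not YamlNode, and reads raw scalar text directly instead of going through YPath. The log contents are invented for the example.

```ruby
require 'yaml'

# Stand-in for a log file: two documents separated by "---".
log = <<EOY
---
at: 2001-11-23 15:01:42
type: GET
---
at: 2001-11-23 15:02:31
type: POST
EOY

lines = []
YAML.parse_stream(log) do |doc|
  # Pair up the root mapping's alternating key and value nodes,
  # keeping the raw scalar text rather than typed values.
  entry = doc.root.children.each_slice(2).to_h { |k, v| [k.value, v.value] }
  lines << "#{entry['at']} #{entry['type']}"
end
puts lines
```

Because the values are read from the nodes before transformation, the timestamps stay plain strings and are printed exactly as written in the stream.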