Moxml: Modern XML processing for Ruby

Contents

Introduction and purpose
Getting started
- Basic document creation
Working with documents
- Using the builder pattern
- Direct document manipulation
XML objects and their methods
Advanced features
Error handling
Configuration
Thread safety
Performance considerations
- Memory management
- Efficient querying
Best practices
- Document creation
- Node manipulation
Contributing
License

Introduction and purpose

Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.

Key features:

- Intuitive, Ruby-idiomatic API for XML manipulation
- Consistent interface across different XML libraries
- Efficient node mapping for XPath queries
- Support for all XML node types and features
- Easy switching between XML processing engines
- Clean separation between interface and implementation
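
The idea of one interface over interchangeable XML engines can be pictured as a thin facade that delegates to an adapter. The following is a minimal sketch in plain Ruby, not Moxml's actual internals: `HashBackend`, `StructBackend`, and `Facade` are hypothetical names invented for illustration, and the two toy backends merely stand in for real XML libraries.

```ruby
# Illustrative only: these classes are hypothetical and are not part of Moxml.

# Toy backend 1 stores an element as a Hash.
module HashBackend
  def self.create_element(name)
    { name: name, text: nil }
  end

  def self.set_text(native, value)
    native[:text] = value
  end

  def self.serialize(native)
    "<#{native[:name]}>#{native[:text]}</#{native[:name]}>"
  end
end

# Toy backend 2 stores an element as a Struct.
module StructBackend
  Node = Struct.new(:tag, :body)

  def self.create_element(name)
    Node.new(name, nil)
  end

  def self.set_text(native, value)
    native.body = value
  end

  def self.serialize(native)
    "<#{native.tag}>#{native.body}</#{native.tag}>"
  end
end

# The facade exposes one API and delegates every call to the chosen backend.
class Facade
  ADAPTERS = { hash: HashBackend, struct: StructBackend }.freeze

  def initialize(adapter)
    @backend = ADAPTERS.fetch(adapter)
  end

  def element(name, text)
    native = @backend.create_element(name)
    @backend.set_text(native, text)
    @backend.serialize(native)
  end
end

puts Facade.new(:hash).element('title', 'Ruby')
puts Facade.new(:struct).element('title', 'Ruby')
```

Calling code only ever talks to `Facade`, so swapping the backend symbol does not change the result.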

Getting started

Install the gem and at least one supported XML library:

# In your Gemfile
gem 'moxml'
gem 'nokogiri'  # Or 'ox' or 'oga'

Basic document creation

require 'moxml'

# Create a new XML document
doc = Moxml.new.create_document

# Add XML declaration
doc.add_declaration(version: "1.0", encoding: "UTF-8")

# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)

# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)

# Output formatted XML
puts doc.to_xml(indent: 2)

Working with documents

Using the builder pattern

The builder pattern provides a clean DSL for creating XML documents:

doc = Moxml.new.build do
  declaration version: "1.0", encoding: "UTF-8"

  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end

      element 'author' do
        text 'Jane Smith'
      end

      comment 'Publication details'
      element 'published', year: '2024'

      cdata '<custom>metadata</custom>'
    end
  end
end
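
Block-based DSLs of this shape are commonly built on `instance_eval`, which runs the block with the builder as `self`. The sketch below shows that technique in isolation; `TinyBuilder` is a hypothetical class written for this illustration and makes no claim about how Moxml's builder is implemented. For simplicity it emits a string and puts text on its own indented line.

```ruby
# Illustrative only: TinyBuilder is a hypothetical class, not Moxml's builder.
class TinyBuilder
  def initialize
    @out = +''
    @depth = 0
  end

  # Evaluate the block with self set to the builder, so bare calls such as
  # `element` and `text` resolve to the methods below.
  def self.build(&block)
    builder = new
    builder.instance_eval(&block)
    builder.result
  end

  def result
    @out
  end

  def element(name, attrs = {}, &block)
    attr_str = attrs.map { |k, v| %( #{k}="#{escape(v)}") }.join
    if block
      line "<#{name}#{attr_str}>"
      @depth += 1
      instance_eval(&block)
      @depth -= 1
      line "</#{name}>"
    else
      line "<#{name}#{attr_str}/>"
    end
  end

  def text(value)
    line escape(value)
  end

  def comment(value)
    line "<!--#{value}-->"
  end

  private

  # Append one indented line to the output buffer.
  def line(str)
    @out << ('  ' * @depth) << str << "\n"
  end

  # Escape the characters that are special in XML text and attribute values.
  def escape(value)
    value.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
  end
end

xml = TinyBuilder.build do
  element 'book', id: 'b1' do
    element 'title' do
      text 'Ruby & XML'
    end
    comment 'Publication details'
    element 'published', year: '2024'
  end
end

puts xml
```

One consequence of `instance_eval` is that instance variables and methods of the calling object are not visible inside the block, which is worth keeping in mind when using any DSL built this way.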

Direct document manipulation

doc = Moxml.new.create_document

# Add declaration
doc.add_declaration(version: "1.0", encoding: "UTF-8")

# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
doc.add_child(root)

# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
root.add_child(book)

# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)

XML objects and their methods

Document object

The Document object represents an XML document and serves as the root container for all XML nodes.

# Creating a document
doc = Moxml.new.create_document
doc = Moxml.new.parse(xml_string)

# Document properties and methods
doc.encoding                # Get document encoding
doc.encoding = "UTF-8"      # Set document encoding
doc.version                # Get XML version
doc.version = "1.1"        # Set XML version
doc.standalone             # Get standalone declaration
doc.standalone = "yes"     # Set standalone declaration

# Document structure
doc.root                   # Get root element
doc.children              # Get all top-level nodes
doc.add_child(node)       # Add a child node
doc.remove_child(node)    # Remove a child node

# Node creation methods
doc.create_element(name)   # Create new element
doc.create_text(content)   # Create text node
doc.create_cdata(content)  # Create CDATA section
doc.create_comment(content) # Create comment
doc.create_processing_instruction(target, content) # Create PI

# Document querying
doc.xpath(expression)      # Find nodes by XPath
doc.at_xpath(expression)   # Find first node by XPath

# Serialization
doc.to_xml(options)        # Convert to XML string

Element object

Elements are the primary structural components of an XML document, representing tags with attributes and content.

# Element properties
element.name               # Get element name
element.name = "new_name"  # Set element name
element.text              # Get text content
element.text = "content"   # Set text content
element.inner_html        # Get inner XML content
element.inner_html = xml   # Set inner XML content

# Attributes
element[name]             # Get attribute value
element[name] = value     # Set attribute value
element.attributes        # Get all attributes
element.remove_attribute(name) # Remove attribute

# Namespace handling
element.namespace         # Get element's namespace
element.namespace = ns     # Set element's namespace
element.add_namespace(prefix, uri) # Add new namespace
element.namespaces        # Get all namespace definitions

# Node structure
element.parent            # Get parent node
element.children          # Get child nodes
element.add_child(node)   # Add child node
element.remove_child(node) # Remove child node
element.add_previous_sibling(node) # Add sibling before
element.add_next_sibling(node)    # Add sibling after
element.replace(node)     # Replace with another node
element.remove           # Remove from document

# Node type checking
element.element?         # Returns true
element.text?           # Returns false
element.cdata?          # Returns false
element.comment?        # Returns false
element.processing_instruction? # Returns false

# Node querying
element.xpath(expression)  # Find nodes by XPath
element.at_xpath(expression) # Find first node by XPath

Text object

Text nodes represent character data in the XML document.

# Creating text nodes
text = doc.create_text("content")

# Text properties
text.content             # Get text content
text.content = "new"     # Set text content

# Node type checking
text.text?              # Returns true

# Structure
text.parent             # Get parent node
text.remove            # Remove from document
text.replace(node)      # Replace with another node

CDATA object

CDATA sections contain text that should not be parsed as markup.

# Creating CDATA sections
cdata = doc.create_cdata("<raw>content</raw>")

# CDATA properties
cdata.content           # Get CDATA content
cdata.content = "new"   # Set CDATA content

# Node type checking
cdata.cdata?           # Returns true

# Structure
cdata.parent           # Get parent node
cdata.remove          # Remove from document
cdata.replace(node)    # Replace with another node

Comment object

Comments contain human-readable notes in the XML document.

# Creating comments
comment = doc.create_comment("Note")

# Comment properties
comment.content         # Get comment content
comment.content = "new" # Set comment content

# Node type checking
comment.comment?        # Returns true

# Structure
comment.parent          # Get parent node
comment.remove         # Remove from document
comment.replace(node)   # Replace with another node

Processing instruction object

Processing instructions provide instructions to applications processing the XML.

# Creating processing instructions
pi = doc.create_processing_instruction("xml-stylesheet",
  'type="text/xsl" href="style.xsl"')

# PI properties
pi.target              # Get PI target
pi.target = "new"      # Set PI target
pi.content            # Get PI content
pi.content = "new"     # Set PI content

# Node type checking
pi.processing_instruction? # Returns true

# Structure
pi.parent             # Get parent node
pi.remove            # Remove from document
pi.replace(node)      # Replace with another node

Attribute object

Attributes represent name-value pairs on elements.

# Attribute properties
attr.name              # Get attribute name
attr.name = "new"      # Set attribute name
attr.value            # Get attribute value
attr.value = "new"     # Set attribute value

# Namespace handling
attr.namespace         # Get attribute's namespace
attr.namespace = ns    # Set attribute's namespace

# Node type checking
attr.attribute?        # Returns true

Namespace object

Namespaces define XML namespaces used in the document.

# Namespace properties
ns.prefix             # Get namespace prefix
ns.uri               # Get namespace URI

# Formatting
ns.to_s              # Format as xmlns declaration

# Node type checking
ns.namespace?        # Returns true

Node traversal and inspection

Each node type provides methods for traversing the document structure:

node.parent               # Get parent node
node.children            # Get child nodes
node.next_sibling        # Get next sibling
node.previous_sibling    # Get previous sibling
node.ancestors           # Get all ancestor nodes
node.descendants         # Get all descendant nodes

# Type checking
node.element?           # Is it an element?
node.text?             # Is it a text node?
node.cdata?            # Is it a CDATA section?
node.comment?          # Is it a comment?
node.processing_instruction? # Is it a PI?
node.attribute?        # Is it an attribute?
node.namespace?        # Is it a namespace?

# Node information
node.document          # Get owning document
node.path              # Get XPath to node
node.line_number       # Get source line number (if available)
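
To make the relationship between these traversal methods concrete, here is a small sketch on a toy tree. `ToyNode` is a hypothetical stand-in for an XML node, and the ordering shown (nearest ancestor first, descendants in document order) is an assumption of this sketch rather than documented Moxml behavior.

```ruby
# Illustrative only: ToyNode is a hypothetical stand-in for an XML node.
class ToyNode
  attr_reader :name, :parent, :children

  def initialize(name)
    @name = name
    @parent = nil
    @children = []
  end

  # Attach a child and return self so calls can be chained.
  def add_child(node)
    node.parent = self
    @children << node
    self
  end

  # Walk parent links upward: nearest ancestor first.
  def ancestors
    result = []
    current = parent
    while current
      result << current
      current = current.parent
    end
    result
  end

  # Depth-first walk of everything below this node, in document order.
  def descendants
    children.flat_map { |child| [child, *child.descendants] }
  end

  # The node that follows this one under the same parent, or nil.
  def next_sibling
    return nil unless parent

    siblings = parent.children
    siblings[siblings.index(self) + 1]
  end

  protected

  attr_writer :parent
end

library = ToyNode.new('library')
book = ToyNode.new('book')
title = ToyNode.new('title')
author = ToyNode.new('author')
library.add_child(book)
book.add_child(title).add_child(author)

p title.ancestors.map(&:name)
p library.descendants.map(&:name)
p title.next_sibling.name
```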

Advanced features

XPath querying and node mapping

Moxml provides efficient XPath querying by leveraging the native XML library’s implementation while maintaining consistent node mapping:

# Find all book elements
books = doc.xpath('//book')
# Returns Moxml::Element objects mapped to native nodes

# Find with namespaces
titles = doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')

# Find first matching node
first_book = doc.at_xpath('//book')

# Chain queries
doc.xpath('//book').each do |book|
  # Each book is a mapped Moxml::Element
  title = book.at_xpath('.//title')
  puts "#{book['id']}: #{title.text}"
end

Namespace handling

# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

# Create element in namespace
title = doc.create_element('dc:title')
title.text = 'Document Title'

# Query with namespaces
doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')

Accessing native implementation

While not typically needed, you can access the underlying XML library’s nodes:

# Get native node
native_node = element.native

# Get adapter being used
adapter = element.context.config.adapter

# Create from native node, reusing the element's context
element = Moxml::Element.new(native_node, element.context)

Error handling

Moxml provides specific error classes for different types of errors that may occur during XML processing:

context = Moxml.new

begin
  doc = context.parse(xml_string)
rescue Moxml::ParseError => e
  # Handles XML parsing errors
  puts "Parse error at line #{e.line}, column #{e.column}"
  puts "Message: #{e.message}"
rescue Moxml::ValidationError => e
  # Handles XML validation errors
  puts "Validation error: #{e.message}"
rescue Moxml::XPathError => e
  # Handles XPath expression errors
  puts "XPath error: #{e.message}"
rescue Moxml::Error => e
  # Handles other Moxml-specific errors
  puts "Error: #{e.message}"
end
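
The order of the rescue clauses matters: Ruby tries them top to bottom, so the specific error classes must come before the base class. The sketch below demonstrates this with a stand-in hierarchy. `SketchXml` and `classify` are hypothetical names; the classes mirror the error names used above, and the `line` and `column` readers follow the usage in that example, but none of this is Moxml's real class definition.

```ruby
# Illustrative only: SketchXml mirrors the error names used above but is a
# hypothetical module, not Moxml's real class definitions.
module SketchXml
  class Error < StandardError; end

  class ParseError < Error
    attr_reader :line, :column

    def initialize(message, line: nil, column: nil)
      super(message)
      @line = line
      @column = column
    end
  end

  class XPathError < Error; end
end

# Ruby tries rescue clauses in order, so subclasses go first and the base
# class acts as the catch-all.
def classify
  yield
  :ok
rescue SketchXml::ParseError => e
  "parse error at #{e.line}:#{e.column}: #{e.message}"
rescue SketchXml::XPathError => e
  "xpath error: #{e.message}"
rescue SketchXml::Error => e
  "other error: #{e.message}"
end

puts classify { raise SketchXml::ParseError.new('unexpected end of input', line: 3, column: 7) }
puts classify { raise SketchXml::XPathError, 'invalid expression' }
puts classify { raise SketchXml::Error, 'something else' }
```

If the base-class clause were listed first, it would match every error and the more specific handlers would never run.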

Configuration

Moxml can be configured globally or per instance:

# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end

# Instance configuration
moxml = Moxml.new do |config|
  config.adapter = :ox
  config.strict = false
end
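
One common way to support both styles is to keep a set of global defaults and hand each new instance its own copy to adjust. The sketch below shows that pattern only; `SketchConfig` and `SketchContext` are hypothetical, and the assumption that an instance starts from a copy of the global settings is made for illustration and is not stated above about Moxml.

```ruby
# Illustrative only: SketchConfig and SketchContext are hypothetical. The idea
# that an instance starts from a copy of the global settings is an assumption
# of this sketch, not documented Moxml behavior.
SketchConfig = Struct.new(:adapter, :strict, :encoding)

class SketchContext
  # Global defaults shared by every new context.
  def self.defaults
    @defaults ||= SketchConfig.new(:nokogiri, true, 'UTF-8')
  end

  # Yield the global defaults so callers can change them in a block.
  def self.configure
    yield defaults
  end

  attr_reader :config

  def initialize
    @config = self.class.defaults.dup # per-instance copy of the defaults
    yield @config if block_given?
  end
end

relaxed = SketchContext.new do |config|
  config.adapter = :ox
  config.strict = false
end
standard = SketchContext.new

p relaxed.config.to_a
p standard.config.to_a
```

Because each instance works on a copy, changing one instance's settings leaves both the global defaults and other instances untouched.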

Thread safety

Moxml is thread-safe when used properly: each instance maintains its own state. If one instance is shared between threads, guard access to it, as this example does with a Mutex:

class XmlProcessor
  def initialize
    @mutex = Mutex.new
    @context = Moxml.new
  end

  def process(xml)
    @mutex.synchronize do
      doc = @context.parse(xml)
      # Modify document
      doc.to_xml
    end
  end
end
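
The two usual ways to keep per-instance state safe are to lock around a shared instance or to give every thread its own instance. The sketch below contrasts them using a toy class; `ToyContext`, `run_shared`, and `run_per_thread` are hypothetical names, and the toy's call counter simply stands in for whatever mutable state a real context might hold.

```ruby
# Illustrative only: ToyContext is a hypothetical stand-in for a parser
# context that keeps mutable state between calls.
class ToyContext
  attr_reader :parsed_count

  def initialize
    @parsed_count = 0
  end

  def parse(xml)
    @parsed_count += 1 # read-modify-write: not atomic across threads
    xml.strip
  end
end

# Pattern 1: one shared context, every access goes through a Mutex.
def run_shared(threads: 4, per_thread: 100)
  shared = ToyContext.new
  lock = Mutex.new
  workers = Array.new(threads) do
    Thread.new do
      per_thread.times { lock.synchronize { shared.parse(' <a/> ') } }
    end
  end
  workers.each(&:join)
  shared.parsed_count
end

# Pattern 2: one context per thread, so no locking is needed.
def run_per_thread(threads: 4, per_thread: 100)
  workers = Array.new(threads) do
    Thread.new do
      own = ToyContext.new
      per_thread.times { own.parse(' <a/> ') }
      own.parsed_count
    end
  end
  workers.sum(&:value)
end

puts run_shared
puts run_per_thread
```

Per-thread instances avoid lock contention at the cost of creating more objects; a shared instance behind a lock is simpler when parsing is infrequent.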

Performance considerations

Memory management

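Moxml maintains a node registry to ensure consistent object mapping. The registry is released together with the document, so drop references to large documents once they are no longer needed:

context = Moxml.new
doc = context.parse(large_xml)
# Process document
doc = nil  # Allow garbage collection of document and registry
GC.start   # Force garbage collection if needed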
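
To illustrate what a node registry does, the sketch below maps each native node to a single wrapper object, keyed by object identity. `ToyDocument` and `ToyWrapper` are hypothetical, and storing the registry as a per-document Hash is an assumption made for this example rather than a description of Moxml's implementation.

```ruby
# Illustrative only: ToyDocument and ToyWrapper are hypothetical, and the
# per-document Hash is an assumption about how such a registry could work.
class ToyWrapper
  attr_reader :native

  def initialize(native)
    @native = native
  end
end

class ToyDocument
  def initialize
    # Keyed by object identity, so two equal-looking native nodes stay distinct.
    @registry = {}.compare_by_identity
  end

  # Return the same wrapper every time the same native node is seen.
  def wrap(native)
    @registry[native] ||= ToyWrapper.new(native)
  end

  def registry_size
    @registry.size
  end
end

doc = ToyDocument.new
native_a = +'<book/>'
native_b = +'<book/>' # equal content, different object

puts doc.wrap(native_a).equal?(doc.wrap(native_a)) # same native node, same wrapper
puts doc.wrap(native_a).equal?(doc.wrap(native_b)) # different native nodes
puts doc.registry_size
```

In this design the registry is reachable only through the document, so once the last reference to the document is gone, the registry and its wrappers become eligible for garbage collection along with it.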

Efficient querying

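Prefer XPath expressions that narrow the search as much as possible:

# Broadest - matches every title anywhere in the document
doc.xpath('//title')

# Narrower result set - only titles that are children of a book,
# though the leading // still searches the whole document
doc.xpath('//book/title')

# Most efficient - only looks at direct children of the context node
root.xpath('./title')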

Best practices

Document creation

# Preferred - using builder pattern
doc = Moxml.new.build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'root' do
    element 'child' do
      text 'content'
    end
  end
end

# Alternative - direct manipulation
doc = Moxml.new.create_document
doc.add_declaration(version: "1.0", encoding: "UTF-8")
root = doc.create_element('root')
doc.add_child(root)

Node manipulation

# Preferred - chainable operations
element
  .add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  .add_child(doc.create_text('content'))

# Preferred - clear node type checking
if node.element?
  node.add_child(doc.create_text('content'))
end

Contributing

1. Fork the repository
2. Create your feature branch (git checkout -b feature/my-new-feature)
3. Commit your changes (git commit -am 'Add some feature')
4. Push to the branch (git push origin feature/my-new-feature)
5. Create a new Pull Request

License

This project is licensed under the BSD-2-Clause License. See the LICENSE file for details.
