Overview

YAML (YAML Ain’t Markup Language) is a human-friendly data serialization format widely used for configuration (Docker, Kubernetes, Ansible) and data storage. Its indentation-based syntax makes it easy to read:

Example:

person:
  name: Alice
  age: 30
  skills:
    - Python
    - JavaScript
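
To make the mapping to Python concrete, here is a minimal sketch that parses the example above with PyYAML's safe_load. It assumes PyYAML is installed; the variable names are only illustrative.

```python
import yaml

document = """\
person:
  name: Alice
  age: 30
  skills:
    - Python
    - JavaScript
"""

# Mappings become dicts and sequences become lists.
data = yaml.safe_load(document)
person = data["person"]

print(person["name"])       # Alice
print(person["skills"][1])  # JavaScript
```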

Why YAML Isn’t Built-in

JSON ships with Python’s standard library (the json module), largely because of its dominance in web APIs. YAML, by contrast, relies on third-party packages, for reasons such as:

  • JSON’s Prevalence: JSON is the standard for data exchange in web services.
  • YAML’s Niche: YAML excels in configurations but is less common for data interchange.
  • Third-Party Support: Packages like PyYAML and ruamel.yaml meet YAML needs effectively.
  • Evolving Specifications: YAML’s versions (1.0, 1.1, 1.2, 1.2.2) introduce complexity and compatibility risks, likely contributing to its exclusion from the standard library, unlike JSON’s single, stable specification.

YAML Specifications

YAML has evolved through multiple versions:

Version  Release                        Key Features
1.0      2004 (first drafts from 2001)  Initial specification
1.1      2005                           Improved syntax and clarifications; treats many plain strings (yes/no, on/off) as booleans
1.2      2009                           JSON compatibility
1.2.2    2021                           Minor updates and clarifications

YAML 1.2 is a superset of JSON, but version differences may cause parsing issues.
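
As a rough illustration of the JSON relationship, the sketch below feeds the same JSON text to the standard json module and to PyYAML and compares the results. Note the assumption: PyYAML is a YAML 1.1 parser, so this holds for typical JSON documents like the sample here but is not guaranteed for every edge case (some exponent-only number forms, for instance, may be read as strings).

```python
import json
import yaml

json_text = (
    '{"name": "Alice", "age": 30, "skills": ["Python", "JavaScript"], '
    '"active": true, "manager": null}'
)

from_json = json.loads(json_text)
# The JSON text is read as YAML "flow style" mappings and sequences.
from_yaml = yaml.safe_load(json_text)

print(from_json == from_yaml)
```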

Python YAML Packages

Key packages for YAML handling include:

Package      YAML Version  Key Features                    Best For
PyYAML       1.1           Simple parsing/dumping          Basic YAML tasks
ruamel.yaml  1.2           Round-trip, preserves comments  Advanced YAML, formatting preservation
strictyaml   1.1/1.2       Type-safe, restricted parsing   Secure, limited-feature parsing

Install via pip:

pip install pyyaml
pip install ruamel.yaml
pip install strictyaml

When to Use Each Package

  • PyYAML: Ideal for simple YAML 1.1 parsing or legacy systems. Lacks YAML 1.2 support.
  • ruamel.yaml: Best for YAML 1.2 or when preserving comments/formatting is needed. More extensible.
  • strictyaml: Use for secure, type-safe parsing with restricted features.

Key YAML Features

YAML offers:

  • Readability: Uses spaces (not tabs) for indentation.
  • Data Types: Scalars (strings, numbers), sequences (lists), mappings (dictionaries).
  • Comments: Supports # for inline documentation.
  • Extensibility: Allows custom tags for new data types.
  • Anchors/Aliases: Reduces duplication with & and *.
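
The data-type and extensibility points can be sketched with PyYAML as follows. The Size class, the !size tag, and the SizeLoader subclass are hypothetical names invented for this illustration; registering the constructor on a SafeLoader subclass is one reasonable way to keep the extra tag local to that loader.

```python
from dataclasses import dataclass

import yaml


@dataclass
class Size:  # hypothetical application type
    width: int
    height: int


class SizeLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tag is not added to SafeLoader itself."""


def construct_size(loader, node):
    # Build a Size from the mapping that follows the !size tag.
    return Size(**loader.construct_mapping(node))


SizeLoader.add_constructor("!size", construct_size)

document = """\
title: "Hello, World!"
count: 42
ratio: 0.5
enabled: true
missing: null
window: !size {width: 1, height: 2}
"""

data = yaml.load(document, Loader=SizeLoader)
type_names = {key: type(value).__name__ for key, value in data.items()}
print(type_names)
print(data["window"])
```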

Example with Comments and Anchors/Aliases:

# Comments start with # for documentation
# Scalars: basic data types
string: "Hello, World!"
number: 42
boolean: true
null_value: null
 
# Sequences: lists
skills:
  - Python
  - JavaScript
 
# Mappings: dictionaries
person:
  name: Alice
  age: 30
 
# Anchors/Aliases: reuse data to avoid duplication
config: &config  # Define anchor 'config'
  env: production
  retries: 3
 
service:
  <<: *config  # Merge 'config' anchor
  retries: 6
  port: 8080
 
database:
  <<: *config  # Reuse 'config' anchor
  host: localhost
  port: 5432

The service and database sections are equivalent to the following expanded version, with the duplication written out:

service:
  env: production
  retries: 6  # the value set in service overrides the merged value of 3
  port: 8080
 
database:
  env: production
  retries: 3
  host: localhost
  port: 5432
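
One way to check this equivalence is to load the anchored version and compare it with the expanded values. The sketch below assumes PyYAML, which resolves the << merge key when loading; other parsers may handle the merge key differently.

```python
import yaml

with_anchor = """\
config: &config
  env: production
  retries: 3

service:
  <<: *config
  retries: 6
  port: 8080

database:
  <<: *config
  host: localhost
  port: 5432
"""

data = yaml.safe_load(with_anchor)

# Keys written explicitly in a mapping win over keys pulled in by the merge.
print(data["service"])
print(data["database"])
```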

Reading and Writing YAML

PyYAML Example:

import yaml
# Read from string
yaml_string = "name: Bob\nage: 25"
data = yaml.safe_load(yaml_string)
print(data["name"])  # Output: Bob
 
# Write to file
data = {"name": "Charlie", "age": 35}
with open("output.yaml", "w") as file:
    yaml.safe_dump(data, file)
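
A detail worth knowing when writing files: safe_dump sorts mapping keys alphabetically by default. The sketch below, which assumes PyYAML 5.1 or newer (where the sort_keys parameter is available), shows how to keep insertion order instead.

```python
import yaml

data = {"name": "Charlie", "age": 35, "skills": ["Python", "JavaScript"]}

# Default behaviour: keys are emitted in sorted order.
sorted_text = yaml.safe_dump(data)

# sort_keys=False keeps the dictionary's insertion order.
ordered_text = yaml.safe_dump(data, sort_keys=False)

print(sorted_text)
print(ordered_text)
```

Either output loads back to the same dictionary; only the order of keys in the file differs.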

CAUTION

Never parse YAML from untrusted sources with yaml.load() unless using a safe loader. Always prefer safe_load() or explicitly specify a safe loading strategy.
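
To see what the safe loader protects against, the following sketch passes a document carrying a Python-specific tag to safe_load. The payload here is deliberately harmless (it only names os.getcwd) and is an illustrative assumption, not a real attack sample.

```python
import yaml

# A document that asks the loader to call a Python function while loading.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # The safe loader has no constructor for python/* tags, so loading fails.
    rejected = True
    print(f"Rejected: {type(exc).__name__}")
```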

ruamel.yaml Example:

from ruamel.yaml import YAML
yaml = YAML()
with open("config.yaml", "r") as file:
    data = yaml.load(file)
data["age"] = 35
with open("output.yaml", "w") as file:
    yaml.dump(data, file)

Compatibility Issues

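  • YAML Versions: YAML 1.1 and 1.2 resolve some plain scalars differently (for example, yes/no and on/off are booleans in 1.1 but strings in the 1.2 core schema), so a document written for one version can load with different values in a parser for the other, such as the YAML 1.1-based PyYAML.
  • ruamel.yaml: Version 0.18.0 deprecated the module-level load/dump functions; use YAML().load()/YAML().dump() on a YAML instance instead.
  • JSON: Has a single specification with no versioning issues, and valid JSON is also valid YAML 1.2.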

TIP

If you rely on on/off as booleans, stick with a YAML 1.1–aware parser (e.g., PyYAML). For YAML 1.2 (e.g., ruamel.yaml), quote them or replace them with true/false.
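
The boolean behaviour described in the tip can be observed directly with PyYAML. The keys in this sketch are illustrative; the point is that unquoted on and no are resolved as booleans by a YAML 1.1 parser, while quoting keeps them as strings.

```python
import yaml

document = """\
feature: on
country: no
quoted_feature: "on"
explicit: true
"""

data = yaml.safe_load(document)
print(data)
# {'feature': True, 'country': False, 'quoted_feature': 'on', 'explicit': True}
```

Quoting such values, or writing true/false explicitly, avoids depending on the parser's YAML version.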

Best Practices:

  • Pass encoding='utf-8' to open() when reading or writing YAML files that contain non-ASCII text.

  • Use safe_load or YAML(typ='safe') so that untrusted input cannot trigger arbitrary object construction (code injection).

  • Handle errors:

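    import yaml

    # Report syntax errors instead of letting the program crash
    try:
        with open("config.yaml", encoding="utf-8") as file:
            data = yaml.safe_load(file)
    except yaml.YAMLError as e:
        print(f"Invalid YAML: {e}")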
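
The encoding and error-handling practices can be combined in a small helper. In this sketch, load_yaml_file and the temporary file names are hypothetical; it assumes PyYAML, whose parse errors usually carry a problem_mark with the position of the failure.

```python
import tempfile
from pathlib import Path

import yaml


def load_yaml_file(path):
    """Return (data, None) on success or (None, message) on a parse error."""
    try:
        with open(path, encoding="utf-8") as file:
            return yaml.safe_load(file), None
    except yaml.YAMLError as exc:
        # Marked errors expose the position where parsing failed (0-based).
        mark = getattr(exc, "problem_mark", None)
        where = f" at line {mark.line + 1}, column {mark.column + 1}" if mark else ""
        return None, f"Invalid YAML{where}: {exc}"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.yaml"
    bad = Path(tmp) / "bad.yaml"
    good.write_text("name: Zoë\ncity: München\n", encoding="utf-8")
    bad.write_text("name: Bob\nage: [1, 2\n", encoding="utf-8")  # unclosed list

    good_data, good_error = load_yaml_file(good)
    bad_data, bad_error = load_yaml_file(bad)

print(good_data)
print(bad_error)
```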

Applications

  • Configuration Files: Defines services in Docker, Kubernetes, Ansible.
  • Data Serialization: Stores structured, readable data.
  • DevOps: Simplifies infrastructure-as-code workflows.