Key Takeaways
- YAML is human-readable, ideal for configurations, with versions 1.0 to 1.2.2.
- Python uses PyYAML (YAML 1.1) for simplicity, ruamel.yaml (YAML 1.2) for advanced features.
- YAML’s evolving specifications likely contribute to its absence from Python’s standard library, unlike JSON’s stable standard.
- Use safe parsing, handle encoding/errors, and select the appropriate package for your YAML version and needs.
Overview
YAML (YAML Ain’t Markup Language) is a human-friendly data serialization format widely used for configuration (Docker, Kubernetes, Ansible) and data storage. Indentation-based syntax makes it easy to read:
Example:
person:
  name: Alice
  age: 30
  skills:
    - Python
    - JavaScript
Why YAML Isn’t Built-in
Unlike JSON, which ships in Python's standard library as the json module, YAML requires a third-party package. Likely reasons include:
- JSON’s Prevalence: JSON is the standard for data exchange in web services.
- YAML’s Niche: YAML excels in configurations but is less common for data interchange.
- Third-Party Support: Packages like PyYAML and ruamel.yaml meet YAML needs effectively.
- Evolving Specifications: YAML’s versions (1.0, 1.1, 1.2, 1.2.2) introduce complexity and compatibility risks, likely contributing to its exclusion from the standard library, unlike JSON’s single, stable specification.
YAML Specifications
YAML has evolved through multiple versions:
Version | Release | Key Features |
---|---|---|
1.0 | 2004 | Initial specification |
1.1 | 2005 | Improved syntax and clarifications; resolves many plain words (yes/no, on/off) to booleans |
1.2 | 2009 | JSON compatibility |
1.2.2 | 2021 | Minor updates, clarifications |
YAML 1.2 is a superset of JSON, but version differences may cause parsing issues.
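As a rough illustration of the JSON relationship, the sketch below serializes a dictionary with the standard-library json module and parses the resulting text with a YAML parser. It assumes PyYAML is installed, and the sample record is made up. Because PyYAML targets YAML 1.1, this only shows that typical JSON text loads as YAML; it is not a proof of the YAML 1.2 superset guarantee.

```python
import json

import yaml  # PyYAML (assumed installed: pip install pyyaml)

# Hypothetical sample record, used only for illustration.
record = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "JavaScript"],
    "active": True,
    "manager": None,
}

json_text = json.dumps(record)

# The YAML parser reads the JSON text as flow-style YAML.
from_yaml = yaml.safe_load(json_text)
from_json = json.loads(json_text)

print(from_yaml == from_json)
```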
Python YAML Packages
Key packages for YAML handling include:
Package | YAML Version | Key Features | Best For |
---|---|---|---|
PyYAML | 1.1 | Simple parsing/dumping | Basic YAML tasks |
ruamel.yaml | 1.2 | Round-trip, preserves comments | Advanced YAML, formatting preservation |
strictyaml | 1.1/1.2 | Type-safe, restricted parsing | Secure, limited-feature parsing |
Install via pip:
pip install pyyaml
pip install ruamel.yaml
When to Use Each Package
- PyYAML: Ideal for simple YAML 1.1 parsing or legacy systems. Lacks YAML 1.2 support.
- ruamel.yaml: Best for YAML 1.2 or when preserving comments/formatting is needed. More extensible.
- strictyaml: Use for secure, type-safe parsing with restricted features.
Key YAML Features
YAML offers:
- Readability: Uses spaces (not tabs) for indentation.
- Data Types: Scalars (strings, numbers), sequences (lists), mappings (dictionaries).
- Comments: Supports `#` for inline documentation.
- Extensibility: Allows custom tags for new data types.
- Anchors/Aliases: Reduces duplication with `&` and `*`.
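The extensibility point can be sketched with PyYAML by registering a constructor for an application-defined tag. Everything named here is an assumption made for illustration: the `!endpoint` tag, the `Endpoint` class, and the `EndpointLoader` subclass are hypothetical. The constructor is registered on a subclass of `SafeLoader` so that plain `yaml.safe_load` calls elsewhere are unaffected.

```python
from dataclasses import dataclass

import yaml  # PyYAML (assumed installed)


@dataclass
class Endpoint:
    """Hypothetical type used to illustrate a custom tag."""

    host: str
    port: int


class EndpointLoader(yaml.SafeLoader):
    """SafeLoader subclass that additionally understands !endpoint."""


def construct_endpoint(loader, node):
    # Turn the tagged mapping node into a dict, then wrap it in Endpoint.
    fields = loader.construct_mapping(node)
    return Endpoint(**fields)


# "!endpoint" is a tag name chosen for this sketch, not a standard tag.
EndpointLoader.add_constructor("!endpoint", construct_endpoint)

document = """
primary: !endpoint
  host: localhost
  port: 8080
"""

config = yaml.load(document, Loader=EndpointLoader)
print(config["primary"])
```

Keeping the tag on a dedicated loader class limits which documents can create `Endpoint` objects, which is usually easier to reason about than changing the default loaders globally.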
Example with Comments and Anchors/Aliases:
# Comments start with # for documentation
# Scalars: basic data types
string: "Hello, World!"
number: 42
boolean: true
null_value: null
# Sequences: lists
skills:
  - Python
  - JavaScript
# Mappings: dictionaries
person:
  name: Alice
  age: 30
# Anchors/Aliases: reuse data to avoid duplication
config: &config  # Define anchor 'config'
  env: production
  retries: 3
service:
  <<: *config  # Merge 'config' anchor
  retries: 6
  port: 8080
database:
  <<: *config  # Reuse 'config' anchor
  host: localhost
  port: 5432
The `service` and `database` sections are equivalent to the duplicated version below:
service:
  env: production
  retries: 6  # overridden by the service's own retries value
  port: 8080
database:
  env: production
  retries: 3
  host: localhost
  port: 5432
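One way to check the equivalence is to load the anchored document and compare the merged mappings with the expanded values. This sketch assumes PyYAML is installed. The `<<` merge key is a convention from the YAML 1.1 era, so other parsers may treat it differently.

```python
import yaml  # PyYAML (assumed installed)

document = """
config: &config
  env: production
  retries: 3
service:
  <<: *config
  retries: 6
  port: 8080
database:
  <<: *config
  host: localhost
  port: 5432
"""

data = yaml.safe_load(document)

# Keys set directly on a mapping take precedence over merged keys.
print(data["service"])
print(data["database"])
```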
Reading and Writing YAML
PyYAML Example:
import yaml
# Read from string
yaml_string = "name: Bob\nage: 25"
data = yaml.safe_load(yaml_string)
print(data["name"]) # Output: Bob
# Write to file
data = {"name": "Charlie", "age": 35}
with open("output.yaml", "w") as file:
    yaml.safe_dump(data, file)
CAUTION
Never parse YAML from untrusted sources with `yaml.load()` unless you pass a safe loader. Prefer `safe_load()`, or explicitly specify a safe loading strategy.
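To make the risk concrete, the sketch below feeds `safe_load` a document that uses a Python-specific tag, the kind of tag an unsafe loader may act on. The payload deliberately names a harmless function, and the helper `is_rejected_by_safe_load` is a hypothetical name introduced for this example. PyYAML is assumed to be installed.

```python
import yaml  # PyYAML (assumed installed)

# A document that asks the parser to call a Python function.
# os.getcwd is harmless; a real attack would name something destructive.
untrusted = "!!python/object/apply:os.getcwd []"


def is_rejected_by_safe_load(text):
    """Return True if safe_load refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return True
    return False


print(is_rejected_by_safe_load(untrusted))
print(is_rejected_by_safe_load("name: Bob"))
```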
ruamel.yaml Example:
from ruamel.yaml import YAML
yaml = YAML()
with open("config.yaml", "r") as file:
    data = yaml.load(file)
data["age"] = 35
with open("output.yaml", "w") as file:
    yaml.dump(data, file)
Compatibility Issues
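- YAML Versions: YAML 1.1 and 1.2 resolve some plain scalars differently, so a document written for one version can load with different values under a parser for the other. PyYAML follows YAML 1.1.
- ruamel.yaml: Version 0.18.0 deprecated the module-level `load`/`dump` functions; use `YAML().load()`/`YAML().dump()` instead.
- JSON: A single specification with no versioning issues; JSON documents are also valid YAML 1.2.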
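TIP
If you rely on `on`/`off` as booleans, stick with a YAML 1.1 parser (e.g., PyYAML). Under YAML 1.2 (e.g., `ruamel.yaml`), unquoted `on`/`off` load as strings, so replace them with `true`/`false` to get booleans, or quote them if you want strings under every parser.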
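The YAML 1.1 side of this tip can be observed directly with PyYAML, which is assumed to be installed here. The keys in the sample document are invented. Quoted values stay strings, while unquoted `on`/`off` become booleans.

```python
import yaml  # PyYAML resolves plain scalars using YAML 1.1 rules

document = """
feature_enabled: on
debug: off
strict: true
country: "NO"
label: "on"
"""

data = yaml.safe_load(document)

# Show the Python type each value was resolved to.
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```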
Best Practices:
- Use `encoding='utf-8'` for non-ASCII files.
- Use `safe_load` or `YAML(typ='safe')` to avoid security risks (code injection).
- Handle errors:

      try:
          data = yaml.safe_load(file)
      except yaml.YAMLError as e:
          print(f"Invalid YAML: {e}")
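The three practices can be combined in one small helper. This is a sketch under stated assumptions: `load_yaml_file` is a hypothetical name, PyYAML is assumed to be installed, and the temporary files exist only to exercise the function.

```python
import os
import tempfile

import yaml  # PyYAML (assumed installed)


def load_yaml_file(path):
    """Return the parsed data, or None if the file is not valid YAML."""
    try:
        # An explicit encoding avoids platform-dependent defaults.
        with open(path, "r", encoding="utf-8") as file:
            return yaml.safe_load(file)
    except yaml.YAMLError as error:
        print(f"Invalid YAML in {path}: {error}")
        return None


# Exercise the helper with one valid and one malformed file.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.yaml")
    bad_path = os.path.join(tmp, "bad.yaml")
    with open(good_path, "w", encoding="utf-8") as f:
        f.write("name: Zoë\nage: 30\n")
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write("name: [unclosed\n")

    good_data = load_yaml_file(good_path)
    bad_data = load_yaml_file(bad_path)

print(good_data)
print(bad_data)
```

An empty file also loads as `None`, so a caller that must tell "empty" from "invalid" would probably prefer to let the exception propagate or to raise an error of its own.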
Applications
- Configuration Files: Defines services in Docker, Kubernetes, Ansible.
- Data Serialization: Stores structured, readable data.
- DevOps: Simplifies infrastructure-as-code workflows.