When combining multiple fields (here, an email address and a sample number) into a single unique identifier, you want something compact, deterministic if needed, and with minimal risk of collision. Below are several methods, ordered roughly from strongest to weakest uniqueness guarantees.
1. UUIDv5 (Namespace + Name-Based)
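```python
import uuid

def generate_key(email, sample_number):
    # Join the fields with a separator so that, e.g., ("ab", "c") and ("a", "bc") differ.
    data = f"{email}|{sample_number}"
    # NAMESPACE_DNS only acts as a fixed seed here; any constant namespace UUID works.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, data))
```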
Pros:
- Very low collision risk
- Standard UUID format
- Deterministic (same input → same UUID)
Cons:
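- 36 characters (32 hex digits plus 4 hyphens)
- Based on SHA-1, which is no longer collision-resistant against deliberate attacks; this is acceptable for identifiers built from non-adversarial data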
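For reference, Python's `uuid.uuid5` hashes the namespace's 16 bytes followed by the UTF-8 encoded name with SHA-1, keeps the first 16 bytes, and overwrites the version and variant bits defined by RFC 4122. The sketch below reproduces that by hand to show where SHA-1 enters and why the output is deterministic; `manual_uuid5` is a hypothetical helper for illustration, not part of the `uuid` module.

```python
import hashlib
import uuid


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: rebuild a version-5 UUID step by step."""
    # SHA-1 over the namespace bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Keep 16 of the 20 digest bytes; version=5 makes the UUID constructor
    # set the version nibble and the RFC 4122 variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


name = "alice@example.com|42"
print(manual_uuid5(uuid.NAMESPACE_DNS, name) == uuid.uuid5(uuid.NAMESPACE_DNS, name))
```

Because the namespace is mixed into the hash, a project-specific namespace (for example, one generated once with `uuid.uuid4()` and then hard-coded) keeps these keys distinct from other UUIDv5 values built from the same strings.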
2. SHA-256 + Base64 (Full Digest or Truncated)
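```python
import base64
import hashlib

def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    digest = hashlib.sha256(data.encode()).digest()  # 32 raw bytes
    # The full encoding is 44 characters; keeping 16 retains 96 bits.
    return base64.urlsafe_b64encode(digest).decode()[:16]
```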
Pros:
- Deterministic
- Compact: 16 characters (96 bits) when truncated as shown
- Strong cryptographic hash
Cons:
- Truncation to 96 bits introduces a small collision risk
- Not a standard identifier format; the output is case-sensitive and may contain `-` and `_`
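The "small" risk can be quantified with the standard birthday-bound approximation. The sketch below assumes the hash output behaves like uniformly random bits and compares the 96-bit Base64 truncation with the 64-bit hex truncation for a hypothetical 10 million keys; `collision_probability` is an illustrative helper, not a library function.

```python
import math


def collision_probability(num_keys: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision)
    among num_keys uniformly random values of the given bit width."""
    # p ≈ 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision when p is tiny.
    exponent = num_keys * (num_keys - 1) / 2 ** (bits + 1)
    return -math.expm1(-exponent)


# 16 Base64 characters carry 16 * 6 = 96 bits; 16 hex characters carry 16 * 4 = 64.
for label, bits in [("base64[:16]", 96), ("hexdigest()[:16]", 64)]:
    p = collision_probability(10_000_000, bits)
    print(f"{label:>17}: {p:.3e}")
```

The same function applies to any fixed-width key; for a 32-bit value, the probability reaches about 50% at roughly 77,000 keys.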
3. SHA-256 + Hex Digest (Truncated)
```python
import hashlib

def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    # 16 hex characters keep 64 of the 256 digest bits.
    return hashlib.sha256(data.encode()).hexdigest()[:16]
```
Pros:
- Deterministic
- Easy to debug (hex format)
Cons:
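- Truncation to 16 hex characters keeps only 64 bits, so collisions are more likely than with the 96-bit Base64 variant
- Longer output than Base64 for the same bit length (4 bits per character instead of 6)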
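To see the size difference concretely, the sketch below encodes the same 96-bit prefix of the SHA-256 digest both ways. The 12-byte prefix length and the `encode_both` helper are illustrative choices, not part of the methods above.

```python
import base64
import hashlib


def encode_both(email: str, sample_number: int, num_bytes: int = 12) -> tuple[str, str]:
    """Encode the same truncated SHA-256 prefix as hex and as URL-safe Base64."""
    data = f"{email}|{sample_number}"
    prefix = hashlib.sha256(data.encode()).digest()[:num_bytes]  # 12 bytes = 96 bits
    hex_key = prefix.hex()  # 2 characters per byte
    b64_key = base64.urlsafe_b64encode(prefix).decode()  # 4 characters per 3 bytes
    return hex_key, b64_key


hex_key, b64_key = encode_both("alice@example.com", 42)
print(len(hex_key), len(b64_key))
```

Hex needs 24 characters where Base64 needs 16 for identical bits, and because 12 is a multiple of 3, the Base64 form carries no `=` padding.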
4. UUIDv4 (Random-Based)
```python
import uuid

def generate_key():
    # Random: does not use email or sample_number at all.
    return str(uuid.uuid4())
```
Pros:
- 122 random bits, so collision risk is negligible in practice
- Standard format
Cons:
- Not deterministic
- Not derived from the fields: the key cannot be regenerated later, so it must be stored alongside the original data
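Because a UUIDv4 cannot be recomputed, the usual pattern is get-or-create: look up the stored key for the fields and generate a new one only when none exists. Below is a minimal in-memory sketch; `KeyRegistry` and `get_or_create` are hypothetical names, and a dictionary stands in for whatever persistent store would hold the mapping.

```python
import uuid


class KeyRegistry:
    """Hypothetical in-memory registry: assigns each (email, sample_number)
    pair a random UUIDv4 once and returns the stored key afterwards."""

    def __init__(self) -> None:
        self._keys: dict[tuple[str, int], str] = {}

    def get_or_create(self, email: str, sample_number: int) -> str:
        pair = (email, sample_number)
        if pair not in self._keys:
            # The random key cannot be recomputed from the fields, so store it.
            self._keys[pair] = str(uuid.uuid4())
        return self._keys[pair]


registry = KeyRegistry()
first = registry.get_or_create("alice@example.com", 42)
again = registry.get_or_create("alice@example.com", 42)
print(first == again)  # True: the stored key is reused
```

In a database, the same idea is typically a table with a unique constraint on the field pair and a column holding the generated key.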
5. MurmurHash3 (via mmh3)
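```python
import mmh3  # third-party: pip install mmh3

def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    # mmh3.hash returns a signed 32-bit int, so the result may start with "-".
    return str(mmh3.hash(data))
```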
Pros:
- Fast
- Short output (at most 11 characters, including a possible minus sign)
- Deterministic
Cons:
- Not cryptographically secure
- Only 32 bits: by the birthday bound, a collision becomes about 50% likely after roughly 77,000 keys, far sooner than with the SHA-based methods
- Requires external package
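Since `mmh3.hash` returns a signed 32-bit integer, `str()` yields keys of varying length that may start with `-`. If fixed-width keys are preferred, one option is to reinterpret the value as unsigned and format it as 8 hex digits. The sketch below uses only built-ins so it runs without `mmh3`; `to_fixed_hex` is a hypothetical helper, and the sample inputs stand in for `mmh3.hash` results.

```python
def to_fixed_hex(signed_hash: int) -> str:
    """Map a signed 32-bit hash value to an 8-character lowercase hex string."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return format(signed_hash & 0xFFFFFFFF, "08x")


for value in (0, 42, -1, -2_147_483_648):
    print(value, "->", to_fixed_hex(value))
```

With `mmh3` installed, this would be applied as `to_fixed_hex(mmh3.hash(data))`. It only changes the presentation: the key still carries 32 bits, so its collision risk is unchanged.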
6. Substring Concatenation
```python
def generate_key(email, sample_number):
    # str() lets this also work when sample_number is an int.
    return f"{email[:8]}_{str(sample_number)[:8]}"
```
Pros:
- Very fast
- Human-readable
Cons:
- Very high collision risk: any two emails sharing their first eight characters collide when the sample numbers match
- Not suitable for unique identification outside controlled or small-scale environments
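The collision risk is easy to trigger, for instance with two emails that share a long prefix. A minimal sketch, assuming keys are kept in a dictionary, shows a guard that detects a collision before a key is reused; `substring_key` and `register` are hypothetical names, and the same guard could wrap any of the hash-based methods as well.

```python
def substring_key(email: str, sample_number) -> str:
    # Substring scheme; str() also accepts integer sample numbers.
    return f"{email[:8]}_{str(sample_number)[:8]}"


def register(keys: dict, email: str, sample_number) -> str:
    """Hypothetical guard: store the key, refusing one already taken by other fields."""
    key = substring_key(email, sample_number)
    fields = (email, str(sample_number))
    owner = keys.setdefault(key, fields)
    if owner != fields:
        raise ValueError(f"collision on {key!r}: {owner} vs {fields}")
    return key


keys = {}
register(keys, "alice.smith@example.com", 1)
try:
    register(keys, "alice.smithers@example.com", 1)  # same "alice.sm_1" key
except ValueError as err:
    print(err)
```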