When combining multiple fields into a single unique identifier, you want something compact, deterministic (if needed), and with minimal risk of collision. Below is a list of methods ordered from most to least unique.


1. UUIDv5 (Namespace + Name-Based)

import uuid
 
def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, data))

Pros:

  • Very low collision risk
  • Standard UUID format
  • Deterministic (same input → same UUID)

Cons:

  • 36 characters
  • Based on SHA-1 (but still safe for this use)

2. SHA-256 + Base64 (Full Digest or Truncated)

import hashlib, base64
 
def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    digest = hashlib.sha256(data.encode()).digest()
    return base64.urlsafe_b64encode(digest).decode()[:16]

Pros:

  • Deterministic
  • Compact (truncated)
  • Strong cryptographic hash

Cons:

  • Truncation introduces slight collision risk
  • Not a standard identifier format

3. SHA-256 + Hex Digest (Truncated)

import hashlib
 
def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    return hashlib.sha256(data.encode()).hexdigest()[:16]

Pros:

  • Deterministic
  • Easy to debug (hex format)

Cons:

  • Truncation reduces uniqueness
  • Larger output than Base64 for same bit length

4. UUIDv4 (Random-Based)

import uuid
 
def generate_key():
    return str(uuid.uuid4())

Pros:

  • Universally unique (very low collision risk)
  • Standard format

Cons:

  • Not deterministic
  • Cannot regenerate same key without storing original

5. MurmurHash3 (via mmh3)

import mmh3
 
def generate_key(email, sample_number):
    data = f"{email}|{sample_number}"
    return str(mmh3.hash(data))

Pros:

  • Fast
  • Short output
  • Deterministic

Cons:

  • Not cryptographically secure
  • Higher chance of collision than SHA-based methods
  • Requires external package

6. Substring Concatenation

def generate_key(email, sample_number):
    return f"{email[:8]}_{sample_number[:8]}"

Pros:

  • Very fast
  • Human-readable

Cons:

  • Very high collision risk
  • Not suitable for unique identification
  • Only works in controlled or small-scale environments