Automated observability with Puppet in a zero-trust environment

Haroon Rafique

Manager
Development and Operations
Student Information Systems, ITS

“If you want everything to be familiar, you will never learn anything new, because it can’t be significantly different from what you already know.”

— Rich Hickey

The Toil Trap

  • Manual monitoring target updates
  • Config drift from reality
  • Onboarding friction from certificate management
  • YAML sprawl

Zero-Toil Architecture

  • Self-registration
  • Automated discovery
  • Identity-driven
  • Zero-trust

Implementation Blueprint

  • Classify nodes automatically during CSR signing
  • Separate logic from data using Hiera roles
  • Automate service discovery with exported resources
  • Secure metrics scraping with Caddy and mTLS
  • Standardize monitoring with layered exporter profiles

Components in This Reference Stack

  • OpenVox/OpenVox Server for policy and certificates
  • OpenVoxDB for exported resource inventory
  • VictoriaMetrics for metrics storage
  • vmagent for metrics scraping
  • Grafana for visualization

Pattern 1: CSR Classification

  • Immutable identity requested during provisioning via CSR
  • Cryptographic proof stored in the certificate extensions
  • Tamper-proof classification secure against agent-side overrides
  • Optional autosigning, gated by a policy-based autosign script
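In policy-based autosigning, the CA runs an executable with the certname as its only argument and the PEM-encoded CSR on standard input. Exit status 0 signs the request. The sketch below is a minimal Python version of such a policy. It assumes the `pp_role` extension value is DER-encoded as a UTF8String, which is how Puppet-generated CSRs encode it. The role-to-certname allow-list in `ROLE_POLICY` is hypothetical. The script also scans the DER bytes for the `pp_role` OID rather than fully parsing ASN.1, so a production script should use a proper X.509 library.

```python
#!/usr/bin/env python3
"""Sketch of a policy-based autosign executable that gates on pp_role."""
import base64
import re
import sys
from typing import Optional, Tuple

# DER tag, length, and content of OID 1.3.6.1.4.1.34380.1.1.13 (pp_role).
PP_ROLE_OID = bytes.fromhex("060b2b06010401828c4c01010d")

# Hypothetical policy: the certname pattern each role may be requested from.
ROLE_POLICY = {
    "webserver": re.compile(r"web\d+\.example\.com"),
    "monitoring_scraper": re.compile(r"vmagent\d*\.example\.com"),
}


def _read_length(buf: bytes, i: int) -> Tuple[int, int]:
    """Decode a DER length at buf[i]; return (length, index of first content byte)."""
    first = buf[i]
    if first < 0x80:
        return first, i + 1
    count = first & 0x7F
    return int.from_bytes(buf[i + 1:i + 1 + count], "big"), i + 1 + count


def pem_to_der(pem: str) -> bytes:
    """Strip the PEM armor lines and base64-decode the body."""
    body = "".join(line for line in pem.splitlines() if line and not line.startswith("-----"))
    return base64.b64decode(body)


def extract_pp_role(der: bytes) -> Optional[str]:
    """Return the pp_role extension value, or None if absent or malformed."""
    pos = der.find(PP_ROLE_OID)
    if pos < 0:
        return None
    i = pos + len(PP_ROLE_OID)
    try:
        if der[i] == 0x01:  # optional BOOLEAN "critical" flag: tag, length, value
            i += 3
        if der[i] != 0x04:  # extnValue is an OCTET STRING
            return None
        _, i = _read_length(der, i + 1)
        if der[i] != 0x0C:  # wrapped value is expected to be a UTF8String
            return None
        length, i = _read_length(der, i + 1)
        return der[i:i + length].decode("utf-8")
    except (IndexError, UnicodeDecodeError):
        return None


def should_sign(certname: str, role: Optional[str]) -> bool:
    """Sign only if the role is known and the certname matches its policy."""
    pattern = ROLE_POLICY.get(role) if role else None
    return bool(pattern and pattern.fullmatch(certname))


def main() -> int:
    certname = sys.argv[1]
    role = extract_pp_role(pem_to_der(sys.stdin.read()))
    return 0 if should_sign(certname, role) else 1


# The CA passes exactly one argument (the certname); skip otherwise.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main())
```

To use it, point the CA's `autosign` setting at this file and make the file executable. A request from `web01.example.com` asking for `webserver` is then signed. The same host asking for `monitoring_scraper` is left for manual review.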

pp_role: what goes into it?

  • roles
    • monitoring_scraper
    • webserver

Hiera data:

roles
├── monitoring_scraper.yaml
└── webserver.yaml

Pattern 2: Pure Data Roles

  • CSR extension attributes

    /etc/puppetlabs/puppet/csr_attributes.yaml
    ---
    extension_requests:
      pp_role: webserver
  • Zero logic inside site manifest or node definitions

    manifests/site.pp
    node default {
      # Lookup classes from Hiera
      lookup('classes', Array[String], 'unique').include
    }

  • Hiera lookup determines config based on pp_role

    data/roles/webserver.yaml
    ---
    # Webserver role - Apache with monitoring
    # Applied to nodes with pp_role=webserver in csr_attributes.yaml
    classes:
    - profile::apache
    - profile::monitoring::apache_exporter
  • Hiera configuration

    excerpt from hiera.yaml
    hierarchy:
    - name: "Per-role data (from CSR attributes)"
      path: "roles/%{trusted.extensions.pp_role}.yaml"
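The Python model below shows what `lookup('classes', Array[String], 'unique')` resolves to. It is an illustration, not Hiera's implementation, and makes three assumptions:

- a two-level hierarchy: the per-role level above, plus a `common.yaml` level like the one used in Pattern 5
- in-memory data instead of YAML files
- an unset `%{...}` variable interpolates to an empty string, as in Hiera

Under the third assumption, a node without `pp_role` resolves `roles/.yaml`, which does not exist, and gets only `common.yaml`. A unique merge returns values from highest to lowest priority with duplicates removed.

```python
import re
from typing import Dict, List

# Hierarchy paths in priority order; the "common.yaml" level is assumed here.
HIERARCHY = ["roles/%{trusted.extensions.pp_role}.yaml", "common.yaml"]


def interpolate(path: str, trusted_extensions: Dict[str, str]) -> str:
    """Replace %{trusted.extensions.KEY}; unset keys become empty strings."""
    return re.sub(
        r"%\{trusted\.extensions\.(\w+)\}",
        lambda m: trusted_extensions.get(m.group(1), ""),
        path,
    )


def lookup_classes(trusted_extensions: Dict[str, str], data: Dict[str, dict]) -> List[str]:
    """Unique merge of 'classes' across the hierarchy, highest priority first."""
    merged: List[str] = []
    for level in HIERARCHY:
        source = data.get(interpolate(level, trusted_extensions))
        if source is None:  # missing data file: this level is skipped
            continue
        for cls in source.get("classes", []):
            if cls not in merged:
                merged.append(cls)
    return merged


DATA = {
    "common.yaml": {"classes": ["profile::base"]},
    "roles/webserver.yaml": {
        "classes": ["profile::apache", "profile::monitoring::apache_exporter"]
    },
}
print(lookup_classes({"pp_role": "webserver"}, DATA))
# -> ['profile::apache', 'profile::monitoring::apache_exporter', 'profile::base']
print(lookup_classes({}, DATA))
# -> ['profile::base']
```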

Pattern 3: Auto-Discovery

  • Exported resources publish state to OpenVoxDB

    @@file { "/etc/vmagent/targets.d/${facts['networking']['fqdn']}_${name}.yaml":
      ensure  => file,
      content => @("EOT"),
        # Managed by Puppet (label names are illustrative)
        - targets:
            - ${facts['networking']['fqdn']}:9090/${name}/metrics
          labels:
            job: ${name}
            instance: ${facts['networking']['fqdn']}
            exporter: ${name}
        | EOT
      tag     => 'vmagent_target',
    }

  • Dynamic collection gathers all active targets

    # Collect all exported vmagent targets from other nodes
    File <<| tag == 'vmagent_target' |>>
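Each exported file is a file-based service discovery (file_sd) target group. As an alternative sketch, the same document can be produced by a serializer instead of a hand-indented heredoc. JSON is also valid YAML, so the `.yaml` file stays readable by a YAML-based discovery reader. The function name and label names are hypothetical. The target string mirrors the `host:9090/<exporter>/metrics` form used above.

```python
import json
from typing import List


def render_target_group(fqdn: str, exporter: str, port: int = 9090) -> str:
    """Build one file_sd target group for an exporter behind the Caddy sidecar."""
    group: List[dict] = [
        {
            # Sidecar port and path prefix mirror the Puppet example.
            "targets": [f"{fqdn}:{port}/{exporter}/metrics"],
            # Hypothetical label names; use whatever the dashboards expect.
            "labels": {"job": exporter, "instance": fqdn, "exporter": exporter},
        }
    ]
    return json.dumps(group, indent=2) + "\n"


print(render_target_group("web01.example.com", "apache"))
```

Serializing structured data avoids whitespace-sensitive YAML inside a heredoc, where an indentation slip silently changes the document's meaning.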

Pattern 4: Zero-Trust mTLS

  • Caddy acts as a sidecar reverse proxy and mTLS server

    excerpt from Caddyfile
    # Authenticate via Puppet CA with mTLS
    tls /etc/caddy/node-cert.pem /etc/caddy/node-key.pem {
      client_auth {
        mode require_and_verify
        trust_pool file /etc/caddy/puppet-ca.pem
      }
    }
  • Import exporter configurations

    import /etc/caddy/conf.d/*.caddy

  • Proxy the exporter traffic

    excerpt from apache.caddy
    handle_path /apache* {
      reverse_proxy localhost:9117
    }
  • Only allow legitimate scrapers

    @authorized_scrapers_snippet {
      expression {tls_client_subject} == "CN=vmagent.local"
    }
  • Block unauthorized scrapers

    handle {
      abort
    }
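The request flow through the sidecar can be modeled in a few lines of Python. This is a simplified model, not Caddy's matcher engine, and it rests on four assumptions:

- Caddy has already verified the client certificate against the Puppet CA.
- The authorized-scraper check applies before any exporter route.
- `handle_path /apache*` is a prefix match that strips `/apache`.
- The only authorized subject is `CN=vmagent.local`, taken from the excerpt.

The route table and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

AUTHORIZED_SUBJECTS = {"CN=vmagent.local"}
# Hypothetical: one entry per imported conf.d/*.caddy file (prefix -> exporter).
ROUTES: Dict[str, str] = {"/apache": "localhost:9117"}


def route(path: str, client_subject: str) -> Optional[Tuple[str, str]]:
    """Return (upstream, upstream_path), or None where Caddy would abort."""
    if client_subject not in AUTHORIZED_SUBJECTS:
        return None  # unauthorized scraper: falls through to `abort`
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            # handle_path strips the matched prefix before proxying.
            return upstream, path[len(prefix):]
    return None  # no exporter matched: the catch-all handle aborts


print(route("/apache/metrics", "CN=vmagent.local"))  # ('localhost:9117', '/metrics')
print(route("/apache/metrics", "CN=web01.example.com"))  # None
```

Under this model, the exported target `host:9090/apache/metrics` lands on the Apache exporter's `/metrics` at `localhost:9117`. Any other client, or any unmapped path, gets nothing.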

Pattern 5: Layered Observability

  • Common profile applies to all nodes

    data/common.yaml
    classes:
    - profile::base
  • Baseline metrics applied to every node

    site/profile/manifests/base.pp
    class profile::base {
      include profile::monitoring::node_exporter
    }
  • Role-specific exporters are added based on workload

    data/roles/webserver.yaml
    classes:
    - profile::apache
    - profile::monitoring::apache_exporter
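The baseline layer is supposed to reach every node, so a coverage check is a useful companion. The minimal sketch below makes three assumptions:

- Collected target files follow the `<fqdn>_<exporter>.yaml` naming from Pattern 3.
- Hostnames contain no underscores.
- The baseline exporter's resource title is `node`, a hypothetical name.

It groups target files by host and reports expected hosts that lack a baseline target.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Set

BASELINE = "node"  # hypothetical ${name} used by profile::monitoring::node_exporter


def exporters_by_host(filenames: Iterable[str]) -> Dict[str, Set[str]]:
    """Group '<fqdn>_<exporter>.yaml' names by host (hostnames contain no '_')."""
    hosts: Dict[str, Set[str]] = defaultdict(set)
    for name in filenames:
        stem = Path(name).stem  # drop the .yaml suffix
        fqdn, _, exporter = stem.partition("_")
        if exporter:
            hosts[fqdn].add(exporter)
    return dict(hosts)


def missing_baseline(expected_hosts: Iterable[str], filenames: Iterable[str]) -> List[str]:
    """Hosts that should be monitored but have no baseline target file."""
    found = exporters_by_host(filenames)
    return sorted(h for h in expected_hosts if BASELINE not in found.get(h, set()))
```

On a scraper node, the file list could come from `[p.name for p in Path("/etc/vmagent/targets.d").glob("*.yaml")]`. The expected hosts could come from the OpenVoxDB node inventory.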

The Automated Pipeline

Host Lifecycle

  • Host boots and starts OpenVox agent
  • Agent submits CSR with role intent
  • Host applies role-specific profiles
  • Host exports its scrape target definitions

System Response

  • CA auto-signs based on secure policy
  • Hiera resolves the role's classes for the host
  • OpenVoxDB stores the exported resources
  • The scraper's next Puppet run collects the target and vmagent starts scraping it

Security Guardrails For Production

  • Strict validation of role-bearing CSRs.
  • Role restrictions to prevent privilege escalation.
  • Least-privilege paths for authorized scrapers only.
  • Continuous auditing of exported resources.
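A minimal audit sketch for the last guardrail follows. The Pattern 3 path is built from the `networking.fqdn` fact, which the agent reports and therefore controls. OpenVoxDB, by contrast, records the authenticated certname that exported each resource, so comparing the two is a cheap consistency check. The sketch assumes resources were fetched from the OpenVoxDB/PuppetDB resources endpoint, for example with the query `["and", ["=", "exported", true], ["=", "tag", "vmagent_target"]]`, and that each node's certname equals its FQDN. The function name is hypothetical.

```python
from typing import Iterable, List, Mapping

TARGETS_DIR = "/etc/vmagent/targets.d/"


def suspicious_targets(resources: Iterable[Mapping]) -> List[str]:
    """Describe exported target files that are not named after their exporting node."""
    flagged = []
    for res in resources:
        expected_prefix = f"{TARGETS_DIR}{res['certname']}_"
        if not res["title"].startswith(expected_prefix):
            flagged.append(f"{res['certname']} exported {res['title']}")
    return flagged
```

A stricter audit would also inspect each resource's `content` parameter and compare the target host inside it. Building the path from `$trusted['certname']` instead of a fact closes the gap at the source.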

Does It Work?

  • Initial State: Verify current targets are up.
  • The Trigger: Provision a new webserver node.
  • The Magic: Watch Puppet execute automatic classification.
  • The Result: The new scrape target populates automatically.
  • The Payoff: View the live Grafana dashboards.
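The first and fourth steps can be scripted. The minimal sketch below assumes vmagent serves a Prometheus-compatible `/api/v1/targets` JSON document on its default port 8429. It also assumes the scraper host is `vmagent.local`. Treat both as assumptions to adjust per deployment. `unhealthy` and `has_target` work on an already-decoded response, so they can be checked offline. `fetch_targets` makes the HTTP call.

```python
import json
import urllib.request
from typing import List, Mapping

VMAGENT_URL = "http://vmagent.local:8429/api/v1/targets"  # assumed host and port


def fetch_targets(url: str = VMAGENT_URL) -> dict:
    """GET the Prometheus-style targets document from vmagent."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy(doc: Mapping) -> List[str]:
    """Scrape URLs of active targets whose health is not 'up'."""
    return [
        t["scrapeUrl"]
        for t in doc["data"]["activeTargets"]
        if t.get("health") != "up"
    ]


def has_target(doc: Mapping, host: str) -> bool:
    """True once any active target's scrape URL mentions the given host."""
    return any(host in t["scrapeUrl"] for t in doc["data"]["activeTargets"])
```

During the demo, `unhealthy(fetch_targets())` should be empty before and after provisioning. `has_target(fetch_targets(), "<new host>")` flips to true once the scraper's Puppet run has collected the new file.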

Demo

Operational Impact

  • Scales from isolated labs to large production fleets with the same code.
  • Adapts to node churn and auto-scaling without manual target edits.
  • Shortens time to visibility: a new node becomes a scrape target after the scraper's next Puppet run.
  • Keeps operational overhead flat as the fleet grows.

Engineering Anti-Patterns

  • Mutable scripts installing runtime packages
  • Floating tags using latest in production
  • Hard-coded credentials in compose files
  • Copy-pasted logic repeated across services
  • Ungoverned attributes trusted without policy controls

Key Takeaways

  • Automate observability from a node's first Puppet run
  • Eliminate bottlenecks by moving classification to CSRs
  • Replace manual target updates with native exported resources
  • Reuse the existing Puppet CA to achieve zero-trust mTLS
  • Scale your infrastructure without increasing team toil

Bonus Slide: System Architecture

Questions?

Git Repo