【Envoy-03】Securing Envoy Proxy

Posted by Hao Liang's Blog on Sunday, December 17, 2023

1. Envoy Threat model

refer to: threat_model

The Threat Model is:

  • Identifying and enumerating threats and vulnerabilities
  • Devising mitigations
  • Prioritising residual risks
  • Escalating the most important risks

Why Treat Model?

  • Identify security flaws early
  • Save money and time consuming redesigns
  • Focus your security requirements
  • Identify complex risks and data flows for critical assets

2. Configuration Best Practices

refer to: best_practices/edge

An example to run envoy with secure config:

# https://github.com/solo-io/hoot/blob/master/03-security/edge.yaml
admin:
  # access log to admin interface
  access_log_path: "/tmp/envoy_admin.log"
  address:
    socket_address:
      address: 127.0.0.1 # only local machine can access to the admin interface
      port_value: 9090

# overload manager: protect envoy from overloading system resources
overload_manager:
  # Interval to check the resources
  refresh_interval: 0.25s
  resource_monitors:
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.config.resource_monitor.fixed_heap.v2alpha.FixedHeapConfig
        # Defining the max heap size (tune to your system) - the thresholds below are percentages of
        # this value. For example - in kubernetes this should match your pods memory limit
        max_heap_size_bytes: 2147483648 # 2 GiB
  actions:
    # When using 95% of heap limit defined above, try to shrink it
    - name: "envoy.overload_actions.shrink_heap"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
      # When reaching 98% of the heal limit defined above, stop accepting new requests
    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98

static_resources:
  listeners:
    - name: example_listener_name
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8443
      listener_filters:
        - name: "envoy.filters.listener.tls_inspector"
          typed_config: {}
      # Number of bytes buffered in the connection buffer.
      per_connection_buffer_limit_bytes: 32768 # 32 KiB
      filter_chains:
        - filter_chain_match:
            server_names: ["example.com", "www.example.com"]
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              common_tls_context:
                tls_certificates:
                  - certificate_chain: { filename: "example_com_cert.pem" }
                    private_key: { filename: "example_com_key.pem" }
          filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                # Use the peer's address as the client address (i.e. don't use XFF header to determine client address)
                # see more info here: https://www.envoyproxy.io/docs/envoy/v1.15.0/configuration/http/http_conn_man/headers#x-forwarded-for
                use_remote_address: true
                common_http_protocol_options:
                  idle_timeout: 3600s # 1 hour
                  # Reject requests that has headers with underscores, to protect upstreams that convert
                  # them to hyphens
                  headers_with_underscores_action: REJECT_REQUEST
                http2_protocol_options:
                  # max http2 streams at the same time
                  max_concurrent_streams: 100
                  # the buffer size for each stream
                  initial_stream_window_size: 65536 # 64 KiB
                  # the buffer size for the connection
                  initial_connection_window_size: 1048576 # 1 MiB
                stream_idle_timeout: 300s # 5 mins, must be disabled for long-lived and streaming requests
                request_timeout: 300s # 5 mins, must be disabled for long-lived and streaming requests
                route_config:
                  virtual_hosts:
                    - name: default
                      domains: "*"
                      routes:
                        - match: { prefix: "/" }
                          route:
                            cluster: service_foo
                            idle_timeout: 15s # must be disabled for long-lived and streaming requests
                http_filters:
                  - name: envoy.filters.http.router
  clusters:
    - name: service_foo
      connect_timeout: 15s
      type: STATIC
      lb_policy: ROUND_ROBIN
      per_connection_buffer_limit_bytes: 32768 # 32 KiB
      load_assignment:
        cluster_name: service_foo
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 8083
      http2_protocol_options:
        initial_stream_window_size: 65536 # 64 KiB
        initial_connection_window_size: 1048576 # 1 MiB
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          sni: example.com
layered_runtime:
  # use runtime to limit number of connections to prevent file descriptor exhaustion
  layers:
    - name: static_layer_0
      static_layer:
        envoy:
          resource_limits:
            listener:
              example_listener_name:
                connection_limit: 10000
        overload:
          global_downstream_max_connections: 50000 # maximum number of connection envoy allows for downstream
  • In admin section, 127.0.0.1 means only local machine can access to the admin interface
  • In overload manager section, it prevents the envoy from overloading by monitoring the resources’ usage.(DDoS protection)
  • In http connection manager section, use_remote_address means envoy can get the remote IP address of the client.(by appending the x-forward-for header in Envoy http filter, use case: matching the same user client for rate limit)
  • In layered_runtime section, it’s a global limitation for envoy proxy
    • limit the file descriptor number(e.g. limits connection number)

3. Hardening

Some Hardening Tips:

  • Use a minimal image, with no sharp tools (curl, gcc, ssh, etc…)
  • Envoy doesn’t write files unless you tell it to - ser readOnlyRootFilesystem
  • Remove capabilities
  • Non-root user
  • PIE (to allow ASLR)

An example to run envoy with hardening config:

# https://github.com/solo-io/hoot/blob/master/03-security/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: gloo
    gateway-proxy-id: gateway-proxy
    gloo: gateway-proxy
  name: gateway-proxy
  namespace: gloo-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      gateway-proxy-id: gateway-proxy
      gloo: gateway-proxy
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8081"
        prometheus.io/scrape: "true"
      labels:
        gateway-proxy: live
        gateway-proxy-id: gateway-proxy
        gloo: gateway-proxy
    spec:
      containers:
        - args:
            - --disable-hot-restart
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
          image: quay.io/solo-io/gloo-ee-envoy-wrapper:1.4.8
          imagePullPolicy: IfNotPresent
          name: gateway-proxy
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            - containerPort: 8443
              name: https
              protocol: TCP
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: 10101
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /etc/envoy
              name: envoy-config
      dnsPolicy: ClusterFirst
      serviceAccountName: gateway-proxy
      volumes:
        - configMap:
            defaultMode: 420
            name: gateway-proxy-envoy-config
          name: envoy-config
  • Resource requests/limits setting
  • In securityContext, use non-root user, not allow privilege escalation, use read only root file system
  • Drop all capability except for NET_BIND_SERVICE (allows envoy to bind a specific port)
  • Have own separate service account

4. Reference