1. Envoy Observability
Concept:
- Mechanisms to observe Envoy’s state
- Debugging and monitoring Envoy
Overview:
- Admin interface
  - stats
  - config dump
  - clusters
  - log level
- Debug logs
- Access logs
- Metrics collection
- Tracing
2. Admin Interface
- /stats: counters, gauges, and histograms describing Envoy's current state (e.g. how many requests were made, how many succeeded, how many failed)
- /config_dump: dumps the configuration Envoy currently has loaded
- /clusters: the current membership of each cluster, with per-host statistics
- /logging: view or change the log level of Envoy's loggers
# https://github.com/solo-io/hoot/blob/master/02-observe/stats.yaml
admin:
  access_log_path: /dev/stdout
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: edge_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: namespace.local_service
              virtual_clusters:
              - name: actions
                headers:
                - name: ":path"
                  prefix_match: "/foo"
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: somecluster }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: somecluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: somecluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8082
Run Envoy with the stats config:
# envoy -c ./stats.yaml
The admin UI is then available at: http://127.0.0.1:9901
Envoy stats
stats: http://127.0.0.1:9901/stats?filter=&format=html&type=All&histogram_buckets=cumulative
# cluster.<cluster_name>.<stats_name>: <stats>
cluster.somecluster.assignment_stale: 0
cluster.somecluster.assignment_timeout_received: 0
cluster.somecluster.assignment_use_cached: 0
cluster.somecluster.bind_errors: 0
cluster.somecluster.circuit_breakers.default.cx_open: 0
cluster.somecluster.circuit_breakers.default.cx_pool_open: 0
cluster.somecluster.circuit_breakers.default.rq_open: 0
cluster.somecluster.circuit_breakers.default.rq_pending_open: 0
cluster.somecluster.circuit_breakers.default.rq_retry_open: 0
cluster.somecluster.circuit_breakers.high.cx_open: 0
cluster.somecluster.circuit_breakers.high.cx_pool_open: 0
cluster.somecluster.circuit_breakers.high.rq_open: 0
cluster.somecluster.circuit_breakers.high.rq_pending_open: 0
cluster.somecluster.circuit_breakers.high.rq_retry_open: 0
cluster.somecluster.default.total_match_count: 73
cluster.somecluster.lb_healthy_panic: 0
cluster.somecluster.lb_local_cluster_not_ok: 0
cluster.somecluster.lb_recalculate_zone_structures: 0
cluster.somecluster.lb_subsets_active: 0
cluster.somecluster.lb_subsets_created: 0
cluster.somecluster.lb_subsets_fallback: 0
cluster.somecluster.lb_subsets_fallback_panic: 0
cluster.somecluster.lb_subsets_removed: 0
cluster.somecluster.lb_subsets_selected: 0
cluster.somecluster.lb_zone_cluster_too_small: 0
cluster.somecluster.lb_zone_no_capacity_left: 0
cluster.somecluster.lb_zone_number_differs: 0
cluster.somecluster.lb_zone_routing_all_directly: 0
cluster.somecluster.lb_zone_routing_cross_zone: 0
cluster.somecluster.lb_zone_routing_sampled: 0
cluster.somecluster.max_host_weight: 1
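The plain-text listing above is easy to consume from a script. The following is a minimal sketch, assuming the `name: value` layout shown above; the function name `parse_stats` is hypothetical, and lines whose value is not a plain integer (for example histogram summaries) are simply skipped.

```python
def parse_stats(text: str) -> dict:
    """Parse plain-text /stats output into a {name: int_value} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        # Skip blank lines and values that are not plain integers.
        if not sep or not value.strip().isdigit():
            continue
        stats[name.strip()] = int(value)
    return stats


# Three lines copied from the output above.
sample = """\
cluster.somecluster.bind_errors: 0
cluster.somecluster.default.total_match_count: 73
cluster.somecluster.max_host_weight: 1
"""
stats = parse_stats(sample)
print(stats["cluster.somecluster.default.total_match_count"])
```

Against a running Envoy, the same function could be fed the body of http://127.0.0.1:9901/stats.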
Envoy config dump
config dump: http://127.0.0.1:9901/config_dump?resource=&mask=&name_regex=
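The config dump is returned as JSON, so it can be inspected programmatically. The sketch below lists the statically configured cluster names. It assumes the dump has a top-level `configs` list containing a section whose `@type` ends in `ClustersConfigDump` with a `static_clusters` array; the function name `static_cluster_names` and the trimmed sample document are hypothetical and written by hand for illustration, not captured from a real Envoy.

```python
import json


def static_cluster_names(config_dump: dict) -> list:
    """Return the names of statically configured clusters in a config dump."""
    names = []
    for section in config_dump.get("configs", []):
        # Only the clusters section is of interest here.
        if not section.get("@type", "").endswith("ClustersConfigDump"):
            continue
        for entry in section.get("static_clusters", []):
            names.append(entry["cluster"]["name"])
    return names


# Hand-written, heavily trimmed stand-in for a real /config_dump response.
sample = json.loads("""
{
  "configs": [
    {"@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump"},
    {
      "@type": "type.googleapis.com/envoy.admin.v3.ClustersConfigDump",
      "static_clusters": [{"cluster": {"name": "somecluster"}}]
    }
  ]
}
""")
print(static_cluster_names(sample))
```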
Envoy cluster discovery
clusters: http://127.0.0.1:9901/clusters
Current cluster settings and members:
somecluster::observability_name::somecluster
somecluster::default_priority::max_connections::1024
somecluster::default_priority::max_pending_requests::1024
somecluster::default_priority::max_requests::1024
somecluster::default_priority::max_retries::3
somecluster::high_priority::max_connections::1024
somecluster::high_priority::max_pending_requests::1024
somecluster::high_priority::max_requests::1024
somecluster::high_priority::max_retries::3
somecluster::added_via_api::false
somecluster::127.0.0.1:8082::cx_active::0
somecluster::127.0.0.1:8082::cx_connect_fail::0
somecluster::127.0.0.1:8082::cx_total::0
somecluster::127.0.0.1:8082::rq_active::0
somecluster::127.0.0.1:8082::rq_error::0
somecluster::127.0.0.1:8082::rq_success::0
somecluster::127.0.0.1:8082::rq_timeout::0
somecluster::127.0.0.1:8082::rq_total::0
somecluster::127.0.0.1:8082::hostname::127.0.0.1
somecluster::127.0.0.1:8082::health_flags::healthy
somecluster::127.0.0.1:8082::weight::1
somecluster::127.0.0.1:8082::region::
somecluster::127.0.0.1:8082::zone::
somecluster::127.0.0.1:8082::sub_zone::
somecluster::127.0.0.1:8082::canary::false
somecluster::127.0.0.1:8082::priority::0
somecluster::127.0.0.1:8082::success_rate::-1
somecluster::127.0.0.1:8082::local_origin_success_rate::-1
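The `::`-separated output above can be grouped per host with a few lines of code. This is a minimal sketch, assuming the line layout shown above; the name `parse_host_stats` is hypothetical. It skips the cluster-level lines (including the `default_priority` and `high_priority` circuit breaker settings) and does not handle IPv6 addresses, which themselves contain `::`.

```python
PRIORITY_FIELDS = ("default_priority", "high_priority")


def parse_host_stats(text: str) -> dict:
    """Group /clusters output into {(cluster, host): {key: value}}."""
    hosts = {}
    for line in text.splitlines():
        parts = line.strip().split("::")
        # Host lines have four fields: cluster, host address, key, value.
        # Circuit breaker lines also have four fields, so filter them out.
        if len(parts) != 4 or parts[1] in PRIORITY_FIELDS:
            continue
        cluster, host, key, value = parts
        hosts.setdefault((cluster, host), {})[key] = value
    return hosts


# Lines copied from the output above.
sample = """\
somecluster::observability_name::somecluster
somecluster::default_priority::max_connections::1024
somecluster::127.0.0.1:8082::rq_total::0
somecluster::127.0.0.1:8082::health_flags::healthy
somecluster::127.0.0.1:8082::region::
"""
hosts = parse_host_stats(sample)
print(hosts[("somecluster", "127.0.0.1:8082")]["health_flags"])
```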
Envoy log level
log level: http://127.0.0.1:9901/logging
active loggers:
admin: info
alternate_protocols_cache: info
aws: info
assert: info
backtrace: info
cache_filter: info
client: info
config: info
connection: info
conn_handler: info
decompression: info
dns: info
dubbo: info
envoy_bug: info
ext_authz: info
ext_proc: info
rocketmq: info
file: info
filter: info
...
Change the Envoy log level to debug:
# static: set at startup with the -l flag
# envoy -c ./stats.yaml -l debug
# dynamic: change at runtime through the admin interface
# curl -XPOST "localhost:9901/logging?level=debug"
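The dynamic variant can also be scripted. The sketch below builds the same admin URL as the curl command above and, optionally, targets a single logger from the list (for example `connection`) instead of all of them. The helper names `logging_url` and `set_log_level` are hypothetical, and the admin address is assumed to be the one from the config in this section.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ADMIN = "http://127.0.0.1:9901"  # admin address from the config above


def logging_url(level=None, logger=None, admin=ADMIN):
    """Build the /logging URL; with no level it only lists the loggers."""
    if level is None:
        return f"{admin}/logging"
    # "level" applies to every logger; a logger name applies to that one only.
    key = logger if logger else "level"
    return f"{admin}/logging?{urlencode({key: level})}"


def set_log_level(level, logger=None, admin=ADMIN):
    """POST the change to a running Envoy and return its response text."""
    request = Request(logging_url(level, logger, admin), method="POST")
    with urlopen(request, timeout=5) as response:
        return response.read().decode()


print(logging_url("debug"))
print(logging_url("debug", logger="connection"))
```

Calling `set_log_level("debug")` requires an Envoy instance listening on the admin port, so it is not invoked here.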
3. Access logs
# https://github.com/solo-io/hoot/blob/master/02-observe/accesslogs.yaml
admin:
  access_log_path: /dev/stdout
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          access_log:
          - name: "envoy.access_loggers.file"
            filter:
              status_code_filter:
                comparison:
                  op: GE
                  value:
                    default_value: 400
                    runtime_key: "filter.request_type"
            typed_config:
              "@type": "type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog"
              path: /dev/stdout
          stat_prefix: edge_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: namespace.local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: somecluster }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: somecluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: somecluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8082
The crucial part of the access log config:
access_log:
- name: "envoy.access_loggers.file"
  filter:
    status_code_filter:
      comparison:
        op: GE
        value:
          default_value: 400
          runtime_key: "filter.request_type"
  typed_config:
    "@type": "type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog"
    path: /dev/stdout
This writes an access log entry to /dev/stdout (the terminal) whenever the response status code is greater than or equal to 400. The threshold of 400 is the default; a value set under the filter.request_type runtime key overrides it.
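The filter's decision can be written out as a small function. This is an illustrative sketch of the `GE` comparison, not Envoy's implementation: the name `should_log` is hypothetical, and the runtime layer is modelled as a plain dict in which a value under `runtime_key` replaces `default_value`.

```python
def should_log(status_code, default_value=400, runtime=None,
               runtime_key="filter.request_type"):
    """Return True when the status code passes the GE comparison filter."""
    # Use the runtime value when the key is set, otherwise the default.
    threshold = (runtime or {}).get(runtime_key, default_value)
    return status_code >= threshold


print(should_log(503))  # logged: 503 >= 400
print(should_log(200))  # not logged: 200 < 400
print(should_log(404, runtime={"filter.request_type": 500}))  # not logged: 404 < 500
```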
# flush the access log every 1 ms
# envoy -c ./accesslogs.yaml --file-flush-interval-msec 1
The access log entries for the 503 responses, as printed on the terminal (the UF response flag indicates an upstream connection failure):
[2023-12-09T05:33:20.740Z] "GET / HTTP/1.1" 503 UF 0 151 8 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "27d6dca9-1170-4f92-be1f-214a260e639b" "127.0.0.1:10000" "127.0.0.1:8082"
[2023-12-09T05:33:21.086Z] "GET /favicon.ico HTTP/1.1" 503 UF 0 151 0 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "22f9a6d6-8c43-474d-91fa-394d364fbb9e" "127.0.0.1:10000" "127.0.0.1:8082"
[2023-12-09T05:33:43.391Z] "GET /123 HTTP/1.1" 503 UF 0 151 1 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "7ca8235b-b4c2-4f42-ab9e-c4a98f65bf85" "127.0.0.1:10000" "127.0.0.1:8082"
[2023-12-09T05:33:43.505Z] "GET /favicon.ico HTTP/1.1" 503 UF 0 151 2 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" "711fb9fb-09b5-4df2-b4e7-6f17e06d43bc" "127.0.0.1:10000" "127.0.0.1:8082"
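Each entry above follows Envoy's default access log format, so the fields can be pulled apart with a regular expression. The sketch below assumes exactly that default layout; the group names (`status`, `flags`, `duration_ms`, and so on) and the function name `parse_access_log` are labels chosen here for illustration, and a custom log format would need a different pattern.

```python
import re

LINE_RE = re.compile(
    r'\[(?P<start_time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d+) (?P<flags>\S+) '
    r'(?P<bytes_received>\d+) (?P<bytes_sent>\d+) (?P<duration_ms>\d+) '
    r'(?P<upstream_time>\S+) '
    r'"(?P<forwarded_for>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<request_id>[^"]*)" "(?P<authority>[^"]*)" "(?P<upstream_host>[^"]*)"'
)


def parse_access_log(line: str):
    """Return the fields of one default-format log line, or None if no match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# First log line from the output above.
sample = (
    '[2023-12-09T05:33:20.740Z] "GET / HTTP/1.1" 503 UF 0 151 8 - "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" '
    '"27d6dca9-1170-4f92-be1f-214a260e639b" "127.0.0.1:10000" "127.0.0.1:8082"'
)
entry = parse_access_log(sample)
print(entry["status"], entry["flags"], entry["upstream_host"])
```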
4. Metrics Collection
Prometheus integration: http://127.0.0.1:9901/stats/prometheus?filter=
# TYPE envoy_cluster_assignment_stale counter
envoy_cluster_assignment_stale{envoy_cluster_name="somecluster"} 0
# TYPE envoy_cluster_assignment_timeout_received counter
envoy_cluster_assignment_timeout_received{envoy_cluster_name="somecluster"} 0
# TYPE envoy_cluster_assignment_use_cached counter
envoy_cluster_assignment_use_cached{envoy_cluster_name="somecluster"} 0
# TYPE envoy_cluster_bind_errors counter
envoy_cluster_bind_errors{envoy_cluster_name="somecluster"} 0
# TYPE envoy_cluster_default_total_match_count counter
envoy_cluster_default_total_match_count{envoy_cluster_name="somecluster"} 55
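In practice a Prometheus server scrapes this endpoint directly, but the text format is simple enough to read by hand or from a script. The following is a simplified sketch that handles lines shaped like the ones above; the name `parse_prometheus` is hypothetical, and label values containing `}` or escaped quotes, as well as trailing timestamps, are not supported.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_prometheus(text: str) -> list:
    """Return (metric_name, labels_dict, value) tuples for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # "# TYPE" and "# HELP" lines are metadata, not samples.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Lines copied from the output above.
sample = """\
# TYPE envoy_cluster_bind_errors counter
envoy_cluster_bind_errors{envoy_cluster_name="somecluster"} 0
# TYPE envoy_cluster_default_total_match_count counter
envoy_cluster_default_total_match_count{envoy_cluster_name="somecluster"} 55
"""
for name, labels, value in parse_prometheus(sample):
    print(name, labels, value)
```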
5. Tracing
Jaeger integration: Envoy traces requests and reports the resulting spans to Jaeger, an open-source distributed tracing system.
# https://github.com/solo-io/hoot/blob/master/02-observe/jaeger.yaml
admin:
  access_log_path: /dev/stdout
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    traffic_direction: OUTBOUND
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          generate_request_id: true
          tracing:
            provider:
              name: envoy.tracers.dynamic_ot
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.DynamicOtConfig
                library: ./libjaegertracing.so.0.4.2
                config:
                  service_name: edge-proxy
                  sampler:
                    type: const
                    param: 1
                  reporter:
                    localAgentHostPort: 127.0.0.1:6831
                  headers:
                    jaegerDebugHeader: jaeger-debug-id
                    jaegerBaggageHeader: jaeger-baggage
                    traceBaggageHeaderPrefix: edgectx-
                  baggage_restrictions:
                    denyBaggageOnInitializationFailure: false
                    hostPort: ""
          stat_prefix: edge_http
          use_remote_address: true
          route_config:
            name: local_route
            virtual_hosts:
            - name: namespace.local_service
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                decorator:
                  operation: fetchContent
                route:
                  cluster: somecluster
                  rate_limits:
                  - actions:
                    - {source_cluster: {}}
                    - {generic_key: {descriptor_value: some_value}}
          http_filters:
          - name: envoy.filters.http.rate_limit
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: "domain"
              timeout: 5s
              rate_limit_service:
                grpc_service:
                  timeout: 5s
                  envoy_grpc:
                    cluster_name: rate-limit
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: somecluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: somecluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8082
  - name: rate-limit
    http2_protocol_options: {}
    connect_timeout: 0.25s
    type: logical_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: rate-limit
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 10004
  - name: jaeger
    connect_timeout: 1s
    type: strict_dns
    lb_policy: round_robin
    load_assignment:
      cluster_name: jaeger
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 9411
# envoy -c jaeger.yaml