However, all of our internal services use LogNet’s awesome SpringBoot GRPC library to communicate, but there’s no native Micrometer support. GRPC itself does have internal metrics, but they aren’t yet exposed to Spring in that library. Since we are a tiny startup with limited resources, we did some simple things to get Micrometer hooked up to our GRPC services for some basic metrics.
Micrometer Setup
Our Micrometer setup was simple: include the Prometheus registry dependency in each service’s build file:
implementation("io.micrometer:micrometer-registry-prometheus")
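For context, the Prometheus registry sits alongside the Spring Boot Actuator starter, which is what actually serves the /actuator endpoints. A minimal build.gradle.kts sketch (the exact dependency block will vary by project):

dependencies {
    // Actuator serves the /actuator endpoints
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    // The Prometheus registry adds the /actuator/prometheus scrape endpoint
    implementation("io.micrometer:micrometer-registry-prometheus")
}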
And since these are internal services, we exposed everything:
management:
  endpoints:
    web:
      exposure:
        include: "*"
Then for every service, we have the HTTP endpoints $HOST:$PORT/actuator/metrics and $HOST:$PORT/actuator/prometheus available for use.
Prometheus Configuration
We run things in Kubernetes, so we first add the following annotations to our service pods to make them discoverable by Prometheus.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "<port>"
And we add the following job to Prometheus Server’s prometheus.yml to discover and scrape pods.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
This job is already included by default with the Prometheus Helm chart.
Method Timings
We went with the standard Spring/Micrometer generic method timing approach for this. The upside is that it was trivial to implement; the downside is that we have to remember to annotate each GRPC method.
In a @Configuration class, we added a TimedAspect bean:
@Bean
fun timedAspect(registry: MeterRegistry): TimedAspect {
    return TimedAspect(registry)
}
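For completeness, here’s roughly what that configuration class looks like with its imports (the class name is just illustrative). One gotcha worth noting: TimedAspect is an AspectJ aspect, so AOP support (e.g. spring-boot-starter-aop) has to be on the classpath, or the @Timed annotation silently does nothing.

import io.micrometer.core.aop.TimedAspect
import io.micrometer.core.instrument.MeterRegistry
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Illustrative name; any @Configuration class in the service works
@Configuration
class MetricsConfig {

    // Registers the aspect that backs @Timed method timings
    @Bean
    fun timedAspect(registry: MeterRegistry): TimedAspect {
        return TimedAspect(registry)
    }
}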
And then for every GRPC call, we throw on a @Timed annotation.
@Timed
override fun getFoo(request: FooService.GetFooRequest,
                    responseObserver: StreamObserver<FooService.FooResponse>) {
    [...]
}
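The bare annotation records everything under Micrometer’s default method.timed metric, which is why the samples below are named method_timed_seconds. If a method needs its own metric name or percentile histograms, @Timed accepts those as attributes. A hypothetical variant (the request/response types mirror the example above):

@Timed(value = "grpc.server.calls", histogram = true, percentiles = [0.5, 0.95, 0.99])
override fun updateFoo(request: FooService.UpdateFooRequest,
                       responseObserver: StreamObserver<FooService.FooResponse>) {
    // handle the request as usual
}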
With the annotation in place, the GRPC method metrics show up under the /actuator/prometheus endpoint:
# HELP method_timed_seconds
# TYPE method_timed_seconds summary
method_timed_seconds_count{class="com.stackhawk.FooService",exception="none",method="createFoo",} 3.0
method_timed_seconds_sum{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0344318
# HELP method_timed_seconds_max
# TYPE method_timed_seconds_max gauge
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0272329
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="updateFoo",} 0.0181494
With that getting pulled into Prometheus, we can then do things like compute the average duration per GRPC call using PromQL like so:
rate(method_timed_seconds_sum[1m]) / rate(method_timed_seconds_count[1m])
Exception Metrics
For this, we decided to hook a Micrometer counter into our existing generic GRPC exception handler, which lives in an internal shared library that all GRPC services automatically pull in via our common Gradle platform.
All we did here was add the MeterRegistry to the constructor, so it gets set by the Spring context. Then we use that MeterRegistry instance to increment a counter in the catch block, with the exception’s full class name as a Tag.
class GlobalGrpcExceptionHandler(private val registry: MeterRegistry? = null) : ServerInterceptor {

    private val logger: Logger = LoggerFactory.getLogger(GlobalGrpcExceptionHandler::class.java)

    override fun <ReqT : Any?, RespT : Any?> interceptCall(call: ServerCall<ReqT, RespT>?, headers: Metadata?, next: ServerCallHandler<ReqT, RespT>?): ServerCall.Listener<ReqT> {
        val delegate = next?.startCall(call, headers)
        return object : ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
            override fun onHalfClose() {
                try {
                    super.onHalfClose()
                } catch (e: Exception) {
                    registry?.counter("grpc.exception.counter", Tags.of("type", e.javaClass.canonicalName))?.increment()
                    logger.error(e.message, e)
                    call?.close(Status.INTERNAL
                        .withCause(e)
                        .withDescription(e.message), Metadata())
                }
            }
        }
    }
}
Then each service gets the context’s MeterRegistry autowired into a configuration class constructor and passes it to the exception handler bean:
@Configuration
class FooConfig(private val meterRegistry: MeterRegistry) {

    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}
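Constructor injection is just one way to get the registry there; Spring will also resolve a MeterRegistry parameter on the @Bean method itself. An equivalent variant, purely for illustration:

@Configuration
class FooConfig {

    // Spring supplies the MeterRegistry argument from the context when creating the bean
    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(meterRegistry: MeterRegistry): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}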
With those in place, the /actuator/prometheus endpoint now has a new counter with the full class name of the exception as a tag:
# HELP grpc_exception_counter_total
# TYPE grpc_exception_counter_total counter
grpc_exception_counter_total{type="software.amazon.awssdk.core.exception.SdkClientException",} 1.0
In PromQL, that then lets you do things like:
rate(grpc_exception_counter_total[1m])