However, all of our internal services use LogNet’s awesome SpringBoot GRPC library to communicate, but there’s no native Micrometer support. GRPC itself does have internal metrics, but they aren’t yet exposed to Spring in that library. Since we are a tiny startup with limited resources, we did some simple things to get Micrometer hooked up to our GRPC services for some basic metrics.
Micrometer Setup
Our Micrometer setup was simple: include the Prometheus registry dependency in each service’s build file:
implementation("io.micrometer:micrometer-registry-prometheus")
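For context, the Prometheus registry sits alongside the Spring Boot Actuator starter, which is what actually serves the /actuator endpoints. A minimal build.gradle.kts sketch (the exact dependency block will vary by project):

dependencies {
    // Actuator serves the /actuator endpoints
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    // The Prometheus registry adds the /actuator/prometheus scrape endpoint
    implementation("io.micrometer:micrometer-registry-prometheus")
}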
And since these are internal services, we exposed everything:
management:
  endpoints:
    web:
      exposure:
        include: "*"
Then for every service, we have the HTTP endpoints $HOST:$PORT/actuator/metrics and $HOST:$PORT/actuator/prometheus available for use.
Prometheus Configuration
We run things in Kubernetes, so we first add the following annotations to our service pods to make them discoverable by Prometheus.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "<port>"
And we add the following job to Prometheus Server’s prometheus.yml to discover and scrape pods.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
This job is already included by default with the Prometheus Helm chart.
Method Timings
We went with the standard Spring/Micrometer generic method timing approach for this. The upside is that it was trivial to implement; the downside is that we have to remember to annotate each GRPC method.
In a @Configuration class, we added a TimedAspect bean:
@Bean
fun timedAspect(registry: MeterRegistry): TimedAspect {
    return TimedAspect(registry)
}
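For completeness, here’s roughly what that configuration class looks like with its imports (the class name is just illustrative). One gotcha worth noting: TimedAspect is an AspectJ aspect, so AOP support (e.g. spring-boot-starter-aop) has to be on the classpath, or the @Timed annotation silently does nothing.

import io.micrometer.core.aop.TimedAspect
import io.micrometer.core.instrument.MeterRegistry
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Illustrative name; any @Configuration class in the service works
@Configuration
class MetricsConfig {

    // Registers the aspect that backs @Timed method timings
    @Bean
    fun timedAspect(registry: MeterRegistry): TimedAspect {
        return TimedAspect(registry)
    }
}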
And then for every GRPC call, we throw on a @Timed annotation.
@Timed
override fun getFoo(request: FooService.GetFooRequest,
                    responseObserver: StreamObserver<FooService.FooResponse>) {
    [...]
}
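The bare annotation records everything under Micrometer’s default method.timed metric, which is why the samples below are named method_timed_seconds. If a method needs its own metric name or percentile histograms, @Timed accepts those as attributes. A hypothetical variant (the request/response types mirror the example above):

@Timed(value = "grpc.server.calls", histogram = true, percentiles = [0.5, 0.95, 0.99])
override fun updateFoo(request: FooService.UpdateFooRequest,
                       responseObserver: StreamObserver<FooService.FooResponse>) {
    // handle the request as usual
}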
With the annotation in place, the GRPC method metrics show up under the /actuator/prometheus endpoint:
# HELP method_timed_seconds
# TYPE method_timed_seconds summary
method_timed_seconds_count{class="com.stackhawk.FooService",exception="none",method="createFoo",} 3.0
method_timed_seconds_sum{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0344318
# HELP method_timed_seconds_max
# TYPE method_timed_seconds_max gauge
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0272329
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="updateFoo",} 0.0181494
With that getting pulled into Prometheus, we can then do things like compute the average duration per GRPC call using PromQL like so:
rate(method_timed_seconds_sum[1m]) / rate(method_timed_seconds_count[1m])
Exception Metrics
For this, we decided to hook a Micrometer counter into our existing generic GRPC exception handler, which lives in an internal shared library that all GRPC services automatically pull in via our common Gradle platform.
All we did here was add the MeterRegistry to the constructor, so it gets set by the Spring context. Then we use that MeterRegistry instance to increment a counter in the catch block, with the exception’s full class name as a Tag.
class GlobalGrpcExceptionHandler(private val registry: MeterRegistry? = null) : ServerInterceptor {

    private val logger: Logger = LoggerFactory.getLogger(GlobalGrpcExceptionHandler::class.java)

    override fun <ReqT : Any?, RespT : Any?> interceptCall(call: ServerCall<ReqT, RespT>?, headers: Metadata?, next: ServerCallHandler<ReqT, RespT>?): ServerCall.Listener<ReqT> {
        val delegate = next?.startCall(call, headers)
        return object : ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
            override fun onHalfClose() {
                try {
                    super.onHalfClose()
                } catch (e: Exception) {
                    registry?.counter("grpc.exception.counter", Tags.of("type", e.javaClass.canonicalName))?.increment()
                    logger.error(e.message, e)
                    call?.close(Status.INTERNAL
                        .withCause(e)
                        .withDescription(e.message), Metadata())
                }
            }
        }
    }
}
Then each service gets the context’s MeterRegistry autowired into a configuration class constructor and passes it to the exception handler bean:
@Configuration
class FooConfig(private val meterRegistry: MeterRegistry) {

    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}
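Constructor injection is just one way to get the registry there; Spring will also resolve a MeterRegistry parameter on the @Bean method itself. An equivalent variant, purely for illustration:

@Configuration
class FooConfig {

    // Spring supplies the MeterRegistry argument from the context when creating the bean
    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(meterRegistry: MeterRegistry): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}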
With those in place, the /actuator/prometheus endpoint now has a new counter with the full class name of the exception as a tag:
# HELP grpc_exception_counter_total
# TYPE grpc_exception_counter_total counter
grpc_exception_counter_total{type="software.amazon.awssdk.core.exception.SdkClientException",} 1.0
In PromQL, that then lets you do things like:
rate(grpc_exception_counter_total[1m])