New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Operators can more easily see when routes and endpoints are added or removed from a given gorouter's routing table in the gorouter logs #172
Comments
I've spiked this out and here are the results so far: branch: https://github.com/cloudfoundry/gorouter/tree/log-route-table-changes The log messages below are the only log messages I see now... no more repeated "unregister-route" or "start-pruning" messages... I am getting a couple duplicate messages when scaling apps up and down, but only single repeats... haven't dug into what's going on there yet. Add a new route:
Scale app from 1 to 2 instances
Scale app from 2 to 1 instances
Remove a route
|
Oh, this is because I had two routes mapped to the app :) |
With this change, log noise is significantly reduced and signal is significantly boosted. Now, no messsages are emitted at steady state, and all changes to the routing table are logged. This should enable one to reconstruct the routing table at any point in time based on log messages alone. Additional details: Log noise reduction: Before, route and endpoint registrations and unregistrations were only logged at debug level, so these messages were often unavailable in troubleshooting scenarios. Furthermore, unregistration _messages_ were logged at info level, which increases log noise, as these messages are repeated several times. Finally, log messages marking the start and end of the route/endpoint pruning polling cycles are now reduced to debug level. In current versions of gorouter, these rarely result in any activity, as the pruning strategy for stale routes is now based on lazy evaluation of application identity based on the TLS connection to backends. And in the case where this pruning cycle is still meaningful, while the polling cycle is not demarcated, messages for each route/endpoint that are pruned are still logged at info level. Signal boosting: The following log messages are emitted any time the routing table changes: - `route-registered`: a new route is added to the table - `route-unregistred`: an existing route is removed from the table - `endpoint-registered`: a new backend is added to the table e.g. an app is scaled up and a new app instance is started - `endpoint-unregistered`: a backend is removed from the table e.g. an app is scaled down and an app instance is stopped cloudfoundry/routing-release#172
Hi 👋 This functionality is now available in Routing Release Thanks, |
With this change, log noise is significantly reduced and signal is significantly boosted. Now, no messsages are emitted at steady state, and all changes to the routing table are logged. This should enable one to reconstruct the routing table at any point in time based on log messages alone. Additional details: Log noise reduction: Before, route and endpoint registrations and unregistrations were only logged at debug level, so these messages were often unavailable in troubleshooting scenarios. Furthermore, unregistration _messages_ were logged at info level, which increases log noise, as these messages are repeated several times. Finally, log messages marking the start and end of the route/endpoint pruning polling cycles are now reduced to debug level. In current versions of gorouter, these rarely result in any activity, as the pruning strategy for stale routes is now based on lazy evaluation of application identity based on the TLS connection to backends. And in the case where this pruning cycle is still meaningful, while the polling cycle is not demarcated, messages for each route/endpoint that are pruned are still logged at info level. Signal boosting: The following log messages are emitted any time the routing table changes: - `route-registered`: a new route is added to the table - `route-unregistred`: an existing route is removed from the table - `endpoint-registered`: a new backend is added to the table e.g. an app is scaled up and a new app instance is started - `endpoint-unregistered`: a backend is removed from the table e.g. an app is scaled down and an app instance is stopped cloudfoundry/routing-release#172
Summary
Our logs have very limited information about the state of the routing table as we only log when routes are unregistered not when they are registered.
In addition, the unregister messages themselves are noisier than need be because we log any time we receive an "unregister-route" message, and these messages are repeated a number of times by the route-emitter make sure they are received.
Rather than logging what messages we receive, it would be nice if instead we had the following
Info
level logs if and only if those messages result in an change to the routing table:Expected Results
Given A new route is added
When a given gorouter adds the route to its routing table
Then I see exactly one log message in gorouter.stdout.log indicating the route was added.
Given A route is removed or pruned
When a given gorouter removes the route from its routing table
Then I see exactly one log message in gorouter.stdout.log indicating the route was added.
Given An endpoint is added for a route (e.g. when an app is scaled up from N to N+1 instances)
When a given gorouter adds the endpoint to its routing table
Then I see exactly one log message in gorouter.stdout.log for each endpoint added.
Given An endpoint is removed or pruned for a route (e.g. when an app is scaled down from N to N-1 instances)
When a given gorouter removes the endpoint from its routing table
Then I see exactly one log message in gorouter.stdout.log for each endpoint removed.
Context
An ongoing source of support burden circles around questions of "stale routes" which are often suspected as a possible cause of folks 'issues when getting errors in their requests to apps, and our documentation for troubleshooting error responses has an entire section devoted to diagnosing stale routes.
The gorouter routing table is eventually consistent (and therefore often inconsistent). The docs cover how to check the routing table at any moment using the
/var/vcap/jobs/gorouter/bin/retreive-local-routes
script that gets put on the router VM. But that only tells you what routes the router has in its table now. When handling support issues, we usually start with logs.Our logs have very limited information about the state of the routing table as we only log when routes are unregistered not when they are registered. More specifically we log any time we receive an "unregister-route" message.
"register-route" messages and "unregister-route" messages are both sent repeatedly to make sure they are received, so logging them all is noisy. While still noisy, we do log "unregister-route" messages because they stop after some time. But "register-route" messages are repeated indefinitely, so we don't log them at all.
Rather than logging what messages we receive at Info level, it would be nice if instead we logged at Info level any time a register-route message results in a route or endpoint actually being added to or removed from the routing table.
The text was updated successfully, but these errors were encountered: