New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exposing more Envoy configuration knobs #2225
Comments
Thank you for raising this issue. I want to give a bit of background. The general policy of this project has been to avoid knobs and tuneable for their own sake where possible. We've done this in the past by adjusting our defaults, and only in the cases where there are no generally applicable defaults exposed those tuneable to contour users. What I would propose is that rather than tackling six different aspects of connection manager parameters, which are only tangentially related by the fact that they are all kind of tuneable things and are all presented in a single issue, we break this request into different parts, because before implementing anything, we need to have a user story, a reason to do something, and a way to know when we did it correctly. The more different requests are mixed into one issue, the lower the chance of success. Does that sound reasonable to you? |
@davecheney totally reasonable. Thanks for sharing your reasoning. I mentioned the HTTPConnectionManager configuration to share some of the options we chose to explicitly configure, and how we were able to configure all of them as a group of options, rather than individual changes. Our starting point was the recommended configuration described at https://www.envoyproxy.io/docs/envoy/latest/configuration/best_practices/edge and discussed by @mattalberts in this thread: #1375 (comment) What do you think about starting with |
@davecheney Although historically Contour has always aimed for the defaults to be universal, maybe there's a case for a general envoy parameterization mechanism so that we don't end up adding support for a lot of different parameters on a case-by-case basis? |
WRT max connection duration, what’s the use case for limiting it? |
@davecheney our experience of providing ingress capabilities for all the teams using our cluster has led us to enforce cluster-wide limits to help with potential application issues. One concrete example: we’ve hit this issue many times golang/go#30702 and applying max-duration for connection helped in mitigating its impact. Choosing a good default applicable to different use cases seems somewhat tricky, hence the original desire to make this a configurable option with sane default. |
@pims thanks for the information. What would be a good value for you? |
@davecheney we’ve configured most of our timeouts to have a value of Let me know if you'd like to move this conversation to a dedicated issue. And we can close this one. |
I do have a need to tune (little bit) Envoy with exactly same values. If this is out of scope for Contour what other suggestion are there as a replacement? @pims is it possible to open-source your work? |
@linecolumn Can you please log a separate issue with your use case? As we say in our philosophy document, we prefer not to add tunables without trying to find better defaults first. Outlining what you are trying to do and how tuning those values will help you allows us to determine if we can change a default first, or if not, which settings need to be made tunable. I think it's important to note that Envoy's timeouts are not straightforward, when I first looked into how to do client timeouts, the timeout I thought we needed to set was not correct. |
@linecolumn https://github.com/mattalberts/contour/tree/v1.0.1-cloudflare Some recent changes haven't been pushed yet, but this covers a few things we had to change. |
I'm bringing some conversation from #2247 back here, as it's larger in scope than that single use case. Also, sorry for the giant wall of text. After some discussion amongst the Contour maintainers, I'd like to talk to everyone about adding more tweakability for these timeout parameters. I can see from the discussion on #2247 that for many of these timeouts, default values are missing valid use cases, and from @pims' fork that you, our users, really want a lot more customisation than you have. (As an aside, @pims Is there any chance I could schedule a call with you to talk about your fork? I'm really interested in how you've used all the tweakability you've added.) What I really want to avoid is having the I'll add the action items for this into a separate comment so you don't need to parse this giant wall of text to figure out what's happening next. As part of this, I'd like everyone watching this issue to think about how you'd change these defaults and why. Are there any changes that would be useful for all users of Contour, or are the changes you're making specific to your use cases? About ext_auth and other settings, I'm still strongly of the opinion that, if Contour accepts configuration for something, we should surface information for that config back to the people who it affects. For services that involve passing a third service to Envoy, then I believe that Contour needs to be able to expose some information about that service back to its consumers. I've covered this requirement in #2325, and I'd really like some sort of answer to this before we do anything more to provide any functionality that uses a similar pattern. For external auth, that means having a way for Contour to check that the external auth service is healthy and processing requests - the intent is to allow someone who creates a HTTPProxy object to be able to see why all of a sudden everything is failing by inspecting the HTTPProxy object itself, rather than having to know where to go to check on the auth service. Maybe there are reasons that's not practical, but if so, I want to have thought about them, and recorded why we're not doing that. I should also be clear that I'm very skeptical of the idea that we need to allow the passing of arbitrary Envoy config. Part of the point of Contour is to abstract away Envoy details, I'm yet to be convinced this would be a good way to allow more flexibility. |
Here's where I would like to take this configurability forward:
|
@youngnick thanks for the detailed follow-up.
If we believe Happy to get on a call and talk about our use case. I'll email you. |
Thanks for the catch-up @pims. I agree that having all these settings on the command line will end up unwieldy, so I'll have a think about a more general stance there too. |
Okay, from here, I'm going to propose the following:
There's some other stuff in the queue before this, but I will work on scheduling this work for after 1.5. |
From our slack discussion about timeouts.
|
I found this resource on Envoy timeouts a while ago and it looks quite comprehensive - https://github.com/chemicL/envoy-timeouts. Please ensure that the documentation and clarity for timeouts makes it clear what applies to what covering:
|
Still to do:
|
I'm going to close this out as the discussed timeouts have been added as options to the config file and documented. There's some additional work I'm still doing to reconcile the timeout syntax between the API and the config file (see #2697 and #2698), and #2690 has been created to cover moving |
Please describe the problem you have
[A clear, concise, description of the problem you are facing. What is the problem that feature X would solve for you?]
When deploying Contour/Envoy in a large cluster with heterogenous workloads (gRPC, websockets, h2, etc.), some Envoy settings become quite important to get right.
Here's a subset of settings we have configured through a custom patch:
Most of these are part of the
HTTPConnectionManager
filter configuration.We have been using these settings with great success for several months in production now, and would be interested in contributing these changes upstream.
My questions to the Contour team: would this fall into the scope of what you envision for Contour?
If you believe that this is out of scope for this project, would you be open to minor code changes that would make it easier for people to patch.
A concrete example being the addition of
requestTimeout
in commit d197482If this could be wrapped in some kind of
type HTTPConnectionManagerConfiguration struct{}
, adding unsupported fields would no longer require changing the function signature, which is used extensively through the comprehensive test suite.Examples of features we have added over the years:
Worth noting, we were able to retain all of Contour's simplicity through custom defaults while giving us the ability to configure Envoy to our needs.
wdyt?
The text was updated successfully, but these errors were encountered: