Kubernetes infrastructure - VPC peering discussion

Hello!

Last week during our tech alignment, we had an opportunity to discuss VPC peering in our new Kubernetes infrastructure. Using peering would be consistent with the existing production infrastructure, and it seemed like it would satisfy our needs.

Let’s have an open discussion on designs that use peering and designs that don’t.

I think everyone is familiar with the concept of peering but I’ll include this AWS document just in case:

https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html

Much like the diagram on that page, we have a one-to-one relationship between the Kubernetes cluster VPC and a “resource” VPC which contains resources like RDS and Elasticache. Right now, we do not have multiple production clusters, so we only have one cluster communicating with those resources in that one resource VPC.

The problem:

Where should a shared resource, like an RDS database, exist in our network and how will network traffic be routed, monitored (I’m thinking security monitoring / flow log analysis) and restricted?

Is it appropriate to segment resource types by VPC, adding a peering connection and restricting access based on security groups?
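
For concreteness, here is a rough Terraform sketch of the pattern in question. The resource names, CIDRs, and database port are illustrative assumptions, not our actual configuration:

```hcl
# Illustrative only: two VPCs, a peering connection, routes in both
# directions, and a security group that restricts database access to
# traffic originating from the cluster VPC's address range.

resource "aws_vpc" "cluster" {
  cidr_block = "10.0.0.0/16" # Kubernetes cluster VPC (assumed CIDR)
}

resource "aws_vpc" "resources" {
  cidr_block = "172.16.0.0/16" # "resource" VPC holding RDS / Elasticache (assumed CIDR)
}

resource "aws_vpc_peering_connection" "cluster_to_resources" {
  vpc_id      = aws_vpc.cluster.id
  peer_vpc_id = aws_vpc.resources.id
  auto_accept = true # same account, same region
}

# Each side needs a route to the other VPC's CIDR via the peering connection.
resource "aws_route" "to_resources" {
  route_table_id            = aws_vpc.cluster.main_route_table_id
  destination_cidr_block    = aws_vpc.resources.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.cluster_to_resources.id
}

resource "aws_route" "to_cluster" {
  route_table_id            = aws_vpc.resources.main_route_table_id
  destination_cidr_block    = aws_vpc.cluster.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.cluster_to_resources.id
}

# Database security group in the resource VPC, only admitting the cluster VPC's range.
resource "aws_security_group" "rds" {
  vpc_id = aws_vpc.resources.id

  ingress {
    from_port   = 5432 # assuming Postgres
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.cluster.cidr_block]
  }
}
```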

Here’s how you can weigh in on the design:

  • Share your personal experience with VPC peering
  • Propose a different architectural model
  • Raise concerns from a performance and availability perspective
  • Raise concerns from an architecture design perspective (anti-patterns, etc.)

An additional architecture note:

Today, we have one Kubernetes cluster in one VPC (the VPC setup is part of the Terraform to create the cluster) but in the future we may have multiple clusters spread across multiple VPCs. Pods running in multiple clusters could share the same RDS database.

We reached consensus during last week’s tech alignment that VPC peering was the best way to separate concerns between cluster destruction and RDS provisioning, since our RDS databases will likely be far longer-lived than the clusters themselves.

Ideally, we’d like to close this issue out soon and finalize the architecture we use for deploying the database and other supporting resources.

Cheers,
Daniel

I might be missing use-cases and so on (in which case feel free to ignore), but otherwise here’s a probable anti-pattern:

In most cases, different services are in different VPCs, and one service needs to access another service’s data. Using VPC peering to access RDS means that you want to expose the database directly to the other service. Usually, though, you’d want both services to stay separated and have a clean interface for exchanging data (such as an authenticated API over a public interface).

By establishing a standard for easily peering VPCs in order to make direct database connections, we encourage moving to a state where services are not cleanly separated.

Note: technically speaking, the services could still be cleanly separated with an authenticated API and use VPC peering, so that the interface is not publicly available - but it doesn’t seem to be the intent here.

I’ve not used VPC peering myself, but in cloud services we planned to move to that model, with each distinct product in its own VPC and our shared resources (e.g. monitoring) in a VPC peered to our product VPCs.

Assuming there is a planned need for the services in the shared VPC to be made available not only to this one Kubernetes VPC but also to other VPCs, this design makes sense.

If the sole shared resource you’re envisioning is RDS (and it’s not really shared, since it’s only used by the Kubernetes cluster), or if the sole reason you’re thinking of putting RDS in a different VPC is to work around the provisioning complexity of bringing up RDS in the same VPC as the Kubernetes cluster, I’d recommend looking at it more closely and seeing if we can solve the provisioning challenges.

Maybe a good way to dig into this is:

  • What are the services you envision running in the shared VPC (beyond RDS)?
  • What products or planned products would need access to each of these services?

This may tease out the use cases.

monitored (I’m thinking security monitoring / flow log analysis)

The addition of VPC peering won’t help with security monitoring or flow log analysis. VPC flow logs are available without VPC peering. Security monitoring via NSM is currently only accomplished with Bro running on a NAT instance; for internal VPC traffic or traffic over a VPC peering connection, a NAT instance is not traversed, so we wouldn’t have an NSM view into that traffic.
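
For reference, flow logs can be enabled on a single VPC directly. A minimal sketch, assuming a CloudWatch Logs destination and an IAM role (both names hypothetical) that is already allowed to publish:

```hcl
# Illustrative sketch: VPC flow logs on the existing cluster VPC, no peering involved.
resource "aws_cloudwatch_log_group" "vpc_flow" {
  name = "vpc-flow-logs" # hypothetical log group name
}

resource "aws_flow_log" "cluster_vpc" {
  vpc_id          = aws_vpc.cluster.id         # the existing cluster VPC
  traffic_type    = "ALL"                      # capture both accepted and rejected traffic
  log_destination = aws_cloudwatch_log_group.vpc_flow.arn
  iam_role_arn    = aws_iam_role.flow_logs.arn # assumed role that can publish to CloudWatch Logs
}
```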

how will network traffic be routed … and restricted?

VPC peering doesn’t add to this either, as all of the network-layer controls to restrict access between systems (e.g. Kubernetes and RDS) are present in a single VPC. The same routing and security group capabilities exist in a single VPC as exist in two peered VPCs.
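
As a sketch of that point (resource names are hypothetical): inside a single VPC the same restriction can be expressed by referencing the workers’ security group directly, which is arguably tighter than a CIDR-based rule across a peering connection:

```hcl
# Illustrative sketch: database security group in the same VPC as the cluster,
# admitting only instances that carry the workers' security group.
resource "aws_security_group" "rds" {
  vpc_id = aws_vpc.cluster.id

  ingress {
    from_port       = 5432 # assuming Postgres
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.workers.id] # hypothetical worker node SG
  }
}
```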

So to summarize, if there is a use case where you need to share certain resources (potentially RDS) between two distinct products/services that you plan to run in different VPCs, then a shared VPC with VPC peering is a good solution. If you’re looking to solve a different use case, we should lay out what that is and see how well VPC peering fits with that use case as well as other solutions.

In this case we are using VPC peering because applications running in Kubernetes need some persistent, durable store that doesn’t live in the Kubernetes cluster or the cluster VPC. Django does not support database writes via HTTP(S), so we’ve got to have some instance of MySQL and Postgres somewhere. Putting it in the cluster itself provides isolation but creates durability problems when it comes to persistence.

Other options are:

  • Provision RDS inside the same VPC as Kubernetes. This has a disadvantage in that it makes it difficult for us to delete and recreate the whole environment.
  • Public-facing RDS.

The way I see it, separating these into an additional peered private space gives an additional audit point (VPC flow logs) for inter-VPC traffic, and clear ways to identify database vs. kube worker traffic based on addressing, by using separate address ranges and port tuples to identify what’s what. Example:

  • 10.0.0.0/8 for workers
  • 172.16.0.0/12 for resource VPC
  • Containers get a carrier grade NAT range.

Also note that we’re securing RDS with username/password, users with scope limited to specific databases, and RDS security groups. Notably, though, all production workers need access (at the network level) to all production RDS instances.

Provision RDS inside the same VPC as Kubernetes. This has a disadvantage in that it makes it difficult for us to delete and recreate the whole environment.

Is it difficult to regenerate a Kubernetes deployment without blowing away the actual VPC that it lives in? I would think these provisioning steps would be distinct from each other and deleting a Kubernetes cluster could be done without deleting the VPC that it lived in.

The way I see it, separating these into an additional peered private space gives an additional audit point (VPC flow logs) for inter-VPC traffic

I thought VPC flow logs were available within a single VPC for whatever network interfaces you wanted, without the need for VPC peering?

clear ways to identify database vs. kube worker traffic based on addressing, by using separate address ranges and port tuples to identify what’s what

Could the different classes of systems, were they within a single VPC, merely be assigned IPs in different ranges (without a need for VPC peering)? Or, if you wanted, be put on different subnets within the VPC? I’d think this would allow for easy differentiation in VPC flow logs.
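
A minimal sketch of that alternative, assuming a single cluster VPC and made-up CIDRs: distinct subnets per class of system, which show up as distinct address ranges in flow logs without any peering:

```hcl
# Illustrative sketch: one VPC, with workers and databases on separate,
# easily distinguishable subnets.
resource "aws_subnet" "workers" {
  vpc_id     = aws_vpc.cluster.id
  cidr_block = "10.0.0.0/20" # worker nodes (assumed range)
}

resource "aws_subnet" "databases" {
  vpc_id     = aws_vpc.cluster.id
  cidr_block = "10.0.16.0/24" # RDS subnet group would live here (assumed range)
}
```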

Thanks everyone! My team members may still want to add to this discussion if they get a chance.

Is it difficult to regenerate a Kubernetes deployment without blowing away the actual VPC that it lives in? I would think these provisioning steps would be distinct from each other and deleting a Kubernetes cluster could be done without deleting the VPC that it lived in.

Right now, we can’t, but that can be changed. When EKS became generally available, I started with the Terraform that the Terraform team released. That source can be found here. For each EKS cluster that it creates, it will create a new VPC. In a sense, we inherited that behavior because I didn’t see a reason to change it at the time.

I just looked up the CloudFormation from Amazon for adding worker nodes, and I can see that they took a different approach: they allow you to specify a VPC as a parameter when deploying new workers.

Separating the network from the cluster is something that we can do if it is a better architecture. At that point, we can deploy resources like RDS (or additional Kubernetes clusters) into that VPC without putting persistent data at risk if the cluster needs to be removed for some reason.

Ignoring the shared resources, I think it is probably unlikely that we will run multiple production clusters in the same region. It would make more sense to scale out an existing cluster. So we will probably see a 1:1 relationship between VPCs and clusters anyway.

If we have a cluster in us-west-2 and a cluster in us-east-1, do we need them to share resources like RDS or Elasticsearch?

The general concern with using VPC peering is equal parts added complexity and, as kang pointed out, promoting anti-patterns. The use cases for VPC peering tend to boil down to ones where you are solving for a) organizational and functional issues rather than technical ones (for instance, the use cases given in the official AWS documentation point to different parts of your organization being deployed to different VPCs, which is not a technical need), or b) legacy applications where you have been forced into a design pattern in which you deploy things within a VPC but have a protocol you can’t easily wrap or secure, and a consumer of that service in an adjacent VPC.

In the case of this deployment I don’t see either of these two use cases, so the question I would have is: what use case calls for a separate networking construct in order to achieve an optimal deployment? A VPC is the coarsest network building block you’ll use in AWS, and it brings with it a set of needs in order to instantiate it: allocating IPs, allocating subnets, setting up routing, and potentially setting up additional EIPs, NAT, gateways, NACLs, and so forth. If you have an application that relies on having a unique and unshared set of those resources, it makes sense to create one and then deploy into it. If you don’t, then the less complex pattern is to deploy into something where you needn’t instantiate all of these other constructs you don’t actually need.

From what I’ve seen of this app, none of that additional complexity is needed; there’s no reason to update two VPCs to create a connection between them in order to facilitate communication, because neither VPC is precluded from running what you would run in the other.

As a follow-up to my initial comment:
After discussing a bit more, I now understand the application better. In this case, it’s really not about interfacing databases with different services, but solely about solving a technical limitation where you cannot respin the services separately from the long-lived data of the same service.

As long as we only use such a pattern for a similar technical problem, I don’t see a problem with it (regardless of the API-less implementation details, VPC peering or otherwise).

Another side note is that we’ve seen public-facing RDS for certain services (to avoid doing VPC peering), which had its own set of problems in securing access correctly (ideally they’d need an access proxy in front, which requires an instance, and so on…) and reliably updating access controls. It’s also possible, though you have to decide which set of problems you want to solve.

Separating the network from the cluster is something that we can do if it is a better architecture.

I would argue that it is.

So we will probably see a 1:1 relationship between VPCs and clusters anyway.

Given this, it sounds to me like VPC peering is not the right solution here; instead, we should look into separating the VPC creation and cluster creation in Terraform.

what use case calls for a separate networking construct in order to achieve an optimal deployment

Exactly. It sounds to me like there is no separate networking construct needed here; it’s just the legacy of the Terraform containing both the VPC and the cluster code.

it’s really not about interfacing databases with different services, but solely about solving a technical limitation where you cannot respin the services separately from the long-lived data of the same service.

Agreed.

So given all this, I’d agree with everyone else here and recommend against establishing an additional VPC and setting up VPC peering. Instead, decouple the provisioning code that creates the VPC from the code that creates the cluster.
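
A rough sketch of what that decoupling could look like in Terraform. The module layout, names, and variables are all hypothetical; the point is only that the cluster and the database consume a VPC created elsewhere rather than creating their own:

```hcl
# network/main.tf -- provisioned once, long-lived, never destroyed with a cluster
data "aws_availability_zones" "available" {}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # illustrative CIDR
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 4, count.index)
}

output "subnet_ids" {
  value = aws_subnet.private[*].id
}

# cluster/main.tf -- takes the network as input, so destroying the cluster
# never touches the VPC or anything else deployed into it
variable "subnet_ids" {
  type = list(string)
}

variable "cluster_role_arn" {
  type = string
}

resource "aws_eks_cluster" "this" {
  name     = "production" # hypothetical name
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids # subnets created by the network code above
  }
}

# rds/main.tf -- the database also lives in the long-lived VPC
# (this module would declare its own subnet_ids, db_username, and db_password
# variables; elided here to keep the sketch short)
resource "aws_db_subnet_group" "this" {
  subnet_ids = var.subnet_ids
}

resource "aws_db_instance" "app" {
  engine               = "postgres"
  instance_class       = "db.t3.medium" # illustrative
  allocated_storage    = 20
  db_subnet_group_name = aws_db_subnet_group.this.name
  username             = var.db_username
  password             = var.db_password
}
```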