In a previous blog we showed how to build a cheap Kubernetes project, avoiding some of the bigger costs involved in hosting a cluster. That blog got a lot of attention, so here is a much improved follow-up. In this blog I’ll show how to build an even cheaper and more secure Kubernetes cluster for small projects.
A few years ago now we published a concept for hosting Kubernetes on Google Cloud Platform for less than £20 pcm. It’s fair to say quite a lot has changed since then, but irrespective of resource price increases, a lot of providers still support a generous free tier to help you build up your own expertise in their product offering.
We are big fans of Google Cloud Platform (GCP), especially for their best-in-class Kubernetes support. Kubernetes itself can be quite a beast to contend with, but for any aspiring Software/DevOps Engineer looking to learn a new technology, GCP offers the most seamless experience to get started. Like any cloud provider, the costs can creep up if not planned up front or managed over time.
There’s a lot more to cloud computing than just Google, perhaps we can pick up some extra features along the way…
So what does Even Cheaper Private Kubernetes look like?
Our previous post on this topic left plenty of opportunities to squeeze out every last penny. In revisiting this challenge I wanted to maintain the same level of performance, whilst also exploring options for hardening the security of the cluster.
The cluster still has to be useful - shrinking the VM templates would be a considerable cost saving but would completely miss the point of hosting personal projects. Let’s start from a comparable specification for the node that will be used to host projects:
- 2 vCPUs
- 4GB RAM
- 40GiB total HDD
- Services hosted at a public domain name
We previously highlighted the immediate cost savings from using Spot (preemptible) VMs instead of Standard VMs, but it’s worth reiterating a few points:
- Standard VM prices are fixed, and will not change without advance announcement from Google.
- Spot VM pricing is variable, but the discount is always at least 60% off the Standard price.
- Worst case for a $24.46 e2-medium would be $9.78. When the original article was written it was at $7.34, and today it would be $8.44.
- Spot VMs can be stopped at any moment, and will never run longer than 24 hours.
- This isn’t a problem for most services that can tolerate some downtime.
- If you are building something that’s a bit more sensitive, then it’s a reasonably tame Chaos Monkey!
On top of this, Kubernetes clusters can be configured as private or public, which affects their networking connections and service availability. By setting up a private cluster, you can retain complete control not just of the traffic coming in and out of the cluster, but also of the Kubernetes control plane.
Google Cloud Free Tier
It’s worth pausing for a moment to summarise the relevant parts of the GCP Free Tier limits, as you’ll notice the GCP Products Calculator reports different estimates from some of the prices presented in this article.
Note: you must be eligible for the Free Tier to benefit from the discount. We do not recommend using any cloud provider without also setting up usage/billing alerts to ensure you do not encounter unexpected bills.
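As a minimal sketch of the latter, a budget alert can be created with the gcloud CLI (the billing account ID, name, and amount below are placeholders, and the Cloud Billing Budgets API needs to be enabled on your account):
# Alert at 50% and 90% of a £15 monthly budget
gcloud billing budgets create \
  --billing-account=<YOUR_BILLING_ACCOUNT_ID> \
  --display-name="test-k8s-budget" \
  --budget-amount=15.00GBP \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9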
The key free tier resources for a personal Kubernetes cluster are these:
- No cluster management fee for one Autopilot or Zonal cluster per Cloud Billing account. Saving: $73.00/month.
- 1 non-preemptible e2-micro VM instance per month in us-west1, us-central1, or us-east1. Saving: $6.11/month.
- 30 GiB-months standard persistent disk. Saving: $1.20/month.
Luckily storage is pretty cheap, so even if your project needs more or less than 40GiB it wouldn’t make a massive difference to your costs. However, hosting more than one Kubernetes cluster or using a Regional cluster would cost significantly more. We have also not considered network egress traffic; more on this later.
The non-preemptible instance isn’t really capable enough to participate in the Kubernetes node pool, so it’s quite limited in its usefulness.
Ingress-as-a-Problem
I am certainly guilty of starting many projects that never see the light of day, but even during development it is important to understand the challenges around building and deploying a public-facing project.
For anything other than the simplest of projects, ingress is a challenge in two ways. Firstly you need to get traffic to arrive at your cluster safely and securely, and secondly you need to route the different streams of traffic to the different services that comprise your application.
A Google Cloud Load Balancer will set you back around $18.25/month, and these are created implicitly when certain types of Kubernetes Service resources are defined. This is an easy way to start paying a lot more than you intended for your cluster!
The previous article used a dedicated e2-micro instance running kubeip to assign a static IP address. This node and its associated IP accounted for just over 50% of the total cost, but is within the free tier in some regions.
Ingress is an annoying problem, and frequently leads to unforeseen security or availability issues.
Using kubeip to set a static IP necessitates using a public Kubernetes cluster, which if misconfigured could allow traffic to reach certain services without first passing through a reverse proxy/WAF/etc.
Enter Cloudflare
Cloudflare at its core is a high-speed global network, with the ability to intelligently manage and route traffic. As part of their Zero Trust suite of products, you can connect your applications to their network using Cloudflare Tunnels without opening any inbound ports in your private cluster. Instead, Cloudflare Tunnels need a daemon process to run inside the cluster, which becomes the entry point for traffic into the cluster rather than relying on an external IP attached to a Load Balancer.
Not only does this eliminate the need for Kubernetes or Google Cloud native ingress resources, but your origin servers also become inaccessible to requests from outside Cloudflare’s network.
They also have a generous Free plan.
Cluster Design
Let’s consider a simple project with a web server and an API server, both running as separate containers but served from the same domain.
So, let’s go through the steps needed to provision this infrastructure, and for ease of cleanup let’s use Terraform.
Step 0: Prerequisites
If you want to follow along the build process, you will need:
- Google Cloud Platform account, with a project,
- Cloudflare account, with a fully managed domain name,
- gcloud command line tool, with kubectl as well,
- cloudflared command line tool to create locally-managed tunnels,
- terraform to create/update/delete infrastructure.
You should ensure your gcloud environment is logged in, with valid Application Default Credentials, and with the default project selected.
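For reference, that setup usually looks something like this (the project placeholder matches the one used in main.tf below):
gcloud auth login
gcloud auth application-default login
gcloud config set project <YOUR_PROJECT_NAME>
# Install kubectl if you don't already have it (package-manager installs of gcloud may need a different method)
gcloud components install kubectl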
All Terraform in this guide should be placed in a single main.tf file, and terraform init must be run in the same folder to set up a local state cache. Our main.tf should start like this:
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = "6.8.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "2.33.0"
}
}
}
locals {
name = "test-k8s" # Base name of created cloud resources
project = "<YOUR_PROJECT_NAME>"
# Region and zone can be modified, but may cause resources to be created outside of Free Tier Limits
region = "us-central1"
zone = "us-central1-c"
}
provider "google" {
project = local.project
region = local.region
}
Step 1: Create the Kubernetes cluster
To ensure the cluster remains private, all the required infrastructure is provisioned explicitly. It is perfectly possible to create a private cluster within the GCP console, however the console is under constant development and many of the options, once set during creation, cannot be updated.
During this process we will avoid using the default subnets, or the default compute engine service account. This service account has the project Editor role, which is a legacy GCP primitive role with excessive permissions. Whenever service accounts are used we would always recommend assigning the minimum privileges possible.
Networking Setup
Create a VPC network with a single subnet in the region of choice. The IP CIDR range can be modified, but must not overlap with the default range if the default subnet is still present in the project.
# VPC network for the Kubernetes Cluster
resource "google_compute_network" "vpc" {
name = local.name
auto_create_subnetworks = "false"
}
resource "google_compute_subnetwork" "subnet" {
name = local.name
region = local.region
network = google_compute_network.vpc.name
ip_cidr_range = "10.0.0.0/20" # 10.0.0.0-10.0.15.255
private_ip_google_access = true # Ensure core GCP services can still be accessed
}
Private Zonal Cluster
Create a Kubernetes cluster, ensuring it remains private and all resources stay in the chosen region/zone.
# Private zonal cluster
resource "google_container_cluster" "cluster" {
name = local.name
location = local.zone
network = google_compute_network.vpc.name
subnetwork = google_compute_subnetwork.subnet.name
deletion_protection = false # Allows terraform to clean up resources
remove_default_node_pool = true # We will create this later
initial_node_count = 1
private_cluster_config {
enable_private_endpoint = false # This flag actually disables the public endpoint if true
enable_private_nodes = true # Nodes have internal IPs only and use the private master endpoint
master_ipv4_cidr_block = "10.13.0.0/28" # 10.13.0.0-10.13.0.15
}
ip_allocation_policy {
cluster_ipv4_cidr_block = "10.11.0.0/21" # 10.11.0.0-10.11.7.255
services_ipv4_cidr_block = "10.12.0.0/21" # 10.12.0.0-10.12.7.255
}
}
Why is enable_private_endpoint false? Unfortunately, despite its name this flag doesn’t quite behave as you might expect. Turning it on would mean the Kubernetes master API endpoint only has a private address, which would make the cluster impossible to manage remotely without additional infrastructure.
enable_private_nodes ensures the nodes have internal IPs only, and that they use the private master endpoint.
There are situations where it is a good idea to disable the public endpoint, but that’s for another blog in the future!
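In the meantime, if you want to keep the public endpoint but restrict who can reach it, the cluster resource can be given an authorized networks block. A minimal sketch (not used in the rest of this build, and the CIDR is a placeholder for your own address range):
# Optional hardening, placed inside google_container_cluster.cluster
master_authorized_networks_config {
  cidr_blocks {
    cidr_block   = "203.0.113.0/24" # placeholder: your own trusted range
    display_name = "trusted-admins"
  }
}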
All of the additional IP ranges defined in this section become secondary ranges to the established subnet, effectively acting as alias addresses. Behind the scenes, GCP routes traffic for these alias IPs to the correct VM in the subnet.
At this point, the cluster exists but has no node pool, due to the default pool having been removed.
Node Pool
Create a node pool using the e2-medium template, with a 10GB minimum persistent disk size and sensible defaults.
# Primary node pool
resource "google_container_node_pool" "primary-nodes" {
name = "${local.name}-primary"
location = local.zone
cluster = google_container_cluster.cluster.name
node_count = 1
node_locations = [local.zone]
node_config {
oauth_scopes = [
# https://registry.terraform.io/providers/hashicorp/google/6.8.0/docs/resources/container_node_pool
"https://www.googleapis.com/auth/cloud-platform"
]
machine_type = "e2-medium"
disk_type = "pd-standard"
disk_size_gb = 10
spot = true
metadata = {
disable-legacy-endpoints = "true"
}
}
}
This creates a single Spot (preemptible) VM within the chosen zone, with a private IP address. However, with no external IP address there is no route for outbound traffic to the internet.
Any resources created within the cluster would only be able to access Google Cloud services, but setting up a Cloudflare tunnel requires a route to the internet.
NAT Setup
NAT setup within GCP uses a Cloud Router to host a NAT rule.
# Cloud Router (required for NAT)
resource "google_compute_router" "router" {
name = "${local.name}-router"
region = local.region
network = google_compute_network.vpc.id
}
# Cloud NAT (for outbound internet access)
resource "google_compute_router_nat" "nat" {
name = "${local.name}-nat"
router = google_compute_router.router.name
region = local.region
nat_ip_allocate_option = "AUTO_ONLY" # Automatically assign a public IP
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
endpoint_types = ["ENDPOINT_TYPE_VM"]
auto_network_tier = "STANDARD"
}
This does require a public IPv4 address, and outbound connections from pods connected to the NAT will originate from this address.
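To sanity-check the NAT (and kubectl access generally), you can fetch credentials for the cluster and run a throwaway pod that calls out to the internet; a rough sketch using the names from this guide:
# Fetch kubectl credentials via the cluster's public endpoint
gcloud container clusters get-credentials test-k8s --zone us-central1-c
# Run a temporary pod and print the public IP its traffic leaves from (this should be the NAT address)
kubectl run nat-test --rm -i --restart=Never --image=curlimages/curl -- curl -s https://ifconfig.me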
Step 2: Cloudflare Tunnel Setup
Cloudflare tunnels can be set up in two ways: either managed through the Cloudflare Zero Trust portal, or via a local configuration.
As the configuration needs to be transferred to a Kubernetes container, the tunnel needs to be set up locally using cloudflared.
cloudflared tunnel login
cloudflared tunnel create test-k8s-tunnel
This will output the tunnel UUID, and the resulting configuration will be saved at ~/.cloudflared.
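If you need the UUID again later, cloudflared can list the tunnels registered to your account:
cloudflared tunnel list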
You’ll be able to access services through the tunnel at https://<TUNNEL_UUID>.cfargotunnel.com.
Accessing the Cluster
Most of the time, access to a k8s cluster is via the kubectl command, but we can also manage k8s resources from within Terraform.
There are other tools such as helm for managing resources within a cluster, and Terraform is certainly not ideal for all use-cases.
However, it’s great for establishing required namespaces, secrets, and some other services that allow other workloads in the cluster to function properly.
data "google_client_config" "provider" {} // Needed for k8s access
provider "kubernetes" {
host = "https://${google_container_cluster.cluster.endpoint}"
token = data.google_client_config.provider.access_token
cluster_ca_certificate = base64decode(
google_container_cluster.cluster.master_auth[0].cluster_ca_certificate,
)
}
resource "kubernetes_namespace" "main" {
metadata {
name = local.name
}
}
At this point we have used the default google client configuration to connect to the cluster endpoint, and to enable the kubernetes provider within Terraform.
We’ve also added a new namespace (rather than using default) just to check it works.
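After a terraform apply, the namespace should be visible with kubectl (assuming you fetched cluster credentials earlier):
terraform apply
kubectl get namespace test-k8s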
Tunnel Secret
The configuration needs to be saved as a Kubernetes secret ready for the deployment of cloudflared to pick up. You will need the UUID from the previous step:
data "local_file" "tunnel-credentials" {
filename = pathexpand("~/.cloudflared/<TUNNEL_UUID>.json") # pathexpand resolves ~ to your home directory
}
resource "kubernetes_secret" "tunnel-credentials" {
metadata {
name = "tunnel-credentials"
namespace = kubernetes_namespace.main.id
}
data = {
"credentials.json" = data.local_file.tunnel-credentials.content
}
}
At this point the local copy of the credentials should be deleted or transferred to a separate secret manager. If this credential is disclosed it would allow someone else to receive traffic destined for your tunnel.
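As a rough sketch of the second option, assuming you want to use GCP Secret Manager and have its API enabled, the file could be stored with gcloud before removing the local copy:
gcloud services enable secretmanager.googleapis.com
gcloud secrets create tunnel-credentials --data-file="$HOME/.cloudflared/<TUNNEL_UUID>.json"
rm "$HOME/.cloudflared/<TUNNEL_UUID>.json"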
Step 3: Deploy Tunnel
cloudflared is provisioned as a deployment within the Kubernetes cluster. The deployment uses a ConfigMap for general, non-sensitive configuration, and the Secret created with the tunnel credentials. Multiple instances of cloudflared can run at the same time using this configuration.
resource "kubernetes_config_map" "cloudflared-config" {
metadata {
name = "cloudflared-config"
namespace = kubernetes_namespace.main.id
}
data = {
"config.yaml" = yamlencode({
tunnel = "test-k8s-tunnel"
credentials-file = "/etc/cloudflared/creds/credentials.json"
metrics = "0.0.0.0:2000"
no-autoupdate = true
ingress = [
{
path = "/test.*"
service = "hello_world"
},
{
service = "http_status:404"
}
]
})
}
}
resource "kubernetes_deployment" "cloudflared" {
metadata {
name = "cloudflared"
namespace = kubernetes_namespace.main.id
}
spec {
selector {
match_labels = {
app = "cloudflared"
}
}
replicas = 2
template {
metadata {
labels = {
app = "cloudflared"
}
}
spec {
container {
name = "cloudflared"
image = "cloudflare/cloudflared:latest"
args = ["tunnel", "--config", "/etc/cloudflared/config/config.yaml", "run"]
liveness_probe {
http_get {
path = "/ready"
port = 2000
}
failure_threshold = 1
initial_delay_seconds = 10
period_seconds = 10
}
volume_mount {
name = "config"
mount_path = "/etc/cloudflared/config"
read_only = true
}
volume_mount {
name = "creds"
mount_path = "/etc/cloudflared/creds"
read_only = true
}
}
volume {
name = "creds"
secret {
secret_name = kubernetes_secret.tunnel-credentials.metadata[0].name
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.cloudflared-config.metadata[0].name
items {
key = "config.yaml"
path = "config.yaml"
}
}
}
}
}
}
}
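Once applied, it’s worth checking that the replicas are healthy and that the tunnel has registered connections; something along these lines should work:
# Wait for the cloudflared replicas to become ready
kubectl -n test-k8s rollout status deployment/cloudflared
# Show active connections for the tunnel from the Cloudflare side
cloudflared tunnel info test-k8s-tunnel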
Deploying Additional Applications
Depending on what you want to use your cluster for, you can deploy additional workloads to it and make them accessible via the internet by updating this cloudflared config. This involves adding an ingress rule to target the service, e.g.:
# Within cloudflared-config.data.ingress
{
hostname = "subdomain.example.com"
service = "http://web-server.${kubernetes_namespace.main.id}.svc.cluster.local:8080"
},
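For a hostname-based rule like this to receive traffic, the DNS record for that hostname also needs to point at the tunnel. With a locally-managed tunnel that can be done via cloudflared (subdomain.example.com is just the illustrative hostname from above):
cloudflared tunnel route dns test-k8s-tunnel subdomain.example.com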
Cleanup
Once you have finished testing this project, run terraform destroy to remove all created resources from GCP.
This should remove all the billable items as the Cloudflare services are free, but it is also worth cleaning up the Cloudflare configuration that was created specifically for this project.
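On the Cloudflare side, once its connections have dropped, the locally-managed tunnel can be removed with cloudflared:
# Clear any stale connection records, then delete the tunnel itself
cloudflared tunnel cleanup test-k8s-tunnel
cloudflared tunnel delete test-k8s-tunnel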
Cost Breakdown and Final Thoughts
So like the previous design, we have ended up with a single-node Kubernetes cluster with externally accessible services. However, this cluster is configured with private networking and we have eliminated all public node IP assignments (with the exception of Cloud NAT).
By having a private cluster, nothing hosted within it can respond to unsolicited internet requests as there is no public IP attached to the VMs to interrogate, and Cloud NAT only permits inbound traffic on established connections.
Throughout all the previous sections, there has not been any consideration of networking costs. For personal or development projects this is often not much of a concern, but if things take off then it suddenly becomes a significant burden to manage. Traffic costs are unavoidable; however, this project setup gives you all the tools you need to minimise them:
- Cloudflare Caching - by setting caching policies on your content you can avoid duplicated traffic ever reaching your cluster.
- GCP CDN Interconnect - Cloudflare have a direct connection to GCP, and network egress will appear on your bill as “Network Data Transfer Out via Carrier Peering Network - Americas Based”, which is charged at $0.04/GiB, a third of the cost of the equivalent premium tier egress.
For September 2024, the following costs were reported in GBP:
Resource | Cost
---|---
Spot Preemptible E2 Instance Core running in Americas | £4.74
Networking Cloud NAT IP Usage | £2.73
Spot Preemptible E2 Instance Ram running in Americas | £2.54
Networking Cloud NAT Gateway Uptime | £0.76
TOTAL (ex VAT) | £10.77
In USD this is roughly $13.90, compared to $17.93 for our previous best effort.
I would expect all but the simplest applications to outgrow this relatively quickly; however, hopefully this has been a useful introduction to private nodes and Kubernetes, whilst also showcasing how it can all be managed through Infrastructure-as-Code!
References
- https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel/
- https://medium.com/google-cloud/gcp-terraform-to-deploy-private-gke-cluster-bebb225aa7be