
Automate taking backups from Azure disks attached to Azure Kubernetes Service


At the beginning of 2019 I wrote a blog post about taking backups from Azure disks attached to an Azure Kubernetes Service cluster.

-> https://www.danielstechblog.io/taking-backups-from-azure-disks-attached-to-aks-via-azure-automation/

Since then, some things have changed. Azure Functions PowerShell support went into public preview in April 2019, and the AzTable (AzureRmStorageTable) module I use in the solution had new releases.

So, I decided it was the right time to overhaul and rewrite the implementation.

Here we are, and I would like to walk you through the new solution. But first, let us step back: why do you need such a solution at all?

When you use dynamically created Azure disks via a Kubernetes PVC (Persistent Volume Claim) in AKS, the documentation mentions snapshots as the backup option.

-> https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynamic-pv#back-up-a-persistent-volume

As you probably will not do this manually, a solution that automates it would be nice, especially one with options like retention time and automatic cleanup functionality.

That is what I tried to accomplish with my implementation. It does not matter whether the Azure disk was created dynamically or statically/manually, as long as it is tagged correctly.

Before we dive into the solution and its deployment, let us have a look at the differences between v1 and v2 in the following overview.

Infrastructure setup
  • v1: Manually
  • v2: Azure Resource Manager template
Solution
  • v1: Azure Automation with Azure Table storage
  • v2: Azure Function with Azure Table storage
Azure disk onboarding
  • v1: Each disk manually with a new schedule
  • v2: Using tags in Azure identifying onboarded disks

The deployment of the necessary Azure resources is a single Azure CLI command. You only need to replace the parameter values.

### OPTION 1 ###
> az deployment sub create --name "setup-aks-snapshot-backup" --location "westeurope" \
  --template-uri https://raw.githubusercontent.com/neumanndaniel/armtemplates/master/aks-snapshot-backup/aks-snapshot-backup.json \
  --parameters \
  resourceGroupName='aks-snapshot-backup-infrastructure' \
  location='westeurope' \
  functionName='aks-snapshot-backup' \
  timeZone='W. Europe Standard Time' \
  --verbose
### OPTION 2 ###
> az deployment sub create --name "setup-aks-snapshot-backup" --location "westeurope" \
  --template-uri https://raw.githubusercontent.com/neumanndaniel/armtemplates/master/aks-snapshot-backup/aks-snapshot-backup.json \
  --parameters \
  resourceGroupName='aks-snapshot-backup-infrastructure' \
  location='westeurope' \
  functionName='aks-snapshot-backup' \
  timeZone='W. Europe Standard Time' \
  operatorRoleDefinitionGuid='e6dec0ce-6745-4c05-ae5c-903faf38a590' \
  operatorRoleAssignmentGuid='5a34bfc2-f4a8-4ad9-b73a-ac22e7fb74ff' \
  contributorRoleDefinitionGuid='b02a5d6d-e3f3-4530-b768-e384d10c6100' \
  contributorRoleAssignmentGuid='85ac478a-a2aa-4714-9222-39a6492a6305' \
  storageContributorRoleAssignmentGuid='f34a3ad5-37c9-4e39-a33c-7218ac852c69' \
  --verbose

What gets deployed into your subscription?

The main Azure Resource Manager template creates a resource group and two custom roles for Azure RBAC.

These custom roles are called AKS Snapshot Backup Contributor and AKS Snapshot Backup Operator. Only the necessary actions are defined.

AKS Snapshot Backup Contributor
  Actions
    • Microsoft.Compute/snapshots/read
    • Microsoft.Compute/snapshots/write
    • Microsoft.Compute/snapshots/delete
  Scope
    • Resource group
AKS Snapshot Backup Operator
  Actions
    • Microsoft.Compute/disks/beginGetAccess/action
    • Microsoft.Compute/disks/read
  Scope
    • Subscription

All the required Azure resources are deployed via the first linked template. The Azure Table storage gets tagged with the key-value pair aksSnapshotBackupDatabase:tableStorage.

So, the Azure Function can find its backend storage, which stores the necessary information about the taken snapshots.

For the pending role assignments, the linked template returns the principal id of the function’s managed identity.

Three role assignments need to be set. One on subscription level via the main template and two on resource group level via the second linked template.

[Screenshots: Role assignment on subscription level | Role assignments on resource group level]

Azure Function upload

Afterwards, we upload the PowerShell function with the Azure Functions Core Tools.

-> https://github.com/neumanndaniel/serverless/tree/master/aks-snapshot-backup

> func azure functionapp publish aks-snapshot-backup

The Function App itself consists of two functions. One for taking snapshots and the other one for deleting snapshots that are older than the specified retention time.

The default schedules for the execution are midnight for taking snapshots and 2 am for the cleanup. But that is only valid if you specified the correct time zone during the ARM template deployment.

Azure disk onboarding

Specifying which Azure disks should have a backup is easy and done using tags.

Set the key-value pair aksSnapshotBackupEnabled:true and the disk is onboarded to the process. With the tag retentionTime:7 you can specify a custom retention time in days. If the tag is not set, a default of 3 days applies.
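
Tagging can also be done from the command line. The following is a minimal sketch using the Azure CLI; the resource group and disk name are placeholders, and az disk update with the generic --set argument merges the tags into the existing ones instead of replacing them.

### Onboard an Azure disk to the backup process via tags ###
> az disk update --resource-group "MC_aks-demo_aks-demo_westeurope" \
  --name "pvc-00000000-0000-0000-0000-000000000000" \
  --set tags.aksSnapshotBackupEnabled=true tags.retentionTime=7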

[Screenshots: Disk tagging]

After the first run, you should see the snapshot in the solution's resource group and an entry for it in the Azure Table storage.

[Screenshots: Snapshots created | Table storage entries]

The naming pattern for the snapshot follows the schema {AZURE_DISK_NAME}-%Y-%m-%dT%I-%M%p. The automatically created table akssnapshotbackup stores the following information.

  • PartitionKey: Date format, e.g., 2020-11-2
  • RowKey: Snapshot name
  • Timestamp: Automatically created when adding the entry
  • azureSnapshotResourceId: Resource id of the snapshot
  • azureSourceDiskResourceId: Resource id of the Azure disk
  • region: Azure region
  • retentionTime: Days to retain the snapshot

The cleanup process removes the snapshot and the table entry when the lifetime expires.
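
If you want to check these entries without opening the portal, the table can also be queried with the Azure CLI. This is a rough sketch; the storage account name is a placeholder for the one created by the ARM template, and it assumes you have access to the account keys.

### List the recorded snapshots for a given day ###
> az storage entity query --account-name "akssnapshotbackupstorage" \
  --table-name "akssnapshotbackup" \
  --filter "PartitionKey eq '2020-11-2'" -o table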

You find the ARM templates and the PowerShell function in my GitHub repositories.

-> https://github.com/neumanndaniel/armtemplates/tree/master/aks-snapshot-backup
-> https://github.com/neumanndaniel/serverless/tree/master/aks-snapshot-backup



Troubleshooting Azure Kubernetes Service tunnel component issues


In Azure Kubernetes Service Microsoft manages the AKS control plane (Kubernetes API server, scheduler, etcd, etc.) for you. The AKS control plane interacts with the AKS nodes in your subscription via a secure connection that is established through the tunnelfront / aks-link component.

-> https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#kubernetes-cluster-architecture

As you can run the AKS control plane within a free tier (SLO) or a paid tier (SLA), the tunnel component differs. The free tier still uses the tunnelfront component, whereas the paid tier uses the aks-link component. In this blog post I am talking about the aks-link component, using the AKS control plane with the paid tier (SLA) option.

The tunnel component runs in the kube-system namespace on your nodes.

> kubectl get pods -l app=aks-link -n kube-system
NAME                        READY   STATUS    RESTARTS   AGE
aks-link-7dd7c4b96f-986vs   2/2     Running   0          7m22s
aks-link-7dd7c4b96f-f5zr5   2/2     Running   0          7m22s

The issue

Let me tell you what happened today on one of our AKS clusters during a release of one of our microservices.

We received an error that we hit the timeout for calling the Istio webhook for the automatic sidecar injection. Istio uses a mutating webhook for the automatic sidecar injection.

[ReplicaSet/microservice]FailedCreate: Error creating: Internal error occurred: failed calling webhook 'sidecar-injector.istio.io': Post https://istiod.istio-system.svc:443/inject?timeout=30s: context deadline exceeded

Further investigation showed us that commands like kubectl get or kubectl describe run successfully. But kubectl logs runs into the typical timeout, indicating at first sight an issue with the control plane.

Error from server: Get https://aks-nodepool-12345678-vmss000001:10250/containerLogs/microservice/microservice-1234567890-ab123/microservice-container: dial tcp x.x.x.x:10250: i/o timeout

The resource health and the AKS Diagnostics showed no issues.

[Screenshots: Azure Resource Health blade | AKS Diagnostics blade | AKS Diagnostics Cluster Insights results]

The only exception was a warning in the resource health blade that a planned control plane update had happened in the morning.

Degraded : Updating Control Plane (Planned)
At Tuesday, November 17, 2020, 6:22:41 AM GMT+1, the Azure monitoring system received the following information regarding your Azure Kubernetes Service (AKS):
Your cluster was updating. You may see this message if you created your cluster for the first time or if there is a routine update on your cluster.
Recommended Steps
No action is required. You cluster was updating. The control plane is fully managed by AKS. To learn more about which features on AKS are fully managed, check the Support Policies documentation.

As the AKS cluster was fully operational and there was no customer impact, besides the fact that we could not deploy anything, we opened a support ticket and started our own recovery procedures.

After a support engineer was assigned, we quickly identified and mitigated the issue. We just needed to restart the aks-link component and therefore stopped our own recovery procedures.

kubectl rollout restart deployment aks-link -n kube-system
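
To verify that the restart went through and the tunnel pods came back healthy, something like the following is enough. Afterwards, a kubectl logs call against any pod should succeed again instead of running into the i/o timeout.

> kubectl rollout status deployment aks-link -n kube-system
> kubectl get pods -l app=aks-link -n kube-system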

Summary

The takeaway in this situation is restarting the aks-link component when the following conditions are met.

  • Resource health blade shows a healthy state or a warning
  • AKS Diagnostics shows a healthy state
  • kubectl commands like get and describe succeed as they only interact with the API server, the control plane itself
  • kubectl commands like logs fail as the control plane needs to interact with the kubelet component on the nodes
  • Deployments fail as the control plane needs to interact with the kubelet component on the nodes

The difference here is important: calls that only require the control plane succeed, but calls requiring interaction between the control plane and the nodes fail. That is a good indicator for an issue with the aks-link component.

Hence a restart of the aks-link component might solve this, and you do not need to reach out to Azure Support.


Azure Reservations and the RBAC dilemma


Cloud computing is subject to constant change. Things you take for granted today are different tomorrow. Surprisingly, even designs and implementations on the same platform can be different.

Welcome to today’s topic of Azure Reservations and the RBAC dilemma.

As I have written in my brief introduction, designs and implementations can be different. Azure RBAC is one example here. Being the owner of an Azure subscription does not mean you have access to everything. Access to Azure Cost Management and Azure Reservations is not part of your permission set. Especially for Azure Reservations this might come as a surprise for some folks, as the RBAC roles Owner and Reservation Purchaser on subscription level have the permissions to purchase reservations.

Overview Azure Reservations

Let us have a look at the Azure documentation.

The user who purchases a reservation and the account administrator of the subscription used for billing the reservation get the Owner role on the reservation order and the reservation.
-> https://docs.microsoft.com/en-us/azure/cost-management-billing/reservations/save-compute-costs-reservations#permissions-to-view-and-manage-reservations

By default, only the user who purchases the reservation and the account administrator have Owner permissions on the reservation itself and its parent, the reservation order.

Working in a team of cloud engineers or in a cloud center of excellence, this approach is counterproductive. A single user should not have exclusive access to the purchased Azure Reservations.

Implementation:

You can purchase reservations in two ways. Programmatically or through the Azure portal.

The programmatic way uses a PowerShell or bash script executed by an Azure Service Principal for purchasing the reservations.

-> https://docs.microsoft.com/en-us/powershell/module/az.reservations/?view=azps-5.1.0#reservations
-> https://docs.microsoft.com/en-us/cli/azure/reservations?view=azure-cli-latest

This ensures a decoupling from a real user, as several cloud engineers have access to the Azure Service Principal, for instance.

But in the end, you end up in the same situation whether you purchase the reservations programmatically or via the Azure portal: not everyone from the cloud engineering team can see the reservations in the portal. Especially in the case of using a script with an Azure Service Principal, nobody sees them.

So, you should assign the Owner role to an Azure AD group on the reservation order after you purchased the reservation.

This can be easily done with the following bash script.

# Get the resource ids of all reservation orders
RESERVATION_ORDERS=$(az reservations reservation-order list --query '[].id' -o tsv)
# Assign the Owner role on every reservation order to the Azure AD group
for ITEM in $RESERVATION_ORDERS; do
  az role assignment create --assignee "{AZURE_AD_GROUP_OBJECT_ID}" --role "Owner" --scope "$ITEM" --verbose
done

For the programmatic way, using a script executed by an Azure Service Principal, this role assignment is part of the automated workflow.

Doing it via the Azure portal, the role assignment is part of a defined process. But in the end it is a manual task that sometimes might be forgotten.
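
Whichever path was taken, a quick check that the Azure AD group really received the Owner role on every reservation order can be scripted as well; a small sketch reusing the loop from above.

RESERVATION_ORDERS=$(az reservations reservation-order list --query '[].id' -o tsv)
for ITEM in $RESERVATION_ORDERS; do
  # List the Owner role assignments on each reservation order
  az role assignment list --role "Owner" --scope "$ITEM" -o table
done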

Summary:

Whichever way you choose, a necessary step in the process of ordering a new Azure Reservation should be the assignment of the Owner role on the reservation order to an Azure AD group.


Increase your application availability with a PodDisruptionBudget on Azure Kubernetes Service


This is the first blog post of a series of posts covering the topic about increasing the application availability on Azure Kubernetes Service / Kubernetes.

Today we cover the PodDisruptionBudget.

What is a PodDisruptionBudget?

A PDB is an additional Kubernetes object that is deployed beside your Deployment, ReplicaSet, or StatefulSet and increases your application's availability. This is done by specifying either the minAvailable or the maxUnavailable setting for a PDB. Both settings accept an absolute number or a percentage as value.

When you choose a percentage as the value, Kubernetes rounds up to the nearest integer if your current replica number is uneven.

Assuming you want to specify how many pods must always be available, you choose the minAvailable setting.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: app-one-pdb
  namespace: app-one
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: app-one

The other way around with the maxUnavailable setting lets you specify how many pods can be unavailable.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: app-two-pdb
  namespace: app-two
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: app-two
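
Once the PDBs are applied, kubectl shows how many disruptions are currently allowed; a small sketch assuming the two example namespaces above. The ALLOWED DISRUPTIONS column tells you how many pods may currently be evicted voluntarily.

> kubectl get poddisruptionbudgets -n app-one
> kubectl get poddisruptionbudgets -n app-two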

Let me show you a brief example of what this looks like for a highly available Istio control plane setup on my AKS cluster.

-> https://www.danielstechblog.io/high-available-control-plane-with-istio-1-5-on-azure-kubernetes-service/

[Screenshot: Istio PDB Kubernetes output]

As seen in the screenshot, both deployments istio-ingressgateway and istiod allow one disruption running with two replicas each.

> kubectl get poddisruptionbudgets istio-ingressgateway -o yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  labels:
    app: istio-ingressgateway
    install.operator.istio.io/owning-resource: installed-state
    install.operator.istio.io/owning-resource-namespace: istio-system
    istio: ingressgateway
    istio.io/rev: default
    operator.istio.io/component: IngressGateways
    operator.istio.io/managed: Reconcile
    operator.istio.io/version: 1.7.3
    release: istio
  name: istio-ingressgateway
  namespace: istio-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: istio-ingressgateway
      istio: ingressgateway

What protection does a PDB provide?

A PDB provides protection against so-called voluntary evictions.

This can be a cluster upgrade of your AKS cluster, for instance. Every node gets replaced one after another during the upgrade process by evicting the pods, deleting the node, and bringing up a new one.

-> https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster

Or you use kured (Kubernetes Reboot Daemon) for automatically rebooting your nodes to apply the latest security patches. Again, this is done by evicting the pods and then rebooting the node.

-> https://github.com/weaveworks/kured
-> https://docs.microsoft.com/en-us/azure/aks/node-updates-kured

Another scenario is a manually initiated maintenance by running the kubectl drain command.

The PDB does not provide protection against node failures!

Just assume you run at the minimum size specified in the PDB. Then the node with your application pod fails. Your application will then be unavailable for a brief period until it is brought up again by Kubernetes.

Summary

A PDB increases your application availability by protecting the application from voluntary evictions like cluster upgrades or planned node reboots.

But you are not protected against node failures as mentioned above.


Azure Kubernetes Service – Azure RBAC for Kubernetes authorization


At this year’s Ignite conference Microsoft announced the next major step of integrating Azure functionality into AKS: Azure RBAC for Kubernetes authorization.

-> https://docs.microsoft.com/en-us/azure/aks/manage-azure-rbac

Azure RBAC for Kubernetes authorization lets you assign built-in or custom roles onto the Azure Kubernetes Service object in Azure. So, you do not have to create Kubernetes roles and role bindings in Kubernetes assigning permissions to your developers.

Yes, you read that correctly: with Azure RBAC for Kubernetes authorization you do your Kubernetes access management in Azure instead of in Kubernetes itself.

The only requirement your AKS cluster needs to fulfill is the usage of the managed AAD integration, also sometimes called the AAD integration v2. Besides that, the following limitations apply to the currently available preview version.

  • Only new clusters are supported. Existing clusters will be supported with the GA version.
  • kubectl v1.18.3 or higher
  • New role assignments can take up to 5 minutes to be pushed to the Kubernetes API server
  • AAD tenant for the subscription and the managed AAD integration must be the same
  • CRDs are not represented as data actions when it comes to custom role definitions. But they can be covered with Microsoft.ContainerService/managedClusters/*/read as data action.

Four built-in roles are available at the time of writing.

  • Azure Kubernetes Service RBAC Reader
  • Azure Kubernetes Service RBAC Writer
  • Azure Kubernetes Service RBAC Admin
  • Azure Kubernetes Service RBAC Cluster Admin

Those four built-in roles match the permission sets of the following Kubernetes cluster roles.

  • view
  • edit
  • admin
  • cluster-admin
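
If you are curious which Kubernetes permissions such a built-in role carries, you can inspect its role definition with the Azure CLI; for example, the data actions of the RBAC Reader role can be listed like this.

> az role definition list --name "Azure Kubernetes Service RBAC Reader" \
  --query '[].permissions[].dataActions' -o json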

Let us now dive into assigning one of the built-in roles and creating a custom role for our AKS cluster.

Built-in role – Azure Kubernetes Service RBAC Reader

In our first scenario we assign the Azure Kubernetes Service RBAC Reader role to the kube-system namespace. Yes, it is possible to do a role assignment on the whole cluster or only on a specific namespace.

I am using the following Azure CLI command assigning the Azure Kubernetes Service RBAC Reader role to my Azure AD user object onto the kube-system namespace.

> RESOURCE_ID="/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/azst-aks-demo/providers/Microsoft.ContainerService/managedClusters/azst-aks-demo"
> az role assignment create --role "Azure Kubernetes Service RBAC Reader" \
--assignee xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
--scope $RESOURCE_ID/namespaces/kube-system

Side note: It seems to be the case that you cannot use an Azure AD group for the role assignment. I do not know if this is only a preview limitation or a general issue. But during the testing of the preview, I was not able to get an AAD group role assignment to work.

As seen in the screenshot, a role assignment on a child resource is not represented in the Azure portal.

[Screenshot: Role assignment overview]

Instead, download the role assignments and select only Children.

[Screenshot: Download role assignments]

There we go. As seen in the downloaded report, our role assignment was successful.

[
  {
    "RoleAssignmentId": "20e9b3be-4924-4bb4-80b5-59393d8b0513",
    "Scope": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/azst-aks-demo/providers/Microsoft.ContainerService/managedClusters/azst-aks-demo/namespaces/kube-system",
    "DisplayName": "Daniel Neumann",
    "SignInName": "REDACTED",
    "RoleDefinitionName": "Azure Kubernetes Service RBAC Reader",
    "RoleDefinitionId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/7f6c6a51-bcf8-42ba-9220-52d62157d7db",
    "ObjectId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "ObjectType": "User"
  }
]

The Kubernetes permissions themselves are represented as data actions in the Azure RBAC system.

[Screenshot: AKS RBAC Reader data actions snippet]

Using kubectl to show the pods in the kube-system namespace works as intended.

> kubectl get pods -n kube-system
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code REDACTED to authenticate.
NAME                                                  READY   STATUS    RESTARTS   AGE
aks-link-54b67dd945-5rwvs                             2/2     Running   0          6h55m
aks-link-54b67dd945-xfh69                             2/2     Running   0          7h11m
azure-policy-78c78fdfb4-zw78l                         1/1     Running   0          6h55m
azure-policy-webhook-75b4fffd8c-qhhbc                 1/1     Running   0          6h55m
calico-node-7mg46                                     1/1     Running   0          7h11m
calico-node-dtp85                                     1/1     Running   0          7h11m
calico-node-hvmxt                                     1/1     Running   0          7h9m
calico-typha-deployment-5664ccf987-5rxmn              1/1     Running   0          6h55m
calico-typha-horizontal-autoscaler-78dd9bb4b5-lp4mp   1/1     Running   0          6h55m
coredns-748cdb7bf4-68ggr                              1/1     Running   0          6h55m
coredns-748cdb7bf4-jbdrg                              1/1     Running   0          6h55m
coredns-autoscaler-868b684fd4-2jxvr                   1/1     Running   0          6h55m
kube-proxy-gvfk6                                      1/1     Running   0          50m
kube-proxy-qf9zv                                      1/1     Running   0          50m
kube-proxy-zh74s                                      1/1     Running   0          51m
metrics-server-58fdc875d5-x2qgg                       1/1     Running   0          6h55m
omsagent-2d4h7                                        1/1     Running   1          7h11m
omsagent-6bs4v                                        1/1     Running   0          7h9m
omsagent-mtk7f                                        1/1     Running   0          7h11m
omsagent-rs-7cb7c7fb4-9sf56                           1/1     Running   1          7h12m

Doing the same on another namespace throws a permission denied message.

> kubectl get pods -n gatekeeper-system
Error from server (Forbidden): pods is forbidden: User "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" cannot list resource "pods" in API group "" in the namespace "gatekeeper-system": User does not have access to the resource in Azure. Update role assignment to allow access.

What works with kubectl does not work with the Kubernetes resource view in the Azure portal, as the resource view requires permission to list all namespaces on an AKS cluster.

[Screenshot: Kubernetes resource view error]

We will solve this with a custom role.

Custom role – AKS Namespace Viewer

We have seen how to use one of the built-in roles but sometimes we need a custom role.

The following JSON body describes the role definition we want to create to list all namespaces on an AKS cluster.

{
  "name": "AKS Namespace Viewer",
  "description": "Lets you view all namespaces.",
  "actions": [],
  "notActions": [],
  "dataActions": [
    "Microsoft.ContainerService/managedClusters/namespaces/read"
  ],
  "notDataActions": [],
  "assignableScopes": [
    "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  ]
}

The role only gives read access to resources of the type namespaces.

We use the Azure CLI again to do the role assignment after creating the role definition.

> JSON='
{
  "name": "AKS Namespace Viewer",
  "description": "Lets you view all namespaces.",
  "actions": [],
  "notActions": [],
  "dataActions": [
    "Microsoft.ContainerService/managedClusters/namespaces/read"
  ],
  "notDataActions": [],
  "assignableScopes": [
    "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  ]
}
'
> az role definition create --role-definition "$JSON"
> RESOURCE_ID="/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/azst-aks-demo/providers/Microsoft.ContainerService/managedClusters/azst-aks-demo"
> az role assignment create --role "AKS Namespace Viewer" \
--assignee xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
--scope $RESOURCE_ID

[Screenshot: AKS Namespace Viewer role assignment]

Afterwards we can view pods in the kube-system namespace via the Kubernetes resource view in the Azure portal.

[Screenshot: Kubernetes resource view]

You find the full reference of which data actions for AKS are currently available in the Azure docs.

-> https://docs.microsoft.com/en-us/azure/role-based-access-control/resource-provider-operations#microsoftcontainerservice

Summary

I hope the Azure RBAC for Kubernetes authorization feature will reach GA soon, as it is a game changer for how to do permission assignments on an AKS cluster.

The good news is that permission assignments via roles and role bindings are still possible. So, for Kubernetes service accounts nothing changes. Furthermore, you can fall back to default Kubernetes mechanisms when a specific operation is not yet available as a data action for AKS.

The only quirk now is the thing with non-working role assignments using Azure AD groups as mentioned above.


Increase your application availability with pod anti-affinity settings in Azure Kubernetes Service


This is the second blog post of a series of posts covering the topic about increasing the application availability on Azure Kubernetes Services / Kubernetes.

Today we cover the pod anti-affinity setting.

What is the pod anti-affinity?

In the first post of the series, I talked about the PodDisruptionBudget. The PDB guarantees that a certain number of your application pods is available.

Defining a pod anti-affinity is the next step increasing your application’s availability. A pod anti-affinity guarantees the distribution of the pods across different nodes in your Kubernetes cluster.

You can define a soft or a hard pod anti-affinity for your application.

The soft anti-affinity is best-effort and might lead to a state where a node runs two replicas of your application instead of distributing them across different nodes.

Using the hard anti-affinity guarantees the distribution across different nodes in your cluster. The only downside of the hard anti-affinity is that, in certain circumstances, the overall replica count of your deployment is reduced when one or several nodes have an outage.

Combined with a PDB this can also lead to a deadlock.

So, I recommend using the soft anti-affinity.

Using the pod anti-affinity setting

Let us have a look at the following Kubernetes template which makes use of the pod anti-affinity.

...
  template:
    metadata:
      labels:
        app: go-webapp
        version: v1
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - go-webapp
              topologyKey: kubernetes.io/hostname
      containers:
...

In the template itself I am using a soft anti-affinity, which is defined using the term preferredDuringSchedulingIgnoredDuringExecution, whereas a hard anti-affinity is defined by requiredDuringSchedulingIgnoredDuringExecution.

The soft anti-affinity has a special configuration setting called weight, which is added to the scheduler calculation and controls the likelihood of distributing the pods across different nodes. 1 is the lowest value and 100 the highest. When you want a higher chance of distributing the pods across different nodes with the soft anti-affinity, use the value 100 here.

The labelSelector and topologyKey then define how the scheduling works. The definition above reads like this: a pod should not be scheduled on a node if a pod with the label app=go-webapp is already running on it.

When we deploy our template on the AKS cluster, all our replicas run on different nodes.

[Screenshot: Pods after scheduling with a soft anti-affinity]

> kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE                                NOMINATED NODE   READINESS GATES
go-webapp-75c66f85cf-984sk   2/2     Running   0          41s   10.240.0.28    aks-nodepool1-14987876-vmss00001m   <none>           <none>
go-webapp-75c66f85cf-plnk5   2/2     Running   0          26s   10.240.2.10    aks-nodepool1-14987876-vmss00001o   <none>           <none>
go-webapp-75c66f85cf-twck2   2/2     Running   0          41s   10.240.1.145   aks-nodepool1-14987876-vmss00001n   <none>           <none>

Frankly, Kubernetes always tries to distribute your application pods across different nodes. But the pod anti-affinity allows you to better control it.

Soft vs. hard anti-affinity

As mentioned previously, soft is best-effort and hard guarantees the distribution. For instance, let us deploy the Kubernetes template on a Docker for Mac single-node Kubernetes cluster: the first time with the soft anti-affinity setting and the second time with the hard anti-affinity setting.

...
  template:
    metadata:
      labels:
        app: go-webapp
        version: v1
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - go-webapp
            topologyKey: kubernetes.io/hostname
      containers:
...

Using the soft anti-affinity setting brings up all three replicas compared to the one replica using the hard anti-affinity setting.

### Soft anti-affinity ###
> kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES
go-webapp-666859f746-hnnrv   2/2     Running   0          59s   10.1.0.65   docker-desktop   <none>           <none>
go-webapp-666859f746-ltgvr   2/2     Running   0          82s   10.1.0.64   docker-desktop   <none>           <none>
go-webapp-666859f746-tjqqp   2/2     Running   0          38s   10.1.0.66   docker-desktop   <none>           <none>
### Hard anti-affinity ###
> kubectl get pods -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP          NODE             NOMINATED NODE   READINESS GATES
go-webapp-5748776476-cxq76   0/2     Pending   0          48s   <none>      <none>           <none>           <none>
go-webapp-5748776476-sdnkj   2/2     Running   0          74s   10.1.0.67   docker-desktop   <none>           <none>
go-webapp-5748776476-twwbt   0/2     Pending   0          48s   <none>      <none>           <none>           <none>

Also have a look at the following screenshots where I did the same on the AKS cluster and drained one of the nodes.

[Screenshots: Soft anti-affinity - one node drained | Hard anti-affinity - one node drained]

As you see, using the hard anti-affinity leads to a state where the overall replica count is reduced until a new node is available to host the pod.
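
If you want to reproduce this on your own cluster, draining a node is a single command; the node name below is taken from the earlier example output and needs to be replaced with one of yours.

### Evict all pods from the node and mark it unschedulable ###
> kubectl drain aks-nodepool1-14987876-vmss00001m --ignore-daemonsets
### Make the node schedulable again afterwards ###
> kubectl uncordon aks-nodepool1-14987876-vmss00001m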

What protection does the pod anti-affinity provide?

The pod anti-affinity provides protection against node failures and thus ensures a higher availability of your application.

Summary

Using the pod anti-affinity protects your application against node failures by distributing the pods across different nodes on a best-effort or guaranteed basis.

As mentioned earlier Kubernetes always tries to distribute your application pods across different nodes even without a specified pod anti-affinity. But the pod anti-affinity allows you to better control it.

You can even go further and use another topologyKey like topology.kubernetes.io/zone protecting your application against zonal failures.

A better solution for this is pod topology spread constraints, which reached the stable feature state with Kubernetes 1.19.

I will cover pod topology spread constraints in the next blog post of this series. Stay tuned.


Evaluating Gatekeeper policies with the Rego Playground


Writing and evaluating Gatekeeper policies can be hard sometimes. Especially the testing part of a newly created policy.

There are different approaches to tackle this like having a dedicated test Kubernetes cluster for it. An alternative we used was a script starting a single node KinD cluster on Docker for Mac and installing Gatekeeper onto it.

The advantage of this approach is that you see how the policy works under real conditions. The disadvantage is probably the long feedback loop until the policy works as expected, especially when you write complex policies.

Rego Playground

Here the Rego Playground comes into play. It is a web service that lets you easily evaluate your written policy.

-> https://play.openpolicyagent.org/

[Screenshot: Rego Playground start page]

As seen in the screenshot, on the left-hand side you put your Gatekeeper policy. You do not paste the whole ConstraintTemplate into the field, just the part in the template after rego: |.

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdisableautomountserviceaccounttoken
spec:
  crd:
    spec:
      names:
        kind: K8sDisableAutomountServiceAccountToken
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisableautomountserviceaccounttoken
        missing(obj) = true {
          not obj.automountServiceAccountToken == true
          not obj.automountServiceAccountToken == false
          obj.serviceAccount == "default"
        }
        violation[{"msg": msg}] {
          p := input_pod[_]
          missing(p.spec)
          msg := sprintf("automountServiceAccountToken field is missing for pod %v while using Service Account %v", [p.metadata.name, p.spec.serviceAccount])
        }
        violation[{"msg": msg, "details": {}}] {
          p := input_pod[_]
          p.spec.automountServiceAccountToken
          p.spec.serviceAccount == "default"
          msg := sprintf("Service Account token automount is not allowed for pod %v while using Service Account %v, spec.automountServiceAccountToken: %v", [p.metadata.name, p.spec.serviceAccount, p.spec.automountServiceAccountToken])
        }
        input_pod[p] {
          p := input.review.object
        }

On the right-hand side in the upper part, you paste your input object as JSON. This can be for instance the output of an existing pod in your Kubernetes cluster.

> kubectl get pods go-webapp-58554df444-mkph9 -o json

The reference to the input object for Gatekeeper on Kubernetes is input.review.object.

When pasting the JSON output for our pod into the Rego Playground as input, we adjust the reference to the input object to just input.

After that is done, a click on Evaluate validates the input against the policy.

You get the results on the right-hand side, directly underneath the input section.

[Screenshots: First policy evaluation without adjustments | Second policy evaluation with adjustments and value true]

Looking at the screenshots, the pod violates the policy. One small adjustment to the input object later, it passes the policy.

[Screenshot: Third policy evaluation with adjustments and value false]

Summary

The Rego Playground makes the writing of Gatekeeper policies a breeze. You get a fast feedback loop without spinning up a local KinD cluster or having a test Kubernetes cluster in place.


Run the Envoy Proxy ratelimit service for Istio on AKS with Azure Cache for Redis


The Istio sidecar proxy uses Envoy and therefore supports two different rate limiting modes: a local one targeting only a single service and a global one targeting the entire service mesh.

The local rate limit implementation only requires Envoy itself without the need for a rate limit service. In contrast, the global rate limit implementation requires a rate limit service as its backend.

Looking at Istio and Envoy there is a reference implementation available by the Envoy Proxy community: The Envoy Proxy ratelimit service.

-> https://github.com/envoyproxy/ratelimit

So, in today’s post I walk you through the setup of the Envoy Proxy ratelimit service using an Azure Cache for Redis as its backend storage.

First, we deploy the Azure Cache for Redis in our Azure subscription in the same region we have the Azure Kubernetes Service cluster running.

> az redis create --name ratelimit --resource-group ratelimit \
  --location northeurope --sku Standard --vm-size c0

The choice here is the Standard SKU and size C0. It is the smallest Redis instance on Azure that offers an SLA of 99.9%. But you can also choose the Basic SKU.

The repo of the ratelimit service only offers a Docker Compose file and no Kubernetes template or Helm Chart. Therefore, we build the template ourselves.

So, what will our deployment look like?

[Diagram: Envoy Proxy ratelimit service deployment]

The entire deployment consists of a namespace, a deployment, a service, a secret, a network policy, a peer authentication policy and two configuration maps.

-> https://github.com/neumanndaniel/kubernetes/tree/master/envoy-ratelimit

Let us focus on the deployment template. It will roll out the ratelimit service and a sidecar container exporting Prometheus metrics.

I have chosen the following configuration for the ratelimit service which is passed over as environment variables.

...
        env:
          - name: USE_STATSD
            value: "true"
          - name: STATSD_HOST
            value: "localhost"
          - name: STATSD_PORT
            value: "9125"
          - name: LOG_FORMAT
            value: "json"
          - name: LOG_LEVEL
            value: "debug"
          - name: REDIS_SOCKET_TYPE
            value: "tcp"
          - name: REDIS_URL
            valueFrom:
              secretKeyRef:
                name: redis
                key: url
          - name: REDIS_AUTH
            valueFrom:
              secretKeyRef:
                name: redis
                key: password
          - name: REDIS_TLS
            value: "true"
          - name: REDIS_POOL_SIZE
            value: "5"
          - name: LOCAL_CACHE_SIZE_IN_BYTES # 25 MB local cache
            value: "26214400"
          - name: RUNTIME_ROOT
            value: "/data"
          - name: RUNTIME_SUBDIRECTORY
            value: "runtime"
          - name: RUNTIME_WATCH_ROOT
            value: "false"
          - name: RUNTIME_IGNOREDOTFILES
            value: "true"
...

The first part is the configuration for exporting the rate limit metrics. We pass the statsd exporter configuration over as a configuration map object and use the default settings from the ratelimit service repo.

-> https://github.com/envoyproxy/ratelimit/blob/main/examples/prom-statsd-exporter/conf.yaml

- name: LOG_FORMAT
  value: "json"
- name: LOG_LEVEL
  value: "debug"

I recommend setting the log format to json and for the introduction phase the debug log level.

Afterwards comes the Redis configuration, where we only change the default values by enabling TLS and reducing the pool size from 10 to 5. The latter setting is important so as not to exhaust the Azure Cache for Redis connection limit for our chosen SKU and size.

- name: REDIS_TLS
  value: "true"
- name: REDIS_POOL_SIZE
  value: "5"

Another important setting is the local cache, which is turned off by default. The local cache only stores information about already exhausted rate limits and reduces calls to the Redis backend.

- name: LOCAL_CACHE_SIZE_IN_BYTES # 25 MB local cache
  value: "26214400"

As our Redis in Azure has 250 MB storage, I am using 25 MB for the local cache size: ten percent of the total Redis storage amount.
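
The value is simply ten percent of 250 MB expressed in bytes, which a quick shell calculation confirms.

### 10 % of 250 MB in bytes ###
> echo $(( 250 * 1024 * 1024 / 10 ))
26214400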

Looking at the runtime configuration, we specify a different root and subdirectory. But the important settings are RUNTIME_WATCH_ROOT and RUNTIME_IGNOREDOTFILES. The first one should be set to false and the last one to true. This guarantees the correct loading of our rate limit configuration, which we again pass in via a configuration map.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: ratelimit
data:
  config.yaml: |-
    domain: ratelimit
    descriptors:
      - key: PATH
        value: "/src-ip"
        rate_limit:
          unit: second
          requests_per_unit: 1
      - key: remote_address
        rate_limit:
          requests_per_unit: 10
          unit: second
      - key: HOST
        value: "aks.danielstechblog.de"
        rate_limit:
          unit: second
          requests_per_unit: 5

In my rate limit configuration, I am using PATH, remote_address and HOST as rate limits. If you want, you can specify different config.yaml files in one configuration map to separate different rate limit configurations from each other.

In our Kubernetes service object definition, we expose all ports: the three different ports of the ratelimit service and the two ports of the statsd exporter.

Container        Port   Description
ratelimit        8080   healthcheck and json endpoint
ratelimit        8081   GRPC endpoint
ratelimit        6070   debug endpoint
statsd-exporter  9102   Prometheus metrics endpoint
statsd-exporter  9125   statsd endpoint

Special configuration for Istio

Whether you want the ratelimit service to be part of the service mesh or not is a debatable point. I highly encourage you to include the ratelimit service in the service mesh. The Istio sidecar proxy provides insightful information when the Istio ingress gateway talks via GRPC with the ratelimit service. Especially when you run into errors. But this is part of the next blog post about connecting the Istio ingress gateway to the ratelimit service.

So, why do we need a peer authentication and network policy for the ratelimit service?

The issue here is the GRPC protocol. When you use a STRICT mTLS configuration in your service mesh, you need an additional peer authentication policy. Otherwise, the ingress gateway cannot connect to the ratelimit service. This is a known issue in Istio, and it seems it will not be fixed in the future.

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "ratelimit"
  namespace: "ratelimit"
spec:
  selector:
    matchLabels:
      app: ratelimit
  portLevelMtls:
    8081:
      mode: PERMISSIVE

Therefore, we use a namespace-bound peer authentication policy setting the mTLS mode on the GRPC port to PERMISSIVE, as seen above.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-inbound
  namespace: ratelimit
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-istio-ingressgateway
  namespace: ratelimit
spec:
  podSelector:
    matchLabels:
      app: ratelimit
  policyTypes:
  - Ingress
  ingress:
  - from:
      - namespaceSelector: {}
        podSelector:
          matchLabels:
            istio: ingressgateway
    ports:
    - port: 8081

Using the above network policy ensures that only the Istio ingress gateway can talk to our ratelimit service. I highly recommend making use of a network policy in that case.

Envoy Proxy ratelimit service rollout

The rollout is done easily by running the setup.sh script.

-> https://github.com/neumanndaniel/kubernetes/blob/master/envoy-ratelimit/setup.sh

Before you run the script, adjust the configuration map to match your rate limit configuration. Afterwards, just specify the Azure Cache for Redis resource group and name as parameters.

> ./setup.sh ratelimit-redis istio-ratelimit

The ratelimit service should be up and running as seen in the screenshot.

[Screenshots: Azure Kubernetes Dashboard - ratelimit service | CLI - ratelimit service]

Testing the ratelimit service functionality

Luckily, the project offers a GRPC client, which we use to test the functionality of our ratelimit service configuration, as well as a REST API endpoint.

-> https://github.com/envoyproxy/ratelimit#grpc-client
-> https://github.com/envoyproxy/ratelimit#http-port

Let us start with the REST API endpoint. For that we need a test payload in JSON format.

{
  "domain": "ratelimit",
  "descriptors": [
    {
      "entries": [
        {
          "key": "remote_address",
          "value": "127.0.0.1"
        }
      ]
    },
    {
      "entries": [
        {
          "key": "PATH",
          "value": "/src-ip"
        }
      ]
    },
    {
      "entries": [
        {
          "key": "HOST",
          "value": "aks.danielstechblog.de"
        }
      ]
    }
  ]
}

We then use kubectl port-forward to connect to one of the pods.

> kubectl port-forward ratelimit-fb66b5547-qpqtk 8080:8080

Calling the endpoint /healthcheck in our browser returns an OK.

[Screenshot: ratelimit service healthcheck endpoint]
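
The same check also works from the terminal through the port-forward.

> curl http://localhost:8080/healthcheck
OK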

Our test payload is sent via curl to the json endpoint.

> DATA=$(cat payload.json)
> curl --request POST --data-raw "$DATA" http://localhost:8080/json | jq .
---
{
  "overallCode": "OK",
  "statuses": [
    {
      "code": "OK",
      "currentLimit": {
        "requestsPerUnit": 10,
        "unit": "SECOND"
      },
      "limitRemaining": 9,
      "durationUntilReset": "1s"
    },
    {
      "code": "OK",
      "currentLimit": {
        "requestsPerUnit": 1,
        "unit": "SECOND"
      },
      "durationUntilReset": "1s"
    },
    {
      "code": "OK",
      "currentLimit": {
        "requestsPerUnit": 5,
        "unit": "SECOND"
      },
      "limitRemaining": 4,
      "durationUntilReset": "1s"
    }
  ]
}

Now let us connect to the GRPC endpoint and talk to the ratelimit service.

> kubectl port-forward ratelimit-fb66b5547-qvhns 8081:8081
> ./client -dial_string localhost:8081 -domain ratelimit -descriptors PATH=/src-ip
---
domain: ratelimit
descriptors: [ <key=PATH, value=/src-ip> ]
response: overall_code:OK  statuses:{code:OK  current_limit:{requests_per_unit:1  unit:SECOND}  duration_until_reset:{seconds:1}}

Also, the GRPC endpoint looks good and our ratelimit service is fully operational.

Summary

It takes a bit of effort to get the reference implementation of a rate limit service for Envoy up and running. But it is worth the effort, as you get a well-performing rate limit service for your Istio service mesh implementation.

In the next blog post I walk you through the setup connecting the Istio ingress gateway to the ratelimit service.



Distribute your application across different availability zones in AKS using Pod Topology Spread Constraints


This is the last blog post of a series of posts covering the topic about increasing the application availability on Azure Kubernetes Service / Kubernetes.

Today we cover the pod topology spread constraints.

What are pod topology spread constraints?

In the first post of the series, I talked about the pod disruption budget. The PDB guarantees that a certain number of your application pods is available.

The last post covered pod anti-affinity settings distributing the application pods across different nodes in your Kubernetes cluster.

Pod topology spread constraints are like the pod anti-affinity settings but new in Kubernetes. They were promoted to stable with Kubernetes version 1.19.

So, what are pod topology spread constraints? Pod topology spread constraints control how pods are scheduled across the Kubernetes cluster. They rely on failure domains like regions, zones, nodes, or custom-defined topology domains, which need to be defined as node labels.

Using the pod topology spread constraints setting

You can choose between two ways of specifying the topology spread constraints: on pod level or on cluster level. Both ways have the same three settings: maxSkew, topologyKey, and whenUnsatisfiable.

On pod level, you must additionally specify the labelSelector setting, which in contrast is calculated automatically on cluster level using the information from the services, replication controllers, replica sets, or stateful sets a pod belongs to.

Let us have a look at the different settings, except the labelSelector and topologyKey settings as they are well-known.

The whenUnsatisfiable setting is the easiest one. What should happen with a pod when the pod does not satisfy the topology spread constraints? You can choose between DoNotSchedule, which is the default, or ScheduleAnyway.

In favor of keeping your application highly available, I recommend ScheduleAnyway, even though this means that pods can land on the same node in the same availability zone under rare circumstances.

The maxSkew setting defines the allowed drift for the pod distribution across the specified topology. For instance, a maxSkew setting of 1 and whenUnsatisfiable set to DoNotSchedule is the most restrictive configuration. Defining a higher value for maxSkew leads to a less restrictive scheduling of your pods, which might not be the result you want.

To guarantee high availability and max scheduling flexibility of your application maxSkew should be 1 and whenUnsatisfiable should be ScheduleAnyway.

This combination ensures that the scheduler gives higher precedence to topologies that help reduce the skew. But it will not lead to pods in a pending state that cannot satisfy the pod topology spread constraints due to the maxSkew setting.

Enabling pod topology spread constraints

On a managed Kubernetes cluster like Azure Kubernetes Service, you unfortunately cannot use the cluster-level configuration. The required apiVersion and kind are not exposed to be customizable by the end user. Therefore, we can only set pod topology spread constraints on the pod level itself.

Let us have a look at the following example.

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  ...
  template:
  ...
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: go-webapp
      containers:
      ...

In the template, I set the settings to my already mentioned recommendations. After applying the template to my AKS cluster with three nodes across three different availability zones, the application pods are distributed evenly across all three zones.

[Screenshot: Pod topology spread constraints distribution]

Adjusting the replica number to five spins up the two additional pods, again spread across the availability zones.

[Screenshot: Pod topology spread constraints distribution]
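
To verify the distribution yourself without the portal, you can map pods to nodes and nodes to availability zones with kubectl; a small sketch assuming the go-webapp label from the template above.

### Which node does each pod run on? ###
> kubectl get pods -l app=go-webapp -o wide
### Which availability zone does each node belong to? ###
> kubectl get nodes -L topology.kubernetes.io/zone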

Besides defining only a single topology spread constraint, you are also able to specify multiple ones.

You find more details and examples for pod topology spread constraints in the Kubernetes documentation.

-> https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/

What protection does the pod topology spread constraints setting provide?

The pod topology spread constraints provide protection against zonal or node failures, or against failures of whatever you have defined as your topology. They are like the pod anti-affinity, which can be replaced by pod topology spread constraints, allowing more granular control over your pod distribution.

Summary

Pod topology spread constraints are one of the latest feature sets in Kubernetes helping you to run highly available applications.

Compared to the pod anti-affinity setting, pod topology spread constraints give you better control over the pod distribution across the used topology.

Therefore, I recommend the usage of pod topology spread constraints on Kubernetes clusters with version 1.19 or higher.


Detecting SNAT port exhaustion on Azure Kubernetes Service


Running applications on an Azure Kubernetes Service cluster that make a lot of outbound calls might lead to a SNAT port exhaustion.

In today’s blog article I walk you through how to detect and mitigate a SNAT port exhaustion on AKS.

What is a SNAT port exhaustion?

It is important to know what a SNAT port exhaustion is to apply the correct mitigation.

SNAT, Source Network Address Translation, is used in AKS whenever an outbound call to an external address is made. Assuming you use AKS in its standard configuration, it enables IP masquerading for the backend VMSS instances of the load balancer.

SNAT ports get allocated for every outbound connection to the same destination IP and destination port. The default configuration of an AKS cluster provides 64,000 SNAT ports with a 30-minute idle timeout before idle connections are released. Furthermore, AKS uses automatic allocation for the SNAT ports based on the number of nodes the cluster uses.

Number of nodes    Pre-allocated SNAT ports per node
1-50               1,024
51-100             512
101-200            256
201-400            128
401-800            64
801-1,000          32

When running into a SNAT port exhaustion, new outbound connections fail. So, it is important to detect a SNAT port exhaustion as early as possible.

How to detect a SNAT port exhaustion?

The guidance on Azure docs is well hidden.

-> https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-standard-diagnostics#how-do-i-check-my-snat-port-usage-and-allocation
-> https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections

In the end, you check the metrics of the load balancer of your AKS cluster. The metric SNAT Connection Count shows you when a SNAT port exhaustion happened. The important step here is to add the filter for the connection state and set it to failed.

[Screenshot: SNAT Connection Count metric - overall]

You can filter even further on backend IP address level and apply splitting to it.

[Screenshot: SNAT Connection Count metric - per backend IP address]

A value higher than 0 indicates a SNAT port exhaustion. As not all AKS nodes run into the port exhaustion at the same time, we use the metrics Allocated SNAT Ports and Used SNAT Ports to identify how bad the SNAT port exhaustion is on the affected node(s).

[Screenshot: SNAT port metrics]

It is important to use two filters here, as otherwise we get an aggregated value that leads to false assumptions: one for the protocol type set to TCP and the other one for the backend IP address set to the node that experiences the SNAT port exhaustion.

As seen above in the screenshot, the used ports are neither near nor equal to the allocated ports. So, all good in this case. But when the used ports value gets near or equal to the allocated ports value and SNAT Connection Count is also above 0, it is time to mitigate the issue.
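
You can pull the same metrics with the Azure CLI instead of the portal. This is a rough sketch; the resource group and load balancer name are placeholders (for a default AKS setup the load balancer is called kubernetes and lives in the MC_ resource group), and it assumes the metric names SnatConnectionCount, AllocatedSnatPorts, and UsedSnatPorts as shown in the portal.

> LB_ID=$(az network lb show --resource-group "MC_aks-demo_aks-demo_westeurope" \
  --name "kubernetes" --query id -o tsv)
### Failed SNAT connections indicate a port exhaustion ###
> az monitor metrics list --resource $LB_ID --metric "SnatConnectionCount" \
  --filter "ConnectionState eq 'failed'" --aggregation Total
### Compare used against allocated ports, split per backend IP address ###
> az monitor metrics list --resource $LB_ID --metric "AllocatedSnatPorts" "UsedSnatPorts" \
  --filter "ProtocolType eq 'TCP' and BackendIPAddress eq '*'" --aggregation Average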

Mitigating a SNAT port exhaustion

For AKS, we have two different mitigation options that have a direct impact and solve the issue. A third option is more of a long-term strategy and an extension to the first one.

Our first option is the one which can be rolled out without architectural changes. We adjust the pre-allocated number of ports per node in the load balancer configuration. This disables the automatic allocation.

By default, in an AKS standard configuration the load balancer has one outbound public IP which results in 64,000 available ports. Each node in the cluster automatically gets a predefined number of ports assigned. The assignment is based on the number of nodes in the cluster as previously mentioned. Idle TCP connections get released after 30 minutes.

Assuming our AKS cluster uses the cluster autoscaler and can scale up to a maximum of 20 nodes, we adjust the load balancer configuration so that every node gets 3,000 ports pre-allocated compared to the default 1,024 without requiring an additional public IP. 20 nodes with 3,000 ports each consume 60,000 of the 64,000 available ports. Larger values require additional outbound public IPs.

Furthermore, we set the TCP idle timeout to 4 minutes, releasing idle connections faster and freeing used SNAT ports.

An example Terraform configuration is shown below.

...
  network_profile {
    load_balancer_sku = "standard"
    outbound_type     = "loadBalancer"
    load_balancer_profile {
      outbound_ports_allocated  = "3000"
      idle_timeout_in_minutes   = "4"
      managed_outbound_ip_count = "1"
    }
    network_plugin     = "azure"
    network_policy     = "calico"
    dns_service_ip     = "10.0.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = "10.0.0.0/16"
  }
...
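
If the cluster is not managed via Terraform, the same settings can presumably be applied to an existing cluster with the Azure CLI. Resource group and cluster name are placeholders.

> az aks update --resource-group <resource-group> --name <aks-cluster> \
    --load-balancer-managed-outbound-ip-count 1 \
    --load-balancer-outbound-ports 3000 \
    --load-balancer-idle-timeout 4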

The second option assigns a dedicated public IP to every node in the cluster. On the one hand it increases the costs for large AKS clusters but on the other hand it totally mitigates the SNAT issue as SNAT is not used anymore. You find the guidance in the Azure docs.

-> https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#assign-a-public-ip-per-node-for-your-node-pools

At the beginning of this section, I mentioned a third option that complements the first one. When you use a lot of Azure PaaS services like Azure Database for PostgreSQL, Azure Cache for Redis or Azure Storage for instance you should use them with Azure Private Link. Using Azure PaaS services via their public endpoints consumes SNAT ports.

Making use of Azure Private Link reduces the SNAT port usage in your AKS cluster even further.

-> https://docs.microsoft.com/en-us/azure/private-link/private-link-overview

Summary

Long story short, keep an eye on the SNAT port usage of your AKS cluster. Especially when a lot of outbound calls are made to external systems, whether these are Azure PaaS services or not.

One last remark: we have one more option for SNAT port exhaustion mitigation, Azure Virtual Network NAT.

-> https://docs.microsoft.com/en-us/azure/virtual-network/nat-overview

I did not mention it as I could not find any information on whether it is supported by AKS. It should be, but I am not 100% sure. So, let us see.

The post Detecting SNAT port exhaustion on Azure Kubernetes Service appeared first on Daniel's Tech Blog.

Implement rate limiting with Istio on Azure Kubernetes Service

In my last blog post I walked you through the setup of the rate limiting reference implementation: The Envoy Proxy ratelimit service.

-> https://www.danielstechblog.io/run-the-envoy-proxy-ratelimit-service-for-istio-on-aks-with-azure-cache-for-redis/

Today's topic is about connecting the Istio ingress gateway to the ratelimit service. The first step for us is the Istio documentation.

-> https://istio.io/latest/docs/tasks/policy-enforcement/rate-limit/

Connect Istio with the ratelimit service

Currently, the configuration of rate limiting in Istio is tied to the EnvoyFilter object. There is no abstracting resource available, which makes it quite difficult to implement. However, with the EnvoyFilter object we have access to all the goodness the Envoy API provides.

Let us start with the first Envoy filter that connects the Istio ingress gateway to the ratelimit service. This does not apply rate limiting to inbound traffic.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-ratelimit
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
              subFilter:
                name: "envoy.filters.http.router"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: ratelimit
            failure_mode_deny: false
            timeout: 25ms
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_cluster
              transport_api_version: V3
    - applyTo: CLUSTER
      match:
        cluster:
          service: ratelimit.ratelimit.svc.cluster.local
      patch:
        operation: ADD
        value:
          name: rate_limit_cluster
          type: STRICT_DNS
          connect_timeout: 25ms
          lb_policy: ROUND_ROBIN
          http2_protocol_options: {}
          load_assignment:
            cluster_name: rate_limit_cluster
            endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: ratelimit.ratelimit.svc.cluster.local
                      port_value: 8081

I do not walk you through all the lines, only through the important ones.

...
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: ratelimit
            failure_mode_deny: false
            timeout: 25ms
            rate_limit_service:
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_cluster
              transport_api_version: V3
...

First, the value for domain must match what you defined in the config map of the ratelimit service.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ratelimit-config
  namespace: ratelimit
data:
  config.yaml: |-
    domain: ratelimit
...

The value for failure_mode_deny can be set to either false or true. If this value is set to true, the Istio ingress gateway returns an HTTP 500 error when it cannot reach the ratelimit service. This results in unavailability of your application. My recommendation: set the value to false, ensuring the availability of your application.

The timeout value defines the time the ratelimit service has to return a response to a request. It should not be set too high as otherwise your users will experience increased latency on their requests, especially when the ratelimit service is temporarily unavailable. For Istio and the ratelimit service running on AKS with the backing Azure Cache for Redis in the same Azure region as AKS, I experienced that 25ms is a reasonable value for the timeout.

The last important value is cluster_name, which provides the name we reference in the second patch of the Envoy filter.

...
    - applyTo: CLUSTER
      match:
        cluster:
          service: ratelimit.ratelimit.svc.cluster.local
      patch:
        operation: ADD
        value:
          name: rate_limit_cluster
          type: STRICT_DNS
          connect_timeout: 25ms
          lb_policy: ROUND_ROBIN
          http2_protocol_options: {}
          load_assignment:
            cluster_name: rate_limit_cluster
            endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: ratelimit.ratelimit.svc.cluster.local
                      port_value: 8081

Basically, we define the FQDN of the ratelimit service object and the port the Istio ingress gateway then connects to.

Rate limit actions

The Istio ingress gateway is now connected to the ratelimit service. However, we are still missing the rate limit actions that match our ratelimit service config map configuration.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-ratelimit-svc
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: VIRTUAL_HOST
      match:
        context: GATEWAY
        routeConfiguration:
          vhost:
            name: "*.danielstechblog.de:80"
            route:
              action: ANY
      patch:
        operation: MERGE
        value:
          rate_limits:
            - actions:
              - request_headers:
                  header_name: ":authority"
                  descriptor_key: "HOST"
            - actions:
              - remote_address: {}
            - actions:
              - request_headers:
                  header_name: ":path"
                  descriptor_key: "PATH"

Again, I walk you through the important parts.

...
        routeConfiguration:
          vhost:
            name: "*.danielstechblog.de:80"
            route:
              action: ANY
...

The routeConfiguration specifies the domain name and port the rate limit actions apply to.

...
        value:
          rate_limits:
            - actions:
              - request_headers:
                  header_name: ":authority"
                  descriptor_key: "HOST"
            - actions:
              - remote_address: {}
            - actions:
              - request_headers:
                  header_name: ":path"
                  descriptor_key: "PATH"

In this example configuration the rate limit actions apply to the domain name, the client IP, and the request path. This matches exactly our ratelimit service config map configuration.

...
    descriptors:
      - key: PATH
        value: "/src-ip"
        rate_limit:
          unit: second
          requests_per_unit: 1
      - key: remote_address
        rate_limit:
          requests_per_unit: 10
          unit: second
      - key: HOST
        value: "aks.danielstechblog.de"
        rate_limit:
          unit: second
          requests_per_unit: 5

After applying the rate limit actions, we test the rate limiting.
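
A quick way to reproduce this from the terminal is a small curl loop against the rate limited path. The hostname below matches the example configuration above.

> for i in $(seq 1 3); do curl --silent --output /dev/null --write-out "%{http_code}\n" http://aks.danielstechblog.de/src-ip; done

With the configuration above, the first request should return HTTP 200 and the following ones within the same second HTTP 429.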

Successful request Rate limited request

As seen in the screenshots I am hitting the rate limit when calling the path /src-ip more than once per second.

Summary

It is a bit tricky to get the configuration done correctly for the EnvoyFilter objects. But once you get around it, you can use all the goodness the Envoy API provides. That said, the Istio documentation is no longer your friend here. Instead, you should familiarize yourself with the Envoy documentation.

-> https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ratelimit/v3/rate_limit.proto

I added the Envoy filter YAML template to my GitHub repository and adjusted the setup script to include the template as well.

-> https://github.com/neumanndaniel/kubernetes/tree/master/envoy-ratelimit

So, what is next after the Istio ingress gateway got connected to the ratelimit service? Observability! Remember that statsd runs as a sidecar container together with the ratelimit service?

In the last blog post of this series, I will show you how to collect the Prometheus metrics of the ratelimit service with Azure Monitor for containers.

The post Implement rate limiting with Istio on Azure Kubernetes Service appeared first on Daniel's Tech Blog.

Monitor the Envoy Proxy ratelimit service with Azure Monitor for containers

The last two blog posts of this series covered the setup of the Envoy Proxy ratelimit service and its implementation with Istio.

-> https://www.danielstechblog.io/run-the-envoy-proxy-ratelimit-service-for-istio-on-aks-with-azure-cache-for-redis/
-> https://www.danielstechblog.io/implement-rate-limiting-with-istio-on-azure-kubernetes-service/

In today's post I walk you through how to monitor the ratelimit service with Azure Monitor for containers. Not the standard monitoring of the container itself. We focus on scraping the Prometheus metrics exposed by the statsd-exporter.

statsd-exporter configuration adjustments

By default, metrics that have been sent to the statsd-exporter do not expire. The default TTL in the statsd-exporter config map should match the Azure Monitor agent collection interval. This guarantees accurate metrics of the ratelimit service in Azure Monitor.

In our example we set it to one minute.

...
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: statsd-exporter-config
  namespace: ratelimit
data:
  config.yaml: |
    defaults:
      ttl: 1m # Resets the metrics every minute
    mappings:
...

For the config load metrics, we override the default TTL and set the value to three minutes. Otherwise, it might be that those metrics are not collected as they are only set once during the container startup.

...
      - match:
          "ratelimit.service.config_load_success"
        name: "ratelimit_service_config_load_success"
        match_metric_type: counter
        ttl: 3m
      - match:
          "ratelimit.service.config_load_error"
        name: "ratelimit_service_config_load_error"
        match_metric_type: counter
        ttl: 3m
...

Ratelimit service deployment adjustments

Azure Monitor for containers supports different configuration options to scrape Prometheus metrics. The most convenient one is the monitoring of Kubernetes pods which have specific annotations set.

...
  template:
    metadata:
      labels:
        app: ratelimit
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/scheme: "http"
        prometheus.io/path: "/metrics"
        prometheus.io/port: "9102"
...

-> https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration

Additionally, we add an Istio-specific annotation disabling the metrics merge.

...
        prometheus.istio.io/merge-metrics: "false"
...

-> https://istio.io/latest/docs/ops/integrations/prometheus/#option-1-metrics-merging

During my setup I discovered that the standard merge produces a malformed result that the Azure Monitor agent cannot handle.

Network policy and peer authentication policy adjustments

As the ratelimit service namespace is locked down for inbound traffic and currently only allows gRPC traffic from the Istio ingress gateway to the ratelimit service, we need to add another network policy.

...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-oms-agent
  namespace: ratelimit
spec:
  podSelector:
    matchLabels:
      app: ratelimit
  policyTypes:
  - Ingress
  ingress:
  - from:
      - namespaceSelector: {}
        podSelector:
          matchLabels:
            rsName: omsagent-rs
      - namespaceSelector: {}
        podSelector:
          matchLabels:
            component: oms-agent
    ports:
    - port: 9102

Without the additional network policy, the Azure Monitor agent cannot scrape the Prometheus metrics.

The same applies to the peer authentication policy. By default, services in our Istio service mesh use the mTLS mode STRICT.

Services that are not part of the mesh cannot talk to ones that are part of the mesh. Therefore, we set the mTLS mode for the metrics endpoint of the statsd-exporter to PERMISSIVE.

apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "ratelimit"
  namespace: "ratelimit"
spec:
  selector:
    matchLabels:
      app: ratelimit
  portLevelMtls:
    8081:
      mode: PERMISSIVE
    9102:
      mode: PERMISSIVE

Otherwise, the Azure Monitor agent cannot scrape the metrics.
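
Before debugging network policies or mTLS settings, it can help to rule out the statsd-exporter itself by checking the metrics endpoint via a port-forward, which bypasses both layers. The deployment name ratelimit is an assumption based on the setup from the previous posts; run the port-forward in one terminal and the curl in a second one.

> kubectl --namespace ratelimit port-forward deployment/ratelimit 9102:9102
> curl --silent http://localhost:9102/metrics | grep "^ratelimit_service"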

Azure Monitor for containers configuration

Microsoft provides comprehensive documentation on what can be configured.

-> https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-agent-config
-> https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration

So, I keep the focus on what is configured in our example.

...
  prometheus-data-collection-settings: |-
    [prometheus_data_collection_settings.cluster]
      interval = "1m"
      fieldpass = [
        "ratelimit_service_config_load_success",
        "ratelimit_service_config_load_error",
        "ratelimit_service_rate_limit_near_limit",
        "ratelimit_service_rate_limit_over_limit",
        "ratelimit_service_rate_limit_total_hits",
        "ratelimit_service_rate_limit_within_limit",
        "ratelimit_service_should_rate_limit_error",
        "ratelimit_service_total_requests",
        "ratelimit_service_response_time_seconds"
      ]
      monitor_kubernetes_pods = true
      monitor_kubernetes_pods_namespaces = ["ratelimit"]

First, as mentioned earlier in this blog post, the scraping interval is configured to one minute. We do not want to scrape all metrics from the statsd-exporter of our ratelimit service. Hence, we use the fieldpass option to only scrape the metrics we want.

Additionally, we limit the pod monitoring only to the ratelimit service namespace and enable the monitoring.

Run KQL queries

After we applied all configuration adjustments to our Azure Kubernetes Service cluster, we can start to run KQL queries analyzing the ingested metrics.

The first KQL query returns the results for the config load metrics.

InsightsMetrics
| where Namespace == "prometheus"
| where Name =='ratelimit_service_config_load_success' or Name =='ratelimit_service_config_load_error'
| extend json = todynamic(Tags)
| extend Pod = tostring(json.pod_name)
| summarize count() by Pod, Name, Value=Val
| render columnchart

azure monitor config load results

As seen in the screenshot the config load of the ratelimit service was successful and the ratelimit service is operational.

Another KQL query shows the different metrics for the rate limiting.

InsightsMetrics
| where Namespace == "prometheus"
| where Name =='ratelimit_service_rate_limit_total_hits' or Name == 'ratelimit_service_rate_limit_near_limit' or Name == 'ratelimit_service_rate_limit_over_limit' or Name == 'ratelimit_service_rate_limit_within_limit'
| extend json = todynamic(Tags)
| where json.key1 == 'PATH_/src-ip'
| project TimeGenerated, Name, Sum=Val
| render columnchart

azure monitor rate limiting results

The metric ratelimit_service_rate_limit_over_limit in this specific example totals 12.

Besides this metric, there is another way to receive information about specific paths that hit the rate limit. It only requires that Istio logging is enabled.

ContainerLog
| where ContainerID == 'b07608b4e81be5f5e515255b954832dc0a56772303ca3d4fc0c9a44e7bcfa301' or ContainerID == 'fb91ee37f6e1b9b0f57d4b47cf2391f72bb4327f9bafa1e8df653bbdfe91a5af'
| extend json = todynamic(LogEntry)
| where json.response_code == '429' and json.path == '/src-ip'
| summarize count=count() by tostring(json.response_code), tostring(json.response_code_details), tostring(json.path), tostring(json.authority)

azure monitor istio ingress gateway logs

We get the same number of rate limited requests without needing to scrape the Prometheus metrics from the ratelimit service for it. Depending on how much insight you want, the Istio logging might be enough. For deeper insights I recommend scraping the Prometheus metrics.

Summary

This is the last blog post of this series, in which we started with how to run the Envoy Proxy ratelimit service, continued with implementing rate limiting with Istio, and ended with how to monitor the ratelimit service.

-> https://www.danielstechblog.io/run-the-envoy-proxy-ratelimit-service-for-istio-on-aks-with-azure-cache-for-redis/
-> https://www.danielstechblog.io/implement-rate-limiting-with-istio-on-azure-kubernetes-service/
-> https://github.com/neumanndaniel/kubernetes/tree/master/envoy-ratelimit

I hope you got some useful information and insights on how to implement rate limiting for Istio on your AKS cluster and protect your microservices from being overloaded.

The post Monitor the Envoy Proxy ratelimit service with Azure Monitor for containers appeared first on Daniel's Tech Blog.

Remove dangling container manifests from Azure Container Registry

The Azure Container Registry offers three different SKUs which differ from each other not only in their feature set. Each SKU comes with included storage starting at 10 GB and going up to 500 GB.

ACR overview

Depending on the usage pattern the included storage fills up quickly. This can be due to a lot of different container manifests from successful build pipelines or from failed pipelines pushing the container manifests each time with the same tag to the ACR.

The latter one is problematic and is the major reason for a full ACR which then runs into storage overage usage. Such container manifests are called dangling container manifests or untagged manifests.

When you push an image to the ACR with the same tag repeatedly, the ACR overwrites the previous manifest. It does this by untagging the existing container manifest to accommodate the new version. The already existing container manifest becomes an untagged manifest and still consumes storage space.

Using the ACR Premium SKU, you prevent your registry from filling up by activating the retention feature. After X days (the default is seven days), untagged manifests get deleted automatically.

-> https://docs.microsoft.com/en-us/azure/container-registry/container-registry-retention-policy

For the Basic and Standard SKU, you need a simple PowerShell script achieving the same result. Such a PowerShell script can be executed by an Azure Function or a GitHub Action on a regular schedule.

$ACRS = Get-AzContainerRegistry
foreach ($ACR in $ACRS) {
  $REPOS = Get-AzContainerRegistryRepository -RegistryName $ACR.Name
  foreach ($REPO in $REPOS) {
    $MANIFESTS = (Get-AzContainerRegistryManifest -RegistryName $ACR.Name -RepositoryName $REPO).ManifestsAttributes | Where-Object { $_.Tags -eq $null } | Sort-Object -Property LastUpdateTime -Descending
    foreach ($ITEM in $MANIFESTS) {
      $TAG = $ITEM.digest
      Write-OutPut "------------------------"
      Write-Output "Delete dangling image $REPO@$TAG"
      Remove-AzContainerRegistryManifest -RegistryName $ACR.Name -RepositoryName $REPO -Manifest $TAG
    }
  }
}

-> https://github.com/neumanndaniel/kubernetes/blob/master/acr/Remove-UntaggedManifests.ps1

The script simply iterates over every ACR in an Azure subscription and then over every repository of an ACR. Each untagged manifest is deleted. As untagged manifests are independent of the tagged manifests, the deletion does not affect the production container manifests.

------------------------
Delete dangling image akscnicalc@sha256:af59dd54b997e61081efbfc7ed63647b01e44c002d9513a8db349a3a6b75f1a0
True
------------------------
Delete dangling image azp@sha256:82a0acbc2b10af684dd3379b7383e31446029ad8b5d0f9ce2de2b733f907dd9f
True
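
If you prefer the Azure CLI over PowerShell, the untagged manifests of a single repository can likely be cleaned up with a similar one-liner. Registry and repository names are placeholders.

> az acr repository show-manifests --name <registry> --repository <repository> \
    --query "[?tags[0]==null].digest" --output tsv \
  | xargs -I% az acr repository delete --name <registry> --image <repository>@% --yes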

Summary

The ACR Premium SKU provides the mechanism for the clean-up of untagged manifests out-of-the-box. The other SKUs require a script for that.

What all SKUs have in common is that old manifests need to be untagged or cleaned up again by some automation mechanism. Currently, the Azure Container Registry does not provide a built-in mechanism to automatically untag container manifests that are older than X days.

So, you need to implement such a mechanism yourself via a script.

The post Remove dangling container manifests from Azure Container Registry appeared first on Daniel's Tech Blog.

Identify the max capacity of ephemeral OS disks for Azure VM sizes

Back in 2019 Microsoft introduced the ephemeral OS disk option for Azure VMs and VMSS.

-> https://azure.microsoft.com/en-us/updates/azure-ephemeral-os-disk-now-generally-available/

Instead of storing and persisting the OS disk on Azure remote storage, the ephemeral OS disk is stored in the VM's cache. Hence, ephemeral OS disks are perfect for stateless workloads like Azure Kubernetes Service node pools.

For instance, ephemeral OS disks are the default for new AKS node pools. And here is the general issue: when you specify an OS disk size which is larger than the VM's cache size, AKS falls back to the managed OS disk option. This happens especially with smaller Azure VM sizes compared to larger ones.

I have written a PowerShell-based Azure Function and a PowerShell script tackling this problem and getting the max capacity of ephemeral OS disks for Azure VMs in an Azure region.

You call the Azure Function API endpoint under the following URL.

-> https://ephemeraldisk.danielstechblog.de/api/ephemeraldisk

As input parameters you provide the Azure region, ?location=northeurope, and optionally the VM family, &family=Dsv3, as a filter for the result set.

API call with northeurope as region API call with northeurope as region and Dsv3 as family filter

-> https://ephemeraldisk.danielstechblog.de/api/ephemeraldisk?location=northeurope
-> https://ephemeraldisk.danielstechblog.de/api/ephemeraldisk?location=northeurope&family=Dsv3

The difference between the function and the PowerShell script is minimal. Using the script, you only get the results for Azure VM sizes which are unlocked for the subscription, whereas the function returns all VM sizes for an Azure region.

> ./Get-EphemeralOsDiskVmSizes.ps1 northeurope Dsv3
[
  {
    "Name": "Standard_D16s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 400,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D2s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 50,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D32s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 800,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D48s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 1200,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D4s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 100,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D64s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 1600,
    "EphemeralOsDiskSupported": true
  },
  {
    "Name": "Standard_D8s_v3",
    "Family": "DSv3",
    "MaxEphemeralOsDiskSizeGb": 200,
    "EphemeralOsDiskSupported": true
  }
]
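
If you only need a quick check for a single VM size, the underlying data can also be queried directly with the Azure CLI. To my understanding, the CachedDiskBytes capability is what determines the max ephemeral OS disk size when the cache placement is used.

> az vm list-skus --location northeurope --size Standard_D4s_v3 \
    --query "[].capabilities[?name=='CachedDiskBytes']" --output json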

Both solutions are available on my GitHub repository.

The post Identify the max capacity of ephemeral OS disks for Azure VM sizes appeared first on Daniel's Tech Blog.

Running Podman on macOS with Multipass

Several months ago, I worked on a little side project during my spare time but instead of writing a blog post I set it aside till today.

Since the announcement that Docker made yesterday on what has changed in the Docker Subscription Service Agreement my side project got my attention again.

-> https://www.docker.com/blog/updating-product-subscriptions/

For most of us nothing will change as Docker for Desktop stays free for personal use. But for companies which have been using Docker for Desktop for free and not using Docker Hub as their primary container registry things have changed since yesterday.

Looking at alternatives to Docker for Desktop, Podman is definitely the container engine that gets most of the attention right now.

-> https://podman.io/

In today’s blog post I walk you through how to run Podman on macOS with Multipass as an alternative for Docker for Desktop. This was my little side project I worked on several months ago.

Let us start with the prerequisites. You need two tools installed on your Mac: the Podman client and Multipass. Both tools can be installed easily via brew.

I have written an install script which installs both tools via brew in the first step.

#!/bin/bash
PODMAN_MODE=$1
# Install podman client & multipass
brew install podman multipass
# Symlink as otherwise `az acr login` does not work.
ln -s /usr/local/bin/podman /usr/local/bin/docker || true
# Podman setup
SSH_PUB_KEY=$(cat ~/.ssh/id_rsa.pub)
echo '
ssh_authorized_keys:
  - '${SSH_PUB_KEY}'' >> user-data
./create.sh $PODMAN_MODE

Instead of only using docker as an alias for podman, as recommended in most articles throughout the web, I am creating a symlink too. The symlink is a hard requirement when you are working with the Azure Container Registry and using the command az acr login. The command checks for the docker binary executable and fails if it cannot find it on the system. So, the docker alias will not work in that case but the symlink does.
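
With the symlink in place, a registry login and push via the Podman client should work as usual. The registry, image name, and tag below are placeholders.

> az acr login --name <registry>
> docker push <registry>.azurecr.io/<image>:<tag>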

The next step is your SSH public key, which is added to the end of the user-data file.

users:
  - default
write_files:
  - path: /home/ubuntu/setup-podman.sh
    content: |
      #!/bin/bash
      # Set correct permission on own home folder
      sudo chown ubuntu:ubuntu .
      chmod 755 .
      # Install podman
      . /etc/os-release
      echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
      curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
      sudo apt update
      sudo apt install podman fuse-overlayfs -y
      sudo cp /home/ubuntu/.ssh/authorized_keys /root/.ssh/authorized_keys
      sudo systemctl --system enable --now podman.socket
      systemctl --user enable --now podman.socket
      sudo loginctl enable-linger $USER
      sudo systemctl enable --now ssh.service
    permissions: "0755"
runcmd:
  - sudo cp /etc/skel/.bashrc /home/ubuntu/.bashrc
  - sudo cp /etc/skel/.bash_logout /home/ubuntu/.bash_logout
  - sudo cp /etc/skel/.profile /home/ubuntu/.profile

The user-data file is a cloud-init file which configures our Ubuntu VM we spin up with Multipass on our Mac.

As seen above a setup script for installing Podman in the Ubuntu VM is placed in the user’s home folder. Podman gets configured to be accessible in root and rootless mode. Depending on the parameter you provide when running the install.sh script you interact in root or rootless mode with Podman.

> ./install.sh root
> ./install.sh rootless

The last step is running the create.sh script.

#!/bin/bash
PODMAN_MODE=$1
INSTANCE_NAME="podman"
multipass set client.primary-name=$INSTANCE_NAME
multipass launch -c 4 -m 8G -d 32G -n $INSTANCE_NAME --cloud-init user-data 20.04
multipass exec $INSTANCE_NAME -- /home/ubuntu/setup-podman.sh
IP=$(multipass info $INSTANCE_NAME | grep IPv4: | cut -d ':' -f2 | tr -ds ' ' '')
if [ "$PODMAN_MODE" == "root" ]; then
  podman system connection add $INSTANCE_NAME --identity ~/.ssh/id_rsa  ssh://root@${IP}/run/podman/podman.sock
else
  podman system connection add $INSTANCE_NAME --identity ~/.ssh/id_rsa  ssh://ubuntu@${IP}/run/user/1000/podman/podman.sock
fi
# List of volume mounts that Docker for Desktop also mounts per default.
multipass mount /Users $INSTANCE_NAME
multipass mount /Volumes $INSTANCE_NAME
multipass mount /private $INSTANCE_NAME
multipass mount /tmp $INSTANCE_NAME
multipass mount /var/folders $INSTANCE_NAME
multipass list
echo "#######################"
podman system connection list

The script launches a new Multipass Ubuntu 20.04 instance configured with 4 cores, 8 GB memory and a 32 GB disk. Podman then gets installed by executing the setup script. Afterwards a system connection is added for Podman which enables us to use the Podman client on our Mac to interact with the Podman server in the Multipass instance.

Finally, several folders are mounted into the Multipass instance and Podman is ready. The volume mounts are the exact default mounts Docker for Desktop uses. In the end you only need the /Users mount point.

“By default the /Users, /Volume, /private, /tmp and /var/folders directory are shared.”
https://docs.docker.com/desktop/mac/#resources

When we now run podman version or docker version, we get the version information about the Podman client and server.

Terminal - Running podman version and docker version

Let us start a simple Hello World container application. As I am interacting with Podman in root mode I can bind the container to the privileged port 80.

> docker run -d -p 80:80 mcr.microsoft.com/azuredocs/aci-helloworld:latest
> docker ps
CONTAINER ID  IMAGE                                              COMMAND               CREATED         STATUS             PORTS               NAMES
c0aca4d57e9a  mcr.microsoft.com/azuredocs/aci-helloworld:latest  /bin/sh -c node /...  16 seconds ago  Up 16 seconds ago  0.0.0.0:80->80/tcp  exciting_colden

Opening a browser and accessing the Multipass instance via its IP address we see the Hello World application.

Browser - Hello World application

As always, you find the scripts on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/tree/main/Bash/Podman

I also added two more scripts. One for switching between root and rootless mode and the other one to reset the Multipass instance.

The post Running Podman on macOS with Multipass appeared first on Daniel's Tech Blog.


Local Kubernetes setup with KinD on Podman

In one of my last blog posts I walked you through the setup how to run Podman on macOS with Multipass as Docker for Desktop alternative.

-> https://www.danielstechblog.io/running-podman-on-macos-with-multipass/

Today I briefly show you the local Kubernetes setup with KinD on Podman. Even though the Podman support of KinD is in an experimental state, it runs stably enough for daily usage.

The setup I am running is the same I use with Docker for Desktop.

-> https://www.danielstechblog.io/local-kubernetes-setup-with-kind/
-> https://github.com/neumanndaniel/kubernetes/tree/master/kind

A tremendous difference between my Podman setup with Multipass and Docker for Desktop is the network access. KinD on Docker for Desktop uses the localhost interface 127.0.0.1 whereas Podman with Multipass has its own IP address from the bridge interface.

Therefore, I created a new setup script.

#!/bin/bash
brew install gsed
# Podman IP configuration
INSTANCE_NAME="podman"
IP=$(multipass info $INSTANCE_NAME | grep IPv4: | cut -d ':' -f2 | tr -ds ' ' '')
IP_CONFIG_EXISTS=$(cat /private/etc/hosts | grep -c "$IP")
if [[ $IP_CONFIG_EXISTS -eq 0 ]]; then
  echo "$IP $INSTANCE_NAME" | sudo tee -a /private/etc/hosts
fi
# Create KinD cluster
wget https://raw.githubusercontent.com/neumanndaniel/kubernetes/master/kind/single-node.yaml -O /tmp/single-node.yaml
gsed -i 's/127.0.0.1/'$IP'/g' /tmp/single-node.yaml
kind create cluster --config=/tmp/single-node.yaml
# Calico
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/neumanndaniel/kubernetes/master/kind/calico-config.yaml
sleep 120
# Metrics Server
kubectl config set-context --current --namespace kube-system
helm repo add bitnami https://charts.bitnami.com/bitnami || true
helm repo update
helm upgrade metrics-server --install \
  --set apiService.create=true \
  --set extraArgs.kubelet-insecure-tls=true \
  --set extraArgs.kubelet-preferred-address-types=InternalIP \
  bitnami/metrics-server --namespace kube-system

First, the script installs GNU sed via brew. It then reads the IP address from the Multipass instance, downloads the KinD configuration, and replaces 127.0.0.1 with the IP. The instance name is also added to the hosts file. Afterwards the KinD single node cluster gets instantiated with Calico and the metrics server enabled.
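
After the script finishes, a short check confirms that the node is ready and the metrics server serves data. This is a minimal verification, assuming KinD set the current kubectl context.

> kubectl get nodes --output wide
> kubectl top nodes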

I went a step further and installed Istio with a sample application.

-> https://www.danielstechblog.io/running-istio-on-kind-kubernetes-in-docker/

istioctl install -f install-istio.yaml --skip-confirmation

The application is reachable under the instance name the script added earlier to the hosts file on the Mac.

Sample application

As always, you find the script on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/blob/main/Bash/Podman/kind.sh

The post Local Kubernetes setup with KinD on Podman appeared first on Daniel's Tech Blog.

Azure Policy for Kubernetes – custom policies on Azure Arc enabled Kubernetes

On September 1st Microsoft announced the public preview of the custom policy support for Azure Policy for AKS.

-> https://azure.microsoft.com/en-us/updates/custom-aks-policy-support-now-public-preview/

I am already using the public preview on my AKS cluster and was curious whether this would also work with Azure Arc enabled Kubernetes.

The short answer is yes, but with some minor adjustments.

Configuration

The first step is the deployment of Azure Policy for Kubernetes on the Azure Arc enabled Kubernetes cluster. The default values of the Helm chart install Gatekeeper and the Azure Policy components with outdated versions. Therefore, I checked my AKS cluster with the Azure Policy add-on enabled for the recent versions.

Below is the command for the Helm chart installing the recent version of the Azure Policy add-on.

> helm upgrade azure-policy-addon azure-policy/azure-policy-addon-arc-clusters --install --namespace kube-system \
    --set azurepolicy.env.resourceid=/subscriptions/{REDACTED}/resourceGroups/KinD/providers/Microsoft.Kubernetes/connectedClusters/KinD \
    --set azurepolicy.env.clientid={REDACTED} \
    --set azurepolicy.env.clientsecret='{REDACTED}' \
    --set azurepolicy.env.tenantid={REDACTED} \
    --set azurepolicy.image.tag="prod_20210719.1" \
    --set azurepolicywebhook.image.tag="prod_20210713.1" \
    --set gatekeeper.image.tag="v3.4.1"

I am using a single node KinD cluster on my MacBook for Azure Arc enabled Kubernetes.

Azure portal Azure Arc Kubernetes

> kubectl get nodes
NAME                 STATUS   ROLES                  AGE     VERSION
kind-control-plane   Ready    control-plane,master   5d23h   v1.21.1
> kubectl get pods -n azure-arc
NAME                                         READY   STATUS    RESTARTS   AGE
cluster-metadata-operator-77d878d65c-5rxvz   2/2     Running   4          5d23h
clusterconnect-agent-6d894d44b-tp6gz         3/3     Running   6          5d23h
clusteridentityoperator-578c88fb78-wsxsq     2/2     Running   4          5d23h
config-agent-78974cb85b-9wp27                2/2     Running   4          5d23h
controller-manager-5b99f7b9df-4l5fs          2/2     Running   4          5d23h
extension-manager-58fd78456c-pltm5           2/2     Running   4          5d23h
flux-logs-agent-bd5659f94-xmhxs              1/1     Running   2          5d23h
kube-aad-proxy-78954db59b-9lh6q              2/2     Running   4          5d23h
metrics-agent-675566f58f-h652h               2/2     Running   4          5d23h
resource-sync-agent-5c547cd6-dvckp           2/2     Running   4          5d23h
> kubectl get pods -n gatekeeper-system
NAME                                             READY   STATUS    RESTARTS   AGE
gatekeeper-audit-56f4664568-9hjhj                1/1     Running   1          2d11h
gatekeeper-controller-manager-6597bd98fc-6fcw2   1/1     Running   1          2d11h
gatekeeper-controller-manager-6597bd98fc-hv2sm   1/1     Running   1          2d11h
> kubectl get pods -n kube-system | grep azure-policy
azure-policy-7cdb6d4d9b-88b4x                1/1     Running   1          2d11h
azure-policy-webhook-fc78c646d-w9dt4         1/1     Running   1          2d11h

When you follow the instructions for the custom policy setup take a look into the generated JSON document.

-> https://techcommunity.microsoft.com/t5/azure-governance-and-management/azure-policy-for-kubernetes-releases-support-for-custom-policy/ba-p/2699466

Especially at the policyRule field. The resource type the custom policy applies to only includes Azure Kubernetes Service clusters.

{
  "properties": {
    "policyType": "Custom",
    "mode": "Microsoft.Kubernetes.Data",
    ...
    "policyRule": {
      "if": {
        "field": "type",
        "in": [
          "Microsoft.ContainerService/managedClusters"
        ]
      },
      "then": {
        "effect": "[parameters('effect')]",
...

So, how do we get the custom policy onto an Azure Arc enabled Kubernetes cluster? We add the resource type.

{
  "properties": {
    "policyType": "Custom",
    "mode": "Microsoft.Kubernetes.Data",
    ...
    "policyRule": {
      "if": {
        "field": "type",
        "in": [
          "Microsoft.Kubernetes/connectedClusters",
          "Microsoft.ContainerService/managedClusters"
        ]
      },
      "then": {
        "effect": "[parameters('effect')]",
...

After that we continue with the setup steps in the documentation.
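
For reference, creating such a custom policy definition is possible with the Azure CLI as well. This is only a sketch that assumes the policyRule and parameters sections of the generated JSON document were saved to separate files; the definition name and display name are placeholders.

> az policy definition create --name "k8sdisableautomountserviceaccounttoken" \
    --display-name "Disable automount of the default service account token" \
    --mode "Microsoft.Kubernetes.Data" \
    --rules policyRule.json \
    --params parameters.json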

The custom policy shows up in the Azure portal.

Azure portal custom policy Azure portal custom policy

Azure Policy for Kubernetes synchronizes every 15 minutes with Azure for new policies, submitting audit results, or configuration changes.

Checking the Kubernetes log of the Azure Policy pod shows the successful download and creation of the constraint template from the Azure Storage account.

> kubectl logs azure-policy-7cdb6d4d9b-88b4x azure-policy | grep azurekubernetes
{"level":"info","ts":"2021-09-18T19:38:49.354457715Z","msg":"Retrieving file from url","log-id":"c99951d9-d-72","method":"github.com/Azure/azure-policy-kubernetes/pkg/formatter.(*PolicyFormatter).GetFileFromURL","url":"https://azurekubernetespolicy.blob.core.windows.net/constraint-templates/disable-automount-default-service-account-token.yaml"}
{"level":"info","ts":"2021-09-18T19:38:49.763245049Z","msg":"Constraint template created from URL","log-id":"c99951d9-d-73","method":"github.com/Azure/azure-policy-kubernetes/pkg/resourceutils/templateutils.(*Utils).CreateTemplateFromPolicy","Source":"https://azurekubernetespolicy.blob.core.windows.net/constraint-templates/disable-automount-default-service-account-token.yaml"}

Azure Storage account custom policy

Receiving the first audit results of the new policy takes between 15 and 30 minutes. The screenshots below show a non-compliant state for the Azure Arc enabled Kubernetes cluster.

Azure portal custom policy audit results Azure portal custom policy audit results Azure portal custom policy audit results

Summary

Even though Microsoft currently says that the public preview is only for Azure Policy for AKS, it also works for Azure Arc enabled Kubernetes.

Only two adjustments are needed to get the custom policy working on Azure Arc enabled Kubernetes. First, using the same version for the Azure Policy add-on as AKS. Second, adding the resource type to the policyRule field in the generated JSON document.

The post Azure Policy for Kubernetes – custom policies on Azure Arc enabled Kubernetes appeared first on Daniel's Tech Blog.

Mitigating slow container image pulls on Azure Kubernetes Service

It might happen that you experience slow container image pulls on your Azure Kubernetes Service nodes. Your first thought might be that the Azure Container Registry is the root cause. Even when using the ACR without the geo-replication option enabled, image pulls from an ACR in Europe to AKS nodes running in Australia are fast. That said, it can be the ACR, especially when you do not use the Premium SKU, as the Basic and Standard SKUs have lower values for ReadOps per minute, WriteOps per minute, and download bandwidth.

-> https://docs.microsoft.com/en-us/azure/container-registry/container-registry-skus#service-tier-features-and-limits

As Microsoft states in its documentation, those limits are minimum estimates and ACR strives to improve performance as usage requires. Besides that, you can run through the troubleshooting guide for the ACR to exclude the ACR itself as the root cause.

-> https://docs.microsoft.com/en-us/azure/container-registry/container-registry-troubleshoot-performance

Investigation

If it is not the container registry, what else can be the root cause for slow container image pulls on your AKS nodes?

The node itself, especially the node's OS disk. Depending on the VM size and the OS disk size, the disk's performance differs. Another difference is the node's OS disk configuration, whether you use ephemeral or persistent disks. For instance, an AKS node with a 128 GB OS disk running as D4s_v3 only achieves 500 IOPS and 96 MB/sec throughput using a persistent disk. The ephemeral disk option provides 8000 IOPS and 64 MB/sec throughput for this VM size.

How do we identify the OS disk as the root cause for the slow image pulls?

We look at the VMSS metrics of the particular node pool of our Azure Kubernetes Service cluster.

Azure portal AKS cluster properties Azure portal AKS node resource group Azure portal AKS node pool VMSS metrics

The OS disk queue depth metric is the most important one here.

OS Disk Queue Depth: The number of current outstanding IO requests that are waiting to be read from or written to the OS disk.

-> https://docs.microsoft.com/en-us/azure/virtual-machines/disks-metrics#disk-io-throughput-and-queue-depth-metrics

Azure portal AKS node pool VMSS OS disk queue depth metric

In the screenshot above you see OS disk queue depths from 10 to 17 on my AKS nodes. The first values are during the AKS cluster start as I am using the start/stop functionality for my cluster. The second peak was scaling a Kubernetes deployment from 2 to 50 replicas.

A value above 20 is a concern which you should observe and might need to take action on, as it indicates that the disk performance might not be enough to handle the load of container image pulls and other related disk operations.

Taking a further look into the IOPS and throughput metrics in comparison to the OS disk queue depth metric shows us which is or might become the bottleneck.

Azure portal AKS node pool VMSS metrics comparison

I am using ephemeral disks for my AKS nodes. Therefore, my IOPS limit is 8000 and my throughput limit is 64 MB/sec. Looking at the throughput values this might become an issue on my AKS nodes during container image pulls.

Mitigation

When do we use a larger OS disk or a larger VM size to mitigate a high OS disk queue depth?

Action should be taken when the OS disk queue depth is constantly above 20 during normal operation of your AKS cluster. If you only see a high OS disk queue depth during an AKS cluster upgrade, a node reboot for applying security patches, or a scale-out, this can be annoying but normally no action is needed. During those events, a lot of container images get pulled at the same time, leading to a lot of small writes to the OS disk which in the end leads to a high OS disk queue depth.

This normally resolves itself within minutes. Should this not be the case, or should this still be a concern for you, then a larger OS disk or a larger VM size resolves the issue.

Another option you have at your fingertips is the image pull policy setting in your Kubernetes templates. You can set it to IfNotPresent instead of Always, preventing container images that are already present on the node from being pulled again. The image pull policy Always instead queries the container registry every time to resolve the container image digest. If the digest is different from the one of the container image in the local cache, the container image gets pulled from the container registry.
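
For an already deployed workload, the image pull policy can also be patched directly instead of editing the template. This is only a sketch; the deployment name is a placeholder and the patch targets the first container in the pod spec.

> kubectl patch deployment <deployment> --type json \
    --patch '[{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "IfNotPresent"}]'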

Summary

It is not easy to identify the root cause of slow container image pulls on your AKS nodes as you need to dig deep into configurations and metrics. But it is a good exercise in understanding your entire infrastructure configuration, whether it is the container registry, the networking, or the Kubernetes nodes themselves, and how it all works together to deliver your applications to your users.

The post Mitigating slow container image pulls on Azure Kubernetes Service appeared first on Daniel's Tech Blog.

Using Conftest for Azure Policy for Kubernetes

Conftest is a tool that lets you write tests against structured data like Kubernetes templates.

-> https://www.conftest.dev/

So, why should you use Conftest when you already established your policies with Azure Policy for Kubernetes?

As Azure Policy for Kubernetes uses Gatekeeper, the OPA implementation for Kubernetes, under the hood, it uses Gatekeeper constraint templates written in Rego. Tests written for Conftest are also written in Rego. Therefore, you can reuse the Rego part of the Gatekeeper constraint template for your Conftest test.

Providing Conftest tests to your developers makes life much easier for them. They may know how to write the Kubernetes templates to comply with all policies in place. But this might not be a guarantee for a successful deployment.

Without Conftest tests your developers need to check the replica set when a deployment fails to verify that a policy violation is the root cause for it.

This is cumbersome and not straightforward. And this is where Conftest tests come into play. Your developers include those tests in the application's deployment pipeline and ensure that the Kubernetes template complies with the policies in place before deploying it to the Azure Kubernetes Service cluster.

Write Conftest tests

After that long introduction let us write a Conftest test. For instance, I have a custom policy deployed to my AKS cluster via Azure Policy for Kubernetes which checks that pods have disabled the automount of the service account token when the service account is the default one.

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sdisableautomountserviceaccounttoken
spec:
  crd:
    spec:
      names:
        kind: K8sDisableAutomountServiceAccountToken
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisableautomountserviceaccounttoken
        missing(obj) = true {
          not obj.automountServiceAccountToken == true
          not obj.automountServiceAccountToken == false
          obj.serviceAccount == "default"
        }
        check(obj) = true {
          obj.automountServiceAccountToken
          obj.serviceAccount == "default"
        }
        violation[{"msg": msg}] {
          p := input_pod[_]
          missing(p.spec)
          msg := sprintf("automountServiceAccountToken field is missing for pod %v while using Service Account %v", [p.metadata.name, p.spec.serviceAccount])
        }
        violation[{"msg": msg, "details": {}}] {
          p := input_pod[_]
          check(p.spec)
          msg := sprintf("Service Account token automount is not allowed for pod %v while using Service Account %v, spec.automountServiceAccountToken: %v", [p.metadata.name, p.spec.serviceAccount, p.spec.automountServiceAccountToken])
        }
        input_pod[p] {
          p := input.review.object
        }

-> https://github.com/neumanndaniel/kubernetes/blob/master/conftest/constraint-template.yaml

Most policies you write target pods, as otherwise such policies written for deployments could easily be bypassed using a replica set or pod template for the deployment.

Our Conftest test needs to support deployments and cron jobs besides pods. Therefore, our test has three different input options covering all three kinds of Kubernetes objects.

input_pod[p] {
  input.kind == "Deployment"
  p := input.spec.template
}
input_pod[p] {
  input.kind == "CronJob"
  p := input.spec.jobTemplate.spec.template
}
input_pod[p] {
  input.kind == "Pod"
  p := input
}

The two violations we check for are mostly identical to the ones from the Gatekeeper constraint template.

violation[{"msg": msg}] {
  p := input_pod[_]
  missing(p.spec)
  msg := sprintf("automountServiceAccountToken field is missing for %v %v while using Service Account default", [input.kind, input.metadata.name])
}
violation[{"msg": msg, "details": {}}] {
  p := input_pod[_]
  check(p.spec)
  msg := sprintf("Service Account token automount is not allowed for %v %v while using Service Account default, spec.automountServiceAccountToken: %v", [input.kind, input.metadata.name, p.spec.automountServiceAccountToken])
}

Minor adjustments to the message parts and the variables are required to support the three kinds of Kubernetes objects mentioned before.

During a deployment Kubernetes automatically adds the serviceAccount field to the pod object with the value default if not defined otherwise. Hence, only one check and missing function is required for the Gatekeeper constraint template. For the Conftest test two additional check and missing functions are needed to check for a missing serviceAccount field in the Kubernetes templates we want to validate with Conftest.

missing(obj) {
  not obj.automountServiceAccountToken == true
  not obj.automountServiceAccountToken == false
  missingServiceAccount(obj, "serviceAccount")
}
check(obj) {
  obj.automountServiceAccountToken
  missingServiceAccount(obj, "serviceAccount")
}

Those two functions call the missingServiceAccount functions to determine a missing serviceAccount field.

missingServiceAccount(obj, field) {
  not obj[field]
}
missingServiceAccount(obj, field) {
  obj[field] == ""
}

And here is the full Conftest test.

package main
missingServiceAccount(obj, field) {
  not obj[field]
}
missingServiceAccount(obj, field) {
  obj[field] == ""
}
missing(obj) {
  not obj.automountServiceAccountToken == true
  not obj.automountServiceAccountToken == false
  missingServiceAccount(obj, "serviceAccount")
}
missing(obj) {
  not obj.automountServiceAccountToken == true
  not obj.automountServiceAccountToken == false
  obj.serviceAccount == "default"
}
check(obj) {
  obj.automountServiceAccountToken
  missingServiceAccount(obj, "serviceAccount")
}
check(obj) {
  obj.automountServiceAccountToken
  obj.serviceAccount == "default"
}
violation[{"msg": msg}] {
  p := input_pod[_]
  missing(p.spec)
  msg := sprintf("automountServiceAccountToken field is missing for %v %v while using Service Account default", [input.kind, input.metadata.name])
}
violation[{"msg": msg, "details": {}}] {
  p := input_pod[_]
  check(p.spec)
  msg := sprintf("Service Account token automount is not allowed for %v %v while using Service Account default, spec.automountServiceAccountToken: %v", [input.kind, input.metadata.name, p.spec.automountServiceAccountToken])
}
input_pod[p] {
  input.kind == "Deployment"
  p := input.spec.template
}
input_pod[p] {
  input.kind == "CronJob"
  p := input.spec.jobTemplate.spec.template
}
input_pod[p] {
  input.kind == "Pod"
  p := input
}

-> https://github.com/neumanndaniel/kubernetes/blob/master/conftest/test.rego

Use Conftest tests

Conftest tests are normally placed in a policy subfolder in your current working directory. But you can use the --policy option to provide the path to where you store your tests at a central location.

Assuming we placed the test into the policy subfolder the following command tests our Kubernetes templates.

> conftest test <path to template>
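
When the tests are stored at a central location instead, point Conftest at that directory with the --policy option. The path below is just an example.

> conftest test --policy <path to central policy folder> <path to template>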

I prepared three different Kubernetes templates all using the default service account with different settings violating or complying with the policy.

> conftest test deployment.yaml cronjob.yaml pod.yaml
FAIL - cronjob.yaml - main - Service Account token automount is not allowed for CronJob go-webapp while using Service Account default, spec.automountServiceAccountToken: true
FAIL - pod.yaml - main - automountServiceAccountToken field is missing for Pod go-webapp while using Service Account default
6 tests, 4 passed, 0 warnings, 2 failures, 0 exceptions

Terminal Conftest test results

As seen in the test results the pod template misses the automountServiceAccountToken field, the cron job template has set the automountServiceAccountToken field to true, and the deployment template complies with the policy.

Summary

Conftest tests help you validate your Kubernetes templates against your Azure Policy for Kubernetes policies before the actual deployment to your AKS cluster happens. Especially for your developers, those tests are tremendously helpful.

For more details about Conftest look into the documentation.

-> https://www.conftest.dev/

As always you find the Gatekeeper constraint template I am using as custom policy within Azure Policy for Kubernetes and the corresponding Conftest test in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/conftest

The post Using Conftest for Azure Policy for Kubernetes appeared first on Daniel's Tech Blog.
