
Azure Load Testing news

I have been using Azure Load Testing for my Azure Chaos Studio demos for a while now. The service provides an on-demand infrastructure to run your load tests as a managed service.

Recently, the service received some significant updates I like to share with you.

The first update targets the test duration. Previously capped at three hours, the maximum test duration can now be increased to 24 hours on request. That opens up new testing scenarios.

-> https://azure.microsoft.com/en-us/updates/azure-load-testing-run-tests-for-up-to-24-hours/?wt.mc_id=AZ-MVP-5000119

The second update also increases a previous limit. You can now use up to 400 engine instances per test run. That results in test runs simulating up to 100,000 virtual users.

-> https://azure.microsoft.com/en-us/updates/azure-load-testing-run-tests-with-100000-virtual-users/?wt.mc_id=AZ-MVP-5000119

Last but not least, the preview of the Azure CLI support arrived.

-> https://azure.microsoft.com/en-us/updates/azure-load-testing-create-and-manage-tests-and-test-runs-using-azure-cli/?wt.mc_id=AZ-MVP-5000119
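
As a rough sketch of what the new Azure CLI support looks like — the load extension, the resource names, and the JMeter test plan below are assumptions on my side, so check the linked announcement for the exact syntax:

❯ az extension add --name load
❯ az load create --name lt-demo --resource-group rg-loadtest --location westeurope
❯ az load test create --load-test-resource lt-demo --resource-group rg-loadtest \
  --test-id demo-test --test-plan loadtest.jmx --engine-instances 1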


My preparation and tips for the Certified Kubernetes Administrator exam

A few weeks back, I passed the Certified Kubernetes Administrator exam and finally checked off a long-term item on my to-do list.

-> https://www.cncf.io/certification/cka/
-> https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/

I have been working with Kubernetes for nearly seven years now, mostly with managed Kubernetes on Azure: Azure Kubernetes Service (AKS). Besides Azure Kubernetes Service, I use KinD (Kubernetes in Docker) or Kubernetes on Docker Desktop daily for testing. I have also played around with kubeadm, k3s, and Rancher Kubernetes Engine (RKE) in my home lab.

So, I would say I have quite some experience with Kubernetes and have not taken the exam lightly. Hence, I share with you how I prepared for the exam.

Preparation

Even though I am working daily with Kubernetes, I bought a book to prepare for the exam. Yes, a book. As I am addicted to the books from O’Reilly, I got myself the Certified Kubernetes Administrator (CKA) Study Guide by Benjamin Muschko.

-> https://www.oreilly.com/library/view/certified-kubernetes-administrator/9781098107215/

I went through the book and focused on topics like Kubernetes cluster creation, updates, and all things regarding etcd. Those topics are especially important to study for the exam when you normally use and work with managed Kubernetes.

After finishing the book, I did the exercises mostly on Kubernetes on Docker Desktop and the Kubernetes cluster and etcd part on two Raspberry Pi 4 with 4 GB memory each.

Besides the technical part, I invested time in getting very familiar with the Kubernetes documentation and how to find the relevant sections via the search in the shortest amount of time. The Kubernetes documentation is one allowed resource during the exam and is tremendously helpful in getting the correct syntax for the Kubernetes templates.

Final preparation

When you book your Certified Kubernetes Administrator exam, you get two killer.sh sessions for free to prepare for the exam with the exam simulator.

-> https://killer.sh/

I started using the exam simulator five days before my exam date. Each simulator session is available for 36 hours, and you can reset the simulator environment.

In total, I did two runs of the exam simulator per session and actually spent eight hours in the simulator getting familiar with the exam environment.

From those runs, I can tell you that the simulator prepares you well and gives you the confidence to pass the exam.

Would the simulator sessions alone have been enough to prepare for the exam? I would say yes, but better safe than sorry, as the exam is expensive. In my case, I used my discount code from KubeCon Europe and was fortunate enough that my employer, LeanIX, covered the exam costs.

One thing that should not be underestimated is getting enough sleep the night before the exam. Be well rested, as the exam is a two-hour fire drill and energy-sapping enough on its own.

Everyone who did the exam or works as an on-call engineer knows what I am talking about.

Tips for the exam

Now we come to the exciting part. First, I want to highlight that you should spend the first three to five minutes of the exam getting Firefox and the terminal ready. Investing those five minutes at the beginning pays off during the remaining time.

My first action was opening the Kubernetes documentation with Firefox.

-> https://kubernetes.io/docs/

After that, I opened the terminal and edited the .vimrc and .bashrc. Yes, I am using Vim and strongly recommend this text editor for the exam. The .vimrc contains three configurations by default, of which I kept only the following two.

set tabstop=2
set expandtab

set tabstop=2 defines that a tab stop is two spaces wide. The other one, set expandtab, tells Vim to insert spaces instead of the tab character, a huge difference when dealing with Kubernetes templates. Besides those two, I added a couple of other configurations to the .vimrc.

set number
set list
set lcs+=space:^
syntax on

I only want to highlight set number and set lcs+=space:^. Having line numbers in Vim is tremendously helpful when you mess up the Kubernetes template and kubectl apply -f returns the line number of the faulty configuration.

set lcs+=space:^ lets Vim display spaces, in my case as ^. Kubernetes templates are YAML files. Hence, correct indentation is crucial, and set lcs+=space:^ makes it straightforward to get the indentation right during the exam.

Vim in terminal window

After modifying the .vimrc, I added a couple of configurations to the .bashrc to have them available in the tmux windows as well.

Before we dive into those configurations, I have another tip for you. If you are not familiar with tmux or another terminal multiplexer, become familiar with terminal multiplexing. You will definitely need it in the exam.

alias kx='kubectl config use-context'
alias kn='kubectl config set-context --current --namespace'
alias info='kubectl config get-contexts'
export do="--dry-run=client -o yaml"

The kx, kn, and info aliases let you easily switch between clusters and namespaces and verify that you are working within the correct context.

One of the tips from the exam simulator was the export do="--dry-run=client -o yaml". It is a huge time saver, as you do not need to type --dry-run=client -o yaml every time you want to create a Kubernetes template from scratch with kubectl run or kubectl create. And you need to do this in nearly every, or at least every second, question.

Before we sum it up, another lifesaving tip from my side. When using kubectl run or kubectl create, always provide the target namespace with --namespace. This saves you from deploying to the wrong namespace when you forget to switch to the correct namespace with the kn alias.
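
A short example combining both tips — the deployment name web and the namespace team-a are just placeholders:

❯ kubectl create deployment web --image=nginx --namespace team-a $do > web.yaml
❯ kubectl apply -f web.yaml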

Summary

I hope you find the insights and tips helpful. The Certified Kubernetes Administrator exam is challenging but should not be an issue for people working with Kubernetes daily for several years.


Running Fluent Bit on Azure Linux in Azure Kubernetes Service

In May this year, Microsoft announced the general availability of the Azure Linux support in Azure Kubernetes Service.

-> https://azure.microsoft.com/en-us/updates/generally-available-azure-linux-support-in-aks/?WT.mc_id=AZ-MVP-5000119
-> https://techcommunity.microsoft.com/t5/linux-and-open-source-blog/introducing-the-azure-linux-container-host-for-aks/ba-p/3824101?WT.mc_id=AZ-MVP-5000119

Azure Linux is Microsoft's own Linux distribution, based on CBL-Mariner.

-> https://github.com/microsoft/CBL-Mariner

You can now choose between Ubuntu and Azure Linux as the host operating system for your node pools in Azure Kubernetes Service, where Ubuntu is still the default.

Today, we quickly focus on running Fluent Bit on Azure Linux in Azure Kubernetes Service.

Situation

Until September 19, 2023, Microsoft highlighted the following limitation of Azure Linux in its Azure Kubernetes Service documentation.

"Some addons, extensions, and open-source integrations may not be supported yet on Azure Linux. Azure Monitor, Grafana, Helm, Key Vault, and Container Insights are supported."

-> https://learn.microsoft.com/en-us/azure/aks/use-azure-linux?WT.mc_id=AZ-MVP-5000119
-> https://github.com/MicrosoftDocs/azure-docs/commit/54b63b106c932c835f8bf0cc0bb612e774f4d251

That said, they did not explicitly mention Fluent Bit as supported. But Container Insights uses the following open-source tools under the hood: Fluentd, Fluent Bit, and Telegraf.

-> https://github.com/microsoft/Docker-Provider

Hence, Fluent Bit should work without any issues on Azure Linux.

Evidence

I am running an Azure Kubernetes Service cluster with Kubernetes version 1.27.3 in my Azure subscription. The Azure Kubernetes Service cluster uses Azure Linux as the host operating system for its node pools.
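
For reference, such a node pool can be created with the --os-sku parameter of the Azure CLI — the cluster, resource group, and node pool names below are placeholders:

❯ az aks nodepool add --cluster-name aks-azst-1 --resource-group rg-azst-1 \
  --name azurelinux --os-sku AzureLinux --node-count 3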

Azure portal - AKS default node pool

❯ kubectl get nodes -o wide
NAME                              STATUS   ROLES   AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
aks-default-33543087-vmss00000l   Ready    agent   7m24s   v1.27.3   10.240.0.4    <none>        CBL-Mariner/Linux   5.15.122.1-2.cm2   containerd://1.6.18
aks-default-33543087-vmss00000m   Ready    agent   7m      v1.27.3   10.240.0.6    <none>        CBL-Mariner/Linux   5.15.122.1-2.cm2   containerd://1.6.18
aks-default-33543087-vmss00000n   Ready    agent   7m9s    v1.27.3   10.240.0.5    <none>        CBL-Mariner/Linux   5.15.122.1-2.cm2   containerd://1.6.18

My next step is the deployment of Fluent Bit in version 2.1.9 to the cluster by running my deployment script.

./deploy-fluent-bit.sh <resource_group> <log_analytics_workspace>

You can find my example configuration on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit

After a few seconds, Fluent Bit is up and running on Azure Linux.

Azure portal - AKS Kubernetes resource view

❯ kubectl get pods
NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-7kdtn   1/1     Running   0          2m30s
fluent-bit-brnmz   1/1     Running   0          2m38s
fluent-bit-v9wjs   1/1     Running   0          2m22s

❯ kubectl logs fluent-bit-7kdtn
Fluent Bit v2.1.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/20 18:47:32] [ info] [fluent bit] version=2.1.9, commit=c625ad7ea4, pid=1
[2023/09/20 18:47:32] [ info] [storage] ver=1.4.0, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2023/09/20 18:47:32] [ info] [storage] backlog input plugin: storage_backlog.2
[2023/09/20 18:47:32] [ info] [cmetrics] version=0.6.3
[2023/09/20 18:47:32] [ info] [ctraces ] version=0.3.1
[2023/09/20 18:47:32] [ info] [input:storage_backlog:storage_backlog.2] initializing
[2023/09/20 18:47:32] [ info] [input:storage_backlog:storage_backlog.2] storage_strategy='memory' (memory only)
[2023/09/20 18:47:32] [ info] [input:storage_backlog:storage_backlog.2] queue memory limit: 9.5M
[2023/09/20 18:47:32] [ info] [filter:kubernetes:logs_filter_1] https=1 host=10.240.0.6 port=10250
[2023/09/20 18:47:32] [ info] [filter:kubernetes:logs_filter_1]  token updated
[2023/09/20 18:47:32] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2023/09/20 18:47:32] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2023/09/20 18:47:33] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2023/09/20 18:47:33] [ info] [filter:kubernetes:events_filter_1] https=1 host=10.240.0.6 port=10250
[2023/09/20 18:47:33] [ info] [filter:kubernetes:events_filter_1]  token updated
[2023/09/20 18:47:33] [ info] [filter:kubernetes:events_filter_1] local POD info OK
[2023/09/20 18:47:33] [ info] [filter:kubernetes:events_filter_1] testing connectivity with Kubelet...
[2023/09/20 18:47:33] [ info] [filter:kubernetes:events_filter_1] connectivity OK
[2023/09/20 18:47:33] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2023/09/20 18:47:33] [ info] [sp] stream processor started

A few minutes later, we see the ingested log data in Log Analytics, the final proof of a correctly working Fluent Bit installation on Azure Linux in Azure Kubernetes Service.

Azure Log Analytics log query

Summary

Fluent Bit works out of the box on Azure Linux with the same configuration as on Ubuntu. If you are curious to learn more about Azure Linux, have a look at the following link.

-> https://learn.microsoft.com/en-us/azure/azure-linux/?WT.mc_id=AZ-MVP-5000119


Fluent Bit and Kata Containers on Azure Kubernetes Service

In the past, I have written two blog posts about how to run untrusted workloads on Azure Kubernetes Service.

-> https://www.danielstechblog.io/running-gvisor-on-azure-kubernetes-service-for-sandboxing-containers/
-> https://www.danielstechblog.io/using-kata-containers-on-azure-kubernetes-service-for-sandboxing-containers/

Today, I walk you through how to gather log data from an untrusted workload isolated by Kata Containers with Fluent Bit. When you hear isolated, the first thought that comes to mind is that only one pattern works for gathering log data: the sidecar pattern.

Fluent Bit would run as a sidecar in every isolated pod to provide the logging capability. Furthermore, you must ensure that your application writes stdout/stderr to a place where the sidecar container can read it, or use Fluent Bit's HTTP ingestion endpoint instead. The latter would be another solution that avoids a sidecar container in every isolated pod.
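
As a minimal sketch of the HTTP ingestion alternative — assuming Fluent Bit runs with the http input plugin enabled, which listens on port 9880 by default and derives the tag from the request path; the service name and namespace are placeholders:

❯ curl -s -X POST -H "Content-Type: application/json" \
  -d '{"log":"hello from the isolated workload"}' \
  http://fluent-bit.logging.svc.cluster.local:9880/app.log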

Both solutions have a lot of overhead for gathering log data from untrusted workloads isolated by Kata Containers. Fortunately, Kata Containers integrates well into containerd with Kubernetes via the Shim V2 API.

-> https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/containerd-kata.md#containerd-runtime-v2-api-shim-v2-api

Hence, we continue to use the existing Fluent Bit daemon set installation on our Azure Kubernetes Service cluster. Starting an untrusted workload does not differ from a trusted workload in the context of Fluent Bit. Log files of the different containers in a pod run by Kata Containers are stored under /var/log/containers on the Kubernetes node. An already in-place Fluent Bit installation picks up the new log data and ingests it into the logging backend.

Container log data in Azure Log Analytics

Let us look at one of the Kubernetes nodes that hosts the untrusted workload.

How does it work?

I ran the following command to add a Kata Containers node pool to the Azure Kubernetes Service cluster.

❯ AKS_CLUSTER_RG="rg-azst-1"
❯ AKS_CLUSTER_NAME="aks-azst-1"
❯ az aks nodepool add --cluster-name $AKS_CLUSTER_NAME --resource-group $AKS_CLUSTER_RG \
--name kata --os-sku mariner --workload-runtime KataMshvVmIsolation --node-vm-size Standard_D4s_v3 \
--node-taints kata=enabled:NoSchedule --labels kata=enabled --node-count 1 --zones 1 2 3

Kata Containers node pool - Azure portal

-> https://learn.microsoft.com/en-us/azure/aks/use-pod-sandboxing?WT.mc_id=AZ-MVP-5000119

After successfully deploying the Kata Containers node pool, we deploy an untrusted workload to this node pool.

Untrusted workload pod - Azure portal

apiVersion: v1
kind: Pod
metadata:
  name: nginx-kata-untrusted
spec:
  containers:
  - name: nginx-kata-untrusted
    image: nginx
  runtimeClassName: kata-mshv-vm-isolation
  tolerations:
    - key: kata
      operator: Equal
      value: "enabled"
      effect: NoSchedule
  nodeSelector:
    kata: enabled

We then use the kubectl debug command to examine the Kubernetes node’s container runtime configuration.

❯ kubectl debug node/aks-kata-26012561-vmss000001 -it --image=ubuntu

The Kubernetes node’s file system gets mounted under /host into the debug pod. Looking at the containerd configuration, we see the Kata Containers configuration.

❯ cat /host/etc/containerd/config.toml
version = 2
oom_score = 0
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "mcr.microsoft.com/oss/kubernetes/pause:3.6"
  [plugins."io.containerd.grpc.v1.cri".containerd]
    default_runtime_name = "runc"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      BinaryName = "/usr/bin/runc"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.untrusted]
      runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.untrusted.options]
      BinaryName = "/usr/bin/runc"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
      runtime_type = "io.containerd.kata.v2"
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
  [plugins."io.containerd.grpc.v1.cri".registry.headers]
    X-Meta-Source-Client = ["azure/aks"]
[metrics]
  address = "0.0.0.0:10257"

By checking the runtime class kata-mshv-vm-isolation, we get the handler reference that points to the runtime kata matching the containerd configuration.

❯ kubectl get runtimeclasses.node.k8s.io kata-mshv-vm-isolation -o yaml
apiVersion: node.k8s.io/v1
handler: kata
kind: RuntimeClass
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: kata-mshv-vm-isolation
scheduling:
  nodeSelector:
    kubernetes.azure.com/kata-mshv-vm-isolation: "true"

For the following command, we need the container id of our untrusted workload.

❯ kubectl get pods nginx-kata-untrusted -o json | jq '.status.containerStatuses[].containerID' -r | cut -d '/' -f3
c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d

As we used the kubectl debug command for our debug pod, the pod is not privileged, and we cannot use nsenter to run commands out of the debug pod on the Kubernetes node itself. Instead, we use the run command functionality for Azure Virtual Machine Scale Sets to retrieve the information.

❯ AKS_CLUSTER_RG="rg-aks-azst-1-nodes"
❯ AKS_CLUSTER_VMSS="aks-kata-26012561-vmss"
❯ AKS_CLUSTER_VMSS_INSTANCE=0

❯ az vmss run-command invoke -g $AKS_CLUSTER_RG -n $AKS_CLUSTER_VMSS --command-id RunShellScript \
  --instance-id $AKS_CLUSTER_VMSS_INSTANCE --scripts "systemctl status containerd | grep c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d"
{
  "value": [
    {
      "code": "ProvisioningState/succeeded",
      "displayStatus": "Provisioning succeeded",
      "level": "Info",
      "message": "Enable succeeded:
      [stdout]
      Sep 26 20:23:38 aks-kata-26012561-vmss000000 containerd[1392]: time=\"2023-09-26T20:23:38.736319554Z\" level=info msg=\"CreateContainer within sandbox \\\"a49dceacf12b7786e9ff93eecf9c17a0c52f8d101975503acef7b4ecc8d9e06c\\\" for &ContainerMetadata{Name:nginx-kata-untrusted,Attempt:0,} returns container id \\\"c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d\\\"\"
      Sep 26 20:23:38 aks-kata-26012561-vmss000000 containerd[1392]: time=\"2023-09-26T20:23:38.737003061Z\" level=info msg=\"StartContainer for \\\"c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d\\\"\"
      Sep 26 20:23:38 aks-kata-26012561-vmss000000 containerd[1392]: time=\"2023-09-26T20:23:38.811806460Z\" level=info msg=\"StartContainer for \\\"c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d\\\" returns successfully\"
      [stderr]
      ",
      "time": null
    }
  ]
}

The command above returns information from the containerd status showing that the untrusted workload has been scheduled and started correctly.

With the following command, we check if the untrusted workload runs in the Kata Containers sandbox.

❯ az vmss run-command invoke -g $AKS_CLUSTER_RG -n $AKS_CLUSTER_VMSS --command-id RunShellScript \
  --instance-id $AKS_CLUSTER_VMSS_INSTANCE --scripts "systemctl list-units | grep kata | grep c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d"
  {
    "value": [
      {
        "code": "ProvisioningState/succeeded",
        "displayStatus": "Provisioning succeeded",
        "level": "Info",
        "message": "Enable succeeded:
        [stdout]
        15d42cb3b4c287-hostname
          run-kata\\x2dcontainers-shared-sandboxes-a49dceacf12b7786e9ff93eecf9c17a0c52f8d101975503acef7b4ecc8d9e06c-mounts-c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d\\x2dae2fbc5dd0f923ed\\x2dtermination\\x2dlog.mount loaded active mounted   /run/kata-containers/shared/sandboxes/a49dceacf12b7786e9ff93eecf9c17a0c52f8d101975503acef7b4ecc8d9e06c/mounts/c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d-ae2fbc5dd0f923ed-termination-log
          ...
          run-kata\\x2dcontainers-shared-sandboxes-a49dceacf12b7786e9ff93eecf9c17a0c52f8d101975503acef7b4ecc8d9e06c-shared-c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d\\x2dc9c1c86332ee8d01\\x2dhosts.mount              loaded active mounted   /run/kata-containers/shared/sandboxes/a49dceacf12b7786e9ff93eecf9c17a0c52f8d101975503acef7b4ecc8d9e06c/shared/c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d-c9c1c86332ee8d01-hosts
        [stderr]
        ",
        "time": null
      }
    ]
  }

In the output, we see the key strings run-kata as well as /run/kata-containers. As mentioned earlier, we find the logs of a Kata Containers sandboxed pod under the standard log path for container logs on the Kubernetes node, which is /var/log/containers.

Again, we use the debug pod to examine the log file path.

❯ ls -ahl /host/var/log/containers | grep c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d

lrwxrwxrwx  1 root root  106 Sep 26 20:23 nginx-kata-untrusted_logging_nginx-kata-untrusted-c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d.log -> /var/log/pods/logging_nginx-kata-untrusted_5d775411-fc5c-4a96-bfd6-f5fc3c01194d/nginx-kata-untrusted/0.log

Finally, we run the following KQL query on the Azure Log Analytics workspace, retrieving the logs of the untrusted workload.

ContainerLogV2_CL
| where PodName_s == "nginx-kata-untrusted" and ContainerId_s == "c5fddf80b5f758af8d267477a48397d11b1b98bbfd0f304512fa959f38b88e9d"
| project TimeGenerated, PodNamespace_s, PodName_s, LogMessage_s, Computer

Container log data in Azure Log Analytics

Summary

Using Kata Containers on Azure Kubernetes Service lets you isolate untrusted workloads. Because Kata Containers integrates into containerd with Kubernetes via the Shim V2 API, the existing logging solution does not need any additional or specific configuration to pick up the logs from an untrusted workload.

You can find the example pod template in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/kata-containers


Using HTTP status code 307/308 for HTTPS redirect with the Istio ingress gateway

The gateway definition for the Istio ingress gateway provides a configuration parameter to enable the HTTPS redirect of HTTP connections.

-> https://istio.io/latest/docs/reference/config/networking/gateway/#ServerTLSSettings

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: azst-aks-gateway
  namespace: istio-config
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - hosts:
    - "*.danielstechblog.de"
    port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true
  - hosts:
    - "*.danielstechblog.de"
    port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: istio-ingress-cert

When the httpsRedirect parameter is true, the Istio ingress gateway sends a 301 redirect for HTTP connections to use HTTPS.

For most scenarios, this is sufficient. The downside of using a 301 redirect is that a POST request might arrive as a GET request at the HTTPS endpoint and cause unexpected behavior from a user perspective. Even though the specification requires that the method and body remain unchanged, not all user agents follow this. The same applies to the 302 redirect.

-> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/301
-> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/302

To provide an expected and consistent user experience, you use a 307 or 308 redirect. Both redirects guarantee that the method and body remain unchanged. Unfortunately, some web applications use the 308 redirect in a non-standard way. Hence, the 307 redirect is the most generic choice.

-> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/307
-> https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/308

Let us now dive into the configuration for the Istio ingress gateway.

Using a 307/308 HTTP redirect on Istio

When we want to provide a custom redirect like the 307 redirect, we need two gateway definitions: one for the actual HTTP to HTTPS redirect and a second one that handles the ingress routing to the correct application.

The issue with only having one gateway definition is that the HTTP to HTTPS redirect acts as a catch-all directive in the routing chain. In the case of a matching entry in the routing chain, requests to an application would be served unencrypted via HTTP instead of encrypted via HTTPS.

Below is the first gateway definition for the 307 redirect.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: azst-aks-gateway-redirect
  namespace: istio-config
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - hosts:
    - "*.danielstechblog.de"
    port:
      number: 80
      name: http
      protocol: HTTP

Here is the second one for the actual ingress routing.

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: azst-aks-gateway
  namespace: istio-config
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - hosts:
    - "*.danielstechblog.de"
    port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: istio-ingress-cert

Now, we apply the following virtual service definition to the gateway that does the 307 redirect.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: redirect
  namespace: istio-config
spec:
  gateways:
    - azst-aks-gateway-redirect
  hosts:
    - "*.danielstechblog.de"
  http:
    - name: redirect
      redirect:
        redirectCode: 307
        scheme: https

A quick validation with the following curl command shows that the 307 redirect works.

❯ curl -sIL http://aks.danielstechblog.de
HTTP/1.1 307 Temporary Redirect
location: https://aks.danielstechblog.de/
date: Tue, 05 Dec 2023 21:12:52 GMT
server: istio-envoy
transfer-encoding: chunked

HTTP/2 200
date: Tue, 05 Dec 2023 21:12:52 GMT
content-length: 1487
content-type: text/html; charset=utf-8
x-envoy-upstream-service-time: 12
server: istio-envoy

Browser with debug tools - networking tab

Summary

We have to keep several things in mind for the 307 redirect implementation, and the built-in 301 redirect might be enough for your use cases. So, you can either go with the built-in redirect or, with a few configuration changes, use a custom one.

You can find the example configurations in my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/istio-custom-redirect


Azure PostgreSQL Flexible Server – Feature set on par with Single Server

From its launch, the Azure PostgreSQL Flexible Server was the better option compared to the Single Server, especially from a performance perspective. However, the Flexible Server was missing important features that were built into the Single Server from the beginning.

Since the retirement announcement of the Single Server, it was time for Microsoft to bring the Flexible Server feature set on par.

-> https://azure.microsoft.com/en-us/updates/azure-database-for-postgresql-single-server-will-be-retired-migrate-to-flexible-server-by-28-march-2025?WT.mc_id=AZ-MVP-5000119
-> https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/retiring-azure-database-for-postgresql-single-server-in-2025/ba-p/3783783?WT.mc_id=AZ-MVP-5000119

Eventually, that happened in the fourth quarter of 2023.

What was missing?

A couple of features relevant for so-called day-2 operations were missing. Let us start with the most annoying ones.

Until autumn 2023, a compute or a storage resize required downtime of the DBMS. Even worse, when you resized compute and storage, you had two downtimes one after another. In the worst cases I have seen, the DBMS was down for 15-20 minutes. Without an online storage resize, storage auto-growth was unavailable, which required close monitoring of the storage consumption.

Since autumn 2023, the Flexible Server supports online storage resize and storage auto-growth. Only crossing the 4 TB storage size boundary still requires an offline resize.

-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-compute-storage?WT.mc_id=AZ-MVP-5000119#limitations-and-considerations
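
If you prefer the CLI over the Azure portal, enabling storage auto-growth is a single update call — a sketch with placeholder names; the parameter name is an assumption and may differ depending on your Azure CLI version:

❯ az postgres flexible-server update --resource-group rg-postgres --name psql-flex-demo \
  --storage-auto-grow Enabled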

Azure portal - Compute and storage section

Another helpful feature that is only available on the Flexible Server is the performance tier. You can provide more IOPS to your Flexible Server without increasing the storage size, which is also an online operation.

-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-compute-storage?WT.mc_id=AZ-MVP-5000119#iops-preview

Azure portal - Compute and storage section - performance tier

When we look at the compute resize, there has been a significant improvement. Provided that the region has enough capacity, the compute resize is a near-zero-downtime operation. So, the resize has a downtime of less than 30 seconds.

-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-scaling-resources?WT.mc_id=AZ-MVP-5000119
-> https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/near-zero-downtime-scaling-in-azure-database-for-postgresql/ba-p/3974282?WT.mc_id=AZ-MVP-5000119

If capacity is not available, the compute resize has an increased downtime, as the operation falls back to the standard resize procedure: from my experience, 3-10 minutes.
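
The compute resize itself is a SKU update — again a sketch with placeholder names:

❯ az postgres flexible-server update --resource-group rg-postgres --name psql-flex-demo \
  --tier GeneralPurpose --sku-name Standard_D4ds_v5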

A neat feature from the Single Server currently in preview for the Flexible Server is the server logs functionality. The server logs allow you to download the logs every hour from the DBMS without sending them to Log Analytics via the diagnostic settings.

-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-server-logs-portal?WT.mc_id=AZ-MVP-5000119
-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/howto-configure-and-access-logs?WT.mc_id=AZ-MVP-5000119

Azure portal - server logs section

Last but not least, what bothered me the most was the missing integration of Microsoft Defender for Cloud for the Azure PostgreSQL Flexible Server. This feature has been available since mid-December 2023.

-> https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/enhance-your-azure-postgresql-flexible-server-security-posture/ba-p/3999093?WT.mc_id=AZ-MVP-5000119

When you are still operating Azure PostgreSQL Single Server installations and have Microsoft Defender for Cloud configured, your Flexible Server installations are automatically onboarded and protected. Otherwise, you need to enable Microsoft Defender for Cloud yourself.

Azure portal - Microsoft Defender for Cloud

Summary

The latest additions to the Azure PostgreSQL Flexible Server feature set bring the Flexible Server on par with the Single Server and have removed the last show stoppers for a migration.


Configure Microsoft Defender for Cloud continuous export via Terraform

Microsoft Defender for Cloud supports the continuous export of a variety of data to Azure Event Hubs and Azure Log Analytics workspaces. When you use Azure Event Hubs, you can also stream that data to third-party solutions or Azure Data Explorer. The continuous export is especially handy for security alerts, as it lets you retain them longer than the default 90 days.

Using the Azure portal to configure the continuous export functionality is straightforward, but it gets cumbersome when configuring it for multiple subscriptions.

Azure portal - continuous export settings page

Infrastructure as code

This is where infrastructure as code comes into play to automate the configuration in a well-defined way. In our example, we use Terraform and the azurerm_security_center_automation resource to export security alerts to a Log Analytics workspace.

data "azurerm_client_config" "current" {
}

resource "azurerm_security_center_automation" "continuous_export" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name

  enabled = true

  action {
    type        = "loganalytics"
    resource_id = var.log_analytics_workspace_id
  }

  source {
    event_source = "Alerts"
    rule_set {
      rule {
        property_path  = "Severity"
        operator       = "Equals"
        expected_value = "high"
        property_type  = "String"
      }
    }
    rule_set {
      rule {
        property_path  = "Severity"
        operator       = "Equals"
        expected_value = "medium"
        property_type  = "String"
      }
    }
    rule_set {
      rule {
        property_path  = "Severity"
        operator       = "Equals"
        expected_value = "low"
        property_type  = "String"
      }
    }
  }

  scopes = ["/subscriptions/${data.azurerm_client_config.current.subscription_id}"]
}

We provide the Terraform module with the required inputs and apply those configuration changes.

module "microsoft_defender_continuous_export" {
  source = "../modules/microsoft_defender_continuous_export"

  name                       = "continuous-export"
  resource_group_name        = "continuous-export-config"
  location                   = "northeurope"
  log_analytics_workspace_id = "/subscriptions/<subscription_id>/resourceGroups/operations-management/providers/Microsoft.OperationalInsights/workspaces/sentinel-sec"
}

Afterward, we have a look at the Azure portal and see that the export object has been created. However, the settings page for the continuous export still shows an unconfigured continuous export.

Azure portal resource group and continuous export settings page

Does our configuration actually work? The answer is yes, and triggering demo alerts, for instance, for Azure Key Vault, provides the proof.

Azure portal Log Analytics workspace and Microsoft Defender for Cloud

As seen in the screenshot above, the security alerts were exported to the Log Analytics workspace.

Make continuous export configuration visible

There is still the question of why the settings page shows an unconfigured continuous export. First, we can configure multiple continuous exports for Microsoft Defender for Cloud on a subscription with different targets, for instance, different Log Analytics workspaces. Second, the settings page expects a specific name for the configuration, and the name is ExportToWorkspace.

Now we know how to make the configuration visible in the Azure portal on the settings page.

module "microsoft_defender_continuous_export" {
  source = "../modules/microsoft_defender_continuous_export"

  name                       = "ExportToWorkspace"
  resource_group_name        = "continuous-export-config"
  location                   = "northeurope"
  log_analytics_workspace_id = "/subscriptions/<subscription_id>/resourceGroups/operations-management/providers/Microsoft.OperationalInsights/workspaces/sentinel-sec"
}

Applying the adapted Terraform module deletes the former export object and creates a new one with the name ExportToWorkspace.

Azure portal resource group and continuous export settings page

The configuration is now visible. Again, we trigger demo alerts to verify our configuration.

Azure portal Log Analytics workspace and Microsoft Defender for Cloud

Summary

When configuring the continuous export of Microsoft Defender for Cloud for a subscription, you should use the name ExportToWorkspace or ExportToHub. Using the expected names ensures that the default continuous export configuration is visible in the Azure portal on the settings page. Additional continuous export configurations can have a different name and are fully functional alongside the default continuous export configuration.

You can find the Terraform module on my GitHub repository.

-> https://github.com/neumanndaniel/terraform/tree/master/modules/microsoft_defender_continuous_export

Azure Kubernetes Fleet Manager – Advance your Kubernetes cluster update management on Azure

The Azure Kubernetes Fleet Manager comes with two different configuration options: with and without a hub cluster.

In today's blog post, we focus on the Azure Kubernetes Fleet Manager without a hub cluster. This configuration option only provides the Azure Kubernetes Service update management, and this is our focus for today.

Before we dive into the topic, let us step back and answer the question of why we need the Azure Kubernetes Fleet Manager in times of infrastructure as code.

Why do we need the Azure Kubernetes Fleet Manager?

Imagine you use Terraform for your infrastructure as code and use either GitHub Actions or Terraform Cloud to apply your definitions. You have a large number of Azure Kubernetes Service clusters with hundreds of nodes each. Depending on how you have configured the max surge setting, a Kubernetes version upgrade can take a long time.

This is a problem in two ways. First, the Terraform provider for Azure has a default 90-minute timeout configured for the azurerm_kubernetes_cluster resource. It might be that you need to overwrite the default value to prevent running into a timeout. Second, costs. For instance, GitHub Actions are billed per minute and have default timeouts as well.

Another issue arises with infrastructure as code when using the automated Kubernetes version upgrades of Azure Kubernetes Service. You must then use the lifecycle ignore_changes instruction to prevent changes to the Kubernetes version when you apply your infrastructure as code configuration. This makes a Kubernetes version upgrade a bit more work, as you need to remove the instruction first and re-add it later.

This is where the Azure Kubernetes Fleet Manager comes into play, allowing better scheduling, control, and execution of Kubernetes version upgrades.

Fleet Manager Deployment

In the Azure portal, we create a new Azure Kubernetes Fleet Manager instance and select the hub cluster mode without a hub cluster.

Azure Kubernetes Fleet Manager - Azure portal

Afterward, we onboard the existing Azure Kubernetes Service clusters, in this example aks-azst-1 and aks-azst-2, both running Kubernetes version 1.27.7.

Azure Kubernetes Fleet Manager - Azure portal

During the onboarding of the Azure Kubernetes Service clusters to the Fleet Manager instance, you can define an update group for each cluster, which becomes important at a later stage. In our case, aks-azst-1 is assigned to the canary update group and aks-azst-2 to the production one.
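
When onboarding via the CLI instead of the portal, the update group can be set directly on the member resource — a sketch using the az fleet extension with placeholder names; the --update-group parameter is an assumption and may differ between extension versions:

❯ AKS_CLUSTER_ID=$(az aks show --resource-group rg-aks --name aks-azst-1 --query id --output tsv)
❯ az fleet member create --resource-group rg-fleet --fleet-name fleet-demo \
  --name aks-azst-1 --member-cluster-id $AKS_CLUSTER_ID --update-group canary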

Azure Kubernetes Fleet Manager - Azure portal

The clusters are now members of the Fleet Manager instance and show up on the overview page reporting their Kubernetes and node OS image versions.

Azure Kubernetes Fleet Manager - Azure portal

Define an update strategy

Before we start a Kubernetes version upgrade of our Azure Kubernetes Service clusters, we define an update strategy to control the overall update process.

Azure Kubernetes Fleet Manager update strategy - Azure portal

Within a strategy, we can define multiple stages to control the update process. The first stage is called canary and targets the canary update group. Before we proceed with the next stage in the strategy, the update of the remaining Azure Kubernetes Service clusters is paused for an hour.

Azure Kubernetes Fleet Manager update strategy - Azure portal

Our second stage is called production and targets our production update group.

Azure Kubernetes Fleet Manager update strategy - Azure portal

The update strategy is now ready to be used within an update run.

Azure Kubernetes Fleet Manager update strategy - Azure portal

Update stages are executed sequentially during an update run, and all update groups within a stage in parallel.

-> https://learn.microsoft.com/en-us/azure/kubernetes-fleet/architectural-overview?WT.mc_id=AZ-MVP-5000119#update-orchestration-across-multiple-clusters

Execute an update run

Defining an update run does not execute the Kubernetes upgrade immediately. It has to be triggered manually by starting the defined update run.

For our update run, we set the update sequence to stages and copy the stages from our previously created update strategy. We set the upgrade scope to Kubernetes version and select 1.28.3 as the target version. Furthermore, we want to use the latest available node OS image in the respective Azure regions.

Azure Kubernetes Fleet Manager update run - Azure portal

As mentioned before, an update run does not start automatically. Hence, we select the update run and hit Start, which kicks off the run.
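
The same can be scripted with the az fleet extension — a hedged sketch with placeholder names, as the exact parameter names may differ between extension versions; the stage definition, which I omit here, can be passed in as well:

❯ az fleet updaterun create --resource-group rg-fleet --fleet-name fleet-demo \
  --name upgrade-1-28-3 --upgrade-type Full --kubernetes-version 1.28.3 \
  --node-image-selection Latest
❯ az fleet updaterun start --resource-group rg-fleet --fleet-name fleet-demo --name upgrade-1-28-3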

Azure Kubernetes Fleet Manager update run - Azure portal

By clicking on the update run and the different stages, we get important information about the execution states of our update in progress.

Azure Kubernetes Fleet Manager update run - Azure portal

Once the update run has succeeded, we can check the clusters' Kubernetes and node OS image versions on the Fleet Manager's overview page.

Azure Kubernetes Fleet Manager update run - Azure portal

Both Azure Kubernetes Service clusters are now running Kubernetes version 1.28.3.

Summary

The Azure Kubernetes Fleet Manager is an invaluable addition to Azure Kubernetes Service cluster management in Azure and makes Kubernetes version upgrades a breeze, also in combination with existing infrastructure as code configurations.

Besides the Kubernetes version upgrade capabilities, the Fleet Manager can also replicate and maintain Kubernetes resource objects when running with the hub cluster configuration.

-> https://learn.microsoft.com/en-us/azure/kubernetes-fleet/resource-propagation?WT.mc_id=AZ-MVP-5000119

You can use Terraform to define the described update configuration entirely in infrastructure as code.

-> Create Azure Kubernetes Fleet Manager: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_manager
-> Add member clusters: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_member
-> Define an update strategy: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_fleet_update_strategy

The only thing right now that needs to be defined and executed manually is the actual update run.

For further information about the Azure Kubernetes Fleet Manager, have a look at the Azure documentation.

-> https://learn.microsoft.com/en-us/azure/kubernetes-fleet/?WT.mc_id=AZ-MVP-5000119


Show enabled feature gates on an Azure Kubernetes Service cluster

Recently, I needed to check which feature gates are active on an Azure Kubernetes Service cluster running Kubernetes version 1.29.0. In particular, I was interested in the SidecarContainers feature gate, which brings support for running sidecar containers as init containers. For instance, a service mesh proxy container now starts before the main container, which solves a couple of issues with service mesh proxies in Kubernetes.

The SidecarContainers feature gate was introduced in Kubernetes version 1.28 as an alpha version and graduated to beta with Kubernetes version 1.29.

-> https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Microsoft states that beta feature gates are enabled by default on Azure Kubernetes Service. But I wanted to be sure upfront before trying out the new init container implementation.

So, I ran a search engine query and came across a Kubernetes GitHub issue and the Google Cloud Platform documentation.

-> https://github.com/kubernetes/kubernetes/issues/87869#issuecomment-1465634605
-> https://cloud.google.com/kubernetes-engine/docs/concepts/feature-gates#check-feature-gate-state

Since Kubernetes version 1.26, feature gate information is available via the /metrics endpoint as a gauge. A one indicates the feature gate is enabled, and a zero indicates it is disabled.

I ran the following command to determine if the SidecarContainers feature gate is enabled on Azure Kubernetes Service running Kubernetes version 1.29.

❯ kubectl get --raw /metrics | grep kubernetes_feature_enabled | grep Sidecar
kubernetes_feature_enabled{name="SidecarContainers",stage="BETA"} 1

When you want to show all feature gates, remove the last grep command to get the entire list.

❯ kubectl get --raw /metrics | grep kubernetes_feature_enabled
# HELP kubernetes_feature_enabled [BETA] This metric records the data about the stage and enablement of a k8s feature.
# TYPE kubernetes_feature_enabled gauge
kubernetes_feature_enabled{name="APIListChunking",stage=""} 1
kubernetes_feature_enabled{name="APIPriorityAndFairness",stage=""} 1
kubernetes_feature_enabled{name="APIResponseCompression",stage="BETA"} 1
kubernetes_feature_enabled{name="APISelfSubjectReview",stage=""} 1
kubernetes_feature_enabled{name="APIServerIdentity",stage="BETA"} 1
...
kubernetes_feature_enabled{name="WatchList",stage="ALPHA"} 0
kubernetes_feature_enabled{name="WinDSR",stage="ALPHA"} 0
kubernetes_feature_enabled{name="WinOverlay",stage="BETA"} 1
kubernetes_feature_enabled{name="WindowsHostNetwork",stage="ALPHA"} 1
kubernetes_feature_enabled{name="ZeroLimitedNominalConcurrencyShares",stage="BETA"} 0

Using Istio with Kubernetes native sidecars on Azure Kubernetes Service

In my previous blog post, I showed you how to check for specific feature gates on an Azure Kubernetes Service cluster.

-> https://www.danielstechblog.io/show-enabled-feature-gates-on-an-azure-kubernetes-service-cluster/

I did this especially for the SidecarContainers feature gate, which is enabled on Azure Kubernetes Service running Kubernetes version 1.29 or higher.

The SidecarContainers feature gate brings support for running sidecar containers as init containers. For instance, a service mesh proxy container now starts before the main container and solves a couple of issues with service mesh proxies in Kubernetes.

It was introduced in Kubernetes version 1.28 as an alpha version and graduated to beta with Kubernetes version 1.29.

-> https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Today, I am walking you through how to use Istio with Kubernetes native sidecars on Azure Kubernetes Service.

As stated in the Istio blog post from 2023, an environment variable called ENABLE_NATIVE_SIDECARS needs to be set to true.

-> https://istio.io/latest/blog/2023/native-sidecars/

I use the IstioOperator custom resource definition to define my Istio installation configuration options in a YAML file.

The following configuration activates the Kubernetes native sidecar support in Istio.

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  components:
    ...
  meshConfig:
    ...
  values:
    global:
      ...
    pilot:
      env:
        PILOT_ENABLE_STATUS: true
        ENABLE_NATIVE_SIDECARS: true
    sidecarInjectorWebhook:
      rewriteAppHTTPProbe: true

After applying the IstioOperator configuration, we check if the istio-proxy is now running as an init container. For that, I deployed a simple container application in its own namespace.

❯ kubectl images -c 1,2
[Summary]: 1 namespaces, 3 pods, 9 containers and 2 different images
+----------------------------+--------------------+
|            Pod             |     Container      |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-8kp7m | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-f4hrf | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-mrbc9 | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+

As seen in the above output, the istio-proxy is now running as a Kubernetes native sidecar.
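
You can also verify this with plain kubectl by listing the init container names of one of the pods in the application's namespace — the pod name is taken from the output above:

❯ kubectl get pod go-webapp-64cc9779d4-8kp7m -o jsonpath='{.spec.initContainers[*].name}'
istio-init istio-proxy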

You can find the full example IstioOperator configuration file on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/istio/istio-1.21.yaml

Cost optimize your Azure PostgreSQL Flexible Server deployments

As I am currently preparing my session for Experts Live Germany about Azure Cost Optimization, I thought it might be good to share parts of the session as blog articles with the community. So, expect more to come in the next weeks and months.

Today, we focus on cost-optimizing Azure PostgreSQL Flexible Server deployments. For that, we look at the different compute SKUs, the performance tier feature, and the Premium SSD v2 support.

Compute SKU

When we look at the current compute SKU offering for the Flexible Server, we have the choice between V3, V4, and V5 for the General Purpose and Memory Optimized offerings. Between those compute SKU versions, there is a major performance difference, depending on the workload.

According to Microsoft, the V4 can be up to 40% faster than the V3.

“So, depending on your workload and your data size, you could expect up to 40% performance improvement with V4 series compared to V3.”

-> https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/flexible-server-now-supports-v4-compute-series-in-postgresql-on/ba-p/2815092?WT.mc_id=AZ-MVP-5000119

Up to 50% more performance is provided by the V5 compared to the V4.

“The addition of Intel V5 Ddsv5 and EdsV5 compute support for Azure Database for PostgreSQL Flexible Server in select regions takes performance to new heights offering 50% increase in core to memory ratio compared to the previous generation (V4 SKU).”

-> https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/introducing-intel-v5-compute-and-32-tb-storage-support-on-azure/ba-p/3839849?WT.mc_id=AZ-MVP-5000119

Let us look at the following table, where we compare the smallest General Purpose Flexible Server SKUs with two cores and 8 GB memory against each other.

SKU version | Performance improvement compared to the previous SKU version | Compute instance   | Region      | Price per month (14.05.2024)
V3          | -                                                             | D2s_v3             | West Europe | 139 €
V4          | Up to 40%                                                     | D2ds_v4            | West Europe | 145 €
V5          | Up to 50%                                                     | D2ds_v5 / D2ads_v5 | West Europe | 145 €

The price difference per month is small in this example and gets bigger with larger instances. An interesting and unfortunate detail is that reservations are missing for the V5 SKU version. So, the general advice regarding the SKU version is to switch from a V3 to a V4 or V5. Even if it means you pay a bit more per month, the SKU version switch might save you from scaling the current instance to the next instance size, which would double the costs.

Performance Tier

Still in preview, the performance tier feature allows you to provide more IOPS and throughput to the currently selected disk size than it would have by default. Before that feature, you had to scale to the next available disk size or an even larger one to achieve this, even if you did not need the additional disk space.

Depending on how much disk speed you need, you may also have to scale the Azure PostgreSQL Flexible Server deployment to the next instance size, as the compute instance then becomes the limiting factor.

Azure Portal - PostgreSQL - Compute and Storage

The only downside of the performance tier feature is the cost increase. It is the same increase as if you scaled to the next larger disk size. On the other hand, you can scale the performance tier back down to the disk's default, which is not possible for the disk size. This makes the performance tier compelling in scenarios where you only need additional performance for peak times.

Furthermore, the performance tier feature might be an intermediate solution for more disk performance without an increase in disk size until Premium SSD v2 is generally available for Azure PostgreSQL Flexible Server.

Premium SSD v2

Premium SSD v2 is another preview feature for the Azure PostgreSQL Flexible Server, but it comes with a lot of limitations that make it not compelling right now for every use case.

-> https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-storage?WT.mc_id=AZ-MVP-5000119#premium-ssd-v2-early-preview-limitations

But we are not here to talk about the limits. We want to understand what Premium SSD v2 improves for the Flexible Server deployments.

The Premium SSD v2 feature provides total flexibility in size, IOPS, and throughput. All three storage configuration options can be adapted independently from each other. Instead of choosing a preconfigured disk size, we can fine-tune the storage to the needs of the databases hosted by the Flexible Server.

Another advantage of the Premium SSD v2 option compared to the Premium SSD option is the better cost-performance ratio.

Let us have a look at an example Flexible Server configuration using the Standard_D2s_v3 compute SKU in Central US, once with the Premium SSD v2 option and once without.

Azure Portal - PostgreSQL - Compute and Storage - Premium SSD v2 Azure Portal - PostgreSQL - Compute and Storage - Premium SSD

As seen above, both Flexible Servers have 128 GB storage and cost the same per month. Using the Premium SSD v2 option provides the Flexible Server constantly with 3000 IOPS and 125 MB/s throughput. The Premium SSD option only provides 500 IOPS and 100 MB/s throughput but can burst occasionally above these baseline values.

Azure Portal - PostgreSQL - Compute and Storage - Premium SSD v2 Azure Portal - PostgreSQL - Compute and Storage - Premium SSD

When we now want to have the performance of a P30 disk size, we scale IOPS and throughput on the Premium SSD v2 to 5000 IOPS and 200 MB/s throughput. The other Flexible Server can be scaled by increasing the disk size or using the performance tier feature. From a cost perspective, both options are the same. Comparing Premium SSDs v2 against Premium SSD, the Premium SSD v2 option saves us around 66 € per month.

Summary

The cost optimization potential of an Azure PostgreSQL Flexible Server deployment depends on the compute SKU. When running on a V3 or V4 compute SKU, there is optimization potential available.

On the storage side, both features are in preview. The only real option right now is the performance tier, as Premium SSD v2 has too many limitations at the moment. But the performance tier is only an interim solution until the general availability of the Premium SSD v2 feature.

Microsoft drops data transfer charges between Availability Zones

An important announcement during this year’s Microsoft Build has not gotten much traction and might have been missed in the vast amount of updates and announcements.

Microsoft did an update to its pricing structure for inter-zone traffic.

“We are announcing that Azure will not charge for the data transfer across availability zones regardless of using private or public IPs on your Azure resources.”

-> https://azure.microsoft.com/en-us/updates/update-on-interavailability-zone-data-transfer-pricing/

Microsoft dropped the charges for traffic between Availability Zones within an Azure region. Before that, you were charged $0.01 per GB.
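
To put the old pricing into perspective: a workload that pushed 10 TB per month across Availability Zones paid about $100 per month for that traffic alone at $0.01 per GB. This charge is now gone.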

Web Archive Azure Pricing page for bandwidth

-> https://web.archive.org/web/20240520015843/https://azure.microsoft.com/en-us/pricing/details/bandwidth/

As of writing this blog article, 07.06.2024, GCP and AWS still charge for inter-zone traffic; both charge $0.01 per GB.

GCP Network pricing page

-> https://cloud.google.com/vpc/network-pricing

AWS Pricing page

“Data transferred “in” to and “out” from Amazon EC2, Amazon RDS, Amazon Redshift, Amazon DynamoDB Accelerator (DAX), and Amazon ElastiCache instances, Elastic Network Interfaces or VPC Peering connections across Availability Zones in the same AWS Region is charged at $0.01/GB in each direction.”

-> https://aws.amazon.com/ec2/pricing/on-demand/#Data_Transfer_within_the_same_AWS_Region

Ingesting Azure Diagnostic Logs into Azure Data Explorer

In today’s blog post, we look at the Azure Diagnostic Logs and how to ingest them into Azure Data Explorer. Besides the Diagnostic Logs, we cover Activity Logs and Diagnostic Metrics as well.

All three types of log and monitoring data can be easily exported to an Azure Storage Account, an Event Hub, or a Log Analytics workspace.

Azure portal - Diagnostic Logs export options

Unfortunately, there is no direct export integration for Azure Data Explorer available.

Azure Data Explorer – Ingestion Method

Looking at the export options for the log and monitoring data, we can choose between an Azure Storage Account and an Event Hub. Both solutions act as a middleware component that stores the data until it is ingested into Azure Data Explorer.

The Storage Account option can get expensive when a lot of files are written to and read from the Storage Account, due to transaction costs. Furthermore, the Storage Account option requires an Event Grid and an Event Hub to get the log and monitoring data into Azure Data Explorer. On the other hand, you can store the exported data in the Storage Account for a longer period.

In our case, we choose the Azure Event Hub export option. I created the Event Hub Namespace upfront with two dedicated Event Hubs, each with four partitions, and enabled the auto-inflate functionality for the Event Hub throughput units.

Azure portal - Event Hubs

Configure ingestion to Azure Data Explorer

Before we start to prepare everything on the Azure Data Explorer side, we configure the export of the subscription's Activity Logs to the Event Hub activity_logs, followed by the Diagnostic Logs and Metrics of the Azure Data Explorer cluster to the Event Hub diagnostic_logs.

Azure portal - Activity Logs export options Azure portal - Diagnostic Logs export options

Now, we begin with the Azure Data Explorer configuration. Microsoft has excellent documentation, which I used as a base for the following configuration.

-> https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-no-code?WT.mc_id=AZ-MVP-5000119&tabs=diagnostic-logs

We now walk through the Diagnostic Logs case in detail. The link to the code examples for the Activity Logs and Diagnostic Metrics is at the end of this blog post.

Our first step is the provisioning of a new database, called AzureMonitor, in our Azure Data Explorer cluster with the default settings.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Once created, we open the query editor to prepare the tables for the log and monitor data.

The initial table for the Diagnostic Logs contained only one column, called RawRecord, which I used to understand the structure of the Diagnostic Logs. After understanding the structure, I used the alter command to bring the table into its final structure and also updated the function we will see later on. When you look at our first KQL command, you will recognize that I kept the RawRecord column. You do not need to do the same.

.create table DiagnosticLogs (
    TimeGenerated: datetime, ResourceId: string, OperationName: string,
    OperationVersion: string, Category: string, CorrelationId: string,
    Result: string, Properties: dynamic, RawRecord: dynamic
    )

After running the command, we have our table called DiagnosticLogs. The next table we create is called DiagnosticRawRecords and is used for the data ingestion from the diagnostic_logs Event Hub.

.create table DiagnosticRawRecords (Records: dynamic)

As we do not want to store data in this table, we set the retention policy to 0.

.alter-merge table DiagnosticRawRecords policy retention softdelete = 0d

Our next step is the ingestion mapping to ensure a correct ingestion into the table.

.create table DiagnosticRawRecords ingestion json mapping 'DiagnosticRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$.records"}}]'

Getting the ingested log data and monitor data into the target table DiagnosticLogs requires a KQL function and an update policy on the table.

.create function DiagnosticLogsExpand() {
        DiagnosticRawRecords
        | mv-expand events = Records
        | where isnotempty(events.operationName)
        | project
            TimeGenerated = todatetime(events['time']),
            ResourceId = tostring(events.resourceId),
            OperationName = tostring(events.operationName),
            OperationVersion = tostring(events.operationVersion),
            Category = tostring(events.category),
            CorrelationId = tostring(events.correlationId),
            Result = tostring(events.resultType),
            Properties = events.properties,
            RawRecord = events
    }

The KQL function above uses the mv-expand operator to extract the different JSON values from the Records column into a new output called events. With the project operator, we map those values onto our target table structure.

.alter table DiagnosticLogs policy update @'[{"Source": "DiagnosticRawRecords", "Query": "DiagnosticLogsExpand()", "IsEnabled": "True", "IsTransactional": true}]'

By running the above KQL command, we set the update policy on our target table. Whenever a new record arrives in the source table DiagnosticRawRecords, Azure Data Explorer executes the previously defined function and ingests the result into our target table DiagnosticLogs.

-> https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/update-policy?WT.mc_id=AZ-MVP-5000119

Before we proceed, we create the other tables for the Activity Logs and Diagnostic Metrics. As mentioned earlier, here are the links to the KQL commands.

-> https://github.com/neumanndaniel/scripts/tree/main/Azure_Data_Explorer/Diagnostic_Logs

One thing you might already notice is that we only have one raw records table, DiagnosticRawRecords, for the Diagnostic Logs and Metrics. This is possible as both data types can be easily distinguished from each other: Diagnostic Logs have the operationName value in their records, and Diagnostic Metrics have the metricName value. Those values are used in the different KQL functions.
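
For illustration, a corresponding function for the Diagnostic Metrics could look like the following sketch. The field and column names are assumptions based on the Azure diagnostic metrics schema; the exact definitions for the DiagnosticMetrics table and function are in the repository linked above.

.create function DiagnosticMetricsExpand() {
        DiagnosticRawRecords
        | mv-expand events = Records
        | where isnotempty(events.metricName)
        | project
            TimeGenerated = todatetime(events['time']),
            ResourceId = tostring(events.resourceId),
            MetricName = tostring(events.metricName),
            Total = todouble(events.total),
            Count = toint(events['count']),
            Minimum = todouble(events.minimum),
            Maximum = todouble(events.maximum),
            Average = todouble(events.average),
            TimeGrain = tostring(events.timeGrain)
    }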

Azure Data Explorer - Query Editor

Connect Event Hubs with Azure Data Explorer

After provisioning the table for the data ingestion, we create the necessary data connections between the Event Hubs and the Azure Data Explorer database.

Azure Data Explorer - Data Connections

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-event-hubs-connection?WT.mc_id=AZ-MVP-5000119&tabs=portalADX%2Cget-data-2

As seen in the screenshot, we provide a name for the data connection and select the appropriate Event Hub. The compression setting is kept at its default, None. Furthermore, we provide the name of the raw records table with the corresponding ingestion mapping. Last but not least, we select the managed identity type for the data connection, in our case system-assigned.

Once the data connection has been created, we can monitor the connection and see how many events have been received and processed.

Azure Data Explorer - Data Connections Monitor

Reducing ingestion latency

By default, our tables with the Event Hub connections use the queued ingestion method. When data is finally ingested is defined by three configuration parameters for the ingestion batches: time, item count, and size. The default values for those parameters are 5 minutes, 1000 items, and 1 GB. Whichever threshold is reached first triggers the final ingestion.

In the worst case, we have an ingestion latency of 5 minutes. It might be fast enough, but when we want near real-time ingestion, we either customize the batch ingestion policy or enable the streaming ingestion policy.

-> https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-overview?WT.mc_id=AZ-MVP-5000119
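
For the former, a custom batch ingestion policy on the database could look like the following sketch. The values are only an example and should be tuned to your ingestion volume.

.alter database AzureMonitor policy ingestionbatching
```
{
"MaximumBatchingTimeSpan": "00:00:30",
"MaximumNumberOfItems": 500,
"MaximumRawDataSizeMB": 1024
}
```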

For the latter, streaming ingestion must be enabled on the Azure Data Explorer cluster.

We decide on the streaming ingestion policy and enable the policy on the whole Azure Data Explorer database instead of specific tables.

.alter database AzureMonitor policy streamingingestion enable

Querying log and monitor data

Finally, we run our first queries against the log and monitor data.

DiagnosticLogs
| where TimeGenerated > ago(1d)

Azure Data Explorer - DiagnosticLogs table

In the screenshot above, we see the Diagnostic Logs of the Azure Data Explorer cluster.

One of the more interesting data points is the Diagnostic Metric about the ingestion latency.

DiagnosticMetrics
| where TimeGenerated > ago(1d)
| where MetricName == "IngestionLatencyInSeconds"
| extend IngestionLatency=Total
| project TimeGenerated, ["Ingestion Latency (sec)"]=IngestionLatency
| render timechart

Azure Data Explorer - DiagnosticMetrics table Azure Data Explorer - DiagnosticMetrics table

The screenshots show the near real-time ingestion after the streaming ingestion policy has been enabled.

Summary

Activity Logs, Diagnostic Logs, and Diagnostic Metrics can be ingested via Event Hubs into Azure Data Explorer. It is indeed more effort than directly using an Azure Log Analytics workspace, but depending on your needs and requirements it can be the better solution.

For instance, from a pure infrastructure cost perspective, the solution with Azure Data Explorer and Event Hub is less expensive than Log Analytics when you ingest a lot of data. If data retention is a topic for you, Azure Data Explorer again has an advantage, as you can define flexible retention periods; even an infinite retention period is possible. As of writing this blog post, Log Analytics is limited to an interactive retention period of 730 days max and an archive retention period of up to 12 years.

Ultimately, it depends on your needs and requirements if you choose Azure Data Explorer or Log Analytics to store your log and monitor data.

The example KQL files for the Activity Logs, Diagnostic Logs, and Diagnostic Metrics can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/tree/main/Azure_Data_Explorer/Diagnostic_Logs

Using Azure Data Explorer as logging backend for Fluent Bit

Azure Data Explorer can be used as a logging backend for Fluent Bit in three different ways. In today's blog post, we focus on the one that, in my opinion, is the best of the three.

This is the way

Let us start with a brief overview of the three different solutions and why I have chosen the not-so-obvious one. In my last blog post, I already talked about two of them, Azure Storage and Azure Event Hub, not specifically for Fluent Bit but as ingestion methods for Azure Data Explorer.

-> https://www.danielstechblog.io/ingesting-azure-diagnostic-logs-into-azure-data-explorer/

Before we cover those two options, let us look at the first option that comes to mind when browsing Fluent Bit's output plugin offerings: there is a native output plugin for Azure Data Explorer.

-> https://docs.fluentbit.io/manual/pipeline/outputs/azure_kusto

The Azure Data Explorer output plugin uses the queued ingestion method, which can be optimized by applying a batch ingestion policy. If a 10-second ingestion delay for your logs is acceptable, the Azure Data Explorer output plugin is the best option. During my tests, however, I found that the core oauth2 implementation in Fluent Bit does not always work as expected, especially when you restart Fluent Bit pods on Kubernetes. For instance, on my three-node Azure Kubernetes Service cluster, only two of the three Fluent Bit pods were working after the initial deployment. The version I used for my tests was Fluent Bit 3.0.7.

[2024/07/02 06:50:18] [ info] [output:azure_kusto:azure_kusto.0] endpoint='https://ingest-adxaks.northeurope.kusto.windows.net', database='Kubernetes', table='FluentBitLogs'
[2024/07/02 06:50:32] [ info] [output:azure_kusto:azure_kusto.0] loading kusto ingestion resourcs
[2024/07/02 06:50:32] [ info] [oauth2] HTTP Status=200
[2024/07/02 06:50:32] [ info] [oauth2] access token from 'login.microsoftonline.com:443' retrieved

The other one was stuck in the following state.

[2024/07/02 06:50:18] [ info] [output:azure_kusto:azure_kusto.0] endpoint='https://ingest-adxaks.northeurope.kusto.windows.net', database='Kubernetes', table='FluentBitLogs'
[2024/07/02 06:50:32] [ info] [output:azure_kusto:azure_kusto.0] loading kusto ingestion resourcs

After simulating several restarts of the Fluent Bit pods, all ended up in this stuck state. Unfortunately, I have not had the time to dive deeper into this or open a GitHub issue on the project's repository. That said, you should keep this in mind when choosing the Azure Data Explorer output plugin.

Another option is the Azure Blob output plugin, which is officially developed by Microsoft.

-> https://docs.fluentbit.io/manual/pipeline/outputs/azure_blob

I already highlighted in the above-mentioned blog post that the Azure Storage Account option gets expensive when a lot of files are written to and read from the Storage Account, due to transaction costs. Especially with a lot of pods running in a Kubernetes cluster, the costs will explode.

Here is a small example from my three-node Azure Kubernetes Service cluster running only a small number of pods.

Azure Storage Account Insights

Within four hours, I had 25.000 transactions in total. Now imagine a couple of large Kubernetes clusters emitting their container logs to an Azure Storage Account. Besides that, the Storage Account option requires an Event Grid and an Event Hub to get the container logs into Azure Data Explorer.
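
Extrapolated, that is roughly 150.000 transactions per day from just one small cluster, and the number grows with every additional pod and cluster emitting its logs this way.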

Therefore, the only viable option to get container logs from Fluent Bit into Azure Data Explorer is the usage of the Kafka output plugin.

-> https://docs.fluentbit.io/manual/pipeline/outputs/kafka

The Kafka output plugin connects to an Azure Event Hub, as Event Hub provides an Apache Kafka endpoint. Using the Event Hub option for the Azure Data Explorer ingestion allows us to enable streaming ingestion into Azure Data Explorer for the container logs we gather with Fluent Bit. Streaming ingestion enables near real-time ingestion.

Let us start with the Azure Data Explorer and Fluent Bit configuration.

Prepare Azure Data Explorer for Fluent Bit

As mentioned before, we use the Kafka output plugin in Fluent Bit to get our container logs via Event Hub into Azure Data Explorer.

Now, we begin with the Azure Data Explorer configuration. Our first step is the provisioning of a new database, called Kubernetes, in our Azure Data Explorer cluster with the default settings.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Once created, we open the query editor to prepare the table for the container logs that match the configured output of Fluent Bit.

.create table ContainerLogs (
    TimeGenerated: datetime, Region: string, PodNamespace: string, PodName: string, ContainerName: string, LogSource: string, LogMessage: dynamic,
    ContainerImage: string, ContainerImageDigest: string, PodId: guid, ContainerId: string,
    Environment: string, Cluster: string, Computer: string, NodeIp: string
    )

Azure Data Explorer Table

The next step is the ingestion mapping to ensure a correct ingestion into the table.

.create-or-alter table ContainerLogs ingestion json mapping "FluentBitMapping"
    ```[
    {"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.TimeGenerated"}},
    {"column": "Region", "datatype": "string", "Properties": {"Path": "$.Region"}},
    {"column": "PodNamespace", "datatype": "string", "Properties": {"Path": "$.PodNamespace"}},
    {"column": "PodName", "datatype": "string", "Properties": {"Path": "$.PodName"}},
    {"column": "ContainerName", "datatype": "string", "Properties": {"Path": "$.ContainerName"}},
    {"column": "LogSource", "datatype": "string", "Properties": {"Path": "$.LogSource"}},
    {"column": "LogMessage", "datatype": "dynamic", "Properties": {"Path": "$.LogMessage"}},
    {"column": "ContainerImage", "datatype": "string", "Properties": {"Path": "$.ContainerImage"}},
    {"column": "ContainerImageDigest", "datatype": "string", "Properties": {"Path": "$.ContainerImageDigest"}},
    {"column": "PodId", "datatype": "guid", "Properties": {"Path": "$.PodId"}},
    {"column": "ContainerId", "datatype": "string", "Properties": {"Path": "$.ContainerId"}},
    {"column": "Environment", "datatype": "string", "Properties": {"Path": "$.Environment"}},
    {"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.Cluster"}},
    {"column": "Computer", "datatype": "string", "Properties": {"Path": "$.Computer"}},
    {"column": "NodeIp", "datatype": "string", "Properties": {"Path": "$.NodeIp"}}
    ]```

Using Event Hub for the Azure Data Explorer ingestion allows us to enable streaming ingestion for near real-time ingestion. Hence, we enable the streaming ingestion policy for the whole Kubernetes database.

.alter database Kubernetes policy streamingingestion enable

After provisioning the table for the data ingestion, we create the necessary data connection between Event Hub and the Azure Data Explorer database.

Azure Data Explorer Event Hub Data Connection

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-event-hubs-connection?WT.mc_id=AZ-MVP-5000119&tabs=portalADX%2Cget-data-2

As seen in the screenshot, we provide a name for the data connection and select the appropriate Event Hub. The compression setting is kept at its default, None. Furthermore, we provide the table name with the corresponding ingestion mapping. Last but not least, we select the managed identity type for the data connection, in our case system-assigned.

Once the data connection has been created, we can monitor the connection and see how many events have been received and processed when we start to send container logs with Fluent Bit via the Kafka output plugin.

Azure Data Explorer Event Hub Data Connection Monitor

The Event Hub Namespace was created upfront with an Event Hub called Kubernetes with 8 partitions and the auto-inflate functionality enabled for the Event Hub throughput units.

Azure Event Hub

For a production environment, I would set the partition count to the max value of 32 partitions.

Configure Fluent Bit

The entire Fluent Bit configuration is available on my GitHub repository under the following link.

-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer

Instead of going through the entire configuration, we focus on the two filters after the kubernetes filter that prepare the container logs and transform them into the key format matching the Azure Data Explorer ingestion mapping. Afterward, the configuration of the Kafka output plugin follows.

...
    [FILTER]
        Name         nest
        Alias        logs_filter_2
        Match        kubernetes.logs.*
        Operation    lift
        Nested_under kubernetes
        Add_prefix   kubernetes_
...

The first filter that follows the kubernetes filter is the nest filter, which makes the Kubernetes pod metadata available under the prefix kubernetes_.

...
    [FILTER]
        Name   modify
        Alias  logs_filter_3
        Match  kubernetes.logs.*
        Add    Cluster                    ${CLUSTER}
        Add    Region                     ${REGION}
        Add    Environment                ${ENVIRONMENT}
        Add    NodeIp                     ${NODE_IP}
        Rename time                       TimeGenerated
        Rename message                    LogMessage
        Rename kubernetes_pod_name        PodName
        Rename kubernetes_namespace_name  PodNamespace
        Rename kubernetes_container_image ContainerImage
        Rename kubernetes_container_hash  ContainerImageDigest
        Rename kubernetes_docker_id       ContainerId
        Rename kubernetes_container_name  ContainerName
        Rename kubernetes_pod_id          PodId
        Rename kubernetes_host            Computer
        Rename stream                     LogSource
        Remove logtag
...

We then use the modify filter to add additional keys to the log output and rename existing keys to match the Azure Data Explorer ingestion mapping. Configuration placeholders like ${CLUSTER} are environment variables passed into the Fluent Bit pod via the daemon set configuration as seen in the below snippet.

...
          env:
            - name: FLUENT_BIT_EVENT_HUB_NAMESPACE
              valueFrom:
                secretKeyRef:
                  name: azureeventhub
                  key: namespace
            - name: FLUENT_BIT_EVENT_HUB
              valueFrom:
                secretKeyRef:
                  name: azureeventhub
                  key: topic
            - name: FLUENT_BIT_EVENT_HUB_CONNECTION_STRING
              valueFrom:
                secretKeyRef:
                  name: azureeventhub
                  key: connection_string
...

Before we dive deeper into the Kafka output plugin, we generate a new shared access policy for our Kubernetes Event Hub.

Azure Event Hub Shared Access Policy

Using the shared access policy of the Event Hub and not of the entire Event Hub Namespace allows us to restrict Fluent Bit’s access to only this particular Event Hub.

...
  output-kubernetes.conf: |
    [OUTPUT]
        Name                          kafka
        Alias                         logs_output
        Match                         kubernetes.logs.*
        Brokers                       ${FLUENT_BIT_EVENT_HUB_NAMESPACE}.servicebus.windows.net:9093
        Topics                        ${FLUENT_BIT_EVENT_HUB}
        Retry_Limit                   False
        Log_Level                     info
        Queue_Full_Retries            0
        Timestamp_Key                 @TimeGenerated
        Timestamp_Format              iso8601_ns
        Format                        json
        rdkafka.client.id             fluent-bit
        rdkafka.security.protocol     SASL_SSL
        rdkafka.sasl.mechanism        PLAIN
        rdkafka.sasl.username         $ConnectionString
        rdkafka.sasl.password         ${FLUENT_BIT_EVENT_HUB_CONNECTION_STRING}
        rdkafka.request.required.acks 1
        rdkafka.log.connection.close  false
        rdkafka.message.timeout.ms    0

One of the most important things when using Fluent Bit is ensuring that we do not lose any log data on the log collection side. Hence, the configuration parameters Retry_Limit, Queue_Full_Retries, and rdkafka.message.timeout.ms try to ensure that. Then we use some best-practice configurations for the parameters rdkafka.request.required.acks and rdkafka.log.connection.close. The rdkafka.sasl.mechanism is set to PLAIN, as we use a connection string for authentication.

A full list of the rdkafka configuration parameters and Fluent Bit’s Kafka output plugin can be found under the following links.

-> https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md
-> https://docs.fluentbit.io/manual/pipeline/outputs/kafka

The required Kubernetes secret in our setup that provides the Event Hub Namespace name, the Event Hub name, and the connection string to Fluent Bit is created by running the following script that also deploys Fluent Bit.

❯ ./deploy-fluent-bit.sh RESOURCE_GROUP EVENT_HUB_NAMESPACE EVENT_HUB SHARED_ACCESS_POLICY_NAME
❯ ./deploy-fluent-bit.sh adx adxaks kubernetes fluent-bit

After a successful deployment of Fluent Bit, we should see the first container logs in Azure Data Explorer.

Azure Data Explorer Container Logs

Summary

Fluent Bit supports several output plugins that can be used for data ingestion into Azure Data Explorer. From my current experience, the most suitable and versatile is the Kafka output plugin with an Azure Event Hub. This option has some key advantages compared to the Azure Data Explorer and Azure Blob output plugins. First, we can use streaming ingestion for near real-time ingestion, compared to the queued ingestion method that the Azure Data Explorer output plugin uses. Second, it is cost-efficient compared to the Azure Blob output plugin, which ultimately suffers from the storage transaction costs in Azure.

So, my recommendation at the moment is the Kafka output plugin with an Azure Event Hub to ingest container logs via Fluent Bit into Azure Data Explorer.

Unfortunately, I had issues getting the Azure Data Explorer output plugin, which would have been my preferred solution, into stable operation.

The example KQL file and the entire Fluent Bit deployment configuration can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/tree/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer

Sneak peek into the new Fluent Bit Azure Data Explorer output plugin version

In my last blog post, I wrote about the different options for using Azure Data Explorer as a logging backend for Fluent Bit.

-> https://www.danielstechblog.io/using-azure-data-explorer-as-logging-backend-for-fluent-bit/

In particular, I described my issues getting the Azure Data Explorer output plugin working and why this led to the decision to use the Kafka output plugin in combination with Azure Event Hub to ingest the container log data.

Shortly after publishing the blog post, the Azure Data Explorer engineering team reached out to me and asked if I was interested in a demo session, as they are currently working on releasing the v2 version of the Azure Data Explorer output plugin.

Besides other new functionality, they made the authentication and initialization part more robust and configurable. So, here we are again, revisiting the topic “Using Azure Data Explorer as logging backend for Fluent Bit” with the Azure Data Explorer output plugin version v2.

INFO:

The Azure Data Explorer engineering team provided me with a pre-built Fluent Bit image that contained the latest v2 version of the Azure Data Explorer output plugin for evaluation. As of writing this blog post, the v2 version might still be under review and not yet available to the entire Fluent Bit community. You can check whether the following pull request is still open or has been merged; the latter means that the new Azure Data Explorer output plugin version is available.

-> https://github.com/fluent/fluent-bit/pull/8430

After that important note, let us get started.

Prepare Azure Data Explorer for Fluent Bit

Now, we begin with the Azure Data Explorer configuration. Our first step is the provisioning of a new database, called Kubernetes, in our Azure Data Explorer cluster with the default settings.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Once created, we open the query editor to prepare the table for the container logs that match the configured output of Fluent Bit.

.create table ContainerLogs (
    TimeGenerated: datetime, Region: string, PodNamespace: string, PodName: string, ContainerName: string, LogSource: string, LogMessage: dynamic,
    ContainerImage: string, ContainerImageDigest: string, PodId: guid, ContainerId: string,
    Environment: string, Cluster: string, Computer: string, NodeIp: string
    )

Azure Data Explorer - Kubernetes database tables

The next step is the ingestion mapping to ensure a correct ingestion into the table.

.create-or-alter table ContainerLogs ingestion json mapping "FluentBitMapping"
    ```[
    {"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.log.TimeGenerated"}},
    {"column": "Region", "datatype": "string", "Properties": {"Path": "$.log.Region"}},
    {"column": "PodNamespace", "datatype": "string", "Properties": {"Path": "$.log.PodNamespace"}},
    {"column": "PodName", "datatype": "string", "Properties": {"Path": "$.log.PodName"}},
    {"column": "ContainerName", "datatype": "string", "Properties": {"Path": "$.log.ContainerName"}},
    {"column": "LogSource", "datatype": "string", "Properties": {"Path": "$.log.LogSource"}},
    {"column": "LogMessage", "datatype": "dynamic", "Properties": {"Path": "$.log.LogMessage"}},
    {"column": "ContainerImage", "datatype": "string", "Properties": {"Path": "$.log.ContainerImage"}},
    {"column": "ContainerImageDigest", "datatype": "string", "Properties": {"Path": "$.log.ContainerImageDigest"}},
    {"column": "PodId", "datatype": "guid", "Properties": {"Path": "$.log.PodId"}},
    {"column": "ContainerId", "datatype": "string", "Properties": {"Path": "$.log.ContainerId"}},
    {"column": "Environment", "datatype": "string", "Properties": {"Path": "$.log.Environment"}},
    {"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.log.Cluster"}},
    {"column": "Computer", "datatype": "string", "Properties": {"Path": "$.log.Computer"}},
    {"column": "NodeIp", "datatype": "string", "Properties": {"Path": "$.log.NodeIp"}}
    ]```

Whereas in the previous blog post, we could reference the keys directly, for instance, $.PodName, the Azure Data Explorer output plugin puts all relevant container log data under the log key section, for instance, $.log.PodName.

Compared to the Kafka output plugin with an Azure Event Hub, the Azure Data Explorer output plugin only offers the queued ingestion option instead of the streaming ingestion option.

Hence, we apply the following batch ingestion policy on the entire database level.

.alter database Kubernetes policy ingestionbatching
```
{
"MaximumBatchingTimeSpan": "00:00:10",
"MaximumNumberOfItems": 500,
"MaximumRawDataSizeMB": 1024
}
```

Without any further optimization on the Azure Data Explorer output plugin side, and depending on the ingestion volume and the Azure Data Explorer cluster size, we can expect the same ingestion performance as one is used to with Azure Log Analytics in combination with Fluent Bit.

Azure Data Explorer Ingestion Latency metrics

The last step that is required on the Azure Data Explorer side is the Database Ingestor permission for the service principal we use later for Fluent Bit.

.add database Kubernetes ingestors ('aadapp=<Application_ID>;<Tenant_ID>')

Azure Data Explorer Data Ingestor permission

We set the permissions on the database level.
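
To verify the assignment, we can list the database principals.

.show database Kubernetes principals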

Configure Fluent Bit

The entire Fluent Bit configuration is available on my GitHub repository under the following link.

-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer

Instead of going through the entire configuration, we focus on the two filters after the kubernetes filter that prepare the container logs and transform them into the key format matching the Azure Data Explorer ingestion mapping. Afterward, the configuration of the Azure Data Explorer output plugin follows.

...
    [FILTER]
        Name         nest
        Alias        logs_filter_2
        Match        kubernetes.logs.*
        Operation    lift
        Nested_under kubernetes
        Add_prefix   kubernetes_
...

The first filter that follows the kubernetes filter is the nest filter, which makes the Kubernetes pod metadata available under the prefix kubernetes_.

...
    [FILTER]
        Name   modify
        Alias  logs_filter_3
        Match  kubernetes.logs.*
        Add    Cluster                    ${CLUSTER}
        Add    Region                     ${REGION}
        Add    Environment                ${ENVIRONMENT}
        Add    NodeIp                     ${NODE_IP}
        Rename time                       TimeGenerated
        Rename message                    LogMessage
        Rename kubernetes_pod_name        PodName
        Rename kubernetes_namespace_name  PodNamespace
        Rename kubernetes_container_image ContainerImage
        Rename kubernetes_container_hash  ContainerImageDigest
        Rename kubernetes_docker_id       ContainerId
        Rename kubernetes_container_name  ContainerName
        Rename kubernetes_pod_id          PodId
        Rename kubernetes_host            Computer
        Rename stream                     LogSource
        Remove logtag
...

We then use the modify filter to add additional keys to the log output and rename existing keys to match the Azure Data Explorer ingestion mapping. Configuration placeholders like ${CLUSTER} are environment variables passed into the Fluent Bit pod via the daemon set configuration as seen in the below snippet.

...
          env:
            - name: FLUENT_ADX_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: tenant_id
            - name: FLUENT_ADX_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_id
            - name: FLUENT_ADX_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_secret
...

The below Azure Data Explorer output plugin configuration contains some of the new configuration options. In my example, I am using the default values for them.

...
  output-kubernetes.conf: |
    [OUTPUT]
        Name                        azure_kusto
        Match                       kubernetes.logs.*
        Tenant_Id                   ${FLUENT_ADX_TENANT_ID}
        Client_Id                   ${FLUENT_ADX_CLIENT_ID}
        Client_Secret               ${FLUENT_ADX_CLIENT_SECRET}
        Ingestion_Endpoint          https://ingest-adxaks.northeurope.kusto.windows.net
        Database_Name               Kubernetes
        Table_Name                  ContainerLogs
        Ingestion_Mapping_Reference FluentBitMapping
        Log_Key                     log
        Include_Tag_Key             Off
        Include_Time_Key            Off
        Retry_Limit                 False
        Log_Level                   info
        compression_enabled         on
        ingestion_endpoint_connect_timeout 60
        ingestion_resources_refresh_interval 3600
        buffering_enabled false

Besides the necessary settings Tenant_Id, Client_Id, Client_Secret, Ingestion_Endpoint, Database_Name, Table_Name, and Ingestion_Mapping_Reference, I want to highlight Retry_Limit and Log_Level.

One of the most important things when using Fluent Bit is to ensure that we do not lose any log data on the log collection side by setting the configuration parameter Retry_Limit to False.

Having the Log_Level set to info, the default value, allows us to retrieve information about the authentication and initialization part of the Azure Data Explorer output plugin.

❯ kubectl logs fluent-bit-pph4m
Fluent Bit v3.0.7
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  <
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/

[2024/08/23 19:45:07] [ info] [fluent bit] version=3.0.7, commit=598e92b0f6, pid=1
[2024/08/23 19:45:07] [ info] [storage] ver=1.5.2, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2024/08/23 19:45:07] [ info] [storage] backlog input plugin: storage_backlog.1
[2024/08/23 19:45:07] [ info] [cmetrics] version=0.9.1
[2024/08/23 19:45:07] [ info] [ctraces ] version=0.5.1
[2024/08/23 19:45:07] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2024/08/23 19:45:07] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2024/08/23 19:45:07] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 47.7M
[2024/08/23 19:45:07] [ info] [filter:kubernetes:logs_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/08/23 19:45:07] [ info] [filter:kubernetes:logs_filter_1]  token updated
[2024/08/23 19:45:07] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2024/08/23 19:45:07] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2024/08/23 19:45:08] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2024/08/23 19:45:08] [ info] [output:azure_kusto:azure_kusto.0] endpoint='https://ingest-adxaks.northeurope.kusto.windows.net', database='Kubernetes', table='ContainerLogs'
[2024/08/23 19:45:08] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/08/23 19:45:08] [ info] [sp] stream processor started
[2024/08/23 19:45:22] [ info] [output:azure_kusto:azure_kusto.0] loading kusto ingestion resources and refresh interval is 3904
[2024/08/23 19:45:22] [ info] [oauth2] HTTP Status=200
[2024/08/23 19:45:22] [ info] [oauth2] access token from 'login.microsoftonline.com:443' retrieved

As seen above, the authentication and initialization part now works like a charm compared to the v1 version of the output plugin.

The required Kubernetes secret in our setup that provides the service principal credentials is created by running the following script that also deploys Fluent Bit.

❯ ./deploy-fluent-bit-adx.sh TENANT_ID CLIENT_ID CLIENT_SECRET
❯ ./deploy-fluent-bit-adx.sh 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 Pa$$W0rd

After a successful deployment of Fluent Bit, we should see the first container logs in Azure Data Explorer.

Azure Data Explorer Container Logs

Summary

The new v2 version of the Azure Data Explorer output plugin fixes the issues in the authentication and initialization part and adds new configuration parameters to fine-tune the ingestion on the plugin side.

The example KQL file and the entire Fluent Bit deployment configuration can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/tree/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer


Export Azure Kubernetes Service control plane logs to Azure Data Explorer

In today’s blog post, we look at the Azure Kubernetes Service control plane logs and how to ingest them into Azure Data Explorer, especially the Kubernetes Audit (kube-audit) log.

Azure portal - AKS diagnostic settings

Azure Data Explorer – Ingestion Method

Looking at the export options for the Azure Kubernetes Service control plane logs, we can choose between an Azure Storage Account and an Event Hub. Both solutions act as a middleware component that stores the data until it is ingested into Azure Data Explorer.

In our case, we choose the Azure Event Hub export option. I created the Event Hub Namespace upfront with one dedicated Event Hub with 32 partitions and enabled the auto-inflate functionality for the Event Hub throughput units.

Azure portal - Event Hubs

Configure ingestion to Azure Data Explorer

Before we start to prepare everything on the Azure Data Explorer side, we configure the export of the Kubernetes Audit log to the Event Hub aks_control_plane.

Now, we begin with the Azure Data Explorer configuration. Our first step is the provisioning of a new database, called Kubernetes, in our Azure Data Explorer cluster with the default settings.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Once created, we open the query editor to prepare the tables for the Kubernetes Audit log data.

.create table ControlPlaneLogs (
    TimeGenerated: datetime, Category: string, ResourceId: string,
    LogSource: string, Pod: string, ContainerId: string, LogMessage: dynamic
    )

After running the command, we have our table called ControlPlaneLogs. The next table we create is called ControlPlaneLogsRawRecords and is used for the data ingestion from the aks_control_plane Event Hub.

.create table ControlPlaneLogsRawRecords (Records: dynamic)

As we do not want to store data in this table, we set the retention policy to 0.

.alter-merge table ControlPlaneLogsRawRecords policy retention softdelete = 0d

Our next step is the ingestion mapping to ensure a correct ingestion into the table.

.create table ControlPlaneLogsRawRecords ingestion json mapping 'ControlPlaneLogsRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$.records"}}]'

Getting the ingested log data into the target table ControlPlaneLogs requires a KQL function and an update policy on the table.

.create function ControlPlaneLogRecordsExpand() {
        ControlPlaneLogsRawRecords
        | mv-expand events = Records
        | project
            TimeGenerated = todatetime(events['time']),
            Category = tostring(events.category),
            ResourceId = tostring(events.resourceId),
            LogSource = tostring(events.properties.stream),
            Pod = tostring(events.properties.pod),
            ContainerId = tostring(events.properties.containerID),
            LogMessage = parse_json(tostring(events.properties.log))
    }

The KQL function above uses the mv-expand operator to extract the different JSON values from the Records column into a new output called events. With the project operator, we map those values onto our target table structure.

You may have recognized that I first transform the log property to a string before parsing it again as JSON. Looking at the following screenshot, you see that the content of the log property contains escape characters.

Azure portal - ADX raw log record

Those escape characters prevent you from referencing the keys directly. The next screenshot shows the difference.

Azure portal - ADX transform comparison

.alter table ControlPlaneLogs policy update @'[{"Source": "ControlPlaneLogsRawRecords", "Query": "ControlPlaneLogRecordsExpand()", "IsEnabled": "True", "IsTransactional": true}]'

By running the above KQL command, we set the update policy on our target table. Whenever a new record arrives in the source table ControlPlaneLogsRawRecords, Azure Data Explorer executes the previously defined function and ingests the result into our target table ControlPlaneLogs.

-> https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/update-policy?WT.mc_id=AZ-MVP-5000119

Azure portal - ADX database table overview

Connect Event Hubs with Azure Data Explorer

After provisioning the table for the data ingestion, we create the necessary data connection between the Event Hub and the Azure Data Explorer database.

Azure portal - ADX Event Hub connection

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-event-hubs-connection?WT.mc_id=AZ-MVP-5000119&tabs=portalADX%2Cget-data-2

As seen in the screenshot, we provide a name for the data connection and select the appropriate Event Hub. The compression setting is kept at its default, None. Furthermore, we provide the name of the raw records table with the corresponding ingestion mapping. Last but not least, we select the managed identity type for the data connection, in our case system-assigned.

Once the data connection has been created, we can monitor the connection and see how many events have been received and processed.

Azure portal - ADX Event Hub connection health overview

Reducing ingestion latency

By default, our tables with the Event Hub connections use the queued ingestion method. When data is finally ingested is defined by three configuration parameters for the ingestion batches: time, item count, and size. The default values for those parameters are 5 minutes, 1000 items, and 1 GB. Whichever threshold is reached first triggers the final ingestion.

In the worst case, we have an ingestion latency of 5 minutes. It might be fast enough, but when we want near real-time ingestion, we either customize the batch ingestion policy or enable the streaming ingestion policy.

-> https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-overview?WT.mc_id=AZ-MVP-5000119

For the latter, streaming ingestion must be enabled on the Azure Data Explorer cluster.

We decide on the streaming ingestion policy and enable the policy on the whole Azure Data Explorer database instead of specific tables.

.alter database Kubernetes policy streamingingestion enable

Querying log and monitor data

Finally, we run our first query against the Kubernetes Audit log data.

ControlPlaneLogs
| where LogMessage.user.username == "{USER_OBJECT_ID}"
| project TimeGenerated, Cluster=split(ResourceId,"/",8)[0], Level=LogMessage.level, Stage=LogMessage.stage, IPs=LogMessage.sourceIPs, 
    Client=LogMessage.userAgent, Verb=LogMessage.verb, URI=LogMessage.requestURI, Status=LogMessage.responseStatus.code, 
    Decision=LogMessage.annotations["authorization.k8s.io/decision"], Reason=LogMessage.annotations["authorization.k8s.io/reason"]

Azure portal - ADX KQL query

In the screenshot above, we see the Kubernetes Audit log of the Azure Kubernetes Service cluster aks-azst-1.

The query filters on my username, which is the object id, and shows important information about my allowed actions against the Kubernetes API server. In this specific case, I ran kubectl get nodes.
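
Since LogMessage is a dynamic column, further ad-hoc analysis is straightforward. For example, the following query counts the audit events of the last hour per verb and user:

ControlPlaneLogs
| where TimeGenerated > ago(1h)
| summarize Events = count() by Verb = tostring(LogMessage.verb), User = tostring(LogMessage.user.username)
| order by Events desc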

Summary

Kubernetes Audit log data can be ingested via Azure Event Hubs into an Azure Data Explorer cluster. It is indeed more effort than directly using an Azure Log Analytics workspace, but depending on your needs and requirements it can be the better solution. Especially for the Kubernetes Audit log, it makes sense to prefer an Azure Data Explorer cluster over an Azure Log Analytics workspace from a cost perspective. The Kubernetes Audit log contains all events against the Kubernetes API server and can amount to several GB of data per day.

The example KQL file can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Azure_Kubernetes_Service/Control_Plane_Logs.kql

Designing the Azure Data Explorer table structure for Azure Diagnostic Logs or Defender for Cloud data ingestion

In my recent blog posts about Azure Data Explorer, I wrote about Activity Logs and Diagnostic Logs ingestion.

-> https://www.danielstechblog.io/ingesting-azure-diagnostic-logs-into-azure-data-explorer/
-> https://www.danielstechblog.io/export-azure-kubernetes-service-control-plane-logs-to-azure-data-explorer/

Today, I would like to discuss how to design the Azure Data Explorer table for the Diagnostic Logs or Defender for Cloud log data ingestion. Depending on your preferences, you can choose between a generic design that covers all Diagnostic Logs data from different services, or a more customized design that supports you in your day-to-day business.

Today, we have a look at the latter: creating a customized table design. As an example, we use Microsoft Defender for Cloud alerts that are ingested via an Event Hub into our Azure Data Explorer cluster.

First, we configure the continuous export of the Microsoft Defender for Cloud alerts to an Event Hub. This can be done via the Azure portal or Infrastructure as Code. In our example, we use Infrastructure as Code with the following Terraform module.

-> https://github.com/neumanndaniel/terraform/tree/master/modules/microsoft_defender_continuous_export

module "microsoft_defender_continuous_export_eventhub" {
  source = "../modules/microsoft_defender_continuous_export"

  name                       = "ExportToEventHub"
  resource_group_name        = "continuous-export-config"
  location                   = "northeurope"
  type                       = "eventhub"
  eventhub_id                = "/subscriptions/<subscription_id>/resourceGroups/adx/providers/Microsoft.EventHub/namespaces/adxaks/eventhubs/microsoft_defender_for_cloud"
  eventhub_connection_string = "Endpoint=sb://adxaks.servicebus.windows.net/;SharedAccessKeyName=mdfc_send;SharedAccessKey=<shared_access_key>;EntityPath=microsoft_defender_for_cloud"
}

Azure portal - Continuous Export configuration

Prerequisites

Our first step is the provisioning of a new database, called MDfC, in our Azure Data Explorer cluster.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-cluster-and-database?WT.mc_id=AZ-MVP-5000119&tabs=full#create-a-database

Afterward, we open the query editor, enable the streaming ingestion policy on the database, and create the table MDfCRawRecords together with its ingestion mapping.

.alter database MDfC policy streamingingestion enable

.create table MDfCRawRecords (Records: dynamic)

.create table MDfCRawRecords ingestion json mapping 'MDfCRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$"}}]'

This table contains the security alert records as they arrive from the Event Hub into the Azure Data Explorer cluster.

Before we can start with the design and definition of our table holding the final result set, we connect the Event Hub with the Azure Data Explorer cluster.

-> https://learn.microsoft.com/en-us/azure/data-explorer/create-event-hubs-connection?WT.mc_id=AZ-MVP-5000119&tabs=portalADX%2Cget-data-2

Designing the Azure Data Explorer table

Microsoft Defender for Cloud has a nice feature to trigger sample security alerts for testing.

-> https://learn.microsoft.com/en-us/azure/defender-for-cloud/alert-validation?WT.mc_id=AZ-MVP-5000119

We use this functionality to populate security alert data into the MDfCRawRecords table.

Azure portal - Defender for Cloud sample alerts Azure portal - Defender for Cloud sample alerts

Once the data has arrived in our Azure Data Explorer table, we examine the structure to design our final table MDfCSecurityAlerts.
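
A simple query in the query editor is enough to inspect the raw records.

MDfCRawRecords
| take 10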

Azure Data Explorer - MDfCRawRecords table with sample data

As you might have recognized, we use the root path $ for the ingestion mapping into the Records column. This allows us to retrieve the JSON data as it is emitted from Microsoft Defender for Cloud. Azure Diagnostic Logs, for instance, are emitted differently to an Event Hub, under the path $.records. Also in this case, the root path $ works.

A fallback solution to retrieve the emitted data structure is using the Data Explorer feature of an Event Hub.

Azure Event Hub Data Explorer

This comes in handy when the expected data structure is different from the emitted data structure and the ingestion mapping does not work.

So, after examining the data structure, we define our final table MDfCSecurityAlerts and the corresponding KQL function for the data ingestion into this table.

Let us start with the KQL function.

.create function SecurityAlertRecordsExpand() {
        MDfCRawRecords
        | extend events = Records
        | where isnotnull(events.Severity) and isnotnull(events.SystemAlertId)
        | project
            TimeGenerated = todatetime(events.TimeGenerated),
            StartTimeUtc = todatetime(events.StartTimeUtc),
            EndTimeUtc = todatetime(events.EndTimeUtc),
            Status = tostring(events.Status),
            Severity = tostring(events.Severity),
            CompromisedEntity = tostring(events.CompromisedEntity),
            Intent = tostring(events.Intent),
            AlertType = tostring(events.AlertType),
            AlertName = tostring(events.AlertDisplayName),
            AlertDescription = tostring(events.Description),
            AlertId = tostring(events.SystemAlertId),
            VendorName = tostring(events.VendorName),
            ResourceId = tostring(events.AzureResourceId),
            Properties = events.ExtendedProperties,
            Link = tostring(events.AlertUri),
            Incident = tobool(events.IsIncident)
    }

The KQL function queries the MDfCRawRecords table and uses the extend operator to map the Records column onto events. For Azure Diagnostic Logs, you would use mv-expand, as the Records column would include several entries and you would want a separate row for each entry. In the case of the Microsoft Defender for Cloud security alerts, however, each ingested record already represents a single alert, so we must use extend.
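
The following minimal sketch, independent of our ingestion pipeline, illustrates the difference between the two operators:

// mv-expand produces one row per array element in the dynamic column.
datatable(Records: dynamic) [dynamic([1, 2, 3])]
| mv-expand events = Records

// extend keeps the single row and maps the whole value onto events.
datatable(Records: dynamic) [dynamic({"Severity": "High"})]
| extend events = Records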

Before we use the project operator to map the results to our table structure, we apply a filter. The filter is necessary because we might emit additional Microsoft Defender for Cloud data to the Event Hub, which would then also end up in the MDfCRawRecords table, and those data entries should not end up in the MDfCSecurityAlerts table.

Azure Data Explorer testing KQL function with sample data

After creating the KQL function, we can finally create the table MDfCSecurityAlerts.

.create table MDfCSecurityAlerts (
    TimeGenerated: datetime, StartTimeUtc: datetime, EndTimeUtc: datetime,
    Status: string, Severity: string, CompromisedEntity: string,
    Intent: string, AlertType: string, AlertName: string, AlertDescription: string,
    AlertId: string, VendorName: string, ResourceId: string, Properties: dynamic,
    Link: string, Incident: bool
    )

Last but not least, we set the retention on the MDfCRawRecords table to 0 days.

.alter-merge table MDfCRawRecords policy retention softdelete = 0d

We do not want to keep records in this table, as the following update policy executes the KQL function whenever new data arrives in the MDfCRawRecords table and ingests the result into the MDfCSecurityAlerts table.

.alter table MDfCSecurityAlerts policy update @'[{"Source": "MDfCRawRecords", "Query": "SecurityAlertRecordsExpand()", "IsEnabled": "True", "IsTransactional": true}]'

Azure Data Explorer MDfCSecurityAlerts table
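
With the update policy in place, a first query against the new table can, for example, summarize the sample alerts by severity:

MDfCSecurityAlerts
| where TimeGenerated > ago(1d)
| summarize Alerts = count() by Severity, AlertName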

Summary

Using an Azure Data Explorer cluster instead of Azure Log Analytics as a target for Azure Diagnostic Logs or Microsoft Defender for Cloud data has the advantage that you can customize the final table to support your day-to-day business. On the other hand, it requires a bit more upfront work to get to this state, but it will pay off.

The example KQL file and the Terraform module can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Microsoft_Defender_for_Cloud/Security_Alerts.kql
-> https://github.com/neumanndaniel/terraform/tree/master/modules/microsoft_defender_continuous_export

Retrieve Kubernetes Pods IP addresses with Fluent Bit

In the recent 3.2.1 release, Fluent Bit added a long-awaited functionality that has been available in Fluentd for a long time: the capability to extract the Kubernetes Pod IP address and enrich the log data with it.

Kubernetes (Filter)

  • Retrieve kubernetes pod ip address if it is set in status.podip (#2783)

-> https://fluentbit.io/announcements/v3.2.1/
-> https://github.com/fluent/fluent-bit/issues/2301
-> https://github.com/fluent/fluent-bit/pull/2783

If, like me, you are using several filters that process the output of the Kubernetes filter, you need to adjust those filters to benefit from this new functionality.

For instance, I am just using the nest and modify filters and only need one additional line, Rename kubernetes_pod_ip PodIp, to add the Kubernetes Pod IP address to the log data.

...
    [FILTER]
        Name         nest
        Alias        logs_filter_2
        Match        kubernetes.logs.*
        Operation    lift
        Nested_under kubernetes
        Add_prefix   kubernetes_

    [FILTER]
        Name   modify
        Alias  logs_filter_3
        Match  kubernetes.logs.*
        Add    Cluster                    ${CLUSTER}
        Add    Region                     ${REGION}
        Add    Environment                ${ENVIRONMENT}
        Add    NodeIp                     ${NODE_IP}
        Rename time                       TimeGenerated
        Rename message                    LogMessage
        Rename kubernetes_pod_name        PodName
        Rename kubernetes_namespace_name  PodNamespace
        Rename kubernetes_container_image ContainerImage
        Rename kubernetes_container_hash  ContainerImageDigest
        Rename kubernetes_docker_id       ContainerId
        Rename kubernetes_container_name  ContainerName
        Rename kubernetes_pod_id          PodId
        Rename kubernetes_pod_ip          PodIp
        Rename kubernetes_host            Computer
        Rename stream                     LogSource
        Remove logtag
...

After applying the configuration changes to the Fluent Bit deployment on my Azure Kubernetes Service cluster, it only takes a few seconds until the new log data has the Pod's IP address attached.
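The following KQL query is a minimal sketch to verify the new column in Azure Data Explorer. It assumes the container logs are ingested into a table called ContainerLogs and uses the column names from the modify filter above.

ContainerLogs
| where TimeGenerated > ago(5m)
| project TimeGenerated, PodNamespace, PodName, PodIp, LogMessage
| take 10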

Azure portal - ADX KQL query output and kubectl output

The entire configuration example for the Azure Data Explorer and Fluent Bit configuration is available on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/tree/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer
-> https://www.danielstechblog.io/sneak-peek-into-the-new-fluent-bit-azure-data-explorer-output-plugin-version/

New Fluent Bit Azure Data Explorer output plugin version available


The recent 3.2.2 release ships the new version of the Azure Data Explorer output plugin.

Azure_kusto (Output)

  • fix multiple files tail issue and timeout issue (#8430)

-> https://fluentbit.io/announcements/v3.2.2/
-> https://github.com/fluent/fluent-bit/pull/8430

The previous version had a couple of issues that have now been fixed. For instance, I was running into unreliable authentication with the earlier version, where, after several pod restarts, the Fluent Bit pods eventually got stuck in the authentication against the Azure Data Explorer cluster.

In the new version, the authentication and initialization part has been made more robust and configurable. It works like a charm, as I already verified with a preview version of the output plugin.

-> https://www.danielstechblog.io/sneak-peek-into-the-new-fluent-bit-azure-data-explorer-output-plugin-version/

❯ kubectl logs fluent-bit-vxptf
Fluent Bit v3.2.2
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2024/11/28 21:16:53] [ info] [fluent bit] version=3.2.2, commit=a59c867924, pid=1
[2024/11/28 21:16:53] [ info] [storage] ver=1.5.2, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2024/11/28 21:16:53] [ info] [storage] backlog input plugin: storage_backlog.1
[2024/11/28 21:16:53] [ info] [simd    ] disabled
[2024/11/28 21:16:53] [ info] [cmetrics] version=0.9.9
[2024/11/28 21:16:53] [ info] [ctraces ] version=0.5.7
[2024/11/28 21:16:53] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2024/11/28 21:16:53] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2024/11/28 21:16:53] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 47.7M
[2024/11/28 21:16:53] [ info] [filter:kubernetes:logs_filter_1] https=1 host=kubernetes.default.svc port=443
[2024/11/28 21:16:53] [ info] [filter:kubernetes:logs_filter_1]  token updated
[2024/11/28 21:16:53] [ info] [filter:kubernetes:logs_filter_1] local POD info OK
[2024/11/28 21:16:53] [ info] [filter:kubernetes:logs_filter_1] testing connectivity with Kubelet...
[2024/11/28 21:16:53] [ info] [filter:kubernetes:logs_filter_1] connectivity OK
[2024/11/28 21:16:53] [ info] [output:azure_kusto:azure_kusto.0] endpoint='https://ingest-adxaks.northeurope.kusto.windows.net', database='Kubernetes', table='ContainerLogs'
[2024/11/28 21:16:53] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2024/11/28 21:16:53] [ info] [sp] stream processor started
[2024/11/28 21:17:07] [ info] [output:azure_kusto:azure_kusto.0] loading kusto ingestion resourcs and refresh interval is 3487
[2024/11/28 21:17:08] [ info] [oauth2] HTTP Status=200
[2024/11/28 21:17:08] [ info] [oauth2] access token from 'login.microsoftonline.com:443' retrieved
...

Unfortunately, the PR for the updated documentation has not been merged yet. Hence, I am linking the PR here.

-> https://github.com/fluent/fluent-bit-docs/pull/1405

If you are using Fluent Bit with the Azure Data Explorer output plugin, I highly recommend updating to Fluent Bit 3.2.2 to benefit from the improvements in this new output plugin version.

Use Fluent Bit for Kubernetes events gathering on Azure Kubernetes Service


For a while now, Fluent Bit has had an input plugin that allows us to gather Kubernetes events, modify them, and ingest them into the logging backend.

-> https://docs.fluentbit.io/manual/pipeline/inputs/kubernetes-events

Today we look at how to configure and deploy Fluent Bit to gather Kubernetes events on an Azure Kubernetes Service cluster and ingest them into an Azure Data Explorer cluster.

Deployment

Fluent Bit runs by default as a Kubernetes daemon set on every node in a Kubernetes cluster to gather container logs. At the time of writing, the Kubernetes Events input plugin should not be configured on a Fluent Bit daemon set installation, as the input plugin does not provide leader election functionality. Hence, every daemon set pod would gather the same Kubernetes events over and over again.

-> https://github.com/fluent/fluent-bit/discussions/6942

The only viable option for the Kubernetes Events input plugin is a Kubernetes deployment with a single replica.

Furthermore, we need external storage for the database that the Kubernetes Events input plugin uses to track the state of events that have already been gathered.

In the case of an Azure Kubernetes Service cluster, I have chosen an Azure File Share as external storage in this example. Unfortunately, we cannot use one of the already existing storage classes, as they are all missing an important configuration parameter.

The nobrl parameter must be set; otherwise, Fluent Bit complains about a locked database. nobrl avoids sending byte range lock requests to the server.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-fluent-bit
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - mfsymlinks
  - actimeo=30
  - nosharesock
  - nobrl # nobrl is required for Fluent Bit to work correctly
parameters:
  skuName: Standard_LRS

With the above-mentioned storage class, we hand over the Azure Storage Account creation to Azure. So, no pre-provisioning is required, and the Storage Account will be created within the Azure Kubernetes Service node resource group.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluent-bit-kubernetes-events
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-fluent-bit
  resources:
    requests:
      storage: 5Gi

For the persistent volume claim, which represents the Azure File Share, we choose 5 GB as the initial storage capacity.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluent-bit-kubernetes-events
    version: v3.2.3
    kubernetes.io/cluster-service: "true"
  name: fluent-bit-kubernetes-events
  namespace: logging
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: fluent-bit-kubernetes-events
  template:
    metadata:
      labels:
        app: fluent-bit-kubernetes-events
        version: v3.2.3
        kubernetes.io/cluster-service: "true"
    spec:
      terminationGracePeriodSeconds: 75
      containers:
        - name: fluent-bit-kubernetes-events
          image: cr.fluentbit.io/fluent/fluent-bit:3.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 2020
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 2020
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          env:
            - name: FLUENT_ADX_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: tenant_id
            - name: FLUENT_ADX_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_id
            - name: FLUENT_ADX_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: azuredataexplorer
                  key: client_secret
            - name: CLUSTER
              value: aks-azst-1
            - name: REGION
              value: northeurope
            - name: ENVIRONMENT
              value: prod
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
          volumeMounts:
            - name: fluent-bit-kubernetes-events-config
              mountPath: /fluent-bit/etc/
            - name: fluent-bit-kubernetes-events-data
              mountPath: /fluent-bit/data/
          resources:
            limits:
              cpu: 500m
              memory: 750Mi
            requests:
              cpu: 75m
              memory: 325Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
            runAsGroup: 65534
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      volumes:
        - name: fluent-bit-kubernetes-events-config
          configMap:
            name: fluent-bit-kubernetes-events-config
        - name: fluent-bit-kubernetes-events-data
          persistentVolumeClaim:
            claimName: fluent-bit-kubernetes-events
      serviceAccountName: fluent-bit-kubernetes-events
      priorityClassName: system-cluster-critical

The deployment is kept simple and only has three specific configurations.

First, the increased termination grace period to provide Fluent Bit with enough time to shut down during the pod termination phase.

Second, the priority class, as we do not want our Fluent Bit deployment to be evicted from the node by the scheduler when pods with a higher priority are scheduled under normal configuration circumstances.

Third, we use the recreate strategy to prevent interference between two pods accessing the database simultaneously.

Configuration

As of version 3.1, Fluent Bit uses a Kubernetes watch stream to retrieve Kubernetes events via the input plugin. Hence, we use the default configuration for the input plugin, followed by several filters to prepare the data for the Azure Data Explorer output plugin.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-kubernetes-events-config
  namespace: logging
data:
  # General settings
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush                     15
        # Ensures that log chunks, where the flush failed previously, are flushed on container termination
        Grace                     60
        Log_Level                 info
        Daemon                    Off
        HTTP_Server               On
        HTTP_Listen               0.0.0.0
        HTTP_Port                 2020
        Health_Check              On
        HC_Errors_Count           5
        HC_Retry_Failure_Count    5
        HC_Period                 60
        # Backpressure fallback
        storage.path              /fluent-bit/data/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 50M

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kubernetes.conf

  # Kubernetes Events configuration
  # ======================================================
  input-kubernetes.conf: |
    [INPUT]
        Name                kubernetes_events
        Alias               events_input
        Tag                 kubernetes.events.*
        DB                  /fluent-bit/data/flb_kubernetes_events.db
        DB.sync             normal
        kube_retention_time 1h
        Log_Level           warning

  filter-kubernetes.conf: |
    [FILTER]
        Name         nest
        Alias        events_filter_1
        Match        kubernetes.events.*
        Operation    lift
        Nested_under involvedObject
        Add_prefix   involvedObject_

    [FILTER]
        Name         nest
        Alias        events_filter_2
        Match        kubernetes.events.*
        Operation    lift
        Nested_under source
        Add_prefix   source_

    [FILTER]
        Name         nest
        Alias        events_filter_3
        Match        kubernetes.events.*
        Operation    lift
        Nested_under metadata
        Add_prefix   metadata_

    [FILTER]
        Name      modify
        Alias     events_filter_4
        Match     kubernetes.events.*
        Condition Key_does_not_exist source_host
        Add       source_host        ""

    [FILTER]
        Name      modify
        Alias     events_filter_5
        Match     kubernetes.events.*
        Add       Cluster                    ${CLUSTER}
        Add       Region                     ${REGION}
        Add       Environment                ${ENVIRONMENT}
        Rename    metadata_creationTimestamp CreationTimestamp
        Rename    source_component           SourceComponent
        Rename    source_host                SourceComputer
        Rename    reportingComponent         ReportingComponent
        Rename    reportingInstance          ReportingComputer
        Rename    involvedObject_kind        Kind
        Rename    involvedObject_apiVersion  ApiVersion
        Rename    involvedObject_name        Name
        Rename    involvedObject_namespace   Namespace
        Rename    count                      Count
        Rename    action                     Action
        Rename    reason                     Reason
        Rename    message                    Message
        Rename    type                       KubeEventType
        Rename    firstTimestamp             FirstSeen
        Rename    lastTimestamp              LastSeen
        Remove    metadata
        Remove    involvedObject
        Remove    source
        Remove    eventTime
        Remove    involvedObject_resourceVersion
        Remove    involvedObject_uid
        Remove    involvedObject_fieldPath
        Remove    involvedObject_labels
        Remove    involvedObject_annotations
        Remove    metadata_name
        Remove    metadata_namespace
        Remove    metadata_uid
        Remove    metadata_resourceVersion
        Remove    metadata_managedFields

  output-kubernetes.conf: |
    [OUTPUT]
        Name                        azure_kusto
        Match                       kubernetes.events.*
        Tenant_Id                   ${FLUENT_ADX_TENANT_ID}
        Client_Id                   ${FLUENT_ADX_CLIENT_ID}
        Client_Secret               ${FLUENT_ADX_CLIENT_SECRET}
        Ingestion_Endpoint          https://ingest-adxaks.northeurope.kusto.windows.net
        Database_Name               Kubernetes
        Table_Name                  KubeEvents
        Ingestion_Mapping_Reference FluentBitMappingEvents
        Log_Key                     log
        Include_Tag_Key             Off
        Include_Time_Key            On
        Time_Key                    TimeGenerated
        Retry_Limit                 False
        Log_Level                   info
        compression_enabled         on
        ingestion_endpoint_connect_timeout 60
        ingestion_resources_refresh_interval 3600
        # buffering_enabled false

Before we roll out the Fluent Bit deployment, we prepare the Azure Data Explorer side with a new table called KubeEvents in the Kubernetes database.

.create table KubeEvents (
    TimeGenerated: datetime, Namespace: string, Name: string, Kind: string, ApiVersion: string, KubeEventType: string, Action: string,
    Reason: string, Message: string, Count: string, CreationTimestamp: datetime, FirstSeen: datetime, LastSeen: datetime,
    SourceComponent: string, SourceComputer: string, ReportingComponent: string, ReportingComputer: string,
    Cluster: string, Region: string, Environment: string
    )

Afterwards, we set the ingestion mapping.

.create-or-alter table KubeEvents ingestion json mapping "FluentBitMappingEvents"
    ```[
    {"column": "TimeGenerated", "datatype": "datetime", "Properties": {"Path": "$.TimeGenerated"}},
    {"column": "Namespace", "datatype": "string", "Properties": {"Path": "$.log.Namespace"}},
    {"column": "Name", "datatype": "string", "Properties": {"Path": "$.log.Name"}},
    {"column": "Kind", "datatype": "string", "Properties": {"Path": "$.log.Kind"}},
    {"column": "ApiVersion", "datatype": "string", "Properties": {"Path": "$.log.ApiVersion"}},
    {"column": "KubeEventType", "datatype": "string", "Properties": {"Path": "$.log.KubeEventType"}},
    {"column": "Action", "datatype": "string", "Properties": {"Path": "$.log.Action"}},
    {"column": "Reason", "datatype": "string", "Properties": {"Path": "$.log.Reason"}},
    {"column": "Message", "datatype": "string", "Properties": {"Path": "$.log.Message"}},
    {"column": "Count", "datatype": "string", "Properties": {"Path": "$.log.Count"}},
    {"column": "CreationTimestamp", "datatype": "datetime", "Properties": {"Path": "$.log.CreationTimestamp"}},
    {"column": "FirstSeen", "datatype": "datetime", "Properties": {"Path": "$.log.FirstSeen"}},
    {"column": "LastSeen", "datatype": "datetime", "Properties": {"Path": "$.log.LastSeen"}},
    {"column": "SourceComponent", "datatype": "string", "Properties": {"Path": "$.log.SourceComponent"}},
    {"column": "SourceComputer", "datatype": "string", "Properties": {"Path": "$.log.SourceComputer"}},
    {"column": "ReportingComponent", "datatype": "string", "Properties": {"Path": "$.log.ReportingComponent"}},
    {"column": "ReportingComputer", "datatype": "string", "Properties": {"Path": "$.log.ReportingComputer"}},
    {"column": "Cluster", "datatype": "string", "Properties": {"Path": "$.log.Cluster"}},
    {"column": "Region", "datatype": "string", "Properties": {"Path": "$.log.Region"}},
    {"column": "Environment", "datatype": "string", "Properties": {"Path": "$.log.Environment"}},
    ]```
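To double-check that the mapping has been created as expected, the ingestion mappings of the KubeEvents table can be listed with the following management command.

.show table KubeEvents ingestion json mappings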

Rollout

Once everything is prepared, we roll out the Fluent Bit deployment to gather Kubernetes events and ingest them to the Azure Data Explorer cluster.

❯ ./deploy.sh TENANT_ID CLIENT_ID CLIENT_SECRET
❯ ./deploy.sh 00000000-0000-0000-0000-000000000000 00000000-0000-0000-0000-000000000000 Pa$$W0rd

As seen in the screenshot below, the single Fluent Bit pod for the Kubernetes event gathering is running in our Azure Kubernetes Service cluster.

Terminal output of Fluent Bit Deployment rollout

Besides that, we see the first data flowing into the Azure Data Explorer table.

Azure Data Explorer Query interface
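A quick way to inspect the ingested events is a small KQL query against the KubeEvents table, for example for warning events. The query below is only a sketch; adjust the time window and filters to your needs.

KubeEvents
| where TimeGenerated > ago(15m)
| where KubeEventType == "Warning"
| project TimeGenerated, Namespace, Name, Kind, Reason, Message, Count
| take 10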

Summary

Since Fluent Bit switched to the Kubernetes watch stream, the configuration of the input plugin is straightforward. The only challenge is configuring external storage for the database that keeps track of which Kubernetes events have already been processed.

The examples can be found on my GitHub repository.

-> https://github.com/neumanndaniel/scripts/blob/main/Azure_Data_Explorer/Fluent_Bit_Kubernetes/Kubernetes_Events_ADX_Output.kql
-> https://github.com/neumanndaniel/kubernetes/tree/master/fluent-bit/azure-data-explorer-kubernetes-events
