• When we talk about private cloud, we often end up talking about automation.

    That makes sense. Nobody wants to build a cloud platform where every request still ends up as a ticket, a manual change, and a meeting. Self-service is part of the point.

    But self-service without guardrails is not really cloud. It is delegated infrastructure access with a nicer user interface.

    That is why I think this is one of the more interesting parts of VMware Cloud Foundation 9.1 and VCF Automation. Users can request virtual machines, namespaces, VKS clusters, networks, and other services, but the platform team can also define the boundaries around that consumption up front.

    • Who can consume what?
    • Where can workloads land?
    • How much can a tenant consume?
    • Which networks can they attach to?
    • When should temporary workloads disappear again?

    Those boundaries are what I mean by guardrails in this context.

    In this article I am focusing on the All Apps organization model in VCF Automation. That is the model built around vSphere Supervisor, regions, region quotas, projects, namespaces, namespace classes, VPCs, and the IaaS services used to consume VMs and Kubernetes-based services. I am not trying to cover the VM Apps organization model or the older Aria Automation-style consumption model here.

    Guardrails are not a single feature

    It is tempting to look for a specific feature in VCF Automation called “guardrails”. I do not think that is the right way to look at it.

    The guardrails are really the result of how the platform model is put together. Organizations, regions, quotas, projects, namespaces, namespace classes, VPCs, catalog items, policies, identity, and extensibility all contribute in different ways.

    Some of this becomes hard limits. Some of it becomes access control. Some of it becomes lifecycle management or placement control. And some of it is simply about giving users sensible defaults instead of making every request a small design exercise.

    That last part is easy to underestimate. A shared platform should not rely on every consumer understanding the full architecture. It should encode enough of the architecture so that normal users can consume services without constantly stepping outside the intended design.

    [Figure: Layered guardrails model for VCF Automation 9.1 All Apps. Guardrails are shown as layered platform controls.]

    The organization is the tenant boundary

    At the top of the All Apps consumption model we have the organization.

    An organization can represent a tenant, a customer, a department, a line of business, or some other administrative boundary. Exactly what it represents depends on the environment.

    In an enterprise, I would not automatically create an organization per application team. That would probably create too much overhead. I would rather use organizations for larger boundaries where identity, quota, catalog, networking, and governance need to be separated.

    In a service provider or internal platform provider model, the mapping is more obvious. One organization can represent one customer, agency, department, or tenant.

    The key point is that the organization becomes the first real consumption boundary. It groups users, policies, resources, services, catalog entities, and access control.

    This is also where VCF 9 starts to feel less like infrastructure with automation added on the side, and more like an actual platform model. The provider is not just handing consumers access to vCenter and NSX objects, but exposing more controlled abstractions through VCF Automation.

    Regions and quotas define what can be consumed

    A region in VCF Automation represents a collection of infrastructure resources. In practice, it maps to one or more Supervisor-enabled clusters and provides compute, memory, storage, and networking capacity.

    The region by itself does not give an organization access to anything. That access is controlled through a region quota. The quota defines not only how much of the region an organization can consume, but also which capabilities are available. CPU and memory limits, reservations, VM classes, storage classes, storage limits, and zones are all part of the model.

    This is also why I see region quota as one of the defining constructs of the All Apps model. It is the mechanism that connects provider-managed infrastructure to organization-level consumption.

    I think this is one of the more important guardrails in the design because it changes the nature of self-service. Without quota, self-service depends too much on trust and informal agreements. With quota, it becomes an allocation model. The provider can give an organization access to capacity without giving it unlimited access to the platform.

    It also makes the conversation with the business or application teams more concrete. Instead of saying “please don’t use too much”, the platform team can say “this is the capacity envelope assigned to you”. For a shared private cloud platform, that is a much healthier starting point.
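
    To make the allocation idea concrete, here is a minimal Python sketch of a quota check. It does not use the VCF Automation API. The RegionQuota class, its field names, and the numbers are all hypothetical and only model the principle: a request is admitted when it uses an allowed VM class and still fits inside the remaining capacity envelope.

```python
from dataclasses import dataclass


@dataclass
class RegionQuota:
    """Hypothetical model of one organization's capacity envelope in a region."""
    cpu_ghz_limit: float
    memory_gib_limit: float
    allowed_vm_classes: set[str]
    cpu_ghz_used: float = 0.0
    memory_gib_used: float = 0.0

    def can_allocate(self, vm_class: str, cpu_ghz: float, memory_gib: float) -> bool:
        """True only if the VM class is allowed and the request fits the envelope."""
        if vm_class not in self.allowed_vm_classes:
            return False
        fits_cpu = self.cpu_ghz_used + cpu_ghz <= self.cpu_ghz_limit
        fits_memory = self.memory_gib_used + memory_gib <= self.memory_gib_limit
        return fits_cpu and fits_memory

    def allocate(self, vm_class: str, cpu_ghz: float, memory_gib: float) -> None:
        """Record the allocation, or raise if it falls outside the quota."""
        if not self.can_allocate(vm_class, cpu_ghz, memory_gib):
            raise ValueError(f"request for {vm_class!r} is outside the region quota")
        self.cpu_ghz_used += cpu_ghz
        self.memory_gib_used += memory_gib
```

    The point of the sketch is that the answer to "can I have this?" comes from the envelope itself, not from a conversation with the platform team.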

    Projects are where teams start to consume

    Inside an organization, projects are closer to the application team or workload team boundary. Users and groups are added to projects, catalog items can be shared with projects, and resources are provisioned into namespaces that belong to those projects.

    I like this model because it avoids making organizations too granular. The organization can remain the tenant or major administrative boundary, while projects can represent application teams, environments, or delivery groups.

    For example, one organization might represent a department, with multiple projects for application teams, separate namespaces for application environments, and different policies depending on whether the project is used for production, test, or development.

    That gives the platform team a useful hierarchy without exposing every underlying infrastructure detail.

    Namespaces are the application landing zone

    Namespaces are where the consumption model starts to become practical for application teams.

    A namespace gives a team a defined place to deploy into. Compute, memory, storage, and networking are assigned to that namespace, and VMs, VKS clusters, volumes, and other services can then be provisioned inside that boundary.

    That is a cleaner model than giving every team direct access to the underlying infrastructure and expecting them to interpret the platform design correctly. The application team gets a place to land, while the platform team still controls the size of that place, which storage classes can be used, which VM classes are available, and which networks are attached.

    In this context, a namespace is not just a Kubernetes construct. It becomes a cloud consumption boundary for an application or environment.

    Namespace classes make the model repeatable

    If every namespace is created manually with custom settings, the model will eventually drift. Namespace classes help avoid that by turning common namespace patterns into reusable templates.

    A namespace class can define CPU limits, memory limits, reservations, storage limits, VM classes, storage classes, content libraries, and zone placement. This is the kind of construct I like in a platform design because it makes the intended consumption model repeatable without forcing every application team to understand all the underlying design choices.

    Instead of asking each team from scratch how much CPU, memory, and storage they need and which capabilities they require, the platform team can define a small number of sensible options.

    For example, the platform team might define classes such as dev-small, test-medium, prod-medium, prod-large, gpu-enabled, or restricted.

    The exact names are not important. What matters is giving users enough choice to be useful, without turning every namespace request into a small design exercise.
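
    As an illustration of why a small set of classes keeps the model repeatable, the following Python sketch treats namespace classes as named templates. The field names and values are assumptions for the example and do not reflect the actual namespace class schema in VCF Automation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NamespaceClass:
    """Hypothetical template holding the settings a namespace inherits."""
    cpu_limit_ghz: float
    memory_limit_gib: int
    storage_limit_gib: int
    vm_classes: tuple[str, ...]
    storage_classes: tuple[str, ...]


# A deliberately small catalog of classes; names follow the examples above.
NAMESPACE_CLASSES = {
    "dev-small": NamespaceClass(4.0, 16, 200, ("small",), ("standard",)),
    "prod-medium": NamespaceClass(16.0, 64, 1000, ("small", "medium"), ("standard", "fast")),
}


def namespace_spec(name: str, class_name: str) -> dict:
    """Expand a class name into a full namespace spec; unknown classes are rejected."""
    if class_name not in NAMESPACE_CLASSES:
        raise KeyError(f"unknown namespace class: {class_name}")
    template = NAMESPACE_CLASSES[class_name]
    return {"name": name, "class": class_name, **asdict(template)}
```

    A requester only supplies a name and a class. Everything else comes from the template, which is what keeps namespaces from drifting apart over time.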

    Networking guardrails are critical

    Networking is one of the areas where self-service can quickly become problematic. It is useful to let application teams consume network services, but it is usually not a good idea to expose the full networking design to every consumer.

    In VCF Automation, the networking model uses constructs such as VPCs, connectivity profiles, transit gateways, IP blocks, external connections, NAT, VPN, and load balancing. The provider defines the available building blocks, while organization and project users consume them through controlled abstractions.

    IP consumption is part of that model as well. External IP blocks are not just pools of addresses that organizations can freely consume. The provider can apply IP quotas to control how many individual IP addresses or CIDRs an organization can allocate, and can also restrict the largest subnet size that can be requested. That matters because IP address space is often one of the easiest shared resources to waste if it is not governed.
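
    The IP quota idea can be sketched with Python's standard ipaddress module. This is only a model of the rule, not how VCF Automation or NSX implements it. The function name, its parameters, and the example addresses (taken from a documentation range) are hypothetical.

```python
import ipaddress


def is_request_allowed(
    cidr: str,
    ip_block: str,
    min_prefix_length: int,
    cidrs_in_use: int,
    cidr_quota: int,
) -> bool:
    """Check a CIDR request against a hypothetical IP quota.

    min_prefix_length caps the largest subnet: with 26, a /25 is refused
    because it is bigger than a /26.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    block = ipaddress.ip_network(ip_block, strict=True)
    if not network.subnet_of(block):
        return False  # request lies outside the provider's IP block
    if network.prefixlen < min_prefix_length:
        return False  # subnet is larger than the quota permits
    return cidrs_in_use < cidr_quota  # room left in the CIDR count


# Example: a /27 from the block is fine, a /25 is too large.
print(is_request_allowed("203.0.113.64/27", "203.0.113.0/24", 26, 1, 4))
print(is_request_allowed("203.0.113.0/25", "203.0.113.0/24", 26, 1, 4))
```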

    This fits well with how I think NSX should be used in a modern VCF platform. NSX should provide the network and security foundation, but most consumers should not need to work directly with every NSX object underneath. They need a network model they can consume safely, not unrestricted access to the implementation details.

    The platform team still owns the routing model, external connectivity, IP allocation boundaries, edge capacity, and security architecture. VCF Automation helps expose those capabilities at the right level.

    Catalog items reduce unnecessary freedom

    There is a difference between self-service and free-for-all. A catalog item gives users a controlled way to request something useful, whether that is a VM, a namespace, a VKS cluster, an application blueprint, or a workflow.

    The value is not only in the request itself, but in the design work that happens before the item is published. The platform team or organization administrator can decide what a good request should look like, which inputs should be exposed, which defaults should be used, and which options should be hidden.

    This is also why I think custom forms are more important than they might look at first. They are not just there to make the request page nicer. They can hide complexity, validate input, simplify choices, and reduce the number of ways users can get something wrong.

    A good catalog item should be easy, almost boring, to consume. The design work should happen before the item is published, not every time someone requests it.

    Policies are the visible guardrails

    When people think about governance in automation platforms, they often think about policies first. That makes sense, because approval policies, lease policies, day-2 action policies, and IaaS resource policies are the most explicit governance constructs in VCF Automation.

    Each of them solves a different problem. Approval policies can require human approval before a deployment or day-2 action continues. Lease policies help prevent temporary workloads from living forever. Day-2 action policies control what users are allowed to do after something has been deployed. IaaS resource policies can enforce more technical constraints around what resources are allowed inside namespaces.

    They are all useful, but I would not start with approval policies. If everything requires approval, the platform probably has not encoded enough of the design into hard guardrails. At that point, the approval process becomes a substitute for platform design, and that is not where I would want to end up.

    I would rather use quotas, namespace classes, project scoping, RBAC, catalog design, and IaaS resource policies to prevent bad outcomes automatically. Then use approvals for exceptions, high-cost requests, risky actions, or production-impacting changes.

    Day-2 matters just as much as day-1

    Provisioning usually gets most of the attention. That is understandable. The first thing people want to see from an automation platform is often whether it can deploy something at all.

    But the operating model does not stop when the deployment is created. The next set of questions is just as important: can the user resize the VM, delete the deployment, extend the lease, add a disk, change network settings, or run a custom action?

    Those actions can have just as much impact as the original request. In some cases they are more sensitive. Creating something in a controlled way is one thing. Changing or deleting something that already exists can be a bigger operational risk.

    This is where day-2 action policies become useful. They allow the platform team to define which actions are available after deployment, and to whom. If the deployment process is governed but the operational lifecycle is wide open, the platform still has a gap.
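
    Conceptually, a day-2 action policy is a mapping from who you are to what you may do after deployment. The Python sketch below models that with a plain dictionary. The role and action names are illustrative assumptions, not identifiers from VCF Automation.

```python
# Hypothetical mapping of roles to the day-2 actions they may run.
DAY2_POLICY = {
    "project_admin": {"resize", "add_disk", "extend_lease", "delete"},
    "project_user": {"add_disk", "extend_lease"},
    "auditor": set(),
}


def is_action_allowed(role: str, action: str) -> bool:
    """Default-deny: unknown roles and unlisted actions are refused."""
    return action in DAY2_POLICY.get(role, set())
```

    The default-deny lookup is the important design choice here. An action that nobody thought about should be unavailable, not silently allowed.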

    Lease policies are simple, but powerful

    Lease policies are not a new idea, but they are still one of the most useful guardrails. A lot of waste in infrastructure platforms does not come from bad intent. It comes from resources that were created for a valid reason and then forgotten.

    That is especially common with test environments, proof-of-concepts, sandboxes, and other temporary deployments. The work gets delayed, the person who created the environment moves on, and eventually nobody knows whether it is still needed.

    For those types of workloads, I would almost always use leases. They do not have to be aggressive, but there should be some kind of lifecycle expectation. Temporary resources should either be renewed because they are still needed, or reclaimed because they are not.

    Production is different, of course. I would not want production workloads to disappear because someone forgot to click renew. But for non-production and temporary environments, lease policies are one of the easiest ways to keep the platform clean.
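
    The lease logic itself is simple enough to sketch in a few lines of Python. The environment names and lease lengths below are assumptions chosen for the example. In a real platform they would come from the lease policies, not from a hard-coded table.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical lease lengths in days; None means the workload never expires.
LEASE_DAYS = {"sandbox": 14, "test": 30, "dev": 90, "production": None}


def lease_expiry(environment: str, created_at: datetime) -> Optional[datetime]:
    """Return when the lease ends, or None for environments without a lease."""
    days = LEASE_DAYS[environment]
    return None if days is None else created_at + timedelta(days=days)


def should_reclaim(environment: str, created_at: datetime, now: datetime) -> bool:
    """True once a leased workload has passed its expiry without renewal."""
    expiry = lease_expiry(environment, created_at)
    return expiry is not None and now >= expiry
```

    Production maps to None on purpose, so nothing in this model can reclaim a production workload because of a forgotten renewal.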

    IaaS resource policies are where hard constraints belong

    IaaS resource policies are interesting because they move beyond human approval and into admission control. Instead of asking someone to approve every exception, the platform can enforce certain rules before resources are created.

    This is where you can control what types of namespace resources are allowed. For example, you might restrict which services can be used, which VM classes are available, which Kubernetes versions are acceptable, or how VKS clusters can be shaped.
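
    To show what admission control means in practice, here is a small Python sketch that evaluates a requested VKS cluster against a set of rules before anything is created. The policy keys, the cluster fields, and the version strings are placeholders I made up for the example. They are not the actual IaaS resource policy format.

```python
# Hypothetical policy; values are placeholders, not product defaults.
POLICY = {
    "allowed_vm_classes": {"small", "medium"},
    "allowed_kubernetes_versions": {"1.31", "1.32"},
    "max_worker_nodes": 6,
}


def violations(cluster: dict) -> list[str]:
    """Return every rule the requested cluster breaks; empty means admitted."""
    found = []
    if cluster["vm_class"] not in POLICY["allowed_vm_classes"]:
        found.append(f"VM class {cluster['vm_class']!r} is not allowed")
    if cluster["kubernetes_version"] not in POLICY["allowed_kubernetes_versions"]:
        found.append(f"Kubernetes version {cluster['kubernetes_version']!r} is not allowed")
    if cluster["worker_nodes"] > POLICY["max_worker_nodes"]:
        found.append(f"{cluster['worker_nodes']} worker nodes exceeds the maximum")
    return found
```

    Returning all violations at once, instead of stopping at the first one, gives the requester a clear list of what to change.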

    That is closer to how I think platform engineering should work. The desired behavior should not only live in design documents, naming standards, or operational guidelines. Where it makes sense, it should be encoded into the platform itself.

    At the same time, this is also where policy design needs some care. If policies overlap badly or contradict each other, users will not experience the platform as governed. They will experience it as broken.

    For that reason, I would keep the first version simple. Start with a small number of clear policies that express real constraints, and expand when the operational consequences are better understood.

    Identity and roles still matter

    Guardrails are not only about resources. They are also about who can do what, and that is where the role model becomes important.

    VCF Automation has provider-level roles, organization roles, and project roles. That makes it possible to separate platform administration, tenant administration, project administration, consumption, and auditing without giving everyone the same level of access.

    This is another area where I would avoid over-engineering too early. I would rather start with a small number of clear personas:

    • Platform provider administrator
    • Organization administrator
    • Project administrator
    • Project user
    • Auditor

    Once those personas are clear, it becomes easier to decide what each of them should actually be allowed to do.

    The worst outcome is usually not having too few roles in the beginning. It is a role model nobody understands. If people cannot explain who is allowed to do what, the access model is already too complicated.

    Extensibility fills the gaps

    No platform product will know every rule an organization wants to enforce. There will always be things that depend on local processes, existing systems, naming standards, ownership models, or security requirements.

    That is where event subscriptions, VCF Operations Orchestrator, APIs, and external systems become useful. They allow the platform to connect with the parts of the organization that sit outside VCF Automation itself.

    In practice, that might mean validating names against a standard, creating tickets, updating a CMDB, applying metadata or tags, or enriching an approval based on cost center, data classification, or environment type. These are not always native product features, but they are still real platform requirements.
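
    Name validation is a good example of logic that typically lives in an extensibility workflow. The Python sketch below shows the core of such a check. The naming standard itself (environment, team, three-digit sequence) is invented for the example, and the function is not tied to any VCF Automation or Orchestrator API.

```python
import re
from typing import Optional

# Hypothetical naming standard: <env>-<team>-<3-digit sequence>
NAME_PATTERN = re.compile(r"^(?P<env>dev|test|prod)-(?P<team>[a-z]{2,12})-(?P<seq>\d{3})$")


def validate_name(name: str) -> Optional[dict]:
    """Return the parsed name parts if the name is compliant, otherwise None."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None
```

    The parsed parts are useful beyond validation. The same values could feed tags, a CMDB record, or an approval decision based on environment type.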

    I would not use extensibility to compensate for a weak platform design. But when the main model is sound, extensibility is a good way to connect the platform to the rest of the operating model.

    My preferred design approach

    If I were designing this for a serious enterprise VCF platform, I would not start with approval policies. I would start with the platform boundaries.

    The first step would be to define the basic consumption model: organizations, regions, zones, and region quotas. That gives the platform a structure before users start consuming anything.

    After that, I would define the common landing zones and connectivity model. That means standard namespace classes, the VPC and connectivity model, projects, project roles, and the catalog items that represent the normal consumption paths.

    Only then would I start adding policies. IaaS resource policies are useful for hard technical constraints. Lease policies are useful for temporary workloads. Day-2 policies are useful for operational control. Approval policies should be added where human approval actually adds value, not as the default answer to every governance question.

    Finally, I would use extensibility for the things that need to connect to the rest of the organization, such as external validation, CMDB updates, ticketing, metadata, or approval enrichment.

    This order matters because it changes the role of approvals. If you start with approvals, you risk building a ticketing system with a cloud portal in front of it. If you start with the platform boundaries, approvals become the exception rather than the foundation.

    Closing thoughts

    VCF Automation guardrails are not only about stopping users from doing things. They are mainly about making the platform easier to consume safely.

    A good guardrail should not feel like unnecessary friction. It should make the intended path clearer, reduce the number of decisions an application team needs to make, protect the platform from uncontrolled growth, and make operations more predictable.

    In older infrastructure designs, a lot of this discipline lived in documents, meetings, naming standards, and the heads of a few experienced engineers. That can work for a while, but it does not scale very well once more teams start consuming the platform.

    What I like about the VCF Automation model is that more of this discipline can be encoded into the platform itself. Not everything, and not perfectly, but enough to make the operating model more explicit.

    That is why guardrails are interesting to me. Not because they make self-service more restrictive, but because they make self-service more usable. Without them, a private cloud easily becomes just another shared infrastructure environment with a portal in front of it.

  • In my previous article, I reflected on what I would design differently if I were building an NSX platform today. That piece focused on architectural choices — fewer abstractions, clearer boundaries, stronger defaults.

    But design decisions are only part of the story. What ultimately matters is who carries responsibility for how the platform behaves over time.

    Much of my current work revolves around VMware Cloud Foundation and NSX. That hasn’t changed. What has changed is the altitude at which I tend to operate. The conversations are less about individual features and more about responsibility boundaries, lifecycle, and what the platform should enforce by default. VCF 9 simply makes those questions harder to ignore.

    Deploying a platform is one thing. Owning it is something else entirely. And on a VCF 9 platform, that distinction becomes very visible.

    Defaults and Guardrails

    On a VCF 9 platform, one of the primary responsibilities of a platform engineer is defining the defaults that everything else inherits. This sounds simple, but it rarely is.

    Defaults determine how namespaces are structured, how VPCs are segmented, what network isolation patterns are applied automatically, and which policies are enforced without anyone having to request them explicitly. They define what “normal” looks like.

    In many organizations, the tendency is to focus on flexibility first. Every team wants options. Every application is treated as unique. Over time, this leads to exception-driven design, where the platform becomes a collection of special cases. A platform engineer has to resist that.

    The goal isn’t to maximize flexibility. It’s to maximize safe autonomy. Application teams should be able to move quickly — within boundaries that are intentional and well understood. If every new workload requires new networking decisions or repeated security debates, the platform isn’t providing enough guidance.

    In a VCF 9 world, constructs like VPCs and integrated lifecycle management make it easier to encode these decisions directly into the platform. That doesn’t remove responsibility. It concentrates it. The platform engineer must decide which choices are available, which are restricted, and which shouldn’t be visible at all. Those decisions shape the operational behavior of the platform far more than any single feature configuration.

    Lifecycle Over Provisioning

    Provisioning is visible. Lifecycle is not.

    On a VCF 9 platform, provisioning a workload — whether a VM or a Kubernetes cluster — is increasingly straightforward. Templates exist. Policies are predefined. The automation layer does most of the work. That part isn’t where the platform engineer earns their keep.

    The real responsibility sits in lifecycle.

    Clusters need to be upgraded. Supervisor versions change. NSX components evolve. Dependencies shift. Policies that made sense six months ago may conflict with new operating models. All of that has to be absorbed without breaking what already runs on the platform.

    A platform engineer has to think in terms of failure domains and blast radius. What happens when a management domain upgrade introduces a behavioral change? How isolated are tenant VPCs during an incident? What is the rollback strategy if a network policy update has unintended side effects?

    These aren’t provisioning questions. They’re lifecycle questions.

    In many environments, the temptation is to optimize for day-one deployment. The real complexity shows up in year two and year three, when the platform has grown, ownership has shifted, and the original architects are no longer deeply involved.

    On VCF 9, lifecycle is integrated across compute, storage, networking, and automation. That integration is powerful, but it also means that changes ripple across layers. The platform engineer needs to understand those relationships and design with change in mind from the beginning.

    Provisioning makes something exist. Lifecycle ensures it continues to operate safely over time.

    What the Platform Engineer Is Not Responsible For

    Clear responsibility boundaries are just as important as defined ownership.

    In many organizations, “platform engineering” becomes a catch-all term. Anything that sits somewhere between infrastructure and applications tends to fall under it, and that ambiguity creates friction quickly.

    On a VCF 9 platform, the platform engineer isn’t responsible for writing application code, designing CI/CD pipelines, tuning individual microservices, or debugging application-level issues. Those are different domains, owned by different teams.

    The platform engineer also isn’t responsible for designing bespoke networking patterns for every workload. If each new application triggers a new NSX policy design session, the platform engineer has drifted into application ownership. The goal is to define patterns once and let teams consume them safely.

    Nor is the platform engineer a ticket router between compute, storage, and networking teams. One of the reasons VCF exists is to reduce fragmentation between those domains. The responsibility is to ensure they behave coherently — not to manually coordinate every interaction.

    In practice, this means saying “no” at the right time. It means pushing back when flexibility undermines operability. It means protecting the platform from well-intentioned customization that introduces long-term fragility.

    Platform engineering isn’t about controlling everything. It’s about deciding what must be controlled — and leaving the rest to the teams that own it.

    Closing Thoughts

    On a VCF 9 platform, the role of the platform engineer is less about configuring components and more about defining how they work together over time.

    The technology stack is mature. The automation layers are capable. Provisioning is no longer the hard part.

    Responsibility is.

    Defining defaults, designing for lifecycle, setting boundaries, and protecting coherence across compute, storage, networking, and automation aren’t glamorous tasks — but they determine whether the platform remains stable and operable years after its initial deployment.

    In that sense, platform engineering on VCF 9 isn’t a new discipline. It’s a shift in focus — less about building things and more about owning how they behave over time. And that shift changes what the role is really about.

  • When I started designing large NSX platforms, most of the hard problems were technical.

    How far could we push microsegmentation?
    How much overlay networking could we introduce?
    How flexible could we make the design so it would survive future requirements?

    At the time, that made a lot of sense.

    Today, the situation is different. NSX is mature, stable, and extremely capable. But the environment around it has changed quite a lot. VMware Cloud Foundation 9, platform thinking, GitOps, and new operating models have shifted where things like responsibility, security, and control should live.

    Looking back, I don’t think we designed things wrong. The designs were actually pretty good given the context. But if I were to design an NSX platform today, there are several choices I would make differently — not because NSX matters less, but because it needs to matter in a different way.

    I would design for fewer abstractions, not more

    Early NSX designs often aimed for maximum flexibility.

    We built layers of abstractions: segments, groups, policies, nested constructs. The thinking was usually sound — future-proofing, reuse, and separation of concerns. And technically, this worked.

    Operationally, it often didn’t.

    Every abstraction adds cognitive load. Someone has to understand it, maintain it, debug it, and eventually explain it to the next team. Over time, that cost adds up faster than most people expect.

    If I were designing today, I would be more conservative. I would introduce abstractions only when there is a clear need — not just because “we might need it later”.

    I would stop treating microsegmentation as the primary security control

    This is not an argument against microsegmentation. It is still one of the strongest features NSX offers.

    But in the past, microsegmentation was often treated as the foundation of security — something every workload would eventually get, almost by default.

    What has changed here is that there are now more layers that contribute meaningfully to blast-radius reduction:

    • Identity-aware workloads
    • Stronger application-level security controls
    • Kubernetes-native isolation mechanisms
    • Platform-level guardrails

    In many environments, I’ve seen microsegmentation applied too early and too broadly. The result is usually slower adoption, more fragile policy sets, and eventually, a security posture that is hard to reason about during incidents.

    Today, I would treat microsegmentation as a precision tool, not a baseline requirement. Extremely valuable in the right places — but not something that needs to be everywhere from day one.

    The goal remains the same: reduce blast radius.
    There are now simply more ways to get there.

    What NSX is (and is not) meant to be today

    This is something I myself underestimated early on.

    NSX is very powerful, and that sometimes creates pressure to use it as a solution for problems it was never meant to solve.

    NSX is not:

    • A developer-facing API
    • A CI/CD system
    • An application security framework
    • A replacement for platform governance

    With Supervisor, namespaces, and Kubernetes in the picture, there is often pressure to surface NSX concepts upwards into application design. In many cases, that is a mistake.

    NSX works best when it is treated as foundational infrastructure — something that provides strong defaults and guardrails, but stays largely invisible to application teams.

    NSX is becoming part of the platform, not a separate design exercise

    If I were designing an NSX platform today, I would lean heavily into the fact that NSX is no longer a separate layer that needs to be engineered and exposed on its own.

    With the VPC construct and the VCF Automation All Apps model, NSX becomes an integrated part of the platform’s native application model. Lifecycle, isolation, and consumption patterns are defined up front, rather than stitched together through project-specific designs.

    From an architectural perspective, this is an important shift. It reduces the need to surface NSX concepts upward, while still preserving strong isolation and governance underneath. It also enforces some of the discipline that previously depended entirely on human restraint.

    This does not remove the need for NSX expertise — but it does change where that expertise should be applied. The focus moves away from per-application engineering and toward designing the platform’s default behavior correctly from the start.

    I would align NSX more with platform lifecycle, not projects

    One reason lifecycle thinking matters so much is organizational reality. Platforms rarely live in stable conditions. Teams change, people leave, responsibilities shift, and platforms get handed over.

    Designs that assume long-term ownership by the same small group tend to fail quietly over time. When I design NSX platforms today, I try to assume that I won’t be there in a few years — and that the people operating it will have different constraints and priorities than I did.

    This pushes me toward standardization, clearer boundaries, and designs that are easier to reason about under pressure.

    Closing Thoughts

    NSX still matters. A lot.

    But its value today is less about how much it can do, and more about how deliberately it is used. Platform-centric and cloud-native operating models don’t make NSX irrelevant — they make its role clearer and more focused.

    If I were designing an NSX platform today, I would aim for fewer concepts, clearer boundaries, and designs that survive not just technical change, but organizational change as well.

    The best NSX platforms I see today are often the ones nobody talks about — because they just work, and nobody is afraid to touch them.

  • Avi Load Balancer Metrics with Prometheus and Grafana

    Avi Load Balancer offers a wealth of valuable metrics that can be accessed directly via the Avi Controller’s UI or API.

    However, there are various reasons why you might want to make these metrics available outside of its native platform. For instance, you might wish to avoid granting users or systems direct access to the Avi Load Balancer management plane solely for metric consumption. Alternatively, you might need to store and analyze metrics in a centralized system or simply back them up for future use.

    Fortunately, there are several methods for fetching metrics from the Avi Load Balancer and processing or storing them externally. In this article, I’ll guide you through the process of setting up an automated workflow where Avi Load Balancer metrics are fetched by Prometheus and visualized in Grafana.

    Lab Environment

    My lab environment for this exercise consists of the following components:

    • vSphere 8 Update 3
    • A vSphere cluster with 3 ESXi hosts configured as Supervisor
    • A TKG Service cluster (Kubernetes cluster) with 1 control plane node and 3 worker nodes
    • NSX 4.2.1.0 as the network stack
    • Avi Load Balancer 30.2.2 with DNS virtual service
    • Avi Kubernetes Operator (AKO)
    • vSAN storage

    Apart from the Avi Load Balancer, none of these components are strictly required. For all I know, this exercise could be performed using upstream Kubernetes on bare metal instead. However, this is how my lab is currently configured, and I wanted to share that setup for your reference.

    High-Level Overview

    Below is a simple high-level overview illustrating the workflow we’re going to build. It demonstrates how Avi Load Balancer metrics flow through the system, from collection to visualization, using Prometheus and Grafana.

    The various components—Grafana, Prometheus, and Avi API Proxy—will be deployed as pods within my Kubernetes cluster.

    Let’s go!

    Namespace

    Keeping the components together in a dedicated namespace is my preferred approach in this case. This way, Prometheus can communicate with the Avi API Proxy using its Kubernetes-internal FQDN, and the same applies to communication between Grafana and Prometheus.

    Create the observability namespace:
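
    The command is not shown here; a minimal sketch, assuming kubectl already points at the workload cluster:

    ```shell
    # Namespace shared by the Avi API Proxy, Prometheus, and Grafana
    NAMESPACE="observability"

    kubectl create namespace "${NAMESPACE}"

    # Confirm that the namespace exists and reports Active
    kubectl get namespace "${NAMESPACE}"
    ```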

    Deploying Components

    Now, we can begin deploying the various components within this namespace.

    Avi API Proxy

    The Avi API Proxy is not a required component, but I recommend using it. Without the proxy, Prometheus would need to communicate directly with the Avi Controller. This would require enabling Basic Auth on the Avi Controller, which might not be desirable. There are additional advantages, as outlined in the official documentation. Essentially, by placing a proxy between the Avi Controller and Prometheus, we abstract away some complexity, resulting in a cleaner and more manageable solution.

    The official documentation also references a Docker container. However, since I want to deploy the Avi API Proxy as a pod in Kubernetes, the manifest for the deployment I came up with (including the method to expose it) looks like this:

    ##
    ## avi-api-proxy-deployment.yaml
    ##
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: avi-api-proxy
      namespace: observability
      labels:
        app: avi-api-proxy
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: avi-api-proxy
      template:
        metadata:
          labels:
            app: avi-api-proxy
        spec:
          containers:
          - name: avi-api-proxy
            image: avinetworks/avi-api-proxy:latest
            ports:
            - containerPort: 8080
            env:
            - name: AVI_CONTROLLER
              value: "10.203.240.15"
            - name: AVI_USERNAME
              value: "prometheus"
            - name: AVI_PASSWORD
              value: "VMware1!"
            - name: AVI_TIMEOUT
              value: "60"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: avi-api-proxy-service
      namespace: observability
      labels:
        app: avi-api-proxy
    spec:
      selector:
        app: avi-api-proxy
      ports:
      - protocol: TCP
        port: 8080
        targetPort: 8080
    

    You’ll have to update the values for AVI_CONTROLLER, AVI_USERNAME, and AVI_PASSWORD. After that, you should be good to go:
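
    Assuming the manifest was saved under the file name from its header comment, applying it could look like this:

    ```shell
    # File name taken from the comment at the top of the manifest
    MANIFEST="avi-api-proxy-deployment.yaml"

    kubectl apply -f "${MANIFEST}"
    ```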

    Verify that the deployment and service are up and running:
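
    A sketch of such a check (the original output is not reproduced here). The rollout status call blocks until the pod is ready or the timeout expires:

    ```shell
    NAMESPACE="observability"

    # Deployment, Service, and pod all carry the app=avi-api-proxy label
    kubectl -n "${NAMESPACE}" get deployment,service,pods -l app=avi-api-proxy

    # Wait for the Deployment to report its replica as available
    kubectl -n "${NAMESPACE}" rollout status deployment/avi-api-proxy --timeout=120s
    ```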

    Looking good!

    Prometheus ConfigMap

    Prometheus uses a configuration file in YAML format. Since we’re deploying Prometheus in Kubernetes, we’ll add the contents of our specific configuration file as a ConfigMap within our namespace. We’ll then instruct Prometheus to look for the configuration in that ConfigMap:

    ##
    ## prometheus-configmap.yaml 
    ##
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-server-conf
      labels:
        name: prometheus-server-conf
      namespace: observability
    data:
      prometheus.yml: |-
        global:
          scrape_interval: 15s
          evaluation_interval: 15s
        rule_files:
          - /etc/prometheus/prometheus.rules
        scrape_configs:
          - job_name: avi_api_vs1  ## Job name
            honor_timestamps: true
            params:
              tenant:
              - admin   ## Tenant Names to be mentioned comma separated 
            scrape_interval: 1m ## scrape interval
            scrape_timeout: 45s ## scrape timeout
            metrics_path: /api/analytics/prometheus-metrics/virtualservice ## VirtualService metrics collected
            scheme: http
            follow_redirects: true
            metric_relabel_configs:  ## config to replace the controller instance name
            - source_labels: [instance]
              separator: ;
              regex: (.*)
              target_label: instance
              replacement: pod-240-alb-controller ## replacement name to be used
              action: replace
            static_configs:
            - targets:
              - avi-api-proxy-service.observability.svc:8080 ## Avi API Proxy service FQDN and port
          - job_name: avi_api_se_specific
            honor_timestamps: true
            params:
              metric_id:
              - se_if.avg_bandwidth,se_if.avg_rx_pkts,se_if.avg_rx_bytes,se_if.avg_tx_bytes,se_if.avg_tx_pkts  ## Specific SE metrics which are collected
              tenant:
              - admin
            scrape_interval: 1m
            scrape_timeout: 45s
            metrics_path: /api/analytics/prometheus-metrics/serviceengine   ## Metrics path for  Service Engine
            scheme: http
            follow_redirects: true
            metric_relabel_configs:
            - source_labels: [instance]
              separator: ;
              regex: (.*)
              target_label: instance
              replacement: pod-240-alb-controller
              action: replace
            static_configs:
            - targets:
              - avi-api-proxy-service.observability.svc:8080
          - job_name: avi_api_se
            honor_timestamps: true
            params:
              tenant:
              - admin
            scrape_interval: 1m
            scrape_timeout: 45s
            metrics_path: /api/analytics/prometheus-metrics/serviceengine ## Metrics path for  Service Engine
            scheme: http
            follow_redirects: true
            metric_relabel_configs:
            - source_labels: [instance]
              separator: ;
              regex: (.*)
              target_label: instance
              replacement: pod-240-alb-controller
              action: replace
            static_configs:
            - targets:
              - avi-api-proxy-service.observability.svc:8080
          - job_name: avi_api_pool
            honor_timestamps: true
            params:
              tenant:
              - admin
            scrape_interval: 1m
            scrape_timeout: 45s
            metrics_path: /api/analytics/prometheus-metrics/pool  ## Metrics path for Pool  
            scheme: http
            follow_redirects: true
            metric_relabel_configs:
            - source_labels: [instance]
              separator: ;
              regex: (.*)
              target_label: instance
              replacement: pod-240-alb-controller
              action: replace
            static_configs:
            - targets:
              - avi-api-proxy-service.observability.svc:8080
          - job_name: avi_api_controller
            honor_timestamps: true
            scrape_interval: 1m
            scrape_timeout: 45s
            metrics_path: /api/analytics/prometheus-metrics/controller  ## Metrics path for Avi Controller 
            scheme: http
            follow_redirects: true
            metric_relabel_configs:
            - source_labels: [instance]
              separator: ;
              regex: (.*)
              target_label: instance
              replacement: pod-240-alb-controller
              action: replace
            static_configs:
            - targets:
              - avi-api-proxy-service.observability.svc:8080
    

    You’ll want to replace pod-240-alb-controller with the name of your Avi Load Balancer Controller.

    Note that we’re targeting the Avi API Proxy service and addressing it by its internal FQDN: avi-api-proxy-service.observability.svc

    Create the ConfigMap:
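
    Again assuming the file name from the header comment, something along these lines should do:

    ```shell
    MANIFEST="prometheus-configmap.yaml"

    kubectl apply -f "${MANIFEST}"

    # The rendered prometheus.yml should show up under the Data section
    kubectl -n observability describe configmap prometheus-server-conf
    ```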

    Prometheus

    The manifest for the Prometheus deployment looks like this:

    ##
    ## prometheus-deployment.yaml
    ##
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus-deployment
      namespace: observability
      labels:
        app: prometheus-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-server
      template:
        metadata:
          labels:
            app: prometheus-server
        spec:
          containers:
            - name: prometheus
              image: prom/prometheus
              args:
                - "--storage.tsdb.retention.time=12h"
                - "--config.file=/etc/prometheus/prometheus.yml"
                - "--storage.tsdb.path=/prometheus/"
              ports:
                - containerPort: 9090
              resources:
                requests:
                  cpu: 500m
                  memory: 500M
                limits:
                  cpu: 1
                  memory: 1Gi
              volumeMounts:
                - name: prometheus-config-volume
                  mountPath: /etc/prometheus/
                - name: prometheus-storage-volume
                  mountPath: /prometheus/
          volumes:
            - name: prometheus-config-volume
              configMap:
                defaultMode: 420
                name: prometheus-server-conf
            - name: prometheus-storage-volume
              emptyDir: {}
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus-service
      namespace: observability
      annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port:   '9090'
    spec:
      selector:
        app: prometheus-server
      ports:
        - port: 9090
          protocol: TCP
          targetPort: 9090
    

    Note that the contents of the ConfigMap are made accessible to Prometheus via a volume.

    Deploy Prometheus:
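
    With the manifest saved as prometheus-deployment.yaml (an assumption based on its header comment):

    ```shell
    MANIFEST="prometheus-deployment.yaml"

    # Creates both the Deployment and the Service defined in the file
    kubectl apply -f "${MANIFEST}"
    ```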

    Check the result:
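
    One possible way to check, sketched with the object names from the manifest:

    ```shell
    NAMESPACE="observability"

    # Deployment should show 1/1 ready; the Service should list port 9090
    kubectl -n "${NAMESPACE}" get deployment/prometheus-deployment service/prometheus-service

    # The last log lines usually reveal configuration errors, if any
    kubectl -n "${NAMESPACE}" logs deployment/prometheus-deployment --tail=20
    ```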

    We’re good.
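
    To go one step further and confirm that the scrape jobs from the ConfigMap are healthy, the Prometheus HTTP API can be queried through a temporary port-forward. This is a sketch rather than part of the original walkthrough, and it assumes curl and jq are available on your workstation:

    ```shell
    PROMETHEUS_URL="http://localhost:9090"

    # Forward local port 9090 to the Prometheus Service in the background
    kubectl -n observability port-forward service/prometheus-service 9090:9090 >/dev/null 2>&1 &
    PF_PID=$!
    sleep 2

    # Print each active scrape target as "<job name> <health>"
    curl -s --max-time 10 "${PROMETHEUS_URL}/api/v1/targets" \
      | jq -r '.data.activeTargets[] | "\(.labels.job)\t\(.health)"'

    # Stop the port-forward again
    kill "${PF_PID}"
    ```

    Each of the five avi_api_* jobs should be listed with health "up" once the first scrape has completed.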

    Ingress for Prometheus (optional)

    This is optional, but Prometheus has a web UI that can be quite handy from time to time. Additionally, since I have the Avi Kubernetes Operator (AKO) configured in my cluster, it’s easy to create an Ingress for the Prometheus service.

    ##
    ## prometheus-ingress.yaml
    ##
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: prometheus-ingress
      namespace: observability
    spec:
      ingressClassName: avi-lb
      rules:
        - host: prometheus.ako.lab
          http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: prometheus-service
                  port: 
                    number: 9090
    

    prometheus.ako.lab and several other Ingresses hosted by Avi Load Balancer in my lab:

    Grafana ConfigMap

    Grafana is the component that transforms our Prometheus metrics into visually appealing graphs, making it easier to interpret the data.

    The configuration we need to inject into our Grafana instance is the Prometheus data source. To do this, we’ll use the following ConfigMap:

    ##
    ## grafana-configmap.yaml 
    ##
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: grafana-datasources
      namespace: observability
    data:
      prometheus.yaml: |-
        {
            "apiVersion": 1,
            "datasources": [
                {
                    "access": "proxy",
                    "editable": true,
                    "name": "Prometheus",
                    "orgId": 1,
                    "type": "prometheus",
                    "url": "http://prometheus-service.observability.svc:9090",
                    "version": 1
                }
            ]
        }
    

    Grafana will use Prometheus’s internal FQDN to access the service: prometheus-service.observability.svc

    Create the ConfigMap:
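
    A sketch, assuming the manifest was saved as grafana-configmap.yaml:

    ```shell
    MANIFEST="grafana-configmap.yaml"

    kubectl apply -f "${MANIFEST}"

    # The ConfigMap should now be listed in the namespace
    kubectl -n observability get configmap grafana-datasources
    ```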

    Grafana

    Finally, we deploy Grafana using this manifest:

    ##
    ## grafana-deployment.yaml
    ##
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: grafana
      namespace: observability
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: grafana
      template:
        metadata:
          name: grafana
          labels:
            app: grafana
        spec:
          containers:
          - name: grafana
            image: grafana/grafana:latest
            ports:
            - name: grafana
              containerPort: 3000
            resources:
              limits:
                memory: "1Gi"
                cpu: "1000m"
              requests: 
                memory: 500M
                cpu: "500m"
            volumeMounts:
              - mountPath: /var/lib/grafana
                name: grafana-storage
              - mountPath: /etc/grafana/provisioning/datasources
                name: grafana-datasources
                readOnly: false
          volumes:
            - name: grafana-storage
              emptyDir: {}
            - name: grafana-datasources
              configMap:
                  defaultMode: 420
                  name: grafana-datasources
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: grafana-service
      namespace: observability
      annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port:   '3000'
    spec:
      selector:
        app: grafana
      ports:
        - port: 3000
          protocol: TCP
          targetPort: 3000
    

    Validate the result:
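
    Assuming the manifest was saved as grafana-deployment.yaml, applying and validating it could look like this:

    ```shell
    NAMESPACE="observability"
    MANIFEST="grafana-deployment.yaml"

    kubectl apply -f "${MANIFEST}"

    # Wait until the Grafana pod is ready, then list the resulting objects
    kubectl -n "${NAMESPACE}" rollout status deployment/grafana --timeout=120s
    kubectl -n "${NAMESPACE}" get deployment/grafana service/grafana-service
    ```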

    Ingress for Grafana (optional)

    Creating an Ingress for the Grafana service is also optional. If you want to access this Grafana instance from outside the Kubernetes cluster, there are several ways to achieve that. In my case, I’ll create an Ingress and let AKO handle the rest. 😉

    ##
    ## grafana-ingress.yaml
    ##
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: grafana
      namespace: observability
    spec:
      ingressClassName: avi-lb
      rules:
        - host: grafana.ako.lab
          http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: grafana-service
                  port: 
                    number: 3000
    

    Grafana Dashboards

    At this point, we should have a working solution. Prometheus is fetching and storing Avi Load Balancer metrics through the Avi API Proxy, while Grafana is configured with the Prometheus data source and ready to visualize the data using dashboards.

    Speaking of dashboards, I found some deep down in the devops repository on the Avi Networks GitHub that work out-of-the-box with what we’ve set up here. There are six dashboards in total:

    Make sure to download these files, then log in to Grafana and import them as dashboards:

    Once all the files have been imported, you should see something similar to this:

    You can click through the slideshow below to view a screenshot of each dashboard:

    Summary

    In this article, we explored the steps for configuring an automated workflow that collects Avi Load Balancer metrics with Prometheus via the Avi API Proxy and visualizes them through Grafana dashboards. All components were deployed as containers within a dedicated Kubernetes namespace.

    This simple solution was implemented in an isolated lab environment. Deploying it in a production environment would require additional considerations, such as externalization, persistence, security, backup strategies, and adherence to Kubernetes best practices.

    The YAML manifests that I used in this exercise can be found in my GitHub repository.

    Hopefully this article gave you some inspiration on your network observability journey.

    Thank you for reading! Feel free to share your thoughts or ask questions in the comments below or just reach out to me directly.

    References:

    How to Setup Prometheus Monitoring On Kubernetes Cluster
    Avi Load Balancer Prometheus Integration

  • Network Visibility for TKG Service Clusters

    TKG Service Clusters using the default Antrea CNI can be easily configured for enhanced network visibility through flow visualization and monitoring.

    The ability to monitor network traffic within your Kubernetes clusters, as well as between your Kubernetes constructs and the outside world, is essential for understanding system behavior—and especially important when things aren’t working as intended.

    In this article, I’ll walk you through the steps to enable network visibility specifically for TKG Service Clusters. However, similar steps can be applied to any Kubernetes cluster that is using the Antrea CNI.

    Bill of Materials

    My lab environment for this exercise consists of the following components:

    • vSphere 8 Update 3
    • A vSphere cluster with 3 ESXi hosts configured as Supervisor
    • A TKG Service cluster with 1 control plane node and 3 worker nodes
    • NSX 4.2.1.0 as the network stack
    • Avi Load Balancer 30.2.2 with DNS virtual service
    • Avi Kubernetes Operator (AKO)
    • vSAN storage

    Note: Neither Avi Kubernetes Operator (AKO) nor Avi Load Balancer are required components, but you’ll most likely run into these when working with production vSphere Supervisor environments.

    Diagram

    The diagram below provides a high-level overview of the lab environment.

    Obviously, not all details are provided in this overview, but it should at least give you an idea of how the lab environment has been configured.

    Step 1 – Enable FlowExporter in AntreaConfig

    The first step is to make sure that the Antrea FlowExporter feature is enabled in our cluster.

    Connect to the Supervisor endpoint for our Namespace:

    Switch to the context where the AntreaConfig is stored:

    Fetch the name of the AntreaConfig:

    And finally edit the AntreaConfig:
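
    The four steps above can be sketched as follows. The Supervisor endpoint is a placeholder and the SSO user name is an assumption; the namespace and AntreaConfig names are the ones that appear in the manifest further down:

    ```shell
    # Assumption: replace with the Supervisor endpoint of your environment
    SUPERVISOR_ENDPOINT="<supervisor-ip-or-fqdn>"
    VSPHERE_NAMESPACE="tkgs-ns-1"
    ANTREA_CONFIG="tkgs-cluster-1-antrea-package"

    # 1. Log in to the Supervisor (user name is an assumption)
    kubectl vsphere login --server="${SUPERVISOR_ENDPOINT}" \
      --vsphere-username administrator@vsphere.local \
      --insecure-skip-tls-verify

    # 2. Switch to the vSphere Namespace context that holds the AntreaConfig
    kubectl config use-context "${VSPHERE_NAMESPACE}"

    # 3. List the AntreaConfig objects to find the right name
    kubectl get antreaconfig

    # 4. Open the AntreaConfig in an editor
    kubectl edit antreaconfig "${ANTREA_CONFIG}"
    ```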

    Here we need to have a look at two items:

    • spec.antrea.config.featureGates.FlowExporter
    • spec.antrea.config.flowExporter.enable

    Both of these flags must be set to true:

    apiVersion: cni.tanzu.vmware.com/v1alpha1
    kind: AntreaConfig
    metadata:
      name: tkgs-cluster-1-antrea-package
      namespace: tkgs-ns-1
    spec:
      antrea:
        config:
          featureGates:
            ...
            FlowExporter: true
          flowExporter:
            ...
            enable: true
    

    If changes had to be made to the AntreaConfig, the Antrea agents need to be restarted inside the TKG Service cluster. Connect to the TKG Service cluster:

    Switch to the context of the cluster:

    Issue the following command to restart the Antrea agent pods:
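
    A sketch of these three steps. The endpoint is a placeholder, the user name is an assumption, and the restart assumes the agents run as a DaemonSet named antrea-agent in kube-system, which is the usual Antrea layout:

    ```shell
    # Assumption: replace with the Supervisor endpoint of your environment
    SUPERVISOR_ENDPOINT="<supervisor-ip-or-fqdn>"
    VSPHERE_NAMESPACE="tkgs-ns-1"
    CLUSTER_NAME="tkgs-cluster-1"

    # Log in to the TKG Service cluster through the Supervisor
    kubectl vsphere login --server="${SUPERVISOR_ENDPOINT}" \
      --vsphere-username administrator@vsphere.local \
      --tanzu-kubernetes-cluster-namespace "${VSPHERE_NAMESPACE}" \
      --tanzu-kubernetes-cluster-name "${CLUSTER_NAME}" \
      --insecure-skip-tls-verify

    # Switch to the cluster context
    kubectl config use-context "${CLUSTER_NAME}"

    # Restart the Antrea agents and wait for the rollout to finish
    kubectl -n kube-system rollout restart daemonset/antrea-agent
    kubectl -n kube-system rollout status daemonset/antrea-agent
    ```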

    Step 2 – Install the Flow Aggregator

    Next we install the Flow Aggregator and for that we’ll use Helm. Make sure that you’re still connected to the TKG Service cluster and in the cluster’s context.

    Add the Helm repository and receive the latest information about its charts:
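
    A sketch, assuming the public Antrea chart repository and the local alias antrea:

    ```shell
    # Public Helm repository that hosts the Antrea, Flow Aggregator, and Theia charts
    ANTREA_REPO_URL="https://charts.antrea.io"

    helm repo add antrea "${ANTREA_REPO_URL}"
    helm repo update
    ```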

    Install the Flow Aggregator using the following Helm command:
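
    The exact values used in my lab are not reproduced here. The sketch below follows the Theia getting-started guide, so treat the release name, namespace, and value names as assumptions to check against the chart version you install:

    ```shell
    FA_NAMESPACE="flow-aggregator"

    # Enable export to ClickHouse and include pod labels in the flow records
    helm install flow-aggregator antrea/flow-aggregator \
      --set clickHouse.enable=true,recordContents.podLabels=true \
      -n "${FA_NAMESPACE}" --create-namespace

    # Check the state of the Flow Aggregator pod
    kubectl -n "${FA_NAMESPACE}" get pods
    ```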

    Note: After installing the Flow Aggregator you will notice that its pod moves into a CrashLoopBackOff state. This is expected behaviour as the service it’s trying to connect to is not installed yet.

    Step 3 – Install and Configure Theia

    Theia is installed on top of Antrea and consumes the network flows that are exported by Antrea.

    What I like about Theia is that it comes with ClickHouse and Grafana pre-configured. This means that almost everything works out-of-the-box. Flow data is processed, stored, and visualized without having to spend time on manually configuring and maintaining integrations.

    We’ll use Helm to install Theia as well:
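
    A minimal sketch with chart defaults, assuming the antrea repository alias and the flow-visibility namespace that the later commands refer to:

    ```shell
    THEIA_NAMESPACE="flow-visibility"

    # Installs Theia together with its bundled ClickHouse and Grafana
    helm install theia antrea/theia -n "${THEIA_NAMESPACE}" --create-namespace

    # Watch the pods come up
    kubectl -n "${THEIA_NAMESPACE}" get pods
    ```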

    I mentioned that “almost” everything works out of the box. Well, life is rarely perfect, and to get everything working as intended, we (unfortunately) need to roll up our sleeves and get our hands dirty.

    Fortunately, it’s not too complicated. What happened is that for some reason, a couple of columns are missing for some tables in the ClickHouse database. We need to manually add these to allow the Flow Aggregator pod to exit the CrashLoopBackOff state.

    Exec into the ClickHouse pod and start a shell:

    kubectl -n flow-visibility exec --stdin --tty chi-clickhouse-clickhouse-0-0-0 -- /bin/bash

    Start the ClickHouse client:

    Add the missing columns to the tables by using these commands:

    Exit from the ClickHouse client and the pod.

    BTW, I did appreciate the message that was printed when exiting the ClickHouse client. All is forgiven now, well, almost all 🙂

    Step 4 – Consume Network Visibility

    The components are in place and it’s time to have a look at what we’ve ended up with.

    By default the Grafana Pod is exposed using a NodePort service:

    This means that we can access Grafana on the IP address of a node using (in this case) port 32366. To help you find out which node IP and port you should use, the Theia documentation provides a series of commands that will provide that information:
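
    These commands look roughly as follows. This is a sketch adapted from the Theia documentation and assumes the Service is named grafana and the pod carries the app=grafana label:

    ```shell
    NAMESPACE="flow-visibility"

    # Node that currently runs the Grafana pod
    NODE_NAME=$(kubectl -n "${NAMESPACE}" get pod -l app=grafana -o jsonpath='{.items[0].spec.nodeName}')
    # First address reported for that node
    NODE_IP=$(kubectl get node "${NODE_NAME}" -o jsonpath='{.status.addresses[0].address}')
    # NodePort assigned to the Grafana Service
    GRAFANA_NODEPORT=$(kubectl -n "${NAMESPACE}" get svc grafana -o jsonpath='{.spec.ports[*].nodePort}')

    echo "Grafana is listening on ${NODE_IP}:${GRAFANA_NODEPORT}"
    ```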

    Using that IP address and port, we can connect to the Grafana service. After logging in with the default credentials (username: admin, password: admin) and changing the initial password, we arrive at the homepage. This page provides a cluster overview dashboard featuring some key metrics for the entire cluster:

    There are eight dashboard pages in total. The Flow Records and the Network Topology dashboards look particularly interesting as well:

    Bonus Step – Ingress for Grafana

    As mentioned earlier, I deployed and configured Avi Load Balancer with a DNS virtual service for this lab. Additionally, I set up the Avi Kubernetes Operator in my TKG Service cluster (tkgs-cluster-1). With these components in place, I can create an Ingress to access the Grafana service using a proper FQDN:

    First I expose the Grafana pod as a ClusterIP service. My theia-dashboard-service.yaml contains the following:

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: theia-dashboard-service
      namespace: flow-visibility
    spec:
      selector:
        app: grafana
      ports:
        - port: 3000
    

    Creating the service:
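
    A sketch, using the file name mentioned above:

    ```shell
    MANIFEST="theia-dashboard-service.yaml"

    kubectl apply -f "${MANIFEST}"

    # The Service should list the Grafana pod as its endpoint
    kubectl -n flow-visibility get endpoints theia-dashboard-service
    ```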

    Next, I create the Ingress. My theia-dashboard-ingress.yaml looks like this:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: theia-dashboard-ingress
      namespace: flow-visibility
    spec:
      ingressClassName: avi-lb
      rules:
        - host: theia-dashboard.ako.lab
          http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: theia-dashboard-service
                  port:
                    number: 3000
    

    Creating the Ingress:
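
    Along the same lines, assuming the file name mentioned above:

    ```shell
    MANIFEST="theia-dashboard-ingress.yaml"

    kubectl apply -f "${MANIFEST}"

    # Once AKO has processed the Ingress, the ADDRESS column shows the VIP
    kubectl -n flow-visibility get ingress theia-dashboard-ingress
    ```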

    Now AKO will do its magic creating a pool, VIP, and a virtual service. Additionally, Avi Load Balancer DNS will take care of registering the DNS record (theia-dashboard.ako.lab):

    Using the FQDN as defined in my Ingress to reach Grafana:

    Summary

    Adding network visibility to your TKG Service clusters—or any Kubernetes cluster using the Antrea CNI, for that matter—is not a complex task. Aside from the missing columns issue, it actually works seamlessly out of the box. These are exactly the kinds of solutions many of my customers are looking for. 😊

    Hopefully this article gave you some inspiration on your network observability journey.

    Thank you for reading! Feel free to share your thoughts or ask questions in the comments below or reach out to me directly.

  • Integrating TKG Service Clusters with NSX Security

    Organizations aiming to leverage NSX for securing their TKG Service Clusters (Kubernetes clusters) can now achieve this with relative ease. In this guide, I’ll walk you through configuring the integration between a TKG Service Cluster and NSX—a required step for centrally managing security policies within TKG Service Clusters and between these clusters and external networks.

    Architecture Diagram

    For your reference, the diagram below, which is part of the NSX documentation, illustrates the architecture for the integration. The key component is the Antrea NSX adapter running on the control plane nodes of the TKG Service Cluster.

    Bill of Materials

    My lab environment for this exercise includes the following components:

    • vSphere 8.0 Update 3
    • A vSphere cluster with 4 ESXi hosts
    • NSX 4.2.1
    • vSAN storage
    • NSX Networking & Security, deployed and configured

    Note: For this proof-of-concept, I did not use Avi Load Balancer. However, this component is typically included in production SDDC environments.

    Step 1 – Activate the Supervisor Service

    Before deploying any TKG Service Clusters, you must configure and activate the Supervisor service on a vSphere cluster. This can be achieved through the vCenter GUI, API calls, or by importing a configuration file.

    To save some time and space, I’ll share the contents of the Supervisor configuration file I used to activate the Supervisor service in my lab.

    {
      "specVersion": "1.0",
      "supervisorSpec": {
        "supervisorName": "Pod-210-S1"
      },
      "envSpec": {
        "vcenterDetails": {
          "vSphereZones": [
            "domain-c9"
          ],
          "vcenterAddress": "Pod-210-vCenter.SDDC.Lab",
          "vcenterDatacenter": "Pod-210-DataCenter"
        }
      },
      "tkgsComponentSpec": {
        "tkgsStoragePolicySpec": {
          "masterStoragePolicy": "vSAN Default Storage Policy",
          "imageStoragePolicy": "vSAN Default Storage Policy",
          "ephemeralStoragePolicy": "vSAN Default Storage Policy"
        },
        "tkgsMgmtNetworkSpec": {
          "tkgsMgmtNetworkName": "SEG-VKS-Management",
          "tkgsMgmtIpAssignmentMode": "STATICRANGE",
          "tkgsMgmtNetworkStartingIp": "10.204.210.10",
          "tkgsMgmtNetworkGatewayCidr": "10.204.210.1/24",
          "tkgsMgmtNetworkDnsServers": [
            "10.203.0.5"
          ],
          "tkgsMgmtNetworkSearchDomains": [
            "sddc.lab"
          ],
          "tkgsMgmtNetworkNtpServers": [
            "10.203.0.5"
          ]
        },
        "tkgsNcpClusterNetworkInfo": {
          "tkgsClusterDistributedSwitch": "Pod-210-VDS",
          "tkgsNsxEdgeCluster": "Pod-210-T0-Edge-Cluster-01",
          "tkgsNsxTier0Gateway": "T0-Gateway-01",
          "tkgsNamespaceSubnetPrefix": 28,
          "tkgsRoutedMode": false,
          "tkgsNamespaceNetworkCidrs": [
            "10.244.0.0/19"
          ],
          "tkgsIngressCidrs": [
            "10.204.211.0/25"
          ],
          "tkgsEgressCidrs": [
            "10.204.211.128/25"
          ],
          "tkgsWorkloadDnsServers": [
            "10.203.0.5"
          ],
          "tkgsWorkloadServiceCidr": "10.96.0.0/22"
        },
        "controlPlaneSize": "MEDIUM"
      }
    }
    

    Note: Obviously I’m using NSX as the networking stack for the Supervisor service.

    Importing a configuration file is done at step 1 of the Supervisor service activation wizard:

    For more details on Supervisors, TKG Service Clusters, and related concepts, check the vSphere documentation and its chapters on the vSphere IaaS Control Plane.

    Step 2 – Create a vSphere Namespace

    A vSphere Namespace is the runtime environment for TKG Service Clusters. You can create one using the vCenter UI, an API call, or kubectl.

    In the vCenter UI:

    • Navigate to Workload Management > Namespaces > New Namespace:

    After creation, configure permissions, storage policies, and the VM Service. Here’s a snapshot of my vSphere Namespace configuration for this exercise:

    Step 3 – Prepare the vSphere Namespace for the NSX Integration

    Before deploying a TKG Service Cluster, we need to make sure the Antrea-NSX Adapter as well as Antrea Policy are enabled for any TKG Service Clusters being deployed within the Namespace. This is accomplished by adding an AntreaConfig to the Namespace. Below is the configuration file that I used in my lab:

    # AntreaConfig.yaml
    apiVersion: cni.tanzu.vmware.com/v1alpha1
    kind: AntreaConfig
    metadata:
      name: vks-cluster-1-antrea-package # The TKG Service Cluster name as prefix is required
      namespace: vks-ns-1
    spec:
      antrea:
        config:
          featureGates:
            AntreaTraceflow: true # Facilitates network troubleshooting and visibility (Optional)
            AntreaPolicy: true # Enables advanced policy capabilities in Antrea (Required)
            NetworkPolicyStats: true # Provides visibility into the enforcement of network policies (Optional)
      antreaNSX:
        enable: true # This is the Antrea-NSX adapter which is disabled by default
    

    Using kubectl, follow these steps:

    Connect to the Supervisor endpoint for the Namespace:

    kubectl vsphere login --server=10.204.212.2 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify

    Switch to your Namespace context:

    kubectl config use-context vks-ns-1

    Apply the YAML file containing the AntreaConfig:

    kubectl apply -f AntreaConfig.yaml 

    Step 4 – Deploy the TKG Service Cluster

    Finally, deploy the TKG Service Cluster using a cluster specification file. The specification below is what I used in my lab:

    # vks-cluster-1.yml
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: vks-cluster-1 # The name of the TKG Service Cluster. Must match the prefix in AntreaConfig
      namespace: vks-ns-1 # The vSphere Namespace created in step 2
    spec:
      clusterNetwork:
        services:
          cidrBlocks: ["10.97.0.0/23"] # Internal non-routable IP CIDR for services
        pods:
          cidrBlocks: ["10.245.0.0/20"] # Internal non-routable IP CIDR for Pods
        serviceDomain: "cluster.local"
      topology:
        class: tanzukubernetescluster
        version: v1.30.1---vmware.1-fips-tkg.5
        controlPlane:
          replicas: 1 # The number of control plane nodes
        workers:
          machineDeployments:
            - class: node-pool
              name: node-pool-01
              replicas: 3 # The number of worker nodes
        variables:
          - name: vmClass
            value: best-effort-medium # The VM class you assigned during step 2
          - name: storageClass
            value: vsan-default-storage-policy # The storage policy you assigned during step 2
    

    Apply the YAML file with the cluster specification to initiate the cluster deployment:

    kubectl apply -f vks-cluster-1.yml
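
    Before applying a specification like this, it can be worth confirming that the service and Pod CIDRs do not overlap and are large enough for the cluster. Here is a small sketch using Python's standard ipaddress module with the two CIDR blocks from my lab. It only checks the two ranges against each other, not against anything else in the environment:

```python
import ipaddress

# CIDR blocks from the cluster specification in this article.
services = ipaddress.ip_network("10.97.0.0/23")
pods = ipaddress.ip_network("10.245.0.0/20")


def describe(name: str, network: ipaddress.IPv4Network) -> str:
    """Summarize a CIDR block and how many addresses it contains."""
    return f"{name}: {network} ({network.num_addresses} addresses)"


print(describe("services", services))
print(describe("pods", pods))
print("overlap:", services.overlaps(pods))
```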

    Verifying Integration Status in NSX Manager

    Once the TKG Service Cluster is fully deployed we can head over to NSX Manager where we should find our TKG Service Cluster under System > Configuration > Fabric > Nodes > Container Clusters > Antrea. Preferably, the cluster should appear in an “Up” state. 😊

    Another place where we can verify the integration is under Inventory > Containers > Clusters:

    Using the Integration

    With the integration in place, NSX Manager provides centralized control over security policies within the TKG Service Cluster, leveraging Antrea-native policies. These policies are designed to enhance security by managing traffic between Kubernetes resources, such as Pods.

    In NSX Manager, this is achieved by assigning Kubernetes Pods, Services, and/or Namespaces as members of NSX Antrea groups. These groups are then referenced in the rules created within the NSX Distributed Firewall interface.

    Example of an NSX Antrea group with a Pod as a member:

    Policies are applied to TKG Service Clusters at the policy-level “Applied To” scope. Rules within these policies can specify an Antrea group or IP addresses/CIDRs as the source. Instead of explicitly defining a destination, the “Applied To” field is used, targeting the Antrea group that contains the resources serving as the destination.

    With NSX Generic groups, you can secure traffic between virtual machines and Kubernetes constructs such as Services, Namespaces, and Ingresses, among others.

    Example of an NSX Generic group using dynamic criteria to “pick up” a specific Kubernetes Service:

    There are no special considerations in this scenario when it comes to the firewall policies or the rules within the policies.

    Summary

    That’s it! This short guide demonstrated how to integrate TKG Service Clusters with NSX, enabling centralized management of security policies across clusters and platforms like Kubernetes and vSphere. The process involves enabling the Supervisor service, creating a Namespace, preparing it for NSX, and finally deploying the TKG Service Cluster.

    To learn more about the integration between the Antrea CNI and NSX you should have a look at the official documentation.

    Thank you for reading! Feel free to share your thoughts or ask questions in the comments below.

  • SDDC.Lab v6 Released

    Slow and steady. That’s how I would describe the pace and progress around making SDDC.Lab version 6 the new default and recommended version of the project.

    If you’re not familiar with the SDDC.Lab project, it’s a collection of Ansible Playbooks that perform fully automated deployments of nested VMware Software Defined Data Center environments called pods. Each pod consists of solutions like vSphere, vSAN, NSX, Tanzu, NSX Advanced Load Balancer, Aria Operations for Logs, and VyOS Router.

    What’s New?

    Product Versions

    As always, a wide range of product versions can be deployed using the SDDC.Lab scripts. The latest versions that we tested are:

    • vCenter Server version 8 Update 2
    • ESXi version 8 Update 2
    • NSX version 4.1.2.1
    • vSphere with Tanzu version 8 Update 2
    • Aria Operations for Logs version 8.14.1
    • NSX Advanced Load Balancer version 30.1.1
    • VyOS Latest Rolling Nightly Build
    • Ubuntu Server 22.04 (for ISC BIND)

    New Features and Improvements

    Luis and I recommend that you have a look at the project’s CHANGELOG.md for a detailed list of all the new features and improvements that were added in version 6. The list below highlights some of the main new features and improvements:

    • Automated deployment of NSX Advanced Load Balancer including configuration of the NSX integration (NSX-T Cloud)
    • BFD is configured and enabled by default between the NSX Tier-0 Gateway and the VyOS router
    • Developed an SDDC.Lab credentials file for Firefox to automate logins to the different management UIs
    • Improved VyOS Router deployment process (thanks rexit1982)
    • Improved DNS Server deployment process
    • Added additional checks that validate prerequisites for successful SDDC.Lab pod deployments before launching a deployment
    • Corrected an issue with the BGP peering between the VyOS router and the physical L3 switch

    Besides this we’ve worked on many smaller items like code optimization that improve stability and performance of SDDC.Lab pod deployments.

    How to Get Started?

    Getting started with SDDC.Lab v6 is quite easy. You simply head over to the GitHub repository and read through the README.md which contains all the information you need to successfully deploy your SDDC.Lab pods.

    Summary

    SDDC.Lab version 6 is the most stable and mature release in the history of the project. It comes with some good improvements and new features and we really hope you will appreciate it.

    We have plans and ideas for the next release and a new development branch for SDDC.Lab is in place already. Check it out if you want to follow what’s coming next in the project.

  • Quick Tip: NSX Advanced Load Balancer for vSphere Tanzu with NSX Networking

    As of NSX version 4.1.1, NSX Advanced Load Balancer version 22.1.4, and vSphere with Tanzu version 8.0 Update 2, we have the option to use the NSX Advanced Load Balancer as the load balancer provider for new deployments of vSphere with Tanzu backed by NSX networking.

    This deployment option is a very welcome addition knowing that the NSX “native” load balancer is scheduled for deprecation in a future release.

    Registering NSX Advanced Load Balancer with NSX

    After deployment and the initial configuration of NSX and the NSX Advanced Load Balancer (detailed steps available in the vSphere with Tanzu documentation) we register the NSX Advanced Load Balancer with NSX Manager. This is accomplished with a simple API call:

    PUT /policy/api/v1/infra/alb-onboarding-workflow 

    The accompanying request body contains the following keys:

    {
    "owned_by": "LCM",
    "cluster_ip": "<nsx-alb-controller-cluster-ip>",
    "infra_admin_username" : "username",
    "infra_admin_password" : "password",
    "dns_servers": ["<dns-servers-ips>"],
    "ntp_servers": ["<ntp-servers-ips>"]
    }

    Bringing this together in a curl one-liner could look something like this:

    curl -u admin --location --request PUT 'https://ams-nsxt-lm/policy/api/v1/infra/alb-onboarding-workflow' \
    --header 'X-Allow-Overwrite: True' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "owned_by": "LCM",
    "cluster_ip": "10.203.200.15",
    "infra_admin_username" : "admin",
    "infra_admin_password" : "VMware1!",
    "dns_servers": ["10.203.0.5"],
    "ntp_servers": ["10.203.0.5"]
    }'

    More information about this method can be found in the NSX API documentation.
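
    If you prefer scripting over curl, the same call can be prepared with Python's standard library. The sketch below only builds the request object from the keys shown above and does not send it. Authentication (the curl example uses basic auth via -u admin) and TLS handling are deliberately left out, and the function name and the placeholder password are my own assumptions:

```python
import json
import urllib.request


def build_alb_onboarding_request(nsx_manager, cluster_ip, username, password,
                                 dns_servers, ntp_servers):
    """Build (but do not send) the PUT request that registers NSX ALB with NSX."""
    body = {
        "owned_by": "LCM",
        "cluster_ip": cluster_ip,
        "infra_admin_username": username,
        "infra_admin_password": password,
        "dns_servers": dns_servers,
        "ntp_servers": ntp_servers,
    }
    return urllib.request.Request(
        url=f"https://{nsx_manager}/policy/api/v1/infra/alb-onboarding-workflow",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-Allow-Overwrite": "True",
            "Content-Type": "application/json",
        },
    )


# Values from the lab in this article; the password is a placeholder.
request = build_alb_onboarding_request(
    "ams-nsxt-lm", "10.203.200.15", "admin", "<password>",
    ["10.203.0.5"], ["10.203.0.5"],
)
print(request.get_method(), request.full_url)
```

    Sending the request would additionally require credentials for NSX Manager and a decision on certificate verification, which is why the sketch stops at building it.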

    When the registration is done you’ll notice that a shortcut to the NSX Advanced Load Balancer Controller UI has been added to the NSX Manager UI. Handy!

    More importantly, when we enable Tanzu Supervisor and/or deploy a Tanzu Kubernetes cluster under this Supervisor, we see that it is the NSX Advanced Load Balancer that’s hosting the VIP(s) on its Service Engine(s):

    Summary

    A very short article just to make you aware of this option and how it’s configured. I’m happy to see that customers can now use the NSX Advanced Load Balancer for their new vSphere with Tanzu backed by NSX networking installations.

  • NSX 4.1.2 – GRE Tunnels

    NSX 4.1.2 introduces support for Generic Routing Encapsulation (GRE) tunnels for Tier-0 gateways and Tier-0 VRF gateways, offering another standards-based option for “plumbing” network paths that lead traffic into and out of the Software-Defined Data Center (SDDC).

    In today’s short article I’ll go over configuring a GRE tunnel in order to facilitate communication between two environments. This article is not a comprehensive walkthrough by any means. Certain prerequisites have been taken care of in advance and building things in a lab means one can take shortcuts never to be taken in a production environment. Nevertheless, this article should provide you with a basic understanding of how GRE tunnels are configured and managed in NSX 4.1.2.

    Lab Environment

    The following are the components in the lab environment that are relevant for today’s exercise:

    • NSX 4.1.2
    • vSphere 8
    • VyOS 1.4 (the remote router)

    Diagram

    The diagram below shows what it is we’re trying to put together:

    A Tenant (Blue) has a virtual machine connected to an NSX overlay segment which in turn is attached to a Tier-1 gateway. The tenant’s VRF gateway connects the environment to the outside world (and vice versa). Remote to this environment our tenant has another environment hosting some applications.

    We are tasked with configuring connectivity between those environments and for this we should make use of the new GRE tunnel support in NSX 4.1.2. Naturally, routing should also be configured so that the tenant’s VM (10.204.246.20) is able to communicate with the server (172.16.20.20) in the tenant’s remote environment. Let’s see how this is done!

    Configuring GRE Tunnels

    Network tunnels have endpoints (interfaces) and GRE tunnels are no exception. We begin by configuring the GRE tunnel endpoint on the tenant’s VRF gateway and then do the same on the remote router.

    NSX VRF Gateway

    In NSX Manager we navigate to Networking > Tier-0 Gateways and edit the Tier-0 VRF gateway called VRF Blue. Click on Set to the right of GRE Tunnels.

    In the Set GRE Tunnels dialog we click Add GRE Tunnel. This is where we get to configure settings and parameters for our GRE tunnel.

    In my lab environment I’m using the following settings:

    Item | Value | Description
    Tunnel Name | GRE Tunnel 1 | Name of the GRE tunnel
    Destination Address | 10.203.247.1 | Remote router external IP address

    The rest of the settings are left with the default values. Note that the MTU size is set to 1476 bytes and that Keep Alive can be enabled and configured if required:

    In the Tunnel Addresses column we click on Set to further configure the tunnel properties. Here I’m using the following settings in my lab environment:

    Item | Value | Description
    Edge Node | Pod-240-EdgeVM-01 | The NSX Edge node that will be hosting the GRE tunnel.
    Source Address | 10.203.246.2 | The source IP address to be used. A VRF source interface is selected from the list. Both external interfaces and loopback interfaces can be used here. Just make sure that this IP address is reachable by the remote router.
    Tunnel Interface Subnets | 192.168.100.1/30 | The IP subnet (and address) attached to this GRE tunnel interface.

    This completes the GRE tunnel configuration on the NSX side.
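
    The default MTU of 1476 bytes is not arbitrary. GRE over IPv4 adds a 20-byte outer IPv4 header and a 4-byte base GRE header to every packet. The small Python sketch below shows the arithmetic, assuming a standard 1500-byte underlay MTU (the article does not state the underlay MTU, so treat that as my assumption) and no optional GRE fields such as a key or checksum:

```python
OUTER_IPV4_HEADER = 20  # bytes, IPv4 header without options
GRE_BASE_HEADER = 4     # bytes, GRE header without key/checksum/sequence fields


def gre_tunnel_mtu(underlay_mtu: int = 1500) -> int:
    """Largest inner packet that fits in the underlay once GRE overhead is added."""
    return underlay_mtu - OUTER_IPV4_HEADER - GRE_BASE_HEADER


print(gre_tunnel_mtu())  # assumes a 1500-byte underlay MTU
```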

    Remote Router

    The remote router (VyOS in this case) needs to be configured in much the same way in order to establish a GRE tunnel with the NSX VRF gateway:

    set interfaces tunnel tun100 encapsulation gre
    set interfaces tunnel tun100 remote 10.203.246.2
    set interfaces tunnel tun100 source-address 10.203.247.1
    set interfaces tunnel tun100 address 192.168.100.2/30
    set interfaces tunnel tun100 mtu 1476

    The above commands are rather self-explanatory but let’s have a quick look at them anyway:

    Item | Value | Description
    tunnel | tun100 | Name of the tunnel interface
    encapsulation | gre | Tunnel encapsulation protocol. Must be the same on both sides, so GRE it is.
    remote | 10.203.246.2 | NSX VRF Blue’s external/reachable IP address
    source-address | 10.203.247.1 | The source IP address to be used
    address | 192.168.100.2/30 | The IP subnet (and address) attached to this GRE tunnel interface.
    mtu | 1476 | MTU size (matching the MTU size we have on the NSX VRF)
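
    Both tunnel interface addresses need to sit in the same /30 for the two ends to talk to each other. A quick way to double-check this is with Python's standard ipaddress module. The sketch below uses the two addresses from this lab, and the helper name is hypothetical:

```python
import ipaddress

# Tunnel interface addresses from the NSX VRF gateway and the VyOS router.
nsx_side = ipaddress.ip_interface("192.168.100.1/30")
remote_side = ipaddress.ip_interface("192.168.100.2/30")


def same_tunnel_subnet(a, b) -> bool:
    """True when both interfaces share a subnet but use different addresses."""
    return a.network == b.network and a.ip != b.ip


print("subnet:", nsx_side.network)
print("usable hosts:", [str(host) for host in nsx_side.network.hosts()])
print("valid pair:", same_tunnel_subnet(nsx_side, remote_side))
```

    A /30 leaves exactly two usable host addresses, which is all a point-to-point tunnel needs.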

    Validate Tunnel

    Now that both the NSX VRF gateway and the remote router are configured, it’s time to check whether a GRE tunnel has actually been established.

    In the NSX Manager UI we can check tunnel status from the VRF within the GRE Tunnels dialog:

    Clicking Tunnel Connectivity Status brings up the dialog where the tunnel status is shown.

    Status is Up, which seems good to me.

    On the remote router we can use the following command to validate the status of the GRE tunnel:

    show interfaces tunnel tun100 brief

    Adding Static Routes

    Before we can test network communication between the tenant’s virtual machine and the tenant’s server, routing information is required. We might have a GRE tunnel up and running, but at this point the virtual machine has no clue on how to get to the server and the other way around. In our scenario we’ll simply solve this by adding a static route to each router.

    On the NSX VRF gateway we add a static route that ensures that traffic heading towards the 172.16.20.0/24 network will use 192.168.100.2 (tunnel interface IP address on remote router) as the next hop:

    Similarly, on the remote router we add a static route so that it knows the 10.204.246.0/24 network is reached via 192.168.100.1 (the tunnel interface IP address on the NSX VRF):

    set protocols static route 10.204.246.0/24 next-hop 192.168.100.1
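
    To make the effect of the two static routes a bit more tangible, here is a simplified Python sketch of a longest-prefix-match lookup. It is a model for illustration only, not how NSX or VyOS implement routing. The route tables contain just the two static routes from this article, and 10.204.246.20 is simply an example address inside 10.204.246.0/24:

```python
import ipaddress


def next_hop(routes, destination):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The route with the longest prefix length is the most specific one.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Static routes as configured in this article.
nsx_vrf_routes = {"172.16.20.0/24": "192.168.100.2"}
remote_router_routes = {"10.204.246.0/24": "192.168.100.1"}

print(next_hop(nsx_vrf_routes, "172.16.20.20"))
print(next_hop(remote_router_routes, "10.204.246.20"))
```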

    Validate Communication

    GRE tunnels in place, static routes in place. Communication between the tenant’s virtual machine and their remote server should now be working. Let’s do a quick test.

    Good old ping always comes in handy for these kinds of tests:

    Starting a ping from the virtual machine to the server in the remote environment at the other side of the tunnel seems to work fine. Tunnel statistics on the NSX side also seem to indicate that packets are indeed being transmitted and received over our GRE tunnel:

    Mission completed!

    Summary

    This article provided an overview of the new GRE tunnel feature in NSX 4.1.2, which gives us another option for establishing network connectivity between different environments. Although the scenario we used in this article is kind of “conceptual” and more is to be considered in a real-life production scenario, I hope you at least got an idea of how GRE tunnels are implemented in NSX 4.1.2.

    Make sure to check the latest NSX documentation including the release notes to learn more about NSX 4.1.2 and its new features. The NSX Reference Design Guide is another great resource for further reading and learning all about the VMware NSX solution.

    Thanks for reading.

  • NSX 4.1.2 – IDS/IPS Packet Capture

    A nice new feature that shipped with NSX 4.1.2 is the ability to download packet capture files (PCAPs) containing packets that were detected or prevented by NSX IDS/IPS.

    This enables teams to store and investigate network data related to intrusion attempts, outside of NSX and in a common format whenever that is required.

    Packet Capture Feature

    The feature itself is enabled within an NSX IDS/IPS profile. Profiles are found under Security > IDS/IPS & Malware Prevention > Profiles. Up until now IDS/IPS profiles were used to group signatures, which are then applied to selected applications, but now they also contain a section where packet capture is managed. This is interesting as it gives us the flexibility to enable packet capture on a per-application level.

    Besides the On/Off switch we can adjust the size of the PCAP files and define the total packets to be captured.

    API

    Of course we can leverage the NSX REST API to configure the packet capture feature as well. For this you would do a:

    PATCH /policy/api/v1/infra/settings/firewall/security/intrusion-services/profiles/{profile-id}

    The request body that goes along with this PATCH request contains the necessary configuration (check the NSX REST API documentation for more information on this). Specifically for the packet capture feature a new type “IdsPcapConfig” has been added:

    PCAP Files

    Enabling and configuring this feature is very straightforward and once it’s done, each time a network traffic pattern matches an NSX IDS/IPS signature (i.e. detection/prevention is triggered), the relevant packets are captured and available for export and download. Again, the scope is defined by where the IDS/IPS profile is applied.

    Let’s begin with a look under Security > IDS/IPS > Monitoring in the NSX Manager UI. As we see there have been some intrusion attempts:

    Next, if we click on Packet Capture Query the same intrusion events are displayed but this time in a table format:

    From this interface we are able to perform some pretty good filtering on things like Attacker IP/Port, Target IP/Port, Signature ID, PCAP ID. Clicking on an Event ID link we also instantly get to see more information about a specific intrusion attempt:

    And now, to get hold of the relevant PCAP file(s), we first select one or multiple events (rows in the table) and click on Export Packet Capture Data. Data is now exported to PCAP files that can then be downloaded from the NSX UI:

    Note that the PCAPs are packed in a compressed tarball (tar.gz). Once downloaded and unpacked we can see our PCAP file:

    Which can then be opened and inspected with a tool like Wireshark:

    Pretty cool!
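
    If you end up downloading these tarballs regularly, listing their contents can be scripted. Below is a minimal Python sketch using the standard tarfile module. The function name is hypothetical, and I am assuming the capture files inside the archive carry a .pcap or .pcapng extension, which may not hold for every export:

```python
import tarfile


def list_pcaps(archive_path):
    """Return the names of capture files inside a gzip-compressed tarball."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            # Assumption: capture files end in .pcap or .pcapng.
            if member.isfile() and member.name.endswith((".pcap", ".pcapng"))
        ]


# Usage, with a hypothetical file name:
# print(list_pcaps("exported-pcaps.tar.gz"))
```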

    API?

    Yes, we can do all of this (except for opening the PCAP file in Wireshark) using the NSX REST API as well. Two API calls are required where the first one is requesting the PCAP file(s) export:

    POST /policy/api/v1/infra/settings/firewall/security/intrusion-services/pcaps/export 

    The second call performs the actual download of the exported PCAP files:

    GET /policy/api/v1/infra/settings/firewall/security/intrusion-services/pcaps/{file-name}/download 

    For details and examples on these calls, check the NSX REST API documentation.
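
    As a starting point for scripting the two calls, the sketch below builds the URLs for the export and download endpoints shown above. It does not send anything, and it does not attempt to construct the export request body, since its exact shape should be taken from the API documentation. The function names and the example file name are hypothetical:

```python
from urllib.parse import quote

# Base path shared by the export and download endpoints shown above.
PCAP_BASE = "/policy/api/v1/infra/settings/firewall/security/intrusion-services/pcaps"


def export_url(nsx_manager: str) -> str:
    """URL for the POST call that requests a PCAP export."""
    return f"https://{nsx_manager}{PCAP_BASE}/export"


def download_url(nsx_manager: str, file_name: str) -> str:
    """URL for the GET call that downloads an exported file."""
    # Percent-encode the file name so it is safe as a single path segment.
    return f"https://{nsx_manager}{PCAP_BASE}/{quote(file_name, safe='')}/download"


print(export_url("nsx.example.com"))
print(download_url("nsx.example.com", "example-export.tar.gz"))
```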

    Thanks for reading!