Automating VCF 9.0.2 Day 1 Setup with Codex


This automation is based on publicly documented APIs and validated against standard UI operations in a homelab environment.

Introduction

With VCF9, the Installer deploys NSX Manager and finishes the initial TEP setup for you. But after that, setting up Edge, VPC, and Avi is still mostly a manual UI task.

There are many steps, and repeating them every time I rebuild the environment is honestly not much fun.

Related article: How to Integrate NSX-VPC and Avi for VCF 9.0.2

In this article, I share my attempt to automate VCF9 Day 1 setup with GPT-5.4 and Codex.

  • Day 0 setup: until the VCF Installer completes deployment

  • Day 1 setup: setting up NSX Edge, VPC, and Avi, and deploying Supervisor

This work is intended for homelab use only and is not meant for production environments.

Automation Approach with Codex

The overall approach was simple.

  1. Break the workflow into smaller steps and implement each task as a Bash leaf script with check/apply/delete

  2. Build a small Python orchestrator to connect those scripts together

I chose this structure because I did not want to rely on a single large automation tool. Instead, I wanted something easier to test, rerun, and control.

With Codex support, I was able to take a more ambitious approach and build a small custom orchestrator myself, something that would have felt too costly to implement otherwise.

The full Day 1 setup takes about 90 minutes, so splitting it into small steps worked well: a failed step can be rerun without restarting the whole workflow.

The check/apply/delete pattern also worked well for quick iteration and repeated testing. At the same time, delete workflows were much harder to get right. I was not able to make operations like Supervisor removal or NSX Edge cleanup work reliably enough, and that became another hard part of this automation work.
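As a rough illustration, the check/apply part of this pattern could be driven from Python as in the sketch below. This is not the actual orchestrator. It assumes each leaf script takes the verb as its first argument and that `check` exits with 0 when the step is already in the desired state; the function names and the script path are hypothetical.

```python
import subprocess
from typing import Callable

# Hypothetical runner type: takes (script, verb) and returns the exit code.
Runner = Callable[[str, str], int]


def run_leaf(script: str, verb: str) -> int:
    """Run one Bash leaf script with a verb (check, apply, or delete)."""
    return subprocess.run([script, verb]).returncode


def ensure_step(script: str, runner: Runner = run_leaf) -> str:
    """Apply a step only if its check fails, then verify the result.

    Assumption: `check` returns 0 when the step is already done.
    """
    if runner(script, "check") == 0:
        return "skipped"  # already in the desired state, safe to rerun
    if runner(script, "apply") != 0:
        raise RuntimeError(f"apply failed: {script}")
    if runner(script, "check") != 0:
        raise RuntimeError(f"check still failing after apply: {script}")
    return "applied"
```

Calling `ensure_step("./2-nsx/create-edge-tep-pool.sh")` (a made-up path) a second time would return "skipped", which is what makes rerunning a 90-minute workflow after a failure cheap.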

The actual flow looked like this.

# ./provision-vcf9.py list

Phase 0: VCF Installer (01:18:45)
  0-1 [done] [0-init] Clone ESX nodes (00:07:36)
  0-2 [done] [0-init] Validate VCF Installer JSON (00:04:38)
  0-3 [done] [0-init] Start VCF Installer (Go to phase 1 after fleet management milestone) (01:06:31)

Phase 1: vCenter (00:00:10)
  1-1 [done] [1-vsphere] Enable vCenter SSH access (00:00:08)
  1-2 [done] [1-vsphere] Disable HA admission control (00:00:02)

Phase 2: NSX (00:38:02)
  2-1 [done] [2-nsx] Disable OVF validation (00:00:10)
  2-2 [done] [2-nsx] Accept EULA and enable SSH (00:00:12)
  2-3 [done] [2-nsx] Create Edge TEP IP pool (00:00:02)
  2-4 [done] [2-nsx] Deploy Edge cluster (00:37:32)
  2-5 [done] [2-nsx] Configure VPC connectivity (00:00:02)
  2-6 [done] [2-nsx] Configure Tier-0 (00:00:02)
  2-7 [done] [2-nsx] Configure Avi management network (00:00:02)

Phase 3: Avi - NSX Cloud (00:21:54)
  3-1 [done] [3-avi] Create Avi content library (00:00:04)
  3-2 [done] [3-avi] Enable single-node Avi feature (00:02:12)
  3-3 [done] [3-avi] Download and upload Avi bundle (00:01:56)
  3-4 [done] [3-avi] Deploy single-node Avi controller (00:17:14)
  3-5 [done] [3-avi] Register Avi license (00:00:14)
  3-6 [done] [3-avi] Create Avi controller certificate (00:00:04)
  3-7 [done] [3-avi] Trust Avi certificate in SDDC Manager (00:00:04)
  3-8 [done] [3-avi] Verify enforcement point (00:00:02)
  3-9 [done] [3-avi] Create NSX cloud integration (00:00:04)

Phase 4: VKS (00:25:54)
  4-1 [done] [4-vks] Create storage policy (00:00:04)
  4-2 [done] [4-vks] Enable Supervisor (00:20:02)
  4-3 [done] [4-vks] Create vSphere namespace (00:00:04)
  4-4 [done] [4-vks] Attach VM Classes to namespace (00:00:04)
  4-5 [done] [4-vks] Create and attach VKS content library (00:00:12)
  4-6 [done] [4-vks] Set proxy for Supervisor (00:00:02)
  4-7 [done] [4-vks] Upgrade Supervisor Service VKS (00:05:26)
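To relate this listing to the "about 90 minutes" figure, the recorded phase totals can simply be added up. The short sketch below does that arithmetic with the values copied from the listing; the helper names are mine, and the sums are of recorded phase times, not necessarily wall-clock time.

```python
def to_seconds(hms: str) -> int:
    """Convert HH:MM:SS to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert seconds back to HH:MM:SS."""
    h, rest = divmod(seconds, 3600)
    m, s = divmod(rest, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


# Phase totals copied from the listing above.
phase_totals = {
    "Phase 0: VCF Installer": "01:18:45",
    "Phase 1: vCenter": "00:00:10",
    "Phase 2: NSX": "00:38:02",
    "Phase 3: Avi - NSX Cloud": "00:21:54",
    "Phase 4: VKS": "00:25:54",
}

# Day 1 covers Phases 1 to 4; Phase 0 is Day 0.
day1 = sum(
    to_seconds(t)
    for name, t in phase_totals.items()
    if not name.startswith("Phase 0")
)
total = sum(to_seconds(t) for t in phase_totals.values())

print(to_hms(day1))   # 01:26:00
print(to_hms(total))  # 02:44:45
```

Phases 1 to 4 add up to 86 minutes, which matches the roughly 90 minutes quoted for Day 1. Almost all of that time sits in three steps: Edge cluster deployment, Avi controller deployment, and enabling Supervisor.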

What I Learned: Phase 1 Can Start After Fleet Management Is Deployed

After several test runs, I found that the earliest reliable point to move from Phase 0 to Phase 1 is not the completion of the entire VCF Installer workflow, but the completion of the "Deploy and configure the fleet management appliance" milestone.

Waiting for the full VCF Installer workflow adds extra time. In my lab, once this milestone was completed, Phase 1 could start safely.

At first, I thought Phase 1 might be able to start as soon as NSX Manager was up. However, testing showed that the NSX milestone was too early.

The completion of "Deploy and configure NSX" only means that NSX has reached a certain point. It does not guarantee that the SDDC Manager bring-up process is stable enough for the next Day 1 operations.

In one test, I started the next phase immediately after the NSX milestone. The VCF Installer workflow later failed during "Deploy and configure the fleet management appliance". The failed subtask was:

Upload Personality to SDDC Manager

This put the management domain into an ERROR state. As a result, later operations such as Edge cluster creation failed with an error similar to the following:

CREATE_EDGE_CLUSTER is not allowed due to resource of type DOMAIN ... in ERROR state

This confirmed that NSX being available is not enough. For that reason, the current orchestrator uses the Fleet Management appliance deployment and configuration milestone as the gate, instead of the NSX deployment milestone.
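The gate itself can be thought of as a polling loop. The sketch below is only an illustration under assumptions: `fetch_status` is a hypothetical callable that returns the installer's milestones as a name-to-state mapping, and the state strings "SUCCEEDED" and "FAILED" are placeholders, not values taken from the VCF Installer API.

```python
import time
from typing import Callable, Mapping

# Milestone name as quoted in this article.
GATE = "Deploy and configure the fleet management appliance"


def wait_for_milestone(
    fetch_status: Callable[[], Mapping[str, str]],  # hypothetical status source
    gate: str = GATE,
    interval_s: float = 60.0,
    timeout_s: float = 3 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until `gate` is complete; stop early if any milestone failed."""
    waited = 0.0
    while True:
        status = fetch_status()
        # "FAILED" / "SUCCEEDED" are assumed state names for this sketch.
        failed = [name for name, state in status.items() if state == "FAILED"]
        if failed:
            raise RuntimeError(f"installer milestone failed: {failed}")
        if status.get(gate) == "SUCCEEDED":
            return
        if waited >= timeout_s:
            raise TimeoutError(f"gate not reached after {waited:.0f}s: {gate}")
        sleep(interval_s)
        waited += interval_s
```

Failing fast on any failed milestone matters here because, as described above, a failed installer workflow leaves the management domain in an ERROR state and later steps would fail anyway.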

This small change allows Phase 1 to start before the full VCF Installer workflow completes. As a result, the Supervisor is already fully configured by the time the VCF Automation deployment finishes.

What I Learned: Parallel Execution Does Not Work Cleanly

I also tried to save time by running Phase 2 and Phase 3 in parallel. Phase 2 builds NSX Edge, and Phase 3 deploys the Avi Controller.

But this did not work cleanly. When the Avi Controller deployment started while the Edge deployment was still running, I hit lock errors.

Because of that, I went back to serial execution.
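Serial execution here just means that a phase never starts until the previous one has finished. A minimal sketch, with hypothetical phase callables that return True on success:

```python
from typing import Callable, List, Sequence, Tuple

# (phase name, function that runs the phase and returns True on success)
Phase = Tuple[str, Callable[[], bool]]


def run_serially(phases: Sequence[Phase]) -> List[str]:
    """Run phases strictly one after another and stop at the first failure."""
    completed: List[str] = []
    for name, run in phases:
        if not run():
            raise RuntimeError(f"{name} failed; later phases were not started")
        completed.append(name)
    return completed
```

With this shape, the Avi Controller deployment in Phase 3 cannot overlap with the Edge deployment in Phase 2, at the cost of not saving the time that parallel execution was meant to recover.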

Difficult Parts

Even with Codex, two parts took a lot of time and a lot of tokens.

Edge Deployment and VPC Configuration (Steps 2-4 and 2-5)

From the operator's side, this looks like a single task you configure under Network Connectivity in vCenter. One look at the UI tells you automation will not be simple, because there are so many input fields.

What made this harder was that vCenter was not doing everything by itself. For Edge deployment, vCenter sends the request to SDDC Manager. For VPC setup, it sends the request to NSX Manager.

So even though this looks like a single task in the vCenter UI, the actual API calls go to different systems.
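One way to keep this split visible in the automation is a small routing table that records which backend each task really talks to. The sketch below is illustrative only: the hostnames and task names are made up, and it deliberately leaves out API paths and payloads.

```python
# Hypothetical lab hostnames; replace with your own.
ENDPOINTS = {
    "sddc-manager": "https://sddc-manager.lab.example",
    "nsx-manager": "https://nsx-manager.lab.example",
}

# Which system receives the request, per the description above.
# Task names are labels invented for this sketch.
TASK_BACKEND = {
    "deploy-edge-cluster": "sddc-manager",
    "configure-vpc-connectivity": "nsx-manager",
}


def base_url_for(task: str) -> str:
    """Return the base URL of the system that handles `task`."""
    return ENDPOINTS[TASK_BACKEND[task]]
```

Each backend also needs its own authentication, which is one more reason to keep the two steps in separate leaf scripts.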

Here is the Bash script and the JSON payload I used to deploy the Edge cluster via the SDDC Manager API:

Payload for VPC Configuration.

Enabling Supervisor (Step 4-2)

This part was harder than expected. Even though these are public APIs, I had to capture the exact POST payloads using the browser's developer tools for two main reasons:

  • Scattered Documentation: Similar to the Edge deployment, finding the right API for enabling the Supervisor was a challenge.

  • Incomplete Schemas: The official docs often lacked the payload details required for a real-world deployment.

The script and payload I used:

  • Gist - enable-supervisor.sh

  • Gist - enable-supervisor.json
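A payload captured from the browser contains IDs that are specific to one environment, so it helps to turn it into a template before reusing it. The sketch below shows one possible way to do that with the standard library. The field names and values are invented for illustration and are not the real Supervisor payload schema.

```python
import json
from string import Template


def render_payload(template_text: str, values: dict) -> dict:
    """Fill $placeholders in a captured payload and parse the result as JSON.

    Raises KeyError if a placeholder has no value, so a missing
    environment-specific ID is caught before anything is sent.
    """
    rendered = Template(template_text).substitute(values)
    return json.loads(rendered)


# Hypothetical fragment; the real captured payload is much larger.
captured = '{"cluster": "$cluster_id", "storage_policy": "$storage_policy"}'

payload = render_payload(
    captured,
    {"cluster_id": "domain-c10", "storage_policy": "vks-policy"},
)
```

Plain string substitution is enough for simple IDs, but values containing quotes or backslashes would need JSON escaping first.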

Compared with NSX and Avi, the vCenter API was much harder to work with. I am really looking forward to seeing native support for these Day 1 operations in future releases of PowerCLI or govc, as it would make this automation much simpler.

Conclusion

Day 1 setup is basically a one-time operation, so the need for automation may not be very high.

Still, for lab and validation work, it is very useful to start the deployment before going to bed and wake up to a fully configured Supervisor the next morning.

At least for me, this reduced the amount of manual work and gave me more time to grab a coffee and focus on the real work.