Top AI Skills for Cloud Infrastructure [2026]

Top 5 cloud infrastructure skills from our scored review — Terraform state migrations, Cloudflare anti-pattern catalogs, and 4,000+ lines of guidance.

We installed 10 cloud infrastructure skills covering Azure, AWS, Cloudflare, Terraform, and multi-cloud networking. The ones that missed the cut were mostly raw repo dumps with no curated structure, narrow scaffolding tools, or CI/CD skills mislabeled as infrastructure. The five below earned their scores by containing actual cloud architecture guidance: provisioning commands, cost optimization checklists, security hardening rules, and real Terraform HCL blocks you can deploy. We read every installed file and scored each skill on five dimensions.

How We Scored

Each skill was scored across five dimensions, 0-10 each, for a maximum of 50 points:

  • Relevance: Does it address real cloud infrastructure concerns (provisioning, networking, security, cost)?
  • Depth: How much actual content is there? We looked for specific configs and architectures, not vague advice.
  • Actionability: Can a developer follow the guidance to deploy or manage cloud infra?
  • Structure: Is it well organized, with clear coverage of the platforms it claims to address?
  • Adoption: Install count as a proxy for real-world validation.

We scored by reading the actual installed skill files — not descriptions, not README summaries.
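As a quick worked check of the arithmetic, the sketch below totals the five dimension scores reported for each skill in this roundup. The type and function names are illustrative assumptions and are not part of any SkillSafe tooling.

```typescript
// Illustrative only: these names are assumptions, not a SkillSafe API.
type Scores = {
  relevance: number;
  depth: number;
  actionability: number;
  structure: number;
  adoption: number;
};

// Each dimension is scored 0-10, so the maximum total is 50.
function totalScore(s: Scores): number {
  const values = [s.relevance, s.depth, s.actionability, s.structure, s.adoption];
  for (const v of values) {
    if (Math.floor(v) !== v || v < 0 || v > 10) {
      throw new RangeError("dimension score out of range: " + v);
    }
  }
  return values.reduce((sum, v) => sum + v, 0);
}

// Dimension scores as reported in the per-skill sections of this article.
const reviewed: { skill: string; scores: Scores }[] = [
  { skill: "@hashicorp/refactor-module", scores: { relevance: 8, depth: 9, actionability: 9, structure: 9, adoption: 8 } },
  { skill: "@cloudflare/workers-best-practices", scores: { relevance: 8, depth: 9, actionability: 9, structure: 9, adoption: 8 } },
  { skill: "@josiahsiegel/azure-well-architected-framework", scores: { relevance: 9, depth: 8, actionability: 8, structure: 9, adoption: 8 } },
  { skill: "@josiahsiegel/cloudflare-knowledge", scores: { relevance: 9, depth: 9, actionability: 8, structure: 8, adoption: 6 } },
  { skill: "@wshobson/hybrid-cloud-networking", scores: { relevance: 9, depth: 7, actionability: 7, structure: 8, adoption: 8 } },
];

for (const r of reviewed) {
  console.log(r.skill + ": " + totalScore(r.scores) + "/50");
}
```

The totals come out to 43, 43, 42, 40, and 39, matching the headline scores in the comparison table.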

Quick Comparison

| Skill | Score | Key Feature | Platforms / Tools | Installs |
| --- | --- | --- | --- | --- |
| @hashicorp/refactor-module | 43/50 | State migration + module refactoring workflow | Terraform, AWS, Azure | 7,934 |
| @cloudflare/workers-best-practices | 43/50 | 12 anti-patterns + retrieval-first review process | Cloudflare Workers | 7,182 |
| @josiahsiegel/azure-well-architected-framework | 42/50 | Five-pillar checklists + CLI commands | Azure | 6,355 |
| @josiahsiegel/cloudflare-knowledge | 40/50 | 4,062 lines covering full Cloudflare platform | Cloudflare, Zero Trust | 2,269 |
| @wshobson/hybrid-cloud-networking | 39/50 | Multi-cloud VPN/Direct Connect with Terraform HCL | AWS, Azure, GCP, OCI | 6,336 |

1. @hashicorp/refactor-module — 43/50

Score: 43/50 | Relevance: 8 · Depth: 9 · Actionability: 9 · Structure: 9 · Adoption: 8

One file, 538 lines. This is the most actionable Terraform skill in the registry — it takes a monolithic .tf configuration and walks through transforming it into a versioned, tested module with proper state migration.

The before/after code transformation is the centerpiece. The “before” section shows a monolithic main.tf with hardcoded VPC, subnet, and internet gateway resources. The “after” section shows the same resources restructured into a modules/vpc/ directory with separate main.tf, variables.tf, and outputs.tf files. Every variable gets a description, a type, and where appropriate a validation block — the CIDR block variable includes can(cidrhost(var.cidr_block, 0)) to catch invalid input before apply.

The state migration section is what earns the depth score. It covers both Terraform 1.1+ moved blocks and pre-1.1 manual terraform state mv commands. The moved block examples map each resource from its old address to the new module path, including the tricky for_each key mapping (e.g., aws_subnet.public_1 to module.vpc.aws_subnet.public["us-east-1a"]). The skill also includes the verification step most developers forget: running terraform plan after migration to confirm zero changes.
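To make the two migration styles concrete, here is a minimal sketch that renders both a moved block and the equivalent terraform state mv command from a list of address pairs. The helper functions are hypothetical and do not come from the skill. The first address pair is the one quoted above, and the second is an assumed example added for illustration.

```typescript
// Hypothetical helper, not taken from the skill: renders Terraform `moved`
// blocks and the equivalent `terraform state mv` commands.
type Move = { from: string; to: string };

const moves: Move[] = [
  // Mapping quoted in the article.
  { from: "aws_subnet.public_1", to: 'module.vpc.aws_subnet.public["us-east-1a"]' },
  // Assumed example addresses, not taken from the skill.
  { from: "aws_vpc.main", to: "module.vpc.aws_vpc.this" },
];

// Terraform 1.1+: a declarative move that lives in the configuration.
function movedBlock(m: Move): string {
  return ["moved {", "  from = " + m.from, "  to   = " + m.to, "}"].join("\n");
}

// Pre-1.1: an imperative state edit. Single quotes keep the shell from
// interpreting the brackets and double quotes in a for_each key.
function stateMvCommand(m: Move): string {
  return "terraform state mv '" + m.from + "' '" + m.to + "'";
}

console.log(moves.map(movedBlock).join("\n\n"));
console.log(moves.map(stateMvCommand).join("\n"));
// Whichever style is used, the follow-up check is the same.
console.log("terraform plan   # expect no changes after the migration");
```

Generating both forms from one list keeps the old and new addresses in a single place, which makes it easier to rehearse the same mapping in a non-prod environment first.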

Three refactoring patterns round out the skill: Resource Grouping (networking, compute, data), Configuration Layering (base module with environment wrappers), and Composition (small focused modules wired through a root configuration). The common pitfalls section warns against over-abstraction (map(map(any)) variable types), tight coupling between modules, and the specific risk of running state migration in production before testing in a non-prod environment.

skillsafe install @hashicorp/refactor-module

2. @cloudflare/workers-best-practices — 43/50

Score: 43/50 | Relevance: 8 · Depth: 9 · Actionability: 9 · Structure: 9 · Adoption: 8

Three files totaling 764 lines: a 127-line SKILL.md, a 463-line references/rules.md, and a 174-line references/review.md. This is the official Cloudflare skill for Workers code review, and it is built around a retrieval-first philosophy — it tells the agent to fetch the latest docs rather than relying on baked-in knowledge.

The anti-pattern catalog is the strongest section. Twelve documented anti-patterns, each with an explanation of why it matters in the Workers runtime specifically. await response.text() on unbounded data risks the 128 MB memory limit. Math.random() for tokens is predictable — use crypto.randomUUID(). Module-level mutable variables cause cross-request data leaks because Workers instances handle multiple requests. Destructuring ctx (the execution context) loses the this binding and throws “Illegal invocation” at runtime. Direct string comparison for secrets is vulnerable to timing side-channels — use crypto.subtle.timingSafeEqual instead.
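Two of these anti-patterns can be illustrated outside the Workers runtime. The sketch below is an assumption-laden simplification: Req and the handler names are hypothetical stand-ins for a real fetch handler, and the hand-written constantTimeEqual only shows the idea behind the crypto.subtle.timingSafeEqual call that the skill recommends inside Workers.

```typescript
// Hypothetical stand-in for a request; a real Worker receives a Request object.
type Req = { userId: string };

// Anti-pattern: module-level mutable state is shared by every request
// that the same Worker instance handles.
let currentUser = "";

async function leakyHandler(req: Req): Promise<string> {
  currentUser = req.userId;
  await Promise.resolve(); // any await lets another request run in between
  return "hello " + currentUser; // may now hold a different request's user
}

// Safer: keep per-request data in function scope.
async function scopedHandler(req: Req): Promise<string> {
  const user = req.userId;
  await Promise.resolve();
  return "hello " + user;
}

// Constant-time byte comparison, written by hand so the sketch runs anywhere.
// Inside Workers, the skill points to crypto.subtle.timingSafeEqual instead.
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false; // reveals only the length
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i]; // accumulate differences without exiting early
  }
  return diff === 0;
}

// Simulate two overlapping requests against each handler.
async function demo(): Promise<string[]> {
  const leaky = await Promise.all([
    leakyHandler({ userId: "alice" }),
    leakyHandler({ userId: "bob" }),
  ]);
  const scoped = await Promise.all([
    scopedHandler({ userId: "alice" }),
    scopedHandler({ userId: "bob" }),
  ]);
  return leaky.concat(scoped);
}

demo().then((lines) => console.log(lines));
```

In this simulation both leaky responses greet bob, because the second request overwrites the shared variable before the first one resumes, while the scoped handler returns the right user for each request.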

The configuration rules table covers six items that trip up most new Workers projects, among them setting compatibility_date, enabling nodejs_compat, running wrangler types to generate the Env interface instead of hand-writing it, using wrangler secret put instead of hardcoding secrets, and using JSONC config format for newer features.

The review workflow in references/review.md is a seven-step process: retrieve latest types, read full files (not just diffs), check types, check config, check patterns, check security, and validate with tools (npx tsc --noEmit, lint for no-floating-promises). The emphasis on reading full files rather than diffs is practical — binding access patterns span multiple functions and a diff review misses context.

skillsafe install @cloudflare/workers-best-practices

3. @josiahsiegel/azure-well-architected-framework — 42/50

Score: 42/50 | Relevance: 9 · Depth: 8 · Actionability: 8 · Structure: 9 · Adoption: 8

One file, 409 lines. This skill maps all five pillars of Microsoft’s Well-Architected Framework into actionable az CLI commands and implementation checklists. It is the most structured Azure skill available.

Each pillar gets three things: principles, best practices with CLI snippets, and a deployment checklist. The Reliability pillar includes the exact availability SLA numbers (single VM with Premium SSD: 99.9%, Availability Set: 99.95%, Availability Zones: 99.99%) alongside the az vm create --zone 1 command to deploy across zones. The Security pillar covers managed identities (az vm identity assign), RBAC assignments, storage encryption with TLS 1.2 enforcement, and Microsoft Defender enablement — four layers of defense-in-depth with copy-pasteable commands.
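To put those SLA percentages in perspective, the following sketch converts each one into the maximum downtime it allows per year. The calculation is ours and is not taken from the skill. It assumes a 365-day year and rounds to one decimal place.

```typescript
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600, ignoring leap years

// Maximum downtime per year implied by an availability percentage.
function downtimeMinutesPerYear(slaPercent: number): number {
  const minutes = ((100 - slaPercent) / 100) * MINUTES_PER_YEAR;
  return Math.round(minutes * 10) / 10; // round to one decimal place
}

// SLA figures as quoted in the article.
const tiers = [
  { label: "Single VM with Premium SSD", sla: 99.9 },
  { label: "Availability Set", sla: 99.95 },
  { label: "Availability Zones", sla: 99.99 },
];

for (const t of tiers) {
  console.log(t.label + " (" + t.sla + "%): " + downtimeMinutesPerYear(t.sla) + " min/year");
}
```

That works out to roughly 8.8 hours a year for a single VM, about 4.4 hours for an Availability Set, and under an hour for Availability Zones.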

The Cost Optimization pillar is where the skill becomes particularly useful. It shows how to query Azure Advisor for cost recommendations (az advisor recommendation list --category Cost), create budget alerts at 80%, 100%, and 120% thresholds, and apply Azure Hybrid Benefit for Windows and SQL workloads. The Reserved Instances section notes savings up to 72% versus pay-as-you-go for 3-year commitments.

The Common Patterns section at the bottom provides three architecture templates. The “Highly Available Web Application” pattern chains Application Gateway, App Service Premium tier, zone-redundant Azure SQL, Redis Cache, Application Insights, and Azure Front Door. The “Mission-Critical Application” pattern adds multi-region deployment with Traffic Manager and geo-redundant storage. The “Cost-Optimized Dev/Test” pattern recommends auto-shutdown, B-series burstable VMs, and Azure DevTest Labs. These patterns give the AI concrete reference architectures rather than forcing it to compose from scratch.

skillsafe install @josiahsiegel/azure-well-architected-framework

4. @josiahsiegel/cloudflare-knowledge — 40/50

Score: 40/50 | Relevance: 9 · Depth: 9 · Actionability: 8 · Structure: 8 · Adoption: 6

Six files totaling 4,062 lines: a 1,164-line SKILL.md and five reference files covering AI models, cost comparison, MCP server development, third-party integrations, and Zero Trust setup. This is the largest cloud infrastructure skill we evaluated by raw content.

The storage deep dives are what justify the depth score. The skill documents five storage services (KV, R2, D1, Durable Objects, Queues) with characteristics, code examples, and specific limits. KV gets propagation time (up to 60 seconds), max value size (25 MiB), and free tier limits (100K reads/day, 1K writes/day). R2 includes multipart upload code for files over 5 GB. D1 includes batch operations for transactional writes and PRAGMA optimize as a post-migration step. Durable Objects includes the WebSocket Hibernation API with the full webSocketMessage, webSocketClose, and webSocketError handler lifecycle.

The Hyperdrive section stands out for its performance explanation. It breaks down the round-trip cost: without Hyperdrive, a cold connection requires 8 round-trips (TCP handshake, TLS negotiation, DB authentication, query). With Hyperdrive, the pooled connection reduces this to 1 round-trip. The section also lists four specific cases where Hyperdrive should NOT be used — D1 databases, local development, prepared statements across requests, and Durable Objects storage.
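A rough way to see what the round-trip reduction means in practice is to multiply each count by a network round-trip time. In the sketch below, the round-trip counts are the ones quoted above, while the 40 ms round-trip time is purely an assumed figure for illustration.

```typescript
// Round-trip counts as quoted in the article.
const COLD_ROUND_TRIPS = 8;
const POOLED_ROUND_TRIPS = 1;

// Time spent on the network before the first query result arrives,
// ignoring server-side processing time.
function connectionLatencyMs(roundTrips: number, rttMs: number): number {
  return roundTrips * rttMs;
}

const assumedRttMs = 40; // hypothetical Worker-to-database round-trip time
const coldMs = connectionLatencyMs(COLD_ROUND_TRIPS, assumedRttMs);
const pooledMs = connectionLatencyMs(POOLED_ROUND_TRIPS, assumedRttMs);

console.log("cold connection: " + coldMs + " ms");
console.log("pooled connection: " + pooledMs + " ms");
console.log("saved per request: " + (coldMs - pooledMs) + " ms");
```

Under that assumption a cold connection spends 320 ms on the network versus 40 ms for a pooled one, and the gap grows linearly with distance between the Worker and the database.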

The pricing reference table covers Workers, KV, R2, D1, Durable Objects, and Queues with both free and paid tiers. The references/cost-comparison.md file (283 lines) provides side-by-side comparisons with AWS and Azure equivalents. One caveat: the SkillSafe security scan flagged three HIGH issues for systemd and launchctl persistence commands in the Zero Trust setup reference. Those are legitimate service installation steps for cloudflared tunnels, but agents should treat them with caution.

skillsafe install @josiahsiegel/cloudflare-knowledge

5. @wshobson/hybrid-cloud-networking — 39/50

Score: 39/50 | Relevance: 9 · Depth: 7 · Actionability: 7 · Structure: 8 · Adoption: 8

Two files totaling 274 lines: a 256-line SKILL.md and an 18-line references/direct-connect.md. This is the only skill we found that covers hybrid connectivity across four cloud providers in a single file — AWS, Azure, GCP, and Oracle Cloud Infrastructure.

The skill documents two connection types per provider: VPN (lower cost, internet-dependent) and dedicated private connectivity (Direct Connect, ExpressRoute, Cloud Interconnect, FastConnect). Each VPN section includes Terraform HCL code. The AWS VPN example defines aws_vpn_gateway, aws_customer_gateway, and aws_vpn_connection resources with BGP ASN configuration. The Azure VPN example defines an azurerm_virtual_network_gateway with RouteBased VPN type and VpnGw1 SKU.

Three hybrid network patterns provide the architectural framing. Hub-and-Spoke uses Transit Gateway (AWS) or vWAN (Azure) as a central hub with production, staging, and development spokes. Multi-Region Hybrid uses dual Direct Connect links to different regions with cross-region peering. Multi-Cloud Hybrid shows a single on-premises datacenter with four simultaneous private connections to AWS, Azure, GCP, and OCI.

The High Availability section includes dual VPN tunnel Terraform code with separate customer gateways, plus guidance on active-active configuration with BGP failover and ECMP routing. The troubleshooting section provides the exact CLI commands for checking VPN status on AWS (aws ec2 describe-vpn-connections), Azure (az network vpn-connection show), and OCI (oci network ip-sec-connection list).

The skill is concise — 274 lines is lean for four cloud providers. It lacks the depth of a single-provider skill: there are no GCP Terraform examples, and the OCI sections have descriptions without code. But as the only multi-cloud networking skill in the registry, it fills a gap that no other skill addresses.

skillsafe install @wshobson/hybrid-cloud-networking

Frequently Asked Questions

Which skill should I install first if I only use one cloud provider? If you are on Azure, start with @josiahsiegel/azure-well-architected-framework — it covers all five pillars with CLI commands and checklists. If you are on Cloudflare, install @cloudflare/workers-best-practices for code review and @josiahsiegel/cloudflare-knowledge for platform-wide reference. For AWS, the options are thinner: @awslabs/mcp is a 500-file repo dump of MCP servers, not a curated infrastructure skill. We expect more focused AWS skills to appear as the registry grows.

Are these skills safe to install? Eight of the ten skills we installed scored A+ (100/100) on the SkillSafe security scan. Among our top five, the exception was @josiahsiegel/cloudflare-knowledge, which flagged three HIGH issues for systemd and launchctl persistence commands in its Zero Trust setup reference. Those are legitimate cloudflared service installation steps, but AI agents should confirm before executing them. We recommend running skillsafe scan on any skill before using it in production.

Why is there no dedicated AWS infrastructure skill in the top five? We searched for AWS-focused skills using multiple queries (“aws cloud,” “serverless lambda,” “aws cdk”). The closest match, @awslabs/mcp, is the entire awslabs/mcp GitHub repository imported as a single skill — 500 files including CI workflows, contribution guides, and 30+ MCP server source directories. It is useful as a reference repository but not as a curated infrastructure skill. @prowler-cloud/gh-aw covers GitHub Agentic Workflows for the Prowler security platform, which is CI/CD-adjacent rather than cloud infrastructure. The AWS skill ecosystem is weighted toward MCP servers and tooling rather than architectural guidance.

Conclusion

Cloud infrastructure skills split into two categories: platform-specific depth and cross-platform breadth. The Cloudflare skills (workers-best-practices and cloudflare-knowledge) and the Azure Well-Architected Framework skill excel at depth — anti-pattern catalogs, pillar-by-pillar checklists, storage API deep dives. The Terraform refactor-module skill covers the IaC layer that sits beneath any cloud provider. And hybrid-cloud-networking is the only skill that addresses the multi-cloud connectivity problem directly.

The gap is AWS. With 6,314 installs, @awslabs/mcp has adoption, but it is a raw repository, not a curated skill. A focused AWS infrastructure skill — covering VPC design, IAM policies, cost optimization, and multi-account strategy — would immediately rank in our top five if it delivered the same density as the Azure or Cloudflare skills here.
