Category: Cloud Architecting

  • The state of SEOScoreAPI – An OpenClaw Project

If you remember, I did an experiment with OpenClaw. I gave it access to a single EC2 instance and let it build anything it wanted. It built https://seoscoreapi.com, which is a fantastic tool for checking the status of your website’s SEO.

Initially I thought I would scrap the project and just let the domain expire. But after OpenClaw had spent over $200, I started doing some manual promotion and landed the first paid user! Since then I’ve put in a few hours a day promoting and adding features.

    What’s New

    The last few months have been the most productive stretch since launch. Here’s what shipped:

    ADA Accessibility Audits

    This one came from watching the news. ADA website lawsuits hit over 4,000 in 2025 and they’re still climbing. Small businesses, e-commerce sites, local restaurants — everyone’s a target.

    We built a full WCAG 2.1 AA compliance endpoint that injects axe-core (the industry-standard accessibility engine) into a headless browser and scans the rendered page. It returns a compliance score, a lawsuit risk assessment, category breakdowns across 10 areas (color contrast, forms, keyboard navigation, ARIA, etc.), and specific fix suggestions for every violation.
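To give a feel for the idea, here’s a rough sketch of rolling axe-core violations up into a 0–100 score. The impact weights and the flat penalty model are illustrative assumptions on my part, not the formula the endpoint actually uses:

```python
# Hypothetical roll-up of axe-core violations into a 0-100 score.
# Weights are illustrative assumptions, not the endpoint's real formula.
IMPACT_WEIGHTS = {"critical": 10, "serious": 5, "moderate": 2, "minor": 1}

def compliance_score(violations):
    """violations: dicts shaped like entries in axe-core's results.violations."""
    penalty = sum(
        IMPACT_WEIGHTS.get(v.get("impact"), 1) * len(v.get("nodes", []))
        for v in violations
    )
    return max(0, 100 - penalty)

# Roughly the kinds of issues our own site turned up (real axe rule ids):
violations = [
    {"id": "color-contrast", "impact": "serious", "nodes": [{}, {}]},
    {"id": "link-in-text-block", "impact": "serious", "nodes": [{}]},
    {"id": "landmark-complementary-is-top-level", "impact": "moderate", "nodes": [{}]},
]
print(compliance_score(violations))  # 100 - (10 + 5 + 2) = 83
```

The real endpoint works from a full rendered-page scan, but the shape of the math is the same: more violations, on more nodes, at higher impact, means a lower score.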

    We ran it on our own site first. Scored a 71. Found contrast issues, links that weren’t distinguishable from body text, a misused aside element. Fixed everything. Now we score 100. That’s the point — even developers who care about accessibility miss things.

    Available on all paid plans. Starter gets 5/month, Ultra gets 500.

    GEO (Generative Engine Optimization)

    Traditional SEO gets you into Google. GEO gets you into ChatGPT, Claude, Perplexity, and every RAG pipeline pulling from the web.

    The GEO audit checks 26 factors across four categories: crawl accessibility, structural markup, content extractability, and AI discoverability. It answers questions like: Do you have an llms.txt file? Is your content chunked in a way that RAG systems can ingest? Do you have freshness signals? Are AI crawlers even allowed in your robots.txt?
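Some of these checks are easy to reproduce yourself. Here’s a quick sketch of the AI-crawler check using Python’s standard-library robots.txt parser; the bot list and the inline robots.txt are just examples, not what the audit actually scans:

```python
# Quick check: are common AI crawlers allowed by a robots.txt?
# Parses an inline example rather than fetching a live site.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for bot in AI_CRAWLERS:
    allowed = rp.can_fetch(bot, "https://example.com/blog/post")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

In this example GPTBot is blocked site-wide while the other crawlers fall through to the wildcard rules, which is exactly the kind of misconfiguration the audit flags.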

    This is becoming more relevant every month. Traffic from AI systems is growing and most sites aren’t optimized for it at all.

    Competitive Audits

    You can now audit your site against competitors in a single request. The response shows a side-by-side comparison with score differentials, category-by-category breakdowns, and which specific checks you’re winning or losing on. Useful for agencies pitching prospects and for anyone doing competitive analysis.
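The comparison boils down to per-category score differentials. A minimal sketch with made-up category names and scores, not the API’s real response shape:

```python
# Hypothetical side-by-side comparison: per-category score differentials.
def compare(mine: dict, theirs: dict) -> dict:
    categories = mine.keys() | theirs.keys()
    return {
        cat: {
            "you": mine.get(cat, 0),
            "them": theirs.get(cat, 0),
            "diff": mine.get(cat, 0) - theirs.get(cat, 0),
        }
        for cat in sorted(categories)
    }

mine = {"performance": 88, "content": 71, "technical": 95}
theirs = {"performance": 92, "content": 60, "technical": 95}

for cat, row in compare(mine, theirs).items():
    sign = "+" if row["diff"] >= 0 else ""
    print(f"{cat:12} you={row['you']:3} them={row['them']:3} ({sign}{row['diff']})")
```

A positive diff is a check you’re winning on, a negative one is where the competitor has the edge.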

    What’s Next

    Honestly I’m not sure. I would love to get more people using the service. Possibly some more integrations, a WordPress plugin?

    Try It

    If you’ve read this far, go audit your site: seoscoreapi.com

    The demo on the homepage doesn’t require a signup. Type in your URL, see your score, read the priorities list. If you want API access, the free tier takes 30 seconds to set up — just an email and a verification code.

    If you’re a developer, the docs have everything. Python and Node.js SDKs are on PyPI and npm. The GitHub Action is at SeoScoreAPI/seo-audit-action.

    If you have questions or feedback, I’m at aaron@seoscoreapi.com.

  • Evolution of my build failure agent

I’ve written in the past about troubleshooting build pipelines with AI. While all of that is a great step toward speeding up development and reducing the troubleshooting load on the enterprise DevOps team, it is NOT the end goal.

    The end goal would be to have the AI fix the problem for you.

I’m rebranding my Jenkins Sentinel to just Sentinel. This workflow allows you to automate remediation for your pipelines while still retaining human-in-the-loop security.

The other primary feature is storing your build failures and remediations in a database that you can view, update, and analyze for custom model training.

    Originally we had the dispatch layer that would notify us of build failures and possible resolutions. The new addition is the cluster of “workers”. Running on AWS Fargate, this team of developers works with the LLM on Bedrock to resolve the failure.

1. The task spins up in the cluster
2. The build logs identify the repository and branch
3. The repository is cloned and the branch checked out
4. The code fix is implemented
5. The task generates its reasoning and updates the database accordingly
6. Code is committed to a new branch and a pull request is opened
7. The task cleans up and shuts down
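The steps above can be sketched as a simple pipeline. Everything here is a stand-in: the git, LLM, and pull-request calls are stubbed and the “database” is a plain list, but the ordering and status tracking mirror the real worker:

```python
# Stubbed sketch of one worker task's lifecycle (steps 1-7 above).
STEPS = [
    "task_started", "repo_identified", "repo_cloned", "fix_implemented",
    "reasoning_recorded", "pr_opened", "cleaned_up",
]

def run_worker(build_log: str, db: list) -> dict:
    incident = {"log": build_log, "status": []}

    def advance(step, **details):
        incident["status"].append(step)
        incident.update(details)               # stand-in for a DynamoDB update

    advance("task_started")
    repo, branch = "example/app", "main"       # stubbed: parsed from build_log
    advance("repo_identified", repo=repo, branch=branch)
    advance("repo_cloned")                     # stubbed: git clone + checkout
    advance("fix_implemented")                 # stubbed: LLM-proposed code fix
    advance("reasoning_recorded", reasoning="stubbed LLM reasoning")
    advance("pr_opened", pr_branch=f"sentinel/fix-{branch}")
    advance("cleaned_up")
    db.append(incident)                        # final record for the dashboard
    return incident

db = []
incident = run_worker("ERROR: terraform plan failed", db)
print(incident["status"] == STEPS)  # True
```

The point of structuring it this way is that every step leaves a status breadcrumb, so a task that dies mid-fix still tells you exactly where it got to.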

    Dispatch still remains the same and the developer is notified accordingly. I need to implement developer specific notifications so that channels are not flooded or email lists abused.

    The other major thing I wanted to see was the cost per fix.

This screenshot is from the dashboard, which shows the compute spend and the LLM spend. For this simple Terraform fix you can see the cost was right around $0.02. Assuming your code bases are more complex, this value could increase proportionally.
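For intuition, here’s a back-of-the-envelope version of that calculation. The rates below are illustrative assumptions (check current Fargate and Bedrock pricing), not the dashboard’s actual numbers:

```python
# Back-of-the-envelope cost-per-fix. All rates are assumed for illustration.
FARGATE_VCPU_HR = 0.04048     # USD per vCPU-hour (assumed)
FARGATE_GB_HR = 0.004445      # USD per GB-hour (assumed)
LLM_IN_PER_1K = 0.003         # USD per 1K input tokens (assumed)
LLM_OUT_PER_1K = 0.015        # USD per 1K output tokens (assumed)

def cost_per_fix(runtime_s, vcpu, mem_gb, tokens_in, tokens_out):
    hours = runtime_s / 3600
    compute = hours * (vcpu * FARGATE_VCPU_HR + mem_gb * FARGATE_GB_HR)
    llm = tokens_in / 1000 * LLM_IN_PER_1K + tokens_out / 1000 * LLM_OUT_PER_1K
    return round(compute + llm, 4)

# A small Terraform fix: ~3 minutes of 0.5 vCPU / 1 GB, modest token usage.
print(cost_per_fix(180, 0.5, 1.0, 4000, 500))  # 0.0207 -- about two cents
```

Notice that at this scale the LLM tokens dominate and the Fargate compute is close to a rounding error; that ratio flips as the worker has to iterate on bigger code bases.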

    I also included a stats page which shows the totals for the entire organization.

    This is all real data from my testing project. The build agent is successfully troubleshooting pipelines for:

    • Python
    • Terraform
    • Java
• TypeScript
    • Docker
    • Kubernetes
    • Go
• CloudFormation

    I plan to continue to add more supported platforms and languages as time allows. The other major integration that I am working on is support for GitHub Actions. Once I complete that integration and put this into all of my pipelines I expect that my troubleshooting and development time will decrease rapidly.

    Other future plans include:

• Ingestion of bugs through sources like Jira, Todoist (my favorite), or another ticketing system.
    • Discord Dispatching
    • Teams Dispatching – although this is really hard to develop for without a paid account
    • Custom model – using the build failure data to train a model

    Anyway, this project has been super fun. If you want to implement it on your own infrastructure feel free to reach out!

    Don’t miss an update

PS: the featured image was generated and set up through my Nano Banana WordPress plugin

  • Why I built my own WordPress Platform

    I’ve been building websites for a long time. I remember learning HTML and Microsoft FrontPage as my first website builder. It was such a fun time to be creating horrific looking websites back in the early 2000s. As the internet progressed so did my skills and back in 2016 I formed my company 45Squared to build websites for small businesses. My whole goal is to be your trusted resource when it comes to being online.

When I started the company I built WordPress websites of various shapes and sizes, but they always ran on AWS. This helped me expand my AWS skills as well as provide robust infrastructure for my clients’ websites to live on. I managed the website and the underlying infrastructure for a small monthly cost that beat the competition. The result: a bunch of paying customers and a decent side hustle.

As time went on, selling became harder and the race to zero on cost was apparent. So, with the AI boom in full swing, I decided that it was time to automate the site building process.

I started documenting out how I would want this to work: fully automated website deployments, design, content, custom domains, a good SEO base, and deployed FAST!

Enter https://ai.45sq.net. This platform is fully automated. The customer can provide inputs and descriptions of what they want as well as photos or other graphical content. The workflow takes all of the inputs and builds a fully functional WordPress website hosted on AWS. The user can easily point their own domain to the server and set up automatic payments. They then get full administrative access to their website so they can expand and add features just like any other WordPress site.

    So why did I build this?

If you contact a web designer now, you will have to pay them to build the initial design, work on their timelines, and end up with something that needs revisions; your time to live will be measured in weeks, not minutes.

The platform I built for 45Squared eliminates the initial design fees and focuses on getting you online quickly. It’s great for small businesses that are just getting started.

    So now when I get a request to build a site I can tell the customer that I have two options. First, fully custom. I’m still willing to sit with you and build out the picture perfect website. Or, two, you can launch your own and I will still support the website and help you with your online presence.

    So that’s it. An easy to use WordPress website launcher. Running on enterprise grade cloud. With content, design, layout and all the rest handled by the magic of Claude Opus.

    Try it out: https://ai.45sq.net. No contracts. No weird fees. Get online today.

  • Building in Public – The Automated WordPress Deployment Platform Part 2

A few days ago I wrote about building an automated WordPress deployment platform using Terraform and AI. Well, I’m happy to report that the entire platform is live and ready for you to explore and launch your own WordPress website.

    Introducing 45Squared’s WordPress deployment platform powered by Ubuntu and Claude. Try it out today at https://ai.45sq.net.

    Let’s talk about how this all works.

The front end infrastructure that an end user will see is pretty straightforward. I am utilizing an ECS cluster and Next.js to deliver the end user experience. The second portion of the user experience is handled by an AWS API Gateway, which manages all of the user credentials, payment processing, and site launch status. Authentication is handled by AWS Cognito. Hate on it all you want, Cognito works just fine when configured correctly.

    Frontend architecture

    Behind the scenes, once a user transaction has completed successfully, the website is provisioned using another ECS task. This container runs through a sequence of steps to provision the AWS EC2 instance for the user to utilize. Each tenant instance is running a hardened Ubuntu image that is built using Packer. I will cover this in another post. Throughout the provisioning process, the task is updating the DynamoDB table so that the user gets a live look into how their website is progressing.

    Provisioning architecture
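A minimal sketch of what one of those status writes might look like. The table name, key, and attribute names are my assumptions; the function only builds the `update_item` parameters, so the boto3 call itself stays a comment:

```python
# Sketch of a live provisioning-status update: each step writes the tenant's
# current state to DynamoDB so the frontend can poll it. Table and attribute
# names are assumptions; the actual AWS call is shown as a comment.
from datetime import datetime, timezone

def build_status_update(tenant_id: str, step: str, pct: int) -> dict:
    """Kwargs for table.update_item(**params) on a hypothetical 'tenants' table."""
    return {
        "Key": {"tenant_id": tenant_id},
        "UpdateExpression": "SET #s = :s, progress = :p, updated_at = :t",
        "ExpressionAttributeNames": {"#s": "status"},  # 'status' is reserved in DynamoDB
        "ExpressionAttributeValues": {
            ":s": step,
            ":p": pct,
            ":t": datetime.now(timezone.utc).isoformat(),
        },
    }

params = build_status_update("tenant-1234", "installing_wordpress", 60)
# boto3.resource("dynamodb").Table("tenants").update_item(**params)
print(params["ExpressionAttributeValues"][":s"])
```

Polling one item per tenant keeps the frontend dead simple: the progress page just reads `status` and `progress` on an interval.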

Each tenant is given a subdomain as well as the ability to utilize a custom domain name. Each tenant is also given a CloudFront CDN for global static content distribution. And of course, each tenant receives their own SSL certificate for both their custom domain and their subdomain.

    Each site can be managed by SSM which will eventually be linked into an AI agent for management through Slack or another messaging platform.

I don’t intend to use this platform to compete with the large players. 45Squared’s vision has always been to serve the small-to-medium-sized businesses who want personalized support while still receiving an amazing product. This platform gives them the ability to quickly launch a website and get their company on the world wide web within 10 minutes.

If you are interested in building out a website using the platform, the first few users can receive 50% off using code “BETATESTER50”. Redemptions are limited, so be sure to get going quickly!

  • Building In Public – Designing an automated WordPress deployment engine with Terraform

If you didn’t know, I built a lot of technical experience by building out WordPress websites. I started a business around it. I did quite well. Lately, I’ve been trying to get back into building websites for small businesses. I find that the space of AI has made some of the bigger players too complex even though they claim to be easy.

My goal is to create a single web page where a user can describe their website and have it auto-provision a full WordPress instance built upon core AWS services. The end result will be a fully managed WordPress instance, backed by AWS’s 99.999% uptime. It will be fully automated and take around 10 minutes to have a fully developed website.

    Current Architecture Plans:

    I’m utilizing AWS to handle everything (no surprise). Originally I was going to utilize Step Functions as the provisioner but as I started building I ended up hitting too many roadblocks and restrictions from a timing perspective.

When dealing with Bedrock the response times can vary. So I made the switch to an ECS approach. The control plane/signup page is built there, so the provisioner should also just be another task.

    Features:

    • Custom Domains
    • Automated SSL
    • Load balancers
    • Auto healing (through auto scaling)
    • Monitoring

    Essentially all the standard features you would expect from a web host. Just without the design portion.

    This will be a new ongoing series for you all to read about. If you’re interested in following along subscribe to my mailing list!

  • Troubleshooting Jenkins Pipelines with AI

Do you love or hate Jenkins? I feel like a lot of the DevOps world has issues with it, but this post and system could easily be adapted to any CI/CD tool.

One thing I do not enjoy about Jenkins is reading through its logs and trying to find out why my pipelines have failed. Because of this I decided it is a perfect use case for an AI to come in, find the problem, and present possible solutions for me. I schemed up this architecture:

    ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
    │    Jenkins      │     │   API Gateway   │     │   Ingestion     │
    │  (Shared Lib)   │────▶│    /webhook     │────▶│    Lambda       │
    └─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                             │
                                                             ▼
    ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
    │   SQS Queue     │◀────│   Analyzer      │◀────│   SQS Queue     │
    │  (Dispatcher)   │     │    Lambda       │     │   (Analyzer)    │
    │     + DLQ       │     │   (Bedrock)     │     │     + DLQ       │
    └────────┬────────┘     └─────────────────┘     └─────────────────┘
             │
             ▼
    ┌─────────────────┐     ┌─────────────────┐
    │   Dispatcher    │────▶│   SNS Topic     │────▶ Email/Slack/etc. 
    │    Lambda       │     │ (Notifications) │
    └─────────────────┘     └─────────────────┘

    A simple explanation is that when a pipeline fails we are going to send the logs to an AI and it will send us the reasoning as to why the failure occurred as well as possible troubleshooting steps.

    Fine. This isn’t that interesting. It saves time which is awesome. Here is a sample output into my Slack:

This failure occurred because I shut down my Docker Swarm when I migrated to K3s.

    Here is the same alert via email from SNS:

    So why build this? Well, this weekend I worked on adding “memory” to this whole process in preparation of two things:

    1. MCP Server
    2. Troubleshooting Runbook(s)

    Jenkins already has an MCP server that works great in Claude Code. You can use it to query jobs, get logs, have Claude Code troubleshoot, resolve and redeploy.

Unless you provide Claude Code with ample context about your deployment, its architecture, and the application, it might not do a great job fixing the problem. Or it might change some architecture or pattern that doesn’t match your organization’s or personal standards. This is where my thoughts about adding memory to this process come in.

If we add a data store to the overall process, log each incident, and give it unique identifiers, we can begin to surface patterns and ultimately help the LLM make better decisions about solving problems within pipelines.
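A fingerprint can be as simple as hashing the log lines after stripping the volatile bits. The normalization rules here are a sketch of the idea, not the actual implementation:

```python
# Fingerprint failures so recurring incidents group together: strip the
# volatile parts of the log (timestamps, build numbers, PIDs), then hash
# what's left. The normalization rules are illustrative assumptions.
import hashlib
import re

def fingerprint(log_lines):
    normalized = []
    for line in log_lines:
        line = re.sub(r"\d{4}-\d{2}-\d{2}[T ][\d:.+]+", "<TS>", line)  # timestamps
        line = re.sub(r"\b\d+\b", "<N>", line)                         # build #s, PIDs
        normalized.append(line.strip().lower())
    return hashlib.sha256("\n".join(normalized).encode()).hexdigest()

a = fingerprint(["Build #69 failed", "Agent pid 2897883 killed"])
b = fingerprint(["Build #70 failed", "Agent pid 1234567 killed"])
print(a == b)  # True: same failure shape, different run
```

Two runs of the same broken job hash to the same fingerprint, which is what lets incident records accumulate under one key instead of looking like eighteen unrelated failures.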

    Example:

    {
     "PK": "FP#3315b888564167f2f72185c51b3c433b6bfa79e7b0e4f734e9fe46fe0df2d8c6",
     "SK": "INC#66a6660f-6745-468f-b516-41c51b8d0ecf",
     "build_number": 69,
     "category": "environment",
     "confidence_score": 0.65,
     "created_at": "2026-02-09T14:50:26.322599+00:00",
     "fingerprint": "3315b888564167f2f72185c51b3c433b6bfa79e7b0e4f734e9fe46fe0df2d8c6",
     "incident_id": "66a6660f-6745-468f-b516-41c51b8d0ecf",
     "job_name": "java-test-project",
     "relevant_log_lines": [
      "✗ Deployment to Docker Swarm failed",
      "ERROR: script returned exit code 255",
      "Stage \"Verify Deployment\" skipped due to earlier failure(s)",
      "unset SSH_AGENT_PID; echo Agent pid 2897883 killed; [ssh-agent] Stopped."
     ],
     "requires_human_review": true,
     "reviewed_by": null,
     "root_cause": "Docker service deployment to Swarm failed with exit code 255 during SSH command execution",
     "status": "suspected",
     "suggested_fixes": [
      "SSH into the Docker Swarm manager node manually and run: `docker service ls` and `docker node ls` to verify Swarm is operational and the manager node is reachable",
      "Check SSH connectivity and credentials by running: `ssh -i <ssh-key> <swarm-manager-host> 'docker info'` to ensure the Jenkins agent can authenticate and execute Docker commands",
      "Review the SSH agent logs immediately before the failure (lines truncated in output) - the actual error message from the `docker service create` or `docker service update` command is not visible in the provided logs",
      "Verify the Docker image `192.168.1.23:5000/test-java-app:69` is accessible from the Swarm manager node by running: `docker pull 192.168.1.23:5000/test-java-app:69` on the target host",
      "Check Docker Swarm manager node resources (disk space, memory) with: `df -h` and `free -m` as resource exhaustion can cause service creation failures with exit code 255"
     ],
     "updated_at": "2026-02-09T14:50:26.322599+00:00"
    }

This is the record from the same incident I showed before, with the same information now stored. So now, if we hook up an MCP to our AI and ask it to find patterns, we get something like this:

    Here's the summary of your 18 incidents across 7 jobs, all from today and yesterday:
    
      ---
      Category Breakdown
      ┌───────────────┬───────┬─────┐
      │   Category    │ Count │  %  │
      ├───────────────┼───────┼─────┤
      │ environment   │ 11    │ 61% │
      ├───────────────┼───────┼─────┤
      │ configuration │ 6     │ 33% │
      ├───────────────┼───────┼─────┤
      │ code          │ 1     │ 6%  │
      └───────────────┴───────┴─────┘
      Most Common: environment (11 incidents)
    
      The dominant failure pattern is Kubernetes deployment rollout timeouts — pods failing to become ready after kubectl apply or
      kubectl rollout status. This hits nearly every job:
    
      - accounting-cloud (4 incidents) — rollout timeout + health check timeout
      - oovoo-javer-bot (1) — rollout timeout, also noted k3ks-3 node is NotReady
      - dividend-tracker (3) — rollout timeout + health check wait timeout
      - aws-cost-optimizer (1) — rollout timeout
      - Password Generator (1) — rollout timeout
      - homelab-proxy (1) — deployment timeout
    
      The root cause is consistently the same: pods don't start in time. The AI analysis flags likely causes as ImagePullBackOff,
      CrashLoopBackOff, resource constraints, and the NotReady node (k3ks-3).
    
      Second Most Common: configuration (6 incidents)
    
      These are more varied:
    
      - oovoo-javer-bot (3) — missing Jenkins credential oovoo-javer-bot-env, missing K8s secret oovoo-javer-bot-secrets, empty
      withCredentials block
      - homelab-proxy (2) — Traefik CRD API version mismatch (traefik.containo.us/v1alpha1 vs traefik.io/v1alpha1)
      - dividend-tracker (1) — PVC resize forbidden (can't shrink PVC below current capacity)
      - food-cost-app-pipeline (1) — Terraform state lock held by another process
    
      Key Takeaway
    
      Your k3s cluster is the bottleneck. 11 of 18 failures are pods not starting in time. The k3ks-3 NotReady node suggests a
      cluster health issue that's likely causing scheduling failures or resource pressure across all your deployments. Fixing that
      node (or removing it from the cluster) would likely resolve the majority of these incidents.
    

    Ignore the fact that many of these are the same error. I did a lot of this on purpose.

What you SHOULD imagine is how this would run inside your environment(s) and what data you would collect. If you think about it, you should realize you would find the bottlenecks of your own deployments. You would find the spots where your developers are getting stuck. You can then create solutions to those issues and hopefully reduce that trend line.

    Next steps.

    We need a human in the loop element. I’m going to start crafting a web interface where these issues are presented to a human engineer. That engineer could add notes or better steps for resolution. With that data added into the memory the troubleshooting agent can follow best practices of your organization or home lab.

    So, stay tuned for the web interface. If you’re interested in setting this up for yourself shoot me a message and I’ll give you access to the repository.

  • How I utilize Claude Code and AI to build complex applications

    “A Fever You Can’t Sweat Out – 20th Anniversary Deluxe” is an album that came out? Wow. I remember seeing Panic! as a teenager…

I stayed away from AI for a long time. I think a lot of people in my field were nervous about security, bad code, incorrect information and much more. In the early days of ChatGPT it was easy to have the AI hallucinate and come up with some nonsense. While it’s still possible for this to happen, I found a workflow that has helped me build applications and proof-of-concept work very quickly.

    First – I have always given AI tasks that I can do myself.
    Second – If I can’t do a task, I need to learn about it first.

    These aren’t really rules, but, things I think about when I’m building out projects. I won’t fall victim to the robot uprising!

    Let’s talk about my workflows.

    Tools:
    – Claude (Web)
    – Claude Code
    – Gemini
    – Gemini CLI
    – ChatGPT
    – Todoist

I pay for Claude and I have subscriptions to Gemini Pro through my various GSuite subscriptions. ChatGPT I use for free. Todoist is my to-do app of choice. I’ve had the subscription since back in my Genius Phone Repair days, when I used it to manage all of the stores and their various tasks.

    The Flow

As with most of you, I’m sure you get ideas or fragments of ideas at random times. I put these into Todoist, where I have a project called “Idea Board”. It’s basically a simplified Kanban board with three columns:

    Idea | In progress | Finished

The point of this is to track things and get them out of my brain to free up space for everything else that happens in my life. I utilize the “In Progress” column for when I’m researching or actually sitting down to process the idea with more detail. Finally, the “Finished” column is used for either ideas that I’m not going to work on or ideas that have turned into full projects. This is not the part of the process where I actually detail out the project. It’s just a landing place for ideas.

The next part of the flow is where I actually detail out what I want to do. If you have been utilizing Claude Code, Gemini CLI, or Codex, you know that input is everything, and it always has been since AI became consumer ready. I generally make a folder on my computer and start drafting my ideas with more detail into markdown files. If we look at CrumbCounts.com as an example, I started with simply documenting the problem I was trying to solve:

    Calculate the cost for this recipe.

    In order to do that we then need to put a bunch of pieces together. Because I am an AWS Fanboy most of my designs and architectures revolve around AWS but some day I might actually learn another cloud and then utilize that instead. Fit for purpose.

Anyway, the markdown file will continually grow as I build the idea into a mostly detailed document that lays out the architecture, design principles, technologies to utilize, user flow and much more. The more detail the better!

When I am satisfied with the initial idea markdown file I will provide it to Gemini. It’s not my favorite AI model out there, but it possesses the ability to take in and track a large amount of context, which is useful when presenting big ideas.

    I assign Gemini the role of “Senior Technology Architect”. I assume the role of “stakeholder”. Gemini’s task is to review the idea that I have and either validate or, create the architecture for the idea. I prompt it to return back a markdown file that contains the technical architecture and technical details for the idea. At this point we reach our first “Human in the loop” point.

Because I don’t trust our AI overlords, this is the first point at which I will fully review the document output by Gemini. I need to make sure that what the AI is putting out is valid, will work, and is using tools and technology that I am familiar with. If the output is proposing something that I’m unsure of, I need to research it or ask the AI to utilize something else.

After I am satisfied with the architecture document I place it into the project directory. This is where we change AI models. You see, Gemini is good at big picture stuff but not so good at specifics (in my opinion). I take the architecture document and provide it to Claude (Opus, web browser or app) and give it the role of Senior Technology Engineer. Its job is to review the architecture document, find any weak points, things that are missing or, sometimes, things that just won’t work, then build a report and an engineering plan. This plan details out SPECIFIC technologies, patterns and resources to use.

    I usually repeat this process a few times and review each LLM’s output looking for things that might have been missed by either myself or the AI. Once I have them both in a place that I feel confident this is when I actually start building.

Because I lack trust in AI, I make my own repository in GitHub and set up the repository on my local machine. I do allow the AI the ability to commit and push code to the repository. Once the repository has been created I have Gemini CLI build out the application file structure. This could include:

    • Creating folders
    • Creating empty files
    • Creating base logic
    • Creating Terraform module structures

But NOTHING specific. Gemini, once again, is not good at detailed work. Maybe I’m using it wrong. Either way, I now have all of the basic structure. Think of Gemini as a Junior Engineer: it knows enough to be dangerous, so it has many guardrails.

# SAMPLE PROMPT FOR GEMINI
You are a junior engineer working on your first project. Your current story is to review the architecture.md and the engineering.md. Then, create a plan.md file that details out how you would go about creating the structure of this application. You should detail out every file that you think needs to be created as well as the folder structure.

    Inside of the architecture and engineering markdown files there is detail about how the application should be designed, coded, and architected. Essentially a pure runbook for our junior engineer.

Once Gemini has created its plan and I have reviewed it, I allow it to write files into our project directory. These are mostly placeholder files. I will allow it to write some basic functions for coding and lay out some Terraform files that are simple.

Once our junior engineer, Gemini, has finished, I usually go through and review all of the files against the plan that it created. If anything is missing I will direct it to review the plan again and make any corrections. Once the code is at a place where I am happy with it, I create my first commit and push this baseline into the repository.

At this point it’s time for the heavy lifting. Time to put my expensive Anthropic subscription to use. Our “Senior Developer”, Claude (Opus model), is let loose on the code base to build out all the logic. 9 times out of 10 I will allow it to make all the edits it wants and just let it go while I work on something else (watching YouTube).

    # SAMPLE CLAUDE PROMPT
You are a senior developer. You are experienced in many application development patterns, AWS, Python and Terraform. You love programming and it's all you ever want to do. Your story in this sprint is to first review the engineering.md, architecture.md and plan.md file. Then review the Junior Engineer's files in this project directory. Once you have a good grasp on the project write your own plan as developer-plan.md. Stop there and I, your manager, will review.

    After I review the plan I simply tell it to execute on the plan. Then I cringe as my usage starts to skyrocket.

    Claude will inevitably have an issue so I take a look at it every now and then, respond to questions if it has any or allow it to continue. Once it reaches a logical end I start reviewing its work. At this point it should have built me some form of the application that I can run locally. I’ll get this fired up and start poking around to make sure the application does what I want it to do.

    At this point we can take a step back from utilizing AI and start documenting bugs. If I think this is going to be a long project this is where I will build out a new project in Todoist so that I can have a persistent place to take notes and track progress. This is essentially a rudimentary Jira instance where each “task” is a story. I separate them into Bugs, Features, In Progress, Testing.

    My Claude Code utilizes the Todoist MCP so it can view/edit/complete tasks as needed. After I have documented as much as I can find I let Claude loose on fixing the bugs.

I think the real magic also comes with automation. Depending on the project I will allow Claude Code access to my Jenkins server via MCP, which lets it monitor and troubleshoot builds and operate independently. What happens is that it will create new branches and push them into a development environment, triggering an automated deployment. The development environment is simply my home lab. I don’t care if anything breaks there and it doesn’t really cost any money. If the build fails, Claude can review the logs, craft a fix, and start the CI/CD all over again.

    Ultimately, I repeat the bug fix process until I get to my minimal viable product state and then deploy the application or project into whatever is deemed the production environment.

So, it’s 2026 and we’re using AI to build stuff. What is your workflow? Still copying and pasting? Not using AI at all? AI is just a bubble? Feel free to comment below!

  • Cloudwatch Alarm AI Agent

    I think one of the biggest time sucks is getting a vague alert or issue and not having a clue on where to start with troubleshooting.

I covered this in the past when I built an agent that can review your AWS bill and find practical ways to save money within your account. That application wasn’t event driven but rather a container that you could spin up when you needed a review, or something you could leave running in your environment. If we take the same read-only approach to building an AWS agent, we can have a new event-driven teammate that helps us with our initial troubleshooting.

    The process flow is straightforward:

    1. A CloudWatch alarm fires
    2. The alarm sends a notification to an SNS topic
    3. A Lambda function subscribed to the topic (this is our teammate) is invoked
    4. The function uses the Amazon Nova Lite model to investigate the contents of the alarm, using its read-only capabilities to find potential solutions
    5. The agent sends its findings to you on your preferred platform
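
    The steps above can be sketched as a single Lambda handler. This is a simplified assumption of the shape, not the production code: the model ID and webhook delivery are my best guesses at a minimal version, and the real agent's read-only AWS investigation calls are elided.

    ```python
    import json
    import urllib.request

    # Assumed model ID and webhook URL; the real agent also makes
    # read-only AWS API calls to gather context (elided here).
    MODEL_ID = "amazon.nova-lite-v1:0"
    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

    def parse_sns_event(event: dict) -> dict:
        """Extract the alarm JSON from the SNS envelope Lambda receives."""
        return json.loads(event["Records"][0]["Sns"]["Message"])

    def build_prompt(alarm: dict) -> str:
        """Turn a CloudWatch alarm payload into an investigation prompt."""
        return (
            f"CloudWatch alarm '{alarm['AlarmName']}' fired.\n"
            f"Reason: {alarm['NewStateReason']}\n"
            "Suggest likely root causes and first troubleshooting steps."
        )

    def handler(event, context):
        import boto3  # available in the Lambda runtime
        alarm = parse_sns_event(event)
        bedrock = boto3.client("bedrock-runtime")
        resp = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user",
                       "content": [{"text": build_prompt(alarm)}]}],
        )
        findings = resp["output"]["message"]["content"][0]["text"]
        # Step 5: deliver the findings via an incoming webhook.
        req = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps({"text": findings}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
    ```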

    For my environment I primarily use Slack for alerting and messaging, so I built that integration. Here is an architecture diagram:

    When the alarm triggers we should see a message in Slack like:

    The AI is capable of providing you actionable steps to either find the root cause of the problem or in some cases, present you with steps to solve the problem.

    This workflow significantly reduces troubleshooting time, and less time troubleshooting means less downtime.

    So, if this is something you are interested in, I have created a Terraform module so you can quickly deploy it into your own environment!

    Check it out here: https://aiopscrew.com

    If you have questions feel free to reach out to me at anytime!

  • Fantasy Football and AI – Week 12

    Well, unfortunately we took a big loss and are now in a three-way tie for first place. Here are the actual results:

    I think the biggest hit was how poorly Josh Allen played. What is interesting is that when I reviewed his past performance against Houston, he has had some of the worst outings of his career there. This week was no different… The other interesting thing is that Saquon Barkley just isn't the same back as he was last year. He is trending down.

    In response to Josh Allen’s poor outing I added deviation and historical performance analysis against each opponent to the data set, so now we have a value like:

    "HOU": {
      "avg_actual": 11.6,
      "avg_deviation": -2.67,
      "avg_deviation_percentage": -18.69,
      "avg_projected": 14.27,
      "by_season": {
        "2024": {
          "avg_actual": 14.6,
          "avg_deviation": 0,
          "avg_deviation_percentage": 0,
          "avg_projected": 0,
          "games_played": 1,
          "games_with_projections": 0
        },
        "2025": {
          "avg_actual": 5.05,
          "avg_deviation": -5.65,
          "avg_deviation_percentage": -52.8,
          "avg_projected": 10.7,
          "games_played": 2,
          "total_actual": 10.1,
          "total_projected": 21.4
        }
      }
    }
    I then coded a new tool into the AI agent that references this data set to assist with its final calculations.
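
    As a minimal sketch of what that tool might look like, assuming the stats are keyed by opponent abbreviation as in the excerpt above (the function name and summary wording are my own; the agent wiring is elided):

    ```python
    # Hypothetical lookup tool over the opponent-performance data set.
    # Field names mirror the JSON excerpt above; everything else is a sketch.
    OPPONENT_STATS = {
        "HOU": {
            "avg_actual": 11.6,
            "avg_deviation": -2.67,
            "avg_deviation_percentage": -18.69,
            "avg_projected": 14.27,
            "by_season": {
                "2024": {"avg_actual": 14.6, "games_played": 1},
                "2025": {"avg_actual": 5.05, "avg_deviation": -5.65,
                         "avg_projected": 10.7, "games_played": 2},
            },
        },
    }

    def opponent_outlook(stats: dict, team: str) -> str:
        """Summarize how a player has historically fared against a team."""
        entry = stats.get(team)
        if entry is None:
            return f"No historical data against {team}."
        pct = entry["avg_deviation_percentage"]
        trend = "under" if pct < 0 else "over"
        games = sum(s["games_played"] for s in entry["by_season"].values())
        return (
            f"Against {team}: averages {entry['avg_actual']} actual vs "
            f"{entry['avg_projected']} projected, {trend}performing "
            f"projections by {abs(pct)}% across {games} games."
        )
    ```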

    So, when we ask the tool about Josh Allen’s performance against a team we get something like:

    Found exactly what you're looking for in Josh Allen's performance data.
    
    Josh Allen vs Kansas City (Week 9, 2025):
    - 28.8 fantasy points - That's a STRONG performance!
    - He was projected for 22.0 points but exceeded by 6.8 points (+30.9%)
    - This was one of his better games this season
    
    Historical Context (2024 vs KC):
    - Week 11, 2024: 24.0 fantasy points vs Kansas City
    
    Bottom Line: Josh Allen has performed WELL against Kansas City in recent matchups. In their two most recent meetings:
    - 2025 (Week 9): 28.8 points
    - 2024 (Week 11): 24.0 points
    
    That's an average of 26.4 fantasy points against KC - well above his season average. Allen seems to rise to the occasion against elite competition like Kansas City. The Chiefs' defense can be vulnerable to mobile QBs who can extend plays, and Allen's dual-threat ability has historically given them problems.
    
    The takeaway: When Allen faces KC, he typically delivers QB1 numbers. That 28.8-point performance this season shows he can still light up even the defending champs!

    I still need to backfill more historical data, but that is a problem for a different day. Looking forward, I hope to build my own model on this data and set up an automated workflow, including data ingestion and automated model building, so that I can consistently generate predictions.

    So, on to week 13. We have a bunch of injuries to contend with this week, so this lineup will likely change once reports come in. Here is the current lineup:

    In other news I built and launched https://gridirondata.com which is an API to reference all of the data I have collected so far.

    Unfortunately, it’s not free. But if you message me about it I’ll probably hook you up!

  • Building jsontotoon.io: A Free Tool to Cut Your LLM API Costs

    If you’re working with LLM APIs, you’re probably watching your token counts like a hawk. Every JSON object you send to Claude, GPT-4, or Gemini costs tokens, and those curly braces and quotes add up fast. I built https://jsontotoon.io to solve this exact problem—and it’s completely free to use.

    The Problem: JSON is Token-Inefficient

    Here’s the thing: JSON is fantastic for machine-to-machine communication. It’s ubiquitous, well-supported, and everyone knows how to work with it. But when you’re paying per token to send data to an LLM? It’s wasteful.

    Look at a simple example:

    [
      {"name": "Alice", "age": 30, "city": "NYC"},
      {"name": "Bob", "age": 25, "city": "LA"},
      {"name": "Carol", "age": 35, "city": "Chicago"}
    ]

    That’s 125 tokens. All those quotes, braces, and commas? The LLM doesn’t need them to understand the structure. You’re literally paying to send redundant syntax.

    Enter TOON Format

    TOON (Token-Oriented Object Notation) converts that same data to:

    name, age, city
    Alice, 30, NYC
    Bob, 25, LA
    Carol, 35, Chicago

    68 tokens. That’s a 46% reduction. The same information, fully reversible back to JSON, but nearly half the cost.

    I realize this sounds too good to be true, but the math checks out. I tested it across real-world datasets—API responses, database dumps, RAG context—and consistently saw 35-45% token reduction. Your mileage will vary depending on data structure, but the savings are real.

    How I Built It

    The backend is straightforward Python running on AWS Lambda. The TOON parser itself is deterministic—same JSON always produces the same TOON output, and round-trip conversion is lossless. No data gets mangled, no weird edge cases (well, I fixed those during testing).
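
    To illustrate the round-trip idea for the flat-array case shown earlier, here is my own simplified sketch, not the site's actual parser; real TOON handles nesting, quoting, and type preservation that this deliberately ignores:

    ```python
    # Simplified TOON-style round trip for a flat, uniform array of objects.
    # The production parser handles nesting, quoting, and types; this doesn't,
    # so values are kept as strings to keep the round trip exact.
    def to_toon(rows: list[dict]) -> str:
        """Emit a header line of keys, then one comma-separated row per object."""
        keys = list(rows[0])
        lines = [", ".join(keys)]
        lines += [", ".join(str(row[k]) for k in keys) for row in rows]
        return "\n".join(lines)

    def from_toon(text: str) -> list[dict]:
        """Rebuild the list of objects from the header + rows."""
        header, *body = text.splitlines()
        keys = [k.strip() for k in header.split(",")]
        return [dict(zip(keys, (v.strip() for v in line.split(","))))
                for line in body]

    people = [
        {"name": "Alice", "age": "30", "city": "NYC"},
        {"name": "Bob", "age": "25", "city": "LA"},
    ]
    toon = to_toon(people)
    assert from_toon(toon) == people  # lossless round trip for this shape
    ```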

    Infrastructure-wise:

    CloudFront + S3 for the static frontend

    API Gateway + Lambda for the conversion endpoint

    DynamoDB for API key storage (with email verification via SES)

    WAF with rate limiting to prevent abuse (10 requests per 5 minutes on API endpoints)

    CloudWatch dashboards for monitoring

    The whole setup costs me about $8-15/month in AWS fees, mostly for WAF. The conversion itself is so fast (< 100ms average) and cheap that I can offer unlimited free API keys without worrying about runaway costs.

    Real Use Cases

    I built this because I was spending way too much on Claude API calls for my fantasy football AI agent project. Every week I send player stats, injury reports, and matchup data in prompts. Converting to TOON saved me about 38% on tokens—which adds up when you’re making hundreds of calls per week.

    But the use cases go beyond my specific problem:

    RAG systems: Fit more context documents in your prompts without hitting limits

    Data analysis agents: Send larger datasets for analysis at lower cost

    Few-shot learning: Include more examples without token bloat

    Structured outputs: LLMs can generate TOON that’s easier to parse than JSON

    Try It Yourself

    The web interface at https://jsontotoon.io is free to use—no signup required. Just paste your JSON, get TOON. If you want to integrate it into your application, grab a free API key (also no cost, no expiration).

    Full API docs are available at https://jsontotoon.io/docs.html, with code examples in Python, JavaScript, Go, and cURL.