Building a CloudWatch AI Agent – Architecture & Lessons Learned

I started off this project as a simple AI agent that would help me troubleshoot issues within my AWS environments. If we wrap up an LLM inside of a Lambda function and then feed the alarms to it, have Amazon Nova Lite interpret the alarm and then give me some troubleshooting steps.

I chose Nova Lite because of its cost and and it is quite preformant when troubleshooting AWS resources. The overall time from alarm to Slack notification was between 12 and 20 seconds.

When I first created this solution I wanted to sell it as a monthly service. The user would deploy the routing infrastructure into their own account and I would host the LLM layer. I priced this cheap. $5 per month. That’s it.

The problem was that nobody was interested.

As I continued to promote the product the constant feedback was that nobody wanted to send any data to a 3rd party account. I refactored the architecture so that the LLM functionality would live in their account and the subscription would cover maintenance and support.

Still no interest.

At the end of the day, I created something before validating that someone actually wanted the solution. While it saves time in troubleshooting it lacks the ability to solve the problem. A human engineer might save some time discovering the root cause but they still have to fix the issue.

I still use this setup in my personal account. I made the repository open source so that if anyone wants to utilize it they can. Maybe someone will find it useful!

Check it out on my other website

Don’t miss an update

Comments

Leave a Reply