Aaron VanSledright

Category: Cloud Architecting

  • Security Group ID Finder

    I have been working on deploying resources to a lot of AWS accounts lately where each account has the same network infrastructure. When deploying Lambdas, I had the common name of the security group but not the ID. I wrote this utility to get the security group ID for me quickly.

    import boto3
    import sys
    
    def get_security_group_id(common_name):
        ec2 = boto3.client("ec2", region_name="us-west-2")
    
        # Paginate so accounts with many security groups are searched completely
        paginator = ec2.get_paginator("describe_security_groups")
        for page in paginator.paginate():
            for security_group in page['SecurityGroups']:
                if security_group['GroupName'] == common_name:
                    return security_group['GroupId']
        return None
    
    if __name__ == '__main__':
        # Print usage when no argument is given or when help is requested
        if len(sys.argv) < 2 or sys.argv[1] in ("help", "--help", "usage", "--usage"):
            print("USAGE: python3 main.py <security group name>")
        else:
            sg_id = get_security_group_id(sys.argv[1])
            if sg_id is None:
                print("Security Group Not found")
            else:
                print(sg_id)

    This is a simple tool that can be used on your command line by doing:

    python3 main.py <security group name>
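
    If the account holds many security groups, the name match could also be pushed to the EC2 API with a filter instead of looping client-side. This is a minimal sketch, assuming a boto3 EC2 client is passed in; the function name `find_security_group_ids` is hypothetical. It returns a list because a group name is only unique within a VPC.

```python
def find_security_group_ids(ec2_client, common_name):
    """Return the IDs of every security group whose name matches exactly."""
    # "group-name" is a server-side filter, so only matching groups are returned
    response = ec2_client.describe_security_groups(
        Filters=[{"Name": "group-name", "Values": [common_name]}]
    )
    return [group["GroupId"] for group in response["SecurityGroups"]]
```

    With a real client this would be called as find_security_group_ids(boto3.client("ec2", region_name="us-west-2"), "my-group-name").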

    I hope this helps speed up your deployments. Feel free to share the code with your friends and team!

    Github

  • A Dynamo Data Migration Tool

    Have you ever wanted to migrate data from one DynamoDB table to another? I haven’t seen an AWS tool to do this so I wrote one using Python.

    import sys
    import boto3
    
    ## USAGE ############################################################################
    ## python3 dynamo.py <Source_Table> <destination table>                            ## 
    ## Requires two profiles to be set in your AWS Config file "source", "destination" ##
    #####################################################################################
    def dynamo_bulk_reader():
        session = boto3.session.Session(profile_name='source')
        dynamodb = session.resource('dynamodb', region_name="us-west-2")
        table = dynamodb.Table(sys.argv[1])
    
        print("Exporting items from: " + str(sys.argv[1]))
    
        response = table.scan()
        data = response['Items']
    
        while 'LastEvaluatedKey' in response:
            response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
            data.extend(response['Items'])
    
        print("Finished exporting: " + str(len(data)) + " items.")
        return data
    
    def dynamo_bulk_writer():
        session = boto3.session.Session(profile_name='destination')
        dynamodb = session.resource('dynamodb', region_name='us-west-2')
        table = dynamodb.Table(sys.argv[2])
        print("Importing items into: " + str(sys.argv[2]))
        # Open the batch writer once so puts are buffered and sent in batches
        with table.batch_writer() as batch:
            for table_item in dynamo_bulk_reader():
                batch.put_item(Item=table_item)
    
        print("Finished importing items...")
    
    if __name__ == '__main__':
        print("Starting Dynamo Migrator...")
        dynamo_bulk_writer()
        print("Exiting Dynamo Migrator")

    The process is pretty simple. First, we get all of our data from our source table. We store this in a list. Next, we iterate over that list and write it to our destination table using the ‘Batch Writer’.
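
    For tables too large to hold in a list, the same read-then-write flow could be streamed one scan page at a time. This is a sketch, assuming two boto3 DynamoDB Table resources are passed in; the function name `copy_table` is hypothetical and not part of the original tool.

```python
def copy_table(source_table, destination_table):
    """Copy every item from one DynamoDB Table resource to another, page by page."""
    copied = 0
    scan_kwargs = {}
    # One batch writer for the whole copy so puts are buffered into batches
    with destination_table.batch_writer() as batch:
        while True:
            response = source_table.scan(**scan_kwargs)
            for item in response["Items"]:
                batch.put_item(Item=item)
                copied += 1
            # LastEvaluatedKey is only present when more pages remain
            last_key = response.get("LastEvaluatedKey")
            if last_key is None:
                return copied
            scan_kwargs["ExclusiveStartKey"] = last_key
```

    Only one page of items is held in memory at a time, at the cost of no longer knowing the total count before writing starts.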

    The program has been tested against tables containing over 300 items. Feel free to use it for your environments! If you do use it, please share it with your friends and link back to this article!

    Github: https://github.com/avansledright/dynamo-migrate

  • Querying and Editing a Single Dynamo Object

    I have a workflow that creates a record inside of a DynamoDB table as part of a pipeline within AWS. The record has a primary key of the Code Pipeline job. Later in the pipeline I wanted to edit that object to append the status of resources created by this pipeline.

    In order to do this, I created two functions. One that first returns the item from the table and the second that actually does the update and puts the updated item back into the table. Take a look at the code below and utilize it if you need to!

    import boto3 
    from boto3.dynamodb.conditions import Key
    
    def query_table(id):
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table('XXXXXXXXXXXXXX')
        response = table.query(
            KeyConditionExpression=Key('PRIMARYKEY').eq(id)
        )
        return response['Items']
    
    
    def update_dynanmo_status(id, resource_name, status):
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table('XXXXXXXXXXXXX')
        items = query_table(id)
        response = None  # stays None when no item matches the key
        for item in items:
            # Do your update here
            response = table.put_item(Item=item)
        return response
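
    An alternative to reading the item and putting it back is a single in-place update. This is a sketch only, assuming the status is stored as a top-level attribute named after the resource (the actual update logic above is left to the reader); the function name `append_resource_status` and its parameters are hypothetical, and `table` is a boto3 DynamoDB Table resource.

```python
def append_resource_status(table, key_name, job_id, resource_name, status):
    """Set one attribute on an existing item without rewriting the whole item."""
    return table.update_item(
        Key={key_name: job_id},
        # The #resource placeholder lets the attribute name be any string
        UpdateExpression="SET #resource = :status",
        ExpressionAttributeNames={"#resource": resource_name},
        ExpressionAttributeValues={":status": status},
        ReturnValues="UPDATED_NEW",
    )
```

    This avoids overwriting changes made to the item by another pipeline step between the query and the put.
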
  • Pandas & NumPy with AWS Lambda

    Fun fact: Pandas and NumPy don’t work out of the box with Lambda. The libraries that you might download from your development machine probably won’t work either.

    The standard Lambda Python environment is very barebones by default. There is no point in loading in a bunch of libraries if they aren’t needed. This is why we package our Lambda functions into ZIP files to be deployed.

    My first attempt at using Pandas on AWS Lambda was for concatenating Excel files. The goal was to take a multi-sheet Excel file and combine it into one sheet for ingestion into a data lake. To accomplish this I used the Pandas library to build the new sheet. To automate the process I set up an S3 trigger on a Lambda function to execute the script every time a file was uploaded.

    And then I ran into this error:

    [ERROR] Runtime.ImportModuleError: Unable to import module 'your_module':
    IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
    Importing the numpy c-extensions failed.

    I had clearly added the NumPy library into my ZIP file.

    So what was the problem? Well, apparently, the version of NumPy that I downloaded on both my MacBook and my Windows desktop is not compatible with Amazon Linux.

    To resolve this issue, I first attempted to download the package files manually from PyPI.org. I grabbed the latest “manylinux1_x86_64.whl” file for both NumPy and Pandas. I put them back into my ZIP file and re-uploaded the file. This resulted in the same error.

    THE FIX THAT WORKED:

    The way to get this to work reliably is to spin up an Amazon Linux EC2 instance. Yes, this seems excessive, and it is. Not only did I have to spin up a new instance, I also had to install Python 3.8 because Amazon Linux ships with Python 2.7 by default. Once Python is installed, you can use pip to install the libraries into the current directory by doing:

    pip3 install -t . <package name>

    This is useful for getting the libraries in the same location to ZIP back up for use. You can remove a lot of the files that are not needed by running:

    rm -r *.dist-info __pycache__

    After you have done the cleanup, you can ZIP up the files, move them back to your development machine, add your Lambda function, and upload the result to the Lambda console.
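
    The cleanup and ZIP steps could also be scripted. This is a sketch using only the Python standard library; the function name `build_lambda_zip` is hypothetical. Unlike the `rm` command above, it skips “.dist-info” and “__pycache__” directories at every depth rather than deleting them, and the ZIP path must sit outside the package directory.

```python
import os
import zipfile

SKIP_DIR_SUFFIXES = (".dist-info",)
SKIP_DIR_NAMES = {"__pycache__"}


def build_lambda_zip(package_dir, zip_path):
    """Zip package_dir for Lambda, skipping metadata and bytecode caches."""
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(package_dir):
            # Prune unwanted directories in place so os.walk never descends into them
            dirs[:] = [
                d for d in dirs
                if d not in SKIP_DIR_NAMES and not d.endswith(SKIP_DIR_SUFFIXES)
            ]
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to package_dir so modules sit at the ZIP root
                arcname = os.path.relpath(full_path, package_dir)
                archive.write(full_path, arcname)
                added.append(arcname)
    return sorted(added)
```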

    Run a test! It should work as you intended now!

    If you need help with this please reach out to me on social media or leave a comment below.

  • A File Management Architecture

    This post is a continuation of my article: “A File Extraction Project”. This project has been a great learning experience for both frontend and backend application architecture and design. Below you will find a diagram and an explanation of all the pieces that make this work.

    1. The entire architecture is powered by Flask on an EC2 instance. When I move this project to production I intend to put an Application Load Balancer in front to manage traffic. The frontend is also secured by Google Authentication. This authenticates against the user’s existing G Suite deployment so that only individuals within the organization can access the application.
    2. The first Lambda function processes the upload functions. I am allowing for as many files as needed by the customer. The form also includes a single text field for specifying the value of the object tag. The function sends the objects into the first bucket which is object #4.
    3. The second Lambda function is the search functionality. This function allows the user to provide a tag value. The function queries all objects in bucket #4 and creates a list of objects that match the query. It then moves the objects to bucket #5 where it packages them up and presents them to the user in the form of a ZIP file.
    4. The first bucket is the storage for all of the objects. This is the bucket where all the objects are uploaded to from the first Lambda function. It is not publicly accessible.
    5. The second bucket is a temporary storage for files requested by the user. Objects are moved into this bucket from the first bucket. This bucket has a deletion policy that only allows objects to live inside it for 24 hours.
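
    The 24-hour deletion policy on the second bucket could be expressed as an S3 lifecycle rule. This is a minimal sketch, assuming a boto3 S3 client is passed in; the function names and the rule ID are hypothetical, and the policy actually used in this project is not shown. S3 evaluates expiration in whole days, so objects may live somewhat longer than exactly 24 hours.

```python
def build_expiry_rule(days=1, rule_id="expire-temporary-downloads"):
    """Build a lifecycle configuration that expires every object after `days` days."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # empty prefix applies to all objects
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiry_rule(s3_client, bucket_name, days=1):
    """Attach the expiry rule to the bucket used for temporary downloads."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_expiry_rule(days),
    )
```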

    Lambda Function for File Uploading:

    # Assumes request, secure_filename, s3 (a boto3 S3 client), and BUCKET_NAME
    # are imported or defined elsewhere in the Flask application.
    def upload():
        if request.method == 'POST':
            tag = request.form['tag']
            files = request.files.getlist('file')
            for file in files:
                if file:
                    filename = secure_filename(file.filename)
                    # Save the file locally, then upload the saved copy to S3
                    file.save(filename)
                    s3.upload_file(
                        Bucket=BUCKET_NAME,
                        Filename=filename,
                        Key=filename
                    )
    
                    s3.put_object_tagging(
                        Bucket=BUCKET_NAME,
                        Key=filename,
                        Tagging={
                            'TagSet': [
                                {
                                    'Key': 'Tag1',
                                    'Value': tag
                                },
                                {
                                    'Key': 'Tag2',
                                    'Value': 'Tag-value'
                                },
                            ]
                        },
                    )
            msg = "Upload Done ! "
            # A Flask view must return a response
            return msg

    The function lives within the Flask application. I have AWS permissions set up on my EC2 instance to allow the “PutObject” and “PutObjectTagging” actions that the upload and tagging calls need. You can assign tags as needed. The first tag references the “tag” variable, which is provided by the form submission.

    For Google Authentication I utilized a project I found on GitHub here. In the “auth” route that is created, I modified it to check the “hd” (hosted domain) value returned by the authentication process. You can see how this works here:

    @app.route('/auth')
    def auth():
        token = oauth.google.authorize_access_token()
        user = oauth.google.parse_id_token(token)
        # Reject accounts outside the hosted domain before storing anything in the session
        if user.get('hd') != 'Your hosted domain':
            abort(403)
        session['user'] = user
        return redirect('/')

    If the “hd” parameter is missing or does not match your hosted domain, the function aborts with a “403” error.

    If you are interested in this project and want more information feel free to reach out and I can provide more code examples or package up the project for you to deploy on your own!

    If you found this article helpful please share it with your friends.

  • A File Extraction Project

    I had a client approach me regarding a set of files they had. The files were a set of certificates to support their products. They deliver these files to customers in the sales process.

    The workflow currently involves manually packaging the files up into a deliverable format. The client asked me to automate this process across their thousands of documents.

    As I started thinking through how this would work, I decided on a serverless approach: Amazon S3 for document storage, Lambda for the processing, and Amazon S3 with CloudFront to host a front end for the application.

    My current architecture involves two S3 buckets. One bucket to store the original PDF documents and one to pull in the documents that we are going to package up for the client before sending.

    The idea is that we can tag each PDF file with its appropriate lot number supplied by the client. I will then use a simple form submission process to supply input into the function that will collect the required documents.

    Here is the code for the web frontend:

    <!DOCTYPE html>
    <html>
    <head>
        <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
        <script type="text/javascript">
            $(document).ready(function() {
    
                $("#submit").click(function(e) {
                    e.preventDefault();
    
                    var lot = $("#lot").val();
    
                    $.ajax({
                        type: "POST",
                        url: 'API_URLHERE',
                        contentType: 'application/json',
                        data: JSON.stringify({
                            'body': lot,
                        }),
                        success: function(res){
                            $('#form-response').text('Query Was processed.');
                        },
                        error: function(){
                            $('#form-response').text('Error.');
                        }
                    });
    
                })
    
            });
        </script>
    </head>
    <body>
    <form>
        <label for="lot">Lot</label>
        <input id="lot">
        <button id="submit">Submit</button>
    </form>
    <div id="form-response"></div>
    </body>
    </html>

    This is a single field input form that sends a string to my Lambda function. Once the string is received we will convert it into a JSON object and then use that to find our objects within Amazon S3.

    Here is the function:

    import boto3
    import json
    
    
    def lambda_handler(event, context):
        form_response = event['body']
        tag_list = json.loads(form_response)
        print(tag_list)
        tag_we_want = tag_list['body']
    
        s3 = boto3.client('s3')
        bucket = "source_bucket"
        destBucket = "destination_bucket"
        download_list = []
    
        # Get every object key in the bucket; paginate because a single
        # listing call returns at most 1,000 objects
        object_keys = []
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket):
            for s3_object in page.get('Contents', []):
                object_keys.append(s3_object['Key'])
    
        # Look up the first tag value on each object
        object_tags = []
        for key in object_keys:
            object_key = s3.get_object_tagging(
                Bucket=bucket,
                Key=key,
            )
            tag_set = object_key['TagSet']
            if not tag_set:
                continue  # skip untagged objects
            object_tags.append(
                {
                'Key': key,
                'tags': tag_set[0]['Value']
                }
            )
    
        # Copy every object whose tag matches into the destination bucket
        for tag in object_tags:
            if tag['tags'] == tag_we_want:
                object_name = tag['Key']
                s3.copy_object(
                    Bucket=destBucket,
                    CopySource={
                        'Bucket': bucket,
                        'Key': object_name,
                    },
                    Key=object_name,
                )
                download_list.append(object_name)
    
        return download_list, tag_we_want

    In this code, we define our source and destination buckets first. With the string from the form submission, we first gather all the objects within the bucket and then iterate over each object to find matching tags.

    Once we gather the files we want for our customers we then transfer these files to a new bucket. I return the list of files out of the function as well as the tag name.

    My next step is to package all the required files into a ZIP file for downloading. I first attempted to do this in Lambda but quickly ran into the fact that the Lambda file system is read-only apart from the limited /tmp scratch directory.
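
    One way to sidestep the file system question entirely is to build the archive in memory. This is a sketch using only the standard library, assuming the file contents have already been fetched (for example from S3, not shown) and fit within the function's memory; the function name `build_zip_bytes` is hypothetical.

```python
import io
import zipfile


def build_zip_bytes(files):
    """Return ZIP archive bytes for a mapping of file name -> file content (bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    # The archive is complete once the ZipFile context has closed
    return buffer.getvalue()
```

    The returned bytes could then be uploaded back to a bucket as a single object.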

    Right now, I am thinking of utilizing Docker to spawn a worker which will generate the ZIP file, place it back into the bucket and provide a time-sensitive download link to the client.
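
    The time-sensitive download link could be produced with an S3 presigned URL. This is a sketch, assuming a boto3 S3 client is passed in and the ZIP file has already been placed in the bucket; the function name `make_download_link` and the one-hour default are hypothetical choices.

```python
def make_download_link(s3_client, bucket_name, key, expires_in_seconds=3600):
    """Return a URL that allows downloading one object until it expires."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket_name, "Key": key},
        ExpiresIn=expires_in_seconds,
    )
```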

    Stay tuned for more updates on this project.

  • A Self Hosted Server Health Check

    I’m not big on creating dashboards. I find that I don’t look at them enough to warrant hosting the software on an instance and having to have the browser open to the page all the time.

    Instead, I prefer to be alerted via Slack as much as possible. I wrote scripts to collect DNS records from Route53. I decided that I should expand on the idea and create a scheduled job that would execute at a time interval. This way my health checks are fully automated.

    Before we get into the script, you might ask me why I don’t just use Route53 health checks! The answer is fairly simple. First, the cost of health checks for HTTPS doesn’t make sense for the number of web servers that I am testing. Second, I don’t want to test Route53 or any AWS resource from within AWS. Rather, I would like to use my own network to test as it is not connected to AWS.

    You can find the code and the Lambda function hosted on GitHub. The overall program utilizes a few different AWS products:

    • Lambda
    • SNS
    • CloudWatch Logs

    It also uses Slack but that is an optional piece that I will explain. The main functions reside in “main.py”. This piece of code follows the process of:

    1. Iterating over Route53 Records
    2. Filtering out “A” records and compiling a list of domains
    3. Testing each domain and processing the response code
    4. Logging all of the results to CloudWatch Logs
    5. Sending errors to the SNS topic

    I have the script running on a CRON job every hour.

    The second piece of this is the Lambda function. The function is packaged in “lambda_function.zip”, but I also added the function outside of the ZIP file for editing. You can modify this function to utilize your Slack credentials.

    The Lambda function is subscribed to your SNS topic so that whenever a new message appears, that message is sent to your specified Slack channel.
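
    When SNS invokes a Lambda function, the published text arrives nested inside the event. This is a small sketch of pulling it out and shaping it for Slack, without calling Slack itself; the function names are hypothetical, and the block layout mirrors the one used in the Lambda function further down.

```python
def extract_sns_messages(event):
    """Return the message text of every SNS record in a Lambda event."""
    return [record["Sns"]["Message"] for record in event.get("Records", [])]


def build_slack_blocks(message):
    """Wrap a message in a single plain-text section block."""
    return [{"type": "section", "text": {"type": "plain_text", "text": message}}]
```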

    I have plans to test my Terraform skills to automate the deployment of the Lambda function, SNS topic, CloudWatch Logs, and the primary script in some form.

    If you have any comments on how I could improve this function please post a comment here or raise an issue on GitHub. If you find this script helpful in any way, feel free to share it with your friends!

    Links:
    Server Health Check – GitHub

    Code – Main Function (main.py)

    import boto3
    import requests
    import os
    import time
    
    
    #aws variables
    sns = boto3.client('sns')
    aws = boto3.client('route53')
    cw = boto3.client('logs')
    paginator = aws.get_paginator('list_resource_record_sets')
    response = aws.list_hosted_zones()
    hosted_zones = response['HostedZones']
    time_now = int(round(time.time() * 1000))
    
    #create empty lists
    zone_id_to_test = []
    dns_entries = []
    zones_with_a_record = []
    #Create list of ZoneID's to get record sets from       
    for key in hosted_zones:
        zoneid = key['Id']
        final_zone_id = zoneid[12:]
        zone_id_to_test.append(final_zone_id)
    
    #Create ZoneID List    
    def getARecord(zoneid):
        for zone in zoneid:
            try:
                response = paginator.paginate(HostedZoneId=zone)
                for record_set in response:
                    dns = record_set['ResourceRecordSets']
                    dns_entries.append(dns)
    
            except Exception as error:
                print('An Error')
                print(str(error))
                raise
    #Get Records to test
    def getCNAME(entry):
        for dns_entry in entry:
            for record in dns_entry:
                if record['Type'] == 'A':
                    url = (record['Name'])
                    final_url = url[:-1]
                    zones_with_a_record.append(f"https://{final_url}")
    #Send Result to SNS                
    def sendToSNS(messages):
        message = messages
        try:
            send_message = sns.publish(
                TargetArn='YOUR_SNS_TOPIC_ARN_HERE',
                Message=message,
                )
        except:
            print("something didn't work")
    def tester(urls):
        # Status codes that are logged but do not trigger an alert
        ignored_codes = {200, 301, 302, 401, 403}
        user_agent = {'User-agent': 'Mozilla/5.0'}
        for url in urls:
            try:
                # The timeout keeps one unresponsive server from hanging the run
                status = requests.get(url, headers=user_agent, allow_redirects=True, timeout=10)
            except requests.exceptions.RequestException as error:
                # No response was received, so there is no status code to report
                sendToSNS(f"The site {url} failed testing")
                writeLog(f"The site {url} failed testing: {error}")
                continue
            code = status.status_code
            if code not in ignored_codes:
                sendToSNS(f"The site {url} reports: {code}")
            writeLog(f"The site {url} reports status code: {code}")
    
    def writeLog(message):
        log_group = 'YOUR_LOG_GROUP_NAME'
        log_stream = 'YOUR_LOG_STREAM_NAME'
        getToken = cw.describe_log_streams(
            logGroupName=log_group,
            logStreamNamePrefix=log_stream,
            )
        logInfo = getToken['logStreams']
        event = {
            'logGroupName': log_group,
            'logStreamName': log_stream,
            'logEvents': [
                {
                    # Timestamp each event when it is written, in milliseconds
                    'timestamp': int(round(time.time() * 1000)),
                    'message': message
                },
            ],
        }
        # A stream that has never been written to has no sequence token yet
        nextToken = logInfo[0].get('uploadSequenceToken') if logInfo else None
        if nextToken:
            event['sequenceToken'] = nextToken
        response = cw.put_log_events(**event)
    #Execute            
    getARecord(zone_id_to_test)
    getCNAME(dns_entries)
    tester(zones_with_a_record)
    
    

    Code: Lambda Function (lambda_function.py)

    import logging
    logging.basicConfig(level=logging.DEBUG)
    
    import os
    from slack import WebClient
    from slack.errors import SlackApiError
    
    
    slack_token = os.environ["slackBot"]
    client = WebClient(token=slack_token)
    
    def lambda_handler(event, context):
        detail = event['Records'][0]['Sns']['Message']
        response_string = f"{detail}"
        try:
            response = client.chat_postMessage(
                channel="YOUR CHANNEL HERE",
                text="SERVER DOWN",
                blocks = [{"type": "section", "text": {"type": "plain_text", "text": response_string}}]
            )   
    
        except SlackApiError as e:
            assert e.response["error"]
        return
  • Where Is It 5 O’Clock Pt: 4

    As much as I’ve scratched my head working on this project, it has been fun to learn some new things and build something that isn’t infrastructure automation. I’ve learned some frontend web development and some backend development, and utilized some new Amazon Web Services products.

    With all that nice stuff said I’m proud to announce that I have built a fully functioning project that is finally working the way I intended it. You can visit the website here:

    www.whereisitfiveoclock.net

    To recap, I bought this domain one night as a joke and thought “Hey, maybe one day I’ll build something”. I started off building a fully Python application backed by Flask. You can read about that in Part 1. This did not work out the way I intended, as it did not refresh the timezones on page load. In Part 3 I discussed how I was re-architecting the project to include an API that would be called upon page load.

    The API worked great and delivered two JSON objects into my frontend. I then parsed the two JSON objects into two separate tables that display where you can be drinking and where you probably shouldn’t be drinking.

    This is a snippet of the JavaScript I wrote to iterate over the JSON objects while adding them into the appropriate table:

    function buildTable(someinfo){
        var table1 = document.getElementById('its5pmsomewhere')
        var table2 = document.getElementById('itsnot5here')
        var its5_json = JSON.parse(someinfo[0]);
        var not5_json = JSON.parse(someinfo[1]);
        var its5_array = []
        var not5_array = []
        // Add one row per timezone where it is 5 o'clock
        its5_json['its5'].forEach((value, index) => {
            var row = `<tr>
                        <td>${value}</td>
                        <td></td>
                        </tr>`
    
            table1.innerHTML += row
        })
        // Add one row per timezone where it is not yet 5 o'clock
        not5_json['not5'].forEach((value, index) => {
            var row = `<tr>
                        <td></td>
                        <td>${value}</td>
                        </tr>`
    
            table2.innerHTML += row
        })
    }

    First, I reference two different HTML tables. I then parse the JSON from the API. Finally, I iterate over both JSON objects, building a row for each timezone and appending it to the matching HTML table.

    If you want more information on how I did this feel free to reach out.

    I want to continue iterating over this application to add new features. I need to do some standard things like adding Google Analytics so I can track traffic. I also want to add a search feature and a map that displays the different areas of drinking acceptability.

    I also am open to requests. One of my friends suggested that I add a countdown timer to each location that it is not yet acceptable to be drinking.

    Feel free to reach out in the comments or on your favorite social media platform! And as always, if you liked this project please share it with your friends.

  • Where Is It Five O’Clock Pt: 3

    So I left this project at a point where I felt it needed to be re-architected based on the fact that Flask only executes the function once and not every time the page loads.

    I re-architected the application in my head to include an API that calls the Lambda function and returns a list of places where it is and is not acceptable to be drinking based on the 5 O’Clock rules. These two lists will be JSON objects that have a single key with multiple values. The values will be the timezones appropriate to be drinking in.

    After the JSON objects are generated I can reference them through the web frontend and display them in an appropriate way.
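
    The shape of those two JSON objects can be sketched as follows. This is an illustration only: the 5 O’Clock rule is assumed here to mean “the local hour is 17 or later”, the function name `split_by_five_oclock` and the zone labels are hypothetical, and the key names “its5” and “not5” are the ones the frontend in part 4 reads.

```python
import json
from datetime import datetime, timedelta, timezone


def split_by_five_oclock(now_utc, zones):
    """Return two JSON strings: zones where it is 5 PM or later, and the rest."""
    its5, not5 = [], []
    for name, tz in zones.items():
        # Convert the shared UTC instant into each zone's local time
        local_hour = now_utc.astimezone(tz).hour
        (its5 if local_hour >= 17 else not5).append(name)
    return json.dumps({"its5": its5}), json.dumps({"not5": not5})


# Example zones built from fixed UTC offsets (labels are illustrative)
EXAMPLE_ZONES = {
    "UTC-8": timezone(timedelta(hours=-8)),
    "UTC": timezone.utc,
    "UTC+9": timezone(timedelta(hours=9)),
}
```

    Passing the current time in as an argument, rather than reading the clock inside the function, means the result is recomputed on every call.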

    At this point I have the API built out and fully functioning the way I think I want it. You can use it by executing the following:
    curl https://5xztnem7v4.execute-api.us-west-2.amazonaws.com/whereisit5

    I will probably only have this publicly accessible for a few days before locking it back down.

    Hopefully, in part 4 of this series, I will have a frontend demo to show!

  • Where Is It 5 O’Clock Pt: 2

    So I spent the evening deploying this web application to Amazon Web Services. In my test environment, everything appeared to be working great because every time I reloaded the page it reloaded the function as well.

    When I transferred this over to a live environment I realized the Python function only ran every time I committed a change and it was re-deployed to my Elastic Beanstalk environment.

    This poses a new problem. If the function doesn’t fire every time the page is refreshed the time won’t properly update and it will show incorrect areas of where it is 5 O’Clock. Ugh.

    So, over the next few weeks, in my spare time, I will be re-writing this entire application to function the way I intended it to.

    I think to do this I will write each function as an AWS Lambda function and then write a frontend that calls these functions on page load. Or, the entire thing will be one function and return the information and it will deploy in one API call.

    I also really want to display a map that shows the areas that it is 5PM or later but I think this will come in a later revision once the project is actually functioning correctly. Along with some more CSS to make it pretty and responsive so it works on all devices.

    The punch list is getting long…

    Follow along here: https://whereisitfiveoclock.net