Tag: python
Pandas & NumPy with AWS Lambda
Fun fact: Pandas and NumPy don’t work out of the box with Lambda. The libraries that you might download from your development machine probably won’t work either.
The standard Lambda Python environment is very barebones by default. There is no point in loading in a bunch of libraries if they aren’t needed. This is why we package our Lambda functions into ZIP files to be deployed.
My first attempt at using Pandas on AWS Lambda involved concatenating Excel files. The goal was to take a multi-sheet Excel file and combine it into one sheet for ingestion into a data lake. To accomplish this I used the Pandas library to build the new sheet. To automate the process I set up an S3 trigger on a Lambda function to execute the script every time a file was uploaded.
And then I ran into this error:
[ERROR] Runtime.ImportModuleError: Unable to import module 'your_module': IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE! Importing the numpy c-extensions failed.
I had clearly added the NumPy library into my ZIP file:
So what was the problem? Well, apparently, the versions of NumPy that I downloaded on both my MacBook and my Windows desktop are not compatible with Amazon Linux.
To resolve this issue, I first attempted to download the package files manually from PyPI.org. I grabbed the latest “manylinux1_x86_64.whl” file for both NumPy and Pandas. I put them back into my ZIP file and re-uploaded the file. This resulted in the same error.
THE FIX THAT WORKED:
The way to get this to work without failure is to spin up an Amazon Linux EC2 instance. Yes, this seems excessive, and it is. Not only did I have to spin up a new instance, I also had to install Python 3.8 because Amazon Linux ships with Python 2.7 by default. But once it is installed, you can use pip to install the libraries into the current directory by doing:
pip3 install -t . <package name>
This puts the libraries in one location, ready to ZIP back up for use. You can remove a lot of the files that are not needed by running:
rm -r *.dist-info __pycache__
After you have done the cleanup, you can ZIP up the files, move them back to your development machine, add your Lambda function, and upload the archive to the Lambda console.
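If you would rather script the cleanup and ZIP steps than do them by hand, here is a minimal sketch using only the Python standard library. The function names and the example paths are my own for illustration and are not part of the original workflow; it simply leaves out the same *.dist-info and __pycache__ directories as the rm command above.

```python
import zipfile
from pathlib import Path

# Directories that are not needed at runtime (assumption: this mirrors
# the `rm -r *.dist-info __pycache__` cleanup step).
SKIP_DIR_SUFFIXES = (".dist-info",)
SKIP_DIR_NAMES = ("__pycache__",)


def should_skip(relative_path: Path) -> bool:
    """Return True if any parent directory of the file should be left out."""
    for part in relative_path.parts[:-1]:
        if part in SKIP_DIR_NAMES or part.endswith(SKIP_DIR_SUFFIXES):
            return True
    return False


def build_package(source_dir: str, zip_path: str) -> list:
    """Zip every kept file under source_dir, with paths relative to it."""
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself if it lives in source_dir
            if not path.is_file() or path.resolve() == target:
                continue
            relative = path.relative_to(source)
            if should_skip(relative):
                continue
            archive.write(path, relative.as_posix())
            written.append(relative.as_posix())
    return written
```

Calling something like build_package("package", "function.zip") would return the list of files that went into the archive, which is handy for confirming that NumPy and Pandas actually made it in.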
Run a test! It should work as you intended now!
If you need help with this please reach out to me on social media or leave a comment below.
Concatenating Multi-Sheet Excel Files with Python
I recently came across a data source that used multi-sheets within an Excel file. My dashboard cannot read a multi-sheet Excel file so I needed to combine them into one sheet.
The file is being uploaded into an S3 bucket and then needs to move through the data lake to be read into the dashboard. The final version of this script will be a Lambda function that is triggered on upload of the file, concatenate the sheets, and then place a new file into the next layer of the data lake.
Using Pandas you can easily accomplish this task. One issue I did run into is that Pandas would no longer read my XLSX file (the xlrd engine dropped XLSX support in its 2.0 release; installing openpyxl is the usual alternative), so I converted it down into an XLS file, which is easily done through Excel. In the future this will also have to be done programmatically. Let’s get into the code.
import pandas as pd

workbook = pd.ExcelFile('file.xls')
# Names of the sheets to combine
sheets = ['create', 'a', 'list']
dataframe = []

for sheet in sheets:
    # The skiprows and skipfooter values are placeholders; replace them with your own
    df = pd.read_excel(workbook, sheet_name=sheet, skiprows=[list of rows to skip], skipfooter=number_of_rows_to_skip_from_bottom)
    df.columns = ['list', 'of', 'column', 'headers']
    dataframe.append(df)

# Combine every sheet into one DataFrame and write it out
df = pd.concat(dataframe)
df.to_excel("output.xls", index=False)
To start we are going to import the Pandas library and then read in our Excel file. In the future revision of this script I will be reading in the file from S3 through the Lambda event so this will need to change.
The “sheets” variable is a list of sheets that you want the script to look at. You can remove this if you want it to look at all the sheets. My file had a few sheets that could be ignored. We will also create an empty list called “dataframe”. This empty list will be used to store each of the sheets that we want to concatenate. In the production version of this script there are some modifications that need to be made to each sheet. I accomplished this by adding “if/then” statements based on the sheet name.
At the end of the “for” loop we will append the data frame to our list. Once all the sheets have been added, we will use Pandas to concatenate the objects and output the file. You can specify your output file name. I also included “index=False”, which removes the first column of index numbers. This is not needed for my project.
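If you do want every sheet, a rough sketch of that variation is below. It relies on the fact that pd.read_excel with sheet_name=None returns a dictionary of sheet name to DataFrame. The combine_sheets function and its skip parameter are names I made up for this example, not part of the script above.

```python
import pandas as pd


def combine_sheets(sheets: dict, skip: tuple = ()) -> pd.DataFrame:
    """Concatenate a {sheet_name: DataFrame} mapping into one DataFrame."""
    frames = []
    for name, frame in sheets.items():
        # Leave out any sheets that should be ignored
        if name in skip:
            continue
        frames.append(frame)
    # ignore_index renumbers the rows so the combined sheet has one clean index
    return pd.concat(frames, ignore_index=True)


# Usage (assumes a local file named file.xls with a sheet called 'Notes'):
# all_sheets = pd.read_excel('file.xls', sheet_name=None)
# combined = combine_sheets(all_sheets, skip=('Notes',))
```

Any per-sheet adjustments would go inside the loop, keyed on the sheet name, just like the “if/then” statements mentioned above.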
So there you have it, a simple Python script to concatenate a multi-sheet Excel file. If this script helps you please share it with your network!
Where Is It 5 O’Clock Pt: 4
As much as I’ve scratched my head working on this project, it has been fun to learn some new things and build something that isn’t infrastructure automation. I’ve learned some frontend web development and some backend development, and utilized some new Amazon Web Services products.
With all that nice stuff said, I’m proud to announce that I have built a fully functioning project that finally works the way I intended. You can visit the website here:
To recap, I bought this domain one night as a joke and thought “Hey, maybe one day I’ll build something”. I started off building a fully Python application backed by Flask. You can read about that in Part 1. This did not work out the way I intended, as it did not refresh the timezones on page load. In Part 3 I discussed how I was rearchitecting the project to include an API that would be called upon page load.
The API worked great and delivered two JSON objects into my frontend. I then parsed the two JSON objects into two separate tables that display where you can be drinking and where you probably shouldn’t be drinking.
This is a snippet of the JavaScript I wrote to iterate over the JSON objects while adding them into the appropriate table:
function buildTable(someinfo){
    // Reference the two HTML tables
    var table1 = document.getElementById('its5pmsomewhere')
    var table2 = document.getElementById('itsnot5here')
    // Parse the two JSON objects returned by the API
    var its5_json = JSON.parse(someinfo[0]);
    var not5_json = JSON.parse(someinfo[1]);
    var its5_array = []
    var not5_array = []
    // Add one row per timezone to the matching table
    its5_json['its5'].forEach((value, index) => {
        var row = `<tr>
            <td>${value}</td>
            <td></td>
            </tr>`
        table1.innerHTML += row
    })
    not5_json['not5'].forEach((value, index) => {
        var row = `<tr>
            <td></td>
            <td>${value}</td>
            </tr>`
        table2.innerHTML += row
    })
}
First I reference two different HTML tables. I then parse the JSON from the API. I take both JSON objects and iterate over them, adding each timezone as a row in the appropriate HTML table.
If you want more information on how I did this feel free to reach out.
I want to continue iterating over this application to add new features. I need to do some standard things like adding Google Analytics so I can track traffic. I also want to add a search feature and a map that displays the different areas of drinking acceptability.
I am also open to requests. One of my friends suggested that I add a countdown timer to each location where it is not yet acceptable to be drinking.
Feel free to reach out in the comments or on your favorite social media platform! And as always, if you liked this project please share it with your friends.
Where Is It Five O’Clock Pt: 3
So I left this project at a point where I felt it needed to be re-architected based on the fact that Flask only executes the function once and not every time the page loads.
I re-architected the application in my head to include an API that calls the Lambda function and returns a list of places where it is and is not acceptable to be drinking based on the 5 O’Clock rules. These two lists will be JSON objects that have a single key with multiple values. The values will be the timezones appropriate to be drinking in.
After the JSON objects are generated I can reference them through the web frontend and display them in an appropriate way.
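To make the shape of those two JSON objects concrete, here is a small sketch. The key names “its5” and “not5” match what the frontend snippet in Part 4 reads; the function names and the API Gateway style response wrapper are assumptions of mine, not necessarily how the real Lambda function is written.

```python
import json


def build_payload(its5pm: list, not5pm: list) -> list:
    """Return two JSON strings, each a single key holding a list of timezones."""
    return [
        json.dumps({"its5": its5pm}),
        json.dumps({"not5": not5pm}),
    ]


def build_response(its5pm: list, not5pm: list) -> dict:
    """Wrap the payload in a proxy-style response (assumed integration type)."""
    return {
        "statusCode": 200,
        "body": json.dumps(build_payload(its5pm, not5pm)),
    }
```

Because each element of the payload is itself a JSON string, the frontend has to parse each one separately, which is what the JSON.parse calls on someinfo[0] and someinfo[1] do.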
At this point I have the API built out and fully functioning the way I think I want it. You can use it by executing the following:
curl https://5xztnem7v4.execute-api.us-west-2.amazonaws.com/whereisit5
I will probably only have this publicly accessible for a few days before locking it back down.
Hopefully, in part 4 of this series, I will have a frontend demo to show!
Where Is It 5 O’Clock Pt: 2
So I spent the evening deploying this web application to Amazon Web Services. In my test environment, everything appeared to be working great because every time I reloaded the page it reloaded the function as well.
When I transferred this over to a live environment I realized the Python function only ran every time I committed a change and it was re-deployed to my Elastic Beanstalk environment.
This poses a new problem. If the function doesn’t fire every time the page is refreshed the time won’t properly update and it will show incorrect areas of where it is 5 O’Clock. Ugh.
So, over the next few weeks, in my spare time, I will be re-writing this entire application to function the way I intended it to.
I think to do this I will write each function as an AWS Lambda function and then write a frontend that calls these functions on page load. Or, the entire thing will be one function and return the information and it will deploy in one API call.
I also really want to display a map that shows the areas that it is 5PM or later but I think this will come in a later revision once the project is actually functioning correctly. Along with some more CSS to make it pretty and responsive so it works on all devices.
The punch list is getting long…
Follow along here: https://whereisitfiveoclock.net
Where Is It Five O’Clock Pt: 1
I bought the domain whereisitfiveoclock.net a while back and have been sitting on it for quite some time. I had an idea to make a web application that would tell you where it is five o’clock. Yes, this is a drinking website.
I saw this project as a way to learn more Python skills, as well as some more AWS skills, and boy, has it put me to the test. So I’m going to write this series of posts as a way to document my progress in building this application.
Part One: Building The Application
I know that I want to use Python because it is my language of choice. I then researched what libraries I could use to build the frontend with. I came across Flask as an option and decided to run with that. The next step I had to do was actually find out where it was 5PM.
In my head, I came up with the process: if I could first get a list of all the timezones and identify the current time in each of them, I could filter out the timezones in which it was 5PM. Once I had established where it was 5PM, I could then get that information to Flask and figure out a way to display it.
Here is the function for identifying the current time in all timezones and then storing each key pair of {Timezone : Current_Time }
from datetime import datetime

import pytz
from pytz import timezone

def getTime():
    now_utc = datetime.now(timezone('UTC'))
    #print('UTC:', now_utc)
    timezones = pytz.all_timezones
    # get all current times and store them into a list
    tz_array = []
    for tz in timezones:
        current_time = now_utc.astimezone(timezone(tz))
        values = {tz: current_time.hour}
        tz_array.append(values)
    return tz_array
Once everything was stored in tz_array, I took that info and passed it through the following function to identify where it was 5PM. I have another function that identifies everything that is NOT 5PM.
def find5PM(tz_array):
    its5pm = []
    for tz in tz_array:
        timezones = tz.items()
        # tz_name avoids shadowing the timezone() function from pytz
        for tz_name, hour in timezones:
            # 17 is 5PM on a 24-hour clock; anything at or past it counts
            if hour >= 17:
                its5pm.append(tz_name)
    return its5pm
I made a new list, stored just the timezone names in it, and returned it.
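The companion function for everything that is NOT 5PM is not shown in this post, so the following is only a guess at what it could look like. It assumes the same list of {timezone: hour} dictionaries that getTime returns, and the name find_not_5pm is hypothetical.

```python
def find_not_5pm(tz_array):
    """Return timezone names whose current hour is before 17 (5PM)."""
    not5pm = []
    for entry in tz_array:
        # Each entry is a single {timezone_name: hour} dictionary
        for tz_name, hour in entry.items():
            if hour < 17:
                not5pm.append(tz_name)
    return not5pm
```

Using the opposite comparison on the same cutoff means every timezone lands in exactly one of the two lists.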
Once I had all these together I passed them through as variables to Flask. This is where I first started to struggle. In my original revisions of the functions, I was only returning one of the values rather than returning ALL of the values. This resulted in hours of struggling to identify the cause of the problem. Eventually, I had to start over and completely re-work the code until I ended up with what you see above.
The code was finally functional and I was ready to deploy it to Amazon Web Services for public access. I will discuss my design and deployment in Part Two.
EC2 Action Slack Notification
I took a brief break from my Lambda function creation journey to go on vacation, but now I’m back!
This function will notify a Slack channel of your choosing when an EC2 instance enters the “running”, “stopping”, “stopped”, or “shutting-down” state. I thought this might be useful for instances that reside under a load balancer. It would be useful to see when your load balancer is scaling up or down in real time via Slack notification.
In order to use this function, you will need to create a Slack Application with an OAuth key and set that key as an environment variable in your Lambda function. If you are unsure of how to do this I can walk you through it!
Please review the function below
import logging
import requests
import boto3
import os
from urllib.parse import unquote_plus
from slack import WebClient
from slack.errors import SlackApiError

logging.basicConfig(level=logging.DEBUG)

# Check EC2 Status
def lambda_handler(event, context):
    detail = event['detail']
    ids = detail['instance-id']
    eventname = detail['state']
    ec2 = boto3.resource('ec2')

    # Slack Variables
    slack_token = os.environ["slackBot"]
    client = WebClient(token=slack_token)
    channel_string = "XXXXXXXXXXXXXXXXXXXX"

    # Post to slack that the instance is running
    if eventname == 'running':
        try:
            instance = ids
            response_string = f"The instance: {instance} has started"
            response = client.chat_postMessage(
                channel= channel_string,
                text="An Instance has started",
                blocks = [{"type": "section", "text": {"type": "plain_text", "text": response_string}}]
            )
        except SlackApiError as e:
            assert e.response["error"]

    #Post to slack that instance is shutting down
    elif eventname == 'shutting-down':
        try:
            instance = ids
            response_string = f"The instance: {instance} is shutting down"
            response = client.chat_postMessage(
                channel= channel_string,
                text="An Instance is Shutting Down",
                blocks = [{"type": "section", "text": {"type": "plain_text", "text": response_string}}]
            )
        except SlackApiError as e:
            assert e.response["error"]

    elif eventname == 'stopped':
        try:
            instance = ids
            response_string = f"The instance: {instance} has stopped"
            response = client.chat_postMessage(
                channel= channel_string,
                text="An Instance has stopped",
                blocks = [{"type": "section", "text": {"type": "plain_text", "text": response_string}}]
            )
        except SlackApiError as e:
            assert e.response["error"]

    elif eventname == 'stopping':
        try:
            instance = ids
            response_string = f"The instance: {instance} is stopping"
            response = client.chat_postMessage(
                channel= channel_string,
                text="An Instance is stopping",
                blocks = [{"type": "section", "text": {"type": "plain_text", "text": response_string}}]
            )
        except SlackApiError as e:
            assert e.response["error"]
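The four branches in the function above differ only in their message text, so one possible tidy-up is to look the text up in a dictionary. This is just a sketch of that idea; STATE_MESSAGES and build_slack_message are names I made up, and the strings are copied from the function above.

```python
# Message text per EC2 state, copied from the four branches of the handler.
STATE_MESSAGES = {
    "running": ("An Instance has started", "has started"),
    "shutting-down": ("An Instance is Shutting Down", "is shutting down"),
    "stopped": ("An Instance has stopped", "has stopped"),
    "stopping": ("An Instance is stopping", "is stopping"),
}


def build_slack_message(instance_id: str, state: str):
    """Return keyword arguments for chat_postMessage, or None for other states."""
    if state not in STATE_MESSAGES:
        return None
    title, verb = STATE_MESSAGES[state]
    response_string = f"The instance: {instance_id} {verb}"
    return {
        "text": title,
        "blocks": [
            {"type": "section", "text": {"type": "plain_text", "text": response_string}}
        ],
    }
```

Inside the handler, the result could then be passed along with something like client.chat_postMessage(channel=channel_string, **message) whenever message is not None.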
As always the function is available on GitHub as well:
https://github.com/avansledright/ec2ActionPostToSlack
If you find this function helpful please share it with your friends or repost it on your favorite social media platform!
Check EC2 Instance Tags on Launch
In my ever-growing quest to automate my AWS infrastructure deployments, I realized that just checking my tags wasn’t good enough. I should force myself to put tags in otherwise my instances won’t launch at all.
I find this particularly useful because I utilize AWS Backup to do automated snapshots nightly of all of my instances. If I don’t put the “Backup” tag onto my instance it will not be included in the rule. This concept of forced tagging could be utilized across many different applications including tagging for development, production, or testing environments.
To do this I created the Lambda function below. Utilizing EventBridge, I have this function run every time an EC2 instance enters the “running” state.
import json
import boto3

def lambda_handler(event, context):
    detail = event['detail']
    ids = detail['instance-id']
    eventname = detail['state']
    ec2 = boto3.resource('ec2')

    # EventBridge reports the state in lowercase, e.g. 'running'
    while eventname == 'running':
        print(ids)
        #Check to see if backup tag is added to the instance
        tag_to_check = 'Backup'
        instance = ec2.Instance(ids)
        # instance.tags is None when the instance has no tags at all
        tags = instance.tags or []
        if tag_to_check not in [t['Key'] for t in tags]:
            instance.stop()
            print("Stopping Instance: ", instance)
            #Get instance state to break the infinite loop
            state = instance.state['Name']
            if state == "shutting-down":
                print("instance is shutting-down")
                break
            elif state == "stopped":
                print("Instance is already stopped")
                break
            elif state == "stopping":
                print("instance is stopping")
                break
        break
The function then will check the status of the instance to ensure that it is stopped and then break the loop.
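The same idea extends to more than one required tag. Here is a minimal sketch of the check on its own, kept separate from boto3 so it is easy to try out. The missing_tags name and the “Environment” tag in the usage note are examples I picked, not part of the repository.

```python
def missing_tags(tags, required=("Backup",)):
    """Return the required tag keys that are absent from a boto3-style tag list."""
    # tags may be None when the instance has no tags at all
    present = {tag["Key"] for tag in (tags or [])}
    return [key for key in required if key not in present]
```

In the handler this could be called as missing_tags(instance.tags, required=("Backup", "Environment")), stopping the instance whenever the returned list is not empty.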
You can clone the repository from GitHub here:
https://github.com/avansledright/aws-force-ec2-launch-tags
If you utilize the script please share it with your friends. Feel free to modify it as you please and let me know how it works for you! As always, if you have any questions feel free to reach out here or on any other platform!
Lambda Function Post to Slack
I wrote this script out of a need to practice my Python skills. The idea is that if a file gets uploaded to an S3 bucket then the function will trigger and a message with that file name will be posted to a Slack channel of your choosing.
To utilize this you will need to include the Slack pip package as well as the slackclient pip package when you upload the function to the AWS Console.
You will also need to create an OAuth key for a Slack application. If you are unfamiliar with this process, feel free to drop a comment below or shoot me a message, and I can walk you through the process or write a second part of the guide.
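For a rough idea of the part that pulls the file name out of the upload event, here is a small sketch. It assumes the standard S3 notification layout, where each record carries the bucket name and a URL-encoded object key; the function names are mine and may not match the code in the repository.

```python
from urllib.parse import unquote_plus


def uploaded_files(event: dict) -> list:
    """Return (bucket, key) pairs for every record in an S3 event."""
    files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, e.g. spaces show up as '+'
        key = unquote_plus(record["s3"]["object"]["key"])
        files.append((bucket, key))
    return files


def build_message(bucket: str, key: str) -> str:
    """Build the text to post to Slack for one uploaded file."""
    return f"New file uploaded to {bucket}: {key}"
```

Each message string can then be handed to the Slack client as the text of a post.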
Here is a link to the project:
https://github.com/avansledright/posttoSlackLambda
If this helps you please share this post on your favorite social media platform!
Automatically Transcribing Audio Files with Amazon Web Services
I wrote this Lambda function to automatically transcribe audio files that are uploaded to an S3 bucket. This is written in Python3 and utilizes the Boto3 library.
You will need to give your Lambda function permissions to access S3, Transcribe and CloudWatch.
The script will create an AWS Transcribe job with the format:
'filetranscription'+YYYYMMDD-HHMMSS
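As a quick illustration of that naming format, the job name can be built with strftime. This is only a sketch; the function name is made up, and using UTC for the timestamp is my assumption since the format above does not say which timezone is used.

```python
from datetime import datetime, timezone


def transcription_job_name(now=None) -> str:
    """Build a job name like 'filetranscription' + YYYYMMDD-HHMMSS."""
    if now is None:
        # Assumption: default to the current time in UTC
        now = datetime.now(timezone.utc)
    return "filetranscription" + now.strftime("%Y%m%d-%H%M%S")
```

Including the seconds in the timestamp keeps job names unique as long as uploads arrive more than a second apart.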
I will be iterating over the script to hopefully add in a web front end as well as potentially branching to do voice call transcriptions for phone calls and Amazon Connect.
You can view the code here
If you have questions or comments feel free to reach out to me here or on any social media platform.