In the data world, we always talk about automation and how we should avoid repetitive manual tasks.
But when I started as an analyst 5 years ago, I quickly realized that I had to run my Python scripts daily, and that it was time-consuming (and annoying).
You have probably run into the same problem, and depending on your BI department’s backlog, you might have been left without a solution.
Today I’m sharing 3 ways to schedule your Python scripts that I’ve used over the years.
1. Automate Using Your Machine
This is by no means ideal, but it gets the job done when no other option is available. The idea is to schedule your scripts to run at a certain time on your own computer.
I recommend choosing a time when you’re typically not working, so you can let your computer do the work. For example, when I used to do this, I ran my scripts at 3 in the morning while I was sleeping. Even if for some reason my scripts took longer to execute, I was certain they would be done by the time I came in around 8.
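If you want a plain Python script to wait for that quiet hour on its own, the key piece is working out how long to sleep. Here is a minimal sketch, assuming naive local time (daylight-saving changes are not handled); `seconds_until` is a hypothetical helper name, not part of any library:

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_until(hour: int, minute: int = 0, now: Optional[datetime] = None) -> float:
    """Return the number of seconds until the next hour:minute in local time.

    Hypothetical helper; uses naive datetimes, so DST shifts are ignored.
    """
    if now is None:
        now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's run time has already passed (or is right now), aim for tomorrow.
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()
```

You could then call `time.sleep(seconds_until(3))` before launching your work, and wrap that in a loop to run every night.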
The advantage of this solution is that it’s quick; the drawback is that you have to leave your computer on overnight for your scripts to run.
Also, always reach out to your BI department to let them know that you’re running scripts at night, so you can be sure you’re not interfering with any of their ETL jobs.
2. Use a Virtual Machine
At some point you’ll get tired of running scripts on your own computer: it slows your machine down, you have to leave it on overnight, a Python window is always open, and so on.
The next step is to ask your BI department to give you access to a virtual machine (VM). A VM is essentially a computer hosted on a server that you connect to remotely, for example with the Remote Desktop Connection app on Windows. The advantage is that the VM runs 24/7 and keeps your own computer free.
This solution is not great for code sharing though. For example, I once wanted a colleague to schedule his script using my VM and it turned out to be way more complicated than we thought.
3. Databricks and Similar Tools
This is the real long-term solution. Tools like Databricks let you write your scripts, save them, share them, and schedule them. The scripts run on their servers rather than inside your organization, which saves computing power and BI time. Here is an example of Scala scripts executed by their scheduler:
The cost of this solution is pretty high, though, so your company might not want to pay for it. If that’s the case, the best option I found was #2.
The Code for Scheduling
The idea is to create a Python script that keeps running on your computer (or VM) and calls your other scripts when a certain condition is met, typically a given time of day. This is easy to do with the schedule library (pip install schedule) and a batch file.
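Under the hood, “call other scripts when a condition is met” boils down to a polling loop. Here is a stdlib-only sketch of that idea; `run_when` and its `max_checks` parameter are hypothetical names for illustration, not part of the schedule library:

```python
import time
from typing import Callable, Optional


def run_when(condition: Callable[[], bool], job: Callable[[], None],
             poll_seconds: float = 60, max_checks: Optional[int] = None) -> int:
    """Check `condition` every `poll_seconds` and call `job` whenever it is true.

    Loops forever when max_checks is None (the normal scheduler case);
    max_checks only exists so the loop can be stopped, e.g. in a test.
    Returns how many times `job` was called.
    """
    runs = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if condition():
            job()
            runs += 1
        time.sleep(poll_seconds)  # wait before checking again
    return runs
```

One design caveat: a condition such as `datetime.now().hour == 6`, polled once a minute, would fire about 60 times during that hour. That is one reason a library like schedule tracks each job’s next run time instead of re-testing a raw condition.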
Save your Python script (.py) in a folder, then create a batch (.bat) file containing this command:
start C:\Users\xxxxx\xxxxx\xxxxxx\Continuum\Anaconda3\python.exe "C:\Users\xxxxxx\Desktop\Scripts\script.py"
You will have to change the paths to python.exe and to your .py script.
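If you’d rather not hard-code the python.exe path, Python can launch the script itself. Here is a minimal sketch, assuming the script should run with the same interpreter as the launcher; `launch_script` is a hypothetical name. Like `start`, it returns without waiting, but unlike `start` it does not open a separate console window:

```python
import subprocess
import sys


def launch_script(script_path: str) -> subprocess.Popen:
    """Start `script_path` with the current Python interpreter, without waiting.

    sys.executable is the interpreter running this code, so no python.exe
    path needs to be hard-coded. Call .wait() on the result to block.
    """
    return subprocess.Popen([sys.executable, script_path])
```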
Then create a second .py script, the scheduler, with the following code:
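import time
from datetime import datetime
from subprocess import call
import schedule  # third-party library: pip install schedule

def job():
    try:
        call([r"C:\path\to\your\script.bat"])  # placeholder: replace with the path to your .bat file
        print("script.py executed successfully at " + str(datetime.now()) + ".")
    except Exception as e:
        print("script.py could not be started: " + str(e))

schedule.every().day.at("06:00").do(job)  # run job() every day at 6 am
while True:
    schedule.run_pending()  # run any job whose time has come
    time.sleep(60)  # wait one minute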
This code defines a function that runs the .bat file (which in turn runs your .py script) and schedules it for every morning at 6 am. The endless loop then checks once a minute whether it is time to run.
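One caveat: `call` returns the exit code without checking it, and typically only raises an exception when the command cannot be started at all, so a script that crashes is still reported as “executed successfully”. Also, because the .bat uses `start`, the .bat returns as soon as the script is launched, so its exit code says nothing about your script. Here is a sketch of a job that runs the script directly and checks the result; `run_and_report` is a hypothetical name:

```python
import subprocess
from datetime import datetime
from typing import List


def run_and_report(cmd: List[str]) -> bool:
    """Run `cmd`, wait for it to finish, and print whether it succeeded.

    subprocess.run waits for the process and exposes its exit code,
    so a crashing script is reported as a failure rather than a success.
    """
    try:
        result = subprocess.run(cmd)
    except OSError as e:  # e.g. the executable does not exist
        print(f"Could not start {cmd}: {e}")
        return False
    ok = result.returncode == 0
    status = "succeeded" if ok else f"failed with exit code {result.returncode}"
    print(f"{cmd[-1]} {status} at {datetime.now()}.")
    return ok
```

With the schedule library, `do` passes extra arguments through to the job, so you could register it as `schedule.every().day.at("06:00").do(run_and_report, [sys.executable, script_path])`, where `script_path` is the path to your script.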
To run the scheduler, create a scheduler.bat file that starts your scheduler .py script the same way, and run it. A black console window will open: just leave it open. The script waits for the right time and then calls the specified function.
Now you know how to schedule your Python scripts!