Uptime monitoring is the practice of regularly checking that a website is reachable and alerting you when it stops responding. Python makes this easy to build yourself with the requests library and a few dozen lines of code.
In this article, you’ll build a small but reliable uptime monitor step by step. It will check a site, measure how fast it responds, retry the failures worth retrying, run on a schedule, and notify you on Slack when something breaks. At the end, we’ll look at where a hand-rolled monitor is enough and where a dedicated service makes more sense.
How to Check if a Website Is Up in Python
The simplest possible check is a single GET request, where a 200 status code means the site is up.
import requests
resp = requests.get("https://example.com")
print(resp.status_code)
On the surface this looks like enough, but as a monitor it has several problems. By default requests applies no timeout, so if the server accepts the connection but never sends a response, a bare requests.get() waits forever and one stuck request freezes the whole monitor.
It also raises an exception on a DNS failure or a refused connection, which means the script crashes on exactly the outages you wrote it to catch. And when it does get a response, it only prints the number: nothing decides whether a 404 or a 503 counts as down, or separates a genuine HTTP error response from a connection error that never reached the server at all.
Before this is useful, it needs to handle timeouts, decide which failures are worth retrying, and tell the different kinds of failure apart.
A More Reliable Uptime Check
The function below addresses each of those problems. It returns a dictionary rather than a raw status code, so the caller learns whether the site is up, what the status was, how long the response took, and what went wrong when it failed.
import time
import requests
def check_site(url, timeout=10, retries=3, backoff=2):
    """Return a dict describing whether `url` is up, its status, and latency.
    Network errors and 5xx responses are retried with exponential backoff,
    because they are often transient. A 4xx is returned immediately; a 404
    or 401 will not fix itself on a second attempt.
    """
    last_status = None
    last_error = None
    for attempt in range(1, retries + 1):
        start = time.perf_counter()
        try:
            resp = requests.get(url, timeout=timeout, allow_redirects=True)
            elapsed_ms = round((time.perf_counter() - start) * 1000)
            last_status = resp.status_code
            if resp.status_code < 400:
                return {"up": True, "status": resp.status_code, "response_ms": elapsed_ms}
            if resp.status_code < 500:
                return {"up": False, "status": resp.status_code, "response_ms": elapsed_ms}
            last_error = f"HTTP {resp.status_code}"
        except requests.RequestException as exc:
            last_error = exc.__class__.__name__
        if attempt < retries:
            time.sleep(backoff ** attempt)
    return {"up": False, "status": last_status, "response_ms": None, "error": last_error}
Each part of the function maps back to one of the problems above. The timeout argument stops a single stuck request from freezing the monitor; without it, requests waits indefinitely on a connection that never answers. (requests applies the limit to connecting and to each read separately, so it bounds a stalled server rather than the total download time.) Passing allow_redirects=True, which is already the default for requests.get, makes the check follow an http-to-https redirect and judge the page the site actually serves instead of stopping at the redirect response. Wrapping the request in except requests.RequestException catches timeouts, refused connections, and DNS errors together, so the function returns a clean result rather than raising and crashing the loop.
The retry block treats failures differently depending on whether they can recover. A 5xx response or a dropped connection is often temporary, so the function waits and tries again; with the defaults (retries=3, backoff=2) it sleeps two seconds after the first failure and four after the second, then gives up. A 4xx such as 404 or 401 is returned right away, because retrying a request the server has already rejected only delays the answer without changing it.
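To make that timing concrete, here is a minimal sketch that reproduces the sleep schedule check_site uses (backoff ** attempt, skipped after the final attempt) and a rough worst-case duration for one check. The helper names are hypothetical, not part of requests, and the worst case assumes each failed attempt gives up after about `timeout` seconds.

```python
def retry_delays(retries=3, backoff=2):
    """Sleeps check_site() performs between attempts: backoff ** attempt
    for every attempt except the last, which returns instead of sleeping."""
    return [backoff ** attempt for attempt in range(1, retries)]


def rough_worst_case_seconds(timeout=10, retries=3, backoff=2):
    """Rough duration of one check when every attempt times out.

    requests applies `timeout` to the connect and to each read separately,
    so a very slow server can still take longer than this estimate.
    """
    return retries * timeout + sum(retry_delays(retries, backoff))
```

With the defaults, a check that fails every attempt takes roughly 36 seconds (three 10-second timeouts plus 2 + 4 seconds of sleep), which fits inside a 60-second interval; raising `retries` or `timeout` can push it past that.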
You can confirm all three outcomes by running it against a healthy site, a failing endpoint, and a domain that doesn’t exist.
if __name__ == "__main__":
    targets = [
        "https://example.com",
        "https://httpbin.org/status/503",
        "https://does-not-exist-9f3a2b.example",
    ]
    for target in targets:
        print(f"{target} -> {check_site(target, retries=2)}")
Output:
https://example.com -> {'up': True, 'status': 200, 'response_ms': 78}
https://httpbin.org/status/503 -> {'up': False, 'status': 503, 'response_ms': None, 'error': 'HTTP 503'}
https://does-not-exist-9f3a2b.example -> {'up': False, 'status': None, 'response_ms': None, 'error': 'ConnectionError'}
The response_ms value is worth recording on its own, because a site that answers in four seconds is technically up but already degrading, and watching latency over time warns you before an outage rather than only after one.
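One way to act on latency without alerting on every slow response is to track a rolling median. The sketch below is an illustration, not part of the monitor above: the LatencyWatch name, the 10-sample window, and the 2000 ms threshold are hypothetical choices. It consumes the dicts check_site returns.

```python
from collections import deque
from statistics import median


class LatencyWatch:
    """Flag a site as degraded when the median of its recent response times
    exceeds slow_ms. A single slow response does not trip it."""

    def __init__(self, window=10, slow_ms=2000):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.slow_ms = slow_ms

    def record(self, result):
        """Add one check_site() result and return 'down', 'degraded' or 'ok'."""
        if not result["up"]:
            return "down"  # failed checks carry no usable latency
        self.samples.append(result["response_ms"])
        if median(self.samples) > self.slow_ms:
            return "degraded"
        return "ok"
```

Using the median rather than the latest value means one slow response is ignored, while a sustained slowdown shows up after about half the window.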
Sending Down Alerts to Slack
A check that only prints to a terminal won’t reach you when you’re asleep, so the monitor needs a way to notify you. A Slack incoming webhook is the simplest destination, since sending a message to it is a single POST request.
import requests
def send_alert(message, webhook_url, timeout=10):
    resp = requests.post(webhook_url, json={"text": message}, timeout=timeout)
    resp.raise_for_status()
    return resp.status_code
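Create an incoming webhook in your Slack workspace and pass its URL to send_alert. The raise_for_status() call turns an error response from Slack, such as the one a mistyped webhook URL gets, into an exception, so a test alert sent during setup fails loudly instead of every later alert being silently dropped.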
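Slack shows the text field exactly as sent, so interpolating a whole result dict into the message works but reads poorly on a phone. A hypothetical format_alert helper, sketched here under the assumption that it receives a failed check_site result, could produce one readable line instead:

```python
def format_alert(url, result):
    """Turn a failed check_site() result into one readable Slack line."""
    # check_site adds an "error" key for retried failures ("HTTP 503",
    # "ConnectionError", ...); a 4xx result only carries its status code.
    reason = result.get("error") or f"HTTP {result['status']}"
    return f":rotating_light: {url} is DOWN ({reason})"
```

You would then call `send_alert(format_alert(URL, result), WEBHOOK_URL)` wherever the raw dict was being sent.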
Running the Check on a Schedule
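Running the check a single time only reports the current state, so to monitor a site continuously you need to run it on an interval. The schedule library (a third-party package, installed with pip install schedule) handles recurring jobs in plain Python without relying on cron. The script below assumes the earlier code is saved next to it as uptime_check.py and alerting.py.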
import time
import requests
import schedule
from alerting import send_alert
from uptime_check import check_site
URL = "https://example.com"
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"
def job():
    result = check_site(URL)
    print(time.strftime("%H:%M:%S"), result)
    if not result["up"]:
        try:
            send_alert(f":rotating_light: {URL} is DOWN: {result}", WEBHOOK_URL)
        except requests.RequestException as exc:
            # schedule does not catch exceptions, so a failed alert (Slack down,
            # or this machine offline) would otherwise stop the loop below.
            print("alert failed:", exc)
schedule.every(60).seconds.do(job)
if __name__ == "__main__":
    job()  # run once on startup
    while True:
        schedule.run_pending()
        time.sleep(1)
This runs the check immediately, then every 60 seconds, and sends a Slack alert on every check that finds the site down, so a ten-minute outage produces roughly ten messages.
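Alerting on every failed check is the simplest policy, but it gets noisy during a long outage. A common alternative is to alert only when the site changes state. The sketch below is hypothetical (TransitionAlerter is not from any library) and accepts any `notify` callable, so it can wrap send_alert in production or append to a list in a test.

```python
class TransitionAlerter:
    """Call notify() only when a site flips between up and down, so an
    outage yields one DOWN message and one recovery message."""

    def __init__(self, url, notify):
        self.url = url
        self.notify = notify  # any callable that accepts a message string
        self.was_up = True    # assume up at startup, so an existing outage alerts once

    def handle(self, result):
        """Process one check_site() result; return the message sent, or None."""
        is_up = result["up"]
        if is_up == self.was_up:
            return None  # no state change, stay quiet
        self.was_up = is_up
        message = f"{self.url} is UP again" if is_up else f"{self.url} is DOWN"
        self.notify(message)
        return message
```

In the scheduler you would create one instance, for example `alerter = TransitionAlerter(URL, lambda msg: send_alert(msg, WEBHOOK_URL))`, and call `alerter.handle(result)` in job() in place of the `if not result["up"]` block.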
Output:
11:51:40 {'up': True, 'status': 200, 'response_ms': 36}
Point URL at your site, run the script under systemd or in a screen session, and you have a working uptime monitor that is enough for a single personal project.
Limitations of a DIY Uptime Monitor
The script above works well for one site you own, but a few limitations show up quickly once uptime actually matters.
The first is where the monitor runs. If it lives on the same server as the site it watches, the two go offline together and no alert is sent at the moment you most need one, so in practice you end up maintaining a second machine purely to host the monitor.
The second is that a single machine only ever sees the site from one location. Outages are often regional, where a site stays reachable from Europe but fails from the United States, and one script in one data center cannot detect that. You would need probes in several regions, plus logic to reconcile their disagreements.
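That reconciliation logic can start simple. As a sketch under stated assumptions (each region reports a plain up/down boolean; the region names and the majority threshold are hypothetical), a majority vote separates a site-wide outage from a regional one:

```python
def reconcile(region_up, down_threshold=0.5):
    """Combine per-region results into one verdict.

    region_up maps a region name to True (site reachable) or False.
    Returns ("up" | "regional" | "down", sorted list of failing regions).
    """
    if not region_up:
        raise ValueError("need at least one region result")
    failing = sorted(name for name, up in region_up.items() if not up)
    if not failing:
        return "up", []
    # Only report a full outage when more than down_threshold of regions agree;
    # otherwise the problem is regional (or one probe's own network is at fault).
    if len(failing) / len(region_up) > down_threshold:
        return "down", failing
    return "regional", failing
```

Even this toy version needs at least three regions to break ties meaningfully, which is part of why multi-region probing is real infrastructure rather than a single script.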
The third is the work that accumulates around the check itself: deduplicating alerts so an ongoing outage doesn't send a message every minute, tracking SSL certificate expiry, storing uptime history, and publishing a status page. Each is a separate piece to build and keep running.
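Of these, certificate expiry is the easiest to add yourself, since Python's standard ssl module can read a server's certificate. A minimal sketch, assuming the site serves HTTPS on port 443; the function names are hypothetical and the warning threshold is up to you:

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days remaining until a certificate's notAfter time.

    not_after uses the format getpeercert() returns, e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def cert_days_left(host, port=443, timeout=10):
    """Open a verified TLS connection to host and return days left on its certificate.

    An already-expired or otherwise invalid certificate makes the handshake itself
    raise ssl.SSLCertVerificationError, which is also worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"])
```

Calling `cert_days_left("example.com")` from the same scheduled job and alerting when it drops below, say, 14 days covers the basic case; the 14-day figure is only an example.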
At that point it is worth deciding whether you want to maintain monitoring infrastructure or spend that time on your actual work. If it is the latter, it helps to compare a few dedicated website monitoring tools before building a second monitor by hand.
Monitoring from Python Without the Infrastructure
The step up from the DIY script is still Python. DevHelm has a typed Python SDK, so a multi-region monitor with built-in alerting and history takes only a few lines.
Once the SDK is installed, you can write a short script to create a monitor in code:
import os
from devhelm import Devhelm
client = Devhelm(token=os.environ["DEVHELM_API_TOKEN"])
monitor = client.monitors.create(
    {
        "name": "My Site",
        "type": "HTTP",
        "config": {"url": "https://example.com", "method": "GET"},
        "frequencySeconds": 300,  # check every 5 minutes
        "regions": ["us-east", "eu-west"],
    }
)
print(f"Created monitor {monitor.id}")
print(f" name: {monitor.name}")
print(f" enabled: {monitor.enabled}")
print(f" regions: {monitor.regions}")
Output:
Created monitor 6419ea9b-0b15-4a5f-abcd-09763fa07429
name: My Site
enabled: True
regions: ['us-east', 'eu-west']
The regions list is the part a single machine cannot replicate: the same check runs from multiple locations and the service reconciles the results, so a failure seen from only one region is reported as regional instead of raising a false site-wide alarm. The monitor it creates polls on the interval you set and takes care of the alerting and history that the DIY version left for you to build.
Conclusion
Building your own uptime monitor in Python is a good way to learn how HTTP failures actually behave and why timeouts and retry policy matter. For a single personal site, the check_site function, a Slack webhook, and a schedule loop are genuinely enough.
