Practical Workflow Automation with Python

Most professional knowledge work involves repetitive tasks that could be automated: processing files, sending reports, updating spreadsheets, querying APIs, moving data between systems. Python is the most practical language for this kind of automation — its standard library handles most of what you need, and its ecosystem fills in the gaps.

When automation is worth it

Before writing any code, consider whether automation is actually worth the investment. A useful heuristic: if a task takes you 10 minutes per week, spending 2 hours automating it pays off in about a month. If it takes 10 minutes per year, the automation probably takes longer to build and maintain than to just do manually.

Good candidates for automation:

Tasks you do frequently and find tedious
Tasks with clear rules and predictable inputs
Tasks where manual execution is error-prone
Tasks that could be parallelized to save time

File processing

Python's pathlib and shutil modules handle most file operations clearly:

from pathlib import Path
import shutil

source = Path("/data/reports")
archive = Path("/data/archive")

for file in source.glob("*.csv"):
    if file.stat().st_mtime < some_cutoff:
        shutil.move(str(file), archive / file.name)

pathlib.Path is almost always preferable to string manipulation for file paths — it handles OS differences, provides clear methods for common operations, and makes code easier to read.

Working with APIs

The httpx or requests libraries handle HTTP well. A few patterns that make API automation more robust:

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_records(api_key: str, page: int) -> list[dict]:
    response = httpx.get(
        "https://api.example.com/records",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"page": page, "per_page": 100},
        timeout=30.0,
    )
    response.raise_for_status()
    return response.json()["records"]

The tenacity library handles retries with exponential backoff — essential for any production automation that depends on external APIs.

Scheduling

For simple scheduling on a single machine, schedule provides a readable API:

import schedule
import time

def run_daily_report():
    # ... your report logic

schedule.every().day.at("08:00").do(run_daily_report)
schedule.every().monday.at("09:00").do(run_weekly_summary)

while True:
    schedule.run_pending()
    time.sleep(60)

For production scheduling, cron on Linux/macOS is reliable and requires no additional dependencies. For complex DAGs and monitoring, Airflow, Prefect, or Dagster are worth considering.

Handling failures gracefully

Automation scripts that fail silently are dangerous — you may not notice that something went wrong until the damage compounds. A few practices:

Log everything useful:

import logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[
        logging.FileHandler("automation.log"),
        logging.StreamHandler(),
    ]
)

Use exceptions for control flow:

try:
    result = process_file(path)
except FileNotFoundError:
    logging.error(f"File not found: {path}")
    return
except ValueError as e:
    logging.error(f"Invalid data in {path}: {e}")
    return

Alert on failures: Send a notification (email, Slack, PagerDuty) when critical automation fails. The notification should include what failed, when, and enough context to start debugging.

Environment and configuration

Never hardcode credentials or environment-specific values in scripts. Use environment variables and read them at startup:

import os
from dotenv import load_dotenv

load_dotenv()

DATABASE_URL = os.environ["DATABASE_URL"]
API_KEY = os.environ["THIRD_PARTY_API_KEY"]
REPORT_RECIPIENTS = os.environ["REPORT_RECIPIENTS"].split(",")

The python-dotenv library reads .env files in development, while production systems typically inject environment variables directly.

Testing automation scripts

Scripts that run unattended need tests. Focus on:

Unit tests for the data transformation logic (the most likely source of bugs)
Integration tests that run against a test environment or mock APIs
A dry-run mode that shows what would happen without actually doing it

def process_file(path: Path, dry_run: bool = False) -> None:
    records = read_records(path)
    transformed = transform_records(records)
    if dry_run:
        logging.info(f"[DRY RUN] Would write {len(transformed)} records")
        return
    write_to_database(transformed)

Summary

Effective Python automation starts with identifying tasks worth automating, then handling the four main concerns: reliable file/data processing, resilient API calls with retries, robust scheduling, and visible failure handling with logging and alerts. Configuration belongs in environment variables, not code. Tests — especially for the transformation logic — catch bugs before they corrupt production data.

Practical Workflow Automation with Python

When automation is worth it

File processing

Working with APIs

Scheduling

Handling failures gracefully

Environment and configuration

Testing automation scripts

Summary

More Intelligence

n8n: Open-Source Workflow Automation Without the Lock-in

AI Agents: What They Are and How They Work

The Orbital Economy: Who Controls the Pipes in Space