Most professional knowledge work involves repetitive tasks that could be automated: processing files, sending reports, updating spreadsheets, querying APIs, moving data between systems. Python is the most practical language for this kind of automation — its standard library handles most of what you need, and its ecosystem fills in the gaps.
When automation is worth it
Before writing any code, consider whether automation is actually worth the investment. A useful heuristic: if a task takes you 10 minutes per week, spending 2 hours automating it pays off in about three months (120 minutes at 10 minutes saved per week is 12 weeks). If it takes 10 minutes per year, the automation will probably take longer to build and maintain than the task takes to do by hand.
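The break-even arithmetic behind this heuristic can be sketched in a few lines. The function name and the optional maintenance parameter are illustrative assumptions, not part of any library:

```python
def break_even_weeks(
    build_minutes: float,
    saved_minutes_per_week: float,
    upkeep_minutes_per_week: float = 0.0,
) -> float:
    """Weeks until the time saved equals the time spent building the automation."""
    net_saving = saved_minutes_per_week - upkeep_minutes_per_week
    if net_saving <= 0:
        # Maintenance costs at least as much as the task itself: never pays off.
        return float("inf")
    return build_minutes / net_saving


# 2 hours to build, 10 minutes saved each week, no maintenance
print(break_even_weeks(120, 10))  # 12.0
```

The same arithmetic for a task that takes 10 minutes per year gives a break-even point of roughly 12 years, which is why rare tasks are seldom worth automating.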
Good candidates for automation:
- Tasks you do frequently and find tedious
- Tasks with clear rules and predictable inputs
- Tasks where manual execution is error-prone
- Tasks that could be parallelized to save time
File processing
Python's pathlib and shutil modules handle most file operations clearly:
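import shutil
import time
from pathlib import Path

source = Path("/data/reports")
archive = Path("/data/archive")
# Assumed cutoff for illustration: files last modified more than 30 days ago
cutoff = time.time() - 30 * 24 * 60 * 60

for file in source.glob("*.csv"):
    # st_mtime is the last-modified time as a Unix timestamp in seconds
    if file.stat().st_mtime < cutoff:
        shutil.move(str(file), str(archive / file.name))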
pathlib.Path is almost always preferable to string manipulation for file paths — it handles OS differences, provides clear methods for common operations, and makes code easier to read.
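A short comparison may make the readability argument concrete. This sketch uses PurePosixPath only so that the printed output is identical on every operating system; in a real script you would normally use Path. The file names are made up for illustration:

```python
from pathlib import PurePosixPath

# Build a path with the / operator instead of concatenating strings
report = PurePosixPath("/data/reports") / "2024" / "sales.csv"

print(report.name)                  # sales.csv
print(report.stem)                  # sales
print(report.suffix)                # .csv
print(report.parent)                # /data/reports/2024
print(report.with_suffix(".json"))  # /data/reports/2024/sales.json

# The string-based equivalent of with_suffix needs manual splitting, and it
# misbehaves when the file has no extension but a parent directory contains a dot.
raw = "/data/reports/2024/sales.csv"
as_json = raw.rsplit(".", 1)[0] + ".json"
print(as_json)                      # /data/reports/2024/sales.json
```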
Working with APIs
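The httpx or requests libraries handle HTTP well. The following example combines three patterns that make API automation more robust: automatic retries, an explicit timeout, and raising an error on failed responses.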
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

# Up to 3 attempts, with exponentially growing waits between 2 and 10 seconds
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def fetch_records(api_key: str, page: int) -> list[dict]:
    response = httpx.get(
        "https://api.example.com/records",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"page": page, "per_page": 100},
        timeout=30.0,
    )
    response.raise_for_status()  # raise on 4xx/5xx so failures are not silent
    return response.json()["records"]
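The tenacity library handles retries with exponential backoff, which is essential for any production automation that depends on external APIs. As configured here, the decorator retries on any exception, including 4xx responses (such as an invalid API key) that will not succeed on a later attempt; tenacity's retry_if_exception_type can restrict retries to transient errors.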
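To show what retrying with exponential backoff means in practice, here is a minimal standard-library sketch. It is not tenacity's implementation and does not reproduce its exact timing; call_with_backoff, its parameters, and the flaky function are hypothetical names chosen for illustration. The sleep function is injectable so the example runs instantly:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing waits."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles after each failure (2s, 4s, 8s, ...), capped at max_delay
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
    raise RuntimeError("attempts must be at least 1")


# Usage: a call that fails twice before succeeding
calls = {"count": 0}
waits = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [2.0, 4.0]
```

In production code a maintained library such as tenacity is usually the better choice, since it also covers jitter, retry conditions, and logging hooks.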
Scheduling
For simple scheduling on a single machine, schedule provides a readable API:
import schedule
import time

def run_daily_report():
    ...  # your report logic

def run_weekly_summary():
    ...  # your weekly summary logic

schedule.every().day.at("08:00").do(run_daily_report)
schedule.every().monday.at("09:00").do(run_weekly_summary)

# Check once a minute for jobs that are due
while True:
    schedule.run_pending()
    time.sleep(60)
For production scheduling, cron on Linux/macOS is reliable and requires no additional dependencies. For complex DAGs and monitoring, Airflow, Prefect, or Dagster are worth considering.
Handling failures gracefully
Automation scripts that fail silently are dangerous — you may not notice that something went wrong until the damage compounds. A few practices:
Log everything useful:
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[
        logging.FileHandler("automation.log"),
        logging.StreamHandler(),
    ],
)
Catch specific exceptions and log them:
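# Hypothetical wrapper function, so that the early returns are valid Python
def safe_process(path):
    try:
        result = process_file(path)
    except FileNotFoundError:
        logging.error(f"File not found: {path}")
        return None
    except ValueError as e:
        logging.error(f"Invalid data in {path}: {e}")
        return None
    return result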
Alert on failures: Send a notification (email, Slack, PagerDuty) when critical automation fails. The notification should include what failed, when, and enough context to start debugging.
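One possible shape for such an alert, sketched with the standard library only. The function names, job name, and recipient address are hypothetical, and the message is built but deliberately not sent, since delivery depends on the channel your team uses:

```python
import logging
import traceback
from datetime import datetime, timezone
from email.message import EmailMessage
from typing import Callable

logger = logging.getLogger("automation")


def build_failure_alert(job_name: str, exc: BaseException, recipient: str) -> EmailMessage:
    """Build (but do not send) an email describing what failed and when."""
    message = EmailMessage()
    message["Subject"] = f"[FAILED] {job_name}: {type(exc).__name__}"
    message["To"] = recipient
    failed_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Include the traceback so the reader has enough context to start debugging
    details = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    message.set_content(f"Job: {job_name}\nFailed at (UTC): {failed_at}\n\n{details}")
    return message


def run_job(job_name: str, job: Callable[[], None], recipient: str) -> int:
    """Run a job; on failure, log the traceback, build an alert, and return 1."""
    try:
        job()
    except Exception as exc:
        logger.exception("Job %s failed", job_name)
        alert = build_failure_alert(job_name, exc, recipient)
        # Delivery is left out on purpose: hand `alert` to smtplib, a Slack
        # webhook, or a paging service, depending on your setup.
        logger.info("Alert prepared: %s", alert["Subject"])
        return 1
    return 0


def broken_job() -> None:
    raise ValueError("missing column 'amount'")


exit_code = run_job("daily_report", broken_job, "ops@example.com")
```

Returning a non-zero exit code also lets cron or an orchestrator see that the run failed, instead of relying on the log file alone.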
Environment and configuration
Never hardcode credentials or environment-specific values in scripts. Use environment variables and read them at startup:
import os
from dotenv import load_dotenv
load_dotenv()
DATABASE_URL = os.environ["DATABASE_URL"]
API_KEY = os.environ["THIRD_PARTY_API_KEY"]
REPORT_RECIPIENTS = os.environ["REPORT_RECIPIENTS"].split(",")
The python-dotenv library reads .env files in development, while production systems typically inject environment variables directly.
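Reading variables with os.environ[...] fails on the first missing name with a bare KeyError. A small startup check that reports every missing variable at once can make misconfiguration easier to diagnose. This is a sketch under assumptions: load_config and the returned dictionary keys are hypothetical, and the environment is passed in as a mapping so the function can be exercised without touching the real environment:

```python
import os
from typing import Mapping

REQUIRED_VARS = ("DATABASE_URL", "THIRD_PARTY_API_KEY", "REPORT_RECIPIENTS")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise one error naming every missing variable."""
    # Treat unset and empty variables the same way
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "database_url": env["DATABASE_URL"],
        "api_key": env["THIRD_PARTY_API_KEY"],
        # Strip whitespace and drop empty entries such as a trailing comma
        "recipients": [r.strip() for r in env["REPORT_RECIPIENTS"].split(",") if r.strip()],
    }


# Usage with an explicit mapping (example values only)
config = load_config({
    "DATABASE_URL": "postgresql://localhost/reports",
    "THIRD_PARTY_API_KEY": "example-key",
    "REPORT_RECIPIENTS": "ana@example.com, ben@example.com,",
})
print(config["recipients"])  # ['ana@example.com', 'ben@example.com']
```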
Testing automation scripts
Scripts that run unattended need tests. Focus on:
- Unit tests for the data transformation logic (the most likely source of bugs)
- Integration tests that run against a test environment or mock APIs
- A dry-run mode that shows what would happen without actually doing it
import logging
from pathlib import Path

# read_records, transform_records, and write_to_database are your own helpers
def process_file(path: Path, dry_run: bool = False) -> None:
    records = read_records(path)
    transformed = transform_records(records)
    if dry_run:
        logging.info(f"[DRY RUN] Would write {len(transformed)} records")
        return
    write_to_database(transformed)
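Unit tests for the transformation and the dry-run path might look like the sketch below. The transformation itself is a made-up example (normalizing names and converting amounts to integer cents), and process_records takes the writer as an argument so the tests can substitute a list for the database. The test functions follow pytest's naming convention but use only plain assert statements:

```python
from typing import Callable


def transform_records(records: list[dict]) -> list[dict]:
    """Hypothetical transformation: tidy names, convert amounts to integer cents."""
    return [
        {"name": r["name"].strip().title(), "amount_cents": round(float(r["amount"]) * 100)}
        for r in records
    ]


def process_records(
    records: list[dict],
    write: Callable[[list[dict]], None],
    dry_run: bool = False,
) -> int:
    """Transform records and pass them to `write` unless dry_run is set."""
    transformed = transform_records(records)
    if not dry_run:
        write(transformed)
    return len(transformed)


def test_transform_normalizes_fields():
    result = transform_records([{"name": "  ada lovelace ", "amount": "12.50"}])
    assert result == [{"name": "Ada Lovelace", "amount_cents": 1250}]


def test_dry_run_writes_nothing():
    written = []
    count = process_records([{"name": "x", "amount": "1"}], written.extend, dry_run=True)
    assert count == 1
    assert written == []


def test_real_run_writes_transformed_records():
    written = []
    process_records([{"name": "x", "amount": "1"}], written.extend)
    assert written == [{"name": "X", "amount_cents": 100}]
```

Passing the writer in as a parameter is one design choice among several; patching the database function with unittest.mock would serve the same purpose.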
Summary
Effective Python automation starts with identifying tasks worth automating, then handling the four main concerns: reliable file/data processing, resilient API calls with retries, robust scheduling, and visible failure handling with logging and alerts. Configuration belongs in environment variables, not code. Tests — especially for the transformation logic — catch bugs before they corrupt production data.