I have a problem. Every time I come back from shooting — whether it's the coast at La Paloma, the streets of Montevideo, or drone footage over the countryside — I end up with hundreds of photos across two cameras and an SD card that desperately needs formatting. For years, my "workflow" was: dump everything into a folder called NEW STUFF, promise myself I'd sort it later, and then never sort it.

Sound familiar? I finally decided to fix this the way I fix most things: with Python.

Here's how I automated my entire photo workflow — from RAW ingestion to EXIF-based organisation to batch resizing — and why you should too.


The Problem with Manual Photo Management

I shoot with a Sony A7R II and a Fujifilm X-T2. The Sony produces massive 42-megapixel ARW files. The Fujifilm gives me RAF files with gorgeous film simulations. Both cameras write to SD cards, and both use their own RAW formats that Lightroom and Capture One handle fine — but the file management side? That's where things fall apart.

My pre-automation problems:

  • Photos sorted by camera, not by date or location
  • No consistent folder structure between shoots
  • Duplicates from trying different exposure settings
  • Keywords and ratings living only in Lightroom — not in the files themselves
  • Exporting for web was a manual resize-and-save-one-at-a-time process

Each of these is solvable in Lightroom, but I didn't want to be locked into a single application's catalog system. I wanted my file organisation to work whether I was using Lightroom, Capture One, or just browsing files on my NAS.

Step 1: Extracting EXIF Data with Python

Every digital photo carries EXIF metadata — date, time, camera model, lens, exposure settings, and sometimes GPS coordinates. This data is gold for organisation, but most people never use it programmatically.

The exiftool command-line tool gives you access to everything. There's a proper Python wrapper (PyExifTool), but for a personal script I just shell out to it with subprocess:

import subprocess
import json
from pathlib import Path
from dataclasses import dataclass

@dataclass
class PhotoMetadata:
    filepath: Path
    date_taken: str
    camera: str
    lens: str
    focal_length: str
    iso: int
    aperture: str
    shutter_speed: str
    gps_lat: float | None  # None if unavailable
    gps_lon: float | None  # None if unavailable

def extract_metadata(photo_path: Path) -> PhotoMetadata:
    """Extract EXIF metadata from a photo using exiftool."""
    # -n makes exiftool emit numeric values (signed decimal degrees for GPS
    # instead of formatted strings); check=True raises if exiftool fails
    result = subprocess.run(
        ["exiftool", "-json", "-n", str(photo_path)],
        capture_output=True, text=True, check=True
    )
    data = json.loads(result.stdout)[0]

    return PhotoMetadata(
        filepath=photo_path,
        date_taken=data.get("DateTimeOriginal", "unknown"),
        camera=data.get("Model", "unknown"),
        lens=data.get("LensModel", "unknown"),
        focal_length=str(data.get("FocalLength", "unknown")),
        iso=int(data.get("ISO", 0)),
        aperture=str(data.get("FNumber", "unknown")),
        shutter_speed=str(data.get("ExposureTime", "unknown")),
        gps_lat=data.get("GPSLatitude"),   # None when the tag is absent
        gps_lon=data.get("GPSLongitude"),
    )

# Quick test
meta = extract_metadata(Path("DSC00456.ARW"))
print(f"{meta.camera} | {meta.lens} | {meta.date_taken} | ISO {meta.iso}")
# Output: ILCE-7RM2 | FE 35mm f/1.8 | 2025:03:15 16:42:33 | ISO 200

One function and I have every piece of metadata I need to organise, filter, and catalog my photos. No Lightroom catalog required.

Step 2: Automatic Folder Organisation

Here's where it gets satisfying. Instead of dumping photos into NEW STUFF, I now organise them into a proper structure based on the date and camera:

from datetime import datetime
from pathlib import Path
import shutil

def organise_photos(source_dir: Path, dest_root: Path, dry_run: bool = True):
    """Organise photos into YEAR/MONTH/DAY/CAMERA structure."""
    extensions = {".arw", ".raf", ".cr2", ".nef", ".dng", ".jpg", ".jpeg", ".png"}
    moved = 0

    for photo in source_dir.rglob("*"):
        if photo.suffix.lower() not in extensions:
            continue

        meta = extract_metadata(photo)

        # Parse the date from EXIF (format: "2025:03:15 16:42:33")
        try:
            dt = datetime.strptime(meta.date_taken, "%Y:%m:%d %H:%M:%S")
        except ValueError:
            # Fall back to file modification time
            dt = datetime.fromtimestamp(photo.stat().st_mtime)

        # Build destination path: YEAR/MONTH/DAY/CAMERA
        dest_dir = dest_root / f"{dt.year}" / f"{dt.month:02d}-{dt.strftime('%B')}" / f"{dt.day:02d}" / meta.camera
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest_path = dest_dir / photo.name

        if dry_run:
            print(f"[DRY RUN] {photo.name} -> {dest_path}")
        else:
            shutil.move(str(photo), str(dest_path))
            print(f"Moved {photo.name} -> {dest_path}")

        moved += 1

    print(f"\n{'Would move' if dry_run else 'Moved'} {moved} photos")
    return moved

# Always dry-run first!
organise_photos(Path("/mnt/sd_card/DCIM"), Path("/photos/organised"), dry_run=True)

The folder structure ends up looking like this:

/photos/organised/
  2025/
    03-March/
      15/
        ILCE-7RM2/     # Sony A7R II
          DSC00456.ARW
          DSC00457.ARW
        X-T2/           # Fujifilm X-T2
          _DSC1234.RAF
          _DSC1235.RAF
      16/
        ILCE-7RM2/
          DSC00512.ARW

Every photo in the right place, automatically. And because it's based on EXIF data, it doesn't matter if the file modification date changed when I copied it off the SD card.
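
One thing organise_photos glosses over: two files can end up with the same name in the same destination folder (wrapped burst counters, or copies from two cards). A collision-safe variant of the destination step is easy to bolt on — unique_dest below is a hypothetical helper, not part of my actual script:

```python
from pathlib import Path

def unique_dest(dest_dir: Path, filename: str) -> Path:
    """Return a destination path that won't clobber an existing file,
    appending -1, -2, ... before the extension on collision."""
    candidate = dest_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = dest_dir / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

Swap dest_path = dest_dir / photo.name for dest_path = unique_dest(dest_dir, photo.name) and re-runs stop silently overwriting.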


Step 3: Finding and Removing Duplicates

Anyone who shoots bursts knows the pain: you take 15 shots of the same scene, keep one, and the other 14 sit there forever eating disk space. With 42-megapixel ARW files, that's 80 MB per photo. A single burst can be over a gigabyte of near-identical files.

Here's a deduplication approach that uses perceptual hashing — it finds photos that look similar, not just files with identical bytes:

from PIL import Image
import imagehash

def find_near_duplicates(photo_dir: Path, hash_size: int = 16, threshold: int = 5):
    """Find groups of visually similar photos using perceptual hashing."""
    entries = []  # (perceptual hash, path) pairs

    for photo in photo_dir.rglob("*"):
        if photo.suffix.lower() not in {".jpg", ".jpeg", ".png", ".tiff"}:
            continue

        try:
            with Image.open(photo) as img:
                entries.append((imagehash.phash(img, hash_size=hash_size), photo))
        except Exception:
            continue

    # Greedy clustering: each photo joins the first group whose representative
    # hash is within the Hamming-distance threshold, otherwise starts a new
    # group. This keeps groups disjoint (no photo appears in two groups).
    clusters = []  # (representative_hash, [paths])
    for h, path in entries:
        for rep, paths in clusters:
            if h - rep <= threshold:  # imagehash defines "-" as Hamming distance
                paths.append(path)
                break
        else:
            clusters.append((h, [path]))

    return [paths for _, paths in clusters if len(paths) > 1]

groups = find_near_duplicates(Path("/photos/organised/2025"))
print(f"Found {len(groups)} groups of similar photos")
for group in groups[:5]:  # Preview first 5
    print(f"  Group: {', '.join(p.name for p in group)}")

This won't delete anything automatically — it gives you the groups so you can decide which to keep. For RAW files, I use the embedded JPEG thumbnail for hashing, which is fast enough to process thousands of photos in seconds.
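
If you're wondering what a perceptual hash actually does under the hood, here's a toy average-hash in pure Python — illustrative only, since imagehash's phash uses a DCT and is far more robust. The point is that near-identical frames land a small Hamming distance apart, while different scenes don't:

```python
def average_hash(pixels):
    """Toy average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

frame_a = [[10, 200], [12, 198]]   # 2x2 grayscale "frame"
frame_b = [[11, 201], [12, 197]]   # nearly identical frame (next shot in a burst)
frame_c = [[200, 10], [198, 12]]   # a very different frame

print(hamming(average_hash(frame_a), average_hash(frame_b)))  # 0
print(hamming(average_hash(frame_a), average_hash(frame_c)))  # 4
```

Small exposure or noise differences don't flip which pixels sit above the mean, which is exactly why a burst of 15 shots hashes to the same neighbourhood.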

Step 4: Batch Export for the Web

I publish photos on this blog and on social media. Both need resized, compressed JPEGs — not 42-megapixel RAW files. Here's the batch export pipeline:

from PIL import Image, ImageOps
from pathlib import Path

def batch_export(
    source_dir: Path,
    output_dir: Path,
    max_dimension: int = 2048,
    quality: int = 85,
):
    """Export photos for web with resizing and optional watermark."""
    output_dir.mkdir(parents=True, exist_ok=True)
    exported = []

    for photo in source_dir.rglob("*"):
        if photo.suffix.lower() not in {".jpg", ".jpeg", ".png", ".tiff"}:
            continue

        try:
            img = Image.open(photo)

            # Auto-rotate based on EXIF
            img = ImageOps.exif_transpose(img)

            # JPEG has no alpha channel, so normalise palette/RGBA images to RGB
            if img.mode != "RGB":
                img = img.convert("RGB")

            # Resize maintaining aspect ratio
            img.thumbnail((max_dimension, max_dimension), Image.Resampling.LANCZOS)

            # Build output path
            output_path = output_dir / f"{photo.stem}_web.jpg"
            img.save(output_path, "JPEG", quality=quality, optimize=True)
            exported.append(output_path)
            print(f"Exported: {output_path.name} ({img.size[0]}x{img.size[1]})")

        except Exception as e:
            print(f"Skipped {photo.name}: {e}")

    return exported

# Export all March 2025 photos for web
batch_export(
    Path("/photos/organised/2025/03-March"),
    Path("/photos/exported/web/2025/03"),
    max_dimension=2048,
    quality=85
)

The ImageOps.exif_transpose call is essential — it rotates the pixels according to the EXIF orientation tag. Without it, portrait shots render sideways in any viewer or pipeline that ignores the tag, and since Pillow doesn't write EXIF data on save by default, the tag wouldn't survive the export anyway.
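
The resize math that thumbnail applies is worth seeing once: scale so the longer edge is capped at max_dimension, never upscale. A sketch of that calculation — fit_within is a hypothetical helper just for illustration, Pillow does this for you (and its own rounding can differ by a pixel):

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) so the longer edge is at most max_dim,
    preserving aspect ratio and never upscaling."""
    scale = min(max_dim / width, max_dim / height, 1.0)
    return round(width * scale), round(height * scale)

print(fit_within(7952, 5304, 2048))   # A7R II full-res landscape -> (2048, 1366)
print(fit_within(1200, 800, 2048))    # already small -> (1200, 800), untouched
```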

Step 5: Adding Location Data Without Built-in GPS

Neither the Sony A7R II nor the Fujifilm X-T2 has built-in GPS. The Sony can tag photos with your phone's location via the PlayMemories Smart Remote app, but it only embeds the coordinates in the copy sent to your phone — not the file on the camera's SD card. The Fujifilm Camera Remote app does a one-shot location transfer, but you'd need to reconnect every time you move. Neither is practical for a day of shooting.

My solution: I run a GPS track log on my phone using GeoTag Photos Pro, then match the timestamps in post-processing. Here's how to apply the GPS data with Python:

import gpxpy
import httpx
from datetime import datetime, timezone

def load_gps_track(gpx_path: str) -> list:
    """Load GPS track points from a GPX file."""
    with open(gpx_path) as f:
        gpx = gpxpy.parse(f)
    points = []
    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                points.append({
                    "time": point.time.replace(tzinfo=timezone.utc),
                    "lat": point.latitude,
                    "lon": point.longitude,
                })
    return sorted(points, key=lambda p: p["time"])

def find_nearest_location(photo_time: datetime, gps_points: list, max_offset_sec: int = 300):
    """Find the closest GPS point to a photo's timestamp."""
    closest = None
    min_diff = float("inf")
    for pt in gps_points:
        diff = abs((photo_time - pt["time"]).total_seconds())
        if diff < min_diff and diff <= max_offset_sec:
            min_diff = diff
            closest = pt
    return closest

def reverse_geocode(lat: float, lon: float) -> dict:
    """Get location name from GPS coordinates using Nominatim."""
    response = httpx.get(
        "https://nominatim.openstreetmap.org/reverse",
        params={"lat": lat, "lon": lon, "format": "json", "zoom": 10},
        headers={"User-Agent": "photo-organiser/1.0"}
    )
    data = response.json()
    address = data.get("address", {})
    return {
        "city": address.get("city", address.get("town", "")),
        "country": address.get("country", ""),
    }

# Load track and match to photos
gps_points = load_gps_track("/tracks/2025-03-15.gpx")
loc = reverse_geocode(gps_points[0]["lat"], gps_points[0]["lon"])
print(f"Shoot location: {loc['city']}, {loc['country']}")  # Montevideo, Uruguay
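
Writing the coordinates back into the files happens via exiftool again. I find it easiest to build the command in Python; the GPS tag names below are exiftool's real write tags, though the helper itself is just a sketch:

```python
import subprocess  # used by the commented-out run() call below

def geotag_command(photo_path: str, lat: float, lon: float) -> list[str]:
    """Build the exiftool invocation that writes GPS tags into a photo in place."""
    return [
        "exiftool",
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        "-overwrite_original",
        photo_path,
    ]

# Montevideo is south and west, so both refs come out S/W:
cmd = geotag_command("DSC00456.ARW", -34.90, -56.16)
# subprocess.run(cmd, check=True)  # uncomment to actually write the tags
```

EXIF stores latitude and longitude as unsigned values plus a hemisphere reference, which is why the sign is split out into the Ref tags.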

The key is keeping your camera's clock synchronised with your phone's clock — I check this before every shoot. The max_offset_sec parameter lets you control how much time drift to tolerate; 5 minutes works well for walking around a city.
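
One practical note on find_nearest_location: it scans the whole track for every photo. Since load_gps_track returns points sorted by time, a bisect version finds the neighbour in O(log n) — a sketch assuming the same point-dict shape as above, with the time index built once per track:

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

def find_nearest_fast(photo_time, times, gps_points, max_offset_sec=300):
    """Binary-search a sorted track; `times` is precomputed once per track."""
    i = bisect_left(times, photo_time)
    best = None
    # Only the two points straddling the insertion index can be closest
    for j in (i - 1, i):
        if 0 <= j < len(gps_points):
            diff = abs((photo_time - times[j]).total_seconds())
            if diff <= max_offset_sec and (best is None or diff < best[0]):
                best = (diff, gps_points[j])
    return best[1] if best else None

# Synthetic one-point-per-minute track for demonstration
t0 = datetime(2025, 3, 15, 16, 0, tzinfo=timezone.utc)
points = [{"time": t0 + timedelta(seconds=60 * k), "lat": -34.9, "lon": -56.16}
          for k in range(10)]
times = [p["time"] for p in points]
match = find_nearest_fast(t0 + timedelta(seconds=70), times, points)
print(match["time"])  # the t0 + 60s point, 10 seconds away
```

For a few hundred photos against a day's track the linear scan is fine too; this matters once you batch-process a whole year.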

Putting It All Together: The Full Pipeline

Here's the complete pipeline I run every time I come back from a shoot:

#!/bin/bash
# photo-pipeline.sh -- run after every shoot
set -e

SOURCE=/mnt/sd_card/DCIM
DEST=/photos/organised
EXPORT=/photos/exported
GPX=/tracks
DATE=$(date +%Y-%m-%d)

echo "[$DATE] Starting photo pipeline..."

# 1. Extract metadata and organise by date/camera
python3 organise_photos.py --source "$SOURCE" --dest "$DEST"

# 2. Find and report near-duplicates
python3 find_duplicates.py --dir "$DEST" --report "/tmp/duplicates-$DATE.json"

# 3. Apply GPS track log to photos
python3 geotag_photos.py --dir "$DEST" --gpx "$GPX/$DATE.gpx"

# 4. Export web versions
python3 batch_export.py --source "$DEST" --output "$EXPORT" --max-dim 2048 --quality 85

# 5. Sync exports to cloud storage
rclone sync "$EXPORT/web" cloud:photos/web --progress

echo "[$DATE] Pipeline complete!"

From SD card to organised, geotagged, deduplicated, web-ready, and backed up — in about 5 minutes for a typical shoot. It used to take me an entire evening.

The Bigger Picture

I've been shooting photos and writing Python for years, but it took me too long to put the two together. The same skills I use for data science — extracting, transforming, and loading data — apply perfectly to photo management. EXIF data is just another dataset. File organisation is just another ETL pipeline. Batch processing is batch processing, whether you're resizing images or transforming pandas DataFrames.

The scripts above aren't production-grade software — they're the kind of quick, practical tools I've always written for myself and shared on this blog. They work for my workflow with my cameras. But the approach is universal: extract the metadata, automate the boring parts, and keep your creative energy for the things that actually matter — like deciding which of those 15 burst shots is the keeper.

If you're sitting on a hard drive full of unorganised photos, start with Step 1 — just the EXIF extraction and folder organisation. That alone will change how you interact with your photo library. The rest is gravy.