I have a problem. Every time I come back from shooting — whether it's the coast at La Paloma, the streets of Montevideo, or drone footage over the countryside — I end up with hundreds of photos across two cameras and an SD card that desperately needs formatting. For years, my "workflow" was: dump everything into a folder called NEW STUFF, promise myself I'd sort it later, and then never sort it.
Sound familiar? I finally decided to fix this the way I fix most things: with Python.
Here's how I automated my entire photo workflow — from RAW ingestion to EXIF-based organisation to batch resizing — and why you should too.
The Problem with Manual Photo Management
I shoot with a Sony A7R II and a Fujifilm X-T2. The Sony produces massive 42-megapixel ARW files. The Fujifilm gives me RAF files with gorgeous film simulations. Both cameras write to SD cards, and both use their own RAW formats that Lightroom and Capture One handle fine — but the file management side? That's where things fall apart.
My pre-automation problems:
- Photos sorted by camera, not by date or location
- No consistent folder structure between shoots
- Duplicates from trying different exposure settings
- Keywords and ratings living only in Lightroom — not in the files themselves
- Exporting for web was a manual resize-and-save-one-at-a-time process
Each of these is solvable in Lightroom, but I didn't want to be locked into a single application's catalog system. I wanted my file organisation to work whether I was using Lightroom, Capture One, or just browsing files on my NAS.
Step 1: Extracting EXIF Data with Python
Every digital photo carries EXIF metadata — date, time, camera model, lens, exposure settings, and sometimes GPS coordinates. This data is gold for organisation, but most people never use it programmatically.
The exiftool command-line tool gives you access to everything. Here I just shell out to it with subprocess and ask for JSON output (there's also the PyExifTool wrapper if you prefer a library interface):
import subprocess
import json
from pathlib import Path
from dataclasses import dataclass

@dataclass
class PhotoMetadata:
    filepath: Path
    date_taken: str
    camera: str
    lens: str
    focal_length: str
    iso: int
    aperture: str
    shutter_speed: str
    gps_lat: float | None  # None if unavailable
    gps_lon: float | None  # None if unavailable

def extract_metadata(photo_path: Path) -> PhotoMetadata:
    """Extract EXIF metadata from a photo using exiftool."""
    # -c "%+.6f" prints GPS coordinates as signed decimal degrees
    result = subprocess.run(
        ["exiftool", "-json", "-c", "%+.6f", str(photo_path)],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)[0]
    lat = data.get("GPSLatitude")
    lon = data.get("GPSLongitude")
    return PhotoMetadata(
        filepath=photo_path,
        date_taken=data.get("DateTimeOriginal", "unknown"),
        camera=data.get("Model", "unknown"),
        lens=data.get("LensModel", "unknown"),
        focal_length=data.get("FocalLength", "unknown"),
        iso=data.get("ISO", 0),
        aperture=data.get("FNumber", "unknown"),
        shutter_speed=data.get("ExposureTime", "unknown"),
        gps_lat=float(lat) if lat is not None else None,
        gps_lon=float(lon) if lon is not None else None,
    )
# Quick test
meta = extract_metadata(Path("DSC00456.ARW"))
print(f"{meta.camera} | {meta.lens} | {meta.date_taken} | ISO {meta.iso}")
# Output: ILCE-7RM2 | FE 35mm f/1.8 | 2025:03:15 16:42:33 | ISO 200
One function and I have every piece of metadata I need to organise, filter, and catalog my photos. No Lightroom catalog required.
Step 2: Automatic Folder Organisation
Here's where it gets satisfying. Instead of dumping photos into NEW STUFF, I now organise them into a proper structure based on the date and camera:
from datetime import datetime
from pathlib import Path
import shutil

def organise_photos(source_dir: Path, dest_root: Path, dry_run: bool = True):
    """Organise photos into YEAR/MONTH/DAY/CAMERA structure."""
    extensions = {".arw", ".raf", ".cr2", ".nef", ".dng", ".jpg", ".jpeg", ".png"}
    moved = 0
    for photo in source_dir.rglob("*"):
        if photo.suffix.lower() not in extensions:
            continue
        meta = extract_metadata(photo)
        # Parse the date from EXIF (format: "2025:03:15 16:42:33")
        try:
            dt = datetime.strptime(meta.date_taken, "%Y:%m:%d %H:%M:%S")
        except ValueError:
            # Fall back to file modification time
            dt = datetime.fromtimestamp(photo.stat().st_mtime)
        # Build destination path: YEAR/MONTH/DAY/CAMERA
        dest_dir = dest_root / f"{dt.year}" / f"{dt.month:02d}-{dt.strftime('%B')}" / f"{dt.day:02d}" / meta.camera
        dest_path = dest_dir / photo.name
        if dry_run:
            print(f"[DRY RUN] {photo.name} -> {dest_path}")
        else:
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(photo), str(dest_path))
            print(f"Moved {photo.name} -> {dest_path}")
        moved += 1
    print(f"\n{'Would move' if dry_run else 'Moved'} {moved} photos")
    return moved

# Always dry-run first!
organise_photos(Path("/mnt/sd_card/DCIM"), Path("/photos/organised"), dry_run=True)
The folder structure ends up looking like this:
/photos/organised/
    2025/
        03-March/
            15/
                ILCE-7RM2/          # Sony A7R II
                    DSC00456.ARW
                    DSC00457.ARW
                X-T2/               # Fujifilm X-T2
                    _DSC1234.RAF
                    _DSC1235.RAF
            16/
                ILCE-7RM2/
                    DSC00512.ARW
Every photo in the right place, automatically. And because it's based on EXIF data, it doesn't matter if the file modification date changed when I copied it off the SD card.
Step 3: Finding and Removing Duplicates
Anyone who shoots bursts knows the pain: you take 15 shots of the same scene, keep one, and the other 14 sit there forever eating disk space. With uncompressed 42-megapixel ARW files at roughly 80 MB each, a single burst can be over a gigabyte of near-identical files.
Here's a deduplication approach that uses perceptual hashing — it finds photos that look similar, not just files with identical bytes:
from PIL import Image
import imagehash
from collections import defaultdict

def find_near_duplicates(photo_dir: Path, hash_size: int = 16, threshold: int = 5):
    """Find groups of visually similar photos using perceptual hashing."""
    hashes = defaultdict(list)
    for photo in photo_dir.rglob("*"):
        if photo.suffix.lower() not in {".jpg", ".jpeg", ".png", ".tiff"}:
            continue
        try:
            with Image.open(photo) as img:
                h = str(imagehash.phash(img, hash_size=hash_size))
            hashes[h].append(photo)
        except Exception:
            continue  # Skip unreadable or corrupt files
    # Group photos with identical hashes
    duplicate_groups = [paths for paths in hashes.values() if len(paths) > 1]
    # Also check between different hashes for near-matches
    hash_list = list(hashes.keys())
    for i, h1 in enumerate(hash_list):
        for h2 in hash_list[i + 1:]:
            diff = imagehash.hex_to_hash(h1) - imagehash.hex_to_hash(h2)
            if diff <= threshold:
                combined = hashes[h1] + hashes[h2]
                if combined not in duplicate_groups:
                    duplicate_groups.append(combined)
    return duplicate_groups

groups = find_near_duplicates(Path("/photos/organised/2025"))
print(f"Found {len(groups)} groups of similar photos")
for group in groups[:5]:  # Preview first 5
    print(f"  Group: {', '.join(p.name for p in group)}")
This won't delete anything automatically — it gives you the groups so you can decide which to keep. For RAW files, I use the embedded JPEG thumbnail for hashing, which is fast enough to process thousands of photos in seconds.
Step 4: Batch Export for the Web
I publish photos on this blog and on social media. Both need resized, compressed JPEGs — not 42-megapixel RAW files. Here's the batch export pipeline:
from PIL import Image, ImageOps
from pathlib import Path

def batch_export(
    source_dir: Path,
    output_dir: Path,
    max_dimension: int = 2048,
    quality: int = 85,
):
    """Export photos for web with auto-rotation and resizing."""
    output_dir.mkdir(parents=True, exist_ok=True)
    exported = []
    for photo in source_dir.rglob("*"):
        if photo.suffix.lower() not in {".jpg", ".jpeg", ".png", ".tiff"}:
            continue
        try:
            img = Image.open(photo)
            # Auto-rotate based on EXIF orientation
            img = ImageOps.exif_transpose(img)
            # JPEG can't store alpha, so flatten RGBA/palette images first
            if img.mode != "RGB":
                img = img.convert("RGB")
            # Resize in place, maintaining aspect ratio
            img.thumbnail((max_dimension, max_dimension), Image.Resampling.LANCZOS)
            # Build output path
            output_path = output_dir / f"{photo.stem}_web.jpg"
            img.save(output_path, "JPEG", quality=quality, optimize=True)
            exported.append(output_path)
            print(f"Exported: {output_path.name} ({img.size[0]}x{img.size[1]})")
        except Exception as e:
            print(f"Skipped {photo.name}: {e}")
    return exported

# Export all March 2025 photos for web
batch_export(
    Path("/photos/organised/2025/03-March"),
    Path("/photos/exported/web/2025/03"),
    max_dimension=2048,
    quality=85,
)
The ImageOps.exif_transpose call is essential: it bakes the EXIF orientation tag into the pixel data, so portrait shots display upright everywhere. Modern browsers do respect the orientation tag, but plenty of older viewers, image pipelines, and CDN resizers still don't, and since the save call here drops the EXIF block entirely, an export without the rotation baked in would render sideways.
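If you're curious what thumbnail does with the dimensions, the maths is just a shared scale factor capped at 1.0 so small images never get upscaled. A standalone sketch (my own helper, with its own rounding, so it may differ from Pillow's result by a pixel):

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside a max_dim x max_dim box,
    preserving aspect ratio and never upscaling."""
    # One scale factor for both axes keeps the aspect ratio intact
    scale = min(max_dim / width, max_dim / height, 1.0)
    return (round(width * scale), round(height * scale))
```

For example, a 4000x3000 landscape frame fits a 2000-pixel box at 2000x1500, while an 800x600 image passes through untouched.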
Step 5: Adding Location Data Without Built-in GPS
Neither the Sony A7R II nor the Fujifilm X-T2 has built-in GPS. The Sony can tag photos with your phone's location via the PlayMemories Smart Remote app, but it only embeds the coordinates in the copy sent to your phone — not the file on the camera's SD card. The Fujifilm Camera Remote app does a one-shot location transfer, but you'd need to reconnect every time you move. Neither is practical for a day of shooting.
My solution: I run a GPS track log on my phone using GeoTag Photos Pro, then match the timestamps in post-processing. Here's how to apply the GPS data with Python:
import gpxpy
import httpx
from datetime import datetime, timezone

def load_gps_track(gpx_path: str) -> list:
    """Load GPS track points from a GPX file, sorted by time."""
    with open(gpx_path) as f:
        gpx = gpxpy.parse(f)
    points = []
    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                if point.time is None:
                    continue  # Some loggers emit points without timestamps
                points.append({
                    "time": point.time.replace(tzinfo=timezone.utc),
                    "lat": point.latitude,
                    "lon": point.longitude,
                })
    return sorted(points, key=lambda p: p["time"])

def find_nearest_location(photo_time: datetime, gps_points: list, max_offset_sec: int = 300):
    """Find the closest GPS point to a photo's timestamp."""
    closest = None
    min_diff = float("inf")
    for pt in gps_points:
        diff = abs((photo_time - pt["time"]).total_seconds())
        if diff < min_diff and diff <= max_offset_sec:
            min_diff = diff
            closest = pt
    return closest

def reverse_geocode(lat: float, lon: float) -> dict:
    """Get location name from GPS coordinates using Nominatim."""
    # Nominatim's usage policy asks for a descriptive User-Agent
    # and at most one request per second
    response = httpx.get(
        "https://nominatim.openstreetmap.org/reverse",
        params={"lat": lat, "lon": lon, "format": "json", "zoom": 10},
        headers={"User-Agent": "photo-organiser/1.0"},
    )
    data = response.json()
    address = data.get("address", {})
    return {
        "city": address.get("city", address.get("town", "")),
        "country": address.get("country", ""),
    }
# Load track and match to photos
gps_points = load_gps_track("/tracks/2025-03-15.gpx")
loc = reverse_geocode(gps_points[0]["lat"], gps_points[0]["lon"])
print(f"Shoot location: {loc['city']}, {loc['country']}") # Montevideo, Uruguay
The key is keeping your camera's clock synchronised with your phone's clock — I check this before every shoot. The max_offset_sec parameter lets you control how much time drift to tolerate; 5 minutes works well for walking around a city.
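The linear scan in find_nearest_location is fine for a single photo, but matching hundreds of photos against a day-long track re-walks the whole list every time. Because the track is already sorted by time, a binary search needs to inspect only the two neighbours of the insertion point. A sketch using plain epoch-second timestamps and (time, lat, lon) tuples (my simplification of the dict format above):

```python
import bisect

def nearest_point(photo_ts: float, track: list[tuple[float, float, float]],
                  max_offset_sec: float = 300.0):
    """Binary-search a time-sorted track of (timestamp, lat, lon) tuples
    for the point closest to photo_ts, within max_offset_sec."""
    times = [p[0] for p in track]
    i = bisect.bisect_left(times, photo_ts)
    # Only the points straddling the insertion index can be closest
    candidates = [track[j] for j in (i - 1, i) if 0 <= j < len(track)]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(p[0] - photo_ts))
    return best if abs(best[0] - photo_ts) <= max_offset_sec else None
```

In a real run you'd build the times list once per track rather than per photo, which turns the whole matching pass into O(n log n).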
Putting It All Together: The Full Pipeline
Here's the complete pipeline I run every time I come back from a shoot:
#!/bin/bash
# photo-pipeline.sh -- run after every shoot
set -e
SOURCE=/mnt/sd_card/DCIM
DEST=/photos/organised
EXPORT=/photos/exported
GPX=/tracks
DATE=$(date +%Y-%m-%d)
echo "[$DATE] Starting photo pipeline..."
# 1. Extract metadata and organise by date/camera
python3 organise_photos.py --source $SOURCE --dest $DEST
# 2. Find and report near-duplicates
python3 find_duplicates.py --dir $DEST --report /tmp/duplicates-$DATE.json
# 3. Apply GPS track log to photos
python3 geotag_photos.py --dir $DEST --gpx $GPX/$DATE.gpx
# 4. Export web versions
python3 batch_export.py --source $DEST --output $EXPORT --max-dim 2048 --quality 85
# 5. Sync exports to cloud storage
rclone sync $EXPORT/web cloud:photos/web --progress
echo "[$DATE] Pipeline complete!"
From SD card to organised, geotagged, deduplicated, web-ready, and backed up — in about 5 minutes for a typical shoot. It used to take me an entire evening.
The Bigger Picture
I've been shooting photos and writing Python for years, but it took me too long to put the two together. The same skills I use for data science — extracting, transforming, and loading data — apply perfectly to photo management. EXIF data is just another dataset. File organisation is just another ETL pipeline. Batch processing is batch processing, whether you're resizing images or transforming pandas DataFrames.
The scripts above aren't production-grade software — they're the kind of quick, practical tools I've always written for myself and shared on this blog. They work for my workflow with my cameras. But the approach is universal: extract the metadata, automate the boring parts, and keep your creative energy for the things that actually matter — like deciding which of those 15 burst shots is the keeper.
If you're sitting on a hard drive full of unorganised photos, start with Step 1 — just the EXIF extraction and folder organisation. That alone will change how you interact with your photo library. The rest is gravy.