Engineering

A technical deep-dive into how the Observatory is built, how data flows through it, and the decisions behind the architecture.

Architecture overview

The Observatory is a Python/Flask application running inside Docker on a single VPS. The pipeline is a set of daily-scheduled crawlers that write into a normalised SQLite database; the web layer reads from that same database and renders pages server-side. There is no message queue, no microservices, and no external data store: simplicity is a feature.
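The single-writer, read-only-web split described above can be sketched as follows. This is a minimal illustration, not the Observatory's actual code: the function names and the observations table used in the usage note are hypothetical. It relies only on the standard-library sqlite3 module and SQLite's mode=ro URI parameter.

```python
import sqlite3
from pathlib import Path


def open_writer(db_path: Path) -> sqlite3.Connection:
    """Pipeline side: the one connection that is expected to write."""
    return sqlite3.connect(db_path)


def open_reader(db_path: Path) -> sqlite3.Connection:
    """Web side: open the same file read-only through an SQLite URI.

    Any attempt to write through this connection raises
    sqlite3.OperationalError instead of modifying the database.
    """
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With this arrangement, a request handler calling open_reader(Path("observatory.db")) could run SELECT queries against a hypothetical observations table but could not accidentally alter pipeline data.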

Tech stack

Layer | Technology | Why
Web framework | Flask + Jinja2 | Minimal, well-understood, easy to deploy as a single process.
Database | SQLite | Read-heavy workload with a single writer (the pipeline). SQLite's file-based model makes backups trivial.
Frontend | Tabler UI + HTMX + Chart.js | Server-rendered HTML with islands of interactivity via HTMX. No build step.
Crawlers | Python (requests + BeautifulSoup) | Each source has a dedicated crawler module. eBay uses the official Browse API.
Scheduling | cron (inside container) | Simple, no external dependencies.
Auth | Flask-Login + bcrypt | Session-based auth with hashed passwords. No OAuth dependencies.
Deployment | Docker + Nginx reverse proxy | Single-compose stack; Nginx handles SSL termination and static files.
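The SQLite row above notes that the file-based model makes backups trivial. As one possible illustration, and not necessarily how the Observatory does it, the sketch below takes a snapshot with the standard-library Connection.backup method, which produces a consistent copy even if the pipeline happens to be writing at the time. The function name and file paths are hypothetical.

```python
import sqlite3


def snapshot(src_path: str, dest_path: str) -> None:
    """Copy the whole database at src_path into dest_path.

    Uses SQLite's online backup API via sqlite3.Connection.backup,
    so the copy is consistent even while another connection writes.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

A daily cron entry could call snapshot("observatory.db", "backup/observatory.db") after the crawl finishes; a plain file copy is equally valid when no writer is active.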

Data pipeline

Data flows through four stages:

1. Crawl

Each source-specific crawler fetches raw listing HTML or JSON and writes it to raw_html.

2. Extract

Parser modules extract structured fields (title, price, currency, condition, URL, image) from raw HTML.

3. Normalise

Prices are converted to EUR, condition codes are unified, and titles are translated to English via AI for cross-source matching.

4. Publish

Normalised observations are written to staging_observations. Each observation's status is set to active once all checks pass.
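The Normalise and Publish stages above can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the exchange rates, the condition mapping, the "staged" placeholder status, and the two checks are hypothetical, and the AI title translation step is left out.

```python
from dataclasses import dataclass, replace
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates: EUR per one unit of the source currency.
EUR_PER_UNIT = {
    "EUR": Decimal("1"),
    "USD": Decimal("0.90"),
    "GBP": Decimal("1.15"),
}

# Hypothetical mapping from source-specific labels to unified codes.
CONDITION_MAP = {
    "new": "new",
    "brand new": "new",
    "used": "used",
    "usado": "used",
    "for parts": "parts",
}


@dataclass(frozen=True)
class Observation:
    title: str
    price_eur: Decimal
    condition: str
    status: str = "staged"  # hypothetical pre-publication status


def normalise(title: str, price: str, currency: str, condition: str) -> Observation:
    """Convert the price to EUR and unify the condition code."""
    rate = EUR_PER_UNIT[currency.upper()]
    price_eur = (Decimal(price) * rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    unified = CONDITION_MAP.get(condition.strip().lower(), "unknown")
    return Observation(title=title.strip(), price_eur=price_eur, condition=unified)


def publish(obs: Observation) -> Observation:
    """Mark the observation active only if the (hypothetical) checks pass."""
    checks_pass = obs.price_eur > 0 and obs.condition != "unknown"
    return replace(obs, status="active") if checks_pass else obs
```

Using Decimal rather than float keeps currency arithmetic exact, which matters when prices from several sources are compared after conversion.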

Key design decisions

Single SQLite file
Avoids the operational overhead of a server-based DB at this scale (~200k observations). Daily VACUUM keeps the file compact. Backups are a single cp.
Server-side image proxy
Hispasonic and Noiz block hotlinking via Referer checks. A lightweight /img/<obs_id> route fetches images server-side (no Referer header) and caches them in memory for 10 minutes.
No eBay images
eBay's ToS prohibit displaying their images outside their platform. Image URLs are nullified at ingest; listings link out to eBay directly.
CC BY 4.0 open data
All aggregated price data is published openly. The goal is a public resource, not a data moat.
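The server-side image proxy described above combines a fetch without a Referer header and a 10-minute in-memory cache. The sketch below shows one way such a cache could work, independent of Flask. The class and function names are hypothetical, and the fetch function and clock are injectable so the expiry logic can be exercised without network access.

```python
import time
import urllib.request
from typing import Callable, Dict, Tuple

TTL_SECONDS = 600  # 10 minutes, matching the figure given above


def fetch_image(url: str) -> bytes:
    """Fetch image bytes server-side.

    urllib does not send a Referer header unless one is added explicitly.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


class ImageCache:
    """In-memory cache keyed by observation id with a fixed time-to-live."""

    def __init__(
        self,
        fetch: Callable[[str], bytes] = fetch_image,
        ttl: float = TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[int, Tuple[float, bytes]] = {}

    def get(self, obs_id: int, url: str) -> bytes:
        """Return cached bytes if still fresh, otherwise refetch and store."""
        now = self._clock()
        hit = self._store.get(obs_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        data = self._fetch(url)
        self._store[obs_id] = (now, data)
        return data
```

A /img/<obs_id> route could look up the source image URL for the observation and return the result of cache.get(obs_id, url). Note that this sketch only replaces stale entries when they are requested again; it never evicts unused ones, so a real deployment would likely also cap the cache size.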

Want to dig deeper or discuss the architecture? contact@intellisynthprices.com