"Why percentiles, not averages: the statistical reasoning behind fair price computation"

2026-05-12 7 min read

statistics data-engineering pricing methodology python

"How the Observatory calculates P25/P50/P75 fair prices from noisy, sparse, multilingual listing data — and why the mean would be the wrong statistic for a second-hand synth market."

The Observatory publishes three numbers for each synthesiser model: P25, P50, and P75. These are the 25th, 50th, and 75th percentile prices across all active listings for that model. No weighted averages, no moving means. The choice is deliberate and reflects the actual shape of second-hand synth prices.

Why the mean fails here

Consider the Roland Juno-106. In any given week you might find:

12 listings between 350€ and 500€ — working units in average condition
2 listings at 900€ — fully serviced units with new voice chips
1 listing at 1,400€ — a boutique reseller with optimistic expectations
1 listing at 80€ — non-functional, listed for parts

The arithmetic mean of those 16 listings lands at roughly 500€ or more, and it is distorted in two directions at once. The 80€ parts unit drags it down, while the two serviced units and the boutique listing pull it up by more. The net result overstates what a buyer will actually pay for a typical working unit. Neither the parts unit nor the boutique listing represents the market most buyers operate in.

The P50 — the median — is 430€. That number answers the question a buyer actually needs: what does a typical working Juno-106 cost in the current market?
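To make the arithmetic concrete, here is a small Python sketch using one hypothetical set of 16 prices that fits the week described above. The twelve mid-range values are invented for illustration; only the four outliers come from the text. It uses nothing beyond the standard-library `statistics` module.

```python
import statistics

# Hypothetical Juno-106 week: twelve working units between 350€ and 500€
# (values invented for illustration), plus the outliers described in the text.
working_units = [350, 360, 380, 400, 410, 420, 430, 430, 450, 470, 480, 500]
serviced_units = [900, 900]  # fully serviced, new voice chips
boutique = [1400]            # optimistic reseller
parts_only = [80]            # non-functional, listed for parts

prices = working_units + serviced_units + boutique + parts_only

mean_price = statistics.fmean(prices)
median_price = statistics.median(prices)

print(f"mean:   {mean_price:.1f}€")    # mean:   522.5€
print(f"median: {median_price:.1f}€")  # median: 430.0€
```

With these invented values the mean sits above every one of the twelve working units, while the median lands inside the working-unit cluster.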

Bimodal distributions

The more fundamental problem is that second-hand synth prices are often bimodal. The Juno-106 has a known hardware failure mode (the voice chips degrade over time), and the market has priced this in. There are two distinct populations: faulty or partially working units, and fully functional or restored units. These sell at structurally different prices, and any single-point statistic — mean or median — blurs this.

The P25/P50/P75 triplet preserves information about the spread:

P25 approximates the lower end of the functional market — a reasonable deal, possibly with minor issues
P50 approximates the central market price for a working unit
P75 signals what a premium condition or freshly serviced unit costs

A wide gap between P25 and P75 (relative to P50) is itself informative: it indicates a heterogeneous market where condition, provenance, or recent restoration history matters significantly. A narrow spread indicates a commodity: condition has been homogenised or the model is consistently available at a stable price.
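A minimal Python sketch of the triplet and of the spread signal, assuming linear interpolation between order statistics (`method="inclusive"` in the standard library, which matches NumPy's default percentile rule). The article does not state which interpolation rule the Observatory uses, and the `relative_spread` name and its formula, (P75 − P25) / P50, are illustrative assumptions rather than a published metric.

```python
import statistics


def fair_price_triplet(prices: list[float]) -> tuple[float, float, float]:
    """Return (P25, P50, P75) using linear interpolation between order statistics."""
    p25, p50, p75 = statistics.quantiles(prices, n=4, method="inclusive")
    return p25, p50, p75


def relative_spread(p25: float, p50: float, p75: float) -> float:
    """Interquartile gap as a fraction of the median (hypothetical metric)."""
    return (p75 - p25) / p50


# A hypothetical Juno-106 week (values invented for illustration).
prices = [80, 350, 360, 380, 400, 410, 420, 430, 430, 450, 470, 480, 500, 900, 900, 1400]
p25, p50, p75 = fair_price_triplet(prices)
print(p25, p50, p75)                             # 395.0 430.0 485.0
print(round(relative_spread(p25, p50, p75), 3))  # 0.209
```

Dividing by P50 makes the ratio scale-free, so a spread of 0.2 means roughly the same thing for a 400€ instrument and a 4,000€ one.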

The data reality: sparsity

The European second-hand synth market is not deep. Many models have only 2–5 active listings at any given moment. Running percentile calculations on 3 data points is statistically legitimate but potentially misleading — a single outlier can move the P50 substantially.

The Observatory enforces a minimum sample threshold: a fair price is only published when at least 3 observations exist for a model within the active listing window. Below that threshold, the model page shows "insufficient data" rather than a number with false confidence.

Three is a low bar, but it is the honest threshold for this market. Setting it higher, say to 10, would suppress fair prices for the majority of the catalogue and leave most model pages without a number. The trade-off is disclosed in the methodology.
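The threshold logic fits in a few lines of Python. This is a sketch: the function name and the use of `None` as the "insufficient data" signal are assumptions for illustration; only the threshold of 3 comes from the methodology.

```python
import statistics
from typing import Optional

MIN_OBSERVATIONS = 3  # threshold stated in the methodology


def publishable_p50(prices: list[float]) -> Optional[float]:
    """Return the median, or None to signal 'insufficient data'."""
    if len(prices) < MIN_OBSERVATIONS:
        return None
    return statistics.median(prices)


print(publishable_p50([420, 460]))        # None, so the page shows "insufficient data"
print(publishable_p50([420, 460, 1400]))  # 460
```

With exactly three points the median is simply the middle value, so one listing more or less can shift it to a neighbouring price. That is the sensitivity the threshold accepts in exchange for coverage.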

The 50€ price floor

Listings below 50€ are excluded from price calculations regardless of how they are classified.

The reasoning: a listing for a Roland D-50 at 35€ is almost certainly for a non-functional unit, sold for parts or as a project, or is a data error (price in a non-EUR currency that was not correctly converted). Including it would pull the P25 toward a price point that represents a fundamentally different product.

This is a hard filter, not a statistical one. It does not try to detect outliers; it removes a category of listing that falls, by definition, outside the question the fair price answers: what does a working instrument cost?
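Because the floor is a hard filter, it reduces to a single comparison applied before any percentile is computed. A minimal Python sketch; the constant and function names are hypothetical, the D-50 prices are invented, and it assumes prices have already been converted to EUR. A listing at exactly 50€ is kept, since only prices below the floor are excluded.

```python
PRICE_FLOOR_EUR = 50  # listings below this never reach the percentile calculation


def apply_price_floor(prices: list[float]) -> list[float]:
    """Drop listings priced below the floor, regardless of their classification."""
    return [p for p in prices if p >= PRICE_FLOOR_EUR]


# A hypothetical pool of Roland D-50 listings, including a 35€ parts/project unit.
d50_prices = [35, 520, 580, 640, 700]
print(apply_price_floor(d50_prices))  # [520, 580, 640, 700]
```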

What counts as an active listing

The price calculation uses only listings with status = 'active' in staging_observations. A listing becomes inactive when:

The scraper detects it has been removed from the source marketplace
It exceeds the maximum age threshold (currently 60 days)

The 60-day ceiling prevents stale listings from anchoring the price. A Roland Juno-106 listed at 650€ twelve months ago that never sold is not evidence of market value — it is evidence of an optimistic seller. Capping at 60 days keeps the percentiles reflective of what is actually trading.
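Both rules can be sketched against an in-memory SQLite stand-in for staging_observations. Only the table name, the status = 'active' filter and the 60-day ceiling come from the text. The other column names, the sample rows, and the assumption that age is measured from the date a listing was first seen are hypothetical. Removal detection by the scraper is represented simply as a row already marked inactive.

```python
import sqlite3

MAX_AGE_DAYS = 60
AS_OF = "2026-05-12"  # fixed reference date so the example is reproducible

# In-memory stand-in for staging_observations. Apart from status, the column
# names (model, price_eur, first_seen) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_observations ("
    "model TEXT, price_eur REAL, status TEXT, first_seen TEXT)"
)
conn.executemany(
    "INSERT INTO staging_observations VALUES (?, ?, ?, ?)",
    [
        ("Juno-106", 430.0, "active", "2026-04-20"),
        ("Juno-106", 460.0, "active", "2026-03-13"),    # exactly 60 days old: kept
        ("Juno-106", 650.0, "active", "2025-05-10"),    # about a year old: expired below
        ("Juno-106", 400.0, "inactive", "2026-05-01"),  # scraper saw it removed
    ],
)

# Expire listings whose age exceeds the ceiling (age measured from first_seen).
conn.execute(
    "UPDATE staging_observations SET status = 'inactive' "
    "WHERE status = 'active' AND first_seen < date(?, ?)",
    (AS_OF, f"-{MAX_AGE_DAYS} days"),
)

# Only rows with status = 'active' feed the percentile calculation.
rows = conn.execute(
    "SELECT price_eur FROM staging_observations "
    "WHERE status = 'active' AND model = ? ORDER BY price_eur",
    ("Juno-106",),
).fetchall()
active_prices = [price for (price,) in rows]
print(active_prices)  # [430.0, 460.0]
```

The stale 650€ listing drops out of the pool, which is exactly the anchoring effect the ceiling is meant to prevent.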

Trend direction

Each model page shows a directional trend indicator (rising / stable / falling). The trend is computed by comparing the current P50 against the oldest P50 entry in the historical price log for that model.

The classification is conservative:

Change in P50	Label
> +3%	Rising
< −3%	Falling
Within ±3%	Stable

The ±3% band exists because price movements smaller than this are within the noise of sparse data. A P50 of 430€ vs 435€ from four weeks ago tells you nothing meaningful about direction — it tells you there were slightly different listings in the pool. The ±3% threshold filters this noise out.
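The classification rule translates directly into a small Python function. The name and signature are hypothetical; the ±3% band and the comparison against the oldest logged P50 come from the text. Changes of exactly ±3% are treated as stable, reading "within ±3%" as inclusive.

```python
def classify_trend(current_p50: float, oldest_p50: float, band: float = 0.03) -> str:
    """Label the relative change from the oldest logged P50 to the current one."""
    change = (current_p50 - oldest_p50) / oldest_p50
    if change > band:
        return "Rising"
    if change < -band:
        return "Falling"
    return "Stable"  # within ±3%, boundaries included


print(classify_trend(430, 435))  # Stable  (-1.1%)
print(classify_trend(470, 430))  # Rising  (+9.3%)
print(classify_trend(400, 430))  # Falling (-7.0%)
```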

What is not (yet) implemented

Source concentration correction: if 8 of 10 listings for a model come from a single marketplace, the percentiles reflect that marketplace's pricing norms more than the broader European market. A source-weighting correction that limits any single source to a maximum share of the pool is on the roadmap but not yet implemented. Until it is, the methodology page discloses that Hispasonic (the largest source by listing volume) has an outsized influence on Spanish-brand models.
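The correction itself is not implemented, but the concentration it targets is easy to measure. Below is a minimal Python sketch of that diagnostic, not of the planned cap. The function name is hypothetical, the pool is invented to match the 8-of-10 example, and the second and third source names are placeholders.

```python
from collections import Counter


def dominant_source_share(sources: list[str]) -> tuple[str, float]:
    """Return the most common source and its share of the listing pool."""
    source, count = Counter(sources).most_common(1)[0]
    return source, count / len(sources)


# Hypothetical pool for one model: 8 of 10 listings from a single marketplace.
pool = ["Hispasonic"] * 8 + ["marketplace_b", "marketplace_c"]
print(dominant_source_share(pool))  # ('Hispasonic', 0.8)
```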

Condition-stratified prices: the condition field now exists in the database (extracted as part of the PII purge pipeline), but condition-stratified fair prices — "P50 for Mint condition listings" vs "P50 for Used — Good condition" — require a larger per-model sample than the current catalogue supports. This will become viable as observation volume grows.
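The sample-size problem becomes visible as soon as the pool is split. A sketch in Python, assuming the same minimum of 3 observations would apply to each condition group; the function name, condition labels and prices are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Optional

MIN_OBSERVATIONS = 3  # same threshold as the unstratified fair price


def p50_by_condition(listings: list[tuple[str, float]]) -> dict[str, Optional[float]]:
    """Group (condition, price) pairs and compute a P50 per group, or None if too sparse."""
    groups: dict[str, list[float]] = defaultdict(list)
    for condition, price in listings:
        groups[condition].append(price)
    return {
        condition: statistics.median(prices) if len(prices) >= MIN_OBSERVATIONS else None
        for condition, prices in groups.items()
    }


# Hypothetical model with 5 active listings: enough for an overall P50,
# but no condition group reaches the threshold on its own.
listings = [
    ("Mint", 620), ("Mint", 680),
    ("Used - Good", 430), ("Used - Good", 460),
    ("Used - Fair", 350),
]
print(p50_by_condition(listings))
# {'Mint': None, 'Used - Good': None, 'Used - Fair': None}
```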

The published numbers

All three percentiles (P25, P50, P75) are included in the CC BY 4.0 open dataset at github.com/albertjimrod/eusynth-market-data, along with observation count, source breakdown, and the date of the calculation. The observation count is included specifically to allow downstream users to apply their own minimum-sample filtering.
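For downstream users, applying a stricter threshold is a one-line filter. A minimal sketch with Python's `csv` module; the inline CSV, its column names (including observation_count) and the model rows are hypothetical stand-ins, so check the dataset's actual schema before adapting it.

```python
import csv
import io

# Hypothetical excerpt; the real file layout and column names may differ.
sample = io.StringIO(
    "model,p25,p50,p75,observation_count\n"
    "model_a,395,430,485,16\n"
    "model_b,540,600,660,4\n"
    "model_c,700,780,900,3\n"
)

MY_MIN_OBSERVATIONS = 10  # a stricter threshold chosen by the downstream user

rows = [
    row for row in csv.DictReader(sample)
    if int(row["observation_count"]) >= MY_MIN_OBSERVATIONS
]
print([row["model"] for row in rows])  # ['model_a']
```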

