Video QoE metrics for streaming products: definitions, healthy ranges, and the alerts that actually fire

June 4, 2026

15 Min

Video Engineering

Sixteen QoE metrics measure how your video plays back in production. Five of them deserve a page at 3 am. The other eleven belong on dashboards for diagnosis and attribution, not in your on-call rotation.

This article is the working reference for the people on the hook when streams break: video engineering leads, streaming product owners, and on-call engineers at companies where viewer experience and revenue are the same thing.

Every metric below is paired with its definition, the unit FastPix stores it in, the healthy range, and the exact threshold where it warrants action.

TL;DR

The 5 metrics that trigger production alerts in FastPix: error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure
Healthy thresholds in FastPix units: VST under 2,000 ms, rebuffer_ratio under 0.005 (0.5%), error_rate under 0.005, EBS under 0.05 (5%), startup failure under 0.005
The other 11 metrics matter for diagnosis (player startup time, connection setup, frame drops, bitrate, upscaling, rebuffer frequency, stall rate, watch time, concurrent streams, and more). They live in dashboards, not in PagerDuty.
Alert rules support: operator (> strict, >= inclusive), thresholdValue, minViews, severity (low/medium/high/critical), cooldownMinutes, recoveryBucketCount, and AND-logic filters across 7 dimensions
Filters supported: country, os_name, cdn, player_name, device_type, browser_name, video_id
FastPix also runs exception alerts at the video-title level (e.g., 30% playback failure on one asset triggers automatically, no setup required)
Composite signal: FastPix QoE Score (0-100 rollup from stability, render quality, startup) for exec reporting, not engineering debugging
Free tier: up to 100,000 streaming views per month, no credit card required

Quick reference: 16 QoE metrics with units and alert status

Category	Metric	Unit	Healthy	FastPix custom alert
Startup	Video startup time (VST)	ms	< 2,000	Yes: `video_startup_time`
Startup	Video startup failure	rate	< 0.005	Yes: `video_startup_failure`
Startup	Player startup time	ms	< 800	Dashboard only
Startup	Connection setup time	ms	< 300	Dashboard only
Startup	Time to first frame	ms	< 1,500	Dashboard only
Continuity	Rebuffer ratio	decimal	< 0.005	Yes: `rebuffer_ratio`
Continuity	Rebuffer frequency	events/min	< 0.3	Dashboard only
Continuity	Rebuffer duration (p95)	ms	< 1,000	Dashboard only
Continuity	Stall rate	session %	< 5%	Dashboard only
Video quality	QoE Score	0-100	> 80	Dashboard only
Video quality	Average bitrate	bps	Top 60% of ABR ladder	Dashboard only
Video quality	Downscaling rate	session %	< 2%	Dashboard only
Video quality	Frame drops	render %	< 1%	Dashboard only
Engagement	Error rate	decimal	< 0.005	Yes: `error_rate`
Engagement	Exits before video start	decimal	< 0.05	Yes: `exit_before_video_start`
Engagement	Watch time	seconds	(workload-specific)	Dashboard only
Engagement	Concurrent stream count	count	(workload-specific)	Dashboard only

The five "Yes" rows above are the FastPix custom alert metrics. The rest are captured per session as dashboard dimensions for diagnosis, attribution, and exec reporting. By design, custom alerts cannot be set on the others.

Why FastPix alerts on 5 metrics, not 16

Picking what to page someone on is a product decision, not an engineering one. Alerting on every metric the SDK captures is the fastest way to burn out an on-call rotation. Each false page costs roughly 12 minutes of engineering focus while someone triages, acks, and decides whether to wake other people up; a noisy alert rule that fires four times a week costs about 10 hours per quarter, more than a full engineering day.
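The cost of a noisy rule is easy to check by hand. The sketch below uses the 12-minute figure from the text; the 13-week quarter and the 8-hour engineering day are assumptions made for illustration.

```python
MINUTES_PER_FALSE_PAGE = 12  # figure quoted in the text
WEEKS_PER_QUARTER = 13       # assumption: 52 weeks / 4
WORKDAY_HOURS = 8            # assumption: length of one engineering day


def false_page_cost_hours(pages_per_week: float) -> float:
    """Hours of focus lost per quarter to a rule that pages falsely."""
    return pages_per_week * MINUTES_PER_FALSE_PAGE * WEEKS_PER_QUARTER / 60


hours = false_page_cost_hours(4)
print(f"{hours:.1f} h per quarter, {hours / WORKDAY_HOURS:.1f} engineering days")
```

Swap in your own page frequency to see how quickly a single flaky rule adds up.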

The five metrics in the custom alert set were picked because they correlate directly with viewer drop-off and have well-defined thresholds. The other eleven are noisier signals that move for legitimate reasons (a new codec test ships, a CDN swap rolls out, a popular video drives traffic to a slow region). Alerting on those generates pages that look like incidents but resolve themselves when the underlying experiment ends. You want to see them on a dashboard; you do not want them in PagerDuty.

This is also why the FastPix Video Data SDK still captures 50+ data points per session even though only five trigger custom alerts. The other 45+ are the dimensions and attributes that turn a fired alert into an actionable diagnosis: when error_rate spikes, the dashboard shows you it was Android Chrome viewers on Comcast residential in the US Northeast at 8pm peak hours. The five alerts catch the incident; the 45+ dimensions tell you what to fix.

The 5 alert-worthy metrics with thresholds and API examples

Every FastPix alert rule is created with the same POST /api/v1/alert-rules endpoint. The metric name, unit, and threshold value change; everything else (operator, minViews, severity, cooldownMinutes, recoveryBucketCount, filters) is consistent across all five.

error_rate

Total views with a fatal player error divided by total views in the window. Captures codec-unsupported, manifest-404, DRM-failure, and other categorical failures that prevent the stream from playing. Stored as a decimal (0.10 = 10%).

Healthy	Warning	Critical
< 0.005 (0.5%)	0.005 - 0.02 (0.5% - 2%)	> 0.02 (2%)

Example: alert when error rate exceeds 10% on Android mobile in India, after at least 100 views in a 15-minute window.

json

POST /api/v1/alert-rules
{
  "name": "High Error Rate: Android Mobile India",
  "metric": "error_rate",
  "operator": ">",
  "thresholdValue": 0.10,
  "minViews": 100,
  "severity": "high",
  "cooldownMinutes": 30,
  "recoveryBucketCount": 3,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "country", "filterValue": "IN" },
    { "filterKey": "device_type", "filterValue": "mobile" },
    { "filterKey": "os_name", "filterValue": "Android" }
  ]
}
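If you would rather create the rule from a script, the request can be prepared with the Python standard library. This is a minimal sketch: the article gives only the path, so `BASE_URL` and the `Authorization` header below are hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: the host and auth scheme are not given in this
# article. Replace both with the values from your own workspace.
BASE_URL = "https://api.example.com"
AUTH_HEADER = {"Authorization": "Bearer <your-token>"}


def build_alert_rule_request(rule: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/v1/alert-rules request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/alert-rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json", **AUTH_HEADER},
        method="POST",
    )


rule = {
    "name": "High Error Rate: Android Mobile India",
    "metric": "error_rate",
    "operator": ">",
    "thresholdValue": 0.10,
    "minViews": 100,
    "severity": "high",
    "cooldownMinutes": 30,
    "recoveryBucketCount": 3,
    "windowMinutes": 15,
    "filters": [
        {"filterKey": "country", "filterValue": "IN"},
        {"filterKey": "device_type", "filterValue": "mobile"},
        {"filterKey": "os_name", "filterValue": "Android"},
    ],
}

request = build_alert_rule_request(rule)
# urllib.request.urlopen(request) would send it once the placeholders are real.
```

Keeping the payload as a plain dict makes it easy to reuse the same builder for the other four metrics.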

rebuffer_ratio

Total time spent buffering during playback divided by total playback time. Stored as a decimal, not a percentage. 0.01 means 1%. Setting thresholdValue: 1 will compare against 100%, which never fires. Set thresholdValue: 0.01 for a 1% threshold.

Healthy	Warning	Critical
< 0.005 (0.5%)	0.005 - 0.01 (0.5% - 1%)	> 0.01 (1%)

Industry guidance puts the production threshold at 1% rebuffer ratio. Above 2%, you are losing meaningful viewer trust per session.
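To make the decimal convention concrete, here is a small sketch of the ratio itself. It assumes the denominator is total playback time in milliseconds, as in the definition above; whether buffering time is counted inside that total is not stated here, so treat the inputs as illustrative.

```python
def rebuffer_ratio(buffering_ms: int, playback_ms: int) -> float:
    """Buffering time divided by playback time, as a decimal (0.01 == 1%)."""
    if playback_ms <= 0:
        raise ValueError("playback_ms must be positive")
    return buffering_ms / playback_ms


# 3 seconds of stalls in a 10-minute session
ratio = rebuffer_ratio(buffering_ms=3_000, playback_ms=600_000)
print(ratio, f"{ratio:.1%}")  # the decimal FastPix-style value, then the percentage
```

The value you compare against thresholdValue is the first number printed, never the percentage.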

Example: alert when rebuffer ratio exceeds 1% on any device, with a 5-minute window for fast detection.

json

POST /api/v1/alert-rules
{
  "name": "High Rebuffer Ratio",
  "metric": "rebuffer_ratio",
  "operator": ">",
  "thresholdValue": 0.01,
  "minViews": 50,
  "severity": "critical",
  "cooldownMinutes": 15,
  "recoveryBucketCount": 2,
  "windowMinutes": 5,
  "filters": []
}

video_startup_time

Time from the play event to the first frame rendered, measured on the client. Stored in milliseconds, not seconds. Setting thresholdValue: 3 will compare against 3 ms (always fires); set thresholdValue: 3000 for a 3-second threshold.

Healthy	Warning	Critical
< 2,000 ms	2,000 - 3,000 ms	> 3,000 ms

VST is the single signal most correlated with Exits Before Start. If you can only tune one startup alert, tune VST.

Example: alert when startup time exceeds 3 seconds on mobile, with a 15-minute window.

json

POST /api/v1/alert-rules
{
  "name": "Slow Video Startup: Mobile",
  "metric": "video_startup_time",
  "operator": ">",
  "thresholdValue": 3000,
  "minViews": 100,
  "severity": "high",
  "cooldownMinutes": 30,
  "recoveryBucketCount": 3,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "device_type", "filterValue": "mobile" }
  ]
}

exit_before_video_start

Share of play events that produced zero frames viewed. The viewer pressed play, something failed, and they left without seeing a single frame. Stored as a decimal (0.10 = 10%).

Healthy	Warning	Critical
< 0.05 (5%)	0.05 - 0.10 (5% - 10%)	> 0.10 (10%)

EBS above 10% is a strong startup-failure signal. Above 20% is an emergency.

Example: alert when EBS exceeds 10% across all viewers, with a 30-minute window.

json

POST /api/v1/alert-rules
{
  "name": "High Exits Before Start",
  "metric": "exit_before_video_start",
  "operator": ">",
  "thresholdValue": 0.10,
  "minViews": 200,
  "severity": "high",
  "cooldownMinutes": 30,
  "recoveryBucketCount": 3,
  "windowMinutes": 30,
  "filters": []
}

video_startup_failure

Sessions where startup failed completely (the player initialized but never rendered a frame). Distinct from VST (slow startup) and EBS (viewer left during startup). This is the categorical "the stream did not start, full stop" signal. Stored as a decimal.

Healthy	Warning	Critical
< 0.005 (0.5%)	0.005 - 0.02 (0.5% - 2%)	> 0.02 (2%)

Example: alert when startup failure exceeds 1% on iOS Safari, with a 15-minute window.

json

POST /api/v1/alert-rules
{
  "name": "Startup Failure: iOS Safari",
  "metric": "video_startup_failure",
  "operator": ">",
  "thresholdValue": 0.01,
  "minViews": 100,
  "severity": "critical",
  "cooldownMinutes": 15,
  "recoveryBucketCount": 2,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "os_name", "filterValue": "iOS" },
    { "filterKey": "browser_name", "filterValue": "Safari" }
  ]
}
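The two unit pitfalls called out above (a ratio written as a percentage, a startup time written in seconds) are cheap to catch before a rule is submitted. The helper below is a hypothetical client-side check, not part of the FastPix API, and its 100 ms cutoff for "this looks like seconds" is an arbitrary assumption.

```python
RATIO_METRICS = {
    "error_rate",
    "rebuffer_ratio",
    "exit_before_video_start",
    "video_startup_failure",
}


def check_threshold_units(metric: str, threshold: float) -> list[str]:
    """Return warnings for thresholds that look like a unit mistake."""
    warnings = []
    if metric in RATIO_METRICS:
        # Ratios are decimals: 0.01 means 1%. A value of 1 or more can never
        # be exceeded, so a rule using ">" would never fire.
        if threshold < 0 or threshold >= 1:
            warnings.append(f"{metric} expects a decimal between 0 and 1, got {threshold}")
    elif metric == "video_startup_time":
        # Startup time is in milliseconds. The 100 ms cutoff is a heuristic.
        if threshold < 100:
            warnings.append(f"video_startup_time is in ms; {threshold} looks like seconds")
    else:
        raise ValueError(f"unsupported alert metric: {metric}")
    return warnings


print(check_threshold_units("rebuffer_ratio", 1))
print(check_threshold_units("video_startup_time", 3000))
```

Running a check like this in CI or in a rule-creation script keeps a silent "never fires" rule from reaching production.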

How FastPix custom alerts actually fire

Five mechanics distinguish FastPix's alert system from naive threshold tools that fire on noise.

minViews: the alert evaluates only when enough views have accumulated in the window. This prevents the classic false positive where 1 view with an error reads as 100% error rate. Set minViews based on traffic volume: 50 for small workloads, 200+ for high traffic.

operator: > (strict) or >= (inclusive) comparison against thresholdValue. Use > when you want the alert to fire on values strictly above the threshold; use >= when the threshold itself is unacceptable.

incident creation: when the threshold is breached and minViews is met, an incident opens. The incident records severity (low/medium/high/critical), the observed value at firing time, timestamp, and the matched filter combination. This gives on-call a structured handoff rather than a vague notification.

auto-recovery: when the metric drops below 50% of threshold for recoveryBucketCount consecutive windows, the incident auto-closes. No manual ack required. recoveryBucketCount: 3 means the metric has to stay healthy for 3 consecutive evaluation windows before the incident is marked resolved, which prevents flapping incidents.

cooldownMinutes: after a close, no new incident fires for the cooldown duration. This stops alert fatigue when a metric oscillates around the threshold. 15 minutes is reasonable for critical alerts; 30+ for high-severity alerts that need a longer settle.

windowMinutes controls the evaluation interval. Set windowMinutes: 0 for fastest detection (fires as soon as minViews is reached); set windowMinutes: 5 to 60 for a fixed rolling window.
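The interaction between minViews, recovery, and cooldown is easier to follow as a replay. The simulation below is a sketch of the behaviour described above, not FastPix's implementation. It assumes one evaluation per fixed window, ignores view counts during recovery, and does not model windowMinutes: 0; `Rule` and `simulate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    threshold: float
    min_views: int
    recovery_bucket_count: int
    cooldown_minutes: int
    window_minutes: int
    inclusive: bool = False  # False models ">", True models ">="


def simulate(rule: Rule, windows: list[tuple[float, int]]) -> list[tuple[int, str]]:
    """Replay (value, views) windows and return (minute, event) pairs."""
    events = []
    incident_open = False
    healthy_streak = 0
    cooldown_until = 0  # minute from which a new incident may open

    for index, (value, views) in enumerate(windows):
        minute = index * rule.window_minutes
        if incident_open:
            # Recovery needs the value under 50% of the threshold, not just under it.
            if value < 0.5 * rule.threshold:
                healthy_streak += 1
            else:
                healthy_streak = 0
            if healthy_streak >= rule.recovery_bucket_count:
                incident_open = False
                healthy_streak = 0
                cooldown_until = minute + rule.cooldown_minutes
                events.append((minute, "closed"))
            continue

        breached = value >= rule.threshold if rule.inclusive else value > rule.threshold
        if not (breached and views >= rule.min_views):
            continue  # healthy, or too few views to trust the reading
        if minute < cooldown_until:
            events.append((minute, "suppressed"))
        else:
            incident_open = True
            events.append((minute, "opened"))
    return events


rule = Rule(threshold=0.01, min_views=50, recovery_bucket_count=2,
            cooldown_minutes=15, window_minutes=5)
windows = [
    (0.020, 10),  # breach, but below minViews
    (0.020, 80),  # breach with enough views
    (0.008, 80),  # under threshold, but not under 50% of it
    (0.004, 80),  # first healthy window
    (0.003, 80),  # second healthy window
    (0.030, 80),  # breach during cooldown
    (0.030, 80),  # still in cooldown
    (0.030, 80),  # cooldown expired
]
for minute, event in simulate(rule, windows):
    print(minute, event)
```

In this replay the incident opens at minute 5, closes at minute 20, two breaches are suppressed during cooldown, and a new incident opens at minute 35.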

When an alert fires, on-call receives an email containing the severity badge, the metric and observed value, the time, the filter combination that matched, the view count in the window, and direct links to view the incident in the FastPix dashboard or mute the alert for a defined window.

Filtering alerts by dimension

Every custom alert rule supports AND-logic filters across seven dimensions:

filterKey	What it scopes	Example values
country	ISO country code	"IN", "US", "BR"
os_name	Operating system	"Android", "iOS", "Windows"
cdn	CDN provider	"akamai", "cloudflare", "fastly"
player_name	Video player library	"videojs", "shaka", "fastpix"
device_type	Device category	"mobile", "desktop", "tv"
browser_name	Browser	"Chrome", "Safari", "Firefox"
video_id	Specific video identifier	your internal video ID

All filters use the = (equals) operator. Multiple filters are combined with AND logic: every condition must match for a view to count toward the alert bucket.

Example: alert only for Chrome users on Android in India.

json

"filters": [
  { "filterKey": "country", "filterValue": "IN" },
  { "filterKey": "os_name", "filterValue": "Android" },
  { "filterKey": "browser_name", "filterValue": "Chrome" }
]

When that specific combination breaches the threshold, the alert fires. Other device/browser/country combinations do not trigger the same rule. This is how you build alerts that catch real regressions on specific viewer segments without false positives from the rest of the audience.
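The AND logic can be sketched in a few lines. This is an illustration of the matching rule described above, not FastPix code; the view dictionaries and the `view_matches` helper are hypothetical.

```python
def view_matches(view: dict, filters: list[dict]) -> bool:
    """True when the view satisfies every filter (AND logic, equality only)."""
    return all(view.get(f["filterKey"]) == f["filterValue"] for f in filters)


filters = [
    {"filterKey": "country", "filterValue": "IN"},
    {"filterKey": "os_name", "filterValue": "Android"},
    {"filterKey": "browser_name", "filterValue": "Chrome"},
]

views = [
    {"country": "IN", "os_name": "Android", "browser_name": "Chrome"},
    {"country": "IN", "os_name": "Android", "browser_name": "Firefox"},
    {"country": "US", "os_name": "Android", "browser_name": "Chrome"},
]

# Only views that match all three conditions count toward the alert bucket.
bucket = [view for view in views if view_matches(view, filters)]
print(len(bucket))
```

An empty filter list matches every view, which is why the rules with "filters": [] above apply to the whole audience.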

Exception alerts: the second kind of FastPix alert

Custom alerts are opt-in. You configure them via the API. FastPix runs a second alerting layer that is opt-out: exception alerts.

Exception alerts fire automatically when FastPix detects unusual error rates at the video-title level. If a specific media asset is producing playback errors above the default threshold (for example, 30% playback failure on one video across 100 views), an exception alert fires without you configuring anything. On-call receives an email naming the affected video, the error type, the failure rate, and the affected viewer count.

This catches the failure mode that custom alerts miss: a single broken asset that does not move the aggregate enough to trigger a workspace-wide alert but is clearly broken for everyone who tries to play it. A 100% failure on 200 views of one video is a real incident, but in a workspace with 100,000 daily views those same 200 failed views amount to a 0.2% workspace error rate, invisible to a workspace-level threshold.
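The dilution is plain arithmetic. The sketch below uses the numbers from the paragraph above and assumes, for illustration, that no other views in the workspace failed that day.

```python
def error_rate(error_views: int, total_views: int) -> float:
    """Views with a fatal error divided by total views, as a decimal."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    return error_views / total_views


asset_rate = error_rate(error_views=200, total_views=200)          # the broken video
workspace_rate = error_rate(error_views=200, total_views=100_000)  # the whole workspace

print(f"asset: {asset_rate:.1%}, workspace: {workspace_rate:.1%}")
```

The asset is fully broken, yet the workspace figure stays under the 0.005 healthy limit, so a workspace-level error_rate rule has nothing to fire on.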

Exception alerts cover the asset-level failures that custom alerts cannot reach. Together, the two systems cover both workspace-wide degradation and per-asset breakage.

The 11 dashboard metrics for diagnosis (not paging)

The remaining 11 metrics are captured per session and surfaced in the FastPix Video Data dashboard. They are the signals you slice and dice when an alert fires, the columns you sort by, the dimensions you attribute root cause to. They are not paging-grade because they move for legitimate reasons that do not warrant an on-call response.

Startup dashboard metrics

Player startup time: time from page load to player ready state. Healthy under 800 ms. Above 1.5s usually points to JavaScript bundle bloat or render-blocking resources. Diagnostic for "is startup slow because of our app or because of the network?"

Connection setup time: DNS, TCP, TLS, and CDN edge selection before the first segment request. Healthy under 300 ms. Spikes usually mean CDN routing issues, stale DNS, or geographic blind spots. Track per region.

Time to first frame: the subset of VST that excludes player initialization. Useful for isolating whether a startup problem is player-side or pipeline-side.

Playback continuity dashboard metrics

Rebuffer frequency: number of distinct rebuffer events per minute of playback. A 0.5% rebuffer ratio split across twelve micro-stalls is worse than the same 0.5% in one longer stall. Frequency matters because each event resets viewer attention.

Rebuffer duration per event: average length of each buffer event. Sub-second stalls register as stutter; above 2 seconds registers as broken. Track p50 and p95 separately; p95 outliers drive complaints.

Stall rate: percentage of sessions that experience at least one stall. Different from rebuffer ratio, which measures the share of playback time spent buffering; stall rate counts how many sessions were affected at all. A high stall rate with a low ratio means rare but disruptive events are reaching a meaningful share of viewers.
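The three continuity signals are derived from the same raw stall events, which is why they can disagree. The sketch below computes them from a made-up list of sessions; the session shape (`playback_ms`, `stalls_ms`) is a hypothetical structure, not the SDK's schema.

```python
sessions = [
    {"playback_ms": 600_000, "stalls_ms": []},
    {"playback_ms": 600_000, "stalls_ms": [250] * 12},  # twelve micro-stalls, 3 s total
    {"playback_ms": 600_000, "stalls_ms": [3_000]},     # one 3 s stall
    {"playback_ms": 600_000, "stalls_ms": []},
]


def continuity_metrics(sessions: list[dict]) -> dict:
    """Aggregate rebuffer ratio, rebuffer frequency, and stall rate."""
    total_playback_ms = sum(s["playback_ms"] for s in sessions)
    total_stall_ms = sum(sum(s["stalls_ms"]) for s in sessions)
    stall_events = sum(len(s["stalls_ms"]) for s in sessions)
    stalled_sessions = sum(1 for s in sessions if s["stalls_ms"])
    return {
        "rebuffer_ratio": total_stall_ms / total_playback_ms,
        "rebuffer_frequency_per_min": stall_events / (total_playback_ms / 60_000),
        "stall_rate": stalled_sessions / len(sessions),
    }


metrics = continuity_metrics(sessions)
print(metrics)
```

Sessions two and three have the same 0.5% ratio on their own, but the micro-stall session contributes twelve events to the frequency count and the other contributes one.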

Video quality dashboard metrics

Average bitrate: mean delivered bitrate across the session, weighted by playback duration. Useful for verifying your ABR ladder is selecting the rungs you expected. A persistently low average bitrate on high-bandwidth networks signals a player tuning problem, not a network problem.
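Duration weighting matters because a session rarely spends equal time on each rung. A minimal sketch, assuming you have (bitrate, seconds played) pairs for one session; the numbers are made up.

```python
def average_bitrate_bps(segments: list[tuple[int, float]]) -> float:
    """Mean bitrate weighted by seconds played at each bitrate."""
    total_seconds = sum(seconds for _, seconds in segments)
    if total_seconds <= 0:
        raise ValueError("session has no playback time")
    return sum(bps * seconds for bps, seconds in segments) / total_seconds


# (bitrate in bps, seconds played): a session that ramps up the ABR ladder
segments = [(1_500_000, 30), (3_000_000, 90), (6_000_000, 180)]

weighted = average_bitrate_bps(segments)
unweighted = sum(bps for bps, _ in segments) / len(segments)
print(weighted, unweighted)
```

The unweighted mean understates this session because it treats the 30-second warm-up rung the same as the three minutes spent at the top rung.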

Downscaling and upscaling rates: percentage of session time where the player rendered at a lower or higher resolution than the display panel. Downscaling means the viewer's screen is showing less detail than the source supports. Upscaling means the delivered rendition has fewer pixels than the display area, so the player stretches it and the picture looks soft. Both are signals the ABR is misjudging conditions.

Frame drops: frames the player decoded but failed to render within their scheduled time. Sub-1% drop rate is normal. Above 3% sustained, viewers perceive choppiness, especially on motion-heavy content like sports.

Engagement and composite dashboard metrics

Concurrent stream count: specific to live workloads. Unique viewers connected at a given moment. Useful as a denominator for live-event QoE aggregations and as a leading indicator for capacity events.

Watch time: total seconds watched in a session. Track against total video duration to get a normalized completion view. Watch time is noisier than the alert-worthy metrics because content quality is a confound: a great video keeps people watching regardless of QoE.

QoE Score and Watch time: FastPix's composite signals

FastPix publishes a composite QoE Score (0-100) computed from three component groups: stability (rebuffer-related signals), render quality (frame drop and downscaling signals), and startup (VST and EBS signals). The score is useful for exec reporting where leadership wants a single number to track quarter-over-quarter. It is not useful for engineering debugging because the rollup hides which underlying metric actually moved.

Practical rule: track the QoE Score for exec dashboards, alert on the five category-specific signals (error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure). When the QoE Score drops without a category alert firing, your thresholds are tuned too loose.

Watch time, the second FastPix composite, tracks total seconds watched per session. Unlike completion rate (which depends on video duration), Watch time is a raw quantity. Normalize it against total duration for completion percentages, or against viewer count for average engagement per video.
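Both normalizations are one-line divisions. The sketch below assumes a list of per-session watch times in seconds for a single video of known duration; the values are illustrative.

```python
def completion_rate(watch_seconds: float, duration_seconds: float) -> float:
    """Watched seconds divided by video duration (can exceed 1.0 on rewatch)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return watch_seconds / duration_seconds


def average_watch_seconds(session_watch_seconds: list[float]) -> float:
    """Mean watch time per session."""
    if not session_watch_seconds:
        raise ValueError("no sessions")
    return sum(session_watch_seconds) / len(session_watch_seconds)


duration = 300                 # a 5-minute video
watched = [90, 240, 300, 30]   # seconds watched in four sessions

print([round(completion_rate(w, duration), 2) for w in watched])
print(average_watch_seconds(watched))
```

Because the raw seconds are stored, the same data also sums directly into total watch hours across an audience.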

Per-context calibration: mobile, desktop, CTV, live, VOD

QoE thresholds are not universal. The same 2,500 ms VST is acceptable on a smart TV warming up its decoder and intolerable on a mobile app where users expect instant playback.

Mobile users have the lowest patience and the highest network variance, a brutal combination. CTV users tolerate longer startup but expect zero rebuffering once playback begins. Desktop is the most forgiving baseline. Live workloads cannot recover from buffer underruns the way VOD can: a 2-second stall on VOD is degraded; the same stall on live sports is the difference between watching the goal and watching the celebration. Live alerting thresholds should be roughly 2x stricter than VOD.

Context	Acceptable VST	Acceptable rebuffer_ratio
Mobile VOD	< 2,000 ms	< 0.008 (0.8%)
Desktop VOD	< 2,500 ms	< 0.010 (1.0%)
CTV VOD	< 3,500 ms	< 0.005 (0.5%)
Mobile live	< 1,500 ms	< 0.003 (0.3%)
CTV live (sports)	< 2,000 ms	< 0.002 (0.2%)

The clean way to enforce this is one alert rule per context, with the filters that scope the rule to that audience. For example, a mobile VOD rule becomes a device_type: mobile filter plus your VOD-specific tag (a video_id list or a custom dimension).
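One way to keep per-context rules consistent is to generate them from the table rather than hand-write each payload. The sketch below covers only the three VOD rows and scopes by device_type. It leaves out VOD/live scoping because no filter key for that appears in this article, and the minViews, severity, cooldown, and window values are placeholders copied from the earlier examples. It uses >= because the table treats the limit itself as unacceptable.

```python
# Limits from the VOD rows of the table, keyed by device_type filter value.
VOD_LIMITS = {
    "mobile": {"video_startup_time": 2000, "rebuffer_ratio": 0.008},
    "desktop": {"video_startup_time": 2500, "rebuffer_ratio": 0.010},
    "tv": {"video_startup_time": 3500, "rebuffer_ratio": 0.005},
}


def rules_for(device_type: str, limits: dict) -> list[dict]:
    """Build one alert rule payload per metric for a device context."""
    return [
        {
            "name": f"{metric} above VOD limit: {device_type}",
            "metric": metric,
            "operator": ">=",
            "thresholdValue": threshold,
            "minViews": 100,           # placeholder
            "severity": "high",        # placeholder
            "cooldownMinutes": 30,     # placeholder
            "recoveryBucketCount": 3,  # placeholder
            "windowMinutes": 15,       # placeholder
            "filters": [{"filterKey": "device_type", "filterValue": device_type}],
        }
        for metric, threshold in limits.items()
    ]


all_rules = [
    rule
    for device_type, limits in VOD_LIMITS.items()
    for rule in rules_for(device_type, limits)
]
print(len(all_rules), "rules")
```

Each generated dict has the same shape as the request bodies shown earlier, so it can be sent to POST /api/v1/alert-rules once you add whatever scoping separates VOD from live in your setup.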

Get started with FastPix custom alerts

Item	Value
Supported metrics	error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure
Supported filters	country, os_name, cdn, player_name, device_type, browser_name, video_id
Operators	`>` and `>=`
Severity levels	low, medium, high, critical
Free tier	up to 100,000 streaming views per month, no credit card required
Create alert rules	`POST /api/v1/alert-rules`
List open incidents	`GET /api/v1/incidents?alert_type=custom`
Exception alerts	run automatically at the video-title level, no setup required

Signup is self-serve. Drop in the FastPix Video Data SDK on Web, iOS, or Android, send the first session events, then create your first alert rule via the API. Each metric maps to one rule per filter combination, so a "mobile India Chrome on Android" rule and a "mobile Brazil Chrome on Android" rule are two separate alert rules that can have different thresholds and severities.

FAQ

Why does FastPix support custom alerts on only 5 metrics?

Because picking what to page someone on is a product decision. Alerting on every metric the SDK captures burns out the on-call rotation: each false page costs roughly 12 minutes of focus, and a noisy alert rule that fires four times a week costs more than an engineering day per quarter. The five FastPix supports (error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure) correlate directly with viewer drop-off and have well-defined thresholds. The other 11 metrics are valuable for diagnosis and exec reporting but are noisier signals that move for legitimate reasons that do not warrant paging.

What units does FastPix use for video_startup_time and rebuffer_ratio?

video_startup_time is stored in milliseconds. Set thresholdValue: 3000 for a 3-second alert, not thresholdValue: 3. rebuffer_ratio is stored as a decimal (0.01 = 1%). Set thresholdValue: 0.01 for a 1% alert, not thresholdValue: 1.

What's the difference between FastPix QoE Score and VMAF?

VMAF is the Netflix-developed perceptual quality metric scored 0-100, originally for VOD encoding evaluation. FastPix uses its own composite called QoE Score (also 0-100) computed from stability metrics (rebuffer-related), render quality metrics (frame drop and downscaling), and startup metrics (VST and EBS). The QoE Score is built specifically for live and VOD streaming QoE rather than codec-level perceptual quality, and is what shows up in the FastPix dashboard's executive view.

What is the difference between Watch time and completion rate?

Completion rate is the percentage of content consumed, typically calculated as watched seconds divided by total video duration. Watch time is the raw count of seconds watched. FastPix stores Watch time as the underlying metric; you derive completion percentages from it by dividing against total duration. Watch time is the more flexible primitive because the same value supports completion rate, average engagement per viewer, and total watch hours across an audience.

What's the difference between QoE and QoS?

QoS describes infrastructure (bandwidth, latency, packet loss). QoE describes viewer experience (startup, stalls, fidelity). QoS is causal; QoE is observational. Perfect QoS can still produce degraded QoE if the player or encoding is misconfigured.

Do QoE thresholds differ for live vs VOD?

Yes. Live should be roughly 2x stricter on rebuffer_ratio because viewers cannot rewind past a stall. Sports are stricter still: target sub-0.3% rebuffer_ratio (thresholdValue: 0.003) and sub-1,500 ms startup for high-stakes mobile live.

What happens when an alert fires?

On-call receives an email with severity badge, the metric and observed value, the timestamp, the filter combination that matched, the view count in the window, and links to view the incident in the FastPix dashboard or mute the alert. The incident is created in FastPix's incident store and remains open until the metric drops below 50% of threshold for recoveryBucketCount consecutive windows, at which point it auto-closes. After close, the cooldownMinutes window prevents a new incident from firing until the cooldown expires.

Can I use FastPix Video Data without buying FastPix Live Streaming or VOD?

Yes. The Video Data SDK works with any player and any video infrastructure. Drop the SDK into your existing HLS.js, AVPlayer, ExoPlayer, or Shaka player, point it at your FastPix workspace, and you get the five custom alerts plus the 11 dashboard metrics with no migration of your live or VOD pipeline.

How does FastPix's exception alert work?

Exception alerts run automatically at the video-title level. FastPix watches per-asset error rates against a default threshold (configurable per workspace); when one asset exceeds the threshold across a meaningful view count, an exception alert fires regardless of whether you configured a custom alert. This catches the per-asset failures that workspace-level custom alerts cannot see, like a single broken video that fails on 100% of its own views but adds only 0.2% to the workspace error rate.

Author

Hema Gowtham R, Software Engineer
