FastPix

Video QoE metrics for streaming products: definitions, healthy ranges, and the alerts that actually fire

June 4, 2026
15 min read
Video Engineering

Sixteen QoE metrics measure how your video plays back in production. Five of them deserve a page at 3 am. The other eleven belong on dashboards for diagnosis and attribution, not in your on-call rotation.

This article is the working reference for the people on the hook when streams break: video engineering leads, streaming product owners, and on-call engineers at companies where viewer experience and revenue are the same thing.

Every metric below is paired with its definition, the unit FastPix stores it in, and its healthy range. The five alertable metrics also get the exact threshold at which they warrant action.

TL;DR

  • The 5 metrics that trigger production alerts in FastPix: error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure
  • Healthy thresholds in FastPix units: VST under 2,000 ms, rebuffer_ratio under 0.005 (0.5%), error_rate under 0.005, EBS under 0.05 (5%), startup failure under 0.005
  • The other 11 metrics matter for diagnosis (player startup time, connection setup, frame drops, bitrate, upscaling, rebuffer frequency, stall rate, watch time, concurrent streams, and more). They live in dashboards, not in PagerDuty.
  • Alert rules support: operator (> strict, >= inclusive), thresholdValue, minViews, severity (low/medium/high/critical), windowMinutes, cooldownMinutes, recoveryBucketCount, and AND-logic filters across 7 dimensions
  • Filters supported: country, os_name, cdn, player_name, device_type, browser_name, video_id
  • FastPix also runs exception alerts at the video-title level (e.g., 30% playback failure on one asset triggers automatically, no setup required)
  • Composite signal: FastPix QoE Score (0-100 rollup from stability, render quality, startup) for exec reporting, not engineering debugging
  • Free tier: up to 100,000 streaming views per month, no credit card required

Quick reference: 16 QoE metrics with units and alert status

| Category | Metric | Unit | Healthy | FastPix custom alert |
| --- | --- | --- | --- | --- |
| Startup | Video startup time (VST) | ms | < 2,000 | Yes: `video_startup_time` |
| Startup | Video startup failure | rate | < 0.005 | Yes: `video_startup_failure` |
| Startup | Player startup time | ms | < 800 | Dashboard only |
| Startup | Connection setup time | ms | < 300 | Dashboard only |
| Startup | Time to first frame | ms | < 1,500 | Dashboard only |
| Continuity | Rebuffer ratio | decimal | < 0.005 | Yes: `rebuffer_ratio` |
| Continuity | Rebuffer frequency | events/min | < 0.3 | Dashboard only |
| Continuity | Rebuffer duration (p95) | ms | < 1,000 | Dashboard only |
| Continuity | Stall rate | session % | < 5% | Dashboard only |
| Video quality | QoE Score | 0-100 | > 80 | Dashboard only |
| Video quality | Average bitrate | bps | Top 60% of ABR ladder | Dashboard only |
| Video quality | Downscaling rate | session % | < 2% | Dashboard only |
| Video quality | Frame drops | render % | < 1% | Dashboard only |
| Engagement | Error rate | decimal | < 0.005 | Yes: `error_rate` |
| Engagement | Exits before video start | decimal | < 0.05 | Yes: `exit_before_video_start` |
| Engagement | Watch time | seconds | (workload-specific) | Dashboard only |
| Engagement | Concurrent stream count | count | (workload-specific) | Dashboard only |

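The five "Yes" rows above are the FastPix custom alert metrics. The QoE Score row is a composite rolled up from the stability, render-quality, and startup signals, so it appears for reference but is not counted among the 16 metrics. The remaining 11 are captured per session as dashboard dimensions for diagnosis, attribution, and exec reporting. Custom alerts on them are not supported, by design.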

Why FastPix alerts on 5 metrics, not 16

Picking what to page someone on is a product decision, not an engineering one. Alerting on every metric the SDK captures is the fastest way to burn out an on-call rotation. Each false page costs roughly 12 minutes of engineering focus while someone triages, acks, and decides whether to wake other people up. A noisy alert rule that fires four times a week produces about 52 pages a quarter, which is roughly 10 hours of focus: more than a full engineering day, every quarter, for one rule.
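The per-quarter figure is simple arithmetic, and a small sketch makes its inputs explicit. The 12-minute cost per page is this article's estimate; 13 weeks per quarter and an 8-hour engineering day are assumptions of the sketch. At four pages a week it works out to 52 pages and about 10.4 hours, or 1.3 engineering days, per quarter.

```python
MINUTES_PER_FALSE_PAGE = 12     # the article's estimate of focus lost per page
WEEKS_PER_QUARTER = 13          # assumption: 52 weeks / 4 quarters
HOURS_PER_ENGINEERING_DAY = 8   # assumption: an 8-hour working day


def quarterly_cost_hours(pages_per_week: float) -> float:
    """Hours of engineering focus one noisy rule consumes per quarter."""
    return pages_per_week * WEEKS_PER_QUARTER * MINUTES_PER_FALSE_PAGE / 60


for pages in (1, 4):
    hours = quarterly_cost_hours(pages)
    days = hours / HOURS_PER_ENGINEERING_DAY
    print(f"{pages} page(s)/week: {hours:.1f} h/quarter = {days:.2f} engineering days")
```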

The five metrics in the custom alert set were picked because they correlate directly with viewer drop-off and have well-defined thresholds. The other eleven are noisier signals that move for legitimate reasons (a new codec test ships, a CDN swap rolls out, a popular video drives traffic to a slow region). Alerting on those generates pages that look like incidents but resolve themselves when the underlying experiment ends. You want to see them on a dashboard; you do not want them in PagerDuty.

This is also why the FastPix Video Data SDK still captures 50+ data points per session even though only five trigger custom alerts. The other 45+ are the dimensions and attributes that turn a fired alert into an actionable diagnosis: when error_rate spikes, the dashboard can show, for example, that the errors came from Android Chrome viewers on Comcast residential connections in the US Northeast during the 8 pm peak. The five alerts catch the incident; the 45+ dimensions tell you what to fix.

The 5 alert-worthy metrics with thresholds and API examples

Every FastPix alert rule is created through the same POST /api/v1/alert-rules endpoint. Only the metric name, its unit, and the threshold value change; the remaining fields (operator, minViews, severity, cooldownMinutes, recoveryBucketCount, windowMinutes, filters) take the same form across all five.

error_rate

Total views with a fatal player error divided by total views in the window. Captures codec-unsupported, manifest-404, DRM-failure, and other categorical failures that prevent the stream from playing. Stored as a decimal (0.10 = 10%).

| Healthy | Warning | Critical |
| --- | --- | --- |
| < 0.005 (0.5%) | 0.005 - 0.02 (0.5% - 2%) | > 0.02 (2%) |

Example: alert when error rate exceeds 10% on Android mobile in India, after at least 100 views in a 15-minute window.

```json
POST /api/v1/alert-rules
{
  "name": "High Error Rate: Android Mobile India",
  "metric": "error_rate",
  "operator": ">",
  "thresholdValue": 0.10,
  "minViews": 100,
  "severity": "high",
  "cooldownMinutes": 30,
  "recoveryBucketCount": 3,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "country", "filterValue": "IN" },
    { "filterKey": "device_type", "filterValue": "mobile" },
    { "filterKey": "os_name", "filterValue": "Android" }
  ]
}
```
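For teams that script rule creation, the same request can be assembled with Python's standard library. This is a sketch under stated assumptions: the base URL is a placeholder, and HTTP Basic auth with an access-token ID and secret is assumed rather than taken from this article, so confirm both against your workspace's API settings. The function only prepares the request; sending it is a single `urllib.request.urlopen(req)` call.

```python
import base64
import json
import urllib.request

# Placeholder host (assumption): replace with your FastPix API base URL.
BASE_URL = "https://api.example.invalid"


def build_alert_rule_request(rule: dict, token_id: str, secret: str) -> urllib.request.Request:
    """Prepare, but do not send, a POST /api/v1/alert-rules request.

    Basic auth with a token ID/secret pair is an assumption of this sketch.
    """
    creds = base64.b64encode(f"{token_id}:{secret}".encode()).decode()
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/alert-rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {creds}",
        },
        method="POST",
    )


rule = {
    "name": "High Error Rate: Android Mobile India",
    "metric": "error_rate",
    "operator": ">",
    "thresholdValue": 0.10,
    "minViews": 100,
    "severity": "high",
    "cooldownMinutes": 30,
    "recoveryBucketCount": 3,
    "windowMinutes": 15,
    "filters": [
        {"filterKey": "country", "filterValue": "IN"},
        {"filterKey": "device_type", "filterValue": "mobile"},
        {"filterKey": "os_name", "filterValue": "Android"},
    ],
}
req = build_alert_rule_request(rule, "TOKEN_ID", "SECRET")
# To send it: urllib.request.urlopen(req)
```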

rebuffer_ratio

Total time spent buffering during playback divided by total playback time. Stored as a decimal, not a percentage. 0.01 means 1%. Setting thresholdValue: 1 will compare against 100%, which never fires. Set thresholdValue: 0.01 for a 1% threshold.

| Healthy | Warning | Critical |
| --- | --- | --- |
| < 0.005 (0.5%) | 0.005 - 0.01 (0.5% - 1%) | > 0.01 (1%) |

Commonly cited industry guidance puts the production alert threshold at a 1% rebuffer ratio. Above 2%, sessions lose meaningful viewer trust.
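The decimal-versus-percent trap is easy to guard against in whatever tooling generates your rules. A minimal sketch, assuming you have buffering and playback time in milliseconds; `validate_ratio_threshold` is a hypothetical helper, not part of any FastPix SDK.

```python
def rebuffer_ratio(buffering_ms: float, playback_ms: float) -> float:
    """Decimal ratio of buffering time to playback time (0.01 == 1%)."""
    if playback_ms <= 0:
        raise ValueError("playback time must be positive")
    return buffering_ms / playback_ms


def validate_ratio_threshold(threshold_value: float) -> float:
    """Hypothetical guard against percent-style thresholds such as 1 for 1%."""
    if not 0 < threshold_value < 1:
        raise ValueError(
            f"thresholdValue={threshold_value} is not a decimal ratio; use 0.01 for 1%"
        )
    return threshold_value


# 6 s of buffering across 10 minutes of playback is a 1% rebuffer ratio.
ratio = rebuffer_ratio(6_000, 600_000)
validate_ratio_threshold(0.01)    # accepted
# validate_ratio_threshold(1)     # raises: 1 would mean 100%
```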

Example: alert when rebuffer ratio exceeds 1% on any device, with a 5-minute window for fast detection.

```json
POST /api/v1/alert-rules
{
  "name": "High Rebuffer Ratio",
  "metric": "rebuffer_ratio",
  "operator": ">",
  "thresholdValue": 0.01,
  "minViews": 50,
  "severity": "critical",
  "cooldownMinutes": 15,
  "recoveryBucketCount": 2,
  "windowMinutes": 5,
  "filters": []
}
```

video_startup_time

Time from the play event to the first rendered frame, measured on the client. Stored in milliseconds, not seconds: thresholdValue: 3 compares against 3 ms and fires on nearly every view, so set thresholdValue: 3000 for a 3-second threshold.

| Healthy | Warning | Critical |
| --- | --- | --- |
| < 2,000 ms | 2,000 - 3,000 ms | > 3,000 ms |

VST is the single signal most correlated with Exits Before Start. If you can only tune one startup alert, tune VST.
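A small conversion helper keeps second-based budgets from leaking into a millisecond field. This is a sketch: the 100 ms cut-off in `looks_like_seconds` is an arbitrary heuristic chosen for illustration, and the bands mirror the table above.

```python
def vst_threshold_ms(seconds: float) -> int:
    """Convert a startup budget in seconds to the millisecond thresholdValue."""
    return round(seconds * 1000)


def looks_like_seconds(threshold_value: float) -> bool:
    """Hypothetical guard: a VST threshold under 100 ms is almost certainly
    a seconds value typed into a millisecond field."""
    return threshold_value < 100


def vst_band(startup_ms: float) -> str:
    """Bands from the table above: <2,000 healthy, 2,000-3,000 warning, >3,000 critical."""
    if startup_ms < 2000:
        return "healthy"
    if startup_ms <= 3000:
        return "warning"
    return "critical"


threshold = vst_threshold_ms(3)   # 3000, the value to send as thresholdValue
```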

Example: alert when startup time exceeds 3 seconds on mobile, with a 15-minute window.

```json
POST /api/v1/alert-rules
{
  "name": "Slow Video Startup: Mobile",
  "metric": "video_startup_time",
  "operator": ">",
  "thresholdValue": 3000,
  "minViews": 100,
  "severity": "high",
  "cooldownMinutes": 30,
  "recoveryBucketCount": 3,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "device_type", "filterValue": "mobile" }
  ]
}
```

exit_before_video_start

Share of play events that produced zero frames viewed: the viewer pressed play and left without seeing a single frame, whether because startup was slow or because something failed. Stored as a decimal (0.10 = 10%).

| Healthy | Warning | Critical |
| --- | --- | --- |
| < 0.05 (5%) | 0.05 - 0.10 (5% - 10%) | > 0.10 (10%) |

EBS above 10% is a strong startup-failure signal. Above 20% is an emergency.
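Computing EBS for a window and placing it in the bands above takes only a few lines. A minimal sketch, assuming you have the frames-viewed count for each play attempt; the input shape is illustrative, not a FastPix data format.

```python
from typing import List


def exit_before_start_rate(frames_viewed_per_attempt: List[int]) -> float:
    """Decimal share of play attempts that ended with zero frames viewed."""
    if not frames_viewed_per_attempt:
        return 0.0
    exits = sum(1 for frames in frames_viewed_per_attempt if frames == 0)
    return exits / len(frames_viewed_per_attempt)


def ebs_band(rate: float) -> str:
    """Bands from the table above: <0.05 healthy, 0.05-0.10 warning, >0.10 critical."""
    if rate < 0.05:
        return "healthy"
    if rate <= 0.10:
        return "warning"
    return "critical"


# Hypothetical window: 100 play attempts, 12 of which never showed a frame.
attempts = [0] * 12 + [1_800] * 88
rate = exit_before_start_rate(attempts)
print(rate, ebs_band(rate))  # 0.12 critical
```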

Example: alert when EBS exceeds 10% across all viewers, with a 30-minute window.

json
1POST /api/v1/alert-rules
2{
3  "name": "High Exits Before Start",
4  "metric": "exit_before_video_start",
5  "operator": ">",
6  "thresholdValue": 0.10,
7  "minViews": 200,
8  "severity": "high",
9  "cooldownMinutes": 30,
10  "recoveryBucketCount": 3,
11  "windowMinutes": 30,
12  "filters": []
13}

video_startup_failure

Sessions where startup failed completely (the player initialized but never rendered a frame). Distinct from VST (slow startup) and EBS (viewer left during startup). This is the categorical "the stream did not start, full stop" signal. Stored as a decimal.
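One way to encode the distinction between the three startup signals is a per-attempt classifier. This is a sketch under assumptions: the inputs (`first_frame_ms`, `startup_error`) are hypothetical fields, not FastPix SDK fields, and whether FastPix also counts a failed start toward EBS is not stated here, so this sketch assigns each attempt to exactly one signal.

```python
from typing import Optional


def classify_startup(first_frame_ms: Optional[float], startup_error: bool,
                     budget_ms: float = 3000) -> str:
    """Label one play attempt with the startup signal it feeds (sketch).

    first_frame_ms: time from play to first rendered frame, or None if no
    frame ever rendered. startup_error: whether the player reported a failure
    before the first frame. Both fields are hypothetical.
    """
    if first_frame_ms is None:
        # No frame at all: either the stream failed to start, or the viewer
        # gave up while it was still starting.
        return "startup_failure" if startup_error else "exit_before_start"
    if first_frame_ms > budget_ms:
        return "slow_startup"  # counts against video_startup_time, not failure
    return "ok"
```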

| Healthy | Warning | Critical |
| --- | --- | --- |
| < 0.005 (0.5%) | 0.005 - 0.02 (0.5% - 2%) | > 0.02 (2%) |

Example: alert when startup failure exceeds 1% on iOS Safari, with a 15-minute window.

```json
POST /api/v1/alert-rules
{
  "name": "Startup Failure: iOS Safari",
  "metric": "video_startup_failure",
  "operator": ">",
  "thresholdValue": 0.01,
  "minViews": 100,
  "severity": "critical",
  "cooldownMinutes": 15,
  "recoveryBucketCount": 2,
  "windowMinutes": 15,
  "filters": [
    { "filterKey": "os_name", "filterValue": "iOS" },
    { "filterKey": "browser_name", "filterValue": "Safari" }
  ]
}
```

How FastPix custom alerts actually fire

Six mechanics distinguish FastPix's alert system from naive threshold tools that fire on noise.

minViews: the alert evaluates only when enough views have accumulated in the window. This prevents the classic false positive where 1 view with an error reads as 100% error rate. Set minViews based on traffic volume: 50 for small workloads, 200+ for high traffic.

operator: > (strict) or >= (inclusive) comparison against thresholdValue. Use > when you want the alert to fire on values strictly above the threshold; use >= when the threshold itself is unacceptable.

incident creation: when the threshold is breached and minViews is met, an incident opens. The incident records severity (low/medium/high/critical), the observed value at firing time, timestamp, and the matched filter combination. This gives on-call a structured handoff rather than a vague notification.

auto-recovery: when the metric drops below 50% of threshold for recoveryBucketCount consecutive windows, the incident auto-closes. No manual ack required. recoveryBucketCount: 3 means the metric has to stay healthy for 3 consecutive evaluation windows before the incident is marked resolved, which prevents flapping incidents.

cooldownMinutes: after a close, no new incident fires for the cooldown duration. This stops alert fatigue when a metric oscillates around the threshold. 15 minutes is reasonable for critical alerts; 30+ for high-severity alerts that need a longer settle.

windowMinutes: controls the evaluation window. Set windowMinutes: 0 for the fastest detection (the rule is evaluated as soon as minViews is reached); set windowMinutes: 5 to 60 for a fixed rolling window.
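The mechanics above combine into a small per-rule state machine. The sketch below is a simplified model of that lifecycle, not FastPix's evaluator: it treats each evaluation window as one bucket, and it assumes that under-sampled buckets are ignored entirely (they neither open an incident nor break a recovery streak).

```python
import operator
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Comparison operators the rule supports: > (strict) and >= (inclusive).
OPS = {">": operator.gt, ">=": operator.ge}


@dataclass
class AlertRule:
    threshold_value: float
    op: str = ">"
    min_views: int = 100
    recovery_bucket_count: int = 3
    cooldown_minutes: int = 30


@dataclass
class RuleState:
    incident_open: bool = False
    healthy_streak: int = 0
    cooldown_until: Optional[float] = None  # minute at which the cooldown ends
    events: List[Tuple[str, float]] = field(default_factory=list)


def evaluate_bucket(rule: AlertRule, state: RuleState,
                    minute: float, observed: float, views: int) -> None:
    """Apply one evaluation window's result to the rule's state."""
    if views < rule.min_views:
        return  # too few views: one errored view is not a 100% error rate
    if state.incident_open:
        # Auto-recovery: stay below 50% of the threshold for
        # recovery_bucket_count consecutive buckets.
        if observed < 0.5 * rule.threshold_value:
            state.healthy_streak += 1
            if state.healthy_streak >= rule.recovery_bucket_count:
                state.incident_open = False
                state.healthy_streak = 0
                state.cooldown_until = minute + rule.cooldown_minutes
                state.events.append(("closed", minute))
        else:
            state.healthy_streak = 0  # any unhealthy bucket resets the streak
        return
    if state.cooldown_until is not None and minute < state.cooldown_until:
        return  # still inside the cooldown after the last close
    if OPS[rule.op](observed, rule.threshold_value):
        state.incident_open = True
        state.events.append(("opened", minute))


# Usage: a 1% rebuffer_ratio rule evaluated on 5-minute buckets.
rule = AlertRule(threshold_value=0.01, min_views=50,
                 recovery_bucket_count=2, cooldown_minutes=15)
state = RuleState()
buckets = [  # (minute, observed rebuffer_ratio, views in bucket)
    (0, 0.004, 80), (5, 0.5, 1), (10, 0.015, 90), (15, 0.004, 85),
    (20, 0.006, 70), (25, 0.003, 95), (30, 0.002, 88),
    (35, 0.02, 120), (50, 0.02, 120),
]
for minute, observed, views in buckets:
    evaluate_bucket(rule, state, minute, observed, views)
print(state.events)  # [('opened', 10), ('closed', 30), ('opened', 50)]
```

The single-view bucket at minute 5 is skipped by minViews, the breach at minute 35 is suppressed by the 15-minute cooldown, and the breach at minute 50 opens a fresh incident.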

When an alert fires, on-call receives an email with a severity badge, the metric and observed value, the time, the filter combination that matched, the view count in the window, and direct links to view the incident in the FastPix dashboard or mute the alert for a defined window.

Filtering alerts by dimension

Every custom alert rule supports AND-logic filters across seven dimensions:

| filterKey | What it scopes | Example values |
| --- | --- | --- |
| country | ISO country code | "IN", "US", "BR" |
| os_name | Operating system | "Android", "iOS", "Windows" |
| cdn | CDN provider | "akamai", "cloudflare", "fastly" |
| player_name | Video player library | "videojs", "shaka", "fastpix" |
| device_type | Device category | "mobile", "desktop", "tv" |
| browser_name | Browser | "Chrome", "Safari", "Firefox" |
| video_id | Specific video identifier | your internal video ID |

All filters use the = (equals) operator. Multiple filters are combined with AND logic: every condition must match for a view to count toward the alert bucket.

Example: alert only for Chrome users on Android in India.

```json
"filters": [
  { "filterKey": "country", "filterValue": "IN" },
  { "filterKey": "os_name", "filterValue": "Android" },
  { "filterKey": "browser_name", "filterValue": "Chrome" }
]
```

When that specific combination breaches the threshold, the alert fires. Other device/browser/country combinations do not trigger the same rule. This is how you build alerts that catch real regressions on specific viewer segments without false positives from the rest of the audience.
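The AND-of-equalities rule is simple to reason about in code, which helps when predicting which views a rule will count. A minimal sketch: the view records are hypothetical dictionaries keyed by the filter dimension names, and exact, case-sensitive matching is an assumption.

```python
from typing import Dict, List


def matches_filters(view: Dict[str, str], filters: List[Dict[str, str]]) -> bool:
    """True if the view satisfies every filter (AND logic, equality only).

    An empty filter list matches every view, like "filters": [] in a rule.
    """
    return all(view.get(f["filterKey"]) == f["filterValue"] for f in filters)


filters = [
    {"filterKey": "country", "filterValue": "IN"},
    {"filterKey": "os_name", "filterValue": "Android"},
    {"filterKey": "browser_name", "filterValue": "Chrome"},
]
views = [  # hypothetical view records
    {"country": "IN", "os_name": "Android", "browser_name": "Chrome"},
    {"country": "IN", "os_name": "Android", "browser_name": "Firefox"},
    {"country": "BR", "os_name": "Android", "browser_name": "Chrome"},
]
in_bucket = [v for v in views if matches_filters(v, filters)]
print(len(in_bucket))  # 1
```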

Exception alerts: the second kind of FastPix alert

Custom alerts are opt-in. You configure them via the API. FastPix runs a second alerting layer that is opt-out: exception alerts.

Exception alerts fire automatically when FastPix detects unusual error rates at the video-title level. If a specific media asset is producing playback errors above the default threshold (for example, 30% playback failure on one video across 100 views), an exception alert fires without you configuring anything. On-call receives an email naming the affected video, the error type, the failure rate, and the affected viewer count.

This catches the failure mode that custom alerts miss: a single broken asset that does not move the aggregate enough to trigger a workspace-wide alert but is clearly broken for everyone who tries to play it. A 100% failure on 200 views of one video is a real incident, yet in a workspace with 100,000 daily views those 200 failures add only 0.002 (0.2%) to the workspace error rate. That is below even the 0.005 healthy threshold, so a workspace-level rule never sees it.
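The arithmetic behind that blind spot fits in a few lines. This sketch assumes the rest of the workspace is error-free and uses the article's example exception threshold (30% failure across at least 100 views) as if it were the default; your workspace's actual exception threshold may differ.

```python
def failure_rate(failed_views: int, total_views: int) -> float:
    """Decimal failure rate; 0.0 when there are no views."""
    return failed_views / total_views if total_views else 0.0


BROKEN_ASSET_VIEWS = 200        # every view of the broken video fails
WORKSPACE_DAILY_VIEWS = 100_000
WORKSPACE_HEALTHY = 0.005       # error_rate healthy threshold from this article
EXCEPTION_EXAMPLE = 0.30        # the article's example asset-level threshold
EXCEPTION_MIN_VIEWS = 100       # ...across at least 100 views

asset_rate = failure_rate(BROKEN_ASSET_VIEWS, BROKEN_ASSET_VIEWS)
# Assumes no other errors anywhere in the workspace.
workspace_rate = failure_rate(BROKEN_ASSET_VIEWS, WORKSPACE_DAILY_VIEWS)

workspace_rule_fires = workspace_rate > WORKSPACE_HEALTHY
asset_exception_fires = (asset_rate >= EXCEPTION_EXAMPLE
                         and BROKEN_ASSET_VIEWS >= EXCEPTION_MIN_VIEWS)
print(f"asset: {asset_rate:.1%}, workspace: {workspace_rate:.1%}")
print("workspace rule fires:", workspace_rule_fires)
print("asset exception fires:", asset_exception_fires)
```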

Exception alerts cover the asset-level failures that custom alerts cannot reach. Together, the two systems cover both workspace-wide degradation and per-asset breakage.

The 11 dashboard metrics for diagnosis (not paging)

The remaining 11 metrics are captured per session and surfaced in the FastPix Video Data dashboard. They are the signals you slice and dice when an alert fires, the columns you sort by, the dimensions you attribute root cause to. They are not paging-grade because they move for legitimate reasons that do not warrant an on-call response.

Startup dashboard metrics

Player startup time: time from page load to player ready state. Healthy under 800 ms. Above 1.5s usually points to JavaScript bundle bloat or render-blocking resources. Diagnostic for "is startup slow because of our app or because of the network?"

Connection setup time: DNS, TCP, TLS, and CDN edge selection before the first segment request. Healthy under 300 ms. Spikes usually mean CDN routing issues, stale DNS, or geographic blind spots. Track per region.

Time to first frame: the subset of VST that excludes player initialization. Useful for isolating whether a startup problem is player-side or pipeline-side.

Playback continuity dashboard metrics

Rebuffer frequency: number of distinct rebuffer events per minute of playback. A 0.5% rebuffer ratio split across twelve micro-stalls is worse than the same 0.5% in one longer stall. Frequency matters because each event resets viewer attention.

Rebuffer duration per event: how long each individual rebuffer event lasts. Sub-second stalls register as stutter; stalls above 2 seconds register as broken playback. Track p50 and p95 separately; p95 outliers drive complaints.

Stall rate: percentage of sessions that experience at least one stall. This differs from rebuffer ratio, which measures the share of playback time spent buffering; stall rate measures how many sessions are affected at all. A high stall rate with a low ratio means short but disruptive events are reaching a meaningful share of viewers.
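The three continuity metrics can disagree on the same data, which is exactly why they are worth tracking separately. The sketch below uses hypothetical sessions: twelve 250 ms micro-stalls and one 3-second stall produce the same 0.5% rebuffer ratio but very different frequencies. It uses a nearest-rank percentile for transparency; your dashboard may interpolate percentiles differently.

```python
import math
from typing import List, Tuple


def session_stats(playback_s: float, stalls_ms: List[float]) -> Tuple[float, float]:
    """Return (rebuffer_ratio, rebuffer events per minute) for one session."""
    ratio = sum(stalls_ms) / (playback_s * 1000)
    per_minute = len(stalls_ms) / (playback_s / 60)
    return ratio, per_minute


def percentile_nearest_rank(values: List[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100; simple and easy to audit."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical 10-minute sessions: (playback seconds, rebuffer durations in ms)
sessions = {
    "micro_stalls": (600, [250] * 12),   # twelve 250 ms stalls
    "one_long_stall": (600, [3000]),     # a single 3 s stall
    "clean": (600, []),
}

for name, (playback_s, stalls) in sessions.items():
    ratio, per_min = session_stats(playback_s, stalls)
    print(f"{name}: rebuffer_ratio={ratio:.3f}, events/min={per_min:.1f}")

stalled = sum(1 for _, stalls in sessions.values() if stalls)
stall_rate = stalled / len(sessions)          # share of sessions with any stall
durations = [d for _, stalls in sessions.values() for d in stalls]
p50 = percentile_nearest_rank(durations, 50)
p95 = percentile_nearest_rank(durations, 95)
```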

Video quality dashboard metrics

Average bitrate: mean delivered bitrate across the session, weighted by playback duration. Useful for verifying your ABR ladder is selecting the rungs you expected. A persistently low average bitrate on high-bandwidth networks signals a player tuning problem, not a network problem.

Downscaling and upscaling rates: percentage of session time where the delivered rendition's resolution was higher (downscaling) or lower (upscaling) than the player's display area. Downscaling means bandwidth was spent on detail the viewer's screen cannot show; upscaling means the viewer sees a stretched, softer picture than the display supports. Both are signals the ABR is misjudging conditions.

Frame drops: frames the player decoded but failed to render within their scheduled time. Sub-1% drop rate is normal. Above 3% sustained, viewers perceive choppiness, especially on motion-heavy content like sports.
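Two of these quality metrics are simple weighted ratios. A minimal sketch, assuming you have per-rendition playback seconds and decoded/dropped frame counts for a session; the input shapes are illustrative, not a FastPix export format. In the example, the time-weighted bitrate is 3.9 Mbps, while a naive average of the two rungs would report 3.0 Mbps.

```python
from typing import List, Tuple


def time_weighted_bitrate(segments: List[Tuple[float, float]]) -> float:
    """Mean bitrate weighted by playback seconds spent on each rendition.

    segments: (seconds_played, bitrate_bps) pairs for one session.
    """
    total_seconds = sum(seconds for seconds, _ in segments)
    if total_seconds == 0:
        return 0.0
    return sum(seconds * bps for seconds, bps in segments) / total_seconds


def frame_drop_pct(frames_decoded: int, frames_dropped: int) -> float:
    """Dropped frames as a percentage of decoded frames."""
    return 100 * frames_dropped / frames_decoded if frames_decoded else 0.0


# Hypothetical session: 60 s on a 1.5 Mbps rung, then 240 s on 4.5 Mbps.
avg_bps = time_weighted_bitrate([(60, 1_500_000), (240, 4_500_000)])
# 15 minutes at 30 fps = 27,000 frames, 540 of them dropped.
drops = frame_drop_pct(27_000, 540)
```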

Engagement and composite dashboard metrics

Concurrent stream count: specific to live workloads. Unique viewers connected at a given moment. Useful as a denominator for live-event QoE aggregations and as a leading indicator for capacity events.

Watch time: total seconds watched in a session. Track against total video duration to get a normalized completion view. Watch time is noisier than the alert-worthy metrics because content quality is a confound: a great video keeps people watching regardless of QoE.

QoE Score and Watch time: FastPix's composite signals

FastPix publishes a composite QoE Score (0-100) computed from three component groups: stability (rebuffer-related signals), render quality (frame drop and downscaling signals), and startup (VST and EBS signals). The score is useful for exec reporting where leadership wants a single number to track quarter-over-quarter. It is not useful for engineering debugging because the rollup hides which underlying metric actually moved.

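Practical rule: track the QoE Score on exec dashboards and alert on the five category-specific signals (error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure). When the QoE Score drops without a category alert firing, either your thresholds are tuned too loose or the drop is coming from render-quality signals (frame drops, downscaling) that have no custom alert; check the component breakdown before retuning.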

Watch time, the second signal here, is a raw quantity rather than a composite: it is the total number of seconds watched per session. Unlike completion rate, it does not depend on video duration. Normalize it against total duration for completion percentages, or against viewer count for average engagement per video.
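Deriving the normalized views from raw Watch time is straightforward. A minimal sketch with hypothetical per-session values; capping completion at 100% is a design choice of this sketch, since rewatching or seeking back can push raw watch time past the video's duration.

```python
from statistics import mean


def completion_pct(watch_seconds: float, duration_seconds: float) -> float:
    """Watch time normalized by video duration, capped at 100%."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return min(100.0, 100 * watch_seconds / duration_seconds)


DURATION_S = 1_200                        # hypothetical 20-minute video
watch_times = [300, 600, 1_200, 1_500]    # hypothetical per-session watch time (s)

completions = [completion_pct(w, DURATION_S) for w in watch_times]
avg_watch_s = mean(watch_times)           # average engagement per session
total_watch_hours = sum(watch_times) / 3600
```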

Per-context calibration: mobile, desktop, CTV, live, VOD

QoE thresholds are not universal. The same 2,500 ms VST is acceptable on a smart TV warming up its decoder and intolerable on a mobile app where users expect instant playback.

Mobile users have the lowest patience and the highest network variance, a brutal combination. CTV users tolerate longer startup but expect zero rebuffering once playback begins. Desktop is the most forgiving baseline. Live workloads cannot recover from buffer underruns the way VOD can: a 2-second stall on VOD is degraded; the same stall on live sports is the difference between watching the goal and watching the celebration. Live alerting thresholds should be roughly 2x stricter than VOD.

| Context | Acceptable VST | Acceptable rebuffer_ratio |
| --- | --- | --- |
| Mobile VOD | < 2,000 ms | < 0.008 (0.8%) |
| Desktop VOD | < 2,500 ms | < 0.010 (1.0%) |
| CTV VOD | < 3,500 ms | < 0.005 (0.5%) |
| Mobile live | < 1,500 ms | < 0.003 (0.3%) |
| CTV live (sports) | < 2,000 ms | < 0.002 (0.2%) |

The clean way to enforce this is one alert rule per context, with filters that scope the rule to that audience. A mobile VOD rule, for example, combines a device_type: mobile filter with a video_id filter for a VOD asset. Because filters match on equality and combine with AND, covering several video IDs takes one rule per ID.
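Generating those per-context rules from the calibration table keeps the thresholds in one place. This is a sketch: the minViews, severity, cooldownMinutes, recoveryBucketCount, and windowMinutes values are placeholders, the video ID is hypothetical, and "tv" is used as the device_type value for CTV, following the filter table's example values.

```python
from typing import Dict, List

# Calibration targets from the table above: (device_type, rebuffer_ratio, VST ms).
CALIBRATION = {
    "Mobile VOD": ("mobile", 0.008, 2000),
    "Desktop VOD": ("desktop", 0.010, 2500),
    "CTV VOD": ("tv", 0.005, 3500),
    "Mobile live": ("mobile", 0.003, 1500),
    "CTV live (sports)": ("tv", 0.002, 2000),
}


def rules_for_context(context: str, video_ids: List[str]) -> List[Dict]:
    """Build rebuffer_ratio and video_startup_time rules for one context.

    Filters are equality-only and AND-combined, so each video_id gets its
    own pair of rules. Timing and severity values are placeholders.
    """
    device_type, ratio, vst_ms = CALIBRATION[context]
    rules = []
    for video_id in video_ids:
        for metric, threshold in (("rebuffer_ratio", ratio),
                                  ("video_startup_time", vst_ms)):
            rules.append({
                "name": f"{metric}: {context} ({video_id})",
                "metric": metric,
                "operator": ">",
                "thresholdValue": threshold,
                "minViews": 100,
                "severity": "high",
                "cooldownMinutes": 30,
                "recoveryBucketCount": 3,
                "windowMinutes": 15,
                "filters": [
                    {"filterKey": "device_type", "filterValue": device_type},
                    {"filterKey": "video_id", "filterValue": video_id},
                ],
            })
    return rules


# Usage with a hypothetical live event ID.
live_rules = rules_for_context("Mobile live", ["live-match-42"])
```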

Get started with FastPix custom alerts

| Item | Value |
| --- | --- |
| Supported metrics | error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure |
| Supported filters | country, os_name, cdn, player_name, device_type, browser_name, video_id |
| Operators | `>` and `>=` |
| Severity levels | low, medium, high, critical |
| Free tier | up to 100,000 streaming views per month, no credit card required |
| Create alert rules | `POST /api/v1/alert-rules` |
| List open incidents | `GET /api/v1/incidents?alert_type=custom` |
| Exception alerts | run automatically at the video-title level, no setup required |

Signup is self-serve. Drop in the FastPix Video Data SDK on Web, iOS, or Android, send the first session events, then create your first alert rule via the API. Each metric maps to one rule per filter combination, so a "mobile India Chrome on Android" rule and a "mobile Brazil Chrome on Android" rule are two separate alert rules that can have different thresholds and severities.

FAQ

Why does FastPix support custom alerts on only 5 metrics?

Because picking what to page someone on is a product decision. Alerting on every metric the SDK captures burns out the on-call rotation: each false page costs roughly 12 minutes of focus, so a noisy rule that fires four times a week costs more than an engineering day per quarter. The five FastPix supports (error_rate, rebuffer_ratio, video_startup_time, exit_before_video_start, video_startup_failure) correlate directly with viewer drop-off and have well-defined thresholds. The other 11 metrics are valuable for diagnosis and exec reporting but are noisier signals that move for legitimate reasons that do not warrant paging.

What units does FastPix use for video_startup_time and rebuffer_ratio?

video_startup_time is stored in milliseconds. Set thresholdValue: 3000 for a 3-second alert, not thresholdValue: 3. rebuffer_ratio is stored as a decimal (0.01 = 1%). Set thresholdValue: 0.01 for a 1% alert, not thresholdValue: 1.

What's the difference between FastPix QoE Score and VMAF?

VMAF (Video Multimethod Assessment Fusion) is the Netflix-developed perceptual quality metric scored 0-100, originally built for evaluating VOD encodes. It is a full-reference metric: it compares an encoded video against its source. FastPix uses its own composite called QoE Score (also 0-100) computed from stability metrics (rebuffer-related), render quality metrics (frame drop and downscaling), and startup metrics (VST and EBS). The QoE Score is built specifically for live and VOD streaming QoE rather than codec-level perceptual quality, and is what shows up in the FastPix dashboard's executive view.

What is the difference between Watch time and completion rate?

Completion rate is the percentage of content consumed, typically calculated as watched seconds divided by total video duration. Watch time is the raw count of seconds watched. FastPix stores Watch time as the underlying metric; you derive completion percentages from it by dividing against total duration. Watch time is the more flexible primitive because the same value supports completion rate, average engagement per viewer, and total watch hours across an audience.

What's the difference between QoE and QoS?

QoS describes the delivery infrastructure (bandwidth, latency, packet loss). QoE describes what the viewer experiences (startup, stalls, fidelity). QoS measures causes; QoE measures the outcome the viewer sees. Perfect QoS can still produce degraded QoE if the player or encoding is misconfigured.

Do QoE thresholds differ for live vs VOD?

Yes. Live should be roughly 2x stricter on rebuffer_ratio because viewers cannot rewind past a stall. For mobile live, target sub-0.3% rebuffer_ratio (thresholdValue: 0.003) and sub-1,500 ms startup. Live sports on CTV is stricter still: target sub-0.2% rebuffer_ratio (thresholdValue: 0.002).

What happens when an alert fires?

On-call receives an email with severity badge, the metric and observed value, the timestamp, the filter combination that matched, the view count in the window, and links to view the incident in the FastPix dashboard or mute the alert. The incident is created in FastPix's incident store and remains open until the metric drops below 50% of threshold for recoveryBucketCount consecutive windows, at which point it auto-closes. After close, the cooldownMinutes window prevents a new incident from firing until the cooldown expires.

Can I use FastPix Video Data without buying FastPix Live Streaming or VOD?

Yes. The Video Data SDK works with any player and any video infrastructure. Drop the SDK into your existing HLS.js, AVPlayer, ExoPlayer, or Shaka player, point it at your FastPix workspace, and you get the five custom alerts plus the 11 dashboard metrics with no migration of your live or VOD pipeline.

How does FastPix's exception alert work?

Exception alerts run automatically at the video-title level. FastPix watches per-asset error rates against a default threshold (configurable per workspace); when one asset exceeds the threshold across a meaningful view count, an exception alert fires regardless of whether you configured a custom alert. This catches the per-asset failures that workspace-level custom alerts cannot see, like a single broken video producing 100% errors that adds only 0.2% to the workspace error rate.

Author
Hema Gowtham R, Software Engineer
