NAB 2025: OTT at YouTube Scale

- Live Workflow - Sean McCarthy
- Live Scale, Quality and Latency - Kirk Haller
- Synthetic Experience Metrics - Chas Mastin
- Stable Volume and Industry Consensus - Steven Robertson

Live Workflow - Sean McCarthy

OTT Purpose
Provide a personalized destination for transformative media experiences, connecting users with their passions on a global scale.

Team Mission
Deliver the highest quality and most reliable premium live video streams to enable YouTube products.

OTT-Specific Infrastructure
[Diagram: end-to-end pipeline. On the venue/partner side, studio gear at the stadium feeds the Switch, SLURP, and TekProbe agent (PRODUCTION); TRANSPORT carries the signal over SRT Tx/Rx or direct fiber into YouTube/Google INGESTION (AS65641, B2/B4). TRANSCODING (Argos, Chunk) encodes and transcodes the video; CORE STREAMING and EDGE STREAMING (Bandaid edge cache) feed VIDEO DISTRIBUTION over Google peering (AS15169) and edge caches to ISPs, ending with PLAYBACK on the YT client via the user's ISP.]

An OTT Timeline
- 2015: First OTA antenna on the HQ roof. "When YouTube TV started out, a group of engineers climbed onto the roof of YouTube headquarters while holding an antenna in order to build a prototype" (blog)
- 2016: Covered 3 DMAs with head-ends in CBF and AUS; first NOC established; 24x7 signal ingestion
- 2017: YouTube TV launches in 5 DMAs (blog)
- 2018: Marquee monitoring starts; EOG expands to adjust schedules for EPG/DVR accuracy; PSold DAI launches; YouTube TV expands to 100 DMAs; DAI launches on YTTV; 750 linear channels offered

Growing the Service
- 2019: YouTube TV goes nationwide (blog) and crosses 1 million subs; MLB GOTW launches
- 2020-2021: PTC launches (blog)
- 2022: YTV crosses 5 million subs; first SRT partner launches (beIN Sports); Paulisto launches (blog)

Service Continues to Grow
- 2023+: NFL Sunday Ticket launches (blog); Multiview launches

Legacy Infrastructure
- Infrastructure sites: 3 NHEs and 8 RHEs, plus 12 IHEs (international)
- 101 OTA sites
- 30+ direct fiber installations
- 270+ network switches
- 215+ IRDs
- 3 satellite sites
- 200 appliance encoders
US Infrastructure Convergence
- Converge 300+ sites to 8: 6 Super Headends (SHEs) and 2 National Headends
- Improved quality and reliability
- Significantly less maintenance
- Lower infrastructure management costs
- The SHE is a stopgap to reduce complexity

SHE Design - High Level
- 6 geo-diverse PoPs
- 2 PoPs (one cluster) per stream
- 2 zones per PoP
- 4 redundant copies per stream (see the availability sketch after the SHE assessment)
- High-capacity throughput
- Fiber circuits from media companies and 3PAs

Inside a PoP - Spine-Leaf Architecture
- Redundant X and Y paths
- Border leafs for external connections (3PAs, cross-connects)
- Access leafs for internal devices (monitoring, normalization, etc.)
- Spine nodes connect all leaf nodes as well as the border routers
- Border routers are directly connected to the Google production network

SHE Assessment
The good:
- Maintainable and supportable, heterogeneous
- Lots of headroom in capacity, bandwidth, and server expansion
- Extremely redundant and able to support high-quality streams
- Fully capable of video normalization/processing via third-party devices
The not-so-good:
- Expensive and lengthy build-outs (18-24 months from design to install)
- Non-elastic, finite scaling; difficult manual capacity planning
- Fiber ingest only
- Difficult to monitor
- Strong network engineering expertise required for design, implementation, and maintenance
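To make the redundancy story concrete, here is a minimal back-of-the-envelope sketch of what four copies buy, assuming independent copy failures and illustrative per-copy availability figures (neither assumption comes from the talk; correlated failures such as a shared fiber cut would reduce the benefit):

```python
# Rough availability math for the SHE design above: 2 PoPs per stream,
# 2 zones per PoP -> 4 redundant copies of each stream. Assumes copy
# failures are independent, which overstates availability when failures
# are correlated (e.g., one fiber cut feeding both zones of a PoP).

def stream_availability(copy_availability: float, copies: int = 4) -> float:
    """Probability that at least one of `copies` redundant copies is up."""
    return 1.0 - (1.0 - copy_availability) ** copies

if __name__ == "__main__":
    for a in (0.99, 0.999):  # illustrative per-copy availabilities
        print(f"per-copy {a} -> stream {stream_availability(a):.12f}")
```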
YTTV Acquisition Landscape

MPEG-TS
- Pros: most widely used in broadcast; typically high throughput and low latency; easy for multi-partner distribution
- Cons: physical vulnerabilities (fiber cuts); not supported in cloud environments (can explore direct connect with multicast support); expensive, fixed bandwidth; often multicast only

SRT
- Pros: internet native; resilient (FEC); open source; strong ecosystem support and tooling; strong adoption for live events, growing support for linear; low latency
- Cons: more difficult to implement server-side than HTTP; multiple modes/implementations; more expensive and difficult to scale; no video encryption; although internet native, works better with dedicated direct connects

CMAF over CDN for Syndication
- Pros: passthrough opportunity for distributors (no transcoding required); new codecs already supported (AV1); full DRM support; extremely cost-effective and simple to implement; can converge D2C and partner distribution processing; better geo-targeting/content-replacement control
- Cons: CMAF packaging not ideal for re-transcoding; more latency

Media over QUIC (MoQ)
- Pros: low latency with a tunable target latency; no head-of-line blocking; better congestion control and throughput; multi-format support; secure and scalable; able to advertise media metadata
- Cons: new emerging standard, low adoption
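To make the contrast between the first two acquisition paths concrete, here is a minimal sketch pulling a contribution feed over multicast MPEG-TS versus SRT with ffmpeg; the multicast group, hostname, and port are placeholders, and it assumes an ffmpeg build with libsrt:

```python
# Minimal sketch of the two classic acquisition paths above.
# All addresses and ports are hypothetical placeholders.
import subprocess

# MPEG-TS over multicast: the fixed-bandwidth, fiber-delivered path.
mpegts_cmd = [
    "ffmpeg", "-i", "udp://239.1.1.1:5000",  # placeholder multicast group
    "-c", "copy", "-f", "mpegts", "feed_mpegts.ts",
]

# SRT in caller mode: the internet-native path with retransmission.
srt_cmd = [
    "ffmpeg", "-i", "srt://ingest.example.com:9000?mode=caller",  # placeholder
    "-c", "copy", "-f", "mpegts", "feed_srt.ts",
]

# Run one of the two pipelines (requires ffmpeg on PATH; mpegts_cmd is
# the multicast alternative).
subprocess.run(srt_cmd, check=True)
```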
The Metadata Challenge

How TV metadata is shared today: SCTE 224 (event-driven ESNI; flexible metadata; real-time signaling)
- Can be expensive
- The main API is based on a linear data model, which is not great for discrete live events
- New services and providers aren't in the ecosystem
- Time to propagate data and changes

Candidate directions: a registry for a universal stream identifier, OR metadata reconciliation, OR a schema.org open spec.
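As a sketch of what the schema.org direction might look like, here is a hypothetical JSON-LD record for one discrete live event. BroadcastEvent is a real schema.org type; the universalStreamId property is invented here to illustrate the registry idea and is not part of any spec:

```python
import json

# Hypothetical metadata record for one discrete live event, expressed as
# schema.org JSON-LD. The "universalStreamId" field is an invented
# illustration of the stream-identifier registry idea from the slide.
event = {
    "@context": "https://schema.org",
    "@type": "BroadcastEvent",
    "name": "Example Cup Final",
    "isLiveBroadcast": True,
    "startDate": "2025-04-06T19:00:00Z",
    "endDate": "2025-04-06T22:00:00Z",
    "videoFormat": "HD",
    "universalStreamId": "ustream:example:cup-final-2025",  # hypothetical
}

print(json.dumps(event, indent=2))
```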
Observability Challenge

Challenges:
- Correlating ingestion events with QoE playback issues
- Monitoring the bitflow of a channel/stream (distributed request tracing)
- Differentiating component-, transport-, and content-level issues
- Real-time raw log collection

Goals:
- Real-time, high-volume, high-dimensionality, high-cardinality data pipelines (sub-30-second)
- Fast querying
- Curated, well-understood metrics for operations
- End-to-end observability
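A minimal sketch of the first challenge, correlating ingestion events with playback QoE: join the two log streams on channel ID within a time window. All field names and records below are made up for illustration, not YouTube's schemas:

```python
# Toy join of ingest-side events with playback QoE beacons by channel and
# time window, asking "did this ingest anomaly line up with a QoE dip?"
from datetime import datetime, timedelta

ingest_events = [
    {"channel": "ch42", "ts": datetime(2025, 4, 6, 19, 0, 12),
     "event": "srt_retransmit_spike"},
]
qoe_beacons = [
    {"channel": "ch42", "ts": datetime(2025, 4, 6, 19, 0, 25), "rebuffer_ms": 1800},
    {"channel": "ch42", "ts": datetime(2025, 4, 6, 19, 5, 0), "rebuffer_ms": 0},
]

WINDOW = timedelta(seconds=30)  # correlation window

for ev in ingest_events:
    hits = [b for b in qoe_beacons
            if b["channel"] == ev["channel"]
            and abs(b["ts"] - ev["ts"]) <= WINDOW
            and b["rebuffer_ms"] > 0]
    print(ev["event"], "->", len(hits), "correlated QoE incident(s)")
```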
A Need for End-to-End Monitoring
[Diagram: pipeline Acquisition → Normalization → Transcoding → Live Origin/Packager → CDN → Client, with NUCLEON, CMS, and DRM alongside. Per-stage signals include: network load/throughput metrics; PCaps; Zix/SRT ingestor metrics; DCM; Slurp logs; VI access logs for OTT streams; VI pre-defined QoE metrics (not raw beacons); metrics/alarms; and license and key request access logs.]

Quality Real Time Video at Scale - Kirk Haller

YT Live
[Diagram: the same end-to-end pipeline shown in the Live Workflow section, from venue production through transport, ingestion, transcoding, core/edge streaming, and distribution to client playback.]

Diversity and Scale
- YouTube Live spans casual mobile to professional 4K HDR; streams range from mobile to gaming and from desktop to broadcast.
- Beyond YouTube TV, sports are a big international presence on YouTube.
- YouTube live streams can be running for years.
Quality
- Video quality drives viewer engagement.
- In 2024, Living Room accounted for 50% of Coachella's total livestream watch time, the highest of any year to date.
- In 2025, we will also be using "Watch With", allowing creators to comment on the stream.

Quality - Measuring Audio/Video Quality
- In 2022, YouTube and Google Research open-sourced UVQ.
- UVQ is an ML model trained on Mean Opinion Scores (MOS).
- It is based on content, distortion, and compression data.
- It is a "no reference" metric.

Quality - Measuring Audio/Video Quality

  MOS score on TV | Satisfaction | Resolution
  4.17            | 95%          | 1080p
  4.0             | 95%          | 720p

  UVQ score  | Perceptual quality
  [1.0, 3.5) | relatively low
  [3.5, 4.1) | fair
  [4.1, 5.0] | relatively high

A 0.05-0.1 UVQ delta corresponds to a just noticeable difference (JND).

Quality - Measuring Bad Minutes
- Use sampling with UVQ.
- Aggregate by minute.
- Count bad minutes (a sketch follows below).
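A minimal sketch of the bad-minutes recipe above, assuming sampled (timestamp, UVQ) pairs; treating a minute whose mean UVQ falls below 3.5 as bad is my reading of the banding table above, not something the slide states:

```python
# Bad minutes from sampled UVQ scores: bucket samples by minute, average,
# and count minutes below a quality threshold. The 3.5 cutoff is an
# assumption taken from the UVQ table above ("relatively low" band).
from collections import defaultdict

BAD_THRESHOLD = 3.5

def count_bad_minutes(samples: list[tuple[float, float]]) -> int:
    """samples: (unix_seconds, uvq_score) pairs from sparse sampling."""
    by_minute: dict[int, list[float]] = defaultdict(list)
    for ts, uvq in samples:
        by_minute[int(ts // 60)].append(uvq)
    return sum(
        1 for scores in by_minute.values()
        if sum(scores) / len(scores) < BAD_THRESHOLD
    )

samples = [(0, 4.3), (20, 4.2), (65, 3.1), (80, 3.2), (130, 4.4)]
print(count_bad_minutes(samples))  # -> 1 (only the second minute is bad)
```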
Latency - The Tradeoff
Stream latency trades off against video quality and playback quality.

Latency - Personalization
Adjust the tradeoff to fit the viewer. Signals: content nature; viewer signals (interaction, preference); client bandwidth; network health; stream health. (A hypothetical policy is sketched below.)
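As a sketch of how such personalization might look, here is a hypothetical policy mapping a few of the signals above to a target latency; the tiers and rules are invented for illustration, not YouTube's:

```python
# Hypothetical target-latency policy driven by the signals above.
# The latency tiers and decision rules are illustrative only.

def target_latency_s(interactive: bool, bandwidth_mbps: float,
                     network_healthy: bool) -> float:
    """Pick a target stream latency (seconds) for one viewer."""
    if interactive and network_healthy and bandwidth_mbps > 10:
        return 2.0   # highly interactive viewing: favor low latency
    if network_healthy:
        return 8.0   # typical live: balance latency against quality
    return 20.0      # unstable network: deep buffer protects playback

print(target_latency_s(True, 25.0, True))    # -> 2.0
print(target_latency_s(False, 5.0, False))   # -> 20.0
```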
Synthetic Experience Metrics - Chas Mastin

Playback Experience (Px): The Best Experience Per Bit

People, Product, Process

How do you improve experience for 2B users?

The Naive View of QoE
Naive QoE rank (me, circa 2020):
1. Video start failures
2. Playback failures
3. Rebuffering
4. Bitrate
5. Start latency
6. Black screens, A/V sync, long-tail issues

The Naive View of QoE
QoE rank (me, circa 2025):
1. Start latency (survival style)
2. Everything else

The Naive View of QoE
[Diagram: QoE data is emitted from the device (OS, client, player, media) at the end of a chain running from streaming systems (transcoders, CDN, etc.) through the internet and ISP. The device buffer is the source of most fixable issues.]

Synthetic Experience Metrics
A Synthetic Experience Metric is a metric built out of QoE metrics, time-normalized in some way to allow comparison and improvement. It can combine multiple categories of QoE, or show the likelihood of a bad experience.
- Simple: % of video starts > 1000 ms, 7-day aggregate (a sketch follows below)
- Complex: survival-based
- Hire a data scientist to build these.
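A minimal sketch of the "simple" variant, reading "7DA" as a trailing 7-day aggregate (my interpretation); the records and fields are illustrative:

```python
# Simple synthetic experience metric: % of video starts slower than
# 1000 ms over a trailing 7-day window.
from datetime import datetime, timedelta

def pct_slow_starts(starts: list[tuple[datetime, float]],
                    now: datetime,
                    threshold_ms: float = 1000.0) -> float:
    """starts: (timestamp, startup_latency_ms) pairs for each playback."""
    window = [ms for ts, ms in starts if now - ts <= timedelta(days=7)]
    if not window:
        return 0.0
    return 100.0 * sum(ms > threshold_ms for ms in window) / len(window)

now = datetime(2025, 4, 7)
starts = [(now - timedelta(days=1), 640.0),
          (now - timedelta(days=2), 1450.0),
          (now - timedelta(days=30), 3000.0)]  # outside the 7-day window
print(pct_slow_starts(starts, now))  # -> 50.0
```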
Synthetic Experience Metrics - What is an SLO?
SLO: a service level objective.

Synthetic Experience Metrics
- Metric SLOs either protect user experience or let you set improvement goals.
- Rebuffering is an excellent metric to set defensive SLOs on, but it is difficult to set improvement goals on, due to changes in audience and networking conditions. You end up spending a lot of time comparing things by hand, which is not a great use of engineering time.
- MTBR (mean time between rebuffers) is slightly better.
- Likelihood to rebuffer within x minutes is the best.

Synthetic Experience Metrics, p2
- Metrics based on survival curves let you make intelligent tradeoffs with other defensive metrics.
- Survival metrics measure the likelihood of surviving a bad-user-experience event over time (latent startup event, rebuffer, fatal error).
- Create a human-perceptible threshold: > 1 s is "slow".
- The % of playbacks per day going over that threshold gives the rate of sessions with a slow playback. (See the sketch below.)
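A minimal sketch of an empirical survival curve for startup latency, using the 1 s "slow" threshold from the slide. No censoring is handled here; a production version would want proper survival analysis (and, per the earlier slide, a data scientist):

```python
# Empirical survival curve for startup latency: S(t) = fraction of
# playbacks that have NOT started by time t. The value at the 1 s
# threshold is the slow-playback rate described above.

def survival(latencies_ms: list[float], t_ms: float) -> float:
    """Fraction of playbacks whose startup exceeded t_ms."""
    return sum(l > t_ms for l in latencies_ms) / len(latencies_ms)

latencies = [320, 480, 650, 900, 1200, 2500, 7000]  # illustrative sample
for t in (500, 1000, 2000):
    print(f"S({t} ms) = {survival(latencies, t):.2f}")
# S(1000 ms) here is 3/7 ~= 0.43: the "sessions with a slow playback" rate.
```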
It's Latency All the Way Down
- Latency is the best proxy for experience, and the secret to improving the long tail, both on-device and server-side.
- You don't need survival metrics; you can set an appropriate user threshold on % of playbacks (i.e., % of playbacks that start > 1000 ms).
- You may find that diagnosing shifts in your latency metrics takes a lot of engineering time, so tooling becomes important. There are opportunities for AI analysis of trends.
- Problems will show up in latency before they show up in experience. Improvements will make your product magic.
- 1) Set improvement OKRs. 2) Task engineers with improving performance, not just building features.

Experience Codecs
Start fast, play on. Not everything has a clear user experience metric, e.g., audio quality.

Stable Volume: Viewers vs Creators - Steven Robertson

Loudness management is an accessibility feature.
Loudness variation was a top user complaint; the reference is -14 dB on the Main App.

Stable Volume
- Decreased viewer complaints; increased overall usage.
- But: loud ads = less revenue, and increased creator complaints.
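As a sketch of the viewer-side arithmetic, here is a minimal static normalization toward the slide's -14 dB reference. The attenuate-only rule is a common loudness-management convention, and real Stable Volume is a closed-loop, client-controlled DRC rather than a single static gain, so this only illustrates the reference-level idea:

```python
# Static loudness normalization toward the -14 dB reference from the
# slide. Attenuate-only: loud content is turned down, quiet content is
# left alone. Stable Volume itself is dynamic and client-controlled;
# this sketch shows only the static reference-level concept.

TARGET_DB = -14.0

def playback_gain_db(measured_db: float) -> float:
    """Gain to apply so content measured at measured_db plays near TARGET_DB."""
    gain = TARGET_DB - measured_db
    return min(gain, 0.0)  # never boost, only attenuate

for loudness in (-6.0, -14.0, -20.0):  # e.g. loud ad, on-target, quiet VOD
    print(f"measured {loudness:+.1f} dB -> gain {playback_gain_db(loudness):+.1f} dB")
```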
Why do viewers and creators disagree?

"I think this is wrong. I think this is scurvy."

The cure for scurvy has been found several times. The cure for scurvy has been lost several times.

"The lime juice issued by the British was almost totally ineffective, probably because it came into contact with copper (which oxidizes vitamin C) when it was manufactured." (https://www.md-a.co/p/solving-scurvy)

"This is because, at the same time, the advent of the steam engine made voyages much shorter, meaning that sailors usually no longer spent enough time between ports to develop scurvy." (https://www.md-a.co/p/solving-scurvy)

Context is key. Identify the empathy gap.

Viewers vs creators: a false choice
- Personal, transparent, user-controllable.
- Client-controlled DRC is not new (especially in Living Room).
- Closed-loop iteration enables a more complete solution. We'll keep iterating.

Thank you!