BEST PRACTICES FOR ENCODING H.264 AND HEVC
Jan Ozer, Streaming Learning Center

Best Practices for H.264 and HEVC Encoding

H.264
- Choosing the optimal GOP size
- Benefits of a variable GOP
- Bitrate control
- Choosing a preset
- Choosing the optimal thread count
- Best AWS CPU

HEVC
- Choosing the optimal GOP size
- Benefits of a variable GOP
- Bitrate control
- Choosing a preset
- Choosing the optimal thread count
- Working with Wavefront Parallel Processing

Both
- Optimizing scaling for lower rung production

Fundamentals
- Top rung target quality
  - Premium content: 93-95 VMAF
  - UGC: 85-92 VMAF
- Getting there:
  - Choose configuration options
  - Adjust bitrate to hit the target
- VMAF
  - Measure harmonic mean
  - And low-frame (potential for transient quality problems)
- Just noticeable difference (how much does a difference matter?)
  - More than 50% of viewers notice a 3-point VMAF difference
  - Most differences discussed here will be much less
  - Still, .4 VMAF here, .6 there, pretty soon you're close to a JND
  - Plus the target is 95 (or whatever)
  - After a few adjustments, you will have to increase the bitrate to achieve the target (boosting your bandwidth costs)

H.264 Agenda
- Choosing the optimal GOP size
- Benefits of a variable GOP
- Bitrate control
- Choosing a preset
- Choosing the optimal thread count
- Best AWS CPU

Uber Best Practice
- Content Adaptive Encoding is the ultimate best practice
- None of the techniques discussed herein can touch CAE as an optimization technique

Best Practice 1: Choose Longest Possible GOP Size
- What: GOP size (I-frame interval) is a key config option in all encodes
- Historical
  - Very small (like .5 second) for MPEG-2
  - Very long (10-20 seconds) for downloaded video
  - Typically 2-5 seconds for adaptive bitrate video
  - Must divide evenly into segment size
- Question: How does GOP size impact quality?
- Test: 13 files in 4 categories (Entertainment, Sports, Animation, Office)

Best Practice 1: Longer is Better
- Benefit significant at lower range
- Then diminishing returns
- Key limit: must divide evenly into segment size
  - 10 seconds: 1/2/5/10
  - Why not try 10? Check for playability
- Synthetic clips (screencam, PPT) most susceptible

BP2: Consider Variable GOP Sizes
- So, longer GOP + GOPs at scene changes
- Need packager/player compatible with variable segment sizes
- https://bit.ly/variable_GOP
- Meta's David Ronca: "The optimal GOP size is aligned to the encoder's placement of intra frames with a max spacing between 5-10 seconds. That is, let the encoder decide as much as possible."
- I-frames at scene changes boost the low-frame score
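The "must divide evenly into segment size" limit is easy to check mechanically. The sketch below is illustrative only: the function names are hypothetical, it considers whole-second GOP durations, and it assumes the frame count is what you would pass to FFmpeg's -g option.

```python
def valid_gop_seconds(segment_seconds):
    """Whole-second GOP durations that divide evenly into the segment."""
    return [g for g in range(1, segment_seconds + 1) if segment_seconds % g == 0]


def gop_frames(gop_seconds, fps):
    """Keyframe interval in frames for a GOP duration (assumed to feed -g)."""
    return round(gop_seconds * fps)


if __name__ == "__main__":
    # A 10-second segment allows GOPs of 1, 2, 5, or 10 seconds.
    for seconds in valid_gop_seconds(10):
        print(seconds, "s GOP ->", gop_frames(seconds, 30), "frames at 30 fps")
```

For a 10-second segment this yields the 1/2/5/10 options listed above; a 2-second GOP at 30 fps works out to 60 frames.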
Best Practice 1: GOP Size
- Use the longest possible GOP size (segment size)
- Use variable GOPs/segment sizes if supported

Best Practice 2: Optimize Bitrate Control
- Data rate: assigned to file during encoding
- Bitrate control: how the encoder allocates the data rate
- Question: What's the best bitrate control technique (and how much difference in quality and throughput)?
  - CBR (constant bitrate encoding)
  - Two-pass VBR (variable bitrate encoding): 150%, 200%, 400% constrained
  - Capped CRF (constant rate factor)

2-Pass VBR
- Constrained VBR
  - Target = 1x
  - Max/VBV = 2x
  - Typically ranges from 1.1x to 4x
  - Tested 1.5x, 2x, and 4x
- Bitrates and GOP size customized for each file
  - Target 94 VMAF
  - 2 seconds for 24, 25, 30, 60 fps
- Pros: overall and low-frame quality
- Cons: encoding time increase; bitrate variability; max frame values (deliverability)
- Use case: VOD

    ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2M -maxrate 4M -bufsize 4M -preset veryslow -g 60 -threads 8 -pass 1 -f mp4 NUL
    ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2M -maxrate 4M -bufsize 4M -preset veryslow -g 60 -threads 8 -pass 2 output.mp4

1-Pass CBR

    ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2M -maxrate 2M -bufsize 2M -preset veryslow -g 60 -threads 8 output.mp4

- CBR
  - Target = 1x
  - Max/VBV = 1x
- Bitrates and GOP size customized for each file
- Pros: shorter encoding time; bitrate consistency
- Cons: overall/low-frame quality
- Use case: Live

Capped CRF

    ffmpeg -y -i input.mp4 -c:v libx264 -crf 27 -maxrate 2M -bufsize 4M -preset veryslow -g 60 -threads 8 output.mp4

- Target = CRF value = VMAF 94
- Max = VBR/CBR target bitrate
- VBV = 2x target
- CRF/caps and GOP size customized for each file
- Pros: reduced encoding time (single pass); bitrate reduction (form of per-title)
- Cons: overall/low-frame quality
- Use case: Live/VOD
- About Capped CRF: https:/

Technique    Time    Bitrate   Max BR    VMAF    Low-Frame
VBR          64:40   8,002K    18,405K   94.10   71.57
CBR          52:52   7,999K    16,477K   92.61   55.29
Capped CRF   52:42   6,525K    12,983K   91.14   54.14
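The three x264 command lines above differ only in their rate-control flags. Here is a minimal sketch of that mapping, assuming a 2x VBV for constrained VBR and capped CRF as in the commands shown. The function name and parameters are hypothetical; the flags (-b:v, -maxrate, -bufsize, -crf) are the ones used in the commands.

```python
def rate_control_args(mode, target_kbps, vbv_ratio=2.0, crf=27):
    """Return FFmpeg/x264 rate-control flags for one of the three tested modes.

    mode: "cbr", "vbr" (constrained VBR), or "capped_crf".
    target_kbps: target bitrate; for capped CRF it is used as the cap.
    vbv_ratio and crf defaults mirror the commands above (illustrative only).
    """
    target = f"{target_kbps}k"
    peak = f"{int(target_kbps * vbv_ratio)}k"
    if mode == "cbr":
        # Target = 1x, Max/VBV = 1x
        return ["-b:v", target, "-maxrate", target, "-bufsize", target]
    if mode == "vbr":
        # Target = 1x, Max/VBV = vbv_ratio x (200% constrained when ratio is 2)
        return ["-b:v", target, "-maxrate", peak, "-bufsize", peak]
    if mode == "capped_crf":
        # Quality target via CRF, cap at the VBR/CBR target, VBV = 2x the cap
        return ["-crf", str(crf), "-maxrate", target, "-bufsize", peak]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("cbr", "vbr", "capped_crf"):
        print(mode, " ".join(rate_control_args(mode, 2000)))
```

A two-pass encode would add -pass 1 and -pass 2 to the "vbr" flags on successive runs; that part is left out to keep the sketch short.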
[Figure: "Here's What VBR's Flexibility Gives You". Data rate across easy and hard scenes for VBR (8M target, 16M max), CBR (8M target, 8M max), and capped CRF (CRF 27, 8M cap). Red = VBR, green = CBR, blue = capped CRF.]
[Figure: frame comparisons of source, 2-pass VBR, CBR, and capped CRF for Big Buck Bunny and Office-Screencam.]

Observations
- Encoding savings real but not 2x
- Capped CRF delivers bitrate savings as well
- Overall VMAF close
- Low-frame delta is significant
- 400% has a much higher max frame; low frame about the same

Capped CRF Disclaimer
- Typically used instead of a fixed ladder (like Apple's)
  - So the "cap" is typically much higher, like 7800 kbps
  - Lots of potential bitrate reduction baked in
- In these tests, the cap was the same as CBR/VBR (95 VMAF)
  - So, very little room to generate savings
  - Mostly controlled by the cap, not CRF
  - Cap very stringently applied, which degrades both overall and low-frame scores
- Useful for comparison purposes, but not a fair look

Bottom Line
- CBR
  - Only when essential
  - Live/tight connection bandwidths
- Capped CRF
  - Alluring technology; bandwidth savings are understated
  - But saves only 13% encoding time
- 2-Pass VBR
  - Slight increase in encoding cost and bandwidth
  - Best overall and low-frame quality
  - 200% seems the best option
  - How I tested all future encodes

Best Practice 2: Bitrate Control
- Nobody ever got fired for using 200% 2-pass VBR
  - Two-pass x264 is very fast (so not 2x one-pass)
- CBR: low-frame issues, no bitrate benefit
- Capped CRF
  - Saves some encoding time
  - Can shave significant bitrate
  - Low-frame issues a legit concern, even with a fair comparison

Best Practice 3: Match Preset to View Count
- Preset functions and differences
  - AWS MediaConvert: Elemental codec
  - HandBrake: x264 codec (ultrafast to placebo)
- Fundamental tradeoff
- Preset selection math

Exploring Presets
- What does the preset do?
  - Adjusts parameters so producers can choose the desired quality/encoding time tradeoff
  - x264: 10 presets, ultrafast to placebo
- Big question: Does the preset control distribution quality? Yes? No?

Preset Role
- Controls encoding time/cost, not quality
- Most producers choose a quality level (VMAF 93-95/PSNR 45) and encode to match that quality level
- If a lower quality preset doesn't achieve the target quality, you boost the bitrate
- So, the preset doesn't control quality; it controls encoding cost and impacts bandwidth cost
- Choosing a preset is always a tradeoff between encoding cost and bandwidth cost

Presets: Quality vs. Encoding Time Tradeoff
- 24 files
- Measure encoding time, harmonic mean VMAF, low-frame VMAF
- Preset and % of maximum time/score
- What's the best preset?
  - Medium: 99.1% VMAF / 7.7% encoding time
  - Veryslow: 100% VMAF / 24% encoding time

Next Question
- How much do you have to boost the bitrate to match 100% quality?
- So, if your target is 95 and you use the medium preset, what's the required bitrate boost?

H.264 Preset

Preset      Bitrate   Encoding time
Ultrafast   196%      6%
Superfast   171%      11%
Veryfast    151%      16%
Faster      123%      19%
Fast        122%      26%
Medium      112%      31%
Slow        108%      43%
Slower      106%      56%
Veryslow    100%      100%
Placebo     100%      408%

- Would never use placebo, so adjust comparisons to veryslow
- Use medium preset:
  - Save 69% encoding cost
  - But must boost bitrate by 12% to achieve the same quality

x264 - 1080p30 file - Viewer Count Breakeven - $0.08/GB
- 250 views
  - Encoding cost = $.12
  - 2.22 GB/hr x .08 = $.18/hour
  - 250 x $0.18 = $45
  - Total = $45.12
- 5,000 views
  - Encoding cost = $.35
  - 1.9168 GB/hr x .08 = $.1533/hour
  - 5000 x $0.1533 = $766.72
  - Total = $767

What's the Point?
- Encoding is a fraction of the overall cost of distribution.
- Even at modest distributions, it makes sense to encode at the highest possible quality.

[Figure: x264 - Viewer Count Breakeven - $0.04/GB]
[Figure: x264 - Viewer Count Breakeven - $0.02/GB]
- As bandwidth costs drop, encoding cost matters longer (but still not that long)

Best Practice 3: Preset
- Best practice: balance encoding/delivery cost
- Low distribution volumes: minimize encoding cost; boost bandwidth to achieve target quality
- High distribution volumes (hundreds of hours): maximize encoding efficiency for the lowest possible bitrate
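The breakeven arithmetic on the $0.08/GB slide can be written out as a short calculation. This is a sketch under stated assumptions: each view streams one hour of video, and the function names are hypothetical. The dollar and GB/hr figures are the ones shown on the slide.

```python
def total_cost(encode_cost, gb_per_hour, cost_per_gb, view_hours):
    """One-time encoding cost plus delivery cost for the given viewing hours."""
    return encode_cost + view_hours * gb_per_hour * cost_per_gb


def breakeven_view_hours(cheap, costly, cost_per_gb):
    """Viewing hours at which the costlier encode pays for itself.

    cheap and costly are (encode_cost, gb_per_hour) pairs; the costlier
    encode is assumed to need less bandwidth for the same quality.
    """
    extra_encode = costly[0] - cheap[0]
    saved_per_hour = (cheap[1] - costly[1]) * cost_per_gb
    return extra_encode / saved_per_hour


if __name__ == "__main__":
    fast_encode = (0.12, 2.22)     # $0.12 to encode, 2.22 GB/hr delivered
    slow_encode = (0.35, 1.9168)   # $0.35 to encode, 1.9168 GB/hr delivered
    print(round(total_cost(*slow_encode, 0.08, 5000), 2))
    print(round(breakeven_view_hours(fast_encode, slow_encode, 0.08), 1))
```

With these inputs the extra $0.23 of encoding cost is recovered after roughly ten viewing hours, which is the slide's point: encoding is a small fraction of distribution cost. The slide rounds the per-hour delivery cost before multiplying, so its 250-view total differs slightly from the unrounded figure.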
Best Practice 4: Optimize Thread Count for Quality
- What are threads
- Impact on quality
- Impact on throughput
- Recommended for production
- Recommended for testing

What Are Threads
- Cores: physical hardware components in the CPU that execute instructions
- Threads: virtual components that divide tasks to be handled by the cores
- This computer has 2 CPUs with 16 cores each
  - Each core has two threads
  - 64 total threads
- You can assign threads on the FFmpeg command line. This impacts:
  - Transcoding speed
  - Overall throughput
  - To a lesser degree, single-file quality

What's Default?
- Not sure; here's a recent encode on the 64-thread workstation
- Encoding only this file: 34 threads
- Let's see the impact on quality/throughput

Impact on Quality - Overall
- Max .52 VMAF delta (harmonic mean)
- Max 6.25 VMAF delta (low frame)
- A single thread can do 64 encodes on this computer (RAM permitting)
- 64 threads can do one encode on this computer
- Unless 64 threads is 64x faster, better to encode 64 instances simultaneously

Soccer - 1 vs. 64 threads
- 1 thread delivers the best quality (no surprise)
- 64 threads is dramatically worse (big surprise)

From a Quality Perspective
- Limit threads when encoding on a multicore machine
- For production with x264, a single thread is always the highest quality option
- What about performance?

Cost Per Stream
- As instances increase and threads decrease, FPS increases
- Until you oversaturate threads (32): crashing
- Quality increases as well

Best Practice Threads H.264
- Low thread count with high instances seems to deliver
  - Best throughput
  - Best quality
- Awful configuration for testing (files encode so slowly)
  - I tested with eight threads

Best Practice 4: Thread Count
- Best practice: balance encoding/delivery cost
- Low distribution volumes: minimize encoding cost; boost bandwidth to achieve target quality
- High distribution volumes (hundreds of hours): maximize encoding efficiency for the lowest possible bitrate
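The "unless 64 threads is 64x faster" argument comes down to comparing aggregate frames per second. A minimal sketch of that comparison follows. The function names are hypothetical and the fps values in the example are made-up placeholders, not measurements from these tests.

```python
def logical_threads(cpus, cores_per_cpu, threads_per_core):
    """Total hardware threads available on the machine."""
    return cpus * cores_per_cpu * threads_per_core


def parallel_instances_win(single_thread_fps, all_threads_fps, instances):
    """True if many single-thread encodes out-produce one multi-thread encode."""
    return instances * single_thread_fps > all_threads_fps


if __name__ == "__main__":
    total = logical_threads(2, 16, 2)  # the workstation described above
    # Hypothetical speeds: 1 fps per single-thread encode, 20 fps with all threads.
    print(total, parallel_instances_win(1.0, 20.0, total))
```

The real per-configuration speeds have to be measured on your own hardware; the slides also note that RAM limits how many instances can run at once.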
Bonus Best Practice for AWS
- Choose the best CPU for H.264 processing

Three Contestants

            Amazon        AMD           Intel
Instance    c7g.8xlarge   c7a.8xlarge   c7i.8xlarge
On Demand   $1.1562       $1.64224      $1.428

Goals
- ID best configuration (you've seen)
- ID whether going beyond CPU count is advised (to 40)
- ID fastest CPU
- ID least expensive CPU

Test setup
- Three 32-vCPU CPUs
- Test from 1 instance/32 cores to 40 instances/1 core (1080p veryslow transcode)

Computer cost per hour to encode
- Graviton: 30% cheaper than AMD
- Intel: 13% cheaper than AMD
- https:/

AMD Was the Fastest
- AMD delivered the fastest throughput (minutes of video processed per hour)
- This increased with the number of instances
- If you're in a hurry, use AMD

Graviton Was Lowest Cost Per Hour
- Graviton output less, but cost a lot less as well
- If you're on a budget, use Graviton

As Stated Previously
- Low threads/high instances delivers:
  - Best quality
  - Best throughput
- Don't go beyond cores on workstation
  - Throughput drops: all
  - Crashed: Intel

HEVC Agenda
- Choosing the optimal GOP size
- Benefits of a variable GOP
- Bitrate control
- Choosing a preset
- Choosing the optimal thread count
- Working with Wavefront Parallel Processing
Best Practice 1: HEVC Best GOP Size
- What: GOP size (I-frame interval) is a key config option in all encodes
- Historical
  - Very small (like .5 second) for MPEG-2
  - Very long (10-20 seconds) for downloaded video
  - Typically 2-5 seconds for adaptive bitrate video
  - Must divide evenly into segment size
- Question: How does GOP size impact quality?
- Test: 13 files in 4 categories (Entertainment, Sports, Animation, Office)

Best Practice 1: Longer is Better
- Benefit significant at lower range
  - About 2/3 of H.264
- Then diminishing returns
- Key limit: must divide evenly into segment size
  - 10 seconds: 1/2/5/10
  - Why not try 10? Check for playability
- Synthetic clips (screencam, PPT) most susceptible (same as H.264)

Best Practice 1: GOP Size
- Longer is better

Best Practice 2: Bitrate Control
Tested configurations
- 1-Pass CBR: saves encoding time; low-frame issues; bitrate high; quality high
- 2-Pass (200% constrained VBR): max encoding time; no low-frame issues; bitrate high; quality high
- 2-Pass turbo (200% constrained VBR): saves 14%; all else good
- Capped CRF (constant rate factor): saves encoding time; low-frame issues; bitrate savings; quality drop
- Capped CRF (longer videos): saves encoding time; low-frame issues; bitrate savings; quality
  - Longer videos are a more realistic test case for capped CRF.
  - Higher quality, more bitrate savings, similar encoding savings

Bottom Line
- CBR
  - Only when essential
  - Live/tight connection bandwidths
- 2-Pass VBR
  - Most expensive
  - Best overall and low-frame quality
- 2-Pass Turbo
  - 14% cost/time savings
  - No negatives
- Capped CRF
  - Alluring technology; bandwidth savings can be significant (DIY content adaptive technique)
  - Overall quality good
  - Low-frame a concern
  - Saves 39% encoding time

Best Practice 2: Bitrate Control
- Unlike H.264, 2-pass involves a substantial performance penalty

Best Practice 3: Optimal Preset
- Controls encoding time/cost, not quality
- Most producers choose a quality level (VMAF 93-95/PSNR 45) and encode to match that quality level
- If a lower quality preset doesn't achieve the target quality, you boost the bitrate
- So, the preset doesn't control quality; it controls encoding cost and impacts bandwidth cost
- Choosing a preset is always a tradeoff between encoding cost and bandwidth cost

Test
- Two files
- Measure encoding time, harmonic mean VMAF, low-frame VMAF
- Preset and % of maximum time/score
- What's the best preset?

HEVC - 8-bit 1080p Preset

Preset      Bitrate   Encoding time
Ultrafast   175%      1%
Superfast   143%      1%
Veryfast    169%      2%
Faster      152%      2%
Fast        145%      3%
Medium      137%      4%
Slow        104%      8%
Slower      100%      30%
Veryslow    100%      49%
Placebo     100%      100%

x265 - 1080p - Viewer Count Breakeven - $0.08/GB

Input parameters
- Bitrate: 2500
- MBytes per hour: 1125
- Cost per GB: 0.08
- Encode/hr: 5.5

                     Bandwidth  Bandwidth  Total cost by view count
Preset     Encode    (GB/hr)    ($/hr)     50    100   250   500   1000   5000
Ultrafast  $0.53     2.19       $0.18      $9    $18   $44   $88   $176   $876
Superfast  $0.59     1.92       $0.15      $8    $16   $39   $77   $154   $767
Veryfast   $0.73     1.69       $0.13      $7    $14   $34   $68   $136   $675
Faster     $0.99     1.41       $0.11      $7    $12   $29   $57   $114   $564
Fast       $1.25     1.40       $0.11      $7    $12   $29   $57   $114   $563
Medium     $1.44     1.25       $0.10      $6    $11   $27   $52   $102   $503
Slow       $2.08     1.20       $0.10      $7    $12   $26   $50   $98    $483
Slower     $2.95     1.17       $0.09      $8    $12   $26   $50   $97    $471
Veryslow   $5.50     1.13       $0.09      $10   $15   $28   $51   $96    $456
Placebo    $21.89    1.13       $0.09      $26   $31   $44   $67   $112   $473

- At higher bandwidth costs, saving bandwidth matters more than encoding costs.
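The rows of the $0.08/GB table follow from the input parameters. The sketch below shows where "1125 MBytes per hour" comes from and picks the cheapest preset at a given view count. It assumes the bitrate is in kbps and that each view streams one hour; the function names are hypothetical, while the encode costs and GB/hr figures are the table's own.

```python
# (encode cost in $, bandwidth in GB/hr) per preset, taken from the table
PRESETS = {
    "Ultrafast": (0.53, 2.19), "Superfast": (0.59, 1.92),
    "Veryfast": (0.73, 1.69), "Faster": (0.99, 1.41),
    "Fast": (1.25, 1.40), "Medium": (1.44, 1.25),
    "Slow": (2.08, 1.20), "Slower": (2.95, 1.17),
    "Veryslow": (5.50, 1.13), "Placebo": (21.89, 1.13),
}


def mbytes_per_hour(kbps):
    """Convert a bitrate in kilobits/second to megabytes/hour."""
    return kbps / 8 * 3600 / 1000


def row_total(encode_cost, gb_per_hour, cost_per_gb, views):
    """Encoding cost plus delivery cost for the given number of one-hour views."""
    return encode_cost + views * gb_per_hour * cost_per_gb


def cheapest_preset(views, cost_per_gb=0.08):
    """Preset with the lowest total cost at this view count."""
    return min(PRESETS, key=lambda p: row_total(*PRESETS[p], cost_per_gb, views))


if __name__ == "__main__":
    print(mbytes_per_hour(2500))
    for views in (50, 250, 1000, 5000):
        print(views, cheapest_preset(views))
```

Because the table's GB/hr values are rounded to two decimals, totals computed this way can differ from the printed cells by a dollar or so at high view counts.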
x265 - 1080p - Viewer Count Breakeven - $0.04/GB

Input parameters
- Bitrate: 2500
- MBytes per hour: 1125
- Cost per GB: 0.04
- Encode/hr: 5.5

                     Bandwidth  Bandwidth  Total cost by view count
Preset     Encode    (GB/hr)    ($/hr)     50    100   250   500   1000   5000
Ultrafast  $0.53     2.19       $0.09      $5    $9    $22   $44   $88    $438
Superfast  $0.59     1.92       $0.08      $4    $8    $20   $39   $77    $384
Veryfast   $0.73     1.69       $0.07      $4    $7    $18   $34   $68    $338
Faster     $0.99     1.41       $0.06      $4    $7    $15   $29   $57    $282
Fast       $1.25     1.40       $0.06      $4    $7    $15   $29   $57    $282
Medium     $1.44     1.25       $0.05      $4    $6    $14   $27   $52    $252
Slow       $2.08     1.20       $0.05      $4    $7    $14   $26   $50    $243
Slower     $2.95     1.17       $0.05      $5    $8    $15   $26   $50    $237
Veryslow   $5.50     1.13       $0.05      $8    $10   $17   $28   $51    $231
Placebo    $21.89    1.13       $0.05      $24   $26   $33   $44   $67    $247

x265 - Viewer Count Breakeven - $0.02/GB

Input parameters
- Bitrate: 2500
- MBytes per hour: 1125
- Cost per GB: 0.02
- Encode/hr: 5.5

                     Bandwidth  Bandwidth  Total cost by view count
Preset     Encode    (GB/hr)    ($/hr)     50    100   250   500   1000   5000
Ultrafast  $0.53     2.19       $0.04      $3    $5    $11   $22   $44    $219
Superfast  $0.59     1.92       $0.04      $3    $4    $10   $20   $39    $192
Veryfast   $0.73     1.69       $0.03      $2    $4    $9    $18   $34    $169
Faster     $0.99     1.41       $0.03      $2    $4    $8    $15   $29    $142
Fast       $1.25     1.40       $0.03      $3    $4    $8    $15   $29    $142
Medium     $1.44     1.25       $0.03      $3    $4    $8    $14   $27    $127
Slow       $2.08     1.20       $0.02      $3    $4    $8    $14   $26    $122
Slower     $2.95     1.17       $0.02      $4    $5    $9    $15   $26    $120
Veryslow   $5.50     1.13       $0.02      $7    $8    $11   $17   $28    $118
Placebo    $21.89    1.13       $0.02      $23   $24   $28   $33   $44    $135

- As bandwidth costs drop, encoding cost matters longer (but still not that long)

Best Practice - Presets
- Run tests on your own files (results will vary by content, resolution, etc.)
- Perform your own calculations
- If your typical video is viewed over 10,000 times (or so), it almost always pays to use the veryslow preset
- Placebo almost never delivers the best quality and almost always takes much, much longer to encode

Best Practice 4: Choose the Optimal Thread Count
- What are threads
- Impact on quality
- Impact on throughput
- Recommended for production
- Recommended for testing

Impact on Quality - Overall
- Max .59 VMAF delta (harmonic mean)
- Max .99 VMAF delta (low frame)

From a Quality Perspective
- Limit threads when encoding on a multicore machine
- For production with x265, a single thread is always the highest quality option
- What about performance?

Cost Per Stream
- As instances increase and threads decrease, FPS increases
  - Looks small, but 45%
- Until you oversaturate threads (32): crashing
- Quality increases as well

Best Practice Threads
- Low thread count with high instances seems to deliver
  - Best throughput
  - Best quality
- Awful configuration for testing (files encode so slowly)
  - I tested with eight threads

Best Practice 5 - Wavefront Parallel Processing (WPP)

           Encoding Time   VMAF    Low Frame
With WPP   03:15           90.23   77.50
No WPP     23:51           90.42   76.73
Delta      7.3x            -0.19   +0.77
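The 7.3x figure in the WPP table is the ratio of the two encoding times. A small sketch of that arithmetic, using the table's values and a hypothetical helper for parsing mm:ss:

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' encoding time to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(slow_time, fast_time):
    """How many times faster the second encode is than the first."""
    return to_seconds(slow_time) / to_seconds(fast_time)


if __name__ == "__main__":
    # Encoding times and harmonic-mean VMAF scores from the table above
    print(round(speedup("23:51", "03:15"), 1), "x faster with WPP")
    print(round(90.23 - 90.42, 2), "VMAF change with WPP")
```

This is the single-file view only; the slides that follow compare throughput when the machine is loaded with multiple encodes.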
Question: Where is this additional performance coming from?
- Enables parallel processing
- Huge boost in encoding efficiency
- Very slight drop in quality

Wavefront Parallel Processing (WPP)
- WPP uses more cores; that's why it's faster (32-core workstation)
- Compare with and without WPP on the same system
[Figure: CPU utilization with WPP enabled and WPP disabled]

Throughput With and Without WPP
- Best without WPP
  - Very slightly better quality
  - Very slightly better performance
- Simpler jobs win when the system's pushed to the edge
- Definitely system specific
- Bottom line: Don't assume that the faster single-file solution is the best for multiple files
- Run your own tests
CPU Utilization - Different Configurations
- Thread contention limits performance
- Optimal balance of utilization and capacity
- Wasted capacity

Scaling
- FFmpeg default scaling is bilinear
- Tested three other methods; best was lanczos
- In FFmpeg: -vf scale=640:360 -sws_flags lanczos
- Not -s 640x360 (which uses bilinear)
- https://bit.ly/42pazmC

Best Practice (all): Scaling with Lanczos for Lower Rungs
[Figure: Scaling - Meridian]
[Figure: Scaling - Football]

VMAF
Rung              Default   Lanczos
2K, 7 Mbps        88.50     88.62
1080p, 3.5 Mbps   79.10     79.12
1080p, 1.8 Mbps   68.70     68.91
720p, 1 Mbps      59.67     60.06
360p, 500 Kbps    43.25     44.90

Best Practice Scaling: Use Lanczos Where Available
- Lanczos delivers a .75 VMAF improvement at 1080p in Meridian (movie clip)
  - 3.76 VMAF points at 360p
- There's no downside: encoding time isn't impacted
  - At least with VOD presets (may be some impact live)
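The per-rung benefit in the VMAF table is easier to see as a difference. This sketch simply subtracts the default score from the Lanczos score for each rung; the rung labels and scores are the table's, and the function name is hypothetical.

```python
# (rung, VMAF with default scaling, VMAF with Lanczos) from the table above
RUNGS = [
    ("2K 7M", 88.50, 88.62),
    ("1080p 3.5M", 79.10, 79.12),
    ("1080p 1.8M", 68.70, 68.91),
    ("720p 1M", 59.67, 60.06),
    ("360p 500K", 43.25, 44.90),
]


def lanczos_gain(default_vmaf, lanczos_vmaf):
    """VMAF points gained by switching to Lanczos, rounded to two decimals."""
    return round(lanczos_vmaf - default_vmaf, 2)


if __name__ == "__main__":
    for rung, default_vmaf, lanczos_vmaf in RUNGS:
        print(f"{rung:12s} +{lanczos_gain(default_vmaf, lanczos_vmaf):.2f}")
```

In this table the gain is largest at the 360p rung, which fits the recommendation to use Lanczos for lower rungs.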