NIST FACE IN VIDEO EVALUATION (FIVE)
Austin Hom, Patrick Grother, Mei Ngan
National Institute of Standards and Technology
IFPC, April 1st, 2025

FRTE and FATE
- FRTE (Face Recognition Technology Evaluation) — recognition: who is in an image.
  - 1:1 verification — same person or not?
  - 1:N search — who? where? when?
  - Twins disambiguation — same person, or twin?
  - Face in Video 2024 — 1:N on non-cooperative people
- FATE (Face Analysis Technology Evaluation) — analysis: about an image.
  - Morph detection — two people in one face?
  - PAD — subversive photo?
  - Age estimation — how old? old enough?
  - Quality + diagnostics — how bad is this photo?

NIST's Open Benchmarks
Benchmarks are: independent, free, regular, fast, repeatable, fair, black box, IP-protecting, open globally, large-scale, statistically robust, public, transparent, and extensible, run on sequestered datasets. They report absolute and relative accuracy.

Challenges of FR in Video
- Pose: compound rotation of the head relative to the optical axis
- Resolution: range to subject; legacy cameras
- Adverse compression for storage or transmission
- Motion blur
- But: multiple frames are available
FIVE 2024 Goals
- Assess the state of the art of 1:N face recognition (FR) on video sequences, and relative improvement since FIVE 2015
- Assess FR on low-quality videos: low resolution, including compressed video and long-range imaging affected by atmospheric turbulence
- Elevated cameras resulting in high look-down angles
- Passively observed subjects who at no point face the camera directly
- Face detection in wide field-of-view imagery
- Absolute accuracy
- Comparative accuracy of algorithms
- Comparative computational cost
- Threshold calibration: the ability to target specific false positive identification rates

Out of Scope
- Re-identification
- Anomaly detection
- Detection of un-cooperative actions, evasion
- Other modalities (e.g., body and gait recognition)
- Clothing and other non-facial metrics
- 1:1 verification

FIVE 2024 Timeline
Date        Activity
2024-01-23  First draft of API published
2024-02-29  API comments due
2024-03-07  Final API published
2024-03-11  Phase 1 submission window opens
2024-05-18  Phase 1 submission window closes
2024-06-28  Phase 1 report cards distributed
2024-07-01  Phase 2 submission window opens
2024-08-30  Phase 2 submission window closes
2024-10-18  Phase 2 initial report cards distributed
2024-12-06  Phase 2 final report cards distributed
2025 Q2/Q3  Public report published

FIVE 2024: Who Participated
11 developers from around the world submitted 31 algorithms in total: Cognitec, Corsight, Dermalog, Gpstechvn, Idemia, Innovatrics, NEC, Neurotechnology, ROC, Viante, Videmo.
FIVE 2024: How Algorithms Are Run
- Dynamically-linked C++ library (.so file)
- Run "bare metal" on Linux (Ubuntu 20.04.3)
- Hardware: Intel server-class CPUs (no GPUs)
- Hard duration limits, measured on a single CPU core:
  - Still enrollment (face detection + feature extraction + encoding): 1.5 sec/image
  - Video enrollment (face detection + tracking + feature extraction + encoding): 1.5 sec/frame/person
  - Finalize enrollment (gallery size = 10,000): 4000 seconds
  - Search (gallery size = 10,000): 1 second
- Code that crashes is rejected.

What's in the Gallery
Galleries were typically composed of templates generated from still imagery. Occasionally, galleries were composed of templates generated from a combination of multiple stills and/or video sequences of the same subject.

Probes
Probes were single video clips (sequences of frames) with one or more people in the scene.
Recognition Is the Goal: Not Detection, Not Tracking
Software should maximize recognition performance by detecting a person, tracking that person through time, determining which imagery is most recognizable, and extracting features/embeddings. We do not report metrics for false detection, missed detection, spatial (bounding box) location accuracy, track integrity, or restoration.

One Video, One Person, One Track: K = 1 Search
Algorithm software:
1. Detects K = 1 face.
2. Extracts a set of features.
3. Searches the gallery, producing a candidate list of fixed length L ≤ N. The value of L is an input specified by NIST via the API.
NIST evaluation:
1. Chooses a threshold T, e.g. 4.0.
2. Records a false negative error unless the candidate list includes ID = 123 at or above T.
This is repeated for many probes and many threshold values to produce FNIR vs. T. Non-detection is immaterial if the subject is (later) found correctly and identified.
Example candidate list (true positive at T = 4.0): 4.498 Marcia, 1.616 Mei, 0.750 Mae, 0.300 Maria, 0.128 Melissa, 0.072 Marissa, 0.012 Melani, 0.007 James.

One Video, One Person, Multiple Tracks: K ≥ 1 Searches
Algorithm software:
1. Even if the person is present in the entire clip, as she is here, an algorithm might find the person, say, K = 3 times (broken tracks).
2. Extracts K sets of features (aka templates).
3. Searches the gallery, producing K candidate lists, each of fixed length L ≤ N.
NIST:
1. Chooses a threshold T, e.g. 4.0.
2. Records a false negative error unless ANY of the K candidate lists includes ID = Marcia at or above threshold T.
This is repeated for many probes and many threshold values to produce FNIR vs. T.
Example candidate lists:
- FN: 3.142 Mary, 2.998 Maria, 1.626 Marcia, 0.707 Mae, 0.330 Mei, 0.198 Melissa, 0.074 Marissa, 0.016 Melani
- FN: 2.901 Mary, 2.798 Marcia, 1.616 Mei, 0.750 Mae, 0.300 Maria, 0.128 Melissa, 0.072 Marissa, 0.012 Melani
- TP: 4.498 Marcia, 1.616 Mei, 0.750 Mae, 0.300 Mary, 0.128 Melissa, 0.072 Marissa, 0.012 Melani, 0.007 James
Discussion: This method allows tracks to be broken. NIST doesn't care about track integrity per se, only that recognition succeeds. The algorithm implementation is free to select the best-quality frames, to perform restoration, to perform feature-level fusion, or to produce a template that internally contains multiple embeddings to allow score-level fusion.
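The false-negative decision rule — one probe, K candidate lists, a hit in any list suffices — can be sketched in a few lines. This is an illustrative sketch, not NIST's scoring code; the candidate lists here mirror the broken-track example's top entries:

```python
def is_false_negative(candidate_lists, mate_id, T):
    """A probe is a false negative unless ANY of its K candidate
    lists contains the mated identity at or above threshold T."""
    for candidates in candidate_lists:      # one list per track (K lists)
        for score, identity in candidates:  # each of fixed length L <= N
            if identity == mate_id and score >= T:
                return False                # hit: not a false negative
    return True

# The K = 3 broken-track searches from the example (top three entries each)
lists = [
    [(3.142, "Mary"), (2.998, "Maria"), (1.626, "Marcia")],
    [(2.901, "Mary"), (2.798, "Marcia"), (1.616, "Mei")],
    [(4.498, "Marcia"), (1.616, "Mei"), (0.750, "Mae")],
]
print(is_false_negative(lists, "Marcia", T=4.0))  # False: third track hits at 4.498
```

Only the third track exceeds T = 4.0 for Marcia, but that single hit is enough — the probe is not counted as a miss, which is exactly why broken tracks are tolerated.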
One Video, Two Persons: K ≥ 0 Searches
Example candidate lists:
- Person 1, track 1: 4.498 Julio, 1.616 Jean, 0.750 Jacques, 0.300 Julian, 0.128 Jesus, 0.072 Javier, 0.012 Jimi
- Person 1, track 2: 4.298 Julio, 1.516 John, 0.850 Jacques, 0.600 Julian, 0.428 Jason, 0.172 Job, 0.002 Jimi
- Person 2: 4.498 Pedro, 1.616 Pierre, 0.750 Peter, 0.300 Prado, 0.128 Papa, 0.072 Paolo, 0.012 Paulus
Discussion: If, say, only Julio is in the gallery, then the algorithm is rewarded for correctly finding him at some point. The scoring software does not reward twice for finding the person twice. If, say, the person on the right is not in the gallery, then the high score against gallery person Pedro could be accumulated into a count of false positives. That said, false positives are usually measured over sets where the gallery and probes are subject-disjoint. If the gallery is unconsolidated, and Julio is enrolled multiple times, the algorithm is rewarded for finding any occurrence of Julio in the gallery.

1:N Search: False Positives in Operations
- https:/ Eduardo Medina, 2023-12-21
- https:/ FTC reports that "the system generated thousands of false-positive matches": https://www.ftc.gov/news-events/news/press-releases/2023/12/rite-aid-banned-using-ai-facial-recognition-after-ftc-says-retailer-deployed-technology-without

Measuring False Positives
False positive identification rate:
- It is conventional to measure the false positive identification rate (FPIR): run searches of individuals who are known to be absent from the enrolled gallery. FPIR is computed as the proportion of searches that produce one or more false positives above a threshold T.
- In FIVE 2024, false positive error is reported as FPIR, given the availability of videos where we know the exact number of people present.
Number of false positives:
- In FIVE 2015, calculating FPIR was not possible, because the number of individuals in the search imagery was not known. Instead, false positive errors were stated as the number of false positives from running searches of individuals known to be absent from the gallery.
Calculating False Positive Identification Rate
- Gallery contains only nonmated subjects.
- Probe videos contain a known number of people known not to be in the gallery.
- Any subject who comes up above threshold contributes to FPIR.
- FPIR = (# of subjects with any track with a hit above threshold T, max 1 per subject per probe) / (total # of subjects in probes)
[Diagram: probe subjects S1, S2 searched against nonmated gallery N1, N2, N3]

Calculating False Negative Identification Rate
- Gallery contains mated subjects.
- Mated probe videos contain people known to be in the gallery; they can also include unknown subjects.
- Any mated subject who does not come up above threshold contributes to FNIR.
- FNIR = (# of mated subjects with no track with a hit above threshold T) / (total # of mated subjects in probes)
[Diagram: probe subjects S1, S2, U1, U2 searched against mated gallery S1, S2, S3]
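The two rate formulas can be written directly in code. This is an illustrative sketch, not NIST's implementation: each input value is the best score any of a subject's tracks achieved against the gallery (a simplification of the per-track candidate-list machinery), and a hit is a score strictly above T, per the "above threshold" wording:

```python
def fpir(nonmated_best_scores, T):
    """FPIR = (# of nonmated probe subjects with any track hitting above T,
    counted at most once per subject) / (total nonmated probe subjects)."""
    hits = sum(1 for best in nonmated_best_scores if best > T)
    return hits / len(nonmated_best_scores)

def fnir(mated_best_scores, T):
    """FNIR = (# of mated probe subjects with no track hitting their mate
    above T) / (total mated probe subjects)."""
    misses = sum(1 for best in mated_best_scores if best <= T)
    return misses / len(mated_best_scores)

# Toy example: 4 nonmated and 4 mated probe subjects at threshold T = 4.0
print(fpir([4.5, 1.2, 0.3, 3.9], T=4.0))  # 0.25: one impostor clears T
print(fnir([4.5, 4.1, 2.0, 3.9], T=4.0))  # 0.5: two mates fail to clear T
```

Sweeping T through the observed score range yields the FNIR-vs-T (and FPIR-vs-T) curves described earlier.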
FIVE RESULTS

FIVE 2015 Datasets: People on the Move
1) Sports Arena — 11 consumer cameras, 50 actors
2) Passenger Loading Bridge — 10 pro cameras, 50 actors
3) Concourse — 10 pro cameras, 50 actors
4) Self Boarding Gate — classic chokepoint, 1 webcam, 250 actors
5) In the Wild — photojournalism, not social media; TV cameras, 500 actors
6) Public Space — multiple professional + legacy cameras, 80 actors
7) Luggage — 2 webcams, 375 actors; adverse resolution and pose
See details on some datasets in the FIVE 2015 report.

Accuracy Gains Since 2015
Miss rates across different datasets (best 2015 vs. best 2024):

Dataset             Gallery size  Footage (min)  # False positives  Best 2015  Best 2024
Self Boarding Gate  48000         18             1                  0.09       0.00
Luggage             4800          487            1200               0.45       0.00
Sports Arena        480           995            10                 0.24       0.03
Concourse           48000         288            10                 0.35       0.05
Public Space        4800          3600           10                 0.26       0.01
Video Journalism    935           699            0                  0.62       0.20
FIVE 2024 Results
Algorithm matters! Video Journalism is difficult because it is comprised of celebrity videos, where high yaw angles are typical and the enrollment images are also unconstrained.

New in FIVE 2024: Long Range with Turbulence
Imagery collected at long range can potentially be distorted by atmospheric turbulence. Turbulence here refers to the distortions in an image caused by the movement of air due to temperature differentials. Examples were collected at 300 meter, 650 meter, and 1000 meter ranges, at low and high turbulence levels. Note that some videos collected at long range will have turbulence; some will not.

Phase 2 Results: Long Range
- Cooperative subjects, long lens; N = 48000.
- Metric: FNIR at FPIR = 0.01, aka the miss rate with T set to give a 1% false alarm rate.
- Recognition is possible at 300 m, even with high turbulence.
- Algorithm matters!
- Recognition accuracy decreases significantly at 650 m and above.
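Reporting "FNIR at FPIR = 0.01" requires threshold calibration — one of the evaluation's stated goals: choose T so that impostor searches alarm at the target rate, then read off the miss rate at that T. A minimal sketch with made-up scores (assuming distinct impostor scores and using best-score-per-subject inputs; not NIST's calibration code):

```python
def fnir_at_fpir(mated_best, nonmated_best, target_fpir):
    """Calibrate T so at most target_fpir of nonmated probe subjects
    score strictly above T, then return (T, mated miss rate at T)."""
    impostors = sorted(nonmated_best, reverse=True)
    k = int(target_fpir * len(impostors))  # number of allowed false positives
    T = impostors[k]                       # k impostor scores lie strictly above T
    misses = sum(1 for s in mated_best if s <= T)
    return T, misses / len(mated_best)

# Toy example: 100 impostor searches, 1% false alarm budget
impostor_scores = [float(i) for i in range(100)]   # best impostor scores 0.0 .. 99.0
genuine_scores = [99.5, 98.0, 97.0, 100.0]         # best mated scores
T, miss_rate = fnir_at_fpir(genuine_scores, impostor_scores, 0.01)
print(T, miss_rate)  # 98.0 0.5
```

With a 1% budget over 100 impostors, one false positive is allowed, so T lands at the second-highest impostor score; the two genuine scores at or below it are counted as misses.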
New in FIVE 2024: Long Range and High Altitude
[Diagram: UAV and pole-mounted collections at 100 m, 200 m, 400 m, and 600 m ranges; elevation angle 18 degrees; 10 m]

Dataset Description
- Wide range of optical setups for probes: various ranges (close range to 1 km+), various pitch angles; specialized sensors, non-specialized sensors, UAVs.
- Detailed enrollment data: high-quality stills at various pitch and yaw angles; high-quality, close-range enrollment videos; random walk; structured walk.
- Multiple different collection locations and scenarios.

Long Range Dataset, 1:N Open Search (Main, Blended Gallery, Face Included)
Probes: Various types. Some collected outdoors, sometimes at close distance but often at longer ranges, using pole-mounted cameras and unmanned aerial vehicles (UAVs) with cameras facing down. Subjects were sometimes stationary and sometimes walked around.
Gallery: Two galleries, average N = 559 people. Each gallery subject has a variable amount and type of imagery, which could include any combination of videos and still imagery.

Long Range Dataset by Range

THANK YOU
FRVT@NIST.GOV
Patrick Grother (not), Mei Ngan, Kayee Hanaoka (+Mei Ngan), Austin Hom, Joyce Yang, Jim Matey