Enhancing Cyber Resilience Through Zero Trust Chaos Experiments in Cloud Native Environments

Rafik Harabi, Senior Solutions Architect, Sysdig
Sayan Mondal, Senior Software Engineer, Harness

Who are we?

Rafik Harabi (rafikharabi / rafik8_)
- Senior Solutions Architect at Sysdig, Cloud Security Advocate
- Focus on Cloud Native Security
- Previously worked on go-to-cloud programmes

Sayan Mondal (s_ayanide / s-ayanide)
- Senior Software Engineer II at Harness
- Maintainer of LitmusChaos (CNCF Incubating)
- LFX Mentor
- Chaos Engineering Practitioner

Agenda
- Cloud Native Application and Threat Landscape
- Chaos Engineering and Cyber Resilience
- Enhance Security with Chaos Engineering
- Solutions Architecture
- Tooling and Architecture
- Hands-on demo
- Next steps
- Takeaways

Once, there was a perimeter
- You had a perimeter guarded by a firewall
- Detecting intrusions was your breach indicator

Now, there is no perimeter in the cloud
- Cloud providers own external connections
- Cloud is exposed to the outside world
- You need to control access to services your team uses
- You need to detect unusual activity

Cloud Native Application Architecture
[Diagram: Cloud Infrastructure / Cloud Provider]
- Management: Logs & Monitoring, Messaging Service
- Identity and Access: IAM
- Workload: Instance, Serverless, Containers
- Network/Security: Cloud Load Balancer, Security Groups, Audit logs
- Platforms: Kubernetes, Container as a Service
- Data: Storage (object storage), Database (managed SQL)

Cloud Application Security Challenges
- Dynamic attack surface
- Threat actors are using your tools today
- Distributed systems and microservices enlarge the attack surface
- Number of calls generated by distributed systems
- Lack of visibility
- Cloud delivery speed vs security process speed

Manufacturing software in the Cloud Native era
[Diagram labels: Runtime architecture, CI/CD, DevOps, Environments, SecOps, Configuration Management, Version Management, Testing, Observability, Analytics, SRE]
- DevOps goes to canary, etc.
- Self-service and policy-driven
- Zero Trust environment

The Cloud Native problem
Microservices proliferation leads to a RELIABILITY challenge. Cloud-native code's reliance on numerous microservices and platforms heightens failure risks.

Legacy DevOps
01 Build one application (Every Quarter / Week)
02 Ship it.
03 Run it.

Cloud-Native DevOps
01 Build 10x microservices (Every Quarter)
02 Ship them 10x faster.
03 Run in 100x different environments

What causes Downtime?
Causes:
- Application Failures
- Infrastructure Failures
- Operational Failures
Impact:
- Reputational Impact
- Financial Impact
- Poor User Experience
Examples:
- Slack's outages
- Est. $55M in losses to WF
- 75,000+ passengers' travel plans impacted

Application Failures
- Excessive logging to debug
- Too many retries
- Service timeout
Infrastructure Failures
- Device failures
- Network failures
- Region not available
Operational Failures
- Capacity issues
- Incident management
- Monitoring dashboards not available
Bad Actors Exploiting Vulnerabilities

What is Chaos Engineering?
"Chaos engineering is the process of testing a distributed computing system to ensure that it can withstand unexpected disruptions."
Tech Target (https:/)

What is Cyber Resilience?
"Security Chaos Engineering (SCE) is a novel approach to cyber security; its core fundamentals are based on the principles of chaos engineering, though the objective is to enable cyber resiliency."
Mitigant (mitigant.io)

Red Team strategies
Pen Testing
- Focuses on a specific asset, with a defined scope that restricts the penetration tester.
- Conducted periodically.
Adversary Emulation
- Emulates specific threat actors and attack scenarios.
- Focuses on specific attack vectors and techniques used by particular adversaries.
Security Chaos Engineering
- Introduces controlled security failures.
- Observes how the system responds and recovers.
- Ongoing practice.

Why Security Chaos Engineering?
Security Chaos Engineering complements traditional security practices:
- Proactive approach
- Integrated into ongoing security practices
- Provides continuous feedback and improvement

Where to practise this?

Is Reliability a goal in Security?
It is usually not a direct goal, but the reliability of the end product or service is affected while solving the other challenges: developer productivity, quality, and speed.
- Are you sure you are not compromising reliability?
- How much developer time is being spent on issues related to reliability?
- Have you verified that the known resilience status is intact?
- Are no new bugs leaking into the product?

The Chaos Engineering Process
1. Select systems to test
2. Select chaos experiments (e.g. simulate a region going down)
3. Run the set of chaos experiments on the target system
4. Observe the results of the experiments on the target system
5. Use the learnings to make targeted reliability improvements

The Problems in current solutions
- Failures impacting resiliency are inevitable
- Not proactively managed
- Downtime may be expensive
- Believed to be just for Ops
- Difficult to manage chaos in CI/CD
- No monitoring of impact
Existing solutions
- Failure scenarios are difficult to implement
- Not implemented in a safe/controlled environment
- Not collaborative
- Not scalable
- Failure testing isn't automated

A Better Solution
- SREs + Developers
- Experiments are in Git, just like code
- Chaos engineering is collaborative
- Collaborative chaos experiments in a centralized control plane
- Optimize initial investment
- Reduce the inertia for starting chaos
- Robust experiments
- Public and private chaos hubs with ready-to-use experiments
- Find weaknesses during the build/test phase
- Verifying at the dev stage saves money
- Integrate into CI/CD systems
- Roll out automated and controlled chaos experiments across prod/non-prod environments
- Measure the impact of inducing chaos
- Build confidence by starting small
- Enables observability for chaos
- Chaos metrics used to assess impact and manage SLOs/Errors

Is it really a better solution?
Gaining Kernel Level Visibility
- Kernel-level visibility helps detect sophisticated threats that traditional security approaches might miss.
Comprehensive Security Coverage
- Ensures comprehensive security coverage and addresses potential blind spots in the current chaos engineering framework.
Real Time Threat Detection
- Enables faster response to potential security incidents.
Customisable Rules and Policies
- Flexibility in creating customizable rules and policies tailored to specific security needs and threat models.

Potential Tools
- Litmus Chaos is an open source cloud-native chaos engineering framework with cross-cloud support. It is a CNCF Incubating project with adoption across several organizations. https://litmuschaos.io
- Falco is a cloud-native security tool designed for Linux systems. It employs custom rules on kernel events, which are enriched with container and Kubernetes metadata, to provide real-time alerts. https://falco.org

How does Litmus work?

Threat Detection with Falco
Falco is an open source runtime security solution for threat detection across Kubernetes, containers, hosts and the cloud.
- CNCF Graduated Project (Feb. 2024)
- 7.2k | 50M+ pulls
- https://falco.org
- https:/

Architecture Overview
[Diagram]
- Kernel space: kernel module or eBPF probe writes events to a ring buffer
- User space: reads events from the ring buffer; State Engine, Event Parsing, Event Enrichment, Rule matching
- Output: alerts

Falco Ecosystem
[Diagram]
- Event sources: SYSCALLS, HTTPS, Plugins (K8s, Okta, GitHub, CloudTrail)
- Falcosidekick
- "if priority critical"

Hands-on Demo
https:/

Demo Application
https:/

Chaos Engineering in practice
Scenario 2 - DNS spoofing
Scenario 2 - Falco Detection Rule
Scenario 2 - Video Demo
Scenario 1 - Modify HTTP header
Scenario 1 - Falco Detection Rule
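The detection flow named on the architecture and ecosystem slides (event parsing, event enrichment, rule matching, then forwarding alerts "if priority critical") can be pictured with a small toy pipeline. This is only an illustrative sketch in plain Python: it is not Falco's rule language, engine, or API, and it is not Falcosidekick's configuration. The talk's actual detection rules are not reproduced in this text, so the `dns_config_modified` rule, the event fields, and every function name below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical priority ladder for this sketch, lowest to highest.
PRIORITIES = ["info", "warning", "critical"]


@dataclass
class Event:
    """A parsed kernel event (hypothetical shape, not Falco's)."""
    syscall: str
    container_id: str
    args: Dict[str, str]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    priority: str
    condition: Callable[[Event], bool]


def enrich(event: Event, inventory: Dict[str, Dict[str, str]]) -> Event:
    """Attach container/Kubernetes metadata looked up by container id."""
    event.metadata = dict(inventory.get(event.container_id, {}))
    return event


def match_rules(event: Event, rules: List[Rule]) -> List[dict]:
    """Return one alert per rule whose condition holds for the event."""
    return [
        {"rule": r.name, "priority": r.priority, **event.metadata}
        for r in rules
        if r.condition(event)
    ]


def forward(alerts: List[dict], min_priority: str = "critical") -> List[dict]:
    """Keep only alerts at or above min_priority ("if priority critical")."""
    threshold = PRIORITIES.index(min_priority)
    return [a for a in alerts if PRIORITIES.index(a["priority"]) >= threshold]


def writes_dns_config(e: Event) -> bool:
    # Made-up condition loosely inspired by the DNS spoofing scenario.
    return (
        e.syscall == "openat"
        and e.args.get("path") in {"/etc/hosts", "/etc/resolv.conf"}
        and "O_WRONLY" in e.args.get("flags", "")
    )


RULES = [
    Rule("dns_config_modified", "critical", writes_dns_config),
    Rule("file_opened", "info", lambda e: e.syscall == "openat"),
]
INVENTORY = {"c1": {"pod": "demo-app", "namespace": "default"}}


def process(event: Event) -> List[dict]:
    """Enrich, match, and forward: the three stages chained together."""
    return forward(match_rules(enrich(event, INVENTORY), RULES))


if __name__ == "__main__":
    suspicious = Event("openat", "c1", {"path": "/etc/hosts", "flags": "O_WRONLY"})
    benign = Event("openat", "c1", {"path": "/tmp/cache", "flags": "O_RDONLY"})
    print(process(suspicious))  # one critical alert carrying pod metadata
    print(process(benign))      # empty: only an info-level rule matched
```

In a security chaos experiment the interesting question is whether the injected fault produces the expected alert at all; a pipeline like this makes that a simple pass/fail check on the forwarded alerts.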
Scenario 1 - Video Demo

Go to the next level
- Red Teaming
- Cross-functional collaboration
- Enhance automation using GitOps
- Introduce a feedback loop
- Advanced metrics
- Community and ecosystem shift

Takeaway
- Cloud-native systems exceed what traditional security can cover.
- Cyber-criminals exploit advancements in cloud.
- Learned about Zero Trust Chaos.
- Discovered vulnerabilities with chaos experiments.
- Enhanced detection and response capabilities.
- Gained actionable Zero Trust strategies.

Further Reading
- Increased support for chaos against non-Kubernetes infrastructure components
- More application-specific chaos experiments with native faults and health checks
- Improved Chaos SDK for creation of user-defined experiments
- Additional probe types for diverse steady-state hypothesis validation
- Improved observability for chaos experiments
- More community-supported chaos types
- Falco training: https://falco.org/training
- Litmus training: https://v2-docs.litmuschaos.io/tutorials

Thank You
Scan QR for feedback: leave us feedback
Contact Sayan on: s_ayanide / s-ayanide
Contact Rafik on: rafik8_ / rafikharabi
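As a closing illustration of the steady-state hypothesis validation mentioned above, the sketch below wraps a fault injection between probe checks: verify the system is healthy, inject the fault, probe again, always revert, then probe once more. It is a minimal sketch under stated assumptions, not the Litmus SDK or its probe types; `ToyService`, `run_experiment`, and the verdict strings are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional


class ToyService:
    """In-memory stand-in for a replicated service (hypothetical)."""

    def __init__(self, replicas: int) -> None:
        self.healthy = replicas

    def kill_one(self) -> None:
        # The "fault": take one replica down.
        self.healthy = max(0, self.healthy - 1)

    def restore_one(self) -> None:
        # The "revert": bring one replica back.
        self.healthy += 1

    def is_serving(self) -> bool:
        # Steady-state hypothesis: at least one healthy replica.
        return self.healthy >= 1


def run_experiment(
    probe: Callable[[], bool],
    inject: Callable[[], None],
    revert: Callable[[], None],
) -> Dict[str, Optional[object]]:
    """Probe, inject, probe, revert, probe; report a verdict."""
    result: Dict[str, Optional[object]] = {
        "before": probe(), "during": None, "after": None,
    }
    if not result["before"]:
        # Never inject chaos into a system that is already unhealthy.
        result["verdict"] = "skipped"
        return result
    inject()
    try:
        result["during"] = probe()
    finally:
        revert()  # always undo the fault, even if the probe raises
    result["after"] = probe()
    ok = result["during"] and result["after"]
    result["verdict"] = "pass" if ok else "fail"
    return result


if __name__ == "__main__":
    resilient = ToyService(replicas=2)
    fragile = ToyService(replicas=1)
    print(run_experiment(resilient.is_serving, resilient.kill_one, resilient.restore_one))
    print(run_experiment(fragile.is_serving, fragile.kill_one, fragile.restore_one))
```

The two-replica service keeps serving during the fault and passes; the single-replica service fails the "during" probe, which is the kind of weakness the experiment is meant to surface before an incident does.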