Chaos Engineering for Microservice-Based Payment Flows Using LitmusChaos and OpenTelemetry
Keywords:
chaos engineering, LitmusChaos, OpenTelemetry, Kubernetes, payment pipelines, distributed tracing
Abstract
This paper examines chaos engineering for microservice-based payment systems. OpenTelemetry provides system-wide observability, while LitmusChaos orchestrates fault injection. The study perturbs Kubernetes-managed payment flows with pod termination, network latency, and disk I/O throttling to assess system resilience. Distributed tracing based on OpenTelemetry tracks the blast radius of each experiment and the subsequent recovery, which allows us to derive key service-level resilience metrics for regulated financial environments. The paper covers error-handling playbooks, the design of chaos scenarios, and steady-state hypotheses that reflect payment transaction invariants. In production-clone environments, the mean time to detect (MTTD) latent regressions and cascading failures improved by 55%. We also provide auditable resilience and observability indicators aligned with PCI DSS. The approach connects operational chaos experimentation with compliance-driven risk management.
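A steady-state hypothesis built on payment transaction invariants can be illustrated with a short sketch. The abstract does not specify the invariants, so everything here is an assumption made for illustration: the `Transaction` record shape, the `steady_state_holds` helper, the two invariants (no idempotency key captured twice, failure ratio under a threshold), and the 1% default threshold. It is not the method used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; field names are illustrative assumptions.
    idempotency_key: str
    status: str  # assumed values: "captured", "failed", "pending"
    amount_cents: int


def steady_state_holds(transactions, max_failure_ratio=0.01):
    """Return True if the assumed payment invariants hold for a sample."""
    # Assumed invariant 1: no idempotency key is captured more than once,
    # i.e. a fault must never cause a double charge.
    captured = [t.idempotency_key for t in transactions if t.status == "captured"]
    if len(captured) != len(set(captured)):
        return False

    # Assumed invariant 2: the share of failed transactions stays within
    # the tolerated ratio. An empty sample trivially satisfies it.
    if not transactions:
        return True
    failed = sum(1 for t in transactions if t.status == "failed")
    return failed / len(transactions) <= max_failure_ratio
```

In an experiment, a check of this kind would run once before fault injection to confirm the baseline and again afterwards; a failing check after injection indicates that the fault violated the hypothesis.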