Observability Engineering

Author: Charity Majors
Publisher: O'Reilly Media, Inc.
ISBN: 1492076414
Category : Computers
Languages : en
Pages : 321

Book Description
Observability is critical for building, changing, and understanding the software that powers complex modern systems. Teams that adopt observability are much better equipped to ship code swiftly and confidently, identify outliers and aberrant behaviors, and understand the experience of each and every user. This practical book explains the value of observable systems and shows you how to practice observability-driven development. Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve upon what you're doing today, and provide practical dos and don'ts for migrating from legacy tooling, such as metrics monitoring and log management. You'll also learn the impact observability has on organizational culture (and vice versa).
You'll explore:
• How the concept of observability applies to managing software systems
• The value of practicing observability when delivering and managing complex cloud native applications and systems
• The impact observability has across the entire software development lifecycle
• How and why different functional teams use observability with service-level objectives (SLOs)
• How to instrument your code to help future engineers understand the code you wrote today
• How to produce quality code for context-aware system debugging and maintenance
• How data-rich analytics can help you debug elusive issues quickly

Site Reliability Engineering

Author: Niall Richard Murphy
Publisher: O'Reilly Media, Inc.
ISBN: 1491951176
Category :
Languages : en
Pages : 552

Book Description
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient; these lessons apply directly to your organization.
This book is divided into four sections:
• Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
• Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
• Practices: Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
• Management: Explore Google's best practices for training, communication, and meetings that your organization can use

Database Reliability Engineering

Author: Laine Campbell
Publisher: O'Reilly Media, Inc.
ISBN: 149192621X
Category : Computers
Languages : en
Pages : 309

Book Description
The infrastructure-as-code revolution in IT is also affecting database administration. With this practical book, developers, system administrators, and junior to mid-level DBAs will learn how the modern practice of site reliability engineering applies to the craft of database architecture and operations. Authors Laine Campbell and Charity Majors provide a framework for professionals looking to join the ranks of today’s database reliability engineers (DBREs). You’ll begin by exploring core operational concepts that DBREs need to master. Then you’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval. With a firm foundation in database reliability engineering, you’ll be ready to dive into the architecture and operations of any modern database.
This book covers:
• Service-level requirements and risk management
• Building and evolving an architecture for operational visibility
• Infrastructure engineering and infrastructure management
• How to facilitate the release management process
• Data storage, indexing, and replication
• Identifying datastore characteristics and best use cases
• Datastore architectural components and data-driven architectures

BPF Performance Tools

Author: Brendan Gregg
Publisher: Addison-Wesley Professional
ISBN: 0136624588
Category : Computers
Languages : en
Pages : 2525

Book Description
Use BPF Tools to Optimize Performance, Fix Problems, and See Inside Running Systems
BPF-based performance tools give you unprecedented visibility into systems and applications, so you can optimize performance, troubleshoot code, strengthen security, and reduce costs. BPF Performance Tools: Linux System and Application Observability is the definitive guide to using these tools for observability. Pioneering BPF expert Brendan Gregg presents more than 150 ready-to-run analysis and debugging tools, expert guidance on applying them, and step-by-step tutorials on developing your own. You’ll learn how to analyze CPUs, memory, disks, file systems, networking, languages, applications, containers, hypervisors, security, and the kernel. Gregg guides you from basic to advanced tools, helping you generate deeper, more useful technical insights for improving virtually any Linux system or application.
• Learn essential tracing concepts and both core BPF front-ends: BCC and bpftrace
• Master 150+ powerful BPF tools, including dozens created just for this book and available for download
• Discover practical strategies, tips, and tricks for more effective analysis
• Analyze compiled, JIT-compiled, and interpreted code in multiple languages: C, Java, bash shell, and more
• Generate metrics, stack traces, and custom latency histograms
• Use complementary tools when they offer quick, easy wins
• Explore advanced tools built on BPF: PCP and Grafana for remote monitoring, eBPF Exporter, and kubectl-trace for tracing Kubernetes
• Foreword by Alexei Starovoitov, creator of the new BPF
BPF Performance Tools will be an indispensable resource for all administrators, developers, support staff, and other IT professionals working with any recent Linux distribution in any enterprise or cloud environment.

Chaos Engineering

Author: Casey Rosenthal
Publisher: O'Reilly Media, Inc.
ISBN: 1492043818
Category : Computers
Languages : en
Pages : 312

Book Description
As more companies move toward microservices and other distributed technologies, the complexity of these systems increases. You can't remove the complexity, but through Chaos Engineering you can discover vulnerabilities and prevent outages before they impact your customers. This practical guide shows engineers how to navigate complex systems while optimizing to meet business goals. Two of the field's prominent figures, Casey Rosenthal and Nora Jones, pioneered the discipline while working together at Netflix. In this book, they expound on the what, how, and why of Chaos Engineering while facilitating a conversation among practitioners across industries. Many chapters are written by contributing authors to widen the perspective across verticals within (and beyond) the software industry.
• Learn how Chaos Engineering enables your organization to navigate complexity
• Explore a methodology to avoid failures within your application, network, and infrastructure
• Move from theory to practice through real-world stories from industry experts at Google, Microsoft, Slack, and LinkedIn, among others
• Establish a framework for thinking about complexity within software systems
• Design a Chaos Engineering program around game days and move toward highly targeted, automated experiments
• Learn how to design continuous collaborative chaos experiments

Distributed Tracing in Practice

Author: Austin Parker
Publisher: O'Reilly Media
ISBN: 149205660X
Category : Computers
Languages : en
Pages : 330

Book Description
Most applications today are distributed in some fashion. Monitoring the health and performance of these distributed architectures requires a new approach. Enter distributed tracing, a method of profiling and monitoring applications, especially those that use microservice architectures. There’s just one problem: distributed tracing can be hard. But it doesn’t have to be. With this practical guide, you’ll learn what distributed tracing is and how to use it to understand the performance and operation of your software. Key players at Lightstep walk you through instrumenting your code for tracing, collecting the data that your instrumentation produces, and turning it into useful, operational insights. If you want to start implementing distributed tracing, this book tells you what you need to know.
You’ll learn:
• The pieces of a distributed tracing deployment: instrumentation, data collection, and delivering value
• Best practices for instrumentation (the methods for generating trace data from your service)
• How to deal with or avoid overhead, costs, and sampling
• How to work with spans (the building blocks of request-based distributed traces) and choose span characteristics that lead to valuable traces
• Where distributed tracing is headed in the future

Feedback Systems

Author: Karl Johan Åström
Publisher: Princeton University Press
ISBN: 069121347X
Category : Technology & Engineering
Languages : en
Pages :

Book Description
The essential introduction to the principles and applications of feedback systems, now fully revised and expanded
This textbook covers the mathematics needed to model, analyze, and design feedback systems. Now more user-friendly than ever, this revised and expanded edition of Feedback Systems is a one-volume resource for students and researchers in mathematics and engineering. It has applications across a range of disciplines that utilize feedback in physical, biological, information, and economic systems. Karl Åström and Richard Murray use techniques from physics, computer science, and operations research to introduce control-oriented modeling. They begin with state space tools for analysis and design, including stability of solutions, Lyapunov functions, reachability, state feedback, observability, and estimators. The matrix exponential plays a central role in the analysis of linear control systems, allowing a concise development of many of the key concepts for this class of models. Åström and Murray then develop and explain tools in the frequency domain, including transfer functions, Nyquist analysis, PID control, frequency domain design, and robustness.
• Features a new chapter on design principles and tools, illustrating the types of problems that can be solved using feedback
• Includes a new chapter on fundamental limits and new material on the Routh-Hurwitz criterion and root locus plots
• Provides exercises at the end of every chapter
• Comes with an electronic solutions manual
• An ideal textbook for undergraduate and graduate students
• Indispensable for researchers seeking a self-contained resource on control theory

Systems Performance

Author: Brendan Gregg
Publisher: Pearson Education
ISBN: 0133390098
Category : Business & Economics
Languages : en
Pages : 777

Book Description
The Complete Guide to Optimizing Systems Performance
Written by the winner of the 2013 LISA Award for Outstanding Achievement in System Administration
Large-scale enterprise, cloud, and virtualized computing systems have introduced serious performance challenges. Now, internationally renowned performance expert Brendan Gregg has brought together proven methodologies, tools, and metrics for analyzing and tuning even the most complex environments. Systems Performance: Enterprise and the Cloud focuses on Linux and Unix performance, while illuminating performance issues that are relevant to all operating systems. You'll gain deep insight into how systems work and perform, and learn methodologies for analyzing and improving system and application performance. Gregg presents examples from bare-metal systems and virtualized cloud tenants running Linux-based Ubuntu, Fedora, and CentOS, and the illumos-based Joyent SmartOS and OmniTI OmniOS. He systematically covers modern systems performance, including the "traditional" analysis of CPUs, memory, disks, and networks, and new areas including cloud computing and dynamic tracing. This book also helps you identify and fix the "unknown unknowns" of complex performance: bottlenecks that emerge from elements and interactions you were not aware of. The text concludes with a detailed case study, showing how a real cloud customer issue was analyzed from start to finish.
Coverage includes:
• Modern performance analysis and tuning: terminology, concepts, models, methods, and techniques
• Dynamic tracing techniques and tools, including examples of DTrace, SystemTap, and perf
• Kernel internals: uncovering what the OS is doing
• Using system observability tools, interfaces, and frameworks
• Understanding and monitoring application performance
• Optimizing CPUs: processors, cores, hardware threads, caches, interconnects, and kernel scheduling
• Memory optimization: virtual memory, paging, swapping, memory architectures, busses, address spaces, and allocators
• File system I/O, including caching
• Storage devices/controllers, disk I/O workloads, RAID, and kernel I/O
• Network-related performance issues: protocols, sockets, interfaces, and physical connections
• Performance implications of OS and hardware-based virtualization, and new issues encountered with cloud computing
• Benchmarking: getting accurate results and avoiding common mistakes
This guide is indispensable for anyone who operates enterprise or cloud environments: system, network, database, and web admins; developers; and other professionals. For students and others new to optimization, it also provides exercises reflecting Gregg's extensive instructional experience.

Mastering Distributed Tracing

Author: Yuri Shkuro
Publisher: Packt Publishing Ltd
ISBN: 1788627598
Category : Computers
Languages : en
Pages : 445

Book Description
Understand how to apply distributed tracing to microservices-based architectures
Key Features
• A thorough conceptual introduction to distributed tracing
• An exploration of the most important open standards in the space
• A how-to guide for code instrumentation and operating a tracing infrastructure
Book Description
Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.
What you will learn
• How to get started with using a distributed tracing system
• How to get the most value out of end-to-end tracing
• Learn about open standards in the space
• Learn about code instrumentation and operating a tracing infrastructure
• Learn where distributed tracing fits into microservices as a core function
Who this book is for
Any developer interested in testing large systems will find this book very revealing and, in places, surprising. Every microservice architect and developer should have an insight into distributed tracing, and the book will help them on their way. System administrators with some development skills will also benefit. No particular programming language skills are required, although an ability to read Java, while non-essential, will help with the core chapters.

Platform Engineering for Architects

Author: Max Körbächer
Publisher: Packt Publishing Ltd
ISBN: 1836203586
Category : Computers
Languages : en
Pages : 374

Book Description
Design and build Internal Developer Platforms (IDPs) with future-oriented design strategies, using the Platform as a Product mindset
Key Features
• Learn how to design platforms that create value and drive user adoption
• Benefit from expert techniques for shifting to a product-centric mindset as an architect and platform team
• Implement best practices to understand platform complexity, manage technical debt, and ensure the platform's evolution
• Purchase of the print or Kindle book includes a free PDF eBook
Book Description
The rapid pace of technological advancements, the shortage of IT talent, and the complexity of modern systems highlight the need for structured guidance in building resilient, user-centric platforms for cloud-native environments. This book empowers platform engineers and architects to implement value-driven internal developer platforms. You’ll learn how to identify end users, understand their challenges, and define the purpose of a platform, with a focus on self-service solutions for modern cloud-native software development, delivery, and operations. The book incorporates real-world examples of building platforms within and for the cloud, leveraging the power of Kubernetes. You’ll learn how adopting a product mindset for architecting and building platforms helps foster successful platform engineering teams. This approach emphasizes early end-user involvement and provides a framework that you can easily adapt and extend for future use cases. The book also offers insights into building a sustainable platform without accumulating technical debt.
By the end of this book, you’ll be able to drive the design, definition, and implementation of platform capabilities as a product that aligns with your organizational requirements and strategy.
What you will learn
• Make informed decisions based on your organization's platform needs
• Identify missing platform capabilities and technical debt
• Develop a critical user journey through your platform capabilities
• Define the purpose, principles, and key performance indicators (KPIs) for your platform
• Utilize relevant data points for making data-driven product decisions
• Implement your own platform reference and target architectures
Who this book is for
This book is for platform architects and solutions architects seeking to enhance their skills in designing and building a platform as a product. It also offers valuable insights for decision-makers, platform engineers, and DevOps professionals. While familiarity with cloud-native concepts, CI/CD, and Kubernetes is beneficial, the book builds on these topics to address self-service, cost management, and technical debt. It’s particularly suited to experts tackling the challenge of integrating diverse domains to create effective internal developer platforms with top-notch operational readiness.