Cloud Operations - Other Services

A brief introduction to cloud operations services.


In the last lesson, we had a detailed overview of the Cloud Logging service on GCP. The operations stack also includes other services that help identify and fix performance issues in applications. Developers often overlook these services, but they can help solve serious problems in code.

Let’s have a look at those services one by one.

Trace

Latency is one of the biggest concerns in the world of microservices. You often hear about latency issues in APIs caused by multiple network hops or poorly written backend code.

Cloud Trace measures the latency of an endpoint. By default, Cloud Trace tracks App Engine endpoints and other system event-based endpoints. To include API endpoints for an application hosted on another compute service, an agent needs to be deployed on that service.
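To make the idea of tracing concrete, here is a minimal sketch that uses only the Python standard library. It is not the Cloud Trace API: the `span` helper, the `handle_request` endpoint, and the service names are hypothetical, and the sleeps stand in for network calls. It only shows the kind of per-hop timing data a tracing agent collects.

```python
import time
from contextlib import contextmanager

# Collected spans as (name, duration in seconds). A real tracing agent would
# export these to a backend instead of keeping them in a list.
spans = []

@contextmanager
def span(name):
    """Time the enclosed block and record it as a named span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))

def handle_request():
    """Hypothetical endpoint made of two downstream hops."""
    with span("GET /orders"):
        with span("auth-service"):
            time.sleep(0.01)   # stand-in for a network call
        with span("orders-db"):
            time.sleep(0.03)   # stand-in for a slow query

handle_request()
for name, seconds in spans:
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Spans are recorded when they finish, so the two child hops appear before the enclosing request. Comparing the child durations with the parent shows which hop contributes most to the endpoint's latency.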

To access the Trace dashboard, go to main menu > Operations > Trace.

In the Overview tab, you will see some endpoints if your project has any system-generated events. You can select an endpoint and drill down into its logs and timing.

Complete the following lab to understand more about the Trace service.

Lab

Profiler

Once you know which endpoint is taking the most time, you can use the Profiler to look at the system-level resources available to and used by the code. Profiler helps to identify bottlenecks.

Cloud Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications.
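The word "statistical" means the profiler samples the program at intervals instead of instrumenting every call, which keeps overhead low. The sketch below illustrates that idea with the standard library only. It is not how Cloud Profiler is implemented: the function names are hypothetical, and `sys._current_frames()` is a CPython-specific helper.

```python
import collections
import sys
import threading
import time

def sample_stacks(target_thread_id, counts, stop_event, interval=0.005):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)

def busy_work(duration=0.3):
    """Hypothetical CPU-bound hot spot."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += sum(range(100))
    return total

def profile(func):
    """Run func on the current thread while a background thread samples it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), counts, stop_event),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts

counts = profile(busy_work)
print(counts.most_common(3))
```

The function that shows up in the most samples is the one where the program spends most of its time, which is the same reasoning you apply when reading a profile.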

Open main menu > Operations > Profiler.

Cloud Profiler supports different types of profiling based on the language in which a program is written. The following table summarizes the supported profile types by language:

Profile Type     Go    Java   Node.js   Python
CPU Time         Yes   Yes    -         Yes
Heap             Yes   Yes    Yes       -
Allocated heap   Yes   -      -         -
Contention       Yes   -      -         -
Threads          Yes   -      -         -
Wall Time        -     Yes    Yes       Yes

To profile applications running on virtual machines, we need to install the profiler agent for the application's language. The agent typically comes as a library that you attach to your application when you run it. The agent collects profiling data as the app runs.
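For Python, attaching the agent could look like the sketch below. It assumes the `google-cloud-profiler` package (imported as `googlecloudprofiler`) is installed and that the code runs in an environment with GCP credentials. The `start_profiler` wrapper and the service name and version used with it are hypothetical.

```python
def start_profiler(service, version):
    """Try to start the Cloud Profiler agent; return True on success."""
    try:
        # Imported lazily so the app still runs where the agent is absent.
        import googlecloudprofiler
    except ImportError:
        return False
    try:
        googlecloudprofiler.start(service=service, service_version=version)
    except (ValueError, NotImplementedError) as exc:
        # Raised for invalid arguments or unsupported environments.
        print(f"Profiler not started: {exc}")
        return False
    return True
```

You would call it once at application startup, for example `start_profiler("my-service", "1.0.0")`, before the app begins serving requests. Returning a boolean instead of raising keeps a profiler problem from taking down the application.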

You can complete the following lab to understand and use the profiler.

Lab

Debugger

Most of the time, developers use log statements to check the flow of the code. Sometimes they set breakpoints to stop execution at a certain point and inspect the variables.

Can you do the same on production systems? It is not a good idea, and developers probably don’t have access to production systems anyway. As a DevOps practitioner, you can debug live applications using the Debugger.

Because we are debugging a live production system, there are no breakpoints. Instead, the Debugger provides other ways to inspect the running application:

  • Snapshot: Capture the state of your application in production at a specific line location.

  • Logpoint: Inject a new logging statement on demand at a specific line location.
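The snapshot idea can be sketched locally with Python's `sys.settrace`. This is only an illustration of capturing state at a line without pausing execution, not the Debugger's actual mechanism. The `run_with_snapshot` helper and the `apply_discount` function are hypothetical.

```python
import sys

def run_with_snapshot(func, line_offset, *args):
    """Call func(*args) and copy its locals when a given line is reached.

    line_offset counts lines from the `def` line of func (0 = the def line).
    Execution is never paused, which is the key difference from a breakpoint.
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset
    snapshot = {}

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None          # do not trace other functions
        if event == "line" and frame.f_lineno == target_line and not snapshot:
            snapshot.update(frame.f_locals)   # capture state, keep running
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, snapshot

def apply_discount(price, percent):
    """Hypothetical function under investigation."""
    discount = price * percent / 100
    total = price - discount
    return total

# Offset 4 is the `return total` line, so all locals are already assigned.
result, snap = run_with_snapshot(apply_discount, 4, 200, 10)
print(result, snap)
```

A logpoint follows the same pattern: instead of copying the locals, the tracer would write a log message when the target line is reached.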

You can find the steps to configure the Debugger for the language you are using in the documentation.

Error Reporting

The last service in the operations stack is Error Reporting. As the name suggests, Error Reporting is used to report crashes and errors in applications.

It aggregates all the errors in one place.
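As a rough illustration of what aggregation means, the sketch below groups exceptions by type and by the function that raised them, using only the standard library. The grouping rule and all function names are assumptions made for this example. The Error Reporting service applies its own grouping logic.

```python
import collections
import traceback

def group_key(exc):
    """Group by exception type plus the function where it was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    location = frames[-1].name if frames else "<unknown>"
    return (type(exc).__name__, location)

def parse_quantity(raw):
    """Hypothetical application function that can fail in several ways."""
    return int(raw)

def aggregate_errors(inputs):
    """Run parse_quantity on each input and count failures per group."""
    groups = collections.Counter()
    for raw in inputs:
        try:
            parse_quantity(raw)
        except Exception as exc:
            groups[group_key(exc)] += 1
    return groups

groups = aggregate_errors(["1", "x", None, "y", "2"])
for (error_type, location), count in groups.most_common():
    print(f"{error_type} in {location}: {count} occurrence(s)")
```

Grouping turns many individual failures into a short list of distinct problems with counts, which is what makes it easy to see which error deserves attention first.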

The most important thing to remember about Error Reporting is that you can configure notifications for errors in your project. This helps you act on errors immediately.

To access Error Reporting, open main menu > Operations > Error Reporting.

Like the other services, Error Reporting has its own libraries and agents, which you install on the compute service that hosts the application.

This is enough for the exam because you will not be asked to configure Error Reporting. However, you should know how to find the documentation for installing the agent on a particular compute service.
