Understanding Multi-Agent Remote Debugging for Java

Photo by Nubelson Fernandes on Unsplash

As complex architectures and dynamic deployments become more common, the term ‘remote debugging’ instills fear in even the bravest of dev hearts. Many developers are comfortable following the trusted path of debugging on their own machine. However, traditional tools are adequately equipped only for debugging monolithic applications, while microservice architectures are becoming increasingly complex.

As new techniques and architectures emerge, debugging techniques have to keep pace. Tools therefore also need to be more sophisticated to support remote debugging and multi-agent remote debugging.

Introduction to Remote Debugging

As the name suggests, debugging (de + bugging) is the process of identifying or locating abnormalities in a computer program and removing them. Debugging can be one of the biggest nightmares if you don’t use the proper tools and plugins, but it is also a must-have skill for every developer. It also helps you understand the flow of the program code.

Troubleshooting a problem in an application on your own machine may be easy, but debugging on a remote system, especially in production, is not. Production servers usually run in a restricted environment where not all tools and plugins are available.

Java Configuration for Remote Debugging

Java has a well-defined set of interfaces and rules for debugging, both local and remote. The Java Platform Debugger Architecture (JPDA) consists of three interfaces for implementing custom debuggers:

  1. The Java Virtual Machine Tool Interface (JVM TI), a native interface used to inspect and control the application running in the Java Virtual Machine
  2. The Java Debug Wire Protocol (JDWP), the protocol used for communication between the application being debugged (the debuggee) and the debugger
  3. The Java Debug Interface (JDI), a high-level Java API used to implement the debugger application
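To make the JDI layer a little more concrete, here is a minimal sketch that asks the JDK which debugger connectors it offers and which transport each one uses. It assumes a full JDK in which the jdk.jdi module is available; the class name JdiConnectors is made up for this illustration.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.connect.Connector;

import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): lists the JDI connectors of this JDK.
class JdiConnectors {

    // Returns one line per connector: "<connector name> (transport: <transport name>)".
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (Connector c : Bootstrap.virtualMachineManager().allConnectors()) {
            lines.add(c.name() + " (transport: " + c.transport().name() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // The exact list depends on the JDK and the operating system.
        describe().forEach(System.out::println);
    }
}
```

On a typical JDK, the list should include a socket-based attaching connector, which is the one a remote debugger would normally use.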

JDWP Options

Transport: This defines the transport mechanism to use. dt_shmem works only on Windows and requires the debuggee and debugger processes to run on the same machine. dt_socket is available on all platforms and allows the two processes to run on different machines (remote debugging).

Server: This option is not mandatory. With server=y, the JVM listens for a debugger to attach at the address given in the address option. With server=n (the default), the JVM instead tries to attach to a debugger that is already listening at that address.

Suspend: This defines whether the JVM should suspend at startup and wait for the debugger to attach before running the application.

Address: This contains the address and port exposed by the debuggee.

The following is a sample command with the JDWP options to enable the debugger:

java -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=
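Because a stray space or a missing comma in this option string is an easy mistake to make, it can help to assemble it in code. The sketch below is only an illustration: the class JdwpOptions and its methods are hypothetical, and the port 5005 is an assumed example value. It also shows one way to check whether the current JVM was started with the JDWP agent.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper (hypothetical name): builds and detects JDWP agent options.
class JdwpOptions {

    // Builds the -agentlib:jdwp argument. Sub-options are separated by commas only;
    // a space inside the string would split it into two command-line arguments.
    static String build(String transport, boolean server, boolean suspend, String address) {
        return "-agentlib:jdwp=transport=" + transport
                + ",server=" + (server ? "y" : "n")
                + ",suspend=" + (suspend ? "y" : "n")
                + ",address=" + address;
    }

    // Returns true if any of the given JVM arguments loads the JDWP agent
    // (either the -agentlib form or the legacy -Xrunjdwp form).
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "5005" is an assumed example port, not a value from the article.
        System.out.println(build("dt_socket", true, false, "5005"));
        // Inspect the arguments this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP enabled in this JVM: " + hasJdwpAgent(jvmArgs));
    }
}
```

Note that on JDK 9 and later, an address that contains only a port makes the agent listen on localhost only; to accept connections from other machines, the address has to name a host or use the form *:port.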

Java Debugger

JDB, the Java debugger, is included in the JDK. To start a debug session, attach JDB to the address and port on which the JDWP agent is listening.

jdb -attach
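The same attach step can also be done programmatically through JDI, which is roughly what a debugger front end does. The following is a minimal sketch under a few assumptions: the jdk.jdi module is available, the debuggee was started with transport=dt_socket and server=y, and the host and port are passed on the command line. The class name JdiAttach is hypothetical.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.IllegalConnectorArgumentsException;

import java.io.IOException;
import java.util.Map;

// Illustrative helper (hypothetical name): attaches to a JDWP socket via JDI.
class JdiAttach {

    // Finds the socket-based attaching connector shipped with the JDK.
    static AttachingConnector socketConnector() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if (c.name().equals("com.sun.jdi.SocketAttach")) {
                return c;
            }
        }
        throw new IllegalStateException("SocketAttach connector not available in this JDK");
    }

    // Fills in the connector arguments for the given debuggee host and port.
    static Map<String, Connector.Argument> arguments(AttachingConnector connector, String host, int port) {
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(Integer.toString(port));
        return args;
    }

    public static void main(String[] argv) throws IOException, IllegalConnectorArgumentsException {
        if (argv.length < 2) {
            System.out.println("usage: JdiAttach <host> <port>");
            return;
        }
        AttachingConnector connector = socketConnector();
        // Fails with an IOException if no debuggee is listening at host:port.
        VirtualMachine vm = connector.attach(arguments(connector, argv[0], Integer.parseInt(argv[1])));
        try {
            System.out.println("Attached to: " + vm.name() + " " + vm.version());
        } finally {
            // Detach without terminating the debuggee.
            vm.dispose();
        }
    }
}
```

Keeping the argument preparation separate from the actual attach call makes the connection settings easy to inspect without needing a running debuggee.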

Multi-Agent Remote Debugging

Developers are moving from monolithic architectures towards serverless and microservices architectures. With the growing use of producer-consumer, microservice, serverless, and distributed architectures, applications are deployed as multiple instances across bare-metal servers, virtual machines, and container pods.

Debugging multiple application instances is hard because they are distributed and because of their concurrent, adaptive, mobile, and heterogeneous nature. The toolset used for multi-agent debugging should be able to trace exceptions and collect application logs, metrics, and traces. Traditionally, all of these have had to be defined during the development stage.

Sophisticated tools are required to find bugs in the production environment and resolve possible defects in such programs.

In traditional multi-agent debugging, the developer has to connect to every environment separately whenever they need to debug, as shown in the figure below.

Tools for Multi-Agent Remote Debugging

The need for multi-agent remote debugging tools is clear: debugging an application deployed on multiple servers using only JDWP and the Java debugger is not an easy task.
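To give a feel for the bookkeeping involved, the sketch below walks over a list of debug endpoints and checks which of them accept a TCP connection at all. It is only an illustration of the per-node effort: the class name DebugEndpointProbe, the host names, and the port 5005 are all assumptions. It performs a plain reachability check rather than a JDWP handshake, so it should be treated as a network test and not pointed carelessly at production debug ports.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper (hypothetical name): checks which debug endpoints are reachable.
class DebugEndpointProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unknown hosts.
            return false;
        }
    }

    // Probes every "host:port" endpoint one after another, preserving the input order.
    static Map<String, Boolean> probeAll(List<String> endpoints, int timeoutMillis) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String endpoint : endpoints) {
            int colon = endpoint.lastIndexOf(':');
            String host = endpoint.substring(0, colon);
            int port = Integer.parseInt(endpoint.substring(colon + 1));
            result.put(endpoint, isReachable(host, port, timeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical node names and port, for illustration only.
        List<String> endpoints = List.of("app-node-1:5005", "app-node-2:5005", "localhost:5005");
        probeAll(endpoints, 500).forEach((endpoint, ok) ->
                System.out.println(endpoint + " -> " + (ok ? "open" : "unreachable")));
    }
}
```

Even this simple check has to be repeated for every node, and an actual debug session then still has to be opened against each reachable instance one at a time.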

Additionally, when you debug with JDWP and the Java debugger, the exception traces, logs, and metrics required for debugging the system must be identified and added during development. If you attempt production debugging in such an architecture, you will have to sift through many log files, often add more logging to your application, and then redeploy and restart it to get the additional data needed to trace the error. This process is time-consuming and demands patience and practice. These modern, complex requirements and software architectures widen the gap between the dev (debug) environment and the production environment.

Classic debugging might yield results, but remote debugging and multi-agent remote debugging done right save significant time and headaches. Given the current trend, it seems very likely that both will soon become necessities.

Multi-Agent Remote Debugging is Broken—Fix it

In principle, multi-agent remote debugging works the same way as localhost debugging. However, the application is deployed on multiple machines, and access to those machines is restricted because production runs in a tightly controlled environment. Connecting to multiple machines and collecting logs from all of them is not an easy task.

Newer tools offer solutions to this problem. One such tool is Lightrun, a developer-native observability platform that integrates into the developer workflow so that you can troubleshoot and debug applications without having to rebuild them.

Lightrun aims to give developers full visibility into their code, irrespective of the environment and the deployment model, be it a monolithic application or a microservices architecture.

With Lightrun, you can see exceptions in the production environment as they occur and add logs, metrics, and traces to production code on the fly.

How I Started Using Lightrun

Installing and using Lightrun is very easy. I am running the cloud edition. 

First, I signed up and completed the onboarding process. The backend is a service running in Lightrun’s cloud.

Then, I installed the Lightrun plugin from the marketplace in IntelliJ IDEA Ultimate, which acts as my debugger client, and logged in to Lightrun so that the plugin can communicate with the backend.

Next come the agents. I have agents running on multiple servers: a few on Windows, a few on Linux virtual machines in the cloud, and a small number in Docker containers.

Lightrun supports all the major environments like Linux, Alpine, Mac, and Windows (WSL), so it can be added to your Docker container as well.

I installed the agent on all the nodes and set LIGHTRUN_KEY to the API key of my account so that the agents can also communicate with the backend.

The agents are automatically listed in the Lightrun window of my IDE, and I can filter them by tags. Below is a screenshot filtered using the ‘Production’ tag.

To debug, right-click on the code, choose Lightrun, choose a log, snapshot, or metric based on your requirement, and run the application. When any of the agents reaches that line of code, the requested operation is carried out. You can apply it to a single agent or to all the agents with a given tag.


Using legacy tools built for monolithic architectures reduces developer productivity and can cause long delays in application releases for any organization. Tools and plugins need to evolve along with software architectures and deployment methodologies. Investing in the right tools helps you stay ahead of the curve and operate effectively and efficiently.

Also published on Medium.
