Development and programming tools are used to build frameworks and to create, debug, and maintain programs, among many other things. The resources in this Zone cover topics such as compilers, database management systems, code editors, and other software tools, and can help ensure engineers write clean code.
Top 10 Essential Linux Commands
Build a Time-Tracking App With ClickUp API Integration Using Openkoda
The Advantages of Elastic APM for Observing the Tested Environment

My first use of the Elastic Application Performance Monitoring (Elastic APM) solution dates back to 2019, on microservice-based projects for which I was responsible for performance testing. At that time, the first versions of Elastic APM were being released. I was attracted by the easy installation of the agents, the many protocols supported by the Java agent (see Elastic supported technologies), including the Apache HttpClient used in JMeter, the other supported languages (Go, .NET, Node.js, PHP, Python, Ruby), and the quality of the APM dashboards in Kibana. I found the information displayed in the Kibana APM dashboards to be relevant and not too verbose. The Java agent's monitoring is simple but displays the essential information about the machine's OS and JVM. The open-source nature of the tool and the fact that its main functions are free were also decisive. I have since generalized the use of the Elastic APM solution in performance environments for all my projects. With Elastic APM, I get the timelines of the different calls and exchanges between web services, the SQL queries executed, the exchanges of messages via JMS, and monitoring. I also have quick access to errors or exceptions thrown in Java applications.

Why Integrate Elastic APM in Apache JMeter

By adding Java APM agents to web applications, we get the timelines of the called services in the Kibana dashboards. However, we mostly remain at the level of individual REST API calls, because we do not have the notion of a page. For example, page PAGE01 will make the following API calls: /rest/service1, /rest/service2, /rest/service3. Another page, PAGE02, will make the following calls: /rest/service2, /rest/service4, /rest/service5, /rest/service6. A third page, PAGE03, will make the following calls: /rest/service1, /rest/service2, /rest/service4. In this example, service2 is called on three different pages and service4 on two. If we look at service2 in the Kibana dashboard, we will find the union of the calls coming from the three pages, but we don't have the notion of a page. We cannot answer "Within this page, what is the breakdown of time across the different REST calls?", yet for a user of the application, page response time is what matters.

The goal of the jmeter-elastic-apm tool is to carry the notion of a page, which already exists in JMeter as the Transaction Controller, into Elastic APM. In JMeter, this starts by creating an APM transaction and then propagating its identifier (traceparent) through the Elastic agent to the HTTP REST requests sent to the web services, because the APM agent recognizes the Apache HttpClient library and can instrument it. The APM agent adds the identifier of the APM transaction to the headers of the HTTP request; the headers added are traceparent and elastic-apm-traceparent. We thus start from the notion of a page in JMeter (the Transaction Controller) and follow it down to the HTTP calls of the web application (gestdoc) hosted in Tomcat. For an application composed of multiple web services, the timeline shows the different web services called over HTTP(S) or JMS and the time spent in each one. This is an example of a technical architecture for a performance test with Apache JMeter and the Elastic APM Agent, used to test a web application hosted in Apache Tomcat.

How the jmeter-elastic-apm Tool Works

jmeter-elastic-apm adds Groovy code before a JMeter Transaction Controller to create an APM transaction before a page.
In the JMeter Transaction Controller, we find the HTTP samplers that make REST HTTP(S) calls to the services. The Elastic APM agent automatically adds a new traceparent header containing the identifier of the APM transaction, because it recognizes the Apache HttpClient used by the HTTP sampler. The Groovy code then terminates the APM transaction to mark the end of the page. The jmeter-elastic-apm tool automates the addition of this Groovy code before and after each JMeter Transaction Controller. The jmeter-elastic-apm tool is open source on GitHub (see the link in the Conclusion section of this article).

This JMeter script is simple, with three pages in three JMeter Transaction Controllers. After launching the jmeter-elastic-apm tool with the ADD action, the JMeter Transaction Controllers are surrounded by Groovy code that creates an APM transaction before the Transaction Controller and closes it after the Transaction Controller.

In the "groovy begin transaction apm" sampler, the Groovy code calls the Elastic APM API (simplified version):

Groovy
Transaction transaction = ElasticApm.startTransaction();
Scope scope = transaction.activate();
transaction.setName(transactionName); // contains the JMeter Transaction Controller name

In the "groovy end transaction apm" sampler, the Groovy code calls the Elastic APM API (simplified version):

Groovy
transaction.end();

Configuring Apache JMeter With the Elastic APM Agent and the APM Library

Start Apache JMeter with the Elastic APM agent and the Elastic APM API library:

1. Declare the Elastic APM agent (use this URL to find the APM agent). Add the Elastic APM agent somewhere in the filesystem (it could be in <JMETER_HOME>\lib, but this is not mandatory). In <JMETER_HOME>\bin, modify jmeter.bat or setenv.bat and add the Elastic APM configuration like so:

Shell
set APM_SERVICE_NAME=yourServiceName
set APM_ENVIRONMENT=yourEnvironment
set APM_SERVER_URL=http://apm_host:8200
set JVM_ARGS=-javaagent:<PATH_TO_AGENT_APM_JAR>\elastic-apm-agent-<version>.jar -Delastic.apm.service_name=%APM_SERVICE_NAME% -Delastic.apm.environment=%APM_ENVIRONMENT% -Delastic.apm.server_urls=%APM_SERVER_URL%

2. Add the Elastic APM API library (apm-agent-api-<version>.jar) to <JMETER_HOME>\lib. This library is used by the JSR223 Groovy code. Use this URL to find the APM library.

Recommendations on the Impact of Adding Elastic APM in JMeter

The APM agent intercepts and modifies all HTTP sampler calls, and this information is stored in Elasticsearch. It is preferable to deliberately disable the HTTP requests for static elements (images, CSS, JavaScript, fonts, etc.), which can generate a large number of requests that are not very useful when analyzing the timeline. For heavy load testing, it is recommended to change the elastic.apm.transaction_sample_rate parameter so that only a fraction of the calls is recorded, in order not to saturate the APM Server and Elasticsearch. This elastic.apm.transaction_sample_rate parameter can be declared in <JMETER_HOME>\bin\jmeter.bat or setenv.bat, but also in a JSR223 sampler with a short piece of Groovy code in a setUp thread group. The following Groovy code records only 50% of the samples:

Groovy
import co.elastic.apm.api.ElasticApm;
// update elastic.apm.transaction_sample_rate
ElasticApm.setConfig("transaction_sample_rate", "0.5");

Conclusion

The jmeter-elastic-apm tool allows you to easily integrate the Elastic APM solution into JMeter and add the notion of a page to the timelines of the Kibana APM dashboards.
Elastic APM plus Apache JMeter is an excellent combination for understanding how the environment behaves during a performance test, with simple monitoring, quality dashboards, timelines breaking down time across the different layers of a distributed application, and the display of exceptions thrown in web services. Over time, the Elastic APM solution only gets better. I strongly recommend it in a performance testing context, of course, but it also has many advantages in a development environment used by developers, or in an integration environment used by functional or technical testers.

Links:
Command-line tool: jmeter-elastic-apm
JMeter plugin: elastic-apm-jmeter-plugin
Elastic APM guides: APM Guide or Application performance monitoring (APM)
Serverless architectures have emerged as a paradigm-shifting approach to building fast, scalable, and cost-efficient applications. While serverless architectures provide unparalleled flexibility, they also introduce new challenges in terms of monitoring and troubleshooting. In this article, we'll explore how Quarkus integrates with AWS X-Ray and how using a Jakarta CDI interceptor can keep your code clean while adding custom instrumentation.

Quarkus and AWS Lambda

Quarkus is a Java-based framework tailored for GraalVM and HotSpot, which results in an amazingly fast boot time and an incredibly low memory footprint. It offers near-instant scale-up and high-density memory utilization, which can be very useful for container orchestration platforms like Kubernetes or serverless runtimes like AWS Lambda. Building AWS Lambda functions can be as easy as starting a Quarkus project, adding the quarkus-amazon-lambda dependency, and defining your AWS Lambda handler function.

XML
<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-amazon-lambda</artifactId>
</dependency>

An extensive guide on how to develop AWS Lambda functions with Quarkus can be found in the official Quarkus AWS Lambda Guide.

Enabling X-Ray for Your Lambda Functions

Quarkus provides out-of-the-box support for X-Ray, but you will need to add a dependency to your project and configure some settings to make it work with GraalVM/native compiled Quarkus applications. Let's first start by adding the quarkus-amazon-lambda-xray dependency.

XML
<!-- adds dependency on required x-ray classes and adds support for graalvm native -->
<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-amazon-lambda-xray</artifactId>
</dependency>

Don't forget to enable tracing for your Lambda function; otherwise, it won't work. One way of doing that is by setting the tracing argument to Tracing.ACTIVE within your AWS CDK code.

Java
function = Function.Builder.create(this, "feed-parsing-function")
    ...
    .memorySize(512)
    .tracing(Tracing.ACTIVE)
    .runtime(Runtime.PROVIDED_AL2023)
    .logRetention(RetentionDays.ONE_WEEK)
    .build();

After deploying and invoking your function, you should be able to see the X-Ray traces from within the CloudWatch interface. By default, it will show some basic timing information for your function, such as the initialization and the invocation duration.

Adding More Instrumentation

Now that the dependencies are in place and tracing is enabled for our function, we can enrich the traces in X-Ray by leveraging the X-Ray SDK's TracingInterceptor. For instance, for the SQS and DynamoDB clients, you can explicitly set the interceptor inside the application.properties file.

Plain Text
quarkus.dynamodb.async-client.type=aws-crt
quarkus.dynamodb.interceptors=com.amazonaws.xray.interceptors.TracingInterceptor
quarkus.sqs.async-client.type=aws-crt
quarkus.sqs.interceptors=com.amazonaws.xray.interceptors.TracingInterceptor

After putting these properties in place, redeploying, and executing the function, the TracingInterceptor will wrap each API call to SQS and DynamoDB and store the call information alongside the trace. This is very useful for debugging purposes, as it allows you to validate your code and check for any mistakes. Requests to AWS services are part of the pricing model, so if you make a mistake in your code and make too many calls, it can become quite costly.
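To make the effect of that configuration concrete, here is a minimal sketch of a bean whose DynamoDB calls would be traced automatically once the properties above are in place. This is not code from the article: the repository class, the "feed-items" table name, and the attribute names are illustrative assumptions; only the injected DynamoDbAsyncClient and the AWS SDK v2 calls are standard.

Java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
import software.amazon.awssdk.services.dynamodb.model.PutItemResponse;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical repository: with quarkus.dynamodb.interceptors pointing at the X-Ray
// TracingInterceptor, every call made through this client shows up as a subsegment.
@ApplicationScoped
public class FeedItemRepository {

    @Inject
    DynamoDbAsyncClient dynamoDb; // async client produced by the Quarkus DynamoDB extension

    public CompletableFuture<PutItemResponse> save(String id, String title) {
        PutItemRequest request = PutItemRequest.builder()
                .tableName("feed-items") // assumed table name, for illustration only
                .item(Map.of(
                        "id", AttributeValue.builder().s(id).build(),
                        "title", AttributeValue.builder().s(title).build()))
                .build();
        // No X-Ray-specific code here; the configured interceptor wraps the call transparently.
        return dynamoDb.putItem(request);
    }
}

The point is that the business code stays free of tracing concerns; the interceptor configured in application.properties does the work.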
Custom Subsegments With the AWS SDK TracingInterceptor configured, we get information about the calls to the AWS APIs, but what if we want to see information about our own code or remote calls to services outside of AWS? The Java SDK for X-Ray supports the concept of adding custom subsegments to your traces. You can add subsegments to a trace by adding a few lines of code to your own business logic as you can see in the following code snippet. Java public void someMethod(String argument) { // wrap in subsegment Subsegment subsegment = AWSXRay.beginSubsegment("someMethod"); try { // Your business logic } catch (Exception e) { subsegment.addException(e); throw e; } finally { AWSXRay.endSubsegment(); } } Although this is trivial to do, it will become quite messy if you have a lot of methods you want to apply tracing to. This isn't ideal, and it would be better if we didn't have to mix our own code with the X-Ray instrumentation. Quarkus and Jakarta CDI Interceptors The Quarkus programming model is based on the Lite version of the Jakarta Contexts and Dependency Injection 4.0 specification. Besides dependency injection, the specification also describes other features like: Lifecycle Callbacks — A bean class may declare lifecycle @PostConstruct and @PreDestroy callbacks. Interceptors — Used to separate cross-cutting concerns from business logic. Decorators — Similar to interceptors, but because they implement interfaces with business semantics, they are able to implement business logic. Events and Observers — Beans may also produce and consume events to interact in a completely decoupled fashion. As mentioned, CDI Interceptors are used to separate cross-cutting concerns from business logic. As tracing is a cross-cutting concern, this sounds like a great fit. Let's take a look at how we can create an interceptor for our AWS X-Ray instrumentation. How to Create an Interceptor for AWS X-Ray Instrumentation We start with defining our interceptor binding, which we will call XRayTracing. Interceptor bindings are intermediate annotations that may be used to associate interceptors with target beans. Java package com.jeroenreijn.aws.quarkus.xray; import jakarta.annotation.Priority; import jakarta.interceptor.InterceptorBinding; import java.lang.annotation.Retention; import static java.lang.annotation.RetentionPolicy.RUNTIME; @InterceptorBinding @Retention(RUNTIME) @Priority(0) public @interface XRayTracing { } The next step is to define the actual Interceptor logic, which is the code that will add the additional X-Ray instructions for creating the subsegment and wrapping it around our business logic. Java package com.jeroenreijn.aws.quarkus.xray; import com.amazonaws.xray.AWSXRay; import jakarta.interceptor.AroundInvoke; import jakarta.interceptor.Interceptor; import jakarta.interceptor.InvocationContext; @Interceptor @XRayTracing public class XRayTracingInterceptor { @AroundInvoke public Object tracingMethod(InvocationContext ctx) throws Exception { AWSXRay.beginSubsegment("## " + ctx.getMethod().getName()); try { return ctx.proceed(); } catch (Exception e) { AWSXRay.getCurrentSubsegment().addException(e); throw e; } finally { AWSXRay.endSubsegment(); } } } An important part of the interceptor is the @AroundInvoke annotation, which means that this interceptor code will be wrapped around the invocation of our own business logic. Now that we've defined both our interceptor binding and our interceptor, it's time to start using it. 
Every method that we want to create a subsegment for can now be annotated with the @XRayTracing annotation. Java @XRayTracing public SyndFeed getLatestFeed() { InputStream feedContent = getFeedContent(); return getSyndFeed(feedContent); } @XRayTracing public SyndFeed getSyndFeed(InputStream feedContent) { try { SyndFeedInput feedInput = new SyndFeedInput(); return feedInput.build(new XmlReader(feedContent)); } catch (FeedException | IOException e) { throw new RuntimeException(e); } } That looks much better. Pretty clean, if I say so myself. Based on the hierarchy of subsegments for a trace, X-Ray will be able to show a nested tree structure with the timing information. Closing Thoughts The integration between Quarkus and X-Ray is quite simple to enable. The developer experience is really good out of the box with defining the interceptors on a per-client basis. With the help of CDI interceptors, you can keep your code clean without worrying too much about X-Ray-specific code inside your business logic. An alternative to building your own Interceptor might be to start using AWS PowerTools for Lambda (Java). Powertools for Java is a great way to boost your developer productivity, but it can be used for more than X-Ray, so I’ll save it for another post.
Wireshark, the free, open-source packet sniffer and network protocol analyzer, has cemented itself as an indispensable tool for network troubleshooting, analysis, and security (on both sides). This article delves into the features, uses, and practical tips for harnessing the full potential of Wireshark, expanding on aspects that may have been glossed over in discussions or demonstrations. Whether you're a developer, a security expert, or just curious about network operations, this guide will enhance your understanding of Wireshark and its applications.

Introduction to Wireshark

Wireshark, initially developed by Gerald Combs, is designed to capture and analyze network packets in real time. Its capabilities extend across various network interfaces and protocols, making it a versatile tool for anyone involved in networking. Unlike its command-line counterpart, tcpdump, Wireshark's graphical interface simplifies the analysis process, presenting data in a user-friendly "proto view" that organizes packets in a hierarchical structure. This facilitates quick identification of protocols, ports, and data flows. The key features of Wireshark are:

Graphical user interface (GUI): Eases the analysis of network packets compared to command-line tools
Proto view: Displays packet data in a tree structure, simplifying protocol and port identification
Compatibility: Supports a wide range of network interfaces and protocols

Browser Network Monitors

Firefox and Chrome contain a far superior network monitor tool built in. It is superior because it is simpler to use and works with secure websites out of the box. If you can use the browser to debug the network traffic, you should do that. When your traffic requires low-level protocol information or happens outside the browser, Wireshark is the next best thing.

Installation and Getting Started

To begin with Wireshark, visit the official website for the download. The installation process is straightforward, but attention should be paid to the installation of the command-line tools, which may require separate steps. Upon launching Wireshark, users are greeted with a selection of network interfaces. Choosing the correct interface, such as the loopback interface for local server debugging, is crucial for capturing relevant data. When debugging a local server (localhost), use the loopback interface. Traffic to remote servers will typically go through a physical adapter such as en0. You can use the activity graph next to each network adapter to identify active interfaces for capture.

Navigating Through Noise With Filters

One of the challenges of using Wireshark is the overwhelming amount of data captured, including irrelevant "background noise." Wireshark addresses this with powerful display filters, allowing users to hone in on specific ports, protocols, or data types. For instance, filtering TCP traffic on port 8080 can significantly reduce unrelated data, making it easier to debug specific issues. Notice that there is a completion widget at the top of the Wireshark UI that helps you discover valid filter values. In this case, we filter by port with tcp.port == 8080, the port typically used by Java servers (e.g., Spring Boot/Tomcat). We can narrow this down further by filtering on the protocol: adding http to the filter restricts the view to HTTP requests and responses.
Deep Dive Into Data Analysis Wireshark excels in its ability to dissect and present network data in an accessible manner. For example, HTTP responses carrying JSON data are automatically parsed and displayed in a readable tree structure as seen below. This feature is invaluable for developers and analysts, providing insights into the data exchanged between clients and servers without manual decoding. Wireshark parses and displays JSON data within the packet analysis pane. It offers both hexadecimal and ASCII views for raw packet data. Beyond Basic Usage While Wireshark's basic functionalities cater to a wide range of networking tasks, its true strength lies in advanced features such as ethernet network analysis, HTTPS decryption, and debugging across devices. These tasks, however, may involve complex configuration steps and a deeper understanding of network protocols and security measures. There are two big challenges when working with Wireshark: HTTPS decryption: Decrypting HTTPS traffic requires additional configuration but offers visibility into secure communications. Device debugging: Wireshark can be used to troubleshoot network issues on various devices, requiring specific knowledge of network configurations. The Basics of HTTPS Encryption HTTPS uses the Transport Layer Security (TLS) or its predecessor, Secure Sockets Layer (SSL), to encrypt data. This encryption mechanism ensures that any data transferred between the web server and the browser remains confidential and untouched. The process involves a series of steps including handshake, data encryption, and data integrity checks. Decrypting HTTPS traffic is often necessary for developers and network administrators to troubleshoot communication errors, analyze application performance, or ensure that sensitive data is correctly encrypted before transmission. It's a powerful capability in diagnosing complex issues that cannot be resolved by simply inspecting unencrypted traffic or server logs. Methods for Decrypting HTTPS in Wireshark Important: Decrypting HTTPS traffic should only be done on networks and systems you own or have explicit permission to analyze. Unauthorized decryption of network traffic can violate privacy laws and ethical standards. Pre-Master Secret Key Logging One common method involves using the pre-master secret key to decrypt HTTPS traffic. Browsers like Firefox and Chrome can log the pre-master secret keys to a file when configured to do so. Wireshark can then use this file to decrypt the traffic: Configure the browser: Set an environment variable (SSLKEYLOGFILE) to specify a file where the browser will save the encryption keys. Capture traffic: Use Wireshark to capture the traffic as usual. Decrypt the traffic: Point Wireshark to the file with the pre-master secret keys (through Wireshark's preferences) to decrypt the captured HTTPS traffic. Using a Proxy Another approach involves routing traffic through a proxy server that decrypts HTTPS traffic and then re-encrypts it before sending it to the destination. This method might require setting up a dedicated decryption proxy that can handle the TLS encryption/decryption: Set up a decryption proxy: Tools like Mitmproxy or Burp Suite can act as an intermediary that decrypts and logs HTTPS traffic. Configure network to route through proxy: Ensure the client's network settings route traffic through the proxy. Inspect Traffic: Use the proxy's tools to inspect the decrypted traffic directly. 
Integrating tcpdump With Wireshark for Enhanced Network Analysis While Wireshark offers a graphical interface for analyzing network packets, there are scenarios where using it directly may not be feasible due to security policies or operational constraints. tcpdump, a powerful command-line packet analyzer, becomes invaluable in these situations, providing a flexible and less intrusive means of capturing network traffic. The Role of tcpdump in Network Troubleshooting tcpdump allows for the capture of network packets without a graphical user interface, making it ideal for use in environments with strict security requirements or limited resources. It operates under the principle of capturing network traffic to a file, which can then be analyzed at a later time or on a different machine using Wireshark. Key Scenarios for tcpdump Usage High-security environments: In places like banks or government institutions where running network sniffers might pose a security risk, tcpdump offers a less intrusive alternative. Remote servers: Debugging issues on a cloud server can be challenging with Wireshark due to the graphical interface; tcpdump captures can be transferred and analyzed locally. Security-conscious customers: Customers may be hesitant to allow third-party tools to run on their systems; tcpdump's command-line operation is often more palatable. Using tcpdump Effectively Capturing traffic with tcpdump involves specifying the network interface and an output file for the capture. This process is straightforward but powerful, allowing for detailed analysis of network interactions: Command syntax: The basic command structure for initiating a capture involves specifying the network interface (e.g., en0 for wireless connections) and the output file name. Execution: Once the command is run, tcpdump silently captures network packets. The capture continues until it's manually stopped, at which point the captured data can be saved to the specified file. Opening captures in Wireshark: The file generated by tcpdump can be opened in Wireshark for detailed analysis, utilizing Wireshark's advanced features for dissecting and understanding network traffic. The following shows the tcpdump command and its output: $ sudo tcpdump -i en0 -w output Password: tcpdump: listening on en, link-type EN10MB (Ethernet), capture size 262144 bytes ^C3845 packets captured 4189 packets received by filter 0 packets dropped by kernel Challenges and Considerations Identifying the correct network interface for capture on remote systems might require additional steps, such as using the ifconfig command to list available interfaces. This step is crucial for ensuring that relevant traffic is captured for analysis. Final Word Wireshark stands out as a powerful tool for network analysis, offering deep insights into network traffic and protocols. Whether it's for low-level networking work, security analysis, or application development, Wireshark's features and capabilities make it an essential tool in the tech arsenal. With practice and exploration, users can leverage Wireshark to uncover detailed information about their networks, troubleshoot complex issues, and secure their environments more effectively. Wireshark's blend of ease of use with profound analytical depth ensures it remains a go-to solution for networking professionals across the spectrum. Its continuous development and wide-ranging applicability underscore its position as a cornerstone in the field of network analysis. 
Combining tcpdump's capabilities for capturing network traffic with Wireshark's analytical prowess offers a comprehensive solution for network troubleshooting and analysis. This combination is particularly useful in environments where direct use of Wireshark is not possible or ideal. While both tools possess a steep learning curve due to their powerful and complex features, they collectively form an indispensable toolkit for network administrators, security professionals, and developers alike. This integrated approach not only addresses the challenges of capturing and analyzing network traffic in various operational contexts but also highlights the versatility and depth of tools available for understanding and securing modern networks.

Videos:
Wireshark
tcpdump
The amount of data generated by modern systems has become a double-edged sword for security teams. While it offers valuable insights, sifting through mountains of logs and alerts manually to identify malicious activity is no longer feasible. Here's where rule-based incident detection steps in, offering a way to automate the process by leveraging predefined rules to flag suspicious activity. However, the choice of tool for processing high-volume data for real-time insights is crucial. This article delves into the strengths and weaknesses of two popular options: Splunk, a leading batch search tool, and Flink, a powerful stream processing framework, specifically in the context of rule-based security incident detection. Splunk: Powerhouse Search and Reporting Splunk has become a go-to platform for making application and infrastructure logs readily available for ad-hoc search. Its core strength lies in its ability to ingest log data from various sources, centralize it, and enable users to explore it through powerful search queries. This empowers security teams to build comprehensive dashboards and reports, providing a holistic view of their security posture. Additionally, Splunk supports scheduled searches, allowing users to automate repetitive queries and receive regular updates on specific security metrics. This can be particularly valuable for configuring rule-based detections, monitoring key security indicators, and identifying trends over time. Flink: The Stream Processing Champion Apache Flink, on the other hand, takes a fundamentally different approach. It is a distributed processing engine designed to handle stateful computations over unbounded and bounded data streams. Unlike Splunk's batch processing, Flink excels at real-time processing, enabling it to analyze data as it arrives, offering near-instantaneous insights. This makes it ideal for scenarios where immediate detection and response are paramount, such as identifying ongoing security threats or preventing fraudulent transactions in real time. Flink's ability to scale horizontally across clusters makes it suitable for handling massive data volumes, a critical factor for organizations wrestling with ever-growing security data. Case Study: Detecting User Login Attacks Let's consider a practical example: a rule designed to detect potential brute-force login attempts. This rule aims to identify users who experience a high number of failed login attempts within a specific timeframe (e.g., an hour). Here's how the rule implementation would differ in Splunk and Flink: Splunk Implementation sourcetype=login_logs (result="failure" OR "failed") | stats count by user within 1h | search count > 5 | alert "Potential Brute Force Login Attempt for user: $user$" This Splunk search query filters login logs for failed attempts, calculates the count of failed attempts per user within an hour window, and then triggers an alert if the count exceeds a predefined threshold (5). While efficient for basic detection, it relies on batch processing, potentially introducing latency in identifying ongoing attacks. Flink Implementation SQL SELECT user, COUNT(*) AS failed_attempts FROM login_logs WHERE result = 'failure' OR result = 'failed' GROUP BY user, TUMBLE(event_time, INTERVAL '1 HOUR') HAVING failed_attempts > 5; Flink takes a more real-time approach. As each login event arrives, Flink checks the user and result. If it's a failed attempt, a counter for that user's window (1 hour) is incremented. 
If the count surpasses the threshold (5) within the window, Flink triggers an alert. This provides near-instantaneous detection of suspicious login activity. A Deep Dive: Splunk vs. Flink for Detecting User Login Attacks The underlying processing models of Splunk and Flink lead to fundamental differences in how they handle security incident detection. Here's a closer look at the key areas: Batch vs. Stream Processing Splunk Splunk operates on historical data. Security analysts write search queries that retrieve and analyze relevant logs. These queries can be configured to run periodically automatically. This is a batch processing approach, meaning Splunk needs to search through potentially a large volume of data to identify anomalies or trends. For the login attempt example, Splunk would need to query all login logs within the past hour every time the search is run to calculate the failed login count per user. This can introduce significant latency in detecting, and increase the cost of compute, especially when dealing with large datasets. Flink Flink analyzes data streams in real-time. As each login event arrives, Flink processes it immediately. This stream-processing approach allows Flink to maintain a continuous state and update it with each incoming event. In the login attempt scenario, Flink keeps track of failed login attempts per user within a rolling one-hour window. With each new login event, Flink checks the user and result. If it's a failed attempt, the counter for that user's window is incremented. This eliminates the need to query a large amount of historical data every time a check is needed. Windowing Splunk Splunk performs windowing calculations after retrieving all relevant logs. In our example, the search stats count by user within 1h retrieves all login attempts within the past hour and then calculates the count for each user. This approach can be inefficient for real-time analysis, especially as data volume increases. Flink Flink maintains a rolling window and continuously updates the state based on incoming events. Flink uses a concept called "time windows" to partition the data stream into specific time intervals (e.g., one hour). For each window, Flink keeps track of relevant information, such as the number of failed login attempts per user. As new data arrives, Flink updates the state for the current window. This eliminates the need for a separate post-processing step to calculate windowed aggregations. Alerting Infrastructure Splunk Splunk relies on pre-configured alerting actions within the platform. Splunk allows users to define search queries that trigger alerts when specific conditions are met. These alerts can be delivered through various channels such as email, SMS, or integrations with other security tools. Flink Flink might require integration with external tools for alerts. While Flink can identify anomalies in real time, it may not have built-in alerting functionalities like Splunk. Security teams often integrate Flink with external Security Information and Event Management (SIEM) solutions for alert generation and management. In essence, Splunk operates like a detective sifting through historical evidence, while Flink functions as a security guard constantly monitoring activity. Splunk is a valuable tool for forensic analysis and identifying historical trends. However, for real-time threat detection and faster response times, Flink's stream processing capabilities offer a significant advantage. 
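For readers who prefer Flink's DataStream API over Flink SQL, the same brute-force rule can be sketched as a keyed, windowed stream. This is a hedged illustration rather than code from either product's documentation: the loginEvents stream and its LoginEvent type with user and result fields are assumptions, and a watermark strategy is assumed to have been assigned upstream so that event-time windows fire.

Java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// loginEvents is assumed to be a DataStream<LoginEvent> read from the log source.
// Count failed logins per user in one-hour tumbling event-time windows,
// then keep only the users that exceed the threshold of 5 failures.
DataStream<Tuple2<String, Integer>> failedPerUser = loginEvents
        .filter(e -> "failure".equals(e.result) || "failed".equals(e.result))
        .map(e -> Tuple2.of(e.user, 1))
        .returns(Types.TUPLE(Types.STRING, Types.INT)) // keep tuple type info after the lambda
        .keyBy(t -> t.f0)                              // key by user
        .window(TumblingEventTimeWindows.of(Time.hours(1)))
        .sum(1);                                       // failed attempts per user per window

DataStream<Tuple2<String, Integer>> alerts =
        failedPerUser.filter(t -> t.f1 > 5);           // threshold from the rule above

The alerts stream would then be routed to whatever notification or SIEM integration the team uses, which is exactly the point made above about Flink often needing external alerting infrastructure.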
Choosing the Right Tool: A Balancing Act While Splunk provides a user-friendly interface and simplifies rule creation, its batch processing introduces latency, which can be detrimental to real-time security needs. Flink excels in real-time processing and scalability, but it requires more technical expertise to set up and manage. Beyond Latency and Ease of Use: Additional Considerations The decision between Splunk and Flink goes beyond just real-time processing and ease of use. Here are some additional factors to consider: Data Volume and Variety Security teams are often overwhelmed by the sheer volume and variety of data they need to analyze. Splunk excels at handling structured data like logs but struggles with real-time ingestion and analysis of unstructured data like network traffic or social media feeds. Flink, with its distributed architecture, can handle diverse data types at scale. Alerting and Response Both Splunk and Flink can trigger alerts based on rule violations. However, Splunk integrates seamlessly with existing Security Information and Event Management (SIEM) systems, streamlining the incident response workflow. Flink might require additional development effort to integrate with external alerting and response tools. Cost Splunk's licensing costs are based on data ingestion volume, which can become expensive for organizations with massive security data sets. Flink, being open-source, eliminates licensing fees. However, the cost of technical expertise for setup, maintenance, and rule development for Flink needs to be factored in. The Evolving Security Landscape: A Hybrid Approach The security landscape is constantly evolving, demanding a multifaceted approach. Many organizations find value in a hybrid approach, leveraging the strengths of both Splunk and Flink. Splunk as the security hub: Splunk can serve as a central repository for security data, integrating logs from various sources, including real-time data feeds from Flink. Security analysts can utilize Splunk's powerful search capabilities for historical analysis, threat hunting, and investigation. Flink for real-time detection and response: Flink can be deployed for real-time processing of critical security data streams, focusing on identifying and responding to ongoing threats. This combination allows security teams to enjoy the benefits of both worlds: Comprehensive security visibility: Splunk provides a holistic view of historical and current security data. Real-time threat detection and response: Flink enables near-instantaneous identification and mitigation of ongoing security incidents. Conclusion: Choosing the Right Tool for the Job Neither Splunk nor Flink is a one-size-fits-all solution for rule-based incident detection. The optimal choice depends on your specific security needs, data volume, technical expertise, and budget. Security teams should carefully assess these factors and potentially consider a hybrid approach to leverage the strengths of both Splunk and Flink for a robust and comprehensive security posture. By understanding the strengths and weaknesses of each tool, security teams can make informed decisions about how to best utilize them to detect and respond to security threats in a timely and effective manner.
Businesses can react quickly and effectively to user behavior patterns by using real-time analytics. This allows them to take advantage of opportunities that might otherwise pass them by and to prevent problems from getting worse. Apache Kafka, a popular event streaming platform, can be used for real-time ingestion of data/events generated from various sources across multiple verticals such as IoT, financial transactions, inventory, etc. This data can then be streamed into multiple downstream applications or engines for further processing and eventual analysis to support decision-making. Apache Flink serves as a powerful engine for refining or enhancing streaming data by modifying, enriching, or restructuring it upon arrival in a Kafka topic. In essence, Flink acts as a downstream application that continuously consumes data streams from Kafka topics for processing and then ingests the processed data into other Kafka topics. Eventually, Apache Druid can be integrated to consume the processed streaming data from Kafka topics for analysis, querying, and making instantaneous business decisions.

In my previous write-up, I explained how to integrate Flink 1.18 with Kafka 3.7.0. In this article, I will outline the steps to transfer processed data from Flink 1.18.1 to a Kafka 2.13-3.7.0 topic. A separate article detailing the ingestion of streaming data from Kafka topics into Apache Druid for analysis and querying was published a few months ago; you can read it here.

Execution Environment

We configured a multi-node cluster (three nodes) where each node has a minimum of 8 GB RAM and a 250 GB SSD, with Ubuntu 22.04.2 amd64 as the operating system.
OpenJDK 11 is installed, with the JAVA_HOME environment variable configured, on each node.
Python 3 or Python 2 along with Perl 5 is available on each node.
A three-node Apache Kafka 3.7.0 cluster is up and running with Apache ZooKeeper 3.5.6 on two of the nodes.
Apache Druid 29.0.0 has been installed and configured on the node in the cluster where ZooKeeper has not been installed for the Kafka broker; ZooKeeper is installed and configured on the other two nodes. The leader broker is up and running on the node where Druid is running.
We developed a simulator using the Datafaker library to produce fake real-time financial transaction records in JSON every 10 seconds and publish them to the created Kafka topic (see the sketch further below). Here is the JSON data feed generated by the simulator:

JSON
{"timestamp":"2024-03-14T04:31:09Z ","upiID":"9972342663@ybl","name":"Kiran Marar","note":" ","amount":"14582.00","currency":"INR","geoLocation":"Latitude: 54.1841745 Longitude: 13.1060775","deviceOS":"IOS","targetApp":"PhonePe","merchantTransactionId":"ebd03de9176201455419cce11bbfed157a","merchantUserId":"65107454076524@ybl"}

The Apache Flink-1.18.1-bin-scala_2.12.tgz archive is extracted on the node where neither Druid nor the Kafka leader broker is running.

Running a Streaming Job in Flink

We will dig into the process of extracting data from a Kafka topic, where incoming messages are published by the simulator, performing processing tasks on it, and then reintegrating the processed data into a different topic of the multi-node Kafka cluster. We developed a Java program (StreamingToFlinkJob.java) that was submitted as a job to Flink to perform these steps, using a window of 2 minutes and calculating the average amount transacted from the same mobile number (UPI ID) on the simulated UPI transaction data stream.
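Before diving into the job code, here is a hedged sketch of what the Datafaker-based simulator described in the Execution Environment section might look like. It is not the author's simulator: the topic name, bootstrap server address, and sleep interval mirror the description above but are assumptions, and only a few of the fields from the sample JSON feed are reproduced.

Java
import net.datafaker.Faker;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Illustrative simulator: publishes one fake UPI transaction as a JSON string every 10 seconds.
public class UpiTransactionSimulator {
    public static void main(String[] args) throws InterruptedException {
        Faker faker = new Faker();
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092"); // assumed leader broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                String upiId = faker.number().digits(10) + "@ybl";
                String json = String.format(
                        "{\"timestamp\":\"%s\",\"upiID\":\"%s\",\"name\":\"%s\",\"amount\":\"%.2f\",\"currency\":\"INR\"}",
                        java.time.Instant.now(), upiId, faker.name().fullName(),
                        faker.number().randomDouble(2, 100, 20000));
                producer.send(new ProducerRecord<>("upi-transactions", upiId, json)); // assumed topic name
                Thread.sleep(10_000); // one fake transaction every 10 seconds
            }
        }
    }
}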
The following list of jar files has been included in the project build or classpath. Using the code below, we can get the Flink execution environment inside the developed Java class.

Java
Configuration conf = new Configuration();
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

Now we should read the messages/stream that has already been published by the simulator to the Kafka topic inside the Java program. Here is the code block.

Java
KafkaSource<UPITransaction> kafkaSource = KafkaSource.<UPITransaction>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS) // IP address with port 9092 where the leader broker is running in the cluster
    .setTopics(IKafkaConstants.INPUT_UPITransaction_TOPIC_NAME)
    .setGroupId("upigroup")
    .setStartingOffsets(OffsetsInitializer.latest())
    .setValueOnlyDeserializer(new KafkaUPISchema())
    .build();

To retrieve information from Kafka, setting up a deserialization schema within Flink is crucial; it processes events in JSON format, converting raw data into a structured form. Importantly, setParallelism needs to be set to the number of Kafka topic partitions; otherwise, the watermark won't work for the source, and no data is released to the sink.

Java
DataStream<UPITransaction> stream = env.fromSource(
    kafkaSource,
    WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofMinutes(2)),
    "Kafka Source").setParallelism(1);

With events successfully retrieved from Kafka, we can enhance the streaming job by incorporating processing steps. The subsequent code snippet reads the Kafka data, organizes it by mobile number (upiID), and computes the average amount per mobile number. To accomplish this, we developed a custom window function for calculating the average and implemented watermarking to handle event-time semantics effectively. Here is the code snippet:

Java
SerializableTimestampAssigner<UPITransaction> sz = new SerializableTimestampAssigner<UPITransaction>() {
    @Override
    public long extractTimestamp(UPITransaction transaction, long l) {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            Date date = sdf.parse(transaction.eventTime);
            return date.getTime();
        } catch (Exception e) {
            return 0;
        }
    }
};

WatermarkStrategy<UPITransaction> watermarkStrategy =
    WatermarkStrategy.<UPITransaction>forBoundedOutOfOrderness(Duration.ofMillis(100)).withTimestampAssigner(sz);
DataStream<UPITransaction> watermarkDataStream = stream.assignTimestampsAndWatermarks(watermarkStrategy);

// Instead of event time, we could use a window based on processing time with TumblingProcessingTimeWindows.
DataStream<TransactionAgg> groupedData = watermarkDataStream
    .keyBy("upiID")
    .window(TumblingEventTimeWindows.of(Time.minutes(2)))
    .apply(new TransactionAgg());

Eventually, the processing logic (computation of the average amount for the same UPI ID or mobile number over a 2-minute window on the continuous transaction stream) is executed inside Flink. Here is the code block for the window function that calculates the average amount for each UPI ID or mobile number.
Java
public class TransactionAgg implements WindowFunction<UPITransaction, TransactionAgg, Tuple, TimeWindow> {
    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<UPITransaction> values, Collector<TransactionAgg> out) {
        Integer sum = 0; // consider whole numbers only
        int count = 0;
        String upiID = null;
        for (UPITransaction value : values) {
            sum += value.amount;
            upiID = value.upiID;
            count++;
        }
        TransactionAgg output = new TransactionAgg();
        output.upiID = upiID;
        output.eventTime = window.getEnd();
        output.avgAmount = (sum / count);
        out.collect(output);
    }
}

We have processed the data. The next step is to serialize the object and send it to a different Kafka topic. We add a KafkaSink in the developed Java code (StreamingToFlinkJob.java) to send the processed data from the Flink engine to a different Kafka topic created on the multi-node Kafka cluster. Here is the code snippet that serializes the object before sending/publishing it to the Kafka topic:

Java
public class KafkaTransactionSinkSchema implements KafkaRecordSerializationSchema<TransactionAgg> {

    private final String topic;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public KafkaTransactionSinkSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(TransactionAgg aggTransaction, KafkaSinkContext context, Long timestamp) {
        try {
            return new ProducerRecord<>(
                topic,
                null, // no partition specified, so setting null
                aggTransaction.eventTime,
                aggTransaction.upiID.getBytes(),
                objectMapper.writeValueAsBytes(aggTransaction));
        } catch (Exception e) {
            throw new IllegalArgumentException("Exception on serialize record: " + aggTransaction, e);
        }
    }
}

And below is the code block that sinks the processed data back to a different Kafka topic.

Java
KafkaSink<TransactionAgg> sink = KafkaSink.<TransactionAgg>builder()
    .setBootstrapServers(IKafkaConstants.KAFKA_BROKERS)
    .setRecordSerializer(new KafkaTransactionSinkSchema(IKafkaConstants.OUTPUT_UPITRANSACTION_TOPIC_NAME))
    .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
    .build();
groupedData.sinkTo(sink); // the DataStream of TransactionAgg created above
env.execute();

Connecting Druid With the Kafka Topic

In this final step, we need to integrate Druid with the Kafka topic to consume the processed data stream that is continuously published by Flink. With Apache Druid, we can connect to Apache Kafka directly so that real-time data can be ingested continuously and subsequently queried to make business decisions on the spot, without involving any third-party system or application. Another beauty of Apache Druid is that we do not need to configure or install any third-party UI application to view the data that lands on or is published to the Kafka topic. To keep this article concise, I omitted the steps for integrating Druid with Apache Kafka. However, a few months ago I published an article on this topic (linked earlier in this article); you can read it and follow the same approach.

Final Note

The code snippets provided above are for understanding purposes only. They illustrate the sequential steps of obtaining messages/data streams from a Kafka topic, processing the consumed data, and eventually sending/pushing the modified data into a different Kafka topic. This allows Druid to pick up the modified data stream for querying and analysis as a final step. Later, we will upload the entire codebase to GitHub if you are interested in executing it on your own infrastructure. I hope you enjoyed reading this. If you found this article valuable, please consider liking and sharing it.
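For completeness, the UPITransaction POJO and the KafkaUPISchema deserializer referenced in the snippets above are not shown in the article; the following is a hedged sketch of what they might look like, with field names taken from the sample JSON feed and everything else assumed.

Java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import java.io.IOException;

// Hedged sketch of the event POJO; the original class and exact field types are not shown.
@JsonIgnoreProperties(ignoreUnknown = true)
class UPITransaction {
    @JsonProperty("timestamp") public String eventTime; // parsed later by the timestamp assigner
    public String upiID;
    public double amount; // the feed sends amount as a decimal string; the original type is not shown
}

// Hedged sketch of the value-only deserialization schema used by the KafkaSource builder.
class KafkaUPISchema extends AbstractDeserializationSchema<UPITransaction> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public UPITransaction deserialize(byte[] message) throws IOException {
        return MAPPER.readValue(message, UPITransaction.class);
    }
}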
Kubernetes has pretty much become synonymous with container orchestration, and all competition has either been assimilated (or rewritten to be based on Kubernetes, see OpenShift) or has basically disappeared into the void (sorry, Nomad). Does that mean that development will slow down now? Or is it the beginning of something even bigger? Maybe Kubernetes is on the verge of becoming a generic name, just like Kleenex, or maybe a verb like "to google."

Years ago, somebody asked me in an interview what I thought of Docker and whether I saw a future for containerization. At the time, my answer was quick and easy. First of all, containerization wasn't a new concept. BSD, Solaris, and other systems had offered it for years. It was kind of new to Linux though (at least in a widespread fashion), so it was here to stay; it was the next logical evolutionary step after virtualization. Docker, however, at least in my mind, was different. About Docker, I simply answered, "It's the best tool we have today, but I hope it won't be the final solution we can come up with." While Docker turned around and is just coming back, the tooling we use today is universally built upon the specs defined by the Open Container Initiative (OCI) and its OCI image format. So what will the future hold for Kubernetes? Is it going to stay, or is it going to step into the abyss and "just be another platform that was replaced by something else," as Michael Levan wrote in The Future of Kubernetes?

The Rise of Kubernetes

Jokingly, when looking up container orchestration in the dictionary, you'll probably find the synonym Kubernetes, but it took Kubernetes about a decade to get to where it is today. Initially built by Google on the key learnings of Borg, Google's internal container orchestration platform, Kubernetes was released in September 2014. By the release of Kubernetes, Borg itself was already over a decade old. By 2013, many of the original team members of Borg had started to look into the next step, and Project 7 was born. At its release, Kubernetes was still using Docker underneath, a combination that most probably helped elevate Kubernetes' popularity. Docker was extremely popular at the time, but people started to find insufficiencies when trying to organize and run large numbers of containers. Kubernetes was about to fix that. With its concept of independently deployed building blocks and its actors (or agents), it was easy to extend but still understandable. In addition, resources are declarative by nature (written as JSON or YAML files), which enables version control of those definitions. Ever since the release, Kubernetes has enabled more and more use cases, and hence more companies started using it. I think a major step for adoption was the release of Helm in 2016, which simplified the deployment process for more complex applications and enabled an "out of the box" experience. Now Kubernetes was "easy" (don't quote me on easy though!). Today, every cloud provider and their mothers offer a managed Kubernetes environment. Due to a set of standard interfaces, those services are mostly interchangeable, which is one of the big benefits of Kubernetes. Anyhow, it's only mostly. The small inconsistencies in implementations or performance differences may give you quite the time of your life, and not a good one. But let's call it "run everywhere" because it mostly is. A great overview of the full history of Kubernetes, with all major milestones, can be found in The History of Kubernetes by Ferenc Hámori.
Kubernetes, Love or Hate

When we look into the community, opinions on Kubernetes diverge. Many people point out the internal (yet hidden) complexity of Kubernetes, a complexity that only increases as new features and additional (also third-party) functionality are added. This complexity is real, and it is the reason why folks like Kelsey Hightower or Abdelfettah Sghiouar call Kubernetes a platform to build platforms (listen to our Cloud Commute podcast episode with Abdelfettah), meaning it should be used by cloud providers (or company-internal private cloud teams) to build a platform for container deployment, but it shouldn't be used by developers or just everyone. However, Kelsey also claimed that Kubernetes is a good place to start, not the endgame.

Kelsey Hightower on Kubernetes being a platform to build platforms

On the other end of the spectrum, you have people who refer to Kubernetes as the operating system of the cloud era. And due to its extensibility and feature richness, they're probably not too far off. Modern operating systems have mostly one job: abstract away the underlying hardware and its features. Kubernetes, in turn, abstracts away many aspects of the cloud infrastructure and the operational processes necessary to run containers. In that sense, yes, Kubernetes is probably a cloud OS, especially since we have started to see implementations of Kubernetes running on operating systems other than Linux. Looking at you, Microsoft. If you're interested in learning more about the idea of Kubernetes as a cloud OS, Natan Yellin from Robusta Dev wrote a very insightful article named Why Kubernetes will sustain the next 50 years.

What Are the Next Steps?

The most pressing question for Kubernetes as it stands today is: what will be next? Where will it evolve? Are we close to the end of the line? Looking back at Borg, a decade in, Google decided it was time to reiterate on orchestration and build upon the lessons learned. Kubernetes is about to hit its 10-year anniversary soon. So does that mean it's time for another iteration? Many features in Kubernetes, such as secrets, were fine 10 years ago. Today we know that an encoded "secret" is certainly not enough. Temporary user accounts, OIDC, and similar technologies can be and are integrated into Kubernetes already, increasing its complexity.

Looking beyond Kubernetes, technology always runs in three phases: the beginning or adoption phase, the middle, where everyone "has" to use it, and the end, where companies start to phase it out. Personally, I feel that Kubernetes is at its height right now, standing right in the middle. That doesn't give us any prediction about the time frame for hitting the end though. At the moment, it looks like Kubernetes will keep going and growing for a while. It doesn't show any signs of slowing down. Other technologies, like micro virtual machines using Kata Containers or Firecracker, are becoming more popular, offering higher isolation (hence security), but they aren't as efficient. The important element, though, is that they offer a CRI-compatible interface, meaning they can be used as alternative runtimes underneath Kubernetes. In the not-too-distant future, I see Kubernetes offering multiple runtime environments, just as it offers multiple storage solutions today, running simple services in normal containers while moving services with higher isolation needs to micro VMs. And there are other interesting developments, based on Kubernetes, too.
Edgeless Systems implements a confidential computing solution, provided as a Kubernetes distribution named Constellation. Confidential computing makes use of CPU and GPU features that hardware-encrypt memory, not only for the whole system memory space, but per virtual machine, or even per container. That enables a whole set of new use cases, with end-to-end encryption for highly confidential calculations and data processing. While it is possible to use confidential computing outside Kubernetes, the orchestration and operational benefits of running those calculations inside containers make them easy to deploy and update. If you want to learn more about Constellation, we had Moritz Eckert from Edgeless Systems on our podcast not too long ago.

Future or Fad?

So, does Kubernetes have a bright future and will it stand for the next 50 years, or will we realize very soon-ish that it is not what we're looking for? If somebody asked me today what I think about Kubernetes, I would answer similarly to my Docker answer. It is certainly the best tool we have today, making it the go-to container orchestration tool of the moment. Its ever-increasing complexity makes it hard to see the same in the future though. I think a lot of new lessons have been learned, again, and it's probably time for a new iteration. Not today, not tomorrow, but somewhere in the next few years. Maybe this new iteration isn't an all-new tool but Kubernetes 2.0, who knows; something has to change. Technology doesn't stand still, and the (container) world is different from what it was 10 years ago. If you had asked somebody at the beginning of containerization, it was all about how containers have to be stateless; look at what we do today. We deploy databases into Kubernetes, and we love it. Cloud-nativeness isn't just stateless anymore, and I'd argue a good one-third of container workloads may be stateful today (with ephemeral or persistent state), a share that will keep increasing. The beauty of orchestration, automatic resource management, self-healing infrastructure, and everything in between is just too incredible not to use for "everything."

Anyhow, whatever happens to Kubernetes itself (maybe it will become an orchestration extension of the OCI?!), I think it will disappear from the eyes of the users. It (or its successor) will become the platform to build container runtime platforms. But to make that happen, debugging features need to be made available. At the moment, you have to look way too deep into Kubernetes or agent logs to find and fix issues. Anyone who has never had to find out why a Let's Encrypt certificate isn't renewing may raise a hand now. To bring it to a close, Kubernetes certainly isn't a fad, but I strongly hope it's not going to be our future either, at least not in its current incarnation.
Debugging is not just about identifying errors — it's about instituting a reliable process for ensuring software health and longevity. In this post, we discuss the role of software testing in debugging, including foundational concepts and how they converge to improve software quality. As a side note, if you like the content of this and the other posts in this series, check out my Debugging book that covers this subject. If you have friends who are learning to code, I'd appreciate a reference to my Java Basics book. If you want to get back to Java after a while, check out my Java 8 to 21 book. The Intersection of Debugging and Testing Debugging and testing play distinct roles in software development. Debugging is the targeted process of identifying and fixing known bugs. Testing, on the other hand, covers an adjacent scope, identifying unknown issues by validating expected software behavior across a variety of scenarios. Both are part of the debug-fix cycle, which is a core concept in debugging. Before we cover the cycle, we should first make sure we're aligned on the basic terminology. Unit Tests Unit tests are tightly linked to debugging efforts, focusing on isolated parts of the application—typically individual functions or methods. Their purpose is to validate that each unit operates correctly in isolation, making them a swift and efficient tool in the debugging arsenal. These tests are characterized by their speed and consistency, enabling developers to run them frequently, sometimes even automatically as code is written within the IDE. Since software is so tightly coupled, it is nearly impossible to compose unit tests without extensive mocking. Mocking involves substituting a genuine component with a stand-in that returns predefined results, so a test method can simulate scenarios without relying on the actual object (a short sketch appears a little further below). This is a powerful yet controversial tool. By using mocking, we're in effect creating a synthetic environment that might misrepresent the real world, and we're reducing the scope of the test, which might let some bugs go undetected. Integration Tests In contrast to unit tests, integration tests examine the interactions between multiple units, providing a more comprehensive picture of the system's health. While they cover broader scenarios, their setup can be more complex due to the interactions involved. However, they are crucial in catching bugs that arise from the interplay between different software components. In general, mocking can be used in integration tests, but it is discouraged. Integration tests take longer to run and are sometimes harder to set up. However, many developers (myself included) would argue that they are the only benchmark for quality. Most bugs express themselves in the seams between modules, and integration tests are better at detecting them. Since integration tests are far more important, some developers would argue that unit tests are unnecessary. This isn't true: unit test failures are much easier to read and understand, and since unit tests are faster, we can run them during development, even while typing. In that sense, the balance between the two approaches is the important part. Coverage Coverage is a metric that helps quantify the effectiveness of testing by indicating the proportion of code exercised by tests. It helps identify potential areas of the code that have not been tested, which could harbor undetected bugs. However, striving for 100% coverage can be a case of diminishing returns; the focus should remain on the quality and relevance of the tests rather than the metric itself.
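To ground the terminology above, here is a minimal, hypothetical JUnit 5 and Mockito sketch of a mocked unit test. The CustomerRepository and CustomerService types are stand-ins invented purely for illustration, not part of any real project:

Java

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class CustomerServiceTest {

    // Hypothetical collaborator that would normally talk to a database.
    interface CustomerRepository {
        String findNameById(long id);
    }

    // Hypothetical unit under test.
    static class CustomerService {
        private final CustomerRepository repository;
        CustomerService(CustomerRepository repository) { this.repository = repository; }
        String greet(long id) { return "Hello, " + repository.findNameById(id); }
    }

    @Test
    void greetUsesTheRepository() {
        // The mock stands in for the real repository, so the test never touches a database.
        CustomerRepository repository = mock(CustomerRepository.class);
        when(repository.findNameById(42L)).thenReturn("Ada");

        CustomerService service = new CustomerService(repository);

        assertEquals("Hello, Ada", service.greet(42L));
    }
}

The mock keeps the test fast and deterministic, but, as noted above, it also narrows the test to the behavior we predefined, which is exactly the trade-off that makes mocking controversial.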
In my experience, chasing high coverage numbers often results in bad test practices that allow problems to persist. It is my opinion that unit tests should be excluded from coverage metrics due to the importance of integration tests to overall quality. To get a real sense of quality, coverage should focus on integration and end-to-end tests. The Debug-Fix Cycle The debug-fix cycle is a structured approach that integrates testing into the debugging process. The stages include identifying the bug, creating a test that reproduces the bug, fixing the bug, verifying the fix with the test, and finally, running the application to ensure the fix works in the live environment. This cycle emphasizes the importance of testing not only in identifying bugs but also in preventing their recurrence. Notice that this is a simplified version of the cycle with a focus on the testing aspect only. The full cycle includes a discussion of issue tracking and versioning as part of the whole process. I discuss this more in depth in other posts in this series and in my book. Composing Tests With Debuggers A powerful feature of using debuggers in test composition is their ability to "jump to line" or "set value." Developers can effectively reset the execution to a point before the test and rerun it with different conditions, without recompiling or rerunning the entire suite. This iterative process is invaluable for achieving the desired test constraints and improves the quality of unit tests by refining the input parameters and expected outcomes. Increasing test coverage is about more than hitting a percentage; it's about ensuring that tests are meaningful and that they contribute to software quality. A debugger can significantly assist in this by identifying untested paths. When a test coverage tool highlights lines or conditions not reached by current tests, the debugger can be used to force execution down those paths. This helps in crafting additional tests that cover missed scenarios, ensuring that the coverage metric is not just a number but a true reflection of the software's tested state. For example, suppose the next line in the method body is a rejectValue call that will throw an exception. I don’t want an exception thrown, as I still want to test all the permutations of the method. I can drag the execution pointer (the arrow on the left of the editor) and place it back at the start of the method. Test-Driven Development How does all of this fit with disciplines like Test-Driven Development (TDD)? It doesn't fit well. Before we get into that, let's revisit the basics of TDD. Weak TDD typically means just writing tests before writing the code. Strong TDD involves a red-green-refactor cycle: Red: Write a test that fails because the feature it tests isn't implemented yet. Green: Write the minimum amount of code necessary to make the test pass. Refactor: Clean up the code while ensuring that tests continue to pass. This rigorous cycle guarantees that new code is continually tested and refactored, reducing the likelihood of complex bugs. It also means that when bugs do appear, they are often easier to isolate and fix due to the modular and well-tested nature of the codebase. At least, that's the theory. TDD can be especially advantageous for scripting and loosely typed languages. In environments lacking the rigid structure of compilers and linters, TDD steps in to provide the necessary checks that would otherwise be performed during compilation in statically typed languages.
It becomes a crucial substitute for compiler/linter checks, ensuring that type and logic errors are caught early. In real-world application development, TDD's utility is nuanced. While it encourages thorough testing and upfront design, it can sometimes hinder the natural flow of development, especially in complex systems that evolve through numerous iterations. The requirement for 100% test coverage can lead to an unnecessary focus on fulfilling metrics rather than writing meaningful tests. The biggest problem in TDD is its focus on unit testing. TDD is impractical with integration tests, as the process would take too long. But as we determined at the start of this post, integration tests are the true benchmark for quality. In that sense, TDD is a methodology that provides great quality for the individual units, but not necessarily great quality for the final product. You might have the best cog in the world, but if it doesn't fit well into the machine, then it isn't great. Final Word Debugging is a tool that not only fixes bugs but also actively aids in crafting tests that bolster software quality. By utilizing debuggers in test composition and increasing coverage, developers can create a suite of tests that not only identifies existing issues but also guards against future ones, thus ensuring the delivery of reliable, high-quality software. Debugging lets us increase coverage and verify edge cases effectively. It's part of a standardized process for issue resolution that's critical for reliability and prevents regressions.
The world of Telecom is evolving at a rapid pace, and it is crucial for operators to stay ahead of the game. As 5G technology becomes the norm, it is a strategic imperative to transition seamlessly from 4G technology (which typically operates on an OpenStack cloud) to 5G technology (which runs on Kubernetes). In the current scenario, operators invest in multiple vendor-specific monitoring tools, leading to higher costs and less efficient operations. However, with the upcoming 5G world, operators can adopt a unified monitoring and alert system for all their products. This single system, with its ability to monitor network equipment, customer devices, and service platforms, offers a holistic view of the entire system, thereby reducing complexity and enhancing efficiency. By adopting a Prometheus-based monitoring and alert system, operators can streamline operations, reduce costs, and enhance customer experience. With a single monitoring system, operators can monitor their entire 5G system seamlessly, ensuring optimal performance and avoiding disruptions. This practical solution eliminates the need for a complete overhaul and offers a cost-effective transition. Let's dive deep. Prometheus, Grafana, and Alert Manager Prometheus is a tool for monitoring and alerting, built around a pull-based model. It scrapes, collects, and stores Key Performance Indicators (KPIs) with labels and timestamps, enabling it to collect metrics from targets, which are the Network Functions' namespaces in the 5G telecom world. Grafana is a dynamic web application that offers a wide range of functionalities. It visualizes data, allowing the building of the charts, graphs, and dashboards that the 5G Telecom operator wants to see. Its primary feature is support for multiple graphing and dashboarding modes through a GUI (graphical user interface). Grafana can seamlessly integrate data collected by Prometheus, making it an indispensable tool for telecom operators. It is a powerful web application that supports the integration of different data sources into one dashboard, enabling continuous monitoring. This versatility improves response rates by alerting the telecom operator's team when an incident emerges, ensuring minimal 5G network function downtime. The Alert Manager is a crucial component that manages alerts from the Prometheus server via alerting rules. It manages the received alerts, including silencing and inhibiting them and sending out notifications via email or chat. The Alert Manager also deduplicates and groups alerts and routes them to the centralized webhook receiver, making it a must-have tool for any telecom operator. Architectural Diagram Prometheus Components of Prometheus (Specific to a 5G Telecom Operator) Core component: The Prometheus server scrapes HTTP endpoints and stores the data (time series). The Prometheus server, a crucial component in the 5G telecom world, collects metrics from the Prometheus targets. In our context, these targets are the Kubernetes cluster that houses the 5G network functions. Time series database (TSDB): Prometheus stores telecom metrics as time series data. HTTP Server: API to query data stored in the TSDB; the Grafana dashboard can query this data for visualization. Telecom operator-specific libraries (5G) for instrumenting application code.
Push gateway (scrape target for short-lived jobs) Service Discovery: In the world of 5G, network function pods are constantly being added or deleted by Telecom operators to scale up or down. Prometheus's adaptable service discovery component monitors the ever-changing list of pods. The Prometheus Web UI, accessible through port 9090, is a data visualization tool. It allows users to view and analyze Prometheus data in a user-friendly and interactive manner, enhancing the monitoring capabilities of the 5G telecom operators. The Alert Manager, a key component of Prometheus, is responsible for handling alerts. It is designed to notify users if something goes wrong, triggering notifications when certain conditions are met. When alerting triggers are met, Prometheus alerts the Alert Manager, which sends alerts through various channels such as email or messenger, ensuring timely and effective communication of critical issues. Grafana for dashboard visualization (actual graphs) With Prometheus's robust components, your Telecom operator's 5G network functions are monitored with diligence, ensuring reliable resource utilization, tracking performance, detection of errors in availability, and more. Prometheus can provide you with the necessary tools to keep your network running smoothly and efficiently. Prometheus Features The multi-dimensional data model identified by metric details uses PromQL (Prometheus Querying Language) as the query language and the HTTP Pull model. Telecom operators can now discover 5G network functions with service discovery and static configuration. The multiple modes of dashboard and GUI support provide a comprehensive and customizable experience for users. Prometheus Remote Write to Central Prometheus from Network Functions 5G Operators will have multiple network functions from various vendors, such as SMF (Session Management Function), UPF (User Plane Function), AMF (Access and Mobility Management Function), PCF (Policy Control Function), and UDM (Unified Data Management). Using multiple Prometheus/Grafana dashboards for each network function can lead to a complex and inefficient 5G network operator monitoring process. To address this, it is highly recommended that all data/metrics from individual Prometheus be consolidated into a single Central Prometheus, simplifying the monitoring process and enhancing efficiency. The 5G network operator can now confidently monitor all the data at the Central Prometheus's centralized location. This user-friendly interface provides a comprehensive view of the network's performance, empowering the operator with the necessary tools for efficient monitoring. Grafana Grafana Features Panels: This powerful feature empowers operators to visualize Telecom 5G data in many ways, including histograms, graphs, maps, and KPIs. It offers a versatile and adaptable interface for data representation, enhancing the efficiency and effectiveness of your data analysis. Plugins: This feature efficiently renders Telecom 5G data in real-time on a user-friendly API (Application Programming Interface), ensuring operators always have the most accurate and up-to-date data at their fingertips. It also enables operators to create data source plugins and retrieve metrics from any API. Transformations: This feature allows you to flexibly adapt, summarize, combine, and perform KPI metrics query/calculations across 5G network functions data sources, providing the tools to effectively manipulate and analyze your data. 
Annotations: Rich events from different Telecom 5G network functions data sources are used to annotate metrics-based graphs. Panel editor: A reliable and consistent graphical user interface for configuring and customizing 5G telecom metrics panels. Grafana Sample Dashboard GUI for 5G Alert Manager Alert Manager Components The Ingester swiftly ingests all alerts, while the Grouper groups them into categories. The De-duplicator prevents repetitive alerts, ensuring you're not bombarded with notifications. The Silencer is there to mute alerts based on a label, and the Throttler regulates the frequency of alerts. Finally, the Notifier will ensure that third parties are notified promptly. Alert Manager Functionalities Grouping: Grouping categorizes similar alerts into a single notification system. This is helpful during more extensive outages when many 5G network functions fail simultaneously and all the alerts fire at once. The telecom operator will expect to get only a single page while still being able to visualize the exact service instances affected. Inhibition: Inhibition suppresses the notification for specific low-priority alerts if certain major/critical alerts are already firing. For example, when a critical alert fires, indicating that an entire 5G SMF (Session Management Function) cluster is not reachable, Alert Manager can mute all other minor/warning alerts concerning this cluster. Silences: Silences simply mute alerts for a given time. Incoming alerts are checked against the regular expression matchers of an active silence; if they match, no notifications will be sent out for that alert. High availability: Telecom operators will not load balance traffic between Prometheus and all its Alert Managers; instead, they will point Prometheus to a list of all Alert Managers. Dashboard Visualization The Grafana dashboard visualizes the Alert Manager webhook traffic notifications as shown below: Configuration YAMLs (YAML Ain't Markup Language) Telecom Operators can install and run Prometheus using the configuration below:

YAML

prometheus:
  enabled: true
  route:
    enabled: {}
  nameOverride: Prometheus
  tls:
    enabled: true
    certificatesSecret: backstage-prometheus-certs
    certFilename: tls.crt
    certKeyFilename: tls.key
  volumePermissions:
    enabled: true
  initdbScriptsSecret: backstage-prometheus-initdb
  prometheusSpec:
    retention: 3d
    replicas: 2
    prometheusExternalLabelName: prometheus_cluster
    image:
      repository: <5G operator image repository for Prometheus>
      tag: <Version example v2.39.1>
      sha: ""
    podAntiAffinity: "hard"
    securityContext: null
    resources:
      limits:
        cpu: 1
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    serviceMonitorNamespaceSelector:
      matchExpressions:
        - {key: namespace, operator: In, values: [<Network function 1 namespace>, <Network function 2 namespace>]}
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false

Configuration to segregate scraped data by namespace and route it to the Central Prometheus. Note: The below configuration can be appended to the Prometheus installation YAML above.
YAML

remoteWrite:
  - url: <Central Prometheus URL for namespace 1 by 5G operator>
    basicAuth:
      username:
        name: <secret username for namespace 1>
        key: username
      password:
        name: <secret password for namespace 1>
        key: password
    tlsConfig:
      insecureSkipVerify: true
    writeRelabelConfigs:
      - sourceLabels:
          - namespace
        regex: <namespace 1>
        action: keep
  - url: <Central Prometheus URL for namespace 2 by 5G operator>
    basicAuth:
      username:
        name: <secret username for namespace 2>
        key: username
      password:
        name: <secret password for namespace 2>
        key: password
    tlsConfig:
      insecureSkipVerify: true
    writeRelabelConfigs:
      - sourceLabels:
          - namespace
        regex: <namespace 2>
        action: keep

Telecom Operators can install and run Grafana using the configuration below.

YAML

grafana:
  replicas: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app.kubernetes.io/name"
                operator: In
                values:
                  - Grafana
          topologyKey: "kubernetes.io/hostname"
  securityContext: false
  rbac:
    pspEnabled: false # Must be disabled due to tenant permissions
    namespaced: true
  adminPassword: admin
  image:
    repository: <artifactory>/Grafana
    tag: <version>
    sha: ""
    pullPolicy: IfNotPresent
  persistence:
    enabled: false
  initChownData:
    enabled: false
  sidecar:
    image:
      repository: <artifactory>/k8s-sidecar
      tag: <version>
      sha: ""
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 50m
        memory: 50Mi
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "Vendor name"
    datasources:
      enabled: true
      defaultDatasourceEnabled: false
  additionalDataSources:
    - name: Prometheus
      type: Prometheus
      url: http://<prometheus-operated>:9090
      access: proxy
      isDefault: true
      jsonData:
        timeInterval: 30s
  resources:
    limits:
      cpu: 400m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 206Mi
  extraContainers:
    - name: oauth-proxy
      image: <artifactory>/origin-oauth-proxy:<version>
      imagePullPolicy: IfNotPresent
      ports:
        - name: proxy-web
          containerPort: 4181
      args:
        - --https-address=:4181
        - --provider=openshift
        # Service account name here must be "<Helm Release name>-grafana"
        - --openshift-service-account=monitoring-grafana
        - --upstream=http://localhost:3000
        - --tls-cert=/etc/tls/private/tls.crt
        - --tls-key=/etc/tls/private/tls.key
        - --cookie-secret=SECRET
        - --pass-basic-auth=false
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 50m
          memory: 128Mi
      volumeMounts:
        - mountPath: /etc/tls/private
          name: grafana-tls
  extraContainerVolumes:
    - name: grafana-tls
      secret:
        secretName: grafana-tls
  serviceAccount:
    annotations:
      "serviceaccounts.openshift.io/oauth-redirecturi.first": https://[SPK exposed IP for Grafana]
  service:
    targetPort: 4181
    annotations:
      service.alpha.openshift.io/serving-cert-secret-name: <secret>

Telecom Operators can install and run Alert Manager using the configuration below.

YAML

alertmanager:
  enabled: true
  alertmanagerSpec:
    image:
      repository: prometheus/alertmanager
      tag: <version>
    replicas: 2
    podAntiAffinity: hard
    securityContext: null
    resources:
      requests:
        cpu: 25m
        memory: 200Mi
      limits:
        cpu: 100m
        memory: 400Mi
    containers:
      - name: config-reloader
        resources:
          requests:
            cpu: 10m
            memory: 10Mi
          limits:
            cpu: 25m
            memory: 50Mi

Configuration to route Prometheus Alert Manager data to the Operator's centralized webhook receiver. Note: The below configuration can be appended to the Alert Manager mentioned in the above installation YAML.
YAML

config:
  global:
    resolve_timeout: 5m
  route:
    group_by: ['alertname']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
    routes:
      - receiver: '<Network function 1>'
        group_wait: 10s
        group_interval: 10s
        group_by: ['alertname','oid','action','time','geid','ip']
        matchers:
          - namespace="<namespace 1>"
      - receiver: '<Network function 2>'
        group_wait: 10s
        group_interval: 10s
        group_by: ['alertname','oid','action','time','geid','ip']
        matchers:
          - namespace="<namespace 2>"

Conclusion The open-source OAM (Operation and Maintenance) tools Prometheus, Grafana, and Alert Manager can benefit 5G Telecom operators. Prometheus periodically captures the status of monitored 5G Telecom network functions over HTTP, and any component can be connected to the monitoring as long as the 5G Telecom operator provides the corresponding HTTP interface. Prometheus and the Grafana Agent give the 5G Telecom operator control over the metrics the operator wants to report; once the data is in Grafana, it can be stored in a Grafana database as extra data redundancy. In conclusion, Prometheus allows 5G Telecom operators to improve their operations and offer better customer service. Adopting a unified monitoring and alert system like Prometheus is one way to achieve this.
What Is the C4 Model? The C4 model is a hierarchical framework designed to help software architects and developers visualize and communicate the essential aspects of software architecture in a clear and structured way. Unlike traditional diagramming approaches that often result in cluttered and overly complex diagrams, the C4 model focuses on simplicity and abstraction to convey architectural concepts effectively. The next question is which tool you use to create said diagrams. You can use Visio, draw.io, PlantUML, even PowerPoint, or whatever tool you normally use for creating diagrams. However, these tools do not check whether naming, relations, etc. are consistently used in the different diagrams. Besides that, it might be difficult to review new versions of diagrams because it is not clear which changes are made. In order to solve these problems, Simon Brown, the author of the C4 model, created Structurizr. What Is Structurizr? Structurizr allows you to create diagrams as code. Based on the code, Structurizr visualizes the diagrams for you and you can interact with the visualization. Because the diagrams are maintained in code, you can add them to your version control system (git), and changes in the diagrams are tracked and can be easily reviewed. In a previous article, some features of Structurizr are explored. Structurizr Lite was used, which supports only one workspace. However, if you have a more diverse system landscape, Structurizr Lite is not sufficient anymore. You will have multiple workspaces, one for every software system. You also probably want an overview of your entire system landscape. In this article, you will explore how you can use Structurizr to maintain not only the software architecture of one system but your entire system landscape as code. Sources used in this blog can be found at GitHub. Prerequisites Prerequisites for this blog are: Basic knowledge of the C4 model Basic knowledge of Docker Basic knowledge of Structurizr Linux is used — if you are using a different Operating System, you will need to adjust the commands accordingly Installation As mentioned before, Structurizr Lite cannot be used for this scenario. Instead, you need to install Structurizr on-premises. Create in the root of the repository a data directory. This directory will be mapped as a volume in the docker container. If you have executed the previous blog, ensure that you clean the data directory first. With Structurizr Lite, it is intended that you can edit files in this data directory, with Structurizr on-premises it is advised not to alter the files in the data directory. Structurizr on-premises should be run on a separate server and a normal user should not have access to the data directory anyway. Execute the following command from within the root of the repository: Shell $ docker run -it --rm -p 8080:8080 -v ./data:/usr/local/structurizr structurizr/onpremises Navigate in your browser to http://localhost:8080, log in with the default user structurizr and password password, and the Structurizr webpage is shown. Single Workspace First, let’s see how you can create a single workspace with Structurizr on-premises. Click New workspace, and an empty workspace is created. It is not possible anymore to edit files on your host machine, just like with Structurizr Lite. So, how can you upload your DSL files to the workspace? In order to do so, you need Structurizr CLI. At the moment of writing, v2024.02.22 is the latest version, which can be downloaded as a zip from GitHub. 
Unpack the zip file, and add the directory to your path. You will upload the latest version of the software system from the previous blog. The DSL is located in the workspaces/3-basic-styles directory. Navigate to this directory. To push the DSL to Structurizr, you will make use of the push command. The push command needs some parameters, which can be found in the settings of the Structurizr workspace. You need the information as shown under API details. Below this information, the parameters can easily be copied. Execute the following command, replacing the parameters for your situation:

Shell

$ structurizr.sh push -url http://localhost:8080/api -id 1 -key 2607de22-7ce0-4eb1-9f28-1e7e9979121a -secret 09528dfd-0c0a-4380-85cb-766b8da5e1dc -workspace workspace.dsl
Pushing workspace 1 to http://localhost:8080/api
 - creating new workspace
 - parsing model and views from /<path to project directory>/MyStructurizrPlanet/workspaces/3-basic-styles/workspace.dsl
 - merge layout from remote: true
 - storing previous version of workspace in null
 - pushing workspace
Getting workspace with ID 1
Putting workspace with ID 1
{"success":true,"message":"OK","revision":2}
 - finished

If everything goes well, the DSL is pushed successfully. The System Context and Container diagrams are now added to the workspace. Workspace Features In this section, some interesting features of Structurizr on-premises are shown. 5.1 Version Control Every upload automatically creates a new version. It is also possible to retrieve an older version. 5.2 Error Checking The Inspections item in the left menu gives you an overview of errors in your DSL. 5.3 Reviews When you open a diagram, you can create a review. When creating the review, you can choose which diagrams need to be reviewed, what kind of review you are requesting, and whether unauthenticated access is allowed or not. The reviewer can, of course, add comments. Next to the Public review text, a link to a checklist is present which can help you execute the review. Create System Landscape Using DSL Only The above examples consist of diagrams for a single software system. Often, multiple software systems are used in an organization. These software systems interact with each other and together form a system landscape. Each team will be responsible for its own software system diagrams, but it is also necessary to have a diagram containing the larger picture. Let’s explore whether this is possible using Structurizr. You will be using an example based on the enterprise example provided in the Structurizr GitHub repository. The files can be found in workspaces/4-system-landscape. Create a new workspace via the UI, navigate to the 4-system-landscape directory, and push the customer-service DSL to this workspace.

Shell

$ structurizr.sh push -url http://localhost:8080/api -id 2 -key f24fe705-a508-4f8d-9cf7-3fc7b323f293 -secret 02c6597f-c750-47e0-9b88-f6e26fccdf38 -workspace customer-service/workspace.dsl

In the same way, create a workspace for the invoice-service and the order-service. Push the corresponding DSL to each workspace. A separate system-landscape DSL is present, which uses a plugin to create the relationships between the software systems. Create a workspace for this DSL and push it.

Shell

$ structurizr.sh push -url http://localhost:8080/api -id 5 -key cb18cabb-61c7-4c3a-a58e-2e97ff0fa285 -secret a638aa99-73cd-427d-8188-3788e678129f -workspace system-landscape/workspace.dsl

This creates the system landscape overview.
However, two issues are encountered with this view: It is not possible to click on, for example, the Order Service in order to open the software system diagram for the Order Service. The DSL of the Customer Service does not define the relationships with the Order Service and Invoice Service, as can be seen in the diagram below. It would be nice if this inconsistency were reported one way or the other. I asked a question about this on GitHub and used the answer to create a solution that can be found in the following paragraphs. Create System Landscape Using Java The solution to the absence of links to the different services is to make use of the Java Structurizr library. With this library, you have much more control to achieve the desired functionality. I used the source code from the example in the Structurizr repository and added it to the directory workspaces/5-system-landscape. The pom file contains the necessary dependencies to run the code, and the maven-assembly-plugin is added to create a fat jar. The code executes the following steps: Create a workspace for the system landscape. Create workspaces for each service. Generate the system landscape by parsing the workspaces' metadata, creating the necessary relationships, adding a link to the services, and creating a view for the system landscape. Execute the following command from within the workspaces/5-system-landscape directory in order to build the fat jar.

Shell

$ mvn clean package

Run the code, and an error occurs.

Shell

$ java -jar target/mystructurizrplanet-1.0-SNAPSHOT-jar-with-dependencies.jar
Mar 02, 2024 11:41:12 AM com.structurizr.api.AdminApiClient createWorkspace
SEVERE: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
Exception in thread "main" com.structurizr.api.StructurizrClientException: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
    at com.structurizr.api.AdminApiClient.createWorkspace(AdminApiClient.java:109)
    at com.mydeveloperplanet.mystructurizrplanet.CreateSystemLandscape.main(CreateSystemLandscape.java:30)
Caused by: com.structurizr.api.StructurizrClientException: The API key is not configured for this installation - please refer to the documentation
    at com.structurizr.api.AdminApiClient.createWorkspace(AdminApiClient.java:105)
    ... 1 more

To use the Java library, you need to use an API key. This API key is disabled by default. To enable it, you need to add a file structurizr.properties to your data directory. In the properties file, you set the API key to its bcrypt-encoded value.

Properties files

structurizr.apiKey=$2a$10$ekjju1h3fC1y2YAln7wqxuJ.q0gBjQoFPX/Wvmzr.L5aIdoqvUIwa

Add read permissions to the file.

Shell

$ chmod o+r data/structurizr.properties

Restart the Docker container and execute the jar file again.
Shell

$ java -jar target/mystructurizrplanet-1.0-SNAPSHOT-jar-with-dependencies.jar
Mar 02, 2024 11:50:03 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 7
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 7
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 8
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 8
Mar 02, 2024 11:50:04 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 1
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 2
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 3
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 4
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 5
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 7
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 8
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 9
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient getWorkspace
INFO: Getting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: Putting workspace with ID 6
Mar 02, 2024 11:50:05 AM com.structurizr.api.WorkspaceApiClient putWorkspace
INFO: {"success":true,"message":"OK","revision":2}

If you open the system landscape workspace, it is now possible to double-click one of the services, and you will be navigated to the corresponding service. Great, but there are some caveats to mention: This source code always creates new workspaces every time you run it. This is just an example of what is possible using the Java library. If you want to update existing workspaces, you will need to alter the source code for this purpose. The source code contains a hardcoded API key in plain text. You should not do this in a production environment. Validate Relationships Is it possible to validate the relationships using the Java library? Yes, it is. An example of the source code can be found in the directory workspaces/6-validate-relationships. This code will validate offline whether the DSL contains the correct relationships. It is only intended to prove that the validation can be done. For using this in production, the source code needs to be made more robust. Build the code and run it.
Shell

$ mvn clean package
$ java -jar target/validaterelationships-1.0-SNAPSHOT-jar-with-dependencies.jar
missing relation in CustomerService {2 | Order Service | } ---[Manages customer data using]---> {4 | Customer Service | }
missing relation in CustomerService {3 | Invoice Service | } ---[Gets customer data from]---> {4 | Customer Service | }

The validation finds the two errors in the Customer Service. Add the relationships to the Customer Service DSL.

Plain Text

model {
    !extend customerService {
        api = container "Customer API"
        database = container "Customer Database"
        api -> database "Reads from and writes to"
        orderService -> customerService "Gets customer data from"
        invoiceService -> customerService "Gets customer data from"
    }
}

Build the code and run it again. The errors are gone, and the relationships are visible in the Customer Service workspace if you rerun the code from the previous paragraph. Conclusion Structurizr offers many features to get a grip on your software architecture. It also allows you to generate a system landscape and to implement several customizations, e.g., custom validation checks. You need to learn the Java Structurizr library, but the learning curve is not very steep.
Cross-Origin Resource Sharing (CORS) often becomes a stumbling block for developers attempting to interact with APIs hosted on different domains. The challenge intensifies when direct server configuration isn't an option, pushing developers towards alternative solutions like the widely used cors-anywhere. However, less known is the capability of NGINX's proxy_pass directive to handle not only local domains and upstreams but also external sources, for example, a remote host derived from the request itself. This is how the idea was born to write a universal (with some reservations) NGINX config that supports any given domain. Understanding the Basics and Setup CORS is a security feature that restricts web applications from making requests to a different domain than the one that served the web application itself. This is a crucial security measure to prevent malicious websites from accessing sensitive data. However, when legitimate cross-domain requests are necessary, properly configuring CORS is essential. The NGINX proxy server offers a powerful solution to this dilemma. By utilizing NGINX's flexible configuration system, developers can create a proxy that handles CORS preflight requests and manipulates headers to ensure compliance with CORS policies. Here's how: Variable Declaration and Manipulation With the map directive, NGINX allows the declaration of new variables based on existing global ones, incorporating regular expression support for dynamic processing. For instance, a specific path can be extracted from a URL, allowing for precise control over request handling. Thus, when requesting http://example.com/api, the $my_request_path variable will contain api. Header Management NGINX facilitates the addition of custom headers to responses via add_header and to proxied requests through proxy_set_header. Simultaneously, proxy_hide_header can be used to conceal headers received from the proxied server, ensuring only the necessary information is passed back to the client. We now have an X-Request-Path header containing api. Conditional Processing Utilizing the if directive, NGINX can perform actions based on specific conditions, such as returning a predetermined response code for OPTIONS method requests, streamlining the handling of CORS preflight checks. Putting It All Together First, let's declare a $proxy_uri variable that we extract from $request_uri. In short, it works like this: when requesting http://example.com/example.com, the $proxy_uri variable will contain https://example.com. From the resulting $proxy_uri, we extract the part that will match the Origin header. For the Forwarded header, we need to process two variables at once; the processed X-Forwarded-For header is already built into NGINX. Now we can move on to declaring our proxy server. The result is a minimally working proxy server that can process the CORS preflight request and add the appropriate headers (a consolidated sketch appears at the end of this section). Enhancing Security and Performance Beyond the basic setup, further refinements can improve security and performance: Hiding CORS Headers When NGINX handles CORS internally, it's beneficial to hide these headers from client responses to prevent exposure of server internals. Rate Limit Bypassing It is also useful to pass the client's IP along, to avoid tripping upstream rate limits when several users access the same resource through the proxy. Disabling Caching And finally, for dynamic content or sensitive information, disabling caching is a best practice, ensuring data freshness and privacy.
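Putting the pieces above together, here is a minimal sketch of what such a configuration could look like. It is an illustration under assumptions rather than a drop-in config: the listen port, resolver address, header set, and regular expressions are simplified placeholders, and the original article's exact directives may differ.

NGINX

# Build the upstream URL from the request URI, e.g. /example.com/api -> https://example.com/api
map $request_uri $proxy_uri {
    ~^/(?<target_host>[^/]+)(?<target_path>/.*)?$  https://$target_host$target_path;
}

# Extract the scheme://host part so it can be sent as the Origin header upstream.
map $proxy_uri $proxy_origin {
    ~^(?<origin>https?://[^/]+)  $origin;
}

server {
    listen 8080;

    location / {
        # Answer CORS preflight requests directly, without contacting the upstream.
        if ($request_method = OPTIONS) {
            add_header Access-Control-Allow-Origin  $http_origin always;
            add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
            add_header Access-Control-Allow-Headers $http_access_control_request_headers always;
            return 204;
        }

        # Hide the upstream's own CORS headers and add ours instead.
        proxy_hide_header Access-Control-Allow-Origin;
        add_header Access-Control-Allow-Origin $http_origin always;

        # Pass the client address along to help with upstream rate limits.
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Origin $proxy_origin;

        # Keep dynamic or sensitive responses out of any proxy cache.
        proxy_no_cache 1;
        proxy_cache_bypass 1;

        # proxy_pass with a variable requires a resolver for runtime DNS lookups.
        resolver 1.1.1.1;
        proxy_pass $proxy_uri;
    }
}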
Conclusion This guide not only demystifies the process of configuring NGINX to handle CORS requests but also equips developers with the knowledge to create a robust, flexible proxy server capable of supporting diverse application needs. Through careful configuration and understanding of both CORS policies and NGINX's capabilities, developers can overcome cross-origin restrictions, enhance application performance, and ensure data security. This advanced understanding and application of NGINX not only solves a common web development hurdle but also showcases the depth of skill and innovation possible when navigating web security and resource-sharing challenges.
Bartłomiej Żyliński, Software Engineer, SoftwareMill
Abhishek Gupta, Principal Developer Advocate, AWS
Yitaek Hwang, Software Engineer, NYDIG