Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.
Enhancing Web Scraping With Large Language Models: A Modern Approach
Exploring Reactive Programming in Kotlin Coroutines With Spring Boot: A Comparison With WebFlux
This is part 4 of a 4-part tutorial:

Part 1: DSL Validations: Properties
Part 2: DSL Validations: Child Properties
Part 3: DSL Validations: Operators
Part 4: DSL Validations: The Whole Enchilada

In this final part of the four-part tutorial, after introducing the concept of property validators and operators, we can tie it all together by validating complete beans in a more reusable way.

PropertyBeanValidator

PropertyBeanValidator is the worker class that evaluates a collection of PropertyValidators against a target object. The specific property validators are provided during construction and are AND'ed together as if wrapped by an AndOperator (i.e., all property validators must pass for the entire bean to validate successfully).

Kotlin
open class PropertyBeanValidator<T>(
    private val validators: Set<PropertyValidator<T>>) : DefaultBeanValidator() {

    // The method-level <T> shadows the class type parameter (needed to
    // override the generic Validator method), hence the cast in the lambda.
    override fun <T> validate(
        source: T,
        vararg groups: Class<*>?): Set<ConstraintViolation<T>> {

        // Place to catch all the constraint violations that
        // occurred during this validation
        val violations = mutableSetOf<ConstraintViolation<T>>()

        // Call each individual validator to determine whether
        // or not the bean validates correctly
        validators
            .parallelStream()
            .forEach { it as PropertyValidator<T>; it.validate(source, violations) }

        return violations
    }
}

Putting It All Together

We'll reuse the Student class defined in Part 2, "DSL Validations: Child Properties" (linked in the introduction).

Kotlin
data class Address(
    val line1: String?,
    val line2: String?,
    val city: String,
    val state: String,
    val zipCode: String
)

data class Student(
    val studentId: String,
    val firstName: String?,
    val lastName: String?,
    val emailAddress: String?,
    val localAddress: Address
)

For this example, we have three business rules to apply against a Student object:

firstName and lastName must both be present or both be missing;
address.line2 presence requires that address.line1 is also present;
address.zipCode must be formatted correctly.

Ad-Hoc Bean Validator

Ad-hoc bean validators are created by instantiating PropertyBeanValidator directly, typically from a factory, allowing callers to be handed an appropriate validator without actually knowing what needs to be validated. The factory determines the specific validations required — based on the caller, data state, feature flags, etc. — and builds the validator on the fly.

Kotlin
val validators = setOf(
    OrOperator(
        "studentName",
        listOf(
            AndOperator(
                "namePresent",
                listOf(
                    NotBlankValidator("firstName", Student::firstName),
                    NotBlankValidator("lastName", Student::lastName)
                )
            ),
            AndOperator(
                "nameNotPresent",
                listOf(
                    NullOrBlankValidator("firstName", Student::firstName),
                    NullOrBlankValidator("lastName", Student::lastName)
                )
            )
        ),
        "first/last name must both be present or null"
    ),
    OrOperator(
        "Line2RequiresLine1",
        listOf(
            ChildPropertyValidator(
                "line1NotNull",
                Student::localAddress,
                NotBlankValidator("line1", Address::line1)),
            ChildPropertyValidator(
                "line2Null",
                Student::localAddress,
                NullOrBlankValidator("line2", Address::line2))
        ),
        "line2 requires line1."
    ),
    ChildPropertyValidator(
        "address.zipCode",
        Student::localAddress,
        ZipCodeFormatValidator("address", Address::zipCode)
    )
)

val validator = PropertyBeanValidator(validators)

Class-Specific Bean Validator

A class-specific validator is useful when there is one and only one way to validate a class, ensuring consistent and correct usage across the code base. Here we extend PropertyBeanValidator and pass in the validators via an alternative constructor.
Kotlin
class StudentBeanValidator(validators: Set<PropertyValidator<Student>>) :
    PropertyBeanValidator<Student>(validators) {

    constructor() : this(getValidators())

    companion object {
        fun getValidators(): Set<PropertyValidator<Student>> {
            return setOf(
                // ... <same validations as above> ...
            )
        }
    }
}

NOTE: It's a little more awkward in Kotlin, as you can't access data in the companion object before the object is constructed, but calling a method is allowed. Statics in Java would allow the creation of an immutable set that could be used for any number of instantiations.

Validating

Kotlin
// Assume the student is created from a database entry
val myStudent = retrieveStudent("studentId")

// Validate the object
val violations = validator.validate(myStudent)

// An empty collection means successful validation
val successfullyValidated = violations.isEmpty()

Annotation-Based Validation

Jakarta's validation interface ConstraintValidator declares an annotation-driven validation which, in turn, can be defined via the DSL. First, implement the annotation that can be applied for validating students — in this example, limited to method parameters.

Kotlin
@Constraint(validatedBy = [StudentValidator::class])
@Target(AnnotationTarget.VALUE_PARAMETER)
@Retention(AnnotationRetention.RUNTIME)
annotation class ValidStudent(
    val message: String = "Invalid Student record",
    val groups: Array<KClass<*>> = [],
    val payload: Array<KClass<out Payload>> = []
)

Next, implement the class extending ConstraintValidator that does the actual validation, using the StudentBeanValidator implemented earlier.

Kotlin
class StudentValidator : ConstraintValidator<ValidStudent, Student> {
    override fun isValid(student: Student, context: ConstraintValidatorContext): Boolean {
        val errors = StudentBeanValidator().validate(student)
        return if (errors.isNotEmpty()) {
            context.disableDefaultConstraintViolation()
            context.buildConstraintViolationWithTemplate(
                "Student validation failed with following errors :$errors")
                .addConstraintViolation()
            false
        } else {
            true
        }
    }
}

Here is the annotation in action:

Kotlin
fun registerStudentForClass(@ValidStudent student: Student): Student {
    // ... <do some work> ...
}

For those interested, this Baeldung tutorial dives deeper into validations than what I've covered.

Final Thoughts

DSL validations are a language-independent way of checking bean/object validity without writing ever more if-then-else statements that end up uncommented, unclear, and unreadable. They are also easy to extend and customize for whatever specific requirements your organization has.

Supporting Code

DefaultBeanValidator

Kotlin
open class DefaultBeanValidator : Validator {

    override fun <T> validate(source: T, vararg groups: Class<*>?):
            Set<ConstraintViolation<T>> {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    override fun <T> validateProperty(source: T, propertyName: String?, vararg groups: Class<*>?):
            Set<ConstraintViolation<T>> {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    override fun <T> validateValue(beanType: Class<T>?, propertyName: String?, value: Any?, vararg groups: Class<*>?):
            Set<ConstraintViolation<T>> {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    override fun getConstraintsForClass(clazz: Class<*>?): BeanDescriptor {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    override fun <T : Any?> unwrap(type: Class<T>?): T {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    override fun forExecutables(): ExecutableValidator {
        throw UnsupportedOperationException(EXCEPTION_MESSAGE)
    }

    companion object {
        const val EXCEPTION_MESSAGE = "Not yet implemented"
    }
}
Generative AI development has been democratized, thanks to powerful machine learning models (specifically Large Language Models such as Claude, Meta's Llama 2, etc.) being exposed by managed platforms/services as API calls. This frees developers from infrastructure concerns and lets them focus on the core business problems. It also means that developers are free to use the programming language best suited to their solution. Python has typically been the go-to language for AI/ML solutions, but there is more flexibility in this area.

In this post, you will see how to leverage the Go programming language to use vector databases and techniques such as Retrieval Augmented Generation (RAG) with langchaingo. If you are a Go developer who wants to learn how to build generative AI applications, you are in the right place! If you are looking for introductory content on using Go for AI/ML, feel free to check out my previous blogs and open-source projects in this space. First, let's take a step back and get some context before diving into the hands-on part of this post.

The Limitations of LLMs

Large Language Models (LLMs) and other foundation models have been trained on a large corpus of data, enabling them to perform well at many natural language processing (NLP) tasks. But one of the most important limitations is that most foundation models and LLMs use a static dataset which often has a specific knowledge cut-off (say, January 2022). For example, if you were to ask about an event that took place after the cut-off date, the model would either fail to answer (which is fine) or, worse, confidently reply with an incorrect response — this is often referred to as hallucination.

LLMs only respond based on the data they were trained on, which limits their ability to accurately answer questions on topics that are either specialized or proprietary. For instance, if I were to ask a question about a specific AWS service, the LLM may (or may not) be able to come up with an accurate response. Wouldn't it be nice if the LLM could use the official AWS service documentation as a reference?

RAG (Retrieval Augmented Generation) Helps Alleviate These Issues

RAG enhances LLMs by dynamically retrieving external information during the response generation process, thereby expanding the model's knowledge base beyond its original training data. RAG-based solutions incorporate a vector store which can be indexed and queried to retrieve the most recent and relevant information, thereby extending the LLM's knowledge beyond its training cut-off. When an LLM equipped with RAG needs to generate a response, it first queries a vector store to find relevant, up-to-date information related to the query. This process ensures that the model's outputs are not just based on its pre-existing knowledge but are augmented with the latest information, improving the accuracy and relevance of its responses.

But RAG Is Not the Only Way

Although this post focuses solely on RAG, there are other ways to work around this problem, each with its pros and cons:

Task-specific tuning: Fine-tuning large language models on specific tasks or datasets to improve their performance in those domains.
Prompt engineering: Carefully designing input prompts to guide language models towards desired outputs, without requiring significant architectural changes.
Few-shot and zero-shot learning: Techniques that enable language models to adapt to new tasks with limited or no additional training data.
Vector Store and Embeddings

I mentioned the vector store a few times in the last paragraph. Vector stores are nothing but databases that store and index vector embeddings, which are numerical representations of data such as text, images, or entities. Embeddings help us go beyond basic search since they represent the semantic meaning of the source data — hence the term semantic search, a technique that uses the meaning and context of words to improve search accuracy and relevance. Vector databases can also store metadata, including references to the original data source of the embedding (for example, the URL of a web document).

Thanks to generative AI technologies, there has also been an explosion in vector databases. These include established SQL and NoSQL databases that you may already be using in other parts of your architecture — such as PostgreSQL, Redis, MongoDB, and OpenSearch. But there are also databases that are custom-built for vector storage, such as Pinecone, Milvus, and Weaviate.

Alright, let's go back to RAG...

What Does a Typical RAG Workflow Look Like?

At a high level, RAG-based solutions have the following workflow, often executed as a cohesive pipeline:

Retrieve data from a variety of external sources like documents, images, web URLs, databases, proprietary data sources, etc. This consists of sub-steps such as chunking, which involves splitting up large datasets (e.g., a 100 MB PDF file) into smaller parts (for indexing).
Create embeddings: This involves using an embedding model to convert the data into numerical representations.
Store/index the embeddings in a vector store.
Ultimately, this is integrated into a larger application where the contextual data (the semantic search result) is provided to the LLM (along with the prompt).

End-To-End RAG Workflow in Action

Each of the workflow steps can be executed with different components. The ones used in this blog include:

PostgreSQL: Used as a vector database, thanks to the pgvector extension. To keep things simple, we will run it in Docker.
langchaingo: A Go port of the langchain framework. It provides plugins for various components, including vector stores. We will use it for loading data from web URLs and indexing it in PostgreSQL.
Text and embedding models: We will use the Amazon Bedrock Claude and Titan models (for text and embeddings, respectively) with langchaingo.
Retrieval and app integration: the langchaingo vector store (for semantic search) and chain (for RAG).

You will get a sense of how these individual pieces work. We will cover other variants of this architecture in subsequent blogs.

Before You Begin

Make sure you have:

Go, Docker, and psql installed (e.g., using Homebrew if you're on a Mac).
Amazon Bedrock access configured from your local machine — refer to this blog post for details.

Start PostgreSQL on Docker

There is a Docker image we can use!
docker run --name pgvector --rm -it -p 5432:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres ankane/pgvector

Activate the pgvector extension by logging into PostgreSQL (using psql) from a different terminal:

# enter postgres when prompted for password
psql -h localhost -U postgres -W

CREATE EXTENSION IF NOT EXISTS vector;

Load Data Into PostgreSQL (Vector Store)

Clone the project repository:

git clone https://github.com/build-on-aws/rag-golang-postgresql-langchain
cd rag-golang-postgresql-langchain

At this point, I am assuming that your local machine is configured to work with Amazon Bedrock.

The first thing we will do is load data into PostgreSQL. In this case, we will use an existing web page as the source of information. I have used this developer guide — but feel free to use your own! Make sure to change the search query accordingly in the subsequent steps.

export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=load -source=https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html

You should get the following output:

loading data from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-general-nosql-design.html
vector store ready - postgres://postgres:postgres@localhost:5432/postgres?sslmode=disable
no. of documents to be loaded 23

Give it a few seconds. Finally, you should see this output if all goes well:

data successfully loaded into vector store

To verify, go back to the psql terminal and check the tables:

\d

You should see a couple of tables — langchain_pg_collection and langchain_pg_embedding. These are created by langchaingo since we did not specify them explicitly (that's OK; it's convenient for getting started!). langchain_pg_collection contains the collection name, while langchain_pg_embedding stores the actual embeddings.

| Schema | Name                    | Type  | Owner    |
|--------|-------------------------|-------|----------|
| public | langchain_pg_collection | table | postgres |
| public | langchain_pg_embedding  | table | postgres |

You can introspect the tables:

select * from langchain_pg_collection;
select count(*) from langchain_pg_embedding;
select collection_id, document, uuid from langchain_pg_embedding LIMIT 1;

You will see 23 rows in the langchain_pg_embedding table, since that was the number of langchain documents that our web page source was split into (refer to the application logs above when you loaded the data).

A quick detour into how this works...

The data loading implementation is in load.go, but let's look at how we access the vector store instance (in common.go):

brc := bedrockruntime.NewFromConfig(cfg)

embeddingModel, err := bedrock.NewBedrock(bedrock.WithClient(brc), bedrock.WithModel(bedrock.ModelTitanEmbedG1))
//...
store, err = pgvector.New(
    context.Background(),
    pgvector.WithConnectionURL(pgConnURL),
    pgvector.WithEmbedder(embeddingModel),
)

pgvector.WithConnectionURL is where the connection information for the PostgreSQL instance is provided.
pgvector.WithEmbedder is the interesting part, since this is where we can plug in the embedding model of our choice. langchaingo supports Amazon Bedrock embeddings; in this case, I have used the Amazon Bedrock Titan embeddings model.

Back to the loading process in load.go. We first get the data in the form of a slice of schema.Document (the getDocs function), using the langchaingo built-in HTML loader for this.
docs, err := documentloaders.NewHTML(resp.Body).LoadAndSplit(context.Background(), textsplitter.NewRecursiveCharacter())

Then, we load it into PostgreSQL. Instead of writing everything ourselves, we can use the langchaingo vector store abstraction and its high-level AddDocuments function:

_, err = store.AddDocuments(context.Background(), docs)

Great. We have set up a simple pipeline to fetch and ingest data into PostgreSQL. Let's make use of it!

Execute Semantic Search

Let's ask a question. I am going with "What tools can I use to design dynamodb data models?", which is relevant to the document I used as the data source — feel free to tune it as per your scenario.

export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=semantic_search -query="what tools can I use to design dynamodb data models?" -maxResults=3

You should see similar output — note that we opted to return a maximum of three results (you can change it):

vector store ready

============== similarity search results ==============
similarity search info - can build new data models from, or design models based on, existing data models that satisfy your application's data access patterns. You can also import and export the designed data model at the end of the process. For more information, see Building data models with NoSQL Workbench
similarity search score - 0.3141409
============================
similarity search info - NoSQL Workbench for DynamoDB is a cross-platform, client-side GUI application that you can use for modern database development and operations. It's available for Windows, macOS, and Linux. NoSQL Workbench is a visual development tool that provides data modeling, data visualization, sample data generation, and query development features to help you design, create, query, and manage DynamoDB tables. With NoSQL Workbench for DynamoDB, you
similarity search score - 0.3186116
============================
similarity search info - key-value pairs or document storage. When you switch from a relational database management system to a NoSQL database system like DynamoDB, it's important to understand the key differences and specific design approaches.TopicsDifferences between relational data design and NoSQLTwo key concepts for NoSQL designApproaching NoSQL designNoSQL Workbench for DynamoDB Differences between relational data design and NoSQL
similarity search score - 0.3275382
============================

What you see here are the top three results (thanks to -maxResults=3). Note that this is not an answer to our question. These are the results from our vector store that are semantically close to the query — the keyword here is semantic. Thanks to the vector store abstraction in langchaingo, we were able to easily ingest our source data into PostgreSQL and use the SimilaritySearch function to get the top N results corresponding to our query (see the semanticSearch function in query.go).

Note that (at the time of writing) the pgvector implementation in langchaingo uses the cosine distance vector operation, but pgvector also supports L2 distance and inner product — for details, refer to the pgvector documentation.

Ok, so far we have:

Loaded vector data
Executed semantic search

This is the stepping stone to RAG (Retrieval Augmented Generation).
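Before seeing it in action, here is a fragment-style sketch of what the semanticSearch helper in query.go can boil down to. This is an illustration based on langchaingo's vectorstores interface, not a verbatim copy of the repository code:

Go
package main

import (
	"context"

	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores"
)

// semanticSearch embeds the query using the store's configured embedder and
// returns the top-N semantically closest documents. `store` is assumed to be
// the pgvector-backed instance created in common.go.
func semanticSearch(store vectorstores.VectorStore, query string, numResults int) ([]schema.Document, error) {
	return store.SimilaritySearch(context.Background(), query, numResults)
}

With that, let's see RAG in action!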
Intelligent Search With RAG

To execute a RAG-based search, we run (almost) the same command as above, with a slight change in the action (rag_search):

export PG_HOST=localhost
export PG_USER=postgres
export PG_PASSWORD=postgres
export PG_DB=postgres

go run *.go -action=rag_search -query="what tools can I use to design dynamodb data models?" -maxResults=3

Here is the output I got (it might be slightly different in your case):

Based on the context provided, the NoSQL Workbench for DynamoDB is a tool that can be used to design DynamoDB data models. Some key points about NoSQL Workbench for DynamoDB:

- It is a cross-platform GUI application available for Windows, macOS, and Linux.
- It provides data modeling capabilities to help design and create DynamoDB tables.
- It allows you to build new data models or design models based on existing data models.
- It provides features like data visualization, sample data generation, and query development to manage DynamoDB tables.
- It helps in understanding the key differences and design approaches when moving from a relational database to a NoSQL database like DynamoDB.

So in summary, NoSQL Workbench for DynamoDB seems to be a useful tool specifically designed for modeling and working with DynamoDB data models.

As you can see, the result is not just "Here are the top X responses for your query." Instead, it's a well-formulated response to the question. Let's peek behind the scenes again to see how it works.

Unlike ingestion and semantic search, RAG-based search is not directly exposed by the langchaingo vector store implementation. For this, we use a langchaingo chain, which takes care of the following:

Invokes semantic search
Combines the semantic search results with a prompt
Sends it to a Large Language Model (LLM), which in this case happens to be Claude on Amazon Bedrock

Here is what the chain looks like (refer to the ragSearch function in query.go):

result, err := chains.Run(
    context.Background(),
    chains.NewRetrievalQAFromLLM(
        llm,
        vectorstores.ToRetriever(store, numOfResults),
    ),
    question,
    chains.WithMaxTokens(2048),
)

Let's Try Another One

This was just one example. I tried a different question and increased maxResults to 10, which means that the top 10 results from the vector database will be used to formulate the answer.

go run *.go -action=rag_search -query="how is NoSQL different from SQL?" -maxResults=10

The result (again, it might be different for you):

Based on the provided context, there are a few key differences between NoSQL databases like DynamoDB and relational database management systems (RDBMS):

1. Data Modeling:
- In RDBMS, data modeling is focused on flexibility and normalization without worrying much about performance implications. Query optimization doesn't significantly affect schema design.
- In NoSQL, data modeling is driven by the specific queries and access patterns required by the application. The data schema is designed to optimize the most common and important queries for speed and scalability.

2. Data Organization:
- RDBMS organizes data into tables with rows and columns, allowing flexible querying.
- NoSQL databases like DynamoDB use key-value pairs or document storage, where data is organized in a way that matches the queried data shape, improving query performance.

3. Query Patterns:
- In RDBMS, data can be queried flexibly, but queries can be relatively expensive and don't scale well for high-traffic situations.
- In NoSQL, data can be queried efficiently in a limited number of ways defined by the data model, while other queries may be expensive and slow.

4. Data Distribution:
- NoSQL databases like DynamoDB distribute data across partitions to scale horizontally, and the data keys are designed to evenly distribute the traffic across partitions, avoiding hot spots.
- The concept of "locality of reference," keeping related data together, is crucial for improving performance and reducing costs in NoSQL databases.

In summary, NoSQL databases prioritize specific query patterns and scalability over flexible querying, and the data modeling is tailored to these requirements, in contrast with RDBMS, where data modeling focuses on normalization and flexibility.

Where to "Go" From Here?

Learning by doing is a good approach. If you've followed along and executed the application thus far, great! I recommend you try out the following:

langchaingo has support for lots of different models, including ones in Amazon Bedrock (e.g., Meta Llama 2, Cohere, etc.) — try tweaking the model and see if it makes a difference. Is the output better?
What about the vector database? I demonstrated PostgreSQL, but langchaingo supports others as well (including OpenSearch, Chroma, etc.) — try swapping out the vector store and see how/if the search results differ.
You probably get the gist, but you can also try out different embedding models. We used Amazon Titan, but langchaingo also supports many others, including the Cohere embed models in Amazon Bedrock.

Wrap Up

This was a simple example to help you better understand the individual steps in building RAG-based solutions. These might change a bit depending on the implementation, but the high-level ideas remain the same. I used langchaingo as the framework, but this doesn't mean you always have to use one. You could also remove the abstractions and call the LLM platforms' APIs directly if you need granular control in your applications or if the framework does not meet your requirements.

Like most of generative AI, this area is rapidly evolving, and I am optimistic that Go developers will have more options to build generative AI solutions. If you have feedback or questions, or you would like me to cover something else around this topic, feel free to comment below! Happy building!
I wrote previously about libs for error management in Rust. This week, I want to write about the try block, an experimental feature.

The Limit of the ? Operator

Please check the above article for a complete refresher on error management in general and the ? operator in particular. In short, ? allows you to hook into a function call that returns a Result:

If the Result contains a value, execution continues normally.
If it contains an error, it short-circuits and returns the Result to the calling function.

Rust
fn add(str1: &str, str2: &str) -> Result<i8, ParseIntError> {
    Ok(str1.parse::<i8>()? + str2.parse::<i8>()?)
}

fn main() {
    print!("{:?}", add("1", "2"));
    print!("{:?}", add("1", "a"));
}

The output is the following:

Plain Text
Ok(3)
Err(ParseIntError { kind: InvalidDigit })

Note that the defining function's signature must return a Result or an Option. The following block doesn't compile:

Rust
fn add(str1: &str, str2: &str) -> i8 {
    str1.parse::<i8>()? + str2.parse::<i8>()?
}

Plain Text
the `?` operator can only be used in a function that returns `Result` or `Option`

The Verbose Alternative

We must manually unwrap to return a non-wrapped type, e.g., i8 instead of a Result or Option.

Rust
fn add(str1: &str, str2: &str) -> i8 {
    let int1 = str1.parse::<i8>();               //1
    let int2 = str2.parse::<i8>();               //1
    if int1.is_err() || int2.is_err() { -1 }     //2-3
    else { int1.unwrap() + int2.unwrap() }       //4
}

1. Define the Result variables.
2. Manually check whether either variable contains an error, i.e., the parsing failed.
3. Return a default value, since we cannot return a Result. In this case, it's not a great idea, but it's for explanation's sake.
4. Unwrap with confidence.

The try Block to the Rescue

The sample above works but is quite lengthy. The try block is an experimental approach to make it more elegant. It allows "compacting" all the checks for errors into a single block:

Rust
#![feature(try_blocks)]                          //1

fn add(str1: &str, str2: &str) -> i8 {
    let result = try {
        let int1 = str1.parse::<i8>();
        let int2 = str2.parse::<i8>();
        int1.unwrap()? + int2.unwrap()?          //2
    };
    if result.is_err() { -1 }                    //3
    else { result.unwrap() }                     //4
}

1. Enable the experimental feature.
2. Use the ? operator even though the defining function doesn't return Result.
3. Check for errors only once.
4. Unwrap confidently.

Alas, the code doesn't compile:

Plain Text
the `?` operator can only be applied to values that implement `Try`

i8 doesn't implement Try. Neither i8 nor Try belongs to our crate, so a custom implementation would require the wrapper-type pattern. Fortunately, a couple of types already implement Try: Result, Option, Poll, and ControlFlow.

Rust
fn add(str1: &str, str2: &str) -> i8 {
    let result: Result<i8, ParseIntError> = try {    //1
        str1.parse::<i8>()? + str2.parse::<i8>()?    //2
    };
    if result.is_err() { -1 }
    else { result.unwrap() }
}

1. The compiler cannot infer the type of the try block, so we annotate it explicitly.
2. Using ? on a Result inside the try block is now allowed.

Conclusion

I learned about the try block in Java over twenty years ago. Java needs it because exceptions are at the root of its error-handling system; Rust doesn't, because it uses functional programming for its error handling — mainly Result. The ? operator builds upon the Result type to allow short-circuiting in functions that return Result themselves. If the function doesn't, you need a lot of boilerplate code. The experimental try block relieves some of it (a small bonus example with Option follows the links below).

To Go Further

Error management in Rust, and libs that support it
"The Rust Unstable Book: try_blocks"
The Rust RFC Book
Extending Rust's Effect System
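Bonus: try With Option

Since Option also implements Try, the same mechanics work without Result. Here is a minimal sketch of my own (assuming the same nightly try_blocks feature), not taken from the article above:

Rust
#![feature(try_blocks)]

fn first_upper(s: &str) -> char {
    let result: Option<char> = try {
        let c = s.chars().next()?;   // `?` on an Option inside the try block
        c.to_uppercase().next()?     // short-circuits the whole block to None
    };
    result.unwrap_or('?')            // handle the None case in a single place
}

fn main() {
    println!("{}", first_upper("rust")); // R
    println!("{}", first_upper(""));     // ?
}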
In December of last year, cybersecurity agencies from multiple nations (USA, UK, CA, AU, and NZ) collectively put out a document called "The Case for Memory Safe Roadmaps." While memory-safe programming languages are not my normal topic of discussion, this is an important security issue and should be understood.

First, a quick explanation of memory-safe vs. memory-unsafe programming languages. In memory-unsafe languages, the developer is responsible for manually allocating and deallocating memory, which can lead to leaks, dangling pointers, and other bugs. And without automated bounds checking, programs in these languages are more vulnerable to buffer overflows and other exploits. Operating systems, device drivers, embedded software, and more are often written in C++ to give the developers very precise control, get "close to the metal," and operate as fast and lean as possible.

Memory-safe languages include some of the most popular programming languages in the world: Python, Java, C#, Go, Rust, and Swift. JavaScript, which powers most websites on the front end and serves as a back-end language via Node.js, is a mixed bag when you ask about memory safety. It depends on the runtime engine and environment. Especially in the browser, there are ways to create memory leaks through bad management of DOM objects.

Given the speed and tuning abilities of a language like C++, why are all these security agencies recommending moving away from it?

Memory Issues Are a Major Area of Vulnerability

That may seem like restating the obvious, but two-thirds of the vulnerabilities identified in memory-unsafe languages are related to memory issues. This can be found in a blog from the US Cybersecurity and Infrastructure Security Agency (CISA) that pleads for developers to adopt memory-safe programming languages. In real-world numbers, they cite Microsoft stating that around 70% of their CVEs relate to memory issues. The same goes for Google with the Chromium project that underlies not just the Chrome browser, but Microsoft's Edge, Opera, and more. Mozilla, the developer of the Firefox browser, is quoted as stating that 94% of their critical/high-rated vulnerabilities were memory-related.

Memory-Safe Programming Languages Are More Than "Good Enough"

In 2022, the Linux kernel officially began supporting kernel modules written in Rust. That's not minor. Linux runs on just about anything these days and is the base kernel for all Android devices, including smartphones, tablets, smart TVs, cars, etc. While a person who runs Linux on their laptop or server might not consider Android to be a Linux operating system because it lacks most of the utilities and features of a traditional distribution, the kernel is Linux.

Both Rust and Go have been engineered to provide nearly C++ speeds, and the three are the subject of a lot of discussion around performance. While one may beat the others in a specific benchmark, when multiple benchmark tests are taken into account, it's a toss-up, with no language winning all of them. That does not mean you should immediately dump C++ unconditionally, but it's important to understand why you need it and whether it will be superior enough for your specific purposes to assume its risks.

HuggingFace's tokenizers AI library is written in Rust with bindings for both Python and JavaScript. Python is popular for AI because it's easy to learn.
While developers are writing AI code in Python, thanks to libraries like tokenizers, Python acts more like a supervisor assigning the hardest work to the hardest workers (the libraries), which allows for very high performance.

Should You Switch to a Memory-Safe Language?

If you're using C++ and are considering adjusting your roadmap to adopt a memory-safe language, you'll have to consider multiple factors:

Which language is best suited to your existing and planned projects
What tradeoffs you'll have to make
Whether to port existing projects to the new language or just use it for new modules and new projects
The cost of getting your developers up to speed on the new language
Providing your developers with the right productivity and security tools, such as software composition analysis (SCA) tools, that will help you validate the third-party dependencies you use from package managers like PyPI (Python) or NPM (Node.js)

Memory safety is an important consideration because the lack of it in languages like C++ is a big source of vulnerabilities. Continuing with memory-unsafe languages won't necessarily introduce new bugs, but it increases the likelihood they'll occur (or may already be there, but undiscovered). Memory-safe languages won't guarantee you write error-free code, but with less worrying about memory issues, you'll have more overhead to deal with other security concerns… like sprawling secrets.
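To make the contrast concrete, here is a minimal illustrative sketch (my own, not from the agencies' document): the equivalent pattern in C or C++ compiles into a dangling pointer, while Rust's borrow checker rejects it outright at compile time.

Rust
fn main() {
    let r;
    {
        let x = 42;
        r = &x; // error[E0597]: `x` does not live long enough
    }           // x is dropped here, so `r` would dangle
    println!("{}", r);
}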
Brief Problem Description

Imagine the situation: you (a Python developer) start a new job or join a new project, and you are told that the documentation is not up to date or is even absent, and those who wrote the code resigned a long time ago. Moreover, the code is written in a language that you are not familiar with (or "that you do not know"). You open the code, start examining it, and realize that there are no tests either. Also, the service has been working on Prod for so long that you are afraid to change anything. I am not talking about any particular project or company. I have experienced this at least three times.

Black Box

Well, you have a black box that has API methods (judging by the code), and you know that it pulls something and writes to a database. There is also documentation for those services that are receiving requests. On the plus side, the service starts, there is documentation on the APIs that it pulls, and the service code is quite readable. As for the disadvantages, the service wants to get something via API. Some of those dependencies can be run in a container, and some can be used from a developer environment, but not everything. Another problem is that requests to the black box are encrypted and signed, as are requests from it to some other services. At the same time, you need to change something in this service and not break what is working.

In such cases, Postman or cURL is inconvenient to use. You need to prepare each request for each specific case, since there are dynamic input data and signatures that depend on the time of the request. There are almost no ready-made tests, and it is difficult to write them if you do not know the language very well. The market offers solutions that allow you to run tests against such a service. However, I have never used them, so trying to understand them would be more difficult and would take much more time than creating my own solution.

Created Solution

I have come up with a simple and convenient option. I wrote a simple script in Python that pulls this very application. I used requests and a simple signature that I created very quickly for the requests prepared in advance. Next, I needed to mock the backends.

First Option

To do this, I just ran a mock service in Python. In my case, Django turned out to be the fastest and easiest tool for this. I decided to implement everything as simply and quickly as possible and used the latest version of Django. The result was quite good, but it covered only one method, and it took me several hours despite the fact that I wanted to save time. There are dozens of such methods.

Examples of Configuration Files

In the end, I got rid of everything I did not need and simply generated JSON with requests and responses. I described each request from the front end of my application, the expected response of the service to which requests were sent, as well as the rules for checking the response to the main request. For each method, I wrote a separate URL. However, manually changing the responses of one method from correct to incorrect and vice versa, and then pulling each method, is difficult and time-consuming.
JSON
{
  "id": 308,
  "front": {
    "method": "/method1",
    "request": {
      "method": "POST",
      "data": {
        "from_date": "dfsdsf",
        "some_type": "dfsdsf",
        "goods": [
          {
            "price": "112323",
            "name": "123123",
            "quantity": 1
          }
        ],
        "total_amount": "2113213"
      }
    },
    "response": {
      "code": 200,
      "body": {
        "status": "OK",
        "data": {
          "uniq_id": "sdfsdfsdf",
          "data": [
            {
              "number": "12223",
              "order_id": "12223",
              "status": "active",
              "code": "12223",
              "url": "12223",
              "op_id": "12223"
            }
          ]
        }
      }
    }
  },
  "backend": {
    "response": {
      "code": 200,
      "method": "POST",
      "data": {
        "body": {
          "status": 1,
          "data": {
            "uniq_id": "sdfsdfsdf",
            "data": [
              {
                "number": "12223",
                "order_id": "12223",
                "status": "active",
                "code": "12223",
                "url": "12223",
                "op_id": "12223"
              }
            ]
          }
        }
      }
    }
  }
}

Second Option

Then I linked mock objects to the script. As a result, there is a script call that pulls my application, and there is a mock object that responds to all of its requests. The script saves the ID of the selected request, and the mock object generates a response based on this ID. Thus, I collected all requests in different variants: correct and with errors.

What I Got

As a result, I got a simple view with one function for all URLs. This function takes a certain request identifier and, based on it, looks up the response rules — a mock object. Meanwhile, the script that pulls the service writes this very request identifier to the storage before making the request. The script simply takes each case in turn, writes the identifier, makes the correct request, then checks whether the response is correct, and that's it. (A simplified sketch of such a view appears at the end of this piece.)

Intermediate Connections

However, I needed not only to generate responses to these requests but also to test the requests sent to the mock objects. After all, the service could send an incorrect request, so it was necessary to check them too. As a result, there was a huge number of configuration files, and my several API methods turned into hundreds of large configuration files for checking.

Connecting a Database

I decided to transfer everything to a database. My service began to write not only to the console but also to the database, so that it would be possible to generate reports. That turned out to be more convenient: each case has its own entry in the database. Cases are combined into projects and have flags that allow you to disable irrelevant options. In the settings, I added request and response modifiers, which are applied to each request and response at all levels. To keep things as simple as possible, I use SQLite — Django supports it by default. I transferred all configuration files to the database and saved all testing results in it.

Algorithm

As a result, I found a very simple and flexible solution. It already works as an external integration test for three microservices, but I am the only one who uses it. It certainly does not replace unit tests, but it complements them well. When I need to validate services, I use this Django tester to do that.

Configuration File Example

The settings have become simpler and are managed with Django Admin. I can easily turn them off, change them, and view the history. I could go further and make a full-fledged UI, but this is more than enough for me for now.
Request Body JSON

JSON
{
  "from_date": "dfsdsf",
  "some_type": "dfsdsf",
  "goods": [
    {
      "price": "112323",
      "name": "123123",
      "quantity": 1
    }
  ],
  "total_amount": "2113213"
}

Response Body JSON

JSON
{
  "uniq_id": "sdfsdfsdf",
  "data": [
    {
      "number": "12223",
      "order_id": "12223",
      "status": "active",
      "code": "12223",
      "url": "12223",
      "op_id": "12223"
    }
  ]
}

Backend Response Body JSON

JSON
{
  "status": 1,
  "data": {
    "uniq_id": "sdfsdfsdf",
    "data": [
      {
        "number": "12223",
        "order_id": "12223",
        "status": "active",
        "code": "12223",
        "url": "12223",
        "op_id": "12223"
      }
    ]
  }
}

What It Gives You

In what way can this service be useful? Sometimes, even with tests, you need to pull services from the outside, or several services in a chain. Those services can also be black boxes. A database can be run in Docker. As for an API... an API can be run in Docker as well. You need to set a host, port, and configuration files and run it.

Why the Unusual Solution?

Some may say that you can use third-party tools, integration tests, or some other tests. Of course, you can! But with limited resources, there is often no time to apply all of this, and quick and effective solutions are needed. And here comes the simplest Django service that meets all requirements.
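To give a feel for the approach, here is a heavily simplified sketch of what the catch-all mock view could look like. The model names and fields are hypothetical — they stand in for the case storage described above, not the actual project code:

Python
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

from .models import Case, CurrentCase  # hypothetical models: case configs and the selected case ID


@csrf_exempt
def mock_backend(request, path=""):
    """One view for every URL: answers as the backend described by the current case."""
    # The test script writes the chosen case ID to storage before making the request.
    case = Case.objects.get(pk=CurrentCase.objects.latest("id").case_id)
    rules = json.loads(case.backend_response)  # the "backend.response" block of the config

    # Optionally verify that the service under test sent the expected request.
    expected = json.loads(case.expected_request) if case.expected_request else None
    if expected is not None and json.loads(request.body or "{}") != expected:
        return JsonResponse({"error": "unexpected request"}, status=400)

    return JsonResponse(rules["data"]["body"], status=rules["code"])

Hooked up with a catch-all URL pattern (e.g., re_path(r"^.*$", mock_backend)), this one function serves every mocked method.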
Just as you can plug in a toaster and add bread... you can plug this API appliance into your database, and add rules and Python.

Automation can provide:

Remarkable agility and simplicity
With all the flexibility of a framework

Using conventional frameworks, creating a modern, API-based web app is a formidable undertaking. It might require several weeks and extensive knowledge of a framework. In this article, we'll use API Logic Server (open source, available here) to create it in minutes instead of weeks or months. And we'll show how it can be done with virtually zero knowledge of frameworks, or even Python. We'll even show how to add message-based integration.

1. Plug It Into Your Database

Here's how you plug the ApiLogicServer appliance into your database:

$ ApiLogicServer create-and-run --project-name=sample_ai --db-url=sqlite:///sample_ai.sqlite

No database? Create one with AI, as described in the article "AI and Rules for Agile Microservices in Minutes."

It Runs: Admin App and API

Instantly, you have a running system:

A multi-page Admin App, supported by...
A multi-table JSON:API with Swagger

So right out of the box, you can support:

Custom client app dev
Ad hoc application integration
Agile collaboration, based on working software

Instead of weeks of complex and time-consuming framework coding, you have working software, now.

Containerize

API Logic Server can run as a container or a standard pip install. In either case, scripts are provided to containerize your project for deployment, e.g., to the cloud.

2. Add Rules for Logic

Instant working software is great: one command instead of weeks of work, and virtually zero knowledge required. But without logic enforcement, it's little more than a cool demo.

Behind the running application is a standard project. Open it with your IDE, and:

Declare logic with code completion.
Debug it with your debugger.

Instead of conventional procedural logic, the rules are declarative. Like a spreadsheet, you declare rules for multi-table derivations and constraints (a sketch of what such rules look like appears at the end of this article). The rules handle all the database access, dependencies, and ordering. The results are quite remarkable:

Five spreadsheet-like rules perform the same logic as 200 lines of Python.
The backend half of your system is 40X more concise.

Similar rules are provided for granting row-level access based on user roles.

3. Add Python for Flexibility

Automation and rules provide remarkable agility with very little in-depth knowledge required. However, automation always has its limits: you need flexibility to deliver a complete result. For flexibility, the appliance enables you to use Python and popular packages to complete the job — for example, customizing for pricing discounts and sending Kafka messages.

Extensible Declarative Automation

The result is remarkable agility: this system might have taken weeks or months using conventional frameworks. But it's more than agility. The level of abstraction here is very high, bringing a level of simplicity that empowers you to create microservices — even if you are new to Python or frameworks such as Flask and SQLAlchemy.

There are 3 key elements that deliver this speed and simplicity:

Microservice automation: Instead of slow and complex framework coding, just plug into your database for an instant API and Admin App.
Logic automation with declarative rules: Instead of tedious code that describes how logic operates, rules express what you want to accomplish.
Extensibility: Finish the remaining elements with your IDE, Python, and standard packages such as Flask and SQLAlchemy.

This automation appliance can provide remarkable benefits, empowering more people to do more.
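To make the "spreadsheet-like rules" idea concrete, here is a sketch in the style of the check-credit example from the API Logic Server / LogicBank documentation. Treat the model names and rule arguments as illustrative — your schema will differ:

Python
from logic_bank.logic_bank import Rule
from database import models  # illustrative: your SQLAlchemy models


def declare_logic():
    # Constraint: reject any transaction that pushes a customer over their limit.
    Rule.constraint(validate=models.Customer,
                    as_condition=lambda row: row.Balance <= row.CreditLimit,
                    error_msg="balance exceeds credit limit")

    # Derivation: a customer's balance is the sum of its unshipped order totals.
    Rule.sum(derive=models.Customer.Balance,
             as_sum_of=models.Order.AmountTotal,
             where=lambda row: row.ShippedDate is None)

    # Derivation: an order's total is the sum of its line-item amounts.
    Rule.sum(derive=models.Order.AmountTotal,
             as_sum_of=models.OrderDetail.Amount)

    # Formula: each line-item amount is price times quantity.
    Rule.formula(derive=models.OrderDetail.Amount,
                 as_expression=lambda row: row.UnitPrice * row.Quantity)

    # Copy: snapshot the product's price onto the line item.
    Rule.copy(derive=models.OrderDetail.UnitPrice,
              from_parent=models.Product.UnitPrice)

These five rules correspond to the multi-table derivations and constraints described above; the engine works out the ordering and the database access.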
Go uses escape analysis to determine the dynamic scope of Go values. Typically, Go tries to store Go values in the function stack frame. The Go compiler can predetermine which memory needs to be freed and emits machine instructions to clean it up. This way, it becomes easy to clean up memory without the intervention of the Go garbage collector. This way of allocating memory is typically called stack allocation.

But when the compiler cannot determine the lifetime of a Go value, it escapes to the heap. A value may also escape to the heap when the compiler does not know the size of the variable, when it's too large to fit into the stack, or when the compiler cannot determine whether the variable is used after the function ends, i.e., after the function stack frame is no longer in use.

Can we truly and completely know whether a value is stored in the heap or the stack? The answer is NO. Only the compiler knows where exactly a value is stored at all times. As mentioned in this doc, "The Go language takes responsibility for arranging the storage of Go values; in most cases, a Go developer need not care about where these values are stored, or why, if at all." There might still be scenarios where we want to know the allocation to improve performance. As we know, physical memory is finite, and overuse might result in unnecessary performance issues.

Let's now see how we can determine when and why a variable escapes to the heap. We will use the go build command for this. Run go help build to see the various options for go build. We will use the go build -gcflags="-m" command to ask the compiler where the variables are being put. Let's now go through some examples:

1. In this example, we call a square function from our main function:

Go
package main

func main() {
    x := 2
    square(x)
}

func square(x int) int {
    return x * x
}

When we run the above code with go build -gcflags="-m", we get the following result:

Go
# github.com/pranoyk/escape-analysis
./main.go:8:6: can inline square
./main.go:3:6: can inline main
./main.go:5:8: inlining call to square

Right now, everything is in the stack frame.

2. Let's now modify our code to return a pointer from the square function:

Go
package main

func main() {
    x := 2
    square(x)
}

func square(x int) *int {
    y := x * x
    return &y
}

When we build this code, we get the following:

# github.com/pranoyk/escape-analysis
./main.go:8:6: can inline square
./main.go:3:6: can inline main
./main.go:5:8: inlining call to square
./main.go:9:2: moved to heap: y

Here the value `y` escaped to the heap. Notice why this happened: the value of `y` has to outlive the square function's life cycle, and hence it escapes to the heap.

3. Let's modify the above function. Let's make our square function accept a pointer and not return a value:

Go
package main

func main() {
    x := 4
    square(&x)
}

func square(x *int) {
    *x = *x * *x
}

When we build the above code, we get the following:

Go
# github.com/pranoyk/escape-analysis
./main.go:8:6: can inline square
./main.go:3:6: can inline main
./main.go:5:8: inlining call to square
./main.go:8:13: x does not escape

Notice that even though we are passing a pointer to square, the compiler reports that the variable `x` does not escape. This is because the variable `x` is created in the main function's stack frame, which lives longer than the square function's stack frame.

4. Let's make one more modification to the above code. Let's make our square function both accept and return a pointer:
Go
package main

func main() {
    x := 4
    square(&x)
}

func square(x *int) *int {
    y := *x * *x
    return &y
}

The allocation report for the above code is:

Go
# github.com/pranoyk/escape-analysis
./main.go:8:6: can inline square
./main.go:3:6: can inline main
./main.go:5:8: inlining call to square
./main.go:8:13: x does not escape
./main.go:9:2: moved to heap: y

Notice that the result of this code is a combination of examples 2 and 3. Looking closer, we can say that sharing memory down from main to another function typically stays on the stack, while sharing memory up from a function to main typically escapes to the heap. We can never be completely sure, because only the compiler truly knows where a value is stored, but this still gives some hint as to when an escape to the heap may occur (one more sketch covering the unknown-size case follows the references below).

Conclusion

Escape analysis in Go is the way the compiler determines whether a value is to be stored in the stack frame or on the heap. Anything that cannot be stored in the function stack frame escapes to the heap. We can check the memory allocation of our code using `go build -gcflags="-m"`. Go manages memory allocation quite efficiently, and a developer will almost never need to be concerned with it, but it's still good to know in case you want to improve performance.

References

Understanding Allocations: the Stack and the Heap
A Guide to the Go Garbage Collector
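As promised, one more case — a sketch of my own rather than from the examples above: when the compiler cannot determine a variable's size at compile time, the value escapes even though it never outlives its function:

Go
package main

import "os"

func main() {
	fill(len(os.Args) + 64)
}

func fill(n int) {
	// n is not known at compile time, so the backing array cannot be sized
	// into the stack frame; the slice data is heap-allocated.
	s := make([]int, n)
	s[0] = 1
}

The exact diagnostics vary by compiler version, but building with -gcflags="-m" typically reports the make([]int, n) allocation as escaping to the heap.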
Python has emerged as one of the most popular and versatile programming languages in the ever-evolving software development landscape. Its simplicity, readability, and rich ecosystem of libraries make it an ideal choice for a wide range of applications, from web development and data science to automation and machine learning. However, writing efficient and maintainable Python code requires more than just a basic understanding of the language syntax. It involves leveraging modern patterns, features, and strategies that facilitate readability, maintainability, robustness, and performance.

This article explores some impactful patterns, features, and strategies in modern Python programming that can help you write efficient and effective code. We'll delve into various aspects of Python development, including best practices for writing clean and readable code, techniques for enhancing code maintainability and robustness, and strategies for optimizing code performance. Whether you're a seasoned Python developer looking to sharpen your skills or a newcomer eager to learn modern Python programming, this article will provide valuable insights and practical tips to help you become a more proficient Python programmer.

Throughout this exploration, we'll cover topics such as:

Readability: The importance of writing code that is easy to understand and maintain, including adhering to coding conventions, using descriptive names, and writing clear documentation.
Maintainability: Strategies for organizing code into modular components, writing meaningful comments, and practicing version control to facilitate long-term maintainability and collaboration.
Robustness: Techniques for gracefully handling errors, validating inputs, and implementing defensive programming practices to ensure your code behaves predictably under various conditions.
Performance: Best practices for optimizing code performance, including choosing efficient algorithms and data structures, profiling code to identify bottlenecks, and leveraging concurrency and parallelism where applicable.

By mastering these patterns, features, and strategies, you'll be well-equipped to write Python code that is efficient and performant, and easy to maintain and scale as your projects grow in complexity. Let's embark on this journey to explore the world of modern Python programming and unlock its full potential. Here's a breakdown of impactful patterns, features, and strategies for writing efficient code in modern Python, covering the topics above:

1. Generators

Generators are a powerful feature in Python for creating iterators. They allow you to generate values on the fly, which can be memory-efficient, especially for large datasets.

Patterns

Generator Expressions

Generator expressions are a concise and memory-efficient way to compute large sequences of values lazily in Python. They are similar to list comprehensions but use parentheses () instead of square brackets []. Generator expressions produce values on the fly, one at a time, as requested, rather than generating the entire sequence upfront and storing it in memory. This makes them ideal for dealing with large datasets or infinite sequences.

Generator Functions

Generator functions are special functions in Python that use the yield keyword to produce a series of values over time rather than returning them all at once. When a generator function is called, it returns a generator iterator, which can be iterated over to generate values one by one.
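For instance, here is a small self-contained sketch contrasting the two patterns (a generic illustration, not drawn from any particular library):

Python
# Generator expression: values are squared lazily; nothing runs until iterated.
squares = (n * n for n in range(1_000_000))
print(next(squares))  # 0 — only the first value has been produced so far


# Generator function: yields values one at a time via `yield`.
def countdown(start):
    while start > 0:
        yield start
        start -= 1


for value in countdown(3):
    print(value)  # 3, 2, 1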
Generator functions allow for efficient memory usage and lazy evaluation, making them useful for processing large datasets or infinite sequences.

2. Collections With Comprehensions

List, dictionary, and set comprehensions are concise and efficient ways to create Python collections.

Patterns

List comprehensions: Use when creating a new list by applying an operation to each item in an existing iterable.
Dictionary comprehensions: Use when creating a new dictionary by transforming items from an existing iterable.
Set comprehensions: Use when you need to create a new set by applying an operation to each item in an existing iterable.

3. Functions

Python functions are first-class citizens: they can be passed around as arguments and returned from other functions. Understanding advanced function features can lead to more elegant and efficient code.

Features

Higher-Order Functions

Higher-order functions are functions that can accept other functions as arguments or return them as results. In Python, functions are first-class citizens, meaning they can be treated like any other data type. This allows you to pass functions as arguments to other functions or return them from other functions, enabling powerful and flexible programming paradigms such as functional programming.

Anonymous Functions (Lambda Functions)

Lambda functions, also known as anonymous functions, are concise functions defined using the lambda keyword. They are typically used for short, simple operations where defining a named function would be overkill. Lambda functions can take any number of arguments but contain only one expression. They are often used with higher-order functions or wherever a short-lived function object is needed.

Closures

Closures are functions that capture variables from their enclosing scope, even when called outside that scope. In Python, functions can access variables defined in their enclosing scope (the outer function) even after the outer function has finished executing, as long as the inner function (the closure) is still in scope. This allows closures to "remember" and access the values of variables from the outer function, providing a way to maintain state across multiple function calls.

4. Decorators

Decorators are a powerful and flexible tool in Python for modifying the behavior of functions or methods.

Patterns

The Basic Decorator

The fundamental concept of decorators in Python revolves around enhancing or modifying functions by wrapping them with another function. Decorators are typically denoted with the @ symbol followed by the name of the decorator function. This allows for adding functionality or behavior to functions without directly modifying their original code.

Decorators That Take Arguments

Building on the basic decorator pattern, decorators can be designed to accept arguments. This enables more customizable behavior, as decorators can be parameterized to adapt to different scenarios or use cases. Decorators that take arguments can modify their behavior based on the specific needs of the decorated function or the context in which it is used.

Class-Based Decorators

Decorators are not limited to functions; they can also be applied to classes. Class-based decorators extend decorators to work with classes, allowing for the enhancement of entire classes or their methods. This provides a powerful mechanism for adding functionality, behavior, or attributes to classes in a modular and reusable manner.
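The sketch below illustrates the first two patterns — a plain decorator and one that takes arguments — as a generic example rather than code from a particular project:

Python
import functools


def log_calls(func):
    """Basic decorator: adds behavior without modifying func's code."""
    @functools.wraps(func)  # keep func's name and docstring intact
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def repeat(times):
    """Decorator that takes an argument: returns the actual decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator


@log_calls
@repeat(times=2)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}!"


print(greet("world"))  # logs the call, runs the function twice

The functools.wraps calls above are exactly what the next pattern is about.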
Preserving the Wrapped Function

When applying decorators, it's essential to retain the original function's metadata and signature. This ensures compatibility with tools like introspection and documentation generation, which rely on inspecting functions' properties and signatures. By preserving the wrapped function, typically with functools.wraps as shown in the sketch above, decorators maintain transparency and compatibility with existing code and development workflows.

5. Exception Handling

Python's exception-handling mechanism allows you to handle errors and unexpected conditions in your code gracefully.

Strategies

Try-Except Blocks

Try-except blocks are used in Python to handle exceptions gracefully. The try block encloses code that might raise an exception, while the except block catches and handles specific exceptions raised within the try block. By using try-except blocks, you can prevent your program from crashing on unexpected errors and handle them in a controlled manner.

Custom Exceptions

Custom exception classes are user-defined exception types that allow you to represent specific error conditions in your code. By creating custom exceptions, you can provide more descriptive error messages and make your code more readable and maintainable. Custom exceptions are typically subclasses of Python's built-in Exception class.

Exception Chaining

Exception chaining allows you to associate one exception with another, preserving the original traceback while raising a new exception. This is useful when catching an exception in one part of the code and re-raising it with additional context in another part. The from keyword chains the exceptions together, as in raise ConfigError("invalid settings") from err (where ConfigError is a hypothetical custom exception).

6. Classes and Objects

Object-oriented programming is a fundamental paradigm in Python, and understanding various design patterns can help you write more maintainable and scalable code.

Key Design Patterns

Factory Pattern

The Factory Pattern is a creational design pattern that provides an interface for creating objects in a superclass but allows subclasses to alter the type of objects that will be created. This pattern is useful when the exact class of objects to be created is unknown beforehand or when the creation process involves complex logic. By using a factory function or method, clients can create objects without needing to specify their exact class, promoting flexibility and decoupling in the codebase.

Singleton Pattern

The Singleton Pattern is a creational design pattern that ensures a class has only one instance and provides a global point of access to that instance. This pattern is useful when exactly one object is needed to coordinate actions across the system, such as a logger, database connection, or configuration manager. By restricting a class to a single instance, the Singleton Pattern centralizes access to shared resources and prevents unnecessary duplication of objects.

Observer Pattern (Publisher-Subscriber)

The Observer Pattern is a behavioral design pattern that establishes a one-to-many dependency between objects: one object (the subject or publisher) maintains a list of its dependents (observers or subscribers) and notifies them of any state changes. This pattern is commonly used in event-driven architectures, user interface frameworks, and distributed systems to achieve loose coupling between components. Observers register interest in specific events or notifications and receive updates automatically when changes occur, enabling efficient communication and event handling.
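Here's a minimal publisher-subscriber sketch; the Publisher class and its method names are illustrative, not a standard library API:

Python
class Publisher:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Notify every registered observer of the state change
        for callback in self._subscribers:
            callback(event)

publisher = Publisher()
publisher.subscribe(lambda event: print(f"audit log: {event}"))
publisher.subscribe(lambda event: print(f"email alert: {event}"))
publisher.publish("profile_updated")  # both subscribers are notified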
Builder Pattern

The Builder Pattern is a creational design pattern that separates the construction of a complex object from its representation, allowing the same construction process to create different representations. This pattern is useful when the construction of an object involves multiple steps or configurations and the client code needs to be shielded from the details of the construction process. By using a builder class to encapsulate the construction logic, clients can create objects using a fluent interface or a step-by-step approach, providing flexibility and maintainability in object creation.

Strategy Pattern

The Strategy Pattern is a behavioral design pattern that defines a family of algorithms, encapsulates each one, and makes them interchangeable. This pattern allows clients to vary the behavior of a class or method dynamically at runtime without altering its structure. By encapsulating algorithms in separate classes behind a common interface, the Strategy Pattern promotes code reuse, flexibility, and extensibility, enabling clients to select the most appropriate algorithm for a given context.

7. Test-Driven Development (TDD)

Writing tests for your code ensures correctness and facilitates future changes and refactoring. (A minimal unittest sketch follows the logging section below.)

Strategies

Unit Tests and Simple Assertions

Use unit tests to verify the functionality of individual units or components within your codebase. Employ simple assertions, such as assertEqual, assertTrue, or assertRaises, to validate expected behavior and outcomes.

Fixtures and Common Test Setup

Implement fixtures to establish common test environments and configurations, ensuring consistency across test cases. Use setUp and tearDown methods to initialize and clean up resources before and after each test, streamlining test setup and teardown.

Asserting Exceptions

Validate error handling by asserting the occurrence of expected exceptions using methods like assertRaises or context managers like with self.assertRaises. This ensures the code properly handles exceptional scenarios and behaves as intended in error conditions.

Using Subtests

Employ subtests to break down complex test cases into smaller, more manageable units, allowing for finer-grained testing and isolation of specific functionality. This approach enhances test organization, readability, and maintainability, particularly in scenarios with multiple test conditions or inputs.

Test-Driven Development (TDD) Best Practices

Ensure each software feature or behavior has a corresponding test.
Maintain small and targeted tests.
Begin by writing tests before implementing the code.
Verify test success by running tests frequently.
Refactor code as needed for improved efficiency.
Employ a dedicated test runner, such as pytest, for comprehensive test execution and detailed feedback.

8. Logging

Logging is essential for monitoring and debugging your Python applications.

Strategies

Logging levels: Use different logging levels (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL) to categorize log messages by severity.
Log formatting: Customize the format of log messages to include relevant information like timestamps, log levels, and source file names.
Logging handlers: Use handlers to control where log messages are output (e.g., console, file, network).
Logging configuration: Configure logging settings programmatically or through a configuration file to control log levels, formats, and destinations.
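These strategies map directly onto the standard library's logging module. Here's a minimal configuration sketch; the format string and the "payments" logger name are arbitrary choices:

Python
import logging

# Configure level, message format, and destination in one call
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    handlers=[logging.StreamHandler()],  # console; add a FileHandler for files
)

logger = logging.getLogger("payments")
logger.debug("suppressed: below the configured INFO level")
logger.info("payment received")
logger.warning("retrying connection")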
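And, as promised in section 7, a minimal unittest sketch exercising a hypothetical is_palindrome function; it demonstrates a fixture, simple assertions, subtests, and asserting exceptions:

Python
import unittest

def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards."""
    cleaned = text.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

class IsPalindromeTests(unittest.TestCase):
    def setUp(self):
        # Common fixture shared by every test in this class
        self.samples = {"racecar": True, "taco cat": True, "python": False}

    def test_known_inputs(self):
        for text, expected in self.samples.items():
            with self.subTest(text=text):  # isolate each input
                self.assertEqual(is_palindrome(text), expected)

    def test_rejects_non_strings(self):
        with self.assertRaises(AttributeError):
            is_palindrome(42)  # ints have no .lower() method

if __name__ == "__main__":
    unittest.main()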
Incorporating these patterns, features, and strategies into your Python code will help you write more efficient, maintainable, and robust software. Python's popularity and versatility stem from its simplicity, readability, and extensive library ecosystem, making it suitable for diverse applications ranging from web development and data science to automation and machine learning. Becoming a proficient Python programmer is a matter of continuous learning and practice: embrace these modern techniques, keep exploring the language, and your code will keep pace with the dynamic software development landscape.
Go (aka Golang) came to life at Google in 2009. It was designed by a few big names:

Robert Griesemer, who had a large hand in the development of the Java Virtual Machine.
Rob Pike, who holds a U.S. patent for a windowing UI system and helped build the Plan 9 operating system at Bell Labs. (In fact, the mascots for Plan 9 and for Golang are remarkably similar because Pike’s wife, Renée French, is a renowned illustrator.)
Ken Thompson, who designed and implemented a little thing called Unix.

In this article, we’ll demonstrate how simple it is to build a RESTful web service in Go. Then, we’ll demonstrate how to deploy this application with Heroku. But before we embark on this journey, let’s talk briefly about why you might want to use Go.

Why Go?

To build a web service in 2024, why would you choose Go over another language, like Python or TypeScript? Go’s biggest advantage over those two languages is speed. Go is compiled to machine code. It’s not interpreted like Python, Ruby, or JavaScript, and it’s not compiled to bytecode and run in a virtual machine like Java. Many benchmarks show Go applications running 40x to 50x faster than equivalents written in interpreted languages. When it comes to speed, Go applications perform similarly to those written in Rust or C++.

Go has a simple syntax, often with only one way of doing a given thing. This is appealing to many developers, especially those who have ever been in a development team setting, where squabbles over various ways of doing things eat up precious time. This simplicity drives conformity in a codebase and means less perplexity when reading the code. (Believe it or not, most developers spend more of their time reading code than writing it.)

Go is a young language, so it comes packed with modern features out of the nursery. You get automatic garbage collection like in Java or Python. You get built-in linters, formatters, and unit testing. You get a rich network stack in the standard library. And perhaps most beneficial to network programmers: you get an easy-to-use multi-threading toolkit called goroutines.

Yes, there are some reasons why not everyone is hot on Go. One common complaint revolves around error handling in Go. Functions in Go can return multiple values, one of which is conventionally an error value. This hearkens back to the days of C, before exceptions, and feels admittedly archaic. It is easy to forget to check the returned error from every function call. It’s also tedious to percolate errors from down in the depths, when you awaken a balrog deep inside your application, up to somewhere manageable. You know you’ve done that.

Alright, the Go cheerleading is done. Let’s get to building.

Building a Simple RESTful Web Service in Go

We’ll build a small API service that provides some text operations that applications commonly need, such as:

Encode a given string using a basic Caesar cipher
Determine if a string is a palindrome
(Perhaps most importantly) SpongeBob-encode a zinging retort

If you’d rather skip ahead to the finished code for this application, you can find it in this GitHub repo. We’re not going to go through main.go line by line, but we’ll talk about the important bits.
Let’s start with the main function, the bootstrapping code of the service:

Go
func main() {
	http.HandleFunc("/is-palindrome", palindromeHandler)
	http.HandleFunc("/rot13", rot13Handler)
	http.HandleFunc("/spongebob", spongebobHandler)
	http.HandleFunc("/health", healthHandler)

	appPort := ":" + os.Getenv("PORT")
	if appPort == ":" {
		appPort = ":8080"
	}

	// Exit with a logged error if the server fails to start
	if err := http.ListenAndServe(appPort, nil); err != nil {
		log.Fatal(err)
	}
}

As we mentioned, one of Go’s powerful features is the expressive net/http standard library. You don’t need any third-party dependencies to quickly get a basic RESTful service up and running. With http.HandleFunc, you can easily define your routes and assign handlers to the requests that are routed to those URIs. The http.ListenAndServe method kicks off the server, binding it to the port you specify. When we deploy to Heroku, Heroku will set the PORT variable in the environment. For our local deployment, we default to 8080.

Let’s look at a handler:

Go
func spongebobHandler(w http.ResponseWriter, r *http.Request) {
	var t requestPayload
	if err := json.NewDecoder(r.Body).Decode(&t); err != nil {
		// Reject malformed JSON with a 400 rather than panicking
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	result := map[string]string{
		"original":  *t.Input,
		"spongebob": spongebob(*t.Input),
	}

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(result); err != nil {
		return
	}
}

Our handler does the work of parsing the JSON body of the request into a Go struct (requestPayload, defined outside of this snippet). Then, it builds the result from other functions that transform the input string into a SpongeBob utterance. Again, none of the libraries here are third-party dependencies; they all come standard with Go. You can see the prevalence of error handling via returned error values, as working with err takes up a large part of the code real estate.

To run this service locally, we simply do this:

Shell
$ go run main.go

Then, we send a GET request to /health:

Shell
$ curl -s http://localhost:8080/health

We receive a JSON response indicating the service is up and healthy. That’s it: one file, with roughly 100 lines of actual code, and you have a working Go RESTful microservice!

Deploying Your Go Service to Heroku

Running the service on your laptop is OK, I guess. But you know what would be really cool? Running it on the web, that’s what.

These days, we have lots of options for hosting a service like this. You could build out your own infrastructure using AWS or Azure, but that gets complicated and expensive quickly. Lately, I’ve been turning more and more to Heroku. As a platform-as-a-service (PaaS), it’s a low-hassle, low-cost option that lets me deploy applications to the cloud quickly. When I’m doing testing and development, I use their Eco Dyno plan to get 1,000 dyno hours per month for $5. To deploy basic apps to production, I use their Basic Dyno plan, which costs a max of $7 per month.

For frameworks that Heroku supports, the process of deploying right from your local machine to the web is quick and painless. After setting up a Heroku account, I install the Heroku CLI and log in from the command line. You can create a new app directly through the CLI, or you can use the web UI. I named my application the same as my GitHub repo: golang-text-demo. We’ll think of something snazzier before our IPO; but for now, this will do.

To deploy our GitHub repo to Heroku, we first need to add a remote repository.
Shell
$ heroku git:remote -a golang-text-demo

This adds a new remote to our local Git repo, pointing to the Heroku application we just created. Now, whenever we push our branch to that remote (git push heroku main), it kicks off a flurry of activity as Heroku gets to work.

Lastly, we add one file called go.mod, which specifies our app’s dependencies (we don’t have any) and the Go version we want Heroku to use. Our file is short and sweet:

Go
module golang-text-demo

go 1.22

When we push to our Heroku remote, Heroku initializes all the required resources in the cloud. This may take a minute or two the first time you deploy your app, but the results appear to be cached, reducing the time of subsequent deploys. When your app has successfully deployed, the build output ends with the URL for your deployed Heroku app. Sweet!

With a single git push command, we’ve deployed a Go microservice to the cloud, and it is now accessible anywhere in the world. To interact with it, we simply issue the same curl command we did before, but with the Heroku app URL instead of localhost.

The Heroku CLI also gives us access to our application’s logs. It’s almost exactly like working with the tools directly on your local machine. We just run heroku logs --tail, and we see the latest log lines from Heroku and our application right there in our terminal window.

Before we finish, let’s briefly highlight the insights that can be gained about your application from the Heroku app dashboard. Sure, there’s the obvious stuff you care about, like how much your resources are costing or whether they are functioning. But the metrics section also gives you impressive detail about the performance of your application in near real-time. Somebody better do something about those critical errors…

Conclusion

In this walkthrough, we explored why Go is a great choice for building a modern, low-dependency, and efficient web service. We built a simple API service in Go and demonstrated how to deploy it using Heroku. As a PaaS, Heroku supports running a wide variety of services, not just Go.

With that, you now have the tools needed to get started on your own Go services journey. Don’t wait, get Go-ing!
Python is one of the most popular and versatile programming languages in the world. Known for its simplicity and readability, Python is an excellent choice for beginners and experienced developers alike. In this tutorial, we'll cover the fundamentals of Python programming, from basic syntax to more advanced concepts, to help you kickstart your journey into the world of programming.

Introduction to Python

Python is a high-level, interpreted programming language created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability and simplicity, with a clear and expressive syntax that makes it easy to learn and use.

Setting Up Your Environment

Before diving into Python programming, you'll need to set up your development environment. Python is compatible with various operating systems, including Windows, macOS, and Linux. You can download and install Python from the official Python website, which provides installers for different platforms. Once Python is installed, you can use the built-in IDLE (Integrated Development and Learning Environment) or choose from a variety of third-party code editors and IDEs (Integrated Development Environments) such as Visual Studio Code, PyCharm, or Sublime Text.

Basic Syntax and Data Types

Python uses a simple and intuitive syntax, making it easy to write and understand code. Let's start with some basic concepts.

Variables and Data Types

In Python, variables are used to store data values. You assign a value to a variable using the equals sign (=). Python supports various data types, including:

Integer: Whole numbers without a decimal point (e.g., 10, -5)
Float: Numbers with a decimal point (e.g., 3.14, -0.5)
String: A sequence of characters enclosed in single ('') or double ("") quotes (e.g., 'hello', "world")

Python
# Variable assignment
x = 10
y = 3.14
name = 'Python'

# Print variable values
print(x)     # Output: 10
print(y)     # Output: 3.14
print(name)  # Output: Python

Basic Arithmetic Operations

Python supports various arithmetic operations, including addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (**).

Python
# Arithmetic operations
a = 10
b = 5

print(a + b)   # Output: 15
print(a - b)   # Output: 5
print(a * b)   # Output: 50
print(a / b)   # Output: 2.0
print(a ** b)  # Output: 100000

Control Flow Statements

Control flow statements allow you to control the execution of your code based on certain conditions. Python supports several control flow statements, including:

If...Else Statements

The if...else statement allows you to execute a block of code conditionally based on a specified condition.

Python
# If...else statement
age = 18
if age >= 18:
    print("You are eligible to vote.")
else:
    print("You are not eligible to vote.")

Loops

Loops are used to iterate over a sequence of elements or execute a block of code repeatedly.

Python
# For loop
for i in range(5):
    print(i)  # Output: 0, 1, 2, 3, 4

# While loop
num = 0
while num < 5:
    print(num)  # Output: 0, 1, 2, 3, 4
    num += 1

Functions and Modules

Functions allow you to organize your code into reusable blocks, while modules are Python files containing functions, classes, and variables that you can import to reuse their functionality. First, here's a simple function:

Python
# Function definition
def greet(name):
    print("Hello, " + name + "!")

# Function call
greet("Python")  # Output: Hello, Python!
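To reuse that code as a module, save it in its own file and import it elsewhere. A minimal sketch, assuming the function above is saved in a hypothetical file named greetings.py in the same directory:

Python
# main.py
import greetings  # loads greetings.py as a module

greetings.greet("Python")  # Output: Hello, Python!

# You can also import specific names directly
from greetings import greet
greet("world")  # Output: Hello, world!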
Conclusion In this Python tutorial for beginners, we've covered the basics of Python programming, including setting up your environment, basic syntax and data types, control flow statements, functions, and modules. Python's simplicity and versatility make it an excellent choice for both beginners and experienced developers. Now that you've learned the fundamentals of Python, you're ready to explore more advanced topics and start building your own Python projects. Happy coding!
Sameer Shukla, Sr. Software Engineer, Leading Financial Institution
Kai Wähner, Technology Evangelist, Confluent
Alvin Lee, Founder, Out of the Box Development, LLC