Author: drweb

Optical Character Recognition lets you extract printed, handwritten, or scanned text from images and convert it into machine-readable data. If you have ever stared at a photograph of a table and dreaded typing all that data into a spreadsheet by hand, OCR is the solution. You write the code once, feed it any image, and get clean spreadsheet rows out the other end without touching a keyboard. In this article you will build two complete Python pipelines that handle end-to-end OCR on table images and write the results directly to Excel files. The first pipeline uses Pytesseract, the wrapper around…

In most applications, normalize case and filter stopwords before analyzing word frequencies.
The complete pipeline, from raw text to frequency comparison and visualization, fits in under 100 lines of Python.
Save results to CSV for downstream use and generate Matplotlib charts for visual reporting.

Frequently Asked Questions

What is the difference between tokenization and stemming?

Tokenization splits text into individual units called tokens, typically words or subword sequences. Stemming reduces those tokens to their root form by removing affixes. Tokenization is a prerequisite for stemming, and both are standard steps in any text mining pipeline.

Can text mining work…

Every tech conference likes to say it is about the future. Most of them are really about product launches, roadmaps and a little carefully managed optimism. SUSECON feels different this year. Part of that is timing. Part of it is geography. And part of it is that SUSE happens to sit right in the middle of several conversations that are becoming more urgent by the day. The event runs April 20 through 23 in Prague, with more than 100 breakout sessions covering Linux, cloud native infrastructure, edge computing, AI, observability and digital sovereignty, along with keynotes, hands-on labs, certification exams…

Operators are the building blocks of any Python expression. They perform operations on values and variables, and the values they act upon are called operands. Python gives you both keyword-based and symbol-based operators, and I use them every day in production code without thinking twice about the distinction. This article covers every operator type Python ships with, from basic arithmetic through bitwise manipulation, operator overloading, and the operator module. By the end, you will know exactly what each operator does, how precedence works, and how to overload them for your own classes.

Arithmetic Operators

Arithmetic operators work on numeric types,…
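As a rough illustration of the topics this excerpt lists (precedence, arithmetic, overloading, and the operator module), here is a minimal sketch. The `Vector` class and all variable names are hypothetical and exist only for this example; they are not taken from the article.

```python
import operator
from functools import reduce


class Vector:
    """Hypothetical 2D vector, used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Python calls this method for `self + other`.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Python calls this method for `self == other`.
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


# Precedence: multiplication binds tighter than addition, so this is 2 + (3 * 4).
precedence_result = 2 + 3 * 4

# Floor division and modulo on integers.
quotient, remainder = 17 // 5, 17 % 5

# The overloaded + on a custom class.
vector_sum = Vector(1, 2) + Vector(3, 4)

# operator.add is the function form of +, handy with reduce().
total = reduce(operator.add, [1, 2, 3, 4])
```

Returning `NotImplemented` for unsupported operand types lets Python try the other operand's reflected method before raising a `TypeError`.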

Software runs, but sometimes it doesn’t… and that’s often down to a lack of runtime visibility, which limits how far platform engineering teams can trust coding assistants and AI-powered site reliability engineering (SRE) services. It’s an assertion made by software reliability company Lightrun, in its State of AI-Powered Engineering Report 2026, based on an independent poll of 200 SREs and DevOps leaders at enterprises in the U.S., UK and EU. “To keep pace with AI-driven velocity, we can no longer rely on reactive observability. We must shift runtime visibility left, giving AI tools and agents the live execution data…

Accuracy is a trap for anyone working with imbalanced classification. In fraud detection, cancer screening, churn prediction, and spam filtering, the majority class dominates by orders of magnitude. A model trained on such data can achieve 95% accuracy by simply predicting the majority class every time. That is not a useful model. The accuracy metric does not tell you that you have built something worthless. Precision and recall solve this problem by measuring what the confusion matrix actually says. This article covers how both metrics work, when to prioritize each one, and how to compute them in Python with code…
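The accuracy trap described above can be reproduced in a few lines of plain Python. This is a sketch on made-up data (5 positives, 95 negatives), not the article's own code. The helper names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than a universal one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


# Made-up imbalanced data: 5 positive cases, 95 negative cases.
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = safe_ratio(tp, tp + fp)  # TP / (TP + FP)
recall = safe_ratio(tp, tp + fn)     # TP / (TP + FN)
```

On this data the always-negative predictor scores 0.95 accuracy while recall is 0.0, because it never finds a single positive case.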

Mean and standard deviation are the two statistics you reach for first when you need to understand a dataset. Mean tells you where the center of your data sits. Standard deviation tells you how tightly or loosely the values cluster around that center. Every engineer I know uses these two measures daily, whether they are profiling latency across API endpoints, checking quality control metrics on a production line, or deciding what inventory levels make sense for next quarter. This article covers the full picture: the math behind mean and standard deviation, the difference between population and sample statistics, and every…
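To make the population versus sample distinction concrete, here is a minimal sketch using Python's standard `statistics` module. The latency values and variable names are invented for illustration and do not come from the article.

```python
import math
import statistics

# Hypothetical API latencies in milliseconds (made-up values).
latencies_ms = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: the center of the data.
mean_latency = statistics.mean(latencies_ms)

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(latencies_ms)

# Sample standard deviation: divides by n - 1 (Bessel's correction).
sample_sd = statistics.stdev(latencies_ms)

# The same sample value computed by hand, to show the formula.
squared_deviations = sum((x - mean_latency) ** 2 for x in latencies_ms)
manual_sample_sd = math.sqrt(squared_deviations / (len(latencies_ms) - 1))
```

Use `pstdev` when the list is the entire population you care about, and `stdev` when it is a sample drawn from a larger one. The sample version is always slightly larger because it divides by n - 1.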

Building enterprise-grade AI agents just got a little less risky. OpenAI has released a significant update to its Agents SDK, adding two capabilities that development teams have been waiting for: Native sandboxing and an in-distribution model harness. Together, these additions push the SDK from a promising framework into something closer to a production-ready platform.

From Swarm to Something Serious

If you’ve been following OpenAI’s agent development story, the Agents SDK has come a long way in a short time. It launched in early 2025 as the production-ready successor to Swarm, an experimental, lightweight framework for exploring multi-agent patterns. The SDK…

Python is increasingly popular for data analysis, first because it is easy to learn and use. It also has many powerful libraries for data analysis tasks, such as NumPy for fast calculations, Pandas for handling and analyzing data, and Matplotlib for data visualization. Plus, it works well with large data and connects easily with AI tools, helping analysts work faster and smarter.

Why Practice Python for Data Analysis MCQs?

Practising Multiple Choice Questions (MCQs) is one of the best ways to test your foundational knowledge. While writing code is important, understanding the theoretical concepts ensures you can apply…
