Haojie Ni’s Blog

你好 / Welcome / Willkommen 👋

This is my little corner of the internet where I share learning notes and some nerdy thoughts. Hopefully you’ll find something useful (or at least fun) here.

Right now, I’m exploring opportunities as a software engineer in Germany, especially in Python backend development, DevOps, and data engineering. I spend most of my time in Python and Linux, and I enjoy tinkering with them a lot.

Feel free to check out my LinkedIn or GitHub if you’re curious about what I’ve been working on.
And if you’d like the full story, my CV is available as a PDF.

Understanding the NULL Character: Behavior Across Languages, Databases, and Editors

In this post, we explore the NULL character in different programming languages and databases, and how it’s presented in different contexts.

September 25, 2025
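The NULL-character post above compares how different environments treat NUL. As a small illustration (not taken from the post itself), the sketch below uses Python's standard `ctypes` module to contrast a Python string, where `"\x00"` is an ordinary character, with a C-style NUL-terminated string, where the same byte ends the string.

```python
import ctypes

# In Python, NUL ("\x00") is an ordinary code point: it counts toward len()
# and survives encoding to bytes.
text = "ab\x00cd"
data = text.encode("utf-8")

print(len(text))  # 5
print(data)       # b'ab\x00cd'

# A C char* is NUL-terminated, so reading the same bytes back as a C string
# through ctypes stops at the first NUL byte.
c_string = ctypes.c_char_p(data)
print(c_string.value)  # b'ab'
```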

Building a BPE Tokenizer with TDD - Part 3: Implementing Encode and Decode Methods

Final part of our BPE tokenizer series, where we implement encoding and decoding capabilities. We’ll write comprehensive tests for token conversion, handle special tokens, and ensure proper error handling for edge cases.

September 13, 2025
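For the encode/decode part of the series above, here is a minimal sketch of byte-level BPE encoding and decoding. It assumes merges are stored as a dict mapping a `(left_id, right_id)` pair to a new id, in the order they were learned. The `MERGES` table and all function names are hypothetical rather than the series' actual code, and special tokens and error handling are deliberately left out.

```python
# Hypothetical learned merges, in training order: 256 = b"ab", 257 = b"abc".
MERGES = {(97, 98): 256, (256, 99): 257}


def build_vocab(merges):
    """Map every token id to the bytes it stands for."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts keep insertion order
        vocab[new_id] = vocab[left] + vocab[right]
    return vocab


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def encode(text, merges):
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        pairs = set(zip(ids, ids[1:]))
        # Apply the merge that was learned earliest (lowest new id) first.
        pair = min(pairs, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no learned merge applies any more
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids, vocab):
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


vocab = build_vocab(MERGES)
ids = encode("abcab", MERGES)
print(ids)                 # [257, 256]
print(decode(ids, vocab))  # abcab
```

Applying merges in training order matters: a later merge such as `(256, 99)` can only fire once the earlier `(97, 98)` merge has produced id 256.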

Building a BPE Tokenizer with TDD - Part 2: Implementing the Train Method

Second part of our BPE tokenizer series, focusing on implementing the train method. We’ll cover the core BPE algorithm, write tests for training functionality, and implement vocabulary management and pair merging logic.

September 13, 2025
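The core BPE training loop described above can be sketched as: count adjacent pairs, merge the most frequent pair into a new id, and repeat until the vocabulary reaches the target size. The function names and the dict layout of `merges` are assumptions for illustration, not the series' actual code; ties are broken by whichever pair was counted first.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, vocab_size):
    """Learn up to (vocab_size - 256) merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


merges = train("aaabdaaabac", vocab_size=259)
print(merges)
```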

Building a BPE Tokenizer with TDD - Part 1: Project Setup and First Test

First part of a series on building a Byte Pair Encoding tokenizer using Test-Driven Development. We set up our project structure, create a virtual environment, and write our first test for the tokenizer’s initialization.

September 13, 2025
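In TDD, the first test usually pins down the object's initial state. A hypothetical sketch, assuming the tokenizer is a class named `BPETokenizer` that starts with one token per byte value (256 entries) and no merges; the class name and attributes are illustrative, not taken from the series. In a real project the class and the test would live in separate files, and pytest would discover the `test_` function automatically.

```python
class BPETokenizer:
    """Hypothetical minimal implementation written to make the first test pass."""

    def __init__(self):
        # One token per possible byte value (0-255), no merges learned yet.
        self.vocab = {i: bytes([i]) for i in range(256)}
        self.merges = {}


def test_init_creates_byte_level_vocab():
    tokenizer = BPETokenizer()
    assert len(tokenizer.vocab) == 256
    assert tokenizer.vocab[65] == b"A"
    assert tokenizer.merges == {}


if __name__ == "__main__":
    test_init_creates_byte_level_vocab()
    print("ok")
```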

A Deep Dive into UTF-8 for BPE Tokenization

A hands-on exploration of UTF-8 encoding, prompted by the need to prepare text for a Byte Pair Encoding (BPE) tokenizer. This post breaks down why Unicode characters produce mixed results of readable text and hex codes when encoded, clarifies that all bytes are fundamentally integers, and demystifies the non-continuous ranges in the UTF-8 specification with examples. Features AI-assisted explanations from Gemini and ChatGPT.

September 11, 2025
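Both points from the UTF-8 summary above are easy to see in Python. Printing a `bytes` object shows printable ASCII bytes as characters and the rest as `\x` escapes, yet indexing or listing it yields plain integers. The sample string is arbitrary.

```python
text = "héllo 你"
data = text.encode("utf-8")

# The bytes repr mixes readable ASCII with \x escapes for the other bytes...
print(data)        # b'h\xc3\xa9llo \xe4\xbd\xa0'
# ...but every element is just an integer from 0 to 255.
print(data[0])     # 104
print(list(data))  # [104, 195, 169, 108, 108, 111, 32, 228, 189, 160]
print(len(text), len(data))  # 7 10

# "你" (U+4F60) takes three bytes: a lead byte starting with 1110,
# followed by two continuation bytes starting with 10.
for byte in "你".encode("utf-8"):
    print(f"{byte:08b}")  # 11100100, 10111101, 10100000 (one per line)
```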

n8n - Telegram Bot for ZH-DE Translation With Anki

Background: I’m moving to Germany soon to look for a job and wanted a tool to help me improve my spoken German. After trying numerous dictionary apps without finding one that met my needs, I decided to build my own solution. My ideal application would accept Chinese voice or text input, translate it into German, and store the translations in Anki. This workflow would let me memorize vocabulary efficiently and practice speaking German. ...

August 24, 2025

Django Beginners - Difference Between login_required and LoginRequiredMixin

Chinese title: “The Difference Between login_required and LoginRequiredMixin”. Both login_required and LoginRequiredMixin restrict access to authenticated users, but they are used in different contexts:

@login_required (function-based views)
Type: decorator
Usage: used with function-based views
How it works: if the user is not logged in, it redirects to settings.LOGIN_URL and passes the current path in the query string as ?next=/path/
Example:

    from django.contrib.auth.decorators import login_required
    from django.http import HttpResponse

    @login_required
    def my_view(request):
        return HttpResponse('This view requires login')

LoginRequiredMixin (class-based views)
Type: mixin class
Usage: used with class-based views
How it works: it must be the leftmost class in the inheritance list; it implements dispatch() to check authentication and, like the decorator, redirects to settings.LOGIN_URL if the user is not authenticated
Example:

    from django.contrib.auth.mixins import LoginRequiredMixin
    from django.views.generic import TemplateView

    class MyView(LoginRequiredMixin, TemplateView):
        template_name = 'my_template.html'
        login_url = '/login/'  # Optional: override the default login URL
        redirect_field_name = 'next'  # Optional: change the "next" parameter name

Key Differences: ...

June 4, 2025
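Why must LoginRequiredMixin come first? Python looks up dispatch() along the method resolution order (MRO), so the mixin's check only runs if the mixin appears before the view class. The sketch below uses hypothetical stand-in classes (no Django required) that only mimic this behavior. The real mixin checks request.user.is_authenticated and builds its redirect from settings.LOGIN_URL; the plain dict request and redirect string here are simplifications.

```python
class View:
    """Stand-in for Django's View: its dispatch() does not call super()."""

    def dispatch(self, request):
        return f"200 OK: {self.render()}"


class TemplateView(View):
    def render(self):
        return "page content"


class LoginRequiredMixin:
    """Stand-in mixin: check first, then continue along the MRO."""

    def dispatch(self, request):
        if not request["user_is_authenticated"]:
            return "302 redirect to /login/?next=" + request["path"]
        return super().dispatch(request)


class MyView(LoginRequiredMixin, TemplateView):  # mixin leftmost: check runs
    pass


class WrongOrder(TemplateView, LoginRequiredMixin):  # View.dispatch wins: no check
    pass


anonymous = {"user_is_authenticated": False, "path": "/secret/"}
print(MyView().dispatch(anonymous))      # 302 redirect to /login/?next=/secret/
print(WrongOrder().dispatch(anonymous))  # 200 OK: page content
print([cls.__name__ for cls in MyView.__mro__])
# ['MyView', 'LoginRequiredMixin', 'TemplateView', 'View', 'object']
```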

Django Beginners - Difference Between Blank and Null

Chinese title: “The Difference Between Blank and Null”. In short, null is for the database and blank is for forms (validation). Below are common combinations:

Required field (default)

    name = models.CharField(max_length=100)  # blank=False, null=False

Optional field (allows NULL in the database)

    middle_name = models.CharField(max_length=100, blank=True, null=True)

Optional field stored as an empty string

    nickname = models.CharField(max_length=100, blank=True, default='')

Important notes: for string-based fields (CharField, TextField), it’s often better to use blank=True with default='' instead of null=True, to avoid having both NULL and empty strings in the database. ...

June 4, 2025
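To see why storing both NULL and empty strings is awkward, here is a small sketch using Python's built-in `sqlite3` module. The `person` table is hypothetical and only approximates the nullable column a `CharField(..., blank=True, null=True)` would map to; it is not Django-generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A nullable text column, roughly what null=True allows.
conn.execute("CREATE TABLE person (middle_name VARCHAR(100))")
conn.executemany(
    "INSERT INTO person (middle_name) VALUES (?)",
    [("",), (None,), ("Lee",)],
)

# '= ''' does not match NULL, because comparisons with NULL are never true.
empty_only = conn.execute(
    "SELECT COUNT(*) FROM person WHERE middle_name = ''"
).fetchone()[0]

# To find every row without a middle name, both representations must be checked.
no_value = conn.execute(
    "SELECT COUNT(*) FROM person WHERE middle_name = '' OR middle_name IS NULL"
).fetchone()[0]

print(empty_only, no_value)  # 1 2
conn.close()
```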

Book Review: Clean Code

I just finished reading Clean Code: A Handbook of Agile Software Craftsmanship. It’s the best professional book I have ever read. It gives excellent advice on writing clean code that is highly readable and well structured. The code examples read as clearly as a well-written article, which makes the reading experience really smooth. The author also has a sense of humor, which makes the book more fun than most others. ...

February 22, 2023

[ETL] A Quick Way to Check If Files Have Landed Completely

Introduction: When working on ETL projects, I often need to check whether files have landed (or been downloaded) completely before the next processing step. If we process files while they are still being transferred, we may lose data. Ideally, you negotiate a signal with your upstream that appears once all files have finished transferring; in that case, you don’t need a landing check at all. However, the real world is far from ideal. In this article, I will show a quick way to check whether files have landed. ...

January 15, 2023
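The excerpt above does not show the post's actual technique. One common heuristic, offered here as an assumption rather than the author's method, is to treat a file as landed once its size and modification time stop changing over a short wait. A stalled transfer can fool it, so an upstream completion signal is still preferable when one is available. The function name and default wait are hypothetical.

```python
import os
import tempfile
import time


def is_file_stable(path, wait_seconds=5.0):
    """Heuristic landing check: True if size and mtime did not change during the wait."""
    before = os.stat(path)
    time.sleep(wait_seconds)
    after = os.stat(path)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)


# Usage: a file that is no longer being written counts as landed.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"id,value\n1,42\n")
print(is_file_stable(f.name, wait_seconds=0.1))  # True
os.remove(f.name)
```

Choosing `wait_seconds` is a trade-off: longer waits catch slow writers more reliably but delay every pipeline run.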