As the pace of work slowed during the closing weeks of 2024, I redirected my energy towards a long-standing hobby: StarCraft: Brood War (BW). This classic 1998 real-time strategy game continues to thrive today, supported by an enduring community. Over the past weeks, I made significant strides in addressing a decade-old challenge within the BW community by leveraging large language models (LLMs) and free software.

Cultural Context of Brood War

The BW community is predominantly Korean. For over 20 years, most professional players, teams, tournaments, and passionate audiences have been based in Korea, and non-Korean members of the community are commonly called “foreigners.” As in chess, BW rewards strategic thinking, and studying past professional matches is crucial, especially at higher levels of play. Much of that discussion happens in Korean, in videos and live streams by professional BW players, which leads to the Foreigner Knowledge problem.

The Foreigner Knowledge Problem

Most non-Korean BW players are not fluent in Korean, which limits their access to the Korean discourse. The community does fund translations of Korean YouTube subtitles, but this process is slow and poorly compensated. Machine translation helps, but it struggles with BW’s domain-specific language, which is dense with jargon.

Consider a typical Google-translated subtitle from a Korean BW tutorial. Terms like “12 courtyard” and “3-point processing” read as nonsense to an English-speaking player. Mistranslations of this kind have long hindered the Foreigner community.

The New Translation Process

The new machine translation process markedly improves translation quality. For example, the line that Google renders as “12 courtyard” now comes out as “12 Natural build,” which BW players recognize immediately. Translations like this carry far more signal and far less noise.

Previously, only a few dozen videos were translated each year. Now, working in my spare time, I translate about seven videos a day, so roughly a week of output matches what used to take a year.

Tech Stack Overview

The translation process has two halves: producing subtitles and consuming them.

Producing Subtitles

yt-dlp and OpenAI Whisper: Initially, I used yt-dlp to download YouTube’s automatic Korean subtitles, but their quality proved too poor to translate from. Instead, I extracted the audio tracks and used Whisper to generate cleaner Korean subtitles.
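The audio-then-Whisper step might look roughly like the sketch below. It is not the project's actual code: the function names, the output directory, the mp3 codec, and the "medium" model size are all assumptions made for illustration. The third-party imports are deferred into the functions that need them, so the SRT helpers can be used without yt-dlp or Whisper installed.

```python
from pathlib import Path


def build_ytdlp_options(output_dir: str) -> dict:
    """yt-dlp options: best audio stream, converted to mp3 via ffmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(Path(output_dir) / "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
        "quiet": True,
    }


def download_audio(video_url: str, output_dir: str = "audio") -> Path:
    """Download only the audio track of a video (hypothetical helper)."""
    import yt_dlp  # third-party: pip install yt-dlp (ffmpeg must be on PATH)

    with yt_dlp.YoutubeDL(build_ytdlp_options(output_dir)) as ydl:
        info = ydl.extract_info(video_url, download=True)
    return Path(output_dir) / f"{info['id']}.mp3"


def transcribe_korean(audio_path: Path, model_name: str = "medium") -> list:
    """Run Whisper on the audio; the model size here is an assumption."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path), language="ko")
    return result["segments"]  # dicts with "start", "end", "text"


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Turn Whisper-style segments into the text of an SRT subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

In a Colab notebook, the three steps would be chained as segments_to_srt(transcribe_korean(download_audio(url))), with the result written to a .srt file.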

Google Colab: Whisper needs a substantial amount of VRAM. To let other people help with translation, I used Google Colab, whose hosted Jupyter notebooks and free GPUs are easy for non-technical users to access.

Using LLMs and the Slang Dictionary

Using LLMs together with a BW-specific slang dictionary, I refined the translations. The resulting text matches BW’s lexicon far better and is much clearer than earlier machine translations.
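One plausible way to combine a slang dictionary with an LLM is to put only the matching glossary entries into the translation prompt. The sketch below assumes that approach; the article does not show the actual prompt or dictionary, so the two entries, the sample subtitle line, and the function names are illustrative only. Sending the prompt to a model is left out, since it depends on the provider.

```python
# Illustrative entries only; the project's real slang dictionary is not shown.
SLANG_DICTIONARY = {
    "앞마당": "natural (natural expansion)",
    "본진": "main base",
}


def find_glossary_hits(korean_text: str, dictionary: dict) -> dict:
    """Return only the dictionary entries whose term occurs in the text."""
    return {
        term: meaning
        for term, meaning in dictionary.items()
        if term in korean_text
    }


def build_translation_prompt(korean_lines: list, dictionary: dict) -> str:
    """Build an LLM prompt with numbered lines and the relevant glossary."""
    hits = find_glossary_hits("\n".join(korean_lines), dictionary)
    glossary = "\n".join(
        f"- {term} => {meaning}" for term, meaning in hits.items()
    )
    numbered = "\n".join(
        f"{i}: {line}" for i, line in enumerate(korean_lines, start=1)
    )
    return (
        "Translate these Korean StarCraft: Brood War subtitle lines "
        "into English.\n"
        "Keep the line numbers and use this glossary for game jargon:\n"
        f"{glossary or '- (no glossary terms matched)'}\n\n"
        f"Lines:\n{numbered}\n"
    )


if __name__ == "__main__":
    # Hypothetical subtitle line, roughly "this is a 12 natural build".
    print(build_translation_prompt(["12 앞마당 빌드입니다"], SLANG_DICTIONARY))
```

Including only the matched terms keeps the prompt short even as the dictionary grows, at the cost of missing terms that Whisper spelled differently.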

Consuming Subtitles

TamperMonkey: I wrote a UserScript that adds a button to YouTube, letting users download the translated subtitles for a specific video. The script, affectionately dubbed the BWKT Client, has a deliberately retro-styled button.

Google Sheets and Apps Script: Using Google Sheets and Apps Script, I built a small system that stores subtitle URLs and makes them available to users through the BWKT Client.
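The lookup at the heart of such a system can be sketched in a few lines. The real implementation lives in Apps Script and a UserScript, so the Python below only illustrates the logic; the column names video_id and subtitle_url, the function names, and the example.com URL are assumptions, not the project's actual sheet layout.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str):
    """Pull the video ID out of a youtube.com/watch or youtu.be URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def find_subtitle_url(rows: list, video_url: str):
    """Return the stored subtitle URL for a video, or None if not listed.

    Each row is a dict with the hypothetical keys "video_id" and
    "subtitle_url", e.g. one spreadsheet row exported as a record.
    """
    video_id = extract_video_id(video_url)
    if video_id is None:
        return None
    for row in rows:
        if row.get("video_id") == video_id:
            return row.get("subtitle_url")
    return None


if __name__ == "__main__":
    sheet_rows = [
        {
            "video_id": "abc123XYZ_0",  # placeholder ID
            "subtitle_url": "https://example.com/subs/abc123XYZ_0.en.srt",
        }
    ]
    print(find_subtitle_url(sheet_rows, "https://youtu.be/abc123XYZ_0"))
```

Keying the sheet on the video ID rather than the full URL means the same row is found whether the viewer arrived through a youtu.be link or a watch URL with extra query parameters.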

Improvements and Future Prospects

The project works, but it remains a work in progress. Potential upgrades include supporting more target languages and refining the UserScript. Despite its humble beginnings, it already ties together resources that were scattered across the BW community and makes Korean content easier to find and understand.

Shipping the Solution

Testing with fellow BW enthusiasts confirmed that the translation system works. It was met with enthusiasm and could change how the community engages with Korean BW content.

Final Thoughts

This project combines existing technologies to solve a niche yet significant problem within the BW community. Its simplicity shows how far a small, user-focused tool can go, and I hope it remains valuable to BW enthusiasts worldwide.