Bad Apple but it's 6,500 regexes that I search for in vim
Why should I leave vim to watch a video?
Jan 10, 2025
Well the title of this post promised you Bad Apple in vim only using search queries. So here’s Bad Apple in vim, but the only thing that’s changing is the search query:
Unfortunately this is only 120x90; my screen isn't big enough to size up more
Let’s talk about how this works!
Wait, sorry, what is Bad Apple
Bad Apple is a visually compelling music video that folks enjoy embedding in surprising places. It’s a meme in the same way that running DOOM on a smartfridge is a meme.
I’ve wanted to run Bad Apple somewhere since I saw this video purporting to run it on One Million Checkboxes.
Getting the frames
The first step here was pretty simple. I needed data for each frame of Bad Apple. Felixoofed had already done some of the work for me; I cloned their repo which gave me the video along with a suggestion of an ffmpeg
command to turn the video into ~6,500 PNGs representing each frame.
Then I wrote a little bit of Python code to turn each of those PNGs into a 2D array of 0s and 1s (where 1 represented a black pixel). The video was originally 480x360 - I shrunk it down to 120x90 after taking some measurements of my terminal and concluding that I couldn’t really go any bigger.
from PIL import Image
import numpy as np
def process_image(path, target_width=120, target_height=90):
img = Image.open(path)
img = img.resize((target_width, target_height), Image.Resampling.LANCZOS)
if img.mode != "L": img = img.convert("L")
pixels = np.array(img)
binary_pixels = (pixels < 10).astype(int) # 1 for dark, 0 for light
return binary_pixels
def text_preview(binary_pixels):
chars = {0: ".", 1: "#"}
return "\n".join("".join(chars[px] for px in row) for row in binary_pixels)
I ran this against a couple of frames and printed out their text_preview
s to confirm that they looked ok. They did!
Drawing in vim
So how do we draw something in vim? Well, to start I made a grid of text with a drawing embedded inside it. Here’s a grid that mostly has As, but if you search for B you see a little stick figure:
hey little guy
So that kinda lets us create a drawing! But there are some problems. One is that the blue highlighting here (which I think is the default thing vim does?) isn’t super clear.
But vim highlighting is very configurable! We can tell it to highlight the foreground and background of each matched character with the same color. The invocation I use is hi Search cterm=NONE ctermfg=grey ctermbg=grey
which gives us nice blocks instead:
much nicer
But there’s one more problem - we want square pixels! But right now each of our characters is more of a rectangle, because most fonts are taller than they are wide! I searched around and found Square, a font that is…square! It’s designed for roguelike games played in the terminal, and using it here gives us a very nice grid:
all squared away
So that lets us create pictures by highlighting text. But how do we highlight the right text for each frame of Bad Apple?
Drawing arbitrary rectangles
I kicked around a bunch of vague ideas about file structure - maybe there was a way to analyze my frames, figure out which pixels were “hot,” and then generate a file that was optimized for regexs selecting those pixels?
But before going too far down this path I decided to read through vim’s docs on searching. I knew that vim’s search had all sorts of crazy features 1 and I wanted to know what tools I had available. And it turns out that vim already had exactly what I needed!
My favorites are \zs
and \ze
, which let you declare when a match starts or
ends. It’s basically a way to say “treat everything behind/ahead of me as a
lookahead/lookbehind” - it’s very ergonomic!
/\%l /\%>l /\%<l E951 E1204 E1273
\%23l Matches in a specific line.
\%<23l Matches above a specific line (lower line number).
\%>23l Matches below a specific line (higher line number).
\%.l Matches at the cursor line.
\%<.l Matches above the cursor line.
\%>.l Matches below the cursor line.
vim searches can match on specific line numbers (and column numbers). And you can combine several of these searches together - for example \%>5c\%<15c\%>4l\%<9l
matches the rectangle between columns 5 and 15 and between lines 4 and 9.
And better yet, you can OR those patterns together just like any other vim search - \%>5c\%<15c\%>4l\%<9l\|\%>12c\%<25c\%>10l\%<15l
finds both our previous rectangle and the one between cols 12 and 25 and lines 10 and 15. So it’s easy to draw many rectangles to the screen with a single search.
two squares from one search string!
This transformed the problem - now the goal was to decompose each frame into a series of rectangles that we could search for!
Frames into rectangles
Our grid was 90x120 so we had ~10,000 pixels. This meant that a super naive approach would potentially look for thousands of different rectangles and generate a search string tens of thousands of characters long. Some basic testing showed me that while vim’s search was fast, search strings that long would kill my framerate.
“Decompose this grid into the minimum number of rectangles required to fill it” seemed like the type of thing that would have a de-facto solution, so I looked for existing solutions. But I didn’t find much! There’s a 14 year old stack overflow question where the accepted answer is “it’s hard.”
So I wrote something naive. The approach I took was:
- Detect all runs of “1s” in the first line line
- Look at the next line, and find runs that overlap with the runs from the previous line
- “Merge” those runs into a rectangle if the area of the merged rectangle is greater than the area of either line alone (remember, the runs may not overlap fully)
- Keep going, making sure to merge new runs into old rectangles whenever possible
The actual code is pretty long and I’m not sure it’s worth reading, but here it is if you’re curious:
The code, which I'm not sure you really need to see
class Rect:
def __init__(self, col_start, col_end, row_start, row_end):
self.row_start = row_start
self.row_end = row_end
self.col_start = col_start
self.col_end = col_end
def height(self):
return self.row_end - self.row_start
def width(self):
return self.col_end - self.col_start
def area(self):
return self.height() * self.width()
def to_vim_pattern(self):
# vim is 1-indexed and exclusive in both directions
row_start = self.row_start
row_end = self.row_end + 1
col_start = self.col_start
col_end = self.col_end + 1
return fr"\%>{col_start}c\%<{col_end}c\%>{row_start}l\%<{row_end}l"
def __repr__(self):
width = self.width()
height = self.height()
return f"R: ({width}x{height}) {self.row_start}:{self.row_end}, {self.col_start}:{self.col_end}"
def test_merge_horizontal(self, run_start, run_end, run_row):
if run_end <= self.col_start: return [False, None, None, None]
if run_start >= self.col_end: return [False, None, None, None]
overlap_start = max(run_start, self.col_start)
overlap_end = min(run_end, self.col_end)
overlap_rect = Rect(overlap_start, overlap_end, self.row_start, run_row+1)
# rects that are no longer eligible for merging
top_rects = []
if overlap_start > self.col_start:
top_rects.append(Rect(self.col_start, overlap_start, self.row_start, self.row_end))
if overlap_end < self.col_end:
top_rects.append(Rect(overlap_end, self.col_end, self.row_start, self.row_end))
# rects that are split from the current run but eligible for merging
# with stuff below
bot_rects = []
if overlap_start > run_start:
bot_rects.append(Rect(run_start, overlap_start, run_row, run_row+1))
if overlap_end < run_end:
bot_rects.append(Rect(overlap_end, run_end, run_row, run_row+1))
return [True, overlap_rect, top_rects, bot_rects]
def find_runs(row):
runs = []
start = None
for i, val in enumerate(row):
if val == 1 and start is None:
start = i
elif val == 0 and start is not None:
runs.append((start, i))
start = None
if start is not None:
runs.append((start, len(row)))
return runs
def try_merge_with_prev_row(mergeable_rects, run_start, run_end, run_row):
candidate_merges = []
for i, rect in enumerate(mergeable_rects):
merge_result = rect.test_merge_horizontal(run_start, run_end, run_row)
if merge_result[0]: candidate_merges.append((merge_result, i))
best = None
best_idx = None
best_merge_result = None
run_length = run_end - run_start
for (candidate, candidate_idx) in candidate_merges:
_, overlap_rect, top_rects, bot_rects = candidate
overlap_area = overlap_rect.area()
if best is None or overlap_area > best:
best = overlap_area
best_idx = candidate_idx
best_merge_result = (overlap_rect, top_rects, bot_rects)
if best is None: return (None, None)
elif best > run_length: return (best_idx, best_merge_result)
else: return (None, None)
def to_horizontal_merge_rect_representation(binary_pixels):
completed_rects = []
mergeable_rects = []
for row_idx, row in enumerate(binary_pixels):
next_mergeable_rects = []
runs = find_runs(row)
for (run_start, run_end) in runs:
best_merge_idx, best_merge_result = try_merge_with_prev_row(mergeable_rects, run_start, run_end, row_idx)
if best_merge_idx is None:
r = Rect(run_start, run_end, row_idx, row_idx+1)
next_mergeable_rects.append(r)
else:
mergeable_rects.pop(best_merge_idx)
overlap_rect, top_rects, bot_rects = best_merge_result
mergeable_rects.extend(top_rects)
next_mergeable_rects.append(overlap_rect)
next_mergeable_rects.extend(bot_rects)
completed_rects.extend(mergeable_rects)
mergeable_rects = next_mergeable_rects
completed_rects.extend(mergeable_rects)
return r"\|".join(rect.to_vim_pattern() for rect in completed_rects)
So probably don’t read that. But the point is that it’s far from optimal. For example, the code never looks at future rows, so it discards merges that are a bad idea now but would work out well based on following rows 2.
The code was really fun to write though
But I knew the code was naive when I wrote it - I just wanted to get something vaguely reasonable down to see how it’d perform. I used it to write search queries for every frame to a file, set up a simple vim animation test harness 3, and found that the performance was fine! Well, in some cases.
we’ll talk about how this works in a second
Pathological cases
My naive algorithm worked really well in some cases and totally fell over in others. Many of the search strings were 500 - 2000 characters long, but the search string that it generates for this image is over 10,000 characters 4!
Length of a search string is not a perfect proxy for performance, but I think it’s pretty good. Our search strings are just a bunch of patterns (of similar length) OR’d together. The longer a search string, the more patterns. And search time should scale with the number of patterns, since vim has to check each one before deciding that a search doesn’t match.
That said I did no principled profiling here and may be totally wrong in some way!
cool image tbh
These long search strings dramatically reduced frames per second - we went from around ~40 FPS to the single digits.
I hunted for optimizations 5. I came up with a bunch of ideas for tweaks, but wasn’t confident that any of them would help much with the bad cases. And I didn’t have the time to find a good general-purpose algorithm: I was working on this the night before weekly presentations at the Recurse Center and I wanted to present it the next day!
My favorite rabbit hole was trying to improve vim’s search performance instead of my regexs. My setup for playing the video had multiple buffers open and the searches were ran over each buffer. I thought if I could prevent that (or prevent highlighting in other buffers) that might help. But I couldn’t get that to work.
So I pulled out one of my favorite tricks 6 - why just use a single algorithm?
which I originaly learned many years ago from my friend Eliot, who is also the person who made One Million Checkboxes faster
I wrote two more naive solutions:
- A version of my “build rects top to bottom” algorithm that went left to right instead
- A simple RLE that only looked at individual rows
And then I updated my code to try all three naive solutions and pick the search pattern that was the smallest!
This worked great. While each of these algorithms had pathological cases, they covered for each other pretty well - at least one ~always generated something reasonable.
RLE ended up most often generating the “best” solution - although it also had some particularly bad pathological cases (which is why I avoided using it to start). And my original approach was used the least often! oops.
# Number of times each approach was picked
original approach (top to bottom merging) - 1110
left to right merging - 2239
single-row RLE - 3300
Wait, so how do you actually run this inside vim
Yes, right, that’s a good question.
Here’s an image of the vim setup:
it's all clear now right?
The center window on the top is where we play the video. It’s a file that contains 90 lines of 120 spaces (remember that our images are 120x90); since we’re matching on rows and columns it doesn’t need to have any content. To its left and right are some empty buffers that are sized to center the image.
The bottom window is our list of search patterns! All ~6,500 of them.
To play the video we use a vim macro. If you haven’t seen vim macros, they’re a way to record a series of arbitrary keystrokes to be replayed later. They’re extremely powerful - since you do ~everything in vim via the keyboard, you can trivially record and replay any action. And if you set up your macro right (so that it ends with the cursor in the right spot for the next iteration) you can tell vim to run the macro many times in a row!
The macro
The macro is "ay$:let @/=@a^M+
. That is:
"a
Operate over register ay$
Yank until the end of the line
Vim has “named” registers that can hold text - ‘a’ now contains the current line
:
Enter command modelet @/=@a
set the contents of register/
to the contents of registera
^M
execute the command (^M
represents the enter key)
To reference a register in command mode you use @
followed by the register name. Vim has special registers - the /
register represents the current search. At this point vim is searching for the query that is on the current line
+
move to the start of the next line
This is what I meant by “if you set up your macro right you can make it replayable” - we’re prepared to execute our macro again because the cursor is now on the start of the next line!
This might look like nonsense to you - I’m pretty comfortable with macros because of the years that I’ve spent vim golfing. But it works!
The most interesting optimization is let @/=@a
- an alternative is to do something like /^Ra^M
(begin a search, pastes the output of register a
, enter), but this is a problem because it requires directly pasting a query that is potentially thousands of characters long. This causes the search window to expand to fit the query, which creates a flickering effect and reduces framerate.
But now we can run 1500@q
(assuming we recorded the macro into register q
) to play the macro 1500 times - meaning that we’ll run through 1500 frames as quickly as we can.
And that gives us this!
not bad
Wrapping up
This was really fun! I built this in a single day, but if I wanted to spend more time on it I might make a few tweaks:
- I think it’d be more magical if I had gone the “create a well-structured file that I can craft traditional regexs over” route instead of using vim’s line/col search feature. You might (reasonably) quibble that these aren’t real regexs!
- I make no effort to keep a stable framerate, and the framerate definitely fluctuates a bit over the course of the video.
But this was good enough for my purposes. And I think it’s neat that I’ve built most of a general-purpose solution for playing a video inside of vim using search queries. Someone suggested that I record a video of me running Bad Apple in vim and play that video in vim. So I guess I’ll get on that.
Beyond that, I made this in my first week of a new batch at The Recurse Center, a place that offers something like a writers retreat but for programming. I presented it to folks there and got a lovely response. I really love Recurse, and if you love nonsense like this I bet you’d like it too. Consider applying!.
Finally, the code is super messy but if you want to poke around you can see it here.
And that’s all I’ve got. As always, I’ll be back with more nonsense soon.