From 4ac1c56563a6452b89bd0664b25639cc8e0bdf81 Mon Sep 17 00:00:00 2001
From: Alisa Sireneva
Date: Sun, 17 Nov 2024 22:49:03 +0300
Subject: [PATCH] Credit xnor

---
 blog/any-python-program-fits-in-24-characters/index.html | 4 ++--
 blog/any-python-program-fits-in-24-characters/index.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/blog/any-python-program-fits-in-24-characters/index.html b/blog/any-python-program-fits-in-24-characters/index.html
index 3f74018..cb1d9c2 100644
--- a/blog/any-python-program-fits-in-24-characters/index.html
+++ b/blog/any-python-program-fits-in-24-characters/index.html
@@ -1,5 +1,5 @@ Any Python program fits in 24 characters* | purplesyringa's blog

Any Python program fits in 24 characters*

* If you don’t take whitespace into account.

My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode any Python program in 24 bytes, ignoring whitespace.

This post originally stated that 30 characters are always enough. Since then, commandz and another person from the codegolf Discord server have devised a better solution, reaching 24 bytes. After a few minor modifications, it satisfies the requirements of this problem, so I publish it here too.

Bits

We can encode arbitrary data in a string by only using whitespace. For example, we could encode 0 bits as spaces and 1 bits as tabs. Now you just have to decode this.

As you start implementing the decoder, it immediately becomes clear that this approach requires about 50 characters at minimum. You can use c % 2 for c in b"..." to extract individual bits, then you need to merge bits by using str and concatenating them with "".join(...), then you need to parse the bits with int.to_bytes(...), and finally call exec. We need to find another solution.

Characters

What if we didn’t go from characters to bits and then back? What if instead, we mapped each whitespace character to its own non-whitespace character and then evaluated that?

exec(
+

Any Python program fits in 24 characters*

* If you don’t take whitespace into account.

My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that’s exactly what we’ll abuse to encode any Python program in 24 bytes, ignoring whitespace.

This post originally stated that 30 characters are always enough. Since then, commandz and xnor from the Code Golf Discord server have devised a better solution, reaching 24 bytes. After a few minor modifications, it satisfies the requirements of this problem, so I publish it here too.

Bits

We can encode arbitrary data in a string by only using whitespace. For example, we could encode 0 bits as spaces and 1 bits as tabs. Now you just have to decode this.

As you start implementing the decoder, it immediately becomes clear that this approach requires about 50 characters at minimum. You can use c % 2 for c in b"..." to extract individual bits, then you need to merge bits by using str and concatenating them with "".join(...), then you need to parse the bits with int.to_bytes(...), and finally call exec. We need to find another solution.
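
As a rough, non-golfed illustration of the decoder steps just described, the sketch below encodes a program as spaces and tabs and recovers it again. The helper names `encode_bits` and `decode_bits` are hypothetical and do not come from the original post; the sketch only shows how many separate steps the bit-based route needs.

```python
def encode_bits(program: str) -> str:
    """Encode each byte as 8 whitespace characters: space = 0 bit, tab = 1 bit."""
    bits = "".join(f"{byte:08b}" for byte in program.encode())
    return bits.replace("0", " ").replace("1", "\t")


def decode_bits(blob: str) -> str:
    """Recover the program text from a space/tab blob (assumed non-empty)."""
    # ord(" ") == 32 is even and ord("\t") == 9 is odd, so c % 2 yields the bit.
    bits = "".join(str(c % 2) for c in blob.encode())
    # Parse the bit string as one big integer, then split it back into bytes.
    value = int(bits, 2)
    return value.to_bytes(len(bits) // 8, "big").decode()


blob = encode_bits("print(1 + 1)")
exec(decode_bits(blob))  # runs the recovered program
```

Even in this readable form the decoder needs a generator, a join, an integer parse, a byte conversion, and `exec`, which is why the golfed version cannot get very short.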

Characters

What if we didn’t go from characters to bits and then back? What if instead, we mapped each whitespace character to its own non-whitespace character and then evaluated that?

exec(
     "[whitespace...]"
         .replace(" ", "A")
         .replace("\t", "B")
@@ -17,7 +17,7 @@
     )
 )
 

The characters ABCDEFGHIJ are located at indices 9,11,12,28,29,30,31,32,133,160 – all whitespace code points below 256 except CR and LF, which are invalid in a string. While this code is long, most of it is just whitespace, which we ignore. After removing whitespace, it’s only 32 characters:

exec("".translate("ABCDEFGHIJ"))
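
To make the table layout concrete, here is a small sketch of how such a `translate` table could be built and used. The names `WHITESPACE_CODE_POINTS`, `build_table`, and `to_whitespace` are hypothetical, and filling the unused table slots with spaces is an assumption made for illustration; any whitespace filler should serve the same purpose.

```python
# The ten whitespace code points listed above (everything below 256 except CR and LF).
WHITESPACE_CODE_POINTS = [9, 11, 12, 28, 29, 30, 31, 32, 133, 160]


def build_table(alphabet: str) -> str:
    """Place each alphabet character at the index of 'its' whitespace code point."""
    table = [" "] * (max(WHITESPACE_CODE_POINTS) + 1)  # filler is whitespace too
    for code_point, char in zip(WHITESPACE_CODE_POINTS, alphabet):
        table[code_point] = char
    return "".join(table)


def to_whitespace(text: str, alphabet: str) -> str:
    """Encode text (made only of alphabet characters) as pure whitespace."""
    return "".join(chr(WHITESPACE_CODE_POINTS[alphabet.index(c)]) for c in text)


table = build_table("ABCDEFGHIJ")
payload = to_whitespace("BADCAFE", "ABCDEFGHIJ")

# str.translate indexes the table by code point, so a plain string works as a table.
decoded = payload.translate(table)
```

Because `str.translate` only needs an object that can be indexed by code point, the table itself can be a string literal, and everything in it except the ten mapped characters is whitespace.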
-

We can now encode any Python program that uses at most 10 different characters. We could now use PyFuck, which transforms any Python script to an equivalent script that uses only 8 characters: exc('%0). This reduces the code size to 30 characters (plus whitespace). A bit of postprocessing is necessary to get it working well, as PyFuck often has exponential output, but that’s a minor issue.

A better way

But it turns out there’s another way to translate whitespace to non-whitespace.

This solution was found by a reader of my blog – thanks!

When repr is applied to Unicode strings, it replaces the Unicode codepoints with their \uXXXX representations. For example, U+2001 Em Quad is encoded as '\u2001'. All in all, Unicode whitespace gives us unlimited supply of \, x, and the whole hexadecimal alphabet (plus two instances of ').

Say we wanted to extract the least significant digits of characters from U+2000 to U+2007. Here’s how to do this:

# Imagine these \uXXXX escapes are literal whitespace characters
+

We can now encode any Python program that uses at most 10 different characters. We could now use PyFuck, which transforms any Python script to an equivalent script that uses only 8 characters: exc('%0). This reduces the code size to 30 characters (plus whitespace). A bit of postprocessing is necessary to get it working well, as PyFuck often has exponential output, but that’s a minor issue.

A better way

But it turns out there’s another way to translate whitespace to non-whitespace.

This solution was found by readers of my blog – thanks!

When repr is applied to Unicode strings, it replaces the Unicode codepoints with their \uXXXX representations. For example, U+2001 Em Quad is encoded as '\u2001'. All in all, Unicode whitespace gives us unlimited supply of \, x, and the whole hexadecimal alphabet (plus two instances of ').

Say we wanted to extract the least significant digits of characters from U+2000 to U+2007. Here’s how to do this:

# Imagine these \uXXXX escapes are literal whitespace characters
 >>> repr("\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007")[6::6]
 '01234567'
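
The same slicing idea seems to carry over to whitespace code points below 256, which `repr` renders as shorter `\xXX` escapes. The sketch below is only an illustration with two arbitrarily chosen characters (U+000B and U+001F); the variable names are hypothetical.

```python
# U+000B (vertical tab) and U+001F (unit separator) both count as whitespace.
short = "\x0b\x1f"
escaped = repr(short)  # each character becomes a 4-character \xXX escape

backslash = escaped[1]       # first character after the opening quote
letter_x = escaped[2]        # the "x" of the first escape
last_digits = escaped[4::4]  # final hex digit of every escape
```

Here `escaped` is the 10-character string `'\x0b\x1f'` including its quotes, so a stride of 4 walks over the last hex digit of each escape, just as a stride of 6 does for the `\uXXXX` form.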
 

To get \, x, and the rest of the hexadecimal alphabet, we need characters like U+000B and U+001F. We also need to align the strings exactly, so that one of the columns contains all the alphabet:

         v
diff --git a/blog/any-python-program-fits-in-24-characters/index.md b/blog/any-python-program-fits-in-24-characters/index.md
index 746cace..88e8342 100644
--- a/blog/any-python-program-fits-in-24-characters/index.md
+++ b/blog/any-python-program-fits-in-24-characters/index.md
@@ -11,7 +11,7 @@ intro: |
 
 My friend challenged me to find the shortest solution to a certain Leetcode-style problem in Python. They were generous enough to let me use whitespace for free, so that the code stays readable. So that's exactly what we'll abuse to encode *any* Python program in $24$ bytes, ignoring whitespace.
 
-> This post originally stated that $30$ characters are always enough. Since then, [commandz](https://github.com/commandblockguy) and another person from the codegolf Discord server have devised a better solution, reaching $24$ bytes. After a few minor modifications, it satisfies the requirements of this problem, so I publish it here too.
+> This post originally stated that $30$ characters are always enough. Since then, [commandz](https://github.com/commandblockguy) and xnor from the [Code Golf](https://code.golf) Discord server have devised a better solution, reaching $24$ bytes. After a few minor modifications, it satisfies the requirements of this problem, so I publish it here too.
 
 
 ### Bits
@@ -79,7 +79,7 @@ We can now encode any Python program that uses at most $10$ different characters
 
 But it turns out there's another way to translate whitespace to non-whitespace.
 
-> This solution was found by a reader of my blog -- thanks!
+> This solution was found by readers of my blog -- thanks!
 
 When `repr` is applied to Unicode strings, it replaces the Unicode codepoints with their `\uXXXX` representations. For example, `U+2001 Em Quad` is encoded as `'\u2001'`. All in all, Unicode whitespace gives us unlimited supply of `\`, `x`, and the whole hexadecimal alphabet (plus two instances of `'`).