I’ve seen this article/newsletter/whitepaper(?) titled “Society’s Technical Debt and Software’s Gutenberg Moment” from SK Ventures all over the place this past week. In it, the authors Paul Kedrosky & Eric Norlin argue that either now or very soon, Large Language Model (LLM) chatbots like ChatGPT will allow anyone to easily write software, and accordingly the cost of writing software will fall, like, a ton.
Because I am a software developer, I am watching these developments with interest. Right now I am unsure about how much I, personally, should be worried about chatbots takin my jerbs.
However, I can safely say that after reading this article, I’m not any more worried now than I was a little bit ago.
The evidence these guys offer that ChatGPT or whatever is going to change, as they put it, “the very idea of what it means to manipulate the symbols that constitute a programming language”, is this:
Rather than having to learn Python to parse some text and remove ASCII emojis, for example, one could literally write the following ChatGPT prompt:
Write some Python code that will open a text file and get rid of all the emojis, except for one I like, and then save it again.
In case you’re thinking, “there’s no way that works,” you’re wrong. It works fine, took two seconds, and is just a glimpse of how previously inaccessible deftness at writing code is now available to anyone:
That does sound impressive! But it also sounds like bullshit presented confidently, which is about to become a theme here.
From the screenshot above, here’s the Python script that ChatGPT spat out:
# Open the file and read its content
with open('toomanyemojis.txt', 'r', encoding='utf-8') as f:
    content = f.read()
# Define a list of emojis to keep
keep_emojis = ['¯\_(ツ)_/¯']
# Remove all ASCII emojis except the ones to keep
for emoji in content:
    if emoji.encode('ascii', 'ignore').decode('utf-8') not in keep_emojis:
        content = content.replace(emoji, '')
# Save the modified content back to the file
with open('toomanyemojis.txt', 'w', encoding='utf-8') as f:
    f.write(content)
Quick digression about me: while I have been programming for over 15 years, I am not a Python expert. I did a tutorial once.
That said, this code does not work.
They asked ChatGPT to “write some Python code that will open a text file and get rid of all the emojis, except for one I like, and then save it again” and ChatGPT gave them a script that will turn your text file—no matter the contents—into an empty text file.
I’ll explain.
First up, opening the file:
# Open the file and read its content
with open('toomanyemojis.txt', 'r', encoding='utf-8') as f:
    content = f.read()
This looks ok, but my spidey-sense (haha because I detect possible bugs like a spider haha) is already concerned that this will only work if the file toomanyemojis.txt is in the same directory where you’re running the script. If that file is not in the current directory, it won’t work. But, like, whatever. File systems are hard.
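For what it's worth, here is a minimal sketch of one way around the working-directory problem, assuming the text file lives next to the script. The helper name path_next_to is made up for illustration; it is not from the article or from ChatGPT.

```python
from pathlib import Path


def path_next_to(script_file: str, filename: str) -> Path:
    """Return the path of `filename` in the same directory as `script_file`."""
    # resolve() makes the script path absolute, so the result no longer
    # depends on which directory the script was launched from.
    return Path(script_file).resolve().parent / filename


# Hypothetical usage inside a script: pass __file__ as the script path.
# target = path_next_to(__file__, "toomanyemojis.txt")
```

This only moves the assumption from "the file is in the current directory" to "the file is next to the script", but that one at least does not change depending on where you run things from.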
Next up:
# Define a list of emojis to keep
keep_emojis = ['¯\_(ツ)_/¯']
This does define a list; however, ChatGPT or the authors or both do not seem to make or even understand a distinction between ASCII art and emojis.
The authors prompted “all ascii emojis except the one for shrugs”, and ChatGPT seems to interpret that as meaning the ASCII art shrug ¯\_(ツ)_/¯ instead of the “person shrugging” emoji, which is maybe what the authors meant? 🤷
You could call that semantic nitpicking; I’ve seen lots of people refer to ASCII art or emoticons or whatever “using text-y characters not literally” as emoji.
But (spoiler alert!) a lack of clarity in programming has consequences.
Anyway, the code up to this point has been a mere apéritif stimulating the palate before things get real bonkers:
# Remove all ASCII emojis except the ones to keep
for emoji in content:
    if emoji.encode('ascii', 'ignore').decode('utf-8') not in keep_emojis:
        content = content.replace(emoji, '')
So in this for loop we’re going through every single character in the file, one-by-one.
Then in the next line with the if statement we’re… well, let’s split that apart a bit.
First, with emoji.encode('ascii', 'ignore').decode('utf-8') we’re effectively removing all non-ASCII characters. We’re going to come back to this later, but for the moment let’s rewrite this a bit and put that into its own variable:
for character in content:
    ascii_character = character.encode('ascii', 'ignore').decode('utf-8')
    if ascii_character not in keep_emojis:
        content = content.replace(character, '')
The “this will never work” part is if ascii_character not in keep_emojis.
After we’ve removed all non-ASCII characters from our single character, we now check if it is not in the keep_emojis list.
As a reminder, keep_emojis is a list that contains a single nine-character string that looks like a shrug:
keep_emojis = ['¯\_(ツ)_/¯']
A quick lesson because I just had to verify it myself: x in y in Python checks if the left-hand item is present in the right-hand collection. x not in y is the opposite: the left-hand item is not present in the right-hand collection. Some examples:
# strings
'a' in 'abc' # True
'z' in 'abc' # False
'z' not in 'abc' # True
'/' in '¯\_(ツ)_/¯' # True
# lists
1 in [1, 2, 3] # True
1 in [10, 11, 12] # False
1 not in [10, 11, 12] # True
# lists of strings; pay attention!
'my' in ['my', 'cute', 'dog'] # True
'm' in ['my', 'cute', 'dog'] # False!
'm' not in ['my', 'cute', 'dog'] # True!
Are you seeing the problem yet?
Let me try that conditional statement as a sentence: "if this single ASCII character is not in the list that contains the multi-character string ¯\_(ツ)_/¯, do the following…".
Put another way: "if a single character is not any of these specific multi-character strings, do the following…".
Well, that’s… always true!
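If you'd rather not take my word for it, here's a quick sketch that evaluates that same condition for a handful of characters. The sample string and the helper name stripped are mine, picked for illustration; the shrug is written with a doubled backslash so Python doesn't complain about an invalid escape.

```python
keep_emojis = ['¯\\_(ツ)_/¯']  # same nine-character string as in the script


def stripped(character: str) -> str:
    """The encode/decode expression from the generated script."""
    return character.encode('ascii', 'ignore').decode('utf-8')


# A mix of ASCII characters, pieces of the shrug, and an actual emoji.
sample = 'a¯ツ/ 🤷'

# Evaluate the script's condition once per character.
results = [stripped(c) not in keep_emojis for c in sample]
print(results)  # every entry is True
```

A single character (or the empty string left over after stripping a non-ASCII one) can never equal a nine-character string, so the condition cannot come out any other way.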
Again, not a Python expert here, but I’ll rewrite that line back into code:
if True:
That’s unnecessary, so this whole loop is…
for character in content:
    content = content.replace(character, '')
…turning every character in the file into an empty string one-by-one, and finally writing it all back to the same file…
# Save the modified content back to the file
with open('toomanyemojis.txt', 'w', encoding='utf-8') as f:
    f.write(content)
…resulting in a blank file, no matter what was originally in the file! 🎉🥳🎊
(You saved a separate copy of that file that ChatGPT just helped you erase, right?)
So, yeah, this doesn’t work.
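To check this without sacrificing a real file, here's a sketch that runs the generated loop against in-memory strings instead. The wrapper function chatgpt_loop and the sample inputs are my own; the loop body is copied from the script.

```python
def chatgpt_loop(content: str) -> str:
    """Apply the loop from the generated script to a string instead of a file."""
    keep_emojis = ['¯\\_(ツ)_/¯']
    # The for loop iterates over the original string, even though
    # `content` is rebound to a shorter string on each pass.
    for emoji in content:
        if emoji.encode('ascii', 'ignore').decode('utf-8') not in keep_emojis:
            content = content.replace(emoji, '')
    return content


samples = ['plain ASCII text', 'party 🎉🥳🎊', 'shrug ¯\\_(ツ)_/¯ stays?']
for text in samples:
    print(repr(chatgpt_loop(text)))  # prints '' for each sample
```

Plain text, emoji, even the shrug we supposedly wanted to keep: all of it comes back as an empty string.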
Let’s circle back to that encoding/decoding part.
The authors actually asked ChatGPT to explain this and called it “damn clever”:
Say you weren’t sure why this works, or were skeptical that it would work, and thought maybe the AI was bluffing. You could ask it to explain itself, as shown here:
In short, the LLM used a neat hack. Instead of working through an exhaustive list of all ASCII emojis and seeing if they’re present, it elected to use character encodings to differentiate emojis from non-. That is damn clever, and that you can ask the LLM to explain how it did something—that there is a self-referential explanatory power—is another reason why this changes the software game.
The statement emoji.encode('ascii', 'ignore').decode('utf-8') does do what ChatGPT says it does. If any character in that string is outside of the more-limited ASCII set, it’ll be removed. Most emoji characters are not ASCII characters.
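Here's a small sketch of that expression applied to whole strings rather than single characters. The function name ascii_only and the inputs are mine.

```python
def ascii_only(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode('ascii', 'ignore').decode('utf-8')


print(repr(ascii_only('party 🎉 time')))  # 'party  time'
print(repr(ascii_only('¯\\_(ツ)_/¯')))     # '\\_()_/'
```

Note what happens to the shrug: the macrons and the katakana face are not ASCII, so they get dropped and only the backslash, underscores, parentheses, and slash survive.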
We’ll put aside if “clever” code is a good thing (it is not), but that is a clever way to remove most emojis from a string.
However, “remove emojis” wasn’t the goal, was it? And if you’re actually looking for emoji, it doesn’t seem correct to remove them all first.
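For contrast, here's a rough sketch of something closer to what the prompt asked for, with the usual caveat that I'm not a Python expert. It leans on a big assumption, borrowed from ChatGPT's own heuristic: "emoji" means "any non-ASCII character", which is not a real definition of emoji and would also eat accented letters. The function name and sample text are hypothetical.

```python
KEEP = '¯\\_(ツ)_/¯'  # the one "emoji" we like


def strip_non_ascii_except(text: str, keep: str = KEEP) -> str:
    """Remove non-ASCII characters from `text`, leaving copies of `keep` intact."""
    # Set the kept string aside first by splitting on it, so its
    # non-ASCII characters never reach the stripping step.
    pieces = text.split(keep)
    cleaned = [p.encode('ascii', 'ignore').decode('ascii') for p in pieces]
    # Stitch the cleaned pieces back together around the kept string.
    return keep.join(cleaned)


print(strip_non_ascii_except('hi 🎉 ¯\\_(ツ)_/¯ bye 🥳'))
```

The difference from the generated script is the order of operations: protect the thing you want to keep, then strip, instead of stripping one character at a time and comparing the leftovers to a nine-character string.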
And, believe me, I have about a thousand more words in me about how that ¯\_(ツ)_/¯ ASCII art is technically a mix of ASCII and non-ASCII characters and most ASCII art is multiple characters and many emoji are often actually multiple combined characters masquerading as a single character so looking through a file a single character at a time is doomed to fail, but all that doesn’t matter.
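The short version of those thousand words, as a sketch. The "woman shrugging" emoji is my own pick for an example of a combined sequence, written with escapes so the individual code points are visible.

```python
shrug_art = '¯\\_(ツ)_/¯'

# Which of the nine characters in the shrug are plain ASCII?
print([c.isascii() for c in shrug_art])

# "Woman shrugging" renders as one glyph but is four code points:
# person shrugging + zero-width joiner + female sign + variation selector.
woman_shrugging = '\U0001F937\u200D\u2640\uFE0F'
print(len(woman_shrugging))  # 4
```

Iterating over either string one character at a time only ever hands you fragments, never the whole shrug.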
What does matter is the authors assumed that this code works (it doesn’t), asked ChatGPT how it works (it doesn’t) and ChatGPT responded with some confident bullshit about character encodings that sounds technical and programmer-y but doesn’t actually have much bearing on how it works (it doesn’t), and the authors bought it. Yadda yadda yadda anyone can code!
So when the authors said
In case you’re thinking, “there’s no way that works,” you’re wrong.
that was, um, also wrong.
If I was being charitable, I’d say the authors are over-excited about these chatbots, tried to use one to prove a point, and didn’t check their work. Like even a little bit.
If I was being less charitable, I’d say the authors appear to believe anything a chatbot tells them. Which is mildly terrifying.
I mean, look: these chatbot doohickeys are going to get better. I can see a compelling case for working with one as a springboard or co-pilot of sorts, where you let it handle the boilerplate and trawl the internet for answers for you, and then you work together to refine the results. But I wouldn’t blindly trust one with a coding task today.
And in the meantime, watch out for confident bullshit from both your LLM chatbots and the people hyping them.
But what do I know? I’m just an expensive, inefficient software developer.
¯\_(ツ)_/¯
(And, yes, I am aware of Brandolini’s law, thank you.)