
In the past couple of weeks, two different projects of mine have been released that were powered, at least in part, by a Python script that eliminated enormous amounts of labor from a process that used to take hours of drudgery.
Both The Upgradies and Feuding Families rely on compiling a list of the most common answers from hundreds of submissions typed into a free-entry box in a Google Form. As you might expect, this leads to some pretty inconsistent data entry. Poll people about their favorite Apple product of the year and you'll get Mini, Mac mini, M4 Mac mini, The Mini, The Mac Mini, The New Mac Mini, and even things like Macmini and Mca mini and Macini.
I started down the path of automating because I just thought computers would do a better job of counting identical input than humans would. And that's true, but the more I thought about it, the more I wanted the tool to go beyond counting identical entries. I wanted it to throw all the similar entries into the count as well. Why not?
And so I created the first iteration of this tool, which Myke Hurley and I have been using for our projects for a year or two. It's a Python script that's just inserted into a one-line Shortcut (run shell script, since you can choose Python from a list of shells) for convenience's sake.
All the original script does is read the clipboard, put everything in title case (thereby avoiding differences in capitalization), and then use regular expressions to strip out extraneous spaces and any leading "The".
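For the curious, here is a minimal sketch of that kind of cleanup. It is not the actual script: the function name normalize and the two regular expressions are my own assumptions about what "strip extra spaces and a leading The" might look like.

```python
import re

def normalize(entry: str) -> str:
    """Return a cleaned-up version of one free-text answer (hypothetical helper)."""
    text = entry.strip()
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    # Drop a leading "The " regardless of capitalization.
    text = re.sub(r"^the\s+", "", text, flags=re.IGNORECASE)
    # Title case evens out differences in capitalization.
    return text.title()

raw = ["Mac mini", "  the   mac Mini ", "THE MAC MINI", "mac  mini"]
print([normalize(r) for r in raw])
```

All four sample entries come out as the same string, which is what lets a simple count treat them as one answer.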
Once all of that is normalized a bit, it's run through a pretty amazing Python class called Counter, contained within the collections module. It takes a list and returns a dictionary-like object that maps each item to the number of times it appears. My script processes it, using the most_common method to sort by count, and formats it fancy for export. And with that, a data set of one, two, three, four, four, four, five, one becomes:
Four 3
One 2
Two 1
Three 1
Five 1
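The counting step can be sketched in a few lines. This is an illustration rather than the script itself: the variable names and the plain "item, space, count" output format are assumptions on my part.

```python
from collections import Counter

answers = ["one", "two", "three", "four", "four", "four", "five", "one"]

# Counter tallies how many times each (title-cased) answer appears.
counts = Counter(a.title() for a in answers)

# most_common() returns (item, count) pairs, highest count first;
# items with equal counts stay in the order they were first seen.
report = "\n".join(f"{item} {n}" for item, n in counts.most_common())
print(report)
```

Run on the sample data, this prints the same five lines shown above.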
So that's good, but it still requires quite a bit of merging items that aren't quite close enough to be caught and normalized by my small stack of regular expressions. Enter Six Colors member Adrian, who suggested using the Levenshtein method to match similar strings to one another. There's even a Python package that does the trick.
So I decided to recreate my script using Adrian's suggested approach. After normalizing the list by removing articles, punctuation, extra spaces, and different cases, my script loops through the list and uses the Levenshtein ratio to decide if a string is close enough to be considered part of the larger group.
Then I brought in ChatGPT to do the dirty work of formatting and sorting the output in a way that was pleasing to me. The result is that a data set of one, two, three, threee, five, one hundred, a hundred, one hundreed, theree, four, fore, five, one becomes:
Three 3
One Hundred 3
One 2
Five 2
Four 2
Two 1
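To give a feel for the matching loop, here is a rough, self-contained sketch. It is not the script from the gist. To avoid a third-party dependency it computes Levenshtein distance in pure Python, and the similarity function, the group_similar helper, and the 0.75 threshold are all hypothetical choices of mine. The Levenshtein package's own ratio may be normalized differently, so its scores (and the merges it produces, including the ones in the list above) will not necessarily match this sketch.

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def similarity(a: str, b: str) -> float:
    """Assumed score: 1.0 for identical strings, falling toward 0.0 as edits pile up."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

def group_similar(entries, threshold=0.75):
    """Greedy grouping: visit answers from most to least common and fold each
    one into the first existing group whose label is similar enough."""
    groups = Counter()
    for entry, n in Counter(entries).most_common():
        for label in groups:
            if similarity(entry, label) >= threshold:
                groups[label] += n
                break
        else:
            groups[entry] = n  # nothing close enough, so start a new group
    return groups

sample = ["Boba Fett", "Boba Fett", "Bobba Fett", "Boba Fet",
          "Bossk", "Bosk", "Greedo"]
for label, n in group_similar(sample).most_common():
    print(label, n)
```

Visiting the most common spelling first means the group label tends to be the spelling most people used, which is usually the correct one. The threshold is the knob to fiddle with: too low and short answers start merging with unrelated ones, too high and obvious typos stay separate.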
I tested this all with actual Feuding Families data. Here's the result of a real list of nearly 700 poll submissions answering the question "Name a bounty hunter in Star Wars":
Boba Fett 335
Ig-88 86
The Mandalorian 41
Bossk 36
Mando 34
Greedo 27
Jango Fett 27
Ig-11 13
It's not perfect: I'm going to have to manually merge The Mandalorian with Mando, but that's it! And it managed to merge Baba Fet, Babo Fett, Bob A Fett, Bob Fett, Boba Fet, Boba Fety, Bobafett, Bobba Fet, Bobba Fett, Bobs Fett, Boda Feta, Bona Fett, and Bubba Fett together into a single "Boba Fett" answer.
Finally, the Mac-friendly integration: Because the script accepts input and generates output, saving it as a Shortcut and opting for it to appear in the Services menu means I can actually use it in pretty much any Mac text editor. I just select the text of the list, choose Count Duplicates in List from the Services submenu under the application menu, and the uncounted list is replaced with a counted one.

Anyway, if you ever find yourself in the very specific need of processing a bunch of similar, but not identical, answers, the script is available as a gist on GitHub.