The Magic of No Magic

On a recent episode of This American Life, Teller (of Penn & Teller) said this wonderful thing about practicing magic:

"If you understand a good magic trick, and I mean really understand it right down to the mechanics at the core of its psychology, the magic trick gets better, not worse."

I love magic, and Teller might be my favorite magician, so there was something lovely and surprising about hearing him say this (in his also surprising Brooklyn accent). The more you know about a trick, the better it gets.

Penn & Teller doing the "Cups and Balls" trick with clear plastic cups. Source


As much as I love watching magic, I'll never be a very good magician because I'll never put in the time to learn it. As a practice, it just doesn't hold my interest.

But since I was a kid, I have steadily learned more and more about computers. Over the course of the last few years, it's become a slow-burning obsession of mine to figure out how they work. From top to bottom. How do individual transistors produce these lush worlds of meaning and interconnectedness?

At some point, it became clear that I would never understand computers without learning how to use one as a tool. A tool that I can personally wield. It's a happy accident that you can also make good money these days by knowing about computers, and maybe one day I'll do that too.

At another point, a bit later than that, I realized that programming is fundamentally the manipulation of words. Words! Hey, I know words! Each program is a thoughtful letter to the computer telling it what you would like it to do.

That's how it feels anyway. And in a sense that's right, but in another sense I'm just sending electrical impulses through keyboard strikes into a machine able to assemble them into something that appears meaningful to you and me, but which is nothing but electrical states (billions of them) to the computer.

Knowing that only makes the trick better.

Regex

One of the less heralded innovations leading to widespread literacy, a few hundred years before Gutenberg (and at least as important), was the invention of spaces between words and lowercase lettering. IMNOTSUREACULTUREWITHTEXTLIKETHISISCAPABLEOFPRODUCINGTHECRITIQUEOFPUREREASONITSINSCRUTABLEENOUGHASITIS

Regex is like magic but the syntax is Byzantine (or maybe earlier than that — sorry if I offended any Byzantines). It's the only system of written text that I'm aware of that is easier to learn to write than to read. Though if I'm honest I might put English into that category for myself. It's easier, increasingly, to write than to read 1,000 words in a sitting.

But my inattention to written English is no excuse for Regex, which does what it does so satisfyingly but does it LIKETHISWHICHISNEARLYUNREADABLEBYTHEHUMANEYE.

Most mainstream programming conventions that are still around are there for a reason. There have been enough obsessive people along the way to root out most of the truly nonsensical stuff. So rather than judge Regex for its squished syntax, I'll appreciate that it's the human writer and the computer reader that it’s aiming at.

That's what I'm assuming, anyway. I'm sure someone who writes and reads Regex for a living (I'm assuming there are such people) would want to hit me on the head for suggesting the addition of more parentheses or spaces to their marvel of efficiency, which is perfectly readable if you take the time to learn it, thank you very much. Or maybe they would just call me a /n0{2}b\scuck/, who knows?
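For what it's worth, Ruby's regex syntax does offer a version with spaces: the x flag tells the engine to ignore whitespace and #-comments inside the pattern. Here is a minimal sketch of the same pattern written both ways. The pattern is a made-up variation on the one above (the trailing \w+ and the anchors are my additions), and the names COMPACT, SPACED, and SAMPLES are purely illustrative.

```ruby
# Compact form: the way regex is usually written.
COMPACT = /\An0{2}b\s\w+\z/

# Same pattern with the x flag: whitespace and comments inside the
# literal are ignored by the engine, so each piece gets its own line.
SPACED = /
  \A      # start of the string
  n       # a literal n
  0{2}    # exactly two zeros
  b       # a literal b
  \s      # one whitespace character
  \w+     # one or more word characters
  \z      # end of the string
/x

# Illustrative inputs: only the first one should match.
SAMPLES = ["n00b coder", "noob coder", "n00b"].freeze

SAMPLES.each do |sample|
  puts "#{sample.inspect}: compact=#{COMPACT.match?(sample)} spaced=#{SPACED.match?(sample)}"
end
```

Both forms compile to the same matcher, so the longer one costs nothing at run time. It only changes who the pattern is written for.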

Programming is Scrappy

Maybe it's because, like so many nerdy kids, my values were shaped by Star Wars. Who is cooler than Han Solo, and what's cooler than the Millennium Falcon? A space-faring Argo built, rebuilt, and patched with whatever is lying around by whoever happens to own it at the time. It's impossible to design the Millennium Falcon from scratch: it's an expression of its own desired function, shaped by what it needs to do. It's put together one piece at a time, as necessary. It mostly works, most of the time, and when it works it works (mostly) as well as any ship in space.

Maybe it's Han Solo's fault: I love scrappiness. There is something troubling about a complex plan that is executed "perfectly" and without alteration along the way. How could it possibly be right from conception to execution? The 20th century was rotten with examples of genius plans executed with tragic efficiency.

Is it the Han Solo or the Yoda in me who believes that the world provides all the feedback and wisdom you need if you keep an ear out and adapt yourself?

Photo by Andres Rueda Source: Wikicommons


This is all an indirect way of saying how delighted I've been to learn that programming is a scrappy process. The better I get at it, the scrappier it becomes. Computers, like any musical instrument, give you immediate feedback to tell you when you are playing a note wrong. And, as with music, the better you are with computers the more you can improvise.

My goal was to make a simple web scraper that would look on the websites of Best Buy, Amazon, and Target and tell me if Nintendo's SNES Classic is available for purchase.

I got to work and after a little while built the whole program — the command line interface and objects for each store — and just needed to plug in the web scraper.

But when I did, the webpage element I needed (the text on the "Add to Cart" button) kept coming up blank. At first I thought I was isolating the wrong element, but after several tries and several page reloads I realized it wasn't there. Open-URI was reading Best Buy's HTML just fine, but Best Buy uses JavaScript to update item quantities. So when the HTML first loads, and Open-URI reads it, the "Add to Cart" button is blank, waiting to be filled in by JS telling it if there's anything to add to the cart, or if it should instead display "Coming Soon" or "Out of Stock" or "Pre-Order," etc.

For a moment I considered dropping the project entirely. I had built the whole thing, so that idea sucked. But if it wasn't possible, it wasn't possible.

Then I decided to Google around: "Nokogiri ruby fully load page." I pretty quickly found Watir, a Ruby gem which actually opens a browser window and allows you to perform actions on it ("click this element," etc.). All I needed Watir to do was open the browser window and feed the final doc into Nokogiri (Watir has its own Nokogiri-like functions, but I thought using those would be out of the scope of this project).

It all works. I set Watir to run "headless," meaning it drives Chrome without displaying a browser window. It's all in the background.
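The Watir-to-Nokogiri handoff described above might look roughly like the sketch below. It is not the actual project code. The method names, the status symbols, and the URL and CSS selector in the usage comment are all hypothetical, and it assumes the watir and nokogiri gems, a local Chrome install, and a Watir version that accepts the headless: true option.

```ruby
# Hypothetical helper: turn the button's text into a status symbol.
# The four labels are the ones mentioned in the post.
def availability(button_text)
  text = button_text.to_s.strip
  # A blank button is what the static, pre-JavaScript HTML produced.
  return :unknown if text.empty?

  case text
  when /add to cart/i  then :available
  when /pre-?order/i   then :preorder
  when /coming soon/i  then :coming_soon
  when /out of stock/i then :out_of_stock
  else :unknown
  end
end

# Hypothetical helper: let a headless Chrome load the page, then hand the
# resulting HTML to Nokogiri. The gems are required here, not at the top,
# so the helper above stays usable without them installed.
def rendered_button_text(url, selector)
  require "watir"
  require "nokogiri"

  browser = Watir::Browser.new(:chrome, headless: true)
  begin
    browser.goto(url)
    # Assumption: the button is filled in by the time goto returns.
    # A real page may need an explicit wait before reading the HTML.
    doc = Nokogiri::HTML(browser.html)
    node = doc.at_css(selector)
    node ? node.text : ""
  ensure
    browser.close
  end
end

# Usage (placeholder URL and selector, not the real store markup):
#   text = rendered_button_text("https://example.com/snes-classic", "button.add-to-cart")
#   puts availability(text)
```

Keeping the text-to-status step separate from the browser step means the slow part, launching Chrome, happens in exactly one place, and the rest can be checked against plain strings.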

The solution is a lot slower than originally intended (it takes 30-60 seconds to load the pages) but it works, and from the user's perspective nothing is different but how long it takes. I’m 90% certain there are APIs available that would make this instantaneous and less prone to breaking in the future, but this is a web scraping project and, however scrappy, it does the job.