Programming is Scrappy

Maybe it's because, like so many nerdy kids, my values were shaped by Star Wars. Who is cooler than Han Solo, and what's cooler than the Millennium Falcon? A space-faring Argo, built, rebuilt, and patched with whatever is lying around by whoever happens to own it at the time. It's impossible to design the Millennium Falcon from scratch: it's an expression of its own desired function, shaped by what it needs to do. It's put together one piece at a time, as necessary. It mostly works, most of the time, and when it works it works (mostly) as well as any ship in space.

Maybe it's Han Solo's fault: I love scrappiness. There is something troubling about a complex plan that is executed "perfectly" and without alteration along the way. How could it possibly be right from conception to execution? The 20th century was rotten with examples of genius plans executed with tragic efficiency.

Is it the Han Solo or the Yoda in me who believes that the world provides all the feedback and wisdom you need, if you keep an ear out and adapt yourself?

Photo by Andres Rueda. Source: Wikicommons

This is all an indirect way of saying how delighted I've been to learn that programming is a scrappy process. The better I get at it, the scrappier it becomes. Computers, like any musical instrument, give you immediate feedback to tell you when you are playing a note wrong. And, as with music, the better you are with computers the more you can improvise.

My goal was to make a simple web scraper that would look on the websites of Best Buy, Amazon, and Target and tell me if Nintendo's SNES Classic is available for purchase.

I got to work and after a little while built the whole program — the command line interface and objects for each store — and just needed to plug in the web scraper.

But when I did, the webpage element I needed (the text on the "Add to Cart" button) kept coming up blank. At first I thought I was isolating the wrong element, but after several tries and several page reloads I realized it wasn't there. Open-URI was reading Best Buy's HTML just fine, but Best Buy uses JavaScript to update item quantities. So when the HTML first loads, and Open-URI reads it, the "Add to Cart" button is blank, waiting to be filled in by JS telling it if there's anything to add to the cart, or if it should instead display "Coming Soon" or "Out of Stock" or "Pre-Order," etc.

For a moment I considered dropping the project entirely. I had built the whole thing, so that idea sucked. But if it wasn't possible, it wasn't possible.

Then I decided to Google around: "Nokogiri ruby fully load page." I pretty quickly found Watir, a Ruby gem that actually opens a browser window and lets you perform actions in it ("click this element," etc.). All I needed Watir to do was open the browser window and feed the final, fully rendered HTML into Nokogiri (Watir has its own Nokogiri-style functions for finding elements, but I thought using those would be outside the scope of this project).

It all works. I set Watir to run "headless," meaning it launches Chrome without displaying a window. It all happens in the background.

The solution is a lot slower than originally intended (it takes 30-60 seconds to load the pages) but it works, and from the user's perspective nothing is different but how long it takes. I’m 90% certain there are APIs available that would make this instantaneous and less prone to breaking in the future, but this is a web scraping project and, however scrappy, it does the job.