The Chair
Several Years Ago
Several years ago, I was in the market for a new chair for my office. I don't buy furniture frequently, so I performed a significant amount of market research before deciding to purchase what the industry might call an "aspirational" chair: a chair with an outstanding reputation for quality, design, comfort, and durability, one which, with proper maintenance, could conceivably find a place, one day, in a grandchild's home.
I visited the manufacturer's web site, and was immediately confronted with a dizzying array of options: different woods, different finishes, and different fabrics of several varieties and colors. Some combinations, oddly, were simply not available, even though their individual components were available in other configurations. Other decisions came with consequences in both price and shipping time.
Maddeningly, there was no way to look at a grid of options—materials, availability, delivery estimate, and price—without clicking through a slick, reactive interface, choosing each component of the chair in order, and learning, at the end, what the cost and ship time would be.
After clicking through several configurations and taking detailed notes on each for about 20 minutes, I decided that finding the chair I wanted was ultimately an optimization exercise: I liked some combinations more than others, some prices more than others, and some delivery times more than others, and, having found no reliable pattern, there was no way I was going to be able to decide among all the options without a long, frustrating, and exhaustive search.
A Shocking Visit
The manufacturer happened to have a popular showroom several miles away, so I hoofed it to their storefront in the hope that I could speak to a sales rep who would have more comprehensive access to options and consequences. "Unfortunately no," the general manager explained. "When we want to look at all the options, we have to do the same thing you do: log onto the web site, click through all of them, and write them down by hand. Availability changes all the time, and you don't know what you'll find until you look."
While this, it seemed, was mildly inconvenient for the store's sales team, it obviously wasn't inconvenient enough for them to demand a daily report from the manufacturer's inventory management system, nor was the experience frustrating enough to deter the customers thronging the store and leaving with loosely committed delivery dates for chairs, ottomans, end tables, and other accessories.
Returning to the web site later that afternoon, I found no angry reviews about the interface or about the impossibility of enumerating all the options available to customers. The situation was, surprisingly to me, just fine with everyone. Either they didn't care about price as much as I did, or they had a very high level of patience, or they were, by and large, the kind of people who knew exactly what they wanted and ordered it without regard to anything else.
The kind of people entirely unlike me.
Automation and Ethics
I don't at all blame the manufacturer or its design team for producing an interface entirely unsuited for the way I personally shop. It was extremely well done. Clearly they had performed at least as much market research as I had, and understood their customers deeply. Design is about balance, and in this world of aspirational chairs, balance clearly favored an ethic of exploration rather than an ethic of optimization.
Regardless of what everyone else needed from their visit to the site's configuration wizard, however, I needed an exhaustive list, and was determined to get it.
I launched Chrome and examined the site. It was entirely reactive, with dynamically-loaded components, but it was still HTML and JavaScript. It might be a tangled mess, but it did not depend on opaque controls and there were no obvious attempts to obfuscate the interface to make it impossible for the user to select elements. It would be possible to use web automation tools to iterate through all the thousands of options and produce a table for them all.
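The core of an exhaustive search like this is generating every combination of options up front, so the automation can walk them in a fixed order. A minimal sketch, with hypothetical option lists standing in for the real site's woods, finishes, and fabrics:

```python
import itertools

# Hypothetical option lists; the real site's names and counts differed.
woods = ["walnut", "oak", "maple"]
finishes = ["natural", "dark"]
fabrics = ["leather", "wool", "linen"]

# Every combination the automation would need to visit, in a fixed order,
# so a run can be stopped and resumed at a known position.
combinations = list(itertools.product(woods, finishes, fabrics))
```

With real catalogs the product of even a handful of option lists runs into the thousands, which is exactly why a fixed, resumable ordering matters.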
I'll acknowledge, before sharing the rest of the story, that web sites, in general, do not want you to automate their interfaces. They will often state that explicitly, they may include "robots.txt" files instructing web crawlers not to index their content, and their terms of service may spell out serious consequences for people who ignore those warnings, including having their accounts suspended or banned.
There are a lot of good reasons for policies like these. Exercising a server for a comprehensive list of database-driven content creates a tremendous load on back-end systems for very little value: unlike the static content indexed by search engines, for example, dynamic content can change every minute or every day, and is probably out of date by the time anyone would benefit from it. And "web crawlers," as a class, are not always well behaved: they can create a flurry of activity with no break, effectively constituting a Denial of Service (DoS) attack against a commercial portal. Trawling a company's web site against their policies or their desires is, generally, not good citizenship.
I defended my intent to exhaustively search through all options in this way:
I would, whether it was manually or automatically, exhaustively search through all options regardless. It might take me a full day to do so—which I would undoubtedly spread over multiple sessions across several days—but the automation would not affect the total load against the manufacturer's front end or back end systems. It would only change the agent.
I would, to the extent possible, run the automation the way I would perform the same actions myself, with plenty of spacing between actions and plenty of breaks, so that my choice of agent would not impact other users' experience.
I would, in the end, use the results to do what I was purporting to do: pick a chair and buy it. I wasn't there to publish or monetize the results, or to perform some personally profitable exercise across dozens of manufacturers daily, with all of the benefit going to me and none of it to them.
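The second point, pacing the automation to look like a patient human rather than a crawler, can be as simple as a randomized pause between page interactions. A small sketch, with illustrative timing values (the right numbers depend on how quickly a person would realistically click through the same flow):

```python
import random
import time

def polite_pause(base_seconds=8.0, jitter_seconds=4.0):
    """Sleep for a human-ish, randomized interval between page actions.

    base_seconds and jitter_seconds are illustrative defaults, not values
    from the original project. Returns the delay actually used.
    """
    delay = base_seconds + random.uniform(0.0, jitter_seconds)
    time.sleep(delay)
    return delay
```

The jitter matters as much as the base delay: perfectly regular requests are a fingerprint of automation, and bursts of them are what make a script indistinguishable from a badly behaved crawler.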
Automation in Practice
There are a number of good tools on the market to perform all sorts of automations across applications, whether those are web browsers or native applications running on the desktop.
The general category of these tools is "Robotic Process Automation," and while that may elicit visions of a gleaming metallic robot sitting at a desk, manually typing at a keyboard (a robot, maybe, like the deceptively beautiful machine in Fritz Lang's "Metropolis"), in fact RPA is nothing more than using custom libraries in ordinary programming languages to interface in some way with running applications: maybe through inspection, maybe through a debugging API, or maybe by intercepting or otherwise accessing Win32 API calls on the desktop. It's just a computer controlling your application, nothing more and nothing less.
These tools have great value to enterprises, which often have legacy applications with deep and possibly unknown business logic that function perfectly well and would cost a small fortune (not to mention be highly disruptive) to replace. Corporations therefore use a variety of these tools to create automatons that manipulate one or more of their legacy applications to do things that the designers of those applications did not consider, or that (like my example here) are unusually tedious, error-prone, or otherwise awful. With enterprise usefulness comes enterprise pricing, and many of these tools are simply out of reach for the average individual, consumer, or hobbyist.
An exception to this is Selenium, which (as of this writing) is a free and open-source (F/OSS) RPA tool commonly used by Quality Assurance (QA) engineers to test web-based Software as a Service (SaaS) offerings. Rather than employing teams of testers to click through every feature of an application, QA engineers write scripts against a proposed build of a web site using the Selenium library to exercise its workflows and "pass" or "fail" the build entirely programmatically.
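A QA-style Selenium script is shaped something like the sketch below. The URL, selectors, and function name are all hypothetical; in real use the driver would be a selenium.webdriver instance, but the function only assumes an object exposing get() and find_element(by, selector), so the shape of the flow can be shown (and exercised) without a browser.

```python
# Hypothetical sketch of a Selenium-style workflow. In real use, `driver`
# would come from selenium.webdriver; any object with the same get() and
# find_element() surface works, which is also how such flows are unit-tested.

def configure_and_read_price(driver):
    """Walk one chair configuration and return the displayed price text."""
    driver.get("https://example.com/chair/configure")
    driver.find_element("css selector", "#wood-walnut").click()
    driver.find_element("css selector", "#fabric-leather").click()
    return driver.find_element("css selector", ".total-price").text
```

A QA engineer would wrap a call like this in a pass/fail assertion; my variant would instead record the returned text, one configuration at a time, into a local table.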
All of this, of course, should sound very much like what I wanted to do, except that instead of confirming that the web site functioned correctly, I would assume that it did, and save the results of each workflow, keyed to its product configuration, in a local database.
Which is what I did.
I continued my inspection of the site, using the developer tools in Chrome and the Selenium IDE to monitor and capture a flow for a sample product configuration from start to finish. I identified how the underlying HTML indicated unavailable combinations and which elements would contain both price and ship time. A little parsing logic converted the dynamic text strings into consistent, predictable numbers. I baked long, and, I hoped, considerate, delays into each click and refresh.
And I coded error handling. Lots and lots of error handling.
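The general shape of that error handling, retrying a flaky page interaction with a growing pause, can be sketched as follows. The attempt count and backoff are illustrative; in practice they would be tuned around how the site actually fails (stale elements, slow loads):

```python
import time

def with_retries(action, attempts=3, backoff_seconds=2.0):
    """Run a flaky zero-argument action, retrying with a growing pause.

    Illustrative sketch: real code would catch the narrower exception
    types the automation library raises, not bare Exception.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            time.sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

Wrapping every click and read this way is what lets a multi-day run survive the inevitable timeouts instead of dying at combination 1,407 of 3,000.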
One night, after the script had survived a few rounds of testing and looked complete, I set it against all combinations and ran it in "production" over the course of a few days.
I checked the database and overall progress as it executed, and before the end of the week was rewarded with a table of data that would, ostensibly, help me to make the same decision, with confidence, that every other customer was making entirely on their own.
Results
Was it worth it?
It was a smaller investment than the alternative: I spent less time getting the results through all of this effort than I would have spent generating the table by hand, and the experience was far less frustrating.
More importantly, I was both surprised and not surprised. I entered the experiment with the (unproven but strongly-felt) suspicion that somewhere in the universe of designer chairs there was a "diamond in the rough," some combination of configuration, price, and delivery that was not just "optimal," but whose value to me could not be otherwise guessed or deduced from quickly examining a tree of options through some informal method similar to a hill-climbing algorithm. I believed, without evidence, that only an exhaustive search would find the one combination that would be visually appealing, match the rest of my office, and come at both a price I could afford and a delivery date I could tolerate.
And that's exactly what happened. I reviewed the table and was surprised by how good the best option was, how well it fit my budget, and how soon I could get it. I was not surprised that my intuition turned out to be right—or maybe I was, just a little. Computers are great at exhaustive explorations, and, here, my computer, unsurprisingly, turned out to be better than me.
Why I'm Sharing This Story
You might wonder why I'm sharing this story, as, on the surface, it is neither about Artificial Intelligence (AI) nor about design. RPA is an interesting topic—and, sometimes, there are great RPA stories where the robots are semi-autonomous agents and use Machine Learning (ML) to make decisions about their actions rather than simply performing an exhaustive script.
This story is not one of those. And while it dives into a problem space that, like my e-mail management project, involves looking for the proverbial needle in a haystack, it does not directly continue the theme I began of using Large Language Models (LLMs) to filter large volumes of content. What it does do, though, is introduce RPA and third-party software into this discussion, along with the value and pitfalls of reactive web design and some of the ethics of using tools like Selenium to automate one's interactions with other companies' services.
These will all be important themes for next time, when we move on to social media, a new problem with filtering, and, ultimately, how to thoughtfully introduce and use a database of all of this content to make knowledge management both easier and more capable.