Our newest computer-using users


Seattle

A few weeks ago, I finished up my post on AI ethics. In it, I talked about 2025 being the year of AI agents, and how I am specifically interested in the future of autonomous computer-using agents. Basically, instead of crawling and scraping webpages to take an action and hoping that those pages never change, we will see more approaches that use vision and contextual understanding to closely mimic the usage patterns of a human.

Today, OpenAI released Operator, which is still in its early stages but is already showing real promise. More importantly, it is likely to be the first time that much of the world is introduced to the future of the internet, and possibly of their jobs.

In my opinion, this is big. A reliable system that can use a computer, and possibly bring reasoning models like OpenAI's o-series and DeepSeek R1 into everyday knowledge work, is pivotal to the predictions I made in the ethics post about the displacement of knowledge and computer work.

Since the advent of jobs behind a computer, there hasn’t been a perfect way to automate the random busy-work tasks that take a bit of time every day. With the new patterns we are seeing, like scheduled tasks and now AI computer use, we are clearly getting close to a reliable solution that will first fix that problem and then take on the rest of the job.

For businesses, this means there are some interesting choices to make. For the average B2C business, I see it as almost a requirement to start thinking about AI-SEO. By that, I mean sites have to be optimized for the real-time web browsing capabilities of AI models, with the majority of the experience tuned for your new AI computer-using ‘users’. For other businesses, I think it’s already time to start thinking about the user experiences this will unlock for your employees. One that comes to mind is healthcare charting: with efficient AI computer use in the near future and some way of collecting visual and audio context, maybe healthcare professionals can move beyond manual documentation.

Questions and speculations

Something that came to my mind during the Operator livestream was how the AI is browsing on an instance of Chrome. I wonder a few things:

  • Long term, will we see that computer-using agents continue using Chrome, or could sites become so optimized for these agents that we no longer develop certain sites for human usage, and in turn see new browsers developed for this use case? In other words, are sites of the future more like APIs for computer-using agents?

Example: people no longer go to OpenTable to book a table. Instead, they ask their assistant to book them a table for 2 at a place they think they’ll like. Every restaurant has a site optimized for computer-using agents, which gives the agents easy access to the information in a form they can readily understand.

  • There could be various benefits to these new agent-specific browsers, such as component libraries and stylesheets that have proven more understandable or intuitive for the agents. In this scenario, could we see new frameworks rise to dominance by focusing on this? Like a shadcn/ui, but with no downloads necessary and not focused on human visual intuition.

The majority of the logic would instead live in the environment of the AI agent, and we would develop a simple standard for relaying real-world data, like table availability, to the agent. This could be HTML of course, but should it be?

  • Focusing on getting an agent to use a browser the way a human does makes a lot of sense right now, since it’s a lot more feasible than having every site owner update their site to some new spec. But what if we did create a new browser and spec, and businesses got an easy way to feed it their context? Again, like giving it access to the number of open tables. I can imagine a future where personal assistants work great with this new browser and can easily check table availability without having to sift through a human-optimized site. The personal assistant is the ‘auth’ and would interface with a proof-of-humanity service to take actions for its owner.
  • Would someone own this spec, browser, agent, and proof of humanity?
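To make the "simple standard" idea a bit more concrete, here is a rough Python sketch of what the agent side could look like. Everything in it is an assumption on my part: the JSON field names, the `available_slots` and `build_booking_request` helpers, and the `proof_of_humanity` token are all made up for illustration and do not come from any existing spec or service.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical feed a restaurant might publish for agents.
# The field names and values are illustrative, not part of any real standard.
FEED = """
{
  "restaurant": "Example Bistro",
  "slots": [
    {"time": "18:00", "max_party": 2, "open_tables": 0},
    {"time": "19:30", "max_party": 4, "open_tables": 3},
    {"time": "21:00", "max_party": 2, "open_tables": 1}
  ]
}
"""


@dataclass
class Slot:
    time: str         # "HH:MM", 24-hour clock
    max_party: int    # largest party a table in this slot can seat
    open_tables: int  # tables still free in this slot


def available_slots(feed_text: str, party_size: int) -> list[Slot]:
    """Return the slots that still have a table big enough for the party."""
    feed = json.loads(feed_text)
    slots = [Slot(**raw) for raw in feed["slots"]]
    return [s for s in slots if s.open_tables > 0 and s.max_party >= party_size]


def build_booking_request(feed_text: str, party_size: int, humanity_token: str) -> dict | None:
    """Pick the earliest suitable slot and attach the owner's (hypothetical) attestation."""
    candidates = available_slots(feed_text, party_size)
    if not candidates:
        return None
    # Zero-padded "HH:MM" strings sort in chronological order.
    earliest = min(candidates, key=lambda s: s.time)
    return {
        "time": earliest.time,
        "party_size": party_size,
        "proof_of_humanity": humanity_token,  # placeholder for whatever the service would issue
    }
```

The point of the sketch is that the agent never has to look at a rendered page: it reads a small structured feed, applies its owner's preferences locally, and sends back a request that carries proof it is acting for a real person. How that proof is issued and verified is exactly the open ownership question above.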

Obviously, in the short term, sites and apps will still be optimized for human usage, and AI vision will keep improving at understanding the world through a human lens. I just wonder if all the dots can connect to bring us incredible productivity, even in the tiny tasks that we do mindlessly.
