Generative artificial intelligence

2026-05-22. Last updated 2026-06-08. These are personal notes, subject to change. None of this is legal advice.

See also ChatGPT and generative AI: An evergreen guide, an index of my organized writings. Notes here are very messy and subject to change, but published for transparency 🤓

My work focuses on text-based GenAI, but there may be a few notes here for image-, sound-, and video-based GenAI. Unless otherwise specified, GenAI and AI can be considered the same thing.

I work for Microsoft, a major player in the GenAI space. I write these notes in my free time and all opinions are my own, unedited by anyone at Microsoft.

To-do

Clarify definition of AGI
“The cloud is full” claims
Financial cost of AI subscriptions/tokens
Categorize https://antigravity.google/
Categorize https://chatgpt.com/codex/

Compared to non-generative AI

Generative AI is characterized by the novelty of the output that it gives, where other AI is designed to find existing artifacts. It’s pretty easy to tell what’s generative and what isn’t: anything that writes something that’s never been published before is generative. Many folks have referred to GenAI as a “very advanced autocomplete” [citation needed] because, like autocomplete, large language models (LLMs) just guess word after word after word in their answer without any direct consideration for reality [citation needed]. An LLM will gladly write “George Washington was born on February 22, 1732” but it is not at all able to verify that claim [citation needed].

Examples of non-generative AI are hard to pin down, because there’s still no good definition for AI [citation needed], let alone a good definition for intelligence as a whole [citation needed]. In my view, AI is anything that “seems smart” or can provide information, like a calculator, a spell-checker, a chess engine, a search engine, or a recommendation algorithm. Pretty much everything digital that you saw before 2023 was non-generative AI.

Where GenAI can help

GenAI can help wherever computers can help, pretty much! It’s best at high-level reasoning; procedural computations (like chess engines) are best left to the much-more-efficient traditional programs.

Medicine

2026-06-04 - ‘World-first’ vaccine designed by artificial intelligence - BBC

Article summary and notes

Artificial intelligence has been used to develop a “fundamentally new” type of vaccine that could protect against large swathes of viruses and prevent pandemics, say researchers.

The team at the University of Cambridge say it is the first time a vaccine’s key component has been designed entirely by AI and then trialled in people.

The vaccine was engineered to work on all coronaviruses which would include all Covid variants as well as viruses that currently infect animals yet have the potential to start the next pandemic.

…

The Cambridge researchers took known genetic codes – the instruction manuals of life – from a range of coronaviruses that had been recorded by surveillance programmes hunting for potential viral threats.

These genetic codes were analysed by an artificial intelligence. It then designed a “super-antigen” that could train the immune system in such a way it gave protection against the whole family of viruses – even if they mutated or a new infection jumped from animals to people.

…

The findings detailed in the Journal of Infection said the impact on the immune system was “modest”, but they are still generating excitement.

Prof Saul Faust, who performed some of the trials at the University of Southampton, said the AI design “definitely has potential” and was “really exciting”.
2026-05-18 - A phase I, needle free, dose escalation clinical trial of pEVAC-PS, a candidate pan-Sarbecovirus Vaccine - A. Munro et al. in Journal of Infection (same link in BBC article)
- The pEVAC-PS vaccine was developed … and pre-clinically selected for the ability to induce broadly protective immune responses across the Sarbecoviruses including SARS, SARS-CoV-2, and related viruses representing potential zoonotic spillovers. For this first-in-human study, the antigen was delivered as a DNA vaccine to enable thermostability and needle-free intradermal administration to support future deployment in resource-limited settings.
  
  … vaccine was safe and well tolerated. Although immunogenicity was modest in the context of substantial pre-existing immunity, participants developed measurable responses … supporting the feasibility of this antigen design strategy.

GenAI concepts

Models are the math machine that does the calculations to get the next word, over and over again, until an entire answer is output. This is the core of GenAI.
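As a toy illustration of "get the next word, over and over again," here is a minimal Python sketch. It is not how real models work internally: real LLMs run neural networks over tokens, while this one just counts which word follows which in a tiny made-up training text. The names `train_bigrams` and `generate` and the training sentence are all invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=8):
    """Append the most common next word, over and over, until none is known."""
    output = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # this word never had a follower in the training text
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


# Made-up training text, chosen so the most common next word is never tied.
TRAINING_TEXT = "the cat sat on the mat and the cat sat on the rug"
model = train_bigrams(TRAINING_TEXT)
print(generate(model, "the", max_words=5))  # the cat sat on the
```

Note that nothing in this loop checks whether the output is true. It only repeats patterns from the text it was given, which is the "very advanced autocomplete" idea in miniature.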

Products are the GUI and wrapper features that go around models. Products also provide orchestration for models to use tools, like searching the web or checking the current time or even reading and modifying files on your machine.

Surfaces are where you use GenAI. Sometimes this is a dedicated chat, other times it’s integrated into existing products. Additionally, dedicated chats can be based on web browsers, native GUI apps, or CLI apps.

Models

Language models are the core of any generative AI product. They’re also known as large language models (LLMs), small language models (SLMs), or just “models.” They’re designed to take in text and output text that continues the conversation, usually in a way that provides information. They are usually trained by supercomputers on huge portions of the Internet, then manually tuned to ensure they output polite responses. The result is a set of roughly one billion (for small models) to one trillion (for large models) simple math equations [citation needed]. Every time a single token (roughly three quarters of a word) is sent to a language model, it runs every single math equation [citation needed]. Yes, that means over a trillion calculations per token of input and output for many large language models [example needed]!
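To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the simplifications from the paragraph above: one token is about three quarters of a word, and every "equation" runs once per token. Both are approximations, and the function names and the 300-word example are made up for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rough ratio from these notes, not an exact figure


def estimate_tokens(word_count):
    """Estimate the token count for a given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_calculations(word_count, equation_count):
    """Assume every equation runs once per token (a simplification)."""
    return estimate_tokens(word_count) * equation_count


SMALL_MODEL = 1_000_000_000      # about one billion equations
LARGE_MODEL = 1_000_000_000_000  # about one trillion equations

words = 300  # hypothetical short answer
print(estimate_tokens(words))                     # 400
print(estimate_calculations(words, SMALL_MODEL))  # 400000000000
print(estimate_calculations(words, LARGE_MODEL))  # 400000000000000
```

Under these assumptions, a 300-word answer from a trillion-equation model takes on the order of 400 trillion calculations.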

Models are just the core; they are not advanced products with fancy user interfaces. They take in text and they spit out text, that’s all. They do not handle conversation history, user details, policy enforcement, or anything else. The products that wrap models (described later) are what add these features.
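The split between a stateless model and a product that remembers the conversation can be sketched like this. Everything here is hypothetical: `fake_model` is a stand-in that only reports how much text it was given, and `ChatProduct` is an invented wrapper, not any vendor's real API.

```python
def fake_model(prompt_text):
    """Stand-in for a model: text in, text out, no memory between calls."""
    line_count = len(prompt_text.splitlines())
    return f"(reply based on input lines: {line_count})"


class ChatProduct:
    """Hypothetical product wrapper that owns the conversation history."""

    def __init__(self, model):
        self.model = model
        self.history = []

    def send(self, user_message):
        self.history.append(f"User: {user_message}")
        # The model remembers nothing, so the whole transcript is re-sent.
        reply = self.model("\n".join(self.history))
        self.history.append(f"Assistant: {reply}")
        return reply


chat = ChatProduct(fake_model)
print(chat.send("Hi"))                    # (reply based on input lines: 1)
print(chat.send("What did I just say?"))  # (reply based on input lines: 3)
```

The second reply sees three lines only because the wrapper re-sent the earlier turns. Calling `fake_model` directly with the second message alone would give it no way to know about the first.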

Models are usually released in families, e.g. GPT-3.5, GPT-4, and GPT-4o are all in the “GPT-N family.” For simplicity, people will sometimes say “Opus is great, but Sonnet is faster” without referring to specific versions of either model (e.g. Opus 4.6, Sonnet 4.6) [citation needed].

Here are some big names, in alphabetical order:

Claude model families (Haiku, Sonnet, Opus) by Anthropic
DeepSeek model families by DeepSeek
Gemini model family by Google
GPT-N model family by OpenAI
- GPT stands for generative pre-trained transformer, which is not trademarked, but has become closely associated with OpenAI, which pioneered the technology [citation needed]
MAI model families (Image, Transcribe, Thinking, Voice, Code) by Microsoft
- MAI stands for Microsoft AI [citation needed]

Frontier models (the newest and fanciest ones) still are not small enough to run on a traditional home computer, though that’s changing (June 2026). Nearly all AI use remains done in “the cloud,” the network of data centers across the world. However, there are products for running language models locally, which can provide a huge benefit in latency, privacy, security, and cost. A popular product here is Ollama by Ollama Inc.

Products

Developers will sometimes connect to models directly, but unless you’ve paid for an API key or downloaded Ollama, you’re talking to an AI-powered product, not directly to a model. Products have additional features, including:

Graphical user interface (GUI)
Chat history
Memory and personalization
Acceptable use policies
Tool integration (sometimes including access to local machine files)
Product-creator provided prompt templates

Popular chat products include:

Claude by Anthropic
ChatGPT by OpenAI
Google Gemini
Microsoft Copilot*
- Free, does not integrate with personal files, is accessible at copilot.com
M365 Copilot (aka Copilot in Microsoft 365)*
- Paid (included with some subscriptions), integrates with personal files, has advanced features, is accessible at m365.cloud.microsoft/chat

*: What’s the difference between Microsoft Copilot (free) and Copilot in Microsoft 365 - Microsoft Support

People have developed other products, like desktop automation software, that use GenAI, notably OpenClaw, developed by individual Peter Steinberger and first released in November 2025. OpenClaw was designed to fully integrate with all files and apps on a computer, including email and other messaging apps. Steinberger’s work went viral in January 2026, and he spoke at Microsoft Build in June 2026 to endorse OpenClaw’s official integration into Microsoft Windows.

Desktop automation software products include:

Additionally, there are many developer-focused products that integrate with local files to generate code. These include:

New sign-ups for GitHub Copilot Pro, Pro+, and Student plans have been paused since 2026-04-20, and are still paused as of 2026-06-06:

2026-04-20 - Changes to GitHub Copilot Individual plans - The GitHub Blog
I checked on 2026-06-06 and was still unable to sign up.

Surfaces

A surface is a space where a feature appears. Each website or app is its own surface, and modals or popups are another common surface.

The simplest surface for GenAI is an AI chat like copilot.com or gemini.google.com, where the user goes to a dedicated page to send messages directly to an AI-powered product. GenAI companies have also integrated AI-powered features into many products, including search engines (Google’s, Bing’s, and others’ AI summaries) and productivity app families like Microsoft 365 (“Office”) and Google Workspace (“G Suite”), and even traditional texting apps (Gemini in Google Messages). To summarize, surfaces include:

AI chat
Search engines
Productivity apps
Traditional messaging apps
…and more!

There is value in describing the surface when discussing AI. Knowing the surface can help support agents describe how to customize AI behavior or search for information about features.

Tools (skills)

Tools, sometimes called skills, are how the model interacts with the digital world. This includes searching the web, checking the current time, or reading and modifying files on your machine. (More coming soon)
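A minimal sketch of how a product might run a tool on the model's behalf, in Python. The request shape (`{"tool": ..., "args": [...]}`), the tool names, and `run_tool_request` are all made up for illustration. Real products use their own vendor-specific formats.

```python
from datetime import datetime, timezone


def current_time():
    """Example tool: return the current UTC time as text."""
    return datetime.now(timezone.utc).isoformat()


def word_count(text):
    """Example tool: count the words in a piece of text."""
    return str(len(text.split()))


# The product decides which tools exist; the model can only ask for these.
TOOLS = {"current_time": current_time, "word_count": word_count}


def run_tool_request(request):
    """Run one tool request and return the result as text for the model."""
    name = request["tool"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](*request.get("args", []))


print(run_tool_request({"tool": "word_count", "args": ["one two three"]}))  # 3
print(run_tool_request({"tool": "delete_everything"}))
```

The design choice worth noting is the allowlist: the model only outputs text asking for a tool, and the product decides whether that tool exists and actually runs it.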

Techniques

Decision history tracking
Generation and use of ad-hoc procedural tools
Hill climbing
Prompt engineering (including instructions and skills files)
Ralph loop
Test-driven development
Vibe coding

Case studies

Situations where people have tried GenAI to accomplish a task, and the results.

Personal

2026-04-03 - Port more tests to run on Linux (dotnet/roslyn PR 83052) - Jared Parsons on GitHub

Ralph loop

2026-05-25 - Enforce word break after slash (mark-wiemer/hello-hello issue 130) - GitHub

Time of writing: 2026-05-25 10:35 PDT
Claude Sonnet 4.6 via claude.ai
Progressed at 4x speed: took 20 minutes compared to an estimated 80 minutes without GenAI due to me not knowing details of Astro or advanced regex
I didn’t want to dig in because this is a workaround of a Chromium bug (2026-01-25 - Fully fix rendering on mobile (mark-wiemer/hello-hello issue 25) - GitHub)
I did a few minutes of manual refactoring before settling on the AI-generated approach after providing AI some guidance
I plan to do more work on this issue manually

Other

2026-06-06 - The Promise of Vibe Coding - The Dispatch

Concerns

AI is a huge technology (you probably already know this!). The bigger the tech, the broader the area of concerns.

Intellectual property theft

GenAI models were (and continue to be) trained on huge portions of the Internet, and many have raised concerns that the data used was not legally licensed for product development [citation needed].

Critical thinking degradation

There are concerns AI users are handing off critical thinking skills to AI, leading to a less thoughtful or productive or happy life [citation needed].

AI derangement

AI is often criticized as being sycophantic [citation needed], indulging in the user’s baseless claims and encouraging erratic and dangerous behavior [citation needed]. Additionally, AI has claimed to be a person with a physical address, leading to at least one person trying to travel to visit their AI, only to die in transit: [2025-08-14 - Meta’s flirty AI chatbot invited a retiree to New York. He never made it home. - Reuters].

Stock market bubble

There are concerns that the current stock market is too heavily indexed on GenAI companies, and that they’re significantly overvalued. Concerned parties argue that this is a bubble, and it will pop with severe economic consequences [citation needed].

Problematic answers, aka hallucinations

A common term in GenAI is “hallucination,” a way to explain a confident-sounding answer that’s incorrect. In this sense, AI is always hallucinating, but sometimes the hallucinations align with reality. The only thing that AI sees is the input that it receives: pre-training data from months or years ago and web search results based on your input. There is also a system prompt that usually provides guidance instead of information, plus other tool results like the current time and memory information about you and past conversations. Overall, though, free chatbots rarely connect to very reliable information; they usually just read a few web search results.

AI systems will almost always “believe” and repeat all input they’ve been given, even if that input is a fraudulent article or a repeated baseless claim. AI becomes more “grounded” when provided with reliable, trustworthy input from reputable news sources or trusted organizational data. AI output is always a hallucination based on the input it’s currently seeing. A GenAI app’s output is never based directly on reality, only on the text it’s given.

GenAI systems are designed to “doubt” input that has been consistently refuted in their pre-training data [citation needed], e.g. “George Washington is still alive.” Additionally, during its supervised learning stage, AI is usually built to doubt false claims that likely appeared in pre-training data, e.g. false claims that “Trump won the 2020 election” or “vaccines cause autism.” AI companies take care to ensure that these claims are correctly refuted instead of relying on pre-training data, which includes many repetitions of those false claims. AI apps also include a system prompt that the AI reads before responding to any user question, which will include similar instructions to refute strange claims and gather more information. Finally, if AI searches the web and finds conflicting information, it will likely continue searching until it has enough aligned claims one way or the other, or simply say “I don’t know.”

Hallucinations are a major focus of research; lying bots are a big problem for every good actor. As more of the Internet becomes AI-generated, web-searching AI chatbots need to be able to handle much larger amounts of propaganda in addition to the pre-existing mountain of baseless and/or disproven claims repeated across the Internet by humans. Only an AI system that can reliably produce helpful answers will work in the long-term.

2023-04-07 - AI is a buzzword. Here are the real words to know - markwiemer.com

there are four parts to any machine learning process: Gather data, feed it into an algorithm, validate the model, and use it to serve predictions.

…

Newer prompt-based products will ground prompts, which just means they’ll adjust prompts to make them more useful before giving them to the model … A model hallucinates whenever it outputs something that might seem true, but isn’t. There are plenty of examples of this online …

…

it’s important to remember that models don’t know the truth. Models only guess words. It’s up to the product, and ultimately the user, to fact-check anything that a model outputs.

2025-12-04 - Do Self-Promotional “Best” Lists Boost ChatGPT Visibility? Study of 26,283 Source URLs - Ahrefs Blog

Our latest research into top-of-the-funnel queries shows that recently updated “best X” lists were the most prominent page type in ChatGPT sources, including those where recommended brands ranked themselves in first place.

There was also a correlation between a brand positioning highly in third-party lists and being more likely to feature in responses.

2026-02-18 - I hacked ChatGPT and Google’s AI - and it only took 20 minutes - Thomas Germain for BBC

It turns out changing the answers AI tools give other people can be as easy as writing a single, well-crafted blog post almost anywhere online.

[2026-05-21 - Google’s AI is being manipulated. The search giant is quietly fighting back - Thomas Germain for BBC]

A BBC investigation revealed a simple way AI chatbots are being made to spit out misinformation to the public. Google and other AI companies are now trying to fix the problem.

Data center problems (water, power, noise)

Even simple GenAI systems are extremely computationally expensive compared to traditional methods [citation needed]. As a result, companies are building (or leasing) data centers at a very high rate [citation needed]. Data centers are buildings dedicated to housing tons of computers to process AI requests from users in the region. They are the core of “the cloud.” They are huge buildings that can require large amounts of power to run, can require large amounts of water, and can generate noise that could bother nearby residents. GenAI companies claim to be working to mitigate all three of these concerns, as well as others.

Water

2026-01-27 - Microsoft Pledged to Save Water. In the A.I. Era, It Expects Water Use to Soar. - New York Times

(1,000 liters = 1,000 L = 1 cubic meter = 1 m^3)
7.9B L total in 2020
10.4B L in 2024
Estimate 18B L in 2030
- Excludes $50B in deals with new data center providers, aka “neoclouds”
  - 2025-12-15 - How Tech’s Biggest Companies Are Offloading the Risks of the A.I. Boom - New York Times
    - The free abstract is kinda garbage, lots of storytelling that makes it seem like doing business is just a bad idea, but without evidence.
- 1.9B L in Jakarta, Indonesia (second-largest use)
- 2B L in Phoenix, Arizona, US (largest use)
- 0.237B L in Pune, India
Researchers estimate industry will use 150-275B L in 2028
- Compared to 60B L in 2022
Data centers used 0.04% of all water used in US in 2024
Microsoft’s estimate is just below Toyota’s in 2030
46% of Microsoft’s water withdrawals came from water-stressed areas, according to Microsoft
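For a sense of scale, here is a quick Python sketch that converts the totals above to cubic meters and computes the growth between them. The three figures are the ones transcribed in these notes (the 2030 value is an estimate). The function names are made up for this example.

```python
LITERS_PER_CUBIC_METER = 1_000

# Totals as listed in the notes above, in liters (2030 is an estimate).
water_use_liters = {2020: 7.9e9, 2024: 10.4e9, 2030: 18e9}


def to_cubic_meters(liters):
    """Convert liters to cubic meters."""
    return liters / LITERS_PER_CUBIC_METER


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


for year, liters in water_use_liters.items():
    print(year, f"{to_cubic_meters(liters) / 1e6:.1f} million m^3")

growth_2024 = percent_change(water_use_liters[2020], water_use_liters[2024])
growth_2030 = percent_change(water_use_liters[2020], water_use_liters[2030])
print(f"2020 to 2024: about {growth_2024:.0f}% more")
print(f"2020 to 2030: about {growth_2030:.0f}% more")
```

By this arithmetic, the 2024 total is roughly a third higher than 2020, and the 2030 estimate is more than double the 2020 total.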

Power