
Coffee of the Week
Sam Altman changes plans, doctors lose to robots and the internet goes up in flames
What's up, folks! Welcome to another edition of DevCafé, where I bring you that warm sip of the most important news from the world of technology and artificial intelligence. This week was hectic, with bombshell announcements from OpenAI, news from Meta with Llama 4, and Amazon jumping headfirst into the voice model battle. Let's grab our virtual cup and dive into the news while it's hot!
Sam Altman on X: "Change of plans: we're actually launching o3 and o4-mini, probably in a few weeks..."
Sam Altman announced on X (formerly Twitter) a change in launch plans, indicating that versions o3 and o4-mini will be launched soon, likely within a few weeks. Additionally, he revealed that GPT-5 will be launched within a few months. Altman explained that this change is due to several reasons, including the ability to make GPT-5 even better than originally planned, as well as difficulties encountered in integration. The priority is to ensure sufficient capacity to meet expected demand.
It's incredible how OpenAI manages to keep everyone on the edge of their seats, isn't it? Sam Altman has a special talent for making announcements that turn the market upside down in less than 280 characters! This change of plans seems like a smart move: releasing intermediate models while preparing the ground for the highly anticipated GPT-5. I'm particularly curious to see what this "o4-mini" will deliver – the name suggests a smaller, cheaper member of the reasoning-focused o-series rather than a GPT-4o variant. And the concern about capacity is understandable – nobody wants another launch with endless queues like we've seen in the past, right?
OpenAI slashes time given to safety testing as it races to innovate
OpenAI has reportedly reduced the time spent testing the safety of its artificial intelligence models, sparking concerns that it is rushing towards powerful AI without adequate safeguards. The company used to allow months for safety testing but now grants only a few days. AI safety officers are concerned about the catastrophic risks of AI, including facilitating the development of bioweapons, and a former OpenAI researcher recently warned about the dynamics of an arms race leading to a dangerous race for AI.
This news gives me that "haste makes waste" feeling. I completely understand the competitive pressure OpenAI is facing, especially with Meta releasing Llama 4 and Anthropic always nipping at their heels. But honestly? Reducing safety testing from months to days seems a bit like driving at 200 km/h blindfolded. If the famous open letter everyone signed two years ago talked about the existential risks of AI, how can cutting precisely the part that verifies these risks be justified? It's that old story: sometimes we need to slow down to reach the right destination faster – and not quickly reach the wrong place.
Memory and new controls for ChatGPT
OpenAI is testing ChatGPT's ability to remember things you discuss to make future chats more helpful. You're in control of ChatGPT's memory. Memory in ChatGPT is now more comprehensive and references all your past conversations to provide more relevant and personalized responses. You can turn referencing "saved memories" or "chat history" on or off at any time in Settings. You can also ask ChatGPT to change what it knows about you directly in conversation or use Temporary Chat for conversations that don't use or update memory. The more you use ChatGPT, the more helpful it becomes. New conversations build on what it already knows about you to make interactions smoother and more personalized over time.
Finally, a feature everyone has been requesting for months! How many times have you had to explain to ChatGPT for the thousandth time who you are and what you do? This memory update promises to end that "first date" feeling with every new conversation. What's nice is that they seem to have thought a lot about privacy, with all these control options. I confess I'm eager to test this – just imagine not having to repeat that I'm allergic to PHP every time I ask for help with code! At the same time, I think I'll use that "Temporary Chat" quite a bit for those, let's say... less professional questions. Like when I ask for help inventing creative excuses to skip the gym.
The Llama 4 herd: Ushering in a new era of natively multimodal AI innovation
Meta announces the first Llama 4 models, which will enable people to build more personalized multimodal experiences. Llama 4 Scout and Llama 4 Maverick are the first open-weight natively multimodal models with unprecedented context-length support, built using a mixture-of-experts (MoE) architecture. Meta is also previewing Llama 4 Behemoth, one of the world's smartest LLMs and Meta's most powerful yet, to serve as a teacher for its new models.
Meta is coming in strong with this new generation! I love this "herd" concept – it's clear their strategy is to target different niches with specialized models, rather than a single model trying to do everything. This MoE (mixture-of-experts) architecture is quite interesting because it allows activating only specific parts of the model as needed, saving resources. What catches my attention most is this "Behemoth" – a very modest name, right? 😂 But if it really delivers what it promises, it could be Meta's trump card to finally compete head-to-head with GPT-4. And all of this with openly available weights! Meanwhile, OpenAI continues to keep its models under lock and key...
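To make that "activating only specific parts of the model" idea concrete, here is a minimal sketch of top-k MoE routing in plain Python. The router, the scalar "experts," and all the numbers are toy stand-ins I made up for illustration – real MoE layers route token vectors through full feed-forward blocks inside a transformer – but the selection-and-renormalization logic is the same:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # experts available in the layer
TOP_K = 2         # experts actually run per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy experts: each scalar function stands in for a full feed-forward block.
experts = [lambda x, w=w: x * w for w in range(1, NUM_EXPERTS + 1)]

def moe_layer(token_value, router_logits):
    """Route a token through only the TOP_K highest-scoring experts."""
    gates = softmax(router_logits)
    # Pick the k experts with the largest gate values...
    top = sorted(range(NUM_EXPERTS), key=lambda i: gates[i], reverse=True)[:TOP_K]
    # ...and renormalize the gate weights over just those experts.
    norm = sum(gates[i] for i in top)
    # Only TOP_K of NUM_EXPERTS experts execute: that is the compute saving.
    return sum(gates[i] / norm * experts[i](token_value) for i in top)

logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print(moe_layer(1.0, logits))
```

The saving is exactly the ratio TOP_K / NUM_EXPERTS of expert compute per token, which is how a model with a huge total parameter count can stay cheap at inference time.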
Meta’s benchmarks for its new AI models are a bit misleading
Meta recently released a new AI model called Maverick. The model's benchmarks show it performing well in the LM Arena, but the version of Maverick available to developers differs from the version deployed in the LM Arena. Researchers pointed out that the Maverick in the LM Arena is an “experimental chat version.” A chart on the official Llama website reveals Maverick's LM Arena testing was conducted using "Llama 4 Maverick optimized for conversation." Some researchers noted behavioral differences between the downloadable version and the version hosted in the LM Arena. The LM Arena version appears to use lots of emojis and gives incredibly verbose answers.
Ah, the old story of benchmarks... always as reliable as next month's weather forecast! This tactic by Meta reminds me of that friend who shows their Tinder profile picture from 10 years ago. "Optimized for conversation" is an elegant way of saying "it's not exactly the same model you'll download." The funniest part is the emoji thing – imagine a super advanced model deciding that the key to success is stuffing 🔥😂👍 into every answer! This raises important questions about transparency in AI benchmarks. As a developer, I prefer a model that's honest about its limitations rather than one that looks amazing on paper but disappoints in practice. We urgently need more standardized and transparent metrics in the AI field.
Meta exec denies the company artificially boosted Llama 4’s benchmark scores
A Meta executive denied a rumor that the company tuned its new AI models to perform well on specific benchmarks, obscuring the models' weaknesses. Ahmad Al-Dahle, Meta's VP of generative AI, stated it's "simply not true" that Meta trained its Llama 4 Maverick and Llama 4 Scout models on "test sets." The rumor stemmed from a post on a Chinese social network by a user claiming to have resigned from Meta in protest of the company's benchmarking practices.
We're watching the drama unfold in real-time! First, we find out the benchmarks are based on different versions, and now even more serious accusations arise that Meta might have cheated directly. Training on the test sets would be like studying with the test answers – of course, you'll do well! What's interesting is that the denial came quickly, showing how damaging these accusations can be to the company's credibility. Ultimately, this all shows how the AI race is becoming increasingly fierce and how benchmarks have become a sort of battlefield. When everyone is obsessed with the numbers, the numbers start to lose their meaning. The problem is that we, the developers, are the ones caught in the crossfire trying to figure out which models are actually worth it.
Amazon unveils a new AI voice model, Nova Sonic
Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality. Nova Sonic is Amazon’s answer to newer AI voice models, like the one powering ChatGPT’s Voice Mode, which feel more natural to talk to than the stiffer models from Amazon Alexa’s early days.
Amazon finally realized that Alexa was starting to sound like that GPS from the 2000s compared to the new AI voices. Nova Sonic seems to be the company's attempt not to fall behind in this voice interface race. I have to confess that the Alexa at home has become almost a decorative piece of furniture since I tried ChatGPT's Voice Mode and Google Gemini's voice assistant – the difference is stark. I'm curious to see how Nova Sonic compares in practice, especially since Amazon has an advantage few consider: billions of hours of voice interactions with Alexa to train its models. If they can turn this experience into a voice assistant that truly sounds natural and intelligent, they might regain lost ground. I just hope it's not another case of "amazing in benchmarks, disappointing in real life."
Google’s AI search shift leaves website makers feeling ‘betrayed’
In March 2024, website owner Morgan McBride was posing for photos in her semi-renovated kitchen for a Google ad celebrating the ways the search giant helped her family business grow. But by the time the ad ran about a month later, Google traffic had plummeted more than 70%, McBride said. Charleston Crafted, which features guides on home improvement projects, had weathered algorithm changes and updates in the past; this time, it didn’t recover. McBride suspected people were getting more renovation advice from the artificial intelligence answers at the top of Google search.
This story is heartbreaking. Imagine literally starring in a Google advertisement celebrating the success of your website, while behind the scenes the same Google is preparing an AI that will suck away your traffic. The most ironic part is that Google built its empire using content from sites like Morgan's, and now it's essentially cannibalizing the very sites that helped build its search index. We are witnessing a fundamental shift in the internet economy, where creating original content may no longer be enough to survive. Have we reached the point where all content creators will need to pivot to business models based on subscriptions or direct products, as organic traffic dries up?
WordPress.com launches a free AI-powered website builder
WordPress.com has launched a new AI website builder that allows anyone to create a functional website using an AI chat interface. The feature is free and aimed at entrepreneurs, freelancers, bloggers, and others who need a professional online presence. The AI builder includes 30 free prompts before users need to choose a hosting plan.
As someone who has spent hours (who am I kidding, days!) configuring WordPress themes and tweaking CSS, this news evokes mixed feelings. On one hand, democratizing website creation is incredible – now, literally anyone can have a professional online presence without having to learn code. On the other hand, there go some more front-end developer and web designer jobs! What intrigues me is the quality of these generated sites. Will they all start looking the same, with that "AI aesthetic" we're beginning to recognize? Anyway, this is another example of how AI is rapidly transforming tasks that previously required technical skills into simple conversations.
Adobe’s vision for accelerating creativity and productivity with generative AI
Adobe announced its vision for integrating generative AI into its products like Acrobat, Express, Photoshop, and Premiere Pro, enabling users to automate tasks, create content more easily, and expand their creativity. Generative AI will act as a creative partner, assisting users throughout the creative process and allowing them to focus on more important tasks.
After years of charging a fortune for their software, it seems they realized they need to add more value to justify that painful monthly subscription. Adobe has a unique position in the market – they are already the creative industry standard, so they don't need to "reinvent the wheel" like many AI startups are trying to do. Instead, they can focus on improving existing workflows. Imagine being able to say "remove this background object and replace it with a sunset" directly in Photoshop, or "create a smooth transition between these two scenes" in Premiere? As someone who has lost hours doing manual masking, this sounds like magic. The challenge will be balancing automation with creative control – professionals won't want to feel like the AI is making artistic decisions for them.
Anthropic steps up OpenAI competition, launches $200 subscription for Claude
Anthropic launched the Max plan for Claude, a new subscription tier for its viral chatbot and ChatGPT competitor. The plan has two price points: $100 per month or $200 per month, offering differing amounts of usage. Subscribers will get “priority access to new models and capabilities,” including Claude’s voice mode when it launches.
$200 per month? Anthropic is really betting on the theory that "if you charge more, people will think it's better"! Claude has always had this aura of being the "most polite and safe AI assistant," with less tendency for hallucinations and problematic responses than its competitors. I imagine this Max plan is aimed at professionals like lawyers, doctors, and researchers who need reliable answers and are willing to pay for them. What makes me curious is what exactly this "priority access" means in practice – will premium subscribers receive improved versions of Claude while the rest of us continue with the "diet" version?
OpenAI countersues Elon Musk, claims harassment
OpenAI countersued Elon Musk on Wednesday, citing a pattern of harassment by Musk and asking a federal judge to block him from any “unlawful and wrongful conduct” against OpenAI in a court battle over the future structure of the company that helped launch the AI revolution. The two sides are expected to begin a jury trial next spring.
And the drama continues! This saga between Elon Musk and OpenAI has more twists than a Netflix series. It's curious to see how a company that started with such noble ideals about "AI beneficial to humanity" ended up in the midst of such an ugly legal battle with one of its founders. What makes it even more intriguing is that both sides are developing powerful AI technologies – Musk with his xAI and OpenAI with GPT. It's almost like we're watching a war of tech titans, where the prize is nothing less than the future of artificial intelligence. The funny side is that while they fight in the courts, the rest of the industry continues to advance. Perhaps the real lesson here is about how difficult it is to maintain partnerships and shared visions when billions of dollars and potentially transformative technologies are at stake.
OpenAI's European Union Economic Blueprint
OpenAI is sharing the European Union Economic Blueprint, a set of proposals to help harness the promise of artificial intelligence to drive sustainable economic growth in the region and ensure AI is developed and deployed by Europe, in Europe, and for Europe. OpenAI outlined four principles to achieve Europe’s AI potential: establishing and growing the foundations necessary for sustained AI growth (data, energy, and talent); ensuring EU rules are streamlined and work in sync to enable AI pilots, rather than hinder them; maximizing the AI opportunity through widespread adoption across regions and society; ensuring AI is built responsibly and reflects European values.
Look at OpenAI trying to score points with European regulators... This "Economic Blueprint" looks like that gift you give your mother-in-law after a heated argument. The EU has been one of the strictest regulators regarding AI, with the famous AI Act imposing quite restrictive rules, and now OpenAI comes with this pretty proposal talking about "AI by Europe, in Europe, and for Europe." Translating from corporatese: "please don't regulate us too severely!". What catches my attention is the focus on simplifying rules – they are clearly feeling the weight of European bureaucracy. Deep down, I don't think they are even interested in Europe's well-being, they are just trying to clear the path to dominate the market. Let's see if the EU takes the bait...
AI hasn’t killed radiology, but it is changing it
While artificial intelligence isn't replacing radiologists, it has significantly changed the field. Two-thirds of radiology departments in the United States use AI in some form, and that number has doubled since 2019. There are roughly 340 FDA-approved AI radiology tools to date, and most are detection algorithms, which can look for everything from brain tumors and pneumonia to breast cancer and strokes. AI also has the potential to give patients more accurate results, and research has shown that when two radiologists read the same study, there’s a 3% to 5% discrepancy in their findings.
What we are witnessing is a gradual transformation of the radiologist's role, not its disappearance. The statistic that impresses me most is this 3-5% discrepancy between human radiologists – this shows how the interpretation of medical images has a subjective component that we often ignore. AI, in this context, is becoming a kind of more consistent "second pair of eyes." The highlight of this evolution is that, instead of simply replacing humans, AI in radiology is allowing professionals to focus on more complex cases and patient interaction. Perhaps this is the future for most professions: not replacement, but the elevation of human work to more creative and interpersonal levels.
When AI Alone Outperforms Doctors Using AI
An article by Eric Topol and Pranav Rajpurkar explores recent studies showing AI systems working independently outperformed doctors paired with AI on medical tasks like screening, diagnosis, and management reasoning. The paper discusses explanations for these surprising findings, including physicians’ inadequate use of AI or “automation complacency.” It proposes rethinking the division of responsibilities between human clinicians and AI systems, advocating for an optimal partnership model where AI handles initial screening while doctors focus on complex cases. The article emphasizes the need to re-evaluate the roles of clinicians and AI, finding a balance between human expertise and artificial intelligence to enhance medical practice.
This study is truly surprising and, in a way, counterintuitive. Conventional wisdom has always been that "AI + human > AI alone > human alone," but it seems reality is more complicated. One explanation that makes a lot of sense to me is "automation complacency" – I've noticed this in myself when using GPS. When Waze is on, I pay less attention to streets and signs because I know the app will alert me. I imagine the same happens with doctors using AI – perhaps they are delegating too much of their critical reasoning. The proposal to use AI for initial screening while doctors focus on complex cases seems the most sensible path. As in many areas, it's not about humans vs. machines, but about finding the optimal division of labor.
Microsoft updates Copilot with the greatest hits from other AIs
Microsoft is updating Copilot with new features including memory, personalization, web-based actions, podcast creation, camera and screen analysis, deep search, and more. The new features aim to make Copilot more personal and powerful, allowing users to customize the look and feel and perform tasks using a web browser.
Microsoft is clearly playing the game of "take all the cool ideas others have already launched and put them into a single product." We've seen memory in ChatGPT, web actions remind us of WebPilot and other plugins, and podcast creation seems inspired by Google's NotebookLM. But you know what's brilliant about all this? It works! For the average user, having all these features in one place is much more convenient than having to switch between multiple services. Microsoft rarely invents something completely new, but when it decides to integrate multiple technologies into a cohesive package, few do it better.
Paper: AI vs. Clinician Comparison in Virtual Urgent Care Diagnoses
A study compared initial artificial intelligence (AI) recommendations with final clinician recommendations in AI-assisted virtual urgent care visits. The study found AI recommendations were more often rated as higher quality than clinician decisions, suggesting AI may have a role in supporting medical decision-making in virtual urgent care.
Another study showing that in some cases AI can outperform doctors. We need to be careful with sensationalism here, but these results cannot be ignored either. For me, what's happening is that we are comparing AI on tasks where it is optimized to shine – following protocols, considering large amounts of information, and not being influenced by fatigue or personal biases. This doesn't mean AI is a "better doctor" overall, but rather that it has specific advantages in certain contexts. Virtual urgent care is a perfect environment for AI, as it usually involves common and well-documented conditions. The future of medicine is likely neither AI alone nor doctors alone, but rather an intelligent collaboration where each does what they do best. I envision a system where AI handles initial diagnosis and standard protocols, while human doctors focus on empathy, complex decisions, and atypical cases. The question isn't whether AI will replace doctors, but how we will redesign the healthcare system to leverage the best of both worlds.
Paper: Inference-Time Scaling for Generalist Reward Modeling
The paper explores using reinforcement learning to enhance the reasoning capabilities of large language models (LLMs). It investigates how to improve reward modeling (RM) with more inference-time computation for general queries, and how to scale that computation effectively with suitable learning methods. It presents Self-Principled Critique Tuning (SPCT) to foster scalable reward-generation behaviors in generative reward models (GRMs) via online RL, resulting in the DeepSeek-GRM models. It also employs parallel sampling to expand compute usage and introduces a meta RM to guide the voting process for better scaling performance.
Here we have one of those technical papers that seem complicated but bring important ideas. In simple terms, they are basically saying: "What if we use more computational power not only to train better models, but also to make them think more deeply during inference?" It's like giving the model extra time to reconsider its answers before responding – something we humans do naturally. In the developer world, this could mean that instead of always needing the latest and most expensive model, we could extract high-quality answers from smaller models just by giving them more time to "think." It's an elegant approach that reminds me that we don't always need a bigger hammer – sometimes just using what we have more intelligently is enough.
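The "extra time to reconsider" idea can be sketched as best-of-N sampling: draw several candidate answers in parallel and let a reward model pick the winner. Below is a toy illustration in Python – `generate_candidates` and `reward_model` are hypothetical stand-ins I invented for the sketch, not the paper's actual components – but it shows how spending more inference compute (a larger n) buys more chances at a high-scoring answer:

```python
import random

random.seed(42)

def generate_candidates(prompt, n):
    """Stand-in for sampling n answers from an LLM (hypothetical)."""
    return [f"answer-{i} to {prompt!r}" for i in range(n)]

def reward_model(prompt, answer):
    """Stand-in scorer: a real generalist reward model would rate quality;
    here we just return a pseudo-random score for illustration."""
    return random.random()

def best_of_n(prompt, n=8):
    """Spend more inference compute: sample n candidates in parallel,
    keep the one the reward model scores highest."""
    candidates = generate_candidates(prompt, n)
    scored = [(reward_model(prompt, c), c) for c in candidates]
    return max(scored)[1]

print(best_of_n("What is the capital of France?", n=8))
```

The paper's meta-RM-guided voting is a more sophisticated version of this selection step, but the underlying trade remains the same: more samples at inference time in exchange for better final answers.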
Paper: SoundStream, an end-to-end neural audio codec
The paper presents SoundStream, a novel neural audio codec that can efficiently compress speech, music, and general audio at bitrates typically targeted by speech-tailored codecs. In subjective evaluations using audio sampled at 24 kHz, SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches EVS at 9.6 kbps.
Seriously? SoundStream can compress audio down to 3 kbps while maintaining quality comparable to Opus at 12 kbps? For non-audio experts, this means you can stream audio using a quarter of the previously required bandwidth at the same quality. Imagine the impact of this on video calls in areas with poor connections, or music streaming in countries with limited internet. This is one of those technologies that seem very technical and obscure, but quietly change the world when implemented at scale.
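A quick back-of-the-envelope calculation, using only the bitrates quoted above, shows what that gap means over an hour of streaming:

```python
def megabytes_per_hour(kbps):
    """Convert an audio bitrate in kilobits per second to megabytes per hour."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 / 1_000_000

soundstream = megabytes_per_hour(3)   # SoundStream at 3 kbps
opus = megabytes_per_hour(12)         # Opus at 12 kbps
print(soundstream, opus)              # prints: 1.35 5.4
```

So an hour of SoundStream audio at 3 kbps is about 1.35 MB versus 5.4 MB for Opus at 12 kbps – exactly the 4x bandwidth saving at comparable subjective quality that the paper reports.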
And so we reach the end of another hectic week in the world of technology and AI. What's clear is that the competition between the major players – OpenAI, Meta, Microsoft, Anthropic, and Amazon – is fiercer than ever. While Sam Altman surprises with sudden changes in plans for launching new models, Meta waves its "herd" of Llamas, Anthropic launches an expensive premium plan, and Amazon finally shows it doesn't want to be left behind in the voice interface race.
At the same time, we see worrying signs, such as OpenAI drastically reducing safety testing time and Google turning the lives of content creators into a real nightmare with its AI answers at the top of searches.
Amidst all this, the advancements continue to be impressive – from neural audio codecs revolutionizing compression to studies showing that AI already surpasses doctors in certain tasks. Technology advances at a frantic pace, and often the social, economic, and ethical impacts take a backseat.
As developers and technology enthusiasts, our role is not only to keep up with these changes but also to think critically about them. After all, we are not just observing the AI revolution – we are actively building and directing its future.
Until the next cup at DevCafé, folks! And remember: in an increasingly automated world, our humanity becomes our most valuable asset.