
Coffee of the Week
Where models think, dolphins talk, and billions are spent
Welcome to another edition of DevCafé, where I serve you the hottest news from the AI world while you enjoy your coffee! This week has been a real rollercoaster in the tech universe, with startups reaching astronomical valuations, giants launching new features, and even dolphins talking to AI. Grab your favorite mug and let's dive into the biggest developments that shook the industry in recent days!
OpenAI Co-founder Ilya Sutskever's Safe Superintelligence Valued at $32 Billion
Safe Superintelligence (SSI), the AI startup led by OpenAI co-founder and former chief scientist Ilya Sutskever, has raised an additional $2 billion in funding at a $32 billion valuation. Sutskever left OpenAI in May 2024 after apparently playing a role in a failed attempt to oust CEO Sam Altman. He founded SSI with Daniel Gross and Daniel Levy, who stated the company had "one goal and one product: safe superintelligence."
Wow, seems like starting an AI startup with a fancy name and a tagline about "superintelligence" is the new "artisanal bakery" trend! Seriously though, I'm amazed how Sutskever bounced back so quickly after that OpenAI drama. $32 billion for a company that hasn't delivered anything concrete yet shows how thirsty the market is for anything AI safety-related. If SSI actually delivers on its promises, it could be revolutionary. If not... well, it wouldn't be the first valuation bubble to burst, would it?
Open Source Pioneer Aims to Liberate Robots
Hugging Face has acquired open-source robotics startup Pollen Robotics to help "democratize" robotics. Hugging Face plans to sell the robot while also allowing developers to download, modify, and suggest improvements to its code.
Open-source robots? Sounds like the beginning of a sci-fi movie that doesn't end well for humanity! But seriously, I love this initiative from Hugging Face. They've already revolutionized access to AI models with their platform, and now they're bringing that same philosophy to robotics. Imagine the potential when thousands of developers can collaborate on robots that solve real problems without being constrained by corporate interests. If robotics follows the same path as open-source software, we can expect an explosion of innovation in coming years.
Perplexity AI in Talks to Integrate Assistant into Samsung and Motorola Phones
Perplexity AI Inc. is in discussions with Samsung Electronics Co. about integrating its assistant into the smartphone giant's devices and has already reached a similar agreement with Lenovo Group Ltd.'s Motorola.
Perplexity is playing with the big boys now! Look how they're managing to break through the bubble of traditional assistants like Google Assistant, Siri, and Alexa. Their direct integration strategy with smartphone manufacturers is brilliant—after all, what's the point of having the best AI assistant if nobody uses it? If these deals materialize, Perplexity could become a household name overnight. For us consumers, this can only improve competition and, consequently, the quality of available assistants. I bet Google and Apple are scratching their heads right now!
Wikipedia Gives AI Developers Its Data to Ward Off Scraping Bots
Wikipedia is trying to prevent AI developers from scraping its platform by releasing a dataset specifically optimized for training AI models. The Wikimedia Foundation announced on Wednesday it partnered with Kaggle—a Google-owned data science community platform hosting machine learning data—to publish a beta dataset of "structured English and French Wikipedia content."
It's like parents allowing a controlled house party to keep teens from unsupervised clubbing! Wikipedia finally understood it can't stop AI models from using its data, so it's offering an "official" dataset to maintain some control. Clever, right? This approach benefits everyone: developers get structured data access, Wikipedia maintains some content control, and AI models may become more accurate when citing encyclopedia information. A smart diplomatic move in a field full of data usage tensions.
Chatbot Arena Is Becoming a Real Company
Chatbot Arena, an academic project whose website became a popular spot for visitors to test new AI models, is transforming into a company. Chatbot Arena leaders announced the formation of LMArena, which they hope will allow faster expansion. The platform lets people test cutting-edge AI models head-to-head and vote for their favorites in site leaderboards closely watched by the tech community.
Who would've thought a "Claude vs. GPT?" debate would become a real business? Chatbot Arena's story reminds us how side projects can evolve into something much bigger. What started as an academic experiment became a sort of "Billboard Hot 100" for AI models, with companies literally adjusting strategies based on rankings. Now that it's becoming a company, I'm curious how they'll monetize without losing their hard-earned credibility. Will blind testing remain impartial when investor profits are at stake? Either way, never underestimate the power of rankings and human competitiveness!
Netflix Tests New AI Search Engine to Recommend Shows and Movies
Netflix Inc. is testing new search technology for subscribers that employs artificial intelligence to help them find TV shows and movies, expanding its use of the technology. The OpenAI-powered search engine lets customers search for content using much more specific terms, including the subscriber's mood, the company said. It will then recommend options from the company's catalog.
About time, Netflix! I'm tired of typing "action movie with submarine" and getting romantic comedies as suggestions. This update could solve one of the platform's biggest problems: helping people find what they actually want to watch during their 15-minute pre-sleep window. Imagine being able to say "I'm heartbroken and need something funny but not too silly" and actually finding suitable content! The OpenAI partnership also shows how even giants like Netflix need external help with advanced AI. If this works well, I bet all other streaming platforms will rush to implement something similar.
AI-Generated Music Accounts for 18% of All Tracks Uploaded to Deezer
About 18% of songs uploaded to Deezer are fully AI-generated. The French streaming platform said over 20,000 AI-generated tracks are uploaded daily—nearly double the number reported four months ago. The growing use of generative AI in creative industries has triggered a wave of lawsuits, with artists, authors, and rights holders accusing AI companies of using copyrighted material without consent or compensation to train their models.
Whoa, 18%! That's nearly 1 in 5 new songs on Deezer made by robots! This stat gives me mixed feelings. On one hand, it's amazing how technology has democratized music creation—anyone can now produce tracks without years of instrument training. On the other hand, where does this leave human artists? Will we soon have entirely AI-generated playlists without any human touch? The most interesting part is many listeners probably don't even realize when they're hearing AI-made music. We're entering an era where the question won't be "Do you like this song?" but "Do you know who (or what) made this song?"
ElevenLabs Establishes Japanese Subsidiary ElevenLabs G.K.
The global leader in AI voice technology expands to the Asia-Pacific region by launching an international hub in Japan. The new Japanese entity will focus on adapting ElevenLabs' cutting-edge voice generation platform to the Japanese market, addressing the region's unique linguistic and cultural requirements. ElevenLabs partnered with DOCOMO Innovations, TBS, MBC C&I CO., LTD, and LLSOLLU. The company received strong investor support, with backers viewing Japan as a strategic market for AI voice technology.
Smart move by ElevenLabs! Japan isn't just a tech powerhouse but also a massive cultural hub with anime, games, and other media that could hugely benefit from AI voice generation. Imagine the impact on dubbing and localization industries! Adapting for Japanese won't be easy—it's a language with important tonal nuances and unique cultural expressions. But if they get it right, the market is enormous. Plus, this experience could open doors to other Asian markets like Korea and China. This expansion shows how AI voice generation is rapidly evolving from a tech curiosity to a fundamental component across global industries.
Copilot Vision Now Available for Free in Microsoft Edge
Copilot Vision is now available for free in Microsoft Edge. It can literally see what's on your screen (if you opt in). It's incredible! It will think aloud with you while you browse online. No more over-explaining, copy-pasting, or struggling to put things into words.
An assistant that can see my screen? Sounds useful but slightly creepy! Microsoft is going all-in on AI with Copilot, and this vision feature could truly change how we interact with the web. Imagine no longer needing to copy-paste article excerpts to ask questions about them? Or being able to ask "what does this error mean?" while Copilot looks directly at your error screen? This is a game-changer for developers, researchers, and even tech-challenged grandparents. Of course, this raises privacy concerns (I don't even want to imagine Copilot seeing my online shopping history), but if the opt-in feature is well implemented, it could revolutionize web browsing.
Satya Nadella Announces New Agent Capabilities for Copilot Studio
Satya Nadella announced Copilot Studio now has agent capabilities allowing anyone to create agents that act within user interfaces of desktop and web applications. Charles Lamanna also revealed agents can now click, type, and interact with desktop/web apps without needing APIs.
Microsoft is seriously biting into the automation market. These new Copilot Studio features are game-changing, especially the no-API-required aspect. Anyone who's tried automating tasks knows the headache of dealing with legacy systems lacking decent APIs. Now imagine creating an assistant that simply "sees" and interacts with any app like a human would—it's practically magic! This could revolutionize tech support, customer service, and administrative work. Most interesting is Microsoft's democratization of these tools—"anyone" can create these agents. Is Nadella trying to turn us all into mini AI creators? Will we soon see marketplaces for user-made "Copilot agents"?
Claude Gains Research Capabilities
Anthropic launched new features for Claude including search and Google Workspace integration to make it a more informed, capable collaborator. The search feature lets Claude find and analyze information from multiple sources, while Google Workspace integration connects it to user emails, calendars, and documents.
Finally, Claude gets research superpowers! It was frustrating watching the poor thing try answering questions about recent events without internet access. With this update, Anthropic is clearly targeting GPT and Perplexity territory. The Google Workspace integration is intriguing—imagine asking Claude to summarize last week's important emails or organize your calendar? It's like having a personal assistant already plugged into your digital life. Of course, this raises privacy/security questions, but if implemented well, Claude could evolve from smart chatbot to essential productivity tool. The AI assistant race keeps heating up!
xAI Adds 'Memory' Feature to Grok
xAI is introducing a "memory" feature for its Grok chatbot, allowing it to remember details from past user conversations. This enhancement aims to provide more personalized responses based on learned preferences. The feature is in beta on Grok.com and the Grok iOS/Android apps, with plans to expand to X.
Late to the LLM party with Grok, xAI now tries differentiating itself with memory. In theory, it's brilliant—who isn't annoyed constantly re-explaining preferences to virtual assistants? A Grok remembering you hate horror movies or prefer technical explanations could create a truly personalized experience. But I have concerns: How deep does this "memory" go? How much will Grok remember? What happens to this data? Knowing Musk's data privacy history (especially on X), we should watch this development closely. Either way, it's another step toward AI assistants that feel like they know us as real people.
Grok Gains Canvas-Like Tool for Creating Docs and Apps
Grok gained a canvas-like feature for editing and creating basic documents and apps. Called Grok Studio, it was announced on X last Tuesday. Available to free and paid Grok users on Grok.com, Grok Studio doesn't seem materially different from previous canvas tools. It lets users visualize HTML snippets and run code in languages like Python, C++, and JavaScript, with all content opening in a right-side window next to Grok's responses.
Seems Musk wants to turn Grok into an AI Swiss Army knife. First memory, now a dev canvas. Grok Studio strongly resembles existing code playgrounds like CodePen or Replit, but built directly into the chatbot. It's an interesting addition, especially for developers wanting quick code tests or simple prototypes. However, as the article notes, there's nothing revolutionary here—similar tools already exist. The difference lies in integration with Grok's ecosystem (and potentially X). I'm curious if this will evolve beyond just another code playground into a more robust development platform.
DolphinGemma: How Google's AI Is Helping Decode Dolphin Communication
DolphinGemma, a large language model developed by Google, is helping scientists study dolphin communication and hopefully discover what they're saying. The project, in collaboration with Georgia Tech researchers and fieldwork by Wild Dolphin Project (WDP), aims to analyze dolphin vocalizations, generate dolphin-like sound sequences, and eventually establish a shared vocabulary for cross-species communication. By identifying recurring sound patterns, the model could help researchers uncover hidden structures and potential meanings in natural dolphin communication, bringing us closer to future human-dolphin dialogue. Google plans to share DolphinGemma as an open model this summer to assist researchers studying other cetacean species.
Now THIS is AI being used for something truly incredible! Seriously, who hasn't dreamed of chatting with dolphins? DolphinGemma shows how AI can transcend purely human applications to bridge connections with other intelligent species. Imagine discovering dolphins have inside jokes, group gossip, or philosophical debates? Beyond the "wow" factor, this has huge implications for marine conservation and bioethics. If we can understand what other species communicate, we might finally consider their "interests" more directly in environmental decisions. Google's open-model plan is especially exciting—we could see similar techniques applied to whale, elephant, and other complex social animal communication. Douglas Adams would be proud!
Google Officially Links AI Overviews to Its Own Search Results
After a month of testing, Google has officially launched direct links from AI Overviews to its own search results pages. Google says the change makes exploring a topic easier, having heard from users that direct links to relevant results pages are helpful.
Hmm, Google making it easier to go from its AI... to more Google? How convenient! This integration makes sense from a UX perspective. If an AI overview gives me a summary about "how to make sourdough bread," naturally I'd want to click for more detailed results. The cynical part of me sees this as Google protecting its core search business—after all, if users are satisfied with AI answers alone, who'll click on search ads? That said, this approach might actually combat misinformation by letting users verify sources behind AI summaries. A small step showing how Google's trying to integrate AI without cannibalizing its main business model.
Google Makes Gemini Live Camera and Screen Sharing Free on Android
Google began widely rolling out Gemini Live camera and screen sharing for Advanced subscribers, with the Project Astra-powered features soon free for all Android users. Gemini Live now lets you ask questions about what's on your screen or camera. Screen sharing can be quickly initiated by launching Gemini's overlay and tapping the new "Share screen with Live" chip. After confirming, you'll see a countdown next to your status bar time, and Google has launched a new phone-call-style notification for Live. Camera and screen sharing join the existing ability to chat with Gemini Live about an image, PDF, or YouTube video.
Google's playing its cards right. Making premium features free—especially something as powerful as real-time camera analysis—shows how determined they are not to lose ground to Microsoft and OpenAI in the AI race. Great news for Android users who now get a truly capable visual assistant for free. Imagine pointing at a restaurant dish asking "does this have gluten?" or showing a plant to learn care tips. Screen sharing is also great for remote tech support—you can literally show Gemini what's happening on your phone for help. Of course, privacy concerns exist (we're literally giving Google eyes), but for many users, convenience will easily outweigh these worries.
Gemini 2.5 Flash Launched!
Google DeepMind released Gemini 2.5 Flash, a hybrid reasoning model letting you control how much it "thinks," making it ideal for tasks like chat apps, data extraction, and more. An early version is available in Google AI Studio.
Google finally joined the "reasoning" wave! After OpenAI and Anthropic started this whole models-"thinking" trend, it was only a matter of time before Google followed. What's interesting about Gemini 2.5 Flash is this control over "how much" it thinks—like a slider between speed and depth. This is great for developers balancing costs, latency, and response quality. For simple tasks, let the model respond quickly; for complex analysis, tell it to "think more." It's like having an intern you can instruct to work faster or more meticulously as needed. Can't wait to see how developers use this to create more responsive, intelligent apps without sacrificing answer quality.
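That "thinking slider" idea is easiest to see in a request config. Below is a hedged sketch of what setting a thinking budget looks like; to keep it self-contained, it only builds the config as a plain dict, with the actual SDK call (assumed to follow the google-genai SDK's `ThinkingConfig` shape) shown in comments. The parameter names are assumptions based on public docs, not verified against a live endpoint.

```python
# Hedged sketch: capping Gemini 2.5 Flash's internal "thinking" via a
# per-request budget. We build the config as a plain dict so the shape
# is visible without the SDK installed.

def thinking_config(budget_tokens: int) -> dict:
    """Request config limiting how many tokens the model may spend
    reasoning before answering. 0 = answer immediately; larger
    budgets allow deeper, slower deliberation."""
    if budget_tokens < 0:
        raise ValueError("thinking budget must be non-negative")
    return {"thinking_config": {"thinking_budget": budget_tokens}}

# With the SDK (assumed shape, not executed here):
# from google import genai
# client = genai.Client()
# client.models.generate_content(
#     model="gemini-2.5-flash",
#     contents="Plan a 3-step migration from REST to gRPC.",
#     config=thinking_config(1024),
# )

print(thinking_config(0))     # fast, chat-style reply
print(thinking_config(1024))  # more deliberate, costlier answer
```

The appeal for developers is exactly the cost/latency/quality trade-off described above: the same model serves both quick chat turns (budget 0) and harder analysis (larger budgets) without switching models.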
Advancing AI Systems Through Progress in Perception, Localization, and Reasoning
Meta FAIR is releasing new research artifacts that improve perception understanding and support achieving advanced machine intelligence (AMI). This includes the Meta Perception Encoder, Perception Language Model (PLM), and Collaborative Reasoner.
Meta's quietly doing amazing work in its corner. While we all hype OpenAI and Google releases, Meta FAIR keeps advancing fundamental areas like perception and collaborative reasoning. The coolest part? They often make this research openly available, contributing to the entire field. This focus on perception is particularly interesting—after all, for AI to truly understand our world, it needs to "perceive" more completely and contextually. Collaborative Reasoner also sounds promising for applications requiring multiple agents to solve complex problems together. It's like Meta's building foundational blocks while others focus on flashier end products. Long-term, this basic research could have much deeper impacts.
OpenAI Is Building a Social Network
OpenAI is working on its own X-like social network, according to multiple sources familiar with the matter. Though still in early stages, the project centers on ChatGPT image generation paired with a social feed. This could heighten Altman's already bitter rivalry with Elon Musk and put OpenAI on a collision course with Meta, which also plans to add a social feed to its upcoming standalone AI assistant app.
Another social network? Seriously, OpenAI? As if we don't have enough time-wasting options already! I imagine it would be like Instagram but using prompts instead of filters to generate amazing images. The differentiator might be this AI-generated content focus, creating a space where creativity depends on prompt-crafting skills rather than technical abilities. The Musk rivalry adds extra drama—every OpenAI move now seems interpreted through this feud. What worries me is the potential impact on our already saturated attention economy. Do we really need another platform competing for limited time? Then again, if they create something truly unique... well, I'm curious.
OpenAI in Talks to Buy Windsurf for About $3 Billion
OpenAI is negotiating to acquire Windsurf, an AI-assisted coding tool formerly known as Codeium, for approximately $3 billion. The deal would be OpenAI's largest acquisition yet and could help the company face growing competition in AI-powered coding assistants.
Wow, OpenAI's pockets run deep! $3 billion isn't pocket change. This makes strategic sense—the coding assistant market is one of today's fiercest AI battlegrounds, with Microsoft's GitHub Copilot dominating. Windsurf (Codeium to insiders) has nice tech and loyal users but needs financial muscle to compete with giants. OpenAI clearly wants to dominate not just chatbots but also developer-specific tools. Ironically, this contradicts OpenAI's original nonprofit "AI for humanity's benefit" rhetoric. Now it seems like any traditional tech company making billion-dollar acquisitions. As they say—if you can't beat 'em, buy 'em! Too bad Windsurf's my favorite coding assistant... if OpenAI buys it, I'll probably stop using it. Every day I grow more distant from OpenAI and its strategies...
Introducing GPT-4.1 in the API
A new series of GPT models with major improvements in coding, instruction following, and long-context handling—plus our first nano model. GPT-4.1 models outperform GPT-4o and GPT-4o mini across the board, with big gains in coding and instruction following. They also have larger context windows—supporting up to 1 million tokens—and better long-context understanding. Knowledge cutoff updated to June 2024.
OpenAI keeps this breakneck release pace! I'd barely gotten used to GPT-4o and already there's a new model. GPT-4.1 seems particularly focused on developers, with coding improvements. That 1-million-token context window is serious—imagine feeding entire books or massive codebases and having the model actually understand everything! Most intriguing is this "nano" model they mention. Could we finally get powerful GPT versions running locally on our devices? That would revolutionize privacy and offline use. Regardless, the model race continues full speed, and we users win with ever-more-capable tools. We just need to sprint to keep up!
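To get a feel for what a 1-million-token window means in practice, here's a hedged back-of-the-envelope sketch for checking whether a pile of documents fits before sending it. The 4-characters-per-token ratio is a common rough heuristic, not an exact tokenizer, and the commented-out API call is an assumed SDK shape, not verified code.

```python
# Rough sketch: will this document set fit in GPT-4.1's advertised
# 1M-token context? Uses a crude ~4 chars/token heuristic.

GPT41_CONTEXT = 1_000_000  # advertised context window, in tokens

def estimate_tokens(text: str) -> int:
    """Very rough token estimate; real counts need a tokenizer."""
    return max(1, len(text) // 4)

def fits_in_context(docs: list[str], reserve_for_output: int = 8_000) -> bool:
    """True if all docs plus room for the reply fit in the window."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve_for_output <= GPT41_CONTEXT

# A ~600k-character "book" is only ~150k tokens -- plenty of headroom.
book = "lorem ipsum " * 50_000
assert fits_in_context([book])

# Actual call would look roughly like (assumed OpenAI SDK shape):
# from openai import OpenAI
# OpenAI().chat.completions.create(
#     model="gpt-4.1",
#     messages=[{"role": "user", "content": book + "\nSummarize this."}],
# )
```

The takeaway: at that window size, whole novels or mid-sized codebases fit in a single request, which is why the long-context claim matters more than the raw number suggests.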
Latest Viral ChatGPT Trend: Reverse Location Search from Photos
A new viral behavior is emerging on ChatGPT where users employ the tool to try pinpointing photo locations. ChatGPT was updated with new o3 and o4-mini AI models capable of "reasoning" through uploaded images. The models can crop, rotate, and zoom photos (even blurry/distorted ones) for thorough analysis. Combined with ChatGPT's web search ability, this creates a powerful location tool. While fun, it raises privacy concerns—ChatGPT could uncover personal information without knowledge or consent. OpenAI says it's working to address these concerns and remains committed to user privacy.
This is both amazing and terrifying. Hiding anything online was already hard—now it's practically impossible. Imagine: you take a random-location selfie and post it. Someone malicious could use ChatGPT to pinpoint exactly where you were, identifying background establishments or subtle landmarks. On one hand, this could aid investigations, find missing persons, or satisfy "where was this amazing photo taken?" curiosity. On the other, it's a privacy nightmare waiting to happen. We're heading toward a world where visual anonymity becomes nearly impossible. Most worrying? This capability wasn't necessarily planned by OpenAI—users discovered it themselves. What other unexpected "superpowers" do these models have that we haven't uncovered yet?
Introducing OpenAI o3 and o4-mini
OpenAI launched o3 and o4-mini, its smartest, most capable models to date with full tool access. OpenAI o3 advances programming, math, science, and visual perception frontiers, while o4-mini is optimized for fast, economical reasoning. The models show improved instruction following and verifiable responses thanks to enhanced intelligence and web source inclusion. OpenAI also launched Codex CLI—a lightweight coding agent run from the terminal—and a $1M initiative supporting projects using Codex CLI and OpenAI models.
These new o3 and o4-mini models seem like direct responses to growing competition from Claude, Gemini, etc. The "reasoning" focus and verifiable responses show they're addressing hallucination and reliability criticisms. I'm especially intrigued by Codex CLI—a native terminal AI assistant! For us developers, this could be a productivity game-changer. Imagine typing "create a script organizing my photos by date" directly in terminal and seeing ready-to-run code appear. That $1M project fund is a smart move to build an ecosystem around these new products. OpenAI's clearly trying to cement its market lead as competitors close in.
Paper: DeepSeek-R1 Thoughtology: Let's &lt;think&gt; About LLM Reasoning
Large-scale reasoning models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems by creating detailed multi-step reasoning chains, seemingly "thinking" through problems before answering. The reasoning process is publicly available, creating endless opportunities to study model reasoning behavior and opening the Thoughtology field. Our DeepSeek-R1 analyses investigate thought-length impact/controllability, long/confusing context management, cultural/safety concerns, and cognitive phenomena like human-like language processing and world modeling.
"Thoughtology" sounds straight from sci-fi, but we're living it! We're starting to study AI "thinking" as its own scientific discipline. DeepSeek-R1 brings something truly interesting: reasoning transparency. Instead of magical answers, we see models build reasoning step-by-step, almost like watching someone's mental drafts. This improves reliability and helps identify flawed reasoning. Imagine using this in education—students seeing not just answers but complete solution paths. Or in critical fields like medicine where understanding reasoning is as important as conclusions. We're entering an era where we don't just use AIs as tools but study their cognition as its own science.
Paper: Leveraging Reasoning Model Outputs to Enhance Non-Reasoning Model Capabilities
Recent large language model (LLM) advances like DeepSeek-R1 and OpenAI-o1 demonstrate test-time scaling's significant effectiveness, achieving substantial performance gains across benchmarks. These advanced models use deliberate "thought" steps to systematically improve answer quality. This paper proposes using these high-quality outputs from compute-intensive reasoning models to enhance less demanding non-reasoning models. It explores and compares methodologies for using reasoning model outputs to train and improve non-reasoning models. Supervised fine-tuning (SFT) experiments on established benchmarks show consistent improvements, highlighting this approach's potential for advancing direct-answering model capabilities.
What a brilliant idea! It's like having a super-smart (but slow) professor train a faster (but initially less capable) TA. This research tackles one of AI's biggest dilemmas: we want deep, well-reasoned answers but also quick responses. Reasoning models like DeepSeek-R1 and OpenAI-o1 are incredibly powerful, but that "thinking" time can frustrate when you just want a quick answer. Using these "thinkers" to train lighter, faster models gives us the best of both worlds. Imagine GPT-4 quality with GPT-3.5 speed! This could democratize advanced AI since lighter models run on cheaper, less energy-intensive hardware. A highly promising approach to make advanced AI more accessible and practical for daily use without sacrificing quality.
Paper: AI-Guided POCUS Outperforms Experts in TB Detection for Underserved Areas
AI-guided point-of-care ultrasound (POCUS) can accurately detect tuberculosis (TB), according to research presented at the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) conference in Vienna, Austria. The technology could have applications in clinically underserved areas.
THIS is AI saving real lives. TB remains a major problem in many parts of the world, especially resource-limited areas. What makes this research special is combining two relatively accessible technologies—portable ultrasound and AI—into a solution that works even in remote locations without major hospitals or specialists. Outperforming human experts is impressive, but the real impact is scalability: we can train many more technicians to use AI-guided POCUS than we can produce medical specialists. This is exactly the kind of AI application we need more of—technology solving real problems for people who truly need it, not just conveniences for those already well-resourced. I hope this research quickly translates to field implementation, especially in high-TB regions.
Phew! What an intense week in the AI world, huh? If these stories show anything, it's that innovation speed is becoming downright dizzying. Remember when major product launches happened yearly? Now we get new models, features, and tools almost daily!
What stood out most this week was how we're rapidly moving from "wow, this is cool!" to truly transformative applications—whether deciphering dolphin language, diagnosing TB in underserved regions, or creating agents that automate tasks in existing interfaces.
It's also fascinating seeing different company approaches: while OpenAI continues its aggressive frequent-release, billion-dollar-acquisition strategy, players like Meta and Google DeepMind focus on fundamental research with potentially deeper long-term impacts.
What caught your eye this week? Any stories particularly stand out? Don't forget to return next week for more piping-hot AI news, always here at DevCafé with that fresh coffee and cutting-edge tech aroma!
Until next time, happy coding!