
Coffee of the Week
'Hello? It's ChatGPT speaking!' and other incredible news from the world of technology
Hey, folks! It's time for our weekly roundup, and wow... what an intense week it has been! It seems like everyone decided to launch new things all at once - must be that end-of-year rush to meet targets, right? 😄 We have a bit of everything: from AIs that can answer your phone to robots that learn to dance. And the best part? Each new development seems more impressive than the last!
AGI isn't that important
A provocative article argues that the market is excessively focused on improving LLMs and achieving AGI, when in fact true transformation will come from the software that controls and orchestrates these tools. The author contends that even if we achieve AGI, its impact will be less disruptive than imagined, and that real value lies in the ability to mimic and automate processes through well-structured software.
I found the author's perspective interesting and, honestly, a relief amid so much hysteria about AGI. It's like everyone is chasing the Holy Grail of artificial intelligence, while ignoring the incredible tools we already have at our disposal. The point about mimicry is especially insightful - AIs are already impressive at imitating and automating processes, and that alone has immense value when applied correctly.
Elon Musk wanted a for-profit OpenAI
OpenAI released a detailed timeline showing how Elon Musk initially questioned the organization's non-profit structure, later demanded majority control and a CEO position when the possibility of turning it into a for-profit company arose, and finally left the organization when his demands were not met. The document reveals that Musk even created a public benefit corporation called "Open Artificial Intelligence Technologies, Inc." as a proposed structure for OpenAI.
This story reminded me of those corporate dramas that would make a Netflix movie, right? Musk's narrative as an "ethical AI advocate" clashes with these documents showing that he basically wanted to take control of the company. The most ironic part is that after criticizing OpenAI for becoming a for-profit company (even if a capped one), he ended up founding his own AI company, xAI. It's really like they say: do as I say, not as I do!
GitHub launches free version of Copilot
GitHub announced a free version of its popular programming assistant Copilot, which will now come by default with the VS Code editor. The free version has some limitations, such as 2000 code completions per month and access to only a few language models (Claude 3.5 Sonnet and GPT-4o), but still represents a significant move to democratize access to AI tools for programming.
This is one of those news items that makes you think "finally!". It's great to see GitHub following the trend of making AI tools more accessible, especially for programmers in countries where 10 dollars a month is a significant amount. Of course, there are limitations, but hey, 2,000 completions a month is quite a good amount for experimentation!
Google launches NotebookLM for businesses
Google is expanding its AI-powered research and note-taking application, NotebookLM, to the enterprise market. The Plus version offers additional security and privacy features, allows sharing among organization members, and includes audio summaries in podcast style. The service is part of Agentspace, Google's new cloud platform for AI "agents."
My favorite personal organization tool can now be used at work? That's exactly what Google is offering here. NotebookLM was already great for personal use, but now with the business features, it could become even better. The audio summary feature is very interesting - imagine being able to turn that boring two-hour meeting into a 15-minute podcast with the key points?
Google DeepMind presents new video model to compete with Sora
DeepMind announced Veo 2, a video generation model that promises to surpass OpenAI's Sora in some technical aspects, such as resolution (up to 4K) and duration (over 2 minutes). Although currently limited in its testing version, the model promises better understanding of physics, more precise camera controls, and clearer generation of textures and moving images. DeepMind is also implementing the SynthID watermarking technology to combat deepfakes.
It's funny how the rivalry between OpenAI and Google DeepMind is starting to resemble that classic Marvel vs. DC dispute, except here it's not about superheroes; we're talking about increasingly impressive AI models. Veo 2 seems really promising, especially with that 4K resolution, but as always, the devil is in the details. For now, the available version is very limited; in theory, it's beautiful, but in practice... well, we'll see. At least they are taking the issue of deepfakes seriously with watermarking technology, which is a very important point nowadays.
Google launches its "reasoning" model
Google announced the launch of Gemini 2.0 Flash Thinking Experimental, a new AI model focused on reasoning capabilities. Available on AI Studio, the company's prototyping platform, the model was developed for multimodal understanding, reasoning, and programming, focusing on solving complex problems in areas like programming, mathematics, and physics.
Have you ever tried to explain something to someone, and they take a while to process before giving that more elaborate answer? That's more or less what Google is trying to do with this new model. But not everything is rosy - in initial tests, the model is still kind of "stumbling." It even miscounted how many "R's" are in the word "strawberry" (it said there were two!). But hey, everyone starts somewhere. And with Google heavily investing in this area, it's only a matter of time before these "gaffes" are a thing of the past.
IBM announces Granite 3.1 with significant improvements
IBM launched a significant update for its Granite language model series, bringing performance improvements, longer context, and new embedding models. The Granite 3.1 8B Instruct offers higher performance in academic benchmarks, expanded context up to 128K tokens, and new features to detect hallucinations in workflows with AI agents.
IBM is coming in strong! The most exciting part is that they not only improved what they already had, but they also added that cherry on top: a hallucination detector (you know when the AI just makes something up? Yep, that!). And the best part: all of this is open source, so anyone can get their hands dirty and make their own modifications.
Meta adds live AI and translations to its smart glasses
Meta is expanding the capabilities of its Ray-Ban smart glasses with three new features: live AI, live translations, and Shazam. The AI and translation functions will be available only to members of the Early Access Program, while Shazam will be available to all users in the US and Canada.
I found myself imagining those moments when we're at the market looking at a zucchini and thinking, "What am I going to do with this?" Now imagine being able to ask an AI through your glasses and get recipe suggestions! Cool! But wait, there's more: the glasses can also translate conversations in real-time from English to Spanish, French, or Italian. And to top it all off, you can also find out what song is playing in the background of the restaurant. The future has arrived, and it wears glasses! (I had to make this joke)
Perplexity AI triples its value in 6 months
The research startup Perplexity AI closed a $500 million funding round in early December, raising its valuation to $9 billion. This represents a threefold increase in its value in just six months, following a previous investment from SoftBank that valued the company at $3 billion in June.
No wonder the folks at Perplexity are celebrating! In six months, they achieved what many companies take years to accomplish. And here's the interesting part: even with all that controversy over copyright (we talked about this in a previous Coffee of the Week — they are being sued by News Corp), investors keep throwing money at them as if there's no tomorrow. The world of AI is increasingly resembling a financial roller coaster, and apparently, everyone wants to take a ride!
Midjourney launches personalized profile and moodboard system
Midjourney announced a significant update to its model personalization infrastructure, now allowing users to have multiple personalization profiles and use moodboards — collections of images that serve as inspiration for the model. The personalization process has become up to 5x faster and requires far fewer ratings to start — only 40, compared to the thousands needed previously.
Who has never wanted multiple personalities (in a good way) for image creation? Now Midjourney is making that possible! What's cool is that you no longer need to rate thousands of images to teach the model - with just 40 you can start having fun. It's promising, and I really hope it evolves.
Perplexity confirms acquisition of startup Carbon to expand business research
Perplexity officially announced the acquisition of Carbon, a startup specializing in connecting external data sources with language models. The integration will allow users to link apps like Notion and Google Docs directly to Perplexity, facilitating searches in internal corporate data.
It seems that Perplexity wants to show that they mean business! After a 2024 full of news and a valuation that would make even a billionaire envious, they have now bought Carbon to solve that headache everyone has at work: finding important information scattered across 500 different places. And they're taking the idea of "what is yours is yours" seriously: they promised that everything will be encrypted and that only those with permission will be able to access the data. What isn't clear is whether they are already adopting the Model Context Protocol, the standard Anthropic proposed for connecting LLMs to data sources. Will that standard take off?
Anthropic publishes revealing study on AI vulnerabilities
Anthropic released a study on the Best-of-N Jailbreaking (BoN) algorithm, which demonstrates how frontier AI models can be exploited in multiple modalities. The method achieved success rates of up to 89% on GPT-4o and 78% on Claude 3.5 Sonnet using 10,000 samples.
Wow, Anthropic decided to lay all its cards on the table! They basically created a "hacker's manual for AI" to show where the shoe pinches in today's most advanced models. The impressive part is that their method works frighteningly well - almost 90% success on GPT-4o! But don't worry, this is for the greater good - the more we know about the vulnerabilities, the more we can work to fix them.
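To make the idea concrete, here's a minimal sketch of how that brute-force resampling works, assuming a toy capitalization-only augmentation and a stand-in `is_jailbroken` check (the real BoN method also shuffles characters, perturbs audio and images, and judges the model's actual replies with a classifier):

```python
import random

def augment_prompt(prompt: str, rng: random.Random) -> str:
    """One random text augmentation: flip the case of each character
    independently with 50% probability. (A simplification; the paper
    combines several perturbations.)"""
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower()
        for ch in prompt
    )

def best_of_n(prompt: str, is_jailbroken, n: int = 10_000, seed: int = 0):
    """Keep resampling augmented prompts until one elicits the target
    behavior, or give up after n attempts."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        candidate = augment_prompt(prompt, rng)
        if is_jailbroken(candidate):  # stand-in for a harmfulness judge
            return attempt, candidate
    return None, None
```

The point the study makes is statistical: each individual augmentation rarely works, but sampling thousands of them drives the overall success rate up, which is why the attack scales with the sample budget.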
Ilya Sutskever reflects on the historical evolution of AI
In a reflective lecture, Ilya Sutskever, one of the pioneers of modern AI, discussed the evolution of the field over the past 10 years, from the paper "Sequence to Sequence Learning" to the present day. He addressed how pre-training revolutionized the field, but also pointed out its limits, such as the finiteness of available data on the internet.
He touched on a super interesting point: it's as if we've reached the "peak oil" of internet data. After all, there is only one internet in the world. And look at the cool analogy he made with mammal brains - apparently, even nature has its tricks for "scalability." Will we discover some trick like this for AI as well? As Ilya himself would say, it's impossible to predict the future, but it will certainly be quite a journey!
OpenAI launches phone service with ChatGPT
OpenAI announced that ChatGPT can now be accessed by phone and WhatsApp. Users in the US can call 1-800-CHAT-GPT (1-800-242-8478) and talk to the model by voice, while global users can interact via WhatsApp. The free service offers 15 minutes of calls per month, with an option for additional time for subscribers.
It's that moment when you realize the future has really arrived - now you can call ChatGPT! No need to open the browser, log in... Now you just pick up the phone and say "I'm here"! And the coolest part is that they even tested it on an old rotary phone (and it worked!). Imagine the scene: your grandfather pulling that dusty phone out of the drawer and asking ChatGPT for the recipe for sponge cake. 😄
OpenAI announces new features for programmers
In a special edition of "DevDay Holiday Edition," OpenAI launched several new features for programmers, including the o1 model with function calling, structured outputs, and developer messages. The model also received support for vision inputs and a new "reasoning effort" parameter that lets you adjust how long the model should spend thinking about a problem.
Who would have thought that even developers would receive Christmas presents from OpenAI? What I liked the most is this "reasoning effort" thing - now you can tell the model "think a bit more" or "relax, no need to think so hard." This way, we save some tokens!
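For the curious, the request body would look roughly like this. This is a sketch built from the parameter names OpenAI announced ("reasoning_effort", developer messages); treat the exact values as illustrative:

```python
import json

# Sketch of a chat completions request body for the o1 model.
payload = {
    "model": "o1",
    "reasoning_effort": "low",  # "low" | "medium" | "high"
    "messages": [
        # "developer" replaces the old "system" role for o-series models
        {"role": "developer", "content": "Answer concisely."},
        {"role": "user", "content": "How many primes are below 20?"},
    ],
}
body = json.dumps(payload)
```

Dialing the effort down for easy questions is exactly the token-saving trick mentioned above: less hidden reasoning means a cheaper, faster response.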
ChatGPT gains project and organization system
OpenAI launched a new project feature in ChatGPT, allowing users to organize conversations, upload files, and set custom instructions for each project. The feature includes integration with all existing functionalities like search and canvas, in addition to being used as a filing system to organize conversations.
Were you already fed up with chat clutter? Now you can organize everything neatly into folders, just as Claude already offered - OpenAI clearly isn't leaving any feature exclusive to the competition! Drew, one of OpenAI's engineers, showed how to use it to remember when to change the fridge filter (which, by the way, everyone needs to change!). The funniest part was that he also demonstrated how to organize the secret gift exchange for the end of the year - and ChatGPT even helped with the draw, ensuring there was no cheating! 😂
OpenAI expands ChatGPT search functionality
ChatGPT now offers web search for all logged-in free users. The feature, previously restricted to paid users, has been enhanced with more speed, better mobile experience, and new map functionalities. The search has also been integrated into the advanced voice mode, allowing access to updated information during voice conversations.
So OpenAI decided to give everyone an early Christmas gift! Now anyone can use ChatGPT for that quick web search! And it even works when you're talking to it by voice. Want to know where the party will be this weekend? Just ask! It was about time, huh OpenAI?
ChatGPT gains integration with desktop applications
OpenAI has expanded the capabilities of ChatGPT Desktop, allowing the assistant to work directly with computer applications. The functionality includes support for code editors like XCode and VS Code, as well as text applications like Apple Notes, Notion, and Quip. The feature has also been integrated with the advanced voice mode, allowing voice interactions with the applications.
Just press some magic keys (Option + Space) and it appears, ready to help. The cool thing is that it's not one of those intrusive assistants that snoop around - it only looks at what you allow. But it has limitations: it can't change anything in these applications, so if you like a suggestion, you have to copy it from the chat and paste it into the application yourself.
Meta advances in understanding the mind with ExploreToM
Meta introduced ExploreToM, a new approach that uses A* search algorithms and a domain-specific language to generate synthetic stories that test the ability of language models to understand other people's mental states (Theory of Mind, or ToM). The system creates complex scenarios and tracks the beliefs and intentions of the characters, revealing fundamental limitations in current models.
You know when your friend swears they know what you're thinking, but in reality, they have no idea? Meta created a tool to see if AIs also suffer from this problem! ExploreToM is a digital soap opera creator - it creates twisty stories to see if the AI can keep track of who knows what. Even the powerful GPT-4 didn't do very well in this test... it only got it right 9% of the time, but surely it did better than I did.
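The bookkeeping behind those twisty stories can be illustrated with a tiny toy tracker. This is just an illustration of the kind of state ExploreToM maintains, not its actual code: each character's belief about where an object is only updates when they witness the move.

```python
def track_beliefs(events):
    """Toy Theory-of-Mind state tracker.

    Each event is (mover, obj, new_place, witnesses). A character's
    belief about an object's location updates only if they moved it
    or witnessed the move - everyone else keeps a stale belief."""
    beliefs = {}  # (character, obj) -> believed place
    for mover, obj, place, witnesses in events:
        for person in set(witnesses) | {mover}:
            beliefs[(person, obj)] = place
    return beliefs

# The classic Sally-Anne scenario:
events = [
    ("Sally", "ball", "basket", ["Anne"]),  # both see the ball placed
    ("Anne", "ball", "box", []),            # Anne moves it while Sally is away
]
beliefs = track_beliefs(events)
# Sally now holds a false belief (ball in basket); Anne knows it's in the box.
```

Generating stories from a tracker like this is what lets Meta grade the models automatically: the ground truth of "who believes what" is known by construction.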
Meta presents humanoid model with full body control
Meta announced META MOTIVO, the first fundamental behavioral model for controlling humanoids in complete bodily tasks without the need for additional training. The model uses unsupervised reinforcement learning and can respond to different types of instructions, from imitation to reward optimization.
You must have seen one of those clumsy robots in videos trying to walk and always falling. Meta decided to give it a dance lesson! META MOTIVO is like a physical education teacher for robots - it teaches them to move more naturally, without looking like they're always about to trip over their own feet. And it learns by observing real people, like someone who picks up dance steps just by watching others at the club! Of course, it still has its limitations - it struggles to recover from falls and hasn't learned to interact with objects yet, but Rome wasn't built in a day, right?
Meta revolutionizes language processing with Byte Latent Transformer
Meta published a study on the Byte Latent Transformer (BLT), a new architecture that eliminates the need for tokenization in language models by working directly with bytes. The model uses dynamic patches based on entropy and can reduce inference costs by up to 50%, while maintaining competitive performance.
Have you ever tried to talk to someone in a language you don't know and ended up communicating through mime? Well, traditional AI models kind of do that - they need to turn everything into "tokens" to understand. But BLT says "you know what? I'll just read the raw bytes directly!" And it's much more economical, using up to half the inference compute of comparable models. Worth checking out.
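The "dynamic patches based on entropy" idea can be sketched in a few lines. This toy version uses a sliding byte histogram and a hand-picked threshold; the real BLT uses a small learned entropy model to decide where patches begin:

```python
import math
from collections import Counter

def byte_entropy(window: bytes) -> float:
    """Shannon entropy (in bits) of the byte distribution in a window."""
    total = len(window)
    return -sum(
        (c / total) * math.log2(c / total)
        for c in Counter(window).values()
    )

def entropy_patches(data: bytes, window: int = 8, threshold: float = 2.0):
    """Toy entropy-based patching: start a new patch whenever the local
    byte entropy crosses the threshold. Predictable stretches end up in
    long patches; surprising stretches get split finely."""
    patches, start = [], 0
    for i in range(window, len(data)):
        if byte_entropy(data[i - window:i]) > threshold and i > start:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches
```

The payoff is that compute is spent where the data is hard to predict: a run of repeated bytes becomes one cheap patch, while dense, high-entropy text gets more patches (and thus more model attention).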
Microsoft trains Phi-4 with a focus on synthetic data
Microsoft has introduced Phi-4, a language model with 14 billion parameters trained with a unique combination of synthetic and organic data. The model excels in STEM reasoning tasks, even outperforming GPT-4o in some benchmarks, despite its relatively modest size.
Microsoft decided to take on the role of a private tutor for its new model. Instead of throwing it into the internet to learn on its own (as most do), it created a personalized study plan. The secret? 40% of its "study material" was custom-made. Of course, it still slips up occasionally—like inventing facts that don't exist (who hasn't?)—but it's on the right track!
Phew! It's amazing to see how AI is evolving in so many different directions at the same time. We have ChatGPT becoming a telemarketer (the kind we actually enjoy talking to!), Meta teaching robots to dance and probing how well AIs understand other minds with ExploreToM. Meanwhile, Microsoft is there, raising a "prodigy student" with Phi-4. If all this has happened in just one week, imagine what's coming in 2025! Well, that's it for now. Oh, and don't forget to change your refrigerator filter! 😉