ByteDance Seed proposed PMA, a model-merging technique for pre-training that lets you project a model's annealed performance without actually running the annealing phase. This can save millions of dollars on big model training runs.

Model Merging in Pre-training of Large Language Models
[Paper] https://alphaxiv.org/abs/2505.12082

Other "model merging" techniques I mentioned (but which are used in completely different scenarios):
https://alphaxiv.org/abs/2410.03617
https://alphaxiv.org/abs/2410.15661
https://alphaxiv.org/abs/2403.07816
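The core mechanic behind this kind of checkpoint merging (averaging the weights of several late pre-training checkpoints) can be sketched in a few lines. This is a toy illustration in the spirit of the idea, not ByteDance's implementation; the function name and the plain-list "state dicts" are stand-ins:

```python
# Toy sketch of checkpoint weight averaging. All names are illustrative:
# we take an element-wise mean over the weights of several checkpoints
# to approximate the smoothing effect of learning-rate annealing.

def merge_checkpoints(checkpoints):
    """Average a list of state dicts (name -> list of floats) element-wise."""
    merged = {}
    for name in checkpoints[0]:
        merged[name] = [
            sum(ckpt[name][i] for ckpt in checkpoints) / len(checkpoints)
            for i in range(len(checkpoints[0][name]))
        ]
    return merged

# Toy usage: three "checkpoints" of a two-parameter model.
ckpts = [
    {"w": [1.0, 2.0]},
    {"w": [3.0, 4.0]},
    {"w": [5.0, 6.0]},
]
print(merge_checkpoints(ckpts))  # {'w': [3.0, 4.0]}
```

In a real framework you would do the same mean over tensor-valued state dicts; the point is only that the merge itself is cheap relative to re-running an annealing phase.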
We take a look at Qwen 3 Coder, the first open-weight model that comes close to Sonnet 4.

00:00 Qwen Coder
01:49 Size and Architecture
02:57 How does it compare to Sonnet
07:36 Examples and Demonstrations
Timestamps:
0:00 Overview
4:11 How to use
4:31 Demos
5:12 Sponsor
6:20 Test
0:00 AI news intro
1:10 Pusa
5:37 Spatial Tracker V2
8:58 HopeJR
10:51 NeuralOS
14:35 ChatLLM
15:29 Kimi K2
24:47 Epona
27:15 Agility Digit demos
28:56 LimX CL-3 dance
30:25 Walker S2 auto recharge
31:46 PhysX
34:49 ChatGPT Agent
40:25 Clift
42:57 MovieS
Abstract: Large Language Models (LLMs) are typically presumed to process context uniformly; that is, the model should handle the 10,000th token just as reliably as the 100th. However, in practice, this assumption does not hold. We observe that model performance varies significantly as input length changes, even on simple tasks.

In this report, we evaluate 18 LLMs, including the state-of-the-art GPT-4.1, Claude 4, Gemini 2.5, and Qwen3 models. Our results reveal that models do not use their context uniformly; instead, their performance grows increasingly unreliable as input length grows.

Authors: Kelly Hong, Anton Troynikov, Jeff Huber
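The kind of length sensitivity the report describes can be probed with a simple harness: plant one known fact at varying depths inside progressively longer filler contexts, then check whether the model still recovers it. The sketch below is a hypothetical setup of my own; `mock_model` is a deliberately degrading stand-in for a real LLM call, and all names are illustrative:

```python
# Hypothetical length-sensitivity probe. A known fact is inserted at
# several positions in a filler context of a given size; accuracy is the
# fraction of positions where the (mock) model still recovers the fact.

FILLER = "The sky was grey and nothing of note happened. "
FACT = "The access code is 4471."

def build_prompt(n_filler, fact_position):
    """Place FACT after `fact_position` filler sentences out of `n_filler`."""
    parts = [FILLER] * n_filler
    parts.insert(fact_position, FACT + " ")
    return "".join(parts) + "\nQ: What is the access code?"

def mock_model(prompt):
    # Toy stand-in for an LLM: answers correctly only while the prompt
    # stays under an arbitrary length budget, then falls apart.
    if len(prompt) > 2000:
        return "unknown"
    return "4471" if "4471" in prompt else "unknown"

def accuracy_at_length(n_filler):
    positions = range(0, n_filler, max(1, n_filler // 4))
    hits = sum(mock_model(build_prompt(n_filler, pos)) == "4471"
               for pos in positions)
    return hits / len(positions)

print(accuracy_at_length(10), accuracy_at_length(100))  # 1.0 0.0
```

Swapping `mock_model` for a real API call and sweeping `n_filler` gives exactly the accuracy-versus-input-length curve the report measures.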
Google DeepMind wins the IMO 2025 Gold Medal using Gemini Deep Think. An advanced version of Gemini with Deep Think officially achieved gold-medal standard at the International Mathematical Olympiad.
OpenAI: we won, and we're too spooky to release the model.
Google: We won, and Logan will be pushing out the model Friday.
t0: xAI won!
1 second later: OpenAI won!
1 second later: Google won!
I know the "AGI" goalposts have moved quite a lot since the early aughts, but I think an AI system that warrants the label must feature continuous learning, either via an effectively infinite context window or continuous updating of the underlying model weights. These things are necessary for an AI agent to truly be a drop-in replacement for a white-collar human employee. The systems need long-term memory and a way to permanently integrate new information and skills. This just isn't a thing yet, and it doesn't look like it will be solved in the next few years. If these systems are AGI now, they're AGI that has suffered catastrophic strokes or traumatic brain injuries.

-------------------

What you described is fully possible right now and has been for years. Long-term memory just means building a system to save and recall data; it can be as organized or messy as you like, given enough time and tokens. And every model has been capable of updating itself for quite a while; they're just not allowed to. AGI has been here for over a year at least, just nobody wants to admit it (perhaps they're worried about funding being curtailed if they do). The models (or perhaps the underlying systems) that the companies have are much, much more powerful than anything we mere mortals can get our hands on.
from Deepmind's post: "To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions."
AI just crossed a major threshold: it's no longer just guessing. A new class of models called Energy-Based Transformers is letting AI reason the way humans do: slow down, test different options, rethink bad answers, and only stop when the result feels right. That means smarter decisions, longer attention on hard problems, and a built-in ability to say "this isn't good enough yet." In a world full of quick AI replies, this shift is huge. And it's not just theory: these models already outperform standard Transformers on tasks across language, images, and even video, while using far less compute.

🧠 What You'll See:
• How Energy-Based Transformers mimic human-style thinking using energy scores
• The difference between fast GPT-like responses and deeper reasoning
• How these models rethink, retry, and know when they're wrong
• Real-world tests across language, vision, and complex tasks
• Why this is a major leap toward truly intelligent systems

🚨 Why It Matters:
AI is finally moving beyond instant guesses. These models reason step by step, adapt their thinking in real time, and learn to solve hard problems like humans do. They know when to keep trying, and when to stop. This isn't about speed. It's about real intelligence.
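The "keep thinking until the energy is low" loop described above can be illustrated with a toy quadratic energy standing in for a learned Transformer energy head. This is a hand-rolled illustration of the general energy-based idea, not the paper's code; every name here is made up:

```python
# Toy illustration of energy-based refinement: instead of emitting an
# answer in one forward pass, score candidate answers with an energy
# function and refine the candidate by gradient descent until the
# energy drops below a threshold (i.e., the model "knows" it is done).

def energy(candidate, target):
    """Lower energy = better answer; here just squared distance."""
    return sum((c - t) ** 2 for c, t in zip(candidate, target))

def refine(candidate, target, lr=0.1, threshold=1e-3, max_steps=200):
    steps = 0
    while energy(candidate, target) > threshold and steps < max_steps:
        # Analytic gradient of the quadratic energy w.r.t. the candidate.
        grad = [2 * (c - t) for c, t in zip(candidate, target)]
        candidate = [c - lr * g for c, g in zip(candidate, grad)]
        steps += 1
    return candidate, steps

answer, steps = refine([0.0, 0.0], [1.0, -1.0])
print(answer, steps)  # converges near [1.0, -1.0] well before max_steps
```

The stopping rule is the point: harder inputs (higher starting energy) automatically get more refinement steps, which is the "variable thinking time" behavior the video highlights.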
Unitree has just released the R1, a full-size humanoid robot priced at only $5,900, making it one of the most affordable AI-powered robots ever sold to the public. The R1 features advanced mobility, voice recognition, real-time visual input, and an open SDK for developers, allowing it to walk, flip, balance, and interact using AI. This marks a major milestone in humanoid robotics, with China now leading the push to bring agile, intelligent robots into everyday life.

🧠 What You'll See:
• Unitree launches a full-size humanoid robot for just $5,900
• R1 walks, flips, kicks, and balances with real-time AI
• Voice recognition, visual input, and an open SDK for full customization
• How this robot compares to Tesla Optimus, Atlas, and Digit
• Why this launch changes the game for humanoid robotics

Why It Matters:
This isn't just another robot demo. Unitree's R1 makes advanced AI robotics affordable, functional, and available to the public, something no one else has done at this scale. And the rest of the world is still catching up.
Self-evolving AI: ASI-Arch autonomously designs new top AI models. #ai #ainews #agi #singularity

0:00 Background of AI innovation
2:26 Previous AI methods
3:35 ASI-Arch autonomous research
10:00 Extra details
11:13 Hailuo 02
12:41 Extra details
13:30 Results
16:05 AlphaGo moment
18:18 Top findings
24:06 Open sourced
Human beings can do all that, and they can refuel themselves. More importantly, they can be held liable for their actions. And we already have far more of them than we need, on the shelf. The intelligent world of engineering builds machines that do things humans can't.
Can a neural network write its own training data and skyrocket past GPT-4? In today's video, we dissect the brand-new "Self-Adapting Language Models" (SEAL) paper, in which an LLM fabricates synthetic data, tunes LoRA adapters, and, after just two rounds, outperforms much larger models on SQuAD and ARC.
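The SEAL-style loop (generate your own training data, apply the update, keep it only if it helps) can be caricatured in plain Python. Everything below is a hypothetical toy, not the paper's code: the "model" is a lookup table, "fine-tuning" is a dict merge, and the function names are invented for illustration:

```python
# Toy caricature of a self-adapting loop: the model writes synthetic
# training examples from a passage, "fine-tunes" on them, and retains
# the update only if held-out accuracy does not degrade.

def generate_self_edit(passage):
    """Stand-in for the LLM writing its own training data from a passage."""
    subject, _, rest = passage.partition(" is ")
    return {subject: rest.rstrip(".")}

def evaluate(model, qa_pairs):
    """Fraction of held-out questions the lookup-table 'model' answers."""
    return sum(model.get(q) == a for q, a in qa_pairs) / len(qa_pairs)

def self_adapt(model, passages, heldout):
    for passage in passages:
        candidate = {**model, **generate_self_edit(passage)}
        # Keep the self-edit only if it helps (or is neutral) on held-out data.
        if evaluate(candidate, heldout) >= evaluate(model, heldout):
            model = candidate
    return model

heldout = [("The capital of France", "Paris"), ("Water", "H2O")]
model = self_adapt({}, ["The capital of France is Paris.", "Water is H2O."], heldout)
print(evaluate(model, heldout))  # 1.0
```

In the real paper the "edit" is synthetic fine-tuning data applied through LoRA adapters and the accept/reject signal comes from reinforcement learning, but the outer generate-update-evaluate loop has this same shape.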
This isn't your typical tech news roundup. In this video, we break down four massive stories from July that signal a major shift in how AI will shape our lives... from what we wear, to how we work, to the policies that govern it.

Here's what happened:

🕶️ Meta's Superintelligence Lab
Mark Zuckerberg just announced a $110 billion push to bring AI into your everyday life, starting with smart glasses that act like a personal assistant. With talent from OpenAI and Scale AI leading the charge, Meta is going all-in on AI.

The U.S. Government's "America's AI Action Plan"
The U.S. government released its most aggressive national AI strategy yet... focusing on speed, infrastructure, and ideology. From banning "woke AI" in federal use to prioritizing open-source models and deregulated development, this could shape AI's trajectory for years.

🤖 ChatGPT Gets Agency
OpenAI gave ChatGPT the ability to take real-world actions: browse the web, send emails, even run code. This moves us into the era of agentic AI, where AI doesn't just answer your questions; it takes initiative.

🚗 Tesla's $16.5B Chip Deal with Samsung
Tesla signed a multi-billion-dollar deal to manufacture custom AI chips with Samsung in Texas. These chips will power everything from Full Self-Driving to the Optimus robot, making Tesla not just a car company but a full-stack AI player.
Something big is happening at Google. In just a few days, they dropped three breakthrough AI systems: one that outperforms OpenAI's Deep Research, another that builds real ML pipelines better than Kaggle pros, and a third that maps the Earth without satellites. These aren't mere upgrades; they're agents designed to replace researchers, coders, and analysts, and they're already winning.

🧠 What You'll See:
• Google's TTD-DR beats OpenAI on complex research benchmarks using a self-evolving AI agent
• MLE-STAR dominates Kaggle challenges by building and refining real machine learning pipelines
• DeepMind's AEF model creates satellite-free Earth maps using fused global data and AI precision
• All three systems show how Google is quietly pulling ahead in multi-domain AI autonomy

🚨 Why It Matters:
Google isn't just improving AI; they're turning it into a replacement for entire expert workflows. From writing reports and generating clean code to monitoring the planet in real time, these agents are already outperforming the best and learning as they go.
0:00 Qwen Image intro
0:42 Qwen Image demos
4:19 Image editing
6:10 Qwen Image vs Flux Krea dev vs GPT-4o
11:42 Slides & UI designs
15:18 ChatLLM
16:10 Other design tests
17:10 Photos and anatomy
19:33 Anime, logos, existing characters
21:26 Other art styles
22:40 Wildlife
23:44 How to use Qwen Image online
25:04 How to use Qwen Image offline with ComfyUI
30:50 How to use Qwen Image with low VRAM
34:29 How to edit images with Qwen Image
Demis Hassabis, CEO of Google DeepMind, sits down with host Logan Kilpatrick. In this episode, learn about the evolution from game-playing AI to today's thinking models, how projects like Genie 3 are building world models to help AI understand reality, and why new testing grounds like Kaggle's Game Arena are needed to evaluate progress on the path to AGI.

Chapters:
00:00 - Intro
01:16 - Recent GDM momentum
02:07 - Deep Think and agent systems
04:11 - Jagged intelligence
07:02 - Genie 3 and world models
10:21 - Future applications of Genie 3
13:01 - The need for better benchmarks and Kaggle Game Arena
19:03 - Evals beyond games
21:47 - Tool use for expanding AI capabilities
24:52 - Shift from models to systems
27:38 - Roadmap for Genie 3 and the omni model
29:25 - The quadrillion token club
In this video, I look at the launch of GPT-5 and what we can work out about the system OpenAI has released.

⏱️ Time Stamps:
00:00 Intro / OpenAI GPT-5 Blog
02:07 Unified System & Router
05:58 Creative Expression and Writing
07:47 Evaluations
12:12 Coding
13:02 Pricing