In this video, I am writing the Group Relative Policy Optimization (GRPO) algorithm from scratch in PyTorch and training SLMs to reason. Plus, we go through the policy gradient equation, write code, and visualize exactly how reasoning Large Language Models work!

Papers:
DeepSeek Math: https://arxiv.org/pdf/2402.03300
DeepSeek R1: https://arxiv.org/abs/2501.12948
DAPO: https://arxiv.org/abs/2503.14476
Critical Perspectives on R1: https://arxiv.org/abs/2503.20783

Timestamps:
0:00 - Thinking LLMs are taking over!
3:47 - Setting up the Reinforcement Learning environment
4:50 - Reasoning Gym library - Rewards
8:00 - GRPO visually explained
10:41 - Policy optimization and PPO loss explained
15:45 - Coding response generation
20:55 - Coding reward generation & advantages
26:25 - Calculating log probabilities
30:58 - RL training loop
33:49 - Visualizing log probabilities post-training
36:01 - The GRPO and PPO loss functions
38:19 - Surrogate clipping
41:21 - Supervised finetuning and LoRA training
43:26 - Reasoning SLM results!
45:36 - 10 practical tips for finetuning reasoning SLMs
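As a rough illustration of the ideas the video walks through (group-relative advantages, per-token log probabilities, and the clipped PPO-style surrogate), here is a minimal PyTorch sketch. The tensor names and shapes are my own assumptions for illustration, not the video's actual code, and the full GRPO objective in the DeepSeek papers also adds a KL penalty against a reference policy, which this sketch omits.

```python
import torch

def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Toy GRPO-style loss for one prompt.

    logp_new, logp_old: (G, T) per-token log-probabilities of G sampled
                        responses under the current and old policy.
    rewards:            (G,) scalar reward per response.
    """
    # Group-relative advantage: normalize rewards within the group of G responses.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # (G,)
    adv = adv.unsqueeze(-1)                                     # broadcast over tokens

    # Probability ratio between current and old policy, per token.
    ratio = torch.exp(logp_new - logp_old)                      # (G, T)

    # PPO-style clipped surrogate objective (an objective to maximize, so negate for a loss).
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()

# Example with made-up numbers: 4 responses of 5 tokens each.
logp_old = torch.randn(4, 5)
logp_new = logp_old + 0.05 * torch.randn(4, 5)
rewards = torch.tensor([1.0, 0.0, 0.5, 0.0])
print(grpo_loss(logp_new, logp_old, rewards))
```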
Correction: I should have said "FORMER Google CEO Eric Schmidt", sorry about that.

While AI development seems to have stagnated for a bit, AI researchers are working on some seriously interesting stuff, namely AI algorithms that can constantly improve their accuracy or learn new tasks. Just in the past months, we have seen multiple advances in self-reinforced AI learning, including some that could go beyond large language models. Let's take a look.
In this video, we take a look at the research paper Continuous Thought Machines, published by Sakana AI (http://sakana.ai/), in which they integrate neural timing into an AI model.
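For a concrete feel for the idea, here is a toy, heavily simplified sketch of an "internal tick" loop in the spirit of the CTM: the same input is re-processed over several internal steps, and a recurrent state evolves across those steps before a decision is read out. This is my own illustrative example, not the architecture from the paper (which models individual neurons over time and uses synchronization between neurons as its representation).

```python
import torch
import torch.nn as nn

class TinyTickModel(nn.Module):
    """Toy model that 'thinks' over T internal ticks on a fixed input.

    Illustrative only; the real CTM tracks per-neuron activation histories
    and uses pairwise neural synchronization as its latent representation.
    """
    def __init__(self, in_dim=32, hidden=64, n_classes=10, ticks=8):
        super().__init__()
        self.ticks = ticks
        self.encode = nn.Linear(in_dim, hidden)
        self.step = nn.GRUCell(hidden, hidden)   # recurrent internal update
        self.readout = nn.Linear(hidden, n_classes)

    def forward(self, x):
        feat = torch.relu(self.encode(x))        # same input, reused each tick
        h = torch.zeros(x.size(0), feat.size(-1))
        logits_per_tick = []
        for _ in range(self.ticks):              # internal "thinking" steps
            h = self.step(feat, h)
            logits_per_tick.append(self.readout(h))
        return torch.stack(logits_per_tick, dim=1)  # (batch, ticks, classes)

model = TinyTickModel()
out = model(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 8, 10]) -- a prediction at every tick
```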
Hey! I'm the first author on the CTM. It is my "brainchild", so to speak. Thank you so much for the excellent video! Please feel free to reach out and ask any questions you may have.
I think what people struggle to see or understand with the CTM is that it isn't the end goal, but rather a means to an end. It gives us a springboard to explore (or, rather, re-explore, since most things "have been done" in some form or other) concepts and approaches that have been painfully out of reach or deemed unnecessary owing to disappointing performance signals. The neural-dynamics-forward approach of the CTM yields a fundamentally new representation, and that is exciting to me. Referencing offhand some comments on this video: embodied intelligence, the model-action feedback loop, memory, etc., are all things we actively care about and are excitedly pursuing. To answer more directly: what I'm excited about is the pursuit of understanding intelligence. Being close to the architecture lets me quickly add in components grounded conceptually in understandings from biology and watch the behaviour change. Almost always there is some surprising element that I have yet to see discussed in the field.
Unsurprising question: can this be retrofitted to generate text, i.e. become a language model? Have you tested it against other kinds of neural networks at the same task and compared the performance?
----
There is an online and interactive version of the paper that also includes links to the PDF and GitHub repo, but I can't put the link on YouTube without it being deleted, unfortunately. A quick Google search will take you there, though. I built this website to explain the CTM, and it includes a maze-navigation model running in-browser, which is surprisingly fun to play with! We are busy retrofitting the CTM to language tasks, yes, but the whole point of this research is new and novel approaches, not necessarily a slightly better way of doing what LLMs are extremely good at already.
This sounds so dope, really hope this could get us closer to an AI that is not stuck in time waiting for a prompt but rather an AI that "thinks" and interacts in a way closer to humans
If the input gets analyzed by the AI multiple times, it makes me wonder if this may lead to a sort of confirmation bias, where the AI makes a decision when it analyzes the input the first time, then on subsequent passes only analyzes the input in ways that validate its own decision. It would be kinda funny tho.

PS: let me emphasize that it would be silly if, by making AIs think more like humans, they became subject to the same logical pitfalls we humans do, like confirmation bias. AIs already have a problem of being way too confident about wrong inferences; imagine how funny it would be (and horrible too) if they also stubbornly stuck to their own wrong conclusions. It would be like having an internet argument with them.
Oh, finally! This is the kind of design that I've been waiting to appear. From "simple" input-output systems towards continuous and overlapping feedback loops with parallel, partially independent networks taking fresh input both from memory and new sensor data. Can't wait to see where this will take us (a few papers down the line)!
Neuromorphic computing is a real thing, and it is truly very close to how the brain operates in many more respects than this is. It's an entirely different field, so I'd hesitate to go too deep on the metaphor with this. These models are extremely interesting, and will likely get even more interesting when they try something like this in a feedback loop, where it can alter its input and perceive its own alterations.
6:20 - Yes, this is what is needed. I have been working for the last 10 years on neural nets where the signal loops around and does not go away. Input is a continuous stream and output is also continuous. There are many different ways of going about this and I have tested several, but I truly feel that this is what is missing for any of the feed-forward models to actually be "alive" in any way. Glad to see others also doing similar work! I find that this needs different ANN structures than the popular ones, and back in 2014 I figured I needed a neural net that sets the weights of my main neural net, but so far genetic algorithms seem to work better for me.
What makes Gemini 2.5 Pro Google's most revolutionary AI yet? Discover how this upgrade shatters limits in reasoning, speed, and versatility, potentially outpacing rivals like Claude 3 and GPT-4. We break down why developers, researchers, and businesses are calling it a quantum leap in artificial intelligence.

Gemini 2.5 Pro's massive 2-million-token context window transforms long-context understanding. It analyzes entire codebases, feature-length films, or scientific papers in one go, enabling unprecedented document comprehension and complex problem-solving. Its refined mixture-of-experts (MoE) architecture dynamically routes tasks for lightning-fast, cost-efficient performance.

Unlike its predecessors, Gemini 2.5 Pro excels at multimodal reasoning, seamlessly connecting text, images, audio, and video. Imagine diagnosing medical scans with verbal context, generating code from hand-drawn diagrams, or summarizing lecture videos with pinpoint accuracy. This isn't just an upgrade; it's a new paradigm for human-AI collaboration.

With enhanced safety guardrails and enterprise-ready scalability, Gemini 2.5 Pro positions Google to dominate real-world AI applications. From accelerating drug discovery to automating legal analysis, its ability to reduce hallucinations while handling massive data could redefine industries overnight.

How does Gemini 2.5 Pro work? Is it better than GPT-4? What can you do with 2M tokens? Can Gemini 2.5 replace programmers? How does it reduce AI hallucinations? This video reveals why experts say it's Google's most consequential AI yet. Watch now to future-proof your skills.
What if your AI assistant could see, hear, and even feel the world around you?

We're entering a new era of artificial intelligence, and it's multi-modal. This video breaks down what that means, why it's happening now, and how it's already reshaping everything from accessibility to autonomous vehicles.

You'll learn:
👁️ What multi-modal AI really is
⚙️ The tech stack powering it (Transformers, CLIP, GPUs, and massive datasets)
📱 How tools like GPT-4V and Tesla FSD use it in the real world
⚠️ The risks and limitations we need to consider
🤖 And what's next, from emotion-sensing AI to robots that feel

This isn't science fiction. It's happening now... and it's changing the way machines interact with the world (and us).

Chapters:
00:00 What if AI could see and hear?
01:00 What is multi-modal AI?
02:30 The tech behind the shift
04:10 GPT-4V & Be My Eyes
06:00 Tesla FSD & multi-modal driving
07:40 Why this changes everything
09:00 Limitations and risks
10:00 The future: emotion-aware, sensory AI
11:20 Why it matters
We traveled all the way to China for an exclusive, behind-the-scenes look at UNITREE ROBOTICS, the masterminds behind some of the most mind-blowing and advanced robots on the planet. Their humanoid robots sprint, box in the ring, and even perform some kung fu moves! And guess what? I am stepping into the ring myself to really see what it's made of.

We get an in-depth look at their robots, including the H1, G1, B2, Go2 and the B2W. Not only will we see all the tricks and programs they are capable of, but we will also drag race them to see who is the fastest!

Let us know in the comments which ROBOT was your favourite and where we should be visiting next!
Who will dominate the race for superintelligent AI that could reshape humanity? As OpenAI, Google, Microsoft, NVIDIA, and Elon Musk's xAI clash in an all-out AGI war, we break down their secret weapons, critical vulnerabilities, and the trillion-dollar stakes behind this high-stakes battle.

OpenAI bets everything on scaling laws, pushing GPT-5 toward human-like reasoning with massive compute. But leaked internal docs reveal concerns about safety vs. speed. Google DeepMind counters with Gemini's multimodal brain, integrating search, code, and real-world data into a single agent. Yet its struggle to monetize threatens progress.

Microsoft leverages Azure's cloud empire to dominate AI infrastructure, backing OpenAI while secretly developing its own AGI models. NVIDIA isn't just supplying chips; its CUDA ecosystem and robotics platforms (like GR00T) aim to control the physical layer of AGI. And xAI's wildcard: Musk's plan to merge AGI with neural implants, using real-time brain data to train models.

The winner won't just profit; they'll set global ethics standards, control AI rights, and potentially trigger geopolitical strife. With China accelerating, this war's outcome could determine whether AGI uplifts or endangers civilization.

Who currently leads the AGI race? When will true AGI arrive? Which company has the best strategy? Could AGI cause a tech monopoly? Will governments stop the war? This video exposes the ruthless battle for intelligence supremacy. The future is being decided now.
Peter Wang, co-founder of Anaconda, stopped by FUTO to give a couple of lectures on AI. This is the first lecture he gave; the next lecture will release shortly. Do you like his ideas? Let us know in the comments below!

Also, if there are other people you would like us to invite to give talks for the channel, let us know!
AI watching AI videos, curated by AI, with AI commercials... now all we gotta do is give the AI money to spend on the crap the commercials push and we've solved the alignment problem!
It's not dead, it is undead. Still moving
Generative AI is advancing fast, but can we afford it? This video dives into the hidden costs behind agentic AI: compute, memory, infrastructure, and power. From healthcare to education, the stakes are rising. Here's what needs to change if we want AI that works - and something that works for everyone. https://www.recogni.com/

[00:00] Intro
[00:29] AI's Current Capabilities
[02:45] AI in Education
[05:10] AI in Healthcare
[08:06] AI in Employment
[10:25] Infrastructure and AI
[13:48] AI Compute Limitations
[17:29] Evolution of Generative AI
0:00 Can AI Reason?
0:23 LLMs vs LRMs
2:11 Towers of Hanoi problem
3:36 Results from the Apple Paper
6:47 My thoughts on the paper
9:49 Brilliant.org/TreforBazett
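For context on the puzzle the Apple paper uses as a reasoning benchmark: Towers of Hanoi has a simple recursive solution requiring 2^n - 1 moves for n disks, which is exactly what makes it a convenient, scalable test of whether a model can stay coherent over long move sequences. A minimal reference solver (my own sketch, not code from the paper or video):

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the optimal move list for n disks from src to dst."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # move n-1 disks out of the way
    moves.append((src, dst))             # move the largest disk
    hanoi(n - 1, aux, src, dst, moves)   # move the n-1 disks back on top
    return moves

for n in range(1, 6):
    assert len(hanoi(n)) == 2**n - 1     # optimal solution length grows exponentially
print(hanoi(3))  # 7 moves: [('A','C'), ('A','B'), ('C','B'), ('A','C'), ('B','A'), ('B','C'), ('A','C')]
```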
Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300-a-month secrets to Grok 5 promises, here are 10 new things to know in just under 12 minutes.

AI Insiders ($9!): / aiexplained

Chapters:
00:00 - Introduction
00:22 - Benchmark Results
02:11 - Benchmark Caveats
02:59 - ARC-AGI 2
03:35 - SimpleBench
04:49 - "Humanity's Last Exam"
07:20 - SuperGrok Heavy Price
07:58 - API Price
08:12 - Grok 5, Gemini 3.0 Beta, GPT-5
09:12 - System Prompt Change + $1B a month, pollution
10:20 - Not soloing science, helping you solo code
In this episode, I dive deep into the release of Grok 4 by xAI and its groundbreaking performance on various benchmarks. We compare its capabilities with popular leading AI models like OpenAI's o3, Gemini 2.5, and Claude 4. Grok 4 tops the ARC-AGI leaderboard and excels in complex tasks, but also shows some limitations in nuanced queries. I test its efficiency in real-world scenarios, from ranking global snack foods to evaluating image authenticity. Despite some challenges, Grok 4 showcases impressive advancements, and I discuss its potential impact on the AI landscape. Stay tuned for more in-depth tests and community reactions in future videos!

00:00 Introduction to Grok Four
00:23 Benchmark Performance of Grok Four
01:33 ARC AGI Benchmark Validation
02:50 Humanity's Last Exam and Other Benchmarks
04:24 New Features and Voice Mode
05:22 Grok Four Heavy and Advanced Capabilities
06:43 Coding and Real-World Applications
07:49 Live Testing Grok Four
11:58 Comparative Analysis with Other Models
16:06 Image Analysis and Multimodal Capabilities
18:43 Final Thoughts and Future Prospects
Progress towards general intelligence has been marked by identifying fundamental intelligence bottlenecks within existing models and developing solutions that improve the architecture or training objective. From this perspective, we discuss our work on Thinking in Gemini as a solution to a bottleneck in test-time compute. We will discuss recent progress in Thinking, both in terms of capability and steerability, and discuss where our models are headed.

About Jack Rae
Lead of Gemini Thinking, co-lead of Gemini Pre-training
How does the 3mm-thick sheet of cells on the surface of our brain give rise to everything we consider human intelligence? In this video we explore one of the most compelling answers to that question: Jeff Hawkins' "Thousand Brains Theory" and the neurobiology behind cortical columns. We will see how every cortical column is a complete sensorimotor system, and discuss all six neuronal layers of the cortex to see how they work together to build predictive models of the world through a constant loop of sensation, movement, and consensus voting.

My name is Artem, I'm a neuroscience PhD student at Harvard University.

Socials:
X/Twitter: https://x.com/ArtemKRSV
Patreon: / artemkirsanov

🕒 OUTLINE:
00:00 Introduction
00:57 Columnar Hypothesis
03:07 Predictive Models of the world
05:05 Sensorimotor coupling
07:53 Evolutionary origins
10:43 Anatomy of the column
14:19 Consensus Voting
17:42 Putting all together
19:47 Brilliant

📚 FURTHER READING & REFERENCES:
For those who want to dive deeper into the science:
1. Hawkins, J. (2021). A Thousand Brains: A New Theory of Intelligence. (Book, with a foreword by Richard Dawkins.)
2. Clay, V. et al. (2024). The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence.
3. Hawkins, J. et al. (2019). A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex. Frontiers in Neural Circuits.
4. Hawkins, J. & Ahmad, S. (2016). Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex. Frontiers in Neural Circuits.
5. Harris, K.D. & Shepherd, G.M.G. (2015). The neocortical circuit: themes and variations. Nature Neuroscience.
6. Haueis, P. (2016). The life of the cortical column: opening the domain of functional architecture of the cortex (1955-1981). History and Philosophy of the Life Sciences.
7. Hawkins, J. (n.d.). A Theory of How Columns in the Neocortex Enable Learning the Structure of the World. Numenta whitepaper.
8. Hawkins, J., Leadholm, N., & Clay, V. (2025). Hierarchy or Heterarchy? A Theory of Long-Range Connections for the Sensorimotor Brain.
9. Leadholm, N., Clay, V., Knudstrup, S., Lee, H., & Hawkins, J. (2025). Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference.
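To make the "consensus voting" idea concrete: in the Thousand Brains picture, many columns each maintain their own hypothesis about which object is being sensed, and lateral connections let them vote until they converge on a shared answer. Here is a toy numerical sketch of that voting step (my own illustration, not code from Numenta or the video), combining each column's probability distribution over candidate objects by multiplying and renormalizing:

```python
import numpy as np

objects = ["mug", "stapler", "phone"]

# Each row: one cortical column's current belief over the candidate objects,
# formed from its own local sensory patch (made-up numbers).
column_beliefs = np.array([
    [0.60, 0.30, 0.10],   # column touching the handle: probably a mug
    [0.50, 0.40, 0.10],   # ambiguous patch
    [0.70, 0.20, 0.10],   # curved surface: likely a mug
])

# Consensus voting: combine the independent votes (product of beliefs),
# then renormalize to get the shared, converged hypothesis.
consensus = column_beliefs.prod(axis=0)
consensus /= consensus.sum()

for name, p in zip(objects, consensus):
    print(f"{name}: {p:.2f}")   # the mug hypothesis dominates after voting
```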
The Big LLM Architecture Comparison
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are.

Sure, positional embeddings have evolved from absolute to rotational (RoPE), Multi-Head Attention has largely given way to Grouped-Query Attention, and the more efficient SwiGLU has replaced activation functions like GELU. But beneath these minor refinements, have we truly seen groundbreaking changes, or are we simply polishing the same architectural foundations?

Comparing LLMs to determine the key ingredients that contribute to their good (or not-so-good) performance is notoriously challenging: datasets, training techniques, and hyperparameters vary widely and are often not well documented.

However, I think there is still a lot of value in examining the structural changes of the architectures themselves to see what LLM developers are up to in 2025.
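As a concrete reference point for one of the refinements mentioned above, here is a minimal PyTorch sketch of a SwiGLU feed-forward block next to the older GELU-style one. The dimensions and names are my own assumptions for illustration and do not correspond to any specific model's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GELUFeedForward(nn.Module):
    """Classic GPT-2-style MLP block: up-project, GELU, down-project."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class SwiGLUFeedForward(nn.Module):
    """SwiGLU block: one projection, passed through SiLU/Swish,
    multiplicatively gates another, as in most recent LLMs."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)            # (batch, sequence, d_model)
print(SwiGLUFeedForward()(x).shape)    # torch.Size([2, 16, 512])
```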
Somehow AI continues to rapidly improve despite many saying "it's hit a wall". Please, people, let the results speak for themselves.
Can't wait for the new goalposts that explain what ai will never be able to do
Those underestimating AI just can't handle it emotionally.
Staying coherent for the length of these proofs tells me to expect that we'll stay on the very short-term trajectory of the METR agent task length benchmark, doubling every four months, for at least another two rounds (i.e. eight months).
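Spelling out the arithmetic in that forecast: two more doublings at a four-month cadence means roughly a 4x increase in sustainable task length over eight months. A trivial sketch (the starting task-length value is a placeholder, not a real METR figure):

```python
current_horizon_minutes = 60          # placeholder starting point, not an actual METR number
doubling_period_months = 4

for months in (4, 8):
    doublings = months / doubling_period_months
    projected = current_horizon_minutes * 2 ** doublings
    print(f"after {months} months: ~{projected:.0f} minutes")   # 120, then 240
```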
It's interesting and a little sad how people so frantically grasp at any limitation/shortcoming of, or mistake by, frontier AIs. The insecurity is palpable. My suggestion is they get over it. As Ilya warned: you're going to have an increasingly bad time if you value human intelligence above all else. The ship has left port, people. We're on shore, watching it sail out into the vast sea of super-intelligence.