LAI #82: MCP, Byte-Level LLMs, Vision Transformers, and the Week Backprop Finally Clicked
Last Updated on July 4, 2025 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
Good morning, AI enthusiasts,
This week's issue zooms in on what happens when you go one layer deeper, whether it's understanding MCP for smarter tool integrations, or hand-coding backprop to finally grasp what your model's doing under the hood.
In What's AI, we break down the Model Context Protocol (MCP), a new standard that might save you hours of repetitive integration work across tools and orgs. Then we go hands-on: building Vision Transformers in PyTorch, testing byte-level LLMs without tokenization, comparing neural optimizers, and evaluating when open-source alternatives are actually good enough to use.
Also in the mix: metadata enrichment tools for Rust, new Discord collabs, and a meme that may hit a little too close for anyone who's wrangled gradient descent this week.
Letβs get into it.
What's AI Weekly
This week in What's AI, I dive into Anthropic's Model Context Protocol. How many times have you started a new AI project and found yourself re-wiring the same GitHub, Slack, or SQL integration? That copy-paste repetition, and the lack of shared norms between organizations and individuals, is exactly why the Model Context Protocol, or MCP, exists. Read about what it is and why it matters in your day-to-day life, or watch the video on YouTube.
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community Section!
Featured Community post from the Discord
Superuser666_30897 has developed a system for gathering, enriching, and analyzing metadata for Rust crates, utilizing AI-powered insights, web scraping, and dependency analysis. It combines web scraping, AI-powered analysis, and cargo testing to provide comprehensive insights into Rust ecosystem packages. Check it out on GitHub and support a fellow community member. If you have any questions or suggestions, reach out to him in the thread!
AI poll of the week!
Over half of this community entered AI after ChatGPT, and that's not a bad thing. It marks a clear generational shift: from research-first to product-first, from academic papers to API calls. If you joined before ChatGPT, what mindset do you think the newer crowd misses? And if you joined after, what's one thing you think the earlier crowd underestimates? Tell us in the thread!
Collaboration Opportunities
The Learn AI Together Discord community is flooded with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section; we share cool opportunities every week!
1. Skaggsllc is building VERA AI, an AI-driven system for predictive vehicle maintenance and fleet diagnostics, and is looking for developers who may be interested in contributing to the development of this platform. If this sounds like your niche, connect in the thread!
2. Vergil727 is looking for someone to help integrate an Advanced Planning & Scheduling (APS) system into their ERP/MES environment. Youβll handle data mapping, scheduling config, and system integration (SQL, ERP, MES). If this falls within your skill set, please reach out in the thread!
Meme of the week!
Meme shared by bigbuxchungus
TAI Curated Section
Article of the week
From Pixels to Predictions: Building a Transformer for Images By Vicki Y Mu
As Generative AI continues to advance, understanding foundational models like the Vision Transformer is essential. This walkthrough details the process of building a Vision Transformer (ViT) from scratch in PyTorch, explaining the theory behind converting images into patch sequences and processing them with multi-head self-attention. The author implements the model, trains it on the CIFAR-10 dataset, and analyzes the results, achieving an accuracy of 60%. It also covers ViT limitations and mentions more advanced architectures.
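To make the patch-sequence idea concrete, here is a minimal PyTorch sketch of a patch embedding layer of the kind a ViT starts with. The class name, patch size, and embedding dimension are illustrative assumptions, not the article's code.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each patch to an embedding."""
    def __init__(self, img_size=32, patch_size=4, in_channels=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution extracts and projects non-overlapping patches in one step.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                        # x: (B, 3, 32, 32)
        x = self.proj(x)                          # (B, embed_dim, 8, 8)
        x = x.flatten(2).transpose(1, 2)          # (B, 64, embed_dim) patch sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        return x                                  # ready for multi-head self-attention blocks

patches = PatchEmbedding()(torch.randn(2, 3, 32, 32))
print(patches.shape)  # torch.Size([2, 65, 192])
```

From here, the patch sequence is fed through standard Transformer encoder blocks, and the classification head reads off the class token.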
Our must-read articles
1. The Week I Spent Hand-Coding Neural Networks to Finally Understand Backpropagation By Abduldattijo
The author details their experience building a neural network from scratch using only NumPy, despite years of experience with frameworks like PyTorch. Prompted by a gap in their understanding of backpropagation, they work through the challenges of manual gradient calculation, implementing the backward pass, and building an optimizer. The exercise yields a deeper, practical understanding of how neural networks work, along with sharper debugging skills and a better grasp of complex architectures.
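For a taste of what that from-scratch exercise involves, here is a minimal NumPy sketch of a forward and backward pass through a tiny two-layer network with a mean-squared-error loss, updated by plain SGD. The dimensions and variable names are illustrative assumptions, not the author's code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))          # 64 samples, 10 features
y = rng.normal(size=(64, 1))

W1, b1 = rng.normal(size=(10, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
lr = 0.01

for step in range(200):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0)             # ReLU
    pred = a1 @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # Backward pass: apply the chain rule layer by layer
    dpred = 2 * (pred - y) / len(X)    # dL/dpred
    dW2 = a1.T @ dpred
    db2 = dpred.sum(axis=0)
    da1 = dpred @ W2.T
    dz1 = da1 * (z1 > 0)               # gradient through ReLU
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Plain SGD update (the hand-rolled "optimizer")
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```

Writing these few lines by hand is exactly the kind of exercise that makes the chain rule, and the shapes flowing through it, stick.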
2. From Bytes to Ideas: LLMs Without Tokenization By MKWriteshere
The author examines Metaβs Autoregressive U-Net (AU-Net), an architecture designed to overcome the limitations of traditional tokenization in language models. Instead of using predefined tokens, AU-Net processes raw text at the byte level, learning to build understanding from letters up to concepts. This method improves the handling of typos and new languages. Performance benchmarks show AU-Net is competitive with standard models, demonstrating particular strength in multilingual translation and character-level tasks. However, the current version is primarily optimized for Latin-script languages.
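To see what "no tokenizer" means in practice, here is a tiny illustrative Python snippet showing raw UTF-8 bytes used as model inputs; this is only a sketch of the input representation, not AU-Net's actual pipeline.

```python
text = "naïve café"
byte_ids = list(text.encode("utf-8"))   # each character becomes one or more byte IDs in 0..255
print(byte_ids)

# Typos and unseen words never map to an "unknown token" — they are just different byte sequences.
decoded = bytes(byte_ids).decode("utf-8")
assert decoded == text
```

The vocabulary shrinks to 256 possible values, and the model takes on the job of composing bytes into characters, words, and concepts.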
3. Understanding Model Context Protocol (MCP): The Future of AI Tool Integration By Mahendramedapati
This blog explains the Model Context Protocol (MCP), a standardized method for connecting AI models to external systems, such as databases, APIs, and files. It functions as a universal translator, eliminating the need for complex, custom integrations for each tool. Using a hotel concierge analogy, the author illustrates how MCP securely fetches and formats data for the AI. The text outlines benefits such as reduced costs and improved security, and provides guides with code examples for implementation.
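As a rough illustration of that "universal translator" idea, the sketch below constructs a simplified MCP-style tool-call request over JSON-RPC. The tool name and arguments are hypothetical, and the exact message fields should be checked against the MCP specification.

```python
import json

# Simplified, illustrative MCP-style tool call over JSON-RPC 2.0.
# The tool name and arguments below are hypothetical; consult the MCP spec
# for the normative message shapes and required fields.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",                              # hypothetical tool exposed by an MCP server
        "arguments": {"sql": "SELECT COUNT(*) FROM bookings"},
    },
}
print(json.dumps(request, indent=2))
# The server runs the tool and returns a result the model can read, so the AI app
# never needs a custom, one-off integration for this particular database.
```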
4. The Best Optimization Algorithm for Your Neural Network By Riccardo Andreoni
The author presents a guide to neural network optimization algorithms designed to reduce training time. It reviews foundational methods, such as Batch and Mini-Batch Gradient Descent, before moving to more advanced techniques: Momentum, which uses past gradients for faster convergence, and RMSprop, which adapts the learning rate for each parameter. The discussion then progresses to Adam, which combines both approaches and whose superior performance is demonstrated in a practical comparison on the Fashion MNIST dataset. Learning rate decay is also noted as a complementary technique to refine training.
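For a quick reference on how those pieces fit together, here is a minimal NumPy sketch of a single Adam step, combining Momentum's running average of gradients with RMSprop's running average of squared gradients. The hyperparameters are common defaults, not values from the article.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m tracks the gradient mean (Momentum), v its squared magnitude (RMSprop)."""
    m = beta1 * m + (1 - beta1) * grad            # momentum-style first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # RMSprop-style second moment
    m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(w) = (w - 3)^2
w, m, v = np.array(0.0), 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(f"w = {w:.2f}")  # converges toward the minimum at 3.0
```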
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
Join thousands of data leaders and over 80,000 subscribers on the AI newsletter to keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI