Pipeline parallelism as a band-aid on memory limitations
Large models as engines of computation
Distilling models makes them feasible to use
Quadratic complexity holds back the legendary transformer (Part 2)
Quadratic complexity holds back the legendary transformer (Part 1)
Composing models together makes them powerful
Multi-modal models are the future
Deep learning solves a 20-year-long unsolved problem in science (Part 2)
Deep learning solves a 20-year-long unsolved problem in science (Part 1)
Models can do calculus better than you
Is a group of expert models better than one very smart model?
Winning the AI lottery by buying a lot of tickets
Using information retrieval for code generation
Meta's new model is small and mighty
Models can control robots just like humans
Anthropic makes AI that teaches itself ethics
Models can magically learn new skills at scale
Discovering a better optimization algorithm with evolution
Talking to models requires special prompts that help them think sequentially