Pipeline parallelism as a bandaid on memory limitations

Large models as engines of computation

Distilling models makes them feasible to use

Quadratic complexity holds back the legendary transformer (Part 2)

Quadratic complexity holds back the legendary transformer (Part 1)

Composing models together makes them powerful

Multi-modal models are the future

Deep learning solves a 20-year-long unsolved problem in science (Part 2)

Deep learning solves a 20-year-long unsolved problem in science (Part 1)

Models can do calculus better than you

Is a group of expert models better than one very smart model?

Winning the AI Lottery by Buying a Lot of Tickets

Using information retrieval for code generation

Meta's new model is small and mighty

Models can control robots just like humans

Anthropic makes AI that teaches itself ethics

Models can magically learn new skills at scale

Discovering a better optimization algorithm with evolution

Talking to models requires special prompts that help them think sequentially

Teaching LLMs to use tools and not suck at math