The worthwhile problems are the ones you can really solve or help solve, the ones you can really contribute something to. A problem is grand in science if it lies before us unsolved and we see some way for us to make some headway into it.
No problem is too small or too trivial if we can really do something about it.
dynamic computation seems like a big deal right now: with Mixture-of-Depths (MoD) and Quiet-STaR we get dynamic routing over tokens, with an arbitrary (?) amount of computation spent on each one. sounds like we could craft a system that picks out the important tokens and reasons more on those. combined with some latent representation, this could achieve some sort of abstract reasoning?
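the routing part can be sketched quite simply. this is a toy MoD-style sketch (my own minimal version, not the paper's implementation): a router scores each token, only the top-k tokens within a compute budget go through the expensive block, and the rest ride the residual path unchanged.

```python
import numpy as np

def mod_route(tokens, scores, capacity, heavy_fn):
    """MoD-style routing sketch: only the top-`capacity` tokens by router
    score go through the expensive block; the rest pass through unchanged
    via the residual path."""
    k = min(capacity, len(tokens))
    top = np.argsort(scores)[-k:]      # indices of the k highest-scoring tokens
    out = tokens.copy()
    out[top] = heavy_fn(tokens[top])   # spend compute only on selected tokens
    return out, top

# toy example: the "heavy" block just doubles the selected token vectors
tokens = np.arange(12, dtype=float).reshape(6, 2)   # 6 tokens, dim 2
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.3])  # router scores (assumed learned)
out, routed = mod_route(tokens, scores, capacity=2, heavy_fn=lambda x: 2 * x)
```

in a real model `scores` would come from a learned router and `heavy_fn` would be a transformer block; the point is just that compute per token becomes a budgeted choice rather than a constant.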
a similar idea could be applied to image generation, to avoid spending computation on unimportant parts of the image and focus on the important ones, or to filter out irrelevant tokens in the text conditioning. the VAE currently used in Stable Diffusion is a bit of a glorified downscaler, but with a token representation the same rationale could apply.
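the same top-k trick carries over to image patches. a minimal sketch of the idea (assumed setup, not any existing library's API), using patch variance as a cheap stand-in for an "importance" score: flat regions get skipped, detailed regions get the heavy block.

```python
import numpy as np

def route_patches(patches, budget_frac, heavy_fn):
    """Allocate heavy compute to a fraction of image patches.
    Variance is a crude proxy for importance: flat patches score low,
    textured patches score high."""
    n = len(patches)
    k = max(1, int(round(budget_frac * n)))
    scores = patches.reshape(n, -1).var(axis=1)  # per-patch variance
    top = np.argsort(scores)[-k:]                # top-k most "important" patches
    out = patches.copy()
    out[top] = heavy_fn(patches[top])            # heavy block only where it matters
    return out, top

# toy image: 4 patches of 4x4; one textured patch, three flat ones
patches = np.zeros((4, 4, 4))
patches[2] = np.arange(16).reshape(4, 4)         # the only high-variance patch
out, routed = route_patches(patches, budget_frac=0.25, heavy_fn=lambda x: x + 1.0)
```

in a diffusion model the score would presumably be learned rather than hand-crafted, but the budgeting mechanism is the same as in the token case.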
Gecko embeddings for t2i, since it looks like they can carry more information than “normal” CLIP and T5 embeddings.