ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Best Practices and Lessons Learned on Synthetic Data for Language Models
2024-04-11
BRAVE: Broadening the visual encoding of vision-language models
Adapting LLaMA Decoder to Vision Transformer
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
2024-04-10
OmniFusion Technical Report
2024-04-09
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
2024-04-08
No “Zero-Shot” Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
We consistently find that, far from exhibiting “zero-shot” generalization, multimodal models require exponentially more data to achieve linear improvements in downstream “zero-shot” performance, following a sample inefficient log-linear scaling trend.
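To make that log-linear claim concrete, here is a minimal sketch (the coefficients a and b are made-up placeholders, not values from the paper): if downstream "zero-shot" accuracy grows roughly as a * log(concept frequency) + b, then each fixed gain in accuracy requires multiplying the pretraining concept count by a constant factor, i.e. exponentially more data.

```python
import math

# Hypothetical log-linear fit: accuracy = a * log10(concept_frequency) + b
# (a and b are illustrative placeholders, not numbers reported in the paper)
a, b = 10.0, 5.0

def accuracy(concept_frequency: float) -> float:
    """Downstream 'zero-shot' accuracy under a log-linear scaling trend."""
    return a * math.log10(concept_frequency) + b

# Linear gains in accuracy need exponential growth in concept frequency:
for freq in [1e3, 1e4, 1e5, 1e6]:
    print(f"{freq:>8.0e} concept examples -> {accuracy(freq):5.1f}% accuracy")
# Each +10-point accuracy step costs 10x more examples of the concept.
```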
non-attentive networks seem to be (a bit) back: the griffin architecture from google for llms, rwkv for diffusion, plus infinite context for llms. learning a compressed representation of the past sounds a lot like what rnns did back in the lstm days, no? and seq2seq too; is it just a matter of scaling? a rough sketch of the idea is below.
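A minimal numpy sketch of that "compressed representation of the past", loosely in the spirit of Infini-attention's compressive memory (the shapes, the ELU+1 nonlinearity, and the segment loop are my own simplifications; the paper also has a delta-rule variant and combines this with local attention). Past keys/values are folded into a fixed-size matrix that is carried across segments, much like an RNN hidden state.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, a common choice in linear-attention-style memories
    return np.where(x > 0, x + 1.0, np.exp(x))

class CompressiveMemory:
    """Fixed-size memory updated once per segment: all past key/value pairs
    are compressed into a (d_k x d_v) matrix plus a normalization vector."""

    def __init__(self, d_k: int, d_v: int):
        self.M = np.zeros((d_k, d_v))   # compressed key-value associations
        self.z = np.zeros((d_k, 1))     # normalization accumulator

    def update(self, K: np.ndarray, V: np.ndarray) -> None:
        sK = elu_plus_one(K)                              # (n, d_k)
        self.M += sK.T @ V                                # accumulate associations
        self.z += sK.sum(axis=0, keepdims=True).T

    def retrieve(self, Q: np.ndarray) -> np.ndarray:
        sQ = elu_plus_one(Q)                              # (n, d_k)
        return (sQ @ self.M) / (sQ @ self.z + 1e-6)       # (n, d_v)

# Usage: process an arbitrarily long sequence segment by segment,
# carrying only the fixed-size memory forward -- the RNN-like part.
rng = np.random.default_rng(0)
mem = CompressiveMemory(d_k=64, d_v=64)
for _ in range(4):                        # e.g. 4 segments of 128 tokens each
    K, V = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
    mem.update(K, V)
Q = rng.normal(size=(128, 64))
context_from_past = mem.retrieve(Q)       # read all past segments via the compressed memory
```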
openeqa shows how current networks are unable to operate in the wild on real-life scenarios. what's the missing piece to bring research into the real world? i am more and more convinced that deep learning is not the way
off-topic
i’ve been reading less this week, both books and comics :(