On the opportunities and risks of foundation models R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ... arXiv preprint arXiv:2108.07258, 2021 | 3784 | 2021 |
Wilds: A benchmark of in-the-wild distribution shifts PW Koh, S Sagawa, H Marklund, SM Xie, M Zhang, A Balsubramani, ... arXiv preprint arXiv:2012.07421, 2021 | 1341 | 2021 |
Holistic Evaluation of Language Models P Liang, R Bommasani, T Lee, D Tsipras, D Soylu, M Yasunaga, Y Zhang, ... arXiv preprint arXiv:2211.09110, 2022 | 998* | 2022 |
StarCoder: may the source be with you! R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ... arXiv preprint arXiv:2305.06161, 2023 | 674* | 2023 |
Extending the WILDS Benchmark for Unsupervised Adaptation S Sagawa, PW Koh, T Lee, I Gao, SM Xie, K Shen, A Kumar, W Hu, ... arXiv preprint arXiv:2112.05090, 2021 | 120 | 2021 |
Evaluating Human-Language Model Interaction M Lee, M Srivastava, A Hardy, J Thickstun, E Durmus, A Paranjape, ... arXiv preprint arXiv:2212.09746, 2022 | 86 | 2022 |
Holistic Evaluation of Text-to-Image Models T Lee, M Yasunaga, C Meng, Y Mai, JS Park, A Gupta, Y Zhang, ... Thirty-seventh Conference on Neural Information Processing Systems Datasets …, 2023 | 59 | 2023 |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text E Bolton, A Venigalla, M Yasunaga, D Hall, B Xiong, T Lee, R Daneshjou, ... arXiv preprint arXiv:2403.18421, 2024 | 38* | 2024 |
Can small and synthetic benchmarks drive modeling innovation? A retrospective study of question answering modeling approaches NF Liu, T Lee, R Jia, P Liang | 28* | 2021 |
Cheaply estimating inference efficiency metrics for autoregressive transformer models D Narayanan, K Santhanam, P Henderson, R Bommasani, T Lee, ... Advances in Neural Information Processing Systems 36, 66518-66538, 2023 | 9* | 2023 |
The First Steps to Holistic Evaluation of Vision-Language Models T Lee, Y Mai, C Wong, J Roberts, M Yasunaga, F Kaiyom, R Bommasani, P Liang https://crfm.stanford.edu/2024/05/08/vhelm-initial.html, 2024 | | 2024 |