Sharan Narang
Research Engineer, Meta AI
Verified email at meta.com
Title
Cited by
Year
Exploring the limits of transfer learning with a unified text-to-text transformer
C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ...
Journal of machine learning research 21 (140), 1-67, 2020
Cited by: 16361; Year: 2020
Llama 2: Open foundation and fine-tuned chat models
H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ...
arXiv preprint arXiv:2307.09288, 2023
Cited by: 6330; Year: 2023
PaLM: Scaling language modeling with pathways
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
Journal of Machine Learning Research 24 (240), 1-113, 2023
Cited by: 3958; Year: 2023
Deep Speech 2: End-to-end speech recognition in English and Mandarin
D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ...
International conference on machine learning, 173-182, 2016
Cited by: 3663; Year: 2016
Scaling instruction-finetuned language models
HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ...
Journal of Machine Learning Research 25 (70), 1-53, 2024
Cited by: 2087; Year: 2024
Mixed precision training
P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, ...
arXiv preprint arXiv:1710.03740, 2017
Cited by: 1792; Year: 2017
Deep Voice 3: Scaling text-to-speech with convolutional sequence learning
W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ...
arXiv preprint arXiv:1710.07654, 2017
Cited by: 863*; Year: 2017
Self-consistency improves chain of thought reasoning in language models
X Wang, J Wei, D Schuurmans, Q Le, E Chi, S Narang, A Chowdhery, ...
arXiv preprint arXiv:2203.11171, 2022
Cited by: 731; Year: 2022
Deep learning scaling is predictable, empirically
J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ...
arXiv preprint arXiv:1712.00409, 2017
Cited by: 662; Year: 2017
Exploring sparsity in recurrent neural networks
S Narang, E Elsen, G Diamos, S Sengupta
arXiv preprint arXiv:1704.05119, 2017
Cited by: 353; Year: 2017
ByT5: Towards a token-free future with pre-trained byte-to-byte models
L Xue, A Barua, N Constant, R Al-Rfou, S Narang, M Kale, A Roberts, ...
Transactions of the Association for Computational Linguistics 10, 291-306, 2022
Cited by: 337; Year: 2022
DSD: regularizing deep neural networks with dense-sparse-dense training flow
S Han, J Pool, S Narang, H Mao, S Tang, E Elsen, B Catanzaro, J Tran, ...
arXiv preprint arXiv:1607.04381, 2016
Cited by: 326*; Year: 2016
WT5?! Training text-to-text models to explain their predictions
S Narang, C Raffel, K Lee, A Roberts, N Fiedel, K Malkan
arXiv preprint arXiv:2004.14546, 2020
Cited by: 178; Year: 2020
Block-sparse recurrent neural networks
S Narang, E Undersander, G Diamos
arXiv preprint arXiv:1711.02782, 2017
Cited by: 151; Year: 2017
Scaling up models and data with T5X and SeqIO
A Roberts, HW Chung, G Mishra, A Levskaya, J Bradbury, D Andor, ...
Journal of Machine Learning Research 24 (377), 1-8, 2023
Cited by: 128; Year: 2023
Scale efficiently: Insights from pre-training and fine-tuning transformers
Y Tay, M Dehghani, J Rao, W Fedus, S Abnar, HW Chung, S Narang, ...
arXiv preprint arXiv:2109.10686, 2021
Cited by: 111; Year: 2021
Do transformer modifications transfer across implementations and applications?
S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ...
arXiv preprint arXiv:2102.11972, 2021
Cited by: 89; Year: 2021
PaLM: Scaling language modeling with pathways. arXiv 2022
A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ...
arXiv preprint arXiv:2204.02311, 2022
Cited by: 85; Year: 2022
Huai hsin Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V
HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, E Li, X Wang, ...
Le, and Jason Wei, 2022
Cited by: 77; Year: 2022
Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv
C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ...
arXiv preprint arXiv:1910.10683, 2019
Cited by: 74; Year: 2019