Fazl Barez
University of Oxford, Tangentic
Verified email at robots.ox.ac.uk - Homepage
Title · Cited by · Year
Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
J Hoelscher-Obermaier*, J Persson*, E Kran, I Konstas, F Barez*
Findings of the Association for Computational Linguistics 2023, 11548–11559, 2023
Cited by 44 · 2023
Sleeper agents: Training deceptive llms that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 40 · 2024
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
P Li, H Tang, T Yang, X Hao, T Sang, Y Zheng, J Hao, ME Taylor, Z Wang, ...
arXiv preprint arXiv:2203.08553, 2022
Cited by 31 · 2022
The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
AVM Barone*, F Barez*, I Konstas, SB Cohen
The 61st Annual Meeting Of The Association For Computational Linguistics, 2023
Cited by 25* · 2023
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
K Duh, H Gomez, S Bethard
Proceedings of the 2024 Conference of the North American Chapter of the …, 2024
Cited by 18 · 2024
Neuron to Graph: Interpreting Language Model Neurons at Scale
A Foote*, N Nanda, E Kran, I Konstas, S Cohen, F Barez*
arXiv preprint arXiv:2305.19911, 2023
Cited by 17* · 2023
Understanding Addition in Transformers
P Quirke, F Barez
International Conference on Learning Representations (ICLR), 2023
Cited by 16 · 2023
Risks and Opportunities of Open-Source Generative AI
F Eiras, A Petrov, B Vidgen, C Schroeder, F Pizzati, K Elkins, ...
arXiv preprint arXiv:2405.08597, 2024
Cited by 8 · 2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
C Denison, M MacDiarmid, F Barez, D Duvenaud, S Kravec, S Marks, ...
arXiv preprint arXiv:2406.10162, 2024
Cited by 7 · 2024
Near to mid-term risks and opportunities of open source generative AI
F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati, K Elkins, ...
arXiv preprint arXiv:2404.17047, 2024
Cited by 5 · 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 5 · 2024
Benchmarking specialized databases for high-frequency data
F Barez, P Bilokon, R Xiong
arXiv preprint arXiv:2301.12561, 2023
Cited by 5 · 2023
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
C Denison, M MacDiarmid, F Barez, D Duvenaud, S Kravec, S Marks, ...
arXiv preprint arXiv:2406.10162, 2024
Cited by 5 · 2024
Identifying a preliminary circuit for predicting gendered pronouns in GPT-2 small
C Mathwin, G Corlouer, E Kran, F Barez, N Nanda
https://itch.io/jam/mechint/rate/1889871, 2023
Cited by 4 · 2023
System III: Learning with Domain Knowledge for Safety Constraints
F Barez, H Hasanbieg, A Abbate
NeurIPS ML Safety Workshop, 2022
Cited by 4 · 2022
Discovering topics and trends in the UK Government web archive
D Beavan, F Barez, M Bel, J Fitzgerald, E Goudarouli, K Kollnig, ...
Data Study Group Final Report. Alan Turing Institute, London, 2021
Cited by 4* · 2021
Large language models relearn removed concepts
M Lo, SB Cohen, F Barez
arXiv preprint arXiv:2401.01814, 2024
Cited by 3 · 2024
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
A Garde, E Kran, F Barez
arXiv preprint arXiv:2310.01870, 2023
Cited by 3 · 2023
Fairness in AI and Its Long-Term Implications on Society
O Bohdal*, T Hospedales, PHS Torr, F Barez*
arXiv preprint arXiv:2304.09826, 2023
Cited by 3 · 2023
Exploring the advantages of transformers for high-frequency trading
F Barez, P Bilokon, A Gervais, N Lisitsyn
arXiv preprint arXiv:2302.13850, 2023
Cited by 3 · 2023
Articles 1–20