Yizhuo Li
Verified email at cs.hku.hk
Title
Cited by
Year
VideoChat: Chat-centric video understanding
KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao
arXiv preprint arXiv:2305.06355, 2023
Cited by 552, 2023
TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model
B Pang, Y Li, Y Zhang, M Li, C Lu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020
Cited by 328, 2020
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ...
arXiv preprint arXiv:2212.03191, 2022
Cited by 317, 2022
MVBench: A comprehensive multi-modal video understanding benchmark
K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024
Cited by 234, 2024
InternVid: A large-scale video-text dataset for multimodal understanding and generation
Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ...
arXiv preprint arXiv:2307.06942, 2023
Cited by 204, 2023
Unmasked teacher: Towards training-efficient video foundation models
K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by 143, 2023
HOI Analysis: Integrating and Decomposing Human-Object Interaction
YL Li, X Liu, X Wu, Y Li, C Lu
Advances in Neural Information Processing Systems 33, 2020
Cited by 141, 2020
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
arXiv preprint arXiv:2211.09552, 2022
Cited by 126, 2022
Test-time personalization with a transformer for human pose estimation
Y Li, M Hao, Z Di, NB Gundavarapu, X Wang
Advances in Neural Information Processing Systems 34, 2583-2597, 2021
Cited by 50, 2021
UniFormerV2: Unlocking the potential of image ViTs for video understanding
K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by 47, 2023
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ...
arXiv preprint arXiv:2211.09529, 2022
Cited by 43, 2022
HAKE: A knowledge engine foundation for human activity understanding
YL Li, X Liu, X Wu, Y Li, Z Qiu, L Xu, Y Xu, HS Fang, C Lu
IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (7), 8494-8506, 2022
Cited by 34, 2022
PGT: A Progressive Method for Training Models on Long Videos
B Pang, G Peng, Y Li, C Lu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by 13, 2021
Unsupervised representation for semantic segmentation by implicit cycle-attention contrastive learning
B Pang, Y Li, Y Zhang, G Peng, J Tang, K Zha, J Li, C Lu
Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2044-2052, 2022
Cited by 12, 2022
TDAF: Top-down attention framework for vision tasks
B Pang, Y Li, J Li, M Li, H Cao, C Lu
Proceedings of the AAAI Conference on Artificial Intelligence 35 (3), 2384-2392, 2021
Cited by 11, 2021
Harvest Video Foundation Models via Efficient Post-Pretraining
Y Li, K Li, Y He, Y Wang, Y Wang, L Wang, Y Qiao, P Luo
arXiv preprint arXiv:2310.19554, 2023
Cited by 2, 2023
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Y Chen, Y Ge, Y Li, Y Ge, M Ding, Y Shan, X Liu
arXiv preprint arXiv:2412.04445, 2024
2024
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Y Ge, Y Li, Y Ge, Y Shan
arXiv preprint arXiv:2412.04432, 2024
2024
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
Y Li, Y Ge, Y Ge, P Luo, Y Shan
arXiv preprint arXiv:2412.04446, 2024
2024