
Dave Zhenyu Chen

I'm a Ph.D. candidate in the TUM Visual Computing Group, advised by Prof. Dr. Matthias Nießner and Prof. Dr. Angel X. Chang. Previously, I received my Master's degree in Informatics from Ludwig Maximilians University of Munich (LMU). Before that, I received my Bachelor's degree in Computer Science from the University of Electronic Science and Technology of China (UESTC).

My research interest is bridging the gap between 3D vision and natural language with the power of deep learning. Specifically, I'm interested in 1) 3D object detection; 2) visual grounding in RGB-D scans; 3) scene graph representation in indoor environments.


Email / Google Scholar / Github / Linkedin / CV



News
[06/25/2021] I co-organized the 1st Workshop on Language for 3D Scenes at CVPR 2021!
[02/28/2021] Our paper was accepted at CVPR 2021!
[07/02/2020] Our paper was accepted at ECCV 2020!
Publications
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang
Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Nashville, US
paper / video / bibtex / project / code

We introduce the task of dense captioning in 3D scans from commodity RGB-D sensors. As input, we assume a point cloud of a 3D scene; the expected output is a set of bounding boxes along with descriptions of the underlying objects. Our method can effectively localize and describe 3D objects in scenes from the ScanRefer dataset, outperforming 2D baseline methods by a significant margin (27.61% CIDEr@0.5IoU improvement).



ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner
European Conference on Computer Vision (ECCV), 2020, Glasgow, UK
paper / video / bibtex / project / code

We propose ScanRefer, a method that learns a fused descriptor from 3D object proposals and encoded sentence embeddings to address the newly introduced task of 3D object localization in RGB-D scans using natural language descriptions. Along with the method, we release a large-scale dataset of 51,583 descriptions of 11,046 objects from 800 ScanNet scenes.



Teaching
Teaching Assistant, Advanced Deep Learning for Computer Vision - Winter 2020/21
Teaching Assistant, Advanced Deep Learning for Computer Vision - Summer 2020
Teaching Assistant, Advanced Deep Learning for Computer Vision - Winter 2019/20
Teaching Assistant, Advanced Deep Learning for Computer Vision - Summer 2019