Research Article

Monocular 3D object detection with thermodynamic loss and decoupled instance depth

Article: 2316022 | Received 05 Jun 2023, Accepted 02 Feb 2024, Published online: 13 Feb 2024
