Publications & Awards
Hello, Bonjour, こんにちは, 你好! Glad to see you!
Research Papers
Here are the papers I published as a (co-)first or corresponding author. Visit my Google Scholar page to view the full publication list.
Someone† denotes equal contribution; Someone denotes corresponding authors.
🤖 Domain-Specific Processors for Robotics & Machine Vision
- [TCAS-1'25] Mengjie Li, Yiming Zhang, Siqi He, Qi Liu, Xiaoyang Zeng, Chixiao Chen, Haozhe Zhu. “RT-FLOW: FPGA Implementation of Real-Time Optical-Flow-based SLAM for High-Speed Tracking and High-Quality Mapping”
- [MICRO'24] Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, Xiaoyang Zeng. “GauSPU: 3D Gaussian Splatting Processor for Real-Time SLAM Systems”
- [JSSC'24] Mengjie Li†, Haozhe Zhu†, Siqi He, Hongyi Zhang, Jie Liao, Danfeng Zhai, Chixiao Chen, Qi Liu, Xiaoyang Zeng, Ninghui Sun, Ming Liu. “SLAM-CIM: A Visual SLAM Backend Processor With Dynamic-Range-Driven-Skipping Linear-Solving FP-CIM Macros”
- [A-SSCC'24] Mengjie Li, Haozhe Zhu, Yiming Zhang, Siqi He, Qi Liu, Xiaoyang Zeng, Chixiao Chen. “A Real-Time Optical-Flow-based SLAM FPGA Accelerator with Inter-Frame Similarity Exploitation and Correlation-Guided Mixed-Precision Flow Update”
- [TVLSI'24] Lizhou Wu, Haozhe Zhu, Jiapei Zheng, Mengjie Li, Yinuo Cheng, Qi Liu, Xiaoyang Zeng, Chixiao Chen. “Hi-NeRF: A Multicore NeRF Accelerator With Hierarchical Empty Space Skipping for Edge 3-D Rendering”
Chiplet Integration
- [DAC'25] Siqi He†, Haozhe Zhu†, Jiapei Zheng, Lizhou Wu, Bo Jiao, Qi Liu, Xiaoyang Zeng, Chixiao Chen. “Hydra: Harnessing Expert Popularity for Efficient Mixture-of-Expert Inference on Chiplet System” (Accepted)
- [HPCA'25] Siyao Jia†, Bo Jiao†, Haozhe Zhu, Chixiao Chen, Qi Liu, Ming Liu. “EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer”
- [ISSCC'25] Bo Jiao†, Haozhe Zhu†, Yuman Zeng, Yongjiang Li, Jie Liao, Siyao Jia, Mochen Tian, Zexing Chen, Jundong Zhu, Dexin Wen, Yan Wang, Yu Wang, Jian Xu, Feng Wang, Jun Tao, Chixiao Chen, Qi Liu, Ming Liu. “37.4 SHINSAI: A 586mm² Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory”
- [ISCAS'23] Jie Liao, Bo Jiao, Jinshan Zhang, Shiwei Liu, Hao Jiang, Jun Tao, Wenning Jiang, Qi Liu, Lihua Zhang, Haozhe Zhu, Chixiao Chen. “A Scalable Die-to-Die Interconnect with Replay and Repair Schemes for 2.5D/3D Integration”
- [ISSCC'22] Haozhe Zhu†, Bo Jiao†, Jinshan Zhang†, Xinru Jia, Yunzhengmao Wang, Tianchan Guan, Shengcheng Wang, Dimin Niu, Hongzhong Zheng, Chixiao Chen, Mingyu Wang, Lihua Zhang, Xiaoyang Zeng, Qi Liu, Yuan Xie, Ming Liu. “COMB-MCM: Computing-on-Memory-Boundary NN Processor with Bipolar Bitwise Sparsity Optimization for Scalable Multi-Chiplet-Module Edge Machine Learning”
- [GLSVLSI'21] Bo Jiao†, Haozhe Zhu†, Jinshan Zhang, Shunli Wang, Xiaoyang Kang, Lihua Zhang, Mingyu Wang, Chixiao Chen. “Computing Utilization Enhancement for Chiplet-based Homogeneous Processing-in-Memory Deep Learning Processors”
Processing in/near Memory
- [DAC'25] Lizhou Wu†, Haozhe Zhu†, Siqi He, Xuanda Lin, Xiaoyang Zeng, Chixiao Chen. “PIMoE: Towards Efficient MoE Transformer Deployment on NPU-PIM System through Throttle-Aware Task Offloading” (Accepted)
- [JSSC'25] Siqi He†, Haozhe Zhu†, Hongyi Zhang, Yujie Ma, Zexing Chen, Mengjie Li, Danfeng Zhai, Chixiao Chen, Qi Liu, Xiaoyang Zeng, Ming Liu. “A 22-nm 109.3-to-249.5-TFLOPS/W Outlier-Aware Floating-Point SRAM Compute-in-Memory Macro for Large Language Models” (Accepted)
- [ISCAS'24] Mengjie Li, Hongyi Zhang, Siqi He, Haozhe Zhu, Hao Zhang, Jinglei Liu, Jiayuan Chen, Zhenping Hu, Xiaoyang Zeng, Chixiao Chen. “A 19.7 TFLOPS/W Multiply-less Logarithmic Floating-Point CIM Architecture with Error-Reduced Compensated Approximate Adder”
- [DATE'24] Hongyi Zhang, Haozhe Zhu, Siqi He, Mengjie Li, Chengchen Wang, Xiankui Xiong, Haidong Tian, Xiaoyang Zeng, Chixiao Chen. “ARCTIC: Agile and Robust Compute-In-Memory Compiler with Parameterized INT/FP Precision and Built-In Self Test”
- [TCAS-2'24] Haozhe Zhu†, Hongyi Zhang†, Siqi He, Mengjie Li, Xiaoyang Zeng, Chixiao Chen. “Trident-CIM: A LUT-Based Compute-in-Memory Macro With Trident Read Bit-Line and Partial Product Pruning”
- [JETCAS'20] Haozhe Zhu, Chixiao Chen, Shiwei Liu, Qiaosha Zou, Mingyu Wang, Lihua Zhang, Xiaoyang Zeng, C.-J. Richard Shi. “A Communication-Aware DNN Accelerator on ImageNet Using in-Memory Entry-Counting Based Algorithm-Circuit-Architecture Co-Design in 65nm CMOS”
- [Design&Test'19] Haozhe Zhu, Yu Wang, C.-J. Richard Shi. “Tanji: A General-Purpose Neural Network Accelerator with a Unified Crossbar Architecture”
Others
- [BioCAS'24] Jiajun Lu, Haozhe Zhu, Xiaoyang Zeng, Chixiao Chen. “ST-BPTT: A Memory-efficient BPTT SNN Training Approach through Gradient-Contribution-Driven Time-Step Selection”
- [AICAS'23] Siqi He, Hongyi Zhang, Mengjie Li, Haozhe Zhu, Chixiao Chen, Qi Liu, Xiaoyang Zeng. “Bit-Offsetter: A Bit-serial DNN Accelerator with Weight-offset MAC for Bit-wise Sparsity Exploitation”
Awards
I was (co-)awarded:
- IEEE A-SSCC 2025 Distinguished Design Award.