I am Yiwei Zhang, a fourth-year Ph.D. student at the Institute of Computing Technology (ICT), University of Chinese Academy of Sciences (UCAS), advised by Prof. Yunquan Zhang. I was a research intern in the Heterogeneous Extreme Computing (HEX) Group at Microsoft Research Asia (MSRA), advised by Dr. Kun Li and Dr. Ting Cao. I obtained my B.Eng. degree from the Department of Computer Science and Technology, Tsinghua University, in 2021.

My research interests include high-performance computing and parallel heterogeneous computing. Previously, I focused primarily on data-level parallel optimization for stencil computations, with related work published at SC and PPoPP, two top international HPC conferences. Recently, I have developed a strong interest in machine learning systems (MLSys) and large language models (LLMs). If you are interested in collaborating or have any guidance to offer, I would be delighted to hear from you.

If you’d like to discuss anything further, please don’t hesitate to reach out.

🔥 News

  • 2024.11:  🎉🎉 Our paper Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers has been accepted by the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming 2025 (PPoPP’25).
  • 2024.11:  🎉🎉 Our paper LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores was nominated for the SC’24 Best Reproducibility Advancement Award.
  • 2024.07:  🎉🎉 Our paper LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores has been accepted by the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’24).

📝 Publications

🎖 Honors and Awards

  • 2025.01 - Director’s Excellence Award (Institute of Computing Technology), University of Chinese Academy of Sciences.
  • 2024.11 - Best Reproducibility Advancement Award Nomination, SC’24.
  • 2024.10 - First-Class Graduate Academic Scholarship, University of Chinese Academy of Sciences.
  • 2024.05 - Honors for Merit Student, University of Chinese Academy of Sciences.

📖 Education

💬 Invited Talks

  • 2025.03, “Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers”. ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), Las Vegas, NV, USA.
  • 2025.03, “FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units”. ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), Las Vegas, NV, USA.
  • 2024.11, “LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores”. High Performance Youth Forum, Chongqing, China.
  • 2024.11, “LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores”. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, USA.

💻 Internships

  • 2023.09 - 2025.05, Heterogeneous Extreme Computing (HEX), Microsoft Research Asia (MSRA), Beijing, China.