Academic
Paper-Conference
H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
Zizhuo Fu, Xiaotian Guo, Wenxuan Zeng, Shuzhang Zhong, Yadong Zhang, Peiyu Chen, Runsheng Wang, Le Ye, Meng Li
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
Hochen Huang, Shuzhang Zhong, Zhe Zhang, Shuangchen Li, Dimin Niu, Hongzhong Zheng, Runsheng Wang, Meng Li
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong, Yanfan Sun, Ling Liang, Runsheng Wang, Ru Huang, Meng Li
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
Linye Wei, Shuzhang Zhong, Songqiang Xu, Runsheng Wang, Ru Huang, Meng Li
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li
PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
Tianshi Xu, Shuzhang Zhong, Wenxuan Zeng, Runsheng Wang, Meng Li
ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Shuzhang Zhong, Zebin Yang, Ruihao Gong, Runsheng Wang, Ru Huang, Meng Li
Memory-Aware Scheduling for Complex Wired Networks with Iterative Graph Optimization
Shuzhang Zhong, Meng Li, Yun Liang, Runsheng Wang, Ru Huang