Publications

(2025). HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing. In ICCAD 2025.

(2025). H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference. In ICCAD 2025.

(2025). SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding. In DAC 2025.

(2025). HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference. In DAC 2025.

(2024). ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding. In ICCAD 2024.

(2024). PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization. In ICCAD 2024.

(2024). AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference. In ICCAD 2024.

(2023). Memory-Aware Scheduling for Complex Wired Networks with Iterative Graph Optimization. In ICCAD 2023.