
Shanghang Zhang

A picture with Stockholm City Hall,
the venue of the Nobel Prize award ceremony

Dr. Shanghang Zhang is a tenure-track Assistant Professor at the School of Computer Science, Peking University. She has also been a postdoctoral research fellow at the Berkeley AI Research Lab (BAIR), UC Berkeley. Her research focuses on out-of-distribution (OOD) generalization, which enables machine learning systems to generalize to new domains, categories, and modalities using limited labels, with applications to autonomous driving and robotics. This work is reflected in her more than 50 papers in top-tier journals and conference proceedings (Google Scholar citations: 4321, h-index: 28, i10-index: 38). She is also an author and editor of the book “Deep Reinforcement Learning: Fundamentals, Research and Applications”, published by Springer Nature. The book was selected for the Annual High-Impact Publications in Computer Science by Chinese Researchers, and its electronic edition has been downloaded 150,000 times worldwide. Her recent work “Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting” received the AAAI 2021 Best Paper Award. It ranked first in Trending Research on Papers with Code, and its GitHub repository has received 3,300+ stars.


Shanghang was selected for “2018 Rising Stars in EECS, USA”. She has also been awarded the Adobe Academic Collaboration Fund, the Qualcomm Innovation Fellowship (QInF) Finalist Award, and the Chiang Chen Overseas Graduate Fellowship. Her research outcomes have been successfully productized into real-world machine learning solutions and have led to 5 patent filings. Dr. Zhang has been the chief organizer of several workshops at ICML and NeurIPS, and of a special issue at ICMR. Dr. Zhang received her Ph.D. from Carnegie Mellon University in 2018 and her Master's degree from Peking University.


News and Events

CVPR, 2023 June, 8 papers accepted by CVPR23
AAAI, 2023 Feb, Organizing the 1st AAAI23 Practical Deep Learning in the Wild Workshop
NeurIPS, 2022 Dec, 3 papers accepted by NeurIPS22
NeurIPS, 2022 Dec, Organizing the 1st NeurIPS22 Human in the Loop Learning Workshop.
ECCV, 2022 Oct, 2 papers accepted by ECCV22
IJCAI, 2022 July, 1 paper accepted by IJCAI22
ICML, 2022 June, 1 paper accepted by ICML22
CVPR, 2022 June, 1 paper accepted by CVPR22
ICRA, 2022 June, 1 paper accepted by ICRA22
NeurIPS, 2021 Dec, 3rd place on Visual Domain Adaptation Challenge
ICCV, 2021 Oct, Keynote talk, The 2nd Anti-UAV Workshop & Challenge.
CCN, 2021 Sep, 1st place on the Algonauts Project 2021 Challenge.
CVPR, 2021 June, Talk at the 4th Workshop and Prize Challenge: Bridging the Gap between Computational Photography and Visual Recognition (UG2+)
ODSC, 2020 Oct 31, Talk at the Open Data Science Conference
UCSB, 2020 Oct 27, Guest Lecture on Multimodal Learning
ICML, 2020 July 18, Organizing the 2nd ICML Workshop on Human in the Loop Learning
NeurIPS, 2019 Dec 8-12, Vancouver, Canada
ICML, 2019 June 13, Organizing the 1st ICML Workshop on Human in the Loop Learning

RESEARCH (to be updated)


My research focuses on machine learning generalization in the open world, including theory, algorithms, and system development, with applications to important IoT problems such as autonomous driving and robotics. In particular, by investigating the mechanisms of brain cognition, I develop generalizable and efficient machine learning systems that can adapt to new domains and modalities with limited labels.

Rethinking Distributional Matching based Domain Adaptation

  • Systematically analyze existing distributional-matching-based DA methods and find that they work only under simple covariate/label shift with strong assumptions, and may fail on real-world problems that exhibit label distribution shift or pseudo-label distribution shift.

  • Propose a new instance-based information matching DA algorithm that is more robust to these distributional shifts.

Compositional Few-Shot Learning     

  • Provide a compositional view of the widely adopted FSL baseline model.

  • Based on this view, and to imitate humans' ability to learn visual primitives and compose them to recognize novel classes, we propose an FSL approach that learns a feature representation composed of important primitives, jointly trained with primitive discovery and primitive enhancing.

Learning Invariant Risks and Representations for Domain Adaptation

  • Derive a tighter upper bound for semi-supervised DA, which simultaneously achieves marginal and conditional distribution alignment.

  • Based on this upper bound, we propose the Invariant Risks and Representations Minimization framework, which solves semi-supervised DA as an invariant integrity optimization problem across domains from an information-theoretic point of view.

Multi-source Distilling Domain Adaptation

Propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones.

Topology Adaptive Graph Convolutional Networks

Propose the topology adaptive graph convolutional network, a novel graph convolutional network that generalizes CNN architectures to graph-structured data and provides a systematic way to design a set of fixed-size learnable filters to perform convolutions on graphs. The topologies of these filters are adaptive to the topology of the graph when they scan the graph to perform convolution, replacing the square filter for the grid-structured data in traditional CNNs. It can be used with both directed and undirected graphs.
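As a rough illustration of a fixed-size filter whose reach follows the graph topology, the sketch below applies a polynomial of the adjacency matrix to a single-channel node signal. It is a simplified sketch and not the published implementation: the function names, the use of an unnormalized adjacency matrix, and the hand-picked coefficients (which would be learned in practice) are all assumptions made for illustration.

```python
def matvec(adj, x):
    """Multiply a dense adjacency matrix (list of rows) by a node signal."""
    return [sum(a * v for a, v in zip(row, x)) for row in adj]


def polynomial_graph_filter(adj, x, coeffs):
    """Compute y = sum_k coeffs[k] * (adj^k x).

    coeffs[k] weights information gathered from k hops away, so the
    filter has a fixed number of parameters regardless of graph size.
    Hypothetical helper: real models learn coeffs and use many channels.
    """
    out = [0.0] * len(x)
    power = list(x)  # adj^0 x
    for c in coeffs:
        out = [o + c * p for o, p in zip(out, power)]
        power = matvec(adj, power)  # advance to the next hop
    return out


# Directed path 0 -> 1 -> 2; adj[i][j] = 1 when there is an edge j -> i.
adj = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
signal = [1.0, 0.0, 0.0]
filtered = polynomial_graph_filter(adj, signal, coeffs=[1.0, 0.5, 0.25])
```

Because the filter is defined through the adjacency matrix rather than a square grid, the same code works unchanged for directed graphs (as in this example) and undirected ones (a symmetric `adj`).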

  • Propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances.

  • Propose an efficient implementation of the theoretical results using adversarial neural networks: Learn feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task.

Long Term Time-Series Transformer   

Time-series forecasting is a long-standing problem in machine learning, and it is still commonly framed as a sequence-to-sequence prediction problem. Recent work on the Transformer has shown that the self-attention mechanism improves sequence alignment performance and easily handles massive input sequences in natural language processing. Inspired by this, we propose a Long Term Time-series Transformer (LT^3) that targets the prediction of long-term sequences. Our model has three distinctive characteristics:

  • Uniform input representation, a scaled combination of the scalar projection and time-stamp embeddings, provides a way to measure quantity change and temporal shift simultaneously;

  • Self-attention distilling, a conv-maxpool operation halving cascading layer inputs, privileges dominating attention compositions and sharply reduces the size of network;

  • Dependency pyramid, subsequently truncated self-attention stacks at cross-scale, allows the encoder to fetch a diversified resolution of self-attention feature map.
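The self-attention distilling step in the list above can be sketched as a convolution followed by a stride-2 max-pool along the time axis, so that each successive layer receives an input of half the length. This is a minimal single-channel sketch under stated assumptions: the fixed smoothing kernel, the pooling window of 3 with padding 1, and all function names are illustrative choices, whereas an actual model would use learned convolutions over multi-channel feature maps.

```python
def conv1d_same(seq, kernel):
    """1-D convolution over time with zero padding (output length = input length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[t + i] for i, k in enumerate(kernel))
        for t in range(len(seq))
    ]


def maxpool1d(seq, size=3, stride=2, pad=1):
    """Max-pool over time; a stride of 2 halves the sequence length."""
    padded = [float("-inf")] * pad + list(seq) + [float("-inf")] * pad
    return [
        max(padded[start:start + size])
        for start in range(0, len(padded) - size + 1, stride)
    ]


def distill(seq, kernel=(0.25, 0.5, 0.25)):
    """One distilling step: convolve, then keep the dominant local responses."""
    return maxpool1d(conv1d_same(seq, kernel))


def cascade_lengths(seq, num_layers):
    """Return the input length seen by each layer of the cascade."""
    lengths = [len(seq)]
    for _ in range(num_layers):
        seq = distill(seq)
        lengths.append(len(seq))
    return lengths


lengths = cascade_lengths([float(i) for i in range(16)], num_layers=3)
```

Here `lengths` records how a 16-step input shrinks to 8, 4, and then 2 steps across three distilling layers, which is the source of the reduction in network size described above.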

Generalized Zero-Shot Learning

  • Generalized zero-shot learning (GZSL) is a challenging class of vision and knowledge transfer problems in which both seen and unseen classes appear during testing. We propose the Dual Adversarial Semantics-Consistent Network (DASCN), which learns primal and dual Generative Adversarial Networks (GANs) in a unified framework for GZSL. In particular, the primal GAN learns to synthesize inter-class discriminative and semantics-preserving visual features from both the semantic representations of seen/unseen classes and the ones reconstructed by the dual GAN. The dual GAN enforces the synthetic visual features to represent prior semantic knowledge well via semantics-consistent adversarial learning.

  • Generalized zero-shot learning for ICD coding, which is essentially multi-label classification with a structured label space.

Understanding photo blur with salient object segmentation

  • Generate spatially-variant blur responses using fully convolutional neural networks.

  • Determine whether such responses are undesired by distilling higher-level image semantics: learn a salient object segmentation map and a content feature map to localize important content in the images.

Deep learning based Environment Understanding for Autonomous Driving

  • Design a deep convolutional neural network (DCNN) to detect and understand obstacles around driving vehicles.

  • Use a hierarchical feature extractor to adapt the network specifically to autonomous driving without overfitting.

  • Finalist Award for the Qualcomm Innovation Fellowship (35 out of 146 teams from the top 10 engineering universities in the U.S.)

Deep Understanding of Urban Traffic from Large-Scale City Cameras

I develop a deep multi-task model based on fully convolutional networks to jointly estimate vehicle density, segment the foreground, and detect vehicles, overcoming the challenges of web camera data. A multi-domain adaptation mechanism is explored to adapt the deep model to different cameras and environmental conditions. Filters in each convolution layer are dynamically generated to learn different camera perspectives. Deep spatio-temporal networks are developed to incorporate the temporal information of traffic flow.

Learning deep features for multi-modal inference with robotic data

Develop a multi-task learning scheme based on neural networks for robot action prediction, which is an important step in the design of autonomous robots.

Develop a Convolutional Variational Auto-Encoder to generate features of perceived images for robot action prediction, which is capable of capturing the useful statistics of robot actions without requiring large-scale training samples or hand-engineered features.
