Shanghang Zhang

Shanghang Zhang is a postdoctoral research fellow in the Berkeley AI Research (BAIR) Lab in the Department of Electrical Engineering and Computer Sciences, UC Berkeley, working with Prof. Kurt Keutzer and Prof. Trevor Darrell. Her research interests cover deep learning, computer vision, and reinforcement learning, with a focus on machine learning with limited training data, including low-shot learning, domain adaptation, and meta-learning, which enable learning systems to adapt automatically to real-world variations and new environments. She was one of the “2018 Rising Stars in EECS” (a highly selective program launched at MIT in 2012, which has since been hosted at UC Berkeley, Carnegie Mellon, and Stanford annually). She has also been selected for the Qualcomm Innovation Fellowship (QInF) Finalist Award and the Chiang Chen Overseas Graduate Fellowship. She received her Ph.D. from Carnegie Mellon University in 2018.

NeurIPS 2019  Dec 8-12, Vancouver, Canada


ICML 2020  July 18, Organizing the 2nd ICML Workshop on Human in the Loop Learning
ODSC 2020  Oct 27, Speaking at the Open Data Science Conference
ICML 2019  June 13, Organizing the 1st ICML Workshop on Human in the Loop Learning


Rethinking Distributional Matching based Domain Adaptation

  • Systematically analyze the existing Distributional Matching based DA methods, and find they can only work under simple covariate/label shift with strong assumptions, while they may fail in real-world problems which have Label Distribution Shift or Pseudo Label Distribution Shift.

  • Propose a new instance-based information matching DA algorithm that is more robust to these distributional shifts.

Compositional Few-Shot Learning     

  • Provide a compositional view of the widely adopted FSL baseline model.

  • Based on this view, and to imitate the human ability to learn visual primitives and compose them to recognize novel classes, propose an FSL approach that learns a feature representation composed of important primitives, trained jointly with primitive discovery and primitive enhancing.

Learning Invariant Risks and Representations for Domain Adaptation

  • Derive a tighter upper bound for semi-supervised DA, which simultaneously achieves marginal and conditional distribution alignment.

  • Based on this upper bound, propose the Invariant Risks and Representations Minimization framework, which treats semi-supervised DA as an invariant integrity optimization problem across domains from an information-theoretic point of view.

Multi-source Distilling Domain Adaptation

Propose a novel multi-source distilling domain adaptation (MDDA) network, which not only considers the different distances among multiple sources and the target, but also investigates the different similarities of the source samples to the target ones.
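
As a rough illustration of weighting sources by their distance to the target, the sketch below combines class probabilities from several source-specific classifiers. The exponential weighting rule, the function names, and the numbers are assumptions made for this example only; they are not taken from the MDDA network itself.

```python
import math


def source_weights(distances):
    """Turn per-source distances to the target into weights that sum to 1.

    Assumed rule for illustration: weight is proportional to exp(-distance),
    so sources closer to the target count more.
    """
    raw = [math.exp(-d) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate_predictions(predictions, distances):
    """Weighted average of class-probability vectors, one vector per source."""
    weights = source_weights(distances)
    n_classes = len(predictions[0])
    return [
        sum(w * p[c] for w, p in zip(weights, predictions))
        for c in range(n_classes)
    ]


# Hypothetical values: two source classifiers, three classes.
predictions = [
    [0.7, 0.2, 0.1],  # classifier trained on source A
    [0.2, 0.5, 0.3],  # classifier trained on source B
]
distances = [0.5, 2.0]  # source A is assumed closer to the target
weights = source_weights(distances)
combined = aggregate_predictions(predictions, distances)
print(weights, combined)
```

With these inputs the closer source dominates the combined prediction, which is the qualitative behaviour the description above refers to.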

Topology Adaptive Graph Convolutional Networks

Propose the topology adaptive graph convolutional network, a novel graph convolutional network that generalizes CNN architectures to graph-structured data and provides a systematic way to design a set of fixed-size learnable filters to perform convolutions on graphs. The topologies of these filters are adaptive to the topology of the graph when they scan the graph to perform convolution, replacing the square filter for the grid-structured data in traditional CNNs. It can be used with both directed and undirected graphs.
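
One way to picture a filter whose shape follows the graph is as a polynomial in the adjacency matrix, where the k-th term gathers information from k-hop neighbours. The sketch below assumes that form; the function names and coefficient values are illustrative, adjacency normalization is omitted, and in a trained network the coefficients would be learned.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def polynomial_graph_filter(adj, signal, coeffs):
    """Compute y = sum_k coeffs[k] * (adj^k @ signal).

    coeffs[k] weights the k-hop neighbourhood, so the filter has a fixed
    number of parameters regardless of the graph's size or shape.
    """
    out = [0.0] * len(signal)
    hop = list(signal)  # adj^0 @ signal
    for c in coeffs:
        out = [o + c * h for o, h in zip(out, hop)]
        hop = mat_vec(adj, hop)  # move one hop further along the edges
    return out


# Directed path 0 -> 1 -> 2; adj[i][j] = 1 means node i receives from node j.
adj = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
signal = [1.0, 0.0, 0.0]  # a unit impulse on node 0
filtered = polynomial_graph_filter(adj, signal, coeffs=[0.5, 0.3, 0.2])
print(filtered)
```

The impulse on node 0 reaches node 1 through the 1-hop term and node 2 through the 2-hop term, following the direction of the edges.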

  • Propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances.

  • Propose an efficient implementation of the theoretical results using adversarial neural networks: Learn feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task.

Long Term Time-Series Transformer   

Time-series forecasting is a long-standing problem in machine learning, and it is still commonly framed as sequence-to-sequence prediction. Recent work on the Transformer has shown that the self-attention mechanism improves sequence alignment and easily handles massive input sequences in Natural Language Processing. Inspired by this, we propose a Long Term Time-series Transformer (LT^3) targeting long-term sequence prediction. Our model has three distinctive characteristics:

  • Uniform input representation, a scaled combination of the scalar projection and time-stamp embeddings, provides a way to measure quantity change and temporal shift simultaneously;

  • Self-attention distilling, a conv-maxpool operation that halves the input to each cascading layer, privileges the dominating attention compositions and sharply reduces the size of the network;

  • Dependency pyramid, subsequently truncated self-attention stacks at cross-scale, allows the encoder to fetch a diversified resolution of self-attention feature map.
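
The conv-maxpool halving in the second characteristic can be sketched on a one-dimensional sequence as follows. This is a minimal illustration rather than the LT^3 implementation: the smoothing kernel, the window width of 3, the stride of 2, and the function names are all assumptions, and any nonlinearity between the two steps is left out.

```python
def conv1d_same(seq, kernel):
    """1-D convolution with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(seq))
    ]


def max_pool_stride2(seq, width=3):
    """Max-pool with window `width` and stride 2, which halves an even-length input."""
    pad = width // 2
    padded = [float("-inf")] * pad + list(seq) + [float("-inf")] * pad
    return [max(padded[i:i + width]) for i in range(0, len(seq), 2)]


def distill(seq, kernel=(0.25, 0.5, 0.25)):
    """One distilling step: convolve, then pool down to half the length."""
    return max_pool_stride2(conv1d_same(seq, kernel))


sequence = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 8.0, 0.0]
once = distill(sequence)   # length 8 -> 4
twice = distill(once)      # length 4 -> 2
print(once, len(twice))
```

Each step keeps the strongest local responses while halving the length, so stacking steps shrinks the input that later layers have to process.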

Generalized Zero-Shot Learning

  • Generalized zero-shot learning (GZSL) is a challenging class of vision and knowledge transfer problems in which both seen and unseen classes appear during testing. We propose the Dual Adversarial Semantics-Consistent Network (DASCN), which learns primal and dual Generative Adversarial Networks (GANs) in a unified framework for GZSL. In particular, the primal GAN learns to synthesize inter-class discriminative and semantics-preserving visual features from both the semantic representations of seen/unseen classes and the ones reconstructed by the dual GAN. The dual GAN enforces the synthetic visual features to represent prior semantic knowledge well via semantics-consistent adversarial learning.

  • Generalized zero-shot learning for ICD coding, which is essentially multi-label classification with a structured label space.

Understanding Photo Blur with Salient Object Segmentation

  • Generate spatially-variant blur responses using fully convolutional neural networks.

  • Understand whether such responses are undesired by distilling higher-level image semantics: learn a salient object segmentation map and a content feature map to localize important content in the images.

Deep learning based Environment Understanding for Autonomous Driving

  • Design a deep convolutional neural network (DCNN) to detect and understand obstacles around driving vehicles.

  • Use a hierarchical feature extractor to adapt the network specifically to autonomous driving without overfitting.

  • Finalist Award for the Qualcomm Innovation Fellowship (35 out of 146 teams from top-10 engineering universities in the U.S.)

Deep Understanding of Urban Traffic from Large-Scale City Cameras

I develop a deep multi-task model based on fully convolutional networks that jointly estimates vehicle density, segments the foreground, and detects vehicles, to overcome the challenges of web camera data. A multi-domain adaptation mechanism is explored to adapt the deep model to different cameras and environmental conditions. Filters in each convolution layer are dynamically generated to learn different camera perspectives. Deep spatio-temporal networks are developed to incorporate the temporal information of traffic flow.
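
To show how a density estimate relates to a vehicle count, the sketch below assumes the common convention that a density map sums to the number of objects it covers. The map values, the region bounds, and the function names are made up for illustration and are not outputs of the model described above.

```python
def count_from_density(density_map):
    """Estimate the object count as the sum over all density-map cells."""
    return sum(sum(row) for row in density_map)


def count_in_region(density_map, top, left, bottom, right):
    """Estimated count inside rows [top, bottom) and columns [left, right)."""
    return sum(sum(row[left:right]) for row in density_map[top:bottom])


# Hypothetical 3x4 density map covering two vehicles in total.
density_map = [
    [0.0, 0.25, 0.25, 0.0],
    [0.0, 0.25, 0.25, 0.0],
    [0.5, 0.0, 0.0, 0.5],
]
total = count_from_density(density_map)
upper = count_in_region(density_map, 0, 0, 2, 4)
print(total, upper)
```

Under that convention, counts for any sub-region of the frame (for example one lane) come from summing the same map over that region.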

Learning deep features for multi-modal inference with robotic data

Develop a multi-task learning scheme based on neural networks for robot action prediction, which is a very important step in the design of autonomous robots.

Develop a Convolutional Variational Auto-Encoder to generate features of perceived images for robot action prediction, which is capable of capturing the useful statistics of robot actions without requiring large-scale training samples or hand-engineered features.