
Haseeb Shah

I am a Ph.D. student at the RLAI lab at the University of Alberta, advised by Prof. Martha White. My research focuses on foundational topics in reinforcement learning, such as policy gradient algorithms, general value functions, and recurrent learning. I also apply reinforcement learning to real-world problems in physical systems, such as automating drinking-water treatment plants and controlling laser wakefield accelerators.

Work Experience

RL Core Technologies

Jan 2024 - Aug 2024

Machine Learning Intern

Canada

University of Alberta

Feb 2023 - Jan 2024

Research Assistant (Full-time)

Canada

LAVIS Lab, Hochschule RheinMain

Jan 2020 - Dec 2020

Research Assistant

Remote

DCube Tech.

Mar 2020 - Jul 2020

Machine Learning Engineer

Pakistan

LAVIS Lab, Hochschule RheinMain

Jun 2018 - Aug 2018

DAAD Research Intern

Germany

TUKL-NUST R&D Center

Jun 2017 - Dec 2017

Undergraduate Research Intern

Pakistan

Education

University of Alberta

Sep 2024 - Present

Ph.D. in Computer Science

Supervisor: Martha White

University of Alberta

2021 - 2023

M.Sc. in Computer Science

CGPA: 4.00/4.00

Supervisor: Martha White

Thesis: Greedy Pruning for Continually Adapting Networks

National University of Sciences and Technology

2015 - 2019

Bachelor of Engineering in Software Engineering

Supervisor: Faisal Shafait

Thesis: An Open-World Extension for Knowledge Graph Completion Models

Publications

q-Exponential Family for Policy Optimization

ICLR 2025 (equal contribution)

L. Zhu*, H. Shah*, H. Wang*, M. White

We explore the effectiveness of q-exponential policies in policy optimization methods, finding that heavy-tailed policies (q > 1) are generally more effective and can consistently outperform the Gaussian policy.

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

JMLR 2023

K. Javed, H. Shah, R. Sutton, M. White

We show that real-time recurrent learning (RTRL) can be made to scale linearly with the number of parameters, either by decomposing the network into independent modules or by learning a recurrent network incrementally. Unlike prior scalable gradient estimation algorithms, our algorithms add neither noise nor bias to the gradient estimate.

GVFs in the Real World: Making Predictions Online for Water Treatment

MLJ 2023

K. Janjua, H. Shah, M. White, E. Miahi, M. C. Machado, A. White

We propose a framework based on general value functions (GVFs) for making accurate predictions in a real-world water treatment plant. This work is one of the first to motivate the importance of adapting predictions in real time for non-stationary, high-volume systems in the real world.

An Open-World Extension for Knowledge Graph Completion Models

AAAI 2019 (Oral)

H. Shah, J. Villmow, A. Ulges, U. Schwanecke, F. Shafait

We propose an extension that enables any existing knowledge graph completion model to predict facts about open-world entities. This approach is more robust and more portable than the previously published state of the art, and it outperforms that state of the art on most datasets. We also release a new dataset that overcomes the shortcomings of previous ones.

Under Review

Stable learning in deployment while mitigating performance degradation in offline-to-online RL

RLC 2026 (under review)

H. Wang, H. Shah, A. White, M. White

We introduce a new fine-tuning algorithm that gradually allows more exploration based on off-policy estimates of performance.

Symmetric Behavior Policy Optimization

ICML 2026 (under review)

L. Zhu, H. Shah, C. Zheng, N. Yukie, M. White

We study symmetric divergences for behavior-regularized policy optimization and propose Symmetric 𝑓-Actor Critic, which avoids the per-environment failures encountered by other offline reinforcement learning methods.

Workshops and Preprints

Relation Specific Transformations for Open World Knowledge Graph Completion

COLING 2020

H. Shah, J. Villmow, A. Ulges

We introduce relation-specific transformations that substantially improve the performance of open-world knowledge graph completion models. We also propose an approach for clustering relations that reduces training time and memory footprint.

Distillation Techniques for Pseudo-rehearsal Based Incremental Learning

2018 Preprint

H. Shah, K. Javed, F. Shafait

Standard neural networks suffer from catastrophic forgetting when they are trained on an incrementally arriving stream of data. One approach to combat this forgetting is pseudo-rehearsal: GANs are trained on previously seen data, and their generated samples are fed back to the network. In this paper, we highlight that this method is biased and propose an approach that mitigates this bias and reduces catastrophic forgetting.

Public Talks

Online Feature Decorrelation

AMII 2022 (Talk)

A significant proportion of the representations learned by current generate-and-test methods consist of highly redundant features. This talk demonstrates that the feature-ranking criteria used by these methods are highly ineffective at addressing this problem. I present a new approach for decorrelating features in an online setting and show that this decorrelator effectively eliminates redundant features and yields a statistically significant performance improvement in the low-capacity function approximation setting.