CyberGISX
CyberGISX/CyberGIS-Jupyter is a science gateway for education and reproducible computational geosciences. The platform provides a community, a variety of guided notebooks, and a Jupyter environment. My work has focused on deploying JupyterHub through Kubernetes and redesigning the backend to provide more flexible and reproducible compute environments.
I’ve also contributed a number of Jupyter notebooks to the CyberGISX Hub community notebooks.
Login page for the CyberGISX JupyterHub environment
CyberGIS-Jupyter for Water (CJW)
Similar to CyberGISX, CyberGIS-Jupyter for Water (CJW) provides a Jupyter environment for hydrological modeling and analysis that is deeply integrated with the HydroShare ecosystem. My work has focused on deploying JupyterHub through Kubernetes.
Associated Publications
Journal Articles
2023
EasyScienceGateway: A new framework for providing reproducible user environments on science gateways
Concurrency and Computation: Practice and Experience, 2023
Science gateways have become a core part of the cyberinfrastructure ecosystem by increasing access to computational resources and providing community platforms for sharing and publishing education and research materials. While science gateways represent a promising solution for computational reproducibility, common methods for providing users with their user environments on gateways present challenges which are difficult to overcome. This article presents EasyScienceGateway: a new framework for providing user environments on science gateways to resolve these challenges, provides the technical details on implementing the framework on a science gateway based on Jupyter Notebook, and discusses our experience applying the framework to the CyberGIS-Jupyter and CyberGIS-Jupyter for Water gateways.
Conference Proceedings
2023
Streamlined HPC Environments with CVMFS and CyberGIS-Compute
Forum 2023 - Harnessing the Geospatial Data Revolution for Sustainability Solutions, 2023
High-Performance Computing (HPC) resources provide the potential for complex, large-scale modeling and analysis, fueling scientific progress over the last few decades, but these advances are not equally distributed across disciplines. Those in computational disciplines are often trained to have the necessary technical skills to utilize HPC (e.g., familiarity with the terminal), but many disciplines face technical hurdles when trying to apply HPC resources to their work. This unequal familiarity with HPC is increasingly a problem as cross-discipline teams work to tackle critical interdisciplinary issues like climate change and sustainability. CyberGIS-Compute is middleware designed to democratize access to HPC services with the goal of empowering domain scientists, but a key challenge facing model developers on CyberGIS-Compute is creating a containerized software environment for their models. In this paper, we discuss our work to integrate the CERN Virtual Machine File System (CVMFS) into CyberGIS-Compute to provide consistent software environments across science gateways and HPC resources.
2022
CyberGIS-Jupyter for Water - An Open Geospatial Computing Platform for Collaborative Water Research
Li, Zhiyu; Michels, Alexander; Padmanabhan, Anand; Nassar, Ayman; Tarboton, David G.; and Wang, Shaowen
AGU Fall Meeting Abstracts, 2022
Recent advances in cyberinfrastructure and data science promise to transform how hydrologic analysis and modeling are conducted. However, the computational capabilities needed for this potential transformation still remain only accessible to a small set of domain experts, hampering the engagement and contribution from the broader water research community. We have developed a domain-specific online computing platform, called CyberGIS-Jupyter for Water (CJW), that aims to integrate advanced cyberinfrastructure and geospatial capabilities for serving the broad water science communities. CJW represents a novel cyber-based geospatial information science and systems (cyberGIS) framework for harnessing distributed high-performance computing resources to enable collaborative and large-scale hydrologic analysis and modeling. CJW provides a stack of integrated geospatial software tools and libraries to facilitate collaborative and reproducible workflows that have been made interoperable with HydroShare, a web-based hydrologic data and model sharing platform, to expand community access. This talk presents the design and implementation of CJW, and demonstrates its capabilities with several success stories from users and a case study on computationally intensive hydrologic modeling based on WRF-Hydro.
2021
Towards Reproducible Research on CyberGISX with Lmod and Easybuild (Ext. Abs.)
Proceedings of Gateways 2021, 2021
JupyterHub [1] has become a popular choice in many scientific communities, offering an easy-to-use interface for users with little to no frontend development work while promoting reproducible and replicable (R&R) science [2]. In the broad geospatial science community, CyberGISX [3] provides such a gateway environment with many cyberGIS (i.e., geospatial information science and systems based on advanced cyberinfrastructure) and geospatial software packages prebuilt and ready to use. Like other JupyterHub-based solutions, CyberGISX also provides container-based access for its users and must balance a trade-off between providing a static compute environment which enhances R&R and continuously updating the software environment to keep up with advances in scientific software. Solutions such as Binder [4] have attempted to address this trade-off by having required dependencies encoded in the package and building the software environment at the time of use. However, such a solution comes with two major disadvantages: (a) software is built at the time it is needed, increasing startup time and introducing the possibility that some of the dependencies of the environment are no longer available or have changed; and (b) the onus of specifying and managing software installations is passed to notebook developers, many of whom are domain scientists and not comfortable with such responsibilities. To address these challenges and enhance R&R with minimal effort from end-users, we have designed and implemented a solution on CyberGISX that allows software to be kept on an external file server mounted into each user’s environment. 
Scientific software is installed with Easybuild [5] and managed by Lmod [6] giving a variety of benefits: (1) the compute environment is more standardized and easily reproducible outside of the gateway; (2) multiple versions of software can be made available to users without increasing container size; and (3) the exact copies of software are always available on the gateway instead of being rebuilt for every release, further enhancing R&R. We also employ an Easybuild-installed Anaconda [7] to create and manage conda environments on the file server. The combination of the software stack from Easybuild and Python environment from conda provides end-users with kernels for their Jupyter notebooks which are persistent and unchanged as the gateway’s container updates. This design enhances R&R and adds functionality for advanced users without introducing technical barriers to non-technical end-users. As such, domain scientists using this solution need not build their own software and specify dependencies, which helps prevent the notebooks they have developed from getting broken by the next software release. This talk explores the new architecture and applications of this solution to CyberGISX [3] and CyberGIS-Jupyter for Water (CJW) [8].
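To illustrate how a conda environment kept on a mounted file server could back a persistent Jupyter kernel, here is a minimal sketch. It is not the CyberGISX implementation: the paths, the environment name `geo-2021`, and the helper names `build_kernelspec` and `write_kernelspec` are hypothetical. It only relies on the standard Jupyter kernelspec layout (a `kernel.json` file with `argv`, `display_name`, and `language`).

```python
import json
from pathlib import Path, PurePosixPath


def build_kernelspec(env_prefix: str, display_name: str) -> dict:
    """Build a kernelspec that launches ipykernel from a conda env.

    env_prefix is assumed to be a conda environment on the mounted
    file server (hypothetical path), so the kernel does not change
    when the gateway's container image is rebuilt.
    """
    python = str(PurePosixPath(env_prefix) / "bin" / "python")
    return {
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


def write_kernelspec(kernels_dir: str, name: str, spec: dict) -> Path:
    """Write spec to <kernels_dir>/<name>/kernel.json and return its path."""
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    import tempfile

    # Hypothetical environment location on the shared file server.
    spec = build_kernelspec("/data/conda/envs/geo-2021", "Python 3 (geo-2021)")
    with tempfile.TemporaryDirectory() as tmp:
        print(write_kernelspec(tmp, "geo-2021", spec).read_text())
```

Because the kernelspec points at an interpreter outside the container, the same notebook kernel stays available across container updates, which is the property the abstract describes.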
2019
CyberGIS-Jupyter for Spatially Explicit Agent-based Modeling: A Case Study on Influenza Transmission
GeoSim ’19: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on GeoSpatial Simulation, 2019
Despite extensive efforts on achieving reproducible agent-based models (ABMs) to improve the capability of this widely adopted methodology, it remains challenging to reproduce and replicate pre-existing ABMs, due to a number of factors such as diverse computing resources and ABM platforms. In this study, we propose to employ CyberGIS-Jupyter for spatially explicit ABMs. CyberGIS-Jupyter is a cyberGIS framework to achieve data-intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on advanced cyberinfrastructure. Influenza transmission in the city of Miami, Florida, USA was used as a case study. In the model, influenza is transmitted through the contact networks of individual human agents, which are constructed based on commuting behaviors. CyberGIS-Jupyter supports not only collaborative and transparent modeling, but also running model simulations on advanced cyberinfrastructure resources. It may contribute to boosting the reproducibility and replicability of ABMs.