AI Horizons: How machine learning and artificial intelligence will shape Great Lakes observations, modeling, and forecasting in the coming decade

Dates: June 2024

Steering Committee: Leads: Jing Liu (University of Michigan); Bill Currie (University of Michigan); Silvia Newell (University of Michigan); Scott Steinschneider (University of Michigan); Dani Jones (University of Michigan – CIGLR); Lauren Fry (NOAA GLERL); Lacey Mason (NOAA GLERL); Andrea VanderWoude (NOAA GLERL)

CIGLR Research Themes: Observing Systems & Advanced Technology; Hydrometeorological & Ecosystem Forecasting


Description: The Great Lakes, the largest surface freshwater system on Earth, provide drinking water for over 38 million people and serve as a vital resource for irrigation, shipping and navigation, fisheries, hydropower, and recreation. They also provide critical habitat for plants and wildlife and support ecosystems vulnerable to global change. Because of this importance, observing, monitoring, and forecasting the hydrology, biogeochemistry, and ecology of the Great Lakes are critical activities. Great Lakes science has accelerated in recent decades, but several major challenges remain, for example in comprehensive, process-based analysis of observational data, in observing network design, and in modeling.

Rapid advances in machine learning (ML) and artificial intelligence (AI) provide an opportunity to transform Great Lakes science in the coming decade. These techniques can be harnessed to address environmental challenges and hazards facing the Great Lakes and the communities they support. The raw ingredients for this transformation are already present: increasingly diverse datasets and data pipelines, networks of domain experts, and a wide array of ML/AI methods. ML refers to a broad set of approaches that includes neural networks, clustering, regression, and reinforcement learning, among others. These approaches give Great Lakes scientists and managers a way to learn from data and improve predictive accuracy over time. For example, ML could be used to improve the representation of the Great Lakes in physical and ecological models, as well as to rethink how the models are formulated in the first place. AI includes ML but also encompasses a much broader set of approaches related to natural language, automation, images and computer vision, robotics, knowledge modeling, and decision making, among others. There may be opportunities to use AI to change how Great Lakes observing systems are planned and realized, for example through autonomous route planning for observing platforms.

At this CIGLR Summit, we aim to map out the ways in which ML/AI could be used to transform Great Lakes research over the next decade. From designing effective observing networks and enhancing modeling capacities to assessing hazards and risks, assimilating critical data, and developing data pipelines, ML/AI is poised to enable cutting-edge research that drives solutions to pressing challenges within the Great Lakes basin. This must happen in parallel with efforts to improve the accessibility, use, and synthesis of Great Lakes data. There is also a need for understandable AI, ethical AI, and grounding and alignment in the use of AI approaches. Grounding asks whether AI approaches are based in reality or in a scientific world view. Alignment asks whether AI approaches work in the ways that humans design and want them to work.

One potential growth area is building a community of collaboration, enabled by open-source tools, cloud-based data storage platforms, effective data and software management, and structured engagement of members of the research community, e.g. by establishing a community of practice. This could mirror the growth experienced by the oceanography and climate communities, where open-source Python tools and cloud-based platforms for storing and analyzing large datasets have paved the way for rapid developments in collaborative oceanographic science.

We have specific high-priority topic areas that we want to tackle. These include, but are not necessarily limited to, the following:

- Modeling and forecast capacity
- Environmental hazards and risks
- Observational network design and ML/AI
- Data assimilation
- Developing AI approaches and data pipelines
- Understandable/explainable AI, grounding, and alignment
- Building the Great Lakes ML/AI research community

The Great Lakes research community recently identified the major gaps and challenges in Great Lakes science for the coming decade. Among these, ML/AI has clear potential for improving modeling and forecasting systems, as shown by recent advances in ML/AI-based weather prediction. There is also a critical need for a renewed, interdisciplinary focus on the relationships between processes, specifically in a framework that considers how these processes are interconnected. Many unsupervised ML methods are well-suited to generating new hypotheses for how the different components and processes in a system may affect each other. Another critical area is observing and monitoring. Current efforts to monitor the Great Lakes, while extremely valuable, are limited by geographic sparsity, limited temporal coverage, and gaps in winter data. These limitations highlight the need for a coordinated, robust monitoring network with continuous data collection, one that can track the system’s high variability and the significant impacts of climate change on winter conditions and the entire ecosystem. ML developments in observing system design may be especially useful here, as these methods can help make efficient use of the available observing platforms (e.g. buoys, research vessels). To ensure that modeling and forecasting are well-constrained by these observations, we need a focus on data assimilation, especially the data pipelines needed to perform it.
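As a purely illustrative aside, the idea of using unsupervised ML to suggest hypotheses can be sketched with a small clustering example. The sketch below assumes k-means as the clustering method and groups synthetic, made-up (water temperature, chlorophyll) samples into two regimes. The data, the variable names, and the choice of k-means with a farthest-point initialisation are all assumptions for illustration, not methods or results from the summit. In practice, features would usually be standardized first and a library implementation would be preferable.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm) on a list of tuples."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, labels


# Synthetic samples (hypothetical values, not real observations):
# (water temperature in deg C, chlorophyll-a in ug/L).
rng = random.Random(42)
cool_clear = [(rng.gauss(8.0, 0.5), rng.gauss(1.0, 0.2)) for _ in range(30)]
warm_productive = [(rng.gauss(22.0, 0.5), rng.gauss(6.0, 0.2)) for _ in range(30)]
samples = cool_clear + warm_productive

centers, labels = kmeans(samples, k=2)
for j, center in enumerate(centers):
    print(f"cluster {j}: {labels.count(j)} samples, center = {center}")
```

The clusters themselves are not findings. They are candidate groupings that a domain expert could examine to ask why certain conditions co-occur, which is the hypothesis-generating role described above.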

Goals:

1. Mapping the Transformative Potential of ML/AI in Great Lakes Research: Our primary goal is to map out how machine learning and artificial intelligence can revolutionize Great Lakes research over the next decade, both by targeting critical, immediate gaps in the field and by developing a vision for what might be possible with this technology. We aim to explore areas across modeling and forecasting, hazard assessment, data assimilation, and observing system design.
2. Building a Collaborative ML/AI Research Community for Great Lakes Science: As an integral part of the summit, we seek to establish the nucleus of a collaborative research community in the realm of machine learning and artificial intelligence for Great Lakes science. This entails leveraging open-source tools, cloud-based data storage platforms, effective data and software management, and structured community engagement. We aspire to align these efforts with larger-scale programs from funders like NOAA and NSF, aiming for mutual benefits and contributing to the broader advancement of AI research in environmental science.