Publications
Publications from the Donghyun Lee AI Group. Underlined names indicate group members, and * indicates the corresponding author.
2025
-
Reinforced explainable AI for algal bloom forecasting under climate change: A multi-run class activation mapping (CAM) approach
Donghyun Lee* and Hyeongseo Jeon
Journal of Cleaner Production (SCIE, IF = 10.0 / JCR Top 6.0%), 2025
Harmful algal blooms (HABs), which are intensifying because of climate change, pose significant threats to ecosystems and public health worldwide. Although deep learning models offer high predictive accuracy, their limited transparency undermines trust in critical decision-making contexts. To address this limitation, we propose a multi-run class activation mapping (CAM) framework that enhances both robustness and interpretability in forecasting chlorophyll-a concentrations, a key indicator of algal blooms. A one-dimensional convolutional neural network was trained on weekly time-series data (including nutrients, organic matter, and temperature) collected from multiple freshwater monitoring stations between 2016 and 2022. By repeating CAM 1000 times with varied initializations, we reduced the variability typically observed in single-run explanations. This iterative CAM approach generated statistically robust importance scores, consistently identifying chemical oxygen demand, chlorophyll-a, and nutrients (nitrate nitrogen and total phosphorus) as the dominant drivers of bloom formation. Water temperature had a moderate but stable effect, reinforcing the thermal sensitivity of bloom dynamics. Other factors, such as suspended solids and specific phosphorus species, played secondary and context-dependent roles. These findings underscore the importance of nutrient reduction, temperature monitoring, and turbidity management in effective bloom mitigation. By linking statistically grounded CAM simulations with interpretable outcomes, our approach bridges predictive modeling and environmental decision-making, enabling adaptive, evidence-based interventions and supporting more effective HAB policy development.
2024
-
Building reliable AI for quantifying uncertainty in particulate matter predictions with deep learning
Donghyun Lee* and Beomhui Lee
Journal of Cleaner Production (SCIE, IF = 10.0 / JCR Top 6.0%), 2024
Predicting the concentrations of particulate matter (PM) has recently become crucial because of its significant impact on air pollution. As the adoption of artificial intelligence (AI) in this domain increases, ensuring the reliability of AI-driven predictions has become paramount. Although many previous studies have harnessed the power of AI for PM prediction, the inherent uncertainty in the results, which is influenced by spatial data imbalances and varying meteorological factors, poses a challenge. Addressing this uncertainty is central to building reliable AI systems. Therefore, we employed the Monte Carlo dropout (MCDO) approach, which is a technique that integrates dropout layers in neural networks during both the training and inference phases, to estimate the prediction uncertainty. Our objective was to address the challenge of prediction uncertainty in PM concentrations, with the ultimate aim of enhancing the reliability and trustworthiness of AI-driven predictions in air quality forecasting. By applying the MCDO approach to grids formed from multidimensional arrays of air quality and weather data, we obtained a 95% confidence interval for the prediction results, thereby demonstrating the trustworthiness of our model. Our evaluation revealed an R-squared greater than 0.97 for PM10 and PM2.5, showcasing the robust and reliable predictive performance of the model. This study highlights not only the accuracy of this approach but also the critical role of quantifying uncertainty in building reliable AI systems. Our findings mark a significant step towards a holistic approach to predictive modeling in which reliability and uncertainty quantification are at the forefront.
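The Monte Carlo dropout idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's model: the two-layer network, the random weights, the dropout rate, the number of passes, and the use of empirical percentiles for the 95% interval are all assumptions made for illustration.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, p_drop=0.2, n_samples=200, seed=0):
    """Repeat stochastic forward passes with dropout kept active at inference.

    The network (one ReLU hidden layer, linear output) is a hypothetical
    stand-in; only the sampling procedure is the point of the sketch.
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_samples, x.shape[0]))
    for i in range(n_samples):
        hidden = np.maximum(x @ w1, 0.0)            # ReLU hidden layer
        mask = rng.random(hidden.shape) >= p_drop   # keep each unit with prob 1 - p_drop
        hidden = hidden * mask / (1.0 - p_drop)     # inverted-dropout rescaling
        preds[i] = (hidden @ w2).ravel()
    mean = preds.mean(axis=0)
    # Assumed interval construction: empirical 2.5th and 97.5th percentiles.
    lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)
    return mean, lower, upper

# Synthetic inputs and weights, purely for demonstration.
rng = np.random.default_rng(42)
x = rng.normal(size=(5, 4))
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 1))
mean, lower, upper = mc_dropout_predict(x, w1, w2)
for m, lo, hi in zip(mean, lower, upper):
    print(f"prediction {m:8.3f}   95% interval [{lo:8.3f}, {hi:8.3f}]")
```

The spread between the lower and upper bounds gives a per-sample uncertainty estimate; wider intervals flag predictions the network is less sure about.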
2022
-
Integrated explainable deep learning prediction of harmful algal blooms
Donghyun Lee*, Mingyu Kim, Beomhui Lee, Sangwon Chae, Sungjun Kwon, and Sungwon Kang
Technological Forecasting and Social Change (SSCI, IF = 13.3 / JCR Top 0.9%), 2022
Harmful algal blooms (HABs) can cause serious problems for aquatic ecosystems and human health, as well as massive social costs. Therefore, continuous monitoring and prevention are required. Water quality management is an important task to minimize such algae, and future occurrences can be accurately predicted through optimal water resource management. In this study, we developed a convolutional neural network model using eight water quality variables and four weather variables to predict the concentration of chlorophyll-a in four major Korean rivers. In addition, Deep SHAP was applied to aid in policy decision-making and identify the influence of variables affecting chlorophyll-a. This integrated prediction model showed a 38.01% reduction in root mean square error and a 36.16% improvement in R-squared compared to the long short-term memory (LSTM) model. This demonstrated the effectiveness of the proposed integrated prediction approach. Furthermore, while simultaneously predicting HABs at all monitoring stations and training 394 times faster than LSTM-based models, the proposed method exhibited a significant improvement in efficiency and elucidated variable influences that existing models failed to explain. The proposed integrated prediction model can predict HAB spread, identify variable influences to aid decision-makers, and effectively implement preemptive responses, thus reducing economic losses and preserving aquatic ecosystems.
-
Use of artificial intelligence for predicting infectious disease
Suna Kang and Donghyun Lee*
In Big Data Analytics for Healthcare (SCOPUS), 2022
Infectious diseases threaten the lives of the entire global population. Some diseases, such as SARS and COVID-19, trigger pandemics as they spread from country to country, with severe adverse effects on the medical system, such as shortages in medical professionals and equipment, financial burden, and death. Therefore, it is crucial to predict and respond to the spread of infectious diseases. In this chapter, we reviewed the research related to the prediction models of the spread of infectious diseases, based on various methodologies. Studies that adopt conventional mathematical models, such as SIR, SEIR, and agent-based models, are considered. In addition, an analysis centered on artificial intelligence, big data, and machine learning methodologies was carried out. Decision-makers should arrive at decisions by considering the limitations of modeling infectious diseases. In particular, the internal structure of deep learning is a black box; hence, it is difficult to interpret the results. Modelers should transparently provide data collection, coding, and modeling processes, as well as provide information on model uncertainty to help decision-makers create policy decisions. Furthermore, to make scientific and rational decisions based on evidence, considering geographic information systems, interpersonal interactions, and national and social environments, decision-makers should refer to epidemiologic data and modeling results.
2021
-
PM10 and PM2.5 real-time prediction models using an interpolated convolutional neural network
Sangwon Chae, Joonhyeok Shin, Sungjun Kwon, Sangmok Lee, Sungwon Kang, and Donghyun Lee*
Scientific Reports (SCIE, IF = 3.9 / JCR Top 18%), 2021
In this paper, we propose a real-time prediction model that can respond to particulate matter (PM) in the air, which is an indication of poor air quality. The model applies interpolation to air quality and weather data and then uses a Convolutional Neural Network (CNN) to predict PM concentrations. The interpolation transforms the irregular spatial data into an equally spaced grid, which the model requires. This combination creates the interpolated CNN (ICNN) model that we use to predict PM10 and PM2.5 concentrations. The PM10 and PM2.5 evaluation results show an effective prediction performance with an R-squared higher than 0.97 and a root mean square error (RMSE) of approximately 16% of the standard deviation. Furthermore, both PM10 and PM2.5 prediction models forecast high concentrations with high reliability, with a probability of detection higher than 0.90 and a critical success index exceeding 0.85. The proposed ICNN prediction model achieves a high prediction performance using spatio-temporal information and presents a new direction in the prediction field.
2020
-
Complex System Analysis of Korean Peninsula Earthquake Data
Sangwon Chae, Suyoung Jang, Sangmok Lee, and Donghyun Lee*
Scientific Reports (SCIE, IF = 3.9 / JCR Top 18%), 2020
Earthquakes are natural disasters that cause damage in a wide range of regions and represent a complex system that does not have a clear causal relationship with specific observable factors. This research analyzes the earthquake activities on the Korean Peninsula with respect to spatial and temporal factors. Using logarithmic regression analysis, we showed that the relationship between the location of the earthquake and its frequency in these locations follows a power law distribution. In addition, we showed that since 1998 the average earthquake magnitude has decreased from 3.0143 to 2.5433 and the frequency has risen by 3.98 times. Finally, the spatial analysis revealed significantly concentrated earthquake activities in a few particular areas and showed that earthquake occurrence points have shifted southeast. This research showed the change in earthquake dynamics and concentration of earthquake activities in particular regions over time. This finding implies the necessity of further research on spatially-derived earthquake policies on the change of earthquake dynamics.
2018
-
Inter-fuel substitution path analysis of the Korea cement industry
Sung-Yoon Huh, Hyejin Lee, Jungwoo Shin*, Donghyun Lee, and Jinyoung Jang
Renewable and Sustainable Energy Reviews (SCI, IF = 16.3 / JCR Top 2.4%), 2018
Many countries have employed various policy measures to reduce industrial CO2 emissions. The cement industry plays a crucial role in emissions reduction because it accounts for a substantial proportion of global emissions. This study analyzes the inter-fuel substitution paths for the cement industry, along with their impacts on emissions reduction. A mixed multiple discrete-continuous extreme value (MDCEV) model is used to accommodate the heterogeneity of firms’ preferences for fuel mixes. The proposed model is empirically verified using firm-level data collected from 1998 to 2011 for Korean cement production firms. The results show that firms’ marginal utilities from using bituminous coal are still larger than those from other alternative fuels. The determinants of the firms’ alternative fuel choices are different according to the individual fuel types, but the price of bituminous coal has a primary impact, generally speaking. Scenario analysis shows that 10% and 20% increases in bituminous coal prices will lead to roughly 1.30 million and 1.58 million tons of CO2 reduction for the Korean cement industry, respectively. This study analyzes the selection and consumption patterns according to fuel types among cement producers, and also predicts their impacts on emissions reduction. Further, our study also provides policy implications for the government, which plays a crucial role in designing incentives for firms to use alternative fuels more often.
-
Analysing the failure factors of eco-friendly home appliances based on a user-centered approach
Jungwoo Shin, Suna Kang, Donghyun Lee*, and Bum Il Hong
Business Strategy and the Environment (SSCI, IF = 13.3 / JCR Top 1.1%), 2018
Pro‐environmental consumption is necessary for sustainable development, but the sales of eco‐friendly products have been limited. In this regard, the present study analyses the failure factors of eco‐friendly product consumption activation from the consumer’s perspective, specifically focusing on detergent‐free washing machines, which are representative innovative products of eco‐friendly home appliances. This study analyses: (1) the attitude‐behaviour gap that occurs in the consumer decision‐making process, and (2) the consumer preference with respect to the core attributes. A recursive model considering the decision‐making stage was constructed and a mixed logit model was utilized to analyse the preference for the core attributes. As a result, the product compatibility and transfer of expert information must be secured to reduce the attitude‐behaviour gap. Additionally, washing power is an important attribute for improving product saleability. The analysis framework of this study can be used to establish sustainable policies for activating eco‐friendly products.
-
Improved prediction of harmful algal blooms in four Major South Korea’s Rivers using deep learning models
Sangmok Lee and Donghyun Lee*
International Journal of Environmental Research and Public Health, 2018
Harmful algal blooms are an annual phenomenon that cause environmental damage, economic losses, and disease outbreaks. A fundamental solution to this problem is still lacking; thus, the best option for counteracting the effects of algal blooms is to improve advance warnings (predictions). However, existing physical prediction models have difficulties setting a clear coefficient indicating the relationship between each factor when predicting algal blooms, and many variable data sources are required for the analysis. These limitations are accompanied by high time and economic costs. Meanwhile, artificial intelligence and deep learning methods have become increasingly common in scientific research; attempts to apply the long short-term memory (LSTM) model to environmental research problems are increasing because the LSTM model exhibits good performance for time-series data prediction. However, few studies have applied deep learning models or LSTM to algal bloom prediction, especially in South Korea, where algal blooms occur annually. Therefore, we employed the LSTM model for algal bloom prediction in four major rivers of South Korea. We conducted short-term (one week) predictions by employing regression analysis and deep learning techniques on a newly constructed water quality and quantity dataset drawn from 16 dammed pools on the rivers. Three deep learning models (multilayer perceptron, MLP; recurrent neural network, RNN; and long short-term memory, LSTM) were used to predict chlorophyll-a, a recognized proxy for algal activity. The results were compared to those from OLS (ordinary least squares) regression analysis and actual data based on the root mean square error (RMSE). The LSTM model showed the highest prediction rate for harmful algal blooms and all deep learning models out-performed the OLS regression analysis. Our results reveal the potential for predicting algal blooms using LSTM and deep learning.
-
Predicting infectious disease using deep learning and big data
Sangwon Chae, Sungjun Kwon, and Donghyun Lee*
International Journal of Environmental Research and Public Health, 2018
Infectious disease occurs when a person is infected by a pathogen from another person or an animal. It is a problem that causes harm at both individual and macro scales. The Korea Center for Disease Control (KCDC) operates a surveillance system to minimize infectious disease contagions. However, in this system, it is difficult to immediately act against infectious disease because of missing and delayed reports. Moreover, infectious disease trends are not known, which means prediction is not easy. This study predicts infectious diseases by optimizing the parameters of deep learning algorithms while considering big data including social media data. The performance of the deep neural network (DNN) and long short-term memory (LSTM) learning models was compared with that of the autoregressive integrated moving average (ARIMA) when predicting three infectious diseases one week into the future. The results show that the DNN and LSTM models perform better than ARIMA. When predicting chickenpox, the top-10 DNN and LSTM models improved average performance by 24% and 19%, respectively. The DNN model performed stably and the LSTM model was more accurate when infectious disease was spreading. We believe that this study’s models can help eliminate reporting delays in existing surveillance systems and, therefore, minimize costs to society.
-
Exploring the dynamic knowledge structure of studies on the Internet of things: Keyword analysis
Young Seog Yoon, Hangjung Zo*, Munkee Choi, Donghyun Lee, and Hyun-woo Lee
ETRI Journal, 2018
A wide range of studies in various disciplines has focused on the Internet of Things (IoT) and cyber‐physical systems (CPS). However, it is necessary to summarize the current status and to establish future directions because each study has its own individual goals independent of the completion of all IoT applications. The absence of a comprehensive understanding of IoT and CPS has disrupted an efficient resource allocation. To assess changes in the knowledge structure and emerging technologies, this study explores the dynamic research trends in IoT by analyzing bibliographic data. We retrieved 54,237 keywords in 12,600 IoT studies from the Scopus database, and conducted keyword frequency, co‐occurrence, and growth‐rate analyses. The analysis results reveal how IoT technologies have been developed and how they are connected to each other. We also show that such technologies have diverged and converged simultaneously, and that the emerging keywords of trust, smart home, cloud, authentication, context‐aware, and big data have been extracted. We also unveil that the CPS is directly involved in network, security, management, cloud, big data, system, industry, architecture, and the Internet.
2017
-
Scale-free network analysis of big data for patent litigation cases in the United States
Donghyun Lee, Jinhyeong Kim, and Jungwoo Shin*
Journal of the Korean Physical Society, 2017
This study empirically analyzes the structure and behavior of the patent litigation network in the U.S. by introducing complex network theory characterized by growth and preferential attachment rules. In this study, we draw a log-log plot of the probability distribution for both the plaintiff and the defendant sides and use a log-transform regression to verify that the patent litigation network degree distribution follows a power-law distribution. We also graph the structure of the network to explore the origin of its asymmetrical pattern. In addition, we investigate the behavior of the patent litigation network over time by calculating the Shannon entropy for each year from 2005 to September 2016. We find that the power-law degree distribution and the few hubs of the patent litigation network resemble those of other real-world networks. We also find that the asymmetrical pattern of the patent litigation network is largely driven by non-practicing entities and the major information technology firms. We conclude that the patent litigation network is becoming more asymmetrical over time based on our finding that its Shannon entropy is decreasing.
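The link this abstract draws between falling Shannon entropy and growing asymmetry can be illustrated with a short sketch. The abstract does not say which distribution the entropy is computed over, so the sketch assumes it is each party's share of litigation cases in a year; the counts are made up for illustration.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution implied by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical case counts per party for two years (not data from the paper).
evenly_spread = [25, 25, 25, 25]   # cases spread evenly across four parties
hub_dominated = [85, 5, 5, 5]      # one hub party accounts for most cases

h_even = shannon_entropy(evenly_spread)
h_hub = shannon_entropy(hub_dominated)
print(f"evenly spread: {h_even:.3f} bits")
print(f"hub dominated: {h_hub:.3f} bits")
```

Entropy is highest when cases are spread evenly and drops as a few hubs take a larger share, which is the sense in which decreasing entropy signals a more asymmetrical network.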
2016
-
Adoption of green electricity policies: Investigating the role of environmental attitudes via big data-driven search-queries
Donghyun Lee, Minki Kim*, and Jungyoun Lee
Energy Policy (SSCI, IF = 9.2 / JCR Top 1.5%), 2016
Despite the rising influence of public opinion on government energy policy formulation and implementation, the roles of pro and/or anti-environmental attitudes among residents have not been empirically examined. To quantify time-varying environmental attitudes among local residents, we exploit geo-specific Google search-query data derived from Internet-based “big data” and verify through ordinary least squares regression outcomes regarding environmental behavior. For the purpose of drawing policy implications, we revisit decisions by state governments of the United States to adopt three well-known green electricity policies: renewable energy portfolio, net metering rules, and public benefit funds. As some states have not yet adopted some (or any) of these policies, unlike previous studies, we handle the issue by examining right-censored data and applying a duration-based econometric method called the accelerated failure time model. We found state residents’ environmental attitudes to have statistically significant roles, after controlling for other traditional time-varying policy adoption factors. Interestingly, the extent to which anti-environmental attitudes affect a state’s policy adoption differs across green energy policies, and knowing this can help a local government formulate better-tailored environmental policy. In particular, researchers can use our method of incorporating citizens’ environmental attitudes to discuss relevant issues in the field of energy policy.
-
Examining the relationship between past orientation and US suicide rates: An analysis using big data-driven Google search queries
Donghyun Lee, Hojun Lee, and Munkee Choi*
Journal of Medical Internet Research (SCIE, IF = 6.0 / JCR Top 6.1%), 2016
Background: Internet search query data reflect the attitudes of the users, using which we can measure the past orientation to commit suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. Objective: To investigate the relationship between past orientation and suicide rate by examining Google search queries. Methods: We measured the past orientation using Google search query data by comparing the search volumes of the past year and those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with independent variables as control variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. Results: It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. Conclusions: We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search queries.