
Case: Implementing deep reinforcement learning based demand forecasting for a leading global semiconductor components distributor that must manage a huge number of short-life-cycle products with widely varying demand patterns and potential perishability concerns. The organization serves as a franchise partner for hundreds of suppliers worldwide. Backed by a strong service team, the distributor provides a range of value-added services, such as product information, demand generation, turnkey solutions, technical support, after-sales service, warehousing, logistics, e-commerce, and credit, to customers including OEMs (Original Equipment Manufacturers), ODMs (Original Design Manufacturers), EMS (Electronic Manufacturing Services) providers, and SMEs (Small and Medium Enterprises). With orders arriving from customers all over the world, the distributor must satisfy thousands of product demands with local flexibility.
Demand forecasting plays an important role in smart production for semiconductor supply chains due to long lead times for capacity expansion, shortening product life cycles, and pervasive uncertainty.
Traditional forecasting techniques have limitations: multivariate models often lack sufficient data for parameter estimation (and fitting them is time-consuming when many products are involved), while aggregating demand at monthly or quarterly levels reduces the supply chain agility needed for smart production. Reinforcement learning (RL) enables an agent to learn to select actions that maximize reward through interaction with its environment. However, the high-dimensional state spaces of most real-world problems make it challenging for RL to derive an efficient representation of the environment, and learning optimal control policies directly from raw inputs imposes further limitations in memory and computational complexity. With greater computing power now widely available, neural networks and deep learning provide effective ways to overcome these obstacles. Deep Reinforcement Learning (DRL) combines deep learning with reinforcement learning so that an agent can learn to optimize sequential decision making under demand uncertainty through the observed environment state.
It is important to design a forecasting selection mechanism that can integrate a number of forecasting approaches, because each way of selecting a forecasting model has limitations. A rule-based approach may suffer from overfitting or from insufficiently defined rules. Feature-based selection depends heavily on the feature formulation and selection step, which is time-consuming. Since a semiconductor distributor handles very large product quantities and many product types, the rule-based method is unsuitable: it is too difficult to define all the required rules. In addition, because demand fluctuation is high in the semiconductor industry, the feature formulation will be complex or insufficient given the diversity of features, which limits forecasting capability. As for the ensemble model, because the weight assigned to each predictor is not updated at each step, it may struggle to forecast products with rapidly changing time-series patterns, such as high-tech products with shortening product life cycles. The DRL model has the advantage that it can select the forecasting model automatically and update the choice dynamically through interaction with the environment.
The implementation involves six phases: (1) defining the problem scope; (2) identifying gaps in decision quality; (3) establishing objectives, considering alternatives, and identifying uncertainty influences; (4) assessing and outlining likely outcomes; (5) conducting holistic evaluations and value assessments; and (6) assessing trade-offs between attributes and making a choice.
Defining the Problem Scope
The framework begins with defining and structuring the problem at hand. Both the decision maker and the context are central to this process. For this case, the decision content revolves around estimating future demand, and the decision maker is the demand-planning manager, who is responsible for demand forecasting for a semiconductor distributor and relies on both past experience and customer forecasts.
To meet the global demand of customers, a semiconductor distributor must place timely and accurate orders with suppliers based on these demand forecasts. However, challenges arise from the semiconductor industry's relatively short product life cycle, long lead times, and the high volatility in supply and demand. These factors make precise forecasting particularly difficult. Overestimating demand often leads to excess inventory and high storage costs, while underestimating demand increases the risk of stockouts. Therefore, the decision maker requires a reliable, efficient, and accurate method to forecast future demand and mitigate unnecessary inventory costs.
Identifying Gaps in Decision Quality
Semiconductor components are known for having diverse demand patterns, some of which exhibit significant fluctuations due to the short product life cycles inherent in the industry. Many companies lack advanced data acquisition systems that can collect sufficient data for comprehensive, multifactorial analyses. As a result, the focus is on improving decision quality using univariate data.
Since future demand for various semiconductor products must be forecasted, and given that each forecasting method has its own strengths and limitations, it is essential to automatically select the appropriate forecasting model for each product. By combining analytical insights into demand with expert knowledge from domain specialists, companies can make more informed and intelligent decisions within a smart supply chain. Furthermore, due to the rapid changes in time series data linked to the short product life cycle, choosing the right forecasting model at different time intervals becomes a critical challenge.
Establishing Objectives, Considering Alternatives, and Identifying Uncertainty Influences
After defining the problem and identifying key areas for improvement, relevant demand data is prepared. Using a theoretical demand-pattern classification rule, demand patterns are classified into four groups to enhance the forecasting model’s prediction accuracy. These categories are based on two key thresholds: the inter-demand interval (IDI) and the coefficient of variation (CV) of demand. The four groups are smooth (low IDI, low CV), erratic (low IDI, high CV), intermittent (high IDI, low CV), and lumpy (high IDI, high CV).
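As a concrete illustration, the sketch below classifies a demand series into the four groups. The case does not report the exact cut-off values, so the thresholds here are assumptions borrowed from the commonly cited Syntetos–Boylan scheme (IDI cut-off 1.32, squared-CV cut-off 0.49), and the function name is hypothetical.

```python
import numpy as np

def classify_demand_pattern(demand, idi_cut=1.32, cv2_cut=0.49):
    """Classify a demand series as smooth/erratic/intermittent/lumpy.

    idi_cut and cv2_cut are assumed Syntetos-Boylan cut-offs; the
    case does not state the thresholds actually used.
    """
    demand = np.asarray(demand, dtype=float)
    nonzero = demand[demand > 0]
    if nonzero.size == 0:
        return "no demand"
    # Average inter-demand interval: periods per nonzero-demand period.
    idi = demand.size / nonzero.size
    # Squared coefficient of variation of the nonzero demand sizes.
    cv2 = (nonzero.std() / nonzero.mean()) ** 2
    if idi < idi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"

# Example: long gaps between orders and highly variable sizes -> lumpy.
print(classify_demand_pattern([0, 0, 40, 0, 0, 0, 3, 0, 90, 0]))
```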
To address demand uncertainty, this step explores the relationships between various sources of demand uncertainty, considering the unique characteristics of each forecasting model. A comparison of different models, each with its own properties, is performed and integrated into a comprehensive strategy. Specifically, traditional forecasting methods such as the Naïve forecast, simple moving average, single exponential smoothing, and the Syntetos-Boylan approximation (SBA) are evaluated. Additionally, three machine learning approaches—artificial neural network (ANN), recurrent neural network (RNN), and support vector regression (SVR)—are selected for comparison.
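To make the traditional candidates concrete, here is a minimal sketch of four of them. The smoothing constants and window length are illustrative assumptions, and SBA is implemented as Croston's method with the standard (1 − α/2) bias-correction factor; the case may use different settings.

```python
import numpy as np

def naive(demand):
    """Naive forecast: next period equals the last observation."""
    return demand[-1]

def moving_average(demand, window=4):
    """Simple moving average over the last `window` periods (window assumed)."""
    return float(np.mean(demand[-window:]))

def ses(demand, alpha=0.3):
    """Single exponential smoothing; alpha is an assumed smoothing constant."""
    level = demand[0]
    for y in demand[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def sba(demand, alpha=0.1):
    """Syntetos-Boylan approximation: Croston's method with the
    (1 - alpha/2) bias correction, suited to intermittent demand."""
    size, interval, periods = None, None, 0
    for y in demand:
        periods += 1                      # periods since the last demand
        if y > 0:
            if size is None:              # initialize on the first demand
                size, interval = y, periods
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * periods + (1 - alpha) * interval
            periods = 0
    if size is None:
        return 0.0
    return (1 - alpha / 2) * size / interval

history = [0, 0, 12, 0, 15, 0, 0, 9, 0, 11]
print(naive(history), moving_average(history), ses(history), sba(history))
```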
Assessing and Outlining Likely Outcomes
Deep Reinforcement Learning (DRL) is utilized to evaluate the potential outcomes of various actions, enabling effective learning by combining deep neural networks with reinforcement learning (RL). Given that demand forecasting is performed at a weekly granularity, the Deep Q-Network (DQN), a DRL model well suited to discrete decision settings, is selected to establish the model selection mechanism.
Traditional Q-learning stores discrete state-action values in a table, but it struggles to extract meaningful features from high-dimensional data, especially as the number of states grows. To overcome this, DQN approximates the action-value function with a deep neural network updated by stochastic gradient descent, instead of storing large volumes of Q-values in a table. This makes DQN well suited to learning the complex, stochastic demand patterns of semiconductor products and selecting the most appropriate forecasting method. The agent’s objective is to maximize the cumulative future reward by optimizing the action-value function.
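The selection mechanism can be sketched as follows. Everything here is a simplified, hypothetical illustration rather than the case's reported implementation: the state is assumed to be a window of recent weekly demand, the seven actions index the candidate forecasting models, the reward is assumed to be the negative forecast error of the chosen model, the hyperparameters are placeholders, and the separate target network of the full DQN algorithm is omitted for brevity.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_MODELS = 12, 7            # assumed window length; 7 candidate models

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_MODELS),           # one Q-value per forecasting model
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)          # replay memory of (s, a, r, s') transitions
GAMMA, EPSILON = 0.99, 0.1

def select_model(state):
    """Epsilon-greedy choice of a forecasting model for this week."""
    if random.random() < EPSILON:
        return random.randrange(N_MODELS)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=32):
    """One stochastic-gradient update toward the Bellman target
    r + gamma * max_a' Q(s', a')."""
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s, a, r, s2 = (torch.as_tensor(np.array(x), dtype=torch.float32)
                   for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Each week: pick a model with select_model(state), observe realized demand,
# push (state, action, -forecast_error, next_state) into buffer, call train_step().
```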
Conducting Holistic Evaluations and Value Assessments
To evaluate the overall performance of the selected methods, Mean Absolute Scaled Error (MASE) is used as the primary metric. Given the high fluctuations in the demand patterns, some periods in the time series may contain zero values. The advantage of MASE is that, because it scales errors by a naive in-sample benchmark rather than dividing by the actual observations, it remains well defined when the historical demand data contains zeros.
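One common formulation of MASE, which the sketch below assumes, divides the out-of-sample mean absolute error by the in-sample mean absolute error of the one-step naive forecast; the case may use a variant. A value below 1 means the method beats the naive benchmark on average.

```python
import numpy as np

def mase(actual, forecast, train):
    """Mean Absolute Scaled Error (common formulation, assumed here).

    Scales the out-of-sample MAE by the in-sample MAE of the one-step
    naive forecast, so zero-demand periods do not break the metric
    (unlike MAPE, which divides by the actual values themselves).
    """
    actual, forecast, train = map(np.asarray, (actual, forecast, train))
    mae = np.mean(np.abs(actual - forecast))
    naive_mae = np.mean(np.abs(np.diff(train)))   # in-sample naive benchmark
    return mae / naive_mae

# Toy example with zero-demand periods in both history and test window.
print(mase(actual=[0, 7, 3], forecast=[2, 5, 4], train=[0, 4, 0, 6, 5]))
```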
Assessing Trade-offs Between Attributes and Making a Choice
The decision maker will make the ultimate decision based on the performance evaluations from the previous stage. The proposed approach, with its self-learning function, will update periodically and select the method that minimizes forecast error. The quantified forecasting results, along with the measurement metrics, will serve as a reference for the demand-planning manager, who can then adjust the final demand forecast using their domain expertise and insights.
Overall Finding
The DRL model is compared with the seven individual forecasting models, with an ensemble model that combines the results of the seven models by simple averaging, and with a support vector machine whose input is the same observed state used in the proposed DRL model and whose output is the selected forecasting model. The DRL model achieved the lowest MASE for all four demand scenarios: smooth, erratic, intermittent, and lumpy.