Introduction
The US Centers for Disease Control and Prevention (CDC) has hosted influenza forecasting challenges annually since the 2013-2014 influenza season, except for 2020-2021, when influenza activity was limited. This report summarizes the performance of FluSight forecasts from the 2024-2025 U.S. influenza season. Further details on FluSight and the scoring methods can be found at the bottom of the page.
Results
Influenza activity in the United States during the 2024-2025 season exceeded levels observed in recent years and was classified as high severity according to the CDC preliminary in-season severity assessment [1], making it the first high-severity season since 2017-2018 [2]. Influenza hospitalizations began to increase in mid-November 2024, with an initial spike during the week ending January 4, 2025. Following a temporary decline, activity peaked nationally during the week ending February 8, 2025, and then declined steadily through April 2025 [1]. Figure 1 shows national weekly observed influenza hospital admissions (black points) along with FluSight ensemble forecasts, with 50% and 95% prediction intervals denoted by shading, for six submission timepoints during the 2024-2025 season. While ensemble forecasts, including the FluSight ensemble (see Methods), have been among the most accurate in influenza and other infectious disease forecasting efforts, they do not always reliably predict rapid changes in disease trends, such as increases at season onset and turns at the peak. This pattern was evident during the 2024-2025 season, when the ensemble 50% and 95% prediction intervals failed to anticipate the sharp increase in hospitalizations observed in late December 2024 and January 2025. Because of this underperformance, forecasts were not published to the CDC FluSight webpage until March 5, 2025, after it was determined that the ensemble prediction intervals reliably captured future observed data (see Results and Supplemental Figure 1) [3].
Figure 1: National Ensemble Forecasts

A total of 33 teams contributed forecasts from 46 unique models; of those, 35 models met the inclusion criteria for this analysis and were categorized by methodology as statistical, mechanistic, artificial intelligence or machine learning, or ensemble. The following models submitted forecasts during the 2024-2025 season and were included in the FluSight ensemble when submitted, but were excluded from this analysis because they did not submit at least 75% of the forecast targets: CADPH-FluCAT_Ensemble, CFA_Pyrenew-Pyrenew_HE_Flu, CFA_Pyrenew-Pyrenew_H_Flu, Google_SAI-FluBoostQR, JHUAPL-Morris, LosAlamos_NAU-CModel_Flu, UGuelph-CompositeCurve, and UVAFluX-OptimWISE. A baseline model that carries forward the prior week's number of hospital admissions was generated for comparison (see Methods).
Forecasts were scored primarily using the relative weighted interval score (WIS), a forecast skill metric that measures how consistent a collection of forecast prediction intervals is with observed data. The metric is calculated relative to the baseline model; values below 1 indicate a forecast more accurate than the baseline. Forecasts were also scored on coverage, the proportion of prediction intervals that contained the eventually observed value.
The FluSight ensemble, the model CDC uses to communicate forecast messaging, outperformed all submitted models on the primary performance metric (average relative WIS across the season). Among the individual team submissions, the top-performing model was PSI-PROF_beta [4]. Notably, the relative WIS values of the top 10 performing models all fell within 0.04 of each other (Table 1). Of the 36 models evaluated, 27 performed better than the baseline model (Table 1).
Table 1: Results for included models
Relative WIS values varied by jurisdiction; Table 2 shows each model's performance in each jurisdiction. The FluSight ensemble had the smallest range of relative WIS values across jurisdictions and performed better than the baseline in every jurisdiction.
Table 2: Relative WIS by jurisdiction and model
Three models, including the FluSight-lop_norm ensemble, had 95% coverage values over 90%, indicating that their prediction intervals captured more than 90% of the eventually observed values (Table 1 and Figure 2). The lowest coverage for the FluSight ensemble occurred on January 4, 2025, coinciding with the first peak in influenza hospital admissions, when just 6% of the 2-week-horizon prediction intervals across jurisdictions contained the observed values. Another drop in coverage occurred in early February, coinciding with the second and highest peak in influenza hospitalizations of the season (Figure 2). FluSight ensemble coverage stabilized near 95% starting in March 2025. Models with high coverage values were often, but not always, those with the lowest relative WIS scores.
Methods
CDC solicited weekly forecasts from academic, industry, and government forecasting teams from November 20, 2024, through May 31, 2025. The main forecasting target was weekly influenza hospital admissions for the current week and up to three weeks ahead for the US, each state, Puerto Rico, and Washington, DC. Target data, Weekly Hospital Respiratory Data Metrics by Jurisdiction, were downloaded from the National Healthcare Safety Network via data.cdc.gov [5]. Final target data used for scoring in this analysis were published June 25, 2025. Forecasts were first published to the CDC FluSight webpage on March 5, 2025, after demonstrating sufficient coverage [3].
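For reference, the target dataset is published on data.cdc.gov, and a minimal sketch of pulling it into R is shown below. The export endpoint is inferred from the dataset identifier in the citation (a standard Socrata pattern) and is an assumption to verify against the dataset page, as are any column names.

# Sketch: read the NHSN Weekly Hospital Respiratory Data from data.cdc.gov.
# The /resource/<id>.csv endpoint follows the standard Socrata export pattern
# for the dataset id cited in reference 5 (an assumption, not a documented
# contract). Note that Socrata returns 1,000 rows by default; append
# "?$limit=..." to request more.
hrd <- read.csv("https://data.cdc.gov/resource/mpgq-jmmr.csv")
head(hrd)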
Each week, CDC created a baseline forecast (FluSight baseline) for comparison and an ensemble forecast (FluSight ensemble). The FluSight baseline model forecast a median incidence equal to the prior week's value, with uncertainty based on observation noise; methods for creating the baseline have been described previously [6]. The FluSight ensemble, which was used for CDC's influenza forecasting messaging, takes the median of forecasts, at each quantile level, from models self-designated for ensemble inclusion. The FluSight-lop_norm ensemble is a linear opinion pool of submitted eligible models, with its distribution tails estimated by a Gaussian normal distribution [7]. Both ensembles were created using the hubEnsembles R package [8].
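To illustrate the median-ensemble idea, the sketch below computes a quantile-wise median across a toy set of component forecasts in base R. It is a simplified stand-in for the hubEnsembles workflow, and the model names and values are hypothetical.

# Toy input: one row per model and quantile level for a single
# jurisdiction, week, and horizon.
forecasts <- data.frame(
  model    = rep(c("modelA", "modelB", "modelC"), each = 3),
  quantile = rep(c(0.25, 0.50, 0.75), times = 3),
  value    = c(90, 110, 140, 100, 120, 150, 80, 105, 130)
)

# The ensemble value at each quantile level is the median of the
# component models' values at that level.
ensemble <- aggregate(value ~ quantile, data = forecasts, FUN = median)
ensemble
#   quantile value
# 1     0.25    90
# 2     0.50   110
# 3     0.75   140

A linear opinion pool, by contrast, averages the component predictive distributions themselves rather than their quantiles, which generally produces wider tails.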
Prior to forecast submission, teams were required to submit model metadata that included information about their methods and whether the model should be included in the FluSight ensemble [9]. Models were categorized based on model components: statistical (STAT), mechanistic (MECH), and artificial intelligence or machine learning (AI/ML). Components were determined from the methods described in the metadata. Models with "mechanistic", "SEIR", "SIR", "SLIR", "km27", "compartment", "renewal", or "dynamics" in the description were considered to have mechanistic components. Models that mentioned "lstm", "random forest", "generative", "GRB", "SVM", or "lightGBM", or that had keywords such as "deep", "neural", or "machine learning", were considered to have AI/ML components. Methods that explicitly stated "statistical", "time-series", "ARIMA", "Bayesian", "regression", "holt", or "random walk", or that were not classified as having either mechanistic or AI/ML components, were classified as having statistical components. Components were not considered mutually exclusive. Whether a model was an ensemble of other models was also included in categorization; ensembles were explicitly reported by each team. This information was not used for scoring or analysis but is included in Table 1 for reference.
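As an illustration, keyword matching of this kind could be written as follows. The keyword lists come from the paragraph above; the helper functions and example descriptions are hypothetical, not the procedure actually used.

# Illustrative keyword matching over metadata method descriptions
# (hypothetical helper; actual categorization used metadata submitted
# to the FluSight Forecast Hub).
mech_words <- c("mechanistic", "SEIR", "SIR", "SLIR", "km27",
                "compartment", "renewal", "dynamics")
aiml_words <- c("lstm", "random forest", "generative", "GRB", "SVM",
                "lightGBM", "deep", "neural", "machine learning")
stat_words <- c("statistical", "time-series", "ARIMA", "Bayesian",
                "regression", "holt", "random walk")

has_any <- function(text, words) {
  any(vapply(words, function(w) grepl(w, text, ignore.case = TRUE),
             logical(1)))
}

categorize <- function(description) {
  comps <- c(MECH = has_any(description, mech_words),
             AIML = has_any(description, aiml_words),
             STAT = has_any(description, stat_words))
  # Components are not mutually exclusive; a model matching neither
  # mechanistic nor AI/ML keywords defaults to statistical.
  if (!comps["MECH"] && !comps["AIML"]) comps["STAT"] <- TRUE
  names(comps)[comps]
}

categorize("A Bayesian SEIR compartmental model")  # "MECH" "STAT"
categorize("An LSTM neural network")               # "AIML"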
For the analysis, we excluded models that submitted fewer than 75% of the total forecasts across all weeks and jurisdictions; we also excluded national forecasts and all forecasts submitted on January 25, 2025. Forecasts were further excluded for Washington from May 17, 2025, through May 31, 2025, for Arizona on January 18, 2025, and for Puerto Rico for the entire season. Table 1 includes the percentage of forecasts submitted by each model for the targets included in this analysis.
Forecasts were evaluated on relative WIS using counts transformed with the natural logarithm to minimize the impact of count magnitude across jurisdictions [10]. WIS is a proper score that measures how consistent a collection of forecast prediction intervals is with the log-transformed observed data; a lower value represents a better forecast. WIS is often calculated relative to the baseline to give relative WIS (values <1 are considered better than the baseline). Relative WIS was calculated as the geometric mean WIS of each model's forecasts divided by the geometric mean WIS of the corresponding FluSight baseline forecasts [10]. Using the geometric mean allows a more direct comparison of models even when not all models submit forecasts for all jurisdictions, and it avoids penalizing models that did not submit forecasts for every jurisdiction or that submitted forecasts for jurisdictions that are inherently more difficult to forecast.
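To make the score concrete, the following base R sketch implements the weighted interval score for a single quantile forecast, using the standard decomposition into the absolute error of the median plus weighted interval scores, and shows the geometric-mean ratio that defines relative WIS. It is an illustration only; the analysis itself used the scoringutils package, and the offset of 1 before the log transform is an assumption to guard against zero counts.

# Illustrative WIS for one forecast expressed as quantiles. For a central
# interval with level alpha, the interval score is
#   IS_alpha(l, u; y) = (u - l) + (2/alpha)(l - y) 1{y < l}
#                               + (2/alpha)(y - u) 1{y > u}
# and WIS = (0.5*|y - m| + sum_k (alpha_k/2) * IS_alpha_k) / (K + 0.5),
# where m is the predictive median and K the number of intervals.
wis <- function(levels, values, y) {
  stopifnot(0.5 %in% levels)
  m <- values[levels == 0.5]                    # predictive median
  alphas <- sort(unique(2 * pmin(levels, 1 - levels)))
  alphas <- alphas[alphas < 1]                  # one alpha per interval
  is_sum <- sum(vapply(alphas, function(a) {
    l <- values[abs(levels - a / 2) < 1e-9]
    u <- values[abs(levels - (1 - a / 2)) < 1e-9]
    (a / 2) * ((u - l) + (2 / a) * max(l - y, 0) + (2 / a) * max(y - u, 0))
  }, numeric(1)))
  (0.5 * abs(y - m) + is_sum) / (length(alphas) + 0.5)
}

# Scored on the natural-log scale (offset of 1 is an assumption):
q <- c(0.025, 0.25, 0.5, 0.75, 0.975)
wis(q, log(c(80, 100, 120, 140, 180) + 1), log(150 + 1))

# Relative WIS: ratio of the geometric mean WIS for a model to that of
# the baseline over the same forecasts (values < 1 beat the baseline).
rel_wis <- function(model_wis, baseline_wis) {
  exp(mean(log(model_wis)) - mean(log(baseline_wis)))
}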
Forecasts were also evaluated on 50% and 95% coverage. Coverage measures how often a prediction interval contains the eventually observed value. WIS and coverage were calculated using the scoringutils R package [11].
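Empirical coverage is straightforward to compute; a minimal sketch with hypothetical values follows.

# Empirical coverage: the share of observations that fall inside the
# corresponding prediction intervals (illustrative; scoringutils was
# used for the actual analysis).
coverage <- function(lower, upper, observed) {
  mean(observed >= lower & observed <= upper)
}

# Example: 95% intervals across four jurisdiction-weeks (hypothetical)
coverage(lower    = c(50, 70, 90, 100),
         upper    = c(150, 160, 180, 220),
         observed = c(120, 165, 175, 210))   # 0.75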
All analyses were performed in R version 4.4.0 [12].
Supplemental Figure 1: National Ensemble Forecasts of Influenza Hospitalizations

1. Influenza Division, Centers for Disease Control and Prevention. 2024-2025 United States Flu Season: Preliminary In-Season Severity Assessment. 2025. Available from: https://www.cdc.gov/flu/php/surveillance/in-season-severity.html.
2. Influenza Division, Centers for Disease Control and Prevention. Past Flu Season Severity Assessments. 2024. Available from: https://www.cdc.gov/flu/php/surveillance/past-seasons.html.
3. Influenza Division, Centers for Disease Control and Prevention. Flu Hospital Admissions as of March 5, 2025. 2025. Available from: https://www.cdc.gov/flu-forecasting/data-vis/03052025-flu-forecasts.html.
4. Ben-Nun, M., et al. Forecasting national and regional influenza-like illness for the USA. PLoS Comput Biol, 2019. 15(5): e1007013.
5. Weekly Hospital Respiratory Data (HRD) Metrics by Jurisdiction, National Healthcare Safety Network (NHSN) (Preliminary). Available from: https://data.cdc.gov/Public-Health-Surveillance/Weekly-Hospital-Respiratory-Data-HRD-Metrics-by-Ju/mpgq-jmmr/about_data.
6. Mathis, S.M., et al. Evaluation of FluSight influenza forecasting in the 2021-22 and 2022-23 seasons with a new target laboratory-confirmed influenza hospitalizations. Nat Commun, 2024. 15(1): 6289.
7. Howerton, E., et al. Context-dependent representation of within- and between-model uncertainty: aggregating probabilistic predictions in infectious disease epidemiology. J R Soc Interface, 2023. 20(198): 20220659.
8. Krystalli, A., Ray, E.L., Shandross, L. hubEnsembles: Ensemble methods for combining hub model outputs. 2023. Available from: https://github.com/infectious-disease-modeling-hubs/hubEnsembles.
9. CDC FluSight Team. FluSight Forecast Hub. 2024. Available from: https://github.com/cdcepi/FluSight-forecast-hub.
10. Bosse, N.I., et al. Scoring epidemiological forecasts on transformed scales. PLoS Comput Biol, 2023. 19(8): e1011393.
11. Bosse, N.I., Gruson, H., Funk, S., Cori, A., van Leeuwen, E., Abbott, S. Evaluating Forecasts with scoringutils in R. 2022.
12. R Core Team. R: A Language and Environment for Statistical Computing. 2024. Available from: https://www.R-project.org/.