How the Confidence Score is Calculated

The confidence score has remained unchanged since its initial implementation in 2017. It is overdue for an update and is something we would like to improve.

The confidence score is a percentage from 0 to 100, indicating whether a sensor’s laser counters appear to be reading properly and in agreement with each other. It’s only calculated for sensors with two channels of PM data. Otherwise, it’s set to a default value of 30.

Confidence for sensors can be viewed using the sensor confidence data layer on the PurpleAir Map and can be obtained from our API. It’s not available for local data collection, like SD data or accessing sensor JSON, but can be calculated using the information below.

An overview of confidence and how it affects markers on the PurpleAir Map is found in the following article: The Confidence Score.

How the Confidence Score Works

At a high level, the confidence score is calculated using PM data from both channels. This score is then lowered based on downgrades. Confidence considers both automatically calculated downgrades and manual downgrades applied by PurpleAir staff.

The steps for calculating confidence are as follows:

  1. Channel sums are calculated for channel A and channel B using pseudo averaged data.
  2. Automatic downgrades are set for channels A and B. A channel is downgraded if either:
    • Its real-time PM2.5 reading is over 2000.
    • Its channel sum is more than 10 times the other’s (only the higher-reading channel is downgraded).
  3. The initial confidence score is calculated using only the channel sums.
    • Initial confidence is lowered if one or both channels report PM2.5 above 2000.
  4. Both confidence_auto and confidence_manual are calculated by lowering the initial confidence using automatic and manual downgrades, respectively:
    • initial_confidence * (100 - (40 * number_of_downgraded_channels)) / 100
  5. channel_flags reflecting the downgrades are set.
  6. The final confidence value is set:
    • confidence = (confidence_auto + confidence_manual) / 2
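As a worked example, steps 3, 4, and 6 can be sketched as a standalone calculation. The channel sums (1000 and 1200) and downgrade counts below are illustrative inputs, not values from a real sensor:

```java
public class ConfidenceSketch {
  // Step 3: initial confidence from the two channel sums
  public static double initialConfidence(double sumA, double sumB) {
    double diff = Math.abs(sumA - sumB);
    double avg = (sumA + sumB) / 2;
    double pc = Math.round(diff / avg * 100 / 1.6) - 25;
    if (pc < 0) {
      pc = 0;
    }
    return Math.max(0, 100 - pc);
  }

  // Step 4: lower confidence by 40 points per downgraded channel
  public static int adjusted(int initial, int downgradedChannels) {
    return initial * (100 - (40 * downgradedChannels)) / 100;
  }

  public static void main(String[] args) {
    int initial = (int) initialConfidence(1000, 1200); // sums agree well -> 100
    int auto = adjusted(initial, 1);                   // one automatic downgrade -> 60
    int manual = adjusted(initial, 0);                 // no manual downgrades -> 100
    int confidence = (auto + manual) / 2;              // step 6 -> 80
    System.out.println(confidence);
  }
}
```

Note that the sums 1000 and 1200 differ by about 18% relative to their average, which is still within the ~40% tolerance the formula allows before any points are deducted.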

Implementation

The following code calculates the confidence score, with some minor adjustments for readability.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ConfidenceCalculator {
  public static final int MAX_PM = 2000;

  public static double getZeroIfNodeIsNull(final JsonNode node) {
    if (node == null) {
      return 0;
    }
    return node.asDouble();
  }

  /*
  *  This is where we start the confidence calculation
  *  
  *  objA -- stats_a -- real-time pseudo averaged data for channel A
  *  objB -- stats_b -- real-time pseudo averaged data for channel B
  */
  public static void calculateFlags(final Object e, int channelState, int channelFlagsManual, ObjectNode objA, ObjectNode objB){

    ObjectNode performance = calculatePerformance(objA, objB, channelState, channelFlagsManual);

    Integer confidenceAuto = null;
    Integer confidenceManual = null;
    // channel_flags_auto defaults to 3, i.e. both channels flagged bad.
    int channelFlagsAuto = 3;
    if (performance.get("channel_flags_auto") != null) {
      channelFlagsAuto = performance.get("channel_flags_auto").asInt();
    }
    if (performance.get("confidence_manual") != null) {
      confidenceManual = performance.get("confidence_manual").asInt();
    }
    if (performance.get("confidence_auto") != null) {
      confidenceAuto = performance.get("confidence_auto").asInt();
    }

    int channelFlags = channelFlagsManual;
    if (channelFlagsManual == 0) {
      channelFlags = channelFlagsAuto;
    }

    Integer confidence;
    if (confidenceManual == null && confidenceAuto == null) {
      confidence = 30;
    } else if (confidenceManual == null) {
      confidence = confidenceAuto;
    } else if (confidenceAuto == null) {
      confidence = confidenceManual;
    } else {
      confidence = (confidenceManual + confidenceAuto) / 2;
    }
  }

  // Calculates confidence_manual and confidence_auto
  public static ObjectNode calculatePerformance(final ObjectNode objA, 
      final ObjectNode objB, final int channelState, final int channelFlagsManual) {

    // Initialize our perf object holding confidence values
    ObjectMapper objectMapper = new ObjectMapper();
    ObjectNode perf = objectMapper.createObjectNode();
    // Return an object with no confidence values if there's no PM data
    if (objA == null && objB == null) {
      return perf;
    }

    float pmA = 0;
    if (objA != null) {
      JsonNode jsA = objA.get("pm2.5_atm"); // pm2.5_atm_a
      if (jsA != null) {
        pmA = jsA.floatValue();
      }
    }
    float pmB = 0;
    if (objB != null) {
      JsonNode jsB = objB.get("pm2.5_atm"); // pm2.5_atm_b
      if (jsB != null) {
        pmB = jsB.floatValue();
      }
    }

    double totala = pmA;
    double vA = 0;
    if (objA != null) {
      vA = getZeroIfNodeIsNull(objA.get("pm2.5")); // pm2.5_a
      totala += vA;
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_10minute")); // pm2.5_10minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_30minute")); // pm2.5_30minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_60minute")); // pm2.5_60minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_6hour")); // pm2.5_6hour_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_24hour")); // pm2.5_24hour_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_1week")); // pm2.5_1week_a
    }

    double totalb = pmB;
    double vB = 0;
    if (objB != null) {
      vB = getZeroIfNodeIsNull(objB.get("pm2.5")); // pm2.5_b
      totalb += vB;
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_10minute")); // pm2.5_10minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_30minute")); // pm2.5_30minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_60minute")); // pm2.5_60minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_6hour")); // pm2.5_6hour_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_24hour")); // pm2.5_24hour_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_1week")); // pm2.5_1week_b
    }

    // Calculate automatic downgrades
    int channelFlagsAuto = 0;
    if (vA > MAX_PM || (objB != null && totala > (totalb * 10))) {
      channelFlagsAuto = channelFlagsAuto + 1;
    }
    if (vB > MAX_PM || (objA != null && totalb > (totala * 10))) {
      channelFlagsAuto = channelFlagsAuto + 2;
    }
    perf.put("channel_flags_auto", channelFlagsAuto);

    if (objA != null && objB != null) {

      // calculate initial confidence
      int confidence;
      confidence = (int) getConfidence(totala, totalb);

      // Lower confidence for abnormally high readings
      if (pmA > MAX_PM && pmB > MAX_PM) {
        confidence = confidence - 40;
      } else if (pmA > MAX_PM || pmB > MAX_PM) {
        confidence = confidence - 60;
      }

      // Calculate confidence_auto by adjusting for automatic downgrades
      perf.put("confidence_auto", getConfidenceAdjusted(confidence, channelState, channelFlagsAuto));
      // Calculate confidence_manual by adjusting for manual downgrades
      perf.put("confidence_manual", getConfidenceAdjusted(confidence, channelState, channelFlagsManual));
    }

    return perf;
  }  

  // Gets the initial confidence between two channel sums
  private static double getConfidence(final double a, final double b) {
    double diff = Math.abs(a - b);
    double avg = (a + b) / 2;

    double pc = Math.round(diff / avg * 100 / 1.6);
    pc = pc - 25;
    if (pc < 0) {
      pc = 0;
    }

    double npcx;
    npcx = 100 - pc;
    if (npcx < 0) {
      npcx = 0;
    }

    return npcx;
  }

  // Adjusts confidence based on downgrades (channelFlags)
  private static int getConfidenceAdjusted(final int conf, final int state, final int channelFlags) {
    int c = conf;
    if (c < 0) {
      c = 0;
    }

    if (state == 3 && channelFlags > 0) {
      int correction = 1;
      if (channelFlags > 2) {
        correction = 2; 
      }
      return c * (100 - (40 * correction)) / 100;
    }
    return c;
  }
}


I hope you find these suggestions valuable, even if they are potentially too complex or expensive to implement :slight_smile:

Also, I just wanted to say that I do not hold any technical credit for this work. It’s the polished result of three iterations of ChatGPT o1’s output :robot:

Proposed Improvements to the Confidence Calculation System

The current confidence calculation system for sensors uses static logic, fixed thresholds, and constant penalties. This approach is limited because it does not adapt to environmental dynamics, does not effectively distinguish between normal fluctuations and significant anomalies, and does not leverage historical data or advanced statistical or modeling techniques.

Below we present a set of technical proposals to improve the confidence calculation, integrated with a modular, gradual, and more thoroughly documented approach. The goal is to make the system more robust, adaptive, interpretable, and capable of correctly reflecting real data variability, while also identifying potentially defective sensors.

Possible Technical Improvements

  1. Dynamic Thresholds Generated by Robust Statistics:
    Instead of using a fixed MAX_PM, it is possible to define normality thresholds based on robust statistical metrics, such as the median and MAD (Median Absolute Deviation) or robust mean and standard deviation. These thresholds can be periodically updated using historical data, resulting in less arbitrary, more adaptive criteria that evolve with changing environmental conditions.

  2. Temporal Data Filtering and Modeling:
    Before computing the confidence, median filters can be applied to reduce noise and stabilize the data. In subsequent phases, Kalman filters, ARIMA models, or other predictive methods can be introduced to estimate expected values, compare them with observed values, and identify abnormal deviations.
    This approach makes it easier to distinguish between normal fluctuations and unusual patterns, ensuring greater reliability.

  3. Historical Channel Weighting and Gradual Penalties:
    Not all channels are equally stable over time. Their historical reliability can be assessed and used to assign different weights: a more stable channel will have a greater impact on the final confidence. Furthermore, instead of rigid penalties (linear score cuts), sigmoidal functions or penalties proportional to anomaly severity can be introduced, making confidence reduction smoother.

  4. Advanced Correlation and Stability Metrics:
    Analyzing the temporal correlation between channels and measuring signal entropy can help detect suspicious situations. If two channels that should be consistent over time display diverging patterns, this may indicate a problem with one of the sensors.

  5. Contextualization with External Data:
    Integrating environmental data (humidity, temperature, weather conditions), contextual knowledge (time of day, seasonality), or other external variables makes it possible to understand whether a PM2.5 spike is plausible or not. A sensor that records sudden anomalies under inconsistent environmental conditions will see its confidence reduced. The use of multivariate regression or ML models allows for estimating context-aware expected values.

  6. Machine Learning and Bayesian Updating:
    ML models (e.g., Random Forest, Gradient Boosting) can be trained on labeled historical data to learn how to discriminate between reliable and anomalous readings. Bayesian approaches also allow for dynamically updating confidence by continuously integrating new observations and prior knowledge.
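Item 1 above (dynamic thresholds from robust statistics) can be sketched with a median/MAD calculation. The 1.4826 consistency constant (which scales MAD to a standard deviation under normality) is standard, but the sample history and the threshold factor of 4 are illustrative assumptions:

```java
import java.util.Arrays;

public class RobustThresholds {
  // Median of a data series (copy is sorted so the input is untouched)
  public static double median(double[] data) {
    double[] s = data.clone();
    Arrays.sort(s);
    int n = s.length;
    return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
  }

  // Median Absolute Deviation around a given median
  public static double mad(double[] data, double med) {
    double[] dev = new double[data.length];
    for (int i = 0; i < data.length; i++) {
      dev[i] = Math.abs(data[i] - med);
    }
    return median(dev);
  }

  public static void main(String[] args) {
    double[] history = {10, 12, 11, 13, 500, 12, 11}; // 500 is an outlier
    double med = median(history);                      // 12.0
    double robustStd = 1.4826 * mad(history, med);     // MAD scaled to a std estimate
    double upper = med + 4.0 * robustStd;              // assumed thresholdFactor = 4
    System.out.println("median=" + med + " upperThreshold=" + upper);
    System.out.println("500 anomalous? " + (500 > upper));
  }
}
```

Note the contrast with the current fixed MAX_PM of 2000: a reading of 500 is far outside this channel’s normal range and would be flagged by the dynamic threshold, yet it passes the fixed one untouched.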

Gradual Approach, Parameterization, and Maintenance

To avoid introducing an overly complex and hard-to-maintain system all at once, it is advisable to proceed in phases:

  • Phase 1: Integrate median filters, dynamic thresholds based on robust statistics, and a detailed logging system. This will ensure greater stability compared to a simple fixed threshold and will help in better understanding system behavior.

  • Phase 2: Introduce a simple forecasting model (e.g., ARIMA), compare the results with the baseline system, and evaluate the benefits before moving on to the next phase.

  • Phase 3: Integrate supervised Machine Learning models using labeled historical data. These models can provide more sophisticated confidence evaluations but require a training, validation, and maintenance phase. It is advisable to include periodic retuning mechanisms to keep the model updated to new environmental or usage conditions.

Parameters, Tuning, and Explainability

All parameters (filtering windows, threshold factors, maximum penalties, channel weights, statistical and ML model parameters) must be clearly documented:

  • Define a Tuning Protocol:
    Use historical data and, where available, labeled datasets to optimize parameters. Measure accuracy metrics (false positive rate, false negative rate, correlation with reference data) and adjust the parameters accordingly.

  • Fallback and Redundancies:
    Consider scenarios in which external data or ML models are unavailable and implement fallback strategies, reverting to simpler yet reliable methods.

  • Detailed Logging:
    Each phase of the confidence calculation should be logged to enable post-hoc analysis. This makes it possible to explain, after the fact, why the confidence assumed a certain value, ensuring the system is transparent and maintainable.

Robustness to Missing Data and Outliers

  • Data Sanitization:
    Before computing the confidence, clean the data of implausible values (beyond physically acceptable ranges). Sensors that are partially non-functional or channels with poor data quality should proportionally reduce the confidence.

  • Capping and Extreme Value Handling:
    Prevent extreme outliers from compromising the statistics by applying capping to values or treating them as missing if they exceed physically improbable thresholds.
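The sanitization rule above can be sketched in Java. Implausible readings are mapped to NaN so downstream statistics can skip them; the physical cap of 1000 used here is a placeholder assumption, not a PurpleAir constant:

```java
public class Sanitizer {
  // Assumed physical upper bound for a plausible PM2.5 reading (placeholder)
  public static final double MAX_PHYSICAL_PM = 1000.0;

  // Returns NaN for implausible values so they are excluded downstream
  public static double sanitize(double value) {
    if (Double.isNaN(value) || value < 0 || value > MAX_PHYSICAL_PM) {
      return Double.NaN;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(sanitize(-3.0));   // negative -> NaN
    System.out.println(sanitize(12.5));   // plausible -> kept as-is
    System.out.println(sanitize(5000.0)); // beyond the cap -> NaN
  }
}
```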

Conclusions

By adopting this more sophisticated, incremental, and documented approach, the confidence calculation system becomes:

  • More Adaptive: Capable of responding to new environmental conditions and continuously evolving data.
  • More Robust: Less sensitive to anomalous peaks and extreme values, thanks to filtering and robust statistics.
  • More Transparent: Through detailed logging, documented parameters, and the ability to explain decisions.
  • Modular and Maintainable: By gradually integrating components and models, it’s possible to evaluate the effectiveness of each step and avoid unnecessary complexity.

Pseudocode

// -----------------------------------------------------
// Main function for computing confidence
// -----------------------------------------------------
function compute_confidence(
    currentA, currentB,
    historyA, historyB,
    externalData,
    modelPredictor,     // can be ARIMA, ML, or null if not available
    kalmanA, kalmanB,   // Kalman filters or null if not used
    parameters,
    logger              // an object for detailed logging
):

    // STEP 0: Pre-validation and Data Sanitization
    currentA = sanitize_input(currentA, parameters.maxPhysicalPM)
    currentB = sanitize_input(currentB, parameters.maxPhysicalPM)
    historyA = sanitize_history(historyA, parameters.maxPhysicalPM)
    historyB = sanitize_history(historyB, parameters.maxPhysicalPM)

    channelAvailabilityFactor = compute_channel_availability_factor(historyA, historyB)
    logger.log("ChannelAvailabilityFactor", channelAvailabilityFactor)

    // STEP 1: Preprocessing & Filtering
    // Apply median filters to reduce noise and stabilize data
    filteredA = median_filter(append(historyA, currentA), window=parameters.medianWindow)
    filteredB = median_filter(append(historyB, currentB), window=parameters.medianWindow)

    if kalmanA is not null:
        smoothedA = kalmanA.apply(filteredA)
    else:
        smoothedA = filteredA

    if kalmanB is not null:
        smoothedB = kalmanB.apply(filteredB)
    else:
        smoothedB = filteredB

    currentFilteredA = last(smoothedA)
    currentFilteredB = last(smoothedB)

    logger.log("CurrentFilteredA", currentFilteredA)
    logger.log("CurrentFilteredB", currentFilteredB)

    // STEP 2: Compute Historical (Robust) Statistics
    // For example, use median and MAD or mean and robust std
    meanA, stdA = robust_stats(historyA)
    meanB, stdB = robust_stats(historyB)

    // STEP 3: Dynamic Thresholds
    // Define normality ranges based on statistics
    rangeA_min = meanA - parameters.thresholdFactor * stdA
    rangeA_max = meanA + parameters.thresholdFactor * stdA
    rangeB_min = meanB - parameters.thresholdFactor * stdB
    rangeB_max = meanB + parameters.thresholdFactor * stdB

    logger.log("RangeA", [rangeA_min, rangeA_max])
    logger.log("RangeB", [rangeB_min, rangeB_max])

    // STEP 4: Compare with model-predicted values (if available)
    if modelPredictor is not null:
        predictedA = modelPredictor.predict(historyA, externalData)
        predictedB = modelPredictor.predict(historyB, externalData)
    else:
        // Fallback: if no model is available, use the historical mean
        predictedA = meanA
        predictedB = meanB

    deviationFromModelA = abs(currentFilteredA - predictedA)
    deviationFromModelB = abs(currentFilteredB - predictedB)

    logger.log("DeviationFromModelA", deviationFromModelA)
    logger.log("DeviationFromModelB", deviationFromModelB)

    // STEP 5: Anomaly Analysis
    anomalyA = (currentFilteredA < rangeA_min) or (currentFilteredA > rangeA_max)
    anomalyB = (currentFilteredB < rangeB_min) or (currentFilteredB > rangeB_max)

    avgAB = (currentFilteredA + currentFilteredB) / 2.0 if (currentFilteredA + currentFilteredB) > 0 else 1
    diffAB = abs(currentFilteredA - currentFilteredB)
    relDiff = diffAB / avgAB

    logger.log("AnomalyA", anomalyA)
    logger.log("AnomalyB", anomalyB)
    logger.log("RelativeDifferenceAB", relDiff)

    // STEP 6: Channel Weighting Based on Historical Reliability
    reliabilityA = channel_reliability(historyA, parameters)
    reliabilityB = channel_reliability(historyB, parameters)

    sumReliability = max(reliabilityA + reliabilityB, parameters.smallNumber)
    weightA = reliabilityA / sumReliability
    weightB = reliabilityB / sumReliability

    logger.log("ReliabilityA", reliabilityA)
    logger.log("ReliabilityB", reliabilityB)
    logger.log("Weights", [weightA, weightB])

    // STEP 7: Compute a Base Confidence
    baseConfidence = 100.0

    // Penalties for anomalies
    if anomalyA:
        baseConfidence -= parameters.anomalyPenalty
    if anomalyB:
        baseConfidence -= parameters.anomalyPenalty

    // Penalty for differences between channels
    diffPenalty = sigmoid(relDiff, parameters.sigmoidMidpoint, parameters.sigmoidScale)
    baseConfidence -= diffPenalty * parameters.maxDiffPenalty

    // Penalty for deviation from the model (limited by maxModelPenalty)
    modelDeviationScoreA = deviationFromModelA / (stdA + parameters.smallNumber)
    modelDeviationScoreB = deviationFromModelB / (stdB + parameters.smallNumber)

    baseConfidence -= min(modelDeviationScoreA, parameters.maxModelPenalty)
    baseConfidence -= min(modelDeviationScoreB, parameters.maxModelPenalty)

    logger.log("BaseConfidenceBeforeContext", baseConfidence)

    // STEP 8: Context Integration
    contextAdjustment = 1.0
    if externalData is not null:
        contextAdjustment = compute_context_factor(externalData, parameters)

    baseConfidence = baseConfidence * contextAdjustment * channelAvailabilityFactor
    logger.log("BaseConfidenceBeforeWeights", baseConfidence)

    // STEP 9: Use Weights for the Final Confidence
    confidence = baseConfidence * (0.5 * (weightA + weightB))

    if confidence < 0:
        confidence = 0
    if confidence > 100:
        confidence = 100

    logger.log("FinalConfidence", confidence)

    // Return confidence and explanations (logs)
    return {
        "confidence": confidence,
        "explanations": logger.get_logs()
    }


// -----------------------------------------------------
// Utility Functions
// -----------------------------------------------------

function median_filter(data, window):
    // Apply a median filter on the specified window
    // Return the filtered series

function robust_stats(data):
    // Compute robust metrics (e.g., median and MAD)
    // return (mean/median, robust std)

function channel_reliability(history, parameters):
    // Evaluate the frequency of historical anomalies or channel stability
    // Return a value between 0 and 1

function sigmoid(x, midpoint, scale):
    // Sigmoidal function
    // return 1/(1+e^(-(x-midpoint)/scale))

function compute_context_factor(externalData, parameters):
    // Adjust confidence based on context (e.g., weather conditions, time of day)
    // return a factor (e.g., between 0.5 and 1.5)

function sanitize_input(value, maxPhysicalPM):
    if value is null or value < 0 or value > maxPhysicalPM*10:
        return NaN
    else:
        return value

function sanitize_history(history, maxPhysicalPM):
    cleaned = []
    for val in history:
        if val < 0 or val > maxPhysicalPM*10:
            cleaned.append(NaN)
        else:
            cleaned.append(val)
    return cleaned

function compute_channel_availability_factor(historyA, historyB):
    availA = count_non_nan(historyA) / length(historyA)
    availB = count_non_nan(historyB) / length(historyB)
    avgAvail = (availA + availB) / 2.0
    return max(0.5, avgAvail)

function count_non_nan(data):
    count = 0
    for val in data:
        if val is not NaN:
            count += 1
    return count

// logger.get_logs() will return information to understand why the confidence
// took on a certain value, facilitating transparency and maintenance.