How the Confidence Score is Calculated

The confidence score has remained unchanged since its initial implementation in 2017. It is overdue for an update and is something we would like to improve.

The confidence score is a percentage from 0 to 100, indicating whether a sensor’s laser counters appear to be reading properly and in agreement with each other. It’s only calculated for sensors with two channels of PM data. Otherwise, it’s set to a default value of 30.

Confidence for sensors can be viewed using the sensor confidence data layer on the PurpleAir Map and can be obtained from our API. It’s not available for local data collection, like SD data or accessing sensor JSON, but can be calculated using the information below.

An overview of confidence and how it affects markers on the PurpleAir Map is found in the following article: The Confidence Score.

How the Confidence Score Works

At a high level, the confidence score is calculated using PM data from both channels. This score is then lowered based on downgrades. Confidence considers both automatically calculated downgrades and manual downgrades applied by PurpleAir staff.

The steps for calculating confidence are as follows:

  1. Channel sums are calculated for channel A and channel B using pseudo averaged data.
  2. Automatic downgrades are set for channels A and B. A channel is downgraded if either:
    • Its real-time PM2.5 reading is over 2000.
    • Its channel sum is more than 10 times the other’s (only the higher-reading channel is downgraded).
  3. The initial confidence score is calculated using only the channel sums.
    • Initial confidence is lowered if one or both channels report PM2.5 above 2000.
  4. Both confidence_auto and confidence_manual are calculated by lowering the initial confidence using automatic and manual downgrades, respectively:
    • initial_confidence * (100 - (40 * number_of_downgraded_channels)) / 100
  5. channel_flags reflecting the downgrades are set.
  6. The final confidence value is set:
    • confidence = (confidence_auto + confidence_manual) / 2
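As a worked example, steps 3, 4, and 6 can be sketched as a standalone calculation. The channel sums (1000 and 1200) and downgrade counts below are illustrative inputs, not values from a real sensor:

```java
public class ConfidenceSketch {
  // Step 3: initial confidence from the two channel sums
  public static double initialConfidence(double sumA, double sumB) {
    double diff = Math.abs(sumA - sumB);
    double avg = (sumA + sumB) / 2;
    double pc = Math.round(diff / avg * 100 / 1.6) - 25;
    if (pc < 0) {
      pc = 0;
    }
    return Math.max(0, 100 - pc);
  }

  // Step 4: lower confidence by 40 points per downgraded channel
  public static int adjusted(int initial, int downgradedChannels) {
    return initial * (100 - (40 * downgradedChannels)) / 100;
  }

  public static void main(String[] args) {
    int initial = (int) initialConfidence(1000, 1200); // sums agree well -> 100
    int auto = adjusted(initial, 1);                   // one automatic downgrade -> 60
    int manual = adjusted(initial, 0);                 // no manual downgrades -> 100
    int confidence = (auto + manual) / 2;              // step 6 -> 80
    System.out.println(confidence);
  }
}
```

Note that the sums 1000 and 1200 differ by about 18% relative to their average, which is still within the ~40% tolerance the formula allows before any points are deducted.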

Implementation

The following code calculates the confidence score, with some minor adjustments for readability.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ConfidenceCalculator {
  public static final int MAX_PM = 2000;

  public static double getZeroIfNodeIsNull(final JsonNode node) {
    if (node == null) {
      return 0;
    }
    return node.asDouble();
  }

  /*
  *  This is where we start the confidence calculation
  *  
  *  objA -- stats_a -- real-time pseudo averaged data for channel A
  *  objB -- stats_b -- real-time pseudo averaged data for channel B
  */
  public static void calculateFlags(final Object e, int channelState, int channelFlagsManual, ObjectNode objA, ObjectNode objB){

    ObjectNode performance = calculatePerformance(objA, objB, channelState, channelFlagsManual);

    Integer confidenceAuto = null;
    Integer confidenceManual = null;
    // channel_flags_auto defaults to 3, i.e. both channels flagged bad.
    int channelFlagsAuto = 3;
    if (performance.get("channel_flags_auto") != null) {
      channelFlagsAuto = performance.get("channel_flags_auto").asInt();
    }
    if (performance.get("confidence_manual") != null) {
      confidenceManual = performance.get("confidence_manual").asInt();
    }
    if (performance.get("confidence_auto") != null) {
      confidenceAuto = performance.get("confidence_auto").asInt();
    }

    int channelFlags = channelFlagsManual;
    if (channelFlagsManual == 0) {
      channelFlags = channelFlagsAuto;
    }

    Integer confidence;
    if (confidenceManual == null && confidenceAuto == null) {
      confidence = 30;
    } else if (confidenceManual == null) {
      confidence = confidenceAuto;
    } else if (confidenceAuto == null) {
      confidence = confidenceManual;
    } else {
      confidence = (confidenceManual + confidenceAuto) / 2;
    }
  }

  // Calculates confidence_manual and confidence_auto
  public static ObjectNode calculatePerformance(final ObjectNode objA, 
      final ObjectNode objB, final int channelState, final int channelFlagsManual) {

    // Initialize our perf object holding confidence values
    ObjectMapper objectMapper = new ObjectMapper();
    ObjectNode perf = objectMapper.createObjectNode();
    // Return an object with no confidence values if there's no PM data
    if (objA == null && objB == null) {
      return perf;
    }

    float pmA = 0;
    if (objA != null) {
      JsonNode jsA = objA.get("pm2.5_atm"); // pm2.5_atm_a
      if (jsA != null) {
        pmA = jsA.floatValue();
      }
    }
    float pmB = 0;
    if (objB != null) {
      JsonNode jsB = objB.get("pm2.5_atm"); // pm2.5_atm_b
      if (jsB != null) {
        pmB = jsB.floatValue();
      }
    }

    double totala = pmA;
    double vA = 0;
    if (objA != null) {
      vA = getZeroIfNodeIsNull(objA.get("pm2.5")); // pm2.5_a
      totala += vA;
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_10minute")); // pm2.5_10minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_30minute")); // pm2.5_30minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_60minute")); // pm2.5_60minute_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_6hour")); // pm2.5_6hour_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_24hour")); // pm2.5_24hour_a
      totala += getZeroIfNodeIsNull(objA.get("pm2.5_1week")); // pm2.5_1week_a
    }

    double totalb = pmB;
    double vB = 0;
    if (objB != null) {
      vB = getZeroIfNodeIsNull(objB.get("pm2.5")); // pm2.5_b
      totalb += vB;
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_10minute")); // pm2.5_10minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_30minute")); // pm2.5_30minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_60minute")); // pm2.5_60minute_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_6hour")); // pm2.5_6hour_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_24hour")); // pm2.5_24hour_b
      totalb += getZeroIfNodeIsNull(objB.get("pm2.5_1week")); // pm2.5_1week_b
    }

    // Calculate automatic downgrades
    int channelFlagsAuto = 0;
    if (vA > MAX_PM || (objB != null && totala > (totalb * 10))) {
      channelFlagsAuto = channelFlagsAuto + 1;
    }
    if (vB > MAX_PM || (objA != null && totalb > (totala * 10))) {
      channelFlagsAuto = channelFlagsAuto + 2;
    }
    perf.put("channel_flags_auto", channelFlagsAuto);

    if (objA != null && objB != null) {

      // calculate initial confidence
      int confidence;
      confidence = (int) getConfidence(totala, totalb);

      // Lower confidence for abnormally high readings
      if (pmA > MAX_PM && pmB > MAX_PM) {
        confidence = confidence - 40;
      } else if (pmA > MAX_PM || pmB > MAX_PM) {
        confidence = confidence - 60;
      }

      // Calculate confidence_auto by adjusting for automatic downgrades
      perf.put("confidence_auto", getConfidenceAdjusted(confidence, channelState, channelFlagsAuto));
      // Calculate confidence_manual by adjusting for manual downgrades
      perf.put("confidence_manual", getConfidenceAdjusted(confidence, channelState, channelFlagsManual));
    }

    return perf;
  }  

  // Gets the initial confidence between two channel sums
  private static double getConfidence(final double a, final double b) {
    double diff = Math.abs(a - b);
    double avg = (a + b) / 2;

    double pc = Math.round(diff / avg * 100 / 1.6);
    pc = pc - 25;
    if (pc < 0) {
      pc = 0;
    }

    double npcx;
    npcx = 100 - pc;
    if (npcx < 0) {
      npcx = 0;
    }

    return npcx;
  }

  // Adjusts confidence based on downgrades (channelFlags)
  private static int getConfidenceAdjusted(final int conf, final int state, final int channelFlags) {
    int c = conf;
    if (c < 0) {
      c = 0;
    }

    if (state == 3 && channelFlags > 0) {
      int correction = 1;
      if (channelFlags > 2) {
        correction = 2; 
      }
      return c * (100 - (40 * correction)) / 100;
    }
    return c;
  }
}


I hope you find these suggestions valuable, even if they are potentially too complex or expensive to implement :slight_smile:

Also, I just wanted to say that I do not hold any technical credit for this work. It’s the polished result of three iterations of ChatGPT o1’s output :robot:

Proposed Improvements to the Confidence Calculation System

The current confidence calculation system for sensors uses static logic, fixed thresholds, and constant penalties. This approach is limited because it does not adapt to environmental dynamics, does not effectively distinguish between normal fluctuations and significant anomalies, and does not leverage historical data or advanced statistical or modeling techniques.

Below we present a set of technical proposals to improve the confidence calculation, integrated with a modular, gradual, and more thoroughly documented approach. The goal is to make the system more robust, adaptive, interpretable, and capable of correctly reflecting real data variability, while also identifying potentially defective sensors.

Possible Technical Improvements

  1. Dynamic Thresholds Generated by Robust Statistics:
    Instead of using a fixed MAX_PM, it is possible to define normality thresholds based on robust statistical metrics, such as the median and MAD (Median Absolute Deviation) or robust mean and standard deviation. These thresholds can be periodically updated using historical data, resulting in less arbitrary, more adaptive criteria that evolve with changing environmental conditions.

  2. Temporal Data Filtering and Modeling:
    Before computing the confidence, median filters can be applied to reduce noise and stabilize the data. In subsequent phases, Kalman filters, ARIMA models, or other predictive methods can be introduced to estimate expected values, compare them with observed values, and identify abnormal deviations.
    This approach makes it easier to distinguish between normal fluctuations and unusual patterns, ensuring greater reliability.

  3. Historical Channel Weighting and Gradual Penalties:
    Not all channels are equally stable over time. Their historical reliability can be assessed and used to assign different weights: a more stable channel will have a greater impact on the final confidence. Furthermore, instead of rigid penalties (linear score cuts), sigmoidal functions or penalties proportional to anomaly severity can be introduced, making confidence reduction smoother.

  4. Advanced Correlation and Stability Metrics:
    Analyzing the temporal correlation between channels and measuring signal entropy can help detect suspicious situations. If two channels that should be consistent over time display diverging patterns, this may indicate a problem with one of the sensors.

  5. Contextualization with External Data:
    Integrating environmental data (humidity, temperature, weather conditions), contextual knowledge (time of day, seasonality), or other external variables makes it possible to understand whether a PM2.5 spike is plausible or not. A sensor that records sudden anomalies under inconsistent environmental conditions will see its confidence reduced. The use of multivariate regression or ML models allows for estimating context-aware expected values.

  6. Machine Learning and Bayesian Updating:
    ML models (e.g., Random Forest, Gradient Boosting) can be trained on labeled historical data to learn how to discriminate between reliable and anomalous readings. Bayesian approaches also allow for dynamically updating confidence by continuously integrating new observations and prior knowledge.
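Item 1 above (dynamic thresholds from robust statistics) can be sketched with a median/MAD calculation. The 1.4826 consistency constant (which scales MAD to a standard deviation under normality) is standard, but the sample history and the threshold factor of 4 are illustrative assumptions:

```java
import java.util.Arrays;

public class RobustThresholds {
  // Median of a data series (copy is sorted so the input is untouched)
  public static double median(double[] data) {
    double[] s = data.clone();
    Arrays.sort(s);
    int n = s.length;
    return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
  }

  // Median Absolute Deviation around a given median
  public static double mad(double[] data, double med) {
    double[] dev = new double[data.length];
    for (int i = 0; i < data.length; i++) {
      dev[i] = Math.abs(data[i] - med);
    }
    return median(dev);
  }

  public static void main(String[] args) {
    double[] history = {10, 12, 11, 13, 500, 12, 11}; // 500 is an outlier
    double med = median(history);                      // 12.0
    double robustStd = 1.4826 * mad(history, med);     // MAD scaled to a std estimate
    double upper = med + 4.0 * robustStd;              // assumed thresholdFactor = 4
    System.out.println("median=" + med + " upperThreshold=" + upper);
    System.out.println("500 anomalous? " + (500 > upper));
  }
}
```

Note the contrast with the current fixed MAX_PM of 2000: a reading of 500 is far outside this channel’s normal range and would be flagged by the dynamic threshold, yet it passes the fixed one untouched.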

Gradual Approach, Parameterization, and Maintenance

To avoid introducing an overly complex and hard-to-maintain system all at once, it is advisable to proceed in phases:

  • Phase 1: Integrate median filters, dynamic thresholds based on robust statistics, and a detailed logging system. This will ensure greater stability compared to a simple fixed threshold and will help in better understanding system behavior.

  • Phase 2: Introduce a simple forecasting model (e.g., ARIMA), compare the results with the baseline system, and evaluate the benefits before moving on to the next phase.

  • Phase 3: Integrate supervised Machine Learning models using labeled historical data. These models can provide more sophisticated confidence evaluations but require a training, validation, and maintenance phase. It is advisable to include periodic retuning mechanisms to keep the model updated to new environmental or usage conditions.

Parameters, Tuning, and Explainability

All parameters (filtering windows, threshold factors, maximum penalties, channel weights, statistical and ML model parameters) must be clearly documented:

  • Define a Tuning Protocol:
    Use historical data and, where available, labeled datasets to optimize parameters. Measure accuracy metrics (false positive rate, false negative rate, correlation with reference data) and adjust the parameters accordingly.

  • Fallback and Redundancies:
    Consider scenarios in which external data or ML models are unavailable and implement fallback strategies, reverting to simpler yet reliable methods.

  • Detailed Logging:
    Each phase of the confidence calculation should be logged to enable post-hoc analysis. This makes it possible to explain, after the fact, why the confidence assumed a certain value, ensuring the system is transparent and maintainable.

Robustness to Missing Data and Outliers

  • Data Sanitization:
    Before computing the confidence, clean the data of implausible values (beyond physically acceptable ranges). Sensors that are partially non-functional or channels with poor data quality should proportionally reduce the confidence.

  • Capping and Extreme Value Handling:
    Prevent extreme outliers from compromising the statistics by applying capping to values or treating them as missing if they exceed physically improbable thresholds.
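The sanitization rule above can be sketched in Java. Implausible readings are mapped to NaN so downstream statistics can skip them; the physical cap of 1000 used here is a placeholder assumption, not a PurpleAir constant:

```java
public class Sanitizer {
  // Assumed physical upper bound for a plausible PM2.5 reading (placeholder)
  public static final double MAX_PHYSICAL_PM = 1000.0;

  // Returns NaN for implausible values so they are excluded downstream
  public static double sanitize(double value) {
    if (Double.isNaN(value) || value < 0 || value > MAX_PHYSICAL_PM) {
      return Double.NaN;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(sanitize(-3.0));   // negative -> NaN
    System.out.println(sanitize(12.5));   // plausible -> kept as-is
    System.out.println(sanitize(5000.0)); // beyond the cap -> NaN
  }
}
```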

Conclusions

By adopting this more sophisticated, incremental, and documented approach, the confidence calculation system becomes:

  • More Adaptive: Capable of responding to new environmental conditions and continuously evolving data.
  • More Robust: Less sensitive to anomalous peaks and extreme values, thanks to filtering and robust statistics.
  • More Transparent: Through detailed logging, documented parameters, and the ability to explain decisions.
  • Modular and Maintainable: By gradually integrating components and models, it’s possible to evaluate the effectiveness of each step and avoid unnecessary complexity.

Pseudocode

// -----------------------------------------------------
// Main function for computing confidence
// -----------------------------------------------------
function compute_confidence(
    currentA, currentB,
    historyA, historyB,
    externalData,
    modelPredictor,     // can be ARIMA, ML, or null if not available
    kalmanA, kalmanB,   // Kalman filters or null if not used
    parameters,
    logger              // an object for detailed logging
):

    // STEP 0: Pre-validation and Data Sanitization
    currentA = sanitize_input(currentA, parameters.maxPhysicalPM)
    currentB = sanitize_input(currentB, parameters.maxPhysicalPM)
    historyA = sanitize_history(historyA, parameters.maxPhysicalPM)
    historyB = sanitize_history(historyB, parameters.maxPhysicalPM)

    channelAvailabilityFactor = compute_channel_availability_factor(historyA, historyB)
    logger.log("ChannelAvailabilityFactor", channelAvailabilityFactor)

    // STEP 1: Preprocessing & Filtering
    // Apply median filters to reduce noise and stabilize data
    filteredA = median_filter(append(historyA, currentA), window=parameters.medianWindow)
    filteredB = median_filter(append(historyB, currentB), window=parameters.medianWindow)

    if kalmanA is not null:
        smoothedA = kalmanA.apply(filteredA)
    else:
        smoothedA = filteredA

    if kalmanB is not null:
        smoothedB = kalmanB.apply(filteredB)
    else:
        smoothedB = filteredB

    currentFilteredA = last(smoothedA)
    currentFilteredB = last(smoothedB)

    logger.log("CurrentFilteredA", currentFilteredA)
    logger.log("CurrentFilteredB", currentFilteredB)

    // STEP 2: Compute Historical (Robust) Statistics
    // For example, use median and MAD or mean and robust std
    meanA, stdA = robust_stats(historyA)
    meanB, stdB = robust_stats(historyB)

    // STEP 3: Dynamic Thresholds
    // Define normality ranges based on statistics
    rangeA_min = meanA - parameters.thresholdFactor * stdA
    rangeA_max = meanA + parameters.thresholdFactor * stdA
    rangeB_min = meanB - parameters.thresholdFactor * stdB
    rangeB_max = meanB + parameters.thresholdFactor * stdB

    logger.log("RangeA", [rangeA_min, rangeA_max])
    logger.log("RangeB", [rangeB_min, rangeB_max])

    // STEP 4: Compare with model-predicted values (if available)
    if modelPredictor is not null:
        predictedA = modelPredictor.predict(historyA, externalData)
        predictedB = modelPredictor.predict(historyB, externalData)
    else:
        // Fallback: if no model is available, use the historical mean
        predictedA = meanA
        predictedB = meanB

    deviationFromModelA = abs(currentFilteredA - predictedA)
    deviationFromModelB = abs(currentFilteredB - predictedB)

    logger.log("DeviationFromModelA", deviationFromModelA)
    logger.log("DeviationFromModelB", deviationFromModelB)

    // STEP 5: Anomaly Analysis
    anomalyA = (currentFilteredA < rangeA_min) or (currentFilteredA > rangeA_max)
    anomalyB = (currentFilteredB < rangeB_min) or (currentFilteredB > rangeB_max)

    avgAB = (currentFilteredA + currentFilteredB) / 2.0 if (currentFilteredA + currentFilteredB) > 0 else 1
    diffAB = abs(currentFilteredA - currentFilteredB)
    relDiff = diffAB / avgAB

    logger.log("AnomalyA", anomalyA)
    logger.log("AnomalyB", anomalyB)
    logger.log("RelativeDifferenceAB", relDiff)

    // STEP 6: Channel Weighting Based on Historical Reliability
    reliabilityA = channel_reliability(historyA, parameters)
    reliabilityB = channel_reliability(historyB, parameters)

    sumReliability = max(reliabilityA + reliabilityB, parameters.smallNumber)
    weightA = reliabilityA / sumReliability
    weightB = reliabilityB / sumReliability

    logger.log("ReliabilityA", reliabilityA)
    logger.log("ReliabilityB", reliabilityB)
    logger.log("Weights", [weightA, weightB])

    // STEP 7: Compute a Base Confidence
    baseConfidence = 100.0

    // Penalties for anomalies
    if anomalyA:
        baseConfidence -= parameters.anomalyPenalty
    if anomalyB:
        baseConfidence -= parameters.anomalyPenalty

    // Penalty for differences between channels
    diffPenalty = sigmoid(relDiff, parameters.sigmoidMidpoint, parameters.sigmoidScale)
    baseConfidence -= diffPenalty * parameters.maxDiffPenalty

    // Penalty for deviation from the model (limited by maxModelPenalty)
    modelDeviationScoreA = deviationFromModelA / (stdA + parameters.smallNumber)
    modelDeviationScoreB = deviationFromModelB / (stdB + parameters.smallNumber)

    baseConfidence -= min(modelDeviationScoreA, parameters.maxModelPenalty)
    baseConfidence -= min(modelDeviationScoreB, parameters.maxModelPenalty)

    logger.log("BaseConfidenceBeforeContext", baseConfidence)

    // STEP 8: Context Integration
    contextAdjustment = 1.0
    if externalData is not null:
        contextAdjustment = compute_context_factor(externalData, parameters)

    baseConfidence = baseConfidence * contextAdjustment * channelAvailabilityFactor
    logger.log("BaseConfidenceBeforeWeights", baseConfidence)

    // STEP 9: Use Weights for the Final Confidence
    confidence = baseConfidence * (0.5 * (weightA + weightB))

    if confidence < 0:
        confidence = 0
    if confidence > 100:
        confidence = 100

    logger.log("FinalConfidence", confidence)

    // Return confidence and explanations (logs)
    return {
        "confidence": confidence,
        "explanations": logger.get_logs()
    }


// -----------------------------------------------------
// Utility Functions
// -----------------------------------------------------

function median_filter(data, window):
    // Apply a median filter on the specified window
    // Return the filtered series

function robust_stats(data):
    // Compute robust metrics (e.g., median and MAD)
    // return (mean/median, robust std)

function channel_reliability(history, parameters):
    // Evaluate the frequency of historical anomalies or channel stability
    // Return a value between 0 and 1

function sigmoid(x, midpoint, scale):
    // Sigmoidal function
    // return 1/(1+e^(-(x-midpoint)/scale))

function compute_context_factor(externalData, parameters):
    // Adjust confidence based on context (e.g., weather conditions, time of day)
    // return a factor (e.g., between 0.5 and 1.5)

function sanitize_input(value, maxPhysicalPM):
    if value is null or value < 0 or value > maxPhysicalPM*10:
        return NaN
    else:
        return value

function sanitize_history(history, maxPhysicalPM):
    cleaned = []
    for val in history:
        if val < 0 or val > maxPhysicalPM*10:
            cleaned.append(NaN)
        else:
            cleaned.append(val)
    return cleaned

function compute_channel_availability_factor(historyA, historyB):
    availA = count_non_nan(historyA) / length(historyA)
    availB = count_non_nan(historyB) / length(historyB)
    avgAvail = (availA + availB) / 2.0
    return max(0.5, avgAvail)

function count_non_nan(data):
    count = 0
    for val in data:
        if val is not NaN:
            count += 1
    return count

// logger.get_logs() will return information to understand why the confidence
// took on a certain value, facilitating transparency and maintenance.