Identifying Faulty Sensors

I am using historical PurpleAir data to explore air quality trends over time in a particular city. I downloaded the PM2.5_atm field with average=60 and have corrected the data, but several sensors have data that look something like this:

800 ug/m^3 seems way too high, so I’m inclined not to trust this data. Without really knowing exactly what sources may be contributing, what would be the highest PurpleAir reading that could still be a real reading and not a faulty sensor?


While we don’t have a definitive upper bound for real readings, 800 ug/m^3 is typically an unrealistic reading, especially given how sharply the data spikes. Sensor data will typically spike like this due to hyper-local events. Smoking and cooking are two common hyper-local events.

High readings can also be caused by debris or insects inside the sensor. These will be seen as outliers within an area, and their readings can sometimes be in the thousands of ug/m^3. If this is the case, the higher reading will typically be constant. However, sometimes insects can cause readings to spike like your example.
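As a rough illustration of telling a brief spike apart from a constantly high reading, here is a minimal Python sketch. The function name `find_spikes` and the `window`, `ratio`, and `min_gap` values are assumptions chosen for this example, not PurpleAir recommendations, and the input is assumed to be a plain list of hourly PM2.5 values in ug/m^3.

```python
from statistics import median


def find_spikes(series, window=3, ratio=5.0, min_gap=50.0):
    """Return indices of readings that tower over their local median.

    series  : list of hourly PM2.5 values in ug/m^3
    window  : hours to look at on each side (illustrative value)
    ratio   : how many times the local median counts as a spike (illustrative)
    min_gap : minimum absolute jump in ug/m^3 (illustrative)
    """
    spikes = []
    for i, value in enumerate(series):
        lo = max(0, i - window)
        hi = min(len(series), i + window + 1)
        # Surrounding hours, excluding the reading being tested
        neighbors = series[lo:i] + series[i + 1:hi]
        if not neighbors:
            continue
        base = median(neighbors)
        if value - base > min_gap and value > ratio * base:
            spikes.append(i)
    return spikes
```

A reading that is high for hours on end produces no spikes here, because it matches its own local median. That case is better caught by comparing against other sensors in the area.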

Here are some example readings from our map, as seen on July 26, 2023. All measurements were seen using the Raw PM2.5 data layer, no conversion factor, and a 1-hour averaging period. The measurements are in ug/m^3:

  • There are currently wildfires in Northern Canada. Some sensors read all the way up to the mid-400s.
  • There are also currently fires in Oregon, USA. If you look at some of the sensors, they read as high as 900 momentarily.
  • In Michigan, USA, you can see many sensors reading at high levels in the 40s and 50s.

Hopefully this provides some insights for ensuring data quality in your dataset.


Thank you so much, this was very helpful!

Are both sensors in the PA unit providing the same pattern of readings? If not, it’s a fair bet that the one with the high readings is faulty. Sometimes it can be cleaned. If it’s still a problem, PA now provides replacements.

If both laser counters in the unit are providing similar readings, then it’s most likely genuine, but could be a very local source.
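If you have the readings from both laser counters (channel A and channel B) for each hour, the check described above can be sketched in Python roughly like this. The names `channels_agree` and `flag_disagreement` and the tolerance values are assumptions for illustration only; pick thresholds that suit your own data.

```python
def channels_agree(pm_a, pm_b, abs_tol=5.0, rel_tol=0.6):
    """Return True if the two channel readings (ug/m^3) are close enough.

    A pair counts as disagreeing only when the difference exceeds BOTH
    the absolute and the relative tolerance (illustrative values).
    """
    diff = abs(pm_a - pm_b)
    mean = (pm_a + pm_b) / 2
    if mean == 0:
        return True  # both channels read zero
    return not (diff > abs_tol and diff / mean > rel_tol)


def flag_disagreement(rows):
    """rows: iterable of (timestamp, pm_a, pm_b) tuples.

    Returns the timestamps where the two channels disagree.
    """
    return [t for t, a, b in rows if not channels_agree(a, b)]
```

For example, a row where channel A reads 800 and channel B reads 12 gets flagged, while a row reading 50 and 48 does not.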

For one thing, compare the suspect sensor or sensors with nearby sensors and see what the “going rate” seems to be on average. Comparing the suspect readings with the official hourly AirNow EPA readings plus the nearby PurpleAir sensors should give you an idea of what the REAL air pollution index SHOULD currently be. If you don’t have a sensor yourself, use the nearest PurpleAir sensor to your location as representing your city, town, or village. If the official hourly AirNow reading is in fairly good agreement with nearby sensors, you can use THAT reading as a guide, too, on what the index likely is around your backyard. You will find out that there’s no 800 reading here in America. We don’t have the choking smoke as evidence of such a horrific reading. You would know if it was 800; you would not be able to go out, and it likely would look like a gloomy evening at noon with hardly even a red sun visible, if no clouds other than smoke were around. I think visibility would be near zero in certain spots.
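A simple way to put the neighbor comparison into code is to check a suspect reading against the median of nearby sensors for the same hour. This is only a sketch: the function name `exceeds_neighbors` and the `ratio` and `min_gap` thresholds are made up for illustration, and you would need to collect the neighbor readings yourself.

```python
from statistics import median


def exceeds_neighbors(reading, neighbor_readings, ratio=3.0, min_gap=20.0):
    """Return True if a reading is far above the local "going rate".

    reading           : suspect sensor value in ug/m^3
    neighbor_readings : same-hour values from nearby sensors in ug/m^3
    ratio, min_gap    : illustrative thresholds, not official guidance
    """
    if not neighbor_readings:
        return False  # nothing to compare against
    typical = median(neighbor_readings)
    return reading - typical > min_gap and reading > ratio * typical
```

With neighbors reading 45, 52, 48, and 50, a suspect reading of 800 is flagged and a reading of 55 is not.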