Is there a field that returns data with US EPA pm2.5 conversion formula applied?

Hello, I was trying to pull sensor data for pm2.5 using the API. I noticed that there’s an option to apply the US EPA conversion formula to the data directly in the data layer on the website. However, I did not find a corresponding API field that returns data with the conversion already applied. Is it true that we have to manually apply our own correction formula after getting raw data from the API? Any help is appreciated.

With the fields currently available via the API, you will need to manually apply the correction formula to the raw data. The correction formula from the US EPA can be found here: Sensor data cleaning and correction: Application on the AirNow Fire and Smoke Map | Science Inventory | US EPA. You’ll need to download the PDF found under “URLs/Downloads.” The full equation can be found on page 26.

While we don’t currently provide this as an available calculated field on the API, it can be calculated on and downloaded from the map. We’re open to user feedback on this.

Hi Andrew, thank you for the quick reply! I have tried to do exactly what you described but ran into small issues. After applying the equations manually, I get corrected numbers that are very similar to, but not exactly the same as, the data downloaded directly from the map with the “US EPA” conversion. I have some clarification questions that might resolve this:

  • Should I use pm2.5_atm (an average of pm2.5_atm_a and pm2.5_atm_b) as the “x” in the equations?
  • I was applying the conversion equation to the hourly aggregated data. Should I instead work with the realtime/2 minute data? (My feedback here is that the 1 million free API points seem a little tight if we need to retrieve realtime data.)
  • Is there any other step (e.g. quality control, A/B check) that PurpleAir applies for the EPA conversion besides the correction equations?

Thanks!

Happy to help, Karen! I should clarify that the CF=ATM formula should be used for outdoor sensors. If you are in need of the formula for indoor sensors (CF=1), I can provide that as well.

Should I use pm2.5_atm (an average of pm2.5_atm_a and pm2.5_atm_b) as the “x” in the equations?

Yes, this will apply the correction to the averaged values. However, you can apply the correction to the A and B fields individually if you wish. Remember that this data (pm2.5_atm) is the raw data in µg/m³. Thus, you’ll want to make sure that you’ve changed the data layer on the map to “Raw PM2.5” when comparing your data against the map data.

I was applying the conversion equation to the hourly aggregated data. Should I instead work with the realtime/2 minute data? (My feedback here is that the 1 million free API points seem a little tight if we need to retrieve realtime data.)

You can use hourly data. This should match the hourly averages shown on the map.

Is there any other step (e.g. quality control, A/B check) that PurpleAir applies for the EPA conversion besides the correction equations?

Yes, and this is a great question! All of our outdoor sensors use two particle counters, which is why the A and B channels in the data exist. Using two counters to reference values against each other serves as a measure of confidence in data quality. When a particle counter’s fan fails or debris gets caught inside and blocks the laser, that particle counter can provide erroneous data.

We have a formula that attempts to identify such particle counters and “downgrade” them, which means that they are marked as having an issue. Downgrading lowers the impact of such particle counters on the map, and the downgrade status is available in the data through the channel_flags field. In summary, a “downgraded” channel represents a particle counter that we believe has an issue and isn’t providing reliable data.

Here’s a snippet from the API documentation on the channel_flags field:

Possible values are:
Normal = No PM sensors are marked as downgraded.
A-Downgraded = PM sensor on channel A is marked as downgraded.
B-Downgraded = PM sensor on channel B is marked as downgraded.
A+B-Downgraded = PM sensors on both channels A and B are marked as downgraded.
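As a rough illustration of how these flags could be used when combining the two channels yourself, here is a minimal Python sketch. The function name `combine_channels` and the policy it applies (drop a downgraded channel, discard the reading when both are downgraded) are assumptions made for this example, not PurpleAir's documented method. It also assumes the flag arrives as the text labels quoted above; the API may encode the values differently.

```python
from typing import Optional


def combine_channels(pm_a: float, pm_b: float, channel_flags: str) -> Optional[float]:
    """Combine channel A and B readings, skipping downgraded channels.

    Hypothetical policy for illustration only; the labels are the ones
    quoted from the channel_flags documentation snippet.
    """
    if channel_flags == "Normal":
        # Both counters are trusted: use the plain average.
        return (pm_a + pm_b) / 2
    if channel_flags == "A-Downgraded":
        # Channel A is flagged, so fall back to channel B alone.
        return pm_b
    if channel_flags == "B-Downgraded":
        # Channel B is flagged, so fall back to channel A alone.
        return pm_a
    if channel_flags == "A+B-Downgraded":
        # Neither counter is trusted: report no value.
        return None
    raise ValueError(f"Unrecognized channel_flags value: {channel_flags!r}")


if __name__ == "__main__":
    print(combine_channels(10.0, 12.0, "Normal"))
    print(combine_channels(10.0, 250.0, "B-Downgraded"))
```

Whether a single remaining channel should be trusted on its own is a judgment call; some analyses may prefer to discard any reading that is not flagged Normal.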

Thanks Andrew, this is helpful. However, I’m still having an issue matching the manually adjusted numbers to the map data for the hourly PM2.5 raw readings. I think I followed all the correct steps:

  1. download hourly pm2.5_atm using the API and apply the EPA equations to it according to the different conditions;
  2. for the map data, set the data layer to raw pm2.5 and apply the US EPA conversion to hourly data;
  3. adjust for the time zone difference between the two sets of data.

This still leaves a discrepancy at the decimal level (a difference of about 0.3) between the map data and the API data. Any thoughts on why? Thanks!

Hi @Karen_Wang,

I just tested the formula and was able to match the data from the API to the data on the map. Ensure you are using the most updated correction formula:
PM2.5 = 0.52*PAcf_1 - 0.086*RH + 5.75

My apologies, I wrote the incorrect formula. I was still able to match up the data. The correct equations are found on page 26 of the PDF.

  • 0 ≤ x < 30: y = 0.524*x - 0.0862*RH + 5.75
  • 30 ≤ x < 50: y = (0.786*(x/20 - 3/2) + 0.524*(1 - (x/20 - 3/2)))*x - 0.0862*RH + 5.75
  • 50 ≤ x < 210: y = 0.786*x - 0.0862*RH + 5.75
  • 210 ≤ x < 260: y = (0.69*(x/50 - 21/5) + 0.786*(1 - (x/50 - 21/5)))*x - 0.0862*RH*(1 - (x/50 - 21/5)) + 2.966*(x/50 - 21/5) + 5.75*(1 - (x/50 - 21/5)) + 8.84*10^-4*x^2*(x/50 - 21/5)
  • 260 ≤ x: y = 2.966 + 0.69*x + 8.84*10^-4*x^2

These formulas can be found in the US-wide study:
https://cfpub.epa.gov/si/si_public_record_report.cfm?dirEntryId=353088&Lab=CEMM

Enter the humidity exactly as it appears in the data, without rescaling it. If the humidity is 24.7, then 24.7 is exactly what goes into the formula.
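For anyone applying the piecewise equations above in code, here is a minimal Python sketch. The function name `epa_correct` and the decision to reject negative inputs are assumptions made for this example; the coefficients and range boundaries are copied from the equations in this post, with x as the raw pm2.5_atm value and RH as the humidity value exactly as it appears in the data.

```python
def epa_correct(x: float, rh: float) -> float:
    """Apply the piecewise US EPA correction quoted in this thread.

    x  : raw pm2.5_atm reading in ug/m3
    rh : relative humidity as reported in the data (e.g. 24.7, not 0.247)
    """
    if x < 0:
        # Assumption for this sketch: the equations are only stated for x >= 0.
        raise ValueError("x must be non-negative")
    if x < 30:
        return 0.524 * x - 0.0862 * rh + 5.75
    if x < 50:
        # w ramps from 0 at x = 30 to 1 at x = 50, blending the two slopes.
        w = x / 20 - 3 / 2
        return (0.786 * w + 0.524 * (1 - w)) * x - 0.0862 * rh + 5.75
    if x < 210:
        return 0.786 * x - 0.0862 * rh + 5.75
    if x < 260:
        # w ramps from 0 at x = 210 to 1 at x = 260, blending into the quadratic.
        w = x / 50 - 21 / 5
        return (
            (0.69 * w + 0.786 * (1 - w)) * x
            - 0.0862 * rh * (1 - w)
            + 2.966 * w
            + 5.75 * (1 - w)
            + 8.84e-4 * x**2 * w
        )
    return 2.966 + 0.69 * x + 8.84e-4 * x**2


if __name__ == "__main__":
    for raw in (10, 40, 100, 230, 300):
        print(raw, round(epa_correct(raw, 24.7), 3))
```

The two blended ranges are written with an explicit weight `w` so that each branch meets its neighbours at the boundaries, which is a quick way to check that the coefficients were typed correctly.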

Thank you both! I finally got it to work. It turns out the order of the calculation matters. The small difference was a result of averaging the A and B channels first and then applying the formula. Applying the correction formula to each channel separately and taking the average as the last step solves the problem.
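To see why the order can matter, here is a small Python sketch comparing the two approaches. The function names are made up for this example, and the readings used are illustrative values, not real sensor data. One likely explanation is that the correction is piecewise and not linear across its ranges: when both channels fall inside a single linear range the two orders agree, but when the channels sit in a blended range or straddle a boundary, they do not.

```python
def epa_correct(x: float, rh: float) -> float:
    """Piecewise US EPA correction, using the equations quoted in this thread."""
    if x < 30:
        return 0.524 * x - 0.0862 * rh + 5.75
    if x < 50:
        w = x / 20 - 3 / 2
        return (0.786 * w + 0.524 * (1 - w)) * x - 0.0862 * rh + 5.75
    if x < 210:
        return 0.786 * x - 0.0862 * rh + 5.75
    if x < 260:
        w = x / 50 - 21 / 5
        return ((0.69 * w + 0.786 * (1 - w)) * x - 0.0862 * rh * (1 - w)
                + 2.966 * w + 5.75 * (1 - w) + 8.84e-4 * x**2 * w)
    return 2.966 + 0.69 * x + 8.84e-4 * x**2


def average_then_correct(a: float, b: float, rh: float) -> float:
    """Average the raw A and B readings first, then correct once."""
    return epa_correct((a + b) / 2, rh)


def correct_then_average(a: float, b: float, rh: float) -> float:
    """Correct each channel separately, then average the results."""
    return (epa_correct(a, rh) + epa_correct(b, rh)) / 2


if __name__ == "__main__":
    # Illustrative readings: both channels inside the first linear range.
    print(average_then_correct(10, 20, 40), correct_then_average(10, 20, 40))
    # Illustrative readings: channels on either side of the x = 30 boundary.
    print(average_then_correct(25, 45, 40), correct_then_average(25, 45, 40))
```

In the second case the two results differ by roughly 2 µg/m³, a deliberately exaggerated gap; real A and B channels usually sit much closer together, which would be consistent with the small decimal-level differences described in this thread.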
