Hello, I was trying to pull sensor data for pm2.5 using the API. I noticed that there’s an option to apply US EPA conversion formula to the data directly in the data layer on the website. However, I did not find a corresponding field that returns such data with conversion applied in the API call. Is it true that we have to manually apply our own correction formula after getting raw data from the API? Any help is appreciated.
With the fields currently available via the API, you will need to manually apply the correction formula to the raw data. The correction formula from the US EPA can be found here: Sensor data cleaning and correction: Application on the AirNow Fire and Smoke Map | Science Inventory | US EPA. You’ll need to download the pdf found under “URLs/Downloads.” The full equation can be found on page 26.
While we don’t currently provide this as an available calculated field on the API, it can be calculated on and downloaded from the map. We’re open to user feedback on this.
Hi Andrew, thank you for the quick reply! I tried to do exactly what you said but ran into a small issue. After applying the equations manually, I get very similar, but not exactly the same, corrected numbers compared with the data downloaded directly using the “US EPA” conversion on the map. I have some clarification questions that might resolve this:
- should I use pm2.5_atm (an average of pm2.5_atm_a and pm2.5_atm_b) as the “x” shown on the equations?
- I was applying the conversion equation to the hourly aggregated data. Should I instead work with the realtime/2 minute data? (and my feedback here is that the 1 million free API points seem a little tight if we need to retrieve realtime data)
- is there any other step (e.g. quality control, AB check) that PurpleAir applied for EPA conversion besides the correction equations?
Thanks!
Happy to help, Karen! I should clarify that the CF=ATM formula should be used for outdoor sensors. If you are in need of the formula for indoor sensors (CF=1), I can provide that as well.
Should I use pm2.5_atm (an average of pm2.5_atm_a and pm2.5_atm_b) as the “x” shown on the equations?
Yes, this will apply the correction to the averaged values. However, you can apply the correction to the A and B fields individually if you wish. Remember that this data (pm2.5_atm) is the raw data in µg/m^3. Thus, you’ll want to make sure that you’ve changed the data layer on the map to “Raw PM2.5” when comparing your data against the map data.
I was applying the conversion equation to the hourly aggregated data. Should I instead work with the realtime/2 minute data? (and my feedback here is that the 1 million free API points seem a little tight if we need to retrieve realtime data)
You can use hourly data. This should match the hourly averages shown on the map.
Is there any other step (e.g. quality control, AB check) that PurpleAir applied for EPA conversion besides the correction equations?
Yes, and this is a great question! All of our outdoor sensors use two particle counters, which is why the A and B channels in the data exist. Using two counters to reference values against each other serves as a measure of confidence in data quality. When a particle counter’s fan fails or debris gets caught inside and blocks the laser, that particle counter can provide erroneous data.
We have a formula that attempts to identify such particle counters and “downgrade” them, which means that they are marked as having an issue. This lowers the impact of such particle counters on the map and can also be found in the data using the field channel_flags. In summary, a “downgraded” channel represents a particle counter that we believe has an issue and isn’t providing reliable data.
Here’s a snippet from the API documentation on the channel_flags field:
Possible values are:
- Normal = No PM sensors are marked as downgraded.
- A-Downgraded = PM sensor on channel A is marked as downgraded.
- B-Downgraded = PM sensor on channel B is marked as downgraded.
- A+B-Downgraded = PM sensors on both channels A and B are marked as downgraded.
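One way a data user might act on channel_flags is sketched below. This is not PurpleAir's downgrade formula or the map's logic; it is a hypothetical downstream policy that simply ignores any channel marked as downgraded. It assumes the flag arrives as one of the four labels quoted above (if the API returns a numeric code instead, map it to a label first), and the names USABLE_CHANNELS and usable_pm25 are made up for illustration.

```python
from typing import Optional

# Which channels to trust for each channel_flags label quoted above.
# Hypothetical policy: drop any channel marked as downgraded.
USABLE_CHANNELS = {
    "Normal": ("a", "b"),
    "A-Downgraded": ("b",),
    "B-Downgraded": ("a",),
    "A+B-Downgraded": (),
}


def usable_pm25(flag: str, a: float, b: float) -> Optional[float]:
    """Average only the channels that are not downgraded.

    Returns None when both channels are downgraded, so the caller can
    decide whether to skip the reading.
    """
    channels = USABLE_CHANNELS[flag]
    values = [v for name, v in (("a", a), ("b", b)) if name in channels]
    if not values:
        return None
    return sum(values) / len(values)


print(usable_pm25("Normal", 10.0, 12.0))
print(usable_pm25("A-Downgraded", 10.0, 12.0))
print(usable_pm25("A+B-Downgraded", 10.0, 12.0))
```

Returning None rather than 0 keeps a fully downgraded sensor from looking like clean air in later averages.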
Thanks Andrew, this is helpful. However, I’m still having issues matching the manually adjusted numbers to the map data for the hourly PM2.5 raw readings. I think I followed all the correct steps:
- download hourly pm2.5_atm using API, apply the EPA equations on them based on different conditions;
- for the map data, set the data layer to raw pm2.5 and apply US EPA conversion to hourly data;
- adjust the time zone difference between two sets of data.
This still gives me a discrepancy at the decimal level (~0.3 difference) between the map data and the API data. Any thoughts on why? Thanks!
Hi @Karen_Wang,
I just tested the formula and was able to match the data from the API to the data on the map. Ensure you are using the most updated correction formula:
PM2.5 = 0.52*PAcf_1 - 0.086*RH + 5.75
My apologies, I wrote the incorrect formula. I was still able to match up the data. The correct equations are found on page 26 of the PDF.
- 0 ≤ x < 30: y = 0.524*x - 0.0862*RH + 5.75
- 30 ≤ x < 50: y = (0.786*(x/20 - 3/2) + 0.524*(1 - (x/20 - 3/2)))*x - 0.0862*RH + 5.75
- 50 ≤ x < 210: y = 0.786*x - 0.0862*RH + 5.75
- 210 ≤ x < 260: y = (0.69*(x/50 - 21/5) + 0.786*(1 - (x/50 - 21/5)))*x - 0.0862*RH*(1 - (x/50 - 21/5)) + 2.966*(x/50 - 21/5) + 5.75*(1 - (x/50 - 21/5)) + 8.84*10^-4*x^2*(x/50 - 21/5)
- 260 ≤ x: y = 2.966 + 0.69*x + 8.84*10^-4*x^2
These formulas can be found in the US-wide study:
https://cfpub.epa.gov/si/si_public_record_report.cfm?dirEntryId=353088&Lab=CEMM
Ensure that the humidity is being written out as it appears in the data (not a percentage). If the humidity is 24.7, that is exactly what is put into the formula.
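For anyone applying this by hand, here is a minimal Python sketch of the five-piece correction as written in this post. The function name epa_correct_atm is made up for illustration, x is a raw pm2.5_atm value in µg/m^3, and rh is the humidity number exactly as it appears in the data. The coefficients are transcribed from the thread, so check them against page 26 of the PDF before relying on them.

```python
def epa_correct_atm(x: float, rh: float) -> float:
    """Five-piece US EPA correction for a raw pm2.5_atm reading.

    x  : raw pm2.5_atm value in µg/m^3
    rh : relative humidity as reported in the data (24.7 means 24.7)

    The source only defines the formula for x >= 0; negative inputs
    fall through to the first piece here, which is an assumption.
    """
    if x < 30:
        return 0.524 * x - 0.0862 * rh + 5.75
    if x < 50:
        w = x / 20 - 3 / 2  # blend weight, 0 at x=30 and 1 at x=50
        return (0.786 * w + 0.524 * (1 - w)) * x - 0.0862 * rh + 5.75
    if x < 210:
        return 0.786 * x - 0.0862 * rh + 5.75
    if x < 260:
        w = x / 50 - 21 / 5  # blend weight, 0 at x=210 and 1 at x=260
        return (
            (0.69 * w + 0.786 * (1 - w)) * x
            - 0.0862 * rh * (1 - w)
            + 2.966 * w
            + 5.75 * (1 - w)
            + 8.84e-4 * x**2 * w
        )
    return 2.966 + 0.69 * x + 8.84e-4 * x**2


# Example with the humidity value mentioned above (24.7) and x = 10
print(epa_correct_atm(10.0, 24.7))
```

The two blend weights make neighbouring pieces meet at the breakpoints (30, 50, 210, 260), so the corrected value does not jump there.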
Thank you both! I finally got it to work. It turns out the order of calculation matters. The small difference was a result of averaging the A and B channels first and then applying the formula. Applying the correction formula to each channel separately and taking the average as the last step solves the problem.
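To illustrate the two orderings, here is a small sketch. The helper names are hypothetical, and top_piece is only the x ≥ 260 piece from earlier in this thread, used as a stand-in because it is clearly nonlinear; it is not the full correction. Mathematically, the two orders agree whenever both channels fall in the same linear piece, and they can differ when the channels fall in different pieces or in a nonlinear one.

```python
from typing import Callable

# A correction takes (raw value, relative humidity) and returns µg/m^3.
Correction = Callable[[float, float], float]


def average_then_correct(a: float, b: float, rh: float, correct: Correction) -> float:
    """Average channels A and B first, then apply the correction once."""
    return correct((a + b) / 2, rh)


def correct_then_average(a: float, b: float, rh: float, correct: Correction) -> float:
    """Correct each channel separately, then average the two results."""
    return (correct(a, rh) + correct(b, rh)) / 2


def top_piece(x: float, rh: float) -> float:
    """Only the x >= 260 piece quoted in the thread; rh is unused there."""
    return 2.966 + 0.69 * x + 8.84e-4 * x**2


print(average_then_correct(300.0, 400.0, 50.0, top_piece))
print(correct_then_average(300.0, 400.0, 50.0, top_piece))
```

With these made-up readings the first call gives about 352.76 and the second about 354.97, so the order changes the result.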
When I use the equation on page 26 of that document, it doesn’t result in the same converted value as the map shows. For example, this JSON response shows a pm2.5_10minute value of 5.6 and humidity value of 62 (*note that I’ve changed the sensor_index and name to protect my privacy)
"fields" : ["sensor_index","name","confidence","humidity","pm2.5_10minute"],
"data" : [
[51468,"MySensor",100,62,5.6]
]
If I plug those into the equation, I get this:
• 0 ≤ x < 30: y = 0.524*x - 0.0862*RH + 5.75
y = 0.524*5.6 - 0.0862*62 + 5.75
So y = 3.34
But the PurpleAir Map for that sensor shows an AQI value of 7 with these settings:
Am I doing something wrong or is there a different conversion equation being applied?
Hi @Tatian I’m not seeing any sensor under that sensor index. Is that the actual sensor index of the sensor or a placeholder? If you don’t want to share your sensor index on a public forum, you can send it to the moderators (which will get to me) or send an email to contact@purpleair.com.
Thanks for your response. Sorry, that’s a placeholder value. I’ll email you the actual sensor index.
Thanks for the email. I took a look at a couple of things that I thought could be causing the problem, but everything appears to be in order. I think the only issue you’re running into is that the final number in your calculation is a mass concentration value (µg/m^3) whereas the value you’re comparing it to on the map is an AQI value (US EPA PM2.5 AQI).
If you change the data layer on the map to Raw PM2.5 (µg/m^3), that should match the value you calculate using the formula.
Andrew - What would be the additional formula to convert it from µg/m^3 to the AQI value?
This should help: https://forum.airnowtech.org/t/aqi-calculations-overview-ozone-pm2-5-and-pm10/168. It’s a resource from AirNow.
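As a rough sketch of what that resource describes, the AQI is a linear interpolation between concentration breakpoints. The table below holds the 24-hour PM2.5 breakpoints as I understand them before EPA's 2024 revision, so treat it as an assumption and take current values from the AirNow page. The function name, the clamping of negative inputs to 0, and the cap at the top of the table are also choices made for this example.

```python
import math
from typing import Sequence, Tuple

# (C_low, C_high, I_low, I_high); assumed pre-2024 PM2.5 table, verify before use
PM25_BREAKPOINTS_PRE_2024: Sequence[Tuple[float, float, int, int]] = (
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
)


def pm25_to_aqi(
    conc: float,
    breakpoints: Sequence[Tuple[float, float, int, int]] = PM25_BREAKPOINTS_PRE_2024,
) -> int:
    """Convert a PM2.5 concentration (µg/m^3) to an AQI value."""
    # Truncate to one decimal place; negatives are clamped to 0 (assumption).
    c = math.floor(max(conc, 0.0) * 10) / 10
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c <= c_hi:
            # Linear interpolation within the matching band.
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    # Above the last breakpoint: cap at the top index value (assumption).
    return breakpoints[-1][3]


print(pm25_to_aqi(12.0))
print(pm25_to_aqi(35.5))
```

Passing the table as a parameter makes it easy to swap in updated breakpoints without touching the interpolation.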
@Karen_Wang @Tatian @cncmills
I feel I should mention that while we currently only provide raw data via the API, we have plans to create fields that will provide QA’ed data in the future. This will include AQI fields as well as conversions, such as the US EPA conversion for wildfire conditions. Stay tuned.
When reviewing the USEPA Sensor Data Cleaning and Correction PDF linked in previous posts, it appears to me that they are suggesting a simpler correction formula using cf_1 on page 13 (Titled “Final Correction”)
For PAcf_1<=343: PM2.5 = (0.52)PAcf_1 - (0.086)RH + 5.75
For PAcf_1>343: PM2.5 = (0.46)PAcf_1 + (3.93)(10^-4)(PAcf_1)^2 + 2.97
On page 14 they state:
The PurpleAir US-wide & extended corrections were developed using cf=1 [higher]
• Cf=1 is more strongly correlated with FRM/FEM/near FEM over the full concentration range
Page 18 is titled “Additional Slides” and on page 20 they state that
PurpleAir only provides averaged (e.g. 10-min to daily) cf_atm data in their new API
• Cf_atm equation in use on AirNow Fire and Smoke map (September 2021)
And on page 32 they state that the
5 piece [correction for cf_atm] performs more similarly to the cf_1.
It seems they are suggesting that the simpler formula for correction of cf_1 is actually more appropriate than the more complex formula using cf_atm, and that the entire section after page 18 is merely showing how to get similar results if you only have cf_atm data (which apparently wasn’t readily available from the PurpleAir API at the time of publication.) The cf_1 data is currently available on the API, so I’m confused as to why all of the previous posts are using the cf_atm correction formula.
Is PurpleAir using the cf_atm correction for outdoor sensors and the cf_1 correction for indoor sensors on the map?
The PDF also states on page 3 that this discussion only applies to Outdoor Sensors, which again suggests to me that the cf_1 correction formula is valid for outdoor sensors and the cf_atm formula is unnecessarily complex.
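For comparison, the two-piece cf_1 formula quoted above from page 13 could be sketched like this. The function name is made up, and the coefficients are copied from this post rather than checked against the PDF.

```python
def epa_correct_cf1(pa_cf1: float, rh: float) -> float:
    """Two-piece correction for cf_1 data as quoted from page 13.

    pa_cf1 : raw pm2.5_cf_1 value in µg/m^3
    rh     : relative humidity as reported in the data
    """
    if pa_cf1 <= 343:
        return 0.52 * pa_cf1 - 0.086 * rh + 5.75
    # Above 343 the quoted formula has no humidity term.
    return 0.46 * pa_cf1 + 3.93e-4 * pa_cf1**2 + 2.97


print(epa_correct_cf1(100.0, 50.0))
print(epa_correct_cf1(400.0, 50.0))
```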
Hi Christopher. The document details how EPA corrects data for the AirNow Fire and Smoke Map, which pulls PurpleAir data at a 60-minute interval from the Get Sensors Data API. Currently, when pulling a 60-minute average from the Get Sensors API, you can’t choose whether to get ATM or CF1 data; rather, it will automatically give you ATM data for outdoor sensors and CF1 data for indoor sensors. Thus, AirNow can only get ATM data for the Fire and Smoke Map with the way they pull the data. We plan to offer more granularity in the future.
Note that you can specify ATM or CF1 data in most cases. It is specifically the simple running averages* (available in the Get Sensor Data and Get Sensors Data APIs) that don’t allow you to choose and automatically provide ATM/CF1 based on the location_type of the sensor.
*Simple running averages: pm2.5_10minute, pm2.5_30minute, pm2.5_60minute, etc.
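As an illustration of asking for explicit ATM and CF1 fields instead of a running average, here is a sketch that only builds the request and does not send it. The base URL, the X-API-Key header, and the cf_1 field names are assumptions from memory of the API documentation, so verify them there; 51468 is the placeholder sensor index used earlier in this thread, and build_sensor_request is a made-up helper.

```python
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.purpleair.com/v1/sensors"  # assumed base URL


def build_sensor_request(sensor_index: int, api_key: str, fields: Iterable[str]) -> Request:
    """Build (but do not send) a request for specific sensor fields."""
    query = urlencode({"fields": ",".join(fields)})
    return Request(
        f"{API_BASE}/{sensor_index}?{query}",
        headers={"X-API-Key": api_key},  # assumed header name
    )


req = build_sensor_request(
    51468,  # placeholder index from this thread
    "YOUR_READ_KEY",
    ["pm2.5_atm_a", "pm2.5_atm_b", "pm2.5_cf_1_a", "pm2.5_cf_1_b",
     "humidity", "location_type", "channel_flags"],
)
print(req.full_url)
```

Requesting the A and B channels of both variants, plus location_type, lets you apply whichever correction suits the sensor instead of relying on what the running averages pick for you.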