Hi everyone! I’m still new to this group, so I apologize in advance if this has been answered before. Does anyone know of an existing R script that uses the new API to create a data file of all historical data for a given list of sensors? I tried to access the data but got the following error while processing the data:
I coordinate an air quality monitoring network composed of 30 sensors and I need long-term historical data for several sensors. Any comments will be most welcome. Thank you very much.
Hi Willian, we’ve developed a script to pull historical data. It may not be the most succinct or pretty, but it’s worked for us. This is for one sensor so you may have to adjust to make it a group of sensors.
#Code -----
library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates
#Get current time - 2 days (Pull historical only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
  as.POSIXct(Sys.time()) - days(2)
))
#List of variables you'd like to pull, formatted to paste into web address
fields=paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")
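Because each request covers only about two days (per the comment in the script above), a longer history has to be requested window by window. Below is a minimal sketch of how the window timestamps and the query string could be generated. The helper names `make_windows` and `build_query` are made up for illustration, and it is an assumption that the history endpoint accepts `end_timestamp` alongside `start_timestamp`, so check the API documentation before relying on it. The field list is joined with `%2C` (an encoded comma) rather than the `%2C%20` used above.

```r
# Hypothetical helpers, not part of the script above.

# Split a date range into consecutive windows of at most `days_per_window` days.
make_windows <- function(start, end, days_per_window = 2) {
  start <- as.integer(as.POSIXct(start, tz = "UTC"))
  end   <- as.integer(as.POSIXct(end, tz = "UTC"))
  step  <- as.integer(days_per_window * 86400)
  starts <- as.integer(seq(start, end - 1L, by = step))
  data.frame(
    start_timestamp = starts,
    end_timestamp   = pmin(starts + step, end)
  )
}

# Build one query string per window; `fields` is a character vector.
# `end_timestamp` is assumed to be a valid parameter.
build_query <- function(start_timestamp, end_timestamp, fields, average = 0) {
  paste0(
    "?average=", average,
    "&start_timestamp=", sprintf("%d", as.integer(start_timestamp)),
    "&end_timestamp=", sprintf("%d", as.integer(end_timestamp)),
    "&fields=", paste(fields, collapse = "%2C")
  )
}

fields  <- c("humidity", "temperature", "pm2.5_atm", "pm2.5_alt", "pm2.5_cf_1")
windows <- make_windows("2022-12-01", "2022-12-06")
queries <- build_query(windows$start_timestamp, windows$end_timestamp, fields)
```

Each element of `queries` can then be appended to the request URL in a loop, one request per window.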
Hi Laura, thanks for your script, it worked perfectly… It will help a lot in the management of the air quality network in the State of Acre, Brazil.
Thank you very much!
I was pleased to see that you are including the PM2.5_alt algorithm along with the CF1 and CF_ATM algorithms provided by Plantower in your R script. Based on my experience with all three algorithms, I would expect the CF1 and CF_ATM algorithms to give results 30-90% higher than the PM2.5_alt algorithm. The latter has been compared to FEM/FRM regulatory monitors to obtain an accurate calibration factor for both the PMS1003 and PMS5003 sensors used in the PA-I and PA-II monitors [1-3]. Recently a very large study was carried out to estimate long-term indoor exposures using indoor (4000) and outdoor (10,000) PurpleAir sites in the three West Coast states [4].
Are you doing any indoor studies? If so, I can predict that there will be a large number of zeros returned by the CF1 and CF_ATM algorithms. The PM2.5_alt algorithm returns no zeros, since it is based on particle number and the particle number in the 0.3-0.5 µm size category never falls to zero.
[4] Wallace, L.A., Zhao, T., Klepeis, N.R. 2022. Indoor contribution to PM2.5 exposure using all PurpleAir sites in Washington, Oregon, and California. Indoor Air 32(9): 13105. https://onlinelibrary.wiley.com/doi/abs/10.1111/ina.13105.
Hi Laura, I am trying to use your code to pull historical data. However, I got the same lexical error as Willian. I apologize that I am really new to coding and R, but I really need to figure this out for my thesis.
Here is my code
library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates
#Get current time - 2 days (Pull historical only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
as.POSIXct(Sys.time()) - days(2)
))
#List of variables you’d like to pull, formatted to paste into web address
fields=paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")
#Pull historical data from API
raw_data_hist <- httr::GET(url = paste("https://map.purpleair.com/1/b/i/mAQI/a10/p0/cC0?select=101555#12.75/43.65301/-79.3985", fields, sep=" "),
config = add_headers("X-API-Key" = "DB0239FE-****-11ED-****-***********"))
#Structurized data in form of R vectors and lists
jsonParsed <- fromJSON(content(raw_data_hist, as='text'))
#Dataframe from JSON data
#modJson <- as.data.frame(jsonParsed$data)
#names(modJson)<- jsonParsed$fields
This is the error
Error: lexical error: invalid char in json text. <!doctype html> <html lang="en" (right here) ------^
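The `<!doctype html>` in the error suggests that the server returned a web page rather than JSON. The URL in the code above is a map.purpleair.com page link, not an API endpoint, and `sep=" "` also puts a space in front of the query string. Below is a minimal sketch of building the request URL differently. It assumes the history endpoint has the form `https://api.purpleair.com/v1/sensors/<sensor_index>/history`, which should be confirmed against the API documentation. `history_url` and `looks_like_json` are hypothetical helper names.

```r
# Hypothetical helper: build the history URL for one sensor.
# The endpoint pattern is an assumption; confirm it against the API docs.
history_url <- function(sensor_index, query) {
  paste0("https://api.purpleair.com/v1/sensors/", sensor_index, "/history", query)
}

# Hypothetical helper: cheap check that a response body starts like JSON
# (an object or an array) and not like an HTML page.
looks_like_json <- function(txt) {
  substr(trimws(txt), 1, 1) %in% c("{", "[")
}

url <- history_url(101555, "?average=0&fields=pm2.5_atm")

looks_like_json("<!doctype html> <html lang=\"en\">")  # FALSE
looks_like_json("{\"data\": []}")                       # TRUE
```

Running a check like `looks_like_json()` on the response text before passing it to `fromJSON()` makes it possible to stop with a readable message instead of a lexical error.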
Hi Sicheng, I have developed an R function to download historical data from PurpleAir sensors, and it is available on my GitHub page: https://github.com/willianflores/getPurpleairApiHistory. The function is still in the testing phase; if you use it and find any errors, please let me know.
Hi Willian, I tried using your code from GitHub, but I'm running into an error:
Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
[tcl] invalid command name "toplevel".
Is it possible to run it without the tcltk package? Do you have any suggestions? I have a Mac and have installed XQuartz, the required dependency.
Hi Willian, I found no errors running the function on my Mac in RStudio. However, I think I found a bug with the 6-hour average pull: the 6 pm average reading appears to be missing every other day. For reference, I tried to get 6-hour averages for multiple sensors for December 2022. I have not tried it with other averages yet. Otherwise, thank you so much for putting this together! It has been very helpful.
I realize this is a thread about R, which is also my native language, but I found that Python was a much better solution for this type of problem. Once the data is downloaded, I still use R for all of my processing and analyses.
Hello Alyssa, I have identified this problem. Apparently it is not related to the R code but to the processing in the API itself. What I have been doing is downloading the data as 10-minute averages and post-processing it to obtain the other time averages, if necessary.
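The post-processing step described above could look roughly like this. It is a minimal sketch in base R, assuming the downloaded 10-minute data sits in a data frame with a `time_stamp` column in epoch seconds (UTC). `average_by_window` is a hypothetical helper name and the example data is synthetic.

```r
# Hypothetical helper: average 10-minute readings into fixed-length bins.
# `df` needs a `time_stamp` column (epoch seconds, UTC) and a numeric column
# named by `value_col`. Bins are aligned to the epoch, so 6-hour bins start
# at 00:00, 06:00, 12:00 and 18:00 UTC.
average_by_window <- function(df, value_col, hours = 6) {
  bin_seconds <- hours * 3600
  bin_start <- (df$time_stamp %/% bin_seconds) * bin_seconds
  out <- aggregate(df[[value_col]], by = list(bin_start = bin_start),
                   FUN = mean, na.rm = TRUE)
  names(out)[2] <- value_col
  out$bin_start <- as.POSIXct(out$bin_start, origin = "1970-01-01", tz = "UTC")
  out
}

# Synthetic example: 12 hours of 10-minute readings starting at midnight UTC.
t0 <- as.integer(as.POSIXct("2022-12-01 00:00:00", tz = "UTC"))
ten_min <- data.frame(
  time_stamp = t0 + seq(0, by = 600, length.out = 72),
  pm2.5_alt  = c(rep(5, 36), rep(10, 36))
)
six_hour <- average_by_window(ten_min, "pm2.5_alt")
```

Here `six_hour` has one row per 6-hour bin. Bins with missing 10-minute readings are still averaged over whatever values are present, so it may be worth counting readings per bin as well.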
That I can help with: you have to go to where the function is saved on GitHub, copy and paste the whole thing into R, and run it so that the function is actually loaded into memory.
I have a question for Willian, though. I ran it and got this error:
Error in sprintf("web service error %s from:\n %s\n\n%s", status_code, :
object 'webserviceUrl' not found
Which seems to suggest that a URL is bad in the function? Or am I entering my information incorrectly?
Hi John, I have run into the same error and believe it happens when the data for a specific sensor does not exist for the specified time frame or when a sensor is private. I have bypassed this error by inserting the try() function at line 209. This keeps the code running through API calls that don't return anything, which I have found helpful when querying multiple sensors over time. The code would look like the following. You could also use the tryCatch() function if you want to keep the error message but ensure the loop continues.
r_temp <- try(httr::GET( # added try so that loop will continue regardless of error
  URLbase,
  query = queryList,
  config = add_headers("X-API-Key" = apiReadKey)
))
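For the tryCatch() variant mentioned above, here is a minimal sketch of a loop that logs the error message and moves on to the next sensor. `fetch_sensor` is a hypothetical stand-in for the real API call (it fails on purpose for one sensor), and the sensor numbers and values are made up.

```r
# Stand-in for the real API call; fails for one sensor to mimic a private
# or missing sensor. `fetch_sensor` is hypothetical.
fetch_sensor <- function(sensor_index) {
  if (sensor_index == 2) stop("web service error for sensor ", sensor_index)
  data.frame(sensor_index = sensor_index, pm2.5_alt = 7.5)
}

sensors <- c(1, 2, 3)
results <- list()
failed  <- character(0)

for (s in sensors) {
  res <- tryCatch(
    fetch_sensor(s),
    error = function(e) {
      # Keep the message for later inspection, then signal "no data" with NULL.
      message("Skipping sensor ", s, ": ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(res)) {
    failed <- c(failed, as.character(s))
    next
  }
  results[[as.character(s)]] <- res
}

# Combine the successful downloads into one data frame.
all_data <- do.call(rbind, results)
```

With plain try(), the equivalent check after the call is `inherits(r_temp, "try-error")`, which tells you whether that request failed before you try to parse it.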