R script to retrieve historical data

Hi everyone! I’m still new to this group, so I apologize in advance if this has been answered before. Does anyone know of an existing R script that uses the new API to create a data file of all historical data for a given list of sensors? I tried to access the data but got the following error while processing the data:

Error: lexical error: invalid char in json text.
<!doctype html> <html lang="en"
(right here) ------^

I coordinate an air quality monitoring network composed of 30 sensors and I need long-term historical data for several sensors. Any comments will be most welcome. Thank you very much.

Willian Flores


Hi Willian, we’ve developed a script to pull historical data. It may not be the most succinct or pretty, but it’s worked for us. This is for one sensor so you may have to adjust to make it a group of sensors.

#Code -----

library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates

#Get current time - 2 days (Pull historical only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
  as.POSIXct(Sys.time()) - days(2)
))

#List of variables you’d like to pull, formatted to paste into web address
fields <- paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")

#Pull historical data from API
raw_data_hist <- httr::GET(url = paste("https://api.purpleair.com/v1/sensors/YY_ADD_SENSOR_ID/history", fields, sep = ""),
                           config = add_headers("X-API-Key" = "YY_ADD_YOUR_KEY"))

#Structurized data in form of R vectors and lists
jsonParsed <- fromJSON(content(raw_data_hist, as = "text"))

#Dataframe from JSON data
modJson <- as.data.frame(jsonParsed$data)
names(modJson) <- jsonParsed$fields
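For anyone who needs more than one sensor, here is a rough sketch of one way to loop the same request over several sensor IDs and stack the results. The sensor_ids vector and the added sensor_id column are illustrative additions, not part of the script above; the sketch reuses the fields string and the API key placeholder from the code.

#Sketch: loop the request above over several sensors
#(sensor_ids and the sensor_id column are illustrative, not part of the original script)
sensor_ids <- c("YY_SENSOR_ID_1", "YY_SENSOR_ID_2", "YY_SENSOR_ID_3")

all_sensors <- purrr::map_dfr(sensor_ids, function(id) {
  resp <- httr::GET(
    url = paste0("https://api.purpleair.com/v1/sensors/", id, "/history", fields),
    config = add_headers("X-API-Key" = "YY_ADD_YOUR_KEY")
  )
  parsed <- fromJSON(content(resp, as = "text"))
  df <- as.data.frame(parsed$data)
  names(df) <- parsed$fields
  df$sensor_id <- id  # keep track of which sensor each row came from
  df
})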


Hi Laura, thanks for your script; it worked perfectly. It will help a lot in the management of the air quality network in the State of Acre, Brazil.
Thank you very much!


Hello Laura Travis,

I was pleased to see that you are including the PM2.5_alt algorithm along with the CF1 and CF_ATM algorithms provided by Plantower in your R script. Based on my experience with all three algorithms, I would expect the CF1 and CF_ATM algorithms to give results 30-90% higher than the PM2.5_alt algorithm. The latter has been compared to FEM/FRM regulatory monitors to obtain an accurate calibration factor for both the PMS1003 and PMS5003 sensors used in the PA-I and PA-II monitors [1-3]. Recently a very large study was carried out to estimate long-term indoor exposures using roughly 4,000 indoor and 10,000 outdoor PurpleAir sites in the three West Coast states [4].

Are you conducting any indoor studies? If so, I can predict that there will be a large number of zeros returned by the CF1 and CF_ATM algorithms. The PM2.5_alt algorithm returns no zeros, since it is based on particle number, and the particle number in the 0.3-0.5 um size category never falls to zero.
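If you want to see that difference in data pulled with the script earlier in this thread, here is a rough comparison. It assumes the modJson data frame from that script exists and contains the pm2.5_cf_1 and pm2.5_alt fields as returned by the API.

#Rough comparison only: assumes modJson (from the script above) has
#pm2.5_cf_1 and pm2.5_alt columns
library(dplyr)

modJson %>%
  summarise(
    mean_cf1   = mean(as.numeric(pm2.5_cf_1), na.rm = TRUE),
    mean_alt   = mean(as.numeric(pm2.5_alt), na.rm = TRUE),
    cf1_to_alt = mean_cf1 / mean_alt  # expected to be noticeably above 1 outdoors
  )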

  1. Wallace, L., Zhao, T., and Klepeis, N.E. Calibration of PurpleAir PA-I and PA-II monitors using daily mean PM2.5 concentrations measured in California, Washington, and Oregon from 2017 to 2021. Sensors 2022, 22, 4741. https://doi.org/10.3390/s22134741

  2. Wallace, L. Intercomparison of PurpleAir Sensor Performance over Three Years Indoors and Outdoors at a Home: Bias, Precision, and Limit of Detection Using an Improved Algorithm for Calculating PM2.5. Sensors 2022, 22, 2755.

  3. Wallace, L., Bi, J., Ott, W.R., Sarnat, J.A., and Liu, Y. (2021) Calibration of low-cost PurpleAir outdoor monitors using an improved method of calculating PM2.5. Atmospheric Environment 256, 118432. https://doi.org/10.1016/j.atmosenviron.2021.118432

  4. Wallace, L.A., Zhao, T., and Klepeis, N.E. (2022) Indoor contribution to PM2.5 exposure using all PurpleAir sites in Washington, Oregon, and California. Indoor Air 32(9): 13105. https://onlinelibrary.wiley.com/doi/abs/10.1111/ina.13105

Hi Laura, I am trying to use your code to pull historical data. However, I got the same lexical error as Willian. I apologize, as I am quite new to coding and R, but I really need to figure this out for my thesis.

Here is my code:

library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates

#Get current time - 2 days (Pull historical only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
as.POSIXct(Sys.time()) - days(2)
))

#List of variables you’d like to pull, formatted to paste into web address
fields=paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")

#Pull historical data from API
raw_data_hist <- httr::GET(url = paste("https://map.purpleair.com/1/b/i/mAQI/a10/p0/cC0?select=101555#12.75/43.65301/-79.3985", fields, sep=" "),
config = add_headers("X-API-Key" = "DB0239FE-****-11ED-****-***********"))

#Structurized data in form of R vectors and lists
jsonParsed <- fromJSON(content(raw_data_hist, as='text'))

#Dataframe from JSON data
#modJson <- as.data.frame(jsonParsed$data)
#names(modJson)<- jsonParsed$fields

This is the error:
Error: lexical error: invalid char in json text. <!doctype html> <html lang="en" (right here) ------^

  1. parse_string(txt, bigint_as_char)

  2. parseJSON(txt, bigint_as_char)

  3. parse_and_simplify(txt = txt, simplifyVector = simplifyVector, simplifyDataFrame = simplifyDataFrame, simplifyMatrix = simplifyMatrix, flatten = flatten, …)

  4. fromJSON(content(raw_data_hist, as = "text"))

Thank you very much!

Sicheng

Hi Sicheng, I have developed an R function to download historical data from PurpleAir sensors, and it is available on my GitHub page https://github.com/willianflores/getPurpleairApiHistory. The function is still in the testing phase; if you use it and find any errors, please let me know.

Willian Flores


Hi Willian, I tried using your code from GitHub and I'm running into an error:

Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
[tcl] invalid command name "toplevel"

Is it possible to execute it without the tcltk package? Do you have any suggestions? I have a Mac and installed XQuartz, the required dependency.

Hi Shyra, this function was developed in a Windows environment and I have not tested it in a Mac environment.

Hi Willian, I found no errors running the function on my mac on RStudio. However, I think I found a bug with the 6-hour average pull. It appears that the 6 pm average reading every other day is missing. For reference, I tried to get 6-hour averages for multiple sensors for December 2022. I have not tried it with other averages yet. Otherwise, thank you so much for putting this together! It has been very helpful.

If people are still interested in this topic, I have recently posted a python script to download group data from a historical period through the API:


I realize this is a thread about R, which is also my native language, but I found that python was a much better solution for this type of problem. Once it’s downloaded, I still use R for all of my processing and analyses.


Hello Alyssa, I have identified this problem; apparently it is not related to the R code but to the processing on the API side. What I have been doing is downloading the data with a 10-minute average and doing post-processing to obtain the other time averages when necessary; a rough sketch of that idea is below.
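For example, something along these lines, where pa_10min, time_stamp, and pm2.5_alt are placeholder names for the 10-minute download (not outputs of any particular script in this thread), and time_stamp is assumed to already be POSIXct:

#Sketch: roll 10-minute records up to 6-hour means
#(pa_10min, time_stamp, and pm2.5_alt are placeholder names)
library(dplyr)
library(lubridate)

pa_6hr <- pa_10min %>%
  mutate(window = floor_date(time_stamp, unit = "6 hours")) %>%
  group_by(window) %>%
  summarise(pm2.5_alt = mean(pm2.5_alt, na.rm = TRUE), .groups = "drop")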

Hi Aaron, thank you very much for pointing out new possibilities for accessing PurpleAir data.


Hi Willian,

I tried using the code that you posted earlier; however, I am met with the error: could not find function "getPurpleairApiHistory".

I am wondering what package you used that had this function.

Thank you!

I can help with that: you have to go to where the function is saved on the GitHub repository, copy and paste the whole thing into R, and run it, so the function is actually loaded into memory.
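For reference, one way to avoid copy-pasting every time is to save the function's .R file from the repository locally and source() it; the file name below is just an assumption about what you saved it as.

#Load the function definition into the session (file name is an assumption)
source("getPurpleairApiHistory.R")
exists("getPurpleairApiHistory")  # should return TRUE once sourced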

I have a question for Willian, though. I ran it and got this error:

Error in sprintf("web service error %s from:\n %s\n\n%s", status_code, :
object 'webserviceUrl' not found

This seems to suggest that a URL in the function is bad? Or am I entering my information incorrectly?

Hi John, I have run into the same error and believe this happens when the data for a specific sensor does not exist for the specified time frame, or when a sensor is private. I have bypassed this error by inserting the try() function at line 209. This will keep the code running through API calls that don't return anything, which I have found helpful when querying multiple sensors over time. The code would look like it does below. You could also implement the tryCatch() function if you want to keep the error message but ensure the loop continues; a sketch of that variant follows the code.

r_temp <- try(httr::GET( # added try so that the loop will continue regardless of error
  URLbase,
  query = queryList,
  config = add_headers("X-API-Key" = apiReadKey)
))
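And here is a rough tryCatch() variant of the same call that prints the error message for a failing request but still lets the loop continue (same placeholder objects as above; sketch only, not part of the original function):

r_temp <- tryCatch(
  httr::GET(
    URLbase,
    query = queryList,
    config = add_headers("X-API-Key" = apiReadKey)
  ),
  error = function(e) {
    message("Request failed: ", conditionMessage(e))  # keep the error visible
    NULL  # later code can check for NULL and skip this iteration
  }
)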
