R script to retrieve historical data

Hi everyone! I’m still new to this group, so I apologize in advance if this has been answered before. Does anyone know of an existing R script that uses the new API to create a data file of all historical data for a given list of sensors? I tried to access the data but got the following error while processing the data:

Error: lexical error: invalid char in json text.
<!doctype html> <html lang="en"
(right here) ------^

I coordinate an air quality monitoring network composed of 30 sensors and I need long-term historical data for several sensors. Any comments will be most welcome. Thank you very much.

Willian Flores

Hi Willian, we’ve developed a script to pull historical data. It may not be the most succinct or pretty, but it has worked for us. It handles one sensor, so you may have to adapt it for a group of sensors.

#Code -----

library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates

# Get current time minus 2 days (the history pull only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
as.POSIXct(Sys.time()) - days(2)
))

#List of variables you’d like to pull, formatted to paste into web address
fields <- paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")

#Pull historical data from API
raw_data_hist <- httr::GET(url = paste("https://api.purpleair.com/v1/sensors/YY_ADD_SENSOR_ID/history", fields, sep = ""),
config = add_headers("X-API-Key" = "YY_ADD_YOUR_KEY"))

# Parse the JSON response into R vectors and lists
jsonParsed <- fromJSON(content(raw_data_hist, as = "text"))

#Dataframe from JSON data
modJson <- as.data.frame(jsonParsed$data)
names(modJson) <- jsonParsed$fields
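
The script above pulls one 2-day window for one sensor. A minimal sketch of how a longer period and several sensors might be covered: split the period into 2-day windows and build one request URL per sensor and window. The helper names `make_windows` and `history_url` are hypothetical, the sensor ID and dates are placeholders, and the sketch assumes the history endpoint also accepts an `end_timestamp` parameter and a plain `%2C`-separated field list.

```r
# Split [start, end) into consecutive windows of at most `window_days` days.
# Returns epoch seconds (UTC) for the start and end of each window.
make_windows <- function(start, end, window_days = 2) {
  start_s <- as.numeric(as.POSIXct(start, tz = "UTC"))
  end_s   <- as.numeric(as.POSIXct(end, tz = "UTC"))
  stopifnot(end_s > start_s)
  step   <- window_days * 86400
  starts <- seq(start_s, end_s - 1, by = step)
  data.frame(start_timestamp = starts,
             end_timestamp   = pmin(starts + step, end_s))
}

# Build the history URL for one sensor and one window.
# Assumption: `end_timestamp` is accepted alongside `start_timestamp`.
history_url <- function(sensor_id, start_ts, end_ts, fields) {
  paste0("https://api.purpleair.com/v1/sensors/", sensor_id, "/history",
         "?average=0",
         "&start_timestamp=", sprintf("%.0f", start_ts),
         "&end_timestamp=", sprintf("%.0f", end_ts),
         "&fields=", paste(fields, collapse = "%2C"))
}

# Example with placeholder values: one sensor, five days, two fields.
windows <- make_windows("2022-12-01", "2022-12-06")
urls <- mapply(history_url, "12345",
               windows$start_timestamp, windows$end_timestamp,
               MoreArgs = list(fields = c("humidity", "pm2.5_alt")))
```

Each URL could then be passed to httr::GET() with the X-API-Key header as in the script above, and the resulting data frames combined with rbind(). For a group of sensors, the same call can be repeated per sensor ID.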

Hi Laura, thanks for your script, it worked perfectly… It will help a lot in the management of the air quality network in the State of Acre, Brazil.
Thank you very much!

Hello Laura Travis–

I was pleased to see that your R script includes the PM2.5_alt algorithm along with the CF1 and CF_ATM algorithms provided by Plantower. Based on my experience with all three algorithms, I would expect the CF1 and CF_ATM algorithms to give results 30-90% higher than the PM2.5_alt algorithm. The latter has been compared to FEM/FRM regulatory monitors to obtain an accurate calibration factor for both the PMS1003 and PMS5003 sensors used in the PA-I and PA-II monitors [1-3]. Recently, a very large study was carried out to estimate long-term indoor exposures using indoor (4000) and outdoor (10,000) PurpleAir sites in the three West Coast states [4].

Are you conducting any indoor studies? If so, I can predict that the CF1 and CF_ATM algorithms will return a large number of zeros. The PM2.5_alt algorithm returns no zeros, since it is based on particle number, and the particle number in the 0.3-0.5 µm size category never falls to zero.

  1. Wallace, L., Zhao, T., and Klepeis, N.E. Calibration of PurpleAir PA-I and PA-II monitors using daily mean PM2.5 concentrations measured in California, Washington, and Oregon from 2017 to 2021. Sensors 2022, 22, 4741. https://doi.org/10.3390/s22134741

  2. Wallace, L. Intercomparison of PurpleAir Sensor Performance over Three Years Indoors and Outdoors at a Home: Bias, Precision, and Limit of Detection Using an Improved Algorithm for Calculating PM2.5. Sensors 2022, 22, 2755.

  3. Wallace, L., Bi, J., Ott, W.R., Sarnat, J.A., and Liu, Y. Calibration of low-cost PurpleAir outdoor monitors using an improved method of calculating PM2.5. Atmospheric Environment 2021, 256, 118432. https://doi.org/10.1016/j.atmosenviron.2021.118432

  4. Wallace, L.A., Zhao, T., and Klepeis, N.E. Indoor contribution to PM2.5 exposure using all PurpleAir sites in Washington, Oregon, and California. Indoor Air 2022, 32(9), 13105. https://onlinelibrary.wiley.com/doi/abs/10.1111/ina.13105
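
The expectation about zeros can be checked on your own data. A minimal sketch, assuming the history data frame has columns named as in the `fields` list of the script above; `zero_fraction` is a hypothetical helper, and the toy values are invented solely to exercise the function (they are not measurements).

```r
# Fraction of readings that are exactly zero, per column.
# Columns are coerced with as.numeric() in case the API data arrived as text.
zero_fraction <- function(df, cols) {
  vapply(cols,
         function(col) mean(as.numeric(df[[col]]) == 0, na.rm = TRUE),
         numeric(1))
}

# Invented toy data, only to show the call.
toy <- data.frame("pm2.5_cf_1" = c(0, 0, 1.2, 3.0),
                  "pm2.5_alt"  = c(0.4, 0.5, 1.0, 2.5),
                  check.names = FALSE)
zeros <- zero_fraction(toy, c("pm2.5_cf_1", "pm2.5_alt"))
```

On real data, the same call could be made with `modJson` and whichever PM2.5 columns were requested.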

Hi Laura, I am trying to use your code to pull historical data. However, I got the same lexical error as Willian. I apologize that I am really new to coding and R, but I really need to figure this out for my thesis.

Here is my code

library(tidyverse)
library(httr) # authenticate API
library(jsonlite) # read JSON
library(lubridate) # work with dates

#Get current time - 2 days (Pull historical only allows 2 days of data at a time)
current.epoch <- as.character(as.integer(
as.POSIXct(Sys.time()) - days(2)
))

#List of variables you’d like to pull, formatted to paste into web address
fields=paste0("?average=0&start_timestamp=", current.epoch, "&fields=latitude%2C%20longitude%2C%20altitude%2C%20rssi%2C%20humidity%2C%20temperature%2C%20pressure%2C%20pm2.5_atm%2C%20pm2.5_atm_a%2C%20pm2.5_atm_b%2C%20pm2.5_alt%2C%20pm2.5_alt_a%2C%20pm2.5_alt_b%2C%20pm2.5_cf_1%2C%20pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2C%20pm10.0_atm%2C%20pm10.0_atm_a%2C%20pm10.0_atm_b%2C%20pm10.0_cf_1%2C%20pm10.0_cf_1_a%2C%20pm10.0_cf_1_b")

#Pull historical data from API
raw_data_hist <- httr::GET(url = paste("https://map.purpleair.com/1/b/i/mAQI/a10/p0/cC0?select=101555#12.75/43.65301/-79.3985", fields, sep=" "),
config = add_headers("X-API-Key" = "DB0239FE-****-11ED-****-***********"))

#Structurized data in form of R vectors and lists
jsonParsed <- fromJSON(content(raw_data_hist, as='text'))

#Dataframe from JSON data
#modJson <- as.data.frame(jsonParsed$data)
#names(modJson)<- jsonParsed$fields

This is the error
Error: lexical error: invalid char in json text.
<!doctype html> <html lang="en"
(right here) ------^

  1. parse_string(txt, bigint_as_char)

  2. parseJSON(txt, bigint_as_char)

  3. parse_and_simplify(txt = txt, simplifyVector = simplifyVector, simplifyDataFrame = simplifyDataFrame, simplifyMatrix = simplifyMatrix, flatten = flatten, …)

  4. fromJSON(content(raw_data_hist, as = "text"))

Thank you very much!

Sicheng
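
The lexical error above means fromJSON() was handed an HTML page (the body starts with `<!doctype html>`) rather than JSON. In the code above, the request goes to a map.purpleair.com page URL rather than the api.purpleair.com history endpoint used in Laura's script, and `sep=" "` also inserts a space before the query string. A minimal sketch of a guard that reports this clearly instead of failing inside the parser; `parse_json_body` and `get_history_checked` are hypothetical helper names, not part of httr or jsonlite.

```r
# Hypothetical helper: refuse to parse a body that is not declared as JSON.
parse_json_body <- function(body_text, content_type) {
  if (!grepl("json", content_type, ignore.case = TRUE)) {
    stop("Expected JSON but received '", content_type,
         "'. Check the request URL. Body starts with: ",
         substr(body_text, 1, 40))
  }
  jsonlite::fromJSON(body_text)
}

# Hypothetical wrapper around the request (defined here, not run):
# fetch the URL, then parse only if the server says the body is JSON.
get_history_checked <- function(url, api_key) {
  resp <- httr::GET(url, httr::add_headers("X-API-Key" = api_key))
  parse_json_body(httr::content(resp, as = "text", encoding = "UTF-8"),
                  httr::http_type(resp))
}
```

With a guard like this, a wrong URL produces a message naming the content type that came back, which is usually enough to spot that a web page was requested instead of the API.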

Hi Sicheng, I have developed an R function to download historical data from PurpleAir sensors. It is available on my GitHub page: https://github.com/willianflores/getPurpleairApiHistory. The function is still in the testing phase, so if you use it and find any errors, please let me know.

Willian Flores

Hi Willian, I tried using your code from GitHub, but I'm running into an error:
Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
[tcl] invalid command name "toplevel".
Is it possible to run it without the tcltk package? Do you have any suggestions? I have a Mac and I installed XQuartz, the required dependency.

Hi Shyra, this function was developed in a Windows environment and I have not tested it in a Mac environment.

Hi Willian, I found no errors running the function on my Mac in RStudio. However, I think I found a bug with the 6-hour average pull. It appears that the 6 pm average reading is missing every other day. For reference, I tried to get 6-hour averages for multiple sensors for December 2022. I have not tried it with other averages yet. Otherwise, thank you so much for putting this together! It has been very helpful.
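
A gap such as the missing 6 pm average can be located by comparing the returned time stamps against a regular grid. A minimal sketch, assuming the time stamps are epoch seconds or POSIXct values and that the averages fall on a fixed interval; `find_missing_stamps` is a hypothetical helper and the example time stamps are invented.

```r
# Return the time stamps that a regular grid would contain but the data lacks.
# Only gaps between the first and last observed stamp can be detected.
find_missing_stamps <- function(time_stamps, interval_secs) {
  observed <- sort(unique(as.numeric(time_stamps)))
  expected <- seq(min(observed), max(observed), by = interval_secs)
  as.POSIXct(setdiff(expected, observed), origin = "1970-01-01", tz = "UTC")
}

# Invented example: two days of 6-hour stamps with the first 18:00 removed.
stamps   <- seq(as.POSIXct("2022-12-01 00:00:00", tz = "UTC"),
                by = "6 hours", length.out = 8)
observed <- stamps[-4]
missing  <- find_missing_stamps(observed, 6 * 3600)
```

Running this per sensor would show whether the gaps come back from the API itself or are introduced when the 2-day pulls are stitched together.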