Is there an introductory guide on how to utilise the API to analyse historical data?

Hello,

I first would like to say hello! I am a research scientist who was just introduced to PurpleAir and am becoming more familiar with how to utilise the data presented. While my doctorate is in behavioural and spatial (temporal!) ecology, and I am familiar with meteorological and climatic data, I am not the most tech-savvy person in the world when it comes to APIs.

I am hoping to learn how to use the API to get historical data from specific sensors in my state (PM 2.5 as well as humidity and temperature). From my understanding, this can be quite a process, ranging from getting an API key to working in Python and/or R (I have used both for work but am more familiar with R). While the website and the forum are quite informative, as you can imagine, it can be a bit overwhelming to know the exact steps to take, in order, to accomplish my goals. Today I even had to learn more about APIs themselves!

In any event, would anyone have any step-by-step tutorials handy, or could you provide me with links that may allow me to familiarise myself with a workflow to achieve my goals? I.e., which programmes or packages to download, what to enter to get the correct data, etc.

I apologise for such a simple question, but even a PDF, a link to another forum post, or a YouTube video would be of use, as right now it is difficult to see the forest for the trees (so to speak)!

Thank you :slight_smile:


Hi, you have my undivided attention, as I feel I am at the same stage of confusion. I am a researcher as well, and PurpleAir has just come to the area I focus on. I honestly do not know where to start yet, but I would be glad to trade thoughts with you if you like. Looking forward to hearing from you.
Thanks

This thread has a script for pulling historical data with R - R script to retrieve historical data - Data / API - PurpleAir Community. It uses slightly different code from mine - they used the jsonlite package to translate a request directly into an R dataframe, whereas I have been writing all requests to text files so I can save them over time. Here's a generalized version of the script I made:

#the purpose of this script is to pull the most recent 2 weeks of data from a list of sensors
#and then compile it over time
#requirement: maintain sensorindex_name csv as new sensors are deployed

#setting working directory - change to whatever path you want to use
setwd("~/R/YourWorkingDirectory")

#install if necessary
install.packages("readr")
install.packages("lubridate")
install.packages("httr")

#to be usable within this file
library(readr) 
library(lubridate) 
library(httr) 

#contact PurpleAir support at contact@purpleair.com to request your API keys
#they are typically very speedy
#the historical endpoint was recently restricted, so you may specifically need 
#to ask for access to it
read_key <- "your key here"
write_key <- "your key here"

#getting the list of sensor indexes
#this needs to be manually updated as new sensors are brought online
#the sensor index can be found by clicking the sensor on the live map after select= in the URL
#this is pulling from a simple table with column headers "sensor_index" and "name_on_map"
sensorindex_name <- read.csv("sensorindex_name.csv", fileEncoding="UTF-8-BOM")
sensor_list <- as.list(sensorindex_name$sensor_index)

#manipulating the URL
#the URL is customizable, you can test how adding and removing elements looks at https://api.purpleair.com/
#I have included only the fields I thought were necessary for the scope of my data review
#this code makes the end time the current time and the start time 2 weeks before the end time
#2 weeks is the max time period you can request at one time for hourly data
URL <- "https://api.purpleair.com/v1/sensors/sensor_index/history/csv?start_timestamp=starttime&end_timestamp=endtime&average=60&fields=pm2.5_cf_1_a%2C%20pm2.5_cf_1_b%2Cpm2.5_atm_a%2C%20pm2.5_atm_b%2Cpm2.5_alt_a%2Cpm2.5_alt_b%2Chumidity%2Ctemperature%2Cpressure%2Cuptime%2Crssi%2Cpa_latency%2Cmemory"
endtime <- as.integer(Sys.time())
twoweeks <- 1209600 #1209600 is the number of seconds that elapse in two weeks of unix time
starttime <- endtime-twoweeks
URL <- sub('starttime', starttime, URL)
URL <- sub('endtime', endtime, URL)

#looping sensor_index over the URL, writing output to text files
#not sure if there is a better way to parse out the data
#this does preserve each request as a unique text file, which is partially what I wanted
#PurpleAir support has asked that you send no more than one request every 1-10 minutes
#hence the forced pause in the code
for (i in sensor_list)
{
  request_URL <- sub('sensor_index', i, URL)
  data <- GET(request_URL,add_headers('X-API-Key'=read_key))
  data <- content(data, "raw")
  writeBin(data, paste0(paste(i, starttime, endtime, sep="_"), ".txt"))
  flush.console() #flush console output so any printed progress shows up before the pause
  Sys.sleep(60) 
}

#reading in the files
filepath = "~/R/YourWorkingDirectory"
outputdata_list <- list.files(path=filepath, pattern=".txt")
for (i in 1:length(outputdata_list)){
  assign(outputdata_list[i], 
         read.table(outputdata_list[i], sep=",", header=TRUE)
  )}

#merging the imported data frames
#the time conversion below automatically uses R's current timezone
#in my own script I set R's timezone to EST so it matches our regulatory monitors,
#which means I don't have to worry about converting from UTC (the default timezone in PurpleAir data)
sensor_history <- do.call(rbind.data.frame, mget(ls(pattern = ".txt")))
row.names(sensor_history) <- NULL #row labels were from original files and ugly
sensor_history$time_stamp <- as.POSIXct(sensor_history$time_stamp, origin="1970-01-01") #better datetime format
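#if you would rather pin the timestamps to a specific timezone than rely on R's session setting,
#one option is to set the tzone attribute on the converted column
#("America/New_York" below is only an example - use whatever zone your monitors report in)
attr(sensor_history$time_stamp, "tzone") <- "America/New_York"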
sensor_history <- unique(sensor_history) #removes overlapping days, don't have to worry about getting data exactly once every 2 weeks
sensor_history <- merge(sensor_history, sensorindex_name[, c("sensor_index", "name_on_map")], by="sensor_index") #adds name on map to merged df
sensor_history <- sensor_history[c("time_stamp", "sensor_index", "name_on_map", "rssi", "uptime", "pa_latency", "memory", "humidity", "temperature", "pressure", "pm2.5_atm_a", "pm2.5_atm_b", "pm2.5_cf_1_a", "pm2.5_cf_1_b", "pm2.5_alt_a", "pm2.5_alt_b")]

#saving the merged file for further analysis
write.csv(sensor_history, file="sensor_history.csv")
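
As a side note, here is a minimal sketch of the jsonlite approach mentioned at the top of this post, in case a dataframe straight from the request is more convenient than text files. It is only an illustration, not part of my workflow: the sensor index is a placeholder, it reuses the read_key/starttime/endtime objects from above, and it assumes the JSON history endpoint returns its usual "fields"/"data" layout.

#install.packages("jsonlite") #if you don't already have it
library(jsonlite)
example_index <- 12345 #replace with a real sensor index from the map URL
json_URL <- paste0("https://api.purpleair.com/v1/sensors/", example_index,
                   "/history?start_timestamp=", starttime, "&end_timestamp=", endtime,
                   "&average=60&fields=pm2.5_atm_a%2Cpm2.5_atm_b%2Chumidity%2Ctemperature")
json_response <- GET(json_URL, add_headers('X-API-Key'=read_key))
parsed <- fromJSON(content(json_response, "text", encoding="UTF-8"))
example_df <- as.data.frame(parsed$data, stringsAsFactors=FALSE) #rows come back without column names
names(example_df) <- parsed$fields #attach the column names from the "fields" element
head(example_df)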

Hope that helps! Please feel free to shoot me a message if you have any questions about how this works.

Hi Chloe! Thank you so much for that! I truly appreciate the script and example. Even though I am familiar with R, an actual example helps me out tremendously! I will let you know how it goes :grinning:

Hi! We can certainly chat! I am glad that I am not in this alone! Do send me a message if you would like :grinning: . Also, the community here is incredibly friendly, which is very encouraging! Looking forward to hearing from you soon!

Thanks, Redskies421, for replying to me; I am definitely up for chatting with you. Which stage are you at now? I'm doing some analysis on data I am supposed to receive by next week or so; in the meantime I'm reading papers that use crowdsourced data in California. Let me know how we can extend this conversation. Looking forward to connecting further with you. Regards!

My apologies for an extremely beginner-level post (but this seems to perhaps be the right place). Here is the question that I just emailed to PurpleAir Support:

I am writing to request help with downloads of historical data, since the process is now very different from what it was a year ago. Rather than selecting one or more monitors from a list, entering the desired day range, and clicking a button, it now appears (??) that one has to write and run computer programs in Python or R (??) to access historical data. (The easy one-click downloads still appear to be possible for the data that are graphed when you double-click on a location on the map, but they provide only the last three days of data.)

I have tried looking through the FAQ and the PurpleAir Community Forums, but most of those entries have been written by programmers for other programmers, rather than for lay people with no programming experience.

Can you provide a resource that describes things at a much simpler, more detailed level, and clarifies the exact type of programming software that I will need to install and run on my iMac? Something intended for a person who has no prior experience with APIs or Python or R or any of that stuff. I need a really introductory approach, because my intention is to share, teach, and develop these data-accessing skills with my undergraduate students, who are not always computer-literate STEM majors.

Thanks!!

To clarify your first point: my understanding is that map data downloads now have the same restrictions as the historical API. That means you can only get 3 days of 10-minute data at once, but you can get 2 weeks of 60-minute data at once. I haven't used the map download tool much, and I'm not actually sure whether there's a way to change the time frame displayed on the map (e.g. can I look at 2 weeks of hourly data from a month ago?).

In terms of understanding how to work with APIs, I’d be surprised (and impressed) if PurpleAir support had answers for your questions. Purely from an R perspective, this is how I’d go about figuring out (i.e. Googling) your problem:

  1. How to install Program R and R Studio on an iMac.

  2. R basics, such as understanding packages, libraries, objects, functions, and how to set up your working directory.

  3. API basics (i.e. what APIs are, what they’re used for, and how they work). You should also read through the PurpleAir API documentation to get an idea of the kind of data you can get from their servers.

  4. Understanding how to use APIs with R and example code (which you can find above; there is also a tiny test request sketched just below this list). Maybe controversial (?), but I've found ChatGPT to be really useful for debugging and providing assistance with portions of code. Stack Overflow is a critical resource as well.
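
If it helps, here is about the smallest possible first API call in R, just to confirm that a read key works before tackling the historical endpoint. This is a hedged sketch: the sensor index 12345 is a placeholder (grab a real one from the map URL after "select="), the field names are only examples, and the exact response layout is whatever the current documentation at api.purpleair.com describes.

#install.packages("httr"); install.packages("jsonlite") #only needed once
library(httr)
library(jsonlite)
read_key <- "your key here"
response <- GET("https://api.purpleair.com/v1/sensors/12345?fields=pm2.5_atm%2Chumidity%2Ctemperature",
                add_headers('X-API-Key'=read_key))
status_code(response) #200 means the request worked
fromJSON(content(response, "text", encoding="UTF-8"))$sensor #the sensor's data as an R list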

I hope that’s at least a little helpful! Those are pretty much the steps I went through when PurpleAir started testing their new historical API last June.

Hi Chloe,

Thank you so much for a timely and very helpful response – it is much appreciated.

The approach you suggest seems reasonable, and now it's on me to do my homework and learn the required programming skills. This will require a non-trivial front-end investment of time and effort, but the reward of being able to do large, customized downloads will be pretty substantial.
