Python script for downloading and organizing historical API data

Aaron_Lamplugh · February 24, 2023, 6:05pm

Hey all,

I just finished this Python script to access historical data through the API. It pulls data for all monitors in a registered group. It then creates a file structure where it dumps the data and compiles it into a single CSV for each monitor. It’s currently set up to download the files in a format very similar to the way the ThingSpeak download page used to work (primary and secondary A and B channels). I also have it making files similar to the format that comes off the SD cards. This way, if you’re like me and had a bunch of analysis scripts set up to work with the old ThingSpeak files, you don’t need to change them much.

This script also makes a folder to dump in all of the compiled secondary A&B files to perform checks of the data. I’ll post an R script soon that performs the A and B channel comparisons and provides metrics for sensor performance.
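For anyone curious what the "compile into a single CSV for each monitor" step can look like, here is a minimal sketch using pandas. It is not taken from the linked script: the function name, the folder layout, and the `time_stamp` column name are all assumptions for illustration.

```python
from pathlib import Path

import pandas as pd


def compile_monitor_csv(chunk_dir: Path, out_file: Path,
                        time_col: str = "time_stamp") -> pd.DataFrame:
    """Concatenate every CSV chunk in chunk_dir into one time-sorted CSV.

    Hypothetical helper: chunk_dir holds the per-request downloads for a
    single monitor, and time_col is an assumed timestamp column name.
    """
    chunk_files = sorted(chunk_dir.glob("*.csv"))
    if not chunk_files:
        raise FileNotFoundError(f"no CSV chunks found in {chunk_dir}")

    frames = [pd.read_csv(f) for f in chunk_files]
    combined = (
        pd.concat(frames, ignore_index=True)
        .drop_duplicates(subset=time_col)   # overlapping requests repeat rows
        .sort_values(time_col)
        .reset_index(drop=True)
    )

    # Write outside chunk_dir so a rerun does not pick up its own output.
    out_file.parent.mkdir(parents=True, exist_ok=True)
    combined.to_csv(out_file, index=False)
    return combined
```

Dropping duplicates on the timestamp column is a design choice for the case where consecutive download windows overlap; remove that step if your chunks never overlap.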

Big thanks to Zuber Farooqui, who provided the base code that I adapted into this script. I’m very new to python, so I don’t think I would have figured it out otherwise.

Feel free to tweak and use as needed.

github.com

alamp326/PA_DataScripts/blob/main/pa_get_historicaldata_bygroup.py

# -*- coding: utf-8 -*-
"""
This code gets historical PurpleAir data from the new PurpleAir API for a group of monitors and stores them in a file structure in my documents.

Data from the site are in bytes/text and NOT in JSON format.

Created on Fri Jun 10 21:34:01 2022

@author: Zuber Farooqui, Ph.D.

Modified Wed Feb 15 17:00:00 2023

@author: Aaron Lamplugh, Ph.D.
"""

import requests
import pandas as pd
from datetime import datetime
import time
import json

This file has been truncated.

Aaron_Lamplugh · February 24, 2023, 6:11pm

Also, in case anyone isn’t aware, you need your own API read key to make this work (line 28), and it needs to have approval to access historical endpoints.

There are also 2 locations in the code where you’ll need to set your own directory:

Lines 67 and 239

You also need to set the start and end dates:

Lines 34 and 35

and the group number:

Line 38
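As a rough illustration of the settings listed above, they could be grouped like this. Since the file preview is truncated, every name and value here is a hypothetical placeholder rather than the variable names used in the script, and the conversion to UNIX seconds assumes the API accepts UNIX timestamps for the start and end of the request window.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical placeholders; the linked script uses its own variable names.
API_READ_KEY = "YOUR-READ-KEY"     # needs historical-endpoint approval
GROUP_ID = 0                       # your registered group number
START_DATE = "2023-01-01"          # first day to download
END_DATE = "2023-02-01"            # end of the download window
OUTPUT_ROOT = Path.home() / "Documents" / "PurpleAir"  # your own directory


def to_unix_utc(date_string: str) -> int:
    """Interpret a YYYY-MM-DD date as midnight UTC and return UNIX seconds."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


start_ts = to_unix_utc(START_DATE)
end_ts = to_unix_utc(END_DATE)
```

Pinning the dates to UTC avoids the window shifting with the time zone of whichever PC runs the download.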

bluesystems · March 1, 2023, 5:02pm

I am not really a programmer by vocation (hardware designer) and so find writing code to be “difficult” work. So please excuse me if my questions seem easy to see the answer to if you are a programmer.
Does your Python script output PM2.5 values for each sensor and provide all the same corrections that result from a download from the PurpleAir web page?
What version of (Win10 PC based) Python does your script need to compile and run properly? I have in the past encountered versions of Python and incompatibilities between code and a specific version of Python.
Thanks for your help.

Aaron_Lamplugh · March 1, 2023, 5:38pm

Hey Bob, this script downloads the purpleair CF1 data, which is the standard data that PA monitors report. If you want to change that, you can insert different parameters into the field lists included on lines 86, 95, 104, 113, and 122. The full list of fields you can query is available on the API website:

https://api.purpleair.com/#api-groups-get-member-history

If you are looking for corrected PM2.5 data, I believe that the pm2.5_alt field is Lance’s corrected PM2.5 data, which has the most documentation associated with it. I correct my own data from the particle number fields, so I primarily utilize the data in the “secondary” data files.
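To make the field lists a bit more concrete, here is a minimal sketch of how a request for one group member's history might be built. It uses only the standard library rather than the requests module the script imports, and it never sends anything by itself. The base URL, the `/history/csv` path, the `X-API-Key` header, and the parameter names reflect my reading of the API page linked above and should be treated as assumptions to check there; `build_history_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.purpleair.com/v1"  # assumed base URL


def build_history_request(api_key, group_id, member_id, fields,
                          start_ts, end_ts, average=0):
    """Build (but do not send) a CSV history request for one group member.

    average is in minutes; the thread mentions 0 (real-time), 10, 30, 60
    and 1440 as options.
    """
    query = urlencode(
        {
            "start_timestamp": start_ts,
            "end_timestamp": end_ts,
            "average": average,
            "fields": ",".join(fields),  # e.g. swap in other field names here
        },
        safe=",",  # keep the comma-separated field list readable
    )
    url = f"{BASE_URL}/groups/{group_id}/members/{member_id}/history/csv?{query}"
    return Request(url, headers={"X-API-Key": api_key})


# Example with placeholder IDs; pm2.5_alt is the field discussed above.
request = build_history_request("YOUR-READ-KEY", 1234, 5678,
                                ["pm2.5_alt"], 1672531200, 1675209600)
# To actually download (needs a key with historical access):
#   from urllib.request import urlopen
#   raw_bytes = urlopen(request).read()   # CSV text as bytes, not JSON
```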

I’m running this code on Windows 10 with python 3.11. I do believe you’ll need to download a couple modules to run this script if you don’t already have them (specifically the requests module). I just had my undergraduate assistant do it on his computer, and it seemed to work easily enough. I do think there could be issues if you’re running an older version of python that isn’t compatible with some of these modules.

Lance · March 1, 2023, 7:27pm

Bob and Aaron–

Aaron is right about the pm2.5_alt variable being my corrected PM2.5 data. This algorithm avoids both of the proprietary Plantower algorithms, CF_1 and CF_ATM, and thus, as Aaron states, uses only the secondary output, specifically the smallest four of the six number fields: >0.3 um, >0.5 um, >1 um, and >2.5 um. Of course, one needs to subtract each succeeding value from the one before it to get N1, N2, and N3 for the 0.3-0.5 um, 0.5-1 um, and 1-2.5 um size categories contributing to PM2.5. Then there is a simple equation to determine PM2.5 from N1, N2, and N3, which PurpleAir has implemented to give the PM2.5 estimate directly.
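The subtraction step described here can be sketched in pandas as follows. The column names are assumptions about how the cumulative count fields appear in a downloaded file, and `size_bin_counts` is a hypothetical helper. The final PM2.5 equation is deliberately not reproduced, since its coefficients are not given in this thread.

```python
import pandas as pd

# Assumed names for the cumulative counts (>0.3, >0.5, >1.0, >2.5 um).
COUNT_COLS = ("0.3_um_count", "0.5_um_count", "1.0_um_count", "2.5_um_count")


def size_bin_counts(df: pd.DataFrame, cols=COUNT_COLS) -> pd.DataFrame:
    """Turn cumulative '>size' counts into per-bin counts N1, N2, N3."""
    c03, c05, c10, c25 = (df[c] for c in cols)
    return pd.DataFrame(
        {
            "N1": c03 - c05,  # particles between 0.3 and 0.5 um
            "N2": c05 - c10,  # particles between 0.5 and 1 um
            "N3": c10 - c25,  # particles between 1 and 2.5 um
        },
        index=df.index,
    )


example = pd.DataFrame(
    {"0.3_um_count": [100], "0.5_um_count": [40],
     "1.0_um_count": [10], "2.5_um_count": [2]}
)
bins = size_bin_counts(example)  # N1=60, N2=30, N3=8 for this row
```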

fuhad01 · March 6, 2023, 8:47am

hi Aaron.

please, what’s the maximum average_time limit supported?

Aaron_Lamplugh · March 6, 2023, 3:11pm

That goes back to Zuber’s original script. He has the following options for the average time:

0 (real-time), 10 (default if not specified), 30, 60

So I guess that would be hourly. Of course, you can always average it any way you want after the download. That’s why I always pull the real-time data.

Zuber_Farooqui · March 10, 2023, 11:55pm

The maximum averaging time goes up to 24 hours (1440 minutes). I did not include that option in the script, and I don’t recommend using it, because the data are in UTC. The API will generate 24-hour averages based on UTC days, which may not be useful for your location. I suggest getting hourly averages, converting them to your time zone, and then calculating the 24-hour averages yourself.

Thanks!
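A minimal sketch of that suggestion with pandas: take hourly rows stamped in UTC, convert them to a local time zone, and then average by local calendar day. It assumes the timestamp column is called `time_stamp` and holds UNIX seconds, and that all other columns are numeric; `local_daily_mean` is a hypothetical helper, not part of either script.

```python
import pandas as pd


def local_daily_mean(hourly: pd.DataFrame, tz: str,
                     time_col: str = "time_stamp") -> pd.DataFrame:
    """Average hourly UTC data over local calendar days in time zone tz."""
    utc_index = pd.DatetimeIndex(
        pd.to_datetime(hourly[time_col], unit="s", utc=True)
    )
    values = hourly.drop(columns=[time_col])
    values.index = utc_index.tz_convert(tz)  # e.g. "America/Denver"
    return values.resample("D").mean()       # bins start at local midnight


# 48 hourly points starting at 07:00 UTC, which is local midnight in Denver
# in January: the first local day is all 10s, the second all 20s.
start = 1672531200 + 7 * 3600
hourly = pd.DataFrame(
    {
        "time_stamp": [start + 3600 * i for i in range(48)],
        "pm2.5_alt": [10.0] * 24 + [20.0] * 24,
    }
)
daily = local_daily_mean(hourly, "America/Denver")
```

Averaging the same example by UTC day instead would mix values from two different local days, which is the problem described above.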

Marcello_Barisonzi · March 21, 2023, 3:56pm

Hi Aaron,

what do you mean when you write “it needs to have approval to access historical endpoints”?

How does one get approval?

Thanks,
Marcello

Aaron_Lamplugh · March 21, 2023, 4:43pm

Marcello, you need an API read key to access any data through the PA API, but in order to access historical data you need additional permissions. You can gain these by emailing PA support and requesting them. Until you have these permissions, you won’t be able to run this script.
