Python code using API to get historical data

Zuber_Farooqui · June 14, 2022, 10:44pm

Here is Python code to download historical PurpleAir data with new AQI. Test it to find bugs.
Thank you!!

jcothran · June 21, 2022, 2:53pm

Thanks, nice work!

Is there an example of filtering the download request for the following?

only the primary sensor (channel A, not the redundant channel B)
‘outside’ sensors (not ‘inside’)

Also, for larger regional analyses, would another approach be to download larger areas at some time interval (day/month/year/etc.) and share the downloaded results on GitHub or a shared Google Drive, rather than repeatedly taxing the API for the same historical regional datasets?

What is the average daily amount of data currently collected by all PurpleAir sensors? I would guess less than a gigabyte/day.

Thanks,
Jeremy

Zuber_Farooqui · June 22, 2022, 7:48pm

Hi Jeremy,

I have updated the script to reflect choices of ‘location_type’ as indoor/outdoor/both. Please refer to the script lines: 27, 51-57, 60, 172-173.

To get data from the primary sensor only (channel A), request only the channel A fields. Please refer to the API documentation and search for ‘fields’ for more options.
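A minimal sketch of how those two filters could be combined into one query string. It assumes that the API's `location_type` parameter uses 0 for outside and 1 for inside, and the helper name `build_sensor_query` and the channel A field names are illustrative, not taken from the script; check both against the API documentation.

```python
from urllib.parse import urlencode

# Assumption: location_type 0 = outside, 1 = inside (verify in the API docs).
OUTSIDE = 0

# Illustrative channel A field names, not the script's actual fields_list.
fields_a = ["pm2.5_atm_a", "humidity_a", "temperature_a", "pressure_a"]


def build_sensor_query(fields, location_type=None):
    """Return a URL query string requesting the given fields (hypothetical helper)."""
    params = {"fields": ",".join(fields)}
    if location_type is not None:
        params["location_type"] = location_type
    # Keep commas unescaped so the field list stays readable.
    return urlencode(params, safe=",")


query = build_sensor_query(fields_a, location_type=OUTSIDE)
print(query)
```

Leaving `location_type` out of the call requests both indoor and outdoor sensors.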

The size of the data depends on the averaging interval (minutes, hourly, daily), the period covered, and the number of monitors.

Let me know if the above works for you.
Thanks!

fvelarde · June 28, 2022, 9:34pm

Hi Zuber, I have been working with your script, but I am having trouble with this line:
engine = create_engine('postgresql://postgres:password@location:port/database')
Could you please tell me what user, password, location, and port this line refers to?

thanks in advance

Zuber_Farooqui · June 28, 2022, 9:59pm

Hi Fernando,

The ‘engine’ is only needed if you are using a SQL database to save the downloads in a table. In my case, I am using PostgreSQL to save data in tables, so the user, password, location (host), port, and database are those of your own PostgreSQL server. More information can be found on the "Engine Configuration" page of the SQLAlchemy 1.4 documentation.

If you just need the data in CSV, then comment out lines 19, 22, 74, and 160.
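As a rough illustration of what each part of that connection string means, here is a sketch that assembles the URL from its pieces. The helper `postgres_url` and every value passed to it are placeholders; 5432 is only PostgreSQL's default port and may differ on your server.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host, port, database):
    """Assemble a SQLAlchemy-style PostgreSQL URL from its parts (hypothetical helper)."""
    # Percent-encode the password so characters such as '@' or '/'
    # are not mistaken for URL separators.
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# Placeholder values only; substitute the details of your own server.
url = postgres_url("postgres", "p@ss/word", "localhost", 5432, "purpleair")
print(url)
```

The resulting string is what would be passed to `create_engine(...)`.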

Let me know if this works.

Thanks!

fvelarde · June 29, 2022, 3:11am

Hi Zuber
It runs well, Thank you.
About the field list: I had to change ‘humidity_a’, ‘humidity_b’ to ‘humidity’, ‘humidity_a’ in order to get both measurements (the same goes for temperature and pressure).

Zuber_Farooqui · June 29, 2022, 4:18pm

Hi Fernando,

I am glad that it worked for you. Heads up about humidity and other fields.

Variations:
humidity returns the average of channels A and B.
humidity_a returns channel A data.
humidity_b returns channel B data.
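The naming pattern above could be captured in a small helper like the sketch below. `channel_fields` is a hypothetical function, and whether every suffixed name is actually available for a given sensor is an assumption to check against the API's field list.

```python
def channel_fields(base_names, channel=None):
    """Return field names for the averaged value (channel=None) or one channel ('a' or 'b')."""
    if channel is None:
        return list(base_names)
    if channel not in ("a", "b"):
        raise ValueError("channel must be 'a', 'b', or None")
    return [f"{name}_{channel}" for name in base_names]


base = ["humidity", "temperature", "pressure"]

# Averaged fields plus the channel A variants, as described in the post above.
fields_list = channel_fields(base) + channel_fields(base, "a")
print(fields_list)
```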

Thanks!

ppolonik · August 10, 2022, 4:40am

Thanks for sharing! This is a great starting point that means I don’t have to start from zero - a huge time saver. For some reason the fields are out of order for me - regardless of the order of fields_list, the json has ‘location_type’ before lat/lon. So if I just move ‘location_type’ to be the third item in fields_list, it fixes the problem. Not sure why, but it’s an easy fix.
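One way to sidestep the ordering issue is to label columns from the field list the API sends back rather than from the requested `fields_list`. This sketch assumes the JSON response carries a `fields` list alongside a `data` list of rows; the sample response and the helper `rows_to_records` are made up for illustration.

```python
def rows_to_records(response_json):
    """Pair each data row with the field names returned by the API, in the API's order."""
    fields = response_json["fields"]
    return [dict(zip(fields, row)) for row in response_json["data"]]


# Hypothetical response in which 'location_type' arrives before lat/lon.
sample = {
    "fields": ["sensor_index", "name", "location_type", "latitude", "longitude"],
    "data": [[131075, "Example sensor", 0, 37.0, -122.0]],
}

records = rows_to_records(sample)
print(records[0]["latitude"], records[0]["longitude"])
```

Because each value is looked up by name, the order of `fields_list` no longer matters.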

I am also planning to add (1) shapefile compatibility, so I can select sensors in a certain region instead of just a bounding box, and (2) grouped requests, so that I can follow API guidelines and avoid submitting many requests for many sensors.
These would be nice features if you are hoping to expand your code in the future.

Edit addition: I agree with the first comment that it seems like historical data should be available for download without this complicated API. At least at some moderately aggregated times (60 min, 1440 min). But that’s not an issue with your code, just a general recommendation for PA that would decrease the download burden.

Thanks again!

Ethan · August 10, 2022, 2:57pm

So you are aware, PurpleAir data can be downloaded without using the API through our Sensor Data Download Tool. More information is available here: Download Sensor Data.

ppolonik · August 10, 2022, 3:07pm

Fair point, there is indeed an easier way already! But this method does not work for larger areas. I think once people create code through their preferred languages to easily use the API (as Zuber has kindly started to do), the barrier to use will be a lot lower.

Zuber_Farooqui · August 10, 2022, 4:06pm

@ppolonik your plans to include shapefile compatibility and group requests are great. Please share once you are done and I will include that in my code.

Thanks!!!

ppolonik · August 10, 2022, 9:59pm

Well it sounds like it is not possible to access historical data right now without special permission. Hopefully this is temporary or permission is granted soon.

But in the meantime, I am trying to create a group. I was able to create one using:
requests.post('https://api.purpleair.com/v1/groups/?api_key=***&name=GroupName')
(where the key is now the write key)

However, when I go to add members, I get a 400 error.
requests.post('https://api.purpleair.com/v1/groups/GroupName/members?api_key=***&sensor_index=131075')
I have tried a variety of iterations on this but nothing has worked. If anyone has suggestions, please let me know!
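One thing that may be worth trying, sketched below, is to put the numeric group id in the URL instead of the group name. This is only a guess at the cause of the 400 error and is not confirmed in this thread; it assumes the response to the group-creation request includes a numeric id. The helper `add_member_url` and the id 1234 are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://api.purpleair.com/v1/groups"


def add_member_url(group_id, sensor_index, write_key):
    """Build the add-member URL using a numeric group id (assumption, see lead-in)."""
    query = urlencode({"api_key": write_key, "sensor_index": sensor_index})
    return f"{BASE}/{int(group_id)}/members?{query}"


# 1234 stands in for whatever id the group-creation response returned.
url = add_member_url(group_id=1234, sensor_index=131075, write_key="WRITE_KEY")
print(url)
# To send it: requests.post(url)
```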

Another unrelated code suggestion - I think as written the end date needs to be at least 14 days after the start date for 60 min averages to work. So after the creation of date_list, I added:

if enddate not in date_list:
    date_list = date_list.append(pd.DatetimeIndex([enddate]))
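The same idea can be shown without pandas. This sketch assumes, as the post implies, that the date list advances in 14-day steps; the function `date_boundaries` is illustrative and not part of the script. It always appends the end date, so a range shorter than one step still yields a window.

```python
from datetime import date, timedelta


def date_boundaries(start, end, step_days=14):
    """Return boundaries from start in step_days steps, always finishing at end."""
    boundaries = []
    current = start
    while current < end:
        boundaries.append(current)
        current += timedelta(days=step_days)
    boundaries.append(end)  # guarantees the final, possibly shorter, window
    return boundaries


bounds = date_boundaries(date(2022, 6, 1), date(2022, 6, 20))
# Consecutive boundaries form the (start, end) pairs for each request.
windows = list(zip(bounds[:-1], bounds[1:]))
print(windows)
```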
