Python code using API to get historical data

Here is Python code to download historical PurpleAir data with new AQI. Test it to find bugs.
Thank you!!

Thanks, nice work!

Is there an example of filtering the download request for the following?

  • only primary sensor (A not redundant B)
  • ‘outside’ sensors (not ‘inside’)

Also, for larger regional analyses, would another approach be to download larger areas by some time interval (day, month, year, etc.) and share the results on GitHub or a shared Google Drive, rather than repeatedly taxing the API for the same historical regional datasets?

What would the average daily amount of data collected by all purple air sensors currently be? I would guess less than a gigabyte/day.

Thanks,
Jeremy

Hi Jeremy,

I have updated the script to reflect choices of ‘location_type’ as indoor/outdoor/both. Please refer to the script lines: 27, 51-57, 60, 172-173.

To get only primary sensor data (channel A), request only the channel A fields. Please refer to the API documentation and search for ‘fields’ for more options.
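
As a rough sketch of the two filters discussed above, the query parameters could be assembled as follows. This is not taken from the script: `build_sensor_params` is a hypothetical helper, the field name `pm2.5_atm_a` and the parameter names `fields` and `location_type` (with 0 assumed to mean outside) should be checked against the API documentation.

```python
def build_sensor_params(fields, outdoor_only=True):
    """Return query parameters for a sensors request (hypothetical helper)."""
    params = {"fields": ",".join(fields)}
    if outdoor_only:
        # Assumption: the API uses location_type=0 for outside, 1 for inside.
        params["location_type"] = 0
    return params


# Channel A only: request the *_a variants and leave out the *_b ones.
channel_a_fields = ["name", "latitude", "longitude", "humidity_a", "pm2.5_atm_a"]
params = build_sensor_params(channel_a_fields)
print(params)
```

The resulting dictionary can be passed as the query string of whatever HTTP call the script already makes.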

The size of the data depends on the averaging interval (minutes, hourly, daily), the period covered, and the number of monitors.

Let me know if the above works for you.
Thanks!

Hi Zuber, I have been working with your script but I am having trouble with this line:
engine = create_engine('postgresql://postgres:password@location:port/database')
Could you please tell me what user, password, location, and port this line refers to?

thanks in advance

Hi Fernando,

The ‘engine’ is only needed if you are using an SQL database to save the downloads in a table. In my case, I am using PostgreSQL to save the data in tables. More information can be found on the Engine Configuration page of the SQLAlchemy 1.4 documentation.
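
To make the parts of that connection string concrete, here is a minimal sketch that assembles one from its pieces: the user is the database role, the password belongs to that role, the location is the server's host name or IP address, the port is the one the server listens on (5432 is the PostgreSQL default), and the last part is the database name. The helper `postgres_url` and every value passed to it are placeholders, not settings from the script.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host, port, database):
    """Assemble a PostgreSQL connection URL of the form passed to create_engine."""
    # Escape the password so characters such as '@' or ':' do not break the URL.
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# Placeholder values: replace them with your own server's settings.
url = postgres_url("postgres", "my@secret", "localhost", 5432, "purpleair")
print(url)
```

The resulting string is what would go inside `create_engine(...)` when the database route is used.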

If you just need the data in CSV, then comment out lines 19, 22, 74, and 160.

Let me know if this works.

Thanks!

Hi Zuber
It runs well, thank you.
About the field list, I had to change ‘humidity_a’, ‘humidity_b’ to ‘humidity’, ‘humidity_a’ in order to get both measurements (the same for temp and pressure).

Hi Fernando,

I am glad that it worked for you. Heads up about humidity and other fields.

Variations:
humidity returns average for channel A and B.
humidity_a returns channel A data.
humidity_b returns channel B data.
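
The naming pattern above can be sketched with a small helper that expands a base field name into the variants wanted. `field_variants` is a hypothetical function written for illustration; only the `humidity` names come from this thread.

```python
def field_variants(base, average=True, channel_a=False, channel_b=False):
    """Expand a base field name into its average / channel A / channel B names."""
    names = []
    if average:
        names.append(base)          # average of channels A and B
    if channel_a:
        names.append(f"{base}_a")   # channel A only
    if channel_b:
        names.append(f"{base}_b")   # channel B only
    return names


# The combination Fernando ended up with: the average plus channel A.
fields_list = field_variants("humidity", average=True, channel_a=True)
print(fields_list)
```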

Thanks!

Thanks for sharing! This is a great starting point that means I don’t have to start from zero - a huge time saver. For some reason the fields are out of order for me - regardless of the order of fields_list, the json has ‘location_type’ before lat/lon. So if I just move ‘location_type’ to be the third item in fields_list, it fixes the problem. Not sure why, but it’s an easy fix.
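
One way to sidestep the ordering problem, sketched under the assumption that the response carries its own `fields` list next to the `data` rows, is to label each value with the name the response reports instead of the order of fields_list. The payload below is made up for illustration and `rows_as_dicts` is a hypothetical helper.

```python
# Made-up response fragment: note that location_type comes before lat/lon.
response_json = {
    "fields": ["sensor_index", "name", "location_type", "latitude", "longitude"],
    "data": [[1001, "Example sensor", 0, 37.0, -122.0]],
}


def rows_as_dicts(payload):
    """Pair every value with the field name reported by the response itself."""
    fields = payload["fields"]
    return [dict(zip(fields, row)) for row in payload["data"]]


rows = rows_as_dicts(response_json)
print(rows[0]["latitude"], rows[0]["location_type"])
```

With rows keyed by name, the order in which fields were requested no longer matters.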

I am also planning to add (1) shapefile compatibility, so I can select sensors in a certain region instead of just a bounding box, and (2) grouped requests, so that I can follow API guidelines and avoid submitting many requests for many sensors.
These would be nice features if you are hoping to expand your code in the future.

Edit addition: I agree with the first comment that it seems like historical data should be available for download without this complicated API. At least at some moderately aggregated times (60 min, 1440 min). But that’s not an issue with your code, just a general recommendation for PA that would decrease the download burden.

Thanks again!

So you are aware, PurpleAir data can be downloaded without using the API through our Sensor Data Download Tool. More information is available here: Download Sensor Data.

Fair point, there is indeed an easier way already! But this method does not work for larger areas. I think once people create code through their preferred languages to easily use the API (as Zuber has kindly started to do), the barrier to use will be a lot lower.

@ppolonik your plans to include shapefile compatibility and group requests are great. Please share once you are done and I will include that in my code.

Thanks!!!

Well it sounds like it is not possible to access historical data right now without special permission. Hopefully this is temporary or permission is granted soon.

But in the meantime, I am trying to create a group. I was able to create one using:
requests.post('https://api.purpleair.com/v1/groups/?api_key=***&name=GroupName')
(where the key is now the write key)

However, when I go to add members, I get a 400 error.
requests.post('https://api.purpleair.com/v1/groups/GroupName/members?api_key=***&sensor_index=131075')
I have tried a variety of iterations on this but nothing has worked. If anyone has suggestions, please let me know!
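
A possible explanation, not confirmed anywhere in this thread: the members endpoint may expect the numeric group id returned when the group was created, rather than the group's name. The sketch below only builds the request pieces under that assumption and sends nothing; `add_member_request` is a hypothetical helper and 12345 is a placeholder id.

```python
API_ROOT = "https://api.purpleair.com/v1"


def add_member_request(group_id, sensor_index, write_key):
    """Build (but do not send) the URL and parameters for adding one sensor."""
    # Assumption: the path takes the numeric group id, not the group name.
    url = f"{API_ROOT}/groups/{int(group_id)}/members"
    params = {"api_key": write_key, "sensor_index": int(sensor_index)}
    return url, params


url, params = add_member_request(12345, 131075, "YOUR_WRITE_KEY")
print(url)
# With the requests library this could then be sent as:
#   requests.post(url, params=params)
```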

Another unrelated code suggestion - I think as written the end date needs to be at least 14 days after the start date for 60 min averages to work. So after the creation of date_list, I added:

if enddate not in date_list:
    date_list = date_list.append(pd.DatetimeIndex([enddate]))
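
The same idea can be sketched without pandas, assuming the script walks through the period in 14-day steps as described above. This uses the standard `datetime` module instead of the script's `pd.DatetimeIndex`, and `window_edges` is a hypothetical helper, so treat it as an illustration rather than a drop-in replacement.

```python
from datetime import date, timedelta


def window_edges(start, end, days=14):
    """List the window boundaries from start to end, always ending at end."""
    edges = []
    current = start
    while current < end:
        edges.append(current)
        current += timedelta(days=days)
    edges.append(end)  # close the last (possibly shorter) window at the end date
    return edges


edges = window_edges(date(2022, 1, 1), date(2022, 1, 20))
# Consecutive pairs of edges give the (start, end) of each request window.
windows = list(zip(edges[:-1], edges[1:]))
print(windows)
```

A range shorter than 14 days then still yields one window, which is the case the fix above is meant to cover.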