Research Considerations

This article outlines considerations for academics and researchers who want to use PurpleAir data or sensors. Each step of the research process comes with caveats to keep in mind.

PurpleAir data can be used in a variety of ways, and this versatility lends itself to many types of research. For a non-exhaustive list of completed research articles on or using PurpleAir data or sensors, see: Research Papers using PurpleAir Data or PurpleAir Sensors


What PurpleAir Offers

PurpleAir offers a global sensor network and an expansive dataset dating back to 2016. PurpleAir sensors report air quality and environmental condition data.

Types of Data

  • Particulate Matter (PM): PurpleAir offers PM data for particle sizes ranging from 0.3µm to 10µm. Values are estimated using Plantower laser counters. We are primarily known for the precision of our PM data for sizes 2.5µm and below.

Disclaimer: PurpleAir sensors have been evaluated to be very precise at estimating PM2.5 and below. However, our PM10 data is not of the same quality, and we do not recommend using it.

  • Temperature/Pressure/Humidity: PurpleAir offers these measurements using Bosch BME sensor boards.

  • Volatile Organic Compounds (VOCs): PurpleAir offers experimental VOC data using a Bosch BME680/688 sensor board. These are total VOC (TVOC) values, meaning the sensors cannot differentiate between individual VOCs.

Disclaimer: The VOC measurements from PurpleAir devices are purely experimental. We have not seen any research or evaluations indicating that they are accurate. We only recommend using VOC data if the purpose of your research is to evaluate its quality.

  • Other Information: PurpleAir sensors also provide assorted other data, some of which may be calculated from the values above.

Most of the data above is available at resolutions ranging from real-time (2-minute averages) to yearly averages.

Sensors

The section above primarily discusses PurpleAir data. We also offer an array of sensors for collecting your own data. If you are interested in purchasing sensors for research purposes, refer to our Which Sensor to Choose article. Factors to consider include:

  1. MicroSD logging capability.

  2. Durability for indoor/outdoor environments.

  3. Laser counter replacement process.

Outdoor-rated PurpleAir sensors (the Classic, Classic SD, Flex, and Zen models) contain two laser counters that conduct separate measurements.


Data Collection Process

When discussing the data collection stage of research, there are a few things to keep in mind. For the purposes of this article, we are organizing “data” into two categories: data from existing datasets and data from owned sensors.

Existing Datasets

When we talk about “existing” datasets, we’re referring to data that has already been sent to PurpleAir servers. This includes the entire history of the dataset going back to 2016, as well as live data uploaded in real time. This data can generally be sorted into two categories: live data and historical data.

  • Historical data involves retrieving large datasets covering past periods and is typically used for historical analysis. For example, we have seen these datasets compared with historical events to find correlations.

  • Live data involves frequently pulling smaller datasets to monitor an area in real time. It is typically used when a live feed of data is needed; for example, we have seen it used for city or county dashboards.

Both types of data are retrieved via the PurpleAir API.
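Historical pulls often cover more time than is practical to request at once. One common approach, sketched below under stated assumptions, is to split the date range into fixed windows and request each window separately. The `max_span` value is a hypothetical per-request limit, not a documented PurpleAir figure; check the About the PurpleAir API article for the actual limits and for the parameter names that history requests use.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, max_span):
    """Split [start, end) into consecutive windows no longer than max_span."""
    if end <= start:
        raise ValueError("end must be after start")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_span, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def to_unix_seconds(dt):
    """Convert a timezone-aware datetime to integer Unix seconds."""
    return int(dt.timestamp())


if __name__ == "__main__":
    start = datetime(2023, 1, 1, tzinfo=timezone.utc)
    end = datetime(2023, 2, 1, tzinfo=timezone.utc)
    # Hypothetical 14-day limit per request; replace with the real limit.
    for window_start, window_end in split_time_range(start, end, timedelta(days=14)):
        print(to_unix_seconds(window_start), to_unix_seconds(window_end))
```

Each window's start and end timestamps would then go into a separate history request, with the results concatenated afterward.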

The PurpleAir API

The API is the medium through which you can retrieve PurpleAir data from our servers. To get started with the API, check out our About the PurpleAir API article.

It is important to note that the API is a paid service using a points-based billing system. Details on points cost can be found in our API Pricing article. When planning research projects, we ask that you budget for data costs.

If you know what data you’re looking for, you can utilize our How Do I Calculate My API Call’s Point Cost article to estimate how many points you may need. If the data you’re looking for seems too expensive to be sustainable, you can make your API calls more efficient to reduce cost. If you have further concerns, please reach out to contact@purpleair.com.
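For live monitoring, the number of calls grows with polling frequency, so a quick back-of-the-envelope estimate helps with budgeting. The sketch below assumes you have already worked out a per-call point cost (`points_per_call`) using the point cost article mentioned above; the 50-point figure in the example is a placeholder, not an actual PurpleAir price.

```python
import math


def calls_needed(period_s, poll_interval_s):
    """Number of API calls needed to poll every poll_interval_s over period_s seconds."""
    return math.ceil(period_s / poll_interval_s)


def estimate_points(period_s, poll_interval_s, points_per_call):
    """Total points for a polling schedule, given a per-call cost you calculated."""
    return calls_needed(period_s, poll_interval_s) * points_per_call


if __name__ == "__main__":
    thirty_days = 30 * 24 * 60 * 60
    ten_minutes = 10 * 60
    # points_per_call=50 is a placeholder value for illustration only.
    print(calls_needed(thirty_days, ten_minutes))         # 4320 calls
    print(estimate_points(thirty_days, ten_minutes, 50))  # 216000 points
```

Halving the polling frequency halves the call count, which is one of the simplest ways to make a live feed cheaper.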

There are also some other pieces of information to keep in mind when downloading data.

  • Indoor/Outdoor Sensors: The PurpleAir network consists of indoor and outdoor sensors. Consider whether this is important for you as you pull data from sensors. If needed, you can isolate your sensor list to indoor or outdoor sensors using the API parameter location_type.

  • Public/Private Sensors: Sensor owners can register devices as public or private. Data openly available on the map and through the API comes entirely from public sensors. Retrieving data from private sensors requires information that only the sensor owner can access.

  • Channel-Specific Data vs Averaged Data: When retrieving data from the API and through other retrieval methods, you will see data fields with the suffix _a or _b. These are the readings from channel A and channel B of a device, which correspond to the two separate laser counters mentioned above. You can pull data from an individual channel, or you can pull the averaged field.

  • CF=1, ATM, and ALT: PurpleAir devices do not measure mass concentrations (µg/m³) directly. Rather, they estimate µg/m³ readings from particle counts per deciliter. CF=1, ATM, and ALT are the formulas used to produce these estimates. More information is available in this community article.

  • Station Information Fields: In addition to the data produced by the sensors, you can retrieve data about the sensors themselves, such as latitude, longitude, hardware, or firmware_version. This data does not update frequently, so you do not need to query it as often.
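The considerations above translate into request parameters. The sketch below builds a multi-sensor request that limits results to outdoor sensors and asks for both channel-specific and averaged PM2.5 fields plus one station information field. It assumes the `https://api.purpleair.com/v1/sensors` endpoint, an `X-API-Key` request header, `location_type=0` meaning outdoor, and a response whose `fields` list names the columns of each row in `data`; confirm all of these in the About the PurpleAir API article before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for querying multiple sensors.
SENSORS_URL = "https://api.purpleair.com/v1/sensors"


def build_sensors_request(api_key, fields, location_type=None):
    """Return (url, headers) for a multi-sensor query."""
    params = {"fields": ",".join(fields)}
    if location_type is not None:
        # Assumed encoding: 0 = outdoor, 1 = indoor.
        params["location_type"] = location_type
    url = f"{SENSORS_URL}?{urlencode(params)}"
    headers = {"X-API-Key": api_key}  # assumed header name
    return url, headers


def rows_to_dicts(payload):
    """Pair each row in payload["data"] with the column names in payload["fields"]."""
    columns = payload["fields"]
    return [dict(zip(columns, row)) for row in payload["data"]]


if __name__ == "__main__":
    fields = ["pm2.5_atm_a", "pm2.5_atm_b", "pm2.5_atm", "firmware_version"]
    url, headers = build_sensors_request("YOUR_API_KEY", fields, location_type=0)
    print(url)
```

Because station information changes rarely, you could request fields such as `firmware_version` in an occasional separate call and keep frequent calls limited to the measurements you need.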

Owned Sensors

When we discuss owned sensors, we’re referring to PurpleAir sensors that are owned by you or your organization. Given that owned sensors also contribute to existing datasets, the above still applies. This means that if the sensors have an internet connection, you can retrieve their data via the API. However, there are a couple of main differences to keep in mind, as well as some new data retrieval options available to you.

These retrieval methods will only work if the sensor is functioning as intended. To find strategies for reducing data loss, check out our Data Collection and Reliability article.

Owner Differences

As a sensor owner, there are a couple of differences between querying data from your own sensors versus other sensors.

  1. Sensor owners have access to the information necessary to query data from their private sensors. If you need to query from a private sensor, you will need to include the sensor’s private read key in your API call. More information is available in this community article: View Your Sensor on the Map

  2. Sensor owners do not have to pay for the data from their own sensors. As per our API Points for Sensor Owners article, you can retrieve that data at no cost.
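As a sketch of querying one of your own private sensors, the function below adds the private read key to a single-sensor request. The endpoint path (`/v1/sensors/<sensor_index>`), the `read_key` query parameter name, and the `X-API-Key` header are assumptions to verify in the About the PurpleAir API article; the sensor index and keys in the example are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL for single-sensor queries.
SENSORS_URL = "https://api.purpleair.com/v1/sensors"


def build_private_sensor_request(api_key, sensor_index, read_key, fields):
    """Return (url, headers) for one private sensor, including its read key."""
    params = {"read_key": read_key, "fields": ",".join(fields)}
    url = f"{SENSORS_URL}/{int(sensor_index)}?{urlencode(params)}"
    return url, {"X-API-Key": api_key}


def fetch_json(url, headers, timeout=30):
    """Send a GET request and decode the JSON response body."""
    with urlopen(Request(url, headers=headers), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url, headers = build_private_sensor_request(
        "YOUR_API_KEY", 12345, "YOUR_READ_KEY", ["pm2.5_atm"]
    )
    print(url)  # network call omitted; use fetch_json(url, headers) to send it
```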

Additional Data Retrieval Options

As a sensor owner, there are a couple of alternative data retrieval options available to you.

  1. SD Card Data: Most PurpleAir models support SD logging hardware (the exceptions being the PurpleAir Classic and the PurpleAir Touch). As long as an appropriately formatted SD card is inserted, the sensor will log data to it, and you can then retrieve that data physically from the card. SD card data is organized under its own set of file headers.

  2. Local Querying: When a sensor is connected to a WiFi network, whether that network has internet access or not, you can retrieve JSON data from it using its IP address. This community article provides more information.
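A minimal sketch of local querying, assuming the sensor serves JSON at `http://<ip>/json`, with `?live=true` returning the latest reading, and that channel A and B PM2.5 values appear under the keys `pm2_5_atm` and `pm2_5_atm_b`. These paths and key names are assumptions; inspect your own sensor's JSON output to confirm them. The IP address in the example is a placeholder.

```python
import json
from urllib.request import urlopen


def local_json_url(ip_address, live=False):
    """Build the assumed local JSON URL for a sensor on the same network."""
    return f"http://{ip_address}/json" + ("?live=true" if live else "")


def fetch_local_json(ip_address, live=False, timeout=10):
    """Fetch and decode the sensor's local JSON document."""
    with urlopen(local_json_url(ip_address, live), timeout=timeout) as response:
        return json.load(response)


def mean_pm25(reading):
    """Average the channel A and B PM2.5 values that are present.

    Assumed keys: "pm2_5_atm" (channel A) and "pm2_5_atm_b" (channel B).
    Single-channel devices simply return channel A.
    """
    keys = ("pm2_5_atm", "pm2_5_atm_b")
    values = [reading[k] for k in keys if reading.get(k) is not None]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(local_json_url("192.168.1.50", live=True))
    # reading = fetch_local_json("192.168.1.50", live=True)  # needs a reachable sensor
```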


Data Presentation

All PurpleAir data is available without any QA/QC processes applied. Because this data is collected from sensors run by an assortment of individuals and organizations, you may wish to apply your own. The information below is meant to help you understand how PurpleAir presents data, which differs between the map, the API, and other retrieval methods.

Collection Frequency

Knowing how data is collected and aggregated by the sensors may affect what data you want to use in your research.

  • Outdoor-rated PurpleAir sensors, which make up the majority of sensors reporting to PurpleAir, contain two laser counters that conduct separate measurements. When a device contains two laser counters, the data from both is combined into a single average. Sensors alternate readings between the two laser counters every five seconds: one laser counter takes readings for five seconds, then the other does the same, and so on. These five-second readings are then averaged into a 2-minute “real-time” average.

  • Lower-resolution samples are created by further averaging these averages. For example, five 2-minute samples are averaged into a 10-minute sample. Then, three 10-minute samples are averaged into a 30-minute sample, and two 30-minute samples are averaged into a 1-hour sample. All longer averages are calculated from 1-hour averages.
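The aggregation chain described above can be sketched as successive block averages. This illustrates the arithmetic only and is not PurpleAir's server code; it assumes complete, evenly spaced samples with no gaps, and that the alternating five-second readings start with channel A.

```python
from statistics import mean


def block_means(values, group_size):
    """Average consecutive, non-overlapping groups of group_size values."""
    if len(values) % group_size:
        raise ValueError("length must be a multiple of group_size")
    return [mean(values[i:i + group_size]) for i in range(0, len(values), group_size)]


def two_minute_average(five_second_readings):
    """Average 24 alternating five-second readings (A, B, A, B, ...)."""
    if len(five_second_readings) != 24:
        raise ValueError("expected 24 five-second readings per 2 minutes")
    channel_a = five_second_readings[0::2]  # assumed: A readings come first
    channel_b = five_second_readings[1::2]
    return mean(channel_a), mean(channel_b), mean(five_second_readings)


def hourly_from_two_minute(two_minute):
    """Roll two-minute samples (a multiple of 30) up to 10-min, 30-min, and 1-hour."""
    ten_minute = block_means(two_minute, 5)     # 5 x 2-min  -> 10-min
    thirty_minute = block_means(ten_minute, 3)  # 3 x 10-min -> 30-min
    hourly = block_means(thirty_minute, 2)      # 2 x 30-min -> 1-hour
    return ten_minute, thirty_minute, hourly


if __name__ == "__main__":
    print(two_minute_average([10.0, 20.0] * 12))  # (10.0, 20.0, 15.0)
    ten, thirty, hourly = hourly_from_two_minute(list(range(30)))
    print(hourly)  # [14.5]
```

Because every group here is complete and equal in size, the mean of the means equals the mean of all underlying samples (14.5 for the values 0 to 29). When samples are missing, that equivalence no longer holds, which is worth keeping in mind when comparing averages across resolutions.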

Channel Downgrades and Confidence Score

There are two indicators we use to determine the quality of data from any given sensor. Both are used on the PurpleAir map and can be queried from the API.

  • Channel Downgrades: As mentioned above, the majority of PurpleAir sensors contain two laser counters (or channels) that conduct individual air quality measurements. If we believe that one channel is producing erroneous readings because it is reading much higher or lower than the other channel, it will be “downgraded.” On the PurpleAir map, this means the downgraded channel’s readings will not be considered when calculating sensor values. Downgrades can be applied automatically by the system or manually by a PurpleAir staff member.

  • The Confidence Score: This is a measure of how “confident” PurpleAir is in the readings a sensor is producing. The disparity between the two channels in a device, as well as the above-mentioned downgrades, affect the Confidence Score. More information is available in this community article.
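Since the data arrives without QA/QC, researchers sometimes apply their own channel-agreement screen. The sketch below is a hypothetical example of such a screen, not PurpleAir's downgrade or Confidence Score algorithm: a pair of channel readings is kept if the channels differ by no more than an absolute tolerance or by no more than a relative tolerance (the difference divided by the mean of A and B). The default thresholds of 5 µg/m³ and 70% are illustrative values you should choose and justify for your own study.

```python
def channels_agree(a, b, abs_tol=5.0, rel_tol=0.7):
    """Return True if channels A and B agree within abs_tol (µg/m³) or rel_tol.

    The relative difference is |A - B| divided by the mean of A and B.
    Both thresholds are hypothetical defaults for illustration.
    """
    diff = abs(a - b)
    if diff <= abs_tol:
        return True
    pair_mean = (a + b) / 2
    return pair_mean > 0 and diff / pair_mean <= rel_tol


def screen_pairs(pairs, **tolerances):
    """Return the A/B average for agreeing pairs and None for flagged pairs."""
    return [
        (a + b) / 2 if channels_agree(a, b, **tolerances) else None
        for a, b in pairs
    ]


if __name__ == "__main__":
    readings = [(10.0, 12.0), (10.0, 40.0), (100.0, 110.0)]
    print(screen_pairs(readings))  # [11.0, None, 105.0]
```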

Applied Corrections

Temperature and humidity measurements for PurpleAir sensors are taken inside the device housing and do not directly reflect ambient conditions. Corrections can be applied to bring them closer to ambient conditions. These corrections are also available on the PurpleAir map as separate data layers.

  • Temperature: On average, the raw temperature readings are 8° Fahrenheit higher than ambient temperatures.

  • Humidity: On average, the raw humidity readings are 4% lower than ambient conditions.
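Applying the average offsets above is a one-line adjustment per reading. The sketch below subtracts 8 °F from raw temperature and adds 4 to raw relative humidity; it assumes the 4% figure means percentage points of relative humidity and, as an added assumption, caps the corrected value at 100%. Because these are average offsets, corrected values for an individual sensor may still differ from ambient conditions.

```python
TEMPERATURE_OFFSET_F = 8.0  # average amount raw temperature reads high
HUMIDITY_OFFSET = 4.0       # average amount raw RH reads low (assumed: percentage points)


def corrected_temperature_f(raw_f):
    """Estimate ambient temperature (°F) from the raw in-housing reading."""
    return raw_f - TEMPERATURE_OFFSET_F


def corrected_humidity(raw_rh):
    """Estimate ambient relative humidity (%), capped at 100 as an assumption."""
    return min(raw_rh + HUMIDITY_OFFSET, 100.0)


if __name__ == "__main__":
    print(corrected_temperature_f(80.0))  # 72.0
    print(corrected_humidity(40.0))       # 44.0
    print(corrected_humidity(98.0))       # 100.0
```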


Learn More

How to Create a Sensor Lending Program
How to Tell When A Sensor Produced Data
What Do PurpleAir Sensors Measure and How Do They Work?
