Machine Learning on PurpleAir data

Zander_bw · October 17, 2022, 8:12pm

Hi Everyone!

I am a data scientist by training turned founder, and I have used PurpleAir data myself during California wildfire season to figure out where and when it is safe to go outside. More recently, I wrote a program that uses PurpleAir data and machine learning to tell malfunctioning sensors apart from smoke from fire events, and I gave a presentation on it. I wanted to share it here because I thought some of you might find it interesting and might have ideas on how I could make it better. It currently serves as a proof of concept.

The project takes the historical real-time data available from the website and replays it as a stream, as if the readings were arriving directly from many devices. The software first detects anomalies in an unsupervised manner, then compares each flagged sensor with nearby sensors to determine whether the cause is a malfunctioning device or smoke from a wildfire.
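To give a rough feel for the two-step idea, here is a minimal sketch. It is not the project's actual code: the robust z-score (median and MAD) stands in for whatever unsupervised detector the project uses, and the names `robust_z`, `classify`, `threshold`, `min_agree`, and the sample sensor IDs and readings are all made up for illustration.

```python
from statistics import median

def robust_z(history, value):
    """Score how unusual `value` is relative to `history` (median/MAD z-score)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    scale = 1.4826 * mad
    if scale == 0:
        scale = 1.0  # flat history: fall back to raw deviation
    return (value - med) / scale

def classify(sensor_id, latest, histories, neighbors, threshold=3.5, min_agree=0.5):
    """Label one new reading as 'normal', 'smoke', 'malfunction', or 'unknown'.

    Step 1: flag the reading if it is anomalous against the sensor's own history.
    Step 2: if enough nearby sensors are anomalous too, call it smoke;
            if the sensor stands alone, call it a malfunction.
    """
    z = robust_z(histories[sensor_id], latest[sensor_id])
    if abs(z) < threshold:
        return "normal"
    nearby = [n for n in neighbors.get(sensor_id, []) if n in latest]
    if not nearby:
        return "unknown"  # nothing to compare against
    agreeing = sum(
        abs(robust_z(histories[n], latest[n])) >= threshold for n in nearby
    )
    return "smoke" if agreeing / len(nearby) >= min_agree else "malfunction"

# Hypothetical recent PM2.5 readings for three sensors that are close together.
histories = {
    "a": [10, 11, 9, 10, 12],
    "b": [8, 9, 10, 9, 8],
    "c": [11, 10, 12, 11, 10],
}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

spike_only_a = {"a": 150, "b": 9, "c": 11}      # only one sensor jumps
spike_everywhere = {"a": 150, "b": 140, "c": 160}  # all sensors jump together

print(classify("a", spike_only_a, histories, neighbors))
print(classify("a", spike_everywhere, histories, neighbors))
```

In a replayed stream, each new reading would be passed through `classify` and then appended to that sensor's history. Choosing "nearby" by distance and tuning `threshold` and `min_agree` are open design choices in this sketch.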

The presentation and the code can be found here.

Thanks for any feedback and for checking out my project.

melina_peixoto · February 17, 2024, 9:19am

May I have your permission to study your code? I believe it would help with my doctoral thesis.
