Machine Learning on PurpleAir data

Hi Everyone!

I am a data scientist by training turned founder, and I have used PurpleAir data myself during California wildfire season to figure out where and when it is safe to go outside. More recently I wrote a program that uses PurpleAir data and machine learning to distinguish malfunctioning sensors from smoke from fire events, and I gave a presentation on it. I wanted to share it here because I thought some of you might find it interesting and might have ideas on how I could make it better. It is currently a proof of concept.

The project takes the historical sensor readings available from the website and replays them as a stream, as if the data were coming directly from many devices. The software first detects anomalies in an unsupervised manner, then compares each anomalous sensor with nearby sensors to determine whether the cause is a malfunctioning device or smoke from a wildfire.
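To give a rough idea of the two-stage logic, here is a minimal sketch in plain Python. It is not the project's actual code: the robust z-score detector (median and MAD), the function names, the thresholds, and the precomputed `neighbors` map are all assumptions chosen for illustration, and the real project may use a different unsupervised model.

```python
from statistics import median


def robust_z(history, value):
    """Robust z-score of `value` against past readings (median and MAD)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # for normally distributed data; fall back to 1.0 for a flat history.
    scale = (1.4826 * mad) or 1.0
    return (value - med) / scale


def is_anomalous(history, value, threshold=3.5):
    """Stage 1 (assumed detector): flag readings far from the sensor's own past."""
    return abs(robust_z(history, value)) > threshold


def classify(sensor_id, histories, latest, neighbors, min_agree=0.5):
    """Stage 2: compare an anomalous sensor with its nearby sensors.

    histories: sensor id -> list of past readings
    latest:    sensor id -> newest reading
    neighbors: sensor id -> list of nearby sensor ids (hypothetical, precomputed)
    """
    if not is_anomalous(histories[sensor_id], latest[sensor_id]):
        return "normal"
    nearby = neighbors.get(sensor_id, [])
    if not nearby:
        return "unknown"  # nothing to compare against
    agree = sum(is_anomalous(histories[n], latest[n]) for n in nearby)
    # If enough neighbors spike too, a shared cause such as smoke is likely;
    # a lone spike points to a faulty device.
    return "smoke" if agree / len(nearby) >= min_agree else "malfunction"


# Made-up readings for three nearby sensors.
histories = {
    "a": [10, 12, 11, 9, 10, 11],
    "b": [8, 9, 10, 9, 8, 9],
    "c": [12, 11, 13, 12, 11, 12],
}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

print(classify("a", histories, {"a": 150, "b": 140, "c": 160}, neighbors))
print(classify("a", histories, {"a": 150, "b": 9, "c": 12}, neighbors))
```

In the first call all three sensors jump together, so sensor "a" is labelled as smoke; in the second only "a" jumps, so it is labelled as a malfunction. In a streaming replay, each sensor's history would be a sliding window that is updated as new readings arrive.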

The presentation and the code can be found here.

Thanks for any feedback and for checking out my project.


May I have your permission to study your code? I believe it would help with my doctoral thesis.