Instructor: Matthew Turk
Notebook by Group H : Beatrice Lovely, Zane Inman, Manasi Karale, Zahra Malwi, Suhani Jaiswal
This data was gathered by Beatrice and a friend on the streets and sidewalks of Champaign
Classifying where an electric scooter is ridden based on data from Accelerometer and Gyroscope measurements
Data Collection Setup - smartphone mounted on scooter
Sidewalk on Gregory and Sixth Street
Street 2 : Loredo Taft Drive
Parameters used:
Can a machine learning model classify where the scooter was ridden based on the data, and what features would it need to do so?
If we can visualize features in the data and qualitatively see a difference, then an ML model should be able to tell the difference as well!
50.5050505050505
Conclusion: sampling frequency is 50 Hz
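As a rough illustration of how a sampling frequency can be estimated from timestamps, the sketch below divides the number of sample intervals by the elapsed time. The function name and the synthetic timestamps are assumptions for illustration; the notebook's own computation is not shown here and may differ.

```python
import numpy as np

def estimate_sampling_frequency(timestamps_s):
    """Return samples per second from a sequence of timestamps in seconds."""
    t = np.asarray(timestamps_s, dtype=float)
    # number of intervals divided by total elapsed time
    return (len(t) - 1) / (t[-1] - t[0])

# Synthetic example (not the recorded data): 1001 timestamps spaced 0.02 s apart
t = np.arange(1001) * 0.02
fs = estimate_sampling_frequency(t)
print(round(fs, 2))
```

A value close to 50 from such an estimate is consistent with a nominal 50 Hz sensor rate.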
1001
1001
Above I have divided the data into training samples for an ML model by "chopping up" each data stream into windows of 75 samples each, z-normalizing (mean 0, std 1), and computing features (mean, std, 90th percentile, 10th percentile). Below is a basic feature importance analysis: a heatmap of which features are most correlated with the label. Interestingly, it is the 10th percentile of the accelerometer in the y and z directions; I was expecting it to be standard deviation or something similar.
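The windowing and feature steps described above can be sketched as follows. This is a minimal version under stated assumptions: the helper names are hypothetical, the streams are synthetic, windows are non-overlapping, and z-normalization is applied to each whole stream before windowing (normalizing inside each window would make every window's mean 0 and std 1). The per-feature Pearson correlation with the label is the kind of quantity a correlation heatmap displays.

```python
import numpy as np

WINDOW = 75  # samples per window, as described above

def z_normalize(stream):
    """Rescale a 1-D stream to mean 0 and standard deviation 1."""
    x = np.asarray(stream, dtype=float)
    return (x - x.mean()) / x.std()

def window_features(stream, window=WINDOW):
    """Return one row of [mean, std, p90, p10] per non-overlapping window."""
    x = z_normalize(stream)       # assumption: normalize the whole stream first
    n_windows = len(x) // window  # trailing samples that do not fill a window are dropped
    w = x[: n_windows * window].reshape(n_windows, window)
    return np.column_stack([
        w.mean(axis=1),
        w.std(axis=1),
        np.percentile(w, 90, axis=1),
        np.percentile(w, 10, axis=1),
    ])

# Synthetic stand-ins for two recordings (NOT the project's data)
rng = np.random.default_rng(0)
sidewalk = rng.normal(size=1001)
street = rng.exponential(size=1001)

X = np.vstack([window_features(sidewalk), window_features(street)])
y = np.array([0] * 13 + [1] * 13)  # 0 = sidewalk, 1 = street; 1001 // 75 = 13 windows each

feature_names = ["mean", "std", "p90", "p10"]
# Pearson correlation of each feature column with the label
correlations = {name: np.corrcoef(X[:, j], y)[0, 1] for j, name in enumerate(feature_names)}
print(correlations)
```

Features whose correlation with the label is far from zero are the ones a simple classifier is most likely to find useful.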
Overview:
The plot above is divided into a 3×3 grid of subplots that share common X and Y axes. The subplots are grouped so that vertically they represent the same sensor direction and horizontally they represent the same sensor. The X axis has 6 main ticks -
The Y axis shows the mean values that we calculated for the 6 combinations above. The blue line represents sidewalk values and the orange line represents street values.
Inference:
If the blue line never overlaps with the orange line, then that direction and sensor can be given to the ML model, which can use those values to differentiate between sidewalk and street. Note: non-overlapping values are not the only requirement. The blue line must also stay consistently above the orange line, or vice versa, before we can confidently use that direction and sensor.
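The criterion above (one line staying strictly above the other at every tick) can be written as a small check. The function name and the example values are hypothetical and only illustrate the rule.

```python
import numpy as np

def consistently_separated(sidewalk_means, street_means):
    """True if one series is strictly above the other at every point."""
    diff = np.asarray(sidewalk_means, dtype=float) - np.asarray(street_means, dtype=float)
    return bool(np.all(diff > 0) or np.all(diff < 0))

# Made-up mean values, one per X-axis tick
print(consistently_separated([1.0, 1.2, 1.1], [0.5, 0.6, 0.4]))  # lines never cross
print(consistently_separated([1.0, 0.3, 1.1], [0.5, 0.6, 0.4]))  # lines cross at the second tick
```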
Overview:
The interactive visualization above works on one file, and one direction axis within it, at a time. It has a common X-axis (which can be chosen) and two Y-axes, one on each side: the one on the left is colored red and the one on the right is colored green. It is a scatter plot that shows the correlation between 3 values (as two pairwise comparisons) at the same time.
Note:
Steps to interact with this visualization: 1. Choose the Sensor for your plot's X, Y and Z axes 2. Choose the Direction for your sensor 3. Choose the File you want to compare your sensors and directions in
Inference:
As seen above, the Accelerometer vs Gyroscope points (green dots) are distributed across the plot and specifically away from the center, whereas the Accelerometer vs Magnetometer points (red dots) are distributed differently. In this way we can find the relation between sensors for the same file and directional axis, and choose which one to use in our ML model to differentiate between Sidewalk and Street.
Overview of the visualization: This graph is a comparison between the streets and the sidewalk in terms of their x_acceleration (i.e. front-and-back movement) over time. The aim is to see how many relevant breaks there are in the acceleration of the scooter when moving on a sidewalk and on a street.
The breaks can indicate the difference, since sidewalks are prone to have more obstacles (pedestrians, mailboxes, dustbins, stands, ...) than streets, which mostly have dedicated lanes free of such obstacles;
BUT it can be vice versa, too!
Observation: After carefully viewing the two graphs, it is fairly clear that the sidewalk in this case was obstructed less than the streets.
A quick function was used to check this:
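The original function is not shown here, so the following is only a sketch of what such a check might look like. It assumes, hypothetically, that a "break" is any sample whose x-acceleration falls below a threshold (0 by default, i.e. the scooter is decelerating); the name `count_breaks` and the sample values are illustrative.

```python
import numpy as np

def count_breaks(x_accel, threshold=0.0):
    """Count samples whose x-acceleration is below `threshold` (assumed definition of a break)."""
    a = np.asarray(x_accel, dtype=float)
    return int(np.sum(a < threshold))

# Made-up x-acceleration samples, not the recorded data
example = [0.3, -0.1, 0.2, -0.4, -0.2, 0.1]
print(count_breaks(example))
```

Running the same function on a sidewalk file and a street file gives two counts that can be compared directly, provided both recordings have a similar number of samples.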
992
653
Clearly, the sidewalk has fewer breaks;
1173
943
Again, as a conclusion, it can be seen that the street has more breaks and obstructions in comparison to the sidewalk.