Data Visualization IS 445 Final project

Instructor: Matthew Turk

Notebook by Group H: Beatrice Lovely, Zane Inman, Manasi Karale, Zahra Malwi, Suhani Jaiswal

This data was gathered by Beatrice and a friend on the streets and sidewalks of Champaign.

Sidewalk vs Street

Classifying where an electric scooter is ridden based on accelerometer and gyroscope measurements

Data Collection Setup: smartphone mounted on the scooter

WhatsApp Image 2021-12-02 at 10.20.05 AM.jpeg

Sidewalk on Gregory and Sixth Street

Gregory&Sixth.jpeg

Street 2: Lorado Taft Drive

LoredoTaftDrive.jpeg

Parameters used:

  1. Accelerometer: measures linear acceleration along one or more axes (sensitivity specified in mV/g).
  2. Gyroscope: measures angular velocity (sensitivity specified in mV/deg/s).
  3. Magnetometer: measures a magnetic field or magnetic dipole moment. Magnetometers are widely used to measure the Earth's magnetic field, in geophysical surveys, to detect magnetic anomalies of various types, and to determine the dipole moment of magnetic materials.

The End Goal

Can a machine learning model classify where the scooter was ridden based on the data, and what features would it need to do so?

If we can visualize features in the data and qualitatively see a difference, then an ML model should be able to tell the difference as well!

Our Visualization Plan:

  1. To make the visualizations powerful enough that we can distinguish between street and sidewalk even before the data is fed to the ML model.
  2. The rider's speed and the weight carried should not influence the result, so we aggregated each sensor and each direction using the mean as the measure of central tendency.

Conclusion: sampling frequency is 50 Hz
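The cell that produced this conclusion is not shown here. As a minimal sketch of how a sampling frequency can be estimated from a timestamp column, assuming timestamps in seconds (the function name and the synthetic timestamps below are hypothetical, not the project's data):

```python
from statistics import median

def estimate_sampling_rate(timestamps_s):
    """Estimate the sampling rate in Hz from timestamps given in seconds."""
    # Time differences between consecutive samples
    deltas = [t1 - t0 for t0, t1 in zip(timestamps_s, timestamps_s[1:])]
    # The median is robust against occasional dropped or delayed samples
    return 1.0 / median(deltas)

# Synthetic timestamps spaced 0.02 s apart (hypothetical, not the project data)
timestamps = [i * 0.02 for i in range(500)]
rate = estimate_sampling_rate(timestamps)
print(round(rate))
```

Using the median of the time differences, rather than the mean, keeps a few dropped samples from skewing the estimate.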

Visualizations by Beatrice Lovely

Above, I divided the data into training samples for an ML model by chopping each data stream into windows of 75 samples (1.5 s at 50 Hz), z-normalizing (mean 0, std 1), and computing features (mean, standard deviation, 90th percentile, 10th percentile). Below is a basic feature importance analysis: a heatmap showing which features are most correlated with the label. Interestingly, the most correlated features are the 10th percentiles of the accelerometer in the y and z directions; I was expecting it to be the standard deviation or something similar.
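The windowing and feature steps described above can be sketched roughly as follows. This is an illustration on synthetic data, not the notebook's original code. It assumes the z-normalization is applied to the whole stream before windowing (normalizing each window separately would force every window's mean to 0 and standard deviation to 1), and the names `window_features` and `feature_label_correlation` are hypothetical.

```python
import numpy as np

WINDOW = 75  # samples per window, as described above
FEATURE_NAMES = ["mean", "std", "p90", "p10"]

def window_features(signal, window=WINDOW):
    """Split a 1-D signal into non-overlapping windows and compute features."""
    signal = np.asarray(signal, dtype=float)
    # Assumption: z-normalize the whole stream, not each window separately
    signal = (signal - signal.mean()) / signal.std()
    n_windows = len(signal) // window
    # Drop the incomplete tail and reshape to (n_windows, window)
    windows = signal[: n_windows * window].reshape(n_windows, window)
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        np.percentile(windows, 90, axis=1),
        np.percentile(windows, 10, axis=1),
    ])

def feature_label_correlation(features, labels):
    """Pearson correlation of each feature column with the class label."""
    labels = np.asarray(labels, dtype=float)
    return np.array([
        np.corrcoef(features[:, j], labels)[0, 1]
        for j in range(features.shape[1])
    ])

# Synthetic stream: a bumpier "sidewalk" half followed by a smoother "street" half
rng = np.random.default_rng(0)
sidewalk = rng.normal(0.0, 2.0, size=10 * WINDOW)
street = rng.normal(0.0, 0.5, size=10 * WINDOW)
features = window_features(np.concatenate([sidewalk, street]))
labels = np.array([1] * 10 + [0] * 10)  # 1 = sidewalk, 0 = street
corr = feature_label_correlation(features, labels)
for name, c in zip(FEATURE_NAMES, corr):
    print(f"{name}: {c:+.2f}")
```

The resulting correlation values are the kind of numbers a feature-versus-label heatmap would display, one cell per feature and sensor axis.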

Visualizations by Manasi Karale:

Overview:

The above plot is divided into a 3×3 grid of subplots with common X and Y axes. The subplots are grouped so that each column represents the same sensor direction and each row represents the same sensor. The X axis has 6 main ticks:

  1. fast speed + no weight rider
  2. fast speed + weight rider
  3. medium speed + no weight rider
  4. medium speed + weight rider
  5. slow speed + no weight rider
  6. slow speed + weight rider

The Y axis shows the mean values that we calculated for the above 6 combinations. The blue line represents sidewalk values and the orange line represents street values.

Inference:

If the blue line never overlaps the orange line, then we can provide that direction and sensor to the ML model, which will use those values to differentiate between sidewalk and street. Note: the lines not overlapping isn't the only requirement. The blue line must also be consistently above the orange line, or vice versa, before we can confidently use the respective direction and sensor.
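The criterion above can be expressed as a small check. This is a sketch with made-up mean values, not the project's measurements; `consistently_separated` is a hypothetical helper name.

```python
def consistently_separated(sidewalk_means, street_means):
    """Return True if one line lies strictly above the other at every tick."""
    diffs = [s - t for s, t in zip(sidewalk_means, street_means)]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

# Hypothetical means for the 6 speed/weight combinations (not real data)
sidewalk = [0.42, 0.40, 0.38, 0.37, 0.35, 0.36]
street = [0.30, 0.29, 0.31, 0.28, 0.27, 0.26]
crossing = [0.30, 0.45, 0.31, 0.28, 0.27, 0.26]

print(consistently_separated(sidewalk, street))    # True
print(consistently_separated(sidewalk, crossing))  # False
```

A sensor and direction pair that passes this check for all six combinations is a candidate input for the model; one that fails crosses over at some speed or weight setting.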


Overview:

The above interactive visualization works on one file, and one directional axis within it, at a time. It has a common X-axis (which can be chosen) and two Y-axes, one on each side: the left one is colored red and the right one green. It is a scatter plot that shows the correlation among three values at the same time, as two pairwise comparisons.

Note:

Inference:

As seen above, the Accelerometer vs Gyroscope points (green dots) are spread across the plot, specifically away from the center. Accelerometer vs Magnetometer is shown as red dots. In this way we can examine the relationship between sensors for the same file and directional axis, and choose which one to use for our ML model to differentiate between sidewalk and street.


Visualizations by Zahra Malwi

Overview of the visualization: This graph compares the street and the sidewalk in terms of x_acceleration (i.e. front-and-back movement) over time. The aim is to see how many relevant breaks there are in the acceleration of the scooter when moving on a sidewalk and on a street.

The breaks can reveal the difference, since sidewalks are prone to have more obstacles (pedestrians, mailboxes, dustbins, stands, ...) than streets, which have dedicated lanes mostly free of such obstacles.

But it can be the other way around, too!

Observation: A careful look at the two graphs suggests that, in this case, the sidewalk was obstructed less than the street.

A quick function was used to check this.
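The original function is not reproduced here. One way such a check could look, as a sketch only: it assumes a "break" means a run of consecutive samples whose x acceleration falls below a negative threshold. Both the definition and the threshold are assumptions, and the sample values are made up for illustration.

```python
def count_breaks(x_accel, threshold=1.0):
    """Count separate runs of samples with x acceleration below -threshold."""
    count = 0
    in_break = False
    for a in x_accel:
        if a < -threshold:
            # Count a run only once, when it starts
            if not in_break:
                count += 1
            in_break = True
        else:
            in_break = False
    return count

# Made-up x-acceleration samples (not the project data)
street_x = [0.1, -1.5, -2.0, 0.2, 0.0, -1.2, 0.3, -1.8, 0.1]
sidewalk_x = [0.2, 0.1, -1.4, -1.1, 0.0, 0.3, 0.1, 0.2, 0.0]

print(count_breaks(street_x))    # 3
print(count_breaks(sidewalk_x))  # 1
```

Counting runs rather than individual samples keeps one long deceleration from being counted many times.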

Clearly, the sidewalk has fewer breaks.

Again, in conclusion, the street shows more breaks and obstructions than the sidewalk.