Electric Vehicles Analysis (2025)
I began this project because I was interested in purchasing a new electric vehicle at some point within the next 3-5 years, and because I wanted to learn how to do web scraping with Python. My goal was a mostly automated, standardized process for gathering, cleaning, and presenting the data, so that I could easily continue researching these vehicles as time goes on. I used tools such as Excel, Python, and Power BI to gather the data for these electric vehicles and to visualize and filter it in an easy-to-understand way for my research purposes. With the recent progress on solid-state batteries, which could theoretically double the energy storage and range of newer electric vehicles, I've become even more interested in purchasing an EV.
Dashboard Explanation Video With Research Examples

Electric Vehicle Dashboard Explanation
Part 1: Gathering The Data and Conversions

I started this project with data initially gathered from a Kaggle dataset. It had a structure similar to the Excel file pictured to the left; however, it was missing a few key variables and needed some touch-ups for my research purposes. I added all of the pricing variables to the Excel file after web scraping them in Python, converted metric units to imperial in Python (among other cleaning and updates to the data), and created the correlation matrix variables in Power BI. The underlying data comes from the ev-database.org website.


After performing some basic data checks in Excel and tidying up a few errors, I started the import process for the file into my Python script. I imported the necessary modules for web scraping, cleaning the data, and checking the data through visualizations and graphs. I added a few extra imports, such as tkinter, because I wanted to make improvements to the file selection window that I'd used in my previous Monthly Profit Calculator project.

To import the Excel file into the script I used tkinter. My previous tkinter work only included a CSV or Excel file selection, which would automatically choose the first sheet in the file. I upgraded this code with the ability to select a specific sheet within the file as well. This step also sets up the filepath variable so that I can easily write the updated data back into a new Excel sheet at the end of the script.
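The exact picker isn't shown here, but a minimal sketch of that upgraded flow could look like the following; the dialog wording and the simpledialog prompt for the sheet name are my own assumptions, and only the filepath and Loaded_File names come from the write-up.

```python
import tkinter as tk
from tkinter import filedialog, simpledialog

import pandas as pd

root = tk.Tk()
root.withdraw()  # hide the empty root window behind the dialogs

# Ask for the file; filepath is reused later to write the results back
filepath = filedialog.askopenfilename(
    title="Select the EV data file",
    filetypes=[("Excel files", "*.xlsx"), ("CSV files", "*.csv")],
)

# List the workbook's sheets and let the user type the one to load
# (assuming an Excel file was chosen)
sheet_names = pd.ExcelFile(filepath).sheet_names
sheet = simpledialog.askstring(
    "Sheet selection", f"Available sheets: {sheet_names}\nEnter a sheet name:"
)

Loaded_File = pd.read_excel(filepath, sheet_name=sheet)
```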

An example of the data from the Excel file, before any conversions are made, can be seen in the screenshot to the left. This Excel data was loaded into a DataFrame titled "Loaded_File".

In order to make this data more intuitive for my research, I decided to convert most of the file's metric data into imperial units. I also renamed the columns and rearranged them based on similar attributes that I was interested in researching. Some of these attributes included the Acceleration from 0 to 60 mph, Total Range of the vehicle, and Top Speed.
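As a rough illustration of that conversion and reordering step; the column names below are placeholders, since the real ones come from the Kaggle file:

```python
import pandas as pd

KM_TO_MILES = 0.621371  # the same factor converts km/h to mph

# Hypothetical metric column names from the source file
Loaded_File["Total_Range_Miles"] = (Loaded_File["Range_km"] * KM_TO_MILES).round(1)
Loaded_File["Top_Speed_mph"] = (Loaded_File["Top_Speed_kmh"] * KM_TO_MILES).round(1)

# 0-100 km/h is close to 0-60 mph (96.6 km/h), so it is simply
# relabelled here for illustration only
Loaded_File = Loaded_File.rename(columns={"Accel_0_100_s": "Acceleration_0_60_mph"})

# Regroup the columns so related attributes sit together
first_cols = ["Brand", "Model", "Acceleration_0_60_mph", "Total_Range_Miles", "Top_Speed_mph"]
Loaded_File = Loaded_File[first_cols + [c for c in Loaded_File.columns if c not in first_cols]]
```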

Each vehicle's URL led to a webpage containing multiple data tables of information. Since no prices from the pricing table were included with the initial Kaggle dataset, I decided to use web scraping to gather the pricing data for each of the 3 countries listed in the table (UK, Netherlands, Germany). Before performing any web scraping, I checked the ev-database.org website's robots.txt file and made sure that there were no issues with gathering data this way.
To start the web scraping, I used BeautifulSoup and my browser's developer console to find the portion of the page's markup that contained each vehicle's pricing data table, whose values were wrapped in "td" tags. Each numerically labelled "Price_Pounds_Scraped" variable corresponded to a different country's pricing data from the table. I also had to add random time intervals between requests for each new vehicle's URL, because the website has an aggressive automated blocking protocol based solely on how quickly you open new pages. I assume this protocol helps protect the website from DDoS attacks or similar heavy-traffic issues.
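A simplified sketch of that loop, assuming requests and BeautifulSoup; the "URL" column name, the CSS selector, and the delay range are illustrative rather than the exact ones from the script:

```python
import random
import time

import requests
from bs4 import BeautifulSoup

# One list per country, as in the numbered Price_Pounds_Scraped variables
Price_Pounds_Scraped_1, Price_Pounds_Scraped_2, Price_Pounds_Scraped_3 = [], [], []

for url in Loaded_File["URL"]:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    # The pricing table's values sit inside <td> cells; take the first
    # three (UK, Netherlands, Germany) and pad any that are missing
    cells = [td.get_text(strip=True) for td in soup.select("table td")][:3]
    cells += ["Not Available"] * (3 - len(cells))

    Price_Pounds_Scraped_1.append(cells[0])
    Price_Pounds_Scraped_2.append(cells[1])
    Price_Pounds_Scraped_3.append(cells[2])

    # Random pause between pages so the rate-based blocking isn't tripped
    time.sleep(random.uniform(5, 12))
```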

After web scraping pricing data for all 477 vehicles in the original Kaggle dataset, I combined the URLs and all 3 countries' pricing lists into a dictionary. Since the URLs are unique within the data, I used them as a primary key to merge this pricing data back into the original Excel data in a later step.
An example of the pricing data that I gathered can be seen in the screenshot to the left. Sometimes a vehicle wasn't sold in a specific country, or its price was missing from the data table, and it was therefore labelled "Not Available". Sometimes no prices appeared for any of the 3 countries because the URL was outdated and had been changed by the website; in those cases the data appeared as an empty list ("[]") for all 3 countries. I ended up manually updating all of these URLs, which got me temporarily blocked by the website, but after a few days I restarted the script and regathered all of the pricing data.

After creating the dictionary in the step above, I turned it into a DataFrame so that it would be easier to merge with the "Loaded_File" DataFrame in future steps.
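In sketch form, carrying over the same illustrative names as above; the "Hyperlink" column name comes from the write-up, the price column names are assumptions:

```python
import pandas as pd

# URLs act as the primary key; one scraped pricing list per country
Prices_Dict = {
    "Hyperlink": list(Loaded_File["URL"]),
    "Price_UK": Price_Pounds_Scraped_1,
    "Price_Netherlands": Price_Pounds_Scraped_2,
    "Price_Germany": Price_Pounds_Scraped_3,
}
Prices_DF = pd.DataFrame(Prices_Dict)
```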
Part 2: Merging The Dataframes and Checking The Data

I started this section by merging the web-scraped data, which included 4 columns (the URLs and the 3 countries' prices), with the original dataset containing the same unique URLs, into a DataFrame called "Loaded_File_Merged". I then deleted the duplicate "Hyperlink" column of URLs that the pricing table brought along in the merge.
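A sketch of that merge, assuming the original file's URL column is named "URL" (only the "Hyperlink" name is confirmed by the write-up):

```python
# Left-join the scraped prices onto the original data via the URLs
Loaded_File_Merged = Loaded_File.merge(
    Prices_DF, left_on="URL", right_on="Hyperlink", how="left"
)

# The pricing table's URL column is now a duplicate, so drop it
Loaded_File_Merged = Loaded_File_Merged.drop(columns=["Hyperlink"])
```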
Next, I cleaned up the text in the web-scraped prices, removing the brackets, "<td>" tags, "Not Available" labels, and similar items, leaving only numerical data, or blanks where the data was missing.
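A minimal sketch of that clean-up, using the placeholder price column names from above:

```python
price_cols = ["Price_UK", "Price_Netherlands", "Price_Germany"]
for col in price_cols:
    Loaded_File_Merged[col] = (
        Loaded_File_Merged[col]
        .astype(str)
        .str.replace(r"</?td>|\[|\]|Not Available", "", regex=True)
        .str.replace(r"[^0-9.]", "", regex=True)  # strip symbols and commas
        .replace({"": None})                      # blanks where no price existed
    )
```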

After cleaning the 3 countries' pricing columns, I converted them to floats and created the logic for the final pricing column, based on the US dollar. This logic prioritized converting Germany's price to USD when it wasn't null, falling back to The Netherlands and finally to the UK if both other countries' prices were blank. I used exchange rates carried to 4 decimal digits for increased accuracy, then rounded the results to the nearest US dollar. The two conversions were Euros to USD (Germany or The Netherlands) and Pounds to USD (UK).
As an error check for the logic in these conversion conditions, I made the default value extremely high (around 9 trillion), so that any mistake would be obvious.
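A sketch of that conditional logic with numpy's select; the exchange rates below are placeholders with 4 decimal digits, not the rates actually used:

```python
import numpy as np

EUR_TO_USD = 1.0850  # placeholder rate: Germany / The Netherlands
GBP_TO_USD = 1.2700  # placeholder rate: UK

conditions = [
    Loaded_File_Merged["Price_Germany"].notna(),
    Loaded_File_Merged["Price_Netherlands"].notna(),
    Loaded_File_Merged["Price_UK"].notna(),
]
choices = [
    Loaded_File_Merged["Price_Germany"].astype(float) * EUR_TO_USD,
    Loaded_File_Merged["Price_Netherlands"].astype(float) * EUR_TO_USD,
    Loaded_File_Merged["Price_UK"].astype(float) * GBP_TO_USD,
]

# The absurdly high default makes any gap in the conditions obvious
Loaded_File_Merged["Price_USD"] = np.select(
    conditions, choices, default=9_000_000_000_000
).round(0)
```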

For my next data checks I used the describe, groupby, and query functions. These checked for extreme min or max values that would indicate an error or outlier, and gave me a general idea of the average values for Range, Acceleration, and Top Speed.
The describe function also showed the count of values in each column, which revealed that Germany had the largest count of the 3 countries, with 451 prices web scraped. I then manually checked each of the 26 rows with missing prices using the query function, as one final check on the conditional logic that created the USD prices column.
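Those checks roughly take this shape (column names are again illustrative):

```python
# Overall distribution check: counts, min, max, and mean per numeric column
print(Loaded_File_Merged.describe())

# Average Range, Acceleration, and Top Speed by brand
print(Loaded_File_Merged.groupby("Brand")[
    ["Total_Range_Miles", "Acceleration_0_60_mph", "Top_Speed_mph"]
].mean())

# The rows missing a German price, to re-check the USD fallback logic
print(Loaded_File_Merged.query("Price_Germany.isna()", engine="python"))
```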


My next check was a basic attempt at automating a graph's X and Y axis values, so that the code cell didn't have to be edited for me to check different variables. I ran the cell many times with different X and Y axis values, based on column names that I entered as text. This helped me visually check whether any averages looked extreme and whether I needed to dig deeper into my data before proceeding.
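A sketch of that semi-automated cell, where the two axis inputs are plain strings:

```python
import matplotlib.pyplot as plt

x_col = "Brand"              # grouping column, typed in as text
y_col = "Total_Range_Miles"  # numeric column to average, typed in as text

averages = Loaded_File_Merged.groupby(x_col)[y_col].mean().sort_values()
averages.plot(kind="bar", figsize=(12, 5), title=f"Average {y_col} by {x_col}")
plt.tight_layout()
plt.show()
```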

The final checks that I performed included 3 different Regex code cells. These cells had some basic automation similar to the bar chart, where I could choose to filter and show different columns or rows of data in the DataFrame based on a couple of input variables.
In the first example to the left, I chose to search for all rows with drivetrains that started with "a", which led to 191 "AWD" rows being shown. This cell is not case sensitive.

In this 2nd example of my 1st cell, I chose to search for all Models that started with "z", which only ended up showing 6 rows of data from 2 different Brands.

In this final example of my 1st cell, I searched for all brands that started with "x", which brought back 9 rows of data for the Brand "XPENG".
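A minimal sketch of how this first cell could work, with the column name and letter as its two inputs (the exact implementation may differ):

```python
Column_Name = "Drivetrain"  # e.g. "Model" or "Brand"
Letter = "a"

# Case-insensitive "starts with" match on the chosen column
mask = Loaded_File_Merged[Column_Name].str.match(rf"^{Letter}", case=False, na=False)
print(Loaded_File_Merged[mask])  # e.g. the 191 "AWD" rows for "a"
```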

In the 2nd cell of my Regex checks, I created a system that filters and shows only the columns whose names contain the case-sensitive letter that I input.
In the example to the left, I checked for columns that contained a capital "M" in their names.

In the 2nd example of this 2nd cell, I filtered for columns that contained a lowercase "b", which turned out to be only the "Number_Of_Cells" column.

In the final example of this 2nd cell, I filtered for columns that contained a capital "B" to show the contrast from the lowercase "b".
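This second cell boils down to a one-line column filter; a sketch, assuming pandas' regex-based filter:

```python
Letter = "M"  # case-sensitive input

# Keep only columns whose names contain the letter; the regex match is
# case-sensitive by default, so "M" and "b" behave differently
print(Loaded_File_Merged.filter(regex=Letter))
```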

In the 3rd and final cell of my Regex checks, I created a system similar to my 1st cell, but with a more precise filter driven by a secondary input variable called "Item_Name". This system checks the column name that you input, together with the specific "Item_Name" value, to return the matching rows of data.
If you enter a string for "Item_Name", the cell returns all rows whose value in the chosen column matches that string. If you enter a numeric value, the cell returns all rows whose value in the chosen column is greater than or equal to that number.
In the example to the left, I checked for any vehicles that had 8 or more seats in the car, which led to 22 rows returned.

In the 2nd example of this 3rd cell, I searched for any vehicles that had a USD price of $225,000 or more, which returned 9 rows.

In the final example of this 3rd cell, I searched for any models that contained a "J" in their name.
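A sketch of this third cell's branching logic, with hypothetical inputs:

```python
Column_Name = "Seats"
Item_Name = 8  # try Column_Name = "Model", Item_Name = "J" for the string path

if isinstance(Item_Name, str):
    # String input: return rows whose value matches the pattern
    mask = Loaded_File_Merged[Column_Name].str.contains(Item_Name, na=False)
else:
    # Numeric input: return rows greater than or equal to the value
    mask = Loaded_File_Merged[Column_Name] >= Item_Name

print(Loaded_File_Merged[mask])
```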
Part 3: Writing The Data Back To Excel and Dashboard

In the final portion of my Python script I used pandas' ExcelWriter to write the updated DataFrame "Loaded_File_Merged" back to an Excel file and table.
I started by using the filepath that was originally created in the tkinter step at the start of my script. ExcelWriter would not accept the filepath with the file extension included, so I used filepath[:-5] to remove the ".xlsx" portion. I then named the Excel sheet, gathered the exact number of rows and columns needed to create the table, and gathered the column titles for an upcoming step.

After gathering the data and creating the infrastructure for the Excel table, I created the table with the add_table function. Next, I set the columns to equal widths so that the data would be easier to view. The final step was to close the writer so that I could view the data in Excel, which can be seen in the blue table of the 1st screenshot in Part 1 of this page.
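Putting Part 3's Excel steps together in sketch form, assuming the XlsxWriter engine; the sheet name, "_Updated" suffix, and column width are illustrative:

```python
import pandas as pd

# filepath[:-5] strips ".xlsx"; the "_Updated" suffix is an assumption
writer = pd.ExcelWriter(filepath[:-5] + "_Updated.xlsx", engine="xlsxwriter")
Loaded_File_Merged.to_excel(writer, sheet_name="EV_Data", index=False)

worksheet = writer.sheets["EV_Data"]
rows, cols = Loaded_File_Merged.shape
headers = [{"header": title} for title in Loaded_File_Merged.columns]

# Build an Excel table over the exact data range, then equalize widths
worksheet.add_table(0, 0, rows, cols - 1, {"columns": headers})
worksheet.set_column(0, cols - 1, 22)

writer.close()  # finish the file so it can be opened in Excel
```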

The final steps of this project revolved around creating the Power BI dashboard and designing it in a way that would make it easy for me to research these electric vehicles. I wanted a dashboard with a lot of functionality that was still concise and fit within 1 page. To do this I used filters on the side of the page, along with parameters and bookmarks, so that the cards, the top 2 graphs, and the bottom table could display the data in different ways depending on what I wanted to research.
The 2 correlation tables in the middle of the page are not affected by the parameters or bookmarks, but they can be changed with the filters on the side of the page. This allows me to see the correlations between different variables for specific attributes of the vehicles, such as brand, body type, or drivetrain. For example, I could see that for a brand like Volvo, USD price has strong positive correlations with vehicle size (box dimensions), battery capacity, and range.
If you haven't seen my video demonstrating and explaining this dashboard yet, then please scroll back to the top of this page!
