Capital-Weather

Global Capitals Weather Insight

Python Power BI

Project Overview

I found an interesting real-time weather dataset on Kaggle which contains 6-hourly data from 252 world capitals. Using Python I cleaned and explored the data, and with Power BI, I built an interactive dashboard that highlights temperature trends and climate differences across continents.

Analytical Questions

Which regions show the highest and lowest temperature in the dataset?
How do humidity, visibility, and pressure vary across continents?
Which regions offer the best comfort index relative to their actual temperature?
How does temperature change by time?

General Data Description

Dataset size: 6-hourly data
Date/data range: Sep - Oct 2025
Main columns: continent, temperature, visibility, humidity, Comfort Index
Units: °C, %, km, hPa

Load the dataset

import pandas as pd, numpy as np pd.set_option('display.max_columns', None) pd.set_option('display.expand_frame_repr', False) path=.... df=pd.read_csv(path) print(df.head(3))

utc_time local_time country capital continent temperature 0 2025-09-01 00:00:00 2025-09-01 04:00:00 Abkhazia Sukhumi Unknown 22.15 1 2025-09-01 06:00:07 2025-09-01 10:00:07 Abkhazia Sukhumi Unknown 26.83 2 2025-09-01 12:00:02 2025-09-01 16:00:02 Abkhazia Sukhumi Unknown 29.20

Light EDA

print(df.info())

class'pandas.core.frame.DataFrame' RangeIndex: 60732 entries, 0 to 60731 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 utc_time 60732 non-null datetime64[ns] 1 local_time 60732 non-null object 2 country 60732 non-null object 3 capital 60732 non-null object 4 continent 60732 non-null object 5 temperature 60732 non-null float64 6 temp_min 60732 non-null float64 7 temp_max 60732 non-null float64 8 humidity 60732 non-null float64

Checking how much NaN value we have.

print(df.isna().sum().head(12))

utc_time 0 local_time 0 country 0 capital 0 continent 0 temperature 0 temp_min 0 temp_max 0 humidity 0 feels_like 0 visibility 37 precipitation 0 dtype: int64

Cleaning and Converting into CSV

#cleaning the data from the useless spaces for col in ["country", "capital", "continent", "weather_main", "weather_description", "weather_icon"]: if col in df.columns: df[col] = df[col].astype(str).str.strip() # the temperature should be within the range of -20 and +55 df["temperature"] = df["temperature"].clip(lower=-20, upper=55) #clip our time into more usable form df["utc_time"] = pd.to_datetime(df["utc_time"]) df["date_utc"] = df["utc_time"].dt.date df["hour_utc"] = df["utc_time"].dt.hour df[["utc_time","date_utc","hour_utc",]].head(3) #filling up the missing values df["visibility"] = df["visibility"].interpolate(method="linear")

One last percentile check, for getting rational data.

print(df[["temperature","humidity","visibility","cloudcover","pressure"]].describe(percentiles=[.25,.5,.75]))

temperature humidity visibility cloudcover pressure count 60732.000000 60732.000000 60732.000000 60732.000000 60732.000000 mean 21.644209 71.709033 9534.611737 46.704983 1013.741306 std 8.335061 20.018523 1503.979509 35.996583 6.416767 min -20.000000 2.000000 21.000000 0.000000 956.000000 25% 16.420000 61.000000 10000.000000 20.000000 1011.000000 50% 23.900000 76.000000 10000.000000 40.000000 1013.000000 75% 27.620000 87.000000 10000.000000 75.000000 1017.000000 max 44.660000 100.000000 10000.000000 100.000000 1050.000000

df.to_csv(.....,index=False)

Data Visualization

Which regions show the highest and lowest temperature in the dataset?

After importing the cleaned dataset into Power BI, I opened Transform Data to prepare the fields I needed for mapping.
In Power Query, I grouped the data by the country column and created a new field called Avg Temperature, using the Average operation on the temperature column.
After I saved the changes, I inserted a Map visual from the Visualizations panel and dragged country into the Location field the map could recognize the geographic areas.
I added Avg Temperature into the Tooltips field so that hovering over each country displays its calculated average temperature.
Finally, I changed the applied color scale, which allowed me to visually compare which countries and regions were warmer or cooler.

How do humidity, visibility, and pressure vary across continents?

For my second chart I imported again the cleaned dataset, and opened Transform Data and Power Query Editor to prepare the fields needed for the bar chart.
In Power Query, I grouped the data by the continent column and created new fields for each metric:
- Avg Temperature → Average of temperature
- Avg Visibility → Average of visibility
- Avg Pressure → Average of pressure
- Avg Humidity → Average of humidity
Select Avg Visibility → Transform → Standard → Divide → value: 1000
Select Avg Pressure → Transform → Standard → Divide → value: 100
Renamed the columns to Avg Visibility (km) and Avg PRessure (hPa/100) and set the data type to Decimal Number.
I saved the changes with Close & Apply so the new aggregated columns appeared in the main model.

Which regions offer the best comfort index relative to their actual temperature?

For my table, after inserted my resource dataframe again, I created some DAX measures:
- Mean Temperature = AVERAGE( Capital_Weather[temperature] )
- Std Temperature = STDEV.S( Capital_Weather[temperature] )
- Min Temperature = MIN( Capital_Weather[temperature] )
- Max Temperature = MAX( Capital_Weather[temperature] )
I also defined a simple Comfort Index (Mean) measure to compare how comfortable the weather feels:
- Comfort Index (Mean) = AVERAGEX( Capital_Weather, Capital_Weather[feels_like] - Capital_Weather[temperature] )
After creating and saving the measures, I dragged the continent column into the Values field of the Table visual so each row represents one continent and then I added the new, created measures.
Finally I added the new measures to the same Table visual as additional columns.

How does temperature change by time?

For my time-based line chart I opened Transform Data → Power Query Editor to prepare the time-based columns needed for weekly smoothing.
I selected the local_time column and created a date-only version using:
New Column → Date → Date Only, then renamed it to local_date.
To reduce day-to-day noise, I added a weekly grouping column:
For weekly trends: New Column → Date → Week → Start of Week → renamed to week_start.
After creating the time buckets, I clicked Close & Apply so the new fields became available in the report view.
In the Report view, I inserted a Line chart from the Visualizations panel, dragged week_start to the X-axis and temperature to the Y-axis and changed the aggregation to Average.
I placed continent in the Legend field so each continent gets its own line in the chart. (plus I removed the automatic Date Hierarchy that Power BI adds by default.)
Finally, I renamed the visual to: “Temperature Timeline by Continent (Weekly)”

I merged and putted my visualizations together on a dashboard which finally looks like this: