Exploring Ghana's Rent Data
Introduction
Although Ghana has seen a recent surge in real estate activity, there is still limited data and few metrics available to accurately determine rental prices. In this article, we explore data sourced from a rental website to investigate the key factors that influence the cost of renting a home. The insights gained from this analysis can help investors understand how to add value to their properties, while also guiding renters on what to look for to ensure they get the best value for their money.
The data used for this analysis was sourced from Kaggle. It was originally scraped from Tonaton.com, a popular rental listing website in Ghana. The dataset includes various features for each property, such as rental cost, number of bedrooms and bathrooms, floor area, available amenities, location, and more.
Analysis
The dataset came with the following features:
Field | Description |
---|---|
url | The link to the listing. |
name | The headline or title of the rental property listing. |
price | The rental price of the property in Ghanaian Cedis (GHS). |
category | The type of rental property (e.g., apartment, house, room, office). |
bedrooms | The number of bedrooms available in the property. |
bathrooms | The number of bathrooms available in the property. |
floor_area | The floor area of the property in square meters. |
location | The address of the property. |
condition | Condition of the property, e.g., new, used, or off-plan. |
amenities | Amenities provided in the property. |
region | Geographic administrative region of the property location. |
locality | The town or city where the property is located. |
parking_space | Indicates if there is parking space available. |
is_furnished | Indicates if the property is furnished. |
lat | Latitude of the property. |
lng | Longitude of the property. |
Location Distribution
The dataset does not provide uniform coverage across all regions in Ghana. Most of the rental listings are concentrated in the capital city, Accra, with a sharp decline in the number of properties available in other cities and regions.
Figure 1: Distribution of properties shown on the map of Ghana
Preprocessing
The fields category, condition, and is_furnished were identified as ordinal variables, meaning their values have a meaningful order. To capture this in the analysis, these fields were numerically encoded according to the following orderings:
Category | Condition | Is Furnished | Code |
---|---|---|---|
Flats | Used | Unfurnished | 0 |
Duplex | Renovated | Semi-Furnished | 1 |
Townhouse | New | Furnished | 2 |
Semi-Detached | — | — | 3 |
Detached | — | — | 4 |
Mansion | — | — | 5 |
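The orderings in the table can be applied with a simple lookup. The following is a minimal sketch, assuming the raw values in the dataset are spelled exactly as in the table; the helper names `make_encoder` and `encode_ordinal` and the sample listing are hypothetical and are not taken from the original analysis.

```python
# Ordinal codes copied from the table above; the label spellings are
# assumed to match the raw values in the dataset.
CATEGORY_ORDER = ["Flats", "Duplex", "Townhouse", "Semi-Detached", "Detached", "Mansion"]
CONDITION_ORDER = ["Used", "Renovated", "New"]
FURNISHED_ORDER = ["Unfurnished", "Semi-Furnished", "Furnished"]


def make_encoder(ordered_labels):
    """Map each label to its position in the ordering (0 = lowest)."""
    return {label: code for code, label in enumerate(ordered_labels)}


ENCODERS = {
    "category": make_encoder(CATEGORY_ORDER),
    "condition": make_encoder(CONDITION_ORDER),
    "is_furnished": make_encoder(FURNISHED_ORDER),
}


def encode_ordinal(row):
    """Return a copy of `row` with the three ordinal fields replaced by codes."""
    encoded = dict(row)
    for field, mapping in ENCODERS.items():
        encoded[field] = mapping[row[field]]
    return encoded


# Hypothetical listing, for illustration only.
listing = {"category": "Townhouse", "condition": "New", "is_furnished": "Semi-Furnished"}
print(encode_ordinal(listing))
```

An unknown label raises a `KeyError` here, which makes unexpected spellings in the raw data visible instead of silently encoding them.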
The parking_space field was encoded as a binary variable, where 1 indicates the presence of parking space and 0 indicates its absence. Similarly, the amenities were separated and individually encoded as binary features, with 1 representing the presence of an amenity in a property and 0 its absence.
A total of 18 amenities were considered:
24-hour Electricity
Air Conditioning
Balcony
Chandelier
Dining Area
Hot Water
Kitchen Cabinets
Kitchen Shelf
Pop Ceiling
Pre-Paid Meter
Tiled Floor
Refrigerator
Wardrobe
Wi-Fi
Apartment
Microwave
Dishwasher
TV
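The binary encoding of amenities and parking space might look like the sketch below. It assumes the raw amenities field is a comma-separated string and that parking_space holds values such as "Yes" or "No"; both formats are assumptions, since the raw data layout is not described here, and the function names are hypothetical.

```python
# The 18 amenities listed above.
AMENITIES = [
    "24-hour Electricity", "Air Conditioning", "Balcony", "Chandelier",
    "Dining Area", "Hot Water", "Kitchen Cabinets", "Kitchen Shelf",
    "Pop Ceiling", "Pre-Paid Meter", "Tiled Floor", "Refrigerator",
    "Wardrobe", "Wi-Fi", "Apartment", "Microwave", "Dishwasher", "TV",
]


def encode_amenities(raw, separator=","):
    """Turn a delimited amenities string into one 0/1 flag per known amenity."""
    present = {item.strip() for item in raw.split(separator) if item.strip()}
    return {name: int(name in present) for name in AMENITIES}


def encode_parking(value):
    """Encode parking_space as 1 (available) or 0 (not available).

    The accepted truthy spellings are an assumption about the raw data.
    """
    return int(str(value).strip().lower() in {"yes", "true", "1"})


# Hypothetical raw value, for illustration only.
flags = encode_amenities("Air Conditioning, Wi-Fi, Tiled Floor")
print(sum(flags.values()))          # number of amenities present
print(flags["Wi-Fi"], flags["TV"])  # one flag set, one not
```

Each amenity becomes its own column, which is why the original amenities column can be dropped afterwards.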
The following columns were not used for the analysis:
url
listing_type
name
location
amenities
region
locality
Visualizing Data
The resulting data had 28 features. To visualize it, the data was standardized and its dimensionality was reduced to 3 using Principal Component Analysis (PCA) [1]. The features that dominated the components were Hot Water, bedrooms, and Pre-Paid Meter.
Figure 2: Plot of the first 3 principal components
The plot shows that most of the data points are clumped together, with only a few outliers positioned outside the cluster.
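For reference, the standardization and projection steps can be written in the textbook form below. The symbols are introduced here for illustration and do not come from the original analysis: $x_{ij}$ is the value of feature $j$ for listing $i$, $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$, $Z$ is the matrix of standardized values for $n$ listings, and $v_1, v_2, v_3$ are the eigenvectors of the covariance matrix $C$ with the three largest eigenvalues.

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
C = \frac{1}{n-1} Z^\top Z, \qquad
C\,v_k = \lambda_k v_k, \qquad
Y = Z \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}
$$

Each row of $Y$ gives the three coordinates of one listing in the plot. The entries of $v_k$ with the largest absolute values indicate which original features contribute most to component $k$.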
Finding Important Features
RandomForestRegressor [2], GradientBoostingRegressor [3], and XGBRegressor [4] models were used to predict the rent prices. The root mean squared error (RMSE) and the $R^2$ score were computed during training and testing.
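The two metrics named above can be computed directly from their standard definitions. The sketch below uses plain Python and made-up rent values; it is not the evaluation code from the analysis, and the function names are chosen here for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in price units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up rents in GHS, for illustration only.
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```

RMSE is expressed in the same unit as the rent, so lower is better, while $R^2$ is a unitless score where 1 means a perfect fit. Comparing both on the training and test sets helps reveal overfitting.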
Figure 3: Feature importance per model
We computed the mean importance of each feature across the models and selected the features whose mean importance exceeds the standard deviation.
Figure 4: Selected features
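The selection rule can be sketched as follows. One assumption is made loudly here: the threshold is taken to be the standard deviation of the mean importances across all features, which is only one reading of the rule described above. The importance values are hypothetical and are not the values from the analysis.

```python
from statistics import mean, pstdev


def select_features(importances_by_model):
    """Keep features whose mean importance across models exceeds the
    standard deviation of those mean importances (assumed interpretation)."""
    features = next(iter(importances_by_model.values())).keys()
    mean_importance = {
        f: mean(model[f] for model in importances_by_model.values())
        for f in features
    }
    threshold = pstdev(list(mean_importance.values()))
    selected = [f for f, m in mean_importance.items() if m > threshold]
    return selected, threshold


# Hypothetical importances, not the values from the analysis.
toy = {
    "random_forest": {"bedrooms": 0.50, "balcony": 0.05, "lat": 0.30, "tv": 0.15},
    "gradient_boosting": {"bedrooms": 0.60, "balcony": 0.05, "lat": 0.20, "tv": 0.15},
    "xgboost": {"bedrooms": 0.40, "balcony": 0.05, "lat": 0.40, "tv": 0.15},
}
selected, threshold = select_features(toy)
print(selected)
```

Averaging across models reduces the influence of any single model's quirks before the threshold is applied.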
The selected features are then analyzed further for how they correlate with each other and with the rent price.
Figure 5: Correlation matrix of selected features.
With the exception of longitude and latitude, which have correlation coefficients of $0.1$ and $-0.1$ respectively, all of the other features have a correlation coefficient of about $0.5$ with the rent price.
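A correlation matrix of this kind can be built from pairwise coefficients. The sketch below assumes the Pearson coefficient, which is the default in pandas' `DataFrame.corr`; the article does not state which coefficient was used. The columns hold made-up values, not the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up values, for illustration only.
toy = {
    "bedrooms": [1, 2, 3, 4, 5],
    "bathrooms": [1, 1, 2, 3, 3],
    "price": [800, 1500, 2600, 3900, 5200],
}
matrix = correlation_matrix(toy)
print(round(matrix["bedrooms"]["price"], 3))
```

The coefficient ranges from $-1$ to $1$, so values near $\pm 0.1$ indicate a weak linear relationship and values near $0.5$ a moderate one.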
Finally, we measure the statistical significance by computing the $p$-value.
Variable | p-value |
---|---|
Bedrooms | 0.000000e+00 |
Bathrooms | 0.000000e+00 |
Latitude (lat) | 1.116787e-50 |
Longitude (lng) | 1.830369e-43 |
Air Conditioning | 0.000000e+00 |
Refrigerator | 0.000000e+00 |
Microwave | 0.000000e+00 |
Price | 0.000000e+00 |
The low values tell us that these features are statistically significant in determining the rent price.
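The article does not state which test produced these $p$-values. If they come from the usual test of a Pearson correlation $r$ against the null hypothesis of no linear association, then for $n$ listings the test statistic would be:

$$
t = r \sqrt{\frac{n-2}{1-r^2}}
$$

Under the null hypothesis, $t$ follows a Student's $t$ distribution with $n-2$ degrees of freedom, and the two-sided $p$-value is the probability of observing a value at least as extreme as $|t|$. Because $t$ grows with $\sqrt{n}$, a large dataset can produce very small $p$-values even for weak correlations, so the $p$-value indicates whether an association exists, not how strong it is.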
Conclusion
For a rental property in the Ghanaian market, key factors influencing pricing include the number of bedrooms and bathrooms, as well as the availability of amenities such as air conditioning, a refrigerator, and a microwave. Additionally, the location of the property plays a significant role in determining its value.
Source code
The source code for our analysis can be found on GitHub.
Acknowledgement
Special thanks to Philip Adzanoukpe for providing the dataset used in this analysis.
References
1. Principal Component Analysis, Wikipedia, https://en.wikipedia.org/wiki/Principal_component_analysis (accessed April 6, 2025).
2. RandomForestRegressor, scikit-learn 1.6.1 documentation, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed April 6, 2025).
3. GradientBoostingRegressor, scikit-learn 1.6.1 documentation, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html (accessed April 6, 2025).
4. Python API Reference, xgboost 3.0.0 documentation, https://xgboost.readthedocs.io/en/stable/python/python_api.html (accessed April 6, 2025).