Exploring Ghana's Rent Data

Introduction

Although Ghana has seen a recent surge in real estate activity, there is still limited data and few metrics available to accurately determine rental prices. In this article, we explore data sourced from a rental website to investigate the key factors that influence the cost of renting a home. The insights gained from this analysis can help investors understand how to add value to their properties, while also guiding renters on what to look for to ensure they get the best value for their money.

The data used for this analysis was sourced from Kaggle. It was originally scraped from Tonaton.com, a popular rental listing website in Ghana. The dataset includes various features for each property, such as rental cost, number of bedrooms and bathrooms, floor area, available amenities, location, and more.

Analysis

The dataset contains the following fields:

Field | Description
url | The link to the listing.
name | The headline or title of the rental property listing.
price | The rental price of the property in Ghanaian Cedis (GHS).
category | The type of rental property (e.g., apartment, house, room, office).
bedrooms | The number of bedrooms in the property.
bathrooms | The number of bathrooms in the property.
floor_area | The floor area of the property in square meters.
location | The address of the property.
condition | The condition of the property (e.g., new, used, off-plan).
amenities | The amenities provided in the property.
region | The geographic administrative region of the property.
locality | The town or city where the property is located.
parking_space | Indicates whether parking space is available.
is_furnished | Indicates whether the property is furnished.
lat | The latitude of the property.
lng | The longitude of the property.

Location Distribution

The dataset does not provide uniform coverage across all regions in Ghana. Most of the rental listings are concentrated in the capital city, Accra, with a sharp decline in the number of properties available in other cities and regions.

Figure 1: Distribution of properties shown on the map of Ghana

Preprocessing

The fields category, condition, and is_furnished were identified as ordinal variables, meaning their values have a meaningful order. To capture this in the analysis, these fields were numerically encoded according to the following orderings:

Category | Condition | Is Furnished | Code
Flats | Used | Unfurnished | 0
Duplex | Renovated | Semi-Furnished | 1
Townhouse | New | Furnished | 2
Semi-Detached | | | 3
Detached | | | 4
Mansion | | | 5
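As a rough illustration of this encoding, the sketch below maps each label to its position in the ordering listed above. The helper names (`ordinal_codes`, `encode_listing`) and the example listing are hypothetical and are not taken from the original analysis code.

```python
# Orderings copied from the encoding table; everything else is illustrative.
CATEGORY_ORDER = ["Flats", "Duplex", "Townhouse", "Semi-Detached", "Detached", "Mansion"]
CONDITION_ORDER = ["Used", "Renovated", "New"]
FURNISHED_ORDER = ["Unfurnished", "Semi-Furnished", "Furnished"]


def ordinal_codes(order):
    """Map each label to its position in the ordered list (0, 1, 2, ...)."""
    return {label: code for code, label in enumerate(order)}


ENCODERS = {
    "category": ordinal_codes(CATEGORY_ORDER),
    "condition": ordinal_codes(CONDITION_ORDER),
    "is_furnished": ordinal_codes(FURNISHED_ORDER),
}


def encode_listing(listing):
    """Return a copy of the listing with the three ordinal fields replaced by codes."""
    encoded = dict(listing)
    for field, codes in ENCODERS.items():
        encoded[field] = codes[listing[field]]
    return encoded


# Hypothetical listing used only to show the mapping.
example = {"category": "Townhouse", "condition": "New", "is_furnished": "Semi-Furnished"}
print(encode_listing(example))
```

A plain dictionary lookup keeps the ordering explicit, so a label that is missing from the table raises a `KeyError` instead of silently receiving an arbitrary code.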

The parking_space field was encoded as a binary variable, where 1 indicates the presence of parking space and 0 indicates its absence. Similarly, the amenities were separated and individually encoded as binary features—1 representing the presence of an amenity in a property and 0 its absence.

A total of 18 amenities were considered:

  • 24-hour Electricity
  • Air Conditioning
  • Balcony
  • Chandelier
  • Dining Area
  • Hot Water
  • Kitchen Cabinets
  • Kitchen Shelf
  • Pop Ceiling
  • Pre-Paid Meter
  • Tiled Floor
  • Refrigerator
  • Wardrobe
  • Wi-Fi
  • Apartment
  • Microwave
  • Dishwasher
  • TV
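The binary encoding of amenities could be sketched as follows. The article does not describe the raw format of the amenities field, so this sketch assumes a comma-separated string; the function name `encode_amenities` is likewise hypothetical.

```python
# The 18 amenities listed above.
AMENITIES = [
    "24-hour Electricity", "Air Conditioning", "Balcony", "Chandelier",
    "Dining Area", "Hot Water", "Kitchen Cabinets", "Kitchen Shelf",
    "Pop Ceiling", "Pre-Paid Meter", "Tiled Floor", "Refrigerator",
    "Wardrobe", "Wi-Fi", "Apartment", "Microwave", "Dishwasher", "TV",
]


def encode_amenities(raw):
    """Turn a raw amenities string into one 0/1 feature per known amenity.

    Assumption: `raw` is a comma-separated string such as "Wi-Fi, Balcony".
    """
    present = {item.strip() for item in raw.split(",") if item.strip()}
    return {name: int(name in present) for name in AMENITIES}


# Hypothetical listing with three amenities.
row = encode_amenities("Air Conditioning, Wi-Fi, Balcony")
print(sum(row.values()), "of", len(row), "amenities present")
```

Each listing ends up with the same 18 columns regardless of how many amenities it advertises, which is what the models downstream need.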

The following columns were not used for the analysis:

  • url
  • listing_type
  • name
  • location
  • amenities
  • region
  • locality

Visualizing Data

The resulting data had 28 features. To visualize it, the data was standardized and its dimensionality was reduced to 3 using Principal Component Analysis (PCA) [1]. The features that contributed most to the principal components were Hot Water, bedrooms, and Pre-Paid Meter.

Figure 2: Plot of the first 3 principal components

The plot shows that most of the data points are clumped together, with only a few outliers positioned outside the cluster.
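For readers who want to reproduce this step, here is a minimal sketch of standardization followed by PCA, written with NumPy's singular value decomposition. The random matrix is only a stand-in for the real 28-feature table, and the article does not say which PCA implementation was actually used.

```python
import numpy as np

# Synthetic stand-in for the encoded dataset: 200 listings, 28 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 28))

# Standardize each feature to zero mean and unit variance.
# (A constant column would have zero standard deviation and must be dropped first.)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The rows of Vt are the principal directions, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
components = Vt[:3]

# Project every listing onto the first three principal components.
projected = X_std @ components.T

# Share of the total variance captured by each of the three components.
explained_ratio = (S ** 2)[:3] / (S ** 2).sum()

# Index of the feature with the largest absolute loading on each component.
dominant = np.abs(components).argmax(axis=1)

print(projected.shape, explained_ratio, dominant)
```

Inspecting the largest absolute loadings is one common way to decide which original features dominate a component.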

Finding Important Features

RandomForestRegressor [2], GradientBoostingRegressor [3], and XGBRegressor [4] models were used to predict the rent prices. The root mean squared error (RMSE) and the $R^2$ score were computed on the training and test sets.
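To make the two evaluation metrics concrete, the sketch below computes them from their textbook definitions. The prices and predictions are made-up numbers, not results from the analysis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical rents (GHS) and model predictions, for illustration only.
y_true = [1000.0, 2000.0, 3000.0, 4000.0]
y_pred = [1100.0, 1900.0, 3200.0, 3800.0]

print(round(rmse(y_true, y_pred), 2))       # 158.11
print(round(r_squared(y_true, y_pred), 2))  # 0.98
```

RMSE is expressed in the same unit as the rent, while $R^2$ is unitless, so the two are usually read together.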

Figure 3: Feature importance per model

We computed the mean importance of each feature across the models and selected the features whose mean importance is greater than the standard deviation.

Figure 4: Selected features
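One way to read this selection rule is sketched below. It assumes the threshold is the standard deviation of the per-feature mean importances, which the article does not state explicitly. The importance values are invented for illustration and cover only five features.

```python
import statistics

# Hypothetical feature importances per model (not the values from the analysis).
importances = {
    "RandomForestRegressor": {"bedrooms": 0.40, "bathrooms": 0.30, "lat": 0.20, "Balcony": 0.06, "TV": 0.04},
    "GradientBoostingRegressor": {"bedrooms": 0.45, "bathrooms": 0.25, "lat": 0.20, "Balcony": 0.05, "TV": 0.05},
    "XGBRegressor": {"bedrooms": 0.35, "bathrooms": 0.35, "lat": 0.20, "Balcony": 0.04, "TV": 0.06},
}

# Feature names, in the order used by the first model.
features = list(next(iter(importances.values())))

# Mean importance of each feature across the three models.
mean_importance = {
    name: statistics.fmean(model[name] for model in importances.values())
    for name in features
}

# Assumed threshold: standard deviation of the mean importances.
threshold = statistics.pstdev(mean_importance.values())

# Keep only the features whose mean importance exceeds the threshold.
selected = [name for name in features if mean_importance[name] > threshold]
print(selected)
```

Averaging across models reduces the chance that a feature is selected only because of one model's quirks.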

The selected features were then analyzed further to see how they correlate with each other and with the rent price.

Figure 5: Correlation matrix of selected features.

With the exception of longitude and latitude, which have correlation coefficients of $0.1$ and $-0.1$ respectively, all of the other features have a correlation coefficient of about $0.5$ with the rent price.
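The article does not say which correlation measure was used. Assuming it is the Pearson coefficient, a single entry of the matrix could be computed as in this sketch; the bedroom counts and prices are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical bedroom counts and monthly rents (GHS), for illustration only.
bedrooms = [1, 2, 3, 4, 5]
price = [1000, 1500, 2600, 3900, 5000]

print(pearson_r(bedrooms, price))
```

The coefficient ranges from -1 to 1, so values near 0.1 or -0.1 indicate a much weaker linear relationship than values near 0.5.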

Finally, we measure the statistical significance by computing the $p$-value.

Variable | p-value
Bedrooms | 0.000000e+00
Bathrooms | 0.000000e+00
Latitude (lat) | 1.116787e-50
Longitude (lng) | 1.830369e-43
Air Conditioning | 0.000000e+00
Refrigerator | 0.000000e+00
Microwave | 0.000000e+00
Price | 0.000000e+00

These low values tell us that each of these features has a statistically significant relationship with the rent price.
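The article does not state which test produced these p-values. Assuming they come from the standard significance test for a Pearson correlation coefficient $r$ computed over $n$ listings, under the null hypothesis of no linear relationship, the test statistic and two-sided p-value would be:

```latex
$$
t = r\,\sqrt{\frac{n-2}{1-r^{2}}},
\qquad
p = 2\,P\left(T_{n-2} \ge |t|\right)
$$
```

Here $T_{n-2}$ follows Student's t distribution with $n-2$ degrees of freedom. Under this assumption, $t$ grows with $\sqrt{n-2}$, so even a weak correlation such as $|r| = 0.1$ can yield a very small p-value when the number of listings is large. A small p-value therefore indicates that the relationship is unlikely to be due to chance, not that it is strong.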

Conclusion

For a rental property in the Ghanaian market, key factors influencing pricing include the number of bedrooms and bathrooms, as well as the availability of amenities such as air conditioning, a refrigerator, and a microwave. Additionally, the location of the property plays a significant role in determining its value.

Source code

The source code for our analysis is available on GitHub.

Acknowledgement

Special thanks to Philip Adzanoukpe for providing the dataset used in this analysis.

References

[1] Principal Component Analysis, Wikipedia, https://en.wikipedia.org/wiki/Principal_component_analysis (accessed April 6, 2025).

[2] RandomForestRegressor, scikit-learn 1.6.1 documentation, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed April 6, 2025).

[3] GradientBoostingRegressor, scikit-learn 1.6.1 documentation, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html (accessed April 6, 2025).

[4] Python API Reference, xgboost 3.0.0 documentation, https://xgboost.readthedocs.io/en/stable/python/python_api.html (accessed April 6, 2025).