We set out to understand the housing sector in Madrid using data from more than 20,000 advertisements on real estate websites.
We will identify the price distribution, look at which factors are most influential and analyze which developers/companies matter most in each niche. All with a single goal: finding undervalued buying or investment opportunities with the help of predictive algorithms.
Price distribution in Madrid seems to follow a well-known pattern
We can see that prices in Madrid, as in almost every city, tend to increase the closer you get to the city center.
However, prices also trend upward as you move away from the center towards the northwest: Pozuelo, Aravaca, Las Rozas, Montecarmelo, Las Tablas...
Almost the opposite happens in the other direction: the further southeast you go, the lower the prices. Arganzuela, Carabanchel, Getafe, Leganés, Fuenlabrada... 'El cinturón sur' (the southern belt).
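To make the center-to-periphery gradient concrete, here is a minimal Python sketch (standard library only) that groups listings into 2 km rings around the center and takes the median price per square metre in each ring. It is not how Graphext draws its map: treating Puerta del Sol as "the center", the ring width and the sample listings are all assumptions made for illustration.

```python
import math
from statistics import median

# Assumption: Puerta del Sol (approximate lat/lon) stands in for "the city center".
CENTER = (40.4169, -3.7035)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def median_price_by_ring(listings, ring_km=2.0):
    """Median price per square metre for each ring of width ring_km around CENTER.

    Each listing is a (lat, lon, price, square_metres) tuple. Ring 0 is the
    innermost ring, ring 1 the next one out, and so on.
    """
    rings = {}
    for lat, lon, price, m2 in listings:
        ring = int(haversine_km(CENTER, (lat, lon)) // ring_km)
        rings.setdefault(ring, []).append(price / m2)
    return {ring: median(values) for ring, values in sorted(rings.items())}


# Hypothetical listings, not taken from the dataset.
listings = [
    (40.418, -3.704, 600_000, 100),  # close to the center
    (40.420, -3.700, 500_000, 100),  # close to the center
    (40.350, -3.750, 200_000, 100),  # roughly 8 km to the south-southwest
]
print(median_price_by_ring(listings))
```

Rings are a deliberately crude way to capture distance; the northwest/southeast asymmetry described above would need direction as well, for example by also grouping on the bearing from the center.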
What marks the difference between high price areas and low price ones?
The main difference lies in the residents' education level: it is the most important variable for telling the two segments apart. This insight makes us wonder: is it easier to study in an expensive area, or do people with higher levels of education simply tend to live there? Maybe both? A virtuous circle?
The second most important factor is the Location Index, a number between 1 and 9 that summarizes the overall quality of a location, where lower values are better. It seems reasonable that areas with more expensive houses score better.
Something less obvious happens with the 3rd and 4th variables. Apparently, the number of bathrooms is more strongly associated with the price difference than the number of bedrooms. Lower-priced houses tend to have 2 or fewer bathrooms, while more expensive ones have between 2 and 4.
Cheaper houses tend to have 2 or 3 bedrooms compared to the 3 or 4 bedrooms we usually see in more expensive houses.
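The post does not say how Graphext ranks variables by importance. As a stand-in technique, the sketch below scores each variable by its standardized mean difference between the Low and High Price segments (a Cohen's d style effect size) and sorts by that score. The function name, column names and rows are hypothetical.

```python
from statistics import mean, stdev


def rank_by_separation(low, high, variables):
    """Rank variables by how strongly they separate two price segments.

    The score is the absolute standardized mean difference (Cohen's d style).
    The pooled standard deviation simply averages the two variances, which
    assumes segments of similar size.
    """
    scores = {}
    for var in variables:
        a = [row[var] for row in low]
        b = [row[var] for row in high]
        pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
        scores[var] = abs(mean(b) - mean(a)) / pooled_sd if pooled_sd else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rows: one dict per listing.
low_segment = [
    {"bathrooms": 1, "bedrooms": 2},
    {"bathrooms": 2, "bedrooms": 3},
    {"bathrooms": 1, "bedrooms": 3},
]
high_segment = [
    {"bathrooms": 3, "bedrooms": 3},
    {"bathrooms": 4, "bedrooms": 4},
    {"bathrooms": 2, "bedrooms": 4},
]
for var, score in rank_by_separation(low_segment, high_segment, ["bathrooms", "bedrooms"]):
    print(f"{var}: {score:.2f}")
```

A model-based importance measure (for example, from a classifier trained to tell the segments apart) can rank variables differently, since it accounts for variables jointly rather than one at a time.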
Type of house and main developers for Low and High Price segments
Although it may seem obvious, we can confirm that the Low Price segment is made up mainly of flats and apartments. Less obviously, duplexes also fall into this segment.
For the High Price segment, the main type is chalets of any kind, followed by penthouses.
In terms of developers, it is notable that none of the top 10 developers in either segment appears in the top 10 of the other. Only Housell and Pradesa are present in both segments, a remarkable fact. This suggests that specialization is key to succeeding in this sector.
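The overlap check itself is simple set arithmetic. A minimal sketch, assuming each listing reduces to a hypothetical (developer, segment) pair; the toy data uses n=1 instead of the top 10 because it only has a handful of rows.

```python
from collections import Counter


def top_developers(listings, segment, n=10):
    """Set of the n developers with the most listings in one segment.

    Each listing is a (developer, segment) pair. Ties at the cut-off are
    broken by first appearance, which is how Counter.most_common orders them.
    """
    counts = Counter(dev for dev, seg in listings if seg == segment)
    return {dev for dev, _ in counts.most_common(n)}


def developers_in(listings, segment):
    """Set of every developer with at least one listing in a segment."""
    return {dev for dev, seg in listings if seg == segment}


# Hypothetical data; developer names are placeholders.
listings = [
    ("dev_a", "low"), ("dev_a", "low"), ("dev_b", "low"),
    ("dev_c", "high"), ("dev_c", "high"), ("dev_d", "high"), ("dev_b", "high"),
]
top_overlap = top_developers(listings, "low", n=1) & top_developers(listings, "high", n=1)
any_overlap = developers_in(listings, "low") & developers_in(listings, "high")
print(top_overlap, any_overlap)
```

With this toy data the top-n sets do not intersect, while dev_b still appears in both segments: the same shape as the finding above, where the top 10 lists are disjoint but a couple of developers are present in both segments.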
Benchmark of developers in specific niches
If specialization is necessary for success, analyzing niches is critical for good positioning. For example, if we wanted to position ourselves in medium-low and medium-high priced chalets in the northwest of Madrid, we should take into account that Pradesa is by far the main player.
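A niche benchmark boils down to filtering by house type, zone and price band, then computing each developer's share of the remaining listings. A hedged sketch with placeholder field names (developer, type, zone, price) and invented listings; the real dataset's columns and zone definitions may differ.

```python
from collections import Counter


def niche_shares(listings, house_type, zone, min_price, max_price):
    """Share of listings per developer within one niche.

    A niche is a house type, a zone and a price band [min_price, max_price).
    Returns {developer: share}, ordered from largest to smallest share.
    """
    niche = [
        item["developer"]
        for item in listings
        if item["type"] == house_type
        and item["zone"] == zone
        and min_price <= item["price"] < max_price
    ]
    if not niche:
        return {}
    return {dev: count / len(niche) for dev, count in Counter(niche).most_common()}


# Hypothetical listings; zones, prices and developers are placeholders.
listings = [
    {"developer": "dev_a", "type": "chalet", "zone": "northwest", "price": 400_000},
    {"developer": "dev_a", "type": "chalet", "zone": "northwest", "price": 450_000},
    {"developer": "dev_a", "type": "chalet", "zone": "northwest", "price": 600_000},
    {"developer": "dev_b", "type": "chalet", "zone": "northwest", "price": 500_000},
    {"developer": "dev_b", "type": "flat", "zone": "northwest", "price": 300_000},
    {"developer": "dev_c", "type": "chalet", "zone": "southeast", "price": 400_000},
    {"developer": "dev_c", "type": "chalet", "zone": "northwest", "price": 2_000_000},
]
print(niche_shares(listings, "chalet", "northwest", 300_000, 900_000))
```

Counting listings is only a proxy for market weight; weighting each listing by its price would give a share of advertised value instead.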
Unexpected demographic insights
Thanks to enrichment with census data, we can make a more complex analysis than simply exploring a real estate platform's data. We add square meters per house, number of persons per house, education level, proportion of foreigners, proportion of married couples, age and location index.
By coloring our Graph by each of these variables to understand its distribution, we found that a zone inside Ensanche de Vallecas has the lowest education level in all of Madrid. We can also see that high immigration levels match low education levels almost exactly. We could say that there is a clear problem of social integration in the area.
Finding opportunities with Graphext by calculating house prices based on their characteristics
With Graphext's prediction flow we can calculate the 'real' price of each house based on its characteristics. Once we have it, we can compare each listing's asking price with its 'real' price. Any house whose asking price is lower than the price we have calculated is undervalued, and therefore a potential opportunity.
To start identifying these opportunities, we have to focus on the Error Category. Overpredicted houses are possible opportunities, since Graphext estimates they are worth more than their asking price. However, there is a problem we need to take into account: not every overprediction is an opportunity. Some ads are false, outdated or flawed and, obviously, Graphext will make errors too.
For this reason, we need to filter the Overprediction category by the size of error we consider acceptable. A 20/30k error for a 150/200k house? 100/200k for a 1/1.5 million one? Well, that's something to discuss with your partner, your pillow or your brother-in-law the real estate expert.
Nevertheless, every opportunity should be checked to confirm it is a real opportunity and not a false alarm.
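One way to express the Error Category and the "acceptable error" filter in code. This is our own reading, not Graphext's definition: a listing is overpredicted when the prediction exceeds the asking price by more than a tolerance (an arbitrary 5% here), and we keep only overpredictions whose relative gap falls inside a chosen band, treating larger gaps as more likely to be false, outdated or flawed ads. All names and numbers are hypothetical.

```python
def error_category(asking, predicted, tolerance=0.05):
    """Label a listing by the relative gap between prediction and asking price.

    'overpredicted' means the model's estimate is above the asking price by
    more than `tolerance`; the 5% default is an arbitrary placeholder.
    """
    gap = (predicted - asking) / asking
    if gap > tolerance:
        return "overpredicted"
    if gap < -tolerance:
        return "underpredicted"
    return "accurate"


def plausible_opportunities(rows, min_gap, max_gap):
    """Ids of overpredicted listings whose relative gap lies in [min_gap, max_gap].

    Gaps above max_gap are treated as more likely to come from false,
    outdated or flawed ads (or model errors) than from real bargains.
    """
    keep = []
    for listing_id, asking, predicted in rows:
        gap = (predicted - asking) / asking
        if error_category(asking, predicted) == "overpredicted" and min_gap <= gap <= max_gap:
            keep.append(listing_id)
    return keep


# Hypothetical (id, asking price, predicted price) rows.
rows = [
    ("a", 150_000, 175_000),  # gap ~17%: inside the accepted band
    ("b", 200_000, 205_000),  # gap 2.5%: "accurate", not an opportunity
    ("c", 150_000, 300_000),  # gap 100%: too large to trust
    ("d", 200_000, 150_000),  # underpredicted
]
# Roughly the "20/30k on a 150/200k house" range: 20k/200k = 10%, 30k/150k = 20%.
print(plausible_opportunities(rows, min_gap=0.10, max_gap=0.20))
```

Relative gaps let one band work across price levels; if you prefer absolute thresholds such as "20/30k", compare `predicted - asking` directly instead.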
How did we do this analysis?
With Graphext. It took us less than 30 minutes to gather and process the data and find these insights. Graphext lets you find patterns and insights in data quickly and intuitively. You can use Graphext yourself with our Freemium version, Graphext Public.