A popular question among property managers and private investors is: “Is there a way to reliably identify undervalued properties?”
While this question may have found some (partial) answers in the past, today’s technological environment allows us to find answers in predictive technology. The volatile real estate market is impacted by many economic and social factors alike, whose effect is either unknown or hard to measure. The limitations of traditional real estate data are reached when trying to figure out the evolution of the real estate market. Simple historical data is not enough anymore.
Instead, Blue Orange Digital, a top-ranked AI development agency in New York, explains how alternative data sources offer a way to accurately track real estate market trends and identify undervalued properties.
Since the real estate market is not only impacted by global factors, but by local factors alike, undervalued properties can be identified by modeling local dynamics. A variety of alternative data sources provide information about the behavior of markets at a local scale.
When investigating the areas surrounding a property, location data gives an accurate model of nearby activity. Tracking makes it possible to determine the usage and status of public spaces (such as parking spots), to identify the most commonly used pathways, as well as traffic patterns in a neighborhood. At the same time, IoT and sensor data can give insights with respect to both indoor and outdoor activities (shopping centers, commercial areas, drug stores, etc.). Such data can be leveraged by predictive technology to understand the impact on real estate prices.
Research has repeatedly identified local activity as a factor impacting real estate prices, in both commercial and residential areas. The walkability score, for example, is a popular measure that has an impact on property value. Apart from its social and environmental benefits, it has been found that “All walkable property types generated higher income and therefore have the potential to generate returns as good as or better than less walkable properties, as long as they are priced correctly.”
Another score that represents local activity is the so-called proximity effect. It accounts for travel times, distances to specific locations, and the different modes of transportation that are available in the area. By utilizing alternative data sources, research has reached some interesting conclusions with regard to house pricing, such as the “negative relation between residential values and proximity to commercial and industrial zones.” With the plethora of alternative data sources available to the commercial real estate (CRE) sector, modeling different scores of local activity is now within easy reach.
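One simple ingredient of a proximity score is the straight-line distance from a property to the nearest point of interest. The sketch below uses the haversine formula for that; the function names and coordinates are hypothetical, and real travel times would require a routing service rather than great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(location, points_of_interest):
    """Distance from a (lat, lon) location to the closest point of interest."""
    lat, lon = location
    return min(haversine_km(lat, lon, p_lat, p_lon)
               for p_lat, p_lon in points_of_interest)


# Hypothetical coordinates, for illustration only.
property_location = (40.7484, -73.9857)
transit_stops = [(40.7527, -73.9772), (40.7506, -73.9935)]
nearest = nearest_distance_km(property_location, transit_stops)
print(f"Nearest transit stop: {nearest:.2f} km")
```

A value like this can then be added as one more column next to the traditional property features.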
Increasing numbers of real estate transactions are happening online, producing a digital track record of investments and their value in real-time. Automated processing and tracking tools of online listings give an opportunity to identify undervalued properties that fall outside the typical market pricing range. Data scraped from real estate listing websites plays a crucial role in the estimation of both buying and selling prices.
Alternative data collected from the web can complement the traditional features used to evaluate real estate properties. For example, sociodemographic data and geographic features of an area can also be extracted from public online listings. Modern pricing models can handle heterogeneous data sources and this gives them increased accuracy for determining price-impacting factors. Once underpriced properties are identified (those with a price estimation well above their current listing value), price prediction systems can be automated to send custom alerts, helping real-estate players stay ahead of the market.
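The alerting idea described above can be sketched in a few lines: compare each listing's predicted price with its asking price and flag the ones where the gap is large. The field names, the 15% threshold, and the sample listings are assumptions made for illustration, not details from the case study.

```python
def flag_underpriced(listings, min_gap=0.15):
    """Return listings whose predicted price exceeds the listed price
    by at least min_gap (expressed as a fraction of the listed price)."""
    flagged = []
    for item in listings:
        gap = (item["predicted_price"] - item["listed_price"]) / item["listed_price"]
        if gap >= min_gap:
            flagged.append({**item, "gap": round(gap, 3)})
    # Largest gaps first, so the strongest candidates lead the alert list.
    return sorted(flagged, key=lambda x: x["gap"], reverse=True)


# Hypothetical listings, for illustration only.
listings = [
    {"id": "A", "listed_price": 1_000_000, "predicted_price": 1_300_000},
    {"id": "B", "listed_price": 2_000_000, "predicted_price": 2_050_000},
    {"id": "C", "listed_price": 800_000, "predicted_price": 700_000},
]

alerts = flag_underpriced(listings)
for alert in alerts:
    print(f"Listing {alert['id']}: predicted {alert['gap']:.0%} above asking price")
```

The threshold is a judgment call: it should be wider than the model's typical prediction error, otherwise most alerts will be noise.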
Predicting the price of commercial real estate is a difficult process. You may have a lot of data but not know what to do with it or you may want to know what data can be used to gain an advantage. First, let’s look at how you can use some of your existing data to better inform your decisions.
In the following case study, Blue Orange Digital models the effects of adding diversified data sources into advanced prediction models to more easily identify undervalued properties. For this basic demonstration, to show the value of integrating third-party data, they began with about 2,500 observations of commercial real estate sales from 2017 in New York City. The key features of this data are neighborhood, building class category (e.g. office, retail, hotel, garage), unit type (residential, commercial, and total units), year built, and gross square feet.
Using this data set, they wanted to build a model that predicts the price per square foot, starting with a basic model and gradually increasing the complexity to compare the accuracy of each step. Finally, they added different alternative data sources to see how that would in turn affect the effectiveness of the models. Collecting more data is usually the first way to increase accuracy, but even with this limited amount of data, the point is clear.
First, they cleaned the data and removed outliers to get a data set that would give the most accurate predictions. They focused the data set on properties that sold for $100 to $1,000 per square foot, since this was the price range for the majority of the observations.
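A filtering step like this one is straightforward to express in code. The following is a minimal sketch, assuming each sale is a dictionary with hypothetical `sale_price` and `gross_sqft` fields; the actual cleaning rules used in the case study are not described beyond the price range.

```python
def clean_sales(sales, low=100.0, high=1000.0):
    """Keep sales with usable price and area whose price per square foot
    falls within [low, high]."""
    cleaned = []
    for sale in sales:
        if not sale.get("gross_sqft") or not sale.get("sale_price"):
            continue  # skip records with a missing or zero price or area
        ppsf = sale["sale_price"] / sale["gross_sqft"]
        if low <= ppsf <= high:
            cleaned.append({**sale, "price_per_sqft": ppsf})
    return cleaned


# Hypothetical records, for illustration only.
sales = [
    {"sale_price": 5_000_000, "gross_sqft": 10_000},   # $500 per sq ft: kept
    {"sale_price": 50_000, "gross_sqft": 10_000},      # $5 per sq ft: dropped
    {"sale_price": 30_000_000, "gross_sqft": 10_000},  # $3,000 per sq ft: dropped
    {"sale_price": 0, "gross_sqft": 8_000},            # no recorded price: dropped
]

cleaned = clean_sales(sales)
print(len(cleaned), "of", len(sales), "sales kept")
```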
Next, they wanted to mimic the current basic standard of analysis, a linear regression model, to get a baseline measure of accuracy to build upon. Using this method they found an error of $186.93, reported as the mean square error (MSE). Because the figure is expressed in dollars rather than squared dollars, it is best read as a typical prediction error, closer to a root-mean-square error: with just a standard linear regression, the model's predictions were off from the real price by roughly $186.93 per square foot. The closer the error is to 0, the better the model. Since only values between $100 and $1,000 per square foot are considered, this first prediction isn’t very good.
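The error measure used here is easy to compute by hand. The sketch below uses made-up prices to show that MSE is expressed in squared units, while its square root (RMSE) is in the same units as the price itself, which is what makes it comparable to a dollar figure.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of squared differences; units are the square of the target's units."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of MSE; same units as the target (here, dollars per sq ft)."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical prices per square foot, for illustration only.
actual = [300.0, 450.0, 700.0, 520.0]
predicted = [350.0, 400.0, 600.0, 560.0]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MSE:  {mse:.2f} (squared dollars per sq ft)")
print(f"RMSE: {rmse:.2f} (dollars per sq ft)")
```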
Phase 1: Basic linear regression model: Gives a baseline comparison for improved accuracy.
Let’s see how the model improves with the use of more advanced algorithms and data science. Instead of using a standard linear regression model, they sampled results across three different regression models: Decision Tree, Random Forest, and a Gradient Boosting Regressor. To save you the boring details, these models are versions of regression models that each apply a different statistical methodology. Here are the MSE results for our models:
Phase 2 to increase accuracy: Apply complex models
| Random Forest Regressor | Decision Tree Regressor | Gradient Boosting Regressor |
As you can see, the Decision Tree model performed just as poorly as our linear regression model, but our Random Forest and Gradient Boosting Model provide a significantly more accurate prediction than a standard linear regression model. You can use this model now to predict which commercial real estate listings are currently overpriced or underpriced.
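A comparison of this kind can be reproduced with scikit-learn. The sketch below is not the case study's code: it trains the same four model families on synthetic data generated on the spot, so the features, the target formula, and the resulting numbers are all assumptions for illustration. It reports the root-mean-square error so the scores are in price units.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for property features (not real sales data).
rng = np.random.default_rng(0)
n = 500
year_built = rng.integers(1900, 2018, size=n)
gross_sqft = rng.uniform(2_000, 100_000, size=n)
total_units = rng.integers(1, 50, size=n)
X = np.column_stack([year_built, gross_sqft, total_units])

# Synthetic, deliberately nonlinear price per square foot.
y = (300 + 0.5 * (year_built - 1900) + 150 * np.sin(gross_sqft / 15_000)
     + 2 * total_units + rng.normal(0, 30, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Root-mean-square error on held-out data, in price units.
    results[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE {rmse:.2f}")
```

Scoring on a held-out test split, rather than on the training data, is what makes the comparison meaningful: tree-based models can fit training data almost perfectly while still predicting poorly on new listings.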
Now, let’s see what happened when they added alternative data to the model. For this example, they chose to use the WalkScore API (Application Programming Interface) and the Google Maps API to get some additional features about the data. By adding this external data they were hoping to reduce the MSE and make a better prediction. Here are the results:
Phase 3 to increase accuracy: Apply WalkScore API
You can see that by adding this one piece of data they were able to gain a 6% increase in accuracy in predicting price. While the mean square error was drastically reduced from the baseline model, it is still relatively high. In order to drive the MSE closer to 0, you would need to add more observations or more variables. Picking the right alternative data sources will depend on your market and your access to data. This is why it is important to team your commercial real estate expertise with a company like Blue Orange Digital that has the technical expertise to enrich, organize, and analyze your data to gain more analytic insight into the commercial real estate market.
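Mechanically, enriching a data set with an external score is a keyed join between property records and the third-party values. The sketch below makes no API calls: it assumes the scores have already been retrieved into a dictionary keyed by address, and all names and values in it are hypothetical.

```python
def normalize_address(address):
    """Lowercase and collapse whitespace so equivalent addresses match."""
    return " ".join(address.lower().split())


def enrich_with_scores(properties, scores_by_address, default=None):
    """Attach an external score to each property record, matched on address.
    Properties without a matching score receive `default`."""
    lookup = {normalize_address(addr): score
              for addr, score in scores_by_address.items()}
    return [
        {**prop, "walk_score": lookup.get(normalize_address(prop["address"]), default)}
        for prop in properties
    ]


# Hypothetical records and scores, for illustration only.
properties = [
    {"address": "10 Example  St", "gross_sqft": 12_000},
    {"address": "25 Sample Ave", "gross_sqft": 40_000},
]
scores_by_address = {"10 example st": 97}

enriched = enrich_with_scores(properties, scores_by_address)
for row in enriched:
    print(row["address"], "->", row["walk_score"])
```

Rows left without a score need an explicit decision before modeling, such as dropping them or filling in a neighborhood average.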
The volatility of the real estate market can present both a challenge and an opportunity for real-estate players. Keeping a close eye on the market fluctuations is a way to quickly identify undervalued properties, but this is hard to achieve with traditional data sets and tools. Alternative data sources allow the capturing of market patterns and the modeling of market fluctuations. Web scraped data, social media posts, and news articles are commonly integrated with traditional real-estate data for market monitoring and forecasting applications.
Traditional market forecasts rely only on statistics provided in annual reports, financial statements, and government data, which are published with a delay and only serve as descriptions of historical events. Alternative and real-time data are more suitable for market monitoring: orthogonal data sources capture a 360-degree view of the market and predictive tools can process data events as they happen. This leads to better-performing models and more reliable insights, enabling real-estate players to be proactive instead of reactive.
Additional insights have been gained by integrating search indices from Google Trends with the more traditional housing index price data. Beyond that, natural language processing has allowed companies to integrate online news articles, based on the assumption that “emotions or sentiment of words in online opinions or articles could serve as an effective indicator of the real-world,” which makes such sentiment data relevant for prediction applications. Since the internet nowadays is used by most buyers and sellers engaging in real estate transactions, there is a lot of potential and economic value in alternative data.
We have seen how data science, real-time data, and sentiment analysis can bring insights that were previously hidden. This is all made possible by third-party and alternative data, which provide an opportunity to understand local-scale effects and derive property value from seemingly unrelated data events. Similarly, web-scraped data provides an opportunity to evaluate properties, build automated tools, and stay up to date with market events. Lastly, external data such as news articles, search indices, and more web-scraped data can help investors keep track of market fluctuations and act proactively, instead of reacting to data events in the past.
All in all, we can conclude that alternative data is a true compass that can help identify undervalued opportunities. When leveraged by predictive technology, there’s the potential to increase ROI and answer the sector’s most pressing questions.
When you hear about the power of alternative data to boost the bottom line in real estate, you might ask your IT team to work on it. However, implementing advanced algorithms, scaling a digital transformation, and capitalizing on all the available alternative data may not be within your team’s capability nor within your budget to experiment with. That means that an in-house solution is unlikely to work. Reach out to a top-ranked software development agency, like Blue Orange Digital, to discuss how they can deliver your real estate solution in 90 days or less.
Do you have any related questions? From real estate to health care and energy, the Blue Orange Digital team has extensive experience developing machine learning algorithms, analytic models, and custom big data solutions.
Read more about alternative data sources and how they are being used to increase ROI in “Third-Party Data is Increasing Commercial Real Estate ROI.”
Josh Miramant is the CEO and founder of Blue Orange Digital, a data science and machine learning agency with offices in New York City and Washington DC. Miramant is a popular speaker, futurist, and a strategic business & technology advisor to enterprise companies and startups. He is a serial entrepreneur and software engineer that has built and scaled 3 startups. He helps organizations optimize and automate their businesses, implement data-driven analytic techniques, and understand the implications of new technologies such as artificial intelligence, big data, and the Internet of Things.
Featured on IBM ThinkLeaders, Dell Technologies, and NYC’s Top 10 AI Development and Custom Software Development Agencies as reviewed on Clutch and YahooFinance for his contributions to NLP, AI, and Machine Learning. Specializing in predictive maintenance, unified data lakes, supply chain/grid/marketing/sales optimization, anomaly detection, recommendation systems, among other ML solutions for a multitude of industries.
Visit BlueOrange.digital for more information and Case Studies.