Analyzing Housing Prices in Airbnb

Housing Price Prediction, Review Classification and Rental Activity Analysis

Batselem Jagvaral
Dec 16, 2019

Airbnb is an online platform where homeowners list real estate such as houses or apartments for rent, and renters pay the homeowners to stay there. In this post, we will look at Airbnb data for Seattle, Boston and Beijing and answer the following questions:

  1. Can we predict housing prices based on features such as the number of beds and the amenities?
  2. Can we find negative and positive reviews based on text data?
  3. What month of the year is the busiest?

Housing price prediction
Let’s say that an owner starts a new rental business in Seattle. The owner will want to know what price to charge. To help, we will build a model that estimates the rental price of a house based on its properties. First, let’s look at the Airbnb data. Each host listing contains several properties such as the number of rooms, the number of beds and the location:

Table 1. Example Airbnb Dataset.

We can visualize the correlations between different features (properties) in the data to find out which features are linearly related and which ones can be useful for predicting the housing price. In Figure 1, strongly correlated features are highlighted in red. For example, the number of beds and the number of bedrooms are highly correlated because each bedroom usually has at least one bed, so the two counts tend to move together. Since they carry largely the same information, including both is unlikely to add much insight for predicting the housing price.

Figure 1. Feature Correlation Matrix. Red color indicates a high correlation while blue indicates a low correlation.
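To make the correlation check concrete, here is a minimal sketch that computes a Pearson correlation matrix by hand. The column names and values are made up for illustration and are not taken from the Airbnb data; in practice a library such as pandas would normally do this step.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical listings (made-up values, not real Airbnb rows).
listings = {
    "bedrooms":  [1, 1, 2, 2, 3, 4],
    "beds":      [1, 2, 2, 3, 3, 5],
    "bathrooms": [1, 1, 1, 2, 2, 2],
}

def correlation_matrix(columns):
    """Return a nested dict with the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

matrix = correlation_matrix(listings)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A pair whose coefficient is close to 1, like beds and bedrooms in this toy data, is a candidate for keeping only one of the two features.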

Next, we need to select the features that influence the price the most. To do so, we rank the features by their importance. The ranking is presented on the right side of the graph below. It indicates that monthly reviews are the most important feature for predicting the housing price, which suggests that people often rely on online reviews when looking for a rental. The number of reviews increases as more people rent from the owner and the host becomes more popular. The left side of the graph shows the distribution of housing prices in Seattle, with the red line indicating the average price. An owner who wants to start a rental business in this area can expect to set the price at around $100.

Figure 2. Properties & Price Visualization (Seattle).
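The article does not say which importance measure produced the ranking. As a simple stand-in, the sketch below ranks features by the absolute value of their Pearson correlation with the price. This is an assumption made for illustration, not necessarily the method behind Figure 2; tree-based importances or permutation importance are common alternatives. The column names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical feature columns and prices (made-up values).
features = {
    "reviews_per_month": [0.5, 1.0, 2.0, 3.0, 4.0, 5.0],
    "bedrooms":          [1, 2, 1, 3, 2, 3],
    "minimum_nights":    [2, 1, 3, 1, 2, 2],
}
price = [60, 80, 95, 130, 140, 170]

# Score each feature by |correlation with price|, then sort high to low.
scores = {name: abs(pearson(column, price)) for name, column in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a quick first look rather than a substitute for a model-based importance score.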

We can now build a price prediction model based on the features we selected. First, we split our data into a training dataset and a test dataset so that we can train the model on the former and evaluate it on the latter. The left side of the graph below shows the model training results. As the number of epochs increases, the test score gradually approaches the training score, which means that the model predicts prices for unseen listings almost as accurately as for the listings it was trained on. The right side of the graph shows a scatter plot that compares real and predicted prices. As we can see, the predicted and real values fall roughly along a straight line. Here, we want our housing price predictions to be as close as possible to the real prices.

Figure 3. Model Training Results (Seattle). The red line indicates the expected result.
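The article does not state which model was trained; the mention of epochs suggests one that is fitted iteratively. As a minimal sketch under that assumption, the code below splits synthetic data into training and test sets, fits a one-feature linear model with gradient descent, and records the R² score on both sets after every epoch. The data generator, learning rate and epoch count are all illustrative choices.

```python
import random

random.seed(0)

def make_listing():
    """Synthetic listing: price grows linearly with bedrooms, plus noise."""
    bedrooms = random.randint(1, 4)
    price = 40.0 + 30.0 * bedrooms + random.gauss(0, 5)
    return bedrooms, price

data = [make_listing() for _ in range(200)]
random.shuffle(data)

# 80/20 split: fit on train, evaluate on held-out test listings.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

def r2_score(rows, w, b):
    """Coefficient of determination of the line price = w * bedrooms + b."""
    ys = [y for _, y in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

w, b = 0.0, 0.0
learning_rate = 0.05
history = []  # (train R^2, test R^2) after each epoch

for epoch in range(500):
    # Gradients of the mean squared error over the training set.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    history.append((r2_score(train, w, b), r2_score(test, w, b)))

train_r2, test_r2 = history[-1]
print(f"slope={w:.1f} intercept={b:.1f} train R2={train_r2:.3f} test R2={test_r2:.3f}")
```

Plotting the two columns of `history` against the epoch number would give curves of the kind described for the training results, and plotting predicted against real test prices would give the scatter plot.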

Using this model, a new host can set a suitable rental price for a house or apartment based on its properties.

Sentiment analysis on text reviews
Now, we compare housing reviews across cities to find out whether Seattle is a good place to stay. To analyze the reviews, we obtained text reviews of hosts from Airbnb for the three cities and classified each review as positive or negative based on its content. A pre-trained sentiment analysis model (TextBlob) was used for the classification. An example review is shown in the following box:

Review text: Cute and cozy place. Perfect location to everything! 
Sentiment score: 0.433(3)

Given a text sequence, the model returns a value between -1.0 and 1.0, where 0 indicates neutral sentiment, 1.0 indicates positive sentiment and -1.0 indicates negative sentiment. Review classification scores for the different cities are shown in the following graph:

Figure 4. Review Comparison between Cities.

Boston and Seattle both received slightly positive reviews, while in Beijing more guests were dissatisfied with the service.
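The classification and per-city comparison described above could be sketched as follows. The polarity scores here are made up and the city names are placeholders; with TextBlob installed, a real score would come from `TextBlob(review_text).sentiment.polarity`. The zero threshold for separating positive from negative is an assumption, since the article does not state the cut-off it used.

```python
from statistics import mean

def classify(polarity, threshold=0.0):
    """Label a review from its polarity score in [-1.0, 1.0]."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Hypothetical polarity scores per city (made up for illustration).
# With TextBlob they could be computed as:
#   from textblob import TextBlob
#   score = TextBlob(review_text).sentiment.polarity
scores_by_city = {
    "city_a": [0.43, 0.10, -0.20, 0.35],
    "city_b": [-0.30, 0.10, -0.15, -0.05],
}

def summarize(scores):
    """Average polarity and share of positive reviews for one city."""
    labels = [classify(score) for score in scores]
    return {
        "mean_polarity": mean(scores),
        "positive_share": labels.count("positive") / len(labels),
    }

summary = {city: summarize(scores) for city, scores in scores_by_city.items()}
for city, stats in summary.items():
    print(city, stats)
```

Comparing the mean polarity or the share of positive reviews across cities gives the kind of city-level comparison shown in Figure 4.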

The busiest month of the year
Lastly, let’s look at monthly visits. The first graph (left) below shows the number of rental requests per month. It only counts requests that were denied because the listing was unavailable or already booked by someone else at the time. The second graph (right) shows the average housing price for each month.

Figure 5. Monthly Visit Report (Seattle).

The result on the right side of the graph indicates that June is the busiest month of the year. Someone who travels to Seattle and rents a house in June can pay as much as $150. In the summer, the number of visits increases as most people take their holidays and spend time in another city or country. As the number of visits increases, the housing price rises. For example, from March to June, both the housing price and the number of rental requests increase sharply. However, as shown in the first graph, the number of rental requests also spikes unexpectedly in January.
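The monthly report above boils down to two aggregations: counting unavailable dates per month and averaging the price per month. Here is a minimal sketch, assuming a hypothetical list of calendar rows of the form (date, available, price) where the price may be missing; the rows are made up and the layout is not necessarily that of the original dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical calendar rows: (date, available, price). Made-up values.
calendar = [
    (date(2016, 1, 10), False, None),
    (date(2016, 1, 11), True, 110.0),
    (date(2016, 3, 5), True, 100.0),
    (date(2016, 3, 6), False, None),
    (date(2016, 6, 1), False, None),
    (date(2016, 6, 2), False, None),
    (date(2016, 6, 3), True, 150.0),
    (date(2016, 6, 4), True, 140.0),
]

def monthly_report(rows):
    """Count unavailable dates and average the known prices for each month."""
    booked = defaultdict(int)
    prices = defaultdict(list)
    for day, available, price in rows:
        if not available:
            booked[day.month] += 1
        if price is not None:
            prices[day.month].append(price)
    avg_price = {month: sum(p) / len(p) for month, p in prices.items()}
    return dict(booked), avg_price

booked, avg_price = monthly_report(calendar)
busiest_month = max(booked, key=booked.get)
print("busiest month:", busiest_month)
print("average price by month:", avg_price)
```

On real data, the two dictionaries would feed the two graphs directly: one bar per month for the unavailable-date counts and one point per month for the average price.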

Conclusion
Three different datasets were analyzed in this article. By analyzing them, we found that housing prices vary with features such as the location of the house, its properties and the season. In particular, reviews were the most important factor in predicting the housing price. In the summer, prices also rise considerably as the holiday season begins. These findings can help renters plan their holidays in advance and help hosts adjust their prices seasonally according to the market.

Batselem Jagvaral

My research interests focus on Machine Learning, Deep Learning and Data Science.