Example results after monitoring one of the locations in Warsaw
The monitored investment is built in the eastern Warsaw district of Praga Południe. This is an area with great potential. Currently, there are a lot of apartment buildings and places waiting to be renovated. However, it is very well connected with the city centre (15 minutes by public transport or 10 minutes by car). There are also a lot of facilities nearby, such as shopping malls, parks, schools, or medical service points.
Here are the basic characteristics of the monitored investment:
- There are 135 apartments for sale.
- There are 8 floors in the building; each floor has 14 to 16 apartments.
- The investment started from an empty plot with a construction permit; work on the site began in Q1 2018, and the planned completion date is Q4 2019.
- The real estate developer began selling apartments in April 2018.
After collecting the first data from the investment website, we can see the current offers of different apartment types:
This is good to know, but the most interesting data comes from the periodic monitoring of sales progress. Below you can see a timeline of the number of apartments sold each week:
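The weekly numbers can be derived by diffing successive snapshots of the availability list. Here is a minimal sketch in JavaScript, assuming a hypothetical snapshot shape (the real scraper output may differ):

```javascript
// Sketch: derive weekly "sold" counts by diffing successive snapshots
// of available apartment IDs. The snapshot shape is an assumption --
// adjust it to whatever the scraper actually produces.
function weeklySales(snapshots) {
  // snapshots: [{ week: '2018-W18', available: ['A1', 'A2', ...] }, ...]
  const sales = [];
  for (let i = 1; i < snapshots.length; i++) {
    const prev = new Set(snapshots[i - 1].available);
    const curr = new Set(snapshots[i].available);
    // An apartment counts as sold this week if it was listed last week
    // and is no longer listed now.
    const soldCount = [...prev].filter((id) => !curr.has(id)).length;
    sales.push({ week: snapshots[i].week, sold: soldCount });
  }
  return sales;
}
```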
What else can we see? For example, what is the current state of sales progress?
It is worth remembering that construction began in Q1 2018. Currently, only the building’s foundation is ready, but almost half of the apartments are already sold.
Predicted date of selling all apartments
Let’s see what the sales dynamic is by apartment type. We can also try to make a simple prediction of when all apartments will be sold.
What we can see from this visualisation:
- If the selling pace continues, almost all apartments will be sold by May 2019. This is over 6 months before the planned construction completion date.
- However, not all apartment types sell at the same pace. The fastest-selling apartments have 1 or 2 rooms. The biggest apartments, with 4 rooms, are less popular – only 3 have been sold.
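A naive linear projection along these lines gives the estimated sell-out date. The numbers in the example below are made up for illustration – they are not figures from the monitored investment:

```javascript
// Sketch: project the sell-out date from the average weekly sales
// pace. A straight-line extrapolation -- it ignores the fact that
// different apartment types sell at different speeds.
function weeksToSellOut(totalUnits, soldSoFar, weeklyPace) {
  const remaining = totalUnits - soldSoFar;
  return Math.ceil(remaining / weeklyPace);
}

function projectedSellOutDate(asOf, weeks) {
  const d = new Date(asOf);
  d.setDate(d.getDate() + weeks * 7);
  return d;
}

// Illustrative example: 135 units, 60 already sold, ~3 sales per week.
const weeks = weeksToSellOut(135, 60, 3); // 25 weeks
const sellOut = projectedSellOutDate('2018-10-01', weeks);
```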
Did potential buyers change their mind? Could they get a mortgage loan?
Another interesting observation concerns apartments that had been sold but later became available again. There are multiple reasons why this happens, and it is important to understand how the selling process works. The buyer has to sign an agreement with the developer before requesting a loan from a bank, so there is always a possibility that, after officially signing for an apartment, the bank will reject the loan request. The buyer can also change their mind and withdraw for other reasons.
In the observed data, there was only one such situation: a 2-room apartment with a separate kitchen was sold at the end of August 2018 but became available again in mid-September.
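Such cases can be flagged automatically by checking whether an apartment that disappeared from the availability list later reappears. A minimal sketch, using the same assumed snapshot shape as before:

```javascript
// Sketch: flag apartments that vanished from the availability list
// (presumably sold) and later reappeared -- e.g. because the buyer's
// mortgage was rejected. Snapshot shape is an assumption.
function relistedApartments(snapshots) {
  const everRemoved = new Set(); // IDs that disappeared at some point
  const relisted = new Set();    // IDs that came back after disappearing
  let prev = new Set(snapshots[0].available);
  for (const snap of snapshots.slice(1)) {
    const curr = new Set(snap.available);
    for (const id of prev) if (!curr.has(id)) everRemoved.add(id);
    for (const id of curr) if (everRemoved.has(id)) relisted.add(id);
    prev = curr;
  }
  return [...relisted];
}
```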
Gathering valuable data for analysis is the foundation of every data science process. When the data comes from an external online source, one of the methods is data scraping. This technique means that the data is “scraped” from a website by a web crawler (also called a “robot” or “scraper”). The robot parses the page text into a machine-readable format so that it can then be analysed.
Companies are aware of this process, so they usually protect their websites and online systems. Some of the techniques are:
- Protecting content with captcha or other robot-detection tools.
- Anomaly detection systems that analyse behaviour between requests and ban suspicious visitors.
These techniques make web scraping harder, but they are not bulletproof – at Appsilon we know how to deal with all of these obstacles.
Collecting the data from the investment page
In the case described in this article, I monitored one of the locations in Warsaw. The real estate developer is selling apartments in this location using an online system, which contains the following table of apartment availability.
The solution is a web crawler that simulates human behaviour by clicking through the interface. Imagine you have a robot employee that monitors all the information you need on a weekly basis, doesn’t get tired or bored, and is 100% precise.
I used Google’s Puppeteer.js technology; it can replicate human behaviour in a browser running in the background. Here is the source code of a scraper:
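The original scraper source was embedded on the page and is not reproduced here; below is a minimal sketch of what such a Puppeteer scraper can look like. The URL and all CSS selectors (`table.apartments`, `.number`, `.rooms`, `.status`) are placeholders, not the real page’s markup:

```javascript
// Minimal sketch of a Puppeteer-based availability scraper.
// Requires: npm install puppeteer
// All selectors below are hypothetical -- inspect the target page
// and adjust them to its actual markup.
async function scrapeAvailability(url) {
  const puppeteer = require('puppeteer'); // loaded lazily on purpose
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Pull one record per row of the (hypothetical) availability table.
    const rows = await page.$$eval('table.apartments tr.offer', (trs) =>
      trs.map((tr) => ({
        id: tr.querySelector('.number')?.textContent.trim(),
        rooms: tr.querySelector('.rooms')?.textContent.trim(),
        status: tr.querySelector('.status')?.textContent.trim(),
      }))
    );
    return rows;
  } finally {
    await browser.close();
  }
}
```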
And we can run the crawler with the command below. Just configure a CRON or other job to run this weekly:
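The exact command from the original post is not reproduced here; assuming the scraper lives in `scraper.js`, a crontab entry along these lines would run it weekly (all paths and file names are placeholders):

```
# Example crontab entry (edit with `crontab -e`).
# Runs the crawler every Monday at 06:00 and appends output to a log.
0 6 * * 1 cd /home/user/apartment-monitor && node scraper.js >> scrape.log 2>&1
```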
This is just one of a wide range of techniques that can be used for scraping. Let us know if you want to know more about collecting data and other business cases from our experience!
As a private investor, I need to excel in my investments. There is no room for bad bets, which is why I use every card in the deck and do everything in my data science power. This analysis gave me insight into the investment’s sales performance, and I could predict how much time I had to close the deal on my dream apartment. I had more time for the final decision, and I was more confident during negotiations. Remember that buying an apartment at the latest possible moment gives you short-term alternatives instead of freezing your capital right away. I had an edge that other buyers did not, so I could easily counter false claims about the popularity of the apartments and not feel pressured into buying. Every sales process is a game with an unequal distribution of knowledge. Knowing the sales dynamics allows you to fight against the odds and close better deals.
Bio: Paweł Przytuła is the COO and co-founder of the data science company Appsilon. Paweł is also a full-stack developer, data engineer, and project manager.