Data-driven maps can tell powerful stories about places in the world. By plotting points at specific locations, maps can reveal new perspectives on geographical datasets and show you the hidden connections that link cities, countries and continents.
"You can't use an old map to explore a new world"
- Albert Einstein
This guide walks you through the process of building maps in Graphext. We'll look at the different types of maps you can create and the variables you need to do so, before plotting 37012 Airbnb listings to draw a data-driven map of New York.
After taking a quick look at the key variables that define the locations of Airbnb rentals, we'll explore the Graph to examine in more detail how rentals vary between districts and neighbourhoods in the city.
Then, we'll look at comparing the price range of rentals in order to investigate which property features make a rental more or less expensive for holidaymakers and travellers. This is a rich dataset and we hope that this guide will set you off on your way to conducting precise and insightful geospatial analysis of your own.
First, we need to upload the dataset to Graphext. Make sure you have the dataset downloaded to your computer and navigate to the workspace of the team that you want to build the project in.
From inside the Datasets panel, select New Dataset and either upload the file or drag and drop it into the workspace.
Open up the dataset once it has loaded. You should see 37012 rows of data with 33 columns. Take a second to scroll through the variables in the data. Notice that latitude and longitude are already part of the dataset. These are essential columns in a dataset for Geospatial analysis and you need them to plot points in Graphext.
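Before uploading, it can help to confirm that every row carries a usable coordinate pair. The snippet below is a minimal sketch of such a check in plain Python. The sample rows and the helper `has_valid_coordinates` are hypothetical and only mirror the latitude and longitude columns mentioned above; they are not part of Graphext or of the real dataset.

```python
# Hypothetical sample rows; the column names mirror the latitude/longitude
# columns mentioned in the guide, but the values are made up for illustration.
sample_rows = [
    {"id": 1, "latitude": 40.7128, "longitude": -74.0060},
    {"id": 2, "latitude": 40.6782, "longitude": -73.9442},
    {"id": 3, "latitude": None, "longitude": -73.9000},   # missing latitude
    {"id": 4, "latitude": 140.0, "longitude": -73.9000},  # latitude out of range
]


def has_valid_coordinates(row):
    """Return True when the row has a plottable latitude/longitude pair."""
    lat, lon = row.get("latitude"), row.get("longitude")
    if lat is None or lon is None:
        return False
    # Valid latitudes span -90..90 degrees, valid longitudes -180..180 degrees.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


plottable = [row for row in sample_rows if has_valid_coordinates(row)]
rejected_ids = [row["id"] for row in sample_rows if not has_valid_coordinates(row)]
print(len(plottable), rejected_ids)
```

Rows that fail the check are worth fixing or dropping before the upload, since a point without coordinates cannot be placed on the map.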
From your project setup wizard, choose Geospatial as your analysis type, then select Plot Points. Plotting points is the simplest kind of geospatial analysis in Graphext. You can also plot routes or aggregate and cluster your data. For this guide, we'll focus on mapping our data by plotting coordinates.
After choosing Plot Points, Graphext will ask you to select targets and factors to calculate the relationships between rows in your dataset. Selecting targets and factors helps Graphext to generate clusters that will group Airbnb listings together depending on the similarity of the variables you select.
Target: The variable you want to gain a deeper understanding of.
Factors: The variables used to calculate similarity.
From your list of Other Variables, select Price as your target variable. Then select all 7 review score variables as your factors. Graphext will group Airbnb listings according to their review scores, which helps you gain a deeper understanding of how listing prices vary.
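To build some intuition for what "similarity of the variables you select" can mean, here is a small illustrative sketch. It assumes Euclidean distance between review score vectors, which is only one common choice; this guide does not describe how Graphext actually computes similarity. The listings, the three scores per listing (instead of 7) and the helper names are all hypothetical.

```python
import math

# Hypothetical listings: price is the target, the review scores are the factors.
listings = {
    "A": {"price": 80, "scores": [9.5, 9.0, 10.0]},
    "B": {"price": 200, "scores": [9.4, 9.1, 9.9]},
    "C": {"price": 95, "scores": [6.0, 7.0, 5.5]},
}


def factor_distance(a, b):
    """Euclidean distance between two factor vectors (smaller = more similar)."""
    return math.dist(a["scores"], b["scores"])


def most_similar(name):
    """Name of the other listing whose factors are closest to `name`."""
    others = (n for n in listings if n != name)
    return min(others, key=lambda n: factor_distance(listings[name], listings[n]))


print(most_similar("A"))
```

Note that the price never enters the distance calculation. In this sketch, only the factors decide which listings sit together; the target is what you then examine across the resulting groups.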
Before moving on to executing your analysis, you need to tell Graphext where to find coordinates in your dataset. Providing coordinates allows Graphext to place nodes in your Graph according to the exact location of the property.
Open up the Network Visualization tab and choose the Latitude and Longitude columns as the latitude and longitude variables respectively. Finally, name your project something like 🌍 Mapping Airbnb Rentals in New York and execute it.
From the Projects panel of your team's workspace, find the 🌍 Mapping Airbnb Rentals in New York project and open it up.
The first thing you see will be the project's Graph where each Airbnb listing in your data has been plotted using the Latitude and Longitude variables. There are enough listings in the data to draw a pretty accurate representation of New York here.
But since color mapping isn't applied yet, it is difficult to gain any insight from looking at the map right now. Let's make things a little clearer by applying color mapping to the District variable. Find District in your right sidebar and click the raindrop icon to apply color mapping to values here.
Now you should see the 5 districts in the data mapped clearly onto the nodes in the Graph.
With districts coloring the regions of your Graph, it makes sense to label these regions with the corresponding values from the District column. Click the project settings icon from the top of your Graph and navigate to the Graph tab.
Here you should see a section on Labels. Click the checkbox to activate region labels and choose District from the variable dropdown menu. Save the configuration and head back to the Graph.
With region labels and color mapping, we can get an immediate impression of how listings are distributed amongst the 5 New York districts. Find the District variable in your right sidebar. You can see that the bar representing Manhattan is the largest - indicating that Manhattan has the most listings. Staten Island is the smallest - indicating that Staten Island has the fewest listings.
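The bar chart for District is essentially a count of listings per value. As a rough sketch of the same idea outside Graphext, the snippet below counts a handful of hypothetical District values with Python's `collections.Counter`; the sample list is made up and does not reflect the real proportions in the dataset.

```python
from collections import Counter

# Hypothetical District values for a handful of listings (not the real data).
districts = [
    "Manhattan", "Brooklyn", "Manhattan", "Queens",
    "Manhattan", "Brooklyn", "Queens", "Staten Island",
]

counts = Counter(districts)
# most_common() orders the districts from the largest count to the smallest.
ranked = counts.most_common()
largest, smallest = ranked[0][0], ranked[-1][0]
print(ranked)
print(largest, smallest)
```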
The type of Airbnb rental that people decide upon is usually determined in part by the price of the listing. In a dataset of different property types spread across the whole of New York, we would expect to find a significant variation in the prices of listings. Nonetheless, it is interesting to investigate which features of the data are most associated with price variations.
For instance, can we use the Graph to determine whether a 2-bed apartment in Manhattan is likely to be more expensive than a similar property in the Bronx?
Take a look at the Price variable chart and notice that values are bunched up towards the lower range of the bottom axis. This is because we have outliers in our price variable - or in other words, some really expensive listings!
It would be useful for our analysis to create a quartile segmentation dividing our price variable into Low | Medium-Low | Medium-High | High categories.
To create this manual segmentation, click New Segmentation from your left sidebar. Then choose Manual and name the segmentation Price Quartiles. Now, to add segments we will need the descriptive statistics associated with the Price variable. Click the stats icon from within the Price variable chart to bring up stats of the distribution of data here.
Great. Our first segment will represent Low values - between 0 and the Q1 value of 60. Click on the bar inside the Price variable chart. Then, click the upper range value, which should currently read 500, and change it to 60 so that your chart represents data between 0 and 60.
Now to save this segment, click on the plus icon inside of the new Price Quartiles segmentation. Call this segment Low and click OK. That's it ... the data between 0 and Q1 will have been saved inside this segment. Before we save the other quartile ranges, let's change the color of this segment.
To change the segment's color, select the Low segment and click the color picker. Choose a color from the preset colors, click OK and save your changes.
Now repeat the process for the remaining segments so that your data is divided across the four price quartiles: Low (0 to Q1), Medium-Low (Q1 to Median), Medium-High (Median to Q3) and High (Q3 to Max).
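The quartile segmentation built by hand above can also be sketched in a few lines of Python. The example uses `statistics.quantiles` from the standard library on a hypothetical list of prices. Quartile interpolation differs between tools, so the cut points computed here will not necessarily match the values shown in Graphext's stats panel, and the handling of values that fall exactly on a boundary is an assumption of this sketch.

```python
import statistics

# Hypothetical nightly prices; the last value plays the role of an outlier.
prices = [40, 55, 60, 75, 90, 110, 150, 180, 250, 2000]

# method="inclusive" treats the list as the whole population. Other tools
# may interpolate quartiles differently, so cut points can differ slightly.
q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")


def price_segment(price):
    """Map a price onto one of the four quartile segments.

    Assumption: a price equal to a cut point goes into the lower segment.
    """
    if price <= q1:
        return "Low"
    if price <= median:
        return "Medium-Low"
    if price <= q3:
        return "Medium-High"
    return "High"


segments = [price_segment(p) for p in prices]
print(q1, median, q3)
print(segments)
```

Because the cut points come from ranks rather than from the raw price range, a single very expensive listing lands in the High segment without stretching the other three.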
Having created a useful segmentation that divides Airbnb rentals across price quartiles, let's apply color mapping to this new variable. Click the raindrop icon next to your new Price Quartiles segmentation.
Notice that most of the High value properties are close to the city center, mainly in Manhattan and, to a lesser extent, Brooklyn.
Click the bar representing Low values inside your Price Quartiles variable. Now change the representation of your sidebar charts to Relative using the Absolute vs Relative dropdown at the top of your right sidebar. Values in your variable charts now represent data in proportion to selections you make.
Scroll through the charts in your right sidebar and find the Neighbourhood variable. Let's find out which neighbourhood has the highest density of cheap rentals. Sort the variable chart so that the neighbourhoods with the most listings in the Low segment of Price Quartiles appear at the top. Do this by selecting the arrow icon and choosing sort by selection from the menu list.
It looks like Bedford-Stuyvesant, Bushwick, Williamsburg and Harlem all feature a high density of cheap rentals.
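One way to put a number on "density of cheap rentals" is the share of each neighbourhood's listings that fall in the Low segment. The sketch below computes that share for a few hypothetical rows and sorts the neighbourhoods by it. This is an illustrative reading of the idea, not a description of what the sidebar computes in Relative mode, and the data is made up.

```python
from collections import Counter

# Hypothetical (neighbourhood, price segment) pairs, not the real listings.
listings = [
    ("Bushwick", "Low"), ("Bushwick", "Low"), ("Bushwick", "High"),
    ("Harlem", "Low"), ("Harlem", "Medium-Low"),
    ("Midtown", "High"), ("Midtown", "High"),
    ("Midtown", "Low"), ("Midtown", "Medium-High"),
]

total = Counter(name for name, _ in listings)
low = Counter(name for name, segment in listings if segment == "Low")

# Share of each neighbourhood's listings that sit in the Low segment.
low_share = {name: low[name] / total[name] for name in total}

# Neighbourhoods ordered from the highest share of cheap rentals to the lowest.
ranked = sorted(low_share, key=low_share.get, reverse=True)
print(ranked)
```

Using a share rather than a raw count keeps large neighbourhoods from topping the list simply because they have more listings overall.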
Let's use the Compare panel to investigate the factors that best distinguish cheap properties from expensive ones. The Compare panel generates a series of charts showing the values that best explain the differences between the groups you choose to compare.
To generate charts showing values that explain the difference between your Price Quartiles, navigate to the Compare panel and choose Price Quartiles from the search box. Then use the plus icon to add in values for High | Medium-High | Medium-Low as well as Low.
Before turning your attention to the charts in the panel below, switch up the collection of charts displayed here so that All variables are included. You can do this using the dropdown between the charts and the variable cards at the top of your panel.
With charts generated in your Compare panel, you can start to check out how features belonging to properties of different prices are distributed. There are a few charts here that express insights that we might expect to find.
For instance, the first chart representing Room Type shows that it is more expensive to rent entire places and much cheaper to rent private rooms. Similarly, referring to the District compare chart, it is easy to see that rentals in Manhattan tend to be more expensive than in other districts.
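The Room Type comparison can be approximated with a simple grouped summary. The sketch below computes the median price per room type for a few hypothetical listings using only the standard library. The room type labels and prices are made up for illustration, and the median is used here because it is less sensitive to very expensive outliers than the mean.

```python
import statistics
from collections import defaultdict

# Hypothetical (room type, nightly price) pairs for illustration only.
listings = [
    ("Entire home/apt", 180), ("Entire home/apt", 220), ("Entire home/apt", 150),
    ("Private room", 60), ("Private room", 75), ("Private room", 50),
]

# Group the prices by room type.
prices_by_room_type = defaultdict(list)
for room_type, price in listings:
    prices_by_room_type[room_type].append(price)

# Median price per room type.
median_price = {
    room_type: statistics.median(prices)
    for room_type, prices in prices_by_room_type.items()
}
print(median_price)
```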
Scroll further down your Compare panel to find the chart representing the Host Since variable. This chart shows how recently the host registered on Airbnb, and the four colors represent how these values are distributed across the 4 price quartiles.
Notice that the blue line - representing Low value properties - is higher towards the more recent values on the bottom axis. Hover over the circles on the chart to inspect the values further. It seems like hosts who registered after the start of 2017 are offering cheaper rentals!
This is quite a surprising finding. Save the chart as an insight by clicking on the three dots at the top right of the Host Since compare chart. Choose Save as Insight and name your insight something like Hosts registering after 2017 offer cheaper properties.
Great. Now you can check out this insight in your project's Insights panel.
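A finding like the Host Since one is worth double-checking with a direct calculation. The sketch below compares the share of Low segment listings for hosts who registered before and after the start of 2017. The rows, the `CUTOFF` constant and the `low_share` helper are hypothetical and exist only to show the shape of the check; they say nothing about the real dataset.

```python
from datetime import date

# Hypothetical (host registration date, price segment) pairs.
hosts = [
    (date(2014, 5, 1), "High"), (date(2015, 8, 12), "Medium-High"),
    (date(2016, 3, 3), "Low"), (date(2016, 11, 20), "Medium-Low"),
    (date(2017, 2, 14), "Low"), (date(2017, 9, 9), "Low"),
    (date(2018, 1, 5), "Medium-Low"), (date(2018, 6, 30), "Low"),
]

# Start of 2017, the point highlighted in the Host Since chart.
CUTOFF = date(2017, 1, 1)


def low_share(rows):
    """Fraction of rows whose price segment is Low."""
    return sum(segment == "Low" for _, segment in rows) / len(rows)


before = [row for row in hosts if row[0] < CUTOFF]
after = [row for row in hosts if row[0] >= CUTOFF]
print(low_share(before), low_share(after))
```

If the share for recent hosts stays clearly higher on the real data, the pattern seen in the chart is more than a visual impression.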
Airbnb datasets like this one offer lots of rich avenues for investigation. We'll stop our analysis here and let you crack on with conducting some of your own. Don't forget that the same flow of analysis is a good starting place for any kind of project in which you want to plot Geospatial data in Graphext using latitude and longitude!