Data
I sourced the data for this viz from http://www1.nyc.gov/site/finance/taxes/property-rolling-sales-data.page which gives the latest rolling 12 months. This is a really great source for NYC property data. In fact, I think nyc.gov has done a pretty fantastic job all around of making city data public. For historic data back to 2003 you can go here: http://www1.nyc.gov/site/finance/taxes/property-annualized-sales-update.page (that might be my next task!)
The data comes as a separate file for each borough, so I downloaded them all and combined them into a single Excel sheet, noting the borough codes (1=Manhattan, 2=Bronx, 3=Brooklyn, 4=Queens and 5=Staten Island).
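I did this step by hand in Excel, but if you would rather script it, here is a rough sketch of the same idea in Python. The column names and sample rows are made up for illustration, and `combine_boroughs` is just a hypothetical helper name. Only the borough codes come from the list above.

```python
import csv
import io

# Borough codes as noted above.
BOROUGH_CODES = {
    "Manhattan": 1,
    "Bronx": 2,
    "Brooklyn": 3,
    "Queens": 4,
    "Staten Island": 5,
}

def combine_boroughs(files_by_borough):
    """Merge per-borough CSV text into one list of rows, each tagged with its borough code."""
    combined = []
    for borough, csv_text in files_by_borough.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            row["BOROUGH_CODE"] = BOROUGH_CODES[borough]
            combined.append(row)
    return combined

# Tiny made-up sample standing in for the real downloads (hypothetical columns).
sample = {
    "Manhattan": "ADDRESS,ZIP CODE,SALE PRICE\n1 Example St,10001,900000\n",
    "Queens": "ADDRESS,ZIP CODE,SALE PRICE\n2 Sample Ave,11363,450000\n",
}
rows = combine_boroughs(sample)
```

With real files you would read each download from disk instead of using in-memory strings, but the tagging step stays the same.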
This dataset includes ALL property sales, including commercial property and entire buildings. So to try and identify which sales were single residential units I filtered to the building categories below and then applied an extra filter to try and pick out apartments versus whole buildings:
I know for a fact that this wasn't 100% successful, but I think it was a pretty good way to filter. I also filtered out the 17,000 sales under $50,000, because I just don't believe those could ever be real prices.
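For anyone who wants to reproduce the filtering in code, here is a minimal sketch. The $50,000 cutoff is the one I used, but the category labels and column names below are placeholders, not my actual list, and the extra apartment-versus-building filter is not covered here.

```python
MIN_PRICE = 50_000  # sales below this are dropped, as described above

# Placeholder labels only: swap in the real building categories you want to keep.
RESIDENTIAL_CATEGORIES = {"EXAMPLE CONDO CATEGORY", "EXAMPLE COOP CATEGORY"}

def parse_price(text):
    """Turn a string like '$1,250,000' into an int, or None if it is not a whole number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None

def keep_sale(row):
    """Keep rows in a wanted category with a sale price of at least MIN_PRICE."""
    price = parse_price(row["SALE PRICE"])
    return (
        row["BUILDING CATEGORY"] in RESIDENTIAL_CATEGORIES
        and price is not None
        and price >= MIN_PRICE
    )

# Made-up rows with hypothetical column names.
sales = [
    {"BUILDING CATEGORY": "EXAMPLE CONDO CATEGORY", "SALE PRICE": "$1,250,000"},
    {"BUILDING CATEGORY": "EXAMPLE CONDO CATEGORY", "SALE PRICE": "10"},
    {"BUILDING CATEGORY": "EXAMPLE OFFICE CATEGORY", "SALE PRICE": "$5,000,000"},
    {"BUILDING CATEGORY": "EXAMPLE COOP CATEGORY", "SALE PRICE": "-"},
]
kept = [row for row in sales if keep_sale(row)]
```

Only the first row survives: the second is under the cutoff, the third is in a category that is not on the list, and the fourth has no usable price.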
My one wish for this dataset would be better sq ft data (it's currently listed for less than half of properties) and bedroom counts.
Geocoding
The data provided by nyc.gov gives street addresses and zip codes, but I wanted to be able to map the building points exactly. Last time I used a geocoding tool by Texas A&M University. This time I searched around again and found a site called geocod.io which seemed to offer a combination of very reasonable pricing with a friendly user interface.
To do the geocoding I simply uploaded a CSV file including the street addresses and the zip codes. In fact, the first time I did it I also included the city name of New York for all points, but this put everything in Manhattan. The folks at geocod.io were kind enough to help me out quickly and return my credits. I was very impressed with the customer service they provided.
Now let's talk about the accuracy of the geocoding. For the most part it was pretty good, but there were some weird results too. For example, check out this map showing all the points I geocoded:
The zip code 11363 is in Queens, so I'm not sure how this point ended up in Arkansas. Fortunately the few big mistakes are easy to get rid of using the Tableau lasso tool.
More frustrating are the near misses. For example, some points ended up in the wrong borough:
and some of the 'famous' high end buildings in Manhattan were in the wrong spot. For example:
I realize this might seem picky, but when you are looking at NYC real estate the $ difference between the top and the bottom of Central Park, for example, is HUGE. I'd say 95% of the data points are pretty spot on, but unfortunately the mistakes do cause some problems, particularly when trying to zoom to a particular neighborhood.
Geocod.io do provide accuracy scores, but sometimes clearly wrong locations (like the wrong state) are scored 100%. Having said that, I would use the service again as I don't think any service has mastered batch geocoding perfectly.
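Since the accuracy score can't be trusted to flag these, one option is a simple sanity check on the coordinates themselves. I removed the outliers with the lasso tool, so this is only a sketch of an alternative: the bounding box is my rough approximation of the NYC area rather than an official boundary, and the field names and sample points are made up.

```python
# Approximate bounding box around the five boroughs (an assumption, not an official boundary).
NYC_BOUNDS = {"min_lat": 40.47, "max_lat": 40.92, "min_lon": -74.27, "max_lon": -73.68}

def in_nyc(lat, lon, bounds=NYC_BOUNDS):
    """Return True if the point falls inside the rough NYC bounding box."""
    return (
        bounds["min_lat"] <= lat <= bounds["max_lat"]
        and bounds["min_lon"] <= lon <= bounds["max_lon"]
    )

# Made-up geocoding results with hypothetical field names.
points = [
    {"address": "Example A", "lat": 40.78, "lon": -73.97},  # roughly Central Park
    {"address": "Example B", "lat": 34.75, "lon": -92.29},  # roughly Little Rock, Arkansas
]
suspect = [p for p in points if not in_nyc(p["lat"], p["lon"])]
```

A box this coarse only catches the big misses, like the wrong state. The near misses would need something finer, such as checking each point against its borough.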
Color
For the design of this viz I wanted to create a unique look and a nice color palette. I searched around for NYC graphics and found this apple, and from here I built out the palette.
To do this I used a site called coolors.co. It's a really nice system for building 5-color palettes. I locked in the yellow, red, green and grey from the apple and then hit the space bar to generate the fifth color until I was happy. Easy peasy!
Performance
To be honest, the viz loads more slowly than I would like. At first I was using data blending to bring in the latitudes and longitudes, so I switched to joins to try to speed things up. Unfortunately this didn't make it any faster, so I think the slowness is down to the number of mapped points, the use of medians and the high-res images. I don't really want to lose any of these, so please be patient :-)
I hope you enjoy the viz. I like working with real estate data and will probably do some more in future.