[SQL] “Reviews of Categories” – Yelp

SQL Yelp interview

Photo from Pixabay on Pexels

 

The dataset for this SQL interview question from Yelp contains information on 100 businesses . These information include the name of the business, other information such as its address, city, postal code, along with the categories which this business falls into, and the total number of reviews that it has gotten.

The goal here would be to write a query which will return the categories with the highest number of reviews. The challenge here is that the categories are all listed in the same string, and are separated by semicolons. Therefore, to solve this question, one needs to first split these categories, so that aggregate function may then be used on it.

This is the code that solves the question:

# Yelp SQL Interview: "Reviews of Categories"
SELECT UNNEST(STRING_TO_ARRAY(categories, ';')) "categories", 
       SUM(review_count) "Total_reviews"
FROM yelp_business
GROUP BY 1
ORDER BY 2 DESC

 

The full solution and its explanation can be found here:

 

 

 

 

 

Related Images: