How i built a Stack overflow trends dashboard
Starting using Brent Ozar Stack Overflow database extract i tried to build a dashboard that show evolution of Tags Trends over time and if possible compare trends of several tags between them.
I first tried using SQL server fulltext index on Tags column of the Posts table. This lead to a RAW_SQL in my tableau desktop dashboarding tool in order to call a SQL CONTAINS predicate in SQL Server that will use the full text indexing. The performances was good but i could not be able to compare two tags (or more) over time.
So, I change my approach and split the Tags column of the Posts table using STRING_SPLIT table-function in SQL Server. I finally aggregate intermediate data result by Months and Tags to minimise data to push and send the Stack Overflow trends dashboard on Tableau Public.
Please enjoy the result :
SQL Code for generate intermediate table PostsTagsLink and the aggregate one :
DROP TABLE IF EXISTS PostsTagsLink;
GO
SELECT
Id,
REPLACE(ssp.value,'<','') TagName,
CreationDate,
ViewCount
INTO PostsTagsLink
FROM dbo.PostsWithoutBody
CROSS APPLY STRING_SPLIT(Tags, '>') ssp
WHERE PostTypeId = 1 AND REPLACE(ssp.value,'<','')<>'';
CREATE CLUSTERED COLUMNSTORE INDEX i0 ON PostsTagsLink
GO
DROP TABLE IF EXISTS aggregates.agg_by_TagNameCreateMonth;
SELECT COUNT( *) QuestionsCount
,[TagName]
,CAST(STR(YEAR([CreationDate])*10000+MONTH(CreationDate)*100+1) AS DATE) CreationDate
,SUM([ViewCount]) ViewCount
INTO aggregates.agg_by_TagNameCreateMonth
FROM [dbo].[PostsTagsLink]
GROUP BY [TagName]
,CAST(STR(YEAR([CreationDate])*10000+MONTH(CreationDate)*100+1) AS DATE);
CREATE CLUSTERED COLUMNSTORE INDEX i0 ON aggregates.agg_by_TagNameCreateMonth;
GO