Friday, October 12, 2012

Google BigQuery Aggregate functions

Google BigQuery has provided aggregate functions that are very useful when you are reading data from Google Big Table. The list of Aggregate functions includes Avg, Count, Max, Min and Sum which are very common. If you have used Ms-Excel spreadsheet or Google Spreadsheet or if you are familiar with any DBMS or RDBMS system, you may have used them at some point of time.

I have following Big Table with some sample rows into it. We will apply Aggregate functions on them. I have issued a SELECT query to see what data I have in this Big Table and its column.

Count: Count Function, as it name suggest will give count of total rows in the Big Table. So, I issued a COUNT(*) statement and it returned me a count 19.

Count(Distinct, field,[n]): Count(Distinct, field,[n]) function, will give count of total unique rows in the Big Table. In my Big Table Employee there are three unique departments, so I issued a COUNT (DISTINCT [DeptCode]) statement and it returned me a count 3.

GROUP_CONCAT('str'): GROUP_CONCAT('str') function concatenates the group values into one separating each value by a comma(,). I wanted to show each department and their concerned employees so, I issued a Group_CONCAT statement. They return each department and their employees. Do you remember how much code you had to write to achieve the same with SQL Server or Oracle or other RDBMS. I appreciate Google BigQuery has introduced this function.

STDDEV: This aggregate function returns Standard deviation of a particular column.

Variance: This function returns Variance in a particular column of Big Table.

SUM: As its name suggests it returns the sum total of a particular numeric column.

The list of Google Big Query Aggregrate function includes following functions

  • Avg
  • Count
  • Count(Distinct, field,[n])
  • GROUP_CONCAT('str')
  • QUANTILES(expr[, buckets])
  • STDDEV(numeric_expr)
  • VARIANCE(numeric_expr)
  • LAST(field)
  • MAX(field)
  • MIN(field)
  • NTH(n, field)
  • SUM(field)
  • TOP(field, [max_records], [multiplier])


Popular Posts

Real Time Web Analytics