#6 SQL basics part2
In the "SQL syntax (I)" section, we covered the fundamentals of SQL,including SQL query statement infrastructure syntax, datetime, group by, with as and join.Next, we continue to introduce some common fundamentals of SQL.
Commonly used Date functions and Interval
In the context of blockchain, data is recorded and stored in the order of transaction occurrences. When conducting daily data analysis, it is often necessary to perform statistical operations on data within specific time periods. In the previous section, we introduced the date_trunc()
function, which allows us to truncate date values at specified intervals such as days, weeks, or hours. Furthermore, there are several commonly used functions and their respective usage.
1.Now() and Current_Date() functions
The now()
function is used to get the date and time of the current system.Note that it is internally stored with hour, minute and second values, but Dune's query editor only displays "hour:minute" by default.When we want to correlate the date field with the minute
field in the prices.usd
table, we must first intercept by minute.Otherwise,the correct price record may not be associated.
The current_date()
function is used to get the current date (without the hours, minutes and seconds part).When filtering data by date and time, we often need to combine these functions and use relevant date functions to obtain the exact date or time. The current_date()
function is equivalent to using date_trunc('day', now())
, which extracts the day value from the result of now()
. You can also omit the parentheses of current_date()
and write it as current_date
directly.
2.DateAdd(), Date_Add(), Date_Sub() and DateDiff() functions
The dateadd(unit, value, expr)
function adds a datetime unit to a date expression. Here the "date and time units" using constants, commonly used are HOUR, DAY, WEEK, MONTH and so on.The value can be a negative number, which means that the corresponding date and time unit is subtracted from the following expression.It is also because a negative number can be used to indicate the subtraction of a datetime interval that the datesub()
function is not needed and indeed not available.
The date_add(startDate, numDays)
function adds or subtracts the specified number of days to a date expression and returns another date.Parameter numDays for a positive number of days after the specified date to return to the startDate
, for a negative number of days before the specified date to return.The function date_sub (startDate, numDays)
is similar, but the meaning of the opposite, a negative number indicates the date after the return, and a positive number indicates the previous date.
The function datediff(endDate, startDate)
returns the number of days between two date expressions.If endDate
is after startDate
, it returns a positive value, before it returns a negative value.
The SQL example is as follows:
3.INTERVAL type
Interval is a datatype that represents an interval of time in specified datetime units. The time interval represented by Interval is very convenient to use, avoiding being troubled by the previous date functions with similar names and similar functions.
For a description of more date-time related functions, see Date and time functions and operators
Conditional expressions Case, If
When conditional logic needs to be applied, the case
statement can be used. The general syntax for the CASE statement is CASE {WHEN cond1 THEN res1} [...] [ELSE def] END
. This statement allows an expression to be evaluated under multiple conditions and returns the value corresponding to the first condition that evaluates to True. If none of the conditions are satisfied, the value specified after else
is returned. The else
part is optional, and if omitted, NULL is returned.
We have used the CASE statement many times in the "Lens Practice Case: Creator Profile Domain Name Analysis" section. Some of the code excerpts are as follows:
As you can see, through the CASE statement we can flexibly convert the data according to actual needs to facilitate subsequent statistical summary.
Related links for the above example query:
Description: Lens Creator Profile Domain Name Analysis
The function if(cond, expr1, expr2)
returns one of two expressions, depending on whether the condition evaluates to true or false. If the condition evaluates to a true value, the first expression is returned, and if it evaluates to a false value, the second expression is returned.
Common functions for string processing
Substring() function
When there are certain situations where we have to work with the original data table transactions
or logs
and decode the data
therein, we need to extract part of the string from it first, and then carry out the targeted conversion process, at this time we need to use the Substring function. The syntax of the Substring function is substring( expr, pos [, len])
or substring (expr FROM pos [FOR len] ] )
, that in the expression expr
, starting from the position pos
, intercept len
characters and return. If len
is omitted, the string is intercepted until the end of the string.
Concat() Function and the || Operator
The function concat(expr1, expr2 [, ...])
strings multiple expressions together. ) concatenates multiple expressions together and is often used to link strings. The operator ||
has the same function as Concat.
Right() function
"The right(str, len)
function retrieves len
characters from the right side of the string str
. In our case, the original data table, like logs
, contains connected groups of 64 characters stored in the data
.For the contract address or user address, it is represented by 40 characters. When saving, it will be filled with 0
on the left to make up the 64-bit length. When extracting the address, we need to retrieve the 40 rightmost characters and add the '0x' prefix to restore it to the correct address format.
Note that in Dune SQL, directly using the right()
function may return a syntax error, which can be solved by putting the function name in double quotes, that is, using "right"()
. Since this method is cumbersome, you can leverage the substring function with a negative start position parameter to perform the right-side extraction with ease.
The following is a comprehensive example of using the above functions. This example decodes the cross-chain to Arbitrum records from the logs
table, using several methods comprehensively:
Related links for the example query above:
Description: String Functions and Operators
Window function
The combination of multiple rows of data becomes a window (Window). A function that operates on a set of rows in a window and calculates the return value for each row based on the set of rows is called a window function.Window functions prove invaluable for various processing tasks, such as computing moving averages, cumulative statistics, or accessing the value of a row based on its relative position within the current row's window.The common syntax format of a window function is as follows:"
Among them, function
can be a ranking window function, an analysis window function or an aggregation function. Over
is a fixed keyword that must be used. There are two possible changes in the window_spec
part: partition by partition_feild order by order_field
or order by order_field
, respectively indicating partition first and then sort and direct sort without partition. Except for the case where all rows are treated as the same group, the grouping function must be used with order by
.
LEAD(), LAG() functions
The Lead() function returns the value of the specified expression from subsequent rows within the partition. Its syntax is lead(expr [, offset [, default] ] )
. The Lag() function returns the value of the specified expression from the preceding row in the partition. These two functions are very useful when we need to compare the value of a column in the result set with the value of the same column in the previous or next row (of course, values can also be taken at intervals of multiple rows).
Our previous tutorial described a query to count the number of new pools added daily to Uniswap V3 for the last 30 days. Its SQL is:
If we want to add a curve to show the change in the number of new fund pools every day based on the current bar chart, we can use the Lag() function to calculate the change value of each day compared to the previous day, and then visualize it. In order to keep the logic clear, we added a CTE, and the modified SQL is as follows:
Add pool_count_diff
to the visualisation chart (using the right hand axis and choosing Line for the graph type) as shown below:
The Lead() function proves to be a valuable tool when we wish to perform "forward" comparisons of data across various rows. For example, we previously prpesented a query that identified the creator accounts with the highest post counts in the Lens instance. Now, we shall refine this query to retrieve the top 50 accounts with the highest post counts. With this refined dataset, we can delve into comparing the differences in the number of posts among these top accounts. Specifically, we will examine the variance between the first and second positions, the second and third positions, and so on.The key part of the query code is as follows:
The query results are shown in the figure below, where we can see that there are very small differences in the number of posts between some accounts:
Full SQL reference link:
Row_Number() function
Row_Number() is a powerful window function of the ranking type, primarily used to assign distinct row numbers based on a specified sorting method. These row numbers start from 1 and increment consecutively. In a previous example, we employed Row_Number() with the syntax row_number() over (order by post_count desc) as rank_id
to generate row numbers representing the ranking of our data, so we won't delve into another example here.If combined with the partition by
partition clause, Row_Number() will start numbering from 1 within each partition. Using this feature, we can use it to implement some advanced screening. For example, we have a group of Token addresses, and we need to calculate and return their average price in the last 1 hour.Due to potential delays in data from sources like Dune, filtering by the "hour" value of the current system date may not always yield the desired price data. To ensure accuracy, a more cautious approach is to widen the time range and then filter out the most recent record for each Token. This way, even if the data is delayed by several hours, our queries will continue to function correctly.To achieve this, we can utilize the Row_Number() function along with the partition by
clause. The process involves generating row numbers for each partition, and then filtering out the required data based on these row numbers.
The above query results are shown in the figure below:
Full SQL reference link:
More complete information on window functions:
Array_agg() function
If you want to combine a certain column of each row of data in the query result set, you can use the array_agg() function. If you want to merge multiple columns of data together (imagine exporting the query results as CSV), you can consider using the string concatenation method described above to combine multiple columns of data into one column, and then apply the array_agg() function. Here is a simple example:
Summary
Each database has dozens or even hundreds of built-in functions, and what we introduce here is only a small part of the commonly used functions. If you want to become a proficient data analyst, we highly recommend reading and understanding the usage of each of the built-in functions here:Trino functions.
About Us
Sixdegree
is a professional onchain data analysis team Our mission is to provide users with accurate onchain data charts, analysis, and insights. We are committed to popularizing onchain data analysis. By building a community and writing tutorials, among other initiatives, we train onchain data analysts, output valuable analysis content, promote the community to build the data layer of the blockchain, and cultivate talents for the broad future of blockchain data applications. Welcome to the community exchange!
Website: sixdegree.xyz
Email: contact@sixdegree.xyz
Twitter: twitter.com/SixdegreeLab
Dune: dune.com/sixdegree
Github: https://github.com/SixdegreeLab
Last updated