Simply put, a data warehouse is a structured storage of data for statistical purposes. The storage carrier is [data tables]. A series of [data tables] grouped together for one or multiple subjects is called a data warehouse. Note: the data here can be result data (e.g. daily trading volume of a trading pair on Uniswap since its launch) It can also be process data (every transaction record of a trading pair on Uniswap since launch: who initiated it, trading A for B, transaction time, tx_hash, amount, etc.).
2. What is SQL?
Let's say you want some crispy Nestle chocolate bars, but you can't go out right now. So you ask someone to run the errand for you: "I need a box of chocolate bars, the brand is Nestle". The errand runner goes to the supermarket, buys the chocolate and delivers it to your home. Similarly, SQL is like the sentence you spoke, Dune Analytics is the errand runner. It allows you to converse with the data warehouse and retrieve data from it. The most basic structure or syntax of SQL has 3 components, almost all SQL will contain these 3 parts:
select: Which fields to retrieve?
from: From which table to retrieve?
where: What are the criteria?
3. What does a data table look like?
You can think of a table as an Excel sheet, with each sheet containing different data. Take ethereum.transactions (Ethereum transaction records) as an example:
There are some commonly used fields in tables:
block_time: Timestamp when the transaction was mined
block_number: Block height where the transaction was mined
value: Amount of ETH transferred (need to divide by power(10,18) for decimal precision)
from: Wallet address where the ETH was sent from
to: Wallet address where the ETH was sent to
hash: Transaction hash of this transaction
success: Whether the transaction succeeded
Common Syntax and Use Cases
1. Basic Structure, Operators, Sorting
Case 1: I want to see when Sun's wallet (0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296) had large ETH transfers (>1000 ETH) since January 2022, and the specific amounts transferred.
SQL
select-- Select fields to query, separate multiple fields with commas block_time ,"from" ,"to" ,hash ,value/power(10,18) as value-- Convert value to decimal by dividing by power(10,18), 18 is Ethereum's precisionfrom ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2022-01-01') -- Limit block_time to be after Jan 1, 2022and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296 -- Limit to Sun's walletandvalue/power(10,18) >1000-- Limit ETH transfer value > 1000order by block_time -- Sort by block_time in ascending order,desc for descending order
SELECT followed by fields to query, separate multiple fields with commas
FROM
FROM followed by source table
WHERE
WHERE followed by filters on the data
Operators: and / or
Use operators to connect multiple filters
and: intersection of multiple filters
or: union of multiple filters
Sorting: order by [fieldA], sort in ascending order by fieldA, add desc at the end for descending order
Power calculation: used to convert Value precision, syntax is Power(Number, Power), where Number is base and Power is exponent
Change case of strings
lower(): convert string to lowercase
upper(): convert string to uppercase
2. Aggregate Functions
Case 2: the table contains detailed data. I don't want to see the specifics, I just want to understand the overview through some aggregated stats.
SQL
selectsum( value/power(10,18) ) as value-- Sum the value field ,max( value/power(10,18) ) as max_value -- Get max value ,min( value/power(10,18) ) as min_value -- Get min value ,count( hash ) as tx_count -- Count number of rows ,count( distinctto ) as tx_to_address_count -- Count number of rows for qualifying data (dedupe by to address)from ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2022-01-01') -- Limit block_time to be after Jan 1, 2022and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296andvalue/power(10,18) >1000-- Limit ETH transfer value > 1000
count(): count, number of rows; add distinct inside () for dedupe
sum(): sum
min(): minimum
max(): maximum
avg(): average
3. Date/Time Functions, Aggregation by Groups
Case 3: I don't want to just see a single number, I want to see trends broken down by hour/day/week.
3.1 Convert timestamp to hour/day/week format for further aggregated analysis
SQL
-- Convert seconds-level timestamp to day/hour/minute (for further aggregation by day or hour)select-- Select fields to query, separate multiple fields with commas block_time -- Timestamp of when the transaction happened ,date_trunc('hour',block_time) as stat_hour -- Convert to hour ,date_trunc('day',block_time) as stat_date -- Convert to day ,date_trunc('week',block_time) as stat_week -- Convert to week ,"from" ,"to" ,hash ,value/power(10,18) as value-- Convert value by dividing by power(10,18), 18 is Ethereum's precisionfrom ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2021-01-01') -- Limit block_time to be after Jan 1, 2021and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296andvalue/power(10,18) >1000-- Limit ETH transfer value > 1000order by block_time -- Sort by block_time ascending, add desc at end for descending
Returns different results based on datepart parameter:
minute: Truncate timestamp to minute
hour: Truncate timestamp to hour
day: Truncate timestamp to day
week: Truncate timestamp to Monday of the week
year: Truncate timestamp to first day of the year
3.2 Aggregate by groups using group by + sum based on the processed time fields from before
SQL
select date_trunc('day',block_time) as stat_date ,sum( value/power(10,18) ) as value-- Sum the value field for qualifying datafrom ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2022-01-01') -- Limit block_time to be after Jan 1, 2022and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296andvalue/power(10,18) >1000-- Limit ETH transfer value > 1000group by1order by1
Aggregation by groups (group by): The syntax for aggregation by groups is group by. As the name suggests, it groups first then aggregates, and needs to be used together with aggregate functions.
Let's say the table above shows household expenses (3 people) for the first 2 months of 2020. If you just use sum, you would only get the total of 12900. If you want to get the 2 types of aggregated data on the right, you need to use group by (group by Person or group by Month).
4. Join, Subquery
Case 4: I want to look at Sun's transfers behavior from the perspective of USD value of ETH transferred out.
4.1 The transfers show ETH's amount and I want to see the USD value for each transfer
SQL
select block_time ,transactions_info.stat_minute as stat_minute ,"from" ,"to" ,hash ,eth_amount -- Convert value by dividing by power(10,18), 18 is Ethereum's precision ,price ,eth_amount * price as usd_valuefrom(select-- Select fields to query, separate multiple fields with commas block_time ,date_trunc('minute',block_time) as stat_minute -- Truncate block_time to minute as the primary key ,"from" ,"to" ,hash ,value/power(10,18) as eth_amount -- Convert value by dividing by power(10,18), 18 is Ethereum's precisionfrom ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2022-01-01') -- Limit block_time to be after Jan 1, 2022and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296andvalue/power(10,18) >1000-- Limit ETH transfer value > 1000order by block_time -- Sort by block_time ascending, add desc at end for descending) transactions_infoleft join-- Join transactions_info with price_info, using left join(-- prices.usd table contains minute-level price dataselectminute as stat_minute ,pricefrom prices.usdwhere blockchain ='ethereum'-- Get Ethereum pricesand symbol ='WETH'-- Get WETH data) price_info on transactions_info.stat_minute = price_info.stat_minute -- Join with stat_minute field
In most cases, the data we need is not in the same table. For example, the transactions table only contains transaction data, not price data. If we want to calculate the USD value of transactions, we need to join these two tables.
A join query can be understood as combining two tables based on some condition to form a virtual table. You can then easily process this virtual table.
A join query has two components:
Join method (join, left join, right join, cross join, full join)
Join condition (on)
The most commonly used are join and left join. Take these two examples to explain the specific usage
- join: Joins two tables based on the join condition (on), taking the intersection
- Table A and Table B are joined on Name, the intersection is Xiaoming and Xiaohong. Since join takes the intersection, the final result will only have Xiaoming and Xiaohong
- All records from both tables that meet the criteria need to be joined. Because Table B has 2 records for Xiaoming, the joined result will also have 2 records for Xiaoming
- left join: Takes the left table as primary and joins the right table on the join condition (on), filling with null if no join is found
- Table A and Table B are joined on Name. Since the left table is primary, even though Xiaolan and Xiaoqing from the left table have no matching join records in the right table, Xiaolan and Xiaoqing will still appear in the result, with the right table portion filled with null.
4.2 Aggregate the detailed data from 4.1 by day, without too many levels of nested SQL
SQL
with transactions_info as -- Create subquery named transactions_info(select block_time ,transactions_info.stat_minute as stat_minute ,"from" ,"to" ,hash ,eth_amount -- Convert value by dividing by power(10,18), 18 is Ethereum's precision ,price ,eth_amount* price as usd_valuefrom (select-- Select fields to query, separate multiple fields with commas block_time ,date_trunc('minute',block_time) as stat_minute -- Truncate block_time to minute ,"from" ,"to" ,hash ,value/power(10,18) as eth_amount -- Convert value by dividing by power(10,18), 18 is Ethereum's precisionfrom ethereum.transactions -- Get data from ethereum.transactions tablewhere block_time >date('2022-01-01') -- Limit block_time to be after Jan 1, 2022and"from"= 0x3DdfA8eC3052539b6C9549F12cEA2C295cfF5296andvalue/power(10,18) >1000-- Limit ETH transfer value > 1000order by block_time -- Sort by block_time ascending, add desc at end for descending ) transactions_infoleft join-- Join transactions_info with price_info, using left join (-- prices.usd table contains minute-level price dataselectminute as stat_minute ,pricefrom prices.usdwhere blockchain ='ethereum'-- Get Ethereum pricesand symbol ='WETH'-- Get WETH data ) price_info on transactions_info.stat_minute = price_info.stat_minute -- Join key is stat_minute)select date_trunc('day',block_time) as stat_date ,sum(eth_amount) as eth_amount ,sum(usd_value) as usd_valuefrom transactions_info -- Get data from 'virtual table' transactions_infogroup by1order by1
with as can be used to construct a subquery, turning the result of some SQL into a 'virtual table' (similar to a view or subquery). The subsequent SQL can then directly retrieve data from this 'virtual table'.
Through with as, the readability of SQL logic can be improved, and multiple nestings can also be avoided.
About Us
Sixdegree is a professional onchain data analysis team Our mission is to provide users with accurate onchain data charts, analysis, and insights. We are committed to popularizing onchain data analysis. By building a community and writing tutorials, among other initiatives, we train onchain data analysts, output valuable analysis content, promote the community to build the data layer of the blockchain, and cultivate talents for the broad future of blockchain data applications. Welcome to the community exchange!