Transform and Analyze Data with Microsoft Fabric
Luis Silva
Solution Architect - Data & AI
Common T-SQL aggregate functions:
SUM(): total of a numeric column
COUNT(): number of rows (or non-NULL values in a column)
AVG(): arithmetic mean
MIN(): smallest value
MAX(): largest value
STDEV(): standard deviation
VAR(): variance

GROUP BY: groups rows so that aggregate functions are computed once per group rather than over the whole table.
The general pattern: every unaggregated column in the SELECT list must also appear in the GROUP BY clause.

SELECT
    <unaggregated columns>,
    function(<aggregated column>)
FROM
    <table>
GROUP BY
    <unaggregated columns>;
For example, counting orders and totaling order amounts per state:

SELECT
    [State],
    COUNT([Order_ID]) AS [Num Orders],
    SUM([Order_Amount]) AS [Total Amount]
FROM
    [tbl_Orders]
GROUP BY
    [State];
Common PySpark aggregate functions:
sum()
count()
avg()
min() and max()
first() and last()
stddev()
variance()

Grouping pairs groupBy() with agg():
df.groupBy(<unaggregated columns>).agg(function(<aggregated column>))
from pyspark.sql.functions import count, sum

df.groupBy("state").agg(count("order_id"), sum("order_amount")).show()
Aggregate functions live in the pyspark.sql.functions module; make them available by including an import statement at the start of your code.

#----- Import one or multiple functions:
from pyspark.sql.functions import sum, avg, count, min, max
#----- Import all SQL functions:
from pyspark.sql.functions import *
#----- Import all SQL functions with an alias:
import pyspark.sql.functions as F
# call sum: F.sum()
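
Importing sum, min, or max directly shadows Python's built-in functions of the same name, so the aliased form is often the safer choice. A minimal sketch of the earlier grouping example rewritten with the F alias (df, state, order_id, and order_amount are carried over from the example above):

import pyspark.sql.functions as F

df.groupBy("state").agg(
    F.count("order_id").alias("num_orders"),
    F.sum("order_amount").alias("total_amount"),
).show()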
Typical aggregations to apply when summarizing data:
Sum
Average
Median
Min
Max
Percentile
Count rows
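
Each of these can be computed in PySpark as well. A minimal sketch, assuming a DataFrame df with a numeric order_amount column (hypothetical names); percentile_approx() returns an approximate percentile, and the 0.5 quantile stands in for the median:

from pyspark.sql.functions import avg, count, max, min, percentile_approx, sum

df.agg(
    sum("order_amount").alias("sum"),
    avg("order_amount").alias("average"),
    percentile_approx("order_amount", 0.5).alias("median"),
    min("order_amount").alias("min"),
    max("order_amount").alias("max"),
    percentile_approx("order_amount", 0.9).alias("percentile_90"),
    count("*").alias("count_rows"),
).show()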