Transform and Analyze Data with Microsoft Fabric
Luis Silva
Solution Architect - Data & AI


T-SQL aggregate functions, used with the GROUP BY clause:
- SUM()
- COUNT()
- AVG()
- MIN()
- MAX()
- STDEV()
- VAR()

The general pattern is:

SELECT
    <unaggregated columns>,
    function(<aggregated column>)
FROM
    <table>
GROUP BY
    <unaggregated columns>;
For example, to count orders and total order amounts per state:

SELECT
    [State],
    COUNT([Order_ID]) AS [Num Orders],
    SUM([Order_Amount]) AS [Total Amount]
FROM
    [tbl_Orders]
GROUP BY
    [State];
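In a Fabric notebook, the same query can also be run through Spark SQL. A minimal sketch, assuming the tbl_Orders table is available in the attached lakehouse (Fabric notebooks provide the spark session automatically):

# Run the same aggregation via Spark SQL in a Fabric notebook.
# Assumes tbl_Orders exists in the attached lakehouse; `spark` is the
# SparkSession that Fabric notebooks predefine.
result = spark.sql("""
    SELECT State,
           COUNT(Order_ID) AS Num_Orders,
           SUM(Order_Amount) AS Total_Amount
    FROM tbl_Orders
    GROUP BY State
""")
result.show()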

PySpark aggregate functions, used with groupBy() and agg():
- sum()
- count()
- avg()
- min() and max()
- first() and last()
- stddev()
- variance()

The general pattern is:

df.groupBy(<unaggregated columns>) \
    .agg(function(<aggregated column>))

from pyspark.sql.functions import count, sum

df.groupBy("state").agg(count("order_id"), sum("order_amount")).show()
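To match the column names produced by the SQL example, you can alias each aggregate. A small sketch, reusing the df and column names assumed above:

from pyspark.sql.functions import count, sum

# Alias each aggregate so the output columns match the SQL example.
(df.groupBy("state")
    .agg(count("order_id").alias("num_orders"),
         sum("order_amount").alias("total_amount"))
    .show())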
You can import functions from pyspark.sql.functions by including an import statement at the start of your code.

#----- Import one or multiple functions:
from pyspark.sql.functions import sum, avg, count, min, max
#----- Import all SQL functions:
from pyspark.sql.functions import *
#----- Import all SQL functions with an alias:
import pyspark.sql.functions as F
# call sum: F.sum()
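The alias style is often preferred because it avoids shadowing Python built-ins such as sum() and min(). A short sketch, assuming the same df as above:

import pyspark.sql.functions as F

# The F alias keeps Python built-ins like sum() and min() usable.
df.groupBy("state").agg(
    F.sum("order_amount").alias("total_amount"),
    F.avg("order_amount").alias("avg_amount"),
).show()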
Aggregation operations:
- Sum
- Average
- Median
- Min
- Max
- Percentile
- Count rows
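Median and percentile have PySpark equivalents as well. A sketch using percentile_approx (available in pyspark.sql.functions since Spark 3.1), again assuming the same df:

from pyspark.sql.functions import count, percentile_approx

# percentile_approx computes an approximate percentile over a column;
# a percentage of 0.5 gives the approximate median.
df.groupBy("state").agg(
    percentile_approx("order_amount", 0.5).alias("median_amount"),
    percentile_approx("order_amount", 0.9).alias("p90_amount"),
    count("*").alias("num_rows"),
).show()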

