Introduction to PySpark
Benjamin Schmidt
Data Engineer
from pyspark.sql import SparkSession

# Initialize the Spark session
spark = SparkSession.builder.appName("Spark SQL Example").getOrCreate()

# Sample DataFrame
data = [("Alice", "HR", 30), ("Bob", "IT", 40), ("Cathy", "HR", 28)]
columns = ["Name", "Department", "Age"]
df = spark.createDataFrame(data, schema=columns)

# Register the DataFrame as a temporary view
df.createOrReplaceTempView("people")

# Query using SQL
result = spark.sql("SELECT Name, Age FROM people WHERE Age > 30")
result.show()
# Read a CSV file into a DataFrame, using the first row as the header
df = spark.read.csv("path/to/your/file.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view
df.createOrReplaceTempView("employees")
# SQL query result
query_result = spark.sql("SELECT Name, Salary FROM employees WHERE Salary > 3000")

# DataFrame transformation: add a Bonus column equal to 10% of Salary
high_earners = query_result.withColumn("Bonus", query_result.Salary * 0.1)
high_earners.show()