Introduction to Spark SQL in Python
Mark Plutowski
Data Scientist
df.cache()
df.unpersist()
df.is_cached
False
df.cache()
df.is_cached
True
df.unpersist()
df.is_cached
False
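The cache/unpersist pairing shown above can be wrapped so that the DataFrame is always released, even if the work in between raises an exception. A minimal sketch, assuming a hypothetical helper named `cached` (not part of PySpark); it relies only on `cache()` and `unpersist()`, which the slides already use:

```python
from contextlib import contextmanager

@contextmanager
def cached(df):
    """Hypothetical helper: keep df cached for the duration of a with-block."""
    df.cache()          # marks df for caching; Spark fills the cache lazily, on the first action
    try:
        yield df
    finally:
        df.unpersist()  # release the cached data, even if the block raised
```

With a real DataFrame this could be used as `with cached(df) as d: d.count()`. Note that the helper unpersists on exit regardless of whether `df` was already cached beforehand.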
df.unpersist()
df.cache()
df.storageLevel
StorageLevel(True, True, False, True, 1)
In the storage level above, the following hold:
useDisk = True
useMemory = True
useOffHeap = False
deserialized = True
replication = 1
The following are equivalent in Spark 2.1+:
df.persist()
df.persist(storageLevel=pyspark.StorageLevel.MEMORY_AND_DISK)
df.cache() is the same as df.persist()
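To read a value such as StorageLevel(True, True, False, True, 1) without memorizing the field order, the five positional arguments can be labeled explicitly. A minimal sketch using only the standard library; `StorageFlags` and `describe` are hypothetical names for illustration, not PySpark classes, and the field order follows the one given for the storage level on these slides:

```python
from typing import NamedTuple

class StorageFlags(NamedTuple):
    """Hypothetical stand-in mirroring the five storage-level fields, in order."""
    useDisk: bool
    useMemory: bool
    useOffHeap: bool
    deserialized: bool
    replication: int = 1

def describe(flags: StorageFlags) -> str:
    """Render the flags as 'name = value' pairs, one per line."""
    return "\n".join(f"{name} = {value}" for name, value in flags._asdict().items())

# The values shown by df.storageLevel after df.cache() in the slides
level = StorageFlags(True, True, False, True, 1)
print(describe(level))
```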
df.createOrReplaceTempView('df')
spark.catalog.isCached(tableName='df')
False
spark.catalog.cacheTable('df')
spark.catalog.isCached(tableName='df')
True
spark.catalog.uncacheTable('df')
spark.catalog.isCached(tableName='df')
False
spark.catalog.clearCache()
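When several temporary views are registered, it can help to check their cache state in one pass. A minimal sketch, assuming a hypothetical helper `cache_status` (not part of PySpark) and an existing SparkSession passed in as `spark`; the only Spark call it makes is `spark.catalog.isCached`, as used above:

```python
def cache_status(spark, table_names):
    """Hypothetical helper: map each table or view name to its cached state."""
    return {name: spark.catalog.isCached(tableName=name) for name in table_names}
```

For the session in these slides, `cache_status(spark, ['df'])` would be expected to report `False` for `'df'` before `cacheTable('df')` and `True` after it.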