Introduction to Spark SQL in Python
Mark Plutowski
Data Scientist
df.cache()
df.unpersist()
df.is_cached
False
df.cache()
df.is_cached
True
df.unpersist()
df.is_cached
False
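The cache / check / unpersist sequence above can be wrapped so the cleanup is never forgotten. This is a minimal sketch, not part of the original slides: the helper name `cached` is hypothetical, and it relies only on the `cache()`, `unpersist()`, and `is_cached` members shown above.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Keep `df` cached for the duration of a with-block (hypothetical helper)."""
    was_cached = df.is_cached      # remember the state we found it in
    if not was_cached:
        df.cache()                 # mark the DataFrame for caching
    try:
        yield df
    finally:
        if not was_cached:
            df.unpersist()         # only undo a cache we set up ourselves
```

Assuming `df` is an existing DataFrame, `with cached(df) as d: d.count()` leaves `df.is_cached` as it was before the block, even if the body raises.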
df.unpersist()
df.cache()
df.storageLevel
StorageLevel(True, True, False, True, 1)
In the storage level above, the following hold:
useDisk = True
useMemory = True
useOffHeap = False
deserialized = True
replication = 1
The following are equivalent in Spark 2.1+:
df.persist()
df.persist(storageLevel=pyspark.StorageLevel.MEMORY_AND_DISK)
df.cache() is the same as df.persist()
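The five positional values printed by df.storageLevel are easier to read with their names attached. The sketch below is an illustration, not part of the original slides: `describe_storage_level` is a hypothetical helper, and it assumes only that the object passed in exposes the attributes `useDisk`, `useMemory`, `useOffHeap`, `deserialized`, and `replication`, as `pyspark.StorageLevel` does.

```python
def describe_storage_level(level):
    """Return the storage-level flags as a name -> value dict (hypothetical helper)."""
    names = ("useDisk", "useMemory", "useOffHeap", "deserialized", "replication")
    # Read each flag by name instead of relying on positional order.
    return {name: getattr(level, name) for name in names}
```

Assuming `df` is a cached DataFrame, `describe_storage_level(df.storageLevel)` gives a dict that can be compared field by field with the values listed above.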
df.createOrReplaceTempView('df')
spark.catalog.isCached(tableName='df')
False
spark.catalog.cacheTable('df')
spark.catalog.isCached(tableName='df')
True
spark.catalog.uncacheTable('df')
spark.catalog.isCached(tableName='df')
False
spark.catalog.clearCache()
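The catalog calls above can be combined into a check-then-cache step. This is a sketch under stated assumptions: `ensure_cached` is a hypothetical name, `spark` is assumed to be an active SparkSession, and the table or view name is assumed to be registered already (for example with createOrReplaceTempView).

```python
def ensure_cached(spark, table_name):
    """Cache a registered table or view unless the catalog already reports it cached.

    Returns True if this call cached it, False if it was already cached.
    (Hypothetical helper built on spark.catalog.isCached / cacheTable.)
    """
    if spark.catalog.isCached(table_name):
        return False                      # nothing to do
    spark.catalog.cacheTable(table_name)  # same call as shown above
    return True
```

With the view from the slides, `ensure_cached(spark, 'df')` would cache it on the first call and do nothing on later calls until `spark.catalog.uncacheTable('df')` or `spark.catalog.clearCache()` is run.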