Common word sequences

Introduction to Spark SQL in Python

Mark Plutowski

Data Scientist

Training

Predicting

Endword prediction

Sequence

End of sequence

The quick brown fox

Sentence bracket

Another type of aggregation

Videos

Categorical Data

Categorical vs Ordinal

  • Categorical: he, hi, she, that, they
  • Ordinal: 1, 2, 3, 4, 5

Sequence Analysis

Word preceding and following

3-tuples

query3 = """
   SELECT 
   id,
   word AS w1,
   LEAD(word,1) OVER(PARTITION BY part ORDER BY id ) AS w2,
   LEAD(word,2) OVER(PARTITION BY part ORDER BY id ) AS w3
   FROM df
""" 
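
To make the effect of the two LEAD calls concrete, here is a plain-Python analogue rather than Spark code. It is only a sketch: the sample rows and the helper name `lead_3tuples` are hypothetical, and it assumes each row carries `id`, `part`, and `word` as in the query above. Within each part, rows are ordered by id, and the last rows get None where no following word exists, much as LEAD returns NULL at the end of a partition.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (id, part, word), mimicking the columns used in query3.
rows = [
    (1, 1, "the"), (2, 1, "quick"), (3, 1, "brown"), (4, 1, "fox"),
    (5, 2, "one"), (6, 2, "of"), (7, 2, "the"),
]

def lead_3tuples(rows):
    """Return (id, w1, w2, w3) rows, like LEAD(word, 1) and LEAD(word, 2)."""
    out = []
    # Sort by (part, id) so that grouping by part keeps rows in id order.
    ordered = sorted(rows, key=itemgetter(1, 0))
    # PARTITION BY part: handle each part separately.
    for _, group in groupby(ordered, key=itemgetter(1)):
        part_rows = list(group)
        words = [w for _, _, w in part_rows]
        for i, (row_id, _, word) in enumerate(part_rows):
            # Look one and two rows ahead; None when the partition has run out.
            w2 = words[i + 1] if i + 1 < len(words) else None
            w3 = words[i + 2] if i + 2 < len(words) else None
            out.append((row_id, word, w2, w3))
    return out

tuples3 = lead_3tuples(rows)
for t in tuples3:
    print(t)
```

Note that the look-ahead never crosses from one part into the next, which is the point of PARTITION BY in the query.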

Window function SQL as a subquery

query3agg = """
SELECT w1, w2, w3, COUNT(*) as count FROM (
   SELECT 
   word AS w1,
   LEAD(word,1) OVER(PARTITION BY part ORDER BY id ) AS w2,
   LEAD(word,2) OVER(PARTITION BY part ORDER BY id ) AS w3
   FROM df
)
GROUP BY w1, w2, w3 
ORDER BY count DESC
""" 

spark.sql(query3agg).show()

Window function SQL as a subquery: output

+-----+-----+-----+-----+
|   w1|   w2|   w3|count|
+-----+-----+-----+-----+
|  one|   of|  the|   49|
|    i|think| that|   46|
|   it|   is|    a|   46|
|   it|  was|    a|   45|
| that|   it|  was|   38|
|  out|   of|  the|   35|
|.....|.....|.....|.....|

Most frequent 3-tuples

+-----+-----+-----+-----+
|   w1|   w2|   w3|count|
+-----+-----+-----+-----+
|  one|   of|  the|   49|
|    i|think| that|   46|
|   it|   is|    a|   46|
|   it|  was|    a|   45|
| that|   it|  was|   38|
|  out|   of|  the|   35|
| that|    i| have|   35|
|there|  was|    a|   34|
|    i|   do|  not|   34|
| that|   it|   is|   33|
| that|   he|  was|   30|
| that|   he|  had|   30|
| that|    i|  was|   28|
+-----+-----+-----+-----+
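
The counting step (GROUP BY w1, w2, w3 with COUNT(*), ordered by count) can be sketched in plain Python as well. This is an illustrative analogue, not the Spark implementation: the tokenized `parts` data and the helper name `count_3tuples` are hypothetical. Unlike the SQL query, this sketch counts only complete 3-tuples, so the NULL-padded rows at the end of each part are left out.

```python
from collections import Counter

# Hypothetical tokenized text; each inner list plays the role of one `part`.
parts = [
    ["it", "was", "a", "dark", "night"],
    ["it", "was", "a", "cold", "day"],
    ["one", "of", "the", "men"],
]

def count_3tuples(parts):
    """Count how often each (w1, w2, w3) occurs, never spanning two parts."""
    counts = Counter()
    for words in parts:
        # zip stops at the shortest input, so only complete 3-tuples are counted.
        counts.update(zip(words, words[1:], words[2:]))
    return counts

counts = count_3tuples(parts)
# Equivalent of ORDER BY count DESC, limited to the first few rows.
top = counts.most_common(3)
for (w1, w2, w3), n in top:
    print(w1, w2, w3, n)
```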

Another type of aggregation

query3agg = """
SELECT w1, w2, w3, length(w1)+length(w2)+length(w3) as length FROM (
   SELECT 
   word AS w1,
   LEAD(word,1) OVER(PARTITION BY part ORDER BY id ) AS w2,
   LEAD(word,2) OVER(PARTITION BY part ORDER BY id ) AS w3
   FROM df
   WHERE part <> 0 and part <> 13
)
GROUP BY w1, w2, w3 
ORDER BY length DESC
""" 

spark.sql(query3agg).show(truncate=False)

Another type of aggregation

+-------------------+-------------------+---------------+------+
|                 w1|                 w2|             w3|length|
+-------------------+-------------------+---------------+------+
|comfortable-looking|           building|    two-storied|    38|
|         widespread|comfortable-looking|       building|    37|
|      extraordinary|      circumstances|      connected|    35|
|      simple-minded|      nonconformist|      clergyman|    35|
|       particularly|          malignant|  boot-slitting|    34|
|       unsystematic|        sensational|     literature|    33|
|       oppressively|        respectable|     frock-coat|    33|
|         relentless|        keen-witted|   ready-handed|    33|
|   travelling-cloak|                and|  close-fitting|    32|
|        ruddy-faced|      white-aproned|       landlord|    32|
|  fellow-countryman|            colonel|       lysander|    32|
+-------------------+-------------------+---------------+------+
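
The same ranking can be sketched in plain Python to show what the length column measures. This is an assumption-laden illustration: the two short word lists and the helper name `longest_3tuples` are hypothetical, although the long words themselves are taken from the output above. A set stands in for GROUP BY, so each distinct 3-tuple appears once, and the score is the sum of the three word lengths.

```python
# Hypothetical tokenized text; each inner list plays the role of one `part`.
parts = [
    ["an", "oppressively", "respectable", "frock-coat"],
    ["the", "ruddy-faced", "white-aproned", "landlord"],
]

def longest_3tuples(parts, n=3):
    """Return the n distinct 3-tuples with the largest combined word length."""
    # The set plays the role of GROUP BY w1, w2, w3.
    distinct = {t for words in parts for t in zip(words, words[1:], words[2:])}
    scored = [(w1, w2, w3, len(w1) + len(w2) + len(w3)) for w1, w2, w3 in distinct]
    # ORDER BY length DESC; ties are broken alphabetically to keep output stable.
    return sorted(scored, key=lambda r: (-r[3], r[:3]))[:n]

result = longest_3tuples(parts)
for row in result:
    print(row)
```

Hyphens count toward the length, which is why hyphenated compounds dominate the top of the list.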

Let's practice
