How to convert a PySpark DataFrame to HTML
PySpark is an
Convert PySpark DataFrame to HTML
Transform to Pandas DataFrame
Convert the PySpark DataFrame into a Pandas DataFrame using the toPandas() method. This collects the whole PySpark DataFrame into memory on the driver node.
import pandas as pd
pandas_df = pyspark_df.toPandas()
Pandas DataFrame into HTML
Using the to_html() method, we can convert the Pandas DataFrame into an HTML table. This generates an HTML string representation of the DataFrame.
html_table = pandas_df.to_html()
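The default output includes the Pandas row index as an extra column, which is usually not wanted for data that came from Spark. The following is a minimal sketch of a few to_html() options that may help; the sample data and the CSS class name spark-table are made up for illustration, and the small DataFrame stands in for the result of toPandas().

```python
import pandas as pd

# Stand-in for the result of pyspark_df.toPandas(); the data is hypothetical.
pandas_df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

html_table = pandas_df.to_html(
    index=False,            # omit the Pandas row index column
    classes="spark-table",  # extra CSS class on the <table> tag (hypothetical name)
    na_rep="",              # render missing values as empty cells
)

print(html_table)
```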
Save/display HTML
According to your requirements, you can choose to save or display the HTML table.
# to save HTML table
with open('filename.html', 'w') as file:
    file.write(html_table)

# to display HTML table
from IPython.display import display, HTML
display(HTML(html_table))
The above method copies the entire DataFrame into the driver's memory. If the DataFrame is too large to fit in memory, you can sample the data according to your requirements: create a sample DataFrame in PySpark, then repeat the steps above on the sample.
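One way to do this is sketched below, under some assumptions. The helper name spark_df_to_html and its parameters max_rows, fraction, and seed are hypothetical. It relies on the PySpark DataFrame methods sample(), limit(), and toPandas(), and caps the number of rows before they are collected to the driver.

```python
def spark_df_to_html(spark_df, max_rows=1000, fraction=None, seed=42):
    """Return an HTML table for at most max_rows rows of a PySpark DataFrame.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if fraction is not None:
        # sample() keeps roughly this fraction of rows; the count is not exact
        spark_df = spark_df.sample(fraction=fraction, seed=seed)

    # limit() caps how many rows toPandas() pulls into driver memory
    pandas_df = spark_df.limit(max_rows).toPandas()

    return pandas_df.to_html(index=False)
```

For example, spark_df_to_html(pyspark_df, max_rows=500, fraction=0.1) would render at most 500 rows drawn from an approximate 10% sample. Note that limit() alone returns the first rows Spark happens to produce, so combine it with sample() when the preview should be representative.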