12:[["$","$La0",null,{"props":{"lessonContent":{"components":[{"type":"MarkdownEditor","content":{"version":"2.0","text":"$a1","mdHtml":"

Spark lets us manipulate individual DataFrame columns using relational or computational expressions. Conceptually, a column is a named field holding values of a particular type, much like a column in a pandas DataFrame, an R data frame, or a relational table. In each of Spark's supported languages, columns are represented by the Column type. Let's look at some examples of working with columns next.
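
As a minimal sketch of the ideas above, the snippet below builds a small hypothetical movies DataFrame (the data and the names movies, title, year, minutes, and hours are assumptions made for illustration, not the lesson's dataset). It uses col() to obtain Column objects and combines them into a computational expression (minutes / 60), a relational expression (year > 2000), and a sort key. It is written to paste into spark-shell but also creates a local session if none exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ColumnsSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data, used only for illustration.
val movies = Seq(
  ("Inception", 2010, 148),
  ("Up", 2009, 96),
  ("Alien", 1979, 117)
).toDF("title", "year", "minutes")

// col() returns a Column; arithmetic on it builds a computational expression.
val runtimeInHours = col("minutes") / 60

// withColumn adds (or replaces) a column computed from an expression.
val withHours = movies.withColumn("hours", runtimeInHours)

// A relational expression: a boolean Column used as a filter.
// $"year" is the implicits shorthand for col("year").
val recent = withHours.filter($"year" > 2000)

// Columns also drive sorting; desc orders from largest to smallest.
val byLength = withHours.orderBy(col("minutes").desc)

byLength.show()
```

Because col() only describes an expression, the same Column can be reused across withColumn, filter, and orderBy. Spark evaluates it only when an action such as show() runs.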

Listing all columns

We’ll assume we ...
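
As a separate, minimal sketch, assume a small hypothetical movies DataFrame (not the lesson's own dataset). A DataFrame's column names can be listed with its columns method. Its schema pairs each name with a data type, and printSchema renders the same information as a tree.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ListColumnsSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical two-column DataFrame used only for illustration.
val movies = Seq(("Inception", 2010), ("Up", 2009)).toDF("title", "year")

// columns returns the names as an Array[String], in schema order.
val names: Array[String] = movies.columns
println(names.mkString(", ")) // prints: title, year

// schema.fields pairs each column name with its Spark SQL data type.
movies.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))

// printSchema shows names, types, and nullability as a tree.
movies.printSchema()
```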

","cursorPosition":{"line":0,"ch":0},"comp_id":"bTLIfT6RzHhMk4mLN55Uk"},"iteration":4,"hash":1,"saveVersion":34}],"summary":{"title":"Columns","titleUpdated":true,"tags":["DataFrame Columns","Spark DataFrame"],"description":"Explore how to work with individual columns in Spark DataFrames including accessing, creating new columns with expressions, and sorting data efficiently. Understand the use of the col() method and withColumn() for manipulating structured data for big data projects."},"darkModeContent":["$a2"],"content":["$a2"]},"isPreviewLesson":false,"pageType":"collection_lesson","aiCoachVideoUrl":"https://youtu.be/kgl8y9J3O6c","collectionDetailsSSR":{"title":"An Introduction to Spark","summary":"Spark has come to dominate the big data processing space in a short span of time since its release and now serves as the de-facto unified big data processing engine in the industry. \n\nIn this course, you will get a complete introduction to the basics of Spark. You will start by learning about the architecture, the application lifecycle, and its API.\n\nFrom there, you will dive into the data frame data structure and its API as well as the strongly-typed datasets API. Lastly, you’ll get into the Spark SQL engine which will allow you to issue queries on structured data with a schema.\n\nBy the end of this course, you will have the confidence to use Spark in any of your big data projects.","details":"

This course is a gist of the basics one needs to grasp and master in order to find one's way when starting out with Spark. It gives the reader enough knowledge and hands-on practice to tackle more involved and complex topics independently. Treat this comprehensive starter course as a weekend read for getting up to speed with Spark.

","clos":["Basics of Spark","Spark Architecture","Spark SQL","Spark DataFrames","Spark Datasets"],"arabic_available":false,"page_tags":{"6599563637948416":"","6735071131205632":"Dataset,Spark Dataset,Datasets","6377800048050176":"DataFrames,Spark DataFrames","5207891154829312":"Spark,Spark vs MapReduce","5691721099771904":"Resilient Distributed Ds,RDD,Spark RDD","5884302265942016":"Spark Application,Spark","6125776467394560":"Spark Architecture","5567212650233856":"Spark,Spark Processing Engine","5675436160843776":"Spark Job,Spark Stage,Spark Task","4677596862742528":"Spark-Shell,Spark-Submit","5891835883421696":"Spark DataFrame,DataFrames","5484019242762240":"DataFrame Columns,Spark DataFrame","6618420232060928":"Spark Row,DataFrame Row","4834148848500736":"Spark,DataFrame,Spark DataFrame Ops","6602139353219072":"Datasets,Spark Datasets","5084726669344768":"Spark SQL Engine,Tungsten,Catalyst Optimizer","5064965380636672":"Spark SQL","4599925441036288":"Spark Table,Spark View","6516177847713792":"Spark SQL","5971859462422528":"Spark UDF,Spark Built In Functions","5428092309340160":"Spark High Order Funcs","5856401614700544":"Spark Join,Spark Union,Spark Windowing","6696332884967424":"Dataset Encoders,Encoder,Spark Encoder","4594830435418112":"Spark Datasets,Datasets","6748530216534016":"Spark DAG,Spark Task 
Scheduler","4762823148699648":"Spark"},"collection_toc_is_enabled":true,"page_count":null,"docker":{"envs":[],"container":{"buildLogUrl":"https://www.educative.io/api/author/5352985413550080/collection/6639962691731456/containers/4779952876027904/build/log","imageName":"author-5352985413550080-collection-6639962691731456-rev-25-container-4779952876027904-spark","file":{"name":"spark.tar.gz","size":167861},"buildStatusUrl":"https://www.educative.io/api/author/5352985413550080/collection/6639962691731456/containers/4779952876027904/build/status","metadata":{"sizeInBytes":167861},"id":-1,"tarballDownloadUrl":"https://www.educative.io/api/author/5352985413550080/collection/6639962691731456/containers/4779952876027904/download","rebuildImageUrl":"https://www.educative.io/api/author/5352985413550080/collection/6639962691731456/containers/4779952876027904/rebuild","buildStatus":"SUCCESS","track":false},"version":3,"jobs":[{"key":"lIyjCWeIZeG3q-sxwfXz9","name":"SparkShellUI","inputFileName":"foo","runScript":"ps axf | grep spark-shell | grep -v grep | awk '{print \"kill -9 \" $1}' | sh\n\n/opt/spark/bin/spark-shell\n\nread -p \"holding\"\n\nsleep 3600","ports":"4040","startScript":"echo \"ready\"","jobType":"Live","runInLiveContainer":true},{"key":"FtdeVI2hp4ClaHlSmjeLs","name":"SparkHistoryServerUI","inputFileName":"foo","runScript":"# Kill previous instance of History Server\nps axf | grep HistoryServer | grep -v grep | awk '{print \"kill -9 \" $1}' | sh\n\n# Create directory\nmkdir -p /tmp/spark-events\n\n\n\n# Run the job\n/opt/spark/bin/spark-submit --conf \"spark.eventLog.enabled=true\" --conf \"spark.eventLog.dir=/tmp/spark-events\" --class CountSequelMovies /scripts/target/scala-2.12/scripts_2.12-0.1.0-SNAPSHOT.jar\n\n# Start the Spark's History Server\n/opt/spark/sbin/start-history-server.sh","ports":"18080","startScript":"echo 
\"ready\"","jobType":"Live","forceRelaunchOnCompChange":true,"runInLiveContainer":true}],"loaded":true},"discounted_price":null,"cover_image_id":4582051394617344,"cover_image_metadata":"{\"width\":1024,\"height\":512,\"sizeInBytes\":49617,\"name\":\"Introduction to Spark_468 x 60.png\"}","cover_image_serving_url":"/v2api/collection/5352985413550080/6639962691731456/image/4582051394617344","tags":["Spark SQL","Spark DataFrames","Examples of Spark SQL","Spark Datasets","Spark SQL views"],"intro_video_url":"","intro_video_thumbnail_url":null,"aggregated_widget_stats":{"MarkdownEditor":75,"codeExerciseCount":0,"codeRunnableCount":20,"codeSnippetCount":135,"illustrations":20,"Code":35,"TerminalWidget":18,"Image":8,"LiveApp":2,"projects":0,"MxGraphWidget":9,"assessments":0,"cloudlabs":0},"default_themes":{"code_themes":{"Code":"default","Markdown":"default","RunJS":"default","SPA":"default","isForced":{"Code":false,"Markdown":false,"RunJS":false,"SPA":false}}},"api_keys":{"api_keys":[]},"skills":[],"testimonials":[],"licensing":null,"target_audience":"beginner","author_id":"5352985413550080","collection_id":"6639962691731456","approval_status":3005,"price":29,"is_private":false,"path_type":"regular","organization_id":null,"is_mini":false,"is_priced":true,"brief_summary":"Gain insights into Spark, its architecture, application lifecycle, and APIs. 
Delve into data frames, datasets, and Spark SQL to effectively manage and query big data.","approval_update_time":"2021-12-30T17:36:46.104Z","rating_visibility":true,"update_last_published_on_homepage":true,"show_developed_by":true,"udata_files":[],"CodeThemes":{"Code":"default","Markdown":"default","RunJS":"default","SPA":"default","isForced":{"Code":false,"Markdown":false,"RunJS":false,"SPA":false}},"is_marked_for_deletion":false,"transition_page_title":"","is_redirectable":false,"collection_type":"collection","adaptive_learning_mode":false,"HLOs_to_toc":{},"is_guide":false,"read_time":10200,"allow_logged_out_executions":false,"unique_live_widget_urls":false,"metadata_status":101,"palified_version":null},"pageSummarySSR":{"title":"Columns","description":"Explore how to work with individual columns in Spark DataFrames including accessing, creating new columns with expressions, and sorting data efficiently. Understand the use of the col() method and withColumn() for manipulating structured data for big data projects.","discourse_page_url":"https://discuss.educative.io/tag/columns__dataframes__an-introduction-to-spark?open=true&ctag=an-introduction-to-spark__datajek&cslug=introduction-to-spark&pslug=columns"},"adaptiveLearningConfigConstantSSR":0,"allowAllLessonPreview":false,"lockedBannerStatsSSR":{"b2cTrialStats":{"is_b2c_trial_active":true,"b2c_trial_active_duration":21,"b2c_trial_categories":"$a6"},"b2cStatus":100,"learnerTags":"$a7","workStats":1640,"interviewWorksStats":104,"inL2cStarterPack":false,"l2cWorkStats":46,"enableL2cStarterPackPaymentWidget":"false"},"pageTocSSR":"

","authorId":"5352985413550080","collectionId":"6639962691731456","pageId":"5484019242762240","isCollectionPageLockedCachingEnabled":true,"aceFeatureFlags":{"enableAceEditor":true,"enableAceEditorForAnswers":true},"serverConfigConstants":{"enable_notepad_prompt_ai":"$undefined"},"codeFeatureFlags":{"enableCodeCodeTabRedesign":"3"},"meta":{"type":["Article","TechArticle"],"title":"Manipulating Spark DataFrame Columns for Efficient Data Analysis","name":"An Introduction to Spark","description":"Learn how to access, manipulate, and sort columns in Spark DataFrames to enhance structured data processing with Spark SQL and DataFrame APIs.","image":"https://educative.io/api/collection/5352985413550080/6639962691731456/image/4582051394617344.png","isAccessibleForFree":false,"keywords":"$a7","provider":"Educative","publisher":"Educative","id":"courses/introduction-to-spark/columns","author":"DataJek","educationalLevel":"beginner","noIndex":false,"isForcedNoIndex":false,"noFollow":false,"redirectInfo":{"isDeletedCollectionPageRedirectable":false},"page_titles":{"6599563637948416":"Spark API","6735071131205632":"Datasets","5064965380636672":"Spark SQL - An Example","6748530216534016":"Execution of a Spark Application","6377800048050176":"DataFrames","5207891154829312":"Introduction","5675436160843776":"Anatomy of a Spark Application","5691721099771904":"Resilient Distributed Datasets","5884302265942016":"Spark Application Lifecycle","6125776467394560":"Architecture","5567212650233856":"Spark Differentiation","4677596862742528":"Spark Application - An Example","5891835883421696":"Working with DataFrames","5484019242762240":"Columns","6618420232060928":"Rows","4834148848500736":"More Operations with DataFrames","6602139353219072":"Working with Datasets","5084726669344768":"Spark SQL Engine","4599925441036288":"Spark SQL Views and Tables","6516177847713792":"Spark SQL Data Source","5971859462422528":"Spark User Defined Functions","5428092309340160":"Higher-Order 
Function","5856401614700544":"Joins, Unions, and Window Functions","6696332884967424":"Encoders","4594830435418112":"Datasets with Scala Case Class and Java Bean Class","4762823148699648":"Closing Remarks"},"is_marked_for_deletion":false,"transition_page_title":"","is_redirectable":false,"deleted_course_lesson_redirect":{"author_id":null,"collection_id":null,"page_id":null,"redirect_url_slug":null},"metadata_status":101,"additional_course_alternatives":[],"structured_data_json":"{\"@context\":\"https://schema.org\",\"@type\":\"LearningResource\",\"name\":\"Manipulating Spark DataFrame Columns for Efficient Data Analysis\",\"description\":\"Learn how to access, manipulate, and sort columns in Spark DataFrames to enhance structured data processing with Spark SQL and DataFrame APIs.\",\"isAccessibleForFree\":\"False\",\"hasPart\":[{\"@type\":\"WebContent\",\"isAccessibleForFree\":\"False\",\"cssSelector\":\".ed-lesson-content\"}],\"provider\":{\"@type\":\"Organization\",\"name\":\"Educative\"}}"},"requestUrl":"/courses/introduction-to-spark/columns","requestUrlInfo":{"authorId":5352985413550080,"collectionId":6639962691731456,"pageId":5484019242762240,"courseUrlSlug":"introduction-to-spark","pageUrlSlug":"columns"},"isExternalContent":false}}],[["$","script",null,{"id":"generate-data","type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"$a8"}}],["$","script",null,{"id":"structured-data","type":"application/ld+json","dangerouslySetInnerHTML":{"__html":"{\"@context\":\"https://schema.org\",\"@type\":\"LearningResource\",\"name\":\"Manipulating Spark DataFrame Columns for Efficient Data Analysis\",\"description\":\"Learn how to access, manipulate, and sort columns in Spark DataFrames to enhance structured data processing with Spark SQL and DataFrame 
APIs.\",\"isAccessibleForFree\":\"False\",\"hasPart\":[{\"@type\":\"WebContent\",\"isAccessibleForFree\":\"False\",\"cssSelector\":\".ed-lesson-content\"}],\"provider\":{\"@type\":\"Organization\",\"name\":\"Educative\"}}"}}],"$undefined"]]