Best Practices

Best practices for Python programmers using PySpark.

While PySpark provides a familiar environment for Python programmers, it’s good to follow a few best practices to make sure you are using Spark efficiently. Here is a set of recommendations I’ve compiled based on my experience porting a few projects from Python to PySpark.

Avoid dictionaries

Using Python data types such as dictionaries means that the code may not run in a distributed mode. Instead of using keys to index values in a dictionary, consider adding another column to the dataframe that can be used as a filter. This recommendation also applies to other Python collection types, such as lists, that are not distributed across the cluster in PySpark.
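As a minimal sketch of this idea, the snippet below replaces a dictionary lookup with a small lookup dataframe and a join, so the mapping stays distributed. The column names (`country_code`, `region`) and the sample data are hypothetical and only serve to illustrate the pattern.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source data with a code we want to map to a region.
df = spark.createDataFrame(
    [(1, "US"), (2, "CA"), (3, "DE")],
    ["user_id", "country_code"],
)

# Instead of a Python dictionary such as {"US": "Americas", ...},
# express the mapping as a small dataframe and join against it.
regions = spark.createDataFrame(
    [("US", "Americas"), ("CA", "Americas"), ("DE", "EMEA")],
    ["country_code", "region"],
)

with_region = df.join(regions, on="country_code", how="left")

# The new column can now be used as a filter on the distributed dataframe.
americas_only = with_region.filter(F.col("region") == "Americas")
```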

Limit Pandas usage

Calling toPandas causes all of the data to be loaded into memory on the driver node and prevents operations from being performed in a distributed mode. It’s fine to use this function when the data has already been aggregated and you want to make use of familiar Python plotting tools, but it should not be used for large dataframes.
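The sketch below illustrates this guideline under assumed column names (`month`, `amount`): the aggregation happens in Spark, and only the small summary is converted with toPandas for plotting (plotting via pandas assumes matplotlib is installed).

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sales data; in practice this would be a large dataframe.
sales = spark.createDataFrame(
    [("2024-01", 10.0), ("2024-01", 5.0), ("2024-02", 7.5)],
    ["month", "amount"],
)

# Aggregate in Spark first so only a small summary reaches the driver.
monthly = sales.groupBy("month").agg(F.sum("amount").alias("total"))

# toPandas is reasonable here because the result is already aggregated.
summary_pd = monthly.toPandas()
summary_pd.plot(x="month", y="total", kind="bar")
```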

Avoid loops

Instead of using for loops, it’s often possible to use functional approaches such as group by and apply to achieve the same result. Using this pattern means that the code can be parallelized by the underlying execution environment. I’ve noticed that focusing on this pattern in Python also results in cleaner code that is easier to translate to PySpark.
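As one possible sketch, the example below replaces a loop over groups with a groupBy aggregation, and shows applyInPandas for custom per-group logic. The column names (`group`, `score`) and the normalization function are assumptions made for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

scores = spark.createDataFrame(
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
    ["group", "score"],
)

# Instead of looping over each group in Python, let Spark group and
# aggregate in parallel across the cluster.
per_group = scores.groupBy("group").agg(
    F.avg("score").alias("avg_score"),
    F.count("*").alias("n"),
)

# For custom per-group logic, applyInPandas runs a function on each
# group's rows as a pandas DataFrame while staying distributed.
def center_scores(pdf):
    pdf["score"] = pdf["score"] - pdf["score"].mean()
    return pdf

centered = scores.groupBy("group").applyInPandas(
    center_scores, schema="group string, score double"
)
```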
