## Chapter Goals:
- Understand how a `MonitoredTrainingSession` works
- Learn about saving checkpoints and tracking scalar values during training
- Train a machine learning model using a `MonitoredTrainingSession`

## A. Logging values

While `tf.summary.scalar` lets us record values in an events file for TensorBoard, it is also useful to log values directly to STDOUT during training. For instance, it is customary to log the loss and the iteration count, so we can spot problems early (such as a loss that diverges or becomes NaN) and stop training.
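The sketch below is a rough, framework-free illustration of this idea. It uses Python's standard `logging` module to print the loss and step every few iterations of a toy training loop, and it stops early if the loss becomes NaN. The `train` function, its `log_every` parameter and the loss values are hypothetical stand-ins for a real model's loss tensor. With `logging.basicConfig`'s default format, each line is prefixed with the level and logger name (here `INFO:train:`).

```python
import logging
import math
import sys

# Send log records to STDOUT. With basicConfig's default format, each line
# looks like "LEVEL:logger_name:message", e.g. "INFO:train:...".
logging.basicConfig(level=logging.INFO, stream=sys.stdout)
logger = logging.getLogger("train")


def train(losses, log_every=100):
    """Toy loop over precomputed (hypothetical) per-step loss values.

    Logs the loss and step every `log_every` steps and stops early if the
    loss becomes NaN. Returns (logged_steps, stop_step).
    """
    logged_steps = []
    for step, loss in enumerate(losses):
        if math.isnan(loss):
            logger.error("Loss is NaN at step %d; stopping training.", step)
            return logged_steps, step
        if step % log_every == 0:
            logger.info("loss = %.4f, step = %d", loss, step)
            logged_steps.append(step)
    return logged_steps, len(losses)


if __name__ == "__main__":
    # Hypothetical losses: decreasing for 250 steps, then NaN at step 250.
    fake_losses = [1.0 / (s + 1) for s in range(250)] + [float("nan")]
    train(fake_losses, log_every=100)
```

The NaN check runs before the logging check, so a broken run is reported and stopped on the step where it breaks rather than at the next logging interval.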

The following command runs a script that trains a model for about 400 steps:

```shell
python train_model.py
```

You'll notice that each line of output is prefixed with `INFO:tensorflow`. This prefix shows that the message was logged by TensorFlow's logger at the INFO [logging level](https://docs.python.org/3/library/logging.html#logging-levels). These messages are displayed because the logging verbosity is set to INFO.

We log specific values during training using a `tf.compat.v1.train.LoggingTensorHook` object. The hook is initialized with a dictionary that maps labels to scalar-valued tensors. In our example, the labels are `'loss'` and `'step'`, for the loss and iteration count tensors, respectively. In the `run_model_training` function, `self.loss` is the loss tensor and `self.global_step` is the iteration count, also known as the training step.

To specify the logging frequency, we set exactly one of the keyword arguments `every_n_iter` or `every_n_secs` when initializing `tf.compat.v1.train.LoggingTensorHook`. In the example script, we set `every_n_iter` to 100, so the logged values are printed once every 100 iterations.

Alternatively, `every_n_secs` specifies a time interval (in seconds) between logged outputs instead of a number of iterations.
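The sketch below is a minimal, framework-free logging hook that makes the `every_n_iter`/`every_n_secs` choice concrete. `SimpleLoggingHook` is a hypothetical class written for illustration; it is not TensorFlow's implementation. It makes these assumptions:

- It takes a dictionary that maps labels to zero-argument callables. These stand in for the tensors a real `LoggingTensorHook` would evaluate.
- It requires exactly one of `every_n_iter` or `every_n_secs`.
- It is called once per training step.
- Its `clock` parameter is an added assumption, so the time-based mode can be tested without sleeping.

```python
import time


class SimpleLoggingHook:
    """Hypothetical, simplified per-step logging hook (not TensorFlow's code).

    `values` maps labels to zero-argument callables that stand in for the
    tensors a real hook would evaluate. `clock` is injectable for testing.
    """

    def __init__(self, values, every_n_iter=None, every_n_secs=None,
                 clock=time.monotonic):
        # Exactly one of the two frequency settings must be given.
        if (every_n_iter is None) == (every_n_secs is None):
            raise ValueError("Set exactly one of every_n_iter or every_n_secs.")
        self.values = values
        self.every_n_iter = every_n_iter
        self.every_n_secs = every_n_secs
        self.clock = clock
        self.iter_count = 0
        self.last_log_time = None  # None until the first log
        self.lines = []  # every formatted line, kept for inspection

    def _should_log(self):
        if self.last_log_time is None:
            return True  # log on the very first step
        if self.every_n_iter is not None:
            return self.iter_count % self.every_n_iter == 0
        return self.clock() - self.last_log_time >= self.every_n_secs

    def after_step(self):
        """Call once after each training step."""
        if self._should_log():
            line = ", ".join(
                f"{label} = {fn()}" for label, fn in self.values.items())
            print(line)
            self.lines.append(line)
            self.last_log_time = self.clock()
        self.iter_count += 1


if __name__ == "__main__":
    # Simulate ~400 steps with hypothetical loss values.
    state = {"loss": 0.0, "step": 0}
    hook = SimpleLoggingHook(
        {"loss": lambda: state["loss"], "step": lambda: state["step"]},
        every_n_iter=100)
    for step in range(400):
        state["step"] = step
        state["loss"] = round(1.0 / (step + 1), 4)
        hook.after_step()
```

With `every_n_iter=100` over 400 steps, this sketch logs at steps 0, 100, 200 and 300. The real hook evaluates the tensors in its dictionary as part of the session run, and its exact triggering details may differ from this simplified version.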

This lesson is part of *Applied Machine Learning: Deep Learning for Industry*, a course created by AdaptiLab.