In this tutorial, we will create a fine-tuned OpenAI model to extract structured information from customer reviews. This will involve creating training data, formatting and validating this data, and initiating a fine-tuning job to build the model.

Here are a few cases where structured information extraction using fine-tuning can be particularly effective:

  1. Financial reporting: Parse financial news or reports to extract "company name", "stock symbol", "financial metric", and "percent change".

  2. Real estate listings: From real estate descriptions, extract "property type", "location", "price", and "number of bedrooms".

  3. Legal document summarization b Summarize legal documents by extracting key information like "case number", "parties involved", "court", and "judgment".

  4. Customer reviews analysis: Analyze customer reviews to identify "product name", "review rating", "customer sentiment", and "issues reported".

  5. Event summaries: From event announcements or articles, extract "event name", "date", "location", and "participants".

  6. Resume screening: Automate the extraction of information from resumes, such as "applicant name", "qualification", "experience", and "skills".

  7. Recipe extraction: From cooking websites or blogs, extract "dish name", "ingredients", "cooking time", and "preparation steps".

  8. Software bug reporting: From bug reports, extract "software version", "error message", "severity level", and "steps to reproduce".

We are going to use the example of customer reviews analysis for this tutorial.

Step 1: Collect and prepare training data

You will need a dataset of customer reviews. For fine-tuning, it is recommended to have at least 50 but of course a lot more than that is much better. You must have at least 10 to start. It is important that each line contains a complete JSON dictionary. The dictionaries cannot be broken up into multiple lines. Here is some sample data with 50 examples:

Get hands-on with 1400+ tech skills courses.