Finding plays/poems by each author (sed, sort, uniq, head)

So far, we do not know how many authors are in the dataset or how their names are spelled. It is therefore not feasible to type each author’s name into the command shown above. We need a smarter, though slightly more complicated, approach!

  • Step 1: Convert the first line into a column: head -n 1 plays_and_poems_stat.csv | sed -e 's/,/\'$'\n/g'
  • Step 2: On each line, remove everything up to and including the ___play___ or ___poem___ marker. This leaves only the raw ‘author names’: sed -r 's/([_a-zA-Z0-9]*)(___play___)|([_a-zA-Z0-9]*)(___poem___)//g'
  • Step 3: Count the unique occurrences of each author: sort | uniq -c. Note that you must run sort before you call uniq, because uniq only compares neighboring lines. We use uniq -c because it not only “condenses” neighboring lines that are the same, but also counts how many of each are seen!
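
To see why sort has to come before uniq, here is a small sketch on a made-up list of names (the three names are an assumption for illustration, not taken from the dataset). Without sort, the two Shakespeare lines are not neighbors, so uniq -c reports them separately.

```shell
# Hypothetical author list; the names are illustrative only.
authors='Shakespeare
Marlowe
Shakespeare'

# Without sort: identical lines are not adjacent, so uniq keeps three groups.
unsorted=$(printf '%s\n' "$authors" | uniq -c)

# With sort: identical lines become neighbors, so uniq can merge and count them.
sorted=$(printf '%s\n' "$authors" | sort | uniq -c)

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

The first listing has three lines, each with a count of 1. The second has two lines, with Shakespeare counted twice.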

Now let’s combine all the steps into a single piped (|) command, as the steps need to run in tandem:
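
A minimal sketch of the combined pipeline follows. It makes a few assumptions: the sample header row and the helper name count_authors are hypothetical, and the real plays_and_poems_stat.csv will have different columns. Here tr stands in for the sed expression from Step 1 so that the sketch also runs in shells without $'...' quoting, and sed -E is the portable spelling of GNU sed's -r.

```shell
# Hypothetical sample header row; the real plays_and_poems_stat.csv will differ.
sample=$(mktemp)
printf '%s\n' 'Hamlet___play___Shakespeare,Sonnet_18___poem___Shakespeare,Faustus___play___Marlowe' > "$sample"

# count_authors is a hypothetical helper name, not part of the original lesson.
count_authors() {
  # Step 1: take the header line and put one column name per line
  #         (tr is swapped in for the sed newline trick).
  # Step 2: strip everything up to and including ___play___ or ___poem___.
  # Step 3: sort so equal names are adjacent, then count them.
  head -n 1 "$1" |
    tr ',' '\n' |
    sed -E 's/([_a-zA-Z0-9]*)(___play___)|([_a-zA-Z0-9]*)(___poem___)//g' |
    sort |
    uniq -c
}

count_authors "$sample"
rm -f "$sample"
```

To run it on the real file, call count_authors plays_and_poems_stat.csv. Each output line is a count followed by an author name.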
