How to generate an n-gram in Java

Educative Answers Team

An n-gram is a sequence of n consecutive characters within a given piece of text.

Generating a list of n-grams

For a given piece of text, we can compute a list of all its n-grams. The size of the n-grams (n) is passed as an argument.

Let’s make a list of n-grams from a string:

import java.util.*;

class Ngrams {
  public static List<String> ngrams(int n, String str) {
    List<String> ngrams = new ArrayList<String>();
    // In each iteration, the window moves one step forward
    for (int i = 0; i < str.length() - n + 1; i++) {
      // Add the substring of size n that starts at index i
      ngrams.add(str.substring(i, i + n));
    }
    return ngrams;
  }

  public static void main(String[] args) {
    String s = "abcdef";
    List<String> ngrams = ngrams(3, s);
    // Print each n-gram on its own line
    for (String ngram : ngrams) {
      System.out.println(ngram);
    }
  }
}

All you have to do is slide a fixed window of size n across the string. In each iteration, the substring under the window (the n-gram) is added to the ngrams list. For "abcdef" with n = 3, the list contains abc, bcd, cde, and def.
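The sliding-window idea also makes the edge cases easy to reason about. The following is a minimal sketch, not part of the original answer: the class name SafeNgrams and the choice to reject non-positive n with an IllegalArgumentException are assumptions made for illustration. It shows that a value of n larger than the string length simply yields an empty list.

```java
import java.util.ArrayList;
import java.util.List;

class SafeNgrams {
    // Hypothetical helper: the same sliding window, plus an input check.
    static List<String> ngrams(int n, String str) {
        if (n <= 0) {
            // Assumption: a non-positive n is treated as a caller error.
            throw new IllegalArgumentException("n must be positive, got " + n);
        }
        List<String> result = new ArrayList<>();
        // A string of length L has L - n + 1 windows of size n.
        // When n > L the condition is false immediately, so the list stays empty.
        for (int i = 0; i + n <= str.length(); i++) {
            result.add(str.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(3, "abcdef")); // [abc, bcd, cde, def]
        System.out.println(ngrams(7, "abcdef")); // []
    }
}
```

Writing the loop condition as i + n <= str.length() is equivalent to i < str.length() - n + 1 and makes it easier to see that the window never runs past the end of the string.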

Creating an iterator

The lazy approach is to create an iterator over the text that produces each n-gram only when it is requested, instead of building the whole list up front. In the example below, each n-gram is printed as it is produced.

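import java.util.Iterator;
import java.util.NoSuchElementException;

class NgramIterator implements Iterator<String> {
    private final String str;
    private final int n;
    private int pos = 0;

    public NgramIterator(int n, String str) {
        this.n = n;
        this.str = str;
    }

    public boolean hasNext() {
        return pos < str.length() - n + 1;
    }

    public String next() {
        // Honor the Iterator contract once the text is exhausted
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Return the n-gram starting at pos, then advance pos by one
        return str.substring(pos, pos++ + n);
    }

    public static void main(String[] args) {
        String s = "abcdef";
        new NgramIterator(3, s).forEachRemaining(System.out::println);
    }
}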

Once again, n is the size of the n-grams. The pos field tracks the iterator's current position in the string; it starts at 0, the beginning of the string, and advances by one with each call to next().
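The benefit of the lazy version shows up when only part of the output is needed. The sketch below is illustrative rather than taken from the original answer: LazyNgrams and firstNgrams are hypothetical names, and wrapping the iterator in an Iterable is one possible design among several. It stops after a fixed number of n-grams, so the rest of the text is never processed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class LazyNgrams implements Iterable<String> {
    private final int n;
    private final String str;

    LazyNgrams(int n, String str) {
        this.n = n;
        this.str = str;
    }

    @Override
    public Iterator<String> iterator() {
        // Each call returns a fresh iterator with its own position.
        return new Iterator<String>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos + n <= str.length();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String ngram = str.substring(pos, pos + n);
                pos++;
                return ngram;
            }
        };
    }

    // Hypothetical helper: pulls at most `limit` n-grams, then stops.
    static List<String> firstNgrams(int n, String str, int limit) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new LazyNgrams(n, str).iterator();
        // next() is only called while more results are still wanted.
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(firstNgrams(3, "abcdef", 2)); // [abc, bcd]
        // Implementing Iterable also allows a plain for-each loop.
        for (String ngram : new LazyNgrams(3, "abcdef")) {
            System.out.println(ngram);
        }
    }
}
```

Because the class implements Iterable rather than Iterator, the same text can be traversed more than once, which a single NgramIterator instance cannot do after it is exhausted.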


