Go: Reading a Plain-Text File

How does one actually read a plain text file in Go? Some searching through the standard library revealed the bufio.Scanner utility, which seems to be the most convenient way to accomplish this task.

How to open and read an arbitrary file in Go, using only the standard library, is not entirely obvious: the functionality is a little scattered about, across the packages os, io, and bufio. One may also need functionality from the packages bytes for general input, or strings and strconv for plain text, once it is read.

Do it Yourself with Buffers

As a convenience, os offers os.ReadFile(), which returns the contents of the entire file as a []byte, together with an error value. (This function was added in Go 1.16.)
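A minimal sketch of the os.ReadFile() route might look like the following. The helper writeTemp() and the function name linesOfFile() are made up for this illustration (the helper only exists so that the example has a file to read), and splitting on "\n" assumes Unix-style line endings.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// writeTemp is a hypothetical helper for this demo: it writes content to a
// temporary file and returns the path of that file.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "demo-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

// linesOfFile reads the whole file into memory with os.ReadFile and then
// splits the contents on newlines (assumption: "\n" line endings).
func linesOfFile(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	// Drop one trailing newline so that it does not produce an empty last entry.
	text := strings.TrimSuffix(string(data), "\n")
	return strings.Split(text, "\n"), nil
}

func main() {
	path, err := writeTemp("first\nsecond\nthird\n")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	lines, err := linesOfFile(path)
	if err != nil {
		panic(err)
	}
	for i, line := range lines {
		fmt.Printf("%d: %s\n", i+1, line)
	}
}
```

This is fine for small files; since the entire file ends up in memory, it is less attractive for very large inputs.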

Alternatively, we can use os.Open() to open a file for reading. The os.File type returned by os.Open() implements the io.Reader interface, which defines the method Read(buf []byte) (n int, err error) to read bytes into a provided buffer.

This takes us over into io territory. There, we find the standalone functions io.ReadAll(), io.ReadAtLeast(), and io.ReadFull(). The first reads from a supplied io.Reader until EOF and returns the data as a newly allocated []byte; the other two fill a caller-supplied []byte buffer. We can read any file like this; we just need to implement the required buffer handling ourselves (for instance, scanning for newlines in order to break the input into lines).
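To give an impression of what "doing the buffer handling ourselves" involves, here is a rough sketch that calls Read() in a loop and splits the data into lines by hand. The names manualLines() and linesOfText() are invented for this example, a strings.Reader stands in for an opened file (any io.Reader works the same way), and only "\n" line endings are handled.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// manualLines reads r in chunks of chunkSize bytes (which must be positive)
// and splits the data into lines by hand.
func manualLines(r io.Reader, chunkSize int) ([]string, error) {
	var lines []string
	var pending []byte // bytes read so far that do not yet end in '\n'
	buf := make([]byte, chunkSize)

	for {
		n, err := r.Read(buf)
		// Read may return data together with an error, so use the data first.
		pending = append(pending, buf[:n]...)

		// Emit every complete line currently held in pending.
		for {
			i := bytes.IndexByte(pending, '\n')
			if i < 0 {
				break
			}
			lines = append(lines, string(pending[:i]))
			pending = pending[i+1:]
		}

		if err == io.EOF {
			break
		}
		if err != nil {
			return lines, err
		}
	}

	// A final line without a trailing newline is still a line.
	if len(pending) > 0 {
		lines = append(lines, string(pending))
	}
	return lines, nil
}

// linesOfText is a convenience wrapper for the demo: a strings.Reader
// takes the place of an opened file.
func linesOfText(text string, chunkSize int) ([]string, error) {
	return manualLines(strings.NewReader(text), chunkSize)
}

func main() {
	// A tiny chunk size forces lines to span several Read calls.
	lines, err := linesOfText("alpha\nbeta\ngamma", 4)
	if err != nil {
		panic(err)
	}
	for i, line := range lines {
		fmt.Printf("%d: %s\n", i+1, line)
	}
}
```

Even this small version has to deal with lines that straddle two reads and with a last line that lacks a newline, which is exactly the bookkeeping that the bufio package takes off our hands.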

Using the facilities in bufio is a bit more convenient. We can obtain a bufio.Reader from an io.Reader with bufio.NewReader(). (Remember that os.File implements io.Reader.) A bufio.Reader itself implements io.Reader and hence provides a method rdr.Read() to read bytes into a buffer. But it also provides methods that scan the input for a delimiter: rdr.ReadBytes(delim byte) and rdr.ReadString(delim byte) read up to and including the supplied delimiter, and return the data as []byte or string, respectively.
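As a sketch, reading lines with rdr.ReadString('\n') could look like this. The function names are made up for the example, and a strings.Reader again stands in for the file. The one subtlety worth noting is that ReadString() may return data and io.EOF at the same time, namely when the last line has no trailing newline.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines collects all lines from r using bufio.Reader.ReadString.
func readLines(r io.Reader) ([]string, error) {
	rdr := bufio.NewReader(r)
	var lines []string
	for {
		line, err := rdr.ReadString('\n')
		if len(line) > 0 {
			// The delimiter is part of the returned string; strip any
			// trailing \r or \n characters.
			lines = append(lines, strings.TrimRight(line, "\r\n"))
		}
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
	}
}

// readLinesOfText is a demo wrapper that reads from a string instead of a file.
func readLinesOfText(text string) ([]string, error) {
	return readLines(strings.NewReader(text))
}

func main() {
	lines, err := readLinesOfText("one\ntwo\n\nfour")
	if err != nil {
		panic(err)
	}
	for i, line := range lines {
		fmt.Printf("%d: %q\n", i+1, line)
	}
}
```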

There are also two lower-level methods. rdr.ReadSlice(delim byte) returns a byte slice that points into the internal buffer of the reader: this is not a copy, and the data will be overwritten by the next read operation. rdr.ReadLine() scans for end-of-line bytes (either \n or \r\n), but does not entirely encapsulate the underlying buffer management: a line that does not fit into the buffer is returned in pieces, and the caller has to reassemble them.
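The following sketch shows what the leaky buffer management of rdr.ReadLine() means in practice. Besides the data and an error, ReadLine() returns a flag, isPrefix, which is true when the buffer filled up before the end of the line was found. The example deliberately uses bufio.NewReaderSize() with a 16-byte buffer to provoke this; the function names are invented for the illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// reassembledLines reads lines with ReadLine and glues together the pieces
// of any line that did not fit into the (deliberately tiny) buffer.
func reassembledLines(r io.Reader) ([]string, error) {
	rdr := bufio.NewReaderSize(r, 16) // 16 bytes is the smallest size bufio uses
	var lines []string
	var current []byte
	for {
		chunk, isPrefix, err := rdr.ReadLine()
		if err == io.EOF {
			// Keep a partial line that was cut off by the end of input.
			if len(current) > 0 {
				lines = append(lines, string(current))
			}
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
		// chunk is only valid until the next read, so copy it.
		current = append(current, chunk...)
		if !isPrefix {
			lines = append(lines, string(current))
			current = current[:0]
		}
	}
}

// reassembledLinesOfText is a demo wrapper that reads from a string.
func reassembledLinesOfText(text string) ([]string, error) {
	return reassembledLines(strings.NewReader(text))
}

func main() {
	text := "short\nthis line is definitely longer than sixteen bytes\nend"
	lines, err := reassembledLinesOfText(text)
	if err != nil {
		panic(err)
	}
	for i, line := range lines {
		fmt.Printf("%d: %s\n", i+1, line)
	}
}
```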

Introducing bufio.Scanner

Finally, the most convenient (but possibly least well-known) facility is bufio.Scanner. A Scanner instance is obtained from an io.Reader using bufio.NewScanner(r io.Reader) (not bufio.Scanner()!). A Scanner reads from the io.Reader that it wraps, breaking the input into tokens, and returning the tokens individually. Input is broken into tokens using a user-supplied function, but the package provides pre-defined convenience functions that break the input into lines (this is also the default), whitespace-separated words, or into individual bytes or UTF-8 “runes”.
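As an illustration of the pre-defined split functions, here is a small sketch that switches a scanner from lines to whitespace-separated words by installing bufio.ScanWords. The function names are made up, and a strings.Reader takes the place of the file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// words returns the whitespace-separated words found in r.
func words(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Split(bufio.ScanWords) // the default would be bufio.ScanLines
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

// wordsOfText is a demo wrapper that reads from a string instead of a file.
func wordsOfText(text string) ([]string, error) {
	return words(strings.NewReader(text))
}

func main() {
	ws, err := wordsOfText("the quick\n brown\tfox")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ws), ws)
}
```

The other pre-defined split functions, bufio.ScanLines, bufio.ScanBytes, and bufio.ScanRunes, are installed the same way.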

Using a scanner is a two-step process: first we need to invoke the Scan() function to advance the scanner to the next token; then we need to call either Bytes() or Text() (not String()) to retrieve the token as either a byte buffer or a string. (Caution: the buffer returned by Bytes() is a handle and will be overwritten on the next call to Scan(); the string returned by Text() is a copy.)
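If tokens obtained through Bytes() need to outlive the next call to Scan(), they have to be copied. One way to do that is sketched below; the function names are invented for the example, and the append idiom is just one of several ways to copy a byte slice.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// retainTokens keeps every line of r as its own []byte.
func retainTokens(r io.Reader) ([][]byte, error) {
	sc := bufio.NewScanner(r)
	var kept [][]byte
	for sc.Scan() {
		// sc.Bytes() may point into the scanner's internal buffer,
		// so make a private copy before storing it.
		tok := append([]byte(nil), sc.Bytes()...)
		kept = append(kept, tok)
	}
	return kept, sc.Err()
}

// retainTokensOfText is a demo wrapper that reads from a string.
func retainTokensOfText(text string) ([][]byte, error) {
	return retainTokens(strings.NewReader(text))
}

func main() {
	toks, err := retainTokensOfText("red\ngreen\nblue\n")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%s (%d bytes)\n", t, len(t))
	}
}
```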

All of this sounds complicated, but the actual code is simple. In particular, note the idiom of using Scan() with a for-loop: the method returns false when encountering EOF or an error, thus ending the loop.

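package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	filename := ...

	file, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)

	// Scan() returns false at EOF or on error, which ends the loop.
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}

	// An error does not surface inside the loop; ask for it explicitly.
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}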

In Conclusion

The bufio.Scanner facility may be the most convenient way to read a line-oriented file, using only routines from the standard library, but it is a strange beast. The two-step process (first you scan, then you retrieve) feels awkward: why does Scan() not return the token it just read? Also, errors are not reported the usual way; instead, one has to invoke the Err() method to find out if there were any.
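To see why the Err() call matters, consider a line that is longer than the scanner is willing to buffer: the loop simply ends, as if the input were exhausted, and only Err() reveals that something went wrong. The sketch below provokes this by using the Buffer() method to lower the limit to 16 bytes (the default limit is bufio.MaxScanTokenSize, 64 KiB). The function names are hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// scanWithLimit reads lines from r, but refuses to buffer more than limit bytes.
func scanWithLimit(r io.Reader, limit int) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, limit), limit)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// Scan() returning false does not say why; Err() does.
	return lines, sc.Err()
}

// scanTextWithLimit is a demo wrapper that reads from a string.
func scanTextWithLimit(text string, limit int) ([]string, error) {
	return scanWithLimit(strings.NewReader(text), limit)
}

// isTooLong reports whether err stems from an over-long token.
func isTooLong(err error) bool {
	return errors.Is(err, bufio.ErrTooLong)
}

func main() {
	text := "ok\n" + strings.Repeat("x", 40) + "\n"
	lines, err := scanTextWithLimit(text, 16)
	fmt.Println("lines read:", lines)
	fmt.Println("too long:", isTooLong(err))
}
```

Without the final Err() check, the truncated result would be indistinguishable from a file that happens to contain a single line.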

Also watch out for surprising names: bufio.Scanner is a type, not a constructor (use bufio.NewScanner() to create a scanner); the method that retrieves a string is Text(), not String(); and Split() does not split anything: it is the setter that installs a custom split function!
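For completeness, here is a sketch of a custom split function and of how Split() installs it. The function splitOnComma() is made up for this example and breaks the input at commas; it follows the bufio.SplitFunc signature, where returning (0, nil, nil) asks the scanner to read more data and call again.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// splitOnComma is a hypothetical bufio.SplitFunc that yields
// comma-separated fields.
func splitOnComma(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexByte(data, ','); i >= 0 {
		// Consume the field and the comma; return the field without the comma.
		return i + 1, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		// Last field, not terminated by a comma.
		return len(data), data, nil
	}
	// No complete field yet: request more data.
	return 0, nil, nil
}

// fields returns the comma-separated fields found in r.
func fields(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Split(splitOnComma) // installs the split function; nothing is split yet
	var out []string
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

// fieldsOfText is a demo wrapper that reads from a string.
func fieldsOfText(text string) ([]string, error) {
	return fields(strings.NewReader(text))
}

func main() {
	fs, err := fieldsOfText("a,b,,c")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d fields: %q\n", len(fs), fs)
}
```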

Finally, let me quickly point out two subpackages of io: io/fs provides abstractions of a filesystem, including an fs.ReadFile() function. And the entire contents of the package io/ioutil have been deprecated in favor of equivalent functions in io and os, and should no longer be used in new code.
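A brief sketch of fs.ReadFile() follows. It reads a named file from any value that implements fs.FS; here, an in-memory fstest.MapFS from the standard library package testing/fstest stands in for a real directory, and the file name notes.txt as well as the helper names are invented for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// demoFS is a hypothetical helper: it builds an in-memory filesystem
// containing a single file.
func demoFS() fs.FS {
	return fstest.MapFS{
		"notes.txt": &fstest.MapFile{Data: []byte("hello from io/fs\n")},
	}
}

// readFrom returns the contents of the named file in fsys as a string.
func readFrom(fsys fs.FS, name string) (string, error) {
	data, err := fs.ReadFile(fsys, name)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	text, err := readFrom(demoFS(), "notes.txt")
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

For files on disk, os.DirFS() returns an fs.FS rooted at a given directory, so readFrom() could be used with it unchanged.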