Go: Reading a Plain-Text File
How does one actually read a plain text file in Go? Some searching
through the standard library revealed the bufio.Scanner utility,
which seems to be the most convenient way to accomplish this task.
How to open and read an arbitrary file in Go, using only the standard
library, is not entirely obvious: the functionality is scattered
across the packages os, io, and bufio. Once the data has been read,
one may also need the package bytes to process general input, or
strings and strconv to process plain text.
Do it Yourself with Buffers
As a convenience, os offers os.ReadFile(), which returns the
contents of the entire file as a []byte. (This function was added
only recently, in Go 1.16.)
Alternatively, we can use os.Open() to open a file for reading. The
*os.File value returned by os.Open() implements the io.Reader
interface, which defines the method Read(buf []byte) (n int, err error)
to read bytes into a provided buffer.
This takes us over into io territory. There, we find the standalone
functions io.ReadAll(), which reads from an io.Reader until EOF and
returns the data as a new []byte, as well as io.ReadAtLeast() and
io.ReadFull(), both of which fill a caller-supplied []byte buffer
from an io.Reader.
We can read any file like this, but we need to implement the
required buffer handling ourselves (for instance, scanning for
newlines in order to break the input into lines).
Using facilities in bufio is a bit more convenient. We can obtain a
bufio.Reader from an io.Reader. (Remember that os.File
implements io.Reader.) A bufio.Reader implements io.Reader and
hence provides a method rdr.Read() to read bytes into a buffer.
But it also provides methods that scan the input for a delimiter:
rdr.ReadBytes(delim byte) and rdr.ReadString(delim byte) read up to
and including the first occurrence of the supplied delimiter, and
return the data as []byte or string, respectively.
There are also two lower-level methods. rdr.ReadSlice() returns a
byte slice that points into the reader's internal buffer: this
is not a copy; the data will be overwritten by the next read operation.
And rdr.ReadLine() scans for end-of-line bytes (either \n or \r\n),
but does not entirely encapsulate the underlying buffer management:
a line that is too long for the buffer is returned in several pieces.
Introducing bufio.Scanner
The most convenient (but possibly least well-known) option, finally,
is bufio.Scanner. A Scanner instance is obtained from an io.Reader
using bufio.NewScanner(r io.Reader) (not bufio.Scanner()!).
A Scanner reads from the io.Reader that it wraps, breaking the
input into tokens, and returning the tokens individually. Input is
broken into tokens using a user-supplied function, but the package
provides pre-defined convenience functions that break the input into
lines (this is also the default), whitespace-separated words, or into
individual bytes or UTF-8 “runes”.
Using a scanner is a two-step process: first we need to invoke the
Scan() function to advance the scanner to the next token; then we
need to call either Bytes() or Text() (not String()) to
retrieve the token as either a byte buffer or a string. (Caution:
the buffer returned by Bytes() is a handle and will be overwritten
on the next call to Scan(); the string returned by Text() is a
copy.)
All of this sounds complicated, but the actual code is simple. In
particular, note the idiom of using Scan() with a for-loop:
the method returns false when encountering EOF or an error, thus
ending the loop.
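package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	filename := ... // path of the file to read
	file, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	// Scan() also returns false on error: check why the loop ended.
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}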
In Conclusion
The bufio.Scanner facility may be the most convenient way to read
a line-oriented file, using only routines from the standard library, but
it is a strange beast. The two-step process (first you scan, then you
retrieve) feels awkward: why does Scan() not return the token it
just read? Also, errors are not reported the usual way; instead, one
has to invoke the Err() method to find out if there were any.
Also watch out for surprising names: Scanner() does not create a
scanner (use bufio.NewScanner() for that); Text(), not String(),
retrieves a string; and Split() does not split anything: it is the
setter that installs a custom split function!
Finally, let me quickly point out two subpackages of io. The package
io/fs provides abstractions of a filesystem, including an
fs.ReadFile() function. And the entire contents of the package
io/ioutil have been deprecated (as of Go 1.16), and should no longer
be used in new code.