Data Analysis

Sampling from a Stream

Selecting a random element from an array of length n is easy: simply generate a random integer i, with 0 <= i < n, and use the array element at that index position. But what if the length of the array is not known beforehand, or is, in fact, infinite (i.e. a stream)? And what if we don’t just want a single element, but a set of m samples, without replacement?

How Best to Capture Output from Scientific Calculations?

When writing programs that do computations, my overwhelming preference is to simply write results to standard output, and to use shell redirection to capture the output in a file. In this way, I am leveraging the shell’s full functionality, in particular filename completion, in the most convenient way possible. For the file format itself, I prefer simple, column-oriented, delimiter-separated flat files. They are completely portable, and can be read and understood by most tools. (They also play well with the usual Unix toolset.)

But this simple approach breaks down, once a program has to write more than one output stream: for example in the case of a simulation run, I may want to capture periodic snapshots of the simulation itself, but also track various calculated metrics as well. These two streams will not fit comfortable into a single flat file. One option is to use a structured file format, the other option is to write to multiple files simultaneously.

D3 for the Impatient

D3 for the Impatient

If you are in a hurry to learn D3.js, the leading JavaScript library for web-based graphics and visualization, this book is for you. Written for technically savvy readers with a background in programming or data science, the book moves quickly, emphasizing unifying concepts and patterns. Anticipating common difficulties, the book teaches you how to apply D3 to your own problems.

Data Analysis with Open Source Tools

Data Analysis with Open Source Tools

With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Gnuplot in Action

Gnuplot in Action

Gnuplot in Action is a comprehensive tutorial written for all gnuplot users: data analysts, computer professionals, scientists, researchers, and others. It shows how to apply gnuplot to data analysis problems. It gets into tricky and poorly documented areas.