Data Analysis with Open Source Tools


  • Data Analysis with Open Source Tools
  • A Hands-On Guide for Programmers and Data Scientists
  • Philipp K. Janert
  • 540 pages
  • O’Reilly (2010)
  • ISBN: 978-0596802356

From the Back Cover:

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data: how to discover what it contains, how to capture those ideas in conceptual models, and how to feed that understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you’ll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you’ll learn how to think about the results you want to achieve, rather than rely on tools to think for you.

Table of Contents:

  1. Introduction
  2. A Single Variable: Shape and Distribution
  3. Two Variables: Establishing Relationships
  4. Time as a Variable: Time-Series Analysis
  5. More Than Two Variables: Graphical Multivariate Analysis
  6. Intermezzo: A Data Analysis Session
  7. Guesstimation and the Back of the Envelope
  8. Models from Scaling Arguments
  9. Arguments from Probability Models
  10. What You Really Need to Know About Classical Statistics
  11. Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That
  12. Simulations
  13. Finding Clusters
  14. Seeing the Forest for the Trees: Finding Important Attributes
  15. Intermezzo: When More is Different
  16. Reporting, Business Intelligence, and Dashboards
  17. Financial Calculations and Modeling
  18. Predictive Analytics
  19. Epilogue: Facts Are Not Reality
  20. Appendix A: Programming Environments for Scientific Computation
  21. Appendix B: Results from Calculus
  22. Appendix C: Working with Data
