Data Science for Non-Techies: Career Skills with Practical Examples (A Beginner’s Guide to Big Data, Analytics, and Insights) by Maxen Ford

4. How to Collect and Clean Data Like a Pro

  • Step one is getting the data. Maxen mentions a number of government and public databases you can access for free. There are also tools for scraping data that is freely available on the Internet. Data quality is also important, so work to make your data complete, correct, and valid. The data you harvest may need some additional work to make it useful. This can involve things like eliminating duplicates, finding missing values, and putting data in the necessary format. You may also need to convert data so every piece uses the same units or formats.
  • (Doug: If you have no experience with Excel, you should probably look for a course or teaching videos on YouTube.) It’s always handy if Excel can read your data files. Also, consider some Python training with emphasis on its Pandas library. Be sure to keep a copy of your raw data before you start to clean it, and document what you have done.
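For readers who do try the Pandas route, the cleaning steps above might look roughly like the sketch below. It is only an illustration and is not taken from the book. The table, its column names, and the clean() helper are all made up for this example. It drops a duplicate row, flags a missing value, turns text dates into real dates, and converts Celsius readings to Fahrenheit, all on a copy so the raw data stays untouched.

```python
import pandas as pd

# Hypothetical raw data (invented for this sketch): one duplicate row,
# one missing temperature, mixed units, and dates stored as text.
raw = pd.DataFrame({
    "city": ["Albany", "Albany", "Buffalo", "Syracuse"],
    "date": ["2024-01-05", "2024-01-05", "2024-01-06", "2024-01-07"],
    "temp": [41.0, 41.0, 5.0, None],
    "unit": ["F", "F", "C", "F"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the raw frame is never modified."""
    out = df.copy()

    # 1. Eliminate exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)

    # 2. Put dates in a proper date format instead of plain text.
    out["date"] = pd.to_datetime(out["date"])

    # 3. Convert Celsius readings so every row uses Fahrenheit.
    is_celsius = out["unit"] == "C"
    out.loc[is_celsius, "temp"] = out.loc[is_celsius, "temp"] * 9 / 5 + 32
    out["unit"] = "F"

    # 4. Flag missing values rather than silently guessing them.
    out["temp_missing"] = out["temp"].isna()
    return out


cleaned = clean(raw)
print(cleaned)
```

Because clean() works on a copy, raw can be saved or compared later, which fits the advice to keep the original data and document each step.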

5. Understanding Data Analysis Without Coding

  • The PivotTable is a very powerful tool that lets you easily make more sense of your data, so be sure to learn how to use it. It’s not difficult. At times you will need to filter and sort your data and find sums, averages, counts, and percentages. Bar charts, line graphs, pie charts, and scatter plots can easily show relationships and trends.
  • Correlations show the relationships between two variables, but don’t assume that a change in one causes a change in another. You also need to know about mean, median, mode, minimum, maximum, and standard deviation and how to calculate them. All of these tasks are somewhat automatic in Excel and other tools. With these tools you can tell a story with data and show what the numbers mean. You also need to form hypotheses, which are educated guesses that you then test with data.
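The chapter is about doing this without code, but for anyone curious what Excel is doing behind the scenes, here is a rough Pandas sketch of a pivot table, the summary statistics, and a correlation. The sales table and its column names are invented for illustration. The revenue numbers were chosen to follow ad spend exactly, so the correlation comes out as strong as it can be. Real data will be messier, and even a strong correlation does not show that one thing causes the other.

```python
import pandas as pd

# Hypothetical sales records, invented for this sketch.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "revenue": [25, 45, 65, 85, 105, 125],
})

# Pivot table: total revenue for each region (rows) and product (columns).
pivot = sales.pivot_table(
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)

# Summary statistics for one column.
summary = sales["revenue"].agg(["mean", "median", "min", "max", "std"])
print(summary)

# Mode: the most common value, here the best-represented product.
top_product = sales["product"].mode()[0]
print("Most common product:", top_product)

# Pearson correlation between two numeric columns (ranges from -1 to 1).
r = sales["ad_spend"].corr(sales["revenue"])
print("Correlation of ad spend and revenue:", round(r, 3))
```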

6. Visualize Insights with Dashboards and Graphs

  • A dashboard is a centralized view of your important data. It can monitor trends and performance in real time. Start by considering the decisions the dashboard will support and asking which key metrics are needed. On it you can display the graphs and computations from the previous chapter. It should be interactive, with dropdowns, date sliders, and drill-down options. Make sure it has titles, labels, legends, and brief notes. Use color for emphasis and clarity, but don’t go overboard.
  • Put it in a shared location and automate data refresh. Avoid clutter and be sure to get feedback. When you share your work be sure to offer your recommendations, but be open to questions and other ideas. Your main goal is to start a conversation that will result in a collaborative effort.

7. Intro to Predictive Analytics and Machine Learning

  • Predictive analytics uses historical data to make informed predictions or forecasts. Regression analysis estimates the relationships between variables. It can involve two variables or more. Machine learning is a subset of artificial intelligence that can automatically detect patterns over time. Any estimation of probabilities is only as good as the data it uses. In any case, human judgment is always necessary. Sometimes you will want to classify data into two or more categories. You may also need to look for clusters of data points to better understand your audience.
  • The no-code tools to look at for these tasks are DataRobot and Google AutoML. The goal is to automate repetitive analysis tasks in real time. (Doug: As with other tools mentioned here, you should search YouTube for tutorials.) If you have enough data about each individual, you can personalize your marketing or teaching approach.
DrDougGreen.com     If you like the summary, buy the book