Homework 5 (Chapters 9 and 10)

  • What are the main challenges of text analysis?
  • What is a corpus?
  • What are common words (such as "a", "and", and "of") called?
  • Why can't we use term frequency (TF) alone to measure the usefulness of words?
  • What is a caveat of inverse document frequency (IDF)? How does TFIDF address this problem?
  • Name three benefits of using TFIDF.
  • What methods can be used for sentiment analysis?
  • Research and document additional use cases and actual implementations for Hadoop.
  • Compare and contrast Hadoop, Pig, Hive, and HBase. List the strengths and weaknesses of each tool set.
  • Research and summarize three published use cases for Hadoop, Pig, Hive, and HBase.

