Comment Scorer

Problem Statement

Ponder is a platform for sharing your thoughts or promoting personal projects. It is also a good place to keep up with what is happening around the world. However, it has a very strict posting policy, and earning credits for good comments is really hard. As a new Ponder user, you want to predict whether your comment is going to receive a good score.

Data Description

The training set has 45,000 comment replies and the test set has 30,000 rows of comments. The task is to build a model that predicts the scores of the comments in the test set.

Column            Description
UID               Unique ID
Comment           Reply to a parent comment
Date              Date of the comment
parent_comment    The parent comment to which the sarcastic reply is made
score             Score on the parent_comment

Pipeline Workflow

  • Used TF-IDF vectorization at both the word and the character level to convert the comments into vectors.
  • Generated new features such as comment length, the sentiment score of the comment, and the profanity score of the comment.
  • Concatenated both feature sets and trained a LightGBM regressor (LGBMRegressor) with 6-fold cross-validation to get the final prediction.
  • Tuned the LightGBM regressor's hyperparameters with Bayesian optimization.
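The pipeline steps above can be sketched in plain Python. This is a minimal, self-contained sketch under several assumptions, not the project's actual code. The TF-IDF weighting mimics scikit-learn's default smoothed IDF. The sentiment and profanity scores come from tiny hypothetical word lists (`POSITIVE`, `NEGATIVE`, `PROFANE`) rather than whatever scorers the project used. The LightGBM regressor is replaced by a hypothetical placeholder, `fit_mean_regressor`, that just predicts the training fold's mean score, so the example runs without third-party packages. Averaging the six fold models' test predictions is also an assumption about how the final prediction was formed, and Bayesian hyperparameter tuning is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons: the project's real sentiment and
# profanity scorers are not specified, so these are placeholders.
POSITIVE = {"good", "great", "love", "nice"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}
PROFANE = {"damn", "crap"}


def word_tokens(text):
    # Lowercased word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def char_ngrams(text, n=3):
    # Overlapping character n-grams of the space-padded, lowercased text.
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(token_lists):
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1, fitted on training docs only.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    n_docs = len(token_lists)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(tokens, idf, prefix):
    # L2-normalised TF-IDF as a sparse dict; terms unseen in training are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {prefix + t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {k: v / norm for k, v in vec.items()}


def extra_features(text):
    # Comment length plus toy sentiment and profanity ratios.
    tokens = word_tokens(text)
    n = len(tokens) or 1
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        "f:length": float(len(text)),
        "f:sentiment": (pos - neg) / n,
        "f:profanity": sum(t in PROFANE for t in tokens) / n,
    }


class Featurizer:
    # Concatenates word TF-IDF ("w:"), char TF-IDF ("c:") and extra ("f:") features.
    def fit(self, texts):
        self.word_idf = fit_idf([word_tokens(t) for t in texts])
        self.char_idf = fit_idf([char_ngrams(t) for t in texts])
        return self

    def transform(self, text):
        row = tfidf(word_tokens(text), self.word_idf, "w:")
        row.update(tfidf(char_ngrams(text), self.char_idf, "c:"))
        row.update(extra_features(text))
        return row


def kfold_indices(n, k=6):
    # Contiguous K-fold split: yields (train_idx, valid_idx) pairs.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size


def fit_mean_regressor(rows, y):
    # Hypothetical placeholder for LGBMRegressor: predicts the fold's mean score.
    mean = sum(y) / len(y)
    return lambda row: mean


def cv_predict(train_texts, y, test_texts, k=6, fit=fit_mean_regressor):
    # Train one model per fold and average the k models' test predictions.
    test_pred = [0.0] * len(test_texts)
    for train_idx, _valid_idx in kfold_indices(len(train_texts), k):
        feat = Featurizer().fit([train_texts[i] for i in train_idx])
        model = fit([feat.transform(train_texts[i]) for i in train_idx],
                    [y[i] for i in train_idx])
        for j, text in enumerate(test_texts):
            test_pred[j] += model(feat.transform(text)) / k
    return test_pred
```

The vocabularies are refit inside each fold so held-out rows never influence the IDF weights; the validation indices are left unused here, but they are where out-of-fold scoring would go. To use a real learner you would pass a different `fit` function, for example one that converts the sparse dicts to a matrix and trains `lightgbm.LGBMRegressor`.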

Link to code

Team Members

Utsav Aggarwal, Arjun Rana
Written on February 23, 2019