Comment Scorer
Problem Statement
Ponder is a platform for sharing your thoughts or promoting personal projects. It is also a good place to keep up with what is happening around the world. However, it has a very strict posting policy, and earning credit for good comments is really hard. As a new user who has just started on Ponder, you want to predict whether your comment is going to receive a good score.
Data Description
The training set has 45,000 comment replies and the test set has 30,000 rows of comments. The task is to build a model that predicts the scores of the comments in the test set.
Column Label | Column Description |
---|---|
UID | Unique Id |
Comment | Reply to a parent comment |
Date | Comment date |
parent_comment | The parent comment to which sarcastic comments are made |
score | Score on the parent_comment |
Pipeline Workflow
- Use TF-IDF vectorization at both the word and character level to convert the comments into vectors.
- Generate new features such as the comment length, sentiment value, and profanity value of each comment.
- Concatenate both feature sets and train a LightGBM regressor over 6 folds to get the final prediction.
- Tune the LightGBM regressor using Bayesian optimization.
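
The steps above can be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the n-gram ranges, the tiny word lists used for the sentiment and profanity values, the averaging of per-fold predictions, and the synthetic toy data are all assumptions made for the example. It uses `LGBMRegressor` from LightGBM when that package is installed and otherwise falls back to scikit-learn's `GradientBoostingRegressor`, which is a different model swapped in only so the sketch still runs. Bayesian tuning is left out.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import KFold

try:
    from lightgbm import LGBMRegressor as Regressor
except ImportError:
    # Fallback (not the model named in the write-up) so the sketch still runs.
    from sklearn.ensemble import GradientBoostingRegressor as Regressor

# Placeholder word lists (assumptions); a real run would use proper lexicons.
PROFANE = {"damn", "hell"}
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def handcrafted_features(comments):
    """Comment length, a crude sentiment value, and a profanity value."""
    rows = []
    for text in comments:
        words = text.lower().split()
        n = max(len(words), 1)
        sentiment = (sum(w in POSITIVE for w in words)
                     - sum(w in NEGATIVE for w in words)) / n
        profanity = sum(w in PROFANE for w in words) / n
        rows.append([len(text), sentiment, profanity])
    return csr_matrix(np.array(rows, dtype=float))


def build_features(train_comments, test_comments):
    """Fit word- and char-level TF-IDF on train, then append handcrafted columns."""
    word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
    char_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    train_parts = [word_tfidf.fit_transform(train_comments),
                   char_tfidf.fit_transform(train_comments),
                   handcrafted_features(train_comments)]
    test_parts = [word_tfidf.transform(test_comments),
                  char_tfidf.transform(test_comments),
                  handcrafted_features(test_comments)]
    return hstack(train_parts).tocsr(), hstack(test_parts).tocsr()


def predict_with_folds(X_train, y_train, X_test, n_splits=6, seed=0):
    """Train one regressor per fold and average the test predictions."""
    y_train = np.asarray(y_train, dtype=float)
    preds = np.zeros(X_test.shape[0])
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, _ in folds.split(X_train):
        model = Regressor(n_estimators=50, random_state=seed)
        model.fit(X_train[fit_idx], y_train[fit_idx])
        preds += model.predict(X_test) / n_splits
    return preds


# Synthetic toy data standing in for the real train and test files.
rng = np.random.default_rng(0)
vocab = ["good", "great", "love", "bad", "awful", "hate",
         "damn", "post", "idea", "really"]


def toy_comment():
    return " ".join(rng.choice(vocab, size=rng.integers(3, 8)))


train_comments = [toy_comment() for _ in range(60)]
train_scores = [sum(w in POSITIVE for w in c.split()) for c in train_comments]
test_comments = [toy_comment() for _ in range(5)]

X_train, X_test = build_features(train_comments, test_comments)
predictions = predict_with_folds(X_train, train_scores, X_test)
print(predictions)
```

The vectorizers are fitted on the training comments only and then reused on the test comments, so both matrices share the same columns before the handcrafted features are appended.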
Team Members
Utsav Aggarwal, Arjun Rana
Written on February 23, 2019