
KuaiComt

KuaiComt is a comprehensive short-video recommendation dataset that includes abundant comment text and interaction data. It contains real user behavior logs collected from Kuaishou, a leading short-video mobile app in China with over 400 million daily active users. On average, users spend over 120 minutes on the app each day, with more than 7 minutes (over 5%) spent in the video comments section. The comments section has a unique-visitor (UV) penetration rate of over 60%.

This is the first recommendation dataset that not only records item text and interaction data but also includes abundant comment text and interaction data!

Overview

The following figure provides an example of the dataset. When users enter the app, they can scroll up and down to browse different videos. Additionally, users can click the comment button on the right side of the video to enter the comments section, where they can scroll through comments and engage in interactive behaviors such as likes and replies.

[Figure: kuaidata, an example of browsing videos and comments in the app]

Other related datasets are KuaiRec, KuaiRand, and KuaiSAR.

Advantages:

Compared with other existing datasets, KuaiComt has the following advantages:

Statistics

Here we show some basic statistics. See this page for more detailed descriptions.

KuaiComt contains the real behavior of 34,701 users on the Kuaishou app from September 30, 2023, to November 3, 2023. Due to the large number of comment impressions to users, we only provide data on user interactions with comments (likes and replies). Videos with fewer than 55 comments and comments with fewer than 2 interactions were filtered out. Additionally, video titles and comment texts were anonymized.
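The two filtering thresholds above can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the record layout, the key names (`video_id`, `comment_count`, `comment_id`, `interaction_count`), and the choice to also drop comments that belong to removed videos are all assumptions made for illustration. Only the two threshold values come from the text.

```python
# Thresholds stated in the dataset description.
MIN_COMMENTS_PER_VIDEO = 55
MIN_INTERACTIONS_PER_COMMENT = 2


def filter_dataset(videos, comments):
    """Keep videos and comments that meet the stated thresholds.

    videos:   list of dicts with hypothetical keys "video_id", "comment_count"
    comments: list of dicts with hypothetical keys "comment_id", "video_id",
              "interaction_count"
    """
    # Drop videos with fewer than 55 comments.
    kept_videos = [
        v for v in videos if v["comment_count"] >= MIN_COMMENTS_PER_VIDEO
    ]
    kept_ids = {v["video_id"] for v in kept_videos}
    # Drop comments with fewer than 2 interactions, and (assumption)
    # comments whose video was removed in the previous step.
    kept_comments = [
        c for c in comments
        if c["video_id"] in kept_ids
        and c["interaction_count"] >= MIN_INTERACTIONS_PER_COMMENT
    ]
    return kept_videos, kept_comments


# Toy records, invented purely to exercise the function.
videos = [
    {"video_id": 1, "comment_count": 120},
    {"video_id": 2, "comment_count": 54},
]
comments = [
    {"comment_id": 10, "video_id": 1, "interaction_count": 3},
    {"comment_id": 11, "video_id": 1, "interaction_count": 1},
    {"comment_id": 12, "video_id": 2, "interaction_count": 9},
]

kept_videos, kept_comments = filter_dataset(videos, comments)
print([v["video_id"] for v in kept_videos])      # [1]
print([c["comment_id"] for c in kept_comments])  # [10]
```

In the toy data, video 2 falls one comment short of the threshold, so its comment is dropped as well, and comment 11 is dropped for having a single interaction.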

The basic statistics of this dataset are summarized as follows:

KuaiComt

| Dataset | #Users | #Videos | #Comments | #Impressions-V | #OpenComments-V | #Interactions-C |
|---|---|---|---|---|---|---|
| KuaiComt | 34,701 | 82,452 | 16,352,904 | 119,696,682 | 16,033,443 | 1,002,672 |

where ‘Impressions-V’ denotes the impressions of videos to users, ‘OpenComments-V’ denotes the behavior of users opening the comments section, and ‘Interactions-C’ denotes user interactions with comments (such as likes or replies).
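To get a feel for the scale of these counts, a few ratios can be derived directly from the table. The sketch below only does arithmetic on the published numbers; the dictionary keys and the names of the derived ratios are our own labels, not fields of the dataset.

```python
# Counts copied from the statistics table above.
stats = {
    "users": 34_701,
    "videos": 82_452,
    "comments": 16_352_904,
    "impressions_v": 119_696_682,
    "open_comments_v": 16_033_443,
    "interactions_c": 1_002_672,
}


def derived_ratios(s):
    """Return a few ratios computed from the raw counts."""
    return {
        # Share of video impressions where the comments section was opened.
        "open_rate": s["open_comments_v"] / s["impressions_v"],
        # Average number of comments per video.
        "comments_per_video": s["comments"] / s["videos"],
        # Average number of video impressions per user.
        "impressions_per_user": s["impressions_v"] / s["users"],
        # Average number of comment likes/replies per user.
        "comment_interactions_per_user": s["interactions_c"] / s["users"],
    }


ratios = derived_ratios(stats)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under these numbers, roughly 13% of video impressions lead to the comments section being opened, and each video has about 198 comments on average.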

Short descriptions of each feature field are listed below. Please refer to this page for more details and examples.

| Feature | Description |
|---|---|
| User feature | Users have abundant side information, e.g., user active degree, follow count. |
| Video feature | Videos have abundant side information, e.g., caption, duration. |
| Comment feature | Comments have abundant side information, e.g., comment content, comment like count. |
| V-inter feature | Video interactions have 12 features, e.g., comment stay time, play time, likes, and follows. |
| C-inter feature | Comment interactions have 2 features, covering two types of user feedback: likes and replies. |

Download the data:

KuaiComt has been shared at https://zenodo.org/records/13922581.


According to our company’s data sharing policy, our datasets are made available through confidentiality agreements.

In compliance with the recently enacted Personal Information Protection Law and Data Security Assessment Measures for Cross-Border Data Transfer in China, we currently provide datasets exclusively to Chinese entities (universities, research institutes, and companies).

You are required to send us your name and institutional details, after which we will provide you with the relevant confidentiality agreement. The dataset will be shared with you only after the agreement has been signed.

OPTION 1: Download via your browser:

You can download the dataset from this link.

OPTION 2: Download with the ‘wget’ command-line tool:

For the KuaiComt dataset:

wget https://zenodo.org/record/13922581/files/KuaiComt.zip

unzip KuaiComt.zip
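Where `wget` and `unzip` are not available, the same two steps can be sketched with the Python standard library. This is only an illustrative equivalent: the helper names `download` and `extract` are our own, the URL is the one from the `wget` command above, and the request may fail if access to the record has not yet been granted under the agreement described earlier.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the wget command above.
KUAICOMT_URL = "https://zenodo.org/record/13922581/files/KuaiComt.zip"


def download(url, target):
    """Download `url` to the local path `target` and return that path."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive, out_dir):
    """Unpack a zip archive into `out_dir`; return the sorted member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(Path(out_dir))
        return sorted(zf.namelist())
```

A typical call would be `extract(download(KUAICOMT_URL, "KuaiComt.zip"), ".")`, which mirrors the `wget` and `unzip` commands. The archive's internal layout is not documented here, so inspect the returned member names before relying on specific file paths.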

CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


Contact

If you have any questions, please feel free to contact us through GitHub issues or by email (zhangchangshuo@kuaishou.com).