Pandas: show the distribution of a data split

Method 1: `Series.plot(kind='bar')`

This is the code:

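```python
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# df holds the image ids and their ISUP grades
X_train, X_val, y_train, y_val = train_test_split(df.image_id, df.isup_grade, test_size=0.2, random_state=2020)

# sort_index() lists the grades in the same order in all three Series, so the overlaid bars line up
df.isup_grade.value_counts().sort_index().plot(kind='bar', color='g', label='total')
y_train.value_counts().sort_index().plot(kind='bar', color='b', label='train')
y_val.value_counts().sort_index().plot(kind='bar', color='r', label='val')
plt.legend()
plt.title('train_test_split with val size 0.2')
plt.show()
```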

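This is the result: all three Series are drawn on the same axes, so for each grade the red val bar is drawn over the blue train bar, which is drawn over the green total bar.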
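Each `value_counts()` call sorts the grades by frequency, while a pandas bar plot places bars by position (0, 1, 2, ...) and takes the tick labels from the index of the Series being plotted. If the train or val split ranks the grades differently from the full set, the overlaid bars at one position belong to different grades. The toy Series below are hypothetical labels, not the original data; they show the two orderings diverging and `sort_index()` bringing them back in line:

```python
import pandas as pd

# hypothetical labels: grade 0 is the most common in the full set...
total = pd.Series([0] * 5 + [1] * 3 + [2] * 2)
# ...but grade 1 is the most common in this hypothetical val split
val = pd.Series([1, 1, 1, 0, 0, 2])

# value_counts() orders by count, so the grade order differs between the two
print(total.value_counts().index.tolist())  # [0, 1, 2]
print(val.value_counts().index.tolist())    # [1, 0, 2]

# sort_index() orders by grade, so both Series line up position by position
print(total.value_counts().sort_index().index.tolist())  # [0, 1, 2]
print(val.value_counts().sort_index().index.tolist())    # [0, 1, 2]
```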

Method 2: `.plot.bar()`

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.image_id, df.isup_grade, test_size=0.2, random_state=2020)

# one column of counts per split; sort_index() orders the rows by grade
new_df = pd.DataFrame({
    'total': df.isup_grade.value_counts().sort_index(),
    'train': y_train.value_counts().sort_index(),
    'val': y_val.value_counts().sort_index(),
})

new_df.plot.bar(figsize=(8, 8))  # grouped bars, one group per grade; figsize controls the figure size
plt.xticks(rotation=0)  # keep the grade labels horizontal
plt.title('train_test_split with val size 0.2')
plt.show()
```

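The effect: plotting the DataFrame as bars draws one bar per column, so each grade gets a group of three side-by-side bars (total, train, val) and the legend is built from the column names.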
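Because the val split holds only about 20% of the rows, its bars are much shorter than the train bars, which makes it hard to compare the shape of the two distributions by eye. A minimal sketch, assuming per-split proportions are an acceptable substitute for raw counts: divide each column of the count table by its own total. The Series `total`, `train` and `val` below are hypothetical toy labels standing in for `df.isup_grade`, `y_train` and `y_val`:

```python
import pandas as pd

# hypothetical labels standing in for df.isup_grade, y_train and y_val
total = pd.Series([0, 0, 0, 0, 1, 1, 1, 2, 2, 3])
train = pd.Series([0, 0, 0, 1, 1, 2, 2, 3])
val = pd.Series([0, 1])

# same layout as new_df: one column of counts per split, rows ordered by grade
counts = pd.DataFrame({
    'total': total.value_counts().sort_index(),
    'train': train.value_counts().sort_index(),
    'val': val.value_counts().sort_index(),
}).fillna(0).astype(int)  # a grade absent from a split gets a count of 0

# divide each column by its own sum, so every column adds up to 1
shares = counts / counts.sum()
print(shares.round(2))
```

Calling `shares.plot.bar()` on the result would draw the same grouped bars, but with every split on a common 0 to 1 scale.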
