R Data Processing in Practice: Using the ratings Data from the languageR Package
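Using the ratings data from the languageR package as an example, this article demonstrates basic data-processing operations in R: selecting columns and saving them, computing grouped statistics, and plotting.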
1. Selecting and saving data
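# Load the package and its ratings data set
library(languageR)
data(ratings)
# Keep only the four columns of interest
new_ratings <- ratings[, c('Word', 'Frequency', 'Complex', 'Class')]
# Save without row names; replace 'path/to' with a real directory
write.csv(new_ratings, file = 'path/to/new_ratings.csv', row.names = FALSE)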
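One way to confirm that the export worked is to read the file back and compare it with what was written. The sketch below is only an illustration: it uses a small hypothetical data frame (toy) with the same four column names and invented values, and it writes to a temporary file instead of a real path, so it runs without languageR installed.

```r
# Toy stand-in for new_ratings; the values are invented for illustration.
toy <- data.frame(
  Word      = c("almond", "ant", "apple"),
  Frequency = c(4.2, 5.3, 7.1),
  Complex   = c("simplex", "simplex", "simplex"),
  Class     = c("plant", "animal", "plant"),
  stringsAsFactors = FALSE
)

# Write to a temporary file rather than a real path.
tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp, row.names = FALSE)

# Read the file back; with row.names = FALSE no extra index column is stored.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)

same_names  <- identical(names(toy_back), names(toy))
same_values <- isTRUE(all.equal(toy_back, toy))
```

If row.names = FALSE were omitted, the file would gain an unnamed first column holding the row names, and the round-trip comparison would no longer match.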
2. Grouped statistics
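library(dplyr)
library(tidyr)
# Mean and SD of Frequency for each Class x Complex combination
ratings_mean_sd <- ratings %>%
  group_by(Class, Complex) %>%
  summarize(mean_freq = mean(Frequency), sd_freq = sd(Frequency), .groups = 'drop') %>%
  # One row per Class, one column per statistic and Complex level
  pivot_wider(names_from = Complex, values_from = c(mean_freq, sd_freq)) %>%
  # round() fails on a data frame with non-numeric columns, so round only the numeric ones
  mutate(across(where(is.numeric), ~ round(.x, 2)))
ratings_mean_sd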
Output: in the ratings data shipped with languageR, Class has the levels animal and plant, and Complex has the levels complex and simplex. The result is therefore a tibble with one row per class (2 rows) and five columns: Class, mean_freq_complex, mean_freq_simplex, sd_freq_complex, and sd_freq_simplex.
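As a cross-check on the grouped summary, the same Class-by-Complex table of means and standard deviations can be sketched in base R with tapply(). The data frame toy below is hypothetical: it borrows the column names and factor levels of ratings, but the Frequency values are invented so the arithmetic is easy to follow.

```r
# Hypothetical data with the same column names as ratings; values are invented.
toy <- data.frame(
  Class     = c("animal", "animal", "animal", "animal",
                "plant", "plant", "plant", "plant"),
  Complex   = c("complex", "complex", "simplex", "simplex",
                "complex", "complex", "simplex", "simplex"),
  Frequency = c(2, 4, 5, 7, 1, 3, 6, 10)
)

# tapply() with two grouping factors returns a Class x Complex matrix,
# which is already the "wide" layout that pivot_wider() produces.
mean_tab <- tapply(toy$Frequency, list(toy$Class, toy$Complex), mean)
sd_tab   <- tapply(toy$Frequency, list(toy$Class, toy$Complex), sd)

# Round both tables to two decimals for display.
round(mean_tab, 2)
round(sd_tab, 2)
```

Because the result is a purely numeric matrix, round() can be applied to it directly, which is not the case for a data frame that also contains the Class column.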
3. Boxplot
library(ggplot2)
# One box per Class, split and colored by Complex
ggplot(ratings, aes(x = Class, y = Frequency, fill = Complex)) +
  geom_boxplot() +
  labs(title = 'Boxplot of Frequency by Class and Complexity',
       x = 'Class', y = 'Frequency', fill = 'Complex')
Output: [figure: Boxplot of Frequency by Class and Complexity]
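To see which numbers a box-and-whisker display is built from, base R's boxplot.stats() can be applied to a single group. The vector freq below is hypothetical and chosen so that one value is clearly extreme. Note that geom_boxplot() computes its hinges as the 25th and 75th percentiles, which can differ slightly from the hinges used by base R, so this is a sketch of the idea rather than an exact reproduction of the ggplot2 output.

```r
# Hypothetical frequency values for one group; not taken from ratings.
freq <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# boxplot.stats() returns the numbers that define a box-and-whisker display.
bs <- boxplot.stats(freq)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
five_numbers <- bs$stats

# Points lying more than 1.5 box lengths beyond a hinge (default coef = 1.5).
flagged <- bs$out
```

Here the hinges are 3 and 7, so the upper fence is 7 + 1.5 * 4 = 13; the value 30 lies beyond it and would be drawn as a separate point, while the upper whisker stops at 8.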

4. Scatterplot
# Scatterplot of singular vs plural frequency
ggplot(ratings, aes(x = FreqSingular, y = FreqPlural)) +
  geom_point() +
  labs(title = 'Scatterplot of FreqSingular vs FreqPlural',
       x = 'FreqSingular', y = 'FreqPlural')
# Drop rows whose FreqPlural lies more than 2 SDs from its mean
ratings_no_outliers <- ratings %>%
  filter(abs(FreqPlural - mean(FreqPlural)) <= 2 * sd(FreqPlural))
ggplot(ratings_no_outliers, aes(x = FreqSingular, y = FreqPlural)) +
  geom_point() +
  labs(title = 'Scatterplot of FreqSingular vs FreqPlural (Outliers Removed)',
       x = 'FreqSingular', y = 'FreqPlural')
Output: [figures: scatterplots of singular vs plural frequency, before and after outlier removal]
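The outlier rule used in the filter() call can be traced by hand on a small vector. The values in freq_plural below are hypothetical, with 50 deliberately extreme; the sketch applies the same condition, keeping values within 2 standard deviations of the mean.

```r
# Hypothetical plural-frequency values; 50 is deliberately extreme.
freq_plural <- c(2, 3, 3, 4, 4, 5, 5, 6, 50)

m <- mean(freq_plural)  # about 9.11
s <- sd(freq_plural)    # about 15.38

# Keep values whose distance from the mean is at most 2 SDs.
keep    <- abs(freq_plural - m) <= 2 * s
kept    <- freq_plural[keep]
dropped <- freq_plural[!keep]
```

The mean and standard deviation are computed with the extreme value included, which inflates both; with very small samples this can prevent any point from exceeding the 2-SD threshold. The rule in this section also looks only at FreqPlural, so a word with an unusual singular frequency would not be removed.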

Original article: https://www.cveoy.top/t/topic/lzYt. Copyright belongs to the author; do not repost or scrape.