def words_count(df, root_cnt=1):
    Bullets = ' '.join(df['Bullets'])
    with open('Bullets_Save.txt', 'w') as f:
        f.write(Bullets)
    Title = ' '.join(df['Title'])
    with open('Title_Save.txt', 'w') as f:
        f.write(Title)

    dic_title = {}
    dic_bullet = {}
    lst_title =

This function takes a DataFrame 'df' and an optional integer 'root_cnt',
which defaults to 1.
It first joins all the bullet points (column 'Bullets') and all the titles
(column 'Title') in the DataFrame into two strings and saves each string
to its own text file.
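
The join-and-save step might look roughly like the sketch below. It uses a plain list as a stand-in for df['Bullets'] (a pandas Series of strings joins the same way), the sample strings are invented, and the file name 'Bullets_Save.txt' is an assumption read from the function header. The file is written to a temporary directory so the example leaves nothing behind.

```python
import os
import tempfile

# Stand-in for df['Bullets']; the sample values are invented for illustration.
bullets_column = ["Fast charging", "Long battery life", "Fast shipping"]

# Join every entry into one space-separated string.
bullets_text = " ".join(bullets_column)

# Write the string to a file inside a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "Bullets_Save.txt")  # assumed file name
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(bullets_text)
    with open(out_path, encoding="utf-8") as f:
        saved_text = f.read()
```

The titles would be handled the same way with their own column and output file.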
It then initializes two empty dictionaries for storing the frequency of
each word in the titles and bullet points.
The titles and bullet points are split into lists of individual words and
all punctuation is removed from each word.
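
A minimal sketch of the split-and-clean step, assuming that "punctuation" means the ASCII characters in Python's string.punctuation and that tokens left empty after cleaning are dropped. The helper name clean_words is hypothetical, and the original function may handle these details differently.

```python
import string


def clean_words(text):
    """Split text on whitespace and strip punctuation from each word."""
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    words = (word.translate(table) for word in text.split())
    # Assumption: tokens that consisted only of punctuation are discarded.
    return [word for word in words if word]
```

For example, clean_words("Hello - world") would return a list holding only "Hello" and "world", because the lone hyphen becomes an empty token and is dropped.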
The 'root_cnt' variable determines how many words are used as the "root"
for each frequency count. If 'root_cnt' is 1, then each individual word
is used as a "root". If 'root_cnt' is greater than 1, then groups of
'root_cnt' words are used as "roots".
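
One way to form such "roots" is a sliding window over the word list, so that consecutive groups overlap. That choice is an assumption: the description does not say whether the groups overlap, and the helper name make_roots is hypothetical.

```python
def make_roots(words, root_cnt=1):
    """Return every run of root_cnt consecutive words, joined by a space."""
    if root_cnt < 1:
        raise ValueError("root_cnt must be at least 1")
    # Assumption: a sliding window that moves one word at a time.
    return [
        " ".join(words[i:i + root_cnt])
        for i in range(len(words) - root_cnt + 1)
    ]
```

With root_cnt=1 the list comes back unchanged. With root_cnt=2 and the words "long", "battery", "life", the roots would be "long battery" and "battery life". A list shorter than root_cnt yields no roots.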
The function loops through each "root" in the titles and bullet points
and updates the frequency count in the respective dictionary.
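
The counting loop could be sketched as follows, using a plain dictionary as the description suggests. The helper name count_roots is hypothetical.

```python
def count_roots(roots):
    """Count how many times each root occurs."""
    counts = {}
    for root in roots:
        # Start at 0 the first time a root is seen, then add one.
        counts[root] = counts.get(root, 0) + 1
    return counts
```

The standard library's collections.Counter produces the same tallies in one call, though the description here points to ordinary dictionaries.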
The function then creates two new DataFrames from the dictionaries,
one for the titles and one for the bullet points, with columns for the
"root" word, its frequency, and its frequency as a percentage of the
total frequency.
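
The percentage column can be illustrated on plain rows before any DataFrame is built. In this sketch the helper name frequency_rows is hypothetical, and sorting by descending frequency is an assumption made for readability, not something the description states.

```python
def frequency_rows(counts):
    """Turn a {root: frequency} dict into (root, frequency, percentage) rows."""
    total = sum(counts.values())
    rows = []
    # Assumption: most frequent roots are listed first.
    for root, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percentage = 100 * freq / total if total else 0.0
        rows.append((root, freq, percentage))
    return rows
```

Passing these rows to the pandas DataFrame constructor with three column names (for instance "root", "frequency" and "percentage", which are placeholder names) would give a table of the kind described above, one for titles and one for bullet points.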
The two DataFrames are returned as the output of the function.
Original source: https://www.cveoy.top/t/topic/rZY. Copyright belongs to the author. Do not repost or scrape.