This function takes a DataFrame 'df' and an optional integer 'root_cnt', which defaults to 1.

It first joins all the bullet points and all the titles in the DataFrame into two strings and saves each string to its own text file.
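The join-and-save step might look like the sketch below. It is only an illustration: the helper name save_joined is hypothetical, a plain list of strings stands in for a DataFrame column such as df['Title'], and the file is written to a temporary directory rather than the working directory.

```python
import os
import tempfile


def save_joined(texts, path):
    """Join the strings in `texts` with single spaces and write them to `path`."""
    joined = " ".join(texts)
    with open(path, "w", encoding="utf-8") as f:
        f.write(joined)
    return joined


# Assumption: a plain list stands in for a column of strings such as df["Title"].
titles = ["Red Widget", "Blue Widget, Large"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "Title_Save.txt")
    title_text = save_joined(titles, out_path)
    # Read the file back to confirm what was written.
    with open(out_path, encoding="utf-8") as f:
        saved_text = f.read()
```

Iterating a pandas Series of strings yields its values, so the same " ".join call should also work when a real column is passed in, provided it contains no missing values.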

It then initializes two empty dictionaries, one for the titles and one for the bullet points, to store how often each word occurs.

The title and bullet-point strings are split into lists of individual words, and all punctuation is removed from each word.
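One way to split the text and strip punctuation is sketched below. The helper name clean_words is hypothetical, and two choices are assumptions rather than something the description states: only ASCII punctuation from string.punctuation is removed, and words that become empty after stripping are dropped.

```python
import string


def clean_words(text):
    """Split on whitespace, remove ASCII punctuation from each word, drop empties."""
    # Translation table that deletes every character in string.punctuation.
    table = str.maketrans("", "", string.punctuation)
    words = (word.translate(table) for word in text.split())
    # Assumption: a token made only of punctuation (such as "-") is discarded.
    return [word for word in words if word]


lst_title = clean_words("Red Widget, Blue Widget - Large!")
```

Here lst_title ends up as ['Red', 'Widget', 'Blue', 'Widget', 'Large']. Case is left unchanged, so "Widget" and "widget" would be counted separately unless the text is lowercased first.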

The 'root_cnt' parameter determines how many words make up each "root" that is counted. If 'root_cnt' is 1, each individual word is a "root". If 'root_cnt' is greater than 1, each "root" is a group of 'root_cnt' words.
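The description does not say whether the groups overlap. The sketch below assumes a sliding window over consecutive words (so each word can start a new group), which is the usual n-gram construction; the helper name make_roots is hypothetical.

```python
def make_roots(words, root_cnt=1):
    """Return every run of `root_cnt` consecutive words, joined by a space."""
    if root_cnt < 1:
        raise ValueError("root_cnt must be at least 1")
    # Assumption: windows overlap, moving forward one word at a time.
    return [
        " ".join(words[i:i + root_cnt])
        for i in range(len(words) - root_cnt + 1)
    ]


words = ["red", "widget", "blue", "widget"]
single_roots = make_roots(words)            # root_cnt defaults to 1
pair_roots = make_roots(words, root_cnt=2)
```

With these inputs single_roots is the word list itself and pair_roots is ['red widget', 'widget blue', 'blue widget']. If the list has fewer than 'root_cnt' words, the result is an empty list.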

The function loops through each "root" in the titles and bullet points and increments its count in the corresponding dictionary.
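The counting loop can be sketched with a plain dictionary, as below. The helper name count_roots is hypothetical; the dictionary name dic_title follows the function's own naming.

```python
def count_roots(roots):
    """Count how many times each root occurs, using a plain dict."""
    counts = {}
    for root in roots:
        # dict.get supplies 0 the first time a root is seen.
        counts[root] = counts.get(root, 0) + 1
    return counts


dic_title = count_roots(["red", "widget", "blue", "widget"])
```

This gives {'red': 1, 'widget': 2, 'blue': 1}. collections.Counter from the standard library produces the same counts in one call, if a Counter object is acceptable in place of a plain dictionary.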

The function then creates two new DataFrames from the dictionaries, one for the titles and one for the bullet points. Each has columns for the "root", its frequency, and its frequency as a percentage of the total frequency.
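Turning one of the dictionaries into a DataFrame might look like the sketch below. The column names "root", "frequency" and "percent", the helper name counts_to_frame, and the sort by descending frequency are all assumptions made for illustration; the description names the three columns only informally.

```python
import pandas as pd


def counts_to_frame(counts):
    """Build a DataFrame with one row per root: count and share of the total."""
    frame = pd.DataFrame(list(counts.items()), columns=["root", "frequency"])
    total = frame["frequency"].sum()
    # Percentage of the total frequency contributed by each root.
    frame["percent"] = frame["frequency"] / total * 100
    # Assumption: most frequent roots first; the original order is unknown.
    return frame.sort_values("frequency", ascending=False).reset_index(drop=True)


df_title = counts_to_frame({"red": 1, "widget": 2, "blue": 1})
```

In this example "widget" lands in the first row with a frequency of 2 and a percentage of 50.0. The same helper would be called a second time on the bullet-point dictionary to produce the second DataFrame.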

The two DataFrames are returned as the output of the function.

def words_count(df, root_cnt=1):
    Bullets = ' '.join(df['Bullets'])
    with open('Bullets_Save.txt', 'w') as f:
        f.write(Bullets)
    Title = ' '.join(df['Title'])
    with open('Title_Save.txt', 'w') as f:
        f.write(Title)

    dic_title = {}
    dic_bullet = {}
    lst_title =

Original source: https://www.cveoy.top/t/topic/rZY. Copyright belongs to the author. Do not reproduce or scrape.
