Here is the reproducible data I am working with:
covid <- structure(list(Refid = c(32740925L, 32891569L, 2007846266L, 2007846378L,
2007856056L, 2007858108L, 2007863577L, 2007872004L, 2007872013L,
2007915036L, 2007915277L, 2007916087L, 2007916147L, 2007916184L,
2007916258L, 2007916285L, 2007916333L, 2007916710L, 2007917006L,
2007918143L, 2007920589L, 2007921553L, 2007967876L, 2007967891L,
2007967904L, 2007968097L, 2007968362L, 2007968557L, 2007968993L,
2008010059L, 2008010956L, 2008010970L, 2008011456L, 2008011614L,
2008011632L), Title = c("Telemedicine in Otolaryngology in the COVID-19 Era: Initial Lessons Learned.",
"Paracervical blocks facilitate timely brachytherapy amidst COVID-19.",
"The Perils of Covid-19 for Otorhinolaryngologists: An Overview.",
"Air care: an 'aerography' of breath, buildings and bugs in the cystic fibrosis clinic.",
"Breath analysis for detection of viral infection, the current position of the field.",
"An epidemiological study to assess the prevalence of diabetic peripheral neuropathic pain among adults with diabetes attending private and institutional outpatient clinics in South Africa.",
"Concerns and strategies for wastewater treatment during COVID-19 pandemic to stop plausible transmission",
"A Chemoenzymatic Synthesis of the (RP)-Isomer of the Antiviral Prodrug Remdesivir",
"Clinical characteristics and outcome of hemodialysis patients with COVID-19: a large cohort study in a single Chinese center",
"Impact of COVID-19 pandemic on waste management.", "Long-Lasting, Patient-Controlled, Procedure-Free Contraception: A Review of Annovera with a Pharmacist Perspective.",
"Bacillus Calmette-Guerin (BCG) vaccine generates immunoregulatory cells in the cervical lymph nodes in guinea pigs injected intra dermally.",
"Comparison analysis of different swabs and transport mediums suitable for SARS-CoV-2 testing following shortages.",
"Coronaviruses widespread on nonliving surfaces: important questions and promising answers.",
"A Surface Coating that Rapidly Inactivates SARS-CoV-2.", "Plexiglas barrier box to improve ERCP safety during the COVID-19 pandemic.",
"COVID-19 Pandemic Repercussions on the Use and Management of Plastics.",
"Simple, Low-Cost and Long-Lasting Film for Virus Inactivation Using Avian Coronavirus Model as Challenge.",
"Cytokine storm intervention in the early stages of COVID-19 pneumonia.",
"In vitro measurement of the permeability of endovascular coils deployed in cerebral aneurysms.",
"A new system of microwave ablation at 2450 MHz: preliminary experience.",
"Endovascular treatment of 404 intracranial aneurysms treated with nexus detachable coils: short-term and mid-term results from a prospective, consecutive, European multicenter study.",
"Environmentally friendly non-medical mask: An attempt to reduce the environmental impact from used masks during COVID 19 pandemic",
"Plastic residues produced with confirmatory testing for COVID-19: Classification, quantification, fate, and impacts on human health",
"What we need to know about PPE associated with the COVID-19 pandemic in the marine environment",
"Peroral endoscopy during the COVID-19 pandemic: Efficacy of the acrylic box (Endo-Splash Protective (ESP) box) for preventing droplet transmission",
"Disinfection of gloved hands during the COVID-19 pandemic",
"Cysteine focused covalent inhibitors against the main protease of SARS-CoV-2",
"Just the Facts: Recommendations on point-of-care ultrasound use and machine infection control during the coronavirus disease 2019 pandemic",
"A comprehensive risk assessment of toxic elements in international brands of face foundation powders",
"Pediatric E.N.T. emergencies during COVID-19 pandemic: our experience.",
"Collective aeromedical transport of COVID-19 critically ill patients in Europe: A retrospective study.",
"Severity of COVID-19 at elevated exposure to perfluorinated alkylates.",
"Assessment of water and sanitation systems at Palestinian healthcare facilities: pre- and post-COVID-19.",
"Drinking water pollutants may affect the immune system: concerns regarding COVID-19 health effects."
)), class = "data.frame", row.names = c(NA, -35L))
I tried the solution from https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html, but the first rows of the result it gave me were:
library(udpipe)   # txt_freq() is provided by udpipe
library(lattice)
stats <- txt_freq(covid$Title)
structure(list(key = structure(35:30, .Label = c("Drinking water pollutants may affect the immune system: concerns regarding COVID-19 health effects.",
"Assessment of water and sanitation systems at Palestinian healthcare facilities: pre- and post-COVID-19.",
"Severity of COVID-19 at elevated exposure to perfluorinated alkylates.",
"Collective aeromedical transport of COVID-19 critically ill patients in Europe: A retrospective study.",
"Pediatric E.N.T. emergencies during COVID-19 pandemic: our experience.",
"A comprehensive risk assessment of toxic elements in international brands of face foundation powders",
"Just the Facts: Recommendations on point-of-care ultrasound use and machine infection control during the coronavirus disease 2019 pandemic",
"Cysteine focused covalent inhibitors against the main protease of SARS-CoV-2",
"Disinfection of gloved hands during the COVID-19 pandemic",
"Peroral endoscopy during the COVID-19 pandemic: Efficacy of the acrylic box (Endo-Splash Protective (ESP) box) for preventing droplet transmission",
"What we need to know about PPE associated with the COVID-19 pandemic in the marine environment",
"Plastic residues produced with confirmatory testing for COVID-19: Classification, quantification, fate, and impacts on human health",
"Environmentally friendly non-medical mask: An attempt to reduce the environmental impact from used masks during COVID 19 pandemic",
"Endovascular treatment of 404 intracranial aneurysms treated with nexus detachable coils: short-term and mid-term results from a prospective, consecutive, European multicenter study.",
"A new system of microwave ablation at 2450 MHz: preliminary experience.",
"In vitro measurement of the permeability of endovascular coils deployed in cerebral aneurysms.",
"Cytokine storm intervention in the early stages of COVID-19 pneumonia.",
"Simple, Low-Cost and Long-Lasting Film for Virus Inactivation Using Avian Coronavirus Model as Challenge.",
"COVID-19 Pandemic Repercussions on the Use and Management of Plastics.",
"Plexiglas barrier box to improve ERCP safety during the COVID-19 pandemic.",
"A Surface Coating that Rapidly Inactivates SARS-CoV-2.", "Coronaviruses widespread on nonliving surfaces: important questions and promising answers.",
"Comparison analysis of different swabs and transport mediums suitable for SARS-CoV-2 testing following shortages.",
"Bacillus Calmette-Guerin (BCG) vaccine generates immunoregulatory cells in the cervical lymph nodes in guinea pigs injected intra dermally.",
"Long-Lasting, Patient-Controlled, Procedure-Free Contraception: A Review of Annovera with a Pharmacist Perspective.",
"Impact of COVID-19 pandemic on waste management.", "Clinical characteristics and outcome of hemodialysis patients with COVID-19: a large cohort study in a single Chinese center",
"A Chemoenzymatic Synthesis of the (RP)-Isomer of the Antiviral Prodrug Remdesivir",
"Concerns and strategies for wastewater treatment during COVID-19 pandemic to stop plausible transmission",
"An epidemiological study to assess the prevalence of diabetic peripheral neuropathic pain among adults with diabetes attending private and institutional outpatient clinics in South Africa.",
"Breath analysis for detection of viral infection, the current position of the field.",
"Air care: an 'aerography' of breath, buildings and bugs in the cystic fibrosis clinic.",
"The Perils of Covid-19 for Otorhinolaryngologists: An Overview.",
"Paracervical blocks facilitate timely brachytherapy amidst COVID-19.",
"Telemedicine in Otolaryngology in the COVID-19 Era: Initial Lessons Learned."
), class = "factor"), freq = c(1L, 1L, 1L, 1L, 1L, 1L), freq_pct = c(2.85714285714286,
2.85714285714286, 2.85714285714286, 2.85714285714286, 2.85714285714286,
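2.85714285714286)), row.names = c(NA, 6L), class = "data.frame")
It does not combine all the titles in the column to find the most common words; it seems to treat each title in a row as a single unique value. I then tried the solution from "R extract most common word(s) / ngrams in a column by group", but I got stuck right at the start with the error "Error in group_by(., group): object 'topic_modelling' not found".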
Can anyone give me some advice?
Posted on 2022-06-27 20:44:21
If you prefer to use the udpipe package, you can do this:
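library(udpipe)
library(data.table)
library(wordcloud)

# one row per title: udpipe() accepts a data.frame with doc_id and text columns
texts <- data.frame(doc_id = covid$Refid, text = covid$Title, stringsAsFactors = FALSE)
# tokenise, tag and lemmatise with the English EWT model (downloaded automatically if needed)
anno <- udpipe(texts, "english-ewt")
anno <- setDT(anno)
# lower-cased lemmas, so inflected and capitalised forms are counted together
anno$term <- tolower(anno$lemma)
# only proper nouns, nouns and verbs
x <- subset(anno, upos %in% c("PROPN", "NOUN", "VERB"))
freq <- txt_freq(x$term)
head(freq)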
#> key freq freq_pct
#> 1 covid 18 7.086614
#> 2 pandemic 9 3.543307
#> 3 study 4 1.574803
#> 4 impact 3 1.181102
#> 5 sars 3 1.181102
#> 6 cov 3 1.181102
# word cloud of the terms that occur at least twice
wordcloud(freq$key, freq$freq, min.freq = 2)
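The core problem in the question is that txt_freq() was given whole titles, so every title was counted once. As a dependency-free point of comparison, here is a minimal base R sketch that splits the titles into words before counting. The helper word_freq(), the three toy titles and the short stopword list are illustrative assumptions, not part of udpipe. Unlike the udpipe tagger, it does no lemmatisation or part-of-speech filtering, and its simple tokeniser keeps hyphenated words such as "covid-19" together, whereas the udpipe output above counts "covid" on its own.

```r
# Hypothetical toy titles standing in for covid$Title
titles <- c("Impact of COVID-19 pandemic on waste management.",
            "Disinfection of gloved hands during the COVID-19 pandemic",
            "Telemedicine in Otolaryngology in the COVID-19 Era: Initial Lessons Learned.")

# Small hand-picked stopword list (an assumption; use a fuller list in practice)
my_stopwords <- c("a", "an", "and", "during", "in", "of", "on", "the")

# Hypothetical helper: count individual words across all titles
word_freq <- function(x, stop = character(0)) {
  # lower-case, then split on any run of characters that is not a letter, digit or hyphen
  tokens <- unlist(strsplit(tolower(x), "[^a-z0-9-]+"))
  # drop empty strings and stopwords
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop)]
  # tabulate and sort, most frequent word first
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(key = names(counts), freq = as.vector(counts),
             stringsAsFactors = FALSE)
}

word_counts <- word_freq(titles, my_stopwords)
head(word_counts)
```

On the real data the call would be word_freq(covid$Title, my_stopwords); a curated stopword list or the part-of-speech filter used above is likely to give cleaner results than this short list.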

# which proper nouns, nouns and verbs are within 2 words away from another proper noun, noun, verb
cooc <- anno[, cooccurrence(term, relevant = upos %in% c("PROPN", "NOUN", "VERB"), skipgram = 2), by = c("doc_id", "sentence_id")]
library(textplot)
library(ggraph)
# aggregate the pair counts over all titles and plot the strongest co-occurrences
cooc <- cooc[, list(cooc = sum(cooc)), by = list(term1, term2)]
textplot_cooccurrence(cooc, top_n = 100, title = "Cooccurrences", subtitle = "Proper nouns, nouns, verbs",
                      vertex_color = "darkgreen")
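The skipgram argument is what turns each title's token stream into word pairs. The sketch below illustrates the idea in base R on two hypothetical, already-filtered token vectors: count_pairs() (a made-up helper) counts each ordered pair whose second word appears at most max_dist positions after the first, and the per-document counts are then summed, mirroring the by = c("doc_id", "sentence_id") step followed by the aggregation above. It is a simplification, not udpipe's implementation; how cooccurrence() maps skipgram onto a word distance and how it handles words excluded by relevant are described in ?cooccurrence and may differ from this.

```r
# Hypothetical helper: count ordered word pairs (term1, term2) in one document,
# where term2 appears 1 to max_dist positions after term1
count_pairs <- function(tokens, max_dist) {
  n <- length(tokens)
  # all position pairs (i, j), kept only when 1 <= j - i <= max_dist
  idx <- expand.grid(i = seq_len(n), j = seq_len(n))
  idx <- idx[idx$j - idx$i >= 1 & idx$j - idx$i <= max_dist, , drop = FALSE]
  pairs <- data.frame(term1 = tokens[idx$i], term2 = tokens[idx$j],
                      cooc = rep(1L, nrow(idx)), stringsAsFactors = FALSE)
  # aggregate() fails on zero rows, so return the empty frame directly
  if (nrow(pairs) == 0) return(pairs)
  # add up identical (term1, term2) combinations
  aggregate(cooc ~ term1 + term2, data = pairs, FUN = sum)
}

# Two hypothetical titles, already reduced to lower-cased nouns and verbs
docs <- list(c("covid", "pandemic", "waste", "management"),
             c("disinfection", "hand", "covid", "pandemic"))

# count within each document, then sum over documents
per_doc <- do.call(rbind, lapply(docs, count_pairs, max_dist = 2))
total <- aggregate(cooc ~ term1 + term2, data = per_doc, FUN = sum)
total <- total[order(-total$cooc), ]
head(total)
```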

https://stackoverflow.com/questions/72751883