Columns misaligned when adding spaCy output to an existing DataFrame
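Problem description
I have a CSV containing a column of article titles, and I am using spaCy to extract any person names that appear in those titles. When I try to add the names spaCy extracted as a new column, they do not line up with the rows they were extracted from.
I believe this is because the spaCy results have their own index, independent of the index of the original data.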
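To illustrate the index behaviour described above, here is a minimal sketch using made-up titles and names (not the asker's data). pandas aligns an assigned Series on its index labels, so a two-element Series lands in rows 0 and 1 regardless of which rows the values came from:

```python
import pandas as pd

# Hypothetical stand-in data: five titles, with names found only in rows 0 and 4.
df = pd.DataFrame({"article_title": ["a", "b", "c", "d", "e"]})
found = pd.Series(["Hannah Ward", "Dylan Mulvaney"])  # default index: 0, 1

# Assignment aligns on index labels, not on where the names were found.
df["artist_names"] = found

print(df)
```

The second name ends up in row 1 and the remaining rows are filled with NaN, which matches the symptom in the question.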
I tried adding `, index=df.index)` to the line that creates the new column, but got "ValueError: Length of passed values is 2, index implies 10."
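The error can be reproduced without spaCy. The sketch below assumes a ten-row frame and a hypothetical list of two extracted names; passing `index=df.index` fails because the Series constructor requires one value per index label. The exact wording of the message varies between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"article_title": [f"title {i}" for i in range(10)]})
people = ["Hannah Ward", "Dylan Mulvaney"]  # hypothetical: only two names found

try:
    # Two values cannot be spread over ten index labels.
    df["artist_names"] = pd.Series(people, index=df.index)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the index argument only helps once there is exactly one entry per row, even if that entry is empty.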
How can I align the spaCy output with the rows it was extracted from?
Here is my code:
    import pandas as pd
    from pandas import DataFrame

    df = (pd.read_csv(r"C:\Users\Admin\Downloads\itsnicethat (5).csv", nrows=10,
                      usecols=['article_title']))

    article = [_ for _ in df['article_title']]

    import spacy
    nlp = spacy.load('en_core_web_lg')
    doc = nlp(str(article))
    ents = list(doc.ents)

    people = []
    for ent in ents:
        if ent.label_ == "PERSON":
            people.append(ent)

    import numpy as np
    df['artist_names'] = pd.Series(people)
    print(df.head())
This is the resulting DataFrame:
                                       article_title      artist_names
    0  "They’re like, is that? Oh it’s!" – ...              (Hannah, Ward)
    1  Billed as London’s biggest public festival of ...  (Dylan, Mulvaney)
    2  Transport yourself back to the dusky skies and...  NaN
    3  Turning to art at the beginning of quarantine ...  NaN
    4  Dylan Mulvaney, head of design at Gretel, expl...  NaN
This is what I expected:
                                       article_title      artist_names
    0  "They’re like, is that? Oh it’s!" – ...              (Hannah, Ward)
    1  Billed as London’s biggest public festival of ...  NaN
    2  Transport yourself back to the dusky skies and...  NaN
    3  Turning to art at the beginning of quarantine ...  NaN
    4  Dylan Mulvaney, head of design at Gretel, expl...  (Dylan, Mulvaney)
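You can see that the second value in the artist_names column actually belongs to the fifth article title. How do I get them to line up?
Thanks for your help.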
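Recommended answer
I would iterate over the articles, detect the entities in each article separately, and collect the detected entities in a list that has one element per article: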
    nlp = spacy.load('en_core_web_lg')
    article = [_ for _ in df['article_title']]

    entities_by_article = []
    for doc in nlp.pipe(article):  # one Doc per title, in row order
        people = []
        for ent in doc.ents:
            if ent.label_ == "PERSON":
                people.append(ent)
        # Append even when empty, so there is one entry per row
        entities_by_article.append(people)

    df['artist_names'] = pd.Series(entities_by_article)
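Processing each title separately matters because of what the question's `nlp(str(article))` call actually receives. A small sketch with made-up titles (assumed here purely for illustration) shows that `str()` on a list produces a single string, so the resulting Doc carries no information about which row a name came from:

```python
# Hypothetical titles standing in for df['article_title'].
article = ["Hannah Ward on colour", "A festival in London", "Dylan Mulvaney explains"]

# str() on a list returns the list's printed form as one string.
combined = str(article)

print(type(combined).__name__)
print(combined)
```

Every entity found in that one string is simply appended to a flat list, which is why only the count of names survives and not their row positions.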
Note: `for doc in nlp.pipe(article)` is spaCy's more efficient way of looping over a list of texts. It can be replaced with:
    for a in article:
        doc = nlp(a)
        ## rest of code within loop
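If the DataFrame does not have the default 0..n-1 index (for example after filtering rows), it may be safer to attach the per-row results with `index=df.index`. The sketch below is pandas-only: `find_people` and `KNOWN_PEOPLE` are hypothetical stand-ins for the spaCy extraction step, and the titles and index labels are made up.

```python
import pandas as pd

KNOWN_PEOPLE = ["Hannah Ward", "Dylan Mulvaney"]  # hypothetical lookup list


def find_people(title):
    """Hypothetical stand-in for the spaCy PERSON extraction step."""
    return [name for name in KNOWN_PEOPLE if name in title]


# A frame with a non-default index, as you might have after filtering.
df = pd.DataFrame(
    {"article_title": [
        "Hannah Ward on colour",
        "A festival in London",
        "Dylan Mulvaney explains",
    ]},
    index=[3, 7, 9],
)

# One list per row, empty when nothing was found, so lengths always match.
names_per_row = [find_people(title) for title in df["article_title"]]

# Reusing df.index keeps each result on the row it came from.
df["artist_names"] = pd.Series(names_per_row, index=df.index)

print(df)
```

In the spaCy version, the loop over `doc.ents` would take the place of `find_people`. Storing `ent.text` instead of the Span object would give plain strings, if that is what you want in the column.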