Columns misaligned when adding spaCy output to an existing DataFrame



Problem description

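I have a CSV with a column of article titles, and I am using spaCy to extract any person names that appear in those titles. When I try to add the extracted names to the data as a new column, they do not line up with the rows they were extracted from.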

I believe this is because the spaCy results have their own index, independent of the index of the original data.
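That suspicion can be checked with pandas alone. The following is a minimal sketch, assuming five placeholder titles and plain strings in place of spaCy spans (both are illustrative assumptions, not the real data). It shows that a new Series gets the labels 0, 1, ... by default, and that column assignment matches on those labels rather than on where each value came from.

```python
import pandas as pd

# Five placeholder titles standing in for the article_title column.
df = pd.DataFrame({"article_title": ["t0", "t1", "t2", "t3", "t4"]})

# Two extracted names; suppose they really came from rows 0 and 4.
names = pd.Series(["Hannah Ward", "Dylan Mulvaney"])  # default labels: 0, 1

# Assignment aligns on index labels, so the names land in rows 0 and 1.
df["artist_names"] = names
misaligned = df["artist_names"].tolist()

# Labelling the Series with the source rows puts each name where it belongs.
df["artist_names"] = pd.Series(["Hannah Ward", "Dylan Mulvaney"], index=[0, 4])
aligned = df["artist_names"].tolist()

print(misaligned)
print(aligned)
```

The second assignment only works because the source row of each name is known, which the flat list of entities in the question no longer records.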

I tried adding index=df.index to the pd.Series(...) call on the new-column line, but got "ValueError: Length of passed values is 2, index implies 10."
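The error can be reproduced without spaCy. This is a small sketch, assuming two placeholder names and a 10-row index: a list of 2 values cannot be paired with 10 index labels, so pandas raises ValueError. The exact wording of the message may differ between pandas versions.

```python
import pandas as pd

people = ["Hannah Ward", "Dylan Mulvaney"]  # 2 extracted names (placeholders)
row_index = pd.RangeIndex(10)               # index of a 10-row DataFrame

try:
    pd.Series(people, index=row_index)
    error = None
except ValueError as exc:
    # pandas refuses: 2 values cannot fill 10 index labels
    error = exc

print(type(error).__name__)
```

In other words, passing the index only helps when there is exactly one value per row.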

How can I align the spaCy output with the rows it came from?

Here is my code:

import pandas as pd
from pandas import DataFrame
df = pd.read_csv(r"C:\Users\Admin\Downloads\itsnicethat (5).csv", nrows=10,
                 usecols=['article_title'])
article = [_ for _ in df['article_title']]

import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp(str(article))
ents = list(doc.ents)
people = []
for ent in ents:
    if ent.label_ == "PERSON":
        people.append(ent)

import numpy as np
df['artist_names'] = pd.Series(people)
print(df.head())

This is the resulting DataFrame:

   article_title                                       artist_names
0  "They’re like, is that? Oh it’s!" – ...             (Hannah, Ward)
1  Billed as London’s biggest public festival of ...  (Dylan, Mulvaney)
2  Transport yourself back to the dusky skies and...  NaN
3  Turning to art at the beginning of quarantine ...  NaN
4  Dylan Mulvaney, head of design at Gretel, expl...  NaN

This is what I expected:

   article_title                                       artist_names
0  "They’re like, is that? Oh it’s!" – ...             (Hannah, Ward)
1  Billed as London’s biggest public festival of ...  NaN
2  Transport yourself back to the dusky skies and...  NaN
3  Turning to art at the beginning of quarantine ...  NaN
4  Dylan Mulvaney, head of design at Gretel, expl...  (Dylan, Mulvaney)

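You can see that the second value in the artist_names column (row 1) actually belongs to the fifth article title (row 4). How can I make them line up?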

Thanks for your help.

Recommended answer

I would loop over the titles, detect the entities in each one separately, and collect the results in a list that has one element per title:

import pandas as pd
import spacy

nlp = spacy.load('en_core_web_lg')
article = [_ for _ in df['article_title']]

# Collect one list of PERSON entities per title, in the same order as the rows.
entities_by_article = []
for doc in nlp.pipe(article):
    people = []
    for ent in doc.ents:
        if ent.label_ == "PERSON":
            people.append(ent)
    entities_by_article.append(people)

df['artist_names'] = pd.Series(entities_by_article)
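The same one-result-per-row idea can also be written with Series.apply, which keeps the DataFrame's own index even when it is not 0, 1, 2, .... The sketch below is only an illustration: find_people is a hypothetical stand-in for the spaCy call so the example runs without downloading a model, and the titles and index labels are invented.

```python
import pandas as pd

def find_people(title):
    """Hypothetical stand-in for spaCy: return the known names found in one title."""
    known = ["Hannah Ward", "Dylan Mulvaney"]
    return [name for name in known if name in title]

# Invented titles with a non-default index, e.g. what remains after filtering rows.
df = pd.DataFrame(
    {"article_title": [
        "Hannah Ward on her new typeface",
        "A festival of architecture returns",
        "Dylan Mulvaney explains the rebrand",
    ]},
    index=[10, 20, 30],
)

# Each result is computed from its own row, and apply() preserves df's index.
df["artist_names"] = df["article_title"].apply(find_people)

print(df)
```

With a loaded spaCy pipeline, the body of the function could instead return [ent.text for ent in nlp(title).ents if ent.label_ == "PERSON"]. That stores plain strings rather than Span objects, which may be easier to write back to CSV.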

Note: for doc in nlp.pipe(article) is spaCy's more efficient way of looping over a list of texts. It can be replaced with the simpler loop below:

for a in article:
  doc = nlp(a)
  ## rest of code within loop
