怎么点积(1,10^{13})(10^13,1)实数稀疏矩阵

本教程将介绍如何点积(1,10^{13})(10^13,1)实数稀疏矩阵的处理方法,这篇教程是从别的地方看到的,然后加了一些国外程序员的疑问与解答,希望能对你有所帮助,好了,下面开始学习吧。

怎么点积(1,10^{13})(10^13,1)实数稀疏矩阵 教程 第1张

问题描述

基本上就是标题所包含的内容。
这两个矩阵几乎都是零。第一个是1 x 9999999999999,第二个是9999999999999 x 1
当我尝试做点积时,我得到了这样的结果。

Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64

Full traceback </br>

 MemoryError: Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64

In [31]: imputed.dot(s)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-31-670cfc69d4cf> in <module>
----> 1 imputed.dot(s)

~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in dot(self, other)
 357 
 358"""
--> 359return self * other
 360 
 361  def power(self, n, dtype=None):

~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in __mul__(self, other)
 478 if self.shape[1] != other.shape[0]:
 479  raise ValueError('dimension mismatch')
--> 480 return self._mul_sparse_matrix(other)
 481 
 482# If it's a list or whatever, treat it like a matrix

~/.local/lib/python3.8/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
 499 
 500major_axis = self._swap((M, N))[0]
--> 501other = self.__class__(other)  # convert to this format
 502 
 503idx_dtype = get_index_dtype((self.indptr, self.indices,

~/.local/lib/python3.8/site-packages/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
  32  arg1 = arg1.copy()
  33 else:
---> 34  arg1 = arg1.asformat(self.format)
  35 self._set_self(arg1)
  36 

~/.local/lib/python3.8/site-packages/scipy/sparse/base.py in asformat(self, format, copy)
 320 # Forward the copy kwarg, if it's accepted.
 321 try:
--> 322  return convert_method(copy=copy)
 323 except TypeError:
 324  return convert_method()

~/.local/lib/python3.8/site-packages/scipy/sparse/csc.py in tocsr(self, copy)
 135idx_dtype = get_index_dtype((self.indptr, self.indices),
 136 maxval=max(self.nnz, N))
--> 137indptr = np.empty(M + 1, dtype=idx_dtype)
 138indices = np.empty(self.nnz, dtype=idx_dtype)
 139data = np.empty(self.nnz, dtype=upcast(self.dtype))

MemoryError: Unable to allocate 72.8 TiB for an array with shape (10000000000000,) and data type int64

似乎scipy正在尝试创建一个temp数组。
我使用的是Scipy提供的.ot方法。
我也对不重要的解决方案持开放态度。
谢谢!

推荐答案

In [105]: from scipy import sparse

如果我制作一个(100,1)CSR矩阵:

In [106]: A = sparse.random(100,1,format='csr')
In [107]: A
Out[107]: 
<100x1 sparse matrix of type '<class 'numpy.float64'>'
 with 1 stored elements in Compressed Sparse Row format>

数据和指数为:

In [109]: A.data
Out[109]: array([0.19060481])
In [110]: A.indices
Out[110]: array([0], dtype=int32)
In [112]: A.indptr
Out[112]: 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

因此,即使只有1个非零项,一个数组也很大(101)。

另一方面,同一数组的csc格式的存储空间要小得多。但形状为(1,100)的csc将看起来像CSR。

In [113]: Ac = A.tocsc()
In [114]: Ac.indptr
Out[114]: array([0, 1], dtype=int32)
In [115]: Ac.indices
Out[115]: array([88], dtype=int32)

数学,特别是矩阵乘积是用csr/csc格式完成的。因此,可能很难避免这种80 TB的内存使用。


查看回溯,我发现它正在尝试将other转换为与self匹配的格式。

所以A.dot(B)A是(1,N)csr,小的形状。B是(N,1)csc,也是小形状。但B.tocsr()需要较大的(N+1,)形状indptr


让我们尝试dot的替代方案

前2个矩阵:

In [122]: A = sparse.random(1,100, .2,format='csr')
In [123]: B = sparse.random(100,1, .2,format='csc')
In [124]: A
Out[124]: 
<1x100 sparse matrix of type '<class 'numpy.float64'>'
 with 20 stored elements in Compressed Sparse Row format>
In [125]: B
Out[125]: 
<100x1 sparse matrix of type '<class 'numpy.float64'>'
 with 20 stored elements in Compressed Sparse Column format>
In [126]: A@B
Out[126]: 
<1x1 sparse matrix of type '<class 'numpy.float64'>'
 with 1 stored elements in Compressed Sparse Row format>
In [127]: _.A
Out[127]: array([[1.33661021]])

它们的非零元素索引。只有匹配的才重要。

In [128]: A.indices, B.indices
Out[128]: 
(array([16, 20, 23, 28, 30, 37, 39, 40, 43, 49, 54, 59, 61, 63, 67, 70, 74,
  91, 94, 99], dtype=int32),
 array([ 5,  8, 15, 25, 34, 35, 40, 46, 47, 51, 53, 60, 68, 70, 75, 81, 87,
  90, 91, 94], dtype=int32))

等式矩阵:

In [129]: mask = A.indices[:,None]==B.indices

In [132]: np.nonzero(mask.any(axis=0))
Out[132]: (array([ 6, 13, 18, 19]),)
In [133]: np.nonzero(mask.any(axis=1))
Out[133]: (array([ 7, 15, 17, 18]),)

匹配索引:

In [139]: A.indices[Out[133]]
Out[139]: array([40, 70, 91, 94], dtype=int32)
In [140]: B.indices[Out[132]]
Out[140]: array([40, 70, 91, 94], dtype=int32)

相应data值的总和与[127]

匹配

In [141]: (A.data[Out[133]]*B.data[Out[132]]).sum()
Out[141]: 1.3366102138511582

好了关于怎么点积(1,10^{13})(10^13,1)实数稀疏矩阵的教程就到这里就结束了,希望趣模板源码网找到的这篇技术文章能帮助到大家,更多技术教程可以在站内搜索。