You will implement a matrix-vector multiplication in CUDA. I expect you to use “shared memory” in this assignment.
Your kernel will take an MxN float matrix and an Nx1 float vector as inputs. You will multiply the matrix by the vector and return a new Mx1 float vector as the output. For example:
Output (Mx1)      Input (MxN)             Input (Nx1)

  117      =      8  6  1  3  6  6    x       5
   92             7  5  5  0  1  0            2
  101             0  4  8  7  0  8            8
   68             0  5  2  4  4  2            3
  120             5  9  6  6  1  4            7
                                              1

Here M = 5 and N = 6.
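Each output entry is the dot product of one matrix row with the input vector. For the first row, 8·5 + 6·2 + 1·8 + 3·3 + 6·7 + 6·1 = 40 + 12 + 8 + 9 + 42 + 6 = 117. Below is a minimal sketch of a kernel that computes these dot products using only global memory, assuming the matrix is stored row-major in one contiguous float array (element (i, j) at A[i * N + j]) and assigning one thread per output row. The names matvecNaive, matvecNaiveHost, and CUDA_CHECK are hypothetical, chosen only for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// One thread per output row: y[row] = sum over j of A[row * N + j] * x[j].
// A is assumed row-major; every load of A and x goes to global memory.
__global__ void matvecNaive(const float* A, const float* x, float* y,
                            int M, int N) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M) {
        float sum = 0.0f;
        for (int j = 0; j < N; ++j) {
            sum += A[row * N + j] * x[j];
        }
        y[row] = sum;
    }
}

// Hypothetical host wrapper: copies A (M*N floats) and x (N floats) to the
// device, launches the kernel, and copies the M results back into h_y.
void matvecNaiveHost(const float* h_A, const float* h_x, float* h_y,
                     int M, int N) {
    assert(M > 0 && N > 0);
    float *d_A = nullptr, *d_x = nullptr, *d_y = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, sizeof(float) * M * N));
    CUDA_CHECK(cudaMalloc(&d_x, sizeof(float) * N));
    CUDA_CHECK(cudaMalloc(&d_y, sizeof(float) * M));
    CUDA_CHECK(cudaMemcpy(d_A, h_A, sizeof(float) * M * N,
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_x, h_x, sizeof(float) * N,
                          cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (M + threads - 1) / threads;
    matvecNaive<<<blocks, threads>>>(d_A, d_x, d_y, M, N);
    CUDA_CHECK(cudaGetLastError());  // catch launch-configuration errors

    // This blocking cudaMemcpy waits for the kernel to finish before copying.
    CUDA_CHECK(cudaMemcpy(h_y, d_y, sizeof(float) * M,
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
}
```

For the example above, calling matvecNaiveHost with M = 5 and N = 6 fills h_y with 117, 92, 101, 68, 120.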
I recommend that you start by writing the program without shared memory, implementing it entirely with global memory accesses. Then look at your solution and find which part of the data is accessed multiple times. Once you find it, try moving that part to shared memory, so there will be fewer global memory accesses.
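In a one-thread-per-row global-memory version, every thread reads all N entries of the input vector, so each element of x is fetched from global memory M times, while each matrix element is read only once. The sketch below assumes x is therefore the data worth moving to shared memory and that the matrix is row-major: each block cooperatively loads x into shared memory one tile of TILE elements at a time, synchronizes, and then every thread accumulates its row against the staged tile. TILE, matvecShared, matvecSharedHost, and CUDA_CHECK are hypothetical names, and the kernel assumes it is launched with exactly TILE threads per block.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Elements of x staged per step (hypothetical choice). The kernel must be
// launched with exactly TILE threads per block.
#define TILE 256

__global__ void matvecShared(const float* A, const float* x, float* y,
                             int M, int N) {
    __shared__ float xs[TILE];  // current tile of x, shared by the block
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;

    for (int base = 0; base < N; base += TILE) {
        // Each thread loads one element of the tile (0.0f past the end of x).
        int j = base + threadIdx.x;
        xs[threadIdx.x] = (j < N) ? x[j] : 0.0f;
        __syncthreads();  // tile fully loaded before any thread reads it

        if (row < M) {
            int limit = min(TILE, N - base);
            for (int k = 0; k < limit; ++k) {
                sum += A[row * N + base + k] * xs[k];
            }
        }
        __syncthreads();  // all threads done with the tile before it is reused
    }
    if (row < M) {
        y[row] = sum;
    }
}

// Hypothetical host wrapper: same data movement as a global-memory version,
// but launched with TILE threads per block as matvecShared requires.
void matvecSharedHost(const float* h_A, const float* h_x, float* h_y,
                      int M, int N) {
    assert(M > 0 && N > 0);
    float *d_A = nullptr, *d_x = nullptr, *d_y = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, sizeof(float) * M * N));
    CUDA_CHECK(cudaMalloc(&d_x, sizeof(float) * N));
    CUDA_CHECK(cudaMalloc(&d_y, sizeof(float) * M));
    CUDA_CHECK(cudaMemcpy(d_A, h_A, sizeof(float) * M * N,
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_x, h_x, sizeof(float) * N,
                          cudaMemcpyHostToDevice));

    const int blocks = (M + TILE - 1) / TILE;
    matvecShared<<<blocks, TILE>>>(d_A, d_x, d_y, M, N);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(h_y, d_y, sizeof(float) * M,
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
}
```

Both __syncthreads() calls are needed: the first makes sure the tile is completely loaded before any thread reads it, and the second keeps a fast thread from overwriting the tile while others are still using it. Threads with row >= M still help load tiles and reach every barrier, which is why the kernel does not return early. This sketch only reduces global traffic for x; reads of A by adjacent threads are still N floats apart and therefore not coalesced.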