Hi,
Yes, if you want a point where the parallel executions join (and wait for each other), you have to add a second parallel gateway.
Without the second parallel gateway, the parallel executions run independently until each of them reaches an end node.
If you route the executions directly into another task, "join_task", a new instance of "join_task" is created for each of the parallel executions.
If you put a join gateway before "join_task", all the parallel executions wait for each other at that gateway and continue only once all of them have arrived, so only one instance of "join_task" is created.
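
To make the difference concrete, here is a small Python sketch that mimics the token behaviour described above. It is only a toy model, not engine code: the function count_task_instances, the node names (fork, task_a, task_b, join) and the graph layout are all made up for illustration, and it assumes every node simply passes a token to each of its outgoing flows.

```python
from collections import deque

def count_task_instances(flows, start, joins=frozenset(), watch="join_task"):
    """Walk tokens through a tiny process graph and count entries into `watch`.

    flows: dict mapping a node name to the list of its outgoing node names.
    joins: names of nodes that behave like a joining parallel gateway.
    (Hypothetical helper for illustration, not part of any BPMN engine.)
    """
    # Count incoming flows per node; a join waits for that many tokens.
    incoming = {}
    for targets in flows.values():
        for target in targets:
            incoming[target] = incoming.get(target, 0) + 1

    waiting = {}   # tokens currently parked at each join
    created = 0    # how many times `watch` was entered
    tokens = deque([start])
    while tokens:
        node = tokens.popleft()
        if node in joins:
            waiting[node] = waiting.get(node, 0) + 1
            if waiting[node] < incoming[node]:
                continue        # this execution waits for the others
            waiting[node] = 0   # all arrived: one token continues
        if node == watch:
            created += 1
        # Every outgoing flow gets its own token (this is what forks).
        tokens.extend(flows.get(node, []))
    return created

# Both branches flow straight into join_task, no joining gateway.
no_gateway = {
    "start": ["fork"],
    "fork": ["task_a", "task_b"],
    "task_a": ["join_task"],
    "task_b": ["join_task"],
    "join_task": ["end"],
}

# Same process, but with a joining gateway in front of join_task.
with_gateway = {
    "start": ["fork"],
    "fork": ["task_a", "task_b"],
    "task_a": ["join"],
    "task_b": ["join"],
    "join": ["join_task"],
    "join_task": ["end"],
}

without_join = count_task_instances(no_gateway, "start")
with_join = count_task_instances(with_gateway, "start", joins={"join"})
print(without_join, with_join)
```

In this model the first graph enters "join_task" once per parallel execution, while the second enters it only once, because the first token to reach the join is held until the other one arrives.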