DevilKing's blog

By a cold lamp I gaze at my sword — how much glory lies upon the blade? The censer's incense need not account for all living souls; though I vanish in a wisp of smoke, buried beneath ten-thousand-fathom clouds, the lone sun still shines upon the ancient tombs.


Golang Tips

Go Datastructures slices

go datastructures

diff between C arrays and Go arrays

  • in C, an array name decays to a pointer to its first element; in Go, an array name denotes the entire value
  • C arrays are passed to functions as a pointer, but Go passes arrays by value (a copy)
  • in C, an array cannot be copied with ar1 = ar2 (unless ar1 and ar2 are pointers), but assignment copies the whole array in Go
  • C arrays must be freed manually when heap-allocated; Go has garbage collection

diff between Go arrays and slices

  • a Go slice is a three-word data structure: <pointer, length, capacity>

a slice is a collection of data in a contiguous block of memory

nil slice: var slice []int ; empty slice: slice := make([]int, 0) or slice := []int{}

growing a slice -> like a Java list, but is there a fixed growth factor? not exactly — the factor is an implementation detail of the runtime (roughly doubling while the slice is small, growing more slowly once it is large)

growing a slice

  • the append function takes a source slice, appends the values, and returns a new slice
  • append always increases the length of the new slice, but the capacity may or may not increase

slice append - the third index

  • the third index of a slice expression restricts the capacity
  • slice := source[2:3:4]
  • by setting capacity == length, the next append is forced to detach from the source backing array and create its own backing array
  • this technique is used when we want to modify the new slice's backing array without changing the source backing array
package main

import "fmt"

func main() {
	k := make([]int, 0)

	k = append(k, 1)
	k = append(k, 2)
	k = append(k, 3)
	k = append(k, 4)
	k = append(k, 5)
	k = append(k, 6)

	t := k[2:3:3]

	fmt.Println(k)
	fmt.Println(t)

	t = append(t, 7)

	fmt.Println(k)
	fmt.Println(t)
}

the result:

[1 2 3 4 5 6]
[3]
[1 2 3 4 5 6]
[3 7]

notice the detach operation

but what is the meaning of the capacity here?

passing slices to functions

since only the slice header (pointer, length, capacity) is passed, this is very efficient: whether the backing array holds ten elements or one million, only 24 bytes (on a 64-bit platform) are passed to the function

RFC: Apache Beam Go SDK design

Apache Beam

is an advanced, unified programming model for implementing batch and streaming data processing jobs that run on any execution engine

weak points (of Go, for implementing the Beam model):

  • no generics
  • no function or method overloading
  • no inheritance
  • limited reflection and serialization support
  • no annotation support

strong points:

  • first-class functions
  • full type reflection
  • multiple return values
  • and more

key design points

  • natively-typed DoFns and other user functions

  • weakly-typed PTransforms that capture arity natively

  • static type checking at pipeline construction time

    • KV is implicit: we use multiple arguments and return tuples to represent unfolded KVs for DoFns
    • side input forms
    • simulated generic types: we achieve some of the effect of generics by introducing special ‘universal’ types T, U, … X, Y, Z over interface{}
  • error handling

examples

model representation

  • Pipeline
  • Runner
  • PCollection
  • Coder
  • DoFn and other user functions

Transforms

  • Impulse
  • Create
  • ParDo family
  • GroupByKey
  • Flatten
  • Combine
  • Partition