Temp files in Google Cloud Dataflow


I'm trying to write temporary files on the workers executing a Dataflow job, but it seems the files are getting deleted while the job is still running. If I SSH into the running VM, I can execute the exact same file-generating command and the files are not destroyed -- perhaps the cleanup only applies to the Dataflow runner user. Is it possible to use temp files, or is this a platform limitation?

Specifically, I'm attempting to write to the location returned by Files.createTempDir(), /tmp/someidentifier.

Edit: Not sure what was happening when I posted this, but Files.createTempDirectory() works...
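
For reference, here is a minimal sketch of the two approaches the question describes: Guava's Files.createTempDir() (the original attempt) and java.nio's Files.createTempDirectory() (the variant the edit reports as working). It assumes Guava is on the classpath, and the directory prefix and file names are illustrative only:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempDirSketch {
  public static void main(String[] args) throws IOException {
    // Guava variant: creates a directory under java.io.tmpdir, e.g. /tmp/<someidentifier>.
    File guavaDir = com.google.common.io.Files.createTempDir();

    // java.nio variant: same idea, but lets you pass a prefix.
    Path nioDir = Files.createTempDirectory("someidentifier");
    Path scratch = nioDir.resolve("scratch.txt");
    Files.write(scratch, "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println("Wrote " + scratch + " and created " + guavaDir);
  }
}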

We make no explicit guarantee about the lifetime of files you write to local disk.

That said, writing a temporary file inside processElement should work. You can write and read it within the same processElement call. Similarly, files created in DoFn.startBundle will be visible in processElement and finishBundle.
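
As an illustration, here is a minimal sketch of a DoFn that creates, reads, and cleans up its own scratch file within a single processElement call. It is written against the Apache Beam style Java SDK (the original Dataflow 1.x SDK overrides processElement rather than using the annotation), and the class and file names are just placeholders:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.beam.sdk.transforms.DoFn;

public class ScratchFileFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(ProcessContext c) throws IOException {
    // Create a scratch file; only rely on it for the duration of this call.
    Path scratch = Files.createTempFile("scratch", ".txt");
    try {
      Files.write(scratch, c.element().getBytes(StandardCharsets.UTF_8));
      String contents = new String(Files.readAllBytes(scratch), StandardCharsets.UTF_8);
      c.output(contents);
    } finally {
      // Clean up our own scratch space rather than depending on the worker to do it.
      Files.deleteIfExists(scratch);
    }
  }
}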

You should avoid writing to /dataflow/logs/taskrunner/harness; writing files there might conflict with Dataflow's logging. We encourage you to use the standard Java APIs File.createTempFile() and Files.createTempDirectory() instead.

If you want to preserve data beyond finishBundle, you should write it to durable storage such as GCS. You can do this by emitting the data to a side output and using TextIO or one of the other writers, as in the sketch below. Alternatively, you can write to GCS directly from inside your DoFn.
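
Here is a minimal sketch of the durable-output approach using TextIO in the Beam style Java SDK; the bucket and output prefix are placeholders, and the Create transform stands in for whatever PCollection your DoFn would otherwise emit:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class WriteToGcsSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Stand-in for the data produced by your DoFn.
    PCollection<String> lines = p.apply(Create.of("record-1", "record-2"));

    // Durable output: each element becomes a line in files under the given prefix.
    lines.apply(TextIO.write().to("gs://your-bucket/dataflow-output/part"));

    p.run();
  }
}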

Since Dataflow runs your code inside containers, you won't be able to see its files by SSHing into the VM. The container has some of the host VM's directories mounted, but /tmp is not one of them. You need to attach to the appropriate container, e.g. by running

docker exec -t -i <container id> /bin/bash 

That command will start a shell inside the running container.
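
If you don't know the container id, you can list the containers running on the worker VM first:

docker ps

and pick the id of the Dataflow harness container from the output.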

