

Improve error message in spark tools when trying to access a local file from other nodes #1417

Open
akiezun opened this issue Jan 6, 2016 · 15 comments

Comments

@akiezun (Contributor) commented Jan 6, 2016

I'm almost certain this used to work.

./bin/gatk/gatk-launch FlagStatSpark -I file:///local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam -- --sparkRunner SPARK --sparkMaster yarn-client

The error is:

java.lang.IllegalArgumentException: Wrong FS: file:/local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam, expected: hdfs://dataflow01.broadinstitute.org:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:654)
    at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:474)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getHeader(ReadsSparkSource.java:181)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(GATKSparkTool.java:284)

It works fine when running with the LOCAL runner, or when the file is on HDFS.

When resolving this ticket, make sure to devise a way (or at least file a ticket) to prevent this from happening again, i.e. some way to detect this kind of problem.

@droazen droazen added this to the alpha-1 milestone Jan 6, 2016
@lbergelson (Member)

@akiezun Are you running this on your laptop or on one of the dataflow nodes? I suspect it will work on your laptop but not on the dataflow nodes.

I believe this is the inverse problem of #1324

@akiezun (Contributor, Author) commented Jan 6, 2016

This is on YARN on dataflow01.

@lbergelson (Member)

This same issue is coming up in a number of different ways. What's happening is that we ask for the default FileSystem and then try to open a path using whatever it gives back. The FileSystem it returns only knows how to open one type of path, either file:/// or hdfs:///. We could specify a particular file system explicitly, but I'm afraid there would be nasty consequences I don't understand, like getting the wrong HDFS block size or something like that.

    FileSystem fs = FileSystem.get(ctx.hadoopConfiguration());
    Path path = fs.makeQualified(new Path(filePath));
    if (fs.isDirectory(path)) {
        FileStatus[] bamFiles = fs.listStatus(path, new PathFilter() {
            private static final long serialVersionUID = 1L;
            @Override
            public boolean accept(Path path) {
                return path.getName().startsWith(HADOOP_PART_PREFIX);
            }
        });
        if (bamFiles.length == 0) {
            throw new UserException("No BAM files to load header from in: " + path);
        }
        path = bamFiles[0].getPath(); // Hadoop-BAM writes the same header to each shard, so use the first one
    }
    return SAMHeaderReader.readSAMHeaderFrom(path, ctx.hadoopConfiguration());
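For reference, the "Wrong FS" exception above comes from Hadoop comparing the path's scheme against the scheme of the filesystem it was handed. A minimal stdlib-only sketch of that comparison (this mimics the idea behind FileSystem.checkPath, it is not the actual Hadoop code):

```java
import java.net.URI;

// Sketch of the scheme check behind the "Wrong FS" error: a FileSystem rooted
// at one URI scheme (e.g. hdfs://) rejects paths carrying a different scheme
// (e.g. file:/). Class and method names here are illustrative only.
public class WrongFsCheck {

    /** Returns null if the path is usable by a filesystem rooted at fsUri,
     *  otherwise a "Wrong FS"-style message. */
    static String checkPath(URI fsUri, URI pathUri) {
        String fsScheme = fsUri.getScheme();
        String pathScheme = pathUri.getScheme();
        if (pathScheme == null || pathScheme.equalsIgnoreCase(fsScheme)) {
            return null; // relative path or matching scheme: OK
        }
        return "Wrong FS: " + pathUri + ", expected: " + fsUri;
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://dataflow01.broadinstitute.org:8020");
        // A file:/ path handed to an HDFS-rooted filesystem fails the check
        System.out.println(checkPath(hdfs, URI.create("file:/local/dev/valid.bam")));
        // An hdfs:// path with the same scheme passes (prints null)
        System.out.println(checkPath(hdfs, URI.create("hdfs://dataflow01.broadinstitute.org:8020/user/x.bam")));
    }
}
```

One way around the mismatch in real Hadoop code is to resolve the filesystem from the path itself (Path.getFileSystem(conf)) rather than asking for the configured default, though as noted above that may have side effects worth checking.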

@lbergelson (Member)

If you git bisect, can you find a commit where this does work?

@lbergelson (Member)

I was assuming that this would be fixed by #1433, but it only fixed the inverse of this problem. It's now possible to load HDFS files from the local runner using the full namenode path, e.g.

hdfs://dataflow01.broadinstitute.org/user/louisb/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam

Loading files with the Spark runner and yarn-client is still failing.

We're getting a new error now though.

java.lang.IllegalArgumentException: unknown SAM format, cannot create RecordReader: file:/local/dev/akiezun/bin/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam
    at org.seqdoop.hadoop_bam.AnySAMInputFormat.createRecordReader(AnySAMInputFormat.java:181)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:151)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:124)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

@lbergelson (Member)

We've been hit with a collective case of confusion. Of course this is going to fail: we're accessing a file on the master's local filesystem, which is not visible to the executors on the other nodes. Hence the explosion.

@droazen (Collaborator) commented Jan 27, 2016

Doh! Perhaps we could improve the error message in this case?

@lbergelson (Member)

Yes. We desperately need to improve the error messages for Hadoop-BAM.
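The check the ticket asks for could be done up front, before handing the path to Spark at all. A hedged sketch of that idea, stdlib only; the class name, method names, and message wording below are hypothetical, not GATK's actual API:

```java
import java.net.URI;

// Illustrative up-front validation: detect a master-local file: path being
// used with a cluster Spark master, and produce a friendly error instead of
// letting the executors blow up with "unknown SAM format" or "Wrong FS".
public class LocalPathCheck {

    /** True when the input looks like a local file paired with a cluster
     *  master (e.g. yarn-client), where executors cannot read it. */
    static boolean isLocalPathOnCluster(String inputPath, String sparkMaster) {
        String scheme = URI.create(inputPath).getScheme();
        boolean localFile = scheme == null || scheme.equals("file");
        boolean clusterMaster = !sparkMaster.startsWith("local");
        return localFile && clusterMaster;
    }

    static String friendlyMessage(String inputPath, String sparkMaster) {
        return "Input " + inputPath + " is on the local filesystem, but Spark master "
                + sparkMaster + " runs executors on other nodes that cannot see it. "
                + "Copy the file to HDFS (or another shared filesystem) and use an "
                + "hdfs:// path instead.";
    }

    public static void main(String[] args) {
        String in = "file:///home/akiezun/valid.bam";
        String master = "yarn-client";
        if (isLocalPathOnCluster(in, master)) {
            System.out.println(friendlyMessage(in, master));
        }
    }
}
```

The scheme-plus-master heuristic is deliberately simple; it would not cover a shared NFS mount visible from every node, so a real implementation might want an escape hatch for that case.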

@droazen droazen changed the title regression: can't run Spark tools on local files when using yarn Improve error message in spark tools when trying to access a local file from other nodes Mar 10, 2016
@droazen
Copy link
Collaborator
droazen commented Mar 10, 2016

Updated the title of this ticket to clarify that the task is now just to improve the error message, not to fix an actual bug.

@akiezun (Contributor, Author) commented Mar 18, 2016

This is the same as #1452. The bug is in Hadoop-BAM; let's not put a band-aid on it here.

@akiezun (Contributor, Author) commented Mar 18, 2016

Opened HadoopGenomics/Hadoop-BAM#79.

Blocked until that is fixed; not alpha-1.

@akiezun akiezun removed this from the alpha-1 milestone Mar 18, 2016
@tomwhite (Contributor)

@akiezun I created a fix for HadoopGenomics/Hadoop-BAM#79 in HadoopGenomics/Hadoop-BAM#99, which should fix this. Can you take a look?

@akiezun akiezun added this to the alpha-2 milestone Jul 1, 2016
@akiezun (Contributor, Author) commented Jul 6, 2016

Yes, though not today.

@akiezun akiezun modified the milestones: alpha-3, alpha-2 Jul 6, 2016
@akiezun (Contributor, Author) commented Jul 28, 2016

@tomwhite So I'm running on the latest GATK, which uses Hadoop-BAM 7.6.0 (which I think includes those fixes), and I still get this error. I'm running on a Dataproc cluster and my exact command line is:

./gatk-launch FlagStatSpark -I file:///home/akiezun/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam -- --sparkRunner SPARK --sparkMaster yarn-client

The file /home/akiezun/gatk/src/test/resources/org/broadinstitute/hellbender/tools/valid.bam does exist locally on the master node.

Can you reproduce it too? What do you get?

@akiezun akiezun assigned droazen and unassigned akiezun Jul 29, 2016
@akiezun (Contributor, Author) commented Jul 29, 2016

Assigning to @droazen for dispatch.

@droazen droazen removed this from the alpha-3 milestone Aug 1, 2016
@droazen droazen assigned tomwhite and unassigned droazen Mar 20, 2017
@droazen droazen added this to the beta milestone Mar 20, 2017
@droazen droazen modified the milestones: 4.0 release, beta May 30, 2017
@tomwhite tomwhite removed their assignment Jun 13, 2017
@droazen droazen removed this from the Engine-4.0 milestone Oct 17, 2017