feat: skip large rows by sarthakbhutani · Pull Request #2482 · googleapis/java-bigtable

sarthakbhutani · 2025-02-11T14:54:14Z

Tasks remaining:

- [ ] Make changes in the read-rows request API so that it skips large rows (internally calling `readLargeRowsCallable()`)
- [ ] Expose the row keys of skipped large rows to the client, via a side channel, a DLQ, or some other method

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- [ ] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/java-bigtable/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)
- [ ] Rollback plan is reviewed and LGTMed
- [ ] All new data plane features have a completed end to end testing plan

Fixes #<issue_number_goes_here> ☕️

If you write sample code, please follow the [samples format](https://togithub.com/GoogleCloudPlatform/java-docs-samples/blob/main/SAMPLE_FORMAT.md).

google-cla · 2025-02-11T14:54:20Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

mutianf

I haven't looked at RowSetUtil or any of the tests yet, but I left some comments you can work on first.

mutianf · 2025-02-12T02:10:02Z

+/**
+ * This callable converts the "Received rst stream" exception into a retryable {@link ApiException}.
+ */
+public final class LargeRowReadCallable<RequestT, ResponseT, RowT>


Since you're modifying ServerStreamingAttemptCallable directly, we can remove this class.

mutianf · 2025-02-12T02:10:43Z

+    @Override
+    protected void onErrorImpl(Throwable t) {
+      // this has no impact
+      // if (resumptionStrategy instanceof LargeReadRowsResumptionStrategy) {


I think this has no effect because we create a new strategy here: https://github.com/googleapis/sdk-platform-java/blob/main/gax-java/gax/src/main/java/com/google/api/gax/rpc/RetryingServerStreamingCallable.java#L80 (which I didn't notice previously).


sarthakbhutani

I have resolved the comments that are now implemented in the code.

The unresolved comments have not been addressed yet because they need discussion. Can we discuss them offline?

sarthakbhutani · 2025-02-12T06:03:17Z

+    }
+  }
+
+  public void dumpLargeRowKeys() {


This method can be implemented later; I have added Javadoc for it.

sarthakbhutani · 2025-02-12T07:02:04Z

+    @Override
+    protected void onErrorImpl(Throwable t) {
+      // this has no impact
+      // if (resumptionStrategy instanceof LargeReadRowsResumptionStrategy) {


Correct! I believe a new resumption strategy is created per request so that parallel requests each track their own state in their own resumption strategy object.
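To illustrate why state set in `onErrorImpl` had no effect, here is a minimal sketch of the per-call pattern, assuming a simplified interface modelled on gax's `StreamResumptionStrategy`, which exposes a `createNew()` factory that the retrying callable invokes for each call (the line linked in the review comment above). `SimpleResumptionStrategy`, `LargeRowStrategy`, and `startCall` are hypothetical names for illustration only, not library API.

```java
import java.util.HashSet;
import java.util.Set;

public class ResumptionStrategyPerCall {

  // Hypothetical, simplified stand-in for gax's StreamResumptionStrategy:
  // only the createNew() factory method matters for this illustration.
  interface SimpleResumptionStrategy {
    SimpleResumptionStrategy createNew();
  }

  // Hypothetical strategy that remembers which large rows it has skipped.
  static final class LargeRowStrategy implements SimpleResumptionStrategy {
    final Set<String> skippedKeys = new HashSet<>();

    @Override
    public LargeRowStrategy createNew() {
      // Each call gets a fresh instance with its own, empty state.
      return new LargeRowStrategy();
    }
  }

  // Mimics a retrying callable: it holds a prototype and creates a new
  // strategy instance at the start of every call.
  static LargeRowStrategy startCall(LargeRowStrategy prototype) {
    return prototype.createNew();
  }

  public static void main(String[] args) {
    LargeRowStrategy prototype = new LargeRowStrategy();
    LargeRowStrategy perCall = startCall(prototype);

    // Mutating the prototype (e.g. from an outer onErrorImpl) ...
    prototype.skippedKeys.add("r2");

    // ... does not reach the instance the retry loop actually uses.
    System.out.println(perCall.skippedKeys.isEmpty()); // prints true
  }
}
```

The trade-off is the one described in the reply: each call owns its strategy instance, so concurrent reads cannot leak skipped-row state into each other, but any state must be tracked inside that per-call instance.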

sarthakbhutani · 2025-02-12T07:25:02Z

+        new RowMergingCallable<>(convertException, rowAdapter);
+
+    LargeReadRowsResumptionStrategy<RowT> largeRowResumptionStrategy;
+    largeRowResumptionStrategy = new LargeReadRowsResumptionStrategy<RowT>(rowAdapter);


testing this

sarthakbhutani · 2025-02-12T10:25:36Z

@@ -0,0 +1,125 @@
+/*


[duplicate comment]
[can discuss offline]

It is required. The error goes through the convertableExceptionCallable layer, which converts the FailedPrecondition exception into an ApiException and marks it as `retryable: true`.

The converted exception is then thrown to the resumption strategy layer. Hence, this is required.

I have confirmed this while testing and debugging as well.
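A minimal sketch of the conversion decision described above, under stated assumptions: `RpcError`, `ConvertedException`, and `LARGE_ROW_REASON` are hypothetical stand-ins (the real code works with gax's `ApiException` and the server's error details, whose exact reason string is not given in this thread). It only shows the idea that a FailedPrecondition error identified as a large-row error is marked retryable, so the retrying layer hands it to the resumption strategy instead of failing the stream.

```java
public class LargeRowErrorConversion {

  // Hypothetical subset of gRPC status codes used in this sketch.
  enum Code { FAILED_PRECONDITION, INTERNAL, NOT_FOUND }

  // Hypothetical, simplified view of a server error: a status code plus a reason.
  static final class RpcError {
    final Code code;
    final String reason;

    RpcError(Code code, String reason) {
      this.code = code;
      this.reason = reason;
    }
  }

  // Hypothetical stand-in for an ApiException that carries a retryable flag.
  static final class ConvertedException extends RuntimeException {
    final boolean retryable;

    ConvertedException(String message, boolean retryable) {
      super(message);
      this.retryable = retryable;
    }
  }

  // Placeholder: the actual large-row error reason is not given in this thread.
  static final String LARGE_ROW_REASON = "LARGE_ROW_PLACEHOLDER";

  static ConvertedException convert(RpcError error) {
    // Only a FAILED_PRECONDITION that is specifically a large-row error becomes
    // retryable; everything else keeps its non-retryable behaviour.
    boolean largeRow =
        error.code == Code.FAILED_PRECONDITION && LARGE_ROW_REASON.equals(error.reason);
    return new ConvertedException(error.code + ": " + error.reason, largeRow);
  }

  public static void main(String[] args) {
    System.out.println(convert(new RpcError(Code.FAILED_PRECONDITION, LARGE_ROW_REASON)).retryable); // true
    System.out.println(convert(new RpcError(Code.FAILED_PRECONDITION, "OTHER")).retryable); // false
  }
}
```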

sarthakbhutani

added comments

mutianf · 2025-02-13T01:44:42Z

            .withRetrySettings(settings.readRowsSettings().getRetrySettings()));
  }

+  public <RowT> ServerStreamingCallable<Query, RowT> createSkipLargeRowsBaseCallable(


I don't think we need to create a base callable for this, so you can probably just extract everything from the other base callable and put it in this method. Otherwise they're all called base callables and it gets a bit confusing.

mutianf · 2025-02-13T01:47:09Z

+        createSkipLargeRowsBaseCallable(
+            settings.readRowsSettings(),
+            rowAdapter,
+            new LargeReadRowsResumptionStrategy<RowT>(rowAdapter));


I don't think you need to pass in the resumption strategy here because it's already overridden in the settings in the other callable (line 580). So this variable is not doing anything.

mutianf · 2025-02-13T02:00:10Z


          @Override
          public void onErrorImpl(Throwable t) {
+            if (resumptionStrategy.getClass() == LargeReadRowsResumptionStrategy.class) {


Suggested change

if (resumptionStrategy.getClass() == LargeReadRowsResumptionStrategy.class) {

if (resumptionStrategy instanceof LargeReadRowsResumptionStrategy) {

This still needs to be updated.
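The reason for preferring `instanceof` fits in a few lines: `getClass() ==` matches only the exact class, so it misses subclasses (for example a test double or a wrapper), and it throws a `NullPointerException` on a null reference, whereas `instanceof` accepts subclasses and simply evaluates to `false` for null. The classes below are hypothetical stand-ins for the strategies in this PR.

```java
public class InstanceofVsGetClass {

  // Hypothetical stand-ins for the resumption strategy classes.
  static class BaseStrategy {}

  static class LargeRowsStrategy extends BaseStrategy {}

  // A subclass of the large-row strategy, e.g. an instrumented test double.
  static class InstrumentedLargeRowsStrategy extends LargeRowsStrategy {}

  // Exact-class check: false for subclasses, NullPointerException for null.
  static boolean exactClassCheck(BaseStrategy s) {
    return s.getClass() == LargeRowsStrategy.class;
  }

  // Type check: true for the class and its subclasses, false for null.
  static boolean instanceofCheck(BaseStrategy s) {
    return s instanceof LargeRowsStrategy;
  }

  public static void main(String[] args) {
    BaseStrategy sub = new InstrumentedLargeRowsStrategy();
    System.out.println(exactClassCheck(sub)); // false
    System.out.println(instanceofCheck(sub)); // true
    System.out.println(instanceofCheck(null)); // false
  }
}
```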

sarthakbhutani

I have made the changes.

sarthakbhutani · 2025-02-13T05:57:28Z

+    remaining =
+        RowSetUtil.erase(originalRequest.getRows(), lastSuccessKey, !originalRequest.getReversed());
+    if (!largeRowKeys.isEmpty()) {
+      for (ByteString largeRowKey : largeRowKeys) {


I have made the changes you suggested.
Open question: is there a reason we didn't do this earlier? I'm thinking about edge cases that might fail.

generated-files-bot · 2025-02-13T09:52:02Z

Warning: This pull request is touching the following templated files:

.kokoro/presubmit/graalvm-native.cfg
samples/snapshot/pom.xml


mutianf · 2025-02-13T14:32:18Z


          @Override
          public void onErrorImpl(Throwable t) {
+            if (resumptionStrategy.getClass() == LargeReadRowsResumptionStrategy.class) {


This still needs to be updated.

mutianf · 2025-02-19T01:45:33Z

+   * @param fromStart
+   * @return
+   */
+  public static List<RowRange> eraseKeyFromRange(RowRange range, ByteString split, boolean fromStart) {


You can simplify this logic:

    private static List<RowRange> splitOnLargeRowKey(RowRange range, ByteString largeRowKey) {
      List<RowRange> rowRanges = new ArrayList<>();

      ByteString startKey = StartPoint.extract(range).value;
      ByteString endKey = EndPoint.extract(range).value;

      // if end key is on the left of large row key, don't split
      if (ByteStringComparator.INSTANCE.compare(endKey, largeRowKey) < 0) {
        rowRanges.add(range);
        return rowRanges;
      }

      // if start key is on the right of the large row key, don't split
      if (ByteStringComparator.INSTANCE.compare(startKey, largeRowKey) > 0) {
        rowRanges.add(range);
        return rowRanges;
      }

      // if start key is on the left of the large row key, set the end key to be large row key open
      if (ByteStringComparator.INSTANCE.compare(startKey, largeRowKey) < 0) {
        RowRange beforeSplit = range
            .toBuilder()
            .setEndKeyOpen(largeRowKey)
            .build();
        rowRanges.add(beforeSplit);
      }

      // if the end key is on the right of the large row key, set the start key to be large row key open
      if (ByteStringComparator.INSTANCE.compare(endKey, largeRowKey) > 0) {
        RowRange afterSplit = range
            .toBuilder()
            .setStartKeyOpen(largeRowKey)
            .build();
        rowRanges.add(afterSplit);
      }

      return rowRanges;
    }

This should cover all edge cases.

Done; testing now.
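As a self-contained sketch of the same splitting idea, the version below uses `String` keys instead of `ByteString` with the unsigned byte comparator (equivalent for ASCII keys) and a hypothetical `Range` class instead of the `RowRange` proto. It also treats an empty start or end key as unbounded. If `EndPoint.extract` returns an empty value for an unbounded end, as seems likely, the first comparison in the suggestion above would treat that end as the smallest key and return the range unsplit, so this sketch makes the unbounded check explicit. It is an illustration under those assumptions, not the merged code.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnLargeRowKey {

  // Simplified row range; an empty start or end key means "unbounded" on that side.
  static final class Range {
    final String start;
    final boolean startClosed;
    final String end;
    final boolean endClosed;

    Range(String start, boolean startClosed, String end, boolean endClosed) {
      this.start = start;
      this.startClosed = startClosed;
      this.end = end;
      this.endClosed = endClosed;
    }

    @Override
    public String toString() {
      return (startClosed ? "[" : "(") + start + "," + end + (endClosed ? "]" : ")");
    }
  }

  // Returns the pieces of `range` that remain after removing the single key `largeKey`.
  static List<Range> split(Range range, String largeKey) {
    List<Range> out = new ArrayList<>();
    boolean boundedStart = !range.start.isEmpty();
    boolean boundedEnd = !range.end.isEmpty();

    // Range ends before the large key, or starts after it: keep it unchanged.
    if ((boundedEnd && range.end.compareTo(largeKey) < 0)
        || (boundedStart && range.start.compareTo(largeKey) > 0)) {
      out.add(range);
      return out;
    }
    // Piece strictly before the large key, if the range starts before it.
    if (!boundedStart || range.start.compareTo(largeKey) < 0) {
      out.add(new Range(range.start, range.startClosed, largeKey, false));
    }
    // Piece strictly after the large key, if the range ends after it.
    if (!boundedEnd || range.end.compareTo(largeKey) > 0) {
      out.add(new Range(largeKey, false, range.end, range.endClosed));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(split(new Range("r1", true, "r4", true), "r2")); // [[r1,r2), (r2,r4]]
    System.out.println(split(new Range("r1", true, "", false), "r2")); // [[r1,r2), (r2,)]
  }
}
```

A closed single-key range such as `[r2,r2]` produces an empty list, which matches the suggestion's behaviour of dropping ranges that contain only the large row.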

mutianf · 2025-02-19T01:46:56Z

 public final class RowSetUtil {
  private RowSetUtil() {}

+  public static RowSet createSplitRanges(


I think you should also resume from the last seen row key.

I think you want this:

    public static RowSet eraseLargeRow(
        RowSet rowSet, ByteString lastSeenRowKey, ByteString largeRowKey, boolean fromStart) {
      // first, remove everything we've already read from the RowSet
      RowSet remaining = erase(rowSet, lastSeenRowKey, fromStart);

      // return null if we've read everything
      if (remaining == null) {
        return null;
      }

      // second, remove the large row key from the remaining RowSet
      RowSet.Builder newRowSet = RowSet.newBuilder();

      // remove large row key from point reads
      remaining.getRowKeysList().stream()
          .filter(k -> !k.equals(largeRowKey))
          .forEach(newRowSet::addRowKeys);

      // remove large row key from row ranges
      for (RowRange range : remaining.getRowRangesList()) {
        List<RowRange> afterSplit = splitOnLargeRowKey(range, largeRowKey);
        if (!afterSplit.isEmpty()) {
          afterSplit.forEach(newRowSet::addRowRanges);
        }
      }

      if (newRowSet.getRowKeysList().isEmpty() && newRowSet.getRowRangesList().isEmpty()) {
        return null;
      }

      return newRowSet.build();
    }

splitOnLargeRowKey is in my other comment.

And in your resumption strategy, you wouldn't need to keep the previous request anymore, because the already-read part is removed by `RowSet remaining = erase(rowSet, lastSeenRowKey, fromStart);`.

We need to keep the previous request to handle several consecutive large rows. If we don't keep it, and a request fails because of one large row and the retry fails because of another, the retry removes only the second large row key while the first one is back in the request, so it keeps failing.

Example:

request [r1, r4], where
r1 - last successfully read key
r2 - large row key
r3 - large row key
r4 - large row key

original request -> [r1, r4]
r1 is read, r2 fails -> the request becomes (r1, r2), (r2, r4]
r3 fails -> the request becomes (r1, r3), (r3, r4], which fails on r2 again (if the previous request or the previously failed row keys are not cached)
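The failure mode in this example can be reproduced with a small sketch. It models the range [r1, r4] as the finite sorted set of keys it contains (a simplification; real row sets are ranges over byte strings), and `resumeRequest` is a hypothetical helper, not the PR's code. Recomputing from the original request with only the latest large key puts r2 back; remembering every large key seen so far, which is what keeping the previous, already-trimmed request achieves, does not.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

public class AccumulateLargeRowKeys {

  // Builds the next request: drop everything up to and including the last
  // successfully read key, then drop every known large row.
  static NavigableSet<String> resumeRequest(
      NavigableSet<String> original, String lastSeenKey, Set<String> largeRowKeys) {
    NavigableSet<String> remaining = new TreeSet<>(original.tailSet(lastSeenKey, false));
    remaining.removeAll(largeRowKeys);
    return remaining;
  }

  public static void main(String[] args) {
    // The range [r1, r4] modelled as the finite set of keys it contains.
    NavigableSet<String> original = new TreeSet<>(Arrays.asList("r1", "r2", "r3", "r4"));

    // r1 was read; r2 failed, and then r3 failed on the retry.
    Set<String> onlyLatest = Collections.singleton("r3");
    Set<String> accumulated = new HashSet<>(Arrays.asList("r2", "r3"));

    System.out.println(resumeRequest(original, "r1", onlyLatest)); // [r2, r4]  (r2 is back)
    System.out.println(resumeRequest(original, "r1", accumulated)); // [r4]
  }
}
```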

mutianf · 2025-02-20T01:15:22Z

@@ -0,0 +1,125 @@
+/*
+ * Copyright 2021 Google LLC


Please remove this file

sarthakbhutani

Added comments.

sarthakbhutani · 2025-02-21T17:16:30Z

+   * @param fromStart
+   * @return
+   */
+  public static List<RowRange> eraseKeyFromRange(RowRange range, ByteString split, boolean fromStart) {


Done; testing now.

mutianf · 2025-02-21T21:51:30Z

/gcbrun

🤖 I have created a release *beep* *boop*

---

## [2.53.0](https://togithub.com/googleapis/java-bigtable/compare/v2.52.0...v2.53.0) (2025-02-21)

### Features

* Skip large rows ([#2482](https://togithub.com/googleapis/java-bigtable/issues/2482)) ([cd7f82e](https://togithub.com/googleapis/java-bigtable/commit/cd7f82e4b66dc3c34262c73b26afc2fdfd1deed7))

---

This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).


sarthakbhutani requested review from a team February 11, 2025 14:54

product-auto-label Bot added the size: xl and api: bigtable labels Feb 11, 2025

mutianf reviewed Feb 12, 2025

sarthakbhutani commented Feb 12, 2025

Comment thread ...loud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/stub/EnhancedBigtableStub.java Outdated

Comment thread google-cloud-bigtable/src/main/java/com/google/cloud/bigtable/data/v2/BigtableDataClient.java

mutianf reviewed Feb 13, 2025

sarthakbhutani commented Feb 13, 2025

sarthakbhutani force-pushed the large-row-skip branch from 81dc408 to 18d453c February 13, 2025 09:51

sarthakbhutani requested a review from a team February 13, 2025 09:51

sarthakbhutani force-pushed the large-row-skip branch 3 times, most recently from 19293fa to 3397e51 February 13, 2025 10:08

sarthakbhutani and others added 10 commits February 13, 2025 15:45

feat: skip large rows

81e3f04

merged commit

b796d8c

feat: skip large rows | wip - added large row skip method - with unit tests & wip integration tests

0eb2269

feat: skip large rows | integration tests are working with new AFE behaviour which returns error details,metadata on encountering large rows error

bf7c17f

feat: skip large rows | formatted code

3cf3940

feat:skip large rows | corrected client.readrows() method

b29e73b

feat: skip large rows | feedback incorp

73ca04b

feat: skip large rows | feedback incorp

5b0eced

feat: skip large rows | feedback incorp & code formatting

192e7e1

feat: skip large rows | removed comments

76e8775

sarthakbhutani force-pushed the large-row-skip branch from 3397e51 to 76e8775 February 13, 2025 10:21

Merge branch 'main' into large-row-skip

e49b332

mutianf changed the title ~~Large row skip~~ feat: skip large rows Feb 13, 2025

mutianf reviewed Feb 13, 2025

sarthakbhutani added 2 commits February 19, 2025 02:13

ignored large row read IT for emulator

7513c3c

Merge branch 'main' into large-row-skip

70a9329

sarthakbhutani force-pushed the large-row-skip branch from d80c953 to 70a9329 February 18, 2025 21:07

Merge branch 'main' into large-row-skip

e810c36

mutianf reviewed Feb 18, 2025

mutianf reviewed Feb 19, 2025

Comment thread ...in/java/com/google/cloud/bigtable/data/v2/stub/readrows/LargeReadRowsResumptionStrategy.java

mutianf reviewed Feb 20, 2025

Comment thread ...in/java/com/google/cloud/bigtable/data/v2/stub/readrows/LargeReadRowsResumptionStrategy.java

feedback incorporation

4d926e4

mutianf reviewed Feb 21, 2025

sarthakbhutani commented Feb 21, 2025

mutianf reviewed Feb 21, 2025

Comment thread ...igtable/src/test/java/com/google/cloud/bigtable/data/v2/stub/readrows/ReadRowsRetryTest.java

sarthakbhutani added 2 commits February 22, 2025 01:28

feedback incorporation

e656186

feedback incorporation

83ea1a8

mutianf reviewed Feb 21, 2025

Comment thread ...igtable/src/test/java/com/google/cloud/bigtable/data/v2/stub/readrows/ReadRowsRetryTest.java Outdated

Comment thread ...igtable/src/test/java/com/google/cloud/bigtable/data/v2/stub/readrows/ReadRowsRetryTest.java

sarthakbhutani added 3 commits February 22, 2025 03:12

feedback incorporation

f5068c8

removed duplicate tests

7ed1d83

removed typo

749fa2a

mutianf approved these changes Feb 21, 2025

mutianf added the automerge label Feb 21, 2025

gcf-merge-on-green Bot merged commit cd7f82e into googleapis:main Feb 21, 2025

gcf-merge-on-green Bot removed the automerge label Feb 21, 2025

release-please Bot mentioned this pull request Feb 21, 2025

chore(main): release 2.53.0 #2492

Merged

release-please Bot mentioned this pull request Dec 15, 2025

chore(protobuf-4.x-rc): release 2.71.0-rc1 #2726

Merged
