Commit a42f711

[DOCS]: Fix typos and consistency issues in docs (#1360)
1 parent 85025ad commit a42f711

14 files changed, +114 -116 lines changed

docs/docs/ai/llm.mdx

Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ description: LLMs integrated with CocoIndex for various built-in functions
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-CocoIndex provides builtin functions integrating with various LLM APIs, for various inference tasks:
+CocoIndex provides built-in functions integrating with various LLM APIs, for various inference tasks:
 * [Text Generation](#text-generation): use LLM to generate text.
 * [Text Embedding](#text-embedding): embed text into a vector space.
 
@@ -36,7 +36,7 @@ We support the following types of LLM APIs:
 
 Generation is used as a building block for certain CocoIndex functions that process data using LLM generation.
 
-We have one builtin functions using LLM generation for now:
+We have one built-in function using LLM generation for now:
 
 * [`ExtractByLlm`](/docs/ops/functions#extractbyllm): it extracts information from input text.
 
@@ -56,7 +56,7 @@ It has the following fields:
 
 Embedding means converting text into a vector space, usually for similarity matching.
 
-We provide a builtin function [`EmbedText`](/docs/ops/functions#embedtext) that converts a given text into a vector space.
+We provide a built-in function [`EmbedText`](/docs/ops/functions#embedtext) that converts a given text into a vector space.
 The spec takes the following fields:
 
 * `api_type` (type: `cocoindex.LlmApiType`, required)
@@ -171,9 +171,9 @@ cocoindex.functions.EmbedText(
 
 Google exposes Gemini through Google AI Studio APIs.
 Based on [Gemini API recommendation](https://cloud.google.com/ai/gemini?hl=en), this is recommended for experimenting and prototyping purposes.
-You may use [Vertex AI](#vertex-ai) for production usages.
+You may use [Vertex AI](#vertex-ai) for production use.
 
-To use the Gemini by Google AI Studio API, you need to set the environment variable `GEMINI_API_KEY`.
+To use Gemini via the Google AI Studio API, you need to set the environment variable `GEMINI_API_KEY`.
 You can generate the API key from [Google AI Studio](https://aistudio.google.com/apikey).
 
 You can find the full list of models supported by Gemini [here](https://ai.google.dev/gemini-api/docs/models).

docs/docs/core/cli-commands.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ Modes of operation:
1313

1414
**Usage:**
1515

16-
```sh
16+
```bash
1717
cocoindex drop [OPTIONS] [APP_TARGET] [FLOW_NAME]...
1818
```
1919

@@ -45,7 +45,7 @@ flow.
 
 **Usage:**
 
-```sh
+```bash
 cocoindex evaluate [OPTIONS] APP_FLOW_SPECIFIER
 ```
 
@@ -72,7 +72,7 @@ the backend.
 
 **Usage:**
 
-```sh
+```bash
 cocoindex ls [OPTIONS] [APP_TARGET]
 ```
 
@@ -95,7 +95,7 @@ It will allow tools like CocoInsight to access the server.
 
 **Usage:**
 
-```sh
+```bash
 cocoindex server [OPTIONS] APP_TARGET
 ```
 
@@ -128,7 +128,7 @@ storage and target (to export to).
 
 **Usage:**
 
-```sh
+```bash
 cocoindex setup [OPTIONS] APP_TARGET
 ```
 
@@ -160,7 +160,7 @@ flow.
 
 **Usage:**
 
-```sh
+```bash
 cocoindex show [OPTIONS] APP_FLOW_SPECIFIER
 ```
 
@@ -184,7 +184,7 @@ or `module:FlowName`. If `:FlowName` is omitted, updates all flows.
 
 **Usage:**
 
-```sh
+```bash
 cocoindex update [OPTIONS] APP_FLOW_SPECIFIER
 ```
 

docs/docs/core/settings.mdx

Lines changed: 5 additions & 5 deletions
@@ -28,7 +28,7 @@ See [List of Environment Variables](#list-of-environment-variables) for specific
 
 :::tip
 
-You can consider place a `.env` file in your directory.
+You can consider placing a `.env` file in your directory.
 The [CLI](cli#environment-variables) will load environment variables from the `.env` file (see [CLI](cli#environment-variables) for more details).
 From your own main module, you can also load environment variables with a package like [`python-dotenv`](https://github.com/theskumar/python-dotenv).
 
@@ -69,7 +69,7 @@ cocoindex.init(
 For example, you can call it in the main function of your application.
 Once the `cocoindex.init()` is called with a `cocoindex.Settings` dataclass object as argument, the `@cocoindex.settings` function and environment variables will be ignored.
 
-This is more flexible, as you can more easily construct `cocoindex.Settings` based on other stuffs you loaded earlier.
+This is more flexible, as you can more easily construct `cocoindex.Settings` based on other objects or instances you loaded earlier.
 But be careful that if you call `cocoindex.init()` only under the path of main (e.g. within `if __name__ == "__main__":` guard), it won't be executed when you're using CocoIndex CLI, as it won't execute your main logic.
 
 :::info
@@ -123,7 +123,7 @@ If not set, all flows are in a default unnamed namespace.
 
 :::tip
 
-Please be careful that all values in `url` needs to be url-encoded if they contain special characters.
+Please be careful that all values in `url` need to be URL-encoded if they contain special characters.
 For this reason, prefer to use the separated `user` and `password` fields for username and password.
 
 :::
@@ -140,7 +140,7 @@ If not set, all flows are in a default unnamed namespace.
 
 If you use the Postgres database hosted by [Supabase](https://supabase.com/), please click **Connect** on your project dashboard and find the following URL:
 
-* If you're on a IPv6 network, use the URL under **Direct connection**. You can visit [IPv6 test](https://test-ipv6.com/) to see if you have IPv6 Internet connection.
+* If you're on an IPv6 network, use the URL under **Direct connection**. You can visit [IPv6 test](https://test-ipv6.com/) to see if you have IPv6 Internet connection.
 * Otherwise, use the URL under **Session pooler**.
 Note that Supabase has a pool size limit of 15 by default, while CocoIndex's default `max_connections` value is 25.
 You can adjust either value to make sure Supabase's pool size limit is greater than CocoIndex's `max_connections` value.
@@ -156,7 +156,7 @@ If you use the Postgres database hosted by [Supabase](https://supabase.com/), pl
 * `source_max_inflight_rows` (type: `int | None`, default: `1024`): The maximum number of concurrent inflight rows for all source operations.
 * `source_max_inflight_bytes` (type: `int | None`, default: `None`): The maximum number of concurrent inflight bytes for all source operations.
 
-See also [flow definition docs](/docs/core/flow_def#control-processing-concurrency) about why it's necessary to control processing concurrency, and how to configure it on per-source basis.
+See also [flow definition docs](/docs/core/flow_def#control-processing-concurrency) about why it's necessary to control processing concurrency, and how to configure it on a per-source basis.
 If both global and per-source limits are specified, both need to be satisfied to admit additional source rows.
 
 ## List of Environment Variables

docs/docs/examples/examples/custom_source_hackernews.md

Lines changed: 13 additions & 13 deletions
@@ -35,7 +35,7 @@ In many scenarios, pipelines don't just read from clean tables. They depend on:
 - Legacy systems
 - Non-standard data models that don’t fit traditional connectors
 
-CocoIndex’s Custom Source API makes these integrations *declarative*, incremental, and safe by default.
+CocoIndex’s Custom Source API makes these integrations *declarative*, incremental, and safe by default.
 
 ## Overview
 
@@ -50,7 +50,7 @@ The pipeline consists of three major parts:
 2. Build an index with CocoIndex Flow
 - Collect thread content
 - Collect all comments recursively
-- Export to a Postgres table
+- Export to a Postgres table
 3. Add a lightweight query handler
 - Uses PostgreSQL full-text search
 - Returns ranked matches for a keyword query
@@ -68,7 +68,7 @@ Every custom source defines two lightweight data types:
 - Key Type → uniquely identifies an item
 - Value Type → the full content for that item
 
-In hacker news, each news is a thread, and each thread can have multiple comments.
+In hacker news, each news is a thread, and each thread can have multiple comments.
 ![HackerNews Thread and Comments](/img/examples/custom_source_hackernews/hackernews.png)
 
 For HackerNews, let’s define keys like this:
@@ -189,7 +189,7 @@ async def list(
 # Use HackerNews search API
 search_url = "https://hn.algolia.com/api/v1/search_by_date"
 params: dict[str, Any] = {"hitsPerPage": self._spec.max_results}
-
+
 if self._spec.tag:
 params["tags"] = self._spec.tag
 async with self._session.get(search_url, params=params) as response:
@@ -390,9 +390,9 @@ Your app can now query it as a real-time search index.
 
 ## Querying & Searching the HackerNews Index
 
-With the index flow complete, the next step is to add a query handler.
+With the index flow complete, the next step is to add a query handler.
 This allows you to search and explore your indexed HackerNews data directly in CocoInsight.
-You can implement the query logic using any preferred library or framework.
+You can implement the query logic using any preferred library or framework.
 
 <DocumentationButton url="https://cocoindex.io/docs/query#query-handler" text="Query Handler" margin="0 0 16px 0" />
 
@@ -433,9 +433,9 @@ def search_text(query: str) -> cocoindex.QueryOutput:
 return cocoindex.QueryOutput(results=results)
 ```
 
-This example shows how to create a query handler that lets you search HackerNews threads and comments stored in CocoIndex.
+This example shows how to create a query handler that lets you search HackerNews threads and comments stored in CocoIndex.
 - The handler looks up the correct database table, then uses PostgreSQL’s full-text search functions (`to_tsvector` and `plainto_tsquery`) to find entries that match your search terms.
-- Matching results are sorted by their relevance (`ts_rank`) and by creation time, then converted to dictionaries.
+- Matching results are sorted by their relevance (`ts_rank`) and by creation time, then converted to dictionaries.
 - Finally, these results are returned in a `cocoindex.QueryOutput` object—making it easy to perform fast, ranked searches across your indexed HackerNews content.
 
 ## Running Your HackerNews Custom Source
@@ -480,7 +480,7 @@ cocoindex update -L main
 
 ## 3. Troubleshoot & Inspect with CocoInsight
 
-CocoInsight lets you **visualize and debug your flow**, see the lineage of your data, and understand what’s happening under the hood.
+CocoInsight lets you **visualize and debug your flow**, see the lineage of your data, and understand what’s happening under the hood.
 
 Start the server:
 
@@ -491,9 +491,9 @@ cocoindex server -ci main
 Then open the UI in your browser: [`https://cocoindex.io/cocoinsight`](https://cocoindex.io/cocoinsight)
 
 > CocoInsight has zero pipeline data retention — it’s safe for debugging and inspecting your flows locally.
->
+>
 
-Note that this requires QueryHandler setup in previous step.
+Note that this requires QueryHandler setup in previous step.
 
 
 ## What You Can Build Next
@@ -547,8 +547,8 @@ Custom Sources extend this model to *any* API — internal, external, legacy, or
 This unlocks a simple but powerful pattern:
 
 > If you can fetch it, CocoIndex can index it, diff it, and sync it.
->
+>
 
 ## ⭐ Try It, Fork It, Star It
 
-If you found this useful, a **star on [GitHub](https://github.com/cocoindex-io/cocoindex)** means a lot — it helps others discover CocoIndex and supports further development.
+If you found this useful, a **star on [GitHub](https://github.com/cocoindex-io/cocoindex)** means a lot — it helps others discover CocoIndex and supports further development.

docs/docs/examples/examples/hackernews_trending_topics.md

Lines changed: 19 additions & 19 deletions
@@ -21,7 +21,7 @@ import { DocumentationButton, GitHubButton } from '../../../src/components/GitHu
 
 In the age of information overload, understanding what's trending—and why—is crucial for developers, researchers, and data engineers. HackerNews is one of the most influential tech communities, but manually tracking emerging topics across thousands of threads and comments is practically impossible.
 
-What if you could automatically index HackerNews content, extract topics using AI, and query trending discussions in real-time? That's exactly what CocoIndex enables through its [**Custom Sources**](https://cocoindex.io/blogs/custom-source) framework
+What if you could automatically index HackerNews content, extract topics using AI, and query trending discussions in real-time? That's exactly what CocoIndex enables through its [**Custom Sources**](https://cocoindex.io/blogs/custom-source) framework
 combined with [LLM-powered extraction](https://cocoindex.io/docs/ai/llm).
 
 In this post, we'll explore the **HackerNews Trending Topics** example, a production-ready pipeline that demonstrates some of the most powerful concepts in CocoIndex: incremental data syncing, LLM-powered information extraction, and queryable indexes.
@@ -120,7 +120,7 @@ class _HackerNewsThreadKey(NamedTuple):
 thread_id: str
 ```
 
-CocoIndex recommends to use stable keys. HN thread IDs are perfect keys.
+CocoIndex recommends using stable keys. HN thread IDs are perfect keys.
 
 #### Value Type
 
@@ -319,7 +319,7 @@ The genius of Custom Sources is the **separation of discovery from fetching**.
 Sync 1:
 list() → 200 thread IDs + timestamps (fast)
 get_value() → 200 full threads (expensive)
-
+
 Sync 2 (30s later):
 list() → 200 thread IDs + timestamps (fast)
 [CocoIndex detects only 15 changed]
@@ -418,7 +418,7 @@ def hackernews_trending_topics_flow(
 
 ```
 
-This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
+This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
 
 <DocumentationButton url="https://cocoindex.io/docs/core/flow_def" text="Flow Definition Docs" margin="0 0 16px 0" />
 
@@ -502,7 +502,7 @@ with data_scope["threads"].row() as thread:
 )
 ```
 
-This block processes each HackerNews thread as it flows through the pipeline. Inside `data_scope["threads"].row()`, each `thread` represents a single story record.
+This block processes each HackerNews thread as it flows through the pipeline. Inside `data_scope["threads"].row()`, each `thread` represents a single story record.
 
 - We use an LLM (`gpt-5-mini`) to extract semantic **topics** from the thread's text by applying `ExtractByLlm`, which returns a list of `Topic` objects.
 - We use `message_index` to collect relevant metadata for this thread.
@@ -630,7 +630,7 @@ def search_by_topic(topic: str) -> cocoindex.QueryOutput:
 
 ```
 
-This block adds a query interface to the flow so users can search HackerNews content by topic.
+This block adds a query interface to the flow so users can search HackerNews content by topic.
 
 The `@hackernews_trending_topics_flow.query_handler()` decorator registers `search_by_topic()` as a query endpoint for the flow.
 
@@ -668,7 +668,7 @@ def get_threads_for_topic(topic: str) -> list[dict[str, Any]]:
 ]
 ```
 
-This function finds all HackerNews threads related to a given topic and ranks them. It looks up the topic table in the database, calculates a **score** for each thread (higher for main threads, lower for comments), and finds the most recent mention.
+This function finds all HackerNews threads related to a given topic and ranks them. It looks up the topic table in the database, calculates a **score** for each thread (higher for main threads, lower for comments), and finds the most recent mention.
 
 It then returns a list of threads with their URL, score, and latest mention time, so you can see which threads are most relevant or trending for that topic.
 
@@ -681,7 +681,7 @@ By weighting 5:1, the system prioritizes original discussions over tangential me
 
 ### get_trending_topics(limit) → Top 20 topics by score
 
-This function finds the **most popular topics** on HackerNews.
+This function finds the **most popular topics** on HackerNews.
 
 ```python
 @hackernews_trending_topics_flow.query_handler()
@@ -755,7 +755,7 @@ cocoindex server -ci main
 Then open the UI in your browser: [**`https://cocoindex.io/cocoinsight`**](https://cocoindex.io/cocoinsight)
 
 > CocoInsight has zero pipeline data retention — it’s safe for debugging and inspecting your flows locally.
->
+>
 
 Note that this requires QueryHandler setup in previous step.
 
@@ -823,23 +823,23 @@ By integrating with an AI agent framework, the indexed topics and threads become
 ## Why Use CocoIndex for This?
 
 1. **Incremental Sync**
-
+
 Traditional pipelines fetch everything repeatedly. CocoIndex fetches only new or updated threads, dramatically reducing API calls and latency.
-
+
 2. **Declarative & Modular**
-
+
 Flows, collectors, and query handlers are modular. You can add new sources (Reddit, Twitter, internal chat logs) or new transformations (summaries, sentiment analysis, embeddings) without rewriting the entire pipeline.
-
+
 3. **LLM Integration is Seamless**
-
+
 CocoIndex treats LLMs as first-class citizens for structured extraction. You don’t need complex glue code — the framework handles transformation, type enforcement, and storage.
-
+
 4. **Queryable Structured Index**
-
+
 Topics and messages are stored in Postgres, ready for SQL queries or API-based search. You can serve both analytics dashboards and AI agents from the same structured store.
-
+
 5. **Supports Continuous Workflows**
-
+
 CocoIndex pipelines can run live, with real-time updates every few seconds. Combine this with AI agents, and you have a **self-updating knowledge system** that reasons over the latest information.
 
 
@@ -849,4 +849,4 @@ CocoIndex is designed for systems that need to continuously monitor, detect, and
 
 ## Support Us ❤️
 
-Enjoying CocoIndex? Give us a [⭐️ on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with your peers. Every star helps more developers discover the project and strengthens the community.
+Enjoying CocoIndex? Give us a [⭐️ on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with your peers. Every star helps more developers discover the project and strengthens the community.
